Evaluating Corporate Training: Models and Issues

Evaluating Corporate Training:
Models and Issues

Evaluation in Education and
Human Services
Editors:
George F. Madaus, Boston College,
Chestnut Hill, Massachusetts, U.S.A.
Daniel L. Stufflebeam, Western Michigan
University, Kalamazoo, Michigan, U.S.A.
Other books in the series:
Smith,M.:
Evaluability Assessment
Ayers,1. and Berney, M.:
A Practical Guide to Teacher Education Evaluation
Hambleton, Rand Zaal, I.:
Advances in Educational and Psychological Testing
Gifford, B. and O'Connor, M.:
Changing Assessments
Gifford, B.:
Policy Perspectives on Educational Testing
Basarab, D. and Root, D.:
The Training Evaluation Process
Haney, W.M., Madaus, G.F. and Lyons, R.:
The Fractured Marketplace for Standardized Testing
Wing, L.C. and Gifford, B.:
Policy Issues in Employment Testing
Gable, RE.:
Instrument Development in the Affective Domain (2nd Edition)
Kremer-Hayon, L.:
Teacher Self-Evaluation
Payne, David A.:
Designing Educational Project and Program Evaluations
Oakland T. and Hambleton, R:
International Perspectives on Academic Assessment
Nettles, M.T. and Nettles, A.L.:
Equity and Excellence in Educational Testing and Assessment
Shinkfield, A.I. and Stufflebeam, D.L.:
Teacher Evaluation: Guide to Effective Practice
Birenbaum, M. and Dochy, Filip 1.R.C.:
Alternatives in Assessment ofAchievements. Learning
Processes and Prior Knowledge
Mulder, M., Nijhof, W.I., Brinkerhoff, RO.:
Corporate Trainingfor Effective Performance
Britton, E.D. and Raizen, S.A.:
Examining the Examinations
Candoli, c., Cullen, K. and Stufflebeam, D.:
Superintendent Performance Evaluation
Evaluating Corporate Training:
Models and Issues
edited by
Stephen M. Brown
Sacred Heart University
and
Constance J. Seidner
Digital Equipment Corporation
SPRINGER SCIENCE+BUSINESS MEDIA, LLC

Library of Congress Cataloging-in-Publication Data
Evaluating corporate training / edited by Stephen M. Brown and
Constance J. Seidner.
p. cm.
IncIudes bibliographical references and index.
ISBN 978-94-010-6031-8 ISBN 978-94-011-4850-4 (eBook)
DOI 10.1007/978-94-011-4850-4
1. Employees--Training of--Evaluation. 2. Personnel management-
- Research--Methodology. I. Brown, Stephen M. (Stephen Michael),
1950- . 11. Seidner, Constance J.
HF5549.5.T7E883 1997
973'.0496073' 0082--dc21 97-36963
CIP
Copyright © 1998 by Springer Science+Business Media New York

Originally published by Kluwer Academic Publishers, New York in 1998
Softcover reprint of the hardcover 1st edition 1998
All rights reserved. No part of this publication may be reproduced, stored in a

retrieval system or transmitted in any form or by any means, mechanical, photo-
copying, recording, or otherwise, without the prior written perm iss ion of the
publisher, Springer Science+Business Media, LLC.
Printed on acid-free paper.

Contents
Acknowledgments vii
Preface ix
Section I-The Context of Evaluation 1
1 The Changing Context of Practice 3
Stephen M. Brown
2 Organizational Strategy and Training Evaluation 19
Carol Ann Moore and Constance J. Seidner
3 What Stakeholders Want to Know 41
Oliver W. Cummings
4 The Learning Organization: Implications for Training 63
Barry Sugarman
Section II-Models of Evaluation 93
5 The Four Levels of Evaluation 95
Donald L. Kirkpatrick
6 Level 4 and Beyond: An ROI Model 113
Jack J. Phillips
7 Clarifying and Directing Impact Evaluation 141
Robert O. Brinkerhoff
8 Formative Evaluation 167
Wilbur Parrott
vi
9 Assessing Employee Competencies 183
Susan Ennis
10 The Organizational Action Research Model 209

Mort Elfenbein, Stephen M. Brown and Kim Knight
Section III-Issues in Evaluation 235

11 The Ethics of Evaluating Training 237
Patricia A. Lawler
12 Cultural Dimensions of Evaluation 257

Sadie Burton-Goss and Michael Kaska
13 Impact of Technology on Training Evaluation 279

Hallie Ephron Touger
14 Design Team Performance: Metrics and the Impact of
Technology 297
Larry Leifer
15 Using Quality to Drive Evaluation: A Case Study 321
David J. Basarab
16 Certification: Business and Legal Issues 337

Ernest S. Kahane and William McCook
17 Lessons from Education 359

Jean Moon
Subject Index 379

Acknowledgments
We would like to express our appreciation and thanks to a number of people

who helped make this book possible. First and foremost we thank our authors
for sharing their considerable expertise and insights. We appreciate the
professionalism, responsiveness, and commitment of time and energy they have
demonstrated throughout the process.
Zachary Rolnik and Suzanne Rumsey of Kluwer Academic Publishers were

helpful and supportive. Laura Gerhard and Susan Griffith from Lesley College,
and Dr. Howard H. Brown from Bradford College provided much appreciated
assistance. We also thank George F. Madaus and Daniel L. Stufflebeam, editors
of the Evaluation in Education and Human Services series for the opportunity
to edit this book.
Finally, we want to thank our families, Robert, David, and Glen Seidner and
Kathy, Jonathan, and Jared Brown for their support, patience, and
understanding throughout this endeavor.
Preface
We are glad to have the opportunity to work together again in the planning and
preparation of this edited volume on the evaluation of corporate training. Our
respective professional careers have provided us with experience in this area,
both as practitioners and as academicians. It is from both of these perspectives
that we approached the preparation of this volume. Our purpose is to provide
training professionals in business and industry, and students of human
resources development with an overview of current models and issues in
educational evaluation.
The book is organized around three themes: context, models, and issues.
The chapters in the context section are intended to provide the reader with an
understanding of the social, organizational, and interpersonal factors that
provide background and give meaning to evaluation practice. The models
section brings together contributions from some of the most influential thinkers
and practitioners in the field. The chapters in this section provide perspective
on the dominant themes and emergent trends from individuals who have been,
and continue to be, the drivers of those trends. Contributions to the issues
section highlight some pervasive themes as well as illuminate new areas of
concern and interest that will affect how we assess learning interventions in the
organizations of today and tomorrow.
We believe that evaluation is a tool to improve the practice of training and

institutionalization of learning in business and industry. We hope this book will
x
provide students of educational evaluation, practitioners in training functions,

their managers, and other HRD professionals with insights that will advance
the profession and the clients it serves.
SECTION I-THE CONTEXT
OF EVALUATION
The fIrst section of the book is designed to describe the context within which
corporate training programs are evaluated. It presents points of view on the
external environment, the corporate setting, the clients and users of training,
and an organizational learning model to which many corporations currently
aspire. Through these chapters, the reader can develop an understanding of
some of the contextual factors that affect the evaluation of corporate training.
The fIrst chapter, written by Stephen M. Brown, describes the rapid and
signifIcant changes currently happening in the external environment. Dr.
Brown documents these changes and provides an analysis of their ramifIcations
to training evaluation. He then outlines the types of evaluation needed in the
new context.
The role training evaluation can play in organizational strategy, and

particularly a strategy of change, is addressed by Carol Ann Moore and
Constance Seidner in their chapter, Organizational Strategy and Training
Evaluation. They incorporate the thinking of David Norton and Chris Argyris
in their analysis of how training evaluation can be a "strategic lever" in driving
organizational change.
Oliver Cummings represents the stakeholder's view of training evaluation.

Dr. Cummings presents a service management model that provides a
framework for understanding the multiple concerns of the evaluator's clients,
and how client needs can be addressed within the context of sound business
practice and professional evaluation standards.
2 The Context of Evaluation
Barry Sugarman provides a thoughtful introduction to learning

organizations. He not only describes the concept of a learning organization but
also ties the concept to additional sources in the literature. The Learning
Organization as a model is provided in the Context section because learning
organizations address and illustrate so much of the change in our
organizations.
The chapters in this section work together to create a picture of the changing
context within which training evaluation is practiced. The ideas presented in
the chapters provide multiple lenses with which to view the evolving corporate
landscape, and as such can inform the theory and practice of evaluation.
1 THE CHANGING CONTEXT OF
PRACTICE
Stephen M. Brown
We are living in a time of incredibly rapid, all pervasive, and continuous

change. Some writers have suggested we are witnessing a paradigm shift,
entering a new age: the post-industrial or the information age. While we are not
able to fully describe what the new age will look like, we are able to see side-
by-side, glimpses of the old and new paradigms. We are aware that old ways of
doing business and thinking no longer work. Even the old questions have no
meaning in the new context. Concepts that were opposites are often no longer
opposite and are now related. For example, Republicans and Democrats work
feverishly to point out their differences, but they sound more and more alike.
They sound like politicians from the old paradigm. The opposite of feminine is
not masculine; it is not feminine. These two constructs, once seen as polar, are
no longer so.
Our awareness of the persuasiveness of change is exhibited in our attempt to

change and recalibrate our social institutions. We are reinventing government,
reforming education, restructuring organizations, and reengineering businesses.
All have in common the awareness that our social institutions are facing new
challenges, and old answers do not work. In fact, the questions have changed.
Unfortunately, our inability to see fully the new paradigm has often lead us to
find comfort in our old answers. We are debating the answers of the present
within the context of the past. For we know these things, and we know they
worked and were comfortable in a different time.
4 The Changing Context of Practice
The demands on all professionals are more complex, ever changing, and
allow less room for error than ever before. They are asked to develop solutions
to a new set of questions, using new information, in increasingly rapid time
frames. Professionals do this in a context that is changing and is understood
only in the incomplete state in which it has revealed itself. Wheatley (1991)
uses the analogy of the change in physics when the prevalent theory changed
from a Newtonian model to a model based on quantum physics. This level of
change, in which our entire frame of reference is changed, is analogous to what
is happening in our current practice. This change of the frame of reference
makes all of our explanations and understanding profoundly limited.
It is the understanding of context that has always been the strength of

successful practitioners. The understanding of context constitutes a major
portion of the implicit knowledge that writers such as Schon (1983) attribute to
effective practitioners. Context is not only a systematic part of our
understanding, it is the background to our actions as practitioners. It is part of
the frame of reference through which we practice and through which we
construct meaning about our practice. Wenger (1997) says, "Practice connotes
doing, but not just doing of itself. It is doing in a historical and social context
that gives structure and meaning to what we do."
We are currently living in a world that is rapidly and thoroughly changing.

This chapter will now take a brief look at environmental forces that are driving
the change. These forces are not mutually exclusive but are overlapping and
interdependent.
Demographics
Who America is as a people has changed and continues to change. This means
our customers (clients) and workers are changing. This, of course, has profound
implications for our organizations and work. The demographic changes, which
are mirrored in the other advanced industrial nations, are: aging of the
population and the workforce, fewer entry-level employees, and a greater
diversity in the workforce. Another dimension to this change results from the
internationalization of the economy. This has meant that our workforce,
customers, and organizations are by their very nature multicultural and
multinational. At a minimum our competition is, too.
Evaluating Corporate Training: Models and Issues 5
The people born during the post-World War II baby boom make up an
extraordinary percentage of the population (about one-third of the total
population and an even higher percentage of the current workforce.) They are
aging. They are also immediately followed by a generation that is relatively
small. This is a result of lowered fertility rates among the Baby Boomers,
particularly among white suburban dwellers. The interaction of these factors
and the considerably longer life spans as a result of the advancement of medical
technology has resulted in an aging population, an aging workforce, and fewer
first-time entrants into the workforce. The median age of the American
workforce was 28 years old in 1970; by 1990 it had risen to 33 years; it will
continue to rise to 39 years in 2010, and peak at 42 in the year 2040. (U.S.
Senate Select Committee on Aging, 1988) It is noteworthy that fIrst-time
entrants to the workforce have traditionally been the source of newly educated,
relatively cheap, and energetic labor.
The lowered fertility rate of white suburban baby boomers, the relatively
higher fertility rates of African Americans and Hispanics, and relaxed
immigration laws have combined to make our population and workforce much
more diverse than ever before. The higher participation rate of women in the
workforce has given this diversity an additional dimension. Attempts to
understand the ramifIcations of this diversity are hampered because we view it
through the lens of the old context. So, industrial America attempts to
understand the ramifIcations of organizations in which white males are no
longer the majority. The real news is that no group is the majority, and the
population has become so diverse that there is no common defInition of
minority. Our workforce has obvious differences of race, gender, language,
country of origin, and culture. Very few universal assumptions about the values,
experiences, and motivation of our workforce can be made.
Employees from the new generation entering the workforce have

fundamental different skills and values from the Baby Boomer generation.
While generations have always had differences, the differences have been
exaggerated by the rapidity of change. That is, the new generation has grown
up in a world that is fundamentally and profoundly different from the preceding
generation, and it has had extremely different experiences. For example, the
new generation is the fIrst to have grown up with computer technology around
them. They do not translate a pre-computer world experience to the computer
as the preceding generation has had to do. The issues of computerization,
computer integration, and computer literacy are not their issues.
These changes have resulted in a workforce that is more diverse, and may
even further extend the popular meaning of the word. The workforce is aging
and more experienced, but the experience is in a world that is fading from
view. Our workforce, customers, and competition are often separated by
geography, culture, and language. The most basic assumptions about our
workforce, organizations, and practice need to be examined as they often are
predicated on a more homogeneous population. Even our theories of learning,
upon which most of the practice of training professionals is based, are relatively
untested with our new definition of population.
Information Explosion
A second environmental force that is driving many other changes is an

information explosion. Change is not only all pervasive but is occurring at an
unprecedented pace. We are doubling our knowledge about every 36 months, a
process that took thousands of years during prehistoric times and hundreds of
years only a few generations ago. This process continues to accelerate. People
who we think of as founders of fields, such as Sigmund Freud and Albert
Einstein are just a few generations removed from us. DNA and the microchip
were all discovered during the lifetime of the typical Baby Boomer. Likewise,
90 percent of all works in the Library of Congress have been published since
1950.
Many of these changes are being fueled by rapid advances in computer and
communications technology. These advances are unprecedented in their
constancy, rapidity, and implication. They have resulted in computer
technology becoming small and cheap enough to be available to the majority of
workers. In 1993, 45.8 percent (Bassi, Benson & Cheney, 1997, p. 33) of
workers reported that they use computers on their jobs, and the number
continues to rise. Advances in communications technology make the global
economy possible.
Availability of information technology to employees in every level in the

organization makes the flattening of organizations possible. This can also lead
to moving decision-making down the organization and lessening power
differences, which are often based on access to information. Changes in work
processes, which have been supported by advances in technology, have
increased the output per employee and support new organization structures.
Technological advances have also supported improvements in customer service.
While the availability of massive amounts of information presents incredible

possibilities, it can make the simplest of tasks very COOlplicated and time
consuming. Every decision can involve endless amounts of information, and
eventually still be based on incomplete data. Standard answers are open to
questions. The shelf life of knowledge and products is extremely short.
Knowledge has become a competitive advantage and the fuel of the new
economy. Learning and applying new knowledge are important tasks in all
organizations.
Economic Shifts
Our economy is global. Our customers and competition are international. This
has forced us to meet international standards of quality, customer service, and
value. The new customer has high expectations. Fast moving communications
technology and international competition means an innovation will soon be
produced better, faster, or cheaper. Knowledge creation and application are the
competitive advantage of the new economy.
Our organizations have become flatter, less hierarchical, and more customer
driven. The training organization often has less full-time personnel to perform
more tasks. The American Society for Training and Development's (ASTD)
Benchmarking Forum reports that the number of employees per training staff
member increased 10 percent from 1994 to 1995 (Bassi, Benson & Cheney,
1997, p. 3). The new structure has internal staff performing tasks that are part
of the organization's core capabilities, and contracting for the others. The
training organization will increasingly become a networked or virtual
organization with the ability to access incredible expertise, but it will have
fewer full-time, permanent employees.
Managing this type of organization takes a different set of skills. These

changes have created new roles, less security, and greater demands for
knowledge and output on all workers. Working with employees to realize these
changes and organizing ourselves within the new framework are important and
immediate tasks for the training organization.
In general, the focus of training is moving from the individual to the

organization. This manifests itself in various ways. We now speak of
organizational learning. We tend to be interested in the organizational impact
of training, not individual learning. Training solutions have been replaced with
performance interventions that are measured by their organizational impact.
Less and less our language describes organizations in terms of the individuals
that perform the roles. More often we describe organizations with language
from computer technology: networks, software, systems, generation, and
output.
Training
Training will have to adjust to new roles and expectations in organizations. The
organizations have changed. The workforce has changed. The context of
practice has changed. The customer and products have changed. Thus, training
will have to change to be effective. There is a demand for justification of
training expenditures and initiatives. This has led to traditional classroom
training becoming an intervention of secondary resort. Importantly, it has also
led to the need to demonstrate training activities' impact on strategic
initiatives, core organizational capabilities, organizational effectiveness, and
the bottom line.
The most important and fundamental change will be a shift from training to
learning. Filling seats and hours will not be our task. However, leading people
through the changes and helping them adapt to new ways of doing business will
be.
We will have new roles and relationships within the organization.

Environmental influences drive our need to change. Our training functions
must be designed to help the organization survive these times of uncertainty
and realize their vision of the future. This requires
• Visualizing the strategy of the organization
• Assessing the current state of the workforce
• Developing a strategy to develop the competence needed to meet the

strategy
• Implementing the strategy
• Institutionalizing learning
• Evaluating the process
This often occurs in a context in which the required tasks have increased,
the core staff who perform these tasks has decreased, and the workforce is
enveloped by change.
Training Expectations
The organizational expectations for training have also shifted dramatically. The
most pronounced change is a new and vigorous justification of the cost of
training based on return or organizational impact. This is being driven by the
competitive nature of cost structures of the international economy and the
resulting organizational structure, which is flat, thinner, and supports very few
administrative costs.
Often training professionals are being asked to do more, because they have a
more important role in the strategy of the organization. The ability to generate
and apply knowledge is a competitive advantage and source of new products,
services, and revenue. These changes and the changing organizational context
have created new roles for trainers. Some of these roles are
• Business Unit Manager and manager of a virtual organization with

permeable boundaries and vendors to be managed and evaluated
• Facilitator of Change and consultant to change managers on managing
change
• Learning Specialist who consult to organizational business partners
• Performance Enhancers who work to enhance individual employee
performance through intervention strategies that include, but are not
limited to, classroom training
• Manger of Information Resources
• Facilitator of Learning Organization and proponent of systems thinking
• Diversity Advocate, facilitator of mUltiple voices, conflict resolver, and

defender of the unpopular opinion
• Intervention Consultant and Intervention Evaluator
• Assessment and Evaluation Expert
Conceptually, training professionals have a different relationship with

knowledge (information) and a new definition of the customer. The information
explosion and technological advances have created an incredible, rapidly
growing body of knowledge available to more and more people. No longer can
trainers stay experts in multiple fields. Trainers no longer hold the privileged
position of all knOwing content expert. It is quite common that within groups
being trained there are individuals with more depth of knowledge, more
experience, or more time to access current knowledge. This has moved training
professionals from being content exPerts who bring ''the info" into a training
room to being facilitators of learning and guides to available knowledge.
Trainers no longer own knowledge but synthesize and provide resources to
clients who also have access to knowledge.
A new definition of customer is brought about by the changed organizational

structure. As training has moved from satisfying trainees to affecting
organizational performance, the definition of customer has broadened. Trainees
are still customers and their evaluations are important sources of feedback for
continuous improvement and qUality. However, their organizational unit and
the organization as a whole are now part of the client system, where training is
performed to solve business problems of the unit and affect the organization.
Evaluation
These changes have resulted in increased pressure on the training function to

demonstrate its worth. This has been asked in several different ways. Do
trainers do a good job? What is their impact on our work? Is there a cheaper
way to do this? What is the value-added? The last is, What is the effect on our
profitability; is there an ROI?
The literature of training evaluation provides a framework to answer these

questions and has addressed many of the current issues for trainers. Some time
ago Donald Kirkpatrick (1994) provided a framework of four levels of

evaluation. The ftrst level being trained perception; the second, learning; the
third, performance; and the fourth, impact. Jack Phillips (1996) has written that
evaluation must go beyond level four and focus on real measurement of ROI.
Dana Gaines Robinson, whose writing (1989) focused many trainers on impact
now (1994) directs trainers to become performance consultants and de-
emphasize training as an intervention. Robert Brinkerhoff (1987) uses data
gathering and evaluation to make the training function more customer focused
and to practice continuous improvement. His work emphasizes the evaluation
of learning as training's only direct result.
However, these practices do not, in the author's opinion, represent a choice.

They do not represent a menu from which evaluators can chose. Evaluators
must respond to the new requirements by implementing all these concepts and
evaluating at multiple levels. Training must be customer focused and
committed to continuous improvement. To do this evaluators must collect data
that can be used to improve training in terms of customer expectations and
satisfaction. In the new organization, the deftnition of customer training has
expanded from the trainee in the classroom to include the trainee's unit
manager, the unit, and the organization. Level I data (from Kirkpatrick's
model) is still needed to get feedback on the trainee's perceptions of the
experience. This data can directly give information for improvement and focus
on satisfying the most visible customer. However, Kirkpatrick's Level III
evaluation, which measures the performance of the trainee, is important to both
trainees and their business unit. The impact on the business problem being
addressed is probably the most important data to the unit and organization. It
tells us not only if the training was effective but if the training provided was an
appropriate response to the business problem being addressed. Return on
investment can be viewed as the ultimate impact, or an additional level of
evaluation. Either way it is an all-important evaluation level in today's business
environment, for it addresses a question that is inevitable.
Kirkpatrick's Level II evaluation (learning) has probably become less

important in today's business environment. This is not because there isn't need
to learn; it is because the business environment is an applied one. In the
business context, learning usually means application or performance. When a
new software system is to be learned, it means we can use it. This separates the
business environment from school. In school the only task is to demonstrate
that you know something as it is deftned by the school. In the business
environment, you need to do something with this knowledge. The new business
environment is all about performance.
Level III measures performance, or the application of knowledge, and Level

IV, the impact on the business problem. These are the levels, along with ROI,
upon which today's organizations have focused. Additionally, most evaluators
have done a poor job with Level II evaluation, using tests or certifications as the
measurement. However, paper-and-pencil tests are not a substitute for
performing the task for which one is being trained (authentic assessment).
Authentic assessment is being used more often in education (Moon, 1997).
Authentic assessment looks more like Level III evaluation.
Data show that evaluation is predominately performed at Level I. In ASTD's

Benchmarking report (Bassi, Benson & Cheney, 1997, p. 3), it was reported
that 94 percent of courses were evaluated at Level I. The percentages then
descend, 34 percent at Level II, 13 percent at Level III, and 3 percent at level
IV. The differences are not as great in a survey reported by Linkage, Inc.,
(AME Newsletter 1996). They report that only 42 percent of responding
companies perform Level I evaluation, 43 percent Level II, 34 percent Level III,
and 16 percent Level IV. It is apparent that evaluation of any kind is not
universally done, even though there is a need to justify training, and it is
through systematic evaluation that we improve our practice.
Integrating the concepts of the cited authors, there are probably three kinds
of evaluation data needed today. They measure
• Customer satisfaction
• Impact on the business problem
• Return on investment
The tasks for training are to satisfy customers and meet their expectations,
provide solutions to business problems with which they are presented, and
contribute to the profitability and the mission of the company. Everything that
training does should contribute to one of these tasks. These are the same tasks
of all other business units in a corporation. There should be no more
discussions about training being more like a business. If the training unit is in a
corporation or is providing training to customers, it is part of the business. The

current move to align training more with the corporate strategy is to admit the
need for a correction.
Evaluative data should measure training's success at completing its

business tasks. Individual performance, as in Kirkpatrick's Levels II and III,
are not goals in and of themselves but are a focus only when they are solutions
to a business problem or integral to customer satisfaction. In the business
environment, learning and individual performance are important as they are
applied and promote customer satisfaction, solutions to business problems, or
contribute to profitability.
Customer Satisfaction
The evaluation of customer satisfaction is a little more complex than was once
thought. The definition of customer has been expanded. Not only are the
employees participating in the training of customers, but in this new
organizational configuration, the business unit with the problem and the
business unit manager are also customers.
We are measuring perception of quality, convenience, and value as well as

whether the customer's expectations were met. This information is crucial to
continuous improvement. It is important to note that if the customer's
expectation includes learning or improved individual performance, then
achievement of these goals is part of customer satisfaction. Thus, it needs to be
evaluated. This is where the individualistic notion of learning comes into this
schema. It is important when it is part of the customer's expectation or as a
prerequisite for performance. This notion of customer satisfaction has expanded
the definition of both who is identified as the customer and the elements of
satisfaction.
Business Impact
This level of evaluation is the one that is usually the most important to the
business unit manager. It answers the question, Did the training make a
positive difference in the business problem I have? Doing this level of
evaluation requires working with the business unit manager to identify the
business problem up front. The emphasis is on identifying the problem, not on

what needs to be taught, delivery, or trainees to be serviced. All of these things
are designed to address the business problem. It also makes trainers think of
training as one problem-solving intervention among many.
The steps for performing this level of evaluation are
1. Identify the business problem.

2. Develop an evaluation strategy.
3. Collect baseline data on the problem.
4. Design training intervention.
5. Deliver training.
6. Collect outcome data on the problem.
Return on Investment
Training has no choice but to demonstrate its effect on corporate profitability in

today's organization. This is true of every unit in the organization. While it was
once considered impossible to measure the ROI of training, many organizations
are now doing it. The knowledge to do it is readily available to the practitioner.
However, it is still difficult, long term, and complex. Discussions with cost
accountants are helpful. However, it is possible that once you begin to measure
ROI, your process will improve. The advice of many in the profession is to
demonstrate your contribution, because the stereotype about training is not
necessarily that it makes a contribution.
Conclusion
Environmental and organizational changes are putting new demands on

training. Training must change to support new organizational structures within
a complex and changing business and social enviroment. Training is changing
in who we serve, how we serve them, and why we serve them.
Forces of change in the environment-the global, knowledge-based

economy, the information explosion, advances in technology, changes in the
workforce, and evolving organizational architecture-have resulted in pressure
on training organizations to justify their costs in terms of improvement in
individual and organizational performances. There is pressure to be customer
focused, high quality, just in time, and cost effective. Training organizations
are required to facilitate the changing roles of an aging, international, and
culturally diverse work force. Training is also being asked to facilitate the
application and creation of new knowledge and the integration of ever-
changing technology in work processes.
These factors have resulted in new roles for training organizations, such as
change conSUltants, vendor managers, and information synthesizers. New
organizational structures, which are smaller, flexible and have permeable
boundaries, with external vendors have been adopted. Training is seen as one of
many performance enhancing interventions. There is movement away from the
classroom toward less costly, more decentralized delivery, including
electronically distributed delivery.
The implications are that training evaluation has had to focus on

achievement of strategic initiatives, organizational performance, and return on
investment on training expenditures. Training organizations have had to
demonstrate they are well-managed organizations that make decisions based on
results, costs, and other organizational business concerns. The definition of
customers has been expanded from the employees in the classroom to include
their managers and business units, and evaluation has had to expand from
employee-in-the-classroom satisfaction to individual and organizational
performance. This has meant not only a change in the level of evaluation but in
evaluation becoming multileveled. While these evaluative processes should be
made easier by the availability of information and technology, they have often
added levels of complexity to them.
To meet these challenges, evaluators must move with the environmental and
organizational changes. These changes create a need to look at evaluation
differently. Evaluation of training must be multilevel, customer focused, and
support continuous improvement of training. Evaluation should demonstrate its
affect on a targeted business problem and ROI. The challenge is to provide
meaningful data that enables you to assess customer satisfaction, business

impact, and return on investment.
References
AME Newsletter (1996). Lexington, MA: Linkage, Inc.
Bassi, L.J., Benson G., & Cheney, S. (1997). The top ten trends. Training and
Development Journal. 50 (11), 28- 42.
Brinkerhoff, R. (1987). Achieving results from training. San Francisco, CA: Jossey-
Bass.
Gaines Robinson, D., & Robinson, J. (1994). Performance consulting: Moving beyond
training. San Francisco, CA: Jossey-Bass.
Gaines Robinson, D., & Robinson, J. (1989). Training for impact. San Francisco, CA:
Jossey-Bass.
Kirkpatrick, D. (1994). Evaluating training programs: The four levels. San Francisco,
CA: Berrett-Koehler.
Phillips, J. (1996). Accountabilitv in human resource management. Houston, TX: Gulf

Publishing Company.
Schon, D. (1983). The reflective practitioner. New York: Basic Books.
U. S. Bureau of the Census. (1993). Levels of access and use of computers 1984. 1989.
and 1993. Washington, DC: U.S. Government Printing Office.
U.S. Senate Select Committee on Aging. (1988). Aging America: Trends and
projections. 1987-88. Washington, DC: U.S. Department of Health and Human
Services.
Wheatley, M. 1. (1991). Leadership and the new science: Learning about organization.
San Francisco, CA: Berrett-Koehler.
Wenger, E. (1997). Communities of practice: Learning meaning. and identity.

Cambridge University Press.
About the Author
Stephen M. Brown is Dean of University College and Professor of Adult

Education at Sacred Heart University in Fairfield, Connecticut. He is the co-
editor of Evaluating Corporate Training and the co-author of Outsourcing
Human Resources. He is the co-founder and co-chair of the Assessment,
Measurement, and Evaluation Conference. He also maintains a consulting
practice. Dr. Brown received a Bachelor's degree from the University of
Massachusetts, Dartmouth; a Master's degree from the University of Rhode
Island; and a Doctorate from Boston University.
2 ORGANIZATIONAL STRATEGY
AND TRAINING EVALUATION
Carol Ann Moore and Constance J. Seidner
Introduction
Learning enables change, and it is becoming a necessary part of everyone's job.

Today's work force must be able to adapt quickly to new business directions
and ways of working, to new information, and to new projects and work groups.
Timely action can make the difference between success and failure in bringing
new products and services to market. To stimulate learning and motivate
needed changes, corporations have invested in training programs that are
designed to impact both organizational and individual performance.
The focus of this chapter is on how training may be used to support

organizational and individual learning in response to changing business
demands, and particularly how training evaluation can contribute to the
efficacy of the change initiative. Evaluation involves definition of success
criteria and the collection and analysis of information to judge training success,
followed by recommendations for improvement. Such recommendations can
reach beyond the training activity itself to affect change strategies and direct
organizational improvements. Both intended and unintended outcomes of
training may be assessed for their value, rutd sometimes the least expected
outcomes turn out to have the most significant impact on performance.
20 Organizational Strategy and Training Evaluation
Training from a Strategic Perspective
Training offers a means for realizing organizational change required for

business success. Through training, the workforce can be introduced to new
ideas and new perspectives, and learn new skills, attitudes, and ways of
behaving. To be effective, however, training must be aligned with
organizational change initiatives and be understood in a strategic context.
The Strategic Context
From a strategic perspective, both the corporation and each business unit must
make decisions about how to invest limited resources to maximize gain over
short-term and long-range horizons. Senior managers set goals for performance
in such areas as financial results, customer service, new product development,
supply chain management, technology transfer, and workforce productivity.
While these goals frame expectations for performance, they do not necessarily
show how they can be achieved or how to balance conflicting goals. This is
particularly true when intangible assets of a business may be important in
reaching goals. Such assets include the effectiveness of internal business
processes, and learning and growth by individuals and by the organization as a
whole (Kaplan & Norton, 1996a, 1996b).
Kaplan and Norton (1992, 1996a, 1996b) developed the Balanced Scorecard
as a framework for translating strategy into operational terms and actions, and
for measuring business performance. They provide a way of thinking about
strategic implementation that considers both traditional measures such as
financial and customer/market performance, and less tangible assets related to
internal business processes and organizational learning and growth. As Figure
2-1 shows, the process begins with senior managers who set corporate vision,
translate strategies into overall business objectives, and communicate these
strategic objectives to the business units. Senior managers must ensure that
strategic initiatives are aligned across business units and overall business
performance targets are clearly understood. Successful communication of
corporate goals and new strategic initiatives involves educating managers and
employees in the business units, and obtaining ongoing feedback on how
strategy is being implemented and on how organizations are changing to meet
strategic requirements.
Clarifying and
Translating the
Vision and Strategy
• Clarifying the vision
• Gaining consensus
Strategic Feedback
Communicating
and Learning
and Linking
Balanced • Articulating the shared
• Communicating and
vision
educating Scorecard • Supplying strategic
• Setting goals
feedback
• Linking rewards to
• Facilitating strategy
performance measures
review and leaming
Planning and
Target Setting
• Setting targets
• Aligning strategic
initiatives
• Allocating resources
• Establishing
milestones
Figure 2-1. The balanced scorecard as a framework for action
Reprinted by permission of Harvard Business Review. An Exhibit From "Using the Balanced
Scorecard as a Strategic Management System," by Robert S. Kaplan and David P. Norton,
(January-February 1996): 77. Copyright ©1996 by the President and Fellows of Harvard
College, all rights reserved.
Individual business units then translate broad business objectives into a linked
set of measures and activities for achieving them within the unit.
The Balanced Scorecard should translate a business unit's mission

and strategy into tangible objectives and measures. The measures
represent a balance between external measures for shareholders
and customers, and internal measures of critical business
processes, innovation, and learning and growth. The measures are
balanced between the outcome measures-the results from past
efforts-and the measures that drive future performance. And the

scorecard is balanced between objective, easily quantified
outcome measures and subjective, somewhat judgmental
performance drivers of the outcome measures. (Kaplan & Norton,
1996b, p 10.)
Reprinted by permission of Harvard Business School Press. From The Balanced
Scorecard by Robert S. Kaplan and David P. Norton. Boston, MA 1996, P 10.
Copyright © 1996 by the President and Fellows of Harvard College, all rights reserved.
Throughout, Kaplan and Norton encourage managers to nurture strategic

feedback and learning.
The feedback and learning dimension of the balanced scorecard offers one of
the best perspectives for understanding how learning and workforce
development, including training and training evaluation, contribute to
realization of business strategy. Without sufficient infrastructure and support
for individual and organizational growth, achievement of other business
objectives becomes unlikely. Success in any area of strategic importance is
highly dependent upon the capabilities of the organizations involved and on the
ability of organizations and employees to learn and change.
Successful strategies for implementing change are not uniform across

companies but require careful analysis of the particular organization for the
most effective approach. When people need to work in novel ways within and
across traditional organizational boundaries, basic organizational assumptions
and mindsets underlying behavior may need to change, too. Attitudes, values,
and behaviors that are deeply rooted in organizational tradition and individual
self-images do not change easily. Where the need for developing or re-orienting
the workforce is great, corporations may tum to training as a means for
accomplishing change.
Training as a Strategic Lever in Organizational Change
As a strategic lever for implementing organizational change, training offers a

number of benefits (Gherson & Moore, 1987). First, training is accomplished
in-house, changing the ways organizations and individuals function without
significant external visibility. At the same time, from an internal perspective, it
is a highly visible investment in existing staff, and it may be received more
positively by the workforce than other approaches to change that may be more
wrenching. In addition, while training is gradual, incrementally changing the
ways in which individuals, teams, and organizations behave, some effects are
felt immediately. As soon as the first training is delivered, some of the
workforce begins to develop new attitudes, behaviors, and skills, and to
incorporate these learnings into their work. Finally, a special advantage of
training is that it can be modified while in progress, if it is not as effective as
desired. This allows for incremental adjustments and continuous improvement
of training, even as strategic change is in progress.
Strategic Training Initiatives: Some Examples
Training was a key component of the quality improvement program that turned
Xerox around in the mid 1980s. The Xerox change strategy involved driving a
Leadership Through Quality initiative across the organization. Successful
implementation of the initiative required sweeping changes in management
practices as well as cultural norms. To enable these changes, quality training
was cascaded throughout the organization, beginning with the CEO, David
Kearns, and reaching all employees in the corporation. Managers who received
the training then delivered it to the employees in their respective work groups.
Anyone in a managerial or supervisory position passed through the training
twice, first as a trainee, and then as a trainer. This top down approach to
training was seen as critical in successfully implementing the turnaround
strategy (Kearns and Nadler, 1992).
Corporate quality education, in general, has gained visibility in the 1990s

and brought a unifying theme to corporate training activities. Corporate quality
universities (Meister, 1994) have broader goals than the traditional training
functions, which have typically focused on developing management skills and
the technical skills needed for product manufacture. Corporate universities seek
to ensure that employees, customers, and suppliers understand the company's
values and quality vision, and develop the competencies needed for success. For
many companies who have chosen to unify corporate educational efforts under
this umbrella, the corporate university is a school without walls. It may
encompass a vast range of education and training supported by the corporation.
Education may take place over the Intranet, at a local university, on the job, or
in classrooms. In some companies, it is also designed to promote continuous
and lifelong learning.
Interviewing managers at 30 companies, Meister (1994) found the scope and

mission of corporate quality universities to include 1) building competency-
based curricula; 2) developing a shared vision of the company, its values and
culture; 3) offering training to the entire supply chain for the company from
suppliers to customers; and 4) experimenting with new approaches to learning
required competencies. While some efforts are targeted specifically at sharing
new visions and initiatives throughout the corporation and supply chain, many
develop curricula addressing core workplace competencies that all employees
should demonstrate. Most benefit from economies of scale, shared use of
resources, common information systems, and standardization of such processes
as course registration and evaluation.
Applied Materials offers a good example of a corporate university. Currently

the world's largest producer of wafer fabrication systems and services for the
semiconductor industry, Applied Materials is competing in a global and
growing industry in which technological leadership is critical to success. At the
same time, the company's workforce expanded 70% from 1991 to 1996,
presenting a clear challenge to human resources development and training to
assimilate new employees, to assure competitive competency, and to create a
shared identity across the workforce (Applied Global University, 1996).
Integrating training under one framework consisting of eight institutes,

Applied Global University (AGU) brings together new and existing courseware
and tools to build core workforce competencies and shared values across the
company. Each institute focuses on certain job- and task-specific competencies
needed to solve daily business challenges. Competencies include management
skills, professional skills (e.g., safety, quality), skills in software applications,
job-specific skills in manufacturing, and technical skills in the operation and
maintenance of Applied Materials' products. AGU also helps to integrate new
college graduates into the company, and develops multimedia-based training
and knowledge transfer tools for the corporation.
Evaluation is an important element of the AGU Management Institute,

which focuses on developing and enhancing management skills across the
organization, from department supervisors to executive management.
Evaluation supports the design of management training at all points from needs
assessment to ongoing adjustments to improve program quality. (See Figure 2-
2.) The evaluation process is based on Kirkpatrick's (1994) four levels of
evaluation, now viewed as a standard in industry training (Moller and Mallin,

1996). Using a variety of methods during and after training, the Institute
collects information from participants and managers for each of the four levels:
1) quality (effectiveness and efficiency) of the training experience, 2)
knowledge and skills learned, 3) use of learned knowledge and skills in the
workplace, and 4) business impact of using new knowledge and skills.
Interpretation of this information is made in the context of ongoing needs
assessments, input from customers, best practices, program objectives, and the
broader business goals, strategies, and values of Applied Materials.
AGU Management Institute

A Simple Instructional Design Model
1. determine ----l~.. 2. determine - - - . 3. develop

needs learning content
objectives / (build or buy)
~
,------------,
4. obtain
instructors : feedback/evaluation :
I
/
5. pilot ---~... 6. adjustments ..
----l~ 7. run course
Figure 2-2. Role of evaluation in management development and

training at Applied Materials, Inc.
Printed by pennission of Applied Materials University, Applied Materials, Inc., 1997.
With the development of corporate universities, such as Applied Global

University, the role of training evaluation grows. Effective practices are shared,
and conclusions about the impact of individual training programs may benefit
broader workforce development initiatives.
Need to Redefine Traditional Training
While training may playa substantial role in implementing change, traditional

training departments have typically not. been major players in such efforts.
Gayeski (1996), Brinkerhoff & Gill (1994), Gherson & Moore (1987), and
others concerned with the nature of industry training have pointed to the
narrow definition often given for training. When training departments are
isolated from key decision-making forums, valuable training and evaluation
expertise may be "left on the table" as companies drive strategic organizational
change and learning separately from other training efforts.
Two trends have the potential to foster a strategic partnership between the
training function and corporate leaders. First, corporate leaders are increasingly
realizing that the success of change initiatives depends on the capabilities and
commitment of the workforce. Kantor and her colleagues (Kantor, Stein & Jick,
1992) identify training as an important element in the long march toward
significant and lasting change. They suggest that when companies do not invest
sufficient time and resources to upgrade people's skills, change programs can
be undercut or even destroyed.
Second, traditional training departments are rethinking their mission and

future (Gayeski, 1996). Examples of new directions can be found throughout
the current literature. Learning is being redefined to encompass organizational
as well as individual learning (Senge, 1990). Training is increasingly being
viewed as a partnership and a process, rather than a series of discrete events
(Brinkerhoff & Gill, 1994). There is increased emphasis on defining training
objectives that are clearly rooted in the performance requirements of the
organization (Robinson & Robinson, 1995). Because of the rapid pace of
change, these performance requirements may need to be reassessed frequently
and modified. This argues for establishing training processes that are integrated
into normal business practice, with built-in feedback mechanisms, much like
those that characterize quality management systems (Brinkerhoff, 1997).
The best outcome as these two trends come together would be the emergence
of a stronger, more organic relationship between management and training.
The more closely and actively trainers are aligned with change strategists, the
more effective they will be in providing the critical link between the strategists
and the recipients of the change initiatives, the people in the organization on
whom the future depends.
Strategic Value of Evaluation
Evaluation has the potential to help leverage the transformation of training

(Brinkerhoff, 1989) and in so doing enhance the contribution of training to
organizational change. Educational evaluation grows out of a rich tradition of
social science inquiry and objective, empirical research. While in most cases,
its application in industry has been confined to assessing the outcomes of
specific training events, its potential for broader decision support is large.
Supporting Change Strategies
Building on social science methods and tools for inquiry, educational

evaluation brings to a change initiative 1) objective information about the
progress of a program and its value in relation to predetermined criteria, 2) an
outcome-based perspective on workforce development, 3) systematic methods of
inquiry where feedback and assessment are integral to the process, and 4)
techniques for identifying factors that may facilitate or inhibit application of
new learnings. Several themes of organizational change reinforce ways in
which evaluation can support change initiatives and benefit the organization as
a whole.
Establishing Collaborative, Cross-Organizational Goals
Implementing change is a complex and dynamic process that usually calls for
extensive collaboration across functions and organizations. The Big 3 Model
proposed by Rosabeth Moss Kantor and her colleagues (Kantor, Stein & Jick,
1992) identifies three action roles in the process: change strategists,
implementers, and recipients. People who play these roles need to work
together, taking a broad systems view that recognizes the complex
interrelationships within organizations. They need to create the vision and
provide the infrastructure and business practices required to enable change.
Implementers, such as educators and evaluators, can be more effective when

they work directly with change strategists and managers who establish the
goals and criteria for organizational change and workforce performance.
Educators contribute an understanding of human behavior that can help to set
realistic and attainable goals. Evaluators can help to define valid and useful
"leading" and "lagging" measures (Kaplan & Norton, 1996a) to both drive and
assess individual and collective movement toward strategic organizational goals
When trainers and evaluators participate in strategic organizational

planning, they also are better positioned to help their traditional clients in
business units see themselves within the context of the larger organization and
define training objectives that are in line with targeted business goals.
Evaluators, in particular, can encourage collaborative partnerships between
training departments and business units by focusing attention on the results of
training (Gill, 1989). The process of clearly defining desired results and criteria
for success leads to valuable exchange among change strategists, educators, and
managers, and builds management involvement and commitment to the change
process.
Linking Training Performance Goals to Organizational Goals
For the strategic value of training to be realized, individual performance

objectives must be explicitly linked to business performance goals. The current
focus within training departments on performance consulting (Robinson &
Robinson, 1995) should help drive the design of interventions that will achieve
the desired business impact. This focus is complemented by increased
awareness within the broader human resources community that strategically
defined competency models can contribute to organizational change.
The systems model for competency-based performance improvement

proposed by Dubois (1993) outlines the linkages between strategic
organizational goals, competency models, and interventions designed to
facilitate acquisition of those competencies. Evaluation is an integral part of the
model, providing feedback to all levels. (See Figure 2-3.) In real-life execution
of this model, the challenge lies in designing effective training interventions
(Steps 3 and 4) that are clearly linked to identified competency requirements
(Step 2). Evaluators can help their training colleagues explicitly establish this
link by asking hard questions about what the training is being designed to
accomplish. Is it targeted toward fostering the behaviors and mindsets that are
strategic to the company's direction? Will strategic training translate into
changes in the individual and collective competencies of the workforce?
Training that isn't producing any discernible business impact may not be
targeting the "differentiating" behaviors (Ennis, 1997) needed for company
success.
External Environment
Organizational Environment
.--------.
Strategic Goals,
Step 1 Objectives, and
Business Plans
Step 2
Step 3
Step 4
StepS
1_ _ _ _ _ _ - - - - - - - - - - - - - - - - - - - - - - - - - -
Figure 2-3. Strategic Systems Model for competency-based performance

improvement in organizations
Reprinted from Competency-Based Peiformance Improvement: A Strategy for

Organizational Change, written by David D. Dubois, Ph.D., copyright 1993. Reprinted by
permission of the publisher, HRD Press, Amherst, MA (412) 253-3488.
Evaluators can support and leverage the efforts of performance consultants

by forging a chain of evidence from organizational strategy, to individual
performance requirements, to training objectives to training results that link
back to organizational goals. By clearly defining the conceptual linkages,
identifying indicators that make those linkages real, and generating data that
confirm a causal relationship, evaluators are providing tangible evidence of the
value of training in achieving business goals.
The design of effective interventions is often made more difficult by the

dynamic nature of change and pressures to evolve rapidly the skills of the
workforce. Training may have to be developed quickly without the luxury of
significant redesign. This is particularly important when training delivery
systems are an integral part of the workplace, or are incorporated into
performance support systems that people use on a daily basis (Touger, 1997).
With rapid turnaround and technology-based delivery systems, the use of
formative evaluation and rapid prototyping become paramount. Both of these
methodologies, aimed at "getting it right the first time," depend heavily on
evaluation data from iterative reviews of prototype materials during the
development cycle.
Generating Information and Providing Feedback
Information and feedback are central to successful implementation of strategic

organizational initiatives. However, getting the right information at the right
time, and using it wisely is not as easy as it may seem at first.
Valid Criteria
A key role of the evaluator is to provide objective information and feedback of

sufficient quality to enable change strategists and performance consultants to
draw supportable conclusions. Data collection should be iterative, utilizing a
wide range of appropriate measurement strategies and methods: surveys,
interviews, focus groups, observation, team performance indicators and
business data. Process methodologies and indicators (Stufflebeam, 1983;
Brinkerhoff, 1987) can provide insight into the contextual variables that
influence the application of new learnings back on the job.
The evaluator must make sure that what is measured is not whatever is
easiest to measure but is a true indicator of the performance objectives for the
intervention. Meaningful evaluation criteria and legitimate measures of
performance must be established early in the change effort, for spurious
measures and invalid data in the midst of implementing a change initiative can
have disastrous and costly effects.
Valid criteria are important for another reason. Evaluation is an

intervention. Like business metrics, evaluation criteria drive behavior. If the
criteria are not good indicators of the intended behaviors, the evaluation effort
itself may encourage counterproductive behaviors. Meyer's (1994) discussion of
measures that help teams excel illustrates this point. If cross-functional
collaboration and communication are important to successful teamwork, then
measures of team performance should include process measures. One such
measure might be reduced cycle time, a goal that requires cross-functional
collaboration. If instead, indicators of successful team performance are solely
results oriented, teams might be encouraged to spend their time trying to
produce immediate, visible results while minimizing time devoted to
developing the cross-functional communication and support mechanisms
needed for successful long term team performance.
Messages About Learning and Performance
Evaluation practices also send subtle messages to employees about the larger
organization and their place in it. These messages need to be consistent with
the values and assumptions underlying the change effort. For example, one of
the assumptions underlying the theory of learning organizations is that
individuals are self-directed, that they want to feel competent and to achieve
personal mastery (Senge, 1990). Practices that encourage self-assessment
against personal goals are therefore compatible with the values of a learning
organization. Evaluators in these organizations might encourage people to
document what they do and reflect on outcomes of their behaviors, so that they
can systematically and continuously learn from their experience.
On the other hand, evaluation techniques that involve external judgments

about worth and value, such as tests and 360 degree skills assessment, are less
compatible with norms and values that learning organizations seek to promote
(Senge, 1995). This is not to say that tests should never be used. In some
organizations, testing is an integral and valued element of the work and reward
structure, for example, in organizations with certification programs. The point
is to align evaluation methods with the behaviors and values the organization
seeks to establish and reinforce.
Use of Evaluation Results
The way organizations use evaluation data also sends subtle messages that may
affect behavior, and consequently the validity of the data itself. For example,
using student feedback to improve the quality of training sends a message to
instructors and participants that training is important, and that their honest
feedback is valued and used. When the same data are used to determine
instructor salary and promotions, a different message is sent, promoting
different behaviors, particularly on the part of instructors. For their own
livelihood, instructors will be motivated to influence students to rate their
courses highly. As a result, the quality of the training may be overrated, and
needed improvements will go undetected. Evaluators have the opportunity to
encourage decision makers to think about the subtle and not-so-subtle messages
they send to employees by the strategies they employ, the data they collect, and
the use that is made of that data. Those messages should be consistent with the
behaviors and attitudes the organization intends to promote.
Overall, evaluation supports and affects change initiatives in a number of

ways. In this section, we have discussed how evaluation processes can facilitate
cross-functional collaboration, focus on linking training performance goals
with organizational goals, and provide information and feedback integral to
change initiatives. The following section suggests an even broader, potentially
more strategic contribution that evaluation can make to organizational change.
Evaluation and Double-Loop Learning
The role of evaluation and the power of the analytic approaches offered by
evaluators can extend beyond assessment of the value of specific training
programs during change. Evaluation processes offer insight into the nature and
functioning of the organization undergoing change. In an objective manner,
evaluation techniques and results can describe the true opinion of different
stakeholders, performance improvements of different participants, and the
degree to which desired business outcomes are reached. Such information can
lead to a reassessment not only of training and training strategies but also of
change strategies implemented in part through training programs. Thus,
evaluation can contribute to double-loop learning (Argyris, 1990; Argyris &
Schon, 1996) as much or more than to single-loop learning in organizational
change. (See Figure 2-4.)
Single-loop learning refers to changes in assumptions and strategies for

action to solve organizational problems. When circumstances require a
reengineering of current processes to attain better organizational functioning,
most approaches yield changes to the current processes. This involves
identifying and solving problems within a given environment, taking actions to
improve organizational processes and practices, evaluating the results, and, if
successful, extending learnings from this experience to other organizations.
Argyris refers to such problem solving as single-loop learning. Sometimes,
however, no matter how elegantly the current processes are redesigned, the
underlying causes are never touched and problems persist.
..... Goveming
Values f-r+ Actions r--- IItIsmatc:h
or Errors 1--..
Single-Loop Learning
Double-Loop Learning
Figure 2-4. Single and double loop learning
Overcoming Organizational Defenses by Argyris, © 1990. Reprinted by permission of Prentice-

Hall, Inc., Upper Saddle River, NJ.
Double-loop learning requires stepping back and looking deeper for the
foundation of problems, particularly underlying values, norms, and beliefs
about how the world works (theory-in-use) that drive strategies and actions.
Such theories-in-use may be unspoken and not recognized as important to
desired organizational change. Decision makers, managers, and consultants

may not realize that the ways in which they are defining and thinking about
problems may hinder successful change. When problem-solving strategies don't
succeed, it is easy for people to become defensive and to reason narrowly about
solutions. Substantive organizational change, however, requires participants to
inquire into underlying sources of problems, to seek information and feedback,
to discuss alternative explanations, and to reflect on how individual and
collective behavior contribute to existing circumstances.
Consider the development of Motorola University, described by Wiggenhorn

(1990), as an example of double-loop learning as well as single-loop learning.
Responding to competitive pressures, Motorola began a long-term investment
in workforce development, relying on training as a major vehicle for
implementing change. What followed was a series of educational initiatives,
each endorsed at the highest level of the corporation and each moving Motorola
closer to today's corporate university. Wiggenhorn (1990) refers to these efforts
as an odyssey or expedition leading to eye-opening discoveries about workforce
skills and abilities, the way in which work was actually accomplished, and
resistance to change. With each new initiative, the strategy itself was changed
as the problem to be solved was redefmed based on prior learnings.
At first, Motorola management believed that instruction in new tools and

technology, and teamwork would be sufficient to create a competitive
workforce. Disappointing results were attributed to lack of on-the-job support
for new learning by management, and led to an intensive course for managers
in general business management. When this approach was not as successful as
desired, focus shifted to educating all employees in quality processes, so that all
would see a need for change. Motorola Training and Education Center was
established with five-year goals to expand participative management and help
to improve product quality ten times. A major evaluation study showed that
successful business results were realized only at plants where the quality
curriculum was fully taught, implemented, and supported. At those plants, the
return on investment was impressive: $33 for every dollar invested, including
the wages of workers while in class. Other plants broke even or even showed a
negative return on investment. As a result, additional actions were taken to
involve managers and employees. New cross-functional seminars and
management education were added to the curriculum, and recruiting practices
and reward systems for employees were reviewed. Key topics discussed in
executive seminars were integrated into other courses to develop common

understandings throughout the corporation.
A clear turning point carne, however, when plans for a new cellular
manufacturing facility motivated inquiry into the math skills of the local
Motorola workforce. The selected workforce had experience with a similar but
older technology, had improved quality tenfold, and were continuing to
improve. The skills assessment was to determine whether workers needed any
further training to transition to production of the new cellular products. The
results were surprising: 60% of the local workforce had difficulty with simple
arithmetic. Even broader needs for basic skills education were uncovered
through the resulting math classes. Not only were employees having trouble
with math, but also many were having difficulty reading and, for some, even
comprehending English. The ramifications of illiteracy for job performance
were staggering and demanded attention. Clearly changing the theory-in-use
about the workforce, this discovery completely redefined workforce
development needs and training strategy corporatewide. Senior management
decided to invest in basic skills education by turning to local community
colleges for help, to review other areas of technical and business skills that they
had assumed the workforce to have, and, eventually, to build extensive
educational partnerships with schools and colleges, and to establish Motorola
University .
While single-loop learning leads to changes in actions to correct or

improve a process or situation, double-loop learning questions the definition of
the problem itself and the underlying values, norms, and assumptions that
sustain the status quo. Both types of learning are based on inquiry into why
results of actions are different from expectations. Two evaluation studies were
highlighted by Wiggenhorn (1990). However, ongoing evaluation is implicit.
At each phase, management received feedback on the outcomes of training, and
at times received new revelations about the true nature of workforce
competencies and attitudes, and the programs needed to develop a competitive
workforce. The emergence of Motorola University with growing insight into
change requirements was quite dramatic. Although change is usually
incremental and the effects of inquiry more modest, it is difficult to predict
when an apparently routine inquiry will yield information that reframes an
entire change process. When evaluation processes are designed to collect
process and outcome information, within and across business units, at all
evaluation levels (Kirkpatrick, 1994; Phillips, 1997), a reservoir of
organizational information begins to grow, providing data to gauge

organizational health, to assess the impact of change over time, and to query
basic assumptions in designing change strategies.
Summary
In the competitive global markets today, businesses must be ready to respond

rapidly to market pressures and to adopt new ways of functioning. For success,
whole organizations must remain on the cutting-edge, ready to assimilate each
new change. Individuals, teams, and entire organizations need to be learning as
well as doing most of the time. We have discussed how training may be used to
support organizational and individual learning in response to changing
business demands, and particularly how training evaluation can contribute to
the efficacy of the change initiative.
To successfully implement needed change, training must be aligned with

strategic organizational change initiatives and be understood in a strategic
context. The strategic feedback and learning dimension of the Balanced
Scorecard framework (Kaplan & Norton, 1992, 1996a, 1996b) offers insight
into how learning and workforce development, including training and training
evaluation, contribute to realization of business strategy. When used as a
strategic management tool, the Balanced Scorecard leads management to
clarify corporate vision and strategies, set goals, and align strategic initiatives,
and requires business objectives and metrics that will pull the organization
toward its overall strategy and vision. Strategic learning and feedback are
central in closing the loop between strategy and action. In this framework,
training becomes a lever for implementing strategy, and evaluation a source of
feedback on the progress of learning.
Evaluation is an established process designed to ascertain the value of a

training endeavor. However, the role of evaluation and the power of the
analytic approaches offered by evaluators extends beyond the narrow
assessment of the value of specific training programs. Evaluation processes can
offer insight into the nature and functioning of the organization undergoing
change. The strategic value of evaluation for organizational change is discussed
in establishing collaborative, cross-organization goals; developing a systems
perspective linking training goals to organizational goals; and generating
organizational information and providing feedback.
Specifically, evaluators can help to set realistic goals for workforce

performance that are aligned with business strategy, and define valid and useful
measures of their attainment. Evaluators can help leverage the strategic value
of training by ensuring that outcomes are aligned with the performance goals
required for business success and provide iterative, objective assessments-
from prototype review through evidence of business application-that improve
the quality of training and demonstrate its effectiveness.
Looking beyond training, evaluation techniques and results offer insight and
feedback on the efficacy of broader change initiatives. Evaluation can document
the true opinion of different stakeholders, performance improvements of
different participants, and the degree to which desired business outcomes are
reached. Such information can lead to a reassessment not only of training and
training strategies, but also of change strategies implemented in part through
training programs. Thus, evaluation can contribute to double-loop learning-
learning about the underlying values, norms, and assumptions that drive
strategies and actions (Argyris 1990, Argyris & Schon, 1996)-as well as
learning about how to correct and improve practices and reengineer processes.
The challenge posed here is to look beyond the very narrow traditional
images of training evaluation to recognize the larger strategic value that
evaluation processes offer the corporation undergoing organizational change.
The decision-support potential of evaluation during implementation of change
initiatives cannot be denied and should not be overlooked.
Acknowledgments
We thank Janet Potter, Manager of the Management Institute, Applied Global

University, Applied Materials, Inc., for sharing materials on the Institute and
on AGU, and for obtaining necessary permission to include information on
Applied Global University in this chapter.
References
Applied Global University. (1996). Santa Clara, CA: Applied Materials, Inc.
Argyris, C. (1990). Overcoming organizational defenses: Facilitating organizational

learning. Boston: Allyn and Bacon.
Argyris, C., & Schon, D. A. (1996). Organizational learning II. Themy. method. and
practice. Reading, MA: Addison-Wesley.
Brinkerhoff, R. O. (1987). Achieving results from training. San Francisco: Jossey-Bass.
Brinkerhoff, R. O. (1989). Using evaluation to transform training. In R. 0. Brinkerhoff

(ed.), Evaluating training programs in business and industry. San Francisco: Jossey-
Bass.
Brinkerhoff, R. 0., & Gill, S. 1. (1994). The learning alliance: Svstems thinking in
human resource development. San Francisco: Jossey-Bass.
Brinkerhoff, R. O. (1997). Clarifying and directing impact evaluation. In S. M. Brown

& C. J. Seidner (eds.), Evaluating corporate training: Models and issues. Boston:
Kluwer Academic Publishers.
Dubois, D. D. (1993). Competency-based performance improvement: A strategy for

organizational change. Amherst, MA: HRD Press, Inc.
Ennis, S. (1997). Assessing people's competencies. In S. M. Brown & C. 1. Seidner

(eds.), Evaluating corporate training: Models and issues. Boston: Kluwer Academic
Publishers.
Gayeski, D. M. (1996). From "training department" to "learning organization."

Performance Improvement 35 (7), 8-11.
Gherson, D. J., & Moore, C. A. (1987). The role of training in implementing strategic
change. In L. S. May, C. A. Moore, & S. J. Zammit (eds.), Evaluating business and
industry training. Boston: Kluwer Academic Publishers.
Gill, S. J. (1989). Using evaluation to build commitment to trammg. In R. O.

Brinkerhoff (ed.), Evaluating training programs in business and industry. San
Francisco: Jossey-Bass.
Kantor, R. M., Stein, B. A., & Jick, D. J. (1992). The challenge of organizational
change: How companies experience and leaders guide it. New York: The Free
Press.
Kaplan, R. S., & Norton, D. P. (1992). The balanced scorecard: Measures that drive
performance. Harvard Business Review. 70 (1),71-79.
Kaplan, R. S., & Norton, D. P. (1996a). The balanced scorecard. Boston: Harvard
Business School Press.
Kaplan, R. S., & Norton, D. P. (1996b). Using the balanced scorecard as a strategic
management system. Harvard Business Review. 74 (1),75-85.
Kearns, D. T., & Nadler, D. A. (1992). Prophets in the dark: How Xerox reinvented
itself and beat back the Japanese. New York: Harper Collins.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco,
CA: Berrett-Koehler Publishers, Inc.
Meister, J. C. (1994). Corporate Quality universities. Lessons in building a world-class

work force. Burr Ridge, IL: Irwin Professional Publishing.
Meyer, C. (1994). How the right measures help teams excel. Harvard Business Review.
72 (3), 95-103.
Moller, L., & Mallin, P. (1996). Evaluation practices of instructional designers and
organizational supports and barriers. Performance Improvement Quarterly. 9 (4),82-
92.
Phillips, 1. J. (1997). Level 4 and beyond: An ROI model. In S. M. Brown & C. 1.

Seidner (eds.), Evaluating corporate training: Models and issues. Boston: Kluwer
Academic Publishers.
Robinson, D. G., & Robinson, J. C. (1995). Performance conSUlting: Moving beyond

training. San Francisco: Berrett-Koehler Publishers.
Senge, P. (1990). The fifth discipline: The art and practice of the learning organization.
New York: Doubleday.
Senge, P. (1995, June). Building learning organizations through partnerships between

HR and line leadership. Keynote speech presented at the Assessment. Measurement
and Evaluation of Human Performance Conference. Boston, MA.
Stufflebeam, D. L. (1983). The CIPP model for program evaluation. In O. F. Madaus,

M. S. Scriven, and D. L. Stufflebeam. Evaluation models: Viewpoints on educational
and human services evaluation. Boston: Kluwer-NijhoffPublishing.
Touger, H. E. (1997). Impact of technology on training evaluation. In S. M. Brown & C.

1. Seidner (eds.), Evaluating corporate training: Models and issues. Boston: Kluwer
Academic Publishers.
Wiggenhom, W. (1990). Motorola U: When training becomes an education. Harvard

Business Review. 68 (4), 71-83.
About the Authors
Carol Ann Moore is Associate Director, Stanford Integrated Manufacturing

Association (SIMA), at Stanford University. SIMA is a partnership among the
School of Engineering, Graduate School of Business, and industry to support
research and education in manufacturing. Prior to Stanford, she spent ten years
with Digital Equipment Corporation in quality assurance of education, market
intelligence for customer training, manufacturing training, and human
resources development during reengineering. She holds a Ph.D. degree in
education from Stanford University and an M.S. degree in management from
the Sloan School, Massachusetts Institute of Technology.
Constance Seidner is a senior consultant in Human Resources/Development

and Learning at Digital Equipment Corporation. While at Digital she has
managed the Training Evaluation and Assessment Group, and Quality
Assurance for the Training Design and Development Group. She has
experience in the design, development, delivery, and evaluation of internal and
customer training. While an assistant professor at Boston University, she taught
educational psychology and social psychology in the School of Education. She
holds a Ph.D. degree in education and an M.A. degree in sociology from the
University of Southern California.
3 WHAT STAKEHOLDERS WANT TO
KNOW
Oliver W. Cummings
Introduction
"Know me, know my business" was one of the catch phrases that grew out of
Peters and Waterman's (1982) work on excellence. This charge to service
providers, like evaluators, is easier to say than to achieve, but putting oneself in
the client's shoes-looking at their problems through their frame of
reference-can help in the effort to understand. This chapter provides a
framework for examining what clients of an evaluation function want to know,
what drives their needs, and how the needs can be addressed within the context
of sound business practice and professional standards.
The chapter begins by explicitly defining who the evaluator's clients

(stakeholders) really are. It then examines the context within which these
stakeholders and the training and development function operate. A discussion
of a service management model is followed by a layered analysis of what
evaluation stakeholders want to know and why they are motivated to know
these things. Finally the evaluator's role and function is presented as a service
model in its own right, bringing the logical argument full circle. Underlying
the entire discussion is the theme that understanding the stakeholders' concerns
from a business as well as an individual perspective, will enable the evaluator
to better frame and execute an evaluation effort that both meets the needs of the
client and satisfies the quality standards of the evaluation professional.
42 What Stakeholders Want to Know
Who Are the Stakeholders?
For this discussion, stakeholders are not simply managers with decision-
making authority. Stakeholders are the potential users of evaluation results and
others who may be affected by them. Each stakeholder has a somewhat different
perspective and set of concerns. The most important training and development
stakeholders are
• Organizational Management: those charged with responsibility for the

overall productivity and profitability of the business
• Training & Development Function Management: those who are responsible
for the overall contribution of the training and development unit(s) within
the business
• Program Sponsors: individuals who make the go/no-go decision to fund
the development of a training or performance enhancement program
• Intervention Developers: those charged with creating the training or other
performance enhancement programs funded
• PersonneVHuman Resource Development Managers: decision-makers
(whether from line or HR departments) who commit support to send an
individual to training or to participate in an other intervention
• Participants: individuals who commit time and energy to participating in
an intervention
• Suppliers: individuals or other businesses or organizations that provide
products or services that are essential ingredients in the planning and
execution of a training or other performance improvement intervention
With all of these stakeholders creating demand for evaluation results, there
is both good news and bad news. The good news is that there is a lot of
opportunity for evaluation results to affect decision-making in a positive way.
The bad news is that the questions and concerns of these groups, while
overlapping, are not identical, and that makes the evaluator's job exceedingly
complex. Cooley (1984), operating from his experiences that convinced him
"that it is important to identify a primary client, and to serve that client well"
(p. 29), offered a central piece of wisdom for dealing with this complexity when
he included as one of his ten commandments for evaluators: ''Thou shalt have
no other clients before thy primary client" (p. 36). Cooley's point was not to
suggest that evaluators ignore other stakeholder needs, but to emphasize the
need for the evaluator to create a conscious focus and framework for the
evaluation. That focus is guided by the primary client, who in most cases must
also be somewhat concerned about other stakeholder interests as well as his/her
own.
A further set of stakeholders, beyond the scope of our analysis here, deserves
mentioning: communities and society at large. While our focus is on
evaluations initiated within a company, there is a need for much broader
assessments of the processes and outcomes of training, performance
enhancement, and education programs. As company training programs
proliferate and supplement more traditional schooling, as the adult's need for
learning new skills increases and is met, we need to look at the aggregate
impact on society.
Researchers looking at the effects of general secondary and higher education

have estimated the return on the education and training investment. An
economist at the University of Illinois estimated that for the money the State of
Illinois invests in undergraduates at the University of Illinois, the resulting
higher taxes paid to the state by those students "yields a 6 percent real
(inflation-adjusted) return, which beats that of the 30-year treasury bond" (The
Wall Street Journal, 1996, p. 1). In a somewhat broader view,
labor economist James Heckman has concluded that a good

starting point for estimating the payoff to additional public
investment in education and training would be to assume that it
yields a rate of return of ten percent-about the same as the return
to investment in business capital (Schultze, 1996, p. 9).
So, is the community or a society a stakeholder under the definition above?

Perhaps this is stretching the definition a bit, but societies through their
cultural, political, and economic policies stimulate or impede synergies among
resources and, clearly, are beneficiaries of both training and evaluation results.
Nevertheless, the remainder of this discussion will be confined to the narrower
list of company-focused stakeholders in the evaluation.
These stakeholders all work or connect with a managed business and its
training and development function in some direct way. What is of concern to
the stakeholder changes as the contextual focus changes. The business context
is complex, and it is to the business management context we now turn.
Figuring Out Where a Stakeholder Is Coming From
Business management is among the most complex activities human beings

perform. Evaluators need to understand and deal with their stakeholders
regardless of their position or activity in the business. To achieve this
understanding, the evaluator must be able to comprehend the overall system,
and the best way to do this is through the application of a conceptual model. All
attempts to present business organization or management models, whether
Drucker's (1954) classic work, Kotter's (1978) insightful interpretation, or the
model presented here, oversimplify the process, but they provide a basis for
understanding. Since training is a service business, the model presented here is
built around the findings on service management by researchers such as
Heskett, Sasser, and Hart (1990), Bowen, Chase, and Cummings (1990), and
Maister (1993).
Expectations
At the heart of service business management is a set of stakeholder

expectations. The service model presented here addresses the expectations of
four clusters of clients. The clusters encompass all of the stakeholders noted
earlier as clients of a training and development function. The expectations of
concern are
• Client or customer expectations: training sponsors, personnellHRD

managers, and participants, for example
• Employee expectations: training and development function management
and intervention developers, for example
• Management expectations: business unit or company management, for
example
• Supplier expectations: contracted consultants, for example
These expectations are at the center of the model presented in Figure 3-1.
Need-Service Linking - - - - -
Client
Benefit/Cost
Satisfaction
Optimization
Creation
. 0<:>" 1o.<t?Y
~<:; ~'"
Client Service-Business df $~<$J~
: ; - - - - - Objectives Integration ----~~ ~4.,<;
Figure 3-1. Elements of the manager's complex environments
Cornerposts of Service
The stakeholders may be focused at any point in time on one or more of the
four comerposts of service. The four comerposts of the model are
• Target markets
• Core service definition and norms
• Operations management & systems
• Service delivery/distribution systems
Target markets are groups of clients that can be differentiated from others in
some meaningful way. In doing a training needs assessment, for example, one
evaluation task is to define a target audience for the training to be developed.
By discriminately defining the audience for whom the training is best suited,
the training organization can simultaneously encourage their best clients (i.e.,
those most likely to really benefit from the training) to attend and discourage
others (in this case, those who do not need the training) from attending. To take
a broader scenario, a training department of a large organization may be able to
focus its resources on certain groups of training sponsors or organization
management through its budget allocations or through developing a critical
competence timed to meet that targeted market's needs.
Core service definition and norms encompass both what the function
delivers and how attitudes, processes, and tangible resources are aligned to
position training and development relative to its target markets. For example,
building a training center in which highly skilled instructors deliver programs
to groups of participants brought together to be trained and to reinforce
interpersonal networks in a company, represents a very different core service
definition and operates around different norms than creating and disseminating
point-of-need, technology-based, individual performance support.
Operations management and systems encompass all of the internal

methodologies, processes, and systems that support the business. These systems
include everything from strategic planning to staffing systems, from employee
recruitment and retention systems to instructional or other design
methodologies, and from budgeting to financial and activity accounting
systems. Through operations management and systems the regulatory processes
used to direct, monitor, and control the business of the function and to increase
its cost effectiveness are enacted.
Service delivery/distribution systems involve all of the contact points

between the client and the service providers. These, in reality, are where the
service is consumed and where client satisfaction is created. Elements of the
delivery system include interorganizational contracts/agreements, course
production, course catalogs, registration systems, billing systems, facilities,
faculty quality, materials quality, and training logistics.
While the four cornerposts define some anchor points in the management of
a service, the dynamic part of service management is handling the linkages
between the cornerposts. It is in the linkages that image is created, efficiency is
achieved, competing needs are balanced, and satisfaction is realized.
Linkages
In addition to understanding the interactions between stakeholders'

expectations and the four service cornerposts, the evaluator must also attend to
the linkages between the four cornerposts.
Bringing together a target market with the core services offered involves
assessing the buyer's needs and positioning the core services and norms to
reduce those needs. Establishing management's overall aims and values is a
starting point that must be married to the training function's ability to present a
suite of particular services. In view of the company's objectives, functional
management for training and development should consciously decide how to
position the service. This poSitioning will communicate to the individuals in
the target market that they are important to the training and development
function and that the function can help them meet their needs. It will also
communicate to training and development employees that they can make a real
contribution and meet their own professional needs by addressing the potential
client's need.
Evaluation stakeholder concerns at this point in the model tend to be driven

by image and market questions. In addition to gathering and analyzing market
data, evaluators are likely to be faced with the need to operationally define the
positioning buzz words. For example, positioning training as a ''world class"
service implies much about quality norms, reach, flexibility, impact
expectations, and so on. It is often the evaluator's task to get the stakeholders to
give objective meaning to these ideas. It is only after clear definitions are
agreed upon that evaluation activities can be designed to address the important
information needs dictated by the definition.
The linkage between core services and operations management is where the
relationship of the benefits to be realized by the client and the costs to provide
the service that will produce the benefits is optimized. Stakeholder concerns
here are often related to methodological questions and process efficiencies.
Again to take the training example, a core service may be the development and
delivery of computer-enhanced point-of-need learning products. To be truly
successful, the levels of creativity, flexibility, and quality in the process must be
balanced against considerations such as the need to create and adhere to
standards, create and maintain reusable templates and computer code, and to
level staffing and workloads. The range of evaluation questions and data that
can be brought to bear on deciding the best balance of these competing needs is
very broad.
The tension between the operations management system and the service
delivery/distribution system involves balancing the value of services to the
client with the training and development function's business objectives. At the
extremes, the training client would like to receive just the right amount of
exceptionally high-quality training for free; while the training providers would
like to be paid a premium for doing just what they want to do. In practice, there
must be a balance between these extremes in which clients expect operations to
support and not interfere with their project and providers expect to achieve a
reasonable cost recovery or budget support as well as get deserved recognition
for delivering quality results. The stakeholder concerns at this point are often
focused on fairness, performance, and value. The challenge to the business
manager is to find or define standards that balance stakeholder power. The
challenge to the evaluator is to create measures that reflect the manager's intent
and are credible to the stakeholders.
Finally, within the linkage between the appropriately identified target

market and a well-designed delivery/distribution system lie the elements for
creating client satisfaction. For trainers this may mean the right relationship
elements demonstrated in sponsor interactions or the execution of the program
according to a plan. Stakeholders are concerned with issues like satisfaction,
sustain ability, target market penetration, etc. in this part of the model. The
challenge to the evaluator is to build a family of measures to adequately sample
and assess these attributes of the service.
Figure 3-1 illustrates the key elements of the complex business

environments. Since all of our stakeholders and their evaluation questions are
represented somewhere in the model, it is helpful in establishing an
understanding of what drives stakeholders to think or behave in certain ways.
These business model elements come into play over and over again in
establishing the kinds of evaluation efforts undertaken in support of training. It
is helpful, though, to also understand why stakeholders want to know certain
things. Understanding the motivation of the client gives the evaluator an even
more fertile field for asking pertinent questions and planning appropriate
inquiry strategies. It is often said that people behave as they perceive. The
stakeholders in evaluations are no different.
Why Do Stakeholders Want Evaluations Done?
If we take as givens a few basic things about the psychological makeup of

stakeholders in general, it will be easier to determine what they are likely to want to
know. Stakeholders exhibit a diversity of psychological makeup. They are generally
inclined to approoch and do things that are positive foc them and to avoid things that
are negative. Generally, they have at least a modicum of a need to achieve, and they
can be expected to behave in ways that reflect what they perceive.
These assumptions lead to three fundamental reasons why stakeholders want

evaluation done. They want 1) their risks reduced, 2) their projects, efforts, etc.
to be more efficient, or 3) to differentiate themselves, their products, or services
from others. Most of the decisions stakeholders will make based on evaluation
results find their roots in one or more of these three ultimate aims.
Understanding what need is important to a client gives the evaluator an edge

in designing and selling high-quality projects that meet the client's
requirements. If, for example, a client's primary concern is to eliminate or
reduce costs, the evaluator might sell needs assessment as a way to focus the
intervention on actual need, thereby reducing development, production, and
delivery costs and increasing productivity of employees (well-trained or well-
supported employees are more efficient). For the same objective (eliminate or
reduce cost) formative evaluations of courses or curricula under development
can be positioned as systematic assessments to minimize the review and
revision cycles necessary to achieve a finished product, thus reducing
development costs. And, longer-term cost efficiencies can be a result of doing
appropriate follow-up evaluations. Assessing the intervention's performance in
the field ensures that only functional products or programs are left in place.
This ensures that participant time and maintenance efforts associated with the
product are justified.
Table 3-1 gives further examples of how evaluations help clients reduce
risks, increase efficiencies and differentiate themselves. The table illustrates
how different evaluation practices (needs assessment, formative evaluation, and
follow-up studies) can address the benefits clients seek.
Table 3-1. Some benefits of evaluations to clients
Description
Benefit o[Benejit How It Works
Reduce risk Evaluation In needs assessment Systematic assessment of
will assure needs assures that the project will focus on
quality so the fundamental questions and can be translated into
stakeholder high-quality, follow-on products, reducing the risk
will know that that off-target interventions will be produced.
the investment
continues to In formative evaluation: Validating fulfillment of
be justified. the intervention's objectives and proper execution
of approaches reduces the risk of program quality
failure.
In follow-up: Quality assurance can be achieved by

systematically confirming the transfer of the
intervention's outcomes to the job, reducing the
risk that trained skills may atrophy because they
are not used.
Increase Evaluation can In needs assessment: Defining audience
efficiency shorten the characteristics, job responsibilities, tasks and
time between associated knowledge, attitudinal, and skill needs
problem for employees leads to a detailed, focused approach
identification to the follow-on work. Doing just what is needed
and solution and doing it right the first time requires less time
delivery. and effort than redoing segments.
In formative evaluation: Because data regarding

strengths and weaknesses are collected
systematically as portions of the intervention are
implemented on a trial basis, changes can be made
during product or program development. This
strategy is more efficient than waiting until the
finished product is available, testing it, and then
redoing major segments.
In follow-up: Follow-up studies often identify new

needs arising from changes in the market or
business. Thus, problems may be identified earlier
and defined more clearly before they reach crisis
proportions. This enables the team to react faster to
solve problems.
Description
Benefit of Benefit How It Works
Differentiate Evaluation In needs assessment: Positive reactions to an
will improve intervention are dependent upon getting the right
or protect the intervention to the right clients in the right way.
client's Realistic study design and good links with other
reputation important systems that are affected serve to
enhance the training and development
organization's reputation.
In formative evaluation: When evaluation points

out key strengths and weaknesses and subsequently
the strengths are enhanced and the weaknesses
addressed before release to the field, consistent
high quality in new offerings will serve to
differentiate.
In follow-up: Follow-up helps to ensure that

valuable interventions are left in the field and
justifies elimination of those that do not yield
expected benefits. Proactive management of
programs leads to a reputation for relevance and
quality.
Adapted from Tate & Cummings, 1991
If risk, efficiency, and differentiation are the underlying motivations for

wanting to know things about a program, the next step is to explore the kinds
of things the stakeholder will want to know to enable them to assess whether or
not those benefits were achieved. It is to this topic that we now turn.
What Kinds of Things Do Stakeholders Want to Know?
Stakeholders, with all their individually unique needs, are often unable to tell
the evaluator directly, clearly, and succinctly what they want to know. Yet
reflection and study has yielded some patterns than can enhance our
understanding and ability to respond appropriately. Stakeholders' questions
seem to sort into patterns along two main dichotomies. These dichotomies are
juxtaposed in Table 3-2, and will be further illustrated in Figures 3-2 and 3-3.
They are 1) Effects vs. fidelity, and 2) Micro concerns vs. macro concerns.
Table 3-2. Two dichotomies that are sources of stakeholder questions
MICRO MACRO
Evaluation Evaluation
EFFECfS Assessing results of a Assessing results of a
(Results Assurance) course, curriculum, or department or function
specific intervention
FIDELITY Assessing processes Assessing processes
(Process Assurance) related to a course, related to department or
curriculum, or specific function management
intervention
Now we are ready to begin to bring together this complex set of concepts to
form a set of hypotheses about what stakeholders want to know. The two
dichotomies, Effectiveness vs. fidelity and Micro vs. macro evaluation, are used
to create and organize this set of hypotheses.
When the two dichotomies are set against each other, four quadrants, each
representing a different combination of types of concerns, are created. These
combinations determine the patterns of questions that stakeholders ask, and
each stakeholder brings his/her own perspective to what will be asked. Figure
3-2 shows a number of the driving concerns that different stakeholders have,
depending on whether they are more focused on effects or fidelity, and on
macro or micro elements.
For illustrative purposes, we will examine in some detail the concerns of

specific stakeholders, beginning with program participants. Program
participants are most directly concerned about the experience (i.e., the process)
they go through and the personal value (i.e., effects) of the outcomes of the
experience. They want the process to be easy, if not exciting or fun, and they
want it to be relevant to their needs and expectations. Thus, in evaluation, and
subsequent reporting, they want to know that the training is right for them and
that their peers who have participated found the experience acceptable. The
first step in evaluation of a training program, assuming that a valid needs
assessment was the origin of the program prior to its development, is evaluating
acceptance. If you build it and potential participants reject it, it is not an
intervention.
Etreds
• Participant: Job made easier, increased • Participant: Image of quality,
opportunity reputation, use generally creates value
• Program Sponsor: Olange in behavior; • Training Sponsor: Consistent quality,
evidence of success; justification of expense, competitive advantages
penetration; meet specific objectives • Organization Management: Volume or
• Organizational Management: Relevance, throughput, organizational
attendance, absence of complaint, differentiation in marketplace, improve
transferable or redeploy able resources promotability rate, better use lower
• Supplier: Client satisfaction level personnel for higher level work
• T &D Function Management: Achievement (leverage), comparative quality or
of intended knowledge, skill, or attitude efficiency
change • Supplier: Organizational viability
• Intervention Developer: Acceptance of the • T &D Function Management: A good
work, behavior change effected, participants reputation in the organization and in the
recommend the program to others training & development professions
• PersonnellHRD Manager: Absence of • Intervention Developers: A viable
complaints, participants recommend the organization, a good reputation in the
program to others, participants better able to field, employee talent appropriately
do the job, easier or more flexible to assign used in the work
• PersonnellHRD Manager: Participants
in general and their supervisors respond
favorably to the expenditure and time
investment
Micro Macro
• Participant: Relevance, ease of process & • Participant: Consistency in service
logistics, quality of the experience level, predictability
• Training Sponsor: Good project • Training Sponsor: Ease of working
management, no surprises in budget or relationship, right and sufficient skill
process sets in place
• Organizational Management: No crises, • Organization Management:
meet approval process Efficiency, overall operational
• Supplier: Timeliness management to plan
• T &D Function Management: Absence of • Supplier: Predictability
crises that command time from management • T &D Function Management: Good
on the project; ability to deploy resources estimating processes and process agility
easily, adequately to do the project • Intervention Developers: Management
• Intervention Developer: Support needed to decisions that result in right resource
accomplish an acceptable product (access to deployment and in staff assignments for
content, client reviews, etc.) both success and for long-range
• PersonnellHRD Manager: Ease in getting development
someone prepared for and participating in the • PersonnellHRD Manager: Systems in
intervention place to enable use of services when
needed and to access the services easily
Fidelity
Figure 3-2. Some stakeholders' driving concerns by evaluation intent

Participants further want to be assured that the results of their participation

will be worth the effort, that in some way the effect of their participation will be
to make their job easier or expand their opportunities. What they need to know
then, is that people who complete the program generally improve a skill,
knowledge set, or ability to perform a function; that they are more likely to
succeed, win, or be promoted. While each participant will assess these things
for him/herself after going through the training program, they depend on
evaluative statements made by program representatives about what they will get
out of their attendance as a way to decide whether to attend or, if they have no
alternative about whether to be present, the extent to which they will be
engaged while present. As shown in Figure 3-3, some of the evaluator's tools
for providing the evidence participants need are in-process, end-of-course, and
follow-up surveys, tests of achievement, participant interviews, and course
debriefing sessions.
Both of the above examples are from the micro (course, specific curriculum,
or other specific intervention) side of the model. Turning to the macro
(function- or unit-level) issues, we find the participant concerned with
predictability and consistency of processes used by the training and
development function over several contacts or programs. From an evaluation
perspective, this concern can only be dealt with by looking across a variety of
programs and assessing the consistency with which services were delivered.
One tool here is to track performance against standards. Also on the results
side, the partiCipant is concerned about the reputation of the training and
development organization. A reputation for delivering excellent training serves
some companies well, even making a positive difference in the recruiting
process. Unless the reputation is built from hype, which is not reasonable in the
long run, it will be built on the aggregate of evaluations done, through
summaries of client satisfaction and other types of evaluation. Whether those
evaluations are systematic or not is a separate question in many companies, but
they can and should be systematic.
While participants are very focused on the training event and the processes
and outcomes related to it, training sponsors and organization management
have broader views that also encompass the program design and development
process. Starting again with the fidelity issues on the micro-side, the training
sponsor is concerned that the project management is well executed. Sponsors
want no unpleasant surprises. Organizational management, likewise, wants to
be untroubled by individual projects after they are approved-thus they want no
crises. While these concerns will infrequently become the object of an

evaluation on a single project/program, there is a side issue for evaluators, who
have a responsibility to prepare good estimates and to execute their work
according to approved plans. Stepping up to effects, however, the evaluation of
a course's or program's outcomes in terms of the four A's (acceptance,
achievement, application, and accomplishment) is clearly at the heart of what
many consider to be training and development evaluation's scope. For example,
see Kirkpatrick's (1994) conceptualization, which includes reaction, learning,
behavior, and results as its organizing framework. These evaluations deal with
the concerns of attendance, behavioral changes, target audience penetration,
and other outcome expectations and investment-related evidence, which is often
used to justify the expense of training. The evaluator's tools for these questions
are those used in field studies. Program records reviews, surveys, and
interviews are commonly used, as are various measures of business activity.
Effects
• End-of-course questionnaires & interviews • Benchmarking
• Tests • Financial accounting/analysis
• Self-reported application results • PeIformance measurement
(questionnaires, interviews, observations) • Image surveys
• Supervisorlpeerlsubordinate ratings • Client satisfaction summary
• Correlational studies of training
peIformance and key business variables
• Program records analysis
Micro Macro
• Needs assessment surveys • Strategic plan review-Is it what we

• Consensus debriefing should be doing?
• Observation • Budgeting process assessment-
• Module and course questionnaires Efficiently done? Disciplined?
• Individual participant and faculty interviews • Peer reviews for "State of the Practice"
• Project management report analyses assessment-Is it well done?
• Comparison of activity against charter-
Do we do what we say we will? Meet
our standards.
• Process measures
Fidelity
Figure 3-3. Examples of tools of particular value by evaluation intent

On the macro side, sponsors and organizational management are concerned

with realized impact, consistency of quality, cost efficiency on a function scale
and with overall volume or, in the case of facilities management, throughput.
These challenges are met with tools like benchmarking, analyses of financial
accounting records, and outcome performance measures. (see Figure 3-3.)
Concerns of training and development management, employees and suppliers
flow along these same two dimensions and into the same four quadrants.
So what stakeholders want to know is motivated by reducing their risk,

increasing their efficiency or differentiating them and is directly determined by
their focus on micro or macro issues and on effects or fidelity. The key to
giving stakeholders what they want to know lies in understanding the specifics
of the client's concerns, and applying the best tools to gamer and interpret the
necessary evidence to support decision-making.
The evaluation challenges posed by the kinds of concerns noted in the macro
columns of Figures 3-3 and 3-4 ask the evaluator to take a business function
view. This drives evaluators to strive to migrate their services from specific
courses or very narrow program scopes to programmatic and function concerns,
as has been recommended by Cooley (1984), Cummings (1986), and others.
This shift may be seen by some as beyond the evaluator's scope of service.
How Is Evaluation Defined in This Complex Context?
Evaluators, like other stakeholders in the evaluation, behave as they perceive. A

significant part of the evaluator's perception of how to frame a study is a result
of how they define evaluation. Evaluation, like many terms that are used in
social and organizational sciences, has been defined in a number of different
ways. Since the meanings given to terms strongly influence the behaviors they
drive, it is important to have clear definitions. The evaluator's operational
definition for evaluation influences how they set the scope of what they do, how
they structure their work, the literature they read to stay current, the ways in
which they report results, and even how they view their responsibilities relative
to the various stakeholders.
For the purposes to which evaluation is applied in business, a blending of

definitions is appropriate. The definition given here arises from the works of
writers like Cronbach and Suppes (1969), Worthen and Sanders (1987),
Krathwohl (1993), Brinkerhoff (1987), and Basarab and Root (1992). For our
purposes a definition of evaluation adapted to business is
Evaluation is a disciplined inquiry to gather facts and other

evidence that allow an evaluator to make assertions about the
quality, effectiveness, or value of a program, a set of materials or
some other object of the evaluation in order to support decision
making.
Krathwohl (1993, p. 550) said,
Utilization of an evaluation is a prime criterion of its success. If

the results are to be utilized, stakeholders must trust the process
that produced them. Therefore, often the process by which the
evaluation was carried out is as important as the product.
Disciplined inquiry is a process by which we can address the things

stakeholders want to know.
A Brief Elaboration on the Evaluator's Role
Some would contend that anyone who gathers data and applies that data to
make judgments about a program or some other object is an evaluator. If,
however, a definition of disciplined inquiry applies, an evaluator must bring a
substantial level of rigor to the inquiry, build evidence according to the rules of
evidence, and then draw conclusions that lead to judgments of quality,
effectiveness, or value. The best of fact-based decision making often occurs
when an evaluator has been involved in the process.
At times, the evaluator is a manager "taking on the role of evaluator" to

gather facts, to interpret those facts in a relatively objective fashion, and then to
translate interpretations into judgments that lead to management action. When
an individual begins to implement solutions, that individual ceases to be an
objective evaluator and becomes a manager once again.
Two core skills the professional evaluator shares with other professionals
and managers are project management and interpersonal skills. There are,
however, three other key differentiating skill sets that the evaluator brings to a
project:
• Evaluation and research design

• Test, questionnaire, protocol, and field procedure design and development
• Data gathering, analysis, and interpretation expertise
These differentiated skill sets add value through the evaluation process to the
subsequent decision-making.
Going back to our definition of evaluation in the business management

context, the purpose of evaluation is to inform decision-making. The
importance of the criterion of "disciplined inquiry" in that earlier definition is
that decisions should be based, not on capriciousness or, worse, on
misinformation, but on reliable facts as far as they can be determined. One
resulting role, then, of the evaluator is to provide objective data of sufficient
quality to enable users to draw supportable conclusions. There are three
separate but very important issues in this part of the evaluator's role:
1. The objectivity of the data should assure that management or client beliefs
and wishes do not bias the results. The implication is that the evaluator
needs constructive, if not literal, independence from the client.
2. Sufficient quality is a subjective issue that frequently produces conflict.
Most often managers/clients do not have the same level of understanding
of or concern about sampling theory, test design, questionnaire design,
content analysis, etc. as do the evaluators and, therefore, do not see the
need for investing time and money at the levels the evaluator believes is
necessary. On the other hand, some evaluators approach each study as if it
should produce close to flawless, unassailable results. Finding the balance
between the criticality of the decision and usefulness of the evaluation data
is a challenge in the design of each new project the evaluator takes on.
3. Drawing supportable conclusions ties back to the quality of the data and
objectivity, but it also requires a broader understanding of the client and
their situation. In drawing conclusions, the evaluator must place relative
value on alternative decisions. In most cases the evaluator, going back
again to our definition, will make assertions about those alternative
decisions or will through the evaluative process make direct
recommendations about certain value-related choices or actions. This is not

to say the evaluator's role is to make the decision for the client, but the role
certainly is to bring clarity to the decision process.
The independent evaluator's responsibilities are to interpret the object of the
evaluation (i.e., to figure out what it really is that the primary client wants to
study), to get a precise definition of the evaluation questions that must be asked
in order to address the needs of the client, and then to plan and execute a
systematic data gathering and rigorous analysis and interpretation process
leading to recommendations that will be implemented. This role points to a
need for the evaluation function to be in the hands of professional evaluators
who are skilled as well in the arts of service management.
The Evaluation Function as a Professional Service
Service management concepts and strategic success factors are complementary

constructs by which evaluation can be examined. As stated earlier, training and
development functions are service businesses. Similarly, evaluation is a service
business. It is manageable as a service, and the success factors related to
evaluation map easily to service management concepts (Cummings, 1993).
Success factors for an evaluation function include a range of issues that touch
on all aspects of the service and, when attended to appropriately, assure the
success of the function. Figure 3-1 showed key elements of service
management. Figure 3-4 shows a mapping of critical success factors for
evaluation functions to those service management elements.
In the circle at the center of Figure 3-4, some of the key success elements for
the evaluator are presented under the most directly affected stakeholder. The
stakeholders' wants, laid out in the earlier parts of this paper, are serviced by
the evaluator meeting these success factors and attending as well to those under
the four cornerposts of target markets: core service definitions and norms,
operations management and systems, and service delivery/distribution systems.
In the realm of target market, the stakeholders want to know that you have
picked them well and that you know enough about them and their unique needs
to be of service. Relative to your core service definitions and norms, they want
to know that your staff is competent in their disciplines and that the services
offered are beneficial to the stakeholders. Under operations management and
systems, they want to be assured that you operate by a set of ethical and
professional standards and that you function efficiently. And in regard to the
service delivery/distribution system, the stakeholders want to know that the

system works, is easy to enter, and that their needs will be satisfied through
your good efforts.
It is well known that empathy with the client, sometimes referred to as

"know me, know my business," is a key element in producing client satisfaction
(Zeithaml, Parasuraman, and Barry, 1990). For evaluators working in training
and development projects, recognition that both their clients and they
themselves are in the same sort of business can greatly enhance the evaluator's
empathy for the client. With the iI1l-Tease in empathy comes increasingly higher
levels of understanding of what ilie client wants and needs to know. In this case
it is "know yourself and your business in order to better know your clients and
their business."
Core Service Definition &

Target Markets Norms
Respond to unique character of Follow rules of evidence
different clients Accurate records (data
Know decision context and files)
issues facing client Multiple data sources
Know bow they think about Establish rapport Varied approaches to fit
issues Fair and balanced reporting circumstances
Identify primary elienUattend to stakeholders Appropriate analytical
Independence (not overly identified with the techniques
client) Adequate data and
information dissemination
~ Planned. executable
Common mission methodology and service
Qearroles migration strategy
Appropriate self-image
Service Deliveryl Team identification Operations Management
Distribution Systems Open communication & Systems
Fair management practices
Market the role of the eareer path Policies on proprietary
evaluator nature of data
Eager to please. work Management Rewards for
bard • Balance expense and return contribution-individual
Focused, businesslike and team
approach Maintain integrity
Balance efficiency with Sensitivity to organization
creativity and freshness in which initiative must
of approach operate (internal group)
Tailored. understandable, Management of turnover
timely reporting
Responsiveness
Develop adequate
promotionallmarketing
approach for senices
Figure 3-4. Evaluation success factor mapped to service management elements

Adapted from Cununings, 1993
References
Basarab, D. 1., & Root, D. K. (1992). The training evaluation process. Boston, MA:
Kluwer Academic Publishers.
Bowen, D. E., Chase, R. B., Cummings, T. G., & Associates. (1990). Service
management effectiveness: Balancing strategy. organization and human resources.
operations. and marketing. San Francisco: Jossey-Bass.
Brinkerhoff, R. O. (1987). Achieving results from training. San Francisco: Jossey-Bass.
Cooley, W. W. (1984). The difference between the evaluation being used and the
evaluator being used. In S. 1. Hueftle (ed.), The utilization of evaluation:
Proceedings of the 1983 Minnesota Evaluation Conference, (pp. 27-40).
Minneapolis, MN: The Minnesota Research and Evaluation Center.
Cronbach, L. J., & Suppes, P. (1969). Research for tomorrow's schools: Disciplined
inguiIy for education. New York: Macmillan.
Cummings, O. W. (1993). Evaluation as a service: Integrating service management

concepts with success factors for evaluation groups. Paper presented at the annual
meeting of the Canadian Evaluation Society, Banff, Alberta.
Cummings, O. W. (1986). Success factors for evaluation in a business training context.

Evaluation Practice. 7 (4), 5-13.
Drucker, P. F. (1954). The practice of management. New York: Harper.
Heskett, J. L., Sasser, W. E., & Hart, C. W. L. (1990). Service breakthroughs: Changing
the rules of the game. New York: Free Press.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco:
Berrett-Koehler Publishers, Inc.
Kotter, J. P. (1978). Organizational dynamics: Diagnosis and intervention. Reading,

MA: Addison-Wesley.
Krathwohl, D. R. (1993). Methods of educational and social science research. White

Plains, NY: Longman, Inc.
Maister, D. H. (1993). Managing the professional service firm. New York: Free Press.
Peters, T. J., & Waterman, R. H. (1982). In search of excellence: Lessons from

America's best-run companies. New York: Harper & Row.
Schultze, C. (1996, November 5). Promises, Promises: The Elusive Search for Faster
Economic Growth. International Political Economy. 3 (20), 6-10.
Tate, D. L., & Cummings, O. W. (1991). Promoting evaluations with management. New
Directions for Program Evaluation, No. 49, (Spring), 17-25.
Business Bulletin. (1996, November 14). The Wall Street Journal, p. 1.
Worthen, B. R., & Sanders, J. R. (1987). Educational evaluation: Alternative

approaches and practical guidelines. White Plains, NY: Longman, Inc.
Zeithaml, V. A., Parasuraman, A., & Berry, L. L. (1990). Delivering guality service:
Balancing customer perceptions and expectations. New York: Free Press.
About the Author
Oliver Cummings is an Associate Partner and Managing Director of Shared

Education Services for Andersen Worldwide in St. Charles, Illinois. The
Shared Education Services groups focus on organizational communications,
training and development, and evaluation activities. Cummings joined
Andersen in 1981 and was promoted to Manager of the Evaluation Unit in
1983. He holds a Ph.D. from Southern Illinois University-Carbondale and is a
member of the Board of Directors of the Southern Illinois University
Foundation. In 1981, Cummings received the AMEG Research Award from the
National Association for Measurement and Evaluation in Guidance. In 1991,
he was one of four Americans recognized by VETRON (the Dutch National
Organization for Human Resource Development) for professional contributions
in an international conference on corporate training. Cummings has published
over 20 works and is currently working in the areas of evaluation and service
management effectiveness.
4 THEIMPLICATIONS
LEARNING ORGANIZATION:
FOR TRAINING
Barry Sugarman
Introduction
The field formerly known as training faces an onslaught of serious change.

Training already includes much more than just classroom training; it now
includes not only self-paced, take-home learning packages, but also CD-ROMS,
on-site coaching, and on-demand, hotline support services. Even these
interesting changes, however, still reflect the old focus on the individual
worker. This will change. The learning organization viewpoint focuses heavily
on the work team and goals of learning how to perform better as a team,
including how to learn better as a team, and how to learn how they learn. Just
as the music critic or the theater critic reviews the performance of the
ensemble, and just as the outcome of a basketball game is a team product,
despite the allure of star players, so most work groups need to produce results
that are greater than the sum of their individual efforts.
The perspective of the learning organization requires more attention to the

collective part of the equation. The added value that the learning organization
provides is twofold: it lies in the heightened role of learning in work and in the
enhancement of the collective effectiveness of teams. These two together make
organizational learning, and this is the mother of all core competencies in
tOday's world.
64 The Learning Organization: Implications for Training
We live in a time of profound changes in the economy, with structural

changes in every facet of work. The security, stability, and boundaries that we
used to count on are changing. Roles that were stable are changing and very
likely will continue to change. The mental models we formerly used to
understand our work, careers, the world of business, and society at large are no
longer valid. Training and development should expect to be affected by these
changes no less than everything else. The evaluation of training and
development presupposes some agreement on its role within the organization,
but this is undergoing drastic change. A central set of ideas that is guiding
much of the managed change, and so should be some help in understanding it,
is known as the learning organization. In place of the traditional, bureaucratic
model of organizations, which assumes that the organization has a formula or
design for its business embodied in a hierarchical structure and set of rules and
policies, the learning organization assumes that any such formula is
provisional and needs continuous revision to keep up with rapid change among
competitors and in the environment of that industry. In place of the
bureaucratic model that calls for obedient cogs in its well-adjusted machine, the
learning organization calls for workers who can contribute as creative
participants to revising and adapting the formula or rules of operation. The
learning organization is an organization that utilizes the learning capabilities of
its members and is open to changes in its structure-not just change within that
structure. The learning organization exhibits, not just learning in the
organization by individual members, but learning by that organization in
reshaping itself. This point will be developed in the section on organizational
learning.
Another way in which traditional notions of training are sharply challenged

by newer thinking is in the relationship assumed to exist between training,
learning, and work. The old model was first training, then work; the revised
model was training, then work, more training, more work .... The new model,
however, requires a bigger shift of thinking in at least two ways. One way
involves thinking more about learning and less about training, recognizing that
a small fraction of all learning comes as a result of formal training or teaching.
Of course, successful training implies learning, but it is a matter of reversing
figure and ground. If we focus first on the naturally occurring phenomena of
learning, before we think about our contrived methods of training, we should be
able to see many ways to foster increased learning through other means besides
traditional training. This revised and expanded role for training is implied in
the learning organization viewpoint.
The second way this learning organization model affects our thinking about
the assumed relationships between training, learning, and work involves
highlighting the amount of learning that occurs as part of work itself. In place
of the old model, which was ftrst learning, then work, we now have the new
model: ftrst learning, then work-which-includes-continuous-learning. We are
not just learning to do the work better; we are building the organization's
knowledge base and revising its tools, processes, and products, as we work.
This, again, is what organizational learning means. This creates many new
potential areas for useful training by those who understand what is supposed to
happen here.
The huge environmental changes that are going on around us can be viewed
in terms of new organizational models, as we have been doing; they can also be
viewed in terms of the nature of work and the new importance of knowledge
and information in that work process. The information economy means quite
concrete changes in job demands. Smart work is breaking out allover; skill
levels have risen universally. Workers are required to handle far more
information, not just looking up one database but sometimes knowing how to
ftnd the appropriate one that may be located somewhere else, and applying the
information to solving customers' problems. The concierge of a big-city hotel
could be the template for many of these smart jobs. Discussion of the rising
importance of knowledge workers used to refer to scientists, creative artists,
and other highly educated professionals, but now the knowledge invasion has
swept through most jobs at all levels. Even the janitor must know about many
different cleaning chemicals, power tools, and specialized equipment, as well as
having detailed knowledge of the buildings, their layout, utilities, and the
special needs of different areas. The skills of the travel reservation worker may
start to emulate those of the reference librarian. As automated phone systems
and automated directory information systems replace human operators, those
operators and receptionists who remain must handle more demanding inquiries,
i.e., less routine questions that require problem-solving skills.
Experience in such jobs involves learning ever-more efftcient and effective

ways of doing the work. At a novice level, the work involves learning to solve
new problems posed by customers. As this experience base accumulates for the
individual worker, s/he learns to identify pattern problems for which s/he
already knows the answer. As some workers identify patterns that others have
not yet discovered, it is valuable to share this information. Sometimes
information storage systems need to be created. This, again, is organizational

learning. As we said, learning-both individual and organizational-is part of
the job. After basic training for new hires, most of the necessary learning can
emerge on the job, especially if we have communication among peer workers.
More valuable than old-fashioned training is good supervision that nurtures the
natural, informal practices of sharing expertise among colleagues. Consultation
with supervisors on these process skills in team development might be
productive, therefore.
The times of profound change in which we live involve both changes in the
nature of work and changes in the nature of organizations, including the ways
in which work is managed through those new kinds of organizations. In this
brief introduction we have noted some of the changes in the nature of work. We
turn next to analyze the changes in organizational structures and processes. In
the next section, we will review a body of research that compares organizations
that represent both the old and new models, and draw some specific points of
contrast between them. We will review a literature that has tried to draw a
broad blueprint for the new organization, based on consultation and
intervention in various organizational change projects in cooperation with their
leaders. That section leads into a major learning organization case study. The
third major section will be a review of some of the key concepts of
organizational learning that underlie all of this work. Finally, we examine
some major implications of these conceptual models for assessment and
evaluation.
The New Organization
Observers of the business scene in the United States and the Western world in
general have been telling us that something fundamental is changing, and they
have been telling us this for more than twenty years. They have based this
conclusion on the study of newly successful companies and those whose success
has come to an end, attempting to profile the differences. There has been a
steeply rising curve of book sales on this topic, magazine articles, conferences,
consulting revenues, and all of the usual indicators. This essay will look in turn
at several aspects and implications of this phenomenon. To begin, we look at
new ideas about the new organization. Book lists in the 1990s are teeming with
management books touting new-type organizations by many names-
adhocracy, flexible organization, organismic organization, virtual organization,
network organization, innovative organization, intelligent organization, matrix

organization.
Along with these ideas about the new organization come ideas about what
exactly is wrong with the established bureaucratic model, and sometimes ideas
for new models of organization. The new models are all "unbureaucracies," for
the aim of the new organization is to escape from the limitations of the
bureaucracy, especially its resistance to innovation. If the revolt against
bureaucracy peaked in the nineties, it had been building up for some time. As
early as 1967 Burns and Stalker compared two UK samples of mechanistic and
organismic companies and found the latter significantly more innovative. At
about the same time Lawrence and Lorsch also conducted comparison studies
of U.S. companies, classifying them in a rather similar way.
These researchers were far ahead of the state of opinion among most
managers and received little attention, except from some scholars. By the late
1980s, though, the climate of popular interest had changed drastically. Writers
in this area also benefited from the change in public attitudes to facing the
weaknesses of U.S. management. In 1979 and 1983, Kanter published her
studies of companies, trying to understand how some companies were more
successful than others in producing innovations. The turning point in public
receptivity came in the early 1980s as new Japanese competitors began to
outclass and humiliate some large, established Western corporations, especially
in the automobile and consumer electronics industries (Ouchi, 1981; Athos and
Pascale, 1981).
The concern over innovation thus goes back at least to the 1960s but it took
a long time to get any serious attention. There was great resistance to the
shocking message that the basic model of organization and management for all
large business corporations in the Western world is deeply flawed and needs to
be changed significantly. This message threatened to undermine the current
security of the status quo and its beneficiaries who saw no reason to overcome
their disbelief. When the icy drafts of cold reality hit some major Western
companies in the late 1970s, the problem was first defined as one of Japanese
competition and one of product qUality. Thus Quality Management approaches
commanded some attention, and U.S. quality experts could finally get work in
the United States, after returning from Japan where they had found capable
followers who implemented their ideas since the 1950s.
Quality thinking about management and organization fits well with the
organismic model and opposes some fundamental postulates of bureaucratic
management. It can be a useful platform from which to rebuild corporate
designs in a direction more favorable to quality-enhancing innovations. The
radical refocusing of all discourse around the primacy of the customer, which is
fundamental to TQM, can be a powerful force for flexibility. In practice,
though, many of those managers who claim to have taken up the quality
challenge have narrowed its definition to make sure that it does not go beyond
the improvement of concrete production processes with extensive cross-
boundary involvement at operating levels. Anything that might touch on
corporate policy and strategy or affect the balance of power between
departments, functions, and other power bases is carefully avoided. So there is
much unfinished work left from the quality movement, especially for those who
take seriously the theories of Deming about workers' motivation (Senge, 1992).
Although it was some Japanese companies and their managers that grabbed
the attention of the Western management world, the challenge for the West is
not how to make our managers and corporations into facsimiles of the
successful Japanese ones of the 1980s. Although understanding those
differences is an essential first step, it is just the beginning. Context counts and
history moves on. In the 1980s Japanese companies found a way to outcompete
the U.S. companies of that time. Since then the best U.S. companies have
learned a lot from those Japanese competitors, and both sides are now facing
serious competition from a new direction-the new tigers of Asia, such as
South Korea, Singapore, Malaysia, Taiwan, and mainland China. Today, both
U.S. and Japanese companies need to ask themselves allover again how they
can leverage the advantages and strengths they have relative to the current
environment, which includes the new competitors, while overcoming their
weaknesses, so as to re-make themselves organizationally in ways that bring to
bear their newly-configured competencies against the weaknesses of their
formidable competitors in the new marketplace
Each of the early students of the new organization gave us some version of a
two-column comparison chart, contrasting significant features of the old and
new types. The following table combines ideas from many sources, including
the type labels used by Bums and Stalker (1966)-Mechanistic and
Organismic-and the seven-S categories used by Athos and Pascale (1981).
Entries into the Organismic column draw from the findings of many of these
studies, from Mintzberg (1979), as well as from less rigorously documented

conclusions drawn from the quality and learning organization literature.
Table 4.1
MECHANISTIC ORGANISMIC
BUREAUCRATIC LEARNING ORGANIZATION
OW MODEL NEW MODEL
STRATEGY Replicate. Innovate to please the customer.

Mass produce.
STRUCTURE Hierarchy. Heterogeneous project teams.

Chain of command.
Big picture held at the top. Everyone has the big picture.
SYSTEMS Formalized. Informal.

Coordination by rule book. Coordination by mutual adjustment.
SOPs very important.
STYLE Conformity. Creativity. Learning. Participation.

Please the boss. Politics.
"Everything in its place."
STAFF Role clarity. Flexible job boundaries.

Dispassionate Passion about their work.
SKILLS Narrowly specialized. Versatile.

Cross-trained.
SUPER- Create profits and Fulfill shared vision through creative

ORDINATE rising share values for work.
GOALS investors. Community of stakeholders.
Two Faces of Bureaucracy
When Max Weber first wrote about bureaucracy early in this century, his ideal
type was contrasted with older types of authority based on traditional (e.g.,
feudal) social and legal ties or personal-charismatic relationships. For Weber,
looking at the early stages of the Industrial Revolution, bureaucracy was more
impersonal, rational, dependable, and efficient than the system that had existed
earlier. Now, however, we find it less efficient and less effective than the
emerging "unbureaucracies" or new organizations. For the needs of our current
environment, bureaucratic or mechanistic models will not suffice; what is
needed is something entirely more organismic.
Even in its own day, the bureaucratic paradigm did not give an accurate
picture of how the system actually operated. Studies of bureaucratic workplaces
from twenty years ago began to show the limits and dysfunctions of the strict
bureaucracy. Communication, especially up the hierarchy, is often
systematically distorted, concealing bad news from the boss. The assumed
expertise of higher levels to solve problems at lower levels was wrong, because
people at the upper levels lacked the information possessed by lower-level
workers, who were reluctant to share it. Sometimes the reasons for not sharing
the information were fear of punishment or the boss's refusal to listen;
sometimes the reasons were secretiveness of the subordinates. Either way,
studies showed that informal adaptations can develop in the nooks and crannies
or unsupervised areas of bureaucracy, creating local, unofficial, and sometimes
covert solutions to operating problems not officially recognized at higher levels.
Sometimes these informal arrangements were aligned with the goals of the host
organization; sometimes they served the goals of the sub-unit rather than those
of the whole organization, and sometimes they involved sabotaging efforts of
the administration to tighten controls on the rank and file (Blau and Meyer,
1971). Often enough, the informal adaptations worked in the interests of the
larger organization, solving problems effectively without any fuss or bothering
the boss.
Some sterling examples of communities of practice have been studied,

showing how in one company, office machine repair technicians (who were
assumed to work separately) talked to each other and evolved patterns of
sharing stories and tips about specific machine problems and trouble-shooting
methods. The body of knowledge passed around among these peers helped
them become more expert, more rapidly and also led to the discovery of new
solutions to problems that were not even in the official repair manuals (Brown,
1991). Sociologists have asked whether managers could increase the
effectiveness of their organizations by encouraging and supporting such
informal adaptations, but some, especially rank and filers, believe that such
attempts are usually clumsy and counterproductive. Note that the Quality
movement has attempted to capture this widespread, informal, creative energy
in its own quality improvement projects.
This kind of thinking about the other face of bureaucracy opens many
questions about whether the old paradigm is useful, even as a heuristic analytic
device. It begins to look as if the notion of formal organization as depicted by
the conventional organizational chart is more misleading than helpful once we
reflect on the facts that actual lines of communication are quite different and
that local subcultures may support norms and mental models quite different
from the official model which is the old paradigm. Among sociologists, the
classical, Weberian model gave way to a natural systems model, and
sociologists began to pay attention to the negotiations between an organization
and elements of its environment as well as coalition-building and other forms
of internal negotiations between components or groupings (Bidwell, 1986;
Kanter, 1989). Natural systems and open systems models came to replace the
classical model of bureaucracy, paving the way towards better understanding of
organizational dynamics and change. Another conceptual innovation, network
analysis, was also necessary.
Network AnalYSis
New paradigm thinking about organizations needs conceptual tools that go

beyond those that were adequate for the old model. So, while sociologists at one
time found the concepts of group, organization, boundary, and formal structure
to be quite serviceable, we now see that something more is needed. The concept
of organizational culture has been helpful in some respects. However, a new
sociological framework was needed that could cope with the fact that
organizational boundaries were not a good enough guide to understanding
behavior, either within organizations or between them. This is the conceptual
gap that networks offered to fill, providing an approach to the analysis of
relationships within an organization and relationships between members of one
organization and others. For example, there might be a network of new product
innovators across several departments within one company, or a cross-
functional network set up to plan a new project, which might involve outside
suppliers and customers as well as the usual inside players. All the outside
suppliers of materials and components to a specific product line might be
designated as the supply network. The research and development scientists in a
certain industry might be considered a network, and all the businesses in an
industry with much mutual contracting might also be a network.
Nohria and Eccles (1992) define network as "a fluid, flexible, and dense
pattern of working relationships that cut across various intra- and inter-
organizational boundaries" (p. 289). A network is defined by its ties or
relationships between members. A, B, and C may be members of the same
network but each one's network may not share the same other members. D may
be related to A and C but not to B. To the extent that these networks converged
on a common membership, we would approach a bounded group with common
membership identity, as in the older model, an organization or a community. It
seems, however, that the looser model works better for analyzing important
situations that do not conform to the assumptions of the older model. Each
person's network is built link-by-link, as opposed to becoming a member of a
group or organization and suddenly acquiring a new set of fellows, even if one
does not meet them until later.
Networks have norms, link-by-link. They are more effective than pure
markets where all are strangers. They can generate trust, link-by-link, and A,
who has found Q to be a dependable, trustworthy partner, may recommend Q to
her partner, B, who is not acquainted with Q. Based on B's trust of A and their
history of doing business together, B may trust the stranger Q with a contract
he would not otherwise offer to a stranger. Networks may grow through
sponsorship like this. In an integrated network Q will not let B down, out of
respect for the relationship with A (Uzzi, 1996). Thus network may be a more
refined tool of analysis than the older ones of group/organization and culture.
Learning Organizations
The three examples of learning organization models we shall review in this

section represent bolder and more comprehensive approaches than anything so
far examined. Compared to the previous ideas about new paradigm
organizations, these three learning organization models present considerably
more specifics and more depth to their blueprint. Most of these writers are
consultants to managers not detached scholars. Their aim is not just to outline a
structural model but to provide enough of the dynamics to understand the
renewal processes, and also (for at least two of the three) to address some of the
issues of how to initiate such a learning organization. All three models to be
reviewed here consider the essence of the new organization to revolve around a
new concept of work as learning, which profoundly affects the relationship of
workers to the job, to each other, and to the boss. According to Garvin (1993, p.
80-81),
A learning organization is an organization skilled at creating,

acquiring, and transferring knowledge, and at modifying its
behavior to reflect new knowledge and insights ...
Learning organizations are skilled at five main activities:

systematic problem solving,
experimentation with new approaches,
learning from their own experience and past history,
learning from the experiences and best practices of others,
transferring knowledge quickly and efficiently throughout the
organization.
We shall draw mainly on the learning organization model of Senge and his
colleagues at the MIT Center for Learning Organizations but, first, we will
acknowledge two other approaches. Nonaka and Takeuchi (1996) present a
notable attempt to understand how innovative product development takes place
in some highly effective Japanese companies. The model presented in this book
is based on detailed ethnography and meticulous theory-building. Unlike the
other two learning organization examples to be presented here, Nonaka and
Takeuchi were not change facilitators but rather observers and interpreters of
advanced and unfamiliar change processes, building theoretical explanations to
make them more explicit and hence accessible. They focus on both the nature of
knowledge, how it is used, and how it is produced.
By organizational knowledge creation we mean the capability

of a company as a whole to create new knowledge,
disseminate it throughout the organization, and embody it in
products, services, and systems. Organizational knowledge
creation is the key to the distinctive ways that Japanese
companies innovate. (Nonaka & Takeuchi 1996, p. 3)
They identify three distinct roles that are required to cooperate in industrial
knowledge creation: "[N]o one department or group of experts has the exclusive
responsibility for creating new knowledge. Front-line employees, middle
managers, and senior managers all playa part" (Nonaka & Takeuchi 1996, p.
59). They each have distinct roles, and there is dynamic interplay between
them. Senior managers articulate a broad corporate mission, stated quite
abstractly; front-line employees struggle to find ways to translate this into more
concrete terms, using their tacit and explicit expert knowledge that they have
worked hard to share with each other. "[A] corporate vision presented as an
equivocal strategy by a leader is organizationally constructed into knowledge
through interaction with the environment by the corporation's members"
(Nonaka & Takeuchi, p. 59).
Ashkenas, Ulrich, Jick, and Kerr (1993) present another version of a

learning organization with many practical guidelines, although neither the
word learning or knowledge appears much in their exposition. These writers
are consultants who have worked with several major corporations, especially
General Electric, on these issues of achieving greater productivity, and their
focus is very much on how to change old-style organizations in the new
direction. In their use of boundaries as a pivotal concept, these writers are
following a similar line of thought to the one that led us from comparing the
old, bureaucratic paradigm to successful examples of the new paradigm and
inferring key differences in structure, strategy, and process. (See Table 4.1).
They make the boundary concept the key to identifying a variety of practical
change projects while keeping the historical perspective.
They emphasize the importance of improved communications, pointing to

the G.E. town meetings, workouts, and cross-boundary teams. But they are
relentless in reminding us of the bottom line:
The primary vehicle for fostering innovations in RFS is a

crOSS-boundary team. Of course, the use of cross-boundary
teams in RFS, as in most Boundaryless Organizations, always
serves a specific goal; it is not just an excuse to get people
talking ... [T]he power of teams comes from their drive to
achieve goals. And in the process of achieving those goals,
they foster the continuing permeability of the organizational
boundaries. (Ashkenas, et. al., 1993, p. 339)
They are deeply concerned with the practical issues of implementing these
changes in the workplace. They take a very tough line on the question of how
one can balance or tradeoff short-term and long-term results and refuse to allow
any sacrifice of current performance while investing in capacity-building for

the future of the organization.
We now come to our third and main exhibit in the learning organization
literature. This is the work of the MIT Center for Organizational Learning, a
substantial network of consultants and researchers, united by their commitment
to a set of ideas and principles outlined in The Fifth Discipline (Senge, 1990)
and other publications (Senge, Roberts, Ross, Smith, and Kleiner, 1994;
Chawla & Renesch, 1995; Kim, 1993; Kofinan & Senge, 1995; Roth &
Kleiner, 1996; and Schein, 1993). This literature describes and interprets a
number of change-oriented field projects, it presents the theoretical models
being tested and developed; it describes the methodology used; and it presents a
vision of better organizations, which is central to all this work.
This then is the basic meaning of a 'learning organization'-

an organization that is continually expanding its capacity to
create its future. For such an organization, it is not enough
merely to survive. 'Survival learning' [adaptive learning] is
necessary. But for a learning organization, adaptive learning
must be joined by 'generative learning; learning that
enhances our capacity to create. (Senge, 1990a, p. 14)
For this group the learning organization is a body of practice for

transforming organizations towards the vision, defined in terms of five
disciplines, that include many tools and infrastructures. The five disciplines of
the learning organization (team learning, mental models, shared vision,
personal mastery, and systems thinking) combine various sets of skills and
understandings. Some of these disciplines superficially resemble some old
familiar tools of 00, but systems thinking (the fifth discipline) enjoys special
emphasis and makes a unique combination.
All five disciplines are essential but personal mastery has a special,
foundational role, because of the importance placed on intrinsic motivation in
this model. "[B]y focusing on performing for someone else's approval,
corporations create the very conditions that predestine them to mediocre
performance" (Senge, 1990a).
The learning organization for Senge extends the vision of the quality
movement. The learning organization fulfills the uncompleted part of
Deming's vision, especially his views of the capacity of the average worker for
deep intrinsic commitment to the highest quality standards, given the right
conditions (Senge, 1990b).
He also emphasizes the potential of the learning organization to introduce

significant change-not just a great volume of small, adaptive changes.
[I]ncreasing adaptiveness is only the first stage in moving

toward learning organizations. The impulse to learn, at its
heart, is an impulse to be generative, to expand our capability.
This is why leading corporations are focusing on generative
learning, which is about creating, as well as adaptive
learning, which is about coping (Senge, 1990b, p. 8 emphasis
added).
If the personal mastery discipline is important for reaching the intrinsic

motivation of the worker or manager, the systems thinking discipline is also
important to create remarkable results. "Generative learning requires seeing the
systems that control events" (Senge, 1990, p. 8). This emphasis on systems
thinking alone would be enough to differentiate this learning organization
approach from conventional work in OD from which it nonetheless borrows.
The kind of leadership required for such an organization is radically

different from the traditional model, of course. The new script calls for the
leader as designer; leader as teacher, coach, facilitator; leader as steward, but
no longer do we want leader as hero. Senge has provided a list of the
competencies needed by the new leader for a true learning organization (Senge,
1990a).
Building shared vision

Encouraging personal vision
Communicating and asking for support
Visioning as an ongoing process
Blending extrinsic and intrinsic visions
Distinguishing positive from negative visions
Surfacing and testing mental models
Seeing leaps of abstraction

Balancing inquiry and advocacy
Distinguishing espoused theory from theory in use
Recognizing and defusing defensive routines
Systems thinking
Seeing interrelations, not things; and processes, not snapshots
Moving beyond blame
Distinguishing detail complexity from dynamic complexity
Focusing on areas of high leverage
Avoiding symptomatic solutions
Lastly, we must emphasize that the learning organization vision of Senge

and associates is based on a respect for the humanity of the employees. The
only way that they will or can deliver the remarkable performance for the
organization, which characterizes the learning organization, is when they feel a
passionate commitment to their work and the organization. Such feelings are
not common. We associate them with traditional craft masters, professionals,
and elite military units in times of danger. Can people really feel a passionate
commitment to making portable phones or delivering packages or cleaning
buildings? Apparently they can feel this way if they have seen that those they
work with and work for respect them, as shown by many examples of listening,
attending to their concerns, recognizing them as individuals, treating them with
consistency and honesty. There is more, but that much is necessary. All this
implies certain beliefs about the nature of the organization and its
superordinate goals. We compared mechanistic and organistic models in a
previous section. While the superordinate goal of the traditional U.S.
corporation is to make profits for the shareholders and to give them increasing
stock values, the superordinate goal of the learning organization is to fulfill a
shared vision of all stakeholders, including employees, through creative work.
Unlike most other organizations, therefore, the learning organization is a living
community of workers, investors, and other stakeholders (Smith & Kleiner,
1995; de Gues).
Case Study: The AutoColEpsilon Program
AutoCo is one of the Big Three Detroit carmakers and Epsilon was its program
to create a new design and production tools for one of the high-end passenger
vehicles (Roth and Kleiner, 1996). These 300 engineers and related workers
were responsible for being ready to launch production in three years, meeting
many quality standards, and for staying within a budget. The program director
came in with a personal goal, not only to produce a great car design but also to
improve drastically the process of program development. Having worked on
several prior new car programs, he was familiar with the customary panic,
stress, and pandemonium in the last phase, just before launch, where everyone
would be working extreme amounts of overtime to catch up with many overdue
components. He was convinced that this was unnecessary and could be avoided
by better management. Exactly how, he did not know, but he was fortunate to
have as his deputy someone who had studied learning organizations, especially
the version being developed at MIT. A consulting team from MIT was engaged
to work with the Epsilon program. While the MIT team had many ideas and
tools to offer, they did not have a fully developed package and agreed to work
as partners with the Epsilon team, not as consultants in the traditional role.
A core learning team was formed with ten Epsilon managers and several
staff from MIT, which met every month or two for nine months. They
conducted joint assessment and diagnosis on team working issues, become
acquainted, and began learning the basic concepts and tools of organizational
learning. Members of this leadership group first engaged themselves in some
very serious learning and change, before asking the rest of the program staff to
become involved in changes. And when they did approach them, the senior
managers of Epsilon from the core learning team acted as teachers and coaches
for the other staff, with the help of the MIT consultants and other AutoCo
consultants, who had also been involved in the core learning team.
The training agenda and content was developed by the core learning team.
Several members interviewed other program staff about their greatest
challenges and strengths. The core team used that data to study to question
"Why are we always late?" Working together they created a systems map
(causal loops) of many factors, which led to discovering the point of leverage. It
was the fact that engineers having a problem with a component would not
report the problem until very late, which would cause other dependent elements
to be delayed, unnecessarily compounding the problem. Had the problem been
Evaluating Corporate Training: Models and Issnes 79
revealed earlier, others conld have helped to speed up the solution and prevent
the escalation effect.
Concealing problems was a consistent pattern. The are team discussed why
this happened, concluding that it was a combination of engineering culture,
which does not report any problem until you know the solution, and a company
culture in which reporting problems would be held against someone,
downgrading performance appraisals and reputation. The Epsilon core team
wanted to change that, and they realized that it would require establishing trust
among the program staff that bearing bad tidings would be safe. This required
some other supporting norms and beliefs. Of great importance would be the
belief that no one has all the answers, and that cooperation for the good of the
whole program was more important than some individuals being embarrassed
because their part of the work was not going well right now. In other words, the
Epsilon team was aiming to create a culture very different from the traditional
one at AutoCo, which all members had experienced over many years, a new
culture in which managers expected engineers to make their own decisions and
in which cross-functional collaboration was common. These insights were
developed first among the core learning team, over many meetings, including a
two-day off-site learning lab.
Several similar learning labs were held for a total of one-third of the entire
staff and many briefings and discussions were held with all of them. Much of
the training curriculum was concerned with communications and with
changing the norms of communication. The learning labs included various
experiential methods, including computer simulation of the entire program
development process. What made the biggest impact on employees, though, and
what they believed made the biggest contribution to changing the culture, was
seeing that senior Epsilon managers actually changed their behavior. They
became less authoritarian and more open to other viewpoints, and when an
engineer reported a serious problem delaying hislher work, they made sure that
they got help and made sure that the engineers did not suffer for their honesty.
The changed work process and culture was successful on the bottom line.
Launch of the new model was, as intended, a nonevent and actually took place
ahead of schedule. This was an unprecedented event in living memory. The
new model also came in well under budget; customer reaction and various
quality measures on the new vehicle were also well above previous levels.
In three years, a remarkably short time for serious change in organizational

culture, this program achieved impressive strides towards becoming a learning
organization. It was, however, just one unit within a large corporation that was
not yet ready to support, much less to emulate, such changes. A good strategy
for change within the parent company would have involved building on these
achievements and transferring the experience into other units. While corporate
executives could not question the bottom line results, they did not understand
the kind of changes in work organization that produced them. Epsilon leaders
and their outside advisors made the strategic error of believing that the results
would speak for themselves, and so they did not make major preemptive efforts
to overcome the communication gulf with the corporate managers overseeing
Epsilon. AutoCo had made tentative moves towards exploring learning
organization ideas, hosting repeated visiting speakers at their senior
management development seminars, but they never adopted an learning
organization type of vision for the future of the corporation or even adopted it
as a goal for Epsilon at the corporate level. The initiative came entirely from
the Epsilon leadership itself, with only passive approval from above. These are
critical issues of strategy that must be considered very seriously in learning
organization planning.
These issues can also be seen through the lens of organizational learning, for
they involve, in this case, a lost opportunity for the corporation to have learned
from the experience of the EpSilon project. The Epsilon organization itself
learned very effectively, up to a point. Not only did the individual members of
Epsilon learn new skills and attitudes (individual learning) but the whole
Epsilon project learned to operate according to new norms and designs
(organizational learning) involving close collaboration geared to optimizing
their collective performance as a team and producing a great new vehicle. This
is opposed to putting their component first and the whole product second, as the
old way. The concept of organizational learning is a subtle and difficult one,
but it is crucial to a true understanding of the issues involved. We turn now to a
discussion of this hard but rewarding concept, which lies at the heart of the
learning organization.
Organizational Learning
The learning organization is an important attempt to define a new paradigm of

management and represents about a decade of research within that framework,
building on more than a decade of earlier work by various other scholars

working within the organizational learning framework. In organizational
learning research there are two schools of work: one associated with March,
Simon, and their associates (1991); the other with Argyris and Schon (1996).
In terms of contributions to the learning organization field, there is no question
as to the magisterial status of Argyris and Schon. When they first published
their book in 1978, the idea was new, controversial, and repugnant to some
scholars. When they reissued a revised edition in 1996, they claimed that the
idea of organizational learning had meanwhile become widely accepted.
Organizational learning occurs, Argyris and Schon maintain, when

individuals within an organization experience a problematic situation and
inquire into it on the organization's behalf. They define a problematic situation
as one in which we are surprised to experience a mismatch between what we
expected and the actual results of action. That mismatch leads us to respond
through a process of thought and further action, which in tum leads us to
modify our images of the organization or our understandings of organizational
phenomena and to change our behavior so as to bring outcomes and
expectations more into line. Through this action learning, we end up changing
our organizational theory in use. So far this is all individual learning within
an organization, about an organization, but it is not organizational learning,
per se. To become organizational property, the learning that results from this
individual inquiry within an organization "must become embedded in the
images of the organization held in the minds of its members and/or in the
epistemological artifacts (the maps, memories, and programs) embedded in the
organizational environment" (Argyris & Schon, 1996, p.16).
The metaphor of organizational memory is sometimes used to refer to this

part of the process of organizational learning, the storing the new beliefs,
knowledge, or patterns for action, or adopting new routines (policies and
procedures). Unless individual learning is entered into organizational memory,
it remains the private property of certain individuals. This may sometimes
require official action and issuing revised regulations or operating guidelines;
sometimes, among informal communities of practice (Brown & Duguid, 1991),
it acquires informal acceptance by enough opinion leaders and rank and file
members for it to be disseminated as valid and valued knowledge. Each specific
workplace culture will have its own language for certifying such accepted
knowledge as straight dope or whatever. As our case-study of AutoCo Epsilon
illustrated, organizational learning that is accepted at one level of an
organization may or may not be passed on to other units or levels of the

organization. One unit's organizational learning may be another unit's rubbish
or heresy. Higher levels of the power structure can destroy the organizational
learning of lower levels as a matter of policy, or even as a matter of neglect or
indifference. In the case of a strong counterculture arising out of long conflict
and shared grievance, it is hard to destroy.
Within the definition of organizational learning just outlined, there is one

important distinction between two types of organizational learning, crucially
different. Many have seen the difference to be crucial and have given their own
labels to the two types: Argyris calls them single-loop and double-loop
learning; Senge uses adaptive and generative learning; members of the March
school refer to exploitive and exploratory; yet others refer to thinking inside the
box and outside the box, or framebreaking. The common point is that one kind
of learning (first order) operates within a framework of customary, accepted
assumptions, while another kind questions those assumptions from the vantage
point of higher order, shared views, in order to solve problems. The conditions
for each type of learning to occur are quite different; so are the consequences of
each.
When the environment changes fast and changes significantly, as it now

does, most organizations need a certain amount of second-order thinking in the
right places if they are to find the significantly new strategies demanded for
success in the new conditions. They need transformation through generative
learning, as Senge states. Simply piling up large amounts of first-order learning
will not accomplish this.
The view that an organization processes information from the

external environment in order to adapt to new circumstances
has a fundamental limitation. It does not explain innovation.
That involves creation of new knowledge and information
from the inside out, in order to redefine both problems and
solutions, and in the process, to re-create their environment
(Senge 1990, p. 56).
Lest anyone be tempted to suppose that an organization failing to use

second-order thinking might partly make up for it by an outstanding
performance in first-order learning, let them heed the warning of Argyris. This
long-time champion for the essential role of double-loop learning has argued as
follows: Because most organizations suffer from some dysfunctions in not

living up to their professed principles and standards, their leaders feel shame
and sensitivity over this. Intended or not, the impression is formed by
employees that any mention of such shortcomings is taboo; they are considered
undiscussable. Moreover, the further impression arises that this taboo itself
must not be mentioned. These taboos lie like fallen trees across the road to
organizational renewal. So long as everyone is afraid to mention the
unmentionable basic problems, their concern that all is not well may be
diverted into strenuous pursuit of all the single-loop improvements possible.
These are easy to produce, since there are well-defined paths across this land.
The problem is that they are limited and cannot possibly improve the real,
serious problems, which they are not allowed to mention or imply under
traditional, single-loop forms of discourse. Thus a focus on single-loop learning
alone becomes actually harmful, because it systematically diverts attention
away from where it is critically needed. The road away from the fallen trees
leads right over the edge of the cliff: Argyris teaches ways to get around this
barrier. They involve a combination of inquiry and advocacy, and require long,
devoted study and practice to achieve proficiency. They must be part of the
competencies of those who would be leaders of learning organizations, insists
Argyris, with the concurrence of Senge (Argyris and Schon, 1996).
When we compare organizations, we find substantial differences in their

approaches to organizational learning. Some import more knowledge from
outside the organization than others; some focus learning more on the product
or service they sell, while others focus their learning more on their processes
for generating that product or service; some use much formal documentation
and dissemination methods, while others count on informal means; some
emphasize both single- and double-loop learning (DiBella, Gould, & Nevis
1996). Some organizations are powerful breeding grounds for innovation
through learning, while others are deserts or wastelands. Their members may
be similar in their individual learning but the culture of each organization
encourages and permits very different behavior regarding the processes of
organizational learning. The concept of organizational learning steers us to
focus on the organizational processes for reviewing the suggestions of
individuals, encouraging them, enhancing them, combining them to create new
ideas, promoting them, generalizing them, adopting them, storing them,
remembering them, and so forth. These processes are enacted by individuals on
behalf of the organization. It is the value of the organizational learning concept
that it concentrates attention on all these factors.
Assessment and Evaluation
The work of training and development and the work of assessment and
evaluation of performance all need to be rethought in the light of the conceptual
models of organizational learning and the learning organization to the degree
that these represent the mental models and aspirations of our clients and
ourselves. In its simplest terms, the goals and objectives of training and
development will be quite different in a learning organization, and many key
assumptions that are made about the place of the training and development
effort within the organization must be reassessed. In its fullest terms, the entire
traditional scope and definition of training and development is called into
question in favor of a more complex and interesting one. We shall review a few
key points of the change: what, why, who, and how.
What is the goal of the training? What are the learning goals? These
questions determine what needs to be assessed. Two changes in perspective are
important. One is the shift from individual learning and performance as the
focus to collective or team learning and performance as well as the individual
kind in the learning organization. The other change involves the shift in
balance among process and outcome factors; this is a complex shift, not just a
shift from more A to more B. The learning organization is deeply concerned
with both process and outcome factors, at both individual and collective levels.
In the Epsilon case we saw how bottom line collective performance was crucial.
This included on-time launch, within budget, good quality reviews by
consumers. Individual assessments at this level were irrelevant. But during the
three-year run-up to the launch, various process measures were critical, since
they had a goal of changing the nature of collaborative work among these
design engineers. To achieve the desired ease of collaboration, give-and-take in
service of the collaborative effort, Epsilon needed to achieve greater trust
among engineers and managers. A key indicator of trust was the number of
problems reported by memo. At early stages of the design process a large
number was positive, resulting in lower numbers at the end of the project
timeline.
Why is a particular assessment being done? The key distinction is the classic
one between (1) formative assessment for the purpose of supporting learning
and improvement and (2) summative assessment for the purpose of helping to
make an executive decision, which will impact on those being assessed. It may
affect their compensation, promotion prospects, or the future of the project
itself, and hence their jobs. Summative assessment often is known as

evaluation.
How will the assessment be carried out? This will be related to how the
training was initially conceived and designed. The more discrete and traditional
the design, the more discrete and convenient the assessment can be. But in the
learning organization context, of course, things can be more complex.
Returning to the Epsilon example again, we recall that some of the training was
done in a discrete, off-site setting, so traditional evaluation could be done at the
conclusion of training, batch-style. Did individuals, for example, learn skills of
constructing causal loop diagrams? Did certain groups collectively learn to
conduct discussion according to certain guidelines? Evaluating behavior
change, the question is not just "Can they perform certain skills when asked?"
but "Do they perform those skills, when needed, in an effective way?"
If the assessment feedback indicates that effective learning is not taking

place, then closer scrutiny is needed to guide appropriate corrective action. This
scrutiny must include the learner, the trainer/s, the materials and structure of
the learning situation, and the context of performance. The context includes
both material factors and people; it includes not only prior, set-up, and
preparation factors, but also concurrent factors and follow-up factors. The role
of the training and development professional may be involved in far more than
the traditional designer-instructor roles, since direct training and coaching may
be provided by colleagues and, as in the Epsilon case, managers; and these
fellow-employees are also involved in the contextual roles that are very
important to successful performance. The role of the training and development
professional is to be very aware of the complexity of workplace learning, to
negotiate with all of the relevant parties what their roles will be, and to manage
the feedback of assessment data.
Who is the prime training and development client, and who are the
recognized stakeholders? Who gets what reports? Who is involved in
negotiating the contract with the training and development assessment staff?
Who is greatly influenced by why. When formative assessment to promote better
learning is the purpose (why?), then smart managers will agree to a hands-off
approach in favor of letting the work-group manage themselves. Without such
an understanding, there can be no learning organization.
In the culture of the learning organization, formative assessment is

considered essential, and there is acute awareness that the privacy of formative
assessment must be protected from executive contamination. As soon as the
worker-learner suspects that the boss may see the data on a certain
performance, there is a significant change in how slbe approaches it and an
equally significant change in how the data is reported. Without any conscious
intention to distort the data, workers, trainers, assessors, and supervisors all
behave differently when the performance review is private and strictly in-house,
restricted to those who are trusted to have the learner's best interests at heart,
compared to a review that may be seen by the bosses. Because we all want to
look good to the boss and want our buddies to look good, without even realizing
it, the subject preens for the camera and the photographer air-brushes the
picture that goes to the boss, smoothing away those unpleasant blemishes. In a
true learning organization there will be enough trust in certain managers that
they can be included without contaminating the complete openness of the
assessment, but that is rare. For most purposes, the guideline should be to be
very conservative in keeping managers out of formative assessment if you want
to foster real learning.
Usually the boss should get the summative or bottom-line reports. Usually
and mainly process performance and learning data should be reserved for the
individuals and work-teams themselves. Completely excluding the boss,
however, can also be a bad mistake, as the Epsilon case suggests. The learning
organization requires that bosses and workers should all share the same basic
mental models of collaboration and process. The problem at Epsilon was not
that their corporate boss needed to see process data, but that he interpreted it to
mean the opposite of what it meant within Epsilon. To them, the high number
of problems reported meant that trust levels were improved so that the problems
could be solved sooner with lower costs and delays, with a huge improvement
in cooperation and total system effectiveness, a triumph. To the boss, thinking
in terms of the traditional company culture, where most problems are
concealed, the same data meant that Epsilon was totally out of control, a
disaster. This learning organization in the Epsilon case stopped abruptly at the
program boundary; hence dialogue across this boundary involved a major
language problem. Assessment was certainly one of the conversations that
broke down because of the lack of shared meanings at this point. Whether
identified assessment professionals or versatile managers are in charge, the
challenge is to decide, jointly among the players, what is to be assessed, why,
how, and by whom.
Conclusion
Today's organizations face unprecedented competition, at home and abroad,

forcing them to offer unprecedented levels of service at whatever stage or sector
of the productive process they find themselves. They face "conditions of
unprecedented knowledge intensity, uncertainty, ambiguity, and risk" (Nohria
and Eccles 1992, p. 290). Service is a major part of every business, and
knowledge is a major component of good service. Hence the ability to manage
knowledge, its collection, production, sharing, transformation, and utilization,
has become a major survival factor of every organization, with major effects on
how it must be organized and managed.
There is little question that the 'intangibles' of databases,

personal know-how, technological understanding,
communication networks, market knowledge, brand
acceptance, distribution capabilities, organizational flexibility
and effective motivation are the true assets of most companies
and the primary sources of their future income streams
(Quinn, 1992, p. 243).
In today's global economy, companies face new success factors, above all the
need for fast and flexible change on the part of organizations. The challenge is
the changed environment; the solution could be the learning organization. That
scenario creates entirely new challenges for the field of training and
development. Can this field respond adequately?
To succeed, the training and development field will have to be capable of re-
inventing itself-just like the organizations it wishes to serve. It must fully
understand these organizations as its customers, especially their cultures and
aspirations. For the most farsighted organizations the aspiration is to become
more like a true learning organization. If training and development work is not
to be downgraded and commoditized in the current cost-cutting frenzy, it must
win acceptance as a strategic partner of top management, based on its ability to
provide all kinds of needed training (broadly construed) and supports for
managers and key employees as they learn their changing roles in the process
of organizational learning. Training and support will take many forms, new to
members of this profession, as new forms continue to be invented. Many will be
embedded in work systems in new ways. The profeSSion must invent them if it
is to succeed, and these roles will sometimes require new kinds of

collaboration.
References
Appelbaum, E., & Batt R., (1994). The new American workplace: Transforming work
systems in the U.S. Ithaca, NY: ILR Press.
Argyris, C., & Schon D. A, (1996). Organizational learning. (vols. 1-2). Reading, MA:
Addison Wesley.
Argyris, C., & Schon D. A, (1978). Organizational learning: A theOl)' of action

perspective. Reading, MA: Addison Wesley.
Ashkenas, R., Ulrich, D., Jick, T., & Kerr, S. (1993). The boundaryless organization.
San Francisco, CA: Jossey Bass.
Bidwell, C. (1986). Complex organizations: A critical essay (3rd ed.). New York:
Random House.
Blau, P., & Meyer, M. W. (1971). Bureaucracy in modern society. New York, NY:
Random House.
Brown, J. S., & Duguid, P. (1991). Organizational learning and communities-of-

practice: Toward a unified view of working, learning, and innovation. Organization
Science. 2(1), 40-57.
Burns, T., & Stalker, J. M. (1966). The management of innovation. London: Tavistock.
Chawla, S., & Renesch, 1. (eds.) (1995). Learning organizations: Developing cultures
for tomorrow's workplace. Portland, OR: Productivity Press.
DiBella, A. J., Nevis, E. C., & Gould, J. M. (1996). Understanding organizational

learning capability. Journal of Management Studies, 33(3), 361-379.
Garvin, D. A, (1993). Building a learning organization. Harvard Business Review

71(40),78-92.
Kanter, R. M. (1983). The change masters. New York, NY: Simon & Schuster.
Kanter, R. M. (1989). When giants learn to dance. New York, NY: Simon & Schuster.
Kim, D. H. (1993). The link between individual and organizational learning. Sloan
Management Review, 125-136.
Kofman, F., & Senge, P. (1995). Communities of commitment: The heart of learning
organization. In Chawla, S., & Renesch, 1. (Eds.). Learning organizations:
Developing cultures for tomorrow's workplace. (pp. 17-51). Portland, OR:
Productivity Press.
Lawrence, P. R, & Lorsch, J. W. (1967). Organization and environment: Managing

differentiation and integration. Boston, MA: Graduate School of Business
Administration, Harvard University.
Levitt, B., & March, J. G. (1988). Organizational learning. Annual Review of

Sociology. 14,319-340.
Mintzberg, H. (1979). The structuring of organizations. Englewood Cliffs, NJ: Prentice

Hall.
Nohria, N., & Eccles, R G. (eds.) (1992). Networks and organizations: Structure. form.
and action. Boston, MA: Harvard Business School Press.
Nonaka, I., & Takeuchi, H. (1996). The knowledge creating company. New York:
Oxford University Press ..
Ouchi, W. G. (1981). Theory Z: How American business can meet the Japanese
challenge. Reading, MA: Addison Wesley.
Pascale, R T., & Athos, A. G. (1981). The art of Japanese management. New York:
Simon and Schuster.
Quinn, J. B. (1992). The intelligent enterprise: A knowledge and service based

paradigm for industry. New York: Free Press.
Roth, G., & Kleiner, A. (1996). The learning initiative at the AutoCo Epsilon Program
1991-94. Cambridge, MA: Center for Organizational Learning, Massachusetts
Institute of Technology.
Schein, E. H. (1993). Organizational culture and leadership (2nd ed.). San Francisco:
Jossey-Bass.
Senge, P. M. (1990a). The fifth discipline. New York: Doubleday.

Senge, P. M. (1990b). The leader's new work: Building learning organizations. Sloan
Management Review.
Senge, P. M. (1992). The real message of the quality movement: Building learning
organizations. Journal for Quality Participation, March.
Senge, P. M., Roberts, C., Ross, R., Smith, B., & Kleiner, A. (1994). The fifth
discipline fieldbook. New York: Doubleday.
Simon, H. A. (1991). Bounded rationality and organizational learning. Organization

Science. 2(1), 125-136.
Smith, B., & Kleiner A. (1995). Is there more to corporations than maximizing profits?
The Systems Thinker. 6(3).
Uzzi, B. (1996). The sources and consequences of embeddedness for the economic
performance of organizations: The Network effect. American Sociological Review,
674-698.
Weber, M. (1947). The theory of social and economic organization. (Trans. A.M.
Henderson and T. Parsons.) Glencoe,IL: Free Press. (Original work published.)
The Web site for the MIT Center for Organizational Learning is now available via the
MIT homepage, providing a list of working papers, including some in full text.
http://learning.mit.edul
About the Author
Barry Sugarman is Professor of Management at Lesley College where he

teaches strategy to mid-career adults in management. His current research
focuses on organizational learning and the learning organization. In 1996 he
was a Research Associate at the MIT Center for Organizational Learning. At
Lesley he has developed and teaches an ethics course on the Internet. Sugarman
grew up in London, England. He earned a Ph.D. in sociology and anthropology
from Princeton University. Mter helping to establish a research unit in moral
development at Oxford University, England, he moved to the United States in
the 1970s as one of the fIrst professionals to work with the new therapeutic
communities treating drug addicts. He later worked in the community mental
health system in planning, information systems, and program evaluation. He
has been a consultant to many organizations. He is the author of four books and
numerous articles.
His web site (http://aristotle.lesley.edulfaculty/sugarman/index.htm (not "html")

contains his MIT report "Towards Closer Collaboration between
Organizational Theory and the Learning Organization."
SECTION II-MODELS OF
EVALUATION
Models for evaluating training have their intellectual roots in the discipline of
measurement and evaluation in the field of psychology. Evaluation practice has
also been influenced by pedagogical methods related to various theories of
learning and instruction in the discipline of educational psychology.
Particularly influential in the field of corporate training evaluation has been the
work of Donald Kirkpatrick. The four evaluation steps he proposed in a series
of articles in the Journal of the American Society for Training and
Development in 1959, came to be known as Kirkpatrick's four levels. This
section appropriately begins with a contribution by Kirkpatrick that describes
his work. Over the years, Kirkpatrick's "levels" have become the de facto
model for describing the evaluation of corporate training. At the same time it
has provided direction, it has also stimulated thinking, and spawned a healthy
debate that has led to the elaboration and expansion of his original work.
The chapter by Jack Phillips, one of the leading advocates and practitioners
of evaluating the return on training investments, represents both an elaboration
and expansion of the four levels. He proposes return on investment (ROI) as a
fifth level in the paradigm. Adding this level has, in turn, sparked another
debate. Phillips argues that return on investment can be calculated with an
acceptable degree of confidence and at a reasonable cost. His chapter provides
guidelines for how to position and carry out this type of evaluation.
Robert Brinkerhoff s perspective represents a reframing of the question

What impact did the training have? He suggests that the value of training is not
unidimentional; neither is it something that can be assessed without taking into
consideration the complexities of the business environment in which it is
embedded. The concern for Brinkerhoff is not whether the training worked but
94 Models of Evaluation
whether it fit the performance needs and made an effective and efficient
contribution to achieving the performance goals of the organization. One
cannot address this concern without examining the logic underlying the design
of the intervention and the value chain underlying the implementation process.
Conspicuously missing from the spotlight in the debate around training

evaluation is the function and role of formative evaluation. For many
practitioners, like the author of the next chapter, Wilbur Parrott, formative
evaluation is a critical and sometimes missing link in the value chain that leads
from performance need to business impact. Parrott discusses its significance
and provides practical guidelines for embedding iterative design and prototype
review processes within the standard design/development cycle.
One of the methodologies used to determine training needs is competency

assessment. Competency models are often used to describe the individual and
collective capabilities of the workforce, juxtaposed against the set of
knowledge, skills, and behaviors required to sustain organizational
performance and growth. The chapter by Susan Ennis provides insightful
perspectives that will help practitioners sort through the complexities of this
methodology, and enable them to better match models and methods to specific
business needs. Ennis discusses the costs and benefits of competency
assessment programs and provides professional guidelines around the collection
and use of assessment data, with particular attention to the popular 360 degree
approach.
This section ends with a discussion by Brown and Elfenbein, which provides
us with a model for practitioners to assimilate known knowledge into their
practice. This academic chapter builds upon the work of psychologist David
Kolb in experiential learning. The model attempts to integrate the introduction
of new knowledge into the intelligence of practice.
The chapters in this section provide the reader with historical background
on the field of training evaluation, new perspectives on established practice,
and fresh ideas to stimulate the growth and advancement of the field. It also
firmly places the practice of training evaluation within the context of human
resource development. Training and the evaluation of its effectiveness cannot
be isolated from the business goals of the organization and the personal
competencies required of the workforce to meet them.
5 THE FOUR LEVELS OF
EVALUATION
Donald L. Ki rkpatrick
Introduction
The reason I developed this four-level model was to clarify the elusive term
evaluation. Some training and development professionals believe that
evaluation means measuring changes in behavior that occur as a result of
training programs. Others maintain that the only real evaluation lies in
determining what final results occurred because of training programs. Still
others think only in terms of the comment sheets that participants complete at
the end of a program. Others are concerned with the learning that takes place
in the classroom, as measured by increased knowledge, improved skills, and
changes in attitude. And they are all right-and yet wrong, in that they fail to
recognize that all four approaches are parts of what we mean by evaluating.
These four levels are all important, and they should be understood by all
professionals in the fields of education, training, and development, whether
they plan, coordinate, or teach; whether the content of the program is technical
or managerial; whether the participants are or are not managers; and whether
the programs are conducted in education, business, or industry. In some cases,
especially in academic institutions, there is no attempt to change behavior. The
end result is simply to increase knowledge, improve skills, and change
attitudes. In these cases, only the first two levels apply. But if the purpose of the
training is to get better results by changing behavior, then all four levels apply.
However, in human resource development circles, these four levels are
96 The Four Levels of Evaluation
recognized widely, often cited, and often used as a basis for research and
articles dealing with techniques for applying one or more of the levels.
Evaluating: Part of a Ten-Step Process
The reason for evaluating is to determine the effectiveness of a training

program. When the evaluation is done, we can hope that the results are positive
and gratifying, both for those responsible for the program and for upper-level
managers who will make decisions based on their evaluation of the program.
Therefore, much thought and planning need to be given to the program itself to
make sure that it is effective. This chapter is devoted to suggestions for
planning and implementing the program to ensure its effectiveness.
Each of the following factors should be carefully considered when planning

and implementing an effective training program:
l. Determining needs
2. Setting objectives
3. Determining subject content
4. Selecting participants
5. Determining the best schedule
6. Selecting appropriate facilities
7. Selecting appropriate instructors
8. Selecting and preparing audiovisual aids
9. Coordinating the program
10. Evaluating the program
Reasons for Evaluating
There is an old saying among training directors: when there are cutbacks in an
organization, training people are the first to go. Of course, this isn't always
true. However, whenever downsizing occurs, top management looks for people
and departments that can be eliminated with the fewest negative results. Early
in their decision, they look at such overhead departments as Human Resources.
Human Resources typically includes people responsible for employment, salary
administration, benefits, labor relations (if there is a union), and training. In
some organizations, top management feels that all these functions except
training are necessary. From this perspective, training is optional, and its value
to the organization depends on top executives' view of its effectiveness. In other
words, trainers must justify their existence. If they don't and downsizing
occurs, they may be terminated, and the training function will be relegated to
the Human Resources manager, who already has many other hats to wear.
The second reason for evaluating is to determine whether you should

continue to offer a program. Some programs are offered as a pilot in hopes that
they will bring about the results desired. These programs should be evaluated to
determine whether they should be continued. If the cost outweighs the benefits,
the program should be discontinued or modified.
The most common reason for evaluation is to determine the effectiveness of

a program and ways in which it can be improved. Usually, the decision to
continue it has already been made. The question then is, How can it be
improved? In looking for the answer to this question, you should consider these
eight factors:
1. To what extent does the subject content meet the needs of those attending?
2. Is the leader the one best qualified to teach?
3. Does the leader use the most effective methods for maintaining interest
and teaching the desired attitudes, knowledge, and skills?
4. Are the facilities satisfactory?
5. Is the schedule appropriate for the participants?
6. Are the aids effective in improving communication and maintaining
interest?
7. Was the coordination of the program satisfactory?
8. What else can be done to improve the program?
A careful analysis of the answers to these questions can identify ways and
means of improving future offerings of the program.
Most companies use reaction sheets of one kind or another. Most are
thinking about doing more. They have not gone any further for one or more of
the following reasons:
• They don't give it a lot of importance or urgency.

• They don't know what to do or how to do it.
• There is no pressure from higher management to do more.
• They feel secure in their job and see no need to do more.
• They have too many other things that are more important or that they
prefer to do.
In most organizations, both large and small, there is little pressure from top
management to prove that the benefits of training outweigh the cost.
There are three reasons for evaluating training programs. The most common
reason is that evaluation can tell us how to improve future programs. The
second reason is to determine whether a program should be continued or
dropped. The third reason is to justify the existence of the training department.
By demonstrating to top management that training has tangible, positive
results, trainers will find that their job is secure, even if and when downsizing
occurs. If top-level managers need to cut back, their impression of the need for
a training department will determine whether they say, ''That's one department
we need to keep" or "That's a department we can eliminate without hurting
us." And their impression can be greatly influenced by trainers who evaluate at
all levels and communicate the results to them.
The Four Levels: Overview
The four levels represent a sequence of ways to evaluate programs. Each level
is important. As you move from one level to the next, the process becomes more
difficult and time-consuming, but it also provides more valuable information.
None of the levels should be bypassed simply to get to the level that the trainer
considers the most important. The four levels are:
Levell-Reaction
Level 2-Learning
Level 3-Behavior
Level4--Results
Level1-Reaction
As the word reaction implies, evaluation on this level measures how those who
participate in the program react to it. I call it a measure of customer
satisfaction. For many years, I conducted seminars, institutes, and conferences
at the University of Wisconsin Management Institute. Organizations paid a fee
to send their people to these public programs. It is obvious that the reaction of
participants was a measure of customer satisfaction. It is also obvious that
reaction had to be favorable if we were to stay in business and attract new
customers as well as get present customers to return to future programs.
It isn't quite so obvious that reaction to in-house programs is also a measure

of customer satisfaction. In many in-house programs, participants are required
to attend whether they want to or not. However, they still are customers even if
they don't pay, and their reactions can make or break a training program. What
they say to their bosses often gets to higher-level managers, who make
decisions about the future of training programs. So, positive reactions are just
as important for trainers who run in-house programs as they are for those who
offer public programs.
It is important not only to get a reaction but to get a positive reaction. As

just described, the future of a program depends on positive reaction. In
addition, if participants do not react favorably, they probably will not be
motivated to learn. Positive reaction may not ensure learning, but negative
reaction almost certainly reduces the possibility of its occurring.
Evaluating reaction is the same thing as measuring customer satisfaction. If

training is going to be effective, it is important that trainees react favorably to
it. Otherwise, they will not be motivated to learn. Also, they will tell others of
their reactions, and decisions to reduce or eliminate the program may be based
on what they say. Some trainers call the forms that are used for the evaluation
of reaction happiness sheets. Although they say this in a critical or even cynical
way, they are correct. These forms really are happiness sheets. But they are not
worthless. They help us to determine how effective the program is and learn
how it can be improved.
Measuring reaction is important for several reasons. First, it gives us

valuable feedback that helps us to evaluate the program as well as comments
and suggestions for improving future programs. Second, it tells trainees that the
trainers are there to help them do their job better and that they need feedback to
determine how effective they are. If we do not ask for reaction, we tell trainees
that we know what they want and need and that we can judge the effectiveness
of the program without getting feedback from them. Third, reaction sheets can
provide quantitative information that you can give to managers and others
concerned about the program. Finally, reaction sheets can provide trainers with
quantitative information that can be used to establish standards of performance
for future programs.
Evaluating reaction is not only important but also easy to do and do

effectively. Most trainers use reaction sheets. I have seen dozens of forms and
various ways of using them. Some are effective, and some are not. Here are
some guidelines that will help trainers to get maximum benefit from reaction
sheets:
Guidelines for Evaluating Reaction
1. Determine what you want to find out.

2. Design a form that will quantify reactions.
3. Encourage written comments and suggestions.
4. Get tOO percent immediate response.
5. Get honest responses.
6. Develop acceptable standards.
7. Measure reactions against standards and take appropriate action.
8. Communicate reactions as appropriate.
Level2-Learning
Learning can be defined as the extent to which participants change attitudes,

improve knowledge, andlor increase skill as a result of attending the program.
Those are the three things that a training program can accomplish.
Programs dealing with topics like diversity in the workforce aim primarily at
changing attitudes. Technical programs aim at improving skills. Programs on
topics like leadership, motivation, and communication can aim at all three
objectives. To evaluate learning, the specific objectives must be determined.
Some trainers say that no learning has taken place unless a change in
behavior occurs. In the four levels described in this book, learning has taken
place when one or more of the following occur. Attitudes are changed.
Knowledge is increased. Skill is improved. Change in behavior is the next
level.
There are three things that instructors in a training program can teach:
knowledge, skills, and attitudes. Measuring learning, therefore, means
determining one or more of the following:
What knowledge was learned?

What skills were developed or improved?
What attitudes were changed?
It is important to measure learning because no change in behavior can be

expected unless one or more of these learning objectives have been
accomplished. Moreover, if we were to measure behavior change (level 3) and
not learning and if we found no change in behavior, the likely conclusion is
that no learning took place. This conclusion may be very erroneous. The reason
no change in behavior was observed may be that the climate was preventing or
discouraging. In these situations, learning may have taken place, and the
learner may even have been anxious to change his or her behavior. But because
the boss either prevented or discouraged the trainee from applying the new
learning on the job, no change in behavior took place.
The measurement of learning is more difficult and time-consuming than the

measurement of reaction. These guidelines will be helpful:
Guidelines for Evaluating Learning
1. Use a control group if practical.

2. Evaluate knowledge, skills, andlor attitudes both before and after the
program. Use a paper-and-pencil test to measure knowledge and attitudes,
and use a perfonnance test to measure skills.
3. Get a 100 percent response.
4. Use the results of the evaluation to take appropriate action.
Level3-Behavior
Behavior can be defined as the extent to which change in behavior has occurred
because the participants attended the training program. Some trainers want to
bypass levels 1 and 2-reaction and learning-in order to measure behavior.
This is a serious mistake. For example, suppose that no change in behavior is
discovered. The obvious conclusion is that the program was ineffective and that
it should be discontinued. This conclusion mayor may not be accurate.
Reaction may have been favorable, and the learning objectives may have been
accomplished, but the level 3 or 4 conditions may not have been present.
For change to occur, four conditions are necessary:
1. The person must have a desire to change.

2. The person must know what to do and how to do it.
3. The person must work in the right climate.
4. The person must be rewarded for changing.
The training program can accomplish the first two requirements by creating
a positive attitude toward the desired change and by teaching the necessary
knowledge and skills. The third condition, right climate, refers to the
participant's immediate supervisor. Five different kinds of climate can be
described:
1. Preventing: The boss forbids the participant from doing what he or she
has been taught to do in the training program. The boss may be influenced by
the organizational culture established by top management. Or the boss's
leadership style may conflict with what was taught.
2. Discouraging: The boss doesn't say, "You can't do it," but he or she
makes it clear that the participant should not change behavior because it would
make the boss unhappy. Or the boss doesn't model the behavior taught in the
program, and this negative example discourages the subordinate from
changing.
3. Neutral: The boss ignores the fact that the participant has attended a
training program. It is business as usual. If the subordinate wants to change,
the boss has no objection as long as the job gets done. If negative results occur
because behavior has changed, then the boss may tum to a discouraging or even
preventing climate.
4. Encouraging: The boss encourages the participant to learn and apply his
or her learning on the job. Ideally, the boss discussed the program with the
subordinate beforehand and stated that the two would discuss application as
soon as the program was over. The boss basically says, "I am interested in
knowing what you learned and how I can help you transfer the learning to the
job."
5. Requiring: The boss knows what the subordinate learns and makes sure
that the learning transfers to the job. In some cases, a learning contract is
prepared that states what the subordinate agrees to do. This contract can be
prepared at the end of the training session, and a copy can be given to the boss.
The boss sees to it that the contract is implemented. Malcolm Knowles's book
Using Learning Contracts (1986) describes this process.
The fourth condition, rewards, can be intrinsic (from within), extrinsic

(from without), or both. Intrinsic rewards include the feelings of satisfaction,
pride, and achievement that can occur when change in behavior has positive
results. Extrinsic rewards include praise from the boss, recognition by others,
and monetary rewards, such as merit pay increases and bonuses.
It becomes obvious that there is little or no chance that training will transfer
to job behavior if the climate is preventing or discouraging. If the climate is
neutral, change in behavior will depend on the other three conditions just
described. If the climate is encouraging or requiring, then the amount of
change that occurs depends on the fIrst and second conditions.
As stated earlier, it is important to evaluate both reaction and learning in

case no change in behavior occurs. Then it can be determined whether the fact
that there was no change was the result of an effective training program or of
the wrong job climate and lack of rewards.
It is important for trainers to know the type of climate that participants will
face when they return from the training program. It is also important for them
to do everything that they can to see to it that the climate is neutral or better.
Otherwise there is little or no chance that the program will accomplish the
behavior and results objectives, because participants will not even try to use
what they have learned. Not only will no change occur, but those who attended
the program will be frustrated with the boss, the training program, or both for
teaching them things that they can't apply.
One way to create a positive job climate is to involve bosses in the

development of the program-asking bosses to help to determine the needs of
subordinates. Such involvement helps to ensure that a program teaches
practical concepts, principles, and techniques. Another approach is to present
the training program, or at least a condensed version of it, to the bosses before
the supervisors are trained.
A number of years ago, I was asked by Dave Harris, personnel manager, to

present an eighteen-hour training program to 240 supervisors at A. O. Smith
Corporation in Milwaukee. I asked Dave if he could arrange for me to present a
condensed, three-to six-hour version to the company's top management. He
arranged for the condensed version to be offered at the Milwaukee Athletic
Club. After the six-hour program, the eight upper-level managers were asked
for their opinions and suggestions. They not only liked the program but told us
to present the entire program first to the thirty-five general foremen and
superintendents who were the bosses for 240 supervisors. We did what they
suggested. We asked these bosses for their comments and encouraged them to
provide an encouraging climate when the supervisors had completed the
program. I am not sure to what extent this increased change in behavior over
the level that we would have seen if top managers had not attended or even
known the content of the program, but I am confident that it made a big
difference. We told the supervisors that their bosses had already attended the
program. This increased their motivation to learn and their desire to apply their
learning on the job.
What happens when trainees leave the classroom and return to their jobs?
How much transfer of knowledge, skills, and attitudes occurs? This is what
level 3 attempts to evaluate. In other words, what change in job behavior
occurred because people attended a training program?
It is obvious that this question is more complicated and difficult to answer

than evaluating at the ftrst two levels. First, trainees cannot change their
behavior until they have an opportunity to do so. For example, if you, the reader
of this chapter, decide to use some of the principles and techniques that I have
described, you must wait until you have a training program to evaluate.
Likewise, if the training program is designed to teach a person how to conduct
an effective performance appraisal interview, the trainee cannot apply the
learning until an interview is held.
Second, it is impossible to predict when a change in behavior will occur.

Even if a trainee has an opportunity to apply the learning, he or she may not do
it immediately. In fact, change in behavior may occur at any time after the ftrst
opportunity, or it may never occur.
Third, the trainee may apply the learning to the job and come to one of the
following conclusions: "I like what happened, and I plan to continue to use the
new behavior." "I don't like what happened, and I will go back to myoid
behavior." "I like what happened, but the boss and/or time restraints prevent
me from continuing it." We all hope the rewards for changing behavior will
cause the trainee to come to the ftrst of these conclusion. It is important,
therefore, to provide help, encouragement, and rewards when the trainee
returns to the job from the training class. One type of reward is intrinsic. This
term refers to the inward feelings of satisfaction, pride, achievement, and
happiness that can occur when the new behavior is used. Extrinsic rewards are
also important. They include praise, increased freedom and empowerment,
merit pay increases, and other forms of recognition that come as the result of
the change in behavior.
In regard to reaction and learning, the evaluation can and should take place
immediately. When you evaluate change in behavior, you have to make some
important decisions: when to evaluate, how often to evaluate, and how to
evaluate. This makes it more time-consuming and difficult to do than levels I
and 2. Here are some guidelines to follow when evaluating at level 3.
Guidelines for Evaluating Behavior

2. Allow time for behavior change to take place.
3. Evaluate both before and after the program if practical.
4. Survey and/or interview one or more of the following: trainees, their
immediate supervisors, their subordinates, and others who often observe
their behavior.
5. Get 100 percent response on sampling.
6. Repeat the evaluation at appropriate times.
7. Consider cost versus benefits.
Level 4-Results
Results can be defined as the final results that occurred because the participants
attended the program. The final results can include increased production,
improved quality, decreased costs, reduced frequency and/or severity of
accidents, increased sales, reduced turnover, and higher profits and return on
investment. It is important to recognize that results like these are the reason for
having some training programs. Therefore, the final objectives of the training
program need to be stated in these terms.
Some programs have these in mind on what we have to call a far-out basis.
For example, one major objective of the popular program on diversity in the
workforce is to change the attitudes of supervisors and managers toward
minorities in their departments. We want supervisors to treat all people fairly,
show no discrimination, and so on. These are not tangible results that can be
measured in terms of dollars and cents. But it is hoped that tangible results will
follow. Likewise, it is difficult if not impossible to measure final results for
programs on such topics as leadership, communication, motivation, time
management, empowerment, decision making, or managing change. We can
state and evaluate desired behaviors, but the final results have to be measured
in terms of improved morale or other nonfinancial terms. It is hoped that such
things as higher morale or improved quality of work life will result in the
tangible results just described.
Now comes the most important and difficult task of all-determining what
final results occurred because of attendance and participation in a training
program. Trainers ask question like these:
How much did quality improve because of the training program on total
quality improvement that we have presented to all supervisors and
managers, and how much has it contributed to profits?
How much did productivity increase because we conducted a program on
diversity in the workforce for all supervisor and managers?
What reduction did we get in turnover and scrap rate because we taught our
foremen and supervisors to orient and train new employees?
How much has management by walking around improved the quality of
work life?
What has been the result of all our programs on interpersonal
communication and human relations?
How much has productivity increased and how much have costs reduced
because we have trained our employees to work in self-directed work
teams?
What tangible benefits have we received for all the money we have spent on
programs on leadership, time management, and decision making?
How much have sales increased as the result of teaching our salespeople
such things as market research, overcoming objections, and closing a
sale?
What is the return on investment for all the money we spend on training?
All these and many more questions usually remain unanswered for two
reasons. First, trainers don't know how to measure the results and compare
them with the cost of the program. Second, even if they do know how, the
findings probably provide evidence at best and not clear proof that the positive
results come from the training program. There are expectations, of course.
Increases in sales may be found to be directly related to a sales training
program, and a program aimed specifically at reducing accidents or improving

quality can be evaluated to show direct results from the training program.
A number of years ago, Jack Jenness, a friend of mine at Consolidated

Edison in New York, was asked by his boss to show results in terms of dollars
and cents from an expensive program on leadership that they were giving to
middle- and upper-level managers. The company had hired consultants from
St. Louis at a very high fee to conduct the program. I told Jack, "There is no
way it can be done!" He said, ''That's what I told my boss." Jack then asked me
to come out to his organization to do two things: Conduct a workshop with
their trainers on the four levels of evaluation, and tell his boss that it couldn't
be done. I did the first. I didn't get a chance to do the second because the boss
had either been convinced and didn't see the need, or he didn't have the time or
desire to hear what I had to say.
This example is unusual at this point in history, but it might not be too
unusual in the future. Whenever I get together with trainers, I ask, "How much
pressure are you getting from top management to prove the value of your
training programs in results, such as dollars and cents?" Only a few times have
they said they were feeling such pressure. But many trainers have told me that
the day isn't too far off when they expect to be asked to provide such proof.
When we look at the objectives of training programs, we find that almost all
aim at accomplishing some worthy result. Often it is improved quality,
productivity, or safety. In other programs, the objective is improved morale or
better teamwork, which, it is hoped, will lead to better quality, productivity,
safety, and profits. Therefore, trainers look at the desired end result and say to
themselves and others, "What behavior on the part of supervisors and managers
will achieve these results?" Then they decide what knowledge, skills, and
attitudes supervisors need in order to behave in that way. Finally, they
determine the training needs and proceed. In so doing, they hope (and
sometimes pray) that the trainees will like the program; learn the knowledge,
skills, and attitudes taught; and transfer them to the job. The first three levels of
evaluation attempt to determine the degree to which these three things have
been accomplished.
So now we have arrived at the final level, What final results were
accomplished because of the training program? Here are some guidelines that
will be helpful?
Guidelines for Evaluating Results

2. Allow time for results to be achieved.
3. Measure both before and after the program if practical.
4. Repeat the measurement at appropriate times.
5. Consider cost versus benefits.
6. Be satisfied with evidence if proof is not possible.
Evaluating results, level 4, provides the greatest challenge to trammg
professionals. After all, that is why we train, and we ought to be able to show
tangible results that more than pay for the cost of training. In some cases, such
evaluation can be done and quite easily. Programs that aim at increasing sales,
reducing accidents, reducing turnover, and reducing scrap rates can often be
evaluated in terms of results. And the cost of the program isn't too difficult to
determine. A comparison can readily show that training pays off.
Implementing the Four Levels
Everybody talks about it, but nobody does anything about it. When Mark Twain
said this, he was talking about the weather. It also applies to evaluation-well
almost. My contacts with training professionals indicate that most use some
form of reaction, "smile," or "happiness" sheets. Some of these are, in my
opinion, very good and provide helpful information that measures customer
satisfaction. Others do not. And many trainers ignore critical comments by
saying, "Well, you can't please everybody" or "I know who said that, and I am
not surprised."
Where do I start? What do I do first? These are typical questions from

trainers who are interested in evaluation, and have done little of it.
My suggestion is to start at levelland proceed through the other levels as

time and opportunity allow. Some trainers are anxious to get to level 3 or 4
right away because they think the first two aren't as important. Don't do it.
Suppose, for example, that you evaluate at level 3 and discover that little or no
change in behavior has occnrred. What conclusions can you draw? The first
conclusion is probably that the training program was no good, and we had
better discontinue it or at least modify it. This conclusion may be entirely
wrong. The reason for no change in job behavior may be that the climate
prevents it. Supervisors may have gone back to the job with the necessary
knowledge, skills, and attitudes, but the boss wouldn't allow change to take
place. Therefore, it is important to evaluate at level 2 so you can determine
whether the reason for no change in behavior was lack of learning or negative
job climate.
The first step for you to take in implementing the evaluation concepts,
theories, and techniques described in the preceding chapters is to understand
the guidelines of levelland apply them in every program. Use a philosophy
that states, "If my customers are unhappy, it is my fault, and my challenge is to
please them." If you don't, your entire training program is in trouble. It is
probably true that you seldom please everyone. For example, it is a rare
occasion when everyone in my training classes grades me excellent. Nearly
always some participants are critical of my sense of humor, content presented,
or the quality of the audiovisual aids. I often find myself justifying what I did
and ignoring their comments, but I shouldn't do that. My style of humor, for
example, is to embarrass participants.
I hope in a pleasant way so that they don't resent it. That happens to be my
style, and most people enjoy and appreciate it. If I get only one critical
comment from a group of twenty-five, I will ignore it and continue as I did in
the past. However, if the reaction is fairly common because I have overdone it,
then I will take the comment seriously and change my approach.
I used to tell a funny story in class. It was neither dirty nor ethnic. Nearly
everyone else thought it was funny, too, and I had heard no objections to it. One
day, I conducted a training class with social workers. I told the story at the
beginning of the class and proceeded to do the training. After forty minutes, 1
asked whether anyone had a comment or question. One lady raised her hand
and said, "I was offended by the joke you told at the beginning of the session,
and 1 didn't listen to anything you said after that!"
I couldn't believe it. I was sure she was the only one who felt that way, so 1
asked the question, "Did any others feel the same way?" Seven other women
raised their hands. There were about forty-five people in the class, so the
percentage was very much in my favor. But 1 decided that that particular joke
had no place in future meetings. If she had been the only one, I probably would
still be telling it.
The point is this: look over all the reaction sheets, and read the comments.
Consider each one. Is there a suggestion that will improve future programs? If
yes, use it. If it is an isolated comment that will not improve future programs,
appreciate it, but ignore it.
Evaluating at level 2 isn't that difficult. All you need to do is to decide what
knowledge, skills, and attitudes you want participants to have at the end of the
program. If there is a possibility that one or more of these three things already
exist, then a pretest is necessary. If you are presenting something entirely new,
then no posttest is necessary. You can use a standardized test if you can find
one that covers the things you are teaching. Or you can develop your own test
to cover the knowledge and attitudes that you are teaching.
Levels 3 and 4 are not easy. A lot of time will be required to decide on an
evaluation design. A knowledge of statistics to determine the level of
significance may be desirable. Check with the research people in your
organization for help in the design. If necessary, you may have to call in an
outside consultant to help you or even do the evaluation for you. Remember the
principle that the possible benefits from an evaluation should exceed the cost of
doing the evaluation, and be satisfied with evidence if proof is not possible.
There is another important principle that applies to all four levels: You can
borrow evaluation forms, designs, and procedures from others, but you cannot
borrow evaluation results. If another organization offers the same program as
you do and they evaluate it, you can borrow their evaluation methods and
procedures, but you can't say, ''They evaluated it and found these results.
Therefore, we don't have to do it, because we know the results we would get."
Learn all you can about evaluation. Find out what others have done. Look
for forms, methods, techniques, and designs that you can copy or adapt. Ignore
the results of these other evaluations, except out of curiosity.
Trainers must begin with desired results and then determine what behavior
is needed to a accomplish them. Then trainers must determine the attitudes,
knowledge, and skills that are necessary to bring about the desired behavior.
The final challenge is to present the training program in a way that enables the
participants not only to learn what they need to know but also to react favorably
to the program. This is the sequence in which programs should be planned. The
four levels of evaluation are considered in reverse. First, we evaluate reaction.
Then, we evaluate learning, behavior, and results-in that order. Reaction is
easy to do, and we should assess it for every program. Trainers should proceed
to the other three levels as staff, time, and money are available.
Concepts in this chapter were originally published in Evaluating Training

Programs: The Four Levels, by Donald L. Kirkpatrick. (San Francisco: Berret~
Koehler, 1994). This chapter is published with permission of Berret-Koehler.
References
Kirkpatrick, D. L (1994). Evaluating Training Programs: The Four Levels. San
Francisco: Berrett-Koehler Publishers.
Knowles, M. (1986). Using Learning Contracts. San Francisco: lossey-Bass.
About the Author
Donald L. Kirkpatrick is Professor Emeritus, University of Wisconsin, and a

widely respected teacher, author, and consultant. He is the author of eight
management inventories and five books: How to Manage Change Effectively,
How to Improve Performance Through Appraisal and Coaching, How to Train
and Develop Supervisors, How to Plan and Conduct Productive Business
Meetings, and No-nonsense Communication. He is best known for developing
the internationally accepted four-level approach for evaluating training
programs. He received his B.B.A., M.B.A., and Ph.D. degrees from the
University of Wisconsin-Madison.
6 LEVEL FOUR AND BEYOND: AN
ROI MODEL
Jack J. Phillips
The ROI Imperative
In every aspect of human resource development (HRD), the concept of return

on investment (ROI) is being explored, tested, discussed, or debated. On almost
every conference agenda, the ROI issue is there. In workshops and general
sessions, participants turn out in record numbers to find out how to measure
ROJ. Workshops on measuring ROI in training and development have been
conducted in diverse settings and locations ranging from deep in the jungles of
Indonesia to New York City, the world's financial center. In training and HRD
professional journals, the ROI issue commands top print space, including an
occasional cover story. Hundreds of organizations are pursuing the ROI issue in
a deliberate and determined way and are reporting excellent results (Phillips,
1994). Even top executives are now seeking a measurable return on their
training investment. One business school magazine prepared a special issue on
the topic and sent it to all of the USA Fortune 500 CEO's (Phillips, 1995).
The ROI issue has sparked some interesting debate. Some argue that the
ROI must be developed, at least for a small number of programs. Others oppose
the ROI process and recommend that evaluation stop with an on-the-job skills
check. This debate creates several questions. Can the return on investment be
calculated? If so, can it be done with an acceptable level of confidence and at a
reasonable cost? Are many organizations pursuing ROI calculations? Early
114 Level Four and Beyond: An ROI Model
evidence suggests that the answer to all questions is a resounding yes. We begin
with a review of the concept of evaluation levels.
The Four-Level Framework: Uses and Shortcomings
Use of The Model
The HRD profession and the climate in which the profession is practiced has
changed significantly in the 40 years since Kirkpatrick developed his four
levels for evaluating training programs (Kirkpatrick, 1959a; Kirkpatrick,
1959b; Kirkpatrick, 1960a; Kirkpatrick, 1960b). At the time of its
development, the framework provided an ingenious way to sort through the
maze of evaluation problems facing training practitioners at that time. There
was a void of useful models to help the practitioners tackle this important issue.
The four-level framework was well received and has since been the model of
choice for the practitioners (Kaurman & Keller, 1994). It has enjoyed
widespread use and has a reputation throughout the world as a logical,
practical, and useful framework to pursue evaluation (Bramley & Kitson,
1994). With the growth of HRD and the growing importance of the function,
adjustments are necessary in the model to make it more useful and practical for
today's practitioners while at the same time addressing some of the concerns
about the approach.
Kirkpatrick's original model, shown in Table 6-1, was narrowly focused,

deviating very little from the literal definition of the questions. Although the
basic model has not been modified in later presentations, Kirkpatrick began to
broaden some of the concepts and definitions in later years (Kirkpatrick, 1994).
Perhaps this was a result of criticisms of the model from both the practitioners
and researchers.
There has been a tremendous amount of research undertaken to test

Kirkpatrick's model and to examine relationships within the four levels.
Alliger and Janak identified three problematic assumptions about the
Kirkpatrick model: 1) three levels are arranged in ascending order of
information provided, 2) the levels are causally linked and, 3) the levels are
positively intercorrelated (Alliger & Janak. 1989). Their study examined the
validity of these assumptions, the frequency of each level in published

evaluation studies, correlations from the literature in regard to Assumptions 2
and 3, and implications for the researcher and training manager. Collectively
they examined over 200 studies and found little support for the assumptions.
Kaufman and Keller (1994) compared the Kirkpatrick framework to five levels
of critical organizational concern and evaluation and offered an adjusted model.
Table 6-1. Kirkpatrick's four levels of evaluation with original definitions

Level Question
1. Reaction How well do the trainees like the program?
2. Learning What principles, facts, and techniques did the conferees
understand and absorb?
3. Behavior Did the trainees change their on-the~iob behavior?
4. Results What results were obtained?
Acceptance of the model has been widescale, particularly with practitioners.

The model provided a useful, understandable, and practical way in which
practitioners could evaluate their programs. The extent to which practitioners
have utilized the concept is impressive (Moseley & Larson, 1994). The four-
level model provides a reference for many current studies about evaluation. For
example, a major effort involving a search for best practices, the American
Society for Training and Development launched a benchmarking forum in 1992
(Kimmerling, 1993). Measurement and evaluation is one of the four key areas
of initial investigation with this group. The participants in the forum present
data by level of evaluation using Kirkpatrick's model.
Problems with the Kirkpatrick Model
Linkages Between Levels
The model is based on assumptions that there was a linkage between the
different levels: when a participant enjoyed the program, learning would occur;
when there was learning, there would be on-the-job behavior change; when
there was behavior change, there would be results. Unfortunately, these
linkages do not exist or are not very strong (Alliger & Janak, 1989). There is
considerable evidence that suggests that positive reactions at the end of a

program do not necessarily correlate with learning (Dixon, 1990). Also, many
research studies on transfer of training have shown that while participants may
learn the material, there are many legitimate reasons why it does not appear as
behavior changes on the job (Broad & Newstrom, 1992).
Definitions
Kirkpatrick's model was limited in its definition of levels 1, 2, and 3. In the

basic model, level 1 focuses on how well the trainees "liked a particular
training program," a very narrow view. There are other important elements that
could be included at this level, including resources, efficiency, and planned
actions. At level 2, Kirkpatrick defined learning as "principles, facts, and
techniques .. , understood and absorbed by the conferees." This is rather limited
and should be expanded to include knowledge, skills, and attitudes. At level 3,
the term behavior change is limiting. Although most training brings about a
change in behavior, there are other times when training causes the acquisition
of knowledge or changes in attitudes that may not appear as direct observable
behavioral change. Because of this, the definition of behavior needs to be
broadened to indicate any on-the-job applications of the training or learning
material to produce needed changes.
Accountability
Originally Kirkpatrick's level 4 focused on results. He stated that training

should cause measurable changes in improved productivity or quality, reduced
absenteeism and turnover, or a variety of results at the work unit, department,
division, or organizational level. The initial focus of the level 4 evaluation was
on monitoring these output variables to observe changes after the training
program was conducted, for example; productivity increased by 10 percent six
months after the training program was conducted or absenteeism was reduced
by 15 percent three months after the program.
In terms of the value of the data collected, this approach has two limitations.
First, the results were not converted to dollar values. There is a need to
understand and account for the value of an improvement (Fitz-enz, 1994). For
example, top management may want or need to know the value of a 15 percent
reduction in absenteeism. This requires organizations to place values on all

types of training program results. Thus, a value-added component is needed for
a more complete level of evaluation.
The second limitation is the absence of training costs. For a complete

evaluation, the cost of the program must be compared with the actual value
derived (Marrelli, 1993). In too many cases, the results of a training program,
although positive, did not add enough value to cover its costs (Altizer, 1994).
As a result, many programs evaluated at level 4 escaped the true accountability
of comparing the net benefits of the program with all of the costs of the
program. In later versions of Kirkpatrick's model, the concept of costibenefit
analysis was discussed as "one way to measure the results" (Kirkpatrick, 1983).
This confuses practitioners since a costibenefit analysis is a different level of
evaluation when compared to tracking results such as productivity and quality
improvements. For years, many practitioners have omitted the cost component
with the assumption that they are tracking level 4 performance data.
Isolating the Effects of Training
Earlier versions of Kirkpatrick's model paid little attention to the various ways
of isolating the effects of training, which is an important part of level 3 and 4
evaluations. In almost every training program where improvement is
documented with on-the-job behavior change or results, there were other
influences or factors that contributed to those results (Davidove, 1993). Few
cases exist in which training is the only input variable that has an influence on
an output variable during the timeframe of a training program. Therefore, an
important and essential ingredient of any evaluation effort must be a deliberate
attempt to isolate the effects of training.
In the initial presentation of the model, Kirkpatrick stated that for an

evaluation at level 3, a control group was needed to determine the amount of
behavior change. Yet, some of the studies used to illustrate the point did not use
a control group. In reality, there are many ways to isolate the effect of training
(Davidove, 1993). One recent report identified ten strategies to isolate the
effects of training (Phillips, 1996). This issue must be addressed at level 4 as
well as level 3, although it is probably assumed that controlling at level 3 will
address the issue at level 4. Without proper attention to this issue at both levels
3 and 4, training will improperly take credit for any improvement, and the
evaluation process will lose considerable credibility.
Why ROI?
There are some good reasons why return on investment has become a hot topic.
Although the viewpoints and explanations may vary, some things are very
clear. First, in most industrialized nations, training budgets have continued to
grow year after year, and as expenditures grow, accountability becomes a more
critical issue. An increasing budget makes a larger target for internal critics,
often forcing the development of the ROI.
Second, total quality management and continuous process improvement

have focused increased attention on measurement issues. Organizations now
measure processes and outputs that were not previously measured, monitored,
and reported. This measurement focus has placed increased pressure on the
training and HRD function to develop measures of its output and successes
(Biech, 1994).
Third, the restructuring and reengineering experience and the threat of

outsourcing has caused many HRD executives to focus more clearly and directly
on bottom-line issues. Many training and development functions have been
reengineered so that programs are linked to business needs and efficiencies are
enhanced (Shandler, 1996). Change processes have caused HRD executives to
examine evaluation issues and measure the contribution of specific programs.
The threat of outsourcing has forced some HRD managers to more closely align
programs to organizational objectives and measure successes so that
management can understand the contribution of HRD to the organization.
Fourth, the business management mindset of many current training and

HRD managers leads them to put more emphasiS on the economic issues within
the training function. The training manager of the 1990s is more aware of
bottom line issues in the organization and is more knowledgeable of the
operational and financial areas. This new "enlightened" manager often takes a
business approach to training and development, and the ROI issue is a part of
this process.
Fifth, there has been a persistent trend of accountability in organizations all

over the globe. Every support function is attempting to show its worth by
capturing the value that it adds to the organization (Geber, 1995). From the
accountability perspective, the HRD function should be no different from the
other functions. It must show its contribution.
Sixth, top executives are now demanding return on investment calculations

in organizations where they were previously not required. For years, training
and HRD executives have convinced top executives that training cannot be
measured, at least to the level desired by executives. Yet, many of the
executives are now finding out that it can and is being measured in many
organizations, thanks in part to articles in publications aimed at top executives.
Becoming aware that it can be done, top executives are subsequently
demanding the same accountability from their HRD departments. In some
extremes, HRD departments are being asked to show the return or face
significant budget cuts (Bleech & Mutchler, 1994). Others are just asking for
results. The chief operating officer for a Canadian-based global
telecommunications company recently described it this way: "For years we have
evaluated training with variables such as number of participants, number of
programs, length of programs, cost of programs, and content of programs.
These are input-focused measures. Now, we must show what these programs
are doing for our company and speak in terms that we can understand. We need
output-focused measures."l These no-nonsense comments are being repeated
throughout the corporate world.
A Revised Framework
The limitations of the Kirkpatrick model, and the concern for more attention to
ROI, has led to the development of a modified version of Kirkpatrick's model,
depicted in Table 6-2.
The primary modification is the addition of the fifth level, which focuses on
the return on investment. The other four levels are very similar to Kirkpatrick's
with some minor changes. Level 5 evaluation requires the addition of two
important and necessary steps. The first step is the conversion of the results
tabulated in level 4 to a monetary value to be used in an ROI formula. This
requires a direct conversion of hard data, such as quantity, quality, cost, or
time, which can be relatively easy for programs such as technical training. For
soft data, the task is more difficult, although a variety of techniques are utilized
to place values on the improvements. Estimating values can be a reliable and
accurate method for placing values on training data (Marrelli, 1993). Even soft,
interpersonal skills training can be evaluated in dollar-and-cents terms (Fitz-
enz, 1994).
Table 6-2. Five levels of evaluation
Level Measurement Focus
Reaction & Planned Measures participant satisfaction with the program

Action and captures planned actions.
Learning Measures changes in knowledge, skills, and attitudes.
Job Applications Measures changes in on-the-job behavior.
Business Results Measures changes in business impact variables.
Return on Investment Compares program benefits to the costs.
The second step involves the calculation of the costs for the program.
Although there has always been a need to capture training costs, the need is
greater with the pressure for ROI. Unfortunately costs are not clearly defined or
pursued in many organizations. The addition of this step will focus more
attention on accountability and require organizations to develop costs as they
move to the fifth level, the ultimate level of evaluation.
With this process, HRD professionals must develop an evaluation strategy

and make a deliberate attempt to move through the chain of evaluation levels
up to and including the fifth level. In some cases, the evaluation may stop at
levels 1 or 2. In others, evaluation may be conducted at level 3 with no
evaluation at level 4. Still, in other cases, a level 4 evaluation may be developed
without a level 5 evaluation. And, finally, level 5 would show the ultimate
evaluation. The movement up the chain depends on the type of program,
management's interest in higher-level evaluation, and the resources available
for evaluation. Some organizations develop targets for each level.
This model has several minor adjustments from Kirkpatrick's model, which
increase its likelihood of success. Reaction at level 1 in Kirkpatrick's model is
changed to reaction and planned action. Because training transfer is such an
important issue, it is helpful to include efforts to enhance the possibility of
transfer. At level 2, the definition of learning is broadened to specifically
include knowledge, skills, and attitudes. At level 3, another adjustment is made
in the use of on-the-job applications instead on behavior change. This implies

that participants may have changes in knowledge, skills, and attitudes that
could be applied in any combination to improve the situation at the work place.
It is not strictly confined to behavioral changes. Finally, level 4 is changed to
business results to reflect the need to present results in a form that top
management expect and understand.
The ROI Process
The calculation of the return on investment in HRD begins with the basic
model shown in Figure 6-1, where a potentially complicated process can be
simplified with sequential steps. The ROI process model provides a systematic
approach to ROI calculations. A step-by-step approach helps to keep the
process manageable so that users can tackle one issue at a time. The model also
emphasizes the fact that this is a logical, systematic process that flows from one
step to another. Applying the model provides consistency from one ROI
calculation to another. Each step of the model is briefly described here.
TABUI.ATE
PROGRAM
G()STS
CONVERT
ISOLATE C,'\LCULATE
DATA TO
THE EFFECTS -fHE ~lETUHN ON
MONETARY
OF TRAINING iNVESTMENT
VALUE
IDENTiFY
iNTANGIBLE
BENEFns
Figure 6-1. ROI process model

Preliminary Evaluation Information
Several pieces of the evaluation puzzle must be explained when developing the
evaluation plan for an ROI calculation. Four specific elements are important to
evaluation success and are outlined below.
Evaluation purposes should be considered prior to developing the evaluation

plan because the purposes will often determine the scope of the evaluation, the
types of instruments used, and the type of data collected. For example, when an
ROI calculation is planned, one of the purposes would be to compare the cost
and benefits of the program. This purpose has implications for the type of data
collected (hard data), type of data collection method (performance Monitoring),
the type of analysis (thorough), and the communication medium for results
(Formal Evaluation Report). For most programs, multiple evaluation purposes
are pursued.
A variety of instruments are used to collect data and the appropriate

instruments should be considered in the early stages of developing the ROI.
Questionnaires, interviews, and focus groups are common instruments used to
collect data in evaluation. The instruments most familiar to the culture of the
organization and appropriate for the setting and evaluation requirements should
be used in the data collection process.
Training programs are evaluated at five different levels and are illustrated in
Table 6-2. Data should be collected at levels 1, 2, 3, and 4 if an ROI analysis is
planned. This helps ensure that the chain of impact occurs as participants learn
the skills, apply them on the job, and obtain business results.
A final aspect of the evaluation plan is the timing of the data collection. In
some cases, preprogram measurements are taken to compare with postprogram
measures and in some cases multiple measures are taken. In other situations,
preprogram measurements are not available and specific follow-ups are still
taken after the program. The important issue in this part of the process is to
determine the timing for the follow-up evaluation. For example, in some
programs, evaluations can be made as early as three weeks after customer
service skills training program for a major airline. Five years may be required
to measure the payback for an employee to attend an MBA program sponsored
by an Indonesian company. For most profeSSional and supervisory training, a
follow-up is usually conducted in the range of 3-6 months.
These four elements, evaluation purposes, instruments, levels, and timing

are all considerations in selecting the data collection methods and developing
the data collection plan.
Collecting Data
Data collection is central to the ROI process. In some situations, postprogram

data are collected and compared to preprogram situations, control group
differences, and expectations. Both hard data, representing output, quality, cost,
and time; and soft data, including work habits, work climate, and attitudes are
collected. Data are collected using a variety of methods, including the
following:
• Follow-up surveys are taken to determine the degree to which participants

have utilized various aspects of the program. Survey responses are often
developed on a sliding scale and usually represent attitudinal data. Surveys
are useful for level 3 data.
• Follow-up questionnaires are administered to uncover specific
applications of training. Participants provide responses to a variety of types
of open-ended and forced-response questions. Questionnaires can be used
to capture both level 3 and 4 data.
• On-the-job observation captures actual skill application and use.
Observations are particularly useful in customer service training and are
more effective when the observer is either invisible or transparent.
Observations are appropriate for level 3 data.
• Postprogram interviews are conducted with participants to determine the
extent to which learning has been utilized on the job. Interviews allow for
probing to uncover specific applications and are appropriate with level 3
data.
• Focus groups are conducted to determine the degree to which a group of
participants have applied the training to job situations. Focus groups are
appropriate with level 3 data.
• Program assignments are useful for simple short-term projects.
Participants complete the assignment on the job, utilizing skills or
knowledge learned in the program. Completed assignments can often

contain both level 3 and 4 data.
• Action plans are developed in training programs and are implemented on
the job after the program is completed. A follow-up of the plans provide
evidence of training program success. Level 3 and 4 data can be collected
with action plans.
• Performance contracts are developed where the participant, the
participant's supervisor, and the instructor all agree on specific outcomes
from training. Performance contracts are appropriate for both level 3 and 4
data.
• Programs are designed with a follow-up session, which is utilized to
capture evaluation data as well as present additional learning material. In
the follow-up session, participants discuss their successes with the
program. Follow-up sessions are appropriate for both level 3 and 4 data.
• Performance monitoring is useful when various performance records and
operational data are examined for improvement. This method is
particularly useful for level 4 data.
The important challenge in this step is to select the data collection method
or methods appropriate for the setting and the specific program, within the time
and budget constraints of the organization.
Isolating the Effects of Training
An issue overlooked in most evaluations is the process to isolate the effects of

training. In this step, specific strategies are explored, which determine the
amount of output performance directly related to the program. The result is
increased accuracy and credibility of the ROI calculation. This step is essential
because there are many factors that will influence performance data after
training. The following strategies have been utilized by organizations to tackle
this important issue:
• A control group arrangement is used to isolate training impact. With this

strategy, one group receives training while another similar group does not
receive training. The difference in the performance of the two groups is
attributed to the training program. When properly established and
implemented, the control group management is the most effective way to

isolate the effects of training.
• Trend lines are used to project the value-specific output variables if
training had not been undertaken. The projection is compared to the actual
data after training and the difference represents the estimate of the impact
of training. Under certain conditions this strategy can be an accurate way
to isolate the impact of training.
• When mathematical relationships between input and output variables are
known, a forecasting model is used to isolate the effects of training. With
this approach, the output variable is predicted using the forecasting model
with the assumption that no training is conducted. The actual performance
of the variable after the training is then compared with the forecasted value
to estimate the impact of training.
• Participants estimate the amount of improvement related to training.
With this approach, participants are provided with the total amount of
improvement, on a pre- and postprogram basis, and are asked to indicate
the percent of the improvement that is actually related to the training
program.
• Supervisors of participants estimate the impact of training on the output
variables. With this approach, supervisors of participants are presented
with the total amount of improvement and are asked to indicate the percent
related to training.
• Senior management estimates the impact of training. In these cases,
managers provide an estimate or "adjustment" to reflect the portion of the
improvement related to the training program. While perhaps inaccurate,
there are some advantages of having senior management involved in this
process.
• Experts provide estimates of the impact of training on the performance
variable. Because the estimates are based on previous experience, the
experts must be familiar with the type of training and the specific situation.
• In supervisory and management training, the subordinates of participants
identify changes in the work climate that could influence the output
variables. With this approach, the subordinates of the supervisors receiving
training determine if other variables, which could have influenced output
performance, changed in the work climate.
• When feasible, other influencing factors are identified and the impact
estimated or calculated leaving the remaining unexplained improvement
attributed to training. In this case, the influence of all of the other factors
are developed and training remains the one variable not accounted for in
the analysis. The unexplained portion of the output is then attributed to
training.
• In some situations, customers provide input on the extent to which
training has influenced their decision to use a product or service.
Although this strategy has limited applications, it can be quite useful in
customer service and sales training.
Collectively, these ten strategies provide a comprehensive set of tools to
tackle the important and critical issue of isolating the effects of training.
Converting Data to Monetary Values
To calculate the return on investment, data collected in a level 4 evaluation are

converted to monetary values to compare to program costs. This requires a
value to be placed on each unit of data connected with the program. Ten
strategies are available to convert data to monetary values where the specific
strategy selected usually depends on the type of data and the situation:
• Output data is converted to profit contribution or cost savings. In this

strategy, output increases are converted to monetary value based on their
unit contribution to profit or the unit of cost reduction. These values are
readily available in most organizations.
• The cost of quality is calculated and quality improvements are directly
converted to cost savings. This value is available in many organizations.
• For programs where employee time is saved, the participant wages and
benefits are used for the value for time. Because a variety of programs
focus on improving the time required to complete projects, processes, or
daily activities, the value of time becomes an important and necessary
issue.
• Historical costs are used to develop the value when they are available for a
specific variable. In this case, organizational cost data are utilized to
establish the specific value of an improvement.
• When available, internal and external experts may be used to estimate a

value for an improvement. In this situation, the credibility of the estimate
hinges on the expertise and reputation of the individual providing the
estimate.
• External databases are sometimes available to estimate the value or cost
of data items. Research, government, and industry databases can provide
important information for these values. The difficulty lies in finding a
specific database related to the situation.
• Participants estimate the value of the data item. For this approach to be
effective, participants must be capable of providing a value for the
improvement.
• Supervisors of participants provide estimates when they are both willing
and capable of assigning values to the improvement. This approach is
especially useful when participants are not fully capable of providing this
input or when supervisors need to confirm or adjust the participant's
estimate.
• Senior management provides estimates on the value of an improvement
when they are willing to offer estimates. This approach is particularly
helpful to establish values for performance measures that are very
important to senior management.
• HRD statT estimates may be used to determine a value of an output data
item. In these cases it is essential for the estimates to be provided on an
unbiased basis.
This step in the ROI model is very important and is absolutely necessary to
determine the monetary benefits from a training program. The process is
challenging, particularly with soft data, but can be methodically accomplished
using one or more of the above strategies.
Tabulating Program Costs
The other part of the equation on a costlbenefit analysis is the cost of the
program. Tabulating the costs involves monitoring or developing all of the
related costs of the program targeted for the ROI calculation. Among the cost
components that should be included are
• The cost to design and develop the program, possibly prorated over the
expected life of the program
• The cost of all program materials provided to each participant
• The cost for the instructor/facilitator, including preparation time as well as
delivery time
• The cost of the facilities for the training program
• Travel, lodging, and meal costs for the participants, if applicable
• Salaries plus employee benefits of the participants to attend the training
• Administrative and overhead costs of the training function allocated in
some convenient way to the training program
In addition, specific costs related to the needs assessment and evaluation
should be included, if appropriate. The conservative approach is to include all
of these costs so that the total is fully loaded.
Calculating the ROI
The return on investment is calculated using the program benefits and costs.
The benefit/cost ratio is the program benefits divided by cost. In formula form it
is
Program Benefits
BCR=
Program Costs
The return on investment uses the net benefits divided by program costs. The
net benefits are the program benefits minus the costs. In formula form, the ROI
becomes
Net Program Benefits
ROI (%) = Program Costs
x 100
This is the same basic formula used in evaluating other investments in

which the ROI is traditionally reported as earnings divided by investment. The
ROI from some training programs is high. For example, in sales training,
supervisory training, and managerial training, the ROI can be quite large,
frequently over 100 percent, while the ROI value for technical and operator
training may be lower.
Identifying Intangible Benefits
In addition to tangible, monetary benefits, most training programs will have

intangible nonmonetary benefits. The ROI calculation is based on converting
both hard and soft data to monetary values. Other data items are identified that
are not converted to monetary values and these intangible benefits include
items such as
• Increased job satisfaction

• Increased organizational commitment
• Improved teamwork
• Improved customer service
• Reduced complaints
• Reduced conflicts
During data analysis, every attempt is made to convert all data to monetary
values. All hard data, such as output, quality, and time are converted to
monetary values. The conversion of soft data is attempted for each data item.
However, if the process used for conversion is too subjective or inaccurate and
the resulting values lose credibility in the process, then the data is listed as an
intangible benefit with the appropriate explanation. For some programs
intangible, nonmonetary benefits are extremely valuable, often wielding as
much influence as the hard data items.
Implementation Issues
The best designed process model or technique will be worthless unless it is

integrated efficiently and effectively in the organization. Although the ROI
process is presented here as a step-by-step, methodical, and simplistic
procedure, it will fail even in the best organizations if it is not integrated into
the mainstream of activity and fully accepted and supported by those who
should make it work in the organization.
Targets
Setting specific targets for evaluation levels is an important way to make

progress with measurement and evaluation. Targets enable the staff to focus on
the improvements needed with specific evaluation levels. In this process, the
percent of courses planned for evaluation at each level is developed. The first
step is to assess the present situation. The number of all courses, including
repeated sections of a course, is tabulated along with the corresponding level(s)
of evaluation presently conducted for each course. Next, the percent of courses
using level 1 reaction questionnaires is calculated. The process is repeated for
each level of the evaluation.
After detailing the current situation, the next step is to determine a realistic
target within a specific time frame. Many organizations set annual targets for
changes. This process should involve the input of the full HRD staff to ensure
that the targets are realistic and that the staff is committed to the process. If the
training and development staff does not buy into this process, the targets will
not be met. The improvement targets must be achievable, while at the same
time, challenging and motivating. Table 6-3 shows the targets established for
Andersen Consulting for four levels (Geber, 1995). Andersen indicates that
many of the level 4 evaluations are taken to ROI. In some organizations, half of
the level 4 calculations are taken to level 5, while in others, everyone of them
are taken. Table 6-4 shows current percentages and targets for five years in
another organization, a large multinational company. This table shows the
gradual improvement of increasing evaluation activity at levels 3, 4, and 5. In
this table, year 0 is the current status.
Table 6-3. Evaluation targets in Arthur Andersen & Co.
Level TarJ(et
Levell, Reaction 100%
Level 2, Learning 50%
Level 3, Job Application ",30%
Level 4, Business Results 10%
Target setting is a critical implementation issue. It should be completed

early in the process with full support of the entire HRD staff. Also, if practical
and feasible, the targets should have the approval of the key management staff,
particularly the senior management team.
Table 6-4. Percentages and targets for five years in a large multinational
company
Percent of Courses Evaluated at Each Level

Year Year Year Year Year Year
0 1 2 3 4 5
• Reaction and 85% 90% 95% 100% 100% 100%
Planned Action
• Learning 30% 35% 40% 45% 50% 60%
• Job Application 5% 10% 15% 20% 25% 30%
• Business Results 2% 4% 5% 9% 12% 15%
• ROI 0 2% 4% 6% 8% 10%
Planning
Few initiatives will be effective without proper planning and the ROI process is
no exception. Planning is synonymous with success. Several issues are
presented next to show how the organization should plan for the ROI process
and position it as an essential component of the training and development
process.
Assigning Responsibilities
Determining the specific responsibilities is a critical issue because there can be

confusion when individuals are unclear of their specific assignments in the ROI
process. Within the HRD department, responsibilities apply to two groups. The
first is the measurement and evaluation responsibility for the entire training
and development staff. It is important for all of those involved in developing
and delivering programs to have some responsibility for measurement and
evaluation. These responsibilities include providing input on the design of
instruments, planning a specific evaluation, analyzing data, and interpreting
the results. Typical responsibilities include
• Ensuring that the needs assessment includes specific business impact

measures
• Developing specific application objectives (level 3) and business impact
objectives (level 4) for each program
• Focusing the content of the program on the objectives of the business

perfol1llance improvement, ensuring that exercises, case studies, and skill
practices relate to the desired objectives
• Keeping participants focused on the end results of application and impact
• Communicating rationale and reasons for evaluation
• Assisting in follow-up activities to capture business impact data
• Providing technical assistance on data collection, data analysis, and
reporting
• Designing specific instruments and procedures for data collection and
analysis
• Presenting evaluation data to a variety of groups
While it may be inappropriate to have each member of the staff involved in
all of these activities, each individual should have at least one or more
responsibilities as part of their regular job duties. This assignment of
responsibility keeps the ROI process from being disjointed and separate from
major training and development activities. More important, it helps to bring
accountability to those who develop and deliver the programs.
The second issue involves the responsibilities of the technical support

function. Depending on the size of the training and development staff, it may
be helpful to establish a group of technical experts who provide assistance with
the ROI process. When this group is established, it must be clear that the
experts are not there to relieve others of evaluation responsibilities but to
supplement technical expertise. Technical group responsibilities revolve around
six key areas:
• Designing data collection instruments

• Providing assistance for developing an evaluation strategy
• Analyzing data, including specialized statistical analyses
• Interpreting results and making specific recommendations
• Developing an evaluation report or case study to communicate overall
results
• Providing technical support in any phase of the ROI process
The assignment of responsibilities for evaluation is also an issue that needs

attention throughout the evaluation process. Although the training and
development staff must have specific responsibilities during an evaluation, it is
not unusual to require others in support functions to have responsibility for data
collection. These responsibilities are defined when a particular evaluation
strategy plan is developed and approved.
RevisinglDeveloping Policies and Procedures
Another key part of planning is revising, or developing, the organization's

policy concerning measurement and evaluation-{)ften a part of policy and
practice for developing and implementing training and development programs.
The policy statement contains information developed specifically for the
measurement and evaluation process. It is frequently developed with the input
of the HRD staff and key managers or clients. Sometimes policy issues are
addressed during internal workshops designed to build skills with measurement
and evaluation. The policy statement addresses critical issues that will
influence the effectiveness of the measurement and evaluation process. Typical
topics include adopting the five-level model presented in this book, requiring
level 3 and 4 objectives in some or all programs, and defining responsibilities
for training and development.
Another part of this process is the specific guidelines for measurement and
evaluation. The guidelines are more technical than policy statements and often
contain detailed procedures showing how the process is actually undertaken
and developed. They often include specific forms, instruments, and tools
necessary to facilitate the process. Overall, the guidelines show how to utilize
the tools and techniques, guide the design process, provide consistency in the
ROI process, ensure that appropriate methods are used, and place the proper
emphasis on each of the areas.
Training and Preparation
One group that will often resist the ROI process is the trammg and
development staff who must design, develop, and deliver training. These staff
members often perceive evaluation as an unnecessary intrusion into their
responsibilities, absorbing precious time, and stifling their freedom to be
creative. The cartoon character Pogo perhaps characterizes it best when he said
"We have met the enemy and he is us." This section outlines some important
issues that must be addressed when preparing the staff for the implementation
ofROI.
Involving the Staff
On each key issue or major decision, the staff should be involved in the process.
As policy statements are prepared and evaluation guidelines developed, staff
input is absolutely essential. It is difficult for the staff to be critical of
something they helped design and develop. Therefore, their involvement
becomes a critical issue. Using meetings, training sessions, and task forces,
combined with routine input, the staff should be involved in every phase of
developing the framework and supporting documents for ROJ.
Using ROI as a Learning Tool
One reason the HRD staff may resist the ROI process is that the effectiveness of
their programs will be fully exposed, putting their reputation on the line. They
may have a fear of failure. To overcome this, the ROI process should clearly be
positioned as a tool for learning and not a tool to evaluate training staff
performance, at least during its early years of implementation. HRD staff
members will not be interested in developing a process that will be used against
them.
Evaluators can learn as much from failures as successes. If the program is

not working, it is best to find out quickly and understand the issues firsthand,
not from others. If a program is ineffective and not producing the desired
results, it will eventually be known to clients and/or the management group, if
they don't know it already. Consequently, they will become less supportive of
training. If the weaknesses, shortcomings, and failures of programs are
identified and adjustments are made quickly, not only will effective programs
be developed, but the credibility and respect for the function will be enhanced.
Teaching the Staff
The training staff will usually have inadequate skills in measurement and
evaluation and thus will need to develop some expertise in the process.
Measurement and evaluation is not always a formal part of their preparation to
become a trainer or instructional designer. Consequently, each staff member
must be provided training on the ROI process to learn how the overall ROI
process works, step-by-step. In addition, staff members must know how to
develop an evaluation strategy and specific plan, collect and analyze data from
the evaluation, and interpret results from data analysis. Sometimes a one-to-
two-day workshop is needed to build adequate skills and knowledge to
understand the process, appreciate what it can do for the organization, see the
necessity for it, and participate in a successful implementation.
Teaching the Managers
Perhaps no group is more important to the ROI process than the management
team, who must allocate resources for training and development and support
the programs. In addition, they often provide input and assistance in the ROI
process. Specific actions to train and develop the management team should be
carefully planned and executed.
One effective approach to prepare managers for the ROI process is to

conduct a workshop for managers, ''Training for Nontraining Managers."
Varying in duration from one half to two days, this practical workshop shapes
critical skills and changes perceptions to enhance the support of the ROI
process. Managers leave the workshop with an improved perception of the
impact of training and a clearer understanding of their roles in the training
process. More important, they often have a renewed commitment to make
training work in their organization.
Due to the critical need for this topic in management training, this workshop
should be required for all managers, unless they have previously demonstrated
strong support for the training function. Because of this requirement, it is
essential for top executives to be supportive of this workshop and, in some
cases, take an active role in conducting it. To tailor the program to specific
organizational needs, a brief needs assessment may be necessary to determine
the specific focus and areas of emphasis for the program.
Monitoring Progress and Communicating Results
A final part of the implementation process is to monitor the overall progress

made and communicate the results of specific ROI projects. Although it is an
often overlooked part of the process, an effective communication plan can help
keep the implementation on target and let others know what the ROI process is
accomplishing for the organization.
Monitoring Progress
The initial schedule for implementation of ROI provides a variety of key events
or milestones. Routine progress reports need to be developed to present the
status and progress of these events or milestones. Reports are usually developed
at six-month intervals. Two target audiences, the training and development
staff and senior managers, are critical for progress reporting. The entire
training and development staff should be kept informed on the progress, and
senior managers need to know the extent to which ROI is being implemented
and how it is working in the organization.
Developing Evaluation Reports
The results from an ROI impact study must be reported to a variety of target
audiences. One of the most important documents for presenting data, results,
and issues is an evaluation report. The typical report provides background
information, explains the processes used, and most importantly, presents the
results. Recognizing the importance of on-the-job behavior change, level 3
results are presented first. Business impact results are presented next, which
include the actual ROI calculation. Finally, other issues are covered along with
the intangible benefits. While this report is an effective and professional way to
present ROI data, several cautions need to be followed. Since this document is
reporting the success of a training and development program involving one or
more groups of employees, the complete credit for all of the success and results
must go to the participants and their immediate supervisors or leaders. Their
performance has generated the success. Also, another important caution is to
avoid boasting about the results. Although the ROI process may be accurate and
credible, it still may have some subjective issues. Huge claims of success can
quickly turn off an audience and interfere with the delivery of the desired
message.
A final caution concerns the structure of the report. The methodology should
be clearly explained along with the assumptions made in the analysis. The
reader should readily see how the values were developed and how the specific
steps were followed to make the process more conservative, credible, and
accurate. Detailed statistical analyses should be relegated to the appendix.
Communicating Results to a Variety of Audiences
While several potential audiences could receive ROI evaluation data, four
audiences should always receive the data. A senior management team (however
it may be defined) should always receive information about the ROI project
because of their interest in the process and their influence to allocate additional
resources for HRD and evaluation. The supervisors of program participants
need to have the ROI information so they will continue to support programs
and reinforce specific behavior taught in the program. The participants in the
program, who actually achieved the results, should receive a summary of the
ROI information so they understand what was accomplished by the entire
group. This also reinforces their commitment to make the process work. The
training and development staff must understand the ROI process and,
consequently, needs to receive the information from each ROI project. This is
part of the continuing educational process. Collectively, these four groups
should always receive ROI information. In addition, other groups may receive
information based on the type of program and the other potential audiences.
Based on an interview with the author.
Acknowledgment
Some of the concepts and practices described in this chapter were developed for
a new book, Return on Investment in Training and Performance Improvement
Programs, Gulf Publishing, 1997. Reprinted by permission.
References
Altizer, C. (1994). Increasing your training ROI: Tying training to performance

management for improved performance. Performance & Instruction. 33 (3),30-34
Alliger, G., & Janak, E. (1989). Kirkpatrick's levels of training criteria: Thirty years
later. Personnel Psycholocy. 42 (3),331-342.
Biech, E. (1994). TQM for Training. New York: McGraw-Hill, Inc.
Bleech, J.M., & Mutchler, D.G. (1994). Let's get results. not excuses! Grand Rapids,
MI: MBP Press.
Bramley, P., & Kitson, B. (1994). Evaluating training against business criteria. Journal
of European Industrial Training. 18 (1),10-14.
Broad, M. L., & Newstrom, J. W. (1992). Transfer oftraining. Reading, MA: Addison-
Wesley.
Davidove, E. A. (1993). Evaluating the return on investment of training. Performance &

Instruction. 32 (1), 1-8.
Dixon, N. M. (1990). The relationship between trainee responses on participant reaction

forms and posttest scores. Human Resource Development Quarterly. 1 (2), 129-137.
Fitz-enz,1. (1994). Yes ... You can weigh training's value. Training. 31 (7),54-58.
Geber, B. (1995). Does training make a difference? Prove it! Training. 32 (3), 27-36.
Kaufman, R., & Keller, 1. M. (1994). Levels of evaluation: Beyond Kirkpatrick. Human
Resource Development Quarterly, 5 (4),371-380.
Kimmerling, G. (1993). Gathering best practices. Training & Development, 47 (9), 28-
36.
Kirkpatrick, D. L. (1959a). Techniques for evaluating training programs. Journal of

ASTD, 13 (II), 3-9.
Kirkpatrick, D. L. (1959b). Techniques for evaluating training programs: Part 2-

Learning. Journal ofASTD, 13 (12),21-26.
Kirkpatrick, D. L. (1960a). Techniques for evaluating training programs: Part 3-

Behavior. Journal ofASTD. 14 (1),13-18.
Kirkpatrick, D. L. (1960b). Techniques for evaluating training programs: Part 4 -

Results. JournalofASTD. 14 (2), 28-32.
Kirkpatrick, D. L. (1983, November). Four steps to measuring training effectiveness.

Personnel Administrator, 19-25.
Kirkpatrick, D. L. (1994). Evaluating training programs. San Francisco: Berrett-

Koehler Publishers.
Krein, T. J., & Weldon, K. C. (1994). Making a play for training evaluation. Training &
Development. 48 (4), 62-67.
Marrelli, A. F. (1993a). Cost analysis for training. Technical & Skills Training. 4 (7),
35-40.
Marrelli, A. F. (1993b). Determining training costs, benefits, and results. Technical &
Skills Training. 4 (8), 8-14.
Moseley, J. L., & Larson, S. (1994). A qualitative application of Kirkpatrick's model for
evaluating workshops and conferences. Performance & Instruction. 33 (8),3-5.
Phillips, J. 1. (ed.), (1994). In action: measuring return on investment. Alexandria, VA:

American Society for Training and Development.
Phillips, 1. J. (1995). Measuring training's ROI: It can be done. William and Mary
Business Review, (Summer), 6-10.
Phillips, J. J. (1996). Was It the Training? Training & Development. 50 (3),28-32.
Rust, R. T., Zahorik, A. 1., & Keiningham, T. L. (1994). Return on quality: Measuring
the financial impact of your company's quest for quality. Chicago: Probus
Publishing Company.
Shandler, D. (1996). Reengineering the Training Function. Delray Beach, FL: St. Lucie
Press.
Veech, A. M. (1994). Who's on first: Management's responsibility to the bottom line.

Performance & Instruction. 33 (7), 9-13.
About the Author
Jack Phillips has 27 years of corporate experience in five industries. He has

served as Training and Development manager at two Fortune 500 firms, senior
HR officer at two ftrms, and President of a Regional Federal Savings Bank. In
1992 Dr. Phillips founded Performance Resources Organization (PRO), an
international consulting ftrm that specializes in human resources accountability
programs. He has authored or edited several books. His most recent works
include Accountability in Human Resource Management, Handbook of
Training Evaluation and Measurement Methods, and Measuring Return on
Investment. He has undergraduate degrees in electrical engineering, physics
and mathematics from Southern College of Technology and Oglethorpe
University, a Master's degree from Georgia State University, and a Ph.D. in
Human Resource Management from the University of Alabama. In 1987 he
won the Yoder-Henerman Personnel Creative Application Award from the
Society for Human Resource Management.
7 CLARIFYING AND DIRECTING
IMPACT EVALUATION
Robert O. Brinkerhoff
Introduction
As the title of this chapter suggests, training is intended to have an impact.

Impact of training typically refers to the benefit that an organization expects to
achieve because it provided that training. But what is this benefit? If a company
provides training to its employees, what should it expect: Greater loyalty?
Better perfonnance? Less employee costs? Increased profits?
One of the major issues in evaluating the impact of training is achieving

greater clarity among training stakeholders as to the meaning of and
assumptions inherent in the concept of training impact. Thus, the first section
of the chapter explores and discusses the several definitions and assumptions of
training impact. The next major portion of the chapter confronts the misleading
beliefs that surround the concept and practice of impact evaluation. This
focuses especially on the failure of traditional impact evaluation approaches to
assess the role of training in the larger process of performance management
improvement, and explains why most impact evaluation strategies are often
counterproductive. The chapter closes with a recommendation to adopt some of
the useful concepts and methods of quality management in impact evaluation of
training.
142 Clarifying and Directing Impact Evaluation
Defining Training Impact
The most common understanding of training impact is that training should

bring direct benefit to the organization that sponsors that training. Training
will not be supported by an organization simply as a philanthropic gesture
because it is the right thing to do. Nor is employee training and development
typically sponsored by organizations solely as an employee benefit, such as
career counseling or retirement planning services.
Certainly, part of the total value of training derives from the reality that
training is, in fact, a function perceived by many employees as a benefit. And,
many organizations promote continued education of employees by offering to
reimburse all or a portion of their tuition when they enroll in higher education
courses. Further, organizations that provide ample opportunity for training and
development would be perceived as employee friendly, while those that were
very stingy with training opportunities would be perceived as less friendly, and
would be thus less competitive in attracting and retaining high-quality
employees. But this benefit function of training is not the major reason that
companies support training, nor is it typically the central focus of evaluation.
The popular notion of training impact holds that training is, above all, an
instrument for improving employee, and thence organizational, performance
and effectiveness. The rationale for training is based on the easily understood
assumption that competent (i.e., well-trained) employees should perform more
effectively than less-competent employees. Note in the foregoing, however,
that the emphasis is on performance, not simply gains in competence. The
impact of training is normally construed as the organizational benefits (such as
increased production, greater quality, reduced costs) that ensue when
improvements in competence are manifested in improvements in job
performance. This is what has been called (Brinkerhoff 1987) the fundamental
logic of training.
The Logic of Training Value
The logic of training in its generic form (different variations of the fundamental
logic are discussed later) is represented in Figure 7-1.
The "Logic" of Training
Improved .
~ Job ~ BusIness
Results Value
o = Lacking S&K
f8.i<\ =Refined
@ = With new S&K ~ Competencies
(skills & knowledge)
Figure 7-1 Logic Of Training
As depicted in Figure 7-1, the long-range goal of training is value added to

products and services. This goal is reached by improved performance of
employees. The improved performance is reached when employees use their
new learning. New learning is the immediate outcome of a learning
intervention. Thus, value is achieved when this sequence of steps in Table 7-1
is successfully executed.
Table 7-1
The typical training value chain
1) Trainees, who lack a certain set of skills and knowledge, enter a learning
intervention.
2) Employees engage in learning tasks.
3) Trainees exit the learning intervention having mastered the intended new
skills and knowledge, and return to their jobs.
4) Trainees try out their new skills and knowledge.
5) Employees develop more refined, enabling job competencies.
6) Employees perform in more effective ways.
7) Employees increase the effectiveness and quality of their job results.
8) Employees enhance the value of products and/or services.
9) Organizational benefits (e.g., greater profits, competitive advantage) are
realized.
As Figure 7-1 makes clear, the value (defined typically as impact) that
results from training is realized only after training-acquired skills are used by
trainees in on-the-job behaviors. The immediate outcomes of the learning
intervention, if the learning intervention is successful, are new skills and
knowledge. But, in most types of training applications, the new skills and
knowledge alone do not add value; they must be practiced and effectively used
for value to be achieved. There are, however, exceptions to and variations of
this general principle, and thus a variable range of understandings of what
constitutes training impact.
Variations on the Fundamental Logic
Sometimes the purpose of training is only to provide the capacity to perform,

versus actual usage of the skills. Consider the example of an airline company
providing training to its pilots in how to perform in an emergency, such as how
to safely land the aircraft when the landing gear cannot be lowered or engaged.
Value is added to the service the airline provides when the pilots are fully
competent in this "wheels up" emergency procedure because passengers are
safer. The pilots do not actually have to use the training; in fact, we hope they
never will. The value of such training is a function of the likelihood of an event
occurring that would require the trained behavior (stuck landing gear), and the
magnitude of the consequences if the event were to occur, but the pilots could
not cope with it. If, for instance, there was literally no chance of the landing
gear becoming stuck, then this training would not have value. But, because
there is a chance that the emergency behavior may be required, and the pilots
have been thoroughly trained, the service provided has greater value (it is
perceived as more safe), and thus the training can be shown to add value
(organizational impact) to the core service of the airline company.
Another variation on the fundamental logic of training is training that is

construed as acculturation, wherein the purpose of the learning intervention is
to indoctrinate employees in an organization's culture and context. In this sort
of training, no immediate and discrete manifestation in job behavior may be
intended or expected. Yet another variant of the fundamental logic is apparent
in learning interventions intended to build the capability of employees so that
they are more qualified to be promoted into new pOSitions, should succession
opportunities become available.
The key message of Figure 7-1 is that training must have a logic, and that
the validity of the logic is the determinant of the potential value (impact) of the
training. The more valid the logic, then the more likely it is that the training
will lead to actual added value. One of the primary functions for impact
evaluation of training is to surface the logic of training and assess its validity.
The logic of training clarifies the basis of the value claim for training. If no
credible argument can be made that the learning to be gained by training could
possibly lead to performance improvements that would subsequently lead to
added value for products and services, then the logic is invalid; if a highly
credible argument can be made, then the logic is more valid.
As noted, however, there is more than one single logic that defines training
value. Confusion about the more specific logic of particular training initiatives
can undermine training effectiveness and impede the practice and effectiveness
of impact evaluation.
Categorizing Training Outcomes
One major factor that complicates the issue of evaluating training impact is that
training is not a unitary function intended to have only one single result. There
are variable intentions for training, and these several purposes for training
imply different constructions of the concept of impact. Leonard Nadler (Nadler,
1980) in his seminal discussions of human resources development, or HRD (his
suggested term for what he saw as the outdated name for training), postulated a
three-part taxonomy of training purposes: training, which focused on job
incumbents with the purpose of improving their current job performance;
education, which focused on employees with the purpose of preparing them for
specified new jobs and roles; and, development, which aimed at building the
organization's capacity to perform effectively in the future by increasing the
competence of employees. As the practice and profession of HRD has grown,
this simple taxonomy of aims and purposes has become increasingly outdated.
Table 7-2 provides a more current categorization of the several purposes of
training as it is commonly practiced in organizations today (Brinkerhoff &
Brown, 1997). This taxonomy is based on an analysis of the various intended
organizational benefits that training may be applied against, and thus is of
special importance for those concerned with evaluating the impact of training.
Table 7-2 names each type of intended benefit in a few words, describes it
briefly, then provides one or more common labels that are sometimes used to
characterize that category of training in different organizational contexts.
The outcome categories in Table 7-2 are useful in clarifying the intended
objectives for any particular organization-sponsored learning program or effort,
and help as well in identifying expectations among providers and consumers of
training as to the benefits and results they do, or can hope to, realize.
Table 7-2. Types of purposes & goals served by training. 1
Type A: Current Job Peiformance

To provide employees the skills and knowledge they need to perform more effectively in
the;\" current jobs (training, job training. in-service education. staff development.
management development, supervisory training)
Type B: Advancement & Promotion

To provide employees the skills and knowledge they need to achieve promotion and
other career advancement goals (career development, staff development, staff education)
1 © 1997 Robert O. Brinkerhoff & Valerie Brown-reproduced with

pennission.
Type C: Organizational Capacity

To build the organization's capacity to perform in an uncertain future by giving
employees skills and knowledge they may need in future roles (development)
Type D: Orientation & Acculturation

To provide employees the knowledge and understanding they need to become oriented
to, identified with, and to join their organization, knowing its business, its culture, its
rules and policies, and so forth (employee orientation, organizational culture, business
orientation)
Type E: Employee Capacity

To increase employees capacity to handle stress, work healthily, cope with change, and
master the other personal resiliency skills they need to continue to grow, learn, stay
healthy, and perform effectively in the continuously changing world of their work
(development, staff development, wellness)
Type F: Leadership Capacity

To develop leaders in the organization and give them the skills and knowledge they
need to effectively formulate strategy and goals, and guide the organization and its
people (leadership development)
Type G: Personal Benefits

To provide employees learning opportunities to gratify their interests and develop skills
and knowledge they personally wish to acquire but that are not needed for current or
future performance (educational benefits, tuition reimbursement)
The ftrst category (skills and knowledge to enhance current job

performance) is probably the largest category, both in terms of effort expended
in organizations, and in the types of training subsumed. Safety training, for
example, ftts within this category, as safety rules and procedures are expected
to be used on the job. Emergency procedures training, or hazardous materials
handling procedures, fall in this category, as do training in machine operation,
manufacturing skills, and so forth. Management development and supervisory
training are also in this category, as they are aimed at providing skills and
knowledge that are intended to lead to speciftc competencies in job-related
tasks.
The ftrst category is often partitioned into what are called soft skills, which
popularly refer to skills in human interactions, such as communications, values
clariftcation, conflict resolution, and so forth. Production procedures, operator
training, computer skills, and otber concrete psychomotor sorts of skills are
often called hard, or technical training, skills. The practitioner will encounter
many labels and definitions. The distinguishing characteristic of tbe function of
the training in this first category is the intended benefit: improved job
performance, whether that improvement should be manifested immediately or
after a period of time.
The outcomes represented by Type B are similar in their focus on job

performance, except that the focus is a specific future job, versus the present
job. Thus, Type B efforts aim at helping employees identify career goals,
understand their capacity and capability, and acquire the competencies they
need to get ahead. Of importance in this category is tbat results may be delayed.
There are no intended changes in personal performance until the new job is
acquired, though immediate changes in enabling behaviors and abilities (such
as seeking out career information, increased self- awareness of capabilities,
increased knowledge of position requirements, for instance) may be expected.
What is typically being enhanced in Type B efforts is the capacity to perform,
versus current performance, witb the focus on a specific and known, though
new to that employee, job role.
Type C outcomes are likewise focused on future roles and positions.

However, there are two key distinctions. First, the organization is the primary
intended beneficiary, for the focus is to produce a workforce that will have the
skills and knowledge needed to compete and perform in the future. Thus, a
company that provides general technology skills in advance of an employment
of specific new technology, or foreign language training in advance of a global
expansion, is using Type C outcome thinking. The notion here is to hedge
against technology and market conditions on the horizon that are not currently
affecting jobs but may in the future. The second distinguishing characteristic is
that this sort of outcome does not have a known and specific job referent. In
Type B efforts, the job is new only to the person striving for it; otherwise its
skill and knowledge prerequisites can be fully assessed and described. In Type
C efforts, the future job roles are uncertain, since they do not yet exist.
The training outcomes of Type D and Type E efforts are driven by similar
assumptions: that these training results are needed by a broad spectrum of the
organization's employees. A familiar analogy for Type D training is the civics
curriculum once common in America's secondary schools. The civics
curriculum was derived from the core of knowledge that a student would need
to become a good citizen, and included some basic history, philosophy of a

democratic society, as well as general rules and principles of accepted social
behavior. Likewise, employees need, so goes the Type D logic, to become
familiar with a certain body of knowledge in order to become a good citizen of
the organization that they have joined. The Type D knowledge base is typically
drawn from two sources: the rules and culture of the employee's organization,
and the business or industry in which the organization is engaged.
Like orientation training, Type E learning is aimed at a broad spectrum of

employees, and consists of skills and knowledge that it is assumed virtually
everyone in the organization needs, regardless of their specific job role. Type E
needs are derived from assumptions about the core of personal effectiveness
skills that transcend job boundaries and, when mastered, help an employee
perform at optimum levels. Examples of such skills might be the ability to
manage time, to constructively accept feedback, to cope with stress, to accept
and constructively adapt to change, and so forth. Type E outcomes are
relatively generic when they are learned but would have variable and specific
application depending on an individual's particular job and psycho-social
characteristics and needs.
The Type F category refers to learning efforts that are intended to develop
leadership skills and capacities. Type F outcomes are driven both by immediate
needs for skills and knowledge by persons in current leadership positions, and
by the needs of the organization to have a pool of leadership talent from which
to draw to replace leaders and meet emerging needs for leadership at all levels
in the organization. Training structures, such as workshops and seminars, are
often accompanied by personnel performance review systems and other
processes to identify and groom potential and future leaders.
The final category, Type G, is typically not the focus of training evaluation
inquiry, since it is not intended to have much, if any, impact on performance.
This category is listed to capture the sorts of learning experiences that
organizations sometimes provide to their employees simply as a benefit, in the
same spirit that organizations might provide health insurance, child care,
exercise facilities, or a cafeteria. Along this line, some companies provide
learning opportunities strictly as a sort of giveaway to employees, such as
classes in how to control home finances, or even French cooking. The rationale
for the giveaway is that such learning opportunities increase the attractiveness
of the organization as an employer, serving the same function as other benefits
programs: to attract or retain employees. Also included in Type G are similar

learning opportunities that can be more closely, though indirectly, tied to job or
organizational performance. Examples might be classes in weight control or
how to stop smoking. The argument for organization sponsorship of such
opportunities is that they may benefit the organization in the long run, by
reducing costs of health insurance and health-related performance
impediments. Type G efforts are included in Table 7-2 only because they are
commonly encountered, but they are rarely to be considered training in the
sense that the other types are aimed more clearly and directly at improved
organizational and individual performance. In any case, Type G efforts are
rarely the focus of training evaluation inquiry; they would be more frequently
considered under evaluation efforts that fall broadly under the umbrella of the
entire human resources (personnel) function.
These, then, are the typically encountered and major sorts of efforts that will
be found under the rubric of training or other terms meant to represent the
function in organizations that provides for ongoing learning among employees.
As was noted, the function will carry different names in different organizations
and enterprises. The purposes served, as defined in Table 7-2, however, are
very consistent across organizations. There are, of course, marked differences
between any given organizations as to what extent they pursue each type of
training outcome and how they label and budget for that training.
The extent to which a particular organization dedicates its learning

resources to the different types of training outcomes is, in itself, a point for
impact evaluation. Needs for different training outcomes vary by organization
and enterprise. A computer company, for example, will have to spend
considerable resources to pursue Type C outcomes, implementing significant
training efforts to provide employees with skills in new technologies that are
just emerging, even though they will not be used for current tasks and
production. On the other hand, a fast-food business, such as McDonald's,
where technologies and production methods are relatively stable, but which
adds great numbers to its workforce frequently, will need to focus on Type A
(current job skills) training. Or, consider an organization with a stable and
mature workforce that expands its business through opening new branch
offices of significant size and complexity. This sort of organization and
business strategy calls for a sizable investment in Type F (leadership) efforts, to
assure a deep and consistent pool of leadership talent from which to select new
office managers. The sorts of training outcomes that an organization pursues
should be driven by that organization's strategy, its market conditions, and

other contextual factors, such as the demographics of its workforce and the
surrounding populations from which it hires employees.
Using the Logic of Training in Impact Evaluation
Impact evaluation intersects the logic of training in three vital ways. First,
evaluation can be applied to the logic itself: the logical basis of the impact
argument. There are several purposes that training may serve, and each is
driven by a different logic. Training evaluators can very usefully serve
organizations simply by surfacing and clarifying the logic of the different sorts
of training the organization supports, engendering discourse about priorities,
and helping to resolve misunderstandings about what sorts of results training
can and cannot achieve. Where evaluative data cast doubt on the validity of the
logic of training, decision makers can be engaged in discussions of whether or
to what extent the training should be pursued.
Valid logic, however, determines only the potential value of training. For
training to actually add value, the training process must be designed and
managed so that all phases of the logic-chain are effectively implemented.
Almost never are the immediate results of training included in what is
popularly conceived of a training impact. The specific learner gains in
knowledge and skill, or positive reactions toward the training, are not the
impact that is sought. Rather, it is some job or organizational performance
characteristic that defines impact, and the training is seen as an enabler of that
result.
Thus, the second focus for evaluation attention is to monitor the critical
series of events in the logic value-chain as the training is being implemented.
Where obstructions and obstacles are identified, clarified, and understood,
decision makers can intervene to put the value-chain back on track, enhancing
the likelihood that impact will, in fact, be achieved.
Recall that the discussion of Figure 7-1 included a description of the generic
sequence of events that training is intended to engender so that organizational
value (impact) can be achieved. This sequence is referred to in this chapter as
the training value chain, which includes the major events in Table 7-1.
Most often, the impact of training is defined such that it includes point 7,
improved job effectiveness, and point 8, enhanced value of products and
services, and/or point 9, organizational benefits. Points 4, trying out learning,
5, refining competencies through practice, and 6, performing more effectively,
are typically subsumed under the rubric of transfer of training or learning.
Because the impact points are so dependent upon the transfer points in the
value chain, it is often the case that impact evaluation efforts include an
investigation of transfer, even though the transfer alone is not construed as
impact per se.
In any case, readers should notice the obvious fact that training impact is not
achieved immediately and is greatly dependent upon subsequent points in the
value chain. Any of these subsequent points might break, threatening or
canceling the impact of training. Trainees might not, for example, learn the
skills in the first place because of a faulty learning intervention. Or, they may
forget the skills because of lack of practice. Or, they may remember them but
never find the time or opportunity to use them, or they may even be prevented
from using them by some job context factor, such as a policy that rewards non-
use. They may not use them because their peers do not, and thus they feel that
the culture of the workplace discourages them from using their new learning.
Or, they may not want to use them because they object to them, or think they
are not worthwhile, and thus may not learn them well in the first place. Or,
they may use their skills, but organizational priorities and goals may have
changed such that their improved performance has only very marginal value.
The training value chain is often fragile, in that the new learning calls for
new sorts of performance, which are likely to be overwhelmed by whatever old
ways of doing things prevailed prior to the training. The training value chain is
always threatened by factors other than whether the trainees learned the new
skills and knowledge in the first place. Competent performance is a complex
phenomenon, assailed by many organizational forces, and never caused solely
by the capability of the performer him- or herself. Thus, this second focus for
impact evaluation-studying the value chain during implementation to help
keep it on track-is a vitally important function.
The third focus for impact evaluation is retrospective, whereby the logic of
training can be reconstructed and analyzed after training has been
implemented. This can help decision makers determine whether and how much
impact the training value chain has led to. More important, this analysis can
help determine what factors and events during the history of the training
implementation facilitated impact, or what factors impeded or obviated impact.
This knowledge of impact-influencing factors can be of great help in designing
and planning further training, and in building the organization's capability for
getting continuously greater value from training endeavors.
The Mythology of Impact Evaluation
The popularity of the quest for impact evaluation is easily understood in the
competitive context of organizations. Because resources in the organization are
always limited, executive management is in the perennial role of stewardship:
deciding where and how the organizational resources can be best allocated to
achieve competitive advantage. The benefits of training are often marginal at
best, almost always hard to discern, and are frequently long term. It is no
wonder that the training function is a favorite target for the budget cutters'
axes. It is likewise no wonder that training professionals in these organizations
have longed for the sorts of hard data that would let them compete in the
budget process with production and other line areas of the organization. Access
to return on investment and other financial indices would justify their resource
outlays in terms of results the business readily recognizes.
But the very notion of evaluating the impact of training can, and often does,
beg some very fundamental questions. Of greatest concern is that the pairing of
the term impact with the term training implies that one should expect an
impact from training, which leads one to jump to the next conclusion that
training causes the impact with which it is paired. The quest for impact data
and other hard measures of training value can mislead the organization as to
the true nature and substance of training, and the process by which training
leads to organizational benefit.
Learning Itself Does Not Produce Impact
Training produces only capability, not performance. It is performance that

produces impact, not new learning alone. Imagine that a company has rightly
determined that it could save money and achieve greater production if its
production machines were more consistently maintained, leading to greater
efficiency and less down-time. To implement the new maintenance procedures,
the company trains its production staff in how to effectively apply the new
maintenance procedures. Now, imagine that the production staff, in fact, learns,
then uses, the new procedures; the machines stay operative longer, and
efficiency increases while costs (due to repairs, etc.) decline. Now we have
impact. But, let us be clear what caused the impact. It was not the learning but
the use of the learning (performance) that led to the impact. Imagine, on the
other hand, that no one had used the learning, and the maintenance procedures
were never implemented. Clearly, in this case, there would have been no
impact.
If it is performance (the effective application of learning in important job

tasks), not learning (the acquisition of capability), that creates impact as it is
construed in the accepted understanding of impact evaluation. It is then
misleading to promote evaluation of impact of training without careful
integration of the concept of performance.
The Complex Causes of Performance
Performance is affected by many factors other than the sheer capability of the
performers. As performance analysts and researchers have pointed out
(compare Gilbert, 1978; RummIer & Brache, 1994), performance is shaped by a
complex combination of systemic organizational factors. This complexity of
interaction between performer capability and performance system factors can be
readily discerned and discussed in a typical example, as described in the case
that follows.
Table 7-3. Analysis of performance complexity in a hypothetical investment

advising company-A case example
Setting
Shurwin Investment Advisors, a large investment advising company, seeks to increase
customer satisfaction, thereby retaining customers for longer periods of time and
reducing its cost of sales. The company has determined through research that a critical
element in long-term customer satisfaction is the quality of the service it provides
customers in an initial investment analysis meeting between the customer and an
investment advisor. As a part of the process for improving this critical component of
service quality, it has purchased a new investment analysis software program.
The Training Logic

Investment advisors need the capability to use the new system, in the form of skill in
using the system, and knowledge about how the system can be used in the analysis
interview. Training interventions will provide advisors with the new skills and
knowledge. Then, advisors will employ their new capacity in initial analysis interviews
with customers, the analysis plan based on this interview and provided to the customer
will be more accurate, complete, and useful, and thus customers will be more satisfied.
This is the logic of this particular training effort, and it has been validated through
proprietary company research.
Additional Systemic Performance Factors

Whether a customer is more or less satisfied after the initial interview, and whether the
intended training truly contributes to the longer-range organizational goals, depends on
a myriad of factors. Here are just a few:
• The marketing effort may not bring the right sort of customer into the interview;
thus, the analysis driven by the software will not be right for that customer.
• The software design may be too complex, causing advisors to abandon it in favor of
their old, more comfortable approach.
• The investment advisors may feel pressure to increase sales, causing them to push
too hard for a sale of products beyond the initial analysis, in turn making the
customer react negatively to high-pressure tactics.
• Peers who have not yet been trained may exert subtle peer pressure to adhere to
current rather than new expectations for successful behavior.
• The analysis software may include examples not appropriate for the cultural
diversity of some customers, thus offending rather than satisfying them.
• The software training may be delivered too early. What advisors might really need
is more skill in putting customers at ease and listening more closely to their
concerns before they try to use the software-driven analysis scheme.
• Supervisors may feel the software is too high tech, and encourage their advisors to
use softer, low-tech interactions instead, like the methods these supervisors have
found successful in the past.
• Advisors might hear from others that the software is not effective, and enter the
training with a bad attitude toward it, keeping them from learning how to use it.
• Some offices may not need the training, since their advisors already have achieved
100% customer satisfaction with an alternative initial interview method.
• Other offices may not need the training now because their problem is retaining
customers in latter service stages, or planning to serve emerging markets, not
increasing the satisfaction of new customers with the initial interview.
It is clear that the intent of the training in the case was to add value by
increasing customer satisfaction with the investment analysis provided in the
initial interview. But anyone of the factors (or many others) could keep the
training from having a positive impact on customer satisfaction. These factors
exemplify a problem with which virtually all training practitioners are familiar.
People may have the skills and knowledge needed to perform more effectively
but may not do so because of failures in the performance support system:
insufficient or missing incentives, peer pressure to maintain old behaviors, a
lack of supervision, inadequate tools and resources, unclear performance
specifications, vague direction and lack of feedback, to name a few.
The complexity of the performance equation makes it abundantly clear that

any effort to assess the impact of training, where that impact is construed to
mean organizational results and benefits, must cope head on with the fact that
training-the acquisition of capability though increased competence-is only a
partial player, and most likely a bit player at that, in the overall drama of
effective performance and business results. In the practice of training
evaluation, there have historically been essentially three strategies for
addressing the performance complexity issue. These are 1) use statistical and
research design methods to isolate and quantify the partial effects of training
vis-a-vis overall impact; 2) assess the overall impact of training and report
these results, essentially ignoring the complex causal nexus; 3) assess only the
immediate results of training (learning), using the complexity reality as a
rationale for not assessing impact beyond this point.
A fourth strategy, which has begun to emerge (Brinkerhoff & Gill, 1992), is
to adopt the concepts and procedures of quality management, or TQM (total
quality management), and apply these to impact evaluation.
These four strategies are discussed briefly in the next section of this chapter,
and some closing recommendations are made.
Impact Evaluation Strategy
The difficulties inherent in evaluating training have vexed training

professionals for decades. For evaluation researchers and other academics, the
problems of determining the effects of learning and training on performance
have been a boon, spawning a number of journal articles and research studies.
For the training practitioner with bureaucratic accountability for the effective
operation of the training function, they have been a nightmare. On the one
hand, as the opening section of the chapter implies, they have had to cope with
ambiguity and confusion surrounding the expectations for training results, and
the range of organizational purposes that reside under the training umbrella.
On the other but related hand, these practitioners are pressured to assess and
prove their worth, and have looked to evaluation to help them do this. As many
surveys have shown (for example, Carnevale & Schulz, 1990), practitioners
have done reasonably well in assessing their trainees reactions to the training
they receive. But, reports of successful evaluation of the indicators of training
impact, as these are defined further away from the immediate learning along
the training-performance value chain, decline dramatically.
This final portion of the chapter is intended to shed some light on the
strengths and weaknesses of the currently available impact-evaluation
strategies, and to make some closing recommendations for further development
by researchers, and applications for practitioners.
Strategy One: Assess the Partial Effects of Training
This strategy honors the fact that training is but a single element in the
complex nexus of actions to improve performance and achieve organizational
results. Several approaches are spawned by this strategy, yet all are essentially
complicated and require access to controls and resources that are typically
beyond the reach of the typical training practitioner.
The evaluation methods available within this strategy all derive essentially
from the experimental paradigm as it is applied in behavioral research
(Kerlinger, 1973, among many), which would consider training as the main
effect under inquiry and assign all other performance influencing factors the
status of extraneous variables. The training evaluator would then seek to
construct an evaluation design that would control and account for, either
through manipulation of the conditions of the evaluation test or through
statistical methods, these extraneous variables. One might, for example,
randomly assign one group of employees to a training intervention and another
group to a placebo treatment, and then compare the later job production and
effectiveness of these employees. Another similar approach might measure a
group of trainees on many of the known extraneous variables and one or more
impact variables, and then statistically control (through regression analysis, for
example) and account for the partial effects of each of the related variables,
seeking to isolate and describe the effects of the training variable alone.
The experimental design methods have been used far more in the research
arena than by practitioners in applied settings, and have yielded important,
though largely inclusive and incomplete, information about training
effectiveness. (For an excellent summary of training research, see Tannenbaum
& YukI, 1993.) Further, some of the utility methods used by researchers have
shown interesting but limited applicability for practitioners (Mathieu &
Leonard, 1987 ).
Methodologically, the experimental design strategy approaches are that it is

far better suited to laboratory and occasional controlled experiments and studies
in applied settings. The methods are too complex for everyday practitioner use,
and are too expensive, diverting precious training resources from more direct
application. Strategically, the experimental design methods must be seriously
questioned. They are based on the assumption that it is not only possible but
desirable to single out and isolate the effects of training, and that other
performance factors are extraneous. The mental model that these assumptions
spawn is diametrically opposed to the performance management approach,
which argues that the factors of the performance system need to be managed as
an integral system.
Strategy Two: Measure and Report Impact Results Alone
This strategy is implicitly reinforced by the outdated but popular and durable
Kirkpatrick Model (Kirkpatrick, 1975) of training evaluation that posed four
levels of training results: 1) reactions of trainees, 2) learning of trainees, 3)
usage of learning by trainees (transfer) on the job, and 4) organizational
benefits. The great strength and contribution of the Kirkpatrick model is that it
focused management attention on the longer-range purpose of training
(impact), and gave practitioners an apparent hierarchy along which to pursue
and focus their evaluative efforts. Its greatest weakness is that it promoted the
simplistic and false view that training could somehow lead directly to
organizational benefit (level 4) without a great deal of other effort. It also
promoted the belief, appealing intuitively but unsupported by research
(Tannenbaum & YukI, 1993), that there are causal connections from level to
level.
Because of the intuitive appeal of the Kirkpatrick model, and its timely
introduction to the training field, it has spawned a push for level 4 evaluation
studies. As it is largely practiced, level 4 evaluation studies primarily seek
impact data, and make only superficial efforts to study the larger causal context
of the performance that underlies the impact results. A typical level 4 study
design might, for example, implement a survey of all past attendees of training,
many months after they have attended training. It would ask these trainees (or
their supervisors) to report instances of application of the training, and then
further ask the respondents to estimate the impact and value of these
applications. Survey data would be aggregated, and the instances and
proportions of successful impact would be defined and reported. The strength
of such approaches is that they are relatively simple and inexpensive. They are
also almost certain, for a variety of reasons, to tum up seemingly positive and
credible instances of impact, which when they are found, the training function
is usually quick to claim credit. Relatedly, it is also a strength of these methods
that, when insufficient evidence of impact is found, there is also a handy escape
route available to avoid blaming the training itself for the failure, as the
evaluators can cite the uncertainty of the findings in light of the unknown
effects of alternative causes.
Recently, methods that purport to assess the return on investment (ROI) of

training have gained popularity among training practitioners. The appeal of
these methods is that they produce relatively convincing data about presumed
training impact and value. To be sure, these methods, and especially those
thoughtfully constructed by Jack Phillips (Phillips, 1994), have served training
leaders well, as they have struggled to defend budgets and justify training costs.
There are, however, several serious issues with the ROI methods.
Primary among the problems with the ROI methods is their failure to deal
with the complex causality issues. Further, the pursuit of ROI evaluation also
introduces confusion as to the proper goals of training. According to the ROI
process, a positive ROI (a return greater than the costs of training) is equal to
success. That is, training that has a calculated ROI greater than its costs
indicates that the training is good, and a calculated ROI lower than the costs of
training indicates that it is bad. It is fully possible, of course, that training
might result in dollars saved in some area of a company but that the cost
savings have virtually no strategic or business value. Further, when this is the
case, the net ROI is negative, since whatever was spent on the training and the
ROI study should have been invested where the business needed it most. The
purpose of training is not to save money! It is to help improve performance in
parts of the business that are crucial for strategic effectiveness.
Last, the ROI methods require evaluators to isolate the effects of training. As
we know, the effects of training should be integrated with, not isolated from,
the performance system. Methods that seek to claim training credit for impact,
and not recognize the vital contributions of other players in the performance
process, are divisive and exacerbate political isolation of the training function.
Further, training functions are already isolated, physically in the geography of
the organization, and conceptually in the minds of line management, from the
mainstream of business performance. Evaluation methods that separate training
more are probably best avoided.
Strategy Three: Evaluate Only Immediate Resufts of Training or Other

Nonimpact Variables
This is the principal de facto evaluation strategy of the majority of training

departments and practitioners. According to surveys of training practice
(Carnevale & Schulz, 1990), there is a very high rate of measurement of the
reactions (e.g., trainee satisfaction) of trainees to the training events and
procedures. There is also widespread assessment of training department
operations variables, such as the numbers of hours of training provided, costs
per hour of training, and so forth. The implicit assumption of this strategy
seems to be that it is too difficult and costly to measure much more than this,
and that the data that are reported should suffice. This approach is undeniably
popular, and one could conclude from its frequency and longevity that it is at
least partially successful. But, while it may have sufficed over time to reduce
pressure for more credible evidence of impact, it has very probably contributed
to the compounding effect of lowering senior management's estimation of the
business value of training. If one subscribes to the notion that organizations
measure what is important to them (they count what counts), then one could
implicitly conclude that the training function is seeking to satisfy trainees, and
that is about it. If this is true, then senior management should expect little more
from the internal training department than employee satisfaction, and when it
needs real business results, it should get it somewhere other than from the
training function. If this observation is true, it is little wonder then that training
managers complain that they have difficulty in getting real commitments and
participation from line management in the training process.
Strategy Four: Pursue Impact Evaluation as a Quality

Management Process
The quality management concept as it has been defined by Deming and others
(Deming, 1986; Crosby, 1979; Juran & Gryna, 1980; Taguchi & Clausing,
1990) fits the training evaluation context very well. Quality management
stresses customer-oriented definitions of quality, systems thinking, process
analysis, and process measurement, all of which have are responsive to the
context of training, with its inextricable interweaving into the fabric of the
complex performance management and improvement system, and the resultant
multiplicity of stakeholders and customers.
Older (pre-total quality management era) approaches to defining quality

were driven by technical quality specifications, and significantly, are very
parallel to the current mental models that typically guide evaluation of training
practice today. Specifications for quality in automobiles, for example, were
defined by automotive engineers and production executives who based their
thinking on automotive design principles and manufacturing process controls.
Then, quality was inspected at the end of the production process by assessing
the final product against the specifications set for it. Similarly, in training,
quality specifications are largely driven by training and instructional design
experts, and training results are assessed against these expectations for
achievement of learning objectives, adequacy of learning designs, and the
translation of learning by trainees into practice routines prescribed by the
training designers.
One key and defining distinction of the quality management approach is that
the customer is the arbiter of quality, and quality is functionally defined as
being fit for use by the customer. Thus the first step in quality definition is to
study the customers and determine their needs and expectations. These needs
and expectations are then translated into product and service specifications,
which in tum drive the design of production processes.
Applied to training, the quality perspective forces the question of who is the
customer of training. Where trainees themselves are implicitly defined as
customers, quality specifications tend to represent short-term characteristics of
the training intervention itself, such as whether the training venue is
comfortable, whether the training leader is entertaining and supportive, or
whether the trainees find the content relevant to their perceived needs and
experience. On the other hand, when training customers are rightfully viewed
as the immediate and up-line managers of the trainee, then the quality picture
changes considerably. The trainees' managers are interested most in training
that helps their employees perform more effectively, thus helping them and
their senior managers achieve business performance goals. Quality
specifications for training driven by the ultimate customers of training-the
managers who pay for it-tend to focus more on the results and value of
training in terms of the business, thus helping evaluation gain more leverage
for assessing and increasing impact.
A vital second defining characteristic is the immersion of quality

management processes into a systemic framework. By this thinking, customer
quality is achieved not by the single results of one organizational unit but by the
integrated cooperation of the several functions of the organization into a
coherent process. This is clearly much closer to the reality of training impact
than is the view that seeks to isolate effects of training. Systems thinking
applied to training helps identify the critical performance support systems and
factors (as exemplified in Table 7-3) that impinge on and interact with learning
acquisition processes. Because so many of the factors that influence training
impact are outside the scope of control of the typical training function and
effort, this systems thinking framework can help identify the nontraining roles.
Process analysis in quality management prescribes that the entire value-

chain (the sequence of subprocesses that add value to customer products and
services) be identified and analyzed. High leverage steps can be isolated and
then carefully measured and monitored while the process is operating, versus
assessing quality at the end of the value-chain. This approach uses
measurement built into the production process, with the intent of producing
nothing but quality results, rather than using quality measurement at the end of
the process to inspect results, and then sort out those results that do not meet
quality specifications.
The concepts of quality management can be very usefully employed in

training evaluation thinking and design. Brinkerhoff & Gill (1992) initially
presented an evaluation framework based on the principles of total quality
management. This framework was later developed into a comprehensive
approach for managing the training (Brinkerhoff & Gill, 1994). The
Brinkerhoff and Gill approach presents a comprehensive approach to defining
and managing training in organizations that incorporates evaluation as a core
process but does not singularly focus on evaluation. Rather, it is an evaluation-
based framework for managing and improving the impact of training.
Tony Newby's (1992) book on evaluation defines an evaluation process that

is tied to a fairly comprehensive training development, delivery, and support
process. Newby's work is noteworthy for its emphasis on the strategic role of
evaluation, and he echoes the Brinkerhoff & Gill (1992) call for evaluation as a
method to reshape organizational thinking about training. He notes that most
training is approached and evaluated in a manner that fails to demonstrate
business value, and thus perpetuates a vicious circle of decreasing expectations
and respect for training in the organization. While Newby strongly urges that
evaluation be aimed at linking training to the essential values and goals of the
organization, he does not provide much direction for how this can be
accomplished.
In sum, the notion that training evaluation can be profitably viewed from the
perspective of quality management is a sound idea, and represents a potentially
powerful strategy. The previously described strategies (one through three) may
all lend helpful methods, and may at times be useful in temporary situations,
such as conducting a quick study of ROI when the training department is under
attack and must employ emergency procedures to defend its budget. But none of
these previous strategies represent a viable long-term strategy for impact
evaluation. Above all, the goal of impact evaluation (any evaluation, for that
matter) is learning. Most organizations do not get the impact they should from
training. Constructive and thoughtful impact evaluation, using a process-
oriented approach as described here, can be a powerful force for building
organizational capability, through learning, to achieve increasingly more
valuable results from training resources.
Summary
The issue of training impact is complicated by the several and varied purposes
to which organizations direct employee learning and education. These purposes
center on, but are not exclusive to, achieving business results and value through
the improvement of employee performance. Even when training impact is
clearly defined as performance improvement that achieves business goals,
training impact is subject to myriad forces and factors that impinge on job
performance and the value of job results that range well beyond the relatively
small contribution training alone makes.
It is very clear that training may ~and probably often is-needed to

implement improved performance. But it must be equally clear that training is
not the cause of the performance improvement, and therefore training alone
cannot lay claim to impact. It is also vitally important that the argument of the
extent to which training is a cause of impact is not simply academic and
methodological. It has very real and critical manifestations in the way that
training is perceived, valued, defined, and managed in organizations.
When the belief in the myth of training as the cause of impact is strong, it is
very likely that training will have only marginal impact. Since insufficient
quality management attention will be devoted to the many performance factors
that interact with learning to produce performance improvement, and hence
business results (impact). On the other hand, when training is viewed as one of
several means to performance improvement, and when performance
improvement efforts are managed based on a comprehensive understanding of
the systemic underpinnings of organizational and individual performance, then
training is very likely to be a powerful tool for business success.
Impact evaluation cannot prove the training worked. As we have seen,

whether it works means whether business results of the performance
improvement processes to which training is linked have been positively
affected. This impact is beyond the scope of training alone. The more
reasonable charge for evaluation of training impact is to prove that the training
solution fits the performance needs, and that the training produces an effective
and efficient contribution to the performance goal. To respond to these issues
requires a systemic view of training as a process, not as an event.
This larger systemic view that characterizes approaches to quality

management will dictate that impact evaluation focus on a number of
contributing factors, and interpret these in the context of the final results
achieved, as well as the performance processes that drove these results. When
evaluation tracks and reports the factors that impinge on impact, it helps
assure optimal impact from learning resources. When evaluation identifies and
describes breakdowns in the learning-performance value chain, it helps
stakeholders understand and better manage their contributions. When impact
evaluation uncovers and documents impact, it helps all of the players in the
complex learning-performance process recognize and celebrate their success.
References
Brinkerhoff, R. 0., & Brown, V. (1997). How to evaluate training. Beverly Hills, CA:
Sage.
Brinkerhoff, R. 0., & Gill, S. J. (1994). The learning alliance. San Francisco: Jossey-
Bass.
Brinkerhoff, R. 0 & Gill, S. J. (1992). Managing the total quality of training. Human
Resources Development Ouarterly. 3 (2), 121-131.
Brinkerhoff, R. 0., (1987). Achieving results from training. San Francisco: Jossey-Bass.
Carnevale, A., & Schulz, E. (1990). Evaluation practices. Training & Development
Journal. 44(S), 23-29.
Crosby, P.B. (1979). Oualitv is free. New York: McGraw-Hill.
Deming, W.E. (1986). Out of the crisis. Cambridge, MA: MIT Press.
Gilbert, T.F. (1978). Human competence: Engineering worthy performance. New York:
McGraw-Hill.
Juran, 1M. & Gryna, F.M. (1980). Qualitv planning and analysis. New York: McGraw-
Hill.
Kerlinger, F.N. (1973). The foundations of behavioral research (2nd. ed.). New York:
Holt, Reinhart, & Winston.
Kirkpatrick, D. L. (1975). Evaluating training programs. Alexandria, VA: ASTD Press.
Mathieu, 1., & Leonard, R. (1987). Applying utility concepts to a training program in
supervisory skills. Academy of Management Journal. 30, 316-335.
Nadler, L. (1980). Corporate human resources development. New York: Van Nostrand
Reinhold.
Newby, A. T. (1992). Training evaluation handbook. San Diego, CA: Pfeiffer.
Phillips, J. J. (ed.) (1994). The return on investment in human resources development.

Washington, D.C.: ASTD Press.
RummIer, G., & Brache, A. (1994). Performance improvement (2nd ed.) San Francisco:
Jossey-Bass.
Taguchi, G., & Clausing, D. (1990). Robust quality. Harvard Business Review, 68(1),
66-75.
Tannenbaum, S., & YukI, G. (1992). Training and development in work organizations,
Annual Review of Psychology. 43,399-441.
About the Author
Robert O. Brinkerhoff is an internationally recognized expert in training

effectiveness and evaluation. He has provided consultation in training
effectiveness and evaluation to dozens of major companies and organizations in
the United States and around the world. Author of nine books on evaluation
and training, his most recent work is The Learning Alliance.' Systems Thinking
in Human Resources Development. Rob Brinkerhoff earned a doctorate at the
University of Virginia in 1974, and is currently Professor of Education at
Western Michigan University, where he coordinates graduate programs in
human resources development.
8 FORMATIVE EVALUATION
Wilbur Parrott
Introduction
Formative evaluation is the review and revision of all parts of the instructional
design process from the needs assessment through the development and field
testing of the materials. Although the concepts and methods have been with us
for 30 years, there are signs of that it is being "rediscovered" by a new
generation of educators.
Formative evaluation is not a new methodology. Beginning in the late 1960s

a number of scholars saw the necessity for formative evaluation. This group
included Scriven (1967), Stake (1967), Stufflebeam (1968), and Cronbach
(1975). They are most often remembered for defining the terms formative and
summative evaluation and the distinction, or lack of distinction, between them.
However, it would be better if we remembered their contributions in making
evaluation central to the design and development process.
The chapter begins by defining formative evaluation and contrasting it with

the purpose and methods of summative evaluation. Some general principles
that apply to all phases and methodologies employed in formative evaluation
are then discussed. Specific methodologies used at different phases in the
formative evacuation process are described in more detail, and practical
guidelines suggested for their successful implementation. The methodologies
include Design Review, Prototype Review, and Prototype Testing. The chapter
concludes by considering why formative evaluation does not occur more
frequently and highlighting the opportunities this methodology presents.
168 Formative Evaluation
Formative and Summative Evaluation
A discussion of formative evaluation is incomplete without a brief discussion of

summative evaluation. Table 8-1 summarizes the distinctions between
formative and summative evaluation.
Table 8-1. Distinction between formative and summative evaluation
Formative Evaluation Summative Evaluation

Definition Review and revision of all Evaluation method that measures
parts of the instructional to what extent the training solved
design process while the the problem(s) identified in the
instruction is being designed needs assessment
and developed
When it occurs From the beginning of the Typically three to twelve months
instructional design process after the student completes the
(needs assessment) until the training
materials are completed
Methods Reviews Questionnaires

Observation Interviews
Pilots/field tests Tests
Performance data
Business data
Summative evaluation serves several purposes. It provides information to

decision makers, including the client who funded the project. Was it a wise
investment? Should it be maintained, modified, or discontinued? Summative
information also provides the training development and delivery teams with
valuable information about the strengths and weaknesses of the course or
training program. This type of information, in addition to providing specific
feedback about a particular product or program, can often be used to identify
systemic process issues which, if addressed, can produce overall improvements
in the quality of products and the efficiency with which they are produced.
Sometimes a systematic review of summative data will identify design flaws,

technical inaccuracies, or misidentification of target audience needs and
preferences. These are the kinds of discrepancies or deficiencies that, had they
been discovered and addressed while the training was being designed and
developed, would have produced a very different set of summative results. The
cost of producing inappropriate or useless training, both in human terms and in
dollars, argues loudly for more attention to that often-neglected element of
evaluation to which we now turn our attention: formative evaluation.
General Principles
Formative evaluation is an iterative process that should be embedded within the

design/development cycle. The principles and processes described in this
chapter are pertinent, in varying degrees, to training that utilizes a variety of
methods and delivery media. Each method and media has an additional layer of
specific principles. ,
Review Constantly
The developers should constantly be testing their ideas with colleagues and
peers. A review may be a formal meeting with agenda and sign-offs. It may also
be a discussion over a cup of coffee. Formative review almost becomes a
mindset, a willingness to share ideas and receive feedback. The sooner feedback
is received, the easier it is to incorporate it into the design of the project. One
last point on the necessity for continuous review should be noted. The longer
people hold on to an idea and polish their material, the more it becomes a part
of them. As a result, they are more likely to resist changes.
Review from Different Perspectives
To be most effective, formative evaluations need to include reviewers who have

multiple perspectives and skill sets. Instructional designers, subject matter
experts, the client, prospective recipients of the training and, if appropriate,
instructors, are all people who should be included in one or more stages of the
formative review process.
The role of an instructional designer is to ensure that the product is

educationally sound. That means that it is efficient and effective.
Subject matter experts are authorities on the content. They bring depth of
knowledge and experience, often at a level beyond what will be taught. Subject
matter experts typically do not have a background in education or training.
The client is the person who funds the project. It is critical that the client be
involved in decisions that affect the outcome of the project and that, when
compromises have to be made, they serve the needs of all parties involved,
including the client.
Prospective students are usually not included in early stages of the

formative review process. However, as the project progresses through each
subsequent stage, they become more and more critical.
If the training will be lead by an instructor or facilitator, someone who has

taught the same or a similar subject may be able to offer insights based on
experience.
In some cases an individual may have more than one of the skill sets and
perspectives mentioned above. For example, your client could also be a subject-
matter expert. When that is the case the person should review the material
more than one time, concentrating on the perspective of each skill set.
Develop a Network
The continuous review process can make people feel vulnerable. Inviting
criticism is a hard thing to do, and hearing the feedback can be even more
difficult. Yet good instructional products just don't happen. In all likelihood
they have undergone continuous review and revision. For this to occur on a
consistent basis, you need to develop a network of people who trust and respect
one another and are willing to give and receive feedback as mutually supportive
professionals.
Some people in a review network may not be on the project team, but they
may be knowledgeable about instruction or the content. When building a
network, remember that this is time consuming and often extra work. It may
not be a technically required part of that person's job. This means that for
networks to be effective, they must become a two-way street. You must also
review the work of your colleagues. If it becomes a one-way street, it will not be
sustainable.
Design Review
Most people would subscribe to the truism "You can't get there if you don't
know where you are going." The design phase in the development of an
instructional project determines where you are going and how you will get
there. Yet, some people see the whole design process as just something to get
over, an impediment to doing "real work." In the United States, course
developers typically spend 10 percent of a project on design and 90 percent on
implementation and revision. In other cultures, particularly in Japan, those
proportions are nearly reversed.
What Is a Design Review?
A design review is an examination of the plans made for the instructional

product before the product development begins. If you are following a basic
instructional design model, the needs assessment, audience analysis, task
analysis, objectives, assessment or testing strategy, media selection, and
instructional methods strategy would have been completed. The review is the
last thing that occurs before development begins.
The design review should be a formal activity, a gate that a designer or

project team must go through. Informal review should have been occurring as
every part of the design was researched, thought about, and documented. The
design review is the phice to share the results of those preliminary design
activities. The result of the review is agreement on the design of the product
and the plan that will be used to guide development of the materials.
What Should Be Reviewed?
Table 8-2 summarizes the elements and highlights critical points to consider
during the review.
Table 8-2. Elements of a design review
Element Important Review Considerations

Title Descriptive, sets the right expectations and tone.
Needs Clearly states what people should be able to do at the end of

analysis instruction. Identifies the gap between current performance and
desired performance.
Instructional Clearly states performance(s) required to fill the gap.

goals and
objectives
Task analysis Defines skills, knowledge, and behaviors related to each of the
instructional goals, at the same level of detail that will be taught.
Audience Inclusive and descriptive of audience characteristics, experience,

analysis learning preferences, and level of motivation.
Prerequisites Assumptions about audience skills and knowledge level clearly

stated.
Evaluation Clearly articulates how student reactions and learning will, or will
strategy not, be assessed; identifies any posttraining evaluation activities.
Instructional Describes instructional methods as they relate to instructional goals;

strategy provides sufficient detail to guide subsequent development
activities.
Media Describes media or combination of media, related to instructional

goals; takes into consideration audience characteristics.
Project Addresses all issues that will affect the project: schedule, budget,
management roles, responsibilities, and working norms for the team.
-.
The review should include several discrete items that were developed during
analysis and design. This list is far from exhaustive. For more detail refer to
any standard instructional design text. In addition to reviewing each item,

consider the interaction between them and the overall effect of the design.
Process Guidelines
The process for conducting the review can affect the outcome. Following are
some suggestions for conducting effective design reviews.
• Include the right people and set clear expectations; usually the entire team
and the client should attend.
• Send out prereview materials early; point reviewers to specific items that
need to be more fully addressed at the meeting.
• Conduct the meeting in a manner .that is congruent with culture of the
environment and the size of the project; it may be an informal discussion
around a table or a formal presentation.
• At the conclusion of the meeting review decisions and outcomes; gain
agreement on the design and on roles and responsibilities.
• Document all decisions and distribute them to everyone that was present.
Consequences of a Poor or Incomplete Design Review
If a design review is not done, is poorly executed, or comes to the wrong

conclusions, the consequences can be disastrous. If there is no solid plan to
guide the work, the entire development effort is at risk, particularly when it is a
team effort. The majority of instructional development projects now require a
team to put it together; this is particularly true for multimedia projects.. The
lack of a shared design plan makes it difficult, if not impossible, to work
together effectively as a team.
Decisions that are not made in early stages will pile up later. At the time of a
design review, people will often say, "Oh, we don't need to worry about that
now. We can decide on that later." What they are really saying is something
more like, "I don't know the answer" or "I can't (or will not) figure out the
answer." Or they may be avoiding making a tough decision. In any case, the
necessity of the decisions or their importance does not go away. They will keep
coming back to haunt the project team until they get answered. If
procrastination continues, a number of the decisions may have to be made at
once. The later the decisions get made, the less time and resources are available
to implement them. Bite the bullet and make the tough decisions as part of the
design review phase. Some projects manage to seemingly never answer the
hard questions posed as part of design. This is an illusion. They get answered.
The answers almost always make no one happy.
Prototype Review
Prototype review refers to the process of reviewing each prototype or draft

module as it is developed. Prototype review is not something that usually occurs
only once. It is a series of reviews of the same materials in an evolutionary
process. Initial prototype reviews are done with rough drafts of the first or
second significant modules: all of the ideas may not be in place, some examples
may still be missing, some of the illustrations may be rough. There are two
reasons for using early, rough materials.
First, if you are at an early stage, it is possible to make substantial changes.

As the process goes forward, it becomes more difficult to do this. There is often
not enough time. A second problem is psychological. The longer people work
on the material, the less likely they are to be willing to make substantial
changes because they have invested too much of themselves in it. Another
reason for reviewing material early is that the first module becomes a template
for later modules. If the material is at the right level of detail, is understandable
by the audience, is easy to follow, and is technically correct, you have valuable
information to guide the development of subsequent modules.
A ModeJ for Reviewing
A useful model for reviewing modules was devised by David Merrill. The
model, called the Instructional Quality Inventory (Montague, 1983), addresses
issues of consistency and adequacy.
Consistency focuses on the degree to which the elements of design are

reflected in the instructional material under development. For example, is the
technical level of the material consistent with the technical level of the
audience described in the design document? If the design review indicated that
there would be many exercises and practical examples, is this what is reflected
in the materials? Review for consistency is mainly the province of the
instructional designer.
After modules are reviewed for consistency, they should be reviewed for
what Merrill calls adequacy. This refers to how well each element of the design
is executed. Are there clear instructions for the exercises? Is the material
technically correct? Instructional designers, subject-matter experts, instructors
and potential students, each focused on a different element of adequacy, can all
provide useful information at this stage. Subject-matter experts can assess the
technical accuracy of the material. Students can provide feedback on clarity and
understandability. Instructors can tell you a great deal about structure and flow.
For instructional designers, the important questions are around the
effectiveness and efficiency of the instruction.
Adequacy only matters when the parts are consistent. For example, unless
the multiple-choice test items in an end-of-module test match the materials
found in the module, it doesn't matter how well the test items are written.
Because of this, examining consistency before adequacy is important.
Process Guidelines
The more people that review the materials, the better the chances for achieving
a high-quality product. One person may be able to review from two
perspectives, for example an instructor and a subject-matter expert. This is
acceptable as long as they go through the materials twice and do a review from
each perspective. The following guidelines help ensure useful and timely
feedback from reviewers
• Make personal contact with each potential reviewer to ensure participation.

• Let them know what is coming and when it will arrive so they can plan
time for the reviews.
• Give people specific directions about what to look for rather than saying,
"Please review this for me."
• Send out materials in small manageable quantities, 20 pages rather than
100.
• After people have returned their comments to you, get back to them. Tell
them what you changed or left alone and why. After they made the effort,
people want to know that you did consider their suggestions. This will also
make them more willing reviewers in the future.
Prototype Testing
This is the point at which instructional materials are tested in a context that is
similar to the real setting in which they will be used. Prototype testing may
occur in several stages. You may flrst test the materials with colleagues, then
with a group of people similar to the audience, and flnally with the actual target
audience.
The methods used for testing will differ based on the format of the materials.
Some general guidelines are discussed for trying out self-paced materials and
for conducting an early module teach for an instructor-led training activity.
Individual Tryouts for Self-Paced Material
Self-paced material is instructional material that is presented without the

guidance of an instructor or facilitator. It includes print workbooks, computer-
based training, and video-based instruction. Sometimes an entire course or
program is self-paced. Sometimes self-paced materials are included in
instructor-led courses or seminars.
The basic methodology for trying out self-paced materials is to have

participants go through the materials individually with an evaluator present. As
the students go through the materials they are encouraged to think aloud. That
is, they should say what they are thinking. This might include reading the
materials, answering questions to exercises, noting when they are unsure about
what to do next, stating when something in the content is not clear, expressing
uncertainty about what they see on a computer screen, saying when they are
surprised about what happened next. The evaluator should have a print copy of
everything that the student sees, such as a copy of a workbook or a print copy of
the storyboards for computer-based training. Without judgment, the evaluator
records what happens.
The problem with this methodology is that it is exceptionally time

consuming. Because of time limitations, it is typically done with only a small
number of potential students. However, it often takes only a few tryouts for
patterns to emerge in the data collected by the evaluator. An excellent source
for a specific and highly detailed methodology relevant to this type of
evaluation can be found in Structured Walkthroughs by Edward Yourdon
(1989). The book was written for walking through computer software
programs, not instructional products. However, the methodology that he
describes is exceptionally useful and can be applied to an educational setting.
Early Module Tryout for Instructor-Led Materials
After the first significant module of an instructor-led course has been drafted
and reviewed, it can be tried out in a mini-teaching situation. Only a small
number of people are needed. It is better if they resemble the target audience,
but if they don't, the characteristics of the target audience can be described to
them, and they can be asked to play that role.
Typically, the course developer teaches from the draft materials, using them
in the same way they are intended to be used in final course. If possible, an
evaluator or observer should be present to record behavior, questions, delays,
and problems.
Following the teaching portion, analyze the results from three perspectives:
1) student achievement, 2) student perception, and 3) instructor perception.
Assessment of achievement can be accomplished with a short test or simply
through a discussion about what people learned. Student perceptions can be
assessed in a questionnaire and through a structured group discussion. The
instructor can summarize what went well, what did not, and identify how
changes in the materials would make the content easier to teach. This
information, coupled with the observer's notes, can provide valuable
information to guide subsequent development of the course.
Even though this technique can produce excellent results, it does not have to
be done with every module or even with every course. Use it in situations in
which there is some uncertainty about the content of the materials, the
methodology used to present them, or the capability of the instructor who will
be teaching the course pilot or subsequent courses. In each of these situations it
will provide useful information that will help identify and avoid potential
problems.
Pilot Teach with a Class
This is the final phase of prototype testing. This is usually referred to as a

course pilot and is one of the more frequently employed formative evaluation
techniques. If other types of formative evaluations have preceded this stage, the
pilot should only yield minor changes. The pilot is conducted with the actual
target audience using final or near-final materials.
Participants in a pilot are informed ahead of time it is a pilot, and that the
last part of the course will be an evaluation. They are typically encouraged to
note anything they think would improve the course along the way. The typical
method for collecting data from the pilot is to have a conference with the
students at the end of the course. If it is a multi-day course, some instructors
also like to have a daily short analysis of how thing are going at the beginning
or end of each day.
Someone other than the instructor should conduct the conference, preferably
an evaluator. It sometimes helps people be more candid if the instructor is not
present during the discussion. Often a combination of written questionnaires
and structured discussions are used to gather information from pilot
participants. If both are used, the questionnaires should be completed before the
class engages in a group discussion. Some of the topics that can be addressed in
the pilot evaluation, either through questionnaires or in discussion include
• Learner reaction to the overall experience, good and bad

• Match between prior expectations for the course and what
happened in the course itself
• Practicality and usefulness of the instruction back on the job
• Appropriateness of the examples

• Balance between lecture and exercises
• Usability of handouts and other materials during class
• Appropriateness of language and grammar
• Level of difficulty of the materials
A number of specific techniques to use for these types of evaluation can be
found in the ASTD Trainer's Toolkit: Evaluation Instruments (Fisk, 1991).
Issues in Formative Evaluation: Barriers and

Opportunities
The Barriers
Formative evaluation has always been a part of every formal instructional

design model. But the majority of course development projects include little or
no formative evaluation. Why?
In part, the lack of formative evaluation is increasingly due to the current

propensity to do everything faster, cheaper, and with fewer resources. Within
this general management trend, formative evaluation is often seen as something
that would be nice if we had more time or a larger budget. This is a somewhat
misguided notion, as will be discussed later.
Formative evaluation requires a unique skill set, as does every other element
of the instructional design process. Over the past twenty years there have been a
number of studies to identify the competencies needed in each area of
instructional design. Many graduate programs in instructional design,
particularly smaller programs, will have introductory instructional design
courses, courses specific to media, for example computer-based instruction, or a
course on needs assessment or project management. Most small- to medium-
sized departments do not have a course on formative or summative evaluation.
This means that many people with a graduate degree in instructional design
have little formal training in evaluation. This problem is compounded by the
fact that not much formative evaluation is done in corporate training

departments. People do not learn the skills in school nor do they have the
opportunity to learn them on the job.
Other parts of the overall instructional design process have received more
attention. For example many people perform needs assessment. People see the
necessity for it, and a number of people have developed skills in the process.
An indication of this is the number of companies that provide needs assessment
services as compared to formative or summative evaluation services. The index
of the 1996 ASTD Buyer's Guide and Consultant Directory lists 127 companies
that offer needs assessment services. The same directory lists 44 companies that
offer evaluation services. But the actual number of companies offering
formative and summative services is far less because the list of 44 is
predominated by companies offering specific evaluation instruments, not
formative or summative services. Parenthetically, this may be a clue to some
enterprising types that this is a market with large opportunities.
For some course developers, what matters is the technical accuracy and
completeness of the materials. A large number, if not the majority of people
who develop and deliver technical training, are subject-matter experts who have
been placed in a different role. What has mattered in the past was the technical
aspects of a product or service. This is what they know and what they feel is
important. They often do not understand the absolute necessity for a balance
between technical accuracy and educational soundness. If you want quality
courses, both are an absolute necessity.
The Opportunities
What opportunities or benefits are realized when formative evaluations are

done? The first and foremost benefit is qUality. When you create a course, you
have two customers. First is the client who has funded or directed your efforts.
This could be your manager or a paying client. Formative evaluation involves
the client throughout the instructional design process and gives the client
something to be pleased with in the end. Your second customer is the student.
This chapter outlines a set of steps that ensures you have paid attention to the
needs of the student and used evaluation methods to see that those needs have
been met. Early prototype evaluation helps you identify and build on the
strengths of your course design. The result is a more solid and effective
learning experience for the student.
Contrary to some popular management beliefs, formative evaluation is

potentially a big money saver. If you find a problem when you are well into the
development process, or, even worse, at the end of development, it is very
expensive to correct the problem. It is usually much less expensive to find and
correct a problem in the design or prototype stages than it is after the course is
piloted or released. Moreover, the damage done in terms of customer dis-
satisfaction, particularly if students are external customers, can have
repercussions that ripple across the corporation.
There is one last benefit to mention, and that is for students using self-paced
materials. This includes any media or method in which the student and
instructor are not physically together. This would include computer-based
instruction, multimedia, print workbooks, and much of the distance material
that is currently being produced. If a student is in a classroom and does not
understand something, he or she can ask a question. This is much more
difficult, and sometimes impossible to do when using self-paced material. If the
student does not understand some basic concepts in a multimedia course or
needs something explained a different way, what can the student do except
become frustrated. This is something that you want to avoid. Formative
evaluation is a tool to ensure that students can perform well when using this
material. By trying it out and testing your assumptions along the way, you are
limiting the frustration and loss of learning that may occur in the future. In
addition, the cost of changing and redistributing materials, particularly
multimedia materials, can be formidable.
Formative evaluation is not a ''nice'' thing to do, if you have the time and a
big staff with expert skills. It is a necessity for every course. There are concrete
processes that anyone can learn and use. It's a well-documented and mature
field upon which we can continue to build. Its rediscovery is heartening and
should be encouraged.
References
Cronbach, L. J. (1975). Course improvement through evaluation. Reprinted in D. A.

Payne, & R. G. McMorris, (eds.), Educational and psychological measurement.
Morristown, N.J.: General Leaming Press.
Fisk, C. N. (1991). ASTD trainer's toolkit: Evaluation instruments. Alexandria, VA:

American Society for Training and Development.
Montague, W. E. (1983). The Instructional Ouality Inventory ClOD: A formative tool for
instructional systems development (ERIC Document Reproduction Services No. ED
235-243).
Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, et al. Perspectives

of curriculum evaluation. (AERA Monogram Series on Curriculum Evaluation,
Number 1). Chicago: Rand-McNally.
Shufflebeam, D. L. (1968). Evaluation as enlightenment for decision-making.

Columbus, OH: Ohio State University College of Education.
Stake, R. E. (1967). The countenance of educational evaluation. Teachers College

Record. 68, 523-540.
Yourdon, E. (1989). Structured walkthroughs <4th ed.). Englewood Cliffs, NJ: Prentice-
Hall.
About the Author
Wilbur Parrott is Practice Director for Global Knowledge Network's

Competency Management Services in North America. He has more than 22
years experience in education and training, including instructional design,
competency, and learning processes. He has worked in Fortune 500 companies
and governmental agencies and has consulted with a variety of organizations in
the United States, Canada, Europe, and Japan. Wilbur has lectured half-time
for the past eleven years at Boston University, where he taught graduate
courses, was faculty advisor to doctoral students, and acted as first or second
reader for 26 doctoral dissertations in instructional design and multimedia. He
holds a B.A. from Mars Hill College, a M.L.S. from the University of Rhode
Island, and a M. Ed. from Bridgewater State College and an Ed.D. in
computer-based instruction and multimedia from Boston University.
9 ASSESSING EMPLOYEE
COMPETENCIES
Susan Ennis
A recent convergence of social, economic, and business forces has led corporate
practitioners in training and development to look at employees' competency
assessment results not only as indicators of performance strengths and
development needs, but also as sources of data for organizational decision
making. In fact, one of the hot topics in human resources today, especially in
the areas of compensation and development, is competencies and their potential
for integrating the various elements of human resource systems. Most of these
approaches favor multirater or 360-degree feedback as the competency
assessment mechanism of choice (American Compensation Association, 1996).
There are, however, several fundamental issues that most of the current
literature and conference presentations fail to address:
• Why assess employee competency?

• Can we use a single competency measurement to guide development,
deployment, training, compensation, workforce planning, and performance
appraisal?
• Will a competency assessment help us meet our business needs?
As practitioners, we must be cautious about offering simple solutions to
complex problems. We must question whether we can reliably and ethically
offer competency assessment and its results as a multiple-problem fix. We must
also judge if we are damaging the impact of using competencies by trying to do
too much with them. Are we treating one source of data about human
184 Assessing Employee Competencies
performance as the seminal, causal, and predictive indicator of job success and
organizational effectiveness? Are we trying to provide a single solution to
relieve managers' stress and the challenge of managing people and
performance?
This chapter begins with a short discussion of the social, technological, and
especially economic conditions that have contributed to the popularity of
employee competency assessment. Then four key topics are discussed:
• What is competency?
• What are the competency identification methodologies?
• What is competency assessment?
• When do you use a multirater competency assessment?
The chapter ends with a discussion of tools and techniques that contribute to
the formulation and implementation of a useful and meaningful competency
assessment effort.
Convergence of Economic, Social, and Technological

Conditions
Several conditions have recently converged to make competency assessment a

potentially valuable tool for measuring employee performance, potential, and
capability. The question remains whether you can use the results of one
assessment for all three of these applications simultaneously.
Social forces have set the stage for an increase in psychological and multi-
rater measures of performance. Technological innovations, by improving the
collection, storage, and crunching of the numbers that represent people's
capabilities, have led to the multimillion-dollar business of competency
assessment. At the same time, economic requirements are producing work
accomplishments and other outputs that are complex, interdependent, and
progressively harder to measure. Exploring the conditions that produced this
convergence of economic, social, and technological factors highlights the
strengths and problems of the competency assessment movement.
Social conditions have created an aura of democratization in the workplace,

producing a tolerance for feedback and a norm that multiple perspectives are
important. Employees demand that their life-balance and lifestyle concerns be
factored into organizational demands. The multirater or 360-degree approach
to assessing competencies flows from this whole-person, mUlti-perspective
approach (Kantor, 1997). According to some practitioners, it "feels" more valid
and realistic (Hirsch, 1994; Yukl & Lepsinger, 1995).
Another factor that drives competency assessment is computer-based techno-

logical innovations, such as relational databases, survey software, optical
scanning tools, and the dramatically increased power of desktop machines that
allow the human resources practitioner or line manager to gather and analyze
data. These tools are making multirater assessments ever more easy to
implement.
Economic and Business Conditions
Economic conditions such as increasing global competition, the movement of

production jobs offshore, reduced margins, and the reengineering, downsizing,
and acquisition of businesses have created demanding business challenges. A
key result of today's tightening economic conditions is that defining work and
measuring output is harder to do. One solution is to measure, as indicators of
output, the knowledge, skills, and behavior people bring to and use in their
work. Competencies hold the promise of capturing these hard-to-measure
aspects of performance so multiple stakeholders can use them for several
purposes at once (Lawler & Ledford, 1996).
Economic conditions often translate into flatter organizations, streamlined

processes, and reduced job security. As work becomes more complex and
sometimes amorphous, it crosses and recrosses organizational and professional
boundaries. The social interaction skills needed to meet customer demand or
work as a team are equally complex.
Merely establishing minimum performance standards will not meet the

challenge. Even the more complex job-task analyses that comprehensively
identify the knowledge, skills, and tasks required for work performance does
not fully capture the abilities that today's knowledge workers need (American
Society for Training and Development, 1996). The job competency movement
begun in the 1970s by David McClelland (1973) and later codified by Richard
Boyatzis (1982) took a rigorous research design and statistical analysis
approach to identifying and defining the thoughts, behaviors, actions, motives,
and traits that predict successful job performance. McClelland was the first to
state that successful job performance is unrelated to IQ but highly correlated to
the demonstration of competencies.
Many top-selling management books today continue this trend on a more

popular level. Stephen Covey's The 7 Habits of Highly Effective People (1989)
and Daniel Goleman's Emotional Intelligence (1995) are just two of the current
titles that explore meta-abilities, defined as the personal characteristics,
aptitudes, and attitudes that differentiate performance.
Whether you take a popular or a more-technical perspective, competencies

are a means of identifying a set of performance requirements necessary to
succeed in today's economic environment. The competency approach captures
the essence of performance and uses it to create a measurement system
(Edwards & Erwen, 1996). Keep in mind, however, that producing a numeric
result for hard-to-measure competencies may provide a false sense of order and
control.
The job, so long the "atom" around which human resource practices were
built, is no longer a viable concept (Ledford, 1996). There is a greater demand
to reward people not just for ''making the numbers" or reaching predefined
goals but also for considering how these goals were met. Do individuals meet
short- as well as long-term customer expectations? Do they decrease costs while
ensuring high qUality? Did they bum out other staff while doing so? These are
examples of the softer, less output-oriented "how" side that competencies
commonly measure. As work becomes more complex and outputs more team-
driven and interdependent, competency assessment is considered a potential
solution to the new challenges of measuring both the harder "what" and the
softer "how" side of performance.
Today's managers, as they grapple with ever-increasing numbers of direct

reports, want a simple, additive, and comparable measure of both the what and
the how. They prefer numbers that can be stored, retrieved, and used for
multiple purposes. They, therefore, use competencies both to evaluate
performance and indicate potential. Competency assessment results, once used
to measure employee progress towards an ideal benchmark or superior

performer rating, are now used as tools of organizational decision making.
The concepts of empowerment, learning for future employability, and

personal responsibility for career growth are becoming operational in employee
minds as well as employer practices. Measuring and rewarding performance
fairly and equitably is an increasingly complex task because work can be both
highly differentiated yet interdependent. Competencies, which provide a neutral
common language that managers and employees can use to discuss these issues,
are an alternative to traditional pay grade or job leveling structures. They also
replace vague performance feedback, such as "communicate better," with more
precise, action-oriented behaviors (American Compensation Association,
1996).
Employees are also drawn to the competency approach because they want to
develop the "how," otherwise known as their "portable competencies" (Larrere
and Williams, 1996). Taking initiative, listening to others, working efficiently
in diverse teams, thinking critically, and acting with flexibility-these are the
fundamental qualities that individuals can take with them from job to job,
company to company. To employees, competencies are the concepts they must
master to stay employable and succeed in tightening economic conditions.
Using competency assessment results as a business standard does have an

aura of clarity and finality. Still, the "how" of defining and assessing
competencies may not be ready for such a weighty application. The science of
competency identification is too often not applied evenly or rigorously. Creating
a competency model using empirical research or selecting one off the shelf is
one of the most complex, least understood aspects of competency assessment.
What Is Competency?
Individual or people competencies are based on human performance in a task,

job, or role, usually presented in a model or profile and accompanied by
behavioral descriptions. These models do not directly address the core
competencies of the corporation as popularized by Prahalad and Hamel in their
widely read Harvard Business Review article (1990).
An organization may develop a universally applicable competency model

that reflects a set of company-wide values: for example, all employees should
demonstrate teamwork, diversity, integrity, and initiative. While this chapter
may clarify the development and application of universal competency models,
such is not its primary intent. Its focus is rather on the competencies needed to
perform assigned roles successfully.
There are three main categories of individual or people competencies:

minimum standard, descriptive, and differentiating.
Minimum Standard Competency Models
Minimum standard competencies refer to observable and often easily testable

knowledge and skills. Since they establish baseline criteria or job requirements
such as arithmetic literacy or forklift operation, these competencies are used to
screen applicants, establish certification requirements, guide training designs,
and meet government or industry regulations. A minimum standard
competency model is best suited for simple tasks because its concrete nature
does not adequately address the interpersonal or cognitive aspects of work. This
type of model is usually not included in the competency assessment debate.
Designing and administering tests to measure minimum standard

competencies may not be expensive. However, the costs of storing great
volumes of information, maintaining current records, retrieving, and using this
type of data are much higher. Considering the inherent low value of the
information, minimum standard competency data are normally stored for such
regulatory purposes as demonstrating compliance with industry or government
guidelines.
Descriptive Competency Models
Descriptive competency models take a broader look at the "how" of

performance. Based on job analysis or instructional design methodologies,
these models describe what an individual needs to do: the knowledge, skills,
and behaviors to perform a task, job, or work role. Since it describes
performance criteria in detail, a descriptive competency model can grow
cumbersome in length and specificity. Often the individual competencies
measured are not balanced in breadth or depth; for example, knowledge of

financial indicators vs. consulting skills.
Descriptive models arise from the behaviorist psychological tradition of B.

F. Skinner, Edward Thorndike, Clark Hull, and others. They rely on observing
behavior rather than capturing motive or intent. More ephemeral or deep-seated
characteristics, such as state of mind, self-concept, or difficult-to-observe
cognitive processes, are usually not considered. Aptitude, trait, attitude, and
other psychological characteristics are often described superficially or missing
altogether (McClelland 1987).
These models describe but don't differentiate the critical characteristics that
cause or predict successful performance. They are most valuable for developing
job profiles, job orientation and training programs, deployment indicators, and
performance appraisal criteria. Descriptive models become problematic when
they attempt to go beyond description to create performance differentiators
without the benefit of empirical data.
To obtain the greatest value from a descriptive model, it is important to

distinguish threshold competencies, which are necessary to meet performance
requirements, from distinguishing competencies that are possibly related to
successful performance. Understanding this distinction, precisely measuring the
competencies, and determining how to use the resulting data are difficult tasks.
Much of this difficulty can be handled by applying a rigorous empirical
competency methodOlogy, and then using professional standards established for
testing and assessment by the joint committee of the American Education
Research Association/American Psychological Association/National Council on
Measurement in Education (American Psychological Association, 1985).
Differentiating Competency Models
The third category of individual competency model used in business today is

the excellence or superior-performer model. It has attracted the most attention
because its proponents claim to be able to identify the critical human
characteristics that cause and predict successful job performance.
A common lay definition states that differentiating competencies are what

superior performers do more often, in more appropriate situations, and for
better results than average performers. Spencer and Spencer offer a more
technical definition: "A competency is an underlying characteristic of an
individual that is causally related to criterion-referenced effective and/or
superior performance in a job or situation" (1993, p. 9). According to this
approach, if you can identify star performers, you can study them; figure out
what makes them great; label and behaviorally define their characteristics,
especially deep-seated ones like motive, personality traits, and self-concept; and
use the resulting criteria to assess, select, develop, and train others.
Performance in the more complex jobs or roles becomes less a function of
knowledge, task-related skills, intelligence, or credentials and more a result of
competencies, such as a motivating drive, interpersonal skills, positive self-
image, or political savvy (Spencer & Spencer, 1993).
Differentiating or excellence models are based on the personality psychology

tradition of Sigmund Freud, Henry Murray, Carl Rogers, Abraham Maslow,
and David McClelland (1987). A basic principle underlying this approach is
that human behavior has a purpose or intent driven by internal needs or
unconscious motives. Three outcomes of this tradition are still contributing to
the competency assessment debate.
First, can competencies such as conceptual thinking, self-confidence, and

sense of humor, which are based on internal dynamics, be identified? Second,
can others measure and assess them? Third, how do you identify competencies?
Respondent measures alone, including surveys and multiple-choice tests, may
not be enough. They involve making a choice among several well-defined
alternative responses to carefully structured situations. Decision making is
rarely so static in the workplace; nor are alternatives so stark and well-defined.
Operant measures, which require candidates to think and respond in on-the-job
situations rather than just choosing from a list of answers, are better predictors
of differentiating competencies. Operant tests include Behavioral Event or
Critical Incident Interviews (Spencer & Spencer, 1993; Flanagan, 1954) and
Thematic Apperception Tests (McClelland, 1987).
The obvious potential benefit of the differentiating approach has spawned

numerous competency consulting firms as well as variations in methodology
and result utilization. Three popular and markedly different books provide
vivid examples of contrasting competency approaches. These publications vary
primarily in their methods of competency identification and application of
findings. The 7 Habits of Highly Effective People, by Stephen Covey, discusses
seven universal competencies based on anecdotal research, literature review,

philosophical tracts, and personal experience. These competencies or habits,
defined as "the intersection of knowledge, skills, and desire" (Covey, 1989, p.
47), theoretically represent stages of mastery in the process of becoming a
leader. They describe a way of being in both work and personal situations.
Daniel Goleman's Emotional Intelligence (1995), based on a broad review

of recent behavioral and brain research, generates new insights by combining
empirical studies. Goleman is also interested in what makes for a successful
person. He promotes the concept of emotional intelligence, a social, practical,
and personal intelligence alongside the traditional IQ concept. In fact, Goleman
refers to both emotional intelligence and IQ as competencies.
Competence at Work, by Lyle Spencer and Signe Spencer (1993), updates

Boyatzis's original competency research. Spencer and Spencer differentiate
between visible or surface competencies, such as skills, and hidden or core
elements, such as motives and personality traits. It is the interface of surface
and core competencies that produces the attitudes and values demonstrated in
the workplace. Spencer and Spencer's comprehensive competency dictionary is
based on their rigorous review and qualitative comparison of behavioral
indicators in 286 competency models. Despite its depth of research,
Competence at Work has not captured the imagination of the reading public. By
focusing on competency in the workplace, this book does not take on the self-
help nature of Covey or the broad social implications of Goleman.
Excellence or differentiating models are most valuable when they are well
researched and grounded in business needs. Their highly context-sensitive
approach makes them invaluable in an organizational or cultural change effort.
Since this type of model is so tightly linked to top performance in a role or job,
it is not suitable as a company-wide universal model.
What Are the Competency Identification Methodologies?
Every social science research method is used in the attempt to identify

competencies: surveys, focus groups, brainstorming sessions, literature reviews,
card sorts, software tools, interviews of all types, and personality tests. Data are
often gathered by respondent measures that use multiple perspectives from
managers, peers, customers, and observers. Some practitioners use both
respondent and operant measures to differentiate values from intent, or actual

causal behavior from desired or "correct" practices.
Methodological Considerations
As a practitioner, how do you know if you are choosing the most appropriate
and cost-effective methodology? Your methodology, or combination of data-
gathering and analysis approaches, should be closely tied to intended business
outcomes. There are several key questions to keep in mind:
1. What is the business need for creating the competency model?

2. How will the assessment data be used?
3. What category of competency model-minimum standard, descriptive, or
differentiating-do we need to answer questions 1 and 2?
4. How will the model be applied?
5. Is the organization ready? Is management ready to commit to using the
model and applying its results on an ongoing basis? Do employees
understand the benefits of participation?
If these questions cannot be readily answered, then the effort of conducting a
competency study and building a model is either premature or not necessary at
all.
The type of model and data collection methodology you choose should be
related to the intended business use of the resulting information. If your
organization wants to improve its "hit rate and retention" for hiring in key
positions with significant customer interface-a senior account executive, for
example-you will need a differentiating competency model with high
criterion-related, construct, and content validity. If, on the other hand, solid but
streamlined job descriptions are needed to clarify roles during reorganization or
create job-focused development plans, a descriptive competency model may be
adequate. Table 9-1 will help you decide which type of model to use.
Table 9-1. Competency model decision matrix
Competency Methodologies Used Business ApplicQJion

Model Type
Minimum Task or job analysis Regulatory compliance
Standard Respondent measures Task deployment
Descriptive Role or work analysis Performance goals
Performance requirements Training design
analysis (self and others) Development planning
Criterion samples Workforce planning
Respondent measures Job redesign, reorganization
Differentiating Multiple-perspective Selection
respondent measures Certification
Criterion samples Complex project assignments
()perantmeasures Development for key
performers
The most critical factor in determining whether a competency model

differentiates or simply describes performance is the use of both criterion
samples and operant measures. The most lucid explanation of this rationale
comes from David McClelland in his introduction to Competence at Work,
where he takes a retrospective look at how the competency movement began.
He concludes that the traditional methods of assessing aptitude and knowledge,
including grades and credentials, are poor predictors of success. Also, these
measures are often biased by race, sex, or socioeconomic factors (Spencer &
Spencer, 1993).
McClelland believes that a less-biased method of predicting job performance

is the use of criterion samples, which compare "people who have clearly had
successful jobs or interesting lives with people who are less successful in order
to identify those characteristics associated with success" (Spencer & Spencer,
1993, p. 3). A second method involves the
identification of operant thoughts and behaviors causally related

to successful outcomes .. . [The] best predictor of what a person
can and will do is what he or she spontaneously thinks and does in
an unstructured situation-or has done in similar past situations
(Spencer & Spencer, 1993, p. 4).
The most valuable competency studies also use multiple data gathering and
analytical approaches designed to overlap and produce similar or converging
results. If a competency study uses only one data-gathering method, whether
interviews, surveys, literature review, or focus groups, key behaviors or
competencies are practically guaranteed to be missing. Within reason and
budget, the more data gathered, the more perspectives tapped, the more
analytical approaches used and reviews and feedback from potential users
sought, the better the chance of producing a sound, valid, context-sensitive
competency model of any kind.
Balancing the Costs of Competency Identification
A common refrain from line clients about competency studies, and especially
differentiating models, is that they take too long and cost too much. As
practitioners, we need to improve our project planning and management skills,
aligning our activity and deliverables with our company's business processes
and schedules.
Most competency vendors and consulting fIrms offer generic competency

dictionaries or reliable databases that can be customized to your company's
needs. Using these resources can reduce both the cost and time of a competency
research project. However, because it has not been developed for your company
specifIcally, a database model may lose precision in describing behavioral
indicators and responding to business-context issues. It will lack the language
and vocabulary common to the organization. In other words, this competency
model will be in the ballpark, but not a home run.
A generic model based on a database will save you money. It will open doors
for future competency initiatives and establish credibility. It will defInitely
provide greater value than a brainstormed list. The alternative approach, using
a vendor to develop a company-specifIc model from scratch, is much more
expensive.
Another option for answering the costitime refrain is to build a business case
based on the costs of not doing a quality job of competency identifIcation. For
example, the cost of recruiting, hiring, relocating, orienting, and assimilating a
single senior new hire is in the hundreds of thousands, especially when lost
productivity, a hiring bonus, and a relocation package are included. Preventing
one or two poor hiring decisions more than pays for the competency study. If
you are planning a three-day offsite seminar, calculate the attendance costs for
150 managers. Include travel, food, accommodations, fully loaded salary, and
lost opportunity dollars. Add the cost of training and design, including
simulations and custom videos. Isn't it worth the cost of a competency study to
ensure that the behaviors being assessed and targeted for training are proven
predictors of success?
What Is Competency Assessment?
Assessment is a means to an end. It is a process for measuring an individual's

knowledge, skills, behavior, and personality traits against predefined criteria.
This process has four steps: 1) defining criteria, 2) collecting data, 3) analyzing
the results; and 4) taking action on the findings. Assessment results can be
applied to workforce planning, development, selection, certification, and
deployment.
To make the assessment worthwhile, you must determine which

competencies you are assessing, measure them carefully, and clearly describe
how the data will be used. Preparing or training the assessors by providing
clear directions and tools is a part of the process that is too often overlooked
(Brachen, 1994). Most important, you must specify the business need for a
competency assessment. Will the results of the assessment meet that need?
Business-sponsored assessments send messages to the organization, positively
or negatively affect morale, and, in these times of downsizing, increase
employee anxiety and create business risk.
A recent study at a Fortune 100 company determined that a quality

competency assessment cost about $1,000 per professional employee. Before
introducing any assessment initiative, your organization should answer the
following questions:
1. Is the proposed assessment a worthwhile business cost?

2. Do we have the time and resources to invest in a quality assessment that
meets our business needs?
3. What's in it for the employee?
4. What's in it for the business?
5. Is management committed to conducting the assessment and using the

results for their intended purpose?
6. Is the workforce ready for assessment?
7. What are the tools, processes, and information we need to conduct this
assessment?
Once you have determined that competency assessment is a good business
decision, the next step is to design a reliable assessment instrument that
demonstrates a level of validity that is appropriate to the intended uses of the
data. Reliability refers to the consistency of the competency measurement over
time. Content validity assures that your instrument is measuring the skills,
knowledge, and behaviors necessary to master the competency. Construct
validity measures whether you are using the right tools to assess competencies.
Criterion-related validity measures the extent to which a competency is related
to job performance. This type of validity is more difficult to establish but is
required when actions taken as the result of the assessment will affect
employment decisions. Finally, a well-designed assessment instrument will
have high predictive validity-you will be able to use it to estimate future job
performance. Such an instrument is a worthwhile investment. It will provide
valid and reliable data for employability decisions, help you reduce business
risk, and increase value for employees and the organization.
Carefully determine the uses of assessment data and work closely with line
managers to help them understand the legal, ethical, and moral implications of
using data about people. Managers also need to understand the kinds of
business decisions they can make from different types of data. For example,
drawing up an employee development plan from the organization's composite
scores does not make good business sense, but using that data for training
program design and priority development decisions is probably worthwhile. Say
a manager wants to use competency assessment ratings to determine individual
potential for more senior positions. The competency model, instrument,
assessment process, and resulting data must meet tough standards and should
always be corroborated by other data, such as performance appraisals, business
results, and customer satisfaction scores. Competency assessment information
does not replace the hard job of managing people and performance. It is rather
a tool that informs and enhances that job.
When Do You Use a Multirater Competency Assessment?
The objective of multirater assessment is to obtain data from individuals who

interact with an employee in all his or her varying roles. Data gathering takes
place in a full circle, on a 360-degree basis: above from managers; below from
direct reports; and laterally from peers, colleagues, business partners, and
customers both internal and external. For reasons of cost and consistency, a
surveyor other instrument is normally used to rate the employee against a set of
common criteria.
Multirater results are more credible to employees, motivating them to

change and learn. From their perspective, a 360-degree approach provides a
more complete and accurate assessment than manager-only ratings. They
believe it minimizes bias and produces a fairly accurate gauge of capability or
performance. And, as they assess others, employees feel heard, valued, and
empowered (Budman & Rice, 1994). Recent research shows that "with the
proper safeguards, subordinate evaluation provides assessment information
with greater validity than assessment centers." (Edwards & Erwen, 1996, p. 7).
Whenever you open a human resources trade journal or conference brochure,

multirater assessment is glowingly described as creating competitive advantage,
driving strategic direction, changing people's lives, and increasing customer
focus, to name just a few miracle cures (O'Reilly, 1994; Edwards, 1995;
Hoffman, 1995; Sanders, 1996; Gebelein, 1996). The majority of articles go on
to profile three or more companies that have adopted the practice in several
human resources processes. These publications provide tactical or internal
marketing advice: "three lessons learned," "ten reasons to use," "five things to
watch out for."
Although a review of the literature may lead you to think that 360-degree
assessment is becoming a standard practice for managers, executives, and
employees who interact with customers, it is still unclear how many U.S.
companies actually use this methodology (American Compensation
Association, 1996; Bohl, 1996). The multirater approach is taking hold in
Europe as well (Handy, Devine, & Heath, 1996). To date, however, there has
been no rigorous study that catalogs the use of multirater assessment
methodologies, much less addresses their business, financial, or even personal
benefits. And no independent body of research has yet proven the common-
sense conclusion that the multirater approach reduces bias and adds credibility
compared to more traditional, supervisor-only input.
In the eyes of some practitioners, multirater competency assessment is

expedient yet comprehensive, qualitative yet quantitative, complex yet simple.
It is a fast way to obtain numbers on performance, or even potential
performance, and use them for multiple purposes simultaneously. Multirater
competency assessment has become the veg-o-matic of the human resources
tool kit. It slices, dices, and peels back the data of human capability so you can
answer any business question-individual development, organizational
planning, pay, training impact-simply by referring to a database of employee
competency assessment scores.
At this point reality, along with professional standards, steps in to suggest

the need for a more critical look at the viability of using multirater competency
assessments for simultaneous, and sometimes conflicting, purposes.
When to Use a Multirater Approach
As illustrated in Table 9-2, the two major uses of multirater assessments are
appraisal and feedback.
Table 9-2. Appraisal and feedback

Assessment Type Purpose Impact on Employee
Appraisal Evaluation • Selection
• Perfonnance rating
• Salary action
• Deployment
• Promotion
• Succession
Feedback Input • Personal learning and growth
• Development actions
• Training plan
• Career decisions
Both uses affect people's lives, sometimes dramatically. For this reason
alone, practitioners must carefully monitor the quality and efficacy of the
multirater assessment process. The criteria or competencies assessed must be
empirically sound. The assessment instrument, including questions, scales,

directions, and reports, should meet American Psychological Association
standards (1985), or other established psychometric guidelines (Van Velsor &
Leslie, 1991).
Appraisal decisions, including selection and salary, often have legal implica-
tions and should be closely monitored to avoid potential litigation. Feedback
decisions, including career plans and development actions, are usually less
scrutinized since they do not have the same financial implications as appraisal
decisions. Yet, for fragile egos or employees undergoing personal crises,
feedback results from peers and direct reports as well as managers could prove
overwhelming.
The Pitfalls of Double-Duty Assessment
Using a multirater assessment for either appraisal or feedback is valuable.

However, using the same multirater instrument, process, or scores for both
outcomes diminishes rather than increases the value of the assessment.
Practitioners are too quick to assume that a successful multirater assessment
designed for feedback purposes is equally applicable to appraisal decisions.
Efficiency aside, using the same score for both appraisal and feedback is a poor
business investment (Carey, 1995).
On the appraisal side, raters tend to inflate their scores if they believe
managers have access to individual assessment results and use them for
evaluation rather than development (Dalton, 1996). Research studies are
beginning to establish that, once peers perceive a potential negative
consequence to a rating, their scores are more lenient, less differentiating, less
reliable, and less valid (Farh, Cannella, & Bedeian, 1991).
Maxine Dalton of the Center for Creative Leadership, one of the driving
forces behind establishing the value of multirater feedback, says that
traditional performance systems, in which feedback intended to

motivate changes in performance was tied to other administrative
decisions, led to denial, resistance, and failure to accept the
ratings and were a demoralizing experience (1996, p. 15).
She cites a study in which manager and self-scores on a multirater survey

were lower when revealed to the participant alone but higher when data were
shared (Dalton, 1996). A 1995 study found that 40 percent of raters who
provided feedback in confidential and anonymous conditions would change
their ratings if managers had access to the results and used them for appraisals
(London, 1995). The value of shared data for organizational planning is
diminished and the goal of obtaining an accurate picture of individual
performance subverted.
From the feedback point of view, the intent of multirater assessment is to

help individuals change or redirect their efforts by providing candid and
comprehensive data. This feedback can be disconcerting, even painful. For
example, the executive who sees himself as a visionary driver of change may be
rated by others as unable to implement; the manager who considers herself a
keen tracker of costs and projects may be hindering an initiative designed to
instill responsibility, autonomy, and empowerment in work teams; or the
engineer so proud of his personal standards for functional elegance and
technical sophistication may come across as inattentive to customers and
oblivious to market needs.
Because feedback may strike at the core of a person's personal

belief system, it is crucial to set the conditions of feedback so that
he or she will be able to tolerate, hear, and own this discrepant
information. If conditions facilitate defensiveness and denial, then
the feedback will not be heard and the likelihood of change will
decrease accordingly. If conditions facilitate acceptance of the
feedback information, then the likelihood of change increases
(Dalton, 1996, p. 3).
Feedback on its own can be overwhelming enough. Using the same

assessment process to provide both feedback and appraisal can be devastating,
counterproductive, and, in the final analysis, ineffective. More effective are
distinct appraisal and feedback processes that work together in a coordinated
fashion. For example, manager and employee may mutually agree upon
development goals based on a confidential multirater competency assessment,
and then incorporate progress towards these goals into the performance
appraisal process. Increases in competency assessment scores will be used to
guide development rather than measure performance.
Using multirater approaches for both assessment and feedback may sound
efficient, but differentiating the two assessments is even more important. For
development purposes a competency assessment may be the right approach. On
the appraisal side, however, assessing people against company values, project
norms, or business results may be better.
It is critical to explain the purpose of all assessments to everyone involved.

Coordinating the timing of multiple assessment processes so they don't overlap
reduces fatigue and cynicism on the part of assessors. More important, linking
assessments to business cycles and the organizational calendar increases
validity and embeds assessment results in organizational decision making. If
the aggregate results of a competency study will help develop a workforce plan
for the upcoming year, then scheduling assessment activity to coincide with
plan development lends credibility to the effort.
Tools That Reinforce the Value of Multirater Assessments
Companies eagerly attach competency assessment scores to performance

appraisal because, all too often, a development focus alone does not produce
adequate results. These companies are replacing the development carrot with
the appraisal stick. There is, however, evidence that the carrot works. Three
recent studies demonstrate that confidential multirater competency assessments
for development planning lead to skill acquisition and behavior changes
(Atwater, Rousch, & Fischtal, 1995; Hazucha, Hezlett, & Scheider, 1993;
Smither et al., 1995). In fact, an earlier study concluded that the quality of
coaching based on a development plan was the key facilitator of change--not
threats, salary action, or punishment (Meyer, Kay, & French, 1965).
One reason feedback works so slowly is because too often we provide not
full-grown carrots but just the seeds of development. The process of receiving,
processing, and responding to feedback must be structured. Employees cannot
be left to figure out and implement their assessment data without assistance.
The value of the assessment exercise is often inadequately explained, applied,
and reinforced. In this atmosphere discrepant data confuse, discomforting
results are ignored, and the demands of business give employees an excuse for
ignoring their assessments. Employees need quality debriefing and coaching,
plus materials specifically designed to reinforce the value and applicability of
the feedback they receive. The seeds of development must be planted, tilled,
and tended.
As practitioners, we focus too heavily on the assessment process and

instruments and not enough on applying the diagnosis. Use individual
debriefing sessions, facilitated workshops, and comprehensive, user-friendly
guides to help employees understand their multirater data. DeSign actionable
tools for both managers and employees. Employees are under enough pressure
running the business-give them development and training guides and options
that are easy to use. Tie development suggestions to specific competencies,
content, or skill areas, as identified in the original competency model and
assessed in the instrument. Suggest current books and articles; multimedia tools
including CD-ROMs, videos, and audiotapes; coaching options; job-related
activities such as special assignments; and internal, easily accessible training
and educational tools. Produce and distribute guides using the media your
audience is most comfortable with--e-mail, online information, audiotape,
simple or elaborate hard copy.
Treat development tools with as much thought and care as the assessment
instrument itself. Most important, keep in mind the requirements of your
customer base. As practitioners, it is our responsibility to offer our organization
and its employees the fully grown carrots of solid development planning
processes, tools, and other applications that allow people to make the best use
of their competency assessment data. Mere carrot seedlings of assessment
reports, plus a few guidelines for development plans, are not enough.
It's Harder to Do Than It Looks
Designing and implementing a competency assessment and its resulting

applications, such as a development planning process and training programs, is
hard to do. Pasting together the process and tools yourself is time consuming,
while handing it all off to a vendor is expensive. Detennining the appropriate
balance of internal and external resources is a challenge. The depth and breadth
of expertise needed to launch a successful competency assessment, especially a
multirater process, is a daunting prospect.
One resource that provides comprehensive practical guidelines to

implementation is Providing 360-Degree Feedback by Edwards and Erwen
(1996). Although they do not discuss competency model development, the

authors do outline the key steps, learnings, and guidelines for conducting a
multirater assessment.
Competency assessment should not be viewed as a single and finite event,

but rather as a step in a process, an investment of time and money that
demands organizational and management commitment. That commitment
includes helping employees use assessment results and reassessing at regular
intervals to provide a measurement of progress. If your company intends to use
aggregate assessment results for organizational decisions, such as workforce
planning, then the currency and accuracy of the data must be maintained.
Practitioners and sponsoring line management should establish policies

regarding individual competency data handling, storage, and transfer. These
issues are especially important for employees outside the United States. Most
European and many Asian countries have legal restrictions on the use of
individual personal data. The multicultural implications of conducting
competency assessments are also enormous, with little research to guide
implementation.
It is critical to define informed consent and set policy on handling misuses

of data or broken confidentiality agreements. Addressing these issues prior to
implementation and confronting a problem head on if it occurs will heighten
the credibility of all assessments, current and future, and increase employee
willingness to participate.
The Benefits of Competency Assessment
Competency assessment has become a tool of organizational decision making.

A well-designed assessment process, such as a multirater competency
assessment for development purposes, will lead to skill acquisition and
behavior change. The results of such assessments must, however, be applied
with caution. A model and instrument designed for feedback purposes is rarely
rigorous enough to be applicable to appraisal as well. As the studies discussed
in this chapter reveal, the responses in such double-duty situations lack
accuracy and usability. They are a waste of time and money and can seriously
diminish employee morale.
Designing a valid and reliable competency study depends on selecting the

right model for the business problem to be solved. Whether you choose a
minimum standard, descriptive, or differentiating competency model, tie it to
business process and desired outcomes. Obtain management commitment and
make sure employees understand the value of the assessment process to both the
company and themselves.
Multirater competency assessment results are powerful individual and

organizational levers. They can establish both a baseline and a benchmark for
performance. Use a multirater approach not only to gather development data
but also to increase employee participation and empowerment. Companies
engaged in quality improvement and team-building efforts often use multirater
competency assessments to improve teamwork and innovation. Another
important application of multirater assessment is its potential for determining
organizational capability by analyzing anonymous and composite data from
defined groups. These groups can include business units and job types-retail
division, middle managers, new hires, principal engineers (Joines, Quisenberry
& Sawyer, 1994).
Multirater assessments contain valuable market research data. They can, for
example, provide the cost of readying a workforce to meet a new strategic
direction. They can also identify workforce strengths that will help your
company use a unique combination of skill sets to fill a market niche.
Competencies establish a common language of success. Competency

assessment can focus employees on the "how" as well as the "what" of
performance and potential. It can focus a company on its culture and values,
helping evaluate and improve organizational climate. A commitment to using
assessment results for their intended purpose increases management credibility
and employee willingness to participate. With a well-designed assessment,
employees can develop skills and behaviors that will transfer across jobs or
even careers. Design and implement a competency assessment well, and you
will be creating a catalyst for individual and organizational change.
References
American Compensation Association. (1996). Raising the bar: Using competencies to

enhance employee performance. Scottsdale, AZ.
American Psychological Association. (1985). Standards for educational and

psychological testing. Washington, DC: Author.
American Society for Training and Development. (1996). Trends that affect corporate
learning and performance (2nd ed.). Alexandria, VA
Atwater, L., Roush, P., & Fischal, A (1995). The influence of upward feedback on self
and follower ratings. Personnel Psychology. 48. 35-60.
Bohl, D. L. (1996, September 19). Mini-survey: 360 degree appraisals yield superior
results, survey shows. Compensation and Benefits Review. 28.16.
Boyatzis, R. E. (1982). The competent manager: A model for effective performance.

New York: John Wiley & Sons.
Bracken, D. W. (1994). Straight talk about multirater feedback. Training and

Development. 48 (9), 44-49.
Budman, M., & Rice, B. (1994, February). The rating game. Across the Board. 35-38.
Carey; T. (1995, March). Coming around to 360-degree feedback. Performance, 56-60.
Covey, S. R. (1989). The 7 habits of highly effective people: Powerful lessons in

personal change. New York: Simon & Schuster.
Dalton, M. (1996). Multi-rater feedback and conditions for change. Consulting

Psychology Journal: Practice and Research. 48 (1), 12-16.
Edwards, C. C. (1995, June). 360 degree feedback. Management Services. 24.
Edwards, M. R., & Ewen, A. J. (1996). Providing 360-degree feedback: An approach to

enhancing individual and organizational effectiveness. Scottsdale, AZ: American
Compensation Association.
Farh, 1. L., Cannella, A A., & Bedeian, AG. (1991). Peer ratings: The impact of
purpose on rating quality and user acceptance. Group and Organizational Studies •
.!§. 367-386.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin. 51. 327-
358.
Gebelein, S. H. (1996, January). Multi-rater feedback goes strategic. H.R. Focus, 3-6.
Goleman, Daniel. (1995). Emotional intelligence. New York: Bantam Books.
Handy, L., Devine, M., & Heath, L. (1996). 3600 feedback: Unguided missile or
powerful weapon? Berkhamsted, England: Ashridge Management College, Ashridge
Management Research Group.
Hazucha, 1., Heziett, S. A., & Scheider, R. J. (1993). The impact of 360-degree
feedback as a competitive advantage. Human Resource Management. 32. 325-354.
Hirsch, M. S. (1994, August). 360 degrees of evaluation. Working Woman, 20-21.
Hoffman, R. (1995, April). Ten reasons you should be using 360 degree feedback. HR
Magazine, 82-85.
Joines, R. C., Quisenberry, S., & Sawyer, G. W. (1994, September-October). Business

strategy drives three-pronged assessment system. Compensation and Benefits
Review. 73-76.
Kantor, R. M. (1997). Restoring people to the heart of the organization of the future. In
F. Hesselbein, M. Goldsmith, & R. Beckhard. (Eds.), The organization of the future.
Drucker Foundation Future Series. San Francisco: Jossey-Bass.
Larrere, 1., & Williams, D. (1996, June 25). Helping workers grow with firms. The
Boston Globe. p. 18.
Lawler, E. E., III, & Ledford, G. L., Jr. (1996). New approaches to orgamzmg:
competencies, capabilities and the decline of the bureaucratic model. CEO
Publication 096-7 (30D). Los Angeles: University of Southern California, School of
Business Administration.
Ledford, G. E., Jr. (1996, October). New directions in competency-based pay. Paper
presented at the meeting of the American Compensation Association Performance
and Rewards Forum, Chicago, IL.
London, M. (1995). Can multi-source feedback change self-evaluations, skill

development, and performance: Theory-based actions and directions for research.
Personnel Psychology, 48 (4),803-840.
McClelland, D. C. (1973). Testing for competence rather than for intelligence.

American Psychologist. 28. 1-14.
McClelland, D. C. (1987). Human motivation. Cambridge: Cambridge University Press.
Meyer, H. H., Kay, E., & French, 1. P. R. (1965). Split roles in performance appraisal.
Harvard Business Review. 43 (1), 123-129.
0' Reilly, B. (1994, October 17). 3600 feedback can change your life. Fortune, 93-100.
Prahalad, C. K., and Hamel, G. (1990). Core competence of the corporation. Harvard
Business Review, 68 (3), 79-9l.
Sanders, B. D. (1996, September 1). Latest in appraising employees: 360 degree

feedback. Sacramento Bee, p. D2.
Smither, 1., London, M., Vasilopoulos, N. L., Reilly, R. R., Millsap, R. R., &
Salvemini, N. (1995). An examination of the effects of an upward feedback program
over time. Personnel Psychology, 48 (1), 34.
Spencer, L. M., Jr., & Spencer, S. M. (1993). Competence at work: Models for superior
performance. New York: John Wiley & Sons.
Van Ve1sor, E., & Leslie, J. B. (1991). Feedback to managers. Vol 1: A guide to
evaluating multi-rater feedback instruments. Greensboro, NC: Center for Creative
Leadership.
YukI, G., & Lepsinger, R. (1995, December). How to get the most out of 3600 feedback.
Training, 45-50.
About the Author
Susan Ennis recently joined BankBoston heading up the newly created function
of Executive Development. Susan is responsible for creating and implementing
a competency-based executive development strategy, which includes succession,
replacement, selection, high-potential identification, retention, assessment, and
coaching. She also will manage executive educational programs and processes.
Prior to her role at BankBoston, Susan held various Human Resource
Development positions over a ten-year period at Digital Equipment
Corporation. In her last position there, Susan managed an internal consulting
group that produced competency research and applications and handled
evaluation and assessment projects. In her six years at McBer and Company,
Susan was a project manager, account manager, and manager of product
development. Susan has worked in the public and private sector as well as
government agencies. She has over 20 years of experience in competency-based
HRD efforts. Susan has a B.A. degree from Harvard University and a M.Ed.
from Northeastern University.
10 THE ORGANIZATIONAL ACTION
RESEARCH MODEL
Morton Elfenbein, Stephen M. Brown and
Kim H. Knight
It is a cliche to say that we are in an era of unprecedented continual change.

But, change is the hallmark of our times. Some writers have called this a period
of transition in which we are moving toward a new social order. Every
institution in our society has, and is still, undergoing radical change, and those
that have not been able to change fast enough to meet the challenges presented
by the new age have been met with widespread criticism. Along with the
change in our institutions, there are two parallel and concurrent changes. These
are changes in the skills and abilities needed by the practitioners who manage
organizations and hence a concurrent change in the training of those who are to
manage. It is this latter issue that is the major focus of this chapter. We will
begin by examining the parameters of the failure of the current mental model
from the perspectives of a number of current authors. This will be followed by
an examination of the work of David Kolb, who has given considerable thought
to the ways adults in general and managers in particular, think and build
knowledge. This will be followed by the Organizational Action Research Model
(OARM) of organizational action research that we propose as a solution to the
failure of the current model. It provides practitioners with a different
knowledge-building perspective, and a way of reflecting and evaluating their
practice.
This chapter proposes a paradigm, which empowers practitioners to practice

research to meet their needs and to advance the profession to which they
belong. It proposes the integration of practitioner and researcher role as an
210 Organizational Action Research Model
alternative to the fragmented model that currently exists. In doing so, it draws
much from the past tradition of the action researchers as well as the action
science approach espoused by Argyris, Putnam & Smith (1985) and also the
work of Schon (1983). In this way, the needs of individual managers to
evaluate their espoused theories and their theories-in-use can be undertaken so
that their organizations can function more realistically and can respond more
effectively to the need for self-examination and change.
The Current Model
Many of the ideas concerning the problems associated with training of

professionals have been addressed before. Donald Schon has directed
considerable criticism at professional training in general. Professionals as a
whole, he believes, are a product of a paradigm that may not be moving with
the times and certainly not within the context of practice. The traditional view
of a professional discipline is, as Schon (1983) stated, based on technical
rationality. In this view, new knowledge is generated through scientific
research in the basic sciences or underlying disciplines. Then there is an
applied science, which addresses diagnostic and problem-solving techniques.
Applied science rests on the foundation of basic science. Application is found
in practice. The three components are given in hierarchical order of status, with
practice having the lowest status. Most professional disciplines have attempted
to gain the status of the higher professions by emulating the medical model. In
this model knowledge is generated through the scientific method and applied to
problems that the professional addresses. This model relegates practice to the
implementation of scientific principles and the lowest of status. The highest
statoS and generator of research is found in the basic disciplines, such as
biology and psychology, whose investigations are single disciplines in focus.
These disciplines tend to have a preference for experimental designs that tend
to reduce problems to single variables, which can be researched through
experimental designs. These disciplines do not address the all-happening-at-
once interdisciplinary nature of problems found in professional practice. The
researchers in this older paradigm tend to be isolated and removed from
practice. They often do not understand the problems encountered in practice or
are unaware of the ramifications of implementing their solutions. This
paradigm has resulted in the researcher having authority in the accumulation of
knowledge, and has perpetuated a top-down view of knowledge. The
researchers claim to be value-free and neutral in their investigations. They
strive for detachment and objectivity. However, this is contrary to the

practitioner's need for good solutions and their passionate commitment to their
professions and the mission of their organizations. Schon summarized his
work:
I argued for a new epistemology of practice, one that would

stand the questions of professional knowledge on its head by
taking as its point of departure the competence and artistry
already embedded in skillful practice-especially, the
reflection-in-action (the thinking what they are doing while
they are doing it) that practitioners sometimes bring to
situations of uncertainty, uniqueness and conflict. In contrast,
I claimed, the professional school of contemporary research
universities give privileged status to systematic, preferably
scientific, knowledge. Technical rationality, the schools'
prevailing epistemology of practice, treats professional
competence as the application of privileged knowledge to
instrumental problems of practice. The school's normative
curriculum and separation of research from practice leave no
room for reflection-in-action, and thereby create-for
educators, practitioners and students-a dilemma of rigor or
relevance (Schon, 1987, p. xi).
Professional Training
In a time when there is a growing demand for professional services and

increasing expectation for results from professional problem-solving, there is
also a crisis of confidence in the professions. The professions have been unable
to solve many problems and unable to predict some undesirable effects of
attempted solutions. The professionals find themselves in an increasingly
complex, changing environment where unique, new, or unusual cases are
encountered and the models or theories generated within the scientific
paradigm do not apply. There is a growing suspicion that professional schools
teach science that does not apply to practice. Again Schon notes:
The crises of confidence in professional knowledge

correspond to a similar crisis in professional education. If
professions are blamed for ineffectiveness and impropriety,
their schools are blamed for failing to teach the rudiments of

effective and ethical practice. Chief Justice Warren Burger
criticizes the law schools, for example, because trial lawyers
are not good at their jobs. In the present climate of
dissatisfaction with public schools, schools of education are
taken to task ... Business schools become targets of criticism
when their MBAs are seen as having failed to exercise
responsible stewardship or rise adequately to the Japanese
challenge. Schools of engineering lose credibility because
they are seen as producing narrowly trained technicians
deficient in capacity for design and wisdom to deal with
dilemmas of technological development (Schon, 1987, p. 8).
The crisis in the professions exists and is rooted in "the epistemology of

professional practice." This paradigm and its epistemology separates research
and its resulting theory from practice.
What is more, there is a disturbing tendency for research and

practice to follow divergent paths. Practitioners and
researchers tend increasingly to live in different worlds,
pursue different enterprises and have little to say to one
another (Schon, 1987).
This can result in models that are not useful to practice and that can hinder
the development of the field. While a field can develop from practice or
research, a profession needs theory, models or research that have practical
applicability in the field. Conversely, practice can develop a field when it can
be generalized to a model, principle, or theory that goes beyond the unique case
and can be made useful to other practitioners.
However, the schism between theory and research that currently exists in
many professions potentially thwarts this type of development and results in
isolated practice and impractical and irrelevant science. Schon states that the
rational technical models leave practitioners with the "relevance or rigor"
dilemma (1987). This model assigns the notion of rigor to a methodology that
has become irrelevant to practice.
Management Training
It is our view as well as Schon's that this criticism also pertains to the field of
general management. Part of this misdirection lies in the failure of managers to
learn the principles of knowledge formation, principles that knowledge
consumers and practitioners need to know. As a result, they are not capable of
effectively criticizing, altering, or developing the knowledge vis-a-vis their
practice as managers in organizations. Thus they espouse theories that are
faulty and resistant to change. They cannot do the type of research that John
Seely Brown (1991) has called research that continuously reinvents the
corporation.
As a result of this failure in the training of professionals, managers have

similarly suffered in their ability to learn adequately from their practice.
Several authors have written on this problem including Argyris (1982), Argyris
& Schon (1974), and Kolb (1984). Argyris and Argyris & Schon focused their
criticism on the training of managers and have expressed their concern that the
theories that are espoused by managers may not be the same theories that
actually guide their behavior. For managers to be effective in their business,
they must learn to discard and replace ineffective theory through special
processes of learning, which Argyris and Schon call double-loop learning,
learning that looks into assumptions, norms and contexts, and frames of
reference that guide behavior. However, managers are not trained in the ways
of knowing that will help them resolve this problem. This kind of training does
not take place.
From a developmental perspective, the manager has learned to manage

based on information gathered. First, as a child living in the family, and then in
school, the child interacts with another set of individuals, a set in which
hierarchy is more salient. Some formal training in management commences
when the individual begins to take management courses in college. The
individuals confronted with academic theories of managing, some of which
have limited "scientific" value and are not very rigorous but appear like
clockwork in the curriculum. However, some of these theories may be quite
rigorous but seem more difficult to apply. In addition, individuals learn theories
of leadership often without explicit concern for integrating the theories into
their personal repertoire. Later they get case studies that are designed to
provide an experiential base and models of behavior to be used in real
management settings, but often the cases are so complex that it is difficult to
generalize from them. Then too, the cases are often provided from a CEO's
perspective with much information provided, but little or no training on how to
collect the data that leads to this perspective. How to apply the information or
implications at lower levels of the hierarchy in organizations with cultures
different from the case are also not addressed.
During an internship in some organization or on their first jobs, they get to

watch real managers doing their thing. They may be mentored by an
experienced senior manager. As new managers, they probably will participate
in training and development activities sponsored and planned by the
organization. They may receive two or three or four whole-day sessions on the
skills of leadership. They are probably evaluated but mayor may not get
feedback on their learning.
These training programs are often generic off-the-shelf packages that are
difficult to apply, especially when the training is given to isolated individuals in
isolated parts of the organization. In addition to the above, the manager will
learn much from the actual job of managing. Experiencing what works and
what doesn't work will become part of a repertoire of behaviors and feelings.
These will become part of a reservoir of concrete experiences that are
integrations of values and feelings. Some managers may stop to reflect and
observe their behavior and the behavior of others and to compare this with the
previous education and training. A very few may actually begin to develop their
own theories of managing. But, as a general rule, most will try to experiment
with new approaches, to see if they appear to work. If they appear not to work,
the manager will search for new ideas. They will borrow ideas or parts of ideas
from current popular readings and fads and experiment to see how they work.
There are several very important classes of elements that are missing from
the picture above. Most managers probably do not learn to think about the ways
they acquire certain types of knowledge and how these different types of
knowledge are related to each other and how they are related to managing. For
example, managers may not be aware that they have been taught to have a
predilection for valuing experimentation and experience but not for other types
of knowing or other ways of collecting data. They may not appreciate that the
very job of management, as defined in our culture, has forced them into being a
nonreflective, reactive knowledge builder who blindly tries new stuff (like
forcing participative management). They rarely conceive of a systematic
epistemology of many parts, much less one that systematically evaluates its
implications.
Managers may have taken courses in accounting and marketing and

operations management and are able to do cost and profitability analyses. They
can use complex regression and time series analyses for predicting trends in
marketing and use complex mathematics to match production with demand.
However, they probably know very little about effectively taking the pulse of
their organization, about collecting qualitative data to begin to effectively
formulate some grounded theory about the functioning of this particular
organization. They undoubtedly have a fetish for quantitative data and
relinquish qualitative approaches to mere anecdotes. They probably know next
to nothing about the strengths of qualitative research. They may have but the
slightest inkling that the academics they learned in college mayor may not fit
their organization. And they have neither the time, the training, nor the
inclination to reflect and build a grounded theory of their own and to see how it
relates to academic nomothetic theories. They have barely the slightest idea of
how to test and evaluate their inchoate theories in an ongoing organizational
setting, if they build such theories at all.
Managers probably know very little about organizational diagnosis from a

quantitative perspective and have few insights into the strengths and
weaknesses of research design. They do not understand the limitations of
quantitative measures in specific contexts. They know very little about
quantitative issues of reliability and validity of measurement and can be easily
fooled by a persuasive consultant who offers them an off-the-shelf training
package (or even a tailored one) that will change the course of the organization.
They know a bit about the systems nature of organizations and people but
not much. At some deep level, they may doubt the effectiveness of three or four
days of costly leadership without a truly systems-wide perspective. They know
you must systematically tie strategy, tactics, training and evaluation to
organizational goals. But, they relegate this fear to limbo hoping that human
resources or training and development folks know something more.
They know little about their own implicit theories of organizing and
managing and how extraordinarily pervasive but subtle these determinants of
their management behavior are. It is unlikely they know how difficult it is to
change these implicit and often unproductive theories, even with three or four
days of leadership or TQM training.
They may themselves have implicit beliefs in the truth of numbers or

experiences, not realizing that both the quantitative and the qualitative
approaches can play complementary roles in knowing and changing an
organization.
To be sure, the foregoing presentation has been one-sided and biased and
has emphasized many of the negative aspects of the situation and few of the
positive. Nonetheless the caricature can serve the purpose of highlighting some
serious problems. There are a number of sources to which one can appeal to
remediate the concerns expressed above. The first of these is to study different
ways of knowing and how these different ways can be learned, trained, and
assessed. This is primarily the work of David Kolb and his types of
epistemological approaches. Second, we present the OARM model, which is an
integration of a number of organizational action research models. This is an
approach that managers can use in the resolution of many problems.
Kolb and Experiential Learning
Kolb (1984), who in addition to helping us understand the nature of the failure
described above, also has provided a very clear delineation of this inability to
learn from experience. In doing this, he has introduced some conceptual tools
and empirical data for understanding the problems of training managers. Kolb's
concepts for evaluating the experiential learning and ways of understanding
that managers use is based on his theory of experiential learning.
He has developed a four-facet theory of the types of knowledge that are used
in understanding in general. In addition, he has developed a theory of
experiential learning that describes both the sequence of learning as well as a
theory about the predilection for individuals to be fixated or characterized by
one of these four types. Our model is based in part on Kolb's types of knowing
and his sequencing, which itself has been heavily influenced by the approach of
action research. The four personal knowledge types are variously called
divergent, assimilative, convergent, and accommodative. These can be seen in
Figure 10-1.
KOLBMODEL
APPREHENSION
(CONCRETE
EXPERIENCE)
FEELING
QuaIitarivelHlIDWIistic
SYNTHETIC
EXTENSION INTENSION
(ACTIVE (REFLECTIVE
EXPERIMENTATION) OBSERVATION)
OOING WATCHINGBasic
Applied INTEGRATIVE
DISPERSIVE
COMPREHENSION
Figure 10-1. (ABSTRACT CONCEPTUALIZATION)
From Kolb (1984) THINKING
QuantitativelScientific
ANALYTIC
Figure 10-1. The Kolb Model
Divergent Know/edge
The first of these is divergent knowledge, which is based on concrete

experiences that are transformed by reflective observation. Individuals who are
oriented toward concrete experience have an emphasis on the immediate
human situation and the associated feelings towards understanding of
qualitative rather than quantitative aspects of knowledge. The general
orientation is toward a synthesis or unity. The orientation toward reflective
observation or intention focuses on watching and understanding situations, as
opposed to acting in the situation. The variety of knowledge associated with
this pure type is highly integrative. Hence the type of knowledge that is
generated by these two orientations, divergent knowledge, is "to view concrete
situations from many perspectives and to organize many relationships into a
meaningful 'gestalt' with a sensitivity to meaning and values" (Kolb, 1984).

The emphasis is on adaptation by observations, rather than action. The term
divergent is used because of the impact of the diversity of concrete experiences
on the reflective state.
Assimilation
The second type of knowledge, that produced by the actions of reflective

observation on the abstract conceptualizing style, is called assimilative
knowledge. The individual with an orientation toward abstract
conceptualization is the logician. This is a person who is more predisposed
toward inductive logical thinking than toward feeling, and who is a constructor
of general theories with a focus on analysis (as opposed to synthesis). This
person prefers quantification and abstraction to feeling. Reflective observation
focuses on understanding by observation, as opposed to practical application.
This is a very systematic, quantitative, and rigorous orientation. In conjunction
with the reflective transformation style, this leads to a type of knowledge less
concerned with people than with ideas and abstract concepts whose practical
value is not as significant as its precision and its logicality.
Convergent Knowledge
The third type of knowledge is the convergent type, which is composed of

abstract conceptualization and active experimentation. As an individual
orientation, active experimentation involves an orientation toward doing rather
than observing, with a more pragmatic and applied orientation, knowledge that
is useful or applicable is more important than that which is absolutely true.
This active or doing orientation tends to lead to knowledge that is more
dispersive than integrative. The combination of this style with the abstract
conceptualization style leads to a learning style focused on problem solving,
decision making, and practical application of ideas using the logic of the
hypothetic-deductive method. The term convergence was used to reflect the
search for the one, best solution to a question or problem.
Accommodation
The final type of knowledge is derived from the active experimentation mode
and the orientation to concrete experience. This style is called accommodative
because the learner, focusing on feelings and concrete experiences, transforms
these by active experimentation without reflection or conceptualization. This
type of knowledge would involve decision making and accomplishment of tasks
in uncertain situations, a job not unlike those in general or executive
management.
Managerial Ways of Knowing
Kolb has characterized general managers both theoretically and empirically as

individuals who by bent or training and experience have a predilection for what
he calls an accommodative style. This is a style based on both action or active
experimentation with a strong basis in the concrete world of feeling. Kolb also
holds that a singular style of knowing may not be effective in all situations and
that for the highly integrative individual the use of all the modalities of
knowing is both possible and much more adaptive.
It is particularly interesting to note that those epistemologies that Kolb views

as opposite to those that generally characterize managers, that is, those types
who partake of reflection and abstraction, are the very types who Schon has
posited as important in professional training. In his recent work (1983, 1987),
Schon noted that many practitioners, locked into a view of themselves as
technical experts, find nothing in the work of practice to occasion reflection.
They have become too skillful at techniques of selective inattention, junk
categories, and situation control techniques, which they use to preserve the
constancy of their knowledge-in-practice. For them, uncertainty is a treat; its
admission is a sign of weakness. Others who are more inclined to use
reflection in action nevertheless feel profoundly uneasy because they cannot
say what they know how to do, they cannot justify its quality or rigor. For these
reasons, the study of reflection-in-action is critically important.
The dilemma of rigor or relevance may be dissolved if we can

develop an epistemology of practice which places technical
problem solving within a broader context of reflective
inquiry, show how reflection in action may be rigorous in its
own right, and links the art of practice in uncertainty and

uniqueness to the scientist's art of research. We may thereby
increase the legitimacy of reflection-in-action and encourage
its broader, deeper and more rigorous use (Schon, 1983 p.
69).
Researchers and Practitioners
There is a basic antithesis between the professional's ways of knowing, derived

from the training that managers receive as professionals and practitioners and
that is needed for effective practice in modern organizations. The resolution of
this antithesis requires a new model for the training of practitioners (Schon
1987). Practitioners tend to be educated and sophisticated in their
understanding and dedicated to an end. They work in a rapidly changing
environment, where new complex problems are encountered. They serve clients
who have become increasingly demanding in this consumer society.
Practitioners want to utilize knowledge and collect data to meet a practical end.
It is our belief that the schism that currently exists between research and
practice has weakened both. Researchers are often chasing irrelevant problems
and are ignorant of the interesting emerging issues in the field. Practitioners
often need useful models, research, or theories to aid in the practice of their
profession, and these models are often nonexistent. We also believe that the
narrowing of this schism could make a positive difference in the practice of the
professions and the accumulation of knowledge. The boundaries between these
roles must become more permeable. This has begun to happen as practitioners
receive more formal education. However, researchers must become aware of the
field. This can be accomplished by spending some time practicing in the field
doing the equivalent of organizational action research, that is, research whose
primary goal is the improvement of organizational conditions and the solution
to organizational problems (French & Bell, 1984; Lewin, 1984; Love, 1991;
Porras, 1987; Whyte, 1991).
Researchers can also engage in collaborative research with practitioners.

This will provide them with an inside view of practice and further practitioners'
understanding of research. Whyte's (1991) participative action research (PAR)
has been extremely effective in this regard. It is the position taken in this
chapter that of the various solutions to these problems, the most important
approach and that which has the greatest potential, is one in which
practitioners learn new methods of inquiry, to think beyond their current

problems, and pose models and research questions from their practice. In doing
this, they need to be more sophisticated in research and measurement
methodology so that their conceptualizations will have transferability. They
should be systematic in their data-gathering approach, and thus be able to share
their finding with a wider audience, including the basic and applied scientists.
Finally, they must be able to articulate, doubt, and test their privately held
assumptions about their organizational world.
Practitioners as Researchers
As previously stated, this chapter proposes a paradigm that empowers

practitioners to practice research to meet their needs and to advance the
profession to which they belong. It proposes the integration of practitioner and
researcher roles as an alternative to the fragmented model that currently exists.
In doing so, it draws much from the past tradition of the action researchers as
well as the action science approach espoused by Argyris, Putnam, & Smith
(1985) and also the work of Schon (1983). In this way, the needs of individual
managers to evaluate their espoused theories and their theories-in-use can be
undertaken so that their organizations can function more realistically and can
respond more effectively to the need for self-examination and change. Although
the role of practitioner only and researcher only still have a place in the
professions, the proposal attempts to make the boundaries between roles more
permeable. This will empower practitioners, through cross-training in scientific
methodology, to pose problems, seek answers, and advance their discipline
using techniques USUally relegated only to the basic or applied scientist.
The empowerment of individual practitioners is consistent with the

contemporary movement we are currently witnessing in organizations when
top-down decision making is being replaced by bottom-up and more
collaborative models. This new research process can make the knowledge of the
professions more relevant and serve greater organizational and social purposes.
Practitioner research can provide useful answers, identify problems to be
researched, and can result in adding to the body of accumulated knowledge.
This can help other practitioners in these turbulent times and provide not only
interventions that can make a difference in practice but also the knowledge base
that can be useful for basic researchers in pursuing their own ends. The
proposed model for practitioners to undertake research in their practice setting
consists of five steps or phases, which usually occur in sequence. These phases
are represented in Figure 10-2.
The Oarm Model
FINDING
(RESEARCHING)
-·O-·EJ ,
,,
T
,-- - - - - - - - - -..... - - - - - - - - - - - - - ..... - - - - - - - - - - - - - - - --''
Figure 10-2. The Oarm Model
The Organizational Action Research Model (OARM)
Practice
Practice is what the professional does. Practice is that set of experiences and
ways of understanding that determine the expected and everyday way of
behaving of the manager within an organization. This definition not only
includes behavior but also the determinants of behavior as well. Practice can be
understood by examining three levels of forces that act upon the individual.
These forces include individual, organizational, and external environmental
aspects. Individual forces refer to the knowledge, values, interests, role
definition, and role behaviors that the manager holds or does with respect to the
job and the organization. These are defined in part by training, personality,
professional interests, and level of development as well as the predominant
knowledge orientation that the individual prefers (Kolb, 1984). These
individual forces act either in concert with or in contrast to forces that define
the organization. Organizational forces consist of the goals, expected specific
standards of practice, organizational culture, image, preferred epistemology,
and general value system that constitute the people of the organization. These
forces shape the individual through the general socialization techniques that
work to modify or alter individuals to fit existing norms and expectations.
While these organizational forces change over time, often the change is slow
and the nature of the change may not necessarily be adaptive or in the best
interest of the organization. While it is the individuals who develop and
maintain these organizational forces through formal and informal
communication patterns and through selection and retention of individuals,
more often than not the totality of these forces are beyond the ken of any single
individual. Hence, organizational activities can become nonadaptive, and
individuals may not possess a clear understanding of how or why problems
have occurred or how to change them. The ways of knowing or modal
epistemology that is characteristic of the organization may be self-limiting and
hence maladaptive. Very often change is required of organizations because of
events outside of the organization, such as existing technology, ethics and
values systems, markets for organizational services or products, other
organizations, regulatory mechanisms at the city, state, federal or international
level, as well as models or theories of either technology or organizational
functioning. These external forces are in constant change (although the speed
of change can vary from one type of organization to another). This change
requires that organizations be able to systemically anticipate, sense, and
respond to maintain organizational identity and integrity.
Practice then can be conceived of as a complex and systematic set of

understanding and behaviors that constitute the established way of proceeding
on the part of individuals as they go about their business. These understandings
and behaviors are themselves the result of multiple interacting forces derived
from the three sources described above. Practitioners do their work guided and
determined by all three sets of these forces. Most often they do this using tacit
understanding or tacit knowledge. Work behavior is often a balancing or
compromise of these forces to keep them in homeostasis. This homeostatic
condition can be disrupted gradually or abruptly by a change, breakthrough or
modification in or between these forces. For example, a change in the

professional practice, such as those presented by the Total Quality Management
movement, has led managers to change their practice and certainly to question
personal and organizational ways of doing and understanding. The practitioner
possesses tacit understanding of practice, of the organization, and of the
environmental impact on practice. This tacit understanding· is what Schon
(1983) calls knowing-in-action. This knowledge is often acquired through the
process described by Kolb (1984) as Apprehension, the gathering of knowledge
from concrete experience, which is personal and intuitive and often tacit. This
understanding may make practitioners knowledgeable change agents and
potential sources of novel answers and new insights to practice problems.
However, at the same time, this concrete experience may well be interpreted
through of some ineffective implicit theories learned from childhood. These
implicit theories mayor may not be effective in managing but they can totally
define what aspects of the environment are attended to-as well as what aspects
are totally ignored. The strength of the internal action researcher is also the
greatest weakness. The strength is knowing the values, feelings, context of the
practice. The weakness is not being able to reconstrue this practice from an
alternative perspective.
When practice fails to be effective, and groups or the organization suffer

from inability to sense, diagnose, understand and change, a model for
facilitating change is necessary. The approach for this has often been to find a
change agent, a consultant who can be brought in to aid the organization in
understanding diagnosis and change. Without exception the change agent or
consultant would come to the organization and begin to observe and examine
various aspects of its structure, its processes, its productivity, and the way it
transforms input or raw material into output or finished product or service. This
initial stage involves a focus or an orientation toward data gathering and
observation of concrete experience in a personal way. This initial step is
devoted to the knowledge gathering process, which we have described above, as
Apprehension, the gathering of knowledge about others through concrete
experience so as to create an intuitive and personal knowledge. The personal
model begins with the manager/practitioner as potential change agent. Because
of this, the process of understanding or apprehending the organization is
shortened in some ways but is made more complex because of implicit beliefs.
But before change can take place, there are many steps in the inquiry process
necessary to produce effective change. The first of these begins with the
understanding of practice as we have described above.
Reflecting
Reflecting is defined as a thought occurring in consideration or meditation.

Kolb (1984) describes reflective observation as "understanding the meaning of
ideas and situations by carefully observing and impartially describing them." It
emphasizes understanding as opposed to practical application: a concern with
what is true or how things happen. Thus the second step in the model requires
that an individual manager step back from practice, from the collection of
concrete experiences, and reflect on that practice. Schon in his book The
Reflective Practitioner (1983) describes reflection-in-action:
When we go about the spontaneous, intuitive performance of

the action of everyday life, we show ourselves to be
knowledgeable in a special way. Often we cannot say what it
is that we know. When we try to describe it we find ourselves
at a loss, or we produce descriptions that are obviously
inappropriate. Our knowing is ordinarily tacit, implicit in our
patterns of action and in our feel for the stuff with which we
are dealing. It seems right to say that our knowing is in our
action. Similarly, the workaday life of the professional
depends on tacit knowing-in-action. Every competent
practitioner can recognize phenomena-families of symptoms
associated with a particular disease, peculiarities of a certain
kind of building site, irregularities of materials or structure-
for which he (she) cannot give a reasonably accurate or
complete description. In his day-to-day practice he makes
innumerable judgments of quality for which he cannot state
adequate criteria, and he displays skills for which he cannot
state the rules and procedures . . . On the other hand, both
ordinary people and professional practitioners often think
about what they are doing, sometimes even while doing it.
Stimulated by surprise, they turn thought back on action and
on the knOwing which is implicit in action. They may ask
themselves, for example, "What features do I notice when I
recognize this thing? What are the criteria by which I make
this judgment? What procedures am I enacting when I
perform this skill? How am I framing the problem that I am
trying to solve?" (Schon, 1983, pp. 49-50).
The reflecting phase is composed of multiple substages including both a

framing as well as an exploration substage. Reflection is the point where
problem awareness begins. It is the beginning of framing the problem. Kolb
would call the kind of knowledge generated by this activity divergent
knowledge; that is, knowing which is informed by the meaning and values of a
particular set of individuals within an organizational setting. It is in this
domain of knowing that the internal practitioner consultant has the greatest
advantages as well as the greatest deficit, because the wealth of concrete
experience to be transformed by reflection essentially already exists. But the
predilection to know only that which our implicit theories allow us to know is
our greatest weakness. Divergent thinking is probably quite close to the kind of
connected knowing described by Belenky, et al. (1986). It is in this modality
that alternative views of organizational reality can be entertained and perhaps a
single gestalt developed. This gestalt is intuitive: it is composed of a complex of
ideas, causal connections of multifold variables. It is colored by values derived
from a history and being socialized in an organization knowing what is possible
and what is acceptable given the politics and culture. This stage occurs because
expected results of a particular kind did not occur. A prior form of practice,
which involved acting or behaving within the organization has ceased to
produce the kind of expected results. As Argyris, Putnam, and Smith have
described it:
Drawing on these ideas, we can now sketch a more

comprehensive and dynamic model of the epistemology of
practice. The agent, confronted with a complex, puzzling, and
ambiguous set of circumstances, draws on tacit knowledge to
frame the situation and act. The consequences of this action
generate information about the situation and about the
suitability of the framing and action of the agent. The agents
interprets this information, again drawing on tacit
knowledge. If the action-as-probe generates information
inconsistent with the original framing, if the action-as-move
does not achieve intended consequences or leads to
unintended consequences, or if the action as hypothesis is
disconfirmed, the agent may be led to reflect on the tacit
understanding that informed the original framing and action.
This reflection mayor may not lead to a reframing of the
situation and a new sequence of moves (Argyris, Putnam &
Smith, 1985, p. 51).
Problem framing within reflection is the beginning of the process that a

scientist would call theory construction. It is also part of the activity that action
researchers call diagnosis. It is the internal search for understanding of
phenomena in terms of cause-and-effect relations so that control can be
regained. In this case, we refer to control of organizational processes that may
have gone awry. To accomplish this understanding, tacit theory, that which we
hold in an unreflective way, needs to be made explicit and overt. The process
for doing this requires several steps. A detailed presentation of this is beyond
the scope of this chapter. However, in brief, the process involves the active
work of writing one's observations of concrete experience as well as writing out
in considerable detail the implicit causal model or models that inform
understanding of the problem. It is crucial that this be done in a written form,
because the next step in reflection is to explore, to undertake secondary
research, that is, library research, looking at the ways others have theorized
about such problems and the kinds of concepts and constructs they have used.
However, reading others' research can modify or change one's understanding
and can even lead to a reframing or altered perception of the problem. This
mayor may not be helpful. To maintain the integrity of the original
observations, the basic concrete experiences, and the reflected theories that
related these to each other, must be written. We use the term integrity here, but
this may be misleading. It is absolutely critical that the individual
manager/researcher confront her or his own implicit theories learned in earlier
years. These single-loop models, as Argyris calls them (1982), are subtle and
pervasive. They may be effective in the diagnosis process or again they may be
terribly damaging. They are hidden, illogical, covert, and virulently resistant to
change. In order to discover them, they must become open, expressed, overt,
and amenable to exploration and logical understanding through the reflective
process. This is accomplished by writing them in their original "integrity" so
they can be evaluated. An example of this issue is how many managers profess
theory y but perform theory x. In addition, it is important that the researcher
also consider a constructivist perspective. Here the qualitative research
approach of interviewing others in the organization with a concern for
connected (Clinchy, 1996) knowing, or epistemology, is important. Since the
organization and its various construals by its members is the focus of concern,
an accurate diagnosis of others is critical. This too must be written and
consistent with qualitative research methodology and checked out, through
consultation with the informants. In a thesis, this would be a detailed
description of the concrete problem but with the constructivist's understanding
that organizational reality may have multiple facets. Here the

practitioner/researcher is building a grounded theory.
The next aspect that follows the framing of the problem is exploration. With
a framed question and perhaps a number of tentatively held hypotheses about a
diagnosis, the practitioner explores the body of accumulated knowledge to
discover alternative ways of naming or conceptualizing the problem. The
discovery of new ways of conceptualizing may produce a new consciousness
about the problem. It is this secondary background research and exploration
that enhances the reflection process and begins the process Kolb calls abstract
conceptualization. The practitioner approaches the exploration of accumulated
knowledge at this stage much as a researcher would. The difference is that the
practitioner's inquiry began from practice; a researcher usually begins with a
knowledge of the discipline and is looking to test a logically derived hypothesis.
Practitioners have methods of exploration, in addition to library research,
which lead to problem framing. These include interviews with other
knowledgeable or experienced practitioners. Particularly useful are interviews
with practitioners who have experienced a solution to the problem.
Collaboration with an informed third party, such as an academic or consultant,
can also be a useful approach. The initial exploration can produce models or
theories that approximate (or are analogous to) the problem encountered. Schon
calls these exemplars. Kolb calls the type of knowledge that derives from
reflection and causal analysis assimilative knowledge. These two steps in the
reflective process, framing and exploration, are interactive, with each one
informing the other in a circular pattern until the practitioner is comfortable
with the fit of the problem as framed. The result of this stage is a set of research
questions, framed by the practitioner in the context of accumulated knowledge.
The role played by the practitioners in this step is that of the reflective
practitioner. The practitioner has moved from tacit practice to an understanding
the problem, perhaps even multiple understandings.
As an addition to the reflective and abstract conceptualization processes, we

would also add what we have come to call meta-assimilation or meta-reflection.
It is very useful to be able to understand the totality of the research process
from the Kolbian perspective. Hence, we would urge each reflective practitioner
to also reflect on the Kolbian theory and epistemology. This opportunity to
reflect on the very process of reflection provides a conceptual road map, as it
were, so the action research has a sense of the role of each activity.
Finding
The third stage of the model is that of finding. In this stage the practitioner
plays the role of a researcher and moves from the various understandings of the
problem to a data-based knowledge of the problem in its organizational context.
The various steps in this stage are quite similar to those taken by an
independent researcher. The practitioner would further explore in a focused
manner his or her current understanding of the problem. This exploration
would include further library research and probing and inquiry within the
organization. Probing and inquiry may also involve some data collection
techniques, such as a sensing interview, process observation, and ethnographic
data collection, or they may involve the use of more quantitative measures if
this is appropriate. Practitioners have an advantage in using these techniques
based on their acceptance as a participant and their holistic understanding of
the organizational context. It may seem that this data collection is repetitive,
that is, members of the organization' have already been consulted. But the
participative imperative in identifying acceptable realities requires not only
"buy in" but a mutuality of theory and constructs. Based on their acceptance as
participants and their holistic understanding of the organizational context,
practitioner/scientists are ready to design a study. The same rules and decision
considerations of scientific inquiry are present in the present approach as they
would be in any research design. Issues such as internal and external validity,
measurement reliability, and validity are extant in this research setting, which
will be done in an operating organization. This "in vivo" setting presents the
same constraints and problems that are found in action research. Hence, the
practitioner must very often make methodological choices that force
compromises, which can threaten the internal and external validity of the study.
The final aspect of this stage is the data collection summary and analysis. It is
in the context of finding" that a clear understanding of the research modalities
of qualitative and quantitative techniques becomes important. What can one
learn and find from one approach? What can one learn and find using the
other? Here we would urge not only triangulation of methods using multiple
measuring techniques but also multiplicity of epistemology.
Knowing
The fourth stage is that of knowing. It is in this stage that the practitioner
integrates the knowledge gained through all other stages. The first step is to
interpret the data analyzed in the prior stage. The practitioner then integrates
the tacit knowledge from practice, the reflection from the second stage, the
accumulated knowledge discovered, and the data collected in the finding stage.
This integration takes place in the context of tacit understanding of the three
forces-personal, external, and organizational. Practitioners integrate this
knowledge in the assimilative and divergent and convergent modes (as
described by Kolb).
The roles played in this stage are multiple and include theorist, reflective
practitioner, data analyst, and model builder. Hence, the practitioner moves
from being a researcher to becoming an expert in the situation. The practitioner
has knowledge from many domains and several epistemologies and now has an
informed basis for generating policy alternatives and for choosing among
alternative action possibilities. It is here that a less-tentative diagnosis can be
posited. Along with the diagnosis is a theory, applicable to this organization
and its contexts. The theory suggests a causal understanding of the problem, in
all its systemic complexity, as well as a set of interventions that can alter the
situations.
Acting
In this final stage the practitioners uses the informed basis for action. The roles
played are that of change agent and expert. The steps are to plan for
implementation and evaluation, to actually implement, to gather evaluative data
through feedback and evaluate mechanisms. Hence, the practitioner moves
from expert in the situation to an experimenter and informed practitioner. The
resultant action may be a change in the system, implementation of a new model
or practice, growth in knowledge, or, in the event the implementation was not
effective, a clearer understanding of the situation that arises from action. This
final kind of knowledge is what Kolb calls accommodative. It partakes of the
result of active experimentation or action in conjunction with the apprehension
of the results of action. Hence, the practitioner/scientist returns to being a
practitioner, more informed in the area as a result of the cycle.
In conclusion, just as any scientifically based system of inquiry tends to be

repetitive, self-correcting, and open-ended, so too, the model proposed above
would have these same characteristics. To be sure, there are many aspects of the
model that we have not covered in this presentation and that are of considerable
importance. These include, but are not limited to, issues of values in the choice
of action alternatives, ethical problems, and considerations in doing
organizational research, a specific methodology for examining one's own and
others' constructs in the organizational diagnosis process as well as the
manifold difficulties that reside in any self-diagnostic activity.
The OARM model represents an integration of both types of epistemologies

and research practices. A review of other action research models (Elfenbein,
Brown & Knight, 1996) using the Kolbian categories suggests that the actual
sequencing of these activities need not be in the exact order suggested, and
there may be repetitive sequences, such as moving back and forth between
concrete experience, through the reflective process to abstraction several times
before moving on to active experimentation. It is also possible to do mind
experiments as part of the reflective process: theories are tested in imagination.
In addition, many of the action research models confirm the necessity of
participative research. In all of the action research models the Kolbian
epistemologies are used.
Getting There
Elements of the OARM are known by practitioners through their education,

training, and practice. However, they are scattered, isolated, and non systematic.
Practitioners must start to use the OARM in a systematic way. Like an athlete
or a musician, a practice or rehearsal is useful in developing the model as a way
of approaching a problem. Creating a laboratory or using OARM as an
alternative approach to a real problem is a good way to start. Applying the
OARM model in a "community of practice" or a peer group of learners is
highly recommended.
Kolb's theory provides practitioners with one method of understanding their

metacognition, or how they think. Only through self-understanding of how they
think and awareness of their assumptions in their practice, can practitioners
construct new theories-in-use, and develop breakthrough solutions.
The OARM stresses the systematic collection and analysis of data. It is

through data that we can truly question our assumptions and the meanings we
have constructed, and test the applicability of the models and solutions we have
created to the all-at-once practice environment. It is advised that the
practitioner who is new to applied research find an experienced person to act as

a methodological consultant. This will free the practitioner to reflect on the
problem at hand. This also brings together the world views of implicit and
rational technical knowledge. Over time, the consultant will be needed less and
less. But in the beginning the consultant is the coach, or the maestro, of the
OARM.
When we speak of data or research methods, we implicitly mean data and

methods that are labeled both quantitative and qualitative. Just as we have
argued for the integration of theory and research with the rich, multivariable
experience of the practice field, we argue that quantitative and qualitative data
are in fact one the same in the practice environment. Both contribute to our
understanding of phenomena, and alone both have limitations in describing the
practice environment. The goals are to improve practice and develop processes
that allow us to continually improve practice. The richness of data allow us to
do this.
An example from marketing demonstrates how we naturally integrate data

collection. To test a new product we survey a scientifically determined sample
to determine what respondents like and don't like about the product. When we
find an item they do not like, we want to know why. So, we use focus groups to
get in-depth, multivariable data on the item. This often leads to a survey,
further testing our findings.
The OARM has several uses for the training evaluator. The most obvious
application is when confronted by a problem or assignment that involves a new
knowledge area or an area with rapidly changing knowledge. The OARM is a
systematic way to understand, implement, and evaluate its applicability to the
practice field. The same application of OARM can be done as an evaluative
method for any intervention. These interventions can be what you are already
doing or the testing of a new intervention.
OARM represents one way of adapting in an ever-changing practice

environment, and applying new knowledge in the middle of an information
explosion. The OARM systematically applied can provide practitioners with a
process to test their assumptions, which guide their practice. This could result
in a reconstruction of their frame of reference through which they act and make
meaning. This could result in a new personal paradigm of practice and
breakthroughs in the reflection in action.
For the training-and-development evaluator, this can be difficult and

painful. As a new personal paradigm of practice is developed, assumptions
about the very nature of your work may come into question. A more systems-
oriented view will grow, and the interconnectedness among data, interventions,
and results will become more apparent. Eventually, your practice may not even
resemble that which you now do every day. However, the potential growth in
your practice and benefits to your organization and customers is worth the
struggle.
References
Argyris, C. (1982). Reasoning. learning and action: Individual and organizational. San
Francisco: Jossey-Bass.
Argyris, C., Putnam, R., & Smith, D. (1985). Action science. San Francisco: Jossey-
Bass.
Argyris, C., & Schon, D. (1974). Theory in practice: Increasing professional

effectiveness. San Francisco: Jossey-Bass.
Argyris, C., & Schon, D. (1978). Organizational learning: A theory of action

perspective. Reading, MA: Addison-Wesley.
Belenky, M., Clinchy, B., Goldberger, N., & Tarule, J. (1986). Women's way of
knowing. New York: Basic Books
Brown, J. S. (1991). Research that reinvents the corporation. Harvard Business Review.
69(1), 102-111.
Clinchy, B. (1996). Connected and separate knowing: Toward a marriage of two minds.
In Goldberger, N., Tarule, J., Clinchy, B., & Belenky, M. (Eds.), Knowledge.
difference and power (pp. 205-247). New York: Basic Books.
Elfenbein, M., Brown, S., & Knight, K. (1996). Kolb and action research: Additional
support for paradigm integration. Manuscript submitted for publication.
French, W., & Bell, C. (1984). Organization development: Behavioral science

interventions for organization improvement. (2nd ed.) Englewood Cliffs, NJ:
Prentice Hall.
Kolb, D. (1984). Experiential learning. Englewood Cliffs, NJ: Prentice Hall.

Lewin, K. (1984). Action research and minority problems. In Lewin, O. W., (Ed.),
Resolving social conflicts (pp. 201-216). New York: Harper & Row.
Love, A. J. (1991). Internal evaluation. Newbury Park, CA: Sage.
Porras, S. 1. (1987). Stream analysis. Reading, MA: Addison-Wesley.
Schon, D. (1983). The reflective practitioner. New York: Basic Books.
Schon, D. (1987). Educating the reflective practitioner. San Francisco: Jossey-Bass.
Whyte, W. (1984). Learning from the field. Beverly Hills, CA: Sage.
Whyte, W. (1991). Action research for the twenty-first century: Participation, reflection,
and practice. American Behavioral Scientist. 32(5), 499-623.
About the Authors
Stephen M. Brown is Dean of University College and Professor of Adult

Education at Sacred Heart University in Fairfield, Connecticut. He is the co-
editor of Evaluating Corporate Training and the co-author of Outsourcing
Human Resources. He is the co-founder and co-chair of the Assessment,
Measurement, and Evaluation Conference. He also maintains a consulting
practice. Dr. Brown received a bachelor's degree from the University of
Massachusetts, Dartmouth, a master's degree from the University of Rhode
Island, and a doctorate from Boston University.
Morton Elfenbein is a Professor of Psychology and Management and Chairman

of the Psychology Department at the University of Massachusetts, Dartmouth.
He is the author of numerous articles and the book Experimental Research
Methods in Psychology (with Barry Haimson). Dr. Elfenbein has received
bachelor's, master's, and doctoral degrees from Boston University.
Kim H. Knight is a Professor of Psychology and former Chairperson of the

Psychology Department at Roger Williams University in Bristol, Rhode Island.
She is the author of numerous articles and is currently preparing a book
manuscript on the convergence and integration of qualitative and quantitative
research methods. Dr. Knight received a bachelor's degree from the University
of Massachusetts, Dartmouth, a master's degree from the University of Rhode
Island and a doctorate from Boston University.
SECTION III-ISSUES IN
EVALUATION
A number of issues cut across the context and theory of evaluation practice.
This section highlights issues that are both timely and timeless. The section
begins with a discussion by Patricia Lawler of ethical issues related to
educational evaluation. Lawler reviews the literature related to ethics in other
professions, and then suggests a framework for developing ethical practice in
the evaluation of training. The chapter provides the field with a solid
foundation upon which to build.
Sadie Burton-Goss and Michael Kaska examine the cultural dimensions of

educational evaluation. Beginning with the premise that culture significantly
influences the way people think, learn, and behave, the authors go on to explore
the implications of the premise on the evaluation practice. They suggest that
ignoring or discounting the cultural influences on evaluation data will almost
certainly produce misleading information, which, in tum, can lead to incorrect
judgments and conclusions.
Technology is changing how training is delivered. Hallie Ephron Touger

describes some of these changes and how they, in tum, affect how training is
evaluated. Multimedia computer-based training has been with us for some time,
as has the concept of performance support that makes training available on
demand at the workplace. The ability to deliver these solutions over the
network has the potential to truly revolutionize training delivery, placing the
locus of control in the hands of the traditional recipients of training. Touger
discusses the implications of these innovations for the evaluation of training,
both in the formative stages and within the context of deployment across the
business.
236 Issues in Evaluation
Larry Leifer provides a closer look at a specific technology-mediated

distance learning program that includes learners from a university setting and
learners at remote corporate locations. Feedback mechanisms designed to
provide formative information to virtual design teams, as well as outcome
measures of the quality of final products, are technically instrumented. The
results of this work have implications for electronic knowledge-capture and
knowledge-sharing within and across organizational boundaries.
David Basarab's chapter is a case study that illustrates cross-organizational

collaboration of a different kind. Basarab built upon Motorola's corporate
commitment to quality to establish a partnership with business clients aimed at
assessing the application of training back on the job. Using well-established
and well-understood quality processes and tools, a cross-organizational team
has defined an ongoing, level 3 evaluation process that is driven by the
business, and institutionalized within their daily operations.
Ernest Kahane and William McCook explore some of business issues that
are driving the current certification trend in industry. The chapter provides
perspective on industry certification programs relative to other types of
professional certification, and discusses measurement concepts that are central
to the professionally sound and legally defensible implementation of a
certification program.
The chapter by Jean Moon provides insights from another venerable

educational institution, public schools, on the evaluation of educational
outcomes. The practice of performance-based assessment she describes is not
dissimilar to many efforts within industry and higher education to move beyond
rote learning to the application of learning in solving real-life problems.
Although each contribution in this section treats a specific issue, closer

examination suggests that the issues may be intertwined. Can certification be
discussed without attention to ethical use of the data? How do cultural
differences affect employee reactions to technical innovations, or the approach
one uses to engage business managers in the implementation of evaluation? It is
well to keep these interdependencies in mind while reading the chapters in this
section. Look at them as lenses that provide multiple perspectives on the
complex issue of determining the value of learning interventions to the growth
and development of individuals.
11 THE ETHICS OF EVALUATING
TRAINING
Patricia A. Lawler
Introduction
Those responsible for the various aspects of training within organizations are
faced with practical problems and daily challenges as they take on the tasks
associated with evaluation. We usually identify these challenges as work issues
and management problems inherent in every organization. Yet, there are times
when we are faced with a dilemma that evokes a more visceral response.
Consider this situation:
Cathy Reardon was enjoying her job as a new member of the training team at
Rosco futernational Computing Corporation. She really likes two things about her
job, the opportunity to put her presentation skills to use and the positive corporate
culture. At Rosco there was a climate of openness that was quite evident. During the
training for new managers, there were many opportunities for the participants to
learn about the corporation and the Rosco way. So Cathy was surprised when Ralph,
a new manager, was critical of the way in which she incorporated company policy
and philosophy into the Safety Skills Training. Although the discussion did not last
long, Cathy could sense that this participant was disgruntled and wanted to keep
corporate values separate from skill training. When the course finished and the
evaluations were turned in, Cathy saw that although her evaluation as a presenter
was good on all the forms, one evaluation stood out. She suspected it was Ralph
because many of the comments on the evaluation were the exactly same as those he
had made during the course. This participant objected to "the culture indoctrination
aspect of the course, which got in the way of learning the skills."
238 Ethics of Evaluating Training
About a week later Richard, the director of training, approached Cathy and
informed her that he had heard that one of the new managers who had attended her
training was disgruntled and not buying into the Rosco culture: "Cathy, people like
that just don't make good managers here. We need to identify them right in the
beginning." Richard asked her to send him the original evaluation forms and identify
the manager. Cathy wonders what to do, thinking that both she and the participants
assumed that the evaluations were anonymous and confidential. On the other hand,
Richard is her boss and expects her to follow his requests.
Situations like the one facing Cathy are common in work life. They
challenge us in our professional roles and may cause stress, conflict, and even
termination of employment. In this case, what is Cathy to do? What procedure
should she use to think through this dilemma that Richard has posed? Where is
she to tum for guidance? What are her options? Identifying this incident as an
ethical problem and understanding her perspective on ethical problems are the
ftrst steps in the problem-solving process.
Professionals like Cathy will find few resources in her field when faced with
ethical dilemmas. If she turns to the literature on training, she would find little
discussion of ethics, and if she never took a course in philosophy or ethics in
college, Cathy may feel somewhat intimidated by academic writings on the
subject. Without these, Cathy may be unaware of the ethical dimensions of her
work in training and lack an ethical framework with which she can describe
her problem and develop an ethical solution.
In the literature on training evaluation one finds little, if any, discussion of

ethics, ethical decision making, and ethical guidance for practice. Recent
training and training evaluation literature (Basarab & Root, 1992; Kirkpatrick,
1994; Mitchell, 1993; Newby; 1992; Phillips, 1991; Shapiro, 1995; Tracey,
1994) lack discussions of the ethical issues trainers encounter. Those with the
responsibility of evaluation, of their own and of other's training efforts, must
rely on discussions in other fields, such as education and the social sciences, for
information on standards of good practice and codes of ethics. This chapter
addresses this paucity of information and presents a discussion of ethics in
training evaluation for the practitioner. To provide a common framework for
training evaluators, an overview of basic ethical concepts, along with
identifying and defining ethical problems will be presented with practical
applications based on research and practice from the fields of education,
program evaluation, assessment, and human resource development.
Professional organizations in various fields have put forth standards and

codes to guide practice. A discussion of these standards is crucial to
understanding the place of ethics in the professions and may provide a guide
for trainers like Cathy. The work of several organizations will be presented and
reviewed. The chapter concludes with strategies for improving the ethical
practice of training evaluation through research and professionalization.
Evaluation of Training
Definitions of evaluation and trammg evaluation abound in the literature.

Basarab and Root (1992) describe evaluation as
a systematic process by which pertinent data are collected and

converted into information for measuring the effects of
training, helping in decision making, documenting results to
be used in program improvement and providing a method for
determining the quality of training (p. 2).
Along with others (Brookfield, 1988; Murninghan, 1989), Basarab and Root
focus on the process as a value-laden endeavor occurring in an cultural context.
Gordon (1991) sees evaluation as part of a system, stressing that to be
meaningful evaluation must be considered within a broader context. Brookfield
(1988) specifically describes evaluation's "political nature" (p. 90), that is, how
it is influenced by the actors or stakeholders in the process.
Evaluation is a process that supplies information for decision making within

this political context. Inherent in this process is making 'judgments of worth"
(Newby, 1992, p. 24). Those making the judgments have values, biases, and
experiences that influence how the decisions are made. The organization has a
cultural context that also influences this process. The judgments concern not
only the worth of the training but also its relevancy and reliability (Newby,
1992). Chalofsky (1994) asserts, "The success of an evaluation is not a matter
of how much information is obtained: rather it is a function of the influence
that the evaluation has on decision making" (p. 1336).
The evaluation process therefore provides fertile ground for conflicts and
dilemmas as personal and professional values, along with organizational
culture and goals, collide. Dealing with these issues in an effective way is an
important part of the professional life of the training evaluator. It is important

to frame the problem as having an ethical dimension.
Identifying Ethical Problems
The case study presented at the beginning of this chapter reveals several
characteristic features of an ethical problem.
First, there is a question about right and wrong conduct. Cathy was asked to
provide information about the participants that they might not have provided
had they known it would be used for personnel decisions. Cathy is concerned
that it may be wrong to provide that information. Note that this question is not
the same as asking if it would be an effective managerial tool to use the
evaluations in this way. That frames the issue simply as a management
problem. From that standpoint, it might be very useful to use this information
to weed out the Ralphs who do not fit the company's culture. But the ethical
issue is whether it would be right to treat the participants this way, not whether
it would advance organizational goals.
Standards of right and wrong conduct are stated as rules or principles. Some
are familiar, garden-variety ethical rules, such as don't lie, don't cheat, don't
steal, etc. The Ten Commandments are a set of ethical rules, as are professional
and business codes of ethics. These rules state our obligations to others to
conduct ourselves as the rules require. Obligations typically can be restated as
rights. My obligation not to reveal confidential information can be restated as
your right not to have that information revealed.
The question for Cathy is this: "What ethical rules of conduct do I apply to
this situation?" Her organization's code of ethics, if there is one, and its
policies on confidentiality would be a good place to start. There are professional
standards and codes of conduct that also should be consulted. These will be
discussed later in the chapter.
Ethical problems frequently, though not always, contain a conflict of

obligations or values. As Burns and Roche (1988) point out, "The practice of
ethics is an exercise in clarifying the nature of conflicting claims" (p. 53).
Cathy believes that she has an obligation to the participants not to reveal the
information; at the same time, as a loyal employee she has an obligation to
follow legitimate management directives. Are botb obligations valid ones? If so,
which should take precedence? Professional standards and codes of etbics are
often helpful in resolving tbese kinds of situations.
Etbical problems often embody conflicts between obligations and otber

goals. If Catby decided tbat she had an etbical obligation not to provide tbe
information and refused Richard's request, her career might be in jeopardy.
Meeting her etbical obligation might mean delays on her career patb or even
termination. The conflict is exacerbated by tbe imbalance of power between
Catby and her boss. He conducts performance reviews and is influential in
decisions regarding her salary and promotion. She has less power and is,
tberefore, more vulnerable to retaliation for following her etbical obligations.
This often poses painful dilemmas for conscientious practitioners.
Defining Ethical Problems
Why is Catby's problem an etbical one? First, tbere are professional rules of
conduct regarding participant confidentiality and tbe participants assumed tbat
tbeir evaluations were confidential and would not be used for otber purposes.
On tbe otber hand, Catby is a member of an organization, Rosco International
Computing Corporation, and a subordinate to Richard. She has obligations to
follow legitimate organizational requests and work for organizational goals.
These obligations seem to be in conflict, creating an etbical dilemma.
Etbical sensitivity, tbe ability to identify etbical issues, is useful in tbese

types of situations. Educators and otbers who are writing on in tbe field of
program planning and etbics (Cervero and Wilson, 1994; Brockett, 1990;
Lawler and Fielder, 1993) report tbat since every organization engaged in tbe
evaluation of training is faced witb etbical problems, etbical sensitivity is
necessary to recognize tbese situations. Being able to articulate tbe situation in
etbical terms reframes management problems and provides opportunities to
develop etbical solutions.
There are several frameworks Catby could use to increase her etbical
sensitivity. One such framework has been proposed by Zinn (1993). The
following nine questions, when answered yes, may indicate tbat you are facing
an etbical dilemma.
1. When I (or others) talk about this matter, do we use key words or phrases, such
as right or wrong, black or white, fair/unfair, bottom line, should, appropriate,
ethical, conflict, or values?
2. Is there any question of legality? Of violation of professional standards or codes

of ethics?
3. Will there be potential harm to anyone as a result of my decision/action, or of

my failure to act?
4. Am I concerned about my decision being equally fair to all parties?
5. Do I feel a conflict between my personal values and professional interests (e.g.,

institutional goals or client needs)?
6. Is there any controversy or strong opposition regarding this decision? Will it be

divisive in some way?
7. Will I be hesitant to reveal my decision about this matter to others? Would I

take the same action in a "clean, well-lit room"? (This is a favorite test used by
counselors and therapists.)
8. Do I have a gut feeling that something is not quite right about this?
9. Is this a decision that nobody else wants to make? (p. 8)

If the answer to any of these questions is yes, your problem probably has a
significant ethical dimension.
Ethical Decision Making
Ethical dilemmas require action. Cathy needs to make a decision and respond
to Richard's request. Framing and analyzing the ethical problem in decision-
making terms is useful. Many models have been proposed, providing steps and
questions to aid training evaluators in this ethical decision-making process
(Lawler and Fielder, 1991; McLagan, 1989; Newman and Brown, 1996;
Walker, 1993). These models are similar in that they suggest a linear approach
to decision making. McLagan (as cited in "Ethics for Business," 1991) sets out
a model for HRD professionals, which includes four steps:
1. Identify the quandary and the ethical issue involved. Identify the goals a
solution would achieve.
2. Identify the consequences of the possible solution-both positive and

negative--on individuals, the organization, and the profession.
3. Determine the right course of action.
4. Would the employee be willing to publicize the decision or action to fellow

professionals? (p. 8)
McLagan encourages us to share this infonnation, involve others in the
process and at least make the decision known.
Reviewing ethical issues in an open forum-so people can

understand the complexities influencing decisions about their
resolution-is often the only way to handle an ethical
challenge fairly (p. 42).
This model provides a framework, but also assumes that the training
professional has a working knowledge of ethical issues and is clear on his or
her own values. Although helpful, a model such as McLagan's may not take
into consideration all of the values and contextual factors inherent in an ethical
dilemma. Walker (1993) reminds us that the competing and numerous values
and interests in these dilemmas make the decision-making process difficult.
"Decisions are made in the context of professional, social, and economic
pressures, which may clash with or obscure the moral issues." (p. 14) Austin
(1992) talks about risk taking as a consideration, as well as the severity of the
problem.
A more helpful framework for ethical decision making is one suggested by

1. H. Fielder (1996). It consists of the following steps:
1. What is the ethical question? For Cathy it is: How should I respond to Richard's
request for training evaluation information? This locates the problem where the
decision maker encounters it.
2. What are my alternatives? This step asks the decision maker to consider a
number of different responses, such as complying with the request, refusing to
provide the information, and as I discuss below, finding a way to help Richard
evaluate Ralph without using information from the training evaluation.
3. What rules or principles should guide my choice of alternatives? The strength of

this model is that it focuses on articulating shared ethical rules as the basis for
the decision. Cathy could use a corporate code of ethics, if one is available, a
professional code of conduct, or similar policy statement. Her role
responsibilities for being a training evaluator can also be constructed from an
examination of other codes of ethics from related fields. In addition, her role as
a Rosco employee will bring in additional rules of conduct that may be in
conflict with her role as a training evaluator.
4. What personal or organizational goals are at stake? Frequently there are

conflicts between what is ethically required of us and what we want for
ourselves or our organization. Richard's goal of evaluating managers is a
legitimate one, as is Cathy'S goal of advancing her career. The ideal result in
decision making is to find a way to meet all of our ethical obligations and as
well as other goals.
5. Of all things considered, ethical obligations, other goals, and practical matters,
what is the best alternative? Decision making is ultimately a matter of striking a
balance among conflicting demands. While there is no formula for determining
this balance, making the elements explicit and clarifying their importance will
insure that the important factors will be taken into consideration.
Role Responsibility
Training professionals have many roles and along with those roles come
responsibilities and obligations to various stakeholders. Considering the various
roles professionals have is useful to incorporate contextual factors into the
decision-making process. It is important to be aware of these various roles, such
as, trainer, evaluator, reporter of data, administrator or manager, member of a
profession, and member of society (Newman and Roche, 1996). Austin (1992)
also reminds us of our personal roles, and their concurrent values, which may
influence our work life. With each of these roles come specific, and at times
conflicting, responsibilities. For instance, Cathy is faced with her role as a
training evaluator and its responsibility of participant privacy. At the same
time, she is faced with her role as an administrator obligated to her manager,
Richard. Her goals as an evaluator and those of Richard, her boss, collide in our
case.
The concept of role responsibility helps to clarify the obligations people have
by deriving some of them from their roles. According to this approach, each
role embodies a set of rights and obligations-ethical, social and legal-in a

social or organizational function. Determining whether individuals act ethically
is often a matter of deciding what the person's relevant role obligations are,
whether or not they are fulfilled, and whether there were any excusing or
justifying conditions present.
An important subset of role obligations are professional obligations. These

are obligations that apply to special kinds of careers, which are called
professions. Law, medicine, nursing, and engineering are typical examples of
professions. They are characterized by three items: a) the possession of
specialized skills and knowledge, b) extensive, formal education involving
substantial theoretical content, and c) provision of service performed by the
practitioner is essential to the community (Bayles, 1988). Professional
obligations, embodied in codes of ethics, represent a consensus by the
profession concerning proper conduct in that role.
Typically these obligations are owed to various stakeholders, those who have
a significant stake in the professional's decisions. In Cathy's case, these would
include the participants who are being evaluated, the requester of the
evaluation, the organization within which the evaluation takes place, and of
course, Cathy herself.
Ethical Principles
Training practitioners interested in ethical principles applicable to evaluation

may find guidance in the fields of education, social science, and human
resource development. For instance, general guidance for human resource
professionals, and specifically for members, can be found in the American
Society for Training and Development's National Code of Ethics. The code
states that members will strive to
• Recognize the rights and dignities of each individual.
• Develop human potential.
• Provide employers, clients, and learners with the highest level of quality
education, training, and development.
• Comply with an copyright laws and the laws and regulations governing my
position. Keep informed of pertinent knowledge and competence in the human
resource field.
• Maintain confidentiality and integrity in the practice of my profession. Support

my peers and avoid conduct that impedes their practicing their profession.
• Improve the public understanding of human resource development and

management.
• Fairly and accurately represent my human resource development/human

resource management credentials, qualifications, experience, and ability.
• Contribute to the continuing growth of the society and its members. (Gordon &
Baumhart, 1995, p. 6)
This code presents a call to high standards with very general directions for
specific issues and decision making. Although professionals need generalized
codes to strive for in their practice, when faced with a dilemma the codes may
not offer much help. For those specifically involved with evaluation and its
unique issues, this code is not precise enough to be very helpful.
The Academy of Professional Consultants and Advisors has a similar code

for its members. They address the right to confidentiality and privacy with their
directive:
Do not disclose proprietary data or information obtained

while serving a client to others or make use of such data or
information in the service of another client without the
permission of the rightful owner of such data and information
(Gordon & Baumhart, 1995, p. 4).
This would not be of much help to Cathy, since she is not being asked to reveal
information belonging to one client to another.
McLagan (1989), in her discussion of roles as an HRD, professional cites six

ethical issues that practitioners may face in their role as evaluators of training:
confidentiality, inappropriate requests, organization versus individual needs,
customer/user participation, conflicts of interest and individual and population
differences. She asserts that these issues "describe the major, predictable areas
in which people may face challenging ethical dilemmas as they do HRD work"
(p. 41). However, she fails to present a discussion of these issues or their
underlying ethical principles.
Another source of ethical guidance for training evaluation is the

American Psychological Association (APA) Ethical Principles of Psychologists
and Code of Conduct (1992). This extensive document is designed to provide
guidance to psychologists as they perform a variety of functions in research,
teaching, and testing. As a result, it can provide only general and indirect
advice on evaluation of training. For example, if Cathy had looked for guidance
on her problem, she would have found in section 2.02, Competence and
Appropriate Use of Assessments and Interventions, "Psychologists refrain from
misuses of assessment techniques, interventions, results, and interpretations
and take reasonable steps to prevent others from misusing the information these
techniques provide" (p. 1603). While this would alert her to issues of misuse of
data, it does not provide a way to distinguish appropriate and inappropriate
uses of information. Is the request made of her a misuse of evaluation of data?
The only example given in section 2.02 is releasing raw test results to persons
not qualified to use such information, which does not apply to her situation.
Similarly, section 5.05, Disclosures, states, "Psychologists disclose

confidential information without the consent of the individual only as mandated
by law, or permitted by law for a valid purpose, such as (1) to provide needed
professional services to the patient or the individual or organizational client ..
." (p. 1606). They may also "disclose confidential information with the
appropriate consent of the patient or the individual or organizational client ..."
(p. 1606). Does this help Cathy decide to comply with Richard's request? The
first question is whether or not the information is indeed confidential. An
assessment or test done by a psychologist, who may be licensed by the state
with attendant legal obligations, is probably confidential, but should a training
evaluation be treated similarly? Second, who may give consent to disclose, the
individuals who provided the information, or the organization? The answers to
the questions are not obvious, mainly because the APA Code was not really
designed to address the kinds of ethical problems training evaluators encounter.
In 1994, The Joint Committee on Standards for Educational Evaluation,

representing 16 profesSional organizations, published a comprehensive report
setting forth a list of 30 standards. The Committee defines a standard as "a
principle mutually agreed to by people engaged in a professional practice, that
if met, will enhance the quality and fairness of that professional practice, for
example, evaluation" (The Joint Committee on Standards for Educational

Evaluation, 1994, p. 2). While not addressing ethical issues directly, these
standards offer broad guidelines for the evaluator, and raise awareness of areas
likely to have ethical problems.
The Program Evaluation Standards may offer much to evaluators in the

training field. Although lengthy, they are accessible in the comprehensive text
The Program Evaluation Standards, 2nd Edition, which includes guidelines for
practical application, and an abbreviated version in an ERIC/AE Digest (1995).
The standards are classified under four attributes of evaluation: utility,
feasibility, accuracy, and propriety. (Appendix A)
There are seven utility standards, which provide guidance for how
evaluations should be "informative, timely, and influential. Overall, the utility
standards define whether an evaluation serves the practical information needs
of a given audience" (The Joint Committee on Standards for Educational
Evaluation, 1994, p. 5). They include stakeholder identification, evaluator
credibility, information scope and selection, values identification, report clarity,
timelines and dissemination, and evaluation impact. This last standard deals
with use of the evaluation and not with the integrity with which the evaluation
is used.
The setting where evaluation takes place is the focus of the three feasibility
standards. "Taken together, the feasibility standards call for evaluations to be
realistic, prudent, diplomatic, and economical" (The Joint Committee on
Standards for Educational Evaluation, 1994, p. 6). They include: practical
procedures, political viability, and cost effectiveness. Here a warning of
political context is found but nothing to guide the evaluator through the
context.
The 12 accuracy standards focus on the soundness of the information and

"are intended to ensure that an evaluation will reveal and convey accurate
information about the program's merit and/or worth" (The Joint Committee on
Standards for Educational Evaluation, 1994, p. 6). These standards cover the
areas of: program documentation, context analysis, described purposes and
procedures, defensible information sources, valid information, reliable
information, systematic information, analysis of quantitative and qualitative
information, justified conclusions, impartial reporting, and metaevaluation.
Again, these provide direction for action, but little help in conflicting
situations.
The eight propriety standards concentrate on the rights of individuals

affected by the evaluation. "They promote sensitivity to and warn against
unlawful, unscrupulous, unethical, and inept actions by those who conduct
evaluations" (The Joint Committee on Standards for Educational Evaluation,
1994, p. 5). These eight standards directly address ethical issues and include:
service orientation, formal agreements, rights of human subjects, human
interactions, complete and fair assessment, disclosure of findings, conflict of
interest, and fiscal responsibility. They provide guidance and direction on such
issues of respect for individuals, integrity, honesty, and justice.
Although these standards are geared to the educational evaluator, the spirit
in which they are written and presented has worth for those in training.
Basarab and Root (1992) cite these standards as important to conducting
metaevaluations.
Two other professional organizations have published information for

evaluation. The American Evaluation Association Task Force on Guiding
Principles for Evaluators (1994) and the prepared Guiding Principles for
Evaluators. The principles are
1. Systematic Inquiry: Evaluators conduct systematic, data-based inquiries about
whatever is being evaluated.
2. Competence: Evaluators provide competent performance to stakeholders.
3. Integrity/Honesty: Evaluators ensure the honesty and integrity of the entire

evaluation process.
4. Respect for People: Evaluators respect the security, dignity, and self-worth of
the respondents, program participants, clients, and other stakeholders with
whom they interact.
5. Responsibilities for General and Public Welfare: Evaluators articulate and take
into account the diversity of interests and values that may be related to the
general and public welfare (p. 20).
These guiding principles address ethical issues in a general way but once
again give little guidance in the actual practice of making ethical decisions.
The American Educational Research Association ("Ethical standards,"

1992) has developed ethical standards in seven categories, which focus on
conducting and reporting research in the field of education. Like the standards
already cited, these focus on the rights of the various stakeholders involved in
research, especially vulnerable participants, along with the responsibilities
researchers have to the professional, research populations, educational
institutions, and the public. There is also guidance here for intellectual
ownership, reviewing and appraising research, as well as standards for the
evaluation of sponsors and users.
While there is value in all of these different codes of ethics and standards,
the fact remains that none of them was specifically designed to address ethical
issues in training evaluation. As a result, they cannot provide much guidance to
practitioners trying to deal with ethical problems.
Making an Ethical Decision
How do we interpret these principles, standards, and codes? What information

do they have for us in the daily practice of training evaluation? Would any of
them be useful to Cathy in her decision-making process? None of these lists of
standards and principles is very specific to her situation. But the lists do call
attention to the rights of participants and the need to think about proper
disclosure of information. A useful way to try to determine Cathy's
responsibilities here is to ask how the participants would feel to learn that their
evaluations were being used for personnel decisions. Would they feel betrayed?
Did they assume that the information was only for evaluation of the training?
The answers would probably be yes, indicating that the participants felt that the
proper ethical conduct would be to use the information only for evaluation.
Thus one of the ethical rules to guide practice would be to use information
provided by participants for evaluation of training only for that purpose unless
it is made clear to participants that it will be used for other purposes. If the
organization wants that information for other purposes, it should be made clear
to the participants. Of course, such a policy of informed consent might result in
participants not providing as much useful information for the evaluation.
A better policy would be to explicitly make clear to participants that the

evaluation information will only be used for that purpose. Therefore, Cathy
should refuse Richard's request explaining why it would not be right to do so.
She should also acknowledge the company's legitimate goal of weeding out
managers who do not fit the company culture. A political solution would be to
offer her observations about Ralph, based on her perceptions of his behavior in
the class. That information is not confidential and there are no ethical
prohibitions to sharing her observations of managers to her superiors. This
solution separates two legitimate demands-<:onfidentiality and personnel
evaluation-and meets both. Many good solutions to ethical problems have this
structure: find a way to meet organizational goals that does not infringe on the
rights of any of the stakeholders.
In summary, using Fielder's (1996) decision-making model, Cathy

1. stated the ethical question: "How should I respond to Richard's request to use
participant evaluations for personnel decisions?"
2. set out some alternatives: give Richard the information, refuse to provide it
3. considered what ethical rules and principles should be used to evaluate her
alternatives: use training evaluation information only for that purpose unless
participants have been informed otherwise
4. considered organizational and personal goals: Rosco's legitimate goal of

evaluating managers and Cathy's career aspirations
5. considered obligations, practical matters, and other goals, and then chose the
best alternative: find another ethically acceptable way to assist Richard in his
personnel evaluation.
The crucial element is the identification of the ethical rules and principles to
guide evaluation of alternative courses of action. At the present time,
practitioners like Cathy have little guidance from their professional
organizations that would help them understand the ethical problems in their
work as training evaluators and that would provide guidance regarding their
ethical obligations.
Improving the Ethical Practice of Training Evaluation
Training professionals need to address the ethical dimension of their work,

especially in the area of evaluation. The lack of discussion and guidance for
practitioners like Cathy create stressful dilemmas and poor practice. The
profession has an obligation to identify typical ethical problems that arise in
training evaluation and establish guidelines for practical application. Issues,
such as confidentiality, conflict of interest, bias, misuse of information, and
interpreting data, appear over and over again in the general evaluation
literature. These general discussions of issues and problems are not necessarily
useful. How specifically evaluators of training deal with these issues needs to be
addressed. What are the specific ethical problems that evaluators of training
face in their practice? It is critical to have these real-life problems identified
and discussed.
According to Sanders (1989), this lack of guidance for the professional is

especially acute. Describing training in business settings, his concern is that
persons who are evaluating the training have little or no formal education in
program evaluation and are usually the trainers themselves or the clients of
trainers. His observations of the "substantial discrepancies between the
evaluation practices described in the literature oriented to trainers and the
actual evaluation practices of trainers" (p. 69) are very similar to the ethical
problems raised in this chapter. A lack of education, along with such a
discrepancy, may be perceived as a lack of enculturalization into the profession,
with its accompanying discussion of values and ethics. Professional trainers
who have the role of evaluator can profit from a discussion of real-life problems
and the profession's guidelines for navigating them.
At present, trainers involved in evaluating have little to tum to for guidance.

However, those in the profession who see this acute need can follow the
direction of other professions in developing a course of action to address this
need. First, it is imperative to understand the real-life situations in which
training evaluators find themselves. The literature provides little assistance in
this area (Sanders, 1989). Eliciting this information from practitioners is called
for. The various standards, guidelines, and codes focusing on evaluation should
be reviewed to assess their relevance to these real-life situations. Codes and
standards may not always address what practitioners find to be problematic
(Austin, 1992; Lawler and Fielder, 1993). Love (1994) has an additional
concern, "Given the varied educational backgrounds and professional
affiliations of evaluators, they may practice under several different potentially
conflicting ethical codes" (p. 33). This conflict is even in research with
educators, which has shown that practitioners find role conflict and
professional expectations to be the most crucial areas of ethical problems
(Lawler and Fielder, 1993). Yet, there is little discussion in the training
evaluation literature of these issues.
I propose two recommendations to ameliorate this problem. First, research

needs to be done to identify the ethical issues, problems, and dilemmas training
evaluators encounter in their everyday practice. When the APA developed their
code in the 1950s, they undertook a survey of psychologists to identify the kinds
of ethical problems they encountered in their work. A similar survey was
carried out by Lawler and Fielder (1993) with professionals in continuing
higher education. Professional organizations and those in academic programs
should take the lead in this research. Models for research into ethical issues are
available and adaptable (Lawler and Fielder, 1993; Lawler, 1996). Data from
asking practitioners what they find to be ethical challenges should then be
disseminated in the literature, at professional meetings, and throughout the
profession, and clearly defined as real~life problems.
The second recommendation would be to use this process of identifying

typical ethical problems to develop consensus among professional training
evaluators concerning their ethical obligations. A code of ethics should embody
the considered judgments of practitioners concerning ethical rules of their
practice.
Improving training evaluation requires the development of consensus-based

professional standards to guide practice. This is true not only for standards of
validity and reliability but also true for ethics. Training evaluators cannot
simply rely on other professional organizations to provide the guidance for
practice that Cathy and other conscientious training evaluators need. This
should not be seen as simply a problem but as an opportunity for those in
training evaluation to establish themselves as true professionals by setting up
their own standards of practice. My hope is that this chapter will encourage
training evaluators to begin work toward developing a code of ethics that is
specifically designed for the ethical issues in their valuable work.
References
The American Evaluation Association, Task Force on Guiding Principles for Evaluators.
(1995). Guiding principles for evaluators. In W.R. Shadish, D.L. Newman, M.A.
Scheirer, & C. Wye. (Eds.). Guiding principles for evaluators, San Francisco: Jossey-
Bass.
Austin, N. K. (1992, September). Ethics: Personal vs. professional. Working Woman,

28,32.
Basarab, D. I., Sr., & Root, D. K. (1992). The training evaluation process: A practical
approach to evaluating corporate training programs. Boston: Kluwer Academic
Publishers.
Bayles, M. D. (1988). The professions. In J. C. Callahan (Ed.). Ethical issues in

professional life (pp. 27-30), New York: Oxford University Press.
Brockett, R. G. (1990). Adult education: Are we doing it ethically? MPAEA Journal,

Fall,5-12.
Brookfield, S. (1988). Ethical dilemmas in evaluating adult education programs. In R.

G. Brockett (Ed.). Ethical issues in adult education (pp. 88-102). New York:
Teachers College Press.
Burns, J. H., & Roche, G. A. (1988). Marketing for adult educators: Some ethical
questions. In R.G. Brockett (Ed.). Ethical issues in adult education (pp. 51-63). New
York: Teachers College Press.
Cervero, R. M., & Wilson, A. L. (1994). Planning responsibly for adult education: A
Guide to negotiating power and interest. San Francisco: Jossey-Bass.
Chalofsky, N. F. (1994). In W. R. Tracey, (ed.). Human resources management and

development handbook (2nd ed.). (pp. 1323-1342) New York: Amacom, American
Management Association.
Department of Education. (1995). The program evaluation standards (Report No. EDO-
TM-95-7) Washington, D.C.: The Catholic University of America. (ERIC Document
Reproduction Service No. ED 385 612).
Ethical principles of psychologists and code of conduct. (1992). American Psychologist.

47(12), 1597-1611.
Ethical standards of the American Educational Research Association. (1992).

Educational Research. 21(7), 23-26.
Ethics for business, (1991, March). LINE. 103. 1-12.
Fielder, J. (1996). A framework for ethical decision making. Unpublished manuscript,

Villanova University, Villanova, PA.
Gordon, E. E., & Baumhart, J. E. (1995, August). Ethics for training and development.
INFO-LINE, (9515). Alexandria, VA: American Society for Training and
Development.
Gordon, J. (1991, August). Measuring the goodness of training, Training, 19-25.
The Joint Committee on Standards for Educational Evaluation. (1994). The program
evaluation standards (2nd ed). Thousand Oaks, CA: SAGE Publications.
Berrett-Koehler Publishers.
Lawler, P. (1996). Developing a code of ethics: A case study approach. The Journal of
Continuing Higher Education. 3(44), 2-14.
Lawler, P., & Fielder, J. (1991). Analyzing ethical problems in continuing higher
education: A model for practical use. The Journal of Continuing Higher Education.
39(2), 20-25.
Lawler, P., & Fielder, J. (1993). Ethical problems in continuing higher education:
Results of a survey. The Journal of Continuing Higher Education. 41(1), 25-33.
Love, A. 1. (1994). Should evaluators be certified? fu J. W. Altschuld & M. Engle,

(Eds.). The preparation of professional evaluators: Issues. perspectives. and
programs. San Francisco: Jossey-Bass.
McLagan, P. A. (1989). Models for HRD practice, Alexandria, VA: American Society
for Training and Development.
Mitchell, G. (1993). The trainer's handbook: The AMA guide to effective training. New
York: Amacom, American Management Association.
Muminghan, M. M. (1989). The ethics enterprise. Foundation News, 61-63.
Newby, A. C. (1992). Training evaluation handbook, San Diego, CA: Pfeiffer &
Company.
Newman, D. L., & Brown, R. D. (1996). Applied ethics for program evaluation,
Thousand Oaks, CA: SAGE Publications.
Phillips, 1. 1. (1991). Handbook of training evaluation and measurement methods (2nd

ed.). Houston, TX: Gulf Publishing Company.
Sanders, N. M. (1989). Evaluation of training by trainers. In R. O. Brinkerhoff (Ed.).

Evaluating training programs in business and industly (pp. 59-70). San Francisco:
lossey-Bass.
Shapiro, L. T. (1995). Training effectiveness handbook. New York: McGraw-Hill.
Tracey, W. R. (ed.). (1994). Human resources management and development handbook

(2nd ed.). New York: Amacom, American Management Association.
Walker, K. (1993). Values, ethics and ethical decision making. Adult Learning, 5(2),
13-14,27.
Zinn, L. M. (1993). Do the right thing. Ethical decision making in professional and
business practice. Adult Learning, 5(2), 7-8.
About the Author
Patricia Lawler is an Associate Professor in the Center for Education at Widner

University. Her current research is in ethics and faculty development from an
adult learning perspective. She also consults in the area of ethical issues in
education and training. Dr. Lawler received a Ed.D. from Columbia Teachers
College in Continuing Higher Education.
The author wishes to acknowledge the contributions of Katherine
Beauchemin, Alicia Carnett, John Fielder, and Ann McHenry in preparing this
chapter.
12 CULTURAL DIMENSIONS OF
EVALUATION
Sadie Burton-Goss and Michael Kaska
The Nature of Culture and Its Implications on Training
In recent years, training professionals have experienced increasing demands to

become more business- or results-focused and deliberate in the design and
evaluation of training impact. At the same time, business environments and
organizations have become increasing complex. Two reasons for the new level
of complexity is the changing workforce and the global business environment.
There are, now, many more factors to consider for the successful evaluation of
training in organizations. One of the most compelling factors, and perhaps least
considered, is culture.
A training professional might be asked to design and deliver a new training

curriculum for a company's operation in Indonesia or to facilitate a self-
directed team building program in EI Paso, incorporating all of the current
thinking and best practices in the field. After delivering the program, there may
be a feeling that the training did not quite have the impact hoped for. A quick
review of the evaluation data doesn't seem to accurately reflect what was seen
or experienced in the classroom. The trainer is not even certain he can rely on
the information to make the usual changes and improvements. In another
setting, a trainer may have just completed a program originally designed for a
sales function in a company. Participants had consistently evaluated the
program as excellent, meeting and exceeding expectations. The same program,
258 Cultural Dimensions of Evaluation
delivered for another function in the same company has been evaluated as a
dismal failure by both the participants and their managers. Training
professionals in a variety of settings can anticipate becoming increasingly
perplexed by these types of experiences. What new insight will we need to
address these experiences? What additional knowledge or considerations will
increase our ability to evaluate training in this ever-changing environment?
Perhaps part of the answer lies in the uncomfortable realization that most, if
not all, of the underlying assumptions we have come to accept as gospel for
evaluating training may not be universal truths-that the templates we have
created, which tell us whether a training program has or has not been effective
may not be so easily transferred in a multicultural setting with equal
effectiveness. Taylor Cox, author of the book Cultural Diversity in
Organizations: Theory, Research and Practice, urges us to consider the
elements of cultural differences.
Cultural differences potentially explain a great deal about the less-

than-fully satisfactory experiences of employees in diverse work
groups. In the increasingly global economy, the importance of
cultural differences among people of different nationalities is
becoming more obvious. However, most of us have only a cursory
knowledge of what these differences are and how they may affect
behavior. Moreover, it is widely believed that cultural differences
exist only between people of different nationalities, whereas the
message of this chapter is that there are significant culture group
differences within societies (Cox, 1993, p. 27).
Significant culture group differences exist not only within societies but
within organizations, work environments, and across functions. The potential
impact of culture on evaluation effectiveness and the resulting implications are
significant.
In his book Understanding Culture's Influence on Behavior, Brislin

suggests that the following features of culture be taken into account:
1. Culture consists of ideals, values, and assumptions about life that guide
specific behaviors.
2. Culture consists of those aspects of the environment that people make.
3. Culture is transmitted generation to generation, with the responsibility

given to parents, teachers, religious leaders, and other respected elders in a
community.
4. The fact summarized in point 3 means that there will be childhood
experiences that many people in a community remember happening to
them.
5. Aspects of one's culture are not frequently discussed by adults. Since
culture is widely shared and accepted, there is little reason to discuss it
frequently.
6. Culture can become clearest in well-meaning clashes. This term refers to
interactions among people from very different backgrounds. They may
behave in proper ways according to their socialization, but there is a clash
when the people from different cultures interact.
7. Culture allows people to "fill in the blanks" when presented with a basic
sketch of familiar behavior.
8. Cultural values remain despite compromises and slip-ups. Even though
people can list exceptions, the culture value is seen as a constant that
continues to guide specific behaviors.
9. There are emotional reactions when cultural values are violated or when a
culture's expected behaviors are ignored.
10. There can be acceptance and rejection of a culture's values at different
times in a person's life. Common examples involve rebellious adolescents
and young adults who accept a culture's expectations after having children
of their own.
11. When changes in cultural values are contemplated, the reaction that "this
will be difficult and time consuming" is likely.
12. When comparing proper, expected behavior across cultures, some
observations are summarizable in sharp contrast. Examples are the
treatment of time, the spatial orientations that people adopt, and the clarity
(versus lack thereoO of the rules and norms for certain complex behaviors
(BrasIin, 1993, pp. 24-25).
Helen and Robert Moore, partners and senior consultants for the Interactions
Group in Massachusetts, offer the metaphors of vantage point and filters to
articulate culture and its impact. 1 Culture is the product of our vantage point,
who we are and where we are, as well as our experiences over a lifetime that
ultimately shape our perceptions, values, and beliefs. These are the elements of
our cultural response to data and experiences, including training experiences.
From one commonly held cultural vantage point, we may often assume that
anonymity ensures objectivity. After all, if participants do not have to reveal
their identity, then there is a good likelihood that they will be bluntly honest
with their feedback. Deal, Kennedy, and Allan, authors of the book, Corporate
Cultures: The Rites and Rituals of Corporate life, remind us that values are
the bedrock of culture. Yet, if preserving face and not offending anyone were
values deeply ingrained in someone since childhood, the anonymity of the
evaluation process (form) would have little to do with the quality of the
feedback (substance). Do we unintentionally value the form of evaluation more
than its substance?
Certainly quantifiable data generated by Likert scales and analyzed using the
holy grail of statistics are beyond refute, right? Perhaps. Yet consider the fact
that much of the evaluation data we generate are actually qualitative in origin.
"How confident are you 1" "Rate the degree to which you feel ... " "To what
extent will you use ... 1" Feelings, attitudes, and opinions are reflections of
deeply rooted cultural values. Some cultures (e.g., American) do not seem to
have a great deal of difficulty being self-critical. Yet, the cloak of scientific
method alone may not be enough to account for the influence of collectivism on
a group or the difficulty participants from other cultural backgrounds may have
when asked to be introspective and self-critical.
According to Susan E. Jackson (1992), author of Diversity in the Workplace:
Cultures have consequences that are easily experienced but more

difficult to describe. For many people, the concept conjures up
images of the exotic customs, religions, food, clothing, and life-
styles that make foreign travel-as well as trips into the ethnic
enclaves in local cities-both stimulating and enjoyable. These
aspects of a foreign culture can be experienced without ever
engaging in conversation with someone from that culture. It is
also easy for business to accommodate these aspects of cultural
diversity-the cafeteria can offer a variety of ethnic, foods and
flexible policies can allow employees to observe whichever
Evaluating C.orp.orate Training: Models and Issues 261
h.olidays they choose. H.owever, the deeper c.onsequences .of

culture-such as values and ways .of interpreting the w.orld-
cann.ot be handled merely by changing menus and p.olicies. And it
is these deeper consequences that .organizati.ons are struggling
with today. When people with different habits and world views
come t.ogether in the workplace, misunderstandings and c.onflicts
inevitably occur as a result .of dissimilar expectati.ons and norms"
(Jacks.on, 1992, pp. 22-23).
The p.oint here is n.ot t.o simply questi.on the value .of using written
evaluati.on instruments in multicultural settings. If that were the .only issue, it
eQuId easily be res.olved by substituting .observati.on .of behavi.or and
measurement .of results as indicati.ons .of training effectiveness. Yet imagine the
difficulty f.or a trainer attempting t.o accurately assess the behavi.or .of an
individual .or individuals wh.ose cultural upbringing placed an abs.olute
premium .on .one's ability t.o .outwardly c.ontr.ol any expressi.ons .of em.oti.on.
C.onsider the impact .on evaluati.on if this includes disagreement .or
dissatisfacti.on with a learning experience?
It might appear .one eQuId perhaps argue that the best way t.o evaluate
training in a multicultural setting w.ould be t.o cut t.o the chase and c.onsider
.only the quantifiable results .of th.ose eff.orts. Perhaps that is true. Yet assume
for a m.oment that the results .of a particular training interventi.on were less (.or
m.ore) than expected. W.ould y.ou n.ot find y.ourselfback at the beginning, asking
questi.ons such as, Did the training pr.ogram achieve its .objectives? Did
participants learn anything? Did they actually apply what they learned when
they returned t.o their j.obs? The p.oint is that relying solely .on measuring the
results .of training, as we always have, may .only return us t.o the fundamental
questi.on: Was training effective?
A good place t.o start is by understanding the s.ource .of .our challenge-
s.omething called cultural bias. Culture, whether ethnic .or .organizati.onal,
nati.onal .or internati.onal, has always been an issue in training and its
evaluati.on. But as the W.orld bec.omes smaller, econ.omic activity m.ore gl.obal,
and the American w.orkforce more diverse, cultural influences and their impact
.on training and its evaluati.on are bec.oming m.ore n.oticeable, and m.ore
meaningful.
In his book Cultural Influences on Research Methods and Statistics, David

Matsumoto defines culture as the set of attitudes, values, beliefs, and behaviors
shared by a group of people, communicated from one generation to the next via
language or some other means of communication (Matsumoto, 1994, p. 4).
It is within this context that many organizations today find themselves

referring to cultural differences from an international or ethnic perspective. Yet
culture, in its purest sense, goes far beyond these two surface distinctions.
Those defining culture would agree that as American business acknowledges
the growing diversity ofits workforce, we don't need to leave the United States
to have an intercultural experience. From a domestic vantage point, cultural
differences manifest themselves on a regional, urban and suburban, even a
state-by-state basis. Cultural issues even permeate the work environment, as
organizations develop cultures that define the rules employees must follow to
survive and succeed. Race, religion, gender, disabilities, sexual orientation, and
age are just a few of the inherent differences that are increasingly being
recognized as cultural determinants.
There are many reasons members of various cultures develop different ways
of looking at and dealing with issues of human behavior. Like lenses on a pair
of glasses, members of one culture can look at, interpret, and draw entirely
different conclusions from the same event, depending upon which cultural lens
they are looking through. For example, singling out a particular workshop
participant and asking for his or her opinion on a point you have been trying to
make would be perfectly acceptable behavior in a culture that emphasizes
individualism. Yet, take the same situation and put it in another context, in a
cultural setting that emphasizes collectivism and the value of saving face, and
such an innocent act may very well produce unintentional embarrassment and
anger.
The point is that culture determines the way people see things, and not
everyone sees the same things the same way. In itself, culture is neither good
nor bad; it simply is. What is correct and proper behavior in one culture may be
considered incorrect and improper in another. Cultural differences are not, or at
least should not be, problems. What becomes problematic is when we interpret
behavior, pass jUdgment, or draw conclusions based on our cultural orientation
and take action(s) based on the stereotypes we have developed. Often, it is what
we do as a result of our cultural bias that creates difficulties for others, as well
as organizations, and even nations.
As HRD practitioners, we need to understand that culture has a tremendous

influence on the way people think, learn, and behave. The same logic that
encourages us to determine the various learning styles of participants in a
training program should also encourage us to identify the cultural issues that
may influence the way our audience receives and processes information,
perhaps even to the point of developing a cultural profile of a class and trying
to adapt evaluation tools accordingly.
The point is that we may not be able to change who we are or completely
separate ourselves from the way we see the world. Yet, as HRD professionals,
we can develop a greater awareness of and sensitivity to the issues of cultural
differences and how those differences influence the way we measure and
interpret behavior. When you consider that the ultimate objective of training is
to change behavior, even acknowledging the potential implications cultural
differences may have on that process becomes profoundly important. One place
those differences come to light dramatically is when we evaluate the
effectiveness of our training efforts.
Cultural Implications for Training Evaluation
Even the best work on culture may not fully express the complexity of this area
or the potential implications for its impact. Clearly, a new look at culture is
important because of the relative progress made in acknowledging and valuing
human differences. Perhaps unlike any other point in history, organizations are
beginning to legitimize the dialogue around human differences. There is an
emerging enthusiasm for greater insight and both greater knowledge and skill
in successfully engaging across cultures. As our world becomes smaller, this
ability is becoming more highly valued as a basic skill.
While dialogue regarding human differences and our abilities to value them
may be experiencing greater legitimacy, at the same time we are also
experiencing growing interest in evaluating the effectiveness of training.
Changing organizations are demanding clarity about the value-added of every
expenditure, and training is certainly no exception. Given that reality, what
would happen to training, and more importantly the value it delivers to an
organization, if the following cultural implications were true?
• Commonly held assumptions about what constitutes sound and effective

training or responses to effective training were not true for certain cultures.
• Cultural considerations could make or break even the most effective
training evaluation process and tools unless adjustments were made to
make both culturally sensitive and culturally specific.
• A one-size-fits-all approach to the design, delivery, and evaluation of a
training program would not yield the best results, if any, in a multicultural
setting.
• Without taking culture into account, we would not be able to accurately
measure or evaluate the impact of training or the value it adds to an
organization.
If our definitions and even one of the above cultural implications about
training are true (and we suspect they all are true), then at the very least, to be
effective, our tools, approach, and methodology for evaluating training must
take into account the context of culture and its implications. The question is
How do we do that?
Training Evaluation Within a Multicultural Context
One undeniable truth is that more and more business is being done today in an
increasingly complex, multicultural context, and that includes training. As
HRD practitioners, we rely on and make use of information in many different
ways, as well as form impressions and make judgments based on our own
background and experiences. Culture provides us with a framework within
which we do all of this.
Yet, what happens when multiple cultures are involved in training and its
evaluation? It is possible that what we, as trainers, perceive as good might
actually be considered bad by our participants? What we think is effective may
actually be ineffective? What we intend to be positive feedback may actually be
viewed as neutral or, worse, negative feedback? Actions that we believe indicate
great enthusiasm and learning may simply not mean that at all?
When evaluating training in a multicultural context, there are a number of

important issues to consider such as
• How authority is viewed and responded to

• The value of verbal versus written skills
• Language skills and pace
• Cultural values as they relate to evaluating self and others
• Validity of judgments based on behavioral observation
• Demographic and psychographic differences (e.g., race, age, gender,
religion)
What we don't lack are a selection of evaluation models. There are literally
dozens of ways to gather information regarding training's effectiveness. Yet,
whether we use Kirkpatrick's (1994) model, the Bell System Approach, CIRO
(context, input, reaction, outcome), or CIPP (content, input, process, product),
our ultimate goal remains the same. Regardless of the context in which training
has been delivered, the goal is to collect and analyze appropriate data to
determine the value and effectiveness of our efforts, and to make whatever
design or delivery changes and improvements necessary to gain maximum
benefit for the time and effort expended.
Even in a multicultural setting, accomplishing this goal boils down to

looking for answers to a series of fairly logical questions, such as
• To what extent did the training program achieve its performance and
learning objectives? Did it accomplish what it set out to do?
• How effective was the instructor in providing knowledge and facilitating
skill building?
• How effective were the activities in accomplishing specific performance or
learning goals?
• How much, if any, learning took place?
• At the end of the program, how confident were the participants that they
could apply what they learned in the classroom back on the job?
• What specific barriers or obstacles will participants face on the job that
may make it difficult or impossible for them to use their new skills?
• To what extent, if any, did participants actually use what they learned in
the classroom back on the job?
• What results, if any, were produced? Did behavior change, and did the
change produce the result(s) anticipated?
• Were the benefits, both quantitative and qualitative, worth the time, effort,
and resources?
• If you had the chance to do it again, but differently, what would you do and
why would it work?
Answers to these and other evaluatory questions come from many different
sources. Participants in the training session or program provide written and
verbal feedback using preprinted questionnaires or small group interviews and
discussions. As trainers, we form impressions through observation. At times,
we generate quantifiable data measuring the amount of learning and the results
produced when concepts and techniques are applied. Yet, regardless of
methodology, we need to be aware of the tremendous influence culture plays on
what is written, spoken, and observed, and more important, the meaning we
place on all this information.
Ignoring or discounting cultural influences on evaluation data almost

certainly produces misleading information, which, in turn, leads to incorrect
judgments and conclusions, and perhaps worse, inappropriate responses. What
happens when we ignore the potential for cultural bias? What are the possible
results of simply overlaying one set of cultural assumptions regarding what
constitutes good training over different cultural assumptions of our training
participants? Consider some of the following real-life examples.
You have just completed a three-day workshop in Singapore on conflict

resolution. You've delivered this workshop hundreds of times back in the
States. Unfortunately, jet lag, time change, different weather, and diet have all
taken their toll on your performance. While most of the workshop objectives
were met, many of the exercises designed to bring out individual styles of
dealing with conflict didn't work well. Worse yet, the one-on-one mediation
role-plays were a disaster!
In spite of all of this, scores on the reaction evaluations were the highest
you've ever seen. Average ratings on the conflict resolution styles exercise and
role-plays were both Very Good. In fact, there was not a single score below
Good. Surprisingly, participants rated your performance as an instructor as
Excellent, and many wrote they couldn't wait for your next workshop! What's
going on here?
Perhaps the answer lies in a single word-culture. Consider for a moment

the audience (Chinese) and the influence their culture might have on the way
they would complete a traditional western-style reaction evaluation
questionnaire. In our example, one likely explanation for the surprising ratings
might relate to cultural issues of
• Hierarchy, authority, or rank

• Face
In many Asian cultures, the influence of a Confucian tradition translates
into a heightened sensitivity to issues of hierarchy and respect paid to rank or
positions of authority. Employer/employee, parent/child, husband/wife,
senior/junior, and a myriad of other relationships all carry with them certain
expectations as to what constitutes proper behavior. In contrast, while a sense
of hierarchy also exists in the United States, its influence is curbed by an even
stronger emphasis on informality and individualism.
Contrary to popular belief, the notion of "face" is not a foreign concept to

most Americans. In fact, in the United States and a number of western cultures,
the way we see ourselves and are viewed by others is often referred to as self-
esteem. Yet, the importance we place on self-esteem pales in comparison with
value of preserving face in the Chinese, Spanish, Italian, and Middle Eastern
cultures. Here, the value of avoiding conflict and maintaining good
relationships cannot be overstated.
The issues of hierarchy and face come together in a dramatic fashion in an

instructor/student relationship. In a Chinese culture, for example, an instructor
would have more status because of his or her knowledge and position in the
hierarchy. This would be especially true if the instructor is older than the
participants. The instructor's position provides instant credibility and stature,
which would be difficult to question and painful to criticize. In fact, it would be
considered extremely disrespectful and cause considerable loss of face for any
student or participant in a training program to draw attention to the faults or
mistakes made by the instructor. The point is that for a culture that is not used
to drawing attention to mistakes, a less than Good rating on an evaluation

form, even when completed anonymously, may be considered an embarrassing
criticism that would cause the instructor and participant to lose face.
Impressions are also formed through observation. Here, again, culture can
play an important role in the way we, as trainers, interpret visual cues that help
us form judgments and draw conclusions about others' behavior. How can
misinterpreted cues in a multicultural setting lead to incorrect conclusions?
Let's look at another example.
You have incorporated a role-playing exercise into a training program

you're facilitating in Thailand on conducting performance appraisals. Nothing
new. Something you've done a hundred times before. In this exercise, you're
looking for a number of specific behaviors including how the manager delivers
praise and criticism to a direct report, as well as nonverbal communication
cues, such as proper eye contact, body language, gestures, and the like.
While no one likes to role-play, getting two participants to volunteer for the
exercise was nearly impossible. After trying every technique in the book, you
finally had to pick two people to play each role while others were to observe
their performance and provide feedback. Once the "volunteers" made it to the
front of the room, the role-play started off slowly and went downhill from there.
Delivering praise went fairly well, although you were surprised to hear the
"employee" respond with denial. Providing constructive criticism changed
little. Speech patterns were stiff and formal, and very little useful information
on specific performance behavior was discussed. After the exercise, class
feedback on the performance was amazingly supportive, even complimentary.
Clearly, most of the major points you made during class about how to conduct a
proper performance appraisal were missed, right?
Perhaps, but another explanation may have to do with the cultural dynamics
at play. In this exercise, a number of cultural influences are evident in both the
process as well as content. In addition to issues of hierarchy and face, there are
examples of
• Collectivism
• Self-efficacy
• Indirectness
• Risk aversion
In terms of cultural influences on the process (role-playing), consider for a

moment that in Thailand, as in most Asian cultures, India, and Saudi Arabia, a
great deal of value is placed on the group and the good of the whole
(collectivism). In fact, a person often defines self based on his or her
relationship to the group. Compare this with the American view that places
tremendous value on the rights, opinions, and values of an individual. Evidence
of this cultural dynamic at play may have been the general unwillingness of the
participants to take the initiative and volunteer for the role-playing exercise,
especially if it meant drawing attention to themselves away from the group.
The very nature of the exercise (role-playing) as well as its content

(performance appraisal) carry issues of face. We know that practice (role-play)
is a cornerstone for developing skills. At the same time, however, anytime you
ask someone to role-play, you invite criticism and possible embarrassment.
Therefore, singling out two participants, asking them to perform in front of the
group, and requesting their performance be evaluated by the rest of the class are
all in stark contrast with the desire to avoid embarrassment, protect others'
feelings, and avoid calling attention to mistakes. Even the participants' overall
performance may have reflected a deep-seated belief in maintaining harmony
and avoiding conflict (risk aversion).
Cultural influences may also have been at work in terms of the content. In
many cultures, formal performance reviews are not common. In our example,
when dealing with praise, a strong tradition of humility often translates itself
into self denial-a self-effacing behavior. On the other hand, when faced with
criticism, even when done constructively, issues of face arise again, and at least
the potential exists for causing permanent damage to the boss/subordinate
relationship by pointing out embarrassing mistakes. Finally, concern over not
offending anyone is often demonstrated by an indirect, almost ambiguous
communication style.
Imagine the frustration and tremendous potential for misinterpretation when

this role-playing exercise is observed by a trainer whose cultural orientation
values a direct, almost confrontational tell-it-the-way-it-is approach.
This doesn't mean that role-playing exercises and one-on-one performance

appraisals aren't possible in different cultures. What may be required, however,
is a realization that if our objective as trainers is to change behavior, then our
instructional design, training methods, and evaluation techniques must take
into account our audience and the set of attitudes, values, beliefs, and behaviors
they bring to the learning experience.
Consider how a culture that values indirectness (e.g., Chinese, Japanese,

Spanish, Saudi Arabia) might influence an attempt by an American trainer,
with his or her cultural values of directness, to evaluate how much a class
learned by trying to generate an end-of-program discussion of key points. In
this context, questions designed to stimulate a lively discussion may be greeted
by deafening silence. However, unlike in the United States, silence does not
necessarily mean ignorance. The truth may be that each person knows the
answer but is thinking about how to an.swer the question.
Using closed-end questions that generate simple yes-or-no responses may

produce even worse results. Individuals from cultures that value indirect
communication are extremely reluctant to say no. In fact, issues of face may
require a yes response-not meaning we agree, but simply we hear you.
While these examples of international training experiences illustrate the

potential for cultural bias, we don't need to leave our own backyard to
encounter similar experiences. Regionally, something as deceptively simple as
the rate of speech may be culturally interpreted to imply high or low
intelligence depending upon the cultural value and lens being worn. A trainer
from the northeast delivering a program in the south brings with him or her the
cultural value of fast-paced verbal delivery to a region valuing the slow
savoring of words, with time to think between sentences. Combine the different
cultural values in a classroom setting and you have the ingredients for an
interesting experience in cultural bias. For example, the instructor may
incorrectly draw the conclusion that the comparative slowness of the language
implies less intelligence.
Examples of domestic cultural differences can even be found in the various

functions within an organization or different industries. These differences can
be in every way as profound and perplexing as those found among people from
different national and ethnic backgrounds.
Often, individuals working in different functions or industries can be viewed

as literally speaking different languages. Words, imagery, and values are very
often different enough to create yet another opportunity for cultural bias. Take,
for example, a trainer, accustomed to working with retail sales people, trying to
interact with actuaries working in the financial services industry. There is a
good chance that the trainer will find a radically different vocabulary and set of
values, and therefore very different cultural requirements for delivering good
training.
The high-tech industry, known for its speed and innovation, would provide a
stark contrast to many insurance companies where speed and innovation may
not carry the same level of importance. Developing training that would be
effective in these environments, without taking into account the distinct
cultural differences each might require, would generate questionable results.
Other ways of viewing cultural implications in training from a domestic

perspective would include cultural perspectiveslbiases resulting from inherent
differences in religion, age, gender, sexual orientation, disability, and race.
These differences may also blend into other cultural considerations, and are
becoming more and more a topic of our changing workforce. The language,
pace, use of case studies, examples, and methodology applied even in the best
of training efforts may not be effective, appropriate, or valued by a participant
who happens to be culturally different in any number of ways.
From almost any cultural perspective, the training experience should be a

dynamic and multifaceted process. To ensure an experience rich in relevant
learning and insight, particularly for the adult learner, we have learned that a
variety of approaches are needed. Role playing exercises and various activities
infuse the experience with a range of learning opportunities. Training delivery
and evaluation should be deliberately designed to achieve a desired and positive
outcome. Our selection of exercises, tools, activities, timing, and even how we
carry out our roles as trainers and facilitators may demonstrate, even
unconsciously, a strong bias for our particular culture.
Given this reality, the dramatically growing diversity and international

scope of our organizations, and the increased emphasis on delivering effective
training-training that produces results-there are a multitude of compelling
reasons for taking into account the impact of culture on our delivery
assumptions and the methodology we choose for evaluating our training efforts.
Make no mistake about it; cultural differences can dramatically influence our
judgment as HRD practitioners and the conclusions we draw even from a
seemingly straightforward attempt to evaluate the training we deliver.
Considerations for Optimizing Effectiveness in the

Evaluation of Training Across Cultures
As business and industry become more global, so will training and its
evaluation. In this context, cultural issues, which have always been present but
largely ignored, now take on much more meaning as their influence on training
design, delivery, and evaluation become more transparent and understood.
While there may be a number of behaviors that are similar across different
cultures, there are also an equal number of behaviors that are unique to a
specific culture. This mosaic of similarities and differences carries with it
tremendous opportunities, challenges, and implications for training and its
evaluation.
As you might expect when dealing with such a complex issue as culture and
its impact on behavior in training, there is no one simple answer. Yet, in
practical terms, there are a number of actions you can take in response. What
we offer is a menu of ideas and actions compiled from our own personal HRD
experiences and those of our colleagues in dealing with this marvelously
complex and rich tapestry of multiculturalism in training and development.
One fundamental yet critical step in the process is acknowledging the fact
that important cultural differences do exist, differences that can affect the
effectiveness of your training and evaluation efforts. Equally important is
recognizing that your personal cultural biases influence your instructional
design and evaluation.
These two steps begin the process of integrating a cross-cultural perspective

into your up-front analysis and preparation. At this point in your instructional
design process, take the time needed to find out the cultural mix of the group
you will be working with, and incorporate differences in attitudes and values
into your training design and evaluation processes and tools. While some may
think of this as an extra step in an already laborious design process, it is a step
that is as important, perhaps even more so, as discovering a group's particular

learning styles and preferences.
Once you have reached this point, it is important to work closely with
individuals who share the same cultural values and attitudes as those of your
target audience. At the very least, you should consider working with a cultural
expert or coach. Foreign university students taking English as a second
language are often an excellent and cost-effective source for finding such
experts and coaches. The purpose in doing this is to have someone review and
suggest modifications to your evaluation approach and tools-someone who
wears the same cultural lenses as the group you intend to work with. Not only
does this improve the chances that the intended meaning of words found in
your written questionnaire(s) and tools are carried over, but it also increases the
likelihood that you will actually measure what you want to measure. What
comes out the other end of all this activity is a more culturally appropriate and,
consequently, more effective training and evaluation process.
Keep in mind, however, that translating written evaluation tools is not

enough. Certain words, phrases and cliches used in American English, such as
brainstorm, the bottom line, or up in the air, do not carry the same meaning in
other languages. This is especially true when dealing with words that carry
subjective qualities. Could it be possible that words such as excellent or very
good might not carry the same meaning in other cultural contexts? Probably.
The point is that it may be necessary to define those terms to increase the odds
that the words used in your evaluation tools carry the same sense and meaning
you intend. Above all, it is important that you always keep the objectives of
conducting an evaluation in front of you. In trying to modify your evaluation
tools for cultural differences, don't lose sight of the very purpose of doing it in
the first place.
Once you find yourself implementing your instructional design in a

multicultural setting, there are a number of actions you can take that will
improve the effectiveness of your efforts. For example, when faced with the
challenge of dealing with issues such as group orientation (collectivism), face,
and indirectness, why not take advantage of the situation by incorporating study
teams and other group activities into your delivery and evaluation process?
Always try to assign tasks to teams rather than individuals, remembering that
in group-oriented cultures, value comes from the opinions of the group over an
individual's opinions and values. At the same time, keep in mind that
providing feedback on performance is directly related to issues of face. The

more you can use the group to provide feedback, both positive and negative, the
better. Divide participants into small groups. Ask each group to elect a
spokesperson who will share with you his or her group's collective thoughts
and opinions on what went well and not so well. The point is that it is often
better to ask for feedback in a more indirect manner.
As accomplished HRD practitioners, we all pride ourselves on incorporating

high levels of participation and activity in all of our training efforts. Yet, in
multicultural settings it is important to keep in mind that any time you provide
verbal instructions (e.g., completing an exercise or reaction evaluation
questionnaire), you should not assume that what you meant to say was actually
understood. This is especially true when working with participants for whom
English is a second language. Even when it appears that the level of spoken
English is pretty good, it is not always wise to operate under the same
assumption when it comes to comprehension. Although at times it may feel
awkward, it is often necessary to slow the rate of your own speech, and be
certain to ask questions that test for understanding.
When the time comes for evaluating the results of your training efforts, you
can take a number of simple yet effective actions to improve the quality of the
information you receive, especially when using written instruments. Beyond the
more obvious need(s) for translation and using words and phrases that carry the
intended meaning, using reaction evaluation questionnaires to determine if the
program met its objectives, or paper-and-pencil examinations to measure the
amount of learning that took place can present special challenges. For example,
there are situations when using paper-and-pencil examinations to test for
knowledge may not be well received in some cultures that would consider such
an exercise childish and, therefore, insulting to them as adults.
Yet, even in multicultural settings, written evaluation instruments can play

an important role in the evaluation process as long as your verbal and written
instructions are clear, you give participants plenty of time to complete their
work, and ensure individual anonymity-remember face. One way to do this is
to ask participants to code their work by recording their identification number
on the top of the questionnaire or exam. This allows you to collect the
instruments, record the data, and report group, not individual, scores. Not only
does this save face, it also improves the reliability of your feedback.
When evaluating role-playing exercises in a multicultural setting, it is

important to be much more sensitive to issues of face and self-esteem. Rather
than asking What went wrong?, consider rephrasing the question along the
lines of What went right, and why? or If I could repeat the role-playing
exercise, what would I do differently, and why?
Once you have evaluation data in hand, you need to be sensitive to your own
attitudes, values, and beliefs and how they influence your interpretation of both
written information and observed behavior. This is especially important when
using your own norms for making judgments of others' behavior. An
emotional, warm, vocal approach may be entirely acceptable in the United
States, but completely inappropriate in a culture that places a premium on
controlling emotions.
When it comes to making judgments based on observed behavior, accept the

fact that there may be no right or wrong behavior when evaluating training
effectiveness across cultures. At the very least, be aware that your concepts of
correct behavior may, in fact, be entirely different from those shared by your
participants, and take this into account when drawing conclusions. If you find
yourself delivering training in an especially complicated multicultural setting,
one where you are dealing with participants from a number of different cultural
backgrounds, it may even be wise to reserve evaluation until you have had an
opportunity to learn more about those cultures and how various attitudes and
values interact. In situations like these it's always important to be open and
willing to discover the wonderful nuances of culture, not only the cultures of
others, but even your own.
If there is only one idea you walk away with after reading this chapter, we
hope it is this. When working in a multicultural training environment, above
all else, be open to a different way of being. It may help to keep in mind that
the goal of evaluation in a multicultural context is to find a method(s) that is
acceptable to the predominant culture of the participants you are working with
and provides the answers you need to determine effectiveness.
Culture is not about being right or wrong. It is about values, norms, a

learned way of viewing and making sense of the world. As HRD professionals
operating in increasing multicultural learning environments, flexibility and
adaptability are key to our effectiveness. It is not a question of extremes but of
the acknowledgment of potentially different points of view. It is looking at the
same issue differently and enjoying the amazing insights different perspectives
can provide. It is also about being yourself but increasing your awareness that
there are differences, which can be positive when recognized as opportunities to
learn, not just inform.
Personal Communication.
References
Cox, T. (1993). Cultural diversity in organizations: Theory. research and practice. San
Francisco: Berett-Koehler Publishers.
Brislin, R. (1993). Understanding culture's influence on behavior. Philadelphia:

Harcourt Brace College Publishers.
Deal, T., Kennedy, E., & Allan, A. (1982). Corporate cultures: The rites and rituals of
cO!:porate life. Reading, MA: Addison-Wesley.
Jackson, S. E. (1992). Diversity in the workplace: Human resource initiatives. New

York: The Guiford Press.
Barrett-Koehler.
Matsumoto, D. (1994). Cultural influences on research methods and statistics. Pacific

Grove, CA: Brooks/Cole Publishing Company.
About the Authors
Sadie Burton-Goss is a writer, speaker, trainer, and management consultant.

For almost twenty years now she has worked with organizations in the areas of
change management, organization development, human resource management,
diversity, leadership and planning. Her experiences as an internal consultant
and manager in five different industries, and as an external consultant to over
fifty different organizations inform her work and perspective in both broad and
specific ways. An urgency for the development of organizations as dynamically
inclusive, high-performing systems, maximizing total human potential is a core
theme throughout her work. She is a popular presenter on the national
conference circuit with topics Training Managers to Manage Diversity, The
Affirmative Action/Diversity Debate, and The Difference You Make. Her
writings include articles and chapters in human resource development and
training publications, such as Managing Diversity Is Simply Good Business,
The Elements of Competence, and Tools and Activities for a Diverse Workforce
for The American Society for Training and Development, and published by
McGraw-Hill. Sadie received her B.A. from Wellesley College and a Master's
degree in Management from Lesley College School of Management.
Michael D. Kaska is an HRD practitioner, performance consultant, and

manager, with over fifteen years of experience in the field of adult education
and training in both the United States and Southeast Asia. As General Director
of Marketing and Training for a large, international insurer, Mike has spent the
last eight years working on a myriad of strategic and functional HRD issues in
China, Vietnam, Thailand, the Philippines, Malaysia, Singapore, and
Indonesia. The ability to transform knowledge into practical skills as one of the
six key elements in performance improvement has been a major theme in the
dozens of HRD curricula Mike has designed and hundreds of workshops and
seminars he has delivered throughout the Pacific Rim. He earned his
undergraduate degree in education from Aquinas College and is completing
requirements for his masters degree in Training and Development from Lesley
College.
13 IMPACT OF TECHNOLOGY ON
TRAINING EVALUATION
Hallie Ephron Touger
Introduction
Technology is changing how training is delivered and, in turn, changing how it

is evaluated.
As work itself becomes increasingly computer-mediated, the training and

information that support the work are being digitized and delivered
electronically. More and more of today's workers have desktop pes,
workstations, terminals, or television devices that connect to a network.
Together, these provide an infrastructure for delivering technology-based
training. With this infrastructure, training occurs more and more often on the
job. As such, it can become more aligned with and embedded in the work itself.
As training and job performance merge into a single continuum, the logical
consequence is that the way we think about and perform training evaluation
changes as well.
This chapter discusses the impact of technology on evaluation. First the

chapter provides a summary of how technology is changing the way training is
delivered; then it examines the reasons why businesses embrace the new
technologies; and finally, it discusses how technology affects the way
businesses think about and perform the evaluation of training.
280 Impact of Tecbnology on Training Evaluation
How Technology Is Changing Training Delivery
By far the lion's sbare of training in corporations continues to be instructor-

delivered. Tbe results of a 1996 survey of 146 Fortune 1000 companies sbown
in Table 13-1 illustrate this point. Tbe survey found that 68 percent of all
training hours were delivered by instructors; 16 percent were delivered by
multimedia; and 16 percent were self-paced, video, manual, or on-the-job
(Multimedia Training, 1996).
Table 13-1. Training delivery methods in corporations based on 1996 survey

Method ofTraininx Delivery Percent
Ins tructors 68
Multimedia 16
Self-paced, video, manual, or on-the job mentoring 16
In a 1996 industry survey, Training magazine reported that fewer than balf
of the companies surveyed-only 37 percent-report that they use computer-
based training (Industry Report, 1996).
Tecbnology is used more often to deliver tecbnology-based content in large

companies, and more often in companies that create technology-based products
and services. A 1996 industry report found that in organizations with more
than 10,000 employees, an average of 25 percent of IT courses are delivered
without buman instructors (Industry Report, 1996). Some computer companies
bave made an even larger sbift towards technology-driven training. For
example, in three years, Apple Computer replaced 85 percent of its classroom
training with just-in-time multimedia learning platforms (Caudron, 1996).
Multimedia Computer-based Training and CD-ROMs
Learning tecbnologies that can deliver multimedia, interactive training on

desktop personal computers bave been available for more than a decade. Until
recently, authoring tools bave been the primary tools for creating this training.
Course developers need programming skills to effectively use these
sopbisticated, complex tools.
Computer-based training software takes longer to develop than slides and

text-based handouts. Once created, most computer-based training is
cumbersome to modify. And once modified, additional effort is required to
update outdated versions. These drawbacks have made computer-based training
slow to catch on and become used widely.
CD-ROM has been considered the ultimate computer-based, multimedia

training delivery medium. But CD-ROM has its disadvantages. CD-ROM is not
very effective in giving employees information that has to be updated. CD-
ROMs must be developed, pressed, and distributed before employees can take
advantage of them. And the user must have access to a desktop machine with
the appropriate operating system and a CD drive. While CD drives have
become standard in new PCs, the home user often has a better computing
infrastructure than most companies provide for their employees. This occurs
because cost-conscious businesses try to wring the last benefits out of aging
computing infrastructures. They are wary of replacing desktop systems with
new ones that will themselves soon become obsolete.
Electronic Performance Support
In parallel with multimedia, computer-based training, electronic performance

support systems (EPSS) have become popular (Gery, 1991). These are computer
applications that provide on-demand, task-specific information and skills
training. A successful EPSS puts the right tools and information at the
fingertips of workers when they need them.
Some electronic performance support systems are built into a company's

proprietary software and are indistinguishable from the user interface. Others
provide online documentation or context-sensitive help. Others are
freestanding, searchable, hypertext resources.
EPSS has been far less pervasive than multimedia, computer-based training.
A 1996 report found 15 percent of organizations larger than 10,000 employees,
and 5 percent of smaller organizations, had EPS systems (Industry report,
1996,79).
282 Impact of Technology on Training Evaluation
Network Technology Merges Interactive Multimedia and

EPSS
The same networks that provide the backbone for modem business computing,
allowing companies to create clientJserver business applications, are being
leveraged to provide training and information. The Internet and private
corporate networks or intranets provide a medium for training anywhere from
10 to 10 million people.
Network technology provides the tools for merging interactive multimedia

training and electronic performance support systems while simplifying the
authoring skills required and delivering easily updatable information from a
single source across virtually all desktop platforms.
The requisites for these technologies are desktop computers or terminals and
a computer network. In 1996, more than 60 percent of all business computers
attach to local networks (Edwards, 1996).
Widely available and virtually free, Web browsers, such as Netscape and
Internet Explorer, provide a universal desktop interface. The hypertext mark-up
language, HTML, provides the tool for quickly and easily creating corporate
information and training that can be viewed on any platform. Developers no
longer have to worry about what kind of computer each of the trainees is using,
how much RAM they have installed, whether or not they have a CD-ROM
drive, or what version of the operating system they are using.
Initial training that used this technology was simply text and graphics with
embedded hyperlinks. But, with the advent of audio and video streaming-
technologies that play multimedia as it is being downloaded-and the
development of tools for creating animations and interactivity, the basic
elements for building effective, multimedia training are available. However, the
capability to deliver high-quality multimedia over a network is still in its
infancy. Bandwidth continues to be the major stumbling block, even on
corporate intranets with dedicated higher speed lines. Planners may have to
abandon an instruction ally sound multimedia design because large graphics
and photo images, video, and audio take too long to download.
The so-called "hybrid CD" offers a solution. A hybrid CD is a CD-ROM that

integrates online components. For example, the CD-ROM might deliver
bandwidth-intensive multimedia components of training along with a user

interface that integrates the multimedia components with links to related Web
sites, Web-based databases, and chat groups. This hybrid solution has
enormous possibilities for training delivery.
Vendors have updated authoring tools as demand for just-in-time training

calls for networked delivery systems. A survey, conducted in 1996 by CBT
Solutions of Hingham, Massachusetts, and described in New Media (Violette,
1996), reported that 60 percent of 1,024 CBT providers said they deliver or
plan to deliver training via Internet, satellite links, LANs, WANs, or other
network connection. The percentage is even higher for companies with 5,000 or
more employees.
The use of network infrastructure to deliver training is growing. While in

1995, only 9 percent of companies reported they were delivering distance
learning by computer, a year later 33 percent of companies with over 10,000
employees, and 22 percent of smaller ones, reported using the network to
deliver training (Industry Report, 1995,62; Industry Report, 1996, 77).
Delivering and Distributing Training with the Network
With these new technologies, the network can become the distribution
mechanism, and the browser the delivery mechanism for training and
penormancesupport.
Network technology allows companies to eliminate the costs of

manufacturing and distributing training. Creating training for distribution on a
network does not require training authors to learn complex authoring software.
Most word processing software can output HTML as a standard format and
include the capability to linking in standard audio, video, and animations.
Thus, virtually anyone with access to the Internet and basic word processing
tools can create Web-based training and information and distribute it. Learners
become not only consumers of information and training but creators and
publishers as well.
This shift puts the trainee rather than an instructor in the driver's seat. The
trainee picks and chooses training and information modules as needed. He or
she selects and constructs training and information to meet perceived needs. It
also becomes possible to treat training modules as data objects and store them
in a network-accessible database. Then, based on the results of a pretest or
interest inventory, for example, automatically deliver appropriate training
modules to the learner.
In addition, network technology brings a new dimension to technology-

based training: interactive communication. Video teleconferencing, e-mail,
threaded discussions, and chat can be combined to create Web sites that are
virtual classrooms. These technologies enable communication among trainees,
instructors, and subject-matter experts. Threaded discussions and e-mail
provide ways of asynchronous communication. This is especially useful when
trainees are distributed globally in different time zones. Chat and video
teleconferencing add the interactivity of synchronous communication in real
time.
Security remains an issue for distributing Web-based training. Managers

often want to track training usage and administer tests. Confidentiality requires
that usage records, tests, and test results be maintained in a secure
environment. At this writing, even within a secure corporate Intranet, plug-
and-play tools for creating a solid security infrastructure are not available.
Why Companies Embrace the New Technology
Companies recognize that job requirements are rapidly changing. Furthermore,

they believe that classroom training is expensive, ineffective, and takes people
away from their jobs. Faced with competitive pressures, these companies are
looking for ways to cut costs and gain a competitive advantage. As product life
cycles have been getting shorter, more frequent training must occur just to keep
up, while less and less time is available for training. They are looking for
training that is more effective, more efficient, and less costly. Technology-
driven training is seen as providing precisely these advantages.
Effectiveness
Several studies demonstrate that traditional training does not result in adequate
behavior change (Tannenbaum and YukI, 1992; Baldwin and Ford, 1988;
Broad and Newstrom, 1992). Only about 10 to 20 percent of trainees who take
traditional corporate training report actually using the training in their jobs.
In their summary of the benefits of multimedia for adult learning, Ragan

and Smith (1992) found that compared to conventional forms of instructional
delivery, multimedia is as effective and often more effective. Greater
effectiveness seems to result from higher levels of interactivity, along with
some degree of structure and advice to guide the learning.
Brinkerhoff and Gill (1994) suggest that traditional training is flawed

because too many trainees end up taking training they don't need or receive
training too early or too late to be useful. When training is available on demand
from the desktop, people who need it can dip into it whenever they feel they
need to rather than waiting for a scheduled class. This explains why companies
cite learner control as the major advantage of multimedia-based training
(Multimedia Training, 1996).
Efficiency
Well-designed multimedia computer-based training appears to be more efficient

than traditional training. Ragan and Smith (1992) found that compared to
conventional forms of instructional delivery, multimedia is approximately 30
percent more efficient than conventional forms of instructional delivery. Add to
this the elimination of travel time, time off the job, and time spent in unneeded
training, and it is not surprising that companies in the business of developing
computer-based training argue that the compression rate of instructor-led to
computer-based training is as high as three to one, with three hours of
classroom instruction compressing to one hour of computer-based training
(Wolff, 1995).
Companies who have shifted from instructor-delivered training to

multimedia, computer-based training testify to efficiency gains. For example,
when Andersen Consulting replaced an instructor-led six-week basic business
practices course for new employees with multimedia CD-ROM training, they
reported nearly a 40 percent drop in the time it took for their employees to learn
basic business practices (Caudron, 1996).
Cost to Create
Generally speaking, technology-driven training is more expensive to create

than traditional instructor-led training. Cost savings associated with using
technology accumulate over time in reuse.
A simple example demonstrates this. Suppose that 24 hours of Web-based

training costs $150,000 to create. Forty hours of instructor-led training would
be needed to cover comparable content, at a cost of $50,000. For purposes of
this example, assume that a technical infrastructure-network, desktop
systems, servers, and browsers-is already in place. Also, assume a class size of
twenty and that half of the trainees need to travel to training.
Initially, Web-based training appears to be far more costly. However, every

time the instructor-led training is delivered, it costs an additional $13,000, as
shown in Table 13-2.
Table 13-2. Cost comparison: Instructor-led training versus Web-based training
Costs-Instructor- Costs-Web-
Led Training Based Training
Course Development $ 50,000 $150,000
Travel & Expenses Per Off-site Trainee $ 1,000 0
Traininf? Materials per Trainee $ 50 0
Instructor Per Session $ 2,000 0
Total Cost to Train: 20 $ 63,000 $150,000
40 $ 76,000
60 $ 99,000
... ...
140 $ 151,000
500 $ 375,000
1000 $ 700,000 $150,000
As the number of trainees who require the same training increases, Web-
based training becomes more cost-effective. In the example, instructor-led
training is far more cost-effective for fewer than 100 trainees; the break-even
point, where instructor-led and Web-based training cost the same, is at 140
trainees. Dramatic cost savings can be achieved for 500 or 1000 trainees with
Web-based training.
This example is intended to demonstrate that while technology-driven

training is more expensive to create in the ftrst place, it can be far more cost-
effective if it is delivered to a large, distributed population. Each opportunity
for training should be assessed on a case-by-case basis to determine the more
cost-effective method. Additional cost factors such as time away from the job
for participants, system maintenance, hardware and software upgrades, content
updates, and so on, should be included.
The example also demonstrates that greater risk is associated with the
development of technology-driven training. This is because it requires a
greater, up-front commitment of corporate resources. Abrupt changes of
directions, not uncommon in corporate management, can render just-completed
training instantly obsolete.
How Technology Affects Training Evaluation
Training evaluation is not a single activity. It is a series of checks, from design

through deployment, that measure whether training effectively meets its stated
goals and results in behavior that promotes the business goals it supports.
The focus of training evaluation shifts. Evaluation during planning focuses

on business and training needs. Evaluation during development focuses on the
training itself. During deployment, focus shifts ftrst to evaluating the trainees
and then to evaluating progress towards realizing business outcomes.
Figure 13-1 shows the typical kinds of activities that need to occur to
evaluate technOlogy-driven training. Notice that the entire process, from
planning to deployment, is bracketed by business objectives and measurable
business outcomes. Training objectives and goals should devolve from business
objectives; evaluating the success of training requires assessing the business
outcomes that were linked to those business objectives.
Evaluation During Planning
There are four evaluation activities needed during planning for technology-
driven training: business case definition, needs assessment, requirements
definition, and cost justification.
First, working with management, planners build a business case for the
training that articulates the measurable business benefits that will result. Then,
planners perform a needs assessment. So far, these activities, are no different
from those that occur when planning traditional training.
Define business objectives

Planning
• Business case definition
• Needs assessment
• Requirements definition
• Cost justification
Development
• Prototype testing
• Pilot testing
Deployment
• Usage tracking
• User feedback
.
• Testing, where appropriate
• Business measures assessment
Measure business outcomes
Figure 13-1. Evaluation activities for technology-driven training
Requirements definition and cost justification are gatekeeping evaluation

activities for technology-driven training. Their results determine whether or not
a technology-driven solution is feasible and cost effective.
Business Case
Training planners work with management to build a business case for training.
Planning begins with the recognition that, in order to meet specific business
objectives, training is required. A business case answers three key questions:
1. What will the training do?

2. What makes the training strategic?
3. What is the payoff in terms of measurable business outcomes?
Payoffs are defined as specific outcomes that result in benefits to the
business, such as increased profits, decreased costs, increased customer
satisfaction, or reduced errors. A payoff is stated in objective, measurable terms
so that business results that the training is designed to achieve can be measured
during deployment.
Needs Assessment
Needs assessment includes activities such as audience analysis, content

analysis, and task analysis. Together with the measurable business results
targeted by the business case, the information gathered during needs
assessment establishes the basis for defining training goals and objectives.
Requirements Definition
Requirements definition provides information to determine whether a

technology-driven solution is feasible. During requirements definition, training
planners identify the required technical infrastructure. Table 13-3 provides an
example of the requirements that need to be defined for Web-based training.
Once requirements are defined, technical support personnel must be

consulted to determine whether the existing technical infrastructure can meet
those requirements, and if not, what changes are needed. Experts in developing
technology-based training must also be consulted to determine the feasibility
and cost associated with developing training that meets those requirements.
This often becomes an iterative process. For example, when planners discover
most of the trainees will connect to training by modem, functionality might be

limited to less bandwidth-intensive media.
Table 13-3. Example of requirements to be defined for Web-based training

Requirement Examples
Functionality Define what the training software will need to do:
• Present text, graphics, multimedia?
• Access local or remote database?
• Provide a simulation?
• Track usage?
• Test?
• Connect to e-mail?
• Provide a threaded discussion?
• Provide video conferencing?
Trainees Determine who will need to be trained:
• Number of trainees?
• Location of trainees?
• Growth over time?
• Turnover rate?
Access methods Determine how trainees will access the training:
• Modem?
• LANIWAN?
• Dedicated line?
Usage Anticipate expected usage:
• Peak and off-peak times?
• Number of simultaneous users at peak and off-peak times?
• Most frequently accessed portions?
Performance Determine the minimum acceptable response time:
• Time to log into the training?
• Time to access lessons?
• Time to download media?
Security Define the level of security needed:
• Password authorization?
• Restricted access to functionality and data for specific
user groups?
• Encryption?
Management Anticipate ongoing management needed during deployment:
• Version control?
• Network and system support?
• Frequency of software updates?
Cost Justification
A final evaluation activity during planning is cost justification. Planners

determine whether a technology-based solution is cost-effective. To do this,
they build and compare budgets for creating and deploying training that is
technology-based versus traditional. Planners can create a table similar to Table
13-2 to compare costs.
Evaluation Activities During Development
Evaluation of the training during development is critical for technology-based

training. While an instructor can modify ineffective instruction on the fly, once
deployed, technology-based training must stand on its own.
Technology-based training should be evaluated at least twice during

development. Early in development, a working prototype should be tried out
with members of the target audience. Before deployment, the training should be
rolled out to a subset of the target audience for piloting.
Prototype Testing
Prototype testing focuses on two areas: usability and instructional effectiveness.

Prototype testing is an in-depth process conducted with a representative
sampling of the intended audience using a representative sampling of the
training.
Without interfering or assisting, an evaluator observes trainees, one on one,

going through a prototype. The evaluator notes the trainees' reactions and any
problems encountered and conducts a follow-up interview and skills
assessment.
Here are some aspects of usability that evaluators observe:
• Ease of access: Is it easy to get into the training? Is it easy to exit? Is it

easy to get back to a specific spot upon reentry?
• Ease of navigation: Do trainees know where they are in the training at

all times? Is it easy to move around within the training? Is it easy to find
specific information?
• Response time: Does the user find the response time acceptable?
Evaluators interview trainees to get their subjective response to the training.
Here are some questions an interviewer might ask:
• Did the lesson hold your interest?

• Was the content appropriate in terms of detail and difficulty?
• Was the presentation clear?
Finally, evaluators assess the effectiveness of the training. Trainees can be
asked to explain or answer questions about the content covered in the prototype.
Better yet, trainees can be asked to perform the skill or skills that were the
focus of the prototype training.
The results of prototype testing are used to modify the training design. For
example, a confusing user interface may need to be refined. Additional practice
exercises may need to be added. Lessons might need to be shortened to match
user attention span.
Because prototype testing occurs early in development, changes can be

implemented without affecting budget or delivery date.
Pilot Testing
Pilot testing is done before deployment. The kinds of major changes that were
possible after prototype testing are no longer feasible. Thus pilot testing-a
preliminary roll out of the training to a subset of the target audience-focuses
on ensuring that the training will function smoothly once it is deployed.
Pilot testing requires a method for testers to report problems and a method
for the problems to be logged and corrected. Courseware may need to be revised
and released several times during piloting before it is ready to be deployed.
Evaluation Activities During Deployment
During deployment, the focus of evaluation shifts from training to trainee, and
from trainee to business outcomes. New concerns: Are people who need
training accessing it? Are trainees applying the skills they learn? Is the
organization making progress towards meeting the specific business objectives
the training was designed to support?
Technology makes it easier to measure some aspects of trainee performance

such as attitude, retention of stated leaming objectives, and training usage.
Attitude surveys and tests can be administered electronically. Usage tracking
capabilities can be built in. During deployment, the data gathered can be used
to monitor training usage.
The challenge for management becomes what to do with the vast amounts of
data that potentially can be collected. If training usage drops off, it may be
because potential users perceive it as useless. Or it may be because it is not
currently needed. Poor survey ratings may indicate that training is ineffective.
On the other hand, poor survey ratings would also result if training is being
administered to the wrong audience. Simply tracking these kinds of training
outcomes does not, by itself, provide training evaluation.
Surveys, tests, and usage reports have other drawbacks as well. First, they
break down when there is no longer a discrete course or curriculum that gets
covered. And second, they fail to measure skills transfer or impact on the
organization.
Content available to trainees expands as network technology makes it easier

for workers and managers to create and publish their own content, to subscribe
to any content accessible to their network browser, and extend even further the
content available to them using tools such as e-mail, threaded discussions, and
chat. This additional ad hoc content usually had no period of evaluation before
release.
Network technology opens a Pandora's box by putting the tools and

infrastructure for delivering information and training in the hands of its
traditional recipients. For better or for worse, training moves out of the
exclusive province of trainers. The locus of control for determining what
training a particular individual needs shifts from instructor to leamer. Training
becomes a much more organic, systemwide process. Training becomes a

resource rather than an intervention. As information and training on intranets
and the Internet proliferate, it becomes harder and harder to say, definitively,
"This is the training." And answering the question, ''Was it the training?"
becomes even more difficult as the effects of training are intertwined more than
ever with the effects of other aspects of the work environment, such as work
conditions, management, policies and procedures, and compensation systems.
Evaluating training must go beyond preconceived tests and surveys when

much of what ends 'up as training is not preconceived. The net effect of this is
that, as shown in Figure 13-2, as training becomes more and more integrated
with the job, training evaluation becomes more and more integrated with
management.
Figure 13-2. The effect of technology on training and evaluation

The focus of evaluation must shift from discrete learning outcomes, which
have been the focus of traditional training evaluation, to the business outcomes
that have been the focus of traditional management. The value of training is
measured using the same tangible evidence that managers use to measure
business results.
The question "Was it the training?" must be set aside as unanswerable,

except in a research context. From a practical point of view, the question
becomes: "Is training resulting in behaviors that are aligned with achieving
business results?" The answer to this question lies in measuring the business
outcomes that were identified when training was planned.
References
Brinkerhoff, R.O., and Gill, S.J. (1994). The learning alliance: Systems thinking in
human resource development. San Francisco: Jossey-Bass.
Caudron, S. (1996). Wake up to new learning technologies. Training & Development.

50 (5), 30-35.
Dixon, N. M. (1996). New routes to evaluation. Training & Development. 50 (5), 82-
85.
Baldwin, T. T., and Ford, S. K. (1988). Transfer of training: A preview and directions
for future research. Personnel Psychology. 41 (1),63-105.
Broad, M., and Newstrom, J. (1992). Transfer of training, San Francisco: Addison-
Wesley.
Edwards, R. (1996, January 12). Software: Land of 1.000 niches -- Part II: Global
network computing software investment opportunities and pitfalls as computing
becomes communicating. San Francisco: Robertson, Stephens & Company
Institutional Research.
Gery, G. G. (1991). Electronic Support Systems. Boston: Weingarten Publications.
Industry report 1996. (1996). Training. 33 (10), 36-79.
Industry report 1995. (1995). Training, 32 (10), 37-64.

Multimedia training in the Fortune 1000: A special report on the status of multimedia-
based training in the nation's largest companies. Training, 33 (9), 53-60.
Ragan, T. J., and Smith, P. L. (1992). University of Oklahoma-Instructional

effectiveness study. Atlanta, GA: Skill Dynamics.
Tannenbaum, S., and YukI, G. (1992). Training and development in work organizations.
Annual Review of Psychology 43, 399-441.
Violette, J. (1996, October 7). Putting interactive training online. New Media, 6,46.
Wolff, R. (1995, NovemberlDecember). CBT vs. ll..T: Cost comparison model. CBT
Solutions, 14-16.
About the Author
Hallie Ephron Touger has her own consulting business based in Milton,
Massachusetts. She works with corporate clients to write Web-based and
multimedia information and training. She is a partner in Quarto Interactive, a
Boston-based new media design and consulting company. She was a project
manager at Oakes Interactive, one of the country's leading multimedia
development firms, and a senior consultant at Digital Equipment Corporation
where she developed guidelines and procedures for evaluating training. She
holds a Ph,D. in educational research, measurement, and evaluation from
Boston College.
14 DESIGN-TEAM
PERFORMANCE: METRICS AND THE
IMPACT OF TECHNOLOGY
Larry Leifer
Introduction
For more than fifteen years the Stanford University Center for Design Research
has been concerned with the formal study of engineering product development
teams at work in academic and corporate settings. This chapter describes one
such effort, the implementation of team-based, distance-learning techniques in
a Stanford University School of Engineering course, Team-Based Design-
Development with Corporate Partners. The course is distributed nationally by
the Stanford Instructional Television Network for on-campus, full-time students
and off-campus, industry-based part-time students.
Applied ethnography methods like video interaction analysis have been used
to reveal the detail and pattern of activity of design teams. Computational
support systems have been developed to facilitate their work. Throughout these
studies we have seen that their activity closely resembles the most attractive
aspects of self-paced education described by constructivist learning theorists.
The design environment has been instrumented to see if technical and
behavioral interventions did, in fact, improve performance. It is this learning
validation phase of our work, and its dependence on technology, both for
instructional delivery and for assessing performance outcomes, that we wish to
share with you in this chapter.
298 Design-team Performance: Metrics and the Impact of Technology
To put the findings in context, we must first introduce the pedagogic

framework that guides our value judgments. Hence, we introduce the concept
and practice of Product-Based-Learning (PBL) and the feedback
instrumentation model we use for assessing PBL performance. Within this
framework, the discussion will focus on technology-specific issues, the
rationale for technical intervention, and a sampling of performance assessment
results.
The Context of Engineering Design Education
Economic and technologic forces are changing the landscape of engineering.

There is an increasing need for organizations to form joint design development
teams that collaborate for the life of a project and then disperse. These teams
need to quickly locate, evaluate, and make effective use of the best resources
available (tools, facilities, people), wherever they may be found. We will
concentrate on a subset of this problem that is critical in the early stages of
product development when new ideas, processes, components, and materials
are being explored and prototype solutions are developed. We seek to enhance
the ability of teams to exploit their own shared knowledge, novel tools and
manufacturing processes for improved product performance, reduced cost, and
documentation of their exploration for those who will inevitably follow, the
next generation redesign team. In this way we believe that the slow cycle of
exploration, maturation, and widespread adoption of advances in information
handling, materials, and processing capabilities can be compressed from
decades to years.
Our ongoing work takes the form of an annual series of experiments

involving 12 to 15 design teams in a product design-development laboratory at
Stanford University. Ten years of careful observation and analysis of teams at
work in their graduate studies and in industry are beginning to yield objective
predictors of design quality outcome and management strategies for assuring
uniformly high team performance. The pedagogic framework of this test-bed
course and associated research program are informed by work in the social and
cognitive sciences.
Kolb's Learning Cycle
Students have been observed to learn in four different ways (see Figure 14-1).
Kolb (1984) proposed that repeated cycles of experiences moving through these
learning modalities improve understanding. The cycle is best started with

concrete experience, proceeding to abstraction (Harb, Durrant & Terry, 1993).
Beginning with a need for learning stimulated by immediate experience, the
learner should reflect and question his perceptions of the learning environment
before proceeding to abstraction and. experimentation. The complete cycle
brings understanding in depth. Prompt feedback of the intention and meaning
of that experience plays an important role in the preservation of learning for
future reference.
Concrete Experience .... Reflective Observation

(reverse engineering) -- (notebook thinking)
(baS:;~:nin2 )
Active Experimentation ......._ _ _- -
.. Abstract Conceptualization
(design synthesis) (modeling & analysis)
Figure 14-1. Four-phase loop of experiential learning
Those who seek to support distant learning, asynchronous teamwork, and

interdisciplinary collaboration must be vigilant about preserving the critical
features of an experiential learning framework while exploring the power of
emerging technology to bring these methods to more people in more places and
at lower cost than has previously been possible.
Leifer (1995), after Kolb (1984), models student experiential learning as a

four-phase loop. Kolb observed that a cycle of experiences improves
understanding and builds bridges between theory and practice. This is a
qualitatively satisfying view of product-based learning as well. It remains a
challenge to objectively and quantitatively demonstrate performance
improvements when this pattern is experienced.
Constructivist Learning
Educational theorists, including Jean Piaget, have developed a model of

learning by which students develop knowledge structures based on previous
experience. A theory of how to teach science, known as the scientific learning
cycle, is a direct outgrowth of Piaget's ideas and constructivism (Harb, Durrant,
and Terry, 1993). Twenty years in engineering design education supports the
view that direct experience is the learning medium of choice in our domain of
higher education and parallels the fact that professional experience is the
measure by which most engineers, scientists, and other professionals are rated.
One may summarize the lessons of constructivist research as declaring that
learning is best done by creating something, a product, that embodies our
knowledge. This is product-based learning.
Vygotsky's Model
Vygotsky, as represented by Moll (1990), argues that knowledge is social before

it is personal, which suggests that it must be interactively and socially
constructed. This is usually observed through language usage, although it can
also be seen in the use of diagrams and sketches. In this model, learning must
be external and shared before it can be internalized and made personal. Much
of the current work on group-based learning derives from his thinking. Once
again, this model reflects our experience that engineering design is a social
activity (Leifer, Koseff, and Lenshow, 1995), and design team failure is usually
due to failed team dynamics. However, learning failure is usually blamed on the
individual. It is increasingly clear that design is a learning activity and that the
social nature of learning is promoted in PBL.
Product-Based Learning and ME 210
It is hypothesized that product-based learning (PBL) methodology and

technology have evolved in a manner that will make widespread PBL adoption
and assessment financially feasible for undergraduate, graduate, and continuing
education. PBL is defined as problem-oriented, project-organized learning that
produces a product for an outsider. Unlike other versions of PBL that focus only
on the project or problem, the product focuses our attention on delivering
something of value beyond a training exercise, something suitable for
evaluation by outsiders. The objectives of PBL curricula may be stated briefly as
follows (after Bridges & Hallinger, 1995):
1. Familiarize students with problems inherent in their future profession.
2. Assure content and process knowledge relevant to these problems.
3. Assure competence in applying this knowledge.
4. Develop problem-formulation and problem-solving skills.
5. Develop implementation (how-to) skills.
6. Develop skills to lead and facilitate collaborative problem solving.
7. Develop skills to manage emotional aspects of leadership.
8. Develop and demonstrate proficiency in self-directed learning skills.
The Stanford University School of Engineering course, Team-Based Design-

Development with Corporate Partners (ME 210), has served as a model for
exploring this approach. The seeds of our understanding of PBL have emerged
from this design-development laboratory where eighteen years of cumulative
evidence is making soft factors, like team personality, ring true statistically as
well as intuitively. Objective metrics are being derived from detailed
examination of the formal written records produced by design teams in the
natural course of doing their job.
Technical Focus on Mechatronics in ME 210
Our curriculum technology focus is Mechatronic systems design-development

(smart products). Mechatronics is frequently defined as the integration of real-
time software, electronics, and mechanical elements for imbedded systems. In
our definition, we include human-computer interaction and materials selection.
Typical products in this domain include disk drives, camcorders, flexible
automation systems, process-controllers, avionics, engine controls, appliances,
and smart weapons. Mechatronics is a particularly good medium for
introducing PBL because it is dependent on interdisciplinary collaboration.
Implementation of this curriculum builds, in part, on recently developed

Internet tools and services for distributed product-development teams (virtual
design teams). Such teams are increasingly common in the mechatronics field.
Using the World Wide Web rNWW) as an informal, work-in-progress

document archive and retrieval environment, we electronically instrument
design-team activity and knowledge sharing for project management and
learning assessment purposes. Examples include
Share: the MadeFast experiment (http://www.madefast.orgl), six

universities, six corporations, built a virtual company, delivered a product
in six months, and documented it all on the WWW (Toye et al., 1994).
ME 210: a graduate curriculum in cross-functional team mechatronic
systems design at Stanford (http://ME 210.stanford.edu), 12 to 15
companies,45 students, 15 of them remote, build and document
Mechatronic products each year.
NSF Synthesis Mechatronics: a multi-university, eight co-development
projects focused on undergraduate mechatronics, NSF Synthesis Coalition
web (http://www.synthesis.org).
ME 210 as a Product-based Learning Model
Corporate clients of the Stanford Center for Professional Development (SCPD)

who sponsor ME 210 projects are increasingly adamant about the need to give
their employees a continuing, lifelong education opportunity without losing
them to full-time study on campus. Encouraged by this demand, ME 210 has
been offered since 1994 to SCPD students across the country and since autumn
1995 to students and corporations around the world. The course is intensely
experiential, hands-on. Distributed design teams, composed of two to four
members, work on different industry-funded projects. They are supported by a
four-person teaching-team, a practicing design-team coach, and a corporate
liaison. The 1994-1995 academic year included Ford, GM, FMC, Lockheed,
Pfizer, 3M, Raychem, NASA-JPL, Redwood MicroSystems, Quantum, Toshiba,
Seiko, and HP Medical Products. The deliverable is a functional product
prototype and detailed electronic documentation of the product and
development process. These teams won 11 of 12 awards in the Lincoln
Foundation Graduate Design Competition in 1995. The competition is based on
formal external peer review of design documentation.
Instrumented Team Learning
ME 210 is supported by a World Wide Web based information system created

to meet the dual challenges of design-knowledge capture and interactive
distance education. ME 210 events are distributed by video and Internet
channels. The focus is on capturing and reusing both informal and formal
design knowledge in support of design for re-design. Video (preferably a digital
video server) is used for high-bandwidth, real-time transmission of lectures,
design reviews, and demonstrations. The Internet is used for low-bandwidth
data transmission, e-mail, knowledge capture, and retrieval. Electronic mail is
used extensively for communication between the teaching staff, students,
coaches, and industry liaisons. Communication is automatically archived and
organized by a HyperMail utility on the Web. The ME 21O-Web also includes
the class schedule, syllabus, assignments, five years of past student work,
design-team background information, and a HyperMail archive of working
notes. The Web archive functions as a cumulative team and curriculum
memory. It facilitates informal knowledge sharing within the class and shares
with subsequent generations the legacy knowledge of prior experience.
In addition to distance education benefits, the dynamic nature of the ME

210-Web fundamentally changes the model of interaction between student
teams, teaching staff, coaches, and industry liaisons. Before the ME 21O-Web,
product-development team progress was only observable at formal meetings
and quarterly reviews. With the ME 210-Web, work in progress is available for
review by all members of the community, any time, any place. This option
augments and strengthens traditional briefings. It facilitates feedback from the
teaching team, coaches, and corporate partners to the design team. Importantly,
teams can share their lessons learned in real-time. All of this gives our
corporate partners a basis for judging the validity of our curriculum and the
value of their financial investment in engineering education.
The course and the web become a model for high-tech instrumented
learning. This extensive use of high-tech delivery systems has led to
formulation of the following PBL assessment instrumentation model. Using the
model, performance studies demonstrate that diverse, distributed teams that
make effective use of electronic communication and documentation services
can, and do, outperform co-located design teams. To achieve this, several lines
of innovation have been introduced systematically in the ME 210 curriculum.
Major changes since 1990 include the following:
Behavior-based restructuring:
1. The conception of engineering as a social activity has been
emphasized.
2. An open-loft community replaced a closed work-room environment.
3. The ftrst ftve weeks of a 30-week development cycle are now
devoted to team building exercises rather than product development.
4. Personal preference inventory scores have been added to the list of
diversity factors used to design peak performance teams.
Technology-based restructuring:
1. Knowledge capture and sharing (WWW access) has been promoted
as a deliverable equal in importance to hardware and software.
2. The use of desktop computers for numerical engineering has been
de-emphasized.
3. The use of laptop computers (mandatory) for design-knowledge
capture and sharing and reuse has been strongly encouraged.
4. Conceptual prototyping (physical and qualitative) is advocated and
detailed design deferred.
Know/edge Capture for Collaborative Learning
It is the custom in industry to conserve knowledge through formal

documentation. Though this beneftts redesign across generations of engineering
teams, individual engineers often consider immediate beneftts of documentation
too small to justify the required effort. Consequently, documentation is treated
as a necessary evil to be done after the fact, after the finer points of decisions
and rationale have eroded from memory. While some engineers have such
details written down informally in meeting notes and notebook entries, these
records tend to be personal accounts that are difftcult to share with colleagues
during the course of design.
Our informal survey of existing computer-supported collaboration tools

showed that most information exchange mechanisms are built on a discourse
metaphor. Designers in a virtual co-located meeting scenario may use shared
white boards, IRC (Internet Relay Chat), e-mail, or usenet newsgroups to
actively enter into synchronous and asynchronous conversation threads. These
tools allow them to extend meetings over space, time, and offer opportunities to
record ongoing discourse.
However, as engineers know, meetings are only one part of teamwork. In

fact, a significant portion of teamwork occurs in parallel, outside of meeting
rooms. Indeed, the strategy of concurrent engineering is to gain lead time by
minimizing sequential dependencies and maximizing parallel task flow. To be
effective in an engineering team, it is equally important to be aware of parallel
tasks without engaging in prolonged discussions. This mode of informal
sharing is exhibited when teammates drop into each other's offices, look over
each other's shoulder at work in progress, or ask one another, "What are you
working on?" While it is possible to record conversations, the inherent purpose
of discourse is time-sensitive information exchange, transmit and forget, rather
than information archiving. Conversely, thought-based information is primarily
intended for one's personal archive and reuse. For effective collaboration, we
observe that it is valuable to capture and integrate collective discourse
knowledge and knowledge generated for personal use (see Table 14-1).
Table 14-1. Comparison of discourse-based and thought-based high-tech

tools
Knowledge Work
Discourse Based Thought Based
PRIMARY AUDIENCE others self
SHARE TENDENCY shared NOT SHARED
RECORD TENDENCY NOT RECORDED recorded
ANALOG EXAMPLE conversations personal notes
DIGITAL EXAMPLE e-mail e-notes
ASSESSMENT VALUE potential demonstrated
We have found that e-mail is the discourse medium of choice for

asynchronous dialogue. Notebooks, broadly used as a metaphor, are the thought
medium of choice for most knowledge workers. As no single paper notebook
serves all application domains, a variety of electronic notebooks have been
created. Our particular notebook, PENS-Personal Electronic Notebook with
Sharing (Hong, Toye & Leifer, 1995)-fills the need for an ultra-light
application that supports and implements WWW-mediated selective sharing of
one's working notes.
In review, we have identified three sources of learning performance data that

can be used for "work in progress" team assessment. Subsequent sections of this
chapter will explore team performance using these materials. Each is available
in electronic form and is WWW accessible. Hence, assessment can be

performed by anyone, anywhere, anytime.
1. formal reports
2. discourse-based e-mail
3. electronic thinking-notes
Product-Based Learning Assessment
Product-based learning integrates five key pedagogic themes, each central to

assessment: 1) externally sponsored projects motivate learning; 2) theory and
practice are synthesized in hands-on development; 3) real-world projects
demand multidisciplinary experience; 4) project management requires
creativity, problem formulation, teamwork, negotiation, oral communication,
and written documentation; 5) naturally occurring byproducts of project work
(proposals, presentations, lab notes, products, and reports) directly support
formative, summative, and validative assessment.
Each aspect of PBL benefits from computer and Internet technology in three
ways. First, computer-mediated work is more easily shared than paper-based
work. The need to share information is driven by cooperative learning and by
the real-world context of PBL activity. Second, formal presentations and
documentation are more easily created and disseminated electronically. Third,
electronic documents facilitate learning quality assessment through emerging
content analysis (Mabogunje, 1996) and organization communication pattern
analysis (Mabogunje, Leifer, Levitt, & Baudin, 1995).
PBL pedagogy themes map closely to the activity and issues of real product
development. Accordingly, the framework for our approach to learning
assessment is derived from observational methodology in the design research
community, especially the work of Suchman (1987), Tang (1989), and
Minneman (1991). Their findings told us what to look for while the authors'
experience in flight simulation research suggested the use of an instrumentation
framework. However, in contrast to flight cockpit team emphasis on precision
communication, our design-team observation studies have clearly shown that
the preservation of ambiguity through negotiation is a necessary component of
productive design environments (learning environments). It is also enabled by
computer and WWW-mediatedcommunication technology.
Instrumentation Feedback Model for Assessment
The guiding metaphor for our assessment and evaluation activity is one of
instrumentation. We use this term in the sense of observing both independent
and dependent variables in an automatic feedback control environment similar
to that found in aircraft flight simulators. The model, illustrated in Figure 14-2,
is an adaptation of the visualization first introduced by Atman, Leifer, Olds, &
Miller (1995).
validative
assessment
Figure 14-2. Design-team performance assessment model

The model includes seven instrumentation nodes required to observe most
critical input-output relationships. A comprehensive assessment program must
be able to observe activity at all nodes concurrently, a requirement that has only
become achievable through advanced technology.
Design-team performance (learning) is assessed and the assessment itself

validated (authenticated) in Product-based learning through three feedback
paths. In the validation loop, curriculum outcomes at node 7 can be compared
with industry performance standards across courses and campuses to assure that
the curriculum targets relevant professional skills and content. In the
summative assessment loop at node 6, individual projects can be compared with
each other to assure that the course is on target. For the innermost, formative
assessment loop at node 5, work in progress can be compared with peer
standards to assure that the team is on target. Results obtained from assessment
along these three feedback paths supports triangulation across data sources,
methods, and assessors.
In each assessment situation we identify the locus for instrument insertion,

the variables of interest, and observation methods of choice. The nature of the
activity and information structure in the process both influence these choices.
Historically, the emphasis has been on output assessment, factors that can be
observed at node 6 (formative-assessment), node 7 (summative-assessment),
and node 8 (validative-assessment). However, from a flight-simulation point of
view, it makes little sense to observe outputs without at least partial knowledge
of related inputs. Hence, we have focused our effort on instrumenting the input
nodes 1 through 4:
Node 1: explicit definition of pedagogic objectives and professional

performance requirements, including the description of scenarios and
situations in which performance is to be judged.
Node 2: course and instructor-specific selection of pedagogic methods and
continuous comparison of course, team, and individual outcomes with
peer standards.
Node 3: student and team-specific assessment of incoming content
knowledge and process management experience for continuous
comparison with goals and work-in-progress indicators of performance
achievement;
Node 4: recognition of, and action taken to reduce the impact of
uncontrolled, unobservable, and simply unknown factors that contribute
noise or uncertainty that affects the overall performance and assessment
process. By convention, noise is shown as a lumped parameter input to
the core teaching-learning process. In practice, noise may enter the
system at any node.
Subsequent sections of this paper review something of the impact of

technology on our ability to instrument, understand, and manage team
productivity and learning through the study of input-output relationships in this
model.
Metrics Enabled by Technology
Through longitudinal analysis of noise entering the teaching situation in the

form of students' prior experience and personal learning style preferences, we
have seen that the structure, membership, and self-management of product
development teams can be addressed in new and productive ways. The union of
an explicit feedback assessment model and advanced technology for
instrumenting and facilitating team activity have begun to yield performance
metrics that can be used by a team to monitor and accelerate its own
productivity. This information is also available to aid project supervisors,
including instructors, in the assessment and prediction of team outcome
performance.
We report two examples of our experience with metrics that have proven
valuable in both understanding and facilitating product-based learning within a
collaborative, team environment. The first is a subjective, behavior-based,
index for controlling the noise of student learning preferences as an input to
team performance (node 4). The other is an objective, content-based,
measurement of work in progress to predict the quality of a team's final
product. Data from work-in-progress assessment is used by teams for self-
assessment (node 3) and by project supervisors as an in-course corrective
mechanism (node 2). Both types of metrics depend on advanced technology in
the workplace for their implementation and acceptance.
Team-Preference Profile Management
The idea of using questionnaires to guide the formation of design teams had its
origins in 1989 in a workshop on creativity for design professors organized at
Stanford by Bernard Roth, Rolf Faste, and Douglas Wilde. To determine the
effectiveness of the workshop, participants were surveyed both before and after
the two-week intensive experience to determine their Gough Creativity Index
(GCI), a measure of creativity devised by the psychologist Harold Gough. The
GCI is an empirically determined linear transformation on the four scores
generated by the Myers-Briggs Type Indicator (MBTI), a questionnaire used
widely for psychological, educational, and vocational counseling (Briggs-Myers
& McCaulley, 1985). As reported by Wilde (1993), the workshop enhanced the
creativity index of the participants and promised to be a useful guide for the
design of effective teams.
In 1990 this idea was extended to ME 210. After responding to the MBTI,
the students were asked to voluntarily form teams composed of individuals
whose MBTI scores differed in at least two of the four variables. Subsequent
performance confirmed that teams having a high GCI member in balance with
other preference profiles tended to win more and better design competition
awards. Initial findings were refined in subsequent years and applied to teams
of two, three, and four members. Mathematical transforms were developed to

use MBTI numerical scores and vector analysis to decompose the score vector
into the 2,3, or 4 components required to guide team formation (Wilde, 1993,
September). It was also possible to compute Jungian cognitive modes, one of
which corresponds closely to the Gel (Wilde & Barberet,·1995).
The MBTI has since been replaced by a much briefer preference

questionnaire. This saves time and deemphasizes the counseling aspect of the
survey. Students wishing to investigate the counseling side further are
encouraged to do the MBTI under the supervision of a licensed counselor. The
preference questionnaire is viewed simply as an expression of personal
preferences rather than an indicator of personality type. Indeed, the data are
only used to place students in broad preference groups rather than personality
categories. Teams are then formed in a peer-to-peer protocol that rewards five
team diversity factors:
1. Include one person from each personal preference group.

2. Balance gender representation.
3. Maximize ethnic diversity.
4. Maximize technical background diversity.
5. Maximize the members' geographic distribution.
Since 1977, ME 210 final reports have routinely been submitted for
consideration in the Lincoln Foundation Graduate Design Competition. The
reports are read and evaluated by designers and design professors, who award
twelve prizes to those they consider best. Judges change from year to year and
are blind to the authors and their university affiliations. Since 1991, balanced,
high Gel teams have clearly been more successful at winning Lincoln awards,
not only winning more of them but also producing higher-quality work (Wilde,
1997). Table 14-2 summarizes the awards won during the five-year period of
our experiment and the preceding thirteen (control group) years.
The table shows that the percentage of prize-winning teams from ME 210
doubled, from 29% to 60%, during the experimental period. High Gel teams
had a higher winning frequency (63%) than did the other teams, whose
performance (50%) had already been improved from 29% simply by
diversification. The study shows that the quality of prizes won was also
considerably better for high GCI teams, which won a high proportion of Silver,
Gold, and Best of Program medals (42%), triple that for the other teams (14%).
In the best year to date, 1995, student teams won eleven out of the twelve prizes
awarded nationally. Team preference diversification thus corresponds to the
quantity of prizes won, an indication that distributing human talent among the
project teams raised overall quality, especially at the top.
Table 14-2. Summary of Lincoln Prize Awards
High-Gel Team Other Teams All Teams

Period Yrs N Aw % N Aw % N Aw %
1977-90 13 - - - - - - 138 40 29
1991-95 5 49 31 63 14 7 50 63 38 60
Yrs = number of years studied; N = number of teams; Aw = number of awards won;

% =percentage of tearns winning awards.
The personal preference questionnaire and group transform algorithm have

been implemented as World Wide Web forms (http://ME 210.stanford.edu).
Given Internet access, anyone, anywhere, can complete the questionnaire to
identify their preference profile. Another Web form is used to characterize the
team-preference diversity map. As documented in Hong, Toye, and Leifer
(1995), WWW mediated peer team formation, using voluntary personal
preference data, is one of the most unique and useful applications of technology
to team performance management in ME 210.
Design Documentation Content Analysis
Engineers will, of course, be more at ease with objective, technical content

factors that predict team performance. Accordingly, and in response to the
increasing number of engineering classes that require open-ended problem
solving (for example, design assignments), we have been studying product
design documentation in the interest of supporting design knowledge reuse and
grading. Traditional tools for measuring student performance, such as written
examinations and multiple-choice questions, are not appropriate assessment
instruments in project courses. Grading product documentation is
uncomfortably subjective. As the instructional situation moves from one in
which there is one right answer and method to one in which there are several
alternative answers and methods, there is a need for assessment tools that can
accommodate multiple points of view, be context dependent, and content
Specific.
312 Design-team Performance: Metrics and tne Impact of Technology
Studies of redesign behavior by Baya et al. (1990) demonstrated that

designers ask for information about prior designs in terms of noun phrases. It
was also observed that an automatic content search engine that could use noun-
phrases in simple associative relationships was far superior to key-word
searches in terms of the designer's appreciation of search results. Hence, noun
phrases have become the object of attention in recent documentation content
analysis studies. As the commentary on technical language usage below
demonstrates, these noun phrases may not be good grammar. But as
engineering instructors and practitioners will recognize, they are common in
practice and may be more important than the commentary by Conner (1968) in
A Grammar of Standard English would lead us to believe.
In the newer technologies-notably in engineering-the

(nomenclature) conventions are not systematic or clear; the
(engineers) themselves are either unaware of the lack of clarity
and system, or do not choose to make the effort to repair it.
Therefore, anyone who undertakes to read technical documents
must make his way through agglomerations like these:
_ the highest previously available intrinsic coercive force

_ single side band transmission
_ high-frequency stability
_ high-energy particle accelerator
_ internal transducer excitation supply
_ the segmented multiple ablative chamber concept
_ combustion chamber crossover manifold coolant passages
This situation will stay with us until the (engineers) establish

some flrm conventions and hold to them as chemi.<;ts dlld
mathematicians hold to theirs.
As correctly pointed out by the grammarian, noun phrases of this type break
several grammatical rules. Viewed differently, they point to the fact that
technical language is itself inventive and potentially a rich source of
information about design-team performance qUality.
The raw data for our study of noun phrases (Mabogunje, 1997) comes from a
set of three reports, named "design-requirement-documents," submitted by
students in ME 210. The projects analyzed were taken from the 1992-1993
academic year and dealt with a wide range of products including a catheter for
gene therapy in the human body, a control mechanism for maintaining the
focus of an infrared optical system, and a power locking device for an

automobile door. In Figure 14-3, team grades have been superimposed on a
graph of the number of distinct noun phrases found in each of their quarterly
design-requirement documents.
Number of Distinct Noun Phrases in the

Quarterly Design Requirement Document
1600 At
.A
1400 At
A
1200 tlz:.-;7~---=:;At
1000 At
A-
800
600
t
----~~..::::~~::::-.... Bt
400 Grade
200
o
Autumn Winter Spring
Academic Quarter
Figure 14-3. Association between distinct noun phrases and academic grades
To calculate the level of association between the incidence of unique noun

phrases and peer assessment of design quality (the grade), a gamma measure of
association was calculated. For the ten project teams studied, gamma was equal
to 1.0, a perfect ordinal prediction, for the winter quarter. For the spring
quarter, gamma was 0.71 (strong but less than perfect), following intervention
by the teaching team to stimulate the performance of slack teams. We found
that the number of distinct noun phrases in these documents was already very
high in the autumn report, written five weeks into a 25-week develop cycle.
Projects assigned the letter grade B or B+ corresponded to reports with the

lowest noun phrase counts, while teams assigned the letter grade A or A+
produced reports with the highest noun phrase counts. It was not possible to
make this sort of differentiation based on the readability of the documents for
which the gamma association with the class grade was low (0.36 for the winter
quarter and 0.31 for the spring quarter). The gamma association between the
grade and the number of words in the document was higher than that for
readability but lower than that for noun phrases (0.93 in winter and 0.49 in
spring). Hence the number of distinctive noun phrases in formal project
documentation appears to be an objective predictor of product quality ranking.
Grading is often perceived as highly subjective. It is based on several factors
and different events. It did not include objective document content analysis. It
may in the future.
While detailed examination of the learning implications of these results

must be deferred, it is important to point out that the results are, in part,
validated by the performance of these same teams in Lincoln Graduate
Engineering Design competition where the top awards were given to projects
with highest number of distinctive noun phrases.
Summary, Implications, and Invitations
This chapter has described our experience in the design, implementation, and
assessment of the Stanford University School of Engineering course, ME 210,
Team-Based Design Development with Corporate Partners. A theme that
pervades our work is the role of technology both in implementing learning
environments and in assessing performance outcomes. Utilization of Internet
capabilities to facilitate team-based distance learning has changed the
character of the learning environment in profound ways. Learning in PBL
environments is applied within the context of the course. Virtual design teams,
separated geographically but linked by a variety of technology-based
communication tools create actual product prototypes. Work-in-progress data,
again technology instrumented, is used for self-assessment as the work of the
teams evolves. The metrics produced by this data have been shown to predict
quality of the final products. The overall quality of designs produced by these
diverse, collaborative teams stands up well against industry standards as
reflected in the Lincoln Design Competition.
The data for these studies was available because the community is rewarded
for electronic knowledge-capture and knowledge-sharing. Early work on design
knowledge reuse had revealed that designers engaged in redesign scenarios are
persistent and insightful questioners. Documents that are rich in noun phrases
had proven most useful during redesign and were most easily indexed for
automated knowledge retrieval. Successful automation of design knowledge
retrieval has, in tum, given us a metric for assessment of performance as
revealed in project documentation.
Teams working on real job assignments (PBL) naturally produce documents

that can be used to assess performance and learning. We have observed that
objective measures of these documents, such as the frequency of unique noun
phrases, can predict the final quality of the team's work as judged by peer
review. The key to this kind of assessment is an atmosphere that values,
promotes, and technically supports real-time project documentation.
Our experience, tools, and metric validation studies only hold for
engineering product development teams working in the ME 210 community
and supported by our knowledge capture and sharing environment. Any
extrapolation to other situations, domains, and communities requires careful
examination. However, we are encouraged to explore the utility of these
methods more widely because, in part, the design-team application domain is
among the most ambiguous and challenging of assessment scenarios.
The role of technology in this endeavor is critical. We could not have

attempted, let alone performed, the objective studies without careful technology
insertion. The application of advanced technology was itself based on the
advanced technology of video interaction analysis used for direct observation of
development teams at work in industry and in our own academic situation.
Importantly, the validation of these measurements was made possible by the
framework of external peer review in the Lincoln Design Competition.
Inevitably, these results pose more questions than they answer. How robust
is the noun-phrase metric? Can it be manipulated by individuals and teams
without corresponding quality outcomes? Can it or related metrics be used daily
with e-mail and electronic notebook records to provide real-time feedback on
work in progress? How do other factors, ethnicity, gender, expertise,

geographic-distribution, and organizational factors affect team performance?
In short, we are encouraged. Our curriculum continues to evolve. New

technology is being evaluated. A culture of technology assisted self-assessment
in higher education is in place. Corporate training and lifelong learning are
now part of our education model and methodology.
Acknowledgments
Few, if any, of these ideas are entirely free from the influence of friends,
colleagues, and students. I would first like to thank the generations of ME 210
students who have guided and surprised me through the years as I attempt to
understand how they defy entropy, create order, and fine machines. Regarding
the assessment of their performance, Ade Mabogunje, David M. Cannon, and
Vinod Baya have been especially valued contributors. Regarding pedagogy and
the assessment of engineering education, I am particularly indebted to Sheri
Sheppard and Margot Brereton. John Tang and Scott Minneman opened the
doors to ethnographic methodology in design research and established the
observe-analyze-intervene research framework that has guided many
subsequent studies. Perceiving and filling a need, George Toye and Jack Hong
created the PENS notebook environment and helped formulate our high-tech
assessment strategy. Doug Wilde's diligent exploration of the correspondence
between team dynamics and design performance has demonstrated something
of the truth in an old assertion that "it's the soft stuff that's hard; the hard stuff
is easy."
References
Atman, C., Leifer, L.. OIds. B .• & Miller. R. (1995). Innovative assessment
opportunities [videotape]. (Available from the National Technical University
satellite broadcast, Fort Collins. CO.)
Baya, V., Givins. J., Baudin. C., Mabogunje, A., Toye. G .• & Leifer, L. (1992). An
experimental study of design information reuse. In Proceedings of the 4th
International Conference on Design Theory and Methodology (pp. 141-147).
Scottsdale. AZ: ASME.
Brereton, M.F., Sheppard, S.D., & Leifer, L.J. (1995). How students connect
engineering fundamentals to hardware design. In Proceedings of the 10th
International Conference on Engineering Design, Prague, (pp. 336-342) Zurich,
Switzerland: Heurista.
Bridges, E.M., & Hallinger, P. (1995). Problem Based Learning: In leadership

development. ERIC Clearinghouse on Education Management, Eugene, OR.
Briggs-Myers, 1, & McCaulley, M. H. (1985). Manual: A guide to the development and

use of the Myers-Briggs Type Indicator, 2nd Edition. Palo Alto, CA: Consulting
Psychologists' Press.
Conner, J.E. (1968). A grammar of standard English. Boston: Houghton Mifflin.
Cutting, D., et al. (1992). A practical part-of-speech tagger. In Proceedings of the

Applied Natural Language Processing Conference, Trento, Italy.
Harb, J., Durrant, S., & Terry, R. (1993). Use of the Kolb Learning Cycle and the
4MAT System in engineering education. ASEE Journal of Engineering
Education.82,(3),70-77.
Hong, 1., & Leifer, L. (1995). Using the WWW to support project-team formation. In
Proceedings of the FIE '95 25th Annual Frontiers in Education Conference on
Engineering Education for the 21st Century, Atlanta, GA.
Hong, J., Toye, G., & Leifer, L. (1995). Personal electronic notebook with sharing. In
Proceedings of the IEEE Fourth Workshop on Enabling Technologies: Infrastructure
for Collaborative Enterprises (WET ICE), Berkeley Springs, WV.
Kolb, D. A. (1984). Experiential learning. Englewood Cliffs, N.J: Prentice-Hall.
Leifer, L. (1995). Evaluating product-based-Iearning education. In Proceedings of the

International Workshop on the Evaluation of Engineering Education, Osaka, Japan.
Leifer, L., Koseff, J., & Lenshow, R. (1995, August). PBL White paper. Report from the
International Workshop on Project-Based-Learning, Stanford, CA.
Mabogunje, A. (1997). Measuring mechanical design process performance: A question

based approach. Unpublished doctoral dissertation, Stanford University.
Mabogunje, A., Leifer, L., Levitt, R.E., & Baudin, C. (1995). ME 21O-VDT: A
managerial framework for measuring and improving design process performance. ill
Proceedings of the FIE '95, 25th Annual Frontiers in Education Conference on
Engineering Education for the 21st Century, Atlanta, GA.
Minneman, S. (1991). The social construction of a technical reality: Empirical studies of

group engineering design practice. (Xerox PARC Technical Report SSL-91-22)
Doctoral dissertation, Stanford University.
Moll, L. (ed.) (1990). Vygotsky and education. Cambridge, UK: Cambridge University
Press.
Suchman, L. (1987). Plans and situated actions: The problem of human machine
communication. Cambridge, UK: Cambridge University Press.
Tang, J.C. (1989). Listing, drawing and gesturing in design: A study of the use of shared
workspaces by design teams. (Xerox PARC Technical Report SSL-89-3) Doctoral
dissertation, Stanford University.
Toye, G., Cutkosky, M., Leifer, L., Tenenbaum, M., & Glicksman, 1. (1994). SHARE:
A methodology and environment for collaborative product development.
International Journal of Intelligent and Cooperative Information Systems. 3 (2), 129-
153.
Wilde, D. 1. (1993). Changes among ASEE creativity workshop participants. Journal of

Engineering Education. 82 (3), 57-63.
Wilde, D. J. (1993, September). Mathematical resolution of MBTI data into personality

type components. Paper presented at the American Society of Mechanical Engineers
1993 Conference on Design Theory and Methodology, Albuquerque, NM.
Wilde, D. J. (1997). Using student preferences to guide design team composition. In

Proceedings of the American Society of Mechanical Engineers 1997 Conference on
Design Theory and Methodology, Albuquerque, NM.
Wilde, D. J., & Barberet, 1. (1995). A Jungian theory for constructing creative design
teams. Proceedings of the 1995 Design Engineering Technical Conferences. 2(83)
525-30.
About the Author
Larry Leifer holds a Bachelor of Science degree in Mechanical Engineering

(Stanford, 1962), a Master of Science in Product Design (Stanford, 1963), and
a Ph.D. in Biomedical Engineering (Stanford, 1969). While at the Swiss
Federal Institute of Technology in Zurich, his work dealt with neuromuscular
control of posture; functional electrical stimulation of hearing; and electro-
physiological measures of human information processing On the faculty at
Stanford since 1976, he presently teaches global team-based product-design
development methodology and develops computational supports services in

collaboration with corporate partners. As director of the Stanford Center for
Design Research since 1982 he works with anthropologists, educators, and
computer scientists to understand through applied ethnography, facilitate
through advanced computer-communication technology, and measure the
performance of creative design teams.
15EVALUATION:
USING QUALITY TO DRIVE
A CASE STUDY
David J. Basarab
Introduction
How do you implement evaluation efforts so that they are considered a normal
activity within your business? The answer is you cannot! If you are a training
operations or human resource person, you do not control business units; you
cannot inculcate evaluation into their culture. But don't worry, sooner or later
they will come and ask you how to find out if training is working. You should
be prepared to answer their questions on the application of training in their
business. The narrative that follows describes how Motorola, at the request of
its business units, has successfully infused level 3 evaluation into its daily
operations.
This is not a theory or a model that someone has created and written about.
Rather it is a case study that describes hours of hard work by a dedicated team
of professionals from Motorola who answered the call from their business units
with respect to application of training on the job. The process described here
may not be applicable to everyone in every situation, but it documents a real
story of how one team infused evaluation into diverse business operations
around the world.
322 Using Quality to Drive Evaluation: A Case Study
The Critical Business Issue
Every year Motorola invests over 4.4 million hours of employee time and over
200 million dollars in training. With this kind of investment in people,
Motorola businesses asked training departments for help in determining if the
money they invest in training results in expected performance back on the job.
To address their concern, a cross-functional, cross-sector, global team was
formed. The team decided to use one of Motorola's strengths, the creation of a
Total Customer Satisfaction (TCS) team.
The TCS process is one of many problem-solving methods at Motorola. It

has strong support within all facets of Motorola's diverse business units and
support from top management. The team must follow a defined process. The
process is outlined in Table 15-1.
The Team
The team represented Motorola business operations of Land Mobile Products

Sector and Semiconductor Products Sector, and the evaluation, design, and
delivery departments of Motorola University. They came from various locations
including Texas, Arizona, Illinois, the United Kingdom, and Singapore. Most
important, the team represented people from business units and training
working together to answer the question, Is the money invested in training
resulting in the application of skills on the job, and if not, why not?
Since team structure spanned the globe, they held monthly meetings that
varied in location and utilized e-mail accounts and conference calling to ensure
total team participation. Overall, the attendance rate exceeded 97 percent. One
week after each meeting, minutes were distributed. This format allowed the
team to successfully complete over 200 action items over an eighteen-month
period. The first team meeting was a training session. The goals of the meeting
were to become a cohesive team and to understand group dynamics. They
agreed upon meeting format, team rules, a process for resolving conflict, and
team problem-solving methodology. As they progressed through the TCS
process, they identified and completed additional training. The additional
training consisted of skills they needed to solve their problem and included the
following:
Table 15-1. TCS process description

i~i~H~ljl~~lillili~ml;~l~~~~ltlll;l;l;lml;~lmmmi~lmmmlm~l;~ll~;lml;lm~~tmm~ml~l;ii:muifli~~~~mml;l;l~~mlml;~l~~lr:~~ill;l~;;;l~;l~~ili~*~;~;l~~~ll;l;~;i;l;l;l;l~l;l;l;l;lr:ml;l;l;l~l;l
1. Project selection
• Identify source of problem.
• Identify problem.
• Write initial problem statement.
• Decision: Does the problem meet the following criteria?
• Problem and solution are within the team's control.
• Problem is customer focused and will improve customer satisfaction.
• Solution can be implemented in 3 to 12 months.
• Must be tied to Motorola's key initiatives.
• Joint responsibility to solve problems by management and TCS team
• Management is conunitted to the team.
2. Team formation
• Form team.
• Select champion and team leader.
• Hold initial TCS meeting.
• Attend 2-day team training as a work team.
• Decide on team name.
• Identify team scribe.
• &tablish ambitious goals that integrate customer needs and requirements.
• Focus on teaming.
• Set schedule for next set of meetings.
3. Analysis
• Fine tune problem statement.
• Develop analysis plan.
• Conduct analysis.
• Produce analysis report.
• Decision: Based upon analysis, can the team proceed to solve the problem?
4. Remedy
• Brainstorm possible solutions.
• Determine feasibility of each solution.
• Develop solution.
• Validate proposed solution with customers.
• Develop strategy for pilot and determine criteria for successful pilot
5. Pilot test and revise solution as necessary
• Develop methods for analyzing pilot test performance.
• Validate methods from previous step.
• Conduct pilot test.
• Summarize/analyze pilot test results.
• Make necessary revisions.
• If necessary, pilot revised solution.
• Report results compared to goals and customer requirements.
• Identify ancillary effects resulting from solution.
6. Institutionalization
• Develop institutionalization plan.
• Execute institutionalization plan.
• Monitor implementation of solution and revise as necessary.
• Statistics
• Process mapping
• Cause-and-effect diagramming
• Force-field analysis
• Pareto analysis
• Pugh concept selection
• Evaluation theory and practice
Project Selection
The team defined their span of control and customer groups. They realized that
they actually had two major customer groups: first the businesses, and second
the training community. The team brainstormed a list of ways in which their
customers' training dollars were being wasted. With the results, they created
the following prioritization matrix to help select the project the team would
address.
The prioritization matrix technique, illustrated in Table 15-2, serves as a

score sheet for weighing possible projects. Along the left of the matrix and
running in a column, the problems of conducting a level 3 evaluation were
listed. Along the top of the matrix, the three criteria the team determined for
project selection were listed. The criteria were
• Impact on customer
• Need to improve
• Team's ability to impact
A numerical value of 5 through 1 was then assigned to each cell in the

matrix, and a score for each problem was arrived at by multiplying them
together. During this brainstorming session, a number of potential projects
were identified. Using the prioritization matrix technique, the team was able to
isolate the project with the most impact to Motorola. They found that issues
surrounding skills application had the biggest impact on our customers, and
with the team's expertise, they could significantly affect this area. To further
narrow the scope of the project, they talked about what tools, if any, operations
had to measure skills application. Interviews with business customers validated
assumptions that they had none.
Table 15-2. Prioritization matrix

Problem Impact on Need to Team's Overall
Customer X Improve X Ability = Score
to
Impact
No opportunity 5 5 4 100
to practice
Organizational 5 5 5 125
barriers
Not skill based 4 5 4 80
Course length 4 5 1 20
Not needed 3 4 1 12
Wrong training 5 4 1 20
Travel 5 4 1 20
Factor Weighting:
5 =Extreme 4 =Very Strong
3 =Moderate 2 =Somewhat
1 =None
The team goal was then defined: to create a process that would enable
businesses to determine if their training investment was resulting in employees
successfully applying skills on the job (level 3 eValuation). To do this they
would develop, pilot, revise, and institutionalize a process throughout Motorola
worldwide.
The team realized that application of skills impacts all of the Motorola
corporate initiatives. In addition, it supported the newest corporate initiative of
Individual Dignity Entitlement. Specifically, it related to the question of
employees having the on-the-job skills and training necessary to be successful.
Analysis Tools
With the problem clearly defined, the team turned to several analysis tools to
help find the source of the problem. This activity was critical in providing the
right solution for the team's customers: the business units and the training
community. Unfortunately, many teams blindly proceed to solution generation
without clearly identifying the root cause of the problem. This results in the
creation of solutions that are wrong or are never adopted. To ensure they did
root cause analysis correctly, the Total Customer Satisfaction team chose
several tools currently used by Motorola in its quality activities. This provided
two benefits. First, it allowed them to properly identify the root cause. A
second, and very important benefit was that they were able to talk in the
"language" of their customer when referring to the problem.
First, the team brainstormed to create the Ishikawa diagram illustrated in

Figure 15-1. The Ishikawa diagram was used to illustrate relationships of all
factors related to why there was no method at Motorola for evaluating training
investment with respect to level 3 evaluation. The team clearly stated the
problem so that everyone could focus on identifying the root cause. After
interviews and discussions with their customers and potential users were
collected, they developed the diagram.
Methods People Power
No method for
_~,.... ..._ _ _ _ _ _ _ _...,_ _ _ _ evaluating training
investment
Machines
Figure 15-1. Ishikawa diagram
First they drew the "spine" of the fish bone (as the Ishikawa diagram is
sometimes called). Next they added the main causes typically used in this
technique.
• Machines
• Methods
• Material
• People power
They identified possible causes for not having a process the business can use
to evaluate their training invesbnent and prioritized the following primary
causes: lack of quality metrics, process ownership, and limited skills.
The team then turned to their customers to validate the primary causes they
had identified and to gain further insight into the elements of the problem as
perceived by their customers. First they distributed a survey to more than 100
training professionals throughout Motorola businesses worldwide. Figure 15-2
displays the top issues identified by training professionals in a Pareto chart.
A Pareto analysis is a bar graph that charts issues beginning with the most
common on the left and moving to the least common on the right. The team
chose to construct a Pareto diagram because the problem of not using level 3
evaluation consisted of several smaller problems or root causes. This allowed
., ... .q g
= Q)
'.,
8
<Il 0: <Il
:a ~ 0 U u
t
0.0 0:
en en '~ ""
~
;:l
'"
Ql
> ~ .§
~ &! u
Figure 15-2. Pareto diagram of customer issues

the team to decide which problems to solve; therefore, maximizing potential

chances for success. The number one issue identified by the training
community was limited knowledge on how to measure application of skills in
the workplace.
Then the team held interviews with their customers from the business
community. They found that the business was concerned about the lack of
quality metrics used to report on their training investment. To understand their
issues more clearly, the team conducted a force-field analysis that exposed the
driving forces and restraining forces that affect the businesses' ability to
evaluate their training investments. The analysis results are illustrated in
Figure 15-3.
Barriers to Success
Business need Impact of results

Become quality oriented Availability/size of staff
Data for training improvement Confusion over relevance
Identify application barriers Personnel evaluation skill level
Partner with business trainers No automated data collection
Upgrade personal skills Credibility of procedure
Upgrade systems Corporate ownership
Budget
Driving forces Restraining forces
Figure 15-3. Force-Field Analysis
In defining a solution, the team would have to build upon the driving forces
for greatest impact while working to eliminate as many restraining forces as
possible.
With the root causes validated by their customers, the team asked themselves
a question: "Can we solve the problem using a process that currently exists, or
will we need to create one?" Since their internal research indicated no such
process existed within Motorola, they decided to benchmark other companies.
They selected companies with documented skills application processes. Upon
initial screening they eliminated processes not meeting a minimum of three of
their customer's requirements.
Using Pugh Concept Selection they compared the remammg available

solutions. The results of this analysis are displayed in Table 15-3.
Table 15-3. Pugh Concept Selection
Criteria for Selection Co. A Co.B Co. C Create a

Motorola
Process
1. Flexible Datum + ++ ++
2. Executed by business Datum - - ++
3. Emphasis on application Datum S + +
4. Use various data collection Datum + + ++
methods
5. Includes customer Datum S S S
6. Training for process users Datum - - ++
7. Provides consultation Datum S + +
8. Unobtrusive Datum - - -
2 plus 5 plus 10 plus
Totals 3 minus 3 minus 1 minus
1 minus 2 plus 9 plus
Pugh Concept Selection is a method of choosing potential solutions to a

problem (Clausing & Pugh, 1991). The technique involves researching several
solutions to the problem and choosing one of those as a data point to reference
all others against. In this case, Company A was chosen as a reference point.
Next, the criteria for selecting a solution were listed in the left column. Then
each solution was rated against the reference point using the defined criteria for
selection. When rating, the team used five indicators:
• S indicates the solution and reference point were judged as the same.
• + indicates the solution was judged better than the reference point.
• ++ indicates the solution was judged superior to the reference point.
• - indicates the solution was judged worse than the reference point.
• - indicates the solution was judged vastly inferior to the reference point.
The pluses and minus were then totaled to arrive at a score for each solution.
The solution with the highest positive value was chosen. As a result of the
analysis, the team eliminated Company A due to lack of flexibility, discounted
Company B because it focused on course problems instead of skill application,
and Company C due to cost. The results showed that only one of the proposed
alternatives would address the causes and issues identified by our analysis.
Remedies
The team accepted the challenge to create Motorola's own Skills Training
Application Review process, STAR.
With a list of customer requirements and team expertise they created a

should-be process map, illustrated in Figure 15-4, consisting of specific steps
and criteria to follow when conducting a STAR.
The STAR process is much more than ~ map. It is a guide, complete with
process specifications and examples, that allows users to easily plan, develop,
collect, analyze, and report various types of skills application data. It is flexible
enough to walk users through simple descriptive analyses and still detailed
enough to guide them through more advanced causal comparative and
correctional studies.
oIdentify course
oIdentify evaluation stakeholders oCreate models to analyze data oScreen incoming completed instruments
oInterview stakeholders oReviewlrevise instruments & models oInput data into analysis models
oWrite evaluation plan °Obtain agreement to instnunents & models
°Obtain agreement to plan
-COmmunicate plan
-Determine areas for improvement -Decide upon report content operfolID detailed analysis
-Assign action items -Write & distribute report -Answer evaluation questions
oimplement changes
Figure 15-4. STAR should-be process map
The team used a modified Quality Function Deployment (QFD) to ensure the
new process would meet their customer's requirements. Quality Function
Deployment (Clausing & Pugh, 1991) is a structured method of determining if
the features of a product or product are aligned with customer needs. The
results of the QFD revealed that their customers liked the proposed process.
With the new process clearly defined and approved by their customers, the
team was ready to pilot. At the same time, to make sure STAR was a
permanent solution, they went back to the root cause analysis and identified
specific issues that needed to be addressed.
They identified three additional remedies that were key to providing

successful solutions for the businesses. First, they created a tool to assess the
integrity of the Skills Training Application Review. Feedback from this
instrument is used for continuous improvement and resulting metrics ensure a
quality process.
Second, they discussed how their business team members could integrate
this process into their operations. The team came up with the concept of a
worldwide STAR network. The concept was to create a group of trained STAR
process personnel from the businesses who actually conduct the level 3
evaluations. Members of the STAR network share data among themselves via
e-mail. The Motorola University Evaluation Service Team serves as the central
point for collection of activities, distribution of material, and communication
with members. Figure 15-5 illustrates the STAR Network concept.
The creation of this network led the team to the third remedy, to develop
training for these network members. Since the TCS team created the process
and had training expertise on the team, they could design, develop, and deliver
the training.
The STAR Network

Ongoing
Communication
Annual
Meetings
Process
Improvement
Grow and
Expand
----rl--- Quality Control
Reports
Trained
Evaluators
Process
Monitoring
Supports Ibe Needs
of the Business
Figure 15-5. The STAR network

Now that each of the root causes was addressed by the remedies, the TCS
team only needed to create an implementation plan and identify key elements
necessary for rolling out the STAR process and network throughout the world.
The plan consisted of piloting the process and revising it to ensure that it
worked and solved all problems identified in earlier analyses. After the pilots,
worldwide communications via electronic methods and grass roots discussions
got the word out. Finally, people from allover the world were trained in the use
of the STAR process.
After thorough execution of the plan, the team was ready to see if it had
achieved the original goals of the project.
Results
The results were impressive. The team set out to determine if the money
invested in training was resulting in employees successfully applying skills on
the job. The team not only managed to define STAR, but did it while
continuously incorporating customer feedback.
They planned to pilot and revise the process. Their first review for Land
Mobile showed that 91 percent of the participants in a sales management
training course were applying the coaching skills taught in class back in the
field. This review showed an effective training investment of $195,000. On the
basis of this pilot's results, some minor modifications were made to the
planning phase of the STAR process. While this was good news, it got even
better.
The results of the second review, a Semiconductor Product Sector

management class, revealed that a 16-hour course could be effectively reduced
to 6 hours with no negative impact on skills application. This resulted in a 62
percent cycle time reduction and a quarter of a million dollars savings to the
Semiconductor Product Sector.
In a third review, Motorola Codex asked to use the STAR process on a

course they were considering purchasing. The process identified a disconnect
between training content, stakeholder requirements, and the local reward and
recognition program. As a result, Motorola Codex did not purchase the course,
avoiding an ineffective investment of $360,000.
Combined, STAR pilots demonstrated an effective investment of $195,000

and cost savings of $610,000, and this was only three reviews. Since then, more
than 100 STARs have been conducted with results varying from demonstrating
successful transfer of skills to completely eliminating courses from the list of
programs offered.
The remaining team goal was to transfer the process to businesses by the end
of the first quarter of 1994. They achieved this goal four months earlier than
expected. Individuals from business operations have taken ownership of this
process and are conducting studies of their own. The process measurement tool
indicates they have achieved 100 percent Total Customer Satisfaction while
ensuring quality studies.
Favorable comments from a general manager of Semiconductor Product

Sector and a senior training officer from a large U.S. company provided
additional evidence that the TCS team had met their goals. What the team did
not expect were these ancillary effects:
1. They are shifting the emphasis from merely getting 40 hours of training (a
Motorola corporate policy) to changing performance back on the job.
2. The STAR process has been benchmarked as best-in-class by companies
outside Motorola.
3. By expanding the STAR network, they are improving cooperation between
business and training.
4. Team members were cross-trained in the STAR process, instructional
design, statistics, problem-solving tools, and key Motorola business issues.
5. The team moved from participative management to empowerment by
effectively implementing the Motorola Total Customer Satisfaction
process.
Institutionalization
When the team agreed to solve this problem, they accepted the responsibility of
making sure improvements were sustainable and permanent. The STAR
network supports and further expands this process throughout businesses.
Sixty-two business locations worldwide have joined the STAR network. In

addition, ongoing communications with technical updates and quality control
reports are exchanged regularly.
The team defined a feedback loop, illustrated in Figure 15-6, that helps
continuously improve the STAR process and support tools.
Ongoing
Communication
Quality
Technical
Control
Updates
Reports
Figure 15-6. Feedback loop
Process fan-out has gained the attention of others both inside and out of
Motorola. A Semiconductor Product Sector wellness manager is using the
process to show application of desired wellness behaviors, and Motorola
University is using the tool in projects dealing with school district reform.
So what does all this mean to Motorola? The new STAR process 1) assists
businesses in making sound training investment decisions, 2) helps the training
community ensure their courses are working, 3) helps operations identify

barriers that are preventing skills from being applied, and 4) helps Motorola be
confident that the annual 4.4 million hours and 200 million dollars invested in
its people are time and money well spent.
The team believes STAR also helps Motorola achieve its corporate vision of
becoming the premiere employer in the world by affecting their greatest
competitive advantage, their people.
Conclusion
This narrative describes how a team, sponsored and driven by business units,
successfully institutionalized evaluation within the business. The team built
upon and leveraged the quality values and tools that are an integral part of
Motorola's culture. The concept is simple:
• Have your customers (business) own their problem and sponsor efforts to
solve it.
• Form a team of training and business professionals.
• Follow a rigorous process that focuses on customer (business unit)
requirements.
• Collect data to determine the real problem, not the perceived one.
• Develop a solution that will solve the problem and satisfy the customer
(business).
• Pilot the solution and champion it by giving presentations that speak the
language of the business, not training or evaluation terminology.
• Ensure that a permanent owner is assigned to monitor and implement the
solution.
There is nothing magical or special about Motorola's success; it's just hard
work and partnering with the customer.
16 CERTIFICATION: BUSINESS
AND LEGAL ISSUES
Ernest S. Kahane and William McCook
Certification: A Growth Trend in Business
In a recent study of education and training in the information technology

industry, vendor-driven technical certification was described as a "strong and
growing" segment (Dataquest, 1995, p. 85). More than half of the resellers
surveyed required certifications for all their employees. Twenty-five percent of
these resellers were willing to spend $5,000 or more a year on individual
training activities leading to certification. Twenty-five percent of the corporate
customers were also willing to spend up to $2,499 a year for individual
certifications. Each group was also willing to invest additional funds in
certification testing.
In Utah, the home of Novell Education, with more than 320,000 Novell
certifications awarded worldwide, a second statewide certification month was
declared by the governor in 1996. Novell plans to bring its corporate
certification programs into secondary schools. IBM currently has programs for
more than 35 different software professional certifications. As one writer
comments, "Certification fever has hit the computer industry. And that means
there's a whole lot of training going on" (Filipczak, 1995, p. 38).
The companies that participated in the Dataquest survey generally believed

that their investments in certification programs had yielded positive returns.
338 Certification: Business and Legal Issues
Improving quality, increasing productivity, and ensuring qualified staff were

their most important reasons for investing in technical certification for such
positions as network engineer. For individuals pursuing certification,
professional advancement was listed as key.
Although most respondents believed that a certified individual provided

better customer service, they also stated that certification was no substitute for
experience. However, when asked whether certification testing provides a "fair
and accurate reflection of employee skills" (Dataquest, 1995, p. 90), more than
half the respondents answered yes. The report concluded that, while in reseller
organizations there is endorsement of certification as a requirement for doing
business, for corporate customers questions about the value of certification still
existed.
An increasing number of specialized companies are entering the

certification marketplace. Dataquest listed 28 companies providing training, of
which 22 also provided certification services. A new entrant into this market is
the Chauncey Group International, Ltd., a for-profit, wholly owned subsidiary
of the Educational Testing Service. The Chauncey Group was launched in
January 1996 to "... assist individuals and organizations with career
advancement and certification of skills and achievement" (The Chauncey
Group International Ltd., 1996a, [On-line]). Building upon the expertise of its
parent organization, which delivers over 9 million tests per year worldwide, the
Chauncey Group overtly advertises the rigor of its evaluation and validation
processes.
Corporate Human Resource Managers must be sure that tests they

use are legally defensible when used to screen candidates ...
These thorough multi phase procedures provide strong evidence of
the assessment's content validity (The Chauncey Group
International Ltd., 1996b, [On-line]).
Certification in a business context is not limited to computer specializations.

One can become certified in areas such as automobile mechanics, interior
design, consulting, project management and in roles such as a swimming pool
service technician or as a business intermediary.
This chapter will work through the key issues required to understand
certification within a business context and from an evaluation perspective that
highlights testing and assessment issues. Training and evaluation specialists
have an important role to playas consultants to businesses because certification
involves complex evaluation issues and business risk. Specifically, evaluation
specialists can help their business partners understand what certification is,
when it is appropriate to meet a business need, and how to successfully design
and implement certification programs.
First, this chapter examines why certification is important in industry and

how it relates to the changing employment contract between companies and
employees. Second, the term, certification is defined and its complementary
relationship to terms such as accreditation and licensing is explained. As a way
to demonstrate how the term certification can refer to different processes and
outcomes, vendor certification in industry is compared to the professional
certification of doctors. Why, for example, does malpractice apply to certified
doctors but not presently to certified network engineers? This chapter will show
that one must understand the underlying social and business drivers of different
forms of certification to make sense of the issues, importance, benefits, and
costs of certification in industry.
Third, specific measurement and evaluation issues associated with

certification programs are detailed and a matrix is presented that compares
technical considerations, legal risks, and costs of two types of certification to
facilitate understanding about their evaluation implications. The steps in
designing a certification program that is professionally and legally sound are
listed. Finally, criteria to help determine whether certification is the best
strategy to meet a business goal are presented.
What Industry Trends Are Driving Certification?
All companies face the same challenges: developing a competitive workforce,

ensuring safety and quality in their work processes, identifying individuals with
the skills and knowledge to perform specialized services. Certification helps
companies meet these needs by providing a process for assessing and
recognizing competence.
Certifications supply information about individuals and their skills that

would otherwise be difficult, expensive, and time consuming for employers to
test and assess directly. By defining levels of competence, certification helps
employers within and among companies screen and manage their human
resource requirements. As one marketing manager states, "Certification
programs have become the quality standard of the 1990s" (Standler, 1996).
Several trends are increasing the pervasiveness of certification in industry.

The old employment contract that typically involved a loyal, stable, full-time
workforce has changed to a new relationship that emphasizes a company
commitment to skill development and employability. Gone is the guarantee of
long-term employment (Waterman, Waterman, & Collard, 1994). Traditional
career ladders are being replaced by employee movement across projects and
companies. Labor arrangements have become more fluid and open, with new
and specialized experts likely to enter the company on a contract rather than
full-time (Harrison, 1994).
Certifications are becoming increasingly important in a marketplace where

competence is required but relationships are open, shorter term, and
interorganizational. For employees who take charge of their career
development, certifications serve as portable credentials that can travel within
and across companies. Employers, primarily in technical areas, can attract and
retain key employees by advertising for specific certifications. Certification also
serves as a competitive differentiator. One company vice president recently
boasted that his company had more individuals who were Microsoft-certified
engineers than Microsoft itself.
What Is Certification?
Certification is a formal process for assessing and recognizing competence. The

assessment and recognition may be provided by an independent profeSSional
board, government agency, corporation, professional organization, or other
authorizing agent. Certification is often used synonymously with other terms
like licensure and accreditation. There are, however, important distinctions
among these terms. Table 16-1 provides summary descriptions that highlight
these differences.
The Professional Model
To understand certification in industry, it is useful to compare it to professional

certification and licensing. In well-established professions such as medicine
(Freidson, 1970), extensive professional autonomy is granted based on the
special skills and trustworthiness of its practitioners.
Table 16-1. Types of certification in industry
Type Definition
Individual certification The process of assessing and recognizing
individuals who demonstrate a predefined level of
proficiency.
Professional certification Individual certification granted by a profession,

such as accounting, insurance underwriting, and
medicine.
Vendor certification fudividual certification, usually product or service

specific, granted by a company that develops and
sells a product or service. Novell's Certified
Network Engineer is an example of vendor-specific
certification offered by the computer industry.
Licensure A certification granted by a government agency to

practice a skill or profession. The license may be
referred to as a registration, such as Registered
Nurse. Licensing is typically required for
independent practice in a profession.
Accreditation An organizational certification recognizing that an

organization or program has met the predefined
standards of the accrediting agency. An example is
IS09000 certification by the International Standards
Organization.
For example, medical boards and associations set standards and determine
who can legitimately practice. In return for granting this professional sanction,
society requires the practitioner to serve the public expertly. "Professionals are
supposed to represent the height of trustworthiness with respect to technically
competent performance and fiduciary obligation and responsibility" (Barber
1983, p. 131). Malpractice suits and public criticism of the professions only
make explicit how fundamental these criteria are. The rationale for certifying
competence is to protect the public.
There are attempts in industry to model certification on the established

professions. The voluntary certification of information systems workers is one
example (Institute for Certification of Computing Professionals, 1996).
According to the Institute for Certification of Computing Professionals (ICCP),
approximately 50,000 people have passed the ICCP exams. The ICCP is
involved in supporting legislation that would protect consumers by
distinguishing certified information systems professionals from "amateurs."
These attempts to emulate professions such as law and medicine have been
successful in setting some common standards in the areas of information
systems, project management, and management accounting. Currently
however, these certifications serve as endorsements of individuals who meet
certain standards rather than as a requirement for practice. The certifying
associations do not yet have the standing to regulate their professions as do
medicine and law.
As well as some individual certifications in fields like training, and

certificates recognizing accomplishment, other types of certification programs
in industry involve compliance with guidelines and laws such as federal
regulations for environmental protection. The business drivers for these
programs include protecting employees and the public as well as ensuring
appropriate competency levels.
Vendor Certification
The fastest growing segment of the certification market is vendor-driven

certification programs such as Novell's Certified Network Engineer. The
vendor certification process assesses and recognizes competence in cases where
standards and knowledge are company and product specific. Again, as with
aspiring professional associations, certification does not grant a license to
practice. Rather, it endorses a level of competence that customers hopefully
seek. However, unlike professional certifications, the rationale for vendor
certification is not protection of the public.
For companies developing vendor certifications programs, the ultimate

rationale is to gain marketing and business advantage. The potentially
improved service quality levels and improved career advancement for
individuals that result from vendor certification occur indirectly through
competitive market mechanisms. As one president of a training company that
specializes in certification states, " ... We've made it [certification] a profit
center" (Filipczak, 1995, p. 42). The value of the certification will be decided
by the market. Are people willing to pay for it?
We can now understand why in the Dataquest report previously cited,

vendor-driven certifications raise issues about the level of competence being
certified and thus issues about the payoff of the certification. One industry
analyst, discussing vendor-driven certification, states,
Though much admired, too much certification can be unhealthy.

The number of 'paper' [certifications] ... is disturbing to those
who have sought and paid for [them] ... only to find out that the
individuals they hired turned out to be more proficient at taking
exams than connecting networks (T.C. Doyle, 1996, p. 128).
Confusion can occur when vendor-driven certifications are mistakenly

considered to have the same driver as professional certification: protection of
the public. To claim malpractice against a vendor-certified practitioner is to
misunderstand the nature of this certification. From a vendor's perspective,
while increased competence requirements might make certification more
attractive to customers, it may also add too many user costs and requirements.
Striking the right balance between competence requirements and other criteria,
such as price and time invested, are challenging design issues for companies
and evaluation specialists building certification programs aimed at market
success. Business strategy and market analysis will determine where the
competence bar will be set: at a level of basic knowledge, competence, or expert
capability (Jones, 1994).
Measurement and Evaluation Issues
Successful implementation of certification as a business strategy requires

resources with expertise in several areas. The areas include subject matter
experts with in-depth knowledge of the content and skills to be certified, legal
resources to assure that applicable laws and regulations are followed, business
resources to assure that the certification program provides a market advantage
and is economically feasible, and evaluation resources to provide specialized
expertise in test design and implementation, as well as the integration of the
business, technical, and legal requirements.
This section focuses on key measurement and evaluation requirements and

issues for evaluators involved in consulting about, assessing, or designing
certification programs. Specifically, the section examines the measurement
concepts, standards, and procedures required to develop and implement
certification tests, and legal issues that arise from implementation of a
certification strategy.
Measurement Concepts: Validity and Reliability
Certification programs almost always involve some form of testing. The

certification is granted in whole or in part based on meeting a passing or cutoff
test score. The precision and accuracy of the test score must align with the risks
associated with its intended use. Obviously, the test design and validation
procedure certifying novice airline pilots to fly differ from those of a self-
assessment test developed for individual career planning. The higher the stakes
and consequences, or the risk of certifying based on a test score, the more
critical it is for that test to accurately and consistently measure knowledge or
performance.
Fundamental to understanding the benefits and risks associated with

certification testing are the measurement concepts of reliability and validity.
Reliability refers to the consistency of the measurement scores and processes.
Validity refers to how accurately a test measures what it intends to measure as
well as the accuracy of test score interpretation. Types of validation common to
certification testing are content, predictive, criterion, and construct validity.
Content validity is the most commonly used approach to assuring accurate test
scores. The Standards for Educational and Psychological Testing states
Content validation should be based on a thorough and explicit

definition of the content domain of interest. For job selection,
classification, and promotion, the characterization of the domain
should be based on job analysis (American Psychological

Association, 1985, p. 60).
Evaluation specialists need to be well versed in describing and establishing

content validity, and understand the processes for establishing criterion related,
predictive, and construct validity. Summary descriptions of these types of
validity are provided in Table 16-2.
Relationship of Validity to Business Use
The business use of certification testing determines how the standards discussed
above are applied. Two possible business uses of certification are selection and
development/classification. Selection involves using the results of certification
testing as the primary basis for making hiring, promotion, or termination
decisions. Certification testing used for development or classification purposes
identify skill levels that are then used to guide skill classification and individual
development discussions. Table 16-3 describes the validation requirements,
risk, and cost for these two business uses of certification.
Clearly, business uses involving selection carry a higher risk and require a
higher investment to ensure a proper validation study. When legal issues
related to equality of opportunity are involved, an organization must present
evidence that the test directly measures the tasks or skills required of the job.
For many knowledge tests where the measurement is indirect, additional
evidence of validity, such as predictive validation, may also be required. When
however, the principal purpose of certification is individual development, the
rigor and requirements for establishing validity are less stringent.
Standards
To optimize corporate investments in certification strategies and to assure legal

compliance, evaluation specialists need to be familiar with standards that guide
the use of certification tests. This section presents an overview of basic
professional, legal, and business standards.
Table 16-2. Reliability and types of validity
Type Definition
Reliability Reliability refers to consistency in scores and processes. High
reliability assures that test scores are not influenced by factors
unrelated to the purpose of the test, e.g., practice taking tests,
skills unrelated to the content of the test.
Content Content validity measures how well a test samples the universe
Validity of behavior, knowledge, or skill under certification. For a test to
be valid, its design and delivery process must ensure that all
knowledge, skill, or performance areas key to certification are
adequately represented in the measurement device. Content
validity is the most common validation approach for certification
testing.
Predictive Predictive validity refers to how well the test score predicts
Validity future performance in the area certified. Although less common
than content validity because it requires more time and effort,
demonstrated predictive validity is required for certification tests
used for job selection where the test may not directly measure job
skills.
Criterion Criterion validity is the statistical demonstration of a relationship

Validity between the test taker's scores and a relevant measurement, such
as successful work performance. When the criterion is future
performance, criterion validity and predictive validity are the
same.
Construct Construct validity refers to how well the test measures

Validity characteristics or psychological traits related to successful
performance. Typically, this is the least used validity measure for
certification tests because the focus in certification testing is on
assessing required knowledge or skills.
Table 16-3. Validation requirements related to business use
Business Use
Selection Development/Classification
Validation Content, construct, Content and/or construct
Requirements and/or criterion validation.
validation.
Rigor, precision, and Rigor, precision, and accuracy

accuracy based on of test development based on
professional and legal professional standards.
standards.
CosURisk High cost. LowlModerate risk.

High risk. Cost and risk correlated with
capability level to be tested.
Professional Standards
Two of the principal sets of professional guidelines for the development and use
of testing are The Standards for Educational and Psychological Testing
(American Psychological Association, 1985) and Principles for the Validation
and Use of Personnel Selection Procedures (Society for Industrial and
Organizational Psychology, 1987). The standards provide technical guidelines
for the development and use of tests, including certification testing. Although
the standards are more widely known, the principles are equally important
because they reinforce the standards and explain in detail how to determine
passing scores for certification tests.
Legal Considerations
The most important U.S. federal law protecting employees against

discrimination is Title VII of the 1964 Equal Employment Opportunity Act. It
is enforced by the Equal Employment Opportunity Commission (EEOC). The
EEOC's Uniform Guidelines on Employee Selection (Equal Employment
Opportunity Commission, 1978) describe the requirements when tests,
including certification tests, are used as an employment selection device. The
Guidelines present validation, record keeping requirements, and discuss issues
related to adverse impact on groups the federal government has designated as

protected groups.
Adverse impact occurs when test results produce a pattern of selection that
does not represent the proportions of protected groups in the population, such
as specific minority groups and females. According to EEOC guidelines, if the
testing tool cannot be shown to be a direct measure of a job task or skill, a
validation study that includes protected groups would be performed to ensure
that adverse impact has not occurred. Members of the protected groups should
include both knowledgeable individuals and novices for comparisons purposes.
The study should demonstrate that the selection rates generated by use of the
testing tool mirror the general population and protected groups population
rates.
The evaluator's role is to assist the business in knowing when to apply the
Uniform Guidelines to certification programs. The key criterion for the
evaluation specialist is to know if certification will be applied as a selection
procedure. A selection procedure occurs when a test score is used as a basis for
employment decisions. The guidelines define any of the following employment
decisions as selection procedures: "hiring, promotion, training, retention,
transfer, membership, compensation, termination, discipline or job assignment"
(Equal Employment Opportunity Commission, 1978, Sn. 2B). In a situation
where selection is anticipated, the evaluation specialist needs to ensure that a
documented validation study is in process (or in place) that demonstrates the
relationship between performance on the test and related job performance. The
study should explore methods of testing or certifying expertise that are less
susceptible to adverse impact.
Assessing potential adverse impact requires a consistent and resolute

attention to record keeping. Even if a third party is providing the procedure
(e.g., certification), the third party program must keep records of the success or
failure of groups entering and exiting the program. The evaluation specialist
should engage legal resources to ensure adherence to these requirements.
Adhering to professional and EEOC guidelines is not sufficient when

establishing a certification testing process in countries other than the United
States. Many European countries have laws regarding the collection, storage,
and distribution of personal data, which includes individual test scores. These
laws require that the test be registered with a proper authority. The registration
typically must include information such as the purposes and process of data
collection; and how and where the data will be stored, accessed, and used. In
certain European countries, restrictions exist regarding the transmission of
personal data across international borders. Some countries, for example
Germany, require prior approval by company worker councils before personal
data collection can proceed.
Business Policies
Some corporations are developing assessment policies to ensure professional

and legal compliance for certification testing. A corporate assessment policy
would list core standards that must be followed when implementing any
certification testing procedure. Such a standard should include topics such as
informed consent. Informed consent means that the individual to be tested must
be informed, in advance, of the purpose of the test, how the test results will be
used, where the test results will be available, and who will have access to the
results. The standards should describe validity and reliability requirements,
administration procedures for ensuring consistency of the test administration
process, and data storage and reporting requirements. Issues of security and
privacy are very important. Many companies have policies that address these
issues as well.
Developing a Certification Test
A core element of any certification effort is the measurement tool used to

determine the skill or knowledge associated with certification. As previously
discussed, the required validity and reliability of certification test scores are
determined by the business use. Evaluation specialists, in reviewing or
designing certification tests, must understand how the validation and reliability
requirements are established in the test design process.
StaIttlards for the development of certification tests have been adopted and
published by testing organizations, testing vendors, industries, and professions
that offer certification as part of the credentialing process. While descriptions
of stages vary slightly by organization, the test development process itself is
well established. The stages are summarized in Table 16-4 and then briefly
discussed. The development of the test blueprint and the determination of a
passing or cutoff score are highlighted in this section because of their

significance to evaluation specialists.
Table 16-4. Stages of certification test development
Stage Activity
1 Job analysis and Identify and describe the skills and knowledge
knowledge specifically related to successful job perfonnance in the
specification certification area.
2 Content domain Using the specifications from Stage 1, develop content
definition categories and learning objectives.
3 Test blueprint Content experts rank the importance of the objectives
definition and content to perfonningjob tasks; priorities are
reflected in a blueprint that identifies number of test
items per obiective, within content areas.
4 Item development Content and testing experts together develop test items
designed to measure blueprint objectives, with the
number and types of items based on the priorities
established in Stage 3.
5 Item review and Content and test development experts review and revise
revision the test items.
6 Pilot test Test is administered to audience comprising content
qualified and less-qualified participants. Based on
results, questions are selected for exam. Questions are
selected to ensure reliability.
7 Detennine passing Evaluation consultants detennine a passing or cutoff
score score, based on results of qualitative review and/or
quantitative results.
8 Administration Test is deployed and results are analyzed on a regular
and maintenance basis to improve and update item pool.
Job Analysis and Knowledge Specification
The first step is to identify and describe the skills and knowledge specifically
related to successful job perfonnance in the certification area. Using
salespeople as an example, the task analysis and knowledge specification might
identify the three key areas of responsibility, work, and skills and knowledge as
described in Table 16-5.
Content Domain Definition
Using the specifications from the task analysis and knowledge specification,
content categories and learning objectives are developed. Content categories
refer to the areas of knowledge that must be certified. Learning objectives are
the level of knowledge needed for certification. Objectives describe in
measurable terms how individuals can demonstrate their knowledge or skill.
The domain definition is a refinement of the first stage of test development. For
example, the objectives for salespeople might include their ability to
1. Successfully negotiate contracts and business agreements within a set of

business standards
2. Accurately describe the strategies, products and business models of
potential channels for a technology
3. Evaluate the potential of channel partners to meet specific revenue targets
4. Effectively maintain relationships with channels partners
Table 16-5. Sample task analysis and knowledge specification
Responsibility Work Skills and Knowiedxe

Lead the development Qualify and 1. Negotiate contracts and business
of account channels develop low-cost, agreements.
(third-party strategy) profitable channels 2. Describe the strategies,
products, and business models
of potential channels.
3. Evaluate ROI of potential
channels partners.
4. Manage relationships among a
variety of partners.
Creating a Test Blueprint
The creation of a test blueprint is a critical element in the design, and

inevitable redesign, of certification tests. It is during this stage that the help of
an evaluation consultant is invaluable. Since in most cases it is not possible to
measure an entire content domain, the blueprint process provides a vehicle to
identify and prioritize the most important content and objectives. The blueprint
establishes content priorities: how much focus is put on a particular content

area and objective. The focus is translated into the number of test questions
asked for a content area and objective. Finally, it provides a tool for updating
the test whenever content changes. Because of frequent developments in
products and technology, content changes in certification areas are quite
common.
Table 16-6 is an example of a test blueprint. Each number in the cells

represents a potential test question. According to this sample blueprint, test
content priorities revolve around objective 4 and content area B.
Resources and references to test design and test blueprinting (O'Brien,

1988) are available from industry, academic, and professional testing
organizations, as well as the World Wide Web. In addition, many of the major
vendors, such as Microsoft and Novell, who offer certifications on their
products, also sponsor certification test development training.
Item Development, Review, and Revision
Content and testing experts together develop test questions. Test items and
tasks are designed to measure the blueprint objectives, with the number and
type of items based on the priorities established in the test blueprint. For
example, test items for Objective 2 may include not only multiple-choice
questions, but also performance tasks such as a presentation on the strategies,
products, and business models of potential channels.
Piloting the Test
The test should be pilot tested with groups of individuals who are
knowledgeable and who are not knowledgeable about the test content. Pilot data
should be analyzed at the group and item level. Item statistics should
demonstrate high difficulty for less knowledgeable individuals and low to
moderate difficulty for knowledgeable individuals. Items should demonstrate
the ability to discriminate between the two groups. Items that do not
discriminate or do not demonstrate appropriate difficulty levels should be
reviewed and considered for revision or deletion from the test.
Table 16-6. Sample test blueprint
Content Content Content Content

Area A AreaB AreaC AreaD
Objective 1: 1,2 25,26
Negotiate contracts and
business agreements.
Objective 2: 4,5,8,
Describe the strategies, 10,11
products, and business
models of potential
channels.
Objective 3: 3,6,7,9 27,28 29,30 23,24

Evaluate ROIof
potential channels
partners.
Objective 4: 12,13,14, 16,17, 20,21,22

Manage relationships 15,22,31 18, 19
among a variety of
partners.
At the group level, average or mean scores between the two groups must
differ. The midpoint of the average difference between the two groups for the
remaining items can be used as one data point in establishing an approximate
passing score.
Establishing a Passing Score
The use of cutoff or passing scores in certification tests is a well-established

practice. In fact, the use of examinations with pass/fail lines to determine
qualifications has been recognized by the U.S. Supreme Court (Cascio,
Alexander, & Barrette, 1988, p. 2). Since individuals can be granted or denied
opportunities based on a test score, evaluation specialists need to understand
relevant professional and legal standards that apply to setting and using passing
scores.
Passing scores are often based on a decision made by an administrative

body, such as a licensing board, professional standards board, or Civil Service
Commission. The general principle accepted by the courts is that the
administrative body has been given the legal authority to give tests and to set
passing scores (Cascio, Alexander, & Barrette, 1988, p. 3).
Another theme that is commonly found in case law has to do with the
reasonableness of the passing score. The administrative body that sets the
passing score must show that there is a relationship between the passing (or
cutoff) score and the purpose of the examination (Cascio et al., 1988, p. 3).
This concept of reasonableness is supported by the Uniform Guidelines on
Employee Selection Procedures published by the U.S. government's Equal
Employment Opportunity Commission (1978). The uniform guidelines state
that "Cutoff scores should normally be set so as to be reasonable and consistent
with normal expectations of acceptable proficiency within the workforce"
(Equal Employment Opportunity Commission, 1978, Sn. 5H).
Cascio, Alexander, and Barrette (1988, pp. 8-9) highlight a number of other
criteria used by the courts in determining what makes a good cutoff (or passing)
score. As related to certification tests, these include the following criteria:
• The cutoff score should be consistent with the results of a job analysis.
• The passing score should permit the selection of qualified candidates.
• Expert opinion/recommendation should be considered in setting the score.
• The passing score must have a clear rationale.
The case law (Cascio, Alexander, & Barrette, 1988, p. 8) on setting passing
scores suggests that there is "no single, mechanical, quantitative approach that
is accepted or required by the courts." However, a typical quantitative approach
is to test two groups, one made up of individuals expected to possess the
required skills and the other of novices. The midpoint between the test score
averages of each group provides an approximate passing score.
A commonly used qualitative method, sometimes referred to as the "Angoff

method," involves having content experts evaluate each test question. They
assess the probability that a barely qualified person could answer the item
correctly. The cutoff score is set based on the expected number of correct
answers for the barely qualified test taker (Angoff, 1971). The recommended
method is to use both quantitative and qualitative approaches in setting cutoff

scores for certification tests.
Certification: Is it the Best Approach to Meet a Business

Goal?
The growth of certification programs in industry has placed more focus on

testing and assessment issues, and emphasis on skills and competence.
However, certification as a proxy of competence, especially in the growth area
of vendor certification, can sometimes be perceived as a paper credential that
does not ensure the level of competence needed to perform the work.
New entrants in the certification training market like the Chauncey Group
are advertising higher levels of competence testing. One can foresee an ever
growing ladder of certifications that assess and recognize progressively higher
levels of competence.
For many business certifications it makes sense to ask the question, What is
the value of the certification? As customers become more informed about the
costs and benefits of certification, they will be better equipped to assess the
value of certification in specific contexts. Already, businesses and employees
are starting to recognize where certification programs make sense and where
they do not.
To decide whether certification programs add value, corporations must view

certification, with its associated costs and risks, as a strategy for achieving a
specific business goal. The following questions are critical to deciding whether
certification is the best business strategy for meeting specific goals:
• Is certification required for compliance with policy or law?

• Will certification provide a market differentiator or advantage?
• Will customers pay for certification? Is it economically feasible?
• Is the program professionally acceptable and legally defensible?
• Is the program administratively feasible?
• Is the certification publicly or organizationally credible?
Evaluators have an important role to play in helping businesses and

consumers understand the risks and assessment implications of certification
programs. Certification will continue to play an important role in society and
businesses. Evaluation specialists can help to ensure that certification plays an
appropriate role.
References
American Psychological Association. (1985). Standards for educational and

psychological testing. Washington, DC: Author.
Angoff, W.H. (1971). Scales, norms and equivalent scores. fu R. L. Thorndike (Ed.),
Educational measurement. Washington, DC: American Council on Education.
Barber, B. (1983). The logic and limits of trust. (USA): Rutgers University Press.
Cascio, W.F., Alexander, R.A., & Barrett, G.V. (1988). Setting cutoff scores: Legal,
psychometric, and professional issues and guidelines. Personnel Psychology. 4 (1),
1-24
Dataquest. (1995). Education and training: Market analysis and outlook. Author.
Doyle, T.e. (1996, July). Who you gonna call: An inside look at the topsy-turvy world of
training and certification. VAR Business, 128.
Equal Employment Opportunity Commission. (1978). Uniform guidelines on employee

selection procedures. Washington, DC: Author.
The Chauncey Group futernational Ltd. (1996a). [On-line]. Available:

http://www.chauncey.comlchncy.htrn
The Chauncey Group futernational Ltd. (1996b). [On-line]. Available:

http://www.chauncey.comlaeq007.htrn
Filipczak, B. (1995). Certifiable. Training.32 (8),38-42.
Freidson, E. (1970). Profession of medicine: A study of the sociology of applied

knowledge. Chicago: University of Chicago Press.
Institute for Certification of Computing Professionals. (1996). [On-Line]. Available:

http://www.iccp.org/
Harrison, B. (1994). Lean and mean: The changing landscape of corporate power in the
age of flexibility. New York: Basic Books.
Jones, P. (1994). Three levels of certification testing. Performance & Instruction. 33 (9),
22-28.
O'Brien N. (1988). Test construction: A bibliography of selected resources. New York:

Greenwood Publishing Group.
Society for Industrial and Organizational Psychology, Inc. (1987). Principles for the
validation and use of personnel selection procedures ( 3td ed.). College Park, MD:
Author.
Standler, B. (1996). Certification programs: earning respect. [On-Line]. Available:

http://www.servicenews.comlI99619604tlbx.htm
Waterman, R.H., Waterman, 1., & Collard, B.A. (1994). Toward a career resilient
workforce. Harvard Business Review. 72 (4), 87-95.
About the Authors
Dr. Kahane is currently the Worldwide Finance Program Training Manager at

Digital Equipment Corporation. He has been involved in certification program
consulting for the past five years and was the Certification Program Manager
for Digital's Sales Training and Marketing organization. He has managed the
development of a corporate assessment standard and certification guidelines. As
an adjunct professor at Boston University, he taught in the areas of adult
learning and training theory. Dr. Kahane received his Ph.D. from the
Department of Educational Policy Studies at the University of Illinois.
William McCook recently joined Fidelity Investments in the role of Director of

Competency, Assessment and Measurement for the Fidelity Institutional
Retirement Services Company's Service Delivery University. Prior to his role at
Fidelity, he held various training measurement and program evaluation
consulting positions over a nine-year period at Digital Equipment Corporation.
In his latest position at Digital, he managed an internal consulting group that
focused on competency research and applications, program evaluation, and

certification projects. His experience in the evaluation and measurement field
also includes consulting for school systems, government agencies, law firms,
and professional schools, as well as university teaching and academic research
assignments. Dr. McCook received his Ph.D. in Evaluation and Measurement
from the University of Connecticut.
17 LESSONS FROM EDUCATION Jean Moon
I have now been a psychologist for 21 years, and one thing of

which I am certain is that I have never--not even once----had
to do in the profession what I needed to do to get an A in the
introductory course, as well as in some of the other courses.
Robert Sternberg (1996)
Over the past decade education, particularly elementary education, has been
redefining the work of teachers, schools, students, and classrooms. This
redefinition has been fueled by a confluence of events involving new
understandings about learning and the importance of connecting the work of
schools with everyday life and with the work of business and industry. A set of
assumptions has emerged from this amalgam of political, social, economic, and
educational events about the nature of learning and how to evaluate student
learning; these assumptions emerged in a somewhat piecemeal fashion, and
since the mid-1980s have helped direct present reforms in education. It is
important to examine past assumptions about school-based learning as well as
present ones that are now quite influential in shaping the ways in which
schools are evaluating students' performance.
For purposes of this chapter, I will identify a representational set of

assumptions that influence student evaluation practices in today's classrooms,
kindergarten through grade 12. I have selected the four assumptions that appear
below not because they represent every side of the discussion about evaluation,
but because I believe they mirror the disposition of many teachers, parents,
administrators, and policymakers about the relationship between learning and
assessment.
360 Lessons from Education
• Learning must be the center of schools and schooling.

• Learning is not so much a reflection of what students know, but what
they can do with what they know.
• A common understanding or set of standards is needed, defining the
knowledge and abilities students should have acquired before they
graduate from high school.
A fourth assumption has a significant relationship to the first three:
• Traditional testing and evaluation strategies are no longer sufficient to
document student learning when learning is understood to include not
only what students know (information) but also how they put
knowledge and skills to use.
This chapter will examine the historical roots of these four assumptions and
how they have set the stage for important lessons that can be shared with any
organizations interested in helping to assess and shape human potential and
performance.
Pictures of Teaching, Learning, and Testing-Where We

Have Been
Central to how many schools and school districts are rethinking what happens
in classrooms is the recognition that learning, not just teaching or testing, is at
the heart of the work of schools. This seems like an obvious statement. Hasn't
learning always been what schools are about? No, surprisingly it is not. In fact,
a persuasive argument could be made that from the early 1900s until quite
recently schools have not put learning at the center of a child's school
experience, but rather standardization of what was taught.
Early on in my career in education I remember hearing a description of

public schools that went something like, "At any time during a school day you
can glance at a clock and have a very good idea of what is happening in any
elementary classroom in the United States." This statement was made with no
apologies that I could discern; indeed, there was a certain smugness in thinking
that at the heart of education was the necessity for efficiency and control;
curricular resources, pedagogical approaches, and testing practices were

efficiently standardized so that what was being taught was tightly controlled.
It was believed that such standardization best prepared children for

employment in rote-oriented jobs, where procedures and processes were
predictable, routinized. As a consequence, learning became identified with the
processes of memorization of facts and general subject-specific concepts and
with the idea that the teacher was the source of information, the students the
recipients. There was little interest or need to disentangle learning from
teaching. In other words, what you learned should mirror what was taught; nor
was there a societal need to do so, since the schools were in fact preparing
students for factory-model jobs and large, multilayered yet centralized
corporations. In this model of learning, students were fairly passive participants
in the educational process, following the step-by-step instructions of their
teachers and textbooks. For years---<lecades really-what a child was taught at
a particular grade level was determined in large part by what were termed
behavioral objectives common to grade levels and to most teachers' lesson
plans.
Behavioral objectives are statements about the discrete skills a child should
be expected to learn within a specified span of time at a particular grade level.
A behavioral objective in elementary mathematics might state that a child
should be able to count forward and backward between one and one hundred by
the end of first grade or know the multiplication tables through the nines by the
middle of fourth grade. These objectives were generally put together not by
teachers but by textbook publishers in consultation with professors of education.
They provided teachers and administrators with a detailed set of benchmarks or
goals in every subject of what was to be learned when by the average child.
As a result what was taught in schools was predictable, as most teachers

tended to follow these behavioral objectives in their everyday lessons as a
matter of routine and in concert with what had been learned in teacher-training
programs. Teaching to these objectives was reinforced by commercial textbook
providers who sequenced the lessons around the attainment of the objectives. In
this model of education, employed between the early 1900s and the early 1980s,
teaching goals were about completing all the lessons in a textbook during a
school year, not necessarily going deeply into anyone topic. Coverage of a
curriculum, not student exploration, was the mantra of administrators and
policymakers. Teachers had few options but to cover what was mandated.
Whether intended or not, lessons constructed around behavioral objectives

create a fragmented learning experience for a child; when facts are emphasized
the conceptual connections across subjects are neglected. For example, the
connections between mathematics and science, while readily experienced in
day-to-day living and working, will frequently not be addressed in a fact-
oriented curriculum-though every day we do encounter the linkages between
mathematics and science when we prepare dinner, discuss the latest weather
patterns, make judgments on home repairs. or work in a variety of occupations,
from agriculture and aviation through waste management. However, when
teaching emphasizes things rather than ideas, answers rather than good
questions, the task of connecting definitions, vocabulary, or procedures with the
larger ideas to which they relate becomes formidable, if not impossible. unless
the interconnections have been suggested. modeled, and practiced.
How did we verify children were learning their general fund of knowledge
set out in behavioral objectives? Testing mastery of these benchmarks was a
fairly straightforward task because of the fact-oriented nature of classroom
lessons-knowing dates. places, rules of grammar, arithmetic algorithms,
routine procedural skills, definitions, and vocabulary. Generally. testing
required no more than a standardized paper and pencil format, often provided
as part of the textbook in use, where students could fill in or circle the correct
answers, or mark a statement true or false.
From the early 1900s until the late 1980s standardized test scores were
commonly the evidence upon which program and student success was
determined. If students scored well on a particular test. one could assume they
had learned the subject matter at hand. For those who did not score well.
judgments were easily made that they had not studied or learned the material.
were not good students. were underachievers. or did not like school. Along the
way schools, school districts, and to some degree teachers became increasingly
accountable to public-policy makers and education officials on the basis of their
students' test score results.
Between 1972 and 1985. the number of mandated state testing programs
grew from 1 to 34; by 1989 every state had some type of mandated testing
program (National Summit on Mathematics Assessment. 1991 p. 5). Public-
policy makers, school officials. and parents insisted on using test results to
judge the merits of an educational experience. A score-based system of
evaluation provided an efficient way to rank and sort students and schools, as
well as to define school success for students, parents, teachers, administrators,
public-policy makers, and, interestingly enough, real-estate agents. This came
to be increasingly the case despite mounting evidence as to the prejudicial
nature of standardized testing instruments with respect to minority children and
the rash of grade inflation, the so-called Lake Wobegon effect, which suggests
that "all" children are above average.! The role of standardized testing in
defining learning and learning potential has been a contentious subject
throughout history of public education. Reducing a child's educational
experience and his or her potential for future educational experience to a single
point value or percentile, while reassuring to some, has been horrifying to
others.
Vito Perrone, Dean of Harvard's Graduate School of Education, wrote in

1991:
In many schools, teaching to the test has become a

significant part of the curriculum. And though the test
facsimiles and tricks that such a process comprises may raise
test scores, they are hardly the grist for an empowering
education. Rising test scores are no longer matters for public
celebration .... Test scores in New York City, for instance,
have been rising for a decade, and averages are now above
national norms. But the popular view is that the city's schools
are in a state of collapse, offering students too little
substantial education. In fact, almost every school district in
every state now reports above-average scores on most testing
programs, but people generally believe that schools have
made little significant, large-scale improvement (p. vii).
1 In 1988 John Cannell, a family physician in Beaver, West Virginia,

announced that he had surveyed state education departments to find that nearly
all reported that their students were scoring above national norms. Followers of
humorist Garrison Keillor were reminded of his description of the mythical
town of Lake Wobegone where "all the children are above average." (See
Cannell, 1987, 1989.)
Likewise, Kathe Jervis (1991) comments on the long-term impact of testing

on children.
When tests determine what is taught, the curriculum

narrows. Children are given fewer opportunities to develop
their strengths and spend more time in drills for multiple-
choice exams. Occasions for children's initiative are reduced.
Children are encouraged to be more passive, more obedient to
authority, and less enthusiastic about asking and answering
open-ended questions. The "right answer" becomes the goal
and children are left practicing isolated test-taking skills (p.
4).
The Shift in the School Paradigm
In the mid-1980s, amid controversies over the role of tests and test scores,
increasing student diversity in America's urban classrooms, and a clamor to
increase the length of the school day and school year, a number of factors
converged, which sparked the forces of educational reform that we are
continuing to experience today. The flrst was the dismal performance of
American children in two subject areas, mathematics and science, thought to be
key to this country's economic competitiveness in an increasingly global
economy.
The second was that as test scores fell well below many of this country's
industrial competitors, including Japan and Germany, concerns began to be
raised about the national impact of a student's kindergarten through grade 12
or college experience. Even students who graduated from U.S. high schools and
colleges with acceptable, if not excellent, grade-point averages arrived at the
workplace or in graduate school programs lacking the abilities and skills to
solve problems, to reason through complex and diverse data, to perceive issues
from a variety of perspectives, and to communicate effectively. Business and
industry had to expend resources teaching what they assumed should already
have been taught-how to solve problems. One policy response forged between
business and the government to these shared concerns was former Secretary of
Education Lamar Alexander's national plan for education contained in Goals
2000.
Third, while reports were lamenting the quality of schools and the
experience of schooling, research activity among cognitive scientists produced
information that was to change the way we would think about how learning
occurred and would ultimately significantly impact the way schools went about
the business of teaching and testing. This research described the need for
children and adults to construct their own understanding of concepts and
relational ideas because information and understanding is stored in large
cognitive networks that develop continuously. At any given time we have
within our mental structures a weblike pattern formed from in-school and out-
of-school experiences. Each central point in this network of mental activity
represents our conceptual understanding of abstract ideas such as government,
arithmetic operations, temperature, color, and so on. Thus the process of
learning involves providing opportunities to fit new information into previous
ideas as our cognitive networks become increasingly sophisticated over time
through multiple experiences that are social and interactive in nature and that
involve elements of conflict, contradiction, and consequence.
This conceptual-network model of knowledge is very different from how we

used to think information was stored and retrieved. Previously it was assumed
that information was stored in a much more linear pattern, as if in a file
cabinet, and that understanding came from the accumulation in a single "file"
of discrete pieces of information about a larger topic (Moon & Schulman,
1995). The goal was to provide students with as many pieces as
possible-dates, names, vocabulary, and procedural skills-so that these could
be called up or pulled out of the file when necessary. The teaching and testing
model to support this storage-and-retrieval image of learning rested on the
assumption that knowledge is objective and can be drilled and pressed into
passive, somewhat blank-slatelike, students, to be called out on cue.
However, we now understand that memorization, while useful for the short
term, does not have long term "staying power" within our mental structures. In
addition, the passivity of students in the learning process does not instill habits
of inquiry needed for a lifetime. We have only to remember all of the hours
spent in foreign language classes to know the limitations of memorization as
proof of learning for the long haul. Likewise, rote and memorized learning
goals create a set of instructional practices focused on individual learning
rather than learning that requires the interactivity and social exchanges that are
more reflective of the reality of today' s work environments.
While the hue and cry over low test scores and insufficient preparation
around critical thinking were perhaps the most audible complaints in the mid-
1980s, a fourth concern about education was beginning to take shape with the
emergence of the information society. With the exploding growth of
technology, the shift to a global economy, and an increasingly information-
dependent society, there was a growing realization that learning must be
ongoing throughout the course of one's lifetime, not just in the formal
educational years. The concept of lifelong learning emerged and, along with it,
the need to look at how elementary and high schools were preparing students
with the skills and abilities to become lifelong learners. A consensus grew
across business and industry that American workers, for reasons of everyday
literacy and economic competitiveness, had to be forced to become competent
in processing large amounts of information. Never again would one learn all
the information needed for a lifetime of productive living and working in a
formal school setting. Never again would work and learning be seen as separate
activities in schools charged with passing on a specific fund of knowledge. The
factory model of education was waning. Schools needed to change.
Changing to What?
If the work of schools could and should no longer be the transmission of a

general fund of knowledge to students, if we were faced with an information-
age driven definition of jobs, then what should be the purpose of kindergarten
through grade 12 education? From 1985 until the present day, many diverse
groups and constituencies in multiple forums have discussed, debated, and
responded to varying iterations of this question. Within these debates the
following issues have been raised:
• What should students know and be able to do in the various subject

areas that make up the K-12 curriculum?
• What is the evidence that students know something?
• How do we meet the challenge of increasingly diverse populations of
students within our classrooms?
• What educational outcomes do we value as a society?
• How can we make learning more about thinking and engaging the
intellect and less about memorizing?
• How can we better prepare our children for a world of complex

problem-solving rather than knowing a set of fixed answers?
• How can we best judge mastery of the educational outcomes we say we
value?
These questions have acted as a definitional compass, prodding teachers,
administrators, parents, and policymakers to define key elements of learning
situations, and how to evaluate what students are expected to know as well as
what they can do with what they are expected to know. In the quest to bring
together knowing and doing, a growing number of education reformers,
policymakers, teachers, administrators, and parents have come to be advocates
of performance assessment, an evaluation strategy that requires a
demonstration of content knowledge along with the application of that content
knowledge through such skills as problem solving, reasoning, communicating,
and the ability to engage in team work. Performance assessment has the
potential to take school-based evaluation to the next evolutionary point, not
only to be sure of knowledge acquisition, but of knowledge in use.
Grant Wiggins (1993), an influential voice in assessment discussions,

suggests that schools aim for two interrelated goals: to transmit what is known
and to shape an ability for intellectual autonomy. The role of evaluation
therefore is twofold as well: 1) to assess what students know and 2) to assess
whether what students know has any meaning for them---am they use that
knowledge appropriately, creatively, strategically, to solve problems, to make
sound decisions, to reason logically, and so forth. Wiggins continues:
Much [of) learning paradoxically requires unthinking

learning of pat formulas, drills, and declarative statements.
But understanding is something different than technical
prowess; understanding emerges when we are required to
reflect upon achievement, to verify or critique---thus to
rethink and relearn--what we know through many "tests" of
experience, action, and discussion (knowledge-in-use).
Understanding involves questioning, as Socrates so often did,
the assumptions upon which prior learning is based.... Our
task, then [as educators] is more difficult than we perhaps
first imagined. The quest is not for "better" tests, as if a mere
technical deficiency or ignorance lay at the heart of our
current problems in assessment for ultimate educational aims.

We have to bring to consciousness and thoughtfully examine
our deep-seated habit of seeking superficially correct answers
to uniform questions (Wiggins, 1993, p. 37-38).
How do we describe performance assessment in a manner that differentiates

it from testing of a more traditional kind? Grant Wiggins (1992) identified
performance-based tasks as having the following characteristics: The task is ill-
structured but well-defined; the goal, specifications, or desired effect is known,
but it is not obvious how to meet it. Edward Chittenden (1991) describes
performance assessments as samples of children's work that are tangible
documents, or artifacts of their accomplishments-their writing, reading,
drawing, computations, models, constructions. Jay McTighe (1996) suggests
that performance-based instruction and assessment should tell us if students can
apply facts, concepts, and skills appropriately to new situations. In previous
writing (Moon & Schulman, 1995; Moon, 1997) I have described performance-
based assessments as open-ended tasks that contribute to evidence-gathering on
students' understanding of knowledge and abilities than is qualitatively
different than the production of correct answers. Open-ended tasks refer to
activities that do not end artificially when students arrive at a correct answer.
Performance-based tasks are conceptually more complex and demanding,
frequently involving a number of steps with multiple answers. The emphasis on
new tasks is on "dOing something" with information, knowledge, not stopping
with a correct answer to a well defined problem.
Figure 17-1, ''Two Approaches to Testing in Math," clearly exemplifies the

distinction between traditional test items and performance assessment
(Wiggins, 1993, p. 114).
Figure 17-1 Two Approaches to Testing in Math

Standardized Test Questions on Volume
1. What is the volume of a cone that has a base area of 78 square

centimeters and a height of 12 centimeters?
a.30cm3
b.312cm 3
c.936cm3
d.2808cm3
2. A round and a square cylinder share the same height. Which has
the greater volume?
A Perfonnance Assessment on Volume
Background: Manufacturers naturally want to spend as little as possible, not only on the
product, but on packing and shipping it to stores. They want to minimize the cost of
production of their packaging, and they want to maximize the amount of what is
packaged inside (to keeping handling and postage costs down: the more individual
packages you ship, the more it costs).
Setting: Imagine that your group of two or three people is one of many in the packing
department responsible for M&M's candies. The manager of the shipping department
has found that the cheapest material for shipping comes as a flat piece of rectangular
paperboard (the piece of posterboard you will be given). She is asking each work group
in the packing department to help solve this problem: What completely closed container,
built out of the given piece of posterboard, will hold the largest volume of M&M's for
safe shipping?
1. Prove in a convincing written report to company executives that both the

shape and the dimensions of your group's container maximize the volume. In making
your case, supply all important data and formulas. Your group will also be asked to
make a three-minute oral report at the next staff meeting. Both reports will be judged
for accuracy, thoroughness, and persuasiveness.
2. Build a model (or multiple models) out of the posterboard of the container
shape and size that you think solves the problem. The models are not proof; they will
illustrate the claims you offer in your report.
Although interest in performance assessment has been widespread over the

last five to six years, progress to date has been uneven across the United States.
One difficulty is that this kind of evaluation entails new roles for teachers and
students. Performance assessment requires that teachers design assessment
tasks and identify the criteria for judging the effectiveness of student
performance. Assessment also demands much greater alignment between what
is taught and what is evaluated; the best assessment tasks are those that are
closely connected to the everyday learning experiences of students.
For students, performance assessment requires greater involvement, less

passivity. Less time is spent responding to a well-defined question or statement.
Instead students must make judgments about how to solve problems, construct
reasonable and defensible solutions, and make choices among those solution
strategies. Students become knowledgeable about the criteria by which their
performances will be judged, instead of wondering if they got just the right
answer or what their percentage of right versus wrong answers was.
Performance-based assessment requires teachers and students to shift to
investigating ways of promoting an understanding of the entirety of student
performance, not just single-subject mastery, with an eye to continually
improving performance over the long term. Following are examples of
performance-based assessment being used in schools across the United States.
Performance Assessment Examples
A Senior Exhibition
In 1994, Greenwood Laboratory School, on the campus of Southwest Missouri

State University, added the senior exhibition as a graduation requirement,
above and beyond traditional Carnegie units (Sills-Briegel, Fisk, & Dunlop,
1996). The exhibition provides an opportunity for seniors to demonstrates their
ability to do the following:
• communicate effectively in writing

• communicate effectively in speaking
• think critically
• access information from a variety of sources

• express ideas creatively.
The exhibit has three parts-a written component, a public speaking
component, and a multimedia component. The written component is much like
a traditional research paper, which analyzes a topic from an interdisciplinary
perspective. In the public speaking component the student provides a five-to-
seven minute formal synopsis on the completed research topic, and engages in a
20-minute question-and-answer period where the student's knowledge of
content and analytical skills are explored. The multimedia presentation asks the
student to analyze some aspect of the research topic using a variety of media
such as drawing, painting, weaving or the use of electronic technologies.
Students are given 45 minutes each to present all three aspects of their
exhibition. The evaluations are open to the public, other students, faculty, and
parents.
Exhibitions are not graded using traditional letter grades. Students have to
demonstrate a level of proficiency on the publicly-stated criteria in five areas:
personal responsibility, critical thinking, writing, public speaking, and
multimedia. If they do not, students revise and repeat their exhibition until it is
judged proficient. Upon completion of the exhibition, the student's advisory
committee members, generally made up of three teachers from different subject
areas, meet with the student to provide feedback. The student receives a copy of
the scored rubric and a written narrative addressing the student's work.
The use of an exhibition as an example of a performance assessment

provides students, teachers, administrators, and parents with an opportunity to
observe firsthand a number of key indices about learning. There is the
opportunity to follow not only the way a student solves a problem, but how she
defines a problem for research. We observe not just learning as directed, but a
student's ability to sort out the essential elements of a problem. We can observe
how a student negotiates his way through the ticklish learning terrain of
competing solutions, instead of finding the right answer. An exhibition focuses
on the lifelong skills of inquiry, persistence with a task, personal values in
competition with available solutions, and communicating ideas.
A Fourth-Grade Mathematics Task
The fourth-grade classes in a small school district in Massachusetts used the

following assessment to determine student progress on a range of mathematics
skills and critical thinking abilities, such as reasoning. Below (Figure 17-2) is
an overview of this task for students in the fourth and fifth grades (Moon,
1997).
Figure 17-2 What We Recycle

During one week students should keep track of what is recycled and how much is
recycled in their homes. Before they begin this investigation you will want to brainstorm
with them questions about how to construct a way to keep track of what is recycled so
that everyone's information is organized in similar units.
For example, you might want to ask how everyone is going to determine what recycling
categories to have ... paper, plastic, and glass? Other categories? Will students weigh
each kind of recycling materials or will they count them? Or will they do some kind of
combination? What makes the most sense? Students are to explain why they choose the
units they did.
When the students have brought their recycling information to school and you have
publicly posted the information, have each child build a graph that displays this
information.
After completing the graphs, have your students write letters to their parents describing
and interpreting their graphs. You may also ask your students to write about what the
graph does not tell about what we recycle.
A Middle-School Writing Project
Teachers in North Carolina developed the following assessment task, titled

Birds and Soldiers (Wiggins, 1996, p. 22).
Wildlife officials and politicians are at odds because of the

rare red-cockade woodpecker on the Fort Bragg military base.
Fort Bragg officials have to limit military training exercises
because of the protection required for the birds under the
Endangered Species Act. The Act states that an endangered
bied's environment cannot be tampered with. Almost half the
known red-cockade woodpecker population is located on the

base. Your task is to proposed a workable solution to the
problem, based on a careful review of the military's needs and
the relevant law. You will write a report and make a speech to
a simulated EPA review board.
Colleges Beginning to Move to Match K-12 Assessment

Reforms
By the year 2000, a growing number of states, such as California, Colorado,

Georgia, Oregon, and Wisconsin, plan to abandon the traditional college-
application procedures in which standardized test scores, Scholastic
Achievement Tests (SAT) and the American College Testing (ACT) play a
pivotal role. In Oregon, new proficiency-based admission standards in six
content areas will replace grades, class rank, credits, and fill-in-the-bubble test
scores as the barometer of whether students have the skills and knowledge to
succeed in college (Olson, 1996). In Washington state, the higher education
coordinating board formed a task force on admission standards in response to a
demand by lawmakers that requires high school students to graduate based on
what they know and can do. In Wisconsin, students from a small but growing
number of high schools are able to apply to any institution in the University of
Wisconsin system using an alternative reporting form that describes their
achievement in five academic subjects. Colorado Governor Roy Romer has
consistently championed the need for higher education to adopt clear standards
of graduate achievement and to credential students through demonstrations of
achievement instead of credits based on seat time (Ewell, 1996). Romer's
efforts to link standards and assessment is a driving force behind the newly
emerging Western Governors' University, an institution that is intended,
through performance-based assessments, to certify and credential learning
gained chiefly through the electronic technologies such as telecommunications
and the Internet.
Colleges are beginning to define what they see as a mismatch between what
higher education measures--Carnegie units and end-of-course grades---and the
direction of change in K-12 education. As elementary and high schools
develop new standards for what students should know and be able to do as well
as new ways for students to demonstrate that information, college admission
systems will follow suit. Changes in the ways that institutions of higher
education come to grips with fundamental shifts in testing practices in the

kindergarten through grade twelve classroom merit close watch over the next
decade. Increasingly students will be graduating with earned credits or
described levels of achievement in state or district mandated outcomes that bear
little resemblance to traditional subject area mastery. How are colleges going to
evaluate these achievements? How are colleges and universities going to
accommodate a growing number of students who will demand more of a
learning experience than lectures and taking notes? These are the questions that
will occupy higher education over the next decade and beyond.
Evaluation Lessons Learned to Date
This chapter has offered a perspective on the thoughts and actions of education
reform over the last fifteen years around issues of student evaluation. What are
the lessons learned to date? Are any of the lessons transferable to learning
organizations in general? Below I have identified ten such lessons; no doubt
there are more, and more will evolve in schools that are serious about
understanding the best ways to measure human performance within the context
of kindergarten through grade twelve.
• Learning and performance, while not synonymous, are inextricably

related.
• Any useful assessment of learning, regardless of the context or the
chronological age of the learner, involves active engagement with real-
life phenomena; paper-and-pencil tests are not always effective
measures of how people will perform in everyday situations
• If we want to know what people have learned, ask them to demonstrate
in a variety of ways what they purport to know.
• Knowledge does not get used in real life along discrete subject-area
lines; therefore, curricula and assessments should emphasize the
connections among disciplines.
• Examinations that call for complex, demanding tasks can be given to a
wide range of students/employees, not just those achieving at the
highest levels in an organization or classroom.
• Evidence of one's ability to collaborate, regardless of grade-level or

job, is an essential skill in today's society; therefore, evaluation of
perfonnance should include opportunities for collaboration.
• A critical part of evaluation is one's ability to reflect on one's
perfonnance, to gain an understanding of one's strengths and
weaknesses in that perfonnance situation.
• Human potential is realized more fully when feedback on one's
perfonnance is part of the assessment scenario; therefore, it is difficult
to improve without an opportunity to discuss that perfonnance with
one's teacher, mentor, or coach.
• We learn more about human potential through publicly shared
indicators and valid descriptors of acceptable perfonnance, not by
arbitrary cutoff scores based on standardized tests.
• We will understand more about what distinguishes an acceptable
perfonnance from an excellent one when we have more thoughtful
examples of the specifics of those perfonnances which anchor our
judgments in more authentic ways than arbitrary scores.
Perfonnance-based assessment is intended to document evolving student
abilities. Its evaluative framework is quite different from that of standardized
testing, which seeks to rank or assign children rather than diagnose and
document their development. Whereas standardized testing allows us to make
comparative statements about how students, schools, and school districts score
on specific skills, alternative assessment provides infonnation on an individual
student's evolving understanding, which is unique and does not fit well with
nonnative comparisons across groups. While the purposes of standardized
testing and perfonnance-based assessment may be complementary-though
many believe they are not:-their methods of monitoring or predicting
perfonnance are not the same. Exchanging one framework for the other begs
the issue, which is, How do we describe human perfonnance in a way that
allows for continual development of skills and abilities that are critical to
school and life success? This is an essential question facing any organization
committed to learning.
References
Cannell, 1.1. (1987). Nationally normed elementary achievement testing in America's

public schools: How all fifty states are above the national average. Daniels, W.VA:
Friends for Education.
Cannell, 1.1. (1989). The "Lake Wobegon" report: How public educators cheat on
standardized achievement tests. Albuquerue, NM: Friends for Education.
Chittenden, E. (1991). Authentic assessment, evaluation, and documentation of student

performance. In Perrone, V. (Ed.). Expanding Student Assessment. Alexandria, VA:
Association for Supervision and Curriculum Development.
Dewey, J. (1933). How we think: A restatement of the relation of reflective thinking to

the educative process. Lexington, MA: Heath.
Ewell, P. (1996). Speaking virtually: Assessment and the Western Governor's New
University. Assessment Update. 8(5), 10.
Jervis, K. (1991). Closed gates in a New York City school. In Perrone, V., (Ed.).
Expanding Student Assessment. Alexandria, VA: Association for Supervision and
Curriculum Development.
McTighe, J. (1996). What happens between assessments? Educational Leadership.

54(4),6-13.
Moon, J. (1993, October 28). Common understandings for complex reforms. Education
Week.
Moon, 1., & Schulman, L. (1995). Finding the connections: Linking assessment.
instruction. and curriculum in elementary mathematics. Portsmouth, NH:
Heinemann.
Moon, J. (1997). Develooinl! iudl!ffient: Assessing children's work in mathematics.

Portsmouth, NH: Heinemann.
National Commission on Testing and Public Policy. (1990). From gatekeeper to

gateway: Transforming testing in America. Chestnut Hill, MA: National Commission
on Testing and Public Policy, Boston College.
National Summit on Mathematics Assessment. (1991). For good measure: Principles

and goals for mathematics assessment. Washington, DC: National Academy Press.
Olson, L. (1996, September 25). Colleges Move to Match K-12 in Admissions.

Education Week, 1, 12-13.
Perrone, V. (1991). Expanding student assessment. Alexandria, VA: Association for

Supervision and Curriculum Development.
Sills-Briegel, T., Fisk, C., & Dunlop, V. (1996). Graduation by exhibition. Educational
Leadership. 54(4), 66-75.
Sternberg, R. (1996, November 13). What is "successful" intelligence? Education

Week,48.
Wiggins, O. (1992). Creating tests worth taking. Educational Leadership. 49(8), 26-33.
Wiggins, O. (1993). Assessing student performance: Exploring the pur.pose and limits of
testing. San Francisco: Jossey-Bass.
Wiggins, O. (1996). Practicing what we preach in designing authentic assessments.

Educational Leadership. 54(4) 18-25.
About the Author
Jean Moon is an Associate Professor of Education and Director of the Center

for Mathematics, Science and Technology in Education at Lesley College,
Cambridge, Massachusetts. Currently, she is on leave from Lesley and is at the
Exxon Education Foundation as an advisor to their mathematics program. Dr.
Moon has been the principal investigator of many funded projects on teacher
enhancement, as well as of research and evaluation studies on curricular and
assessment issues in the elementary grades. In 1995, she authored with Linda
Schulman Finding the Connections: Linking Assessment, Instruction, and
Curriculum in Elementary Mathematics, published by Heinemann. Heinemann
will publish her second book on assessment, Developing Judgment: Assessing
Children's Work in Mathematics. She received her Ph.D. from the University of
Wisconsin at Milwaukee.
INDEX
A ASTD Buyer's Guide and Consultant

Directory, 180
Academy of Professional Consultants and Automobile industry, 78-80
Advisors, 246
Accommodation, 217f, 219
Accreditation. See Certification B
Acculturation, 145
Acting, in organizational action research Baby boom, 5
model, 230-231 Balanced scorecard, 20-22, 21 f
Action plans, 1234 Basarab, on evaluation, 239-240
Adverse impact, 348 Basic skills education, 35
American Educational Research Association, Behavior
250 conditions for change of, 102-103
American Evaluation Association, 249 definition of, 102
American Psychological Association, 247 in four level evaluation, 102-106
American Society for Training and of trainees, 105
Development, 7, 115 Behaviorist psychology, 189
Benchmarking Forum of: 7, 115 Benchmarking, 329
ethical principles of, 245-246 Benefit cost ratio, 128
Andersen Consulting Benefits
on implementation, 130 intangible, 129
on multimedia vs. conventional training, training impact on, 146t, 149-150
285-286 Big 3 Model, 27
Angoffmethod,354-355 Brislin, 258-259
Anonymity, in cross cultural effectiveness, 274 Bureaucracy
A.O. Smith Corporation, 104 information flow in, 70
Applied Global University, 24-26, 25f Max Weber on, 69, 71
Applied Global University Management Bureaucratic model, 67
Institute, 24-25 Business case planning, 289
Applied Materials, corporate university at, 24 Business environments, 45f
Apprehension, 224 Business impact, of training, 13-14
Assessment Business issues, 337-356. See also
of competency (See Competency, Certification
assessment) Business unit manager, 9
of employee competencies, 183-204 (See Buyer's Guide and Consultant Directory,
also Employee competencies) 180
formative, 86
in learning organizations, 84-86
summative, 86
Assimilation, 217f, 218
380 Index
c Chinese culture, 266--268

Client satisfaction, 48
CD-ROMs, 281 Climate, 102-103
Certification, 337-356 Collaboration, cross-functional, 31
definition of, 340, 34lt Collaborative learning, 304-306. See also
evaluation issues in, 343-349 (See also Knowledge capture
Certification, measurement issues in) Communication
growth trend in, 337-339 (See also in bureaucracies, 70
Certification, growth trend in) interactive, 284
industry trends in, 339-340 organizational norms of, 79
measurement issues in, 343-349 (See also technology advances in, 6
Certification, measurement issues in) Communities of practice, 70
professional model of, 341 t, 341-342 Competency. See also Employee
test development in, 349-355 (See also competencies
Certification, test development in) assessment of (See Competency,
value of, 355-356 assessment of)
vendor, 342-343 identification methodologies for,
Certification, growth trend in, 337-339 191-195 (See also Competency,
Chauncy Group International, Ltd. in, 338 identification of)
Dataquest on, 337-338 models of, 187-191 (See also
IBM in, 337 Competency models)
Novell Education in, 337 of workforce, 24
Certification, measurement issues in, 343-349 Competency, assessment of, 195-204
adverse impact in, 348 appropriateness of, 195-196
business considerations in, 349 benefits of, 203-204
Equal Employment Opportunity Act in, cost of, 195
347-348 data in, 195-196
foreign countries in, 348-349 four steps in, 195
legal considerations in, 347-349 multirater, 197-203 (See also
professional standards in, 347 Competency assessment,
reliability in, 344, 346t multi rater)
selection VS. classification in, 345 Competency, identification of, 191-195.
Standards for Educational and See also Competency, assessment of
Psychological Testing on, 344 cost saving vs. precision in, 194-195
validity in, 344, 346t criterion samples in, 193-194
Certification, test development in, 349-355 decision matrix in, 192-193, 193t
Angoffmethod in, 354-355 methodological considerations in,
content domain definition in, 351 192-194,193t
item development in, 352 Competency assessment, multi rater,
job analysis in, 350, 351 t 197-203
knowledge specification in, 350, 351 t appraisal in, 198t, 198-200
measurement tools in, 349 coordination of, 20 I
passing score in, legal issues of, 353-354 credibility of, 197
passing score in, reasonableness of, 354 data gathering in, 197
stages in, 349-350, 350t double-duty in, 199-20 I
test blueprint in, 351-352, 353t efficacy of, 20 I
test piloting in, 352-353 feedback in, 198t, 198-199,200-201
Change implementation of, 202-203
facilitator of: 9 legal implications of, 199
forces 0 f, 15 as ongoing process, 203
implementation of, 22 Competency models
pace of: 6 descriptive, 188-189
paradigm shit! in, 3 differentiating, 189-191 (See also
in social institutions, 209 Differentiating competency
Change strategies, 27-30 model)
collaborative, 27-28 minimum standard, 188
nonuniformity of: 22 Computer-based training, 280, 280t
Chauncy Group International, Ltd., 338 Computer literacy, 5
Computer technology, 6 Deployment, in technology-based

Conceptual models, 44 evaluation, 293-295
Conduct, standards of, 240 Descriptive competency model, 188-189
Confidence, crisis in, 211-212 Design documentation content analysis,
Confidentiality, 240-241 311-314
Conflict, of values, 240-241 noun phrases in, 312-313, 313t
Consolidated Edison, 108 noun phrases vs. design quality in,
Constructivist learning, 300 313-314
Content analysis, 311-314 Design review, 171-174
Content domain definition, 351 definition of, 171
Context, change in, 3--4 elements of, 171-173, 172t
Continuous process improvement, 118 inadequacy of, 173-174
Control group, in return on investment, process guidelines of, 173
124-125 Design-team performance, 297-316
Convergent knowledge, 217f, 218 engineering design education in,
Core competency, 24 298-300
Core service implications of, 314-316
definition of, 45f, 46 mechanical engineering (ME21 0) in,
linkages of, 47 300-306
Corporate universities, 23 metrics enablement in, 308-314
Cost justification, 291 product-based learning assessment in,
Costs, in return on investment, 126 306-308
Course pilot, 178-179 product-based learning in, 300-306
Cox, Taylor, 258 Design-team performance assessment
Criteria, valid, 30-31 model, 307, 307t
Cross-boundary team, 74 Diagnosis, 227
Cross-functional collaboration, 31 Differentiating competency model,
Cultural bias, 261-262 189-191
Cultural coaches, 273 assessment debate in, 190
Cultural context, 239-240 contrasting approaches in, 190-191
Cultural indirectness, 270 definition of, 189-190
Culture surface vs. core competency in, 191
on training, 257-263 (See also Training, Disciplined inquiry, 57-58
cultural dimensions of) conclusions of, 58-59
on training evaluation, 263-276 (See also objectivity of, 58
Training evaluation, cultural quality of, 58
dimensions of) Discourse-based tools, 305, 305t
Customer Distribution systems
business unit as, 13 linkages of, 48
definition of, 10, 13, 15 in service business management, 45f, 46
satisfaction, 12-13 Divergent knowledge, 217f, 217-218, 226
Customer focus, of training, II Diversity advocate, 10
Customer input, 126 Double-duty assessment, 199-201
Customer satisfaction, 12-13 Double-loop learning, 32-36, 33f
Customer service, 7 Argyris on, 32-33
Motorola University as, 34-35
theory-in-use in, 33-34
D vs. single-loop learning, 35
Databases, 127
Data collection E
in feedback, 30
in return on investment, 122-124 Early module tryout, 177-178
timing of, 122 Economic shifts, 7-8
Dataquest,337-338 Economy, global, 7
Decision making, ethical, 242-244, 250-251 Education, 359-375
Deming, 68 behavioral objectives in, 361-362
Demographics, 4-6
382 Index
conceptual-network knowledge model in, Ethical decision making, 242-244,

365 250-251
current issues in, 366--367 Fielder on, 243-244
growth of state testing in, 362 McLagan on, 242-243
information society on, 366 Ethical principles, 245-250
Jervis on, 364 Academy of Professional Consultants
lifelong learning concept in, 366 and Advisors on, 246
memorization in, 365 American Educational Research
performance assessment in, 367-374 (See Association on, 250
also Educational performance American Evaluation Association on,
assessment) 249
Perrone on, 363 American Psychological Association
return on investment of, 43 on,247
standardization in, 360-361 American Society for Training and
standardized test scores in, 362 Development on, 245-246
teaching vs. learning in, 361 Joint Committee on Standards for
test score decrease in, 364 Educational Evaluation on,
Educational performance assessment, 367-374 247-249
college admission reforms in, 373-374 Ethical problems
college reforms in, 373 definition of, 241-242
difficulties in, 370 identification of, 240-241
evaluation lessons in, 374-375 Ethical sensitivity, 241-242
fourth-grade mathematics task in, 372, 372f Evaluation. See also Training evaluation
math testing in, 369f acceptance assessment in, 52
middle-school writing project in, 372-373 application of, 32
senior exhibition in, 370-371 Basarab on, 239-240
vs. traditional assessment, 368, 369f of behavior, 106
Wiggins on, 367-368 borrowing techniques of, III
Educational Testing Service, 338 case study of, 321-336 (See also
Electronic performance support, 281 Motorola Total Customer
Employee capacity, training impact on, 146t, Satisfaction)
148-149 and change initiatives, 32
Employee competencies, 183-204 client benefits in, 50t, 51t
assessment of, 195-204 (See also cultural context of, 239-240
Competency, assessment) data for, 12
convergence of conditions in, 184-187 definition of, 55-57
economic conditions in, 185-187 deployment in, 293-295
as employment requirements, 186 development in, 291-292
identification methodologies for, 191-195 double-loop learning in, 32-36, 33f
(See also Competency, identification) ethics of, 237-252 (See also Training
models of, 187-191 (See also Competency evaluation ethics)
models) formative, 30, 167-181 (See also
as performance rating, 187 Evaluation, formative)
portability of, 187 four levels of, 95-112 (See also Four
social conditions in, 185 level evaluation)
technological innovations in, 184-185 impact of, 141-165 (See also
threshold for, 189 Evaluation impact)
Engineering design education, 298-300 intent of, 53f, 55f
changes in, 298 Kirkpatrick's levels of, 11-13
constructivist learning in, 300 oflearning, 102
Kolb's learning cycle in, 298-299 in learning organizations, 84-86
Stanford University in, 298 micro vs. macro, 52, 52t, 53f
Vygotsky's model in, 300 at Motorola, 321-336 (See also
English as second language, 274 Motorola Total Customer
Environment, forces of change in, 15 Satisfaction)
Epistemology of professional practice, 212 planning in, 288-291
Equal Employment Opportunity Act, 347-348 as a professional service, 59-60
Estimates, in return on investment, 125
quality measures for (See Motorola Total accommodation in, 219

Customer Satisfaction) assimilation in, 218
of reaction, 100 convergent knowledge in, 218
reasons for, 96-98 divergent knowledge in, 217t: 217-218
ofresults, 109 Exploration, in reflection 228
on self-esteem, 31 Extrinsic rewards, 105 '
and single-loop learning, 33, 33f
strategic val~e of, 27-36 (See also Change
strategies; Feedback; Performance F
goals)
success factors in, 60f Facilitators
summative, 168t, 168-169 of change, 9
technology in, 287-295 (See also oflearning organizations, 9
Technology-based evaluation) Feedback,30-36
ten-step process of, 96 in balanced scorecard 22
tests in, III ~ouple-Ioop learning in, 32-36
tools for, 54, 55f III employee competency assessment,
of training, 10-13,15-16,239-240 (See 198-201
also Training evaluation; Training evaluation results in, 32
evaluation ethics) learning and performance in 31
Evaluation, ethics improvement, 251-253 reaction sheets in, 100 '
consenus-based standards in 253 single-loop vs. double-loop, 35
professional obligation in, 252 valid criteria in, 30-31
research in, 253 Fertility rates, 5
Sanders on, 252 Fidelity,52
Evaluation, formative Fielder, on ethical decision making, 251
barriers in, 179-180 Fifth Discipline, 75-76
cost effectiveness of, 181 Finding, in organizational action research
definition of, 167-168, 168t model,229
design r~view in, 171-174 (See also Design Flattening, organizational, 7
review) Focus groups, 123
general principles of, 169-171 Follow-up session, 124
opportunities in, 180-181 Force-field analysis, 328, 328f
prototype review in, 174-176 (See also Forecasting model, in return on
Prototype review) investment, 125
prototype testing in, 176-179 (See also Formative evaluation, 30. See also
Prototype testing) Evaluation, formative
review network in, 170-171 Four level evaluation, 95-112
review perspectives in, 169-171 implementation of, 109-112
self-paced materials in, 181 level I-reaction in, 99-100
VS. summative evaluation 168t 168-169
level2-learning in, 100-102
Evaluation impact, 141-165' , level-3 behavior in, 102-106
clarification of, 142-153 level 4--resuIts in, 106-109
learning vs. impact in, 153-154 Four-phase loop learning, 299
logic value-chain in, 151-152 Frame of reference, change in, 3-4
mythology of, 153-156
performance factors in, 154-156
retrospective, 152-153
strategy in, 156-165 (See also Impact G
evaluation strategy)
training logic in, 142-145, 143f, 151-153 General Electric, 74
Evaluator, role of, 57-59 Gestalt, 226
Expectations Global economy, 7
of stakeholders, 44, 45f Goal(s)
of training, 9-10 collaborative, 27-28
Experience, concrete, 224. See also performance, 28-30
Experiential learning setting, 27-28
Experiential learning, 216-219, 217f, 224, 299
384 Index
H Joint Committee on Standards for

Educational Evaluation, 247-249
Happiness sheets, 100 Jokes, 110
Hierarchy, cultural dimensions o( 267
Human resource development
accountability in, 119 K
evaluation models in, 114-115
return on investment in, 113-1 14 Kirkpatrick's evaluation, 11-13
Hybrid CD's, 282-283 modified, 119-121, 120t
original definitions of, 114-115, 115t
problems with, 115-118 (See also
Kirkpatrick's model)
Kirkpatrick's model
IBM,337 accountability of, 116-1 17
Illiteracy, 35 definitions in, 116
Impact, II in impact evaluation, 158-159
Impact evaluation strategy, 156-165 isolating training effects in, 117-118
nonimpact variables in, 160-161 linkages in, 115-116
partial effects in, 157-158 modified version of, 119-121, 120t
as quality management process, 161-163 Knowing
results, isolated, 158-160 Belenkey on, 226
results in, immediate, 160-161 connected, 226
return on investment in, 159-160 managerial ways of, 219-220
training assessment in, 157-158 in organizational action research
Implementation model, 229-230
Andersen Consulting on, 130 Knowing-in-action, 224
evaluation targets in, 130-131, 131t Knowledge
Implicit theories, 227 accommodative, 217f, 219
Incentives, 103 assimilative, 217( 218
Information explosion, 6-7 from concrete experience, 224 (See
Instructional designer, 169 also Experiential learning)
Instructional design process, 272-273 convergent, 217f, 218
Instructional quality inventory, 174--175 divergent, 217f, 217-218, 226
Instructor lead training, 280, 280t exploration of, 228
Instrumentation feedback model, 307-308 Knowledge capture, 304-306
Instrumented team learning, 303-304 discourse-based vs. thought-based tools
ME210 Web as, 303 in, 305, 305t
product-based learning model in, 303 formal documentation in, 304
restructuring in, 304 information exchange mechanisms in,
Intangible benefits, 129 304-305, 305t
Interactions Group, 259-260 Knowledge formation principles, 213
Interactive communication, 284 Kolb model, 216-219, 217f
Interventions, design of, 30 accommodation in, 219
Interviews, 123 assimilation in, 218
Investment return, of training, 14 convergent knowledge in, 218
Ishikawa diagram, 326f, 326-327 divergent knowledge in, 217f, 217-218
Kolb's learning cycle
in engineering design education,
298-299
J four-phase loop learning in, 299
Jackson, Susan E., 260-261
Japanese competition, 67-68
Jervis, Kathe, 364 L
Job analysis, 350, 35lt
Job performance, training impact on, 146t, Leadership capacity, training impact on,
147-148 146t, 148-149
Learning, 19
definition of, 101 organizing/managing theories in,

double-loop, 32-36, 33f 215-216
experiential, 299 theory testing in, 215
in four level evaluation, 100--102 Managers
four-phase loop in, 299 knowledge formation and, 213
Kolb's cycle in, 299 lack oflearning of, 213
measurement of, 101-102 learning techniques of, 213-214
messages about, 3 1 ME21O,300--306
organizational, 80-83 Mechanical engineering (ME21 0),
single-loop, 33, 33f, 35 300--306
in training evaluation, 11-12 Mechanistic bureaucratic model, 69t
vs. training, 8 Mechatronics,301-302
Learning organizations, 72-80 definition of, 301
boundary concept in, 74 examples of, 302
case study of, 78-80 Merit pay, 103
employee respect in, 77 Meta-assimilation, 228
features of, 73 Meta-reflection, 228
knowledge creation in, 73 Metrics enablement, 308-314
leadership in, 76-77 design documentation content analysis
MIT Center and, 75 in, 311-314
Learning specialist, 9 feedback assessment in, 309
Legal considerations, 347-349 team-preference profile management
Legal issues, 337-356. See also Certification in, 309-311
Level I-reaction, 99-100 Micro evaluation, 52t, 53f, 55f
Level 2-learning, 100--102 Minimum standard competency model,
Level 3-behavior, 102-106 188
LeveI4-results, 106-109 Minorities, 5
Licensing. See Certification MIT Center for Organizational Learning,
Likert scales, 260 75, 78
Linkages, 47-48 Moore, Robert and Helen, 259-260
Logic value-chain, 151-152 Motorola Total Customer Satisfaction,
321-336
analysis tools in, 325-330 (See also
M Motorola Total Customer
Satisfaction, analysis tools)
business issue in, 322
Macro evaluation, 52t, 53f, 54, 55f, 56
institutionalization of, 334-336, 335f
MadeFast experiment, 302
prioritization matrix in, 323t
Management
process description of, 323t
climate created by, 102-103
project selection in, 324-325
of Motorola, 34
quality function deployment in, 331
operations, 45f, 46
remedies in, 330--333
program development and, 104
results in, 333-334
technology acceptance by, 284-287 (See
Skills Training Application Review
also Technology-based delivery)
process (STAR) in, 330--332,
training of, 26
331f, 332f
Management education, in return on
team representation in, 322
investment, 135
team structure in, 322
Management training
Motorola Total Customer Satisfaction,
deficiencies in, 213-214
analysis tools, 325-330
experiencing in, 214
benchmarking in, 329
formal, 213-214
force-field analysis in, 328, 328f
internship in, 214
Ishikawa diagram in, 326f, 326-327
knowledge acquisition and, 214-215
Pareto analysis in, 327t, 327-328
learning from practice and, 213
Pugh Concept Selection, 329t, 329-330
in organizational action research model,
validation of primary causes in, 327
213-216
organizational diagnosis in, 215
organizational systems in, 215
386 Index
Motorola University, 34-35. See also professional training and, 211-212

Motorola Total Customer Satisfaction reflecting in, 225-228
Multicultural context, 264-272. See also researchers in, 220-221
Training evaluation, cultural dimensions Organizational capacity, training impact
of on, 146t, 148
Multimedia Organizational change
Andersen Consulting on, 285-286 double-loop learning in, 32-36, 331'
learner control in, 285 training tor, 22-23
technology of, 280 Organizational flattening, 7
vs.. conventional delivery, 285 Organizational goals, performance goals
MultIrater competency assessment, 197-203. and, 28-30
See also Competency assessment, Organizational learning, 80-83
multI rater assessment and evaluation in, 84-86
bureaucratic model vs., 64
definition of, 63
N di fferent approaches in, 83
information economy on, 65
Natural systems model, 71 organizational memory in, 81-82
Needs assessment planning, 289 precipating factors in, 81
Network(s) training vs. learning in, 64-65
evaluation review in, 170-171 types of, 82
knowledge model in, 365 Organizational memory, 81-82
norms in, 72 Organizational planning, 28
as training organizations, 7 Organizational strategy, 20-37
Network analysis, 71-72 Organizations
Network distribution, 283-284 concealing problems in, 78-79
Network technology, 282-283 dysfunctions in, 83
New paradigm thinking, 71 goal setting of, 27-28
network analysis of, 71-72
Newtonian model, 4
Novell Education, 337 new models for, 67
norms of communication in, 79
NSF Synthesis Mechatronics, 302
Output data, 126
Outsourcing, 118
o
p
Objectivity, of disciplined inquiry, 58
On-the-job observation, 123
Paradigm shift, 3
Operations management
Pareto analysis, 327t, 327-328
linkages of, 47
Participant reaction, 99
in service business management, 45f, 46
Perception, in training evaluation, II
Organismic model, 67-69, 69t
Performance
Organizational action research model, 209-233
analysis of, 154t, 154-156, 155t
acting in, 230-231
current model and, 210-21 I messages about, 3 I
team, 31
experiential learning in (See Experiential
in training evaluation, 12, 13
learning; Kolb)
training logic in, 155t
finding in, 229
Performance contracts, 124
getting there in, 231-233
Performance enhancers, 9
knowing in, 229-230
Performance goals, 28-30
Kolb and experiential learning and,
Performance interventions 8
216-219,217f
Performance monitoring, i~ ROI data
management training and, 213-216
collection, 124
managerial knowing and, 219-220
Perrone, Vito, 363
practice in, 222-224
Piaget, Jean, 300
practitioners as researchers in, 221-222
222f ' Pilot testing, 292
Political context, of evaluation, 239-240
practitioners in, 220
Practice
definition of: 222 Q

epistemology of, 226
individual forces in, 222-223 Quality, in corporate education, 23-24
noneffectiveness of, 224 Quality Function Deployment, 331
in organizational action research model Quality management, 67-68
222-224 ' Quality management process
organizational forces in, 223 in impact eval uation strategy, 161-163
understanding and behaviors in, 223-224 process analysis in, 162-163
Practice context, 3-16 role of customer in, 161-162
Practitioners systemic framework for, 162
empowerment of, 221 Quantum physics, 4
in organizational action research model 220 Questionnaire
as researchers, 221-222 ' in data collection, 123
Primary client, 42-43 in training evaluation, 274
Problem t1-aming, within reflection, 227-228
Process analysis, in quality management
process, 162-163 R
Process assurance, 52t
Process guidelines
for design review, 173 Reaction
for prototype review, 175-176 evaluation guidelines for, 100
Product-based learning, 300-306 in four level evaluation, 99-100
collaborative learning in, 304-306 Reaction sheets, 98, 100, I II
definition ot: 300 Reflecting
goals 01',300-301 Kolb on, 225
instrumented team learning in, 303-304 in organizational action research
knowledge capture in, 304-306 model, 225-228
mechanical engineering (ME21 0) as, 302 Schon on, 225
mechatronics in, 301-302 Reflection
Product-based learning assessment, 306-308 exploration in, 228
design-team performance model in, 307, meta-reflection in, 228
307t problem framing within, 227-228
instrumentation feedback model in, 307-308 Regional variations, 270
themes of, 306 Reliability, 344, 346t
Professional certification, 341t, 341-342 Requirements definition planning,
Professional demands, 4 289-290, 290t
Professional obligations, 245 Research, vs_ theory, 212
Professional practice, epistemology of, 212 Researchers
Professional standards, 347 in organizational action research
Professions, crisis in confidence in, 211-212 model, 220-221
Program assignments, in ROJ data collection practitioners as, 221-222
123-124 ' Results
Project management, execution of, 54-55 definition of, 106
Prototype review, 174-176 of evaluation feedback, 32
adequacy of, 175 evaluation of, 107
consistency of, 175 in four level evaluation, 106-109
definition of, 174 of impact evaluation strategy, 158-161
instructional quality inventory in, 174-175 intangible, 106-107
model for, 174-175 Results assurance, 52t
process guidelines for, 175-176 Return on investment
Prototype testing, 176-179,291-292 benefit cost ratio in, 128
course pilot in, 178-179 calculation of, 128-129
early module tryout in, 177-178 causality issues in, 159-160
individual, 176-177 communicating results in, 136-137, 137
instructor lead materials in, 177-178 converting data to monetary value in,
self-paced material in, 176-177 126-127
Psychological makeup, of stakeholders, 49 data collection in, 122-124
Pugh Concept Selection, 329t, 329-330 of education, 43
388 Index
evaluation report development in, 136-137 micro vs. macro evaluation of, 52, 52t,
evaluation targets in, 130t, 130-131 53f
formula for, 128 psychological makeup of, 49
in impact evaluation strategy, 159-160 society as, 43
intangible benefits in, 129 Standards for Educational and
isolating training effects in, 124-126 Psychological Testing. 344
as a learning tool, 134 Stanford University Center for Design
management education in, 135 Research, 297-298
monitoring progress in, 136-137 Strategic context, 20-22
necessity of, 14 Strategic perspective, training, 20-27
policy development in, 133-134 initiatives in, 23-26
responsibility assignment in, 131-133 redefinition of training in, 26
staff education in, 135 strategic context in, 20-22
staff involvement in, 134 strategic lever in, 22-23
tabulating program costs in, 127-128 Strategic systems model, 28-29, 29f
technical support function in, 132-133 Strategic training initiatives, 23-26
Return on investment (ROI), 113-137 at Applied Materials, 24-25
preliminary evaluation information for, at Xerox, 23
122-123 Summative evaluation, 168t, 168-169
process model of, 121, 121 f Surveys, 123
reasons for, 118-119
widespread use of, I 13
Review network, 170-171 T
Rewards, 103
extrinsic, 105 Target markets, 45f, 45-46
Role-playing, in cross cultural effectiveness, linkages of, 47
275 Team performance, 31
Role responsibility, 244-245 Team-preference profile management,
professional obligations and, 245 309-311
of training professionals, 244 Technology-based content, 280
Technology-based delivery, 284-287. See
also Training, technology-based
s delivery
cost comparisons vs. conventional,
Sanders, on ethics improvement, 252 286t, 286-287
Saving face, 267 deployment in, 293-295
Schon, Donald, 210-211, 219-220 development in, 291-292
Science, applied, 210 effectiveness of, 285
Security, in technology-based delivery, 284 efficiency of, 285-286
Self-esteem, 30-31 evaluation in, 287-295 (See also
Self-paced material Technology-based evaluation)
optimization of, 181 planning in, 288-291
in prototype testing, 176-177 Technology-based evaluation, 287-295
Service, 45-46 activity overview in, 288, 288t
Service management, elements of, 60 business case planning in, 289
Seven-S categories, 68--69, 69t cost justification in, 291
Share (MadeFast experiment), 302 needs assessment planning in, 289
Single-loop learning, 33, 33f, 35 pilot testing in, 292
Skills assessment, 35 prototype testing in, 291-292
Skills Training Application Review process requirements definition planning in,
(STAR), 330-332, 331 f, 332f 289-290, 290t
Smith Corporation, 104 Web-based training in, 290t
Software, 280 Test development. See Certification, test
Stakeholders, 41--60 development in
balancing power of, 47 Testing, for evaluation, III
expectations of, 44, 45f Thai culture, 268-269
identification of, 42 Theory, vs. research, 212
linkages of, 47 Theory-in-use, 33-34
Thought-based tools, 305, 305t Jackson on, 260--261

Threshold competency, 189 Likert scales in, 260
Total customer satisfaction. See Motorola result variations in, 257-258
Total Customer Satisfaction sensitivity to, 263
Total quality management, 224 vantage point in, 259-260, 262
in implications for training, 68 Training, technology-based delivery oC
return on investment and, 118 280--284
Trainee, skill transfer in, 105 CD-ROMs in, 281
Trainer(s) computer-based vs. instructor lead, 280,
as facilitators, 10 280t
jokes of, I 10 design-team performance in, 297-316
in organizational planning, 28 (See also Design-team
as performance consultants, II performance)
roles of, 9-10 electronic performance support in, 281
Training, 7-14 hybrid CD's in, 282-283
assessment (See Impact evaluation strategy) interactive communication in, 284
business impact of, 13-14 multimedia in, 280, 285
climate on, 103-104 network distribution in, 283-284
conventional vs. multimedia, 285 network technology in, 282-283
cost of, 9 security in, 284
cultural dimensions of, 257-263 (See also software in, 280
Training, cultural dimensions of) technology-based content in, 280
current model of, 210-211 Web browsers in, 282
as customer focused, II Training costs
customer satisfaction in, 12-13, 13 in Kirkpatrick's model, 117
evaluation of, 10--13, 15-16,27-32 (See vs. benefits, 104
also Evaluation; Training evaluation) Training departments, rethinking mission
expectations of, 9-10 of, 26
focus of, 8 Training evaluation, 10-13,27-32
goal assessment in, 84-86 Basarab on, 239-240
improvement of, 97-98 cultural context of, 239-240
investment return of, 14 employee impact of, 31-32
in Kirkpatrick's model, 117-118 establishing goals for, 27-28
logic of, 142-145, 143f ethics of, 237-252 (See also Training
in organizational action research model, evaluation ethics)
211-212 levels of, II
in organizational planning, 28 in multicultural context, 264-276 (See
outcomes, 145-151 (See also Training also Training evaluation, cultural
outcomes) dimensions of)
as partnership, 26 strategic value of, 27-32, 27-36
redefinition of, 26 valid criteria for, 30--31
return on investment in, 14 Training evaluation, cross cultural
roles and expectations of, 9-10 effectiveness of, 272-276
stakeholders in, 42 acknowledgement of difference in, 272
strategic context of, 20--22 anonymity in, 274
strategic initiatives in, 23-26 cultural coaches in, 273
as strategic lever, 22-23 English as second language in, 274
strategic perspective of, 20--27 (See also evaluation questionnaires in, 274
Strategic perspective, training) group orientation in, 273-274
technology-based delivery in, 280--284 (See instructional design process in, 272-273
also Training, technology-based openness to difference in, 275-276
delivery) phraseology in, 273
Training, cultural dimensions of, 257-263 role-playing in, 275
assumptions in, 258 Training evaluation, cultural dimensions
Brislin on, 258-259 of, 257-276
Coxon, 258 Chinese culture in, 266-268
cultural bias in, 261-262 cultural indirectness in, 270
Interactions Group on, 259-260 domestic regional differences in, 270
390 Index
evaluation models in, 265 categorization of, 145-151, 146t

evaluatory questions for, 265-266 Training programs, objectives of, 108
growing recognition of difference in, Transition period, 209
263-264 Trend lines, in return on investment, 125
hierarchy in, 267
industry differences in, 271
multiple cultures in, 264-265 v
optimizing effectiveness in, 272-276 (See
also Training evaluation, cross Validity, 344, 346t
cultural effectiveness of) Vantage point, in cultural differences,
role-playing exercise in, 268-269 259-260,262
saving face in, 267 Vendor certification, 342-343
Thai culture in, 268-269 Vygotsky's model, 300
variety of approaches in, 271
Training evaluation ethics, 237-252
case study of, 237-238
definition of, 239--240 w
definition of ethical problems in, 241-242
ethical decision making in, 250-251 Web-based training, 290t
ethical dilemmas in, 242-244 Web browsers, 282
ethical principles in, 245-250 Wiggins, Grant, 367-368
identifying ethical problems in, 240-241 Women, in workforce, 5
improvement of, 251-253 Workforce
literature on, 238 capabilities and commitment of, 26
role responsibility in, 244-245 competency of, 24
training evaluation in, 239-240 diversity in, 6
Training initiatives, for Xerox, 23 Workforce, adaptability of, 19
Training organization
as networked/virtual, 7
skills for, 7 z
Training organizations, new roles for, 15
Training outcomes, 145-151 Zinn, 241-242
acculturation in, 145

Evaluating Corporate Training: Models and Issues

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Evaluating Corporate Training: Models and Issues

Uploaded by

Copyright:

Available Formats

Evaluating Corporate Training:

Models and Issues

Other books in the series:

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

Copyright © 1998 by Springer Science+Business Media New York

All rights reserved. No part of this publication may be reproduced, stored in a

Printed on acid-free paper.

Section I-The Context of Evaluation 1

1 The Changing Context of Practice 3

2 Organizational Strategy and Training Evaluation 19

Carol Ann Moore and Constance J. Seidner

3 What Stakeholders Want to Know 41

4 The Learning Organization: Implications for Training 63

Section II-Models of Evaluation 93

5 The Four Levels of Evaluation 95

6 Level 4 and Beyond: An ROI Model 113

7 Clarifying and Directing Impact Evaluation 141

8 Formative Evaluation 167

9 Assessing Employee Competencies 183

10 The Organizational Action Research Model 209

Section III-Issues in Evaluation 235

12 Cultural Dimensions of Evaluation 257

13 Impact of Technology on Training Evaluation 279

14 Design Team Performance: Metrics and the Impact of

15 Using Quality to Drive Evaluation: A Case Study 321

16 Certification: Business and Legal Issues 337

17 Lessons from Education 359

Subject Index 379

We would like to express our appreciation and thanks to a number of people

Zachary Rolnik and Suzanne Rumsey of Kluwer Academic Publishers were

We believe that evaluation is a tool to improve the practice of training and

provide students of educational evaluation, practitioners in training functions,

The role training evaluation can play in organizational strategy, and

Oliver Cummings represents the stakeholder's view of training evaluation.

Barry Sugarman provides a thoughtful introduction to learning

We are living in a time of incredibly rapid, all pervasive, and continuous

Our awareness of the persuasiveness of change is exhibited in our attempt to

It is the understanding of context that has always been the strength of

We are currently living in a world that is rapidly and thoroughly changing.

Employees from the new generation entering the workforce have

A second environmental force that is driving many other changes is an

Availability of information technology to employees in every level in the

While the availability of massive amounts of information presents incredible

Managing this type of organization takes a different set of skills. These

In general, the focus of training is moving from the individual to the

We will have new roles and relationships within the organization.

• Visualizing the strategy of the organization

• Assessing the current state of the workforce

• Developing a strategy to develop the competence needed to meet the

• Implementing the strategy

• Evaluating the process

• Business Unit Manager and manager of a virtual organization with

• Diversity Advocate, facilitator of mUltiple voices, conflict resolver, and

Conceptually, training professionals have a different relationship with

A new definition of customer is brought about by the changed organizational

These changes have resulted in increased pressure on the training function to

The literature of training evaluation provides a framework to answer these

ago Donald Kirkpatrick (1994) provided a framework of four levels of

However, these practices do not, in the author's opinion, represent a choice.

Kirkpatrick's Level II evaluation (learning) has probably become less

Level III measures performance, or the application of knowledge, and Level

Data show that evaluation is predominately performed at Level I. In ASTD's

• Impact on the business problem