Editors:
George F. Madaus, Boston College,
Chestnut Hill, Massachusetts, U.S.A.
Daniel L. Stufflebeam, Western Michigan
University, Kalamazoo, Michigan, U.S.A.
Smith, M.:
Evaluability Assessment
Ayers, J. and Berney, M.:
A Practical Guide to Teacher Education Evaluation
Hambleton, R. and Zaal, J.:
Advances in Educational and Psychological Testing
Gifford, B. and O'Connor, M.:
Changing Assessments
Gifford, B.:
Policy Perspectives on Educational Testing
Basarab, D. and Root, D.:
The Training Evaluation Process
Haney, W.M., Madaus, G.F. and Lyons, R.:
The Fractured Marketplace for Standardized Testing
Wing, L.C. and Gifford, B.:
Policy Issues in Employment Testing
Gable, R.E.:
Instrument Development in the Affective Domain (2nd Edition)
Kremer-Hayon, L.:
Teacher Self-Evaluation
Payne, David A.:
Designing Educational Project and Program Evaluations
Oakland, T. and Hambleton, R.:
International Perspectives on Academic Assessment
Nettles, M.T. and Nettles, A.L.:
Equity and Excellence in Educational Testing and Assessment
Shinkfield, A.J. and Stufflebeam, D.L.:
Teacher Evaluation: Guide to Effective Practice
Birenbaum, M. and Dochy, F.J.R.C.:
Alternatives in Assessment of Achievements, Learning Processes and Prior Knowledge
Mulder, M., Nijhof, W.J. and Brinkerhoff, R.O.:
Corporate Training for Effective Performance
Britton, E.D. and Raizen, S.A.:
Examining the Examinations
Candoli, C., Cullen, K. and Stufflebeam, D.:
Superintendent Performance Evaluation
Evaluating Corporate Training:
Models and Issues
edited by
Stephen M. Brown
Sacred Heart University
and
Constance J. Seidner
Digital Equipment Corporation
Contents
Acknowledgments
Preface
Stephen M. Brown
Oliver W. Cummings
Barry Sugarman
Donald L. Kirkpatrick
Jack J. Phillips
Robert O. Brinkerhoff
Wilbur Parrott
Susan Ennis
Larry Leifer
David J. Basarab
Finally, we want to thank our families, Robert, David, and Glen Seidner and
Kathy, Jonathan, and Jared Brown for their support, patience, and
understanding throughout this endeavor.
Preface
We are glad to have the opportunity to work together again in the planning and
preparation of this edited volume on the evaluation of corporate training. Our
respective professional careers have provided us with experience in this area,
both as practitioners and as academicians. It is from both of these perspectives
that we approached the preparation of this volume. Our purpose is to provide
training professionals in business and industry, and students of human
resources development with an overview of current models and issues in
educational evaluation.
The book is organized around three themes: context, models, and issues.
The chapters in the context section are intended to provide the reader with an
understanding of the social, organizational, and interpersonal factors that
provide background and give meaning to evaluation practice. The models
section brings together contributions from some of the most influential thinkers
and practitioners in the field. The chapters in this section provide perspective
on the dominant themes and emergent trends from individuals who have been,
and continue to be, the drivers of those trends. Contributions to the issues
section highlight some pervasive themes as well as illuminate new areas of
concern and interest that will affect how we assess learning interventions in the
organizations of today and tomorrow.
The first section of the book is designed to describe the context within which
corporate training programs are evaluated. It presents points of view on the
external environment, the corporate setting, the clients and users of training,
and an organizational learning model to which many corporations currently
aspire. Through these chapters, the reader can develop an understanding of
some of the contextual factors that affect the evaluation of corporate training.
The first chapter, written by Stephen M. Brown, describes the rapid and
significant changes currently happening in the external environment. Dr.
Brown documents these changes and analyzes their ramifications for training
evaluation. He then outlines the types of evaluation needed in the new context.
The chapters in this section work together to create a picture of the changing
context within which training evaluation is practiced. The ideas presented in
the chapters provide multiple lenses with which to view the evolving corporate
landscape, and as such can inform the theory and practice of evaluation.
1 THE CHANGING CONTEXT OF
PRACTICE
Stephen M. Brown
The demands on all professionals are more complex and faster changing, and
allow less room for error, than ever before. Professionals are asked to develop
solutions to a new set of questions, using new information, in increasingly rapid
time frames. They do this in a context that is itself changing and is understood
only in the incomplete state in which it has revealed itself. Wheatley (1991)
uses the analogy of the shift in physics from a Newtonian model to a model
based on quantum physics. That level of change, in which the entire frame of
reference shifts, is analogous to what is happening in our current practice. This
change of the frame of reference makes all of our explanations and
understanding profoundly limited.
Demographics
Who America is as a people has changed and continues to change. This means
our customers (clients) and workers are changing. This, of course, has profound
implications for our organizations and work. The demographic changes, which
are mirrored in the other advanced industrial nations, are: aging of the
population and the workforce, fewer entry-level employees, and a greater
diversity in the workforce. Another dimension to this change results from the
internationalization of the economy. This has meant that our workforce,
customers, and organizations are by their very nature multicultural and
multinational. At a minimum our competition is, too.
The people born during the post-World War II baby boom make up an
extraordinary percentage of the population (about one-third of the total
population, and an even higher percentage of the current workforce). They are
aging. They are also immediately followed by a generation that is relatively
small, a result of lowered fertility rates among the baby boomers,
particularly among white suburban dwellers. The interaction of these factors
with the considerably longer life spans produced by advances in medical
technology has resulted in an aging population, an aging workforce, and fewer
first-time entrants into the workforce. The median age of the American
workforce was 28 in 1970; by 1990 it had risen to 33; it will continue to rise
to 39 in 2010 and peak at 42 in the year 2040 (U.S. Senate Select Committee
on Aging, 1988). It is noteworthy that first-time entrants to the workforce
have traditionally been the source of newly educated, relatively cheap, and
energetic labor.
The lowered fertility rate of white suburban baby boomers, the relatively
higher fertility rates of African Americans and Hispanics, and relaxed
immigration laws have combined to make our population and workforce much
more diverse than ever before. The higher participation rate of women in the
workforce has given this diversity an additional dimension. Attempts to
understand the ramifications of this diversity are hampered because we view it
through the lens of the old context. So, industrial America attempts to
understand the ramifications of organizations in which white males are no
longer the majority. The real news is that no group is the majority, and the
population has become so diverse that there is no common definition of
minority. Our workforce has obvious differences of race, gender, language,
country of origin, and culture. Very few universal assumptions about the values,
experiences, and motivation of our workforce can be made.
These changes have resulted in a workforce that is more diverse, and may
even further extend the popular meaning of the word. The workforce is aging
and more experienced, but the experience is in a world that is fading from
view. Our workforce, customers, and competition are often separated by
geography, culture, and language. The most basic assumptions about our
workforce, organizations, and practice need to be examined as they often are
predicated on a more homogeneous population. Even our theories of learning,
upon which most of the practice of training professionals is based, remain
relatively untested against this newly defined population.
Information Explosion
Many of these changes are being fueled by rapid advances in computer and
communications technology. These advances are unprecedented in their
constancy, rapidity, and implication. They have resulted in computer
technology becoming small and cheap enough to be available to the majority of
workers. In 1993, 45.8 percent of workers reported that they use computers on
their jobs (Bassi, Benson & Cheney, 1997, p. 33), and the number continues to
rise. Advances in communications technology make the global economy
possible. These technologies have increased output per employee and support
new organizational structures. Technological advances have also supported
improvements in customer service.
Economic Shifts
Our economy is global. Our customers and competition are international. This
has forced us to meet international standards of quality, customer service, and
value. The new customer has high expectations. Fast-moving communications
technology and international competition mean that any innovation will soon be
produced better, faster, or cheaper. Knowledge creation and application are the
competitive advantage of the new economy.
Our organizations have become flatter, less hierarchical, and more customer
driven. The training organization often has fewer full-time personnel to perform
more tasks.
more tasks. The American Society for Training and Development's (ASTD)
Benchmarking Forum reports that the number of employees per training staff
member increased 10 percent from 1994 to 1995 (Bassi, Benson & Cheney,
1997, p. 3). The new structure has internal staff performing tasks that are part
of the organization's core capabilities, and contracting for the others. The
training organization will increasingly become a networked or virtual
organization with the ability to access incredible expertise, but it will have
fewer full-time, permanent employees.
Training
Training will have to adjust to new roles and expectations in organizations. The
organizations have changed. The workforce has changed. The context of
practice has changed. The customer and products have changed. Thus, training
will have to change to be effective. There is a demand for justification of
training expenditures and initiatives. This has led to traditional classroom
training becoming an intervention of secondary resort. Importantly, it has also
led to the need to demonstrate training activities' impact on strategic
initiatives, core organizational capabilities, organizational effectiveness, and
the bottom line.
The most important and fundamental change will be a shift from training to
learning. Filling seats and hours will not be our task. However, leading people
through the changes and helping them adapt to new ways of doing business will
be.
• Institutionalizing learning
This often occurs in a context in which the required tasks have increased,
the core staff who perform these tasks has decreased, and the workforce is
enveloped by change.
Training Expectations
The organizational expectations for training have also shifted dramatically. The
most pronounced change is a new and vigorous demand to justify the cost of
training based on return or organizational impact. This is being driven by the
competitive cost structures of the international economy and the resulting
organizational structure, which is flatter, leaner, and carries very few
administrative costs.
Often training professionals are being asked to do more, because they have a
more important role in the strategy of the organization. The ability to generate
and apply knowledge is a competitive advantage and source of new products,
services, and revenue. These changes and the changing organizational context
have created new roles for trainers. Some of these roles are
Evaluation
environment, you need to do something with this knowledge. The new business
environment is all about performance.
Integrating the concepts of the cited authors, there are probably three kinds
of evaluation data needed today. They measure
• Customer satisfaction
• Business impact
• Return on investment
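To make these three kinds of data concrete, here is a minimal sketch of how a training group might record them for a single program. It is illustrative only: the field names, rating scale, and threshold values are assumptions, not part of the chapter.

```python
from dataclasses import dataclass

@dataclass
class ProgramEvaluation:
    """One program's results on the three kinds of evaluation data."""
    program: str
    customer_satisfaction: float  # mean rating on an assumed 1-5 survey scale
    business_impact: float        # change in the targeted business metric
    roi_percent: float            # return on investment, as a percentage

def needs_attention(ev: ProgramEvaluation,
                    min_satisfaction: float = 4.0,
                    min_roi: float = 0.0) -> bool:
    """Flag programs that miss the (hypothetical) satisfaction or ROI thresholds."""
    return ev.customer_satisfaction < min_satisfaction or ev.roi_percent < min_roi

# Example with invented numbers:
ev = ProgramEvaluation("Call-center coaching", customer_satisfaction=4.3,
                       business_impact=-0.12, roi_percent=35.0)
print(needs_attention(ev))  # False: this program meets both thresholds
```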
The tasks for training are to satisfy customers and meet their expectations,
provide solutions to business problems with which they are presented, and
contribute to the profitability and the mission of the company. Everything that
training does should contribute to one of these tasks. These are the same tasks
of all other business units in a corporation. There should be no more
discussions about training being more like a business. If the training unit is in a
Customer Satisfaction
The evaluation of customer satisfaction is a little more complex than was once
thought. The definition of customer has been expanded. Not only are the
employees participating in the training customers; in this new
organizational configuration, the business unit with the problem and the
business unit manager are also customers.
Business Impact
This level of evaluation is the one that is usually the most important to the
business unit manager. It answers the question, Did the training make a
positive difference in the business problem I have? Doing this level of
evaluation requires working with the business unit manager to identify the
Return on Investment
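The calculation most widely used at this level expresses net program benefits as a percentage of program costs. Below is a minimal sketch; the formula is the standard one in training evaluation, but the dollar figures are invented for illustration.

```python
def training_roi_percent(benefits: float, costs: float) -> float:
    """Standard training ROI formula: net benefits over costs, as a percent.

    ROI(%) = (benefits - costs) / costs * 100
    """
    if costs <= 0:
        raise ValueError("program costs must be positive")
    return (benefits - costs) / costs * 100

# Invented example: a program costing $80,000 whose isolated,
# converted-to-money benefits are estimated at $200,000.
benefits, costs = 200_000.0, 80_000.0
print(f"benefit/cost ratio = {benefits / costs:.2f}")          # 2.50
print(f"ROI = {training_roi_percent(benefits, costs):.0f}%")   # 150%
```

In practice the arithmetic is the easy part; isolating the effects of training and converting benefits to money is the difficult step, which is why this level of evaluation is attempted selectively.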
Conclusion
These factors have resulted in new roles for training organizations, such as
change consultants, vendor managers, and information synthesizers. New
organizational structures have been adopted that are smaller and more flexible
and have permeable boundaries with external vendors. Training is seen as one of
many performance-enhancing interventions. There is movement away from the
classroom toward less costly, more decentralized delivery, including
electronically distributed delivery.
To meet these challenges, evaluators must move with the environmental and
organizational changes. These changes create a need to look at evaluation
differently. Evaluation of training must be multilevel, customer focused, and
support continuous improvement of training. Evaluation should demonstrate its
effect on a targeted business problem and on ROI. The challenge is to provide
References
Bassi, L.J., Benson, G., & Cheney, S. (1997). The top ten trends. Training and Development Journal, 50(11), 28-42.
Brinkerhoff, R. (1987). Achieving results from training. San Francisco, CA: Jossey-Bass.
Gaines Robinson, D., & Robinson, J. (1994). Performance consulting: Moving beyond training. San Francisco, CA: Jossey-Bass.
Gaines Robinson, D., & Robinson, J. (1989). Training for impact. San Francisco, CA: Jossey-Bass.
Kirkpatrick, D. (1994). Evaluating training programs: The four levels. San Francisco, CA: Berrett-Koehler.
U.S. Bureau of the Census. (1993). Levels of access and use of computers: 1984, 1989, and 1993. Washington, DC: U.S. Government Printing Office.
U.S. Senate Select Committee on Aging. (1988). Aging America: Trends and projections, 1987-88. Washington, DC: U.S. Department of Health and Human Services.
Wheatley, M.J. (1991). Leadership and the new science: Learning about organization. San Francisco, CA: Berrett-Koehler.
2 ORGANIZATIONAL STRATEGY AND TRAINING EVALUATION

Introduction
From a strategic perspective, both the corporation and each business unit must
make decisions about how to invest limited resources to maximize gain over
short-term and long-range horizons. Senior managers set goals for performance
in such areas as financial results, customer service, new product development,
supply chain management, technology transfer, and workforce productivity.
While these goals frame expectations for performance, they do not necessarily
show how they can be achieved or how to balance conflicting goals. This is
particularly true when intangible assets of a business may be important in
reaching goals. Such assets include the effectiveness of internal business
processes, and learning and growth by individuals and by the organization as a
whole (Kaplan & Norton, 1996a, 1996b).
Kaplan and Norton (1992, 1996a, 1996b) developed the Balanced Scorecard
as a framework for translating strategy into operational terms and actions, and
for measuring business performance. They provide a way of thinking about
strategic implementation that considers both traditional measures such as
financial and customer/market performance, and less tangible assets related to
internal business processes and organizational learning and growth. As Figure
2-1 shows, the process begins with senior managers who set corporate vision,
translate strategies into overall business objectives, and communicate these
strategic objectives to the business units. Senior managers must ensure that
strategic initiatives are aligned across business units and overall business
performance targets are clearly understood. Successful communication of
corporate goals and new strategic initiatives involves educating managers and
employees in the business units, and obtaining ongoing feedback on how
strategy is being implemented and on how organizations are changing to meet
strategic requirements.
Figure 2-1. Four processes arranged around the Balanced Scorecard:
• Clarifying and translating the vision and strategy: clarifying the vision; gaining consensus
• Communicating and linking: communicating and educating; setting goals; linking rewards to performance measures
• Planning and target setting: setting targets; aligning strategic initiatives; allocating resources; establishing milestones
• Strategic feedback and learning: articulating the shared vision; supplying strategic feedback; facilitating strategy review and learning
Reprinted by permission of Harvard Business Review. An Exhibit From "Using the Balanced
Scorecard as a Strategic Management System," by Robert S. Kaplan and David P. Norton,
(January-February 1996): 77. Copyright ©1996 by the President and Fellows of Harvard
College, all rights reserved.
Individual business units then translate broad business objectives into a linked
set of measures and activities for achieving them within the unit.
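The paragraph above describes how a business unit translates broad objectives into a linked set of measures across the scorecard's four perspectives (financial, customer, internal business process, and learning and growth). As an illustration only, the sketch below models that structure in code; the class names, field names, and sample entries are our assumptions, not Kaplan and Norton's notation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

PERSPECTIVES = ("financial", "customer",
                "internal business process", "learning and growth")

@dataclass
class Objective:
    name: str                       # strategic objective, e.g. "improve quality"
    measure: str                    # indicator used to track it
    target: float                   # performance target for the period
    actual: Optional[float] = None  # filled in as strategic feedback arrives

@dataclass
class Scorecard:
    unit: str
    objectives: Dict[str, List[Objective]] = field(
        default_factory=lambda: {p: [] for p in PERSPECTIVES})

    def add(self, perspective: str, obj: Objective) -> None:
        if perspective not in self.objectives:
            raise ValueError(f"unknown perspective: {perspective}")
        self.objectives[perspective].append(obj)

# Invented example for one business unit:
card = Scorecard("Consumer Products Division")
card.add("learning and growth",
         Objective("build strategic skills",
                   "training hours per employee", target=40))
card.add("internal business process",
         Objective("improve quality", "defects per million", target=500))
```

On this view, training and its evaluation would typically appear under the learning-and-growth perspective, with its measures linked upward to internal-process and financial objectives.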
The feedback and learning dimension of the balanced scorecard offers one of
the best perspectives for understanding how learning and workforce
development, including training and training evaluation, contribute to
realization of business strategy. Without sufficient infrastructure and support
for individual and organizational growth, achievement of other business
objectives becomes unlikely. Success in any area of strategic importance is
highly dependent upon the capabilities of the organizations involved and on the
ability of organizations and employees to learn and change.
As a change strategy, training tends to be received more positively by the
workforce than other approaches to change that may be more
wrenching. In addition, while training is gradual, incrementally changing the
ways in which individuals, teams, and organizations behave, some effects are
felt immediately. As soon as the first training is delivered, some of the
workforce begins to develop new attitudes, behaviors, and skills, and to
incorporate these learnings into their work. Finally, a special advantage of
training is that it can be modified while in progress, if it is not as effective as
desired. This allows for incremental adjustments and continuous improvement
of training, even as strategic change is in progress.
Training was a key component of the quality improvement program that turned
Xerox around in the mid 1980s. The Xerox change strategy involved driving a
Leadership Through Quality initiative across the organization. Successful
implementation of the initiative required sweeping changes in management
practices as well as cultural norms. To enable these changes, quality training
was cascaded throughout the organization, beginning with the CEO, David
Kearns, and reaching all employees in the corporation. Managers who received
the training then delivered it to the employees in their respective work groups.
Anyone in a managerial or supervisory position passed through the training
twice, first as a trainee, and then as a trainer. This top-down approach to
training was seen as critical in successfully implementing the turnaround
strategy (Kearns and Nadler, 1992).
Figure (partially recoverable course development flow): ... instructors ... 4. obtain feedback/evaluation → 5. pilot → 6. adjustments → 7. run course.
Evaluation data and conclusions about the impact of individual training programs may benefit
broader workforce development initiatives.
Two trends have the potential to foster a strategic partnership between the
training function and corporate leaders. First, corporate leaders are increasingly
realizing that the success of change initiatives depends on the capabilities and
commitment of the workforce. Kanter and her colleagues (Kanter, Stein & Jick,
1992) identify training as an important element in the long march toward
significant and lasting change. They suggest that when companies do not invest
sufficient time and resources to upgrade people's skills, change programs can
be undercut or even destroyed.
The best outcome as these two trends come together would be the emergence
of a stronger, more organic relationship between management and training.
The more closely and actively trainers are aligned with change strategists, the
more effective they will be in providing the critical link between the strategists
and the recipients of the change initiatives, the people in the organization on
whom the future depends.
Implementing change is a complex and dynamic process that usually calls for
extensive collaboration across functions and organizations. The Big 3 Model
proposed by Rosabeth Moss Kanter and her colleagues (Kanter, Stein & Jick,
1992) identifies three action roles in the process: change strategists,
implementers, and recipients. People who play these roles need to work
together, taking a broad systems view that recognizes the complex
interrelationships within organizations. They need to create the vision and
provide the infrastructure and business practices required to enable change.
Evaluators can strengthen this link by asking hard questions about what the training is being designed to
accomplish. Is it targeted toward fostering the behaviors and mindsets that are
strategic to the company's direction? Will strategic training translate into
changes in the individual and collective competencies of the workforce?
Training that isn't producing any discernible business impact may not be
targeting the "differentiating" behaviors (Ennis, 1997) needed for company
success.
Figure (recovered outline): within the external and organizational environments, the process runs from Step 1, strategic goals, objectives, and business plans, through Steps 2, 3, 4, and 5.
By identifying indicators that make those linkages real, and generating data that
confirm a causal relationship, evaluators provide tangible evidence of the
value of training in achieving business goals.
Valid Criteria
The evaluator must make sure that what is measured is not whatever is
easiest to measure but is a true indicator of the performance objectives for the
intervention. Meaningful evaluation criteria and legitimate measures of
performance must be established early in the change effort, for spurious
measures and invalid data in the midst of implementing a change initiative can
have disastrous and costly effects.
Evaluation practices also send subtle messages to employees about the larger
organization and their place in it. These messages need to be consistent with
the values and assumptions underlying the change effort. For example, one of
the assumptions underlying the theory of learning organizations is that
individuals are self-directed, that they want to feel competent and to achieve
personal mastery (Senge, 1990). Practices that encourage self-assessment
against personal goals are therefore compatible with the values of a learning
organization. Evaluators in these organizations might encourage people to
document what they do and reflect on outcomes of their behaviors, so that they
can systematically and continuously learn from their experience.
The way organizations use evaluation data also sends subtle messages that may
affect behavior, and consequently the validity of the data itself. For example,
using student feedback to improve the quality of training sends a message to
instructors and participants that training is important, and that their honest
feedback is valued and used. When the same data are used to determine
instructor salary and promotions, a different message is sent, promoting
different behaviors, particularly on the part of instructors. For their own
livelihood, instructors will be motivated to influence students to rate their
courses highly. As a result, the quality of the training may be overrated, and
needed improvements will go undetected. Evaluators have the opportunity to
encourage decision makers to think about the subtle and not-so-subtle messages
they send to employees by the strategies they employ, the data they collect, and
the use that is made of that data. Those messages should be consistent with the
behaviors and attitudes the organization intends to promote.
The role of evaluation and the power of the analytic approaches offered by
evaluators can extend beyond assessment of the value of specific training
programs during change. Evaluation processes offer insight into the nature and
functioning of the organization undergoing change. In an objective manner,
evaluation techniques and results can describe the true opinion of different
stakeholders, performance improvements of different participants, and the
degree to which desired business outcomes are reached. Such information can
lead to a reassessment not only of training and training strategies but also of
change strategies implemented in part through training programs. Thus,
evaluation can contribute to double-loop learning (Argyris, 1990; Argyris &
Schon, 1996) as much or more than to single-loop learning in organizational
change. (See Figure 2-4.)
Figure 2-4 (recovered outline): governing values drive actions, which produce mismatches or errors; single-loop learning feeds corrections back into actions, while double-loop learning feeds back into the governing values themselves.
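Argyris and Schon's distinction can be pictured as a small control loop: single-loop learning adjusts actions to reduce the error against a fixed governing value, while double-loop learning questions the governing value itself when mismatches persist. The sketch below is our analogy with invented numbers, not the authors' formulation.

```python
def single_loop(action: float, error: float, step: float = 0.5) -> float:
    """Single-loop learning: correct the action; the governing value stays fixed."""
    return action - step * error

def double_loop(goal: float, errors: list, tolerance: float = 1.0) -> float:
    """Double-loop learning: if errors persist despite corrections,
    revisit the governing value (here, simply lower the goal)."""
    persistent = all(abs(e) > tolerance for e in errors)
    return goal * 0.8 if persistent else goal  # illustrative revision rule

# A team repeatedly misses an invented throughput goal of 100 units/day:
goal, action = 100.0, 60.0
errors = []
for _ in range(5):
    error = action - goal                 # mismatch between action and goal
    errors.append(error)
    action = single_loop(action, error)   # adjust behavior only
goal = double_loop(goal, errors)          # mismatch persisted: question the goal
print(round(action, 1), goal)             # 98.8 80.0
```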
Double-loop learning requires stepping back and looking deeper for the
foundation of problems, particularly underlying values, norms, and beliefs
about how the world works (theory-in-use) that drive strategies and actions.
Such theories-in-use may be unspoken and not recognized as important to
A clear turning point came, however, when plans for a new cellular
manufacturing facility motivated inquiry into the math skills of the local
Motorola workforce. The selected workforce had experience with a similar but
older technology, had improved quality tenfold, and were continuing to
improve. The skills assessment was to determine whether workers needed any
further training to transition to production of the new cellular products. The
results were surprising: 60% of the local workforce had difficulty with simple
arithmetic. Even broader needs for basic skills education were uncovered
through the resulting math classes. Not only were employees having trouble
with math, but also many were having difficulty reading and, for some, even
comprehending English. The ramifications of illiteracy for job performance
were staggering and demanded attention. Clearly changing the theory-in-use
about the workforce, this discovery completely redefined workforce
development needs and training strategy corporatewide. Senior management
decided to invest in basic skills education by turning to local community
colleges for help, to review other areas of technical and business skills that they
had assumed the workforce to have, and, eventually, to build extensive
educational partnerships with schools and colleges, and to establish Motorola
University.
Summary
Looking beyond training, evaluation techniques and results offer insight and
feedback on the efficacy of broader change initiatives. Evaluation can document
the true opinion of different stakeholders, performance improvements of
different participants, and the degree to which desired business outcomes are
reached. Such information can lead to a reassessment not only of training and
training strategies, but also of change strategies implemented in part through
training programs. Thus, evaluation can contribute to double-loop learning-
learning about the underlying values, norms, and assumptions that drive
strategies and actions (Argyris 1990, Argyris & Schon, 1996)-as well as
learning about how to correct and improve practices and reengineer processes.
The challenge posed here is to look beyond the very narrow traditional
images of training evaluation to recognize the larger strategic value that
evaluation processes offer the corporation undergoing organizational change.
The decision-support potential of evaluation during implementation of change
initiatives cannot be denied and should not be overlooked.
Acknowledgments
References
Applied Global University. (1996). Santa Clara, CA: Applied Materials, Inc.
Argyris, C., & Schon, D. A. (1996). Organizational learning II: Theory, method, and practice. Reading, MA: Addison-Wesley.
Brinkerhoff, R. O., & Gill, S. J. (1994). The learning alliance: Systems thinking in human resource development. San Francisco: Jossey-Bass.
Gherson, D. J., & Moore, C. A. (1987). The role of training in implementing strategic change. In L. S. May, C. A. Moore, & S. J. Zammit (eds.), Evaluating business and industry training. Boston: Kluwer Academic Publishers.
Kanter, R. M., Stein, B. A., & Jick, T. D. (1992). The challenge of organizational change: How companies experience and leaders guide it. New York: The Free Press.
Kaplan, R. S., & Norton, D. P. (1992). The balanced scorecard: Measures that drive performance. Harvard Business Review, 70(1), 71-79.
Kaplan, R. S., & Norton, D. P. (1996a). The balanced scorecard. Boston: Harvard Business School Press.
Kaplan, R. S., & Norton, D. P. (1996b). Using the balanced scorecard as a strategic management system. Harvard Business Review, 74(1), 75-85.
Kearns, D. T., & Nadler, D. A. (1992). Prophets in the dark: How Xerox reinvented itself and beat back the Japanese. New York: HarperCollins.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco, CA: Berrett-Koehler Publishers, Inc.
Meyer, C. (1994). How the right measures help teams excel. Harvard Business Review, 72(3), 95-103.
Moller, L., & Mallin, P. (1996). Evaluation practices of instructional designers and organizational supports and barriers. Performance Improvement Quarterly, 9(4), 82-92.
Senge, P. (1990). The fifth discipline: The art and practice of the learning organization. New York: Doubleday.
3 WHAT STAKEHOLDERS WANT TO KNOW

Introduction
"Know me, know my business" was one of the catch phrases that grew out of
Peters and Waterman's (1982) work on excellence. This charge to service
providers, like evaluators, is easier to say than to achieve, but putting oneself in
the client's shoes-looking at their problems through their frame of
reference-can help in the effort to understand. This chapter provides a
framework for examining what clients of an evaluation function want to know,
what drives their needs, and how the needs can be addressed within the context
of sound business practice and professional standards.
For this discussion, stakeholders are not simply managers with decision-
making authority. Stakeholders are the potential users of evaluation results and
others who may be affected by them. Each stakeholder has a somewhat different
perspective and set of concerns. The most important training and development
stakeholders are participants, training sponsors, organizational management,
suppliers, training and development function management, intervention
developers, and personnel/HRD managers.
This is not to suggest that evaluators ignore other stakeholder needs, but to emphasize the
need for the evaluator to create a conscious focus and framework for the
evaluation. That focus is guided by the primary client, who in most cases must
also be somewhat concerned about other stakeholder interests as well as his/her
own.
A further set of stakeholders, beyond the scope of our analysis here, deserves
mention: communities and society at large. While our focus is on
evaluations initiated within a company, there is a need for much broader
assessments of the processes and outcomes of training, performance
enhancement, and education programs. As company training programs
proliferate and supplement more traditional schooling, as the adult's need for
learning new skills increases and is met, we need to look at the aggregate
impact on society.
These stakeholders all work or connect with a managed business and its
training and development function in some direct way. What is of concern to
the stakeholder changes as the contextual focus changes. The business context
is complex, and it is to the business management context we now turn.
Figure 3-1 (recovered outline): the four cornerposts of service, connected by linkages: need-service linking, expectations creation, client satisfaction, benefit/cost optimization, and client service-business objectives integration.
Cornerposts of Service
The stakeholders may be focused at any point in time on one or more of the
four cornerposts of service. The four cornerposts of the model are
• Target markets
• Core service definition and norms
• Operations management & systems
• Service delivery/distribution systems
Target markets are groups of clients that can be differentiated from others in
some meaningful way. In doing a training needs assessment, for example, one
evaluation task is to define a target audience for the training to be developed.
By carefully defining the audience for whom the training is best suited,
the training organization can simultaneously encourage its best clients (i.e.,
those most likely to really benefit from the training) to attend and discourage
others (in this case, those who do not need the training) from attending. To take
a broader scenario, a training department of a large organization may be able to
focus its resources on certain groups of training sponsors or organization
management through its budget allocations or through developing a critical
competence timed to meet that targeted market's needs.
Core service definition and norms encompass both what the function
delivers and how attitudes, processes, and tangible resources are aligned to
position training and development relative to its target markets. For example,
building a training center in which highly skilled instructors deliver programs
to groups of participants brought together to be trained and to reinforce
interpersonal networks in a company, represents a very different core service
definition and operates around different norms than creating and disseminating
point-of-need, technology-based, individual performance support.
While the four cornerposts define some anchor points in the management of
a service, the dynamic part of service management is handling the linkages
between the cornerposts. It is in the linkages that image is created, efficiency is
achieved, competing needs are balanced, and satisfaction is realized.
Linkages
Bringing together a target market with the core services offered involves
assessing the buyer's needs and positioning the core services and norms to
reduce those needs. Establishing management's overall aims and values is a
starting point that must be married to the training function's ability to present a
suite of particular services. In view of the company's objectives, functional
management for training and development should consciously decide how to
position the service. This positioning will communicate to the individuals in
the target market that they are important to the training and development
function and that the function can help them meet their needs. It will also
communicate to training and development employees that they can make a real
contribution and meet their own professional needs by addressing the potential
client's need.
The linkage between core services and operations management is where the
relationship of the benefits to be realized by the client and the costs to provide
the service that will produce the benefits is optimized. Stakeholder concerns
here are often related to methodological questions and process efficiencies.
Again to take the training example, a core service may be the development and
delivery of computer-enhanced point-of-need learning products. To be truly
successful, the levels of creativity, flexibility, and quality in the process must be
balanced against considerations such as the need to create and adhere to
standards, create and maintain reusable templates and computer code, and to
level staffing and workloads. The range of evaluation questions and data that
can be brought to bear on deciding the best balance of these competing needs is
very broad.
The tension between the operations management system and the service
delivery/distribution system involves balancing the value of services to the
client with the training and development function's business objectives. At the
extremes, the training client would like to receive just the right amount of
exceptionally high-quality training for free, while the training providers would
like to be paid a premium for doing just what they want to do. In practice, there
must be a balance between these extremes in which clients expect operations to
support and not interfere with their project and providers expect to achieve a
reasonable cost recovery or budget support as well as get deserved recognition
for delivering quality results. The stakeholder concerns at this point are often
focused on fairness, performance, and value. The challenge to the business
manager is to find or define standards that balance stakeholder power. The
challenge to the evaluator is to create measures that reflect the manager's intent
and are credible to the stakeholders.
Table 3-1 gives further examples of how evaluations help clients reduce
risks, increase efficiencies and differentiate themselves. The table illustrates
how different evaluation practices (needs assessment, formative evaluation, and
follow-up studies) can address the benefits clients seek.
Table 3-1. How evaluation benefits clients:

Benefit: Reduce risk.
Description of benefit: Evaluation will assure quality, so the stakeholder will know that the investment continues to be justified.
How it works: In needs assessment, systematic assessment of needs assures that the project will focus on fundamental questions and can be translated into high-quality follow-on products, reducing the risk that off-target interventions will be produced. In formative evaluation, validating fulfillment of the intervention's objectives and proper execution of approaches reduces the risk of program quality failure.

Benefit: Differentiate.
Description of benefit: Evaluation will improve or protect the client's reputation.
How it works: In needs assessment, positive reactions to an intervention are dependent upon getting the right intervention to the right clients in the right way. Realistic study design and good links with other important systems that are affected serve to enhance the training and development organization's reputation.
Stakeholders, with all their individually unique needs, are often unable to tell
the evaluator directly, clearly, and succinctly what they want to know. Yet
reflection and study have yielded some patterns that can enhance our
understanding and ability to respond appropriately. Stakeholders' questions
seem to sort into patterns along two main dichotomies. These dichotomies are
juxtaposed in Table 3-2 and will be further illustrated in Figures 3-2 and 3-3.
They are 1) effects vs. fidelity, and 2) micro concerns vs. macro concerns.
Table 3-2. Two dichotomies of stakeholder concerns:

Effects (results assurance) / micro evaluation: assessing results of a course, curriculum, or specific intervention.
Effects (results assurance) / macro evaluation: assessing results of a department or function.
Fidelity (process assurance) / micro evaluation: assessing processes related to a course, curriculum, or specific intervention.
Fidelity (process assurance) / macro evaluation: assessing processes related to department or function management.
Now we are ready to bring together this complex set of concepts to
form a set of hypotheses about what stakeholders want to know. The two
dichotomies, effects vs. fidelity and micro vs. macro evaluation, are used
to create and organize this set of hypotheses.
When the two dichotomies are set against each other, four quadrants, each
representing a different combination of types of concerns, are created. These
combinations determine the patterns of questions that stakeholders ask, and
each stakeholder brings his/her own perspective to what will be asked. Figure
3-2 shows a number of the driving concerns that different stakeholders have,
depending on whether they are more focused on effects or fidelity, and on
macro or micro elements.
Figure 3-2. Driving concerns of stakeholders, by quadrant:

Effects, micro evaluation:
• Participant: Job made easier, increased opportunity
• Program Sponsor: Change in behavior; evidence of success; justification of expense, penetration; meet specific objectives
• Organizational Management: Relevance, attendance, absence of complaint, transferable or redeployable resources
• Supplier: Client satisfaction
• T&D Function Management: Achievement of intended knowledge, skill, or attitude change
• Intervention Developer: Acceptance of the work, behavior change effected, participants recommend the program to others
• Personnel/HRD Manager: Absence of complaints, participants recommend the program to others, participants better able to do the job, easier or more flexible to assign

Effects, macro evaluation:
• Participant: Image of quality, reputation, use generally creates value
• Training Sponsor: Consistent quality, competitive advantages
• Organization Management: Volume or throughput, organizational differentiation in marketplace, improve promotability rate, better use lower level personnel for higher level work (leverage), comparative quality or efficiency
• Supplier: Organizational viability
• T&D Function Management: A good reputation in the organization and in the training & development professions
• Intervention Developers: A viable organization, a good reputation in the field, employee talent appropriately used in the work
• Personnel/HRD Manager: Participants in general and their supervisors respond favorably to the expenditure and time investment

Fidelity, micro evaluation:
• Participant: Relevance, ease of process & logistics, quality of the experience
• Training Sponsor: Good project management, no surprises in budget or process
• Organizational Management: No crises, meet approval process
• Supplier: Timeliness
• T&D Function Management: Absence of crises that command time from management on the project; ability to deploy resources easily, adequately to do the project
• Intervention Developer: Support needed to accomplish an acceptable product (access to content, client reviews, etc.)
• Personnel/HRD Manager: Ease in getting someone prepared for and participating in the intervention

Fidelity, macro evaluation:
• Participant: Consistency in service level, predictability
• Training Sponsor: Ease of working relationship, right and sufficient skill sets in place
• Organization Management: Efficiency, overall operational management to plan
• Supplier: Predictability
• T&D Function Management: Good estimating processes and process agility
• Intervention Developers: Management decisions that result in right resource deployment and in staff assignments both for success and for long-range development
• Personnel/HRD Manager: Systems in place to enable use of services when needed and to access the services easily
Both of the above examples are from the micro (course, specific curriculum,
or other specific intervention) side of the model. Turning to the macro
(function- or unit-level) issues, we find the participant concerned with
predictability and consistency of processes used by the training and
development function over several contacts or programs. From an evaluation
perspective, this concern can only be dealt with by looking across a variety of
programs and assessing the consistency with which services were delivered.
One tool here is to track performance against standards. Also on the results
side, the participant is concerned about the reputation of the training and
development organization. A reputation for delivering excellent training serves
some companies well, even making a positive difference in the recruiting
process. Unless the reputation is built on hype, which cannot be sustained in the
long run, it will be built on the aggregate of evaluations done, through
summaries of client satisfaction and other types of evaluation. Whether those
evaluations are systematic or not is a separate question in many companies, but
they can and should be systematic.
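As one concrete way of looking across programs, the sketch below aggregates end-of-course satisfaction ratings and checks each program against a service standard, treating consistency (low spread) as part of the standard. The ratings, the 4.0 standard, and the 0.5 spread threshold are all invented for illustration.

```python
from statistics import mean, pstdev

# End-of-course satisfaction ratings (assumed 1-5 scale) by program; numbers invented.
ratings = {
    "New-manager basics": [4.6, 4.4, 4.7, 4.5],
    "Spreadsheet skills":  [3.4, 4.8, 3.3, 4.9],
    "Quality tools":       [4.1, 4.2, 4.0, 4.3],
}

STANDARD = 4.0    # hypothetical service standard for mean satisfaction
MAX_SPREAD = 0.5  # hypothetical consistency threshold (standard deviation)

for program, scores in ratings.items():
    avg, spread = mean(scores), pstdev(scores)
    ok = avg >= STANDARD and spread <= MAX_SPREAD  # consistency, not just the mean
    print(f"{program:20s} mean={avg:.2f} sd={spread:.2f} meets standard: {ok}")
```

In this invented example the second program meets the mean standard but fails on consistency, the kind of pattern only a cross-program (macro) view surfaces.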
While participants are very focused on the training event and the processes
and outcomes related to it, training sponsors and organization management
have broader views that also encompass the program design and development
process. Starting again with the fidelity issues on the micro-side, the training
sponsor is concerned that the project management is well executed. Sponsors
want no unpleasant surprises. Organizational management, likewise, wants to
be untroubled by individual projects after they are approved-thus they want no
Figure 3-3 (effects portion). Evaluation tools by scope:

Effects, micro evaluation:
• End-of-course questionnaires & interviews
• Tests
• Self-reported application results (questionnaires, interviews, observations)
• Supervisor/peer/subordinate ratings
• Correlational studies of training performance and key business variables
• Program records analysis

Effects, macro evaluation:
• Benchmarking
• Financial accounting/analysis
• Performance measurement
• Image surveys
• Client satisfaction summary
The evaluation challenges posed by the kinds of concerns noted in the macro
columns of Figures 3-3 and 3-4 ask the evaluator to take a business function
view. This drives evaluators to strive to migrate their services from specific
courses or very narrow program scopes to programmatic and function concerns,
as has been recommended by Cooley (1984), Cummings (1986), and others.
This shift may be seen by some as beyond the evaluator's scope of service.
Definitions of evaluation have been offered by Krathwohl (1993), Brinkerhoff (1987), and Basarab and Root (1992). For our
purposes, a definition of evaluation adapted to business is
Some would contend that anyone who gathers data and applies that data to
make judgments about a program or some other object is an evaluator. If,
however, a definition of disciplined inquiry applies, an evaluator must bring a
substantial level of rigor to the inquiry, build evidence according to the rules of
evidence, and then draw conclusions that lead to judgments of quality,
effectiveness, or value. The best of fact-based decision making often occurs
when an evaluator has been involved in the process.
Two core skills the professional evaluator shares with other professionals
and managers are project management and interpersonal skills. There are,
however, three other key differentiating skill sets that the evaluator brings to a
project:
1. The objectivity of the data should assure that management or client beliefs
and wishes do not bias the results. The implication is that the evaluator
needs constructive, if not literal, independence from the client.
2. Sufficient quality is a subjective issue that frequently produces conflict.
Most often managers/clients do not have the same level of understanding
of or concern about sampling theory, test design, questionnaire design,
content analysis, etc. as do the evaluators and, therefore, do not see the
need for investing time and money at the levels the evaluator believes are
necessary. On the other hand, some evaluators approach each study as if it
should produce close to flawless, unassailable results. Finding the balance
between the criticality of the decision and usefulness of the evaluation data
is a challenge in the design of each new project the evaluator takes on.
3. Drawing supportable conclusions ties back to the quality of the data and
objectivity, but it also requires a broader understanding of the client and
their situation. In drawing conclusions, the evaluator must place relative
value on alternative decisions. In most cases the evaluator, going back
again to our definition, will make assertions about those alternative
decisions or will through the evaluative process make direct
In the circle at the center of Figure 3-4, some of the key success elements for
the evaluator are presented under the most directly affected stakeholder. The
stakeholders' wants, laid out in the earlier parts of this paper, are serviced by
the evaluator meeting these success factors and attending as well to those under
the four cornerposts: target markets, core service definitions and norms,
operations management and systems, and service delivery/distribution systems.
In the realm of target market, the stakeholders want to know that you have
picked them well and that you know enough about them and their unique needs
to be of service. Relative to your core service definitions and norms, they want
to know that your staff is competent in their disciplines and that the services
offered are beneficial to the stakeholders. Under operations management and
systems, they want to be assured that you operate by a set of ethical and
professional standards and that you function efficiently. And in regard to the
References
Basarab, D. J., & Root, D. K. (1992). The training evaluation process. Boston, MA: Kluwer Academic Publishers.
Bowen, D. E., Chase, R. B., Cummings, T. G., & Associates. (1990). Service management effectiveness: Balancing strategy, organization and human resources, operations, and marketing. San Francisco: Jossey-Bass.
Cooley, W. W. (1984). The difference between the evaluation being used and the evaluator being used. In S. J. Hueftle (ed.), The utilization of evaluation: Proceedings of the 1983 Minnesota Evaluation Conference (pp. 27-40). Minneapolis, MN: The Minnesota Research and Evaluation Center.
Cronbach, L. J., & Suppes, P. (1969). Research for tomorrow's schools: Disciplined inquiry for education. New York: Macmillan.
Heskett, J. L., Sasser, W. E., & Hart, C. W. L. (1990). Service breakthroughs: Changing the rules of the game. New York: Free Press.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler Publishers, Inc.
Maister, D. H. (1993). Managing the professional service firm. New York: Free Press.
Schultze, C. (1996, November 5). Promises, promises: The elusive search for faster economic growth. International Political Economy, 3(20), 6-10.
Tate, D. L., & Cummings, O. W. (1991). Promoting evaluations with management. New Directions for Program Evaluation, 49 (Spring), 17-25.
Zeithaml, V. A., Parasuraman, A., & Berry, L. L. (1990). Delivering quality service: Balancing customer perceptions and expectations. New York: Free Press.
4 THE LEARNING ORGANIZATION: IMPLICATIONS FOR TRAINING

Introduction
The second way this learning organization model affects our thinking about
the assumed relationships between training, learning, and work involves
highlighting the amount of learning that occurs as part of work itself. In place
of the old model, which was first learning, then work, we now have the new
model: first learning, then work-which-includes-continuous-learning. We are
not just learning to do the work better; we are building the organization's
knowledge base and revising its tools, processes, and products, as we work.
This, again, is what organizational learning means. This creates many new
potential areas for useful training by those who understand what is supposed to
happen here.
The huge environmental changes that are going on around us can be viewed
in terms of new organizational models, as we have been doing; they can also be
viewed in terms of the nature of work and the new importance of knowledge
and information in that work process. The information economy means quite
concrete changes in job demands. Smart work is breaking out all over; skill
levels have risen universally. Workers are required to handle far more
information, not just looking up one database but sometimes knowing how to
find the appropriate one that may be located somewhere else, and applying the
information to solving customers' problems. The concierge of a big-city hotel
could be the template for many of these smart jobs. Discussion of the rising
importance of knowledge workers used to refer to scientists, creative artists,
and other highly educated professionals, but now the knowledge invasion has
swept through most jobs at all levels. Even the janitor must know about many
different cleaning chemicals, power tools, and specialized equipment, as well as
having detailed knowledge of the buildings, their layout, utilities, and the
special needs of different areas. The skills of the travel reservation worker may
start to emulate those of the reference librarian. As automated phone systems
and automated directory information systems replace human operators, those
operators and receptionists who remain must handle more demanding inquiries,
i.e., less routine questions that require problem-solving skills.
The times of profound change in which we live involve both changes in the
nature of work and changes in the nature of organizations, including the ways
in which work is managed through those new kinds of organizations. In this
brief introduction we have noted some of the changes in the nature of work. We
turn next to analyze the changes in organizational structures and processes. In
the next section, we will review a body of research that compares organizations
that represent both the old and new models, and draw some specific points of
contrast between them. We will review a literature that has tried to draw a
broad blueprint for the new organization, based on consultation and
intervention in various organizational change projects in cooperation with their
leaders. That section leads into a major learning organization case study. The
third major section will be a review of some of the key concepts of
organizational learning that underlie all of this work. Finally, we examine
some major implications of these conceptual models for assessment and
evaluation.
Observers of the business scene in the United States and the Western world in
general have been telling us that something fundamental is changing, and they
have been telling us this for more than twenty years. They have based this
conclusion on the study of newly successful companies and those whose success
has come to an end, attempting to profile the differences. There has been a
steeply rising curve of book sales on this topic, magazine articles, conferences,
consulting revenues, and all of the usual indicators. This essay will look in turn
at several aspects and implications of this phenomenon. To begin, we look at
new ideas about the new organization. Book lists in the 1990s are teeming with
management books touting new-type organizations by many names:
adhocracy, flexible organization, organismic organization, virtual organization,
Along with these ideas about the new organization come ideas about what
exactly is wrong with the established bureaucratic model, and sometimes ideas
for new models of organization. The new models are all "unbureaucracies," for
the aim of the new organization is to escape from the limitations of the
bureaucracy, especially its resistance to innovation. If the revolt against
bureaucracy peaked in the nineties, it had been building up for some time. As
early as 1967 Burns and Stalker compared two UK samples of mechanistic and
organismic companies and found the latter significantly more innovative. At
about the same time Lawrence and Lorsch also conducted comparison studies
of U.S. companies, classifying them in a rather similar way.
These researchers were far ahead of the state of opinion among most
managers and received little attention, except from some scholars. By the late
1980s, though, the climate of popular interest had changed drastically. Writers
in this area also benefited from the change in public attitudes to facing the
weaknesses of U.S. management. In 1979 and 1983, Kanter published her
studies of companies, trying to understand how some companies were more
successful than others in producing innovations. The turning point in public
receptivity came in the early 1980s as new Japanese competitors began to
outclass and humiliate some large, established Western corporations, especially
in the automobile and consumer electronics industries (Ouchi, 1981; Athos and
Pascale, 1981).
The concern over innovation thus goes back at least to the 1960s but it took
a long time to get any serious attention. There was great resistance to the
shocking message that the basic model of organization and management for all
large business corporations in the Western world is deeply flawed and needs to
be changed significantly. This message threatened to undermine the security of the status quo and its beneficiaries, who saw no reason to overcome their disbelief. When the icy drafts of cold reality hit some major Western companies in the late 1970s, the problem was first defined as one of Japanese competition and product quality. Thus Quality Management approaches commanded some attention, and U.S. quality experts could finally get work in the United States, after returning from Japan, where they had found capable followers who had been implementing their ideas since the 1950s.
Quality thinking about management and organization fits well with the
organismic model and opposes some fundamental postulates of bureaucratic
management. It can be a useful platform from which to rebuild corporate
designs in a direction more favorable to quality-enhancing innovations. The
radical refocusing of all discourse around the primacy of the customer, which is
fundamental to TQM, can be a powerful force for flexibility. In practice,
though, many of those managers who claim to have taken up the quality
challenge have narrowed its definition to make sure that it does not go beyond
the improvement of concrete production processes with extensive cross-
boundary involvement at operating levels. Anything that might touch on
corporate policy and strategy or affect the balance of power between
departments, functions, and other power bases is carefully avoided. So there is
much unfinished work left from the quality movement, especially for those who
take seriously the theories of Deming about workers' motivation (Senge, 1992).
Although it was some Japanese companies and their managers that grabbed
the attention of the Western management world, the challenge for the West is
not how to make our managers and corporations into facsimiles of the
successful Japanese ones of the 1980s. Although understanding those
differences is an essential first step, it is just the beginning. Context counts and
history moves on. In the 1980s Japanese companies found a way to outcompete
the U.S. companies of that time. Since then the best U.S. companies have
learned a lot from those Japanese competitors, and both sides are now facing
serious competition from a new direction-the new tigers of Asia, such as
South Korea, Singapore, Malaysia, Taiwan, and mainland China. Today, both
U.S. and Japanese companies need to ask themselves all over again how they can leverage the advantages and strengths they have relative to the current environment, which includes the new competitors, while overcoming their weaknesses, so as to re-make themselves organizationally in ways that bring their newly configured competencies to bear against the weaknesses of their formidable competitors in the new marketplace.
Each of the early students of the new organization gave us some version of a
two-column comparison chart, contrasting significant features of the old and
new types. The following table combines ideas from many sources, including
the type labels used by Burns and Stalker (1966)-Mechanistic and
Organismic-and the seven-S categories used by Athos and Pascale (1981).
Entries in the Organismic column draw on the findings of many of these studies.
Table 4.1
MECHANISTIC                          ORGANISMIC
BUREAUCRATIC                         LEARNING ORGANIZATION
OLD MODEL                            NEW MODEL
When Max Weber first wrote about bureaucracy early in this century, his ideal
type was contrasted with older types of authority based on traditional (e.g.,
feudal) social and legal ties or personal-charismatic relationships. For Weber,
looking at the early stages of the Industrial Revolution, bureaucracy was more
impersonal, rational, dependable, and efficient than the system that had existed
earlier. Now, however, we find it less efficient and less effective than the
emerging "unbureaucracies" or new organizations. For the needs of our current
environment, bureaucratic or mechanistic models will not suffice; what is needed is something far more organismic.
Even in its own day, the bureaucratic paradigm did not give an accurate
picture of how the system actually operated. Studies of bureaucratic workplaces
from twenty years ago began to show the limits and dysfunctions of the strict
bureaucracy. Communication, especially up the hierarchy, is often
systematically distorted, concealing bad news from the boss. The assumed
expertise of higher levels to solve problems at lower levels was wrong, because
people at the upper levels lacked the information possessed by lower-level
workers, who were reluctant to share it. Sometimes the reasons for not sharing
the information were fear of punishment or the boss's refusal to listen;
sometimes the reasons were secretiveness of the subordinates. Either way,
studies showed that informal adaptations can develop in the nooks and crannies
or unsupervised areas of bureaucracy, creating local, unofficial, and sometimes
covert solutions to operating problems not officially recognized at higher levels.
Sometimes these informal arrangements were aligned with the goals of the host
organization; sometimes they served the goals of the sub-unit rather than those
of the whole organization, and sometimes they involved sabotaging efforts of
the administration to tighten controls on the rank and file (Blau and Meyer,
1971). Often enough, the informal adaptations worked in the interests of the
larger organization, solving problems effectively without any fuss or bothering
the boss.
This kind of thinking about the other face of bureaucracy opens many
questions about whether the old paradigm is useful, even as a heuristic analytic
device. It begins to look as if the notion of formal organization as depicted by
the conventional organizational chart is more misleading than helpful once we
reflect on the facts that actual lines of communication are quite different and
that local subcultures may support norms and mental models quite different
from the official model, which is the old paradigm. Among sociologists, the
classical, Weberian model gave way to a natural systems model, and
sociologists began to pay attention to the negotiations between an organization
and elements of its environment as well as coalition-building and other forms
of internal negotiations between components or groupings (Bidwell, 1986;
Kanter, 1989). Natural systems and open systems models came to replace the
classical model of bureaucracy, paving the way towards better understanding of
organizational dynamics and change. Another conceptual innovation, network
analysis, was also necessary.
Network Analysis
Nohria and Eccles (1992) define network as "a fluid, flexible, and dense
pattern of working relationships that cut across various intra- and inter-
organizational boundaries" (p. 289). A network is defined by its ties or
relationships between members. A, B, and C may be members of the same
network but each one's network may not share the same other members. D may
be related to A and C but not to B. To the extent that these networks converged
on a common membership, we would approach a bounded group with common
membership identity, as in the older model, an organization or a community. It
seems, however, that the looser model works better for analyzing important
situations that do not conform to the assumptions of the older model. Each
person's network is built link-by-link, as opposed to becoming a member of a
group or organization and suddenly acquiring a new set of fellows, even if one
does not meet them until later.
Networks have norms, link-by-link. They are more effective than pure
markets where all are strangers. They can generate trust, link-by-link, and A,
who has found Q to be a dependable, trustworthy partner, may recommend Q to
her partner, B, who is not acquainted with Q. Based on B's trust of A and their
history of doing business together, B may trust the stranger Q with a contract
he would not otherwise offer to a stranger. Networks may grow through
sponsorship like this. In an integrated network Q will not let B down, out of
respect for the relationship with A (Uzzi, 1996). Thus the network may be a more refined tool of analysis than the older concepts of group/organization and culture.
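The link-by-link structure described here can be illustrated with a small sketch for readers who think in terms of data structures. The following Python fragment is only an illustration: it models a network as a set of pairwise ties and shows trust being extended one link at a time through sponsorship, as in the A, B, and Q example above. The numeric trust levels and the discounting rule are invented assumptions, not anything specified by Nohria and Eccles or Uzzi.

from collections import defaultdict

class Network:
    def __init__(self):
        # trust[x][y] holds x's trust in y; a missing entry means "stranger"
        self.trust = defaultdict(dict)

    def add_tie(self, x, y, level):
        # Record a direct working relationship with a given level of trust.
        self.trust[x][y] = level
        self.trust[y][x] = level

    def vouch(self, sponsor, newcomer, to):
        # The sponsor recommends the newcomer to an existing partner.
        # Trust is extended at a discount, since it rests on the sponsor's word.
        if newcomer in self.trust[sponsor] and to in self.trust[sponsor]:
            inherited = 0.5 * min(self.trust[sponsor][newcomer],
                                  self.trust[sponsor][to])
            self.trust[to].setdefault(newcomer, inherited)

net = Network()
net.add_tie("A", "Q", 0.9)      # A has found Q dependable and trustworthy
net.add_tie("A", "B", 0.8)      # A and B have a history of doing business
net.vouch("A", "Q", to="B")     # A recommends Q to her partner B
print(net.trust["B"].get("Q"))  # B now extends some trust to the stranger Q

The point of the sketch is simply that membership and trust accumulate tie by tie, rather than arriving all at once with membership in a bounded group.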
Learning Organizations
workers to the job, to each other, and to the boss. According to Garvin (1993, pp. 80-81),
They identify three distinct roles that are required to cooperate in industrial
knowledge creation: "[N]o one department or group of experts has the exclusive
responsibility for creating new knowledge. Front-line employees, middle
managers, and senior managers all play a part" (Nonaka & Takeuchi, 1996, p.
59). They each have distinct roles, and there is dynamic interplay between
them. Senior managers articulate a broad corporate mission, stated quite
abstractly; front-line employees struggle to find ways to translate this into more
concrete terms, using their tacit and explicit expert knowledge that they have
worked hard to share with each other. "[A] corporate vision presented as an
equivocal strategy by a leader is organizationally constructed into knowledge
through interaction with the environment by the corporation's members"
(Nonaka & Takeuchi, p. 59).
They are deeply concerned with the practical issues of implementing these
changes in the workplace. They take a very tough line on the question of how
one can balance or tradeoff short-term and long-term results and refuse to allow
We now come to our third and main exhibit in the learning organization
literature. This is the work of the MIT Center for Organizational Learning, a
substantial network of consultants and researchers, united by their commitment
to a set of ideas and principles outlined in The Fifth Discipline (Senge, 1990)
and other publications (Senge, Roberts, Ross, Smith, and Kleiner, 1994;
Chawla & Renesch, 1995; Kim, 1993; Kofman & Senge, 1995; Roth & Kleiner, 1996; and Schein, 1993). This literature describes and interprets a number of change-oriented field projects; it presents the theoretical models being tested and developed; it describes the methodology used; and it presents a
vision of better organizations, which is central to all this work.
The five disciplines of this approach are systems thinking, personal mastery, mental models, shared vision, and team learning. All five disciplines are essential, but personal mastery has a special,
foundational role, because of the importance placed on intrinsic motivation in
this model. "[B]y focusing on performing for someone else's approval,
corporations create the very conditions that predestine them to mediocre
performance" (Senge, 1990a).
The learning organization for Senge extends the vision of the quality
movement. The learning organization fulfills the uncompleted part of
Deming's vision, especially his views of the capacity of the average worker for
deep intrinsic commitment to the highest quality standards, given the right
conditions (Senge, 1990b).
AutoCo is one of the Big Three Detroit carmakers, and Epsilon was its program to create a new design and production tools for one of its high-end passenger vehicles (Roth and Kleiner, 1996). Its 300 engineers and related workers were responsible for being ready to launch production in three years, for meeting many quality standards, and for staying within budget. The program director
came in with a personal goal, not only to produce a great car design but also to
improve drastically the process of program development. Having worked on
several prior new car programs, he was familiar with the customary panic,
stress, and pandemonium in the last phase, just before launch, where everyone
would be working extreme amounts of overtime to catch up with many overdue
components. He was convinced that this was unnecessary and could be avoided
by better management. Exactly how, he did not know, but he was fortunate to
have as his deputy someone who had studied learning organizations, especially
the version being developed at MIT. A consulting team from MIT was engaged
to work with the Epsilon program. While the MIT team had many ideas and
tools to offer, they did not have a fully developed package and agreed to work
as partners with the Epsilon team, not as consultants in the traditional role.
A core learning team was formed with ten Epsilon managers and several
staff from MIT, which met every month or two for nine months. They
conducted joint assessment and diagnosis on team working issues, became acquainted, and began learning the basic concepts and tools of organizational
learning. Members of this leadership group first engaged themselves in some
very serious learning and change, before asking the rest of the program staff to
become involved in changes. And when they did approach them, the senior
managers of Epsilon from the core learning team acted as teachers and coaches
for the other staff, with the help of the MIT consultants and other AutoCo
consultants, who had also been involved in the core learning team.
The training agenda and content was developed by the core learning team.
Several members interviewed other program staff about their greatest
challenges and strengths. The core team used that data to study the question "Why are we always late?" Working together they created a systems map
(causal loops) of many factors, which led to discovering the point of leverage. It
was the fact that engineers having a problem with a component would not
report the problem until very late, which would cause other dependent elements
to be delayed, unnecessarily compounding the problem. Had the problem been
revealed earlier, others could have helped to speed up the solution and prevent
the escalation effect.
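The escalation effect can be illustrated with a deliberately simple calculation. The following sketch is a hypothetical toy model, not the Epsilon team's actual systems map: it assumes that each week a problem stays hidden delays a fixed number of dependent components by roughly the same amount, so that late reporting multiplies the slip while early reporting keeps it contained.

def total_delay(weeks_hidden, dependents=3, knock_on_per_week=1.0):
    # Rough estimate of program delay (in component-weeks) when one
    # engineer conceals a problem for weeks_hidden weeks.
    direct = weeks_hidden
    downstream = dependents * weeks_hidden * knock_on_per_week
    return direct + downstream

print(total_delay(weeks_hidden=1))   # reported early: a small, contained slip
print(total_delay(weeks_hidden=8))   # reported late: the slip compounds downstream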
Concealing problems was a consistent pattern. The core team discussed why this happened, concluding that it was a combination of an engineering culture, in which one does not report a problem until one knows the solution, and a company culture in which reporting problems would be held against someone,
downgrading performance appraisals and reputation. The Epsilon core team
wanted to change that, and they realized that it would require establishing trust
among the program staff that bearing bad tidings would be safe. This required
some other supporting norms and beliefs. Of great importance would be the
belief that no one has all the answers, and that cooperation for the good of the
whole program was more important than some individuals being embarrassed
because their part of the work was not going well right now. In other words, the
Epsilon team was aiming to create a culture very different from the traditional
one at AutoCo, which all members had experienced over many years, a new
culture in which managers expected engineers to make their own decisions and
in which cross-functional collaboration was common. These insights were
developed first among the core learning team, over many meetings, including a
two-day off-site learning lab.
Several similar learning labs were held for a total of one-third of the entire
staff and many briefings and discussions were held with all of them. Much of
the training curriculum was concerned with communications and with
changing the norms of communication. The learning labs included various
experiential methods, including computer simulation of the entire program
development process. What made the biggest impact on employees, though, and
what they believed made the biggest contribution to changing the culture, was
seeing that senior Epsilon managers actually changed their behavior. They
became less authoritarian and more open to other viewpoints, and when an engineer reported a serious problem delaying his or her work, they made sure that the engineer got help and did not suffer for that honesty.
The changed work process and culture was successful on the bottom line.
Launch of the new model was, as intended, a nonevent and actually took place
ahead of schedule. This was an unprecedented event in living memory. The
new model also came in well under budget; customer reaction and various
quality measures on the new vehicle were also well above previous levels.
These issues can also be seen through the lens of organizational learning, for
they involve, in this case, a lost opportunity for the corporation to have learned
from the experience of the Epsilon project. The Epsilon organization itself
learned very effectively, up to a point. Not only did the individual members of
Epsilon learn new skills and attitudes (individual learning) but the whole
Epsilon project learned to operate according to new norms and designs
(organizational learning) involving close collaboration geared to optimizing
their collective performance as a team and producing a great new vehicle. This
is opposed to the old way of putting one's own component first and the whole product second. The concept of organizational learning is a subtle and difficult one,
but it is crucial to a true understanding of the issues involved. We turn now to a
discussion of this hard but rewarding concept, which lies at the heart of the
learning organization.
Organizational Learning
The work of training and development and the work of assessment and
evaluation of performance all need to be rethought in the light of the conceptual
models of organizational learning and the learning organization to the degree
that these represent the mental models and aspirations of our clients and
ourselves. In its simplest terms, the goals and objectives of training and
development will be quite different in a learning organization, and many key
assumptions that are made about the place of the training and development
effort within the organization must be reassessed. In its fullest terms, the entire
traditional scope and definition of training and development is called into
question in favor of a more complex and interesting one. We shall review a few
key points of the change: what, why, who, and how.
What is the goal of the training? What are the learning goals? These
questions determine what needs to be assessed. Two changes in perspective are
important. One is the shift from individual learning and performance as the
focus to collective or team learning and performance as well as the individual
kind in the learning organization. The other change involves the shift in
balance between process and outcome factors; this is a complex shift, not just a
shift from more A to more B. The learning organization is deeply concerned
with both process and outcome factors, at both individual and collective levels.
In the Epsilon case we saw how bottom line collective performance was crucial.
This included an on-time launch, staying within budget, and good quality reviews from consumers. Individual assessments at this level were irrelevant. But during the
three-year run-up to the launch, various process measures were critical, since
they had a goal of changing the nature of collaborative work among these
design engineers. To achieve the desired ease of collaboration, give-and-take in
service of the collaborative effort, Epsilon needed to achieve greater trust
among engineers and managers. A key indicator of trust was the number of problems reported by memo: at early stages of the design process a large number was a positive sign, and early reporting resulted in lower numbers toward the end of the project timeline.
Why is a particular assessment being done? The key distinction is the classic
one between (1) formative assessment for the purpose of supporting learning
and improvement and (2) summative assessment for the purpose of helping to
make an executive decision, which will impact on those being assessed. It may
affect their compensation, promotion prospects, or the future of the project itself.
How will the assessment be carried out? This will be related to how the
training was initially conceived and designed. The more discrete and traditional
the design, the more discrete and convenient the assessment can be. But in the
learning organization context, of course, things can be more complex.
Returning to the Epsilon example again, we recall that some of the training was
done in a discrete, off-site setting, so traditional evaluation could be done at the
conclusion of training, batch-style. Did individuals, for example, learn skills of
constructing causal loop diagrams? Did certain groups collectively learn to
conduct discussion according to certain guidelines? In evaluating behavior change, the question is not just "Can they perform certain skills when asked?" but "Do they perform those skills, when needed, in an effective way?"
Who is the prime training and development client, and who are the
recognized stakeholders? Who gets what reports? Who is involved in
negotiating the contract with the training and development assessment staff?
The who is greatly influenced by the why. When formative assessment to promote better learning is the purpose, smart managers will agree to a hands-off approach in favor of letting the work-group manage itself. Without such
an understanding, there can be no learning organization.
Usually the boss should get the summative or bottom-line reports, while process performance and learning data should mainly be reserved for the individuals and work-teams themselves. Completely excluding the boss,
however, can also be a bad mistake, as the Epsilon case suggests. The learning
organization requires that bosses and workers should all share the same basic
mental models of collaboration and process. The problem at Epsilon was not
that their corporate boss needed to see process data, but that he interpreted it to
mean the opposite of what it meant within Epsilon. To them, the high number
of problems reported meant that trust levels were improved so that the problems
could be solved sooner with lower costs and delays, with a huge improvement
in cooperation and total system effectiveness, a triumph. To the boss, thinking
in terms of the traditional company culture, where most problems are
concealed, the same data meant that Epsilon was totally out of control, a
disaster. The learning organization in the Epsilon case stopped abruptly at the
program boundary; hence dialogue across this boundary involved a major
language problem. Assessment was certainly one of the conversations that
broke down because of the lack of shared meanings at this point. Whether
identified assessment professionals or versatile managers are in charge, the
challenge is to decide, jointly among the players, what is to be assessed, why,
how, and by whom.
Conclusion
In today's global economy, companies face new success factors, above all the
need for fast and flexible change on the part of organizations. The challenge is
the changed environment; the solution could be the learning organization. That
scenario creates entirely new challenges for the field of training and
development. Can this field respond adequately?
To succeed, the training and development field will have to be capable of re-
inventing itself-just like the organizations it wishes to serve. It must fully
understand these organizations as its customers, especially their cultures and
aspirations. For the most farsighted organizations the aspiration is to become
more like a true learning organization. If training and development work is not
to be downgraded and commoditized in the current cost-cutting frenzy, it must
win acceptance as a strategic partner of top management, based on its ability to
provide all kinds of needed training (broadly construed) and supports for
managers and key employees as they learn their changing roles in the process
of organizational learning. Training and support will take many forms, new to
members of this profession, as new forms continue to be invented. Many will be
embedded in work systems in new ways. The profession must invent them if it is to succeed.
References
Appelbaum, E., & Batt R., (1994). The new American workplace: Transforming work
systems in the U.S. Ithaca, NY: ILR Press.
Argyris, C., & Schön, D. A. (1996). Organizational learning (Vols. 1-2). Reading, MA: Addison-Wesley.
Ashkenas, R., Ulrich, D., Jick, T., & Kerr, S. (1993). The boundaryless organization. San Francisco, CA: Jossey-Bass.
Bidwell, C. (1986). Complex organizations: A critical essay (3rd ed.). New York:
Random House.
Blau, P., & Meyer, M. W. (1971). Bureaucracy in modern society. New York, NY:
Random House.
Burns, T., & Stalker, G. M. (1966). The management of innovation. London: Tavistock.
Chawla, S., & Renesch, J. (Eds.) (1995). Learning organizations: Developing cultures for tomorrow's workplace. Portland, OR: Productivity Press.
Kanter, R. M. (1983). The change masters. New York, NY: Simon & Schuster.
Kanter, R. M. (1989). When giants learn to dance. New York, NY: Simon & Schuster.
Kim, D. H. (1993). The link between individual and organizational learning. Sloan
Management Review, 125-136.
Kofman, F., & Senge, P. (1995). Communities of commitment: The heart of learning organizations. In S. Chawla & J. Renesch (Eds.), Learning organizations: Developing cultures for tomorrow's workplace (pp. 17-51). Portland, OR: Productivity Press.
Nohria, N., & Eccles, R. G. (Eds.) (1992). Networks and organizations: Structure, form, and action. Boston, MA: Harvard Business School Press.
Nonaka, I., & Takeuchi, H. (1996). The knowledge-creating company. New York: Oxford University Press.
Ouchi, W. G. (1981). Theory Z: How American business can meet the Japanese
challenge. Reading, MA: Addison Wesley.
Pascale, R. T., & Athos, A. G. (1981). The art of Japanese management. New York: Simon and Schuster.
Roth, G., & Kleiner, A. (1996). The learning initiative at the AutoCo Epsilon Program
1991-94. Cambridge, MA: Center for Organizational Learning, Massachusetts
Institute of Technology.
Schein, E. H. (1993). Organizational culture and leadership (2nd ed.). San Francisco:
Jossey-Bass.
Senge, P. M. (1990a). The fifth discipline: The art and practice of the learning organization. New York: Doubleday.
Senge, P. M. (1990b). The leader's new work: Building learning organizations. Sloan Management Review.
Senge, P. M. (1992). The real message of the quality movement: Building learning organizations. Journal for Quality and Participation, March.
Senge, P. M., Roberts, C., Ross, R., Smith, B., & Kleiner, A. (1994). The fifth
discipline fieldbook. New York: Doubleday.
Smith, B., & Kleiner, A. (1995). Is there more to corporations than maximizing profits? The Systems Thinker, 6(3).
Uzzi, B. (1996). The sources and consequences of embeddedness for the economic performance of organizations: The network effect. American Sociological Review, 61, 674-698.
Weber, M. (1947). The theory of social and economic organization (A. M. Henderson & T. Parsons, Trans.). Glencoe, IL: Free Press. (Original work published 1922.)
The Web site for the MIT Center for Organizational Learning is now available via the
MIT homepage, providing a list of working papers, including some in full text.
http://learning.mit.edu/
The chapter by Jack Phillips, one of the leading advocates and practitioners
of evaluating the return on training investments, represents both an elaboration
and expansion of the four levels. He proposes return on investment (ROI) as a
fifth level in the paradigm. Adding this level has, in turn, sparked another
debate. Phillips argues that return on investment can be calculated with an
acceptable degree of confidence and at a reasonable cost. His chapter provides
guidelines for how to position and carry out this type of evaluation.
whether it fit the performance needs and made an effective and efficient
contribution to achieving the performance goals of the organization. One
cannot address this concern without examining the logic underlying the design
of the intervention and the value chain underlying the implementation process.
This section ends with a discussion by Brown and Elfenbein, which provides
us with a model for practitioners to assimilate existing knowledge into their
practice. This academic chapter builds upon the work of psychologist David
Kolb in experiential learning. The model attempts to integrate the introduction
of new knowledge into the intelligence of practice.
The chapters in this section provide the reader with historical background
on the field of training evaluation, new perspectives on established practice,
and fresh ideas to stimulate the growth and advancement of the field. The section also firmly places the practice of training evaluation within the context of human
resource development. Training and the evaluation of its effectiveness cannot
be isolated from the business goals of the organization and the personal
competencies required of the workforce to meet them.
5 THE FOUR LEVELS OF
EVALUATION
Donald L. Kirkpatrick
Introduction
The reason I developed this four-level model was to clarify the elusive term
evaluation. Some training and development professionals believe that
evaluation means measuring changes in behavior that occur as a result of
training programs. Others maintain that the only real evaluation lies in
determining what final results occurred because of training programs. Still
others think only in terms of the comment sheets that participants complete at
the end of a program. Others are concerned with the learning that takes place
in the classroom, as measured by increased knowledge, improved skills, and
changes in attitude. And they are all right-and yet wrong, in that they fail to
recognize that all four approaches are parts of what we mean by evaluating.
These four levels are all important, and they should be understood by all
professionals in the fields of education, training, and development, whether
they plan, coordinate, or teach; whether the content of the program is technical
or managerial; whether the participants are or are not managers; and whether
the programs are conducted in education, business, or industry. In some cases,
especially in academic institutions, there is no attempt to change behavior. The
end result is simply to increase knowledge, improve skills, and change
attitudes. In these cases, only the first two levels apply. But if the purpose of the
training is to get better results by changing behavior, then all four levels apply.
However, in human resource development circles, these four levels are
recognized widely, often cited, and often used as a basis for research and
articles dealing with techniques for applying one or more of the levels.
In planning and implementing an effective training program, the following factors need to be considered:
1. Determining needs
2. Setting objectives
3. Determining subject content
4. Selecting participants
5. Determining the best schedule
6. Selecting appropriate facilities
7. Selecting appropriate instructors
8. Selecting and preparing audiovisual aids
9. Coordinating the program
10. Evaluating the program
There is an old saying among training directors: when there are cutbacks in an
organization, training people are the first to go. Of course, this isn't always
true. However, whenever downsizing occurs, top management looks for people
and departments that can be eliminated with the fewest negative results. Early in their deliberations, they look at such overhead departments as Human Resources.
Human Resources typically includes people responsible for employment, salary
administration, benefits, labor relations (if there is a union), and training. In
some organizations, top management feels that all these functions except
training are necessary. From this perspective, training is optional, and its value
to the organization depends on top executives' view of its effectiveness. In other
words, trainers must justify their existence. If they don't and downsizing
occurs, they may be terminated, and the training function will be relegated to
the Human Resources manager, who already has many other hats to wear.
To determine how a program can be improved, evaluation needs to answer questions such as these:
1. To what extent does the subject content meet the needs of those attending?
2. Is the leader the one best qualified to teach?
3. Does the leader use the most effective methods for maintaining interest
and teaching the desired attitudes, knowledge, and skills?
4. Are the facilities satisfactory?
5. Is the schedule appropriate for the participants?
6. Are the aids effective in improving communication and maintaining
interest?
7. Was the coordination of the program satisfactory?
8. What else can be done to improve the program?
A careful analysis of the answers to these questions can identify ways and
means of improving future offerings of the program.
Most companies use reaction sheets of one kind or another. Most are
thinking about doing more. They have not gone any further for reasons such as the following: in most organizations, both large and small, there is little pressure from top management to prove that the benefits of training outweigh the cost.
There are three reasons for evaluating training programs. The most common
reason is that evaluation can tell us how to improve future programs. The
second reason is to determine whether a program should be continued or
dropped. The third reason is to justify the existence of the training department.
By demonstrating to top management that training has tangible, positive
results, trainers will find that their job is secure, even if and when downsizing
occurs. If top-level managers need to cut back, their impression of the need for
a training department will determine whether they say, "That's one department
we need to keep" or "That's a department we can eliminate without hurting
us." And their impression can be greatly influenced by trainers who evaluate at
all levels and communicate the results to them.
The four levels represent a sequence of ways to evaluate programs. Each level
is important. As you move from one level to the next, the process becomes more
difficult and time-consuming, but it also provides more valuable information.
None of the levels should be bypassed simply to get to the level that the trainer
considers the most important. The four levels are:
Level 1-Reaction
Level 2-Learning
Level 3-Behavior
Level 4-Results
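For readers who keep evaluation records in software, this sequence can be represented as one record per program with a slot for each level, filled in order. The sketch below is only an illustration of that structure; the field names are invented and are not part of the four-level model itself.

from dataclasses import dataclass
from typing import Optional

@dataclass
class ProgramEvaluation:
    program: str
    reaction_score: Optional[float] = None   # Level 1: e.g., mean rating from reaction sheets
    learning_gain: Optional[float] = None    # Level 2: e.g., posttest minus pretest
    behavior_change: Optional[str] = None    # Level 3: observed change on the job
    results: Optional[str] = None            # Level 4: business results traced to training

    def next_level_due(self):
        # The levels are evaluated in sequence; report the first one still missing.
        for name, value in [("reaction", self.reaction_score),
                            ("learning", self.learning_gain),
                            ("behavior", self.behavior_change),
                            ("results", self.results)]:
            if value is None:
                return name
        return "complete"

ev = ProgramEvaluation(program="Supervisory skills", reaction_score=4.3)
print(ev.next_level_due())  # "learning": reaction is done, learning comes next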
Level 1-Reaction
As the word reaction implies, evaluation on this level measures how those who
participate in the program react to it. I call it a measure of customer
satisfaction. For many years, I conducted seminars, institutes, and conferences
at the University of Wisconsin Management Institute. Organizations paid a fee
to send their people to these public programs. It is obvious that the reaction of
participants was a measure of customer satisfaction. It is also obvious that
reaction had to be favorable if we were to stay in business and attract new
customers as well as get present customers to return to future programs.
If participants are to learn, they need to react favorably to the program. Otherwise, they will not be motivated to learn. Also, they will tell others of
their reactions, and decisions to reduce or eliminate the program may be based
on what they say. Some trainers call the forms that are used for the evaluation
of reaction happiness sheets. Although they say this in a critical or even cynical
way, they are correct. These forms really are happiness sheets. But they are not
worthless. They help us to determine how effective the program is and learn
how it can be improved.
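Where reaction sheets use numeric ratings, summarizing them takes only a few lines. The figures in the sketch below are invented; it simply averages the ratings for each aspect of the program and flags the aspects most in need of attention in the next offering.

# All ratings are invented; each participant rates aspects of the program from 1 to 5.
ratings = {
    "subject content": [5, 4, 4, 5, 3],
    "instructor":      [5, 5, 4, 4, 5],
    "facilities":      [3, 2, 3, 3, 2],
    "schedule":        [4, 4, 5, 4, 4],
}

for aspect, scores in ratings.items():
    mean = sum(scores) / len(scores)
    note = "  <- candidate for improvement" if mean < 3.5 else ""
    print(f"{aspect:16s} {mean:.1f}{note}")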
Level 2-Learning
Changing attitudes, increasing knowledge, and improving skills are the three things that a training program can accomplish.
Programs dealing with topics like diversity in the workforce aim primarily at
changing attitudes. Technical programs aim at improving skills. Programs on
topics like leadership, motivation, and communication can aim at all three
objectives. To evaluate learning, the specific objectives must be determined.
Some trainers say that no learning has taken place unless a change in
behavior occurs. In the four levels described in this book, learning has taken
place when one or more of the following occur. Attitudes are changed.
Knowledge is increased. Skill is improved. Change in behavior is the next
level.
There are three things that instructors in a training program can teach: knowledge, skills, and attitudes. Measuring learning, therefore, means determining the extent to which knowledge increased, skills improved, and attitudes changed.
Level 3-Behavior
Behavior can be defined as the extent to which change in behavior has occurred
because the participants attended the training program. Some trainers want to
bypass levels 1 and 2-reaction and learning-in order to measure behavior.
This is a serious mistake. For example, suppose that no change in behavior is
discovered. The obvious conclusion is that the program was ineffective and that
it should be discontinued. This conclusion may or may not be accurate.
Reaction may have been favorable, and the learning objectives may have been
accomplished, but the level 3 or 4 conditions may not have been present.
For change in behavior to occur, four conditions are necessary: the person must want to change, must know what to do and how to do it, must work in the right climate, and must be rewarded for changing. The training program can accomplish the first two requirements by creating a positive attitude toward the desired change and by teaching the necessary knowledge and skills. The third condition, right climate, refers to the
participant's immediate supervisor. Five different kinds of climate can be
described:
1. Preventing: The boss forbids the participant from doing what he or she
has been taught to do in the training program. The boss may be influenced by
the organizational culture established by top management. Or the boss's
leadership style may conflict with what was taught.
2. Discouraging: The boss doesn't say, "You can't do it," but he or she
makes it clear that the participant should not change behavior because it would
make the boss unhappy. Or the boss doesn't model the behavior taught in the
program, and this negative example discourages the subordinate from
changing.
3. Neutral: The boss ignores the fact that the participant has attended a
training program. It is business as usual. If the subordinate wants to change,
the boss has no objection as long as the job gets done. If negative results occur
because behavior has changed, then the boss may turn to a discouraging or even
preventing climate.
4. Encouraging: The boss encourages the participant to learn and apply his
or her learning on the job. Ideally, the boss discussed the program with the
subordinate beforehand and stated that the two would discuss application as
soon as the program was over. The boss basically says, "I am interested in
knowing what you learned and how I can help you transfer the learning to the
job."
5. Requiring: The boss knows what the subordinate learns and makes sure
that the learning transfers to the job. In some cases, a learning contract is
prepared that states what the subordinate agrees to do. This contract can be
prepared at the end of the training session, and a copy can be given to the boss.
The boss sees to it that the contract is implemented. Malcolm Knowles's book
Using Learning Contracts (1986) describes this process.
It becomes obvious that there is little or no chance that training will transfer
to job behavior if the climate is preventing or discouraging. If the climate is
neutral, change in behavior will depend on the other three conditions just
described. If the climate is encouraging or requiring, then the amount of
change that occurs depends on the first and second conditions.
It is important for trainers to know the type of climate that participants will
face when they return from the training program. It is also important for them
to do everything that they can to see to it that the climate is neutral or better.
Otherwise there is little or no chance that the program will accomplish the
behavior and results objectives, because participants will not even try to use
what they have learned. Not only will no change occur, but those who attended
the program will be frustrated with the boss, the training program, or both for
teaching them things that they can't apply.
What happens when trainees leave the classroom and return to their jobs?
How much transfer of knowledge, skills, and attitudes occurs? This is what
level 3 attempts to evaluate. In other words, what change in job behavior
occurred because people attended a training program?
Third, the trainee may apply the learning to the job and come to one of the
following conclusions: "I like what happened, and I plan to continue to use the
new behavior." "I don't like what happened, and I will go back to myoid
behavior." "I like what happened, but the boss and/or time restraints prevent
me from continuing it." We all hope the rewards for changing behavior will
cause the trainee to come to the ftrst of these conclusion. It is important,
therefore, to provide help, encouragement, and rewards when the trainee
returns to the job from the training class. One type of reward is intrinsic. This
term refers to the inward feelings of satisfaction, pride, achievement, and
happiness that can occur when the new behavior is used. Extrinsic rewards are
also important. They include praise, increased freedom and empowerment,
merit pay increases, and other forms of recognition that come as the result of
the change in behavior.
In regard to reaction and learning, the evaluation can and should take place
immediately. When you evaluate change in behavior, you have to make some
important decisions: when to evaluate, how often to evaluate, and how to
evaluate. This makes it more time-consuming and difficult to do than levels 1
and 2. Here are some guidelines to follow when evaluating at level 3.
Level 4-Results
Results can be defined as the final results that occurred because the participants
attended the program. The final results can include increased production,
improved quality, decreased costs, reduced frequency and/or severity of
accidents, increased sales, reduced turnover, and higher profits and return on
investment. It is important to recognize that results like these are the reason for
having some training programs. Therefore, the final objectives of the training
program need to be stated in these terms.
Some programs have these results in mind only on what we might call a far-out basis.
For example, one major objective of the popular program on diversity in the
workforce is to change the attitudes of supervisors and managers toward
minorities in their departments. We want supervisors to treat all people fairly,
show no discrimination, and so on. These are not tangible results that can be
measured in terms of dollars and cents. But it is hoped that tangible results will
follow. Likewise, it is difficult if not impossible to measure final results for
programs on such topics as leadership, communication, motivation, time
management, empowerment, decision making, or managing change. We can
state and evaluate desired behaviors, but the final results have to be measured
in terms of improved morale or other nonfinancial terms. It is hoped that such
things as higher morale or improved quality of work life will result in the
tangible results just described.
Now comes the most important and difficult task of all-determining what
final results occurred because of attendance and participation in a training
program. Trainers ask questions like these:
How much did quality improve because of the training program on total
quality improvement that we have presented to all supervisors and
managers, and how much has it contributed to profits?
How much did productivity increase because we conducted a program on
diversity in the workforce for all supervisors and managers?
What reduction did we get in turnover and scrap rate because we taught our
foremen and supervisors to orient and train new employees?
How much has management by walking around improved the quality of
work life?
What has been the result of all our programs on interpersonal
communication and human relations?
How much has productivity increased and how much have costs been reduced
because we have trained our employees to work in self-directed work
teams?
What tangible benefits have we received for all the money we have spent on
programs on leadership, time management, and decision making?
How much have sales increased as the result of teaching our salespeople
such things as market research, overcoming objections, and closing a
sale?
What is the return on investment for all the money we spend on training?
All these and many more questions usually remain unanswered for two
reasons. First, trainers don't know how to measure the results and compare
them with the cost of the program. Second, even if they do know how, the
findings probably provide evidence at best and not clear proof that the positive
results come from the training program. There are exceptions, of course. Increases in sales may be found to be directly related to a sales training program.
This example is unusual at this point in history, but it might not be too
unusual in the future. Whenever I get together with trainers, I ask, "How much
pressure are you getting from top management to prove the value of your
training programs in results, such as dollars and cents?" Only a few times have
they said they were feeling such pressure. But many trainers have told me that
the day isn't too far off when they expect to be asked to provide such proof.
When we look at the objectives of training programs, we find that almost all
aim at accomplishing some worthy result. Often it is improved quality,
productivity, or safety. In other programs, the objective is improved morale or
better teamwork, which, it is hoped, will lead to better quality, productivity,
safety, and profits. Therefore, trainers look at the desired end result and say to
themselves and others, "What behavior on the part of supervisors and managers
will achieve these results?" Then they decide what knowledge, skills, and
attitudes supervisors need in order to behave in that way. Finally, they
determine the training needs and proceed. In so doing, they hope (and
sometimes pray) that the trainees will like the program; learn the knowledge,
skills, and attitudes taught; and transfer them to the job. The first three levels of
evaluation attempt to determine the degree to which these three things have
been accomplished.
So now we have arrived at the final level: What final results were accomplished because of the training program? Here are some guidelines that will be helpful.
"Everybody talks about it, but nobody does anything about it." When Mark Twain
said this, he was talking about the weather. It also applies to evaluation-well
almost. My contacts with training professionals indicate that most use some
form of reaction, "smile," or "happiness" sheets. Some of these are, in my
opinion, very good and provide helpful information that measures customer
satisfaction. Others do not. And many trainers ignore critical comments by
saying, "Well, you can't please everybody" or "I know who said that, and I am
not surprised."
If no change in behavior is found, the obvious conclusion is probably that the training program was no good, and we had
better discontinue it or at least modify it. This conclusion may be entirely
wrong. The reason for no change in job behavior may be that the climate
prevents it. Supervisors may have gone back to the job with the necessary
knowledge, skills, and attitudes, but the boss wouldn't allow change to take
place. Therefore, it is important to evaluate at level 2 so you can determine
whether the reason for no change in behavior was lack of learning or negative
job climate.
The first step for you to take in implementing the evaluation concepts,
theories, and techniques described in the preceding chapters is to understand
the guidelines of level 1 and apply them in every program. Use a philosophy
that states, "If my customers are unhappy, it is my fault, and my challenge is to
please them." If you don't, your entire training program is in trouble. It is
probably true that you seldom please everyone. For example, it is a rare
occasion when everyone in my training classes grades me excellent. Nearly
always some participants are critical of my sense of humor, content presented,
or the quality of the audiovisual aids. I often find myself justifying what I did
and ignoring their comments, but I shouldn't do that. My style of humor, for example, is to embarrass participants, I hope in a pleasant way, so that they don't resent it. That happens to be my
style, and most people enjoy and appreciate it. If I get only one critical
comment from a group of twenty-five, I will ignore it and continue as I did in
the past. However, if the reaction is fairly common because I have overdone it,
then I will take the comment seriously and change my approach.
I used to tell a funny story in class. It was neither dirty nor ethnic. Nearly
everyone else thought it was funny, too, and I had heard no objections to it. One
day, I conducted a training class with social workers. I told the story at the
beginning of the class and proceeded to do the training. After forty minutes, I asked whether anyone had a comment or question. One lady raised her hand and said, "I was offended by the joke you told at the beginning of the session, and I didn't listen to anything you said after that!"
I couldn't believe it. I was sure she was the only one who felt that way, so I asked the question, "Did any others feel the same way?" Seven other women raised their hands. There were about forty-five people in the class, so the percentage was very much in my favor. But I decided that that particular joke
had no place in future meetings. If she had been the only one, I probably would
still be telling it.
The point is this: look over all the reaction sheets, and read the comments.
Consider each one. Is there a suggestion that will improve future programs? If
yes, use it. If it is an isolated comment that will not improve future programs,
appreciate it, but ignore it.
Evaluating at level 2 isn't that difficult. All you need to do is to decide what
knowledge, skills, and attitudes you want participants to have at the end of the
program. If there is a possibility that one or more of these three things already exist, then a pretest is necessary. If you are presenting something entirely new, then no pretest is necessary. You can use a standardized test if you can find
one that covers the things you are teaching. Or you can develop your own test
to cover the knowledge and attitudes that you are teaching.
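Under the pretest/posttest design just described, a minimal way to express level 2 results is the gain from pretest to posttest on the same instrument. The scores below are invented for illustration; in practice they would come from the knowledge, skill, or attitude test chosen for the program.

pretest  = {"Ann": 55, "Ben": 60, "Cal": 72, "Dee": 48}
posttest = {"Ann": 78, "Ben": 74, "Cal": 85, "Dee": 70}

gains = [posttest[p] - pretest[p] for p in pretest]
print(f"Mean gain: {sum(gains) / len(gains):.1f} points")
print("No measurable gain:", [p for p in pretest if posttest[p] <= pretest[p]])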
Levels 3 and 4 are not easy. A lot of time will be required to decide on an
evaluation design. A knowledge of statistics to determine the level of
significance may be desirable. Check with the research people in your
organization for help in the design. If necessary, you may have to call in an
outside consultant to help you or even do the evaluation for you. Remember the
principle that the possible benefits from an evaluation should exceed the cost of
doing the evaluation, and be satisfied with evidence if proof is not possible.
There is another important principle that applies to all four levels: You can
borrow evaluation forms, designs, and procedures from others, but you cannot
borrow evaluation results. If another organization offers the same program as
you do and they evaluate it, you can borrow their evaluation methods and
procedures, but you can't say, "They evaluated it and found these results.
Therefore, we don't have to do it, because we know the results we would get."
Learn all you can about evaluation. Find out what others have done. Look
for forms, methods, techniques, and designs that you can copy or adapt. Ignore
the results of these other evaluations, except out of curiosity.
Trainers must begin with desired results and then determine what behavior
is needed to accomplish them. Then trainers must determine the attitudes,
knowledge, and skills that are necessary to bring about the desired behavior.
The final challenge is to present the training program in a way that enables the
participants not only to learn what they need to know but also to react favorably
to the program. This is the sequence in which programs should be planned. The
four levels of evaluation are considered in reverse. First, we evaluate reaction.
Then, we evaluate learning, behavior, and results-in that order. Reaction is
easy to do, and we should assess it for every program. Trainers should proceed
to the other three levels as staff, time, and money are available.
References
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler.
Knowles, M. S. (1986). Using learning contracts. San Francisco: Jossey-Bass.
The ROI issue has sparked some interesting debate. Some argue that the
ROI must be developed, at least for a small number of programs. Others oppose
the ROI process and recommend that evaluation stop with an on-the-job skills
check. This debate creates several questions. Can the return on investment be
calculated? If so, can it be done with an acceptable level of confidence and at a
reasonable cost? Are many organizations pursuing ROI calculations? Early
evidence suggests that the answer to all these questions is a resounding yes. We begin
with a review of the concept of evaluation levels.
The HRD profession and the climate in which it is practiced have changed significantly in the 40 years since Kirkpatrick developed his four
levels for evaluating training programs (Kirkpatrick, 1959a; Kirkpatrick,
1959b; Kirkpatrick, 1960a; Kirkpatrick, 1960b). At the time of its development, the framework provided an ingenious way to sort through the maze of evaluation problems then facing training practitioners. There
was a void of useful models to help the practitioners tackle this important issue.
The four-level framework was well received and has since been the model of
choice for the practitioners (Kaufman & Keller, 1994). It has enjoyed
widespread use and has a reputation throughout the world as a logical,
practical, and useful framework to pursue evaluation (Bramley & Kitson,
1994). With the growth of HRD and the growing importance of the function,
adjustments are necessary in the model to make it more useful and practical for
today's practitioners while at the same time addressing some of the concerns
about the approach.
The model is based on the assumption that there is a linkage between the different levels: when a participant enjoyed the program, learning would occur;
when there was learning, there would be on-the-job behavior change; when
there was behavior change, there would be results. Unfortunately, these
linkages do not exist or are not very strong (Alliger & Janak, 1989). There is
Definitions
Accountability
In terms of the value of the data collected, this approach has two limitations.
First, the results were not converted to dollar values. There is a need to
understand and account for the value of an improvement (Fitz-enz, 1994). For
example, top management may want or need to know the value of a 15 percent
Earlier versions of Kirkpatrick's model paid little attention to the various ways
of isolating the effects of training, which is an important part of level 3 and 4
evaluations. In almost every training program where improvement is
documented with on-the-job behavior change or results, there are other
influences or factors that contribute to those results (Davidove, 1993). Few
cases exist in which training is the only input variable that has an influence on
an output variable during the timeframe of a training program. Therefore, an
important and essential ingredient of any evaluation effort must be a deliberate
attempt to isolate the effects of training.
Without a credible attempt to isolate those effects at levels 3 and 4, training
will improperly take credit for any improvement, and the evaluation process
will lose considerable credibility.
Why ROI?
There are some good reasons why return on investment has become a hot topic.
Although the viewpoints and explanations may vary, some things are very
clear. First, in most industrialized nations, training budgets have continued to
grow year after year, and as expenditures grow, accountability becomes a more
critical issue. An increasing budget makes a larger target for internal critics,
often forcing the development of the ROI.
A Revised Framework
The limitations of the Kirkpatrick model, and the concern for more attention to
ROI, have led to the development of a modified version of Kirkpatrick's model,
depicted in Table 6-2.
The primary modification is the addition of the fifth level, which focuses on
the return on investment. The other four levels are very similar to Kirkpatrick's
with some minor changes. Level 5 evaluation requires the addition of two
important and necessary steps. The first step is the conversion of the results
tabulated in level 4 to a monetary value to be used in an ROI formula. This
requires a direct conversion of hard data, such as quantity, quality, cost, or
time, which can be relatively easy for programs such as technical training. For
soft data, the task is more difficult, although a variety of techniques are utilized
to place values on the improvements. Estimating values can be a reliable and
accurate method for placing values on training data (Marrelli, 1993). Even soft,
interpersonal skills training can be evaluated in dollar-and-cents terms (Fitz-
enz, 1994).
The second step involves the calculation of the costs for the program.
Although there has always been a need to capture training costs, the need is
greater with the pressure for ROI. Unfortunately, costs are not clearly defined or
pursued in many organizations. The addition of this step will focus more
attention on accountability and require organizations to develop costs as they
move to the fifth level, the ultimate level of evaluation.
This model has several minor adjustments from Kirkpatrick's model, which
increase its likelihood of success. Reaction at level 1 in Kirkpatrick's model is
changed to reaction and planned action. Because training transfer is such an
important issue, it is helpful to include efforts to enhance the possibility of
transfer. At level 2, the definition of learning is broadened to specifically
include knowledge, skills, and attitudes. At level 3, another adjustment is made.
The calculation of the return on investment in HRD begins with the basic
model shown in Figure 6-1, where a potentially complicated process can be
simplified with sequential steps. The ROI process model provides a systematic
approach to ROI calculations. A step-by-step approach helps to keep the
process manageable so that users can tackle one issue at a time. The model also
emphasizes the fact that this is a logical, systematic process that flows from one
step to another. Applying the model provides consistency from one ROI
calculation to another. Each step of the model is briefly described here.
Figure 6-1. The ROI process model: isolate the effects of training, convert
data to monetary value, tabulate program costs, calculate the return on
investment, and identify intangible benefits.
Several pieces of the evaluation puzzle must be explained when developing the
evaluation plan for an ROI calculation. Four specific elements are important to
evaluation success and are outlined below.
Training programs are evaluated at five different levels, as illustrated in
Table 6-2. Data should be collected at levels 1, 2, 3, and 4 if an ROI analysis is
planned. This helps ensure that the chain of impact occurs as participants learn
the skills, apply them on the job, and obtain business results.
A final aspect of the evaluation plan is the timing of the data collection. In
some cases, preprogram measurements are taken to compare with postprogram
measures and in some cases multiple measures are taken. In other situations,
preprogram measurements are not available and specific follow-ups are still
taken after the program. The important issue in this part of the process is to
determine the timing for the follow-up evaluation. For example, evaluations
can be made as early as three weeks after a customer-service skills training
program for a major airline, whereas five years may be required to measure
the payback for an employee attending an MBA program sponsored by an
Indonesian company. For most professional and supervisory training, a
follow-up is usually conducted in the range of 3-6 months.
Collecting Data
• When feasible, other influencing factors are identified and their impact is
estimated or calculated, with the remaining unexplained improvement
attributed to training. In this case, the influence of all of the other factors
is developed, and training remains the one variable not accounted for in
the analysis. The unexplained portion of the output is then attributed to
training.
• In some situations, customers provide input on the extent to which
training has influenced their decision to use a product or service.
Although this strategy has limited applications, it can be quite useful in
customer service and sales training.
Collectively, these ten strategies provide a comprehensive set of tools to
tackle the important and critical issue of isolating the effects of training.
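To make the first of these strategies concrete, here is a minimal sketch in Python of the unexplained-remainder calculation; every variable name and dollar figure below is a hypothetical illustration, not data from the chapter.

```python
# Hypothetical illustration of the "unexplained remainder" strategy:
# estimate the impact of every other known factor, then attribute
# whatever improvement is left over to training.

total_improvement = 120_000.0  # measured annual improvement, in dollars (hypothetical)

# Estimated dollar impact of other influencing factors (hypothetical values)
other_factors = {
    "new incentive plan": 35_000.0,
    "seasonal demand increase": 25_000.0,
    "process automation": 20_000.0,
}

explained = sum(other_factors.values())
attributed_to_training = total_improvement - explained
print(f"Improvement attributed to training: ${attributed_to_training:,.0f}")
# -> Improvement attributed to training: $40,000
```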
The other part of the equation in a cost/benefit analysis is the cost of the
program. Tabulating the costs involves monitoring or developing all of the
related costs of the program targeted for the ROI calculation. Among the cost
components that should be included are
• The cost to design and develop the program, possibly prorated over the
expected life of the program
• The cost of all program materials provided to each participant
• The cost for the instructor/facilitator, including preparation time as well as
delivery time
• The cost of the facilities for the training program
• Travel, lodging, and meal costs for the participants, if applicable
• Salaries plus employee benefits of the participants to attend the training
• Administrative and overhead costs of the training function allocated in
some convenient way to the training program
In addition, specific costs related to the needs assessment and evaluation
should be included, if appropriate. The conservative approach is to include all
of these costs so that the total is fully loaded.
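As an illustration of what a fully loaded cost tabulation might look like, the following sketch sums the cost components listed above for a single offering of a program. All figures, the proration period, and the component names are hypothetical assumptions.

```python
# Hypothetical fully loaded cost tabulation for one offering of a program.
# The design and development cost is prorated over the expected life
# (number of deliveries) of the program.

development_cost = 60_000.0     # design and development (hypothetical)
expected_offerings = 12         # expected deliveries over the program's life

costs_per_offering = {
    "prorated development": development_cost / expected_offerings,
    "participant materials": 2_500.0,
    "instructor (prep + delivery)": 4_000.0,
    "facilities": 1_200.0,
    "travel, lodging, meals": 6_000.0,
    "participant salaries and benefits": 9_000.0,
    "administrative overhead allocation": 1_300.0,
}

fully_loaded_cost = sum(costs_per_offering.values())
print(f"Fully loaded cost per offering: ${fully_loaded_cost:,.0f}")
# -> Fully loaded cost per offering: $29,000
```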
The return on investment is calculated using the program benefits and costs.
The benefit/cost ratio (BCR) is the program benefits divided by the program
costs. In formula form it is

BCR = Program Benefits / Program Costs

The return on investment uses the net benefits divided by program costs, where
the net benefits are the program benefits minus the costs. In formula form, the
ROI becomes

ROI (%) = (Net Program Benefits / Program Costs) × 100
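In code, the two formulas differ only in the numerator. A minimal sketch, with hypothetical benefit and cost figures:

```python
def bcr(program_benefits: float, program_costs: float) -> float:
    """Benefit/cost ratio: program benefits divided by program costs."""
    return program_benefits / program_costs

def roi_percent(program_benefits: float, program_costs: float) -> float:
    """ROI (%): net benefits (benefits minus costs) divided by costs, times 100."""
    return (program_benefits - program_costs) / program_costs * 100

benefits, costs = 87_000.0, 29_000.0  # hypothetical figures
print(f"BCR = {bcr(benefits, costs):.2f}")          # BCR = 3.00
print(f"ROI = {roi_percent(benefits, costs):.0f}%") # ROI = 200%
```

Note that a BCR of 3.00 and an ROI of 200 percent describe the same program; the two indices simply express the benefit-to-cost relationship on different scales.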
For some programs, the resulting ROI value is frequently over 100 percent,
while the ROI value for technical and operator training may be lower.
Implementation Issues
Targets
After detailing the current situation, the next step is to determine a realistic
target within a specific time frame. Many organizations set annual targets for
changes. This process should involve the input of the full HRD staff to ensure
that the targets are realistic and that the staff is committed to the process. If the
training and development staff does not buy into this process, the targets will
not be met. The improvement targets must be achievable, while at the same
time, challenging and motivating. Table 6-3 shows the targets established for
Andersen Consulting for four levels (Geber, 1995). Andersen indicates that
many of the level 4 evaluations are taken to ROI. In some organizations, half of
the level 4 calculations are taken to level 5, while in others, every one of them
is taken. Table 6-4 shows current percentages and targets for five years in
another organization, a large multinational company. This table shows the
gradual improvement of increasing evaluation activity at levels 3, 4, and 5. In
this table, year 0 is the current status.
Table 6-3. Evaluation targets established at Andersen Consulting

Level                        Target
Level 1, Reaction            100%
Level 2, Learning            50%
Level 3, Job Application     30%
Level 4, Business Results    10%
Table 6-4. Percentages and targets for five years in a large multinational
company
Planning
Few initiatives will be effective without proper planning, and the ROI process is
no exception. Planning is synonymous with success. Several issues are
presented next to show how the organization should plan for the ROI process
and position it as an essential component of the training and development
process.
Assigning Responsibilities
Another part of this process is the specific guidelines for measurement and
evaluation. The guidelines are more technical than policy statements and often
contain detailed procedures showing how the process is actually undertaken
and developed. They often include specific forms, instruments, and tools
necessary to facilitate the process. Overall, the guidelines show how to utilize
the tools and techniques, guide the design process, provide consistency in the
ROI process, ensure that appropriate methods are used, and place the proper
emphasis on each of the areas.
One group that will often resist the ROI process is the training and
development staff who must design, develop, and deliver training. These staff
members often perceive evaluation as an unnecessary intrusion into their
responsibilities, one that absorbs precious time and stifles their freedom to be
creative. The cartoon character Pogo perhaps characterized it best when he said
"We have met the enemy and he is us." This section outlines some important
issues that must be addressed when preparing the staff for the implementation
of ROI.
On each key issue or major decision, the staff should be involved in the process.
As policy statements are prepared and evaluation guidelines developed, staff
input is absolutely essential. It is difficult for the staff to be critical of
something they helped design and develop. Therefore, their involvement
becomes a critical issue. Using meetings, training sessions, and task forces,
combined with routine input, the staff should be involved in every phase of
developing the framework and supporting documents for ROI.
One reason the HRD staff may resist the ROI process is that the effectiveness of
their programs will be fully exposed, putting their reputation on the line. They
may have a fear of failure. To overcome this, the ROI process should clearly be
positioned as a tool for learning and not a tool to evaluate training staff
performance, at least during its early years of implementation. HRD staff
members will not be interested in developing a process that will be used against
them.
The training staff will usually have inadequate skills in measurement and
evaluation and thus will need to develop some expertise in the process.
Measurement and evaluation is not always a formal part of their preparation to
become a trainer or instructional designer. Consequently, each staff member
must be provided training on the ROI process to learn how the overall ROI
process works, step-by-step. In addition, staff members must know how to
develop an evaluation strategy and specific plan, collect and analyze data from
the evaluation, and interpret results from data analysis. Sometimes a one-to-
two-day workshop is needed to build adequate skills and knowledge to
understand the process, appreciate what it can do for the organization, see the
necessity for it, and participate in a successful implementation.
Perhaps no group is more important to the ROI process than the management
team, who must allocate resources for training and development and support
the programs. In addition, they often provide input and assistance in the ROI
process. Specific actions to train and develop the management team should be
carefully planned and executed.
Due to the critical need for this topic in management training, this workshop
should be required for all managers, unless they have previously demonstrated
strong support for the training function. Because of this requirement, it is
essential for top executives to be supportive of this workshop and, in some
cases, take an active role in conducting it. To tailor the program to specific
organizational needs, a brief needs assessment may be necessary to determine
the specific focus and areas of emphasis for the program.
Monitoring Progress
The initial schedule for implementation of ROI provides a variety of key events
or milestones. Routine progress reports need to be developed to present the
status and progress of these events or milestones. Reports are usually developed
at six-month intervals. Two target audiences, the training and development
staff and senior managers, are critical for progress reporting. The entire
training and development staff should be kept informed on the progress, and
senior managers need to know the extent to which ROI is being implemented
and how it is working in the organization.
The results from an ROI impact study must be reported to a variety of target
audiences. One of the most important documents for presenting data, results,
and issues is an evaluation report. The typical report provides background
information, explains the processes used, and most importantly, presents the
results. Recognizing the importance of on-the-job behavior change, level 3
results are presented first. Business impact results are presented next, which
include the actual ROI calculation. Finally, other issues are covered along with
the intangible benefits. While this report is an effective and professional way to
present ROI data, several cautions need to be followed. Since this document is
reporting the success of a training and development program involving one or
more groups of employees, the complete credit for all of the success and results
must go to the participants and their immediate supervisors or leaders. Their
performance has generated the success. Also, another important caution is to
avoid boasting about the results. Although the ROI process may be accurate and
credible, it still may have some subjective issues. Huge claims of success can
quickly turn off an audience and interfere with the delivery of the desired
message.
A final caution concerns the structure of the report. The methodology should
be clearly explained along with the assumptions made in the analysis. The
reader should readily see how the values were developed and how the specific
steps were followed to make the process more conservative, credible, and
accurate. Detailed statistical analyses should be relegated to the appendix.
While several potential audiences could receive ROI evaluation data, four
audiences should always receive the data. A senior management team (however
it may be defined) should always receive information about the ROI project
because of their interest in the process and their influence to allocate additional
resources for HRD and evaluation. The supervisors of program participants
need to have the ROI information so they will continue to support programs
and reinforce specific behavior taught in the program. The participants in the
program, who actually achieved the results, should receive a summary of the
ROI information so they understand what was accomplished by the entire
group. This also reinforces their commitment to make the process work. The
training and development staff must understand the ROI process and,
consequently, needs to receive the information from each ROI project. This is
part of the continuing educational process. Collectively, these four groups
should always receive ROI information. In addition, other groups may receive
information based on the type of program and the other potential audiences.
Acknowledgment
Some of the concepts and practices described in this chapter were developed for
a new book, Return on Investment in Training and Performance Improvement
Programs, Gulf Publishing, 1997. Reprinted by permission.
References
Alliger, G., & Janak, E. (1989). Kirkpatrick's levels of training criteria: Thirty years later. Personnel Psychology, 42(3), 331-342.
Bleech, J. M., & Mutchler, D. G. (1994). Let's get results, not excuses! Grand Rapids, MI: MBP Press.
Bramley, P., & Kitson, B. (1994). Evaluating training against business criteria. Journal of European Industrial Training, 18(1), 10-14.
Broad, M. L., & Newstrom, J. W. (1992). Transfer of training. Reading, MA: Addison-Wesley.
Fitz-enz, J. (1994). Yes ... you can weigh training's value. Training, 31(7), 54-58.
Geber, B. (1995). Does training make a difference? Prove it! Training, 32(3), 27-36.
Kaufman, R., & Keller, J. M. (1994). Levels of evaluation: Beyond Kirkpatrick. Human Resource Development Quarterly, 5(4), 371-380.
Kimmerling, G. (1993). Gathering best practices. Training & Development, 47(9), 28-36.
Krein, T. J., & Weldon, K. C. (1994). Making a play for training evaluation. Training & Development, 48(4), 62-67.
Marrelli, A. F. (1993a). Cost analysis for training. Technical & Skills Training, 4(7), 35-40.
Marrelli, A. F. (1993b). Determining training costs, benefits, and results. Technical & Skills Training, 4(8), 8-14.
Moseley, J. L., & Larson, S. (1994). A qualitative application of Kirkpatrick's model for evaluating workshops and conferences. Performance & Instruction, 33(8), 3-5.
Phillips, J. J. (1995). Measuring training's ROI: It can be done. William and Mary Business Review, (Summer), 6-10.
Rust, R. T., Zahorik, A. J., & Keiningham, T. L. (1994). Return on quality: Measuring the financial impact of your company's quest for quality. Chicago: Probus Publishing Company.
Shandler, D. (1996). Reengineering the training function. Delray Beach, FL: St. Lucie Press.
Introduction
Certainly, part of the total value of training derives from the reality that
training is, in fact, a function perceived by many employees as a benefit. And,
many organizations promote continued education of employees by offering to
reimburse all or a portion of their tuition when they enroll in higher education
courses. Further, organizations that provide ample opportunity for training and
development tend to be perceived as employee friendly, while those that are
stingy with training opportunities are perceived as less friendly, and are
thus less competitive in attracting and retaining high-quality
employees. But this benefit function of training is not the major reason that
companies support training, nor is it typically the central focus of evaluation.
The popular notion of training impact holds that training is, above all, an
instrument for improving employee, and thence organizational, performance
and effectiveness. The rationale for training is based on the easily understood
assumption that competent (i.e., well-trained) employees should perform more
effectively than less-competent employees. Note in the foregoing, however,
that the emphasis is on performance, not simply gains in competence. The
impact of training is normally construed as the organizational benefits (such as
increased production, greater quality, reduced costs) that ensue when
improvements in competence are manifested in improvements in job
performance. This is what has been called (Brinkerhoff, 1987) the fundamental
logic of training.
The logic of training in its generic form (different variations of the fundamental
logic are discussed later) is represented in Figure 7-1.
Figure 7-1. The fundamental logic of training: employees lacking certain
skills and knowledge (S&K) acquire new S&K, refine them into enabling
competencies, and produce improved job results and business value.
Table 7-1
The typical training value chain
1) Trainees, who lack a certain set of skills and knowledge, enter a learning
intervention.
2) Employees engage in learning tasks.
3) Trainees exit the learning intervention having mastered the intended new
skills and knowledge, and return to their jobs.
4) Trainees try out their new skills and knowledge.
5) Employees develop more refined, enabling job competencies.
6) Employees perform in more effective ways.
7) Employees increase the effectiveness and quality of their job results.
8) Employees enhance the value of products and/or services.
9) Organizational benefits (e.g., greater profits, competitive advantage) are
realized.
As Figure 7-1 makes clear, the value (defined typically as impact) that
results from training is realized only after training-acquired skills are used by
trainees in on-the-job behaviors. The immediate outcomes of the learning
intervention, if the learning intervention is successful, are new skills and
knowledge. But, in most types of training applications, the new skills and
knowledge alone do not add value; they must be practiced and effectively used
for value to be achieved. There are, however, exceptions to and variations of
this general principle, and thus a variable range of understandings of what
constitutes training impact.
Airline pilots, for example, are trained in emergency procedures they may
never have to use; because there is a chance that the emergency behavior may
be required, and the pilots have been thoroughly trained, the service provided
has greater value (it is perceived as safer), and thus the training can be
shown to add value (organizational impact) to the core service of the airline
company.
The key message of Figure 7-1 is that training must have a logic, and that
the validity of the logic is the determinant of the potential value (impact) of the
training. The more valid the logic, then the more likely it is that the training
will lead to actual added value. One of the primary functions for impact
evaluation of training is to surface the logic of training and assess its validity.
The logic of training clarifies the basis of the value claim for training. If no
credible argument can be made that the learning to be gained by training could
possibly lead to performance improvements that would subsequently lead to
added value for products and services, then the logic is invalid; if a highly
credible argument can be made, then the logic is more valid.
As noted, however, there is more than one single logic that defines training
value. Confusion about the more specific logic of particular training initiatives
can undermine training effectiveness and impede the practice and effectiveness
of impact evaluation.
One major factor that complicates the issue of evaluating training impact is that
training is not a unitary function intended to have only one single result. There
are variable intentions for training, and these several purposes for training
imply different constructions of the concept of impact. Leonard Nadler (Nadler,
1980) in his seminal discussions of human resources development, or HRD (his
suggested term for what he saw as the outdated name for training), postulated a
three-part taxonomy of training purposes: training, which focused on job
incumbents with the purpose of improving their current job performance;
education, which focused on employees with the purpose of preparing them for
specified new jobs and roles; and, development, which aimed at building the
organization's capacity to perform effectively in the future by increasing the
competence of employees. As the practice and profession of HRD have grown,
this simple taxonomy of aims and purposes has become increasingly outdated.
Table 7-2 provides a more current categorization of the several purposes of
training as it is commonly practiced in organizations today (Brinkerhoff &
Brown, 1997). This taxonomy is based on an analysis of the various intended
organizational benefits that training may be applied against, and thus is of
special importance for those concerned with evaluating the impact of training.
Table 7-2 names each type of intended benefit in a few words, describes it
briefly, then provides one or more common labels that are sometimes used to
characterize that category of training in different organizational contexts.
The outcome categories in Table 7-2 are useful in clarifying the intended
objectives for any particular organization-sponsored learning program or effort,
and help as well in identifying expectations among providers and consumers of
training as to the benefits and results they do, or can hope to, realize.
The first category is often partitioned into what are called soft skills, which
popularly refer to skills in human interactions, such as communications, values
clarification, conflict resolution, and so forth. Production procedures, operator
training, computer skills, and other concrete psychomotor sorts of skills are
often called hard skills, or technical training. The practitioner will encounter
many labels and definitions. The distinguishing characteristic of the function of
the training in this first category is the intended benefit: improved job
performance, whether that improvement is manifested immediately or
after a period of time.
The training outcomes of Type D and Type E efforts are driven by similar
assumptions: that these training results are needed by a broad spectrum of the
organization's employees. A familiar analogy for Type D training is the civics
curriculum once common in America's secondary schools. The civics
curriculum was derived from the core of knowledge that a student would need
to participate as an informed citizen.
The Type F category refers to learning efforts that are intended to develop
leadership skills and capacities. Type F outcomes are driven both by immediate
needs for skills and knowledge by persons in current leadership positions, and
by the needs of the organization to have a pool of leadership talent from which
to draw to replace leaders and meet emerging needs for leadership at all levels
in the organization. Training structures, such as workshops and seminars, are
often accompanied by personnel performance review systems and other
processes to identify and groom potential and future leaders.
The final category, Type G, is typically not the focus of training evaluation
inquiry, since it is not intended to have much, if any, impact on performance.
This category is listed to capture the sorts of learning experiences that
organizations sometimes provide to their employees simply as a benefit, in the
same spirit that organizations might provide health insurance, child care,
exercise facilities, or a cafeteria. Along this line, some companies provide
learning opportunities strictly as a sort of giveaway to employees, such as
classes in how to control home finances, or even French cooking. The rationale
for the giveaway is that such learning opportunities increase the attractiveness
of the organization as an employer, serving the same function as other benefits.
These, then, are the typically encountered and major sorts of efforts that will
be found under the rubric of training or other terms meant to represent the
function in organizations that provides for ongoing learning among employees.
As was noted, the function will carry different names in different organizations
and enterprises. The purposes served, as defined in Table 7-2, however, are
very consistent across organizations. There are, of course, marked differences
among organizations in the extent to which they pursue each type of
training outcome and how they label and budget for that training.
Impact evaluation intersects the logic of training in three vital ways. First,
evaluation can be applied to the logic itself: the logical basis of the impact
argument. There are several purposes that training may serve, and each is
driven by a different logic. Training evaluators can very usefully serve
organizations simply by surfacing and clarifying the logic of the different sorts
of training the organization supports, engendering discourse about priorities,
and helping to resolve misunderstandings about what sorts of results training
can and cannot achieve. Where evaluative data cast doubt on the validity of the
logic of training, decision makers can be engaged in discussions of whether or
to what extent the training should be pursued.
Valid logic, however, determines only the potential value of training. For
training to actually add value, the training process must be designed and
managed so that all phases of the logic-chain are effectively implemented.
Almost never are the immediate results of training included in what is
popularly conceived of as training impact. The specific learner gains in
knowledge and skill, or positive reactions toward the training, are not the
impact that is sought. Rather, it is some job or organizational performance
characteristic that defines impact, and the training is seen as an enabler of that
result.
Thus, the second focus for evaluation attention is to monitor the critical
series of events in the logic value-chain as the training is being implemented.
Where obstructions and obstacles are identified, clarified, and understood,
decision makers can intervene to put the value-chain back on track, enhancing
the likelihood that impact will, in fact, be achieved.
Recall that the discussion of Figure 7-1 included a description of the generic
sequence of events that training is intended to engender so that organizational
value (impact) can be achieved. This sequence is referred to in this chapter as
the training value chain, which includes the major events in Table 7-1.
Most often, the impact of training is defined such that it includes point 7,
improved job effectiveness, and point 8, enhanced value of products and
services, and/or point 9, organizational benefits. Points 4, trying out learning,
5, refining competencies through practice, and 6, performing more effectively,
are typically subsumed under the rubric of transfer of training or learning.
Because the impact points are so dependent upon the transfer points in the
value chain, it is often the case that impact evaluation efforts include an
investigation of transfer, even though the transfer alone is not construed as
impact per se.
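One hypothetical way to picture this dependency is to treat the value chain as an ordered series of links and look for the first one that fails. In the sketch below, the stage labels follow Table 7-1, while the measured pass/fail values are invented for illustration.

```python
# Hypothetical monitoring of the training value chain (Table 7-1).
# Stages 4-6 are commonly grouped as "transfer"; stages 7-9 as "impact".
# A break at any stage threatens every stage downstream of it.

value_chain = [
    ("1. enter learning intervention", True),
    ("2. engage in learning tasks", True),
    ("3. master new skills and knowledge", True),
    ("4. try out new skills on the job", True),
    ("5. refine enabling competencies", False),  # hypothetical break
    ("6. perform more effectively", False),
    ("7. improve job results", False),
    ("8. enhance product/service value", False),
    ("9. realize organizational benefits", False),
]

first_break = next((stage for stage, ok in value_chain if not ok), None)
if first_break:
    print(f"Value chain breaks at: {first_break}")
else:
    print("Chain intact through organizational impact.")
```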
In any case, readers should notice the obvious fact that training impact is not
achieved immediately and is greatly dependent upon subsequent points in the
value chain. Any of these subsequent points might break, threatening or
canceling the impact of training. Trainees might not, for example, learn the
skills in the first place because of a faulty learning intervention. Or, they may
forget the skills because of lack of practice. Or, they may remember them but
never find the time or opportunity to use them, or they may even be prevented
from using them by some job context factor, such as a policy that rewards non-
use. They may not use them because their peers do not, and thus they feel that
the culture of the workplace discourages them from using their new learning.
Or, they may not want to use them because they object to them, or think they
are not worthwhile, and thus may not learn them well in the first place. Or,
they may use their skills, but organizational priorities and goals may have
changed such that their improved performance has only very marginal value.
The training value chain is often fragile, in that the new learning calls for
new sorts of performance, which are likely to be overwhelmed by whatever old
ways of doing things prevailed prior to the training. The training value chain is
always threatened by factors other than whether the trainees learned the new
skills and knowledge in the first place. Competent performance is a complex
phenomenon, assailed by many organizational forces, and never caused solely
by the capability of the performer him- or herself. Thus, this second focus for
impact evaluation (studying the value chain during implementation to help
keep it on track) is a vitally important function.
The third focus for impact evaluation is retrospective, whereby the logic of
training can be reconstructed and analyzed after training has been
implemented. This can help decision makers determine whether and how much
impact the training value chain has led to. More important, this analysis can
help determine what factors and events during the history of the training
implementation facilitated impact, or what factors impeded or obviated impact.
This knowledge of impact-influencing factors can be of great help in designing
and planning further training, and in building the organization's capability for
getting continuously greater value from training endeavors.
The popularity of the quest for impact evaluation is easily understood in the
competitive context of organizations. Because resources in the organization are
always limited, executive management is in the perennial role of stewardship:
deciding where and how the organizational resources can be best allocated to
achieve competitive advantage. The benefits of training are often marginal at
best, almost always hard to discern, and are frequently long term. It is no
wonder that the training function is a favorite target for the budget cutters'
axes. It is likewise no wonder that training professionals in these organizations
have longed for the sorts of hard data that would let them compete in the
budget process with production and other line areas of the organization. Access
to return on investment and other financial indices would justify their resource
outlays in terms of results the business readily recognizes.
But the very notion of evaluating the impact of training can, and often does,
beg some very fundamental questions. Of greatest concern is that the pairing of
the term impact with the term training implies that one should expect an
impact from training, which leads one to jump to the next conclusion that
training causes the impact with which it is paired. The quest for impact data
and other hard measures of training value can mislead the organization as to
the true nature and substance of training, and the process by which training
leads to organizational benefit.
Imagine, for example, a company that adopts new machine maintenance
procedures and trains its production staff in how to apply them effectively.
Now, imagine that the production staff, in fact, learns,
then uses, the new procedures; the machines stay operative longer, and
efficiency increases while costs (due to repairs, etc.) decline. Now we have
impact. But, let us be clear what caused the impact. It was not the learning but
the use of the learning (performance) that led to the impact. Imagine, on the
other hand, that no one had used the learning, and the maintenance procedures
were never implemented. Clearly, in this case, there would have been no
impact.
Performance is affected by many factors other than the sheer capability of the
performers. As performance analysts and researchers have pointed out
(compare Gilbert, 1978; Rummler & Brache, 1994), performance is shaped by a
complex combination of systemic organizational factors. This complexity of
interaction between performer capability and performance system factors can be
readily discerned and discussed in a typical example, as described in the case
that follows.
It is clear that the intent of the training in the case was to add value by
increasing customer satisfaction with the investment analysis provided in the
initial interview. But any one of the factors (or many others) could keep the
training from having a positive impact on customer satisfaction. These factors
exemplify a problem with which virtually all training practitioners are familiar.
People may have the skills and knowledge needed to perform more effectively
but may not do so because of failures in the performance support system:
insufficient or missing incentives, peer pressure to maintain old behaviors, a
lack of supervision, inadequate tools and resources, unclear performance
specifications, vague direction and lack of feedback, to name a few.
A fourth strategy, which has begun to emerge (Brinkerhoff & Gill, 1992), is
to adopt the concepts and procedures of quality management, or TQM (total
quality management), and apply these to impact evaluation.
These four strategies are discussed briefly in the next section of this chapter,
and some closing recommendations are made.
For researchers, these complexities have been a boon, spawning a number of
journal articles and research studies. For training practitioners with
bureaucratic accountability for the effective operation of the training
function, they have been a nightmare. On the one hand, as the opening section
of the chapter implies, practitioners have had to cope with
ambiguity and confusion surrounding the expectations for training results, and
the range of organizational purposes that reside under the training umbrella.
On the other but related hand, these practitioners are pressured to assess and
prove their worth, and have looked to evaluation to help them do this. As many
surveys have shown (for example, Carnevale & Schulz, 1990), practitioners
have done reasonably well in assessing their trainees' reactions to the training
they receive. But, reports of successful evaluation of the indicators of training
impact, as these are defined further away from the immediate learning along
the training-performance value chain, decline dramatically.
This final portion of the chapter is intended to shed some light on the
strengths and weaknesses of the currently available impact-evaluation
strategies, and to make some closing recommendations for further development
by researchers, and applications for practitioners.
This strategy honors the fact that training is but a single element in the
complex nexus of actions to improve performance and achieve organizational
results. Several approaches are spawned by this strategy, yet all are essentially
complicated and require access to controls and resources that are usually
beyond the reach of the typical training practitioner.
The evaluation methods available within this strategy all derive essentially
from the experimental paradigm as it is applied in behavioral research
(Kerlinger, 1973, among many), which would consider training as the main
effect under inquiry and assign all other performance influencing factors the
status of extraneous variables. The training evaluator would then seek to
construct an evaluation design that would control and account for, either
through manipulation of the conditions of the evaluation test or through
statistical methods, these extraneous variables. One might, for example,
randomly assign one group of employees to a training intervention and another
group to a placebo treatment, and then compare the later job production and
effectiveness of these employees. Another similar approach might measure a
group of trainees on many of the known extraneous variables and one or more
impact variables, and then statistically control (through regression analysis, for
example) and account for the partial effects of each of the related variables,
seeking to isolate and describe the effects of the training variable alone.
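As a minimal sketch of this statistical-control idea, the following Python fragment generates synthetic data (the covariates, effect sizes, and sample size are all invented for illustration) and uses ordinary least squares to estimate the training effect while holding the extraneous variables constant.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic extraneous variables and a training indicator (hypothetical data)
tenure = rng.normal(5, 2, n)      # years of experience
workload = rng.normal(40, 5, n)   # weekly workload
trained = rng.integers(0, 2, n)   # 1 = attended training, 0 = did not

# Synthetic outcome with a true training effect of 3.0 units, plus noise
output = 10 + 0.8 * tenure + 0.1 * workload + 3.0 * trained + rng.normal(0, 1, n)

# Ordinary least squares with an intercept column; the coefficient on the
# training indicator is the training effect net of the other variables.
X = np.column_stack([np.ones(n), tenure, workload, trained])
coef, *_ = np.linalg.lstsq(X, output, rcond=None)
print(f"Estimated training effect, controlling for covariates: {coef[3]:.2f}")
```

With real organizational data, of course, the hard part is measuring the extraneous variables at all, which is one reason these methods remain largely confined to the research arena.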
The experimental design methods have been used far more in the research
arena than by practitioners in applied settings, and have yielded important,
though largely inconclusive and incomplete, information about training
effectiveness. (For an excellent summary of training research, see Tannenbaum
& Yukl, 1992.) Further, some of the utility methods used by researchers have
shown interesting but limited applicability for practitioners (Mathieu &
Leonard, 1987).
This strategy is implicitly reinforced by the outdated but popular and durable
Kirkpatrick Model (Kirkpatrick, 1975) of training evaluation that posed four
levels of training results: 1) reactions of trainees, 2) learning of trainees, 3)
usage of learning by trainees (transfer) on the job, and 4) organizational
benefits. The great strength and contribution of the Kirkpatrick model is that it
focused management attention on the longer-range purpose of training
(impact), and gave practitioners an apparent hierarchy along which to pursue
and focus their evaluative efforts. Its greatest weakness is that it promoted the
simplistic and false view that training could somehow lead directly to
organizational benefit (level 4) without a great deal of other effort. It also
promoted the belief, appealing intuitively but unsupported by research
(Tannenbaum & Yukl, 1992), that there are causal connections from level to
level.
Because of the intuitive appeal of the Kirkpatrick model, and its timely
introduction to the training field, it has spawned a push for level 4 evaluation
studies. As they are largely practiced, level 4 evaluation studies primarily seek
impact data, and make only superficial efforts to study the larger causal context
of the performance that underlies the impact results. A typical level 4 study
design might, for example, implement a survey of all past attendees of training,
many months after they have attended training. It would ask these trainees (or
their supervisors) to report instances of application of the training, and then
further ask the respondents to estimate the impact and value of these
applications. Survey data would be aggregated, and the instances and
proportions of successful impact would be defined and reported. The strength
of such approaches is that they are relatively simple and inexpensive. They are
also almost certain, for a variety of reasons, to turn up seemingly positive and
credible instances of impact, for which the training function is usually quick
to claim credit. Relatedly, it is also a strength of these methods
that, when insufficient evidence of impact is found, there is a handy escape
route available to avoid blaming the training itself for the failure: the
evaluators can cite the uncertainty of the findings in light of the unknown
effects of alternative causes.
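A minimal sketch of how such survey returns are typically aggregated, using invented responses:

```python
# Hypothetical follow-up survey returns: did the respondent apply the
# training, and what annual dollar impact do they estimate?

responses = [
    {"applied": True,  "estimated_impact": 4_000.0},
    {"applied": True,  "estimated_impact": 1_500.0},
    {"applied": False, "estimated_impact": 0.0},
    {"applied": True,  "estimated_impact": 7_000.0},
    {"applied": False, "estimated_impact": 0.0},
]

applied = [r for r in responses if r["applied"]]
application_rate = len(applied) / len(responses)
total_claimed = sum(r["estimated_impact"] for r in applied)

print(f"Reported application rate: {application_rate:.0%}")  # 60%
print(f"Total self-reported impact: ${total_claimed:,.0f}")  # $12,500
```

As the chapter argues, totals of this kind rest entirely on self-report and say nothing about alternative causes of the reported impact.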
Primary among the problems with the ROI methods is their failure to deal
with the complex causality issues. Further, the pursuit of ROI evaluation also
introduces confusion as to the proper goals of training. According to the ROI
process, a positive ROI (a return greater than the costs of training) equals
success. That is, training whose calculated benefits exceed its costs is judged
good, and training whose calculated benefits fall short of its costs is judged
bad. It is fully possible, of course, that training might result in dollars
saved in some area of a company but that the cost
savings have virtually no strategic or business value. Further, when this is the
case, the net ROI is negative, since whatever was spent on the training and the
ROI study should have been invested where the business needed it most. The
purpose of training is not to save money! It is to help improve performance in
parts of the business that are crucial for strategic effectiveness.
Last, the ROI methods require evaluators to isolate the effects of training. As
we know, the effects of training should be integrated with, not isolated from,
the performance system. Methods that seek to claim training credit for impact,
and not recognize the vital contributions of other players in the performance
process, are divisive and exacerbate political isolation of the training function.
Further, training functions are already isolated, physically in the geography of
the organization, and conceptually in the minds of line management, from the
mainstream of business performance. Evaluation methods that separate training
more are probably best avoided.
training function. If this observation is true, it is little wonder then that training
managers complain that they have difficulty in getting real commitments and
participation from line management in the training process.
The quality management concept as it has been defined by Deming and others
(Deming, 1986; Crosby, 1979; Juran & Gryna, 1980; Taguchi & Clausing,
1990) fits the training evaluation context very well. Quality management
stresses customer-oriented definitions of quality, systems thinking, process
analysis, and process measurement, all of which are responsive to the
context of training, with its inextricable interweaving into the fabric of the
complex performance management and improvement system, and the resultant
multiplicity of stakeholders and customers.
One key and defining distinction of the quality management approach is that
the customer is the arbiter of quality, and quality is functionally defined as
being fit for use by the customer. Thus the first step in quality definition is to
study the customers and determine their needs and expectations. These needs
and expectations are then translated into product and service specifications,
which in turn drive the design of production processes.
Applied to training, the quality perspective forces the question of who is the
customer of training. Where trainees themselves are implicitly defined as
customers, quality specifications tend to represent short-term characteristics of
the training intervention itself, such as whether the training venue is
comfortable, whether the training leader is entertaining and supportive, or
whether the trainees find the content relevant to their perceived needs and
experience. On the other hand, when training customers are rightfully viewed
as the immediate and up-line managers of the trainee, then the quality picture
changes considerably. The trainees' managers are interested most in training
that helps their employees perform more effectively, thus helping them and
their senior managers achieve business performance goals. Quality
specifications for training driven by the ultimate customers of training-the
managers who pay for it-tend to focus more on the results and value of
training in terms of the business, thus helping evaluation gain more leverage
for assessing and increasing impact.
In sum, the notion that training evaluation can be profitably viewed from the
perspective of quality management is a sound idea, and represents a potentially
powerful strategy. The previously described strategies (one through three) may
all lend helpful methods, and may at times be useful in temporary situations,
such as conducting a quick study of ROI when the training department is under
attack and must employ emergency procedures to defend its budget. But none of
these previous strategies represent a viable long-term strategy for impact
evaluation. Above all, the goal of impact evaluation (any evaluation, for that
matter) is learning. Most organizations do not get the impact they should from
training. Constructive and thoughtful impact evaluation, using a process-
oriented approach as described here, can be a powerful force for building
organizational capability, through learning, to achieve increasingly more
valuable results from training resources.
Summary
The issue of training impact is complicated by the several and varied purposes
to which organizations direct employee learning and education. These purposes
center on, but are not exclusive to, achieving business results and value through
the improvement of employee performance. Even when training impact is
clearly defined as performance improvement that achieves business goals,
training impact is subject to myriad forces and factors that impinge on job
performance and the value of job results that range well beyond the relatively
small contribution training alone makes.
When the belief in the myth of training as the cause of impact is strong, it is
very likely that training will have only marginal impact, since insufficient
quality management attention will be devoted to the many performance factors
that interact with learning to produce performance improvement, and hence
business results (impact). On the other hand, when training is viewed as one of
several means to performance improvement, and when performance
improvement efforts are managed based on a comprehensive understanding of
the systemic underpinnings of organizational and individual performance, then
training is very likely to be a powerful tool for business success.
References
Brinkerhoff, R. O., & Brown, V. (1997). How to evaluate training. Beverly Hills, CA: Sage.
Brinkerhoff, R. O., & Gill, S. J. (1994). The learning alliance. San Francisco: Jossey-Bass.
Brinkerhoff, R. O., & Gill, S. J. (1992). Managing the total quality of training. Human Resource Development Quarterly, 3(2), 121-131.
Brinkerhoff, R. O. (1987). Achieving results from training. San Francisco: Jossey-Bass.
Carnevale, A., & Schulz, E. (1990). Evaluation practices. Training & Development Journal, 44(5), 23-29.
Deming, W. E. (1986). Out of the crisis. Cambridge, MA: MIT Press.
Gilbert, T. F. (1978). Human competence: Engineering worthy performance. New York: McGraw-Hill.
Juran, J. M., & Gryna, F. M. (1980). Quality planning and analysis. New York: McGraw-Hill.
Kerlinger, F. N. (1973). The foundations of behavioral research (2nd ed.). New York: Holt, Rinehart and Winston.
Mathieu, J., & Leonard, R. (1987). Applying utility concepts to a training program in supervisory skills. Academy of Management Journal, 30, 316-335.
Nadler, L. (1980). Corporate human resources development. New York: Van Nostrand Reinhold.
Rummler, G., & Brache, A. (1994). Performance improvement (2nd ed.). San Francisco: Jossey-Bass.
Taguchi, G., & Clausing, D. (1990). Robust quality. Harvard Business Review, 68(1), 66-75.
Tannenbaum, S., & Yukl, G. (1992). Training and development in work organizations. Annual Review of Psychology, 43, 399-441.
Introduction
Formative evaluation is the review and revision of all parts of the instructional
design process from the needs assessment through the development and field
testing of the materials. Although the concepts and methods have been with us
for 30 years, there are signs that it is being "rediscovered" by a new
generation of educators.
When it occurs:
Formative evaluation: from the beginning of the instructional design process (needs assessment) until the training materials are completed.
Summative evaluation: typically three to twelve months after the student completes the training.
preferences. These are the kinds of discrepancies or deficiencies that, had they
been discovered and addressed while the training was being designed and
developed, would have produced a very different set of summative results. The
cost of producing inappropriate or useless training, both in human terms and in
dollars, argues loudly for more attention to that often-neglected element of
evaluation to which we now turn our attention: formative evaluation.
General Principles
Review Constantly
The developers should constantly be testing their ideas with colleagues and
peers. A review may be a formal meeting with agenda and sign-offs. It may also
be a discussion over a cup of coffee. Formative review almost becomes a
mindset, a willingness to share ideas and receive feedback. The sooner feedback
is received, the easier it is to incorporate it into the design of the project. One
last point on the necessity for continuous review should be noted. The longer
people hold on to an idea and polish their material, the more it becomes a part
of them. As a result, they are more likely to resist changes.
Subject matter experts are authorities on the content. They bring depth of
knowledge and experience, often at a level beyond what will be taught. Subject
matter experts typically do not have a background in education or training.
The client is the person who funds the project. It is critical that the client be
involved in decisions that affect the outcome of the project and that, when
compromises have to be made, they serve the needs of all parties involved,
including the client.
In some cases an individual may have more than one of the skill sets and
perspectives mentioned above. For example, your client could also be a subject-
matter expert. When that is the case the person should review the material
more than one time, concentrating on the perspective of each skill set.
Develop a Network
The continuous review process can make people feel vulnerable. Inviting
criticism is a hard thing to do, and hearing the feedback can be even more
difficult. Yet good instructional products just don't happen. In all likelihood
they have undergone continuous review and revision. For this to occur on a
consistent basis, you need to develop a network of people who trust and respect
one another and are willing to give and receive feedback as mutually supportive
professionals.
Some people in a review network may not be on the project team, but they
may be knowledgeable about instruction or the content. When building a
network, remember that this is time consuming and often extra work. It may
not be a technically required part of that person's job. This means that for
networks to be effective, they must become a two-way street. You must also
review the work of your colleagues. If it becomes a one-way street, it will not be
sustainable.
Design Review
Most people would subscribe to the truism "You can't get there if you don't
know where you are going." The design phase in the development of an
instructional project determines where you are going and how you will get
there. Yet, some people see the whole design process as just something to get
over, an impediment to doing "real work." In the United States, course
developers typically spend 10 percent of a project on design and 90 percent on
implementation and revision. In other cultures, particularly in Japan, those
proportions are nearly reversed.
Table 8-2 summarizes the elements and highlights critical points to consider
during the review.
Evaluation strategy: Clearly articulates how student reactions and learning will, or will not, be assessed; identifies any posttraining evaluation activities.
Project management: Addresses all issues that will affect the project: schedule, budget, roles, responsibilities, and working norms for the team.
The review should include several discrete items that were developed during
analysis and design. This list is far from exhaustive. For more detail refer to
Process Guidelines
The process for conducting the review can affect the outcome. Following are
some suggestions for conducting effective design reviews.
• Include the right people and set clear expectations; usually the entire team
and the client should attend.
• Send out prereview materials early; point reviewers to specific items that
need to be more fully addressed at the meeting.
• Conduct the meeting in a manner that is congruent with the culture of the
environment and the size of the project; it may be an informal discussion
around a table or a formal presentation.
• At the conclusion of the meeting review decisions and outcomes; gain
agreement on the design and on roles and responsibilities.
• Document all decisions and distribute them to everyone who was present.
Decisions that are not made in early stages will pile up later. At the time of a
design review, people will often say, "Oh, we don't need to worry about that
now. We can decide on that later." What they are really saying is something
more like, "I don't know the answer" or "I can't (or will not) figure out the
answer." Or they may be avoiding making a tough decision. In any case, the
necessity of the decisions or their importance does not go away. They will keep
coming back to haunt the project team until they get answered. If
procrastination continues, a number of the decisions may have to be made at
once. The later the decisions get made, the less time and resources are available
to implement them. Bite the bullet and make the tough decisions as part of the
design review phase. Some projects seem never to answer the
hard questions posed as part of design. This is an illusion: the questions do get
answered, and the answers almost always make no one happy.
Prototype Review
A useful model for reviewing modules was devised by David Merrill. The
model, called the Instructional Quality Inventory (Montague, 1983), addresses
issues of consistency and adequacy.
After modules are reviewed for consistency, they should be reviewed for
what Merrill calls adequacy. This refers to how well each element of the design
is executed. Are there clear instructions for the exercises? Is the material
technically correct? Instructional designers, subject-matter experts, instructors
and potential students, each focused on a different element of adequacy, can all
provide useful information at this stage. Subject-matter experts can assess the
technical accuracy of the material. Students can provide feedback on clarity and
understandability. Instructors can tell you a great deal about structure and flow.
For instructional designers, the important questions are around the
effectiveness and efficiency of the instruction.
Adequacy only matters when the parts are consistent. For example, unless
the multiple-choice test items in an end-of-module test match the materials
found in the module, it doesn't matter how well the test items are written.
Because of this, examining consistency before adequacy is important.
Process Guidelines
The more people who review the materials, the better the chances of achieving
a high-quality product. One person may be able to review from two
perspectives, for example, as both an instructor and a subject-matter expert. This is
acceptable as long as the reviewer goes through the materials twice, once from
each perspective. The following guidelines help ensure useful and timely
feedback from reviewers:
• Give people specific directions about what to look for rather than saying,
"Please review this for me."
• Send out materials in small, manageable quantities: 20 pages rather than
100.
• After people have returned their comments to you, get back to them. Tell
them what you changed or left alone and why. After they have made the effort,
people want to know that you considered their suggestions. This will also
make them more willing reviewers in the future.
Prototype Testing
This is the point at which instructional materials are tested in a context that is
similar to the real setting in which they will be used. Prototype testing may
occur in several stages. You may first test the materials with colleagues, then
with a group of people similar to the audience, and finally with the actual target
audience.
The methods used for testing will differ based on the format of the materials.
Some general guidelines are discussed for trying out self-paced materials and
for conducting an early module teach for an instructor-led training activity.
surprised about what happened next. The evaluator should have a print copy of
everything that the student sees, such as a copy of a workbook or a print copy of
the storyboards for computer-based training. Without judgment, the evaluator
records what happens.
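One way to keep such a record is a simple structured log. The following Python sketch is a minimal illustration of one possible format; the field names and sample entries are our own assumptions, not a format the chapter prescribes.

```python
# A minimal sketch of an observation log for a self-paced materials tryout.
# Field names and sample entries are illustrative assumptions.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Observation:
    page: str            # workbook page or storyboard frame the student is on
    behavior: str        # what the student did (paused, reread, skipped ahead)
    comment: str = ""    # anything the student said aloud
    timestamp: datetime = field(default_factory=datetime.now)

log: list[Observation] = []

# Record events without judgment; interpretation comes later.
log.append(Observation(page="Workbook p. 12",
                       behavior="reread the exercise instructions twice"))
log.append(Observation(page="Workbook p. 13",
                       behavior="stopped and asked a question",
                       comment="I don't see the term 'baseline' defined anywhere."))
```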
After the first significant module of an instructor-led course has been drafted
and reviewed, it can be tried out in a mini-teaching situation. Only a small
number of people are needed. It is better if they resemble the target audience,
but if they don't, the characteristics of the target audience can be described to
them, and they can be asked to play that role.
Typically, the course developer teaches from the draft materials, using them
in the same way they are intended to be used in the final course. If possible, an
evaluator or observer should be present to record behavior, questions, delays,
and problems.
Following the teaching portion, analyze the results from three perspectives:
1) student achievement, 2) student perception, and 3) instructor perception.
Assessment of achievement can be accomplished with a short test or simply
through a discussion about what people learned. Student perceptions can be
assessed in a questionnaire and through a structured group discussion. The
instructor can summarize what went well, what did not, and identify how
changes in the materials would make the content easier to teach. This
information, coupled with the observer's notes, can provide valuable
information to guide subsequent development of the course.
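To make the three perspectives concrete, the short Python sketch below tallies hypothetical mini-teach data; the scores, the 1-to-5 rating scale, and the notes are invented for illustration only.

```python
# A minimal sketch of organizing mini-teach results along the three
# perspectives named above. All data values are hypothetical.
achievement = {"quiz_scores": [7, 8, 6, 9], "max_score": 10}
student_perception = {"clarity": [4, 3, 4, 5], "pace": [3, 3, 2, 4]}  # 1-5 scale
instructor_notes = [
    "Exercise 2 instructions unclear; two students needed help.",
    "Module ran 20 minutes over the planned time.",
]

def mean(xs):
    return sum(xs) / len(xs)

print(f"Mean quiz score: {mean(achievement['quiz_scores'])}/{achievement['max_score']}")
for item, ratings in student_perception.items():
    print(f"Mean {item} rating: {mean(ratings):.1f}/5")
for note in instructor_notes:
    print("Instructor:", note)
```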
Even though this technique can produce excellent results, it does not have to
be done with every module or even with every course. Use it in situations in
which there is some uncertainty about the content of the materials, the
methodology used to present them, or the capability of the instructor who will
be teaching the course pilot or subsequent courses. In each of these situations it
will provide useful information that will help identify and avoid potential
problems.
Participants in a pilot are informed ahead of time that it is a pilot, and that the
last part of the course will be an evaluation. They are typically encouraged to
note anything they think would improve the course along the way. The typical
method for collecting data from the pilot is to have a conference with the
students at the end of the course. If it is a multi-day course, some instructors
also like to have a short daily analysis of how things are going at the beginning
or end of each day.
Someone other than the instructor should conduct the conference, preferably
an evaluator. It sometimes helps people be more candid if the instructor is not
present during the discussion. Often a combination of written questionnaires
and structured discussions is used to gather information from pilot
participants. If both are used, the questionnaires should be completed before the
class engages in a group discussion. Some of the topics that can be addressed in
the pilot evaluation, either through questionnaires or in discussion, include:
The Barriers
Formative evaluation requires a unique skill set, as does every other element
of the instructional design process. Over the past twenty years there have been a
number of studies to identify the competencies needed in each area of
instructional design. Many graduate programs in instructional design,
particularly smaller programs, will have introductory instructional design
courses, courses specific to media, for example computer-based instruction, or a
course on needs assessment or project management. Most small- to medium-
sized departments do not have a course on formative or summative evaluation.
This means that many people with a graduate degree in instructional design
have little formal training in evaluation.
Other parts of the overall instructional design process have received more
attention. For example many people perform needs assessment. People see the
necessity for it, and a number of people have developed skills in the process.
An indication of this is the number of companies that provide needs assessment
services as compared to formative or summative evaluation services. The index
of the 1996 ASTD Buyer's Guide and Consultant Directory lists 127 companies
that offer needs assessment services. The same directory lists 44 companies that
offer evaluation services. But the actual number of companies offering
formative and summative services is far smaller, because the list of 44 is
dominated by companies offering specific evaluation instruments, not
formative or summative services. Parenthetically, this may be a clue to some
enterprising types that this is a market with large opportunities.
For some course developers, what matters is the technical accuracy and
completeness of the materials. A large number, if not the majority, of people
who develop and deliver technical training, are subject-matter experts who have
been placed in a different role. What has mattered in the past was the technical
aspects of a product or service. This is what they know and what they feel is
important. They often do not understand the necessity of balancing technical
accuracy with educational soundness. If you want quality courses, both are
absolute necessities.
The Opportunities
strengths of your course design. The result is a more solid and effective
learning experience for the student.
There is one last benefit to mention, and that is for students using self-paced
materials. This includes any media or method in which the student and
instructor are not physically together. This would include computer-based
instruction, multimedia, print workbooks, and much of the distance material
that is currently being produced. If a student is in a classroom and does not
understand something, he or she can ask a question. This is much more
difficult, and sometimes impossible, when using self-paced material. If the
student does not understand some basic concepts in a multimedia course or
needs something explained a different way, what can the student do except
become frustrated? This is something that you want to avoid. Formative
evaluation is a tool to ensure that students can perform well when using this
material. By trying it out and testing your assumptions along the way, you are
limiting the frustration and loss of learning that may occur in the future. In
addition, the cost of changing and redistributing materials, particularly
multimedia materials, can be formidable.
Formative evaluation is not a "nice" thing to do if you happen to have the time
and a big staff with expert skills. It is a necessity for every course. There are concrete
processes that anyone can learn and use. It's a well-documented and mature
field upon which we can continue to build. Its rediscovery is heartening and
should be encouraged.
References
Montague, W. E. (1983). The instructional quality inventory (IQI): A formative tool for
instructional systems development (ERIC Document Reproduction Service No. ED
235-243).
Yourdon, E. (1989). Structured walkthroughs (4th ed.). Englewood Cliffs, NJ: Prentice-
Hall.
A recent convergence of social, economic, and business forces has led corporate
practitioners in training and development to look at employees' competency
assessment results not only as indicators of performance strengths and
development needs, but also as sources of data for organizational decision
making. In fact, one of the hot topics in human resources today, especially in
the areas of compensation and development, is competencies and their potential
for integrating the various elements of human resource systems. Most of these
approaches favor multirater or 360-degree feedback as the competency
assessment mechanism of choice (American Compensation Association, 1996).
There are, however, several fundamental issues that most of the current
literature and conference presentations fail to address:
performance as the seminal, causal, and predictive indicator of job success and
organizational effectiveness? Are we trying to provide a single solution to
relieve managers' stress and the challenge of managing people and
performance?
This chapter begins with a short discussion of the social, technological, and
especially economic conditions that have contributed to the popularity of
employee competency assessment. Then four key topics are discussed:
• What is competency?
• What are the competency identification methodologies?
• What is competency assessment?
• When do you use a multirater competency assessment?
The chapter ends with a discussion of tools and techniques that contribute to
the formulation and implementation of a useful and meaningful competency
assessment effort.
Social forces have set the stage for an increase in psychological and multi-
rater measures of performance. Technological innovations, by improving the
collection, storage, and crunching of the numbers that represent people's
capabilities, have led to the multimillion-dollar business of competency
assessment. At the same time, economic requirements are producing work
accomplishments and other outputs that are complex, interdependent, and
progressively harder to measure. Exploring the conditions that produced this
convergence of economic, social, and technological factors highlights the
strengths and problems of the competency assessment movement.
The competency movement, begun in the 1970s by David McClelland (1973) and
later codified by Richard Boyatzis (1982), took a rigorous research design and
statistical analysis approach to identifying and defining the thoughts, behaviors,
actions, motives, and traits that predict successful job performance. McClelland was the first to
state that successful job performance is unrelated to IQ but highly correlated to
the demonstration of competencies.
The job, so long the "atom" around which human resource practices were
built, is no longer a viable concept (Ledford, 1996). There is a greater demand
to reward people not just for "making the numbers" or reaching predefined
goals but also for considering how these goals were met. Do individuals meet
short- as well as long-term customer expectations? Do they decrease costs while
ensuring high quality? Did they burn out other staff while doing so? These are
examples of the softer, less output-oriented "how" side that competencies
commonly measure. As work becomes more complex and outputs more team-
driven and interdependent, competency assessment is considered a potential
solution to the new challenges of measuring both the harder "what" and the
softer "how" side of performance.
Employees are also drawn to the competency approach because they want to
develop the "how," otherwise known as their "portable competencies" (Larrere
and Williams, 1996). Taking initiative, listening to others, working efficiently
in diverse teams, thinking critically, and acting with flexibility: these are the
fundamental qualities that individuals can take with them from job to job,
company to company. To employees, competencies are the concepts they must
master to stay employable and succeed in tightening economic conditions.
What Is Competency?
Descriptive models describe, but do not differentiate, the critical characteristics
that cause or predict successful performance. They are most valuable for developing
job profiles, job orientation and training programs, deployment indicators, and
performance appraisal criteria. Descriptive models become problematic when
they attempt to go beyond description to create performance differentiators
without the benefit of empirical data.
Excellence or differentiating models, by contrast, focus on what enables star
performers to achieve better results than average performers. Spencer and Spencer offer a more
technical definition: "A competency is an underlying characteristic of an
individual that is causally related to criterion-referenced effective and/or
superior performance in a job or situation" (1993, p. 9). According to this
approach, if you can identify star performers, you can study them; figure out
what makes them great; label and behaviorally define their characteristics,
especially deep-seated ones like motive, personality traits, and self-concept; and
use the resulting criteria to assess, select, develop, and train others.
Performance in the more complex jobs or roles becomes less a function of
knowledge, task-related skills, intelligence, or credentials and more a result of
competencies, such as a motivating drive, interpersonal skills, positive self-
image, or political savvy (Spencer & Spencer, 1993).
Excellence or differentiating models are most valuable when they are well
researched and grounded in business needs. Their highly context-sensitive
approach makes them invaluable in an organizational or cultural change effort.
Since this type of model is so tightly linked to top performance in a role or job,
it is not suitable as a company-wide universal model.
Methodological Considerations
As a practitioner, how do you know if you are choosing the most appropriate
and cost-effective methodology? Your methodology, or combination of data-
gathering and analysis approaches, should be closely tied to intended business
outcomes. There are several key questions to keep in mind:
The type of model and data collection methodology you choose should be
related to the intended business use of the resulting information. If your
organization wants to improve its "hit rate and retention" for hiring in key
positions with significant customer interface-a senior account executive, for
example-you will need a differentiating competency model with high
criterion-related, construct, and content validity. If, on the other hand, solid but
streamlined job descriptions are needed to clarify roles during reorganization or
create job-focused development plans, a descriptive competency model may be
adequate. Table 9-1 will help you decide which type of model to use.
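As a rough illustration of this matching logic, the Python sketch below encodes the two model types against a few intended business uses. The use-case labels are our own assumptions drawn from the examples in the text, not the actual contents of Table 9-1.

```python
# A minimal sketch of matching model type to intended business use,
# as described above. Use-case labels are illustrative assumptions.
def choose_model(intended_use: str) -> str:
    differentiating_uses = {
        "selection for key customer-facing positions",
        "improving hit rate and retention in hiring",
    }
    descriptive_uses = {
        "role clarification during reorganization",
        "job-focused development plans",
    }
    if intended_use in differentiating_uses:
        return ("differentiating model "
                "(needs criterion-related, construct, and content validity)")
    if intended_use in descriptive_uses:
        return "descriptive model (a streamlined job description may be adequate)"
    return "clarify the intended business outcome before choosing a methodology"

print(choose_model("improving hit rate and retention in hiring"))
print(choose_model("role clarification during reorganization"))
```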
The most valuable competency studies also use multiple data gathering and
analytical approaches designed to overlap and produce similar or converging
results. If a competency study uses only one data-gathering method, whether
interviews, surveys, literature review, or focus groups, key behaviors or
competencies are practically guaranteed to be missing. Within reason and
budget, the more data you gather, the more perspectives you tap, and the more
analytical approaches, reviews, and feedback from potential users you bring to
bear, the better the chance of producing a sound, valid, context-sensitive
competency model of any kind.
A common refrain from line clients about competency studies, and especially
differentiating models, is that they take too long and cost too much. As
practitioners, we need to improve our project planning and management skills,
aligning our activity and deliverables with our company's business processes
and schedules.
A generic model based on a database will save you money. It will open doors
for future competency initiatives and establish credibility. It will definitely
provide greater value than a brainstormed list. The alternative approach, using
a vendor to develop a company-specific model from scratch, is much more
expensive.
Another option for answering the cost/time refrain is to build a business case
based on the costs of not doing a quality job of competency identification. For
example, the cost of recruiting, hiring, relocating, orienting, and assimilating a
single senior new hire is in the hundreds of thousands, especially when lost
productivity, a hiring bonus, and a relocation package are included. Preventing
one or two poor hiring decisions more than pays for the competency study. If
you are planning a three-day offsite seminar, calculate the attendance costs for
150 managers. Include travel, food, accommodations, fully loaded salary, and
lost opportunity dollars. Add the cost of training and design, including
simulations and custom videos. Isn't it worth the cost of a competency study to
ensure that the behaviors being assessed and targeted for training are proven
predictors of success?
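A back-of-the-envelope version of this calculation can make the case concrete. In the Python sketch below, every dollar figure is a hypothetical placeholder to be replaced with your organization's actual costs.

```python
# A minimal sketch of the business-case arithmetic described above.
# All dollar figures are hypothetical placeholders.
managers = 150
per_person = {
    "travel": 800,
    "food_and_lodging": 900,       # three days offsite
    "loaded_salary": 3 * 600,      # three days at a fully loaded daily rate
    "lost_opportunity": 3 * 400,   # three days of lost opportunity dollars
}
attendance_cost = managers * sum(per_person.values())
design_and_delivery = 250_000      # training design, simulations, custom videos

seminar_total = attendance_cost + design_and_delivery
competency_study = 75_000          # hypothetical cost of a competency study

print(f"Offsite seminar total: ${seminar_total:,}")
print(f"Competency study:      ${competency_study:,}")
print(f"Study as share of seminar cost: {competency_study / seminar_total:.0%}")
```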
Carefully determine the uses of assessment data and work closely with line
managers to help them understand the legal, ethical, and moral implications of
using data about people. Managers also need to understand the kinds of
business decisions they can make from different types of data. For example,
drawing up an employee development plan from the organization's composite
scores does not make good business sense, but using that data for training
program design and priority development decisions is probably worthwhile. Say
a manager wants to use competency assessment ratings to determine individual
potential for more senior positions. The competency model, instrument,
assessment process, and resulting data must meet tough standards and should
always be corroborated by other data, such as performance appraisals, business
results, and customer satisfaction scores. Competency assessment information
does not replace the hard job of managing people and performance. It is rather
a tool that informs and enhances that job.
Although a review of the literature may lead you to think that 360-degree
assessment is becoming a standard practice for managers, executives, and
employees who interact with customers, it is still unclear how many U.S.
companies actually use this methodology (American Compensation
Association, 1996; Bohl, 1996). The multirater approach is taking hold in
Europe as well (Handy, Devine, & Heath, 1996). To date, however, there has
been no rigorous study that catalogs the use of multirater assessment
methodologies, much less addresses their business, financial, or even personal
benefits. And no independent body of research has yet proven the common-
sense conclusion that the multirater approach reduces bias and adds credibility
compared to more traditional, supervisor-only input.
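The chapter does not prescribe an aggregation method, but the Python sketch below illustrates one common way multirater scores are rolled up by rater group. The competency names, the 1-to-5 scale, and the minimum-rater anonymity rule are all illustrative assumptions, not a standard the chapter endorses.

```python
# A minimal sketch of aggregating multirater scores by rater group.
# Competencies, scale, and the minimum-rater rule are assumptions.
ratings = {  # rater group -> competency -> list of 1-5 ratings
    "self":           {"listening": [4],       "initiative": [5]},
    "manager":        {"listening": [3],       "initiative": [4]},
    "peers":          {"listening": [3, 4, 2], "initiative": [4, 4, 3]},
    "direct_reports": {"listening": [2, 3],    "initiative": [4, 5, 4]},
}
MIN_RATERS = 3  # suppress group means with too few raters to protect anonymity

for group, comps in ratings.items():
    for comp, scores in comps.items():
        if group in ("self", "manager") or len(scores) >= MIN_RATERS:
            print(f"{group:>14} | {comp:<10} | mean {sum(scores) / len(scores):.1f}")
        else:
            print(f"{group:>14} | {comp:<10} | suppressed (fewer than {MIN_RATERS} raters)")
```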
As illustrated in Table 9-2, the two major uses of multirater assessments are
appraisal and feedback.
Both uses affect people's lives, sometimes dramatically. For this reason
alone, practitioners must carefully monitor the quality and efficacy of the
multirater assessment process.
Appraisal decisions, including selection and salary, often have legal implica-
tions and should be closely monitored to avoid potential litigation. Feedback
decisions, including career plans and development actions, are usually less
scrutinized since they do not have the same financial implications as appraisal
decisions. Yet, for fragile egos or employees undergoing personal crises,
feedback results from peers and direct reports as well as managers could prove
overwhelming.
On the appraisal side, raters tend to inflate their scores if they believe
managers have access to individual assessment results and use them for
evaluation rather than development (Dalton, 1996). Research studies are
beginning to establish that, once peers perceive a potential negative
consequence to a rating, their scores are more lenient, less differentiating, less
reliable, and less valid (Farh, Cannella, & Bedeian, 1991).
Maxine Dalton of the Center for Creative Leadership is one of the driving
forces behind establishing the value of multirater feedback.
Using multirater approaches for both appraisal and feedback may sound
efficient, but differentiating the two uses is even more important. For
development purposes a competency assessment may be the right approach. On
the appraisal side, however, assessing people against company values, project
norms, or business results may be better.
One reason feedback works so slowly is that too often we provide not
full-grown carrots but just the seeds of development. The process of receiving,
processing, and responding to feedback must be structured. Employees cannot
be left to figure out and implement their assessment data without assistance.
The value of the assessment exercise is often inadequately explained, applied,
and reinforced. In this atmosphere discrepant data confuse, discomforting
results are ignored, and the demands of business give employees an excuse for
ignoring their assessments. Employees need quality debriefing and coaching,
plus materials specifically designed to reinforce the value and applicability of
the feedback they receive. The seeds of development must be planted, tilled,
and tended.
Treat development tools with as much thought and care as the assessment
instrument itself. Most important, keep in mind the requirements of your
customer base. As practitioners, it is our responsibility to offer our organization
and its employees the fully grown carrots of solid development planning
processes, tools, and other applications that allow people to make the best use
of their competency assessment data. Mere carrot seedlings of assessment
reports, plus a few guidelines for development plans, are not enough.
Multirater assessments contain valuable market research data. They can, for
example, provide the cost of readying a workforce to meet a new strategic
direction. They can also identify workforce strengths that will help your
company use a unique combination of skill sets to fill a market niche.
References
American Society for Training and Development. (1996). Trends that affect corporate
learning and performance (2nd ed.). Alexandria, VA: Author.
Atwater, L., Roush, P., & Fischthal, A. (1995). The influence of upward feedback on self
and follower ratings. Personnel Psychology, 48, 35-60.
Bohl, D. L. (1996, September 19). Mini-survey: 360-degree appraisals yield superior
results, survey shows. Compensation and Benefits Review, 28, 16.
Budman, M., & Rice, B. (1994, February). The rating game. Across the Board, 35-38.
Farh, J. L., Cannella, A. A., & Bedeian, A. G. (1991). Peer ratings: The impact of
purpose on rating quality and user acceptance. Group and Organization Studies,
16, 367-386.
Flanagan, J. C. (1954). The critical incident technique. Psychological Bulletin, 51,
327-358.
Gebelein, S. H. (1996, January). Multi-rater feedback goes strategic. HR Focus, 3-6.
Handy, L., Devine, M., & Heath, L. (1996). 360° feedback: Unguided missile or
powerful weapon? Berkhamsted, England: Ashridge Management College, Ashridge
Management Research Group.
Hazucha, J., Hezlett, S. A., & Schneider, R. J. (1993). The impact of 360-degree
feedback as a competitive advantage. Human Resource Management, 32, 325-354.
Hoffman, R. (1995, April). Ten reasons you should be using 360-degree feedback. HR
Magazine, 82-85.
Kanter, R. M. (1997). Restoring people to the heart of the organization of the future. In
F. Hesselbein, M. Goldsmith, & R. Beckhard (Eds.), The organization of the future.
Drucker Foundation Future Series. San Francisco: Jossey-Bass.
Larrere, J., & Williams, D. (1996, June 25). Helping workers grow with firms. The
Boston Globe, p. 18.
Lawler, E. E., III, & Ledford, G. E., Jr. (1996). New approaches to organizing:
Competencies, capabilities and the decline of the bureaucratic model. CEO
Publication G96-7 (30D). Los Angeles: University of Southern California, School of
Business Administration.
Ledford, G. E., Jr. (1996, October). New directions in competency-based pay. Paper
presented at the meeting of the American Compensation Association Performance
and Rewards Forum, Chicago, IL.
Meyer, H. H., Kay, E., & French, J. R. P. (1965). Split roles in performance appraisal.
Harvard Business Review, 43(1), 123-129.
O'Reilly, B. (1994, October 17). 360° feedback can change your life. Fortune, 93-100.
Prahalad, C. K., & Hamel, G. (1990). The core competence of the corporation. Harvard
Business Review, 68(3), 79-91.
Smither, J. W., London, M., Vasilopoulos, N. L., Reilly, R. R., Millsap, R. E., &
Salvemini, N. (1995). An examination of the effects of an upward feedback program
over time. Personnel Psychology, 48(1), 1-34.
Spencer, L. M., Jr., & Spencer, S. M. (1993). Competence at work: Models for superior
performance. New York: John Wiley & Sons.
Van Velsor, E., & Leslie, J. B. (1991). Feedback to managers, Vol. 1: A guide to
evaluating multi-rater feedback instruments. Greensboro, NC: Center for Creative
Leadership.
Yukl, G., & Lepsinger, R. (1995, December). How to get the most out of 360° feedback.
Training, 45-50.
Susan Ennis recently joined BankBoston heading up the newly created function
of Executive Development. Susan is responsible for creating and implementing
a competency-based executive development strategy, which includes succession,
replacement, selection, high-potential identification, retention, assessment, and
coaching. She also will manage executive educational programs and processes.
Prior to her role at BankBoston, Susan held various Human Resource
Development positions over a ten-year period at Digital Equipment
Corporation. In her last position there, Susan managed an internal consulting
group that produced competency research and applications and handled
evaluation and assessment projects. In her six years at McBer and Company,
Susan was a project manager, account manager, and manager of product
development. Susan has worked in the public and private sectors as well as in
government agencies. She has over 20 years of experience in competency-based
HRD efforts. Susan has a B.A. degree from Harvard University and an M.Ed.
from Northeastern University.
10 THE ORGANIZATIONAL ACTION
RESEARCH MODEL
Morton Elfenbein, Stephen M. Brown and
Kim H. Knight
alternative to the fragmented model that currently exists. In doing so, it draws
much from the past tradition of the action researchers as well as the action
science approach espoused by Argyris, Putnam & Smith (1985) and also the
work of Schon (1983). In this way, individual managers can evaluate
their espoused theories and their theories-in-use so
that their organizations can function more realistically and can respond more
effectively to the need for self-examination and change.
Professional Training
This can result in models that are not useful to practice and that can hinder
the development of the field. While a field can develop from practice or
research, a profession needs theory, models or research that have practical
applicability in the field. Conversely, practice can develop a field when it can
be generalized to a model, principle, or theory that goes beyond the unique case
and can be made useful to other practitioners.
However, the schism between theory and practice that currently exists in
many professions potentially thwarts this type of development and results in
isolated practice and impractical and irrelevant science. Schon (1987) states that the
rational technical models leave practitioners with the "rigor or relevance"
dilemma. This model assigns the notion of rigor to a methodology that
has become irrelevant to practice.
Management Training
It is our view as well as Schon's that this criticism also pertains to the field of
general management. Part of this misdirection lies in the failure of managers to
learn the principles of knowledge formation, principles that knowledge
consumers and practitioners need to know. As a result, they are not capable of
effectively criticizing, altering, or developing the knowledge vis-a-vis their
practice as managers in organizations. Thus they espouse theories that are
faulty and resistant to change. They cannot do the type of research that John
Seely Brown (1991) has called research that continuously reinvents the
corporation.
generalize from them. Then too, the cases are often provided from a CEO's
perspective with much information provided, but little or no training on how to
collect the data that leads to this perspective. How to apply the information or
its implications at lower levels of the hierarchy, or in organizations with cultures
different from the case, is also not addressed.
These training programs are often generic off-the-shelf packages that are
difficult to apply, especially when the training is given to isolated individuals in
isolated parts of the organization. In addition to the above, the manager will
learn much from the actual job of managing. Experiencing what works and
what doesn't work will become part of a repertoire of behaviors and feelings.
These will become part of a reservoir of concrete experiences that are
integrations of values and feelings. Some managers may stop to reflect and
observe their behavior and the behavior of others and to compare this with the
previous education and training. A very few may actually begin to develop their
own theories of managing. But, as a general rule, most will try to experiment
with new approaches, to see if they appear to work. If they appear not to work,
the manager will search for new ideas. They will borrow ideas or parts of ideas
from current popular readings and fads and experiment to see how they work.
There are several very important classes of elements that are missing from
the picture above. Most managers probably do not learn to think about the ways
they acquire certain types of knowledge, how these different types of
knowledge are related to each other, and how they are related to managing. For
example, managers may not be aware that they have been taught to have a
predilection for valuing experimentation and experience but not for other types
of knowing or other ways of collecting data. They may not appreciate that the
very job of management, as defined in our culture, has forced them into being a
nonreflective, reactive knowledge builder who blindly tries new stuff (like
forcing participative management). They rarely conceive of a systematic
epistemology of many parts, much less one that systematically evaluates its
implications.
They know a bit about the systems nature of organizations and people, but
not much. At some deep level, they may doubt the effectiveness of three or four
days of costly leadership training without a truly systems-wide perspective. They know
you must systematically tie strategy, tactics, training, and evaluation to
organizational goals. But they relegate this fear to limbo, hoping that human
resources or training and development folks know something more.
They know little about their own implicit theories of organizing and
managing and how extraordinarily pervasive but subtle these determinants of
their management behavior are. It is unlikely they know how difficult it is to
change these implicit and often unproductive theories, even with three or four
days of leadership or TQM training.
To be sure, the foregoing presentation has been one-sided and biased and
has emphasized many of the negative aspects of the situation and few of the
positive. Nonetheless the caricature can serve the purpose of highlighting some
serious problems. There are a number of sources to which one can appeal to
remediate the concerns expressed above. The first of these is to study different
ways of knowing and how these different ways can be learned, trained, and
assessed. This is primarily the work of David Kolb and his types of
epistemological approaches. Second, we present the OARM model, which is an
integration of a number of organizational action research models. This is an
approach that managers can use in the resolution of many problems.
Kolb (1984), in addition to helping us understand the nature of the failure
described above, has provided a very clear delineation of this inability to
learn from experience. In doing so, he has introduced some conceptual tools
and empirical data for understanding the problems of training managers. Kolb's
concepts for evaluating the experiential learning and ways of understanding
that managers use are based on his theory of experiential learning.
He has developed a four-facet theory of the types of knowledge that are used
in understanding in general. In addition, he has developed a theory of
experiential learning that describes both the sequence of learning as well as a
theory about the predilection for individuals to be fixated or characterized by
one of these four types. Our model is based in part on Kolb's types of knowing
and his sequencing, which itself has been heavily influenced by the approach of
action research. The four personal knowledge types are variously called
divergent, assimilative, convergent, and accommodative. These can be seen in
Figure 10-1.
[Figure 10-1. The Kolb model (from Kolb, 1984). Two dimensions define four types of
knowledge: grasping experience through apprehension (concrete experience: feeling;
qualitative/humanistic; synthetic) versus comprehension (abstract conceptualization:
thinking; quantitative/scientific; analytic), and transforming experience through
extension (active experimentation: doing) versus intension (reflective observation:
watching).]
Divergent Knowledge
Assimilative Knowledge
Convergent Knowledge
Accommodative Knowledge
The final type of knowledge is derived from the active experimentation mode
and the orientation to concrete experience. This style is called accommodative
because the learner, focusing on feelings and concrete experiences, transforms
these by active experimentation without reflection or conceptualization. This
type of knowledge would involve decision making and accomplishment of tasks
in uncertain situations, a job not unlike those in general or executive
management.
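The pairing logic of Figure 10-1 can be captured compactly. The Python sketch below encodes Kolb's two dimensions and the four resulting knowledge types; the dictionary representation is ours, but the pairings follow Kolb (1984).

```python
# A minimal sketch of the structure in Figure 10-1: each of Kolb's four
# knowledge types pairs one way of grasping experience with one way of
# transforming it (Kolb, 1984). The encoding is ours.
GRASP = {"CE": "concrete experience (feeling)",
         "AC": "abstract conceptualization (thinking)"}
TRANSFORM = {"RO": "reflective observation (watching)",
             "AE": "active experimentation (doing)"}

KNOWLEDGE_TYPES = {
    ("CE", "RO"): "divergent",
    ("AC", "RO"): "assimilative",
    ("AC", "AE"): "convergent",
    ("CE", "AE"): "accommodative",
}

for (grasp, transform), ktype in KNOWLEDGE_TYPES.items():
    print(f"{ktype:>13}: {GRASP[grasp]} + {TRANSFORM[transform]}")
```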
Practitioners as Researchers
The organizational action research model (OARM) consists of five steps or
phases, which usually occur in sequence. These phases are represented in Figure 10-2.
[Figure 10-2. The five phases of the organizational action research model: practice,
reflecting, finding (researching), knowing, and acting.]
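For reference, the Python sketch below lists the five phases in order, together with the practitioner role each phase calls for, as described in the sections that follow; the tuple encoding is ours, not the authors'.

```python
# A minimal sketch of the five-phase OARM cycle shown in Figure 10-2,
# with the practitioner role named in each phase (from the text below).
OARM_CYCLE = [
    ("practice",   "practitioner acting from tacit knowledge"),
    ("reflecting", "reflective practitioner framing and exploring the problem"),
    ("finding",    "researcher designing a study and collecting data"),
    ("knowing",    "expert integrating data, theory, and tacit knowledge"),
    ("acting",     "change agent implementing, evaluating, and returning to practice"),
]

for step, (phase, role) in enumerate(OARM_CYCLE, start=1):
    print(f"{step}. {phase:<10} - {role}")
```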
Practice
Practice is what the professional does. Practice is that set of experiences and
ways of understanding that determine the expected and everyday way of
behaving of the manager within an organization. This definition includes not
only behavior but also the determinants of behavior. Practice can be
understood by examining three levels of forces that act upon the individual.
These forces include individual, organizational, and external environmental
aspects. Individual forces refer to the knowledge, values, interests, role
definition, and role behaviors that the manager holds or does with respect to the
job and the organization. These are defined in part by training, personality,
professional interests, and level of development as well as the predominant
knowledge orientation that the individual prefers (Kolb, 1984). These
individual forces act either in concert with or in contrast to forces that define
the organization. Organizational forces consist of the goals, expected specific
standards of practice, organizational culture, image, preferred epistemology,
and general value system that characterize the people of the organization. These
forces shape the individual through the general socialization techniques that
work to modify or alter individuals to fit existing norms and expectations.
While these organizational forces change over time, often the change is slow
and the nature of the change may not necessarily be adaptive or in the best
interest of the organization. While it is the individuals who develop and
maintain these organizational forces through formal and informal
communication patterns and through selection and retention of individuals,
more often than not the totality of these forces is beyond the ken of any single
individual. Hence, organizational activities can become nonadaptive, and
individuals may not possess a clear understanding of how or why problems
have occurred or how to change them. The ways of knowing or modal
epistemology that is characteristic of the organization may be self-limiting and
hence maladaptive. Very often change is required of organizations because of
events outside of the organization, such as existing technology, ethics and
values systems, markets for organizational services or products, other
organizations, regulatory mechanisms at the city, state, federal or international
level, as well as models or theories of either technology or organizational
functioning. These external forces are in constant change (although the speed
of change can vary from one type of organization to another). This change
requires that organizations be able to systemically anticipate, sense, and
respond to maintain organizational identity and integrity.
Reflecting
The next aspect that follows the framing of the problem is exploration. With
a framed question and perhaps a number of tentatively held hypotheses about a
diagnosis, the practitioner explores the body of accumulated knowledge to
discover alternative ways of naming or conceptualizing the problem. The
discovery of new ways of conceptualizing may produce a new consciousness
about the problem. It is this secondary background research and exploration
that enhances the reflection process and begins the process Kolb calls abstract
conceptualization. The practitioner approaches the exploration of accumulated
knowledge at this stage much as a researcher would. The difference is that the
practitioner's inquiry began from practice; a researcher usually begins with a
knowledge of the discipline and is looking to test a logically derived hypothesis.
Practitioners have methods of exploration, in addition to library research,
which lead to problem framing. These include interviews with other
knowledgeable or experienced practitioners. Particularly useful are interviews
with practitioners who have experienced a solution to the problem.
Collaboration with an informed third party, such as an academic or consultant,
can also be a useful approach. The initial exploration can produce models or
theories that approximate (or are analogous to) the problem encountered. Schon
calls these exemplars. Kolb calls the type of knowledge that derives from
reflection and causal analysis assimilative knowledge. These two steps in the
reflective process, framing and exploration, are interactive, with each one
informing the other in a circular pattern until the practitioner is comfortable
with the fit of the problem as framed. The result of this stage is a set of research
questions, framed by the practitioner in the context of accumulated knowledge.
The role played by the practitioners in this step is that of the reflective
practitioner. The practitioner has moved from tacit practice to an understanding
of the problem, perhaps even multiple understandings.
Finding
The third stage of the model is that of finding. In this stage the practitioner
plays the role of a researcher and moves from the various understandings of the
problem to a data-based knowledge of the problem in its organizational context.
The various steps in this stage are quite similar to those taken by an
independent researcher. The practitioner would further explore in a focused
manner his or her current understanding of the problem. This exploration
would include further library research and probing and inquiry within the
organization. Probing and inquiry may also involve some data collection
techniques, such as a sensing interview, process observation, and ethnographic
data collection, or they may involve the use of more quantitative measures if
this is appropriate. Practitioners have an advantage in using these techniques
based on their acceptance as a participant and their holistic understanding of
the organizational context. It may seem that this data collection is repetitive,
that is, members of the organization have already been consulted. But the
participative imperative in identifying acceptable realities requires not only
"buy in" but a mutuality of theory and constructs. Based on their acceptance as
participants and their holistic understanding of the organizational context,
practitioner/scientists are ready to design a study. The same rules and decision
considerations of scientific inquiry apply in this approach as they
would in any research design. Issues such as internal and external validity and
measurement reliability arise in this research setting, which
is an operating organization. This "in vivo" setting presents the
same constraints and problems that are found in action research. Hence, the
practitioner must very often make methodological choices that force
compromises, which can threaten the internal and external validity of the study.
The final aspect of this stage is the data collection summary and analysis. It is
in the context of "finding" that a clear understanding of the research modalities
of qualitative and quantitative techniques becomes important. What can one
learn and find from one approach? What can one learn and find using the
other? Here we would urge not only triangulation of methods using multiple
measuring techniques but also multiplicity of epistemology.
Knowing
The fourth stage is that of knowing. It is in this stage that the practitioner
integrates the knowledge gained through all other stages. The first step is to
interpret the data analyzed in the prior stage. The practitioner then integrates
the tacit knowledge from practice, the reflection from the second stage, the
accumulated knowledge discovered, and the data collected in the finding stage.
This integration takes place in the context of tacit understanding of the three
forces: personal, external, and organizational. Practitioners integrate this
knowledge in the assimilative, divergent, and convergent modes (as
described by Kolb).
The roles played in this stage are multiple and include theorist, reflective
practitioner, data analyst, and model builder. Hence, the practitioner moves
from being a researcher to becoming an expert in the situation. The practitioner
has knowledge from many domains and several epistemologies and now has an
informed basis for generating policy alternatives and for choosing among
alternative action possibilities. It is here that a less-tentative diagnosis can be
posited. Along with the diagnosis is a theory, applicable to this organization
and its contexts. The theory suggests a causal understanding of the problem, in
all its systemic complexity, as well as a set of interventions that can alter the
situation.
Acting
In this final stage the practitioner uses this informed basis for action. The roles
played are those of change agent and expert. The steps are to plan for
implementation and evaluation, to actually implement, and to gather evaluative data
through feedback and evaluation mechanisms. Hence, the practitioner moves
from expert in the situation to an experimenter and informed practitioner. The
resultant action may be a change in the system, implementation of a new model
or practice, growth in knowledge, or, in the event the implementation was not
effective, a clearer understanding of the situation that arises from action. This
final kind of knowledge is what Kolb calls accommodative. It partakes of the
result of active experimentation or action in conjunction with the apprehension
of the results of action. Hence, the practitioner/scientist returns to being a
practitioner, more informed in the area as a result of the cycle.
importance. These include, but are not limited to, issues of values in the choice
of action alternatives; ethical problems and considerations in doing
organizational research; a specific methodology for examining one's own and
others' constructs in the organizational diagnosis process; and the manifold
difficulties that reside in any self-diagnostic activity.
Getting There
The OARM has several uses for the training evaluator. The most obvious
application is when confronted by a problem or assignment that involves a new
knowledge area or an area with rapidly changing knowledge. The OARM is a
systematic way to understand new knowledge and to implement and evaluate its
applicability in the field of practice. OARM can also be applied as an evaluative
method for any intervention, whether something you are already
doing or a new intervention being tested.
References
Argyris, C. (1982). Reasoning, learning and action: Individual and organizational. San
Francisco: Jossey-Bass.
Argyris, C., Putnam, R., & Smith, D. (1985). Action science. San Francisco: Jossey-
Bass.
Belenky, M., Clinchy, B., Goldberger, N., & Tarule, J. (1986). Women's ways of
knowing. New York: Basic Books.
Brown, J. S. (1991). Research that reinvents the corporation. Harvard Business Review,
69(1), 102-111.
Clinchy, B. (1996). Connected and separate knowing: Toward a marriage of two minds.
In N. Goldberger, J. Tarule, B. Clinchy, & M. Belenky (Eds.), Knowledge,
difference and power (pp. 205-247). New York: Basic Books.
Elfenbein, M., Brown, S., & Knight, K. (1996). Kolb and action research: Additional
support for paradigm integration. Manuscript submitted for publication.
Lewin, K. (1948). Action research and minority problems. In G. W. Lewin (Ed.),
Resolving social conflicts (pp. 201-216). New York: Harper & Row.
Whyte, W. (1984). Learning from the field. Beverly Hills, CA: Sage.
Whyte, W. (1991). Action research for the twenty-first century: Participation, reflection,
and practice. American Behavioral Scientist, 32(5), 499-623.
A number of issues cut across the context and theory of evaluation practice.
This section highlights issues that are both timely and timeless. The section
begins with a discussion by Patricia Lawler of ethical issues related to
educational evaluation. Lawler reviews the literature related to ethics in other
professions, and then suggests a framework for developing ethical practice in
the evaluation of training. The chapter provides the field with a solid
foundation upon which to build.
Ernest Kahane and William McCook explore some of the business issues that
are driving the current certification trend in industry. The chapter provides
perspective on industry certification programs relative to other types of
professional certification, and discusses measurement concepts that are central
to the professionally sound and legally defensible implementation of a
certification program.
Introduction
Those responsible for the various aspects of training within organizations are
faced with practical problems and daily challenges as they take on the tasks
associated with evaluation. We usually identify these challenges as work issues
and management problems inherent in every organization. Yet, there are times
when we are faced with a dilemma that evokes a more visceral response.
Consider this situation:
Cathy Reardon was enjoying her job as a new member of the training team at
Rosco International Computing Corporation. She really liked two things about her
job: the opportunity to put her presentation skills to use and the positive corporate
culture. At Rosco there was a climate of openness that was quite evident. During the
training for new managers, there were many opportunities for the participants to
learn about the corporation and the Rosco way. So Cathy was surprised when Ralph,
a new manager, was critical of the way in which she incorporated company policy
and philosophy into the Safety Skills Training. Although the discussion did not last
long, Cathy could sense that this participant was disgruntled and wanted to keep
corporate values separate from skill training. When the course finished and the
evaluations were turned in, Cathy saw that although her evaluation as a presenter
was good on all the forms, one evaluation stood out. She suspected it was Ralph's
because many of the comments on the evaluation were exactly the same as those he
had made during the course. This participant objected to "the culture indoctrination
aspect of the course, which got in the way of learning the skills."
About a week later Richard, the director of training, approached Cathy and
informed her that he had heard that one of the new managers who had attended her
training was disgruntled and not buying into the Rosco culture: "Cathy, people like
that just don't make good managers here. We need to identify them right in the
beginning." Richard asked her to send him the original evaluation forms and identify
the manager. Cathy wonders what to do, thinking that both she and the participants
assumed that the evaluations were anonymous and confidential. On the other hand,
Richard is her boss and expects her to follow his requests.
Situations like the one facing Cathy are common in work life. They
challenge us in our professional roles and may cause stress, conflict, and even
termination of employment. In this case, what is Cathy to do? What procedure
should she use to think through this dilemma that Richard has posed? Where is
she to turn for guidance? What are her options? Identifying this incident as an
ethical problem and understanding her perspective on ethical problems are the
first steps in the problem-solving process.
Professionals like Cathy will find few resources in their field when faced with
ethical dilemmas. If Cathy turns to the literature on training, she will find little
discussion of ethics, and if she never took a course in philosophy or ethics in
college, she may feel somewhat intimidated by academic writings on the
subject. Without these resources, Cathy may be unaware of the ethical dimensions
of her work in training and may lack an ethical framework with which she can
describe her problem and develop an ethical solution.
Evaluation of Training
Along with others (Brookfield, 1988; Murninghan, 1989), Basarab and Root
focus on the process as a value-laden endeavor occurring in a cultural context.
Gordon (1991) sees evaluation as part of a system, stressing that to be
meaningful evaluation must be considered within a broader context. Brookfield
(1988) specifically describes evaluation's "political nature" (p. 90), that is, how
it is influenced by the actors or stakeholders in the process.
The evaluation process therefore provides fertile ground for conflicts and
dilemmas as personal and professional values, along with organizational
culture and goals, collide.
The case study presented at the beginning of this chapter reveals several
characteristic features of an ethical problem.
First, there is a question about right and wrong conduct. Cathy was asked to
provide information about the participants that they might not have provided
had they known it would be used for personnel decisions. Cathy is concerned
that it may be wrong to provide that information. Note that this question is not
the same as asking if it would be an effective managerial tool to use the
evaluations in this way. That frames the issue simply as a management
problem. From that standpoint, it might be very useful to use this information
to weed out the Ralphs who do not fit the company's culture. But the ethical
issue is whether it would be right to treat the participants this way, not whether
it would advance organizational goals.
Standards of right and wrong conduct are stated as rules or principles. Some
are familiar, garden-variety ethical rules, such as don't lie, don't cheat, don't
steal, etc. The Ten Commandments are a set of ethical rules, as are professional
and business codes of ethics. These rules state our obligations to others to
conduct ourselves as the rules require. Obligations typically can be restated as
rights. My obligation not to reveal confidential information can be restated as
your right not to have that information revealed.
The question for Cathy is this: "What ethical rules of conduct do I apply to
this situation?" Her organization's code of ethics, if there is one, and its
policies on confidentiality would be a good place to start. There are professional
standards and codes of conduct that also should be consulted. These will be
discussed later in the chapter.
follow legitimate management directives. Are both obligations valid ones? If so,
which should take precedence? Professional standards and codes of ethics are
often helpful in resolving these kinds of situations.
Why is Cathy's problem an ethical one? First, there are professional rules of
conduct regarding participant confidentiality, and the participants assumed that
their evaluations were confidential and would not be used for other purposes.
On the other hand, Cathy is a member of an organization, Rosco International
Computing Corporation, and a subordinate to Richard. She has obligations to
follow legitimate organizational requests and work for organizational goals.
These obligations seem to be in conflict, creating an ethical dilemma.
There are several frameworks Cathy could use to increase her ethical
sensitivity. One such framework has been proposed by Zinn (1993). The
following nine questions, when answered yes, may indicate that you are facing
an ethical dilemma.
1. When I (or others) talk about this matter, do we use key words or phrases, such
as right or wrong, black or white, fair/unfair, bottom line, should, appropriate,
ethical, conflict, or values?
8. Do I have a gut feeling that something is not quite right about this?
Ethical dilemmas require action. Cathy needs to make a decision and respond
to Richard's request. Framing and analyzing the ethical problem in decision-
making terms is useful. Many models have been proposed, providing steps and
questions to aid training evaluators in this ethical decision-making process
(Lawler and Fielder, 1991; McLagan, 1989; Newman and Brown, 1996;
Walker, 1993). These models are similar in that they suggest a linear approach
to decision making. McLagan (as cited in "Ethics for Business," 1991) sets out
a model for HRD professionals, which includes four steps:
1. Identify the quandary and the ethical issue involved. Identify the goals a
solution would achieve.
This model provides a framework, but also assumes that the training
professional has a working knowledge of ethical issues and is clear on his or
her own values. Although helpful, a model such as McLagan's may not take
into consideration all of the values and contextual factors inherent in an ethical
dilemma. Walker (1993) reminds us that the competing and numerous values
and interests in these dilemmas make the decision-making process difficult.
"Decisions are made in the context of professional, social, and economic
pressures, which may clash with or obscure the moral issues." (p. 14) Austin
(1992) talks about risk taking as a consideration, as well as the severity of the
problem.
2. What are my alternatives? This step asks the decision maker to consider a
number of different responses, such as complying with the request, refusing to
provide the information, and as I discuss below, finding a way to help Richard
evaluate Ralph without using information from the training evaluation.
5. Considering all things, including ethical obligations, other goals, and practical matters, what is the best alternative? Decision making is ultimately a matter of striking a balance among conflicting demands. While there is no formula for determining this balance, making the elements explicit and clarifying their importance will ensure that the important factors are taken into consideration.
Role Responsibility
Training professionals have many roles and along with those roles come
responsibilities and obligations to various stakeholders. Considering the various roles professionals occupy is a useful way to incorporate contextual factors into the decision-making process. It is important to be aware of these various roles, such as trainer, evaluator, reporter of data, administrator or manager, member of a profession, and member of society (Newman and Roche, 1996). Austin (1992) also reminds us of our personal roles, and the values that accompany them, which may influence our work life. With each of these roles come specific, and at times
conflicting, responsibilities. For instance, Cathy is faced with her role as a
training evaluator and its responsibility of participant privacy. At the same
time, she is faced with her role as an administrator obligated to her manager,
Richard. Her goals as an evaluator and those of Richard, her boss, collide in our
case.
The concept of role responsibility helps to clarify the obligations people have
by deriving some of them from their roles. According to this approach, each role carries its own set of obligations.
Typically these obligations are owed to various stakeholders, those who have
a significant stake in the professional's decisions. In Cathy's case, these would
include the participants who are being evaluated, the requester of the
evaluation, the organization within which the evaluation takes place, and of
course, Cathy herself.
Ethical Principles
• Provide employers, clients, and learners with the highest level of quality
education, training, and development.
• Comply with all copyright laws and the laws and regulations governing my
position. Keep informed of pertinent knowledge and competence in the human
resource field.
• Contribute to the continuing growth of the society and its members. (Gordon &
Baumhart, 1995, p. 6)
This code presents a call to high standards but offers only very general direction for specific issues and decision making. Although professionals need generalized
codes to strive for in their practice, when faced with a dilemma the codes may
not offer much help. For those specifically involved with evaluation and its
unique issues, this code is not precise enough to be very helpful.
This would not be of much help to Cathy, since she is not being asked to reveal
information belonging to one client to another.
(p. 41). However, she fails to present a discussion of these issues or their
underlying ethical principles.
There are seven utility standards, which provide guidance for how
evaluations should be "informative, timely, and influential. Overall, the utility
standards define whether an evaluation serves the practical information needs
of a given audience" (The Joint Committee on Standards for Educational
Evaluation, 1994, p. 5). They include stakeholder identification, evaluator
credibility, information scope and selection, values identification, report clarity,
timeliness and dissemination, and evaluation impact. This last standard deals
with use of the evaluation and not with the integrity with which the evaluation
is used.
The setting where evaluation takes place is the focus of the three feasibility
standards. "Taken together, the feasibility standards call for evaluations to be
realistic, prudent, diplomatic, and economical" (The Joint Committee on
Standards for Educational Evaluation, 1994, p. 6). They include: practical
procedures, political viability, and cost effectiveness. Here a warning about political context is found, but nothing to guide the evaluator through it.
Again, these provide direction for action, but little help in conflicting
situations.
Although these standards are geared to the educational evaluator, the spirit
in which they are written and presented has worth for those in training.
Basarab and Root (1992) cite these standards as important to conducting
metaevaluations.
4. Respect for People: Evaluators respect the security, dignity, and self-worth of
the respondents, program participants, clients, and other stakeholders with
whom they interact.
5. Responsibilities for General and Public Welfare: Evaluators articulate and take
into account the diversity of interests and values that may be related to the
general and public welfare (p. 20).
These guiding principles address ethical issues in a general way but once
again give little guidance in the actual practice of making ethical decisions.
While there is value in all of these different codes of ethics and standards,
the fact remains that none of them was specifically designed to address ethical
issues in training evaluation. As a result, they cannot provide much guidance to
practitioners trying to deal with ethical problems.
Cathy should refuse Richard's request, explaining why it would not be right to comply. She should also acknowledge the company's legitimate goal of weeding out managers who do not fit the company culture. A political solution would be to offer her observations about Ralph, based on her perceptions of his behavior in the class. That information is not confidential, and there are no ethical prohibitions against sharing her observations of managers with her superiors. This solution separates two legitimate demands, confidentiality and personnel evaluation, and meets both. Many good solutions to ethical problems have this
structure: find a way to meet organizational goals that does not infringe on the
rights of any of the stakeholders.
2. set out some alternatives: give Richard the information, refuse to provide it
3. considered what ethical rules and principles should be used to evaluate her
alternatives: use training evaluation information only for that purpose unless
participants have been informed otherwise
5. considered obligations, practical matters, and other goals, and then chose the
best alternative: find another ethically acceptable way to assist Richard in his
personnel evaluation.
The crucial element is the identification of the ethical rules and principles to
guide evaluation of alternative courses of action. At the present time,
practitioners like Cathy have little guidance from their professional
organizations that would help them understand the ethical problems in their
work as training evaluators and that would provide guidance regarding their
ethical obligations.
The lack of such guidance creates stressful dilemmas and poor practice for practitioners like Cathy. The
profession has an obligation to identify typical ethical problems that arise in
training evaluation and establish guidelines for practical application. Issues,
such as confidentiality, conflict of interest, bias, misuse of information, and
interpreting data, appear over and over again in the general evaluation
literature. These general discussions of issues and problems are not necessarily
useful. How specifically evaluators of training deal with these issues needs to be
addressed. What are the specific ethical problems that evaluators of training
face in their practice? It is critical to have these real-life problems identified
and discussed.
(Lawler and Fielder, 1993). Yet, there is little discussion in the training
evaluation literature of these issues.
References
The American Evaluation Association, Task Force on Guiding Principles for Evaluators. (1995). Guiding principles for evaluators. In W. R. Shadish, D. L. Newman, M. A. Scheirer, & C. Wye (Eds.), Guiding principles for evaluators. San Francisco: Jossey-Bass.
Basarab, D. J., Sr., & Root, D. K. (1992). The training evaluation process: A practical approach to evaluating corporate training programs. Boston: Kluwer Academic Publishers.
Burns, J. H., & Roche, G. A. (1988). Marketing for adult educators: Some ethical questions. In R. G. Brockett (Ed.), Ethical issues in adult education (pp. 51-63). New York: Teachers College Press.
Cervero, R. M., & Wilson, A. L. (1994). Planning responsibly for adult education: A guide to negotiating power and interest. San Francisco: Jossey-Bass.
Department of Education. (1995). The program evaluation standards (Report No. EDO-TM-95-7). Washington, DC: The Catholic University of America. (ERIC Document Reproduction Service No. ED 385 612)
Gordon, E. E., & Baumhart, J. E. (1995, August). Ethics for training and development. INFO-LINE (9515). Alexandria, VA: American Society for Training and Development.
The Joint Committee on Standards for Educational Evaluation. (1994). The program evaluation standards (2nd ed.). Thousand Oaks, CA: SAGE Publications.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler Publishers.
Lawler, P. (1996). Developing a code of ethics: A case study approach. The Journal of Continuing Higher Education, 44(3), 2-14.
Lawler, P., & Fielder, J. (1991). Analyzing ethical problems in continuing higher education: A model for practical use. The Journal of Continuing Higher Education, 39(2), 20-25.
Lawler, P., & Fielder, J. (1993). Ethical problems in continuing higher education: Results of a survey. The Journal of Continuing Higher Education, 41(1), 25-33.
McLagan, P. A. (1989). Models for HRD practice. Alexandria, VA: American Society for Training and Development.
Mitchell, G. (1993). The trainer's handbook: The AMA guide to effective training. New York: Amacom, American Management Association.
Newby, A. C. (1992). Training evaluation handbook. San Diego, CA: Pfeiffer & Company.
Newman, D. L., & Brown, R. D. (1996). Applied ethics for program evaluation. Thousand Oaks, CA: SAGE Publications.
Walker, K. (1993). Values, ethics and ethical decision making. Adult Learning, 5(2), 13-14, 27.
Zinn, L. M. (1993). Do the right thing: Ethical decision making in professional and business practice. Adult Learning, 5(2), 7-8.
delivered for another function in the same company has been evaluated as a
dismal failure by both the participants and their managers. Training
professionals in a variety of settings can anticipate becoming increasingly
perplexed by these types of experiences. What new insight will we need to
address these experiences? What additional knowledge or considerations will
increase our ability to evaluate training in this ever-changing environment?
Perhaps part of the answer lies in the uncomfortable realization that most, if not all, of the underlying assumptions we have come to accept as gospel for evaluating training may not be universal truths; the templates we have created, which tell us whether a training program has or has not been effective, may not transfer to a multicultural setting with equal effectiveness. Taylor Cox, author of the book Cultural Diversity in
Organizations: Theory, Research and Practice, urges us to consider the
elements of cultural differences.
Significant culture group differences exist not only within societies but
within organizations, work environments, and across functions. The potential
impact of culture on evaluation effectiveness and the resulting implications are
significant.
1. Culture consists of ideals, values, and assumptions about life that guide
specific behaviors.
2. Culture consists of those aspects of the environment that people make.
Helen and Robert Moore, partners and senior consultants for the Interactions
Group in Massachusetts, offer the metaphors of vantage point and filters to
articulate culture and its impact. 1 Culture is the product of our vantage point,
who we are and where we are, as well as our experiences over a lifetime that
ultimately shape our perceptions, values, and beliefs. These are the elements of
our cultural response to data and experiences, including training experiences.
From one commonly held cultural vantage point, we may often assume that
anonymity ensures objectivity. After all, if participants do not have to reveal
their identity, then there is a good likelihood that they will be bluntly honest
with their feedback. Deal and Kennedy, authors of the book Corporate Cultures: The Rites and Rituals of Corporate Life, remind us that values are the bedrock of culture. Yet, if preserving face and not offending anyone were
values deeply ingrained in someone since childhood, the anonymity of the
evaluation process (form) would have little to do with the quality of the
feedback (substance). Do we unintentionally value the form of evaluation more
than its substance?
Certainly quantifiable data generated by Likert scales and analyzed using the holy grail of statistics are beyond dispute, right? Perhaps. Yet consider the fact that much of the evaluation data we generate are actually qualitative in origin. "How confident are you ...?" "Rate the degree to which you feel ..." "To what extent will you use ...?" Feelings, attitudes, and opinions are reflections of
deeply rooted cultural values. Some cultures (e.g., American) do not seem to
have a great deal of difficulty being self-critical. Yet, the cloak of scientific
method alone may not be enough to account for the influence of collectivism on
a group or the difficulty participants from other cultural backgrounds may have
when asked to be introspective and self-critical.
The point here is not to simply question the value of using written evaluation instruments in multicultural settings. If that were the only issue, it could easily be resolved by substituting observation of behavior and measurement of results as indications of training effectiveness. Yet imagine the difficulty for a trainer attempting to accurately assess the behavior of an individual or individuals whose cultural upbringing placed an absolute premium on one's ability to outwardly control any expressions of emotion. Consider the impact on evaluation if this includes disagreement or dissatisfaction with a learning experience.
One could perhaps argue that the best way to evaluate training in a multicultural setting would be to cut to the chase and consider only the quantifiable results of those efforts. Perhaps that is true. Yet assume for a moment that the results of a particular training intervention were less (or more) than expected. Would you not find yourself back at the beginning, asking questions such as, Did the training program achieve its objectives? Did participants learn anything? Did they actually apply what they learned when they returned to their jobs? The point is that relying solely on measuring the results of training, as we always have, may only return us to the fundamental question: Was training effective?
A good place to start is by understanding the source of our challenge, something called cultural bias. Culture, whether ethnic or organizational, national or international, has always been an issue in training and its evaluation. But as the world becomes smaller, economic activity more global, and the American workforce more diverse, cultural influences and their impact on training and its evaluation are becoming more noticeable, and more meaningful.
There are many reasons members of various cultures develop different ways
of looking at and dealing with issues of human behavior. Culture works like the lenses in a pair of glasses: members of different cultures can look at the same event, interpret it, and draw entirely different conclusions, depending upon which cultural lens
they are looking through. For example, singling out a particular workshop
participant and asking for his or her opinion on a point you have been trying to
make would be perfectly acceptable behavior in a culture that emphasizes
individualism. Yet, take the same situation and put it in another context, in a
cultural setting that emphasizes collectivism and the value of saving face, and
such an innocent act may very well produce unintentional embarrassment and
anger.
The point is that culture determines the way people see things, and not
everyone sees the same things the same way. In itself, culture is neither good
nor bad; it simply is. What is correct and proper behavior in one culture may be
considered incorrect and improper in another. Cultural differences are not, or at
least should not be, problems. Problems arise when we interpret behavior, pass judgment, or draw conclusions based on our cultural orientation and take action(s) based on the stereotypes we have developed. Often, it is what
we do as a result of our cultural bias that creates difficulties for others, as well
as organizations, and even nations.
The point is that we may not be able to change who we are or completely
separate ourselves from the way we see the world. Yet, as HRD professionals,
we can develop a greater awareness of and sensitivity to the issues of cultural
differences and how those differences influence the way we measure and
interpret behavior. When you consider that the ultimate objective of training is
to change behavior, even acknowledging the potential implications cultural
differences may have on that process becomes profoundly important. One place
those differences come to light dramatically is when we evaluate the
effectiveness of our training efforts.
Even the best work on culture may not fully express the complexity of this area
or the potential implications for its impact. Clearly, a new look at culture is
important because of the relative progress made in acknowledging and valuing
human differences. Perhaps unlike any other point in history, organizations are
beginning to legitimize the dialogue around human differences. There is an emerging enthusiasm for greater insight, knowledge, and skill in successfully engaging across cultures. As our world becomes smaller, this
ability is becoming more highly valued as a basic skill.
While dialogue regarding human differences and our abilities to value them
may be experiencing greater legitimacy, at the same time we are also
experiencing growing interest in evaluating the effectiveness of training.
Changing organizations are demanding clarity about the value-added of every
expenditure, and training is certainly no exception. Given that reality, what
would happen to training, and more importantly the value it delivers to an
organization, if the following cultural implications were true?
If our definitions and even one of the above cultural implications about
training are true (and we suspect they all are true), then at the very least, to be
effective, our tools, approach, and methodology for evaluating training must
take into account the context of culture and its implications. The question is, how do we do that?
One undeniable truth is that more and more business is being done today in an
increasingly complex, multicultural context, and that includes training. As
HRD practitioners, we rely on and make use of information in many different
ways, as well as form impressions and make judgments based on our own
background and experiences. Culture provides us with a framework within
which we do all of this.
Yet, what happens when multiple cultures are involved in training and its evaluation? Is it possible that what we, as trainers, perceive as good might actually be considered bad by our participants? That what we think is effective may actually be ineffective? That what we intend as positive feedback may actually be viewed as neutral or, worse, negative feedback? That actions we believe indicate great enthusiasm and learning may simply not mean that at all?
What we do not lack is a selection of evaluation models. There are literally dozens of ways to gather information regarding training's effectiveness. Yet, whether we use Kirkpatrick's (1994) model, the Bell System Approach, CIRO (context, input, reaction, outcome), or CIPP (context, input, process, product),
our ultimate goal remains the same. Regardless of the context in which training
has been delivered, the goal is to collect and analyze appropriate data to
determine the value and effectiveness of our efforts, and to make whatever
design or delivery changes and improvements necessary to gain maximum
benefit for the time and effort expended.
• To what extent did the training program achieve its performance and
learning objectives? Did it accomplish what it set out to do?
• How effective was the instructor in providing knowledge and facilitating
skill building?
• How effective were the activities in accomplishing specific performance or
learning goals?
• How much, if any, learning took place?
• At the end of the program, how confident were the participants that they
could apply what they learned in the classroom back on the job?
• What specific barriers or obstacles will participants face on the job that
may make it difficult or impossible for them to use their new skills?
• To what extent, if any, did participants actually use what they learned in
the classroom back on the job?
• What results, if any, were produced? Did behavior change, and did the
change produce the result(s) anticipated?
• Were the benefits, both quantitative and qualitative, worth the time, effort,
and resources?
• If you had the chance to do it again, but differently, what would you do and
why would it work?
Answers to these and other evaluative questions come from many different
sources. Participants in the training session or program provide written and
verbal feedback using preprinted questionnaires or small group interviews and
discussions. As trainers, we form impressions through observation. At times,
we generate quantifiable data measuring the amount of learning and the results
produced when concepts and techniques are applied. Yet, regardless of
methodology, we need to be aware of the tremendous influence culture plays on
what is written, spoken, and observed, and more important, the meaning we
place on all this information.
In spite of all of this, scores on the reaction evaluations were the highest
you've ever seen. Average ratings on the conflict resolution styles exercise and
role-plays were both Very Good. In fact, there was not a single score below
Good. Surprisingly, participants rated your performance as an instructor as
Excellent, and many wrote they couldn't wait for your next workshop! What's
going on here?
Impressions are also formed through observation. Here, again, culture can
play an important role in the way we, as trainers, interpret visual cues that help
us form judgments and draw conclusions about others' behavior. How can
misinterpreted cues in a multicultural setting lead to incorrect conclusions?
Let's look at another example.
Since no one likes to role-play, getting two participants to volunteer for the exercise was nearly impossible. After trying every technique in the book, you
finally had to pick two people to play each role while others were to observe
their performance and provide feedback. Once the "volunteers" made it to the
front of the room, the role-play started off slowly and went downhill from there.
Delivering praise went fairly well, although you were surprised to hear the
"employee" respond with denial. Providing constructive criticism changed
little. Speech patterns were stiff and formal, and very little useful information
on specific performance behavior was discussed. After the exercise, class
feedback on the performance was amazingly supportive, even complimentary.
Clearly, most of the major points you made during class about how to conduct a
proper performance appraisal were missed, right?
Perhaps, but another explanation may have to do with the cultural dynamics
at play. In this exercise, a number of cultural influences are evident in both the
process as well as content. In addition to issues of hierarchy and face, there are
examples of
• Collectivism
• Self-efficacy
• Indirectness
• Risk aversion
Cultural influences may also have been at work in terms of the content. In
many cultures, formal performance reviews are not common. In our example,
when dealing with praise, a strong tradition of humility often translates itself into self-denial, a self-effacing behavior. On the other hand, when faced with
criticism, even when done constructively, issues of face arise again, and at least
the potential exists for causing permanent damage to the boss/subordinate
relationship by pointing out embarrassing mistakes. Finally, concern over not
offending anyone is often demonstrated by an indirect, almost ambiguous
communication style.
The high-tech industry, known for its speed and innovation, would provide a
stark contrast to many insurance companies where speed and innovation may
not carry the same level of importance. Developing training that would be
effective in these environments, without taking into account the distinct
cultural differences each might require, would generate questionable results.
These examples underscore the reasons for taking into account the impact of culture on our delivery assumptions and on the methodology we choose for evaluating our training efforts.
Make no mistake about it; cultural differences can dramatically influence our
judgment as HRD practitioners and the conclusions we draw even from a
seemingly straightforward attempt to evaluate the training we deliver.
As business and industry become more global, so will training and its
evaluation. In this context, cultural issues, which have always been present but
largely ignored, now take on much more meaning as their influence on training
design, delivery, and evaluation become more transparent and understood.
While there may be a number of behaviors that are similar across different
cultures, there are also an equal number of behaviors that are unique to a
specific culture. This mosaic of similarities and differences carries with it
tremendous opportunities, challenges, and implications for training and its
evaluation.
As you might expect when dealing with such a complex issue as culture and
its impact on behavior in training, there is no one simple answer. Yet, in
practical terms, there are a number of actions you can take in response. What
we offer is a menu of ideas and actions compiled from our own personal HRD
experiences and those of our colleagues in dealing with this marvelously
complex and rich tapestry of multiculturalism in training and development.
One fundamental yet critical step in the process is acknowledging the fact
that important cultural differences do exist, differences that can affect the
effectiveness of your training and evaluation efforts. Equally important is
recognizing that your personal cultural biases influence your instructional
design and evaluation.
Once you have reached this point, it is important to work closely with
individuals who share the same cultural values and attitudes as those of your
target audience. At the very least, you should consider working with a cultural
expert or coach. Foreign university students taking English as a second
language are often an excellent and cost-effective source for finding such
experts and coaches. The purpose in doing this is to have someone review and
suggest modifications to your evaluation approach and tools-someone who
wears the same cultural lenses as the group you intend to work with. Not only
does this improve the chances that the intended meaning of the words in your written questionnaire(s) and tools is carried over, but it also increases the likelihood that you will actually measure what you want to measure. What
comes out the other end of all this activity is a more culturally appropriate and,
consequently, more effective training and evaluation process.
When the time comes for evaluating the results of your training efforts, you
can take a number of simple yet effective actions to improve the quality of the
information you receive, especially when using written instruments. Beyond the
more obvious need(s) for translation and using words and phrases that carry the
intended meaning, using reaction evaluation questionnaires to determine if the
program met its objectives, or paper-and-pencil examinations to measure the
amount of learning that took place can present special challenges. For example,
paper-and-pencil examinations used to test for knowledge may not be well received in cultures that would consider such an exercise childish and, therefore, insulting to adults.
Once you have evaluation data in hand, you need to be sensitive to your own
attitudes, values, and beliefs and how they influence your interpretation of both
written information and observed behavior. This is especially important when
using your own norms for making judgments of others' behavior. An
emotional, warm, vocal approach may be entirely acceptable in the United
States, but completely inappropriate in a culture that places a premium on
controlling emotions.
If there is only one idea you walk away with after reading this chapter, we
hope it is this. When working in a multicultural training environment, above
all else, be open to a different way of being. It may help to keep in mind that
the goal of evaluation in a multicultural context is to find a method(s) that is
acceptable to the predominant culture of the participants you are working with
and provides the answers you need to determine effectiveness.
Being open means looking at the same issue differently and enjoying the amazing insights different perspectives
can provide. It is also about being yourself but increasing your awareness that
there are differences, which can be positive when recognized as opportunities to
learn, not just inform.
1. Personal communication.
References
Cox, T. (1993). Cultural diversity in organizations: Theory, research and practice. San Francisco: Berrett-Koehler Publishers.
Deal, T. E., & Kennedy, A. A. (1982). Corporate cultures: The rites and rituals of corporate life. Reading, MA: Addison-Wesley.
Kirkpatrick, D. L. (1994). Evaluating training programs: The four levels. San Francisco: Berrett-Koehler.
fifty different organizations inform her work and perspective in both broad and
specific ways. An urgency for the development of organizations as dynamically
inclusive, high-performing systems, maximizing total human potential is a core
theme throughout her work. She is a popular presenter on the national conference circuit with topics such as Training Managers to Manage Diversity, The
Affirmative Action/Diversity Debate, and The Difference You Make. Her
writings include articles and chapters in human resource development and
training publications, such as Managing Diversity Is Simply Good Business,
The Elements of Competence, and Tools and Activities for a Diverse Workforce
for The American Society for Training and Development, and published by
McGraw-Hill. Sadie received her B.A. from Wellesley College and a Master's
degree in Management from Lesley College School of Management.
Introduction
In a 1996 industry survey, Training magazine reported that fewer than half of the companies surveyed, only 37 percent, use computer-based training (Industry Report, 1996).
EPSS has been far less pervasive than multimedia, computer-based training.
A 1996 report found that 15 percent of organizations with more than 10,000 employees, and 5 percent of smaller organizations, had electronic performance support systems in place (Industry Report, 1996, p. 79).
The requisites for these technologies are desktop computers or terminals and
a computer network. In 1996, more than 60 percent of all business computers were attached to local networks (Edwards, 1996).
Widely available and virtually free, Web browsers, such as Netscape and
Internet Explorer, provide a universal desktop interface. The hypertext markup
language, HTML, provides the tool for quickly and easily creating corporate
information and training that can be viewed on any platform. Developers no
longer have to worry about what kind of computer each of the trainees is using,
how much RAM they have installed, whether or not they have a CD-ROM
drive, or what version of the operating system they are using.
Initial training that used this technology was simply text and graphics with embedded hyperlinks. But with the advent of audio and video streaming (technologies that play multimedia as it is being downloaded) and the development of tools for creating animations and interactivity, the basic elements for building effective multimedia training are available. However, the capability to deliver high-quality multimedia over a network is still in its infancy. Bandwidth continues to be the major stumbling block, even on corporate intranets with dedicated higher-speed lines. Planners may have to abandon an instructionally sound multimedia design because large graphics and photo images, video, and audio take too long to download.
With these new technologies, the network can become the distribution mechanism, and the browser the delivery mechanism, for training and performance support.
Thus, virtually anyone with access to the Internet and basic word processing
tools can create Web-based training and information and distribute it. Learners
become not only consumers of information and training but creators and
publishers as well.
This shift puts the trainee rather than an instructor in the driver's seat. The
trainee picks and chooses training and information modules as needed. He or
she selects and constructs training and information to meet perceived needs. It also becomes possible to treat training modules as data objects and store them in a network-accessible database. Then, based on the results of a pretest or interest inventory, for example, the system can automatically deliver appropriate training modules to the learner.
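The mechanics of pretest-driven delivery are straightforward. The Python sketch below illustrates the idea; the module names, topics, and mastery threshold are hypothetical, not part of any system described in this chapter.

    # Minimal sketch of pretest-driven module selection. Module names,
    # topics, and the mastery threshold are illustrative only.

    MODULES = {
        "spreadsheet_basics": {"topic": "spreadsheets", "level": 1},
        "pivot_tables":       {"topic": "spreadsheets", "level": 2},
        "intro_sql":          {"topic": "databases",    "level": 1},
    }

    MASTERY = 0.8  # assumed passing score on the pretest

    def select_modules(pretest_scores):
        """Return the modules a learner still needs, easiest first.

        pretest_scores maps a topic to the learner's score (0.0-1.0).
        Topics scored at or above MASTERY are skipped entirely.
        """
        needed = [(meta["level"], name)
                  for name, meta in MODULES.items()
                  if pretest_scores.get(meta["topic"], 0.0) < MASTERY]
        return [name for _, name in sorted(needed)]

    # A learner who already knows databases but is weak on spreadsheets:
    print(select_modules({"spreadsheets": 0.45, "databases": 0.90}))
    # -> ['spreadsheet_basics', 'pivot_tables']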
Effectiveness
Several studies demonstrate that traditional training does not result in adequate
behavior change (Tannenbaum and YukI, 1992; Baldwin and Ford, 1988;
Broad and Newstrom, 1992). Only about 10 to 20 percent of trainees who take
traditional corporate training report actually using the training in their jobs.
Efficiency
reported nearly a 40 percent drop in the time it took for their employees to learn
basic business practices (Caudron, 1996).
Cost to Create
                                            Instructor-Led    Web-Based
                                            Training          Training
    Course Development                      $ 50,000          $150,000
    Travel & Expenses per Off-site Trainee  $  1,000          0
    Training Materials per Trainee          $     50          0
    Instructor per Session                  $  2,000          0
    Total Cost to Train:    20 trainees     $ 63,000          $150,000
                            40              $ 76,000
                            60              $ 99,000
                           ...                 ...
                           140              $151,000
                           500              $375,000
                         1,000              $700,000          $150,000
As the number of trainees who require the same training increases, Web-
based training becomes more cost-effective. In the example, instructor-led
training is far more cost-effective for fewer than 100 trainees; the break-even
point, where instructor-led and Web-based training cost the same, is at 140
trainees. Dramatic cost savings can be achieved for 500 or 1000 trainees with
Web-based training.
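The structure of this comparison, a large fixed development cost for Web-based training against costs that grow with head count for instructor-led training, is easy to sketch. The table does not state its assumptions about class size or the share of off-site trainees, so the coefficients below are illustrative, chosen only to place the break-even point near 140 trainees.

    # Sketch of the instructor-led vs. Web-based cost comparison. The
    # dollar figures and class size are illustrative assumptions, not
    # the chapter's exact cost model.

    ILT_DEVELOPMENT = 50_000   # one-time course development
    ILT_PER_TRAINEE = 650      # assumed travel, expenses, and materials
    ILT_PER_SESSION = 2_000    # instructor fee per class
    CLASS_SIZE = 20
    WBT_DEVELOPMENT = 150_000  # one-time Web-based development

    def ilt_cost(n):
        sessions = -(-n // CLASS_SIZE)  # ceiling division
        return ILT_DEVELOPMENT + n * ILT_PER_TRAINEE + sessions * ILT_PER_SESSION

    def wbt_cost(n):
        return WBT_DEVELOPMENT  # flat: no travel, materials, or instructor

    def break_even():
        n = 1
        while ilt_cost(n) < wbt_cost(n):
            n += 1
        return n

    for n in (20, 140, 1000):
        print(n, ilt_cost(n), wbt_cost(n))
    print("break-even at about", break_even(), "trainees")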
The example also demonstrates that greater risk is associated with the
development of technology-driven training. This is because it requires a
greater, up-front commitment of corporate resources. Abrupt changes of direction, not uncommon in corporate management, can render just-completed
training instantly obsolete.
Figure 13-1 shows the typical kinds of activities that need to occur to evaluate technology-driven training. Notice that the entire process, from planning to deployment, is bracketed by business objectives and measurable business outcomes. Training objectives and goals should derive from business objectives; evaluating the success of training requires assessing the business outcomes that were linked to those business objectives.
There are four evaluation activities needed during planning for technology-
driven training: business case definition, needs assessment, requirements
definition, and cost justification.
First, working with management, planners build a business case for the
training that articulates the measurable business benefits that will result. Then,
planners perform a needs assessment. So far, these activities are no different
from those that occur when planning traditional training.
Development
• Prototype testing
• Pilot testing
Deployment
• Usage tracking
• User feedback
• Testing, where appropriate
• Business measures assessment
Business Case
Training planners work with management to build a business case for training.
Planning begins with the recognition that, in order to meet specific business
objectives, training is required. A business case answers three key questions:
Needs Assessment
Requirements Definition
Cost Justification
Prototype Testing
The results of prototype testing are used to modify the training design. For
example, a confusing user interface may need to be refined. Additional practice
exercises may need to be added. Lessons might need to be shortened to match
user attention span.
Pilot Testing
Pilot testing is done before deployment. The kinds of major changes that were
possible after prototype testing are no longer feasible. Thus pilot testing, a preliminary rollout of the training to a subset of the target audience, focuses on ensuring that the training will function smoothly once it is deployed.
Pilot testing requires a method for testers to report problems and a method
for the problems to be logged and corrected. Courseware may need to be revised
and released several times during piloting before it is ready to be deployed.
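A pilot-phase problem log can be quite simple. The sketch below is one minimal shape such a log might take; the field names and severity levels are assumptions, not a prescribed tool.

    # Minimal sketch of a pilot-test problem log. Field names and
    # severity levels are assumptions for illustration.

    from dataclasses import dataclass, field

    @dataclass
    class Problem:
        reporter: str
        description: str
        severity: str = "minor"  # e.g., "blocker", "major", "minor"
        resolved: bool = False

    @dataclass
    class PilotLog:
        problems: list = field(default_factory=list)

        def report(self, reporter, description, severity="minor"):
            self.problems.append(Problem(reporter, description, severity))

        def open_items(self):
            return [p for p in self.problems if not p.resolved]

        def ready_to_deploy(self):
            # Courseware is revised and re-released until no open blockers remain.
            return not any(p.severity == "blocker" for p in self.open_items())

    log = PilotLog()
    log.report("tester-07", "quiz scores not saved in module 3", "blocker")
    print(log.ready_to_deploy())  # False until the blocker is resolved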
During deployment, the focus of evaluation shifts from training to trainee, and
from trainee to business outcomes. New concerns arise: Are people who need
training accessing it? Are trainees applying the skills they learn? Is the
organization making progress towards meeting the specific business objectives
the training was designed to support?
The challenge for management becomes what to do with the vast amounts of
data that potentially can be collected. If training usage drops off, it may be
because potential users perceive it as useless. Or it may be because it is not
currently needed. Poor survey ratings may indicate that training is ineffective.
On the other hand, poor survey ratings would also result if training is being
administered to the wrong audience. Simply tracking these kinds of training
outcomes does not, by itself, provide training evaluation.
Surveys, tests, and usage reports have other drawbacks as well. First, they
break down when there is no longer a discrete course or curriculum that gets
covered. And second, they fail to measure skills transfer or impact on the
organization.
The focus of evaluation must shift from discrete learning outcomes, which
have been the focus of traditional training evaluation, to the business outcomes
that have been the focus of traditional management. The value of training is
measured using the same tangible evidence that managers use to measure
business results.
References
Baldwin, T. T., and Ford, J. K. (1988). Transfer of training: A review and directions for future research. Personnel Psychology, 41(1), 63-105.
Brinkerhoff, R. O., and Gill, S. J. (1994). The learning alliance: Systems thinking in human resource development. San Francisco: Jossey-Bass.
Broad, M., and Newstrom, J. (1992). Transfer of training. San Francisco: Addison-Wesley.
Dixon, N. M. (1996). New routes to evaluation. Training & Development, 50(5), 82-85.
Edwards, R. (1996, January 12). Software: Land of 1,000 niches -- Part II: Global network computing software investment opportunities and pitfalls as computing becomes communicating. San Francisco: Robertson, Stephens & Company Institutional Research.
Multimedia training in the Fortune 1000: A special report on the status of multimedia-based training in the nation's largest companies. (1996). Training, 33(9), 53-60.
Tannenbaum, S., and Yukl, G. (1992). Training and development in work organizations. Annual Review of Psychology, 43, 399-441.
Violette, J. (1996, October 7). Putting interactive training online. New Media, 6, 46.
Wolff, R. (1995, November/December). CBT vs. ILT: Cost comparison model. CBT Solutions, 14-16.
Hallie Ephron Touger has her own consulting business based in Milton,
Massachusetts. She works with corporate clients to write Web-based and
multimedia information and training. She is a partner in Quarto Interactive, a
Boston-based new media design and consulting company. She was a project
manager at Oakes Interactive, one of the country's leading multimedia
development firms, and a senior consultant at Digital Equipment Corporation
where she developed guidelines and procedures for evaluating training. She
holds a Ph.D. in educational research, measurement, and evaluation from
Boston College.
14 DESIGN-TEAM
PERFORMANCE: METRICS AND THE
IMPACT OF TECHNOLOGY
Larry Leifer
Introduction
For more than fifteen years the Stanford University Center for Design Research
has been concerned with the formal study of engineering product development
teams at work in academic and corporate settings. This chapter describes one
such effort, the implementation of team-based, distance-learning techniques in
a Stanford University School of Engineering course, Team-Based Design-
Development with Corporate Partners. The course is distributed nationally by
the Stanford Instructional Television Network for on-campus, full-time students
and off-campus, industry-based part-time students.
Applied ethnography methods like video interaction analysis have been used
to reveal the detail and pattern of activity of design teams. Computational
support systems have been developed to facilitate their work. Throughout these
studies we have seen that their activity closely resembles the most attractive
aspects of self-paced education described by constructivist learning theorists.
The design environment has been instrumented to see if technical and
behavioral interventions did, in fact, improve performance. It is this learning
validation phase of our work, and its dependence on technology, both for
instructional delivery and for assessing performance outcomes, that we wish to
share with you in this chapter.
Students have been observed to learn in four different ways (see Figure 14-1). Kolb (1984) proposed that learning occurs through repeated cycles of experience moving through these four modes.

[Figure 14-1. The Kolb learning cycle, showing active experimentation (design synthesis) and abstract conceptualization (modeling and analysis) among its four modes.]
Constructivist Learning
Vygotsky's Model
The course and the web become a model for high-tech instrumented
learning. This extensive use of high-tech delivery systems has led to
formulation of the following PBL assessment instrumentation model. Using the
model, performance studies demonstrate that diverse, distributed teams that
make effective use of electronic communication and documentation services
can, and do, outperform co-located design teams. To achieve this, several lines
of innovation have been introduced systematically in the ME 210 curriculum.
Major changes since 1990 include the following:
Behavior-based restructuring:
1. The conception of engineering as a social activity has been
emphasized.
2. An open-loft community replaced a closed work-room environment.
3. The first five weeks of a 30-week development cycle are now devoted to team building exercises rather than product development.
4. Personal preference inventory scores have been added to the list of
diversity factors used to design peak performance teams.
Technology-based restructuring:
1. Knowledge capture and sharing (WWW access) has been promoted
as a deliverable equal in importance to hardware and software.
2. The use of desktop computers for numerical engineering has been
de-emphasized.
3. The use of laptop computers (mandatory) for design-knowledge capture, sharing, and reuse has been strongly encouraged.
4. Conceptual prototyping (physical and qualitative) is advocated and
detailed design deferred.
Knowledge Work
                      Discourse Based    Thought Based
PRIMARY AUDIENCE      others             self
SHARE TENDENCY        shared             not shared
RECORD TENDENCY       not recorded       recorded
ANALOG EXAMPLE        conversations      personal notes
DIGITAL EXAMPLE       e-mail             e-notes
ASSESSMENT VALUE      potential          demonstrated
1. formal reports
2. discourse-based e-mail
3. electronic thinking-notes
Each aspect of PBL benefits from computer and Internet technology in three
ways. First, computer-mediated work is more easily shared than paper-based
work. The need to share information is driven by cooperative learning and by
the real-world context of PBL activity. Second, formal presentations and
documentation are more easily created and disseminated electronically. Third,
electronic documents facilitate learning quality assessment through emerging
content analysis (Mabogunje, 1996) and organization communication pattern
analysis (Mabogunje, Leifer, Levitt, & Baudin, 1995).
PBL pedagogy themes map closely to the activity and issues of real product
development. Accordingly, the framework for our approach to learning
assessment is derived from observational methodology in the design research
community, especially the work of Suchman (1987), Tang (1989), and
Minneman (1991). Their findings told us what to look for while the authors'
experience in flight simulation research suggested the use of an instrumentation
framework. However, in contrast to flight cockpit team emphasis on precision
communication, our design-team observation studies have clearly shown that
the preservation of ambiguity through negotiation is a necessary component of
productive design environments (learning environments). It is also enabled by
computer and WWW-mediated communication technology.
The guiding metaphor for our assessment and evaluation activity is one of
instrumentation. We use this term in the sense of observing both independent
and dependent variables in an automatic feedback control environment similar
to that found in aircraft flight simulators. The model, illustrated in Figure 14-2,
is an adaptation of the visualization first introduced by Atman, Leifer, Olds, &
Miller (1995).
Data flowing along the model's three feedback paths, one of which provides validative assessment, supports triangulation across data sources, methods, and assessors.
With this instrumentation in place, questions about product development teams can be addressed in new and productive ways. The union of an explicit feedback assessment model and advanced technology for instrumenting and facilitating team activity has begun to yield performance
metrics that can be used by a team to monitor and accelerate its own
productivity. This information is also available to aid project supervisors,
including instructors, in the assessment and prediction of team outcome
performance.
We report two examples of our experience with metrics that have proven
valuable in both understanding and facilitating product-based learning within a
collaborative team environment. The first is a subjective, behavior-based index for controlling the noise of student learning preferences as an input to team performance (node 4). The other is an objective, content-based measurement
measurement of work in progress to predict the quality of a team's final
product. Data from work-in-progress assessment is used by teams for self-
assessment (node 3) and by project supervisors as an in-course corrective
mechanism (node 2). Both types of metrics depend on advanced technology in
the workplace for their implementation and acceptance.
The idea of using questionnaires to guide the formation of design teams had its
origins in 1989 in a workshop on creativity for design professors organized at
Stanford by Bernard Roth, Rolf Faste, and Douglas Wilde. To determine the
effectiveness of the workshop, participants were surveyed both before and after
the two-week intensive experience to determine their Gough Creativity Index
(GCI), a measure of creativity devised by the psychologist Harold Gough. The
GCI is an empirically determined linear transformation on the four scores
generated by the Myers-Briggs Type Indicator (MBTI), a questionnaire used
widely for psychological, educational, and vocational counseling (Briggs-Myers
& McCaulley, 1985). As reported by Wilde (1993), the workshop enhanced the
creativity index of the participants and promised to be a useful guide for the
design of effective teams.
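Because the GCI is described only as a linear transformation of the four MBTI scores, its structure can be made concrete with a short sketch. The weights below are placeholders for illustration; they are not Gough's empirically determined coefficients.

    # Sketch of a creativity index as a linear transformation of the four
    # continuous MBTI scores (E-I, S-N, T-F, J-P). The weights below are
    # placeholders for illustration, not Gough's empirical coefficients.

    WEIGHTS = {"EI": -1.0, "SN": 3.0, "TF": -0.5, "JP": 1.0}

    def creativity_index(scores):
        """scores maps each MBTI dimension to a signed continuous score
        (e.g., a positive SN value indicates an intuitive preference)."""
        return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

    print(creativity_index({"EI": 5, "SN": 12, "TF": -3, "JP": 8}))  # 40.5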
In 1990 this idea was extended to ME 210. After responding to the MBTI,
the students were asked to voluntarily form teams composed of individuals
whose MBTI scores differed in at least two of the four variables. Subsequent
performance confirmed that teams having a high GCI member in balance with
other preference profiles tended to win more and better design competition
awards. Initial findings were refined in subsequent years and applied to teams in later offerings of the course.
Since 1977, ME 210 final reports have routinely been submitted for
consideration in the Lincoln Foundation Graduate Design Competition. The
reports are read and evaluated by designers and design professors, who award
twelve prizes to those they consider best. Judges change from year to year and
are blind to the authors and their university affiliations. Since 1991, balanced,
high GCI teams have clearly been more successful at winning Lincoln awards,
not only winning more of them but also producing higher-quality work (Wilde,
1997). Table 14-2 summarizes the awards won during the five-year period of
our experiment and the preceding thirteen (control group) years.
The table shows that the percentage of prize-winning teams from ME 210
doubled, from 29% to 60%, during the experimental period. High GCI teams
had a higher winning frequency (63%) than did the other teams, whose
performance (50%) had already been improved from 29% simply by
diversification. The study shows that the quality of prizes won was also
considerably better for high GCI teams, which won a high proportion of Silver,
Gold, and Best of Program medals (42%), triple that for the other teams (14%).
In the best year to date, 1995, student teams won eleven out of the twelve prizes
awarded nationally. Team preference diversification thus corresponds to the
quantity of prizes won, an indication that distributing human talent among the
project teams raised overall quality, especially at the top.
As correctly pointed out by the grammarian, noun phrases of this type break
several grammatical rules. Viewed differently, they point to the fact that
technical language is itself inventive and potentially a rich source of
information about design-team performance quality.
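To give a rough sense of what counting distinct noun phrases involves, the sketch below chunks adjective-noun sequences with NLTK's off-the-shelf tagger. The study itself used its own phrase analysis (Mabogunje, 1997), so this is only an approximation of the metric, not a reimplementation.

    # Rough sketch of counting distinct noun phrases in a design document.
    # Requires: nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

    import nltk

    GRAMMAR = nltk.RegexpParser("NP: {<JJ>*<NN.*>+}")  # adjectives then nouns

    def distinct_noun_phrases(text):
        phrases = set()
        for sentence in nltk.sent_tokenize(text):
            tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
            for subtree in GRAMMAR.parse(tagged).subtrees(
                    filter=lambda t: t.label() == "NP"):
                phrases.add(" ".join(word.lower() for word, _ in subtree.leaves()))
        return phrases

    doc = ("The balloon catheter deploys a flexible guide wire. "
           "The guide wire position controls catheter placement.")
    print(len(distinct_noun_phrases(doc)))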
The raw data for our study of noun phrases (Mabogunje, 1997) comes from a
set of three reports, named "design-requirement-documents," submitted by
students in ME 210. The projects analyzed were taken from the 1992-1993
academic year and dealt with a wide range of products including a catheter for
gene therapy in the human body, a control mechanism for maintaining the
[Figure 14-3. Association between distinct noun phrases and academic grades. The vertical axis plots counts of distinct noun phrases (0 to 1000); team trajectories, labeled by grade (A+, A-, B+), run across the autumn, winter, and spring academic quarters.]
For each quarter, the gamma measure of ordinal association between noun-phrase counts and grades was calculated. For the ten project teams studied, gamma was equal
to 1.0, a perfect ordinal prediction, for the winter quarter. For the spring
quarter, gamma was 0.71 (strong but less than perfect), following intervention
by the teaching team to stimulate the performance of slack teams. We found
that the number of distinct noun phrases in these documents was already very
high in the autumn report, written five weeks into a 25-week development cycle.
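The gamma used here is the Goodman-Kruskal measure of ordinal association: the number of concordant pairs minus discordant pairs, divided by their sum. A minimal sketch of the computation, with invented team data, follows.

    # Goodman-Kruskal gamma over paired ordinal data: concordant pairs
    # minus discordant pairs, divided by their sum (ties count in neither).
    # The team data below are invented for illustration.

    from itertools import combinations

    def gamma(pairs):
        concordant = discordant = 0
        for (x1, y1), (x2, y2) in combinations(pairs, 2):
            s = (x1 - x2) * (y1 - y2)
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
        return (concordant - discordant) / (concordant + discordant)

    # (distinct noun phrases in the winter report, numeric course grade)
    teams = [(850, 4.0), (700, 3.7), (620, 3.3), (540, 3.0), (400, 2.7)]
    print(gamma(teams))  # 1.0: a perfect ordinal prediction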
This chapter has described our experience in the design, implementation, and
assessment of the Stanford University School of Engineering course, ME 210,
Team-Based Design Development with Corporate Partners. A theme that
pervades our work is the role of technology both in implementing learning
environments and in assessing performance outcomes. Utilization of Internet
capabilities to facilitate team-based distance learning has changed the
character of the learning environment in profound ways. Learning in PBL
environments is applied within the context of the course. Virtual design teams,
separated geographically but linked by a variety of technology-based
communication tools, create actual product prototypes. Work-in-progress data,
again technology instrumented, is used for self-assessment as the work of the
teams evolves. The metrics produced by this data have been shown to predict
quality of the final products. The overall quality of designs produced by these
diverse, collaborative teams stands up well against industry standards as
reflected in the Lincoln Design Competition.
The data for these studies was available because the community is rewarded
for electronic knowledge-capture and knowledge-sharing. Early work on design
knowledge reuse had revealed that designers engaged in redesign scenarios are
persistent and insightful questioners. Documents that are rich in noun phrases
had proven most useful during redesign and were most easily indexed for
automated knowledge retrieval. Successful automation of design knowledge
retrieval has, in turn, given us a metric for assessment of performance as
revealed in project documentation.
Our experience, tools, and metric validation studies only hold for
engineering product development teams working in the ME 210 community
and supported by our knowledge capture and sharing environment. Any
extrapolation to other situations, domains, and communities requires careful
examination. However, we are encouraged to explore the utility of these
methods more widely because, in part, the design-team application domain is
among the most ambiguous and challenging of assessment scenarios.
Inevitably, these results pose more questions than they answer. How robust
is the noun-phrase metric? Can it be manipulated by individuals and teams
without corresponding quality outcomes? Can it or related metrics be used daily
with e-mail and electronic notebook records to provide real-time feedback on team performance?
Acknowledgments
Few, if any, of these ideas are entirely free from the influence of friends,
colleagues, and students. I would first like to thank the generations of ME 210
students who have guided and surprised me through the years as I attempt to
understand how they defy entropy, create order, and build fine machines. Regarding
the assessment of their performance, Ade Mabogunje, David M. Cannon, and
Vinod Baya have been especially valued contributors. Regarding pedagogy and
the assessment of engineering education, I am particularly indebted to Sheri
Sheppard and Margot Brereton. John Tang and Scott Minneman opened the
doors to ethnographic methodology in design research and established the
observe-analyze-intervene research framework that has guided many
subsequent studies. Perceiving and filling a need, George Toye and Jack Hong
created the PENS notebook environment and helped formulate our high-tech
assessment strategy. Doug Wilde's diligent exploration of the correspondence
between team dynamics and design performance has demonstrated something
of the truth in an old assertion that "it's the soft stuff that's hard; the hard stuff
is easy."
15 USING QUALITY TO DRIVE EVALUATION: A CASE STUDY

David J. Basarab

Introduction
How do you implement evaluation efforts so that they are considered a normal
activity within your business? The answer is you cannot! If you are a training
operations or human resource person, you do not control business units; you
cannot inculcate evaluation into their culture. But don't worry, sooner or later
they will come and ask you how to find out if training is working. You should
be prepared to answer their questions on the application of training in their
business. The narrative that follows describes how Motorola, at the request of
its business units, has successfully infused level 3 evaluation into its daily
operations.
This is not a theory or a model that someone has created and written about.
Rather it is a case study that describes hours of hard work by a dedicated team
of professionals from Motorola who answered the call from their business units
with respect to application of training on the job. The process described here
may not be applicable to everyone in every situation, but it documents a real
story of how one team infused evaluation into diverse business operations
around the world.
Every year Motorola invests over 4.4 million hours of employee time and over
200 million dollars in training. With this kind of investment in people,
Motorola businesses asked training departments for help in determining if the
money they invest in training results in expected performance back on the job.
To address their concern, a cross-functional, cross-sector, global team was
formed. The team decided to draw on one of Motorola's strengths by organizing itself as a Total Customer Satisfaction (TCS) team.
The Team
Since the team spanned the globe, its members held monthly meetings that varied in location and used e-mail and conference calls to ensure total team participation. Overall, the attendance rate exceeded 97 percent. One
week after each meeting, minutes were distributed. This format allowed the
team to successfully complete over 200 action items over an eighteen-month
period. The first team meeting was a training session. The goals of the meeting
were to become a cohesive team and to understand group dynamics. They
agreed upon meeting format, team rules, a process for resolving conflict, and
team problem-solving methodology. As they progressed through the TCS
process, they identified and completed additional training. The additional
training consisted of skills they needed to solve their problem and included the
following:
• Statistics
• Process mapping
• Cause-and-effect diagramming
• Force-field analysis
• Pareto analysis
• Pugh concept selection
• Evaluation theory and practice
Project Selection
The team defined their span of control and customer groups. They realized that
they actually had two major customer groups: first the businesses, and second
the training community. The team brainstormed a list of ways in which their
customers' training dollars were being wasted. With the results, they created
a prioritization matrix, built on the following criteria, to help select the project the team would address:
• Impact on customer
• Need to improve
• Team's ability to impact
The team goal was then defined: to create a process that would enable
businesses to determine if their training investment was resulting in employees
successfully applying skills on the job (level 3 evaluation). To do this they
would develop, pilot, revise, and institutionalize a process throughout Motorola
worldwide.
The team realized that application of skills impacts all of the Motorola
corporate initiatives. In addition, it supported the newest corporate initiative of
Individual Dignity Entitlement. Specifically, it related to the question of
employees having the on-the-job skills and training necessary to be successful.
Analysis Tools
With the problem clearly defined, the team turned to several analysis tools to
help find the source of the problem. This activity was critical in providing the
right solution for the team's customers: the business units and the training
community. Unfortunately, many teams blindly proceed to solution generation
without clearly identifying the root cause of the problem. This results in the
creation of solutions that are wrong or are never adopted. To ensure they did
root cause analysis correctly, the Total Customer Satisfaction team chose
several tools currently used by Motorola in its quality activities. This provided
two benefits. First, it allowed them to properly identify the root cause. A
second, and very important benefit was that they were able to talk in the
"language" of their customer when referring to the problem.
[Figure: Ishikawa (fishbone) diagram of possible causes for the effect "no method for evaluating the training investment."]
First they drew the "spine" of the fish bone (as the Ishikawa diagram is
sometimes called). Next they added the main causes typically used in this
technique.
• Machines
• Methods
• Material
• People power
They identified possible causes for not having a process the business can use to evaluate their training investment and prioritized the following primary causes: lack of quality metrics, lack of process ownership, and limited skills.
The team then turned to their customers to validate the primary causes they
had identified and to gain further insight into the elements of the problem as
perceived by their customers. First they distributed a survey to more than 100
training professionals throughout Motorola businesses worldwide. Figure 15-2
displays the top issues identified by training professionals in a Pareto chart.
A Pareto analysis is a bar graph that charts issues beginning with the most
common on the left and moving to the least common on the right. The team
chose to construct a Pareto diagram because the problem of not using level 3 evaluation consisted of several smaller problems or root causes. This allowed the team to see which causes occurred most often and to address them in priority order.
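As an illustration of the mechanics (with invented tallies, not Motorola's survey data), a Pareto ordering simply sorts the issue counts in descending order and tracks the cumulative share:

```python
from collections import Counter

# Hypothetical survey tallies; the issue names and counts are invented.
issues = Counter({
    "No quality metrics": 42,
    "No process owner": 31,
    "Limited evaluation skills": 24,
    "No management support": 11,
    "Other": 6,
})

total = sum(issues.values())
cumulative = 0
for issue, count in issues.most_common():   # most frequent first
    cumulative += count
    print(f"{issue:26s} {count:3d}  {100 * cumulative / total:5.1f}% cumulative")
```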
Then the team held interviews with their customers from the business
community. They found that the business was concerned about the lack of
quality metrics used to report on their training investment. To understand their
issues more clearly, the team conducted a force-field analysis that exposed the
driving forces and restraining forces that affect the businesses' ability to
evaluate their training investments. The analysis results are illustrated in
Figure 15-3.
Barriers to Success
In defining a solution, the team would have to build upon the driving forces
for greatest impact while working to eliminate as many restraining forces as
possible.
With the root causes validated by their customers, the team asked themselves
a question: "Can we solve the problem using a process that currently exists, or
will we need to create one?" Since their internal research indicated no such
process existed within Motorola, they decided to benchmark other companies.
They selected companies with documented skills application processes. Upon
initial screening they eliminated processes not meeting a minimum of three of
their customers' requirements. The Pugh concept selection technique involves listing alternative solutions to the problem and choosing one of those as a datum against which all others are referenced. In this case, Company A was chosen as the reference point.
Next, the criteria for selecting a solution were listed in the left column. Then
each solution was rated against the reference point using the defined criteria for
selection. When rating, the team used five indicators:
• S indicates the solution and reference point were judged as the same.
• + indicates the solution was judged better than the reference point.
• ++ indicates the solution was judged superior to the reference point.
• - indicates the solution was judged worse than the reference point.
• -- indicates the solution was judged vastly inferior to the reference point.
The pluses and minuses were then totaled to arrive at a score for each solution.
The solution with the highest positive value was chosen. As a result of the
analysis, the team eliminated Company A due to lack of flexibility, discounted
Company B because it focused on course problems instead of skill application,
and Company C due to cost. The results showed that only one of the proposed
alternatives would address the causes and issues identified by the analysis.
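A minimal sketch of the tallying step, with hypothetical criteria ratings (the Motorola team's actual ratings and scores are not reproduced here):

```python
# Pugh concept selection: rate each alternative against the reference
# (Company A) on each criterion, then total the pluses and minuses.
SCORES = {"++": 2, "+": 1, "S": 0, "-": -1, "--": -2}

ratings = {   # hypothetical ratings on four selection criteria
    "Company B": ["+", "-", "S", "--"],
    "Company C": ["S", "+", "-", "-"],
    "Build our own process": ["+", "++", "+", "S"],
}

totals = {name: sum(SCORES[mark] for mark in marks)
          for name, marks in ratings.items()}
print(totals)                     # {'Company B': -2, 'Company C': -1, ...}
print("Selected:", max(totals, key=totals.get))  # highest positive value wins
```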
Remedies
The team accepted the challenge to create Motorola's own Skills Training
Application Review process, STAR.
The STAR process is much more than a map. It is a guide, complete with
process specifications and examples, that allows users to easily plan, develop,
collect, analyze, and report various types of skills application data. It is flexible
enough to walk users through simple descriptive analyses and still detailed
enough to guide them through more advanced causal-comparative and correlational studies.
[Figure: the STAR process map. Recoverable steps include: identify the course and the evaluation stakeholders; interview stakeholders; write, obtain agreement to, and communicate the evaluation plan; create data-analysis models; review and revise instruments and models; screen incoming completed instruments; input data into the analysis models; perform detailed analysis; answer the evaluation questions; decide upon report content; write and distribute the report; determine areas for improvement; assign action items; and implement changes.]
The team used a modified Quality Function Deployment (QFD) to ensure the
new process would meet their customers' requirements. Quality Function Deployment (Clausing & Pugh, 1991) is a structured method of determining if the features of a product or process are aligned with customer needs. The
results of the QFD revealed that their customers liked the proposed process.
With the new process clearly defined and approved by their customers, the
team was ready to pilot. At the same time, to make sure STAR was a
permanent solution, they went back to the root cause analysis and identified
specific issues that needed to be addressed.
Second, they discussed how their business team members could integrate
this process into their operations. The team came up with the concept of a
worldwide STAR network. The concept was to create a group of trained STAR
process personnel from the businesses who actually conduct the level 3
evaluations. Members of the STAR network share data among themselves via
e-mail. The Motorola University Evaluation Service Team serves as the central
point for collection of activities, distribution of material, and communication
with members. Figure 15-5 illustrates the STAR Network concept.
The creation of this network led the team to the third remedy, to develop
training for these network members. Since the TCS team created the process
and had training expertise on the team, they could design, develop, and deliver
the training.
[Figure 15-5. The STAR network concept: trained evaluators, quality control reports, process monitoring, process improvement, annual meetings, and growth and expansion, all supporting the needs of the business.]
Now that each of the root causes was addressed by the remedies, the TCS
team only needed to create an implementation plan and identify key elements
necessary for rolling out the STAR process and network throughout the world.
The plan consisted of piloting the process and revising it to ensure that it
worked and solved all problems identified in earlier analyses. After the pilots,
worldwide communications via electronic methods and grassroots discussions got the word out. Finally, people from all over the world were trained in the use
of the STAR process.
After thorough execution of the plan, the team was ready to see if it had
achieved the original goals of the project.
Results
The results were impressive. The team set out to determine if the money
invested in training was resulting in employees successfully applying skills on
the job. The team not only managed to define STAR, but did it while
continuously incorporating customer feedback.
They planned to pilot and revise the process. Their first review for Land
Mobile showed that 91 percent of the participants in a sales management
training course were applying the coaching skills taught in class back in the
field. This review showed an effective training investment of $195,000. On the
basis of this pilot's results, some minor modifications were made to the
planning phase of the STAR process. While this was good news, it got even
better.
The remaining team goal was to transfer the process to businesses by the end
of the first quarter of 1994. They achieved this goal four months earlier than
expected. Individuals from business operations have taken ownership of this
process and are conducting studies of their own. The process measurement tool
indicates they have achieved 100 percent Total Customer Satisfaction while
ensuring quality studies. Other notable results include the following:
1. They are shifting the emphasis from merely getting 40 hours of training (a
Motorola corporate policy) to changing performance back on the job.
2. The STAR process has been benchmarked as best-in-class by companies
outside Motorola.
3. By expanding the STAR network, they are improving cooperation between
business and training.
4. Team members were cross-trained in the STAR process, instructional
design, statistics, problem-solving tools, and key Motorola business issues.
5. The team moved from participative management to empowerment by
effectively implementing the Motorola Total Customer Satisfaction
process.
Institutionalization
When the team agreed to solve this problem, they accepted the responsibility of
making sure improvements were sustainable and permanent. The STAR
network supports and further expands this process throughout businesses.
The team defined a feedback loop, illustrated in Figure 15-6, that helps
continuously improve the STAR process and support tools.
[Figure 15-6. The STAR feedback loop: ongoing communication, quality control reports, and technical updates.]
Process fan-out has gained the attention of others both inside and outside of Motorola. A Semiconductor Products Sector wellness manager is using the
process to show application of desired wellness behaviors, and Motorola
University is using the tool in projects dealing with school district reform.
So what does all this mean to Motorola? The new STAR process 1) assists businesses in making sound training investment decisions, and 2) helps the training community report results in the language of the business.
The team believes STAR also helps Motorola achieve its corporate vision of becoming the premier employer in the world by affecting its greatest competitive advantage, its people.
Conclusion
This narrative describes how a team, sponsored and driven by business units,
successfully institutionalized evaluation within the business. The team built
upon and leveraged the quality values and tools that are an integral part of
Motorola's culture. The concept is simple:
• Have your customers (business) own their problem and sponsor efforts to
solve it.
• Form a team of training and business professionals.
• Follow a rigorous process that focuses on customer (business unit)
requirements.
• Collect data to determine the real problem, not the perceived one.
• Develop a solution that will solve the problem and satisfy the customer
(business).
• Pilot the solution and champion it by giving presentations that speak the
language of the business, not training or evaluation terminology.
• Ensure that a permanent owner is assigned to monitor and implement the
solution.
There is nothing magical or special about Motorola's success; it's just hard
work and partnering with the customer.
16 CERTIFICATION: BUSINESS
AND LEGAL ISSUES
Ernest S. Kahane and William McCook
In Utah, the home of Novell Education, the governor declared a second statewide certification month in 1996; more than 320,000 Novell certifications have been awarded worldwide. Novell plans to bring its corporate
certification programs into secondary schools. IBM currently has programs for
more than 35 different software professional certifications. As one writer
comments, "Certification fever has hit the computer industry. And that means
there's a whole lot of training going on" (Filipczak, 1995, p. 38).
This chapter will work through the key issues required to understand
certification within a business context and from an evaluation perspective that
highlights testing and assessment issues. Training and evaluation specialists
have an important role to play as consultants to businesses because certification
involves complex evaluation issues and business risk. Specifically, evaluation
specialists can help their business partners understand what certification is,
when it is appropriate to meet a business need, and how to successfully design
and implement certification programs.
What Is Certification?
Individual certification: the process of assessing and recognizing individuals who demonstrate a predefined level of proficiency.
For example, medical boards and associations set standards and determine
who can legitimately practice. In return for granting this professional sanction,
society requires the practitioner to serve the public expertly. "Professionals are
supposed to represent the height of trustworthiness with respect to technically
competent performance and fiduciary obligation and responsibility" (Barber
1983, p. 131). Malpractice suits and public criticism of the professions only
make explicit how fundamental these criteria are. The rationale for certifying
competence is to protect the public.
Vendor Certification
Implementing a vendor certification program requires legal resources to assure that applicable laws and regulations are followed, business
resources to assure that the certification program provides a market advantage
and is economically feasible, and evaluation resources to provide specialized
expertise in test design and implementation, as well as the integration of the
business, technical, and legal requirements.
The business use of certification testing determines how the standards discussed
above are applied. Two possible business uses of certification are selection and
development/classification. Selection involves using the results of certification
testing as the primary basis for making hiring, promotion, or termination
decisions. Certification testing used for development or classification purposes
identify skill levels that are then used to guide skill classification and individual
development discussions. Table 16-3 describes the validation requirements,
risk, and cost for these two business uses of certification.
Clearly, business uses involving selection carry a higher risk and require a
higher investment to ensure a proper validation study. When legal issues
related to equality of opportunity are involved, an organization must present
evidence that the test directly measures the tasks or skills required of the job.
For many knowledge tests where the measurement is indirect, additional
evidence of validity, such as predictive validation, may also be required. When, however, the principal purpose of certification is individual development, the rigor and requirements for establishing validity are less stringent.
Standards
Reliability: consistency in scores and processes. High reliability assures that test scores are not influenced by factors unrelated to the purpose of the test (e.g., practice in taking tests, or skills unrelated to the content of the test).

Content validity: how well a test samples the universe of behavior, knowledge, or skill under certification. For a test to be valid, its design and delivery process must ensure that all knowledge, skill, or performance areas key to certification are adequately represented in the measurement device. Content validity is the most common validation approach for certification testing.

Predictive validity: how well the test score predicts future performance in the area certified. Although less common than content validity because it requires more time and effort, demonstrated predictive validity is required for certification tests used for job selection where the test may not directly measure job skills.
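As a hedged illustration of how predictive validity is often quantified (a Pearson correlation between certification scores and a later job-performance measure; all numbers below are invented):

```python
# Pearson r between certification scores and later performance ratings.
scores = [62, 71, 75, 80, 84, 90, 93]          # invented test scores
ratings = [2.8, 3.1, 3.0, 3.6, 3.5, 4.2, 4.4]  # invented job ratings

n = len(scores)
mx, my = sum(scores) / n, sum(ratings) / n
cov = sum((x - mx) * (y - my) for x, y in zip(scores, ratings))
var_x = sum((x - mx) ** 2 for x in scores)
var_y = sum((y - my) ** 2 for y in ratings)
r = cov / (var_x * var_y) ** 0.5
print(f"predictive validity coefficient r = {r:.2f}")  # closer to 1 is stronger
```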
Business use and validation requirements:
• Selection: content, construct, and/or criterion validation.
• Development/classification: content and/or construct validation.
Professional Standards
Two of the principal sets of professional guidelines for the development and use
of testing are The Standards for Educational and Psychological Testing
(American Psychological Association, 1985) and Principles for the Validation
and Use of Personnel Selection Procedures (Society for Industrial and
Organizational Psychology, 1987). The standards provide technical guidelines
for the development and use of tests, including certification testing. Although
the standards are more widely known, the principles are equally important
because they reinforce the standards and explain in detail how to determine
passing scores for certification tests.
Legal Considerations
Adverse impact occurs when test results produce a pattern of selection that
does not represent the proportions of protected groups in the population, such
as specific minority groups and females. According to EEOC guidelines, if the
testing tool cannot be shown to be a direct measure of a job task or skill, a validation study that includes protected groups should be performed to ensure that adverse impact has not occurred. Members of the protected groups should include both knowledgeable individuals and novices for comparison purposes.
The study should demonstrate that the selection rates generated by use of the
testing tool mirror the general population and protected groups population
rates.
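The chapter does not prescribe a specific screen, but adverse impact is commonly checked with the EEOC's four-fifths rule: a group whose selection rate falls below 80 percent of the highest group's rate is flagged for review. A minimal sketch, with invented counts:

```python
# Four-fifths (80%) rule screen for adverse impact; counts are invented.
groups = {"Group A": (80, 100), "Group B": (50, 90)}  # (selected, tested)

rates = {g: sel / tested for g, (sel, tested) in groups.items()}
benchmark = max(rates.values())
for group, rate in rates.items():
    flagged = rate < 0.8 * benchmark
    print(f"{group}: selection rate {rate:.2f}"
          + ("  <- potential adverse impact" if flagged else ""))
```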
The evaluator's role is to assist the business in knowing when to apply the
Uniform Guidelines to certification programs. The key criterion for the
evaluation specialist is to know if certification will be applied as a selection
procedure. A selection procedure occurs when a test score is used as a basis for
employment decisions. The guidelines define any of the following employment
decisions as selection procedures: "hiring, promotion, training, retention,
transfer, membership, compensation, termination, discipline or job assignment"
(Equal Employment Opportunity Commission, 1978, Sn. 2B). In a situation
where selection is anticipated, the evaluation specialist needs to ensure that a
documented validation study is in process (or in place) that demonstrates the
relationship between performance on the test and related job performance. The
study should explore methods of testing or certifying expertise that are less
susceptible to adverse impact.
Data privacy rules in many jurisdictions also apply: notice to the individuals tested typically must include information such as the purposes and process of data collection, and how and where the data will be stored, accessed, and used. In
certain European countries, restrictions exist regarding the transmission of
personal data across international borders. Some countries, for example
Germany, require prior approval by company worker councils before personal
data collection can proceed.
Business Policies
Standards for the development of certification tests have been adopted and
published by testing organizations, testing vendors, industries, and professions
that offer certification as part of the credentialing process. While descriptions
of stages vary slightly by organization, the test development process itself is
well established. The stages are summarized in Table 16-4 and then briefly
discussed. The development of the test blueprint and the determination of a passing score receive particular attention.
Table 16-4. Stages of certification test development:
1. Job analysis and knowledge specification: identify and describe the skills and knowledge specifically related to successful job performance in the certification area.
2. Content domain definition: using the specifications from Stage 1, develop content categories and learning objectives.
3. Test blueprint definition: content experts rank the importance of the objectives and content to performing job tasks; priorities are reflected in a blueprint that identifies the number of test items per objective, within content areas.
4. Item development: content and testing experts together develop test items designed to measure blueprint objectives, with the number and types of items based on the priorities established in Stage 3.
5. Item review and revision: content and test development experts review and revise the test items.
6. Pilot test: the test is administered to an audience comprising content-qualified and less-qualified participants; based on the results, questions are selected for the exam so as to ensure reliability.
7. Determine passing score: evaluation consultants determine a passing or cutoff score, based on the results of a qualitative review and/or quantitative analysis.
8. Administration and maintenance: the test is deployed and results are analyzed on a regular basis to improve and update the item pool.
The first step is to identify and describe the skills and knowledge specifically
related to successful job performance in the certification area. Using
salespeople as an example, the task analysis and knowledge specification might
identify the three key areas of responsibility, work, and skills and knowledge as
described in Table 16-5.
Using the specifications from the task analysis and knowledge specification,
content categories and learning objectives are developed. Content categories
refer to the areas of knowledge that must be certified. Learning objectives are
the level of knowledge needed for certification. Objectives describe in
measurable terms how individuals can demonstrate their knowledge or skill.
The domain definition is a refinement of the first stage of test development. For example, the objectives for salespeople might include their ability to describe the strategies, products, and business models of potential channels (Objective 2 below).
Content and testing experts together develop test questions. Test items and
tasks are designed to measure the blueprint objectives, with the number and
type of items based on the priorities established in the test blueprint. For
example, test items for Objective 2 may include not only multiple-choice
questions, but also performance tasks such as a presentation on the strategies,
products, and business models of potential channels.
The test should be pilot tested with groups of individuals who are
knowledgeable and who are not knowledgeable about the test content. Pilot data
should be analyzed at the group and item level. Item statistics should
demonstrate high difficulty for less knowledgeable individuals and low to
moderate difficulty for knowledgeable individuals. Items should demonstrate
the ability to discriminate between the two groups. Items that do not
discriminate or do not demonstrate appropriate difficulty levels should be
reviewed and considered for revision or deletion from the test.
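A minimal sketch of this item screening, with invented pilot responses (difficulty is expressed here as the proportion answering correctly, so "hard for novices" means a low novice proportion):

```python
# Item-level pilot analysis: per-group difficulty and a simple
# discrimination index. All response data is invented.
item_responses = {                      # 1 = correct, 0 = incorrect
    "qualified": [1, 1, 1, 0, 1, 1, 1, 1],
    "novice":    [0, 1, 0, 0, 1, 0, 0, 1],
}

def p_correct(responses):
    return sum(responses) / len(responses)

p_q = p_correct(item_responses["qualified"])
p_n = p_correct(item_responses["novice"])
discrimination = p_q - p_n              # larger gap = better discriminator

print(f"qualified p = {p_q:.2f}, novice p = {p_n:.2f}, D = {discrimination:.2f}")
# Items with low D, or that are hard even for qualified takers, are
# candidates for revision or deletion.
```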
Objective 2 (describe the strategies, products, and business models of potential channels): test items 4, 5, 8, 10, and 11.
At the group level, the average or mean scores of the two groups must differ. The midpoint between the two groups' average scores on the remaining items can be used as one data point in establishing an approximate passing score.
Another theme that is commonly found in case law has to do with the
reasonableness of the passing score. The administrative body that sets the
passing score must show that there is a relationship between the passing (or
cutoff) score and the purpose of the examination (Cascio et al., 1988, p. 3).
This concept of reasonableness is supported by the Uniform Guidelines on
Employee Selection Procedures published by the U.S. government's Equal
Employment Opportunity Commission (1978). The uniform guidelines state
that "Cutoff scores should normally be set so as to be reasonable and consistent
with normal expectations of acceptable proficiency within the workforce"
(Equal Employment Opportunity Commission, 1978, Sn. 5H).
Cascio, Alexander, and Barrett (1988, pp. 8-9) highlight a number of other criteria used by the courts in determining what makes a good cutoff (or passing)
score. As related to certification tests, these include the following criteria:
• The cutoff score should be consistent with the results of a job analysis.
• The passing score should permit the selection of qualified candidates.
• Expert opinion/recommendation should be considered in setting the score.
• The passing score must have a clear rationale.
The case law (Cascio, Alexander, & Barrett, 1988, p. 8) on setting passing scores suggests that there is "no single, mechanical, quantitative approach that
is accepted or required by the courts." However, a typical quantitative approach
is to test two groups, one made up of individuals expected to possess the
required skills and the other of novices. The midpoint between the test score
averages of each group provides an approximate passing score.
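A minimal sketch of that midpoint calculation, with invented scores; as the case law quoted above cautions, the result is only one data point, to be weighed alongside the job analysis and expert judgment:

```python
# Midpoint method for an approximate cutoff score; all scores are invented.
skilled = [88, 92, 79, 85, 90, 94]   # group expected to possess the skills
novices = [55, 61, 58, 49, 66, 60]   # novice comparison group

def mean(xs):
    return sum(xs) / len(xs)

cutoff = (mean(skilled) + mean(novices)) / 2
print(f"approximate passing score: {cutoff:.1f}")
```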
New entrants in the certification training market like the Chauncey Group
are advertising higher levels of competence testing. One can foresee an ever-growing ladder of certifications that assess and recognize progressively higher
levels of competence.
For many business certifications it makes sense to ask the question, What is
the value of the certification? As customers become more informed about the
costs and benefits of certification, they will be better equipped to assess the
value of certification in specific contexts. Already, businesses and employees
are starting to recognize where certification programs make sense and where
they do not.
References
Angoff, W.H. (1971). Scales, norms and equivalent scores. In R.L. Thorndike (Ed.), Educational measurement. Washington, DC: American Council on Education.
Barber, B. (1983). The logic and limits of trust. New Brunswick, NJ: Rutgers University Press.
Cascio, W.F., Alexander, R.A., & Barrett, G.V. (1988). Setting cutoff scores: Legal, psychometric, and professional issues and guidelines. Personnel Psychology, 41(1), 1-24.
Dataquest. (1995). Education and training: Market analysis and outlook. Author.
Doyle, T.C. (1996, July). Who you gonna call: An inside look at the topsy-turvy world of training and certification. VAR Business, 128.
Harrison, B. (1994). Lean and mean: The changing landscape of corporate power in the age of flexibility. New York: Basic Books.
Jones, P. (1994). Three levels of certification testing. Performance & Instruction, 33(9), 22-28.
Society for Industrial and Organizational Psychology, Inc. (1987). Principles for the validation and use of personnel selection procedures (3rd ed.). College Park, MD: Author.
Waterman, R.H., Waterman, J., & Collard, B.A. (1994). Toward a career-resilient workforce. Harvard Business Review, 72(4), 87-95.
17 LESSONS FROM EDUCATION

Over the past decade education, particularly elementary education, has been
redefining the work of teachers, schools, students, and classrooms. This
redefinition has been fueled by a confluence of events involving new
understandings about learning and the importance of connecting the work of
schools with everyday life and with the work of business and industry. A set of
assumptions has emerged from this amalgam of political, social, economic, and
educational events about the nature of learning and how to evaluate student
learning; these assumptions emerged in a somewhat piecemeal fashion, and
since the mid-1980s have helped direct present reforms in education. It is
important to examine past assumptions about school-based learning as well as
present ones that are now quite influential in shaping the ways in which
schools are evaluating students' performance.
Central to how many schools and school districts are rethinking what happens
in classrooms is the recognition that learning, not just teaching or testing, is at
the heart of the work of schools. This seems like an obvious statement. Hasn't
learning always been what schools are about? No, surprisingly it is not. In fact,
a persuasive argument could be made that from the early 1900s until quite
recently schools have not put learning at the center of a child's school
experience, but rather standardization of what was taught.
Behavioral objectives are statements about the discrete skills a child should
be expected to learn within a specified span of time at a particular grade level.
A behavioral objective in elementary mathematics might state that a child
should be able to count forward and backward between one and one hundred by
the end of first grade or know the multiplication tables through the nines by the
middle of fourth grade. These objectives were generally put together not by
teachers but by textbook publishers in consultation with professors of education.
They provided teachers and administrators with a detailed set of benchmarks or
goals in every subject of what was to be learned when by the average child.
How did we verify children were learning their general fund of knowledge
set out in behavioral objectives? Testing mastery of these benchmarks was a
fairly straightforward task because of the fact-oriented nature of classroom
lessons: knowing dates, places, rules of grammar, arithmetic algorithms, routine procedural skills, definitions, and vocabulary. Generally, testing required no more than a standardized paper-and-pencil format, often provided as part of the textbook in use, where students could fill in or circle the correct answers, or mark a statement true or false.
From the early 1900s until the late 1980s, standardized test scores were commonly the evidence upon which program and student success was determined. If students scored well on a particular test, one could assume they had learned the subject matter at hand. For those who did not score well, judgments were easily made that they had not studied or learned the material, were not good students, were underachievers, or did not like school. Along the way schools, school districts, and to some degree teachers became increasingly accountable to public-policy makers and education officials on the basis of their students' test score results.
Between 1972 and 1985, the number of mandated state testing programs grew from 1 to 34; by 1989 every state had some type of mandated testing program (National Summit on Mathematics Assessment, 1991, p. 5). Public-policy makers, school officials, and parents insisted on using test results to judge the merits of an educational experience. A score-based system of
evaluation provided an efficient way to rank and sort students and schools, as
well as to define school success for students, parents, teachers, administrators,
public-policy makers, and, interestingly enough, real-estate agents. This came
to be increasingly the case despite mounting evidence as to the prejudicial
nature of standardized testing instruments with respect to minority children and
the rash of grade inflation, the so-called Lake Wobegon effect, which suggests
that "all" children are above average.! The role of standardized testing in
defining learning and learning potential has been a contentious subject
throughout history of public education. Reducing a child's educational
experience and his or her potential for future educational experience to a single
point value or percentile, while reassuring to some, has been horrifying to
others.
In the mid-1980s, amid controversies over the role of tests and test scores,
increasing student diversity in America's urban classrooms, and a clamor to
increase the length of the school day and school year, a number of factors
converged, which sparked the forces of educational reform that we are
continuing to experience today. The first was the dismal performance of
American children in two subject areas, mathematics and science, thought to be
key to this country's economic competitiveness in an increasingly global
economy.
The second was that, as test scores fell well below those of many of this country's industrial competitors, including Japan and Germany, concerns began to be
raised about the national impact of a student's kindergarten through grade 12
or college experience. Even students who graduated from U.S. high schools and
colleges with acceptable, if not excellent, grade-point averages arrived at the
workplace or in graduate school programs lacking the abilities and skills to
solve problems, to reason through complex and diverse data, to perceive issues
from a variety of perspectives, and to communicate effectively. Business and
industry had to expend resources teaching what they assumed should already
have been taught-how to solve problems. One policy response forged between
business and the government to these shared concerns was former Secretary of
Education Lamar Alexander's national plan for education contained in Goals
2000.
Third, while reports were lamenting the quality of schools and the
experience of schooling, research activity among cognitive scientists produced
information that was to change the way we would think about how learning
occurred and would ultimately significantly impact the way schools went about
the business of teaching and testing. This research described the need for
children and adults to construct their own understanding of concepts and
relational ideas because information and understanding is stored in large
cognitive networks that develop continuously. At any given time we have
within our mental structures a weblike pattern formed from in-school and out-
of-school experiences. Each central point in this network of mental activity
represents our conceptual understanding of abstract ideas such as government,
arithmetic operations, temperature, color, and so on. Thus the process of
learning involves providing opportunities to fit new information into previous
ideas as our cognitive networks become increasingly sophisticated over time
through multiple experiences that are social and interactive in nature and that
involve elements of conflict, contradiction, and consequence.
However, we now understand that memorization, while useful for the short
term, does not have long term "staying power" within our mental structures. In
addition, the passivity of students in the learning process does not instill habits
of inquiry needed for a lifetime. We have only to remember all of the hours
spent in foreign language classes to know the limitations of memorization as
proof of learning for the long haul. Likewise, rote and memorized learning
goals create a set of instructional practices focused on individual learning
rather than learning that requires the interactivity and social exchanges that are
more reflective of the reality of today's work environments.
While the hue and cry over low test scores and insufficient preparation
around critical thinking were perhaps the most audible complaints in the mid-
1980s, a fourth concern about education was beginning to take shape with the
emergence of the information society. With the exploding growth of
technology, the shift to a global economy, and an increasingly information-
dependent society, there was a growing realization that learning must be
ongoing throughout the course of one's lifetime, not just in the formal
educational years. The concept of lifelong learning emerged and, along with it,
the need to look at how elementary and high schools were preparing students
with the skills and abilities to become lifelong learners. A consensus grew
across business and industry that American workers, for reasons of everyday
literacy and economic competitiveness, had to become competent
in processing large amounts of information. Never again would one learn all
the information needed for a lifetime of productive living and working in a
formal school setting. Never again would work and learning be seen as separate
activities in schools charged with passing on a specific fund of knowledge. The
factory model of education was waning. Schools needed to change.
Changing to What?
2. A round and a square cylinder share the same height. Which has
the greater volume?
Background: Manufacturers naturally want to spend as little as possible, not only on the
product, but on packing and shipping it to stores. They want to minimize the cost of
production of their packaging, and they want to maximize the amount of what is
packaged inside (to keeping handling and postage costs down: the more individual
packages you ship, the more it costs).
Setting: Imagine that your group of two or three people is one of many in the packing
department responsible for M&M's candies. The manager of the shipping department
has found that the cheapest material for shipping comes as a flat piece of rectangular
paperboard (the piece of posterboard you will be given). She is asking each work group
in the packing department to help solve this problem: What completely closed container,
built out of the given piece of posterboard, will hold the largest volume of M&M's for
safe shipping?
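As an aside (not part of the original task statement), the mathematics behind the round-versus-square question: rolling a fixed strip of board into a tube fixes the cross-section perimeter p and the height h, and for a fixed perimeter the circle encloses more area than the square, so the round cylinder holds the greater volume:

$$A_{\text{circle}} = \pi\left(\frac{p}{2\pi}\right)^{2} = \frac{p^{2}}{4\pi}, \qquad A_{\text{square}} = \left(\frac{p}{4}\right)^{2} = \frac{p^{2}}{16}.$$

Since $4\pi \approx 12.57 < 16$, $A_{\text{circle}} > A_{\text{square}}$, and with equal heights $V = Ah$ favors the round container. This is only a first step toward the M&M's task, which also requires end caps cut from the same board.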
A Senior Exhibition
Exhibitions are not graded using traditional letter grades. Students have to
demonstrate a level of proficiency on the publicly-stated criteria in five areas:
personal responsibility, critical thinking, writing, public speaking, and
multimedia. If they do not, students revise and repeat their exhibition until it is
judged proficient. Upon completion of the exhibition, the student's advisory
committee members, generally made up of three teachers from different subject
areas, meet with the student to provide feedback. The student receives a copy of
the scored rubric and a written narrative addressing the student's work.
For example, you might want to ask how everyone is going to determine what recycling
categories to have ... paper, plastic, and glass? Other categories? Will students weigh
each kind of recycling material or will they count them? Or will they do some kind of combination? What makes the most sense? Students are to explain why they chose the units they did.
When the students have brought their recycling information to school and you have
publicly posted the information, have each child build a graph that displays this
information.
After completing the graphs, have your students write letters to their parents describing
and interpreting their graphs. You may also ask your students to write about what the
graph does not tell about what we recycle.
Colleges are beginning to define what they see as a mismatch between what higher education measures (Carnegie units and end-of-course grades) and the direction of change in K-12 education. As elementary and high schools
develop new standards for what students should know and be able to do as well
as new ways for students to demonstrate that information, college admission
systems will follow suit. Changes in the ways that institutions of higher education admit and evaluate students will, in turn, reinforce these reforms.
This chapter has offered a perspective on the thoughts and actions of education
reform over the last fifteen years around issues of student evaluation. What are
the lessons learned to date? Are any of the lessons transferable to learning
organizations in general? Below I have identified ten such lessons; no doubt
there are more, and more will evolve in schools that are serious about
understanding the best ways to measure human performance within the context
of kindergarten through grade twelve.
References
Cannell, J.J. (1989). The "Lake Wobegon" report: How public educators cheat on standardized achievement tests. Albuquerque, NM: Friends for Education.
Ewell, P. (1996). Speaking virtually: Assessment and the Western Governors' New University. Assessment Update, 8(5), 10.
Jervis, K. (1991). Closed gates in a New York City school. In V. Perrone (Ed.), Expanding student assessment. Alexandria, VA: Association for Supervision and Curriculum Development.
Moon, J. (1993, October 28). Common understandings for complex reforms. Education Week.
Moon, J., & Schulman, L. (1995). Finding the connections: Linking assessment, instruction, and curriculum in elementary mathematics. Portsmouth, NH: Heinemann.
Sills-Briegel, T., Fisk, C., & Dunlop, V. (1996). Graduation by exhibition. Educational Leadership, 54(4), 66-75.
Wiggins, G. (1992). Creating tests worth taking. Educational Leadership, 49(8), 26-33.
Wiggins, G. (1993). Assessing student performance: Exploring the purpose and limits of testing. San Francisco: Jossey-Bass.