
CHAPTER 6

TRAINING EVALUATION
Chapter 6 focuses on the evaluation of training programs and learner outcomes. It explains the
criticality of evaluating whether the training has accomplished its objectives and, particularly,
whether job performance and organizational results have improved as a result. Formative and
summative evaluation are discussed and compared and reasons for evaluating are identified. The
process of evaluating training is outlined and outcomes used to evaluate training are described in
some detail. Kirkpatrick’s four-level model of evaluation is highlighted, and five major
categories of outcomes are presented more extensively. Another
important issue, regarding how good the designated outcomes are, is addressed. Perhaps most
importantly, evaluation designs, important elements of evaluation design and the preservation of
internal validity are discussed as well as the calculation of return on investment for the training
dollar. In an environment of accountability, knowledge of how to show return on investment is
invaluable. Further, this chapter gives students knowledge of the various evaluation strategies
and how to choose an approach. A list of Key Terms, Discussion Questions, and Application
Assignments follows at the end of the chapter.

Objectives

As a result of reading and discussing this chapter, students should be able to

1. Explain why evaluation is important.


2. Identify and choose outcomes to evaluate a training program.
3. Discuss the process used to plan and implement a good training evaluation.
4. Discuss the strengths and weaknesses of different evaluation designs.
5. Choose the appropriate evaluation design based on the characteristics of the company and the
importance and purpose of the training.
6. Conduct a cost-benefit analysis for a training program.

I. Introduction

A. Training effectiveness refers to the benefits that the company and the trainees experience
as a result of training. Benefits for the trainees include learning new knowledge, skills,
and behaviors. Potential benefits for the company include increased sales, improved
quality and more satisfied customers.
B. Training outcomes or criteria refer to measures that the trainer and the company use to
evaluate training programs.
C. Training evaluation refers to the process of collecting the data regarding outcomes
needed to determine whether training objectives were met; these outcomes or criteria are
the measures used to determine the effect the training has had.
D. Evaluation design refers to from whom, what, when and how information is collected to
determine the effectiveness of the training program.

II. Reasons for Evaluating Training

A. Formative evaluation refers to evaluation conducted to improve the training process,


including ensuring that the training program is well-organized and runs smoothly and that
trainees are learning and are satisfied with the training.
1. Pilot testing is the process of previewing a training program with potential trainees
and their managers, or other customers. The pilot testing group is then asked to
provide feedback about the content of the training as well as the methods of delivery.
This feedback enables the trainer to make needed improvements to the training.
B. Summative evaluation is evaluation conducted to determine the extent to which trainees
have improved or acquired knowledge, skills, attitudes, behaviors, or other outcomes
specified in the learning objectives, as a result of the training.
C. Reasons training programs should be evaluated:
1. To identify the program’s strengths and weaknesses, including whether the program is
meeting the learning objectives, the quality of the learning environment, and whether
transfer of training back to the job is occurring.
2. To assess whether the various features of the training context and content contribute
to learning and the transfer of learning back to the job.
3. To identify which trainees benefited most or least from the program and why.
4. To gather information, such as trainees’ testimonials, to use for marketing training
programs.
5. To determine financial benefits and costs of the program.
6. To compare the costs and benefits of training versus other human resource
investments.
7. To compare the costs and benefits of various training programs in order to choose the
most effective programs.

III. Overview of the Evaluation (or Training) Process (see Figure 6-1, p. 200 of text)

A. Conduct a needs analysis.


B. Develop measurable learning outcomes.
C. Develop outcome measures.
D. Choose an evaluation strategy.
E. Plan and execute the evaluation.

IV. Outcomes Used in Evaluating Training Programs


A. Kirkpatrick’s four-level model (see Table 6-1, p. 201) suggests training can be evaluated
on the following levels:
1. Reactions level, which focuses on trainee satisfaction.
2. Learning level, which focuses on the acquisition of knowledge, skills, attitudes and/or
behaviors.
3. Behavior level, which focuses on improvement in job performance or behaviors.
4. Results level, which focuses on whether desired business results were achieved as a
result of the training.
a. Levels 1 and 2 measures are collected before trainees return to their jobs.

b. Levels 3 and 4 criteria measure the extent to which the training transfers back to
the job.
B. More comprehensive models of training criteria, incorporating such training outcomes as
attitudes, motivation and return on investment, are needed.
C. Training outcomes are classified into five major categories (see Table 6-2, p. 202):
1. Cognitive outcomes demonstrate the extent to which trainees are familiar with
information, including principles, facts, techniques, procedures, and processes,
covered in the training program.
2. Skill-based outcomes assess the level of technical or motor skills and behaviors
acquired or mastered. This incorporates both the learning of skills and the application
of them (i.e., transfer).
a. Skill learning is often assessed by observing performance in work samples such as
simulators.
b. Skill transfer is typically assessed by observing trainees on the job or managerial
and peer ratings.
3. Affective outcomes include attitudes and motivation.
a. Reaction outcomes refer to the trainees’ perceptions of the training experience,
including the content, the facilities, the trainer and the methods of delivery (see
sample, Table 6-5, p. 204). These perceptions are typically obtained at the end of
the training session via a questionnaire completed by trainees, but usually are only
weakly related to learning or transfer.
b. An instructor evaluation measures a trainer’s or instructor’s success.
c. Other affective outcomes include tolerance for diversity, motivation to learn,
attitudes toward safety, and customer service orientation. The attitude of interest
depends on training objectives.
4. Results are those outcomes used to determine the benefits of the training program to
the company. Examples include reduced costs related to employee turnover or
accidents, increased production, and improved quality or customer service.
5. Return on Investment involves comparing the training program’s benefits in
monetary terms to the program’s costs, both direct and indirect.
a. Direct costs include salaries and benefits of trainees, trainers, consultants, and any
others involved in the training; program materials and supplies; equipment and
facilities; and travel costs.
b. Indirect costs include office supplies, facilities, equipment and related expenses
not directly related to the training program; travel and expenses not billed to one
particular program; and training department management and staff salaries not
related to a single program.
c. Benefits are the gains the company receives from the training.
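The ROI comparison described above reduces to simple arithmetic: net benefits (benefits minus costs) divided by costs. A minimal sketch in Python, using purely hypothetical cost and benefit figures (none of the numbers below come from the text):

```python
def roi_percent(benefits, costs):
    """Return on investment as a percentage:
    net benefits (benefits - costs) divided by costs."""
    return (benefits - costs) / costs * 100

# Hypothetical program figures, grouped as the outline describes
direct_costs = 32_000    # trainee/trainer salaries, materials, facilities, travel
indirect_costs = 8_000   # overhead not billed to this particular program
benefits = 60_000        # e.g., reduced turnover and accident costs

total_costs = direct_costs + indirect_costs
print(roi_percent(benefits, total_costs))  # → 50.0
```

A result of 50% would mean the program returned $1.50 for every dollar spent; a negative result would mean costs exceeded measurable benefits.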

V. Determining Whether Outcomes Are Good

A. Criteria relevance refers to the extent to which training outcomes appropriately reflect
the content of the training program. The learned capabilities needed to successfully
complete the training program should be the same as those required to successfully
perform one’s job.

1. Criterion contamination means that the training evaluation measures reflect
capabilities that were not covered in the training and/or the measurement conditions
are different than the training conditions.
2. Criterion deficiency refers to the failure of the training evaluation measures to reflect
all that was covered in the training program.

B. Reliability is the degree to which training outcomes can be measured consistently, be it


over time, across raters, or across parallel measures. Predominantly, we are concerned
with consistency over time, such that a reliable test contains items that do not change in
meaning or interpretation over time.
C. Discrimination refers to the degree to which trainees’ performance on an outcome
measure actually reflects true differences in performance; that is, we want the test to
discriminate on the basis of performance and not other things.
D. Practicality is the ease with which the outcome measures can be collected. Learning, job
performance, and results level measures can be somewhat difficult to collect.

VI. Evaluation Practices

A. Figure 6-3 (p. 208) indicates that reactions and cognitive level outcomes are the most
frequently used outcomes in training evaluation, with results level evaluations conducted
in only 7% of firms.
B. An evaluation limited to reaction and cognitive level outcome measurements does not
assess whether transfer of training has occurred.
C. Outcome measures are largely independent of each other; you cannot assume that positive
reactions to the training program mean that trainees learned more and will apply what they
learned back on the job.
D. To the extent possible, evaluations should include measuring job behavior and results level
outcomes to determine whether transfer of the training has occurred.
E. There are three types of transfer:
1. Positive transfer is demonstrated when learning occurs and job performance and
positive changes in skill-based, affective, or results outcomes are also observed. This
is the desirable type of transfer.
2. No transfer of training is demonstrated if learning occurs, but no changes are
observed in skill-based, affective, or results outcomes.
3. Negative transfer is evident when learning occurs, but skills, affective outcomes, or
results are less than at pretraining levels.

VII. Evaluation Designs: The design of the training evaluation determines the confidence that
can be placed in the results. No training evaluation can be absolutely certain that the results of
the evaluation are completely true.

A. Threats to validity: Alternative explanations for evaluation results.


1. Internal validity is the believability of the study.
a. It is the extent to which we can isolate training as the cause of a change in
performance.

b. Threats to internal validity include characteristics of the company (e.g., history);
the outcome measures (e.g., instrumentation, testing); and the individuals involved
in the evaluation (e.g., maturation, regression toward the mean, mortality, and
initial group difference). (See Table 6-6, p. 210).
2. External validity refers to the generalizability of the evaluation results to other
groups and other situations. Threats to external validity include how participants react
to the pretest, how they react to evaluation, the interaction of selection and
training, and the interaction of methods.
B. Methods to control for threats to validity:
1. Use pre- and post-tests to determine the extent to which trainees’ knowledge, skills
or behaviors have changed from pre-training to post-training measures. The
pretraining measure essentially establishes a baseline.
2. Use a comparison (or control) group (i.e., a group that participates in the evaluation
study, but does not receive the training) to rule out factors other than training as the
cause of changes in the trainees. The group that does receive the training is referred to
as the training group or treatment group. Often employees in an evaluation will
perform higher just because of the attention they are receiving. This is known as the
Hawthorne effect.
3. Random assignment refers to assigning employees to the control and training groups
on the basis of chance. Randomization helps to ensure that members of the control
group and training group are of similar makeup prior to the training. Randomization
can be impractical and/or even impossible to employ in company settings.
C. Types of evaluation designs (see Table 6-7, p. 212) vary as to whether they include a
pretest and posttest, a control or comparison group and randomization. The chapter
provides an example of each design.
1. The posttest only design involves collecting only posttraining outcome measures. It
would be strengthened by the use of a control group, which would help to rule out
alternative explanations for changes in performance.
2. The pretest/posttest design involves collecting both pretraining and posttraining
outcome measures to determine whether a change has occurred, but without a control
group which helps to rule out alternative explanations for any change that does occur.
3. The pretest/posttest with comparison group design includes pretraining and
posttraining outcome measurements as well as a comparison group in addition to the
group that receives training. If the posttraining improvement is greater for the group
that receives training, as we would expect, this provides evidence that training was
responsible for the change.
4. The time series design involves collecting outcome measurements at periodic intervals
pre- and posttraining. A comparison group may also be used. Time series allows for
an analysis of outcomes, e.g., accident rates, productivity, etc., over time to observe
any changes that occur (see Table 6-9, p. 215). The strength of this design can be
improved by using reversal, which refers to a time period in which participants no
longer receive the training intervention.

5. The Solomon Four-Group design combines the pretest/posttest comparison group
design and the posttest-only control group design. It involves the use of four groups:
a training group and comparison group for which outcomes are measured both pre-
and posttraining and a training group and comparison group for which outcomes are
measured only after training. This design provides the most controls for internal and
external validity, but is also the most difficult to employ.
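The logic of the pretest/posttest with comparison group design (item 3 above) can be sketched as a comparison of average gain scores: the training group's improvement minus the comparison group's improvement. The test scores below are made-up illustrations, not data from the text:

```python
def mean(xs):
    return sum(xs) / len(xs)

def training_effect(train_pre, train_post, comp_pre, comp_post):
    """Difference between the training group's average gain and the
    comparison group's average gain. A positive value suggests the
    trained group improved beyond what untrained employees showed."""
    train_gain = mean(train_post) - mean(train_pre)
    comp_gain = mean(comp_post) - mean(comp_pre)
    return train_gain - comp_gain

# Hypothetical knowledge-test scores, pre- and posttraining, per group
effect = training_effect(
    train_pre=[60, 62, 58, 64], train_post=[78, 80, 74, 82],
    comp_pre=[61, 59, 63, 60],  comp_post=[64, 62, 66, 63],
)
print(effect)  # → 14.5
```

The comparison group's gain (here 3 points) absorbs history, maturation, and testing effects, which is why the residual gain is more credible evidence that training caused the change.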
D. Considerations in choosing an evaluation design
1. Factors that influence the type of evaluation design used (see Table 6-11, p. 217):
a. Change potential: Can the program be modified if needed?
b. Importance: Does ineffective training affect major variables, such as customer
service or product development?
c. Scale: How many trainees are involved?
d. Purpose of the training: Is training conducted for learning, results or both?
e. Organization culture: Is accountability, or showing results, part of the company’s
norms and expectations?
f. Expertise: Do parties involved have the skills to conduct a rigorous analysis?
g. Cost: What design can the company afford?
h. Time frame: When do we need the information?
2. Evaluation designs without pretesting or comparison groups are most appropriate
when you are interested only in whether a specific level of performance has been
achieved, and not how much change has occurred.
3. The pretest allows for the examination of how much change has occurred. The
comparison group allows for the isolation of training as the likely cause of the change.

VIII. Determining Return on Investment

A. Cost-benefit analysis of training is the process of determining the net economic benefits
of training using accounting methods. Training cost information is important for several
reasons:
1. To understand total expenditures for training, including direct and indirect costs.
2. To compare the costs of alternative training programs.
3. To evaluate the proportion of the training budget spent on the development of training,
administrative costs, and evaluation, as well as how much is spent on various types of
employees (e.g., exempt versus nonexempt).
4. To control costs.
B. Determining costs
1. The resource requirements model compares equipment, facilities, personnel, and
materials costs across different stages of the training process (needs assessment,
development, training design, implementation, and evaluation).
2. There are seven categories of cost sources: costs related to program development or
purchase; instructional materials; equipment and hardware; facilities; travel and
lodging; salaries of the trainer and support staff; and the cost of either lost
productivity or replacement workers while trainees are away from their jobs for the
training.
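The seven cost categories above can be tallied into a program total and a per-trainee figure. A sketch with hypothetical dollar amounts (the values and trainee count are invented for illustration):

```python
# Hypothetical costs, grouped by the seven source categories named above
costs = {
    "program development or purchase": 12_000,
    "instructional materials": 3_000,
    "equipment and hardware": 4_500,
    "facilities": 2_500,
    "travel and lodging": 6_000,
    "trainer and support staff salaries": 9_000,
    "lost productivity / replacement workers": 7_000,
}

n_trainees = 40
total_cost = sum(costs.values())
print(total_cost)               # → 44000
print(total_cost / n_trainees)  # → 1100.0
```

The per-trainee figure is what feeds into comparisons of alternative programs and into the utility analysis discussed later in the chapter.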

C. Determining benefits can be done via a number of methods, including:
1. Technical, practitioner and academic literature summarizes benefits of training
programs.
2. Pilot training programs assess the benefits from a small group of trainees before a
company commits more resources.
3. Observing successful job performers can help to determine what successful job
performers do differently than unsuccessful performers.
4. Asking trainees and their managers to provide estimates of training benefits.
D. An example of a Cost-Benefit analysis appears on page 221 of the text (see also Table 6-
12, p. 221, and Table 6-13, p. 222).
E. Other methods of cost-benefit analysis
1. Utility analysis assesses the dollar value of training based on estimates of the
difference in job performance between trained and untrained employees, the number of
employees trained, the length of time the program is expected to influence
performance, and the variability in job performance in the untrained group of
employees. This is a highly sophisticated formula that requires the use of pretest and
posttest with a comparison group.
2. Other types of economic analysis evaluate training as it benefits the firm or
government using direct and indirect costs, incentives paid by the government for
training, wage increases received by trainees as a result of the training, tax rates, and
discount rates.
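The utility analysis described in item 1 is commonly written as ΔU = T × N × d_t × SD_y − N × C: duration of the effect in years, number trained, the trained/untrained performance difference in standard-deviation units, the dollar value of one standard deviation of job performance, and cost per trainee. A sketch with hypothetical inputs (the parameter values below are illustrative assumptions, not figures from the text):

```python
def utility_gain(years, n_trained, effect_size, sd_dollars, cost_per_trainee):
    """Dollar value of training per the utility-analysis formula:
    duration of effect x number trained x effect size (in SD units)
    x dollar variability of job performance, minus total training cost."""
    return (years * n_trained * effect_size * sd_dollars
            - n_trained * cost_per_trainee)

# Hypothetical values; as the text notes, the effect size would come from
# a pretest/posttest evaluation with a comparison group.
print(utility_gain(years=2, n_trained=50, effect_size=0.5,
                   sd_dollars=10_000, cost_per_trainee=1_500))  # → 425000.0
```

Here the estimated payoff ($500,000 in performance gains minus $75,000 in costs) illustrates why the formula is sensitive to the effect-size and SD_y estimates, both of which are hard to pin down in practice.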

CHAPTER 6 SUMMARY

This chapter provides a sound base of knowledge regarding training evaluation, the issues
surrounding it, and how to approach it. Reasons for evaluating training were described, and the
process of evaluating training was outlined. Kirkpatrick’s model of evaluation was explained, as
well as the five major categories of outcomes that can be measured to evaluate training
effectiveness. The five outcomes (cognitive, skill-based, affective, results, and ROI) used in
evaluating training programs were explained. Good training outcomes need to be relevant,
reliable, discriminating, and practical. Next, threats to both internal and external validity were
discussed. Various evaluation designs were explained with an emphasis on related costs, time, and
strength. Return on Investment (ROI) and cost-benefit analysis were explained, and examples
given. The chapter concluded with a listing of key terms, discussion questions, and application
assignments.

Discussion Questions

1. What can be done to motivate companies to evaluate training programs?

Answer:
The company’s management would need to be made aware of the investment made by the
company for the training and the need to summatively evaluate it to determine if the training
program is effective, and formatively evaluate it to identify its strengths and weaknesses to
better accomplish training. Evaluation is vital to improving a training program, or deciding
whether to replace it completely with a better program or a non-training option. (p. 198)

2. What do threats to validity have to do with training evaluation? Identify internal and external
threats to validity. Are internal and external threats similar? Explain.

Answer:
If threats to validity exist, the evaluator may question whether a training program was really
effective or if possible benefits were the result of other factors. Internal threats to validity
affect the believability of the completed program’s perceived benefits, while threats to the
external validity affect the believability of the program’s benefits for future use. The two types
of threats are strongly related, in that evaluation of future benefit of a program is based on
past performance. (p. 210)

3. What are the strengths and weaknesses of each of the following designs: posttest-only,
pretest/posttest comparison group, pretest/posttest only?

Answer:
A posttest-only design requires less time and effort than the others, but fails to account
for factors such as initial differences between the control group and the group that received
the training. Pretest/posttest only acknowledges the performance of the trained group before the
training, but ignores other business factors that may be occurring between the time of the two
evaluations. Pretest/posttest with comparison group is the most thorough, acknowledging the
highest number of factors, but involves the most time and effort to collect data. (p. 212-213)

4. What are results outcomes? Why do you think that most organizations don’t use results
outcomes for evaluating their training programs?

Answer:
Results outcomes are the benefits of the training program for the company. These types of
payoffs could be difficult to determine, as many are long-term benefits and are difficult to
track, while others, such as increased customer satisfaction may be difficult to place an exact
monetary value on. (p. 204)

5. This chapter discussed several factors that influence the choice of evaluation design. Which of
these factors would have the greatest influence on your choice of an evaluation design?
Which would have the smallest influence? Explain your choices.

Answer:
Answers will vary.

6. How might you estimate the benefits from a training program designed to teach employees
how to use the World Wide Web to monitor stock prices?

Answer:
Tests could be performed to see whether the employees are capable of monitoring the stock
prices, that is, whether they learned how; surveys could also be used to see how frequently
they do so in a given day, to determine how well they have integrated that knowledge into
their behavior. Employees could then estimate how much that behavior benefits the company. (p. 198)

7. A group of managers (N=25) participated in the problem-solving module of a leadership


development program two weeks ago. The module consisted of two days in which the group
focused on the correct process to use in problem solving. Each manager supervises 15 to 20
employees. The company is willing to change the program, and there is an increasing
emphasis in the company to show that training expenses are justifiable. You are asked to
evaluate this program. Your boss would like the results of the evaluation no later than six
weeks from now. Discuss the outcomes you would collect and the design you would use.
How might your answer change if the managers have not yet attended the program?

Answer:
If the evaluation is done after the program is completed, a posttest only design could be used,
comparing the group of managers to a control group. Areas of evaluation may include
looking at various decisions made by the managers over several weeks and determining their
effectiveness in solving various problems, and the amount of time taken to organize solutions.
If the managers had not yet attended the program, a pretest/posttest method could be used,
comparing the performance of the managers before and after the module, in addition to
making comparisons to a control group. (p. 210-211)

8. What practical considerations need to be taken into account when calculating a training
program’s ROI?

Answer:
The costs and benefits of a training program must all be considered. The costs are usually
able to be calculated using accounting, while the benefits are not always as clearly defined.
First, the reasons for conducting the training must be reviewed. Some benefits, such as
employee productivity, or repeat customer business must be evaluated in the long term. Often
it is helpful to compare the trained group of employees to a control group in order to isolate
the effects of the program. (p. 223-224)

9. What metrics might be useful for evaluating the effectiveness of a company’s training
function? Discuss and rate their importance.

Answer:
Answers will vary. Table 6-15 (p. 225) provides examples of different measurements, or
metrics. These metrics are valuable for benchmarking purposes, for understanding the current
amount of training activity in a company, and for tracking historical trends in training activity.
However, collecting these metrics does not address such issues as whether training is effective
or whether the company is using the data to make strategic training decisions.

10. What acceptable methods can be used to show the costs and benefits of training without
collecting statistics and conducting analyses? Explain these methods and their strengths and
weaknesses compared to a cost-benefit analysis.
Answer:
One method is using success cases. These refer to concrete examples of the impact of training
that show how learning leads to results that the company finds worthwhile and the managers
find credible. Success cases do not attempt to isolate the influence of training but rather to
provide evidence that it was useful. (p. 224)
