Kirkpatrick's Four Levels of Evaluation

Evaluating Effectiveness of Training
There is a well-established and practical model for measuring whether your training is effective. It is called Kirkpatrick's Four Levels of Evaluation. The purpose of this document is to explain the process and how it is applied to EHS training courses at Berkeley Lab.
Kirkpatrick's four levels of evaluation

Kirkpatrick's evaluation model is often represented as a set of tiers forming a hierarchy, as shown below.
Kirkpatrick's model is used to evaluate the effectiveness of a training program, that is, to determine whether the training is yielding the intended outcomes and results. In simple terms, it determines the extent to which the training "hit the mark" relative to the cost and effort (was it worth the time, effort, and cost to provide the training?). The answer differs for each program, but the methods can be applied consistently across programs.
Level 4 (Results): To what extent did the training improve business/performance results (ROI/ROE) or, in the case of safety, improve metrics, as evidenced, for example, by a decrease in incidents/accidents?
Level 3 (Behavior): To what extent did participants apply (incorporate) what they learned during training when they are back at work (transfer)?
Level 2 (Learning): To what extent did participants acquire the intended knowledge, skills, and abilities as a result of training (exams, practicals, activities)?
Level 1 (Reaction): To what extent did participants respond favorably to the training?
A risk-based approach
EHS employs a risk-based approach to determine the extent (effort) of evaluating training effectiveness. Evaluating training at Levels 3 and 4 can take considerable effort, time, and cost, so this is balanced against risk. In short, the higher the consequence of error, the more effort is put into evaluating effectiveness. Take Radiological Worker training as an example. If it is vital that radiological workers be able to use survey instruments effectively to determine whether a work area is contaminated, and the consequence of not being able to use a survey instrument is high (it impacts worker safety, and/or impacts LBL's reputation or results in program penalties), then this training would be a good candidate for Level 3 evaluation, because you would want to verify in the workplace that those trained are actually capable of using a survey instrument rather than leave this to chance.
Risk Level 1 hazards:

Trainings that are associated with risk-level 1 hazards are evaluated using Level 1 and Level 2 methods. However, this is not a hard-and-fast rule. Ergonomics, for example, has a high consequence to personal health and safety, as well as institutional cost, but is a risk-level 1 hazard. I mention this because it is important to apply judgment when determining the value of performing a Level 3 evaluation (it requires greater effort). Level 3 evaluation can also help identify factors other than training that affect desired performance, such as inadequate resources, human factors, or management/supervisor issues.

Risk Level 2 hazards:

Trainings that are associated with risk-level 2 hazards are evaluated using Level 1 and Level 2 methods, and can benefit from employing Level 3 methods. Again, judgment is required, since risk-level 2 "topics" span a wide range of hazard classes and therefore a wide range of knowledge/skill sets and competencies. This can make evaluation difficult, so it is very important to align evaluation to the specific objectives and evaluation used in the training (discussed later).

Risk Level 3 hazards:

Trainings that are associated with risk-level 3 hazards are evaluated using Level 1, Level 2, and Level 3 methods. This is not a hard-and-fast rule, but rather a best practice. Why? If, for example, it is critical that workers who complete Lockout/Tagout (LOTO) training are able to perform the procedure without error, it is important not only to validate this as part of the training, but also to validate that all critical steps are being applied in the course of work. This answers the question: did the training transfer? If not, it helps determine why not. This allows a program to determine whether training is "sticking" and serves as a way to refine and improve training based on discoveries. It also helps identify non-training factors that affect performance, so these can be managed separately.
The following outline can help decide when to apply Levels 1, 2, 3, and 4:

Risk Level 1 / 2
Examples: Standard awareness-level trainings (safety or business process) that are not critical to safety or to achieving business-critical work.
Evaluation levels: 1, 2

Risk Level 3
Examples: Electrical safety, radiological worker, fall protection, key emergency response, or key business processes where there is a meaningful consequence to lack of performance.
Evaluation levels: 1, 2, 3, 4 (optional)

Impacts large population
Examples: A requirement impacts a large population, and therefore the training program wants to validate that it is achieving results given the impact/cost.
Evaluation levels: 1, 2, 3, 4 (optional)

High administrative or programmatic risk
Examples: It is important to a program to validate that training is being applied, for assurance or other business reasons.
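For programs that track many courses, the outline above could be encoded so that each course's default evaluation plan is derived consistently. The following Python sketch is illustrative only: the `Impact` names, the `EVALUATION_PLAN` mapping, and the `evaluation_levels` function are hypothetical, the sketch covers only the rows that list specific levels, and it assumes the "(optional)" note in the Risk Level 3 and large-population rows refers to Level 4.

```python
from enum import Enum


class Impact(Enum):
    """Impact categories from the decision outline (illustrative names)."""
    RISK_LEVEL_1_2 = "Risk Level 1 / 2"
    RISK_LEVEL_3 = "Risk Level 3"
    LARGE_POPULATION = "Impacts large population"


# Hypothetical encoding of the outline: Kirkpatrick levels applied by default,
# and levels treated as optional (assumed here to be Level 4).
EVALUATION_PLAN = {
    Impact.RISK_LEVEL_1_2: {"required": {1, 2}, "optional": set()},
    Impact.RISK_LEVEL_3: {"required": {1, 2, 3}, "optional": {4}},
    Impact.LARGE_POPULATION: {"required": {1, 2, 3}, "optional": {4}},
}


def evaluation_levels(impacts, include_optional=False):
    """Return the sorted Kirkpatrick levels for a course.

    A course can fall into more than one impact category (for example, a
    Risk Level 3 topic that also reaches a large population), so the levels
    from every matching category are combined.
    """
    levels = set()
    for impact in impacts:
        plan = EVALUATION_PLAN[impact]
        levels |= plan["required"]
        if include_optional:
            levels |= plan["optional"]
    return sorted(levels)


if __name__ == "__main__":
    print(evaluation_levels([Impact.RISK_LEVEL_1_2]))
    print(evaluation_levels([Impact.RISK_LEVEL_3], include_optional=True))
```

Calling `evaluation_levels([Impact.RISK_LEVEL_3], include_optional=True)` returns `[1, 2, 3, 4]`. A lookup like this only captures the defaults; the judgment calls described under the risk-level headings (for example, adding Level 3 for ergonomics) still have to be made by the program.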
Example of Applying Levels 2 and 3

The following provides an example of how to apply Level 2 and Level 3 evaluation methods. The example is based on training for the use of electrical gloves and tools. Since evaluation should align directly to the learning objectives, the example starts with four objectives.
After completing this lesson, participants will be able to:
• Describe when to inspect their electrical gloves (knowledge)
• Describe when to test their electrical gloves (knowledge)
• Validate gloves are functioning as intended by using one or more approved testing methods (performance)
• Describe the strengths and limitations of the testing techniques they employ using workplace examples (contextual or applied knowledge)
Level 2: Methods used to evaluate whether participants learned. This example uses two parts to organize the content: the instruction and the evaluation of learning.
• Objective: Validate gloves are functioning as intended by using one or more approved testing methods.
  o Instruction:
    • The instructor would model the process and explain the strengths and weaknesses of each process used, pointing out the critical steps, the areas where errors are commonly made, and techniques for avoiding these errors. The instructor would then have a participant model the process and have the class provide feedback, facilitating and pointing out what is being done well and what is being missed. Next, participants would practice the technique while receiving feedback. Once everyone is confident they can perform inspection well, the instructor validates performance.
  o Evaluation:
    • The instructor would require each participant to perform glove inspection on a provided set of gloves, one of which has deficiencies. The learner is then asked to determine which glove has the deficiency.
    • The instructor would have each participant demonstrate, to the instructor's satisfaction, that they can effectively perform glove inspection.
Level 2 evaluation simply measures the extent to which participants learned what they were supposed to learn. It is directly tied to the performance objectives, which describe learning in performance-based terms.
Level 3: Methods used to evaluate whether the learning that took place in the classroom transferred to the workplace. The best method is work observation, but other methods include confidence intervals and structured interviews. It is suggested that evaluations be conducted 90 days or more after initial training to determine whether the behaviors have “stuck.” It is also suggested that evaluation be performed using a random sample of 20 or more participants.
1) Work observation
The following is an example of a structured work observation rubric that is aligned to the learning objectives.
Performance: When to inspect gloves
  Inadequate (1): Unable to describe when glove inspection is needed using real-world examples.
  Developing (2): Explained one situation for when to inspect, but did not understand other applicable situations when probed.
  Skilled (3): Provided examples of when they inspected their gloves and were able to indicate whether inspection was needed for other situations when asked.
  Exceptional (4): Provided examples of when they inspected their gloves, and were able to explain multiple other examples and the reasoning behind these.
  Score: ____

Performance: How to inspect gloves
  Inadequate (1): Missed one or more critical steps.
  Developing (2): Performed all critical steps, but their confidence and/or technique still needed further development.
  Skilled (3): Performed all critical steps and able to explain why they are critical. Technique was good.
  Exceptional (4): Showed automaticity, but with a critical eye for error checking. Able to explain all critical steps and demonstrated good technique.
  Score: ____
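The sampling guidance for Level 3 (evaluation 90 days or more after training, using a random sample of 20 or more) and the 1 to 4 rubric scale can be combined into a simple planning and scoring routine. The Python sketch below assumes training completion dates are available as a mapping from a worker ID to a date, and it uses "below Skilled (3)" as an assumed review threshold; the function names and data shapes are hypothetical, not part of any Berkeley Lab system.

```python
import random
from datetime import date, timedelta
from statistics import mean

MIN_DAYS_AFTER_TRAINING = 90  # suggested minimum delay before Level 3 evaluation
MIN_SAMPLE_SIZE = 20          # suggested minimum random sample size
SKILLED = 3                   # rubric score for "Skilled"


def eligible_workers(completions, today):
    """IDs of workers who completed training at least 90 days before `today`.

    `completions` is assumed to map a worker ID (str) to a completion date.
    """
    cutoff = today - timedelta(days=MIN_DAYS_AFTER_TRAINING)
    return sorted(w for w, done in completions.items() if done <= cutoff)


def draw_sample(completions, today, size=MIN_SAMPLE_SIZE, seed=None):
    """Randomly choose `size` eligible workers for work observation."""
    pool = eligible_workers(completions, today)
    if len(pool) < size:
        raise ValueError(f"only {len(pool)} eligible workers; need {size}")
    return random.Random(seed).sample(pool, size)


def summarize_scores(observations):
    """Summarize rubric scores (1 to 4) per criterion.

    `observations` is a list of dicts, one per observed worker, mapping a
    criterion name to its score. Reports the mean score and the share of
    workers scoring below Skilled (an assumed review threshold).
    """
    if not observations:
        raise ValueError("no observations to summarize")
    summary = {}
    for criterion in observations[0]:
        scores = [obs[criterion] for obs in observations]
        summary[criterion] = {
            "mean": round(mean(scores), 2),
            "below_skilled": sum(s < SKILLED for s in scores) / len(scores),
        }
    return summary


if __name__ == "__main__":
    completions = {f"worker-{i:02d}": date(2024, 1, 15) for i in range(30)}
    sample = draw_sample(completions, date(2024, 6, 1), seed=1)
    print(len(sample), sample[:3])
    print(summarize_scores([
        {"when_to_inspect": 3, "how_to_inspect": 2},
        {"when_to_inspect": 4, "how_to_inspect": 3},
    ]))
```

Passing a fixed `seed` to `draw_sample` makes the draw reproducible, which may help if a program needs to show how the observed workers were selected. A high "below_skilled" share for one criterion points to the step where training may not be transferring, or to a non-training factor worth investigating.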
2) Confidence intervals
Confidence intervals can be a useful technique for measuring the confidence level of supervisors or managers in relation to worker performance. By itself, it is not the same as validating through observation, but it provides value to programs where work observation is difficult or not applicable (for example, knowledge work). It is best used by front-line managers, work leads, or activity leads who have direct oversight of the work being performed and therefore have a close relationship with workers. Reliability is measured in part by the consistency of responses among all respondents. Validity here is formative validity: how well the measures can be used to improve training.
Example: A survey could include the following questions:
• Now that your staff have completed training, how confident are you that they know when to inspect their gloves?
  o Very confident
  o Somewhat confident
  o Not confident
• Now that your staff have completed training, how confident are you that they are able to effectively test their gloves?
  o Very confident
  o Somewhat confident
  o Not confident
• What method do your staff use to test their gloves?
  o ________________________________
• When should staff inspect their gloves?
  o ___________________________
The last two questions (which are just examples) are used to determine whether the supervisor understands the process well enough to be considered a reliable source.
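Responses to the closed survey questions can be tallied per question, and the consistency of responses among supervisors (one part of reliability for this technique) can be given a rough number. The Python sketch below is an assumption for illustration: `tally` and `agreement` are hypothetical helpers, and "agreement" here is simply the share of respondents who chose the most common answer, not a formal reliability statistic.

```python
from collections import Counter

# Response scale used by the closed survey questions.
SCALE = ("Very confident", "Somewhat confident", "Not confident")


def tally(responses):
    """Count answers to one closed question, in scale order."""
    counts = Counter(responses)
    unexpected = set(counts) - set(SCALE)
    if unexpected:
        raise ValueError(f"unexpected responses: {sorted(unexpected)}")
    return {option: counts[option] for option in SCALE}


def agreement(responses):
    """Share of respondents (0 to 1) who gave the most common answer.

    A rough consistency indicator assumed for illustration; it is not a
    formal reliability coefficient.
    """
    if not responses:
        raise ValueError("no responses to evaluate")
    counts = Counter(responses)
    return max(counts.values()) / len(responses)


if __name__ == "__main__":
    answers = ["Very confident"] * 6 + ["Somewhat confident"] * 3 + ["Not confident"]
    print(tally(answers))
    print(agreement(answers))
```

With ten responses, of which six are "Very confident", three "Somewhat confident", and one "Not confident", `agreement` returns 0.6. Open-ended answers, such as the method staff use to test their gloves, are better reviewed by hand against the training's learning objectives.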
3) Structured Interviews
Structured interviews are used to evaluate understanding, not skill. Using rubrics is helpful, as is making sure that all participants are asked the same questions, which reinforces the validity of the results. Compared with confidence intervals, interviews allow for discussion and therefore yield more meaningful data.
Example:
Q1: When did you most recently wear electrical gloves, and why?
Q2: How did you know that the gloves you wore were effective?
Q3: If someone new asked you when they needed to wear gloves, what would you tell them?
Q4: How do you inspect your gloves and when?
All questions should be based on the learning objectives from the training so they are valid.
Summary
Training evaluation allows programs to determine the extent to which workers are meeting
expectations and applying what they learn. Using evidence-based methods strengthens a
program’s ability to provide training that yields expected results. It also provides the data
necessary to show management a positive return on investment (ROI) and return on
expectations (ROE), without which these efforts are unsubstantiated. Finally, it allows a program to
clearly define the scope of responsibility so training is not held accountable for non-training
issues.
A U.S. Department of Energy National Laboratory Operated by the University of California
