Professional Documents
Culture Documents
NAME ID/NO
SEID YIBRE 050/10
Evaluation Techniques
Introduction to Evaluation
Evaluation is a process that critically examines a program. It involves collecting and analyzing
information about a program's activities, characteristics, and outcomes. Its purpose is to make
judgments about a program, to improve its effectiveness, and/or to inform programming
decisions.
Evaluation utilizes many of the same methodologies used in traditional social research, but
because evaluation takes place within a political and organizational context, it requires group
skills, management ability, political dexterity, sensitivity to multiple stakeholders and other skills
that social research in general does not rely on as much.
Evaluation is a methodological area that is closely related to, but distinguishable from more
traditional social research.
Evaluation
tests usability and functionality and acceptability of an interactive system
occurs in laboratory, field and/or in collaboration with users
evaluates both design and implementation
should be considered at all stages in the design life cycle
Goals of Evaluation Evaluation has three main goals:
1 to assess the extent and accessibility of the systems functionality,
2 to assess users experience of the interaction, and
3 to identify any specific problems with the system.
Evaluation through expert analysis
Consequently, a number of methods have been proposed to evaluate interactive
systems through expert analysis. These depend upon the designer, or a human
factors expert, taking the design and assessing the impact that it will have upon a
typical user. The basic intention is to identify any areas that are likely to cause
Difficulties because they violate known cognitive principles, or ignore accepted
empirical results we will consider four approaches to expert analysis:
cognitive walkthrough,
heuristic evaluation,
the use of models and
use of previous work
Cognitive Walkthrough
CW is a detailed review of a sequence of actions, in this case, the steps that an interface
will require the user to perform in order to accomplish some known task.
expert ‘walks though’ design to identify potential problems using psychological
principles
3|Page
natural environment
context retained (though observation may alter it)
longitudinal studies possible
Requires an artefact:
Simulation, prototype, full implementation
Experimental evaluation
controlled evaluation of specific aspects of interactive behaviour
Evaluator chooses hypothesis to be tested which can be determined by measuring some
attribute of participant behavior.
Any changes in the behavioral measures are attributed to the different conditions.
Experimental factors:-
participants :- representative, sufficient sample
Variables :- things to modify and measure
Hypothesis:- are predictions of the outcome of an experiment
Experimental design:- how you are going to do it
• Disadvantages:
distractions
noise
• Appropriate
where context is crucial for longitudinal studies
Evaluating Implementations
Hypothesis
prediction of outcome
framed in terms of IV and DV e.g. “error rate will increase as font size decreases”
null hypothesis:
states no difference between conditions
Aim is to disprove this e.g. null hyp. = “no change with font size”
Experimental design
• Within groups design
Each subject performs experiment under each condition.
transfer of learning possible
less costly and less likely to suffer from user variation
• Between groups design
each subject performs under only one condition
4|Page
no transfer of learning
more users required
variation can bias results
Analysis of data
• Before you start to do any statistics:
look at data
save original data
• Choice of statistical technique depends on
type of data
information required
• Type of data
discrete
finite number of values
continuous
any value
Experimental studies on groups
More difficult than single-user experiments Problems with:
subject groups
choice of task
data gathering
The task
must encourage cooperation perhaps involve multiple channels options:
creative task e.g. ‘write a short report on …’
decision games e.g. desert survival task
control task e.g. ARKola bottling plan
Data gathering
several video cameras + direct logging of application
Problems:
– synchronization
– Sheer volume! one solution:
– record from each perspectiv
Analysis N.B. vast variation between group’s solutions:
within groups experiments
micro-analysis (e.g., gaps in speech)
anecdotal and qualitative analysis look at interactions between group and media
controlled experiments may `waste' resources
Data gathering
Several video cameras + direct logging of application problems:
synchronization
5|Page
Sheer volume! one solution:
record from each perspective
Field studies
Experiments dominated by group formation Field studies more realistic:
distributed cognition work studied in context real action is situated action
physical and social environment both crucial Contrast: psychology
o controlled experiment sociology and anthropology
o open study and rich data
Observational Methods
Think Aloud
o Cooperative evaluation Protocol
o analysis Automated analysis
Think Aloud
user observed performing task
User asked to describe what he is doing and why, what he thinks is happening etc.
Advantages
simplicity
requires little expertise
can provide useful insight
can show how system is actually use
Disadvantages
subjective
selective
act of describing may alter task performance
Cooperative evaluation
o variation on think aloud
o user collaborates in evaluation
o both user and evaluator can ask each other questions throughout
o Additional advantages
less constrained and easier to use
user is encouraged to criticize system
clarification possible
Protocol analysis
• Paper and pencil
cheap, limited to writing speed
• Audio
good for think aloud, difficult to match with other protocols
• Video
accurate and realistic, needs special equipment, obtrusive
• Computer logging
automatic and unobtrusive, large amounts of data difficult to analyze
• User notebooks
coarse and subjective, useful insights, good for longitudinal studies
6|Page
• Mixed use in practice.
• Audio/video transcription difficult and requires skill.
• Some automatic support tools available
Automated analysis
Workplace project
Post task walkthrough
– User reacts on action after the event
– used to fill in intention
Advantages
– Analyst has time to focus on relevant incidents
– avoid excessive interruption of task
Disadvantages
– Lack of freshness
– may be post-hoc interpretation of event
Query Techniques
1 Interviews
2 Questionnaire
1 Interview
• Analyst questions user on one-to -one basis usually based on prepared questions
• Informal, subjective and relatively cheap
Advantages
– can be varied to suit context
– issues can be explored more fully
– can elicit user views and identify unanticipated problems
Disadvantages
– Very subjective
– Time consuming
2 Questionnaires
• Set of fixed questions given to users
Advantages
– Quick and reaches large user group
– can be analyzed more rigorously
Disadvantages
– Less flexible
– Less probin
conclusion
In this paper, we have conceptualized and presented evaluation techniques that emerged
during an action design research study which was carried out in order to track pilot
projects applying a new project management methodology and contrast them with
reference projects to find indicators of the effect of the methodology. It is important to
consider whether a certain kind of techniques is appropriate for the stage of development.
Aim of evaluation is to test the functionality and usability of the design and to identify
and rectify any problems. A design can be evaluated before any implementation work has
started, to minimize the cost of early design errors.
7|Page
References
Atkinson, R. (1999). Project management: cost, time and quality, two best guesses and a
phenomenon, its time to accept other success criteria International Journal of Project
Management, 17(6), 337-342. Axelos. (2015). PRINCE2 Agile. Norwich: TSO (The
Stationery Office), part of Williams Lea. Befani, B., Ledermann, S., & Sager, F. (2007).
Realistic Evaluation and QCA: Conceptual Parallels and an Empirical Application.
Evaluation, 13(2), 171-192. Bertalanffy, L. v. (1956).
8|Page