
Chapter 12

Introducing Evaluation

www.id-book.com

2011

The aims
How can the usability of a system be
evaluated?
How can usability problems be found and
improvements suggested?


Key questions for an evaluation

Iterative design & evaluation is a continuous process that examines:
Why: to check users' requirements and that users can
use the product and that they like it.
What: a conceptual model, early prototypes of a new
system and later, more complete prototypes. Lay down
usability criteria.
Where: in natural and laboratory settings.
When: throughout design; finished products can be
evaluated to collect information to inform new products.
Summative evaluation: final quantitative assessment of
initially defined criteria.
Formative evaluation: at different times, assess current
system against actual requirements.

Bruce Tognazzini tells you why you need to evaluate
"Iterative design, with its repeating cycle
of design and testing, is the only validated
methodology in existence that will
consistently produce successful results. If
you don't have user-testing as an integral
part of your design process you are going
to throw buckets of money down the
drain."
See AskTog.com for topical discussions
about design and evaluation.

Types of evaluation
3 broad categories, depending on the setting,
user involvement and level of control.
Controlled settings involving users
- usability testing & experiments in laboratories
and living labs.
Natural settings involving users
- field studies to see how the product is used in
the real world.
Any settings not involving users
- consultants critique, predict, analyze &
model aspects of the interface; analytics.

Pros and Cons

Controlled settings involving users (Lab-based studies)
- Good at revealing usability problems
- Poor at capturing context of use
Natural settings involving users (Field studies)
- Good at demonstrating how people use
technologies in their intended setting
- Expensive and difficult to conduct
Any settings not involving users (Modelling and
predicting approaches)
- Quick and cheap to perform
- Missing unpredictable usability problems and
subtle aspects of the user experience

Living labs
People's use of technology in their
everyday lives can be evaluated in
living labs.
Such evaluations are too difficult to
do in a usability lab.
E.g., the Aware Home was embedded
with a complex network of sensors
and audio/video recording devices
(Abowd et al., 2000).

Usability testing & field studies can complement each other


Evaluation methods
Methods: observing, asking users, asking experts, testing, modeling.
Settings: controlled settings, natural settings, without users.

Usability Testing
Usability testing refers to evaluating a product, website,
mobile app or system by testing it with representative
users, using real-life scenarios and user tasks.
The goal is to identify any usability problems, collect
qualitative and quantitative data and determine the
participant's satisfaction with the product.


Usability Testing
1. Get representative users
5-10 participants
2. Define criteria for evaluation
Time to complete a task.
Time to complete a task after a specified time
away from the product.
Number and type of errors per task.
Number of errors per unit of time.
Number of navigations to online help or
manuals.
Number of users making a particular error.
Number of users completing
a task successfully.

Usability Testing
3. Develop test scenario: setup + context + task
Choose relevant scenarios (typical vs extreme)
Keep task duration shorter than 30 minutes
Ensure identical conditions for all participants
4. Consider ethical issues
De-brief participants, get consent, etc.
5. Run pilot tests & refine design
Practice with staff and observers
6. Actual testing
Instruction of participants
Carry out test and record data

Usability Testing
7. Analysis
Statistics, e.g., mouse events, menu selection
Screen design: gaze tracking and course of task
completion
Post task video confrontation and user interview
8. Report results and make recommendations
for improvement.
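
The evaluation criteria listed in step 2 can be computed directly from logged test sessions. The following is a minimal sketch, assuming each session is recorded with the hypothetical fields shown below; the names `Session` and `summarize` are illustrative and do not come from any tool mentioned in these slides.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One participant's attempt at one task (hypothetical record format)."""
    participant: str
    seconds: float     # time to complete the task
    errors: int        # number of errors made during the task
    help_visits: int   # navigations to online help or manuals
    completed: bool    # whether the task was completed successfully


def summarize(sessions):
    """Compute several of the step 2 criteria for a list of sessions."""
    n = len(sessions)
    total_errors = sum(s.errors for s in sessions)
    total_minutes = sum(s.seconds for s in sessions) / 60
    return {
        "mean_time_s": mean(s.seconds for s in sessions),
        "errors_per_task": total_errors / n,
        "errors_per_minute": total_errors / total_minutes,
        "help_visits": sum(s.help_visits for s in sessions),
        "completion_rate": sum(s.completed for s in sessions) / n,
    }
```

With only 5-10 participants these figures describe the sessions observed; they are not statistically validated results.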


Usability lab with observers watching a user & assistant

Usability testing & research

Usability testing
Improve products
Few participants
Results inform design
Usually not completely replicable
Conditions controlled as much as possible
Procedure planned
Results reported to developers

Experiments for research
Discover knowledge
Many participants
Results validated statistically
Must be replicable
Strongly controlled conditions
Experimental design
Scientific report to scientific community

Portable equipment for use in the field

Examples of some of the tests used in the iPad
evaluation (adapted from Budiu and Nielsen, 2010)

App or Website: Task
iBook: Download a free copy of Alice's Adventures in Wonderland and read through the first few pages.
eBay: You want to buy a new iPad on eBay. Find one that you could buy from a reputable seller.
Time Magazine: Browse through the magazine and find the best pictures of the week.
Kayak: You are planning a trip to Death Valley in May this year. Find a hotel located in the park or close to the park.

Experiments
Predict the relationship between two or
more variables.
Independent variable is manipulated by
the researcher.
Dependent variable depends on the
independent variable.
Typical experimental designs have one or
two independent variables.
Validated statistically & replicable.

Experimental designs
Different participants - single group of participants is allocated randomly
to the experimental conditions.
Same participants - all participants appear in both conditions.
Matched participants - participants are matched in pairs, e.g., based on
expertise, gender, etc.
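
Random allocation in a different-participants design can be sketched as follows. This is an illustrative sketch only: the function name `allocate`, the participant labels and the `seed` parameter are assumptions made for the example, not part of the slides.

```python
import random


def allocate(participants, conditions, seed=None):
    """Randomly allocate each participant to exactly one condition.

    Participants are shuffled, then dealt out round-robin so that the
    group sizes differ by at most one.
    """
    rng = random.Random(seed)  # a fixed seed makes the allocation repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, person in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(person)
    return groups


# Hypothetical usage: ten participants, two conditions.
groups = allocate([f"P{i}" for i in range(1, 11)], ["A", "B"], seed=1)
```

Because each person appears in only one group, there are no order effects, but individual differences between the groups remain.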

Different, same, matched participant design

Design: Different
Advantages: No order effects
Disadvantages: Many subjects & individual differences a problem

Design: Same
Advantages: Few individuals, no individual differences
Disadvantages: Counter-balancing needed because of ordering effects

Design: Matched
Advantages: Same as different participants but individual differences reduced
Disadvantages: Cannot be sure of perfect matching on all differences
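
Counter-balancing for a same-participants design can be sketched as below. This is a minimal example under stated assumptions: it uses full counter-balancing (every possible order of conditions), which is only practical for a small number of conditions, and the name `counterbalance` is hypothetical.

```python
from itertools import permutations


def counterbalance(participants, conditions):
    """Assign each participant an order in which to experience the conditions.

    The possible orders are cycled through so that each order is used
    about equally often, spreading ordering effects across conditions.
    """
    orders = list(permutations(conditions))
    return {
        person: orders[index % len(orders)]
        for index, person in enumerate(participants)
    }


# Hypothetical usage: four participants, two conditions.
plan = counterbalance(["P1", "P2", "P3", "P4"], ["A", "B"])
```

With two conditions, half the participants start with A and half with B.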

Field studies
Field studies are done in natural settings.
"In the wild" is a term for prototypes being used freely in natural settings.
Aim to understand what users do naturally and how technology impacts them.
Field studies are used in product design to:
- identify opportunities for new technology;
- determine design requirements;
- decide how best to introduce new technology;
- evaluate technology in use.

UbiFit Garden: An in the wild study

Analytical evaluation
Describe the key concepts associated with inspection methods.
Explain how to do heuristic evaluation and walkthroughs.
Explain the role of analytics in evaluation.
Describe how to perform two types of predictive methods, GOMS and Fitts' Law.

Inspections
Several kinds.
Experts use their knowledge of users & technology to review software usability.
Expert critiques (crits) can be formal or informal reports.
Heuristic evaluation is a review guided by a set of heuristics.
Walkthroughs involve stepping through a pre-planned scenario noting potential
problems.

Heuristic evaluation
Developed by Jakob Nielsen in the early 1990s.
Based on heuristics distilled from an empirical analysis of 249 usability
problems.
These heuristics have been revised for current technology.
Heuristics being developed for mobile devices, wearables, virtual worlds, etc.
Design guidelines form a basis for developing heuristics.

Nielsen's original heuristics
1. Visibility of system status
2. Match between system and the real world
Speak the user's language, follow real-world conventions,
make information appear in a natural and logical order
3. User freedom and control
Provide a clearly marked emergency exit to leave an
unwanted state (undo and redo)
4. Consistency and standards
Users should not have to wonder whether different words,
situations, or actions mean the same thing.
5. Error prevention

Nielsen's original heuristics
6. Recognition rather than recall
7. Flexibility and efficiency of use
Cater to both inexperienced and experienced users; allow users to
tailor frequent actions
8. Aesthetic and minimalist design
Provide no irrelevant or rarely needed info
9. Help users recognize, diagnose and recover from errors
Error messages in plain language (no codes), precisely
indicate the problem, suggest a solution
10. Help and documentation
Provide help and documentation that is easy to search, focuses on the
user's task, lists concrete steps to be carried out, and is not too large

Discount evaluation
Heuristic evaluation is referred to as discount evaluation when 5
evaluators are used.
Empirical evidence suggests that on average 5 evaluators identify
75-80% of usability problems.
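
One simple way to reason about why a handful of evaluators is enough is an independence model. This model is an assumption added here for illustration and is not stated in the slides: if each evaluator independently finds a given problem with probability `p_single`, the expected proportion found by n evaluators is 1 - (1 - p_single)^n. The default of 0.25 is an illustrative value chosen so that 5 evaluators fall in the 75-80% range; it is not a measured figure.

```python
def proportion_found(n_evaluators, p_single=0.25):
    """Expected share of problems found by n independent evaluators.

    p_single is the (assumed) chance that one evaluator finds a given problem.
    """
    return 1 - (1 - p_single) ** n_evaluators


# Diminishing returns: each extra evaluator adds less than the one before.
for n in range(1, 8):
    print(n, round(proportion_found(n), 2))
```

Under these assumptions, 5 evaluators find roughly 76% of problems, and adding more evaluators yields progressively smaller gains.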

No. of evaluators & problems

3 stages for doing heuristic evaluation
Briefing session to tell experts what to do.
Evaluation period of 1-2 hours in which:
- Each expert works separately;
- Take one pass to get a feel for the product;
- Take a second pass to focus on specific features.
Debriefing session in which experts work together to prioritize problems.

Advantages and problems
Few ethical & practical issues to consider because users not involved.
Can be difficult & expensive to find experts.
Best experts have knowledge of application domain & users.
Biggest problems:
- Important problems may get missed;
- Many trivial problems are often identified;
- Experts have biases.

Heuristics for websites focus on key criteria (Budd, 2007)
Clarity
Minimize unnecessary complexity & cognitive load
Provide users with context
Promote positive & pleasurable user experience

Walkthroughs
Walkthroughs are an alternative to heuristic
evaluation for predicting users' problems without
doing user testing.
Involve walking through a task with the product
and noting problematic usability features. Most
walkthrough methods do not involve users.

Cognitive walkthroughs
Focus on ease of learning.
Designer presents an aspect of the design & usage scenarios.
Expert is told the assumptions about user population, context of use,
task details.
One or more experts walk through the design prototype with the scenario.
Experts are guided by 3 questions.

The 3 questions
Will the correct action be sufficiently evident to the user?
Will the user notice that the correct action is available?
Will the user associate and interpret the response from the action correctly?
As the experts work through the scenario they note problems.

Pluralistic walkthrough
Variation on the cognitive walkthrough theme.
Performed by a carefully managed team.
The panel of experts begins by working separately.
Then there is managed discussion that leads to agreed decisions.
The approach lends itself well to participatory design.

Analytics
A method for evaluating user traffic through a system or part of a system
Many examples including Google Analytics, Visistat (shown below)
Times of day & visitor IP addresses
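
The kind of summary an analytics tool produces (visits by time of day, distinct visitor IP addresses) can be sketched in a few lines. This is an illustrative sketch only: the record format (an ISO timestamp paired with an IP address) and the function name `traffic_summary` are assumptions, and the code does not use or imitate the API of Google Analytics or Visistat.

```python
from collections import Counter
from datetime import datetime


def traffic_summary(records):
    """Summarize visits from (iso_timestamp, ip_address) pairs.

    Returns a dict of visit counts keyed by hour of day, and the number
    of distinct visitor IP addresses seen.
    """
    visits_by_hour = Counter()
    visitors = set()
    for stamp, ip_address in records:
        visits_by_hour[datetime.fromisoformat(stamp).hour] += 1
        visitors.add(ip_address)
    return dict(visits_by_hour), len(visitors)


# Hypothetical log entries.
log = [
    ("2011-03-01T09:15:00", "10.0.0.1"),
    ("2011-03-01T09:40:00", "10.0.0.2"),
    ("2011-03-01T14:05:00", "10.0.0.1"),
]
by_hour, unique_visitors = traffic_summary(log)
```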

Social action analysis (Perer & Shneiderman, 2008)

Predictive models
Provide a way of evaluating products or designs without directly
involving users.
Less expensive than user testing.
Usefulness limited to systems with predictable tasks - e.g., telephone
answering systems, mobiles, cell phones, etc.
Based on expert error-free behavior.

GOMS: Goals, Operators, Methods, Selection rules
Goals - what the user wants to achieve, e.g., find a website.
Operators - the cognitive processes & physical actions needed to attain
goals, e.g., decide which search engine to use.
Methods - the procedures to accomplish the goals, e.g., drag mouse over
field, type in keywords, press the go button.
Selection rules - decide which method to select when there is more than one.

GOAL: delete a word in a sentence
Method for accomplishing goal of deleting a word using menu options
Method for accomplishing goal of deleting a word using delete key
Operators to use in the above methods:
Click mouse
Drag cursor over text
Select menu
Move cursor to command
Press key
Selection rules to decide which method to use:
1. Delete text using mouse and selecting from menu if a large
amount of text is to be deleted.
2. Delete text using 'delete' key if a small number of letters are to be
deleted.
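
The two selection rules above can be written as a small function. This is a sketch under stated assumptions: the slides do not say what counts as a "large amount of text", so the threshold of 20 characters is hypothetical, and the operator sequences assigned to each method are one plausible ordering of the listed operators rather than a definitive GOMS model.

```python
# Illustrative operator sequence for the menu method (assumed ordering).
MENU_METHOD = [
    "Click mouse",
    "Drag cursor over text",
    "Select menu",
    "Move cursor to command",
    "Click mouse",
]


def delete_key_method(n_chars):
    """Position the cursor, then press the delete key once per character."""
    return ["Click mouse"] + ["Press key"] * n_chars


def select_method(n_chars, large_threshold=20):
    """Apply the selection rules: menu for large deletions, delete key otherwise."""
    if n_chars >= large_threshold:  # threshold is a hypothetical cut-off
        return "menu", list(MENU_METHOD)
    return "delete key", delete_key_method(n_chars)


# Hypothetical usage: deleting a four-letter word.
method, operators = select_method(4)
```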

Keystroke level model
GOMS has also been developed to provide a quantitative model - the
keystroke level model.
The keystroke model allows predictions to be made about how long it
takes an expert user to perform a task.

Response times for keystroke level operators (Card et al., 1983)

Operator  Description                                           Time (sec)
K         Pressing a single key or button
            Average skilled typist (55 wpm)                     0.22
            Average non-skilled typist (40 wpm)                 0.28
            Pressing shift or control key                       0.08
            Typist unfamiliar with the keyboard                 1.20
P         Pointing with a mouse or other device on a            0.40
          display to select an object. This value is
          derived from Fitts' Law which is discussed below.
P1        Clicking the mouse or similar device                  0.20
H         Bring 'home' hands on the keyboard or other device    0.40
M         Mentally prepare/respond                              1.35
R(t)      The response time is counted only if it causes        t
          the user to wait.
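
A keystroke level prediction is the sum of the operator times for the steps in a task. The sketch below uses the operator times from the Card et al. (1983) table above; the function name `predict_time`, the tuple form used for response time, and the example operator sequence are assumptions made for illustration.

```python
def predict_time(operators, k_time=0.22):
    """Sum keystroke level operator times (in seconds) for an expert user.

    operators: a list of operator names ("K", "P", "P1", "H", "M"), with
    system response time given as a tuple ("R", t) where t is the wait in
    seconds. k_time defaults to the average skilled typist value.
    """
    times = {"K": k_time, "P": 0.40, "P1": 0.20, "H": 0.40, "M": 1.35}
    total = 0.0
    for op in operators:
        if isinstance(op, tuple):
            name, wait = op
            if name != "R":
                raise ValueError(f"unknown operator: {name}")
            total += wait  # counted only because the user has to wait
        else:
            total += times[op]
    return total


# Hypothetical sequence: think, move hand to mouse, point, click, press a key.
estimate = predict_time(["M", "H", "P", "P1", "K"])
print(round(estimate, 2))  # 2.57
```

The estimate assumes expert, error-free performance, so it is a lower bound for novice users.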

Using KLM to calculate time to change gaze (Holleis et al., 2007)

Fitts' Law (Fitts, 1954)

Fitts' Law predicts that the time to point at an object using a device is
a function of the distance from the target object & the object's size.
The further away & the smaller the object, the longer the time to locate
it & point to it.
Fitts' Law is useful for evaluating systems for which the time to locate
an object is important, e.g., a cell phone or a handheld device.
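
The relationship can be made concrete with one widely used form of Fitts' Law, T = a + b log2(D/S + 1), where D is the distance to the target and S is its size. The slides do not give a formula, so this form is supplied here as an assumption; the constants `a` and `b` are device-specific values normally fitted from measurements, and the defaults below are placeholders.

```python
import math


def fitts_time(distance, size, a=0.0, b=0.1):
    """Predicted pointing time in seconds for a target of a given size.

    distance and size must be in the same units; a and b are empirically
    determined constants (the defaults here are illustrative only).
    """
    if distance < 0 or size <= 0:
        raise ValueError("distance must be >= 0 and size must be > 0")
    return a + b * math.log2(distance / size + 1)


# Hypothetical comparison: the same distance, small vs. large target.
small_target = fitts_time(distance=70, size=10)
large_target = fitts_time(distance=70, size=35)
```

The larger target gives a shorter predicted time, which is why frequently used controls are made big and placed close to where the pointer usually is.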

The language of evaluation

Analytics
Analytical evaluation
Controlled experiment
Expert review or crit
Field study
Formative evaluation
Heuristic evaluation
In the wild evaluation
Living laboratory
Predictive evaluation
Summative evaluation
Usability laboratory
User studies
Usability testing
Users or participants

Key points
Evaluation & design are closely integrated in user-centered
design.
Some of the same techniques are used in evaluation as for
establishing requirements but they are used differently
(e.g. observation, interviews & questionnaires).
Three types of evaluation: laboratory based with users, in
the field with users, studies that do not involve users.
The main methods are: observing, asking users, asking
experts, user testing, inspection, modeling users' task
performance, and analytics.
Dealing with constraints is an important skill for evaluators
to develop.


Usability testing is done in controlled conditions.
Usability testing is an adapted form of experimentation.
Experiments aim to test hypotheses by manipulating certain
variables while keeping others constant.
The experimenter controls the independent variable(s) but not
the dependent variable(s).
There are three types of experimental design: different participants,
same participants, & matched participants.
Field studies are done in natural environments.
"In the wild" is a recent term for studies in which a prototype
is freely used in a natural setting.
Typically observation and interviews are used to collect field
studies data.
Data is usually presented as anecdotes, excerpts, critical
incidents, patterns and narratives.

Key points

Inspections can be used to evaluate requirements, mockups, functional
prototypes, or systems.
User testing & heuristic evaluation may reveal different usability problems.
Walkthroughs are focused so are suitable for evaluating small parts of a
product.
Analytics involves collecting data about users' activity on a website or
product.
The GOMS and KLM models and Fitts' Law can be used to predict expert,
error-free performance for certain kinds of tasks.
