Gamers as usability evaluators: a study in the domain of

Virtual Worlds
Thiago S. Barcelos
Instituto Federal de São Paulo
São Paulo, Brazil
Roberto Muñoz
Universidad de Valparaíso
Valparaíso, Chile
Virgínia Chalegre
Recife, Brazil

Virtual Worlds (VW) are interactive systems which aim to
provide an immersive environment where users can interact
with each other and with virtual objects. Hence, a set of
quality criteria to be used for the usability evaluation of
Virtual Worlds must consider these particular
characteristics. Moreover, considering the complexity of
the interactive environment of Virtual Worlds and its
resemblance to digital games, the previous experience of
evaluators may influence the evaluation outcome. In this
paper we present an experimental evaluation which
compares the results of a heuristic evaluation of a VW
environment performed by usability experts and skilled
digital game users (gamers) with initial training in the
evaluation technique. The results show that while usability
experts tend to focus on issues related to the Virtual World
configuration interface, gamers identify more problems
related to the interaction with the Virtual World itself.
Author Keywords
Virtual worlds, game users, heuristic evaluation
ACM Classification Keywords
H5.2. User interfaces – Evaluation/Methodology
General Terms
Human Factors; Experimentation
The definition and validation of quality criteria for
particular types of interactive systems is a research
challenge. The specificities of domains such as digital
games, Virtual Worlds and mobile applications call for
novel and amplified definitions of usability. With a
definition of usability “tailored” for a specific type of
interactive system, it may be possible to apply usability
inspection techniques that produce more accurate results
than those obtained with the use of classic usability
definitions. Heuristic evaluation is the most applied of
those inspection techniques [9] and is essentially based on
the identification of usability issues associated with a list of
quality criteria.
The experience of an evaluator on a particular domain may
be crucial to the effectiveness of the evaluation [16].
Nevertheless, it is not easy to find a professional with such
specific skills. In addition, the increasing complexity
of new application domains may become a challenge for
usability inspection professionals, who need to understand
and cope with the context and constraints of a specific
application domain [4]. A possible strategy would be to
incorporate domain specialists in the evaluation team [14],
as there is some recent evidence [7] that they may perform
as well as usability experts in finding issues in an interface.
Even if those domain specialists are not experienced at
performing heuristic evaluations, there is evidence that
their performance can be improved by presenting the
heuristic list as a systematic list of patterns [13].
Digital games and VWs are two types of interactive
systems that share some common attributes related to the
mechanisms of interaction. Both types of system present a
synthetic world that can be manipulated by the user
according to a set of defined rules [1]. Nowadays these
types of interactive systems are used on a daily basis by a
large number of people, facilitating access to
experienced users who have extensive tacit knowledge
about the interactive environment.
Thus, in this paper we aim to investigate the usability
evaluation of VWs based on a set of specific heuristic items
and the influence of the evaluator profile on the evaluation
results. The research goal is to compare the findings of a
heuristic evaluation done by people who frequently engage
in digital gaming activities to the results generated by
expert evaluators. The type of identified issues and the
associations made with the heuristic items will be used as
comparison parameters.
Different types of interactive systems may emphasize
different aspects of user interaction. These
differences are leading to the definition of diverse and
specific usability criteria which go beyond the 10-rule list
by Nielsen [16]. For instance, usability heuristics for
mobile applications [3], digital games [2] and virtual
worlds [15] have been precisely defined. Comparative
evaluations [2,3,15] between these heuristic sets and standard
usability definitions have shown that the former have a
greater potential to support the identification of issues in a
usability inspection. (Roberto Muñoz is affiliated with the
Escuela de Ingeniería Civil en Informática, Facultad de
Ingeniería, Universidad de Valparaíso.)
However, a list of heuristics suited to a specific application
domain may not be sufficient to improve evaluation
validity or thoroughness. The inspection results can be
affected by differences between evaluators, and one factor
that has often been discussed is evaluator experience. The
early works on heuristic evaluation by Nielsen [16] indicate
that the best results can be obtained when the evaluators are
“double-specialists” (i.e., when evaluators are experienced
in applying the inspection technique and also have adequate
knowledge about the application domain). Hwang and
Salvendy [10] analyze the results of 18 user-based tests and
heuristic evaluation experiments conducted by different
authors, and conclude that evaluator experience is a
significant factor related to the individual problem
detection rate. On the other hand, domain knowledge
appears as a limiting factor, even to experienced evaluators.
Chilana et al. [4] conclude that lack of domain knowledge
may affect the evaluation results, based on interviews with
usability practitioners.
The influence of domain knowledge on usability evaluation
has also been experimentally explored. Følstad et al. [7]
compared the performance of usability experts and domain
experts in the evaluation of four web-based systems. The
evaluation results were compared to those obtained with a
usability test. The authors conclude that the issues
identified by domain experts were as valid as those
identified by usability experts. However, the domain experts
identified a smaller number of issues that were also
confirmed as real problems in the usability test. In a previous
similar study [8], the issues identified by domain experts,
though less numerous, were concentrated among the more
severe ones.
In the digital games domain, definitions of usability by
several authors (e.g. [5,6,2]) include aspects specifically
related to the activity of playing the game, such as the game
story, challenges, feedback and fairness. They have often
been classified as playability aspects in an attempt to
differentiate the interaction with the game itself from the
interaction with the game configuration interface. Using
playability heuristics for digital game evaluation seems to
help evaluators identify more specific issues [5,11].
Moreover, Febretti and Garzotto [6], based on user testing,
argue that playability issues have a higher impact on user
engagement with the game.

Some relevant similarities between VWs and digital games
can be identified. According to Aldrich [1], all games take
place in a VW with its own set of rules and feedback
mechanisms. The main difference is in the interaction goal
of each environment; in a game, the user has to overcome
some sort of challenge, while in VW environments the
main goal is to allow the user to navigate, build artifacts
and communicate with other users in an adequate way. The
digital game user must also be able to “manipulate” the
VW, but with a specific goal in mind.
Three expert evaluators were invited to participate in the
experiment as a control group. All expert evaluators are
Human-Computer Interaction practitioners; their average
experience with usability evaluation is 2 years. The other
group was formed by five undergraduate Computer Science
students from two different universities in Chile and Brazil.
The students were selected based on a previous
questionnaire to identify their habits as digital game users.
The students (whom we will refer to as gamers from now
on) have played digital games for an average of 7.4 years
(min: 5 years; max: 11 years). They are also frequent
players: four of them claim to spend between 10 and 42
hours a week playing digital games. One of the gamers
claims that he plays games only on weekends; the
average weekly time spent on games is 24.0 hours
considering all gamers, and 29.5 hours excluding the
participant who claims to play only on weekends. All gamers have
participated in at least one heuristic evaluation during the
last year.
Both groups were asked to evaluate the interface and
mechanics of Second Life (SL), which is one of the most
popular VWs in the 25+ age group, with approximately 31
million registered users [12]. It is a three-dimensional
virtual community created entirely by its users. Each
participant, expert or gamer, was given a form with a set of
sixteen evaluation heuristics for VWs [15], along with a
checklist that defines some suggested questions related to
each heuristic item in order to facilitate and guide the
evaluation process. Each heuristic item is presented
together with a short descriptive sentence; due to space
limitations, only the name of each heuristic item is
displayed in Table 1.
The participants were instructed to explore the VW for one
hour and write down the identified usability issues in the
form; each issue had to be associated
with one or more of the heuristic items. Each evaluator
worked alone and did not have access to the
evaluations of any other evaluator.
(H1) Feedback
(H2) Clarity
(H3) Consistency
(H4) Simplicity
(H5) Orientation and navigation
(H6) Camera control and visualization
(H7) Low memory load
(H8) Avatar’s customization
(H9) Flexibility and efficiency of use
(H10) Communication between avatars
(H11) Sense of ownership
(H12) Interaction with the Virtual World
(H13) Support to learning
(H14) Error prevention
(H15) Helps users to recover from errors
(H16) Help and documentation
Table 1: Heuristics for VW usability evaluation.
Given the small number of participants and the different
levels of expertise between the two groups, we did not
expect a great number of issues would be identified in the
same way by all evaluators. Besides that, previous research
[10] indicates that more evaluators may be needed to reach a
consensus. Hence, we chose to use the number of issues
associated with a heuristic item as the main metric for
comparing results generated by both groups.

The expert evaluators identified a total of 41 issues and
made 55 associations of issues with heuristic items, which
gives an average of 18.3 associations per evaluator in this
group. The gamers identified a total of 35 issues and made
40 associations, which gives an average of 8 associations
per evaluator. The expert evaluators identified more
issues than the gamers; this is reasonable to expect,
given the greater usability evaluation experience of the
participants in the first group. This characteristic has also
been identified in previous similar studies [7,8].
Based on previous evidence [10] indicating that expert
evaluators may identify usability issues with a higher
probability than novice evaluators, one could expect that
the issues identified by the gamers would essentially be a
subset of the issues identified by the expert evaluators.
However, an analysis of the heuristic items most
frequently associated with issues rejects this hypothesis. In
Table 2 we present the total number of associations made
by each group.
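The bookkeeping behind this metric is straightforward; as a concrete illustration, the per-group totals, per-evaluator averages and most cited heuristic items reported below can be recomputed from the association counts with a short Python sketch (the counts are those of Table 2; the function name is ours):

```python
# Number of issue-heuristic associations per heuristic item (H1..H16),
# as tallied from the evaluation forms of each group.
experts = [6, 7, 13, 3, 2, 2, 1, 2, 2, 0, 1, 2, 3, 2, 1, 8]
gamers  = [4, 3, 5, 2, 1, 1, 0, 0, 4, 1, 8, 3, 1, 3, 1, 3]

def summarize(counts, n_evaluators):
    """Return total associations, mean associations per evaluator,
    and the four most cited heuristic items (1-based indices)."""
    total = sum(counts)
    mean = total / n_evaluators
    top4 = sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)[:4]
    return total, mean, sorted(h + 1 for h in top4)

print(summarize(experts, 3))  # 55 associations, 18.3 per evaluator, H1/H2/H3/H16
print(summarize(gamers, 5))   # 40 associations, 8.0 per evaluator, H1/H3/H9/H11
```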
The most cited heuristic items in the expert evaluators
group were H1 (Feedback); H2 (Clarity); H3 (Consistency)
and H16 (Help and documentation). On the other hand, the
most cited heuristic items in the gamers group were H1
(Feedback); H3 (Consistency); H9 (Flexibility and
efficiency of use) and H11 (Sense of ownership).
We separated the issues into two categories, considering
the description made by the evaluators. The definition of
categories was based on the differences between usability
and playability criteria discussed by [6,11,2] and the
similarities between games and VWs [1]:
• VW interaction issues: all issues related to navigation
inside the world, avatar control and physical rules.
This is an approximation to the concept of playability
applicable to digital games;
• VW configuration issues: all issues related to the VW
configuration interface, help system and used
terminology. This is an approximation to the classic
usability concept as it can be identified in these
interface aspects.
The issues identified by the expert evaluators were more
concentrated in the VW configuration category, while
issues identified by gamers were more concentrated in the
VW interaction category (Table 3).
The association of issues to heuristics allows us to identify
some similarities between the issues identified by both
groups. Heuristic H1 states that the Virtual World must
provide the user easily noticeable feedback about any
actions that he or she begins. Evaluators in both groups
identified that actions related to the avatar control
sometimes have unpredictable results (such as random
movements). They also noticed that the feedback to actions
related to avatar customization occurs with a high delay;
some actions related to object manipulation inside the VW
are shown as feasible actions, but do not generate any
noticeable result.
Heuristic H3 states that actions that produce similar results
should be performed by the user in a similar way.
Evaluators identified as an issue the fact that some parts of
the interface use their native language, while others are
presented in English. This issue was only identified by the
Spanish-speaking participants in both groups. One possible
explanation to this fact is that the Second Life interface
allows the user to choose the interface language but not all
items inside the VW are translated. The Portuguese-
speaking participants had some English proficiency and
chose not to change the default interface language.
Issues identified only by the expert evaluators are mostly
related to the VW Configuration issues, defined in the
previous section. Heuristic H2 emphasizes that the VW
control panel must be visually organized and use a clear
language. Some issues associated with this heuristic were
described: the configuration interface uses colors with low
contrast, making it difficult to read, and choosing between
options may be confusing due to the terminology being
used. The issue related to heuristic H16 is the lack of a
proper help system.
      Expert evaluators    Gamers
H1    6*    10.9%    4*    10.0%
H2    7*    12.7%    3     7.5%
H3    13*   23.6%    5*    12.5%
H4    3     5.5%     2     5.0%
H5    2     3.6%     1     2.5%
H6    2     3.6%     1     2.5%
H7    1     1.8%     0     0.0%
H8    2     3.6%     0     0.0%
H9    2     3.6%     4*    10.0%
H10   0     0.0%     1     2.5%
H11   1     1.8%     8*    20.0%
H12   2     3.6%     3     7.5%
H13   3     5.5%     1     2.5%
H14   2     3.6%     3     7.5%
H15   1     1.8%     1     2.5%
H16   8*    14.5%    3     7.5%
Table 2: Number of associations of issues with heuristic
items. The four most cited items in each group are starred.
                   Expert evaluators    Gamers
VW Interaction     18    43.9%     23    65.7%
VW Configuration   23    56.1%     12    34.3%
Total              41    100.0%    35    100.0%
Table 3: Categorization of identified issues.
On the other hand, issues identified only by gamers are
mostly related to the VW Interaction category. One aspect
related to heuristic H9 is the possibility to define
accelerators to common actions. The identified issues point
out that it may be difficult to customize keyboard
combinations to control the avatar or environment options,
such as ambient sound. However, the higher resemblance to
a game environment issue was found in the issues related to
heuristic H11. This item defines that physical rules of the
real world should be maintained in the VW and that any
modification of these rules should be clear to the user. Most
gamers identified that some behaviors are incompatible
with the real world – for instance, the avatar does not swim
when it enters a river and walks on top of the water instead.
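As an informal check on the proportions in Table 3 (this is an illustrative sketch, not an analysis performed in the study), a 2x2 chi-square test of independence can be computed from the raw counts to ask whether issue category depends on evaluator group. With these small samples the statistic falls just below the 5% critical value, which is consistent with our view that the findings are preliminary:

```python
# Exploratory sketch: 2x2 chi-square test of independence on the
# Table 3 counts (issue category vs. evaluator group).
observed = {
    "experts": {"interaction": 18, "configuration": 23},
    "gamers":  {"interaction": 23, "configuration": 12},
}

rows = ["interaction", "configuration"]
cols = ["experts", "gamers"]
grand = sum(observed[g][r] for g in cols for r in rows)  # 76 issues in total

chi2 = 0.0
for r in rows:
    row_total = sum(observed[g][r] for g in cols)
    for g in cols:
        col_total = sum(observed[g][c] for c in rows)
        expected = row_total * col_total / grand
        chi2 += (observed[g][r] - expected) ** 2 / expected

print(round(chi2, 2))  # ~3.62, below the 3.84 critical value (df = 1, alpha = 0.05)
```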
The complexity of some interactive environments such as
digital games and VWs may hinder the application of
usability inspection techniques. Evaluators who have been
trained to search for issues based on traditional usability
definitions may need to adapt their work to the context of
particular user experience goals related to different
application domains.
One possible solution is to incorporate users who have
practical experience in playing games and using VWs into
the evaluation process. Based on this proposal, we
conducted an experiment to compare results of heuristic
evaluations done by expert evaluators and experienced
digital game users with some training in usability
evaluation. The preliminary results show that evaluators
with different profiles may emphasize different aspects of
the VW – expert evaluators found slightly more issues
related to classic usability aspects, such as visual layout,
terminology and help, while a higher proportion of issues
identified by gamers were related to the interaction with the
VW itself. A previous study [6] argues that digital game
users may overlook classic usability issues while playing a
game as long as the gameplay remains attractive. This
study brings some more evidence to support this hypothesis
by focusing on the same user profile in a slightly different
domain.
The results obtained so far need further investigation in
order to validate the preliminary findings. In future works,
we intend to increase the number of evaluators to allow a
more accurate statistical validation of the results. This may
also help to elucidate the low proportion of issues identified
by more than one evaluator. We also intend to study the
impact of incorporating gamers into other inspection-based
techniques well suited to the participation of non-experts –
for instance, the pluralistic walkthrough and the
participatory heuristic evaluation.
REFERENCES
1. Aldrich, C. Virtual Worlds, Simulations, and Games
for Education: A Unifying View. Innovate 5, 2009.
2. Barcelos, T.S., Carvalho, T., Schimiguel, J., and
Silveira, I.F. Análise comparativa de heurísticas para
avaliação de jogos digitais. Proc. IHC+CLIHC 2011, SBC
(2011), 187–196.
3. Bertini, E., Gabrielli, S., and Kimani, S.
Appropriating and assessing heuristics for mobile
computing. Proc. AVI ’06, ACM (2006), 119–126.
4. Chilana, P.K., Wobbrock, J.O., and Ko, A.J.
Understanding usability practices in complex domains.
Proc. CHI ’10, ACM (2010), 2337–2346.
5. Desurvire, H., Caplan, M., and Toth, J.A. Using
heuristics to evaluate the playability of games. Proc. CHI
’04, ACM (2004), 1509–1512.
6. Febretti, A. and Garzotto, F. Usability, playability,
and long-term engagement in computer games. Proc. CHI
’09, ACM (2009), 4063–4068.
7. Følstad, A., Anda, B.C.D., and Sjøberg, D.I.K. The
usability inspection performance of work-domain experts:
An empirical study. Interac. Comp. 22, 2 (2010), 75–87.
8. Følstad, A. Work-Domain Experts as Evaluators:
Usability Inspection of Domain-Specific Work-Support
Systems. Int. J. Hum.-Comp. Int. 22, 3 (2007), 217–245.
9. Hollingsed, T. and Novick, D.G. Usability inspection
methods after 15 years of research and practice. Proc.
SIGDOC ’07, ACM (2007), 249–255.
10. Hwang, W. and Salvendy, G. What makes evaluators
to find more usability problems?: a meta-analysis for
individual detection rates. Proc. HCI ’07, Springer-Verlag
(2007), 499–507.
11. Korhonen, H. The explanatory power of playability
heuristics. Proc. ACE ’11, ACM (2011), 40:1–40:8.
12. KZero Worldswide. Universe charts for Q1 2012.
13. Lanzilotti, R., Ardito, C., Costabile, M.F., and De
Angeli, A. Do patterns help novice evaluators? A
comparative study. Int. J. Hum.-Comput. Stud. 69, 1-2
(2011), 52–69.
14. Muller, M.J., Matheson, L., Page, C., and Gallup, R.
Methods & tools: participatory heuristic evaluation.
interactions 5, 5 (1998), 13–18.
15. Muñoz, R., Barcelos, T., and Chalegre, V. Defining
and Validating Virtual Worlds Usability Heuristics. Proc.
SCCC 2011, IEEE (2012), 171–178.
16. Nielsen, J. Finding usability problems through
heuristic evaluation. Proc. CHI ’92, ACM (1992), 373–380.