
This article was downloaded by: [UNSW Library]

On: 13 August 2015, At: 00:52


Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: 5 Howick Place,
London, SW1P 1WG

International Journal of Human-Computer Interaction


Publication details, including instructions for authors and subscription information:
http://www.tandfonline.com/loi/hihc20

Assessing User Satisfaction in the Era of User Experience: Comparison of the SUS, UMUX, and UMUX-LITE as a Function of Product Experience

Simone Borsci (a), Stefano Federici (b), Silvia Bacci (c), Michela Gnaldi (d) & Francesco Bartolucci (c)

(a) National Institute for Health Research, Diagnostic Evidence Cooperative, Imperial College of London, London, United Kingdom
(b) Department of Philosophy, Social & Human Sciences and Education, University of Perugia, Perugia, Italy
(c) Department of Economics, University of Perugia, Perugia, Italy
(d) Department of Political Sciences, University of Perugia, Perugia, Italy
Accepted author version posted online: 24 Jun 2015.

To cite this article: Simone Borsci, Stefano Federici, Silvia Bacci, Michela Gnaldi & Francesco Bartolucci (2015) Assessing User
Satisfaction in the Era of User Experience: Comparison of the SUS, UMUX, and UMUX-LITE as a Function of Product Experience,
International Journal of Human-Computer Interaction, 31:8, 484-495, DOI: 10.1080/10447318.2015.1064648

To link to this article: http://dx.doi.org/10.1080/10447318.2015.1064648


Intl. Journal of Human–Computer Interaction, 31: 484–495, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 1044-7318 print / 1532-7590 online
DOI: 10.1080/10447318.2015.1064648

Assessing User Satisfaction in the Era of User Experience: Comparison of the SUS, UMUX, and UMUX-LITE as a Function of Product Experience

Simone Borsci (1), Stefano Federici (2), Silvia Bacci (3), Michela Gnaldi (4), and Francesco Bartolucci (3)

(1) National Institute for Health Research, Diagnostic Evidence Cooperative, Imperial College of London, London, United Kingdom
(2) Department of Philosophy, Social & Human Sciences and Education, University of Perugia, Perugia, Italy
(3) Department of Economics, University of Perugia, Perugia, Italy
(4) Department of Political Sciences, University of Perugia, Perugia, Italy
Nowadays, practitioners extensively apply quick and reliable scales of user satisfaction as part of their user experience analyses to obtain well-founded measures of user satisfaction within time and budget constraints. However, in the human–computer interaction literature the relationship between the outcomes of standardized satisfaction scales and the amount of product usage has been only marginally explored. The few studies that have investigated this relationship have typically shown that users who have interacted more with a product have higher satisfaction. The purpose of this article was to systematically analyze the variation in outcomes of three standardized user satisfaction scales (SUS, UMUX, UMUX-LITE) when completed by users who had spent different amounts of time with a website. In two studies, the amount of interaction was manipulated to assess its effect on user satisfaction. Measurements of the three scales were strongly correlated and their outcomes were significantly affected by the amount of interaction time. Notably, the SUS acted as a unidimensional scale when administered to people who had less product experience but was bidimensional when administered to users with more experience. Previous findings of similar magnitudes for the SUS and UMUX-LITE (after adjustment) were replicated but did not show the previously reported similarities of magnitude for the SUS and the UMUX. Results strongly encourage further research to analyze the relationships of the three scales with levels of product exposure. Recommendations for practitioners and researchers in the use of the questionnaires are also provided.

Address correspondence to Simone Borsci, National Institute for Health Research, Diagnostic Evidence Cooperative, Imperial College of London, St. Mary's Hospital, QEQM Building, 10th floor, Praed Street, W2 1NY London, United Kingdom. E-mail: s.borsci@imperial.ac.uk

1. INTRODUCTION
The assessment of website user satisfaction is a fascinating topic in the field of human–computer interaction (HCI). Since the late 1980s, practitioners have applied standardized usability questionnaires to the measurement of user satisfaction, which is one of the three main components of usability (ISO, 1998). Satisfaction analysis is a strategic way to collect information—before or after the release of a product—about the user experience (UX), defined as the "person's perceptions and responses resulting from the use and/or anticipated use of a product" (ISO, 2010, p. 3).

1.1. Amount of Experience and Perceived Usability
UX is a relatively new and broad concept that includes and goes beyond traditional usability (Petrie & Bevan, 2009). As recently outlined by Lallemand, Gronier, and Koenig (2015), experts have different points of view on UX; however, there is wide agreement in the HCI community that the UX concept has a temporal component, that is, the amount of user interaction with a product affects people's overall experience. In addition, experts generally agree that the interactive experience of a user is affected by the perceived usability and aesthetics of an interface (Borsci, Kuljis, Barnett, & Pecchia, 2014; Hassenzahl, 2005; Hassenzahl & Tractinsky, 2006; Lee & Koubek, 2012; Tractinsky, 1997), and the extent to which user needs are met (Hassenzahl et al., 2015). Accordingly, to fully model the perceived experience of a user (Borsci, Kurosu, Federici, & Mele, 2013; Lindgaard & Dudek, 2003; McLellan, Muddimer, & Peres, 2012), practitioners should include a set of repeated objective and subjective measures in their evaluation protocols to enable satisfaction analysis as a "subjective sum of the interactive experience" (Lindgaard & Dudek, 2003, p. 430).

Several studies have found that the magnitude of user satisfaction is associated with a user's amount of experience with the product or system under evaluation (Lindgaard & Dudek, 2003; McLellan et al., 2012). For instance, Sauro (2011) reported that System Usability Scale (SUS; Brooke, 1996) scores differed as a function of different levels of product experience. In other words, people with long-term experience in the use of a product

tended to rate their satisfaction with higher (better) scores than users with shorter terms of experience.

In summary, researchers and practitioners assess user satisfaction by means of questionnaires, but only a few empirical studies have systematically analyzed the variation of the outcomes of satisfaction scales when filled out by users with different amounts of experience in the use of a product (Kortum & Johnson, 2013; Lindgaard & Dudek, 2003; McLellan et al., 2012; Sauro, 2011).

1.2. The System Usability Scale
Several standardized tools are available in the literature to measure satisfaction (for a review, see Borsci et al., 2013). An increasing trend favors the use of short scales due to their speed and ease of administration, either as online surveys for customers or after a usability test. One of the most popular is the System Usability Scale (SUS; Lewis, 2006; Sauro & Lewis, 2011; Zviran, Glezer, & Avni, 2006), which has been cited in more than 600 publications (Sauro, 2011) and is considered an industry standard. Its popularity among HCI experts is due to several factors, such as its desirable psychometric properties (high reliability and demonstrated validity), relatively short length (10 items), and low cost (free; Bangor, Kortum, & Miller, 2008; McLellan et al., 2012).

The 10 items of the SUS were designed to form a unidimensional measure of perceived usability (Brooke, 1996). The standard version of the questionnaire has a mix of positive- and negative-tone items, with the odd-numbered items having a positive tone and the even-numbered items having a negative tone. Respondents rate the magnitude of their agreement with each item using a 5-point scale from 1 (strongly disagree) to 5 (strongly agree). To compute the overall SUS score, (a) each item is converted to a 0–4 scale for which higher numbers indicate a greater amount of perceived usability, (b) the converted scores are summed, and (c) the sum is multiplied by 2.5. This process produces scores that can range from 0 to 100.

As Lewis (2014) recently stated, there are still lessons to be learned about the SUS, in particular about its dimensionality. Despite the SUS having been designed to be unidimensional, several researchers recently showed that the items of the SUS might load on two dimensions: usability and learnability (Bangor et al., 2008; Borsci, Federici, & Lauriola, 2009; Lewis & Sauro, 2009; Lewis, Utesch, & Maher, 2013, this issue; Sauro & Lewis, 2012). Since 2009, however, there have been reports of large-sample SUS data sets for which two-factor structures did not have the expected item-factor alignment (Items 4 and 10 with Learnable, all others with Usable), indicating a need for further research to clarify its dimensional structure and the variables that might affect it (Lewis, 2014).

In recent years, the growing availability of SUS data from a large number of studies (Bangor et al., 2008; Kortum & Bangor, 2012) has led to the production of norms for the interpretation of mean SUS scores, for example, the Curved Grading Scale (CGS; Sauro & Lewis, 2012). Using data from 446 studies and more than 5,000 individual SUS responses, Sauro and Lewis (2012) found the overall mean score of the SUS to be 68 with a standard deviation of 12.5. The Sauro-Lewis CGS assigns grades as a function of SUS scores ranging from F (absolutely unsatisfactory) to A+ (absolutely satisfactory), as follows:

• Grade F (0–51.7)
• Grade D (51.8–62.6)
• Grade C– (62.7–64.9)
• Grade C (65.0–71.0)
• Grade C+ (71.1–72.5)
• Grade B– (72.6–74.0)
• Grade B (74.1–77.1)
• Grade B+ (77.2–78.8)
• Grade A– (78.9–80.7)
• Grade A (80.8–84.0)
• Grade A+ (84.1–100)

Although they should be interpreted with caution, the grades from the CGS provide an initial basis for determining if a mean SUS score is below average, average, or above average (Sauro & Lewis, 2012).

1.3. Ultrashort Scales: The UMUX and UMUX-LITE
Although the SUS is a quick scale, practitioners sometimes need to use reliable scales that are even shorter than the SUS to minimize time, cost, and user effort. "This need is most pressing when standardized usability measurement is one part of a larger post-study or online questionnaire" (Lewis, 2014, p. 676). As a consequence, quite recently, two new scales have been proposed as shorter proxies of the SUS: the Usability Metric for User Experience (UMUX; see Table 1), a four-item tool developed and validated by Finstad (2010, 2013), and the UMUX-LITE, composed of only the two positive-tone questions from the UMUX (Lewis et al., 2013, this issue). The scale for the UMUX items has 7 points, from 1 (strongly disagree) to 7 (strongly agree).

TABLE 1
Items of the UMUX and UMUX-LITE

Item No. – Scale            Item Content
Item 1 – UMUX               [This system's] capabilities meet my requirements.
Item 1 – UMUX-LITE
Item 2 – UMUX               Using [this system] is a frustrating experience.
Item 3 – UMUX               [This system] is easy to use.
Item 2 – UMUX-LITE
Item 4 – UMUX               I have to spend too much time correcting things with [this system].

Note. UMUX = Usability Metric for User Experience.

Some findings have shown the UMUX to be bidimensional as a function of item tone, positive versus negative (Lewis, 2013; Lewis et al., 2013), despite the intention to develop a unidimensional scale. The UMUX's statistical structure might be an artifact of the mixed positive/negative tone of the items and in practice might not matter much. In light of this, both the UMUX and its reduced version, the UMUX-LITE, are usually interpreted as unidimensional measures.

By design (using a method similar to but not exactly the same as the SUS), the overall UMUX and UMUX-LITE scores can range from 0 to 100. Their scoring procedures are as follows:

• UMUX: The odd items are scored as [score − 1] and the even items as [7 − score]. The sum of the item scores is then divided by 24 and multiplied by 100 (Finstad, 2010).
• UMUX-LITE: The two items are scored as [score − 1], and the sum of these is divided by 12 and multiplied by 100 (Lewis et al., 2013). For correspondence with SUS scores, this sum is entered into a regression equation to produce the final UMUX-LITE score. The following equation combines the initial computation plus the regression to show how to compute the recommended UMUX-LITE score from the ratings of its two items:

UMUX-LITE = .65(([Item 1 score] + [Item 2 score] − 2)(100/12)) + 22.9.   (1)

Prior research (Finstad, 2010, 2013; Lewis et al., 2013) has shown that the SUS, UMUX, and UMUX-LITE are reliable (Cronbach's alpha between .80 and .95) and correlate significantly (p < .001). In the research reported to date, UMUX scores have not only correlated with the SUS but also had a similar magnitude. However, for the UMUX-LITE, it is necessary to use the preceding formula (Equation 1) to adjust its scores to achieve correspondence with the SUS (Lewis et al., 2013). For the rest of this article, reported UMUX-LITE values are those computed using Equation 1. Thus, the literature on the UMUX and UMUX-LITE (Finstad, 2010; Lewis et al., 2013) suggests that these two new short scales can be used as surrogates for the SUS.

Currently, three studies (Kortum & Johnson, 2013; McLellan et al., 2012; Sauro, 2011) have investigated the relationship between SUS scores and amount of product experience. The results of these studies have consistently indicated that more experienced users had higher satisfaction outcomes (SUS scores). Notably, researchers have not yet studied this effect on the outcomes of quick scales such as the UMUX and UMUX-LITE. The comparative analyses of these instruments were performed mainly to validate the questionnaires, without considering the effect of different levels of experience in the use of a website (Finstad, 2010; Lewis et al., 2013).

1.4. Research Goals
The use of short scales as part of UX evaluation protocols could appreciably reduce the costs of assessment, as well as users' time and effort to complete the questionnaires. Currently, few studies have investigated the relationship among the SUS, UMUX, and UMUX-LITE, and none have analyzed their reliabilities as a function of different amounts of interaction with a product.

The primary goal of this article was to analyze the variation of SUS, UMUX, and UMUX-LITE outcomes when completed concurrently by users with different levels of experience in the use of a website. To reach this goal, we pursued three main objectives. First, we aimed to explore the variation of UMUX and UMUX-LITE outcomes when administered to users with two different levels of product experience. Second, we aimed to observe whether, at different levels of product experience, the correlations among the SUS, UMUX, and UMUX-LITE were stable, with particular interest in the generalizability of Equation 1. Finally, we checked whether the levels of respondents' product experience affected the dimensional structure of the SUS. It may be that the Learnable scale does not emerge until respondents have sufficient experience with the product they are rating. To achieve these aims, we performed two studies with the three standardized usability metrics to measure the self-reported satisfaction of end-users with different levels of experience with an e-learning web platform known as CLab (http://www.cognitivelab.it).

2. METHODOLOGY
Students enrolled in the bachelor's degree of psychology program at the University of Perugia are strongly encouraged to use the CLab target platform as an e-learning tool. Commonly, students access it at least once a week for several reasons: for instance, to look for information about courses and exam timetables, to sign in for mandatory attendance classes, to download course materials, to book a test/exam, or to post a question and discuss issues with the professor.

For each study, a sample of volunteer students was asked to assess the interface after different times of usage (based on their date of subscription to the platform) by filling out the SUS, UMUX, and UMUX-LITE questionnaires, presented in a random order. The Italian version of the SUS used in Borsci et al. (2009) was administered. In addition, translations and retranslations were made by an independent group of linguistic experts to produce Italian versions of the UMUX and UMUX-LITE.

Participants of the two studies were invited to fill out the scales 2 months (Study 1) or 6 months (Study 2) after they first accessed CLab. In these studies, participants received the same instruction before the presentation of the questionnaires: "Please, rate your satisfaction in the use of the platform on the basis of your current experience of use" [Per favore, in base alla tua attuale esperienza d'uso, valuta la tua soddisfazione nell'utilizzo della piattaforma].
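The scoring rules for the two ultrashort questionnaires described in Section 1.3, including the Equation 1 adjustment for the UMUX-LITE, can be sketched in Python. This is our own illustration, not code from the article, and the function names are hypothetical:

```python
# Illustrative sketch (not from the article): UMUX scoring per Finstad (2010)
# and UMUX-LITE scoring with the Equation 1 adjustment (Lewis et al., 2013).

def umux_score(ratings):
    """ratings: four responses on a 1-7 scale, in item order 1..4."""
    if len(ratings) != 4 or not all(1 <= r <= 7 for r in ratings):
        raise ValueError("UMUX needs four responses on a 1-7 scale")
    # odd items (positive tone): score - 1; even items (negative tone): 7 - score
    total = sum((r - 1) if i % 2 == 1 else (7 - r)
                for i, r in enumerate(ratings, start=1))
    return total / 24 * 100  # 0-100 range

def umux_lite_score(item1, item2):
    """UMUX Items 1 and 3 (1-7 scale); returns the Equation 1 adjusted score."""
    raw = (item1 + item2 - 2) * 100 / 12
    return 0.65 * raw + 22.9
```

For example, a respondent answering 7 to both UMUX-LITE items obtains 0.65 × 100 + 22.9 = 87.9, which is why adjusted UMUX-LITE scores cannot reach the extremes of the 0–100 range.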

The two studies were organized to measure different times of participants' exposure to CLab, thus capturing different moments of UX acquisition, as follows:

• Study 1, carried out 2 months after the students first accessed CLab. The participants' number of access times and interaction with the platform (time exposure) ranged from eight (once a week) to 56 (once a day).
• Study 2, carried out 6 months after the students first accessed CLab. The participants' number of access times and interaction with the platform (time exposure) ranged from 24 (once a week) to 168 (once a day).

The two studies were reviewed and approved by the Institutional Review Board of the Department of Philosophy, Social and Human Sciences and Education, University of Perugia. All participants provided their written informed consent to participate in this study. No minors/children were enrolled in this study. The study presented no potential risks.

2.1. Hypotheses
Concerning the effect of usage levels on the SUS' dimensional structure, we expected the following:

H1: The SUS dimensionality would be affected by the level of experience acquired by the participants with the product before the administration of the scale.

The second hypothesis concerns the correlations among the tools (SUS, UMUX, UMUX-LITE). Recently (Finstad, 2010; Lewis et al., 2013), researchers have reported strong correlations among the SUS, UMUX, and UMUX-LITE. There are as yet, however, no data on the extent to which the correlations might be affected by the users' levels of acquired product experience. Thus, we expected the following:

H2: Significant correlations among the overall scores of the SUS, UMUX, and UMUX-LITE for all studies, independent of the administration conditions, that is, different amounts of interaction time with the target website.

Finally, the third hypothesis concerns the relationship between scale outcomes and users' levels of product experience. As previously noted, user satisfaction may vary depending on both the level of product experience and the time of exposure to a product (Lindgaard & Dudek, 2003; McLellan et al., 2012; Sauro, 2011). In particular, experts tend to provide higher satisfaction scores compared to novices (Sauro, 2011). In light of this, we expected the following:

H3: User satisfaction measured through the SUS, UMUX, and UMUX-LITE would be affected by the different conditions of time exposure to the target website (2 or 6 months), as well as by different levels of website frequency of use.

Therefore, we expected students in Study 2 (cumulative UX condition) to rate the CLab with all three scales as more satisfactory compared to users in the first study, due to their greater product exposure (6 months). Concurrently, we expected participants with greater frequency of product use to rate the CLab with all the scales as more satisfactory than participants with lower levels of use.

2.2. Data Analysis
For each study, principal components analyses were performed to assess the SUS' dimensionality—focusing on whether the item alignment of the resulting two-component structure was consistent with the emergence of Learnable and Usable components. Only if this expected pattern did not emerge did we plan to follow up with a multidimensional latent class item response theory (LC IRT) model (Bacci, Bartolucci, & Gnaldi, 2014; Bartolucci, 2007; Bartolucci, Bacci, & Gnaldi, 2014) to more deeply test the dimensional structure of the scale. The primary purpose of this additional analysis would be to confirm whether a unidimensional structure was a better fit to the data than the expected bidimensional structure—an assessment that is not possible with standard principal components analysis because it is impossible to rotate a unidimensional solution (Cliff, 1987).

Descriptive statistics (mean, standard deviation) and Pearson correlation analyses among the scales were performed to compare the outcomes of the three scales and observe their relationships. Moreover, one-way analyses of variance (ANOVAs) were carried out to assess the effect of experience on user satisfaction as measured by the SUS, UMUX, and UMUX-LITE for each study, and a final comprehensive ANOVA was conducted to enable comparison of results between the two studies. The MultiLCIRT package for R by Bartolucci et al. (2014) was used to estimate the multidimensional LC IRT models. All other analyses were performed using IBM SPSS® 22.

3. THE STUDIES

3.1. Study 1
Participants
One hundred eighty-six 1st-year students of psychology (31 male [17%], M age = 21.97, SD = 5.63) voluntarily participated in the study 2 months after their subscription to and first use of the platform.

Procedure
All participants used an online form to fill out the questionnaires (SUS, UMUX, and UMUX-LITE) and indicated their weekly use of the platform, from 1 (once per week) to 5 (once a day). Participants were asked to rate their satisfaction in the use of CLab on the basis of their current experience of use.
488 S. BORSCI ET AL.

Results of Study 1
SUS dimensionality. As shown in Table 2, principal components analysis with Varimax rotation suggested that a unidimensional solution was appropriate for this set of SUS data. The table shows the item loadings for the one- and two-component solutions.

The expected two-component solution would have shown Items 4 and 10 aligning with one component and the other eight items aligning with the other. Instead, the apparent pattern of alignment was positive-tone (odd-numbered) items versus negative-tone (even-numbered) items, similar to the pattern observed in previous research for the UMUX. Another indicator of the inappropriateness of this two-component solution was the items that had relatively large loadings on both components (Items 2, 3, 5, and 7).

To verify the appropriateness of the one-component solution, we conducted additional analyses with a special class of statistical models known as LC IRT models (Bacci et al., 2014; Bartolucci, 2007; Bartolucci et al., 2014). This class of models extends traditional IRT models (Nering & Ostini, 2010; van der Linden & Hambleton, 1997) in two main directions. First, they allow the analysis of item responses in cases of questionnaires that measure more than one factor (also called, in the context of IRT, a latent trait, latent variable, or ability). Second, multidimensional LC IRT models assume that the population is composed of homogeneous groups of individuals sharing unobserved but common characteristics (so-called latent classes; Goodman, 1974; Lazarsfeld & Henry, 1968). LC IRT models are considered a powerful alternative to principal components analysis, especially when questionnaires consist of binary or (ordered) polytomously scored items rather than quantitative items.

Our analysis proceeded in two steps. The first was the selection of the number of latent classes (C). To compare the unidimensional and bidimensional assumptions through the class of models at issue, we first needed to detect the optimal number of latent classes, that is, the number of groups that ensures a satisfactory level of goodness of fit of the statistical model to the observed data. For this aim, we selected the number of latent classes relying on the Bayesian Information Criterion (BIC) index (Schwarz, 1978). More specifically, we estimated unidimensional LC IRT models for increasing values of C (C = 1, 2, 3, 4), keeping constant all the other elements characterizing the class of models. We took the value of C just before the first increase of BIC. We then repeated the analysis under the assumption of bidimensionality.

Table 3 shows that the minimum value of the BIC index was observed for C = 3, both in the unidimensional case and in the bidimensional case, suggesting that the sample of individuals came from a population composed of three latent classes. As smaller values of BIC are better than higher values, a comparison between the unidimensional and the bidimensional models with C = 3 found that the BIC index gave evidence of a better goodness of fit for the model with unidimensional structure (BIC = 2924.805) than for that with bidimensional structure (BIC = 2941.496; bold in Table 3).

TABLE 2
Principal Components Analysis of the System Usability Scale in Study 1 Showing One- and Two-Component Solutions

                                                                  Unidimensional    Bidimensional
Items                                                             One Component     Component 1   Component 2
9. I felt very confident using this website.                          .936             0.198         0.803
7. I would imagine that most people would learn to use this
   website very quickly.                                              .932             0.455         0.505
1. I would like to use this website frequently.                       .867             0.113         0.762
2. I found this website unnecessarily complex.                        .865             0.509         0.418
8. I found this website very cumbersome/awkward to use.               .839             0.822        −0.019
3. I thought this website was easy to use.                            .814             0.471         0.503
6. I thought there was too much inconsistency in this website.        .815             0.513         0.266
5. I found the various functions in this website were well
   integrated.                                                        .823             0.500         0.626
10. I needed to learn a lot of things before I could get going
   with this website.                                                 .753             0.789         0.055
4. I think that I would need assistance to be able to use this
   website.                                                           .637             0.517         0.360

Note. Bidimensional loadings greater than .4 in bold.

TABLE 3
Unidimensional and Bidimensional LC IRT Models for System Usability Scale Data: Number of Latent Classes (C), Estimated Maximum Log-Likelihood (ℓ), Number of Parameters (#par), Bayesian Information Criterion (BIC) Index

Unidimensional Model                      Bidimensional Model
C      ℓ           #par    BIC            C      ℓ           #par    BIC
1   −1470.447       40    3147.714        1   −1470.447       40    3147.714
2   −1405.317       24    2934.725        2   −1405.057       27    2949.717
3   −1395.186       26    2924.805        3   −1393.190       30    2941.496
4   −1394.914       28    2934.602        4   −1386.052       33    2942.729

Note. Bold indicates better goodness of fit for the model with unidimensional structure than for that with bidimensional structure.
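The BIC definition and the selection rule applied to Table 3 (take the number of classes just before the first increase of BIC) can be encoded compactly. The Python sketch below is our own illustration, not the authors' R code (they used the MultiLCIRT package):

```python
# Illustrative sketch (not the article's code): BIC = -2*loglik + #par*ln(n),
# and the rule used above: choose C just before the first increase of BIC.
import math

def bic(loglik, n_par, n_obs):
    """Bayesian Information Criterion for a fitted model."""
    return -2 * loglik + n_par * math.log(n_obs)

def select_c(bic_values):
    """bic_values: BIC for C = 1, 2, 3, ...; returns the chosen C."""
    for c in range(1, len(bic_values)):
        if bic_values[c] > bic_values[c - 1]:  # first increase of BIC
            return c                           # C just before the increase
    return len(bic_values)

# BIC values from Table 3, C = 1..4
unidimensional = [3147.714, 2934.725, 2924.805, 2934.602]
bidimensional = [3147.714, 2949.717, 2941.496, 2942.729]
```

Applying `select_c` to either column reproduces the choice of C = 3 reported for Study 1.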

The unidimensionality assumption was also verified through TABLE 4


a likelihood-ratio (LR) test as follows. Given the number of Correlations Among SUS, UMUX, and UMUX-LITE in Study
latent classes selected in the previous step, an LR test was used 1 (With 95% CIs)
to compare models that differ in terms of the dimensional struc-
Downloaded by [UNSW Library] at 00:52 13 August 2015

ture, that is, bidimensional versus unidimensional structure. UMUX CI UMUX-LITE CI


This type of statistical test allowed us to evaluate the similarity
r Lower Upper r Lower Upper
between a general model and a restricted model, that is, a model
obtained by the general one by imposing one constraint so SUS .554∗∗ .430 .679 447∗∗ .313 .581
that the restricted model is nested in the general model. More UMUX 1 — — .838∗∗ .756 .920
precisely, an LR test evaluates, at a given significance level, the
null hypothesis of equivalence between the two nested models Note. SUS = System Usability Scale; UMUX = Usability Metric
at issue: If the null hypothesis is not rejected, the restricted model is preferred, in the interests of parsimony; if the null hypothesis is rejected, the general model is preferred. In our framework, the general model represents a bidimensional structure, where Items 4 and 10 of the SUS questionnaire contribute to a different latent trait with respect to the remaining ones, whereas the restricted model is used when all items belong to the same dimension. The LR test is based on the difference between the maximum log-likelihoods of the two models, and it evaluates whether this difference is statistically significantly different from zero. Higher values of the log-likelihood difference denote that the hypothesis of unidimensionality is unlikely and should be discarded in favor of bidimensionality. In particular, the LR test statistic is given by −2 times the difference between the maximum log-likelihoods and, under certain regularity conditions, is distributed as a chi-square with the difference in the number of free parameters in the two compared models as degrees of freedom.

The LR test statistic equaled 3.9918 with 4 degrees of freedom. The resulting p value, based on the chi-square distribution, was equal to 0.4071, and therefore the null hypothesis of SUS unidimensionality cannot be rejected for Study 1. To conclude, both the BIC index and the LR test provided evidence in favor of the unidimensionality assumption.

Correlations among the tools. The overall SUS, UMUX, and UMUX-LITE scale results significantly correlated (p < .001; Table 4). There was considerable overlap in the 95% confidence intervals for the correlations between the SUS and the other scales, so it appeared that the UMUX and UMUX-LITE had similar magnitudes of association with the SUS.

Table 5 shows that, on average, participants rated the platform as satisfactory (scores higher than 70 out of 100) on all three questionnaires, that is, with SUS scores greater than or equal to C on the Sauro-Lewis CGS (Sauro & Lewis, 2012). As demonstrated by comparison of the confidence intervals, there were significant differences in the magnitudes of the three metrics, with the SUS and UMUX-LITE having a closer correspondence than the SUS and UMUX.

User satisfaction and frequency of use. Of the participants, 61.9% declared that they used the platform from more than 3 days per week to every day, whereas 38.1% stated a lower rate of usage. A one-way ANOVA showed a significant difference in user satisfaction levels between participants with a low (1–2 days per week), medium (3–4 days per week), and high (≥ 5 days per week) self-reported level of use of CLab for all the questionnaires: SUS, F(2, 184) = 4.39, p = .014; UMUX, F(2, 184) = 8.71, p = .001; and UMUX-LITE, F(2, 184) = 6.76, p = .002. In particular, least significant difference post hoc analyses showed that students who interacted with the platform from more than 3 days per week to every day (those who used the website more frequently) tended to judge the product as more satisfactory than people with less exposure (Figure 1).
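The LR test described above can be reproduced in a few lines. The sketch below uses only the Python standard library (the closed-form chi-square survival function is valid for even degrees of freedom, which covers the df = 4 case reported here); the function names are ours, not the authors'.

```python
import math

def chi2_sf_even(x, df):
    # Chi-square survival function for even df, via the closed-form series
    # sf(x) = exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def lr_test(loglik_general, loglik_restricted, df):
    # LR statistic = -2 * (restricted log-likelihood - general log-likelihood),
    # compared against a chi-square with df = difference in free parameters.
    stat = -2.0 * (loglik_restricted - loglik_general)
    return stat, chi2_sf_even(stat, df)

# Reproducing the Study 1 p value from the reported statistic (3.9918, df = 4):
print(round(chi2_sf_even(3.9918, 4), 4))  # 0.4071
```

Since .4071 is well above any conventional alpha, the restricted (unidimensional) model is retained, exactly as the text concludes.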
490 S. BORSCI ET AL.
TABLE 5
Means and Standard Deviations of Overall Scores of the SUS, UMUX, and UMUX-LITE for Study 1 With 95% CIs (and Associated CGS Grades)
Scale M SD 95% CI Lower 95% CI Upper
SUS 70.88 (C) 6.703 69.9 (C) 71.8 (C+)
UMUX 84.66 (A+) 12.838 82.8 (A) 86.5 (A+)
UMUX-LITE 73.83 (B−) 9.994 72.4 (C+) 75.3 (B)
Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
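The CGS letter grades attached to the means and interval limits in Table 5 come from the Sauro-Lewis Curved Grading Scale. A lookup can be sketched as follows; the grade boundaries are transcribed from Sauro and Lewis (2012) and should be treated as an assumption of this sketch rather than as data from the present article.

```python
# Approximate Sauro-Lewis CGS boundaries (Sauro & Lewis, 2012): each tuple is
# (lower bound of the grade band, grade). Bands are checked from the top down.
CGS = [
    (84.1, "A+"), (80.8, "A"), (78.9, "A-"), (77.2, "B+"), (74.1, "B"),
    (72.6, "B-"), (71.1, "C+"), (65.0, "C"), (62.7, "C-"), (51.7, "D"),
    (0.0, "F"),
]

def cgs_grade(score):
    # Map a 0-100 SUS-style score to its curved letter grade.
    for lower, grade in CGS:
        if score >= lower:
            return grade
    return "F"

print(cgs_grade(70.88))  # C   (Study 1 SUS mean)
print(cgs_grade(84.66))  # A+  (Study 1 UMUX mean)
```

With these boundaries the Study 1 means reproduce the grades shown in Table 5.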

FIG. 1. Interaction between scale and frequency of use for Study 1. Note. UMUX = Usability Metric for User Experience; SUS = System Usability Scale.

3.2. Study 2
Participants and Procedure
Ninety-three students of psychology (17 male [18%], M age = 22.03, SD = 1.44) voluntarily participated in the study 6 months after their subscription to and first use of CLab. Participants followed the same procedure as in Study 1.

Results of Study 2
SUS dimensionality. To check the dimensionality of the SUS after 6 months of usage, we performed a principal components analysis with Varimax rotation. Table 6 shows that the SUS, under the test conditions of Study 2, was composed of two dimensions, in line with the previous studies that reported alignment of Items 4 and 10 separate from the other items (Borsci et al., 2009; Lewis & Sauro, 2009; Lewis et al., 2013).

To further confirm the bidimensional structure of the SUS in Study 2, we performed an LC IRT analysis (Table 7). We estimated increasing values of C (C = 1, 2, 3, 4, 5) for both the unidimensional and bidimensional models, keeping constant all the other elements characterizing the class of models. We took the value just before the first increase of BIC. Table 7 shows that the minimum value of the BIC index occurs at C = 4 for the unidimensional model and at C = 5 for the bidimensional case. In Study 2, the smaller of these two BIC values belongs to the bidimensional model with C = 5. The BIC index therefore gave evidence of a better goodness of fit for the model with bidimensional structure (BIC = 1912.636) than for that with unidimensional structure (BIC = 2059.4; bold in Table 7).

Finally, the LR test statistic equaled 164.206 with 3 degrees of freedom for the bidimensional model and 148.5492 with 2 degrees of freedom for the unidimensional model. For both models (bidimensional and unidimensional) the resulting p value, based on the chi-square distribution, was equal to 0.001. Therefore, the null hypothesis of SUS unidimensionality can be rejected for Study 2. To conclude, both the BIC index and the LR test provided evidence in favor of the bidimensionality assumption.

Correlations among the tools. As in Study 1, all three scales were strongly correlated (p < .001; see Table 8). Comparison of the 95% confidence intervals indicated that the magnitudes of association among the three scales were similar between the studies (p > .05).

Table 9 shows that participants judged the platform as satisfactory—that is, with mean scores higher than 75 out of 100—for all three questionnaires. For this set of data, there was substantial overlap in the confidence intervals for the SUS and UMUX-LITE, with identical CGS grades for the SUS/UMUX-LITE means and interval limits. Consistent with the findings from Study 1, the magnitude of difference between the mean SUS and UMUX scores was significant.

User satisfaction and frequency of use. Of the participants, 51% declared that they used the platform from more than 3 days per week to every day, whereas 49% declared a lower rate of usage. A one-way ANOVA confirmed that for all three scales there was a significant difference—SUS, F(2, 91) = 18.37, p = .001; UMUX, F(2, 91) = 12.11, p = .001; and UMUX-LITE, F(2, 91) = 11.57, p = .001—among the satisfaction rates of participants (at least for students with low and high levels of exposure to the platform). Again, our results indicated a significant relationship between frequency of use and satisfaction (Figure 2).
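The one-way ANOVAs reported for Studies 1 and 2 compare mean satisfaction across the three frequency-of-use groups. A minimal sketch of the F statistic computation follows; the ratings below are hypothetical illustrations, since the raw data are not reproduced in the article.

```python
def one_way_anova(*groups):
    # Classic one-way ANOVA F statistic: between-group mean square
    # divided by within-group mean square.
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical satisfaction ratings for low / medium / high frequency users:
low, medium, high = [60, 65, 70], [70, 75, 80], [80, 85, 90]
print(round(one_way_anova(low, medium, high), 2))  # 12.0
```

The resulting F would then be referred to an F distribution with (k − 1, n − k) degrees of freedom, matching the F(2, 91) and F(2, 184) values reported in the two studies.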
USER SATISFACTION IN ERA OF USER EXPERIENCE 491

TABLE 6
Principal Components Analysis of the System Usability Scale in Study 2
Items Usability Learnability
8. I found this website very cumbersome/awkward to use. .910 .121
1. I would like to use this website frequently. .869 .105
5. I found the various functions in this website were well integrated. .800 .122
3. I thought this website was easy to use. .769 .226
7. I would imagine that most people would learn to use this website .754 .258
very quickly.
2. I found this website unnecessarily complex. .739 .273
6. I thought there was too much inconsistency in this website. .708 .183
9. I felt very confident using this website. .683 .106
10. I needed to learn a lot of things before I could get going with .133 .841
this website.
4. I think that I would need assistance to be able to use this website. .206 .753
Note. Loadings greater than .4 in bold.
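The overall SUS scores analyzed throughout rely on Brooke's (1996) standard scoring rule applied to the ten items listed in Table 6: odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the summed contributions are multiplied by 2.5 to reach the 0–100 range. A sketch:

```python
def sus_score(responses):
    # responses: the ten SUS item ratings on a 1-5 scale, in item order.
    # Odd items (1, 3, 5, 7, 9) contribute (rating - 1); even items
    # contribute (5 - rating); the total is scaled by 2.5 to 0-100.
    assert len(responses) == 10
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # i is 0-based, so even i = odd item
        for i, r in enumerate(responses)
    )
    return total * 2.5

# Best possible answers (5 on positively worded items, 1 on negative items):
print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0
```

A respondent answering 3 on every item would score 50.0, the midpoint of the scale.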
TABLE 7
Unidimensional and Bidimensional Latent Class Item Response Theory Models for System Usability Scale Data: Number of Latent Classes (C), Estimated Maximum Log-Likelihood (ℓ), Number of Parameters (#par), and Bayesian Information Criterion (BIC) Index

     Unidimensional Model                Bidimensional Model
C    ℓ           #par   BIC         C    ℓ           #par   BIC
1    −1050.62    40     2282.965    1    −1050.62    40     2282.965
2    −996.09     24     2101.22     2    −964.826    24     2038.692
3    −975.599    26     2069.323    3    −910.835    27     1944.338
4    −966.094    28     2059.4      4    −891.82     30     1919.938
5    −963.457    30     2063.212    5    −881.354    33     1912.636
6    −963.457    32     2072.299    6    −878.343    36     1920.244

Note. Bold indicates a better goodness of fit for the model with bidimensional structure than for that with unidimensional structure.
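The BIC values in Table 7 follow Schwarz's (1978) definition, BIC = −2ℓ + #par · ln(n). The sketch below reproduces two table entries; n = 94 is our inference from the Study 2 ANOVA degrees of freedom, F(2, 91), since the table values are not exactly recovered with the 93 participants stated in the text.

```python
import math

def bic(loglik, n_par, n):
    # Bayesian Information Criterion (Schwarz, 1978); lower is better.
    return -2.0 * loglik + n_par * math.log(n)

# Two Table 7 entries, with n = 94 (an assumption inferred from F(2, 91)):
print(round(bic(-966.094, 28, 94), 1))  # 2059.4 (unidimensional, C = 4)
print(round(bic(-881.354, 33, 94), 1))  # 1912.6 (bidimensional, C = 5)
```

Because the bidimensional model's minimum (1912.6) is well below the unidimensional minimum (2059.4), BIC favors bidimensionality for Study 2, as the text states.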

TABLE 8
Correlations Among the SUS, UMUX, and UMUX-LITE in Study 2 (With 95% CIs)

              UMUX                          UMUX-LITE
        r        CI Lower   CI Upper   r        CI Lower   CI Upper
SUS     .716**   .571       .861       .658**   .502       .815
UMUX    1        —          —          .879**   .717       .953

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
**p = .01 (two-tailed).
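The coefficients in Tables 4 and 8 are Pearson product-moment correlations. A sketch of the computation follows, together with one common way of building a 95% CI around r (the Fisher z transform); the article does not state which CI method it used, so the interval helper is illustrative, not a reproduction of the table's intervals.

```python
import math

def pearson_r(x, y):
    # Pearson product-moment correlation between two equal-length samples.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fisher_ci(r, n, z=1.96):
    # 95% CI around r via the Fisher z transform (one standard approach).
    zr = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(zr - z * se), math.tanh(zr + z * se)

print(round(pearson_r([1, 2, 3, 4], [1, 2, 3, 5]), 2))  # 0.98
```

For example, `fisher_ci(0.716, 94)` brackets the SUS-UMUX correlation reported in Table 8, though with this method the interval is narrower than the one printed in the table.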

4. GENERAL RESULTS
Although the questionnaire results strongly correlated independent of the administration conditions, in line with Hypothesis 3, we performed a comprehensive ANOVA to analyze the differences among the ratings as a function of duration of exposure (2 or 6 months) and, independently, as a function of different frequencies of use. This analysis indicated significant main effects and interactions for all three scales (see Table 10). Figure 3 provides a graphic depiction of the significant interactions. Least significant difference post hoc analyses revealed significant differences (p < .001) for all comparisons of the different frequencies of use (low, medium, or high).
TABLE 9
Means and Standard Deviations of Overall Scores of the SUS, UMUX, and UMUX-LITE for Study 2 With 95% CIs (and
Associated Curved Grading Scale Grades)
Scale M SD 95% CI Lower 95% CI Upper
SUS 75.24 (B) 13.037 72.6 (B) 77.0 (B+)
UMUX 87.69 (A+) 10.291 85.6 (A+) 89.8 (A+)
UMUX-LITE 76.45 (B) 9.943 74.4 (B) 78.5 (B+)
Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
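The close SUS/UMUX-LITE correspondence visible in Table 9 reflects the UMUX-LITE adjustment referred to elsewhere in the article as Equation 1. A sketch follows; the regression coefficients are quoted from Lewis et al. (2013), not from this excerpt.

```python
def umux_lite(item1, item2):
    # UMUX-LITE: the two 7-point item ratings are rescaled to 0-100, then
    # adjusted with the regression of Lewis et al. (2013): 0.65 * raw + 22.9.
    # Coefficients are an assumption quoted from that paper.
    raw = (item1 + item2 - 2) * (100.0 / 12.0)
    return 0.65 * raw + 22.9

print(round(umux_lite(7, 7), 1))  # 87.9 (ceiling of the adjusted scale)
print(round(umux_lite(1, 1), 1))  # 22.9 (floor of the adjusted scale)
```

The adjustment compresses the raw 0–100 range into roughly 22.9–87.9, which is what pulls UMUX-LITE means into line with concurrently collected SUS means.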

FIG. 2. Interaction between scale and frequency of use for Study 2. Note. UMUX = Usability Metric for User Experience; SUS = System Usability Scale.

5. DISCUSSION
Table 11 summarizes the testing outcomes for the hypotheses.

The outcomes of the studies show that the learnability dimension of the SUS, as suggested, might emerge only under certain conditions, that is, when the scale is administered to users after a long enough period of exposure to the interface. This variability of the SUS dimensionality may be due to its original development as a unidimensional scale of perceived usability (Brooke, 1996). Items 4 and 10 of the SUS compose the learnability dimension of this scale (Tables 1 and 6). Item 4 pertains to the need for support in use of the system (“I think I would need the support of a technical person to be able to use this system”), and Item 10 pertains to the perceived complexity of learning the system (“I needed to learn a lot of things before I could get going with this system”). These two items are strongly related to the ability of users to quickly understand how to use the product without help. In tune with that, the learnability dimension is probably sensitive to the level of confidence users acquire in using the product's functions and in anticipating system reactions. Our results are consistent with Dix, Finlay, Abowd, and Beale's (2003) definition of learnability as “the ease with which new users can begin effective interaction and achieve maximal performance” (p. 260). In fact, the second dimension of the SUS might emerge only when users perceive themselves as effective in the use of the product, making this an interesting topic for future research.

All the satisfaction scale results strongly correlated under each condition of scale administration, demonstrating convergent validity of the construct they purport to measure. The Italian versions of the UMUX and UMUX-LITE, like the version of the SUS previously validated by other studies (Borsci et al., 2009), confirmed the reliability coefficients of the English versions, with a Cronbach's alpha between .80 and .90 for the UMUX and between .71 and .85 for the UMUX-LITE. Moreover, independent of the condition of scale administration, Equation 1 functioned properly by bringing UMUX-LITE scores into reasonable correspondence with concurrently collected SUS scores.

Unlike previous studies (Finstad, 2010; Lewis et al., 2013), where the magnitudes of the UMUX averages were quite close to the corresponding SUS averages, in the present study the average UMUX scores were significantly higher than the SUS and UMUX-LITE means (Tables 5 and 9). For instance, for our cohort of participants with 2 months of product experience, Table 5 shows that although the interval limits around the UMUX had CGS grade ranges from A to A+, the SUS and UMUX-LITE were more aligned, with grade ranges from C to B. This difference from previous outcomes could be a peculiarity of this research. However, the differences between the UMUX and the other two scales were large enough to lead practitioners to make different decisions about CLab. For instance, relying only on the UMUX outcomes, a practitioner may report to designers that CLab is a very satisfactory interface and no further usability analyses are needed. Alternatively, on the basis of the SUS or UMUX-LITE outcomes, practitioners would likely report to designers that CLab is reasonably satisfactory, but further usability analysis and redesign could improve the overall interaction experience of end users (the UX).

Finally, as Tables 9 and 10 show, the outcomes of all three questionnaires were affected by the levels of experience of the
TABLE 10
Main Effects and Interactions for Combined Analysis of Variance
Scale Effect Outcome
SUS Main effect of duration F(1, 263) = 10.7, p = .001
Main effect of frequency of use F(2, 263) = 30.9, p < .0001
Duration × Frequency interaction F(2, 263) = 15.8, p < .0001
UMUX Main effect of duration F(1, 263) = 17.4, p < .0001
Main effect of frequency of use F(2, 263) = 22.2, p < .0001
Duration × Frequency interaction F(2, 263) = 3.4, p = .035
UMUX-LITE Main effect of duration F(1, 263) = 4.7, p = .03
Main effect of frequency of use F(2, 263) = 16.8, p < .0001
Duration × Frequency interaction F(2, 263) = 4.7, p = .01
Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience.

FIG. 3. Interaction between scale and frequency of use for Studies 1 and 2. Note. UMUX = Usability Metric for User Experience; SUS = System Usability Scale.

respondents, by both the duration of use across months and the weekly frequency of use. In tune with previous studies (Kortum & Johnson, 2013; McLellan et al., 2012; Sauro, 2011), users with a greater amount of product experience were more satisfied than users with less experience. This was also confirmed by the ANOVAs performed for Studies 1 and 2: Greater product experience was associated with a higher level of user satisfaction.

5.1. Limitations of the Study
Even though the outcomes of this study were generally in line with previous research, the representativeness of these results is limited due to the characteristics of the cohorts involved in the study. Our results concern satisfaction in the use of an e-learning web interface rated by students with similar characteristics (age, education, country, etc.). Therefore, we cannot assume that the outcomes will generalize to other kinds of interfaces. To exhaustively explore the relationship between the amount of experience gained through time and satisfaction gathered by means of short scales, future studies should include users with various individual differences and divergent characteristics (e.g., experience with the product, age, education, individual functioning, and disability) in the use of different types of websites.

Our results, obtained through the variation of the amount of exposure of users to the interface, showed that people with more exposure to a product were likely to rate the interface as more satisfactory. However, because this is correlational evidence, it is also possible that users who experience higher satisfaction during early use of a product might choose to use it more frequently, thus gaining high levels of product experience. Future longitudinal studies should investigate this relationship, measuring, through the SUS, UMUX, and UMUX-LITE, whether users who perceive a product as satisfactory tend to spend more time using it. It would also be of value to conduct a designed experiment with random assignment of participants to experience conditions to overcome the ambiguity of correlations.

6. CONCLUSIONS
Prior product experience was associated with the user satisfaction measured by the SUS and its proposed alternate questionnaires, the UMUX and UMUX-LITE. Therefore, consistent with previous research (Kortum & Johnson, 2013; McLellan et al., 2012; Sauro, 2011), to obtain an exhaustive picture of user satisfaction, researchers and practitioners should take into consideration each user's amount of exposure to the product under evaluation (duration and frequency of use).

All of the scales we analyzed were strongly correlated and can be used as quick tools to assess user satisfaction. Therefore, practitioners who plan to use one or all of these scales should carefully consider their administration for the proper management of satisfaction outcomes. Based on our results, we offer several points of advice:

• When administered to users after a short period of product use, it is safest to consider the SUS to be a
TABLE 11
Summary of Study Outcomes by Hypothesis
Hypothesis Result Meaning
Hypothesis 1 Supported SUS dimensionality was affected by the different levels of product experience. When
administered to users with less product experience, the SUS had a unidimensional
structure, whereas it had a bidimensional structure for respondents with more product
experience.
Hypothesis 2 Supported All the three scales were strongly correlated, independent of the administration
conditions.
Hypothesis 3 Supported Participants with more product experience were more satisfied than those with less
product experience regardless of whether that experience was gained over a duration of
exposure or by frequency of use.
Note. SUS = System Usability Scale.

unidimensional scale, so we recommend against partitioning it into Usable and Learnable components in that context. Moreover, practitioners should anticipate that the satisfaction scores of newer users will be significantly lower than the scores of more experienced people.
• When the SUS is administered to more experienced users, the scale appears to have bidimensional properties, making it suitable to compute both an overall SUS score and its Learnable and Usable components. The overall level of satisfaction will be higher than that among less experienced users.
• Due to their high correlation with the SUS, in particular, the UMUX and UMUX-LITE overall scores showed similar behaviors.
• If using one of the ultrashort questionnaires as a proxy for the SUS, the UMUX-LITE (with its adjustment formula) appears to provide results that are closer in magnitude to the SUS than the UMUX, making it the more desirable proxy.

The UMUX and UMUX-LITE are both reliable and valid proxies of the SUS. Nevertheless, Lewis et al. (2013) suggested using them in addition to the SUS rather than instead of the SUS for critical usability work, due to their recent development and still limited employment. In particular, on the basis of our results, we recommend that researchers avoid using only the UMUX for their analysis of user satisfaction because, at least in the current study, this scale seemed too optimistic. In the formative phase of design or in agile development, the UMUX-LITE could be adopted as a preliminary and quick tool to test users' reactions to a prototype. Then, in advanced design phases or in summative evaluation phases, we recommend using a combination of the SUS and UMUX-LITE (or UMUX) to assess user satisfaction with usability (note that because the UMUX-LITE was derived from the UMUX, when you collect the UMUX you also collect the data needed to compute the UMUX-LITE). Over time this could lead to a database of concurrently collected SUS, UMUX, and UMUX-LITE scores that would allow more detailed investigation of their relationships and psychometric properties.

ACKNOWLEDGMENTS
We thank Dr. James R. Lewis, senior human factors engineer at IBM Software Group and guest editor of this special issue, for his generous feedback during the preparation of this paper.

REFERENCES
Bacci, S., Bartolucci, F., & Gnaldi, M. (2014). A class of multidimensional latent class IRT models for ordinal polytomous item responses. Communications in Statistics – Theory and Methods, 43, 787–800. doi:10.1080/03610926.2013.827718
Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the System Usability Scale. International Journal of Human-Computer Interaction, 24, 574–594. doi:10.1080/10447310802205776
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157. doi:10.1007/s11336-005-1376-9
Bartolucci, F., Bacci, S., & Gnaldi, M. (2014). MultiLCIRT: An R package for multidimensional latent class item response models. Computational Statistics & Data Analysis, 71, 971–985. doi:10.1016/j.csda.2013.05.018
Borsci, S., Federici, S., & Lauriola, M. (2009). On the dimensionality of the System Usability Scale (SUS): A test of alternative measurement models. Cognitive Processing, 10, 193–197. doi:10.1007/s10339-009-0268-9
Borsci, S., Kuljis, J., Barnett, J., & Pecchia, L. (2014). Beyond the user preferences: Aligning the prototype design to the users' expectations. Human Factors and Ergonomics in Manufacturing & Service Industries. doi:10.1002/hfm.20611
Borsci, S., Kurosu, M., Federici, S., & Mele, M. L. (2013). Computer systems experiences of users with and without disabilities: An evaluation guide for professionals. Boca Raton, FL: CRC Press. doi:10.1201/b15619-1
Brooke, J. (1996). SUS: A “quick and dirty” usability scale. In P. W. Jordan, B. Thomas, B. A. Weerdmeester, & I. L. McClelland (Eds.), Usability evaluation in industry (pp. 189–194). London, UK: Taylor & Francis.
Cliff, N. (1987). Analyzing multivariate data. San Diego, CA: Harcourt Brace Jovanovich.
Dix, A., Finlay, J., Abowd, G. D., & Beale, R. (2003). Human–computer interaction. Harlow, UK: Pearson Education.
Finstad, K. (2010). The usability metric for user experience. Interacting with Computers, 22, 323–327. doi:10.1016/j.intcom.2010.04.004
Finstad, K. (2013). Response to commentaries on “The Usability Metric for User Experience.” Interacting with Computers, 25, 327–330. doi:10.1093/iwc/iwt005
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231. doi:10.1093/biomet/61.2.215
Hassenzahl, M. (2005). The thing and I: Understanding the relationship between user and product. In M. Blythe, K. Overbeeke, A. Monk, & P. Wright (Eds.), Funology: From usability to enjoyment (Vol. 3, pp. 31–42). Berlin, Germany: Springer. doi:10.1007/1-4020-2967-5_4
Hassenzahl, M., & Tractinsky, N. (2006). User experience—A research agenda. Behaviour & Information Technology, 25, 91–97. doi:10.1080/01449290500330331
Hassenzahl, M., Wiklund-Engblom, A., Bengs, A., Hägglund, S., & Diefenbach, S. (2015). Experience-oriented and product-oriented evaluation: Psychological need fulfillment, positive affect, and product perception. International Journal of Human-Computer Interaction, 31, 530–544.
ISO 9241-11:1998. Ergonomic requirements for office work with visual display terminals – Part 11: Guidance on usability.
ISO 9241-210:2010. Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems.
Kortum, P. T., & Bangor, A. (2012). Usability ratings for everyday products measured with the System Usability Scale. International Journal of Human–Computer Interaction, 29, 67–76. doi:10.1080/10447318.2012.681221
Kortum, P. T., & Johnson, M. (2013). The relationship between levels of user experience with a product and perceived system usability. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57, 197–201. doi:10.1177/1541931213571044
Lallemand, C., Gronier, G., & Koenig, V. (2015). User experience: A concept without consensus? Exploring practitioners' perspectives through an international survey. Computers in Human Behavior, 43, 35–48. doi:10.1016/j.chb.2014.10.048
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Lee, S., & Koubek, R. J. (2012). Users' perceptions of usability and aesthetics as criteria of pre- and post-use preferences. European Journal of Industrial Engineering, 6, 87–117. doi:10.1504/EJIE.2012.044812
Lewis, J. R. (2006). Usability testing. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 1275–1316). New York, NY: Wiley & Sons.
Lewis, J. R. (2013). Critical review of “The Usability Metric for User Experience.” Interacting with Computers, 25, 320–324. doi:10.1093/iwc/iwt013
Lewis, J. R. (2014). Usability: Lessons learned . . . and yet to be learned. International Journal of Human-Computer Interaction, 30, 663–684. doi:10.1080/10447318.2014.930311
Lewis, J. R., & Sauro, J. (2009). The factor structure of the System Usability Scale. In M. Kurosu (Ed.), Human centered design (Vol. 5619, pp. 94–103). Berlin, Germany: Springer. doi:10.1007/978-3-642-02806-9_12
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013). UMUX-LITE: When there's no time for the SUS. In Proceedings of CHI 2013 (pp. 2099–2102). Paris, France: ACM. doi:10.1145/2470654.2481287
Lindgaard, G., & Dudek, C. (2003). What is this evasive beast we call user satisfaction? Interacting with Computers, 15, 429–452. doi:10.1016/S0953-5438(02)00063-2
McLellan, S., Muddimer, A., & Peres, S. C. (2012). The effect of experience on System Usability Scale ratings. Journal of Usability Studies, 7, 56–67.
Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York, NY: Taylor & Francis.
Petrie, H., & Bevan, N. (2009). The evaluation of accessibility, usability, and user experience. In C. Stephanidis (Ed.), The universal access handbook (pp. 299–314). Boca Raton, FL: CRC Press.
Sauro, J. (2011). Does prior experience affect perceptions of usability? Retrieved from http://www.measuringusability.com/blog/prior-exposure.php
Sauro, J., & Lewis, J. R. (2011). When designing usability questionnaires, does it hurt to be positive? In Proceedings of CHI 2011 (pp. 2215–2223). Vancouver, Canada: ACM. doi:10.1145/1978942.1979266
Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Burlington, MA: Morgan Kaufmann.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464. doi:10.2307/2958889
Tractinsky, N. (1997). Aesthetics and apparent usability: Empirically assessing cultural and methodological issues. In Proceedings of CHI 1997 (pp. 115–122). Atlanta, GA: Association for Computing Machinery. doi:10.1145/258549.258626
van der Linden, W., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York, NY: Springer.
Zviran, M., Glezer, C., & Avni, I. (2006). User satisfaction from commercial web sites: The effect of design and use. Information & Management, 43, 157–178. doi:10.1016/j.im.2005.04.002

ABOUT THE AUTHORS
Simone Borsci is a Research Fellow in Human Factors at the Imperial College of London NIHR Diagnostic Evidence Cooperative group. He has over 10 years of experience as a psychologist and HCI expert in both industry and academia. He has worked as the UX lead of the Italian Government's working group on usability, and as a researcher at the University of Perugia, Brunel University, and Nottingham University.
Stefano Federici is currently Associate Professor of General Psychology at the University of Perugia. He is the coordinator of a research team of CognitiveLab at the University of Perugia (www.cognitivelab.it). His research is focused on assistive technology assessment processes, disability, and cognitive and human interaction factors.
Michela Gnaldi is currently an Assistant Professor of Social Statistics at the Department of Political Sciences of the University of Perugia. Her main research interest concerns measurement in education. On this topic, she participated in several research projects of national interest in Italy and in the UK, where she worked as a statistician and researcher at the National Foundation for Educational Research.
Silvia Bacci is an Assistant Professor of Statistics at the University of Perugia. Her research interests concern latent variable models, with a special focus on models for categorical and longitudinal/multilevel data, latent class models, and item response theory models. She currently participates in a FIRB project funded by the Italian government.
Francesco Bartolucci is Full Professor of Statistics at the Department of Economics of the University of Perugia. He is the Principal Investigator of the research project “Mixture and latent variable models for causal inference and analysis of socio-economic data” (FIRB 2012, “Futuro in ricerca,” Italian Government).