To cite this article: Simone Borsci, Stefano Federici, Silvia Bacci, Michela Gnaldi & Francesco Bartolucci (2015) Assessing User
Satisfaction in the Era of User Experience: Comparison of the SUS, UMUX, and UMUX-LITE as a Function of Product Experience,
International Journal of Human-Computer Interaction, 31:8, 484-495, DOI: 10.1080/10447318.2015.1064648
Intl. Journal of Human–Computer Interaction, 31: 484–495, 2015
Copyright © Taylor & Francis Group, LLC
ISSN: 1044-7318 print / 1532-7590 online
DOI: 10.1080/10447318.2015.1064648
tended to rate their satisfaction with higher (better) scores than users with shorter terms of experience.

In summary, researchers and practitioners assess user satisfaction by means of questionnaires, but there are only a few empirical studies that have systematically analyzed the variation of the outcomes of satisfaction scales when filled out by users with different amounts of experience in the use of a product (Kortum & Johnson, 2013; Lindgaard & Dudek, 2003; McLellan et al., 2012; Sauro, 2011).

1.2. The System Usability Scale
Several standardized tools are available in the literature to measure satisfaction (for a review, see Borsci et al., 2013). An increasing trend favors the use of short scales due to their speed and ease of administration, either as online surveys for customers or after a usability test. One of the most popular is the System Usability Scale (SUS; Lewis, 2006; Sauro & Lewis, 2011; Zviran, Glezer, & Avni, 2006), which has been cited in

more than 5,000 individual SUS responses, Sauro and Lewis (2012) found the overall mean score of the SUS to be 68 with a standard deviation of 12.5. The Sauro and Lewis CGS assigned grades as a function of SUS scores, ranging from F (absolutely unsatisfactory) to A+ (absolutely satisfactory), as follows:

• Grade F (0–51.7)
• Grade D (51.8–62.6)
• Grade C– (62.7–64.9)
• Grade C (65.0–71.0)
• Grade C+ (71.1–72.5)
• Grade B– (72.6–74.0)
• Grade B (74.1–77.1)
• Grade B+ (77.2–78.8)
• Grade A– (78.9–80.7)
• Grade A (80.8–84.0)
• Grade A+ (84.1–100)

Although they should be interpreted with caution, the grades
Some findings have shown the UMUX to be bidimensional as a function of the item tone, positive versus negative (Lewis, 2013; Lewis et al., 2013), despite the intention to develop a unidimensional scale. The UMUX's statistical structure might be an artifact of the mixed positive/negative tone of the items and in practice might not matter much. In light of this, both the UMUX and its reduced version, the UMUX-LITE, are usually interpreted as unidimensional measures.

By design (using a method similar to but not exactly the same as the SUS), the overall UMUX and UMUX-LITE scores can range from 0 to 100. Their scoring procedures are as follows:

• UMUX: The odd items are scored as [score − 1] and even items as [7 − score]. The sum of the item scores is then divided by 24 and multiplied by 100 (Finstad, 2010).
• UMUX-LITE: The two items are scored as [score − 1], and the sum of these is divided by 12 and multiplied by 100.

1.4. Research Goals
The use of short scales as part of UX evaluation protocols could sensibly reduce the costs of assessment, as well as users' time and effort to complete the questionnaires. Currently, few studies have investigated the relationship among the SUS, UMUX, and UMUX-LITE, and none have analyzed their reliabilities as a function of different amounts of interaction with a product.

The primary goal of this article was to analyze the variation of SUS, UMUX, and UMUX-LITE outcomes when completed concurrently by users with different levels of experience in the use of a website. To reach this goal, we pursued three main objectives. First, we aimed to explore the variation of UMUX and UMUX-LITE outcomes when administered to users with two different levels of product experience. Second, we aimed to observe whether, at different levels of product experience, the correlations among the SUS, UMUX, and UMUX-LITE were stable, with particular interest in the generalizability of
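The scoring rules above, together with the Sauro and Lewis curved grading scale ranges listed in Section 1.2, can be sketched in a few lines of Python (the function names are ours, not the article's):

```python
def umux_score(items):
    """Overall UMUX score from the four 7-point UMUX responses.

    Per Finstad (2010): odd items (1st, 3rd) are scored as [score - 1],
    even items (2nd, 4th) as [7 - score]; the sum is divided by 24
    and multiplied by 100.
    """
    adjusted = [(x - 1) if i % 2 == 0 else (7 - x) for i, x in enumerate(items)]
    return sum(adjusted) / 24 * 100


def umux_lite_score(items):
    """Overall UMUX-LITE score from its two 7-point responses:
    each item is scored as [score - 1]; the sum is divided by 12
    and multiplied by 100."""
    return sum(x - 1 for x in items) / 12 * 100


# Sauro and Lewis (2012) curved grading scale, transcribed from the
# ranges in Section 1.2 as (lower bound, grade) pairs, best grade first.
CGS = [(84.1, "A+"), (80.8, "A"), (78.9, "A-"), (77.2, "B+"),
       (74.1, "B"), (72.6, "B-"), (71.1, "C+"), (65.0, "C"),
       (62.7, "C-"), (51.8, "D"), (0.0, "F")]


def cgs_grade(score):
    """Map an overall 0-100 score to its CGS letter grade."""
    for lower, grade in CGS:
        if score >= lower:
            return grade
```

For example, the SUS mean of 68 reported by Sauro and Lewis (2012) falls in the C band (65.0–71.0) under this lookup.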
The two studies were organized to measure different times of participants' exposure to CLab, thus measuring different moments of UX acquisition, as follows:

• Study 1, carried out 2 months after the students first accessed CLab. The participants' number of access times and interaction with the platform (time exposure) ranged from eight (once a week) to 56 (once a day).
• Study 2, carried out 6 months after the students first accessed CLab. The participants' number of access times and interaction with the platform (time exposure) ranged from 24 (once a week) to 168 (once a day).

The two studies were reviewed and approved by the Institutional Review Board of the Department of Philosophy, Social and Human Sciences and Education, University of Perugia. All participants provided their written informed consent to participate in this study. No minors/children were enrolled in this study. The study presented no potential risks.

Therefore, we expected students in Study 2 (cumulative UX condition) to rate the CLab with all three scales as more satisfactory than users in the first study due to their greater product exposure (6 months). Concurrently, we expected participants with a greater frequency of product use to rate the CLab with all the scales as more satisfactory than participants with lower levels of use.

2.2. Data Analysis
For each study, principal components analyses were performed to assess the SUS's dimensionality, focusing on whether the item alignment of the resulting two-component structure was consistent with the emergence of Learnable and Usable components. Only if this expected pattern did not emerge did we plan to follow up with a multidimensional latent class item response theory (LC IRT) model to more deeply
Bartolucci, 2007; Bartolucci et al., 2014). This class of models extends traditional IRT models (Nering & Ostini, 2010; van der Linden & Hambleton, 1997) in two main directions. First, they allow the analysis of item responses in cases of questionnaires that measure more than one factor (also called, in the context of IRT, latent trait, latent variable, or ability). Second, multidimensional LC IRT models assume that the population is composed of homogeneous groups of individuals sharing unobserved but common characteristics (so-called latent classes; Goodman, 1974; Lazarsfeld & Henry, 1968).

Table 3 shows that the minimum value of the BIC index was observed for C = 3, in both the unidimensional and the bidimensional case, suggesting that the sample of individuals came from a population composed of three latent classes. As smaller BIC values indicate better fit, a comparison of the unidimensional and bidimensional models with C = 3 found that the BIC index gave evidence of a better goodness of fit for the model with unidimensional structure (BIC = 2924.805) than for that with bidimensional structure (BIC = 2941.496; bold in Table 3).
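The model comparison above uses the Bayesian Information Criterion (Schwarz, 1978). A minimal sketch of the computation and selection rule follows; apart from the two BIC values reported for Table 3, the numbers are illustrative, not the article's data:

```python
import math


def bic(max_log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion (Schwarz, 1978).

    BIC = #par * ln(n) - 2 * (maximum log-likelihood);
    smaller values indicate a better fit/complexity trade-off.
    """
    return n_params * math.log(n_obs) - 2 * max_log_likelihood


# Model selection as applied in Table 3: with C = 3 latent classes,
# the unidimensional model (BIC = 2924.805) is preferred over the
# bidimensional one (BIC = 2941.496) because its BIC is smaller.
preferred = min([("unidimensional", 2924.805), ("bidimensional", 2941.496)],
                key=lambda model: model[1])
```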
TABLE 2
Principal Components Analysis of the System Usability Scale in Study 1 Showing One- and Two-Component Solutions
Eigenvalues Extraction Unidimensional Bidimensional
TABLE 3
Unidimensional and Bidimensional LC IRT Models for System Usability Scale Data: Number of Latent Classes (C), Estimated
Maximum Log-Likelihood (ℓ̂), Number of Parameters (#par), Bayesian Information Criterion (BIC) Index
Unidimensional Model Bidimensional Model
TABLE 5
Means and Standard Deviations of Overall Scores of the SUS, UMUX, and UMUX-LITE for Study 1 With 95% CIs (and Associated CGS Grades)

Scale       M           SD      95% CI Lower   95% CI Upper
SUS         70.88 (C)   6.703   69.9 (C)       71.8 (C+)
UMUX        84.66 (A+)  12.838  82.8 (A)       86.5 (A+)
UMUX-LITE   73.83 (B−)  9.994   72.4 (C+)      75.3 (B)

Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CGS = curved grading scale; CI = confidence interval.
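The intervals in Tables 5 and 9 are standard confidence intervals around a mean. A sketch of the usual computation (the group size n is not reported in this excerpt, so the value below is purely illustrative, and the normal-approximation critical value 1.96 is an assumption rather than the article's stated method):

```python
import math


def mean_ci95(mean, sd, n, z=1.96):
    """Approximate 95% CI for a mean: mean +/- z * SE,
    with standard error SE = sd / sqrt(n)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se
```

For instance, `mean_ci95(70.88, 6.703, 180)` (with the hypothetical n = 180) returns an interval of roughly one point on either side of the mean, comparable in width to the SUS row of Table 5.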
TABLE 6
Principal Components Analysis of the System Usability Scale in Study 2

Items                                                                               Usability  Learnability
8. I found this website very cumbersome/awkward to use.                             .910       .121
1. I would like to use this website frequently.                                     .869       .105
5. I found the various functions in this website were well integrated.              .800       .122
3. I thought this website was easy to use.                                          .769       .226
7. I would imagine that most people would learn to use this website very quickly.   .754       .258
2. I found this website unnecessarily complex.                                      .739       .273
6. I thought there was too much inconsistency in this website.                      .708       .183
9. I felt very confident using this website.                                        .683       .106
10. I needed to learn a lot of things before I could get going with this website.   .133       .841
4. I think that I would need assistance to be able to use this website.             .206       .753

Note. Loadings greater than .4 in bold.
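A two-component solution like the one in Table 6 is obtained from the item correlation matrix. The sketch below shows the core computation on simulated responses; it produces unrotated loadings, whereas published SUS analyses typically apply a rotation, so it will not reproduce the exact values above:

```python
import numpy as np


def pca_loadings(responses, n_components=2):
    """Unrotated principal-component loadings of item responses.

    responses: (n_respondents, n_items) array of item scores.
    Returns an (n_items, n_components) array: eigenvectors of the
    item correlation matrix scaled by the square roots of their
    eigenvalues, largest component first.
    """
    X = np.asarray(responses, dtype=float)
    R = np.corrcoef(X, rowvar=False)                # item-by-item correlations
    eigvals, eigvecs = np.linalg.eigh(R)            # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:n_components]  # largest components first
    return eigvecs[:, top] * np.sqrt(eigvals[top])
```

Component signs are arbitrary in PCA, so loadings are usually compared by magnitude, as in the item groupings of Table 6.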
TABLE 7
Unidimensional and Bidimensional Latent Class Item Response Theory Models for System Usability Scale Data: Number of
Latent Classes (C), Estimated Maximum Log-Likelihood (ℓ̂), Number of Parameters (#par), Bayesian Information Criterion
(BIC) Index
Unidimensional Model Bidimensional Model
TABLE 8
Correlations Among the SUS, UMUX, and UMUX-LITE in Study 2 (With 95% CIs)
UMUX CI UMUX-LITE CI
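Table 8 reports correlations with 95% CIs. A common way to construct such an interval for a Pearson correlation (an assumption on our part; the excerpt does not state the article's CI method) is the Fisher z transform:

```python
import math


def pearson_ci95(r, n):
    """Approximate 95% CI for a Pearson correlation r from n pairs,
    via the Fisher z transform: z = atanh(r), SE = 1 / sqrt(n - 3),
    with the endpoints mapped back through tanh."""
    z = math.atanh(r)
    half_width = 1.96 / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

The back-transform keeps the interval inside (−1, 1) and makes it asymmetric around r, which matters for the strong correlations reported among the three scales.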
TABLE 9
Means and Standard Deviations of Overall Scores of the SUS, UMUX, and UMUX-LITE for Study 2 With 95% CIs (and
Associated Curved Grading Scale Grades)
Scale M SD 95% CI Lower 95% CI Upper
SUS 75.24 (B) 13.037 72.6 (B) 77.0 (B+)
UMUX 87.69 (A+) 10.291 85.6 (A+) 89.8 (A+)
UMUX-LITE 76.45 (B) 9.943 74.4 (B) 78.5 (B+)
Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience; CI = confidence interval.
TABLE 10
Main Effects and Interactions for Combined Analysis of Variance
Scale Effect Outcome
SUS Main effect of duration F(1, 263) = 10.7, p = .001
Main effect of frequency of use F(2, 263) = 30.9, p < .0001
Duration × Frequency interaction F(2, 263) = 15.8, p < .0001
UMUX Main effect of duration F(1, 263) = 17.4, p < .0001
Main effect of frequency of use F(2, 263) = 22.2, p < .0001
Duration × Frequency interaction F(2, 263) = 3.4, p = .035
UMUX-LITE Main effect of duration F(1, 263) = 4.7, p = .03
Main effect of frequency of use F(2, 263) = 16.8, p < .0001
Duration × Frequency interaction F(2, 263) = 4.7, p = .01
Note. SUS = System Usability Scale; UMUX = Usability Metric for User Experience.
TABLE 11
Summary of Study Outcomes by Hypothesis
Hypothesis Result Meaning
Hypothesis 1 Supported SUS dimensionality was affected by the different levels of product experience. When
administered to users with less product experience, the SUS had a unidimensional
structure, whereas it had a bidimensional structure for respondents with more product
experience.
Hypothesis 2 Supported All three scales were strongly correlated, independent of the administration
conditions.
Hypothesis 3 Supported Participants with more product experience were more satisfied than those with less
product experience regardless of whether that experience was gained over a duration of
exposure or by frequency of use.
Note. SUS = System Usability Scale.
unidimensional scale, so we recommend against partitioning it into Usable and Learnable components in

detailed investigation of their relationships and psychometric properties.
Goodman, L. A. (1974). Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 61, 215–231. doi:10.1093/biomet/61.2.215
Hassenzahl, M. (2005). The thing and I: Understanding the relationship between user and product. In M. Blythe, K. Overbeeke, A. Monk, & P. Wright (Eds.), Funology: From usability to enjoyment (Vol. 3, pp. 31–42). Berlin, Germany: Springer. doi:10.1007/1-4020-2967-5_4
Hassenzahl, M., & Tractinsky, N. (2006). User experience—A research agenda. Behaviour & Information Technology, 25, 91–97. doi:10.1080/01449290500330331
Hassenzahl, M., Wiklund-Engblom, A., Bengs, A., Hägglund, S., & Diefenbach, S. (2015). Experience-oriented and product-oriented evaluation: Psychological need fulfillment, positive affect, and product perception. International Journal of Human-Computer Interaction, 31, 530–544.
ISO 9241-11:1998. Ergonomic requirements for office work with visual display terminals – Part 11: Guidance on usability.
ISO 9241-210:2010. Ergonomics of human-system interaction – Part 210: Human-centred design for interactive systems.
Kortum, P. T., & Bangor, A. (2012). Usability ratings for everyday products measured with the System Usability Scale. International Journal of Human–Computer Interaction, 29, 67–76. doi:10.1080/10447318.2012.681221
Kortum, P. T., & Johnson, M. (2013). The relationship between levels of user experience with a product and perceived system usability. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57, 197–201. doi:10.1177/1541931213571044
Lallemand, C., Gronier, G., & Koenig, V. (2015). User experience: A concept without consensus? Exploring practitioners' perspectives through an international survey. Computers in Human Behavior, 43, 35–48. doi:10.1016/j.chb.2014.10.048
Lazarsfeld, P. F., & Henry, N. W. (1968). Latent structure analysis. Boston, MA: Houghton Mifflin.
Lee, S., & Koubek, R. J. (2012). Users' perceptions of usability and aesthetics as criteria of pre- and post-use preferences. European Journal of Industrial Engineering, 6, 87–117. doi:10.1504/EJIE.2012.044812
Lewis, J. R. (2006). Usability testing. In G. Salvendy (Ed.), Handbook of human factors and ergonomics (pp. 1275–1316). New York, NY: Wiley & Sons.
Lewis, J. R. (2013). Critical review of "The Usability Metric for User Experience." Interacting with Computers, 25, 320–324. doi:10.1093/iwc/iwt013
Lewis, J. R. (2014). Usability: Lessons learned . . . and yet to be learned. International Journal of Human-Computer Interaction, 30, 663–684. doi:10.1080/10447318.2014.930311
Lewis, J. R., & Sauro, J. (2009). The factor structure of the System Usability Scale. In M. Kurosu (Ed.), Human centered design (Vol. 5619, pp. 94–103). Berlin, Germany: Springer. doi:10.1007/978-3-642-02806-9_12
Lewis, J. R., Utesch, B. S., & Maher, D. E. (2013). UMUX-LITE: When there's no time for the SUS. In Proceedings of CHI 2013 (pp. 2099–2102). Paris, France: ACM. doi:10.1145/2470654.2481287
Lindgaard, G., & Dudek, C. (2003). What is this evasive beast we call user satisfaction? Interacting with Computers, 15, 429–452. doi:10.1016/S0953-5438(02)00063-2
McLellan, S., Muddimer, A., & Peres, S. C. (2012). The effect of experience on System Usability Scale ratings. Journal of Usability Studies, 7, 56–67.
Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory models. New York, NY: Taylor & Francis.
Petrie, H., & Bevan, N. (2009). The evaluation of accessibility, usability, and user experience. In C. Stephanidis (Ed.), The universal access handbook (pp. 299–314). Boca Raton, FL: CRC Press.
Sauro, J. (2011). Does prior experience affect perceptions of usability? Retrieved from http://www.measuringusability.com/blog/prior-exposure.php
Sauro, J., & Lewis, J. R. (2011). When designing usability questionnaires, does it hurt to be positive? In Proceedings of CHI 2011 (pp. 2215–2223). Vancouver, Canada: ACM. doi:10.1145/1978942.1979266
Sauro, J., & Lewis, J. R. (2012). Quantifying the user experience: Practical statistics for user research. Burlington, MA: Morgan Kaufmann.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6, 461–464. doi:10.2307/2958889
Tractinsky, N. (1997). Aesthetics and apparent usability: Empirically assessing cultural and methodological issues. In Proceedings of CHI 1997 (pp. 115–122). Atlanta, GA: Association for Computing Machinery. doi:10.1145/258549.258626
van der Linden, W., & Hambleton, R. K. (1997). Handbook of modern item response theory. New York, NY: Springer.
Zviran, M., Glezer, C., & Avni, I. (2006). User satisfaction from commercial web sites: The effect of design and use. Information & Management, 43, 157–178. doi:10.1016/j.im.2005.04.002

ABOUT THE AUTHORS

Simone Borsci is a Research Fellow in Human Factors at Imperial College of London NHIR-Diagnostic Evidence Cooperative group. He has over 10 years of experience as a psychologist and HCI expert in both industry and academia. He has worked as the UX lead of the Italian Government's working group on usability, and as a researcher at the University of Perugia, Brunel University, and Nottingham University.

Stefano Federici is currently Associate Professor of General Psychology at the University of Perugia. He is the coordinator of a research team of CognitiveLab at the University of Perugia (www.cognitivelab.it). His research is focused on assistive technology assessment processes, disability, and cognitive and human interaction factors.

Michela Gnaldi is currently an Assistant Professor of Social Statistics at the Department of Political Sciences of the University of Perugia. Her main research interest concerns measurement in education. On this topic, she participated in several research projects of national interest in Italy and in the UK, where she has been working as a statistician and researcher at the National Foundation for Educational Research.

Silvia Bacci has been an Assistant Professor of Statistics at the University of Perugia. Her research interests concern latent variable models, with a special focus on models for categorical and longitudinal/multilevel data, latent class models, and item response theory models. She now participates in a FIRB project funded by the Italian government.

Francesco Bartolucci is Full Professor of Statistics at the Department of Economics of the University of Perugia. He is the Principal Investigator of the research project "Mixture and latent variable models for causal inference and analysis of socio-economic data" (FIRB 2012 – "Futuro in ricerca" – Italian Government).