
Self-Reported Metrics
Mustika Febrillia 13414076
Atikah Arysolia Taifur 14414056
Self-Reported Data
To learn about the usability of something, ask the participants to tell you about their experience with it.

How exactly do we ask participants so that we get good data?

Self-reported data is one of the best ways to capture this: the verbatim comments made by participants while using a product.
Kinds of Self-Reported Data
Subjective data: often used as a counterpart to "objective," which usually describes performance data from a usability study. But "subjective" implies a lack of objectivity in the data you are collecting.

Preference data: often used as a counterpart to "performance." Preference implies a choice of one option over another, which is often not the case in UX studies.
Importance of Self-Reported Data
This data gives the most important information about users' perception of the system and their interaction with it.

The data tells us how the users feel about the system.

Fact: users will not remember how long using a website took or how many clicks they made, but if the experience made them happy, that is the only thing that matters.

Subjective reactions may be the best predictor of users' likelihood to return or make a purchase in the future.
Rating Scales
One of the most common ways to capture self-reported data in a UX study is with some type of rating scale.
Likert Scales
Use a statement to which respondents rate their level of agreement.

Characteristics: (1) expresses a degree of agreement with a statement; (2) offers an odd number of response options, allowing a neutral response.

When designing statements, avoid adverbs such as "very," "extremely," or "absolutely," and use unmodified versions of adjectives.
Semantic Differential Scales
The semantic differential technique involves presenting pairs of bipolar, or opposite, adjectives.

Five- or seven-point scales are commonly used (odd numbers of points).

Be aware of the connotations of different pairings of words.
When to Collect Self-Reported Data
Post-task ratings
Quick ratings immediately after each task can help pinpoint tasks and parts of the interface that are particularly problematic.

Post-study ratings
Can provide an overall evaluation after the participant has interacted with the product more fully, and allow more in-depth ratings.
How to Collect Ratings
There are three ways:

Answer questions or provide ratings orally
Easiest method, but can introduce bias, e.g., participants may feel uncomfortable verbally stating poor ratings.

Record responses on a paper form
Manual data entry can introduce errors and bias.

Provide responses using some type of online tool
Needs tablets or laptops.
Biases in Collecting Self-Reported Data
Social desirability bias: respondents tend to give answers they believe will make them look better in the eyes of others.

Studies have shown (Dillman et al., 2008) that people who are asked directly for self-reported data provide more positive feedback than when asked through an anonymous web survey.
General Guidelines for Rating Scales
Multiple scales help triangulate: you can get more reliable data if you can think of different ways to ask participants the same thing.

Odd or even number of values: an odd number allows a neutral response.

Total number of points: five or seven points is typical, but more is not always better.
Analyzing Rating-Scale Data
The most common technique for analyzing data from rating scales is to assign a numeric value to each of the scale positions and then compute the averages.

Descriptive statistics such as the mean, the mode, etc. can also be used.
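As a minimal sketch of this coding step in Python (the scale labels and responses below are hypothetical):

```python
from statistics import mean, mode

# Hypothetical 5-point agreement scale and responses
SCALE = {"Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
         "Agree": 4, "Strongly agree": 5}

responses = ["Agree", "Strongly agree", "Neutral", "Agree", "Disagree"]

# Assign a numeric value to each scale position, then summarize
scores = [SCALE[r] for r in responses]

print(mean(scores))  # average rating -> 3.6
print(mode(scores))  # most common rating -> 4
```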
Post-Task Ratings
The main goal of ratings associated with each task is to give you some insight into which tasks the participants thought were the most difficult. Next we will examine some specific techniques.
Ease of Use
Usually asks users to rate how easy or how difficult each task was.

Some UX professionals prefer to use a traditional Likert scale.

When compared to several other post-task ratings, it was found to be among the most effective.
After-Scenario Questionnaire (ASQ)
Touches upon three fundamental areas of usability: (1) effectiveness, (2) efficiency, and (3) satisfaction.

The questionnaire consists of three rating scales.
Expectation Measure
The most important thing about each task may be how easy or difficult it was in comparison to how easy or difficult the user thought it was going to be.

Expectation measure: ask respondents to rate how easy/difficult they expect each task to be, before attempting it.

The results can be interpreted in quadrants (expected vs. actual difficulty).
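The quadrant interpretation can be sketched as follows; the midpoint, the scale direction (7 = very easy), and the quadrant annotations are illustrative assumptions, not the original technique's terminology:

```python
# Classify a task by expected vs. actual ease rating (1-7, 7 = very easy).
# The midpoint and the labels are illustrative assumptions.
def quadrant(expected: int, actual: int, midpoint: int = 4) -> str:
    if expected >= midpoint and actual >= midpoint:
        return "expected easy, was easy"    # working as anticipated
    if expected >= midpoint and actual < midpoint:
        return "expected easy, was hard"    # highest-priority fix
    if expected < midpoint and actual >= midpoint:
        return "expected hard, was easy"    # pleasant surprise worth promoting
    return "expected hard, was hard"        # known pain point

print(quadrant(6, 2))  # -> expected easy, was hard
```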


A Comparison of Post-Task Self-Reported Metrics
Goal: to see whether these rating techniques are sensitive enough to detect differences in the perceived difficulty of the tasks.

Also wanted to see how the perceived difficulty of the tasks corresponded to the task performance data.
Post-Session Ratings
These can be used as an overall barometer of the usability of the product.
Aggregating Individual Task Ratings
Take an average of the individual task-based ratings, using a simple or a weighted average.

Simply take an average of the data; if some tasks are more important than others, take a weighted average instead.

By looking at the data, we can also track the average perception as it changes over time.
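A small sketch of simple vs. weighted aggregation (the task names, ratings, and importance weights are hypothetical):

```python
# Per-task ease ratings (1-5) and importance weights -- hypothetical data
ratings = {"find product": 4.2, "checkout": 2.8, "track order": 3.9}
weights = {"find product": 3, "checkout": 5, "track order": 1}

# A simple average treats every task equally
simple_avg = sum(ratings.values()) / len(ratings)

# A weighted average lets important tasks count for more
weighted_avg = (sum(ratings[t] * weights[t] for t in ratings)
                / sum(weights.values()))

print(round(simple_avg, 2))    # -> 3.63
print(round(weighted_avg, 2))  # -> 3.39
```

Here the heavily weighted but poorly rated "checkout" task pulls the weighted average below the simple one.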
System Usability Scale (SUS)
One of the most widely used tools for assessing the perceived usability of a system or product.

Consists of 10 statements to which users rate their level of agreement.

Interpretation of the SUS score:
<50: Not acceptable
50-70: Marginal
>70: Acceptable
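SUS has a standard scoring rule: each odd-numbered (positively worded) item contributes (response − 1), each even-numbered (negatively worded) item contributes (5 − response), and the sum is multiplied by 2.5 to give a 0-100 score. A sketch with hypothetical responses:

```python
def sus_score(responses):
    """Score the 10 SUS items (each rated 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    total = 0
    for i, r in enumerate(responses, start=1):
        # Odd items are positively worded, even items negatively worded
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical participant -> 82.5, i.e. "Acceptable" above
print(sus_score([4, 2, 4, 1, 5, 2, 4, 1, 4, 2]))
```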
Computer System Usability Questionnaire (CSUQ)
The CSUQ was designed to be administered by mail or online.

Consists of 19 statements; users rate their agreement on a seven-point scale.

Example statement: "It was simple to use this system."

Results are viewed in four categories: System Usefulness, Information Quality, Interface Quality, and Overall Satisfaction.
Questionnaire for User Interface Satisfaction (QUIS)
Developed by the HCIL in 1988.

Consists of 27 rating scales divided into five categories: Overall Reaction, Screen, Terminology/System Information, Learning, and System Capabilities.

Uses 10-point scales whose anchors change depending on the statement.
USE Questionnaire
Proposed by Arnie Lund (2001).

Results can be summarized in a radar chart.
Product Reaction Cards
Proposed by Benedek and Miner (2002):
- 118 cards of adjectives
- Participants pick their top 5 cards
- and explain why
- Results can be presented as a word cloud
Comparison of Post-Session Self-Reported Metrics
Study conducted by Tullis and Stetson (2004).

Used SUS, QUIS, CSUQ, Words, and their own questionnaire ("Ours") to evaluate two web portals; 123 participants, each using one questionnaire.
Net Promoter Score (NPS)
Originated by Fred Reichheld (2003).

"How likely is it that you would recommend this to a friend or colleague?" (rated 0-10)

Three categories of respondents:
- Detractors: gave ratings of 0-6
- Passives: gave ratings of 7 or 8
- Promoters: gave ratings of 9 or 10
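The NPS is the percentage of promoters minus the percentage of detractors; a minimal sketch with hypothetical ratings:

```python
def nps(ratings):
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical 0-10 ratings from ten respondents:
# 5 promoters, 3 passives, 2 detractors -> 50% - 20% = 30
print(nps([10, 9, 9, 8, 7, 7, 6, 5, 9, 10]))  # -> 30.0
```

Note that the score can range from −100 (all detractors) to +100 (all promoters).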
Using SUS to Compare Designs
Traci Hart (2004): compared three different websites for adults. After attempting tasks, participants filled out the SUS questionnaire.

The American Institutes for Research (2001): compared Windows ME and Windows XP; 36 experts attempted tasks, then filled out the SUS questionnaire.

Sarah (2006): compared three types of paper ballots; participants used the ballots in a simulation, then filled out the SUS questionnaire.
Online Services
A Voice of the Customer (VoC) study is typically done on a live website.

Common approach: pop-up surveys.

Another approach: a standard mechanism for getting feedback, such as:

1. Website Analysis and Measurement Inventory (WAMMI)
2. American Customer Satisfaction Index (ACSI)
3. OpinionLab
1. WAMMI
www.wammi.com
Composed of 20 statements rated on a five-point Likert scale.

2. ACSI

3. OpinionLab
Page-level feedback from users.
Issues with Live-Site Surveys
Number of questions

Self-selection of respondents

Number of respondents

Nonduplication of respondents
Other Types of Self-reported
Metrics
Assessing Specific Attributes

Assessing Specific Elements

Open-Ended Questions

Awareness and Comprehension

Awareness and Usefulness Gaps


Assessing Specific Attributes
Some attributes of a product or website that might be assessed:
- Visual appeal
- Perceived efficiency
- Confidence
- Usefulness
- Credibility
- Appropriateness of terminology
- Ease of navigation
- Responsiveness
Assessing Specific Elements
Such as:

- Instructions

- FAQs or Online help

- Homepage

- Search function

- Site map

- etc.
Open-Ended Questions
Allow users to add comments related to any of the individual rating scales.

Ask users to list the three to five things they like the most about the product and the three to five things they like the least.

Ask them to describe anything they found confusing about the interface.

A simple analysis method: word clouds.
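The word counts behind a word cloud can be produced with a few lines of Python; the comments and the stopword list here are hypothetical:

```python
from collections import Counter
import re

# Hypothetical open-ended comments from three participants
comments = [
    "The search was confusing but checkout was fast",
    "Checkout was fast and simple",
    "Search results were confusing",
]

STOPWORDS = {"the", "was", "but", "and", "were"}

words = [w for c in comments
         for w in re.findall(r"[a-z]+", c.lower())
         if w not in STOPWORDS]

# A word-cloud tool sizes each word by its frequency
print(Counter(words).most_common(4))
```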
Awareness and Comprehension
Test users' learning and comprehension of a website's content by giving a quiz on the information it presents.

If necessary, administer a pretest to determine what they already knew, and compare it with the post-test.
Awareness and Usefulness Gaps
Typically ask users about awareness as a yes/no question, e.g., "Were you aware of this functionality?", and about usefulness on a 1-5 rating scale.

Convert the rating-scale data into a top-2-box score.

Plot the % of users who were aware of each piece of functionality against the % of users who found it useful.
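A sketch of the top-2-box conversion and the awareness/usefulness comparison for a single feature (all data hypothetical):

```python
# Did each of 10 users know about the feature? (hypothetical)
aware = [True, True, False, True, False, True, True, False, True, True]
# Usefulness ratings (1-5) from the 7 users who were aware of it
usefulness = [5, 4, 3, 5, 2, 4, 4]

pct_aware = 100 * sum(aware) / len(aware)

# Top-2-box: percentage of ratings that are 4 or 5
top2_useful = 100 * sum(1 for r in usefulness if r >= 4) / len(usefulness)

# Plotting pct_aware against top2_useful per feature reveals gaps:
# high usefulness but low awareness suggests a promotion opportunity.
print(pct_aware)               # -> 70.0
print(round(top2_useful, 1))   # -> 71.4
```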