Throughout the history of statistics, there has been an initial reluctance to combine measurements in any way, typically followed by empirical and theoretical work that supports the combination. For example, before the mid-17th century, astronomers would not average their observations—“the idea that accuracy could be increased by combining measurements made under different conditions was slow to come”.

01/25/2014

Sub: Statistics Topic: Metrics
We are now so used to the arithmetic mean that we often don't give a second thought to computing it (and in some situations we really should). But what about combining similar measurements from different sources into a composite metric? That's exactly what we do when we compute a stock index such as the Dow Jones Industrial Average. We are comfortable with this type of combined score, especially given its successful use for over 100 years, but that level of comfort was not always in place. When William Stanley Jevons published analyses in the mid-19th century in which he combined the prices of different commodities into an index to study the global variation in the price of gold, he met with significant criticism.

Stock and commodity indices at least have the common metric of price. What about the combination of different metrics, for example, the standard usability metrics of successful completion rates, completion times, and satisfaction? The statistical methods for accomplishing this task, based on the concepts of correlation and regression, appeared in the early 20th century and underwent an explosion of development in its first half (Cowles, 1989), producing principal components analysis, factor analysis, discriminant analysis, and multivariate analysis of variance (MANOVA).

Lewis (1991) used nonparametric rank-based methods to combine and analyze time-on-task, number of errors, and task-level satisfaction in summative usability tests. Conversion to ranks puts the different usability metrics on a common ordinal scale, allowing their combination through rank averaging. An important limitation of a rank-based approach is that it can only represent a relative comparison between like products with similar tasks; it does not result in a measure of usability comparable across products or different sets of tasks.

More recently, Sauro and Kindlund (2005) described methods for converting different usability metrics (task completion, error counts, task times, and satisfaction scores) to z-scores, another way to get different metrics to a common scale (their Single Usability Metric, or SUM). Sauro and Kindlund (2005) reported significant correlations among the metrics they studied. A principal components analysis indicated that the four usability metrics contributed about equally to the composite SUM score. In 2009, Sauro and Lewis also found substantial correlations among prototypical usability metrics such as task times, completion rates, errors, post-task satisfaction, and post-study satisfaction collected during the performance of a large number of unpublished summative usability tests. According to psychometric theory, an advantage of any composite score is an increase in the reliability of measurement, with the magnitude of the increase depending on correlations among the component scores.
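The two ways of reaching a common scale described above, rank averaging and z-scores, can be illustrated with a small Python sketch. It is only a rough illustration under stated assumptions, not the published procedure of Lewis (1991) or of Sauro and Kindlund (2005): the data are invented, the function names are hypothetical, z-scores are computed against the sample mean and standard deviation of the products being compared, and all metrics receive equal weight.

```python
from statistics import mean, stdev


def average_ranks(values, higher_is_better=True):
    """Rank values so that 1 is best; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=higher_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def z_scores(values, higher_is_better=True):
    """Standardize against the sample mean and SD (an assumption of this sketch).

    The sign is flipped for metrics where lower is better, so that a higher
    z-score always means better performance. Fails if all values are equal.
    """
    m, s = mean(values), stdev(values)
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - m) / s for v in values]


def combine(columns):
    """Average the per-metric scores for each product (equal weights assumed)."""
    return [mean(row) for row in zip(*columns)]


# Invented data: three products measured on three usability metrics.
completion = [0.90, 0.70, 0.80]   # proportion of tasks completed, higher is better
task_time = [50.0, 80.0, 65.0]    # seconds, lower is better
satisfaction = [6.0, 4.0, 5.0]    # rating, higher is better

rank_composite = combine([
    average_ranks(completion, True),
    average_ranks(task_time, False),
    average_ranks(satisfaction, True),
])
z_composite = combine([
    z_scores(completion, True),
    z_scores(task_time, False),
    z_scores(satisfaction, True),
])

print("Mean rank per product (lower is better):", rank_composite)
print("Mean z-score per product (higher is better):", z_composite)
```

With these invented numbers both composites order the products the same way. The rank composite only says which product did better within this comparison, whereas the z-score composite also preserves the size of the differences, relative to whatever reference mean and standard deviation are chosen.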

