
Methods and Techniques

of investigating user behavior


Introduction - why M & T?

Gerrit C. van der Veer
gerrit@cs.vu.nl
theory
Methods and techniques for
empirical research
Goals for this course
understand why
understand basic theory
know basic methods and techniques
know how to plan your research
know when to ask for expert consultation
Goals of empirical research

an example
Cultural utterances of Martians - artifacts we found:
[artifact symbols shown on the slide; not recoverable from the text]
How to develop a science on this - goals in sequence:
description (variables, quantification, measuring relations)
prediction (based on knowledge of relations)
explanation (causal models)
manipulation (apply control based on known causality)
Characteristics of scientific knowledge
unambiguous
operational definitions for observable phenomena
measurement techniques
scientific language: concepts and relations (esp. unobservable
phenomena)
repeatable studies
describe procedures, population and samples of observations
reliability (of measurement, observers, raters, tests)
controlled for disturbing phenomena
design of study / experiment (sequence, balancing, control groups)
sample
models for measurement of other variables and statistical control
Research methods
observation in nature
case studies (context of use, community of practice, +? -?)
field study and survey
systematic observation / interview / focus group
focused on some phenomena
influence of participant observer
correlation study
tests / questionnaires / behavior measurements
focus on relations between variables
does not establish causality (e.g. Malaria)
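The point that a correlation study does not establish causality can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variables `x`, `y` and the hidden common cause `z` are invented for illustration and do not come from the slides.

```python
import random


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=1):
    """x and y never influence each other; both depend on z (assumed model)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 0.5) for zi in z]
    y = [zi + rng.gauss(0, 0.5) for zi in z]
    return z, x, y


z, x, y = simulate()
r_xy = pearson_r(x, y)
# Remove the influence of z from both variables and correlate what is left.
r_resid = pearson_r([xi - zi for xi, zi in zip(x, z)],
                    [yi - zi for yi, zi in zip(y, z)])
print(f"r(x, y) = {r_xy:.2f}, after removing z: {r_resid:.2f}")
```

The raw correlation between `x` and `y` is strong although neither causes the other; once the common cause is taken out, the relation is close to zero.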
Research methods
experiment
manipulation of candidate causes
measuring effects
controlling possible other causes
Data collection
choice of technique based on
sensitivity for the phenomena
reliability and objectivity
validity
internal - intended concept
external - representative for population of phenomena, context & situation
practicality (effort, time, availability)

Data collection
types of techniques
observation of behavior
registration of .. behavior, physiological data
think aloud during processes / activities
pro? . con?
video with retrospective protocols
interview
free .. structured
objective test
questionnaires
written interview .. subjective rating scales
unobtrusive measurements (e.g. logs)
Scoring
translation of data into units that allow modeling and analysis:
numbers or defined categories

needs interpretation rules that are part of the
operational definition:

relative (frequency per ) / absolute (reaction time)
duration time (sometimes relative to ..)
intensity / strength
category of behavior / option chosen (e.g. marital status)

complex phenomena:
patterns, spectrum, half-life
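A minimal sketch of scoring: raw observed events are translated into an absolute count and a relative frequency per category. The event names, the function name `score_events` and the choice of "per minute" as the relative unit are assumptions made for illustration; the slide leaves the unit open.

```python
from collections import Counter


def score_events(events, session_minutes):
    """Turn a list of observed event categories into counts and rates.

    events: category labels in order of observation (hypothetical data).
    session_minutes: observation time used to make frequencies relative.
    """
    counts = Counter(events)
    return {
        category: {"count": n, "per_minute": n / session_minutes}
        for category, n in counts.items()
    }


scores = score_events(["click", "click", "scroll"], session_minutes=2.0)
print(scores)
```

The categories and the unit of time belong in the operational definition, so that another observer scoring the same session arrives at the same numbers.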
Scales of measurement
These have been discussed in the Bachelor course Toegepaste Statistiek (Applied Statistics)

ratio scale: 1-dimensional, absolute (comparison with standard unit),
zero=0, cardinal scale
e.g. time on 100 m.
interval scale: no absolute zero
e.g. intelligence quotient (IQ)
ordinal scale: comparison between observed data (ties possible), so
no standard unit
e.g. results of a sports competition
nominal scale: verbal labels or number labels
1=single; 2=married; 3=divorced; 4= widowed; 5=living together
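The scale of measurement limits which summary statistics are meaningful. The sketch below picks a measure of central tendency per scale; the mapping (mode, median, mean) is the usual textbook convention rather than something stated on the slide, and the names `SUMMARY` and `central_tendency` are hypothetical.

```python
import statistics

# Assumed convention: the most informative central tendency per scale.
SUMMARY = {
    "nominal": statistics.mode,         # labels only: most frequent category
    "ordinal": statistics.median_low,   # rank order: middle observed value
    "interval": statistics.mean,        # equal units: arithmetic mean
    "ratio": statistics.mean,           # equal units and a true zero
}


def central_tendency(values, scale):
    """Summarize values with a statistic that fits their scale."""
    return SUMMARY[scale](values)


# Marital status codes (1=single, 2=married, ...): a mean would be meaningless.
print(central_tendency([1, 2, 2, 5], "nominal"))
# Finishing positions in a competition.
print(central_tendency([1, 2, 3, 4], "ordinal"))
# Test scores on an interval scale.
print(central_tendency([100, 110, 120], "interval"))
```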
Validity of measures
To what extent does one observe and measure what is aimed at?

predictive validity - predictive power for other behavior (school
exam score for job selection)
content validity - representative for the intended domain (items in
an intelligence test)
concurrent validity - consistency with other types of measures for
the same concept (self report vs. teacher rating)
concept / construct validity - (multiple choice math questions to
measure mathematical ability)
Experiment: definition
Objective observation of effects that are produced in a controlled
situation, where one or more factors are manipulated and others are
kept constant (Zimny 1961)

terminology:
subject
experimenter

independent variables (antecedent conditions, treatments)
dependent variables (effects)
disturbing / secondary / potential variables
e.g. effect of pre-knowledge (p) on learning speed (l), with motivation (m)
p → m → l / p → l & m → l / m → p & m → l
intermediating / confounding / artifact of selection

Categories of secondary / confounding
variables
1. person variables
capabilities
motivation
age
educational background
2. sequence variables
fatigue / boredom / learning
development of subject during (longitudinal) study in relation to
experiment
3. situation variables
environment: sound/temperature/day time
experimenter effect on subject / experimenter observation bias
task effect: difficulty / modality of stimulus or instruction

Experimental design - how to cope
with secondary variables
Main decision is based on type of the expected / known
main confounding variables
person variables → repeated measures design: each
person is measured in all conditions
needs balancing for possible sequence effects
sequence variables → multiple groups design: each
person is in a single group and participates in one
condition only
needs matched groups (keeps person variables under control) or
randomized groups (easier, less controlled)
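Matched groups can be sketched as follows: participants are ranked on a person variable, taken in adjacent pairs, and each pair is split at random over two groups. The participant labels, the scores and the function name `matched_groups` are invented for illustration; this is one simple way to match, not a prescribed procedure.

```python
import random


def matched_groups(scores, seed=0):
    """Split participants into two groups matched on one person variable.

    scores: dict of participant id -> score on the matching variable.
    With an odd number of participants the highest-ranked one is left out.
    """
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get)
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # chance decides who of the pair goes where
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b


# Hypothetical pre-test scores for eight participants.
pretest = {f"p{i}": 10 * i for i in range(1, 9)}
group_a, group_b = matched_groups(pretest)
print(group_a, group_b)
```

Randomized groups would skip the ranking step and simply shuffle all participants, which is easier but leaves group differences on the person variable to chance.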
Factorial design:
In practice we often need a
combination of the previous designs
factors between subjects to control for unwanted sequence
effects
factors within subjects (repeated measurements) to control
for person variables

and: we still need to control for situation variables to:
keep these constant (if possible in field experiments)
measure them and apply statistical control

Example theory

based on previous observation of
phenomena, variables, and relations:


women have difficulty navigating with a 3D interface

this phenomenon disappears if the screen is sufficiently large
Example hypothesis:

women have more difficulty navigating with a 3D
interface than men, unless the screen is large
Independent variables:
gender (F/M)
interface type (2D / 3D)
screen size (Small/Large)

Dependent variable: navigation performance on set of standard tasks
operationally defined: time to click on target button (task effect?)

Confounding variables:
sequence of interface types (makes aware of navigation issues)
learning (can be handled by balancing)
Factorial design
Between subjects
gender (obvious) F/M
interface type (awareness could destroy effect) 2D/3D
makes 2*2=4 groups

Within subjects
screen size S/L
balanced for learning (at random, half of the subjects in each group S-L,
other half L-S)
for each size 10 navigation trials (to increase validity of navigation
problems)
randomly allocated to size from a set of 20 (because .?)
makes 10+10=20 trials with effect measurement per person
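The design above can be written out as an allocation plan. This is a sketch under assumptions: the group size of 4, the task numbers 1 to 20 and the function name `build_plan` are hypothetical. Gender is observed rather than assigned, so each row is a slot to be filled by a recruited participant.

```python
import itertools
import random


def build_plan(n_per_group=4, seed=0):
    """One row per participant slot in the 2 x 2 between-subjects design."""
    if n_per_group % 2:
        raise ValueError("n_per_group must be even to balance the two orders")
    rng = random.Random(seed)
    plan = []
    for gender, interface in itertools.product(("F", "M"), ("2D", "3D")):
        # Half of each group starts small (S-L), the other half large (L-S).
        orders = [("S", "L"), ("L", "S")] * (n_per_group // 2)
        rng.shuffle(orders)
        for order in orders:
            tasks = list(range(1, 21))  # assumed pool of 20 navigation tasks
            rng.shuffle(tasks)          # random allocation of tasks to sizes
            plan.append({
                "gender": gender,
                "interface": interface,
                "order": order,
                "trials": {order[0]: tasks[:10], order[1]: tasks[10:]},
            })
    return plan


plan = build_plan()
for row in plan[:2]:
    print(row)
```

Each slot gets 10 trials per screen size, and no task is used twice by the same person.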
Effects to be tested - ANOVA:
each test is statistically independent from the others
gender differences total - not a hypothesis
interface type (2D vs 3D) - not a hypothesis
screen size - not a hypothesis
sequence effects of trials and interaction with other - not a hypothesis
gender differences in relation to screen size (interaction) - not a
hypothesis
interface type in relation to screen size (interaction) - not a hypothesis

gender differences in relation to type (2D vs 3D) (interaction)

gender differences in relation to screen size and interface type
(interaction)
Stability and reliability of experiment
Reliability = reproducibility of the phenomenon in the hypothetical
case it could be repeated at the same point of time in the same
circumstances

Instability is the reverse, caused by:
1. Characteristics of the measurement technique
2. Observer bias
3. Changes in the observer (fatigue - sequence issue)
4. Changes in the situation
5. Changes in the object/person studied (aging, attitude change -
sequence issue)

4 and 5 are not always a case of unreliability; these changes may be
covered by theory (and should then be a topic of empirical study themselves)
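Instability caused by the observer can be quantified by letting two observers score the same material and comparing their codes. The sketch below uses Cohen's kappa, a common agreement index that the slides do not name; the ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    if not rater_a or len(rater_a) != len(rater_b):
        raise ValueError("both raters must score the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected if both raters assigned categories independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # a single category was used: kappa is undefined
    return (observed - expected) / (1 - expected)


# Two observers coding four fragments as behavior present (1) or absent (0).
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
print(kappa)
```

A value of 1 means perfect agreement and a value near 0 means agreement at chance level.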
