
What is it?

When to use it?

Types of Variables

Designing an Experiment

Case Study

Analyzing the data

Types of evaluation

Users not involved

Supported by practice/theory

External validity: degree to which research results apply to real situations

Large Sampling

Subjective/qualitative

Done this someway

In one form or another we have resorted to

experimenting

experimented with various types of ear plugs

experimented with different types of pacifiers

experimented with various types of snow tires

etc

Approaches: Naturalistic

Naturalistic:

describes an ongoing process evolving over time

observation occurs in realistic setting

ecologically valid

real life

External validity

degree to which research results apply to real situations

Approaches: Naturalistic

Advantage

Can state something about users' behavior in an actual environment

Disadvantage

Cannot know all the factors contributing to users' performance

e.g., do they use menus more frequently than toolbar buttons because the icons are not comprehensible, OR because the buttons are too small, OR simply because they do not know that they exist, OR ... [the list can go on]

Approaches: Experimental

In certain cases you want to make a statement about a

particular UI design choice

e.g., I really want to know whether the size of buttons contributes to how quickly users click on them

or

e.g., I want to know whether a menu designed in a circular shape (pie menu) is more effective than a regular menu

or

Want to know the effect of certain variables on outcomes

widely applicable (not restricted to your app)

Approaches: Experimental

Experimental

study relations by manipulating one or more independent

variables

experimenter controls all environmental factors

observe effect on one or more dependent variables

Internal validity

confidence that we have in our explanation of experimental

results

What are some trade-offs?

Quantitative Evaluation

What task to evaluate?

Depends on application

Attempt to find canonical task(s)

e.g., what would be a set of tasks that can be used to test whether larger icons contribute to faster selection?

Common measures

Task completion time

Error rate

Learning rate (novice -> expert transition)

Fatigue, comfort?

etc.

What task to evaluate?

Example: Pointing Device Evaluation

pointing is fundamental

abstract, elementary, essential

[Figure: pointing task with a target of width W at distance D]

Example

Is it easier to read with CAPS or without Caps?

whether CAPS are more efficient than non-Caps

for text, CAPS are 20% less efficient than non-Caps or

for text, CAPS are 25% more efficient than non-Caps

Example

How do we test this question?

a hypothesis

a set of variables we are going to manipulate

a set of variables we are going to measure

reduce the number of confounding variables

a task

a set of randomized trials
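
The last ingredient, a set of randomized trials, can be sketched as follows. This is only an illustration: the function name `randomized_trials`, the condition labels, and the number of repetitions are assumptions, not part of the slides.

```python
import random

def randomized_trials(conditions, repetitions, seed=None):
    """Return a shuffled trial list with each condition repeated equally often."""
    rng = random.Random(seed)  # a seed makes the order reproducible for the record
    trials = [c for c in conditions for _ in range(repetitions)]
    rng.shuffle(trials)
    return trials

# Hypothetical setup: two text types, four passages each.
trials = randomized_trials(["CAPS", "lowercase"], repetitions=4, seed=1)
print(trials)
```

Shuffling the order keeps practice and fatigue from lining up systematically with one condition.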

Example

THE BROWN FOX JUMPED OVER THE MOON.

OR, SHOULD IT SAY THE BROWN FOX

JUMPED OVER THE CAT.

Example

The brown fox jumped over the moon. Or, should it say the brown fox jumped over the cat.

Example

Would it be sufficient to simply show those two slides

and do some measurements?

Hypothesis

Definition: Statement or claim that the experimenter wants to test

Expresses a relationship between two types of variables

Hypothesis

H0: there is no difference in the number of cavities in children and teenagers using Crest and no-teeth toothpaste

H1: children and teenagers using Crest have fewer cavities than those who use no-teeth toothpaste

Hypothesis

H0: there is no difference in user performance (time and error rate) when selecting a single item from a pop-up or a pull-down menu, regardless of the subjects' previous expertise in using a mouse or using the different menu types

[Figure: a pop-up menu and a pull-down menu, each listing New, Open, Close, Save]

Hypothesis

Hypothesis can be softer and uncertain:

Will color affect recognition speed?

Will proximity affect perceptual organization?

Etc

Independent Variables

At least one circumstance is of major interest in an experiment

i.e. menu type in selection time experiment OR text type

Independent of the subject's behavior or performance

present (manipulate)

Nothing the subject does can change the levels of the

independent variable

CAPS vs. non-caps

experiment? What are the different levels?

Dependent Variables

Want to measure a subject's behavior in response to manipulations of the independent variable

between the independent and dependent variables is

referred to as hypothesis (as seen previously)

Control Variables

Only want to manipulate one circumstance

the independent variable

control font of two different types of menus

control color coding on two different types of visualizations

confirm that change in dependent variable due to change in

independent variable

More control leads to less generalization

Confounding Variables

A confounding variable is any factor that varies with the

independent variable

subjects respond more quickly to the last 2

subjects respond more quickly after practice

Practice confounded with speed

Random Variables

Want to avoid confounded effects; allow variables to

randomly vary: random variables

For testing effect of color on visibility of an object

choose subjects randomly from a large population

choose colors to be tested on randomly as well

Age factors, eye deficiencies, and other elements would

randomly enter into the equation (can eliminate some of these)

to select for us

Example

In the previous example what may be a hypothesis

H1: Users are slower reading CAPS

H2: There is no difference in reading rates

H3: CAPS are less memorable

independent variables?

Text type, i.e. CAPS or no Caps (Two levels)

variables?

Let's look first at the hypothesis

H1 or H2: reading speed

H3: recall after 2 hours

Example

What variables do we control?

how do we counter these?

Experimental Design

Manipulating and Measuring Variables

Within vs. Between Subjects Design

Single vs. Multiple Variable Experiment

Choosing an Independent Variable

Should be what the experimenter wants to manipulate:

Font 10 vs. 12 vs. 14 (IV=font size)

Bar graph vs. line graph (IV=type of graph)

Are children more violent after being exposed to games with

violence. What is the IV?

operational definition of violence in games?

Is there shooting/hurting/physical contact?

Are the actions moral/immoral (stealing, deceiving, etc.)?

Language abuse?

Would it be considered violent if outside the game?

Single Variable Experiment

Only one independent variable

where one is the experimental group and the other control

group), i.e. existence vs. non-existence

Advantages:

Way of finding out if IV is worth studying

Results easy to interpret and analyze

Some cases do not need more than two levels

investigating two interaction techniques

two educational methods

etc.

Single Variable Experiment

Disadvantages:

Sometimes does not say much about the relationship

between the IV and the DV

[Figure: three plots of reading time against print size (10 vs. 12)]

Single Variable Experiment

Multilevel Experiments: single variable experiments where IV has > 2

levels

[Figure: two plots against anxiety level, one with two levels (Low, High) and one with three (Low, Neutral, High)]

Advantages:

Have better handle over IV-DV relationship

The more levels added the less critical is the range of IV (balance

between realistic and large enough)

Disadvantages:

Requires more time and effort than 2-level (within-subjects increases time

for each subject, between-subjects requires additional subjects)

Statistical tests more complex

Need to know when to limit the number of levels

Multiple Variable Experiment

Most frequent design combines several variables in a factorial

combination that pairs each level of IV with the others

referred to as a factorial design

(small/medium/large)

Gives 2 x 3 design

                 Font Size
           Small    Medium    Large
Caps  Yes
      No
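
As a small illustration of the factorial combination, the cells of this 2 x 3 design can be enumerated by pairing every level of one IV with every level of the other. The variable names are assumptions made for the sketch.

```python
from itertools import product

caps_levels = ["caps", "no caps"]           # IV 1: two levels
font_sizes = ["small", "medium", "large"]   # IV 2: three levels

# Each cell of the factorial design is one (caps, font size) combination.
cells = list(product(caps_levels, font_sizes))

for caps, size in cells:
    print(f"{caps:8s} x {size}")
print(len(cells), "cells")
```

Adding a third IV multiplies the number of cells again, which is one reason factorial experiments become time-consuming.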

Multiple Variable Experiment

Advantages

Interactions between IVs can be studied (an interaction occurs when the relationship between one IV and subjects' behavior depends on the level of a second IV)

Can add additional circumstances by making them IVs

When circumstance that could add variability to the data is

made into a factor, the amount of variability decreases

Disadvantages

Time-consuming and costly

Analysis more complicated, need to typically do an ANOVA

Assumption that variability in data approximates a normal distribution (don't know until the experiment is completed)

Interpretation of results is more complex

Range of the Independent Variable

Range is the difference between the highest and lowest level

of a variable; no specific guidelines, need to fit it in the

experiment

effects will definitely be found without carrying out the

experiment

effect

If interested in the effect of font size on reading speed, choosing between font 14 vs. font 15 could lead to false conclusions

out; can test design before proceeding

Choosing a Dependent Variable

Measure of the subject's behavior

children's aggression?

Panel of judges observing playing behavior + rating

Give a selection of toys and observe how they play

Narrate frustrating stories and count number of direct-attacks

measurements

Reliability/Repeatability

Would the same results be achieved if the test were

repeated?

Experiment is perfectly reliable if you get same results each time

experiment is repeated

Problems

Individual differences:

best user 10x faster than slowest

best 25% of users ~2x faster than slowest 25%

Unreliable instruments

e.g., built in clock vs. stop watch

Partial Solution

Reasonable number and range of users tested

Correlate data from repeated measurements

Validity

Are you measuring what you think you're measuring?

Errors in equipment

Errors in procedure

Incorrect pool of subjects

Errors in questions asked, variables measured

Observable Dependent Variables

Directly observable DVs can be measured directly; indirect

DVs use secondary measures

i.e. physiological measures with a lie detector

response time to measure how much info. is processed

speed; usually not sufficiently indicative of performance

i.e. could be very fast but also very inaccurate

example gives an overall better indication of performance

i.e. more valid

combined to form one variable

Questions?

Experimental Design

Individual differences

Need more than one subject

Usually multiple subjects (n=at least 10, ideally much more)

how to distribute tasks amongst subjects?

Within vs. Between Subjects Design

Within subject design:

Pros:

All subjects do all conditions

Fewer subjects, less individual differences

Easier stats analysis

Cons:

Transfer effects

Doing 1 condition affects following condition

Often you want subjects to learn extensively

[Figure: Subjects 1 to 10 each listed under both Condition 1 and Condition 2]

Between subject design:

Pros:

Subjects only do one condition

No transfer effects

Train to high skill

Cons:

More subjects, individual differences

Harder stats analysis

[Figure: Subjects 1 to 10 listed under Condition 1, Subjects 11 to 20 under Condition 2]

Experimental Design

Order of presentation in within-subjects designs

ABBA counterbalancing:

Every subject does trials in the order: A, B, B, A

Any confounding effect (e.g., learning curve) is counterbalanced

Trial #                       1    2    3    4
Condition                     A    B    B    A
Linear confounding effect    10   20   30   40

A: 10+40 = 50

B: 20+30 = 50

B: 30+50 = 80
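
A minimal sketch of why ABBA ordering cancels a linear confound but not necessarily a non-linear one. The linear values are the ones on the slide; the non-linear values are an assumption chosen for illustration (only the middle two, 30 and 50, are suggested by the line "B: 30+50 = 80").

```python
def condition_totals(order, confound):
    """Sum the confounding effect that each condition picks up across trials."""
    totals = {}
    for condition, effect in zip(order, confound):
        totals[condition] = totals.get(condition, 0) + effect
    return totals

order = "ABBA"
linear = [10, 20, 30, 40]       # from the slide
nonlinear = [10, 30, 50, 60]    # assumed values, e.g. a learning curve that flattens

linear_totals = condition_totals(order, linear)
nonlinear_totals = condition_totals(order, nonlinear)

print(linear_totals)      # A and B receive the same total
print(nonlinear_totals)   # A and B receive different totals
```

With the linear effect both conditions absorb the same total, so the confound is counterbalanced. With the assumed non-linear effect the totals differ, so ABBA alone would not remove it.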

Experimental Design

Order of presentation in within-subjects designs

Make order a between-subjects variable

Fully counterbalanced:

Two conditions: AB, BA

Three conditions: ABC, ACB, BAC, BCA, CAB, CBA

Combinatorial explosion when n>4

Needs lots of subjects
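
The growth in the number of orders can be sketched with the standard library. The helper name `all_orders` is an assumption made for this illustration.

```python
from itertools import permutations
from math import factorial

def all_orders(conditions):
    """Every possible presentation order of the given conditions."""
    return ["".join(p) for p in permutations(conditions)]

print(all_orders("ABC"))

# Full counterbalancing needs n! orders, and at least one subject per order.
for n in range(2, 7):
    print(n, "conditions ->", factorial(n), "orders")
```

Five conditions already require 120 orders, which is why partial counterbalancing is used for larger designs.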

Experimental Design

Order of presentation in within-subjects designs

Partial counterbalancing. e.g., Latin square:

Ensures each level appears in every position in order equally

often

n rows x n columns and each treatment occurs once in each

row and in each column

ABC

BCA

CAB

Each condition precedes and follows each of the other

conditions equally often:

ABCD

BDAC

DCBA

CADB
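
The two properties described above can be checked mechanically. The sketch below builds a simple cyclic Latin square and tests both the basic Latin-square property and the stricter "precedes and follows equally often" property. The function names are assumptions, and the cyclic construction is just one simple way to build such a square.

```python
from collections import Counter

def cyclic_latin_square(conditions):
    """Row r is the condition list rotated left by r positions."""
    n = len(conditions)
    return ["".join(conditions[(r + c) % n] for c in range(n)) for r in range(n)]

def is_latin_square(rows):
    """Each condition occurs exactly once in every row and every column."""
    n = len(rows)
    symbols = set(rows[0])
    if len(symbols) != n:
        return False
    if not all(len(row) == n and set(row) == symbols for row in rows):
        return False
    return all({row[c] for row in rows} == symbols for c in range(n))

def is_balanced(rows):
    """Every ordered pair 'x immediately followed by y' occurs equally often."""
    n = len(rows)
    pairs = Counter((row[i], row[i + 1]) for row in rows for i in range(n - 1))
    return len(pairs) == n * (n - 1) and len(set(pairs.values())) == 1

three = cyclic_latin_square("ABC")
four = ["ABCD", "BDAC", "DCBA", "CADB"]   # the 4 x 4 square from the slide

print(three, is_latin_square(three), is_balanced(three))
print(four, is_latin_square(four), is_balanced(four))
```

The cyclic 3 x 3 square puts every condition in every position once, but A is always followed by B, so it is not balanced for order. The 4 x 4 square passes both checks.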

Experimental Design

Why counterbalance?

Reduce transfer effects

A-B transfer == B-A transfer

If asymmetric transfer

i.e., A-B transfer > or < B-A transfer, then use a between-subjects design

Range effects

People tend to perform best in middle of range of trials

does between-subjects design solve this?

Context effect: when one level of the IV is used, subjects establish a context

Activity

How would you carry out the experiment for

comparing CAPS to non-caps, i.e. what would be

your design?

Activity

Design an experiment to compare a pop-up linear

menu vs. a pie menu

Hypothesis?

IV?

DV?

Design?

Task (s)?

[Figure: a linear menu and a pie menu listing shift options (Day, Evening, Night, Split)]

Activity

Design an experiment to test whether adding color

coding to a menu interface improves accuracy?

Subjects?

Hypothesis?

IV?

DV?

Design?

Task (s)?

Activity

Only one form of solution, many others exist

Subjects: Taken from user population

Hypothesis: Color coding will make selection more accurate

IV: Color coding

DV: Accuracy measured as number of errors

Design: between groups to ensure no transfer of learning (or

within groups with appropriate safeguards if subjects are scarce)

Task: the interfaces are identical in each of the conditions, except that, in the second, color is added to indicate related menu items. Subjects are presented with a screen of menu choices (ordered randomly) and verbally told what they have to select. Selection must be done within a strict time limit, after which the screen clears. Failure to select the correct item is deemed an error. Each presentation places items in new positions. Subjects perform in one of two conditions.

Example

The Effect of Shading in Extracting Structure

from

Space-Filling Visualizations

Motivation

Hierarchies are abundant and interacted with on a

regular basis

explicit

navigation complexity increases with size

Space-Filling Visualization

Developed to make more efficient use of display

space

i.e.: Treemap [Shneiderman, 1990]

of showing node size

information?

CushionMap: Shaded Treemap

a 2-D impression, to make structure more explicit

[van Wijk, 1999]

Structure-from-Shading (1)

information early on

[Enns & Rensink, 1990]

Structure-from-Shading (2)

the shape of an object [Sun and Perona, 1996]

information [Ramachandran, 1988]

Structure-from-Shading (3)

Shading useful in extracting structure information in

node-link diagrams [Irani and Ware, 2001]

Structure-from-Shading (4)

Some evidence that shading impairs size judgments

et al, 1991]

counterpart [Zacks et al, 1998]

Study Methodology

Hypotheses

Participants

Apparatus and task

Experimental factors

Study Design

Experiment - Hypotheses

Hypothesis 1: shading (CM) will result in higher performance on

structure related tasks than the no-shading condition (TM)

tasks related to file and directory size comparisons than the

no-shading condition (TM)

Participants

20 undergraduate students (paid) participated

TM first

management tasks/routines

Experiment Method

Half started on TreeMap (TM) the other half on

CushionMap (CM)

and {TM-H2, CM-H1}.

Experiment Tasks

Tasks divided into two major categories:

Structure-based

Count the number of directories in the hierarchy

Find the directory with the largest number of files

Count the number of subdirectories in a given directory

Count the number of files in a given subdirectory

Find the directory with the largest number of bitmap (.bmp) files

Count the number of sub-directories that contain bitmap

(.bmp) files

Size-based

Find the smallest directory in the hierarchy

Find the largest file in the hierarchy

Find the largest file in a given directory

Find the largest mp3 file in the hierarchy

Experiment Measurements

Measure: subject's performance on each task with respect to two variables:

time until completion (0 to 45 seconds)

successful/unsuccessful completion (0/1)

completion time calculations

Experiment Results (2)

                                  Structure          Size

Completion time (seconds)         CM = 16.2 (3.7)    CM = 20.2 (5.4)

Average # of tasks                TM = 2.7 (1.5)     TM = 3.4 (0.7)
successfully completed            CM = 4.9 (0.8)     CM = 3.1 (0.9)

[Figure: bar charts comparing TM and CM on structure and size tasks]

Experiment Results (3)

Structure Size

TM (p=0.0021) between CM and TM

Success accurate on CM over TM between CM and TM

(p<0.001)

Experiment Subjective Evaluation

Statement TM CM

1. I was able to count the number of directories using toolname. 3.65 4.40

2. I was able to find the bitmap (.bmp) files using toolname. 3.70 4.60

3. I was able to detect the type of files using toolname. 3.95 4.55

5. I was able to find the files inside a sub-directory using toolname. 3.05 3.95

6. I was able to find the largest file using toolname. 3.50 3.95

7. I was able to compare the sizes of files using toolname. 3.30 3.90

8. I was able to find the largest directory using toolname. 3.70 4.40

9. After the training session I knew how to use toolname. 4.00 4.35

Discussion

[Figure: Level of Support for Tasks Based on Size, rated Low, Medium, High, or Very High, for space-filling visualizations of a hierarchy with nodes n0 to n10, including Sunburst]

Discussion

Tested the effect of shading on non-explicit structures (CM vs.

TM)

Users were faster and more accurate in completing directory

management tasks with the shaded hierarchies

effect of shading for size-based tasks

space-filling techniques

Recap

Choosing IVs and DVs

Range of IVs

Questions?

Interpreting Experimental Results

Plotting Frequency Distributions

Statistics for Describing Distributions

Plotting Relationships Between Variables

Describing the Strength of a Relationship

Interpreting Results from Factorial Experiments

Inferential Statistics

Statistical analysis

Calculations that tell us

mathematical attributes about our data sets

mean, amount of variance, ...

whether we are sampling from the same or different distributions

statistical significance

Questions one might ask

Is there a difference?

Is one system better than another?

Techniques addressing this are called hypothesis testing

The answers are not simply yes/no, but of the form: "we are 99% certain that selection from 5-item menus is faster than from 7-item menus"

i.e. selection from 5 items is 270 ms faster than from 7 items

Called point estimation, often obtained by averages

i.e. selection is faster by 270 +/- 30 ms

Answers to this are in the form of standard deviations or

confidence intervals

we are 95% certain that the difference in response time is

between 240 and 310 ms

Interpreting Results

First two rules:

Look at the data

a graph, histogram or table of results could be more instructive

Exposes outliers, which need to be removed to avoid biases

Save the data

May want to try different analyses on the data

Trace back the analysis to the raw data collected

questions to be answered

Plotting Frequency Distributions

Plot a frequency distribution telling us how frequently each

score appears in the data

Frequency is the number of raw data points that fall into each

score category

between conditions

Want to determine whether video game player who plays racing

games is more comfortable (less anxious) with fast drivers

Plotting Frequency Distributions

Subject   Game Player     Subject   Non-Player
1         62              11        55
2         56              12        42
3         67              13        61
4         91              14        58
5         53              15        70
6         87              16        47
7         51              17        62
8         63              18        36
9         46              19        74
10        71              20        51

[Figure: frequency histograms of the Game Player and Non-Player scores, in bins of ten from 0-9 to 90-99]

By looking at distributions we can
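
As an illustration, the frequency distributions for the two groups can be tallied from the scores listed on this slide. The function name and the bin width of 10 (matching the histogram axis labels) are assumptions of this sketch.

```python
from collections import Counter

players = [62, 56, 67, 91, 53, 87, 51, 63, 46, 71]
non_players = [55, 42, 61, 58, 70, 47, 62, 36, 74, 51]

def frequency_distribution(scores, bin_width=10):
    """Count how many raw scores fall into each score category (bin)."""
    counts = Counter((s // bin_width) * bin_width for s in scores)
    return {f"{lo}-{lo + bin_width - 1}": counts[lo] for lo in sorted(counts)}

player_freq = frequency_distribution(players)
non_player_freq = frequency_distribution(non_players)

for label, freq in (("Game Player", player_freq), ("Non-Player", non_player_freq)):
    print(label)
    for category, count in freq.items():
        print(f"  {category}: {'#' * count}")
```

Printing one `#` per score gives a rough text histogram, which is often enough to spot outliers before running any test.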

Plotting Frequency Distributions

A normal distribution fits a complex mathematical formula. For our purposes, a distribution is normal if it fits a bell-shaped curve

can apply appropriate statistical tests

single number representing how subjects performed

Statistics for Describing Distributions

Typically use two types of statistics: descriptive and inferential

experimenter to describe some characteristics

Statistics for Describing Distributions

One important descriptor is the location of the middle of a

distribution (central tendency)

and below it

average plays, and your judgment

outliers vs. no outliers

Statistics for Describing Distributions

Another important statistic is the measure of dispersion, or how

spread out the scores are

from the mean, squaring these, adding them up, and dividing

by number of scores

A smaller standard deviation indicates that the mean represents the scores with less error
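
A short sketch of these descriptive statistics, using the Game Player scores from the earlier slide. The variance follows the recipe described here (squared deviations from the mean, summed, divided by the number of scores). Treating the standard deviation as its square root is the usual definition and is assumed rather than quoted from the slides.

```python
import math
import statistics

players = [62, 56, 67, 91, 53, 87, 51, 63, 46, 71]

mean = statistics.mean(players)      # central tendency: arithmetic average
median = statistics.median(players)  # central tendency: middle score

# Dispersion: average squared deviation from the mean, then its square root.
squared_deviations = [(score - mean) ** 2 for score in players]
variance = sum(squared_deviations) / len(players)
std_dev = math.sqrt(variance)

print(f"mean={mean}, median={median}, variance={variance:.2f}, std dev={std_dev:.2f}")
```

Dividing by the number of scores gives the population variance. Many statistics packages divide by n - 1 instead when the scores are a sample, so check which one a tool reports.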

Plotting Relationships Between Variables

Reason for experiment is to determine if there is a relationship

between IV and DV

relationship

If IV levels cannot be represented by numbers use bar graphs

If IV is continuous use histogram or line graph

Plotting Relationships Between Variables

[Figure: bar graph of comfort scores for players (P) and non-players (NP); line graph of comfort scores for players after several months of gaming, x-axis 1 to 5; both y-axes 0 to 70]

Strength of a Relationship

The previous graphs were functions of a descriptive statistic

rather than that of individual points

If you use raw data you will very likely find some variability or spread: a scatter plot

Scatterplots

+.87 - 1.0

Strength of a Relationship

Correlation:

Measures the extent to which two concepts are related

e.g. years of university training vs. computer ownership per

capita

How?

obtain the two sets of measurements

calculate correlation coefficient

+1: positively correlated

0: no correlation (no relation)

-1: negatively correlated
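
A minimal sketch of the calculation, using the fourteen condition 1 / condition 2 pairs that accompany the scatter plot on these slides. The function name `pearson_r` is an assumption. The formula is the standard Pearson correlation coefficient, which the slides do not name explicitly.

```python
import math

condition_1 = [5, 4, 6, 4, 5, 3, 5, 4, 5, 6, 6, 7, 6, 7]
condition_2 = [6, 5, 7, 4, 6, 5, 7, 4, 7, 7, 6, 7, 8, 9]

def pearson_r(xs, ys):
    """Pearson correlation coefficient: covariation divided by the product of spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(condition_1, condition_2)
print(f"r = {r:.3f}, r squared = {r * r:.3f}")
```

Squaring the coefficient gives the proportion of variance shared by the two measurements, which is the r2 value quoted with the scatter plot.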

Strength of a Relationship

r2 = .668

condition 1   condition 2
5             6
4             5
6             7
4             4
5             6
3             5
5             7
4             4
5             7
6             7
6             6
7             7
6             8
7             9

[Figure: scatter plot of the pairs above; x-axis 2.5 to 7.5 (pickles eaten per month), y-axis 3 to 10]

Correlation

Pickles eaten per month   Salary (*10,000)
5                         6
4                         5
6                         7
4                         4
5                         6
3                         5
5                         7
4                         4
5                         7
6                         7
6                         6
7                         7
6                         8
7                         9

[Figure: scatter plot of salary against pickles eaten per month]

Which conclusion could be correct?

- Eating pickles causes your salary to increase

- Making more money causes you to eat more pickles

- Pickle consumption predicts higher salaries because older people tend to like pickles better than younger people, and older people tend to make more money than younger people

Correlation

Dangers

attributing causality

a correlation does not imply cause and effect

cause may be due to a third hidden variable related to

both other variables

unreliable with small groups

be wary of accepting anything more than the direction of correlation unless you have at least 40 subjects

Correlation

Cigarette Consumption

lung cancer in 1950 per capita

consumption of cigarettes in

1930 in various countries.

can you prove that cigarette

smoking causes death from this

data?

age

poverty

Regression

Calculates a line of best fit

Use the value of one variable to predict the value of the other

e.g., 60% of people with 3 years of university own a computer

y = .988x + 1.132, r2 = .668

condition 1   condition 2
5             6
4             5
6             7
4             4
5             6
3             5
5             7
4             4
5             7
6             7
6             6
7             7
6             8
7             9

[Figure: scatter plot of condition 2 (y-axis, 3 to 10) against condition 1 (x-axis, 3 to 7) with the line of best fit]
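
The line of best fit quoted on this slide can be reproduced with ordinary least squares. This is a sketch: the function name `least_squares_line` is an assumption, and least squares is the conventional fitting method rather than something the slide states.

```python
condition_1 = [5, 4, 6, 4, 5, 3, 5, 4, 5, 6, 6, 7, 6, 7]
condition_2 = [6, 5, 7, 4, 6, 5, 7, 4, 7, 7, 6, 7, 8, 9]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimizing squared vertical error."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = least_squares_line(condition_1, condition_2)
print(f"y = {slope:.3f}x + {intercept:.3f}")

# Use the value of one variable to predict the other.
predicted = slope * 5 + intercept
print(f"predicted condition 2 score when condition 1 is 5: {predicted:.2f}")
```

The fitted coefficients agree with the equation on the slide to three decimal places.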

Interpreting Results from Factorial Experiments

Example:

time it takes subjects to read paragraphs typed in 12-point or 10-point print

8-year olds in one group, 12-year olds in another group

an effect on the dependent variable

Is there an effect of print size? (main effect)

Is there an effect of age? (main effect)

Does the effect of one variable depend on the level of the other?

(interaction)

Interpreting Results from Factorial Experiments

Main Effects

To evaluate main effects of an IV must average across levels of

the other variable

between the two levels of age at each level of print size

We observe that a change in print size (10-point to 12-point) causes a change in the DV (time): yes, there is a main effect of print size

between the two levels of print size at each level of age

We observe that a change in age (increase) causes a change in the DV (time decreases): yes, there is a main effect of age

Interpreting Results from Factorial Experiments

Main effect of print size? yes

Main effect of age? yes

[Figure: plots of reading time (10 to 40) against print size (10, 12), with separate lines for 8-year-olds and 12-year-olds]

Interpreting Results from Factorial Experiments

Interactions

To determine whether the IVs interact we must ask:

is the effect of print size different for each age? (or)

is the effect of age different for each print size?

1st question:

we see that going from 10-point to 12-point causes a decrease in

reading time for 8-year old but no diff for 12-year old

2nd question:

we see that the difference between reading times for the two

ages is larger for 10-point than for 12-point
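
The questions above can be turned into simple arithmetic on the cell means. The numbers below are hypothetical, chosen only to match the pattern described (a decrease for 8-year-olds, no difference for 12-year-olds). They are not data from the slides, and the helper name is an assumption.

```python
# Hypothetical mean reading times (seconds) for each (age, print size) cell.
cell_means = {
    ("8 years", 10): 40, ("8 years", 12): 30,
    ("12 years", 10): 20, ("12 years", 12): 20,
}

def marginal_mean(cells, factor_index, level):
    """Average the cells at one level of a factor, across levels of the other."""
    values = [v for key, v in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)

# Main effects: compare marginal means.
print_size_effect = marginal_mean(cell_means, 1, 12) - marginal_mean(cell_means, 1, 10)
age_effect = marginal_mean(cell_means, 0, "12 years") - marginal_mean(cell_means, 0, "8 years")

# Interaction: does the print-size effect differ between the two ages?
effect_for_8 = cell_means[("8 years", 12)] - cell_means[("8 years", 10)]
effect_for_12 = cell_means[("12 years", 12)] - cell_means[("12 years", 10)]
interaction = effect_for_8 - effect_for_12

print(print_size_effect, age_effect, interaction)
```

A non-zero interaction value corresponds to non-parallel lines in the plot. Whether any of these differences is statistically reliable is a separate question, answered by a test such as ANOVA.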

Interpreting Results from Factorial Experiments

Interaction? yes

[Figure: two plots of time against print size (10, 12), with separate lines for 8-year-olds and 12-year-olds]

Activity

[Figure: two plots of time against print size (10, 12), with separate lines for 8-year-olds and 12-year-olds]

First plot: Print size? No. Age? Yes. Interaction? No

Second plot: Print size? Yes. Age? Yes. Interaction? No

Inferential Statistics

In many experiments testing one design against

another

i.e. the independent variable is usually discrete

Discrete: takes on a finite number of values (screen color)

Continuous: takes on any value (person's height, time to complete task)

Special case when continuous variable is positive (response

time cannot be < 0)

Choosing a Statistical Technique

                 Independent   Dependent
                 Variable      Variable      Technique

Parametric       Discrete      Normal        ANOVA (ANalysis Of VAriance)
                 Continuous    Normal        Linear (non-linear) regression, factor analysis

Non-parametric   Discrete      Continuous    Rank-sum versions of ANOVA
                 Continuous    Continuous    Spearman's rank correlation

Questions?

