
Chapter 2

Collecting Data Sensibly


Note: Correct usage of the vocabulary in this
chapter is VERY important!
Consider the following headlines which
occurred on September 25, 2009.

“Spanking lowers a child’s IQ” (Los Angeles Times)

“Do you spank? Studies indicate it could lower
your kids’ IQ.” (SciGuy, Houston Chronicle)

“Spanking can lower IQ” (NBC4i, Columbus, Ohio)

“Smacking hits kids’ IQ” (newscientist.com)

These headlines imply that spanking is the CAUSE of
the observed difference in IQ.
Is this conclusion reasonable?

In this study, two groups of children were followed
for 4 years; 806 children ages 2 to 4 and 704
children ages 5 to 9. IQ was measured at the
beginning of the study and again four years later.
Researchers found that the average IQ of children,
ages 2 to 4, who were not spanked was 5 points
higher than those who were spanked and 2.8 points
higher for children, ages 5 to 9.
Observation versus Experimentation
Look at the following two examples:

•A social scientist studying a rural community wants to
determine whether gender and attitudes toward abortion
are related. Using a telephone survey, 100 residents
are contacted at random and their gender and attitude
toward abortion are recorded.

•A professor might wonder what would happen to final
test scores if the required lab time for a chemistry
course is increased from 3 hours to 6 hours. For 100
chemistry students, half were randomly assigned to the
3-hour lab and half to the 6-hour lab. The rest of the
course remained the same for the two groups. The
difference in their final test scores will be examined.

How do these two examples differ? Think about:
• How were the groups determined?
• Were any variables controlled?
• What did the researcher do?

Which is the experiment and which is the
observational study?
Definitions:
Observational study – a study in which the
researcher observes characteristics of
a sample selected from one or more
populations.

Experiment – a study in which the
researcher observes how a response
variable behaves when one or more
explanatory variables (factors) are
manipulated.

A well-designed experiment can result in
data that provides evidence for a cause-
effect relationship.
Let’s return to the study on spanking and IQ

In this study, two groups of children were followed for 4 years; 806
children ages 2 to 4 and 704 children ages 5 to 9. IQ was measured
at the beginning of the study and again four years later. Researchers
found that the average IQ of children, ages 2 to 4, who were not
spanked was 5 points higher than those who were spanked and 2.8
points higher for children, ages 5 to 9.

Does spanking “CAUSE” a decrease in IQ?
Why or why not?

Are there other variables connected to the response
(decreased IQ) and the groups of children?

These are called confounding variables.
Definition:
Confounding variable – a variable that is
related to both group membership and
the response variable of interest in a
research study

Because observational studies may


contain confounding variables, their
results can NOT be used to show cause-
effect relationships.
• Observational studies CAN be
generalized to the population if the
sample is randomly selected from the
population of interest, but CANNOT
show cause-effect relationships.

• Well-designed experiments CAN show
cause-effect relationships, but CANNOT
be generalized to the population if the
subjects are volunteers or are not
randomly selected from that population.
Sampling

Section 2.2
Census versus Sample
Obtaining information about the entire
population is called a census.
Why might we prefer to select a
sample rather than perform a census?
1. Measurements that require destroying
the item
   Measuring how long batteries last
   Safety ratings of cars
2. Difficult to find entire population
   Length of fish in a lake
3. Limited resources (most common reason to use a sample)
   Time and money
Methods of selecting random samples

Simple Random Sample (SRS)


A sample of size n is selected from the
population in a way that ensures that every
different possible sample of the desired size
has the same chance of being selected.

Suppose a local school has 2000 students. We
want to survey 100 students about the current
cell phone policy. A sample of students can be
selected by putting each student’s name on
individual (but identical) slips of paper and
placing them in a large container. After mixing
well, randomly select 100 names from the
container, one at a time.
This is an example of a simple random sample.

It has to be possible for all 100 students
in the sample to be seniors – or any other
combination of students!
A simple random sample does NOT
guarantee that the sample is
representative of the population.
Methods of selecting random samples

Simple Random Sample (SRS) continued

Another way to select a simple random sample is
to create a list of all the students in the
school (called a sampling frame).

Sampling frame – list of all the objects or
individuals in the population.

A sample of size n is selected from the
population in a way that ensures that every
different possible sample of the desired size
has the same chance of being selected.

Number each student with a unique number
from 1 to 2000. Use a random digit table or
random number generator (a calculator or
computer software) to select the 100 students
for the sample.
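
The numbering approach on this slide can be sketched with Python's standard random module. This is only an illustration: the function name simple_random_sample and the seed argument are assumptions made for the example, and the ID numbers 1 to 2000 stand in for the school's sampling frame.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw ID numbers 1..population_size without replacement (hypothetical helper)."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    sampling_frame = range(1, population_size + 1)  # stand-in for the student list
    # random.sample gives every possible subset of this size the same chance
    return rng.sample(sampling_frame, sample_size)

sample = simple_random_sample(population_size=2000, sample_size=100, seed=1)
print(sorted(sample)[:10])  # first few selected student numbers
```

Every run with a different seed gives a different set of 100 students, and nothing prevents an unrepresentative sample from turning up.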
How to use a Random digit table
The following is part of the random digit table
found in the back of your textbook:
Row
6 0 9 3 8 7 6 7 9 9 5 6 2 5 6 5 8 4 2 6 4
7 4 1 0 1 0 2 2 0 4 7 5 1 1 9 4 7 9 7 5 1
8 6 4 7 3 6 3 4 5 1 2 3 1 1 8 0 0 4 8 2 0
9 8 0 2 8 7 9 3 8 4 0 4 2 0 8 9 1 2 3 3 2


Since our students are numbered 1-2000, we will select
4-digit numbers from the table. If the number is not
within 1-2000, we will ignore it.

We would continue in this fashion until we had selected
100 numbers.
It would be faster to use a random number generator.
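
The table-reading rule can be traced in a few lines of Python. The sketch below assumes the digits are read left to right, four at a time, row after row, and that repeated numbers are skipped; the slide does not state the reading direction, so treat that as an assumption. The digit strings are copied from rows 6 to 9 of the excerpt with the row labels dropped, and select_from_digit_table is a hypothetical helper name.

```python
# Digits from rows 6-9 of the table excerpt (row labels removed).
ROWS = [
    "09387679956256584264",
    "41010220475119479751",
    "64736345123118004820",
    "80287938404208912332",
]

def select_from_digit_table(rows, population_size, width=4):
    """Read width-digit numbers in order; keep those in 1..population_size, no repeats."""
    digits = "".join(rows)
    selected = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        if 1 <= number <= population_size and number not in selected:
            selected.append(number)  # numbers outside the range are ignored
    return selected

chosen = select_from_digit_table(ROWS, population_size=2000)
print(chosen)  # [938, 220, 1947, 1231, 1800, 891]
```

Only 6 of the 20 four-digit groups fall within 1-2000, which shows why reaching 100 students from a table takes a while.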
Methods of selecting random samples
Simple Random Sample (SRS) continued

A sample of size n is selected from the population
in a way that ensures that every different
possible sample of the desired size has the same
chance of being selected.

Most often sampling is done without
replacement. That is, once an individual or
object is selected, they are not replaced and
cannot be selected again.
Sampling with replacement allows an object or
individual to be selected more than once for a
sample.
Although sampling with and without replacement
are different, they can be treated as the same
when the sample size n is relatively small
compared to the population size (no more than
10% of the population).
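
A small Python sketch can make the distinction concrete. It assumes the 2000-student school from the earlier slides; within_ten_percent is a hypothetical helper that simply encodes the 10% guideline stated above.

```python
import random

rng = random.Random(0)            # seeded only so the illustration is repeatable
students = list(range(1, 2001))   # student numbers 1..2000

# Without replacement: no student can appear twice.
without_replacement = rng.sample(students, k=100)

# With replacement: the same student may be drawn more than once.
with_replacement = rng.choices(students, k=100)

def within_ten_percent(sample_size, population_size):
    """True when the sample is no more than 10% of the population."""
    return sample_size <= 0.10 * population_size

print(len(set(without_replacement)))   # always 100 distinct students
print(len(set(with_replacement)))      # may be fewer than 100
print(within_ten_percent(100, 2000))   # 100 is 5% of 2000
```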
Methods of selecting random samples
Stratified Random Sample

• Population is divided into non-overlapping
subgroups called strata
• Simple random samples are selected from each
stratum
• Sometimes easier to implement and is more cost
effective than simple random sampling
• Sometimes allows more accurate inferences
about a population than simple random sampling

Strata are groups that are similar
(homogeneous) based upon some
characteristic of the group members.

Instead of a simple random sample to answer
our survey about the cell phone policy at
school, suppose we were to take four simple
random samples of size 25, one from each grade
level: freshman, sophomore, junior, and senior.
This would be an example of a stratified
random sample.
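
The grade-level example could be sketched as follows. The roster is made up for illustration (500 students per grade is an assumption, not a figure from the slides), and stratified_sample is a hypothetical helper.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """Take a separate simple random sample of per_stratum members from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Assumed roster: 500 students in each grade level.
grades = ["freshman", "sophomore", "junior", "senior"]
strata = {g: [f"{g}-{i}" for i in range(1, 501)] for g in grades}

sample = stratified_sample(strata, per_stratum=25, seed=2)
for grade, chosen in sample.items():
    print(grade, len(chosen))
```

Each grade contributes exactly 25 students, which a single simple random sample of 100 would not guarantee.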
Methods of selecting random samples

Cluster Sampling

• Population is divided into non-overlapping
subgroups called clusters
• Randomly select clusters and then all the
individuals in the clusters are included in the
sample
• Cluster sampling is often best if the
clusters are heterogeneous
subgroups from the population.

Clusters are often based upon
location. It is easier to perform
and more cost effective.

Let’s look at another way to select a sample of
students to answer our survey on the current
cell phone policy at our school. One way to do
this would be to randomly select 5 classrooms
during 2nd period. Survey all the students in
those rooms!
This is an example of a cluster sample.
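
A sketch of the classroom example, under made-up numbers: 80 second-period classrooms of 25 students each is an assumption for illustration, and cluster_sample is a hypothetical helper.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Assumed school layout: 80 classrooms with 25 students each during 2nd period.
classrooms = {
    f"room-{r}": [f"room-{r}-student-{s}" for s in range(1, 26)]
    for r in range(1, 81)
}

chosen_rooms, sample = cluster_sample(classrooms, n_clusters=5, seed=3)
print(chosen_rooms)
print(len(sample))  # 5 rooms x 25 students
```

The randomness is in which rooms are picked; once a room is picked, nobody inside it is left out.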
Methods of selecting random samples
Systematic Sampling
• A value k is specified (for example k = 50 or
k = 200).
• One of the first k individuals is selected at
random.
• Then every kth individual in the sequence is
included in the sample.
• This method works reasonably well as long as
there are no repeating patterns in the
population list.

Suppose we randomly select a number between
1 and 20. Using an alphabetical list of students
at our school, select the student whose name
is at that number in the list. Then choose
every 20th student from there.
This is an example of a systematic random
sample.
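
The steps above translate almost directly into list slicing. The roster below is a made-up alphabetical list of 2000 placeholder names, and systematic_sample is a hypothetical helper.

```python
import random

def systematic_sample(ordered_list, k, seed=None):
    """Pick a random start among the first k entries, then take every kth entry."""
    rng = random.Random(seed)
    start = rng.randint(1, k)          # 1-based position, both ends included
    return ordered_list[start - 1::k]  # slicing steps through the list by k

# Assumed roster: 2000 students already in alphabetical order.
roster = [f"student-{i:04d}" for i in range(1, 2001)]

sample = systematic_sample(roster, k=20, seed=4)
print(sample[:3])
print(len(sample))  # 2000 / 20 = 100 students
```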
Identify the sampling design
1)The Educational Testing Service (ETS)
needed a sample of colleges. ETS first
divided all colleges into groups of similar
types (small public, small private, medium
public, medium private, large public, and
large private). Then they randomly selected
3 colleges from each group.

Stratified random
sample
Identify the sampling design
2) A county commissioner wants to survey
people in her district to determine their
opinions on a particular law up for adoption.
She decides to randomly select blocks in
her district and then survey all who live on
those blocks.

Cluster sampling
Identify the sampling design
3) A local restaurant manager wants to survey
customers about the service they receive.
Each night the manager randomly chooses a
number between 1 & 10. He then gives a
survey to that customer, and to every 10th
customer after them, to fill it out before
they leave.

Systematic sampling
Consider the following example:

In 1936, Franklin Delano Roosevelt had been President
for one term. The magazine, The Literary Digest,
predicted that Alf Landon would beat FDR in that
year's election by 57 to 43 percent. The Digest
mailed over 10 million questionnaires to names drawn
from lists of automobile and telephone owners, and
over 2.3 million people responded - a huge sample.
At the same time, a young man named George Gallup
sampled only 50,000 people and predicted that
Roosevelt would win. Gallup's prediction was ridiculed
as naive. After all, the Digest had predicted the
winner in every election since 1916, and had based its
predictions on the largest response to any poll in
history. But Roosevelt won with 62% of the vote. The
size of the Digest's error is staggering.

This is a classic example of how bias affects
the results of a sample!
Bias is the tendency for samples to differ
from the corresponding population in some
systematic way.
Sources of bias
Selection bias

• Occurs when the way the sample is selected
systematically excludes some part of the
population of interest – called
undercoverage
• May also occur if only volunteers or self-
selected individuals are used in a study

Suppose you take a sample by randomly
selecting names from the phone
book – some groups will not have
the opportunity of being selected!
People with unlisted phone numbers – usually
high-income families
People without phone numbers – usually
low-income families
People with ONLY cell phones – usually young
adults
Sources of bias

Convenience sampling
• Using an easily available or convenient group to
form a sample.
– The group may not be representative of the
population of interest
– Results should not be generalized to
the population
• Can also occur when samples rely entirely on
volunteers to be part of the sample – called
voluntary response

Suppose we decide to survey only the
students in our statistics class – why
might that cause bias in a survey?

An example would be the surveys in
magazines that ask readers to mail in the
survey. Other examples are call-in shows,
American Idol, etc.
Remember, the respondent selects
themselves to participate in the survey!
Sources of bias
Measurement or Response bias
• Occurs when the method of observation tends
to produce values that systematically differ
from the true value in some way
– Improperly calibrated scale is used to weigh items
– Tendency of people not to be completely honest
when asked about illegal behavior or unpopular
beliefs
– Appearance or behavior of the person asking the
questions
– Questions on a survey are worded in a way that
tends to influence the response

Suppose we wanted to survey high school
students on drug abuse and we used a
uniformed police officer to interview
each student in our sample – would we
get honest answers?

People are asked if they can trust men with
mustaches – the interviewer is a man with
a mustache.

A Gallup survey sponsored by the American Paper
Institute (Wall Street Journal, May 17, 1994) included
the following question: “It is estimated that disposable
diapers account for less than 2% of the trash in
today’s landfills. In contrast, beverage containers,
third-class mail and yard waste are estimated to
account for about 21% of trash in landfills. Given this,
in your opinion, would it be fair to tax or ban
disposable diapers?”
Sources of bias

Nonresponse

• Occurs when responses are not obtained from
all individuals chosen for inclusion in the
sample
• To minimize nonresponse bias, it is critical that
a serious effort be made to follow up with
individuals who did not respond to the initial
request for information
How might this follow-up be done?

The phone rings – you answer. “Hello,”
the person says, “do you have time for a
survey about radio stations?”
You hang up!

People are selected by the researchers,
BUT refuse to participate.
NOT self-selected!
This is often confused with voluntary
response!
Identify a potential source of bias.
1) Before the presidential election of 1936, FDR
against Republican Alf Landon, the magazine
Literary Digest predicted that Landon would win
the election in a 3-to-2 victory, based on a survey
of 2.3 million people. George Gallup surveyed only
50,000 people and predicted that Roosevelt
would win. The Digest’s survey came from
magazine subscribers, car owners, telephone
directories, etc.

Undercoverage – since the Digest’s survey comes
from car owners, etc., the people selected were
mostly from high-income families and thus mostly
Republican! (other answers are possible)
Identify a potential source of bias.

2) Suppose that you want to estimate the
total amount of money spent by students
on textbooks each semester at a local
college. You collect register receipts for
students as they leave the bookstore
during lunch one day.

Convenience sampling – easy way to
collect data
or
Undercoverage – students who buy books
from on-line bookstores are excluded.
Identify a potential source of bias.

3) To find the average value of a home in


Plano, one averages the price of homes
that are listed for sale with a realtor.

Undercoverage – leaves out homes that


are not for sale or homes that are listed
with different realtors.
(other answers are possible)
Comparative Experiments

Sections 2.3 & 2.4


Suppose we are interested in determining the
effect of room temperature on the
performance on a first-semester calculus exam.
So we decide to perform an experiment.

What variable will we “measure”?
the performance on a calculus exam
This is called the response variable.
Response variable – a variable that is not
controlled by the experimenter and that is
measured as part of the experiment

What variable will “explain” the results on the
calculus exam?
the room temperature
This is called the explanatory variable.
Explanatory variables – those variables that have
values that are controlled by the experimenter
(also called factors)
Room temperature experiment continued . . .
We decide to use two temperature settings, 65°
and 75°.

How many treatments would our experiment


have?
the 2 treatments are the
2 temperature settings
Experimental condition – any particular
combination of the explanatory variables (also
called treatments)
Room temperature experiment continued . . .
Suppose we have 10 sections of first-semester
calculus that have agreed to participate in our study.

On whom or what will we impose the treatments?
the 10 sections of calculus
These are our subjects or experimental units.
Experimental units – the smallest unit to which a
treatment is applied.

How would we determine which sections would be in
rooms with the temperature set at 65° and which
sections in rooms set at 75°?
we need to randomly assign them
to the treatments
Random assignment of subjects to treatments or
treatments to trials ensures that the experiment
does not systematically favor one treatment over
another.
Room temperature experiment continued . . .
To randomly assign the 10 sections of first-
semester calculus to the 2 treatment groups, we
would first number the classes 1-10.
Place the numbers 1-10 on identical slips of
paper and put them in a hat. Mix well.
Randomly select 5 numbers from the hat.
Those will be the sections that have the
room temperature set at 65°.
The remaining sections will have the
room temperature set at 75°.

                    Sections assigned
Treatment 1 (65°)   9  7  5  8  3
Treatment 2 (75°)   1  2  4  6  10
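
The hat procedure can be mimicked by shuffling the section numbers and splitting the shuffled list in half. This is a sketch only; assign_two_treatments is a hypothetical helper, and the seed exists just to make the illustration repeatable. The sections it picks will generally differ from the ones shown on the slide.

```python
import random

def assign_two_treatments(units, seed=None):
    """Shuffle the units, give the first half 65 degrees and the rest 75 degrees."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)              # same idea as mixing the slips in the hat
    half = len(shuffled) // 2
    return {
        "65 degrees": sorted(shuffled[:half]),
        "75 degrees": sorted(shuffled[half:]),
    }

assignment = assign_two_treatments(range(1, 11), seed=5)
for treatment, sections in assignment.items():
    print(treatment, sections)
```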
Room temperature experiment continued . . .
Notice that there are five sections assigned to
each treatment.
This is called replication.

                    Sections assigned
Treatment 1 (65°)   9  7  5  8  3
Treatment 2 (75°)   1  2  4  6  10

Why is replication an important trait of a
well-designed experiment?
Replication ensures that we have multiple
observations for each treatment.
Room temperature experiment continued . . .
Remember – the explanatory variable is the
room temperature setting, 65° and 75°. The
response variable is the grade on the calculus
exam.

Are there other variables that could affect the
response?
These other variables are called extraneous variables.
An extraneous variable is a variable that is NOT one of
the explanatory variables (factors) but is thought to
affect the response.

In an experiment, these extraneous
variables need to be “controlled”.
Can the experimenter control these extraneous
variables? If so, how?
Direct control is holding the extraneous
variables constant so that their effects are
not confounded with those of the
experimental conditions (treatments).

What about the variables that the
experimenter can’t directly control?
What can be done to avoid confounding
results?

Remember -
two variables are confounding if their
effects on the response cannot be
distinguished from each other.
Room temperature experiment continued . . .
Suppose that there were five instructors who
taught the first-semester calculus. We do not
have direct control of this variable; however,
we could have each instructor teach 2 sections.
Then we could randomly assign which one of the
2 sections would have a temperature setting of
65° and the other would have a temperature
setting of 75°.
This is an example of blocking.

Blocking is the process by which an extraneous variable’s
effects are filtered out. Similar groups, called blocks,
are created. All treatments are tried in each block.
Room temperature experiment continued . . .
What about extraneous variables that we
cannot control directly or that we cannot block
for or that we don’t even think about?

Random assignment should evenly spread all


extraneous variables, that are not controlled
directly or that are not blocked, into all
treatment groups. We expect these variables
to affect all the experimental groups in the
same way; therefore, their effects are not
confounding.
Room temperature experiment continued . . .
Would the students in each section of calculus
know to which treatment group, 65° or 75°,
they were assigned?

If the students knew about the experiment,


they would probably know which treatment
group they were in.

So this experiment is probably NOT
blinded.
An experiment in which the subjects do not
know which treatment they were in is called a
single-blind experiment.
A double-blind experiment is one in which
neither the subjects nor the individuals who
measure the response know which treatment
is received.
In the room temperature experiment, we only
have 2 treatment groups, 65° and 75°. We do
NOT have a control group.

A control group is an experimental group that
does NOT receive any treatment.

The use of a control group allows the


experimenter to assess how the response
variable behaves when the treatment is not
used.
This provides a baseline against which the
treatment groups can be compared to
determine whether the treatment had an
effect.
Consider Anna, a waitress. She decides to
perform an experiment to determine if writing
“Thank you” on the receipt increases her tip
percentage.

She plans on having two groups. On one group


she will write “Thank you” on the receipt and on
the other group she will not write “Thank you”
on the receipt.

Which of these is the control group?


Suppose we want to test an herbal supplement
to determine if it aids in weight loss.

Why would it not be beneficial to have two
groups in the experiment; one that takes the
supplement and a control group that takes
nothing?

What could be done to remedy this problem?
Give one group the supplement and give the other
group a pill that is the same size, color, taste, smell,
etc. as the supplement, but contains no active
ingredient.

This is called a placebo.
A placebo is something that is identical to the
treatment but contains no active ingredient.
Let’s recap some ideas-

Random assignment removes the


potential for confounding variables.

Blocking uses extraneous variables to


create groups (blocks) that are similar.
All treatments are then tried in each
block.

Direct control holds extraneous variables


constant so their effects are not
confounded with the treatments.
Experimental Designs
1. Completely randomized design –
experimental units are assigned at
random to treatments or
treatments are assigned at random
to trials

Let’s look at two examples of completely
randomized experiments.

Experimental Units -> Random Assignment
    Treatment A -> Measure response for A
    Treatment B -> Measure response for B
-> Compare treatments
Example 1: A farm-product manufacturer wants
to determine if the yield of a crop is different
when the soil is treated with three different
types of fertilizers. Fifteen similar plots of
land are planted with the same type of seed
but are fertilized differently. At the end of
the growing season, the mean yield from the
sample plots is compared.
Experimental units? Plots of land
Factors? Type of fertilizer
Response variable? Yield of crop
How many treatments? 3
Fertilizer experiment continued: A farm-product
manufacturer wants to determine if the yield of
a crop is different when the soil is treated with
three different types of fertilizers. Fifteen
similar plots of land are planted with the same
type of seed but are fertilized differently. At
the end of the growing season, the mean yield
from the sample plots is compared.
Why is the same type of seed used on all 15
plots? It is part of the controls in the experiment.
What are other potential extraneous
variables? Type of soil, amount of water, etc.
Does this experiment have a placebo? Explain
NO – a placebo is not needed in this experiment
Example 2: A consumer group wants to test cake
pans to see which works the best (bakes evenly).
It will test aluminum, glass, and plastic pans in
both gas and electric ovens. There are 30 boxes
of cake mix to use for this experiment.
Experimental units? Cake mixes

Factors? Two factors - type of pan (aluminum,


glass, and plastic) and type of oven
(electric and gas)
Response variable? How evenly the cake bakes

Name the treatments?


Aluminum pan in electric oven, aluminum pan in gas oven,
glass pan in electric oven, glass pan in gas oven, plastic
pan in electric oven, and plastic pan in gas oven
Cake experiment continued: A consumer group wants to
test cake pans to see which works the best (bakes evenly).
It will test aluminum, glass, and plastic pans in both gas
and electric ovens. There are 30 boxes of cake mix to use
for this experiment.

Could we roll a die for each box?
If we roll a “1”, assign the box to the first
treatment (aluminum pan in electric oven). If we
roll a 2, assign the box to the 2nd treatment, and so
on.

Describe how to randomly assign the cake mixes to
the treatments so that there is an equal number
in each treatment.
Number the boxes of cake mix from 1 to 30. Write the numbers 1
to 30 on identical slips of paper and place into a hat. Mix well.
Randomly select 5 numbers from the hat and assign those boxes to
the treatment of aluminum pan in electric oven. Randomly select 5
more numbers and assign those boxes to the treatment aluminum
pan in gas oven. Continue this process, randomly assigning 5 boxes
to each of the treatments glass pan in electric oven, glass pan in
gas oven, and plastic pan in electric oven. The remaining 5 are
assigned to plastic pan in gas oven.
This is just one way that you can perform this
randomization.
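
The same idea can be sketched in Python. With 30 boxes and the 6 pan-and-oven treatments listed earlier, an equal split is 5 boxes per treatment. The helper assign_equally is hypothetical, and the shuffle stands in for drawing slips from the hat.

```python
import itertools
import random

pans = ["aluminum", "glass", "plastic"]
ovens = ["electric", "gas"]
treatments = list(itertools.product(pans, ovens))  # 3 pans x 2 ovens = 6 treatments

def assign_equally(units, treatments, seed=None):
    """Shuffle the units, then deal equal-sized groups to the treatments in order."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_treatment = len(shuffled) // len(treatments)  # 30 // 6 = 5 here
    return {
        t: sorted(shuffled[i * per_treatment:(i + 1) * per_treatment])
        for i, t in enumerate(treatments)
    }

assignment = assign_equally(range(1, 31), treatments, seed=6)
for (pan, oven), boxes in assignment.items():
    print(f"{pan} pan in {oven} oven: boxes {boxes}")
```

Unlike rolling a die for each box, this keeps the group sizes equal by construction.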
Experimental Designs Continued . . .
2. Randomized block – units are blocked into
groups (homogeneous) and then randomly
assigned to treatments
Units should be blocked on a variable that
affects the response!!!

Experimental Units -> Create blocks

Block 1 -> Random Assignment
    Treatment A -> Measure response for A
    Treatment B -> Measure response for B
-> Compare treatments for block 1

Block 2 -> Random Assignment
    Treatment A -> Measure response for A
    Treatment B -> Measure response for B
-> Compare treatments for block 2

-> Compare the results from the 2 blocks
Fertilizer experiment revisited: A farm-product
manufacturer wants to determine if the yield of a crop
is different when the soil is treated with two different
types of fertilizers. Twenty plots of land (10 plots are
along a river and 10 plots are away from the river) are
planted with the same type of seed but are fertilized
differently. At the end of the growing season, the
mean yield from the sample plots is compared.

Can the experimenter directly control the


types of soil in the different plots of land?
No – they must use the plots that are available

What can be done to account for this variable?


They could block by type of land
Fertilizer experiment revisited:
Describe how to create the blocks of land and
then to randomly assign plots to the 2 types of
fertilizer.
• First create 2 blocks of land. Block 1 would be the 10 plots that
are by the river. Block 2 would be the 10 plots away from the river.
• Number the 10 plots in block 1 from 1 to 10. Write the numbers 1
to 10 on identical slips of paper and place into a hat. Mix well.
Randomly select 5 numbers from the hat and assign those plots to
Fertilizer A. The remaining 5 are assigned to Fertilizer B.
• Number the 10 plots in block 2 from 1 to 10. Write the numbers 1
to 10 on identical slips of paper and place into a hat. Mix well.
Randomly select 5 numbers from the hat and assign those plots to
Fertilizer A. The remaining 5 are assigned to Fertilizer B.
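
A minimal sketch of this blocked assignment: the randomization is carried out separately inside each block, so every block ends up with 5 plots per fertilizer. The name blocked_assignment and the block labels are assumptions for illustration.

```python
import random

def blocked_assignment(blocks, treatments, seed=None):
    """Randomly assign units to treatments separately within each block."""
    rng = random.Random(seed)
    plan = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)                      # a fresh "hat" for each block
        per_treatment = len(shuffled) // len(treatments)
        plan[block_name] = {
            t: sorted(shuffled[i * per_treatment:(i + 1) * per_treatment])
            for i, t in enumerate(treatments)
        }
    return plan

# Plots are numbered 1-10 within each block, as on the slide.
blocks = {"by the river": range(1, 11), "away from the river": range(1, 11)}
plan = blocked_assignment(blocks, ["Fertilizer A", "Fertilizer B"], seed=7)
for block_name, groups in plan.items():
    print(block_name, groups)
```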
Experimental Designs Continued . . .
3. Matched pairs - a special type of
block design where the blocks consist
of 2 experimental units that are
similar with each being randomly
assigned to a treatment
OR
the block consists of an individual unit
that is assigned both treatments in
random order
Example 3: Two new word-processing programs
are to be compared by measuring the speed with
which a standard task can be completed. One
hundred volunteers will perform the same
task on each of the programs in random order
and their speeds will be measured.

Explain why this is a matched pairs design.


Each block consists of an individual who will do
both treatments
How could we determine which
program the volunteers use first?
We could flip a coin for each volunteer; heads they
do program A first, tails they do program B first.
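
The coin flip can be simulated with one random draw per volunteer. This is a sketch under the assumption of a fair coin; program_order is a hypothetical helper, and volunteers are simply numbered 1 to 100.

```python
import random

def program_order(n_volunteers, seed=None):
    """Flip a simulated fair coin per volunteer: heads -> A then B, tails -> B then A."""
    rng = random.Random(seed)
    orders = {}
    for volunteer in range(1, n_volunteers + 1):
        heads = rng.random() < 0.5
        orders[volunteer] = ("A", "B") if heads else ("B", "A")
    return orders

orders = program_order(100, seed=8)
a_first = sum(1 for order in orders.values() if order[0] == "A")
print(a_first, "volunteers use program A first")
```

Because each flip is independent, the number who start with program A will usually be close to 50 but is not forced to be exactly 50.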
The ONLY way to show a
cause-effect relationship
is with a well-designed,
well-controlled
experiment!!!
