Professional Documents
Culture Documents
http://www.cartoonstock.com/directory/d/data_gathering.asp
1.3a 1
Sources of Data - Goals
• Identify anecdotal data in a specific situation and
explain why we should not use this type of data.
• Define what is meant by available data.
• Distinguish between experiments and observational
studies.
1.3a 2
Anecdotal Data
Anecdotal data represent individual cases that
often come to our attention because they are
striking in some way.
1.3a 3
Anecdotal Data
Anecdotal data represent individual cases that
often come to our attention because they are
striking in some way.
1.3a 4
Available Data
Available data are data that were produced in
the past for some other purpose but that may
help answer a present question inexpensively.
The library and the Internet are sources of
available data.
1.3b 5
Observational Studies and Experiments
• In an experimental study, we investigate the
effects of certain conditions on individuals or
objects in the sample.
• In an observational study, we observe the
response for a specific variable for each
individual or object.
1.3b 6
Observational Data
2.5
Number of injuries per cat
2.0
1.0 1.5
0.5
0 1 2 3 4 5 6 7-8 9-32
N: (0) (8) (14) (27) (34) (21) (9) (13)
Number of stories fallen
Photos: The Analysis of Biological Data, Whitlock, Schluter, 2009, Roberts and Company, p. 3
Graph: modified from Whitney, W. O., and C. J. Mehlhaff. "High-rise syndrome in cats." Journal
of the American Veterinary Medical Association 191.11 (1987): 1399-1403.
1.3c 7
Designing Experiments and Observational Studies - Goals
1.3d 8
Terms Used in Experiments
• Experimental units
• Factor (independent variable)
– Level
• Outcome or response (dependent variable)
• Statistically significant
– An observed effect so large that it would
rarely occur by chance.
1.3d 9
Example: Terminology of Experimental Design
For each of the following, define the experimental unit,
factor, levels, response variable and what would make
the study statistically significant.
1) In a study of cell phone usage by college students,
we want to know how much time is being spent on
different types of apps on the phones.
1.3e 10
Example: Terminology of Experimental Design
1) In a study of cell phone usage by college students,
we want to know how much time is being spent on
different types of apps on the phones.
1.3e 11
Example: Terminology of Experimental Design
For each of the following, define the experimental unit,
factor, levels, response variable and what would make
the study statistically significant.
2) In a study of sickle cell anemia, 150 patients were
given the drug hydroxyurea, and 150 were given a
placebo (dummy pill). The researchers counted the
episodes of pain in each subject.
1.3e 12
Example: Terminology of Experimental Design
2) In a study of sickle cell anemia, 150 patients were
given the drug hydroxyurea, and 150 were given a
placebo (dummy pill). The researchers counted the
episodes of pain in each subject.
1.3e 13
Principles of Experimental Design
1. Control: Compare two or more treatments.
2. Randomize: use chance to assign
experimental units to treatments.
3. Replication: Use enough experimental units
in each group to reduce chance variation in
the results.
1.3f 14
Comparative Experiments
Experimental Measure
Treatment
units response
Measure
Control
response
Compare
Experimental
results
units
Measure
Treatment
response
1.3f 15
Bias
The design of a study is biased if it systematically
favors certain outcomes.
1.3g 16
Principles of Experimental Design
1. Control: Compare two or more treatments,
2. Randomize: use chance to assign
experimental units to treatments.
3. Replication: Use enough experimental units
in each group to reduce chance variation in
the results.
1.3h 17
Randomized Experiments
• In a completely randomized design, the
treatments are assigned to all the
experimental units completely by chance.
1.3h 18
Randomized Experiments
Treatment
Group 1 1
Treatment
Group 2
2
19
1.3h
Randomization
1. Label each of the N individuals.
2. Put the N numbers into a hat.
3. Draw the numbers one at a time until you
have n individuals.
1.3h 20
Other Designs
• A matched pair design is when each
experimental unit is matched with another
one.
• A block is a group of experimental units that
are similar.
• In a block design, the random assignment of
experimental units to treatments is carried out
within each block.
1.3h1 21
Principles of Experimental Design
1. Control: Compare two or more treatments,
2. Randomize: use chance to assign
experimental units to treatments.
3. Replication: Use enough experimental units
in each group to reduce chance variation in
the results.
1.3i 22
Cautions about Experimental Design
1. Bias
2. Generalization
1.3j 23
Lack of Realism
1. Is studying the effects of eating red meat on
cholesterol values in a group of middle aged
men a realistic way to study factors affecting
heart disease problems in humans?
1.3k 24
Lack of Realism
2. Is studying the effects of hair spray on rats a
realistic way to determine what will happen
to people with large amounts of hair?
1.3k 25
Good Experimental Design
1. Control
2. Randomization
3. Replication
Control what you can, block what you can’t
control, and randomize to create comparable
groups.
1.3l 26
Sampling Design - Goals
• Be able to define what a probability sample, simple
random sample (SRS), or a stratified random sample is.
• Be able to determine when the preferred method is
simple random sample or stratified random sample.
• Be able to state when there is a response bias with
obtaining a sample: convenience sample, self-selection,
nonresponse
1.3m 27
Population and Sample
• Population: the entire group of objects that
we want information about.
– Census
• Sample: the part (subset) of the population
that we actual examine.
1.3m 28
Population and Sample
1.3m 29
Sampling Methods
• Probability Sample
• Simple Random Sample (SRS)
• Stratified Random Sample
1.3n 30
Simple Random Sampling (SRS)
A (simple) random sample (SRS) of size n is a
sample selected in such a way that every possible
sample of size n has the same chance of being
selected.
1.3n 31
SRS
Method
1. Label every object.
2. Generate random integers from a uniform
distribution to select the objects.
3. After sorting, select the first n objects.
1.3n 32
Sampling Methods
• Probability Sample
• Simple Random Sample (SRS)
• Stratified Random Sample
1.3o 33
Stratified Random Sample
• Procedure
1. Divide the population into groups called strata.
2. Choose a SRS for each group.
1.3o 34
Stratified Random Sample (Example)
Wyoming
585,501
California
39,250,017
1.3o 35
Response Bias
• Convenience sample
• Self-selection
• Undercoverage
• Nonresponse
Your random sample needs to be representative
of the population.
1.3p 36
Bias vs.
Variability
Remove your
bias!
1.3q 37
Bias vs.
Variability
E(t) =
E(x)̄ = μ
1.3q 38
Managing Bias and Variability
• To reduce bias, use random sampling.
• To reduce variability of a statistic from an SRS,
use a larger sample
1.3r 39
Statistical Inference
• Sample has to be representative of the
population
– Randomize
– Remove bias
• The experiment has to be performed in such a
way that you can obtain the data that you are
interested in.
• Perform the correct analysis.
1.3s 40
Causality
http://www.forbes.com/sites/erikaandersen/2012/03/23
/true-fact-the-lack-of-pirates-is-causing-global-warming/
1.3t 41
Causality - Goals
• Be able to determine if a variable is lurking or
confounding.
• Be able to explain an association in terms of
– Causation
– Common response
– Confounding variables
• Apply the criteria for establishing causation.
• Association and causation are NOT the same thing.
• Interpret examples in terms of Simpson’s paradox.
1.3t 42
Causality
• A lurking variable is a variable that is not
among the explanatory or response variables
in a study but that may influence the variables
in the study.
• Confounding occurs when two variables are
associated in such a way that their effects on a
response variable cannot be distinguished
from each other.
1.3t 43
Lurking Variables
In each case, what is the lurking variable?
1.For children, there is an extremely strong
correlation between shoe size and math
scores.
1.3u 44
Lurking Variables
In each case, what is the lurking variable?
2.There is a very strong correlation between ice
cream sales and number of deaths by
drowning.
1.3u 45
Lurking Variables
In each case, what is the lurking variable?
3.There is very strong correlation between the
number of churches in a town and the number
of bars in a town.
1.3u 46
Lurking Variable?
1.3u 47
Causation
Association does not mean causation!
http://www.tylervigen.com/spurious-correlations
1.3v 48
Causation
association causation
x: explanatory y: response z: lurking
1.3v 49
Causation
Association does not mean causation!
1.3v 50
Establishing Causation
Perform an experiment!
What do we need for causation?
1. The association is strong.
2. The association is consistent.
The connection happens in repeated trials
The same connection happens under varying
conditions
3. Alleged cause precedes the effect.
4. The alleged cause is plausible.
1.3w 51
Simpson’s Paradox
1.3x 52
Simpson’s Paradox
Consider the acceptance rates for the following
groups of men and women who applied to
college.
Not Not
Counts Accepted accepted Total Percents Accepted accepted
1.3x 53
Simpson’s Paradox
• Business School
Not Not
Counts Accepted Total Percents Accepted
accepted accepted
Men 18 102 120
Men 15% 85%
Women 24 96 120
Total 42 198 240 Women 20% 80%
• Art School
Counts Accepted Not Total Not
accepted Percents Accepted accepted
Men 180 60 240
Women 64 16 80 Men 75% 25%
Total 244 76 320 Women 80% 20%
1.3x 54
Provide All the Critical Information
“When it came to the statistical review, it was
often clear that critical information was lacking,
and the gaps nearly always had the practical
effect of making the authors’ conclusions look
stronger than they should have.”
- Statistician John Bailer, who as a
consultant for the New England Journal
of Medicine, screened > 4000 papers
over 10 years
1.3y 55
Should we allow this personal
information to be collected?
1. A government agency takes a random sample
of income tax returns to obtain information on
the average income of people in different
occupations. Only the incomes and
occupations are recorded from the returns, not
the names.
1.3z 56
Should we allow this personal
information to be collected?
2. A social psychologist attends public meetings of
a religious group to study the behavior patterns
of members.
3. A social psychologist pretends to be converted
to membership in a religious group and attends
private meetings to study the behavior
patterns of members.
1.3z 57