
9/7/16

Causation
Randomized, comparative experiments are intended to give good evidence that
differences in the treatment caused the observed differences in the response
....
Reducing Impact of Random Chance
No conclusion based on statistical analysis is 100% certain
Every individual in a study will be different
We need to use enough subjects to reduce chance variation that would create
differences due to randomization alone
Fundamental Principles of Design
Control: limit the effects of lurking variables and uninteresting sources of variation
o Use two or more treatments
o There are many other techniques to control for this
o
Statistical Significance
We create a range of values for the effects we could observe due to random chance alone
If the treatment effects are outside this range, we say the observed effects are
statistically significant
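One way to build such a range is a permutation (re-randomization) approach; this is a minimal sketch, not necessarily the method used in class. The data values, the function names, the number of shuffles and the 95% cutoff are all assumptions made for illustration.

```python
import random

def chance_range(group_a, group_b, n_shuffles=2000, seed=0):
    """Middle 95% of mean differences produced by random assignment alone."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_shuffles):
        # Re-randomize: pretend the treatment labels were handed out differently.
        rng.shuffle(pooled)
        mean_a = sum(pooled[:n_a]) / n_a
        mean_b = sum(pooled[n_a:]) / (len(pooled) - n_a)
        diffs.append(mean_a - mean_b)
    diffs.sort()
    cut = n_shuffles // 40  # 2.5% of the shuffles in each tail
    return diffs[cut], diffs[-cut - 1]

def mean_difference(group_a, group_b):
    return sum(group_a) / len(group_a) - sum(group_b) / len(group_b)

# Hypothetical response values for two treatment groups.
treatment = [12.1, 13.4, 11.8, 14.0, 13.1, 12.7]
control = [10.2, 9.8, 10.9, 10.4, 9.5, 10.1]

low, high = chance_range(treatment, control)
observed = mean_difference(treatment, control)
significant = observed < low or observed > high
print("chance range:", low, high)
print("observed difference:", observed, "significant:", significant)
```

If the observed difference falls outside the chance range, it is flagged as statistically significant in the sense described above.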
Matched Pairs Design
Completely randomized designs are the simplest experimental design, just as an SRS
(simple random sample) is the simplest sampling design
In some cases, it makes sense to systematically group subjects based on similar, known
characteristics
Matched pairs design: compares exactly two treatments, either by using a pair of
individuals (that are closely matched) or by using each individual twice
Randomize the two treatments within each pair, or randomize the order of the treatments if the same individual is used twice
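A minimal sketch of the "each individual twice" version, where the order of the two treatments is randomized for each individual. The subject labels, the treatment names "A" and "B", and the function name are hypothetical.

```python
import random

def randomize_order(individuals, treatments=("A", "B"), seed=0):
    """Pick a random order of the two treatments for each individual."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    schedule = {}
    for person in individuals:
        order = list(treatments)
        rng.shuffle(order)  # coin flip: A then B, or B then A
        schedule[person] = tuple(order)
    return schedule

schedule = randomize_order(["subject_1", "subject_2", "subject_3", "subject_4"])
for person, order in schedule.items():
    print(person, "gets", order[0], "first, then", order[1])
```

Every individual still receives both treatments; only the order is left to chance.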
Block Designs
A matched pair is a special case of a block design
Block: a group of individuals that are known before the experiment to be similar in some way
that is expected to impact the effect of the treatments on the response variable
Block design: design in which random assignment . . .
Treatments and Blocks
The treatments are not randomly assigned to the blocks; they are assigned to the
individuals within each block
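The point about assigning treatments to individuals within each block can be sketched as follows. The subjects, block labels, treatment names and function name are hypothetical, and splitting each block evenly among the treatments is an assumption of this sketch.

```python
import random
from collections import defaultdict

def assign_within_blocks(subjects, block_of, treatments, seed=0):
    """Randomly assign treatments to the individuals inside each block."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of[subject]].append(subject)  # blocks are formed first, not at random

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # the randomization happens inside the block
        for i, subject in enumerate(shuffled):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

subjects = ["s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8"]
block_of = {"s1": "block_1", "s2": "block_1", "s3": "block_1", "s4": "block_1",
            "s5": "block_2", "s6": "block_2", "s7": "block_2", "s8": "block_2"}
assignment = assign_within_blocks(subjects, block_of, ["treatment_1", "treatment_2"])
print(assignment)
```

Each block ends up containing every treatment, so treatments can be compared among similar individuals.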
Other Experimental Considerations

Blind study
Double-blind study

Replication
Using multiple copies of the same treatment gives you a better idea of the effect of that
treatment
Replication gives an idea of the variation associated with the treatment
The most convincing evidence for causation comes from replicating the design in
different locations and by independent investigators
9/9/16
Concerns with Experimentation
Lack of realism
o Controlling variation can limit conclusions
o Cannot generalize results to population
Ethics
o Informed consent is required
o Review board
o Data must be kept confidential
Mean
Most common metric of central tendency
Distribution is known
o How often you can expect to see one value
Median
Another measure of central tendency
Sample proportion
Proportion of sample observations in a category
5 number summary
Min, Q1, median, Q3, max
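A minimal sketch of computing the five values. Quartile conventions differ between textbooks and software; this sketch assumes Q1 and Q3 are the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd). The data and function name are made up.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # Skip the middle observation when n is odd.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [7, 1, 5, 9, 3, 11, 13]  # hypothetical observations
print(five_number_summary(scores))  # (1, 3, 7, 11, 13)
```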
Boxplot

Standard deviation

Deviation: how far an observation is from your measure of central tendency


Variance notation: $s^2$
Standard deviation: $s = \sqrt{s^2}$
*Need to memorize formula for variance & standard deviation
o Know how to calculate
Equals zero only if all the points are the same
Never negative: positive whenever the points are not all the same
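For reference, the usual sample formulas are written out here, assuming the course uses the standard n - 1 divisor and measures deviations from the sample mean. The symbols x_i (the observations), x-bar (the sample mean) and n (the sample size) are the conventional ones and do not appear in the notes themselves.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```

Small worked example: for the values 2, 4, 6 the mean is 4, the deviations are -2, 0, 2, the squared deviations sum to 8, so s^2 = 8 / 2 = 4 and s = 2.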

Relationships between variables


Correlation: the distribution of your response variable changes for different values of the
explanatory variable
Response variable: the variable of interest for your study
o Aka dependent variable
Explanatory variable: aka independent variable
2 categorical
Contingency table
Stacked bar chart
1 quantitative, 1 categorical
Side by side boxplots
Overlapping histograms
9/12/16
Distribution?
Conditional distribution: the distribution of one variable at a specific state of another
variable
Joint distribution: the distribution of two variables together
o Divide each cell count by the total sample size
Marginal distribution: the distribution of one variable by itself
o We are not considering the other variable at all
o Divide each row/column total by the total sample size
*If the conditional distributions are different from the marginal distributions, then the two
variables have some relationship
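The three distributions can be computed from a contingency table as in this sketch. The counts, the labels and the function name are hypothetical; the conditional distribution here is taken within each row (cell count divided by its row total).

```python
def table_distributions(counts):
    """counts maps (row_label, column_label) -> number of observations."""
    total = sum(counts.values())
    row_totals, col_totals = {}, {}
    for (row, col), n in counts.items():
        row_totals[row] = row_totals.get(row, 0) + n
        col_totals[col] = col_totals.get(col, 0) + n

    joint = {cell: n / total for cell, n in counts.items()}        # cell / sample size
    marginal_rows = {r: t / total for r, t in row_totals.items()}  # row total / sample size
    marginal_cols = {c: t / total for c, t in col_totals.items()}  # column total / sample size
    # Distribution of the column variable within each row: cell / row total.
    conditional = {(r, c): n / row_totals[r] for (r, c), n in counts.items()}
    return joint, marginal_rows, marginal_cols, conditional

# Hypothetical two-way table of counts.
counts = {("group_1", "yes"): 10, ("group_1", "no"): 30,
          ("group_2", "yes"): 30, ("group_2", "no"): 30}
joint, marginal_rows, marginal_cols, conditional = table_distributions(counts)

print("marginal P(yes):", marginal_cols["yes"])
print("P(yes | group_1):", conditional[("group_1", "yes")])
print("P(yes | group_2):", conditional[("group_2", "yes")])
```

In this made-up table the conditional proportions of "yes" (0.25 and 0.5) differ from the marginal proportion (0.4), which is the pattern that suggests the two variables are related.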

Overlapping Distributions
For quantitative variables, we looked at their distribution

If the response variable is quantitative and the explanatory variable is categorical, we
want to look at how the distribution of the response changes depending on the categorical
value
Use previous visualizations of quantitative data, but make one for each categorical value
To compare, make sure the scale is the same for all plots

Overlapping Histograms
Basically the same thing as multiple dot plots, but better for large data
Side-by-Side Boxplots
Using the previous methods is overwhelming for multiple groups and large data
Side-by-side boxplots are the most useful here
Quantitative vs Categorical
Measuring the relationship means comparing the distribution of the quantitative variable at
different values of the categorical variable
Mainly interested in comparing mean and variance
Explaining Variance
What if we ignored the categorical variable and estimated the mean and variance of the
quantitative variable?
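One way to explore this question is to compute the variance twice: once ignoring the categories, and once inside each category. This is only a sketch with made-up data and hypothetical group names, not necessarily how the course continues.

```python
from statistics import mean, variance

# Hypothetical quantitative responses, keyed by categorical value.
groups = {
    "category_a": [10.0, 11.0, 12.0],
    "category_b": [20.0, 21.0, 22.0],
}

# Ignore the categorical variable: pool every observation together.
pooled = [x for values in groups.values() for x in values]
overall_mean = mean(pooled)
overall_variance = variance(pooled)  # sample variance, n - 1 divisor

# Use the categorical variable: summarize each category separately.
within = {name: (mean(values), variance(values)) for name, values in groups.items()}

print("overall:", overall_mean, overall_variance)
for name, (m, v) in within.items():
    print(name, "mean:", m, "variance:", v)
```

Here the pooled variance is much larger than the variance within either category, because the pooled figure also includes the difference between the category means.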
