STUDIES.
1. Observational
Collect data in a way that does not interfere with how the data arise.
Can only establish an association, not causation.
1.1. Retrospective
Uses data from the past.
1.2. Prospective
Data is collected throughout the study.
2. Experiment
Randomly assign subjects to treatments.
Establish causal connections.
An extraneous variable that affects both the explanatory and the response variable,
and that makes it seem like there is a relationship between them, is called a
CONFOUNDING VARIABLE.
CORRELATION DOES NOT IMPLY CAUSATION.
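A minimal simulation sketch of a confounding variable. The variable names (temperature, ice cream sales, sunburn cases) and all coefficients are made up for illustration. Neither simulated quantity depends on the other, yet they come out strongly correlated because both depend on temperature.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder: daily temperature drives both quantities.
temperature = [rng.uniform(10, 35) for _ in range(1000)]
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Neither variable appears in the other's formula, yet they are correlated.
r = pearson_r(ice_cream_sales, sunburn_cases)
print(f"correlation between sales and sunburns: {r:.2f}")
```

An observational study of sales and sunburns alone would see the association but could not tell it apart from a causal effect.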
Types of Biases:
1. Convenience sample
People easily available are used in the
study.
2. Non-response
If only a non-random fraction of the randomly sampled people respond,
the result is not representative.
3. Voluntary response
The sample contains only people who volunteer to respond. People tend to
volunteer when they have a strong opinion, so this sample is also not
representative.
SAMPLING METHODS
1. Simple Random Sampling
Randomly select cases from the population so that each case is equally likely
to be selected, like drawing names from a hat.
2. Stratified Sampling
Divide the population into homogeneous strata, then randomly sample
from within each stratum.
3. Cluster Sampling
Divide the population into clusters. Randomly sample a few of the clusters
and then randomly sample from within these clusters. Unlike strata, the
clusters might not be homogeneous, but each cluster is similar to the others,
so we can get away with sampling from just a few clusters.
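The three sampling methods above can be sketched as follows. The population, the stratum labels, and the cluster numbers are all hypothetical, and the function names are chosen only for this illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 12 people, each tagged with a stratum and a cluster.
population = [
    {"id": i, "stratum": "junior" if i < 6 else "senior", "cluster": i % 4}
    for i in range(12)
]

def simple_random_sample(pop, n):
    # Every case has the same chance of being selected.
    return rng.sample(pop, n)

def stratified_sample(pop, per_stratum):
    # Group by stratum, then sample randomly within every stratum.
    strata = {}
    for person in pop:
        strata.setdefault(person["stratum"], []).append(person)
    return [p for group in strata.values() for p in rng.sample(group, per_stratum)]

def cluster_sample(pop, n_clusters, per_cluster):
    # Pick a few clusters at random, then sample within only those clusters.
    clusters = {}
    for person in pop:
        clusters.setdefault(person["cluster"], []).append(person)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [p for c in chosen for p in rng.sample(clusters[c], per_cluster)]

srs = simple_random_sample(population, 4)
strat = stratified_sample(population, 2)
clust = cluster_sample(population, 2, 2)
```

Note the structural difference: the stratified sample always includes every stratum, while the cluster sample touches only the clusters that were drawn.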
EXPERIMENTAL DESIGN.
1. Control
Compare treatment of interest to a control group.
2. Randomize
Randomly assign subjects to treatment
3. Replicate
Collect a sufficiently large sample or replicate the entire study
4. Block
Block for variables known or suspected to affect the outcome.
Difference between explanatory variable and blocking variable:
Explanatory variables (factors) are conditions which we can impose on
experimental units.
Blocking variables are the characteristics that the experimental units come with,
that we would like to control.
Blocking is like stratifying: units are first grouped by the blocking variable
and then randomized within each group.
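A minimal sketch of random assignment within blocks. The subjects, the blocking variable "risk", and the helper name blocked_assignment are hypothetical and serve only to illustrate the idea.

```python
import random

def blocked_assignment(units, block_key, treatments, seed=0):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(unit[block_key], []).append(unit)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        # Deal treatments out in turn so each block is split evenly.
        for position, unit in enumerate(shuffled):
            assignment[unit["id"]] = treatments[position % len(treatments)]
    return assignment

# Hypothetical subjects; "risk" is a characteristic the units come with.
subjects = [{"id": i, "risk": "high" if i % 2 else "low"} for i in range(8)]
groups = blocked_assignment(subjects, "risk", ["treatment", "control"])
```

Here the treatment label is the explanatory variable we impose, while "risk" is the blocking variable. Each risk level ends up with the same number of treated and control units.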
1. Scatter Plot
The explanatory variable usually goes on the x-axis and the response variable
on the y-axis.
Things to bear in mind when evaluating the relationship between two variables:
1.1. Direction
Positive or negative.
1.2. Shape
Linear or some other form.
1.3. Strength
Strong (indicated by little scatter) or weak (indicated by lots of scatter).
1.4. Any potential outliers
Investigate these points to make sure they are not data entry errors.
A naive approach would be to ignore (exclude) the outliers, but sometimes these
outliers are very interesting cases. It is important to handle them with careful
consideration of the research question and other associated variables.
2. Histograms
2.1. Provide a view of the data density.
2.2. Help identify the shape of the distribution.
The bin width of a histogram can alter the story that the histogram conveys.
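A small sketch of how bin width changes what a histogram shows. The data and the helper histogram_counts are made up for illustration; only bins that contain at least one observation are listed.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count observations per bin; each bin is labeled by its left edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Made-up data with two groups, one around 10-14 and one around 20-24.
data = [10, 11, 12, 13, 14, 20, 21, 22, 23, 24]

narrow = histogram_counts(data, 5)   # bins starting at 10 and 20, nothing at 15
wide = histogram_counts(data, 25)    # a single bin swallows everything

print(narrow)
print(wide)
```

With a width of 5 the two groups appear as separate peaks. With a width of 25 the same data look like one undifferentiated block.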
3. Dot Plot
4. Box Plot
5. Intensity Map
MEASURES OF CENTER:
1. Mean
Arithmetic average
2. Median
50th Percentile
3. Mode
Most frequent observation
If these measurements are calculated from a sample they are known as sample
statistics.
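The three measures of center can be computed with Python's standard statistics module. The sample values below are made up.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up sample

sample_mean = statistics.mean(scores)      # arithmetic average
sample_median = statistics.median(scores)  # 50th percentile
sample_mode = statistics.mode(scores)      # most frequent observation

print(sample_mean, sample_median, sample_mode)
```

Because scores is a sample, these three numbers are sample statistics.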
MEASURES OF SPREAD
1. Range: max - min
2. Variance
4. Inter-Quartile Range
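A short sketch of the listed measures of spread using the standard statistics module. The data are made up. Two assumptions are worth flagging: the variance shown is the sample variance (dividing by n - 1), and the quartiles use the default method of statistics.quantiles, which is one of several common conventions.

```python
import statistics

values = [1, 3, 5, 7, 9, 11, 13]  # made-up sample

value_range = max(values) - min(values)

# Sample variance: average squared deviation from the mean, dividing by n - 1.
sample_variance = statistics.variance(values)

# Quartiles split the sorted data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(value_range, sample_variance, iqr)
```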
ROBUST STATISTICS
We define robust statistics as measures on which extreme observations have
little effect.
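As a rough illustration, the median is a robust statistic and the mean is not. The values below are made up: adding a single extreme observation moves the mean a long way and barely moves the median.

```python
import statistics

incomes = [30, 32, 35, 36, 40]   # made-up values
with_outlier = incomes + [1000]  # add one extreme observation

mean_shift = statistics.mean(with_outlier) - statistics.mean(incomes)
median_shift = statistics.median(with_outlier) - statistics.median(incomes)

print(f"mean moved by {mean_shift:.1f}, median moved by {median_shift:.1f}")
```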
TRANSFORMING DATA
A transformation is rescaling the data using a function.
When data are very strongly skewed we sometimes transform them so they are
easier to model.
Methods
1. Natural log transformation (most common)
Makes the relationship between variables more linear and hence easier
to model with simple methods.
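A minimal sketch of why a log transformation can make a relationship more linear. The exponential relationship and its constants (3 and 0.8) are hypothetical. After taking the natural log, equal steps in x produce equal steps in log(y), which is what a straight line looks like.

```python
import math

x = [0, 1, 2, 3, 4]
y = [3 * math.exp(0.8 * xi) for xi in x]  # hypothetical exponential relationship

log_y = [math.log(yi) for yi in y]  # natural log; requires positive values

# Successive differences: growing for y, constant for log(y).
diff_y = [b - a for a, b in zip(y, y[1:])]
diff_log_y = [b - a for a, b in zip(log_y, log_y[1:])]

print([round(d, 2) for d in diff_y])
print([round(d, 2) for d in diff_log_y])
```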
2. Other Transformations
Goals of Transformation.
2. Pie Chart
Less helpful than bar plots.
3. Contingency Table.
4. Relative frequencies.
5. Segmented bar plot.
6. Relative frequency segmented bar plot.
7. Mosaic plot.
8. Side-by-side box plots.
INTRODUCTION TO INFERENCE
Random Process
In a random process we know what outcomes could happen, but we don't
know which particular outcome will happen.
1. Frequentist interpretation
The probability of an outcome is the proportion of times the outcome
would occur if we observed the random process an infinite number of
times.
2. Bayesian interpretation
A Bayesian interprets probability as a subjective degree of belief.
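The frequentist interpretation can be illustrated with a simulation. This sketch assumes a fair coin (probability of heads 0.5) and uses a fixed seed. We cannot observe infinitely many flips, but the observed proportion tends to settle near 0.5 as the number of flips grows.

```python
import random

def observed_proportion(n_flips, seed=1):
    """Proportion of heads in n_flips simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 1_000, 100_000):
    print(n, observed_proportion(n))
```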
INDEPENDENT EVENTS
Two processes are said to be independent if knowing the outcome of one provides
no useful information about the outcome of the other.
Checking for independence
If P(A|B) = P(A), then A and B are independent.
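The independence check can be sketched by counting outcomes for two fair dice. The events and the helper prob are chosen only for illustration. Exact fractions are used so the comparison P(A|B) = P(A) is not affected by rounding.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # all 36 rolls of two fair dice

def prob(event, given=lambda o: True):
    """P(event | given), found by counting equally likely outcomes."""
    conditioned = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in conditioned if event(o)), len(conditioned))

first_is_six = lambda o: o[0] == 6
second_is_six = lambda o: o[1] == 6
sum_is_twelve = lambda o: sum(o) == 12

# Independent: knowing the first die says nothing about the second.
p_a = prob(second_is_six)
p_a_given_b = prob(second_is_six, given=first_is_six)

# Dependent: knowing the first die is a six changes the chance of a total of 12.
p_c = prob(sum_is_twelve)
p_c_given_b = prob(sum_is_twelve, given=first_is_six)

print(p_a, p_a_given_b)  # equal, so the events are independent
print(p_c, p_c_given_b)  # different, so the events are not independent
```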
If the observed difference is large, there is stronger evidence that the
difference is real.
If the sample size is large, even a small difference can provide strong evidence
of a real difference.
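One way to put numbers on these two statements is a pooled two-proportion z statistic. This is an assumption on our part, since the notes do not name a specific test, and the counts below are made up. A larger absolute z means stronger evidence that the two proportions really differ.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled z statistic for the difference between two sample proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_error

# Same 2-point gap (52% vs 50%) at two very different sample sizes.
small_sample = two_proportion_z(52, 100, 50, 100)
large_sample = two_proportion_z(52_000, 100_000, 50_000, 100_000)

# A 20-point gap (70% vs 50%) at the small sample size.
big_gap = two_proportion_z(70, 100, 50, 100)

print(f"{small_sample:.2f} {large_sample:.2f} {big_gap:.2f}")
```

The small gap is unconvincing with 100 observations per group, but the same gap becomes strong evidence with 100,000 per group. At a fixed sample size, the larger gap gives the larger statistic.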