Professional Documents
Culture Documents
Stat 110 Lecture 1
Stat 110 Lecture 1
1
9/26/2017
2
9/26/2017
Randomized Experiment:
Quitting Smoking with Nicotine Patches CRUCIAL POINT
“After the eight-week period of patch use, almost half (46%) of
the nicotine group had quit smoking, while only one-fifth (20%)
of the placebo group had.” Newsweek, March 9, 1993, p. 62 This study is a randomized experiment.
Double-blind, Placebo-controlled We can conclude that nicotine patches
Randomized Experiment cause people to quit smoking.
240 smokers recruited (volunteers)
Randomized to 22-mg nicotine patch or placebo
(controlled) patch for 8 weeks. Potential confounding variables should be
Double-blind: neither the participants nor the nurses similar in the placebo and nicotine patch
taking the measurements knew who had received groups because of random assignment.
the active nicotine patches.
Summary of Types of Studies Building a Statistical Model:
Four‐step Process Used by Textbook
Observational study – Data are recorded
without “manipulating” any of the variables.
1. CHOOSE – Pick a form for the model.
Statistical experiment – One or more of the
explanatory variables is/are assigned/controlled 2. FIT – Estimate any parameters.
for all experimental units.
3. ASSESS – Is the model adequate? Could
Should use an experiment if we want to it be simpler? Are conditions met?
confirm a “cause/effect” relationship.
4. USE – Answer the question of interest.
Cannot conclude cause/effect from an
observational study!
Simplest Example: Constant Model; predict weight
General form of a model (for each individual):
loss for certain diet, based on sample of people
Y f (X ) Y c
Individual
Random error CHOOSE this model:
where c is an unknown constant.
“Expected” Y for some
combination of predictors Terminology:
The constant c is a parameter of this model.
Data = Model + Error We use data to provide a sample estimate of c.
3
9/26/2017
FIT the model: Predicted Value for Y Assessment Questions
Get an estimate for Y using the predictors
(1) Which estimator (mean or median)
and the model with estimated parameter(s).
For the “constant” model, only 1 parameter. is better?
(That is, how can we compare models?)
Assessing Fit: Residuals Criteria to Minimize Residuals
Residual Y Yˆ
Sum of absolute
deviations:
Y Yˆ
Actual Predicted Sum of squared 2
errors: (Y Yˆ )
Assess fit by creating a summary of size of
the residuals – want it to be small!