

STATISTICS 110

Outline for today:
• Go over syllabus and dates for the quarter
• Overview of basic terminology
• Cover most of Chapter 0
• Overview of coverage in this course and in Stat 111/202

Examples on White Board

1. Ex 0.4: Do students with higher GPA have a better chance of getting into med school? MedGPA includes Accept/Deny and GPA
2. Ex 0.6: Do financial incentives help people lose weight? Randomly assigned to get incentive or not (control group). WeightLossIncentive4 and page 8.

Some Fundamental Definitions

• Population: All of the individual units about which we want information
– Examples on white board
• Sample: Units for which we obtain data
– Examples on white board
• A variable: Something we measure (for sample) or could measure (for population) on each unit
– Examples on white board

Types of Data (Variables)

• Categorical: Data consist of category names
– Male/Female (two categories = binary)
– Level of education (ordered categories = ordinal)
– Smoker/nonsmoker
– Opinion on an issue (favor, oppose, no preference)
– Admit status (for med school example)
• Quantitative: Data consist of numbers where ordinary arithmetic makes sense
– Height, weight, GPA, number of siblings

More Fundamental Definitions

(Population) Parameter:
A number associated with a population
– Example: Proportion admitted to med school for the population of applicants with GPA of at least 3.5.
(Sample) Statistic:
A number associated with a sample
– Example: Proportion admitted to med school for the observed sample of applicants with GPA of at least 3.5.

Description or Decision?
How Data Are Used

• Descriptive Statistics: using numerical and graphical summaries to characterize a data set (and only that data set).
• Inferential Statistics: using sample information to make conclusions about a population.
• Models: Used to approximate the population relationship between two (or more) variables. This course is all about finding good models!
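
The parameter/statistic distinction can be sketched in a few lines of Python. This is only an illustration: the population below (1,000 applicants, 600 admitted) and the sample size of 50 are made-up assumptions, not the MedGPA data.

```python
import random

random.seed(110)  # fixed seed so the sketch is reproducible

# Hypothetical population: 1,000 applicants with GPA >= 3.5,
# of whom 600 are admitted (1) and 400 are denied (0). Made-up numbers.
population = [1] * 600 + [0] * 400

# Parameter: a number describing the whole population.
parameter = sum(population) / len(population)

# Statistic: the same summary computed on a random sample of 50 units.
sample = random.sample(population, 50)
statistic = sum(sample) / len(sample)

print(f"Population proportion admitted (parameter): {parameter:.2f}")
print(f"Sample proportion admitted (statistic):     {statistic:.2f}")
```

In practice the parameter is unknown; inferential statistics uses the statistic to draw conclusions about it.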


Definitions of Types of Studies

Observational Study:
• Researchers observe or question participants about opinions, behaviors, or outcomes.
• Participants not asked to do anything different.
• Example: We cannot randomly assign students to have GPA above/below 3.5!
Two special cases:
Sample surveys and Case-control studies.

Experiment:
Researchers manipulate something and measure the effect of the manipulation on some outcome of interest.
Randomized experiments: participants are randomly assigned to participate in one condition (called treatment) or another.
Sometimes cannot conduct experiment due to practical/ethical issues.
Random assignment is NOT the same thing as random sampling.
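
A minimal sketch of how random sampling and random assignment differ, assuming a hypothetical population of 1,000 unit IDs and a study of 20 participants (both numbers are made up for illustration):

```python
import random

random.seed(110)  # fixed seed so the sketch is reproducible

# Hypothetical population of 1,000 unit IDs (made up for illustration).
population = list(range(1000))

# Random SAMPLING: decides which units enter the study at all.
sample = random.sample(population, 20)

# Random ASSIGNMENT: splits the units already in the study into groups.
shuffled = sample[:]
random.shuffle(shuffled)
treatment = shuffled[:10]
control = shuffled[10:]

print("Treatment group:", sorted(treatment))
print("Control group:  ", sorted(control))
```

Sampling bears on whether results extend to the population; assignment bears on whether a cause and effect conclusion is possible. A study can use either one without the other.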

Two Important Issues Based on Data Collection Method

• Extending results to a population: This can be done if the data are representative of a larger population for the question of interest. Safest to use a random sample.
• Cause and effect conclusion: Can only be made if data are from a randomized experiment, not from an observational study.
• Examples on white board

Types of Variables (Measured or Not)

• Explanatory variable (or independent variable) is one that may explain or may cause differences in a response variable (or outcome or dependent variable).
• A confounding variable is a variable that:
– affects the response variable and also
– is related to the explanatory variable.
• Example: Admit (yes/no) is response variable and GPA is explanatory variable. Possible confounding variable is general ambition.
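
To see how a confounding variable can produce an association on its own, here is a small simulation. It is a made-up sketch, not a model of real admissions: a hypothetical "ambition" score raises both the chance of a high GPA and the chance of admission, while GPA itself has no direct effect on admission in the code.

```python
import random

random.seed(110)  # fixed seed so the sketch is reproducible

n = 10_000
high_gpa_admits = high_gpa_total = 0
low_gpa_admits = low_gpa_total = 0

for _ in range(n):
    ambition = random.random()             # hypothetical confounder in [0, 1)
    high_gpa = random.random() < ambition  # ambition raises chance of high GPA
    admitted = random.random() < ambition  # ambition alone drives admission
    if high_gpa:
        high_gpa_total += 1
        high_gpa_admits += admitted
    else:
        low_gpa_total += 1
        low_gpa_admits += admitted

rate_high = high_gpa_admits / high_gpa_total
rate_low = low_gpa_admits / low_gpa_total

print(f"Admit rate, high GPA: {rate_high:.2f}")
print(f"Admit rate, low GPA:  {rate_low:.2f}")
```

Under these assumptions the high-GPA group is admitted at a clearly higher rate even though GPA never enters the admission rule, which is why an observational association alone cannot establish cause and effect.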

Example of an Observational Study:
Lead Exposure and Bad Teeth

“Children exposed to lead are more likely to suffer tooth decay …”
USA Today

Observational study involving 24,901 children.
Explanatory variable = level of lead exposure.
Response variable = extent child has missing/decayed teeth.
Possible confounding variables = income level, diet, time since last dental

CRUCIAL POINT

This study is an observational study.
We cannot conclude that lead exposure causes tooth decay.

It would be unethical to do a randomized experiment, so we need other (non-statistical) ways to establish cause and


Randomized Experiment:
Quitting Smoking with Nicotine Patches

“After the eight-week period of patch use, almost half (46%) of the nicotine group had quit smoking, while only one-fifth (20%) of the placebo group had.” Newsweek, March 9, 1993, p. 62

Double-blind, Placebo-controlled Randomized Experiment
240 smokers recruited (volunteers)
Randomized to 22-mg nicotine patch or placebo (controlled) patch for 8 weeks.
Double-blind: neither the participants nor the nurses taking the measurements knew who had received the active nicotine patches.

CRUCIAL POINT

This study is a randomized experiment.
We can conclude that nicotine patches cause people to quit smoking.

Potential confounding variables should be similar in the placebo and nicotine patch groups because of random assignment.
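
The claim that random assignment tends to balance potential confounders can be checked with a quick simulation. Everything below is hypothetical: the background variable (years of smoking), its range, and the equal 120/120 split are assumptions for illustration, not details reported for the study.

```python
import random
from statistics import mean

random.seed(110)  # fixed seed so the sketch is reproducible

# Hypothetical background variable: years of smoking for 240 volunteers.
years_smoking = [random.uniform(1, 40) for _ in range(240)]

# Random assignment: shuffle the volunteer indices, then split them.
# The equal 120/120 split is an assumption, not taken from the study.
ids = list(range(240))
random.shuffle(ids)
nicotine = [years_smoking[i] for i in ids[:120]]
placebo = [years_smoking[i] for i in ids[120:]]

print(f"Mean years smoking, nicotine group: {mean(nicotine):.1f}")
print(f"Mean years smoking, placebo group:  {mean(placebo):.1f}")
```

The two group means should come out close to each other, and the same holds on average for variables nobody thought to measure.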

Summary of Types of Studies

Observational study – Data are recorded without “manipulating” any of the variables.
Statistical experiment – One or more of the explanatory variables is/are assigned/controlled for all experimental units.
Should use an experiment if we want to confirm a “cause/effect” relationship.
Cannot conclude cause/effect from an observational study!

Building a Statistical Model:
Four-step Process Used by Textbook

1. CHOOSE – Pick a form for the model.
2. FIT – Estimate any parameters.
3. ASSESS – Is the model adequate? Could it be simpler? Are conditions met?
4. USE – Answer the question of interest.

General form of a model (for each individual):

$Y = f(X) + \varepsilon$

where $f(X)$ is the “expected” Y for some combination of predictors and $\varepsilon$ is the random error.

Data = Model + Error

Simplest Example: Constant Model; predict weight loss for certain diet, based on sample of people

CHOOSE this model:

$Y = c + \varepsilon$

where c is an unknown constant.

Terminology:
The constant c is a parameter of this model.
We use data to provide a sample estimate of c.

How should we estimate c from data?


FIT the model: Predicted Value for Y

Get an estimate for Y using the predictors and the model with estimated parameter(s).
For the “constant” model, only 1 parameter.

Examples:
$\hat{Y} = \bar{Y}$ (c = Sample mean)
$\hat{Y} = m$ (c = Sample median)

Assessment Questions

(1) Which estimator (mean or median) is better?
(That is, how can we compare models?)
(2) Is either model any good?
(That is, how can we assess fit?)
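
As a rough sketch of the FIT step for the constant model, the two candidate estimates of c can be computed with Python's standard `statistics` module. The five weight-loss values are made up for illustration and are not from the WeightLossIncentive4 data.

```python
from statistics import mean, median

# Hypothetical weight losses (in pounds) for a small sample; made-up values.
weight_loss = [2.0, 4.0, 5.0, 7.0, 22.0]

c_mean = mean(weight_loss)      # estimate of c using the sample mean
c_median = median(weight_loss)  # estimate of c using the sample median

print(c_mean)    # 8.0
print(c_median)  # 5.0
```

Each estimate gives the same predicted value for every individual. The one large value (22.0) pulls the mean above the median, which is why the two fitted models differ here.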

Assessing Fit: Residuals

Using the predicted value for each sample point the residual is:

$\text{Residual} = Y - \hat{Y}$ (Actual minus Predicted)

Assess fit by creating a summary of size of the residuals – want it to be small!

Criteria to Minimize Residuals

Sum of residuals: $\sum (Y - \hat{Y})$
Sum of absolute errors: $\sum |Y - \hat{Y}|$
Sum of squared errors: $\sum (Y - \hat{Y})^2$
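
The three residual summaries can be compared for the mean and median fits of the constant model. This is a sketch on made-up weight-loss values; `residual_summaries` is a hypothetical helper name introduced only for this example.

```python
from statistics import mean, median

# Hypothetical weight losses (in pounds); made-up values for illustration.
weight_loss = [2.0, 4.0, 5.0, 7.0, 22.0]

def residual_summaries(y, c_hat):
    """Return the three residual summaries for the constant prediction c_hat."""
    residuals = [yi - c_hat for yi in y]  # actual minus predicted
    return {
        "sum": sum(residuals),
        "sum_abs": sum(abs(r) for r in residuals),
        "sum_sq": sum(r * r for r in residuals),
    }

fit_mean = residual_summaries(weight_loss, mean(weight_loss))
fit_median = residual_summaries(weight_loss, median(weight_loss))

print(fit_mean)    # {'sum': 0.0, 'sum_abs': 28.0, 'sum_sq': 258.0}
print(fit_median)  # {'sum': 15.0, 'sum_abs': 23.0, 'sum_sq': 303.0}
```

The plain sum of residuals is always zero for the mean because positive and negative residuals cancel, so it says little about the size of the residuals. On the other two criteria the answer depends on which one is chosen: the mean gives the smaller sum of squared errors, and the median gives the smaller sum of absolute errors.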

Use the Model

After choosing a model, fitting it, and assessing that it fits well, you can use it to:
• Predict the response variable for an individual in the future, when you only know the value(s) of the explanatory variable(s)
• Estimate the mean response for a specific value of the explanatory variable(s)
• Extend results to a population, if appropriate
• Determine causal relationships, if appropriate

Overview of Types of Models

Response       Explanatory        Procedure                  Where
Quantitative   One quantitative   Simple linear regression   Chs 1 & 2
Quantitative   Multiple           Multiple regr.             Chs 3, 4
Quantitative   One categorical    One-way ANOVA              Ch 5
Quantitative   Binary             Two-sample t               Stat 7
Quantitative   Multiple cat.      ANOVA                      Chs 6, 7
Categorical    Categorical        Chi-square                 Stat 7
Categorical    Quantitative       Logistic regr.             Stat 111
Categorical    Multiple           Logistic regr.             Stat 111
