Lecture 1

Introduction
What do statistical procedures tell you?
Motivations
Three Kinds of Data
The Mean
Measures of Variability

Glantz - Primer of Biostatistics

The McGraw-Hill Companies, Inc, 2011

Learning Objectives

Understand why biostatistics is important in practical clinical and scientific decision making
Understand the key underlying concepts of statistical analysis
Be able to select appropriate statistical methods, use them, and critique their use by others
Understand the importance of proper study design and the effects of poor study design

Upon completion of this course, you should be able to select and apply statistical tests correctly based on study design and scale of measurement
(From inside the front cover of Glantz)

Chapter 1

Biostatistics and Clinical Practice

Motivations
Motivations for performing statistical tests correctly should be clear, but why learn biostatistics if someone else is performing the tests?
You need to understand statistics to plan good experiments.
Why not just depend on the journals to ensure articles have solid statistical analysis?
Journals often get it wrong.
Why has the problem persisted?
The problem is endemic.

Chapter 2

How to Summarize Data

Learning Objectives

Understand kinds of data
Understand differences between experiments and observational studies
Understand how to describe populations
Understand the relationship between populations and samples, and between parameters and statistics
Understand distributions

Three Kinds of Data

Interval (continuous), the focus for the rest of the chapter
Nominal (categorical)
Ordinal (ordered categories)
Populations and Samples

Population
All individuals of interest
Characterized by parameters

Sample
Actually observed
All the methods we develop assume that the sample was drawn at random from the population
Statistics computed from samples estimate population parameters

Parameters of the Normal Distribution

[Figure: horizontal axis from 25 to 55]

Skewed Distribution

Summarizing data: when to use mean and standard deviation

When: the value of a variable is more likely to fall near the mean than far from it
-and-
It is equally likely to fall below the mean as above it
Then: use mean and standard deviation to describe the location and amount of variability

Summarizing data: when the distribution is not Normal

Then use the median and at least two other percentiles to describe the data

End Lecture 1

Where we left off in Lecture 1

The population mean (μ) describes the location of a Normally distributed variable
The population standard deviation (σ) describes the variability of a Normally distributed variable above and below the mean
Use mean and standard deviation for Normally distributed variables, otherwise use median and at least two percentiles
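
As a minimal sketch of the two population parameters, the Python below computes the mean and standard deviation of a small list treated as a complete population. The data values and function names are illustrative assumptions, not taken from the lecture, and the population SD is assumed to divide by the number of members N.

```python
import math


def population_mean(values):
    """Mean (mu): the sum of all members divided by how many there are."""
    return sum(values) / len(values)


def population_sd(values):
    """SD (sigma): square root of the average squared deviation from mu."""
    mu = population_mean(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))


# Hypothetical population of six heights in cm (made-up numbers).
heights_cm = [35, 38, 40, 40, 42, 45]
mu = population_mean(heights_cm)
sigma = population_sd(heights_cm)
print(f"mu = {mu:.1f} cm, sigma = {sigma:.2f} cm")
```

Both numbers summarize the data well only when the values cluster symmetrically around the mean; for skewed data the median and percentiles are the better description.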

Lecture 2

Median and Percentiles
Using Percentiles to Check Normality
Random Samples
Kinds of Studies
Revisiting the Normal Distribution: Sample Parameters vs. Population Parameters

Median

The median is the value that half the members of the population fall below.
Put observations in order, from smallest to largest
If odd number of observations, median is the middle observation
If even number of observations, median is midway between the two middle observations (their average)
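
The ordering rule for the median can be sketched in a few lines of Python. The function name and sample values are illustrative assumptions; for an even number of observations the sketch takes the average of the two middle values, which is the usual convention.

```python
def median(values):
    """Median of a non-empty list: the middle of the ordered observations."""
    if not values:
        raise ValueError("median requires at least one observation")
    ordered = sorted(values)  # smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle observation.
        return ordered[mid]
    # Even count: halfway between the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```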

Percentiles

Defined analogously to median
The 25th percentile is the point that defines the lowest quarter of the observations
It is the 0.25(n + 1)th observation in the ordered list
Average if between two observations
Median is 50th percentile
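
A minimal sketch of the (n + 1) rule in Python follows. The function name is an assumption, and the clamping of out-of-range positions is an added safeguard for very small samples rather than part of the rule on the slide. When the position falls between two observations, the sketch averages them, as stated above.

```python
import math


def percentile(values, percent):
    """Percentile by the (n + 1) rule; percent is a number such as 25 or 50."""
    if not values:
        raise ValueError("percentile requires at least one observation")
    ordered = sorted(values)
    n = len(ordered)
    position = percent * (n + 1) / 100  # 1-based position in the ordered list
    # Clamp so extreme percentiles of small samples stay inside the data.
    lower = min(max(math.floor(position), 1), n)
    upper = min(max(math.ceil(position), 1), n)
    # If position is a whole number, lower == upper and this is that observation.
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


data = [1, 2, 3, 4, 5, 6, 7, 8]
print(percentile(data, 25), percentile(data, 50), percentile(data, 75))
```

Statistical packages often interpolate between neighbouring observations instead of averaging them, so their percentiles can differ slightly from this rule on small samples.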

Can use percentiles to check if population is normally distributed
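
One way to make this check concrete, sketched here under stated assumptions: in a Normal distribution the 2.5th, 16th, 50th, 84th, and 97.5th percentiles fall at roughly the mean minus 2 SD, minus 1 SD, the mean, plus 1 SD, and plus 2 SD. The Python below compares observed percentiles with those landmarks. The function names, the simulated data, and the idea of reporting the largest gap in SD units are illustrative choices, not the lecture's procedure.

```python
import random
import statistics

# Percentile -> approximate number of SDs from the mean in a Normal distribution.
NORMAL_LANDMARKS = {2.5: -2, 16: -1, 50: 0, 84: 1, 97.5: 2}


def normality_table(values):
    """Return (percentile, observed value, value expected if Normal) rows."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    # 199 cut points: cuts[i] is the (i + 1) / 2 percentile.
    cuts = statistics.quantiles(values, n=200)
    rows = []
    for pct, z in NORMAL_LANDMARKS.items():
        observed = cuts[int(pct * 2) - 1]
        rows.append((pct, observed, mean + z * sd))
    return rows


def max_gap_in_sd(values):
    """Largest |observed - expected| across the landmarks, in SD units."""
    sd = statistics.stdev(values)
    return max(abs(obs - exp) for _, obs, exp in normality_table(values)) / sd


# Simulated data (assumed for illustration only).
rng = random.Random(0)
roughly_normal = [rng.gauss(40, 4) for _ in range(5000)]
skewed = [rng.expovariate(1 / 10) for _ in range(5000)]
print("gap for Normal-like data:", round(max_gap_in_sd(roughly_normal), 2))
print("gap for skewed data:     ", round(max_gap_in_sd(skewed), 2))
```

A small gap means the percentiles sit close to where a Normal distribution would put them; a large gap suggests describing the data with the median and percentiles instead.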

Simple Random Sample

Every individual has the same probability of being selected for the sample
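
A simple random sample can be sketched with Python's standard `random` module. The list of ID numbers standing in for the population is a hypothetical sampling frame, and the helper name and seed are assumptions made so the example is repeatable.

```python
import random


def simple_random_sample(population, size, seed=None):
    """Draw `size` distinct members; each member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, size)


# Hypothetical sampling frame: ID numbers for 200 individuals.
population_ids = list(range(1, 201))
chosen = simple_random_sample(population_ids, size=10, seed=42)
print(sorted(chosen))
```

Sampling is without replacement, so no individual appears twice in the sample.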

Other Types of Samples

Stratified random sample
Clustered random sample
These methods are used in more complex (usually large) studies
Calculations slightly different from what we will discuss
Basic principles are the same
Two Kinds of Studies


Experiment
Manipulate the environment
Draw conclusions about causality

Observational study
Observe existing patterns
Draw conclusions about associations

Confounding Variables

The Normal Distribution

Interval data
Also called Gaussian distribution
Many (but not all) things are normally distributed
Observation is result of the sum of many independent random effects
Central Limit Theorem

Parameters of the Normal Distribution

[Figure: Population; horizontal axis from 25 to 55]

[Figure: Sample; horizontal axis from 25 to 55]

Sample

$$\bar{X} = \frac{\sum X}{n} = 41.4\ \text{cm}$$

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = 3.8\ \text{cm}$$

Distribution of Sample Means

Precision Increases as Sample Size Grows

Central Limit Theorem


The distribution of sample means will be approximately normal, even if the original data were not
The mean of all possible sample means will be equal to the mean of the original population
The SD of all possible means of samples of a given size (e.g. sample size = 10, as in the previous example), called the Standard Error of the Mean, depends on both the SD of the original population and the sample size.
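
The second and third statements can be checked with a small simulation, sketched here under stated assumptions: a made-up, strongly skewed population of 2000 values, and 2000 random samples of size 10 drawn from it without replacement. The population, seeds, and function name are illustrative and not from the lecture. The sketch compares the SD of the sample means with the population SD divided by the square root of the sample size.

```python
import math
import random
import statistics


def sample_means(population, sample_size, n_samples, seed=0):
    """Means of repeated simple random samples drawn from `population`."""
    rng = random.Random(seed)
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(n_samples)
    ]


# Hypothetical right-skewed population (exponential, mean about 10).
pop_rng = random.Random(1)
population = [pop_rng.expovariate(1 / 10) for _ in range(2000)]

means = sample_means(population, sample_size=10, n_samples=2000)

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
print(f"population mean      = {pop_mean:.2f}")
print(f"mean of sample means = {statistics.mean(means):.2f}")
print(f"SD of sample means   = {statistics.stdev(means):.2f}")
print(f"sigma / sqrt(n)      = {pop_sd / math.sqrt(10):.2f}")
```

The simulation draws only a subset of all possible samples, so the printed pairs should agree closely but not exactly.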

Standard Error of the Mean

Population SEM:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$$

Best estimate of SEM from a single sample:

$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
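
The single-sample estimate divides the sample standard deviation s by the square root of the sample size n. A minimal Python sketch, with a made-up sample and an assumed function name:

```python
import math
import statistics


def standard_error_of_mean(sample):
    """Estimate the SEM from one sample: s / sqrt(n), with s using n - 1."""
    s = statistics.stdev(sample)  # sample SD (n - 1 in the denominator)
    return s / math.sqrt(len(sample))


# Hypothetical sample of eight measurements.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(f"mean = {statistics.mean(sample):.2f}")
print(f"s    = {statistics.stdev(sample):.2f}")
print(f"SEM  = {standard_error_of_mean(sample):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.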
Coming in Lecture 3
Some worked problems from Ch 2
Begin Ch 3

End Lecture 2
