
Biostatistics for Life Sciences

Point Estimation
DR. LORI BOIES
ST. MARY’S UNIVERSITY

@BoiesBiology

Point Estimation
Estimation is the process of learning about a population parameter
from a model fitted to the data.
Point estimation, interval estimation, and hypothesis testing are three main
ways of learning about the population parameter from the sample statistic.
An estimator is a particular example of a statistic; it becomes an estimate
when the observed sample values are substituted into its formula.
A point estimate of a population parameter is a single value, calculated from the sample,
that is used to estimate that parameter. For example, the sample mean x̅ is a point
estimate of the population mean μ.

For More Information: https://online.stat.psu.edu/stat504/node/18/


Statistical Inference
Statistical inference is
“The attempt to reach a conclusion concerning all members of a class
(population) from observations of only some of them (sample)”.
(Runes, 1959)

Two major components of inference: estimation and hypothesis testing.

Population Compared to Random Samples
The reference, target or study population is the group that
we actually wish to study.
A random sample of size n is a selection of n members of
the population such that each member is independently
chosen and has a known nonzero probability of being
selected. When every member has an equal chance of being
selected, it is known as a simple random sample.

Population and Samples
Important Concepts!
A population is the set of all subjects whose characteristics we are
interested in learning about.
A parameter is a numerical quantity representing a characteristic of
the population.
A sample is a subset of subjects from the population.
A statistic is a numerical quantity representing a characteristic of the
sample.

https://biguru.wordpress.com/2015/01/04/a-brief-introduction-to-statistics-part-3-statistical-inference/
Image From: http://faculty.nps.edu/rdfricke/MCOTEA_Docs/Lecture%2010%20-%20Basic%20Statistical%20Inference%20for%20Survey%20Data.pdf
Sampling Strategies
• Simple random sample
  ◦ each person chosen with equal probability: with or without
    replacement (see next few slides)
• Stratified random sampling
  ◦ divide population into strata (subgroups), e.g., gender, age, severity
  ◦ simple random sample within each stratum
• More complicated sampling
  ◦ cluster sampling
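
The first two strategies can be sketched in a few lines of Python. This is only an illustration: the function names, the `stratum_of` callback, and the toy population of ID numbers are assumptions made for the example, not part of the slides.

```python
import random

def simple_random_sample(population, n, replace=False, rng=random):
    """Draw n members, each chosen with equal probability."""
    if replace:
        return rng.choices(population, k=n)   # with replacement: repeats allowed
    return rng.sample(population, n)          # without replacement: all distinct

def stratified_sample(population, stratum_of, n_per_stratum, rng=random):
    """Split the population into strata, then take a simple random
    sample (without replacement) within each stratum."""
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {label: rng.sample(members, n_per_stratum)
            for label, members in strata.items()}

# Toy population: ID numbers 0..99, with "even" and "odd" as made-up strata.
population = list(range(100))
rng = random.Random(1)  # fixed seed so the sketch is repeatable
print(simple_random_sample(population, 5, rng=rng))
print(stratified_sample(population, lambda i: "even" if i % 2 == 0 else "odd", 3, rng=rng))
```

Passing `replace=True` switches the simple random sample to sampling with replacement, so the same ID can be drawn more than once.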

Parameter Estimation with Statistics
• We are often interested in population parameters.
• Since complete populations are difficult (or impossible) to collect data on, we use sample
statistics as point estimates for the unknown population parameters of interest.

Suppose we randomly sample 1,000 adults from across the US. Would you expect the
sample mean of their heights to be the same as, somewhat different from, or very different from
the mean height of the overall population?

• Sample statistics vary from sample to sample.
• Quantifying how sample statistics vary tells us how much uncertainty is attached to our
point estimate. We measure it with the standard error (the standard deviation of that
statistic or estimate).

Example – Cholesterol Levels in the US
• Our target population consists of all people over 55 in the US. We
would like to know the mean cholesterol level of this
population.
• We are not going to be able to measure the cholesterol level of all
people over 55 years of age in the US.
• Our strategy is to select a random sample of subjects who are
representative of this large group.
• We then use the observations from this sample to help us estimate
the population parameters.

Example – Cholesterol Levels in the US
Let’s Sample!
• Repeatedly take samples of size 5 from our target population.
• After selecting those 5 people, we return them to the population.
  ◦ (This is known as sampling with replacement.)

• Each individual sample of 5 consists of 5 different people. However, a
given individual can appear in more than one sample (see the pink stick
person in the next figure).

Example – Cholesterol Levels in the US

• The number of possible samples of size 5 is enormous, effectively unlimited.

• We can compute a sample mean and sample variance of the 5 cholesterol values for each
sample.
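
A small simulation can make the repeated-sampling idea concrete. The sketch below is only an illustration: the population of 10,000 cholesterol values is made up (drawn from a normal distribution with mean 200 and SD 40 mg/dL), and the helper name `draw_sample_stats` is an assumption for the example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up cholesterol values (mg/dL).
population = [rng.gauss(200, 40) for _ in range(10_000)]

def draw_sample_stats(population, n, rng):
    """Draw n different people, then return the sample mean and sample variance."""
    sample = rng.sample(population, n)  # n distinct members within one sample
    return statistics.mean(sample), statistics.variance(sample)  # variance divides by n - 1

# People go back into the population between draws, so the same person
# can appear in more than one sample.
results = [draw_sample_stats(population, 5, rng) for _ in range(1000)]
sample_means = [mean for mean, _ in results]

print("population mean:", round(statistics.mean(population), 1))
print("first three sample means:", [round(m, 1) for m in sample_means[:3]])
```

Each run of `draw_sample_stats` plays the role of one sample in the figure: the sample means differ from one another and scatter around the population mean.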
Sampling Picture
Each sample mean $\bar{X}_i$ would be an estimate of the
population mean ($\mu$).
The sampling error (of the mean) represents the
variation from sample 1 to sample 2, etc.
In reality, we have 1 sample so we do not observe this
variation directly (we will revisit this later).

[Figure: the true population, from which Sample 1, Sample 2, …, Sample N are drawn, giving sample means $\bar{X}_1$, $\bar{X}_2$, …, $\bar{X}_N$]


Notations
X: the random variable (e.g., total cholesterol).
N: population size
n: sample size
$x_1, x_2, x_3, \ldots, x_N$: the values of the random variable (e.g., cholesterol
levels) for the individual members of the population.
$x_1, x_2, x_3, \ldots, x_n$: the values of the random variable (e.g., cholesterol
levels) for the individual members of the sample.

Common Parameters and the
Corresponding Sample Estimates
Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard deviation: $\sigma = \sqrt{\sigma^2}$

Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

Sample variance: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

Sample standard deviation: $s = \sqrt{s^2}$
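
To see the difference between dividing by N and dividing by n − 1, the variance formulas can be written out directly. This is a minimal sketch: the function names and the eight data values are made up for illustration.

```python
import math

def population_variance(values):
    """sigma^2: average squared deviation from mu, dividing by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """s^2: squared deviations from x-bar, dividing by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

print("treated as a population:", population_variance(data), math.sqrt(population_variance(data)))
print("treated as a sample:    ", sample_variance(data), math.sqrt(sample_variance(data)))
```

For these eight values the mean is 5 and the squared deviations sum to 32. The population variance is therefore 32/8 = 4 (σ = 2), and the sample variance is 32/7 ≈ 4.57.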

Example
College Drinking
Suppose that you don't have access to the
population data. In order to estimate the average
number of drinks it takes these college students to
get drunk, you might sample from the population
and use your sample mean as the best guess for the
unknown population mean.
• Sample, with replacement, ten students from the
population, and record the number of drinks it
takes them to get drunk.
• Find the sample mean.
• Plot the distribution of the sample averages
obtained by members of the class.

Example
College Drinking
List of random numbers to identify individuals to include in the sample: 59, 121, 88, 46,
58, 72, 82, 81, 5, 10

Sample mean of the scores (blue column):


(8+6+10+4+5+3+5+6+6+6) / 10 = 5.9
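
The same arithmetic can be checked in Python. The ten values are the ones in the sum above; the variable names are just for illustration.

```python
import statistics

# Number of drinks recorded for the ten sampled students (values from the sum above).
drinks = [8, 6, 10, 4, 5, 3, 5, 6, 6, 6]

sample_mean = statistics.mean(drinks)
print(sample_mean)  # 5.9
```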

Point Estimates and Parameter - Summary
RECALL:
The entire-population response proportion is generally referred to as a parameter of interest.
The sample mean x̅ is a point estimate of the population mean μ.
When the parameter is a proportion, it is often denoted by p, and we often refer to the sample
proportion as p̂.
• Unless we collect responses from every individual in the population, p remains unknown, and we use p̂
as our estimator of p. The error is the difference between the two. Generally, error consists of
two aspects: sampling error and bias.

Sampling error (sampling uncertainty) describes how much an estimate will tend to vary from
one sample to the next. This is a huge part of statistics, and sample size plays a role: larger
samples tend to vary less.
Bias describes a systematic tendency to over- or under-estimate the true population value.

OpenIntro Statistics – Chapter 5 (Foundations for Inference)


Sampling Distribution
Why is the sampling distribution important?
If a sampling distribution has a lot of variability (that is, a
big standard error), then another sample would likely
give a very different result.

Standard Error of the Mean
The standard deviation of the set of sample means tells us how far
the typical estimate is from the actual population mean.
  ◦ It also describes the typical error of the point estimate, and for this reason
we usually call this standard deviation the standard error (SE).
• Given n independent observations from a population with standard
deviation $\sigma$, the standard error of the sample mean is:

$SE = \dfrac{\sigma}{\sqrt{n}}$
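
One way to convince yourself of this formula is to simulate many samples and compare the standard deviation of their means with σ/√n. The sketch below assumes a made-up normal population (mean 200, σ = 40) and a sample size of 25; none of these numbers come from the slides.

```python
import math
import random
import statistics

rng = random.Random(0)    # fixed seed so the sketch is repeatable
sigma, n = 40.0, 25       # assumed population SD and sample size

def one_sample_mean():
    """Mean of n independent draws from a Normal(200, sigma) population."""
    return statistics.mean([rng.gauss(200.0, sigma) for _ in range(n)])

sample_means = [one_sample_mean() for _ in range(5000)]

theoretical_se = sigma / math.sqrt(n)            # 40 / 5 = 8.0
empirical_se = statistics.stdev(sample_means)    # SD of the simulated sample means

print(f"theoretical SE = {theoretical_se:.2f}")
print(f"empirical SE   = {empirical_se:.2f}")
```

The two numbers should be close. Rerunning with a larger n shrinks both, which is the sample-size effect the formula describes.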

Standard Deviation vs. Standard Error
Standard deviation measures the variability of individual
values in the population.
Standard error measures the precision of a
statistic, such as the sample mean or proportion,
as an estimate of the population mean or population
proportion.

Estimation for a Population Proportion
The population proportion is $\pi = \dfrac{N_A}{N}$

where $N_A$ = number of elements in the population with a specified characteristic “A”, and N = total number
of elements in the population (population size).

The sample proportion is $p = \dfrac{n_A}{n}$

where:
$n_A$ = number of elements in the sample with the same characteristic “A”
n = sample size

A good point estimate for $\pi$ is p.

Example of binomial proportion estimate
Suppose we want to estimate the prevalence of ear infections in
children aged 2 years.
We randomly sample 123 children aged 2 years, and 37 of them
have ear infections.
What is our estimate of the prevalence of ear infections?

$p = \dfrac{n_A}{n} = \dfrac{37}{123} \approx 0.30$

We estimate that 30% of children aged 2 years have ear
infections.
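
The calculation is a single division, shown here as a short Python sketch. The function name `sample_proportion` and its input checks are assumptions added for illustration.

```python
def sample_proportion(n_a, n):
    """p = n_A / n: the share of the sample with characteristic A."""
    if n <= 0 or not 0 <= n_a <= n:
        raise ValueError("need n > 0 and 0 <= n_a <= n")
    return n_a / n

p = sample_proportion(37, 123)  # 37 of the 123 sampled children have ear infections
print(f"{p:.2f}")               # 0.30
```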

Simulation – Some Extra Fun on Your
Own!
Go to the link below to play around with a simulation tool for sampling distributions.

http://onlinestatbook.com/stat_sim/sampling_dist/index.html

Wrap-up
◦ We discussed point estimates (statistics) as the best
estimate for the true but unknown population parameter.
◦ We learned that point estimates have their own
distribution, different from but closely related to the
distribution of the population. The variability of the point
estimate is measured by the standard error.
  ◦ The standard error of the sample mean is equal to the standard
deviation of the population divided by the square root of the sample size:

$SE = \dfrac{\sigma}{\sqrt{n}}$
