
biometry – bio220

chapter 4
estimation

1
estimation
• estimating population characteristics from
samples
• random sample
• uncertainty = imprecision

• how much can we infer about a population
from a limited sample?

2
estimation

            population   sample
            parameter    estimate
mean        µ            x̄ or Ȳ
sd          σ            s
proportion  p            p̂

3
estimation
• population frequency distribution (or
probability distribution)
• sample frequency distribution

4
population
• variable: length of a gene
• population: 20,290 protein coding genes in
the human genome

5
population

6
population
• parameters

7
sampling
• if we only had samples with n=100 from this
population, how should we estimate the
population characteristics?
• sample 100 genes randomly, without
replacement
• sampling without replacement = a single
observation can't be chosen twice
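
A minimal sketch of the two sampling schemes, using only the Python standard library. The integer gene IDs are a hypothetical stand-in for the 20,290 genes; they are not the real data.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# ASSUMPTION: gene IDs 0..20289 stand in for the real population of genes
population = list(range(20290))

# Without replacement: each gene can be chosen at most once
without_repl = random.sample(population, k=100)

# With replacement: the same gene may be drawn more than once
with_repl = random.choices(population, k=100)

print(len(set(without_repl)))  # number of distinct genes, always 100 here
print(len(set(with_repl)))     # can fall below 100 if a gene is drawn twice
```

With a population this large relative to n, repeats under sampling with replacement are rare, so the two schemes give very similar samples.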

8
sampling
without replacement vs. with replacement
(figure source: https://www.spss-tutorials.com/simple-random-sampling-what-is-it/)

9
sample

10
sample
• sample estimates & the random sampling
effect

– note: Ȳ is also frequently denoted as x̄
– how to report Ȳ, given uncertainty?

11
sampling distribution
• probability distribution of all possible sample
estimates
• e.g. all possible sample mean estimates Ȳ
• i.e., the “population” of sample means
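
One way to see a sampling distribution is to approximate it by simulation: draw many random samples and keep the mean of each. The sketch below assumes a synthetic right-skewed population (lognormal, with made-up parameters) in place of the real gene lengths; `sample_means` is a hypothetical helper name.

```python
import random
import statistics

random.seed(42)

# ASSUMPTION: synthetic right-skewed values standing in for gene lengths
population = [random.lognormvariate(7.5, 0.8) for _ in range(20290)]
mu = statistics.fmean(population)

def sample_means(pop, n, reps):
    """Draw `reps` random samples of size n (without replacement); return their means."""
    return [statistics.fmean(random.sample(pop, n)) for _ in range(reps)]

# 2000 sample means approximate the sampling distribution of the mean for n = 100
means = sample_means(population, n=100, reps=2000)

print(f"population mean:      {mu:.1f}")
print(f"mean of sample means: {statistics.fmean(means):.1f}")
print(f"s.d. of sample means: {statistics.stdev(means):.1f}")
```

The list `means` is the simulated "population" of sample means; a histogram of it would show the shape of the sampling distribution.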

12
sampling distribution
• http://www.zoology.ubc.ca/~whitlock/kingfisher/SamplingNormal.htm

• all estimates have a sampling distribution

• usually we cannot determine the sampling
distribution directly, but statistical theory
allows us to predict its properties

13
sampling distribution

14
sampling distribution – its mean
• property 1:
• Ȳ is an unbiased estimate of μ
• the average of sampling distribution of Ȳ will
be μ itself

15
sampling distribution – its shape
• property 2:
• the sampling distribution's shape is
approximately normal, irrespective of the
population's distribution
• as long as sample size is large enough
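
Property 2 can be checked with a small simulation. The sketch below assumes a strongly right-skewed exponential population (an illustrative choice, not the gene data) and uses skewness as a rough measure of how far the distribution of sample means is from a symmetric, normal-like shape.

```python
import random
import statistics

random.seed(7)

def skewness(values):
    """Mean cubed deviation divided by s.d. cubed (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3

def sample_means(n, reps=5000):
    # ASSUMPTION: exponential population with rate 1, which is right-skewed
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

# Skewness of the simulated sampling distribution for increasing n
for n in (2, 10, 100):
    print(n, round(skewness(sample_means(n)), 2))
```

The printed skewness should shrink toward 0 as n grows, which is what "normal as long as sample size is large enough" predicts.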

16
sampling distribution – its shape

17
sampling distribution - its variance
• property 3:
• the variance is inversely proportional to
sample size (σ²/n)
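
Property 3 can also be checked by simulation. This sketch assumes a normal population with mean 50 and s.d. 10 (made-up values) and compares the simulated variance of the sample means with σ²/n.

```python
import random
import statistics

random.seed(3)

# ASSUMPTION: normal population with mean 50 and s.d. 10, chosen for illustration
MU, SIGMA = 50.0, 10.0

def variance_of_sample_means(n, reps=4000):
    """Simulate `reps` samples of size n and return the variance of their means."""
    means = [
        statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.variance(means)

results = {n: variance_of_sample_means(n) for n in (5, 20, 80)}
for n, var in results.items():
    print(f"n={n:3d}  simulated variance={var:6.2f}  sigma^2/n={SIGMA**2 / n:6.2f}")
```

Each fourfold increase in n should cut the variance to about a quarter, and therefore the standard deviation to about a half.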

18
sampling distribution - its variance
higher sample size (n)
→ more info (random noise cancels out)
→ higher precision
→ lower variance

19
standard error
• standard error = standard deviation of a
sampling distribution

• it contains information about the precision of
the sample estimate
– remember: precision ≠ accuracy
• we can use this to calculate confidence
intervals for the population parameter

20
standard error of the mean

21
standard error of the mean
• but we usually do not know σ, only s, so we
estimate the standard error from the sample

σȲ = σ / √n (the parameter)    SEȲ = s / √n (the estimate)
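
A short sketch of the standard error of the mean estimated from a single sample as s/√n. The eight measurements are hypothetical, made up for illustration, and `standard_error_of_mean` is not a library function.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """SE of the mean estimated from one sample: s / sqrt(n)."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample s.d., with n - 1 in the denominator
    return s / math.sqrt(n)

# ASSUMPTION: hypothetical measurements, not data from the course
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0]

se = standard_error_of_mean(sample)
print(f"mean = {statistics.fmean(sample):.2f}, SE = {se:.4f}")
```

Here s = 2 and n = 8, so the standard error is 2/√8, roughly 0.71.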

22
confidence interval
• range around a sample estimate
• indicates uncertainty for the estimate
• the population parameter should lie inside it,
with a certain probability

• 95% CI for the mean: a likely range for the
true population mean

23
confidence interval
• rough approximation to the 95% CI

Ȳ ± 2 SEȲ

• this is based on the fact that sampling
distributions are approximately normal when n
is large enough
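
The 2 SE rule of thumb can be sketched in a few lines. The sample is hypothetical and `rough_ci95` is an illustrative name. For small samples like this one, the exact multiplier comes from the t distribution and is somewhat larger than 2.

```python
import math
import statistics

def rough_ci95(sample):
    """Return (lower, upper) for the mean using the 2 SE rule of thumb."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se

# ASSUMPTION: hypothetical measurements, not data from the course
sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 10.0, 13.0]

lower, upper = rough_ci95(sample)
print(f"{lower:.2f} < mu < {upper:.2f}")
```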

24
confidence interval
• remember that in normal distributions:
• Ȳ ± 2 s.d. covers ~95% of the data
• the sampling distribution is also normal
(figure source: www.mathsisfun.com)

25
confidence interval
• 95% CI : 2121.4 < μ < 2702.2

• “we are 95% confident that the population
mean lies within this range”
• http://www.zoology.ubc.ca/~whitlock/kingfisher/CIMean.htm
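
What "95% confident" means can be illustrated by repeating the whole procedure many times and counting how often the interval contains μ. The sketch assumes a normal population with a known mean (made-up values), which is only possible in a simulation.

```python
import math
import random
import statistics

random.seed(11)

# ASSUMPTION: normal population with known mean, so coverage can be checked
MU, SIGMA, N, REPS = 100.0, 15.0, 50, 2000

hits = 0
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)
    # Does this sample's rough 95% CI contain the true mean?
    if mean - 2 * se < MU < mean + 2 * se:
        hits += 1

coverage = hits / REPS
print(f"fraction of intervals containing mu: {coverage:.3f}")
```

The fraction should come out close to 0.95: about 1 interval in 20 misses the true mean purely by chance.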

26
confidence interval

27
confidence interval
• Why don't we use a 99.99999% confidence
interval?
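
One way to approach this question is to look at how wide the interval becomes as the confidence level rises. The sketch assumes a normal sampling distribution and uses `statistics.NormalDist` from the standard library; `z_multiplier` is an illustrative helper name.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Standard errors on each side of the estimate, for a normal sampling distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for conf in (0.90, 0.95, 0.99, 0.9999999):
    print(f"{conf:.7f}  ->  estimate ± {z_multiplier(conf):.2f} SE")
```

Higher confidence is bought with a wider interval: the multiplier is about 1.96 at 95% and above 5 at 99.99999%, so the interval says less and less about where the parameter actually lies.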

28
the logic of estimation
• in real life, we don't have access to
populations
• = in real life, we cannot calculate sampling
distributions
• usually, we only have data from a single
sample, with specific n, mean and sd
• so what can we estimate about a population
based on this single sample?

29
the logic of estimation
• here we studied the behavior of sampling
distributions with simulations, in order to
understand the logic/theory behind the
statistical tools we use for estimation

• i.e. to understand the logic of why a 95% CI
can be constructed as Ȳ ± 2 SEȲ

30
the logic of estimation
• the logic of why a 95% CI can be constructed
as Ȳ ± 2 SEȲ:
• because the sampling distribution mean is the
pop mean
• because the sampling distribution s.d. is the
pop s.d./sqrt(n)
• because the sampling distribution has a
normal shape when n is large enough

31
the logic of estimation
• we learned how samples from a population
behave in general
• based on this, we can make reliable inferences
about populations from a single sample

• sample → statistics → population

32
questions / to read
• solve all problems at the end of the chapter

• read the section about pseudoreplication
(interleaf-2)

33
