chapter 4
estimation
1
estimation
• estimating population characteristics from
samples
• random sample
• uncertainty = imprecision
2
estimation
            population   sample
            parameter    estimate
mean        µ            x̄ or Ȳ
sd          σ            s
proportion  p            p̂
3
estimation
• population frequency distribution (or
probability distribution)
• sample frequency distribution
4
population
• variable: length of a gene
• population: 20,290 protein coding genes in
the human genome
5
population
6
population
• parameters
7
sampling
• if we only had samples with n=100 from this
population, how should we estimate the
population characteristics?
• sample 100 genes randomly, without
replacement
• sampling without replacement = a single
observation can't be chosen twice
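The difference between the two sampling schemes can be sketched in Python. This is only an illustration: the population below is a made-up list of 20,290 numbers standing in for gene lengths, and the names `population`, `idx_without` and `idx_with` are hypothetical.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for the population: 20,290 made-up "gene lengths".
population = [rng.randint(200, 10000) for _ in range(20290)]

# Without replacement: pick 100 distinct positions, so no gene is chosen twice.
idx_without = rng.sample(range(len(population)), k=100)

# With replacement, for contrast: the same position may be drawn again.
idx_with = [rng.randrange(len(population)) for _ in range(100)]

sample = [population[i] for i in idx_without]
print(len(set(idx_without)))  # always 100 distinct genes
print(len(set(idx_with)))     # can be fewer than 100
```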
8
sampling
without replacement with replacement
https://www.spss-tutorials.com/simple-random-sampling-what-is-it/
9
sample
10
sample
• sample estimates & the random sampling
effect
11
sampling distribution
• probability distribution of all possible sample
estimates
• e.g. all possible sample mean estimates Ȳ
• i.e., the “population” of sample means
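A sampling distribution can be approximated by simulation: draw many random samples and keep the estimate from each. The sketch below assumes a made-up, right-skewed population (not the real gene data) and uses 2000 repeats rather than all possible samples; `sample_mean` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population (log-normal values), NOT real gene lengths.
population = [rng.lognormvariate(7.5, 0.8) for _ in range(20290)]

def sample_mean(n):
    """Mean of one random sample of size n, drawn without replacement."""
    return statistics.fmean(rng.sample(population, n))

# Each repeat gives one sample estimate; together the estimates
# approximate the sampling distribution of the mean for n = 100.
sample_means = [sample_mean(100) for _ in range(2000)]

print(min(population), max(population))      # range of individual values
print(min(sample_means), max(sample_means))  # much narrower range of sample means
```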
12
sampling distribution
• http://www.zoology.ubc.ca/~whitlock/kingfisher/SamplingNormal.htm
13
sampling distribution
14
sampling distribution – its mean
• property 1:
• Ȳ is an unbiased estimate of μ
• the average of sampling distribution of Ȳ will
be μ itself
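Property 1 can be checked with a small simulation. The population here is made up (log-normal values), so μ is known only because we generated the data ourselves. A single sample mean can miss μ by a lot, but the average of many sample means should land very close to it.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical population, NOT real gene lengths.
population = [rng.lognormvariate(7.5, 0.8) for _ in range(20290)]
mu = statistics.fmean(population)  # population mean

# 2000 sample means, each from a random sample of n = 100.
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(2000)]
mean_of_means = statistics.fmean(sample_means)

print(mu, mean_of_means)  # the two values should be close
```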
15
sampling distribution – its shape
• property 2:
• the sampling distribution of Ȳ is approximately
normal, irrespective of the shape of the
population's distribution
• as long as sample size is large enough
(central limit theorem)
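One rough way to see property 2 is to compare skewness, which is about 0 for a symmetric, normal-like shape. The sketch assumes a made-up, strongly right-skewed population; `skewness` is a hypothetical helper, not a standard-library function.

```python
import random
import statistics

rng = random.Random(4)  # fixed seed so the sketch is repeatable

# Hypothetical right-skewed population, NOT real gene lengths.
population = [rng.lognormvariate(7.5, 0.8) for _ in range(20290)]

def skewness(values):
    """Moment-based skewness: about 0 for a symmetric (e.g. normal) shape."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

# Sample means for n = 100.
sample_means = [statistics.fmean(rng.sample(population, 100)) for _ in range(2000)]

skew_population = skewness(population)
skew_means = skewness(sample_means)
print(skew_population, skew_means)  # the sample means are far less skewed
```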
16
sampling distribution – its shape
17
sampling distribution - its variance
• property 3: the variance of the sampling
distribution of Ȳ is inversely proportional to
sample size (σ²/n)
18
sampling distribution - its variance
higher sample size (n)
→ more info (random noise cancels out)
→ higher precision
→ lower variance
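The link between sample size and variance can be sketched by repeating the simulation for several values of n. The population is again made up, and `variance_of_sample_means` is a hypothetical helper. If the variance of Ȳ is σ²/n, then quadrupling n should cut the variance to roughly a quarter.

```python
import random
import statistics

rng = random.Random(5)  # fixed seed so the sketch is repeatable

# Hypothetical population, NOT real gene lengths.
population = [rng.lognormvariate(7.5, 0.8) for _ in range(20290)]
sigma2 = statistics.pvariance(population)  # population variance

def variance_of_sample_means(n, repeats=2000):
    """Variance of the sample mean across many random samples of size n."""
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.pvariance(means)

results = {n: variance_of_sample_means(n) for n in (10, 40, 160)}
for n, var in results.items():
    print(n, var, sigma2 / n)  # simulated variance next to sigma^2 / n
```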
19
standard error
• standard error = standard deviation of a
sampling distribution
20
standard error of the mean
• σȲ = σ / sqrt(n)
21
standard error of the mean
• but we usually do not know σ, only s, so we
estimate the standard error of the mean as
SEȲ = s / sqrt(n)
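In practice the standard error is computed from the single sample we have, using s in place of σ. A minimal sketch, assuming a made-up population from which one sample of n = 100 is drawn:

```python
import math
import random
import statistics

rng = random.Random(6)  # fixed seed so the sketch is repeatable

# Hypothetical population, NOT real gene lengths.
population = [rng.lognormvariate(7.5, 0.8) for _ in range(20290)]
sample = rng.sample(population, 100)

n = len(sample)
s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(n)         # estimated standard error of the mean

# Only possible in a simulation: the value we would get if sigma were known.
sigma = statistics.pstdev(population)
print(se, sigma / math.sqrt(n))
```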
22
confidence interval
• range around a sample estimate
• indicates uncertainty for the estimate
• if we repeated the sampling many times, a
fixed percentage of the intervals (e.g. 95%)
would contain the pop parameter
23
confidence interval
• rough approximation to the 95% CI
Ȳ ± 2 SEȲ
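The rule of thumb translates directly into code. The function name `approx_ci95` and the measurements below are made up for illustration. With a sample this small, the multiplier of 2 is only a rough guide.

```python
import math
import statistics

def approx_ci95(sample):
    """Rough 95% CI for the mean: sample mean plus or minus 2 standard errors."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 2 * se, mean + 2 * se

# Made-up measurements, purely for illustration.
values = [12.1, 9.8, 11.4, 10.6, 13.0, 9.9, 10.8, 11.7, 12.3, 10.2]
low, high = approx_ci95(values)
print(low, high)
```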
24
confidence interval
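• remember that in normal distributions about
95% of values lie within 2 standard
deviations of the mean
• the sampling distribution is also normal, so
about 95% of sample means lie within 2 SE
of μ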
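A simulation can show what the 95% refers to: across many random samples, roughly 95% of the intervals Ȳ ± 2 SE should contain μ. The sketch assumes a made-up, skewed population, so the observed fraction may fall slightly below 0.95; `ci_contains_mu` is a hypothetical helper.

```python
import math
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population, NOT real gene lengths.
population = [rng.lognormvariate(7.5, 0.8) for _ in range(20290)]
mu = statistics.fmean(population)

def ci_contains_mu(n):
    """Draw one sample of size n and report whether mean +/- 2 SE covers mu."""
    sample = rng.sample(population, n)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return mean - 2 * se < mu < mean + 2 * se

repeats = 1000
coverage = sum(ci_contains_mu(100) for _ in range(repeats)) / repeats
print(coverage)  # fraction of intervals that contain mu
```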
25
www.mathsisfun.com
confidence interval
• 95% CI : 2121.4 < μ < 2702.2
26
confidence interval
27
confidence interval
• why don't we use a 99.99999% confidence
interval?
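One way to explore this question is to look at how the multiplier of the standard error grows with the confidence level. The sketch assumes a normal sampling distribution and uses Python's `statistics.NormalDist`; `z_multiplier` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Number of standard errors needed on each side for a given confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.95, 0.99, 0.9999999):
    print(level, z_multiplier(level))
```

The multiplier rises from about 2 at 95% to more than 5 at 99.99999%, so the interval becomes much wider and says less about where μ actually lies.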
28
the logic of estimation
• in real life, we don't have access to
populations
• so, in real life, we cannot calculate sampling
distributions
• usually, we only have data from a single
sample, with specific n, mean and sd
• so what can we estimate about a population
based on this single sample?
29
the logic of estimation
• here we studied the behavior of sampling
distributions with simulations, in order to
understand the logic/theory behind the
statistical tools we use for estimation
30
the logic of estimation
• the logic of why a 95% CI can be constructed
as Ȳ ± 2 SEȲ:
• because the sampling distribution mean is the
pop mean
• because the sampling distribution s.d. is the
pop s.d./sqrt(n)
• because the sampling distribution has a
normal shape when n is large enough
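The three reasons above combine into a short derivation. This is a sketch that assumes n is large enough for the normal approximation to hold; the last line replaces σ with s and rounds 1.96 up to 2.

```latex
\begin{align*}
\bar{Y} &\sim \mathcal{N}\!\left(\mu,\ \frac{\sigma^2}{n}\right)
  \quad \text{(approximately, for large } n\text{)} \\
P\!\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}} < \bar{Y} < \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right) &\approx 0.95 \\
P\!\left(\bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) &\approx 0.95 \\
\text{95\% CI} &\approx \bar{Y} \pm 2\,\mathrm{SE}_{\bar{Y}},
  \qquad \mathrm{SE}_{\bar{Y}} = \frac{s}{\sqrt{n}}
\end{align*}
```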
31
the logic of estimation
• we learned how samples from a population
behave in general
• based on this, we can make reliable inferences
about populations from a single sample
32
questions / to read
• solve all problems at the end of the chapter
33