
Vol. 9, 2024-07

Estimation of the Mean using Samples obtained from Finite Populations

Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
hugo.hernandez@forschem.org

doi:

Abstract
The error involved in the estimation of the mean value of a population depends on both the
sample size and the population size. Conventional expressions for determining the standard
error in the estimation of the mean have been obtained under the assumption of
independence between the elements in the sample. Unfortunately, for finite populations, the
elements are not independent from each other, but they are correlated since the distribution
of remaining elements in the population changes after an element is sampled. In this report, a
general expression for the estimation error of the mean of finite populations is derived. As the
population size increases, the estimation error approaches the conventional expression for
infinite populations. An illustrative example is used to show the validity of the general
expression obtained.

Keywords
Bessel’s Correction, Estimators, Finite Populations, Inferential Statistics, Parameter Estimation,
Randomistics, Sampling Error, Standard Deviation

1. Introduction

The typical expression used to determine the standard error ($\sigma_{\varepsilon_{\mu_X(n)}}$) in the estimation of the mean value ($\mu_X$) from a sample of size $n$ of a population (represented by variable $X$) is [1]:

$$\sigma_{\varepsilon_{\mu_X(n)}} = \sigma_{\bar{X}_{(n)}} = \frac{\sigma_X}{\sqrt{n}} \tag{1.1}$$

where $\sigma_{\bar{X}_{(n)}}$ is the standard deviation of the sample average, and $\sigma_X$ is the standard deviation of the values of the elements in the population.

Cite as: Hernandez, H. (2024). Estimation of the Mean using Samples obtained from Finite Populations.
ForsChem Research Reports, 9, 2024-07, 1 - 19. Publication Date: 20/05/2024.

However, Eq. (1.1) tacitly assumes that the population of interest is infinite. When the population is finite, Eq. (1.1) is no longer valid, as will be shown here.
In this report, the standard error in the estimation of the mean value from a sample of a finite population is derived, and it is shown that Eq. (1.1) is recovered in the limit when the size of the population ($N$) approaches infinity ($N \to \infty$).

2. Properties of Samples obtained from Finite Populations

2.1. Sample Average from a Finite Population

Let us consider a finite population having $N$ elements with values represented by variable $X$. Thus, the $i$-th element of the population has a value $X_i$. The mean value ($\mu_X$) for this population is:

$$\mu_X = \frac{1}{N}\sum_{i=1}^{N} X_i \tag{2.1}$$

Let us now assume that $n$ different elements are randomly selected from the population ($1 \le n \le N$), and the average value is calculated for this sample, as follows:

$$\bar{X}_{(n)} = \frac{1}{n}\sum_{j=1}^{n} X_{i(j)} \tag{2.2}$$

where the $i$-th element in the population corresponds to the $j$-th element in the sample.
Notice that for $n = N$,

$$\bar{X}_{(N)} = \mu_X \tag{2.3}$$

2.2. Sample Variance of a Finite Population

The variance of the population ($\sigma_X^2$) is defined as the average squared deviation of the elements in the population with respect to their mean value:

$$\sigma_X^2 = \frac{1}{N}\sum_{i=1}^{N}\left(X_i - \mu_X\right)^2 \tag{2.4}$$
However, the variance of the sample (𝑆𝑋2 ) is calculated as follows [2]:

$$S_{X(n)}^2 = \frac{\sum_{j=1}^{n}\left(X_{i(j)} - \bar{X}_{(n)}\right)^2}{n-1} \tag{2.5}$$

where a bias correction known as Bessel's correction is employed [3]. This correction is only valid for infinite populations, since for a finite population sampled completely ($n = N$) we would obtain:

$$S_{X(n=N)}^2 = \left(\frac{N}{N-1}\right)\sigma_X^2 \tag{2.6}$$

which is consistent with the population variance only for infinite populations, where $\lim_{N\to\infty} E\left(S_{X(n=N)}^2\right) = \sigma_X^2$.
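As a quick numerical check of Eq. (2.6), the following minimal Python sketch compares the population variance (Eq. 2.4) with the Bessel-corrected sample variance (Eq. 2.5) when the sample contains the whole population. The population values are hypothetical and chosen only for illustration.

```python
from statistics import pvariance, variance

# Hypothetical finite population (illustrative values, not taken from the report)
population = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
N = len(population)

sigma2 = pvariance(population)   # Eq. (2.4): divides by N
s2_full = variance(population)   # Eq. (2.5) with n = N: divides by N - 1

# Eq. (2.6): the two quantities differ by the factor N / (N - 1)
ratio = s2_full / sigma2
print(f"sigma^2 = {sigma2:.4f}, S^2(n=N) = {s2_full:.4f}, ratio = {ratio:.4f}")
```

The ratio depends only on $N$, not on the particular values in the population.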

2.3. Randomistic Representation of Each Element in the Sample

Since the elements of the sample are randomly chosen, it is useful to represent the values of the population using randomistic variables [4], as follows:

$$X = \mu_X \Upsilon + \sigma_X \Xi_X \tag{2.7}$$

where $\sigma_X = \sqrt{\sigma_X^2}$ is the standard deviation of the population values, $\Upsilon$ is the standard deterministic variable (equivalent to the number 1), and $\Xi_X$ is a type I standard random variable¹ representing the behavior of the population.

Thus, the first random element chosen from the population can be represented by Eq. (2.7). When the population is assumed infinite, removing any element from the population will not have an effect on the distribution of values, and thus Eq. (2.7) could be used for all elements in the sample. However, for finite populations, removing an element will change the distribution. For example, by removing a random element $x_{j=1}$², the variable describing the next element of the sample becomes:

$$X_{j=2} = \mu_{X,j=2}\Upsilon + \sigma_{X,j=2}\Xi_{X,j=2} \tag{2.8}$$

where

$$\mu_{X,j=2} = \frac{N\mu_{X,j=1} - x_{j=1}}{N-1} = \frac{N}{N-1}\mu_X - \frac{x_{j=1}}{N-1} \tag{2.9}$$
and

$$\sigma_{X,j=2} = \sqrt{\frac{1}{N-1}\sum_{j=2}^{N}\left(X_{i(j)} - \mu_{X,j=2}\right)^2} = \sqrt{\left(\frac{N}{N-1}\right)\sigma_X^2 - \frac{N}{(N-1)^2}\left(x_{j=1} - \mu_X\right)^2} \tag{2.10}$$

¹ A type I random variable has a mean value of 0 and a standard deviation of 1.
² A lower case notation is used here to denote a known value instead of a random value.

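Similarly, for the third element we obtain:

$$X_{j=3} = \mu_{X,j=3}\Upsilon + \sigma_{X,j=3}\Xi_{X,j=3} \tag{2.11}$$

where

$$\mu_{X,j=3} = \frac{(N-1)\mu_{X,j=2} - x_{j=2}}{N-2} = \frac{N}{N-2}\mu_X - \frac{\sum_{j=1}^{2} x_j}{N-2} \tag{2.12}$$

and

$$\sigma_{X,j=3} = \sqrt{\frac{1}{N-2}\sum_{j=3}^{N}\left(X_{i(j)} - \mu_{X,j=3}\right)^2} = \sqrt{\frac{N}{N-2}\sigma_X^2 - \frac{1}{N-2}\sum_{j=1}^{2}\left(x_j - \mu_X\right)^2 - \frac{1}{(N-2)^2}\left(\sum_{j=1}^{2}\left(x_j - \mu_X\right)\right)^2} \tag{2.13}$$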
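Or in general, for the $n$-th element:

$$X_{j=n} = \mu_{X,j=n}\Upsilon + \sigma_{X,j=n}\Xi_{X,j=n} \tag{2.14}$$

where

$$\mu_{X,j=n} = \frac{(N-n+2)\mu_{X,j=n-1} - x_{j=n-1}}{N-n+1} = \frac{N\mu_X - \sum_{j=1}^{n-1} x_j}{N-n+1} \tag{2.15}$$

and

$$\sigma_{X,j=n} = \sqrt{\frac{1}{N-n+1}\sum_{j=n}^{N}\left(X_{i(j)} - \mu_{X,j=n}\right)^2} = \sqrt{\frac{N}{N-n+1}\sigma_X^2 - \frac{1}{N-n+1}\sum_{j=1}^{n-1}\left(x_j - \mu_X\right)^2 - \frac{1}{(N-n+1)^2}\left(\sum_{j=1}^{n-1}\left(x_j - \mu_X\right)\right)^2} \tag{2.16}$$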
Thus, for 𝑛 = 𝑁 we obtain:

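$$X_{j=N} = \mu_{X,j=N}\Upsilon + \sigma_{X,j=N}\Xi_{X,j=N} \tag{2.17}$$

where

$$\mu_{X,j=N} = N\mu_X - \sum_{j=1}^{N-1} x_j = x_{j=N} \tag{2.18}$$

and

$$\sigma_{X,j=N} = \sqrt{N\sigma_X^2 - \sum_{j=1}^{N-1}\left(x_j - \mu_X\right)^2 - \left(\sum_{j=1}^{N-1}\left(x_j - \mu_X\right)\right)^2} = \sqrt{\left(x_{j=N} - \mu_X\right)^2 - \left(\mu_X - x_{j=N}\right)^2} = 0 \tag{2.19}$$
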
implying that the last element to be chosen is necessarily a deterministic variable.

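Notice that for the first element:

$$X_{j=1} = \mu_{X,j=1}\Upsilon + \sigma_{X,j=1}\Xi_{X,j=1} \tag{2.20}$$

where (from Eq. 2.15 and 2.16 with $n = 1$):

$$\mu_{X,j=1} = \mu_X \tag{2.21}$$

$$\sigma_{X,j=1} = \sigma_X \tag{2.22}$$

Also notice that in general:

$$\Xi_{X,j=1} \neq \Xi_{X,j=2} \neq \Xi_{X,j=3} \neq \cdots \neq \Xi_{X,j=n} \neq \cdots \neq \Xi_{X,j=N} \tag{2.23}$$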
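The recursive expressions above can be checked numerically. The following Python sketch, under the assumption of a small hypothetical population and a helper named `remaining_mean_std` (both introduced here only for illustration), evaluates Eq. (2.15) and Eq. (2.16) after each draw and compares them with a direct calculation on the elements still left in the population.

```python
import math
import random

def remaining_mean_std(population, drawn):
    """Mean and standard deviation of the elements left after removing `drawn`,
    written in terms of the full-population mu and sigma (Eqs. 2.15 and 2.16)."""
    N = len(population)
    mu = sum(population) / N
    sigma2 = sum((x - mu) ** 2 for x in population) / N
    m = N - len(drawn)                       # number of elements still in the population
    dev = [x - mu for x in drawn]
    mean_rem = (N * mu - sum(drawn)) / m     # Eq. (2.15)
    var_rem = (N * sigma2 - sum(d * d for d in dev)) / m - (sum(dev) / m) ** 2  # Eq. (2.16)
    return mean_rem, math.sqrt(max(var_rem, 0.0))  # clamp tiny negative round-off

# Hypothetical population and a random drawing order (sampling without replacement)
population = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
rng = random.Random(0)
order = rng.sample(population, len(population))

for k in range(len(order)):
    mean_rem, std_rem = remaining_mean_std(population, order[:k])
    print(f"draw {k + 1}: mean = {mean_rem:.4f}, std = {std_rem:.4f}")
```

For the last draw the standard deviation vanishes (up to floating-point round-off), in agreement with Eq. (2.19).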

3. Mean Value Estimation Error

3.1. Expected Value, Variance and Covariance of Sample Elements

In this section, the mean value of a randomistic variable is denoted by the expected value
operator (𝐸), and the variance of the variable is denoted by the variance operator (𝑉𝑎𝑟).

The mean value of a population can be estimated from the average value of a random sample
of size 𝑛. The estimated mean value will then be:

𝜇̂ 𝑋 = 𝑋̅(𝑛)
(3.1)

This is equivalent to assuming that all unknown 𝑁 − 𝑛 elements have the same value 𝑋̅(𝑛).

The expected value of this estimator can be determined as follows (considering that the sample elements are now unknown and using Eq. 2.15):

$$E(\hat{\mu}_X) = E\left(\mu_{\bar{X}_{(n)}}\right) = E\left(\bar{X}_{(n)}\right) = E\left(\frac{1}{n}\sum_{j=1}^{n} X_{i(j)}\right) = \frac{1}{n}\sum_{j=1}^{n} E\left(X_{i(j)}\right) = \frac{1}{n}\sum_{j=1}^{n}\frac{N\mu_X - \sum_{k=1}^{j-1} E(X_k)}{N-j+1} \tag{3.2}$$

where $E(X_k)$ represents the expected value of the $k$-th element in the sample.

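For $k = 1$ (from Eq. 2.21):

$$E(X_{k=1}) = E(\mu_X) = \mu_X \tag{3.3}$$

For $k = 2$ (from Eq. 2.9):

$$E(X_{k=2}) = E\left(\frac{N}{N-1}\mu_X - \frac{X_{j=1}}{N-1}\right) = \frac{N}{N-1}\mu_X - \frac{E(X_{j=1})}{N-1} = \mu_X \tag{3.4}$$

For $k = 3$ (from Eq. 2.12):

$$E(X_{k=3}) = E\left(\frac{N}{N-2}\mu_X - \frac{\sum_{j=1}^{2} X_j}{N-2}\right) = \frac{N}{N-2}\mu_X - \frac{2\mu_X}{N-2} = \mu_X \tag{3.5}$$

and proceeding similarly up to $k = n$ (Eq. 2.15):

$$E(X_{k=n}) = E\left(\frac{N}{N-n+1}\mu_X - \frac{\sum_{j=1}^{n-1} X_j}{N-n+1}\right) = \frac{N}{N-n+1}\mu_X - \frac{(n-1)\mu_X}{N-n+1} = \mu_X \tag{3.6}$$

Thus, we may conclude that:

$$E(X_k) = \mu_X, \quad k = 1, 2, \ldots, n \tag{3.7}$$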
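In addition, it is also possible to obtain the expected value of the squared value of each element in the sample as follows (considering Eq. 2.14):

$$\begin{aligned}
E(X_k^2) &= E\left(\mu_{X,k}^2 + \sigma_{X,k}^2\right) \\
&= E\left(\left(\mu_X - \frac{\sum_{j=1}^{k-1}\left(X_j - \mu_X\right)}{N-k+1}\right)^2 + \frac{N}{N-k+1}\sigma_X^2 - \frac{1}{N-k+1}\sum_{j=1}^{k-1}\left(X_j - \mu_X\right)^2 - \frac{1}{(N-k+1)^2}\left(\sum_{j=1}^{k-1}\left(X_j - \mu_X\right)\right)^2\right) \\
&= \mu_X^2 + \frac{N}{N-k+1}\sigma_X^2 - \frac{1}{N-k+1}\sum_{j=1}^{k-1}\left(E(X_j^2) - \mu_X^2\right)
\end{aligned} \tag{3.8}$$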

Thus, for $k = 1$:

$$E(X_{k=1}^2) = \mu_X^2 + \sigma_X^2 \tag{3.9}$$

For $k = 2$:

$$E(X_{k=2}^2) = \mu_X^2 + \frac{N}{N-1}\sigma_X^2 - \frac{1}{N-1}\sigma_X^2 = \mu_X^2 + \sigma_X^2 \tag{3.10}$$

and in general:

$$E(X_k^2) = \mu_X^2 + \frac{N}{N-k+1}\sigma_X^2 - \frac{k-1}{N-k+1}\sigma_X^2 = \mu_X^2 + \sigma_X^2, \quad k = 1, 2, \ldots, n \tag{3.11}$$

Or, in other words,

$$\mathrm{Var}(X_k) = E(X_k^2) - E^2(X_k) = \sigma_X^2 \tag{3.12}$$

These results indicate that every unknown element in the sample has the same expected value and variance as the finite population.
Now, the covariance between two different elements in the sample is:

𝐶𝑜𝑣(𝑋𝑘 , 𝑋𝑗≠𝑘 ) = 𝐸(𝑋𝑘 𝑋𝑗≠𝑘 ) − 𝐸(𝑋𝑘 )𝐸(𝑋𝑗≠𝑘 )


(3.13)
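The expected value of the product of elements can be determined as follows:

$$\begin{aligned}
E(X_k X_{j\neq k}) &= \frac{\sum_{k=1}^{N}\sum_{j\neq k} X_k X_{j\neq k}}{N(N-1)} = \frac{\sum_{k=1}^{N} X_k\left(N\mu_X - X_k\right)}{N(N-1)} = \frac{\mu_X\sum_{k=1}^{N} X_k}{N-1} - \frac{\sum_{k=1}^{N} X_k^2}{N(N-1)} \\
&= \frac{N\mu_X^2}{N-1} - \frac{\mu_X^2 + \sigma_X^2}{N-1} = \mu_X^2 - \frac{\sigma_X^2}{N-1}
\end{aligned} \tag{3.14}$$

Notice that $\sum_{j\neq k} X_{j\neq k}$ was replaced by $N\mu_X - X_k$ according to Eq. (2.1), and $\sum_{k=1}^{N} X_k^2$ was replaced by $N(\mu_X^2 + \sigma_X^2)$ according to Eq. (3.11).

Then,

$$\mathrm{Cov}(X_k, X_{j\neq k}) = \mu_X^2 - \frac{\sigma_X^2}{N-1} - \mu_X^2 = -\frac{\sigma_X^2}{N-1} \tag{3.15}$$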
This result indicates that there is a negative (non-zero) covariance between the sample elements chosen from the finite population. The negative sign of the covariance indicates that elements with large values will necessarily be compensated by smaller values in the rest of the population. This is somewhat related to the natural tendency to return ("regress") towards the central value [5]. The value obtained in Eq. (3.15) differs from the usual assumption that $\mathrm{Cov}(X_k, X_{j\neq k}) = 0$, considered for infinite populations. However, notice that:

$$\lim_{N\to\infty} \mathrm{Cov}(X_k, X_{j\neq k}) = 0 \tag{3.16}$$

providing a consistent result.
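Eq. (3.15) can be verified by brute force on a small population. The sketch below, using hypothetical population values, enumerates all $N(N-1)$ equally likely ordered pairs of distinct elements, evaluates the expected product of Eq. (3.14), and compares the resulting covariance with $-\sigma_X^2/(N-1)$.

```python
from itertools import permutations

# Hypothetical finite population (illustrative values only)
population = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
N = len(population)
mu = sum(population) / N
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All N(N-1) ordered pairs of distinct positions (indices keep repeated values distinct)
pairs = list(permutations(range(N), 2))
e_product = sum(population[k] * population[j] for k, j in pairs) / len(pairs)  # Eq. (3.14)
cov = e_product - mu * mu                                                     # Eq. (3.13)

print(f"enumerated covariance = {cov:.6f}")
print(f"-sigma^2 / (N - 1)    = {-sigma2 / (N - 1):.6f}")
```

Enumerating positions rather than values ensures that repeated values in the population are still treated as different elements.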

3.2. Expected Value and Variance of Sample Averages

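Replacing Eq. (3.7) in Eq. (3.2) we obtain the expected value of sample averages:

$$E\left(\bar{X}_{(n)}\right) = \frac{1}{n}\sum_{j=1}^{n}\frac{N\mu_X - \sum_{k=1}^{j-1}\mu_X}{N-j+1} = \frac{1}{n}\sum_{j=1}^{n}\frac{N\mu_X - (j-1)\mu_X}{N-j+1} = \frac{1}{n}\sum_{j=1}^{n}\mu_X = \mu_X \tag{3.17}$$

On the other hand, the variance of sample averages will be (considering Eq. 3.12 and 3.15):

$$\begin{aligned}
\mathrm{Var}\left(\bar{X}_{(n)}\right) &= \mathrm{Var}\left(\frac{1}{n}\sum_{j=1}^{n} X_{i(j)}\right) = \frac{1}{n^2}\left(\sum_{j=1}^{n}\mathrm{Var}\left(X_{i(j)}\right) + 2\sum_{j=1}^{n-1}\sum_{k=j+1}^{n}\mathrm{Cov}\left(X_{i(j)}, X_{i(k)}\right)\right) \\
&= \frac{1}{n^2}\left(n\sigma_X^2 - 2\frac{\sigma_X^2}{N-1}\sum_{j=1}^{n-1}(n-j)\right) = \frac{1}{n}\left(1 - \frac{n-1}{N-1}\right)\sigma_X^2 = \frac{N-n}{n(N-1)}\sigma_X^2
\end{aligned} \tag{3.18}$$

where $\sum_{j=1}^{n-1}(n-j) = \sum_{j=1}^{n-1} n - \sum_{j=1}^{n-1} j = \frac{n(n-1)}{2}$.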

Notice that for 𝑁 → ∞, 𝑉𝑎𝑟(𝑋̅(𝑛) ) → 𝜎𝑋2 /𝑛, consistent with the results obtained for infinite
populations [1].
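Eq. (3.18) can also be checked exactly for a small population by enumerating every possible sample. The following sketch assumes a hypothetical population and a helper named `var_of_sample_mean` (introduced here for illustration); it computes the variance of the sample average over all $\binom{N}{n}$ equally likely samples and compares it with the closed-form expression.

```python
from itertools import combinations
from statistics import mean, pvariance

# Hypothetical finite population (illustrative values only)
population = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
N = len(population)
sigma2 = pvariance(population)

def var_of_sample_mean(n):
    """Exact variance of the sample average over all C(N, n) samples of size n."""
    averages = [mean(sample) for sample in combinations(population, n)]
    return pvariance(averages)

for n in range(1, N + 1):
    exact = var_of_sample_mean(n)
    formula = (N - n) / (n * (N - 1)) * sigma2   # Eq. (3.18)
    print(f"n = {n}: enumerated = {exact:.6f}, Eq. (3.18) = {formula:.6f}")
```

For $n = N$ only one sample exists, so the variance of the sample average is zero, as the formula predicts.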

3.3. Expected Value of Sample Variances

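Considering Eq. (2.2) and (2.5) and expanding the power, the variance of a sample becomes:

$$\begin{aligned}
S_{X(n)}^2 &= \frac{\sum_{j=1}^{n}\left(X_j - \frac{1}{n}\sum_{k=1}^{n} X_k\right)^2}{n-1} = \frac{\sum_{j=1}^{n}\left(\frac{n-1}{n}X_j - \frac{1}{n}\sum_{k\neq j}^{n} X_k\right)^2}{n-1} \\
&= \frac{1}{n-1}\left(\frac{n-1}{n}\right)^2\sum_{j=1}^{n} X_j^2 - \frac{2}{n^2}\sum_{j=1}^{n}\sum_{k\neq j}^{n} X_j X_k + \frac{1}{n-1}\frac{1}{n^2}\sum_{j=1}^{n}\sum_{k\neq j}^{n} X_k^2 \\
&\quad + \frac{1}{n-1}\frac{1}{n^2}\sum_{j=1}^{n}\sum_{k\neq j}^{n}\sum_{i\neq j,k}^{n} X_i X_k
\end{aligned} \tag{3.19}$$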

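Thus, the expected value of the sample variance is (considering Eq. 3.9 and 3.14):

$$\begin{aligned}
E\left(S_{X(n)}^2\right) &= \left(\frac{n-1}{n}\right)\left(\mu_X^2 + \sigma_X^2\right) - \frac{2(n-1)}{n}\left(\mu_X^2 - \frac{\sigma_X^2}{N-1}\right) + \frac{1}{n}\left(\mu_X^2 + \sigma_X^2\right) + \frac{n-2}{n}\left(\mu_X^2 - \frac{\sigma_X^2}{N-1}\right) \\
&= \left(\frac{n-1}{n} - \frac{2(n-1)}{n} + \frac{1}{n} + \frac{n-2}{n}\right)\mu_X^2 + \left(\frac{n-1}{n} + \frac{1}{n}\right)\sigma_X^2 + \left(\frac{2(n-1)}{n} - \frac{n-2}{n}\right)\frac{\sigma_X^2}{N-1} \\
&= \frac{N}{N-1}\sigma_X^2
\end{aligned} \tag{3.20}$$

indicating that the sample variance is a biased estimator of the population variance for finite populations.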

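A modified, unbiased estimator of the population variance would be:

$$S_{X(n,N)}^{*2} = \left(\frac{N-1}{N}\right)S_{X(n)}^2 = \frac{N-1}{N(n-1)}\sum_{j=1}^{n}\left(X_{i(j)} - \bar{X}_{(n)}\right)^2 \tag{3.21}$$

such that,

$$E\left(S_{X(n,N)}^{*2}\right) = \sigma_X^2 \tag{3.22}$$

Also notice that for $n = N$:

$$S_{X(n=N,N)}^{*2} = \frac{1}{N}\sum_{j=1}^{N}\left(X_{i(j)} - \bar{X}_{(n=N)}\right)^2 = \frac{1}{N}\sum_{j=1}^{N}\left(X_{i(j)} - \mu_X\right)^2 = \sigma_X^2 \tag{3.23}$$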
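The bias in Eq. (3.20) and its correction in Eq. (3.21) can be illustrated by averaging the sample variance over every possible sample. This is a minimal sketch with a hypothetical population and an arbitrarily chosen sample size $n = 3$.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical finite population and sample size (illustrative values only)
population = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
N, n = len(population), 3
sigma2 = pvariance(population)

# Average the Bessel-corrected sample variance over all C(N, n) samples
samples = list(combinations(population, n))
mean_s2 = mean(variance(s) for s in samples)   # E(S^2), compare with Eq. (3.20)
mean_s2_corrected = (N - 1) / N * mean_s2      # E(S*^2), compare with Eq. (3.22)

print(f"sigma^2           = {sigma2:.6f}")
print(f"E(S^2)            = {mean_s2:.6f}  (N/(N-1) sigma^2 = {N / (N - 1) * sigma2:.6f})")
print(f"E(S*^2) corrected = {mean_s2_corrected:.6f}")
```

The same check can be repeated for any other $n$ between 2 and $N$ by changing the assumed sample size.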

3.4. Estimation Error of the Finite Population Mean

The mean value of the population can only be accurately known by measuring all elements in
the population. For infinite populations this is never possible; however, for finite populations it
may be eventually achieved.
Nevertheless, if the whole population is not known, the mean value of the population can be
estimated from the average value of a sample of elements. In this case, it is assumed that the
sample is representative of the population, and therefore, the sample average corresponds to
the mean value of the population. Even by considering a random sample of elements, it is
virtually impossible that the sample average has the exact same value of the population mean,
and there will always be an estimation error.

The estimation error of the mean value (𝜀𝜇𝑋 ) can be defined as follows:

𝜀𝜇𝑋(𝑛) = 𝜇𝑋 − 𝜇̂ 𝑋 = 𝜇𝑋 − 𝑋̅(𝑛)
(3.24)
Notice that the expected value of the estimation error is (considering Eq. 3.17):

𝐸(𝜀𝜇𝑋(𝑛) ) = 𝐸(𝜇𝑋 − 𝜇̂ 𝑋 ) = 𝜇𝑋 − 𝐸(𝜇𝑋̅(𝑛) ) = 0


(3.25)
This result indicates that the sample average is an unbiased estimator of the population mean,
even when the population is finite.

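On the other hand, the variance of the estimation error will be (using Eq. 3.18):

$$\mathrm{Var}\left(\varepsilon_{\mu_X(n)}\right) = \mathrm{Var}\left(\mu_X - \bar{X}_{(n)}\right) = \mathrm{Var}\left(\bar{X}_{(n)}\right) = \frac{N-n}{n(N-1)}\sigma_X^2 \tag{3.26}$$

Thus, the standard error in the estimation of the mean value of a finite population is:

$$\sigma_{\varepsilon_{\mu_X(n)}} = \sqrt{\mathrm{Var}\left(\varepsilon_{\mu_X(n)}\right)} = \sqrt{\frac{N-n}{n(N-1)}}\,\sigma_X \tag{3.27}$$

Notice that for $n = N$, we obtain:

$$\sigma_{\varepsilon_{\mu_X(n=N)}} = 0 \tag{3.28}$$

indicating that the mean value of the population is deterministic when the whole population is used as a sample, which is a reasonable result.

In addition, when $N \to \infty$, then

$$\lim_{N\to\infty}\sigma_{\varepsilon_{\mu_X(n)}} = \frac{\sigma_X}{\sqrt{n}} \tag{3.29}$$
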
which is the result expected for infinite populations.
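The behavior described by Eq. (3.27) to Eq. (3.29) can be sketched in a few lines of Python. The function names and the values of $\sigma_X$, $n$ and $N$ below are assumptions made for illustration.

```python
import math

def standard_error_finite(sigma, n, N):
    """Standard error of the mean for a sample of size n from a finite population of size N (Eq. 3.27)."""
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= N")
    if N == 1:
        return 0.0  # a single-element population is fully known once sampled
    return math.sqrt((N - n) / (n * (N - 1))) * sigma

def standard_error_infinite(sigma, n):
    """Conventional standard error of the mean for an infinite population (Eq. 1.1)."""
    return sigma / math.sqrt(n)

sigma, n = 1.0, 25   # hypothetical values
for N in (25, 50, 100, 1000, 10**6):
    print(f"N = {N:>7}: finite = {standard_error_finite(sigma, n, N):.5f}")
print(f"infinite population: {standard_error_infinite(sigma, n):.5f}")
```

The finite-population value starts at zero for $n = N$ and grows towards the conventional value as $N$ increases, mirroring Eq. (3.28) and Eq. (3.29).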

Figure 1 to Figure 3 illustrate the effect of the population size ($N$) and the sample fraction ($n/N$) on the relative estimation error ($\sigma_{\varepsilon_{\mu_X(n)}}/\sigma_X$) of the mean value of a finite population.

Figure 1. Effect of sample size on the relative estimation error (in logarithmic scale) of the mean value for
finite populations, for different sample fractions.

Figure 2. Effect of sample fraction on the relative estimation error (in logarithmic scale) of the mean
value for finite populations, for different sample sizes

Figure 3. Surface plot illustrating the simultaneous effect of sample size and sample fraction on the
relative estimation error of the mean value for finite populations

4. Alternative Derivation using Randomistic Variables

4.1. Covariance between Elements of Finite Populations

Let us recall that the elements in the population can be represented by the randomistic
expression presented in Eq. (2.7).

Now, let us assume that two elements are randomly chosen from the population. Then, the
two elements (𝑋(𝑗) and 𝑋(𝑘) ) can be described by the following expressions:

𝑋(𝑗) = 𝜇𝑋 Υ + 𝜎𝑋 Ξ𝑋(𝑗)
(4.1)
𝑋(𝑘) = 𝜇𝑋 Υ + 𝜎𝑋 Ξ𝑋(𝑘)
(4.2)
While for infinite populations it is safe to assume that both variables are independent, this is not the case for finite populations: as soon as an element is chosen, it must be removed from the set of available elements in the population, changing the probability distribution of the following element. That means that their covariance is different from zero, since $\mathrm{Cov}(\Xi_{X(j)}, \Xi_{X(k)}) \neq 0$.

The covariance between two elements of the population can be determined as follows
(considering Eq. 4.1 and 4.2):

𝐶𝑜𝑣(𝑋(𝑗) , 𝑋(𝑘) ) = 𝐸(𝑋(𝑗) 𝑋(𝑘) ) − 𝐸(𝑋(𝑗) )𝐸(𝑋(𝑘) ) = 𝐸(𝑋(𝑗) 𝑋(𝑘) ) − 𝜇𝑋2


(4.3)
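If we consider the sum of the covariances of element $X_{(j)}$ with all other elements in the population we obtain:

$$\sum_{k\neq j}\mathrm{Cov}\left(X_{(j)}, X_{(k)}\right) = \mathrm{Cov}\left(X_{(j)}, \sum_{k\neq j} X_{(k)}\right) = E\left(X_{(j)}\sum_{k\neq j} X_{(k)}\right) - (N-1)\mu_X^2 \tag{4.4}$$

Now, since

$$\sum_{k\neq j} X_{(k)} = N\mu_X - X_{(j)} \tag{4.5}$$

we may rewrite Eq. (4.4) as follows:

$$\begin{aligned}
\sum_{k\neq j}\mathrm{Cov}\left(X_{(j)}, X_{(k)}\right) &= E\left(X_{(j)}\left(N\mu_X - X_{(j)}\right)\right) - (N-1)\mu_X^2 = N\mu_X E\left(X_{(j)}\right) - E\left(X_{(j)}^2\right) - (N-1)\mu_X^2 \\
&= N\mu_X^2 - \left(\mu_X^2 + \sigma_X^2\right) - (N-1)\mu_X^2 = -\sigma_X^2
\end{aligned} \tag{4.6}$$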
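and considering that all covariances are equivalent:

$$\sum_{k\neq j}\mathrm{Cov}\left(X_{(j)}, X_{(k)}\right) = (N-1)\,\mathrm{Cov}\left(X_{(j)}, X_{(k)}\right) \tag{4.7}$$

then, it can be concluded that:

$$\mathrm{Cov}\left(X_{(j)}, X_{(k)}\right) = -\frac{\sigma_X^2}{N-1} \tag{4.8}$$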
which is consistent with Eq. (3.15).

Combining Eq. (4.3) and (4.8) we obtain:

$$E\left(X_{(j)} X_{(k)}\right) = \mu_X^2 - \frac{\sigma_X^2}{N-1} \tag{4.9}$$

The term 𝐸(𝑋(𝑗) 𝑋(𝑘) ) can also be expressed using standard variables as follows:

𝐸(𝑋(𝑗) 𝑋(𝑘) ) = 𝐸 ((𝜇𝑋 Υ + 𝜎𝑋 Ξ𝑋(𝑗) )(𝜇𝑋 Υ + 𝜎𝑋 Ξ𝑋(𝑘) ))


= 𝐸(𝜇𝑋2 + 𝜇𝑋 𝜎𝑋 (Ξ𝑋(𝑗) + Ξ𝑋(𝑘) ) + 𝜎𝑋2 Ξ𝑋(𝑗) Ξ𝑋(𝑘) )
= 𝜇𝑋2 + 𝜇𝑋 𝜎𝑋 𝐸(Ξ𝑋(𝑗) + Ξ𝑋(𝑘) ) + 𝜎𝑋2 𝐸(Ξ𝑋(𝑗) Ξ𝑋(𝑘) )
(4.10)

Since, by definition, $E(\Xi_{X(j)}) = E(\Xi_{X(k)}) = 0$, and considering Eq. (4.9), Eq. (4.10) can be rearranged to obtain:

$$\mathrm{Cov}\left(\Xi_{X(j)}, \Xi_{X(k)}\right) = E\left(\Xi_{X(j)}\Xi_{X(k)}\right) = -\frac{1}{N-1} \tag{4.11}$$
indicating that there is a correlation between the standard random variables representing
different elements in the population.
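As a minimal worked check of Eq. (4.11), consider a hypothetical population with only $N = 2$ distinct values $x_1 < x_2$ (an assumption made purely for illustration). The standardized values are then $-1$ and $+1$, and drawing one element fixes the other:

```latex
\begin{align*}
\mu_X &= \tfrac{1}{2}(x_1 + x_2), \qquad \sigma_X = \tfrac{1}{2}(x_2 - x_1) \\
\Xi_{X(j)} &= \frac{X_{(j)} - \mu_X}{\sigma_X} \in \{-1, +1\}, \qquad \Xi_{X(k)} = -\Xi_{X(j)} \\
E\left(\Xi_{X(j)}\Xi_{X(k)}\right) &= -E\left(\Xi_{X(j)}^2\right) = -1 = -\frac{1}{N-1}
\end{align*}
```

In this extreme case the two standard random variables are perfectly anticorrelated, which is the value given by Eq. (4.11) for $N = 2$.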

4.2. Randomistic Representation of the Sample Average of Finite Populations

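Replacing Eq. (2.14) in Eq. (2.2), and considering a randomistic representation of the sample average, the following expression is obtained (for a finite population):

$$\bar{X}_{(n)} = \mu_{\bar{X}_{(n)}}\Upsilon + \sigma_{\bar{X}_{(n)}}\Xi_{\bar{X}_{(n)}} = \frac{1}{n}\sum_{j=1}^{n}\left(\mu_X\Upsilon + \sigma_X\Xi_{X,j}\right) = \mu_X\Upsilon + \frac{\sigma_X}{n}\sum_{j=1}^{n}\Xi_{X,j} \tag{4.12}$$

The corresponding expected value of the sample average will be:

$$\mu_{\bar{X}_{(n)}} = E\left(\bar{X}_{(n)}\right) = \mu_X + \frac{\sigma_X}{n}\sum_{j=1}^{n} E\left(\Xi_{X,j}\right) = \mu_X \tag{4.13}$$

since $E(\Xi_{X,j}) = 0$.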

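On the other hand, the variance of the sample average is (considering $\mathrm{Var}(\Xi_{X,j}) = 1$ and Eq. 4.11):

$$\begin{aligned}
\mathrm{Var}\left(\bar{X}_{(n)}\right) &= \frac{\sigma_X^2}{n^2}\sum_{j=1}^{n}\mathrm{Var}\left(\Xi_{X,j}\right) + 2\frac{\sigma_X^2}{n^2}\sum_{j=1}^{n-1}\sum_{k=j+1}^{n}\mathrm{Cov}\left(\Xi_{X,j}, \Xi_{X,k}\right) = \frac{\sigma_X^2}{n} - \frac{2}{N-1}\frac{\sigma_X^2}{n^2}\sum_{j=1}^{n-1}(n-j) \\
&= \frac{\sigma_X^2}{n}\left(1 - \frac{n-1}{N-1}\right) = \frac{\sigma_X^2}{n}\left(\frac{N-n}{N-1}\right)
\end{aligned} \tag{4.14}$$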
Thus, the standard deviation of the sample average is:

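$$\sigma_{\bar{X}_{(n)}} = \sqrt{\mathrm{Var}\left(\bar{X}_{(n)}\right)} = \sqrt{\frac{N-n}{n(N-1)}}\,\sigma_X \tag{4.15}$$

And the randomistic representation of the sample average for a finite population becomes:

$$\bar{X}_{(n)} = \mu_X\Upsilon + \sqrt{\frac{N-n}{n(N-1)}}\,\sigma_X\,\Xi_{\bar{X}_{(n)}} \tag{4.16}$$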

4.3. Randomistic Representation of the Error in the Estimation of the Mean of Finite
Populations

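Considering the estimation error defined in Eq. (3.24), and using the randomistic expression obtained in Eq. (4.16) results in:

$$\varepsilon_{\mu_X(n)} = \mu_X - \bar{X}_{(n)} = -\sqrt{\frac{N-n}{n(N-1)}}\,\sigma_X\,\Xi_{\bar{X}_{(n)}} \tag{4.17}$$

Then, since $\Xi_{\bar{X}_{(n)}}$ has zero mean and unit variance, it can be concluded that:

$$E\left(\varepsilon_{\mu_X(n)}\right) = 0 \tag{4.18}$$

$$\mathrm{Var}\left(\varepsilon_{\mu_X(n)}\right) = \frac{\sigma_X^2}{n}\left(\frac{N-n}{N-1}\right) \tag{4.19}$$

and the standard error in the estimation of the mean is:

$$\sigma_{\varepsilon_{\mu_X(n)}} = \sqrt{\frac{N-n}{n(N-1)}}\,\sigma_X \tag{4.20}$$

which is equivalent to Eq. (3.27).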

In the following Section, a simple but illustrative numerical example will be used to test the
validity of the standard error in the estimation of the mean for finite populations (Eq. 3.27 /
4.20).
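Before the worked example, the result can also be explored with a small Monte Carlo simulation. This sketch is not part of the original derivation: it assumes a hypothetical population of 50 normally distributed values, a sample size of 20, and a fixed random seed, and it contrasts sampling without replacement (finite-population behavior) with sampling with replacement (independent draws).

```python
import math
import random
from statistics import pstdev

rng = random.Random(12345)                                  # fixed seed, arbitrary choice
population = [rng.gauss(100.0, 15.0) for _ in range(50)]    # hypothetical population
N, n, trials = len(population), 20, 20000
mu = sum(population) / N
sigma = pstdev(population)

errors_without, errors_with = [], []
for _ in range(trials):
    s = rng.sample(population, n)        # without replacement: correlated elements
    errors_without.append(mu - sum(s) / n)
    s = rng.choices(population, k=n)     # with replacement: independent elements
    errors_with.append(mu - sum(s) / n)

se_without = pstdev(errors_without)
se_with = pstdev(errors_with)
se_finite = math.sqrt((N - n) / (n * (N - 1))) * sigma      # Eq. (3.27) / (4.20)
se_infinite = sigma / math.sqrt(n)                          # Eq. (1.1)

print(f"without replacement: simulated {se_without:.3f}, Eq. (4.20) {se_finite:.3f}")
print(f"with replacement:    simulated {se_with:.3f}, Eq. (1.1)  {se_infinite:.3f}")
```

Sampling with replacement is used here only as a stand-in for independent elements; the simulated values are expected to agree with the formulas up to Monte Carlo noise.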

5. Numerical Example

The owner of a farm currently has 10 cows. The weight of their cows, for a certain day, is
presented in Table 1.

Table 1. Weight of 10 cows in a farm

Cow ID    Weight (kg)
A         667
B         562
C         731
D         583
E         688
F         504
G         652
H         682
I         1080
J         990

Thus, the mean weight of the cows in the farm is:


𝜇𝑊 = 713.9 𝑘𝑔
(5.1)
with a population standard deviation of:
𝜎𝑊 = 173.94 𝑘𝑔
(5.2)
The day they were weighed, one of the cows was missing and therefore, only a sample of 9
cows was considered. Depending on the missing cow, the sample averages, corrected sample
variance and estimation error obtained are different, as can be seen in Table 2.

Table 2. Average weight of samples of 9 cows in a farm having 10 cows

Missing cow    Sample average (kg)    Corrected sample variance S*² (kg²)    Estimation error (kg)
A              719.11                 33761                                  -5.21
B              730.78                 31151                                  -16.88
C              712.00                 33999                                  1.90
D              728.44                 31894                                  -14.54
E              716.78                 33952                                  -2.88
F              737.22                 28528                                  -23.32
G              720.78                 33557                                  -6.88
H              717.44                 33908                                  -3.54
I              673.22                 17282                                  40.68
J              683.22                 24507                                  30.68

The average of sample averages for the weight of the cows in the farm using samples of 9
cows is:

𝐸(𝑋̅(𝑛=9) ) = 713.9 𝑘𝑔
(5.3)
confirming that the sample average is an unbiased estimator of the mean value.

The average corrected sample variance would be:

$$E\left(S_{W(n=9,N=10)}^{*2}\right) = 30253.9\ \mathrm{kg}^2 \tag{5.4}$$

which is an unbiased estimation of the population variance ($\sigma_W^2 = 30253.89\ \mathrm{kg}^2$, that is, $\sigma_W \approx 173.94\ \mathrm{kg}$). Notice that the average uncorrected sample variance obtained is $33615.4\ \mathrm{kg}^2$, clearly overestimating the population variance.

The average estimation error is:

𝐸(𝜀𝜇𝑋(𝑛) ) = 0 𝑘𝑔
(5.5)
which is consistent with Eq. (3.25) / (4.18).

And the standard deviation of the estimation error is:

$$\sigma_{\varepsilon_{\mu_W(n=9)}} = 19.33\ \mathrm{kg} \tag{5.6}$$

consistent with Eq. (3.27) / (4.20), for a finite population.

Notice that assuming an infinite population, the standard error in the estimation of the mean would be in this case (from Eq. 1.1): $\sigma_{\varepsilon_{\mu_W(n=9)}} = \sigma_W/\sqrt{9} = 57.98\ \mathrm{kg}$, which clearly overestimates the actual standard error observed since the population is finite.
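The numbers in this example can be reproduced with a short script. The sketch below uses the weights from Table 1; the variable names are chosen here for illustration, and the standard deviation of the estimation error is taken over the 10 equally likely samples of 9 cows.

```python
import math
from statistics import mean, pstdev, variance

# Weights (kg) from Table 1
weights = {"A": 667, "B": 562, "C": 731, "D": 583, "E": 688,
           "F": 504, "G": 652, "H": 682, "I": 1080, "J": 990}
values = list(weights.values())
N, n = len(values), len(values) - 1

mu = mean(values)        # Eq. (5.1)
sigma = pstdev(values)   # Eq. (5.2), population standard deviation

rows = {}
for missing in weights:
    sample = [w for cow, w in weights.items() if cow != missing]
    avg = mean(sample)
    s2_corrected = (N - 1) / N * variance(sample)   # Eq. (3.21)
    rows[missing] = (avg, s2_corrected, mu - avg)   # estimation error as in Eq. (3.24)

errors = [row[2] for row in rows.values()]
se_observed = pstdev(errors)                              # spread over the 10 possible samples
se_finite = math.sqrt((N - n) / (n * (N - 1))) * sigma    # Eq. (3.27)
se_infinite = sigma / math.sqrt(n)                        # Eq. (1.1)

for cow, (avg, s2c, err) in rows.items():
    print(f"{cow}  {avg:8.2f}  {s2c:8.0f}  {err:7.2f}")
print(f"observed {se_observed:.2f}, finite {se_finite:.2f}, infinite {se_infinite:.2f}")
```

Each printed row corresponds to one row of Table 2, and the last line compares the observed standard error with the finite and infinite population expressions.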

6. Conclusion

In a finite population (of size $N$), the distribution of available elements changes as soon as one element is sampled. This means that the elements in the sample are not truly independent from each other. For this reason, the error involved in the estimation of the mean from the sample average (with sample size $n$) does not strictly follow the conventional expression:

$$\sigma_{\varepsilon_{\mu_X(n)}} = \sigma_{\bar{X}_{(n)}} = \frac{\sigma_X}{\sqrt{n}} \tag{1.1}$$

valid only for independent elements, as is the case for infinite populations.

The covariance between an element in a sample and any other element of the population, for a finite population, is:

$$\mathrm{Cov}(X_k, X_{j\neq k}) = -\frac{\sigma_X^2}{N-1} \tag{3.15}$$

And therefore, the standard error in the estimation of the mean becomes:

$$\sigma_{\varepsilon_{\mu_X(n)}} = \sqrt{\frac{N-n}{n(N-1)}}\,\sigma_X \tag{4.20}$$

Of course, for $N \to \infty$ we obtain:

$$\lim_{N\to\infty}\mathrm{Cov}(X_k, X_{j\neq k}) = 0 \tag{3.16}$$

and

$$\lim_{N\to\infty}\sigma_{\varepsilon_{\mu_X(n)}} = \frac{\sigma_X}{\sqrt{n}} \tag{3.29}$$
Notice that the estimation of the mean value in finite populations is also unbiased, as in the case of infinite populations.
Now, since the population variance or standard deviation may be unknown, an unbiased estimation from the sample variance would be as follows (considering Eq. 2.6):

$$\sigma_X^2 \approx S_{X(n,N)}^{*2} \tag{6.1}$$

where

$$S_{X(n,N)}^{*2} = \left(\frac{N-1}{N}\right)S_{X(n)}^2 = \frac{N-1}{N(n-1)}\sum_{j=1}^{n}\left(X_{i(j)} - \bar{X}_{(n)}\right)^2 \tag{3.21}$$

and therefore,

$$\sigma_{\varepsilon_{\mu_X(n)}} \approx \sqrt{\frac{N-n}{n(N-1)}}\,S_{X(n,N)}^{*} = \sqrt{\frac{N-n}{nN}}\,S_{X(n)} \tag{6.2}$$

Consider that while $S_{X(n,N)}^{*2}$ is unbiased, the standard deviation $S_{X(n,N)}^{*}$ will be biased [3].
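In practice, Eq. (6.2) can be applied directly to a sample when the population size is known. The following sketch assumes a helper named `estimated_standard_error` (introduced here for illustration) and, as an example, treats cows A to E from Table 1 as a sample of $n = 5$ from the population of $N = 10$.

```python
import math
from statistics import stdev

def estimated_standard_error(sample, N):
    """Estimate the standard error of the mean from a sample of a finite population of size N (Eq. 6.2)."""
    n = len(sample)
    if n < 2 or n > N:
        raise ValueError("need 2 <= n <= N")
    s = stdev(sample)                          # S_X(n), Bessel-corrected (Eq. 2.5)
    return math.sqrt((N - n) / (n * N)) * s    # right-hand form of Eq. (6.2)

sample = [667, 562, 731, 583, 688]   # cows A to E from Table 1, treated as a sample
N = 10
se_hat = estimated_standard_error(sample, N)
print(f"estimated standard error: {se_hat:.2f} kg")
```

The function uses the second form of Eq. (6.2), which only needs the conventional sample standard deviation; as noted above, the resulting estimate is itself subject to the bias of the sample standard deviation.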

Acknowledgment and Disclaimer

This report provides data, information and conclusions obtained by the author(s) as a result of original
scientific research, based on the best scientific knowledge available to the author(s). The main purpose
of this publication is the open sharing of scientific knowledge. Any mistake, omission, error or inaccuracy
published, if any, is completely unintentional.

This research did not receive any specific grant from funding agencies in the public, commercial, or not-
for-profit sectors.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0). Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:
• Attribution: Appropriate credit must be given, providing a link to the license, and indicating if
changes were made. This can be done in any reasonable manner, but not in any way that
suggests endorsement by the licensor.
• NonCommercial: This material may not be used for commercial purposes.

References

[1] Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences. 9th Edition.
Cengage Learning. Section 5.4. pp. 230-238. ISBN: 9781305251809.
[2] Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences. 9th Edition.
Cengage Learning. Section 1.4. pp. 36-47. ISBN: 9781305251809.
[3] Hernandez, H. (2023). Probability Distribution and Bias of the Sample Standard Deviation.
ForsChem Research Reports, 8, 2023-02, 1 - 26. doi: 10.13140/RG.2.2.22144.51205.
[4] Hernandez, H. (2022). Standard Deterministic, Standard Random, and Randomistic Variables.
ForsChem Research Reports, 7, 2022-06, 1 - 18. doi: 10.13140/RG.2.2.36316.87688.
[5] Galton, F. (1890). Kinship and correlation. The North American Review, 150 (401), 419-431.
https://www.jstor.org/stable/25101964.
