

AMSM- Estimation (Point and Interval)- Chapter Four

Chapter · October 2018


DOI: 10.31219/osf.io/fc9zh


Mohammed Dahman
Kadir Has University


Summary Papers- Applied Multivariate Statistical Modeling- Estimation (Point and Interval) -Chapter Four
Author: Mohammed R. Dahman. Ph.D. in MIS ED.10.18
License: CC-By Attribution 4.0 International
Citation: Dahman, M. R. (2018, October 24). AMSM- Estimation (Point and Interval)- Chapter Four. https://doi.org/10.31219/osf.io/fc9zh

1. Preface
In statistics, estimation is a data analysis framework that uses a combination of effect sizes, confidence
intervals, precision planning, and meta-analysis to plan experiments, analyze data, and interpret results. This
paper gives a thorough explanation of point and interval estimation, presents four steps for constructing an
interval estimate, and extends the discussion to the case of more than one population.

2. Introduction
Let’s first discuss the term estimation. It is a branch of statistics and signal processing that determines
the values of parameters from measured and observed empirical data. The goal of estimation is to
approximate the true value of a parameter of a function or of a population. It is done on the basis of
observations in samples, which are subsets of the target population or function. Several statistics are used
for this task, and understanding the concept is essential for progressing further in data analysis. To simplify
the definition: suppose I want to design an experiment to understand the behavior of a population. I would
draw sample (1), then perhaps sample (2), and so on up to sample (n). Why draw these samples? As
discussed in previous chapters, I want to estimate the values of the population parameters using the
statistics of these samples. Note the word “estimate”: it is the focal point of this summary paper.

3. Estimation in Statistics
Estimation statistics is a data analysis framework that uses a combination of effect sizes, confidence intervals,
precision planning, and meta-analysis to plan experiments, analyze data, and interpret results. It is distinct
from null hypothesis significance testing (NHST), which we will cover in upcoming chapters and which is
considered to be less informative. Estimation statistics, or simply estimation, is also known as the new
statistics, a distinction introduced in psychology, medical research, the life sciences, and a wide range
of other experimental sciences where NHST remains prevalent, despite estimation statistics having been
recommended as preferable for several decades ("Research that Matters", 2002; Cohen, 1994).

4. Map of Estimation
The map below outlines estimation in statistics. We are going to discuss each branch in detail.

Estimation
    Point
        Mean
        Variance
    Interval
        Single population
            CI for mean
            CI for variance
        Two populations
            CI for mean difference
            CI for ratio of variances

Figure 1: Map of estimation

Remember that statisticians use sample statistics to estimate population parameters. For example, sample
means are used to estimate population means, and sample proportions are used to estimate population proportions.

1. Point estimate: A point estimate of a population parameter is a single value of a statistic. For example,
the sample mean $\bar{x}$ is a point estimate of the population mean $\mu$. Similarly, the sample proportion $p$ is a
point estimate of the population proportion $P$, and the sample variance $s^2$ is a point estimate of the
population variance $\sigma^2$.
2. Interval estimate: An interval estimate is defined by two numbers between which a population
parameter is said to lie. The general picture of an interval estimate is $P(L \le \theta \le U) = 1 - \alpha$, where $L$ is the
lower bound, $U$ is the upper bound, and $\theta$ is the population parameter. Let’s now discuss the
interval estimate for a single population and for two populations.
a. Single population: Assume that I have a single population and draw samples S1, S2, ..., Sn from it.
I can then calculate statistics from each sample, for example the sample mean $\bar{x}_i$ and the sample
variance $s_i^2$ of sample $S_i$. If I collect these statistics into vectors, as sketched below, the values in
each vector follow a certain distribution (the sampling distribution of the statistic). Please refer to
chapter three (Dahman, 2018a).

$$\bar{X} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)^T, \qquad S^2 = (s_1^2, s_2^2, \ldots, s_n^2)^T$$

[Figure: a population from which samples S1, S2, ..., Sn are drawn; each sample $S_i$ yields a mean $\bar{x}_i$ and a variance $s_i^2$.]
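The idea of drawing several samples and collecting their statistics into vectors can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is synthetic (10,000 normally distributed values with mean 50 and standard deviation 10), and the helper name `sample_statistics`, the number of samples, and the sample size are hypothetical choices, not values from the text.

```python
import random
import statistics

def sample_statistics(population, n_samples, sample_size, seed=0):
    """Draw repeated samples and return the vectors of sample means and variances."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means, variances = [], []
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)   # draw without replacement
        means.append(statistics.mean(sample))
        variances.append(statistics.variance(sample))  # divides by (sample_size - 1)
    return means, variances

# Hypothetical population: 10,000 values from a normal distribution (mean 50, sd 10).
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(10_000)]

means, variances = sample_statistics(population, n_samples=200, sample_size=40)
print("mean of sample means:    ", round(statistics.mean(means), 2))
print("mean of sample variances:", round(statistics.mean(variances), 2))
```

The sample means cluster around the population mean and the sample variances around the population variance, which is what makes them usable as point estimates.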

The figure below helps to decide which type of distribution to use.

Population    σ          n >= 30           n < 30
Normal        known      Z distribution    Z distribution
Normal        unknown    Z distribution    t distribution
Non-normal    known      Z distribution    No
Non-normal    unknown    Z distribution    No

Figure 2: Type of distribution according to population
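The decision rules of Figure 2 can also be written as a small lookup function. The sketch below is one possible encoding; the function name `choose_distribution` and its arguments are hypothetical, and `None` stands for the "No" entries of the figure.

```python
from typing import Optional

def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> Optional[str]:
    """Return "Z", "t", or None following the rules summarized in Figure 2."""
    if n >= 30:
        return "Z"                          # large samples: Z in every branch
    if population_normal:
        return "Z" if sigma_known else "t"  # small sample from a normal population
    return None                             # small sample, non-normal population: "No"

print(choose_distribution(population_normal=True, sigma_known=False, n=12))
```

Encoding the figure this way makes the single branch that needs the t distribution easy to see: a normal population, unknown σ, and fewer than 30 observations.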

For a single population, whether we are estimating the mean or the variance, we first have to understand
what a confidence interval is before we learn how to calculate it.


• Confidence Interval: Statisticians use a confidence interval to express the precision and
uncertainty associated with a particular sampling method. A confidence interval consists of three
parts:
1) A confidence level
The probability part of a confidence interval is called a confidence level. The confidence level
describes the likelihood that a particular sampling method will produce a confidence interval
that includes the true population parameter.
Here is how to interpret a confidence level. Suppose we collected all possible samples from a
given population, and computed confidence intervals for each sample. Some confidence
intervals would include the true population parameter; others would not. A 95% confidence
level means that 95% of the intervals contain the true population parameter; a 90% confidence
level means that 90% of the intervals contain the population parameter; and so on. See the
figure below.
2) A statistic: any sample statistic, such as the mean, proportion, or variance.
3) A margin of error
In a confidence interval, the range of values above and below the sample statistic is called the
margin of error.
For example, suppose the local newspaper conducts an election survey and reports that the
independent candidate will receive 30% of the vote. The newspaper states that the survey had
a 5% margin of error and a confidence level of 95%. These findings result in the following
confidence interval: We are 95% confident that the independent candidate will receive
between 25% and 35% of the vote.
Note: Many public opinion surveys report interval estimates, but not confidence intervals. They
provide the margin of error, but not the confidence level. To clearly interpret survey results you
need to know both! We are much more likely to accept survey findings if the confidence level
is high (say, 95%) than if it is low (say, 50%).
The confidence level describes the uncertainty of a sampling method. The statistic and the
margin of error define an interval estimate that describes the precision of the method. The
interval estimate of a confidence interval is defined by the sample statistic +/- margin of error.
For example, suppose we compute an interval estimate of a population parameter. We might
describe this interval estimate as a 95% confidence interval. This means that if we used the
same sampling method to select different samples and compute different interval estimates,
the true population parameter would fall within a range defined by the sample statistic +/-
margin of error 95% of the time (see the figure below). Confidence intervals are preferred to
point estimates, because confidence intervals indicate (a) the precision of the estimate and (b)
the uncertainty of the estimate.


[Figure: confidence intervals computed from n samples, drawn against the population mean.]

Assume I collected n samples. Consider $\bar{x}_4$, the statistic from sample 4. The interval built around it
does not contain the population mean, because $\bar{x}_4$ lies to the right of the upper bound. The intervals
built from the other samples do contain the mean. So I have (n-1) samples whose intervals contain the
population mean, and one sample (sample 4) whose interval does not.
A 95% confidence level says that an interval constructed this way has a 95% chance of containing the
population mean and a 5% chance (α) of missing it. Here, the (n-1) intervals correspond to the 95%, and
the interval from sample 4 falls in the remaining 5%.
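The repeated-sampling interpretation can be checked with a short simulation. The sketch below assumes a normal population with known σ and uses hypothetical values (μ = 50, σ = 10, samples of size 40, 2,000 repetitions); the function name `coverage` is also an illustrative choice.

```python
import random
from statistics import NormalDist, mean

def coverage(mu, sigma, n, confidence, n_intervals, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% level
    margin = z * sigma / n ** 0.5            # margin of error, same for every sample
    hits = 0
    for _ in range(n_intervals):
        x_bar = mean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / n_intervals

rate = coverage(mu=50, sigma=10, n=40, confidence=0.95, n_intervals=2000)
print(f"share of intervals containing mu: {rate:.3f}")
```

The printed share should land close to 0.95: most intervals contain μ, and a small fraction, like the one from sample 4, miss it.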

• Calculate Confidence Interval: the procedure for obtaining the confidence interval for either the mean
or the variance depends on the type of distribution, as shown in figure 2.
1) Obtain the CI for the population mean: the scenario here depends on whether $\sigma$ is known or
unknown, as well as on the sample size.
▪ Use Z distribution: the steps are straightforward:
1. Collect a sample of size $n$: $x = (x_1, x_2, \ldots, x_n)^T$.
2. Compute the sample mean and standard deviation: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.
3. Choose alpha and obtain the lower and upper values of Z: $P\left(L \le \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \le U\right) = 1 - \alpha$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, $L = -z_{\alpha/2}$, and $U = z_{\alpha/2}$.
4. Develop the interval: $\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$.

If you are interested in how the formula in step 4 is derived, you may refer
to (Johnson & Wichern, 2007).

Example: you have collected a sample of size $n = 76$ and computed $\bar{x} = 7$ and $s = 4$; the
population standard deviation is given as $\sigma = 3$. Construct a 95% confidence interval for $\mu$.

A 95% interval means $100(1 - \alpha) = 95$; solving for alpha gives $\alpha = 0.05$ and $\alpha/2 = 0.025$.
Applying the formula from step (4) gives

$$7 - z_{0.025}\,\frac{3}{\sqrt{76}} \le \mu \le 7 + z_{0.025}\,\frac{3}{\sqrt{76}}$$

Find the value of $z_{0.025}$ from the table:

$$7 - 1.96\,\frac{3}{\sqrt{76}} \le \mu \le 7 + 1.96\,\frac{3}{\sqrt{76}}$$


The result is $6.33 \le \mu \le 7.67$. This shows the difference between a point estimate and a
confidence interval. With a point estimate we simply say $\mu \approx 7$; with an interval we say
that $\mu$ falls between two values (6.33 and 7.67) at the stated confidence level.
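The four steps for the known-σ case can be reproduced with the Python standard library. This is a minimal sketch: the function name `z_interval` is hypothetical, and `NormalDist().inv_cdf` takes the place of the Z table lookup. Rounded to two decimals, the lower bound comes out as 6.33.

```python
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known (or n is large)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / n ** 0.5            # z_{alpha/2} * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Numbers from the worked example: n = 76, x_bar = 7, sigma = 3, 95% level.
lower, upper = z_interval(x_bar=7, sigma=3, n=76, confidence=0.95)
print(f"{lower:.2f} <= mu <= {upper:.2f}")  # prints 6.33 <= mu <= 7.67
```

Passing a different `confidence` value shows how the interval widens as the confidence level rises.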

Example extended: what happens if the population standard deviation $\sigma$ is unknown? As
illustrated in figure 2, we can still use the Z distribution as long as the sample size
is at least 30. What changes is that I use the sample statistic $s$ in place of the
parameter $\sigma$. So you can follow exactly the same calculation and simply replace the value of $\sigma$ with $s = 4$.
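Carrying the substitution through for the same sample ($n = 76$, $\bar{x} = 7$, $s = 4$, 95% level) gives the following worked calculation, with values rounded to two decimals:

```latex
\[
7 - 1.96\,\frac{4}{\sqrt{76}} \le \mu \le 7 + 1.96\,\frac{4}{\sqrt{76}}
\quad\Longrightarrow\quad
7 - 0.90 \le \mu \le 7 + 0.90
\quad\Longrightarrow\quad
6.10 \le \mu \le 7.90
\]
```

The interval is wider than in the known-σ case only because $s = 4$ is larger than $\sigma = 3$ in this example.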

▪ Use t distribution: consider the same example. If the sample size is less than 30 (and $\sigma$ is unknown),
I cannot use the Z distribution; instead, I use the t distribution with $n - 1$ degrees of freedom:
$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$. We can use the same four steps as for the Z distribution. The only
change is in the formula:

$$\bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
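As an illustration of the t-based interval, suppose (hypothetically, these numbers are not from the text) that the sample had only $n = 16$ observations with the same $\bar{x} = 7$ and $s = 4$. The table value for 15 degrees of freedom at the 95% level is $t_{0.025,\,15} = 2.131$, so:

```latex
\[
7 - 2.131\,\frac{4}{\sqrt{16}} \le \mu \le 7 + 2.131\,\frac{4}{\sqrt{16}}
\quad\Longrightarrow\quad
7 - 2.131 \le \mu \le 7 + 2.131
\quad\Longrightarrow\quad
4.87 \le \mu \le 9.13
\]
```

The critical value 2.131 is larger than 1.96, which reflects the extra uncertainty from estimating σ with a small sample.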

2) Obtain the CI for the population variance: I follow the same technique to obtain the CI for the
population variance; the only change is the distribution. Recall from chapter three (Dahman, 2018a)
that the scaled sample variance $(n-1)s^2/\sigma^2$ follows the chi-square distribution with $n - 1$
degrees of freedom. Thus, following exactly the same steps, I can write the formula as:

$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$$

Note that the numerators in the formula above are the same; the difference is in the denominators.
The chi-square value depends on the degrees of freedom (n-1), and alpha is split between the two
tails, which gives a lower critical value (L) and an upper critical value (U). Here the subscript denotes
the area to the right of the critical value, so $\chi^2_{1-\alpha/2,\,n-1}$ is the lower value L and
$\chi^2_{\alpha/2,\,n-1}$ is the upper value U. Because L is less than U, dividing by U gives the smaller
bound: $\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}$ is always less than $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$.

[Figure: chi-square distribution with the lower (L) and upper (U) critical values marked.]


Example: you have collected a sample of size $n = 30$ and computed $s^2 = 25$. Construct a 90%
confidence interval for $\sigma^2$. You have all the information you need. First determine alpha:
a 90% CI gives $\alpha = 1 - 0.90 = 0.10$, so $\alpha/2 = 0.05$. Now apply the formula above:

$$\frac{(29)(25)}{\chi^2_{0.05,\,29}} \le \sigma^2 \le \frac{(29)(25)}{\chi^2_{0.95,\,29}}$$

With $\chi^2_{0.05,\,29} = 42.557$ and $\chi^2_{0.95,\,29} = 17.708$ from the table, the result is
$17.04 \le \sigma^2 \le 40.94$, or equivalently $4.13 \le \sigma \le 6.40$.
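The variance interval can be checked numerically. The sketch below is deliberately limited to the standard library, so the two chi-square critical values for 29 degrees of freedom (42.557 and 17.708) are entered by hand from a printed table; the function name `variance_interval` and its parameter names are hypothetical.

```python
import math

def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2 from upper-tail chi-square critical values.

    chi2_upper: critical value with area alpha/2 to its right (the larger one).
    chi2_lower: critical value with area 1 - alpha/2 to its right (the smaller one).
    """
    numerator = (n - 1) * s2
    return numerator / chi2_upper, numerator / chi2_lower

# Table values for 29 degrees of freedom, assumed to be looked up by hand.
CHI2_0_05_29 = 42.557
CHI2_0_95_29 = 17.708

var_low, var_high = variance_interval(
    s2=25, n=30, chi2_upper=CHI2_0_05_29, chi2_lower=CHI2_0_95_29
)
print(f"{var_low:.2f} <= sigma^2 <= {var_high:.2f}")
print(f"{math.sqrt(var_low):.2f} <= sigma <= {math.sqrt(var_high):.2f}")
```

Dividing by the larger critical value produces the lower bound, which is why the two table values appear in the opposite order from what one might first expect.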

b. Two populations: we have understood the concept of a CI for a single population. What about
two populations? Recall the picture of the interval estimate, $P(L \le \theta \le U) = 1 - \alpha$, which
describes the single-population case. For two populations the picture is the same with a minor
change: $P(L \le \theta_1 - \theta_2 \le U) = 1 - \alpha$, where $\theta_1$ is the parameter of the first
population and $\theta_2$ is the parameter of the second population.
• Confidence interval for two populations: the definition is the same as for a single population,
introduced in the section above. Statisticians use a confidence interval for two populations to
express the precision and uncertainty associated with samples from two populations under a particular
sampling method. A confidence interval consists of three parts: the confidence level, the statistic,
and the margin of error.
• Calculate the confidence interval for two populations: the procedure for two population samples
follows the same map as illustrated in figure 2.
1) Obtain the CI for the difference of two population means: the scenario here depends on whether
$\sigma$ is known or unknown, as well as on the sample size.
▪ Use Z distribution: the four steps are as straightforward as in the single-population case. The
final formula is given below; for the mathematical derivation, see (Johnson & Wichern, 2007).

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

This formula is applicable under some conditions, one of which you already know. The first
condition is that $\sigma_1$ and $\sigma_2$ are known and not equal. The second is that each sample
size is at least 30.
Consider the first condition. If you know the two parameters ($\sigma_1$ and $\sigma_2$), that is fine.
But what if the two population variances are equal and their common value $\sigma^2$ has to be
estimated from the samples? In this case we use an extension of the formula based on the pooled
variance $S^2_{pooled}$:

$$S^2_{pooled} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

This quantity replaces $\sigma^2$ in the formula above. The new arrangement is:

$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $S_{pooled} = \sqrt{S^2_{pooled}}$. So now you see which formula to use when the two population
variances are equal. One more note: when $n_1 = n_2 = n$, the pooled variance reduces to
$S^2_{pooled} = \frac{S_1^2 + S_2^2}{2}$.
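The pooled variance and the resulting large-sample interval for $\mu_1 - \mu_2$ can be sketched as follows. The summary statistics are hypothetical (two samples of 50 observations each), the function names are illustrative, and the margin uses the square root of the pooled variance, since the margin of error must be in the units of the mean.

```python
from statistics import NormalDist

def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of the two sample variances (weights: degrees of freedom)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_z_interval(x1_bar, x2_bar, s1_sq, n1, s2_sq, n2, confidence=0.95):
    """Large-sample CI for mu1 - mu2 assuming equal population variances."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s_pooled = pooled_variance(s1_sq, n1, s2_sq, n2) ** 0.5  # standard deviation
    margin = z * s_pooled * (1 / n1 + 1 / n2) ** 0.5
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin

# Hypothetical summary statistics for two samples of equal size.
sp2 = pooled_variance(s1_sq=16.0, n1=50, s2_sq=20.0, n2=50)
low, high = pooled_z_interval(x1_bar=12.0, x2_bar=10.0,
                              s1_sq=16.0, n1=50, s2_sq=20.0, n2=50)
print("pooled variance:", sp2)
print(f"{low:.2f} <= mu1 - mu2 <= {high:.2f}")
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, here (16 + 20) / 2 = 18, which matches the note on the case $n_1 = n_2 = n$.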


▪ t distribution: now consider the second condition. If the sample sizes are less than 30, the
answer is simple: we use the t distribution. Replace Z with t, with degrees of freedom
$(n_1 + n_2) - 2$. The formula is:

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

where $S_{pooled}$ is the square root of the pooled variance $S^2_{pooled}$.
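A small worked example may help here. The numbers are hypothetical and chosen so that the arithmetic is easy to follow: $n_1 = n_2 = 10$, $\bar{x}_1 = 12$, $\bar{x}_2 = 10$, $s_1^2 = 4$, $s_2^2 = 6$, at the 95% level. The degrees of freedom are $10 + 10 - 2 = 18$, and the table value is $t_{0.025,\,18} = 2.101$.

```latex
\[
S^2_{pooled} = \frac{(9)(4) + (9)(6)}{18} = 5,
\qquad
S_{pooled}\sqrt{\frac{1}{10} + \frac{1}{10}} = \sqrt{5}\,\sqrt{0.2} = 1
\]
\[
(12 - 10) - 2.101\,(1) \le \mu_1 - \mu_2 \le (12 - 10) + 2.101\,(1)
\quad\Longrightarrow\quad
-0.10 \le \mu_1 - \mu_2 \le 4.10
\]
```

Because this interval contains zero, these hypothetical samples would not rule out equal population means at the 95% level.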

2) Obtain the CI for the ratio of two population variances: the technique is the same as for the CI
for the difference of two population means. The only change in the picture is
$P\left(L \le \frac{\sigma_1^2}{\sigma_2^2} \le U\right) = 1 - \alpha$. The same four steps are followed:
(1) collect the samples, (2) compute the statistic (in this case the variance), (3) decide the alpha
value, and finally (4) construct the interval. Recall from chapter three (Dahman, 2018a) that if
population one, with variance $\sigma_1^2$, yields a sample of size $n_1$ with sample variance $s_1^2$,
then $\frac{(n_1-1)s_1^2}{\sigma_1^2}$ follows the chi-square distribution $\chi^2$ with $n_1 - 1$ degrees of
freedom. The same holds for population two. The ratio of the two variances therefore has chi-square
quantities in both the numerator and the denominator, and the ratio of two independent chi-square
quantities, each divided by its degrees of freedom, follows the F distribution with two degrees of freedom:

$$F_{n_1-1,\,n_2-1} = \frac{\dfrac{(n_1-1)s_1^2}{\sigma_1^2}\Big/(n_1-1)}{\dfrac{(n_2-1)s_2^2}{\sigma_2^2}\Big/(n_2-1)} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$$

Finally, you can construct the interval as below. Note the alpha values $(\alpha/2)$ and $(1 - \alpha/2)$:
as in the single-population case, the subscript denotes the area to the right of the critical value.

$$\frac{s_1^2/s_2^2}{F_{\alpha/2;\,n_1-1,\,n_2-1}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2/s_2^2}{F_{1-\alpha/2;\,n_1-1,\,n_2-1}}$$
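Printed F tables often list only upper-tail critical values for small tail areas such as 0.05 or 0.025. Taking $F_{p;\,\nu_1,\,\nu_2}$ to denote the value with area $p$ to its right (the same convention as the chi-square values in the single-population example), the lower-tail value can be obtained from the reciprocal identity below, which lets the upper bound be written with a table value as well:

```latex
\[
F_{1-\alpha/2;\,\nu_1,\,\nu_2} = \frac{1}{F_{\alpha/2;\,\nu_2,\,\nu_1}}
\]
\[
\frac{s_1^2/s_2^2}{F_{\alpha/2;\,n_1-1,\,n_2-1}}
\;\le\; \frac{\sigma_1^2}{\sigma_2^2} \;\le\;
\frac{s_1^2}{s_2^2}\,F_{\alpha/2;\,n_2-1,\,n_1-1}
\]
```

Note that the degrees of freedom are swapped in the factor on the right-hand side.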


• "Research that Matters". (2002). Effect Size FAQs | Research that matters, results that make sense. Retrieved October 23, 2018, from https://effectsizefaq.com/
• Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. Retrieved from http://www.iro.umontreal.ca/~dift3913/cours/papers/cohen1994_The_earth_is_round.pdf
• Dahman, M. R. (2018a). AMSM- Sampling Distribution- Chapter Three. OSF Preprints. https://doi.org/10.31219/OSF.IO/H5AUC
• Johnson, R. A., & Wichern, D. W. (2007). Applied Multivariate Statistical Analysis (6th ed.). Pearson Prentice Hall.
