
Interval Estimation:

Let 𝜃 be a population parameter. Suppose there exist two statistics 𝐴 = 𝛼(𝑋1 , 𝑋2 , … , 𝑋𝑛 ) and 𝐵 =
𝛽(𝑋1 , 𝑋2 , … , 𝑋𝑛 ) such that 𝑃(𝐴 < 𝜃 < 𝐵) = 1 − 𝜖, 0 < 𝜖 < 1. Then the interval (𝐴, 𝐵) is called
a (1 − 𝜖)100% confidence interval for the parameter 𝜃, or a confidence interval for 𝜃
with confidence coefficient 𝟏 − 𝝐. The end points of the interval are called the lower and upper
confidence limits.
Let the population random variable 𝑋 be defined on the event space 𝑆 of a random experiment
𝐸. Consider a random sample (𝑋1 , 𝑋2 , … , 𝑋𝑛 ) of size 𝑛 drawn from the population of 𝑋. Let (𝑎, 𝑏) be
the confidence interval for 𝜃 with confidence coefficient 1 − 𝜖, where 𝑎 = 𝛼(𝑥1 , 𝑥2 , … , 𝑥𝑛 ), 𝑏 =
𝛽(𝑥1 , 𝑥2 , … , 𝑥𝑛 ) and (𝑥1 , 𝑥2 , … , 𝑥𝑛 ) is an observed value of (𝑋1 , 𝑋2 , … , 𝑋𝑛 ). Suppose random samples
of fixed size 𝑛 are drawn from the given population 𝑁 times under uniform conditions and (𝑎, 𝑏) is
computed for each such sample. Let (𝑎1 , 𝑏1 ), (𝑎2 , 𝑏2 ), … , (𝑎𝑁 , 𝑏𝑁 ) be the computed intervals. If 𝑁 is
large, then 𝜃 will belong to approximately (1 − 𝜖)𝑁 of the computed intervals.
If 𝜖 = 0.05 and (𝑎, 𝑏) is a confidence interval for 𝜃 with confidence coefficient 0.95, then the frequency
interpretation says that (𝑎, 𝑏) includes 𝜃 approximately 950 times if random samples of size 𝑛
are drawn under uniform conditions 1000 times from the given population.
Generally, 𝜖 is chosen to be small, such as 0.05, 0.01 or 0.001. The corresponding
confidence coefficients are 0.95, 0.99 and 0.999, and the corresponding intervals are called
95%, 99% and 99.9% confidence intervals.
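The frequency interpretation can be illustrated by simulation. The sketch below assumes a normal population with known standard deviation and uses the known-variance interval for the mean; the function name, the parameter values and the seed are illustrative choices, not part of the notes.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def coverage_fraction(m, sigma, n, eps, n_repeats, seed=0):
    """Fraction of n_repeats intervals (xbar -/+ sigma*u/sqrt(n)) that contain m."""
    rng = random.Random(seed)
    u = NormalDist().inv_cdf(1 - eps / 2)  # u such that P(U > u) = eps / 2
    half_width = sigma * u / sqrt(n)
    hits = 0
    for _ in range(n_repeats):
        sample = [rng.gauss(m, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - half_width < m < xbar + half_width:
            hits += 1
    return hits / n_repeats


if __name__ == "__main__":
    # With eps = 0.05 the printed fraction should be close to 0.95.
    print(coverage_fraction(m=10.0, sigma=2.0, n=25, eps=0.05, n_repeats=2000))
```

The fraction varies from run to run when the seed changes; only for large numbers of repetitions does it settle near 1 − 𝜖.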
Method for finding confidence intervals:
1. Choose a suitable statistic 𝑍 = 𝑔(𝑋1 , 𝑋2 , … , 𝑋𝑛 , 𝜃) whose sampling distribution is
independent of 𝜃 but which itself depends on 𝜃.
2. Take two numbers 𝛼𝜖 , 𝛽𝜖 (> 𝛼𝜖 ) such that 𝑃(𝛼𝜖 < 𝑍 < 𝛽𝜖 ) = 1 − 𝜖. (1)
3. Rewrite the equation (1) as 𝑃(𝐴 < 𝜃 < 𝐵) = 1 − 𝜖. 𝐴 and 𝐵 are in fact functions of
𝑋1 , 𝑋2 , … , 𝑋𝑛 . Then (𝐴, 𝐵) is the desired (1 − 𝜖)100% confidence interval for 𝜃.
𝑵(𝒎, 𝝈) Population:
Confidence interval for m: Two cases may arise.
Case I: 𝜎 is known.
We choose the statistic $U = \dfrac{\bar{X} - m}{\sigma/\sqrt{n}}$, whose sampling distribution is $N(0,1)$, $\bar{X}$ being the mean of a
random sample of size $n$ from a normal population.
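Take two points $\pm u_\epsilon$ symmetrically placed about the origin such that
\[ P(-u_\epsilon < U < u_\epsilon) = 1 - \epsilon. \]
\[ \therefore\ P\left(-u_\epsilon < \frac{\bar{X} - m}{\sigma/\sqrt{n}} < u_\epsilon\right) = 1 - \epsilon \]
\[ \therefore\ P\left(\bar{X} - \frac{\sigma u_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{\sigma u_\epsilon}{\sqrt{n}}\right) = 1 - \epsilon \]
Therefore the confidence interval for the mean $m$ with confidence coefficient $1 - \epsilon$ is $\left(\bar{X} - \frac{\sigma u_\epsilon}{\sqrt{n}},\ \bar{X} + \frac{\sigma u_\epsilon}{\sqrt{n}}\right)$,
where $u_\epsilon$ is given by
\[ P(-u_\epsilon < U < u_\epsilon) = 1 - \epsilon. \quad (2) \]
From the symmetry of the $N(0,1)$ distribution, (2) gives $P(U > u_\epsilon) = \frac{\epsilon}{2}$.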

Case II: 𝜎 is unknown.

Consider the statistic $T = \dfrac{\bar{X} - m}{s/\sqrt{n}}$, whose sampling distribution is the $t$-distribution with $(n - 1)$ degrees
of freedom, where $s^2 = \frac{n}{n-1} S^2$, $S^2$ being the variance of a random sample of size $n$ drawn from a
normal population.
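Take two points $\pm t_\epsilon$ symmetrically placed about the origin such that
\[ P(-t_\epsilon < T < t_\epsilon) = 1 - \epsilon. \]
\[ \therefore\ P\left(-t_\epsilon < \frac{\bar{X} - m}{s/\sqrt{n}} < t_\epsilon\right) = 1 - \epsilon \]
\[ \therefore\ P\left(\bar{X} - \frac{s t_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{s t_\epsilon}{\sqrt{n}}\right) = 1 - \epsilon. \]
Hence the confidence interval for $m$ with confidence coefficient $1 - \epsilon$ is $\left(\bar{X} - \frac{s t_\epsilon}{\sqrt{n}},\ \bar{X} + \frac{s t_\epsilon}{\sqrt{n}}\right)$, where $t_\epsilon$ is
given by
\[ P(-t_\epsilon < T < t_\epsilon) = 1 - \epsilon. \quad (3) \]
Due to the symmetry of the $t$-distribution, (3) gives $P(T > t_\epsilon) = \frac{\epsilon}{2}$.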

Confidence interval for σ:

Here we take the statistic $\chi^2 = \dfrac{nS^2}{\sigma^2}$, whose sampling distribution is a $\chi^2$-distribution with $(n - 1)$
degrees of freedom, where $S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$ is the variance of a random sample of size $n$ drawn
from a normal population.
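We choose a positive number $\chi^2_{\epsilon_1}$ and determine $\chi^2_{\epsilon_2}$ $(> \chi^2_{\epsilon_1})$ such that
\[ P\left(\chi^2_{\epsilon_1} < \chi^2 < \chi^2_{\epsilon_2}\right) = 1 - \epsilon \]
\[ \therefore\ P\left(\chi^2_{\epsilon_1} < \frac{nS^2}{\sigma^2} < \chi^2_{\epsilon_2}\right) = 1 - \epsilon \]
\[ \therefore\ P\left(S\sqrt{\frac{n}{\chi^2_{\epsilon_2}}} < \sigma < S\sqrt{\frac{n}{\chi^2_{\epsilon_1}}}\right) = 1 - \epsilon \]
Therefore the confidence interval for $\sigma$ with confidence coefficient $1 - \epsilon$ is $\left(S\sqrt{\frac{n}{\chi^2_{\epsilon_2}}},\ S\sqrt{\frac{n}{\chi^2_{\epsilon_1}}}\right)$.

Here $\chi^2_{\epsilon_1}$ and $\chi^2_{\epsilon_2}$ are determined by $P(0 < \chi^2 < \chi^2_{\epsilon_1}) = \frac{\epsilon}{2}$,
or $P(\chi^2 > \chi^2_{\epsilon_1}) = 1 - \frac{\epsilon}{2}$, and $P(\chi^2 > \chi^2_{\epsilon_2}) = \frac{\epsilon}{2}$.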

Binomial (𝑵, 𝒑) population:


Let $X$ be the population random variable.
By the De Moivre-Laplace theorem, for a fixed $p$ $(0 < p < 1)$ we know that $\dfrac{X - Np}{\sqrt{Np(1-p)}}$ is an
asymptotically $N(0,1)$ variate as $N \to \infty$. Let a random sample of unit size be drawn from the
population of $X$.
We consider the statistic $U = \dfrac{X - Np}{\sqrt{Np(1-p)}}$. The sampling distribution of $U$ is approximately $N(0,1)$
for large $N$. Due to the symmetry of the normal distribution, it is possible to
find two points $\pm u_\epsilon$ such that
\[ P(-u_\epsilon < U < u_\epsilon) \cong \frac{1}{\sqrt{2\pi}} \int_{-u_\epsilon}^{u_\epsilon} e^{-u^2/2}\,du = 1 - \epsilon \quad (0 < \epsilon < 1). \]
Now
\[ P(-u_\epsilon < U < u_\epsilon) = P(U^2 < u_\epsilon^2) = P\left[\frac{(X - Np)^2}{Np(1-p)} < u_\epsilon^2\right] = P\left[(X - Np)^2 < Np(1-p)u_\epsilon^2\right]. \]

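So,
\[ P(-u_\epsilon < U < u_\epsilon) = P\left[(N^2 + N u_\epsilon^2)p^2 - p(2NX + N u_\epsilon^2) + X^2 < 0\right]. \quad (4) \]
We now consider the following quadratic equation in $p$:
\[ (N^2 + N u_\epsilon^2)p^2 - p(2NX + N u_\epsilon^2) + X^2 = 0. \quad (5) \]
Its roots are
\[ p = \frac{2NX + N u_\epsilon^2 \pm \sqrt{(2NX + N u_\epsilon^2)^2 - 4(N^2 + N u_\epsilon^2)X^2}}{2(N^2 + N u_\epsilon^2)} = \frac{2NX + N u_\epsilon^2 \pm \sqrt{4N^2 X u_\epsilon^2 + N^2 u_\epsilon^4 - 4N X^2 u_\epsilon^2}}{2N(N + u_\epsilon^2)}. \]
Again,
\[ \frac{\sqrt{4N^2 X u_\epsilon^2 + N^2 u_\epsilon^4 - 4N X^2 u_\epsilon^2}}{2N(N + u_\epsilon^2)} = \frac{\sqrt{4NX(N - X)u_\epsilon^2 + N^2 u_\epsilon^4}}{2N(N + u_\epsilon^2)} = u_\epsilon \frac{\sqrt{NX(N - X)}}{N(N + u_\epsilon^2)} \left[1 + \frac{N u_\epsilon^2}{4X(N - X)}\right]^{1/2}. \]
For large $N$, $\frac{X}{N} \cong p$ and therefore $\frac{N u_\epsilon^2}{4X(N - X)} \cong \frac{N u_\epsilon^2}{4Np(N - Np)}$, which is of the order $\frac{1}{N}$.
\[ \therefore\ u_\epsilon \frac{\sqrt{NX(N - X)}}{N(N + u_\epsilon^2)} \left[1 + \frac{N u_\epsilon^2}{4X(N - X)}\right]^{1/2} = u_\epsilon \frac{\sqrt{X(N - X)}}{\sqrt{N}\,N} \left(1 + \frac{u_\epsilon^2}{N}\right)^{-1} \left[1 + \frac{N u_\epsilon^2}{4X(N - X)}\right]^{1/2} \]
\[ = u_\epsilon \sqrt{\frac{X(N - X)}{N^3}} \left(1 - \frac{u_\epsilon^2}{N} + \frac{u_\epsilon^4}{N^2} - \cdots\right) \left[1 + \frac{N u_\epsilon^2}{8X(N - X)} + \cdots\right] \]
\[ \cong u_\epsilon \sqrt{\frac{X(N - X)}{N^3}} \cong u_\epsilon \sqrt{\frac{Np(N - Np)}{N^3}} = u_\epsilon \sqrt{\frac{p(1 - p)}{N}} \quad \left(\text{retaining terms up to the order of } \tfrac{1}{\sqrt{N}}\right). \]
Also,
\[ \frac{2NX + N u_\epsilon^2}{2N(N + u_\epsilon^2)} = \frac{2X + u_\epsilon^2}{2(N + u_\epsilon^2)} = \frac{2X + u_\epsilon^2}{2N}\left(1 + \frac{u_\epsilon^2}{N}\right)^{-1} = \left(\frac{X}{N} + \frac{u_\epsilon^2}{2N}\right)\left(1 - \frac{u_\epsilon^2}{N} + \cdots\right) \]
\[ \cong \frac{X}{N} \quad \left(\text{retaining terms up to the order of } \tfrac{1}{\sqrt{N}}\right). \]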

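Therefore, the roots of the quadratic equation (5) are approximately $\frac{X}{N} \pm u_\epsilon \sqrt{\frac{X(N - X)}{N^3}}$.

\[ \therefore\ P\left[(N^2 + N u_\epsilon^2)(p - \alpha)(p - \beta) < 0\right] = P\left[(p - \alpha)(p - \beta) < 0\right], \]
where $\alpha, \beta$ are the random variables corresponding to the roots $\frac{X}{N} \pm u_\epsilon \sqrt{\frac{X(N - X)}{N^3}}$. The inequality $(p - \alpha)(p - \beta) < 0$ holds exactly when $p$ lies between $\alpha$ and $\beta$.

The confidence interval for $p$ with confidence coefficient $1 - \epsilon$ is therefore $\left(\frac{X}{N} - u_\epsilon \sqrt{\frac{X(N - X)}{N^3}},\ \frac{X}{N} + u_\epsilon \sqrt{\frac{X(N - X)}{N^3}}\right)$, where $u_\epsilon$
is given by
\[ P(U > u_\epsilon) \cong \frac{1}{\sqrt{2\pi}} \int_{u_\epsilon}^{\infty} e^{-u^2/2}\,du = \frac{\epsilon}{2}. \]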

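• Find the 95% confidence interval for the mean of an $N(m, \sigma)$ population using the following
data: $\bar{x} = 48$, $\sigma = 9$, $n = 36$. [Given $\frac{1}{\sqrt{2\pi}} \int_{1.96}^{\infty} e^{-t^2/2}\,dt = 0.025$]

We take the statistic $U = \dfrac{\bar{X} - m}{\sigma/\sqrt{n}}$; the sampling distribution of $U$ is the $N(0,1)$ distribution. Choose two
real numbers $\pm u_\epsilon$ such that $P(-u_\epsilon < U < u_\epsilon) = 1 - \epsilon$.
Here $\epsilon = 0.05$.
\[ \therefore\ P\left(\bar{X} - \frac{\sigma u_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{\sigma u_\epsilon}{\sqrt{n}}\right) = 0.95, \]
\[ \text{or } P\left(48 - \frac{3}{2} u_\epsilon < m < 48 + \frac{3}{2} u_\epsilon\right) = 0.95. \]
$u_\epsilon$ is obtained from $P(U > u_\epsilon) = \frac{\epsilon}{2} = 0.025$. $\therefore\ u_\epsilon = 1.96$.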
The 95% confidence interval is (45.06, 50.94).
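
The arithmetic of this example can be checked with a short script. This is a minimal sketch: the function name is hypothetical, and the quantile is taken from Python's `statistics.NormalDist` rather than from the table value quoted in the problem.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_known_sigma(xbar, sigma, n, eps):
    """(1 - eps)100% interval for the mean of N(m, sigma) when sigma is known."""
    u = NormalDist().inv_cdf(1 - eps / 2)  # P(U > u) = eps / 2
    half_width = sigma * u / sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    low, high = mean_interval_known_sigma(xbar=48, sigma=9, n=36, eps=0.05)
    print(round(low, 2), round(high, 2))
```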

• The marks obtained by 18 candidates in an examination have a mean 56 and variance
65. Find the 95% confidence limits for the mean of the population of marks, assuming
it to be normal.
[For 17 degrees of freedom $P(|t| > 2.11) = 0.05$]

Consider the statistic $T = \dfrac{\bar{X} - m}{s/\sqrt{n}}$; the sampling distribution of $T$ is the $t$-distribution with $(n - 1)$
degrees of freedom. Here $s^2 = \frac{n}{n-1} S^2 = \frac{18}{17} \times 65 = 68.82$.
Take two numbers $\pm t_\epsilon$ such that $P(-t_\epsilon < T < t_\epsilon) = 1 - \epsilon$.
Here $s = 8.296$, $\epsilon = 0.05$.
\[ \therefore\ P\left(\bar{X} - \frac{s t_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{s t_\epsilon}{\sqrt{n}}\right) = 0.95, \]
\[ P\left(56 - \frac{8.296}{\sqrt{18}} t_\epsilon < m < 56 + \frac{8.296}{\sqrt{18}} t_\epsilon\right) = 0.95. \]
$t_\epsilon$ is obtained from $P(|T| > t_\epsilon) = 1 - (1 - \epsilon) = \epsilon = 0.05$. $\therefore\ t_\epsilon = 2.11$.
The confidence limits are 51.874 and 60.126.
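
The same computation can be sketched in code. The function name is hypothetical, and the critical value $t_\epsilon$ is passed in from the table given in the problem, since the Python standard library does not provide $t$-distribution quantiles.

```python
from math import sqrt


def mean_limits_unknown_sigma(xbar, sample_var, n, t_eps):
    """Confidence limits for the mean when sigma is unknown.

    sample_var is S^2 (divisor n); it is converted to s^2 = n / (n - 1) * S^2.
    t_eps is the tabulated value with P(|T| > t_eps) = eps for n - 1 d.f.
    """
    s2 = n / (n - 1) * sample_var
    half_width = sqrt(s2 / n) * t_eps
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    low, high = mean_limits_unknown_sigma(xbar=56, sample_var=65, n=18, t_eps=2.11)
    print(round(low, 3), round(high, 3))
```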

• In a random sample, 𝟏𝟑𝟔 of 𝟒𝟎𝟎 persons given a flu vaccine experienced some
discomfort. Construct a 𝟗𝟓% confidence interval for the true proportion of persons who
will experience some discomfort from the vaccine.

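Here $N = 400$, $x = 136$, $\epsilon = 0.05$.

The $(1 - \epsilon)100\%$ confidence interval is $\left(\frac{x}{N} - u_\epsilon \sqrt{\frac{x(N - x)}{N^3}},\ \frac{x}{N} + u_\epsilon \sqrt{\frac{x(N - x)}{N^3}}\right)$, where $u_\epsilon$ is given by
\[ P(U > u_\epsilon) \cong \frac{1}{\sqrt{2\pi}} \int_{u_\epsilon}^{\infty} e^{-u^2/2}\,du = \frac{\epsilon}{2}. \]
Here $u_\epsilon = 1.96$. $\therefore$ the confidence interval is
\[ \left(\frac{136}{400} - 1.96\sqrt{\frac{136(400 - 136)}{400^3}},\ \frac{136}{400} + 1.96\sqrt{\frac{136(400 - 136)}{400^3}}\right) = (0.29,\ 0.39). \]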
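
As a check on this example, the sketch below computes the approximate limits and, for comparison, the exact roots of the quadratic equation (5) from which the approximation was derived. Function names are hypothetical; $u_\epsilon = 1.96$ is the value used in the text.

```python
from math import sqrt


def approx_limits(x, n_trials, u):
    """Approximate limits x/N -/+ u * sqrt(x (N - x) / N^3)."""
    centre = x / n_trials
    half_width = u * sqrt(x * (n_trials - x) / n_trials ** 3)
    return centre - half_width, centre + half_width


def exact_roots(x, n_trials, u):
    """Roots of (N^2 + N u^2) p^2 - (2 N x + N u^2) p + x^2 = 0."""
    a = n_trials ** 2 + n_trials * u ** 2
    b = -(2 * n_trials * x + n_trials * u ** 2)
    c = x ** 2
    root_disc = sqrt(b * b - 4 * a * c)
    return (-b - root_disc) / (2 * a), (-b + root_disc) / (2 * a)


if __name__ == "__main__":
    print(approx_limits(136, 400, 1.96))
    print(exact_roots(136, 400, 1.96))
```

For $N = 400$ the two pairs of limits differ only in the third decimal place, which is consistent with dropping terms of order $1/N$ in the derivation.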

• A sample of size 8 from a population yields 4.4 as the unbiased estimate of the population
variance. Obtain the 99% confidence limits for the population variance
$\sigma^2$. (Given $\chi^2_{0.995} = 0.99$ and $\chi^2_{0.005} = 20.3$ for 7 degrees of freedom)

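Here $n = 8$, $s^2 = 4.4$, $\epsilon = 0.01$. $\therefore\ nS^2 = (n - 1)s^2 = 7 \times 4.4 = 30.8$.

The $(1 - \epsilon)100\%$ confidence interval for $\sigma^2$ is obtained from
\[ P\left(\frac{nS^2}{\chi^2_{\epsilon_2}} < \sigma^2 < \frac{nS^2}{\chi^2_{\epsilon_1}}\right) = 1 - \epsilon, \]
where $\chi^2_{\epsilon_1}$ and $\chi^2_{\epsilon_2}$ are determined by $P(0 < \chi^2 < \chi^2_{\epsilon_1}) = \frac{\epsilon}{2}$, or $P(\chi^2 > \chi^2_{\epsilon_1}) = 1 - \frac{\epsilon}{2}$, and
$P(\chi^2 > \chi^2_{\epsilon_2}) = \frac{\epsilon}{2}$.
According to the problem, the confidence limits are $\frac{30.8}{20.3}$ and $\frac{30.8}{0.99}$, that is, 1.52 and 31.1.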
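
A minimal sketch of this calculation, assuming the two $\chi^2$ critical values are read from a table as in the problem (the function name is hypothetical):

```python
def variance_limits(unbiased_var, n, chi2_lower, chi2_upper):
    """Confidence limits for sigma^2 from the unbiased estimate s^2.

    chi2_lower and chi2_upper are the tabulated values with
    P(chi2 > chi2_lower) = 1 - eps/2 and P(chi2 > chi2_upper) = eps/2
    for n - 1 degrees of freedom.
    """
    n_s2 = (n - 1) * unbiased_var  # n S^2 = (n - 1) s^2
    return n_s2 / chi2_upper, n_s2 / chi2_lower


if __name__ == "__main__":
    low, high = variance_limits(unbiased_var=4.4, n=8, chi2_lower=0.99, chi2_upper=20.3)
    print(round(low, 2), round(high, 1))
```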

Bivariate Samples:
Let 𝑋, 𝑌 be the random variables defined over the sample space 𝑆 of a random experiment 𝐸. Then
(𝑋, 𝑌) is a two dimensional random variable which is a mapping from 𝑆 to 𝑅 × 𝑅; 𝑅 being the set of
all real numbers.

Let 𝐸 be performed once and let the outcome be 𝜔(∈ 𝑆). If (𝑋, 𝑌)(𝜔) = (𝑋(𝜔), 𝑌(𝜔)) = (𝑥, 𝑦)
where 𝑥 ∈ 𝑅 and 𝑦 ∈ 𝑅, then (𝑥, 𝑦) is called an observed value of (𝑋, 𝑌).
If 𝐸 be repeated under identical conditions a finite number of times, say 𝑛 times, then a set of
observed values (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ) is obtained. This set is called a random bivariate
sample of size 𝑛.
Scatter diagram:
Let (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ) be a bivariate sample of size 𝑛 drawn from a given bivariate
population. We may regard these pairs of values as 𝑛 points in the 𝑥𝑦-plane, taking the variables
𝑋 and 𝑌 along the 𝑥-axis and 𝑦-axis respectively of a rectangular Cartesian coordinate system.
Plotting all these points, we get a set of points in the 𝑥𝑦-plane. This diagrammatic representation of
bivariate data is known as a scatter diagram.

Sample Characteristics:
Let {(𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 )} be a bivariate sample of size 𝑛 drawn from a given bivariate
population.
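Sample means:
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i. \]
Sample variances:
\[ S_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad S_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2. \]
Sample moments:
\[ a_{kl} = \frac{1}{n}\sum_{i=1}^{n} x_i^{l}\, y_i^{k} \quad (l, k \text{ are non-negative integers}). \]
Sample central moments:
\[ m_{kl} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^{l} (y_i - \bar{y})^{k}. \]
Sample covariance:
\[ m_{11} = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}). \]
Correlation Coefficient:
\[ r = \frac{m_{11}}{S_x S_y}. \]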
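
The definitions above translate directly into code. The following sketch uses the divisor $n$ throughout, as in the notes; the function name and the dictionary keys are illustrative.

```python
from math import sqrt


def sample_characteristics(points):
    """Means, variances (divisor n), covariance m11 and correlation r."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sx2 = sum((x - xbar) ** 2 for x, _ in points) / n
    sy2 = sum((y - ybar) ** 2 for _, y in points) / n
    m11 = sum((x - xbar) * (y - ybar) for x, y in points) / n
    r = m11 / sqrt(sx2 * sy2)  # undefined if either variance is zero
    return {"xbar": xbar, "ybar": ybar, "Sx2": sx2, "Sy2": sy2, "m11": m11, "r": r}


if __name__ == "__main__":
    print(sample_characteristics([(1, 2), (2, 4), (3, 6)]))
```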

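• Prove that if $r$ is the sample correlation coefficient of a bivariate sample
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, then $-1 \le r \le 1$.

\[ r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{Var}(x)\operatorname{Var}(y)}} = \frac{m_{11}}{S_x S_y} = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}} \]
Now,
\[ \left(\frac{x_i - \bar{x}}{S_x} \pm \frac{y_i - \bar{y}}{S_y}\right)^2 \ge 0; \quad i = 1, 2, \ldots, n, \]
\[ \text{or, } \left(\frac{x_i - \bar{x}}{S_x}\right)^2 + \left(\frac{y_i - \bar{y}}{S_y}\right)^2 \pm 2\left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right) \ge 0, \]
\[ \text{or, } \sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{S_x}\right)^2 + \sum_{i=1}^{n}\left(\frac{y_i - \bar{y}}{S_y}\right)^2 \pm 2\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{S_x}\right)\left(\frac{y_i - \bar{y}}{S_y}\right) \ge 0, \]
\[ \text{or, } \frac{1}{S_x^2}\, n S_x^2 + \frac{1}{S_y^2}\, n S_y^2 \pm 2nr \ge 0, \]
\[ \text{or, } 2 \pm 2r \ge 0, \]
\[ \text{or, } -1 \le r \le 1. \]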

• Correlation coefficient is independent of the change of origin and of the change of
scales.

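Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a bivariate sample of size $n$.

Define $u_i = \frac{x_i - a}{b}$, $v_i = \frac{y_i - c}{d}$, $i = 1, 2, \ldots, n$, where $a, b, c, d$ are constants with $b \ne 0$, $d \ne 0$.
Then $\bar{u} = \frac{\bar{x} - a}{b}$, $\bar{v} = \frac{\bar{y} - c}{d}$, $S_u = \frac{1}{|b|} S_x$, $S_v = \frac{1}{|d|} S_y$, and
\[ u_i - \bar{u} = \frac{1}{b}(x_i - \bar{x}), \qquad v_i - \bar{v} = \frac{1}{d}(y_i - \bar{y}). \]
\[ r_{uv} = \frac{\frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})(v_i - \bar{v})}{S_u S_v} = \frac{\frac{1}{bd}\cdot\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{|b|} S_x \cdot \frac{1}{|d|} S_y} = \frac{|b||d|}{bd}\, r_{xy}. \]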
∴ 𝑟𝑥𝑦 = 𝑟𝑢𝑣 or 𝑟𝑥𝑦 = −𝑟𝑢𝑣 according as 𝑏 and 𝑑 are of same sign or of opposite signs
respectively.
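
This result can be checked numerically. The sketch below uses made-up data and hypothetical constants $a, b, c, d$; it compares $r$ before and after the change of origin and scale, once with $b$ and $d$ of the same sign and once with opposite signs.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation coefficient r = m11 / (Sx * Sy), divisor n."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    m11 = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - ybar) ** 2 for y in ys) / n)
    return m11 / (sx * sy)


def transform(values, origin, scale):
    """Change of origin and scale: (value - origin) / scale, scale != 0."""
    return [(v - origin) / scale for v in values]


if __name__ == "__main__":
    xs, ys = [1, 2, 4, 7], [3, 1, 5, 6]       # illustrative data
    print(correlation(xs, ys))
    print(correlation(transform(xs, 5, 2), transform(ys, 1, 3)))    # same sign
    print(correlation(transform(xs, 5, 2), transform(ys, 1, -3)))   # opposite signs
```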
Curve fitting:
Let (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ) be a random sample of size 𝑛 from a bivariate population.
Plot these points (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ) on a rectangular coordinate system to obtain a scatter
diagram. From the scatter diagram it is often possible to visualize a smooth curve approximating the
data. The method of curve fitting helps us to obtain a close functional relation between the two
variables 𝑥 and 𝑦.
Regression:
One of the main purposes of curve fitting is to estimate one of the variables (the dependent variable)
from the other (the independent variable). The process of estimation is often referred to as regression.
If 𝑦 is estimated from a function of 𝑥, the corresponding curve is called a regression curve of 𝑦 on 𝑥.
Similarly, regression curve of 𝑥 on 𝑦 is obtained.
The method of Least Squares:
This method says that the best representative curve is that for which the sum of the squares of the
vertical distances of a series of plotted points from the curve is a minimum. Since, the squares of the
vertical distances are positive quantities, the requirement that their sum shall be as small as possible
ensures that the numerical values of the vertical distances will be small; and that means the best
representative curve will pass as closely as possible to all the points.
Let the plotted points be (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ).
For a given value of 𝑥, say 𝑥1 , there will be a difference between the value 𝑦1 and the corresponding
value as determined by the curve 𝐶. We denote this difference by 𝑑1 ; it is sometimes referred to
as a deviation, error or residual, and may be positive, negative or zero.
Similarly, corresponding to the values 𝑥2 , … , 𝑥𝑛 , we obtain the deviations 𝑑2 , … , 𝑑𝑛 respectively. Of all
curves in a given family of curves approximating a set of 𝑛 data points, a curve having the property
that $d_1^2 + \cdots + d_n^2$ is a minimum is called a best-fitting curve in the family.
A curve having this property is said to fit the data in the least-squares sense and is called a
least-square regression curve or simply a least-square curve.
The Least-Square line:
Let the least square line approximating the set of points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be of the form
\[ y = a + bx, \quad (1) \]
where $a$ and $b$ are constants to be determined so that $S = \frac{1}{n}\sum_{i=1}^{n} (y_i - a - bx_i)^2$ is a minimum.

For $S$ to be a minimum, we must have
\[ \frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = 0. \]
Now,
\[ \frac{\partial S}{\partial a} = 0 \Rightarrow \sum_{i=1}^{n} 2(y_i - a - bx_i) = 0, \]
\[ \frac{\partial S}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} 2(y_i - a - bx_i)x_i = 0. \]
Therefore the normal equations are
\[ \sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i, \]
\[ \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2. \quad (2) \]
Solving the system (2) we determine $a$ and $b$ and hence the best fitting line (1).
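
The two normal equations form a 2×2 linear system that can be solved in closed form. The following is a minimal sketch with a hypothetical function name and made-up data lying exactly on $y = 1 + 2x$:

```python
def fit_line(xs, ys):
    """Solve the normal equations for the least-square line y = a + b x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_xx - sum_x ** 2  # zero only if all x values coincide
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n
    return a, b


if __name__ == "__main__":
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```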
Least square parabola:
Let us consider the functional form
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2 … (3)
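We seek the values of $a, b, c$ that make the graph of (3) pass as near as possible to each of the $n$
points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
The sum of squares of the deviation errors is thus
\[ S = \sum_{i=1}^{n} (y_i - a - bx_i - cx_i^2)^2. \]
For $S$ to be a minimum, we must have
\[ \frac{\partial S}{\partial a} = \frac{\partial S}{\partial b} = \frac{\partial S}{\partial c} = 0. \]
\[ \frac{\partial S}{\partial a} = 0 \Rightarrow \sum_{i=1}^{n} 2(y_i - a - bx_i - cx_i^2) = 0, \]
\[ \frac{\partial S}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} 2(y_i - a - bx_i - cx_i^2)\,x_i = 0, \]
\[ \frac{\partial S}{\partial c} = 0 \Rightarrow \sum_{i=1}^{n} 2(y_i - a - bx_i - cx_i^2)\,x_i^2 = 0. \]
Therefore, the normal equations are
\[ na + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i, \]
\[ a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i^3 = \sum_{i=1}^{n} x_i y_i, \]
\[ a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^4 = \sum_{i=1}^{n} x_i^2 y_i. \quad (4) \]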
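
The three normal equations can be solved as a 3×3 linear system. The sketch below uses Cramer's rule for brevity, which is a choice made here and not something the notes prescribe; the function names and the test data (points lying exactly on $y = 1 + 2x + 3x^2$) are illustrative.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(xs, ys):
    """Solve the normal equations for y = a + b x + c x^2 by Cramer's rule."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    rhs = [sum(ys),
           sum(x * y for x, y in zip(xs, ys)),
           sum(x * x * y for x, y in zip(xs, ys))]
    matrix = [[n, s1, s2], [s1, s2, s3], [s2, s3, s4]]
    d = det3(matrix)  # nonzero when there are at least three distinct x values
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in matrix]
        for i in range(3):
            replaced[i][col] = rhs[i]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)


if __name__ == "__main__":
    print(fit_parabola([-1, 0, 1, 2], [2, 1, 6, 17]))
```

For larger or ill-conditioned data sets a library linear solver would normally be preferred over Cramer's rule.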

Least square Geometric curve and exponential curve:


Let us consider the following functional forms
𝑦 = 𝑎𝑥 𝑏 … (5)
and
𝑦 = 𝑎𝑒 𝑏𝑥 … (6)
By taking logarithms, either of these two forms can be reduced to the form $v = a' + bu$, where
$v = \ln y$, $u = \ln x$ (for (5)) or $u = x$ (for (6)), and $a' = \ln a$.

The normal equations are thus
\[ \sum_{i=1}^{n} v_i = na' + b\sum_{i=1}^{n} u_i, \]
\[ \sum_{i=1}^{n} u_i v_i = a'\sum_{i=1}^{n} u_i + b\sum_{i=1}^{n} u_i^2. \quad (7) \]
Solving the system we find $a'$ and $b$, and hence $a$ is determined from $a = e^{a'}$.
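
A minimal sketch of this log-transform approach follows. Function names and test data are hypothetical. Note that the fit minimises squared deviations in $\ln y$, not in $y$ itself, and requires all $y_i > 0$ (and all $x_i > 0$ for the geometric curve).

```python
from math import exp, log


def _fit_line(us, vs):
    """Solve the normal equations (7) for v = a' + b u."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    b = (n * suv - su * sv) / (n * suu - su ** 2)
    return (sv - b * su) / n, b


def fit_exponential(xs, ys):
    """Fit y = a * exp(b x) via v = ln y, u = x."""
    a_prime, b = _fit_line(list(xs), [log(y) for y in ys])
    return exp(a_prime), b


def fit_geometric(xs, ys):
    """Fit y = a * x**b via v = ln y, u = ln x."""
    a_prime, b = _fit_line([log(x) for x in xs], [log(y) for y in ys])
    return exp(a_prime), b


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    print(fit_exponential(xs, [2 * exp(0.5 * x) for x in xs]))
```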
• A bivariate sample of size 11 gave the results $\bar{x} = 7$, $S_x = 2$, $\bar{y} = 9$, $S_y = 4$ and $r = 0.5$.
It was later found that one point of the sample, $(x = 7, y = 9)$, was inaccurate and
was rejected. How would the original value of $r$ be affected by this rejection?

\[ r = \frac{\frac{1}{11}\sum_{i=1}^{11} x_i y_i - \bar{x}\bar{y}}{S_x S_y}. \]
Given $\bar{x} = 7$, $S_x = 2$, $\bar{y} = 9$, $S_y = 4$, $r = 0.5$.
\[ \therefore\ \sum_{i=1}^{11} x_i y_i = 737 \quad \left[\because\ \sum_{i=1}^{11} x_i = 77,\ \sum_{i=1}^{11} y_i = 99,\ \sum_{i=1}^{11} x_i^2 = 583,\ \sum_{i=1}^{11} y_i^2 = 1067\right]. \]
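Let $\bar{x}', \bar{y}', S_x', S_y'$ and $r'$ be the values of $\bar{x}, \bar{y}, S_x, S_y, r$ respectively after the rejection of the pair (7, 9).

\[ \bar{x}' = \frac{\sum_{i=1}^{11} x_i - 7}{10} = 7, \qquad \bar{y}' = \frac{\sum_{i=1}^{11} y_i - 9}{10} = 9, \]
\[ S_x' = \sqrt{\frac{1}{10}\left(\sum_{i=1}^{11} x_i^2 - 7^2\right) - 7^2} = \sqrt{4.4}, \qquad S_y' = \sqrt{\frac{1}{10}\left(\sum_{i=1}^{11} y_i^2 - 9^2\right) - 9^2} = \sqrt{17.6}, \]
\[ r' = \frac{\frac{1}{10}\left(\sum_{i=1}^{11} x_i y_i - 7 \times 9\right) - 7 \times 9}{\sqrt{4.4}\,\sqrt{17.6}} = \frac{67.4 - 63}{8.8} = 0.5 = r. \]
So the rejection leaves the value of $r$ unchanged.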
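
The working above can be verified from the summary statistics alone. The sketch below rebuilds the sums, removes the rejected pair and recomputes $r$; the function name and argument names are illustrative.

```python
from math import sqrt


def r_after_rejection(n, xbar, sx, ybar, sy, r, x0, y0):
    """Correlation coefficient after removing the point (x0, y0).

    Works only from the summary statistics (divisor-n variances).
    """
    sum_x, sum_y = n * xbar, n * ybar
    sum_x2 = n * (sx ** 2 + xbar ** 2)
    sum_y2 = n * (sy ** 2 + ybar ** 2)
    sum_xy = n * (r * sx * sy + xbar * ybar)
    m = n - 1
    new_xbar = (sum_x - x0) / m
    new_ybar = (sum_y - y0) / m
    var_x = (sum_x2 - x0 ** 2) / m - new_xbar ** 2
    var_y = (sum_y2 - y0 ** 2) / m - new_ybar ** 2
    cov = (sum_xy - x0 * y0) / m - new_xbar * new_ybar
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(r_after_rejection(n=11, xbar=7, sx=2, ybar=9, sy=4, r=0.5, x0=7, y0=9))
```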

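• If $y = 2x$ and $x = \frac{y}{8}$ are two regression lines of a sample $(x_i, y_i)$, $i = 1, 2, \ldots, n$ drawn
from a bivariate population of $(X, Y)$, find the correlation coefficient of the sample. If
$(u_i, v_i)$, $i = 1, 2, \ldots, n$ is a sample from a bivariate population of $(U, V)$ where $u_i = x_i + y_i$,
$v_i = x_i - y_i$, $i = 1, 2, \ldots, n$, find the regression lines of the sample $(u_i, v_i)$.

Case-I: We take $y = 2x$ as the regression line of $Y$ on $X$ and $x = \frac{y}{8}$ as the regression line of $X$ on $Y$.
\[ \frac{r S_y}{S_x} = 2, \quad \frac{r S_x}{S_y} = \frac{1}{8}, \quad r^2 = \frac{2}{8} = \frac{1}{4}, \quad r = \frac{1}{2} \quad [S_x, S_y > 0]. \]
\[ \bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i = \frac{\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i}{n} = \bar{x} + \bar{y}, \qquad \bar{v} = \bar{x} - \bar{y}. \]
$(\bar{x}, \bar{y})$ is the point of intersection of the lines $y = 2x$ and $x = \frac{1}{8}y$, so $\bar{x} = 0 = \bar{y}$ and $\bar{u} = 0 = \bar{v}$. Dividing the two slope relations gives $S_y = 4S_x$. Hence
\[ S_u^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i + y_i - 0)^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \frac{1}{n}\sum_{i=1}^{n} y_i^2 + \frac{2}{n}\sum_{i=1}^{n} x_i y_i = S_x^2 + S_y^2 + 2rS_xS_y = 21S_x^2, \]
\[ S_v^2 = S_x^2 + S_y^2 - 2rS_xS_y = 13S_x^2. \]
\[ r' = \frac{\frac{1}{n}\sum_{i=1}^{n} u_i v_i - \bar{u}\bar{v}}{S_u S_v} = \frac{\frac{1}{n}\sum_{i=1}^{n} u_i v_i}{S_u S_v} = \frac{S_x^2 - S_y^2}{\sqrt{21S_x^2 \times 13S_x^2}} = -\frac{15}{\sqrt{21}\sqrt{13}}. \]
Regression line of $U$ on $V$: $(u - \bar{u}) = \frac{r' S_u}{S_v}(v - \bar{v})$, i.e. $u = -\frac{15}{13}v$.
Regression line of $V$ on $U$: $(v - \bar{v}) = \frac{r' S_v}{S_u}(u - \bar{u})$, i.e. $v = -\frac{15}{21}u$.

Case-II: We take $y = 2x$ as the regression line of $X$ on $Y$ and $x = \frac{y}{8}$ as the regression line of $Y$ on $X$.
\[ \frac{r S_y}{S_x} = 8, \quad \frac{r S_x}{S_y} = \frac{1}{2}, \quad r^2 = \frac{8}{2} = 4, \quad r = 2 \quad [S_x, S_y > 0], \]
which is a contradiction, since $|r| \le 1$. Hence only Case-I is possible.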
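
Both cases can be reproduced numerically. The sketch below takes the two regression coefficients of the $(x, y)$ sample, works in units where $S_x = 1$, and returns the slopes of the two regression lines of the $(u, v)$ sample; it assumes the means are zero, as in this problem, and the function name is hypothetical.

```python
from math import sqrt


def uv_regression_slopes(b_yx, b_xy):
    """Slopes of U on V and of V on U for u = x + y, v = x - y.

    b_yx and b_xy are the regression coefficients of Y on X and X on Y
    (nonzero and of the same sign). Raises ValueError if they imply |r| > 1.
    """
    r_squared = b_yx * b_xy
    if r_squared > 1:
        raise ValueError("inconsistent regression lines: r^2 > 1")
    r = sqrt(r_squared) if b_yx > 0 else -sqrt(r_squared)
    k = sqrt(b_yx / b_xy)                # k = Sy / Sx
    var_u = 1 + k * k + 2 * r * k        # Su^2 in units of Sx^2
    var_v = 1 + k * k - 2 * r * k        # Sv^2 in units of Sx^2
    cov_uv = 1 - k * k                   # (Sx^2 - Sy^2) in units of Sx^2
    return cov_uv / var_v, cov_uv / var_u


if __name__ == "__main__":
    print(uv_regression_slopes(2, 1 / 8))   # Case-I assignment
```

Calling the function with the Case-II assignment, `uv_regression_slopes(8, 1 / 2)`, raises the error, which mirrors the contradiction found above.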
𝑆𝑥 𝑆𝑦 2 2
