Let 𝜃 be a population parameter. Suppose there exist two statistics 𝐴 = 𝛼(𝑋1, 𝑋2, …, 𝑋𝑛) and 𝐵 =
𝛽(𝑋1, 𝑋2, …, 𝑋𝑛) such that 𝑃(𝐴 < 𝜃 < 𝐵) = 1 − 𝜖, 0 < 𝜖 < 1. Then the interval (𝐴, 𝐵) is called
a (1 − 𝜖)100% confidence interval for the parameter 𝜃, or a confidence interval for 𝜃
with confidence coefficient 𝟏 − 𝝐. The end points of the interval are called the lower and upper
confidence limits.
Let the population random variable 𝑋 be defined on the event space 𝑆 of a random experiment
𝐸. Consider a random sample (𝑋1, 𝑋2, …, 𝑋𝑛) of size 𝑛 drawn from the population of 𝑋. Let (𝑎, 𝑏) be
the confidence interval for 𝜃 with confidence coefficient 1 − 𝜖, where 𝑎 = 𝛼(𝑥1, 𝑥2, …, 𝑥𝑛), 𝑏 =
𝛽(𝑥1, 𝑥2, …, 𝑥𝑛) and (𝑥1, 𝑥2, …, 𝑥𝑛) is an observed value of (𝑋1, 𝑋2, …, 𝑋𝑛). Suppose random samples of
fixed size 𝑛 are drawn from the given population 𝑁 times under uniform conditions and (𝑎, 𝑏) is
computed for each sample, giving the intervals (𝑎1, 𝑏1), (𝑎2, 𝑏2), …, (𝑎𝑁, 𝑏𝑁). If 𝑁 is large, then 𝜃
will belong to approximately (1 − 𝜖)𝑁 of the computed intervals, i.e. (𝑎, 𝑏) will include 𝜃
approximately (1 − 𝜖)𝑁 times.
If 𝜖 = 0.05 and (𝑎, 𝑏) is a confidence interval for 𝜃 with confidence coefficient 0.95, then the frequency
interpretation of the confidence interval implies that (𝑎, 𝑏) includes 𝜃 approximately 950 times if
random samples of size 𝑛 are drawn under uniform conditions 1000 times from the given population.
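The frequency interpretation can be checked by simulation. The following is a minimal sketch, assuming an 𝑁(𝑚, 𝜎) population with known 𝜎 and the interval 𝑋̅ ± 𝑢𝜖 𝜎/√𝑛; the values 𝑚 = 10, 𝜎 = 2, 𝑛 = 25 and the function name `coverage` are illustrative choices, not part of the notes.

```python
import random
from statistics import NormalDist, fmean

def coverage(m=10.0, sigma=2.0, n=25, N=1000, eps=0.05, seed=0):
    """Fraction of N simulated intervals (a, b) that contain the true mean m."""
    rng = random.Random(seed)
    u = NormalDist().inv_cdf(1 - eps / 2)   # P(U > u) = eps/2 for U ~ N(0, 1)
    half = u * sigma / n ** 0.5             # half-width of the interval, sigma known
    hits = 0
    for _ in range(N):
        xbar = fmean(rng.gauss(m, sigma) for _ in range(n))
        if xbar - half < m < xbar + half:   # does (a, b) include m?
            hits += 1
    return hits / N

print(coverage())   # expected to be close to 1 - eps = 0.95
```

With 𝑁 = 1000 the printed fraction should lie near 0.95, in line with the "approximately 950 times" statement above.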
Generally, the number 𝜖 is chosen to be small, such as 0.05, 0.01 or 0.001. The corresponding
confidence coefficients are 0.95, 0.99 and 0.999, and the corresponding confidence intervals are called
95%, 99% and 99.9% confidence intervals.
Method for finding confidence intervals:
1. Choose a suitable statistic 𝑍 = 𝑔(𝑋1 , 𝑋2 , … , 𝑋𝑛 , 𝜃) whose sampling distribution is
independent of 𝜃 but which itself depends on 𝜃.
2. Take two numbers 𝛼𝜖 , 𝛽𝜖 (> 𝛼𝜖 ) such that 𝑃(𝛼𝜖 < 𝑍 < 𝛽𝜖 ) = 1 − 𝜖. (1)
3. Rewrite the equation (1) as 𝑃(𝐴 < 𝜃 < 𝐵) = 1 − 𝜖. 𝐴 and 𝐵 are in fact functions of
𝑋1 , 𝑋2 , … , 𝑋𝑛 . Then (𝐴, 𝐵) is the desired (1 − 𝜖)100% confidence interval for 𝜃.
𝑵(𝒎, 𝝈) Population:
Confidence interval for m: Two cases may arise.
Case I: 𝜎 is known.
We choose the statistic $U = \dfrac{\bar{X} - m}{\sigma/\sqrt{n}}$, whose sampling distribution is $N(0,1)$, $\bar{X}$ being the mean of a
random sample of size $n$ from a normal population.
Take two points ±𝑢𝜖 symmetrically arranged about the origin such that
𝑃(−𝑢𝜖 < 𝑈 < 𝑢𝜖 ) = 1 − 𝜖.
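$$\therefore\; P\left(-u_\epsilon < \frac{\bar{X} - m}{\sigma/\sqrt{n}} < u_\epsilon\right) = 1 - \epsilon$$
$$\therefore\; P\left(\bar{X} - \frac{\sigma u_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{\sigma u_\epsilon}{\sqrt{n}}\right) = 1 - \epsilon$$
Therefore the confidence interval for the mean $m$ having confidence coefficient $1 - \epsilon$ is $\left(\bar{X} - \dfrac{\sigma u_\epsilon}{\sqrt{n}},\; \bar{X} + \dfrac{\sigma u_\epsilon}{\sqrt{n}}\right)$,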
where 𝑢𝜖 is given by
𝑃(−𝑢𝜖 < 𝑈 < 𝑢𝜖 ) = 1 − 𝜖. (2)
From the symmetry of the $N(0,1)$ distribution, we have from (2) $P(U > u_\epsilon) = \dfrac{\epsilon}{2}$.
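As a quick numerical illustration of (2), the values of 𝑢𝜖 for the usual choices of 𝜖 can be obtained from the inverse of the 𝑁(0,1) distribution function. This sketch uses Python's standard `statistics.NormalDist`; the helper name `u_eps` is hypothetical.

```python
from statistics import NormalDist

def u_eps(eps):
    """Return u such that P(-u < U < u) = 1 - eps, i.e. P(U > u) = eps/2."""
    return NormalDist().inv_cdf(1 - eps / 2)

for eps in (0.05, 0.01, 0.001):
    print(eps, round(u_eps(eps), 3))
```

For 𝜖 = 0.05 this gives the familiar value 1.96.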
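Case II: 𝜎 is unknown.
We choose the statistic $T = \dfrac{\bar{X} - m}{s/\sqrt{n}}$, where $s^2 = \dfrac{n}{n-1} S^2$, whose sampling distribution is the $t$-distribution with $n - 1$ degrees of freedom, $\bar{X}$ being the mean of a random sample of size $n$ from a
normal population.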
Take two points ±𝑡𝜖 symmetrically arranged about the origin such that
𝑃(−𝑡𝜖 < 𝑇 < 𝑡𝜖 ) = 1 − 𝜖.
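$$\therefore\; P\left(-t_\epsilon < \frac{\bar{X} - m}{s/\sqrt{n}} < t_\epsilon\right) = 1 - \epsilon$$
$$\therefore\; P\left(\bar{X} - \frac{s t_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{s t_\epsilon}{\sqrt{n}}\right) = 1 - \epsilon.$$
Hence the confidence interval for $m$ with confidence coefficient $1 - \epsilon$ is $\left(\bar{X} - \dfrac{s t_\epsilon}{\sqrt{n}},\; \bar{X} + \dfrac{s t_\epsilon}{\sqrt{n}}\right)$, where $t_\epsilon$ is
given by
$$P(-t_\epsilon < T < t_\epsilon) = 1 - \epsilon. \qquad (3)$$
Due to the symmetry of the $t$-distribution, we have from (3) $P(T > t_\epsilon) = \dfrac{\epsilon}{2}$.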
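$$\therefore\; P\left(\chi_{\epsilon_1}^2 < \frac{nS^2}{\sigma^2} < \chi_{\epsilon_2}^2\right) = 1 - \epsilon$$
$$\therefore\; P\left(S\sqrt{\frac{n}{\chi_{\epsilon_2}^2}} < \sigma < S\sqrt{\frac{n}{\chi_{\epsilon_1}^2}}\right) = 1 - \epsilon$$
Therefore the confidence interval for $\sigma$ with confidence coefficient $1 - \epsilon$ is $\left(S\sqrt{\dfrac{n}{\chi_{\epsilon_2}^2}},\; S\sqrt{\dfrac{n}{\chi_{\epsilon_1}^2}}\right)$.
Here $\chi_{\epsilon_1}^2$ and $\chi_{\epsilon_2}^2$ are determined by $P(0 < \chi^2 < \chi_{\epsilon_1}^2) = \dfrac{\epsilon}{2}$,
or $P(\chi^2 > \chi_{\epsilon_1}^2) = 1 - \dfrac{\epsilon}{2}$, and $P(\chi^2 > \chi_{\epsilon_2}^2) = \dfrac{\epsilon}{2}$.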
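The interval for 𝜎 is simple arithmetic once the two 𝜒² points have been read from a table. The sketch below takes them as inputs rather than computing them. The data (𝑛 = 10, 𝑆 = 2) are hypothetical, and the table values 2.700 and 19.023 assume 9 degrees of freedom with 𝜖 = 0.05; the degrees of freedom are an assumption here, since they are not stated in this part of the notes.

```python
from math import sqrt

def sigma_interval(S, n, chi2_low, chi2_high):
    """Return (S*sqrt(n/chi2_high), S*sqrt(n/chi2_low)).

    S is the sample standard deviation with divisor n;
    chi2_low and chi2_high are the tabulated points chi^2_{eps1}, chi^2_{eps2}.
    """
    return S * sqrt(n / chi2_high), S * sqrt(n / chi2_low)

# Hypothetical sample: n = 10, S = 2; table values assumed for 9 degrees of freedom.
lo, hi = sigma_interval(2.0, 10, 2.700, 19.023)
print(round(lo, 3), round(hi, 3))
```

Note that the larger 𝜒² point gives the lower confidence limit, as in the formula above.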
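Now
$$P(-u_\epsilon < U < u_\epsilon) = P(U^2 < u_\epsilon^2) = P\left[\frac{(X - Np)^2}{Np(1-p)} < u_\epsilon^2\right] = P\left[(X - Np)^2 < Np(1-p)u_\epsilon^2\right].$$
So,
$$P(-u_\epsilon < U < u_\epsilon) = P\left[(N^2 + Nu_\epsilon^2)p^2 - p(2NX + Nu_\epsilon^2) + X^2 < 0\right] \qquad (4)$$
We now consider the following quadratic equation in $p$:
$$(N^2 + Nu_\epsilon^2)p^2 - p(2NX + Nu_\epsilon^2) + X^2 = 0 \qquad (5)$$
whose roots are
$$p = \frac{2NX + Nu_\epsilon^2}{2N(N + u_\epsilon^2)} \pm \frac{u_\epsilon\sqrt{NX(N - X)}}{N(N + u_\epsilon^2)}\left[1 + \frac{Nu_\epsilon^2}{4X(N - X)}\right]^{1/2}.$$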
For large $N$, $\dfrac{X}{N} \cong p$ and therefore $\dfrac{Nu_\epsilon^2}{4X(N - X)} \cong \dfrac{Nu_\epsilon^2}{4N(N - Np)p}$, which is of the order $\dfrac{1}{N}$.
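$$\therefore\; \frac{u_\epsilon\sqrt{NX(N - X)}}{N(N + u_\epsilon^2)}\left[1 + \frac{Nu_\epsilon^2}{4X(N - X)}\right]^{1/2} = u_\epsilon\,\frac{\sqrt{X(N - X)}}{\sqrt{N}\,N}\left(1 + \frac{u_\epsilon^2}{N}\right)^{-1}\left[1 + \frac{Nu_\epsilon^2}{4X(N - X)}\right]^{1/2}$$
$$= u_\epsilon\sqrt{\frac{X(N - X)}{N^3}}\left(1 - \frac{u_\epsilon^2}{N} + \frac{u_\epsilon^4}{N^2} - \cdots\right)\left[1 + \frac{Nu_\epsilon^2}{8X(N - X)} + \cdots\right]$$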
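Also,
$$\frac{2NX + Nu_\epsilon^2}{2N(N + u_\epsilon^2)} = \frac{2X + u_\epsilon^2}{2(N + u_\epsilon^2)} = \frac{2X + u_\epsilon^2}{2N}\left(1 + \frac{u_\epsilon^2}{N}\right)^{-1} = \left(\frac{X}{N} + \frac{u_\epsilon^2}{2N}\right)\left(1 - \frac{u_\epsilon^2}{N} + \cdots\right) \cong \frac{X}{N}$$
(retaining terms up to the order of $\dfrac{1}{\sqrt{N}}$).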
Therefore, the roots of the quadratic equation (5) are approximately $\dfrac{X}{N} \pm u_\epsilon\sqrt{\dfrac{X(N - X)}{N^3}}$.
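$\therefore\; P\left[(N^2 + Nu_\epsilon^2)(p - \alpha)(p - \beta) < 0\right] = P\left[(p - \alpha)(p - \beta) < 0\right]$, where $\alpha, \beta$ are the random
variables corresponding to the roots $\dfrac{X}{N} \pm u_\epsilon\sqrt{\dfrac{X(N - X)}{N^3}}$.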
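The confidence interval for $p$ with confidence coefficient $1 - \epsilon$ is $\left(\dfrac{X}{N} - u_\epsilon\sqrt{\dfrac{X(N - X)}{N^3}},\; \dfrac{X}{N} + u_\epsilon\sqrt{\dfrac{X(N - X)}{N^3}}\right)$, where $u_\epsilon$
is given by
$$P(U > u_\epsilon) \cong \frac{1}{\sqrt{2\pi}}\int_{u_\epsilon}^{\infty} e^{-u^2/2}\,du = \frac{\epsilon}{2}.$$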
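To see how good the large-𝑁 approximation is, the exact roots of the quadratic (5) can be compared with 𝑋/𝑁 ± 𝑢𝜖 √(𝑋(𝑁 − 𝑋)/𝑁³). The following sketch does this for a few hypothetical sample sizes with 𝑋/𝑁 = 0.3; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def exact_roots(X, N, u):
    """Roots of (N^2 + N u^2) p^2 - (2NX + N u^2) p + X^2 = 0."""
    A = N * N + N * u * u
    B = 2 * N * X + N * u * u
    disc = sqrt(B * B - 4 * A * X * X)
    return (B - disc) / (2 * A), (B + disc) / (2 * A)

def approx_roots(X, N, u):
    """Large-N approximation X/N -/+ u * sqrt(X (N - X) / N^3)."""
    half = u * sqrt(X * (N - X) / N ** 3)
    return X / N - half, X / N + half

u = NormalDist().inv_cdf(0.975)      # eps = 0.05
for N in (50, 500, 5000):            # hypothetical sample sizes
    X = 3 * N // 10                  # keeps X/N = 0.3
    print(N, exact_roots(X, N, u), approx_roots(X, N, u))
```

The two pairs of limits draw together as 𝑁 grows, which is what dropping the terms of order 1/𝑁 amounts to.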
• Find the 95% confidence interval for the mean of an $N(m, \sigma)$ population using the following
data: $\bar{x} = 48$, $\sigma = 9$, $n = 36$. $\left[\text{Given } \dfrac{1}{\sqrt{2\pi}}\displaystyle\int_{1.96}^{\infty} e^{-t^2/2}\,dt = 0.025\right]$
We take the statistic $U = \dfrac{\bar{X} - m}{\sigma/\sqrt{n}}$; the sampling distribution of $U$ is the $N(0,1)$ distribution. Choose two
real numbers $\pm u_\epsilon$ such that $P(-u_\epsilon < U < u_\epsilon) = 1 - \epsilon$.
Here 𝜖 = 0.05.
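$$\therefore\; P\left(\bar{X} - \frac{\sigma u_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{\sigma u_\epsilon}{\sqrt{n}}\right) = 0.95,$$
or
$$P\left(48 - \frac{3}{2}u_\epsilon < m < 48 + \frac{3}{2}u_\epsilon\right) = 0.95.$$
$u_\epsilon$ is obtained from $P(U > u_\epsilon) = \dfrac{\epsilon}{2} = 0.025$. $\therefore\; u_\epsilon = 1.96$.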
The 95% confidence interval is (45.06, 50.94).
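The arithmetic of this example can be reproduced with a few lines of Python. This is only a check of the numbers above; the function name is hypothetical, and 𝑢𝜖 is computed with the standard library instead of being read from the given integral.

```python
from math import sqrt
from statistics import NormalDist

def mean_interval_known_sigma(xbar, sigma, n, eps):
    """Interval xbar -/+ u_eps * sigma / sqrt(n) for the mean of N(m, sigma)."""
    u = NormalDist().inv_cdf(1 - eps / 2)
    half = u * sigma / sqrt(n)
    return xbar - half, xbar + half

a, b = mean_interval_known_sigma(48, 9, 36, 0.05)
print(round(a, 2), round(b, 2))   # 45.06 50.94
```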
Consider the statistic $T = \dfrac{\bar{X} - m}{s/\sqrt{n}}$; the sampling distribution of $T$ is the $t$-distribution with $(n - 1)$
degrees of freedom. $s^2 = \dfrac{n}{n-1}S^2 = \dfrac{18}{17} \times 65 = 68.82$.
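Take two numbers $\pm t_\epsilon$ such that $P(-t_\epsilon < T < t_\epsilon) = 1 - \epsilon$.
Here $s = 8.296$, $\epsilon = 0.05$.
$$\therefore\; P\left(\bar{X} - \frac{s t_\epsilon}{\sqrt{n}} < m < \bar{X} + \frac{s t_\epsilon}{\sqrt{n}}\right) = 0.95,$$
$$P\left(56 - \frac{8.296}{\sqrt{18}}t_\epsilon < m < 56 + \frac{8.296}{\sqrt{18}}t_\epsilon\right) = 0.95,$$
$t_\epsilon$ is obtained from $P(|T| > t_\epsilon) = 1 - (1 - \epsilon) = \epsilon = 0.05$. $\therefore\; t_\epsilon = 2.11$.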
The confidence limits are 51.874 and 60.126.
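A short check of these limits, using only the figures that appear in the solution (𝑛 = 18, 𝑥̅ = 56, 𝑆² = 65) and the tabulated value 𝑡𝜖 = 2.11, which is taken as given rather than computed:

```python
from math import sqrt

n, xbar, S2 = 18, 56, 65        # figures used in the worked solution
t_eps = 2.11                    # tabulated value, 17 degrees of freedom
s = sqrt(n / (n - 1) * S2)      # s^2 = n S^2 / (n - 1)
half = t_eps * s / sqrt(n)      # half-width s * t_eps / sqrt(n)
print(round(s, 3), round(xbar - half, 3), round(xbar + half, 3))
```

This reproduces 𝑠 = 8.296 and the limits 51.874 and 60.126.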
• In a random sample, 𝟏𝟑𝟔 of 𝟒𝟎𝟎 persons given a flu vaccine experienced some
discomfort. Construct a 𝟗𝟓% confidence interval for the true proportion of persons who
will experience some discomfort from the vaccine.
$$P(U > u_\epsilon) \cong \frac{1}{\sqrt{2\pi}}\int_{u_\epsilon}^{\infty} e^{-u^2/2}\,du = \frac{\epsilon}{2}.$$
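The numerical part of the solution is not shown here. As a sketch, applying the approximate interval 𝑋/𝑁 ± 𝑢𝜖 √(𝑋(𝑁 − 𝑋)/𝑁³) to the data of the problem (𝑋 = 136, 𝑁 = 400, 𝜖 = 0.05) gives the following; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(X, N, eps):
    """Approximate interval X/N -/+ u_eps * sqrt(X (N - X) / N^3) for p."""
    u = NormalDist().inv_cdf(1 - eps / 2)
    half = u * sqrt(X * (N - X) / N ** 3)
    return X / N - half, X / N + half

lo, hi = proportion_interval(136, 400, 0.05)
print(round(lo, 3), round(hi, 3))   # 0.294 0.386
```

So, under this approximation, between roughly 29% and 39% of persons would be expected to experience some discomfort.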
Bivariate Samples:
Let 𝑋, 𝑌 be the random variables defined over the sample space 𝑆 of a random experiment 𝐸. Then
(𝑋, 𝑌) is a two dimensional random variable which is a mapping from 𝑆 to 𝑅 × 𝑅; 𝑅 being the set of
all real numbers.
Let 𝐸 be performed once and let the outcome be 𝜔(∈ 𝑆). If (𝑋, 𝑌)(𝜔) = (𝑋(𝜔), 𝑌(𝜔)) = (𝑥, 𝑦)
where 𝑥 ∈ 𝑅 and 𝑦 ∈ 𝑅, then (𝑥, 𝑦) is called an observed value of (𝑋, 𝑌).
If 𝐸 be repeated under identical conditions a finite number of times, say 𝑛 times, then a set of
observed values (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ) is obtained. This set is called a random bivariate
sample of size 𝑛.
Scatter diagram:
Let (𝑥1, 𝑦1), (𝑥2, 𝑦2), …, (𝑥𝑛, 𝑦𝑛) be a bivariate sample of size 𝑛 drawn from a given bivariate
population. We may regard these pairs of values as 𝑛 points in the 𝑥𝑦-plane, measuring the variables
𝑋 and 𝑌 along the 𝑥-axis and 𝑦-axis respectively in a rectangular Cartesian coordinate system.
Plotting all these points, we get a set of points in the 𝑥𝑦-plane. This diagrammatic representation of
bivariate data is known as a scatter diagram.
Sample Characteristics:
Let {(𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 )} be a bivariate sample of size 𝑛 drawn from a given bivariate
population.
Sample means:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
Sample variances:
$$S_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad S_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2.$$
Sample moments:
$$a_{kl} = \frac{1}{n}\sum_{i=1}^{n} x_i^{\,l}\, y_i^{\,k} \qquad (l, k \text{ are non-negative integers}).$$
Sample covariance:
$$m_{11} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}).$$
Correlation Coefficient:
$$r = \frac{m_{11}}{S_x S_y}.$$
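The sample characteristics defined above translate directly into code. The following sketch computes them with divisor 𝑛, as in the definitions; the five data points are hypothetical.

```python
from math import sqrt

def sample_characteristics(pairs):
    """Return (xbar, ybar, Sx, Sy, m11, r) for a bivariate sample, divisor n."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    Sx = sqrt(sum((x - xbar) ** 2 for x, _ in pairs) / n)
    Sy = sqrt(sum((y - ybar) ** 2 for _, y in pairs) / n)
    m11 = sum((x - xbar) * (y - ybar) for x, y in pairs) / n   # sample covariance
    return xbar, ybar, Sx, Sy, m11, m11 / (Sx * Sy)

data = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]   # hypothetical sample
print(sample_characteristics(data))
```

For this sample 𝑥̅ = 3, 𝑦̅ = 4, 𝑆𝑥² = 𝑆𝑦² = 2 and 𝑚11 = 1.8, so 𝑟 = 0.9.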
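$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{Var}(x)\operatorname{Var}(y)}} = \frac{m_{11}}{S_x S_y} = \frac{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$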
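Now, $\left(\dfrac{x_i - \bar{x}}{S_x} \pm \dfrac{y_i - \bar{y}}{S_y}\right)^2 \ge 0$; $i = 1, 2, \ldots, n$.
or, $\left(\dfrac{x_i - \bar{x}}{S_x}\right)^2 + \left(\dfrac{y_i - \bar{y}}{S_y}\right)^2 \pm 2\left(\dfrac{x_i - \bar{x}}{S_x}\right)\left(\dfrac{y_i - \bar{y}}{S_y}\right) \ge 0$.
or, $\displaystyle\sum_{i=1}^{n}\left(\dfrac{x_i - \bar{x}}{S_x}\right)^2 + \sum_{i=1}^{n}\left(\dfrac{y_i - \bar{y}}{S_y}\right)^2 \pm 2\sum_{i=1}^{n}\left(\dfrac{x_i - \bar{x}}{S_x}\right)\left(\dfrac{y_i - \bar{y}}{S_y}\right) \ge 0$.
or, $\dfrac{1}{S_x^2}\,nS_x^2 + \dfrac{1}{S_y^2}\,nS_y^2 \pm 2nr \ge 0$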
or, 2 ± 2𝑟 ≥ 0
or, −1 ≤ 𝑟 ≤ 1.
• Correlation coefficient is independent of the change of origin and of the change of
scales.
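This property is easy to check numerically. The sketch below transforms a hypothetical sample by 𝑢 = (𝑥 − 𝑎)/ℎ, 𝑣 = (𝑦 − 𝑏)/𝑘 and compares the two correlation coefficients. The constants 𝑎, 𝑏, ℎ, 𝑘 are arbitrary, with ℎ and 𝑘 taken positive; if the scale factors had opposite signs, the sign of 𝑟 would be reversed.

```python
from math import sqrt

def corr(xs, ys):
    """Sample correlation coefficient r = m11 / (Sx * Sy), divisor n."""
    n = len(xs)
    xb, yb = sum(xs) / n, sum(ys) / n
    m11 = sum((x - xb) * (y - yb) for x, y in zip(xs, ys)) / n
    sx = sqrt(sum((x - xb) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - yb) ** 2 for y in ys) / n)
    return m11 / (sx * sy)

xs = [1, 2, 3, 4, 5]                    # hypothetical sample
ys = [2, 3, 5, 4, 6]
us = [(x - 3) / 2 for x in xs]          # change of origin a = 3, scale h = 2
vs = [(y - 10) / 0.5 for y in ys]       # change of origin b = 10, scale k = 0.5
print(corr(xs, ys), corr(us, vs))       # the two values agree
```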
$$\sum_{i=1}^{n} y_i = na + b\sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 \qquad (2)$$
Solving the system (2) we determine $a$ and $b$ and hence the best fitting line (1).
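A minimal sketch of solving the two normal equations for 𝑎 and 𝑏, assuming the line being fitted is 𝑦 = 𝑎 + 𝑏𝑥 (its definition is not shown in this part of the notes); the data are hypothetical.

```python
def fit_line(xs, ys):
    """Solve  sum(y) = n a + b sum(x),  sum(xy) = a sum(x) + b sum(x^2)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # eliminate a between the two equations
    a = (sy - b * sx) / n                           # back-substitute in the first equation
    return a, b

a, b = fit_line([1, 2, 3, 4, 5], [2, 3, 5, 4, 6])   # hypothetical data
print(a, b)
```

For these data the sums are Σ𝑥 = 15, Σ𝑦 = 20, Σ𝑥² = 55, Σ𝑥𝑦 = 69, giving 𝑏 = 0.9 and 𝑎 = 1.3.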
Least square parabola:
Let us consider the functional form
𝑦 = 𝑎 + 𝑏𝑥 + 𝑐𝑥 2 … (3)
Find the values of 𝑎, 𝑏, 𝑐 which will make the graph of (3) to pass as near as possible to each of the 𝑛
points (𝑥1 , 𝑦1 ), (𝑥2 , 𝑦2 ), … , (𝑥𝑛 , 𝑦𝑛 ).
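The sum of squares of the deviation errors will thus be
$$S = \sum_{i=1}^{n}\left(y_i - a - bx_i - cx_i^2\right)^2.$$
For $S$ to be a minimum,
$$\frac{\partial S}{\partial a} = 0 \Rightarrow \sum_{i=1}^{n} 2\left(y_i - a - bx_i - cx_i^2\right) = 0,$$
$$\frac{\partial S}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} 2\left(y_i - a - bx_i - cx_i^2\right)x_i = 0,$$
$$\frac{\partial S}{\partial c} = 0 \Rightarrow \sum_{i=1}^{n} 2\left(y_i - a - bx_i - cx_i^2\right)x_i^2 = 0.$$
These give the normal equations
$$na + b\sum_{i=1}^{n} x_i + c\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i,$$
$$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 + c\sum_{i=1}^{n} x_i^3 = \sum_{i=1}^{n} x_i y_i,$$
$$a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i^3 + c\sum_{i=1}^{n} x_i^4 = \sum_{i=1}^{n} x_i^2 y_i.$$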
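The three normal equations for the parabola form a 3 × 3 linear system in 𝑎, 𝑏, 𝑐. The following sketch builds the system from the power sums and solves it by Gauss-Jordan elimination; the helper names and the test data (generated from 𝑦 = 1 + 2𝑥 + 3𝑥²) are illustrative.

```python
def solve3(M, v):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    A = [row[:] + [rhs] for row, rhs in zip(M, v)]   # augmented matrix
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]                      # bring the largest pivot up
        for r in range(3):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * y for x, y in zip(A[r], A[c])]
    return [A[i][3] / A[i][i] for i in range(3)]

def fit_parabola(xs, ys):
    """Return (a, b, c) of y = a + b x + c x^2 from the normal equations."""
    s = lambda k: sum(x ** k for x in xs)                    # sum of x_i^k
    t = lambda k: sum(x ** k * y for x, y in zip(xs, ys))    # sum of x_i^k y_i
    M = [[len(xs), s(1), s(2)],
         [s(1), s(2), s(3)],
         [s(2), s(3), s(4)]]
    return solve3(M, [t(0), t(1), t(2)])

# Points lying exactly on y = 1 + 2x + 3x^2, so the fit should recover 1, 2, 3.
print(fit_parabola([0, 1, 2, 3, 4], [1, 6, 17, 34, 57]))
```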
$$\sum_{i=1}^{n} v_i = na' + b\sum_{i=1}^{n} u_i, \qquad \sum_{i=1}^{n} u_i v_i = a'\sum_{i=1}^{n} u_i + b\sum_{i=1}^{n} u_i^2 \qquad (7)$$
Solving the system we find $a'$ and $b$, and hence $a$ is determined from $a = e^{a'}$.
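The definitions of 𝑢 and 𝑣 are not shown in this part of the notes. As an illustration only, the sketch below assumes the curve 𝑦 = 𝑎𝑒^(𝑏𝑥), linearised by 𝑢 = 𝑥, 𝑣 = ln 𝑦 and 𝑎′ = ln 𝑎, and then solves system (7) exactly as for a straight line. The function name and data are hypothetical.

```python
from math import exp, log

def fit_exponential(xs, ys):
    """Fit y = a*exp(b*x) via v = a' + b*u with u = x, v = ln y (assumed form)."""
    us, vs = list(xs), [log(y) for y in ys]
    n = len(us)
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    suv = sum(u * v for u, v in zip(us, vs))
    b = (n * suv - su * sv) / (n * suu - su * su)   # from the two equations of (7)
    a_prime = (sv - b * su) / n
    return exp(a_prime), b                          # a = e^{a'}

xs = [0, 1, 2, 3]
ys = [2 * exp(0.5 * x) for x in xs]                 # points on y = 2 e^{0.5 x}
print(fit_exponential(xs, ys))                      # close to (2, 0.5)
```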
• A bivariate sample of size 11 gave the results $\bar{x} = 7$, $S_x = 2$, $\bar{y} = 9$, $S_y = 4$ and $r = 0.5$.
It was later found that one point of the sample values $(x = 7, y = 9)$ was inaccurate and
was rejected. How would the original value of $r$ be affected by this rejection?
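$$r = \frac{\dfrac{1}{11}\sum_{i=1}^{11} x_i y_i - \bar{x}\bar{y}}{S_x S_y}.$$
Given $\bar{x} = 7$, $S_x = 2$, $\bar{y} = 9$, $S_y = 4$, $r = 0.5$, we get
$$\sum_{i=1}^{11} x_i = 77, \quad \sum_{i=1}^{11} y_i = 99, \quad \sum_{i=1}^{11} x_i^2 = 11(2^2 + 7^2) = 583, \quad \sum_{i=1}^{11} y_i^2 = 11(4^2 + 9^2) = 1067, \quad \sum_{i=1}^{11} x_i y_i = 11(0.5 \times 2 \times 4 + 7 \times 9) = 737.$$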
Let 𝑥̅ ′ , 𝑦̅ ′ , 𝑆𝑥′ , 𝑆𝑦′ and 𝑟 ′ be the values of 𝑥̅ , 𝑦̅, 𝑆𝑥 , 𝑆𝑦 , 𝑟 respectively after the rejection of the pair (7,9).
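$$\bar{x}' = \frac{\sum_{i=1}^{11} x_i - 7}{10} = 7, \qquad \bar{y}' = \frac{\sum_{i=1}^{11} y_i - 9}{10} = 9,$$
$$S_x' = \sqrt{\frac{1}{10}\left(\sum_{i=1}^{11} x_i^2 - 7^2\right) - 7^2} = \sqrt{4.4}, \qquad S_y' = \sqrt{\frac{1}{10}\left(\sum_{i=1}^{11} y_i^2 - 9^2\right) - 9^2} = \sqrt{17.6},$$
$$r' = \frac{\dfrac{1}{10}\left(\sum_{i=1}^{11} x_i y_i - 7 \times 9\right) - 7 \times 9}{\sqrt{4.4}\,\sqrt{17.6}} = 0.5 = r.$$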
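The computation can be verified numerically by first recovering the five sums from the given summary values and then removing the pair (7, 9). The variable names below are illustrative.

```python
from math import sqrt

n, xbar, ybar, Sx, Sy, r = 11, 7, 9, 2, 4, 0.5

# Sums implied by the original summary values (divisor n throughout).
sum_x, sum_y = n * xbar, n * ybar
sum_xx = n * (Sx ** 2 + xbar ** 2)
sum_yy = n * (Sy ** 2 + ybar ** 2)
sum_xy = n * (r * Sx * Sy + xbar * ybar)

# Characteristics of the remaining 10 pairs after rejecting (7, 9).
m = n - 1
xb = (sum_x - 7) / m
yb = (sum_y - 9) / m
sx2 = (sum_xx - 7 ** 2) / m - xb ** 2
sy2 = (sum_yy - 9 ** 2) / m - yb ** 2
cov = (sum_xy - 7 * 9) / m - xb * yb
r_new = cov / sqrt(sx2 * sy2)
print(xb, yb, sx2, sy2, r_new)
```

The rejected point coincides with (𝑥̅, 𝑦̅), so it contributes nothing to the covariance or to either variance; all three are scaled by the same factor 11/10 and 𝑟 is unchanged.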
• Suppose $y = 2x$ and $x = \dfrac{y}{8}$ are the two regression lines of a sample $(x_i, y_i)$, $i = 1, 2, \ldots, n$, drawn
from a bivariate population of $(X, Y)$. Find the correlation coefficient of the sample. If
$(u_i, v_i)$, $i = 1, 2, \ldots, n$, is a sample from a bivariate population of $(U, V)$ where $u_i = x_i + y_i$,
$v_i = x_i - y_i$, $i = 1, 2, \ldots, n$, find the regression lines of the sample $(u_i, v_i)$, $i = 1, 2, \ldots, n$.
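Case-I: We take $y = 2x$ as the regression line of $Y$ on $X$ and $x = \dfrac{y}{8}$ as the regression line of $X$ on $Y$.
$$\frac{rS_y}{S_x} = 2, \quad \frac{rS_x}{S_y} = \frac{1}{8}, \quad r^2 = \frac{2}{8} = \frac{1}{4}, \quad r = \frac{1}{2} \quad [S_x, S_y > 0].$$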
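$$\bar{u} = \frac{1}{n}\sum_{i=1}^{n} u_i = \frac{\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i}{n} = \bar{x} + \bar{y}, \qquad \bar{v} = \bar{x} - \bar{y}.$$
$(\bar{x}, \bar{y})$ is the point of intersection of the lines $y = 2x$ and $x = \dfrac{1}{8}y$, so $\bar{x} = 0 = \bar{y}$ and $\bar{u} = 0 = \bar{v}$. Also, $\dfrac{rS_y}{S_x} = 2$ with $r = \dfrac{1}{2}$ gives $S_y = 4S_x$. Hence
$$S_u^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i + y_i - 0)^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \frac{1}{n}\sum_{i=1}^{n} y_i^2 + \frac{2}{n}\sum_{i=1}^{n} x_i y_i = S_x^2 + S_y^2 + 2rS_xS_y = 21S_x^2,$$
$$S_v^2 = S_x^2 + S_y^2 - 2rS_xS_y = 13S_x^2.$$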
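$$r' = \frac{\dfrac{1}{n}\sum_{i=1}^{n} u_i v_i - \bar{u}\bar{v}}{S_u S_v} = \frac{\dfrac{1}{n}\sum_{i=1}^{n} u_i v_i}{S_u S_v} = \frac{S_x^2 - S_y^2}{\sqrt{21S_x^2}\,\sqrt{13S_x^2}} = -\frac{15}{\sqrt{21}\sqrt{13}}.$$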
Regression line of $U$ on $V$: $(u - \bar{u}) = \dfrac{r'S_u}{S_v}(v - \bar{v})$, i.e. $u = -\dfrac{15}{13}v$.
Regression line of $V$ on $U$: $(v - \bar{v}) = \dfrac{r'S_v}{S_u}(u - \bar{u})$, i.e. $v = -\dfrac{15}{21}u$.
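Case-II: We take $y = 2x$ as the regression line of $X$ on $Y$ and $x = \dfrac{y}{8}$ as the regression line of $Y$ on $X$.
$$\frac{rS_y}{S_x} = 8, \quad \frac{rS_x}{S_y} = \frac{1}{2}, \quad r^2 = \frac{8}{2} = 4, \quad r = 2 \quad [S_x, S_y > 0],$$
which is a contradiction, since $|r| \le 1$.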
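Both cases can be checked with a few lines of code. The sketch below recovers 𝑟 from the two regression coefficients (their product is 𝑟²), rejects a pair whose product exceeds 1, and then recomputes the regression coefficients of the (𝑢, 𝑣) sample for Case-I, taking 𝑆𝑥 = 1 as a unit since only ratios matter. The function name is hypothetical.

```python
from math import sqrt

def r_from_slopes(b_yx, b_xy):
    """r from the regression coefficients of Y on X and X on Y (product = r^2)."""
    prod = b_yx * b_xy
    if not 0 <= prod <= 1:
        raise ValueError("slopes inconsistent with |r| <= 1")
    r = sqrt(prod)
    return r if b_yx >= 0 else -r

r = r_from_slopes(2, 1 / 8)          # Case-I: y = 2x is Y on X, x = y/8 is X on Y
print(r)
try:
    r_from_slopes(8, 1 / 2)          # Case-II: roles of the two lines exchanged
except ValueError as err:
    print("Case-II rejected:", err)

k = 2 / r                            # S_y / S_x, from r * S_y / S_x = 2
Su2 = 1 + k * k + 2 * r * k          # S_u^2 / S_x^2
Sv2 = 1 + k * k - 2 * r * k          # S_v^2 / S_x^2
cuv = 1 - k * k                      # cov(u, v) / S_x^2 = (S_x^2 - S_y^2) / S_x^2
print(cuv / Sv2, cuv / Su2)          # slopes of U on V and of V on U
```

The last line gives −15/13 and −15/21, matching the two regression lines found in Case-I.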