
 In many problems there are two or more variables that are inherently related, and it is necessary to explore the nature of this relationship.
 Regression analysis is a statistical technique for
modeling and investigating the relationship between
two or more variables.
 If the true relationship is a straight line, the observation y at each level of x is a random variable whose expected value is

$$E(Y \mid x) = \beta_0 + \beta_1 x$$

 The model for the random variable Y becomes

$$Y = \beta_0 + \beta_1 x + \epsilon$$

where $\epsilon$ is a random error with NID(0, $\sigma^2$)

 Suppose that we have 'n' pairs of observations, say (y1, x1), (y2, x2), …, (yn, xn).
 These data may be used to estimate the unknown parameters β0 and β1 using the least-squares method.
 β0 and β1 will be estimated so that the sum of squares of the deviations between the observations and the regression line is a minimum.
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad \bar{Y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$$
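As a minimal sketch, the estimator formulas above translate directly into plain Python (the function name `fit_line` is ours, for illustration only):

```python
# Least-squares estimates of the intercept and slope, following the
# Sxx / Sxy formulas above. Plain Python, no external libraries.
def fit_line(x, y):
    n = len(x)
    sxx = sum(v * v for v in x) - sum(x) ** 2 / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    b1 = sxy / sxx                       # slope:     beta1-hat = Sxy / Sxx
    b0 = sum(y) / n - b1 * sum(x) / n    # intercept: beta0-hat = ybar - beta1-hat * xbar
    return b0, b1

b0, b1 = fit_line([0, 1, 2], [1, 3, 5])  # points lying exactly on y = 1 + 2x
```

For data lying exactly on a line, the fit recovers the line's intercept and slope; with noisy data it returns the least-squares estimates.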
 A chemical engineer is investigating the effect of process operating temperature on product yield. The study yields the following data:

Temp (xi)   Yield (yi)   xi²       yi²      xi·yi
100         45           10000     2025      4500
110         51           12100     2601      5610
120         54           14400     2916      6480
130         61           16900     3721      7930
140         66           19600     4356      9240
150         70           22500     4900     10500
160         74           25600     5476     11840
170         78           28900     6084     13260
180         85           32400     7225     15300
190         89           36100     7921     16910
sum         1450 / 673   218500    47225    101570
$$\bar{X} = \frac{\sum x_i}{n} = \frac{1450}{10} = 145, \qquad \bar{Y} = \frac{\sum y_i}{n} = \frac{673}{10} = 67.3$$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 218500 - \frac{1450^2}{10} = 8250$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 101570 - \frac{1450 \times 673}{10} = 3985$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{3985}{8250} = 0.483, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 67.3 - 0.483(145) = -2.74$$

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = -2.74 + 0.483x$$
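The worked example can be checked numerically with a short script (a sketch reproducing the table's sums; variable names are ours):

```python
# Temperature/yield data from the table above.
x = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
y = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n                 # 8250.0
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n  # 3985.0
b1 = sxy / sxx                      # slope, about 0.483
b0 = sum(y) / n - b1 * sum(x) / n   # intercept, about -2.74
```

This reproduces Sxx = 8250, Sxy = 3985, and the fitted line ŷ ≈ −2.74 + 0.483x.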


 It is very important that the fitted regression model be assessed for its adequacy.
 Testing statistical hypotheses about the model parameters (the slope and intercept of the regression line) provides this assessment.
 There are two hypotheses of interest in assessing the adequacy of the model: one on the slope and intercept of the simple linear regression, and the other on the significance of regression.
 H0 : 1= 10=𝛽መ1
If 10 is not given it can be taken as 𝛽መ1
 H1 : 1 10

Follows t distribution with (n-2)


The test statistic: degrees of freedom (dof ) would
𝛽መ1 − 𝛽10 be rejected if to > t, (n-2)
𝑡0 =
𝑀𝑆𝐸 Τ𝑆𝑥𝑥
$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum x_i\right)^2}{n} = 8250, \qquad S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum x_i\right)\left(\sum y_i\right)}{n} = 3985$$

$$S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum y_i\right)^2}{n} = 47225 - \frac{673^2}{10} = 1932.1$$

$$SSE = S_{yy} - \hat{\beta}_1 S_{xy} = 1932.1 - (0.48303)(3985) = 7.22$$

$$MSE = \frac{SSE}{n-2} = \frac{7.22}{8} = 0.90$$

$$t_0 = \frac{\hat{\beta}_1 - \beta_{1,0}}{\sqrt{MSE/S_{xx}}} = 0 \quad \text{against} \quad t_{0.1,\,8} = 1.40$$

Since β1,0 was taken as $\hat{\beta}_1$, t0 = 0 and H0 is not rejected.
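Continuing with the same data, the slope test can be sketched end-to-end. Note that SSE is a small difference of large quantities, so the unrounded slope must be used: rounding β̂1 to 0.48 before computing SSE inflates it considerably (the exact value is near 7.2).

```python
import math

x = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
y = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n   # 1932.1

b1 = sxy / sxx                  # unrounded slope
sse = syy - b1 * sxy            # about 7.22 (not 19.3, which comes from rounding b1)
mse = sse / (n - 2)

beta10 = b1   # hypothesized slope; taken as beta1-hat here, as the slides do
t0 = (b1 - beta10) / math.sqrt(mse / sxx)   # 0.0 by construction
```

With beta10 set to the estimate itself, t0 is zero and H0 is trivially not rejected; a more informative test plugs in an externally hypothesized slope.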
 H0 : 0 = 00 =𝛽መ0
 H1 : 0  00 𝛽መ0 = 𝑦ത − 𝛽መ1 𝑥ҧ
m 𝑛
σ𝑛1 𝑥𝑖 2
The test statistic: 𝑆𝑥𝑥 = ෍ 𝑥𝑖2 −
𝑛
𝑖=1
𝛽መ0 − 𝛽00
𝑡0 =
1 𝑥ҧ 2 𝑆𝑆𝐸 = 𝑆𝑦𝑦 − 𝛽መ1 𝑆𝑥𝑦
𝑀𝑆𝐸 +
𝑛 𝑆𝑥𝑥
𝑛 𝑛
σ𝑛1 𝑦𝑖 2 σ𝑛𝑖 𝑥𝑖 σ𝑛𝑖 𝑦𝑖
𝑆𝑦𝑦 = ෍ 𝑦𝑖2 − 𝑆𝑥𝑦 = ෍ 𝑥𝑖 𝑦𝑖 −
𝑛 𝑛
𝑖=1 𝑖=1

𝑆𝑆𝐸
𝑀𝑆𝐸 =
𝑛−2
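The intercept test can be sketched the same way on the example data. The hypothesized intercept `beta00 = 0` below is an assumed value chosen purely for illustration (the slides do not specify one):

```python
import math

x = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
y = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]
n = len(x)
xbar = sum(x) / n

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
b1 = sxy / sxx
b0 = sum(y) / n - b1 * xbar

sse = syy - b1 * sxy
mse = sse / (n - 2)

beta00 = 0.0   # assumed hypothesized intercept, for illustration
t0 = (b0 - beta00) / math.sqrt(mse * (1 / n + xbar ** 2 / sxx))
```

Here t0 ≈ −1.77; whether that rejects H0 depends on the chosen α and the t table value for 8 dof.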
 The second hypothesis relates to the significance of regression
 H0 : β1 = 0 (no linear relationship between x and y)
 H1 : β1 ≠ 0
 Failing to reject H0 is equivalent to concluding that there is no linear relationship between x and y.
 The significance of the linear regression model can be tested by ANOVA, partitioning the total variability into the part explained by the regression line and the residual variation:

$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 = SSR + SSE$$

 It is like one-way ANOVA, with SST as Syy and SSA as SSR, the regression sum of squares (like the between-samples variation)
 The regression has two parameters, and hence the dof becomes 2 − 1 = 1
 Therefore, as per the ANOVA procedure, SSE = Syy − SSR
 H0 : 1= 0
 H1 : 1 0

𝑛
𝑆𝑆𝑅 = 𝛽መ1 𝑆𝑥𝑦 𝑆𝑥𝑦 σ𝑛𝑖 𝑥𝑖 σ𝑛𝑖 𝑦𝑖

𝛽1 = 𝑆𝑥𝑦 = ෍ 𝑥𝑖 𝑦𝑖 −
𝑆𝑥𝑥 𝑛
𝑖=1

𝑛 𝑛 𝑛 2
σ𝑛1 𝑥𝑖 2 2
σ1 𝑦𝑖
𝑆𝑥𝑥 = ෍ 𝑥𝑖2 − 𝑆𝑦𝑦 = ෍ 𝑦𝑖 −
𝑛 𝑛
𝑖=1 𝑖=1

𝑆𝑆𝐸 = 𝑆𝑦𝑦 − 𝑆𝑆𝑅


ANOVA Summary

Source of variation   Sum of squares   dof   MS    F0
1. Regression         SSR              1     MSR   F0 = MSR/MSE
2. Error              SSE              n-2   MSE   -
3. Total              SST, i.e., Syy   n-1   -     -

 Find F,1,n-2 from table


 Compare F0 computed with critical obtained from table F
 If (F0 >F) reject the null hypothesis saying that factor levels, i.e.,
regression at various levels of x does not have influence is
rejected
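The significance-of-regression ANOVA can be sketched on the example data (variable names are ours):

```python
x = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190]
y = [45, 51, 54, 61, 66, 70, 74, 78, 85, 89]
n = len(x)

sxx = sum(v * v for v in x) - sum(x) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
b1 = sxy / sxx

ssr = b1 * sxy                      # regression sum of squares, 1 dof
sse = syy - ssr                     # error sum of squares, n-2 dof
f0 = (ssr / 1) / (sse / (n - 2))    # F statistic = MSR / MSE
```

Here F0 ≈ 2132, far above the table value F0.05,1,8 = 5.32, so the regression is highly significant for this data.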

 Estimation of the model parameters requires the errors or residuals to follow NID(0, σ²)
 Generate the errors or residuals
 Apply a goodness-of-fit test for the normal distribution with mean 0 and S² = σ²
 Goodness of fit of a regression model
 H0: the model adequately fits the data
 H1: the model does not fit the data
 Partition the error or residuals into: pure error (PE) within the replicates, and lack of fit (LOF)
 The situation is similar to one-way ANOVA (here m is the number of distinct x levels and n the number of replicates at each level)
 SSE is similar to SST
 SSPE is similar to the within-samples variation, with dof of m(n−1)
 SSLOF is similar to the between-samples variation
 SSE = SSPE + SSLOF
A case

ANOVA Summary

Source of variation   Sum of squares                  dof      MS      F0
LOF                   SSLOF = SSE − SSPE              m-2      MSLOF   F0 = MSLOF/MSPE
Pure Error            SSPE = Σᵢ Σᵤ (y_iu − ȳᵢ)²       m(n-1)   MSPE    -
Total                 SSE = Syy − β̂1·Sxy              mn-2     -       -

 Find F,1,n-2 from table


 Compare F0 computed with critical obtained from table F
 If (F0 >F) reject the null hypothesis saying that the model does
not adequately fit to the data.
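The lack-of-fit computation can be sketched as below. The yield example has no replicates, so the data here is hypothetical, invented purely for illustration: m = 3 x-levels with n = 2 replicates each.

```python
# Hypothetical replicated data: {x level: [replicate responses]}
data = {100: [45, 47], 150: [70, 68], 190: [89, 91]}

xs = [xv for xv, reps in data.items() for _ in reps]
ys = [yv for reps in data.values() for yv in reps]
N = len(xs)   # total number of observations (here m*n = 6)

# Least-squares fit on all N points
sxx = sum(v * v for v in xs) - sum(xs) ** 2 / N
sxy = sum(a * b for a, b in zip(xs, ys)) - sum(xs) * sum(ys) / N
syy = sum(v * v for v in ys) - sum(ys) ** 2 / N
b1 = sxy / sxx
sse = syy - b1 * sxy                      # total residual variation

# Pure error: variation of replicates about their own level means
sspe = sum((v - sum(reps) / len(reps)) ** 2
           for reps in data.values() for v in reps)
sslof = sse - sspe                        # lack of fit = what the line misses

m = len(data)                             # number of distinct x levels
df_lof, df_pe = m - 2, N - m              # here 1 and 3
f0 = (sslof / df_lof) / (sspe / df_pe)    # F0 = MSLOF / MSPE
```

A large F0 relative to the table value Fα,m-2,m(n-1) would indicate that the straight-line model does not adequately fit the data.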
