Simple Linear Regression Model
The simple linear regression model is a
model that describes a linear relationship
between two variables.
The independent variable (regressor,
predictor) is denoted by X.
The dependent (response) variable is
denoted by Y.
Examples:
1. Y: Blood pressure; X: Age
2. Y: Yield of a chemical process; X: Reaction temperature
Assuming a linear relationship, the regression model is written as

Y = β0 + β1 X
β0: Value of Y for X = 0 (Y-intercept)
β1: Change in Y due to a change of one unit in X (slope)
The above model describes an exact
relationship between X and Y.
In practical situations, the relationship
between X and Y is not exact.
The appropriate model is written as

Y = β0 + β1 X + ε

Y: Dependent variable
X: Independent (predictor) variable
ε: Random error term

The regression coefficients β0 and β1 are unknown parameters.
The Pearson correlation coefficient is

r = S_xy / (√S_xx √S_yy)

where S_xx = Σ (x_i − x̄)², S_yy = Σ (y_i − ȳ)², and S_xy = Σ (x_i − x̄)(y_i − ȳ), with all sums over i = 1, …, n.
In computational form,

S_xy = Σ x_i y_i − (Σ x_i)(Σ y_i)/n
S_xx = Σ x_i² − (Σ x_i)²/n
S_yy = Σ y_i² − (Σ y_i)²/n
The inequality −1 ≤ r ≤ 1 indicates that r cannot lie outside the range −1 to 1.
Properties of Pearson's Correlation
1. The value of r falls between −1 and +1.
2. A positive value of r indicates that as one variable increases, the other variable increases. A negative value of r indicates that as one variable increases, the other variable decreases. If r = 0, there is no linear relationship between the two variables.
3. The magnitude of r indicates the strength of the association between the two variables. As r gets closer to either −1 or +1, the association becomes stronger.
4. r measures only the linear relationship between X and Y.
[Scatterplot of Yield vs. Temperature: Yield (120–220) on the vertical axis against Temperature (50–100) on the horizontal axis]
Observation   X (Temp)   Y (Yield)      X²       Y²       XY
     1            50        122        2500    14884     6100
     2            53        118        2809    13924     6254
     3            54        128        2916    16384     6912
     4            55        121        3025    14641     6655
     5            56        125        3136    15625     7000
     6            59        136        3481    18496     8024
     7            62        144        3844    20736     8928
     8            65        142        4225    20164     9230
     9            67        149        4489    22201     9983
    10            71        161        5041    25921    11431
    11            72        167        5184    27889    12024
    12            74        168        5476    28224    12432
    13            75        162        5625    26244    12150
    14            76        171        5776    29241    12996
    15            79        175        6241    30625    13825
    16            80        182        6400    33124    14560
    17            82        180        6724    32400    14760
    18            85        183        7225    33489    15555
    19            87        188        7569    35344    16356
    20            90        200        8100    40000    18000
    21            93        194        8649    37636    18042
    22            94        206        8836    42436    19364
    23            95        207        9025    42849    19665
    24            97        210        9409    44100    20370
    25           100        219       10000    47961    21900
Totals          1871       4158      145705   714538   322516
Σ x_i = 1871,  Σ x_i² = 145705,  x̄ = 74.84
Σ y_i = 4158,  Σ y_i² = 714538,  ȳ = 166.32
Σ x_i y_i = 322516

(all sums over i = 1, …, 25)
S_xy = Σ x_i y_i − (Σ x_i)(Σ y_i)/n = 322516 − (1871)(4158)/25 = 11331.28
S_xx = Σ x_i² − (Σ x_i)²/n = 145705 − (1871)²/25 = 5679.36
S_yy = Σ y_i² − (Σ y_i)²/n = 714538 − (4158)²/25 = 22979.44
r = S_xy / (√S_xx √S_yy) = 11331.28 / (√5679.36 √22979.44) = 0.99
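The correlation computation above can be verified directly from the raw data; a minimal Python sketch (variable names are illustrative):

```python
# Temperature (x) and yield (y) data from the table above (n = 25).
x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79,
     80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]
n = len(x)

# Computational forms of the sums of squares and cross-products.
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n

# Pearson correlation coefficient.
r = sxy / (sxx ** 0.5 * syy ** 0.5)
print(round(sxy, 2), round(sxx, 2), round(syy, 2), round(r, 2))
# prints: 11331.28 5679.36 22979.44 0.99
```

The three sums of squares match the hand computations above, and r rounds to 0.99.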
The Model
Y_i = β0 + β1 x_i + ε_i,  i = 1, …, n
Assumptions
1. The ε1, ε2, …, εn are independent.
2. Var(ε_i) = σ², i = 1, …, n (constant variance)
Ŷ = β̂0 + β̂1 X

where Ŷ is the fitted (predicted) value; the above model is called the estimated model.
Least squares estimates of β0 and β1

The least squares estimates minimize the sum of squared deviations

Q(β0, β1) = Σ (Y_i − β0 − β1 x_i)²  (sum over i = 1, …, n)

Setting the partial derivatives of Q to zero gives

n β̂0 + β̂1 Σ x_i = Σ Y_i
β̂0 Σ x_i + β̂1 Σ x_i² = Σ x_i Y_i
These equations are called the normal equations. Their solution is

β̂1 = S_xy / S_xx,   β̂0 = ȳ − β̂1 x̄

where

S_xy = Σ x_i y_i − (Σ x_i)(Σ y_i)/n
S_xx = Σ x_i² − (Σ x_i)²/n
S_yy = Σ y_i² − (Σ y_i)²/n
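As a sketch, these estimates can be computed in a few lines of Python (the function name is illustrative):

```python
def least_squares(x, y):
    """Return (b0, b1): the least squares intercept and slope."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    sxx = sum(a * a for a in x) - sum(x) ** 2 / n
    b1 = sxy / sxx            # slope estimate
    b0 = ybar - b1 * xbar     # intercept estimate
    return b0, b1

# Applied to the temperature-yield data from the example:
x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79,
     80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))
# prints: 17.0016 1.9952
```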
Fitted values

ŷ_i = β̂0 + β̂1 x_i,  i = 1, …, n
Residuals
e_i = y_i − ŷ_i,  i = 1, …, n
(actual value – fitted value)
Error sum of squares: SSE = Σ (y_i − ŷ_i)²  (sum over i = 1, …, n)

σ̂² = s² = SSE / (n − 2)

Equivalently,

s² = (S_yy − β̂1 S_xy) / (n − 2)
The standard error of β̂1 is

S.E.(β̂1) = σ̂ / √S_xx = s / √S_xx

where

s² = Σ (y_i − ŷ_i)² / (n − 2) = MSE

is the estimate of σ².
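A numerical sketch of SSE, s², and S.E.(β̂1), using the summary statistics already computed for the temperature–yield example:

```python
# Summary statistics for the temperature-yield data (from the text).
n = 25
sxy, sxx, syy = 11331.28, 5679.36, 22979.44

b1 = sxy / sxx              # slope estimate
sse = syy - b1 * sxy        # error sum of squares (shortcut formula)
s2 = sse / (n - 2)          # estimate of sigma^2 (MSE)
s = s2 ** 0.5
se_b1 = s / sxx ** 0.5      # standard error of the slope
print(round(sse, 1), round(s2, 2), round(se_b1, 5))
# prints: 371.6 16.16 0.05334
```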
Assuming normality,

t = (β̂1 − β1) / (s / √S_xx)

has a t distribution with d.f. = n − 2.
Result: A 100(1 − α)% confidence interval for β1 is

β̂1 ± t_{α/2, n−2} · s / √S_xx
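A numerical sketch of this interval for the example data; the critical value t_{0.025,23} = 2.069 is taken from a t table rather than computed:

```python
# 95% confidence interval for the slope, temperature-yield example.
n = 25
sxy, sxx, syy = 11331.28, 5679.36, 22979.44
t_crit = 2.069                       # t_{0.025, 23}, from the t table

b1 = sxy / sxx
s = ((syy - b1 * sxy) / (n - 2)) ** 0.5
se_b1 = s / sxx ** 0.5
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(round(lo, 2), round(hi, 2))
# prints: 1.88 2.11
```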
The standard error of β̂0 is

S.E.(β̂0) = s √(1/n + x̄²/S_xx)
Assuming normality,

t = (β̂0 − β0) / S.E.(β̂0)

has a t distribution with d.f. = n − 2.
Result:

E(s²) = E( Σ (y_i − ŷ_i)² / (n − 2) ) = σ²
Testing Hypotheses

Test
H0: β1 = 0
H1: β1 ≠ 0

Test statistic
T = β̂1 / S.E.(β̂1)
Test
H0: β0 = 0
H1: β0 ≠ 0

Test statistic
T = β̂0 / S.E.(β̂0)
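Both test statistics can be sketched numerically for the example data, with the standard errors derived above:

```python
# t statistics for H0: beta1 = 0 and H0: beta0 = 0 (temperature-yield data).
n = 25
xbar, ybar = 74.84, 166.32
sxy, sxx, syy = 11331.28, 5679.36, 22979.44

b1 = sxy / sxx
b0 = ybar - b1 * xbar
s = ((syy - b1 * sxy) / (n - 2)) ** 0.5
se_b1 = s / sxx ** 0.5
se_b0 = s * (1 / n + xbar ** 2 / sxx) ** 0.5

t1 = b1 / se_b1    # test statistic for the slope
t0 = b0 / se_b0    # test statistic for the intercept
print(round(t1, 2), round(t0, 2))
# prints: 37.41 4.18
```

Both statistics far exceed t_{0.025,23} = 2.069, so both coefficients are significantly different from zero.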
Coefficient of determination:

R² = 1 − SSE/SST,   0 ≤ R² ≤ 1

where SST = S_yy is the total sum of squares. R² is the proportion of variation in the response data that is explained by the model.
Prediction:

ŷ(x0) = β̂0 + β̂1 x0

(the estimated or predicted value of y at x0)
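For instance, predicting the yield at a temperature of x0 = 85 (an arbitrary illustrative value) with the fitted coefficients from the example:

```python
# Point prediction from the fitted line: yhat = b0 + b1 * x0.
n, xbar, ybar = 25, 74.84, 166.32
sxy, sxx = 11331.28, 5679.36

b1 = sxy / sxx
b0 = ybar - b1 * xbar

def predict(x0):
    """Predicted response at x0."""
    return b0 + b1 * x0

print(round(predict(85), 1))
# prints: 186.6
```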
Fitting the regression model to the same temperature–yield data, using the summary statistics S_xy = 11331.28, S_xx = 5679.36, S_yy = 22979.44, x̄ = 74.84, and ȳ = 166.32 computed above:
β̂1 = S_xy / S_xx = 11331.28 / 5679.36 = 1.9952

β̂0 = ȳ − β̂1 x̄ = 166.32 − 1.99517(74.84) = 17.00

The estimate of σ² is

s² = (S_yy − β̂1 S_xy) / (n − 2)

s² = (22979.44 − 1.99517(11331.28)) / 23 = 371.6 / 23 = 16.16

s = 4.0197
Standard Errors

S.E.(β̂1) = s / √S_xx = 4.0197 / √5679.36 = 0.05334

S.E.(β̂0) = s √(1/n + x̄²/S_xx) = 4.0197 √(1/25 + (74.84)²/5679.36) = 4.072
A 95% C.I. for β1 is given by

1.9951 ± t_{0.025,23}(0.05334) = 1.9951 ± 2.069(0.05334) = (1.88, 2.11)
Test
H0: β1 = 0
H1: β1 ≠ 0

t = β̂1 / S.E.(β̂1) = 1.9951 / 0.05334 = 37.41

Since |t| = 37.41 > t_{0.025,23} = 2.069, we reject H0: the slope is significantly different from zero.
SSE = Σ (y_i − ŷ_i)² = 371.6

Coefficient of determination:

R² = 1 − SSE/SST = 1 − 371.6/22979.44 = 0.984 = 98.4%
Regression Statistics
Multiple R          0.991880955
R Square            0.983827829
Adjusted R Square   0.983124692
Standard Error      4.019665911
Observations        25

            Coefficients    Standard Error   t Stat      P-value
Intercept   17.00159173     4.071997         4.175246    0.000363789
X Temp      1.99516847      0.053338         37.40583    4.173E-22
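The hand computations can be cross-checked with a library fit; a sketch assuming NumPy is available (`numpy.polyfit` with degree 1 returns the slope and intercept of the least squares line):

```python
import numpy as np

x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79,
     80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]

# Degree-1 polynomial fit: returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(round(float(slope), 4), round(float(intercept), 4))
# prints: 1.9952 17.0016
```

These agree with the coefficients in the regression output above.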
Checking linearity

1) Scatterplot of X vs. Y
2) Residual plot: after fitting the linear regression model, plot the residuals against the x-values.
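The residuals for the example data can be computed directly; in an adequate linear fit they should scatter around zero with no pattern when plotted against x (the plot itself is omitted here, but any plotting library could display `residuals` against `x`):

```python
# Residuals e_i = y_i - (b0 + b1 * x_i) for the temperature-yield data.
x = [50, 53, 54, 55, 56, 59, 62, 65, 67, 71, 72, 74, 75, 76, 79,
     80, 82, 85, 87, 90, 93, 94, 95, 97, 100]
y = [122, 118, 128, 121, 125, 136, 144, 142, 149, 161, 167, 168, 162,
     171, 175, 182, 180, 183, 188, 200, 194, 206, 207, 210, 219]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
b1 = sxy / sxx
b0 = ybar - b1 * xbar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Least squares guarantees the residuals sum to (numerically) zero.
print(abs(sum(residuals)) < 1e-9)
# prints: True
```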