
LINEAR REGRESSION

 Linear vs Non-linear regression


 A linear regression model is a mathematical equation that describes a straight-line relationship between two or more variables.
 A non-linear regression model, on the other hand, is a mathematical equation that describes a non-linear relationship between two or more variables, e.g. parabolic or cubic.
 A simple linear regression model includes only two variables: one independent variable and one dependent variable. With more than one independent variable, the model is a multiple linear regression.

SIMPLE LINEAR REGRESSION
 Simple Linear Regression Model
 The equation of a simple linear regression model between two variables $x$ and $y$ is written as
$y = a + bx$
 where $x$ is the independent variable and $y$ is the dependent variable
 $a$ gives the $y$-intercept and $b$ represents the slope of the line.
 This model describes an exact relationship between $x$ and $y$, which real data rarely follow. The complete simple linear regression model therefore includes an error term:
$y = a + bx + e$
 where $e$ is the random error.
 After the simple linear regression model has been estimated, the computed values of the dependent variable $\hat{y}$ can be compared with the observed values $y$, so that for each data point
$e = y - \hat{y} = y - a - bx$
 Limitations of Simple Linear Regression Model
 The value of the dependent variable cannot be reliably estimated when the value of the independent variable lies outside the range of the data on which the regression is based (and vice versa). If x ranges from, say, 200 to 400, you should not predict y for x = 1500 or x = 2000, i.e. any value outside that range.
 The analysis is confined to normally distributed data
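
The range limitation above can be made explicit in code. The following is a minimal Python sketch, not part of the original slides: the helper name `predict_within_range` and the coefficients `A` and `B` are hypothetical and chosen only for illustration.

```python
def predict_within_range(x_new, a, b, x_min, x_max):
    """Return a + b * x_new, but only when x_new lies inside [x_min, x_max]."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the fitted range [{x_min}, {x_max}]"
        )
    return a + b * x_new


# Made-up coefficients for a line fitted on data with 200 <= x <= 400.
A, B = 5.0, 0.1

inside = predict_within_range(300, A, B, 200, 400)  # allowed: inside the range

try:
    predict_within_range(1500, A, B, 200, 400)  # refused: extrapolation
    refused = False
except ValueError:
    refused = True

print(inside, refused)
```

Raising an error is only one possible design choice; a warning may be preferable when mild extrapolation is acceptable.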

 Methods of Simple Linear Regression


 There are two methods of simple linear regression
 Least-squares method
 Coefficient of correlation method
METHODS OF SIMPLE LINEAR REGRESSION
 Simple Linear regression by least-squares method
 The line that best fits the scatter of points is obtained by minimizing the sum of squares of errors, denoted by $SSE$:
$SSE = \sum e^2 = \sum (y - \hat{y})^2 = \text{minimum}$

 The aim is to estimate the slope and the y-intercept of the best-fitting line.
 The method also relies on the fact that the best-fitting line passes through the point of means $(\bar{x}, \bar{y})$ of the data.
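
To make the least-squares criterion concrete, the sketch below computes SSE for two candidate lines on the small dataset used in the first worked example of these notes (x = 0, 2, 4, 6 and y = 11, 16, 19, 26). The helper name `sse` and the hand-picked comparison line $\hat{y} = 11 + 2.5x$ are assumptions made for illustration; the line $\hat{y} = 10.8 + 2.4x$ is the least-squares result of that example.

```python
def sse(xs, ys, a, b):
    """Sum of squared errors of the line y_hat = a + b * x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


xs = [0, 2, 4, 6]
ys = [11, 16, 19, 26]

# A hand-picked line that passes exactly through three of the four points.
sse_candidate = sse(xs, ys, 11.0, 2.5)

# The least-squares line from the worked example.
sse_least_squares = sse(xs, ys, 10.8, 2.4)

print(sse_candidate, sse_least_squares)
```

Even though the hand-picked line hits three points exactly, its SSE (4.0) is larger than that of the least-squares line (about 2.8), which is what "best fit" means under this criterion.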

 First compute the slope $b$ using the formula
$b = \frac{S_{xy}}{S_{xx}}$
where
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$

 We then compute the means of the independent ($x$) and dependent ($y$) variables using
$\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$
 Since the regression line for a set of $n$ data points must pass through the means of $x$ and $y$, the intercept can be estimated from
$\bar{y} = a + b\bar{x}$
 where $\bar{y}$ and $\bar{x}$ are the means of $y$ and $x$ respectively. The $y$-intercept $a$ is therefore given by
$a = \bar{y} - b\bar{x}$
 There are errors since we are simply taking a straight line and forcing it to fit into the given
data in the best possible way.
 We then have to estimate the standard deviation $\sigma_e$, which measures the spread of the errors around the regression line and is calculated using:
$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n-2}}$ where $S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$
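
The steps of the least-squares method (slope, means, intercept, standard deviation of the errors) can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the function name `fit_least_squares` is hypothetical, and the sample data are those of the worked example in these notes.

```python
import math


def fit_least_squares(xs, ys):
    """Return (a, b, sigma_e) for the line y = a + b * x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)

    # Sums of squares and cross-products, as defined in the text.
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n

    b = s_xy / s_xx                    # slope
    a = sum_y / n - b * sum_x / n      # intercept from y_bar = a + b * x_bar

    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    sigma_e = math.sqrt(max(s_yy - b * s_xy, 0.0) / (n - 2))
    return a, b, sigma_e


a, b, sigma_e = fit_least_squares([0, 2, 4, 6], [11, 16, 19, 26])
print(round(a, 4), round(b, 4), round(sigma_e, 4))
```

The function needs at least three data points (because of the $n - 2$ divisor) and at least two distinct x values (so that $S_{xx}$ is not zero).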
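Example
Fit the following data using simple linear regression by the least-squares method
x 0 2 4 6
y 11 16 19 26

| n | x | y | xy | $x^2$ | $y^2$ |
|---|---|---|---|---|---|
| 1 | 0 | 11 | 0 | 0 | 121 |
| 2 | 2 | 16 | 32 | 4 | 256 |
| 3 | 4 | 19 | 76 | 16 | 361 |
| 4 | 6 | 26 | 156 | 36 | 676 |
| $\sum$ | 12 | 72 | 264 | 56 | 1,414 |

First compute the slope $b$ using the formula
$b = \frac{S_{xy}}{S_{xx}}$
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 264 - \frac{12(72)}{4} = 264 - 216 = 48$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 56 - \frac{12^2}{4} = 56 - 36 = 20$
Therefore
$b = \frac{S_{xy}}{S_{xx}} = \frac{48}{20} = 2.4$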
We then compute the means of $x$ and $y$ using
$\bar{x} = \frac{\sum x}{n} = \frac{12}{4} = 3$ and $\bar{y} = \frac{\sum y}{n} = \frac{72}{4} = 18$
But $b = 2.4$, therefore
$a = \bar{y} - b\bar{x} = 18 - 2.4(3) = 18 - 7.2 = 10.8$
Hence
$\hat{y} = 10.8 + 2.4x$
We can then estimate the standard deviation $\sigma_e$, which measures the spread of the errors around the regression line, using:
$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n-2}}$
where
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 1414 - \frac{72^2}{4} = 1414 - 1296 = 118$
Therefore
$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n-2}} = \sqrt{\frac{118 - 2.4(48)}{2}} = \sqrt{1.4} = \pm 1.183$
Example
An experiment gives the relationship between force (N) and velocity (m/s) for a
suspended object in a wind tunnel as

Velocity (m/s) 10 20 30 40 50 60 70 80
Force (N) 24 68 378 552 608 1218 831 1452

i. Use the linear least squares regression method to determine the coefficients a and b
in the function 𝑦 = 𝑎 + 𝑏𝑥 that best fits the data
ii. Estimate the force when the velocity is 55 m/s
iii. Determine the standard error $\sigma_e$ for the data
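| n | x | y | xy | $x^2$ | $y^2$ |
|---|---|---|---|---|---|
| 1 | 10 | 24 | 240 | 100 | 576 |
| 2 | 20 | 68 | 1,360 | 400 | 4,624 |
| 3 | 30 | 378 | 11,340 | 900 | 142,884 |
| 4 | 40 | 552 | 22,080 | 1,600 | 304,704 |
| 5 | 50 | 608 | 30,400 | 2,500 | 369,664 |
| 6 | 60 | 1,218 | 73,080 | 3,600 | 1,483,524 |
| 7 | 70 | 831 | 58,170 | 4,900 | 690,561 |
| 8 | 80 | 1,452 | 116,160 | 6,400 | 2,108,304 |
| $\sum$ | 360 | 5,131 | 312,830 | 20,400 | 5,104,841 |

First compute the slope $b$ using the formula
$b = \frac{S_{xy}}{S_{xx}}$
where
$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 312830 - \frac{360(5131)}{8} = 312830 - 230895 = 81935$
$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 20400 - \frac{360^2}{8} = 20400 - 16200 = 4200$
Therefore
$b = \frac{S_{xy}}{S_{xx}} = \frac{81935}{4200} = 19.51$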
We then compute the means of $x$ and $y$ using
$\bar{x} = \frac{\sum x}{n} = \frac{360}{8} = 45$ m/s and $\bar{y} = \frac{\sum y}{n} = \frac{5131}{8} = 641.375$ N
But $b = 19.5083$ (19.51 to two decimal places), therefore
$a = \bar{y} - b\bar{x} = 641.375 - 19.5083(45) = -236.5$
Hence
$y = -236.5 + 19.51x$
When the velocity is 55 m/s
$y = -236.5 + 19.5083(55) = -236.5 + 1072.96 = 836.46 \approx 836$ N
We can then estimate the standard deviation $\sigma_e$, which measures the spread of the errors around the regression line, using:
$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n-2}}$
where
$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 5104841 - \frac{5131^2}{8} = 1813945.875$
Therefore
$\sigma_e = \sqrt{\frac{S_{yy} - bS_{xy}}{n-2}} = \sqrt{\frac{1813945.875 - 19.5083(81935)}{6}} = \pm 189.5316$
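
As a cross-check on the wind-tunnel example, the sketch below recomputes the fit and compares two routes to the sum of squared errors: summing the squared residuals directly, and the shortcut $S_{yy} - bS_{xy}$ used in the formula for $\sigma_e$. The variable names are assumptions for illustration. The deviation forms $\sum (x - \bar{x})(y - \bar{y})$ and $\sum (x - \bar{x})^2$ used here are algebraically equal to $S_{xy}$ and $S_{xx}$.

```python
import math

velocity = [10, 20, 30, 40, 50, 60, 70, 80]        # m/s
force = [24, 68, 378, 552, 608, 1218, 831, 1452]   # N

n = len(velocity)
x_bar = sum(velocity) / n
y_bar = sum(force) / n

# Deviation forms of S_xy, S_xx and S_yy.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(velocity, force))
s_xx = sum((x - x_bar) ** 2 for x in velocity)
s_yy = sum((y - y_bar) ** 2 for y in force)

b = s_xy / s_xx
a = y_bar - b * x_bar

# Route 1: sum the squared residuals directly.
sse_direct = sum((y - (a + b * x)) ** 2 for x, y in zip(velocity, force))
# Route 2: the shortcut used in the sigma_e formula.
sse_shortcut = s_yy - b * s_xy

sigma_e = math.sqrt(sse_direct / (n - 2))
force_at_55 = a + b * 55

print(round(a, 3), round(b, 4), round(force_at_55, 2), round(sigma_e, 2))
```

With the unrounded slope the standard deviation comes out at about 189.53; the small difference from the hand calculation arises from rounding $b$ to 19.5083 before multiplying.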
 Simple linear regression by coefficient of correlation method
 A correlation exists between two variables when one of them is related to the other in some way
 The linear correlation coefficient gives a more precise and objective measure of the correlation between two variables. It varies between -1 and +1 and measures how well one variable can predict the other (given the context of the data), which determines the precision you can assign to a relationship.
 The line that best fits the scatter of points can also be determined from the coefficient of correlation through the following process:
 We first compute the means of the independent and dependent variables using
$\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$
 We then compute the standard deviations of the independent and dependent variables using
$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ and $\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$
 The coefficient of correlation ($\gamma$) is then computed using the formula
$\gamma = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \times \sigma_x \times \sigma_y}$
 The simple linear regression equation is then determined by reducing the formula
$y - \bar{y} = \gamma \frac{\sigma_y}{\sigma_x} (x - \bar{x})$
 into the format
$y = a + bx$
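
The reduction to $y = a + bx$ gives the same line as the least-squares method. A short derivation, offered as a supplementary sketch and assuming the definitions of $\sigma_x$, $\sigma_y$, $\gamma$, $S_{xy}$ and $S_{xx}$ used in these notes, together with the identities $\sum (x - \bar{x})(y - \bar{y}) = S_{xy}$ and $\sum (x - \bar{x})^2 = S_{xx}$:

```latex
\begin{aligned}
\gamma \frac{\sigma_y}{\sigma_x}
  &= \frac{\sum (x-\bar{x})(y-\bar{y})}{n\,\sigma_x\,\sigma_y}\cdot\frac{\sigma_y}{\sigma_x}
   = \frac{\sum (x-\bar{x})(y-\bar{y})}{n\,\sigma_x^{2}} \\
  &= \frac{\sum (x-\bar{x})(y-\bar{y})}{\sum (x-\bar{x})^{2}}
   = \frac{S_{xy}}{S_{xx}} = b, \\
a &= \bar{y} - b\,\bar{x}.
\end{aligned}
```

The slope is therefore $b = \gamma\,\sigma_y / \sigma_x$ and the intercept is $a = \bar{y} - b\bar{x}$, exactly as in the least-squares method.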
Previous Example
An experiment gives the relationship between force (N) and velocity (m/s) for a
suspended object in a wind tunnel as

Velocity (m/s) 10 20 30 40 50 60 70 80
Force (N) 24 68 378 552 608 1218 831 1452

Use the coefficient of correlation method to determine the coefficients a and b in the
function 𝑦 = 𝑎 + 𝑏𝑥 that best fits the data

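| n | x | y | $x - \bar{x}$ | $y - \bar{y}$ | $(x - \bar{x})^2$ | $(y - \bar{y})^2$ | $(x - \bar{x})(y - \bar{y})$ |
|---|---|---|---|---|---|---|---|
| 1 | 10 | 24 | -35 | -617.375 | 1225 | 381151.9 | 21608.13 |
| 2 | 20 | 68 | -25 | -573.375 | 625 | 328758.9 | 14334.38 |
| 3 | 30 | 378 | -15 | -263.375 | 225 | 69366.39 | 3950.625 |
| 4 | 40 | 552 | -5 | -89.375 | 25 | 7987.891 | 446.875 |
| 5 | 50 | 608 | 5 | -33.375 | 25 | 1113.891 | -166.875 |
| 6 | 60 | 1218 | 15 | 576.625 | 225 | 332496.4 | 8649.375 |
| 7 | 70 | 831 | 25 | 189.625 | 625 | 35957.64 | 4740.625 |
| 8 | 80 | 1452 | 35 | 810.625 | 1225 | 657112.9 | 28371.88 |
| $\sum$ | 360 | 5131 | 0 | 0 | 4200 | 1813946 | 81935 |

We first compute the means of the independent and dependent variables as
$\bar{x} = \frac{\sum x}{n} = \frac{360}{8} = 45$ m/s
and
$\bar{y} = \frac{\sum y}{n} = \frac{5131}{8} = 641.375$ N
We then compute the standard deviations of the variables as
$\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{4200}{8}} = 22.91288$
$\sigma_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{1813946}{8}} = 476.17565$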
The coefficient of correlation ($\gamma$) is then computed using the formula
$\gamma = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \times \sigma_x \times \sigma_y} = \frac{81935}{8 \times 22.912878 \times 476.175650} = 0.938713$
But
$y - \bar{y} = \gamma \frac{\sigma_y}{\sigma_x} (x - \bar{x})$
$y - 641.375 = 0.938713 \left(\frac{476.175650}{22.912878}\right)(x - 45)$
$y - 641.375 = 19.508343x - 877.875415$
$y = -236.500 + 19.508343x$
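
The coefficient of correlation method can be sketched in Python as follows, using the wind-tunnel data. The function name `fit_by_correlation` is hypothetical, and the code is an illustrative sketch of the steps described in these notes (means, standard deviations with divisor $n$, $\gamma$, then slope and intercept) rather than a definitive implementation.

```python
import math


def fit_by_correlation(xs, ys):
    """Return (gamma, a, b) for the line y = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Standard deviations with divisor n, as in the text.
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)

    # Coefficient of correlation.
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    gamma = cross / (n * sigma_x * sigma_y)

    # Reduce y - y_bar = gamma * (sigma_y / sigma_x) * (x - x_bar) to y = a + b * x.
    b = gamma * sigma_y / sigma_x
    a = y_bar - b * x_bar
    return gamma, a, b


velocity = [10, 20, 30, 40, 50, 60, 70, 80]        # m/s
force = [24, 68, 378, 552, 608, 1218, 831, 1452]   # N

gamma, a, b = fit_by_correlation(velocity, force)
print(round(gamma, 6), round(a, 3), round(b, 6))
```

Up to rounding, the coefficients agree with those obtained by the least-squares method for the same data.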
 Example 3:
Given that the means for 1000 samples of two variables are $\bar{x} = 64.25$ and $\bar{y} = 64.75$, while the standard deviations are $\sigma_x = 2.35$ and $\sigma_y = 2.25$, use the coefficient of correlation $\gamma = 0.875$ to determine the simple linear regression equation of the sample data. Determine $y$ when $x = 100$.
$y - \bar{y} = \gamma \frac{\sigma_y}{\sigma_x} (x - \bar{x})$
$y - 64.75 = 0.875 \left(\frac{2.25}{2.35}\right)(x - 64.25)$
$y - 64.75 = 0.837766x - 53.8265$
$y = 10.9235 + 0.837766x$
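
The example also asks for $y$ when $x = 100$. A minimal Python sketch of that last step, using only the summary statistics given in the problem (the variable names are assumptions for illustration):

```python
# Summary statistics given in the problem statement.
x_bar, y_bar = 64.25, 64.75
sigma_x, sigma_y = 2.35, 2.25
gamma = 0.875

# Slope and intercept from y - y_bar = gamma * (sigma_y / sigma_x) * (x - x_bar).
b = gamma * sigma_y / sigma_x
a = y_bar - b * x_bar

# Substitute x = 100 into y = a + b * x.
y_at_100 = a + b * 100

print(round(a, 4), round(b, 6), round(y_at_100, 2))
```

This gives $y \approx 94.70$. Since $x = 100$ lies far from the sample mean of 64.25 (with $\sigma_x = 2.35$), the limitation on predicting outside the range of the data suggests treating this value with caution.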
 Application of Simple linear regression in geospatial science
 The application of simple linear regression in geospatial science is best illustrated using a geographic information system (GIS).
 Clients are mostly interested in maps that rely on models to make predictions from actual observations.
 GIS can be used to investigate associations between variables such as:
 the distribution of mosquito species responsible for malaria transmission i.e. species distribution vs
prevalence
 temperature and relative humidity vs malaria prevalence
 NDVI vs rainfall patterns
 Crime vs uneducated youth
 Population vs poverty index
 geologic rock types vs groundwater recharge
