
QMS 105

Business Statistics
For SOB students
2024

Simple Linear Regression and Correlation

Topic 2
Correlation
Introduction

❑ Researchers are often interested in knowing what relationship exists, if any, between two or more variables.
❑ Correlation is a measure of the linear relationship between variables.
❑ Specifically, correlation measures the strength or degree of linear relationship between variables.
❑ Note: Correlation does not imply a causal relationship.
Correlation

Examples of Correlation Cases


❑ Correlation between quantity of food consumed and
weight gained by a person.
❑ Correlation between income and expenditure
❑ Correlation between sale of ice cream and
temperature
❑ Correlation between final grade score and test score
❑ Correlation between quantity demanded and price
for commodity X
❑ Correlation between production and factors of
production, inter alia.
Correlation
Forms of relationship between two variables
❑ Suppose we have two variables X and Y; the possible forms of relationship are as follows:

i. Positive or direct linear relationship
ii. Negative or indirect (inverse) linear relationship
iii. Zero or no linear relationship
iv. Non-linear relationship
Correlation
Positive or direct relationship
❑ X and Y are said to have a positive or direct linear relationship when Y increases as X increases and decreases as X decreases.
Correlation
Negative or Indirect or Inverse relationship
❑ X and Y are said to have a negative (indirect or inverse) linear relationship when Y decreases as X increases and increases as X decreases.
Correlation
Zero or no relationship
❑ X and Y are said to have zero or no relationship when changes in X (either increases or decreases) do not determine changes in Y.
Correlation
Non-linear relationship
❑ X and Y are said to have a non-linear relationship when changes in X (either increases or decreases) do not correspond to a constant change in Y.
Measurement of correlation
❑ Correlation can be computed using Karl Pearson's coefficient of correlation, which is given by:
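r = [ n ΣXY - (ΣX)(ΣY) ] / √{ [ n ΣX² - (ΣX)² ] [ n ΣY² - (ΣY)² ] }

where n is the number of paired observations on X and Y.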

Measurement of correlation
❑ Alternatively, the correlation coefficient is given by:
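In deviation form, the same coefficient can also be written as:

r = Σ ( X - X̄ )( Y - Ȳ ) / √{ Σ ( X - X̄ )² Σ ( Y - Ȳ )² }

where X̄ and Ȳ are the sample means of X and Y.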

Measurement of correlation
Properties of r

❑ r is the coefficient of correlation.
❑ The range of r is such that -1 ≤ r ≤ 1.
❑ When the value of r is positive, it indicates a positive relationship between the variables.
❑ When the value of r is negative, it indicates a negative relationship between the variables.
Measurement of correlation
Properties of r (Continued)

❑ The magnitude of r, that is |r|, expresses the strength (degree) of linear association.

No.  Range of |r|      Interpretation
1    |r| = 0           No linear relationship
2    0 < |r| ≤ 0.2     Very weak linear relationship
3    0.2 < |r| ≤ 0.4   Weak linear relationship
4    0.4 < |r| ≤ 0.6   Average linear relationship
5    0.6 < |r| ≤ 0.8   Strong linear relationship
6    0.8 < |r| < 1     Very strong linear relationship
7    |r| = 1           Perfect linear relationship
Measurement of correlation
Assumptions of Karl Pearson’s coefficient of
correlation

❑ The two variables X and Y should be measured on a continuous scale.
❑ Each variable, that is X and Y, should be normally distributed.
❑ There should be no outliers in either of the variables.
❑ Each observation (data collected) should be independent of the other observations.
Measurement of correlation
Example

❑ The following data are measurements of wing length (X) and tail length (Y) for a sample of 12 birds.
❑ Compute the coefficient of correlation between the two variables and interpret your results.
❑ Demonstrate the relationship between the two variables with the aid of a scatter plot.
Measurement of correlation
Example
No Wing length (X cm) Tail length (Y cm)
1 10.4 7.4
2 10.8 7.6
3 11.1 7.9
4 10.2 7.2
5 10.3 7.4
6 10.2 7.1
7 10.7 7.4
8 10.5 7.2
9 10.8 7.8
10 11.2 7.7
11 10.6 7.8
12 11.4 8.3
Measurement of correlation
Solution
No X Y X2 Y2 XY
1 10.4 7.4 108.16 54.76 76.96
2 10.8 7.6 116.64 57.76 82.08
3 11.1 7.9 123.21 62.41 87.69
4 10.2 7.2 104.04 51.84 73.44
5 10.3 7.4 106.09 54.76 76.22
6 10.2 7.1 104.04 50.41 72.42
7 10.7 7.4 114.49 54.76 79.18
8 10.5 7.2 110.25 51.84 75.6
9 10.8 7.8 116.64 60.84 84.24
10 11.2 7.7 125.44 59.29 86.24
11 10.6 7.8 112.36 60.84 82.68
12 11.4 8.3 129.96 68.89 94.62
n=12 Sum (X)=128.2 Sum (Y)=90.8 Sum (X2)=1371.32 Sum (Y2)=688.4 Sum (XY)=971.37

Measurement of correlation
Solution
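Substituting the column totals from the table (n = 12) into Pearson's formula:

r = [ 12(971.37) - (128.2)(90.8) ] / √{ [ 12(1371.32) - (128.2)² ] [ 12(688.4) - (90.8)² ] }
  = ( 11656.44 - 11640.56 ) / √{ ( 16455.84 - 16435.24 )( 8260.8 - 8244.64 ) }
  = 15.88 / √( 20.6 × 16.16 )
  = 15.88 / 18.25
  ≈ 0.87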

Measurement of correlation
Solution
❑ Therefore, the coefficient of correlation is r = 0.87.
❑ Since r = 0.87, there is a very strong direct (positive) linear relationship between wing length and tail length of such birds.
❑ That is, the longer the wing, the longer the tail, and vice versa.
Measurement of correlation
Solution
❑ The Scatter plot to show the relationship
between Wing length and Tail length of birds

Simple Linear Regression
Introduction

❑ Regression analysis is the statistical technique for modeling and investigating the relationship between variables.
❑ In such a relationship there is a dependent variable and an independent variable.
❑ The dependent variable is the one that is determined by the other variable. It is also known as the response, endogenous, criterion, regressand or outcome variable.
Simple Linear Regression
Introduction (Continued)

❑ The independent variable is the one which determines the other variable. It is also known as the predictor, regressor, explanatory, or exogenous variable.
❑ For example, in the relationship between age and blood pressure in humans, blood pressure may be considered the dependent variable and age the independent variable.
❑ In business, expenditure can be the dependent variable while income serves as the independent variable.
Simple Linear Regression
Introduction (Continued)

❑ Generally, the dependence relationship of the outcome variable on the predictor variable is what is called regression.
❑ The term simple regression refers to the simplest kind of regression, one in which only two variables are considered: one dependent variable and one independent variable.
Simple Linear Regression
Introduction (Continued)

❑ The adjective linear may be used to refer to the relationship between the two variables lying on a straight line, but to the statistician it describes the additive (linear-in-parameters) form of the regression model.
Simple Linear Regression Model/Equation
The concept of a straight line

Example
❑ It is believed that the age of a sparrow is one of the factors that determines its wing length. Suppose you have been provided with age and wing length data for a sample of 13 birds, as indicated in the following table.
i. Present the data with the aid of a scatter plot
ii. Draw a straight line to fit the points
Simple Linear Regression Model/Equation
No Age (days) Wing length (cm)
1 3.0 1.4
2 4.0 1.5
3 5.0 2.2
4 6.0 2.4
5 8.0 3.1
6 9.0 3.2
7 10.0 3.2
8 11.0 3.9
9 12.0 4.1
10 14.0 4.7
11 15.0 4.5
12 16.0 5.2
13 17.0 5.0

Simple Linear Regression Model/Equation
The concept of a straight line
❑ The Scatter plot to show age and wing length
data

Simple Linear Regression Model/Equation
The concept of a straight line
❑ The straight line drawn to fit the points

Simple Linear Regression Model/Equation
The concept of a straight line

❑ Mathematically, the equation of the line that describes wing length as a function of age is given by:
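Y = a + bX ,   i.e.   Wing length = a + b × Age

where a is the intercept and b is the slope of the straight line (general straight-line form).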

Simple Linear Regression Model/Equation
The challenge of the straight line

❑ No matter what kind of straight line we draw from the given mathematical equation to fit the points in the scatter diagram, there will be considerable variability of the data around that line.
❑ The deviation of a point from the straight line is called an error or a residual.
Simple Linear Regression Model/Equation
Introduction of the regression model

❑ Since all data points do not fall on the straight line, we now seek to define what is commonly termed the "best fit" line through the data.
❑ The "best fit" line is the one that takes into account both the mathematical functional form and an error or residual term.
❑ Such a functional form/line/model is the linear regression model.
Simple Linear Regression Model/Equation
Introduction of the regression model
❑ The general form of a simple linear regression model is as follows:
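Y = β0 + β1X + ε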

Y – Dependent variable
X – Independent variable
β0 – Intercept coefficient (model parameter)
β1 – Slope coefficient (model parameter)
ε – Residual or error term (also called the stochastic or disturbance term)
Simple Linear Regression Model/Equation
Introduction of the regression model

❑ For an individual observation, the linear regression model can be defined as:
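Yi = β0 + β1Xi + εi ,   i = 1, 2, …, n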

Simple Linear Regression Model/Equation
Terms in the regression model
β0 – Intercept coefficient
- It is the value taken by Y when X = 0.
- The average (expected) value of the Y (dependent) variable without the influence of the X (explanatory) variable.
β1 – Slope coefficient
- Expresses the rate of change in Y for a unit change in X.
- The average (expected) change in the Y (explained) variable brought about by a unit change observed in the X (independent) variable.
Simple Linear Regression Model/Equation
Terms in the regression model

Error term
ε – residual, error, stochastic, or disturbance term.
- It captures the influence of other variables not included in the model (apart from the given independent variable).
Simple Linear Regression Model/Equation
Interpretation of regression parameters

Example
❑ Suppose you were provided with the following
estimated simple linear regression models and
you are required to interpret them.
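First equation:   Ŷ = 4.35 + 1.56X
Second equation:  Ŷ = -2.75 - 0.675X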

Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
First equation:
Intercept term/parameter
❑ The estimated value of the dependent variable (Y) is 4.35 units in the absence of (or when there is no influence of) the independent variable (X)
Slope coefficient/parameter
❑ The dependent variable (Y) increases by 1.56 units when the independent variable (X) increases by one unit, and vice versa (positive linear relationship)
Simple Linear Regression Model/Equation
Interpretation of regression parameters
Solution.
Second equation:
Intercept term/parameter
❑ The estimated value of the dependent variable (Y) is -2.75 units in the absence of (or when there is no influence of) the independent variable (X)
Slope coefficient/parameter
❑ The dependent variable (Y) decreases by 0.675 units when the independent variable (X) increases by one unit, and vice versa (negative linear relationship)
Simple Linear Regression Model/Equation
Assumptions of the Simple linear regression

❑ Linearity: The relationship between X and Y must be linear.
❑ Independence of errors: There is no relationship between the residuals and the Y variable; in other words, Y is independent of the errors.
❑ Normality of errors: The residuals must be approximately normally distributed.
❑ Equal variances: The variance of the residuals is similar for all values of X.
Fitting the simple regression line
❑ To fit the simple regression line means to estimate the parameters of the simple linear regression model, that is β0 and β1.
❑ There are several approaches to estimating these regression parameters.
❑ The most common approach is the Ordinary Least Squares (OLS) method, because its estimated parameters are Best Linear Unbiased Estimators (BLUE), based on the Gauss-Markov theorem.
Fitting the simple regression line
Ordinary Least Square Method

❑ The ordinary least squares method is an approach used to estimate the parameters of a linear regression model by minimizing the sum of squared errors of the model.
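❑ Formally, OLS chooses the values of β0 and β1 that minimize the sum of squared errors:

Minimize  Σ εi² = Σ ( Yi - β0 - β1Xi )²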

Fitting the simple regression line
Estimation of model parameters

❑ From the OLS method, the parameter estimates are:
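β1 = [ n ΣXY - (ΣX)(ΣY) ] / [ n ΣX² - (ΣX)² ]

β0 = Ȳ - β1X̄

where X̄ = ΣX / n and Ȳ = ΣY / n are the sample means of X and Y.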

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Example
❑ It is believed that the age of a sparrow is one of the factors that determines its wing length. Suppose you have been provided with age and wing length data for a sample of 13 birds, as indicated in the following table.
i. Estimate the linear regression model using the OLS method and interpret the relationship between age and wing length of the bird.
ii. Predict the wing length of a bird that is 19 days old.
Fitting the simple regression line
No Age (days) Wing length (cm)
1 3.0 1.4
2 4.0 1.5
3 5.0 2.2
4 6.0 2.4
5 8.0 3.1
6 9.0 3.2
7 10.0 3.2
8 11.0 3.9
9 12.0 4.1
10 14.0 4.7
11 15.0 4.5
12 16.0 5.2
13 17.0 5.0

Fitting the simple regression line
Estimation of β0 and β1 by using OLS

Solution
❑ The estimated simple linear regression model is given by:
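Ŷ = β0 + β1X , where β0 and β1 are the OLS estimates of the intercept and slope.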

Fitting the simple regression line
Estimation of β0 and β1 by using OLS

Solution
❑ The parameters in the model are calculated from the OLS formulas given earlier, using the column totals in the following table:
Fitting the simple regression line
No X Y X2 XY
1 3.0 1.4 9 4.2
2 4.0 1.5 16 6
3 5.0 2.2 25 11
4 6.0 2.4 36 14.4
5 8.0 3.1 64 24.8
6 9.0 3.2 81 28.8
7 10.0 3.2 100 32
8 11.0 3.9 121 42.9
9 12.0 4.1 144 49.2
10 14.0 4.7 196 65.8
11 15.0 4.5 225 67.5
12 16.0 5.2 256 83.2
13 17.0 5.0 289 85
n=13 Sum (X)=130 Sum(Y)=44.4 Sum (X2)=1562 Sum(XY)=514.8
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
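Substituting the column totals from the table into the OLS formulas:

β1 = [ n ΣXY - (ΣX)(ΣY) ] / [ n ΣX² - (ΣX)² ]
   = [ 13(514.8) - (130)(44.4) ] / [ 13(1562) - (130)² ]
   = ( 6692.4 - 5772 ) / ( 20306 - 16900 )
   = 920.4 / 3406
   ≈ 0.2702

β0 = Ȳ - β1X̄ = ( 44.4 / 13 ) - 0.2702 ( 130 / 13 ) = 3.4154 - 2.7020 = 0.7134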

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution

❑ Therefore, the estimated simple linear regression model is given by:
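Ŷ = 0.7134 + 0.2702X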

Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
❑ In other words
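Estimated wing length = 0.7134 + 0.2702 × Age (in days)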

❑ This means that, other factors remaining constant, the wing length of a sparrow increases by 0.2702 cm per day.
❑ Normally we do not make an interpretation at X = 0. If necessary, in this case we can say that a sparrow has a wing length of 0.7134 cm just after hatching, other factors remaining constant.
Fitting the simple regression line
Estimation of β0 and β1 by using OLS
Solution
❑ The predicted wing length of a 19-day-old bird is given by:
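Ŷ = 0.7134 + 0.2702(19) = 0.7134 + 5.1338 = 5.8472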

❑ Therefore, the predicted wing length of a 19-day-old bird is 5.8472 cm, other factors remaining constant.
Data, Estimated(fitted) value, Residual

Example
❑ Refer to the sparrow example. Use the estimated regression model to prepare a table consisting of the following columns:
i. Dependent variable (data)
ii. Estimated or fitted value
iii. Residuals or errors
Data, Estimated(fitted) value, Residual
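Using the fitted model Ŷ = 0.7134 + 0.2702X (with the rounded coefficient estimates), the fitted values Ŷ and residuals e = Y - Ŷ are approximately:

No  X (days)  Y (cm)  Fitted Ŷ  Residual e = Y - Ŷ
1   3.0       1.4     1.5240    -0.1240
2   4.0       1.5     1.7942    -0.2942
3   5.0       2.2     2.0644     0.1356
4   6.0       2.4     2.3346     0.0654
5   8.0       3.1     2.8750     0.2250
6   9.0       3.2     3.1452     0.0548
7   10.0      3.2     3.4154    -0.2154
8   11.0      3.9     3.6856     0.2144
9   12.0      4.1     3.9558     0.1442
10  14.0      4.7     4.4962     0.2038
11  15.0      4.5     4.7664    -0.2664
12  16.0      5.2     5.0366     0.1634
13  17.0      5.0     5.3068    -0.3068

The residuals sum to approximately zero, up to rounding.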

Data, Estimated(fitted) value, Residual

❑ Note: one of the properties of least squares estimated models is that the sum of the errors or residuals across all observations in a sample is equal to zero. Mathematically, this can be expressed as:
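Σ ei = Σ ( Yi - Ŷi ) = 0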

Total variability of an outcome variable
❑ The total variability of an outcome variable refers to the sum of squared deviations of the dependent variable (Y) from its central value (the mean).
❑ The total variability of the dependent variable (Y) can be partitioned or broken down into:
i. Explained variability (variability due to the estimated regression model)
ii. Unexplained variability (variability due to errors or residuals)
Total variability of an outcome variable
❑ The total variability of the dependent variable Y is also known as the Total Sum of Squares (TSS) or Sum of Squares Total (SSTotal).
❑ The variability due to the regression model is also known as the Explained Sum of Squares (ESS) or Sum of Squares Regression (SSRegression).
❑ The variability due to error or residual is also known as the Residual Sum of Squares (RSS) or Sum of Squares Residual (SSResidual).
Total variability of an outcome variable
❑ Mathematically, the partition of the total variability of the dependent variable Y can be presented as follows:
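Σ ( Yi - Ȳ )² = Σ ( Ŷi - Ȳ )² + Σ ( Yi - Ŷi )²

Total variability = Explained variability + Unexplained variability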

Total variability of an outcome variable
❑ Furthermore:
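Equivalently, using the abbreviations defined above:

SSTotal = SSRegression + SSResidual ,   that is,   TSS = ESS + RSS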

Coefficient of determination
❑ The coefficient of determination refers to the proportion or percentage of the total variability in Y (the dependent variable) that is explained or accounted for by the fitted regression model.
❑ The coefficient of determination is a measure of the goodness of fit of the regression model.
❑ The coefficient of determination is denoted by R2.
Coefficient of determination
❑ Mathematically, the coefficient of determination is computed as:
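R² = ESS / TSS = 1 - ( RSS / TSS )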

Coefficient of determination
❑ The components that are used to compute the coefficient of determination can also be obtained as:
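One convenient set of computational forms (used in the worked example that follows) is:

TSS = ΣY² - (ΣY)² / n

ESS = β1 [ ΣXY - (ΣX)(ΣY) / n ]

RSS = TSS - ESS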

Coefficient of determination
❑ Note: for the case of simple linear regression, the coefficient of determination is equal to the squared value of the coefficient of correlation between the independent and the dependent variable. This is expressed as:
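R² = r² , where r is Pearson's coefficient of correlation between X and Y.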

Coefficient of determination
Interpretation
Usually: 0 ≤ R2 ≤ 1
❑ When R2 is very close to 1, it implies a very good fit; in other words, the variability in Y (the dependent variable) is highly explained by the variability of the X's in the model (the independent variables).
❑ When R2 is very close to 0, it implies a poor fit; in other words, the variability in Y (the dependent variable) is poorly explained by the variability of the X's in the model (the independent variables).
Coefficient of determination
Interpretation
❑ R2 is usually expressed as a percentage, that is:
R2 × 100
Example:
❑ If R2 = 0.48
❑ We say about 48% (i.e. 0.48 × 100) of the variability in Y is explained by the variability in the X's.
❑ This also implies that about 52% of the variability in Y is explained by other variables not included in the model, i.e. the residual.
Computation of R2
Example
❑ Refer to the sparrow example. Use the sample data and the estimated regression model to compute the following and interpret your results:
i. Coefficient of determination
Computation of R2
Solution
❑ The estimated linear regression model was given by:
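Ŷ = 0.7134 + 0.2702X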

❑ Where β1 =0.2702
❑ Now, consider the next table

Computation of R2
No X Y X2 Y2 XY
1 3.0 1.4 9 1.96 4.2
2 4.0 1.5 16 2.25 6
3 5.0 2.2 25 4.84 11
4 6.0 2.4 36 5.76 14.4
5 8.0 3.1 64 9.61 24.8
6 9.0 3.2 81 10.24 28.8
7 10.0 3.2 100 10.24 32
8 11.0 3.9 121 15.21 42.9
9 12.0 4.1 144 16.81 49.2
10 14.0 4.7 196 22.09 65.8
11 15.0 4.5 225 20.25 67.5
12 16.0 5.2 256 27.04 83.2
13 17.0 5.0 289 25 85
n=13 Sum (X)=130 Sum(Y)=44.4 Sum (X2)=1562 Sum (Y2)=171.3 Sum(XY)=514.8

Computation of R2
Solution
❑ The coefficient of determination can be obtained as:
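TSS = ΣY² - (ΣY)² / n = 171.3 - (44.4)² / 13 = 171.3 - 151.6431 = 19.6569

ESS = β1 [ ΣXY - (ΣX)(ΣY) / n ] = 0.2702 [ 514.8 - (130)(44.4) / 13 ] = 0.2702 × 70.8 = 19.1302

R² = ESS / TSS = 19.1302 / 19.6569 ≈ 0.973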

Computation of R2
Solution

❑ Therefore R2 ≈ 0.973, or about 97.3%.
❑ This indicates that the estimated linear regression model explains about 97.3% of the variation in the wing length of the bird.
The End
