
Regression

PROPERTIES OF REGRESSION COEFFICIENTS


Relationship between Two Variables

1. Family income and expenditure
2. Study hours and performance in an examination
3. Daily calorie intake and weight
4. Amount of fertilizer used and crop yield
5. Consumer Price Index and Wholesale Price Index
6. Population size and GDP
7. Amount of chemical used and strength of cement

BY DR. GARGI TYAGI


Measures for Degree of Linear Relationship

• Karl Pearson’s Correlation Coefficient

• Spearman’s Rank Correlation Coefficient

[Figure: scale of the correlation coefficient, running from -1 through 0 to 1]

Suppose a person increases his or her daily calorie intake by 100 units.
By how much will this affect the person's weight?
Regression Analysis

Regression Analysis is a mathematical measure of the average relationship between two or more variables in terms of the original units of the data.


Regression

The term was first used by the British biometrician Sir Francis Galton (16 February 1822 – 17 January 1911).
[Figure: Galton's data on father's and son's heights; x-axis: father's height (inches), y-axis: son's height (inches)]
Regression
• Meaning: “Stepping back towards the average”
• Galton found that the offspring of abnormally tall or short parents tend to “regress” or “step back” to the average population height.

“Stepping back towards the average” is the historical meaning.
Now the term regression is used in a general sense.
Regression Analysis

The study of the dependence of one variable, the dependent variable, on one or more other variables, the explanatory variables, with a view to estimating the average value of the former in terms of the known or fixed values of the latter.
Some Examples
Independent variable /      Dependent variable /
Explanatory variable /      Explained variable /
Regressor                   Regressand
------------------------    ------------------------
Amount of fertilizer        Yield of a crop
Income                      Expenditure
Family size                 Electricity consumption
Years of education          Wages
Marks in Mathematics (X) and Marks in Statistics (Y)

    X     Y
   75    85
   30    45
   60    54
   80    91
   53    58
   35    63
   15    35
   40    45
   35    45
   48    44

[Figure: scatter plot "Marks in Mathematics and Statistics"; x-axis: Marks in Mathematics, y-axis: Marks in Statistics]

$r(X, Y) = 0.86$

Suppose we wish to estimate marks in Statistics when a student gets 85 marks in Mathematics.
We need to find the equation which best explains the relationship, in this case the equation of a straight line.
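The quoted value of the correlation coefficient can be checked directly from the ten pairs of marks. The following is a minimal sketch in plain Python; the names `maths`, `stats` and `pearson_r` are illustrative and do not come from the slides.

```python
import math

# Marks of the ten students from the table above.
maths = [75, 30, 60, 80, 53, 35, 15, 40, 35, 48]
stats = [85, 45, 54, 91, 58, 63, 35, 45, 45, 44]

def pearson_r(x, y):
    """Karl Pearson's correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(maths, stats)
print(round(r, 2))  # 0.86
```

The coefficient says how strong the linear relationship is, but not how many marks in Statistics to expect for a given mark in Mathematics; that is the question the regression line answers.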
Which line to choose?

[Figure: scatter plot "Marks in Mathematics and Statistics" with several candidate lines; x-axis: Marks in Mathematics, y-axis: Marks in Statistics]

The line of best fit or, in other words, the line which best fits the data.
How to find a line which best fits the data?

One method is

Principle of Least Squares


Fitting a straight line
using Principle of Least Squares

Let us consider the following equation of a straight line:

$Y_i = a + b X_i + \epsilon_i, \quad i = 1, 2, \ldots, n$

where
• $(X_i, Y_i)$ is the $i$-th pair of observations on the independent variable $X$ and the dependent variable $Y$,
• $n$ is the sample size, i.e. the total number of pairs of observations on $X$ and $Y$,
• $a$ and $b$ are the unknown constants of the equation,
• $\epsilon_i$ is the error term.
Principle of Least Squares
Estimate the values of the unknown constants by minimizing the residual sum of squares.

Let $\hat{Y}_i$ be the estimated value of $Y$ for a given value of $X_i$:

$\hat{Y}_i = \hat{a} + \hat{b} X_i$

Also, let

$e_i = Y_i - \hat{Y}_i, \quad i = 1, 2, \ldots, n.$

$e_i$ is the residual associated with the estimation of the $i$-th observation on $Y$.

[Figure: plot of Y against X]
Principle of Least Squares
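$Y_i = a + b X_i + \epsilon_i, \quad i = 1, 2, \ldots, n$
$\hat{Y}_i = \hat{a} + \hat{b} X_i, \quad i = 1, 2, \ldots, n$
$e_i = Y_i - \hat{Y}_i = Y_i - \hat{a} - \hat{b} X_i, \quad i = 1, 2, \ldots, n$
Find the values of $\hat{a}$ and $\hat{b}$ such that $\sum_{i=1}^{n} e_i^2$ is minimum.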
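To make the criterion concrete, the sketch below computes the residual sum of squares for a few candidate lines on the marks data. The function name `residual_sum_of_squares` and the two arbitrary candidate lines are assumptions made for illustration; the first candidate uses the rounded coefficients reported in the worked example later in these slides.

```python
# Marks in Mathematics (X) and Statistics (Y) for the ten students.
X = [75, 30, 60, 80, 53, 35, 15, 40, 35, 48]
Y = [85, 45, 54, 91, 58, 63, 35, 45, 45, 44]

def residual_sum_of_squares(a, b, xs, ys):
    """Sum of squared residuals e_i = Y_i - (a + b * X_i)."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Candidate lines as (intercept, slope). The first is the rounded least
# squares line from the worked example; the other two are arbitrary.
candidates = {
    "Y = 19.56 + 0.78 X": (19.56, 0.78),
    "Y = 10 + 1.00 X": (10.0, 1.0),
    "Y = 30 + 0.50 X": (30.0, 0.5),
}

rss = {
    label: residual_sum_of_squares(a, b, X, Y)
    for label, (a, b) in candidates.items()
}
for label, value in rss.items():
    print(f"{label}: RSS = {value:.1f}")

# The line with the smallest residual sum of squares fits best.
best = min(rss, key=rss.get)
print("Best of the three:", best)
```

Of these three candidates the least squares line has the smallest residual sum of squares. The principle of least squares picks, among all possible straight lines, the one for which this quantity is smallest.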
Principle of Least Squares
Regression Line
The equation of regression line is given as
𝑌𝑖 = 𝑎 + 𝑏𝑋𝑖 + 𝜖𝑖
Using principle of Least squares, the following normal equations are obtained:
$\sum_{i=1}^{n} Y_i = n\hat{a} + \hat{b} \sum_{i=1}^{n} X_i$
$\sum_{i=1}^{n} Y_i X_i = \hat{a} \sum_{i=1}^{n} X_i + \hat{b} \sum_{i=1}^{n} X_i^2$

The estimators of $a$ and $b$ are obtained as follows:

$\hat{b} = \frac{Cov(X, Y)}{\sigma_X^2} = r \frac{\sigma_Y}{\sigma_X}$ and $\hat{a} = \bar{Y} - \hat{b} \bar{X}$

The first normal equation shows that the line passes through the sample means $(\bar{X}, \bar{Y})$.

The slope coefficient $\hat{b}$ is also called the coefficient of regression of $Y$ on $X$, denoted as $b_{YX}$.
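The slides state the normal equations without derivation. A sketch of the standard argument: write $S$ for the residual sum of squares (a symbol introduced here only for convenience) and set its partial derivatives with respect to $\hat{a}$ and $\hat{b}$ to zero.

```latex
\begin{aligned}
S(\hat{a}, \hat{b}) &= \sum_{i=1}^{n} \left( Y_i - \hat{a} - \hat{b} X_i \right)^2 \\[4pt]
\frac{\partial S}{\partial \hat{a}} &= -2 \sum_{i=1}^{n} \left( Y_i - \hat{a} - \hat{b} X_i \right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} Y_i = n\hat{a} + \hat{b} \sum_{i=1}^{n} X_i \\[4pt]
\frac{\partial S}{\partial \hat{b}} &= -2 \sum_{i=1}^{n} X_i \left( Y_i - \hat{a} - \hat{b} X_i \right) = 0
  \;\Longrightarrow\; \sum_{i=1}^{n} X_i Y_i = \hat{a} \sum_{i=1}^{n} X_i + \hat{b} \sum_{i=1}^{n} X_i^2 \\[4pt]
\text{Dividing the first equation by } n: \quad \bar{Y} &= \hat{a} + \hat{b} \bar{X}
  \;\Longrightarrow\; \hat{a} = \bar{Y} - \hat{b} \bar{X} \\[4pt]
\text{Substituting into the second:} \quad
\frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar{X}\bar{Y} &= \hat{b} \left( \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2 \right)
  \;\Longrightarrow\; \hat{b} = \frac{Cov(X, Y)}{\sigma_X^2}
\end{aligned}
```

Here $Cov(X, Y)$ and $\sigma_X^2$ are taken with divisor $n$. Since $r = Cov(X, Y) / (\sigma_X \sigma_Y)$, the slope can equally be written as $\hat{b} = r \, \sigma_Y / \sigma_X$.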


Regression Line

The regression line (line of best fit) of $Y$ on $X$ is the line which has slope $\hat{b}$ and passes through $(\bar{X}, \bar{Y})$, i.e.

$Y - \bar{Y} = r \frac{\sigma_Y}{\sigma_X} (X - \bar{X})$

or

$Y - \bar{Y} = b_{YX} (X - \bar{X})$

where $b_{YX} = r \frac{\sigma_Y}{\sigma_X}$ is called the regression coefficient of $Y$ on $X$.

The regression coefficient $b_{YX}$ gives the change in the estimated value of $Y$ for a unit change in $X$.
Regression Line
X on Y

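$X_i = a' + b' Y_i + \epsilon'_i, \quad i = 1, 2, \ldots, n$

$\sum_{i=1}^{n} e_i'^2 = \sum_{i=1}^{n} (X_i - \hat{X}_i)^2 = \sum_{i=1}^{n} (X_i - \hat{a}' - \hat{b}' Y_i)^2$

The regression line of $X$ on $Y$ can be written as:

$X - \bar{X} = r \frac{\sigma_X}{\sigma_Y} (Y - \bar{Y}).$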
Why Two Regression Lines?

The regression line of $Y$ on $X$,

$Y = \bar{Y} + b_{YX} (X - \bar{X})$,

gives the best fit value of $Y$ for a given value of $X$, whereas the regression line of $X$ on $Y$,

$X = \bar{X} + b_{XY} (Y - \bar{Y})$, where $b_{XY} = r \frac{\sigma_X}{\sigma_Y}$,

gives the best fit value of $X$ for a given value of $Y$.
Estimate marks in Statistics when a student gets 85 marks in Mathematics.
Example

[Figure: scatter plot "Marks in Mathematics and Statistics" of the ten pairs of marks; x-axis: Marks in Mathematics (X), y-axis: Marks in Statistics (Y)]

Total of X = 471, total of Y = 565

$r(X, Y) = 0.86$
Example (X = Marks in Mathematics, Y = Marks in Statistics)

          X      Y     X^2     Y^2      XY
         75     85    5625    7225    6375
         30     45     900    2025    1350
         60     54    3600    2916    3240
         80     91    6400    8281    7280
         53     58    2809    3364    3074
         35     63    1225    3969    2205
         15     35     225    1225     525
         40     45    1600    2025    1800
         35     45    1225    2025    1575
         48     44    2304    1936    2112
Total   471    565   25913   34991   29536

$\bar{X} = 47.1, \quad \bar{Y} = 56.5$
$\sigma_X = 19.31, \quad \sigma_Y = 17.52$
$r(X, Y) = 0.86$
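The summary statistics follow from the column totals alone. The sketch below reproduces them, assuming (as the reported values imply) that the variances and the covariance use the divisor $n$; all variable names are illustrative.

```python
import math

# Column totals from the table above.
n = 10
sum_x, sum_y = 471, 565
sum_x2, sum_y2, sum_xy = 25913, 34991, 29536

mean_x = sum_x / n
mean_y = sum_y / n

# Variances and covariance with divisor n (not n - 1).
var_x = sum_x2 / n - mean_x ** 2
var_y = sum_y2 / n - mean_y ** 2
cov_xy = sum_xy / n - mean_x * mean_y

sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)
r = cov_xy / (sd_x * sd_y)

print(mean_x, mean_y)
print(round(sd_x, 2), round(sd_y, 2), round(r, 2))
```

Using the divisor $n - 1$ instead would give slightly larger standard deviations but the same value of $r$, because the divisor cancels in the ratio.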
Example

Estimated marks in Statistics when a student gets 85 marks in Mathematics.

Regression line of $Y$ on $X$: $\hat{Y} = 19.56 + 0.78 X$
Regression line of $X$ on $Y$: $\hat{X} = -6.748 + 0.95 Y$
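Both fitted lines can be reproduced from the raw marks, and the line of $Y$ on $X$ then answers the question posed in the example. This is a minimal sketch; `fit_line` is a hypothetical helper that applies the estimators $\hat{b} = Cov(X, Y) / \sigma_X^2$ and $\hat{a} = \bar{Y} - \hat{b} \bar{X}$ given earlier.

```python
# Marks in Mathematics (X) and Statistics (Y) for the ten students.
X = [75, 30, 60, 80, 53, 35, 15, 40, 35, 48]
Y = [85, 45, 54, 91, 58, 63, 35, 45, 45, 44]

def fit_line(xs, ys):
    """Least squares (intercept, slope) for the regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

a_yx, b_yx = fit_line(X, Y)  # regression of Y on X
a_xy, b_xy = fit_line(Y, X)  # regression of X on Y (roles swapped)

# Estimated marks in Statistics for 85 marks in Mathematics.
estimate = a_yx + b_yx * 85

print(f"Y on X: intercept {a_yx:.4f}, slope {b_yx:.4f}")
print(f"X on Y: intercept {a_xy:.4f}, slope {b_xy:.4f}")
print(f"Estimated marks in Statistics at X = 85: {estimate:.1f}")
```

With unrounded coefficients the estimate comes to roughly 86 marks in Statistics. The computed coefficients agree with the fitted lines above up to rounding in the last digit. To estimate marks in Mathematics from marks in Statistics, the second line should be used rather than solving the first line for $X$.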
Summary

• What is Regression Analysis?


• Derivation of regression lines using principle of least squares
• Why two regression lines?
Thank You
Properties of Regression Line
Regression versus Causation

In the words of Kendall and Stuart

“A statistical relationship, however strong and however suggestive, can never establish causal connection: our ideas of causation must come from outside statistics, ultimately from some theory or other.”
