
Syndicated Learning Program – II (SLP-II)

Regression Analysis

Dr. Rishabh Rathore


ICFAI Business School, Kochi
Department of Operations & IT
Independent and dependent variables
Regression and correlation analyses are based on the relationship, or association, between two
(or more) variables.
The known variable (or variables) is called the independent variable(s).
The variable we are trying to predict is the dependent variable; an independent variable is one used to
predict or explain the dependent variable.
Introduction to Regression Analysis
▪ Predict the value of a dependent variable based on the value of at least one independent
variable.
▪ Explain the impact of changes in an independent variable on the dependent variable.
▪ The statistical technique of estimating the unknown value of one variable (i.e., the dependent
variable) from the known value of another variable (i.e., the independent variable) is called
regression analysis.
▪ It shows how the typical value of the dependent variable changes when any one of the independent
variables is varied while the other independent variables are held fixed.

▪ Example:
$\text{Sales} = f(\text{Expenditure})$
$\text{Price} = f(\text{Quality})$
$\text{Production} = f(\text{Employment})$
Scatter Diagram Method
Estimation using the Regression Line
The equation for a straight line where the dependent variable Y is determined by the
independent variable X is:

$Y = a + bX$
• Using this equation, we can take a given value of X and compute the value of Y.
• The a is called the Y-intercept because its value is the point at which the regression line crosses the
Y-axis, that is, the vertical axis.
• The b in the equation is the slope of the line.
• It represents how much each unit change of the independent variable X changes the dependent
variable Y. Both a and b are numerical constants because for any given straight line, their values do not
change.
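As a minimal sketch of the points above, the straight-line equation can be evaluated directly in Python. The constants a = 2.0 and b = 0.5 and the function name `predict_y` are illustrative assumptions, not figures from this material.

```python
def predict_y(x, a, b):
    """Return the value of Y on the line Y = a + b*X for a given X."""
    return a + b * x

# Hypothetical constants chosen only for illustration.
a = 2.0   # Y-intercept: value of Y where the line crosses the Y-axis (X = 0)
b = 0.5   # slope: change in Y for each one-unit change in X

y_at_zero = predict_y(0, a, b)        # equals the intercept a
y_at_four = predict_y(4, a, b)
y_at_five = predict_y(5, a, b)
unit_change = y_at_five - y_at_four   # equals the slope b

print(y_at_zero, y_at_four, unit_change)
```

Setting X to zero returns the intercept, and moving X by one unit moves Y by exactly b, which is what "a and b are numerical constants" means in practice.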
One independent variable and one dependent variable: $Y = \alpha_0 + \alpha_1 X_1$

Two or more independent variables – multiple regression analysis

$Y = \alpha_0 + \alpha_1 X_1 + \alpha_2 X_2 + \alpha_3 X_3 + \dots + \alpha_m X_m$

$Y = 0.9 + 1.2X$
$\text{Sales} = f(\text{Expenditure})$

$Y = 0.9 + 1.2X_1 + 3X_2 + 5X_3$

$\text{Sales} = f(\text{Expenditure}, \text{Product quality}, \text{Profit})$
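The three-variable example equation can be evaluated with a short Python sketch. The coefficients 0.9, 1.2, 3 and 5 come from the example; the input values and the function name `predict_multiple` are assumptions made only for illustration.

```python
def predict_multiple(xs, intercept, coefficients):
    """Return Y = intercept + sum of coefficient_j * x_j."""
    if len(xs) != len(coefficients):
        raise ValueError("need one value per coefficient")
    return intercept + sum(c * x for c, x in zip(coefficients, xs))

# Coefficients from the example equation Y = 0.9 + 1.2*X1 + 3*X2 + 5*X3.
intercept = 0.9
coefficients = [1.2, 3.0, 5.0]

# Hypothetical input values, chosen only for illustration.
base = predict_multiple([10.0, 2.0, 1.0], intercept, coefficients)
# Raise X2 by one unit while X1 and X3 are held fixed.
bumped = predict_multiple([10.0, 3.0, 1.0], intercept, coefficients)

print(base, bumped - base)
```

The difference between the two predictions equals the coefficient of X2, which illustrates the idea of varying one independent variable while the others are held fixed.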
Result – Dependent Variable
Roll Number – Independent
Percentage - Independent

Result = F(Roll Number, Percentage)


35%
Simple Linear Regression Analysis
Straight Line Equation: 𝒀 = 𝒂 + 𝒃𝑿

◦ Only one independent variable 𝑿


◦ Relationship between 𝒀 and 𝑿 is described by a linear function
◦ Changes in 𝒀 are assumed to be related to changes in 𝑿
Importance of Regression Analysis
➢It provides estimates of the values of dependent variables from the values of independent
variables.

➢It can be extended to 2 or more variables, which is known as multiple regression.

➢It shows the nature of the relationship between two or more variables.


Coefficients of Regression
◦ Let us consider the line of regression of 𝒀 on 𝑿;
𝒀 = 𝒂 + 𝒃𝑿
◦ The coefficient 𝒃, which is the slope of the line of regression of 𝒀 on 𝑿, is
called the coefficient of regression of 𝒀 on 𝑿.

◦ It represents how much each unit change of the independent variable 𝑿
changes the dependent variable 𝒀.

◦ It also represents the rate of change of 𝒀 w.r.t. 𝑿.

◦ The coefficient of regression of 𝒀 on 𝑿 is written as $b_{yx}$.


Coefficients of Regression
◦ Let us consider the line of regression of 𝑿 on 𝒀;
𝑿 = 𝒂 + 𝒃𝒀

◦ It represents how much each unit change of the independent variable 𝒀
changes the dependent variable 𝑿.

◦ It also represents the rate of change of 𝑿 w.r.t. 𝒀.

◦ The coefficient of regression of 𝑿 on 𝒀 is written as $b_{xy}$.


Some important formulae
◦ Equation of the line of regression of 𝑦 on 𝑥

$y - \bar{y} = b_{yx}\,(x - \bar{x})$

◦ Equation of the line of regression of 𝑥 on 𝑦

$x - \bar{x} = b_{xy}\,(y - \bar{y})$

The coefficient of regression of 𝒀 on 𝑿 is written as $b_{yx}$.

The coefficient of regression of 𝑿 on 𝒀 is written as $b_{xy}$.
Some important formulae
◦ Remark 1: For numerical computations of the equations of line of regression of 𝑦 on 𝑥, and 𝑥 on 𝑦,
the following formulae for the regression coefficients 𝑏𝑦𝑥 and 𝑏𝑥𝑦 are very convenient to use

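$b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{\sum dx\,dy}{\sum dx^2}$

$b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{\sum dx\,dy}{\sum dy^2}$

where $dx = x - \bar{x}$ and $dy = y - \bar{y}$.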

Note: If the standard deviations $\sigma_x$ and $\sigma_y$ are known, then we can also write the formulae
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ and $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$
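The deviation formulae and the standard-deviation form can be checked against each other with a small Python sketch. The five paired observations are hypothetical, invented only to keep the arithmetic easy to follow by hand.

```python
from math import sqrt

# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Deviations from the means: dx = x - x_bar, dy = y - y_bar.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

sum_dxdy = sum(p * q for p, q in zip(dx, dy))
sum_dx2 = sum(p * p for p in dx)
sum_dy2 = sum(q * q for q in dy)

b_yx = sum_dxdy / sum_dx2   # regression coefficient of y on x
b_xy = sum_dxdy / sum_dy2   # regression coefficient of x on y

# Cross-check with the standard-deviation form b_yx = r * sigma_y / sigma_x.
sigma_x = sqrt(sum_dx2 / n)
sigma_y = sqrt(sum_dy2 / n)
r = sum_dxdy / (n * sigma_x * sigma_y)
b_yx_from_sd = r * sigma_y / sigma_x
b_xy_from_sd = r * sigma_x / sigma_y

print(b_yx, b_xy, b_yx_from_sd, b_xy_from_sd)
```

For these hypothetical numbers the means are 3 and 4 and the coefficient of y on x is 0.6, so the line of regression of y on x is y − 4 = 0.6 (x − 3).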
Theorem on Regression Coefficients
Theorem 1: The correlation coefficient is the geometric mean between the regression
coefficients:
$r^2 = b_{yx} \cdot b_{xy}$
$r = \pm \sqrt{b_{yx} \cdot b_{xy}}$

(Note: if 𝑏𝑦𝑥 and 𝑏𝑥𝑦 are positive, then 𝑟 is positive; and if 𝑏𝑦𝑥 and 𝑏𝑥𝑦 are negative, then 𝑟 is
negative)

Theorem 2: If one of the regression coefficients is greater than one, then the other must be less than
one, because their product equals $r^2$, which cannot exceed one.
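Both theorems can be checked numerically. The sketch below uses hypothetical data with a negative relationship, so that the sign rule in the note is also exercised; the variable names are illustrative assumptions.

```python
from math import sqrt, copysign

# Hypothetical data with a negative relationship, for illustration only.
x = [1, 2, 3, 4, 5]
y = [10, 8, 7, 4, 1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b_yx = s_xy / s_xx
b_xy = s_xy / s_yy

# Theorem 1: r is the geometric mean of the two coefficients,
# carrying their common sign.
r_from_coefficients = copysign(sqrt(b_yx * b_xy), b_yx)

# Direct computation of the correlation coefficient for comparison.
r_direct = s_xy / sqrt(s_xx * s_yy)

print(b_yx, b_xy, r_from_coefficients, r_direct)
```

Here both coefficients are negative, so r is negative, and since one coefficient exceeds one in magnitude the other is below one in magnitude, in line with Theorem 2.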
Summary
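| Notation | Formula | Alternate Formula | If SD is given |
|---|---|---|---|
| Line of regression (y on x) | $Y = a + bX$ | - | - |
| Line of regression (x on y) | $X = a + bY$ | - | - |
| Equation of the line of regression of y on x | $y - \bar{y} = b_{yx}(x - \bar{x})$ | - | - |
| Equation of the line of regression of x on y | $x - \bar{x} = b_{xy}(y - \bar{y})$ | - | - |
| Coefficient of regression of y on x | $b_{yx} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \dfrac{\sum dx\,dy}{\sum dx^2}$ | $b_{yx} = \dfrac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$ | $b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x}$ |
| Coefficient of regression of x on y | $b_{xy} = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \dfrac{\sum dx\,dy}{\sum dy^2}$ | $b_{xy} = \dfrac{n \sum xy - \sum x \sum y}{n \sum y^2 - (\sum y)^2}$ | $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y}$ |
| Coefficient of correlation | $r = \pm \sqrt{b_{yx} \cdot b_{xy}}$ | - | - |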
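The alternate (raw-sum) formulae avoid computing deviations from the means. The following sketch applies them to hypothetical observations invented for illustration; it also derives the intercept of the line of y on x from the mean form of the equation.

```python
# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi * xi for xi in x)
sum_y2 = sum(yi * yi for yi in y)

# Shared numerator: n * sum(xy) - sum(x) * sum(y).
numerator = n * sum_xy - sum_x * sum_y
b_yx = numerator / (n * sum_x2 - sum_x ** 2)
b_xy = numerator / (n * sum_y2 - sum_y ** 2)

# Intercept of the line of y on x, from y - y_bar = b_yx * (x - x_bar).
a_yx = sum_y / n - b_yx * (sum_x / n)

print(b_yx, b_xy, a_yx)
```

For this data the raw sums give 30/50 and 30/30, so the two coefficients are 0.6 and 1.0, the same values the deviation formulae produce.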
$\text{Sales} = f(\text{Expenditure})$
$Y = -240.343 + 57.934X$
The Multiple Regression Model
In multiple regression analysis, where more than one explanatory variable is used, the
probabilistic model is extended to more than one independent variable and is presented as the
multiple probabilistic regression model:
Multiple regression model with k independent variables:
$y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots + \beta_k x_k + \varepsilon$
Where
• $y_i$ is the value of the dependent variable for the ith observation,
• $\beta_0$ is the y-intercept,
• $\beta_1$ is the slope of y with independent variable $x_1$, holding variables $x_2, x_3, \dots, x_k$ constant,
• $\beta_2$ is the slope of y with independent variable $x_2$, holding variables $x_1, x_3, \dots, x_k$ constant,
• $\beta_3$ is the slope of y with independent variable $x_3$, holding variables $x_1, x_2, x_4, \dots, x_k$ constant,
• $\beta_k$ is the slope of y with independent variable $x_k$, holding variables $x_1, x_2, x_3, \dots, x_{k-1}$ constant, and
• $\varepsilon$ is the random error in y for observation i.
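A model of this form with two independent variables can be fitted by ordinary least squares. This section does not state the estimation method, so least squares via the normal equations is an assumption here. The function name `fit_two_predictors` is hypothetical, and the data are synthetic: they are generated exactly from y = 5 + 2·x1 + 3·x2 with no error term, so the fit should recover those three values.

```python
def fit_two_predictors(x1, x2, y):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    n = len(y)
    # Normal equations (X'X) b = X'y with design columns [1, x1, x2].
    cols = [[1.0] * n, list(x1), list(x2)]
    lhs = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    rhs = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    # Augmented 3x4 matrix, solved by Gaussian elimination with pivoting.
    m = [row + [r] for row, r in zip(lhs, rhs)]
    for i in range(3):
        pivot = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, 3):
            factor = m[r][i] / m[i][i]
            m[r] = [mr - factor * mi for mr, mi in zip(m[r], m[i])]
    # Back substitution.
    b = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        tail = sum(m[i][j] * b[j] for j in range(i + 1, 3))
        b[i] = (m[i][3] - tail) / m[i][i]
    return tuple(b)

# Synthetic data (not from this material): y = 5 + 2*x1 + 3*x2 exactly.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [5 + 2 * u + 3 * v for u, v in zip(x1, x2)]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
prediction = b0 + b1 * 7 + b2 * 8   # predicted y at x1 = 7, x2 = 8

print(b0, b1, b2, prediction)
```

With real data the error term is not zero, so the estimates would only approximate the underlying coefficients; in practice a statistics package would normally be used instead of hand-written elimination.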
Multiple Regression Model with Two Independent Variables
A consumer electronics company has adopted an aggressive policy to increase sales of a newly launched product. The company
has invested in advertisements as well as employed salesmen to increase sales rapidly. The table presents the sales, the number of
employed salesmen, and advertisement expenditure for 24 randomly selected months. Develop a regression model to predict the
impact of advertisement and the number of salesmen on sales.

On the basis of the multiple regression model, predict the sales for a given month when the number of salesmen employed is 35
and advertisement expenditure is 500 thousand rupees.
