
Lecture 4

Multiple Regression
➢ In simple linear regression we developed a procedure for obtaining a linear
equation that predicts a dependent variable as a function of a single
independent or exogenous variable.
➢ However, in many situations several independent variables jointly
influence a dependent variable.
➢ Multiple regression enables us to determine the simultaneous effect of
several independent variables on a dependent variable using the least
squares principle.
Examples
➢ The quantity of goods sold is a function of price, income, advertising,
price of substitute goods, and other variables.
➢ Salary of an employee is a function of experience, education, age, and
job ranking.
➢ Current market value of a home is a function of square feet of living area,
location (indicator for zone of city), appraised value last year, and quality
of construction (price per square foot).
Multiple Linear Regression Model
Let Y denote the dependent (or study) variable that is linearly related to k
independent (or explanatory) variables X1, X2, … , Xk through the parameters
β1, β2, … , βk, and we write

Y = β0 + β1X1 + ⋯ + βkXk + ε

[Response] = [mean (dependent on X1, X2, … , Xk)] + [error]
This is called the multiple linear regression model.
➢ The parameters 𝛽1 , … … . . , 𝛽𝑘 are the regression coefficients associated with
𝑋1 , 𝑋2 , … … … … . , 𝑋𝑘 respectively and 𝛽0 is the y-intercept.
➢ ε is the random error component reflecting the difference between the
observed and fitted linear relationship.
❖ Note that the jth regression coefficient βj represents the expected change in Y
per unit change in the jth independent variable Xj, holding the other independent variables fixed.
➢ The term "linear" refers to the fact that the mean is a linear function of the
unknown parameters β0, β1, … , βk.
➢ The predictor variables may or may not enter the model as first-order terms.
✓ The term "first order" means that each predictor appears only to the first
power; no squares, cross-products, or other higher-order terms are included.

Linear model:
A model is said to be linear when it is linear in parameters.
For example,
i) 𝑌 = 𝛽0 + 𝛽1 𝑋1 is a linear model as it is linear in the parameters.
ii) Y = β0 X^β1 can be written as

log Y = log β0 + β1 log X

i.e., y* = β0* + β1 x*,

which is linear in the parameters β0* = log β0 and β1, where y* = log Y and
x* = log X are the transformed variables (though it is nonlinear in the original
Y and X). So it is a linear model (see the sketch after these examples).
(iii) Y = β0 + β1X + β2X²
is linear in the parameters β0, β1 and β2 but nonlinear in the variable X. So, it is a
linear model.
(iv) Y = β0 + β1 / (X − β2)

is nonlinear in both the parameters and the variables. So, it is a nonlinear model.


(v) Y = β0 + β1 X^β2
is nonlinear in both the parameters and the variables. So, it is a nonlinear model.
(vi) Y = β0 + β1X + β2X² + β3X³
is a cubic polynomial model, which can be written as
Y = β0 + β1X1 + β2X2 + β3X3
which is linear in the parameters β0, β1, β2 and β3 and linear in the new variables
X1 = X, X2 = X², X3 = X³. So, it is a linear model.
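The log-transformation in example (ii) is easy to carry out in practice. The sketch below is an illustrative aside (not part of the lecture); it assumes NumPy is available and uses made-up data to estimate β0 and β1 of Y = β0 X^β1 by fitting a straight line to log Y versus log X.

```python
import numpy as np

# Made-up data assumed to follow Y = b0 * X**b1 approximately
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.1, 8.4, 10.2])

# log Y = log b0 + b1 * log X  -- linear in the parameters log b0 and b1
b1, log_b0 = np.polyfit(np.log(x), np.log(y), 1)   # returns (slope, intercept)
b0 = np.exp(log_b0)
print(f"estimated b0 = {b0:.3f}, b1 = {b1:.3f}")
```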
Example:
The income and education of a person are related. It is expected that, on average,
a higher level of education provides higher income.
So, a simple linear regression model can be expressed as
𝐼𝑛𝑐𝑜𝑚𝑒 = 𝛽0 + 𝛽1 𝐸𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛 + 𝜀
Note that
➢ 𝛽1 reflects the change in income with respect to per unit change in education
and
➢ β0 reflects the income when education is zero; even a person with no formal
education is expected to have some income.
Further, this model neglects the fact that, regardless of education, most people earn
more when they are more experienced. So β1 will overstate the marginal impact of
education, and a better model is
𝐼𝑛𝑐𝑜𝑚𝑒 = 𝛽0 + 𝛽1 𝐸𝑑𝑢𝑐𝑎𝑡𝑖𝑜𝑛 + 𝛽2 𝑒𝑥𝑝𝑒𝑟𝑖𝑒𝑛𝑐𝑒 + 𝜀
This is how we proceed with regression modeling in real-life situations.
➢ One needs to consider the experimental condition and the phenomenon
before making the decision on how many, why and how to choose the
dependent and independent variables.

Model Development:
Let an experiment be conducted n times, and let the data be recorded as follows:

Observation   Response y   Explanatory variables X1, X2, … , Xk
1             y1           x11, x21, … , xk1
2             y2           x12, x22, … , xk2
⋮             ⋮            ⋮
n             yn           x1n, x2n, … , xkn

Assuming that the model is


Y = β0 + β1X1 + β2X2 + ⋯ + βkXk + ε
the n-tuples of observations are also assumed to follow the same model. Thus, they
satisfy
y1 = β0 + β1x11 + β2x21 + ⋯ + βkxk1 + ε1
y2 = β0 + β1x12 + β2x22 + ⋯ + βkxk2 + ε2
⋮
yn = β0 + β1x1n + β2x2n + ⋯ + βkxkn + εn
where the error terms are assumed to have the following properties:
1. E(εi) = 0;
2. Var(εi) = σ² (constant); and                                   (1)
3. Cov(εi, εj) = 0 for i ≠ j.
These n equations can be written in matrix form as
[y1]   [1  x11  x21  ⋯  xk1] [β0]   [ε1]
[y2] = [1  x12  x22  ⋯  xk2] [β1] + [ε2]
[⋮ ]   [⋮    ⋮    ⋮   ⋱   ⋮ ] [⋮ ]   [⋮ ]
[yn]   [1  x1n  x2n  ⋯  xkn] [βk]   [εn]

In general, the model with k explanatory variables can be expressed as


𝑦 = 𝑋𝛽 + 𝜀
where y = (y1, y2, … , yn)′ is an n × 1 vector of n observations on the study variable,

X = [1  x11  x21  ⋯  xk1
     1  x12  x22  ⋯  xk2
     ⋮    ⋮    ⋮   ⋱   ⋮
     1  x1n  x2n  ⋯  xkn]

is an n × (k + 1) design matrix of the n observations on each of the k explanatory
variables (plus a leading column of ones for the intercept), β = (β0, β1, β2, … , βk)′
is a (k + 1) × 1 vector of regression coefficients, and ε = (ε1, ε2, … , εn)′ is an
n × 1 vector of random error components or disturbance terms,
and the specifications in (1) become
1. E(ε) = 0, and
2. Cov(ε) = E(εε′) = σ²I.

Example: Determine the linear regression model for fitting a straight line with
mean response E(Y) = β0 + β1x1 to the data

x1:  0  1  2  3  4
y:   1  4  3  8  9
Before the responses Y = (Y1, Y2, … , Y5)′ are observed, the errors ε = (ε1, ε2, … , ε5)′
are random, and we can write

Y = Xβ + ε

where Y = (Y1, Y2, … , Y5)′, β = (β0, β1)′, ε = (ε1, ε2, … , ε5)′, and

X = [1  x11
     1  x21
     ⋮    ⋮
     1  x51]

The data for this model are contained in the observed response vector y and the
design matrix X, where y = (1, 4, 3, 8, 9)′ and

X = [1  0
     1  1
     1  2
     1  3
     1  4]
Least Squares Estimation
➢ One of the objectives of regression analysis is to develop an equation that
will allow the investigator to predict the response for given values of the
predictor variables. Thus, it is necessary to "fit" the model y = Xβ + ε to the
observed yj corresponding to the known values 1, xj1, xj2, … , xjk.
➢ That is, we must determine the values of the regression coefficients β and the
error variance σ² consistent with the available data.

➢ Let b be trial values for 𝛽.


➢ Consider the difference yj − b0 − b1xj1 − ⋯ − bkxjk between the
observed response yj and the value b0 + b1xj1 + ⋯ + bkxjk that would be
expected if b were the "true" parameter vector.
➢ Typically the differences yj − b0 − b1xj1 − ⋯ − bkxjk will not be zero,
because the response fluctuates about its expected value. The method of least
squares selects b so as to minimize the sum of the squares of the differences:

S(b) = ∑ⱼ₌₁ⁿ (yj − b0 − b1xj1 − ⋯ − bkxjk)²
     = (y − Xb)′(y − Xb)
➢ The coefficients b chosen by the least squares criterion are called least
squares estimates of the regression parameters 𝛽. They will henceforth be
denoted by 𝛽̂ to emphasize their role as estimates of 𝛽.
➢ The coefficients β̂ are consistent with the data in the sense that they produce
estimated (fitted) mean responses β̂0 + β̂1xj1 + ⋯ + β̂kxjk whose squared
differences from the observed yj sum to a value that is as small as possible.

➢ The deviations
𝜀̂𝑗 = 𝑦𝑗 − 𝛽̂0 − 𝛽̂1 𝑥𝑗1 − ⋯ … … . −𝛽̂𝑘 𝑥𝑗𝑘 , 𝑗 = 1, 2, … … , 𝑛 (5)

are called residuals.


➢ The vector of residuals ε̂ = y − Xβ̂ contains the information about the
remaining unknown parameter σ².

Let X have full rank k + 1 ≤ n. Then the least squares estimator is

β̂ = (X′X)⁻¹X′y

Let ŷ = Xβ̂ = Hy denote the fitted values of y, where H = X(X′X)⁻¹X′ is
called the "hat" matrix. Then the residuals

ε̂ = y − ŷ

satisfy X′ε̂ = 0 and ŷ′ε̂ = 0.
Also, the

residual sum of squares = ∑ⱼ₌₁ⁿ (yj − β̂0 − β̂1xj1 − ⋯ − β̂kxjk)²
                        = ε̂′ε̂
                        = y′y − y′Xβ̂
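These matrix results are easy to verify numerically. The sketch below is an aside (assuming NumPy; the data are arbitrary): it computes β̂ = (X′X)⁻¹X′y, the hat matrix H, the fitted values and residuals, and checks the properties X′ε̂ = 0, ŷ′ε̂ = 0 and the residual sum of squares identity.

```python
import numpy as np

# Arbitrary small data set: n = 5 observations, k = 2 explanatory variables
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 3.0, 2.0, 5.0])
y = np.array([1.0, 4.0, 3.0, 8.0, 9.0])

X = np.column_stack([np.ones_like(x1), x1, x2])   # n x (k+1) design matrix

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # (X'X)^(-1) X'y
H = X @ np.linalg.inv(X.T @ X) @ X.T              # "hat" matrix
y_hat = H @ y                                     # fitted values
resid = y - y_hat                                 # residuals

print("beta_hat           :", beta_hat)
print("X' eps_hat   (~0)  :", X.T @ resid)
print("y_hat' eps_hat (~0):", y_hat @ resid)
print("RSS = e'e          :", resid @ resid)
print("y'y - y'X beta_hat :", y @ y - y @ (X @ beta_hat))
```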

Matrices and Their Properties


Definition of Matrix
➢ A matrix is a rectangular array of numbers or variables arranged in rows and
columns.

Operations on Matrices
The three basic operations on matrices are:
i. addition,
ii. subtraction, and
iii. multiplication.

Addition of Matrices
If A=[aij]m×n and B=[bij]m×n are matrices of the same order, then the addition A+B is
the matrix obtained by adding the corresponding elements of the two matrices.
Example:
[a1  b1]   [a2  b2]   [a1 + a2   b1 + b2]
[c1  d1] + [c2  d2] = [c1 + c2   d1 + d2]

Example: Let A = [1  5; 7  3] and B = [12  −1; 0  9]. Then

A + B = [1 + 12   5 + (−1)]   [13    4]
        [7 + 0    3 + 9   ] = [ 7   12]

Subtraction of Matrices
If two matrices A and B are of the same order, then the subtraction
A – B = A + (–B)
is obtained by subtracting the corresponding elements.

Example:
[a1  b1]   [a2  b2]   [a1 − a2   b1 − b2]
[c1  d1] − [c2  d2] = [c1 − c2   d1 − d2]
Example:
[−2  7
 −4  1
 −2  9]

Matrix Multiplication Properties


➢ If A is a matrix of order 𝑚 × 𝑛 and B is a matrix of order 𝑛 × 𝑝, then the
order of the product matrix is 𝑚 × 𝑝.
➢ The number of columns of matrix A must equal the number of rows of
matrix B.

Steps in Matrix Multiplication


1. Check that the number of columns in the 1st matrix equals the number of
rows in the 2nd matrix, so that the matrices are compatible for
multiplication.
2. Multiply the elements of each row of the first matrix by the elements of
each column in the second matrix.
3. Add the products obtained.
4. Place each sum in the corresponding row and column of the product matrix
(see the sketch below).
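As a quick illustration of these steps (an aside, assuming NumPy), the product of a 2 × 3 matrix and a 3 × 2 matrix is a 2 × 2 matrix:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])           # 3 x 2

# Element (i, j) of C is the sum of products of row i of A with column j of B
C = A @ B                          # 2 x 2
print(C)                           # [[ 58  64]
                                   #  [139 154]]
```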
Determinants

➢ The determinant of a 3 × 3 matrix can be expanded in terms of determinants of
2 × 2 submatrices; these 2 × 2 determinants are called the minors of the
elements of the 3 × 3 matrix.
➢ The symbol Mij represents the determinant of the matrix that results when
row i and column j are deleted.
The minors Mij are formed from the matrix

[a11  a12  a13
 a21  a22  a23
 a31  a32  a33]

Definition of Cofactor
Let Mij be the minor for element aij in an n × n matrix. The cofactor of aij, written Aij,
is
Aij = (−1)^(i+j) · Mij
Matrix Inverse
What is the Inverse of a Matrix?
➢ The inverse of a matrix is another matrix which, on multiplication with the
given matrix, gives the multiplicative identity. For a matrix A, its inverse is
A⁻¹, and A · A⁻¹ = A⁻¹ · A = I, where I is the identity matrix.

How to Find Matrix Inverse?

Adjoint of a Matrix A (Adj A):

➢ The adjoint of a matrix A is the transpose of the matrix of cofactors of the
elements of A.

The inverse of a matrix can be calculated by following the given steps:


Step 1: Calculate the minors of all elements of A.
Step 2: Then compute the cofactors of all elements and write the cofactor
matrix by replacing the elements of A by their corresponding cofactors.
Step 3: Find the adjoint of A (written as adj A) by taking the transpose of
cofactor matrix of A.
Step 4: Multiply adj A by the reciprocal of the determinant of A (a code
sketch of this procedure follows).
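The four steps translate directly into code. The function below is a small illustrative sketch (my own, not from the lecture) that inverts a square matrix via minors, cofactors and the adjoint; in practice one would simply call np.linalg.inv.

```python
import numpy as np

def inverse_via_adjoint(A):
    """Invert a square matrix following the minor -> cofactor -> adjoint -> scale recipe."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    cof = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Steps 1-2: minor M_ij and cofactor (-1)^(i+j) * M_ij
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    det = np.linalg.det(A)
    if np.isclose(det, 0.0):
        raise ValueError("matrix is singular")
    return cof.T / det             # Steps 3-4: adjoint (transpose of cofactors) over determinant

A = np.array([[2.0, 1.0],
              [5.0, 3.0]])
print(inverse_via_adjoint(A))      # [[ 3. -1.]
                                   #  [-5.  2.]]
```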

Example: Find the inverse of the matrix


A = [4  7
     2  6]
Solution:
Step 1: Calculate Minors of all elements
𝑀11 = 6
𝑀12 = 2
𝑀21 = 7
𝑀22 = 4
Step 2: Calculate cofactors of all elements and cofactor matrix
𝐴11 = 6
𝐴12 = −2
𝐴21 = −7
𝐴22 = 4
Cofactor Matrix C = [ 6  −2
                     −7   4]
Step 3: Find the adjoint of A
Adj(A) = Cᵀ = [ 6  −7
               −2   4]
Step 4: Multiply adj A by reciprocal of determinant
Determinant of A = (24-14) = 10
Inverse of Matrix A = (1/|A|) Adj(A) = (1/10) [6  −7; −2  4] = [0.6  −0.7; −0.2  0.4]
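As a quick numerical check (an aside, assuming NumPy), np.linalg.inv reproduces this result:

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
print(np.linalg.det(A))        # 10.0 (up to floating-point rounding)
print(np.linalg.inv(A))        # [[ 0.6 -0.7]
                               #  [-0.2  0.4]]
print(A @ np.linalg.inv(A))    # identity matrix (up to rounding)
```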

Example: Find the inverse of the matrix

A = [ 3  1  −6
      5  2  −1
     −4  3   0]
Solution:
|A| = 3 |2  −1; 3  0| − 1 |5  −1; −4  0| + (−6) |5  2; −4  3|
    = 3[0 − (−3)] − 1(0 − 4) − 6[15 − (−8)]
    = 3(3) − 1(−4) − 6(23) = 9 + 4 − 138 = −125
Cofactor A11 =  |2  −1; 3  0|  = 0 − (−3) = 3
Cofactor A12 = −|5  −1; −4  0| = −(0 − 4) = 4
Cofactor A13 =  |5  2; −4  3|  = 15 − (−8) = 23
Cofactor A21 = −|1  −6; 3  0|  = −[0 − (−18)] = −18
Cofactor A22 =  |3  −6; −4  0| = 0 − 24 = −24
Cofactor A23 = −|3  1; −4  3|  = −[9 − (−4)] = −13
Cofactor A31 =  |1  −6; 2  −1| = (−1) − (−12) = 11
Cofactor A32 = −|3  −6; 5  −1| = −[(−3) − (−30)] = −27
Cofactor A33 =  |3  1; 5  2|   = 6 − 5 = 1
Thus, the cofactor matrix is

C = [  3    4   23
     −18  −24  −13
      11  −27    1]

and

Adj(A) = Cᵀ = [  3  −18   11
                 4  −24  −27
                23  −13    1]
Therefore,

A⁻¹ = (1/|A|) Adj(A) = −(1/125) [  3  −18   11
                                   4  −24  −27
                                  23  −13    1]
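Again as a numerical check (an aside, assuming NumPy), the determinant and the adjoint of this 3 × 3 matrix can be confirmed directly:

```python
import numpy as np

A = np.array([[3.0, 1.0, -6.0],
              [5.0, 2.0, -1.0],
              [-4.0, 3.0, 0.0]])
print(np.linalg.det(A))             # -125.0 (up to rounding)
print(np.linalg.inv(A) * -125.0)    # reproduces Adj(A) = [[3, -18, 11], [4, -24, -27], [23, -13, 1]]
```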
Example: Calculate the least squares estimates β̂, the residuals ε̂, and the residual
sum of squares for a straight-line model
Yj = β0 + β1xj1 + εj

fit to the data

x1:  0  1  2  3  4
y:   1  4  3  8  9

Solution: We have
X = [1  0
     1  1
     1  2
     1  3
     1  4]

X′ = [1  1  1  1  1
      0  1  2  3  4]

y = (1, 4, 3, 8, 9)′

X′X = [ 5  10
       10  30]

(X′X)⁻¹ = [ 0.6  −0.2
           −0.2   0.1]

X′y = [25
       70]

Calculation of (X′X)⁻¹

X′X = [ 5  10
       10  30]

|X′X| = (5)(30) − (10)(10) = 150 − 100 = 50

Cofactor of 5  = (−1)²(30) = 30
Cofactor of 10 = (−1)³(10) = −10
Cofactor of 10 = (−1)³(10) = −10
Cofactor of 30 = (−1)⁴(5) = 5

Cofactor matrix of X′X = [ 30  −10
                          −10    5]

Adj X′X = transpose of the cofactor matrix of X′X = [ 30  −10
                                                     −10    5]

(X′X)⁻¹ = (1/|X′X|) Adj X′X = (1/50) [30  −10; −10  5] = [0.6  −0.2; −0.2  0.1]

Consequently,
β̂ = (β̂0, β̂1)′ = (X′X)⁻¹X′y = [0.6  −0.2; −0.2  0.1] [25; 70] = [1; 2]
and the fitted equation is
𝑦̂ = 𝛽̂0 + 𝛽̂1 𝑥 = 1 + 2𝑥
The vector of fitted (predicted) values is

ŷ = Xβ̂ = (1, 3, 5, 7, 9)′

so the residuals are

ε̂ = y − ŷ = (1, 4, 3, 8, 9)′ − (1, 3, 5, 7, 9)′ = (0, 1, −2, 1, 0)′

The residual sum of squares is

ε̂′ε̂ = 0² + 1² + (−2)² + 1² + 0² = 6
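The entire calculation can be reproduced in a few lines (an illustrative aside, assuming NumPy):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 4.0, 3.0, 8.0, 9.0])
X = np.column_stack([np.ones_like(x), x])       # design matrix with intercept column

beta_hat = np.linalg.inv(X.T @ X) @ (X.T @ y)   # (X'X)^(-1) X'y
resid = y - X @ beta_hat

print("beta_hat :", beta_hat)        # [1. 2.]
print("residuals:", resid)           # [ 0.  1. -2.  1.  0.]
print("RSS      :", resid @ resid)   # 6.0
```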
The Three-Variable Model: Notation and Assumptions
The three-variable population regression model can be written as
𝑌 = 𝛽0 + 𝛽1 𝑋1 + 𝛽2 𝑋2 + 𝜖
where,
𝛽0 = the Y intercept, where the regression line crosses the Y axis
𝛽1 = partial slope for X1 on Y
➢ β1 indicates the change in Y for one unit change in X1, controlling for X2
𝛽2 = partial slope for X2 on Y
➢ β2 indicates the change in Y for one unit change in X2, controlling for X1
Estimation of Coefficients
• Multiple regression coefficients are computed using estimators obtained by
the least squares procedure.
• This least squares procedure is similar to that for simple linear regression.

Least Squares Estimates Using Correlation Coefficients


Least Squares Estimation and the Sample Multiple Regression
We begin with a sample of n observations denoted by (x1i, x2i, … , xki, yi),
i = 1, 2, … , n, measured for a process whose population multiple regression
model is

yi = β0 + β1x1i + β2x2i + ⋯ + βkxki + εi

The least squares estimates of the coefficients β0, β1, β2, … , βk are
β̂0, β̂1, β̂2, … , β̂k, the values for which the sum of squared deviations

SSE = ∑ᵢ₌₁ⁿ (yi − β̂0 − β̂1x1i − ⋯ − β̂kxki)²

is a minimum.

The resulting equation


ŷi = β̂0 + β̂1x1i + ⋯ + β̂kxki
is the sample multiple regression of Y on 𝑋1 , 𝑋2 , … … … … , 𝑋𝑘 .
Let us consider again the regression model with only two predictor variables
𝑦̂𝑖 = 𝛽̂0 + 𝛽̂1 𝑥1𝑖 + 𝛽̂2 𝑥2𝑖
The coefficient estimates can be obtained from the following formulas (a short
code sketch implementing them follows the notation list below):

β̂1 = (Sy / Sx1) · (ryx1 − ryx2 · rx1x2) / (1 − r²x1x2)

β̂2 = (Sy / Sx2) · (ryx2 − ryx1 · rx1x2) / (1 − r²x1x2)

β̂0 = ȳ − β̂1 x̄1 − β̂2 x̄2


where
𝛽̂1 =partial slope of 𝑋1 on Y
𝛽̂2 =partial slope of 𝑋2 on Y
𝑆𝑦 = standard deviation of Y
𝑆𝑥1 = standard deviation of the first independent variable (𝑋1 )
𝑆𝑥2 = standard deviation of the second independent variable (𝑋2 )
𝑟𝑦𝑥1 = bivariate correlation between Y and 𝑋1
𝑟𝑦𝑥2 = bivariate correlation between Y and 𝑋2
𝑟𝑥1𝑥2 = bivariate correlation between 𝑋1 and 𝑋2
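A direct translation of these formulas into code (an illustrative sketch in Python; the function name and arguments are mine, not from the lecture):

```python
def two_predictor_ls(y_bar, x1_bar, x2_bar, s_y, s_x1, s_x2, r_yx1, r_yx2, r_x1x2):
    """Least squares coefficients of y-hat = b0 + b1*x1 + b2*x2 from summary statistics."""
    denom = 1.0 - r_x1x2 ** 2
    b1 = (s_y / s_x1) * (r_yx1 - r_yx2 * r_x1x2) / denom
    b2 = (s_y / s_x2) * (r_yx2 - r_yx1 * r_x1x2) / denom
    b0 = y_bar - b1 * x1_bar - b2 * x2_bar
    return b0, b1, b2

# Summary statistics of the housework example that follows
print(two_predictor_ls(3.3, 2.7, 13.7, 2.1, 1.5, 2.6, 0.50, -0.30, -0.47))
# roughly (2.5, 0.65, -0.07)
```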
Example
• Husbands’ hours of housework per week (Y)
• Number of children (X1)
• Husbands’ years of education (X2)
Family   Husband's housework (Y)   Number of children (X1)   Husband's years of education (X2)
A        1                         1                          12
B        2                         1                          14
C        3                         1                          16
D        5                         1                          16
E        3                         2                          18
F        1                         2                          16
G        5                         3                          12
H        0                         3                          12
I        6                         4                          10
J        3                         4                          12
K        7                         5                          12
L        4                         5                          16

Solution:
                 Husband's housework (Y)   Number of children (X1)   Husband's education (X2)
Mean             Ȳ = 3.3                   X̄1 = 2.7                  X̄2 = 13.7
Std. deviation   Sy = 2.1                  Sx1 = 1.5                 Sx2 = 2.6

Zero-Order Correlation
𝑟𝑦𝑥1 = 0.50
𝑟𝑦𝑥2 = −0.30
𝑟𝑥1𝑥2 = −0.47

SPSS Result
Descriptive Statistics
                     N    Minimum   Maximum   Mean      Std. Deviation
Y                    12   .00       7.00      3.3333    2.14617
X1                   12   1.00      5.00      2.6667    1.55700
X2                   12   10.00     18.00     13.6667   2.67423
Valid N (listwise)   12

Correlations (Pearson)
      Y       X1      X2
Y     1       .499    -.296
X1    .499    1       -.466
X2    -.296   -.466   1

Result and interpretation of β̂1
β̂1 = (Sy / Sx1) · (ryx1 − ryx2 · rx1x2) / (1 − r²x1x2)
   = (2.1 / 1.5) · [0.50 − (−0.30)(−0.47)] / [1 − (−0.47)²] = 0.65
As the number of children in a household increases by one, the husband’s hours of
housework per week increases on average by 0.65 hours (about 39 minutes),
controlling for husband’s education.
Result and interpretation of β̂2
β̂2 = (Sy / Sx2) · (ryx2 − ryx1 · rx1x2) / (1 − r²x1x2)
   = (2.1 / 2.6) · [−0.30 − (0.50)(−0.47)] / [1 − (−0.47)²] = −0.07
As the husband's years of education increase by one year, his hours of housework
per week decrease on average by 0.07 (about 4 minutes), controlling for
the number of children.
Result and interpretation of β̂0

β̂0 = Ȳ − β̂1X̄1 − β̂2X̄2 = 3.3 − (0.65)(2.7) − (−0.07)(13.7) = 2.5

With zero children in the family and a husband with zero years of education, that
husband is predicted to complete 2.5 hours of housework per week on average.
Final regression equation
In this example, this is the final regression equation
Ŷ = β̂0 + β̂1X1 + β̂2X2 = 2.5 + 0.65X1 − 0.07X2
Prediction
• Use the regression equation to predict a husband’s hours of housework per week
when he has 11 years of schooling and the family has 4 children
Ŷ = 2.5 + 0.65X1 − 0.07X2 = 2.5 + 0.65(4) + (−0.07)(11) = 4.3
Under these conditions, we would predict 4.3 hours of housework per week.
Standardized coefficients (𝜷∗ )
➢ Partial slopes (𝛽̂1 ; 𝛽̂2 ) are in the original units of the independent variables
✓ Income may be measured in dollars/Tk., education in years, number of
children as a count, and housework in hours.
✓ This makes assessing relative effects of independent variables difficult
when they have different units
✓ It is easier to compare if we standardize to a common unit by
transforming to Z scores
✓ The transformed variables then have a mean of zero and a variance of
1.
➢ Compute beta-weights (𝛽 ∗ ) to compare relative effects of the independent
variables
✓ Amount of change in the standardized scores of Y for a one-unit
change in the standardized scores of each independent variable
• While controlling for the effects of all other independent variables
✓ They show the amount of change in standard deviations in Y for a
change of one standard deviation in each X
Formulas
• Rescaling the variables also rescales the regression coefficients.
• Formulas for standardized coefficients
β*1 = β̂1 (Sx1 / Sy)

β*2 = β̂2 (Sx2 / Sy)
Example
❖ Which independent variable, number of children (X1) or husband’s education
(X2), has the stronger effect on husband’s housework in dual-career families?
β*1 = β̂1 (Sx1 / Sy) = (0.65)(1.5 / 2.1) = 0.46

β*2 = β̂2 (Sx2 / Sy) = (−0.07)(2.6 / 2.1) ≈ −0.08
➢ The standardized coefficient for number of children (0.46) is greater in
absolute value than the standardized coefficient for husband’s education
(–0.08).
➢ Therefore, number of children has a stronger effect on husband’s housework.
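The two conversions above amount to one multiplication each (an aside; the rounded values from the example are used):

```python
# Beta-weights rescale each partial slope by S_x / S_y
b1, b2 = 0.65, -0.07            # unstandardized partial slopes from above
s_y, s_x1, s_x2 = 2.1, 1.5, 2.6

beta1_star = b1 * s_x1 / s_y    # about 0.46
beta2_star = b2 * s_x2 / s_y    # about -0.09 with these rounded inputs (SPSS, using unrounded values, reports -0.081)
print(beta1_star, beta2_star)
```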

SPSS Results

Coefficients
                 Unstandardized Coefficients   Standardized Coefficients
Model            B        Std. Error           Beta      t        Sig.
1   (Constant)   2.526    4.300                          .587     .571
    X            .636     .448                 .461      1.417    .190
    Z            -.065    .261                 -.081     -.249    .809
a. Dependent Variable: Y

Standardized coefficients
Standardized regression equation

Zy = β*0 + β*1 Z1 + β*2 Z2

where the Z's indicate that all scores have been standardized (converted to z-scores).
• The y-intercept β*0 always equals zero once the equation is standardized, so

Zy = β*1 Z1 + β*2 Z2

For our example,

Zy = (0.46) Z1 + (−0.08) Z2
