
Introduction to Statistical Methods

Prof. Gangaboraiah, PhD
BITS Pilani, Bangalore Campus

Session 11: Multiple Linear Regression Analysis
About me
• Dr. Gangaboraiah, PhD (Stats)
• Former Professor of Statistics, KIMS, Bangalore
• Work Experience
  • Kempegowda Institute of Medical Sciences, Bangalore (34 years)
  • Govt. Homeopathy Medical College, Bangalore (4 years)
  • SJC Institute of Technology, Chickballapur (13 years, Visiting Professor)
  • Manipal University, Bangalore Centre (since 2008, Visiting Professor): MS (Computer Science), MS (Computer Networks), Data Science
  • BITS (since 2013, Visiting Professor): MTech (Data Science)
  • WIPRO and Aricent (2019)
Agenda
Here's what you will learn in the entire unit:
1. Data Visualization: Why? What? How?
2. Measures of Central Tendency
3. Measures of Dispersion/Variation

Learning objectives of this unit
At the end of the session, the student should be able to:
• Define and interpret Covariance
• Define Correlation
• Identify types of correlation; solve problems on correlation
• Define Regression
• Identify types of regression; solve problems on regression
Session 11: Multiple Linear Regression Analysis
Multiple Linear Regression Analysis

• Many applications of regression analysis involve situations in which there is more than one regressor variable.
• A regression model that contains more than one regressor variable is called a multiple regression model.
Multiple Linear Regression Analysis

Suppose that there is a study on the prognosis (improvement) of patients at the time of diagnosis for a certain cancer for which there is not, as yet, an effective treatment.
Multiple Linear Regression Analysis
The physician might surmise that the length of survival for a patient would depend on the patient's age.
Conceptually this relationship could be explained as follows:
  Cancer prognosis varies with Age
Mathematically,
  Cancer prognosis = Age
Multiple Linear Regression Analysis

• Assign a suitable weighting factor based on its relative importance in predicting prognosis:
  Cancer prognosis = W1 Age
For the above equation to become useful, two more things are needed:
♦ some sort of an anchor point
♦ an error term
  Cancer prognosis = W0 + W1 Age + Error term

This is called a Simple Linear Regression (SLR) model.
Multiple Linear Regression Analysis
In general, denoting
♦ Cancer prognosis by Y
♦ Anchor point by a
♦ Weight factor by b
♦ Age by X
♦ Error term by ε

Cancer prognosis = W0 + W1 Age + Error term
is given by
  Y = a + bX + ε
Multiple Linear Regression Analysis
The physician might surmise that the length of survival for a patient would depend on at least four things:
● Patient's age
● The anatomic stage of the disease at the time of diagnosis
● The presence or absence of other diseases (co-morbidity)
● The degree of systemic symptoms such as weight loss
Multiple Linear Regression Analysis
Conceptually this relationship could be explained as follows:
  Cancer prognosis varies with Age, Stage, Co-morbidity and Symptoms
Mathematically,
  Cancer prognosis = Age + Stage + Co-morbidity + Symptoms
All four independent variables are not necessarily of equal importance.
Assign a suitable weighting factor to each based on its relative importance in predicting prognosis.
Multiple Linear Regression Analysis

Cancer prognosis = W1 Age + W2 Stage + W3 Co-morbidity + W4 Symptoms
For the above equation to become useful, two more things are needed:
● some sort of an anchor point
● an error term
Cancer prognosis = W0 + W1 Age + W2 Stage + W3 Co-morbidity + W4 Symptoms + Error term
Multiple Linear Regression Analysis
By denoting, say,
Y = Cancer prognosis
w0 = Anchor point
X1 = Age
X2 = Stage
X3 = Co-morbidity
X4 = Symptoms
e = Error term
the statistical model is
  Y = w0 + w1X1 + w2X2 + w3X3 + w4X4 + e
This model is called a multiple regression model.
Multiple Linear Regression Analysis
In general, a dependent (or response, or outcome) variable Y may be related to k independent (or explanatory, or regressor) variables. The model
  Y = β0 + β1X1 + β2X2 + β3X3 + … + βkXk + ε
is called a multiple linear regression model with k regressor variables. The parameters βj, j = 0, 1, 2, …, k, are called regression coefficients. This model describes a hyperplane in the k-dimensional space of the regressor variables {Xj}.
Multiple Linear Regression Analysis
The parameter βj represents the expected change in the response variable Y per unit change in Xj, when all the remaining regressors Xi (i ≠ j) are held constant.
Multiple Linear Regression Analysis
• The least squares normal equations are shown on the following slides.
• The solution to the normal equations gives the least squares estimators of the regression coefficients.
Multiple Linear Regression Analysis
Based on the sample data the model can be written as
  Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + ε
The normal equations obtained from the least squares principle are (all sums over i = 1, …, n):
  Σ Yi = n b0 + b1 Σ X1i + b2 Σ X2i + … + bk Σ Xki
  Σ X1iYi = b0 Σ X1i + b1 Σ X1i² + b2 Σ X1iX2i + … + bk Σ X1iXki
  …
  Σ XkiYi = b0 Σ Xki + b1 Σ X1iXki + b2 Σ X2iXki + … + bk Σ Xki²
Multiple Linear Regression Analysis
When k = 2, the model becomes
  Y = β0 + β1X1 + β2X2 + ε
The normal equations obtained from the least squares principle are
  Σ Yi = n β0 + β1 Σ X1i + β2 Σ X2i
  Σ X1iYi = β0 Σ X1i + β1 Σ X1i² + β2 Σ X1iX2i
  Σ X2iYi = β0 Σ X2i + β1 Σ X1iX2i + β2 Σ X2i²
Multiple Linear Regression Analysis
For k = 2, based on the sample data the model can be written as
  Y = b0 + b1X1 + b2X2 + ε
The normal equations obtained from the least squares principle are
  Σ Yi = n b0 + b1 Σ X1i + b2 Σ X2i
  Σ X1iYi = b0 Σ X1i + b1 Σ X1i² + b2 Σ X1iX2i
  Σ X2iYi = b0 Σ X2i + b1 Σ X1iX2i + b2 Σ X2i²
Multiple Linear Regression Analysis
Solving these normal equations, the fitted regression equation is
  Ŷ = b̂0 + b̂1X1 + b̂2X2
For given values of X1 and X2, Y can be predicted.
Multiple Linear Regression Analysis
For example, suppose that the effective life of a cutting tool depends on the cutting speed and the tool angle. A possible multiple regression model could be
  Y = b0 + b1X1 + b2X2 + ε
where
  Y = tool life
  X1 = cutting speed
  X2 = tool angle


Multiple Linear Regression Analysis
  Pull strength = b0 + b1 Wire length + b2 Die height + ε
  Y = b0 + b1X1 + b2X2 + ε
The normal equations are
  25 b0 + 206 b1 + 8294 b2 = 725.82
  206 b0 + 2396 b1 + 77177 b2 = 8008.47
  8294 b0 + 77177 b1 + 3531848 b2 = 274816.71
Solving these normal equations we get
  b0 = 2.26379, b1 = 2.74427, b2 = 0.01253
The fitted regression equation is
  Y = 2.26379 + 2.74427 X1 + 0.01253 X2
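As a quick check, the three normal equations can be solved numerically. A minimal sketch in Python/NumPy, using the totals from the slide above (the variable names are illustrative):

```python
import numpy as np

# Normal equations from the pull-strength example, written as A b = c
A = np.array([[25,    206,    8294],
              [206,   2396,   77177],
              [8294,  77177,  3531848]], dtype=float)
c = np.array([725.82, 8008.47, 274816.71])

b = np.linalg.solve(A, c)   # least squares coefficients b0, b1, b2
print(b)                    # approx [2.264, 2.744, 0.0125]
```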
Multiple Linear Regression Analysis
  ŷ = a + b1x1 + b2x2 + b3x3 + … + bkxk
(Figure: the fitted regression surface of Y in the space of the regressors X1, X2, X3.)
Multiple Linear Regression Analysis
STATISTICAL DATA ANALYSIS
COMMON TYPES OF ANALYSIS
1. Compare Groups
   a. Compare Proportions (e.g., Chi-Square Test, χ²)
      H0: P1 = P2 = P3 = … = Pk
   b. Compare Means (e.g., Analysis of Variance)
      H0: µ1 = µ2 = µ3 = … = µk
Multiple Linear Regression Analysis
2. Examine Strength and Direction of Relationships
   a. Bivariate (e.g., Pearson Correlation, r)
      Between one variable and another:
        Y = a + b1x1
   b. Multivariate (e.g., Multiple Regression Analysis)
      Between one dependent variable and each of several independent variables, while holding all other independent variables constant:
        Y = a + b1x1 + b2x2 + b3x3 + … + bkxk
Multiple Linear Regression Analysis
What does regression analysis do?
It examines whether changes/differences in the values of one variable (the dependent variable Y) are linked to changes/differences in the values of one or more other variables (the independent variables X1, X2, etc.), while controlling for the changes in the values of all other Xs.
E.g., the relationship between salary and gender for people who have the same levels of education, work experience, position level, seniority, etc.
The DV (Y) must be metric.
The IVs (Xs) must be either metric or dummy variables.
Multiple Linear Regression Analysis
Central Questions Addressed:
• Is Y a function of X1, X2, etc.? How?
• Is there a relationship between Y and X1, X2, etc. (in each case, after controlling for the effects of all other Xs)? In what way?
• What is the relative impact of each X on Y, holding all other Xs constant (that is, all other Xs being equal)?
Multiple Linear Regression Analysis
More specifically,
• Do values of Y tend to increase/decrease as values of X1, X2, etc. increase/decrease?
If so,
• By how much? and
• How strong is the connection/relationship between the Xs and Y? That is, what % of the differences/variations in Y values (e.g., income) among study subjects can be explained by (or attributed to) differences in X values (e.g., years of education, years of experience, etc.)?

Multiple Linear Regression Analysis
NOTE: Once we can determine how values of Y change as a function of values of X1, X2, etc., we will also be able to predict/estimate the value of Y from specific values of X1, X2, etc.
  Y = a + b1x1 + b2x2 + b3x3 + … + bkxk + ε
Therefore, regression analysis, in a sense, is about ESTIMATING values of Y using information about the values of the Xs.
Estimation, by definition, involves error.
The objective? To minimize the error in estimation, or to compute estimates that are as close to the true/actual values as possible.
Multiple Linear Regression Analysis
QUESTION: What is the simplest way to obtain an estimate for some population characteristic (e.g., the number of credit cards per Indian household)?
ANSWER:
1. Select a representative sample from the population, and
2. Compute the mean for that sample (e.g., compute the average number of CCs for the sample households).
Regression analysis can be viewed as a technique that often significantly improves the accuracy of estimation relative to using the mean value.
So, suppose we were to estimate the number of credit cards for Indian households, based on information from a random sample of, say, n = 8 families.
Multiple Linear Regression Analysis
Estimating Number of Credit Cards

Family Number (i) | Actual # of Credit Cards (yi)
1 | 4
2 | 6
3 | 6
4 | 7
5 | 8
6 | 7
7 | 8
8 | 10
Σ yi = 56

Estimate? ŷ = ȳ = 56/8 = 7
QUESTION: Can we determine how much error in estimation we are committing by using ȳ = 7 as our estimate for each of these households?
Multiple Linear Regression Analysis
Estimating Number of Credit Cards

Family Number (i) | Actual # of Credit Cards (yi) | Estimate (ŷ = ȳ) | Error in Estimation
1 | 4 | 7 | ?
2 | 6 | 7 | ?
3 | 6 | 7 | ?
4 | 7 | 7 | ?
5 | 8 | 7 | ?
6 | 7 | 7 | ?
7 | 8 | 7 | ?
8 | 10 | 7 | ?
Σ yi = 56, ŷ = ȳ = 56/8 = 7
Multiple Linear Regression Analysis
Estimating Number of Credit Cards

Family Number (i) | Actual # of Credit Cards (yi) | Estimate (ŷ = ȳ) | Error (yi - ȳ)
1 | 4 | 7 | -3
2 | 6 | 7 | -1
3 | 6 | 7 | -1
4 | 7 | 7 | 0
5 | 8 | 7 | +1
6 | 7 | 7 | 0
7 | 8 | 7 | +1
8 | 10 | 7 | +3
Σ yi = 56, ŷ = ȳ = 56/8 = 7
Let's now see all this graphically.
Multiple Linear Regression Analysis
(Figure: the eight families' actual values plotted around the baseline estimate ŷ = ȳ = 7. Let's spread the dots away from each other to see things more clearly!)
Multiple Linear Regression Analysis
Graphic Representation
(Figure: each family's actual value vs. the baseline estimate ŷ = ȳ; the vertical gap between an actual value and the mean line is that family's estimation error.)
Can we determine the total estimation error for all 8 families?
Multiple Linear Regression Analysis
Family Number (i) | Actual # of Credit Cards (yi) | Estimate (ŷ = ȳ) | Error (yi - ȳ)
1 | 4 | 7 | -3
2 | 6 | 7 | -1
3 | 6 | 7 | -1
4 | 7 | 7 | 0
5 | 8 | 7 | +1
6 | 7 | 7 | 0
7 | 8 | 7 | +1
8 | 10 | 7 | +3
Σ yi = 56, ŷ = ȳ = 56/8 = 7
What would be the total estimation error for all 8 families combined? Σ(yi - ȳ) = 0. Solution?
Multiple Linear Regression Analysis
Estimating Number of Credit Cards

Family Number (i) | Actual # (yi) | Estimate (ŷ = ȳ) | Error (yi - ȳ) | Error Squared ((yi - ȳ)²)
1 | 4 | 7 | -3 | 9
2 | 6 | 7 | -1 | 1
3 | 6 | 7 | -1 | 1
4 | 7 | 7 | 0 | 0
5 | 8 | 7 | +1 | 1
6 | 7 | 7 | 0 | 0
7 | 8 | 7 | +1 | 1
8 | 10 | 7 | +3 | 9
Σ yi = 56, ŷ = ȳ = 7, Σ(yi - ȳ) = 0, Σ(yi - ȳ)² = 22 = SST (Sum of Squares Total)
Multiple Linear Regression Analysis
22 = SST = an index of the total (combined) amount of estimation error for all families (observations) in the sample when using the mean as the estimate.
• SST is also the sum of squared deviations from the mean. (Remember the formula for computing variance?)
• Objective in estimation? Minimize error, maximize precision.
• Can we cut down the amount of estimation error (SST)? How? Yes, we can, by using information about other variables suspected to be strong predictors of (strongly related to) the # of credit cards possessed by families (e.g., family size, family income, etc.).
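A minimal sketch of this baseline computation in Python/NumPy, using the data from the table above:

```python
import numpy as np

y = np.array([4, 6, 6, 7, 8, 7, 8, 10])  # actual # of credit cards
y_bar = y.mean()                          # baseline estimate = 7.0
sst = ((y - y_bar) ** 2).sum()            # Sum of Squares Total = 22.0
print(y_bar, sst)
```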


Multiple Linear Regression Analysis
Family Number (i) | Actual # of Credit Cards (y) | Family Size (x)
1 | 4 | 2
2 | 6 | 2
3 | 6 | 4
4 | 7 | 4
5 | 8 | 5
6 | 7 | 5
7 | 8 | 6
8 | 10 | 6
We now can attempt to estimate the # of credit cards from the information on family size, rather than from its own mean. Let's first see this graphically!
Multiple Linear Regression Analysis
(Figure: actual numbers of CCs plotted against family size, with the original (baseline) estimate ŷ = ȳ drawn as a horizontal line; e.g., family 1 has x = 2, y = 4.)
QUESTION: Does the mean (ȳ) appear to represent the closest estimate of the actual CC numbers for our sample families? That is, is the mean line the best line to represent the location of the estimates of # of CCs for these families?
Multiple Linear Regression Analysis
Generic equation for any straight line: Y = a + bx
(Figure: several candidate lines ŷ = a1 + b1x, ŷ = a2 + b2x, ŷ = a3 + b3x drawn through the scatter, together with the original (baseline) estimate ŷ = ȳ, i.e., ŷ = a + 0·x. The regression line, the line of best fit, is the new improved location for the CC estimates; see the next slide.)
Multiple Linear Regression Analysis
(Figure: the regression line ŷ = a + bx, the line of best fit, as the new improved location for the CC estimates, compared with the original (baseline) estimate ȳ; the vertical distance (y - ŷ) is the estimation error.)
The regression line will minimize Σ(y - ŷ)² = the total estimation error. But how do we know the values a and b in ŷ = a + bx (the regression line)?
Multiple Linear Regression Analysis
EQUATION FOR THE REGRESSION LINE (LINE OF BEST FIT): values of a and b for the regression line:
  b̂ = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
  â = ȳ - b̂ x̄
  ŷ = â + b̂ x
Let's use the above formulas to compute the values of "a" and "b" for the regression line in our example.
We will need: ȳ, x̄, Σ(x - x̄)(y - ȳ), and Σ(x - x̄)².
Multiple Linear Regression Analysis
We need: ȳ, x̄, Σ(x - x̄)(y - ȳ), and Σ(x - x̄)²

Family (i) | y | x | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)²
1 | 4 | 2 | ? | ? | ? | ?
2 | 6 | 2 | ? | ? | ? | ?
3 | 6 | 4 | ? | ? | ? | ?
4 | 7 | 4 | ? | ? | ? | ?
5 | 8 | 5 | ? | ? | ? | ?
6 | 7 | 5 | ? | ? | ? | ?
7 | 8 | 6 | ? | ? | ? | ?
8 | 10 | 6 | ? | ? | ? | ?
ȳ = 56/8 = 7, x̄ = 34/8 = 4.25, Σ(x - x̄)(y - ȳ) = ?, Σ(x - x̄)² = ?
Multiple Linear Regression Analysis
We need: ȳ, x̄, Σ(x - x̄)(y - ȳ), and Σ(x - x̄)²

Family (i) | y | x | x - x̄ | y - ȳ | (x - x̄)(y - ȳ) | (x - x̄)²
1 | 4 | 2 | -2.25 | -3 | 6.75 | 5.0625
2 | 6 | 2 | -2.25 | -1 | 2.25 | 5.0625
3 | 6 | 4 | -0.25 | -1 | 0.25 | 0.0625
4 | 7 | 4 | -0.25 | 0 | 0 | 0.0625
5 | 8 | 5 | 0.75 | 1 | 0.75 | 0.5625
6 | 7 | 5 | 0.75 | 0 | 0 | 0.5625
7 | 8 | 6 | 1.75 | 1 | 1.75 | 3.0625
8 | 10 | 6 | 1.75 | 3 | 5.25 | 3.0625
ȳ = 56/8 = 7, x̄ = 34/8 = 4.25, Σ(x - x̄)(y - ȳ) = 17, Σ(x - x̄)² = 17.5
Multiple Linear Regression Analysis
REGRESSION LINE (LINE OF BEST FIT):
  b̂ = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² = 17 / 17.5 = 0.971
  â = ȳ - b̂ x̄ = 7 - 0.971(4.25) = 2.87
  ŷ = â + b̂ x = 2.87 + 0.971x
where 2.87 is the Y-intercept and 0.971 is the regression coefficient.
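The same slope and intercept can be reproduced in a few lines; a minimal sketch in Python/NumPy using the table's data:

```python
import numpy as np

x = np.array([2, 2, 4, 4, 5, 5, 6, 6])    # family size
y = np.array([4, 6, 6, 7, 8, 7, 8, 10])   # # of credit cards

b = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
a = y.mean() - b * x.mean()
print(a, b)   # approx 2.871, 0.971
```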
Multiple Linear Regression Analysis
(Figure: the regression line ŷ = 2.87 + 0.971x gives new, improved estimates compared with the original (baseline) estimate ȳ.)
Can we tell how much estimation error we have committed by using the new regression line? Yes: examine the differences between each household's actual # of CCs and its new regression estimate.
Multiple Linear Regression Analysis
ŷ = 2.87 + 0.97x

Family (i) | Actual # of Credit Cards (y) | Family Size (x) | Regression Estimate (ŷ) | Error/Residual (y - ŷ) | Errors Squared ((y - ŷ)²)
1 | 4 | 2 | ? | ? | ?
2 | 6 | 2 | ? | ? | ?
3 | 6 | 4 | ? | ? | ?
4 | 7 | 4 | ? | ? | ?
5 | 8 | 5 | ? | ? | ?
6 | 7 | 5 | ? | ? | ?
7 | 8 | 6 | ? | ? | ?
8 | 10 | 6 | ? | ? | ?
Σ(y - ŷ)² = ?
Multiple Linear Regression Analysis
ŷ = 2.87 + 0.97x; e.g., for family 1: ŷ = 2.87 + 0.97(2) = 4.81

Family (i) | y | x | ŷ | y - ŷ | (y - ŷ)²
1 | 4 | 2 | 4.81 | -0.81 | 0.66
2 | 6 | 2 | 4.81 | 1.19 | 1.42
3 | 6 | 4 | 6.76 | -0.76 | 0.58
4 | 7 | 4 | 6.76 | 0.24 | 0.06
5 | 8 | 5 | 7.73 | 0.27 | 0.07
6 | 7 | 5 | 7.73 | -0.73 | 0.53
7 | 8 | 6 | 8.70 | -0.70 | 0.49
8 | 10 | 6 | 8.70 | 1.30 | 1.69
Σ(y - ŷ)² = 5.486 = SSE = Sum of Squares Error (SS Residual)
Multiple Linear Regression Analysis
Total baseline error using the mean (SS Total): 22.0
New or remaining error (SS Error or SS Residual): 5.486 ≈ 5.5
QUESTION: How much of the original estimation error have we explained away (eliminated) by using the regression model (instead of the mean)?
  22 - 5.486 = 16.514 (SS Regression or SS Explained)
QUESTION: What % of the estimation error have we explained (eliminated) by using the regression model?
  R² = 16.514 / 22 = 0.751, or 75%. What is this called? It is the % of the differences in # of CCs among households that is explained by differences in their family size.
What does the remaining 25% represent? The percent of variation (differences) in the number of credit cards owned by families that can be accounted for by (a) all other potential predictors not included in the model, beyond family size, and (b) unexplainable random/chance variations.
Multiple Linear Regression Analysis
R² = SS Regression / SS Total = 16.5/22 = 75%
R² is a measure of our success regarding the accuracy of our estimation effort.
• R² = the % of estimation error that we have been able to explain away by using the regression model instead of the mean.
• R² indicates how much better we can predict Y from information about the Xs, rather than from its own mean.
• R² = the % of differences (variations) in Y values that is explained by (attributable to) differences in X values.
Note: When dealing with only two variables (a single X and Y), the Pearson correlation of Y with X1 (NOT controlling for any other variable) is
  r = √R² = √(16.514/22) = √0.75 = 0.866
Let's now examine all this graphically!
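A minimal sketch of the error decomposition in Python/NumPy, using the fitted line from the slides:

```python
import numpy as np

x = np.array([2, 2, 4, 4, 5, 5, 6, 6])
y = np.array([4, 6, 6, 7, 8, 7, 8, 10])

y_hat = 2.87 + 0.971 * x                 # regression estimates
sse = ((y - y_hat) ** 2).sum()           # remaining error, ~5.49
sst = ((y - y.mean()) ** 2).sum()        # baseline error, 22.0
r2 = (sst - sse) / sst                   # ~0.75
print(sse, sst, r2, np.sqrt(r2))         # r = sqrt(R^2), ~0.866
```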
Multiple Linear Regression Analysis
(Figure: the regression line ŷ = 2.87 + 0.97x, giving the new improved estimates, vs. the original (baseline) estimate ȳ. For a family such as F1, the total baseline error (y - ȳ) splits into a part explained by the regression model (ŷ - ȳ) and a new, unexplained error or residual (y - ŷ).)
Multiple Linear Regression Analysis
5.5 = SSE = the amount of estimation error for the 8 sample families when using simple regression (i.e., a regression model that includes only information about family size).
Can we reduce the amount of estimation error (SSE) to an even lower level and thus improve the estimation process? How?
Yes, by adding information on a second variable suspected to be strongly related to the # of credit cards (e.g., family income, X2).
Multiple Linear Regression Analysis

Family (i) | Actual # of Credit Cards (yi) | Family Size (x1) | Family Income (x2)
1 | 4 | 2 | 14
2 | 6 | 2 | 16
3 | 6 | 4 | 14
4 | 7 | 4 | 17
5 | 8 | 5 | 18
6 | 7 | 5 | 21
7 | 8 | 6 | 17
8 | 10 | 6 | 25
We now can attempt to estimate the # of CCs from our information on family size and family income! Our regression model will now be a linear plane, rather than a straight line.
Generic equation for a linear plane: ŷ = a + b1x1 + b2x2
Let's examine the regression plane for our example graphically.
(Figure: 3D plot of Y = # of credit cards against X1 = family size and X2 = family income, showing actual values and regression estimates on the fitted plane.)
ŷ = a + b1x1 + b2x2; formulas are available for computing the values of a, b1 and b2.
THE MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:
  ŷ = 0.482 + 0.63x1 + 0.216x2
Let's now see how much error in estimation we are committing by using this multiple regression model.
Multiple Linear Regression Analysis
ŷ = 0.482 + 0.63x1 + 0.216x2

Family (i) | y | x1 | x2 (Income, $000) | Regression Estimate (ŷ) | Error/Residual (y - ŷ) | (y - ŷ)²
1 | 4 | 2 | 14 | ? | ? | ?
2 | 6 | 2 | 16 | ? | ? | ?
3 | 6 | 4 | 14 | ? | ? | ?
4 | 7 | 4 | 17 | ? | ? | ?
5 | 8 | 5 | 18 | ? | ? | ?
6 | 7 | 5 | 21 | ? | ? | ?
7 | 8 | 6 | 17 | ? | ? | ?
8 | 10 | 6 | 25 | ? | ? | ?
Σ(y - ŷ)² = ?
Multiple Linear Regression Analysis
ŷ = 0.482 + 0.63x1 + 0.216x2; e.g., for family 1: ŷ = 0.482 + 0.63(2) + 0.216(14) = 4.77

Family (i) | y | x1 | x2 (Income, $000) | ŷ | y - ŷ | (y - ŷ)²
1 | 4 | 2 | 14 | 4.77 | -0.77 | 0.59
2 | 6 | 2 | 16 | 5.20 | 0.80 | 0.64
3 | 6 | 4 | 14 | 6.03 | -0.03 | 0.00
4 | 7 | 4 | 17 | 6.68 | 0.32 | 0.10
5 | 8 | 5 | 18 | 7.53 | 0.47 | 0.22
6 | 7 | 5 | 21 | 8.18 | -1.18 | 1.39
7 | 8 | 6 | 17 | 7.95 | 0.05 | 0.00
8 | 10 | 6 | 25 | 9.67 | 0.33 | 0.11
SSE = Sum of Squares Error (Residual) = Σ(y - ŷ)² = 3.05
Unique (additional) contribution of X2 (family income) beyond X1: 5.5 - 3.05 = 2.45
Multiple Linear Regression Analysis
THE MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:
  ŷ = 0.482 + 0.63x1 + 0.216x2
0.482 = the Y-intercept "a". (NOTE: only when all Xs can meaningfully take on the value zero does the intercept have a meaningful/direct/practical interpretation. Otherwise, it is simply an aid in increasing the accuracy of estimation.)
b1 and b2 = the regression coefficients:
• 0.63: among families of the same income, an increase in family size by one person would, on average, result in 0.63 more credit cards.
• 0.216: among families of the same size, an income increase of $1,000 results in an average increase of about 0.2 credit cards.
The "b"s represent the effect of each X on Y when all other Xs are controlled for/held constant/taken into account, i.e., after the impacts of all other variables are accounted for (remember the high blood pressure-hearing problem connection?).
Multiple Linear Regression Analysis
THE MULTIPLE REGRESSION MODEL FOR OUR EXAMPLE:
  ŷ = 0.482 + 0.63x1 + 0.216x2
SST = 22, SSE = 3.05. What is our new R²?
  SS Regression = 22 - 3.05 = 18.95
  R² = 18.95 / 22 = 0.861, or 86%: the percent of differences in households' number of CCs that is explained by differences in family size and family income.
The remaining 14% (3.05 / 22 = 0.14)? The percent of variation in the number of credit cards that can be accounted for by (a) all other relevant factors not included in the model, beyond family size and income, and (b) unexplainable random/chance variations.
Multiple Linear Regression Analysis
(Venn diagram: the total variation in Y = # of CCs is split into regions a, b, c, d: a is the part explained only by X1 = family size, b the part explained only by X2 = family income, c the part explained by both, and d the unexplained part.)
Total variation/error in Y = SS Total = a + b + c + d = 22
Simple regression of Y on X1: ŷ = 2.87 + 0.97X1, with SSR = a + c = 16.5
  r²(Y, X1) = (a + c)/(a + b + c + d) = 16.5/22 = 0.75
What do we call the square root of this? The Pearson/simple correlation of Y with X1 (not controlling for X2):
  r(Y, X1) = √0.75 = 0.867
Simple regression of Y on X2: ŷ = 0.063 + 0.398X2, with SSR = b + c = 15.12
  r²(Y, X2) = (b + c)/(a + b + c + d) = 15.12/22 = 0.687
The Pearson/simple correlation of Y with X2 (not controlling for X1):
  r(Y, X2) = √0.687 = 0.829
Multiple Linear Regression Analysis
  ŷ = 0.482 + 0.63x1 + 0.216x2
R², graphically? (NOTE: region c is explained by both X1 and X2.)
SSR = a + b + c = 18.95
SST = a + b + c + d = 22
R² = SSR / SST = (a + b + c)/(a + b + c + d) = 18.95/22 = 86%
SSE = ? SSE = d = 22 - 18.95 = 3.05
Inference on Regression Coefficients
General regression model:
  Y = β0 + β1X + ε
1. β0 and β1 are parameters.
2. X is an independent variable.
3. The deviations ε are independent N(0, σ²).
Inference on Regression Coefficients
We will write the estimated regression line based on sample data as
  ŷ = b0 + b1x
The method of least squares chooses the values of b0 and b1 that minimize the sum of squared errors
  SSE = Σ (yi - ŷi)² = Σ (yi - b0 - b1xi)²
Inference on Regression Coefficients
Example: The weekly advertising expenditure (X) and weekly sales (Y) are presented in the following table.

Y | X
1250 | 41
1380 | 54
1425 | 63
1425 | 54
1450 | 48
1300 | 46
1400 | 62
1510 | 61
1575 | 64
1650 | 71
Inference on Regression Coefficients
Using the principle of least squares, we obtain the following estimates:
  b̂1 = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)² = [n ΣXiYi - ΣXi ΣYi] / [n ΣXi² - (ΣXi)²], or b̂1 = r (Sy/Sx)
and
  b̂0 = Ȳ - b̂1X̄, so that Ŷ = b̂0 + b̂1X
From the previous table we have:
  n = 10, ΣX = 564, ΣX² = 32604, ΣY = 14365, ΣXY = 818755
The least squares estimates of the regression coefficients are:
  b̂1 = [n ΣXY - ΣX ΣY] / [n ΣX² - (ΣX)²] = [10(818755) - (564)(14365)] / [10(32604) - (564)²] = 10.8
  b̂0 = 1436.5 - 10.8(56.4) = 828
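A minimal sketch of the same computation in Python/NumPy, starting from the raw table rather than the precomputed sums:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71])  # weekly ad expenditure
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650])

n = len(x)
b1 = (n * (x * y).sum() - x.sum() * y.sum()) / (n * (x ** 2).sum() - x.sum() ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)   # approx 828, 10.8
```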
The estimated regression function is:
  Ŷ = 828 + 10.8X
  Sales = 828 + 10.8 Expenditure
This means that if the weekly advertising expenditure is increased by Rs. 1, we would expect the weekly sales to increase by Rs. 10.8.
Fitted values for the sample data are obtained by substituting the X value into the estimated regression function. For example, if the advertising expenditure is Rs. 50, then the estimated sales is:
  Sales = 828 + 10.8(50) = 1368
This is called the point estimate (forecast) of the mean response (sales).
Example: weekly advertising expenditure

Y | X | Ŷ | Residual (e)
1250 | 41 | 1270.8 | -20.8
1380 | 54 | 1411.2 | -31.2
1425 | 63 | 1508.4 | -83.4
1425 | 54 | 1411.2 | 13.8
1450 | 48 | 1346.4 | 103.6
1300 | 46 | 1324.8 | -24.8
1400 | 62 | 1497.6 | -97.6
1510 | 61 | 1486.8 | 23.2
1575 | 64 | 1519.2 | 55.8
1650 | 71 | 1594.8 | 55.2
The variance σ² of the error terms εi in the regression model needs to be estimated for a variety of purposes.
It gives an indication of the variability of the probability distributions of Y.
It is needed for making inferences concerning the regression function and the prediction of Y.
To estimate σ we work with the variance and take the square root to obtain the standard deviation.
For simple linear regression the estimate of σ² is the average squared residual:
  s²y.x = (1/(n - 2)) Σ ei² = (1/(n - 2)) Σ (Yi - Ŷi)²
To estimate σ, use sy.x = √(s²y.x).
sy.x estimates the standard deviation σ of the error term ε in the statistical model for simple linear regression.
Y | X | Ŷ | Residual (e) | e²
1250 | 41 | 1270.8 | -20.8 | 432.64
1380 | 54 | 1411.2 | -31.2 | 973.44
1425 | 63 | 1508.4 | -83.4 | 6955.56
1425 | 54 | 1411.2 | 13.8 | 190.44
1450 | 48 | 1346.4 | 103.6 | 10732.96
1300 | 46 | 1324.8 | -24.8 | 615.04
1400 | 62 | 1497.6 | -97.6 | 9525.76
1510 | 61 | 1486.8 | 23.2 | 538.24
1575 | 64 | 1519.2 | 55.8 | 3113.64
1650 | 71 | 1594.8 | 55.2 | 3047.04
Ŷ = 828 + 10.8X; total Σ e² = 36124.76
sy.x = √(36124.76 / 8) = 67.198
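A minimal sketch of the residual standard deviation computation in Python/NumPy:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71])
y = np.array([1250, 1380, 1425, 1425, 1450, 1300, 1400, 1510, 1575, 1650])

y_hat = 828 + 10.8 * x
sse = ((y - y_hat) ** 2).sum()        # 36124.76
s_yx = np.sqrt(sse / (len(x) - 2))    # 67.198, estimates sigma
print(sse, s_yx)
```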
Confidence Intervals and Significance Tests
In our previous lectures we presented confidence intervals and significance tests for means and differences in means. In each case, inference rested on the standard error s of the estimates and on t or z distributions.
Inference for the slope and intercept in linear regression is similar in principle, although the recipes are more complicated.
All confidence intervals, for example, have the form
  estimate ± t* SE(estimate)
where t* is a critical value of a t distribution.
Confidence Intervals and Significance Tests
Confidence intervals and tests for the slope and intercept are based on the sampling distributions of the estimates b1 and b0.
Here are the facts:
If the simple linear regression model is true, each of b0 and b1 has a Normal distribution.
The mean of b0 is β0 and the mean of b1 is β1. That is, the intercept and slope of the fitted line are unbiased estimators of the intercept and slope of the population regression line.
Confidence Intervals and Significance Tests
The standard deviations of b0 and b1 are multiples of the model standard deviation σ:
  SE(b1) = S(b1) = s / √(Σ(Xi - X̄)²)
  SE(b0) = S(b0) = s √(1/n + X̄² / Σ(Xi - X̄)²)
Confidence Intervals and Significance Tests
Let us return to the weekly advertising expenditure and weekly sales example. Management is interested in testing whether or not there is a linear association between advertising expenditure and weekly sales, using the regression model. Use α = 0.05.
Hypotheses: H0: β1 = 0 vs. Ha: β1 ≠ 0
Test statistic: t = b1 / S(b1), compared with t(α/2; n-2)
Decision rule: Reject H0 if t ≥ t(.025; 8) = 2.306 or t ≤ -t(.025; 8) = -2.306
Confidence Intervals and Significance Tests
Test statistic:
  S(b1) = sy.x / √(Σ(x - x̄)²) = 67.2 / √794.4 = 2.38
  t = b1 / S(b1) = 10.8 / 2.38 = 4.5
Conclusion: Since t = 4.5 > 2.306, we reject H0.
There is a linear association between advertising expenditure and weekly sales.
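A minimal sketch of the slope test in Python/NumPy, with the values taken from the slides:

```python
import numpy as np

x = np.array([41, 54, 63, 54, 48, 46, 62, 61, 64, 71])
b1, s_yx = 10.8, 67.2

se_b1 = s_yx / np.sqrt(((x - x.mean()) ** 2).sum())  # 67.2/sqrt(794.4), ~2.38
t = b1 / se_b1                                       # ~4.5 > 2.306, reject H0
print(se_b1, t)
```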
Confidence Intervals and Significance Tests
  b1 ± t(α/2; n-2) S(b1)
Now that the test has shown that there is a linear association between advertising expenditure and weekly sales, management wishes an estimate of β1 with a 95% confidence coefficient.
Confidence Intervals and Significance Tests
For a 95 percent confidence coefficient, we require t(.025; 8). From Table B in Appendix III, we find t(.025; 8) = 2.306.
The 95% confidence interval is:
  b1 ± t(α/2; n-2) S(b1)
  = 10.8 ± 2.306(2.38)
  = 10.8 ± 5.49 = (5.31, 16.3)
Analysis of Variance Approach to Regression Analysis
Analysis of variance is the term for statistical analyses that break down the variation in data into separate pieces that correspond to different sources of variation.
It is based on the partitioning of the sums of squares and degrees of freedom associated with the response variable.
In the regression setting, the observed variation in the responses (yi) comes from two sources.
Analysis of Variance Approach to Regression Analysis
The breakdown of the total sum of squares and associated degrees of freedom is displayed in a table called the analysis of variance (ANOVA) table:

Source of variation | df | SS | MSS | F-ratio
Regression | 1 | SSR | MSR = SSR/1 | MSR/MSE
Error | n-2 | SSE | MSE = SSE/(n-2) |
Total | n-1 | SST | |
Analysis of Variance Approach to Regression Analysis
For the advertising example, the ANOVA table is:

Source of variation | df | SS | MSS | F-ratio
Regression | 1 | 92427.74 | 92427.74 | 20.47
Error | 8 | 36124.76 | 4515.6 |
Total | 9 | 128552.5 | |
Analysis of Variance Approach to Regression Analysis
F test for β1 = 0 versus β1 ≠ 0
Equivalence of the F test and the t test: for a given α level, the F test of β1 = 0 versus β1 ≠ 0 is algebraically equivalent to the two-sided t test (F = t² up to rounding: 20.47 ≈ 4.5²).
Thus, at a given α level, we can use either the t test or the F test for testing β1 = 0 versus β1 ≠ 0.
The t test is more flexible, since it can also be used for one-sided tests.
Multiple Regression Analysis
Population model:
  Y = β0 + β1X1 + β2X2 + … + βkXk + εi
Sample model:
  Y = b0 + b1X1 + b2X2 + … + bkXk + ei
Multiple Regression Analysis
To test whether the explanatory variables collectively have an effect on y, we test
  H0: β1 = β2 = … = βk = 0 (i.e., y is independent of all the explanatory variables)
  Ha: at least one βi ≠ 0 (at least one explanatory variable has an effect on y, controlling for the others in the model)
This is equivalent to testing
  H0: population multiple correlation = 0 (or population R² = 0)
  vs. Ha: population multiple correlation > 0
Multiple Regression Analysis
Test statistic (with k explanatory variables):
  F = (R²/k) / [(1 - R²)/(n - (k + 1))]
  df1 = k (the number of explanatory variables in the model)
  df2 = n - (k + 1) (sample size minus the number of model parameters)
When H0 is true, the F values follow the F distribution (R. A. Fisher).
A larger R² gives a larger F test statistic, hence more evidence against the null hypothesis.
Since a larger F gives stronger evidence against the null, the P-value is the right-tail probability above the observed value.
Example with two predictor variables
H0: β1 = β2 = 0 (i.e., y is independent of x1 and x2)
H1: β1 ≠ 0 or β2 ≠ 0 or both
Test statistic:
  F = (R²/k) / [(1 - R²)/(n - (k + 1))] = (0.861/2) / [(1 - 0.861)/(8 - 3)] = 15.486
For df1 = 2, df2 = 5, Fobs = 15.486, P < 0.01.
There is very strong evidence that at least one of the explanatory variables is associated with the # of credit cards.
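A minimal sketch of the overall F test in Python, where the SciPy call computes the right-tail probability (R², k, n are taken from the example):

```python
from scipy import stats

r2, k, n = 0.861, 2, 8
F = (r2 / k) / ((1 - r2) / (n - (k + 1)))  # ~15.49
p = stats.f.sf(F, k, n - (k + 1))          # right-tail P-value, ~0.007
print(F, p)
```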
Inferences for individual regression coefficients (Do we need all predictors in the model?)
To test the partial effect of xi, controlling for the other explanatory variables in the model, we test
  H0: βi = 0 vs. H1: βi ≠ 0
using the test statistic t = (bi - 0)/SE(bi) with df = n - (k + 1),
Inferences for individual regression coefficients
which is df2 from the F test (and appears in the df column of the ANOVA table, in the Residual row).
The CI for βi has the form bi ± tα/2 SE(bi), with the t-score from the t table also having df = n - (k + 1), for the desired confidence level.
Software provides estimates, standard errors, t test statistics, and P-values for the tests (two-sided by default).
Inference on correlation coefficients
Significance test: H0: ρ = 0 vs. H1: ρ ≠ 0
To test whether the relation is merely apparent, and/or might have arisen by chance, the t test is applied:
  t = r √((n - 2)/(1 - r²))
which follows a t-distribution with n - 2 degrees of freedom.
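A minimal sketch of this correlation test in Python/NumPy, using r = 0.866 and n = 8 from the credit-card example:

```python
import numpy as np

r, n = 0.866, 8
t = r * np.sqrt((n - 2) / (1 - r ** 2))  # follows t with n-2 = 6 df under H0
print(t)                                  # ~4.24
```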
Logistic Regression Analysis

When and Why

(Data: sample records from a study of suicide attempts)

Sl No | Outcome of suicide | Age (yrs) | Sex | Marital status | Cause of suicide | Religion | SES | Occupation | Method of attempt | Time of event | Time interval between attempt to suicide and bringing to hospital
1 | Died | 19 | Female | Married | Dowry death | Hindu | Middle | Housewife | Hanging | Morning | 30 minutes
2 | Survived | 20 | Male | Unmarried | Failure in studies | Hindu | Middle | Student | Poison | | 30 minutes
3 | Died | 21 | Male | Unmarried | Depression | Hindu | Lower | Painting/Coolie | Electrical | Night | Time of death unknown; shifted to hospital for postmortem
4 | Died | 21 | Male | Unmarried | Depression | Hindu | Middle | Manager in cane juice centre | Hanging | Morning | 25 minutes
5 | Died | 17 | Male | Unmarried | Failure in studies | Hindu | Middle | Student | Hanging | Night | Time of death unknown; shifted to hospital for postmortem
6 | Died | 20 | Male | Unmarried | Problem at work place | Hindu | Lower | Coolie | Electrical | Evening | Time of death unknown; shifted to hospital for postmortem
7 | Survived | 19 | Female | Unmarried | Pain abdomen | Hindu | Upper Middle | Student | Poison | | 70 minutes
8 | Survived | 16 | Female | Unmarried | Depression | Jain | Middle | Student | Hanging | | 40 minutes
9 | Survived | 20 | Male | Unmarried | Depression | Hindu | Middle | Student | Fall from height | | 35 minutes
10 | Died | 20 | Male | Unmarried | Love failure | Hindu | Middle | Student | Hanging | Morning | Time of death unknown; shifted to hospital for postmortem
Logistic Regression Analysis
Introduction
In linear regression models it is assumed that the dependent variable Y is quantitative (continuous/discrete) and normally distributed.
But in many instances the dependent variable will not be quantitative; instead, it may be categorical.
If the dependent variable is categorical, it violates the assumptions of normal linear regression.
Logistic Regression Analysis
In many regression settings, the Y variable is binary (0, 1).
A few examples:
• A consumer chooses a brand (1) or not (0);
• A quality defect occurs (1) or not (0);
• A person is hired (1) or not (0);
• Evacuate home during a hurricane (1) or not (0);
• Other examples???
Logistic Regression Analysis
(Figure: scatterplot with Y = (0, 1): Y = hired/not hired, X = experience; the points lie along the two horizontal levels Y = 0 and Y = 1.)
Logistic Regression Analysis
The Linear Probability Model (LPM)
If we estimate the slope using OLS regression:
  Hired = α + β Income + e
The result is called a "Linear Probability Model":
• The predicted values are probabilities that Y equals 1;
• The equation is linear: the slope is constant.
Logistic Regression Analysis
Picture of the LPM
(Figure: scatterplot with Y = (0, 1): Y = hired/not hired, X = experience. The LPM regression line has a constant slope coefficient; points on the regression line represent predicted probabilities of Y for each value of X.)
Logistic Regression Analysis
An Example: Loan Approvals
Data:
Dependent variable:
  Loaned = 1 if loan approved, 0 if not approved by Bank Z
Independent variables:
  ROA = net income as % of total assets of applicant;
  Debt = debt as % of total assets of applicant;
  Officer = 1 if loan handled by loan officer A, 0 if handled by officer B
Logistic Regression Analysis
Weaknesses of the Linear Probability Model (LPM)
• The predicted probabilities can be greater than 1 or less than 0.
  • Probabilities, by definition, have max = 1 and min = 0;
  • This is not a big issue if they are very close to 0 and 1.
• The error terms vary based on the size of the X variable ("heteroskedastic");
  • There may be models that have lower variance, and are thus more "efficient".
• The errors are not normally distributed because Y takes on only two values.
  • This creates problems for inference;
  • More of an issue for statistical theorists.
Logistic Regression Analysis
Picture of the LPM fused with an S-type curve
(Figure: the same scatterplot with Y = (0, 1): Y = hired/not hired, X = experience, with the straight LPM regression line overlaid by an S-shaped curve; points on the line/curve represent predicted probabilities of Y for each value of X.)
Logistic Regression Analysis
Model development
The model that describes the S-type curve is as follows.
Let 'p' be the probability that the event Y occurs, i.e., P(Y = 1).
Let '1 - p' be the probability that the event Y does not occur, i.e., P(Y = 0).
Logistic Regression Analysis
Model development
The estimated probability P(Y) with one predictor variable is given by
  P(Y) = 1 / (1 + e^-(β0 + β1X1 + ε))
and
  1 - P(Y) = 1 - 1/(1 + e^-(β0 + β1X1 + ε)) = 1 / (1 + e^(β0 + β1X1 + ε))
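A minimal sketch of this S-type curve in Python/NumPy (the coefficients b0 = -4 and b1 = 0.8 are purely illustrative, not from the slides):

```python
import numpy as np

def logistic(x, b0=-4.0, b1=0.8):
    # P(Y) = 1 / (1 + exp(-(b0 + b1*x))), the S-type curve
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

x = np.linspace(0, 10, 6)
print(logistic(x))  # probabilities rise from near 0 toward 1 along an S-curve
```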
Logistic Regression Analysis
Model development
The ratio P(Y) / (1 - P(Y)) is called the odds (a ratio of two probabilities), and is given by
  P(Y) / (1 - P(Y)) = e^(β0 + β1X1 + ε)
Logistic Regression Analysis
Relationship between Odds and Probability
  Odds(Event) = Probability(Event) / (1 - Probability(Event))
  Probability(Event) = Odds(Event) / (1 + Odds(Event))
Logistic Regression Analysis
The Odds Ratio
Definition of odds ratio: the ratio of two odds estimates.
So, if Pr(response | trt) = 0.40 and Pr(response | placebo) = 0.20, then:
  Odds(response | trt group) = 0.40 / (1 - 0.40) = 0.667
  Odds(response | placebo group) = 0.20 / (1 - 0.20) = 0.25
  OR(trt vs. placebo) = 0.667 / 0.25 = 2.67
Logistic Regression Analysis
Model development
The logarithm of P(Y) / (1 - P(Y)) is called the logit.
Hence,
  Ln[P(Y) / (1 - P(Y))] = β0 + β1X1 + ε
is called the logistic regression model.
Logistic Regression Analysis
The solution of the logistic regression model can be obtained by the maximum likelihood method.
However, direct estimation may be difficult because of the complexity of the function, and the model is solved iteratively using computers.
Logistic Regression Analysis
Confusion matrix

Sex | Outcome: Survived | Outcome: Died | Total
Male | 6 | 41 | 47
Female | 14 | 52 | 66
Total | 20 | 93 | 113
Logistic Regression Analysis
With the standard 2×2 layout (a = true positives, b = false positives, c = false negatives, d = true negatives, n = a + b + c + d):
  Sensitivity = a/(a + c) × 100
  Specificity = d/(b + d) × 100
  Positive predictive value = a/(a + b) × 100
  Negative predictive value = d/(c + d) × 100
  Accuracy = (a + d)/n × 100
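A minimal sketch of these measures as a Python function; the mapping of the suicide table's cells onto a, b, c, d below is illustrative, since it depends on which outcome is treated as "positive":

```python
def metrics(a, b, c, d):
    # a = true positives, b = false positives,
    # c = false negatives, d = true negatives
    n = a + b + c + d
    return {
        "sensitivity": 100 * a / (a + c),
        "specificity": 100 * d / (b + d),
        "ppv":         100 * a / (a + b),
        "npv":         100 * d / (c + d),
        "accuracy":    100 * (a + d) / n,
    }

print(metrics(a=6, b=41, c=14, d=52))  # illustrative cell assignment
```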
