R.Venkatesakumar
Department of Management Studies (SOM)
Pondicherry University
Why do we use multiple regression analysis?
An Illustration of Multiple Regression
Number of Credit Cards (Y)   Family Size (V1)   Family Income (V2)   Number of Cars (V3)
4                            2                  14                   1
6                            2                  16                   2
6                            4                  14                   2
7                            4                  17                   1
8                            5                  18                   3
7                            5                  21                   2
8                            6                  17                   1
10                           6                  25                   2
Mean representation
• If the mean alone is used as the 'representative for the data', then
  µ = Total / Number of observations = 56 / 8 = 7
• The sum of squared errors in this representation of the data is 22.

Number of Credit Cards (Y)   Y - µ   (Y - µ)²
4                            -3      9
6                            -1      1
6                            -1      1
7                             0      0
8                             1      1
7                             0      0
8                             1      1
10                            3      9
Total                         0      22
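The arithmetic behind the mean representation can be checked with a few lines of Python. This is only a sketch using the eight Y values from the illustration; the variable names are chosen here for readability and are not part of the slides.

```python
# Number of credit cards (Y) for the eight families in the illustration.
y = [4, 6, 6, 7, 8, 7, 8, 10]

mean_y = sum(y) / len(y)                  # 56 / 8
errors = [yi - mean_y for yi in y]        # Y - mean for each family
sse_mean = sum(e ** 2 for e in errors)    # sum of squared errors around the mean

print(mean_y, sum(errors), sse_mean)
```

The deviations always sum to zero, which is why the squared deviations are used as the measure of error.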
How to reduce ‘error’?
                               Y        V1       V2       V3
Number of Credit Cards (Y)     1.0000
Family Size (V1)               0.8664   1.0000
Family Income (V2)             0.8290   0.6727   1.0000
Number of Cars (V3)            0.3419   0.1917   0.3008   1.0000
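The correlation matrix can be reproduced from the illustration data. The sketch below uses a hand-written Pearson correlation so that it needs only the standard library; the function name `pearson` is an illustrative choice.

```python
from math import sqrt

# Data from the credit card illustration.
y  = [4, 6, 6, 7, 8, 7, 8, 10]           # number of credit cards
v1 = [2, 2, 4, 4, 5, 5, 6, 6]            # family size
v2 = [14, 16, 14, 17, 18, 21, 17, 25]    # family income
v3 = [1, 2, 2, 1, 3, 2, 1, 2]            # number of cars

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    s_ab = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    s_aa = sum((p - mean_a) ** 2 for p in a)
    s_bb = sum((q - mean_b) ** 2 for q in b)
    return s_ab / sqrt(s_aa * s_bb)

# Print the lower triangle of the correlation matrix.
variables = {"Y": y, "V1": v1, "V2": v2, "V3": v3}
names = list(variables)
for i, row in enumerate(names):
    cells = [f"{pearson(variables[row], variables[col]):.4f}" for col in names[: i + 1]]
    print(row, *cells)
```

Family size (V1) has the highest correlation with Y, which is why it is the natural first predictor for reducing the error.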
Prediction accuracy of regression
Prediction using Y = 2.871 + 0.971 V1
Mathematics of regression
R² (coefficient of determination) = Explained sum of squares / Total sum of squares = 16.514 / 22 = 0.7506
Correlation squared = 0.8664 × 0.8664 = 0.7506
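As a check on these figures, the following sketch fits the simple regression of Y on family size (V1) by least squares and recomputes the sums of squares. It assumes V1 is the predictor, since V1 has the highest correlation with Y; variable names are illustrative.

```python
y  = [4, 6, 6, 7, 8, 7, 8, 10]   # number of credit cards
v1 = [2, 2, 4, 4, 5, 5, 6, 6]    # family size

n = len(y)
mean_y, mean_x = sum(y) / n, sum(v1) / n

s_xy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(v1, y))
s_xx = sum((x - mean_x) ** 2 for x in v1)

b1 = s_xy / s_xx                 # slope
b0 = mean_y - b1 * mean_x        # intercept

predicted = [b0 + b1 * x for x in v1]
sst = sum((yi - mean_y) ** 2 for yi in y)                # total sum of squares
sse = sum((yi - p) ** 2 for yi, p in zip(y, predicted))  # unexplained
ssr = sst - sse                                          # explained by regression
r_squared = ssr / sst

print(round(b0, 3), round(b1, 3), round(ssr, 3), round(r_squared, 4))
```

The explained sum of squares divided by the total of 22 gives the same value as the squared correlation between Y and V1.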
Excel / SPSS output
Regression Statistics
Multiple R      0.866400225
R Square        0.750649351
Observations    8

ANOVA
         df   SS   MS   F   Significance F
Total    7    22
Prediction using
Y = 0.482 + 0.63 V1 + 0.216 V2
Totals of the prediction errors and squared prediction errors: 0 and 3.050136
R² = 0.8614
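The two-predictor equation can be verified from the illustration data. The sketch below solves the least-squares normal equations for two predictors in closed form (using centered sums of squares and cross-products) and also evaluates the rounded equation printed on the slide. The helper name `centered_sum` is an illustrative choice.

```python
y  = [4, 6, 6, 7, 8, 7, 8, 10]           # number of credit cards
v1 = [2, 2, 4, 4, 5, 5, 6, 6]            # family size
v2 = [14, 16, 14, 17, 18, 21, 17, 25]    # family income

def centered_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))

s11, s22, s12 = centered_sum(v1, v1), centered_sum(v2, v2), centered_sum(v1, v2)
s1y, s2y, syy = centered_sum(v1, y), centered_sum(v2, y), centered_sum(y, y)

# Closed-form least-squares solution for two predictors.
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det
b0 = sum(y) / len(y) - b1 * sum(v1) / len(v1) - b2 * sum(v2) / len(v2)

sse_exact = sum((yi - (b0 + b1 * a + b2 * b)) ** 2 for yi, a, b in zip(y, v1, v2))
r_squared = 1 - sse_exact / syy

# Squared error when the rounded coefficients from the slide are used instead.
sse_rounded = sum((yi - (0.482 + 0.63 * a + 0.216 * b)) ** 2
                  for yi, a, b in zip(y, v1, v2))

print(round(b0, 3), round(b1, 3), round(b2, 3))
print(round(sse_exact, 4), round(sse_rounded, 6), round(r_squared, 4))
```

Evaluating the rounded equation reproduces the squared-error total of 3.050136; the unrounded least-squares fit gives a marginally smaller value, as expected.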
The logic…
Variance accounted for by entering V2:
Error reduction = Remaining variance × Partial correlation squared
= 0.2494 × (0.666 × 0.666)
= 0.11062
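The partial correlation of 0.666 is not derived on the slide. A minimal sketch of where it comes from, assuming the standard first-order partial correlation formula and the rounded correlations from the correlation matrix (0.8664, 0.8290, 0.6727), is:

```python
from math import sqrt

# Simple correlations taken from the correlation matrix in the illustration.
r_y1 = 0.8664   # Y with V1
r_y2 = 0.8290   # Y with V2
r_12 = 0.6727   # V1 with V2

# Partial correlation of Y and V2, holding V1 constant.
partial_y2_1 = (r_y2 - r_y1 * r_12) / sqrt((1 - r_y1 ** 2) * (1 - r_12 ** 2))

remaining_variance = 1 - r_y1 ** 2                       # unexplained after V1
error_reduction = remaining_variance * partial_y2_1 ** 2 # extra share explained by V2
r_squared_total = r_y1 ** 2 + error_reduction            # R² with V1 and V2

print(round(partial_y2_1, 3), round(error_reduction, 4), round(r_squared_total, 4))
```

Small differences in the last decimal places relative to the slide come from rounding of the input correlations.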
Stage -1
Two objectives associated with prediction
Appropriate for statistical relationships,
not functional relationships
• Statistical relationships assume that more than one value of the dependent variable will be observed for any value of the independent variables.
  - An average value is estimated, and error is expected in prediction.
• Functional relationships assume that a single value of the dependent variable will be observed for any value of the independent variables.
  - An exact estimate is made, with no error.
Selection of Dependent and Independent
Variables
Specification error
• The inclusion of an independent variable must be guided by the theoretical foundation of the regression model and its managerial implications.
• A variable that happens by chance to be statistically significant, but has no theoretical or managerial relationship with the dependent variable, is of no use to the researcher in explaining the phenomenon under observation.
• Researchers must be concerned with specification error: the inclusion of irrelevant variables or the omission of relevant variables.
• Parsimony in the regression model: use the fewest independent variables with the greatest contribution to the variance explained.
Stage -2
Research Design
• Non-linear relationships:
  - Arithmetic transformations (e.g. square root or logarithm) and polynomials are most often used to represent non-linear relationships.
• Moderator effects:
  - Reflect the changing nature of one independent variable's relationship with the dependent variable as a function of another independent variable.
  - Represented as a compound variable in the regression equation.
  - Moderators change the interpretation of the regression coefficients. To determine the total effect of an independent variable, the separate and the moderated effects must be combined.
• Nonmetric variable inclusion:
  - Dichotomous variables, also known as dummy variables, may be used to replace nonmetric independent variables.
  - The resulting coefficients represent the differences in group means from the comparison group and are in the same units as the dependent variable.
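The three kinds of created variables can be sketched in code. The example reuses the credit card data; the home-ownership variable `owner` and the coefficients passed to `total_effect` are hypothetical and invented only for illustration, they do not come from the slides.

```python
y  = [4, 6, 6, 7, 8, 7, 8, 10]           # number of credit cards
v1 = [2, 2, 4, 4, 5, 5, 6, 6]            # family size
v2 = [14, 16, 14, 17, 18, 21, 17, 25]    # family income

# HYPOTHETICAL nonmetric variable (not in the source data): home ownership.
owner = ["no", "no", "no", "yes", "yes", "no", "yes", "yes"]

v2_squared = [x ** 2 for x in v2]                  # polynomial term (non-linear effect)
v1_x_v2 = [a * b for a, b in zip(v1, v2)]          # compound variable (moderator effect)
d_owner = [1 if o == "yes" else 0 for o in owner]  # dummy; "no" is the comparison group

def total_effect(b_main, b_moderated, moderator_value):
    """Total effect of a moderated variable: separate plus moderated effect."""
    return b_main + b_moderated * moderator_value

# Regressing Y on the dummy alone: the slope equals the difference in group means.
n = len(y)
mean_d, mean_y = sum(d_owner) / n, sum(y) / n
slope = (sum((d - mean_d) * (yi - mean_y) for d, yi in zip(d_owner, y))
         / sum((d - mean_d) ** 2 for d in d_owner))
intercept = mean_y - slope * mean_d

mean_yes = sum(yi for yi, d in zip(y, d_owner) if d == 1) / sum(d_owner)
mean_no = sum(yi for yi, d in zip(y, d_owner) if d == 0) / (n - sum(d_owner))
print(slope, mean_yes - mean_no, intercept, mean_no)
```

With a single dummy, the intercept is the mean of the comparison group and the coefficient is the difference between the two group means, in the units of the dependent variable.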
Stage -3
Assumptions
Assumptions in Multiple Regression Analysis
3.1 Assessment of individual variables versus the variate
3.2 Linearity of the phenomenon
Stage -4
Estimating & Model fit
assessment
Estimating the regression model
and assessing overall fit
• The forward method begins with no variables in the equation and then adds variables that satisfy the F-to-enter test.
  - The equation is estimated again and the F-to-enter of the remaining variables is calculated.
  - This is repeated until the F-to-enter test finds no variables to enter.
• Backward elimination begins with all variables in the regression equation and then eliminates any variables that satisfy the F-to-remove test.
  - The same repetition of estimation is performed as with forward estimation.
• Stepwise estimation is a combination of the forward and backward methods.
  - It begins with no variables in the equation, as with forward estimation, and then adds variables that satisfy the F test.
  - The equation is estimated again and additional variables that satisfy the F test are entered.
  - At each re-estimation stage, however, the variables already in the equation are also examined for removal by the appropriate F test.
  - This repetition continues until neither F test is satisfied by any of the variables either in or out of the regression equation.
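The forward method described above can be sketched as follows. This is a minimal illustration, not the procedure of any particular package: the function names and the F-to-enter threshold of 3.84 are assumptions made here, and the least-squares fit is done with a small hand-written solver to keep the example self-contained.

```python
def ols_sse(columns, y):
    """Residual sum of squares from regressing y on an intercept plus the columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    b = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(bj * xj for bj, xj in zip(b, X[r]))) ** 2 for r in range(n))

def forward_select(candidates, y, f_to_enter=3.84):
    """Add, one at a time, the variable with the largest F-to-enter above the threshold."""
    selected, n = [], len(y)
    sse_current = ols_sse([], y)          # intercept-only model (the mean)
    while True:
        best = None
        for name in candidates:
            if name in selected:
                continue
            cols = [candidates[s] for s in selected + [name]]
            sse_new = ols_sse(cols, y)
            df_resid = n - len(cols) - 1
            f_value = (sse_current - sse_new) / (sse_new / df_resid)
            if best is None or f_value > best[1]:
                best = (name, f_value, sse_new)
        if best is None or best[1] < f_to_enter:
            return selected
        selected.append(best[0])
        sse_current = best[2]

y = [4, 6, 6, 7, 8, 7, 8, 10]
candidates = {"V1": [2, 2, 4, 4, 5, 5, 6, 6],
              "V2": [14, 16, 14, 17, 18, 21, 17, 25],
              "V3": [1, 2, 2, 1, 3, 2, 1, 2]}
print(forward_select(candidates, y))
```

On the credit card data, V1 enters first and V2 second, while V3 does not meet the assumed threshold. Backward elimination and stepwise estimation would add a matching F-to-remove check on the variables already in the equation.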
• Combinatorial Methods
  - The combinatorial approach estimates regression equations for all possible subsets of the independent variables (all-possible-subsets regression).
  - Combinatorial methods become impractical for very large sets of independent variables.
  - For example, with even 10 independent variables, one would have to estimate 1024 regression equations.
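The count of 1024 follows from enumerating every subset of 10 variables. A short sketch, with placeholder variable names V1 to V10 assumed for illustration:

```python
from itertools import combinations

def all_subsets(names):
    """Yield every subset of the candidate variables, from empty to full."""
    for size in range(len(names) + 1):
        yield from combinations(names, size)

ten_variables = [f"V{i}" for i in range(1, 11)]
n_models = sum(1 for _ in all_subsets(ten_variables))
print(n_models)   # equals 2 ** 10
```

The count includes the empty subset (the intercept-only model); excluding it leaves 1023 equations. Each additional candidate variable doubles the number of equations, which is why the approach becomes impractical quickly.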
Further insight…
• As a further measure of the strength of the model fit, compare the standard error of the estimate in the model summary table with the standard deviation of the dependent variable reported in the descriptive statistics table.
• A standard error of the estimate lower than the standard deviation implies that the model represents the data better than the mean alone.
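This comparison can be illustrated on the credit card data. The sketch below uses the rounded two-predictor equation from the earlier illustration (Y = 0.482 + 0.63 V1 + 0.216 V2), so the standard error of the estimate is approximate; variable names are illustrative.

```python
from math import sqrt

y  = [4, 6, 6, 7, 8, 7, 8, 10]           # number of credit cards
v1 = [2, 2, 4, 4, 5, 5, 6, 6]            # family size
v2 = [14, 16, 14, 17, 18, 21, 17, 25]    # family income

n, k = len(y), 2                         # observations, independent variables
mean_y = sum(y) / n

# Standard deviation of Y: error when the mean alone represents the data.
std_dev = sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))

# Standard error of the estimate: error left after the regression prediction.
residuals = [yi - (0.482 + 0.63 * a + 0.216 * b) for yi, a, b in zip(y, v1, v2)]
std_error_estimate = sqrt(sum(e ** 2 for e in residuals) / (n - k - 1))

print(round(std_dev, 3), round(std_error_estimate, 3))
```

Here the standard error of the estimate is well below the standard deviation of Y, so the regression represents the data better than the mean alone.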
4.3 Analyze the variate
• The variate is the linear combination of independent variables used to predict the dependent variable.
• Analysis of the variate relates the respective contribution of each independent variable in the variate to the regression model.
  - The researcher is informed as to which independent variable contributes the most to the variance explained and may make relative judgments between or among independent variables (using standardized coefficients only).
• Regression coefficients are tested for statistical significance.
  - The intercept (or constant term) should be tested for appropriateness for the predictive model. If the constant is not significantly different from zero, it cannot be used for predictive purposes.
  - The estimated coefficients should be tested to ensure that, across all possible samples, the coefficient would be different from zero.
  - The size of the sample will impact the stability of the regression coefficients. The larger the sample size, the more generalizable the estimated coefficients will be.
  - An F-test may be used to test the appropriateness of the intercept and the regression coefficients.
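Standardized coefficients and coefficient tests can be illustrated on the two-predictor credit card model. This sketch assumes the usual least-squares formulas for two predictors and tests each slope with a t statistic, whose square is the corresponding partial F; the helper name `centered_sum` is illustrative.

```python
from math import sqrt

y  = [4, 6, 6, 7, 8, 7, 8, 10]           # number of credit cards
v1 = [2, 2, 4, 4, 5, 5, 6, 6]            # family size
v2 = [14, 16, 14, 17, 18, 21, 17, 25]    # family income
n, k = len(y), 2

def centered_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))

s11, s22, s12 = centered_sum(v1, v1), centered_sum(v2, v2), centered_sum(v1, v2)
s1y, s2y, syy = centered_sum(v1, y), centered_sum(v2, y), centered_sum(y, y)

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det       # unstandardized coefficients
b2 = (s2y * s11 - s1y * s12) / det

# Standardized (beta) coefficients put both predictors on a common scale.
beta1 = b1 * sqrt(s11 / syy)
beta2 = b2 * sqrt(s22 / syy)

# Standard errors and t statistics, with n - k - 1 residual degrees of freedom.
mse = (syy - (b1 * s1y + b2 * s2y)) / (n - k - 1)
t1 = b1 / sqrt(mse * s22 / det)
t2 = b2 / sqrt(mse * s11 / det)

print(round(beta1, 3), round(beta2, 3), round(t1, 3), round(t2, 3))
```

The beta coefficients suggest family size contributes somewhat more than family income in this sample. With only eight observations the t statistics are modest, which illustrates the point about sample size and the stability of coefficients.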
4.4 Examine the data for influential observations
To assess residuals
Stage -5
Interpreting Regression
Impact of Multicollinearity
Issue of multicollinearity
Data
Dependent   V1      V2
5.00        6.00    13.00
3.00        8.00    13.00
9.00        8.00    11.00
9.00        10.00   11.00
13.00       10.00   9.00
11.00       12.00   9.00
17.00       12.00   7.00
15.00       14.00   7.00

Correlations
                                   Dependent   V1        V2
Dependent   Pearson Correlation    1           .823*     -.977**
            Sig. (2-tailed)                    .012      .000
            N                      8           8         8
V1          Pearson Correlation    .823*       1         -.913**
            Sig. (2-tailed)        .012                  .002
            N                      8           8         8
V2          Pearson Correlation    -.977**     -.913**   1
            Sig. (2-tailed)        .000        .002
            N                      8           8         8
*. Correlation is significant at the 0.05 level (2-tailed).
**. Correlation is significant at the 0.01 level (2-tailed).
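The effect of the high correlation between V1 and V2 can be seen by fitting the two-predictor regression to this data. The sketch below recomputes the correlations, the variance inflation factor (for two predictors, 1 / (1 - r²) with r the correlation between them) and the regression coefficients; these computed values are not shown on the slide, and the helper name `centered_sum` is illustrative.

```python
from math import sqrt

dependent = [5, 3, 9, 9, 13, 11, 17, 15]
v1 = [6, 8, 8, 10, 10, 12, 12, 14]
v2 = [13, 13, 11, 11, 9, 9, 7, 7]

def centered_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))

s11, s22, s12 = centered_sum(v1, v1), centered_sum(v2, v2), centered_sum(v1, v2)
s1y, s2y = centered_sum(v1, dependent), centered_sum(v2, dependent)
syy = centered_sum(dependent, dependent)

r_y1 = s1y / sqrt(s11 * syy)     # simple correlation, dependent with V1
r_y2 = s2y / sqrt(s22 * syy)     # simple correlation, dependent with V2
r_12 = s12 / sqrt(s11 * s22)     # correlation between the two predictors

vif = 1 / (1 - r_12 ** 2)        # variance inflation factor for either predictor

det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det
b2 = (s2y * s11 - s1y * s12) / det

print(round(r_y1, 3), round(r_y2, 3), round(r_12, 3))
print(round(vif, 2), round(b1, 2), round(b2, 2))
```

V1 is positively correlated with the dependent variable, yet its regression coefficient comes out negative once V2 is in the equation. This kind of sign reversal is a typical symptom of multicollinearity.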
Remedy
• Apply a stepwise procedure.
• Omit one or more highly correlated independent variables.
  - But check whether the remaining variables still specify the needed model.
• Use simple correlations between the independent and dependent variables to understand the relationships.
Stage -6
Validation
Validation of the results
Thank you