Slide 1
Different types of multiple regression are distinguished by the method for entering the
independent variables into the analysis.
In standard (or simultaneous) multiple regression, all of the independent variables are
entered into the analysis at the same time.
In hierarchical (or sequential) multiple regression, the independent variables are entered in
an order prescribed by the analyst.
In stepwise (or statistical) multiple regression, the independent variables are entered
according to their statistical contribution in explaining the variance in the dependent
variable.
No matter what method of entry is chosen, a multiple regression that includes the same
independent variables and the same dependent variable will produce the same multiple
regression equation.
The number of cases required for stepwise regression is greater than the number for the
other forms. We will use the norm of 40 cases for each independent variable.
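The 40-cases-per-variable norm is simple arithmetic, sketched below. The function names are hypothetical, and the sketch assumes every predictor column counts as one independent variable.

```python
def required_cases(n_ivs, cases_per_iv=40):
    """Minimum number of cases under the 40-cases-per-IV norm."""
    return n_ivs * cases_per_iv


def sample_size_ok(n_cases, n_ivs, cases_per_iv=40):
    """True when the available cases meet or exceed the norm."""
    return n_cases >= required_cases(n_ivs, cases_per_iv)


# Illustrative call with made-up numbers: 9 candidate predictors, 400 valid cases.
print(required_cases(9), sample_size_ok(400, 9))
```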
Slide 2
Stepwise regression is designed to find the most parsimonious set of predictors that are
most effective in predicting the dependent variable.
Variables are added to the regression equation one at a time, using the statistical criterion
of maximizing the R² of the included variables.
After each variable is entered, each of the included variables is tested to see if the model
would be better off if it were excluded. This does not happen often.
The process of adding more variables stops when all of the available variables have been
included or when it is not possible to make a statistically significant improvement in R²
using any of the variables not yet included.
Since variables will not be added to the regression equation unless they make a
statistically significant addition to the analysis, all of the independent variables selected for
inclusion will have a statistically significant relationship to the dependent variable.
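The entry logic described above can be sketched in plain Python. This is a simplified illustration, not SPSS's procedure: it only adds variables (the removal test is omitted), and it uses a fixed F-to-enter threshold of 3.84 as an assumed stand-in for a significance test. The names solve, r_squared, forward_stepwise, and f_to_enter are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Fails with ZeroDivisionError if the predictors are perfectly collinear."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(columns, y):
    """R-squared of a least-squares fit of y on the columns plus an intercept."""
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(rows[0])
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    xty = [sum(rows[i][p] * y[i] for i in range(n)) for p in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / n
    sse = sum((y[i] - sum(b * v for b, v in zip(beta, rows[i]))) ** 2
              for i in range(n))
    sst = sum((v - mean_y) ** 2 for v in y)
    return 1.0 - sse / sst


def forward_stepwise(predictors, y, f_to_enter=3.84):
    """predictors maps name -> list of values. Returns names in order of entry."""
    n = len(y)
    entered, r2_current = [], 0.0
    while len(entered) < len(predictors):
        # Find the not-yet-entered variable that yields the largest R-squared.
        best_name, best_r2 = None, r2_current
        for name in predictors:
            if name in entered:
                continue
            r2 = r_squared([predictors[v] for v in entered + [name]], y)
            if r2 > best_r2:
                best_name, best_r2 = name, r2
        df_resid = n - len(entered) - 2  # cases minus predictors minus intercept
        if best_name is None or df_resid <= 0:
            break
        unexplained = 1.0 - best_r2
        # F statistic for the change in R-squared from adding one variable.
        f_change = (float("inf") if unexplained <= 0
                    else (best_r2 - r2_current) / (unexplained / df_resid))
        if f_change < f_to_enter:
            break  # no remaining variable gives a large enough improvement
        entered.append(best_name)
        r2_current = best_r2
    return entered


# Made-up data: y depends on x1 and x3; x2 adds nothing once they are entered.
x1 = [float(i) for i in range(12)]
x2 = [1.0, 1.0, -1.0, -1.0] * 3
x3 = [1.0, -1.0, 1.0, -1.0] * 3
noise = [1.0, -1.0, -1.0, 1.0] * 3
y = [3 * a + 2 * c + e for a, c, e in zip(x1, x3, noise)]
order_of_entry = forward_stepwise({"x1": x1, "x2": x2, "x3": x3}, y)
print(order_of_entry)
```

With these made-up data the list holds the variables in order of entry, so the strongest single predictor comes first.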
Slide 3
Each time SPSS includes or removes a variable from the analysis, SPSS considers it a new
step or model, i.e. there will be one model and result for each variable included in the
analysis.
SPSS provides a table of variables included in the analysis and a table of variables
excluded from the analysis. It is possible that none of the variables will be included. It is
possible that all of the variables will be included.
The order of entry of the variables can be used as a measure of relative importance.
Slide 4
Stepwise multiple regression can be used when the goal is to produce a predictive model
that is parsimonious and accurate because it excludes variables that do not contribute to
explaining differences in the dependent variable.
Stepwise multiple regression is less useful for testing hypotheses about statistical
relationships. It is widely regarded as atheoretical and its usage is not recommended.
Stepwise multiple regression can be useful in finding relationships that have not been
tested before. Its findings invite one to speculate on why an unusual relationship makes
sense.
It is not legitimate to do a stepwise multiple regression and present the results as though
one were testing a hypothesis that included the variables found to be significant in the
stepwise regression.
Using statistical criteria to determine relationships is vulnerable to over-fitting the data set
used to develop the model at the expense of generalizability.
75/25% Cross-validation
To do cross-validation, we randomly split the data set into a 75% training sample and a
25% validation sample. We use the training sample to develop the model, and we then apply
the model to the validation sample to test how well it applies to cases not used to
develop it.
Note: shrinkage may be a negative value, indicating that the accuracy rate for the
validation sample is larger than the accuracy rate for the training sample. Negative
shrinkage (increase in accuracy) is evidence of a successful validation analysis.
If the validation is successful, we base our interpretation on the model that included all
cases.
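The split and the shrinkage check can be sketched as follows. This is a minimal illustration with hypothetical function names. It assumes an exact 75/25 split by shuffling with a fixed seed, which may differ from how the split variable is computed in SPSS. The 2% limit is the shrinkage criterion used in the logic diagram for the homework problems.

```python
import random


def split_75_25(cases, seed=0):
    """Randomly split cases into a 75% training and a 25% validation sample."""
    order = list(range(len(cases)))
    random.Random(seed).shuffle(order)  # fixed seed makes the split repeatable
    cut = round(0.75 * len(cases))
    training = [cases[i] for i in order[:cut]]
    validation = [cases[i] for i in order[cut:]]
    return training, validation


def shrinkage(r2_training, r2_validation):
    """Drop in R-squared from the training to the validation sample.
    A negative value means the validation sample was predicted better."""
    return r2_training - r2_validation


def validation_successful(r2_training, r2_validation, limit=0.02):
    """True when shrinkage does not exceed the limit (2% by default)."""
    return shrinkage(r2_training, r2_validation) <= limit


training, validation = split_75_25(list(range(100)), seed=1)
print(len(training), len(validation))
```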
Slide 6
IV2
DV
IV1
DV
Slide 7
IV1
IV2
Slide 8
IV1
IV2
DV
The brown area is
the variance in DV
that is explained by
both IV1 and IV2.
Slide 9
IV1
Since IV1 had the stronger relationship with
DV (.70 versus .40), it will be the variable
entered first in the stepwise regression.
As the only variable in the regression
equation, it is given full credit (.70) for its
relationship to DV.
The partial correlation and the part
correlation have the same value as the zero-order correlation at .70.
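Once a second predictor is considered, the part and partial correlations no longer equal the zero-order correlation. The sketch below uses the standard two-predictor formulas. The function names are hypothetical, and the correlation of .30 between IV1 and IV2 is a made-up value, since the slides' figure is not available here.

```python
import math


def part_correlation(r_y1, r_y2, r_12):
    """Part (semipartial) correlation of predictor 1 with y,
    removing predictor 2 from predictor 1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1.0 - r_12 ** 2)


def partial_correlation(r_y1, r_y2, r_12):
    """Partial correlation of predictor 1 with y,
    removing predictor 2 from both predictor 1 and y."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1.0 - r_y2 ** 2) * (1.0 - r_12 ** 2))


# IV2 (r = .40 with DV) evaluated after IV1 (r = .70 with DV) is in the equation.
# r_12 = 0.30 is an assumed value for illustration only.
part_iv2 = part_correlation(0.40, 0.70, 0.30)
partial_iv2 = partial_correlation(0.40, 0.70, 0.30)
print(round(part_iv2, 3), round(partial_iv2, 3))
```

When the other predictor is unrelated to both the dependent variable and the first predictor, both formulas reduce to the zero-order correlation, which matches the single-variable case described above.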
DV
Slide 10
Slide 11
Slide 12
IV1
IV1
IV2
DV
DV
NOTE: diagrams are scaled to r² rather than r.
Slide 13
IV2
IV1
DV
IV2
DV
Slide 15
Slide 16
Slide 17
Slide 19
Slide 20
Sig. column
Partial Correlation column
In the table of Excluded
Variables for model 2,
the next largest partial
correlation is HOW
OFTEN R ATTENDS
RELIGIOUS SERVICES
at .149.
Slide 22
Sig. column
Partial Correlation column
Slide 24
Slide 25
The problems this week take the 13 questions on prejudice from the general social survey
and explore the relationship of each to the demographic characteristics of age, education,
income, political views (conservative versus liberal), religiosity (attendance at church),
socioeconomic index, gender, and race.
I had no specific hypothesis about which demographic factors would be related to which
question on prejudice, beyond an expectation that race would be a significant contributor
to explaining differences on each of the questions.
Slide 26
Slide 27
Slide 28
Slide 29
Slide 30
Slide 31
Slide 32
In addition:
"Description of political views" [polviews] is ordinal level, but the
problem calls for treating it as metric, applying the common
convention of treating ordinal variables as interval level.
"Frequency of attendance at religious services" [attend] is
ordinal level, but the problem calls for treating it as metric,
applying the common convention of treating ordinal variables as
interval level.
The metric independent variable "socioeconomic index" [sei]
was interval level, satisfying the requirement for independent
variables.
The non-metric independent variable "sex" [sex] was
dichotomous level, satisfying the requirement for independent
variables.
The non-metric independent variable "race of the household"
[hhrace] was nominal level, but will satisfy the requirement for
independent variables when dummy coded.
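Dummy coding a nominal variable can be sketched as below. The function name dummy_code is hypothetical. The category codes 1, 2, 3 and the choice of 3 as the reference category are assumptions for illustration, not taken from the data set or from the script used in the slides.

```python
def dummy_code(values, name, reference):
    """Create one 0/1 variable per category except the reference category.
    Missing values (None) stay missing in every dummy variable."""
    categories = sorted({v for v in values if v is not None and v != reference})
    return {
        f"{name}_{c}": [None if v is None else int(v == c) for v in values]
        for c in categories
    }


# Made-up codes for five cases; the fourth case has a missing value.
hhrace = [1, 2, 3, None, 1]
dummies = dummy_code(hhrace, "hhrace", reference=3)
print(dummies)
```

A variable with three categories yields two dummy variables, because the reference category is represented by zeros on both.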
Slide 33
Slide 34
Slide 35
Navigate to the
My Documents
folder, if
necessary.
Slide 36
Slide 37
To have the script save the dummy-coded variables, clear the check box
Delete variables created in this analysis.
Slide 39
Slide 41
Slide 42
Slide 44
Slide 45
Click on the
Continue button to
close the dialog box.
Slide 46
Click on the OK
button to produce
the output.
Slide 47
Slide 49
Slide 50
Slide 52
Slide 53
Slide 54
Slide 55
Slide 56
Slide 57
Slide 58
We reject the null hypothesis that the partial slope (b coefficient) for the
variable "frequency of attendance at religious services" = 0 and conclude that
the partial slope (b coefficient) for the variable "frequency of attendance at
religious services" is not equal to 0. The positive sign of the b coefficient
(0.062) means that higher values of frequency of attendance at religious
services were associated with higher values of "importance of ethnic
identity".
Slide 61
Slide 62
Slide 63
Slide 64
Slide 65
Second, select
the option button
for a Fixed
Value.
Slide 68
Click on the OK
button to create
the variable.
Slide 69
Slide 70
An Additional Task before Running the Stepwise Regression on the Training Sample
Before we run the regression on the training sample, we need an additional step that will
enable us to compare the accuracy of the model for the training sample to the accuracy of
the model for the validation sample, using the R² for each as our measure of accuracy.
We need to exclude from the analysis cases that are missing data for any of the variables
that we have designated as candidates for inclusion. If we don't specifically do this, SPSS
may include different cases in predicting values for the dependent variable than it does in
determining which variables to include in the model.
In model building, SPSS does listwise exclusion of missing data and omits any cases that
have missing data for any variable. In predicting scores on the dependent variable, it
excludes cases that are missing data for only the variables included in the stepwise model.
Thus, when selecting variables, SPSS assumes that only respondents who answer all
questions are valid cases; in predicting scores, it assumes that failing to answer a question
on a variable that is not included has no importance in the analysis.
Slide 72
Selecting Cases with Valid Data for All Variables in the Analysis - 1
Slide 73
Selecting Cases with Valid Data for All Variables in the Analysis - 2
First, mark the
option button for If
condition is
satisfied.
Second, click on
the If button to
add the
condition.
Slide 74
Selecting Cases with Valid Data for All Variables in the Analysis - 3
Type
NMISS(ethimp,age,educ,rincom98,polviews,
attend,sei,sex_1,hhrace_1,hhrace_2) = 0
in the condition textbox. In the parentheses,
we type the names of the dependent variable
and all of the independent variables.
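The effect of the NMISS condition can be sketched outside SPSS as a listwise filter. The function names are hypothetical. The sketch assumes each case is a dictionary and that a missing value is stored as None.

```python
VARIABLES = ["ethimp", "age", "educ", "rincom98", "polviews",
             "attend", "sei", "sex_1", "hhrace_1", "hhrace_2"]


def nmiss(case, variables):
    """Count how many of the named variables are missing for one case."""
    return sum(1 for v in variables if case.get(v) is None)


def complete_cases(cases, variables):
    """Keep only cases with valid data on every named variable (NMISS = 0)."""
    return [c for c in cases if nmiss(c, variables) == 0]


# Two made-up cases: the second is missing polviews and is dropped.
full = {v: 1 for v in VARIABLES}
partial = dict(full, polviews=None)
kept = complete_cases([full, partial], VARIABLES)
print(len(kept))
```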
Slide 75
Selecting Cases with Valid Data for All Variables in the Analysis - 4
Click on the
Continue button to
close the dialog box.
Slide 76
Selecting Cases with Valid Data for All Variables in the Analysis - 5
Click on the OK
button to
execute the
command.
Slide 77
Selecting Cases with Valid Data for All Variables in the Analysis - 6
Slide 78
Slide 79
Slide 81
First, highlight
the split variable.
Second, click on the
right arrow button to the
left of the Selection
Variable text box.
Slide 82
Slide 83
First, type 1 in
the Value text
box. Recall that
this is the value
of split indicating
training cases.
Slide 84
Slide 85
Click on the
Continue button to
close the dialog box.
Slide 86
Click on the
Continue button to
close the dialog box.
Slide 87
Click on the OK
button to produce
the output.
Slide 88
Slide 89
Slide 90
Slide 92
Slide 93
Level of
measurement ok?
No
Yes
Yes
Consider limitation in
discussion of findings
No
Slide 94
Logic Diagram for Solving Homework Problems: Sample Size and Overall Relationship
Sample size ok
(number of IVs × 40)?
No
Yes
Model will be
statistically
significant if
any
variables
entered
1+ variables entered
in model?
No
No
Stop (model is
not usable)
Yes
Yes
Slide 95
Subset of entered
variables correctly
identified?
No
Yes
Strength of model
correctly characterized
No
Yes
Slide 96
Variable entered
and not removed?
No
Yes
Correct interpretation of
direction of relationship?
No
Yes
Yes
Additional variables
entered?
No
Slide 97
No
No
Yes
Shrinkage ≤ 2%?
Yes