Lab 11: Multiple Regression.

Conducting a multiple regression in SPSS is a fairly straightforward matter. This lab
demonstrates how to run a multiple regression and reviews scatterplots and generating a
correlation matrix. All of these are necessary in order to complete Assignment 3. Steps 1-7
review calculating correlations and simple regression, the material covered in Lab 10.

1. Open SPSS and then open the Boston Census Tract data set.
2. We'll continue from where we left off in Lab 10 on Correlation and Simple Regression.
Recall our causal model for our simple regression model:

% of residents living in poverty (+) --> Violent arrest rate (+)

Our research hypothesis was supported as percent poverty was statistically significant
meaning there is a real relationship between the two variables. The equation it produced,
which we obtained via the regression coefficients, was the following:

Y(RATEVA) = 1.32 + .24(POVERTY)
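
To see what this equation implies, we can plug in a poverty level and read off the predicted rate. The short Python sketch below is only an illustration outside of SPSS: the function name predict_rateva and the 10% example value are hypothetical, and the coefficients are the rounded ones from the equation above.

```python
# Rounded coefficients taken from the Lab 10 equation above.
INTERCEPT = 1.32
SLOPE_POVERTY = 0.24


def predict_rateva(pct_poverty):
    """Predicted violent arrest rate for a given percent of residents in poverty."""
    return INTERCEPT + SLOPE_POVERTY * pct_poverty


# Hypothetical tract with 10% of residents living in poverty.
print(round(predict_rateva(10), 2))  # 3.72
```

With poverty at 0 the prediction is just the intercept (1.32), and each additional percentage point adds the slope (.24).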

3. What are some limitations to this? Well, the adjusted r2 (.192) means the model explains only
19.2% of the variation in our dependent variable. Additionally, we're only including one variable.
As the Lewis-Beck textbook notes, we have two reasons for including more variables: it
is certainly plausible that other variables could influence the violent arrest
rate, and we want to be sure our variable has a unique effect on our dependent variable.
For instance, we might hypothesize that neighborhoods with citizens who are less
involved in community policing programs [PINV] will have higher rates of violent
arrests (in other words, we might expect that neighborhoods with fewer crime watches
and other programs would have more crime, as people may commit crime in these areas
rather than in neighborhoods that are more active in community policing). This is
operationalized by the percent of respondents involved in community policing programs.

% of residents involved in community policing programs (-) --> Violent arrest rate (+)

4. Violent arrest rates may also be influenced by residential mobility [BRESMOB] (that is,
residents moving in and out). Residential mobility, which is operationalized as the
percentage of residents that moved within the last five years, is hypothesized to have a
positive relationship with the violent arrest rate.

Percent of residents that moved within the last five years (+) --> Violent arrest rate (+)

So, we have hypothesized the relationship between our 3 IVs and our DV. Of these, we
expect two correlations to be positive (POVERTY and BRESMOB) and one to be
negative (PINV).

5. Putting this all together based on our hypotheses, our equation would look like this:
Y(Violent Arrest Rate) = A + (B1)(POVERTY) - (B2)(PINV) + (B3)(BRESMOB)

6. We already examined our scatterplot from Lab 10. Let's look at one other: residential
mobility and violent arrest rate. Remember, the dependent variable goes on the Y axis.
Do that now and make sure to add in the regression line; refer to Lab 10 if you are having
difficulty. Double-click on the graph and the SPSS Chart Editor appears. To add the
best-fit line, select the following (circled) and make sure Linear is selected:
7. We now have a regression line added along with the equation, which takes the form Y =
a + bx. For this relationship, y = 9.87 + (-0.07)(BRESMOB).

This means that for every one-unit increase in residential mobility (i.e. a 1% increase), the
violent arrest rate is expected to decrease by 0.07; a 10% increase would decrease the rate
by 0.70. This is the opposite direction of what we hypothesized. This relationship alone
explains 2.6% (0.026) of the variation in our dependent variable according to the r2 linear
value on the top right of the scatterplot.
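
The intercept, slope, and r2 that the Chart Editor prints can also be worked out by hand with the least-squares formulas. The Python sketch below is an illustration under stated assumptions: the function name fit_line is hypothetical and the five data points are made up for the example, so the numbers will not match the Boston output.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Made-up illustrative values, NOT the Boston Census Tract data.
mobility = [10, 20, 30, 40, 50]
arrest_rate = [9, 8, 8, 6, 7]

a, b, r2 = fit_line(mobility, arrest_rate)
print(f"y = {a:.2f} + ({b:.2f})(x), r2 = {r2:.3f}")
```

A negative slope here means the line falls from left to right, which is how a negative relationship shows up on the scatterplot.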

8. Next, we need to run a correlation matrix for all of our variables. Adding in all four of
our variables produces the following:

9. Remember, the correlation matrix is mirrored so whatever we see below the diagonal (the
correlations of 1) will also appear above it (i.e., correlation is non-directional).
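
The mirroring is easy to confirm by computing a small correlation matrix by hand. The sketch below is an illustration only: the helper names pearson_r and correlation_matrix are hypothetical, and the values are made up rather than taken from the Boston data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Correlate every variable with every other one (and with itself)."""
    names = list(columns)
    return {r: {c: pearson_r(columns[r], columns[c]) for c in names} for r in names}


# Made-up illustrative values, NOT the Boston Census Tract data.
data = {
    "RATEVA": [9, 8, 8, 6, 7],
    "PCTPOV": [30, 22, 25, 10, 15],
    "BRESMOB": [10, 20, 30, 40, 50],
}
matrix = correlation_matrix(data)
for row in data:
    print(row.ljust(8), "  ".join(f"{matrix[row][col]:6.3f}" for col in data))
```

Each variable correlates perfectly with itself, which gives the diagonal of 1s, and the entry for (RATEVA, PCTPOV) equals the entry for (PCTPOV, RATEVA).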

10. So far we've satisfied the first criterion for causality, correlation, for our main IV (there is
a moderate, positive relationship between percent of residents living in poverty and the
violent arrest rate: r = .443, significant at the p < .01 level, which we established in Lab
10). The second criterion, time order, is easy: the census data was gathered before the crime
data, so time ordering is not a problem, though there is a seven-year gap between when the two
were collected, which may be problematic depending on the trends cities experience in
their demographics.

11. Now that we've examined these two criteria, it is time to estimate the OLS regression
components for our model to rule out any spuriousness. First, go to Analyze, then select
Regression, then select Linear. Now the Linear Regression window appears, and here
we can specify our IVs and DV. From the variable list, select RATEVA, then click on
the arrow beneath the Dependent: title to tell SPSS that RATEVA is our Dependent
Variable in this regression. Next, select PCTPOV, and click the arrow beneath the
Independent title to tell SPSS that this will be our IV in the regression. Once this is
done, click on the Next tab; it should now say Block 2 of 2. This is where we add in our
two control variables: select BRESMOB and PINV, and click the arrow beneath the
Independent title to tell SPSS that these will be our second set of variables in the
regression. Remember, enter the Independent Variable first and then your Control
Variables second. Click on Statistics and make sure Estimates and Model fit are
selected.
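
Entering the variables in two blocks amounts to fitting two OLS models: one with the main IV alone, and one with the main IV plus the controls. The Python sketch below illustrates that idea and is not what SPSS runs internally. The helper names solve and ols are hypothetical, the fit uses the textbook normal equations, and the eight rows of data are made up for the example.

```python
def solve(a, b):
    """Solve the linear system a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(y, predictors):
    """Fit y = a + b1*x1 + ... by least squares; return (coefficients, r_squared)."""
    # Each row is [1, x1, x2, ...]; the leading 1 estimates the constant.
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot


# Made-up illustrative values, NOT the Boston Census Tract data.
rateva = [3, 6, 5, 9, 8, 11, 10, 13]
pctpov = [5, 10, 15, 20, 25, 30, 35, 40]
bresmob = [50, 30, 45, 20, 40, 25, 35, 15]
pinv = [12, 8, 15, 9, 11, 7, 10, 14]

block1_coefs, block1_r2 = ols(rateva, [pctpov])                  # Block 1: main IV only
block2_coefs, block2_r2 = ols(rateva, [pctpov, bresmob, pinv])   # Block 2: IV + controls
print("Block 1 slope for PCTPOV:", round(block1_coefs[1], 3), "R2:", round(block1_r2, 3))
print("Block 2 slope for PCTPOV:", round(block2_coefs[1], 3), "R2:", round(block2_r2, 3))
```

Comparing the PCTPOV slope across the two fits is the same comparison we make between Model 1 and Model 2 in the SPSS output.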

12. An SPSS output window will appear and display several tables. The first two tables
inform us of the variables we entered in the two models in addition to the model
summary. With only our main IV, PCTPOV, the adjusted r2 is .192. By adding in our
two control variables, it increases to .258. So with 3 variables, our model explains 25.8%
of the variation in the dependent variable. Remember, we use our adjusted r2.

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .443a   .197       .192                5.90294
2       .522b   .272       .258                5.65391

a. Predictors: (Constant), % of residents living in poverty
b. Predictors: (Constant), % of residents living in poverty, % of respondents involved in
community policing programs, % residents that moved within last five years
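
Adjusted R Square differs from R Square because it applies a penalty for the number of predictors in the model. The standard formula is 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of cases and k is the number of predictors. The sketch below uses a hypothetical function name and hypothetical values for R2 and n (the lab output shown here does not report the number of tracts), so it illustrates the penalty rather than reproducing the table.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fit to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical values: the same R2 of .50 from 101 cases, with 1 versus 4 predictors.
print(round(adjusted_r2(0.50, 101, 1), 3))  # 0.495
print(round(adjusted_r2(0.50, 101, 4), 3))  # 0.479
```

For the same R Square, the model with more predictors receives the lower adjusted value, which is why we compare models on adjusted r2.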

13. The next table is our ANOVA table which tests to see if any of the variables are
significantly related to our dependent variable. Both models are significant at the p < .001
level.
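
The F statistic in the ANOVA table can be recovered from R Square with the standard formula F = (R2 / k) / ((1 - R2) / (n - k - 1)), with k and n - k - 1 degrees of freedom. The sketch below uses a hypothetical function name and hypothetical inputs, since the number of cases is not shown in this lab's output.

```python
def overall_f(r2, n, k):
    """F statistic testing whether all k slopes are zero, given R-squared and n cases."""
    explained_per_df = r2 / k
    unexplained_per_df = (1 - r2) / (n - k - 1)
    return explained_per_df / unexplained_per_df


# Hypothetical values: R2 of .50 from 101 cases with 4 predictors.
print(overall_f(0.50, 101, 4))
```

A larger F means the model explains much more variation per degree of freedom than it leaves unexplained; SPSS converts that F into the significance value reported in the table.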

14. Our next table is the most interesting to us! The Coefficients table shows us what the
slopes are for each of the variables and whether they are significant.

This table presents the estimates for the intercept (the constant) in addition to the slope (B). It
also tells us if the independent variables significantly predict the dependent variable RATEVA.
The parts of the table on which to concentrate are the direction of the slope and whether or
not it is significant. Look at the table under B (which is the slope) for PCTPOV. The value for B
in the table for PCTPOV is positive (.236) in our first model, which means that as the percent of
residents living in poverty increases, the violent arrest rate increases as well. So, for a 1-unit (1%)
increase in residents living in poverty, the violent arrest rate is predicted to increase by .236. It is
also highly significant (SPSS displays p = .000, which means p < .001). This matches the direction
we'd predict given our hypothesis, though we know this from Lab 10.

When adding in our control variables, the coefficient for percent poverty increases slightly (.236
to .270) and it remains statistically significant (p < .001). Percent of respondents involved in
community policing programs is not statistically significant (p = .815). Our second control
variable, percent of residents that moved within the last five years, is highly significant
(p < .001) and negative: a 1-unit (1%) increase in residential mobility is associated with the
violent arrest rate decreasing by 0.128, which is again the opposite of the direction we
hypothesized. We can ignore the significance test for the intercept (Constant), as it is of
little value to us.

Had our main IV, PCTPOV, gone from significant to non-significant, this would be evidence of
spuriousness. As it remains significant, we do not have evidence of spuriousness. However, we
have only included 3 independent variables in the model when typically many more are used
(often between 10 and 20 independent variables). So, for the third assignment, you will do exactly
this procedure: develop and then test a hypothesis, and determine correlation (using a correlation
matrix and scatterplots), time order, and spuriousness of your proposed hypothesis using multiple
regression.
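
The spuriousness check described above is a simple decision rule, and it can be written out explicitly. The sketch below is an illustration only: the function name is hypothetical, and the .05 cutoff is an assumed conventional significance level rather than one fixed by this lab.

```python
ALPHA = 0.05  # Assumed conventional cutoff; not specified in the lab itself.


def evidence_of_spuriousness(p_without_controls, p_with_controls, alpha=ALPHA):
    """True if the main IV is significant alone but not once controls are added."""
    significant_before = p_without_controls < alpha
    significant_after = p_with_controls < alpha
    return significant_before and not significant_after


# PCTPOV as displayed by SPSS in both models (p = .000): it stays significant.
print(evidence_of_spuriousness(0.000, 0.000))  # False

# Hypothetical case where the main IV loses significance after adding controls.
print(evidence_of_spuriousness(0.001, 0.30))   # True
```

When you run this check for Assignment 3, read the two p-values for your main IV from the Coefficients table, one from each model.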