# multiple regression mechanics


05/22/2010

Green // Statistics
The Mechanics of Multiple Regression
One of the most important concepts in statistics is the idea of "controlling" for a variable. This lecture is designed to give you a feel for what "controls" are and how they are implemented in the context of multiple regression.

Let's begin by considering an example. In the weeks leading up to the November 2003 election, a group called ACORN sought to bolster support for a ballot proposition in Kansas City. The measure authorized a rise in sales tax in order to fend off cuts to public transportation. ACORN canvassed voters in a predominantly black section of Kansas City, targeting registered voters who had voted in at least one of the five most recent elections. The campaign consisted primarily of door-to-door canvassing conducted during the final two weeks before Election Day.

I was asked to evaluate the effectiveness of this campaign. ACORN identified 28 precincts of potential interest to their campaign; I randomly assigned 14 to the treatment group and 14 to the control group. After the election, voter turnout records were gathered, and voting rates among those living in the treatment and control precincts were calculated. The data may be found in the Kansas City Dataset.

The data may be modeled in a few different ways. The simplest model describes the voter turnout rate (Y) as a linear function of the experimental treatment (X) plus a disturbance term:

Y = a + bX + U.

Here is an "individual value plot" of the data. Note that all of the X values are either 0 (control) or 1 (treatment), but the plot scatters them a bit in order to make the individual values easier to see.

[Figure: Individual Value Plot of VOTE03 vs TREATMEN. Vertical axis: VOTE03, from about 0.20 to 0.50; horizontal axis: TREATMEN, 0.00 (control) and 1.00 (treatment).]
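Because X takes only the values 0 and 1, least-squares estimation of Y = a + bX + U reduces to a comparison of group means: the estimate of a is the mean turnout across control precincts, and the estimate of b is the treated-minus-control difference in means. The sketch below checks this on synthetic data for 28 hypothetical precincts, 14 of them assigned to treatment at random as in the experiment. The seed, the turnout values, and the helper `simple_ols` are illustrative assumptions, not the ACORN data.

```python
import random

# Hypothetical setup: 28 precincts, 14 randomly assigned to treatment.
# The turnout values are SYNTHETIC, not the Kansas City data.
rng = random.Random(2003)
precincts = list(range(28))
treated = set(rng.sample(precincts, 14))
x = [1 if p in treated else 0 for p in precincts]
y = [0.29 + 0.04 * xi + rng.gauss(0, 0.06) for xi in x]

def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b*x + u."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

a, b = simple_ols(x, y)

# With a 0/1 regressor, the intercept equals the control-group mean and
# the slope equals the treated-minus-control difference in means.
mean_control = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
mean_treated = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)
print(f"a = {a:.4f}  (control mean {mean_control:.4f})")
print(f"b = {b:.4f}  (difference in means {mean_treated - mean_control:.4f})")
```

Each printed pair matches. The same equivalence is why the constant in a regression on a 0/1 treatment can be read as the average turnout rate across control precincts.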
Using regression, we obtain the following results:
Regression Analysis: VOTE03 versus TREATMEN
The regression equation is
VOTE03 = 0.289 + 0.0355 TREATMEN

Predictor      Coef   SE Coef      T      P
Constant    0.28884   0.01778  16.24  0.000
TREATMEN    0.03554   0.02515   1.41  0.169

S = 0.0665291   R-Sq = 7.1%   R-Sq(adj) = 3.6%
The critical numbers here are .036, which suggests that the expected rate of turnout increases by 3.6 percentage points as we move from control to treatment, and .025, the standard error that conveys the uncertainty surrounding this experimental effect. The p-value of .169 tells us that there is a 16.9% chance of observing a treatment effect at least this large in absolute value even if the true experimental effect were zero. Ordinarily, we would use a one-tailed test here, because one would suppose that canvassing would increase turnout; in that case, the one-tailed p-value is half the two-tailed value, approximately .085. For what it's worth, that falls a bit short of the conventional statistical significance threshold of .05.

(Note that it is just a coincidence that the estimated treatment effect of .036 matches the adjusted R-squared of 3.6%. Speaking of R-squared, why is it not of central concern as we interpret these regression statistics?)
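The p-values can be reproduced from the Coef and SE Coef columns alone. The t-statistic is the coefficient divided by its standard error, and under the null hypothesis of no effect it follows a t distribution with n - 2 = 26 degrees of freedom (28 precincts, two estimated parameters). The standard-library sketch below integrates the t density numerically with Simpson's rule instead of calling a statistics package. The helper names are hypothetical, and the 26 degrees of freedom assume that all 28 precincts enter the regression.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > |t|), via Simpson's rule on [0, |t|] and symmetry about zero."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Coefficient and standard error from the TREATMEN row of the output;
# 28 precincts minus 2 estimated coefficients (a and b) leaves 26 df.
coef, se, df = 0.03554, 0.02515, 28 - 2
t_stat = coef / se
p_two = 2 * t_upper_tail(t_stat, df)
p_one = t_upper_tail(t_stat, df)  # valid because the estimate has the predicted sign
print(f"t = {t_stat:.2f}, two-tailed p = {p_two:.3f}, one-tailed p = {p_one:.3f}")
```

The two-tailed value should agree with Minitab's .169 up to rounding of the reported coefficient and standard error, and the one-tailed value is exactly half of it. With SciPy installed, `scipy.stats.t.sf(t_stat, df)` returns the same upper-tail probability.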

How can we make this analysis more precise? One answer is to gather more data. Another is to control for other predictors of voter turnout that are not consequences of the treatment. (We'll see why we don't want to control for consequences of the treatment in next week's lectures.) Fortunately, we happen to have just such a predictor at hand. The Kansas City voter file contains extensive information about the past voter turnout of every voter. I calculated the average voting rate over several elections from 1998 through the summer of 2003. Since these votes occurred before the experiment, we need not be concerned that they represent consequences of the treatment. Let's call the past vote average Z and control for it in our revised regression model:

Y = a + bX + cZ + U.

In terms of sheer Minitab mechanics, this model is easy to estimate.
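Minitab does the linear algebra behind the scenes. One standard way to compute the least-squares estimates of a, b, and c is to solve the normal equations (D'D)β = D'Y, where each row of the design matrix D is (1, X_i, Z_i) for precinct i. The plain-Python sketch below does this on synthetic data for 28 hypothetical precincts. The helpers `solve` and `ols`, the seed, and all numbers are illustrative assumptions; they are neither the ACORN data nor Minitab's internal algorithm.

```python
import random

def solve(a, rhs):
    """Solve the square system a @ beta = rhs by Gaussian elimination with
    partial pivoting. Works on a copy, so the inputs are left unchanged."""
    n = len(rhs)
    m = [row[:] + [r] for row, r in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * beta[k] for k in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def ols(columns, y):
    """Least squares for y = b0 + b1*columns[0] + b2*columns[1] + ...,
    via the normal equations (D'D) beta = D'y."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    dtd = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    dty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    return solve(dtd, dty)

# SYNTHETIC data for 28 hypothetical precincts (not the Kansas City data):
# x is a 0/1 treatment indicator, z stands in for a past-vote average.
rng = random.Random(1)
x = [1] * 14 + [0] * 14
rng.shuffle(x)
z = [rng.uniform(0.40, 0.60) for _ in range(28)]
y = [-0.3 + 0.045 * xi + 1.17 * zi + rng.gauss(0, 0.04) for xi, zi in zip(x, z)]

a_hat, b_hat, c_hat = ols([x, z], y)
print(f"Y = {a_hat:.3f} + {b_hat:.4f} X + {c_hat:.3f} Z")
```

Forming D'D explicitly is fine for three well-scaled columns like these; for real work, a library routine such as NumPy's `numpy.linalg.lstsq`, which avoids forming D'D, is numerically safer.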
Regression Analysis: VOTE03 versus TREATMEN, VOTEAVG
The regression equation is
VOTE03 = - 0.310 + 0.0452 TREATMEN + 1.17 VOTEAVG

Predictor      Coef   SE Coef      T      P
Constant   -0.31046   0.08199  -3.79  0.001
TREATMEN    0.04518   0.01446   3.12  0.004
VOTEAVG      1.1723    0.1591   7.37  0.000

S = 0.0381032   R-Sq = 70.7%   R-Sq(adj) = 68.4%
Take a close look at these regression results, and compare them to the results presented above. The estimated treatment effect is somewhat larger than before. This pattern is specific to this example, not a general feature of multiple regression. You should not expect coefficients to grow when control variables are added to a regression equation, especially when analyzing experimental data (Why?). In this case, the estimated treatment effect grows from .036 to .045, but it could have gone the other way. It just happens that the randomly assigned treatments were more likely to go to precincts with below-average VOTEAVG scores. In the plot below, we see that the correlation between TREATMEN and VOTEAVG is slightly negative. Thus the first regression understates the positive influence of the treatment, because it ignores the fact that the treated precincts had slightly lower voting propensities before the experiment got underway.
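The argument in the last paragraph can be made exact. For least-squares fits on the same sample, the slope from regressing Y on X alone equals the coefficient on X from the regression that also includes Z, plus the coefficient on Z times δ, the slope from regressing Z on X (the omitted-variable identity: b_short = b + c·δ). With a 0/1 treatment, δ is simply the treated-minus-control difference in mean Z. The sketch below verifies the identity on synthetic data and then, assuming both Minitab regressions were fit to the same 28 precincts, backs out the δ implied by the rounded coefficients reported above: (0.03554 - 0.04518) / 1.1723 ≈ -0.0082, a small negative VOTEAVG gap between treated and control precincts, consistent with the slightly negative correlation described in the text. The helper `moments` and the synthetic numbers are illustrative assumptions.

```python
import random

def moments(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# SYNTHETIC precinct data (not the Kansas City data): x is a randomly
# assigned 0/1 treatment, z a pre-treatment vote average, y turnout.
rng = random.Random(7)
x = [1] * 14 + [0] * 14
rng.shuffle(x)
z = [rng.uniform(0.40, 0.60) for _ in range(28)]
y = [-0.3 + 0.045 * xi + 1.17 * zi + rng.gauss(0, 0.04) for xi, zi in zip(x, z)]

sxx, szz, sxz = moments(x, x), moments(z, z), moments(x, z)
sxy, szy = moments(x, y), moments(z, y)

b_short = sxy / sxx                      # Y on X alone
det = sxx * szz - sxz ** 2
b_long = (sxy * szz - szy * sxz) / det   # Y on X and Z: coefficient on X
c_long = (szy * sxx - sxy * sxz) / det   # Y on X and Z: coefficient on Z
delta = sxz / sxx                        # Z on X: treated minus control mean of Z

# Exact in-sample identity: short slope = long slope + c * delta.
print(f"{b_short:.5f} = {b_long:.5f} + {c_long:.3f} * {delta:.5f}")

# Applied to the rounded coefficients in the two regression tables,
# ASSUMING both regressions were fit to the same 28 precincts:
implied_delta = (0.03554 - 0.04518) / 1.1723
print(f"implied VOTEAVG gap, treated minus control: {implied_delta:.4f}")
```

Because δ is negative and the coefficient on Z is positive, the short regression's slope sits below the long regression's, which is the understatement described above; with a different random assignment, δ could just as easily have been positive.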