
Multiple regression refers to regression with multiple explanatory variables (but just one response variable). Multiple regression is an amazingly flexible tool that can be used to model linear and nonlinear relationships. Don't be fooled by the "linear" in "linear regression": we've already seen how simple linear regression can model nonlinear relationships by transforming one or both of the explanatory and response variables. Multiple regression offers still more ways, and it's even possible to incorporate categorical variables into multiple regression models.

Examples:

1. One explanatory variable, but a quadratic relationship:
   µ(Y | X) = β0 + β1·X + β2·X²
   We can include higher-order powers of X, although this is unusual unless there is a theoretical reason for it. Note: we always include lower-order terms when a higher-order term is in a model. For example, we always include X if X² is in the model.
2. Two explanatory variables:
   µ(Y | X1, X2) = β0 + β1·X1 + β2·X2
3. Two explanatory variables with an interaction:
   µ(Y | X1, X2) = β0 + β1·X1 + β2·X2 + β3·X1·X2
   The term X1·X2 is the product of the two variables. We'll see why this is called an interaction below.
4. The explanatory variables can be binary (0, 1). In fact, the ANOVA and pooled two-sample t models can be written as special cases of the linear regression model.

For normal-based inferences, we have the usual assumptions:
• Normality: the Y values at any particular combination of X values are normally distributed.
• Constant variance: the variance of the Y values is the same at every combination of X values.
• Independence: the Y's are independent draws from their respective distributions.

These assumptions can also be summarized by writing a linear regression model in the following way, using model 2 above as an example:

Y = β0 + β1·X1 + β2·X2 + ε

where the ε's are independent N(0, σ) random variables (the subscript i has been omitted).

How do we fit the models? Least squares can still be used: find the values of the β's that minimize the sum of squared residuals, Σ(Yi − Ŷi)² = Σ resi², summing over i = 1, …, n. It's not necessary to examine the formulas for the least squares estimators of all the β's, but the formulas can be obtained fairly easily using calculus, no matter how many β's there are. Formulas for standard errors of the estimates can also be derived. Confidence intervals and tests for individual coefficients can be computed under the assumptions of the model. These will be covered in Chapter 10.
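The least-squares idea above can be sketched in a few lines of plain Python (illustrative only; the function names are mine, and real statistical software uses numerically stabler methods than the normal equations (X'X)b = X'y solved here):

```python
# A minimal least-squares fit via the normal equations (X'X)b = X'y,
# using plain Python lists. A sketch, not production code.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c and m[r][c] != 0:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(x_rows, y):
    """Each row of x_rows is (1, X1, X2, ...); returns the fitted betas."""
    xt = transpose(x_rows)
    xtx = matmul(xt, x_rows)
    xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in xt]
    return solve(xtx, xty)

# Points lying exactly on y = 1 + 2x recover beta = [1, 2]:
beta = least_squares([[1, 0], [1, 1], [1, 2]], [1, 3, 5])
print([round(b, 6) for b in beta])  # [1.0, 2.0]
```

Quadratic or interaction models fit the same way: just add a column X² or X1·X2 to each row of the design matrix.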

Example 1: Ozone data again. Four variables were measured: ozone, maximum temperature, wind speed, and solar radiation. Examine ozone vs. wind speed; the loess fit is on the left.

[Figure: two scatterplots of Ozone (ppb), 0 to 200, vs. Wind speed (mph), 0 to 20. Left panel: loess fit. Right panel: quadratic fit.]

A log transformation of ozone could be tried, but if the variance looks approximately constant, we might not want to transform ozone. We might instead try a quadratic relationship (right panel above):

µ(Ozone | Wind) = β0 + β1·Wind + β2·Wind²

SPSS regression output (Dependent Variable: Ozone (ppb)):

Model 1            B         Std. Error   Beta     t        Sig.
(Constant)         166.733   14.306                11.655   .000
Wind speed (mph)   -19.958   2.735        -2.135   -7.298   .000
Wind^2             .662      .124         1.564    5.347    .000

Fitted model is

µ̂(Ozone | Wind) = 166.733 − 19.958·Wind + 0.662·Wind²

What is the predicted ozone level when Wind speed is 10 mph?

What do you think of the quadratic model based on the graph above?
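As a quick check on the prediction question above, we can plug Wind = 10 mph into the fitted quadratic, using the coefficients from the SPSS table (a sketch in Python; the variable names are mine):

```python
# Predicted ozone at Wind = 10 mph, using the SPSS estimates above.
b0, b1, b2 = 166.733, -19.958, 0.662

wind = 10.0
ozone_hat = b0 + b1 * wind + b2 * wind ** 2
print(round(ozone_hat, 3))  # 33.353
```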

Notes:
• Interpretation of the coefficients in a quadratic model is not straightforward. In particular, we cannot interpret β̂1 the way we did in a simple linear regression model, since the change in Ozone when Wind speed changes is affected by both β̂1 and β̂2.
• If you include a quadratic term, then you must also include a linear term for that variable, whether or not the coefficient on the linear term is statistically significant. You cannot interpret the statistical significance of the coefficient on a variable if a higher-order term involving that variable is included in the model.

Example 2: Four variables were measured at each of thirty meteorological stations scattered throughout California. These variables were: average annual precipitation (in inches), altitude (in feet), latitude (in degrees), and whether or not the station was on the leeward side of the mountains in the rain shadow (1 = in rain shadow, 0 = not in rain shadow). The goal was to examine the relationship between precipitation and the other variables and also to create a model to predict precipitation.

Location   Precip (in)   Elevation (ft)   Latitude   Rain Shadow
1          39.57         43               40.8       0
2          23.27         341              40.2       1
3          18.20         4152             33.8       1
4          37.48         74               39.4       0
5          49.26         6752             39.3       0
6          21.82         52               37.8       0
7          18.07         25               38.5       1
8          14.17         95               37.4       1
9          42.63         6360             36.6       0
10         13.85         74               36.7       1
11         9.44          331              36.7       1
12         19.33         57               35.7       0
13         15.67         740              35.7       1
14         6.00          489              35.4       1
15         5.73          4108             37.3       1
16         47.82         4850             40.4       0
17         17.95         120              34.4       0
18         18.20         4152             40.3       1
19         10.03         4036             41.9       1
20         4.63          913              34.8       1
21         14.74         699              34.2       0
22         15.02         312              34.1       0
23         12.36         50               33.8       0
24         8.26          125              37.8       1
25         4.05          268              33.6       1
26         9.94          19               32.7       0
27         4.25          2105             34.1       1
28         1.66          -178             36.5       1
29         74.87         35               41.7       0
30         15.95         60               39.2       1

Scatterplot matrix: Graphs…Scatter…Matrix. Put in a categorical variable under Set Markers By. The default is different colors, but you can edit the scatterplot to use different symbols.

[Figure: scatterplot matrix of Precipitation (in), Altitude (ft), and Latitude (degrees), with markers coded by rain shadow (0/1).]

Ignore the Rainshadow variable for the time being. It also looks like transformations might be needed, but for now let's ignore that as well. Consider the model:

µ(Precip | Latitude, Altitude) = β0 + β1·Latitude + β2·Altitude

SPSS regression output (Dependent Variable: Precipitation (in)):

Model 1              B          Std. Error   Beta   t        Sig.
(Constant)           -105.733   36.165              -2.924   .007
Latitude (degrees)   3.338      .984         .536   3.392    .002
Altitude (ft)        .0014      .0013        .178   1.129    .269

µ̂(Precip | Latitude, Altitude) = −105.733 + 3.338·Latitude + 0.0014·Altitude

According to the fitted model, what’s the predicted precipitation for a location at latitude 40 degrees and 1000 feet in elevation?
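As a check on this prediction, we can plug latitude 40 degrees and altitude 1000 feet into the fitted additive model, using the coefficients from the SPSS table above (a sketch; variable names are mine):

```python
# Predicted precipitation at latitude 40, altitude 1000 ft,
# using the SPSS estimates above.
b0, b_lat, b_alt = -105.733, 3.338, 0.0014

precip = b0 + b_lat * 40 + b_alt * 1000
print(round(precip, 3))  # 29.187
```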


Interpreting the coefficients in the model:
• β1 represents the increase in mean precipitation for every one-degree increase in latitude, with altitude held fixed.
• β2 represents the increase in mean precipitation for every one-foot increase in altitude, with latitude held fixed. It would be more natural to express this change per 100 or 1000 feet of altitude.
• These interpretations are valid only in the range of combinations of latitude and altitude that we have observed in our data.

[Figure: scatterplot of Altitude (ft), 0 to 6000, vs. Latitude (degrees), 32 to 42, showing the observed combinations of the two variables.]

Further interpretation of the model

The model assumes that there is a linear relationship between mean precipitation and latitude for every altitude. The slope of the line is the same for all altitudes, but the intercept changes.

Altitude = 1000 feet:
µ(Precip | Latitude, Altitude = 1000) = (β0 + 1000·β2) + β1·Latitude
µ̂(Precip | Latitude, Altitude = 1000) = −104.333 + 3.338·Latitude

Altitude = 3000 feet:
µ(Precip | Latitude, Altitude = 3000) = (β0 + 3000·β2) + β1·Latitude
µ̂(Precip | Latitude, Altitude = 3000) = −101.533 + 3.338·Latitude

Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude for every latitude. The slope of the line is the same for all latitudes, but the intercept changes.

Latitude = 34 degrees:
µ(Precip | Latitude = 34, Altitude) = (β0 + 34·β1) + β2·Altitude
µ̂(Precip | Latitude = 34, Altitude) = 7.759 + 0.0014·Altitude

Latitude = 40 degrees:
µ(Precip | Latitude = 40, Altitude) = (β0 + 40·β1) + β2·Altitude
µ̂(Precip | Latitude = 40, Altitude) = 27.787 + 0.0014·Altitude

We can also add an interaction term to the model. An interaction term is the product of two (or more) explanatory variables. In SPSS, we can create a new variable which is the product of Altitude and Latitude using Transform…Compute.

SPSS regression output (Dependent Variable: Precipitation (in)):

Model 1              B          Std. Error   Beta     t        Sig.
(Constant)           -144.230   44.487                -3.242   .003
Latitude (degrees)   4.375      1.206        .702     3.628    .001
Altitude (ft)        .0304      .0202        3.830    1.501    .145
Altitude*Latitude    -.00076    .00053       -3.700   -1.434   .163

The model is µ(Precip | Latitude, Altitude) = β0 + β1·Latitude + β2·Altitude + β3·Latitude·Altitude

According to this model, the relationship between Precipitation and Latitude is linear for any Altitude, but both the intercept and slope of the relationship depend on the Altitude:

Altitude = 1000 feet:
µ(Precip | Latitude, Altitude = 1000) = (β0 + 1000·β2) + (β1 + 1000·β3)·Latitude
µ̂(Precip | Latitude, Altitude = 1000) = −113.830 + 3.615·Latitude

Altitude = 3000 feet:
µ(Precip | Latitude, Altitude = 3000) = (β0 + 3000·β2) + (β1 + 3000·β3)·Latitude
µ̂(Precip | Latitude, Altitude = 3000) = −53.030 + 2.095·Latitude

Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude for every latitude. Both the intercept and slope of the line depend on the particular value of the latitude.

Latitude = 34 degrees:
µ(Precip | Latitude = 34, Altitude) = (β0 + 34·β1) + (β2 + 34·β3)·Altitude
µ̂(Precip | Latitude = 34, Altitude) = 4.520 + 0.00456·Altitude

Latitude = 40 degrees:
µ(Precip | Latitude = 40, Altitude) = (β0 + 40·β1) + (β2 + 40·β3)·Altitude
µ̂(Precip | Latitude = 40, Altitude) = 30.770 + 0.000·Altitude (essentially zero with these rounded estimates)

Notes:
• Interpretation of the model is easier in the absence of interactions, so we usually avoid interactions unless (a) there is strong evidence an interaction is present, (b) the interaction is expected on subject-matter grounds, or (c) a test of the interaction term is scientifically meaningful in the context of the problem. If prediction (and not interpretation) is the only goal, then we don't need to worry about the lack of interpretability of interactions.
• If an interaction between two variables is included in the model, then each of the variables must also be included individually; it doesn't make sense not to. This holds whether or not the coefficients on the individual variables are statistically significant. You cannot interpret the statistical significance of the coefficient on an individual variable if an interaction involving that variable is also in the model.
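The effect of the interaction term can be made concrete: with the interaction model above, the slope on Latitude is β1 + β3·Altitude, so it changes with altitude. A sketch using the SPSS estimates (function name is mine):

```python
# Slope on Latitude as a function of Altitude in the interaction model,
# using the SPSS estimates b1 = 4.375 and b3 = -0.00076.
b1, b3 = 4.375, -0.00076

def latitude_slope(altitude):
    return b1 + b3 * altitude

print(round(latitude_slope(1000), 3))  # 3.615
print(round(latitude_slope(3000), 3))  # 2.095
```

Higher-altitude stations thus show a weaker latitude effect under this fitted model.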

Indicator variables

0/1 indicator variables, like Rainshadow, can be used in a multiple regression model to distinguish between two groups.

µ(Precip | Latitude, Rainshadow) = β0 + β1·Latitude + β2·Rainshadow

This implies there are two separate models relating Precipitation to Latitude, one for locations in the rain shadow, and one for those not in the rain shadow.

µ(Precip | Latitude, Rainshadow = 1) = (β0 + β2) + β1·Latitude

µ(Precip | Latitude, Rainshadow = 0) = β0 + β1·Latitude

What would these two models look like on a graph of Precipitation versus Latitude?

Estimating the model:

SPSS regression output (Dependent Variable: Precipitation (in)):

Model 1              B          Std. Error   Beta    t        Sig.
(Constant)           -103.575   24.514               -4.225   .000
Latitude (degrees)   3.637      .659         .584    5.521    .000
Rainshadow           -19.942    3.486        -.605   -5.720   .000

µ̂(Precip | Latitude, Rainshadow) = −103.575 + 3.637·Latitude − 19.942·Rainshadow

µ̂(Precip | Latitude, Rainshadow = 1) = −123.517 + 3.637·Latitude

µ̂(Precip | Latitude, Rainshadow = 0) = −103.575 + 3.637·Latitude

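The additive indicator model gives two parallel lines in Latitude whose intercepts differ by the Rainshadow coefficient. A sketch using the SPSS estimates above (function name is mine):

```python
# Two parallel fitted lines from the additive indicator model,
# using the SPSS estimates above.
b0, b_lat, b_rs = -103.575, 3.637, -19.942

def precip_hat(latitude, rainshadow):
    return b0 + b_lat * latitude + b_rs * rainshadow

# Same slope in both groups (difference over 2 degrees = 2 * 3.637):
print(round(precip_hat(38, 0) - precip_hat(36, 0), 3))  # 7.274
print(round(precip_hat(38, 1) - precip_hat(36, 1), 3))  # 7.274
# Vertical gap between the lines equals the Rainshadow coefficient:
print(round(precip_hat(36, 1) - precip_hat(36, 0), 3))  # -19.942
```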
Interpretation of β̂1 and β̂2:

We can also add an interaction term between Latitude and Rainshadow:

µ(Precip | Latitude, Rainshadow) = β0 + β1·Latitude + β2·Rainshadow + β3·Latitude·Rainshadow

µ(Precip | Latitude, Rainshadow = 1) = (β0 + β2) + (β1 + β3)·Latitude

µ(Precip | Latitude, Rainshadow = 0) = β0 + β1·Latitude

What would these two models look like on a graph of Precipitation versus Latitude?

SPSS regression output (Dependent Variable: Precipitation (in)):

Model 1               B          Std. Error   Beta     t        Sig.
(Constant)            -175.457   26.177                -6.703   .000
Latitude (degrees)    5.581      .705         .895     7.912    .000
Rainshadow            139.839    39.019       4.240    3.584    .001
Latitude*Rainshadow   -4.315     1.051        -4.871   -4.105   .000

µ̂(Precip | Latitude, Rainshadow) = −175.457 + 5.581·Latitude + 139.839·Rainshadow − 4.315·Latitude·Rainshadow

µ̂(Precip | Latitude, Rainshadow = 1) = −35.618 + 1.266·Latitude

µ̂(Precip | Latitude, Rainshadow = 0) = −175.457 + 5.581·Latitude
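With the Latitude*Rainshadow interaction, the two fitted lines differ in both intercept and slope. A sketch recovering them from the SPSS estimates above (function name is mine):

```python
# The two fitted lines implied by the interaction model,
# using the SPSS estimates above.
b0, b1, b2, b3 = -175.457, 5.581, 139.839, -4.315

def precip_hat(lat, rs):
    return b0 + b1 * lat + b2 * rs + b3 * lat * rs

# Rain-shadow (rs = 1) line: intercept b0 + b2, slope b1 + b3.
intercept_rs1 = precip_hat(0, 1)
slope_rs1 = precip_hat(1, 1) - precip_hat(0, 1)
print(round(intercept_rs1, 3))  # -35.618
print(round(slope_rs1, 3))      # 1.266
```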

We can graph these two fitted lines in SPSS by graphing Precipitation versus Latitude with Rainshadow entered into Set Markers By (this gives what the Sleuth calls a "coded scatterplot," p. 254). Then, in the Chart Editor, select one of the groups of points by clicking its symbol in the legend and click Add Fit Line. Repeat for the other group. The plotting symbols and colors can also be changed.

[Figure: coded scatterplot of Precipitation (in), 0 to 80, vs. Latitude (degrees), 32 to 42, with separate fitted lines for Rainshadow = 0 and Rainshadow = 1.]

Interpretation of the coefficients in the model:
• β0 represents the intercept of the model relating Precipitation to Latitude for locations not in the rain shadow. The intercept isn't of much interest, though, since a latitude of 0 is not meaningful for these data.
• β1 represents the slope of the model relating Precipitation to Latitude for locations not in the rain shadow. Thus, according to the model, mean precipitation increases by β1 for every one-degree increase in latitude for locations not in the rain shadow.
• β2 represents the difference in mean precipitation between locations at latitude 0 in and not in the rain shadow. This isn't meaningful, since a latitude of 0 isn't meaningful.
• β3 represents the difference in the slope on Latitude between locations in and not in the rain shadow. More meaningful is the quantity β1 + β3: according to the model, mean precipitation increases by β1 + β3 for every one-degree increase in latitude for locations in the rain shadow.


Question: what is the difference between fitting the above 3-variable model with an interaction and fitting two separate linear regression models, one for locations in the rain shadow and one for those not in the rain shadow? Are the assumptions of the two sets of models different?

Model with all three variables

µ(Precip | Latitude, Altitude, Rainshadow) = β0 + β1·Latitude + β2·Altitude + β3·Rainshadow

How do you interpret the coefficients in this model?

What if you added all 2-way interactions?

More on indicator variables

Consider the model with only Rainshadow:

µ(Precip | Rainshadow) = β0 + β1·Rainshadow

What does this model imply about locations in and not in the rain shadow? If we assume normal distributions with constant variance and independent observations, which model that we have already studied is it equivalent to?


Regression results:

SPSS regression output (Dependent Variable: Precipitation (in)):

Model 1      B         Std. Error   Beta    t        Sig.
(Constant)   30.984    3.760                8.240    .000
Rainshadow   -19.723   4.995        -.598   -3.949   .000

Here’s some output from the two-sample t procedure. What’s the correspondence with the linear regression model results?

Group Statistics: Precipitation (in)

Rainshadow   N    Mean      Std. Deviation   Std. Error Mean
0            13   30.9838   19.35004         5.36674
1            17   11.2606   6.38787          1.54929

Independent Samples Test (t-test for Equality of Means), Precipitation (in):

                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   95% CI Lower   95% CI Upper
Equal variances assumed       3.949   28       .000              19.7233           4.9948                  9.4919         29.9547
Equal variances not assumed   3.531   14.010   .003              19.7233           5.5859                  7.7436         31.7030
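The correspondence can be checked directly: in the Rainshadow-only regression, the intercept is the mean of the Rainshadow = 0 group, and the Rainshadow coefficient is the (group 1 minus group 0) difference in means. A sketch using the group means from the table above:

```python
# Regression-on-an-indicator vs. two-sample t: the regression intercept
# equals the mean of the Rainshadow = 0 group, and the slope equals the
# difference in group means (group 1 minus group 0).
mean_rs0, mean_rs1 = 30.9838, 11.2606  # from the Group Statistics table

intercept = mean_rs0
slope = mean_rs1 - mean_rs0
print(round(intercept, 4))  # 30.9838
print(round(slope, 4))      # -19.7232 (SPSS shows -19.723)
```

The equal-variances t statistic (3.949) likewise matches the regression t for the Rainshadow coefficient, up to sign.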


Categorical variables with more than 2 levels

Categorical variables in linear regression models are called factors. How do we incorporate a factor with 3 or more levels? In other words, how do we allow a separate effect for each level of the factor?

• A factor with k levels needs k − 1 indicator variables to represent its effects in a regression model.

Example: Meadowfoam case study, Chap. 9, p. 246. Light level (6 levels) can be treated as either a quantitative variable or a categorical variable represented by 5 indicator variables. What’s the difference, both in terms of the number of parameters in the model, and what the model says about the relationship between number of flowers and light level?

We can create 6 indicator variables for light level, as seen in Display 9.7 on p. 246. Only 5 of them are needed in the model because the constant term represents the omitted level. The level that is omitted is called the reference level; the coefficients on the indicator variables represent differences from the reference level. Compare an ANOVA of Flowers on Light level to a regression of Flowers on the indicator variables L300, L450, L600, L750, and L900. One-way ANOVA output:

Descriptives: Flowers

Light   N    Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper
150     4    73.275   7.379            3.689        61.533         85.017
300     4    64.150   11.455           5.727        45.923         82.377
450     4    59.900   9.017            4.509        45.551         74.249
600     4    50.050   10.035           5.017        34.082         66.018
750     4    45.525   11.847           5.923        26.674         64.376
900     4    43.925   6.592            3.296        33.436         54.414
Total   24   56.138   13.733           2.803        50.338         61.937


ANOVA: Flowers

                 Sum of Squares   df   Mean Square   F       Sig.
Between Groups   2683.514         5    536.703       5.839   .002
Within Groups    1654.423         18   91.912
Total            4337.936         23

Multiple Comparisons (LSD), Dependent Variable: Flowers

(I) Light   (J) Light   Mean Difference (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
150         300         9.12500                 6.77910      .195   -5.1174        23.3674
150         450         13.37500                6.77910      .064   -.8674         27.6174
150         600         23.22500*               6.77910      .003   8.9826         37.4674
150         750         27.75000*               6.77910      .001   13.5076        41.9924
150         900         29.35000*               6.77910      .000   15.1076        43.5924

*. The mean difference is significant at the .05 level.

Regression output for the model µ(Flowers | LIGHT) = β0 + β1·L300 + β2·L450 + β3·L600 + β4·L750 + β5·L900

ANOVA (regression), Dependent Variable: Flowers; Predictors: (Constant), L900, L750, L600, L450, L300

             Sum of Squares   df   Mean Square   F       Sig.
Regression   2683.514         5    536.703       5.839   .002
Residual     1654.423         18   91.912
Total        4337.936         23

Coefficients, Dependent Variable: Flowers

Model 1      B         Std. Error   t        Sig.   95% CI Lower   95% CI Upper
(Constant)   73.275    4.794        15.286   .000   63.204         83.346
L300         -9.125    6.779        -1.346   .195   -23.367        5.117
L450         -13.375   6.779        -1.973   .064   -27.617        .867
L600         -23.225   6.779        -3.426   .003   -37.467        -8.983
L750         -27.750   6.779        -4.093   .001   -41.992        -13.508
L900         -29.350   6.779        -4.329   .000   -43.592        -15.108

