
Multiple regression refers to regression with multiple explanatory variables (but just one response variable). Multiple regression is an amazingly flexible tool that can be used to model linear and nonlinear relationships. Don't be fooled by the "linear" in "linear regression": we've already seen how simple linear regression can model nonlinear relationships by transforming one or both of the explanatory and response variables. Multiple regression offers even more possibilities. It's even possible to incorporate categorical variables into multiple regression models.

Examples:

1. One explanatory variable, but a quadratic relationship:

   µ(Y | X) = β0 + β1·X + β2·X²

   We can include higher-order powers of X, although this is unusual unless there is a theoretical reason for it. Note: we always include lower-order terms when a higher-order term is in a model. For example, we always include X if X² is in the model.

2. Two explanatory variables:

   µ(Y | X1, X2) = β0 + β1·X1 + β2·X2

3. Two explanatory variables with an interaction:

   µ(Y | X1, X2) = β0 + β1·X1 + β2·X2 + β3·X1·X2

   The term X1·X2 is the product of the two variables. We'll see why this is called an interaction below.

4. The explanatory variables can be binary (0/1). In fact, the ANOVA and pooled two-sample t models can be written as special cases of the linear regression model.
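All of these models are "linear" because each is linear in the β's, whatever the columns of the design matrix contain. A minimal sketch of model 1 fit by ordinary least squares (numpy is assumed to be available; the data are made up for illustration):

```python
import numpy as np

# Made-up data that follow an exact quadratic trend (illustration only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 1.5 * x - 0.5 * x**2

# Design matrix for model 1: columns for the intercept, X, and X^2.
# The fit is nonlinear in X but linear in (beta0, beta1, beta2).
X = np.column_stack([np.ones_like(x), x, x**2])

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)   # recovers [2.0, 1.5, -0.5]
```

The same recipe handles models 2–4: just add a column for each extra variable, product, or 0/1 indicator.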

The assumptions of the multiple regression model parallel those of simple linear regression:

• Linearity: the mean of Y is given by the linear model at every combination of X values.

• Normality: the Y values at any particular combination of X values are normally distributed.

• Constant variance: the variance of the Y values is the same at every combination of X values.

• Independence: the Y's are independent draws from their respective distributions.

These assumptions can also be summarized by writing a linear regression model in the following way, using model 2 above as an example:

Y = β0 + β1·X1 + β2·X2 + ε

where the ε's are independent N(0, σ) random variables (the subscript i has been omitted).

How do we fit the models? Least squares can still be used: find the values of the β's to minimize the sum of squared residuals,

∑ (Yi − Ŷi)² = ∑ resi²,

where the sums run over i = 1, …, n. It's not necessary to examine the formulas for the least squares estimators of all the β's, but the formulas can be obtained fairly easily using calculus, no matter how many β's there are. Formulas for standard errors of the estimates can also be derived. Confidence intervals and tests for individual coefficients can be computed under the assumptions of the model. These will be covered in Chapter 10.

Chapter 9, page 2
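The least-squares criterion is easy to demonstrate numerically: the solver's coefficients give a smaller sum of squared residuals than any perturbed coefficients. A sketch with made-up data (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 5, n)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0, 1, n)  # model 2 plus noise

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def ssr(b):
    """Sum of squared residuals for coefficient vector b."""
    res = y - X @ b
    return float(res @ res)

# Any perturbation of the least-squares estimates increases the SSR.
print(ssr(beta) <= ssr(beta + np.array([0.1, 0.0, 0.0])))    # True
print(ssr(beta) <= ssr(beta + np.array([0.0, -0.1, 0.1])))   # True
```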

Example 1: Ozone data again. Four variables measured: ozone, max temperature, wind speed, solar radiation. Examine ozone vs. wind speed; loess fit on left.

[Figure: two scatterplots of Ozone (ppb), 0 to 200, versus Wind speed (mph), 0 to 20; loess fit in the left panel, quadratic fit in the right panel.]

A log transformation on ozone could be tried, but if the variance looks approximately constant, we might not want to transform ozone. We might instead try a quadratic relationship (right panel, above):

µ(Ozone | Wind) = β0 + β1·Wind + β2·Wind²

Coefficients (dependent variable: Ozone (ppb))

                      Unstandardized           Standardized
                      B          Std. Error    Beta       t        Sig.
  (Constant)          166.733    14.306                   11.655   .000
  Wind speed (mph)    -19.958    2.735         -2.135     -7.298   .000
  Wind^2              .662       .124          1.564      5.347    .000

Fitted model is

µ̂(Ozone | Wind) = 166.733 − 19.958·Wind + 0.662·Wind²

What do you think of the quadratic model based on the graph above?
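One way to judge the quadratic is to evaluate it: the fitted parabola bottoms out at Wind = 19.958/(2·0.662) ≈ 15.1 mph and turns upward beyond that, which may or may not be physically believable. A quick check using the coefficients from the table above (Python):

```python
# Fitted quadratic, with coefficients read from the SPSS output above
b0, b1, b2 = 166.733, -19.958, 0.662

def ozone_hat(wind):
    """Predicted ozone (ppb) at a given wind speed (mph)."""
    return b0 + b1 * wind + b2 * wind**2

print(round(ozone_hat(5.0), 2))   # 83.49 ppb at 5 mph
wind_min = -b1 / (2 * b2)         # vertex of the parabola
print(round(wind_min, 2))         # about 15.07 mph
```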


Notes

• Interpretation of the coefficients in a quadratic model is not straightforward. In particular, we cannot interpret β̂1 the way we did in a simple linear regression model, since the change in Ozone when Wind speed changes is affected by both β̂1 and β̂2.

• If you include a quadratic term then you must also include a linear term for that variable. It does not matter whether the coefficient on the linear term is statistically significant or not. You cannot interpret the statistical significance of the coefficient on a variable if a higher-order term involving that variable is included in the model.

Example 2: Four variables were measured at each of thirty meteorological stations scattered

throughout California. These variables were: average annual precipitation (in inches), altitude (in

feet), latitude (in degrees), and whether or not the station was on the leeward side of the mountains in

the rain shadow (1 = in rain shadow, 0 = not in rain shadow). The goal was to examine the relationship

between precipitation and the other variables and also to create a model to predict precipitation.

Location   Precip (in)   Elevation (ft)   Latitude   Rain Shadow

1 39.57 43 40.8 0

2 23.27 341 40.2 1

3 18.20 4152 33.8 1

4 37.48 74 39.4 0

5 49.26 6752 39.3 0

6 21.82 52 37.8 0

7 18.07 25 38.5 1

8 14.17 95 37.4 1

9 42.63 6360 36.6 0

10 13.85 74 36.7 1

11 9.44 331 36.7 1

12 19.33 57 35.7 0

13 15.67 740 35.7 1

14 6.00 489 35.4 1

15 5.73 4108 37.3 1

16 47.82 4850 40.4 0

17 17.95 120 34.4 0

18 18.20 4152 40.3 1

19 10.03 4036 41.9 1

20 4.63 913 34.8 1

21 14.74 699 34.2 0

22 15.02 312 34.1 0

23 12.36 50 33.8 0

24 8.26 125 37.8 1

25 4.05 268 33.6 1

26 9.94 19 32.7 0

27 4.25 2105 34.1 1

28 1.66 -178 36.5 1

29 74.87 35 41.7 0

30 15.95 60 39.2 1


Scatterplot matrix: Graphs…Scatter…Matrix. Put in a categorical variable under Set Markers By. The

default is different colors, but you can edit the scatterplot to use different symbols.

[Scatterplot matrix of Precipitation (in), Altitude (ft), and Latitude (degrees), with markers coded by rain-shadow status (0/1).]

Ignore the Rainshadow variable for the time being. It also looks like transformations might be needed,

but for now, let’s ignore that also.

Coefficients (dependent variable: Precipitation (in))

                        Unstandardized           Standardized
                        B          Std. Error    Beta       t        Sig.
  (Constant)            -105.733   36.165                   -2.924   .007
  Latitude (degrees)    3.338      .984          .536       3.392    .002
  Altitude (ft)         .0014      .0013         .178       1.129    .269

According to the fitted model, what’s the predicted precipitation for a location at latitude 40 degrees

and 1000 feet in elevation?
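Plugging the coefficients from the table above into the fitted model, the prediction is straightforward arithmetic (a quick check in Python):

```python
# Coefficients read from the SPSS output above
b0, b_lat, b_alt = -105.733, 3.338, 0.0014

pred = b0 + b_lat * 40 + b_alt * 1000   # latitude 40 degrees, 1000 ft
print(round(pred, 3))   # 29.187 inches
```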


• β1 represents the increase in mean precipitation for every one-degree increase in latitude, given that altitude remains fixed.

• β2 represents the increase in mean precipitation for every one-foot increase in altitude, given that latitude remains fixed. It would be more natural to express this change for every 100 or 1000 feet of increase in altitude.

• These interpretations are valid only in the range of combinations of latitude and altitude that we have observed in our data.

[Scatterplot of Altitude (ft), 0 to 6000, versus Latitude (degrees), showing the observed combinations of the two explanatory variables.]

The model assumes that there is a linear relationship between mean precipitation and latitude for every

altitude. The slope of the line is the same for all altitudes, but the intercept changes.

µ(Precip | Latitude, Altitude = 1000) = −105.733 + 3.338·Latitude + 0.0014(1000) = −104.333 + 3.338·Latitude

µ(Precip | Latitude, Altitude = 3000) = −105.733 + 3.338·Latitude + 0.0014(3000) = −101.533 + 3.338·Latitude


Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude

for every latitude. The slope of the line is the same for all latitudes, but the intercept changes.

Latitude = 34 degrees: µ(Precip | Latitude = 34, Altitude) = −105.733 + 3.338(34) + 0.0014·Altitude = 7.759 + 0.0014·Altitude

Latitude = 40 degrees: µ(Precip | Latitude = 40, Altitude) = −105.733 + 3.338(40) + 0.0014·Altitude = 27.787 + 0.0014·Altitude

We can also add an interaction term to the model. An interaction term is the product of two (or more)

explanatory variables. In SPSS, we can create a new variable which is the product of Altitude and

Latitude using Transform…Compute.
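Outside SPSS the same step is a one-liner: multiply the two columns and append the product to the design matrix. A sketch with made-up numbers, not the actual 30-station data (numpy assumed):

```python
import numpy as np

# Made-up illustration, not the actual rainfall data set
lat = np.array([34.0, 36.0, 38.0, 40.0, 41.0])
alt = np.array([100.0, 2000.0, 500.0, 4800.0, 40.0])
precip = np.array([10.0, 6.0, 20.0, 48.0, 60.0])

inter = lat * alt   # the Altitude*Latitude interaction variable

# Design matrix: intercept, Latitude, Altitude, and their product
X = np.column_stack([np.ones(len(lat)), lat, alt, inter])
beta, *_ = np.linalg.lstsq(X, precip, rcond=None)
print(beta.shape)   # four coefficients: b0, b_lat, b_alt, b_interaction
```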

Coefficients (dependent variable: Precipitation (in))

                        Unstandardized           Standardized
                        B          Std. Error    Beta       t        Sig.
  (Constant)            -144.230   44.487                   -3.242   .003
  Latitude (degrees)    4.375      1.206         .702       3.628    .001
  Altitude (ft)         .0304      .0202         3.830      1.501    .145
  Altitude*Latitude     -.00076    .00053        -3.700     -1.434   .163

According to this model, the relationship between Precipitation and Latitude is linear for any Altitude,

but both the intercept and slope of the relationship depend on the Altitude:

µ(Precip | Latitude, Altitude = 1000) = −144.230 + 4.375·Latitude + .0304(1000) − .00076(1000)·Latitude = −113.830 + 3.615·Latitude

µ(Precip | Latitude, Altitude = 3000) = −144.230 + 4.375·Latitude + .0304(3000) − .00076(3000)·Latitude = −53.030 + 2.095·Latitude


Similarly, the model assumes that there is a linear relationship between mean precipitation and altitude

for every latitude. Both the intercept and slope of the line depend on the particular value of the

latitude.

Latitude = 34 degrees: µ(Precip | Latitude = 34, Altitude) = −144.230 + 4.375(34) + .0304·Altitude − .00076(34)·Altitude = 4.520 + .00456·Altitude

Latitude = 40 degrees: µ(Precip | Latitude = 40, Altitude) = −144.230 + 4.375(40) + .0304·Altitude − .00076(40)·Altitude = 30.770 + .0000·Altitude

Notes

• Interpretation of the model is easier in the absence of interactions, so we usually avoid interactions unless (a) there is strong evidence of an interaction in the data, (b) the interaction is expected to be present, or (c) a test of the interaction term is scientifically meaningful in the context of the problem. If prediction (and not interpretation) is the only goal, then we don't need to worry about the lack of interpretability of interactions.

• If an interaction between two variables is included in the model, then each of the variables individually must be included; it doesn't make sense otherwise. It does not matter whether the coefficients on the individual variables are statistically significant or not. You cannot interpret the statistical significance of the coefficient on individual variables if there is also an interaction between those variables in the model.

Indicator variables

0/1 indicator variables, like Rainshadow, can be used in a multiple regression model to distinguish between two groups. For example,

µ(Precip | Latitude, Rainshadow) = β0 + β1·Latitude + β2·Rainshadow

This implies there are two separate models relating Precipitation to Latitude, one for locations in the rain shadow and one for those not in the rain shadow.


What would these two models look like on a graph of Precipitation versus Latitude?

Coefficients (dependent variable: Precipitation (in))

                        Unstandardized           Standardized
                        B          Std. Error    Beta       t        Sig.
  (Constant)            -103.575   24.514                   -4.225   .000
  Latitude (degrees)    3.637      .659          .584       5.521    .000
  Rainshadow            -19.942    3.486         -.605      -5.720   .000
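Reading off the table above, the indicator model gives two parallel fitted lines: same slope on Latitude, intercepts shifted by the Rainshadow coefficient. A quick check (Python):

```python
# Coefficients read from the SPSS output above
b0, b_lat, b_shadow = -103.575, 3.637, -19.942

def precip_hat(latitude, rainshadow):
    """Fitted precipitation (in) for a latitude and 0/1 rain-shadow flag."""
    return b0 + b_lat * latitude + b_shadow * rainshadow

# Two parallel lines: the vertical gap equals b_shadow at any latitude
gap = precip_hat(38, 1) - precip_hat(38, 0)
print(round(gap, 3))   # -19.942
```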


We can also add an interaction term between Latitude and Rainshadow:

µ(Precip | Latitude, Rainshadow) = β0 + β1·Latitude + β2·Rainshadow + β3·Latitude·Rainshadow

What would these two models look like on a graph of Precipitation versus Latitude?

Coefficients (dependent variable: Precipitation (in))

                          Unstandardized           Standardized
                          B          Std. Error    Beta       t        Sig.
  (Constant)              -175.457   26.177                   -6.703   .000
  Latitude (degrees)      5.581      .705          .895       7.912    .000
  Rainshadow              139.839    39.019        4.240      3.584    .001
  Latitude*Rainshadow     -4.315     1.051         -4.871     -4.105   .000
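With the interaction, each group gets its own intercept and slope. Reading them off the table above (a quick check in Python):

```python
# Coefficients read from the SPSS output above
b0, b_lat, b_sh, b_int = -175.457, 5.581, 139.839, -4.315

# Not in rain shadow (Rainshadow = 0): intercept b0, slope b_lat
# In rain shadow (Rainshadow = 1): intercept b0 + b_sh, slope b_lat + b_int
print(round(b0 + b_sh, 3))      # -35.618: intercept for rain-shadow locations
print(round(b_lat + b_int, 3))  # 1.266: slope for rain-shadow locations
```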


We can graph these two fitted lines in SPSS by graphing Precipitation versus Latitude with

Rainshadow entered into Set Markers By (this gives what the Sleuth calls a “coded scatterplot,” p.

254). Then get into Chart Editor, select one of the groups of points by clicking on the symbol on the

legend and then click Add Fit Line. Repeat for the other group. The plotting symbols and colors can

also be changed.

[Coded scatterplot of Precipitation (in), 0 to 80, versus Latitude (degrees), with separate markers and fitted lines for Rainshadow = 0 and Rainshadow = 1.]

• β0 represents the intercept of the model relating Precipitation to Latitude for locations not in the rain shadow. The intercept isn't of much interest, though, since a latitude of 0 is not meaningful for these data.

• β1 represents the slope of the model relating Precipitation to Latitude for locations not in the rain shadow. Thus, according to the model, mean precipitation increases by β1 for every one-degree increase in latitude for locations not in the rain shadow.

• β2 represents the difference in mean precipitation between locations in and not in the rain shadow at latitude 0. This isn't meaningful since a latitude of 0 isn't meaningful.

• β3 represents the difference in the slope on Latitude between locations in and not in the rain shadow. More meaningful is the quantity β1 + β3: according to the model, mean precipitation increases by β1 + β3 for every one-degree increase in latitude for locations in the rain shadow.


Question: what is the difference between fitting the above three-variable model with an interaction and fitting two separate linear regression models, one for locations in the rain shadow and one for those not in the rain shadow? Are the assumptions of the two sets of models different?
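One way to see the answer: the pooled interaction model reproduces exactly the same two fitted lines as two separate regressions; the only real difference is that the single model pools the residuals, so it assumes a common error variance σ for both groups. A demonstration with made-up data (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
shadow = (np.arange(n) % 2).astype(float)   # 0/1 group indicator
lat = rng.uniform(33, 42, n)
precip = 5.0 + 4.0 * lat + 30.0 * shadow - 2.0 * lat * shadow \
         + rng.normal(0, 2, n)              # made-up data

# One pooled model with the interaction
X = np.column_stack([np.ones(n), lat, shadow, lat * shadow])
b = np.linalg.lstsq(X, precip, rcond=None)[0]

# Two separate simple regressions, one per group
def fit_line(x, y):
    return np.polyfit(x, y, 1)              # returns (slope, intercept)

s0, i0 = fit_line(lat[shadow == 0], precip[shadow == 0])
s1, i1 = fit_line(lat[shadow == 1], precip[shadow == 1])

print(np.allclose([i0, s0], [b[0], b[1]]))                # True
print(np.allclose([i1, s1], [b[0] + b[2], b[1] + b[3]]))  # True
```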

Now consider the model with only the indicator variable:

µ(Precip | Rainshadow) = β0 + β1·Rainshadow

What does this model imply about locations in and not in the rain shadow? If we add the assumptions of normal distributions with constant variance and independent observations, which model that we have already studied is it equivalent to?


Regression results:

Coefficients (dependent variable: Precipitation (in))

                  Unstandardized           Standardized
                  B          Std. Error    Beta       t        Sig.
  (Constant)      30.984     3.760                    8.240    .000
  Rainshadow      -19.723    4.995         -.598      -3.949   .000

Here’s some output from the two-sample t procedure. What’s the correspondence with the linear

regression model results?

Group Statistics

  Rainshadow   N    Mean      Std. Deviation   Std. Error Mean
  0            13   30.9838   19.35004         5.36674
  1            17   11.2606   6.38787          1.54929

Independent Samples Test (Precipitation (in))

                                t       df       Sig. (2-tailed)   Mean Diff.   Std. Error Diff.   95% CI Lower   95% CI Upper
  Equal variances assumed       3.949   28       .000              19.7233      4.9948             9.4919         29.9547
  Equal variances not assumed   3.531   14.010   .003              19.7233      5.5859             7.7436         31.7030
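The correspondence is exact: the regression intercept 30.984 is the mean for the Rainshadow = 0 group, the Rainshadow coefficient −19.723 is the difference in group means (sign flipped relative to the t output), and the regression t statistic −3.949 matches the equal-variance two-sample t. A quick check from the numbers above (Python):

```python
# Group means from the Group Statistics table above
mean0, mean1 = 30.9838, 11.2606

diff = mean1 - mean0
print(round(diff, 4))   # -19.7232, matching the Rainshadow coefficient
```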


Categorical variables with more than 2 levels

Categorical variables in linear regression models are called factors. How do we incorporate a factor

with 3 or more levels? In other words, how do we allow a separate effect for each level of the factor?

• A factor with k levels needs k-1 indicator variables to represent its effects in a regression

model.
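The encoding itself is mechanical: a factor with k levels becomes k − 1 columns of 0/1 indicators, one per non-reference level. A minimal sketch (plain Python; the level names anticipate the meadowfoam example below):

```python
# Light levels for a few observations (made-up ordering, illustration only)
light = [150, 300, 450, 150, 900]
levels = [150, 300, 450, 600, 750, 900]
reference = levels[0]   # 150 serves as the reference level

# One 0/1 indicator column per non-reference level (k - 1 = 5 columns)
indicators = {
    f"L{lev}": [1 if obs == lev else 0 for obs in light]
    for lev in levels if lev != reference
}
print(indicators["L300"])   # [0, 1, 0, 0, 0]
print(indicators["L900"])   # [0, 0, 0, 0, 1]
```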

Example: Meadowfoam case study, Chap. 9, p. 246. Light level (6 levels) can be treated as either a

quantitative variable or a categorical variable represented by 5 indicator variables. What’s the

difference, both in terms of the number of parameters in the model, and what the model says about the

relationship between number of flowers and light level?

We can create 6 indicator variables for light level, as seen in Display 9.7 on p. 246. Only 5 of them are

needed in the model because the constant term represents the omitted level. The level that is omitted is

called the reference level; the coefficients on the indicator variables represent differences from the

reference level.

Compare an ANOVA of Flowers on Light level to a regression of Flowers on the indicator variables

L300, L450, L600, L750, and L900.

Descriptives: Flowers

  Light   N    Mean     Std. Deviation   Std. Error   95% CI Lower   95% CI Upper
  150     4    73.275   7.379            3.689        61.533         85.017
  300     4    64.150   11.455           5.727        45.923         82.377
  450     4    59.900   9.017            4.509        45.551         74.249
  600     4    50.050   10.035           5.017        34.082         66.018
  750     4    45.525   11.847           5.923        26.674         64.376
  900     4    43.925   6.592            3.296        33.436         54.414
  Total   24   56.138   13.733           2.803        50.338         61.937


ANOVA: Flowers

                   Sum of Squares   df   Mean Square   F       Sig.
  Between Groups   2683.514         5    536.703       5.839   .002
  Within Groups    1654.423         18   91.912
  Total            4337.936         23

Multiple Comparisons (LSD)

  (I) Light   (J) Light   Mean Diff. (I-J)   Std. Error   Sig.   95% CI Lower   95% CI Upper
  150         300         9.12500            6.77910      .195   -5.1174        23.3674
  150         450         13.37500           6.77910      .064   -.8674         27.6174
  150         600         23.22500*          6.77910      .003   8.9826         37.4674
  150         750         27.75000*          6.77910      .001   13.5076        41.9924
  150         900         29.35000*          6.77910      .000   15.1076        43.5924

  *. The mean difference is significant at the .05 level.

µ(Flowers | LIGHT) = β0 + β1·L300 + β2·L450 + β3·L600 + β4·L750 + β5·L900

ANOVA (regression)

               Sum of Squares   df   Mean Square   F       Sig.
  Regression   2683.514         5    536.703       5.839   .002
  Residual     1654.423         18   91.912
  Total        4337.936         23

  Predictors: (Constant), L900, L750, L600, L450, L300
  Dependent Variable: Flowers

Coefficients (dependent variable: Flowers)

               Unstandardized                            95% CI for B
               B          Std. Error   t        Sig.     Lower Bound   Upper Bound
  (Constant)   73.275     4.794        15.286   .000     63.204        83.346
  L300         -9.125     6.779        -1.346   .195     -23.367       5.117
  L450         -13.375    6.779        -1.973   .064     -27.617       .867
  L600         -23.225    6.779        -3.426   .003     -37.467       -8.983
  L750         -27.750    6.779        -4.093   .001     -41.992       -13.508
  L900         -29.350    6.779        -4.329   .000     -43.592       -15.108
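Note the correspondence with the Descriptives table: the regression constant is the mean for the reference level (150), and each coefficient is that level's mean minus the reference mean, which matches the LSD mean differences with the sign flipped. A quick check from the numbers above (Python):

```python
# Group means from the Descriptives table above
means = {150: 73.275, 300: 64.150, 450: 59.900,
         600: 50.050, 750: 45.525, 900: 43.925}

constant = means[150]                  # reference-level mean = 73.275
coef_L300 = means[300] - means[150]
coef_L900 = means[900] - means[150]
print(round(coef_L300, 3))   # -9.125, the L300 coefficient
print(round(coef_L900, 3))   # -29.35, the L900 coefficient
```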

