
Math 445 Chapter 8: A Closer Look at Assumptions for Simple Linear Regression

1. Linearity

2. Constant variance

3. Normality

4. Independence

Assumptions 1, 2 and 4 are the most important. Violation of 1 can bias estimates of means and predictions. Violations of 2 and 4 can lead to under- or over-estimates of standard errors and to misleading inferences and confidence intervals. Violation of 3 is a problem only when sample sizes are small. An exception is prediction intervals for an individual response, which depend critically on the normality assumption (confidence intervals for the mean response at a particular X are robust to non-normality because of the Central Limit Theorem).

Assessing assumptions

Linearity and constant variance assumptions: assess through scatterplots, smoothing (loess, for example),

and residual plots

Example: Ozone level and maximum temperature on 111 days at a location in New Jersey, summer 1973

[Figure: scatterplot of Ozone (ppb) versus Maximum temperature (F); residual plot of Unstandardized Residual versus Unstandardized Predicted Value]

• The relationship is not linear and the variance appears to increase as temperature increases. These

violations suggest transforming the response variable (transforming the explanatory variable will

not solve the nonconstant variance problem).

• When deciding whether to transform the response variable or the explanatory variable (or both),

sometimes it is helpful to look at histograms of each variable individually. If the distribution of

either variable is skewed, this suggests transforming that variable. In this example, the

distribution of ozone is skewed to the right while the distribution of temperature is roughly

symmetric.

• See Display 8.6 on p. 213 for suggested courses of action for other patterns.


• Recall the ladder of powers: the family of power transformations (Chapter 10 of DeVeaux,

Velleman and Bock). Examples:

2 represents squaring ($y^2$)

1 represents no transformation ($y$)

½ represents square root ($\sqrt{y}$)

0, by convention, represents $\log(y)$ (to any base)

-1/2 represents reciprocal square root ($-1/\sqrt{y}$) (the negative sign preserves the original order)

-1 represents reciprocal ($-1/y$)

For univariate data, powers less than 1 are often used for variables whose distribution is skewed to the

right; the stronger the skew, often the smaller the power needed (0 is smaller than ½, -1/2 is smaller than

0, etc.).
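As a rough illustration of the ladder, the sketch below applies several powers to a small right-skewed sample and reports the sample skewness after each transformation. The data values and the helper names `ladder` and `skewness` are assumptions made for this example; they are not part of the course materials.

```python
import math

def ladder(y, power):
    """Ladder-of-powers transform of a positive value y.

    Power 0 means log10 by convention; negative powers are negated
    so that the transformed values keep the original ordering.
    """
    if y <= 0:
        raise ValueError("the power transformations here assume y > 0")
    if power == 0:
        return math.log10(y)
    t = y ** power
    return t if power > 0 else -t

def skewness(values):
    """Sample skewness: third central moment divided by the sd cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (each value doubles the previous one).
y = [1, 2, 4, 8, 16, 32, 64]

skew_by_power = {p: skewness([ladder(v, p) for v in y])
                 for p in (2, 1, 0.5, 0, -0.5, -1)}

for p, s in skew_by_power.items():
    print(f"power {p:>4}: skewness {s:+.3f}")
```

For this particular sample the logged values are evenly spaced, so power 0 removes the skew entirely, while powers below 0 overshoot and produce left skew. Real data will rarely be this tidy.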

Log transformation is generally the most interpretable, though other transformations are sometimes interpretable in special situations (see bottom of p. 216; in particular, the inverse transformation is interpretable for ratios, where miles per gallon, for example, becomes gallons per mile).

Different transformations are easy to try (in the SPSS Chart Editor, power transformations with non-negative exponents can be applied to X, Y or both).

A log transformation works well for the Ozone data, making the relationship more linear and the variance

more constant. There is one moderate outlier which we’ll address later.

[Figure: scatterplot of Log10(Ozone) versus Maximum temperature (F); residual plot of Unstandardized Residual versus Unstandardized Predicted Value]

Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -.8028 | .1976 | | -4.062 | .000 | -1.1945 | -.4111 |
| Maximum temperature (F) | .0294 | .0025 | .745 | 11.654 | .000 | .0244 | .0344 |

a. Dependent Variable: Log10(Ozone)


Before proceeding to the interpretation of this model, we first address the other assumptions: normality and independence. Normality is not crucial with larger sample sizes, but we should make sure that there is no strong skewness and that there are no outliers. The assumption of a normal distribution at each value of X means that the true residuals (errors) $\varepsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$ are assumed to be $N(0, \sigma)$. Thus we can look at the distribution of the observed residuals $\text{res}_i = e_i = Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i)$ with a histogram and/or normal probability plot.

[Figure: histogram of Unstandardized Residual (vertical axis: Frequency); normal probability plot of the residuals (horizontal axis: Observed Value)]

The residuals for the log(Ozone) model appear quite symmetrically distributed with only one mild outlier

on the negative end.
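The residual calculation described above can be sketched in a few lines. The temperatures and ozone readings below are invented for illustration (the actual 111 observations are not reproduced in these notes), and `fit_line` is a hypothetical helper name; the point is only to show how the observed residuals and a crude outlier check are obtained from a fit on the log10 scale.

```python
import math

def fit_line(x, y):
    """Least-squares intercept and slope for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical temperatures (F) and ozone readings (ppb), for illustration only.
temp = [60, 65, 70, 75, 80, 85, 90, 95]
ozone = [10, 14, 16, 27, 33, 52, 60, 95]

log_ozone = [math.log10(v) for v in ozone]
b0, b1 = fit_line(temp, log_ozone)

# Observed residuals: res_i = Y_i - (b0 + b1 * X_i)
res = [yi - (b0 + b1 * xi) for xi, yi in zip(temp, log_ozone)]

# Crude numerical checks to accompany a histogram or normal probability plot.
n = len(res)
sigma_hat = math.sqrt(sum(r ** 2 for r in res) / (n - 2))
standardized = [r / sigma_hat for r in res]
largest = max(standardized, key=abs)

print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
print(f"sigma_hat = {sigma_hat:.4f}, largest standardized residual = {largest:+.2f}")
```

Dividing each residual by the estimated standard deviation is only a rough screen; it does not replace looking at the histogram and the normal probability plot.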

The assumption of independence of the residuals can only be judged from the sampling plan and/or from

plotting the residuals versus time order or other covariates that may have been measured. For example, if

these observations had come from two different locations, then the independence assumption would be

violated. We would want to examine a scatterplot with the points from the two locations identified to see

if the relationship were different at the two locations. We would also want to plot the residuals versus day

number to see if there were patterns in the residuals.
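A numeric companion to the residuals-versus-day plot is the correlation between each residual and the one before it. This is not a method used in these notes, only a minimal sketch under the assumption that the residuals are already listed in time order; the function name `lag1_autocorrelation` and both residual series are hypothetical.

```python
def lag1_autocorrelation(residuals):
    """Correlation between each residual and its predecessor in time order."""
    n = len(residuals)
    mean = sum(residuals) / n
    num = sum((residuals[i] - mean) * (residuals[i - 1] - mean)
              for i in range(1, n))
    den = sum((r - mean) ** 2 for r in residuals)
    return num / den

# Hypothetical residuals in day order: a slow drift, then a strict alternation.
drifting = [-0.4, -0.3, -0.3, -0.1, 0.0, 0.1, 0.2, 0.3, 0.3, 0.2]
alternating = [0.3, -0.3, 0.3, -0.3, 0.3, -0.3, 0.3, -0.3]

r_drift = lag1_autocorrelation(drifting)
r_alt = lag1_autocorrelation(alternating)

print(f"drifting series:    lag-1 autocorrelation = {r_drift:+.2f}")
print(f"alternating series: lag-1 autocorrelation = {r_alt:+.2f}")
```

Values far from zero in either direction suggest a pattern over time, which would cast doubt on the independence assumption. Values near zero do not prove independence.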

[Figure: Unstandardized Residual versus Sequence number]


Interpretation of transformed model

$$\hat{\mu}[\log(\text{Ozone}) \mid \text{Temp}] = -0.8028 + 0.0294\,\text{Temp}$$

If we transform back by raising 10 to the power of each side, the left-hand side does not become the mean of Y, because the mean of the logged data is not the log of the mean of the raw data. However, if the transformation has succeeded in making the distribution of the log(Y) values symmetric about their mean, then

$$\text{Median}[\log(Y) \mid X] = \mu[\log(Y) \mid X]$$

Medians can be transformed back: the median of the logged data is the log of the median of the original data. Therefore, we can say:

$$\text{Estimated Median}(\text{Ozone} \mid \text{Temp}) = 10^{-0.8028 + 0.0294\,\text{Temp}} = (0.1575)\,10^{0.0294\,\text{Temp}}$$

Note that

$$\frac{\text{Estimated Median}(\text{Ozone} \mid \text{Temp} + 1)}{\text{Estimated Median}(\text{Ozone} \mid \text{Temp})} = \frac{(0.1575)\,10^{0.0294\,(\text{Temp}+1)}}{(0.1575)\,10^{0.0294\,\text{Temp}}} = 10^{0.0294} = 1.070$$

This means that median ozone level is estimated to increase by a factor of 1.070, or 7.0%, for every one degree increase in maximum temperature (95% confidence interval 5.8% to 8.2%, since $10^{0.0244} = 1.058$ and $10^{0.0344} = 1.082$).
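The back-transformation arithmetic can be checked directly. The sketch below uses only the intercept, slope and confidence limits reported in the SPSS Coefficients table; the function name `estimated_median_ozone` and the temperatures 80 and 81 are assumptions chosen for the example.

```python
# Estimates copied from the SPSS Coefficients table (log10 scale).
intercept = -0.8028
slope = 0.0294
slope_ci = (0.0244, 0.0344)

def estimated_median_ozone(temp_f):
    """Back-transformed fitted value: estimated median ozone (ppb)."""
    return 10 ** (intercept + slope * temp_f)

# Multiplicative change in the median per 1 degree F, with its 95% CI.
factor = 10 ** slope
factor_ci = tuple(10 ** b for b in slope_ci)

# The same quantities expressed as percent increases.
percent = (factor - 1) * 100
percent_ci = tuple((f - 1) * 100 for f in factor_ci)

print(f"factor per degree: {factor:.3f}")
print(f"percent increase: {percent:.1f}% "
      f"(95% CI {percent_ci[0]:.1f}% to {percent_ci[1]:.1f}%)")

# The ratio of medians one degree apart equals the factor at any temperature.
ratio_at_80 = estimated_median_ozone(81) / estimated_median_ozone(80)
print(f"ratio of estimated medians at 81 F and 80 F: {ratio_at_80:.3f}")
```

Because the confidence limits for the slope are transformed with the same monotone function, the interval for the multiplicative factor has the same 95% coverage as the interval for the slope.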

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate |
|---|---|---|---|---|
| 1 | .745 (a) | .555 | .551 | .25207 |

a. Predictors: (Constant), Maximum temperature (F)

ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 8.629 | 1 | 8.629 | 135.813 | .000 (a) |
| Residual | 6.926 | 109 | .064 | | |
| Total | 15.555 | 110 | | | |

a. Predictors: (Constant), Maximum temperature (F)

b. Dependent Variable: Log10(Ozone)

Coefficients (a)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B, Lower Bound | 95% CI for B, Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | -.8028 | .1976 | | -4.062 | .000 | -1.1945 | -.4111 |
| Maximum temperature (F) | .0294 | .0025 | .745 | 11.654 | .000 | .0244 | .0344 |

a. Dependent Variable: Log10(Ozone)


The t-statistics and P-values (“Sig.”) reported in the Coefficients table are for testing the hypothesis $H_0: \beta_0 = 0$ and the hypothesis $H_0: \beta_1 = 0$. The former is usually not of interest, but the latter is a test of the equal-means model.

The ANOVA table is precisely analogous to the ANOVA table for comparing several groups. It compares the linear regression model with 2 parameters for the means ($\beta_0$ and $\beta_1$), which is the full model, to the equal-means model $\mu(Y \mid X) = \beta_0$, which is the reduced model.

• Total sum of squares = residual sum of squares for equal-means (reduced) model = $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$.

• Residual sum of squares = residual sum of squares for full model = $\sum_{i=1}^{n} \text{res}_i^2$.

• Mean square residual = $\frac{1}{n-2} \sum_{i=1}^{n} \text{res}_i^2 = \hat{\sigma}^2$

The F-test is a test of the simple linear regression model versus the equal-means model. Since the only difference between the two models is the parameter $\beta_1$, this is a two-sided test of the hypothesis $H_0: \beta_1 = 0$. This is mathematically equivalent to the t-test of this hypothesis that is reported in the regression coefficients table.
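The equivalence of the F-test and the t-test can be seen numerically from the SPSS output for the ozone example. The sketch below recomputes the mean squares and the F-statistic from the tabled sums of squares and compares F with the square of the slope's t-statistic; the variable names are assumptions for this example.

```python
import math

# Values copied from the SPSS ANOVA and Coefficients tables for the ozone example.
ss_regression, df_regression = 8.629, 1
ss_residual, df_residual = 6.926, 109
t_slope = 11.654

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual      # estimate of sigma^2
sigma_hat = math.sqrt(ms_residual)           # "Std. Error of the Estimate"

# F compares the full (regression) model with the reduced (equal-means) model.
f_stat = ms_regression / ms_residual

print(f"mean square residual = {ms_residual:.4f}")
print(f"sigma_hat = {sigma_hat:.4f}")
print(f"F = {f_stat:.2f}, t^2 = {t_slope ** 2:.2f}")
```

The recomputed F differs from the reported 135.813 only in the second decimal place, because the sums of squares in the table are rounded to three decimals.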

The R-squared statistic, or coefficient of determination, gives the percentage of the total variation in the response, y, that is explained by the explanatory variable, x. For our example:

$$R^2 = \frac{\text{total sum of squares} - \text{residual sum of squares}}{\text{total sum of squares}} = \frac{15.555 - 6.926}{15.555} = 0.555$$

The residual sum of squares measures the variation in y about the regression line, so the difference between the total variation and the residual variation represents the reduction in variation achieved by modeling y in terms of x.

For linear regression, $R^2$ is identical to the square of the sample correlation coefficient between the response and the explanatory variable. Hence, this quantity is a valid measure only if the assumptions are met (i.e., the data are a random sample), and it should never be used to evaluate the adequacy of the linear model.
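The R-squared arithmetic for the ozone example can be reproduced from the ANOVA table, along with its agreement with the squared correlation. The adjusted R-squared formula used below is the standard degrees-of-freedom correction; it is included as an assumption to match the "Adjusted R Square" column of the Model Summary, since the notes do not state the formula.

```python
# Values copied from the SPSS ANOVA and Model Summary tables for the ozone example.
ss_total, df_total = 15.555, 110
ss_residual, df_residual = 6.926, 109
r = 0.745  # sample correlation between Log10(Ozone) and temperature

# Proportion of total variation explained by the regression.
r_squared = (ss_total - ss_residual) / ss_total

# Standard adjusted R-squared (assumed formula): compares mean squares
# instead of sums of squares.
adjusted_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)

print(f"R^2 = {r_squared:.3f}")
print(f"r^2 = {r ** 2:.3f}")
print(f"adjusted R^2 = {adjusted_r_squared:.3f}")
```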


Case Study 8.2: Breakdown times for Insulating Fluid

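ANOVA

Dependent Variable: Log(Time)

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 196.477 | 6 | 32.746 | 13.004 | .000 |
| Within Groups | 173.749 | 69 | 2.518 | | |
| Total | 370.226 | 75 | | | |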

ANOVA (b)

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 190.151 | 1 | 190.151 | 78.141 | .000 (a) |
| Residual | 180.075 | 74 | 2.433 | | |
| Total | 370.226 | 75 | | | |

a. Predictors: (Constant), Voltage (kV)

b. Dependent Variable: Log(Time)

Questions:

1) How much is the residual sum of squares lowered by going from the 2-parameter regression model to the 7-parameter ‘separate means model’?

[Table to complete, with rows: Between, Regression, Lack of fit, Within, Total]
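One way to work through the question is to take the difference between the two residual sums of squares in the ANOVA tables for this case study. The sketch below does that arithmetic and also forms the usual extra-sum-of-squares F-ratio for lack of fit; the F-ratio step is an assumption about how the table is meant to be completed, and no p-value is computed here.

```python
# Sums of squares and degrees of freedom copied from the two ANOVA tables.
ss_between, df_between = 196.477, 6        # separate-means model vs equal means
ss_within, df_within = 173.749, 69         # residual for separate-means model
ss_regression, df_regression = 190.151, 1  # regression model vs equal means
ss_residual, df_residual = 180.075, 74     # residual for regression model

# Reduction in residual sum of squares from 2 parameters to 7 parameters.
ss_lack_of_fit = ss_residual - ss_within
df_lack_of_fit = df_residual - df_within

# Extra-sum-of-squares F-ratio: lack-of-fit mean square over within mean square.
ms_lack_of_fit = ss_lack_of_fit / df_lack_of_fit
ms_within = ss_within / df_within
f_lack_of_fit = ms_lack_of_fit / ms_within

print(f"lack-of-fit SS = {ss_lack_of_fit:.3f} on {df_lack_of_fit} df")
print(f"lack-of-fit MS = {ms_lack_of_fit:.4f}, within MS = {ms_within:.3f}")
print(f"F = {f_lack_of_fit:.3f}")
```

The same lack-of-fit sum of squares is obtained as the between-groups sum of squares minus the regression sum of squares, which is a useful check on the table entries.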
