Attribution Non-Commercial (BY-NC)

Chapter 7 The Simple Linear Regression Model

A common model for the relationship between two quantitative variables is the linear regression model. Don’t be fooled by the “linear” part: as we’ll see, linear regression models can often be used to model relationships which aren’t linear.

Although we looked at the linear regression model last semester, we only looked at one part of it: the part that models the mean response Y as a linear function of X. We’ll extend the model to describe the scatter of the individual data points around the line. The way we extend it makes the linear regression model exactly like the ANOVA model, except that the explanatory variable is quantitative instead of categorical.

We assume that at each X, the distribution of Y values is normal with mean $\beta_0 + \beta_1 X$ and standard deviation $\sigma$:

$$\mu(Y \mid X) = \beta_0 + \beta_1 X$$

$$\sigma(Y \mid X) = \sigma$$

Least squares estimates of $\beta_0$ and $\beta_1$ are denoted by $\hat\beta_0$ and $\hat\beta_1$. The predicted or fitted value of Y for a particular X is:

$$\hat\mu(Y \mid X) = \hat\beta_0 + \hat\beta_1 X.$$

By modeling the distribution of data points around the line, we can make inferences from the sample data about the regression parameters.

Case Study 7.2: Meat Processing and pH

ANOVA (Dependent Variable: pH; Predictors: (Constant), Log(hours))

| Model 1 | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Regression | 3.00647 | 1 | 3.00647 | 444.306 | .000 |
| Residual | .05413 | 8 | .00677 | | |
| Total | 3.06060 | 9 | | | |

Coefficients (Dependent Variable: pH)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 6.9836 | .0485 | | 143.897 | .000 |
| Log(hours) | -.7257 | .0344 | -.991 | -21.079 | .000 |

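| Hours | pH | Log(hours) | Fitted value | Residual |
|---|---|---|---|---|
| 1 | 7.02 | 0 | 6.9836 | 0.0364 |
| 1 | 6.93 | 0 | 6.9836 | -0.0536 |
| 2 | 6.42 | 0.69 | 6.4806 | -0.0606 |
| 2 | 6.51 | 0.69 | 6.4806 | 0.0294 |
| 4 | 6.07 | 1.39 | 5.9777 | 0.0923 |
| 4 | 5.99 | 1.39 | 5.9777 | 0.0123 |
| 6 | 5.59 | 1.79 | 5.6834 | -0.0934 |
| 6 | 5.80 | 1.79 | 5.6834 | 0.1166 |
| 8 | 5.51 | 2.08 | 5.4747 | 0.0353 |
| 8 | 5.36 | 2.08 | 5.4747 | -0.1147 |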

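$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$$

Estimate of $\sigma$ is

$$\hat\sigma = \sqrt{\frac{\text{sum of squared residuals}}{\text{degrees of freedom}}} = \sqrt{\frac{\sum_{i=1}^{n} \text{res}_i^2}{n-2}}.$$

Degrees of freedom = n - #parameters in the model for the means = n - 2 for simple linear regression.

The ANOVA table gives the sum of squared residuals and the mean square residual, which is $\hat\sigma^2 = 0.00677$, so $\hat\sigma = 0.0823$.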
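As a check on the formulas above, here is a minimal Python sketch that recomputes the least squares estimates and $\hat\sigma$ from the ten (hours, pH) pairs listed in the case study. It uses only the standard library; the variable names are illustrative and not part of the SPSS output.

```python
import math

# Steer carcass data from the case study: hours since slaughter and pH.
hours = [1, 1, 2, 2, 4, 4, 6, 6, 8, 8]
ph = [7.02, 6.93, 6.42, 6.51, 6.07, 5.99, 5.59, 5.80, 5.51, 5.36]

x = [math.log(h) for h in hours]  # natural log, as in the case study
n = len(x)
x_bar = sum(x) / n
y_bar = sum(ph) / n

# Least squares slope and intercept.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, ph))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals, their sum of squares, and sigma-hat on n - 2 degrees of freedom.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, ph)]
sse = sum(r ** 2 for r in residuals)
sigma_hat = math.sqrt(sse / (n - 2))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSE = {sse:.5f}, sigma_hat = {sigma_hat:.4f}")
```

The printed values should agree with the Coefficients and ANOVA tables up to rounding.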

The standard errors of $\hat\beta_0$ and $\hat\beta_1$ represent the estimated standard deviations of the sampling distributions of $\hat\beta_0$ and $\hat\beta_1$. The sampling distributions refer to how the least squares estimates would vary from sample to sample. We view the $X_i$’s as fixed: they remain the same from sample to sample, while the $Y_i$’s are random.

$$SE(\hat\beta_1) = \hat\sigma \sqrt{\frac{1}{(n-1)s_X^2}}, \qquad SE(\hat\beta_0) = \hat\sigma \sqrt{\frac{1}{n} + \frac{\bar X^2}{(n-1)s_X^2}}$$

Here $s_X^2$ is the sample variance of the X values.
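The two standard error formulas can be evaluated directly from summary statistics. The sketch below plugs in values reported elsewhere in these notes for the steer data ($\hat\sigma = 0.0823$, $n = 10$, $\bar X = 1.19013$, $s_X^2 = .63438$); the variable names are illustrative.

```python
import math

# Summary values reported in the notes for the steer data (X = natural log of hours).
sigma_hat = 0.0823
n = 10
x_bar = 1.19013
s2_x = 0.63438

sxx = (n - 1) * s2_x  # sum of squared deviations of X from its mean

se_b1 = sigma_hat * math.sqrt(1 / sxx)
se_b0 = sigma_hat * math.sqrt(1 / n + x_bar ** 2 / sxx)

print(f"SE(b1) = {se_b1:.4f}")
print(f"SE(b0) = {se_b0:.4f}")
```

Both results should match the Std. Error column of the SPSS Coefficients table to the precision shown there.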

Example: Steer carcass data

Mean pH is estimated to decrease by .7257 for every one unit increase in Log(Hours). A one unit increase in Log(Hours) is an increase in Hours by a factor of e ≈ 2.72. If we had used Log10(Hours) instead, the interpretation would be easier: the slope would represent the change in predicted pH for every 10-fold increase in time since slaughter.

A 95% confidence interval for $\beta_1$ is $-.7257 \pm t_8(.975)(.0344) = -.7257 \pm 2.306(.0344) = -.7257 \pm .0793$, or $-.805$ to $-.646$. So we are 95% confident that the decrease in mean pH is between .646 and .805 for every 2.72-fold increase in time since slaughter.
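The interval arithmetic above can be sketched in a few lines of Python. The helper `t_interval` is a hypothetical name, and the multiplier 2.306 is taken from the text rather than computed. The last line shows how the slope would rescale under Log10(Hours), using the identity ln(h) = ln(10) · log10(h); that rescaled value is an illustration and does not appear in the SPSS output.

```python
import math

def t_interval(estimate, std_error, t_mult):
    """Return (lower, upper) for estimate +/- t_mult * std_error."""
    half_width = t_mult * std_error
    return estimate - half_width, estimate + half_width

b1, se_b1 = -0.7257, 0.0344   # slope and its standard error from the SPSS output
t_mult = 2.306                # t_8(.975), as quoted in the text

lower, upper = t_interval(b1, se_b1, t_mult)
print(f"95% CI for the slope: {lower:.3f} to {upper:.3f}")

# Slope per 10-fold increase in hours, i.e. the slope if Log10(Hours) were used.
b1_log10 = b1 * math.log(10)
print(f"Slope on the log10 scale: {b1_log10:.3f}")
```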

The confidence interval can also be obtained from SPSS by choosing Options in the Analyze…Regression…Linear window.

Coefficients (Dependent Variable: pH)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. | 95% CI for B: Lower Bound | 95% CI for B: Upper Bound |
|---|---|---|---|---|---|---|---|
| (Constant) | 6.984 | .049 | | 143.897 | .000 | 6.872 | 7.096 |
| Log(hours) | -.726 | .034 | -.991 | -21.079 | .000 | -.805 | -.646 |

The intercept $\beta_0$ represents the mean value of Y when X = 0. Usually, this is not particularly meaningful. It is usually more useful to estimate the mean value of Y at values of X that are of interest, which is covered next.

Inferences about the slope of the regression line tell us about how big the change is in the mean response (Y) for a 1-unit increase in X. Sometimes, we are interested in a confidence interval for the mean response at a particular X, say $X_0$. According to the model, the true mean of Y at $X_0$ is $\mu(Y \mid X_0) = \beta_0 + \beta_1 X_0$. The estimate of this is $\hat\mu(Y \mid X_0) = \hat\beta_0 + \hat\beta_1 X_0$. The standard error of $\hat\mu(Y \mid X_0)$ is

$$SE[\hat\mu(Y \mid X_0)] = \hat\sigma \sqrt{\frac{1}{n} + \frac{(X_0 - \bar X)^2}{(n-1)s_X^2}}$$

Note that the standard error is bigger for values of $X_0$ further from $\bar X$ and is smallest at $X_0 = \bar X$.
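To see how this standard error behaves, the following sketch evaluates it on a small grid of $X_0$ values using the steer summary statistics quoted in these notes. The function name `se_mean_response` and the grid are assumptions made for illustration.

```python
import math

def se_mean_response(x0, sigma_hat, n, x_bar, s2_x):
    """Standard error of the estimated mean response at x0."""
    return sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ((n - 1) * s2_x))

# Summary values reported in the notes for the steer data (X = natural log of hours).
sigma_hat, n, x_bar, s2_x = 0.0823, 10, 1.19013, 0.63438

grid = [0.0, 0.5, x_bar, 1.5, 2.0]  # illustrative X0 values, including the mean of X
se_values = [se_mean_response(x0, sigma_hat, n, x_bar, s2_x) for x0 in grid]
for x0, se in zip(grid, se_values):
    print(f"x0 = {x0:.3f}  SE = {se:.4f}")

# At x0 = x_bar the second term vanishes, leaving sigma_hat / sqrt(n).
se_at_mean = se_mean_response(x_bar, sigma_hat, n, x_bar, s2_x)
```

The smallest value in the printout occurs at the mean of X, where the formula reduces to $\hat\sigma / \sqrt{n}$.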

Steer data: What is the estimated mean pH for carcasses 3 hours old? Give a 95% confidence interval for the mean pH after 3 hours.

First, remember that the X variable in the regression model is log(Hours), so $X_0 = \log(3) = 1.0986$ (natural logarithm). Therefore, $\hat\mu(Y \mid X_0 = 1.0986) = 6.9836 - .7257(1.0986) = 6.186$.

To calculate the standard error, we need to compute $\bar X$, the mean of the log(Hours) for the 10 data points, and $s_X^2$, the sample variance of log(Hours). From SPSS,

Descriptive Statistics

| | N | Mean | Std. Deviation | Variance |
|---|---|---|---|---|
| LogTime | 10 | 1.19013 | .796480 | .63438 |
| Valid N (listwise) | 10 | | | |

Therefore, with $(n-1)s_X^2 = 9(.63438) = 5.709$,

$$SE[\hat\mu(Y \mid X_0 = 1.0986)] = 0.0823 \sqrt{\frac{1}{10} + \frac{(1.0986 - 1.1901)^2}{5.709}} = 0.0262$$

and a 95% confidence interval for the mean pH among all steers after 3 hours is $6.186 \pm t_8(.975)(0.0262) = 6.186 \pm 2.306(0.0262) = 6.186 \pm 0.0604$, or 6.13 to 6.25.

If we want simultaneous confidence intervals at several different values of X, we can use Bonferroni if the number of values is small. We can compute simultaneous confidence intervals at every possible value of X using a Scheffe procedure. The result is a set of confidence bands for the regression line. We are 95% confident (or whatever the chosen confidence level is) that the regression line lies entirely within the bands. Thus, we are 95% confident that the true means at all possible values of X are all within the confidence band limits. The formula for the simultaneous confidence bands is

$$\hat\beta_0 + \hat\beta_1 X \pm \sqrt{2 F_{2,n-2}(1-\alpha)} \; SE[\hat\mu(Y \mid X)]$$

This is referred to as the Working-Hotelling procedure. In practice, you compute these limits at a large number of X values, then join the limits to make a smooth curve on the scatterplot. Some programs will do this automatically, but SPSS will not. It will, however, plot the individual confidence intervals for all X’s using the t coefficient rather than the Scheffe coefficient.

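Steer data: for simultaneous 95% confidence intervals, $F_{2,n-2}(1-\alpha) = F_{2,8}(.95) = 4.46$. Using the estimate and standard error computed above, the confidence interval for the mean pH after 3 hours is therefore $6.186 \pm \sqrt{2(4.46)}\,(0.0262) = 6.186 \pm 2.987(0.0262) = 6.186 \pm 0.0783$, or 6.11 to 6.26.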
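A short sketch comparing the individual (t-based) interval with the Working-Hotelling interval at 3 hours may help. All inputs are values quoted in these notes, including the multipliers $t_8(.975) = 2.306$ and $F_{2,8}(.95) = 4.46$, which are hard-coded rather than computed; the function name `band` is hypothetical.

```python
import math

# Values quoted in the notes for the steer data (X = natural log of hours).
b0, b1 = 6.9836, -0.7257
sigma_hat, n, x_bar, s2_x = 0.0823, 10, 1.19013, 0.63438
t_mult = 2.306       # t_8(.975)
f_quantile = 4.46    # F_{2,8}(.95)

# Working-Hotelling multiplier replaces the t multiplier.
wh_mult = math.sqrt(2 * f_quantile)

def band(x0, mult):
    """Return (lower, upper) limits for the mean response at x0."""
    fit = b0 + b1 * x0
    se = sigma_hat * math.sqrt(1 / n + (x0 - x_bar) ** 2 / ((n - 1) * s2_x))
    return fit - mult * se, fit + mult * se

x0 = math.log(3)
pointwise = band(x0, t_mult)
simultaneous = band(x0, wh_mult)

print(f"Working-Hotelling multiplier: {wh_mult:.3f}")
print(f"Individual 95% CI:   {pointwise[0]:.3f} to {pointwise[1]:.3f}")
print(f"Simultaneous 95% CI: {simultaneous[0]:.3f} to {simultaneous[1]:.3f}")
```

Evaluating `band` over a fine grid of X values and joining the limits would trace out the confidence bands described above.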

The confidence interval above is for the mean pH for all steers 3 hours after slaughter. A 95% prediction interval for the pH of an individual steer 3 hours after slaughter is an interval in which you are 95% confident the pH of a particular steer will lie 3 hours after slaughter. A confidence interval is for a mean; a prediction interval is for an individual.

$$\text{Pred}(Y \mid X_0) = \hat\mu(Y \mid X_0) = \hat\beta_0 + \hat\beta_1 X_0$$

$$SE[\text{Pred}(Y \mid X_0)] = \sqrt{\hat\sigma^2 + SE[\hat\mu(Y \mid X_0)]^2} = \hat\sigma \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar X)^2}{(n-1)s_X^2}}$$

The standard error of prediction has two parts: the uncertainty due to estimating the mean response at $X_0$ and the uncertainty due to the fact that individual observations vary around that mean with standard deviation $\sigma$. Note that while the standard error of the mean response at $X_0$ goes to 0 as n increases, the standard error of prediction never goes to 0. An individual 100(1-α)% prediction interval for the response of an individual at $X_0$ is

$$\text{Pred}(Y \mid X_0) \pm t_{n-2}(1 - \alpha/2) \; SE[\text{Pred}(Y \mid X_0)]$$

For the steer data, a 95% prediction interval for the pH of a particular steer 3 hours after slaughter is:

$$6.186 \pm 2.306\,(.0823) \sqrt{1 + \frac{1}{10} + \frac{(1.0986 - 1.1901)^2}{5.709}} = 6.186 \pm 2.306(.08637) = 6.186 \pm .1992,$$

or 5.99 to 6.39.
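The prediction interval calculation can be sketched as follows, again using the values quoted in these notes and the multiplier 2.306 from the text. The function name `se_prediction` is hypothetical, and the final line is an illustration only: it pretends the sample size were one million while holding $\hat\sigma$, $\bar X$ and $s_X^2$ fixed, to show that the standard error of prediction levels off near $\hat\sigma$ instead of shrinking to 0.

```python
import math

# Values quoted in the notes for the steer data (X = natural log of hours).
b0, b1 = 6.9836, -0.7257
sigma_hat, x_bar, s2_x = 0.0823, 1.19013, 0.63438
n = 10
t_mult = 2.306  # t_8(.975)

def se_prediction(x0, n_obs):
    """Standard error of prediction: sigma-hat combined with the SE of the mean."""
    se_mean_sq = sigma_hat ** 2 * (1 / n_obs + (x0 - x_bar) ** 2 / ((n_obs - 1) * s2_x))
    return math.sqrt(sigma_hat ** 2 + se_mean_sq)

x0 = math.log(3)
pred = b0 + b1 * x0
se_pred = se_prediction(x0, n)
lower, upper = pred - t_mult * se_pred, pred + t_mult * se_pred

print(f"Predicted pH at 3 hours: {pred:.3f}")
print(f"SE of prediction: {se_pred:.5f}")
print(f"95% prediction interval: {lower:.2f} to {upper:.2f}")

# Hypothetical large sample, other quantities held fixed: SE stays near sigma_hat.
large_n_se = se_prediction(x0, 10 ** 6)
```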

Simultaneous prediction intervals can be computed for several different X values using Bonferroni, but there is no analog to the Working-Hotelling Scheffe-based procedure for simultaneous prediction intervals at all possible values of X.

SPSS commands

Analyze…Regression…Linear

Under the Statistics button, you can choose to get confidence intervals for $\beta_0$ and $\beta_1$.

Under the Save button, you can choose to save:

• Unstandardized Predicted Values
• Unstandardized Residuals
• Prediction Intervals: Mean: this isn’t a prediction interval; it’s an individual confidence interval for the mean response at each X. SPSS does not compute the Working-Hotelling simultaneous confidence intervals.
• Prediction Intervals: Individual: this is a prediction interval for an individual response at each X.

To obtain predicted values, confidence intervals and prediction intervals for a value of X not in the data set, add a case to the data with the desired X value, but leave the value of Y blank (it should display a period, which indicates a missing value).

SPSS can plot the individual confidence intervals for mean response and the prediction intervals for an individual response. Create a scatterplot and double-click the plot to get into Chart Editor. Select one of the data points and click on the “Add fit line” icon. Under the “Fit line” tab you can select “Mean” or “Individual” confidence intervals. The first gives individual (not simultaneous) confidence intervals for the mean response at each X and the second gives prediction intervals.

95% individual confidence intervals for the mean, 95% Working-Hotelling simultaneous confidence bands for the mean, and 95% individual prediction intervals for a single response (this graph is from S-Plus; SPSS will only do the first and last of the three).

[Figure: scatterplot of y against x with the fitted line and the three sets of 0.95 bands.]
