Simple LReg

© All Rights Reserved


Chap 13-1

Chapter Goals

After completing this chapter, you should be able to:

two variables

equation for a set of data

analysis

Chap 13-2

Chapter Goals

(continued)

able to:

regression coefficients

purposes of prediction and description

analysis is used incorrectly

variables

Chap 13-3

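A scatter plot is used to show the relationship between two variables

Correlation analysis is used to measure the strength of the association (linear relationship) between two variables

Correlation analysis is concerned only with the strength of the relationship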

Chap 13-4

[Figure: scatter plots of y against x showing linear relationships and curvilinear relationships]

Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc.

Chap 13-5

(continued)

[Figure: scatter plots of y against x contrasting strong relationships and weak relationships]

Chap 13-6

(continued)

[Figure: scatter plots of y against x showing no relationship]

Chap 13-7

Correlation Coefficient

(continued)

The population correlation coefficient ρ (rho) measures the strength of the association between the variables

The sample correlation coefficient r is an estimate of ρ and is used to measure the strength of the linear relationship in the sample observations

Chap 13-8

Features of ρ and r

Unit free

Range between -1 and 1

The closer to -1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker the linear relationship

Chap 13-9

Examples of Approximate r Values

[Figure: scatter plots of y against x illustrating r = -1, r = -.6, r = 0, r = +.3, and r = +1]

Chap 13-10

Calculating the Correlation Coefficient

Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{[\sum (x - \bar{x})^2][\sum (y - \bar{y})^2]}} $$

or the algebraic equivalent:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} $$

where:

r = Sample correlation coefficient

n = Sample size

x = Value of the independent variable

y = Value of the dependent variable

Chap 13-11

Calculation Example

Tree Height (y)   Trunk Diameter (x)   xy         y2          x2
35                8                    280        1225        64
49                9                    441        2401        81
27                7                    189        729         49
33                6                    198        1089        36
60                13                   780        3600        169
21                7                    147        441         49
45                11                   495        2025        121
51                12                   612        2601        144
Σ = 321           Σ = 73               Σ = 3142   Σ = 14111   Σ = 713

Chap 13-12

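Calculation Example

(continued)

[Figure: scatter plot of Tree Height, y against Trunk Diameter, x]

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}} = \frac{8(3142) - (73)(321)}{\sqrt{[8(713) - (73)^2][8(14111) - (321)^2]}} = 0.886 $$

r = 0.886 indicates a relatively strong positive linear association between x and y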
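As a check on the hand calculation, the short Python sketch below applies the computational formula for r to the tree data. The function name `sample_correlation` is only illustrative, and the trunk diameters are taken to be the values implied by the xy and x2 columns of the example table (xy divided by y).

```python
import math

# Tree data from the calculation example.
# x = trunk diameter, y = tree height (x values as implied by the xy and x2 columns).
x = [8, 9, 7, 6, 13, 7, 11, 12]
y = [35, 49, 27, 33, 60, 21, 45, 51]


def sample_correlation(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    sum_y2 = sum(b * b for b in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


r = sample_correlation(x, y)
print(round(r, 3))  # 0.886
```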

Chap 13-13

Excel Output

Excel Correlation Output: Tools / data analysis / correlation

[Figure: Excel output showing the correlation between Tree Height and Trunk Diameter]

Chap 13-14

Correlation

Hypotheses

H0: ρ = 0 (no correlation)

HA: ρ ≠ 0 (correlation exists)

Test statistic

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

(with n - 2 degrees of freedom)

Chap 13-15

Is there evidence of a linear relationship between tree height and trunk diameter at the .05 level of significance?

H0: ρ = 0 (No correlation)

H1: ρ ≠ 0 (correlation exists)

α = .05, df = 8 - 2 = 6

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$

Chap 13-16

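$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$

d.f. = 8 - 2 = 6

α/2 = .025 in each tail, so the critical values are -tα/2 = -2.4469 and tα/2 = 2.4469

[Figure: t distribution with "Reject H0" regions below -2.4469 and above 2.4469 and a "Do not reject H0" region between them; the test statistic 4.68 falls in the upper rejection region]

Decision: Reject H0

Conclusion: There is evidence of a linear relationship at the 5% level of significance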
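The test above can be reproduced in a few lines of Python. This is only a sketch: the function name `correlation_t_statistic` is hypothetical, and the critical value 2.4469 is copied from the example rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))


r, n = 0.886, 8
t = correlation_t_statistic(r, n)

# Two-tailed critical value quoted in the example (alpha = .05, d.f. = 6).
t_critical = 2.4469
reject_h0 = abs(t) > t_critical

print(round(t, 2), reject_h0)  # 4.68 True
```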

Chap 13-17

Introduction to Regression Analysis

Regression analysis is used to:

Predict the value of a dependent variable based on the value of at least one independent variable

Explain the impact of changes in an independent variable on the dependent variable

Dependent variable: the variable we wish to explain

Independent variable: the variable used to explain the dependent variable

Chap 13-18

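Model

The relationship between x and y is described by a linear function

Changes in y are assumed to be caused by changes in x

Chap 13-19

[Figure: scatter plot panels, including Positive Linear Relationship and No Relationship]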

Chap 13-20

The population regression model:

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

where:

y = Dependent variable

β0 = Population y intercept

β1 = Population slope coefficient

x = Independent variable

ε = Random error term, or residual

β0 + β1x is the linear component; ε is the random error component

Chap 13-21

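Linear Regression Assumptions

Error values are normally distributed for any given value of x

The probability distribution of the errors is normal

The probability distribution of the errors has constant variance

The underlying relationship between the x variable and the y variable is linear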

Chap 13-22

(continued)

[Figure: the population regression line y = β0 + β1x + ε, with Intercept = β0 and Slope = β1; at a given xi, the random error for that x value is the vertical distance between the observed value of y and the predicted value of y]

Chap 13-23

The sample regression line provides an estimate of the population regression line

$$ \hat{y}_i = b_0 + b_1 x $$

where:

ŷi = Estimated (or predicted) y value

b0 = Estimate of the regression intercept

b1 = Estimate of the regression slope

x = Independent variable

Chap 13-24

b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals

$$ \sum e^2 = \sum (y - \hat{y})^2 = \sum (y - (b_0 + b_1 x))^2 $$

Chap 13-25

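$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

algebraic equivalent:

$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} $$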
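The least squares equations translate directly into code. The sketch below is one possible implementation, with the hypothetical name `least_squares`; it is applied to the house price sample that appears later in this chapter, so the coefficients can be compared with the Excel output shown there.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-products and sum of squared deviations of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# House price sample from this chapter: price in $1000s (y), size in square feet (x).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

b0, b1 = least_squares(square_feet, price)
print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```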

Chap 13-26

Interpretation of the Slope and the Intercept

b0 is the estimated average value of y when the value of x is zero

b1 is the estimated change in the average value of y as a result of a one-unit change in x

Chap 13-27

Equation

The coefficients b0 and b1 will usually be found using computer software, such as Excel or Minitab

Other regression measures will also be computed as part of computer-based regression analysis

Chap 13-28

Example

The example examines the relationship between the selling price of a home and its size (measured in square feet)

Dependent variable (y) = house price in $1000s

Independent variable (x) = square feet

Chap 13-29

Model

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Chap 13-30

Chap 13-31

Excel Output

Regression Statistics
Multiple R           0.76211
R Square             0.58082
Adjusted R Square    0.52842
Standard Error       41.33032
Observations         10

ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet   0.10977         0.03297           3.32938    0.01039    0.03374      0.18580

Chap 13-32

Graphical Presentation

[Figure: scatter plot of the house price data with the fitted regression line; Slope = 0.10977, Intercept = 98.248]

Chap 13-33

Interpretation of the Intercept, b0

$$ \widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet}) $$

b0 is the estimated average value of Y when the value of X is zero (if x = 0 is in the range of observed x values)

Here, no houses had 0 square feet, so b0 = 98.24833 just indicates that, for houses within the range of sizes observed, $98,248.33 is the portion of the house price not explained by square feet

Chap 13-34

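Interpretation of the Slope Coefficient, b1

$$ \widehat{\text{house price}} = 98.24833 + 0.10977\,(\text{square feet}) $$

b1 measures the estimated change in the average value of Y as a result of a one-unit change in X

Here, b1 = .10977 tells us that the average value of a house increases by .10977($1000) = $109.77, on average, for each additional one square foot of size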

Chap 13-35

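Properties

The sum of the residuals from the least squares regression line is 0 ( Σ(y - ŷ) = 0 )

The sum of the squared residuals is a minimum ( minimized Σ(y - ŷ)² )

The simple regression line always passes through the mean of the y variable and the mean of the x variable

The least squares coefficients are unbiased estimates of β0 and β1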

Chap 13-36

Variation

Total variation is made up of two parts: SST = SSE + SSR

SST = Total Sum of Squares

$$ SST = \sum (y - \bar{y})^2 $$

SSE = Sum of Squares Error

$$ SSE = \sum (y - \hat{y})^2 $$

SSR = Sum of Squares Regression

$$ SSR = \sum (\hat{y} - \bar{y})^2 $$

where:

ȳ = Mean of the dependent variable

y = Observed values of the dependent variable

ŷ = Estimated value of y for the given x value

Chap 13-37

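Variation

(continued)

SST = total sum of squares: measures the variation of the yi values around their mean ȳ

SSE = error sum of squares: variation attributable to factors other than the relationship between x and y

SSR = regression sum of squares: explained variation attributable to the relationship between x and y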

Chap 13-38

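Variation

(continued)

[Figure: for an observation at Xi, the vertical distances among the observed yi, the fitted ŷi, and the mean ȳ illustrate SST = Σ(yi - ȳ)², SSE = Σ(yi - ŷi)², and SSR = Σ(ŷi - ȳ)²]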

Chap 13-39

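Coefficient of Determination, R2

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable

The coefficient of determination is also called R-squared and is denoted as R2

$$ R^2 = \frac{SSR}{SST} \qquad \text{where } 0 \le R^2 \le 1 $$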

2

Chap 13-40

Coefficient of Determination, R2

(continued)

Coefficient of determination

$$ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}} $$

Note: in the single independent variable case, the coefficient of determination is

$$ R^2 = r^2 $$

where:

R2 = Coefficient of determination

r = Simple correlation coefficient
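The decomposition of variation and the resulting R2 can be checked numerically. The following sketch refits the house price sample from this chapter and computes SST, SSE, and SSR from their definitions; the helper name `fit_line` is an illustrative choice, not part of the original slides.

```python
import math

# House price sample from this chapter: size in square feet (x), price in $1000s (y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


b0, b1 = fit_line(square_feet, price)
y_bar = sum(price) / len(price)
fitted = [b0 + b1 * x for x in square_feet]

sst = sum((y - y_bar) ** 2 for y in price)                 # total variation
sse = sum((y - f) ** 2 for y, f in zip(price, fitted))     # unexplained variation
ssr = sum((f - y_bar) ** 2 for f in fitted)                # explained variation

r_squared = ssr / sst
print(round(sst, 4), round(ssr, 4), round(sse, 4))
print(round(r_squared, 5), round(math.sqrt(r_squared), 5))
```

With a single independent variable and a positive slope, the square root of R2 equals the sample correlation r, which corresponds to the "Multiple R" entry in the Excel output.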


Chap 13-41

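Examples of Approximate R2 Values

R2 = 1

Perfect linear relationship between x and y: 100% of the variation in y is explained by variation in x

[Figure: two scatter plots of y against x in which every point lies on a straight line, each labeled R2 = 1]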

Chap 13-42

Examples of Approximate R2 Values

0 < R2 < 1

Weaker linear relationship between x and y: Some but not all of the variation in y is explained by variation in x

[Figure: scatter plots of y against x with points scattered around a straight line]

Chap 13-43

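Examples of Approximate R2 Values

R2 = 0

No linear relationship between x and y: The value of y does not depend on x. (None of the variation in y is explained by variation in x)

[Figure: scatter plot of y against x with a horizontal fitted line, labeled R2 = 0]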

Chap 13-44

Excel Output

$$ R^2 = \frac{SSR}{SST} = \frac{18934.9348}{32600.5000} = 0.58082 $$

58.08% of the variation in house prices is explained by variation in square feet

In the Excel regression output, R Square = 0.58082 appears under Regression Statistics, and SSR = 18934.9348 and SST = 32600.5000 are the Regression and Total entries of the SS column in the ANOVA table

Chap 13-45

The standard deviation of the variation of observations around the regression line is estimated by

$$ s_\varepsilon = \sqrt{\frac{SSE}{n - k - 1}} $$

Where

SSE = Sum of squares error

n = Sample size

k = number of independent variables in the model

Chap 13-46

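the Regression Slope

The standard error of the regression slope coefficient (b1) is estimated by

$$ s_{b_1} = \frac{s_\varepsilon}{\sqrt{\sum (x - \bar{x})^2}} = \frac{s_\varepsilon}{\sqrt{\sum x^2 - \dfrac{(\sum x)^2}{n}}} $$

where:

sb1 = Estimate of the standard error of the least squares slope

$$ s_\varepsilon = \sqrt{\frac{SSE}{n - 2}} $$ = Sample standard error of the estimate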
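Both standard errors follow from SSE and the spread of the x values. The sketch below computes them for the house price sample from this chapter (k = 1, so the divisor is n - 2); the variable names are illustrative only.

```python
import math

# House price sample from this chapter: size in square feet (x), price in $1000s (y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n

s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(square_feet, price))

s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b1 = s_e / math.sqrt(s_xx)     # standard error of the slope

print(round(s_e, 5), round(s_b1, 5))  # 41.33032 0.03297
```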

Chap 13-47

Excel Output

sε = 41.33032 (the Standard Error entry under Regression Statistics)

sb1 = 0.03297 (the Standard Error entry in the Square Feet row of the coefficients table)

Chap 13-48

[Figure: variation of observed y values from the regression line (small s versus large s), and variation in the slope of regression lines from different possible samples (small sb1 versus large sb1)]

Chap 13-49

t Test

H0: β1 = 0 (no linear relationship)

H1: β1 ≠ 0 (linear relationship does exist)

Test statistic

$$ t = \frac{b_1 - \beta_1}{s_{b_1}}, \qquad \text{d.f.} = n - 2 $$

where:

b1 = Sample regression slope coefficient

β1 = Hypothesized slope

sb1 = Estimator of the standard error of the slope

Chap 13-50

t Test

(continued)

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Estimated regression equation:

house price = 98.25 + 0.1098 (sq.ft.)

Does square footage of the house affect its sales price?

Chap 13-51

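t Test Example

H0: β1 = 0

HA: β1 ≠ 0

From the Excel output:

              Coefficients (b1)   Standard Error (sb1)   t Stat     P-value
Intercept     98.24833            58.03348               1.69296    0.12892
Square Feet   0.10977             0.03297                3.32938    0.01039

Test Statistic: t = 3.329

d.f. = 10 - 2 = 8

α/2 = .025 in each tail, so the critical values are -tα/2 = -2.3060 and tα/2 = 2.3060

[Figure: t distribution with "Reject H0" regions below -2.3060 and above 2.3060 and a "Do not reject H0" region between them; the test statistic 3.329 falls in the upper rejection region]

Decision: Reject H0

Conclusion: There is sufficient evidence that square footage affects house price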
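A minimal sketch of the slope test, using the coefficient and standard error reported in the Excel output. The critical value 2.3060 is copied from the example (α = .05, d.f. = 8) rather than computed, and the variable names are illustrative.

```python
# Values reported in the Excel output for the Square Feet coefficient.
b1 = 0.10977            # sample regression slope
s_b1 = 0.03297          # standard error of the slope
hypothesized_slope = 0  # H0: beta1 = 0

t = (b1 - hypothesized_slope) / s_b1

# Two-tailed critical value quoted in the example (alpha = .05, d.f. = 10 - 2 = 8).
t_critical = 2.3060
reject_h0 = abs(t) > t_critical

print(round(t, 3), reject_h0)  # 3.329 True
```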

Chap 13-52

Description

Confidence Interval Estimate of the Slope:

$$ b_1 \pm t_{\alpha/2}\, s_{b_1}, \qquad \text{d.f.} = n - 2 $$

              Coefficients   Standard Error   t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833       58.03348         1.69296    0.12892    -35.57720    232.07386
Square Feet   0.10977        0.03297          3.32938    0.01039    0.03374      0.18580

At the 95% level of confidence, the confidence interval for the slope is (0.0337, 0.1858)

Chap 13-53

Description

Since the units of the house price variable are $1000s, we are 95% confident that the average impact on sales price is between $33.70 and $185.80 per square foot of house size

This 95% confidence interval does not include 0.

Conclusion: There is a significant relationship between house price and square feet at the .05 level of significance
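The interval for the slope can be reproduced from the reported slope, its standard error, and the critical t value. This is a sketch under the assumption that t = 2.3060 (α = .05, d.f. = 8), as used in the slope test earlier in the chapter.

```python
b1 = 0.10977         # slope from the Excel output
s_b1 = 0.03297       # standard error of the slope
t_critical = 2.3060  # two-tailed critical value, alpha = .05, d.f. = 8 (assumed from the t test)

margin = t_critical * s_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 4), round(upper, 4))  # 0.0337 0.1858

# House price is measured in $1000s, so convert to dollars per square foot.
lower_dollars, upper_dollars = lower * 1000, upper * 1000
print(round(lower_dollars, 2), round(upper_dollars, 2))

# The relationship is significant at the .05 level when the interval excludes 0.
excludes_zero = lower > 0 or upper < 0
```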


Chap 13-54

the Average y, Given x

Confidence interval estimate for the mean of y given a particular xp

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$

Size of interval varies according to distance away from mean, x̄

Chap 13-55

an Individual y, Given x

Confidence interval estimate for an individual value of y given a particular xp

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} $$

The extra term (the leading 1 under the square root) adds to the interval width to reflect the added uncertainty for an individual case

Chap 13-56

Interval Estimates for Different Values of x

[Figure: the fitted line ŷ = b0 + b1x with two bands around it; the narrower band is the Confidence Interval for the mean of y, given xp, and the wider band is the Prediction Interval for an individual y, given xp]

Chap 13-57

House Price in $1000s (y)   Square Feet (x)
245                         1400
312                         1600
279                         1700
308                         1875
199                         1100
219                         1550
405                         2350
324                         2450
319                         1425
255                         1700

Estimated regression equation:

house price = 98.25 + 0.1098 (sq.ft.)

Predict the price for a house with 2000 square feet

Chap 13-58

(continued)

Predict the price for a house with 2000 square feet:

house price = 98.25 + 0.1098 (sq.ft.) = 98.25 + 0.1098(2000) = 317.85

The predicted price for a house with 2000 square feet is 317.85($1,000s) = $317,850

Chap 13-59

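Example

Confidence Interval Estimate for E(y)|xp

Find the 95% confidence interval for the average price of 2,000 square-foot houses

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 37.12 $$

The confidence interval endpoints are 280.66 -- 354.90, or from $280,660 -- $354,900 (the endpoints are computed from the unrounded prediction, 317.78, rather than from 317.85)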

Chap 13-60

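Estimation of Individual Values: Example

Prediction Interval Estimate for y|xp

Find the 95% confidence interval for an individual house with 2,000 square feet

$$ \hat{y} \pm t_{\alpha/2}\, s_\varepsilon \sqrt{1 + \frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 317.85 \pm 102.28 $$

The prediction interval endpoints are 215.50 -- 420.07, or from $215,500 -- $420,070 (the endpoints are computed from the unrounded prediction, 317.78, rather than from 317.85)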
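Both interval half-widths can be recomputed from the raw sample. The sketch below is one way to do it; the function name `interval_estimates` is hypothetical, and the critical value 2.3060 (α = .05, d.f. = 8) is taken from the slope test earlier in the chapter rather than computed.

```python
import math

# House price sample from this chapter: size in square feet (x), price in $1000s (y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]


def interval_estimates(xs, ys, x_p, t_critical):
    """Return (y_hat, CI half-width for the mean, PI half-width for an individual)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # standard error of the estimate

    # Term shared by both intervals: grows as x_p moves away from x_bar.
    distance_term = 1 / n + (x_p - x_bar) ** 2 / s_xx

    y_hat = b0 + b1 * x_p
    ci_half = t_critical * s_e * math.sqrt(distance_term)
    pi_half = t_critical * s_e * math.sqrt(1 + distance_term)
    return y_hat, ci_half, pi_half


y_hat, ci_half, pi_half = interval_estimates(square_feet, price, 2000, 2.3060)
print(round(y_hat, 2), round(ci_half, 2), round(pi_half, 2))
```

Using unrounded coefficients, the point prediction comes out near 317.78 rather than the 317.85 obtained from the rounded equation, which is why the interval endpoints quoted in the examples are centered on 317.78.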


Chap 13-61

Intervals PHStat

In Excel, use PHStat | regression | simple linear regression

Check the "confidence and prediction interval for X=" box and enter the x-value and confidence level desired

Chap 13-62

Prediction Intervals PHStat

(continued)

[Figure: PHStat output showing the input values, the Confidence Interval Estimate for E(y)|xp, and the Prediction Interval Estimate for y|xp]

Chap 13-63

Residual Analysis

Purposes

Examine for linearity assumption

Examine for constant variance for all levels of x

Evaluate normal distribution assumption

Can plot residuals vs. x

Can create histogram of residuals to check for normality

Chap 13-64

Linearity

[Figure: plots of y against x and of residuals against x, contrasting a Not Linear pattern with a Linear pattern]

Chap 13-65

Constant Variance

[Figure: plots of y against x and of residuals against x, contrasting Non-constant variance with Constant variance]

Chap 13-66

Excel Output

RESIDUAL OUTPUT

Observation   Predicted House Price   Residuals
1             251.92316               -6.923162
2             273.87671               38.12329
3             284.85348               -5.853484
4             304.06284               3.937162
5             218.99284               -19.99284
6             268.38832               -49.38832
7             356.20251               48.79749
8             367.17929               -43.17929
9             254.6674                64.33264
10            284.85348               -29.85348
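The residual output can be regenerated from the fitted line, which is a useful starting point for the residual plots described above. This is a sketch using the house price sample from this chapter; the variable names are illustrative.

```python
# House price sample from this chapter: size in square feet (x), price in $1000s (y).
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(square_feet)
x_bar = sum(square_feet) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in square_feet)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(square_feet, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Predicted value and residual (observed minus predicted) for each observation.
predicted = [b0 + b1 * x for x in square_feet]
residuals = [y - p for y, p in zip(price, predicted)]

for i, (p, e) in enumerate(zip(predicted, residuals), start=1):
    print(f"{i:>2}  {p:10.5f}  {e:10.5f}")

# Least squares residuals sum to zero, up to floating-point rounding.
residual_sum = sum(residuals)
```

Plotting `residuals` against `square_feet` gives the residuals vs. x plot used to examine the linearity and constant variance assumptions.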

Chap 13-67

Chapter Summary

Discussed correlation to measure the strength of a linear association

Introduced simple linear regression analysis

Calculated the coefficients for the simple linear regression equation

Described measures of variation (R2 and s)

Addressed assumptions of regression and correlation

Chap 13-68

Chapter Summary

(continued)

Addressed estimation of mean values and prediction of individual values

Discussed residual analysis

Chap 13-69
