Least Squares Method

The objective of the scatter diagram is to measure the strength and direction of the linear relationship.

Both can be more easily judged by drawing a straight line through the data.

Which line best describes the relationship between X and Y?

Least Squares Method

We need an objective method of producing a straight line.

The best line will be the one that is “closest” to the points on the scatterplot. In other words, the best line is the one that minimizes the total distance between itself and all the observed data points.

Since we often use regression to predict values of Y from observed values of X, we choose to measure the distance vertically.

Least Squares Method

We want to find the line that minimizes the vertical distance between itself and the observed points on the scatterplot.

Suppose we have two different lines that may describe the relationship between X and Y. To determine which one is better, we can find the vertical distances from each point to each line.

On that basis, the line on the right is better than the line on the left at describing the relationship between X and Y. But there is an infinite number of possible lines, so we cannot compare them all by eye.

Least Squares Method

Recall that the slope-intercept equation for a line is expressed in these terms:

y = mx + b

where m is the slope of the line and b is the y-intercept.

If we have determined, using the covariance and the coefficient of correlation, that there is a linear relationship between two variables, can we determine a linear function that describes the relationship?

Least Squares Method

Just to make things more difficult for students, we typically rewrite this line as:

$\hat{y} = b_0 + b_1 x$

($\hat{y}$ is read as “y-hat”; this is the fitted regression line.)

where the slope is $b_1 = \frac{s_{xy}}{s_x^2}$

and the intercept is $b_0 = \bar{y} - b_1 \bar{x}$ ($b_0$ is read as “b-naught”).

“Best” line: least-squares, or regression line

We define the error to be the difference between an observed point and the prediction line, measured vertically.

The coordinates of one point: $(x_i, y_i)$

Predicted value for a given $x_i$: $\hat{y}_i = b_0 + b_1 x_i$

Error = distance from one point to the line = observed value minus predicted value = $y_i - \hat{y}_i$

Some of the errors will be positive and some will be negative! The problem is that when we add positive and negative values, they tend to cancel each other out. So instead we square the errors: the “best” line minimizes $\sum (y_i - \hat{y}_i)^2$, the sum of the squared errors.

When we square those error distances, we are literally making squares from those line segments, and we can visualize the fit as the total area of those squares. So we want to find the regression line that minimizes the sum of the areas of these error squares.

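The slides state the formulas for $b_1$ and $b_0$ without proof. As a bridging note (not from the original deck), here is a sketch of where they come from. Write the sum of squared errors as a function of the candidate intercept and slope:

$\text{SSE}(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$

Setting both partial derivatives to zero gives the normal equations:

$\frac{\partial \text{SSE}}{\partial b_0} = -2 \sum_i (y_i - b_0 - b_1 x_i) = 0, \qquad \frac{\partial \text{SSE}}{\partial b_1} = -2 \sum_i x_i (y_i - b_0 - b_1 x_i) = 0$

Solving these two equations simultaneously yields

$b_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x},$

which are exactly the slope and intercept formulas used throughout these slides.
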
Least Squares Method

Let's determine the best-fitting line for the following data, using $b_1 = \frac{s_{xy}}{s_x^2}$ and $b_0 = \bar{y} - b_1 \bar{x}$.

Least Squares Method

[Worked-example slides: the slope and intercept are computed step by step from the data, applying $b_1 = \frac{s_{xy}}{s_x^2}$, $b_0 = \bar{y} - b_1 \bar{x}$, and $s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$.]

Lines of best fit will pivot around the point which represents the mean of the X variable and the mean of the Y variable!

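The data table for this worked example did not survive extraction, so the numbers below are made up purely for illustration. This is a minimal Python sketch of the same step-by-step computation the slides walk through:

# Minimal sketch of the slides' computation on made-up data
# (the deck's original data table was lost in extraction).
x = [1, 2, 3, 4, 5]             # hypothetical X values
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical Y values
n = len(x)

x_bar = sum(x) / n              # mean of X
y_bar = sum(y) / n              # mean of Y

# Sample covariance and sample variance of X (note the n - 1 divisor)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_xx = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

b1 = s_xy / s_xx         # slope
b0 = y_bar - b1 * x_bar  # intercept

print(f"y-hat = {b0:.3f} + {b1:.3f} x")

Note that the (n − 1) divisors cancel in the ratio, so the same slope results if raw sums of squares are used instead.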

Line of Best Fit

A line of best fit is meaningful only for medium to strong correlations...


What line?

r measures the “closeness” of the data to the “best” line. Best in what sense? In the sense of least squared error.

Interpretation of b0 and b1: $\hat{y}_i = 9.95 + 2.25 x_i$

In a fixed and variable costs model:

$\hat{y}_i = 9.95 + 2.25 x_i$

$b_0 = 9.95$? Intercept: the predicted value of y when x = 0.

$b_1 = 2.25$? Slope: the predicted change in y when x increases by 1.

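A quick numerical check makes both interpretations concrete (the y_hat helper below is hypothetical, written here for illustration): evaluating the slide's own fitted line at x = 0 returns the intercept, and raising x by 1 raises the prediction by exactly the slope.

# The slide's fitted line: y-hat = 9.95 + 2.25x
def y_hat(x: float) -> float:
    return 9.95 + 2.25 * x

print(y_hat(0))               # 9.95 -> b0, the predicted y when x = 0
print(y_hat(11) - y_hat(10))  # 2.25 -> b1, the change in y-hat per unit of x
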

Interpretation of b0 and b1: $\hat{y}_i = 9.95 + 2.25 x_i$

A simple example of a linear equation:

A company has fixed costs of $7,000 for plant and equipment and variable costs of $600 for each unit of output.

What is total cost at varying levels of output? (See the sketch below.)

Let x = units of output
Let C = total cost

C = fixed cost plus variable cost = 7,000 + 600x

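To answer the question above, here is a small Python sketch of the slide's cost equation; the output levels are chosen arbitrarily for illustration.

# Total cost model from the slide: C = 7,000 + 600x
FIXED_COST = 7_000   # plant and equipment
UNIT_COST = 600      # variable cost per unit of output

def total_cost(units: int) -> int:
    return FIXED_COST + UNIT_COST * units

# A few arbitrary output levels
for units in (0, 10, 50, 100):
    print(f"{units:>3} units -> total cost ${total_cost(units):,}")
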

Interpretation of b0 and b1: $\hat{y}_i = 9.95 + 2.25 x_i$

$b_1$, the slope, always has the same sign as r, the correlation coefficient, but they measure different things!

The sum of the errors (or residuals), $\sum (y_i - \hat{y}_i)$, is always 0 (zero).

The line always passes through the point $(\bar{x}, \bar{y})$.

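Both properties are easy to verify numerically; the sketch below reuses the made-up data from the earlier example (again, not the deck's own data).

# Check: residuals sum to ~0, and the line passes through (x-bar, y-bar).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(sum(residuals))           # ~0 up to floating-point rounding
print(b0 + b1 * x_bar - y_bar)  # ~0: the line passes through (x-bar, y-bar)
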

Coefficient of Determination

When we introduced the coefficient of correlation, we pointed out that except for −1, 0, and +1 we cannot precisely interpret its meaning.

We can judge the coefficient of correlation only in relation to its proximity to −1, 0, and +1.

Fortunately, we have another measure that can be precisely interpreted. It is the coefficient of determination, which is calculated by squaring the coefficient of correlation. For this reason we denote it R².

Coefficient of Determination

The coefficient of determination measures the proportion of the variation in the dependent variable that is explained by the variation in the independent variable.

In this example, the coefficient of determination is R² = 0.758.

This tells us that 75.8% of the variation in electrical costs is explained by the number of tools. The remaining 24.2% is unexplained.

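The 0.758 figure belongs to the deck's electrical-costs example, whose data is not reproduced here. As a generic sketch on the same made-up data as before, R² can be computed two equivalent ways: as the squared correlation, or as the explained fraction of variation, 1 − SSE/SST.

# R-squared two ways on made-up data (not the electrical-costs data).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / (s_xx * s_yy) ** 0.5  # coefficient of correlation
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

print(r ** 2)          # R-squared as the squared correlation
print(1 - sse / s_yy)  # identical: explained fraction of total variation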

Least Squares Method: R²
