CHAPTER 1: Basic Concepts of Regression Analysis
Prof. Alan Wan
Table of contents
1. Introduction
2. Approaches to Line Fitting
3. The Least Squares Approach
4. Linear Regression as a Statistical Model
5. Multiple Linear Regression and Matrix Formulation
Introduction

- Regression analysis is a statistical technique used to describe relationships among variables.
- The simplest case to examine is one in which a variable Y, referred to as the dependent or target variable, may be related to one variable X, called an independent or explanatory variable, or simply a regressor.
- If the relationship between Y and X is believed to be linear, then the equation of a line may be appropriate:

  $$Y = \beta_1 + \beta_2 X,$$

  where $\beta_1$ is an intercept term and $\beta_2$ is a slope coefficient.
- In the simplest terms, the purpose of regression is to find the best-fitting line, or equation, that expresses the relationship between Y and X.
- Consider the following data points:

  X   1   2   3   4   5   6
  Y   3   5   7   9  11  13

- A graph of the (x, y) pairs would appear as in Fig. 1.1.

[Fig. 1.1: scatter plot of the (x, y) pairs, Y against X.]
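A quick check (a Python sketch of ours, not part of the original slides) confirms that these six points lie exactly on a straight line: each unit increase in X raises Y by exactly 2, so Y = 1 + 2X fits every point with no error.

```python
# The six (x, y) pairs from the table above.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 5, 7, 9, 11, 13]

# Every point satisfies Y = 1 + 2X exactly.
assert all(y == 1 + 2 * x for x, y in zip(xs, ys))

# Equivalently, each unit step in X raises Y by exactly 2 (constant slope).
assert all(ys[i + 1] - ys[i] == 2 for i in range(len(ys) - 1))
```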
- Data encountered in a business environment are more likely to appear like the data points in Fig. 1.2, where Y and X largely obey an approximately linear relationship, but it is not an exact relationship.

[Fig. 1.2: scatter plot of Y against X; the points lie close to, but not exactly on, a straight line.]
[Fig. 1.3: scatter plot of Y against X with a fitted line; at $x_i$, the observed value $y_i$, the fitted value $\hat{y}_i$, and the residual $e_i = y_i - \hat{y}_i$ are marked.]
or

$$b_2 = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}, \qquad b_1 = \bar{y} - b_2\bar{x}.$$
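These formulas translate directly into code. As an illustration (our sketch, not part of the slides), applying them to the exactly linear data of Fig. 1.1 recovers the line Y = 1 + 2X:

```python
def least_squares(xs, ys):
    """Simple-regression OLS estimates via the formulas above:
    b2 = (sum x_i*y_i - n*xbar*ybar) / (sum x_i^2 - n*xbar^2),
    b1 = ybar - b2*xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    b2 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
         (sum(x * x for x in xs) - n * xbar ** 2)
    b1 = ybar - b2 * xbar
    return b1, b2

# On the exactly linear data of Fig. 1.1 the fit recovers Y = 1 + 2X.
b1, b2 = least_squares([1, 2, 3, 4, 5, 6], [3, 5, 7, 9, 11, 13])
print(b1, b2)  # ≈ 1.0 2.0
```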
[Fig. 1.4: scatter plot of COST (vertical axis, roughly 20000 to 60000) against NUMPORTS (horizontal axis, roughly 10 to 70).]
Regression Statistics
  Multiple R          0.941928423
  R Square            0.887229154
  Adjusted R Square   0.877831584
  Standard Error      4306.914458
  Observations        14

ANOVA
              df    SS            MS            F            Significance F
  Regression   1    1751268376    1751268376    94.41048161  4.88209E-07
  Residual    12    222594145.8   18549512.15
  Total       13    1973862522
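One relationship worth noting (illustrated with a short sketch of ours, not part of the original output): the R Square in the summary is the ratio of the regression sum of squares to the total sum of squares in the ANOVA table.

```python
# Sums of squares taken from the ANOVA table above.
ss_regression = 1751268376
ss_total = 1973862522

# R Square = SS(Regression) / SS(Total).
r_square = ss_regression / ss_total
print(r_square)  # ≈ 0.887229154, the reported R Square
```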
$$\widehat{\text{COST}} = 16593.65 + 650.169\,\text{NUMPORTS}$$
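Using the fitted equation (coefficients taken from the slide; the helper name is ours), the predicted cost at a given number of ports is a direct plug-in:

```python
def predicted_cost(numports):
    """Fitted line from the slide: COST-hat = 16593.65 + 650.169 * NUMPORTS."""
    return 16593.65 + 650.169 * numports

print(predicted_cost(40))  # ≈ 42600.41
```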
Regression Statistics
  Multiple R          0.815274956
  R Square            0.664673254
  Adjusted R Square   0.661251553
  Standard Error      27270.25391
  Observations        100

ANOVA
              df    SS            MS            F           Significance F
  Regression   1    1.44459E+11   1.44459E+11   194.252262  5.49151E-25
  Residual    98    72879341312   743666748.1
  Total       99    2.17338E+11
Statistical Model

- Thus far, the regression can be viewed as a descriptive statistic.
- However, the power of regression is not restricted to its use as a descriptive statistic for a particular sample; its greater value lies in its ability to draw inferences, or generalisations, about the entire population of values of the variables X and Y.
- To draw inferences about a population from a sample, we must make some assumptions about how X and Y are related in the population. These assumptions will be spelt out in detail in Chapter 2. Most of them describe an "ideal" situation. Later, some of these assumptions will be relaxed, and we will demonstrate modifications to the basic least squares approach that keep the method suitable.
Statistical Model

The population regression line gives the mean of Y at each value of X:

$$\mu_{Y|X} = \beta_1 + \beta_2 X,$$

and each individual observation deviates from this mean by a random error $\epsilon_i$:

$$y_i = \beta_1 + \beta_2 x_i + \epsilon_i, \qquad i = 1, \dots, n.$$
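To make the model concrete, here is a small simulation sketch (ours, not from the slides): data are generated from $y_i = \beta_1 + \beta_2 x_i + \epsilon_i$ with known parameters, and least squares approximately recovers them.

```python
import random

random.seed(0)  # make the simulated errors reproducible

beta1, beta2 = 2.0, 0.5            # true population parameters (chosen arbitrarily)
xs = [i / 10 for i in range(200)]  # fixed regressor values, 0.0 to 19.9
# Each observation is the population line plus a random error epsilon_i.
ys = [beta1 + beta2 * x + random.gauss(0, 1) for x in xs]

# Least squares estimates via the usual simple-regression formulas.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b2 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
     (sum(x * x for x in xs) - n * xbar ** 2)
b1 = ybar - b2 * xbar
# b1 and b2 should lie close to the true values 2.0 and 0.5.
print(b1, b2)
```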
where

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & x_{12} & \dots & x_{1k} \\ 1 & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n2} & \dots & x_{nk} \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix} \quad \text{and} \quad
\epsilon = \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}.$$
$$Y = X\beta + \epsilon$$
Let

$$b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \end{pmatrix}$$

be the O.L.S. estimator of $\beta$. Define

$$e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = Y - Xb.$$

Note that $\sum_{i=1}^{n} e_i^2 = e'e$.
Thus,

$$e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y - Y'Xb + b'X'Xb = Y'Y - 2b'X'Y + b'X'Xb.$$
$$\frac{\partial e'e}{\partial b} = -2X'Y + 2X'Xb = 0,$$

leading to $X'Xb = X'Y$, or

$$b = (X'X)^{-1}X'Y. \qquad (6)$$
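Equation (6) can be sketched in a few lines of Python (stdlib only; the Gaussian-elimination helper and function names are ours, not from the slides). As a check, on the exactly linear data Y = 1 + 2X from Fig. 1.1, the normal equations return b ≈ (1, 2).

```python
def solve(A, v):
    """Solve the linear system A b = v by Gauss-Jordan elimination with
    partial pivoting (A is a small square matrix given as nested lists)."""
    n = len(A)
    M = [row[:] + [v[i]] for i, row in enumerate(A)]  # augmented matrix [A | v]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, Y):
    """b = (X'X)^{-1} X'Y, computed by solving the normal equations X'Xb = X'Y."""
    Xt = list(zip(*X))  # rows of X' are the columns of X
    XtX = [[sum(a * b for a, b in zip(r, c)) for c in zip(*X)] for r in Xt]
    XtY = [sum(a * y for a, y in zip(r, Y)) for r in Xt]
    return solve(XtX, XtY)

# Design matrix with an intercept column; data of Fig. 1.1 (Y = 1 + 2X).
X = [[1, x] for x in [1, 2, 3, 4, 5, 6]]
Y = [3, 5, 7, 9, 11, 13]
print(ols(X, Y))  # ≈ [1.0, 2.0]
```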
$$X'e = X'(Y - Xb) = X'Y - (X'X)(X'X)^{-1}X'Y = 0.$$
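The orthogonality result $X'e = 0$ can be verified numerically on any small data set (a sketch with made-up numbers, not from the slides): after a least squares fit with an intercept, the residuals sum to zero and are orthogonal to the regressor.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]   # made-up data, deliberately not on a line

# Simple-regression OLS fit (intercept b1 and slope b2).
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b2 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
     (sum(x * x for x in xs) - n * xbar ** 2)
b1 = ybar - b2 * xbar

# Residuals e_i = y_i - (b1 + b2 * x_i).
es = [y - (b1 + b2 * x) for x, y in zip(xs, ys)]

# X'e = 0: both columns of X (the intercept column of ones, and x)
# are orthogonal to the residual vector.
print(abs(sum(es)) < 1e-9)                             # True
print(abs(sum(x * e for x, e in zip(xs, es))) < 1e-9)  # True
```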
giving

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} =
\begin{pmatrix} 25 & 4080.1 & 14379.1 \\ 4080.1 & 832146.8 & 2981925 \\ 14379.1 & 2981925 & 11737267 \end{pmatrix}^{-1}
\begin{pmatrix} 4082.34 \\ 801322.7 \\ 2994883 \end{pmatrix}.$$
Or

$$\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} =
\begin{pmatrix} 0.202454971 & -0.001159287 & 0.000046500 \\ -0.001159287 & 0.000020048 & -0.000003673 \\ 0.000046500 & -0.000003673 & 0.000000961 \end{pmatrix}
\begin{pmatrix} 4082.34 \\ 801322.7 \\ 2994883 \end{pmatrix} =
\begin{pmatrix} 36.789 \\ 0.332 \\ 0.125 \end{pmatrix}.$$
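The final multiplication can be checked in a couple of lines (a sketch using the rounded values printed above, so the result agrees only to a few decimal places):

```python
# (X'X)^{-1} and X'Y as printed on the slide (rounded values).
inv_XtX = [
    [ 0.202454971, -0.001159287,  0.000046500],
    [-0.001159287,  0.000020048, -0.000003673],
    [ 0.000046500, -0.000003673,  0.000000961],
]
XtY = [4082.34, 801322.7, 2994883]

# Matrix-vector product b = (X'X)^{-1} X'Y.
b = [sum(a * v for a, v in zip(row, XtY)) for row in inv_XtX]
print(b)  # ≈ [36.789, 0.332, 0.125]
```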
Regression Statistics
  Multiple R          0.891730934
  R Square            0.795184059
  Adjusted R Square   0.776564428
  Standard Error      38.43646324
  Observations        25

ANOVA
              df    SS            MS          F          Significance F
  Regression   2    126186.6552   63093.33    42.70676   2.66073E-08
  Residual    22    32501.95754   1477.362
  Total       24    158688.6128