
Simple Linear Regression

Estimation and Inference


Summary and Remark

Lecture 3 Linear Regression I


STAT 441: Statistical Methods for Learning and Data Mining

Jinhan Xie

Department of Mathematical and Statistical Sciences


University of Alberta

Winter, 2023

Jinhan Xie (University of Alberta) Lecture 3 LR I Winter, 2023 1/20



Outline

1 Simple Linear Regression

2 Estimation and Inference

3 Summary and Remark


Vectors

An array $x$ of $n$ real numbers $x_1, \dots, x_n$ is called a vector, and it is written as
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
\quad \text{or} \quad
x' = [x_1, x_2, \cdots, x_n].
\]
Here the prime denotes the operation of transposing a column to a row.


The number n is referred to as the dimension of the vector x.
The coordinates $x_1, \dots, x_n$ can also be complex numbers, in which case
$x$ is a complex vector. However, in this course we consider ONLY real
numbers and real vectors.


The set of all real vectors of dimension $n$ is denoted by $\mathbb{R}^n$.

One can scale a vector $x$ by multiplying it by a constant $c$:
\[
x = [x_1, \dots, x_n]' \quad \Rightarrow \quad cx = [cx_1, \dots, cx_n]'.
\]

Vectors of the same dimension can be added:
\[
x + y = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}.
\]
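These two operations can be sketched in Python with vectors stored as plain lists (the helper names `scale` and `add` are illustrative, not from the slides):

```python
# Scalar multiplication and addition for vectors represented as lists.
def scale(c, x):
    return [c * xi for xi in x]

def add(x, y):
    assert len(x) == len(y), "vectors must have the same dimension"
    return [xi + yi for xi, yi in zip(x, y)]

v = scale(2.0, [1.0, 2.0, 3.0])   # component-wise scaling
w = add([1.0, 2.0], [3.0, 4.0])   # component-wise addition
```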


Inner product:
\[
\langle x, y \rangle := \sum_{i=1}^{n} x_i y_i = \langle y, x \rangle.
\]

Length, or norm:
\[
\|x\| := \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.
\]

For any constant $c$,
\[
\langle cx, y \rangle = c \, \langle x, y \rangle; \qquad
\|cx\| = \sqrt{\langle cx, cx \rangle} = \sqrt{c^2 \langle x, x \rangle} = |c| \sqrt{\langle x, x \rangle} = |c| \, \|x\|.
\]

Triangle inequality (proof left as a warm-up exercise):
\[
\|x + y\| \le \|x\| + \|y\|.
\]
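A minimal Python sketch of these definitions, including a numerical spot check of the triangle inequality (the helper names and the example vectors are illustrative):

```python
import math

# Inner product and norm for vectors stored as lists.
def inner(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

x = [3.0, 4.0]
y = [1.0, 2.0]
n = norm(x)  # sqrt(3^2 + 4^2) = 5.0

# Spot check of the triangle inequality on one pair of vectors.
s = [a + b for a, b in zip(x, y)]
triangle_ok = norm(s) <= norm(x) + norm(y)
```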


Matrix
A matrix $A$ of dimension $m \times n$ is a rectangular table of (real) numbers:
\[
A = \begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\
\vdots  & \vdots  & \ddots & \vdots  \\
A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix}.
\]
$A_{i,j}$ is called the $(i,j)$-entry of the matrix $A$.

Transpose: $A'$ is an $n \times m$ matrix, defined by
\[
A' = \begin{bmatrix}
A_{1,1} & A_{2,1} & \cdots & A_{m,1} \\
A_{1,2} & A_{2,2} & \cdots & A_{m,2} \\
\vdots  & \vdots  & \ddots & \vdots  \\
A_{1,n} & A_{2,n} & \cdots & A_{m,n}
\end{bmatrix}.
\]
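The transpose can be sketched in Python with a matrix stored as a list of rows (an illustrative sketch, not a library implementation):

```python
# Transpose of an m x n matrix stored as a list of rows:
# entry (i, j) of A becomes entry (j, i) of the result.
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 2, 3],
     [4, 5, 6]]        # a 2 x 3 matrix
At = transpose(A)      # a 3 x 2 matrix
```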


Simple Linear Regression


Linear regression is a simple approach to supervised learning. It assumes
that the dependence of $Y$ on $X_1, X_2, \cdots, X_p$ is linear.
True regression functions are never linear! Although it may seem overly
simplistic, linear regression is extremely useful both conceptually and
practically.


Simple Linear Regression

The Simple Linear Regression (SLR) model has the form
\[
Y = \beta_0 + \beta_1 X + \varepsilon,
\]
where $\beta_0$ and $\beta_1$ are two unknown parameters (coefficients), called the
intercept and the slope, respectively, and $\varepsilon$ is the error term.

Given the estimates $\hat\beta_0$ and $\hat\beta_1$, the estimated regression line is
\[
y = \hat\beta_0 + \hat\beta_1 x.
\]

For $X = x$, we predict $Y$ by $\hat y = \hat\beta_0 + \hat\beta_1 x$, where the hat symbol
denotes an estimated value.


Estimate the parameters

Let $(y_i, x_i)$ be the $i$-th observation and $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$; we call
$e_i = y_i - \hat y_i$ the $i$-th residual.

To estimate the parameters, we minimize the residual sum of squares (RSS),
\[
\mathrm{RSS} = \sum_i e_i^2 = \sum_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2.
\]

Denote $\bar y = \sum_i y_i / n$ and $\bar x = \sum_i x_i / n$. The minimizing values are
\[
\hat\beta_1 = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2}
= r \, \frac{\sqrt{\sum_i (y_i - \bar y)^2}}{\sqrt{\sum_i (x_i - \bar x)^2}},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\]
where $r$ is the sample correlation between $y$ and $x$.
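The closed-form estimates can be sketched directly in Python (the helper name `ols_fit` and the toy data are illustrative, not from the slides):

```python
# Least-squares estimates for simple linear regression, computed
# directly from the closed-form formulas above.
def ols_fit(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1

# Toy data lying exactly on y = 1 + 2x, so the fit recovers (1, 2).
b0, b1 = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```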


Example


[Figure: scatter plot of Sales (roughly 5 to 25) versus TV (0 to 300) with the fitted least-squares line; grey line segments mark the residuals.]

Advertising data: the least-squares fit for the regression of Sales on TV.
Each grey line segment represents an error, and the fit makes a
compromise by averaging their squares.
In this case a linear fit captures the essence of the relationship, although
it is somewhat deficient on the left of the plot.

Interval Estimates

A point estimate gives a single number but conveys no information about
the precision and reliability of the estimate. For example, given $\hat\beta_1 = 1.2231$,
we have no information about how close it might be to $\beta_1$.

A 95% confidence interval (CI) for $\beta_1$ is an interval derived from the sample
such that, if we independently repeated the experiment many times, in 95% of
the experiments the intervals obtained would cover $\beta_1$.

The percentage "95%" is called the confidence level or confidence
coefficient. Common confidence levels are 90%, 95%, and 99%.


Assess the coefficient estimates


The standard error of an estimator reflects how it varies under repeated
sampling:
\[
\mathrm{SE}(\hat\beta_1) = \sqrt{\frac{\sigma^2}{\sum_i (x_i - \bar x)^2}}, \qquad
\mathrm{SE}(\hat\beta_0) = \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i - \bar x)^2} \right)},
\]
where $\sigma^2 = \mathrm{Var}(\varepsilon)$.

A 95% confidence interval is defined as a range of values such that, with
95% probability, the range will contain the true unknown value of the
parameter. It has the (approximate) form
\[
\hat\beta_1 \pm 2 \cdot \mathrm{SE}(\hat\beta_1).
\]

For the advertising data, the 95% confidence interval for $\beta_1$ is
$[0.042, 0.053]$: there is approximately a 95% chance that this
interval contains the true value of $\beta_1$ (under a scenario in which we obtained
repeated samples like the present sample).
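A minimal sketch of the approximate 95% interval for the slope, with $\sigma^2$ estimated by $\mathrm{RSS}/(n-2)$ (the helper name `slope_ci` and the toy data are illustrative, not from the slides):

```python
import math

# Approximate 95% CI for the slope: beta1_hat +/- 2 * SE(beta1_hat),
# with sigma^2 estimated by RSS / (n - 2).
def slope_ci(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt((rss / (n - 2)) / sxx)  # SE(beta1_hat)
    return beta1 - 2 * se, beta1 + 2 * se

lo, hi = slope_ci([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```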

Hypothesis testing

Definition: A statistical hypothesis is an assertion about the distribution


of one or more random variables. If the statistical hypothesis completely
specifies the distribution, it is called a simple statistical hypothesis; if it
does not, it is called a composite statistical hypothesis.
Definition: A test of statistical hypothesis is a decision rule which, when
the experimental sample values have been obtained, leads to a decision to
accept or to reject the hypothesis under consideration.
We do not expand on all the details here. For a comprehensive review, see any
statistics textbook, for example Morris H. DeGroot and Mark J. Schervish,
Probability and Statistics, Pearson, 2012.


Hypothesis testing

Standard errors can also be used to perform hypothesis tests on the


coefficients. The most common hypothesis test involves testing the null
hypothesis of
H0 : There is no relationship between X and Y versus the alternative
hypothesis
HA : There is some relationship between X and Y.
Mathematically, we test
\[
H_0: \beta_1 = 0 \quad \text{versus} \quad H_A: \beta_1 \neq 0,
\]
since if $\beta_1 = 0$ then the model reduces to $Y = \beta_0 + \varepsilon$, and $X$ is not
associated with $Y$.


Four possibilities

Decision Table for a Test of Hypothesis

                     True State of Nature
Decision       H0 is true          HA is true
Reject H0      Type I Error        Correct Decision
Accept H0      Correct Decision    Type II Error


Hypothesis testing
To test the null hypothesis, we compute a t-statistic,
\[
t = \frac{\hat\beta_1 - 0}{\mathrm{SE}(\hat\beta_1)}.
\]
Under the null hypothesis $\beta_1 = 0$, this statistic follows a $t_{n-2}$ distribution.

Using statistical software, it is easy to compute the probability of
observing any value equal to $|t|$ or larger. We call this probability the
p-value: the smallest level $\alpha_0$ at which we would reject the null
hypothesis $H_0$ given the data.
Results for the advertising data:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.032594   0.457843   15.36   <2e-16 ***
TV          0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
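The reported t value for TV can be reproduced from the estimate and its standard error in the table (a one-line sketch; the variable names are illustrative):

```python
# t-statistic for H0: beta1 = 0, using the slope estimate and its
# standard error reported for the advertising data.
beta1_hat = 0.047537
se_beta1 = 0.002691
t = beta1_hat / se_beta1  # compare against a t distribution with n - 2 df
```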


Measure of fit

We compute the Residual Standard Error,
\[
\mathrm{RSE} = \sqrt{\frac{1}{n-2} \mathrm{RSS}} = \sqrt{\frac{1}{n-2} \sum_i (y_i - \hat y_i)^2},
\]
where the residual sum of squares is $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$.

R-squared, or the fraction of variance explained, is
\[
R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\]
where $\mathrm{TSS} = \sum_i (y_i - \bar y)^2$ is the total sum of squares.

$R^2$ is a statistical measure that represents the proportion of the variance
of the dependent variable that is explained by the independent variable or
variables in a regression model.
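These two measures can be sketched directly from observed and fitted values (the helper name `fit_measures` and the toy data are illustrative, not from the slides):

```python
import math

# RSE and R^2 computed from observed values y and fitted values y_hat.
def fit_measures(y, y_hat):
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rse = math.sqrt(rss / (n - 2))
    r2 = 1 - rss / tss
    return rse, r2

# A perfect fit has RSS = 0, hence RSE = 0 and R^2 = 1.
rse, r2 = fit_measures([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```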


Measure of fit

It can be shown that in this simple linear regression setting, $R^2 = r^2$,
where $r$ is the correlation between $Y$ and $X$:
\[
r = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sqrt{\sum_i (y_i - \bar y)^2} \, \sqrt{\sum_i (x_i - \bar x)^2}}
= \hat\beta_1 \, \frac{\sqrt{\sum_i (x_i - \bar x)^2}}{\sqrt{\sum_i (y_i - \bar y)^2}}.
\]
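The identity $R^2 = r^2$ can be checked numerically (an illustrative sketch; the helper name and toy data are not from the slides):

```python
import math

# Check that R^2 from the least-squares fit equals the squared sample
# correlation r^2 in simple linear regression.
def r2_and_corr_sq(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2_fit = 1 - rss / syy           # R^2 from the fit
    r = sxy / math.sqrt(sxx * syy)   # sample correlation
    return r2_fit, r * r

a, b = r2_and_corr_sq([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```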


R code

> TVadData <- read.csv('... Advertising.csv')
> attach(TVadData)
> TVadlm = lm(Sales ~ TV)
> summary(TVadlm)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.032594   0.457843   15.36   <2e-16 ***
TV          0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.259 on 198 degrees of freedom


Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16


Summary and Remark

Simple linear regression


Estimation and inference
Measure of fit R2
Read textbook Chapter 3
