
Simple Linear Regression

Estimation and Inference


Summary and Remark

Lecture 3 Linear Regression I


STAT 441: Statistical Methods for Learning and Data Mining

Jinhan Xie

Department of Mathematical and Statistical Sciences


University of Alberta

Winter, 2023

Jinhan Xie (University of Alberta) Lecture 3 LR I Winter, 2023 1/20



Outline

1 Simple Linear Regression

2 Estimation and Inference

3 Summary and Remark


Vectors

An array $x$ of $n$ real numbers $x_1, \dots, x_n$ is called a vector, and it is written as
\[
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix},
\quad \text{or} \quad
x' = [x_1, x_2, \cdots, x_n].
\]
Here the prime denotes the operation of transposing a column to a row.


The number n is referred to as the dimension of the vector x.
The coordinates $x_1, \dots, x_n$ can also be complex numbers, in which case
$x$ is a complex vector. However, in this course we consider ONLY real
numbers and real vectors.


The set of all real vectors of dimension $n$ is denoted by $\mathbb{R}^n$.

One can scale a vector $x$ by multiplying it by a constant $c$:
\[
x = [x_1, \dots, x_n]' \quad \Rightarrow \quad cx = [cx_1, \dots, cx_n]'.
\]

Vectors of the same dimension can be added:
\[
x + y = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}.
\]
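These two operations can be sketched in Python with vectors stored as plain lists (the helper names `scale` and `add` are illustrative, not from the slides):

```python
# Scalar multiplication and addition for vectors represented as lists.
def scale(c, x):
    return [c * xi for xi in x]

def add(x, y):
    assert len(x) == len(y), "vectors must have the same dimension"
    return [xi + yi for xi, yi in zip(x, y)]

v = scale(2.0, [1.0, 2.0, 3.0])   # component-wise scaling
w = add([1.0, 2.0], [3.0, 4.0])   # component-wise addition
```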


Inner product:
\[
\langle x, y \rangle := \sum_{i=1}^{n} x_i y_i = \langle y, x \rangle.
\]

Length, or norm:
\[
\|x\| := \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.
\]

For any constant $c$,
\[
\langle cx, y \rangle = c \, \langle x, y \rangle; \qquad
\|cx\| = \sqrt{\langle cx, cx \rangle} = \sqrt{c^2 \langle x, x \rangle} = |c| \sqrt{\langle x, x \rangle} = |c| \, \|x\|.
\]

Triangle inequality (proof left as a warm-up exercise):
\[
\|x + y\| \le \|x\| + \|y\|.
\]
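A minimal Python sketch of these definitions, including a numerical spot check of the triangle inequality (the helper names and the example vectors are illustrative):

```python
import math

# Inner product and norm for vectors stored as lists.
def inner(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

x = [3.0, 4.0]
y = [1.0, 2.0]
n = norm(x)  # sqrt(3^2 + 4^2) = 5.0

# Spot check of the triangle inequality on one pair of vectors.
s = [a + b for a, b in zip(x, y)]
triangle_ok = norm(s) <= norm(x) + norm(y)
```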


Matrix
A matrix $A$ of dimension $m \times n$ is a rectangular table of (real) numbers:
\[
A = \begin{bmatrix}
A_{1,1} & A_{1,2} & \cdots & A_{1,n} \\
A_{2,1} & A_{2,2} & \cdots & A_{2,n} \\
\vdots  & \vdots  & \ddots & \vdots  \\
A_{m,1} & A_{m,2} & \cdots & A_{m,n}
\end{bmatrix}.
\]
$A_{i,j}$ is called the $(i,j)$-entry of the matrix $A$.

Transpose: $A'$ is an $n \times m$ matrix, defined by
\[
A' = \begin{bmatrix}
A_{1,1} & A_{2,1} & \cdots & A_{m,1} \\
A_{1,2} & A_{2,2} & \cdots & A_{m,2} \\
\vdots  & \vdots  & \ddots & \vdots  \\
A_{1,n} & A_{2,n} & \cdots & A_{m,n}
\end{bmatrix}.
\]
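The transpose can be sketched in Python with a matrix stored as a list of rows (an illustrative sketch, not a library implementation):

```python
# Transpose of an m x n matrix stored as a list of rows:
# entry (i, j) of A becomes entry (j, i) of the result.
def transpose(A):
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

A = [[1, 2, 3],
     [4, 5, 6]]        # a 2 x 3 matrix
At = transpose(A)      # a 3 x 2 matrix
```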


Simple Linear Regression


Linear regression is a simple approach to supervised learning. It assumes
that the dependence of $Y$ on $X_1, X_2, \cdots, X_p$ is linear.
True regression functions are never linear! Although it may seem overly
simplistic, linear regression is extremely useful both conceptually and
practically.


Simple Linear Regression

The Simple Linear Regression (SLR) model has the form
\[
Y = \beta_0 + \beta_1 X + \varepsilon,
\]
where $\beta_0$ and $\beta_1$ are two unknown parameters (coefficients), called the
intercept and the slope, respectively, and $\varepsilon$ is the error term.

Given the estimates $\hat\beta_0$ and $\hat\beta_1$, the estimated regression line is
\[
y = \hat\beta_0 + \hat\beta_1 x.
\]

For $X = x$, we predict $Y$ by $\hat y = \hat\beta_0 + \hat\beta_1 x$, where the hat symbol
denotes an estimated value.


Estimate the parameters

Let $(y_i, x_i)$ be the $i$-th observation and $\hat y_i = \hat\beta_0 + \hat\beta_1 x_i$; we call
$e_i = y_i - \hat y_i$ the $i$-th residual.

To estimate the parameters, we minimize the residual sum of squares (RSS),
\[
\mathrm{RSS} = \sum_i e_i^2 = \sum_i \left( y_i - \hat\beta_0 - \hat\beta_1 x_i \right)^2.
\]

Denote $\bar y = \sum_i y_i / n$ and $\bar x = \sum_i x_i / n$. The minimizing values are
\[
\hat\beta_1 = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sum_i (x_i - \bar x)^2}
= r \, \frac{\sqrt{\sum_i (y_i - \bar y)^2}}{\sqrt{\sum_i (x_i - \bar x)^2}},
\qquad
\hat\beta_0 = \bar y - \hat\beta_1 \bar x,
\]
where $r$ is the sample correlation between $y$ and $x$.
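The closed-form estimates can be sketched directly in Python (the helper name `ols_fit` and the toy data are illustrative, not from the slides):

```python
# Least-squares estimates for simple linear regression, computed
# directly from the closed-form formulas above.
def ols_fit(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx               # slope
    beta0 = y_bar - beta1 * x_bar   # intercept
    return beta0, beta1

# Toy data lying exactly on y = 1 + 2x, so the fit recovers (1, 2).
b0, b1 = ols_fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```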


Example


[Figure: scatter plot of Sales (roughly 5 to 25) versus TV (0 to 300) with the fitted least-squares line; grey line segments mark the residuals.]

Advertising data: the least-squares fit for the regression of Sales on TV.
Each grey line segment represents an error, and the fit makes a
compromise by averaging their squares.
In this case a linear fit captures the essence of the relationship, although
it is somewhat deficient on the left of the plot.

Interval Estimates

A point estimate gives a single number but conveys no information about
the precision and reliability of the estimate. For example, given $\hat\beta_1 = 1.2231$,
we have no information about how close it might be to $\beta_1$.

A 95% confidence interval (CI) for $\beta_1$ is an interval derived from the sample
such that, if we independently repeated the experiment many times, in 95% of
the experiments the intervals obtained would cover $\beta_1$.

The percentage "95%" is called the confidence level or confidence
coefficient. Common confidence levels are 90%, 95%, and 99%.


Assess the coefficient estimates


The standard error of an estimator reflects how it varies under repeated
sampling:
\[
\mathrm{SE}(\hat\beta_1) = \sqrt{\frac{\sigma^2}{\sum_i (x_i - \bar x)^2}}, \qquad
\mathrm{SE}(\hat\beta_0) = \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{\sum_i (x_i - \bar x)^2} \right)},
\]
where $\sigma^2 = \mathrm{Var}(\varepsilon)$.

A 95% confidence interval is defined as a range of values such that, with
95% probability, the range will contain the true unknown value of the
parameter. It has the (approximate) form
\[
\hat\beta_1 \pm 2 \cdot \mathrm{SE}(\hat\beta_1).
\]

For the advertising data, the 95% confidence interval for $\beta_1$ is
$[0.042, 0.053]$: there is approximately a 95% chance that this
interval contains the true value of $\beta_1$ (under a scenario in which we obtained
repeated samples like the present sample).
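A minimal sketch of the approximate 95% interval for the slope, with $\sigma^2$ estimated by $\mathrm{RSS}/(n-2)$ (the helper name `slope_ci` and the toy data are illustrative, not from the slides):

```python
import math

# Approximate 95% CI for the slope: beta1_hat +/- 2 * SE(beta1_hat),
# with sigma^2 estimated by RSS / (n - 2).
def slope_ci(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / sxx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt((rss / (n - 2)) / sxx)  # SE(beta1_hat)
    return beta1 - 2 * se, beta1 + 2 * se

lo, hi = slope_ci([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```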

Hypothesis testing

Definition: A statistical hypothesis is an assertion about the distribution


of one or more random variables. If the statistical hypothesis completely
specifies the distribution, it is called a simple statistical hypothesis; if it
does not, it is called a composite statistical hypothesis.
Definition: A test of statistical hypothesis is a decision rule which, when
the experimental sample values have been obtained, leads to a decision to
accept or to reject the hypothesis under consideration.
We do not expand on all the details here. For a comprehensive review, see any
statistics textbook, for example Morris H. DeGroot and Mark J. Schervish,
Probability and Statistics, Pearson, 2012.


Hypothesis testing

Standard errors can also be used to perform hypothesis tests on the


coefficients. The most common hypothesis test involves testing the null
hypothesis of
H0 : There is no relationship between X and Y versus the alternative
hypothesis
HA : There is some relationship between X and Y.
Mathematically, we test
\[
H_0: \beta_1 = 0 \quad \text{versus} \quad H_A: \beta_1 \neq 0,
\]
since if $\beta_1 = 0$ then the model reduces to $Y = \beta_0 + \varepsilon$, and $X$ is not
associated with $Y$.


Four possibilities

Decision Table for a Test of Hypothesis

                     True State of Nature
Decision       H0 is true          HA is true
Reject H0      Type I Error        Correct Decision
Accept H0      Correct Decision    Type II Error


Hypothesis testing
To test the null hypothesis, we compute a t-statistic,
\[
t = \frac{\hat\beta_1 - 0}{\mathrm{SE}(\hat\beta_1)}.
\]
Under the null hypothesis $\beta_1 = 0$, this statistic follows a $t_{n-2}$ distribution.

Using statistical software, it is easy to compute the probability of
observing any value equal to $|t|$ or larger. We call this probability the
p-value: the smallest level $\alpha_0$ at which we would reject the null
hypothesis $H_0$ given the data.
Results for the advertising data:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.032594   0.457843   15.36   <2e-16 ***
TV          0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
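The reported t value for TV can be reproduced from the estimate and its standard error in the table (a one-line sketch; the variable names are illustrative):

```python
# t-statistic for H0: beta1 = 0, using the slope estimate and its
# standard error reported for the advertising data.
beta1_hat = 0.047537
se_beta1 = 0.002691
t = beta1_hat / se_beta1  # compare against a t distribution with n - 2 df
```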


Measure of fit

We compute the Residual Standard Error,
\[
\mathrm{RSE} = \sqrt{\frac{1}{n-2} \mathrm{RSS}} = \sqrt{\frac{1}{n-2} \sum_i (y_i - \hat y_i)^2},
\]
where the residual sum of squares is $\mathrm{RSS} = \sum_i (y_i - \hat y_i)^2$.

R-squared, or the fraction of variance explained, is
\[
R^2 = \frac{\mathrm{TSS} - \mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}},
\]
where $\mathrm{TSS} = \sum_i (y_i - \bar y)^2$ is the total sum of squares.

$R^2$ is a statistical measure that represents the proportion of the variance
of the dependent variable that is explained by the independent variable or
variables in a regression model.
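These two measures can be sketched directly from observed and fitted values (the helper name `fit_measures` and the toy data are illustrative, not from the slides):

```python
import math

# RSE and R^2 computed from observed values y and fitted values y_hat.
def fit_measures(y, y_hat):
    n = len(y)
    y_bar = sum(y) / n
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    rse = math.sqrt(rss / (n - 2))
    r2 = 1 - rss / tss
    return rse, r2

# A perfect fit has RSS = 0, hence RSE = 0 and R^2 = 1.
rse, r2 = fit_measures([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
```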


Measure of fit

It can be shown that in this simple linear regression setting, $R^2 = r^2$,
where $r$ is the correlation between $Y$ and $X$:
\[
r = \frac{\sum_i (y_i - \bar y)(x_i - \bar x)}{\sqrt{\sum_i (y_i - \bar y)^2} \, \sqrt{\sum_i (x_i - \bar x)^2}}
= \hat\beta_1 \, \frac{\sqrt{\sum_i (x_i - \bar x)^2}}{\sqrt{\sum_i (y_i - \bar y)^2}}.
\]
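The identity $R^2 = r^2$ can be checked numerically (an illustrative sketch; the helper name and toy data are not from the slides):

```python
import math

# Check that R^2 from the least-squares fit equals the squared sample
# correlation r^2 in simple linear regression.
def r2_and_corr_sq(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    rss = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2_fit = 1 - rss / syy           # R^2 from the fit
    r = sxy / math.sqrt(sxx * syy)   # sample correlation
    return r2_fit, r * r

a, b = r2_and_corr_sq([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
```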


R code

> TVadData <- read.csv('... Advertising.csv')
> attach(TVadData)
> TVadlm = lm(Sales ~ TV)
> summary(TVadlm)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.032594   0.457843   15.36   <2e-16 ***
TV          0.047537   0.002691   17.67   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.259 on 198 degrees of freedom


Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16


Summary and Remark

Simple linear regression


Estimation and inference
Measure of fit R2
Read textbook Chapter 3
