
Statistics

Simple Linear Regression

Shaheena Bashir

FALL, 2019
Outline

Introduction

Regression
Assumptions about The Model
Method of Least Squares
Assessment of the Model
Graphical Assessment
Regression with Categorical Predictor

Introduction

Motivating Example

- https://www.nature.com/articles/ejhg20095
- https://www.wired.com/2009/03/predicting-height-the-victorian-approach-beats-modern-genomics/

Introduction

Galton’s Dataset

[Scatterplot of Galton's data: child height (inches) on the vertical axis against mid-parent height (inches) on the horizontal axis]

- Why do children of very tall parents tend to be tall, but a little shorter than their parents?
- Why do children of very short parents tend to be short, but not as short as their parents?
Introduction

Regression to the Mean

These phenomena are all examples of so-called regression to the mean, a term introduced by Francis Galton in the paper 'Regression towards mediocrity in hereditary stature', The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15 (1886).
Introduction

Deterministic Models

A deterministic model is one in which the values of the dependent variables of the system are completely determined by the parameters of the model, for example:

y = α + βx
Area = πr²
Circumference = 2πr
Fahrenheit = 32 + (9/5) × Celsius
Regression

- The aim of regression is to model the dependence of one variable Y on a set of variables X1, ..., Xp.
- Y is called the dependent variable or the response variable.
- X1, ..., Xp are called the independent variables, predictors, or covariates.
- We assume here that the relationship between X and Y is linear (or has been linearized through transformation).
- In the linear regression model, Y is a continuous or quantitative variable, but the covariates may be continuous or discrete.
Regression

The General Regression Model

- A general model for predicting Y given the covariates X1, ..., Xp is Y = f(X1, ..., Xp) + ε.
- The term ε accounts for the random variation of Y about f(X1, ..., Xp).
- In most modelling situations the form of f is chosen by the analyst and typically depends on a set of unknown parameters.
- The aim of regression is then to make inferences about the unknown parameters in this model.
Regression

Example: Galton’s Data Cont’d

- 928 adult children born to 205 pairs of parents
- How does the children's height (Y) depend on the parents' height (X)?
- We wish to fit the model Y = βo + β1X + ε
- Here βo is the y-intercept, the value of y when x = 0
- β1 is the slope of the line, the change in y for a unit change in x
- ε is the random error component that makes the model probabilistic
Regression

Example

[Scatterplot of child height (inches) against mid-parent height (inches), with the reference line and the fitted regression line described on the next slide]
Regression

Example: Cont'd
- The red line is the line Child Height = Parent Height.
- The blue line is the fitted regression line Child Height = 23.94 + 0.65 × Parent Height.
- The fitted line shows that short parents have shorter-than-average children, but the children tend to be taller than their parents.
- Conversely, tall parents have taller-than-average children, but the children are shorter than their parents. This phenomenon is known as Regression to the Mean.
- The blue line follows the mean heights very well.
- The fitted line tells us that, on average, for every 1-inch increase in parents' height, child height increases by 0.65 inches. The sketch below reproduces this fit in R.
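The coefficients quoted above can be reproduced with a few lines of R. This is a minimal sketch assuming the parent/child heights are available as the `Galton` data frame in the `HistData` package; the slides only show `lm(child ~ parent)`, so the package name is an assumption.

```r
## Sketch: reproducing the fitted line on Galton's data.
# install.packages("HistData")   # if not already installed
library(HistData)                 # assumed source of the Galton data frame

fit <- lm(child ~ parent, data = Galton)
coef(fit)   # intercept about 23.94 and slope about 0.65, as quoted above

plot(child ~ parent, data = Galton,
     xlab = "Mid-parent height (inches)", ylab = "Child height (inches)")
abline(a = 0, b = 1, col = "red")   # line: child height = parent height
abline(fit, col = "blue")           # fitted regression line
```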
Regression
Assumptions about The Model

- A linear relationship βo + β1X + ε exists between X and Y.
- The errors ε are independent in the probabilistic sense and follow a Normal distribution N(0, σ²).
- In terms of Y this means that the conditional distribution of Y given X = x is normal:
  Y | X = x ∼ N(βo + β1x, σ²)
  One assumption of the fitted model is that the standard deviation of the error terms is constant and does not depend on the x-value. Consequently, each probability distribution for y (the response variable) has the same standard deviation regardless of the x-value (the predictor). In short, this assumption is homoscedasticity.
- Note that the marginal (unconditional) distribution of Y need not be normal; this is not required for our model. All that is required is that the conditional distribution is normal for every x under consideration (a simulation sketch follows).
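A short simulation can make these assumptions concrete. This is a minimal sketch with arbitrary illustrative values of βo, β1 and σ, not estimates from any real data.

```r
## Sketch: simulating data that satisfy the model assumptions.
set.seed(123)
n     <- 200
beta0 <- 23.9     # illustrative intercept
beta1 <- 0.65     # illustrative slope
sigma <- 2.2      # constant error standard deviation (homoscedasticity)

x   <- runif(n, 64, 73)                # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)  # independent N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps         # Y | X = x is N(beta0 + beta1*x, sigma^2)

plot(x, y)                             # spread of y is the same at every x
abline(beta0, beta1, col = "blue")     # true regression line
```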
Regression
Assumptions about The Model

ε ∼ N(0, σ²)
That is, for any value of the independent variable there is a single most likely value for the dependent variable.
Regression
Assumptions about The Model

Predicted Values

- For a bivariate data set (y1, x1), (y2, x2), ..., (yn, xn), we are interested in predicting values of Y for any given value of X using the regression model.
- If the values of βo and β1 were known, then the predicted value of Y would be βo + β1X, also called the fitted value.
Regression
Assumptions about The Model

Errors
We can estimate the random errors εi in the fitted values by the vertical distances εi = yi − βo − β1xi.

[Illustration: observed values yi plotted against x, with fitted values ŷi on the regression line; the residual e1 = y1 − ŷ1 is shown as the vertical distance between y1 and ŷ1]
Regression
Method of Least Squares

Estimating the Unknown Parameters

- We wish to find a line which makes the smallest total vertical error.
- But some of the errors are positive while others are negative, so the sum of squared errors is used as an overall measure of the fit of the regression line.
- The Method of Least Squares is an estimation method which estimates βo and β1 as the values that minimize S(βo, β1) = Σ(yi − ŷi)², as illustrated in the sketch below.
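The same criterion can be minimized numerically, which shows that least squares really is an optimization problem. This is a minimal sketch on simulated data; the variable names and values are illustrative only.

```r
## Sketch: least squares as an explicit minimization problem.
set.seed(1)
x <- rnorm(50, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(50, sd = 2)

# Sum of squared errors S(beta0, beta1)
sse <- function(b) sum((y - b[1] - b[2] * x)^2)

# Numerical minimization ...
optim(c(mean(y), 0), sse, method = "BFGS")$par

# ... agrees (up to numerical tolerance) with the closed-form fit
coef(lm(y ~ x))
```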

Regression
Method of Least Squares

Least Squares Estimates


Suppose that the bivariate data (x1, y1), (x2, y2), ..., (xn, yn) follow a linear relationship described by the simple linear regression model. The least squares estimates are

β̂o = ȳ − β̂1 x̄

β̂1 = Σ(yi − ȳ)(xi − x̄) / Σ(xi − x̄)²,

where ȳ = Σ yi / n and x̄ = Σ xi / n are the sample means of the response variable and the predictor variable respectively. (β̂o, β̂1) are also called the OLS estimates. The units of βo are the same as the units of Y, while the units of β1 are units of Y per unit of X. The least squares regression line is then

Ŷ = β̂o + β̂1 X
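The formulas above translate directly into R. This is a minimal sketch on simulated data, used only to check that the hand-computed estimates match `lm()`.

```r
## Sketch: computing the least squares estimates from the closed-form formulas.
set.seed(2)
x <- rnorm(40, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(40, sd = 2)

beta1_hat <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)
coef(lm(y ~ x))   # the same values from R's built-in fit
```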
Regression
Method of Least Squares

Fitted Values ŷi

- For each observation in the dataset we can compute the fitted value ŷi = β̂o + β̂1 xi.
- ŷi is simply the estimated mean of Y when X = xi.
Regression
Method of Least Squares

Residuals ei
- The vertical distance from the observed yi to the fitted value ŷi is called the residual:
  ei = yi − ŷi = yi − β̂o − β̂1 xi,  i = 1, ..., n
- The residuals can be thought of as estimates (predicted values) of the unknown errors ε1, ..., εn (see the sketch below).
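Fitted values and residuals are returned directly by a fitted `lm` object. A minimal sketch on simulated data:

```r
## Sketch: fitted values and residuals from a simple linear fit.
set.seed(3)
x <- rnorm(30, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)

y_hat <- fitted(fit)   # y_hat_i = beta0_hat + beta1_hat * x_i
e     <- resid(fit)    # e_i = y_i - y_hat_i

all.equal(as.numeric(e), as.numeric(y - y_hat))   # TRUE: residual = observed minus fitted
```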

Regression
Method of Least Squares

Properties of Least Squares Estimates

1. The least squares line always passes through the point (x̄, ȳ).
2. The sum of the residuals ei is 0.
3. The sum of the squares of the ei is called the Residual Sum of Squares or Sum of Squared Errors (SSE).
4. An unbiased estimate of the variance σ² is given by SSE/(n − 2); properties 2-4 are checked in the sketch below.
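These properties are easy to verify numerically. A minimal sketch on simulated data:

```r
## Sketch: checking the listed properties on a fitted model.
set.seed(4)
x <- rnorm(30, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)

sum(resid(fit))                   # essentially 0 (property 2)
sse    <- sum(resid(fit)^2)       # residual sum of squares, SSE (property 3)
sigma2 <- sse / (length(y) - 2)   # unbiased estimate of sigma^2 (property 4)
sqrt(sigma2)
summary(fit)$sigma                # matches R's "residual standard error"
```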

Regression
Assessment of the Model

Coefficient of Determination R²
The strength of the relationship between x and y is measured by the coefficient of determination R²:

R² = 1 − SSE/SSy = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²

- SSy (the total sum of squares) is a measure of the variability in y1, ..., yn without taking the covariate into account,
- SSE (the error sum of squares) is the amount of variability left after fitting a linear regression on the covariate.
We interpret R² as the fraction of the variance of y that is 'explained' by the regression. In the Galton data set, R² = 0.2105, so we can say that 21% of the variation in child height is explained by parent height.
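R² can be computed directly from its definition and compared with the value reported by `summary()`. A minimal sketch on simulated data:

```r
## Sketch: computing R^2 from its definition.
set.seed(5)
x <- rnorm(50, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

sse <- sum((y - fitted(fit))^2)   # error sum of squares
ssy <- sum((y - mean(y))^2)       # total sum of squares
1 - sse / ssy
summary(fit)$r.squared            # the same value
```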
Regression
Assessment of the Model

Diagnostics

- A regression model is based on a set of assumptions.
- It is important to check those assumptions before drawing any conclusions about the relationship of the response variable Y to the predictor X.
- Checking the assumptions after fitting the preliminary model is done through diagnostics.
- Diagnostics may be graphical or numerical.
- A few graphical diagnostics are discussed here.

Regression
Assessment of the Model

Bivariate Plots

- In simple regression, one of the most useful plots is a scatterplot of the covariate against the response.
- This can be useful for detecting non-linearity in the model, which then needs to be corrected.
- It can also show outlying points in the variable space.

Regression
Assessment of the Model

Residuals vs Fitted Values Plot

[Residuals vs fitted values plot: residuals on the vertical axis against fitted values on the horizontal axis, scattered around the zero line]

- The plot should look like a random scatter about the line y = 0 with constant variance.
- A pattern in the plot may indicate violation of one or more assumptions (a sketch for producing this plot follows).
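The plot above can be produced from any fitted `lm` object. A minimal sketch on simulated data:

```r
## Sketch: residuals-vs-fitted diagnostic plot.
set.seed(6)
x <- rnorm(100, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)   # points should scatter randomly around this line

# Equivalently, R's built-in diagnostic plot:
plot(fit, which = 1)
```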
Regression
Assessment of the Model

Normal Q-Q Plot of Residuals


Plot of the ordered studentized residuals against the N(0,1)
quantiles.

[Normal Q-Q plot of the standardized residuals against the theoretical N(0,1) quantiles for lm(child ~ parent); a few extreme points are labelled]

- The points should lie close to the line y = x if normality holds.
- Curvature in the tails indicates a violation of the normality assumption (see the sketch below).
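A Q-Q plot of the standardized residuals can be drawn as follows. A minimal sketch on simulated data:

```r
## Sketch: normal Q-Q plot of standardized residuals.
set.seed(7)
x <- rnorm(100, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

qqnorm(rstandard(fit))   # ordered standardized residuals vs N(0,1) quantiles
qqline(rstandard(fit))   # reference line; points near it support normality

# Equivalently, R's built-in diagnostic plot:
plot(fit, which = 2)
```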
Regression
Regression with Categorical Predictor

Background

- We often wish to use categorical (or qualitative) variables as covariates in a regression model.
- Binary variables (taking on only two values, e.g. gender) are relatively easy to include in the model.
- Usually one level is coded as 0 and the other as 1, and the variable can then be entered into the model as usual.
- However, the interpretation of the estimate is slightly different.

Regression
Regression with Categorical Predictor

A Single Binary Predictor

Consider the linear regression model

Y = βo + β1X + ε

where

X = 1 if Male, 0 if Female

is the dummy variable. Then for males

E(Y | X = 1) = βo + β1,

while for females

E(Y | X = 0) = βo.

β̂1 is interpreted as the increase or decrease in the mean response for males compared to females. A small R sketch of this coding follows.
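The dummy-variable interpretation can be checked with simulated data. This is a minimal sketch; the 0/1 coding and the group means below are illustrative only.

```r
## Sketch: regression on a binary (dummy-coded) predictor.
set.seed(8)
gender <- rep(c(0, 1), each = 25)               # 0 = Female, 1 = Male
weight <- 50 + 10 * gender + rnorm(50, sd = 4)  # illustrative weights in kg

fit <- lm(weight ~ gender)
coef(fit)                      # intercept ~ mean weight of females (X = 0);
                               # slope ~ male mean minus female mean
tapply(weight, gender, mean)   # group means match beta0_hat and beta0_hat + beta1_hat
```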
Regression
Regression with Categorical Predictor

An R Example
Recall the fitted model

Weight = β̂o + β̂1 Gender

β̂o: mean weight for females

β̂o + β̂1: mean weight for males

β̂1: increase/decrease in mean weight for males compared to females

Weight = 49.66 + 10.7 Gender


Regression
Regression with Categorical Predictor

[Boxplots of Weight in Kg (roughly 45 to 70) for the two groups F and M]