
Statistics

Simple Linear Regression

Shaheena Bashir

FALL, 2019
Outline

Introduction

Regression
Assumptions about The Model
Method of Least Squares
Assessment of the Model
Graphical Assessment
Regression with Categorical Predictor

Introduction

Motivating Example

- https://www.nature.com/articles/ejhg20095
- https://www.wired.com/2009/03/predicting-height-the-victorian-approach-beats-modern-genomics/

Introduction

Galton’s Dataset

[Scatterplot of Galton's data: child height (inches) on the vertical axis against mid-parent height (inches) on the horizontal axis]

- Why do children of very tall parents tend to be tall, but a little shorter than their parents?
- Why do children of very short parents tend to be short, but not as short as their parents?
Introduction

Regression to the Mean

These phenomena are all examples of so-called regression to the mean, a term introduced by Francis Galton in the paper 'Regression towards mediocrity in hereditary stature', The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15 (1886).
Introduction

Deterministic Models

A deterministic model is one in which the values of the dependent variables of the system are completely determined by the parameters of the model, for example:

y = α + βx
Area = πr²
Circumference = 2πr
Fahrenheit = 32 + (9/5) × Celsius
Regression

- The aim of regression is to model the dependence of one variable Y on a set of variables X1, ..., Xp.
- Y is called the dependent variable or the response variable.
- X1, ..., Xp are called the independent variables, predictors, or covariates.
- We assume here that the relationship between X and Y is linear (or has been linearized through transformation).
- In the linear regression model, Y is a continuous or quantitative variable, but the covariates may be continuous or discrete.
Regression

The General Regression Model

- A general model for predicting Y given the covariates X1, ..., Xp is Y = f(X1, ..., Xp) + ε.
- The term ε accounts for the random variation of Y about f(X1, ..., Xp).
- In most modelling situations the form of f is chosen by the analyst and typically depends on a set of unknown parameters.
- The aim of regression is then to make inferences about the unknown parameters in this model.
Regression

Example: Galton’s Data Cont’d

- 928 adult children born to 205 pairs of parents
- How does the children's height (Y) depend on the parents' height (X)?
- We wish to fit the model Y = βo + β1X + ε
- Here βo is the y-intercept, the value of y when x = 0
- β1 is the slope of the line, the change in y for a unit change in x
- ε is the random error component that makes the model probabilistic
Regression

Example

[Scatterplot of child height (inches) against mid-parent height (inches), with the reference line and the fitted regression line described on the next slide]
Regression

Example: Cont'd
- The red line is the line Child Height = Parent Height.
- The blue line is the fitted regression line Child Height = 23.94 + 0.65 × Parent Height.
- The fitted line shows that short parents have shorter-than-average children, but the children tend to be taller than their parents.
- Conversely, tall parents have taller-than-average children, but the children are shorter than their parents. This phenomenon is known as Regression to the Mean.
- The blue line follows the mean heights very well.
- The fitted line tells us that, on average, for every 1-inch increase in parents' height, child height increases by 0.65 inches. The sketch below reproduces this fit in R.
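The coefficients quoted above can be reproduced with a few lines of R. This is a minimal sketch assuming the parent/child heights are available as the `Galton` data frame in the `HistData` package; the slides only show `lm(child ~ parent)`, so the package name is an assumption.

```r
## Sketch: reproducing the fitted line on Galton's data.
# install.packages("HistData")   # if not already installed
library(HistData)                 # assumed source of the Galton data frame

fit <- lm(child ~ parent, data = Galton)
coef(fit)   # intercept about 23.94 and slope about 0.65, as quoted above

plot(child ~ parent, data = Galton,
     xlab = "Mid-parent height (inches)", ylab = "Child height (inches)")
abline(a = 0, b = 1, col = "red")   # line: child height = parent height
abline(fit, col = "blue")           # fitted regression line
```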
Regression
Assumptions about The Model

- A linear relationship βo + β1X + ε exists between X and Y.
- The errors ε are independent in the probabilistic sense and follow a Normal distribution N(0, σ²).
- In terms of Y this means that the conditional distribution of Y given X = x is normal:
  Y | X = x ∼ N(βo + β1x, σ²)
  One assumption of the fitted model is that the standard deviation of the error terms is constant and does not depend on the x-value. Consequently, each probability distribution for y (the response variable) has the same standard deviation regardless of the x-value (the predictor). In short, this assumption is homoscedasticity.
- Note that the marginal (unconditional) distribution of Y need not be normal; this is not required for our model. All that is required is that the conditional distribution is normal for every x under consideration (a simulation sketch follows).
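A short simulation can make these assumptions concrete. This is a minimal sketch with arbitrary illustrative values of βo, β1 and σ, not estimates from any real data.

```r
## Sketch: simulating data that satisfy the model assumptions.
set.seed(123)
n     <- 200
beta0 <- 23.9     # illustrative intercept
beta1 <- 0.65     # illustrative slope
sigma <- 2.2      # constant error standard deviation (homoscedasticity)

x   <- runif(n, 64, 73)                # predictor values
eps <- rnorm(n, mean = 0, sd = sigma)  # independent N(0, sigma^2) errors
y   <- beta0 + beta1 * x + eps         # Y | X = x is N(beta0 + beta1*x, sigma^2)

plot(x, y)                             # spread of y is the same at every x
abline(beta0, beta1, col = "blue")     # true regression line
```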
Regression
Assumptions about The Model

ε ∼ N(0, σ²)
That is, for any value of the independent variable there is a single most likely value for the dependent variable.
Regression
Assumptions about The Model

Predicted Values

- For a bivariate data set (y1, x1), (y2, x2), ..., (yn, xn), we are interested in predicting values of Y for any given value of X using the regression model.
- If the values of βo and β1 were known, then the predicted value of Y would be βo + β1X, also called the fitted value.
Regression
Assumptions about The Model

Errors
We can estimate the random errors εi in the fitted values by the vertical distances εi = yi − βo − β1xi.

[Illustration: observed values yi plotted against x, with fitted values ŷi on the regression line; the residual e1 = y1 − ŷ1 is shown as the vertical distance between y1 and ŷ1]
Regression
Method of Least Squares

Estimating the Unknown Parameters

- We wish to find a line which makes the smallest total vertical error.
- But some of the errors are positive while others are negative, so the sum of squared errors is used as an overall measure of the fit of the regression line.
- The Method of Least Squares is an estimation method which estimates βo and β1 as the values that minimize S(βo, β1) = Σ(yi − ŷi)², as illustrated in the sketch below.
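The same criterion can be minimized numerically, which shows that least squares really is an optimization problem. This is a minimal sketch on simulated data; the variable names and values are illustrative only.

```r
## Sketch: least squares as an explicit minimization problem.
set.seed(1)
x <- rnorm(50, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(50, sd = 2)

# Sum of squared errors S(beta0, beta1)
sse <- function(b) sum((y - b[1] - b[2] * x)^2)

# Numerical minimization ...
optim(c(mean(y), 0), sse, method = "BFGS")$par

# ... agrees (up to numerical tolerance) with the closed-form fit
coef(lm(y ~ x))
```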

Regression
Method of Least Squares

Least Squares Estimates


Suppose that the bivariate data (x1, y1), (x2, y2), ..., (xn, yn) follow a linear relationship described by the simple linear regression model. The least squares estimates are

β̂o = ȳ − β̂1 x̄

β̂1 = Σ(yi − ȳ)(xi − x̄) / Σ(xi − x̄)²,

where ȳ = Σ yi / n and x̄ = Σ xi / n are the sample means of the response variable and the predictor variable respectively. (β̂o, β̂1) are also called the OLS estimates. The units of βo are the same as the units of Y, while the units of β1 are units of Y per unit of X. The least squares regression line is then

Ŷ = β̂o + β̂1 X
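The formulas above translate directly into R. This is a minimal sketch on simulated data, used only to check that the hand-computed estimates match `lm()`.

```r
## Sketch: computing the least squares estimates from the closed-form formulas.
set.seed(2)
x <- rnorm(40, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(40, sd = 2)

beta1_hat <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

c(beta0_hat, beta1_hat)
coef(lm(y ~ x))   # the same values from R's built-in fit
```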
Regression
Method of Least Squares

Fitted Values ŷi

- For each observation in the dataset we can compute the fitted value ŷi = β̂o + β̂1 xi.
- ŷi is simply the estimated mean of Y when X = xi.
Regression
Method of Least Squares

Residuals ei
- The vertical distance from the observed yi to the fitted value ŷi is called the residual:
  ei = yi − ŷi = yi − β̂o − β̂1 xi,  i = 1, ..., n
- The residuals can be thought of as estimates (predicted values) of the unknown errors ε1, ..., εn (see the sketch below).
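Fitted values and residuals are returned directly by a fitted `lm` object. A minimal sketch on simulated data:

```r
## Sketch: fitted values and residuals from a simple linear fit.
set.seed(3)
x <- rnorm(30, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)

y_hat <- fitted(fit)   # y_hat_i = beta0_hat + beta1_hat * x_i
e     <- resid(fit)    # e_i = y_i - y_hat_i

all.equal(as.numeric(e), as.numeric(y - y_hat))   # TRUE: residual = observed minus fitted
```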

Regression
Method of Least Squares

Properties of Least Squares Estimates

1. The least squares line always passes through the point (x̄, ȳ).
2. The sum of the residuals ei is 0.
3. The sum of the squares of the ei is called the Residual Sum of Squares or Sum of Squared Errors (SSE).
4. An unbiased estimate of the variance σ² is given by SSE/(n − 2); properties 2-4 are checked in the sketch below.
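These properties are easy to verify numerically. A minimal sketch on simulated data:

```r
## Sketch: checking the listed properties on a fitted model.
set.seed(4)
x <- rnorm(30, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(30, sd = 2)
fit <- lm(y ~ x)

sum(resid(fit))                   # essentially 0 (property 2)
sse    <- sum(resid(fit)^2)       # residual sum of squares, SSE (property 3)
sigma2 <- sse / (length(y) - 2)   # unbiased estimate of sigma^2 (property 4)
sqrt(sigma2)
summary(fit)$sigma                # matches R's "residual standard error"
```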

Regression
Assessment of the Model

Coefficient of Determination R²
The strength of the relationship between x and y is measured by the coefficient of determination R²:

R² = 1 − SSE/SSy = 1 − Σ(yi − ŷi)² / Σ(yi − ȳ)²

- SSy (the total sum of squares) is a measure of the variability in y1, ..., yn without taking the covariate into account,
- SSE (the error sum of squares) is the amount of variability left after fitting a linear regression on the covariate.
We interpret R² as the fraction of the variance of y that is 'explained' by the regression. In the Galton data set, R² = 0.2105, so we can say that 21% of the variation in child height is explained by parent height.
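R² can be computed directly from its definition and compared with the value reported by `summary()`. A minimal sketch on simulated data:

```r
## Sketch: computing R^2 from its definition.
set.seed(5)
x <- rnorm(50, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

sse <- sum((y - fitted(fit))^2)   # error sum of squares
ssy <- sum((y - mean(y))^2)       # total sum of squares
1 - sse / ssy
summary(fit)$r.squared            # the same value
```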
Regression
Assessment of the Model

Diagnostics

- A regression model is based on a set of assumptions.
- It is important to check those assumptions before drawing any conclusions about the relationship of the response variable Y to the predictor X.
- Checking the assumptions after fitting the preliminary model is done through diagnostics.
- Diagnostics may be graphical or numerical.
- A few graphical diagnostics are discussed here.

Regression
Assessment of the Model

Bivariate Plots

- In simple regression, one of the most useful plots is a scatterplot of the covariate against the response.
- This can be useful for detecting non-linearity in the model, which then needs to be corrected.
- It can also show outlying points in the variable space.

Regression
Assessment of the Model

Residuals vs Fitted Values Plot

[Residuals vs fitted values plot: residuals on the vertical axis against fitted values on the horizontal axis, scattered around the zero line]

- The plot should look like a random scatter about the line y = 0 with constant variance.
- A pattern in the plot may indicate violation of one or more assumptions (a sketch for producing this plot follows).
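The plot above can be produced from any fitted `lm` object. A minimal sketch on simulated data:

```r
## Sketch: residuals-vs-fitted diagnostic plot.
set.seed(6)
x <- rnorm(100, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

plot(fitted(fit), resid(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0, lty = 2)   # points should scatter randomly around this line

# Equivalently, R's built-in diagnostic plot:
plot(fit, which = 1)
```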
Regression
Assessment of the Model

Normal Q-Q Plot of Residuals


Plot of the ordered studentized residuals against the N(0,1)
quantiles.

[Normal Q-Q plot of the standardized residuals against the theoretical N(0,1) quantiles for lm(child ~ parent); a few extreme points are labelled]

- The points should lie close to the line y = x if normality holds.
- Curvature in the tails indicates a violation of the normality assumption (see the sketch below).
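A Q-Q plot of the standardized residuals can be drawn as follows. A minimal sketch on simulated data:

```r
## Sketch: normal Q-Q plot of standardized residuals.
set.seed(7)
x <- rnorm(100, mean = 68, sd = 2)
y <- 24 + 0.65 * x + rnorm(100, sd = 2)
fit <- lm(y ~ x)

qqnorm(rstandard(fit))   # ordered standardized residuals vs N(0,1) quantiles
qqline(rstandard(fit))   # reference line; points near it support normality

# Equivalently, R's built-in diagnostic plot:
plot(fit, which = 2)
```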
Regression
Regression with Categorical Predictor

Background

- We often wish to use categorical (or qualitative) variables as covariates in a regression model.
- Binary variables (taking on only two values, e.g. gender) are relatively easy to include in the model.
- Usually one level is coded as 0 and the other as 1, and the variable can then be entered into the model as usual.
- However, the interpretation of the estimate is slightly different.

Regression
Regression with Categorical Predictor

A Single Binary Predictor

Consider the linear regression model

Y = βo + β1X + ε

where

X = 1 if Male, 0 if Female

is the dummy variable. Then for males

E(Y | X = 1) = βo + β1,

while for females

E(Y | X = 0) = βo.

β̂1 is interpreted as the increase or decrease in the mean response for males compared to females. A small R sketch of this coding follows.
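The dummy-variable interpretation can be checked with simulated data. This is a minimal sketch; the 0/1 coding and the group means below are illustrative only.

```r
## Sketch: regression on a binary (dummy-coded) predictor.
set.seed(8)
gender <- rep(c(0, 1), each = 25)               # 0 = Female, 1 = Male
weight <- 50 + 10 * gender + rnorm(50, sd = 4)  # illustrative weights in kg

fit <- lm(weight ~ gender)
coef(fit)                      # intercept ~ mean weight of females (X = 0);
                               # slope ~ male mean minus female mean
tapply(weight, gender, mean)   # group means match beta0_hat and beta0_hat + beta1_hat
```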
Regression
Regression with Categorical Predictor

An R Example
Recall the fitted model

Weight = β̂o + β̂1 Gender

β̂o: mean weight for females

β̂o + β̂1: mean weight for males

β̂1: increase/decrease in mean weight for males compared to females

Weight = 49.66 + 10.7 Gender


Regression
Regression with Categorical Predictor

[Boxplots of Weight in Kg (roughly 45 to 70) for the two groups F and M]