

Stat 608 Chapter 2 (and 5)

Simple Linear Regression

Introduction: Simple Linear Regression Models
Simple Linear Regression (SLR)
We always draw a scatterplot first to obtain an
idea of the strength (measured by correlation),
form (e.g., linear, quadratic, exponential), and
direction (positive or negative, in the linear
case) of any relationship that exists between two variables.

On the next slide we see a strong (correlation

close to 1) positive (as x increases, y increases)
linear (straight line) relationship between the
percentage of voters of each state who voted for
President Obama in 2008 and in 2012.
Simple Linear Regression (SLR)
[Scatterplot: % Obama 2008 (horizontal axis, 40-70) vs. % Obama 2012 (vertical axis, 40-70)]
Simple Linear Regression (SLR)
When data are collected in pairs, the
standard notation used to designate this is
(x1, y1), (x2, y2), . . . , (xn, yn),
where x1 denotes the first value of the X-
variable and y1 denotes the first value of
the Y-variable. The X-variable is called the
explanatory variable, while the Y-
variable is called the response variable.
Simple Linear Regression (SLR)
For example, the first few entries in
the election data set above were
(37.7, 40.8), (38.8, 38.4), (38.8, …),
meaning 37.7% of Alabama voters
voted for President Obama in 2008,
while 40.8% did so in 2012, and so on.
Simple Linear Regression (SLR)
We could use a matched pairs t-test to test
whether the same percentage of voters
voted for President Obama in 2008 and
2012 in the 50 states, on average. If the
percentages were indeed equal, what would
that mean for our regression model?

Aside: I wouldn't be able to conclude, just from the
average percentages being equal, that the percentage of
the country voting for Obama had stayed the same. (Why not?)
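As a numerical sketch of the matched-pairs idea (in Python rather than the course's R, and with made-up percentages for five hypothetical states, not the actual election data):

```python
import math
import statistics

# Matched-pairs t statistic on made-up percentages for five hypothetical
# states (NOT the actual election data): t = dbar / (s_d / sqrt(n)).
obama_08 = [37.7, 38.8, 44.5, 54.0, 61.0]
obama_12 = [40.8, 38.4, 44.6, 52.0, 60.2]

diffs = [b - a for a, b in zip(obama_08, obama_12)]  # within-state changes
n = len(diffs)
d_bar = statistics.mean(diffs)                       # mean difference
s_d = statistics.stdev(diffs)                        # SD of the differences
t_stat = d_bar / (s_d / math.sqrt(n))                # t with n - 1 = 4 df

print(round(d_bar, 3), round(t_stat, 3))
```

Here the within-state differences average to (essentially) zero, which is exactly the "percentages equal on average" situation the slide asks about: the t statistic gives no evidence of a change, even though individual states moved.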
S.L.R. Models
The regression of Y on X is linear if

E(Y | X = x) = β0 + β1x,

where the unknown parameters β0 and β1
determine the intercept and the slope,
respectively, of a specific straight line.
Note: the parabolic model

E(Y | X = x) = β0 + β1x + β2x²

is also a linear model, even though it graphs a parabola,
because the mean of Y given X is related to the
parameters via a linear function.
S.L.R. Models
There will almost certainly be some variation in
Y due strictly to random phenomena that
cannot be measured, predicted, or explained.
This variation is called random error, and is
denoted ei.

In other words, the difference between what

actually occurs and our model is the error. Our
model doesn't explain everything; the amount
of variability in our response variable that
remains unexplained is measured by the error.
S.L.R. Models
Suppose that Y1, Y2, …, Yn are independent
realizations of the random variable Y that are
observed at the values x1, x2, …, xn of a
random variable X. If the regression of Y on X
is linear, then for i = 1, 2, …, n,

Yi = β0 + β1xi + ei,

where ei is the random error in Yi and is such
that E(ei | X) = 0.
Notice that the random error term does not
depend on x , nor does it contain any
information about Y (otherwise it would be a
systematic error).

For now, we assume the errors have constant
variance: Var(ei | X) = σ².
Introduction to Least Squares
Don't get these ideas confused:
Deriving and Interpreting Parameter Estimates
Least Squares: Derive
parameter estimates
Goal: Minimize

RSS(β0, β1) = Σi (yi − β0 − β1xi)²

with respect to β0 and β1.

We have two equations with two unknowns. Set
both partial derivatives equal to 0 and solve.
Least Squares: Derive
parameter estimates
First: take the derivative
with respect to the
intercept, and solve
for the y-intercept:

∂RSS/∂β0 = −2 Σi (yi − β0 − β1xi) = 0  ⇒  β̂0 = ȳ − β̂1x̄

The least squares regression
line goes through the point (x̄, ȳ).
Least Squares: Derive
parameter estimates
Second: take the derivative
with respect to the
slope, and solve for
the slope:

∂RSS/∂β1 = −2 Σi xi (yi − β0 − β1xi) = 0  ⇒  β̂1 = Σi (xi − x̄)(yi − ȳ) / Σi (xi − x̄)²
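The two closed-form solutions can be checked numerically; a quick sketch in Python (made-up data, not the election data):

```python
# Closed-form least squares estimates from the two derivative equations,
# checked on a small made-up data set (not the election data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.1, 8.0, 9.9]   # roughly y = 2x

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)

beta1_hat = sxy / sxx                  # slope: Sxy / Sxx
beta0_hat = ybar - beta1_hat * xbar    # intercept: ybar - slope * xbar

# The first normal equation says the fitted line passes through (xbar, ybar).
assert abs((beta0_hat + beta1_hat * xbar) - ybar) < 1e-9
print(round(beta0_hat, 3), round(beta1_hat, 3))
```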
Example: R Code


To fit the model with the lm function and get more information about it, type (for example):

  model <- lm(Obama_12 ~ Obama_08, data = election)
  summary(model)

To plot your x and y variables on a scatterplot, use:

  plot(x, y)

For more information, syntax, and sometimes examples on a function
(e.g. the lm function), type:

  ?lm
Example: SAS Code


Example: Election Data
Explanatory variable: % voting for Obama in 08
Response variable: % voting for Obama in 12

(Intercept)  election$Obama_08
     -4.461              1.042

Interpret the slope in context.

Least Squares: Matrix Setup
Derivatives of Matrices
Derivatives of matrices are defined as partials
with respect to the variable vector. For example,
for a constant vector a: ∂(aᵀx)/∂x = a.
Derivatives of Matrices
The previous example is a scalar-by-vector
derivative, as listed on the Wikipedia page on
matrix calculus.

It can also be found in the First Order
derivatives section of the Matrix Cookbook,
equation (61), p. 9 (see eCampus).
Derivatives of Quadratic Forms

The gradient and Hessian of the quadratic form xᵀAx are:

∂(xᵀAx)/∂x = (A + Aᵀ)x   and   ∂²(xᵀAx)/∂x∂xᵀ = A + Aᵀ

(which reduce to 2Ax and 2A when A is symmetric).
Least Squares: Derive parameter estimates in matrix form

Minimize RSS(β) = (y − Xβ)ᵀ(y − Xβ). Setting the gradient
−2Xᵀ(y − Xβ) equal to 0 gives the normal equations
XᵀXβ̂ = Xᵀy, so β̂ = (XᵀX)⁻¹Xᵀy.
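A small check that the matrix solution β̂ = (XᵀX)⁻¹Xᵀy agrees with the scalar formulas from the calculus derivation, sketched in Python with made-up data:

```python
# Matrix form of least squares for the simple model: beta_hat = (X'X)^{-1} X'y,
# checked against the scalar formulas, on made-up data (not the course data).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.2, 1.9, 3.2, 3.8]
n = len(x)

# X'X and X'y for a design matrix with a column of ones and a column of x's.
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))

# Solve the 2x2 normal equations (X'X) beta = X'y via the explicit inverse.
det = n * sxx - sx * sx
b0 = (sxx * sy - sx * sxy) / det
b1 = (n * sxy - sx * sy) / det

# Scalar formulas from the calculus derivation.
xbar, ybar = sx / n, sy / n
b1_scalar = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
             / sum((xi - xbar) ** 2 for xi in x))
b0_scalar = ybar - b1_scalar * xbar

assert abs(b0 - b0_scalar) < 1e-9 and abs(b1 - b1_scalar) < 1e-9
print(round(b0, 3), round(b1, 3))
```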
Linear Models: Inference
George Box: "All models are wrong, but some are useful."

Objective: Develop hypothesis tests and
confidence intervals for the slope and
intercept of a least squares regression model
1. Assumptions: Next chapter
2. Bias of Estimates
3. Variability of Estimates
In English: Is the percentage of each state that
voted for Obama in 2008 linearly related to
the percentage in 2012?
Review: Expectation and
Variance of Random Variables
Recall the definition of variance:

Var(X) = E[(X − E(X))²]

Covariance is similar. It is the numerator of the correlation coefficient:

Cov(X, Y) = E[(X − E(X))(Y − E(Y))],   Corr(X, Y) = Cov(X, Y) / (SD(X) SD(Y))
Expectation and Variance of
Random Variables
Assume X and Y are random variables, and a and
b are constants. Then:

E(aX + b) = aE(X) + b
Var(aX + b) = a²Var(X)
Cov(aX + b, Y) = aCov(X, Y)
Correlation is unit-free (with no measurement error,
changing from meters to feet yields the same
correlation).
Correlation does not change if the explanatory and

response variables are exchanged.

Correlation is always between -1 and 1.

Correlation closer to -1 or +1 indicates a stronger

relationship between the explanatory and response
variables; correlation closer to 0 indicates a weaker
relationship.
When both x and y are above average, or both x and y are

below average, a positive value is contributed to the sum.
When exactly one is above average, a negative value is
contributed to the sum.

When most values are in the bottom left and top right of the
plot, the correlation is positive.

Sample covariance is the numerator of sample correlation.
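Two of the listed correlation facts — unit-invariance and symmetry in x and y — can be checked directly; a Python sketch with made-up data:

```python
import statistics

# Checking two correlation facts on made-up data: (1) a change of units on x
# (meters -> feet) leaves r unchanged; (2) swapping x and y leaves r unchanged.
x = [1.0, 2.0, 4.0, 5.0, 8.0]
y = [2.0, 1.0, 5.0, 4.0, 9.0]

def corr(a, b):
    # Sample correlation: the covariance numerator over the product of SDs.
    abar, bbar = statistics.mean(a), statistics.mean(b)
    num = sum((ai - abar) * (bi - bbar) for ai, bi in zip(a, b))
    den = (sum((ai - abar) ** 2 for ai in a)
           * sum((bi - bbar) ** 2 for bi in b)) ** 0.5
    return num / den

r = corr(x, y)
r_feet = corr([3.28084 * xi for xi in x], y)  # unit change on x
r_swap = corr(y, x)                           # exchange the roles of x and y

assert abs(r - r_feet) < 1e-9 and abs(r - r_swap) < 1e-9 and -1.0 <= r <= 1.0
print(round(r, 3))
```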


The correlation between the percents of states voting for
Obama in 2008 and 2012 is r = 0.98. This indicates a strong
positive relationship between 2008 and 2012.

[Scatterplot: % Obama 2008 vs. % Obama 2012]
Expectation and Variance of Random Vectors
Assume x is a vector of random variables. Then E(x) is
the vector of elementwise expectations, and Var(x) is the
variance/covariance matrix, with Var(xi) on the diagonal
and Cov(xi, xj) off the diagonal.
Variance / Covariance Matrix
Correlation Matrix

To calculate the correlation matrix, divide each
covariance by the appropriate standard deviations.
Expectation and Variance of Random Vectors and Matrices
Assume X and Y are matrices of random
variables; x and y are vectors of random
variables; and A, B, and C are constant matrices. Then:

E(AXB + C) = A E(X) B + C
Var(Ax) = A Var(x) Aᵀ
Cov(Ax, By) = A Cov(x, y) Bᵀ
Quadratic Form: a scalar of the form xᵀAx, where A is a constant matrix.
Expectation of Quadratic Form
It can be shown that

E(xᵀAx) = tr(AΣ) + μᵀAμ,

where μ and Σ are the expected value and
variance-covariance matrix of x, respectively,
and tr denotes the trace of a matrix.

This result only depends on the existence of μ
and Σ; in particular, normality of x is not required.
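A Monte Carlo sketch (Python, made-up 2-dimensional example) of E(xᵀAx) = tr(AΣ) + μᵀAμ, using uniform rather than normal coordinates to illustrate that normality is not needed:

```python
import random

# Monte Carlo check of E(x'Ax) = tr(A Sigma) + mu'A mu on a made-up 2-d
# example. The coordinates are independent UNIFORM (not normal) variables,
# since the identity needs only the mean vector and covariance matrix.
random.seed(608)
mu = [1.0, 2.0]
A = [[2.0, 1.0], [0.0, 3.0]]
# Var(Uniform(m - h, m + h)) = h^2 / 3, so Sigma = diag(1/3, 4/3) below.
var = [1.0 / 3.0, 4.0 / 3.0]

# Theory: tr(A Sigma) + mu'A mu, with a diagonal Sigma.
Amu = [A[0][0] * mu[0] + A[0][1] * mu[1], A[1][0] * mu[0] + A[1][1] * mu[1]]
theory = A[0][0] * var[0] + A[1][1] * var[1] + mu[0] * Amu[0] + mu[1] * Amu[1]

reps, total = 100_000, 0.0
for _ in range(reps):
    x1 = random.uniform(mu[0] - 1.0, mu[0] + 1.0)   # Var = 1/3
    x2 = random.uniform(mu[1] - 2.0, mu[1] + 2.0)   # Var = 4/3
    total += x1 * (A[0][0] * x1 + A[0][1] * x2) + x2 * (A[1][0] * x1 + A[1][1] * x2)
mc = total / reps

assert abs(mc - theory) < 0.2   # well within Monte Carlo noise at 100k reps
print(round(theory, 3))
```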
Usual Assumptions

In matrix notation: y = Xβ + e, with E(e | X) = 0 and Var(e | X) = σ²I.

The bias of a statistic is the difference between its
expected value and the parameter it should estimate.
Inferences About the Slope and Intercept

Unbiased Estimates

Inferences About the Slope and Intercept

Variance of the estimates: Var(β̂ | X) = σ²(XᵀX)⁻¹, estimated by S²(XᵀX)⁻¹ with S² = MSE.
Inferences About the Slope and Intercept
What does it mean to take the expectation of the
sample slope?
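Written out in matrix form, the unbiasedness calculation behind this slide is the standard one:

E(β̂ | X) = E[(XᵀX)⁻¹Xᵀy | X] = (XᵀX)⁻¹Xᵀ E(y | X) = (XᵀX)⁻¹XᵀXβ = β.

So the sample slope and intercept are unbiased whenever E(e | X) = 0.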
Inferences About the Slope

Distribution of slope:

Since the errors are normally distributed, yi = β0 +
β1xi + ei is also normally distributed, given x. Since
the sample slope is a linear combination of the yi's, it
is also normally distributed.
Inferences About the Slope

Of course, σ is unknown, so we estimate it using the
sample standard deviation. Then our test statistic
has the t-distribution (with n − k degrees of freedom
in general, k being the number of columns of X).
Inferences About the Slope

The standard error is the standard deviation of a statistic
(often estimated using the sample standard deviation).

The margin of error is half the width of a confidence interval.

Don't mix up the critical value with the test statistic calculated
from the data! Look up the critical value on a table, or use R.

σ², the variance of the errors, is estimated by MSE. So the
square root of MSE, RMSE, is s in the equation above.
Inferences About the Slope

For our election example, R output gives:

Coefficients:      Estimate Std. Error t value Pr(>|t|)
(Intercept)        -5.13608    1.72705  -2.974  0.00459 **
states50$Obama_08   1.05610    0.03363  31.401  < 2e-16 ***
Confidence interval for the slope: β̂1 ± t(α/2, n−2) · se(β̂1)

Inferences About the Slope

Hypothesis test for whether the slope = 0: t = β̂1 / se(β̂1), with n − 2 degrees of freedom under H0: β1 = 0.
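Both the interval and the test statistic can be reproduced from the printed output; a Python sketch, assuming n = 50 states (so 48 df) and reading the estimate and standard error off the R table, with the critical value taken from a t table as the slide suggests:

```python
# Reproducing the slope inference from the printed R output (n = 50 states,
# so n - 2 = 48 degrees of freedom). The estimate and standard error are
# read off the coefficient table; the critical value comes from a t table.
b1, se, df = 1.05610, 0.03363, 48
t_crit = 2.0106                      # t(0.975, 48), from a t table

lo = b1 - t_crit * se                # 95% CI lower endpoint
hi = b1 + t_crit * se                # 95% CI upper endpoint
t_stat = b1 / se                     # test statistic for H0: slope = 0

assert lo < b1 < hi
print(round(lo, 3), round(hi, 3), round(t_stat, 3))
```

The t statistic matches the printed 31.401 up to rounding, and the interval excludes 1 only barely on the low side, which is why the "did the percentages stay the same" question is interesting.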

More Inference

Objectives: Regression Line

Develop hypothesis tests and confidence

intervals for the regression line:
1. Confidence intervals for the mean
2. Prediction intervals for individuals
In English: If 48% of a state voted for
Obama in 2008, what percentage voted
for him in 2012?
Slope vs. regression line
Confidence interval for slope:
If x increases by 1 unit, by how much does y increase or decrease?
Deals with the sampling distribution of the sample slope from one sample to the next.
Can use the CLT to say the sample slope is approximately normal for large n.

Confidence interval for the regression line (mean):
For a given value of x, what is the mean value of y?
Deals with the sampling distribution of the sample mean from one sample to the next.
Can use the CLT to say the sample mean is approximately normal for large n.

Prediction interval for the regression line:
For a given value of x, what are the individual values of y?
Deals with the distribution of individuals about the regression line.
The CLT is NOT useful in this case. We need to know the distribution of
the errors.
Confidence Intervals for the Population Regression Line

We consider the problem of finding a confidence
interval for the unknown population mean at a
given value of X, which we shall denote by x*.

First, recall that the population mean at X = x*
is given by E(Y | X = x*) = β0 + β1x*.
Confidence Intervals for the Population Regression Line

An estimator of this unknown quantity is the
value of the estimated regression equation at
X = x*, namely: ŷ* = β̂0 + β̂1x*.
How Many Populations Are There?

One for each unique value of X = x*.

How many populations were sampled in
the election data?
How Many Means and Variances Do We Have?

___ means
___ variances

However, we assume that all the populations share a common variance, σ².
Variance of the mean:

Var(ŷ*) = Var(β̂0 + β̂1x*) = σ² (1/n + (x* − x̄)² / Sxx)

Estimated variance of the mean: replace σ² with its estimate S² = MSE.
Confidence Intervals for the Population Regression Line

A 100(1 − α)% confidence interval for
the population mean of Y at X = x* is given by

ŷ* ± t(α/2, n−2) · S · √(1/n + (x* − x̄)² / Sxx),

where t(α/2, n−2) is the 100(1 − α/2)th quantile of the t-
distribution with n − 2 degrees of freedom, and Sxx = Σ(xi − x̄)².
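A Python sketch of this formula on made-up data, checking that the interval is narrowest at x* = x̄ and widens as x* moves away from x̄:

```python
# CI-for-the-mean half-width t * s * sqrt(1/n + (x* - xbar)^2 / Sxx) on
# made-up data: narrowest at x* = xbar, wider as x* moves away from xbar.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.8, 4.2, 4.9, 6.3]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)

# Fit by the closed-form estimates, then get s = RMSE with n - 2 = 4 df.
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
s = (sse / (n - 2)) ** 0.5

t_crit = 2.776                   # t(0.975, 4), from a t table

def half_width(x_star):
    return t_crit * s * (1.0 / n + (x_star - xbar) ** 2 / sxx) ** 0.5

# Narrowest at the mean of x; widens with distance from xbar.
assert half_width(xbar) < half_width(xbar + 1.0) < half_width(xbar + 3.0)
print(round(half_width(xbar), 3))
```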
Confidence Intervals for the Population Regression Line
Does the confidence interval for the mean y-value get
narrower as n gets bigger?

What happens to the width of the confidence interval as

your desired x-value is farther from the mean for x?
Prediction Intervals for the Actual Value of Y
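For comparison with the confidence interval for the mean, the standard 100(1 − α)% prediction interval for a new observation at X = x* is:

ŷ* ± t(α/2, n−2) · S · √(1 + 1/n + (x* − x̄)² / Sxx)

The extra "1" under the square root reflects the variability of an individual response about its mean, which does not shrink as n grows; this is why prediction intervals are always wider than confidence intervals for the mean.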
Dummy (Indicator) Variable Regression
Dummy Variable Regression

Experiment: Does increasing calcium intake
decrease blood pressure?

For the placebo group: x = 0
For the calcium group: x = 1
Dummy Variable Regression

For the placebo group: E(Y | x = 0) = β0
For the calcium group: E(Y | x = 1) = β0 + β1
Dummy Variable Regression

21 total men; 10 took the calcium supplement, and 11 took the placebo.

Dummy Variable Regression

What if we had the intercept plus two variables, one an

indicator of taking the placebo and one an indicator of
taking the calcium supplement?

The columns of X must be linearly independent; that is,

the matrix X must be of full rank.
Dummy Variable Regression



            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2727     2.2266  -0.122    0.904
x             5.2727     3.2267   1.634    0.119

Calcium group mean = 5, Placebo group mean = -0.27.

Difference between means = 5.27, as expected.
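The "intercept = placebo mean, slope = difference in group means" fact can be verified directly; a Python sketch with made-up blood-pressure changes (not the actual calcium data):

```python
import statistics

# Regressing y on a 0/1 calcium indicator: the intercept equals the placebo
# mean and the slope equals (calcium mean - placebo mean). Made-up decreases
# in blood pressure, NOT the actual 21-man calcium data.
placebo = [0.0, -1.0, 2.0, -2.0, 1.0, 0.0]   # x = 0
calcium = [5.0, 7.0, 3.0, 6.0, 4.0]          # x = 1
x = [0.0] * len(placebo) + [1.0] * len(calcium)
y = placebo + calcium

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

assert abs(b0 - statistics.mean(placebo)) < 1e-9   # intercept = placebo mean
assert abs(b1 - (statistics.mean(calcium) - statistics.mean(placebo))) < 1e-9
print(round(b0, 3), round(b1, 3))
```

This is why the dummy-variable regression reproduces the two-sample comparison of means: the least squares fit through a 0/1 predictor passes through both group means.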
Geometric Interpretation of Least Squares
Let V be the vector space spanned by the
columns of our design matrix X.
Then ŷ is the projection of the vector y down
into the vector space V.
Is ŷ in V?
Geometric Interpretation of Least Squares

Projection (Hat) Matrix: H = X(XᵀX)⁻¹Xᵀ, so that ŷ = Hy.

Projection is like a shadow of the real thing.
Geometric Interpretation of Least Squares

Any vector in V has dot product 0 with the error (residual) vector: Xᵀê = 0.
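These projection facts can be checked numerically; a Python sketch on a tiny made-up design matrix, verifying that H is symmetric and idempotent and that the residuals are orthogonal to the columns of X:

```python
# Hat matrix H = X (X'X)^{-1} X' on a tiny made-up design (intercept + x):
# H is symmetric and idempotent, and the residual vector is orthogonal to V,
# the column space of X (so X'e = 0).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
n = len(x)
X = [[1.0, xi] for xi in x]

# Explicit 2x2 inverse of X'X = [[n, Sx], [Sx, Sxx]].
Sx = sum(x)
Sxx = sum(xi * xi for xi in x)
det = n * Sxx - Sx * Sx
inv = [[Sxx / det, -Sx / det], [-Sx / det, n / det]]

H = [[sum(X[i][k] * inv[k][l] * X[j][l] for k in range(2) for l in range(2))
      for j in range(n)] for i in range(n)]

yhat = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
e = [yi - yh for yi, yh in zip(y, yhat)]

# H = H' (symmetric) and HH = H (projecting twice changes nothing).
assert all(abs(H[i][j] - H[j][i]) < 1e-9 for i in range(n) for j in range(n))
assert all(abs(sum(H[i][k] * H[k][j] for k in range(n)) - H[i][j]) < 1e-9
           for i in range(n) for j in range(n))
# X'e = 0: the residuals are orthogonal to both columns of X.
assert abs(sum(e)) < 1e-9
assert abs(sum(xi * ei for xi, ei in zip(x, e))) < 1e-9
print("ok")
```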

Geometric Interpretation of Least Squares
Is it true that
What is
ANalysis Of VAriance (ANOVA)

The original sample variance of y is the total variance,
measuring the total amount of variability in the data.

The ANOVA table breaks that variability into two parts:

one explained by x, and the leftovers.
ANalysis Of VAriance (ANOVA)

We're considering two sources of variation using this
regression model:

Variation that can be explained by the model
Variation due to randomness or other variables not in the model

Data = Fit + Residual

yi = (β0 + β1xi) + (ei)

The regression model, whether using dummy variables or not,

is not really about variances; the point in a regression model is
to study how y changes with x.
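The decomposition behind the ANOVA table can be verified numerically; a Python sketch on made-up data, checking SST = SSreg + SSE:

```python
# Check of the ANOVA identity SST = SSreg + SSE on made-up data: the total
# variability in y splits into a part explained by x plus the leftovers.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 2.5, 4.0, 4.5, 6.5]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)                 # total
ssreg = sum((yh - ybar) ** 2 for yh in yhat)            # explained by x
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # leftovers

assert abs(sst - (ssreg + sse)) < 1e-9
print(round(sst, 3), round(ssreg, 3), round(sse, 3))
```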
All else held constant:

If the strength of the correlation increases,
_____________ decreases.

If the slope of the line gets steeper, _______________ .

Source       Degrees of Freedom   Sums of Squares   Mean Squares   F-Statistic   P-value
Regression   1