
11 Simple Linear Regression

Correlation: a statistical method used to determine whether a relationship exists between variables

o Can be represented numerically by r, the correlation coefficient
o r can lie between -1 and 1, where a negative value implies a negative linear
correlation and a positive value implies a positive linear correlation
o The closer r is to 0, the weaker the linear correlation
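A minimal sketch of computing r from its definition, r = S_xy / sqrt(S_xx * S_yy), in plain Python. The function name `pearson_r` and the five data points are hypothetical, chosen only for illustration; Python 3.10+ also ships `statistics.correlation`, which computes the same quantity.

```python
import math

def pearson_r(x, y):
    """Sample (Pearson) correlation coefficient r for two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746: a fairly strong positive linear correlation
```

Points lying exactly on a downward-sloping line give r = -1, the negative extreme of the range.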
Regression: a statistical method used to describe the nature of the relationship between variables
o Used to predict the value of the dependent variable based on the values of one or more
independent variables
o The simple linear regression model: $y = \beta_0 + \beta_1 x + \varepsilon$
  - Where $y$ is the dependent variable and $x$ is the independent variable
  - $\beta_0$ is the y-intercept (like b in y = mx + b); the average value of y when x
    is 0
  - $\beta_1$ is the slope (like m in y = mx + b); the change in the average value of
    y when x increases by 1 unit
  - $\varepsilon$ is the error term
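o $\varepsilon$ is left out when estimating values with the fitted regression line, because the error terms are assumed to average 0: $\hat{y} = b_0 + b_1 x$, where $b_0$ and $b_1$ are the least squares estimates of $\beta_0$ and $\beta_1$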
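A minimal sketch of the least squares estimates, b1 = S_xy / S_xx and b0 = y_bar - b1 * x_bar, followed by a point estimate that drops the error term. The helper names `least_squares` and `predict` and the data are hypothetical; Python 3.10+ also offers `statistics.linear_regression`, which returns the same slope and intercept.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares estimates of beta_0 and beta_1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # estimated slope
    b0 = y_bar - b1 * x_bar  # estimated y-intercept (line passes through the means)
    return b0, b1

def predict(b0, b1, x_new):
    """Point estimate y_hat = b0 + b1 * x_new; epsilon is omitted (mean 0)."""
    return b0 + b1 * x_new

# Hypothetical example data
b0, b1 = least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 4), round(b1, 4))    # 2.2 0.6
print(round(predict(b0, b1, 6), 4))  # 5.8
```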
Three assumptions about the error term $\varepsilon$ in a regression model
o Constant variance assumption
  - The variance of the population of error terms is the same for every value of x (it does not depend on x)
o Normality assumption
  - At any value of x, the population of error terms has a normal distribution
o Independence assumption
  - Any one value of $\varepsilon$ is statistically independent of any other value of $\varepsilon$
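One way to see what the assumptions mean is to generate data that satisfies them. The sketch below assumes hypothetical parameter values (beta_0 = 1, beta_1 = 2, sigma = 0.5) and a hypothetical helper `simulate_responses`; each error is an independent draw from the same normal distribution, regardless of x.

```python
import random
import statistics

def simulate_responses(beta0, beta1, sigma, xs, seed=None):
    """Draw y = beta0 + beta1*x + eps for each x, with eps ~ N(0, sigma^2).

    Each eps is a fresh, independent draw (independence assumption) from a
    normal distribution (normality assumption) whose sigma is the same for
    every x (constant variance assumption).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical parameters, chosen for illustration only
xs = [i % 10 for i in range(10_000)]
ys = simulate_responses(beta0=1.0, beta1=2.0, sigma=0.5, xs=xs, seed=1)

# Recover the errors by subtracting the true line; their mean should be
# close to 0 and their standard deviation close to sigma = 0.5
errors = [y - (1.0 + 2.0 * x) for x, y in zip(xs, ys)]
print(round(statistics.mean(errors), 2), round(statistics.stdev(errors), 2))
```

Real data carries no such guarantee, so in practice the assumptions are checked after fitting, for example by plotting residuals against x.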
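A regression model is not likely to be useful unless there is a significant relationship between x and y
o This can be tested with a t test on the slope, using $t = b_1 / s_{b_1}$ (where $s_{b_1}$ is the standard error of $b_1$) with n - 2 degrees of freedom
o An F test can also be used to test the null hypothesis $H_0: \beta_1 = 0$ (a slope of 0, so no
linear relationship between x and y)
  - The F statistic is usually computed by software and reported in an ANOVA table; in simple linear regression, $F = t^2$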
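A minimal sketch of the quantities an ANOVA table reports for $H_0: \beta_1 = 0$, computed by hand for a hypothetical data set. The helper name `slope_tests` is an assumption. To keep the sketch dependency-free it returns only the test statistics, not p-values; critical values would come from a t or F table.

```python
import math

def slope_tests(x, y):
    """Return (t, F) for testing H0: beta_1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares (SST)
    b1 = sxy / sxx
    ssr = b1 * sxy                # explained (regression) sum of squares
    sse = syy - ssr               # unexplained (error) sum of squares
    mse = sse / (n - 2)           # estimate of sigma^2, n - 2 degrees of freedom
    s_b1 = math.sqrt(mse / sxx)   # standard error of the slope
    t = b1 / s_b1
    f = (ssr / 1) / mse           # MSR / MSE with 1 and n - 2 degrees of freedom
    return t, f

# Hypothetical example data
t, f = slope_tests([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(t, 4), round(f, 4))  # 2.1213 4.5
```

Here n - 2 = 3, and the two-sided 5% critical value $t_{0.025}$ with 3 degrees of freedom is about 3.182. Since 2.1213 < 3.182, this tiny data set would not give a significant slope at the 5% level. The F test reaches the same conclusion because F = t^2.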
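The fit of a regression model can also be summarized numerically by $r^2$, the coefficient of determination, which lies between 0 and 1
o Tells us the proportion of the total variation in the dependent variable that is accounted for by
the independent variable, i.e., how much of the variation is explained by the model
o In simple linear regression, $r^2$ equals the square of the correlation coefficient r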
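A minimal sketch of $r^2 = 1 - SSE/SST$, the share of total variation not left in the residuals, using a hypothetical helper `r_squared` and made-up data.

```python
def r_squared(x, y):
    """Coefficient of determination r^2 = 1 - SSE / SST for the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)                       # total variation
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained variation
    return 1 - sse / sst

# Hypothetical example data
print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.6
```

For this data, 60% of the variation in y is explained by the line. That matches the square of the correlation coefficient for the same points (0.7746^2 ≈ 0.6). Points lying exactly on a line give r^2 = 1.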

Confidence Interval vs. Prediction Interval

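o Prediction interval: intended to trap a single new observation of the dependent variable
for given values of the independent variables
o Confidence interval: intended to trap the mean of the dependent variable for given values
of the independent variables
o For the same values of the independent variables, the prediction interval is always wider than the confidence interval, because it must also allow for the random error of an individual observation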
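A minimal sketch of both intervals at a chosen x0 in simple linear regression. The helper name `intervals` and the data are hypothetical. The t critical value (n - 2 degrees of freedom) is passed in rather than computed, to avoid a SciPy dependency; 3.182 is the two-sided 95% value for 3 degrees of freedom. The only difference between the intervals is the extra "1 +" under the square root for the prediction interval.

```python
import math

def intervals(x, y, x0, t_crit):
    """Return (confidence_interval, prediction_interval) at x = x0.

    t_crit: t critical value with n - 2 degrees of freedom, taken from a table.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))              # standard error of the estimate
    y_hat = b0 + b1 * x0                      # point estimate at x0
    dist = 1 / n + (x0 - x_bar) ** 2 / sxx    # grows as x0 moves away from x_bar
    ci_half = t_crit * s * math.sqrt(dist)      # for the mean of y at x0
    pi_half = t_crit * s * math.sqrt(1 + dist)  # for one new y at x0
    return (y_hat - ci_half, y_hat + ci_half), (y_hat - pi_half, y_hat + pi_half)

# Hypothetical example data; 95% level with n - 2 = 3 degrees of freedom
ci, pi = intervals([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
print(ci, pi)  # CI roughly (2.73, 5.27), PI roughly (0.88, 7.12)
```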