What is Regression Analysis?
➔ First, what’s a model?
◆ A model is a mathematical description of the relationship between two or more variables.
● Dependent variable: its value depends on the values of the independent variable(s)
● Independent variable(s): variables whose values don’t depend on the others
➔ Deterministic (formulaic) vs. Probabilistic (statistical) models
What is Regression Analysis?
➔ Regression analysis is a statistical process that helps describe the relationship between variables
◆ How can an outcome be explained?
◆ What will a future outcome be? I.e., prediction and/or forecasting
What is Regression Analysis?
➔ Many types of regression
◆ Linear - continuous dependent variable and linear relationship
◆ Logistic - dichotomous (1/0) dependent variable
◆ Polynomial - power of independent variable(s); aka “curved” relationship
◆ Stepwise - automated selection among multiple independent variables (machine learning)
◆ Ridge, Lasso and more!
● but we’ll focus on linear regression (LR)
What is Regression Analysis?
1. Simple Linear Regression (SLR) involves one dependent variable and one independent
variable.
2. Multiple Linear Regression (MLR) involves one dependent variable and 2+ independent
variables (most common).
Linear Regression
➔ Variable Types
◆ Covariates ~ independent variables (your X’s)
◆ Outcomes ~ dependent variables (your Y’s)
● Scale - continuous
● Nominal - categorical with no order
● Ordinal - categorical with equidistant order (Likert Scale)
Simple Linear Regression (SLR)
yi = Bi*xi + c + ε
where yi ~ estimated outcome
xi ~ independent variable
Bi ~ coefficient
c ~ constant/intercept
ε ~ error term
◆ We’re predicting/estimating the outcome, so there is an error associated with it
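The deck’s hands-on work uses Excel and R; as a minimal illustration only, here is a Python/NumPy sketch of fitting the SLR equation above to made-up data. `np.polyfit` with degree 1 returns the least-squares slope (Bi) and intercept (c).

```python
import numpy as np

# Hypothetical data: x is the independent variable, y the outcome.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = B*x + c by least squares; np.polyfit returns [slope, intercept].
B, c = np.polyfit(x, y, 1)

# The residuals play the role of the error term ε.
residuals = y - (B * x + c)
```

For this data the fitted slope is 1.96 and the intercept 0.14; the residuals are what least squares makes as small as possible.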
Multiple Linear Regression (MLR)
yi = Bi*xi + Bj*xj + c + ε
where yi ~ estimated outcome
xi, xj ~ independent variables
Bi, Bj ~ coefficients
c ~ constant/intercept
ε ~ error term
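The MLR fit works the same way with a design matrix: one column per independent variable plus a column of ones for the intercept. A hedged NumPy sketch with made-up, noise-free data (so the exact coefficients are recoverable):

```python
import numpy as np

# Hypothetical data: two independent variables xi, xj and outcome y.
xi = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
xj = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 3.0 * xi + 2.0 * xj + 1.0  # exact plane, zero error for this sketch

# Design matrix: columns for xi, xj, and a constant 1 for the intercept c.
X = np.column_stack([xi, xj, np.ones_like(xi)])

# Least-squares solution gives [Bi, Bj, c].
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Because the data lie exactly on the plane, the solver recovers Bi = 3, Bj = 2, c = 1.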
Linear Regression Fundamentals
Least Squares Method - minimizes the sum of squared deviations, penalizing large errors more than small errors:
∑ (yi − ŷi)²
We minimize the sum of the squared prediction errors:
Q = ∑i (yi − (b0 + b1xi))²
(that is, take the derivative with respect to b0 and b1, set each to 0, and solve) to get the “least squares estimates” for b0 and b1.
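Setting dQ/db0 = 0 and dQ/db1 = 0 yields the familiar closed forms b1 = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)² and b0 = ȳ − b1·x̄. A small sketch with made-up data lying exactly on y = 2x + 1, so the estimates are exact:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])  # exactly y = 2x + 1

# Closed-form least squares estimates from the derivative conditions.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
```

Here b1 = 2 and b0 = 1, matching the line the data were built from.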
Linear Regression Assumptions
1. The mean of the response, E(Yi), at each value of 1. X(s) and Y have a linear relationship
the predictor, xi, is a linear function of the xi.
a. Expected value of error term is zero.
2. The errors, εi, are independent.
a. Autocorrelation - disturbances are correlated 2. No hidden correlations between independent
with one another variables.
3. The errors, εi, at each value of the predictor, x i, are 3. Normal distribution (can do statistical tests)
normally distributed.
4. The errors, εi, at each value of the predictor, x i, 4. Error doesn’t change drastically across values
have equal variances (denoted σ2).
a. Heteroscedastic - disturbances are not all
equal 19
Pearson Correlation
➔ Measures the strength of the relationship between two variables
➔ Several correlation coefficients exist, but Pearson is most common
➔ Quantitative variables only
➔ Scale of -1 to +1
◆ -1 Strong negative relationship
◆ 0 No relationship
◆ +1 Strong positive relationship
https://www.spss-tutorials.com/pearson-correlation-coefficient/
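As a quick illustration with made-up data, a perfectly linear positive relationship gives r = +1 at the top of the scale:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # perfect positive relationship

# Pearson r = cov(x, y) / (sd(x) * sd(y)); np.corrcoef returns the matrix.
r = np.corrcoef(x, y)[0, 1]       # +1.0 here
r_neg = np.corrcoef(x, -y)[0, 1]  # flipping the sign of y gives -1.0
```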
Linear Regression Model Evaluation
➔ Mean Square Error (MSE): minimize the tradeoff between bias and inefficiency
◆ MSE(B*) = E[(B* − B)²]
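For predictions, the sample analogue is the average squared gap between predicted and true values. A minimal sketch with made-up numbers:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

# MSE = mean of squared errors; large errors are penalized quadratically.
mse = np.mean((y_pred - y_true) ** 2)
```

Every prediction here is off by 0.5, so the MSE is 0.25.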
Linear Regression Model Evaluation
➔ F-test: compares the fits of different linear models (multiple variables)
◆ Null hypothesis: The fit of the intercept-only model and your model are equal.
◆ Alternative hypothesis: The fit of the intercept-only model is significantly worse than that of your model.
● If the p-value for the F-test of overall significance is less than your significance level, you can reject the null hypothesis
○ The higher the F statistic, the better the model
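The F statistic for overall significance compares the sum of squared errors of the intercept-only model against your model. A hedged sketch with made-up, clearly linear data, where F comes out very large:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8, 6.1])
n, p = x.size, 1  # p = number of predictors

# Intercept-only model: predict the mean of y everywhere.
sse_reduced = np.sum((y - y.mean()) ** 2)

# Full model: simple linear regression fit.
b1, b0 = np.polyfit(x, y, 1)
sse_full = np.sum((y - (b1 * x + b0)) ** 2)

# F-test of overall significance.
F = ((sse_reduced - sse_full) / p) / (sse_full / (n - p - 1))
```

Because the line explains almost all the variation, `sse_full` is far below `sse_reduced` and F is in the hundreds, so the null hypothesis is rejected.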
Schedule
▶ Recap of Regression Basics
▶ Single Variable
▶ Multi Variable
▶ Excel Output
▶ “Best” Model
▶ t Stat & p-Value
▶ Regression in R
▶ Hands On
▶ Let’s try our hand
Single Variable Regression (Simple Regression)
▶ Measures a single independent variable, x, as it impacts a single dependent variable, y
▶ Goal: Minimize Error using Least Squares Methodology
▶ Categorical Variables
▶ Must be split into indicator variables
▶ Region Example
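The indicator-variable step can be sketched directly. The region names below are hypothetical stand-ins for the deck’s Region example: each category level becomes a 0/1 column, and in practice one level is dropped to avoid perfect collinearity with the intercept.

```python
import numpy as np

# Hypothetical Region values for five observations.
regions = ["East", "West", "North", "East", "South"]
levels = sorted(set(regions))  # ['East', 'North', 'South', 'West']

# One 0/1 indicator column per level (drop one level before regressing
# with an intercept, to avoid perfect collinearity).
indicators = np.array([[1 if r == lv else 0 for lv in levels] for r in regions])
```

Each row has exactly one 1, marking that observation’s region.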
Excel Output
Model Fit
Model Significance
Coefficients
Residuals
“Best” Model
▶ What is the “best” fitting model?
▶ Goldilocks balance with the number of predictors
▶ Too few: An underspecified model tends to produce biased estimates.
▶ Too many: An overspecified model tends to have less precise estimates.
▶ Just right: A model with the correct terms has no bias and the most precise estimates.
▶ It depends!
Excel Output – t-Stat & P-value
▶ t-Stat = Coefficient / Standard Error
▶ P-value = calculated from the t-stat and degrees of freedom
▶ What do these values actually measure?
▶ The probability of observing a result at least this extreme if the true coefficient were zero
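The t-stat formula from the slide can be computed by hand for a simple regression. A hedged NumPy sketch with the same made-up data pattern used earlier: the slope’s standard error comes from the residual variance, and the t-stat is just coefficient over standard error.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.1, 2.9, 4.2, 4.8, 6.1])
n = x.size

b1, b0 = np.polyfit(x, y, 1)
resid = y - (b1 * x + b0)

# Standard error of the slope, then t-stat = coefficient / standard error.
s2 = np.sum(resid ** 2) / (n - 2)                  # residual variance
se_b1 = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
t_stat = b1 / se_b1
```

For simple regression the squared t-stat of the slope equals the overall F statistic, which is one way to sanity-check Excel’s output.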