
Week 6: Correlation and Regression

PROF. MICHAEL DONG


CALIFORNIA STATE UNIVERSITY LONG BEACH

FALL 2020
1
Road Map
1. Correlation Analysis
2. Simple Linear Regression
3. Multiple Linear Regression

2
Very helpful statistics videos
https://www.youtube.com/channel/UCs3IhN8VOA_5WxpAgbSmFkg/playlists
By Stephanie Glen

3
1. Correlation Analysis

4
Scatter Plots
Check Python demonstration
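A minimal sketch of the kind of Python demonstration referenced here, assuming matplotlib and synthetic data (all variable names and values are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: y is linearly related to x plus noise
rng = np.random.default_rng(0)
x = rng.normal(10, 2, size=100)
y = 1.5 * x + rng.normal(0, 3, size=100)

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot of y against x")
plt.show()
```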

5
Correlation Analysis
In contrast to a scatter plot, which graphically depicts the relationship between two data series, correlation analysis expresses this same relationship using a single number. The correlation coefficient is a measure of how closely related two data series are. In particular, the correlation coefficient measures the direction and extent of linear association between two variables.

6
Correlation Analysis

7
Correlation Analysis

8
Correlation Calculation
To study historical or sample correlations, we need to use the sample covariance. The sample covariance of X and Y, for a sample of size n, is

$cov(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$

We then need to calculate the sample variance of X to obtain its sample standard deviation, $s_X = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$ (and likewise $s_Y$).

The formula for computing the sample correlation coefficient is

$r = \frac{cov(X, Y)}{s_X \, s_Y}$
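These formulas can be checked directly in Python; a minimal sketch with synthetic data, comparing the hand computation with numpy's built-in corrcoef:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 0.8 * x + rng.normal(scale=0.5, size=30)
n = len(x)

cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)  # sample covariance
s_x = np.sqrt(np.sum((x - x.mean()) ** 2) / (n - 1))        # sample std. dev. of X
s_y = np.sqrt(np.sum((y - y.mean()) ** 2) / (n - 1))        # sample std. dev. of Y
r = cov_xy / (s_x * s_y)                                    # sample correlation

print(r, np.corrcoef(x, y)[0, 1])  # the two values agree
```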

9
Example

10
Example

11
Limitations of correlation analysis
Correlation measures the linear association between two variables, but it may not always be reliable. Two variables can have a strong nonlinear relation and still have a very low correlation.

For example, B = (A − 4)² is an exact nonlinear relation, yet the correlation between A and B can be close to zero; contrast this with a linear relation such as B = A, where the correlation is 1.
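This point is easy to verify: in the sketch below, B is an exact function of A, yet the sample correlation is essentially zero because the values of A are symmetric around 4 (the values are illustrative):

```python
import numpy as np

A = np.arange(0, 9)             # 0, 1, ..., 8: symmetric around 4
B = (A - 4) ** 2                # exact nonlinear relation
print(np.corrcoef(A, B)[0, 1])  # ~0: correlation misses the nonlinear relation
```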

12
Nonlinear relationship

13
Outliers
In the scatter plot in Figure 6, most of the data lie clustered together with little discernible relation between the two variables. Two cases, however (the two circled observations), stand out from the rest. In one of those cases, inflation was extremely low at almost −2 percent, and in the other case, stock returns were strongly negative at almost −17 percent. These observations are outliers. If we compute the correlation coefficient for the entire data sample, that correlation is −0.0350. If we eliminate the two outliers, however, the correlation is −0.1489.
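A sketch of the same effect with synthetic data (these numbers are illustrative, not the Figure 6 data): adding two extreme observations noticeably changes the correlation coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
stock_returns = rng.normal(1.0, 2.0, size=60)  # unrelated to inflation by construction
inflation = rng.normal(3.0, 0.5, size=60)

# Two extreme observations, analogous to the circled points in Figure 6
stock_returns = np.append(stock_returns, [-17.0, 2.0])
inflation = np.append(inflation, [3.0, -2.0])

print("with outliers:   ", np.corrcoef(stock_returns, inflation)[0, 1])
print("without outliers:", np.corrcoef(stock_returns[:-2], inflation[:-2])[0, 1])
```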

14
Spurious Correlation/Regression
The term spurious correlation has been used to refer to 1) correlation between two variables
that reflects chance relationships in a particular data set, 2) correlation induced by a calculation
that mixes each of two variables with a third, and 3) correlation between two variables arising
not from a direct relation between them but from their relation to a third variable.
As an example of the second kind of spurious correlation, two variables that are uncorrelated may appear correlated if each is divided by a third variable (demonstrated in the sketch below).
As an example of the third kind of spurious correlation, height may be positively correlated with
the extent of a person’s vocabulary, but the underlying relationships are between age and height
and between age and vocabulary.
Investment professionals must be cautious in basing investment strategies on high correlations. Spurious correlation may suggest investment strategies that appear profitable but would not actually be so if implemented.
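The second kind of spurious correlation is easy to reproduce; in the sketch below, X, Y, and Z are mutually independent by construction, yet the ratios X/Z and Y/Z are clearly correlated through the shared divisor:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(10, 1, size=5000)
Y = rng.normal(10, 1, size=5000)
Z = rng.normal(10, 1, size=5000)   # the shared third variable

print(np.corrcoef(X, Y)[0, 1])          # ~0: X and Y are independent
print(np.corrcoef(X / Z, Y / Z)[0, 1])  # clearly positive: induced by dividing by Z
```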

15
Testing the Significance of the Correlation Coefficient
Given n observations, the test statistic for H0: ρ = 0 against Ha: ρ ≠ 0 is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, which follows a t-distribution with n − 2 degrees of freedom when the null hypothesis is true.
16
Tests Concerning the Correlation
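A sketch of this test in Python; scipy.stats.pearsonr reports the correlation together with the two-sided p-value of this t-test, so the hand computation can be checked against it:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=40)
y = 0.4 * x + rng.normal(size=40)
n = len(x)

r, p_value = stats.pearsonr(x, y)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)  # test statistic by hand
p_manual = 2 * stats.t.sf(abs(t_stat), df=n - 2)   # two-sided p-value, n - 2 df

print(t_stat, p_manual, p_value)  # the two p-values agree
```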

17
2. Linear Regression

18
2.1 Simple Regression Model

19
Some Reference Books
Introductory Econometrics pdf:
https://economics.ut.ac.ir/documents/3030266/14100645/Jeffrey_M._Wooldridge_Introductory_Econometrics_A_Modern_Approach__2012.pdf

20
The Simple Regression Model
Definition of the simple regression model
◦ "Explains variable y in terms of variable x"
Names: Simple Linear Regression, or Univariate Linear Regression, or Linear Regression with only one independent variable

21
Definition
The variables y and x have several different names, used interchangeably, as follows:
◦ y is called the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand;
◦ x is called the independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor. (The term covariate is also used for x.)
◦ The terms "dependent variable" and "independent variable" are frequently used in econometrics. But be aware that the label "independent" here does not refer to the statistical notion of independence between random variables.

22
The Simple Regression Model (2 of 39)

Interpretation of the simple linear regression model
◦ The model $y = \beta_0 + \beta_1 x + u$ explains how y varies with changes in x: holding the other factors in u fixed ($\Delta u = 0$), $\Delta y = \beta_1 \Delta x$.

The simple linear regression model is rarely applicable in practice, but its discussion is useful for pedagogical reasons.

23
The Simple Regression Model (4 of 39)

When is there a causal interpretation?
◦ Conditional mean independence assumption: the average value of the unobserved factors u is unrelated to the value of the explanatory variable, $E(u \mid x) = E(u) = 0$

Example: wage equation $wage = \beta_0 + \beta_1 \, educ + u$, where $\beta_1$ measures the change in hourly wage for one more year of education, holding all other factors fixed

24
The Simple Regression Model (5 of 39)

Population regression function (PRF)
◦ The conditional mean independence assumption implies that $E(y \mid x) = \beta_0 + \beta_1 x$

This means that the average value of the dependent variable can be expressed as a linear function of the explanatory variable.

25
The Simple Regression Model (6 of 39)

26
The Simple Regression Model
Deriving the ordinary least squares estimates
◦ In order to estimate the regression model, one needs data
◦ A random sample of n observations, $\{(x_i, y_i): i = 1, \dots, n\}$

27
Estimation Method: Ordinary Least Squares
Deriving the ordinary least squares (OLS) estimators
Defining regression residuals: $\hat{u}_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$

Minimize the sum of the squared regression residuals: $\min_{\hat{\beta}_0, \hat{\beta}_1} \sum_{i=1}^{n} \hat{u}_i^2$

OLS estimators: $\hat{\beta}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
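A sketch of the estimators computed directly from these formulas, checked against numpy.polyfit (synthetic data with true intercept 2 and true slope 3):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x                # regression residuals

print(b0, b1)                          # close to the true 2 and 3
print(u_hat.sum(), (x * u_hat).sum())  # algebraic properties: both ~0
print(np.polyfit(x, y, 1))             # [slope, intercept] agrees
```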

28
Estimation Method: Ordinary Least Squares
OLS fits a regression line through the data points as well as possible, in the sense of minimizing the sum of squared residuals

29
The Simple Regression Model (11 of 39)

30
The Simple Regression Model (14 of 39)

Properties of OLS on any sample of data

Fitted values and residuals: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ and $\hat{u}_i = y_i - \hat{y}_i$

Algebraic properties of OLS regression: the residuals sum to zero, $\sum_{i=1}^{n} \hat{u}_i = 0$; they are uncorrelated with the regressor in the sample, $\sum_{i=1}^{n} x_i \hat{u}_i = 0$; and the point $(\bar{x}, \bar{y})$ always lies on the regression line.

31
What each item means in OLS

32
Assumptions of the Linear Regression Model
To be able to draw valid conclusions from a linear regression model with a single independent variable, we need to make the following six assumptions, known as the classic normal linear regression model assumptions:
1. The relationship between the dependent variable Y and the independent variable X is linear in the parameters.
2. The independent variable X is not random.
3. The expected value of the error term is 0.
4. The variance of the error term is the same for all observations (homoskedasticity).
5. The error term is uncorrelated across observations.
6. The error term is normally distributed.

33
Zero conditional mean assumption
The error term u has an expected value of zero given any value of the explanatory variable: $E(u \mid x) = 0$

34
The standard error of the estimate/regression
The formula for the standard error of estimate for a linear regression model with one independent variable is

$SEE = \sqrt{\frac{\sum_{i=1}^{n} \hat{u}_i^2}{n - 2}}$

where $\hat{u}_i$ are the regression residuals and n − 2 is the degrees of freedom.

35
Goodness-of-Fit
Goodness of fit
◦ How well does the explanatory variable explain the dependent variable?

Measures of variation: total sum of squares $SST = \sum_{i=1}^{n}(y_i - \bar{y})^2$, explained sum of squares $SSE = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$, and residual sum of squares $SSR = \sum_{i=1}^{n}\hat{u}_i^2$, with $SST = SSE + SSR$. The R-squared of the regression, $R^2 = SSE/SST = 1 - SSR/SST$, is the fraction of the sample variation in y that is explained by x; see the sketch below.
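A sketch computing these sums of squares on synthetic data and confirming both identities:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x                    # fitted values

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

print(sst, sse + ssr)                  # SST = SSE + SSR
print(sse / sst, 1 - ssr / sst)        # R-squared, computed both ways
```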

36
Goodness-of-Fit

37
Goodness-of-Fit

38
Estimator Properties
Expected values and variances of the OLS estimators
The estimated regression coefficients are random variables because they are calculated from a
random sample

The question is what the estimators estimate on average and how large their variability is in repeated samples

39
Estimator Properties
Standard assumptions for the linear regression model
Assumption SLR.1 (Linear in parameters): in the population, y is related to x as $y = \beta_0 + \beta_1 x + u$

Assumption SLR.2 (Random sampling): the data are a random sample $\{(x_i, y_i): i = 1, \dots, n\}$ drawn from the population model

40
Estimator Properties
Assumptions for the linear regression model (cont.)
Assumption SLR.3 (Sample variation in the explanatory variable): the sample outcomes on x are not all the same value

Assumption SLR.4 (Zero conditional mean): the error has an expected value of zero given any value of the explanatory variable, $E(u \mid x) = 0$

41
Estimator Properties
Theorem 2.1 (Unbiasedness of OLS): under assumptions SLR.1 – SLR.4, $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$

Interpretation of unbiasedness
◦ The estimated coefficients may be smaller or larger, depending on the sample that is the result of a random draw.
◦ However, on average, they will be equal to the values that characterize the true relationship between y and x in the population.
◦ "On average" means if sampling were repeated, i.e., if drawing the random sample and doing the estimation were repeated many times.
◦ In a given sample, estimates may differ considerably from the true values.

42
Estimator Properties
Variances of the OLS estimators
◦ Depending on the sample, the estimates will be nearer or farther away from the true population values.
◦ How far can we expect our estimates to be away from the true population values on average (= sampling
variability)?
◦ Sampling variability is measured by the estimators' variances

Assumption SLR.5 (Homoskedasticity): the error has the same variance given any value of the explanatory variable, $Var(u \mid x) = \sigma^2$

43
Estimator Properties
Graphical illustration of homoskedasticity

44
Estimator Properties
An example of heteroskedasticity: wage and education

45
Estimator Variance Properties
Theorem 2.2 (Variances of the OLS estimators)
Under assumptions SLR.1 – SLR.5:

$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{\sigma^2}{SST_x}$ and $Var(\hat{\beta}_0) = \frac{\sigma^2 \, n^{-1}\sum_{i=1}^{n} x_i^2}{SST_x}$

Conclusion:
◦ The sampling variability of the estimated regression coefficients is higher the larger the variance of the unobserved factors, and lower the greater the variation in the explanatory variable.

46
Estimator Variance Properties
Estimating the error variance
An unbiased estimator of the error variance divides by the degrees of freedom n − 2, since two parameters are estimated: $\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2 = \frac{SSR}{n-2}$

47
Estimator Variance Properties
Theorem 2.3 (Unbiasedness of the error variance): under assumptions SLR.1 – SLR.5, $E(\hat{\sigma}^2) = \sigma^2$

Calculation of standard errors for regression coefficients: $se(\hat{\beta}_1) = \hat{\sigma}/\sqrt{SST_x}$ and $se(\hat{\beta}_0) = \hat{\sigma}\sqrt{n^{-1}\sum_{i=1}^{n} x_i^2 / SST_x}$

The estimated standard deviations of the regression coefficients are called “standard errors.” They
measure how precisely the regression coefficients are estimated.
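A sketch tying these pieces together: the error-variance estimate and both standard errors computed by hand, checked against statsmodels (synthetic data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=80)
y = 1.0 + 0.5 * x + rng.normal(size=80)
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
u_hat = y - b0 - b1 * x

sigma2_hat = np.sum(u_hat ** 2) / (n - 2)              # unbiased error-variance estimate
sst_x = np.sum((x - x.mean()) ** 2)
se_b1 = np.sqrt(sigma2_hat / sst_x)                    # standard error of the slope
se_b0 = np.sqrt(sigma2_hat * np.mean(x ** 2) / sst_x)  # standard error of the intercept

results = sm.OLS(y, sm.add_constant(x)).fit()
print(se_b0, se_b1)
print(results.bse)  # statsmodels' standard errors agree
```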

48
Hypothesis Testing
Under the classic normal linear regression assumptions, $t = (\hat{\beta}_1 - \beta_{1,0})/se(\hat{\beta}_1)$ follows a t-distribution with n − 2 degrees of freedom, where $\beta_{1,0}$ is the hypothesized value; the usual test of H0: β1 = 0 therefore uses $t = \hat{\beta}_1/se(\hat{\beta}_1)$, rejecting when |t| exceeds the critical value.

49
Analysis of Variance (ANOVA)
Analysis of variance (ANOVA) is a statistical procedure for dividing the total variability of a variable into components that can be attributed to different sources.

An important statistical test conducted in analysis of variance is the F-test. The F-statistic tests whether all the slope coefficients in a linear regression are equal to 0. In a regression with one independent variable, this is a test of the null hypothesis H0: b1 = 0 against the alternative hypothesis Ha: b1 ≠ 0, using $F = \frac{SSE/1}{SSR/(n-2)}$, the ratio of the mean explained (regression) sum of squares to the mean squared residual, with 1 and n − 2 degrees of freedom; a sketch follows below.
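A sketch of the F-test using statsmodels; with a single independent variable the F-statistic equals the square of the slope's t-statistic:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
x = rng.normal(size=60)
y = 0.3 + 0.7 * x + rng.normal(size=60)

results = sm.OLS(y, sm.add_constant(x)).fit()
print(results.fvalue, results.f_pvalue)  # F-test that all slope coefficients are 0
print(results.tvalues[1] ** 2)           # equals the F-statistic with one regressor
```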

50
Analysis of Variance (ANOVA)

51
Prediction Intervals
A prediction interval for y at a given value X of the independent variable is $\hat{y} \pm t_c \, s_f$, where $t_c$ is the critical t-value and the estimated variance of the forecast error is $s_f^2 = s^2\left[1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{(n-1)\, s_x^2}\right]$, with s the standard error of estimate.
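statsmodels can produce these intervals directly; a sketch on synthetic data (the obs_ci_* columns of summary_frame are the prediction-interval bounds):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.normal(size=40)
y = 2.0 + 1.5 * x + rng.normal(size=40)

results = sm.OLS(y, sm.add_constant(x)).fit()

# 95% prediction intervals at two new values of the independent variable
x_new = sm.add_constant(np.array([0.0, 1.0]), has_constant="add")
pred = results.get_prediction(x_new).summary_frame(alpha=0.05)
print(pred[["mean", "obs_ci_lower", "obs_ci_upper"]])
```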

52
Q&A
53
