Professional Documents
Culture Documents
Overview
Lesson 1: Linear Regression
LOS 4a: Distinguish between the dependent and independent variables in a linear regression.
LOS 4b: Explain the assumptions underlying linear regression and interpret regression coefficients.
https://wel.instructure.com/courses/3711/modules/items/1149846 1/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Study Guide
Lesson 1: Linear Regression
1.1 Scatter Plots
The following table lists average observations of annual money supply growth and inflation rates for 6
countries over the period 1990 to 2010.
Figure 1.1 illustrates this data on a scatter plot. The scatter plot does not show which point relates to which
country; it just plots the observations of both data series as pairs. The data plotted in Figure 1.1 suggests a
fairly strong linear relationship with a positive slope for the countries in our sample over the sample period.
1.2.1 Properties of Covariance
https://wel.instructure.com/courses/3711/modules/items/1149846 2/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
i=1
− −
where:
n = sample size
Xi = ith observation of Variable X
−
Cov (X, Y )
Sample correlation coefficient = r =
sX sY
i=1
2
−
2
Sample variance = s
X
= ∑ (Xi − X) /(n − 1)
2
Sample standard deviation = sX = √s
X
https://wel.instructure.com/courses/3711/modules/items/1149846 3/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Example 1.1
Calculating the Correlation Coefficient
Using the money supply growth and inflation data from 1990 to 2010 for the 6 countries in Figure 1-1,
calculate the covariance and the correlation coefficient.
Solution:
Squared Squared
Money Supply Inflation Cross Product Deviations Deviations
Country Growth Rate (Xi) Rate (Yi) ¯
¯¯)(Y − Y
(Xi − X i
¯
¯¯) ¯
¯¯)
(Xi − X
2
¯
¯
(Yi − Y¯)
2
Illustrations of Calculations
Var (X) = Sum of squared deviations from the sample mean / n − 1 = 0.004921/5 = 0.000984
Var (Y) = Sum of squared deviations from the sample mean / n − 1 = 0.003094/5 = 0.000619
Cov (X,Y )
0.000747
Correlation coefficient = r = = = 0.9573 or 95.73%
s s
X Y (0.031373)(0.024874)
The correlation coefficient of 0.9573 suggests that over the period, a strong linear relationship exists
between the money supply growth rate and the inflation rate for the countries in the sample.
Note that computed correlation coefficients are only valid if the means and variances of X and Y, as well as
the covariance of X and Y, are finite and constant.
LOS 4a: Distinguish between the dependent and independent variables in a linear regression. Vol 1, pp 238–
243
https://wel.instructure.com/courses/3711/modules/items/1149846 4/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Linear regression is used to summarize the relationship between two variables that are linearly related. It is
used to make predictions about a dependent variable, Y (also known as the explained variable, endogenous
variable, and predicted variable), using an independent variable, X (also known as the explanatory variable,
exogenous variable, and predicting variable), to test hypotheses regarding the relation between the two
variables, and to evaluate the strength of the relationship between them. The dependent variable is the
variable whose variation we are seeking to explain, while the independent variable is the variable that is used
to explain the variation in the dependent variable.
Another way to look at simple linear regression is that it aims to explain the variation in the dependent
variable in terms of the variation in the independent variable. Note that variation refers to the extent that a
variable deviates from its mean value. Do not confuse variation with variance (variation referring to the
concept of dispersion or variability of a variable; variance being a particular measure of dispersion or
variability).
The following linear regression model describes the relation between the dependent and the independent
variables.
Regression model equation = Yi = b0 + b1 Xi + εi , i = 1, … . , n
where:
Based on this model, the regression process estimates the line of best fit for the data in the sample. The
regression line takes the following form:
ˆ = ˆ
Regression line equation = Y
ˆ
b 0 + b 1 Xi , i = 1,...., n
i
Linear regression computes the line of best fit that minimizes the sum of the squared regression residuals (the
squared vertical distances between actual observations of the dependent variable and the regression line).
What this means is that it looks to obtain estimates, b̂ and b̂ , for b0 and b1 respectively, that minimize the
0 1
sum of the squared differences between the actual values of Y, Yi and the predicted values of Y, Yˆ , according i
n
2
ˆ ˆ
∑ [Yi − ( b 0 + b 1 Xi )]
i=1
where:
Yi = Actual value of the dependent variable
https://wel.instructure.com/courses/3711/modules/items/1149846 5/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Hats over the symbols for regression coefficients indicate estimated values. Note that it is these estimates
that are used to conduct hypothesis tests and to make predictions about the dependent variable.
The sum of the squared differences between actual and predicted values of Y is known as the sum of squared
errors, or SSE.
LOS 4e: Explain the assumptions underlying linear regression and interpret regression coefficients. Vol 1, pp
243–245
1. The relationship between the dependent variable (Y) and the independent variable (X) is linear in the
parameters, b1 and b0. This means that b1 and b0 are raised to the first power only and neither of them is
multiplied or divided by another regression parameter.
2. The independent variable, X, is not random.
3. The expected value of the error term is zero: E(ε) = 0.
4. The variance of the error term is constant for all observations (E(ε 2
i
2
) = σε , i = 1, … , n) . This is known
as the homoskedasticity assumption.
5. The error term is uncorrelated across observations.
6. The error term is normally distributed.
Example 1.2
Linear Regression
For the money supply growth and inflation rate data that we have been working with in this reading,
determine the slope coefficient and the intercept term of a simple linear regression using money supply
growth as the independent variable and the inflation rate as the dependent variable. The data provided
below is excerpted from Example 1-1:
Squared Squared
Money Supply Inflation Cross Product Deviations Deviations
Country Growth Rate (Xi) Rate (Yi) ¯
¯¯)(Y − Y
(Xi − X i
¯
¯¯) ¯
¯¯)
(Xi − X
2
¯
¯
(Yi − Y¯)
2
Solution:
https://wel.instructure.com/courses/3711/modules/items/1149846 6/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Typically, regression software is used to determine the regression coefficients. However, for illustrative
purposes we perform the calculations to make the source of the numbers clear.
Note that the slope coefficient can be computed in this manner only when there is one independent
variable.
In a linear regression, the regression line passes through the coordinates corresponding to the mean values
of the independent and dependent variables. Using the mean values for money supply growth
¯
(X¯¯ = 0.1012) and the inflation rate (Y¯¯= 0.0718) we can compute the intercept term as:
¯
¯
¯
Y¯ = ˆ ˆ ¯¯¯
b0 + b1 X
Therefore, the relationship between the money supply growth rate and the inflation rate for our sample
data can be expressed as:
Inflation rate = −0.005 + 0.7591(Money supply growth rate)
The regression equation implies that:
For every 1-percentage-point increase in the money supply growth rate, the inflation rate is predicted
to increase by 0.7591 percentage points. The slope coefficient (ˆb ) is interpreted as the estimated
1
change in the dependent variable for a 1-unit change in the independent variable.
If the money supply growth rate in a country equals 0, the inflation rate in the country will be −0.5%.
The intercept (ˆb ) is an estimate of the dependent variable when the independent variable equals 0.
0
Figure 1.2 illustrates the regression line (blue line) along with the scatter plot of the actual sample data (blue
dots). Using Country F as an example, we also illustrate the difference between the actual observation of the
inflation rate in Country F, and the predicted value of inflation (according to the regression model) in Country
F.
https://wel.instructure.com/courses/3711/modules/items/1149846 7/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
https://wel.instructure.com/courses/3711/modules/items/1149846 8/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Flashcards
Lesson 1: Linear Regression
1
fc.L2R7L01.0001_1812
¯
¯¯) (Y − Y
¯
¯¯)
COV (X, Y ) = ∑ (Xi − X
and a sample correlation coefficient (r). n−1
i
i=1
n
1 2
2
s = ¯
¯¯)
∑ (Xi − X
X n−1
i=1
COV (X,Y )
r =
s s
X Y
2
fc.L2R7L02.0001_1812
Give the formula used to determine the linear Regression model equation = Yi = b0 + b1Xi + εi, I
regression model between the dependent and = 1, …., n
independent variables.
3
fc.L2R7L02.0002_1812
https://wel.instructure.com/courses/3711/modules/items/1149846 9/10
9/25/2020 CFA Exam Review - Level 2 - Lesson 1: Linear Regression
Notes
Lesson 1: Linear Regression
https://wel.instructure.com/courses/3711/modules/items/1149846 10/10