BE368: Finance Research Techniques Using Matlab
Lecture 4: Linear Regression

Mark Hallam

University of Essex

3 Nov 2022

Mark Hallam University of Essex


BE368: Lecture 4
Outline for Today’s Lecture

Linear Regression
- Concepts and definitions: sample and population regression functions, variables, errors and residuals
- Estimation and inference: estimating linear regression models, predicted values, alternative functional forms, tests of significance, R-squared and other regression statistics
- An application: the capital asset pricing model (CAPM)

Linear Regression Models
- Linear regression analysis concerns the estimation of, and inference on, linear regression models. The simplest case is the simple linear regression model:

$$y = \beta_0 + \beta_1 x + e$$

- $y$ is known as the dependent/explained variable or regressand, $x$ is the independent/explanatory variable or regressor, and $e$ is the error term, which has mean zero
- In such a model, for a given value of $x$, the expected value of $y$ equals $\beta_0$ (the intercept) plus $\beta_1$ times $x$ (since $E[e] = 0$)
- This implies a linear relationship between the values of the dependent variable, $y$, and the explanatory variable, $x$
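As a minimal sketch on simulated data (the parameter values and sample size are arbitrary choices for illustration), the least-squares estimates in the simple regression model can be computed from sample moments: the slope is the sample covariance of $x$ and $y$ divided by the sample variance of $x$, and the intercept makes the fitted line pass through the sample means. Only base Matlab functions are used.

```matlab
% Minimal sketch: simple linear regression on simulated data (base Matlab only)
rng(1);                            % fix the random seed for reproducibility
T = 500;                           % sample size (arbitrary choice)
beta0 = 0.5;                       % hypothetical true intercept
beta1 = 2;                         % hypothetical true slope
x = randn(T, 1);                   % explanatory variable
e = randn(T, 1);                   % error term with mean zero
y = beta0 + beta1*x + e;           % dependent variable generated by the model

C = cov(x, y);                     % 2x2 sample covariance matrix of x and y
b1hat = C(1, 2) / var(x);          % slope: sample Cov(x, y) / sample Var(x)
b0hat = mean(y) - b1hat*mean(x);   % intercept: line passes through the means
fprintf('b0hat = %.3f, b1hat = %.3f\n', b0hat, b1hat);
```

As a quick check, polyfit(x, y, 1) returns [slope, intercept] and should agree with b1hat and b0hat up to rounding error.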

Linear Regression Models
- The linearity assumption may seem restrictive, but linear models are directly relevant to many problems in finance
- Additionally, we can define $y$ and/or $x$ to be non-linear transformations of some underlying variables, e.g. $x = \ln(z)$, to allow for more general relationships
- We can also generalise the model by considering multiple linear regression models, in which the value of $y$ may depend on the values of two or more regressors, $x_1, x_2, \ldots$
- For the general case with $k$ regressors, we have:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + e$$

where $x_j$ denotes the value of the $j$-th regressor
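A short sketch of a multiple regression with a log-transformed regressor, on simulated data (the coefficients 1, 0.8 and -0.3 are hypothetical). It stacks a column of ones and the regressors into one matrix, one variable per column, and uses Matlab's backslash operator, which returns the least-squares solution when the matrix has more rows than columns.

```matlab
% Sketch: multiple regression with a transformed regressor x1 = ln(z)
rng(2);                            % fix the random seed for reproducibility
T  = 400;                          % sample size (arbitrary choice)
z  = exp(randn(T, 1));             % a strictly positive underlying variable
x1 = log(z);                       % non-linear transformation used as a regressor
x2 = randn(T, 1);                  % a second regressor
y  = 1 + 0.8*x1 - 0.3*x2 + 0.5*randn(T, 1);   % hypothetical true model

X = [ones(T, 1), x1, x2];          % intercept column plus one column per regressor
bhat = X \ y;                      % least-squares estimates [b0; b1; b2]
disp(bhat')                        % should be close to [1 0.8 -0.3]
```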

Time Series vs. Cross-Sectional Regression
- Linear regression models can be used in either a cross-sectional or a time-series context, both of which have numerous applications in finance
- In a cross-sectional context, the values of $y$ and $x_1, \ldots, x_k$ are values for different observational units (e.g. different stocks, investors, firms, etc.) observed at the same point in time
- In a time-series context, the values of $y$ and $x_1, \ldots, x_k$ are values for the same observational unit at different points in time
- For the purposes of this lecture, there are no major theoretical or practical differences between these two cases

Population vs. Sample Regression Functions

- In practice, we do not know the true (population) values of the parameters $\beta_0, \ldots, \beta_k$ and we must estimate their values
- We do this using a sample of data containing matched sets of observations for the dependent and explanatory variable(s)
- For cross-sectional data, our sets of observations are denoted $(y_i, x_{1,i}, x_{2,i}, \ldots, x_{k,i})$ for $i = 1, \ldots, N$, where the index $i$ denotes the observational unit and there are $N$ units in total
- For time-series data, our sets of observations are denoted $(y_t, x_{1,t}, x_{2,t}, \ldots, x_{k,t})$ for $t = 1, \ldots, T$, where the index $t$ denotes the time period and there are $T$ periods in total

Population vs. Sample Regression Functions
- There are various methods available for estimating the unknown parameters from a sample of data, including least-squares methods, maximum likelihood, etc.
- The regression functions used in this lecture (regress and fitlm) use ordinary least squares (OLS) by default, while some other Matlab estimation functions use maximum likelihood; in principle, any of these methods could be programmed in Matlab if desired
- We denote the estimated parameters by $\hat{\beta}_0, \ldots, \hat{\beta}_k$, giving the estimated sample regression function:

$$y_t = \hat{\beta}_0 + \hat{\beta}_1 x_{1,t} + \ldots + \hat{\beta}_k x_{k,t} + u_t$$

where $u_t$ for $t = 1, \ldots, T$ are known as the residuals (we use time-series notation here, since all of the examples in this lecture use time-series data)
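The sketch below (simulated data, hypothetical parameters) computes fitted values and residuals, $u_t = y_t - \hat{y}_t$, from OLS estimates. When an intercept is included, OLS residuals sum to zero and are orthogonal to each regressor; the printed values illustrate this up to rounding error.

```matlab
% Sketch: fitted values and residuals from an estimated regression
rng(3);                            % fix the random seed for reproducibility
T = 200;                           % sample size (arbitrary choice)
x = randn(T, 1);
y = 2 + 1.5*x + randn(T, 1);       % hypothetical data-generating process

X = [ones(T, 1), x];               % regressors including an intercept column
bhat = X \ y;                      % OLS estimates [b0hat; b1hat]
yhat = X * bhat;                   % fitted values
u = y - yhat;                      % residuals u_t = y_t - yhat_t

% With an intercept, the residuals sum to zero and are orthogonal to x
fprintf('sum(u) = %.2e, sum(u.*x) = %.2e\n', sum(u), sum(u.*x));
```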

Applications of Linear Regression

- The size and sign of the sample estimates of the parameters, $\hat{\beta}_0, \ldots, \hat{\beta}_k$, are sometimes of direct interest (see e.g. CAPM)
- Under suitable assumptions (in particular, that the error term has mean zero conditional on the explanatory variables), they can be interpreted as estimates of the partial effect of changes in each of the explanatory variables on the dependent variable (i.e. causal effects)
- Particularly in the case of models intended for forecasting, we may want to compute the predicted/fitted values from the estimated regression model, usually denoted by $\hat{y}$
- More generally, we can use the estimated sample regression function to predict the value of the dependent variable $y_t$ for any values of the explanatory variables $x_{1,t}, \ldots, x_{k,t}$
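Prediction simply plugs new regressor values into the estimated sample regression function. In this sketch, with simulated data, each row of x_new (a hypothetical name, holding two made-up scenarios) is one set of values for $(x_1, x_2)$, and a column of ones is prepended to match the intercept.

```matlab
% Sketch: predicting y for new values of the explanatory variables
rng(4);                            % fix the random seed for reproducibility
T = 300;                           % sample size (arbitrary choice)
x1 = randn(T, 1);
x2 = randn(T, 1);
y = 1 + 0.5*x1 + 2*x2 + randn(T, 1);   % hypothetical model

X = [ones(T, 1), x1, x2];
bhat = X \ y;                      % estimated coefficients [b0; b1; b2]

% New (hypothetical) regressor values, one row per scenario
x_new = [0,  0;                    % both regressors at zero
         1, -1];                   % x1 = 1, x2 = -1
y_pred = [ones(size(x_new, 1), 1), x_new] * bhat;   % predicted values
disp(y_pred)
```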

Linear Regression in Matrix Notation
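- Given Matlab’s use of matrix algebra, it will be more convenient (and more compact) to express the general multiple linear regression model in matrix form
- For this we define the following vectors and matrices:

$$
\underset{(T \times 1)}{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \qquad
\underset{(T \times (k+1))}{X} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{k,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{k,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,T} & x_{2,T} & \cdots & x_{k,T} \end{bmatrix}
$$

$$
\underset{(T \times 1)}{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_T \end{bmatrix}, \qquad
\underset{((k+1) \times 1)}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \qquad
\underset{((k+1) \times 1)}{\hat{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}
$$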
Linear Regression in Matrix Notation
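- Our population regression function for each observation in our sample can then be written as:

$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} =
\begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{k,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{k,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,T} & x_{2,T} & \cdots & x_{k,T} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_T \end{bmatrix}
$$

or:

$$y = X\beta + e$$

and the corresponding sample regression function as:

$$y = X\hat{\beta} + u$$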
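In matrix form, the OLS estimator minimises the sum of squared residuals $u'u$ and, when $X'X$ is invertible, is $\hat{\beta} = (X'X)^{-1}X'y$ (a standard result, not derived on these slides). The sketch below, on simulated data with hypothetical coefficients, computes it both from this formula and with Matlab's backslash operator; the latter is generally preferred numerically because it solves the least-squares problem without forming $X'X$ explicitly.

```matlab
% Sketch: OLS in matrix form, y = X*beta + e
rng(5);                            % fix the random seed for reproducibility
T = 250;                           % number of time periods (arbitrary choice)
k = 2;                             % number of regressors excluding the intercept
Xreg = randn(T, k);                % T-by-k matrix of regressors
X = [ones(T, 1), Xreg];            % T-by-(k+1) matrix X with intercept column
beta = [0.2; 1; -0.7];             % hypothetical population coefficients
y = X*beta + randn(T, 1);          % y = X*beta + e

bhat_formula   = (X'*X) \ (X'*y);  % solves (X'X) b = X'y, i.e. (X'X)^(-1) X'y
bhat_backslash = X \ y;            % least squares directly (QR-based)
u = y - X*bhat_backslash;          % residual vector, so that y = X*bhat + u
```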

Linear Regression in Matlab using regress
- One way to estimate linear regression models in Matlab is to use the regress function, the simplest form of which is b = regress(y,X)
- The output ‘b’ is a ((k + 1) × 1) vector of estimated parameter values, i.e. $\hat{\beta}$ in our previous notation
- The input ‘y’ is a column vector of data for the dependent variable, as in the vector $y$ above
- The input ‘X’ is a matrix of data for the explanatory variable(s), with the data for one variable in each column, i.e. as in the matrix $X$ above
- Specifying additional outputs for the regress function also allows it to compute some other quantities of interest, e.g. [b,bint,r] = regress(y,X) also returns 95% confidence intervals for the coefficients (bint) and the residuals (r) (see also the Matlab help)
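A minimal regress example, assuming the Statistics and Machine Learning Toolbox is installed (regress belongs to it) and using simulated data. Because regress does not add an intercept by itself, a column of ones is placed in the first column of X; bint holds one 95% confidence interval per coefficient (lower bound, upper bound) and r holds the residuals.

```matlab
% Sketch: regress with extra outputs (requires Statistics and Machine Learning Toolbox)
rng(6);                            % fix the random seed for reproducibility
T = 120;                           % sample size (arbitrary choice)
x = randn(T, 1);
y = 0.3 + 1.2*x + 0.5*randn(T, 1); % hypothetical data

X = [ones(T, 1), x];               % explicit column of ones for the intercept
[b, bint, r] = regress(y, X);      % estimates, 95% intervals, residuals
disp([b, bint])                    % each row: estimate, lower bound, upper bound
```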
Linear Regression in Matlab using fitlm
- The key advantages of regress are its speed and simplicity, especially when you only want the estimated coefficient values
- However, for more detailed regression analysis, the alternative fitlm function is generally a much better option
- The basic form of the function is: mdlname = fitlm(X,y)
- This creates a linear model object named ‘mdlname’ that contains extensive estimation results for the regression model
- The input ‘y’ is again a column vector of data for the dependent variable and ‘X’ is a matrix containing data for the explanatory variable(s), with one variable in each column
- Note: the order of the inputs ‘X’ and ‘y’ is reversed between regress and fitlm
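A corresponding fitlm sketch, again assuming the Statistics and Machine Learning Toolbox and simulated data. Note that X contains only the regressors: fitlm adds the intercept itself by default, so the model has three coefficients here.

```matlab
% Sketch: fitlm on simulated data (requires Statistics and Machine Learning Toolbox)
rng(7);                            % fix the random seed for reproducibility
T = 150;                           % sample size (arbitrary choice)
X = randn(T, 2);                   % two regressors, one per column (no ones column)
y = 1 + X*[0.5; -1] + randn(T, 1); % hypothetical data-generating process

mdl = fitlm(X, y);                 % note the input order: X first, then y
disp(mdl)                          % text summary of the estimation results
```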

Linear Regression in Matlab using fitlm
- A bit like a structure object, the linear model object contains many named fields, each containing specific quantities

Linear Regression in Matlab using fitlm
Some of the more commonly used fields include:
- Coefficients: a table containing the estimated coefficient values, their standard errors, t-statistics and p-values
- Residuals: a table of residuals, whose Raw column is the vector of $T$ residuals (the vector $u$ above)
- Fitted: vector of $T$ fitted/predicted values for $y$, computed as $\hat{y} = X\hat{\beta}$
- SSE, SST and SSR: the error sum of squares, total sum of squares and regression sum of squares respectively
- Rsquared: the standard (Ordinary) and Adjusted R-squared measures of goodness of fit or explanatory power
- If using the regress function instead, we would have to compute most of these quantities manually
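To see what fitlm computes for us, the sketch below reproduces these quantities by hand from OLS estimates, as one would have to after using regress (simulated data; the variable names are illustrative). It uses the standard formulas with $k$ regressors excluding the intercept: $R^2 = 1 - SSE/SST$ and adjusted $R^2 = 1 - \frac{SSE/(T-k-1)}{SST/(T-1)}$. With an intercept included, $SST = SSE + SSR$.

```matlab
% Sketch: regression statistics computed manually (base Matlab only)
rng(8);                            % fix the random seed for reproducibility
T = 200;                           % sample size (arbitrary choice)
x = randn(T, 1);
y = 0.1 + 0.9*x + randn(T, 1);     % hypothetical data

X = [ones(T, 1), x];
k = size(X, 2) - 1;                % number of regressors excluding the intercept
bhat = X \ y;
yhat = X*bhat;                     % fitted values
u = y - yhat;                      % residuals

SSE = sum(u.^2);                   % error (residual) sum of squares
SST = sum((y - mean(y)).^2);       % total sum of squares
SSR = sum((yhat - mean(y)).^2);    % regression sum of squares
R2 = 1 - SSE/SST;                  % ordinary R-squared
R2adj = 1 - (SSE/(T - k - 1)) / (SST/(T - 1));   % adjusted R-squared
```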

Linear Regression in Matlab using fitlm
- Matlab can also display a text summary of the more important quantities using the disp(mdlname) command, in a similar way to other programs like Stata, etc.

Linear Regression in Matlab using fitlm
- Although the linear model object contains a lot of information in a well-presented way, it takes a little more effort to access/index the numerical values directly
- Because it behaves mostly like a structure object, we can index each field using the ‘dot’ notation we saw earlier
- For example, suppose we want to create a numerical vector ‘bhat’ containing the estimated regression coefficients
- The estimated coefficients are stored in the table ‘Coefficients’, in the column ‘Estimate’, which can be indexed using bhat=mdlname.Coefficients.Estimate
- Likewise, to save the standard/ordinary R-squared value into ‘rsq’, we use rsq=mdlname.Rsquared.Ordinary
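A sketch of extracting numerical results from a fitted model with dot indexing (Statistics and Machine Learning Toolbox assumed; simulated data). Coefficients is a table, so its Estimate and SE columns come out as ordinary numeric vectors; Residuals is also a table, and its Raw column holds the ordinary residuals.

```matlab
% Sketch: indexing a linear model object (requires Statistics and Machine Learning Toolbox)
rng(9);                            % fix the random seed for reproducibility
T = 100;                           % sample size (arbitrary choice)
x = randn(T, 1);
y = 2 - 0.5*x + randn(T, 1);       % hypothetical data
mdl = fitlm(x, y);

bhat = mdl.Coefficients.Estimate;  % numeric vector of estimated coefficients
se   = mdl.Coefficients.SE;        % corresponding standard errors
rsq  = mdl.Rsquared.Ordinary;      % standard R-squared
u    = mdl.Residuals.Raw;          % raw residuals as a numeric vector
yhat = mdl.Fitted;                 % fitted values
```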

Intercept Terms in Linear Regression
- One key practical difference between the regress and fitlm functions is how they handle the intercept term (i.e. the $\beta_0$)
- regress does not include an intercept term automatically, so if an intercept should be included in the model then ‘X’ must contain a column of ones in the first column
- If ‘X’ contains the data for the $k$ explanatory variable(s) for $T$ time periods, then we can create a new $(T \times (k + 1))$ matrix X_int = [ones(T,1), X] to use with the regress function
- fitlm does include an intercept by default; to exclude the intercept term, use mdlname = fitlm(X,y,'Intercept',false)
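The sketch below, assuming the Statistics and Machine Learning Toolbox and simulated data with a non-zero intercept, checks that regress with an added column of ones and fitlm with its default intercept give the same estimates, and shows the 'Intercept', false option.

```matlab
% Sketch: intercept handling in regress vs fitlm (requires Statistics and Machine Learning Toolbox)
rng(10);                           % fix the random seed for reproducibility
T = 100;                           % number of time periods (arbitrary choice)
X = randn(T, 1);                   % data for k = 1 explanatory variable
y = 3 + 2*X + randn(T, 1);         % hypothetical model with a non-zero intercept

X_int = [ones(T, 1), X];           % add a column of ones for regress
b_regress = regress(y, X_int);     % [intercept; slope]
mdl = fitlm(X, y);                 % intercept included automatically
mdl_noint = fitlm(X, y, 'Intercept', false);   % slope only, no intercept

disp([b_regress, mdl.Coefficients.Estimate])   % the two columns should match
```

Without an intercept, the single slope estimate equals X \ y, which solves the no-intercept least-squares problem.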

Example: Capital Asset Pricing Model

- The capital asset pricing model (CAPM) implies the following relationship for assets, known as the security market line:

$$E[r_i - r_f] = \beta_i E[r_m - r_f]$$

where $r_i$, $r_m$ and $r_f$ are the returns on some particular asset/portfolio, the market portfolio and the risk-free asset
- This states that the expected excess return on a particular asset, $E[r_i - r_f]$, should equal the asset’s market β times the expected excess return on the market portfolio, $E[r_m - r_f]$

Example: Capital Asset Pricing Model

- It can be shown that the CAPM market β for a particular asset can be estimated using the slope coefficient, $\alpha_1$, in the following regression:

$$\bar{r}_{i,t} = \alpha_0 + \alpha_1 \bar{r}_{m,t} + e_t$$

where $\bar{r}_{i,t} \equiv r_{i,t} - r_{f,t}$ is the excess return on asset/portfolio $i$ (above the risk-free return) at time $t$ and $\bar{r}_{m,t} \equiv r_{m,t} - r_{f,t}$ is the excess return on the market portfolio at time $t$
- Therefore, using a sample of excess return data for an asset and the market portfolio, we can use linear regression to estimate the beta for a specific asset/portfolio
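For reference, a standard result for simple regression (not derived on these slides) is that the OLS slope equals the sample covariance of the two series divided by the sample variance of the regressor. For the CAPM regression this gives the following, where $m_i$ and $m_m$ (notation introduced here) denote the sample means of $\bar{r}_{i,t}$ and $\bar{r}_{m,t}$:

$$
\hat{\alpha}_1 = \frac{\sum_{t=1}^{T} (\bar{r}_{m,t} - m_m)(\bar{r}_{i,t} - m_i)}{\sum_{t=1}^{T} (\bar{r}_{m,t} - m_m)^2}
= \frac{\widehat{\mathrm{Cov}}(\bar{r}_{i,t}, \bar{r}_{m,t})}{\widehat{\mathrm{Var}}(\bar{r}_{m,t})}
$$

This is the sample counterpart of the covariance-over-variance expression for an asset's market β.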

Example: Capital Asset Pricing Model

- To illustrate this, we use a dataset containing monthly returns from 1926 to 2015 on a set of 17 portfolios of stocks that are formed based on industry
- The 17 industry classifications include cars, clothes, chemicals, financial, food, oil, retail, steel, transportation, etc.
- Taking the financial industry portfolio as an example, we start by computing the series of excess returns on the financial portfolio and the market portfolio above the risk-free asset using the gross returns: Finan_exc = Finan - riskfree and mkt_exc = mkt - riskfree

Example: Capital Asset Pricing Model
- Using these excess return series, we estimate the regression with the fitlm function (we could also use regress), Finan_reg = fitlm(mkt_exc,Finan_exc), and obtain the following regression line:

$$\bar{r}_{i,t} = -0.022 + 1.166\,\bar{r}_{m,t} + u_t$$

and thus an estimated β of 1.166. Since β > 0, returns on the portfolio of financial firms are positively correlated with those on the market, and since β > 1, the portfolio is riskier than the market in the CAPM sense (its returns are more sensitive to market movements)
- It is also interesting to note the R-squared of 0.8421, i.e. about 84% of the variation in monthly returns on the financial firm portfolio can be explained by the market
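An end-to-end sketch of these steps on simulated monthly returns. The lecture's industry-portfolio data are not reproduced here: every series below, including the true beta of 1.2, is made up. It reuses the slides' variable names, computes the excess returns, runs the CAPM regression with fitlm (Statistics and Machine Learning Toolbox assumed), and checks that the slope matches the covariance-over-variance formula.

```matlab
% Sketch: CAPM beta on SIMULATED returns in percent (not the lecture dataset)
rng(11);                                   % fix the random seed for reproducibility
T = 600;                                   % e.g. 50 years of monthly data
riskfree = 0.3 + 0.05*randn(T, 1);         % hypothetical risk-free return
mkt = riskfree + 0.6 + 4*randn(T, 1);      % hypothetical market return
Finan = riskfree + 1.2*(mkt - riskfree) + 2*randn(T, 1);   % true beta = 1.2

Finan_exc = Finan - riskfree;              % excess return on the portfolio
mkt_exc   = mkt - riskfree;                % excess return on the market

Finan_reg = fitlm(mkt_exc, Finan_exc);     % CAPM regression with an intercept
beta_hat = Finan_reg.Coefficients.Estimate(2);   % slope = estimated market beta
C = cov(mkt_exc, Finan_exc);               % 2x2 sample covariance matrix
beta_cov = C(1, 2) / var(mkt_exc);         % same estimate via Cov/Var
```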

Example: Capital Asset Pricing Model
- It is sometimes also useful to plot the estimated regression line on top of the data, to visualise how well the model fits the data
- We could do this manually, using the estimated regression coefficients to create a vector of estimated/fitted values and then plotting this vector on top of a scatter plot of the data
- However, if we used the fitlm command to estimate our regression model, then we can do this more easily using just the plot command
- Specifically, we can use plot(mdlname), where ‘mdlname’ is the name of the linear model object, e.g. plot(Finan_reg)
- We can then add labels to the axes and a title in the usual way, using the commands title, ylabel, etc.
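A sketch of the manual approach on simulated excess returns: estimate the coefficients, evaluate the fitted line on a grid of market excess returns, and draw it over a scatter plot using hold on. Only base Matlab graphics functions are used; the grid of 100 points and the line style are arbitrary choices.

```matlab
% Sketch: plotting a fitted regression line manually (simulated data)
rng(12);                                   % fix the random seed for reproducibility
T = 300;                                   % sample size (arbitrary choice)
mkt_exc = 4*randn(T, 1);                   % hypothetical market excess returns
Finan_exc = 1.1*mkt_exc + 2*randn(T, 1);   % hypothetical portfolio excess returns

b = [ones(T, 1), mkt_exc] \ Finan_exc;     % [intercept; slope]
xgrid = linspace(min(mkt_exc), max(mkt_exc), 100)';   % points along the x-axis
yfit = b(1) + b(2)*xgrid;                  % fitted values along the line

figure
scatter(mkt_exc, Finan_exc)                % the data
hold on
plot(xgrid, yfit, 'r-', 'LineWidth', 1.5)  % estimated regression line on top
hold off
xlabel('Excess return on market')
ylabel('Excess return on asset')
title('Estimated regression line')
```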
[Figure: Estimated regression line for Portfolio of Financial Firms. Horizontal axis: Excess return on market; vertical axis: Excess return on asset.]
Example: Capital Asset Pricing Model
- We can repeat this for the other 16 industry-based portfolios to obtain their market βs in the same way
- According to the CAPM, we should see a linear relationship between the β of an asset and its expected return (known as the security market line or SML)
- The SML has an intercept equal to the risk-free rate (at β = 0) and, at β = 1, it equals the expected return on the market portfolio
- We can plot the relationship between β and expected returns observed in the data and compare it with the theoretical predictions of the model (code for plots will be on Moodle)
- Any assets that lie above the SML are undervalued and any below the SML are overvalued
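A small sketch of this valuation rule, using made-up average returns and betas (none of these numbers come from the lecture's data). The SML value for each asset is the risk-free rate plus beta times the market risk premium; an asset whose average return exceeds its SML value plots above the line.

```matlab
% Sketch: classifying assets relative to the SML (all inputs hypothetical)
rf_bar  = 0.3;                     % average risk-free return (percent per month)
mkt_bar = 0.9;                     % average market return
beta    = [0.8; 1.0; 1.3];         % hypothetical betas for three assets
ret_bar = [0.85; 0.80; 1.20];      % hypothetical average returns

sml = rf_bar + (mkt_bar - rf_bar)*beta;   % SML: rf + beta*(E[rm] - rf)
above = ret_bar > sml;             % above the SML: undervalued according to CAPM
below = ret_bar < sml;             % below the SML: overvalued according to CAPM
disp([beta, ret_bar, sml, above])
```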
[Figure: Security Market Line and Betas for Industry Portfolio Data. Horizontal axis: Beta; vertical axis: Return.]

A Little More on Figures
- This figure is a good excuse to talk a bit more about plotting figures in Matlab; the complete code that generated this figure can be found in the script file on Moodle
- To generate the scatter plot we can use:
figure
scatter(beta, mean(data_all))
xlabel('Beta')
ylabel('Return')
title('Security Market Line and Betas for Industry Portfolio Data')
grid on
where ‘beta’ is a vector containing the estimated CAPM betas and ‘data_all’ is a matrix containing the original data for all assets
A Little More on Figures
- Then, to add the estimated SML, we start by using:
sml_betavals = 0:0.1:1.5;
which creates a vector of beta values from 0 to 1.5 at which we will plot the SML
- Then we calculate the corresponding points on the SML using:
sml_y = mean(riskfree) + mean(mkt - riskfree).*sml_betavals;
- Then we add a plot of the SML to the original figure using:
hold on
plot(sml_betavals, sml_y);
- Finally, we can make the figure look better by adding labels and adjusting the axis limits (see the script file on Moodle)

[Figure: Security Market Line and Betas for Industry Portfolio Data, with each point labelled by portfolio (Cars, Machn, Chems, Cnsum, Oil, Rtail, Finan, Food, Cnstr, Clths, Trans, FabPr, Durbl, Mines, Other, Steel, Utils and Market). Horizontal axis: Beta; vertical axis: Return.]
