BE368 Lecture 4

BE368: Finance Research Techniques Using Matlab
Lecture 4: Linear Regression
Mark Hallam
University of Essex
3 Nov 2022
Mark Hallam University of Essex

BE368: Lecture 4
Outline for Today’s Lecture
Linear Regression
- Concepts and definitions: sample and population regression functions, variables, errors and residuals
- Estimation and inference: estimating linear regression models, predicted values, alternative functional forms, tests of significance, R-squared and other regression statistics
- An application: the capital asset pricing model (CAPM)

Linear Regression Models
- Linear regression analysis concerns the estimation of, and inference for, linear regression models; the simplest case is the simple linear regression model:
$$y = \beta_0 + \beta_1 x + e$$
- $y$ is known as the dependent/explained variable or regressand, $x$ is the independent/explanatory variable or regressor, and $e$ is the error term, which has mean zero
- In such a model, the expected value of $y$ equals $\beta_0$ (the intercept) plus $\beta_1$ times the value of $x$, since $E[e] = 0$ (strictly, this requires $E[e \mid x] = 0$)
- This implies a linear relationship between the values of the dependent variable, $y$, and the explanatory variable, $x$
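As a minimal illustration of the population model, the sketch below simulates data from $y = \beta_0 + \beta_1 x + e$. The parameter values, the sample size and the use of standard normal draws for $x$ and $e$ are assumptions made purely for this example.

```matlab
% Simulate T observations from the simple linear regression model
% y = beta0 + beta1*x + e (all values below are illustrative assumptions)
rng(1);                    % fix the random seed so results are reproducible
T     = 10000;             % number of observations
beta0 = 1.0;               % assumed intercept
beta1 = 0.5;               % assumed slope
x = randn(T, 1);           % explanatory variable
e = randn(T, 1);           % error term with mean zero
y = beta0 + beta1*x + e;   % dependent variable

% Since E[e] = 0, the sample mean of y should be close to beta0 + beta1*mean(x)
fprintf('mean(y) = %.4f, beta0 + beta1*mean(x) = %.4f\n', ...
        mean(y), beta0 + beta1*mean(x));
```

The gap between the two printed numbers is just the sample mean of the simulated errors, which shrinks as T grows.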

Linear Regression Models
- The linearity assumption may seem restrictive, but linear models are directly applicable in many financial applications
- Additionally, we can define $y$ and/or $x$ to be non-linear transformations of some underlying variables, e.g. $x = \ln(z)$, to allow for more general relationships (the model only needs to be linear in the parameters)
- We can also generalise the model by considering multiple linear regression models, in which the value of $y$ may depend on the values of two or more regressors, $x_1, x_2, \ldots$
- For the general case with $k$ regressors, we have:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + e$$
where $x_j$ denotes the value of the $j$-th regressor
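To see how a transformation such as $x = \ln(z)$ allows a more general relationship while the model stays linear in the parameters, the sketch below builds $y$ from a hypothetical underlying variable z, with assumed values $\beta_0 = 1$ and $\beta_1 = 2$ and no error term. Each doubling of z then raises $y$ by the same amount, $\beta_1 \ln 2$.

```matlab
% Hypothetical underlying variable z, doubling at each step (illustrative only)
z = [1; 2; 4; 8; 16];
x = log(z);                % transformed regressor x = ln(z)

beta0 = 1;                 % assumed intercept
beta1 = 2;                 % assumed slope on ln(z)
y = beta0 + beta1*x;       % linear in x (and in the betas), non-linear in z

% Each doubling of z increases y by the same amount, beta1*log(2)
disp(diff(y));
```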

Time Series vs. Cross-Sectional Regression
- Linear regression models can be used in either a cross-sectional or a time-series context, both of which have numerous applications in finance
- In a cross-sectional context, the values of $y$ and $x_1, \ldots, x_k$ represent values for different observational units (e.g. different stocks, investors, firms) observed at the same point in time
- In a time-series context, the values of $y$ and $x_1, \ldots, x_k$ correspond to values for the same observational unit, but at different points in time
- For the purposes of this lecture, there are no major theoretical or practical differences between these two cases

Population vs. Sample Regression Functions
- In practice, we do not know the true (population) values of the parameters $\beta_0, \ldots, \beta_k$, so we must estimate them
- We do this using a sample of data containing matched sets of observations on the dependent and explanatory variable(s)
- For cross-sectional data, our sets of observations are denoted $(y_i, x_{1,i}, x_{2,i}, \ldots, x_{k,i})$ for $i = 1, \ldots, N$, where the index $i$ denotes the observational unit and there are $N$ units in total
- For time-series data, our sets of observations are denoted $(y_t, x_{1,t}, x_{2,t}, \ldots, x_{k,t})$ for $t = 1, \ldots, T$, where the index $t$ denotes the time period and there are $T$ periods in total

Population vs. Sample Regression Functions
- There are various methods available for estimating the unknown parameters from a sample of data, including least-squares methods, maximum likelihood, etc.
- Matlab's regress and fitlm functions both estimate linear models by ordinary least squares (OLS), but other methods, such as maximum likelihood, could also be programmed in Matlab if desired
- We denote the estimated parameters by $\hat{\beta}_0, \ldots, \hat{\beta}_k$, giving the estimated sample regression function:
$$y_t = \hat{\beta}_0 + \hat{\beta}_1 x_{1,t} + \ldots + \hat{\beta}_k x_{k,t} + u_t$$
where $u_t$ for $t = 1, \ldots, T$ are known as the residuals (note that we use time-series notation here, since all the examples in this lecture use time-series data)
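A minimal sketch of computing the OLS estimates and the residuals $u_t$ for a small, made-up time series with a single regressor. It uses Matlab's backslash operator, which solves the least-squares problem directly; the data values are purely illustrative. With an intercept included, the OLS residuals sum to zero and are orthogonal to each column of the regressor matrix.

```matlab
% Made-up sample of T = 5 observations (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
T = numel(y);

X    = [ones(T, 1), x];    % regressor matrix with a column of ones for the intercept
bhat = X \ y;              % OLS estimates [beta0_hat; beta1_hat]
u    = y - X*bhat;         % residuals u_t

disp(bhat');               % approximately 0.05 and 1.99
% With an intercept, OLS residuals sum to zero and are orthogonal to each column of X
disp(sum(u));
disp(X' * u);
```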

Applications of Linear Regression
- The size and sign of the sample estimates $\hat{\beta}_0, \ldots, \hat{\beta}_k$ are sometimes of direct interest (see e.g. the CAPM)
- They can often be interpreted as estimates of the partial effect of a change in each explanatory variable on the dependent variable, holding the other regressors fixed (i.e. causal effects, under suitable assumptions)
- Particularly for models intended for forecasting, we may want to compute the predicted/fitted values from the estimated regression model, usually denoted by $\hat{y}$
- More generally, we can use the estimated sample regression function to predict the value of the dependent variable $y_t$ for any values of the explanatory variables $x_{1,t}, \ldots, x_{k,t}$
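A short sketch of computing fitted values and a prediction at a new value of the regressor, again using made-up data; the new value x_new = 6 is hypothetical. Because the model includes an intercept, the fitted values have the same mean as $y$.

```matlab
% Made-up data (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
X    = [ones(numel(y), 1), x];
bhat = X \ y;                 % OLS estimates

yhat   = X * bhat;            % fitted values y_hat for the sample
x_new  = 6;                   % hypothetical new value of the regressor
y_pred = [1, x_new] * bhat;   % predicted y at x_new (0.05 + 1.99*6 = 11.99)
fprintf('Prediction at x = %g: %.2f\n', x_new, y_pred);
```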

Linear Regression in Matrix Notation
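- Given Matlab's use of matrix algebra, it is more convenient (and more compact) to express the general multiple linear regression model in matrix form
- For this we define the following vectors and matrices:
$$
\underset{(T \times 1)}{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix}, \quad
\underset{(T \times (k+1))}{X} = \begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{k,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{k,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,T} & x_{2,T} & \cdots & x_{k,T} \end{bmatrix}
$$
$$
\underset{(T \times 1)}{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_T \end{bmatrix}, \quad
\underset{((k+1) \times 1)}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\underset{((k+1) \times 1)}{\hat{\beta}} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix}
$$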
Linear Regression in Matrix Notation
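- Our population regression function for each observation in our sample can then be written as:
$$
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_T \end{bmatrix} =
\begin{bmatrix} 1 & x_{1,1} & x_{2,1} & \cdots & x_{k,1} \\ 1 & x_{1,2} & x_{2,2} & \cdots & x_{k,2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1,T} & x_{2,T} & \cdots & x_{k,T} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} +
\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_T \end{bmatrix}
$$
or:
$$y = X\beta + e$$
and the corresponding sample regression function as:
$$y = X\hat{\beta} + u$$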
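In this notation the OLS estimator is the standard $\hat{\beta} = (X'X)^{-1}X'y$. The sketch below uses made-up regressors and assumed "true" parameters to generate noise-free data $y = X\beta$. It then checks that both the textbook formula and Matlab's backslash operator recover $\beta$. In practice X\y is usually preferred to explicitly inverting X'*X, since it is numerically more stable.

```matlab
% Made-up regressors for T = 6 periods and k = 2 explanatory variables
x1 = [1; 2; 3; 4; 5; 6];
x2 = [2; 1; 4; 3; 6; 5];
T  = numel(x1);
X  = [ones(T, 1), x1, x2];              % (T x (k+1)) matrix, first column of ones

beta = [1; 2; -0.5];                    % assumed "true" parameters for this sketch
y    = X * beta;                        % noise-free data, so e = 0 here

bhat_formula   = (X' * X) \ (X' * y);   % (X'X)^{-1} X'y without forming the inverse
bhat_backslash = X \ y;                 % least-squares solution via backslash

disp([bhat_formula, bhat_backslash]);   % both columns should equal beta
```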

Linear Regression in Matlab using regress
- One way to estimate linear regression models in Matlab is with the regress function, the simplest form of the command being b = regress(y,X)
- The output 'b' is a ((k+1) × 1) vector of estimated parameter values, i.e. $\hat{\beta}$ in our previous notation
- The input 'y' is a column vector of data for the dependent variable, as in the vector $y$ above
- The input 'X' is a matrix of data for the explanatory variable(s), with the data for one variable in each column, as in the matrix $X$ above
- Specifying additional outputs for the regress function allows it to compute some other quantities of interest, e.g. [b,bint,r] = regress(y,X) also returns 95% confidence intervals for the coefficients (bint) and the residuals (r) (see also the Matlab help)
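A short sketch of regress with its additional outputs, using made-up data. It assumes the Statistics and Machine Learning Toolbox (which provides regress) is installed. The explicit column of ones is needed because regress does not add an intercept itself. With five outputs, regress also returns intervals for the residuals and a stats vector whose first element is the R-squared.

```matlab
% Made-up data (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
X = [ones(numel(y), 1), x];        % include a column of ones for the intercept

% b: coefficients, bint: 95% confidence intervals, r: residuals,
% rint: residual intervals, stats: [R^2, F-statistic, p-value, error variance]
[b, bint, r, rint, stats] = regress(y, X);

disp(b');          % estimated coefficients
disp(bint);        % each row is [lower, upper] for one coefficient
fprintf('R-squared: %.4f\n', stats(1));
```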
Linear Regression in Matlab using fitlm
- The key advantages of regress are its speed and simplicity, especially when you only want the estimated coefficient values
- However, for more detailed regression analysis the alternative fitlm function is generally a much better option
- The basic form of the function is: mdlname = fitlm(X,y)
- This creates a linear model object named 'mdlname' that contains extensive estimation results for the regression model
- The input 'y' is again a column vector of data for the dependent variable, and 'X' is a matrix containing data for the explanatory variable(s), with one variable in each column
- Note: the order of the inputs 'X' and 'y' is reversed between regress and fitlm
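A minimal sketch of the same regression with fitlm, again assuming the Statistics and Machine Learning Toolbox and using made-up data. Note that x is passed first and y second, and that no column of ones is needed because fitlm adds the intercept itself.

```matlab
% Made-up data (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];

mdl = fitlm(x, y);      % X first, then y; intercept included automatically
disp(mdl);              % text summary: estimates, SEs, t-stats, p-values, R-squared
```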

- Somewhat like a structure, the linear model object contains many named properties (fields), each holding specific quantities

Some of the more commonly used fields include:
- Coefficients: a table containing the estimated coefficient values, their standard errors, t-statistics and p-values
- Residuals: a table of residuals, whose column Residuals.Raw is the vector of $T$ residuals (the vector $u$ above)
- Fitted: vector of $T$ fitted/predicted values for $y$, computed as $\hat{y} = X\hat{\beta}$
- SSE, SST and SSR: the error sum of squares, total sum of squares and regression sum of squares, respectively
- Rsquared: the ordinary and adjusted R-squared measures of goodness of fit or explanatory power
- If using the regress function instead, we would have to compute most of these quantities manually
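Because regress returns only a few of these quantities, here is a sketch of computing them by hand from the OLS output, using made-up data. The definitions follow the usual conventions: SSE from the residuals, SST around the mean of $y$, SSR from the fitted values. With an intercept in the model, SST = SSE + SSR, and these should match the corresponding fitlm fields.

```matlab
% Made-up data (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
T = numel(y);
k = 1;                                  % number of regressors (excluding intercept)

X    = [ones(T, 1), x];
bhat = X \ y;                           % OLS estimates (as regress would return)
yhat = X * bhat;                        % fitted values
u    = y - yhat;                        % residuals

SSE = sum(u.^2);                        % error (residual) sum of squares
SST = sum((y - mean(y)).^2);            % total sum of squares
SSR = sum((yhat - mean(y)).^2);         % regression (explained) sum of squares

R2     = 1 - SSE/SST;                   % ordinary R-squared
R2_adj = 1 - (1 - R2)*(T - 1)/(T - k - 1);   % adjusted R-squared
fprintf('R2 = %.4f, adjusted R2 = %.4f\n', R2, R2_adj);
```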

- Matlab can also display a text summary of the most important quantities using the disp(mdlname) command, in a similar way to other programs such as Stata

- Although the linear model object presents a lot of information in a well-organised way, it takes a little more effort to access/index the numerical values directly
- Because it behaves mostly like a structure, we can index each field using the 'dot' notation we saw earlier
- For example, suppose we want to create a numerical vector 'bhat' containing the estimated regression coefficients
- The estimated coefficients are stored in the field 'Coefficients', in the sub-field 'Estimate', which can be indexed using bhat = mdlname.Coefficients.Estimate
- Likewise, to save the standard/ordinary R-squared value into 'rsq' we use rsq = mdlname.Rsquared.Ordinary
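A sketch of pulling numerical values out of a fitlm model object, assuming the Statistics and Machine Learning Toolbox and using made-up data. The column names Estimate, SE, tStat and pValue are those of the Coefficients table. Residuals is also a table, whose Raw column holds the ordinary residuals.

```matlab
% Made-up data (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
mdl = fitlm(x, y);

bhat    = mdl.Coefficients.Estimate;   % estimated coefficients (numeric vector)
se      = mdl.Coefficients.SE;         % standard errors
tstat   = mdl.Coefficients.tStat;      % t-statistics (Estimate ./ SE)
pval    = mdl.Coefficients.pValue;     % p-values for H0: coefficient = 0
rsq     = mdl.Rsquared.Ordinary;       % ordinary R-squared
rsq_adj = mdl.Rsquared.Adjusted;       % adjusted R-squared
u       = mdl.Residuals.Raw;           % raw residuals as a numeric vector
yhat    = mdl.Fitted;                  % fitted values
```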

Intercept Terms in Linear Regression
- One key practical difference between the regress and fitlm functions is how they handle the intercept term (i.e. $\beta_0$)
- regress does not include an intercept term automatically, so if an intercept should be included in the model then 'X' must contain a column of ones as its first column
- If 'X' contains the data for the $k$ explanatory variable(s) for $T$ time periods, then we can create a new $(T \times (k+1))$ matrix X_int = [ones(T,1), X] to use with the regress function
- fitlm does include an intercept by default; to exclude the intercept term, use mdlname = fitlm(X,y,'Intercept',false)
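The sketch below uses made-up data and assumes the Statistics and Machine Learning Toolbox. It compares the two functions with and without an intercept. regress on X_int and fitlm's default both fit the intercept model. regress on x alone and fitlm with 'Intercept', false both fit a line through the origin.

```matlab
% Made-up data (illustrative only)
x = [1; 2; 3; 4; 5];
y = [2.1; 3.9; 6.2; 7.8; 10.1];
T = numel(y);

% Model with an intercept
X_int   = [ones(T, 1), x];                 % add a column of ones for regress
b_reg   = regress(y, X_int);               % [intercept; slope]
mdl     = fitlm(x, y);                     % fitlm adds the intercept itself
b_fitlm = mdl.Coefficients.Estimate;

% Model without an intercept (regression line forced through the origin)
b_reg0   = regress(y, x);                  % no column of ones => no intercept
mdl0     = fitlm(x, y, 'Intercept', false);
b_fitlm0 = mdl0.Coefficients.Estimate;     % a single slope coefficient

disp([b_reg, b_fitlm]);
disp([b_reg0, b_fitlm0]);
```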

Example: Capital Asset Pricing Model
- The capital asset pricing model (CAPM) implies the following relationship for assets, known as the security market line:
$$E[r_i - r_f] = \beta_i \, E[r_m - r_f]$$
where $r_i$, $r_m$ and $r_f$ are the returns on some particular asset/portfolio, the market portfolio and the risk-free asset, respectively
- This states that the expected excess return on a particular asset, $E[r_i - r_f]$, should equal the asset's market $\beta$ times the expected excess return on the market portfolio, $E[r_m - r_f]$
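As a purely numerical illustration of the security market line, suppose a hypothetical expected market excess return of 0.6% per month (an assumed value, not an estimate from the lecture's data). Assets with betas of 0.5, 1 and 1.5 would then have expected excess returns of 0.3%, 0.6% and 0.9% per month:

```matlab
% Hypothetical inputs (illustrative only, not estimates from the lecture data)
mkt_premium = 0.6;              % assumed E[r_m - r_f], % per month
betas       = [0.5, 1.0, 1.5];  % hypothetical asset betas

% Security market line: E[r_i - r_f] = beta_i * E[r_m - r_f]
exp_excess = betas * mkt_premium;
disp(exp_excess);               % 0.3, 0.6 and 0.9 (% per month)
```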

- It can be shown that the CAPM market $\beta$ for a particular asset can be estimated by the slope coefficient, $\alpha_1$, in the following regression:
$$\bar{r}_{i,t} = \alpha_0 + \alpha_1 \bar{r}_{m,t} + e_t$$
where $\bar{r}_{i,t} \equiv r_{i,t} - r_{f,t}$ is the excess return on asset/portfolio $i$ (above the risk-free return) at time $t$ and $\bar{r}_{m,t} \equiv r_{m,t} - r_{f,t}$ is the excess return on the market portfolio at time $t$
- Therefore, using a sample of excess return data for an asset and the market portfolio, we can use linear regression to estimate the beta of a specific asset/portfolio
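With a single regressor and an intercept, the OLS slope equals the sample covariance of the two series divided by the sample variance of the regressor. So the estimated $\alpha_1$ is exactly the familiar "Cov/Var" definition of beta. The sketch checks this on simulated excess returns. The sample size, the true beta of 1.2 and the volatilities are assumptions for illustration only.

```matlab
% Simulated monthly excess returns (all parameter values are assumptions)
rng(2);
T = 600;
mkt_exc   = 0.5 + 4*randn(T, 1);                  % market excess returns (%)
asset_exc = 0.1 + 1.2*mkt_exc + 2*randn(T, 1);    % asset with true beta of 1.2

% Regression estimate of [alpha0; alpha1]
alpha_hat = [ones(T, 1), mkt_exc] \ asset_exc;

% Covariance-based beta: Cov(asset, market) / Var(market)
C = cov([asset_exc, mkt_exc]);                    % 2x2 sample covariance matrix
beta_cov = C(1, 2) / var(mkt_exc);

fprintf('Regression slope: %.4f, Cov/Var beta: %.4f\n', alpha_hat(2), beta_cov);
```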

- To illustrate this we use a dataset containing monthly returns from 1926 to 2015 on a set of 17 portfolios of stocks formed on the basis of industry
- The 17 industry classifications include cars, clothes, chemicals, financial, food, oil, retail, steel, transportation, etc.
- Taking the financial industry portfolio as an example, we start by computing the series of excess returns on the financial portfolio and on the market portfolio (above the risk-free rate) from the raw returns: Finan_exc = Finan - riskfree and mkt_exc = mkt - riskfree
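The same subtraction can be done for all portfolios at once. The sketch below assumes a hypothetical T × N matrix port_ret of portfolio returns (one portfolio per column) and a T × 1 vector riskfree. Matlab's implicit expansion (R2016b or later) subtracts the risk-free rate from every column. The numbers are made up.

```matlab
% Made-up returns (%) for T = 3 months and N = 2 hypothetical portfolios
port_ret = [ 1.2, 0.8;
            -0.5, 0.3;
             2.0, 1.1];
riskfree = [0.3; 0.3; 0.2];           % risk-free rate for each month (%)

port_exc = port_ret - riskfree;       % excess returns, column by column
disp(port_exc);
```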

- Using these excess return series, we estimate the regression with the fitlm function (we could also use regress), Finan_reg = fitlm(mkt_exc,Finan_exc), and obtain the following estimated regression line:
$$\bar{r}_{i,t} = -0.022 + 1.166\,\bar{r}_{m,t} + u_t$$
and thus a $\beta$ of 1.166. Since $\beta > 0$, returns on the portfolio of financial firms are positively correlated with those on the market, and since $\beta > 1$, the portfolio carries more systematic (market) risk than the market portfolio itself
- It is also interesting to note the R-squared of 0.8421, i.e. 84% of the variation in monthly excess returns on the financial firm portfolio can be explained by the market
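In a regression with one regressor and an intercept, such as the CAPM regression, the R-squared equals the squared sample correlation between the dependent variable and the regressor. A sketch that checks this on simulated excess returns (the parameter values are assumptions for illustration):

```matlab
% Simulated monthly excess returns (parameter values are assumptions)
rng(3);
T = 600;
mkt_exc   = 0.5 + 4*randn(T, 1);
asset_exc = 1.2*mkt_exc + 2*randn(T, 1);

% R-squared from the OLS residuals
X  = [ones(T, 1), mkt_exc];
u  = asset_exc - X*(X \ asset_exc);
R2 = 1 - sum(u.^2) / sum((asset_exc - mean(asset_exc)).^2);

% Squared correlation between the two series
R    = corrcoef(asset_exc, mkt_exc);   % 2x2 correlation matrix
rho2 = R(1, 2)^2;

fprintf('R-squared: %.4f, squared correlation: %.4f\n', R2, rho2);
```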

- It is sometimes also useful to plot the estimated regression line on top of the data, to visualise how well the model fits the data
- We could do this manually, using the estimated regression coefficients to create a vector of fitted values and then plotting this vector on top of a scatter plot
- However, if we used the fitlm command to estimate our regression model, then we can do this more easily using just the plot command
- Specifically, we can use plot(mdlname), where 'mdlname' is the name of the linear model object, e.g. plot(Finan_reg)
- We can then add labels to the axes and a title in the usual way, using the commands title, xlabel, ylabel, etc.
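A sketch of the manual approach (rather than plot(mdlname)), with simulated data standing in for the excess return series; the parameter values are assumptions. The fitted line is evaluated on a sorted grid of market excess returns so that plot draws a single straight line. The figure is created invisible here only so the snippet also runs without a display.

```matlab
% Simulated excess returns (assumed values, illustrative only)
rng(4);
T = 200;
mkt_exc   = 0.5 + 4*randn(T, 1);
asset_exc = 1.2*mkt_exc + 2*randn(T, 1);

% Estimate the regression and build the fitted line on a sorted grid
bhat   = [ones(T, 1), mkt_exc] \ asset_exc;
x_line = linspace(min(mkt_exc), max(mkt_exc), 100)';
y_line = bhat(1) + bhat(2)*x_line;

% Scatter plot of the data with the estimated regression line on top
fig = figure('Visible', 'off');     % drop 'Visible','off' to display the figure
scatter(mkt_exc, asset_exc);
hold on
plot(x_line, y_line, 'r-', 'LineWidth', 1.5);
hold off
xlabel('Excess return on market')
ylabel('Excess return on asset')
title('Estimated regression line (simulated data)')
```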
[Figure: Estimated regression line for Portfolio of Financial Firms (x-axis: Excess return on market; y-axis: Excess return on asset)]
- We can repeat this for the other 16 industry-based portfolios to obtain their market $\beta$s in the same way
- According to the CAPM, we should see a linear relationship between the $\beta$ of an asset and its expected return (known as the security market line, or SML)
- The line has an intercept equal to the risk-free rate (at $\beta = 0$), and at $\beta = 1$ it equals the expected return on the market portfolio
- We can plot the relationship between $\beta$ and expected returns observed in the data and compare it with the theoretical predictions of the model (code for the plots will be on Moodle)
- According to the CAPM, any assets that lie above the SML are undervalued and any below the SML are overvalued
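Rather than calling fitlm once per portfolio, all the CAPM betas can be obtained in one step, because the backslash operator solves a separate least-squares problem for each column of the right-hand side. The sketch uses simulated data for three hypothetical portfolios with assumed true betas. The matrix name port_exc is hypothetical; with the real data it would hold the T × 17 industry excess returns.

```matlab
% Simulated excess returns: T months, 3 hypothetical portfolios (assumed betas)
rng(5);
T = 600;
true_beta = [0.8, 1.0, 1.2];
mkt_exc   = 0.5 + 4*randn(T, 1);
port_exc  = mkt_exc * true_beta + 2*randn(T, numel(true_beta));   % T x 3

% One regression per column: row 1 holds the intercepts, row 2 the betas
coefs    = [ones(T, 1), mkt_exc] \ port_exc;
beta_hat = coefs(2, :);

% Equivalent loop using fitlm (requires the Statistics and Machine Learning Toolbox):
% for j = 1:size(port_exc, 2)
%     mdl_j = fitlm(mkt_exc, port_exc(:, j));
%     beta_hat(j) = mdl_j.Coefficients.Estimate(2);
% end
disp(beta_hat);
```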
[Figure: Security Market Line and Betas for Industry Portfolio Data (x-axis: Beta; y-axis: Return)]

A Little More on Figures
- This figure is a good excuse to talk a bit more about plotting figures in Matlab; the complete code that generated this figure can be found in the script file on Moodle
- To generate the scatter plot we can use:

    figure
    scatter(beta, mean(data_all))
    xlabel('Beta')
    ylabel('Return')
    title('Security Market Line and Betas for Industry Portfolio Data')
    grid on

where 'beta' is a vector containing the estimated CAPM betas and 'data_all' is a matrix containing the original return data for all assets, one asset per column (so that mean(data_all) gives each asset's average return)
A Little More on Figures
- Then to add the estimated SML we start by using:

    sml_betavals = 0:0.1:1.5;

which creates a vector of beta values from 0 to 1.5 at which we will plot the SML
- Then we calculate the corresponding points on the SML using:

    sml_y = mean(riskfree) + mean(mkt - riskfree).*sml_betavals;

- Then, we add a plot of the SML to the original figure using:

    hold on
    plot(sml_betavals, sml_y);

- Finally, we can make the figure look better by adding labels and adjusting the axis limits (see the script file on Moodle)
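One simple way to see which assets lie above or below the estimated SML is to compare each asset's average return with the SML value at its estimated beta. The sketch uses small made-up inputs. In the lecture's setting, beta, mean(data_all), riskfree and mkt would be the real series, and mean_ret is a hypothetical name for the assets' average returns.

```matlab
% Made-up inputs (illustrative only): estimated betas and average returns (%)
beta     = [0.8, 1.0, 1.2];
mean_ret = [0.9, 1.05, 1.1];          % hypothetical average returns
riskfree = [0.2; 0.4];                % risk-free returns (%)
mkt      = [0.8; 1.2];                % market returns (%)

% SML value at each asset's beta: mean(rf) + beta * mean(rm - rf)
sml_at_beta = mean(riskfree) + mean(mkt - riskfree).*beta;

gap = mean_ret - sml_at_beta;         % positive => above the SML
undervalued = gap > 0;                % above the SML (per the CAPM)
overvalued  = gap < 0;                % below the SML
disp(gap);
```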

[Figure: Security Market Line and Betas for Industry Portfolio Data, with each point labelled by its industry (Cars, Machn, Chems, Cnsum, Oil, Rtail, Finan, Food, Cnstr, Clths, Trans, FabPr, Durbl, Mines, Other, Steel, Utils) and the Market portfolio (x-axis: Beta; y-axis: Return)]
