Regression Analysis

A regression analysis measures how changes in one variable, called the dependent or explained variable, can be explained by changes in one or more other variables, called the independent or explanatory variables. It does so by estimating an equation (e.g., a linear regression model) whose parameters describe the relationship.

A scatter plot is a visual representation of the relationship between the dependent variable and a given independent variable. It uses a standard two-dimensional graph where the values of the dependent, or Y, variable are on the vertical axis and those of the independent, or X, variable are on the horizontal axis.

Lockup (yrs)    Returns (%) per year      Average Return (%)
 5              10   14   14   15   12    13
 6              17   12   15   16   10    14
 7              16   19   19   13   13    16
 8              15   20   19   15   16    17
 9              21   20   16   20   18    19
10              20   17   21   23   19    20
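The Average Return column is the conditional mean of the returns for each lockup period, which is the quantity the regression line is meant to approximate. A minimal Python sketch that recomputes it, assuming the table is read as five yearly returns per lockup period (the names `returns_by_lockup` and `conditional_means` are illustrative, not from the source):

```python
# Lockup period (years) -> five yearly returns (%), read row by row from the table.
returns_by_lockup = {
    5: [10, 14, 14, 15, 12],
    6: [17, 12, 15, 16, 10],
    7: [16, 19, 19, 13, 13],
    8: [15, 20, 19, 15, 16],
    9: [21, 20, 16, 20, 18],
    10: [20, 17, 21, 23, 19],
}


def conditional_means(groups):
    """Average the dependent variable within each value of the independent variable."""
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


average_return = conditional_means(returns_by_lockup)
for lockup, avg in average_return.items():
    print(f"lockup = {lockup:2d} yrs  ->  average return = {avg:.1f}%")
```

The averages rise from 13% to 20% as the lockup period lengthens, which is the upward pattern a positive slope coefficient would capture.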

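E(return | lockup period) = B_0 + B_1 × (lockup period)

Or more generally:

$$Y_i = B_0 + B_1 X_i + \varepsilon_i$$

where $B_0$ is the intercept coefficient, $B_1$ is the slope coefficient, and $\varepsilon_i$ is the error term.

The estimated regression, using the sample coefficients $b_0$ and $b_1$ and the residual $e_i$:

$$Y_i = b_0 + b_1 X_i + e_i$$

The coefficients are chosen to:

$$\text{minimize} \; \sum e_i^2 = \sum \left[ Y_i - (b_0 + b_1 X_i) \right]^2$$

[Figure: observations $Y_i = b_0 + b_1 X_i + e_i$ scattered around the fitted line $\hat{Y}_i = b_0 + b_1 X_i$, with the residuals $e_1, \dots, e_4$ shown as vertical distances from the line; horizontal axis X.]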
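For a single independent variable, the least-squares minimization has a well-known closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the sample means. The sketch below applies it to the lockup data from this section, assuming five yearly returns per lockup period; the function names `ols_fit` and `sum_squared_residuals` are illustrative choices, not from the source.

```python
def ols_fit(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals of y = b0 + b1*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope coefficient
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Objective that OLS minimizes: sum of [Y_i - (b0 + b1*X_i)]^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Lockup data from this section: lockup years -> yearly returns (%).
returns_by_lockup = {
    5: [10, 14, 14, 15, 12], 6: [17, 12, 15, 16, 10], 7: [16, 19, 19, 13, 13],
    8: [15, 20, 19, 15, 16], 9: [21, 20, 16, 20, 18], 10: [20, 17, 21, 23, 19],
}
x = [lockup for lockup, ys in returns_by_lockup.items() for _ in ys]
y = [r for ys in returns_by_lockup.values() for r in ys]

b0, b1 = ols_fit(x, y)
print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")  # b0 = 5.571, b1 = 1.457
```

Under these assumptions, each additional year of lockup is associated with roughly 1.46 percentage points of extra average annual return. Moving either coefficient away from the fitted value increases `sum_squared_residuals`.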


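48. You are conducting an ordinary least squares regression of the returns on stocks Y and X as $Y = a + bX + \varepsilon$, based on the past three years of daily adjusted closing price data. Prior to conducting the regression, you calculated the following information from the data:

Sample covariance                 0.000181
Sample variance of stock X        0.000308
Sample variance of stock Y        0.000525
Sample mean return of stock X     -0.03%
Sample mean return of stock Y     0.03%

What is the slope of the resulting regression line?

A. 0.35
B. 0.45
C. 0.59
D. 0.77

Answer: C. The slope is the sample covariance divided by the sample variance of the independent variable:

$$b = \frac{\mathrm{Cov}(X, Y)}{\mathrm{Var}(X)} = \frac{0.000181}{0.000308} = 0.5877$$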
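The arithmetic behind question 48 can be checked in a few lines. This is a sketch under stated assumptions: the 0.03% figure is taken to be the sample mean return of stock Y (its label does not survive in the text), and the variable names are illustrative.

```python
import math

cov_xy = 0.000181   # sample covariance of X and Y
var_x = 0.000308    # sample variance of stock X
var_y = 0.000525    # sample variance of stock Y
mean_x = -0.0003    # -0.03%
mean_y = 0.0003     # 0.03%, ASSUMED to be the sample mean return of stock Y

slope = cov_xy / var_x                            # b = Cov(X, Y) / Var(X)
intercept = mean_y - slope * mean_x               # a = mean(Y) - b * mean(X)
correlation = cov_xy / math.sqrt(var_x * var_y)   # sample correlation of X and Y

print(f"slope b       = {slope:.4f}")        # 0.5877
print(f"intercept a   = {intercept:.6f}")
print(f"correlation r = {correlation:.4f}")  # 0.4501
```

The correlation comes out at about 0.45, the value in choice B, so confusing the slope with the correlation coefficient leads to that option. The two differ by the ratio of the standard deviations of Y and X.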

Example

Suppose the annual returns of stock A and stock B are jointly normally distributed, the marginal distribution of each stock has mean 2% and standard deviation 10%, and the correlation is 0.9. What is the expected annual return of stock A if the annual return of stock B is 3%?

A. 2%
B. 2.9%
C. 4.7%
D. 1.1%

Answer: B
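The answer follows from the conditional mean of a bivariate normal distribution, which is itself a regression line: E(A | B = b) = mean_A + correlation × (sd_A / sd_B) × (b − mean_B). A minimal sketch of that calculation, with an illustrative function name that is not from the source:

```python
def conditional_mean(mu_a, mu_b, sigma_a, sigma_b, rho, b_value):
    """E(A | B = b_value) when A and B are jointly normal."""
    slope = rho * sigma_a / sigma_b      # regression slope of A on B
    return mu_a + slope * (b_value - mu_b)


expected_a = conditional_mean(
    mu_a=0.02, mu_b=0.02, sigma_a=0.10, sigma_b=0.10, rho=0.9, b_value=0.03
)
print(f"{expected_a:.1%}")  # 2.9%
```

Because both stocks have the same standard deviation, the slope equals the correlation, 0.9, so a return on B that is 1 percentage point above its mean raises the expected return on A by 0.9 percentage points.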

Linear regression requires a number of assumptions. Most of the major assumptions pertain to the regression model's residual term (i.e., error term). Three key assumptions are as follows:

1. The expected value of the error term, conditional on the independent variable, is zero: $E(\varepsilon_i \mid X_i) = 0$.
2. All (X, Y) observations are independently and identically distributed (i.i.d.).
3. It is unlikely that large outliers will be observed in the data. Large outliers have the potential to create misleading regression results.

Additional assumptions include:

4. A linear relationship exists between the dependent and independent variable.
5. The model is correctly specified in that it includes the appropriate independent variable and does not omit variables.
6. The independent variable is uncorrelated with the error terms.
7. The variance of $\varepsilon_i$ is constant for all $X_i$: $\mathrm{Var}(\varepsilon_i \mid X_i) = \sigma^2$.
8. No serial correlation of the error terms exists.
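Some of these assumptions can be examined informally once a regression has been fitted. The sketch below is a crude illustration, not a formal test. It simulates data that satisfy the assumptions (the true intercept 2.0, slope 0.5, sample size and seed are all hypothetical), fits the line, and then looks at two diagnostics: the ratio of residual variances for high-X versus low-X observations (assumption 7) and the lag-1 autocorrelation of the residuals (assumption 8). OLS residuals sum to zero and are uncorrelated with X by construction, so assumptions 1 and 6 cannot be checked this way.

```python
import random


def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for a single independent variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def lag1_autocorrelation(values):
    """Correlation between each value and the one immediately before it."""
    m = sum(values) / len(values)
    num = sum((values[t] - m) * (values[t - 1] - m) for t in range(1, len(values)))
    den = sum((v - m) ** 2 for v in values)
    return num / den


# Hypothetical data that satisfy the assumptions: i.i.d. errors with constant variance.
rng = random.Random(42)
n = 500
x = [rng.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Assumption 7: residual variance should be similar for low and high values of X.
order = sorted(range(n), key=lambda i: x[i])
low = [residuals[i] for i in order[: n // 2]]
high = [residuals[i] for i in order[n // 2:]]
variance_ratio = sample_variance(high) / sample_variance(low)

# Assumption 8: residuals in observation order should show little autocorrelation.
rho1 = lag1_autocorrelation(residuals)

print(f"variance ratio (high X / low X): {variance_ratio:.2f}")
print(f"lag-1 autocorrelation:           {rho1:.3f}")
```

A variance ratio far from 1 would hint at heteroskedasticity, and a lag-1 autocorrelation far from 0 would hint at serial correlation. What counts as "far" depends on the sample size and is left to a formal test.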

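[Figure: the fitted line $\hat{Y}_i = b_0 + b_1 X_i$, with the intercept $b_0$ and the sample mean $\bar{Y}$ marked on the Y axis. For a single observation, the total deviation $Y_i - \bar{Y}$ is split into the explained part $\hat{Y}_i - \bar{Y}$ and the residual $Y_i - \hat{Y}_i$.]

Total sum of squares:

$$TSS = \sum (Y_i - \bar{Y})^2$$

Explained sum of squares:

$$ESS = \sum (\hat{Y}_i - \bar{Y})^2$$

Sum of squared residuals:

$$SSR = \sum (Y_i - \hat{Y}_i)^2$$

Coefficient of determination:

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

For a regression with one independent variable, the squared sample correlation between X and Y equals the coefficient of determination: $r^2 = R^2$.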
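The decomposition can be verified numerically. The sketch below fits the lockup data from this section (assuming five yearly returns per lockup period) and checks that TSS = ESS + SSR, that both expressions for R-squared agree, and that R-squared equals the squared sample correlation. Variable names are illustrative.

```python
import math

# Lockup data from this section: lockup years -> yearly returns (%).
returns_by_lockup = {
    5: [10, 14, 14, 15, 12], 6: [17, 12, 15, 16, 10], 7: [16, 19, 19, 13, 13],
    8: [15, 20, 19, 15, 16], 9: [21, 20, 16, 20, 18], 10: [20, 17, 21, 23, 19],
}
x = [lockup for lockup, ys in returns_by_lockup.items() for _ in ys]
y = [r for ys in returns_by_lockup.values() for r in ys]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation in Y
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # variation explained by the line
ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # variation left in the residuals

r_squared = ess / tss
r = s_xy / math.sqrt(s_xx * tss)                         # sample correlation of X and Y

print(f"TSS = {tss:.2f}, ESS = {ess:.2f}, SSR = {ssr:.2f}")  # 331.50, 185.79, 145.71
print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")          # both 0.5604
```

Under these assumptions, the lockup period explains about 56% of the variation in returns.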

The standard error of the regression (SER) measures the degree of variability of the actual Y-values relative to the estimated Y-values from a regression equation. The SER gauges the "fit" of the regression line. The smaller the standard error, the better the fit.

The SER is the standard deviation of the error terms in the regression. As such, SER is also referred to as the standard error of the residual, or the standard error of estimate (SEE).

In some regressions, the relationship between the independent and dependent variables is very strong (e.g., the relationship between 10-year Treasury bond yields and mortgage rates). In other cases, the relationship is much weaker (e.g., the relationship between stock returns and inflation). SER will be low (relative to total variability) if the relationship is very strong and high if the relationship is weak.
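As a rough illustration, the SER can be computed from the residuals and compared with the overall variability of Y. The sketch assumes the usual convention for a single independent variable, SER = sqrt(SSR / (n − 2)), which is not stated in the text above, and uses the lockup data from this section read as five yearly returns per lockup period. The function name `standard_error_of_regression` is illustrative.

```python
import math


def standard_error_of_regression(x, y):
    """Return (SER, sample std dev of Y) for a one-variable OLS fit.

    Assumes SER = sqrt(SSR / (n - 2)): two degrees of freedom are used
    up by the estimated intercept and slope.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return math.sqrt(ssr / (n - 2)), math.sqrt(tss / (n - 1))


# Lockup data from this section: lockup years -> yearly returns (%).
returns_by_lockup = {
    5: [10, 14, 14, 15, 12], 6: [17, 12, 15, 16, 10], 7: [16, 19, 19, 13, 13],
    8: [15, 20, 19, 15, 16], 9: [21, 20, 16, 20, 18], 10: [20, 17, 21, 23, 19],
}
x = [lockup for lockup, ys in returns_by_lockup.items() for _ in ys]
y = [r for ys in returns_by_lockup.values() for r in ys]

ser, sd_y = standard_error_of_regression(x, y)
print(f"SER = {ser:.3f}, std dev of Y = {sd_y:.3f}")  # SER = 2.281, std dev of Y = 3.381
```

Here the SER (about 2.28 percentage points) is well below the standard deviation of the returns themselves (about 3.38), consistent with a moderately strong relationship.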

FRM

