
Problem set 1 - Solutions

Financial statistics
HT 2023

Andriy Andreev
Ralf Xhaferi
Department of Statistics
Question 1.1
A Swedish company that does business with a Norwegian company is
affected by the exchange rate. The company needs to know how many
Swedish kronor (SEK) correspond to 100 Norwegian kroner (NOK). After
analyzing data from an 18-month period, they found that the daily
changes in the exchange rate can be presumed to be independent and
identically distributed, following a normal distribution with expected value 0
and standard deviation 0.375 SEK.
a) What is the probability that the exchange rate drops by at least 0.50
kronor from today's rate to tomorrow's rate?
b) What are the mean and standard deviation of the change over two
business days?
c) What is the probability that the exchange rate will rise by at least 0.50
kronor over two business days?
Solution 1.1:
• We define the variable $X_i$ = change in the exchange rate over one day (the daily change).
Then $X_i \sim N(0, 0.375^2)$.

a) What is the probability that the exchange rate drops at least 0.50 kronor from today's rate to
tomorrow's rate?
We standardize the normal variable: if $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.

$$P(X_i < -0.50) = P\left(\frac{X_i - 0}{0.375} < \frac{-0.5 - 0}{0.375}\right) = P(Z < -1.33) = P(Z > 1.33) = 1 - \Phi(1.33) = 1 - 0.90824 = 0.09176$$

The value $\Phi(1.33) = 0.90824$ is read from Table 1 of the formula sheet.

Consequently, the probability that the rate decreases by at least 0.50 SEK from today's rate
to tomorrow's rate is 0.09176.
$X_i$ = change in the exchange rate over one day, with $X_i \sim N(0, 0.375^2)$ and $X_{i+1} \sim N(0, 0.375^2)$.

b) What is the mean and the standard deviation of the change over two business
days? Remember that:
$E(aX + bY) = aE(X) + bE(Y)$
$Var(aX + bY) = a^2 V(X) + b^2 V(Y) + 2ab\,Cov(X, Y)$

We define $Y = X_i + X_{i+1}$ = change over two business days.

Then $E(Y) = E(X_i + X_{i+1}) = E(X_i) + E(X_{i+1}) = 2 \cdot E(X_i) = 2 \cdot 0 = 0$

$V(Y) = V(X_i + X_{i+1}) = V(X_i) + V(X_{i+1}) + 2 \cdot 1 \cdot 1 \cdot Cov(X_i, X_{i+1})$
The covariance is zero because the daily changes are assumed to be independent.

$V(Y) = 2 \cdot V(X_i) = 2 \cdot 0.375^2 \Rightarrow SD(Y) = \sqrt{2} \cdot 0.375 = 1.4142 \cdot 0.375 = 0.530325$

Consequently, the change over two business days has mean zero and a standard
deviation of about 0.53.
$Y_j$ = change in the exchange rate over two days, where $Y_j \sim N(0, 0.53^2)$.
c) What is the probability that the exchange rate will rise at least 0.50 kronor over two
business days?

$$P(Y_j > 0.50) = P\left(\frac{Y_j - 0}{0.53} > \frac{0.5 - 0}{0.53}\right) = P(Z > 0.94) = 1 - \Phi(0.94) = 1 - 0.82639 = 0.17361$$

The value $\Phi(0.94) = 0.82639$ is read from the formula sheet and distribution tables.

Consequently, the probability that the rate increases by at least 0.50 SEK over two
business days is 0.17361.
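The three parts of Solution 1.1 can be checked numerically. The sketch below is only an illustration, not part of the original solution: it uses Python's standard-library `statistics.NormalDist` instead of the formula-sheet table, and the variable names are made up for this example. Because the table lookup rounds z to two decimals, the exact probabilities (about 0.091 and 0.173) differ slightly from the table-based answers 0.09176 and 0.17361.

```python
from math import sqrt
from statistics import NormalDist

SD_ONE_DAY = 0.375  # standard deviation of the daily change (SEK), from the question

# a) P(X_i < -0.50) for X_i ~ N(0, 0.375^2)
one_day = NormalDist(mu=0.0, sigma=SD_ONE_DAY)
p_drop = one_day.cdf(-0.50)

# b) Two independent daily changes: the variances add, so SD = sqrt(2) * 0.375
sd_two_days = sqrt(2) * SD_ONE_DAY

# c) P(Y > 0.50) for Y ~ N(0, sd_two_days^2)
two_days = NormalDist(mu=0.0, sigma=sd_two_days)
p_rise = 1.0 - two_days.cdf(0.50)

print(f"a) P(one-day drop of at least 0.50) = {p_drop:.4f}")
print(f"b) SD of the two-day change         = {sd_two_days:.4f}")
print(f"c) P(two-day rise of at least 0.50) = {p_rise:.4f}")
```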
Question 1.2
You may choose between two corporate bonds (from companies A or B) that have
the same maturity (five years). There is a risk of bankruptcy associated with
corporate bonds, where you don’t get your money back.
Let:
X = return on investment in A's bond after five years (SEK).
Y = return on investment in B's bond after five years. (SEK).

If the company does not go bankrupt the accumulated interest rate after five years is
10% for A and 20% for B. The risk that company A goes bankrupt is 3% and the
corresponding risk for company B is 5%. The risk that both A and B go bankrupt
during the 5 years is estimated to be 2%.

X and Y have the following joint distribution:

            Y = -1    Y = 0.2   Total
X = -1      0.02      0.01      0.03
X = 0.1     0.03      0.94      0.97
Total       0.05      0.95      1
a) What is the expected return on investment from A and B; E[X] and
E[Y]?

b) Calculate the correlation between the two investments; Corr(X,Y).

c) If you invest 5000 kronor in each bond, what is the expected
return and variance from the portfolio? Let $W = 5000X + 5000Y$ be
the return on the portfolio after 5 years. Calculate $E(W)$ and $V(W)$.

d) If you have 10 000 kronor to invest in either company, how should
you divide up your investments in order to minimize the variance
of your portfolio?
Solution 1.2:
a) What is the expected return on investment from A and B; E[X] and E[Y]?

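Remember that $E[X] = \sum_x x \, P(x)$. The marginal probabilities are the row totals (company A) and the column totals (company B) of the joint distribution table.

Then:
• E[X] = -1*0.03 + 0.1*0.97 = -0.03 + 0.097 = 0.067
• E[Y] = -1*0.05 + 0.2*0.95 = -0.05 + 0.19 = 0.14

Consequently, the expected returns on A and B are 6.7% and 14%, respectively.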


b) Calculate the correlation between the two investments, Corr(X,Y).

We know that $Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{V(X) \cdot V(Y)}}$
First we have to calculate the variances. Recall: $Var[X] = \sum_{\text{all } x} (x - \mu_x)^2 P(x) = \sum_{\text{all } x} x^2 P(x) - \mu_x^2$
• Therefore $Var(X) = (-1)^2 \cdot 0.03 + 0.1^2 \cdot 0.97 - 0.067^2 = 0.03 + 0.0097 - 0.004489 = 0.035211$
• $Var(Y) = (-1)^2 \cdot 0.05 + 0.2^2 \cdot 0.95 - 0.14^2 = 0.05 + 0.038 - 0.0196 = 0.0684$
• Recall: $Cov(X, Y) = \sum_{\text{all } x} \sum_{\text{all } y} x y \, P(x, y) - E[X] \cdot E[Y]$
• $Cov(X, Y) = (-1)(-1)(0.02) + (-1)(0.2)(0.01) + (0.1)(-1)(0.03) + (0.1)(0.2)(0.94) - 0.067 \cdot 0.14 = 0.02442$
• $Corr(X, Y) = \frac{Cov(X, Y)}{\sqrt{V(X) \cdot V(Y)}} = \frac{0.02442}{\sqrt{0.035211 \cdot 0.0684}} = 0.498$
The correlation is 0.498.
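As a cross-check of parts a) and b), the moments can be computed directly from the joint distribution. This is a minimal sketch in plain Python; the dictionary `joint` and the helper `expectation` are names chosen for this illustration only.

```python
from math import sqrt

# Joint distribution P(X = x, Y = y) from the table in the question.
joint = {
    (-1.0, -1.0): 0.02,
    (-1.0, 0.2): 0.01,
    (0.1, -1.0): 0.03,
    (0.1, 0.2): 0.94,
}

def expectation(g):
    """Return E[g(X, Y)] under the joint distribution."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = expectation(lambda x, y: x)
mean_y = expectation(lambda x, y: y)
var_x = expectation(lambda x, y: x * x) - mean_x ** 2
var_y = expectation(lambda x, y: y * y) - mean_y ** 2
cov_xy = expectation(lambda x, y: x * y) - mean_x * mean_y
corr_xy = cov_xy / sqrt(var_x * var_y)

print(f"E[X] = {mean_x:.3f}, E[Y] = {mean_y:.3f}")
print(f"Var(X) = {var_x:.6f}, Var(Y) = {var_y:.4f}")
print(f"Cov(X,Y) = {cov_xy:.5f}, Corr(X,Y) = {corr_xy:.3f}")
```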
c) If you invest 5000 kronor in each bond, what is the
expected return and variance from the portfolio? Let
W = 5000X + 5000Y be the return on the portfolio after 5
years.
W represents the overall return on the portfolio over 5
years.
E[W] = E[5000X + 5000Y] = 5000*E[X] + 5000*E[Y] = 5000(0.067
+ 0.14) = 1035
$Var(W) = V[5000X + 5000Y] = 5000^2 \cdot Var(X) + 5000^2 \cdot Var(Y) + 2 \cdot 5000 \cdot 5000 \cdot Cov(X, Y)$
$= 5000^2 \cdot 0.035211 + 5000^2 \cdot 0.0684 + 2 \cdot 5000 \cdot 5000 \cdot 0.02442 = 3\,811\,275$
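The portfolio calculation in c) is a direct application of the rules for linear combinations. A small sketch, assuming the moments found in parts a) and b) as inputs (the constant names are illustrative):

```python
# Moments of X and Y, taken from parts a) and b) of the solution.
MEAN_X, MEAN_Y = 0.067, 0.14
VAR_X, VAR_Y = 0.035211, 0.0684
COV_XY = 0.02442

AMOUNT_A = 5000  # kronor invested in bond A
AMOUNT_B = 5000  # kronor invested in bond B

# E[aX + bY] = a E[X] + b E[Y]
mean_w = AMOUNT_A * MEAN_X + AMOUNT_B * MEAN_Y

# Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
var_w = (AMOUNT_A ** 2 * VAR_X
         + AMOUNT_B ** 2 * VAR_Y
         + 2 * AMOUNT_A * AMOUNT_B * COV_XY)

print(f"E[W]   = {mean_w:.0f}")
print(f"Var(W) = {var_w:.0f}")
```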
d) If you have 10 000 kronor to invest in either company, how should you
divide up your investments in order to minimize the variance of your
portfolio?

Let α be the proportion of the 10000 kronor invested in bond A that minimises the variance.
The portfolio: $W = 10000\,\alpha X + 10000(1 - \alpha) Y = 10000[\alpha X + (1 - \alpha) Y]$

$$Var(\text{Portfolio}) = Var(10000[\alpha X + (1 - \alpha) Y]) = 10000^2 \, Var[\alpha X + (1 - \alpha) Y]$$
$$= 10000^2 \left[\alpha^2 Var(X) + (1 - \alpha)^2 Var(Y) + 2\alpha(1 - \alpha) Cov(X, Y)\right]$$
$$= 10000^2 \left[\alpha^2 Var(X) + (1 - \alpha)^2 Var(Y) + 2(\alpha - \alpha^2) Cov(X, Y)\right]$$

To minimise Var(Portfolio) with respect to α, set the derivative equal to zero:

$$\frac{\partial Var(\text{Portfolio})}{\partial \alpha} = 0$$

$$10000^2 \left[2\alpha Var(X) + 2(1 - \alpha)(-1) Var(Y) + 2(1 - 2\alpha) Cov(X, Y)\right] = 0$$

Whether we include the factor 10000 or not, the variance (the risk) is minimized for:

$$\alpha = \frac{Var(Y) - Cov(X, Y)}{Var(X) + Var(Y) - 2 Cov(X, Y)} = \frac{0.0684 - 0.0244}{0.0684 + 0.0352 - 2 \cdot 0.0244} = 0.8029$$

The risk is minimized if we invest 8029 SEK (10000*0.8029) in bond A and 1971 SEK
[10000*(1-0.8029)] in bond B.
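The closed-form weight can be sanity-checked against a brute-force search over α. The sketch below is only an illustration: it assumes the unrounded moments from part b), so it gives α ≈ 0.8030 rather than the 0.8029 obtained from the rounded inputs above, and the 0.001 grid step is an arbitrary choice.

```python
VAR_X = 0.035211
VAR_Y = 0.0684
COV_XY = 0.02442
TOTAL = 10_000  # kronor to invest

def portfolio_variance(alpha):
    """Variance of 10000 * [alpha * X + (1 - alpha) * Y]."""
    return TOTAL ** 2 * (alpha ** 2 * VAR_X
                         + (1 - alpha) ** 2 * VAR_Y
                         + 2 * alpha * (1 - alpha) * COV_XY)

# Closed-form minimiser obtained by setting the derivative to zero.
alpha_star = (VAR_Y - COV_XY) / (VAR_X + VAR_Y - 2 * COV_XY)

# Brute-force check on a grid of proportions 0.000, 0.001, ..., 1.000.
grid = [i / 1000 for i in range(1001)]
alpha_grid = min(grid, key=portfolio_variance)

print(f"closed form: alpha = {alpha_star:.4f}")
print(f"grid search: alpha = {alpha_grid:.3f}")
print(f"invest about {TOTAL * alpha_star:.0f} SEK in A "
      f"and {TOTAL * (1 - alpha_star):.0f} SEK in B")
```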
Question 1.3
A grocery store in the United States conducted an experiment to
investigate how the sales of a certain coffee brand could be affected
by exposure area. The exposure area was varied randomly for
twelve weeks (the exposure area could be 3, 6 or 9 square feet).
Other coffee brands had a constant exposure area of 3 square feet.
Assume that all coffee brands had the same price during the
experiment. The following results were obtained

Weekly sales in number of packages (1/2 kg) (Y):  526 421 581 640 412 500 444 443 580 570 376 723
Exposure area (square feet) (X):                    6   3   6   9   3   9   6   3   9   6   3   9
Suppose that the relationship between weekly sales and exposure area can be
described by a simple linear regression model: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
Answer the following questions using the regression output below.
a) Estimate the model’s parameters using the least squares method and interpret the
parameter estimates in words.
b) Calculate the residuals and their variance $S_e^2$.
c) State the coefficient of determination $R^2$ and interpret the value.
d) Test at the 5% significance level the null hypothesis $\beta_1 = 0$ against the alternative $\beta_1 > 0$.
e) Construct a 95% confidence interval for $\beta_1$ and interpret the interval.
f) Calculate 95% confidence intervals for the average weekly coffee sales when the
exposure area is 3, 6 and 9 square feet. Which interval is shortest and why?
g) Suppose that next week the exposure area will be 9 square feet. Calculate a 95% prediction
interval for expected sales that week.
h) What assumptions should you make to answer d) – g)? Are all those assumptions
fulfilled? If not, what effect will this have?
[Figure: scatter plot "Weekly Sales vs Exposure Area"; x-axis: Exposure Area (X), y-axis: Weekly Sales (Y)]
Descriptive Statistics:

Variable        Count   Mean   SE of mean   Minimum   Q1      Median   Q3      Maximum
Weekly sales    12      518    30.11644     376       426.5   513      580.8   723
Exposure area   12      6      0.738549     3         3       6        9       9

Regression Analysis:
Model summary

R Square            0.653254
Adjusted R Square   0.618579
Standard Error      64.43126

Coefficients
                Coefficients   SE         t-value    P-value    VIF
Intercept       320.25         49.21019   6.507799   6.83E-05
Exposure area   32.95833       7.593297   4.340451   0.001465   1
Prediction for Weekly Sales

Variable setting   Fit       SE Fit    95% CI               95% PI
X = 3              419.125   29.4087   (353.598; 484.652)   (261.316; 576.934)
X = 6              518       18.5997   (476.557; 559.443)   (368.576; 667.424)
X = 9              616.875   29.4087   (551.348; 682.402)   (459.066; 774.684)
Solution 1.3:
a) Estimate the model’s parameters using the least squares method and interpret the
parameter estimates in words.

The parameters can be read from the regression output:

                Coefficients   SE         t-value    P-value    VIF
Intercept       320.25         49.21019   6.507799   6.83E-05
Exposure area   32.95833       7.593297   4.340451   0.001465   1

The estimated model is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$.

$\hat{\beta}_0 = 320.25 \approx 320$
If x = 0 were inside the range we are investigating, the interpretation would be: with an
exposure area of 0 square feet, the weekly sales would be 320 packages on average (be
careful with this interpretation).

$\hat{\beta}_1 = 32.96$
If the exposure area increases by one unit (one square foot), the weekly sales are estimated to
increase by 33 packages on average.
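How to calculate $\hat{\beta}_0$ and $\hat{\beta}_1$ manually:

Y     X   XY     X²
526   6   3156   36
421   3   1263   9
581   6   3486   36
640   9   5760   81
412   3   1236   9
500   9   4500   81
444   6   2664   36
443   3   1329   9
580   9   5220   81
570   6   3420   36
376   3   1128   9
723   9   6507   81

Sums: $\sum y_i = 6216$, $\sum x_i = 72$, $\sum x_i y_i = 39669$, $\sum x_i^2 = 504$

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{S_{xy}}{S_x^2} = r_{xy} \frac{S_y}{S_x}$$

$$\hat{\beta}_1 = \frac{39669 - \frac{(72)(6216)}{12}}{504 - \frac{72^2}{12}} = \frac{39669 - 37296}{504 - 432} = \frac{2373}{72} = 32.9583 \approx 32.96$$

$$\hat{\beta}_0 = \frac{6216}{12} - 32.96 \cdot \frac{72}{12} = 518 - (32.96 \cdot 6) = 320.24$$

Consequently: $\hat{\beta}_1 = 32.96$ and $\hat{\beta}_0 = 320.24$.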
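The manual least squares calculation can be reproduced in a few lines. This is a minimal sketch in plain Python using the sums formula shown above; the list names `sales` and `area` are chosen for this illustration. With unrounded arithmetic the intercept comes out as 320.25, matching the regression output (the 320.24 above results from rounding the slope to 32.96).

```python
# Data from the question: weekly sales (Y) and exposure area in square feet (X).
sales = [526, 421, 581, 640, 412, 500, 444, 443, 580, 570, 376, 723]
area = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]

n = len(sales)
sum_x = sum(area)
sum_y = sum(sales)
sum_xy = sum(x * y for x, y in zip(area, sales))
sum_x2 = sum(x * x for x in area)

# Least squares estimates from the column sums.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = sum_y / n - slope * sum_x / n

print(f"sums: x = {sum_x}, y = {sum_y}, xy = {sum_xy}, x^2 = {sum_x2}")
print(f"slope     = {slope:.4f}")
print(f"intercept = {intercept:.2f}")
```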
Read this closely, important to know
• The intercept $\hat{\beta}_0$ is an estimate of the average y-value for
individuals with the value x = 0, i.e. an estimate of the
population mean for the population of all individuals with
the value x = 0.
What does this mean? (Think of this first)
• In many situations the interpretation of the intercept is not
meaningful. This is because the value x = 0 often lies
far outside the investigated range, or because the x-variable
cannot take the value x = 0 at all.
• This is exactly the case in our example, where X only takes the
values 3, 6 and 9.
b) Calculate the residuals and their variance $S_e^2$.
• How to calculate the residuals manually: $e = Y - \hat{Y}$, with $\hat{Y} = 320.3 + 32.96X$

Y     X   Ŷ        e = Y − Ŷ   e²
526   6   518.06   7.94        63.0436
421   3   419.18   1.82        3.3124
581   6   518.06   62.94       3961.444
640   9   616.94   23.06       531.7636
412   3   419.18   -7.18       51.5524
500   9   616.94   -116.94     13674.96
444   6   518.06   -74.06      5484.884
443   3   419.18   23.82       567.3924
580   9   616.94   -36.94      1364.564
570   6   518.06   51.94       2697.764
376   3   419.18   -43.18      1864.512
723   9   616.94   106.06      11248.72
Sum                            41513.92

$$S_e^2 = \frac{\sum e^2}{n - 2} = \frac{41513.92}{10} = 4151.392$$

From the regression printout: $S_e = 64.4313 \rightarrow S_e^2 = (64.4313)^2 \approx 4151.39$
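The residual table can be recomputed as follows. The sketch uses the unrounded least squares coefficients, so the sum of squared residuals comes out at about 41513.9 (slightly different in the last digits from the hand calculation, which uses the rounded line 320.3 + 32.96X). Variable names are illustrative.

```python
sales = [526, 421, 581, 640, 412, 500, 444, 443, 580, 570, 376, 723]
area = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]

n = len(sales)
mean_x = sum(area) / n
mean_y = sum(sales) / n

# Least squares fit (centered form of the formulas in part a).
s_xx = sum((x - mean_x) ** 2 for x in area)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(area, sales))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Residuals e = Y - Y_hat and their variance with n - 2 degrees of freedom.
residuals = [y - (intercept + slope * x) for x, y in zip(area, sales)]
sse = sum(e ** 2 for e in residuals)
s2 = sse / (n - 2)
s = s2 ** 0.5

for x, y, e in zip(area, sales, residuals):
    print(f"Y = {y:3d}  X = {x}  e = {e:9.3f}")
print(f"SSE = {sse:.3f}, S_e^2 = {s2:.3f}, S_e = {s:.4f}")
```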
c) State the coefficient of determination $R^2$ and interpret the value.

From the model summary in the regression output: R Square = 0.653254, so $R^2 = 65.33\%$.

The interpretation is that 65.33% of the variation in sales is
explained by the estimated model.

Manual calculation:

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \quad SST = SSR + SSE$$

$$SSE = \sum e^2 = 41513.92$$

$$SST = \sum (Y - \bar{Y})^2 = \sum Y^2 - \frac{(\sum Y)^2}{n} = 3339612 - \frac{(6216)^2}{12} = 119724$$

$$R^2 = 1 - \frac{41513.92}{119724} = 0.6533 \text{ or } 65.33\%$$

Alternative:

$$r = Cor(X, Y) = \frac{Cov(X, Y)}{\sqrt{Var(X) \, Var(Y)}} = 0.808241, \quad R^2 = r^2 = (0.808241)^2 = 0.6533$$
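Both routes to $R^2$ (via the sums of squares and via the squared correlation) can be verified numerically. A minimal sketch with illustrative variable names:

```python
from math import sqrt

sales = [526, 421, 581, 640, 412, 500, 444, 443, 580, 570, 376, 723]
area = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]

n = len(sales)
mean_x = sum(area) / n
mean_y = sum(sales) / n

s_xx = sum((x - mean_x) ** 2 for x in area)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(area, sales))
sst = sum((y - mean_y) ** 2 for y in sales)  # total sum of squares

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(area, sales))

# Route 1: R^2 = 1 - SSE / SST
r_squared = 1 - sse / sst

# Route 2: square of the sample correlation between X and Y
r = s_xy / sqrt(s_xx * sst)

print(f"SST = {sst:.0f}, SSE = {sse:.3f}")
print(f"R^2 = {r_squared:.6f}, r = {r:.6f}, r^2 = {r * r:.6f}")
```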
d) Test at the 5% significance level the null hypothesis $\beta_1 = 0$ against the alternative $\beta_1 > 0$
Null hypothesis $H_0: \beta_1 = 0$
Alternative hypothesis $H_1: \beta_1 > 0$
Significance level α = 5% (the significance level is the probability of rejecting the null hypothesis when the null
hypothesis is true)

Test function:

$$t = \frac{b_1 - \beta_1}{\sqrt{\dfrac{S_e^2}{\sum (x_i - \bar{x})^2}}} = \frac{b_1 - 0}{S_{b_1}}$$

which is t-distributed with n − 2 degrees of freedom (n = 12).

To find $\sum (x_i - \bar{x})^2$ we use $\bar{x} = 72/12 = 6$:

X     x − x̄   (x − x̄)²
6     0        0
3     -3       9
6     0        0
9     3        9
3     -3       9
9     3        9
6     0        0
3     -3       9
9     3        9
6     0        0
3     -3       9
9     3        9
Sum            72

Inserting the values:

$$t = \frac{32.96 - 0}{\sqrt{\dfrac{4151.39}{72}}} = \frac{32.96}{\sqrt{57.658}} = \frac{32.96}{7.5933} = 4.34 \quad \text{with 10 df}$$

This agrees with the regression output, which reports SE = 7.593297 and t-value = 4.340451 for exposure area.

The critical value is $t_c = t_{n-2, \alpha} = t_{10, 0.05} = 1.812$.
Since t > t_c (4.34 > 1.812), the null hypothesis is rejected.

Alternative: since p-value < α, the null hypothesis is rejected. The p-value 0.001465 in the
output is two-sided; the one-sided p-value is half of that, so the conclusion is the same.
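The test in d) can be sketched in code as follows. Python's standard library has no t-distribution, so the critical value 1.812 is simply taken from the table lookup in the solution and hard-coded; that constant and the variable names are assumptions of this illustration.

```python
from math import sqrt

sales = [526, 421, 581, 640, 412, 500, 444, 443, 580, 570, 376, 723]
area = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]
T_CRITICAL = 1.812  # t(10 df, one-sided 5%), read from the t-table as in the solution

n = len(sales)
mean_x = sum(area) / n
mean_y = sum(sales) / n
s_xx = sum((x - mean_x) ** 2 for x in area)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(area, sales))

slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(area, sales))
s2 = sse / (n - 2)  # residual variance S_e^2

# Standard error of the slope and the test statistic for H0: beta_1 = 0.
se_slope = sqrt(s2 / s_xx)
t_stat = (slope - 0) / se_slope
reject_h0 = t_stat > T_CRITICAL  # one-sided test, H1: beta_1 > 0

print(f"sum (x - xbar)^2 = {s_xx:.0f}")
print(f"SE(b1) = {se_slope:.4f}, t = {t_stat:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```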
e) Construct a 95% confidence interval for $\beta_1$ and interpret the interval.
• The variance of the slope estimate is $S_{b_1}^2 = \dfrac{S_e^2}{\sum (x_i - \bar{x})^2}$
• We know that the confidence interval for $\beta_1$ is

$$b_1 - t_{n-2, \alpha/2} \cdot s_{b_1} < \beta_1 < b_1 + t_{n-2, \alpha/2} \cdot s_{b_1}$$

With $b_1 = 32.96$ and $s_{b_1} = 7.59$ from the regression output:

$$32.96 - t_{10, 0.025} \cdot 7.59 < \beta_1 < 32.96 + t_{10, 0.025} \cdot 7.59$$

$$32.96 - 2.228 \cdot 7.59 < \beta_1 < 32.96 + 2.228 \cdot 7.59$$

$$16.05 < \beta_1 < 49.87$$

Interpretation: With 95% confidence this interval covers the expected increase in weekly sales when the
exposure area increases by one square foot.
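A short sketch of the interval calculation, assuming the unrounded slope and standard error from the regression output and the tabulated value 2.228 as inputs (constant names are illustrative). With unrounded inputs the limits come out at roughly 16.04 and 49.88.

```python
# Values taken from the regression output and the t-table.
B1 = 32.95833       # estimated slope
SE_B1 = 7.593297    # standard error of the slope
T_VALUE = 2.228     # t(10 df, alpha/2 = 0.025)

margin = T_VALUE * SE_B1
lower = B1 - margin
upper = B1 + margin

print(f"95% CI for beta_1: ({lower:.2f}, {upper:.2f})")
```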
f) Calculate 95% confidence intervals for the average weekly coffee sales when the
exposure area is 3, 6 and 9 square feet. Which interval is shortest and why?

From the prediction output:

Variable setting   Fit       SE Fit    95% CI               Length of CI
X = 3              419.125   29.4087   (353.598; 484.652)   131.054
X = 6              518       18.5997   (476.557; 559.443)   82.886
X = 9              616.875   29.4087   (551.348; 682.402)   131.054

The confidence interval for the average weekly coffee sales when the exposure area is 6 is the
shortest, since the value x = 6 is closest to the mean of X (here it equals the mean, $\bar{x} = 6$).

If you want to calculate manually, you have to remember that:
• The confidence interval for the mean given X = x is calculated by:

$$\hat{y} \pm t_{n-2, \alpha/2} \cdot S_e \sqrt{\frac{1}{n} + \frac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$
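The confidence intervals for the mean response can be recomputed with the standard formula $\hat{y} \pm t_{n-2, \alpha/2} \cdot S_e \sqrt{1/n + (x - \bar{x})^2 / \sum (x_i - \bar{x})^2}$. The sketch below assumes that formula and the tabulated value 2.228; since the software uses a more precise t-value, the limits can differ from the printed output in the second or third decimal. Names are illustrative.

```python
from math import sqrt

sales = [526, 421, 581, 640, 412, 500, 444, 443, 580, 570, 376, 723]
area = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]
T_VALUE = 2.228  # t(10 df, alpha/2 = 0.025), from the t-table

n = len(sales)
mean_x = sum(area) / n
mean_y = sum(sales) / n
s_xx = sum((x - mean_x) ** 2 for x in area)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(area, sales))
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(area, sales))
s_e = sqrt(sse / (n - 2))

se_fit = {}
ci = {}
for x0 in (3, 6, 9):
    fit = intercept + slope * x0
    # Standard error of the estimated mean at x0; smallest when x0 equals mean_x.
    se_fit[x0] = s_e * sqrt(1 / n + (x0 - mean_x) ** 2 / s_xx)
    ci[x0] = (fit - T_VALUE * se_fit[x0], fit + T_VALUE * se_fit[x0])
    width = ci[x0][1] - ci[x0][0]
    print(f"x = {x0}: fit = {fit:.3f}, SE fit = {se_fit[x0]:.4f}, "
          f"CI = ({ci[x0][0]:.2f}, {ci[x0][1]:.2f}), length = {width:.2f}")
```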
g) Suppose that next week the exposure area will be 9 square feet. Calculate a 95% prediction
interval for expected sales that week.

From the prediction output for X = 9:

Variable setting   Fit       SE Fit    95% CI               95% PI
X = 9              616.875   29.4087   (551.348; 682.402)   (459.066; 774.684)

With 95% probability the weekly coffee sales will lie in this interval if
we suppose that the coffee is exposed on a surface of 9 square feet.

If you want to calculate manually, you have to remember that:

• The prediction interval for $\hat{y}_{n+1}$ is:

$$\hat{y}_{n+1} \pm t_{n-2, \alpha/2} \cdot S_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$$

or

$$616.875 \pm 2.228 \cdot 64.4313 \cdot \sqrt{1 + \frac{1}{12} + \frac{(9 - 6)^2}{72}}$$

95% PI: (459.07, 774.67)
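The manual prediction interval can be checked with a few lines of Python. This sketch takes the fitted value, the residual standard error and the t-value from the output and the table as given constants (names are illustrative); the only difference from the confidence interval for the mean is the extra "1 +" under the square root, which accounts for the variation of a single new observation.

```python
from math import sqrt

# Inputs taken from the regression output and the t-table.
FIT_AT_9 = 616.875  # fitted value at x = 9
S_E = 64.43126      # residual standard error
T_VALUE = 2.228     # t(10 df, alpha/2 = 0.025)
N = 12              # number of observations
X_BAR = 6           # mean exposure area
S_XX = 72           # sum of (x_i - x_bar)^2
X_NEW = 9           # exposure area next week

se_pred = S_E * sqrt(1 + 1 / N + (X_NEW - X_BAR) ** 2 / S_XX)
pi_lower = FIT_AT_9 - T_VALUE * se_pred
pi_upper = FIT_AT_9 + T_VALUE * se_pred

print(f"95% PI at x = {X_NEW}: ({pi_lower:.2f}, {pi_upper:.2f})")
```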


h) What assumptions should you make to answer d) – g)? Are all those assumptions
fulfilled? If not, what effect will this have?
The following assumptions have to be fulfilled:
• The observations have to be independent. The error terms in the model have to be
normally distributed and their variance has to be constant
(homoscedasticity), i.e. $\varepsilon_i \sim N(0, \sigma^2)$

[Figure: scatter plot "Weekly Sales vs Exposure Area"; x-axis: Exposure Area (X), y-axis: Weekly Sales (Y)]

In the plot you can see that the larger X (exposure area) is, the larger
the variation in Y (weekly sales), which means that
the variance of $\varepsilon_i$ is not constant (the errors are heteroscedastic).

Effect:
1. If the assumption of normality is not fulfilled, the
estimates are still unbiased.
2. But the confidence intervals and the tests are no longer correct.

You have to be careful when you interpret the
estimates from a model that does not fulfil the
assumptions.
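The visual impression from the scatter plot can be backed up with a rough numerical check: the spread of weekly sales within each exposure-area group. This is only an informal sketch with four observations per group, not a formal test of homoscedasticity; the grouping code and names are illustrative.

```python
from statistics import stdev

sales = [526, 421, 581, 640, 412, 500, 444, 443, 580, 570, 376, 723]
area = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]

# Collect the weekly sales observed at each exposure area.
groups = {}
for x, y in zip(area, sales):
    groups.setdefault(x, []).append(y)

# Sample standard deviation of sales within each group.
spread = {x: stdev(ys) for x, ys in sorted(groups.items())}

for x, sd in spread.items():
    print(f"exposure area {x}: sales {groups[x]}, SD = {sd:.1f}")
```

If the standard deviations grow with the exposure area, as the plot suggests, that is consistent with non-constant error variance.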
Important to know
• If the residuals in the model are not normally distributed, or the
variance of the residuals is not constant, or both, you can
sometimes find an appropriate transformation that normalizes
the residuals and/or stabilizes the variance. Linear regression
is quite robust against small deviations from the normal distribution.

• If the observations are dependent, you have to use a different kind of model.


• Extra exercises in the NCT. Solutions on Athena
• 4.74, 4.79, 5.32, 5.44, 5.97, 11.20, 11.25a, 11.26 (extra), 11.58.
