
EXERCISE SHEET 7

1. Assume that X and Y are random variables with a bivariate normal
distribution. (In a bivariate normal distribution, each of the two variables
is normally distributed on its own, and any linear combination of them, such
as their sum, is also normally distributed. The two variables need not be
independent.)

x y
5 8
7 9
3 11
16 27
12 15
9 13

a) Calculate the correlation coefficient between X and Y using the formula

$$r = \frac{C}{\sqrt{AB}}$$

where

$$C = n \sum x_i y_i - \sum x_i \sum y_i$$
$$A = n \sum x_i^2 - \left(\sum x_i\right)^2$$
$$B = n \sum y_i^2 - \left(\sum y_i\right)^2$$

x y xy x² y²
5 8 40 25 64
7 9 63 49 81
3 11 33 9 121
16 27 432 256 729
12 15 180 144 225
9 13 117 81 169
Sum: 52 83 865 564 1389

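$$C = n \sum xy - \sum x \sum y = 6 \cdot 865 - 52 \cdot 83 = 874$$
$$A = n \sum x^2 - \left(\sum x\right)^2 = 6 \cdot 564 - 52^2 = 680$$
$$B = n \sum y^2 - \left(\sum y\right)^2 = 6 \cdot 1389 - 83^2 = 1445$$

$$r = \frac{C}{\sqrt{AB}} = \frac{874}{\sqrt{680 \cdot 1445}} = 0.88$$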
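The arithmetic above can be checked with a short script. This is only a sketch in plain Python; the variable names (`sum_xy`, `sum_x2`, and so on) are chosen here for illustration and are not part of the exercise sheet.

```python
import math

# Data from the exercise table.
x = [5, 7, 3, 16, 12, 9]
y = [8, 9, 11, 27, 15, 13]
n = len(x)

# Column sums needed by the formula.
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

# C, A and B as defined in part a).
C = n * sum_xy - sum_x * sum_y
A = n * sum_x2 - sum_x ** 2
B = n * sum_y2 - sum_y ** 2

r = C / math.sqrt(A * B)
print(C, A, B, round(r, 2))  # prints: 874 680 1445 0.88
```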

b) Test the significance of the sample correlation coefficient using a 95%
confidence level.
Step 1: Hypotheses
$H_0: \rho = 0$; $H_1: \rho \neq 0$
Step 2: Use the formula to calculate the test value for t
Test value:
$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$
$$t = \frac{0.88}{\sqrt{\dfrac{1 - 0.88^2}{6 - 2}}} = 3.7$$

Step 3: Critical value from the t-table

The test is two-tailed because r can be either less than 0 or greater than 0.
So, α/2 = 5%/2 = 2.5% = 0.025
Degrees of freedom:
v = n − 2 = 6 − 2 = 4
With 4 degrees of freedom and 0.025 in each tail, the critical value is 2.776.

Step 4: Make the decision

The test value (3.7) is greater than the critical value from the table
(2.776). So, reject the null hypothesis (ρ = 0): the correlation is
statistically significant at the 5% level.
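The four steps can be sketched in a few lines of Python. The critical value 2.776 is taken from the t-table as in Step 3 and hard-coded here rather than computed; the names `t_value`, `t_critical` and `reject_h0` are illustrative only.

```python
import math

r = 0.88            # sample correlation, rounded as in part a)
n = 6               # number of observations
t_critical = 2.776  # t-table value: two-tailed, alpha = 0.05, n - 2 = 4 degrees of freedom

# Test statistic for H0: rho = 0.
t_value = r / math.sqrt((1 - r ** 2) / (n - 2))

# Two-tailed decision: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_value) > t_critical
print(round(t_value, 1), reject_h0)
```

With the rounded r = 0.88 the statistic comes out at roughly 3.7, so the script reaches the same decision as Step 4.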

c) Consider the following regression model and data for x and y:

$$y = \beta_0 + \beta_1 x + \varepsilon$$

x y
5 8
7 9
3 11
16 27
12 15
9 13
Calculate the least squares estimates of the intercept and the slope of the
regression model.

Answer:

x y xy x² y²
5 8 40 25 64
7 9 63 49 81
3 11 33 9 121
16 27 432 256 729
12 15 180 144 225
9 13 117 81 169
Sum: 52 83 865 564 1389

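Slope: the estimator for $\beta_1$ is

$$b = \frac{C}{A} = \frac{874}{680} \approx 1.2853$$

Intercept: the estimator for $\beta_0$ is

$$a = \bar{y} - b\,\bar{x} = \frac{83}{6} - 1.2853 \cdot \frac{52}{6} \approx 2.694$$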

d) Show the estimated linear regression function

Estimated linear regression function: $\hat{y} = 2.694 + 1.285\,x$
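As a check on parts c) and d), the least squares estimates can be computed directly from the data. This is a minimal sketch in plain Python; `a` and `b` are the names used here for the intercept and slope estimates, and the intercept uses the standard relation a = ȳ − b·x̄.

```python
# Data from the exercise table.
x = [5, 7, 3, 16, 12, 9]
y = [8, 9, 11, 27, 15, 13]
n = len(x)

# C and A as defined in part a).
C = n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)
A = n * sum(xi ** 2 for xi in x) - sum(x) ** 2

b = C / A                        # slope estimate
a = sum(y) / n - b * sum(x) / n  # intercept estimate: y_bar - b * x_bar

print(f"y_hat = {a:.3f} + {b:.3f} x")  # prints: y_hat = 2.694 + 1.285 x
```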

e) Calculate R-squared and explain the results.

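Answer: In simple linear regression, R-squared equals the square of the
correlation coefficient:

$$R^2 = r^2 = \frac{C^2}{AB} = \frac{874^2}{680 \cdot 1445} \approx 0.78$$

About 78% of the variability in y is explained by the linear relationship
with x; the remaining 22% is left unexplained.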
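R-squared can also be obtained from its definition, one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below takes that route for the same data; the names `sse`, `sst` and `r_squared` are illustrative, and the fit is recomputed inside the script so that it runs on its own.

```python
# Data from the exercise table.
x = [5, 7, 3, 16, 12, 9]
y = [8, 9, 11, 27, 15, 13]
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least squares fit (deviation form of the slope formula).
b = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
a = y_mean - b * x_mean

# Residual and total sums of squares.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
sst = sum((yi - y_mean) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(round(r_squared, 2))
```

For a simple regression with an intercept, this value agrees with the squared correlation coefficient from part a).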

3. A consultancy firm has offices in over 100 different cities. The managers have the
impression that some offices work more efficiently than others in giving advice to
clients. They ask you to conduct a statistical analysis to provide a benchmark on
which different offices can be evaluated.

The management provides you with data on 12 offices. The data are given in the table
below and consist of the total cost Y and the number of clients X of an office.

You are also given the following values:

Σx = 512, Σy = 2275, Σx² = 24290, Σy² = 471529, Σxy = 106941

The management assumes that the relationship between total cost and the number of
clients is linear.

(a) Based on top management's assumption, what is the underlying theoretical
regression model?

The regression line will be of the form:

$$y = \alpha + \beta x + \varepsilon$$

where y is the total cost and x is the number of clients.

(b) What are the assumptions of the ordinary least squares method?

Assumptions underlying the OLS regression model

$y_i = \alpha + \beta x_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, \sigma^2)$

1. $\varepsilon_i$ is a random variable with mean zero, $E[\varepsilon_i] = 0$
2. The variance of $\varepsilon_i$ is the same (constant) for all values of X, $\mathrm{Var}(\varepsilon_i) = \sigma^2$
3. The values of $\varepsilon_i$ are stochastically independent (uncorrelated with one
another)
4. The error term $\varepsilon_i$ is a normally distributed random variable
5. The independent (explanatory) variables are fixed; they are not random
variables
(c) What term captures unsystematic influences on total cost in the model and
what assumptions are made about these terms when estimating the regression model?

The error term ε captures the unsystematic influences. The error term is assumed
to be normally distributed with mean 0 and constant variance σ². In other words,
$\varepsilon \sim N(0, \sigma^2)$.

(d) Calculate the estimates for the unknown model parameters and show
the estimated linear regression function.
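The parameters to be estimated are, as always, α and β.

The estimator for β is $b = \frac{C}{A}$, where

$$C = n \sum xy - \sum x \sum y = (12 \times 106{,}941) - (512 \times 2{,}275) = 118{,}492$$
$$A = n \sum x^2 - \left(\sum x\right)^2 = (12 \times 24{,}290) - 512^2 = 29{,}336$$

Therefore,

$$b = \frac{118{,}492}{29{,}336} \approx 4.039; \qquad a = \bar{y} - b\,\bar{x} = \frac{2{,}275}{12} - 4.039 \cdot \frac{512}{12} \approx 17.25$$

Hence, the estimated regression line is of the following form:

$$\hat{y} = 17.25 + 4.039\,x$$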
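Because only the summary sums are given for the 12 offices, the estimates can be computed from those sums alone. The helper `fit_from_sums` below is a hypothetical name introduced for this sketch; it applies b = C/A and the standard intercept formula a = ȳ − b·x̄.

```python
def fit_from_sums(n, sum_x, sum_y, sum_x2, sum_xy):
    """Return (intercept, slope) of the least squares line from summary sums."""
    C = n * sum_xy - sum_x * sum_y
    A = n * sum_x2 - sum_x ** 2
    slope = C / A
    intercept = sum_y / n - slope * sum_x / n
    return intercept, slope


# Summary values given in the exercise (12 offices).
a, b = fit_from_sums(n=12, sum_x=512, sum_y=2275, sum_x2=24290, sum_xy=106941)
print(f"estimated cost = {a:.2f} + {b:.3f} * clients")
```

Under these inputs the slope is about 4.04 and the intercept about 17.25, in the units in which total cost is recorded.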


(e) What is the interpretation of the intercept and the slope coefficient?

The intercept is the estimated fixed cost of an office (the cost when the number
of clients is zero), while the slope is the estimated additional cost per client.

(f) Calculate the coefficient of determination, R². Explain your result.

(10%)

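With $A = n \sum x^2 - (\sum x)^2 = 29{,}336$, $B = n \sum y^2 - (\sum y)^2 = (12 \times 471{,}529) - 2{,}275^2 = 482{,}723$ and $C = 118{,}492$:

$$R^2 = \frac{C^2}{AB} = \frac{118{,}492^2}{29{,}336 \times 482{,}723} \approx 0.99$$

An R-squared of 0.99 means that 99% of the variability in the total cost of the
offices is explained by the number of clients.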
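The value of R-squared can be reproduced from the summary sums, using the fact that in simple linear regression R² equals the squared correlation coefficient, C²/(A·B). The variable names below are chosen for this sketch only.

```python
# Summary values given in the exercise (12 offices).
n = 12
sum_x, sum_y = 512, 2275
sum_x2, sum_y2 = 24290, 471529
sum_xy = 106941

C = n * sum_xy - sum_x * sum_y
A = n * sum_x2 - sum_x ** 2
B = n * sum_y2 - sum_y ** 2

# Squared correlation coefficient, equal to R-squared in simple regression.
r_squared = C ** 2 / (A * B)
print(round(r_squared, 2))  # prints: 0.99
```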
