
Covariance, correlation coefficient

Covariance: $\operatorname{cov}(\xi, \eta) = E(\xi\eta) - E(\xi)\,E(\eta)$

Correlation coefficient: $\rho(\xi, \eta) = \dfrac{\operatorname{cov}(\xi, \eta)}{\sigma_\xi \sigma_\eta}$

$\rho(\xi, \eta)$ is a measure of linear dependence between the random variables $\xi$ and $\eta$.

$\rho(\xi, \eta)$ is always between -1 and 1.

$\xi$ and $\eta$ are independent => $\rho(\xi, \eta) = 0$

$\rho(\xi, \eta)$ is close to 1 => strong positive correlation

$\rho(\xi, \eta)$ is close to -1 => strong negative correlation

(Note that $\operatorname{cov}(\xi, \xi) = D(\xi)$.)

1) It is known that the variances of the random variables $\xi$ and $\eta$ are $D(\xi) = 1$, $D(\eta) = 4$, and the correlation coefficient is $\rho(\xi, \eta) = -\frac{1}{2}$. Find $\rho(\xi + \eta, \eta)$.

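$\rho(\xi + \eta, \eta) = \dfrac{\operatorname{cov}(\xi + \eta, \eta)}{\sigma_{\xi+\eta}\,\sigma_\eta}$

$\sigma_{\xi+\eta} = \sqrt{D(\xi + \eta)}$

$D(\xi + \eta) = D(\xi) + D(\eta) + 2\operatorname{cov}(\xi, \eta) = 1 + 4 + 2\operatorname{cov}(\xi, \eta) = \dots$

$\operatorname{cov}(\xi, \eta) = \rho(\xi, \eta)\,\sigma_\xi \sigma_\eta = -\frac{1}{2} \cdot 1 \cdot 2 = -1$

$\dots = 1 + 4 - 2 = 3$

$\sigma_{\xi+\eta} = \sqrt{D(\xi + \eta)} = \sqrt{3}$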

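$\operatorname{cov}(\xi_1 + \xi_2, \xi_3) = \operatorname{cov}(\xi_1, \xi_3) + \operatorname{cov}(\xi_2, \xi_3)$

$\operatorname{cov}(\xi_1, \xi_2 + \xi_3) = \operatorname{cov}(\xi_1, \xi_2) + \operatorname{cov}(\xi_1, \xi_3)$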

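$\operatorname{cov}(\xi + \eta, \eta) = \operatorname{cov}(\xi, \eta) + \operatorname{cov}(\eta, \eta) = -1 + 4 = 3$

$\rho(\xi + \eta, \eta) = \dfrac{\operatorname{cov}(\xi + \eta, \eta)}{\sigma_{\xi+\eta}\,\sigma_\eta} = \dfrac{3}{\sqrt{3} \cdot 2} = \dfrac{\sqrt{3}}{2}$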
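
The arithmetic of problem 1 can be checked numerically. The following is a minimal sketch using only the givens of the problem; the variable names are illustrative and not part of the original notes.

```python
import math

# Givens from problem 1.
var_xi = 1.0        # D(xi)
var_eta = 4.0       # D(eta)
rho_xi_eta = -0.5   # rho(xi, eta)

# cov(xi, eta) = rho(xi, eta) * sigma_xi * sigma_eta
cov_xi_eta = rho_xi_eta * math.sqrt(var_xi) * math.sqrt(var_eta)

# D(xi + eta) = D(xi) + D(eta) + 2 cov(xi, eta)
var_sum = var_xi + var_eta + 2 * cov_xi_eta

# cov(xi + eta, eta) = cov(xi, eta) + cov(eta, eta) = cov(xi, eta) + D(eta)
cov_sum_eta = cov_xi_eta + var_eta

# rho(xi + eta, eta) = cov(xi + eta, eta) / (sigma_{xi+eta} * sigma_eta)
rho_sum_eta = cov_sum_eta / (math.sqrt(var_sum) * math.sqrt(var_eta))

print(cov_xi_eta, var_sum, cov_sum_eta, rho_sum_eta)
```

The last printed value should agree with $\sqrt{3}/2$, roughly 0.866.
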
1. A particular brand of tires claims that its deluxe tire averages at least 50,000 miles before it
needs to be replaced. A survey of owners of that tire design is conducted. From the 28 tires
surveyed, the mean lifespan was 46,500 miles with a standard deviation of 9,800 miles. Assume
the lifespan is normally distributed.

a) At significance level 0.05, is the data highly inconsistent with the claim?
b) At significance level 0.1, is the data highly inconsistent with the claim?
c) At significance level 0.05, test the hypothesis that the population variance is 100000000.
d) Construct a 95% confidence interval for the mean lifespan of a tire.
e) Construct a 99% confidence interval for the variance of the lifespan of a tire.

a)
$H_0: \mu = \mu_0 = 50000$
$H_1: \mu \neq 50000$
$\alpha = 0.05$

Sample summary statistics:

$n = 28$
$\bar{X} = 46500$
$S = 9800$

Test statistic:

$T = \sqrt{n-1}\,\dfrac{\bar{X} - \mu_0}{S} = -1.86$

$t_\alpha$ = $\left(1 - \frac{\alpha}{2}\right)$-quantile of $t_{n-1}$ = 2.052. Since $|T| = 1.86 < 2.052$, we do not reject $H_0$.

b)
$\alpha = 0.1$, so $t_\alpha$ = 0.95-quantile of $t_{27}$ = 1.703.

$T = -1.86 < -t_\alpha = -1.703$ => the data gives sufficient evidence to reject the hypothesis at this significance level.
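
Parts a) and b) differ only in the critical value, which a short script makes explicit. This is a sketch under the notes' convention $T = \sqrt{n-1}\,(\bar{X} - \mu_0)/S$; the critical values 2.052 and 1.703 are copied from the text rather than computed, and the variable names are illustrative.

```python
import math

# Sample summary from the tire problem.
n = 28
x_bar = 46500.0
s = 9800.0
mu_0 = 50000.0

# Test statistic in the notes' convention: sqrt(n - 1) * (x_bar - mu_0) / S.
t_stat = math.sqrt(n - 1) * (x_bar - mu_0) / s

# Two-sided critical values of t with 27 d.f., as quoted in the text.
critical_values = {0.05: 2.052, 0.1: 1.703}

# Reject H0 when |T| exceeds the critical value.
decisions = {alpha: abs(t_stat) > t_crit for alpha, t_crit in critical_values.items()}

for alpha, reject in decisions.items():
    print(f"alpha={alpha}: T={t_stat:.2f}, reject H0: {reject}")
```

The same statistic leads to opposite decisions at the two levels because 1.86 lies between the two critical values.
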

c)
$H_0: \sigma^2 = 100000000$
$H_1: \sigma^2 \neq 100000000$

$T = \dfrac{n S^2}{\sigma^2} = 26.9$

$\kappa^2_{1,\alpha}$ = 0.025-quantile of chi-squared with 27 d.f. = 14.6
$\kappa^2_{2,\alpha}$ = 0.975-quantile of chi-squared with 27 d.f. = 43.2

Since $14.6 < T < 43.2$, we do not reject the null hypothesis.
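
The variance test in c) can be reproduced in a few lines. This sketch takes the two chi-squared quantiles (14.6 and 43.2, 27 d.f.) directly from the text instead of computing them; the variable names are illustrative.

```python
# Sample summary and hypothesized variance from part c).
n = 28
s = 9800.0
sigma2_0 = 100_000_000.0

# Test statistic in the notes' convention: T = n * S^2 / sigma_0^2.
chi2_stat = n * s**2 / sigma2_0

# 0.025- and 0.975-quantiles of chi-squared with 27 d.f., as quoted in the text.
lower_crit, upper_crit = 14.6, 43.2

# Two-sided test: reject H0 if T falls outside [lower_crit, upper_crit].
reject = chi2_stat < lower_crit or chi2_stat > upper_crit

print(f"T={chi2_stat:.1f}, reject H0: {reject}")
```
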

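d) The $(1-\alpha) \cdot 100\%$ confidence interval for $\mu$ is

$\left[\bar{X} - \dfrac{S\,t_\alpha}{\sqrt{n-1}},\ \bar{X} + \dfrac{S\,t_\alpha}{\sqrt{n-1}}\right] = \left[46500 - \dfrac{9800 \cdot 2.052}{\sqrt{28-1}},\ 46500 + \dfrac{9800 \cdot 2.052}{\sqrt{28-1}}\right] = [42629.9,\ 50370.1]$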
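
The interval in d) follows from one half-width computation. A minimal sketch, assuming the critical value 2.052 quoted in the text for t with 27 d.f.; the variable names are illustrative.

```python
import math

# Sample summary from the tire problem.
n = 28
x_bar = 46500.0
s = 9800.0

# 0.975-quantile of t with 27 d.f., as quoted in the text.
t_crit = 2.052

# Half-width in the notes' convention: S * t_alpha / sqrt(n - 1).
half_width = s * t_crit / math.sqrt(n - 1)

ci = (x_bar - half_width, x_bar + half_width)
print(f"95% CI for the mean: [{ci[0]:.1f}, {ci[1]:.1f}]")
```

The claimed value 50,000 lies inside this interval, which matches the decision not to reject in part a).
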

e) $\left[\dfrac{n S^2}{\kappa^2_{2,\alpha}},\ \dfrac{n S^2}{\kappa^2_{1,\alpha}}\right] = \dots$, with $\alpha = 0.01$.

2) The cost of a daily newspaper varies from city to city. However, the variation among prices
remains steady with a standard deviation of 20¢. A study was done to test the claim that the
mean cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard
deviation of 18¢. Assume the cost is normally distributed. Do the data support the claim at the
1% significance level?

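In this problem $\sigma$ is known, so the test statistic is

$T = \sqrt{n}\,\dfrac{\bar{X} - \mu_0}{\sigma} = -0.866$,

which is distributed as $N(0,1)$ under $H_0$.

$z_\alpha$ = $\left(1 - \frac{\alpha}{2}\right)$-quantile of $N(0,1)$ = 0.995-quantile of $N(0,1)$ = 2.57

Rejection region: $(-\infty, -2.57) \cup (2.57, \infty)$

Since $T = -0.866$ lies outside the rejection region, we do not reject the claim at the 1% significance level.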
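
Because $\sigma$ is known here, the whole test can be run with the Python standard library. This is a sketch, not part of the original notes: it works in cents, uses `statistics.NormalDist` for the normal quantile, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Data from the newspaper problem, in cents.
n = 12
x_bar = 95.0
mu_0 = 100.0
sigma = 20.0   # known population standard deviation
alpha = 0.01

# Test statistic: sqrt(n) * (x_bar - mu_0) / sigma, N(0,1) under H0.
z_stat = math.sqrt(n) * (x_bar - mu_0) / sigma

# Two-sided critical value: the (1 - alpha/2)-quantile of N(0,1).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject = abs(z_stat) > z_crit
print(f"T={z_stat:.3f}, critical value={z_crit:.2f}, reject H0: {reject}")
```

Note that the sample standard deviation of 18 cents is not used, since the population value of 20 cents is given.
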

Chi squared goodness of fit test (Pearson’s test)

3. A manager thinks that 50% of the company's employees were educated on the east coast and
50% were educated on the west coast. In a survey of 100 employees 60 were educated on the
east coast and 40 on the west coast.
a) Test the manager’s claim at 0.1 significance level.
b) Test the manager’s claim at 0.01 significance level.
Distribution of $\xi$ (0 = east coast, 1 = west coast):

value:        0       1
probability:  $p_1$   $p_2$

$H_0$: $p_1 = 1/2$, $p_2 = 1/2$
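Test statistic:

$T = \dfrac{(\nu_1 - n p_1)^2}{n p_1} + \dfrac{(\nu_2 - n p_2)^2}{n p_2} = \dfrac{(60 - 50)^2}{50} + \dfrac{(40 - 50)^2}{50} = 4$

In observed/expected notation: $T = \dfrac{(O_1 - E_1)^2}{E_1} + \dfrac{(O_2 - E_2)^2}{E_2}$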

$\kappa^2_\alpha$ = $(1-\alpha)$-quantile of the chi-squared distribution with $2 - 1 = 1$ degree of freedom.

a) For $\alpha = 0.1$: $\kappa^2_\alpha = 2.71$. Since $T = 4 > 2.71$, we reject $H_0$.
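
Both parts of problem 3 can be checked with the standard library alone. The sketch below relies on one fact that is not stated in the notes: with 1 degree of freedom, a chi-squared variable is the square of a standard normal, so its quantiles can be obtained from normal quantiles. The helper `chi2_1df_quantile` is a hypothetical name introduced for this example.

```python
from statistics import NormalDist

# Observed counts (east, west) and hypothesized probabilities under H0.
observed = [60, 40]
probs = [0.5, 0.5]

n = sum(observed)
expected = [n * p for p in probs]

# Pearson statistic: sum of (observed - expected)^2 / expected.
t_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_1df_quantile(level):
    """Quantile of chi-squared with 1 d.f. (valid for 1 d.f. only).

    If Z ~ N(0,1), then P(Z^2 <= q) = level exactly when sqrt(q)
    is the (1 + level) / 2 quantile of N(0,1).
    """
    z = NormalDist().inv_cdf((1 + level) / 2)
    return z * z


# Reject H0 when the statistic exceeds the (1 - alpha)-quantile.
decisions = {}
for alpha in (0.1, 0.01):
    crit = chi2_1df_quantile(1 - alpha)
    decisions[alpha] = t_stat > crit
    print(f"alpha={alpha}: T={t_stat}, critical value={crit:.2f}, reject H0: {decisions[alpha]}")
```

With these inputs the claim is rejected at the 0.1 level but not at the 0.01 level.
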

4. The table below shows the number of pupils absent on particular days in the week.

Day       M     Tu    W     Th    F
Number    125   88    85    94    108

Find the expected frequencies if it is assumed that the number of absentees is independent of the
day of the week.  Test, at 5% significance level, whether the differences in observed and
expected frequencies are significant.
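
One possible way to work problem 4 is sketched below; it is not a solution from the original notes. It assumes that "independent of the day of the week" means equal expected counts on all five days, and instead of comparing with a tabulated quantile it computes a p-value from the closed-form tail of the chi-squared distribution with 4 degrees of freedom, $P(X > x) = e^{-x/2}(1 + x/2)$, which holds for 4 d.f. only.

```python
import math

# Observed absences per day, from the table in problem 4.
observed = {"M": 125, "Tu": 88, "W": 85, "Th": 94, "F": 108}

n = sum(observed.values())
k = len(observed)

# Assumption: under H0 every day is equally likely, so each expected count is n / k.
expected = n / k

# Pearson statistic with k - 1 = 4 degrees of freedom.
t_stat = sum((o - expected) ** 2 / expected for o in observed.values())

# Upper tail of chi-squared with 4 d.f. (closed form valid for 4 d.f. only).
p_value = math.exp(-t_stat / 2) * (1 + t_stat / 2)

alpha = 0.05
reject = p_value < alpha
print(f"expected={expected}, T={t_stat:.2f}, p-value={p_value:.4f}, reject H0: {reject}")
```

Rejecting when the p-value is below $\alpha$ is equivalent to rejecting when $T$ exceeds the $(1-\alpha)$-quantile used elsewhere in these notes.
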
