
Assignment Name - Statistics Advance

1. Calculate the covariance and correlation between the two
columns A and B below. Show all step-by-step formula
calculations in the answer sheet.
Answer:
A   B   A-μ(A)  B-μ(B)  [A-μ(A)]*[B-μ(B)]  [A-μ(A)]²  [B-μ(B)]²
25  52  -23.14  6       -138.84            535.4596   36
35  10  -13.14  -36     473.04             172.6596   1296
21  5   -27.14  -41     1112.74            736.5796   1681
67  98  18.86   52      980.72             355.6996   2704
98  52  49.86   6       299.16             2486.0196  36
27  36  -21.14  -10     211.4              446.8996   100
64  69  15.86   23      364.78             251.5396   529

Mean(A)=(25+35+21+67+98+27+64)÷7=337÷7=48.14(Approx.)

Mean(B)=(52+10+5+98+52+36+69)÷7=322÷7=46

Cov(A,B)=[∑(A-μ(A))(B-μ(B))]÷(N-1)=3303÷6=550.5

σ(A)=√(∑[A-μ(A)]²÷(N-1))=√(4984.8572÷6)=
√830.81(Approx.)=28.82(Approx.)

σ(B)=√(∑[B-μ(B)]²÷(N-1))=√(6382÷6)=
√1063.67(Approx.)=32.61(Approx.)

Cor(A,B)=Cov(A,B)÷(σ(A)*σ(B))=550.5÷(28.82*32.61)=
550.5÷939.82=0.586(Approx.)
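
The hand calculation above can be cross-checked with a short script. This is a minimal sketch using only the Python standard library; the helper names (mean, sample_covariance, sample_std, correlation) are illustrative and not part of the assignment. It works with the unrounded mean of A, so the last decimal places may differ slightly from values obtained with the rounded 48.14.

```python
import math

# Columns A and B from the question.
A = [25, 35, 21, 67, 98, 27, 64]
B = [52, 10, 5, 98, 52, 36, 69]


def mean(values):
    return sum(values) / len(values)


def sample_covariance(x, y):
    # Sum of the products of deviations, divided by N - 1.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def sample_std(x):
    # Square root of the sample variance (N - 1 in the denominator).
    return math.sqrt(sample_covariance(x, x))


def correlation(x, y):
    # Covariance divided by the product of the two standard deviations.
    return sample_covariance(x, y) / (sample_std(x) * sample_std(y))


print("Mean(A)  =", round(mean(A), 2))
print("Mean(B)  =", round(mean(B), 2))
print("Cov(A,B) =", round(sample_covariance(A, B), 2))
print("σ(A)     =", round(sample_std(A), 2))
print("σ(B)     =", round(sample_std(B), 2))
print("Cor(A,B) =", round(correlation(A, B), 4))
```

Comparing each printed value with the corresponding line of the answer is a quick way to catch arithmetic slips.
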
2. What are the different ways to deal with multicollinearity?
Answer:
Two common ways are:
A. Ignoring collinearity if predicting y values is the goal of
your study, since it mainly affects the coefficient estimates
rather than the predictions.
B. Using a variable selection technique to remove
redundant variables.

3. What should be the correlation threshold value based on
which we determine the highly collinear variables?
Answer:
The correlation threshold value to determine highly collinear
variables should be ±0.50, as any value between -0.5 and 0.5
would indicate a weak correlation. There is no single
universal cutoff, and stricter values such as ±0.7 or ±0.8
are also commonly used.
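
To see how the choice of threshold changes which variables are flagged, here is a small sketch. The correlation values are made up for illustration, the function name highly_collinear is hypothetical, and 0.8 appears only as an example of a stricter cutoff for comparison with the 0.5 used in the answer.

```python
# Hypothetical pairwise correlations between predictors (illustrative values only).
pairwise_r = {
    ("x1", "x2"): 0.92,
    ("x1", "x3"): -0.61,
    ("x2", "x3"): 0.34,
}


def highly_collinear(correlations, threshold):
    """Return the pairs whose absolute correlation meets or exceeds the threshold."""
    return [pair for pair, r in correlations.items() if abs(r) >= threshold]


flagged_at_05 = highly_collinear(pairwise_r, 0.5)
flagged_at_08 = highly_collinear(pairwise_r, 0.8)
print("threshold 0.5:", flagged_at_05)
print("threshold 0.8:", flagged_at_08)
```

The absolute value is used because a strong negative correlation signals collinearity just as a strong positive one does.
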

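4. What are the two different types of variables we use in
ANOVA?
Answer:
A categorical variable (the independent variable, or factor,
that defines the groups) and a numerical variable (the
dependent variable whose group means are compared).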
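
The roles of the two variable types can be seen in a minimal one-way ANOVA sketch. The data are hypothetical: the dictionary keys play the part of the categorical variable and the lists hold the numerical variable. The function name one_way_anova_f is illustrative, and only the F statistic is computed, not the p-value.

```python
# Hypothetical data: group label (categorical) -> measurements (numerical).
scores = {
    "method_a": [4, 5, 6],
    "method_b": [6, 7, 8],
    "method_c": [9, 10, 11],
}


def one_way_anova_f(groups):
    """Return F = between-group mean square / within-group mean square."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    ss_between = 0.0  # variation of the group means around the grand mean
    ss_within = 0.0   # variation of the values around their own group mean
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in values)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


f_stat = one_way_anova_f(scores)
print("F =", round(f_stat, 4))
```

A large F value means the group means differ by more than the spread within the groups would suggest.
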

5. What are the null and alternative hypotheses in the
chi-square test?
Answer:
The chi-square test of independence is applied to 2
categorical variables.
Null Hypothesis: These 2 categorical variables are
independent.
Alternative Hypothesis: These 2 categorical variables are not
independent.
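
As a rough illustration of how these hypotheses are tested, the sketch below computes the chi-square statistic for a 2x2 table of observed counts. The counts are hypothetical and the function name chi_square_statistic is illustrative. The critical value 3.841 is the standard table value for 1 degree of freedom at the 5% significance level, which is an assumed choice of level.

```python
# Hypothetical 2x2 contingency table of observed counts
# (rows: categories of one variable, columns: categories of the other).
observed = [
    [30, 10],
    [20, 40],
]


def chi_square_statistic(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    return chi2


chi2 = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)

CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05
reject_null = dof == 1 and chi2 > CRITICAL_5PCT_DF1

print("chi-square =", round(chi2, 4), "| degrees of freedom =", dof)
print("Reject the null hypothesis of independence:", reject_null)
```

If the statistic exceeds the critical value, the null hypothesis is rejected and the two variables are treated as not independent.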
