Note:
Concept of Spurious or Nonsense correlation:
Sometimes there is no causal relation between two variables, but the presence of a
third variable produces an observable correlation between them. The variable that is
responsible for the correlation between the other two variables is called a “lurking variable”.
iii. When deviations are taken from the actual means x̄ and ȳ, such that u = x − x̄
and v = y − ȳ, r is given by,
$r = \dfrac{\sum uv}{\sqrt{\sum u^2 \cdot \sum v^2}}$
J. K. SHAH CLASSES Correlation
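iv. When deviations are taken from assumed means, say ‘a’ from x and ‘b’
from y, such that u = x − a and v = y − b, r is given by,
$r = \dfrac{\dfrac{\sum uv}{n} - \dfrac{\sum u}{n} \cdot \dfrac{\sum v}{n}}{\sqrt{\dfrac{\sum u^2}{n} - \left(\dfrac{\sum u}{n}\right)^2} \cdot \sqrt{\dfrac{\sum v^2}{n} - \left(\dfrac{\sum v}{n}\right)^2}}$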
Note 1: Use (i) when you find that Cov(x, y), σx and σy are provided
Note 2: Use (ii) when you find that the values of x and y are small
Note 3: Use (iii) when you find that x and y are whole numbers
Note 4: Use (iv) when you find that x and y are not whole numbers, or the
values of x and y are large, or the problem specifically directs that the
deviations are to be taken from assumed means only.
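As a rough check that formulas (iii) and (iv) describe the same quantity, the sketch below computes r both ways in Python. The data values, the assumed means a = 18 and b = 40, and the function names are illustrative assumptions, not taken from the notes.

```python
import math

def r_actual_means(xs, ys):
    """Formula (iii): deviations u = x - mean(x), v = y - mean(y)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    u = [x - x_bar for x in xs]
    v = [y - y_bar for y in ys]
    numerator = sum(ui * vi for ui, vi in zip(u, v))
    denominator = math.sqrt(sum(ui ** 2 for ui in u) * sum(vi ** 2 for vi in v))
    return numerator / denominator

def r_assumed_means(xs, ys, a, b):
    """Formula (iv): deviations u = x - a, v = y - b from assumed means a, b."""
    n = len(xs)
    u = [x - a for x in xs]
    v = [y - b for y in ys]
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov_uv = sum(ui * vi for ui, vi in zip(u, v)) / n - mean_u * mean_v
    var_u = sum(ui ** 2 for ui in u) / n - mean_u ** 2
    var_v = sum(vi ** 2 for vi in v) / n - mean_v ** 2
    return cov_uv / math.sqrt(var_u * var_v)

# Illustrative data only (hypothetical values, not from the notes).
xs = [12, 15, 18, 21, 27]
ys = [31, 36, 34, 45, 52]

r_iii = r_actual_means(xs, ys)
r_iv = r_assumed_means(xs, ys, a=18, b=40)
print(round(r_iii, 4), round(r_iv, 4))  # the two values should agree
```

Any choice of a and b should give the same r, which is why assumed means close to the data are usually picked: they keep the deviations small.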
3. Concurrent Deviation Method or Coefficient of Concurrent Deviation [r]:
• It is the simplest method of calculating correlation
• It is used to know the direction of change between two variables
• It is suitable only when the variables include short-term fluctuations
• It lies between -1 and +1
• Let (x1,y1), (x2,y2), ……, (xn+1,yn+1) be a set of (n+1) pairs of values of x
and y. Let Cx and Cy denote the direction of change in the values of x and y,
i.e., Cx and Cy take a positive sign if the value of x or y increases relative
to its immediately preceding value and a negative sign if it decreases.
If C denotes the number of concurrent deviations i.e., total number of
positive signs in the Cx .Cy column then the coefficient of concurrent
deviation is given by,
$r = \pm\sqrt{\pm\dfrac{2C - n}{n}}$
Where,
i. If (2C − n)/n is positive, a positive sign is to be assigned both inside and
outside the square root.
ii. If (2C − n)/n is negative, a negative sign is to be assigned both inside and
outside the square root.
iii. When C = 0, r = -1
iv. When C = n, r = 1
v. When C = n/2, r = 0
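The procedure above can be sketched in Python as follows. The function name and the sample series are illustrative assumptions. Treating a "no change" step as not concurrent is also an assumption, since the notes do not say how ties are handled.

```python
import math

def concurrent_deviation(xs, ys):
    """Coefficient of concurrent deviations for two paired series."""
    def signs(values):
        # +1 for an increase over the preceding value, -1 for a decrease, 0 for no change.
        return [(curr > prev) - (curr < prev) for prev, curr in zip(values, values[1:])]

    cx = signs(xs)
    cy = signs(ys)
    n = len(cx)  # number of deviation pairs = number of observations - 1
    # C = number of positive signs in the Cx * Cy column.
    # Assumption: a step with no change in either series is not counted as concurrent.
    c = sum(1 for sx, sy in zip(cx, cy) if sx * sy > 0)
    ratio = (2 * c - n) / n
    # The same sign is used inside and outside the square root.
    sign = 1 if ratio >= 0 else -1
    return sign * math.sqrt(sign * ratio)

# Illustrative series (hypothetical values, not from the notes).
r_all_concurrent = concurrent_deviation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # C = n
r_none_concurrent = concurrent_deviation([1, 2, 3, 4, 5], [9, 7, 5, 3, 1])  # C = 0
r_half_concurrent = concurrent_deviation([1, 2, 3, 4, 5], [1, 2, 3, 2, 1])  # C = n/2
print(r_all_concurrent, r_none_concurrent, r_half_concurrent)
```

The three calls correspond to the special cases C = n, C = 0 and C = n/2 listed above.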
4. Diagrammatic representation of correlation through scatter diagram or scatter plot:
• It is the simplest and the quickest way to represent bivariate data
• It gives a vague idea about the nature of correlation between two
variables
• It helps us to distinguish between different types of correlation but fails to
measure the extent of relationship between the variables
• Through scatter diagram we can get an idea about the nature of
correlation; positive, negative, zero or curvilinear.
Properties of the Coefficient of Correlation ‘r’:
• The correlation coefficient is symmetric, i.e., rxy = ryx
• If y = a + bx then,
i. r = +1 when b > 0 and
ii. r = -1 when b < 0
• The correlation coefficient is independent of change of origin and scale.
If u = (x − a)/c and v = (y − b)/d then,
a. ruv = rxy, if c and d are of the same sign
b. ruv = - rxy, if c and d are of opposite signs
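A small numerical check of the change of origin and scale property, written as a Python sketch. The data, the constants a, b, c, d and the helper names are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r computed from deviations about the actual means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def shift_and_scale(values, origin, scale):
    """Apply u = (x - origin) / scale to every value (scale must be non-zero)."""
    return [(value - origin) / scale for value in values]

# Illustrative data (hypothetical values, not from the notes).
xs = [2, 4, 5, 7, 10]
ys = [3, 6, 4, 9, 11]

r_xy = pearson_r(xs, ys)
# c = 2 and d = 5 have the same sign, so r should be unchanged.
r_same_sign = pearson_r(shift_and_scale(xs, 1, 2), shift_and_scale(ys, 3, 5))
# c = 2 and d = -5 have opposite signs, so r should flip sign.
r_opposite_sign = pearson_r(shift_and_scale(xs, 1, 2), shift_and_scale(ys, 3, -5))
print(round(r_xy, 4), round(r_same_sign, 4), round(r_opposite_sign, 4))
```

Swapping the arguments of `pearson_r` leaves the result unchanged, which is the symmetry property rxy = ryx.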
Miscellaneous Properties:
• Coefficient of determination = r²
$r^2 = \dfrac{\text{Explained Variance}}{\text{Total Variance}} = 1 - \dfrac{\text{Unexplained Variance}}{\text{Total Variance}}$
• Coefficient of non-determination:
$1 - r^2 = \dfrac{\text{Unexplained Variance}}{\text{Total Variance}}$
• Coefficient of alienation = square root of coefficient of non-determination =
$\sqrt{1 - r^2}$
• Percentage of explained variation = r² × 100
• Percentage of unexplained variation = (1 − r²) × 100
• Standard error of r (S.E. of r) = $\dfrac{1 - r^2}{\sqrt{n}}$
• Probable error of r [P.E.(r)] = 0.6745 × S.E.(r)
• Probable error is used for determining the reliability of correlation coefficient and
for this purpose the following rule is followed,
a. If r < 6 P.E then there is no evidence of correlation and it is not significant
b. If r = 6 P.E then there is correlation
c. If r > 6 P.E then correlation exists and it is also significant
• If x and y are two correlated variables, then V(x ± y) = V(x) + V(y) ± 2Cov(x, y)
• If x and y are two uncorrelated variables, then Cov(x, y) = 0 and hence
V(x ± y) = V(x) + V(y)
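The quantities listed above can be computed together, as in the Python sketch below. The values r = 0.8 and n = 25, the sample data and all function names are illustrative assumptions. Using the absolute value of r in the 6 P.E. comparison is also an assumption, since the notes state the rule for r itself.

```python
import math

def correlation_diagnostics(r, n):
    """Derived measures for a correlation coefficient r from n pairs of observations."""
    determination = r * r
    non_determination = 1 - determination
    standard_error = non_determination / math.sqrt(n)
    probable_error = 0.6745 * standard_error
    return {
        "determination": determination,
        "non_determination": non_determination,
        "alienation": math.sqrt(non_determination),
        "standard_error": standard_error,
        "probable_error": probable_error,
        # Assumption: the 6 P.E. rule is applied to |r|.
        "significant": abs(r) > 6 * probable_error,
    }

def variance(values):
    """Population variance (divisor n)."""
    mean = sum(values) / len(values)
    return sum((value - mean) ** 2 for value in values) / len(values)

def covariance(xs, ys):
    """Population covariance (divisor n)."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / len(xs)

# Illustrative inputs (hypothetical values, not from the notes).
diagnostics = correlation_diagnostics(r=0.8, n=25)
print(diagnostics)

xs = [2, 4, 5, 7, 10]
ys = [3, 6, 4, 9, 11]
var_sum = variance([x + y for x, y in zip(xs, ys)])
var_diff = variance([x - y for x, y in zip(xs, ys)])
# V(x ± y) = V(x) + V(y) ± 2 Cov(x, y)
print(var_sum, variance(xs) + variance(ys) + 2 * covariance(xs, ys))
print(var_diff, variance(xs) + variance(ys) - 2 * covariance(xs, ys))
```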
Bivariate Data
• When a set of data is collected for two variables simultaneously, it is called
bivariate data
• When a frequency distribution is formed with these bivariate data it is known as
Bivariate Frequency Distribution or Joint Frequency Distribution or Two Way
Distribution
• The tabular representation of this frequency distribution is known as Two Way
Frequency Table
• Following is a bivariate table for the data relating to marks in maths and
statistics
                          Marks in Mathematics
Marks in Stats    0-4    4-8    8-12   12-16   16-20   Total
0-4                1      1      2       0       0       4
4-8                1      4      5       1       1      12
8-12               1      2      4       6       1      14
12-16              0      1      3       2       5      11
16-20              0      0      1       5       3       9
Total              3      8     15      14      10      50
Observations :
• A bivariate frequency distribution has m × n cells
• Some of the cell frequencies may be zero
From a bivariate distribution we can have the following two types of Uni-variate
distributions
i. Two Marginal Distributions
ii. m+n Conditional Distributions
From the above table, the marginal distribution of marks in Mathematics is as follows,
Marks No of students
0-4 3
4-8 8
8-12 15
12-16 14
16-20 10
Total 50
Similarly we can have Marginal Distribution for marks in statistics
From the above table, an example of a conditional distribution is that of marks in Statistics
when the mathematics marks lie between 8-12:
Marks No of students
0-4 2
4-8 5
8-12 4
12-16 3
16-20 1
Total 15
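The marginal and conditional distributions can be read off the two-way table mechanically. The Python sketch below does this with the cell frequencies from the table above. The variable and function names are illustrative assumptions.

```python
# Class intervals shared by both variables.
classes = ["0-4", "4-8", "8-12", "12-16", "16-20"]

# Cell frequencies: rows are marks in Statistics, columns are marks in Mathematics.
table = [
    [1, 1, 2, 0, 0],
    [1, 4, 5, 1, 1],
    [1, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]

# Marginal distributions: row totals (Statistics) and column totals (Mathematics).
marginal_stats = [sum(row) for row in table]
marginal_maths = [sum(column) for column in zip(*table)]

def conditional_stats_given_maths(maths_class):
    """Distribution of Statistics marks for one fixed class of Mathematics marks."""
    column_index = classes.index(maths_class)
    return [row[column_index] for row in table]

print(dict(zip(classes, marginal_maths)))
print(dict(zip(classes, marginal_stats)))
print(dict(zip(classes, conditional_stats_given_maths("8-12"))))
```

With 5 classes for each variable there are 5 + 5 = 10 such conditional distributions, one per row and one per column.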
Bivariate Relationship
Between two variables x and y there can exist any of the following three relationships:
a. Direct or Positive – with a change in one variable x, the other variable y also changes
in the same direction. E.g., price and quantity supplied; amount of rainfall and crop yield.
b. Indirect or Inverse or Negative – with a change in one variable, the other variable
changes in the opposite direction. E.g., price and quantity demanded.
c. No relation – if, with a change in one variable x, the other variable y does not show any
specific trend (increasing or decreasing), then we say there exists no relation between x
and y.
1. Find the number of pairs of observations from the following data: r = 0.25,
∑(x − x̄)(y − ȳ) = 60, ∑(x − x̄)² = 90, SDy = 4.
2. The Cov(x, y) of the following data is: (1, 10), (2, 9), (3, 2), (4, 8), (5, 6), (6, 0):
a) 4.51
b) 3.51
c) – 4.42
d) – 3.42
3. Calculate Cov(x, y) for the following data: x̄ = 30, ȳ = 40, ∑(x − x̄)(y − ȳ) = - 235, n = 64.
a) 39.17
b) – 39.17
c) – 3.67
d) 39.71
4. Calculate the co-variance from the following data: ∑ x = 55, ∑ y = 74, ∑ xy = 411, n = 11.
a) 0.4
b) 0.5
c) 0.6
d) 3.72
5. Calculate r, if Cov (x, y) = 10, Var (x) = 6.25 and Var (y) = 31.36.
a) 0.71
b) – 0.71
c) 0.61
d) – 0.61
6. Karl Pearson’s coefficient of correlation between two variables X and Y is 0.52, their
covariance is + 7.8. If the variance of X is 16, then the standard deviation of Y series is:
a) 2.85
b) 3.25
c) 1.25
d) 3.75
7. From the following information calculate the value of “r”: ∑ x = 125, ∑ y = 100, ∑ x = 2
10. Mention the correct value of correlation coefficient for the following data:
X: 2 3 5 8 9
Y: 4 6 10 16 18
a) 1
b) 0
c) –1
d) None of the above
11. Find correlation coefficient from the following data:
X: 10 12 13 16 17 20 25
Y: 19 22 24 27 29 33 37
a) – 0.987
b) 0.99
c) 0.895
d) – 0.99
12. Which of the following is the value of correlation coefficient for the following data:
i −X −Y −X i −Y
i =1 i =1 i =1
a) 0.987
b) – 0.75
c) 0.75
d) 0.85
13. Calculate coefficient of correlation for the data: n = 10, ∑x = 140, ∑y = 150,
∑(x − 10)² = 180, ∑(y − 15)² = 215, ∑(x − 10)(y − 15) = 60.
14. Calculate correlation coefficient from the following data: n = 12, ∑x = 120,
∑y = 130, ∑(x − 8)² = 150, ∑(y − 10)² = 200, ∑(x − 8)(y − 10) = 50.
a) 0.215
b) – 0.215
c) – 0.317
d) None of the above
16. The values of co-variance of two variables x and y is 143/3 and the variance of x is
273/3 and the variance of y is 131/3. Then the coefficient of correlation is:
a) 0.48
b) 0.76
c) 0.87
d) 0.67
17. If n = 10 and ∑D² = 280, then which of the following represents the value of rank
correlation coefficient?
a) 0.70
b) – 0.7
c) 0.645
d) None of the above
18. For two series we have ∑D² = 30 and N = 10; find the value of R (symbols having
their usual meanings).
a) 0.28
b) – 0.82
c) 0.82
d) – 0.28
19. The ranks according to two attributes in a sample are given below. The rank correlation
between them would be:
R1: 1 2 3 4 5
R2: 1 2 3 4 5
a) 1
b) 0
c) –1
d) None of the above
20. The coefficient of rank correlation between the marks in Statistics and Mathematics
obtained by a certain group of students is 2/3 and the sum of the squares of the
differences in ranks is 55. How many students are there in the group?
a) 10
b) 9
c) 12
d) more than 15
21. Eight contestants in a musical contest were ranked by two judges A and B in the
following manner. The rank correlation coefficient is:
Candidates   Judge A   Judge B
1            7         5
2            6         4
3            2         6
4            4         3
5            5         8
6            3         2
7            1         1
8            8         7
a) 0.65
b) 0.63
c) 0.60
d) 0.57
22. The coefficient of rank correlation of marks obtained by 9 students was calculated to be
0.4. It was later discovered that the value of the difference between ranks for one student
was written wrongly as 6 instead of 8. Find the correct value of the coefficient of rank
correlation. (Answer is 0.17)
23. If R (rank correlation coefficient) = 0.60 and N (no. of observations) = 10, find the value
of ∑ D2 , where D is the difference in ranks of the two series?
a) 61
b) 64
c) 74
d) 66
24. From the following data calculate the value of coefficient of Rank correlation:
X: 75 88 95 70 60 80 81 50
Y: 120 134 150 115 110 140 142 100
a) 0.93
b) – 0.85
c) 0.85
d) 0.63
25. Find the coefficient of rank correlation between marks obtained by students in Auditing
and Statistics:
Student (Roll No.)   Marks in Auditing   Marks in Statistics
1 30 15
2 20 40
3 40 40
4 50 45
5 30 20
6 20 30
7 30 15
8 50 50
9 10 20
10 0 10
a) 0.63
b) 0.83
c) 0.36
d) None of the above
27. Calculate the coefficient of correlation using the method of concurrent deviations from
the following data:
Year 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989
Supply 120 110 120 119 140 125 127 119 140 160
Demand 240 250 260 266 232 245 255 267 268 239
28. What is the coefficient of concurrent deviations for the following data:
Supply: 68 43 38 78 66 83 38 23 83 53 48
Demand: 65 60 55 61 35 75 45 40 85 80 85
a) 0.82
b) 0.85
c) 0.89
d) – 0.81
31. Calculate the coefficient of concurrent deviation for the following data:
Supply 65 40 35 75 63 80 35 20 80 60 50
Demand 60 55 50 56 30 70 40 35 80 75 80
a) rc=0.085
b) rc=0.894
c) rc=+0.85
d) rc=+0.0855
32. The coefficient of concurrent deviation for p pairs of observations was found to be 1/√3.
If the number of concurrent deviations was found to be 6, then the value of p is:
a) 10
b) 9
c) 8
d) None of these
34. Given the coefficient of correlation between x and y is 0.6, write down the correlation
coefficient between u and v where 2u – 3x + 4 = 0 and 4v – 16y + 11 = 0
a) 0.6 b) -0.6 c) 0.8 d) -0.8
35. If rxy = 0.6, what will be the value of ruv, where u = 3x + 5 and v = - 4y + 3?
a) – 0.6
b) 0.6
c) 0.36
d) 0.66
36. If u+5x=6 and 3y-7v=20 and the correlation coefficient between x and y is 0.58, then
what would be the correlation coefficient between u and v?
a) 0.58
b) -0.58
c) -0.84
d) 0.84
6. Miscellaneous Concepts
40. If r = + 0.526 and the number of pairs of observations is 50, what is the probable error of “r”?
a) 0.06623
b) 0.6623
c) 1.6623
d) –0.6629
41. Given the coefficient of correlation between x and y is 0.6, write down the correlation
coefficient between u and v where 2u + 3x + 4 = 0 and 4v – 16y + 11 = 0
42. If rxy = 0.6, what will be the value of ruv, where u = 3x + 5 and v = 4y – 3?
a) – 0.6
b) 0.6
c) 0.36
d) – 0.36
44. A student calculates the value of r as 0.7 when the value of n is 5 and concludes that r
is highly significant. Is he correct?
45. Find the coefficient of correlation r, when its probable error is 0.2 and the number of
pairs of items is 9.
7. Theoretical Aspects
Introduction:
47. If r = 0, then there is _______ correlation between the two variables.
a) Negative
b) Positive
c) No Linear
d) Both positive and negative
48. If the variables are inversely proportional to each other, then statistically they are
_________ correlated.
a) Positively
b) Not
c) Negatively
d) None of the above
50. For which of the following statements the correlation will be positive?
a) Age and Income
b) Speed of an automobile and the distance required to stop the car after applying
brakes
c) Sale of cold-drinks and day temperature
d) All of the above are positively correlated
51. For which of the following statements the correlation will be negative?
a) Production and price per unit
b) Sale of woolen garments and day temperature
c) Only a) above
d) Both a) and b) above
52. Which of the following is the example of variables which are uncorrelated?
a) Profit & Investment
b) Price & Demand of an item
c) Shoe-size & Intelligence
d) Yield & Rainfall
54. If the value of r is “+”, then there is a ________ correlation between the variables.
a) Negative
b) Positive
c) Both negative and positive
d) Neither positive nor negative
55. If the variables are directly proportional to each other, then statistically they are
_________ correlated.
a) Positively
b) Not
c) Negatively
d) None of the above
Properties:
57. In case the correlation coefficient between two variables is 1, which of the following
would be the relationship between the two variables?
a) y = p + qx, q > 0
b) y = p + qx, q < 0
c) y = p + qx, p > 0, q < 0
d) Both a) and b) above
58. If the relationship between two variables x and y is given by 22x + 33y + 84 = 0, then
the value of correlation coefficient between x and y will be:
a) 1.00
b) 0
c) – 1.00
d) Between 0 and 1.00
60. Rank correlation is useful when we study the relationship between __________
characteristics.
a) Quantitative
b) Qualitative
c) Both a) and b) above
d) Either a) or b) above
62. If x and y are two correlated variables, then which of the following is true?
a) Var(x+y)=Var(x)+Var(y)+ 2Cov(x,y)
b) Var(x+y) = Var(x) + Var(y)
c) Var(x – y) = Var(x) – Var(y)
d) None of the above is true
63. If x and y are two uncorrelated variables, then which of the following relation is TRUE?
a) Var (x + y) = Var (x) + Var (y)
b) Var (x - y) = Var (x) + Var (y)
c) Both a) and b) above are true
d) Only a) above is true
Application of r:
64. ________ of correlation co-efficient is a measure of reliability.
a) Standard Error
b) Statistical Error
c) Probable Error
d) Both a) and c) above
69. The ratio of the coefficient of non-determination to the square root of the number of
observations is known as:
a) Probable Error
b) Error
c) Standard Error
d) None of the above
Scatter Diagram :
70. _____________ is the simplest way of the diagrammatic representation of bivariate
data.
a) Time Series Graph
b) Radar Diagram
c) Frequency Polygon
d) Scatter Diagram
71. The scatter diagram gives us a ______ idea about the correlation between the two
variables.
a) Exact
b) Vague
c) No
d) None of the above
73. On a scatter diagram, if one is not able to recognize a specific pattern, then the two
variables depict _______ correlation.
a) No
b) Positive
c) Both negative and positive
d) Perfectly Negative
Bivariate Data:
74. For bivariate data, there exist how many types of relationship between the variables?
a) One
b) Two
c) Three
d) More than three
75. If the data are collected for two variables simultaneously, it is known as:
a) Univariate Data
b) Bivariate Data
c) Conditional Data
d) Marginal Data
76. The frequency distribution related to two variables collected at the same point of time is
known as:
a) Bivariate Frequency Distribution
b) Joint Frequency Distribution
c) Two-way Frequency Distribution
d) All of the above
77. From the Bivariate Frequency Distribution, we can obtain which of the following
Univariate distribution?
a) Marginal distribution
b) Conditional distribution
c) Both a) and b) above
d) Neither a) nor b) above
79. In a bivariate frequency distribution, if there are m classifications for x and n
classifications for y, then there would be altogether ____ conditional distributions.
a) m + n
b) m – n
c) m
d) mn
80. In (m × n) bivariate frequency table, the maximum number of marginal distributions is:
a) 1
b) 2
c) No marginal distribution exists
d) m+n
81. In a (m x n) bivariate distribution table, some of the cell frequencies may be:
a) Zero
b) Negative
c) Both a) and b) above
d) None of the above
Corrected Correlation:
82. A computer, while calculating the correlation coefficient between the variables x and y
obtained the following constants:
N = 30, ∑x = 120, ∑x² = 600, ∑y² = 250, ∑xy = 356, ∑y = 90. It was, however, later discovered
at the time of checking that it had copied down two pairs of observations wrongly.
Copied as (x, y):        (8, 10) and (12, 7)
Corrected values (x, y): (8, 12) and (10, 8)
Obtain the correct value of the correlation coefficient between x and y.
83. In order to find the correlation coefficient between two variables X and Y from 12
observations, the following calculations were made: ∑x = 30, ∑y = 5, ∑x² = 670,
∑y² = 285 and ∑xy = 334. On subsequent verification it was found that the pair (X = 11, Y = 4)
was copied wrongly, the correct value being (X = 10, Y = 14). Which of the
following is the correct value of the correlation coefficient?
a) – 0.87
b) – 0.78
c) 0.87
d) 0.78
47 c 56 d 65 a 74 c
48 c 57 a 66 b 75 b
49 d 58 c 67 c 76 d
50 d 59 d 68 c 77 c
51 d 60 b 69 c 78 d
52 c 61 b 70 d 79 a
53 a 62 a 71 b 80 b
54 b 63 c 72 b 81 a
55 a 64 d 73 a