Chapter 4: Correlation: 4.1 Association Between Variables
CHAPTER 4: CORRELATION
4.2 Scattergrams
A scattergram is a graph of two variables (usually labelled x and y) that is plotted in
order to illustrate the relationship between them, if any. It is constructed by drawing the
usual x-axis and y-axis and then plotting a point for each pair of x- and y-values in the
dataset.
Example 4.2.1: A sample of five steel cables in a workshop yielded the following data,
which is illustrated in a scattergram.
Unlike the axes usually drawn in mathematics, it is not necessary to show the origin in
a scattergram; the axis for each variable need be drawn only for the range of values that is
required for that variable.
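The construction described above can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical (they are not the cable measurements of Example 4.2.1), and the helper names `scatter_points` and `axis_range` are invented for this sketch. It pairs the x- and y-values into points and chooses each axis range from the data alone, so the origin is not forced onto the graph.

```python
# Hypothetical paired measurements; NOT the cable data from Example 4.2.1.
x_values = [12.0, 15.0, 11.0, 18.0, 14.0]
y_values = [40.0, 52.0, 37.0, 60.0, 47.0]


def scatter_points(xs, ys):
    """Pair each x-value with its y-value; each pair is one plotted point."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same number of values")
    return list(zip(xs, ys))


def axis_range(values, margin=0.05):
    """Return (low, high) covering the data plus a small margin.

    The range depends only on the data, so the origin is not forced in.
    """
    low, high = min(values), max(values)
    pad = (high - low) * margin
    return (low - pad, high + pad)


points = scatter_points(x_values, y_values)
x_axis = axis_range(x_values)
y_axis = axis_range(y_values)

print(points)
print("x-axis:", x_axis, "y-axis:", y_axis)
```

The points and axis ranges could then be passed to any plotting tool; the 5% margin is an arbitrary choice made for this sketch.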
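When high values of one variable tend to be associated with high values of the other, the
relationship is said to be a positive, or direct, one. When high values of one variable tend to be associated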
with low values of the other, the relationship is said to be a negative, or inverse, one.
Note that values close to zero (whether positive or negative) indicate low (or weak)
correlation, whereas values close to 1 in size (whether positive or negative) indicate high (or
strong) correlation.
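As a rough illustration of how the sign and size of a correlation coefficient behave, the sketch below computes a coefficient for three small datasets. It assumes that the coefficient in question is Pearson's product-moment correlation coefficient; the function name `pearson_r` and all data values are hypothetical and chosen only for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
strong_positive = pearson_r(x, [2, 4, 5, 8, 10])   # y rises with x
strong_negative = pearson_r(x, [10, 8, 5, 4, 2])   # y falls as x rises
weak = pearson_r(x, [3, 1, 4, 1, 3])               # no clear trend

print(strong_positive, strong_negative, weak)
```

The first two results are close to 1 in size with opposite signs, while the third is close to zero.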
A serious and common misunderstanding about correlation is that “high correlation
implies causation”; in other words, if there is moderate to strong correlation between two
variables, then one of the variables “is caused by” or “depends on” the other one. This is not
necessarily so. For example, a high positive correlation has been found to exist between the
number of murders in the UK over a number of years and the number of marriages in the
Anglican Church over the same period; it is obviously nonsense to suggest that “murders
cause marriages” or that “marriages cause murders”. Often the explanation for such so-called
“nonsense correlations” is that the two variables in question are affected by (“caused by”) a
third variable or several other variables. In this example, the number of murders and the
number of marriages occurring over time are both clearly affected by the growth in
population over time.
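The effect of a third variable can be sketched numerically. In the illustration below every figure is hypothetical (none is real UK data), and the names `population`, `murder_rate` and `marriage_rate` are invented for this sketch. The two annual counts are each generated as population multiplied by a per-million rate, and the two rates are constructed so that they are unrelated to one another. Pearson's coefficient is assumed as the measure of correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical yearly figures: population in millions, rates per million people.
population = [40, 43, 46, 49, 52, 55, 58, 61]
murder_rate = [10.0, 10.2, 10.0, 10.2, 10.0, 10.2, 10.0, 10.2]
marriage_rate = [100, 100, 101, 101, 100, 100, 101, 101]

# Each count is driven mainly by the growing population (the third variable).
murders = [p * r for p, r in zip(population, murder_rate)]
marriages = [p * r for p, r in zip(population, marriage_rate)]

r_counts = pearson_r(murders, marriages)         # counts: strongly correlated
r_rates = pearson_r(murder_rate, marriage_rate)  # rates: essentially uncorrelated

print(round(r_counts, 3), round(r_rates, 3))
```

In this constructed example the counts are highly correlated even though the underlying rates are not, because both counts grow with the population.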
A high correlation between two variables merely indicates that there is a
mathematical relationship between the two sets of values; further research is required in
order to determine whether there is a causal relationship between the two variables or
whether there is some other explanation for the observed association between them.
         Rank of x   Rank of y     d     d²
I            4           5        -1      1
J            3           7        -4     16
Total       55          55         0    306

n = 10
Spearman's coefficient = 1 - 6Σd²/(n(n² - 1)) = 1 - 6(306)/(10 × 99) = -0.85
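The arithmetic above can be checked with a short Python sketch of Spearman's formula, 1 - 6Σd²/(n(n² - 1)), where d is the difference between the two ranks of each item. The function names are hypothetical, and the four-item rank lists at the end are invented purely to show the second function in use.

```python
def spearman_from_sum_d2(sum_d2, n):
    """Spearman's coefficient from the sum of squared rank differences."""
    if n < 2:
        raise ValueError("need at least 2 ranked items")
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def spearman_from_ranks(rank_x, rank_y):
    """Spearman's coefficient from two lists of ranks for the same items."""
    if len(rank_x) != len(rank_y):
        raise ValueError("both rank lists must have the same length")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return spearman_from_sum_d2(sum_d2, len(rank_x))


# The totals from the worked example: sum of d squared = 306, n = 10.
example = spearman_from_sum_d2(306, 10)
print(round(example, 2))  # -0.85

# Hypothetical ranks in exactly reversed order give perfect negative correlation.
reversed_case = spearman_from_ranks([1, 2, 3, 4], [4, 3, 2, 1])
print(reversed_case)  # -1.0
```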
If one has data for one variable measured on an interval or ratio scale and for another
variable measured on an ordinal scale, one can convert the former values into rank values and
then apply Spearman’s coefficient. When ranking data, if there are ties, each of the common
values must be given the average of the rank positions that they cover, as described in Section
1.10.
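The treatment of ties can be sketched as follows. This is a minimal illustration, not the procedure of Section 1.10 itself: it assumes that rank 1 goes to the smallest value, and the function name `average_ranks` and the sample values are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward, averaging the ranks of ties."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' to cover every value tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        # Tied values share the average of the rank positions they cover.
        shared = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


# Hypothetical measurements: the two values of 20 cover positions 2 and 3.
ranks = average_ranks([10, 20, 20, 30])
print(ranks)  # [1.0, 2.5, 2.5, 4.0]
```

The two tied values cover rank positions 2 and 3, so each receives the average rank of 2.5.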