Correlation

Correlation - refers to the degree to which two or more events vary together.
- refers to the relationship or association between two or more variables. 

Correlation coefficient – also called the product-moment correlation; a simple descriptive statistic that measures the
strength of the linear relationship or association between variables, as might be visualized in a scatter plot.
Scatter plot / Scattergram – a graphic technique used to represent the relationship between two variables.

There are different types of correlation for different types of data:


1. correlation between interval data: the parametric Pearson product-moment correlation, or "r-value"
2. correlation between ordinal data: the non-parametric Spearman rank correlation
3. many others. The coefficients discussed here share the common property that they range from -1.00 to +1.00.

The types differ because there are two types of statistical tests:


1. Parametric test
- Used when values are sampled from populations that follow a normal (Gaussian) distribution.
- Generally used for interval and ratio levels of measurement
- Examples: Z-test, t-test, F-test, Pearson product-moment correlation
2. Non-parametric test
- Used when values are sampled from populations that do not follow a normal distribution.
- Usually applied to nominal and ordinal levels of measurement
- Examples: Chi-square test, Kolmogorov-Smirnov test, Kruskal-Wallis test, Spearman's rank correlation
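
The difference between the parametric and the rank-based coefficient can be illustrated with a small sketch. The data below are hypothetical, the function names are chosen only for this example, and the ranking helper assumes there are no tied values. Spearman's coefficient is computed here as Pearson's r applied to the ranks.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y always rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 8, 16, 32, 64]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson r    = {r:.3f}")
print(f"Spearman rho = {rho:.3f}")
```

Because the ordering of the two variables agrees perfectly, the rank-based coefficient reaches its maximum, while Pearson's r stays below 1 since the points do not fall on a straight line.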

Measuring the correlation coefficient, ‘r’

o A correlation coefficient describes the direction (positive or negative) and degree (strength) of the relationship
between two variables. The larger the absolute value of the correlation coefficient, the stronger the relationship. When
dealing with a linear correlation, the correlation coefficient (r) lies between -1 and +1.

Three patterns that can be seen in scatterplots of data are:

Positive correlations
 When two or more events change in the same direction, they are
said to be positively correlated. Ex: between age and height
 Data has an upward trend.  As the independent variable (x-axis)
increases the dependent variable (y-axis) also increases in a linear
manner
 An r value of 1 suggests that there is a perfect linear association
present, which gives a perfect positive correlation. Ex: height in cm
and height in inches of subjects measured.
Perfect Correlation - An increase in one variable is accompanied by a proportional increase in the
other variable. The points can be connected with a straight line.
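
The height example can be checked numerically. The following is a minimal sketch with made-up heights; `pearson_r` is a helper written for this illustration, not a function from the handout.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical heights of five subjects, in centimetres.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
# The same heights converted to inches (1 inch = 2.54 cm).
height_in = [h / 2.54 for h in height_cm]

r = pearson_r(height_cm, height_in)
print(round(r, 6))
```

Since one variable is an exact multiple of the other, every point lies on a single straight line and r equals 1 up to floating-point rounding.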

Negative correlation
 When two or more events change in opposite directions, they are
said to be negatively correlated. Ex: between interest rates and
lending activity
 Data has a downward trend.  As the independent variable (x-axis) increases, the dependent
variable (y-axis) decreases in a linear manner.
 An r value of -1 suggests that there is a perfect linear association present, which gives a perfect
negative correlation. Ex: the volume of a gas decreases as pressure increases (strictly an inverse rather than a
linear relationship, so r is strongly negative but not exactly -1).

Nonexistent correlation
 Data has no trend.  There is no correlation between
the variables. The size of one variable is unrelated
to the size of the other variable.
 Ex: IQ and shoe size.

Nonlinear Correlation (curvilinear correlation)
 The points on the graph do not follow a straight line. The ratio of change between the two variables does not remain constant.

**This shows the importance of plotting the data and not relying only on summary statistics such as the correlation coefficient. The
correlation coefficient may not reveal a relationship that is actually nonlinear.
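
A small sketch makes this point concrete. The data are hypothetical: y is completely determined by x (y = x squared), yet the linear correlation coefficient is zero. `pearson_r` is a helper written for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical curvilinear data: a U-shaped curve, symmetric about x = 0.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [value ** 2 for value in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")
```

The downward half of the curve cancels the upward half, so r reports no linear association even though the two variables are perfectly related.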

Interpreting correlation using a scatter plot can also be subjective. A more precise way to measure the type and strength of
a linear correlation between two variables is to calculate the correlation coefficient. The symbol r represents the sample correlation
coefficient. The formula for r is:
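
Assuming the standard Pearson product-moment definition is the one meant here, r for n paired observations (x_i, y_i) with sample means x-bar and y-bar can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two variables vary together; the denominator rescales it so that r always lies between -1 and +1.
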
Here is a rough but useful guide to the degree of relationship indicated by the size of the coefficient.

Pearson product-moment coefficient of correlation

Value of r             Correlation              Relationship between the variables
r = ±1                 Perfect correlation      Perfect relationship
±0.90 ≤ r ≤ ±1.00      Very high correlation    Very strong relationship
±0.70 ≤ r ≤ ±0.90      High correlation         Marked relationship
±0.40 ≤ r ≤ ±0.70      Moderate correlation     Substantial relationship
±0.20 ≤ r ≤ ±0.40      Low correlation          Weak relationship
±0.00 ≤ r ≤ ±0.20      Slight correlation       Negligible relationship
r = 0                  No correlation           No relationship / uncorrelated
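
The guide can be turned into a small lookup, sketched below. The function name is hypothetical. Because adjacent ranges in the guide share their boundary values, this sketch assumes that a coefficient exactly on a boundary (for example 0.70) belongs to the stronger category; that choice is an assumption, not something the guide states.

```python
def describe_correlation(r):
    """Return (correlation label, relationship label) for a coefficient r.

    Assumption: a value exactly on a shared boundary is assigned to the
    stronger of the two categories.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the absolute value
    if size == 1.0:
        return ("Perfect correlation", "Perfect relationship")
    if size >= 0.90:
        return ("Very high correlation", "Very strong relationship")
    if size >= 0.70:
        return ("High correlation", "Marked relationship")
    if size >= 0.40:
        return ("Moderate correlation", "Substantial relationship")
    if size >= 0.20:
        return ("Low correlation", "Weak relationship")
    if size > 0.0:
        return ("Slight correlation", "Negligible relationship")
    return ("No correlation", "No relationship / uncorrelated")

# Hypothetical coefficients, including negative ones.
for value in (1.0, -0.85, 0.55, -0.10, 0.0):
    print(value, describe_correlation(value))
```

The sign of r is ignored on purpose: it gives the direction of the relationship, not its strength.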

[Figure: scatter plots illustrating a perfect negative correlation and a high negative correlation]

**The strength of the relationship is indicated by the absolute value of the correlation coefficient.
Interpretation and Significance:

Whether our obtained (sample) r reflects a statistically significant population correlation depends on:
a. The size of the correlation coefficient obtained
- We reject the null hypothesis (that the population correlation is zero) if the absolute value of the obtained
coefficient equals or exceeds the tabled critical value at the chosen level of risk (significance level).
b. The size of the sample
- As a rule of thumb, a good-sized sample has a minimum of 30 participants.
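
One common way to carry out this test, sketched here as an assumption rather than as the handout's own procedure, converts r into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)), and compares it with a tabled critical t value. This is equivalent to comparing r with a tabled critical r. The sample values below are hypothetical, and the critical value 2.048 is the standard two-tailed t-table entry for 28 degrees of freedom at the 0.05 level.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether the population correlation is zero."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for a perfect correlation")
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical result: r = 0.45 obtained from a sample of 30 participants.
r, n = 0.45, 30
t = t_statistic(r, n)

# Two-tailed critical t for df = n - 2 = 28 at the 0.05 level (from a t table).
T_CRITICAL = 2.048
significant = abs(t) >= T_CRITICAL

print(f"t = {t:.3f}, df = {n - 2}, reject null hypothesis: {significant}")
```

With the same sample size, a smaller coefficient such as 0.30 would not reach the critical value, which shows how both the size of r and the size of the sample enter the decision.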

Things to Remember:

 "Correlation does not imply causation." Two events may vary together, but one does not necessarily
cause the other. For example, even if height and musical expression were positively correlated, no
one would be likely to assert that growing taller makes a person more proficient at playing musical
instruments.

Correlation should not be used when:

 There is a non-linear relationship between variables
 There are outliers
 There are distinct sub-groups
e.g. healthy controls with diseased cases
 The values of one of the variables are determined in advance
e.g. picking the doses of a drug in an experiment measuring its effect

Two examples of when not to use a correlation coefficient:

(a) When there is a non-linear relationship; (b) when distinct subgroups are present. In both of these examples
the correlation coefficient quoted is spurious.
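
The sub-group case can be reproduced with a small sketch. The measurements are invented for illustration: within each group the two variables are uncorrelated, but pooling the groups produces a large coefficient that only reflects the gap between the groups. `pearson_r` is a helper written for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical measurements for healthy controls (low values on both variables).
controls_x = [1, 2, 3, 4]
controls_y = [2, 1, 1, 2]
# Hypothetical measurements for diseased cases (high values on both variables).
cases_x = [11, 12, 13, 14]
cases_y = [12, 11, 11, 12]

r_controls = pearson_r(controls_x, controls_y)
r_cases = pearson_r(cases_x, cases_y)
r_pooled = pearson_r(controls_x + cases_x, controls_y + cases_y)

print(f"controls only: r = {r_controls:.3f}")
print(f"cases only:    r = {r_cases:.3f}")
print(f"pooled:        r = {r_pooled:.3f}")
```

Analysing each sub-group separately, or plotting the data with the groups marked, avoids mistaking the difference between groups for a relationship between the variables.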
