
Correlations

Pearson’s r correlation is a bivariate, parametric statistical test that measures the strength
of the linear relationship between two variables. It assumes that both variables are measured on
at least an interval scale and that they are linearly related (Hill and Lewicki, 2006). Rollinson
(2014) lists the assumptions of Pearson’s r correlation as follows: (1) the units of measurement
are the same for all variables, (2) there is a linear relationship between the two variables, and
(3) the variables are either normally or lognormally distributed. Pearson’s r correlation also
requires a multivariate normal distribution (Reimann et al., 2002).

The correlation coefficient can be misleading because it is influenced by outliers and skewed
data distributions (Filzmoser and Hron, 2009). The Pearson correlation coefficient, r, is also
known as Pearson’s product-moment correlation coefficient. The coefficient of determination,
r², is the square of Pearson’s correlation coefficient. It measures the proportion of variance
shared by the two variables; multiplied by 100, it expresses this shared variance as a
percentage.

Prior to Pearson’s r correlation, a logarithmic transformation of the univariate data is
recommended, as is a visual inspection of each univariate distribution (Reimann et al., 2008).
However, because compositional data do not follow Euclidean geometry, correlation analysis
using raw or logarithmically transformed data is discouraged (Filzmoser and Hron, 2009).
Applying correlation analysis to logarithmically transformed geochemical data can give
misleading results because the relationship between the two variables is biased (Filzmoser and
Hron, 2009). The constant sum problem introduces a negative bias into the results of Pearson’s r
correlation (Rollinson, 2014), and the closure of geochemical data can force two variables to
appear associated (Rollinson, 2014).

Data transformations, nonparametric correlation methods, and robust correlation measures can
account for outliers and skewed data distributions (Filzmoser and Hron, 2009). Spearman’s rho
correlation is the nonparametric analog of Pearson’s r correlation. It computes a correlation
coefficient, rₛ, from ranked data and assumes that the variables are measured on at least an
ordinal scale (Hill and Lewicki, 2006). Spearman’s rho correlation can be applied to ranked data,
and it is preferred over Pearson’s r correlation when one or both variables are not normally
distributed or contain outliers (Rollinson, 2014; Cooksey, 2014). The logarithm preserves the
order of the values, and therefore their ranks. Spearman’s rho correlation consequently gives the
same result for raw and logarithmically transformed geochemical data, so a logarithmic
transformation is not needed before Spearman’s rho correlation (Reimann et al., 2008).
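
A correlation coefficient describes both the strength and the direction of an association. The
strength of the association between two variables can range from weak to very strong, and the
direction can be positive or negative. A correlation matrix displays the correlation coefficients
for all pairs of variables. For n paired observations (xᵢ, yᵢ) with means x̄ and ȳ, Pearson’s
correlation coefficient is calculated using the equation:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$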

A correlation coefficient ranges from -1 (a perfect negative correlation) to +1 (a perfect
positive correlation); a coefficient of zero indicates no correlation. A coefficient between
±0.5 and ±0.7 indicates a good correlation, while a coefficient between ±0.7 and ±1.0 indicates
a strong correlation. In a positive association, both variables increase together; in a
negative association, one variable increases as the other decreases.
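
The strength bands described in this section can be written as a small lookup. This illustrative
sketch uses a hypothetical function name. It assumes that a coefficient of exactly ±0.7 counts as
strong, and it leaves |r| < 0.5 unclassified because the text gives no band for that range.

```python
def describe_correlation(r):
    """Label a coefficient's strength and direction using the text's bands.

    Only 'good' (0.5 to 0.7) and 'strong' (0.7 to 1.0) come from the text;
    treating exactly 0.7 as strong is an assumption.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    direction = "positive" if r > 0 else "negative" if r < 0 else "none"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.5:
        strength = "good"
    else:
        strength = "unclassified"  # no band is given below 0.5
    return strength, direction

print(describe_correlation(-0.62))
```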

Pearson’s r and Spearman’s rho correlation coefficients are interpreted in the same way. The
statistical significance of the relationship between the two variables, x and y, is assessed
with the p-value: the probability of obtaining a correlation coefficient at least as extreme as
the one observed if there were no correlation in the population. In a two-tailed test, the null
hypothesis states that there is no correlation between x and y, that is, the population
correlation coefficient is equal to zero. The alternative hypothesis states that the population
correlation coefficient is not equal to zero, so there is a correlation between x and y.

When two variables are strongly related but the relationship is non-linear, the correlation
coefficient can be misleading (Hill and Lewicki, 2006). The constant sum constraint of
geochemical data often forces correlations between variables and introduces a negative bias
(Rollinson, 1993). Closed data affect the results of correlation analysis (Rock, 1988); Rock
(1988) suggests that, compared with open data, a negative correlation in closed data is less
significant and a positive correlation is more significant. Data transformation does not correct
for data closure (Reimann et al., 2002). A marked difference between Pearson’s r and Spearman’s
rho coefficients suggests that Pearson’s coefficient is influenced by data abnormalities such as
outliers or non-normal distributions (Cooksey, 2014).
