
Correlation, Partial Correlations and Regression


Assumptions
• Assumption #1: Your two variables should be measured at
the interval or ratio level (i.e., they are continuous). Examples of variables
that meet this criterion include revision time (measured in hours),
intelligence (measured using IQ score), exam performance (measured from 0
to 100), weight (measured in kg), and so forth. You can learn more about
interval and ratio variables in our Types of Variable guide.
• Assumption #2: There is a linear relationship between your two variables.
Whilst there are a number of ways to check whether a linear relationship
exists between your two variables, we suggest creating a scatterplot using
SPSS Statistics, where you can plot the one variable against the other
variable, and then visually inspect the scatterplot to check for linearity. Your
scatterplot may look something like one of the following:
[Figure: example scatterplots]
• Assumption #3: There should be no significant outliers. Outliers are
simply single data points within your data that do not follow the
usual pattern (e.g., in a study of 100 students’ IQ scores, where the
mean score was 108 with only a small variation between students,
one student had a score of 156, which is very unusual, and may even
put her in the top 1% of IQ scores globally).
• Assumption #4: Your variables should be approximately normally
distributed. To test for normality you can use the Shapiro-Wilk test of
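normality. If the significance value (p) of the Shapiro-Wilk test is greater
than .05, there is no significant departure from normality, so the data can be
treated as approximately normal.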
One-tailed vs Two-tailed test
• A one-tailed test should be selected when you have a directional
hypothesis (e.g. ‘the more anxious someone is about an exam, the
worse their mark will be’).
• A two-tailed test (the default) should be used when you cannot
predict the nature of the relationship (e.g. 'I'm not sure whether
exam anxiety will improve or reduce exam marks').
• Therefore, if you have a directional hypothesis click on 1-tailed,
whereas if you have a non-directional hypothesis click on 2-tailed.
Example
• Our researcher predicted that (1) as anxiety increases, exam
performance will decrease, and (2) as the time spent revising
increases, exam performance will increase. Both of these are
directional hypotheses, so both tests are one-tailed.
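Because the test used for a correlation has a symmetric null distribution, a one-tailed p-value is half the two-tailed value when the observed correlation lies in the predicted direction, and 1 − p/2 when it goes the other way. The minimal Python sketch below shows this conversion; the function name `one_tailed_p` and the example p-value are hypothetical, for illustration only.

```python
def one_tailed_p(p_two_tailed: float, r: float, predicted_sign: int) -> float:
    """Convert a two-tailed p-value for a correlation into a one-tailed one.

    Assumes a symmetric null distribution (as for the t-test used with Pearson's r).
    predicted_sign is +1 for a predicted positive relationship, -1 for a negative one.
    """
    if predicted_sign not in (1, -1):
        raise ValueError("predicted_sign must be +1 or -1")
    if r * predicted_sign > 0:
        # Observed correlation is in the predicted direction: only one tail counts.
        return p_two_tailed / 2
    # Observed correlation is zero or in the opposite direction: p is large.
    return 1 - p_two_tailed / 2


# Hypothesis (1): anxiety is predicted to be negatively related to performance.
# The two-tailed p of .04 is a made-up value for illustration.
print(one_tailed_p(0.04, r=-0.44, predicted_sign=-1))
```

A result in the "wrong" direction can never be significant under a one-tailed test, which is why the directional hypothesis must be fixed before looking at the data.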
How to Interpret a Correlation Coefficient r using temprate.sav (these data relate people's body temperatures and heart rates)
• Exactly –1. A perfect downhill (negative) linear relationship
• –0.70. A strong downhill (negative) linear relationship
• –0.50. A moderate downhill (negative) linear relationship
• –0.30. A weak downhill (negative) linear relationship
• 0. No linear relationship
• +0.30. A weak uphill (positive) linear relationship
• +0.50. A moderate uphill (positive) linear relationship
• +0.70. A strong uphill (positive) linear relationship
• Exactly +1. A perfect uphill (positive) linear relationship
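The benchmarks above can be applied to a computed coefficient as in the minimal Python sketch below. It uses the standard Pearson formula; treating each benchmark (.30, .50, .70) as the lower boundary of its band, and the label for values below .30, are assumptions of this sketch rather than part of the guide. The data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)                    # sum of squares of x
    syy = sum((b - my) ** 2 for b in y)                    # sum of squares of y
    return sxy / math.sqrt(sxx * syy)


def describe_r(r):
    """Label r using the benchmarks above as rough band boundaries (an assumption)."""
    direction = "uphill (positive)" if r > 0 else "downhill (negative)"
    size = abs(r)
    if math.isclose(size, 1.0):
        return f"perfect {direction} linear relationship"
    if size >= 0.70:
        return f"strong {direction} linear relationship"
    if size >= 0.50:
        return f"moderate {direction} linear relationship"
    if size >= 0.30:
        return f"weak {direction} linear relationship"
    return "little or no linear relationship"


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(f"r = {r:.2f}: {describe_r(r)}")  # r = 0.77: strong uphill (positive) linear relationship
```

These cut-offs are rules of thumb; the size of r says nothing about whether it is statistically significant.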
Coefficient of determination, R²
• A measure of the amount of variability in one variable that is shared by the other.
• For example, we may look at the relationship between exam anxiety and exam
performance. Exam performances vary from person to person because of any number of
factors (different ability, different levels of preparation and so on). If we add up all of this
variability (rather like when we calculated the sum of squares in section 2.4.1) then we
would have an estimate of how much variability exists in exam performances. We can then
use R² to tell us how much of this variability is shared by exam anxiety. These two variables
had a correlation of −0.4410 and so the value of R² will be (−0.4410)² = 0.194. This value
tells us how much of the variability in exam performance is shared by exam anxiety.
• If we convert this value into a percentage (multiply by 100) we can say that exam anxiety
shares 19.4% of the variability in exam performance. So, although exam anxiety was significantly
correlated with exam performance, it can account for only 19.4% of variation in exam
scores. To put this value into perspective, this leaves 80.6% of the variability still to be
accounted for by other variables.
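The arithmetic in this example can be checked with a few lines of Python (a sketch; the variable names are illustrative):

```python
# Correlation between exam anxiety and exam performance, taken from the text.
r = -0.4410

r_squared = r ** 2                   # coefficient of determination
shared_pct = 100 * r_squared         # % of variability in performance shared with anxiety
unexplained_pct = 100 - shared_pct   # % left for other variables to account for

print(f"R^2 = {r_squared:.3f}")                # R^2 = 0.194
print(f"shared: {shared_pct:.1f}%")            # shared: 19.4%
print(f"unaccounted: {unexplained_pct:.1f}%")  # unaccounted: 80.6%
```

Squaring removes the sign, so a correlation of +0.4410 would share exactly the same proportion of variability.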
Partial Correlation
• Partial correlation is a measure of the strength and direction of a
linear relationship between two continuous variables whilst
controlling for the effect of one or more other continuous variables
(also known as 'covariates' or 'control' variables).
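With a single control variable z, the partial correlation between x and y can be obtained from the three pairwise Pearson correlations using the standard first-order formula r_xy·z = (r_xy − r_xz × r_yz) / √((1 − r_xz²)(1 − r_yz²)). The minimal Python sketch below implements that formula; it is an illustration, not necessarily how SPSS Statistics computes the value internally. The example plugs in the rounded correlations from the reporting example at the end of this section (performance with anxiety, performance with revision time, revision time with anxiety), so the result is only approximate.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z.

    Built from the three pairwise Pearson correlations.
    """
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        raise ValueError("z is perfectly correlated with x or y; partial r is undefined")
    return (r_xy - r_xz * r_yz) / denom


# x = exam performance, y = exam anxiety, z = time spent revising (rounded values).
r_perf_anx_given_rev = partial_r(r_xy=-0.44, r_xz=0.40, r_yz=-0.71)
print(round(r_perf_anx_given_rev, 2))  # -0.24
```

Controlling for revision time shrinks the performance-anxiety relationship from about −.44 to about −.24, because part of their shared variability is also shared with revision time.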
Assumptions
• Assumption #1: You have one (dependent) variable and one (independent) variable and these are both
measured on a continuous scale (i.e., they are measured on an interval or ratio scale). Examples
of continuous variables include revision time (measured in hours), intelligence (measured using IQ
score), exam performance (measured from 0 to 100), weight (measured in kg), temperature (measured
in °C), sales (measured in US dollars), and so forth.
• Assumption #2: You have one or more control variables, also known as covariates (i.e., control variables
are just variables that you are using to adjust the relationship between the other two variables; that is,
your dependent and independent variables). These control variables are also measured on
a continuous scale (i.e., they are continuous variables). Examples of continuous variables are provided
above.
• Assumption #3: There needs to be a linear relationship between all three variables. That is, all possible
pairs of variables must show a linear relationship. This is often accomplished by visually inspecting a
scatterplot.
• Assumption #4: There should be no significant outliers. Outliers are simply single data points within
your data that do not follow the usual pattern. Partial correlation is sensitive to outliers, which can have
a very large effect on the line of best fit and the correlation coefficient, leading to incorrect conclusions
regarding your data. Therefore, it is best if there are no outliers or they are kept to a minimum.
• Assumption #5: Your variables should be approximately normally distributed. This can be checked
using the Shapiro-Wilk test of normality, which is easily run in SPSS Statistics.
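Assumption #4's warning that a single outlier can have a very large effect on the correlation coefficient applies equally to the ordinary Pearson correlation that partial correlation is built from. The minimal Python sketch below uses made-up data with a clear positive trend, then adds one point that does not follow the usual pattern.

```python
import math


def pearson_r(x, y):
    """Pearson's product-moment correlation for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a clear positive linear trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1, 3, 2, 5, 4, 6, 8, 7, 10, 9]
r_clean = pearson_r(x, y)

# The same data plus one outlying point, (11, -10).
r_outlier = pearson_r(x + [11], y + [-10])

print(f"without outlier: r = {r_clean:.2f}")    # without outlier: r = 0.95
print(f"with one outlier: r = {r_outlier:.2f}")  # with one outlier: r = 0.01
```

One point out of eleven is enough to turn a strong positive correlation into almost no correlation, which is why the scatterplots should be inspected before interpreting r.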
How to report correlation coefficients
• Five things to note are that:
(1) there should be no zero before the decimal point for the correlation
coefficient or the probability value (because neither can exceed 1 in absolute value);
(2) coefficients are reported to 2 decimal places;
(3) if you are quoting a one-tailed probability, you should say so;
(4) each type of correlation coefficient is represented by a different letter
(e.g. r for Pearson, τ for Kendall), and some of them are Greek;
(5) there are standard significance levels that we use (.05, .01 and
.001).
Example
• There was a significant relationship between the number of adverts
watched and the number of packets of sweets purchased, r = .87, p
(one-tailed) < .05.
• Exam performance was significantly correlated with exam anxiety, r =
−.44, and time spent revising, r = .40; the time spent revising was also
correlated with exam anxiety, r = −.71 (all ps < .001).
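The formatting rules above can be automated as in the minimal Python sketch below; the helper names `format_r` and `format_p` are hypothetical. Reporting a non-significant result as "p > .05" is an assumption of this sketch, and the typographic minus sign (U+2212) is used to match the examples above.

```python
def format_r(r: float) -> str:
    """Format a correlation coefficient: 2 decimal places, no leading zero."""
    rounded = round(r, 2)
    sign = "\u2212" if rounded < 0 else ""  # typographic minus sign
    digits = f"{abs(rounded):.2f}"
    if digits.startswith("0"):
        digits = digits[1:]  # drop the leading zero (r cannot exceed 1)
    return f"r = {sign}{digits}"


def format_p(p: float, one_tailed: bool = False) -> str:
    """Report p against the standard significance levels (.001, .01, .05)."""
    label = "p (one-tailed)" if one_tailed else "p"
    for cutoff in (0.001, 0.01, 0.05):  # check the strictest level first
        if p < cutoff:
            return f"{label} < {str(cutoff).lstrip('0')}"
    return f"{label} > .05"  # not significant at any standard level


print(f"{format_r(0.87)}, {format_p(0.03, one_tailed=True)}")  # r = .87, p (one-tailed) < .05
print(f"{format_r(-0.4410)}, {format_p(0.0004)}")              # r = −.44, p < .001
```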