# From the SelectedWorks of Durgesh C Pathak

January 2009

Correlation using SPSS

Available at: http://works.bepress.com/durgesh_chandra_pathak/11

Correlation using SPSS*
D.C. Pathak
Primary Text Book: Discovering Statistics Using SPSS, 2nd ed., Andy Field, 2005.

*This presentation has borrowed heavily from the aforesaid book.

Variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1}$$

What happens when there are two variables?

Covariance: Measuring Relationships (how?)

Covariance → Changing Together

We get cross-product deviations, each of which is the product of the two variables' deviations from their respective means.

$$\mathrm{cov}_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$

The numerator of the above equation is the sum of cross-product deviations.

Problem with Covariance: It depends on the scales of measurement, which matters when the two variables are measured in different units, e.g. Age and Memory.

How to solve this problem?

Standardization: Converting the covariance into a standard set of units by dividing it by the standard deviations of the two variables.
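The scale problem with covariance can be sketched in a few lines of Python. The ages and memory scores below are hypothetical numbers made up for illustration; they are not from the presentation's data set. The sketch only shows that changing the unit of one variable changes the covariance even though the relationship is the same.

```python
def covariance(xs, ys):
    """Sample covariance: sum of cross-product deviations divided by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross_products / (n - 1)


# Hypothetical data: ages in years and memory-test scores for five people.
age_years = [20, 30, 40, 50, 60]
memory_score = [9, 8, 6, 5, 2]

cov_years = covariance(age_years, memory_score)

# The same ages expressed in months: the relationship is unchanged,
# but every deviation of age is 12 times larger, and so is the covariance.
age_months = [12 * a for a in age_years]
cov_months = covariance(age_months, memory_score)

print("covariance with age in years: ", cov_years)
print("covariance with age in months:", cov_months)
```

Because the covariance scales with the units, its size cannot be compared across pairs of variables, which is the motivation for standardizing it.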

What is the relationship between two (or more) variables?

When one variable varies, there are three possibilities:
1. The second one increases when the first one increases.
2. The second one decreases when the first one increases.
3. The second one remains unchanged when the first one varies.

Correlation: Standardized covariance. A measure of linear relationship between variables.

$$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(N - 1)\, s_x s_y}$$

where r = Pearson’s Correlation Coefficient.

The value of r varies between -1 and +1 through 0. So, r can be positive (case 1), negative (case 2) or zero (case 3).
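As a minimal sketch of r as a standardized covariance, the function below divides the sample covariance by the two sample standard deviations. The data are hypothetical values chosen for illustration, and `pearson_r` is simply a name used here, not an SPSS function.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: sample covariance divided by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (s_x * s_y)


# Hypothetical data: ages in years and memory-test scores for five people.
age_years = [20, 30, 40, 50, 60]
memory_score = [9, 8, 6, 5, 2]

r_years = pearson_r(age_years, memory_score)

# Re-expressing age in months rescales the covariance and the standard
# deviation of age by the same factor, so r does not change.
r_months = pearson_r([12 * a for a in age_years], memory_score)

print("r with age in years: ", round(r_years, 4))
print("r with age in months:", round(r_months, 4))
```

Unlike the covariance, r stays within -1 to +1 whatever units the variables are measured in.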

When r = +1, it implies that the two variables have a perfect positive relation.

When r = -1, it implies that the two variables are perfectly related in a negative manner (when one increases, the other decreases).

When r = 0, it implies that the two variables are not linearly related to each other (they don’t change together).

Scatter Plots: Graphs of Relationship

1. Simple Scatter Plot: Used when there are two variables.

Graphs → Interactive → Scatterplot → X-axis (Independent variable) → Y-axis (Dependent variable) → OK

[Figure: Simple Scatterplot of Exam Performance (%) and Exam Anxiety, with points marked by Gender (Male, Female).]

3-D Scatterplot: Used to show the relationship among three variables. Cumbersome to use.

Graphs → Interactive → Scatterplot → 3-D Coordinate → X-axis (Independent variable) → Y-axis (Dependent variable) → Z-axis (variable which is related with both X and Y) → OK

[Figure: 3-D Scatterplot, with points marked by Gender (Male, Female).]

Overlay Scatterplot: Used when one variable is held constant and is plotted against several other variables. Several pairs of variables are plotted on the same axes.

E.g., if we are interested in knowing the relation between Anxiety and Exam performance and between Revision time and Exam performance, but not between Anxiety and Revision time.

Graphs → Scatterplots → Overlay Scatterplot → Form pairs of variables → OK

[Figure: Overlay Scatterplot of Exam Performance (%) against Exam Anxiety/Revision Time.]

Matrix Scatterplot: Can be used in place of a 3-D Scatterplot.

Graphs → Scatterplot → Matrix Scatterplot → Transfer variables to ‘Matrix Variables’ box → OK

Panels of the matrix (columns A to C, rows 1 to 3):
• B1: Exam Performance vs. Anxiety
• C1: Exam Performance vs. Rev. Time
• A2: Anxiety vs. Exam Performance
• C2: Anxiety vs. Rev. Time
• A3: Rev. Time vs. Exam Performance
• B3: Rev. Time vs. Anxiety

[Figure: Matrix Scatterplot, a grid with columns A, B, C and rows 1, 2, 3.]

Types of Correlation
• Bivariate Correlation: Correlation between two variables.
• Partial (& Part) Correlation: Observing the relationship between two variables while controlling the effect of one or more additional variables.

How to do Bivariate Correlation in SPSS?

Analyze → Correlate → Bivariate → Transfer variables → Choose type of Correlation → Choose type of test: One-tailed or Two-tailed → OK

Now, we are going to use a data set of 103 students (52 Male and 51 Female) regarding the time spent in revision of the subject, anxiety before the exam, and marks obtained in the exam by these students. This data set has been downloaded from Andy Field’s website.

We shall calculate Pearson’s Product-Moment Correlation Coefficient for these variables.

Analyze → Correlate → Bivariate → Transfer variables of interest to ‘Variables’ box → Decide on Test of Significance: One-tailed/Two-tailed → Choose ‘Pearson’ in Correlation Coefficient → OK

SPSS Output: Statistically significant correlations are flagged by *. One * means the result is significant at the .05 level and ** (two asterisks) implies that the result is significant at the .01 level.

Correlations

| | | Time Spent Revising | Exam Performance (%) | Exam Anxiety |
|---|---|---|---|---|
| Time Spent Revising | Pearson Correlation | 1 | .397** | -.709** |
| | Sig. (2-tailed) | | .000 | .000 |
| | N | 103 | 103 | 103 |
| Exam Performance (%) | Pearson Correlation | .397** | 1 | -.441** |
| | Sig. (2-tailed) | .000 | | .000 |
| | N | 103 | 103 | 103 |
| Exam Anxiety | Pearson Correlation | -.709** | -.441** | 1 |
| | Sig. (2-tailed) | .000 | .000 | |
| | N | 103 | 103 | 103 |

**. Correlation is significant at the 0.01 level (2-tailed).

Reporting the Correlation: There is a statistically significant, positive relationship between the time spent in revision by an individual and his exam performance, r = .397, p (two-tailed) < .01.

Correlation and Causality: Correlation does not imply causality. So, why?
• Third variable problem: there can be some other third variable affecting the relation between the two variables under question. E.g., brain size and gender can both be affected by the size of the person.
• Direction of causality: the correlation coefficient says nothing about which variable is causing the other to change.

Correlation and Effect Size:
• Small Effect: r = ± .1
• Medium Effect: r = ± .3
• Large Effect: r = ± .5

Coefficient of Determination (R2): The correlation coefficient (r), if squared, is called the Coefficient of Determination (R2) and can be used as a measure of the amount of variability in one variable that can be explained by the other.

E.g., in our previous example, the r between anxiety and exam performance was -0.441. So, R2 = (-0.441)2 = 0.194481. We can multiply R2 by 100 to express it as a percentage. So, 0.194481 x 100 = 19.4481% ⇒ about 19.4% of the variation in the exam performance of students can be explained by variations in their anxiety.

Word of Caution: R2 does not imply a causal relationship.
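The arithmetic for the coefficient of determination can be checked with a short sketch. The value of r is the anxiety and exam performance correlation quoted on this slide; the helper name `percent_variance_explained` is made up for this illustration.

```python
def percent_variance_explained(r):
    """Square r to get R^2, then express it as a percentage."""
    return 100 * r ** 2


# r between Exam Anxiety and Exam Performance, as quoted on the slide.
r_anxiety_performance = -0.441

r_squared = r_anxiety_performance ** 2
percent = percent_variance_explained(r_anxiety_performance)

print(f"R^2 = {r_squared:.6f}")
print(f"Shared variance = {percent:.4f}%")
```

Squaring removes the sign, so R2 says how much variability is shared but not the direction of the relationship.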

Non-Parametric Correlations: Four assumptions should be met for the data to be Parametric:
1. Normally Distributed Data
2. Homogeneity of Variance
3. Interval Data
4. Independence

When the data have violated any or all of these assumptions, we can use non-parametric correlations.

How can we know if our distribution is Normal or not?

One can use the Kolmogorov-Smirnov Test (K-S test) or the Shapiro-Wilk Test. These tests compare the scores in the sample to a normally distributed set of scores with the same mean and standard deviation.

Decision Rule:
• If the test is Non-significant (p > .05) ⇒ the distribution does not differ significantly from a normal distribution.
• If the test is Significant (p < .05) ⇒ the distribution is Non-normal.

Analyze → Descriptive Statistics → Explore → Transfer variables to ‘Dependent’ box → OK

[Table: Tests of Normality. Kolmogorov-Smirnov (with Lilliefors Significance Correction) and Shapiro-Wilk statistics for Time Spent Revising, Exam Performance (%) and Exam Anxiety; df = 103 for every test, and every reported Sig. value is .002 or smaller.]

Thus, our data came out to be Non-normal.

Testing Homogeneity of Variance: Levene’s test is used. It tests the hypothesis that the variances in the groups are equal.

Decision Rule:
• If Levene’s test is statistically non-significant (p > .05) ⇒ Homogeneity of variances is maintained.
• If Levene’s test is statistically significant (p < .05) ⇒ Variances are Heterogeneous.
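To make concrete what Levene’s test compares, here is a minimal sketch of the Levene statistic computed from absolute deviations around each group’s mean (the variant SPSS labels ‘Based on Mean’). The two groups of scores are hypothetical. The sketch stops at the statistic and its degrees of freedom; the p-value used in the decision rule comes from an F distribution with (df1, df2) degrees of freedom and is not computed here.

```python
def levene_statistic(*groups):
    """Return (W, df1, df2) for Levene's test based on group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Replace every score by its absolute deviation from its group mean.
    z_groups = []
    for g in groups:
        group_mean = sum(g) / len(g)
        z_groups.append([abs(x - group_mean) for x in g])

    z_means = [sum(z) / len(z) for z in z_groups]
    z_grand = sum(sum(z) for z in z_groups) / n_total

    # One-way ANOVA F ratio on the absolute deviations.
    between = sum(len(z) * (zm - z_grand) ** 2 for z, zm in zip(z_groups, z_means))
    within = sum((v - zm) ** 2 for z, zm in zip(z_groups, z_means) for v in z)
    df1, df2 = k - 1, n_total - k
    return (between / df1) / (within / df2), df1, df2


# Hypothetical exam scores for two groups with different spreads.
group_a = [40, 55, 60, 65, 80]
group_b = [50, 58, 60, 62, 70]

w, df1, df2 = levene_statistic(group_a, group_b)
print(f"Levene W = {w:.4f}, df1 = {df1}, df2 = {df2}")
```

With two groups, df1 is 1 and df2 is the number of cases minus 2.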

Analyze → Descriptive Statistics → Explore → Transfer variables to ‘Dependent’ box → Put Categorical variable in ‘Factor’ box → Spread vs. Level with Levene’s Test → OK

SPSS output for Levene’s Test:

[Table: Test of Homogeneity of Variance. Levene Statistic, df1, df2 and Sig. for Time Spent Revising, Exam Performance (%) and Exam Anxiety, each reported Based on Mean, Based on Median, Based on Median and with adjusted df, and Based on trimmed mean; df1 = 1 in every row.]

Thus, the assumption of Homogeneity of Variances is maintained for the data under consideration.

Non-Parametric Correlations:

1. Spearman’s Correlation Coefficient: It first ranks the data and then applies Pearson’s correlation to these ranks.

$$r_s = 1 - \frac{6 \sum d^2}{N^3 - N}$$

where r_s is Spearman’s correlation coefficient, d is the difference between the two ranks of a case and N is the number of cases.

Correlations (Spearman's rho)

| | | Time Spent Revising | Exam Performance (%) | Exam Anxiety |
|---|---|---|---|---|
| Time Spent Revising | Correlation Coefficient | 1.000 | .350** | -.622** |
| | Sig. (2-tailed) | . | .000 | .000 |
| | N | 103 | 103 | 103 |
| Exam Performance (%) | Correlation Coefficient | .350** | 1.000 | -.405** |
| | Sig. (2-tailed) | .000 | . | .000 |
| | N | 103 | 103 | 103 |
| Exam Anxiety | Correlation Coefficient | -.622** | -.405** | 1.000 |
| | Sig. (2-tailed) | .000 | .000 | . |
| | N | 103 | 103 | 103 |

**. Correlation is significant at the 0.01 level (2-tailed).
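The two descriptions of Spearman’s coefficient on this slide (Pearson’s correlation applied to ranks, and the formula based on rank differences) can be compared in a short sketch. The revision hours and exam scores are hypothetical and contain no tied values; the two routes are only guaranteed to agree exactly when there are no ties.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with no ties.
revision_hours = [4, 11, 27, 53, 8, 30]
exam_score = [40, 65, 80, 85, 35, 75]

rank_x = average_ranks(revision_hours)
rank_y = average_ranks(exam_score)

# Route 1: Pearson's correlation applied to the ranks.
rs_from_ranks = pearson_r(rank_x, rank_y)

# Route 2: the rank-difference formula.
n = len(rank_x)
sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
rs_from_formula = 1 - 6 * sum_d_squared / (n ** 3 - n)

print(rs_from_ranks, rs_from_formula)
```

When ties are present, ranking and then applying Pearson’s correlation remains well defined, whereas the shortcut formula becomes an approximation.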

2. Kendall’s Tau (τ): A non-parametric correlation coefficient that can be used in place of Spearman’s coefficient when one has a small data set with a large number of tied ranks.

Correlations (Kendall's tau_b)

| | | Time Spent Revising | Exam Performance (%) | Exam Anxiety |
|---|---|---|---|---|
| Time Spent Revising | Correlation Coefficient | 1.000 | .263** | -.489** |
| | Sig. (2-tailed) | . | .000 | .000 |
| | N | 103 | 103 | 103 |
| Exam Performance (%) | Correlation Coefficient | .263** | 1.000 | -.285** |
| | Sig. (2-tailed) | .000 | . | .000 |
| | N | 103 | 103 | 103 |
| Exam Anxiety | Correlation Coefficient | -.489** | -.285** | 1.000 |
| | Sig. (2-tailed) | .000 | .000 | . |
| | N | 103 | 103 | 103 |

**. Correlation is significant at the 0.01 level (2-tailed).
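The SPSS output labels this coefficient tau_b, the variant that adjusts for ties. A minimal sketch of how tau_b handles tied ranks follows; the pair-counting implementation and the small data set are illustrative assumptions, not taken from the presentation.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b computed by comparing every pair of cases."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: excluded from every count
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1  # both variables order the pair the same way
        else:
            discordant += 1  # the variables order the pair in opposite ways
    untied_pairs = concordant + discordant
    denominator = math.sqrt(
        (untied_pairs + ties_x_only) * (untied_pairs + ties_y_only)
    )
    return (concordant - discordant) / denominator


# Hypothetical small sample with tied values on both variables.
anxiety = [1, 2, 2, 3, 4]
score = [5, 4, 4, 2, 2]

tau = kendall_tau_b(anxiety, score)
print(round(tau, 4))
```

Pairs tied on only one variable enlarge the denominator, which keeps the coefficient from reaching -1 or +1 unless the ordering is perfect on both variables.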

Comparison between Pearson, Spearman and Kendall’s τ

| | Pearson | Spearman | Kendall |
|---|---|---|---|
| Rev. Time vs. Performance | +.397** | +.350** | +.263** |
| Anxiety vs. Performance | -.441** | -.405** | -.285** |
| Rev. Time vs. Anxiety | -.709** | -.622** | -.489** |

Thus, selection of the appropriate correlation coefficient is important, as it would affect the Coefficient of Determination (i.e., the amount of variance we can explain).

Correlation between a Continuous and a Discrete variable:
• Biserial Correlation: When one variable is a continuous dichotomy, e.g. passing or failing an exam.
• Point-Biserial Correlation: When one variable is a discrete dichotomy, e.g. pregnancy.

Correlations

| | | Exam Performance (%) | Gender |
|---|---|---|---|
| Exam Performance (%) | Pearson Correlation | 1 | -.005 |
| | Sig. (2-tailed) | | .963 |
| | N | 103 | 103 |
| Gender | Pearson Correlation | -.005 | 1 |
| | Sig. (2-tailed) | .963 | |
| | N | 103 | 103 |

The sign of the correlation coefficient becomes irrelevant in the case of Point-Biserial Correlation. It depends on the way the variables are categorized. If we reverse the coding, the sign of the Point-Biserial Correlation coefficient would also be reversed.
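The point about coding can be illustrated with a short sketch: the point-biserial correlation is Pearson’s r computed with a 0/1 variable, so swapping the codes flips the sign and leaves the size unchanged. The scores and the 0/1 coding below are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from sums of squares and cross-products."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical exam scores; gender coded 0 = male, 1 = female (an assumed coding).
exam_score = [55, 70, 62, 80, 45, 66]
gender_coded = [0, 0, 0, 1, 1, 1]

# Reverse the coding: 1 = male, 0 = female.
gender_recoded = [1 - g for g in gender_coded]

r_original = pearson_r(gender_coded, exam_score)
r_reversed = pearson_r(gender_recoded, exam_score)

print(round(r_original, 4), round(r_reversed, 4))
```

Only the magnitude of the coefficient carries information here; the sign reflects which category was given the larger code.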

Partial Correlation: In our data set, Exam Performance is negatively related to Exam Anxiety but positively related to Revision Time. Let’s assume that Anxiety explains x% of the variance in Exam Performance and Revision Time explains y% of the variance in Exam Performance, while Revision Time also explains z% of the variation in Anxiety. So, all three variables are interrelated.

[Figure: Diagram relating Exam Performance, Revision Time and Exam Anxiety, with regions labelled ‘Variance unique to Revision Time’, ‘Variance unique to Exam Anxiety’ and ‘Variance Explained by both Exam Anxiety and Revision Time’.]

Now, if we want to find out the unique portions of variance, then we need to do a partial correlation.

A correlation between two variables in which the effects of other variables are held constant is known as Partial Correlation (Field, 2005).

Analyze → Correlate → Partial → Transfer variables of interest to ‘Variables’ box → Transfer the Control variable to ‘Controlling for’ box → OK

Correlations

| Control Variables | | | Exam Performance (%) | Exam Anxiety |
|---|---|---|---|---|
| Time Spent Revising | Exam Performance (%) | Correlation | 1.000 | -.247 |
| | | Significance (2-tailed) | . | .012 |
| | | df | 0 | 100 |
| | Exam Anxiety | Correlation | -.247 | 1.000 |
| | | Significance (2-tailed) | .012 | . |
| | | df | 100 | 0 |

The Partial Correlation Coefficient between Exam Performance and Exam Anxiety is statistically significant, but the variance explained has been reduced.

Now, R2 for the unique r = (-.247)2 = 0.061009, or in percentage: 6.1009%. Thus, the unique variance in Exam Performance explained by Exam Anxiety is only 6.1%.

First Order Correlation: When only one variable is controlled.

Second Order Correlation: When two variables are controlled, and so on.
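A first-order partial correlation can also be obtained from the three bivariate correlations. The sketch below uses the standard first-order formula, which is not stated in the slides, and feeds it the three Pearson coefficients reported earlier in this presentation. Because those inputs are rounded to three decimals, the result only approximates the SPSS value.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant (first order)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Pearson coefficients reported earlier in the presentation.
r_performance_anxiety = -0.441
r_performance_revision = 0.397
r_anxiety_revision = -0.709

# Exam Performance vs. Exam Anxiety, controlling for Time Spent Revising.
r_partial = partial_r(
    r_performance_anxiety, r_performance_revision, r_anxiety_revision
)
unique_variance_percent = 100 * r_partial ** 2

print(f"partial r = {r_partial:.4f}")
print(f"unique variance explained = {unique_variance_percent:.1f}%")
```

The result should land close to the -.247 in the SPSS table, with the small difference attributable to rounding of the inputs.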