Correlation Analysis

(Testing Whether One Variable Is Correlated with Another Variable)

Introduction

• Correlation analysis is common in research.

• For example, previous studies have used correlation analysis to investigate the relationship between:

– Motivation and academic success
– High school grades and college academic performance
– Teacher quality and student success
– Crime rate and employment rate
– Product quality and customer satisfaction
– Salary and job satisfaction

Curriculum Instruction Educational Technology Research

• Analysis of the relationships among variables is called correlation analysis. It includes measuring the correlation between the variables.

• Correlation analysis is concerned only with the strength and direction (positive or negative) of the relationship; no causal effect is implied.

• A scatter plot (or scatter diagram) can be used to show the relationship between two numerical variables.

• The correlation coefficient measures the direction and strength of the linear relationship between variables.

Pearson Correlation: Formula

$$ r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}} $$

where X and Y are the variables under consideration. (The Pearson correlation is also called "Pearson r" or simply "r".)
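For sample data, the Pearson formula (the covariance of X and Y divided by the square root of the product of their variances) can be written out as sums over the n paired observations. The expansion below is a sketch that assumes the usual sample estimates; the symbols x_i, y_i, \bar{x}, and \bar{y} (individual observations and sample means) are notation introduced here, not taken from the slides. The common 1/(n-1) factors in the covariance and the variances cancel.

```latex
r = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```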

The Values of Pearson r
• Unit free
• Pearson r ranges from -1 to +1.
• Positive and negative signs indicate the direction of the relationship.

Positive r
• The value of r is positive if the values of X and Y move in the same direction.
Example: X = Score in BEE Math, Y = Score in BEE English
[Figure: Scatter plot showing the relationship between X and Y]

Negative r
• The value of r is negative if the values of X and Y move in opposite directions.
Example: X = Employment rate, Y = Crime rate
[Figure: Scatter plot showing the relationship between X and Y]

Absolute value of r
The absolute value of r indicates the strength of the linear relationship.
– The closer to 1.0, the stronger the linear relationship.
– The closer to zero, the weaker the linear relationship.

Sample Data with r = 1.0

Data set A
X    Y
1    2
2    4
3    6
4    8
5    10

[Figure: Scatter plot of Data set A with its estimated line (r = +1.0)]

Sample Data with r close to 1.0

Data set B
X    Y
1    2.0
2    3.5
3    5.0
4    7.5
5    9.5

[Figure: Scatter plot of Data set B with its estimated line (r = +0.99)]

Sample Data with r = -1.0

Data set C
X    Y
1    10
2    8
3    6
4    4
5    2

[Figure: Scatter plot of Data set C with its estimated line (r = -1.0)]

Sample Data with r close to -1.0

Data set D
X    Y
1    9
2    6.5
3    6.9
4    3
5    1

[Figure: Scatter plot of Data set D with its estimated line (r = -0.96)]

Sample Data with r close to 0

Data set E
X    Y
1    1
2    10
3    7
4    9
5    2

[Figure: Scatter plot of Data set E with its estimated line (r = 0.04)]
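The r values quoted for Data sets A to E can be checked by applying the Pearson formula directly. The sketch below is one minimal way to do this in plain Python; the function name `pearson_r` and the dictionary `DATA_SETS` are illustrative names, not part of the original slides, and the X and Y values are read from the five sample data sets above.

```python
import math


def pearson_r(xs, ys):
    """Pearson r = Cov(X, Y) / sqrt(Var(X) * Var(Y)) for paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares; the 1/(n - 1) factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)


# X is the same in all five sample data sets; only Y differs.
X = [1, 2, 3, 4, 5]
DATA_SETS = {
    "A": [2, 4, 6, 8, 10],
    "B": [2.0, 3.5, 5.0, 7.5, 9.5],
    "C": [10, 8, 6, 4, 2],
    "D": [9, 6.5, 6.9, 3, 1],
    "E": [1, 10, 7, 9, 2],
}

for name, ys in DATA_SETS.items():
    print(f"Data set {name}: r = {pearson_r(X, ys):+.2f}")
```

Rounded to two decimals, the results should agree with the r values shown on the slides (+1.00, +0.99, -1.00, -0.96, and +0.04).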

More on Scatter Plots
[Figure: Five scatter plots of Y against X illustrating r = -1, r = -.6, r = 0, r = +1, and r = +.3]

Pearson r: Guide to Interpretation (Note: rule of thumb only)

Absolute value of r    Interpretation
.70 or higher          Very strong
.40 to .69             Strong
.30 to .39             Moderate
.20 to .29             Weak
.19 and below          Negligible

Source: http://faculty.quinnipiac.edu/libarts/polsci/statistics.html
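The rule-of-thumb guide can be expressed as a small lookup. The sketch below uses a hypothetical helper named `interpret_r`; it assumes each band starts at its stated lower bound, so a value that falls between two printed bands (for example .695) is assigned to the lower one. Like the guide itself, this is only a rough labeling aid.

```python
def interpret_r(r):
    """Label the strength of a correlation using the rule-of-thumb bands."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends on the absolute value, not the sign
    if size >= 0.70:
        return "Very strong"
    if size >= 0.40:
        return "Strong"
    if size >= 0.30:
        return "Moderate"
    if size >= 0.20:
        return "Weak"
    return "Negligible"


for r in (0.99, -0.96, 0.45, -0.25, 0.04):
    print(f"r = {r:+.2f}: {interpret_r(r)}")
```

The sign of r is ignored here because it describes direction, not strength.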

When performing a correlation analysis, you are testing the null hypothesis "Ho: There is no relationship between the two variables (or, the two variables are not correlated)" against the alternative hypothesis "Ha: There is a relationship between the variables (or, the two variables are correlated)".

DECISION RULE:
• Reject Ho if the p-value associated with the computed Pearson r is less than .05.
• Do not reject Ho if the p-value associated with the computed Pearson r is equal to or greater than .05.

Testing Statistical Significance
• Researcher's goal: Investigate whether the variables are correlated in the population, using sample data drawn randomly from that population.
• Testing for statistical significance: Because the researcher makes inferences about the correlation in the population based only on information contained in the sample data, statistical significance should be tested. A statistically significant finding (p < α, traditionally α = .05) is one that would be very unlikely to occur by chance alone if the null hypothesis were true. Suppose r = .4 with p = .02. This means that, if the variables were truly uncorrelated in the population, a sample correlation at least this strong would be expected only about 2% of the time. Because p is less than .05, the researcher should reject the null hypothesis that the variables are not correlated and conclude that the variables are significantly correlated.
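The p-value attached to a Pearson r is commonly obtained from the statistic t = r·sqrt((n − 2)/(1 − r²)), which follows a t distribution with n − 2 degrees of freedom when the null hypothesis is true. The sketch below assumes that approach and uses only the Python standard library, integrating the t density numerically with Simpson's rule; the function names `t_pdf`, `two_tailed_p`, and `decide` are illustrative, and a statistics package would normally be used instead. The two test cases are the Price Level/Price Flexibility pair (r = -.186, Sig. = .064) and the Product Quality/Salesforce Image pair (r = .200, Sig. = .046) from the SPSS output in Example 1, both with n = 100.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log(1 + t * t / df))


def two_tailed_p(r, n, steps=2000):
    """Two-tailed p-value for Ho: no correlation, given sample r and size n."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    df = n - 2
    t = abs(r) * math.sqrt(df / (1 - r * r))
    # Simpson's rule (steps must be even) for the area under the
    # t density between 0 and |t|.
    h = t / steps
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    # The density is symmetric, so both tails together hold 1 - 2 * area.
    return max(0.0, 1 - 2 * area)


def decide(p, alpha=0.05):
    """Apply the decision rule: reject Ho only when p < alpha."""
    return "Reject Ho" if p < alpha else "Do not reject Ho"


for r, n in ((-0.186, 100), (0.200, 100)):
    p = two_tailed_p(r, n)
    print(f"r = {r:+.3f}, n = {n}: p = {p:.3f} -> {decide(p)}")
```

Under these assumptions the computed p-values should land close to the Sig. (2-tailed) values reported by SPSS, so the first pair is not significant at α = .05 and the second is.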

Example 1: Correlation among product quality, price, and company image
Research Question: Are there significant relationships among product quality, price, and company image?
Hypothesis (Null form): There are no significant relationships among product quality, price, and company image.

Data set: hatco.sav
Variables:
• Price Level (X2): perceived level of price charged by product suppliers
• Price Flexibility (X3): perceived willingness of representatives to negotiate price on all types of purchases
• Product Quality (X7): perceived level of quality of a particular product
• Manufacturer Image (X4): overall image of the manufacturer/supplier
• Salesforce Image (X6): overall image of the manufacturer's sales force
Notes: Data are the perceptions of n = 100 customers. Perceptions were measured using a graphic rating scale, where a ten-centimeter line was drawn between endpoints labeled "Poor" and "Excellent".

Step-by-Step Procedure in SPSS
Step 1: CLICK Analyze > Correlate > Bivariate
Step 2: MOVE the following to the "Variables:" box:
– Price Level (X2)
– Price Flexibility (X3)
– Product Quality (X7)
– Manufacturer's Image (X4)
– Salesforce Image (X6)
Step 3: CLICK OK.

SPSS Outputs

Correlations

                                          Price    Price        Product   Manufacturer  Salesforce
                                          Level    Flexibility  Quality   Image         Image
Price Level         Pearson Correlation   1        -.186        -.448**   -.470**       .788**
                    Sig. (2-tailed)       .        .064         .000      .000          .000
                    N                     100      100          100       100           100
Price Flexibility   Pearson Correlation   -.186    1            .116      -.272**       .487**
                    Sig. (2-tailed)       .064     .            .250      .006          .000
                    N                     100      100          100       100           100
Product Quality     Pearson Correlation   -.448**  .116         1         .735          .200*
                    Sig. (2-tailed)       .000     .250         .         .000          .046
                    N                     100      100          100       100           100
Manufacturer Image  Pearson Correlation   -.470**  -.272**      .735      1             .177
                    Sig. (2-tailed)       .000     .006         .000      .             .078
                    N                     100      100          100       100           100
Salesforce Image    Pearson Correlation   .788**   .487**       .200*     .177          1
                    Sig. (2-tailed)       .000     .000         .046      .078          .
                    N                     100      100          100       100           100

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Summary of SPSS Outputs

Table_ Correlation Matrix of Product Quality, Price, and Company Image

                         1        2        3        4        5
1. Price Level           1        -.186    -.448**  -.470**  .788**
2. Price Flexibility     -.186    1        .116     -.272**  .487**
3. Product Quality       -.448**  .116     1        .735**   .200*
4. Manufacturer Image    -.470**  -.272**  .735**   1        .177
5. Salesforce Image      .788**   .487**   .200*    .177     1

** p < .01. * p < .05. n = 100

