
UNIVERSITY OF NEGROS OCCIDENTAL - RECOLETOS

GRADUATE SCHOOL

First Semester 2022 - 2023


Chapter 3. Correlation and Regression

According to Surbhi (2017), correlation and regression are two analyses based on the multivariate distribution, which is described as a distribution of multiple variables. Correlation is the analysis that tells us whether an association exists between two variables ‘x’ and ‘y’. On the other hand, regression analysis predicts the value of the dependent variable based on the known value of the independent variable, assuming an average mathematical relationship between two or more variables.

Statistical correlation is measured by what is called the coefficient of correlation (r). Its numerical value ranges from +1.0 to -1.0. It gives us an indication of both the strength and direction of the relationship between variables (Wilson, 2019).

Moreover, r > 0 indicates a positive relationship, r < 0 indicates a negative relationship, and r = 0 indicates no relationship (or that the variables are independent of each other and not related). Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation. In addition, the closer the coefficients are to +1.0 and -1.0, the greater the strength of the relationship between the variables.

According to Cohen et al. (2002), the correlation between two variables can be strong or weak. The value of the correlation coefficient indicates the amount of variability shared between the two variables and the strength of the relationship, where the correlation coefficient always falls between -1 and +1.

There are no absolute criteria for interpreting the strength of the relationship based on the value of the correlation coefficient, according to Salkind (2008). Table 1 shows the rule of thumb generally accepted by social science researchers in interpreting the correlation coefficient.
Table 1

Correlation Coefficient    General interpretation of the strength of relationship

± 0.8 to ± 1.0             Very strong / Very high (± 1 = perfect relationship)
± 0.6 to ± 0.79            Strong / High
± 0.4 to ± 0.59            Moderate / Average
± 0.2 to ± 0.39            Weak / Low
0 to ± 0.2                 Very weak / Very low (0 = no relationship)
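Table 1 can be expressed as a small helper function, which is convenient when labeling many coefficients at once. This is a minimal sketch in Python (the function name is illustrative; since the bands in Table 1 overlap slightly at the cut-offs, the sketch assigns each boundary value to the stronger band):

```python
def interpret_r(r: float) -> str:
    """Return Salkind's (2008) rule-of-thumb label for a correlation coefficient."""
    size = abs(r)  # the sign gives direction; strength depends on magnitude only
    if size >= 0.8:
        return "very strong/very high"
    if size >= 0.6:
        return "strong/high"
    if size >= 0.4:
        return "moderate/average"
    if size >= 0.2:
        return "weak/low"
    return "very weak/very low"

print(interpret_r(0.908))   # very strong/very high
print(interpret_r(-0.45))   # moderate/average
```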

Test of Correlation/Relationship

Pearson’s correlation coefficient (Pearson’s r) is the parametric test statistic that measures the statistical relationship, or association, between two continuous variables. It is regarded as the best method of measuring the association between variables of interest because it is based on the method of covariance. It gives information about the magnitude of the association, or correlation, as well as the direction of the relationship.
To use Pearson correlation, your data must meet the following requirements:
1. Two or more continuous variables (i.e., interval or ratio level)
2. Cases must have non-missing values on both variables
3. Linear relationship between the variables
4. Independent cases (i.e., independence of observations)
5. Normality
6. Random sample of data from the population
7. No outliers

What values can the Pearson correlation coefficient take?


The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there
is no association between the two variables. A value greater than 0 indicates a positive association; that is, as the value
of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association;
that is, as the value of one variable increases, the value of the other variable decreases. This is shown in the diagram
below:

Example:
An analyst is studying the relationship between shopping-center traffic and a department store’s daily sales. The analyst develops an index to measure the daily volume of traffic entering the shopping center, and an index of daily sales, for 10 randomly selected days. Using the 0.05 level of significance, is there a significant relationship between shopping-center traffic and the department store’s daily sales?

Traffic index (X) Sales index (Y)

71 250

82 280

111 301

85 325

89 328

110 390

111 410

121 420

129 450

132 475

Using the 5 steps of hypothesis testing:

1. State your null and alternative hypothesis


Ho: There is no significant relationship between shopping-center traffic and the department store’s daily sales.
H1: There is a significant relationship between shopping-center traffic and the department store’s daily sales.

2. Level of significance.
α = 0.05

3. Statistical Tool
Pearson’s - r

4. Computation

Using JASP

1. Open the data

2. Click Regression and select Correlation

3. Direct Traffic index and Sales index to Variables, and go to Results

Pearson's Correlations

Variable                        Traffic index   Sales index
1. Traffic index  Pearson's r   —
                  p-value       —
2. Sales index    Pearson's r   0.908           —
                  p-value       < .001          —

Note: Before interpreting the result, first determine the df (degrees of freedom). The degrees of freedom are the difference between n (the number of participants) and 2 (the independent and dependent variables). Since n = 10, df = 10 - 2 = 8.

5. Making decision and conclusion

Based on the results, there was a very high positive relationship between the traffic index and the sales index [r(8) = 0.908, p < 0.001] at the 0.05 level of significance. This implies that as the traffic index increases, the sales index also increases.
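The r value that JASP reports can also be reproduced outside the software. Below is a minimal sketch in Python (standard library only; variable names are illustrative) that applies the definitional formula of Pearson's r to the ten paired observations:

```python
import math

traffic = [71, 82, 111, 85, 89, 110, 111, 121, 129, 132]
sales = [250, 280, 301, 325, 328, 390, 410, 420, 450, 475]

def pearson_r(x, y):
    """Pearson's r: shared variability of x and y over the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-deviation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # deviation of x alone
    syy = sum((b - my) ** 2 for b in y)                   # deviation of y alone
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(traffic, sales)
print(round(r, 3))  # 0.908, matching the JASP output
```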

Activity:

With the growth of internet service providers, a researcher decides to examine whether there is a relationship between the cost of internet service per month and the degree of customer satisfaction (on a scale of 1-10, with 1 being not at all satisfied and 10 being extremely satisfied). The researcher only includes programs with comparable types of services. A sample of the data is provided below.

Pesos Satisfaction

1100 6

1800 8

1700 10

1500 4

900 9

500 6

1200 3

1900 5

2200 2

2500 10

Can we conclude that there is a relationship between the amount of money spent per month on internet provider service and the level of customer satisfaction? (use 0.05 level of significance)

Linear Regression

Statistical regression estimates relationships between independent variables and dependent variables.
Furthermore, regression models can be used to help understand and explain relationships among variables; they can
also be used to predict actual outcomes, according to Pardoe (2019).

An example of statistical regression is linear regression. According to Prabhakaran (2017), it is used to predict the value of an outcome variable Y based on one or more input predictor variables X. The aim is to establish a linear relationship (a mathematical formula) between the predictor variable(s) and the response variable, so that we can use this formula to estimate the value of the response Y when only the predictors’ (X’s) values are known.
The basic equation for the regression line is (Nishishiba, 2014):

Y = a + bX

Where: Y = the dependent variable (value to be predicted)
X = the independent variable (predictor)
a = the point where the regression line crosses the Y axis, called the intercept
b = the slope of the regression line, indicating the strength of the relationship between X and Y (regression coefficient)

Frost (2017) indicated that regression analysis produces a regression equation in which the coefficients represent the relationship between each independent variable and the dependent variable. This equation is used to make predictions.

Example:
An analyst wants to determine if shopping-center traffic can predict the department store’s daily sales. The analyst develops an index to measure the daily volume of traffic entering the shopping center, and an index of daily sales, for 10 randomly selected days. Using the 0.05 level of significance, can the shopping-center traffic index predict the department store’s daily sales?

Using the 5 steps of hypothesis testing

1. State null and alternative hypothesis


Ho: The shopping-center traffic index cannot predict the department store’s daily sales.
H1: The shopping-center traffic index can predict the department store’s daily sales.

2. Level of significance
α = 0.05

3. Statistical Tool
Linear Regression

4. Computation

Using JASP

1. Open data

2. Click Regression and select Linear Regression

3. Direct Traffic index to Covariates and Sales index to Dependent Variable, and go to Results
Model Summary - Sales index
Model R R² Adjusted R² RMSE
H₀ 0.000 0.000 0.000 76.355
H₁ 0.908 0.824 0.802 33.937

This table provides the R and R² values. The R value represents the simple correlation and is 0.908 (the "R" column), which indicates a high degree of correlation. The R² value (the "R²" column) indicates how much of the total variation in the dependent variable, sales, can be explained by the independent variable, traffic. In this case, 82.4% can be explained, which is very large.

ANOVA

Model          Sum of Squares   df   Mean Square   F        p
H₁ Regression  43257.398         1   43257.398     37.560   < .001
   Residual     9213.502         8    1151.688
   Total       52470.900         9

Note.  The intercept model is omitted, as no meaningful information can be shown.

This table indicates that the regression model predicts the dependent variable significantly well. How do we know this? Look at the "Regression" row and go to the "p" column. This indicates the statistical significance of the regression model that was run. Here, p < 0.001, which is less than 0.05, and indicates that, overall, the regression model statistically significantly predicts the outcome variable (i.e., it is a good fit for the data).
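The entries of the ANOVA table are tied together by simple arithmetic: the total sum of squares is the regression plus residual sums of squares, R² is the regression share of the total, and F is the ratio of the two mean squares. A quick sketch (values copied from the table above) verifies this:

```python
# Sums of squares and degrees of freedom from the JASP ANOVA table
ss_reg, ss_res = 43257.398, 9213.502
df_reg, df_res = 1, 8

ss_total = ss_reg + ss_res                  # 52470.900, the "Total" row
r_squared = ss_reg / ss_total               # proportion of variation explained
f = (ss_reg / df_reg) / (ss_res / df_res)   # ratio of the mean squares

print(round(r_squared, 3), round(f, 2))  # 0.824 37.56
```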

The Coefficients table provides us with the necessary information to predict sales from traffic, as well as to determine whether traffic contributes statistically significantly to the model (by looking at the "p" column). Furthermore, we can use the values in the "Unstandardized" column to build the regression equation.

Coefficients

Model               Unstandardized   Standard Error   Standardized   t        p
H₀  (Intercept)     362.900          24.146                          15.030   < .001
H₁  (Intercept)      20.175          56.942                           0.354   0.732
    Traffic index     3.292           0.537           0.908           6.129   < .001

5. Making decision and conclusion

Results show that the shopping-center traffic index can predict the department store’s daily sales [F(1, 8) = 37.560, p < 0.001, R = 0.908, R2 = 0.824] at the 0.05 level of significance, with a linear equation of Sales = 20.175 + 3.292 (Traffic).

This implies that an increase in the shopping-center traffic index anticipates an increase in the department store’s daily sales.


The regression equation is presented as:
Sales = 20.175 + 3.292 (Traffic)

Example:
1. If the traffic index (X) = 60, what is the predicted daily sales?

Then, Sales = 20.175 + 3.292 (60) = 217.695 or 218

2. If the traffic index (X) = 150, what is the predicted daily sales?

Then, Sales = 20.175 + 3.292 (150) = 513.975 or 514
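The intercept and slope themselves can be recovered from the raw data with the least-squares formulas b = Sxy / Sxx and a = mean(Y) - b * mean(X). Below is a minimal sketch in Python (standard library only; names are illustrative) that fits the line and applies it to a new traffic value:

```python
traffic = [71, 82, 111, 85, 89, 110, 111, 121, 129, 132]
sales = [250, 280, 301, 325, 328, 390, 410, 420, 450, 475]

def fit_line(x, y):
    """Least-squares estimates: slope b = Sxy / Sxx, intercept a = mean(y) - b * mean(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

a, b = fit_line(traffic, sales)
print(round(a, 3), round(b, 3))  # 20.175 3.292, matching the JASP coefficients
print(round(a + b * 60))         # 218, predicted sales at traffic index 60
```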

Activity 3:
Given: The data of the 10 permanent government employees who have been infected with COVID-19 and sent to a quarantine facility for health reasons, together with their levels of depression and self-esteem after infection.
Depression Self-Esteem
10 104
12 100
19 98
4 150
25 75
15 105
21 82
7 133

a. Determine if there is a significant relationship between the level of depression and the self-esteem of the employees.
b. Determine if the level of depression can predict the level of self-esteem of the employees. What will be the model equation of the given data?

Multiple Regression

Multiple regression is a statistical technique that can be used to analyze the relationship between a single dependent variable and several independent variables. The objective of multiple regression analysis is to use the independent variables whose values are known to predict the value of the single dependent variable. Each predictor value is weighted, the weights denoting its relative contribution to the overall prediction.

Y = a + b1X1 + b2X2 + ... + bnXn

The difference between Linear and Multiple Regression

The analyst utilizes linear regression to explain the change in a dependent variable using only one independent variable, while in multiple regression the analyst attempts to explain a dependent variable using more than one independent variable.

A multiple regression considers the effect of more than one explanatory variable on some outcome of interest. It evaluates the relative effect of these explanatory, or independent, variables on the dependent variable while holding all the other variables in the model constant.

Example: An industrial psychologist conducts a study to examine those variables thought to be related to on-the-job performance of technical employees. A random sample of 15 employees gives the following results.

Performance Ratings Job Aptitude Test In-service Training Units

54 15 8

37 13 1
30 15 1

48 15 7

37 10 4

37 14 2

31 8 3

49 12 7

43 10 9

12 3 1

30 15 1

37 14 2

61 14 10

31 9 1

31 4 5

Do the job aptitude test results and in-service training units earned predict the job performance of the employees? (use 0.05 level of significance)

Using the 5 steps of hypothesis testing

1. State the null and alternative hypothesis


Ho: The job aptitude test results and in-service training units earned cannot predict the job performance of the employees.
H1: The job aptitude test results and in-service training units earned can predict the job performance of the employees.

2. Level of significance
α = 0.05

3. Statistical Tool
Multiple Regression

4. Computation

Note: Establish first the relationship of job performance to job aptitude and in-service training units.

Using JASP
1. Open the data

2. Select Regression and click Correlation


3. Direct performance ratings, job aptitude test and in-service training units to Variables. Then go to the results.

Results
Pearson's Correlations

Variable                                    Performance Ratings   Job Aptitude Test   In-service Training Units
1. Performance Ratings       Pearson's r   —
                             p-value       —
2. Job Aptitude Test         Pearson's r   0.604                 —
                             p-value       0.017                 —
3. In-service Training Units Pearson's r   0.815                 0.149               —
                             p-value       < .001                0.595               —

Results show that the employees’ job performance has a high positive relationship with their job aptitude test [r(13) = 0.604, p = 0.017] and a very high positive relationship with the in-service training units earned [r(13) = 0.815, p < 0.001] at the 0.05 level of significance.

Since the employees’ job performance has a relationship with the job aptitude test and in-service training units earned, we can now employ multiple regression to determine if the job aptitude test and in-service training units earned predict the performance ratings of the employees.

4. Select Regression and click Linear Regression


5. Direct performance ratings to Dependent Variable and, job aptitude test and in-service training units to Covariates.
Then go to results.

6. Results

Model Summary - Performance Ratings


Model R R² Adjusted R² RMSE
H₀ 0.000 0.000 0.000 11.910
H₁ 0.950 0.902 0.886 4.028

ANOVA

Model          Sum of Squares   df   Mean Square   F        p
H₁ Regression  1791.079          2   895.539       55.208   < .001
   Residual     194.654         12    16.221
   Total       1985.733         14

Note.  The intercept model is omitted, as no meaningful information can be shown.

Coefficients

Model                            Unstandardized   Standard Error   Standardized   t        p
H₀  (Intercept)                  37.867           3.075                           12.314   < .001
H₁  (Intercept)                   9.870           3.380                            2.920   0.013
    Job Aptitude Test             1.477           0.274            0.494          5.399   < .001
    In-service Training Units     2.699           0.333            0.741          8.107   < .001

5. Making decision and conclusion

Based on the results, the job aptitude test and in-service training units earned predict the job performance of the employees [F(2, 12) = 55.208, p < 0.001, R = 0.950, R2 = 0.902] at the 0.05 level of significance, with a model equation of:

Job Performance = 9.870 + 1.477 (X1) + 2.699 (X2)

This implies that as the job aptitude test results rise and the employees earn additional in-service training units, their job performance increases.
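For two predictors, the multiple regression coefficients can be obtained by solving the normal equations directly. The sketch below (Python, standard library only; function and variable names are illustrative) reproduces the JASP coefficients from the raw data:

```python
performance = [54, 37, 30, 48, 37, 37, 31, 49, 43, 12, 30, 37, 61, 31, 31]
aptitude    = [15, 13, 15, 15, 10, 14,  8, 12, 10,  3, 15, 14, 14,  9,  4]
training    = [ 8,  1,  1,  7,  4,  2,  3,  7,  9,  1,  1,  2, 10,  1,  5]

def fit_two_predictors(y, x1, x2):
    """Solve the normal equations for Y = a + b1*X1 + b2*X2 (two predictors)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Centered sums of squares and cross-products
    s11 = sum((v - m1) ** 2 for v in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (v - my) for u, v in zip(x1, y))
    s2y = sum((u - m2) * (v - my) for u, v in zip(x2, y))
    # Cramer's rule on the 2x2 system: s11*b1 + s12*b2 = s1y, s12*b1 + s22*b2 = s2y
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

a, b1, b2 = fit_two_predictors(performance, aptitude, training)
print(round(a, 2), round(b1, 3), round(b2, 3))  # 9.87 1.477 2.699
```

The same approach generalizes to more predictors, but the system then grows beyond 2x2 and is more practically solved with matrix routines.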

Activity:

Determine if the level of preparedness and level of awareness of the participants towards natural disasters predict their level of resiliency. Refer to your data in ACTIVITY 1 and use the 0.05 level of significance.

Reference:

Cohen, J., et al. (2002). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.

Daniel, W. and Terrel, J. (1986). Business Statistics (Basic Concepts and Methodology) (4th ed.). Boston: Houghton Mifflin Company.

Nishishiba, M. (2014). Research Methods and Statistics for Public and Nonprofit Administrators. Sage.

Pardoe, I. (2019). Regression. https://www.statistics.com/courses/regression-analysis/. Retrieved March 13, 2020.

Prabhakaran, S. (2017). Linear Regression. http://r-statistics.co/Linear-Regression.html. Retrieved March 13, 2020.

Salkind, N.J. (2008). Statistics for people who (think they) hate statistics. Thousand Oaks, CA: Sage.

Surbhi, S. (2017). Difference Between Correlation and Regression. https://keydifferences.com/difference-between-correlation-and-regression.html. Retrieved March 13, 2020.

Wilson, L. (2019). Statistical Correlation. https://explorable.com/statistical-correlation. Retrieved March 13, 2020.
