According to Surbhi (2017), correlation and regression are two analyses based on a multivariate
distribution, that is, a distribution of multiple variables. Correlation is the analysis that tells us whether an association
exists between two variables ‘x’ and ‘y’. Regression analysis, on the other hand, predicts the value of the dependent
variable based on the known value of the independent variable, assuming an average mathematical relationship between
two or more variables.
According to Salkind (2008), there are no absolute criteria for interpreting the strength of a relationship based
on the value of the correlation coefficient. The rule of thumb shown in Table 1, however, is generally accepted by social
science researchers when interpreting correlation coefficients.
Table 1
Correlation Coefficient    General interpretation of the strength of relationship
Test of Correlation/Relationship
Pearson’s correlation coefficient (Pearson’s r) is a parametric test statistic that measures the statistical
relationship, or association, between two continuous variables. It is regarded as the best method of measuring the
association between variables of interest because it is based on the method of covariance. It gives information about
the magnitude of the association, or correlation, as well as the direction of the relationship.
To use Pearson correlation, your data must meet the following requirements:
1. Two or more continuous variables (i.e., interval or ratio level)
2. Cases must have non-missing values on both variables
3. Linear relationship between the variables
4. Independent cases (i.e., independence of observations)
5. Normality
6. Random sample of data from the population
7. No outliers
Example:
An analyst is studying the relationship between shopping-center traffic and a department store’s daily sales. The
analyst develops an index to measure the daily volume of traffic entering the shopping center, and an index of daily sales
for 10 randomly selected days. Using a 0.05 level of significance, is there a significant relationship between shopping-center
traffic and the department store’s daily sales?
Traffic index    Sales index
71               250
82               280
111              301
85               325
89               328
110              390
111              410
121              420
129              450
132              475
1. Hypotheses
H₀: There is no significant relationship between shopping-center traffic and the department store’s daily sales.
H₁: There is a significant relationship between shopping-center traffic and the department store’s daily sales.
2. Level of significance
α = 0.05
3. Statistical Tool
Pearson’s - r
4. Computation
Using JASP
Pearson's Correlations
Variable                        Traffic index    Sales index
1. Traffic index  Pearson's r   —
                  p-value       —
2. Sales index    Pearson's r   0.908            —
                  p-value       < .001           —
Note: Before interpreting the result, first determine the df (degrees of freedom). The degrees of freedom equal n (the
number of participants) minus 2 (for the independent and dependent variables). Since n = 10, then 10 − 2 = 8.
Based on the result, there was a very high positive relationship between the Traffic index and the Sales index [r(8) =
0.908, p < 0.001] at the 0.05 level of significance. This implies that as the traffic index increases, the sales index also
increases.
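The JASP result above can also be verified by hand. The following Python sketch (not part of the original handout, which uses JASP) computes Pearson's r from the covariance-based formula for the ten paired indexes:

```python
import math

# Pearson's r computed from the definitional (covariance-based) formula
# for the traffic/sales example above.
traffic = [71, 82, 111, 85, 89, 110, 111, 121, 129, 132]
sales = [250, 280, 301, 325, 328, 390, 410, 420, 450, 475]

n = len(traffic)
mean_x = sum(traffic) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(traffic, sales))
sxx = sum((x - mean_x) ** 2 for x in traffic)
syy = sum((y - mean_y) ** 2 for y in sales)

r = sxy / math.sqrt(sxx * syy)
df = n - 2  # degrees of freedom for the correlation test

print(f"r({df}) = {r:.3f}")  # r(8) = 0.908, matching the JASP table
```

The printed value matches the JASP output, confirming the hand computation of the degrees of freedom and the coefficient.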
Activity:
With the growth of internet service providers, a researcher decides to examine whether there is a
relationship between cost of internet service per month and degree of customer satisfaction (on a scale of 1 -
10 with a 1 being not at all satisfied and a 10 being extremely satisfied). The researcher only includes programs
with comparable types of services. A sample of the data is provided below.
Pesos Satisfaction
1100 6
1800 8
1700 10
1500 4
900 9
500 6
1200 3
1900 5
2200 2
2500 10
Can we conclude that there is a relationship between the amount of money spent per month on internet provider
service and the level of customer satisfaction? (use a 0.05 level of significance)
Linear Regression
Statistical regression estimates relationships between independent variables and dependent variables.
Furthermore, regression models can be used to help understand and explain relationships among variables; they can
also be used to predict actual outcomes, according to Pardoe (2019).
An example of statistical regression is linear regression. According to Prabhakaran (2017), it is used to predict
the value of an outcome variable Y based on one or more input predictor variables X. The aim is to establish a linear
relationship (a mathematical formula) between the predictor variable(s) and the response variable, so that we can use
this formula to estimate the value of the response Y when only the values of the predictors (X’s) are known.
The basic equation for the regression line is (Nishishiba, 2014):
Y = a + bX
Where: Y = the dependent variable (the value to be predicted)
X = the independent variable (the predictor)
a = the point where the regression line crosses the Y axis, called the intercept
b = the slope of the regression line, indicating the strength of the relationship between X and Y (the regression
coefficient)
Frost (2017) indicated that regression analysis produces a regression equation in which the coefficients represent
the relationship between each independent variable and the dependent variable. This equation is then used to make
predictions.
Example:
An analyst wants to determine whether shopping-center traffic can predict the department store’s daily sales. The
analyst develops an index to measure the daily volume of traffic entering the shopping center, and an index of daily sales
for 10 randomly selected days. Using a 0.05 level of significance, can the shopping-center traffic index predict the
department store’s daily sales?
1. Hypotheses
H₀: The shopping-center traffic index cannot predict the department store’s daily sales.
H₁: The shopping-center traffic index can predict the department store’s daily sales.
2. Level of significance
α = 0.05
3. Statistical Tool
Linear Regression
4. Computation
Using JASP
1. Open data
3. Move Traffic index to Covariates and Sales index to Dependent Variable, then go to Results
Model Summary - Sales index
Model R R² Adjusted R² RMSE
H₀ 0.000 0.000 0.000 76.355
H₁ 0.908 0.824 0.802 33.937
This table provides the R and R² values. The R value represents the simple correlation and is 0.908 (the "R"
column), which indicates a high degree of correlation. The R² value (the "R²" column) indicates how much of the
total variation in the dependent variable, sales, can be explained by the independent variable, traffic. In this case, 82.4%
can be explained, which is very large.
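Because simple linear regression has only one predictor, the model R is just the absolute value of Pearson's r from the earlier correlation analysis, so the R² value can be sanity-checked with one line of arithmetic:

```python
# With a single predictor, R equals |r|, so R-squared is simply r squared.
r = 0.908  # Pearson's r from the correlation analysis above
r_squared = r ** 2
print(round(r_squared, 3))  # 0.824, matching the Model Summary table
```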
ANOVA
Model              Sum of Squares    df    Mean Square    F         p
H₁   Regression    43257.398          1    43257.398      37.560    < .001
     Residual       9213.502          8     1151.688
     Total         52470.900          9
This table indicates that the regression model predicts the dependent variable significantly well. How do we
know this? Look at the "Regression" row and go to the "p" column, which gives the statistical significance of the
regression model that was run. Here, p < 0.001, which is less than 0.05 and indicates that, overall, the regression model
statistically significantly predicts the outcome variable (i.e., it is a good fit for the data).
The Coefficients table provides us with the necessary information to predict sales from traffic, as well as to
determine whether traffic contributes statistically significantly to the model (by looking at the "p" column).
Furthermore, we can use the values in the "Unstandardized" column to build the regression equation.
Coefficients
Model Unstandardized Standard Error Standardized t p
H₀ (Intercept) 362.900 24.146 15.030 < .001
H₁ (Intercept) 20.175 56.942 0.354 0.732
Traffic index 3.292 0.537 0.908 6.129 < .001
Results show that the shopping-center traffic index can predict the department store’s daily sales [F(1, 8) =
37.560, p < 0.001, R = 0.908, R² = 0.824] at the 0.05 level of significance, with a linear equation of Sales = 20.175 + 3.292
(Traffic).
This implies that an increase in the shopping-center traffic index anticipates a corresponding increase in the
department store’s daily sales.
Example:
1. If the traffic index (X) = 60, what are the predicted daily sales?
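The fitted equation and this prediction can be reproduced outside JASP. A minimal Python sketch (an illustration, not the handout's JASP procedure) using the least-squares formulas b = Sxy / Sxx and a = ȳ − b·x̄:

```python
# Least-squares fit of Sales on Traffic, reproducing the JASP coefficients,
# followed by the requested prediction for a traffic index of 60.
traffic = [71, 82, 111, 85, 89, 110, 111, 121, 129, 132]
sales = [250, 280, 301, 325, 328, 390, 410, 420, 450, 475]

n = len(traffic)
mean_x = sum(traffic) / n
mean_y = sum(sales) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(traffic, sales))
sxx = sum((x - mean_x) ** 2 for x in traffic)

b = sxy / sxx            # slope (regression coefficient)
a = mean_y - b * mean_x  # intercept

print(f"Sales = {a:.3f} + {b:.3f}(Traffic)")  # Sales = 20.175 + 3.292(Traffic)
prediction = a + b * 60
print(f"Predicted daily sales at Traffic = 60: {prediction:.2f}")  # about 217.71
```

Using the rounded coefficients from the handout, the same prediction is 20.175 + 3.292(60) ≈ 217.70.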
Activity 3:
Given: Data from the 10 permanent government employees who were infected with COVID-19 and sent to a
quarantine facility for health reasons, together with their levels of depression and self-esteem after infection.
Depression Self-Esteem
10 104
12 100
19 98
4 150
25 75
15 105
21 82
7 133
a. Determine if there is a significant relationship between the employees’ levels of depression and self-esteem.
b. Determine if the level of depression can predict the level of self-esteem of the employees. What will be the model
equation of the given data?
Multiple Regression
Multiple regression is a statistical technique that can be used to analyze the relationship between a single
dependent variable and several independent variables. The objective of multiple regression analysis is to use the
independent variables whose values are known to predict the value of the single dependent variable. Each predictor
value is weighted, with the weights denoting its relative contribution to the overall prediction.
Y = a + b₁X₁ + b₂X₂ + … + bₙXₙ
In linear regression, the analyst explains the change in the dependent variable using only one independent
variable. In multiple regression, the analyst attempts to explain the dependent variable using more than one
independent variable.
A multiple regression considers the effect of more than one explanatory variable on some outcome of interest. It
evaluates the relative effect of these explanatory, or independent, variables on the dependent variable when holding all
the other variables in the model constant.
Example: An industrial psychologist conducts a study to examine those variables thought to be related to on-
the-job performance of technical employees. A random sample of 15 employees gives the following results.
Performance Rating    Job Aptitude Test    In-service Training Units
54                    15                   8
37                    13                   1
30                    15                   1
48                    15                   7
37                    10                   4
37                    14                   2
31                     8                   3
49                    12                   7
43                    10                   9
12                     3                   1
30                    15                   1
37                    14                   2
61                    14                   10
31                     9                   1
31                     4                   5
Do the job aptitude test results and in-service training units earned predict the job performance of the employees?
(use a 0.05 level of significance)
1. Hypotheses
H₀: The job aptitude test results and in-service training units earned do not predict the job performance of the employees.
H₁: The job aptitude test results and in-service training units earned predict the job performance of the employees.
2. Level of significance
α = 0.05
3. Statistical Tool
Multiple Regression
4. Computation
Note: Establish first the relationship of job performance to job aptitude and in-service training units.
Using JASP
1. Open the data
Results
Pearson's Correlations
Variable                                    Performance Ratings    Job Aptitude Test    In-service Training Units
1. Performance Ratings        Pearson's r   —
                              p-value       —
2. Job Aptitude Test          Pearson's r   0.604                  —
                              p-value       0.017                  —
3. In-service Training Units  Pearson's r   0.815                  0.149                —
                              p-value       < .001                 0.595                —
Results show that the employees’ job performance has a high positive relationship with their job
aptitude test scores [r(13) = 0.604, p = 0.017] and with the in-service training units earned [r(13) = 0.815, p < 0.001] at
the 0.05 level of significance.
Since the employees’ job performance is related to both the job aptitude test and the in-service training units
earned, we can now employ multiple regression to determine whether the job aptitude test and in-service training units
earned predict the performance ratings of the employees.
6. Results
ANOVA
Model              Sum of Squares    df    Mean Square    F         p
H₁   Regression    1791.079           2    895.539        55.208    < .001
     Residual       194.654          12     16.221
     Total         1985.733          14
Coefficients
Model                           Unstandardized    Standard Error    Standardized    t         p
H₀   (Intercept)                37.867            3.075                             12.314    < .001
H₁   (Intercept)                 9.870            3.380                              2.920    0.013
     Job Aptitude Test           1.477            0.274             0.494            5.399    < .001
     In-service Training Units   2.699            0.333             0.741            8.107    < .001
Based on the results, the job aptitude test and in-service training units predict the job performance of the
employees [F(2, 12) = 55.208, p < 0.001, R = 0.950, R² = 0.902] at the 0.05 level of significance, with a model equation of:
Performance = 9.870 + 1.477(Job Aptitude Test) + 2.699(In-service Training Units)
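With two predictors, the coefficients can be recovered by solving the normal equations directly. The Python sketch below (an illustration, not the JASP procedure) uses centered sums of squares and Cramer's rule to reproduce the coefficients from the table above:

```python
# Two-predictor least-squares fit solved from the normal equations,
# reproducing the JASP coefficients for the job-performance example.
performance = [54, 37, 30, 48, 37, 37, 31, 49, 43, 12, 30, 37, 61, 31, 31]
aptitude    = [15, 13, 15, 15, 10, 14,  8, 12, 10,  3, 15, 14, 14,  9,  4]
training    = [ 8,  1,  1,  7,  4,  2,  3,  7,  9,  1,  1,  2, 10,  1,  5]

n = len(performance)
my = sum(performance) / n
m1 = sum(aptitude) / n
m2 = sum(training) / n

# Centered sums of squares and cross-products
s11 = sum((x - m1) ** 2 for x in aptitude)
s22 = sum((x - m2) ** 2 for x in training)
s12 = sum((x1 - m1) * (x2 - m2) for x1, x2 in zip(aptitude, training))
s1y = sum((x - m1) * (y - my) for x, y in zip(aptitude, performance))
s2y = sum((x - m2) * (y - my) for x, y in zip(training, performance))

# Solve the 2x2 system of normal equations by Cramer's rule
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s2y * s12) / det   # Job Aptitude Test coefficient
b2 = (s2y * s11 - s1y * s12) / det   # In-service Training Units coefficient
a = my - b1 * m1 - b2 * m2           # intercept

print(f"Performance = {a:.3f} + {b1:.3f}(Aptitude) + {b2:.3f}(Training)")
# Performance = 9.870 + 1.477(Aptitude) + 2.699(Training)
```

For more than two predictors, the same normal-equation approach generalizes to a larger linear system, which is usually solved with matrix methods rather than Cramer's rule.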
Activity:
Determine if the participants’ level of preparedness and level of awareness of natural disasters predict
their level of resiliency. Refer to your data in ACTIVITY 1 and use a 0.05 level of significance.
References:
Cohen, J., et al. (2002). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ:
Erlbaum.
Daniel, W., & Terrell, J. (1986). Business statistics (basic concepts and methodology) (4th ed.). Boston: Houghton Mifflin
Company.
Nishishiba, M. (2014). Research methods and statistics for public and nonprofit administrators. Sage.
Prabhakaran, S. (2017). Linear regression. http://r-statistics.co/Linear-Regression.html. Retrieved March 13, 2020.
Salkind, N. J. (2008). Statistics for people who (think they) hate statistics. Thousand Oaks, CA: Sage.
Wilson, L. (2019). Statistical correlation. https://explorable.com/statistical-correlation. Retrieved March 13, 2020.