BIG DATA

Today's objectives

Today's session will be done entirely in JASP. We will cover:
1. Contingency tables
2. Correlation coefficients – Pearson
3. Correlation coefficients – Spearman
4. Simple linear regression

Inferential statistics
Measuring connections between variables
Depending on the type of variables we have, there are different options for measuring the connections between them.

Testing for connections between variables
• Nominal variables: contingency tables
• Ordinal variables: correlation, using the Spearman correlation coefficient
• Interval or ratio variables: correlation, using the Pearson correlation coefficient
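As a rough illustration of the contingency-table option, such a table simply counts how often each pair of categories occurs together. The sketch below uses made-up data and hypothetical variable names (gender, vote) that are not taken from the session's data set; JASP builds the same kind of table from its menus.

```python
from collections import Counter

# Hypothetical nominal data: one (gender, vote) pair per respondent.
observations = [
    ("female", "yes"), ("female", "no"), ("male", "yes"),
    ("male", "yes"), ("female", "yes"), ("male", "no"),
]

def contingency_table(pairs):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

table = contingency_table(observations)
for row, cells in table.items():
    print(row, cells)
```

Each cell holds the number of respondents with that combination of categories, and the cells sum to the sample size.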
Pearson correlation coefficient
For two variables which are continuous in nature
• Height, age, test score, income

But not for discrete or categorical variables
• Race, political affiliation, social class, rank

The relationship between two variables
• How the value of one variable changes when the value of another variable changes

A correlation coefficient is a numerical index that reflects the relationship between two variables.
• Range: -1 to +1
• Bivariate correlation (for two variables)
Correlation Analysis – Measuring the Relationship Between Two Variables

Definition
The coefficient of correlation (r) is a measure of the strength of the linear relationship between two variables.

• It shows the direction and strength of the linear relationship between two interval- or ratio-scale variables.
• If r is close to 0, there is little linear relationship between the variables.
• A positive r indicates a direct or positive linear relationship.
• A negative r indicates an inverse or negative linear relationship between the variables.
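To make the definition concrete, here is a minimal sketch of how r can be computed by hand, assuming the standard Pearson formula (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The function name pearson_r and the data are illustrative only; in the session itself JASP does this calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [2, 4, 6, 8, 10]
print(pearson_r(x, [1, 2, 3, 4, 5]))    # y rises with x: perfect positive
print(pearson_r(x, [10, 8, 6, 4, 2]))   # y falls as x rises: perfect negative
```

The two calls correspond to the two extreme cases: points lying exactly on a rising line and points lying exactly on a falling line.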
Figure: scatter plots of a perfect negative correlation and a perfect positive correlation.
Correlation Analysis – Measuring the Relationship Between Two Variables

Scale of the correlation coefficient, from −1 to +1:
• −1: perfect negative correlation
• Between −1 and 0: strong, then moderate, then weak negative correlation as r approaches 0
• 0: absence of correlation (no linear relationship between the variables)
• Between 0 and +1: weak, then moderate, then strong positive correlation as r approaches +1
• +1: perfect positive correlation
Correlation Analysis – Measuring the Relationship Between Two Variables

Figure: examples of distributions and their respective coefficients of correlation.
Correlation between more than 2 variables?

Correlation matrix

            Income   Education   Attitude   Vote
Income      1.00     0.35        -0.19      0.51
Education            1.00        -0.21      0.43
Attitude                         1.00       0.55
Vote                                        1.00
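A correlation matrix is just the pairwise coefficient computed for every pair of variables. The sketch below shows the idea on made-up scores for five respondents; the column names and numbers are hypothetical and are not the data behind the matrix above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {name: {other: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson_r(columns[a], columns[b]) for b in names}
            for a in names}

# Hypothetical scores for five respondents.
data = {
    "income":    [20, 35, 50, 65, 80],
    "education": [10, 12, 16, 14, 18],
    "attitude":  [5, 3, 4, 2, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, [round(row[other], 2) for other in data])
```

The diagonal is always 1 (each variable correlates perfectly with itself) and the matrix is symmetric, which is why only the upper triangle is usually shown.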

rxy value    Interpretation
0.8 ~ 1.0    Very strong relationship (share most things in common)
0.6 ~ 0.8    Strong relationship (share many things in common)
0.4 ~ 0.6    Moderate relationship (share something in common)
0.2 ~ 0.4    Weak relationship (share a little in common)
0.0 ~ 0.2    Weak or no relationship (share very little or nothing in common)
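The table above can be turned into a small lookup, sketched here under two assumptions that the slides do not state: the bands are applied to the absolute value of r, so that negative coefficients are graded by size, and a value sitting exactly on a boundary goes to the higher band. The function name interpret_r is illustrative.

```python
def interpret_r(r):
    """Map a correlation coefficient to the verbal labels in the table.

    Assumptions: the bands apply to |r|, and a boundary value such as 0.6
    is assigned to the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size >= 0.8:
        return "very strong"
    if size >= 0.6:
        return "strong"
    if size >= 0.4:
        return "moderate"
    if size >= 0.2:
        return "weak"
    return "weak or none"

# Three coefficients taken from the correlation matrix slide
for r in (0.51, -0.19, 0.35):
    print(r, interpret_r(r))
```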


Spearman correlation coefficient
Variable types (level of measurement)
• Interval/ratio
• Ordinal

Range: -1 to +1
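One common way to obtain the Spearman coefficient is to replace each value by its rank and then compute the Pearson coefficient on the ranks, giving tied values the average of their ranks. The sketch below assumes that approach; the helper names and the data are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y always grows with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print("Spearman:", spearman_rho(x, y))
print("Pearson: ", pearson_r(x, y))
```

Because Spearman only looks at the ordering, it reports a perfect monotonic relationship here, while Pearson stays below 1 because the points do not lie on a straight line.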

Different types of correlation coefficients
Correlation in JASP

Correlation is available in the Regression menu, under Classical > Correlation. The analysis panel offers:
• Choice of variables
• Choice of correlation coefficient
• Scatter plots
Correlation vs. Regression
• What are the differences between the two?
• What to expect from regression?
Regression Analysis
• Correlation analysis tests for the strength and direction of the relationship between two quantitative variables.
• Regression analysis evaluates and "measures" the relationship between two quantitative variables with a linear equation. This equation has the same elements as any equation of a line, that is, a slope and an intercept.

Y = aX + b

The relationship between X and Y is defined by the values of the intercept, b, and the slope, a. In regression analysis, we use data (observed values of X and Y) to estimate the values of a and b.
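The slides do not name the estimation method; the sketch below assumes ordinary least squares, the usual choice for simple linear regression, in which the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The function name fit_line and the observations are made up for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates of slope a and intercept b in Y = aX + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept: line passes through the means
    return a, b

# Hypothetical observed values of X and Y
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(x, y)
print(f"Y = {a:.3f} X + {b:.3f}")
```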
Regression Analysis

Y = aX + b

• Y is the dependent variable. It is the variable being predicted, estimated, or explained. It is also called the endogenous variable.
• X is the independent variable. In a regression equation, it is the variable used to estimate the dependent variable, Y. X is the predictor variable, also called the exogenous variable.
Purpose of Regression Analysis

The purpose of regression analysis is to analyze relationships among variables.
• Answer the question of how much Y changes with changes in each of the X's
• Forecast or predict the value of Y based on the values of the X's
Simple Linear Regression – JASP
• Choice of dependent variable
• Choice of independent variables
Simple Linear Regression – Output from SPSS

Annotations on the output:
• R: correlation between actual and predicted values of the dependent variable
• R Square: the model's accuracy in explaining the dependent variable
• Coefficient for a (slope in the linear regression formula Y = aX + b)
• Coefficient for b (intercept in the linear regression formula)

Fitted equation: Profit = 15.967 Age + 110.526, that is, a = 15.967 and b = 110.526
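Once the coefficients are read off the output, the fitted equation can be used for prediction. The sketch below plugs a few ages into Profit = 15.967 Age + 110.526; the ages are hypothetical, and the units of Age and Profit are not given in the slides.

```python
SLOPE_A = 15.967       # coefficient of Age in the fitted equation
INTERCEPT_B = 110.526  # constant term in the fitted equation

def predict_profit(age):
    """Predicted Profit for a given Age under the fitted line."""
    return SLOPE_A * age + INTERCEPT_B

# Hypothetical ages chosen only to show how the line behaves
for age in (0, 10, 11):
    print(age, round(predict_profit(age), 3))
```

The prediction at Age 0 equals the intercept, and each additional unit of Age raises the predicted Profit by the slope.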
Regression Statistics in Excel's Output
Multiple R
• is the correlation between actual and predicted values of the dependent variable; it is never negative
• in simple linear regression it equals the absolute value of r, the correlation between X and Y, which varies from -1 to +1 and is negative if the slope is negative

R Square
• the proportion of the variation in the dependent variable that the model explains
• R² varies from 0 (no fit) to 1 (perfect fit)

Adjusted R Square
• adjusts R² for sample size and number of X variables
• As the sample size increases above 20 cases per variable, adjustment is less needed (and vice versa).

Standard Error
• variability between observed and predicted Y values
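These four statistics can be reproduced from the residuals of a fitted line. The sketch below assumes the standard formulas: R Square = 1 − SS_residual / SS_total, Adjusted R Square = 1 − (1 − R²)(n − 1)/(n − k − 1), and Standard Error = sqrt(SS_residual / (n − k − 1)), with k = 1 predictor. The function name and the data are illustrative.

```python
import math

def regression_statistics(x, y):
    """Fit Y = aX + b by least squares and return the summary statistics."""
    n = len(x)
    k = 1  # number of X variables in simple linear regression
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    predicted = [a * xi + b for xi in x]
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_residual = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    r_square = 1 - ss_residual / ss_total
    return {
        # max() guards against a tiny negative value from rounding error
        "multiple_r": math.sqrt(max(r_square, 0.0)),
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "standard_error": math.sqrt(ss_residual / (n - k - 1)),
    }

# Hypothetical observed values of X and Y
x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 8.8, 11.0]
stats = regression_statistics(x, y)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

Adjusted R Square is never larger than R Square, and the gap shrinks as the number of cases grows relative to the number of predictors.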
Activity
Load the file Employee data in JASP.

Answer the questions in the quiz:

https://buv.instructure.com/courses/1541/quizzes/4645
