# Overview of Tests

| | Dependent Variable | Independent Variable |
| --- | --- | --- |
| T-Test (last week) | Continuous | Categorical |
| Correlation and Regression (this week) | Continuous | Continuous |
| Chi-Square (this week) | Categorical | Categorical |
| General Linear Model (next week) | Categorical or Continuous | Categorical or Continuous |

Association Between Two Variables
• No association
• Linear association
– Positive association
– Negative association
• Curvilinear association

[Scatterplots illustrating no association, positive and negative linear association, and curvilinear association]

Strength of Linear Association

[Scatterplots contrasting linear associations of increasing strength]

Quantifying the Strength of Linear Correlation
• What does a positive linear correlation mean?
– Large numbers on one variable go with large numbers on the other variable.

• How do we decide which numbers are large and which are small?
– Relative to the means.

| Student # | 1 | 2 | 3 | 4 | 5 | µ | σ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| SAT (X) | 450 | 520 | 600 | 470 | 460 | 500 | 55.5 |
| GPA (Y) | 2.7 | 3.1 | 3.5 | 2.6 | 3.1 | 3.0 | 0.32 |

[Scatterplot of GPA against SAT for the five students, with reference lines at the means (500, 3.0)]

| Student # | SAT (X) | GPA (Y) | X − µX | Y − µY | (X − µX)(Y − µY) |
| --- | --- | --- | --- | --- | --- |
| 1 | 450 | 2.7 | -50 | -0.3 | 15 |
| 2 | 520 | 3.1 | 20 | 0.1 | 2 |
| 3 | 600 | 3.5 | 100 | 0.5 | 50 |
| 4 | 470 | 2.6 | -30 | -0.4 | 12 |
| 5 | 460 | 3.1 | -40 | 0.1 | -4 |
| Sum | 2500 | 15.0 | 0 | 0 | 75 (cross product) |
| µ | 500 | 3.0 | 0 | 0 | 15 (covariance) |
| σ | 55.5 | 0.32 | | | |

Quantifying the Strength of Linear Correlation
• Is 15 a large or a small number?
• At least we know it is positive.
• Judge its magnitude relative to the variance (or standard deviation) of X and Y.

r = Covariance / (σX ⋅ σY) = σXY / (σX ⋅ σY)

• r = 15 / (55.5 × 0.32) = 0.84
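The computation above can be checked with a short script. This is a sketch reproducing the covariance and r for the five-student SAT/GPA data, using population standard deviations as the slides do:

```python
from statistics import mean, pstdev  # pstdev = population standard deviation

sat = [450, 520, 600, 470, 460]
gpa = [2.7, 3.1, 3.5, 2.6, 3.1]
n = len(sat)

mx, my = mean(sat), mean(gpa)  # 500, 3.0
# Covariance: the average cross product of deviations from the means
cov = sum((x - mx) * (y - my) for x, y in zip(sat, gpa)) / n
# Correlation: covariance scaled by the product of the standard deviations
r = cov / (pstdev(sat) * pstdev(gpa))
print(round(cov, 1), round(r, 2))  # 15.0 0.84
```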

Alternative Approach
• Standardize X and Y first (z-scores), then calculate the covariance between the z-scores.

r = Σ(zX ⋅ zY) / N

| Student # | SAT (X) | GPA (Y) | zX | zY | zX ⋅ zY |
| --- | --- | --- | --- | --- | --- |
| 1 | 450 | 2.7 | -0.90 | -0.93 | 0.84 |
| 2 | 520 | 3.1 | 0.36 | 0.31 | 0.11 |
| 3 | 600 | 3.5 | 1.80 | 1.55 | 2.79 |
| 4 | 470 | 2.6 | -0.54 | -1.24 | 0.67 |
| 5 | 460 | 3.1 | -0.72 | 0.31 | -0.22 |
| Sum | 2500 | 15.0 | 0 | 0 | 4.19 |
| µ | 500 | 3.0 | 0 | 0 | 0.84 (r) |
| σ | 55.5 | 0.32 | | | |
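The z-score route gives the same r; a minimal sketch with the same data:

```python
from statistics import mean, pstdev

sat = [450, 520, 600, 470, 460]
gpa = [2.7, 3.1, 3.5, 2.6, 3.1]

def z(xs):
    """Convert a list of scores to z-scores using the population SD."""
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]

zx, zy = z(sat), z(gpa)
# r is the mean cross product of the paired z-scores
r = sum(a * b for a, b in zip(zx, zy)) / len(sat)
print(round(r, 2))  # 0.84
```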

Interpreting the Magnitude of Correlations
• Always between -1 and +1
• Proportion of variance explained by the other variable: r²
• r = .84, r² = .71 = 71%
• A correlation of .8 is NOT two times stronger than a correlation of .4.
– How much stronger? Four times: (.8)² = .64; (.4)² = .16

Significance Testing
• The following statistic has a t distribution with df = N − 2:

t = r ⋅ √(N − 2) / √(1 − r²)

• r = .84, t = 2.68, df = 3, p = .075
• Not significant at the .05 level: the sample size is small.
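The t statistic is simple arithmetic; a sketch reproducing the slide's numbers (the p-value of .075 requires a t table or a stats library, so only t and df are computed here):

```python
from math import sqrt

r, n = 0.84, 5
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)  # t = r*sqrt(N-2)/sqrt(1-r^2)
df = n - 2
print(round(t, 2), df)  # 2.68 3
```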

When There’s a Significant Correlation
• Correlation and Causation
• X causes Y
• Y causes X
• Z causes both X and Y

When There’s No Significant Correlation
• Small sample
• Other noise
• Attenuation due to unreliability of measurement
• Outliers
• Restriction in range
• Curvilinearity

From Correlation to Regression
• Correlation: describes the relationship between two variables
• Regression: uses one variable to predict another variable
• The accuracy of prediction depends on the strength of correlation

Strength of Linear Association

[Scatterplots, shown again, contrasting weak and strong linear associations]

An Example
• Research Question: Does eating spinach increase strength?
• Randomly sampled 20 individuals.
• IV: How many cans of spinach one consumed in the past week.
• DV: How many push-ups one can do in a minute.

[Scatterplot of push-ups against cans of spinach for the 20 individuals; r = .86]

Coefficients (dependent variable: pushup)

| Model 1 | B (unstandardized) | Std. Error | Beta (standardized) | t | Sig. |
| --- | --- | --- | --- | --- | --- |
| (Constant) | 19.443 | 3.494 | | 5.565 | .000 |
| spinach | 1.550 | .220 | .856 | 7.031 | .000 |

Ŷ = 19.44 + 1.55X

ẑY = (.856) zX
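The fitted line can be used directly for prediction; a sketch using the coefficients from the SPSS table above (the function name is illustrative, not from the slides):

```python
def predict_pushups(cans):
    """Predicted push-ups per minute from cans of spinach, Y-hat = 19.443 + 1.550*X."""
    return 19.443 + 1.550 * cans

print(round(predict_pushups(0), 1))   # 19.4 (the intercept: zero cans of spinach)
print(round(predict_pushups(10), 1))  # 34.9
```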

Understanding R²: Proportion of Variance Explained, or Proportion Reduction in Error

[Scatterplot of push-ups against spinach consumption]

[Same scatterplot with a horizontal line at the mean of Y]

When you don't know X, you can only use the mean of Y to predict the Y score of any individual.

[Scatterplot with vertical error segments from each point to the mean line]

Errors (or variance) are relatively high when you use the mean of Y as your prediction.

[Scatterplot with the fitted regression line, and with vertical error segments from each point to the line]

When you know X, and use X to predict Y, the errors become smaller.

[Scatterplot of # of push-ups against spinach consumption showing both the mean line and the regression line; green segments mark the explained portion, red segments the remaining error]

R² = Σ(Ŷ − Ȳ)² (green) / Σ(Y − Ȳ)² (green and red)
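The raw spinach data aren't reproduced here, so this sketch illustrates the proportion-reduction-in-error idea with the earlier SAT/GPA data: compare the squared errors around the mean of Y with the squared errors around the regression line.

```python
from statistics import mean

x = [450, 520, 600, 470, 460]  # SAT, from the earlier slides
y = [2.7, 3.1, 3.5, 2.6, 3.1]  # GPA
mx, my = mean(x), mean(y)

# Least-squares slope and intercept
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
a = my - b * mx
yhat = [a + b * xi for xi in x]

ss_mean = sum((yi - my) ** 2 for yi in y)                 # error using the mean of Y
ss_line = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # error using the regression line
r2 = 1 - ss_line / ss_mean                                # proportion reduction in error
print(round(r2, 2))  # 0.7, which equals r squared (unrounded r = .838)
```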

Association Between Two Categorical Variables
• Angelina Jolie or Jennifer Aniston?

Test for Independence
• Null Hypothesis: There is no relationship between JA/AJ preference and which side of the classroom you sit on.
• To rephrase: JA/AJ preference does not depend on which side of the classroom you sit on.
• Another version: People sitting on the right and people sitting on the left do not have different JA/AJ preferences.

| | JA | AJ | Total |
| --- | --- | --- | --- |
| Left | | | |
| Right | | | |
| Total | | | |

(Observed and expected counts in each cell, filled in during class.)

Expected Frequency
• Expected assuming the null hypothesis is true, i.e., no association between the two variables.

Expected = (C ⋅ R) / N

• C: Column total, R: Row total, N: Grand total

Chi-Square

χ² = Σ (Observed − Expected)² / Expected

• Degrees of freedom: df = (# of columns − 1)(# of rows − 1)
• What is the df for a 2 × 2 table?
• The shape of the chi-square distribution depends on the degrees of freedom.
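The slide's JA/AJ table is blank (it gets filled in during class), so the sketch below uses made-up counts for a 2 × 2 table to show the expected-frequency and chi-square computations:

```python
# Hypothetical observed counts: rows = Left/Right seat, columns = JA/AJ preference
observed = [[20, 10],
            [12, 18]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected = (column total * row total) / grand total, for each cell
expected = [[c * r / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))
df = (len(col_totals) - 1) * (len(row_totals) - 1)
print(round(chi2, 2), df)  # 4.29 1 -> exceeds the 3.84 cutoff at alpha = .05
```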

Chi-Square Distribution

[Plot of a chi-square density with the upper-tail critical region shaded]

Chi-Square
• The chi-square statistic is always positive. Why?
• When df = 1, the chi-square distribution is the distribution of z².
• Without looking it up in a table, what is the alpha = .05 cutoff value for the chi-square distribution with df = 1?
– (1.96)² = 3.84

Back to Angelina and Jennifer
• In SPSS.

If We Still Have Time…

Chi-Square Test for Goodness of Fit
To test whether an observed distribution matches a predetermined or theoretical distribution.
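As a sketch with invented counts: suppose 60 people chose between the two options, and the theoretical (null) distribution is a 50/50 split.

```python
observed = [36, 24]  # hypothetical choices
expected = [30, 30]  # 50/50 split under the null

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # categories minus 1 for goodness of fit
print(round(chi2, 2), df)  # 2.4 1 -> below 3.84, not significant at .05
```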

Next Week
• Integrating t-test, correlation, regression, and chi-square test for independence
• They are all special cases of the general linear model
• Effect size and power for the above tests