2 Correlation & Regression



Working with relationships between two variables

[Figure: Size of Teaching Tip & Stats Test Score, a scatterplot with tip size ($0 to $80) on the x-axis and Stats Test Score (0 to 100) on the y-axis]



Univariate & Bivariate Statistics
Univariate (one variable): frequency distribution, mean, mode, range, standard deviation
Bivariate (two variables): correlation

Correlation
a linear pattern of relationship between one variable (x) and another variable (y)
an association between two variables
the relative position of a score on one variable corresponds to its relative position on the other variable
a graphical representation of the relationship between two variables

Warning:
No proof of causality: we cannot assume x causes y

Scatterplot!
No Correlation
Random or circular assortment of dots

Positive Correlation
ellipse leaning to the right
GPA & SAT
Smoking & Lung Damage

Negative Correlation
ellipse leaning to the left
Depression & Self-esteem
Studying & Test Errors

Pearson's Correlation Coefficient (r)


r indicates
strength of the relationship (strong, weak, or none)
direction of the relationship:
positive (direct): variables move in the same direction
negative (inverse): variables move in opposite directions

r ranges in value from -1.0 to +1.0

-1.0 Strong Negative
0.0 No Relationship
+1.0 Strong Positive
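To make the -1.0 to +1.0 scale concrete, here is a minimal Python sketch of Pearson's r using the definitional formula: the sum of cross-products of deviations divided by the square root of SS_x times SS_y. The function name `pearson_r` and the small data set are hypothetical, made up for illustration rather than taken from the course materials.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r = (sum of cross-products) / sqrt(SS_x * SS_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products of deviations: positive when x and y sit on the same side of their means
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical data for illustration (not from the course worksheet)
hours_studied = [1, 2, 3, 4, 5]
test_score = [52, 60, 61, 70, 78]
print(round(pearson_r(hours_studied, test_score), 3))  # 0.979: strong positive
print(pearson_r([1, 2, 3], [30, 20, 10]))              # -1.0: perfectly inverse
```

Because the numerator can never exceed the denominator in absolute value, the result always lands between -1.0 and +1.0; the second call shows the perfectly inverse case.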

Go to website!
playing with scatterplots

Practice with Scatterplots

[Four practice scatterplots, each labeled r = .__ __ for you to estimate]

Correlation Guestimation

Correlations (Pearson r, with Sig. (2-tailed) in parentheses; N = 12 for every pair)

Variable              Miles walked/day Weight           Depression       Anxiety
Miles walked per day  1                -.797** (.002)   -.800** (.002)   -.774** (.003)
Weight                -.797** (.002)   1                .648* (.023)     .780** (.003)
Depression            -.800** (.002)   .648* (.023)     1                .753** (.005)
Anxiety               -.774** (.003)   .780** (.003)    .753** (.005)    1

**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

Samples vs. Populations


Sample statistics estimate population parameters
M tries to estimate μ; r tries to estimate ρ (rho, a Greek letter, not p)

r: the correlation for a sample, based on the limited observations we have
ρ: the actual correlation in the population, the true correlation

Beware Sampling Error!!


even if ρ = 0 (there's no actual correlation), you might get r = .08 or r = -.26 just by chance. We look at r, but we want to know about ρ.
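Sampling error is easy to see by simulation. The sketch below assumes a population where ρ = 0 (x and y are generated independently from normal distributions), draws many samples of n = 6, and records r for each. The sample size, number of repetitions, seed, and function names are all illustrative choices, not values from the worksheet.

```python
import math
import random


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def sample_rs(n, reps, seed=1):
    """Draw `reps` samples of size n from a population with rho = 0; return each sample r."""
    rng = random.Random(seed)
    rs = []
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated separately, so rho = 0
        rs.append(pearson_r(xs, ys))
    return rs


rs = sample_rs(n=6, reps=1000)
share = sum(abs(r) >= 0.5 for r in rs) / len(rs)
print("smallest r:", round(min(rs), 2), "largest r:", round(max(rs), 2))
print("share of samples with |r| >= .50:", share)
```

With n = 6, theory says roughly 31% of samples should give an r of .50 or more in either direction even though ρ = 0, so an eye-catching r from a tiny sample can easily be chance.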

Hypothesis testing with Correlations


Two possibilities
H0: ρ = 0 (no actual correlation; the null hypothesis)
Ha: ρ ≠ 0 (there is some correlation; the alternative hypothesis)

Case #1 (see correlation worksheet)


Correlation between distance and points: r = -.904. The sample is small (n = 6), but r is very large. We guess ρ < 0 (we guess there is some correlation in the population).

Case #2
Correlation between aiming and points: r = .628. The sample is small (n = 6), and r is only moderate in size. We guess ρ = 0 (we guess there is NO correlation in the population).

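Bottom line
We can only guess about ρ. We can be wrong in two ways: deciding there is a correlation when ρ = 0 (Type I error), or deciding there is none when ρ ≠ 0 (Type II error).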
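The guesses in Case #1 and Case #2 can be made formal with the usual t test for H0: ρ = 0, where t = r√(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The sketch below is a standard-library-only illustration: it integrates the t density with Simpson's rule instead of calling a statistics package, and the helper names are our own. With n = 6 it gives p ≈ .013 for r = -.904 and p ≈ .18 for r = .628, in line with the Sig. (2-tailed) values in the worksheet's correlation matrix.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p = 1 - area between -|t| and +|t| (Simpson's rule; steps must be even)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    half_area = total * h / 3  # area under the curve from 0 to |t|
    return 1 - 2 * half_area


def r_significance(r, n):
    """t test of H0: rho = 0 using t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df, two_tailed_p(t, df)


for label, r in [("Case #1", -0.904), ("Case #2", 0.628)]:
    t, df, p = r_significance(r, n=6)
    decision = "reject H0" if p < 0.05 else "fail to reject H0"
    print(f"{label}: r = {r}, t({df}) = {t:.2f}, p = {p:.3f} -> {decision}")
```

In practice a statistics package computes the same p-value; the hand-rolled integration is only here to keep the sketch dependency-free.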

Reading Correlation Matrix


Correlations(a) (Pearson r, with Sig. (2-tailed) in parentheses; N = 6 for every pair)

Variable                       (1)            (2)            (3)            (4)            (5)            (6)            (7)
(1) Total ball toss points     1              -.904* (.013)  -.582 (.226)   .628 (.181)    .821* (.045)   -.037 (.945)   -.502 (.310)
(2) Distance from target       -.904* (.013)  1              .279 (.592)    -.653 (.159)   -.883* (.020)  .228 (.664)    .522 (.288)
(3) Time spun before throwing  -.582 (.226)   .279 (.592)    1              -.390 (.445)   -.248 (.635)   -.087 (.869)   .267 (.609)
(4) Aiming accuracy            .628 (.181)    -.653 (.159)   -.390 (.445)   1              .758 (.081)    -.546 (.262)   -.250 (.633)
(5) Manual dexterity           .821* (.045)   -.883* (.020)  -.248 (.635)   .758 (.081)    1              -.553 (.255)   -.101 (.848)
(6) College grade point avg    -.037 (.945)   .228 (.664)    -.087 (.869)   -.546 (.262)   -.553 (.255)   1              -.524 (.286)
(7) Confidence for task        -.502 (.310)   .522 (.288)    .267 (.609)    -.250 (.633)   -.101 (.848)   -.524 (.286)   1

*. Correlation is significant at the 0.05 level (2-tailed).
a. Day sample collected = Tuesday

Reading one cell (Total ball toss points with Distance from target):
r = -.904
p = .013: the probability of getting a correlation this size by sheer chance. Reject H0 if p < .05.
N = 6: the sample size
Report it as r(4) = -.904, p < .05, where 4 is the degrees of freedom (N - 2).
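Scanning a row of the matrix for significant cells and writing them up in the r(df) = ..., p = ... form can be sketched as below. The (r, Sig.) pairs are copied from the first row of the Correlations table; the helper names are hypothetical, and the no-leading-zero formatting follows the APA convention for values that cannot exceed 1. The sketch prints the exact p; writing p < .05 instead, as on the slide, is equally common.

```python
def apa_number(value, places=3):
    """Format r or p without a leading zero, e.g. -0.904 -> '-.904'."""
    return f"{value:.{places}f}".replace("0.", ".", 1)


def report_r(r, p, n, alpha=0.05):
    """Return an APA-style 'r(df) = ..., p = ...' string (df = n - 2) and whether p < alpha."""
    df = n - 2
    return f"r({df}) = {apa_number(r)}, p = {apa_number(p)}", p < alpha


# First row of the Correlations table: Total ball toss points with each other variable (r, Sig.)
first_row = {
    "Distance from target": (-0.904, 0.013),
    "Time spun before throwing": (-0.582, 0.226),
    "Aiming accuracy": (0.628, 0.181),
    "Manual dexterity": (0.821, 0.045),
    "College grade point avg": (-0.037, 0.945),
    "Confidence for task": (-0.502, 0.310),
}

significant = []
for name, (r, p) in first_row.items():
    text, is_sig = report_r(r, p, n=6)
    print(f"{name}: {text}{' *' if is_sig else ''}")
    if is_sig:
        significant.append(name)
```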

Predictive Potential
Coefficient of Determination
r² = amount of variance in y accounted for by x
Percentage improvement in accuracy (reduction in squared prediction error) you gain by using the regression line to make predictions
Without a correlation, you can only guess the mean of y
[Used with regression]

[Figure: variance accounted for, shown on a scale from 0% to 100%]
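The idea that r² is the gain over guessing the mean can be checked numerically: r² equals 1 minus the ratio of squared error around the regression line to squared error around the mean of y. A minimal sketch with hypothetical data (the function name is ours):

```python
def r_squared_two_ways(xs, ys):
    """Compare 1 - SS_line / SS_mean with Pearson's r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)  # squared error when we just guess the mean of y
    b = sp / ss_x
    a = mean_y - b * mean_x
    # squared error left over when we predict with the regression line instead
    ss_line = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
    error_reduction = 1 - ss_line / ss_y
    r = sp / (ss_x * ss_y) ** 0.5
    return error_reduction, r ** 2


# Hypothetical data for illustration
reduction, r2 = r_squared_two_ways([1, 2, 3, 4, 5], [52, 60, 61, 70, 78])
print(round(reduction, 3), round(r2, 3))  # both about 0.959
```

Here the regression line removes about 96% of the squared error you would make by predicting the mean of y for everyone, and that is exactly r².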

Limitations of Correlation
linearity:
can't describe non-linear relationships, e.g., the relation between anxiety & performance

truncation of range:
underestimates the strength of the relationship if you can't see the full range of x values

no proof of causation
third-variable problem: a 3rd variable could be causing change in both variables
directionality: can't be sure which way causality flows
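Two of these limitations can be demonstrated with made-up numbers. In the sketch below, an inverted U (a stand-in for anxiety and performance) has y completely determined by x yet gives r = 0, and a data set with a fixed ±3 wobble around y = x loses most of its correlation when only x = 1 to 6 is observed. All data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# 1) Linearity: hypothetical inverted U (performance peaks at moderate anxiety)
anxiety = [-3, -2, -1, 0, 1, 2, 3]
performance = [10 - a * a for a in anxiety]  # y is fully determined by x
r_curve = pearson_r(anxiety, performance)

# 2) Truncation of range: hypothetical y = x plus a fixed +/-3 wobble
xs = list(range(1, 21))
ys = [x + (3 if x % 2 else -3) for x in xs]
r_full = pearson_r(xs, ys)
r_truncated = pearson_r(xs[:6], ys[:6])  # only x = 1..6 observed

print(f"inverted U: r = {r_curve:.3f}")
print(f"full range: r = {r_full:.3f}, truncated range: r = {r_truncated:.3f}")
```

The curve gives r = 0, while the same wobble yields about .88 over the full range but only about .28 over the truncated range.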

Regression
Regression: Correlation + Prediction
predicting y based on x, e.g., predicting throwing points (y) based on distance from target (x)

Regression equation
formula that specifies a line: ŷ = bx + a
plug in an x value (distance from target) and predict y (points)
note: y = actual value of a score; ŷ = predicted value
Go to website!
Regression Playground
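SPSS and most textbooks fit the least-squares line, whose slope and intercept are b = SP / SS_x and a = M_y - b·M_x, where SP is the sum of cross-products of deviations and SS_x the sum of squared deviations of x (equivalently b = r·s_y/s_x). A minimal sketch, with hypothetical data and function name:

```python
def least_squares_line(xs, ys):
    """Return slope b and intercept a of the least-squares line y-hat = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x             # slope: change in y-hat per 1-unit change in x
    a = mean_y - b * mean_x   # intercept: the line passes through (mean_x, mean_y)
    return b, a


# Hypothetical data for illustration
b, a = least_squares_line([1, 2, 3, 4, 5], [52, 60, 61, 70, 78])
print(f"y-hat = {b:.1f}x + {a:.1f}")  # y-hat = 6.2x + 45.6
print(round(b * 6 + a, 1))            # predicted y for x = 6: 82.8
```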

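Regression Graphic: Regression Line

See correlation & regression worksheet

[Figure: scatterplot of Total ball toss points (y-axis) against Distance from target (x-axis) with the regression line and its R Sq. value. Reading from the line: if x = 18 then ŷ ≈ 47; if x = 24 then ŷ ≈ 20]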

Regression Equation
ŷ = bx + a
ŷ = predicted value of y
b = slope of the line
x = value of x that you plug in
a = y-intercept (where the line crosses the y-axis)
See correlation & regression worksheet

In this case:
ŷ = -4.263(x) + 125.401

So if the distance is 20 feet:
ŷ = -4.263(20) + 125.401
ŷ = -85.26 + 125.401
ŷ = 40.141
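The same arithmetic as a small Python function, assuming the slope and intercept from this worked example (the function name is hypothetical). It also evaluates the two distances marked on the regression graph:

```python
def predict_points(distance_ft, b=-4.263, a=125.401):
    """y-hat = b*x + a using the slope and intercept from the worked example."""
    return b * distance_ft + a


for distance in (18, 20, 24):
    print(f"distance {distance} ft -> predicted points {predict_points(distance):.3f}")
```

The equation gives 48.667 at 18 feet, 40.141 at 20 feet, and 23.089 at 24 feet; values read off a plotted line by eye will only be approximate.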

SPSS Regression Set-up


Criterion: y-axis variable, what you're trying to predict

Predictor: x-axis variable, what you're basing the prediction on

Note: Never refer to the IV or DV when doing regression

Getting Regression Info from SPSS


Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .777(a)   .603       .581                18.476
a. Predictors: (Constant), Distance from target

See correlation & regression worksheet

Coefficients(a)
                             Unstandardized Coefficients   Standardized Coefficients
Model                        B          Std. Error         Beta                        t        Sig.
1   (Constant)               125.401    14.265                                         8.791    .000
    Distance from target     -4.263     .815               -.777                       -5.230   .000
a. Dependent Variable: Total ball toss points

Reading the regression equation from the Coefficients table: b is the B for Distance from target and a is the B for (Constant), so
ŷ = b(x) + a
ŷ = -4.263(20) + 125.401
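A few columns in this output are tied together by simple arithmetic, which is a useful check when reading it: each t is B divided by its Std. Error, and with a single predictor R is the absolute value of r (here .777, matching Beta = -.777) and R Square is its square. The sketch below recomputes these from the printed, already-rounded values, so expect agreement only to within rounding (for example -5.231 versus -5.230).

```python
# B and Std. Error copied from the SPSS Coefficients table
coefficients = {
    "(Constant)": (125.401, 14.265),
    "Distance from target": (-4.263, 0.815),
}

# t = B / Std. Error for each row
t_values = {name: b / se for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")

# With a single predictor, R is |r| and R Square is r squared
r_square = 0.777 ** 2
print(f"R Square = {r_square:.3f}")
```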

Predictive Ability
Mantra!!
As variability decreases, prediction accuracy ___.
If we can account for variance, we can make better predictions.

As the size of r increases (in either direction):
r² increases, so the variance accounted for increases
prediction accuracy increases
prediction error (the distance between y and ŷ) decreases
the standard error of the estimate, which measures the overall amount of prediction error, decreases

We like big r's!!!
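The link between r and prediction error can be written out. For one predictor the standard error of the estimate is √(SS_residual / (n - 2)), which equals s_y √((1 - r²)(n - 1)/(n - 2)). The sketch below plugs in hypothetical values (s_y = 20, n = 20; function name ours) to show the error shrinking as r grows and reaching zero at r = 1.

```python
import math


def std_error_of_estimate(s_y, r, n):
    """Standard error of the estimate with one predictor:
    sqrt(SS_residual / (n - 2)) = s_y * sqrt((1 - r^2) * (n - 1) / (n - 2)),
    where s_y is the sample standard deviation of y."""
    return s_y * math.sqrt((1 - r ** 2) * (n - 1) / (n - 2))


# Hypothetical values: s_y = 20 points, n = 20 people
for r in (0.0, 0.3, 0.6, 0.9, 1.0):
    print(f"r = {r:.1f}: standard error of the estimate = {std_error_of_estimate(20, r, 20):.2f}")
```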

Drawing a Regression Line by Hand


Three steps:
1. Plug in zero for x to get a ŷ value, and then plot this point.
Note: it will be the y-intercept (a).

2. Plug in a large value for x (just so it falls near the right end of the graph), then plot the resulting point.
3. Connect the two points with a straight line!
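The three steps translate directly into code. This sketch, with a hypothetical function name, returns the two points to plot for the worked-example equation; x = 25 feet is just an arbitrary "large" value for the right side of the graph.

```python
def line_endpoints(b, a, x_large):
    """Two points for drawing y-hat = b*x + a by hand."""
    left = (0, a)                       # step 1: x = 0 gives the y-intercept
    right = (x_large, b * x_large + a)  # step 2: a large x near the right edge
    return left, right                  # step 3: connect these with a straight line


# Worked-example equation; x_large = 25 feet is an arbitrary choice
print(line_endpoints(-4.263, 125.401, 25))
```

The second point comes out at about (25, 18.83).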
