
Multiple Regression and Correlation
Dr. Carlo Magno
Bivariate correlation: Y = a + bX

Multiple correlation: y = b1x1 + b2x2 + ... + bnxn + c
 Multiple Regression – the association between a criterion variable and two or more predictor variables (Aron & Aron, 2003).

 Multiple correlation coefficient = R

 Using two or more variables to predict a criterion variable.
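As a concrete illustration of the multiple regression equation above, here is a minimal sketch assuming Python with statsmodels; the predictor names (study_habits, anxiety) and the simulated data are hypothetical, not taken from the studies cited below.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
study_habits = rng.normal(size=n)
anxiety = rng.normal(size=n)
achievement = 0.5 * study_habits - 0.3 * anxiety + rng.normal(size=n)

X = sm.add_constant(np.column_stack([study_habits, anxiety]))  # prepends the intercept a
model = sm.OLS(achievement, X).fit()

print(model.params)    # a, b1, b2 from y = a + b1*x1 + b2*x2 + c
print(model.rsquared)  # R squared for the two predictors together
```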
Onwuegbuzie, A. J., Bailey, P., & Daley, C. E. (2000). Cognitive, affective, personality, and demographic predictors of foreign-language achievement. The Journal of Educational Research, 94, 3-15.

Predictors of Foreign Language Achievement:
 Cognitive: Academic Achievement, Study Habits, Expectation
 Affective: Perception, Anxiety
 Personality: Cooperativeness, Competitiveness
 Demographic: Gender, Age
Espin, C., Shin, J., Deno, S. L., Skare, S., Robinson, S., & Brenner, B. (2000). Identifying indicators of written expression proficiency for middle school students. The Journal of Special Education, 34, 140-153.

Predictors of Written Expression Proficiency:
 Words written
 Words correct
 Characters
 Sentences
 Characters per word
 Words per sentence
 Correct word sequences
 Incorrect word sequences
 Correct minus incorrect word sequences
 Mean length of correct word sequences
Results
 Regression coefficient (b) – the distinct contribution of a predictor, excluding any overlap with the other predictor variables; the unstandardized simple regression coefficient, expressed in the predictor's original units.

 Standardized regression coefficient (β, beta weight) – obtained by converting all variables (independent and dependent) to z-scores before doing the regression. Indicates which independent variable has the most effect on the dependent variable.
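A minimal sketch of this distinction, assuming Python with statsmodels and simulated data: the unstandardized b coefficients come from the raw variables, while the beta weights come from the same regression after z-scoring every variable.

```python
import numpy as np
import statsmodels.api as sm

def zscore(a):
    return (a - a.mean()) / a.std(ddof=1)

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(scale=5.0, size=n)   # deliberately on a larger scale than x1
y = 2.0 * x1 + 0.4 * x2 + rng.normal(size=n)

raw = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
std = sm.OLS(zscore(y), sm.add_constant(np.column_stack([zscore(x1), zscore(x2)]))).fit()

print(raw.params[1:])  # unstandardized b: sensitive to each predictor's units
print(std.params[1:])  # standardized betas: directly comparable across predictors
```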
Results
 Multiple correlation coefficient (R) – the correlation between the criterion variable and all the predictor variables taken together.

 Squared correlation coefficient (R2) – the percent of variance in the dependent variable explained collectively by all of the independent variables.

 R2adj – used to assess the goodness of fit of a regression equation: how well the predictors (regressors), taken together, explain the variation in the dependent variable, adjusted for the number of predictors. With N cases and k predictors:

R2adj = 1 - (1 - R2)(N - 1)/(N - k - 1)
 R2adj benchmarks:
 above 75%: very good;
 50-75%: good;
 25-50%: fair;
 below 25%: poor and perhaps unacceptable. R2adj values above 90% are rare in psychological data.

 Residual – the deviation of a particular point from the regression line (i.e., from its predicted value).

 t-tests – used to assess the significance of individual b coefficients.

 F test – used to test the significance of R. With N cases and k predictors:

F = [R2/k] / [(1 - R2)/(N - k - 1)]
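A small worked example of the two formulas above, as a sketch with made-up values (assuming Python with SciPy); N is the sample size and k the number of predictors.

```python
from scipy import stats

N, k = 100, 3   # hypothetical sample size and number of predictors
R2 = 0.40       # hypothetical squared multiple correlation

R2_adj = 1 - (1 - R2) * (N - 1) / (N - k - 1)   # adjusted R2
F = (R2 / k) / ((1 - R2) / (N - k - 1))         # F statistic for R
p = stats.f.sf(F, k, N - k - 1)                 # p-value on (k, N - k - 1) df

print(round(R2_adj, 3), round(F, 2), round(p, 5))
```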
Considerations in using multiple regression:
 The units (usually people) observed should be a random sample from some well-defined population.
 The dependent variable should be measured on an interval, continuous scale.
 The independent variables should be measured on interval scales.
Considerations in using multiple regression:
 The distributions of all the variables should be normal.
 The relationships between the dependent variable and the independent variables should be linear.
 Although the independent variables can be correlated, there must be no perfect (or near-perfect) correlations among them, a situation called multicollinearity (a check is sketched below).
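As flagged in the last bullet, one common screen for multicollinearity is the variance inflation factor (VIF). A minimal sketch with simulated, hypothetical data, assuming Python with statsmodels:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=100)  # nearly collinear with x1
x3 = rng.normal(size=100)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
vifs = [variance_inflation_factor(X, i) for i in range(1, X.shape[1])]
print(vifs)  # very large values for x1 and x2 signal near-perfect correlation
```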
Considerations in using multiple regression:
 There must be no interactions (in the ANOVA sense) between independent variables.
 A rule of thumb for testing b coefficients is to have N >= 104 + m, where m = number of independent variables (e.g., with 5 predictors, at least 109 cases).
Reporting regression results:
"The data were analyzed by multiple regression, using as regressors age, income and gender. The regression was a rather poor fit (R2adj = 40%), but the overall relationship was significant (F(3,12) = 4.32, p < 0.05). With other variables held constant, depression scores were negatively related to age and income, decreasing by 0.16 for every extra year of age, and by 0.09 for every extra pound per week income. Women tended to have higher scores than men, by 3.3 units. Only the effect of income was significant (t(12) = 3.18, p < 0.01)."
Partial Correlation
 In its squared form, it is the percent of variance in the dependent variable uniquely attributable to the given independent variable when the other variables in the equation are controlled.
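A minimal sketch of the idea, assuming Python with statsmodels and simulated data: partial the control variable out of both y and x1 by regression, then correlate the residuals; the squared result is the partial r2.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 150
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)
y = 0.5 * x1 + 0.7 * x2 + rng.normal(size=n)

Z = sm.add_constant(x2)
resid_y = sm.OLS(y, Z).fit().resid    # y with x2 partialled out
resid_x1 = sm.OLS(x1, Z).fit().resid  # x1 with x2 partialled out

r_partial = np.corrcoef(resid_y, resid_x1)[0, 1]
print(r_partial, r_partial ** 2)  # squared form: unique variance share
```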
Stepwise Regression
 y = ß0 + ß1x1 + ß2x2 + ß3x3 + ß4x4 + ß5x5 + ß6x6 + ß7x7 + ß8x8 + ß9x9 + ß10x10 + ß11x11 + ß12x12 + ß13x13 + ß14x14 + ε
 Choose a subset of the independent variables which "best" explains the dependent variable.
Stepwise Regression
 1) Forward Selection
 Start by choosing the independent variable which explains the most variation in the dependent variable.
 Choose a second variable which explains the most residual variation, and then recalculate the regression coefficients.
 Continue until no remaining variables "significantly" explain residual variation.
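A minimal, illustrative sketch of forward selection under these rules, assuming Python with statsmodels (the alpha = .05 stopping rule and the simulated data are hypothetical, not a full stepwise implementation):

```python
import numpy as np
import statsmodels.api as sm

def forward_select(y, X, alpha=0.05):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {}
        for j in remaining:
            cols = selected + [j]
            fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
            pvals[j] = fit.pvalues[-1]  # p-value of the newly added predictor
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:        # stop when no addition is "significant"
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
y = 1.2 * X[:, 0] - 0.8 * X[:, 3] + rng.normal(size=200)
print(forward_select(y, X))  # expected to pick columns 0 and 3
```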
Stepwise Regression
 2) Backward Selection
 Start with all the variables in the model, and drop the least "significant", one at a time, until you are left with only "significant" variables.

 3) Mixture of the two
 Perform a forward selection, but drop variables which are no longer "significant" after the introduction of new variables.
Hierarchical Regression
 The researcher determines the order of entry of the variables.
 F-tests are used to compute the significance of each added variable (or set of variables) to the explanation reflected in R-square.
 An alternative to comparing betas for purposes of assessing the importance of the independent variables.
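A minimal sketch of one hierarchical step, assuming Python with statsmodels: a first block (here, hypothetical age and income variables) is entered, then the F test of the R2 change assesses whether adding a second block (anxiety) significantly improves the explanation.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 120
age = rng.normal(size=n)
income = rng.normal(size=n)
anxiety = rng.normal(size=n)
depression = 0.3 * age - 0.2 * income + 0.6 * anxiety + rng.normal(size=n)

step1 = sm.OLS(depression, sm.add_constant(np.column_stack([age, income]))).fit()
step2 = sm.OLS(depression, sm.add_constant(np.column_stack([age, income, anxiety]))).fit()

# compare_f_test returns the F statistic, p-value, and df for the R2 increment
f_change, p_change, df_diff = step2.compare_f_test(step1)
print(step2.rsquared - step1.rsquared, f_change, p_change)
```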
Categorical Regression
 Used when there is a combination of nominal, ordinal, and interval-level independent variables.
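Categorical regression in the optimal-scaling sense (e.g., SPSS CATREG) rescales the categories themselves; a simpler, commonly used alternative is to dummy-code a nominal predictor so it can enter an ordinary regression alongside interval-level predictors. A minimal sketch with hypothetical variable names and simulated data, assuming Python with pandas and statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "gender": rng.choice(["male", "female"], size=100),  # nominal predictor
    "age": rng.normal(40, 10, size=100),                 # interval predictor
})
df["score"] = 3.0 * (df["gender"] == "female") + 0.1 * df["age"] + rng.normal(size=100)

# dummy-code the nominal variable; "female" becomes the reference category
X = pd.get_dummies(df[["gender", "age"]], drop_first=True, dtype=float)
model = sm.OLS(df["score"], sm.add_constant(X)).fit()
print(model.params)  # gender_male coefficient is the contrast with the female reference
```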
