Subject PSYCHOLOGY
TABLE OF CONTENTS
1. Learning Outcomes
2. Introduction
2.1 Need for multiple regression
2.2 Purpose of Multiple regression
3. Multiple Regression
3.1 What is multiple regression
3.2 Model for multiple regression
3.3 Assumptions of multiple regression
4. Applications of multiple regression analysis
4.1 Numerical example for Multiple Regression Analysis
4.2 Practical Applications of Multiple Regression Analysis
4.3 Using SPSS for regression analysis
4.4 Multiple Regression Analysis SPSS (IBM SPSS 20) commands
5. Summary
1. Learning Outcomes
After studying this module, you shall be able to
2. Introduction
The fate of a movie is judged by the money it earns, which depends on the number of people it
attracts. A movie becomes a hit and draws the masses when all the ingredients of the film fall into
place. Yet it is quite possible that a single component, such as the music, plays an important role
in drawing the audience to the theatres even if the story is not that interesting. Equally, a good
story can make a small-budget film without big stars a huge success. Thus, it would be interesting
to see whether and how measures like star cast, story, music, dialogues and publicity relate to the
success of the film.
So, out of the many variables that influence the fate of a movie, it would be interesting to know
which ones contribute significantly. This would perhaps be useful to other film makers too, as a
guide to preparing the recipe for a successful film in future.
Multiple regression is a term first used by Pearson in 1908. His aim was to understand more about
the relationships between several independent or predictor variables, like the components of the
film discussed in the example above, and a dependent or criterion variable, that is, the audience
response to the released film. The social and natural sciences use multiple regression procedures
widely because they allow the researcher to ask ‘what is the best predictor of ….’. For instance, an
educational researcher may want to know what predicts success in high school, or a clinical
psychologist may want to know what predicts a good quality of life for an individual.
3. Multiple Regression
The multiple regression method is used to estimate the effects of two or more independent variables
on a dependent variable. It enables one to predict and weigh the relationship between two or more
explanatory (independent) variables and an explained (dependent) variable.
For instance, the level of achievement in an examination may be affected not just by study time,
quality of teaching and amount of practice but also by the level of intelligence possessed by the
student. If one wishes to predict how many marks a student would obtain when he/she studies for 30
hours per week and has an intelligence score of 110, so that both predictors are taken into account
together, the multiple regression method is required.
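The prediction described above can be sketched in a few lines of Python. The intercept and the two slopes below are hypothetical values chosen only to illustrate the arithmetic; they are not estimates from any data set in this module.

```python
# Hypothetical regression coefficients (assumed for illustration only).
INTERCEPT = 5.0            # predicted marks when both predictors are 0
SLOPE_STUDY_HOURS = 0.8    # change in marks per extra study hour per week
SLOPE_INTELLIGENCE = 0.3   # change in marks per extra intelligence-score point


def predict_marks(study_hours, intelligence):
    """Return predicted marks from a two-predictor regression equation."""
    return (INTERCEPT
            + SLOPE_STUDY_HOURS * study_hours
            + SLOPE_INTELLIGENCE * intelligence)


# The student from the text: 30 study hours per week, intelligence score 110.
predicted = predict_marks(study_hours=30, intelligence=110)
print(predicted)
```

In practice the coefficients would be estimated from data by the regression procedure rather than typed in by hand.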
Thus, multiple regression helps the researcher to see the predicted effect of a particular
independent variable on a dependent variable when other independent variables are also present. It
takes in a range of variables and enables one to calculate the relative weightings of the
independent variables on a dependent variable. One must be cautious, however, because the
independent variables may interact with each other and may be inter-correlated.
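One simple way to check whether two predictors are inter-correlated is to compute the Pearson correlation between them before running the regression. The sketch below uses made-up scores for six students; the variable names and values are assumptions for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up predictor scores for six students (illustration only).
study_hours = [10, 15, 20, 25, 30, 35]
practice_hours = [4, 7, 9, 13, 14, 18]

r = pearson_r(study_hours, practice_hours)
print(round(r, 3))
```

A correlation close to +1 or -1 between two predictors suggests that their separate weightings will be hard to tell apart.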
A reading of the modules on regression and simple regression analysis would make the reader
realize the importance of “the line of best fit” in regression and prediction. The relationship
between variables is best described by fitting a straight line through the data points with minimal
deviations. The line is chosen by the least squares criterion, so that the residuals are as small as
possible.
The straight line becomes our model, which is used to predict the values of Y (criterion variable)
from values of X (predictor variable). The fit of the model is assessed by looking at the deviations
between the values predicted by the model and the actual data collected. The simplest model
available is the mean model, which uses the mean of Y as the prediction for every case, because on
average the mean is a fairly good guess of an outcome. The deviations are the vertical distances
between what the model predicted and each data point that was actually observed. These differences
are called residuals. The data points fall both above the model, which shows that the model
underestimates their value, and below the model, which shows that the model overestimates their
value, resulting in both positive and negative differences. Since these differences cancel each
other when summed, they are squared to overcome this problem. The sum of the squared differences
gives an assessment of how well a line fits the data. Multiple regression is an extension of the
linear regression model studied earlier, except that there are now several predictors. However, the
basic equation remains the same.
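The idea of residuals and squared differences can be illustrated with a small sketch. The five data points are made up for illustration; the code fits a least-squares line with one predictor and compares its sum of squared residuals with that of the mean model.

```python
# Made-up data (illustration only): x = predictor, y = criterion.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept for a single predictor.
slope = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
intercept = mean_y - slope * mean_x

# Residual = observed value minus value predicted by the line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Positive and negative residuals cancel when summed ...
sum_residuals = sum(residuals)
# ... so they are squared before summing.
ss_line = sum(e ** 2 for e in residuals)
# The mean model predicts mean_y for every case.
ss_mean = sum((yi - mean_y) ** 2 for yi in y)

print(round(sum_residuals, 6), round(ss_line, 6), round(ss_mean, 6))
```

The fitted line leaves a smaller sum of squared residuals than the mean model, which is the sense in which it fits the data better.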
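But here, a coefficient is added for every extra predictor included. Hence, each predictor variable
has its own coefficient, and the outcome variable is predicted from the sum of all the predictors
multiplied by their respective coefficients, plus an intercept and a residual term. Hence, the
equation for multiple regression analysis with k predictors turns out to be:
Yi = β0 + β1 xi1 + β2 xi2 + … + βk xik + εi
where,
Yi = value of the outcome variable for case i
β0 = intercept (constant)
β1, …, βk = coefficients of the k predictor variables
xi1, …, xik = values of the k predictor variables for case i
εi = residual (error) term for case i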
Hence, we are trying to find the linear combination of predictors that correlates maximally
with the outcome variable.
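Finding this linear combination amounts to choosing the coefficients that minimise the sum of squared residuals. A minimal sketch in plain Python follows, solving the least-squares normal equations for any number of predictors. The function names and the data are assumptions made for illustration; statistical packages such as SPSS carry out the same estimation internally, with more numerically robust methods.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_regression(predictors, outcome):
    """Return least-squares coefficients [b0, b1, ..., bk].

    predictors: one row of k predictor values per case.
    outcome: one observed Y value per case.
    """
    design = [[1.0] + list(row) for row in predictors]  # leading 1 = intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, outcome)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up data (illustration only): (study hours per week, intelligence score).
predictors = [(10, 95), (15, 100), (20, 105), (25, 98),
              (30, 110), (35, 102), (12, 115), (28, 90)]
marks = [40, 48, 57, 58, 71, 70, 52, 58]

coefficients = fit_multiple_regression(predictors, marks)
print([round(c, 3) for c in coefficients])
```

The first value returned is the intercept; the remaining values are the coefficients of the predictors in the order in which they were supplied.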
The residuals, that is, the differences between the predicted and the observed values, are
normally distributed.
Some mammals burrow into the ground to live. Since the quality of the air in the burrows is not as
good as that of the air above the ground, some mammals change the way they breathe so as to sustain
themselves in the poor air quality conditions under the ground.
Some researchers wanted to explore whether nestling bank swallows also alter their breathing. The
researchers conducted a randomised experiment on n = 120 nestling bank swallows. They varied the
percentage of oxygen at four levels (13%, 15%, 17% and 19%) and the percentage of carbon dioxide at
five levels (0%, 3%, 4.5%, 6% and 9%). Under each of the resulting 5 × 4 = 20 experimental
conditions, the researchers observed the total volume of air breathed per minute for each of 6
nestling bank swallows.
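The layout of this experiment can be checked with a short sketch that enumerates every combination of the oxygen and carbon dioxide levels given above. The variable names are chosen here for illustration.

```python
from itertools import product

oxygen_levels = [13, 15, 17, 19]             # percent O2, from the text
carbon_dioxide_levels = [0, 3, 4.5, 6, 9]    # percent CO2, from the text
birds_per_condition = 6

# Every (O2, CO2) pair is one experimental condition.
conditions = list(product(oxygen_levels, carbon_dioxide_levels))
total_birds = len(conditions) * birds_per_condition

print(len(conditions), total_birds)
```

The counts agree with the 20 conditions and n = 120 birds reported for the study.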
Response (y): percentage increase in "minute ventilation," (Vent), i.e., total volume of air
breathed per minute.
Potential predictor (x1): percentage of oxygen (O2) in the air the baby birds breathe.
Potential predictor (x2): percentage of carbon dioxide (CO2) in the air the baby birds
breathe.
The scatter plot matrix of the data obtained by the researchers was:
With two predictor variables and one response variable, a three-dimensional scatter plot can be
made. The first-order model with two quantitative predictors can be summarized by the same form of
equation as before:
Yi = β0 + β1 xi1 + β2 xi2 + εi
where,
Yi = percentage of minute ventilation of nestling bank swallow i
xi1 = percentage of oxygen to which nestling bank swallow i is exposed
xi2 = percentage of carbon dioxide to which nestling bank swallow i is exposed
and the independent error terms εi follow a normal distribution with mean 0 and equal
variance σ².
When the data were fed into the computer and the model above was fitted, the output revealed that
25.6% of the variation in minute ventilation is explained by taking into account the percentages of
oxygen and carbon dioxide. The P-values for the t-tests suggested that the slope parameter for
carbon dioxide level (P<0.001) is significantly different from 0, while the slope parameter for
oxygen level (P=0.408) is not. Lastly, the P-value for the analysis of variance F-test (P<0.001)
suggested that the model containing oxygen and carbon dioxide levels is more useful in predicting
minute ventilation than a model that takes neither predictor into account.
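The link between the proportion of variation explained and the overall F-test can be sketched as follows. The sketch assumes that the reported 25.6% is the ordinary (unadjusted) R², which the text does not state; if it is an adjusted value, the F statistic printed in the actual output would differ somewhat. The function name is chosen here for illustration.

```python
def overall_f_from_r_squared(r_squared, n_cases, n_predictors):
    """Return the overall F statistic and its degrees of freedom.

    F = (R^2 / p) / ((1 - R^2) / (n - p - 1)) for p predictors and n cases.
    """
    df_model = n_predictors
    df_error = n_cases - n_predictors - 1
    f_value = (r_squared / df_model) / ((1.0 - r_squared) / df_error)
    return f_value, df_model, df_error


# Values taken from the text: n = 120 birds, 2 predictors, 25.6% explained
# (assumed here to be the unadjusted R-squared).
f_value, df_model, df_error = overall_f_from_r_squared(0.256, 120, 2)
print(round(f_value, 2), df_model, df_error)
```

An F value of this size on 2 and 117 degrees of freedom is consistent with the very small P-value reported for the analysis of variance F-test.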
Focus of analysis: the purpose of carrying out multiple regression analysis in
quantitative psychological research is to analyze the extent to which two or more
independent variables relate to a dependent variable.
Variables involved: there are two or more independent variables, which are
continuously scaled. The dependent variable is also continuously scaled, i.e. measured on
either an interval or a ratio scale.
Relationship of the participants’ scores across the variables: to be suitable
for multiple regression analysis, each participant should have a score on every variable; in
other words, the scores are related to each other through the participant.
PSYCHOLOGY Paper No. 2: Quantitative Methods
Module No. 26: Multiple Regression