
Subject: PSYCHOLOGY

Paper No. and Title: Paper No. 2: Quantitative Methods

Module No. and Title: Module No. 24: Regression

Module Tag: PSY_P2_M24

TABLE OF CONTENTS

1. Learning Outcomes
2. Introduction
3. Regression
3.1 History of Regression
3.2 Regression Model
3.3 Factors Affecting Regression
3.4 Assumptions of Regression
4. Practical Applications of Regression
4.1 Applications
4.2 Using SPSS for Regression
4.3 SPSS 20 for Regression Analysis
5. Summary


1. Learning Outcomes
After studying this module, you shall be able to:

 Gain a preliminary understanding of regression
 Make predictions of one variable from another
 Comprehend the applicability of regression
 Use SPSS 20 for regression analysis

2. Introduction
Suppose that, in a group of 40 applicants to an organization, one wants to estimate which of them would work efficiently on a particular project, knowing only their past academic record. The best possible guess is that, since the job requires average academic performance, an average scorer would be the best possible choice. So, in that group of 40 applicants, one would take the mean academic percentage, and those applicants who lie close to this mean can be predicted to perform well on the job.

One can understand that if any two variables are statistically related, i.e. if they share a fair degree of covariance or have a high positive or negative correlation, then a measurement on one variable can be used to predict the likely score on the other variable with a fair amount of success.

A researcher may know, from data collected on a sample, that intelligence scores are highly positively correlated with academic achievement. Using this information and the concept of regression, if the researcher has a measure of only one of the variables, either intelligence or academic achievement, then he or she can make a prediction about the other variable with a fair amount of accuracy. So, the concept of regression is a step forward in the direction of studying the relationship between variables statistically and using it to predict one variable from the other.

3. Regression
3.1 History of Regression

The term regression was first used by Francis Galton with reference to the inheritance of stature. He found that the children of tall parents tended to be shorter than their parents, while the children of short parents tended to be taller than their parents. Thus, the heights of the offspring tended to move towards the mean height of the general population. Galton called this tendency to move towards the mean value the principle of regression, and the line describing the relationship between the heights of parents and offspring was called the regression line. The predictor variable here is the parents' height and the outcome variable is the child's height. This prediction of one variable from the other is the concept of regression.

Similar to correlation, regression is used to analyze the relationship between two continuous
variables. It is also better suited than correlation for studying functional dependencies between factors, that is, situations where X partially determines the level of Y. For instance, as age increases, blood pressure increases, but the length of a person's arm has no effect on the length of the leg. Regression is also better suited than correlation for studying samples in which the investigator fixes the distribution of X, the predictor variable.

For example, suppose the independent variable is the percentage of children receiving reduced-fee school lunches in a particular neighborhood (a proxy for neighborhood socio-economic status) and the dependent variable is the percentage of bicycle riders wearing helmets. The researcher finds a strong negative correlation of -0.85. These data are useful if the researcher wants to predict helmet-wearing behavior on the basis of the data obtained on socio-economic status. A scatter-plot of the data shows a clear downward trend, and a straight line of best fit can be fitted to the data using the least squares criterion (readers may refer to the module on correlation to read more about this). The line of best fit enables the statistician to develop the regression model.

3.2 Regression Model

A line of best fit is a straight line that runs through the data in such a manner that the sum of the squared deviations of the data points from the line is a minimum. Let us understand this concept.

A line is identified by its slope, the angle of the line describing the change in Y per unit change in X, and its intercept, the point at which the line crosses the Y axis. Regression describes the relation between X and Y with just such a line.

Hence, the regression model is represented by:

Ŷ = a + bX

where,

Ŷ = predicted value of Y

a = intercept of the best-fitting line

b = slope of the line
Now, identifying the best line for the data becomes a question. Had all the data points fallen on a single line, identifying the slope and intercept would have been easy. But since statistical data show random scatter, identifying a good line is a process requiring effort.

The random scatter around the line is measured as the vertical distance of each point from the fitted line, and these distances are referred to as residuals.
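To make this concrete, here is a minimal Python sketch (using a small made-up dataset, not the data from the example above) that fits a least squares line with NumPy and computes the residuals:

import numpy as np

# Small made-up dataset, for illustration only
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([42.0, 38.0, 30.0, 25.0, 21.0])

# Fit a least squares line; for degree 1, polyfit returns the slope (b) and intercept (a)
b, a = np.polyfit(x, y, 1)

# Predicted values on the fitted line and the residuals (vertical distances)
y_hat = a + b * x
residuals = y - y_hat

print("slope b =", round(b, 3), "intercept a =", round(a, 3))
print("residuals:", np.round(residuals, 3))
print("sum of squared residuals:", round(float(np.sum(residuals ** 2)), 3))

The least squares line is, by definition, the line for which the last printed quantity is as small as possible for the given data.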


One’s aim is now to identify the line that minimises the sum of the squared residuals; this is called the least squares line. The slope of the least squares line, represented by b, is given as

b = SSXY / SSXX

where,

SSXY = sum of the cross products for variables X and Y

SSXX = sum of the squares for variable X

Hence, in the above example, SSXY = -4231.1333 and SSXX = 7855.67. Thus, b = -4231.1333/7855.67 = -0.539.

The intercept of the least squares line is given by the equation

a = Ȳ - bX̄

where,

Ȳ = average value of Y

b = slope

X̄ = average value of X

Hence, in the above example, Ȳ = 30.8833 and X̄ = 30.8333. Thus, a = 30.8833 - (-0.539)(30.8333) = 47.49, and the regression model becomes Ŷ = 47.49 + (-0.54)X.
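The same arithmetic can be checked with a few lines of Python, using only the summary values quoted above:

# Summary statistics quoted in the example above
ss_xy = -4231.1333    # sum of cross products of X and Y
ss_xx = 7855.67       # sum of squares for X
y_bar = 30.8833       # mean of Y (helmet users per 100 riders)
x_bar = 30.8333       # mean of X (% of children receiving reduced-fee lunches)

b = ss_xy / ss_xx       # slope of the least squares line
a = y_bar - b * x_bar   # intercept

print("b =", round(b, 3))   # about -0.539
print("a =", round(a, 2))   # about 47.49
print("model: y_hat = {:.2f} + ({:.2f})x".format(a, b))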

Now let us interpret the model.


The slope in the regression model is the average change in Y per unit change in X. Thus, the slope of -0.54 predicts 0.54 fewer helmet users per 100 bicycle riders for each additional percentage point of children receiving reduced-fee meals.

The regression model can also be used to predict the value of Y at a given level of X. For example, a neighborhood in which half the children receive reduced-fee lunches (X = 50) has an expected helmet use rate (per 100 riders) of 47.49 + (-0.54)(50) = 20.5.
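Continuing the Python sketch above, the same prediction can be computed directly from the fitted coefficients:

# Fitted intercept and slope from the example above
a, b = 47.49, -0.54

# Neighborhood in which half the children receive reduced-fee lunches
x_new = 50
y_hat = a + b * x_new

print("predicted helmet use per 100 riders:", round(y_hat, 1))   # about 20.5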

3.3 Factors Affecting Regression

Number of cases: when doing regression, the cases-to-independent-variable ratio should ideally be 20:1, that is, 20 cases for every independent variable in the model. The lowest acceptable ratio is 5:1, that is, 5 cases for every independent variable in the model.


Accuracy of data: one should check the accuracy of data entry to ensure that all the values for each variable are valid.

Missing data: one should look for missing data. If a variable has many missing values, one should not include it in the analyses; if only a few cases have missing values, one may delete those cases; or, if the variable is important, one may substitute the mean value of that variable for the missing entries.

Outliers: one should check the data for outliers, that is, extreme values on a particular variable that lie at least 3 standard deviations above or below the mean. One may delete these cases if they are not part of the same population, or retain them but reduce how extreme they are, that is, recode the value.
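A minimal Python sketch of this 3-standard-deviation screening rule is shown below; the data values are made up purely for illustration:

import numpy as np

# Hypothetical scores; the last value is deliberately extreme
scores = np.array([24, 27, 25, 30, 28, 26, 29, 23, 27, 26,
                   28, 25, 29, 24, 95], dtype=float)

mean = scores.mean()
sd = scores.std(ddof=1)   # sample standard deviation

# Flag values more than 3 standard deviations from the mean
z = (scores - mean) / sd
outliers = scores[np.abs(z) > 3]

print("mean =", round(mean, 2), "sd =", round(sd, 2))
print("outliers:", outliers)   # only the extreme value 95 is flagged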

3.4 Assumptions of Regression

Normality: one should check that the data are normally distributed by constructing a histogram or a normal probability plot.

Linearity: one assumes linearity, that is, a single straight-line relationship between the independent and dependent variables, as regression analysis tests only for linear relationships. Any non-linear relationship gets ignored.

Homoscedasticity: one assumes homoscedasticity, that is, the residuals are approximately equal in spread for all predicted dependent variable scores; in other words, the variability of the dependent variable scores is the same at all values of the independent variables.
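Both the normality and homoscedasticity checks can be carried out graphically. Here is a minimal Python sketch, assuming the residuals and predicted values have already been obtained from a fitted model (random placeholder values are used here):

import numpy as np
import matplotlib.pyplot as plt

# Placeholder residuals and predicted values; in practice these come from a fitted model
rng = np.random.default_rng(0)
predicted = rng.uniform(10, 40, size=100)
residuals = rng.normal(0, 2, size=100)

# Normality check: the histogram of the residuals should look roughly bell-shaped
plt.figure()
plt.hist(residuals, bins=15)
plt.title("Histogram of residuals")

# Homoscedasticity check: residuals plotted against predicted values should form
# a band of roughly constant width around zero, with no funnel shape
plt.figure()
plt.scatter(predicted, residuals)
plt.axhline(0)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. predicted values")

plt.show()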


Multicollinearity and Singularity: multicollinearity occurs when the independent variables are very highly correlated (0.90 or greater). Singularity occurs when independent variables are perfectly correlated, that is, one independent variable is a combination of one or more of the other independent variables.
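A simple way to screen for multicollinearity is to inspect the correlations among the predictors. Here is a minimal sketch with pandas, using hypothetical predictor columns x1, x2 and x3:

import numpy as np
import pandas as pd

# Hypothetical predictors; x3 is built largely from x1, so the two are highly correlated
rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = 0.95 * x1 + rng.normal(scale=0.1, size=200)

predictors = pd.DataFrame({"x1": x1, "x2": x2, "x3": x3})

# Correlations among predictors; values of 0.90 or above signal multicollinearity
print(predictors.corr().round(2))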

4. Practical Applications of Regression Analysis

4.1 Applications of Regression

Regression methods can be employed:

 To seek some sort of descriptive relationship between a set of measured variables, for example a sociologist wanting to establish a relationship between the occupational status of an individual and the educational level of that person as well as of his/her parents
 To provide evidence for a theory, for instance by estimating coefficients for an established model relating a particular plant's weight to the amount of water it receives, the nutrients available in the soil and the sunlight it is exposed to
 To predict some response variable at a certain level of other input variables, which plays a role in planning, monitoring, altering or evaluating a process
 In demand management and data analysis in the field of business
 In social, psychological or any other field of research, such as agriculture, to predict the yield of a crop, for instance by studying the role of seed quality, soil fertility, temperature, rainfall, etc.

Focus of analysis: the purpose of carrying out multiple regression analysis in quantitative psychological research is to analyze the extent to which two or more independent variables relate to a dependent variable.

Variables involved: there may be two or more independent variables, and these are continuously scaled. The dependent variable is also continuously scaled, i.e. measured on an interval or ratio scale.

Relationship of the participants' scores across the groups being compared: to be suitable for multiple regression analysis, the participants should have scores on all the variables; in other words, the scores are dependent upon each other.

Assumptions underlying regression analysis: the assumptions of normality, homoscedasticity, linearity, independence of errors and absence of multicollinearity are applicable to multiple regression analysis.

4.2 Using SPSS for Regression Analysis

Pearson's product-moment correlation is a popular statistical method and provides the basic statistic underlying regression analysis. There are three types of regression analyses:

 Standard/simultaneous regression analysis
 Sequential/hierarchical regression analysis
 Statistical/stepwise regression analysis


The standard or simple regression analysis is also referred to as simultaneous regression analysis. All predictor variables are entered into the regression equation at once, and each predictor variable is evaluated in terms of what it adds uniquely to the prediction of the criterion variable. Simple regression analysis is used in exploratory and hypothesis-building regression models, although it is a much less used approach in quantitative research.

4.3 Regression Analysis SPSS (IBM SPSS 20) Commands

(Source: Martin, W. E., & Bridgmon, K. D. (2012). Quantitative and Statistical Research Methods: From Hypothesis to Results. San Francisco: Jossey-Bass, a Wiley imprint.)

Let us take an example:

A researcher wants to study whether doctoral students' lower interest in scientist activities predicts higher dissertation stress more strongly than their interest in practitioner activities does. The scientist and practitioner scales of the SPI are the predictor variables, and higher scores reflect higher interest. The dependent variable in the study is the Dissertation Stress Inventory (DSI), and higher scores translate to higher dissertation stress as perceived by the sample of doctoral students. Bivariate correlation coefficients, the multiple correlation coefficient and a sequential regression analysis will be conducted at α = .05.

1. Open the SPSS data file called MRA-data and click on Analyze > Regression > Linear.
2. Move DSI into the Dependent: box.
3. Under Independent(s):, move SPIScient across, then click on the Next button to the upper right. SPIScient disappears and the program stores the first predictor variable. Move SPIPract across and click on the Next button. You have now set the model you want to test.
4. Click on the Statistics button, check Estimates and Confidence intervals, and type 95 beside Level(%):. Check Model fit, R squared change, Descriptives, Part and partial correlations and Collinearity diagnostics, and click Continue.
5. Click on the Plots button and move *ZRESID to the Y: box and *ZPRED to the X: box. Under Standardized Residual Plots, check Histogram and Normal probability plot and click Continue.
6. Click on the Save button, check Mahalanobis under Distances, then click Continue and click OK.
7. Save the results.
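For readers working without SPSS, a comparable ordinary least squares model can be fitted in Python with statsmodels. The sketch below enters both predictors at once rather than reproducing the sequential blocks, and it assumes a CSV file named MRA-data.csv with columns SPIScient, SPIPract and DSI (the file name and column names are assumptions based on the example above):

import pandas as pd
import statsmodels.api as sm

# Assumed file and column names, based on the SPSS example above
data = pd.read_csv("MRA-data.csv")

# Both predictors entered at once (standard / simultaneous regression)
X = sm.add_constant(data[["SPIScient", "SPIPract"]])
y = data["DSI"]

model = sm.OLS(y, X).fit()

# Coefficients, 95% confidence intervals, R squared and the overall F test
print(model.summary())

The R squared change reported in a sequential analysis could be approximated by first fitting a model with SPIScient alone and comparing its R squared with that of the two-predictor model above.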

5. Summary
 The concept of regression is a step forward in the direction of studying the relationship between variables statistically and using it to predict one variable from the other.

 Similar to correlation, regression is used to analyze the relationship between two continuous variables. It is also better suited for studying functional dependencies between factors, that is, where X partially determines the level of Y.
 A straight line of best fit can be fitted to the data using the least squares criterion. The line of best fit enables the statistician to develop the regression model.
 A line is identified by its slope, the change in Y per unit change in X, and its intercept, where the line crosses the Y axis. Regression describes the relation between X and Y with just such a line.
 Factors affecting regression: number of cases, accuracy of data, missing data and outliers.

 Assumptions of regression: normality, linearity, homoscedasticity, and absence of multicollinearity and singularity (i.e. how the independent variables are related to one another).
 The steps involved in using SPSS 20 for regression analysis are also discussed in the text.
