-P H SHAMEER

• An introduction to the regression model

• Performing it in SPSS

INTRODUCTION

• What is a regression model?

An explanatory forecasting method

The forecast is expressed as a function of a number of variables that influence its outcome

2 types of variables

1. DEPENDENT - the variable we want to forecast

2. INDEPENDENT - the predictor variables

• E.g.: predicting how much an individual enjoys his/her job

• Dependent variable: job satisfaction

• Independent variables: salary, academic qualification, age, sex, no. of years, socio-economic status….

assumptions

1. LINEAR RELATIONSHIP exists

2. HOMOSCEDASTICITY exists

3. Residuals are INDEPENDENT of one

another

4. MULTICOLLINEARITY doesn’t exist

Analysis for Linearity

[Figure: two panels plotting Y against x, labelled "Not Linear" and "Linear"]

Residual Analysis for Homoscedasticity

[Figure: two panels plotting Y and the residuals against x, labelled "Non-constant variance" and "Constant variance"]

SCATTER PLOTS

• Help to visualize graphically the relationship between pairs of variables

Regression Equation

$$y' = a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i$$

where a is the y-intercept and b_1, b_2, ..., b_i are the regression coefficients

How can a & b be calculated?

• Method of least squares

This method determines the values of a and b in such a way that the sum of squared deviations (errors) is minimized, hence the name least squares

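$$b = \frac{\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}}{\dfrac{\sum x^2}{n} - \bar{x}^2}$$

$$a = \bar{y} - b\,\bar{x}$$

where $\bar{y} = \sum y / n$, $\bar{x} = \sum x / n$, and n is the no. of observations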
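The least squares formulas above can be sketched in code for the case of a single predictor x. This is a minimal illustration, not part of the original slides; the function name `least_squares_fit` and the data are made up for the example.

```python
def least_squares_fit(xs, ys):
    """Return (a, b) for the line y' = a + b*x (single predictor)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # b = (sum(x*y)/n - x_mean*y_mean) / (sum(x^2)/n - x_mean^2)
    numerator = sum(x * y for x, y in zip(xs, ys)) / n - x_mean * y_mean
    denominator = sum(x * x for x in xs) / n - x_mean ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = numerator / denominator
    a = y_mean - b * x_mean  # a = y_mean - b * x_mean
    return a, b


# Made-up data lying exactly on the line y = 1 + 2x
a, b = least_squares_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

With data that lie exactly on a line, the fitted a and b recover the intercept and slope of that line.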

forecasting

• Once the relationship is determined, it can be used to make any no. of forecasts simply by inserting the values of the X's

• $y = a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i$

• The relationship should be reassessed periodically
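As a rough sketch of this step, a forecast is just the intercept plus the sum of each coefficient times its X value. The function name `forecast` and all numbers below are hypothetical, chosen only to show the arithmetic.

```python
def forecast(intercept, coefficients, x_values):
    """Evaluate y = a + b1*x1 + b2*x2 + ... + bi*xi."""
    if len(coefficients) != len(x_values):
        raise ValueError("need exactly one x value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, x_values))


# Hypothetical model: a = 10, b1 = 2, b2 = 0.5
y_hat = forecast(10.0, [2.0, 0.5], [3.0, 4.0])
print(y_hat)  # 10 + 2*3 + 0.5*4 = 18.0
```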

terminology

β (beta) - standardized regression coefficient:

A measure of how strongly each predictor variable influences the dependent variable

E.g.: if β = 2.5, a change of one standard deviation in the predictor will result in a change of 2.5 standard deviations in the forecast variable
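A standardized coefficient can be obtained from an ordinary (unstandardized) slope by rescaling with the standard deviations of the predictor and the dependent variable. The slides do not state this conversion; the sketch below assumes the usual relation, standardized coefficient = b * s_x / s_y, and uses made-up data.

```python
import statistics


def standardized_coefficient(b, xs, ys):
    """Rescale an unstandardized slope b into standard-deviation units."""
    s_x = statistics.stdev(xs)  # sample standard deviation of the predictor
    s_y = statistics.stdev(ys)  # sample standard deviation of the dependent variable
    if s_y == 0:
        raise ValueError("the dependent variable has no variation")
    return b * s_x / s_y


# Made-up data on the line y = 1 + 2x, so the unstandardized slope is 2
beta = standardized_coefficient(2.0, [1, 2, 3, 4], [3, 5, 7, 9])
print(beta)
```

With a single predictor and a perfect linear fit, the standardized coefficient comes out as 1 (up to rounding).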

terminology

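R

A measure of the correlation between the observed and predicted values of the dependent variable

R ranges from -1 to 1

$$R = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}}$$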
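The correlation formula above translates directly into code. This is a minimal sketch with a hypothetical function name `pearson_r` and made-up data.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient computed from the raw sums."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("one of the variables has no variation")
    return numerator / denominator


r_up = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])  # perfectly increasing line
r_down = pearson_r([1, 2, 3], [3, 2, 1])      # perfectly decreasing line
print(r_up, r_down)
```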

Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = -1, r = -.6, r = 0, r = +1, r = +.3, r = 0]

Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall

terminology…..

R²

The proportion of variation in Y accounted for by the set of predictors

A measure of how well the dependent variable can be forecast by knowing the independent variables

When the model is applied to the real world, R² tends to overestimate its success

terminology…

Adjusted R²

The adjustment takes into account the size of the sample and the number of predictors

Gives the most useful measure of the success of our model (goodness of fit)

R² ranges from 0 to 1

If R² = 0.75, the model accounts for 75% of the variation in Y
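The slides do not give the adjustment formula. The sketch below assumes the commonly used one, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the sample size and k the number of predictors. Function names and numbers are made up for illustration.

```python
def r_squared(observed, predicted):
    """Proportion of variation in the observed values accounted for."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n, k):
    """Assumed formula: 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Made-up observed and predicted values
r2 = r_squared([2, 4, 6, 8], [2.5, 3.5, 6.5, 7.5])

# Hypothetical case: R^2 = 0.75 from 50 observations and 3 predictors
adj = adjusted_r_squared(0.75, n=50, k=3)
print(r2, adj)
```

The adjusted value is slightly below the unadjusted 0.75, and the gap widens as the sample gets smaller or the number of predictors grows.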

Is each X contributing to the prediction of Y?

• Test whether each regression coefficient is significantly different from zero, given the coefficient's standard error.
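For a single predictor, this test can be sketched as t = b / SE(b), where SE(b) is the standard error of the slope. The formula SE(b) = sqrt((SSE / (n - 2)) / Sxx) is the standard one for simple regression and is an assumption here, since the slides do not spell it out. The data are made up.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, se_b, t) for the slope of a one-predictor regression."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Sum of squared errors around the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, se_b, b / se_b


b, se_b, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b, se_b, t)
```

The resulting t is compared with the critical value of the t distribution with n - 2 degrees of freedom; SPSS reports this test for every coefficient in its output table.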

Performing regression in SPSS

E.g.: the importance of several psycholinguistic variables for spelling performance

variables

Independent variables:

standardized spelling score (spellsc), chronological age (age), reading age (readage), standardized reading score (standsc)

Dependent variable:

percentage correct spelling (spelperc)

Performing regression in SPSS

• SPSS = Statistical Package for the Social Sciences

Enter the data

Cont..

> Analyze > Regression > Linear

The dialogue box appears

Now enter the dependent and independent variables

Selection methods (for assessing the relative contribution of the independent variables):

1. Simultaneous (Enter) method

2. Hierarchical method

3. Statistical methods

a. Forward

b. Backward

c. Stepwise

d. Remove

Now click the statistics button

Output:

Cont…

• Here reading age is not a significant

predictor

result:

percentage correct spelling =

-232 + .406 * chronological age

+ .394 * standardized reading score

+ .786 * standardized spelling score
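To show how the fitted equation is read, the sketch below wraps it in a function and checks the effect of raising one predictor by one unit. The coefficients are copied from the result above; the function name and the input values are hypothetical and are not taken from the data set.

```python
def predict_spelperc(age, standsc, spellsc):
    """Apply the fitted equation from the result slide."""
    return -232 + 0.406 * age + 0.394 * standsc + 0.786 * spellsc


# Hypothetical inputs, used only to show the effect of a one-unit change
base = predict_spelperc(age=100, standsc=100, spellsc=100)
plus_one = predict_spelperc(age=100, standsc=100, spellsc=101)
print(plus_one - base)  # equals the spellsc coefficient, about 0.786
```

Holding the other predictors fixed, each extra point of standardized spelling score raises the predicted percentage by the size of its coefficient.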

references

Forecasting Methods for Management, by Spyros Makridakis & Steven C. Wheelwright

SPSS for Psychologists, by Nicola Brace, Richard Kemp & Rosemary Snelgar

Research Methods for M.Com, by L.R. Potti

THANK YOU

