# REGRESSION ANALYSIS

-P H SHAMEER
• An introduction to regression model
• Performing it on SPSS
INTRODUCTION
• What is a regression model?
– An explanatory method
– The forecast is expressed as a function of a certain number of variables that influence its outcome
2 types of variables
1. DEPENDENT
-which we want to forecast
2. INDEPENDENT
-or predictor variables
• E.g.:
• Predict how much an individual enjoys his/her job
• Dependent variable: job satisfaction
• Independent variables: salary, academic qualification, age, sex, no. of years, socio-economic status…
assumptions
1. LINEAR RELATIONSHIP exists
2. HOMOSCEDASTICITY exists
3. Residuals are INDEPENDENT of one
another
4. MULTICOLLINEARITY doesn’t exist
Analysis for Linearity
[Figure: plots of Y against x; one panel labeled "Not Linear", the other "Linear"]
Residual Analysis for Homoscedasticity
[Figure: plots of Y against x with the corresponding residuals against x; one panel labeled "Non-constant variance", the other "Constant variance"]
SCATTER PLOTS
• Help to visualize graphically the relationship between pairs of variables
Regression Equation

$$y' = a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i$$
where
a is the y-intercept
and
b1, b2, …, bi are the regression coefficients
How can a and b be calculated?
• Method of least squares
This method chooses the values so that the sum of squared deviations (errors) between the observed and predicted y is minimized, hence the name least squares
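For a single predictor x:

$$b = \frac{\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}}{\dfrac{\sum x^2}{n} - \bar{x}^2}$$

$$a = \bar{y} - b\,\bar{x}$$

where $\bar{y} = \sum y / n$,
$\bar{x} = \sum x / n$, and
n is the number of observations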
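The least-squares formulas above can be sketched in a few lines of Python. This is a minimal illustration for a single predictor; the function name `fit_simple_regression` and the data are hypothetical and not taken from the slides.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) for y' = a + b*x using the mean-based formulas."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Numerator: mean of x*y minus the product of the two means
    numerator = sum(x * y for x, y in zip(xs, ys)) / n - x_mean * y_mean
    # Denominator: mean of x squared minus the squared mean of x
    denominator = sum(x * x for x in xs) / n - x_mean ** 2
    b = numerator / denominator
    a = y_mean - b * x_mean
    return a, b


# Hypothetical data lying exactly on the line y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_simple_regression(xs, ys)
print(a, b)  # 1.0 2.0
```

Because the example data lie exactly on a line, the fit recovers the intercept and slope exactly; real data would leave non-zero residuals.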
forecasting
• Once the relationship is determined, it can be used to make any number of forecasts simply by inserting the values of the x's
• $y = a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i$

• Caution: the basic relationship should be assessed periodically
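Forecasting with a fitted equation amounts to plugging values into it. A minimal sketch, assuming the intercept and coefficients have already been estimated; the function name `forecast` and the numbers are hypothetical.

```python
def forecast(a, coefficients, values):
    """Return y = a + b1*x1 + b2*x2 + ... + bi*xi."""
    if len(coefficients) != len(values):
        raise ValueError("need exactly one value per coefficient")
    return a + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical fitted model: a = 1.0, b1 = 2.0, b2 = 0.5
y = forecast(1.0, [2.0, 0.5], [3.0, 4.0])
print(y)  # 9.0
```

The same function can be called repeatedly with new x values, which is why the fitted relationship should be re-checked from time to time as new data arrive.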
terminology
b - standardized regression coefficient (beta):
– Measure of how strongly each predictor variable influences the dependent variable
– E.g.: if b = 2.5, a change of one standard deviation in the predictor is associated with a change of 2.5 standard deviations in the dependent variable, holding the other predictors constant
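A standardized coefficient can be obtained from an ordinary (unstandardized) slope by rescaling it with the standard deviations of the predictor and the dependent variable. This relation is standard but is not stated in the slides; the function name and data below are hypothetical.

```python
import statistics


def standardized_coefficient(b, xs, ys):
    """Rescale an unstandardized slope b into standard-deviation units."""
    # beta = b * (standard deviation of x) / (standard deviation of y)
    return b * statistics.stdev(xs) / statistics.stdev(ys)


# Hypothetical data on the line y = 1 + 2x, so the unstandardized slope is 2
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
beta = standardized_coefficient(2.0, xs, ys)
print(round(beta, 6))  # 1.0
```

With a single predictor and a perfect linear fit the standardized coefficient equals 1, which matches the correlation between x and y.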
terminology
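R
– Measure of correlation between the observed and predicted values of the dependent variable
– The correlation r between two variables x and y ranges from −1 to 1 (multiple R itself lies between 0 and 1)
– For paired observations (xi, yi):

$$r = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}}$$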
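The correlation formula above translates directly into code. A minimal sketch using only the sums that appear in the formula; the function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Correlation between paired observations, from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: a perfectly increasing and a perfectly decreasing relationship
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])
r_down = pearson_r(xs, [10, 8, 6, 4, 2])
print(round(r_up, 6), round(r_down, 6))  # 1.0 -1.0
```

The function divides by zero if either variable is constant, since the correlation is undefined in that case.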
Scatter Plots of Data with Various Correlation Coefficients
[Figure: six scatter plots of Y against X, with r = −1, r = −0.6, r = 0, r = +1, r = +0.3 and r = 0]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
terminology
R²
– Proportion of the variation in Y accounted for by the set of predictors
– Measure of how well the dependent variable can be forecast by knowing the independent variables
– When the model is applied to data outside the sample, R² tends to overestimate its success
terminology
Adjusted R²
– The adjustment takes into account the size of the sample and the number of predictors
– Gives the most useful measure of the success of our model (goodness of fit)
– R² ranges from 0 to 1
– If adjusted R² = 0.75, the model accounts for about 75% of the variation in the dependent variable
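Both measures can be computed from observed and predicted values. A minimal sketch, assuming the usual definitions R² = 1 − SS_residual / SS_total and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where k is the number of predictors; these formulas are standard but are not written out in the slides, and all names and numbers below are hypothetical.

```python
def r_squared(observed, predicted):
    """Proportion of variation in the observed values explained by the model."""
    mean_y = sum(observed) / len(observed)
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_total = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_residual / ss_total


def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for sample size n and number of predictors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical observed and predicted values
r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])
print(round(r2, 6))  # 0.8

# Hypothetical case: R^2 = 0.75 from 20 observations and 3 predictors
adj = adjusted_r_squared(0.75, n=20, k=3)
print(round(adj, 6))  # 0.703125
```

The adjusted value is lower than the raw R² here, and the gap widens as predictors are added or the sample shrinks.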
Is each X contributing to the prediction
of Y?
• Test whether each regression coefficient is significantly different from zero, given its standard error
– t-test for each regression coefficient
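The t statistic for a coefficient is the coefficient divided by its standard error. The sketch below covers only the single-predictor case, where the standard error of the slope is the square root of (SSE / (n − 2)) / Sxx; the function name and data are hypothetical, and the result would still need to be compared with a t distribution on n − 2 degrees of freedom.

```python
import math


def slope_t_statistic(xs, ys):
    """Return (b, standard error of b, t) for a one-predictor regression."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    # Sum of squared residuals around the fitted line
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Residual variance uses n - 2 degrees of freedom (two estimated parameters)
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return b, se_b, b / se_b  # fails if the fit is perfect (se_b == 0)


# Hypothetical data
b, se_b, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 3), round(se_b, 3), round(t, 3))  # 0.6 0.283 2.121
```

In a multiple regression the standard errors come from the full set of predictors, which is what SPSS reports in its coefficients table.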

Performing regression in SPSS
E.g.: importance of several psycholinguistic variables for spelling performance
variables
Independent:
standardized spelling score (spellsc), chronological age (age), reading age (readage), standardized reading score (standsc)
Dependent variable:
percentage correct spelling (spelperc)
Performing regression in SPSS

• SPSS = Statistical Package for the Social Sciences
Enter the data
Analyze > Regression > Linear
– A dialogue box appears
Now enter the dependent and independent variables
Selection methods (based on the relative contribution of the independent variables):
1. Simultaneous (Enter) method
2. Hierarchical method
3. Statistical methods
a. Forward
b. Backward
c. Stepwise
d. Remove
Now click the Statistics button
Now click ‘Continue’, then ‘OK’

Output:
• Here reading age is not a significant
predictor

result:
percentage correct spelling=
-232+.406*chronological age
+.394*standardized reading score
+.786*standardized spelling score
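The fitted equation above can be wrapped in a small function for making forecasts. The coefficients are copied from the result as shown; the function name and the input values are hypothetical, and the slides do not state the units of the predictors, so the example numbers carry no real-world meaning.

```python
def predict_spelperc(age, standsc, spellsc):
    """Predicted percentage correct spelling from the fitted equation."""
    # Reading age (readage) is omitted: it was not a significant predictor
    return -232 + 0.406 * age + 0.394 * standsc + 0.786 * spellsc


# Hypothetical input values, chosen only to show the call
predicted = predict_spelperc(age=100, standsc=100, spellsc=100)
print(round(predicted, 1))
```

Forecasts from such an equation are only meaningful for inputs within the range of the data the model was fitted on.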
THANK YOU