

Multiple Regression (Part 1) (DR SEE KIN HAI)

1. Standard Multiple Regression
2. Stepwise Multiple Regression
3. Hierarchical Multiple Regression

1. Multiple regression is an extension of bivariate correlation (correlation between 2 variables).
2. Regression analysis is used when several independent variables (continuous or categorical) are correlated with one another and with the dependent variable (continuous scale). If the dependent variable is not continuous, use Discriminant Function Analysis.
3. Three Regression Models:
(a) Standard/Simultaneous Regression = all independent variables enter at once, to see the relationship between the whole set of predictors and the dependent variable.
(b) Hierarchical Multiple Regression = you determine which independent variables to enter first, based on your theoretical knowledge.
(c) Stepwise Multiple Regression = the number of independent variables entered and the order of entry are determined by statistical criteria. Forward selection = predictors are entered one at a time, and a predictor is accepted if its F ratio exceeds the critical value (FIN) or its probability falls below the critical alpha level (PIN). Backward selection = start with all variables in the equation and delete poor performers whose F value is below the critical value (FOUT). Stepwise uses a combination of the forward and backward procedures.
4. You will practise using the 3 types of regression stated above.

1. Standard (Simultaneous) Multiple Regression Analysis (Practice)

[Diagram labels: Independent Variables, Dependent Variable]

RIPAS hospital uses data from 15 patients below to examine the relationship between the length of stay in hospital in days (STAY, or Academic Results in your case), the age of the patient (AGE) and the number of previous admissions (PREVAD). Using the data shown, answer the research questions below.

1. What contribution do both age of the patient and number of previous admissions make to the prediction of the length of stay in RIPAS? (Use Standard Multiple Regression)
2. Which is the best predictor of length of stay in RIPAS? (Use Stepwise Multiple Regression)
3. Previous research has suggested that number of previous admissions is the salient (most important) predictor of length of stay in RIPAS. Is this hypothesis correct? (Use Hierarchical Multiple Regression)

How to carry out this Standard (Simultaneous) Multiple Regression Analysis?

1. Enter the data as shown below into SPSS 20.
2. Click on [Analyze] then [Regression] then [Linear] to open the dialogue box.

[Diagram labels for your case: Academic Results (Dep Var); Attitude (Ind Var); Motivation (Ind Var); one further (Ind Var) whose label is missing]


Select [Length of Stay] and enter into [Dependent] Box

Select [Age] and [Prevad] and enter into [Independent] box

For [Method] select [Enter]

Select [Statistics] to open [Linear Regression] sub-dialogue Box

Select [Estimates], [R sq change] and [Model fit], then click on [Continue]

In the previous screen, click on [Plots..] to open [Linear Regression Plots] sub-dialogue box

Select [ZRESID] and move it into the [Y] box, then select [ZPRED] and move it into the [X] box

In the [Std Residual Plot] box select [Normal Prob plot], then select [Produce all partial plots] and click [Continue]

Click [Save] from FIG 1 to open the [Linear Regression Save New Variables] sub-dialogue box

In the [Distances] box select [Mahalanobis] then [Continue] and [OK]
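
For readers who want to see what the [Enter] method computes, the following is a minimal sketch of a standard (simultaneous) regression fitted by least squares in plain Python. It is not what SPSS runs internally, and the (age, prevad) pairs and lengths of stay are made-up values, not the RIPAS data; the names `solve` and `fit_ols` are hypothetical helpers introduced only for this illustration.

```python
# Minimal sketch: least squares with an intercept and any number of
# predictors, solved through the normal equations (X'X) b = X'y.
# The data below are MADE UP for illustration; they are not the RIPAS data.

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Return ([a, b1, b2, ...], R square) for y = a + b1*x1 + b2*x2 + ..."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the constant a
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    coef = solve(xtx, xty)
    pred = [sum(c * v for c, v in zip(coef, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical (age, prevad) pairs and lengths of stay in days.
predictors = [(25, 0), (32, 1), (41, 1), (47, 2), (53, 1), (60, 3), (68, 2), (74, 4)]
stay = [3, 5, 6, 9, 8, 12, 11, 15]

coef, r_square = fit_ols(predictors, stay)
print("a, b1, b2 =", [round(c, 3) for c in coef])
print("R square  =", round(r_square, 3))
```

Both predictors enter the equation together, which is the defining feature of the standard method; no predictor is tested for entry or removal.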

Together, the two independent variables explain 85% of the variance in [length of stay]. This is highly significant, as indicated by the F value (34.1) with p = 0.000 (reported as p < 0.001)
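
The link between R square and the overall F value can be checked by hand. The sketch below uses the standard formula F = (R square / k) / ((1 - R square) / (n - k - 1)), with n = 15 patients and k = 2 predictors taken from this example; the function name `f_from_r_square` is a hypothetical helper, and R square is used in its rounded form (0.85), so the result is only approximately the SPSS value.

```python
def f_from_r_square(r_square, n, k):
    """Overall F ratio for a regression with k predictors and n cases."""
    df_model = k
    df_resid = n - k - 1
    f = (r_square / df_model) / ((1.0 - r_square) / df_resid)
    return f, df_model, df_resid


# n = 15 patients, k = 2 predictors (Age and Prevad), R square rounded to 0.85.
f, df1, df2 = f_from_r_square(0.85, n=15, k=2)
print(f"F({df1}, {df2}) = {f:.1f}")
```

This gives an F of about 34 on 2 and 12 degrees of freedom, close to the 34.1 in the SPSS output; the small gap is presumably due to rounding R square to two decimals.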

An examination of the t values indicates that [Age] adds to the predictive power of the equation (p = 0.007), whereas [Prevad] is not significant (p = 0.983)

The new variable [MAH_1] (Mahalanobis distance) is added to your data file. Its values are small, and there are no multivariate outliers among the independent variables, as no value is greater than or equal to the critical chi-square (χ²) value of 13.8 at the p = 0.001 level
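
The critical value of 13.8 can be reproduced directly: with 2 independent variables the chi-square distribution has 2 degrees of freedom, and in that special case the upper-tail probability is exp(-x / 2), so the critical value is -2 ln(alpha). The sketch below shows this together with a squared Mahalanobis distance for two predictors. The helper names and the (age, prevad) pairs are hypothetical and made up for illustration; they are not the RIPAS data or SPSS's own routine.

```python
import math


def chi_square_critical_df2(alpha):
    """Critical chi-square value for 2 degrees of freedom.

    For df = 2 the upper-tail probability is exp(-x / 2), so x = -2 ln(alpha).
    """
    return -2.0 * math.log(alpha)


def mahalanobis_sq(points):
    """Squared Mahalanobis distance of each (x1, x2) pair from the centroid,
    using the sample covariance matrix (divisor n - 1)."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    det = s11 * s22 - s12 ** 2
    out = []
    for x1, x2 in points:
        d1, d2 = x1 - m1, x2 - m2
        # d' S^-1 d written out for a 2 x 2 covariance matrix.
        out.append((s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det)
    return out


# Hypothetical (age, prevad) pairs; NOT the RIPAS data.
points = [(25, 0), (32, 1), (41, 1), (47, 2), (53, 1), (60, 3), (68, 2), (74, 4)]

critical = chi_square_critical_df2(0.001)
distances = mahalanobis_sq(points)
outliers = [p for p, d in zip(points, distances) if d >= critical]

print("critical value:", round(critical, 1))
print("largest distance:", round(max(distances), 2))
print("outliers:", outliers)
```

Any case whose distance reaches the critical value would be flagged as a multivariate outlier, which mirrors the check made on [MAH_1] in the data file.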

The scatterplot of [Residuals] against [Predicted values] does not show any clear relationship. The [Normal plot] of regression standardized residuals for the dependent variable indicates a relatively normal distribution

[Partial plots: [Age] against [Length of stay] is marked "Highly correlated"; [Prevad] against [Length of stay] is marked "No correlation"]

Even when we partial out the other independent variable (Age or Prevad), we can see that [Age] is quite highly correlated with the dependent variable [Length of stay], but [Prevad] is not.

Conclusion for Research Question 1:

We can say that [Age] significantly predicts the [length of stay] of patients in RIPAS. However, the number of previous admissions [Prevad] is not a significant predictor.

2. Stepwise Multiple Regression Analysis

1. Click on [Analyze] then [Regression] then [Linear] to open the dialogue box.
2. Enter [Length of stay] into the [Dependent] box and [Age] and [Prevad] into the [Independent] box.
3. For the [Method] select [Stepwise], then click on [Statistics] to open the [Linear Reg Stat] sub-dialogue box.
4. Select [Estimates], [Model fit] and [R sq change], then [Continue].

To Plot Charts (You will get the same charts as Standard Regression)

1. Click on [Plots..] to open the [Linear Regression Plots] sub-dialogue box.

Interpreting the Output

Here each new step is called a Model. Model 1 uses the most highly correlated predictor [Age], and Model 2 would use [Age] together with the less highly correlated second predictor [Prevad]. Generally you concentrate on the Model that has the highest correlation. The table below shows the predictor with the highest correlation entered first, followed by the second most highly correlated predictor and then any remaining predictors with smaller correlations.

[Prevad] adds no value beyond the R square of 0.850

This table shows that the predictor [Age] is entered in the first step of the stepwise analysis (Model 1), as it has the highest correlation with the dependent variable [Stay], and is significant at p = 0.000 (reported as p < 0.001).

Multiple R = correlation between [Age] and [Stay] = 0.922.
R square = square of the multiple correlation coefficient = 0.85, indicating that 85% of the variance in the criterion is shared with/explained by this predictor [Age].
Adjusted R square = R square adjusted for the size of the sample and the number of predictors in the equation. The effect is to reduce the size of R square from 0.850 to 0.839.
R square change = increase in the proportion of the variance in [Stay] explained by predictor 2, if any (none in this case for [Prevad]).

The regression equation is:

y = a + b1 x1 + b2 x2

where y = estimated length of stay, a = constant, x1 = Age, b1 = regression coefficient for x1, x2 = Prevad, b2 = regression coefficient for x2.

Here the analysis stops, as the second predictor does not contribute a significant proportion of the variance in the dependent variable [Stay], so there is no Model 2.

Interpreting the output
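
The adjustment to R square can be verified with the usual formula, adjusted R square = 1 - (1 - R square)(n - 1) / (n - k - 1). The sketch below applies it to Model 1 with n = 15 patients and k = 1 predictor ([Age]), starting from the reported Multiple R of 0.922; the function name `adjusted_r_square` is a hypothetical helper introduced for illustration.

```python
def adjusted_r_square(r_square, n, k):
    """R square adjusted for sample size n and number of predictors k."""
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - k - 1)


multiple_r = 0.922            # reported Multiple R for Model 1
r_square = multiple_r ** 2    # about 0.850
adjusted = adjusted_r_square(r_square, n=15, k=1)

print("R square          =", round(r_square, 3))
print("Adjusted R square =", round(adjusted, 3))
```

The adjustment is small here because only one predictor is in the equation; it grows as more predictors are added relative to the number of cases.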

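Beta for Model 1 = 0.922 = Multiple R = standardized regression coefficient (with a single predictor in the equation, Beta equals Multiple R). To read the output quickly, ignore the [constant] row and look at the row(s) below it: here [Age] is a significant predictor of [Stay] at p = 0.000.

B is positive, which means a positive relationship between [Age] and [Stay]; the correlation is high (0.922).

The t value for [Age] is significant at p = 0.000, but the t value for [Prevad] is not significant.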

Reporting the output

In this stepwise multiple regression analysis, [Age] was entered first and explained 85% of the variance in [Length of stay] for patients in RIPAS, F(1, 13) = 73.84, p < 0.001. The table is included as below:

Table 1.0 Stepwise multiple regression of predictors of length of stay in RIPAS (only significant predictors are included)

Variable   Multiple R   B       Standard error b   Beta    t       Significance of t
Age        0.922        1.057   0.123              0.922   8.563   0.000
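
The entries in Table 1.0 are related to each other: t is B divided by its standard error, and with a single predictor the overall F equals t squared. The sketch below checks this using the rounded B and standard error from the table; the variable names are illustrative only.

```python
# Values as printed in Table 1.0 (rounded to three decimals).
b = 1.057         # unstandardized regression coefficient for Age
se_b = 0.123      # standard error of b

t = b / se_b      # t ratio for the coefficient
f = t ** 2        # with one predictor, overall F equals t squared

print("t =", round(t, 3))
print("F =", round(f, 2))
```

From the rounded figures, t comes out at about 8.59 and its square at about 73.8, in line with the reported F of 73.84. The t of 8.563 printed in the table differs slightly, which may reflect rounding of B and its standard error or a transcription slip.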


Hierarchical Multiple Regression and Coursework for the 3 types of Regression

(See Next Lecture)