
Multiple regression

Introduction
• In multiple regression and correlation we use more than one independent
variable to investigate the dependent variable.

• In multiple regression and correlation analysis, the process consists of three steps:

• Describe the multiple-regression equation.

• Examine the multiple-regression standard error of estimate.

• Use multiple-correlation analysis to determine how well the regression equation describes the observed data.

• In addition, in multiple regression we can look at each individual independent variable and test whether it contributes significantly to the way the regression describes the data.
MULTIPLE-REGRESSION EQUATION
Here, we have more than one independent variable; we therefore use X1 and X2 to represent the two variables.
Thus the equation of the line of estimation will be
Ŷ = a + b1X1 + b2X2
where a is the Y-intercept and b1 and b2 are the slopes associated with X1 and X2 respectively.
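As a small sketch (Python is used here; it is not part of the original slides), the estimating equation is simply a linear function of the two independent variables once a, b1 and b2 have been estimated:

def estimate(x1, x2, a, b1, b2):
    # Y-hat = a + b1*X1 + b2*X2: the two-variable estimating equation
    return a + b1 * x1 + b2 * x2

# Placeholder coefficients for illustration only; the IRS example below
# yields a ≈ -13.82, b1 ≈ 0.564, b2 ≈ 1.099.
print(estimate(x1=44, x2=15, a=-13.82, b1=0.564, b2=1.099))   # about 27.5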
Illustration
• The Internal Revenue Service (IRS) is trying to estimate the monthly amount of unpaid taxes discovered by its auditing division. In the past, the IRS estimated this figure on the basis of the expected number of field-audit labour hours. In recent years, however, field-audit labour hours have become an unreliable predictor of the actual unpaid taxes. As a result, the IRS is looking for another factor with which it can improve the estimating equation.

• The auditing division does keep a record of the number of hours its computers are used to detect unpaid taxes. Could we combine this information with the field-audit labour hours and come up with a more accurate estimating equation for the unpaid taxes discovered each month?
Illustration

Month        Field-Audit Labor Hours   Computer Hours   Actual Unpaid Taxes Discovered
January               45                     16                       29
February              42                     14                       24
March                 44                     15                       27
April                 45                     13                       25
May                   43                     13                       26
June                  46                     14                       28
July                  44                     16                       30
August                45                     16                       28
September             44                     15                       28
October               43                     15                       27
• Fitting the regression to these data (see the Excel output below) gives the estimating equation Ŷ = −13.820 + 0.564X1 + 1.099X2. The auditing division can use this equation monthly to estimate the amount of unpaid taxes it will discover.
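The coefficients can be obtained with any least-squares routine. A sketch using NumPy (an assumption; the slides themselves use Excel) on the ten months of data:

import numpy as np

x1 = np.array([45, 42, 44, 45, 43, 46, 44, 45, 44, 43])   # field-audit labor hours
x2 = np.array([16, 14, 15, 13, 13, 14, 16, 16, 15, 15])   # computer hours
y  = np.array([29, 24, 27, 25, 26, 28, 30, 28, 28, 27])   # actual unpaid taxes discovered

# Design matrix with a leading column of ones for the intercept a.
X = np.column_stack([np.ones_like(x1), x1, x2])
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b1, b2)                # roughly -13.82, 0.564, 1.099
print(a + b1 * 44 + b2 * 16)    # estimate for 44 labor hours, 16 computer hours (about 28.6)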
Direct Formula
• With two independent variables, the least-squares values of a, b1 and b2 can be found directly by solving the three normal equations:
ΣY = na + b1ΣX1 + b2ΣX2
ΣX1Y = aΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = aΣX2 + b1ΣX1X2 + b2ΣX2²
Regression output in Excel
Regression Statistics
Multiple R 0.85377
R Square 0.72892
Adjusted R Square 0.65147
Standard Error 1.07064
Observations 10

ANOVA
              df        SS         MS         F          Significance F
Regression     2     21.57613   10.78806    9.411471      0.010371
Residual       7      8.023873   1.146268
Total          9     29.6

              Coefficients   Standard Error   t Stat     P-value
Intercept      -13.81963        13.32330     -1.03725    0.33411
X1               0.56366         0.30327      1.85859    0.10543
X2               1.09947         0.31314      3.51112    0.00984
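The same output can be reproduced in Python with the statsmodels package (an assumption on our part; any OLS routine gives equivalent figures):

import numpy as np
import statsmodels.api as sm

x1 = np.array([45, 42, 44, 45, 43, 46, 44, 45, 44, 43])
x2 = np.array([16, 14, 15, 13, 13, 14, 16, 16, 15, 15])
y  = np.array([29, 24, 27, 25, 26, 28, 30, 28, 28, 27])

X = sm.add_constant(np.column_stack([x1, x2]))   # adds the intercept column
results = sm.OLS(y, X).fit()
print(results.summary())       # coefficients, standard errors, t stats, p-values, F
print(results.rsquared)        # about 0.729
print(results.rsquared_adj)    # about 0.651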
Multiple R
• The "R" column represents the value of R, the
multiple correlation coefficient. R can be
considered to be one measure of the quality
of the prediction of the dependent variable.
R Square
• The "R Square" column represents the R2 value
(also called the coefficient of determination),
which is the proportion of variance in the
dependent variable that can be explained by the
independent variables (technically, it is the
proportion of variation accounted for by the
regression model above and beyond the mean
model).
• You can see from our value of 0.728 that our
independent variables explain 72.8% of the
variability of our dependent variable.
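A sketch of where this figure comes from, using the sums of squares in the ANOVA table above: R² is the regression (explained) sum of squares divided by the total sum of squares.

ss_regression = 21.57613                      # explained sum of squares (ANOVA table)
ss_residual   = 8.023873                      # unexplained sum of squares
ss_total      = ss_regression + ss_residual   # 29.6

r_squared = ss_regression / ss_total
print(r_squared)                              # about 0.729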
Interpretation of R squared
• After fitting a linear regression model, you need to
determine how well the model fits the data. Does it do
a good job of explaining changes in the dependent
variable?
• R-squared is a goodness-of-fit measure for linear regression models.
• R² can only take values between 0 and 1, where 0 indicates that the outcome cannot be predicted by any of the independent variables and 1 indicates that the outcome can be predicted without error from the independent variables.
• R² by itself, however, cannot be used to identify which predictors should be included in a model and which should be excluded.
Interpretation of R squared
• R-squared evaluates the scatter of the data points
around the fitted regression line. It is also called
the coefficient of determination, or the
coefficient of multiple determination for multiple
regression.
• For the same data set, higher R-squared values
represent smaller differences between the
observed data and the fitted values.
• R-squared is the percentage of the dependent
variable variation that a linear model explains.
Are Low R-squared Values Always a Problem?

• No! Regression models with low R-squared values can be perfectly good models for several reasons.
• Some fields of study have an inherently greater
amount of unexplainable variation.
• Fortunately, if you have a low R-squared value
but the independent variables are statistically
significant, you can still draw important
conclusions about the relationships between the
variables.
Adjusted R squared
• Adjusted R2 is a corrected goodness-of-fit (model
accuracy) measure for linear models. It identifies the
percentage of variance in the target field that is
explained by the input or inputs.
• R² tends to optimistically estimate the fit of the linear regression: it never decreases as additional effects are included in the model. Adjusted R² attempts to correct for this overestimation, and it may decrease if a specific effect does not improve the model.
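A sketch of the correction applied to the IRS output above (n = 10 observations, k = 2 independent variables):

n, k = 10, 2
r_squared = 0.72892

adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(adj_r_squared)   # about 0.6515, matching the "Adjusted R Square" in the output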
P-Values
• P-values and coefficients in regression analysis work together to
tell you which relationships in your model are statistically
significant and the nature of those relationships.
• The coefficients describe the mathematical relationship between
each independent variable and the dependent variable.
• The p-value for each term tests the null hypothesis that the
coefficient is equal to zero (no effect). A low p-value (< 0.05)
indicates that you can reject the null hypothesis.
• In other words, a predictor that has a low p-value is likely to be a
meaningful addition to your model because changes in the
predictor's value are related to changes in the response variable.
• Conversely, a larger (insignificant) p-value suggests that changes
in the predictor are not associated with changes in the response.
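A sketch of where the t statistics and p-values in the Excel output come from, using the X2 coefficient as an example (SciPy is our choice here, not part of the slides):

from scipy import stats

coef, se = 1.09947, 0.31314    # X2 coefficient and its standard error
df = 10 - 2 - 1                # n - k - 1 degrees of freedom

t_stat = coef / se                           # about 3.511
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-sided p-value, about 0.0098
print(t_stat, p_value)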
Standard Error of Estimates
• The general form of the multiple-regression equation is
Ŷ = a + b1X1 + b2X2 + · · · + bkXk
• The standard error of estimate is
se = √[ Σ(Y − Ŷ)² / (n − k − 1) ]
• Where
• Y = sample values of the dependent variable
• Ŷ = corresponding estimated values from the regression equation
• n = number of data points
• k = number of independent variables
Standard Error of Estimates

• The denominator of this equation indicates that in multiple regression with k independent variables, the standard error has n − k − 1 degrees of freedom.
• This occurs because the degrees of freedom are reduced from n by the k + 1 numerical constants, a, b1, b2, . . . , bk, that have all been estimated from the same sample.
Calculation of Standard
Error of Estimates
Month        Field-Audit Labor Hours   Computer Hours   Actual Unpaid Taxes Discovered
January               45                     16                       29
February              42                     14                       24
March                 44                     15                       27
April                 45                     13                       25
May                   43                     13                       26
June                  46                     14                       28
July                  44                     16                       30
August                45                     16                       28
September             44                     15                       28
October               43                     15                       27
Calculation of SEE
Y    X1   X2   Ŷ        (Y − Ŷ)   (Y − Ŷ)²
29 45 16 29.136 -0.136 0.02
24 42 14 25.246 -1.246 1.55
27 44 15 27.473 -0.473 0.22
25 45 13 25.839 -0.839 0.70
26 43 13 24.711 1.289 1.66
28 46 14 27.502 0.498 0.25
30 44 16 28.572 1.428 2.04
28 45 16 29.136 -1.136 1.29
28 44 15 27.473 0.527 0.28
27 43 15 26.909 0.091 0.01
Here Σ(Y − Ŷ)² ≈ 8.02 and n − k − 1 = 7, so the SEE = √(8.02/7) ≈ 1.071.
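As a quick check (a sketch in Python, not part of the original calculation), summing the squared residuals from the table and dividing by n − k − 1 reproduces the standard error reported by Excel:

import numpy as np

squared_residuals = np.array([0.02, 1.55, 0.22, 0.70, 1.66, 0.25, 2.04, 1.29, 0.28, 0.01])
n, k = 10, 2

see = np.sqrt(squared_residuals.sum() / (n - k - 1))
print(see)   # about 1.07, matching the "Standard Error" of 1.07064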
Principle of parsimony
• The principle of parsimony is attributed to the early
14th-century English nominalist philosopher, William of
Occam, who insisted that, given a set of equally good
explanations for a given phenomenon, the correct
explanation is the simplest explanation.
• It is called Occam's razor because he ‘shaved’ his
explanations down to the bare minimum: his point was
that in explaining something, assumptions must not be
needlessly multiplied.
• In particular, for the purposes of explanation, things not
known to exist should not, unless it is absolutely
necessary, be postulated as existing.
Principle of parsimony
• For statistical modeling, the principle of parsimony
means that:
• Models should have as few parameters as possible.
• Linear models should be preferred to non-linear models.
• Experiments relying on few assumptions should be preferred to those relying on many.
• Models should be pared down until they are minimally adequate.
• Simple explanations should be preferred to complex explanations.
