You are on page 1of 14

Multiple Regression

Aiza Mamansag-Tabal, RM, RN

N202 (Statistics) – MAN 1 Sec B


Example:

Delivery of Medical Supplies


Let’s assume that you are a nursepreneur who offers/deliver medical supplies to
rural areas in Luzon. You are able to use Google maps to group individual
deliveries into one trip to reduce time and fuel costs. Therefore the trips will have
more than one delivery.

As the nursepreneur, you would like to be able to estimate how long a delivery will
take based on two factors:
1. The total distance of trips in miles.
2. The number of deliveries to be made during the trip.
DMS Data and Variable naming
To conduct your analysis you take a random sample of 10 past trips and record the three pieces
of information for each trip: 1) total miles traveled. 2) number of deliveries, and 3) total travel
time in hours.

Remember that in this case, you would like to be able to predict


the total travel time using both the miles travelled and number of
deliveries on each trip.

In what way does the travel time DEPEND on the first two
measures?

Travel times is the dependent variable and miles travelled and


number of deliveries ar the independent variables.
Multiple Regression
is an extension of simple linear regression

Simple linear regression


IV D.V. one-to-one

IV IV Multiple regression
…or more D.V. many-to-one

IV IV
New Considerations
● Adding more independent variables to a multiple regression procedure does not mean the
regression will be “better” or offer better predictions; in fact it can make things worse. This is
called OVERFITTING.

● The addition of more independent variables creates more relationships among them. So not only
are the independent variables potentially related to the dependent variable, they are also
potentially related to each other. When this happens, it is called MULTICOLLINEARITY.

● The ideal is for all the independent variables to be correlated with the dependent variable but
NOT with each other
New Considerations
● Because of multicollinearity and overfitting, there is a fair amount of prep-work to do BEFORE
conducting multiple regression analysis if one is to do it properly.

• Correlations

• Scatter Plots

• Simple Regressions
More Relationships
Independent variables Dependent
variable
Potential multicollinearity

milesTraveled
(x1)

Multiple regression
travelTime,
many-to-one
(y)
numDeliveries
(x2)
IV
The ideal is for all of the independent variables to be
correlated with the dependent variable but NOT with
each other.
IV

Multiple regression
D.V. many-to-one

IV 10 relationships to consider!

Some Independent variables, or sets of


independent variabes, are better at predicting the
DV than others. Some contribute nothing.
IV
Multiple Regression Model
• A regression model is a statistical model that estimates the relationship between
one dependent variable and one or more independent variables using a line (or a
plane in the case of two or more independent variables).
• A regression model can be used when the dependent variable is quantitative,
except in the case of logistic regression, where the dependent variable is binary.
Multiple Regression Model
Multiple Regression
Model

Multiple Regression
Equation

Estimated Multiple
Regression Equation
Estimated Multiple Regression Equation
variables

Example

intercept coefficients

Estimated Multiple
Regression Equation
Interpreting Coefficients
ỹ = 27 + 9 x1 + 12 x2
x1 = capital investment ($1000s)

x2 = marketing expenditures ($1000s)

ỹ = predicted sales ($1000s)

In multiple regression, each coefficient is interpreted as the estimated change in ỹ


corresponding to a one unit change in a variable, when all other variables are held
constant.

So in this example, $9000 is an estimate of the expected increase in sales ỹ, corresponding


to a $1000 increase in capital investment (x1) when marketing expenditures (x2)
Review
● Multiple regression is an extension of simple linear regression.

● Two or more independent variables are used to predict /explain the variance in one dependent variable

● Two problems may arise: 1) Overfitting, 2) Multicollinearity

● Overfitting is caused by adding too many independent variables; they account for more variance but add
nothing to the model

● Multicollinearity happens when some/all of the independent variables are correlated to each other

● In multiple regression, each coefficient is interpreted as the estimated change in a variable, when all other
variables are held constant.
Thank you
&
God bless

You might also like