
ANSWER a.

Let probability of participation = r (participation itself is categorical data: yes/no)

Let distance travelled = d (numerical data)

Let availability of car = c (categorical data: 1 = has a car, 0 = no car)

Let age = a (numerical data)

The logistic regression model for r can be expressed in terms of the log odds of participation as

ln(r / (1 - r)) = -0.675d + 1.9369c - 0.0599a + 5.7665
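
As a minimal sketch, the fitted coefficients can be wrapped in two small Python functions: one returning the linear predictor (the log odds in a logistic model) and one converting it to a probability. The function and constant names are illustrative, and the coding of c as 1 = car, 0 = no car and of d in km is an assumption taken from the worked case in part c.

```python
import math

# Coefficients as reported in the answer above.
INTERCEPT = 5.7665
B_DISTANCE = -0.675   # per unit of distance travelled (assumed km)
B_CAR = 1.9369        # assumed coding: 1 = car available, 0 = no car
B_AGE = -0.0599       # per year of age


def log_odds(d, c, a):
    """Linear predictor: log odds of participation."""
    return INTERCEPT + B_DISTANCE * d + B_CAR * c + B_AGE * a


def probability(d, c, a):
    """Logistic (sigmoid) transform of the log odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(d, c, a)))


if __name__ == "__main__":
    # Same patient with and without a car: the probability should rise with a car.
    print(probability(5, 0, 40), probability(5, 1, 40))
```

The probability always lies between 0 and 1, whatever values of d, c and a are supplied.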


ANSWER b.

Important: This is logistic regression, not linear regression. In linear regression, the coefficient
of an independent variable (IV) indicates by how much the dependent variable (DV) increases or
decreases for a unit change in that independent variable.

In logistic regression, however, the coefficients are log odds ratios. Before we interpret them,
recall that a logistic regression model is used to predict the probability of a binary event such
as success/failure, yes/no or happy/unhappy. Each coefficient tells us how much the log odds that
an observation is in the target class ("1") change for a unit increase in the IV, with the other
IVs held constant. In other words, if the coefficient is > 0 (equivalently, its exponent, the odds
ratio, is > 1), the probability of the event increases when the IV increases. Conversely, if the
coefficient is < 0 (odds ratio < 1), the probability of the event decreases when the IV increases.

So, our general interpretation is as follows:

distance: It is numerical data. Its coefficient is -0.675 (< 0). So with every increase in
distance travelled, there is a decrease in the probability of participation. In other
words, the farther the patient is from the rehab center, the less likely they are to participate.

car: It is categorical data. Its coefficient is 1.9369 (> 0). So when a car is available, there is
an increase in the probability of participation. What this means is that, with distance and age
held constant, the odds of participation are e^1.9369 (about 6.94) times higher for a patient with
a car than for one without.

age: This is also numerical data. Its coefficient is -0.0599 (< 0). So with every increase in
age, there is a decrease in the probability of participation. With every 1-year increase in the
age of a person, the log odds of participation decrease by 0.0599. In other words, the older the
patient, the less likely they are to participate.

Which of the independent variables (distance, car and age) has the stronger 'say' or influence on
the dependent variable? To compare them we need to put them on a common scale. Since this is
logistic regression, we take the exponents of the coefficients (the odds ratios) to comment on
their strength. The odds ratio for car is 6.9372, that for distance is 0.509156 and that for age
is 0.941859.
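
The exponentiated coefficients quoted above can be checked with a few lines of Python. This is only a sketch: the dictionary names are illustrative, and the percentage change in odds per unit is an extra convenience figure that is not part of the original answer.

```python
import math

# Coefficients from the fitted model in part a.
coefficients = {"distance": -0.675, "car": 1.9369, "age": -0.0599}

# Exponentiating a logistic coefficient gives the odds ratio per unit change.
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Percentage change in the odds for a one-unit increase in each variable.
pct_change = {name: (ratio - 1.0) * 100.0 for name, ratio in odds_ratios.items()}

if __name__ == "__main__":
    for name in coefficients:
        print(f"{name}: odds ratio = {odds_ratios[name]:.6f}, "
              f"change in odds = {pct_change[name]:+.1f}%")
```

An odds ratio above 1 corresponds to a positive percentage change in the odds, and one below 1 to a negative change.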

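Now we can say that, of the 3 variables, the presence of a car contributes most to determining
participation, because its odds ratio is farthest from 1. Per unit change, distance comes next
(each extra unit of distance roughly halves the odds), followed by age (each extra year lowers the
odds by about 6%). Note that this per-unit comparison depends on the units in which each variable
is measured.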

ANSWER c.

The odds for a patient to participate in rehabilitation if he or she travels 20 km to
rehabilitation, has a car, and is 65 years old are given by

Log odds of participation = -0.675(20) + 1.9369(1) - 0.0599(65) + 5.7665 = -9.6901

Odds of participating = exp(-9.6901) = 0.00006189, which is about 1 : 16157. That is, the odds of
the said person participating in rehab are about 1 to 16157.
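
The arithmetic for this patient can be reproduced with the short sketch below. The variable names are illustrative; the inputs (20 km, car = 1, age 65) and coefficients are the ones used above.

```python
import math

# Patient from part c: travels 20 km, has a car (coded 1), is 65 years old.
distance_km, has_car, age_years = 20, 1, 65

# Log odds from the fitted coefficients.
log_odds = -0.675 * distance_km + 1.9369 * has_car - 0.0599 * age_years + 5.7665

# Exponentiate to move from log odds to odds.
odds = math.exp(log_odds)

if __name__ == "__main__":
    print(f"log odds = {log_odds:.4f}")
    print(f"odds     = {odds:.8f}  (about 1 : {1 / odds:.0f})")
```
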
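
ANSWER d.

The probability is given by

Probability = odds / (1 + odds) = 0.00006189 / 1.00006189 = 0.00006189

which, to the precision shown, is the same as the odds found before, because the odds are so small.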
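
The standard conversion from odds to probability, p = odds / (1 + odds), can be checked numerically. The sketch below assumes the log odds of -9.6901 computed in part c; the variable names are illustrative.

```python
import math

# Log odds for the patient in part c (20 km, has a car, 65 years old).
log_odds = -9.6901

odds = math.exp(log_odds)

# Convert odds to probability.
p = odds / (1.0 + odds)

if __name__ == "__main__":
    print(f"odds        = {odds:.10f}")
    print(f"probability = {p:.10f}")
    # The gap is roughly odds squared, so it vanishes at 8 decimal places.
    print(f"difference  = {odds - p:.3e}")
```

The probability is always slightly smaller than the odds; the two only look identical here because the odds are very close to zero.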
