

A dataset of 9000 credit card customers was provided. Of these customers, about
2400 are inactive (i.e., have never used the card) and the rest are active.
We would like to understand the factors that affect whether a credit card customer
is active or not. The variables that we are interested in using to explain active
status are:

1. The mode of acquisition (whether they were acquired through direct mail
(DM), direct selling (DS), telephone sales (TS) or through internet (NET))
2. Whether they have a Reward card (i.e., a card that gives points for every
dollar purchased)
3. Whether they have an affinity card
4. The type of card that they were given (that is, whether they have a standard,
gold, platinum or quantum card)
5. Credit limit
6. Number of cards that they have with this bank

ID         ID of the account
Active     Whether the account is active (=1) or not (=0)
Affinity   Whether the customer has an affinity card (=1) or not (=0)
Rewards    Whether the customer has a reward card (=1) or not (=0)
Limit      Credit limit of the customer
numcard    Number of cards that the customer has from this bank
Dm         Whether the customer was acquired through direct mail (1=Yes, 0=No)
Ds         Whether the customer was acquired through direct selling (1=Yes, 0=No)
Ts         Whether the customer was acquired through telephone selling (1=Yes, 0=No)
Net        Whether the customer was acquired through internet (1=Yes, 0=No)
Gold       Whether the customer has a GOLD card (1=Yes, 0=No)
platinum   Whether the customer has a PLATINUM card (1=Yes, 0=No)
quantum    Whether the customer has a QUANTUM card (1=Yes, 0=No)
standard   Whether the customer has a STANDARD card (1=Yes, 0=No)
Profit     Profit generated by the customer over a 3 year period
Totalfee   Total fee paid by the customer over a 3 year period
Totfc      Total finance charges paid by the customer over a 3 year period

1. Run a binary logit model to model the probability that a customer is active.

Comment on the model fit. Do the covariates do a good job of explaining
whether a customer is active or not?

Percent Concordant = 76.8

Percent of being active = 72.6%
The percent concordant is higher than the percent of customers who are active, so
we treat the model as a good fit: it predicts better than the informed naive guess
that every customer is active.

Also, comparing the Intercept Only and Intercept and Covariates columns, the values
of AIC, SC and -2LogL all decrease once the covariates are added, which indicates
that the covariates are doing a good job.
AIC and SC behave somewhat like adjusted R square: if we keep adding insignificant
covariates that do not explain much variation in the target variable, the values of
AIC and SC increase.

The value of McFadden R square is 16.25%, which means the covariates reduce -2LogL
by 16.25% relative to the intercept-only model. Loosely speaking, they explain about
16.25% of the variation in the dependent variable.

Which of the explanatory variables are significant? How did you decide this?

The variables rewards, limit, numcard, ds, ts, gold, platinum and quantum are
significant. We decide this by looking at Pr > ChiSq: a p-value less than 0.05
indicates that the variable is significant.
We can also test whether two significant acquisition modes differ from each other,
using their parameter estimates (each measured relative to Internet, the base mode)
and the corresponding t-value:
DS - INT = -1.94
TS - INT = -2

DS - TS = .19
Error = (.1757 + .1686)/2 = .17215

T = .19/.17 = 1.117, which is less than the critical value 1.96, so we do not reject the null.
That means acquiring customers through direct selling or by telephone makes no
significant difference to whether the customer is active.
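
The t-value above can be reproduced with a short calculation. This is only a sketch that reuses the difference (.19) and the two standard errors reported in the text. The second function is an alternative that is not in the original answer: it assumes the two estimates are uncorrelated, which the output shown here does not confirm. Dividing by the unrounded average error gives about 1.10 rather than 1.117, and both versions stay below 1.96.

```python
import math

# Values reported in the text (not re-estimated here).
diff_ds_ts = 0.19              # reported DS - TS difference
se_ds, se_ts = 0.1757, 0.1686  # reported standard errors


def t_average_se(diff, se_a, se_b):
    """t-value using the average of the two standard errors, as in the text."""
    return diff / ((se_a + se_b) / 2)


def t_independent_se(diff, se_a, se_b):
    """t-value assuming zero covariance between the estimates (an assumption)."""
    return diff / math.sqrt(se_a ** 2 + se_b ** 2)


t_avg = t_average_se(diff_ds_ts, se_ds, se_ts)
t_ind = t_independent_se(diff_ds_ts, se_ds, se_ts)

# Both t-values are compared with the 5% critical value of 1.96.
print(round(t_avg, 3), abs(t_avg) < 1.96)
print(round(t_ind, 3), abs(t_ind) < 1.96)
```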

Modes of acquisition:
(Int (Internet) = Dm (Direct mail)) > (Ds (Direct selling) = Ts (Telephone sales))

Cards:
Standard > Platinum > Gold > Quantum

What is the interpretation of the coefficient of direct mail?

The odds of being an active customer are 8.4% lower if the customer was acquired
through direct mail rather than through the internet:

(exp(beta) - 1) * 100 = -8.4
However, this variable (dm) is not significant (p-value > 0.05), so we cannot
conclude that direct mail differs from internet in its effect on activation.

Which of the four modes of acquisition generates the most active customers?

From the parameter estimates we obtained the order (Int = Dm) > (Ts = Ds).

Internet is the base mode (odds ratio of 1) and the estimates for the other modes
are negative, so we will say:
Internet mode of acquisition generated the most active customers, although direct
mail is not significantly different from it.

Which of the four types of cards has the most active customers?

For cards, the parameter estimates give the order:

Standard > Platinum > Gold > Quantum
So, Standard cards have the most active customers.

How much does the presence of a reward card affect the odds of a customer
being active?
The odds of being an active customer decrease by 36.9% when the customer has a
reward card. This variable is significant.

How much does an affinity card affect the odds of a customer being active?
The estimate implies that the odds of being an active customer increase by 2.6% if
the customer has an affinity card, but this variable is insignificant, so we cannot
conclude that an affinity card has any effect.
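
The percentage effects on the odds quoted for reward and affinity cards follow from (exp(beta) - 1) * 100. A minimal sketch, assuming the coefficients are the ones that appear in the elasticity calculations for this model (-0.4607 for rewards and 0.0255 for affinity); the function name is illustrative only.

```python
import math


def pct_change_in_odds(beta):
    """Percent change in the odds for a one-unit increase in the covariate."""
    return (math.exp(beta) - 1) * 100


beta_rewards = -0.4607   # coefficient used in the reward elasticity calculation
beta_affinity = 0.0255   # coefficient used in the affinity elasticity calculation

print(round(pct_change_in_odds(beta_rewards), 1))   # -36.9
print(round(pct_change_in_odds(beta_affinity), 1))  # 2.6
```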

How well does the model predict whether a customer is likely to be active or
not? How do you determine this?
The key measure for this is the percent concordant: the higher the value, the
better the model predicts. We determine how accurately the model predicts active
and inactive customers by looking at the following criterion:
Percent Concordant - A pair of observations with different observed responses
is said to be concordant if the observation with the lower ordered response
value (active = 0) has a lower predicted mean score than the observation with
the higher ordered response value (active = 1).

So, in 76.8% of the pairs made up of one active and one inactive customer, the
model gives the higher predicted probability to the active customer.

Percent Concordant = 76.8

Percent of being active = 72.6%
So, the model is a good fit as percent concordant > percent of being active.
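
To make the definition of percent concordant concrete, here is a small sketch that counts concordant pairs directly. The four observations are hypothetical toy values, not taken from the credit card data, and tied pairs are simply not counted as concordant here (SAS reports ties separately).

```python
def percent_concordant(actual, predicted):
    """Percent of (inactive, active) pairs where the active case scores higher."""
    actives = [p for y, p in zip(actual, predicted) if y == 1]
    inactives = [p for y, p in zip(actual, predicted) if y == 0]
    pairs = len(actives) * len(inactives)
    concordant = sum(1 for a in actives for i in inactives if a > i)
    return 100.0 * concordant / pairs


# Hypothetical toy data: two inactive and two active customers.
actual = [0, 0, 1, 1]
predicted = [0.10, 0.40, 0.35, 0.80]

print(percent_concordant(actual, predicted))  # 75.0
```

Of the four pairs, only the pair (0.40 inactive, 0.35 active) is ranked the wrong way round, which gives 75%.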

What is the predicted probability of a customer being active if he has only
one GOLD reward card but no affinity card, credit limit of 10000, and was
acquired through telesales?
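
For this profile the predicted probability comes from the usual logit formula. The expression below is a sketch in which the beta symbols stand for the estimated coefficients of the logit model (their values are not reproduced here). It assumes that the standard card and the internet mode are the base categories, so the customer has rewards = 1, gold = 1, ts = 1, affinity = 0, limit = 10000 and numcard = 1.

```latex
\[
P(\text{active}=1) = \frac{1}{1 + e^{-z}}, \qquad
z = \beta_0 + \beta_{\text{rewards}} + \beta_{\text{gold}} + \beta_{\text{ts}}
    + \beta_{\text{limit}} \cdot 10000 + \beta_{\text{numcard}} \cdot 1
\]
```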

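Elasticity with respect to affinity and rewards is as follows:

We know that Elasticity = (1 - P) * beta * X, where P is the probability of being
active (0.726), beta is the coefficient of the variable and X is the value of the
variable at which the elasticity is evaluated.
For Affinity:
Elasticity = (1 - 0.726) * 0.0255 * 0.833 = 0.0058

For Reward:
Elasticity = (1 - 0.726) * -0.4607 * 0.2031 = -0.026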
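
The two elasticities can be checked with a few lines of code. This is a sketch of the (1 - P) * beta * X calculation using the numbers in the text; the function name is illustrative only.

```python
def logit_elasticity(p_active, beta, x):
    """Elasticity of the probability of being active: (1 - P) * beta * X."""
    return (1 - p_active) * beta * x


affinity = logit_elasticity(0.726, 0.0255, 0.833)
rewards = logit_elasticity(0.726, -0.4607, 0.2031)

print(round(affinity, 4))  # 0.0058
print(round(rewards, 3))   # -0.026
```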

quantum;
Tobit model:

Model profit = totalfee affinity rewards limit numcard dm ds ts gold platinum quantum;

Selection Model:

1. What additional or different findings emerge from the Tobit model results?

Tobit model: AIC = 116082, SC = 116175, log likelihood = -58028
Selection model: AIC = 28771, SC = 28948, log likelihood = -14360

The AIC of the selection model is much lower, so we go with the selection model.
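
As a consistency check on these figures, AIC = 2k - 2 log L, where k is the number of estimated parameters. The sketch below backs out the k implied by the reported Tobit values; k is not stated in the text, so it is an inferred quantity, and the helper names are illustrative only.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def implied_n_params(aic_value, log_likelihood):
    """Number of parameters implied by a reported AIC and log likelihood."""
    return (aic_value + 2 * log_likelihood) / 2


print(implied_n_params(116082, -58028))  # 13.0
print(aic(-58028, 13))                   # 116082

# Same check for the selection model, using the rounded values in the text.
print(implied_n_params(28771, -14360))
```

For the Tobit model the implied 13 parameters would match an intercept, the 11 covariates in the model statement and a scale parameter. The selection model check does not return a whole number, presumably because the reported log likelihood is rounded.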

Tobit and Selection
Totalfee is significant in both
Affinity is insignificant in both
Rewards is significant in both
Limit is significant in both
Numcard is significant in both
Dm is significant in Tobit but not significant in Selection
Ds and Ts are significant in both
Gold is insignificant in both
Platinum and Quantum are significant in both

Most of the variables are significant in both models, and comparing the AIC and SC
values, the selection model proves to be the better model, with lower AIC and SC values.

2. What additional or different findings emerge from the selection model results?
Comparing the logistic model and the selection model, the variable Rewards is
significant in the logistic model and insignificant in the selection model.

Logit Model and Selection Model

Affinity is insignificant in Logit & selection both
Rewards is significant in both
Numcard is significant in both
Limit is significant in both
DM is insignificant in both
DS & TS significant in both
Gold, Quantum & Platinum significant in both

Note: _Sigma gives us an estimate of the standard deviation of the errors, and _Rho
is significant with a high parameter estimate, which points to correlation between
the selection and outcome equations, so it is better to use the selection model.

We are provided with a scanner dataset for ketchup, which covers 4 brands (Heinz,
Hunts, Del Monte and Other). The data contain the prices of the four brands, dummy
variables for display and feature ads (flyer ads) for each brand, and the brand
chosen in each week in each store. There are 3129 observations and 25 variables in
the dataset. The order and description of the variables are as follows.

 1 HID       Household ID
 2 STID      Store ID
 3 WEEK      Week (YYYYWK)
 4 BR        Brand chosen
 5 P1        Price of brand 1
 6 P2        Price of brand 2
 7 P3        Price of brand 3
 8 P4        Price of brand 4
 9 D1        Whether there was a display of brand 1 in that week (1 = Yes, 0 = No)
10 D2        Whether there was a display of brand 2 in that week (1 = Yes, 0 = No)
11 D3        Whether there was a display of brand 3 in that week (1 = Yes, 0 = No)
12 D4        Whether there was a display of brand 4 in that week (1 = Yes, 0 = No)
13 F1        Whether there was a feature of brand 1 in that week (1 = Yes, 0 = No)
14 F2        Whether there was a feature of brand 2 in that week (1 = Yes, 0 = No)
15 F3        Whether there was a feature of brand 3 in that week (1 = Yes, 0 = No)
16 F4        Whether there was a feature of brand 4 in that week (1 = Yes, 0 = No)
17 INC       Income level (1 (low income) to 14 (high income))
18 NMEMB     Number of members in the household (family size)
19 TOT       Total number of purchases made by a household in the entire data period
20 DOLSPENT  Dollars spent on that trip
21 KID       Whether there is a small child in the family
22 L1        Loyalty for brand 1
23 L2        Loyalty for brand 2
24 L3        Loyalty for brand 3
25 L4        Loyalty for brand 4

1. Estimate a multinomial logit model using PROC MDC.

Assume utility for brand i is a function of Pi, Fi, Di, Li, nmemb, kid, inc.
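
Written out, the utility and choice probability assumed here take roughly the following form. This is a sketch: the symbols are notation introduced for illustration, with common coefficients on the brand-specific variables and brand-specific coefficients on the household variables. Taking brand 1 as the base brand, with its intercept and household coefficients set to zero, is an assumption.

```latex
\[
V_i = \alpha_i + \beta_P P_i + \beta_F F_i + \beta_D D_i + \beta_L L_i
      + \gamma_i\,\mathrm{nmemb} + \delta_i\,\mathrm{kid} + \theta_i\,\mathrm{inc},
\qquad
\Pr(\text{brand } i) = \frac{e^{V_i}}{\sum_{j=1}^{4} e^{V_j}}
\]
```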

Interpretation:
Brands:
On the basis of the parameter estimates, the order of brand preference is:
Br1 > Br3 > Br2 > Br4
Brand 1 is the most preferred brand.

For higher-income households, the order of preference is:
Brand 1 > Brand 2 > Brand 4 > Brand 3

Number of members:
(Brand 3 = Brand 2 = Brand 4) > Brand 1
Larger families prefer a brand other than the top brand (Br1).

Kids:
Brand 2 is the most preferred brand.
Note: Kid3 and Kid4 are insignificant.

2. Now estimate a random coefficients multinomial logit model using PROC
BCHOICE. Assume the same utility function as above.

From the above we can say that Br2, Br3, Br4 and Price are significant, based on
the 95% HPD intervals.

3. Make a table of coefficients and t-values, from both models. Do not give me
the full SAS output. I can see that by running your code.

Br2, Br3, Br4, Price, Display, Feature, Loyalty, Inc3, Inc4, Nmemb2, Nmemb3 and
Kid2 are significant (p-values less than 0.05).

For Bchoice:

From the above we can say that Br2, Br3, Br4 and Price are significant, based on
the 95% HPD intervals.

4. Write a brief report explaining your findings. Comment on model fit
(likelihood values, AIC, SC), significant parameters and their effect on the
probability of choice, and the effect of using a random coefficient logit model
on parameters and model fit.

The McFadden R-square is 61.5% after adding separate coefficients for the non-brand-specific
variables. Loosely speaking, the independent variables explain about 61.5% of the variation
in the probability of choosing a brand (more precisely, they improve the log likelihood by
61.5% relative to the null model).

There are 4 brands in total, so we created brand dummies. The brand-specific variables
(P1-P4, D1-D4, F1-F4, L1-L4) each share one common coefficient across brands, while the
non-brand-specific variables (income, kid and number of members) have separate
coefficients for each brand.

Detailed explanation:
Br2, Br3, Br4, Price, Display, Feature, Loyalty, Inc3, Inc4, Nmemb2, Nmemb3 and
Kid2 are significant (p-values less than 0.05).

Significance is read directly from the p-values.

With a 1 unit increase in price, the utility of choosing a brand decreases by 2.0360 units,
holding all other explanatory variables constant.

If a feature is present for a particular brand, the utility of choosing that brand increases
by 0.8648 units, holding all other explanatory variables constant.

If there is a display versus no display for a brand, the utility of choosing that brand
increases by 0.4221 units, holding all other explanatory variables constant.

If a customer is loyal to a particular brand, the utility of that brand increases by
3.4513 units, holding all other explanatory variables constant.
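
Because these effects are in utility units, it can help to translate them into odds. In a multinomial logit, a change of delta in one of a brand's own variables multiplies the odds of choosing that brand over any other brand by exp(beta * delta). The sketch below applies this to the four coefficients quoted above; the dictionary and function names are illustrative only.

```python
import math

# Coefficients quoted in the text for the brand-specific variables.
coefficients = {
    "price": -2.0360,
    "feature": 0.8648,
    "display": 0.4221,
    "loyalty": 3.4513,
}


def odds_multiplier(beta, delta=1.0):
    """Factor by which the odds of a brand change when its variable rises by delta."""
    return math.exp(beta * delta)


for name, beta in coefficients.items():
    print(f"{name}: odds multiplied by {odds_multiplier(beta):.3f}")
```

A multiplier below 1 (price) lowers the odds of choosing the brand, and a multiplier above 1 (feature, display, loyalty) raises them.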

5. Compute own price elasticity and cross price elasticity for brand 1.

Own price elasticity = (1 - prob(j)) * Xj * beta

(1 - .6839) * -2.036 * 3.733 ≈ -2.40
Cross price elasticity (of another brand's choice probability with respect to the
price of brand 1) = -prob(brand 1) * price * beta
-.6839 * -2.036 * 3.733 ≈ 5.20
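
The two elasticities for brand 1 can be recomputed from the three inputs used above: choice probability .6839, price coefficient -2.036 and price 3.733. This is a sketch of the standard multinomial logit formulas with illustrative function names; with these inputs the products come out at about -2.40 and 5.20.

```python
def own_price_elasticity(prob, beta, price):
    """Elasticity of a brand's choice probability w.r.t. its own price."""
    return (1 - prob) * beta * price


def cross_price_elasticity(prob, beta, price):
    """Elasticity of another brand's choice probability w.r.t. this brand's price."""
    return -prob * beta * price


p1, beta_price, price1 = 0.6839, -2.036, 3.733

own = own_price_elasticity(p1, beta_price, price1)
cross = cross_price_elasticity(p1, beta_price, price1)

print(round(own, 2), round(cross, 2))
```

The own price elasticity is negative (a price rise for brand 1 lowers its choice probability) and the cross price elasticity is positive (the same price rise raises the choice probability of the other brands).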