

Raghuram Iyengar, University of Pennsylvania Sunil Gupta, Columbia University

The previous chapter covered the basics of the powerful and yet simple technique of Ordinary Least Squares (OLS). It was noted that the mathematical relationship between the dependent variable for an observation $y_t$ at time t and a vector of independent variables $x_t$ can be written in the following manner.

$$y_t = x_t'\beta + \varepsilon_t \qquad (1)$$

Here, $x_t'$ is the transpose of the vector $x_t$ and $\beta$ is a vector of parameters. Also, $y_t$ is continuous from $-\infty$ to $+\infty$ and $\varepsilon_t$ is the random error that is typically assumed to be normally distributed. Several scenarios fit the assumption of a continuous dependent variable that ranges from $-\infty$ to $+\infty$. In cases when $y_t$ is strictly positive (e.g., sales), we can transform it as $\ln(y_t)$ to make it lie between $-\infty$ and $+\infty$ and continue to use OLS. But what happens if the dependent variable is discrete (e.g., buy / no buy) or is the choice of a brand (e.g., Brand A, B, C or D), and we want to analyze the effect of brand prices on these decisions? The purpose of this chapter is to show methods that can be used in such scenarios. We begin with Discriminant Analysis. This is followed by a discussion of logistic regression and the multinomial logit model. Thereafter, we focus on the multinomial probit model. The chapter ends with a discussion of Tobit models.

Discriminant Analysis

Consider the following example where the dependent variable is binary: a buy / no buy decision. A company that has introduced a product in the market wishes to describe the people who are buying its product. Figure 14.1 shows the demographic information that the company has, together with the purchasers (P) and non-purchasers (N). The figure suggests that purchasers of this product are older and richer. Thus, age and income discriminate between the purchasers and non-purchasers. However, it is not clear which of the two variables is more important, or how we can predict whether a new person will be a purchaser or a non-purchaser based on his/her income and age. Such questions can be answered by using discriminant analysis.

[Figure 14.1 about here]

Discriminant analysis is a method to analyze which independent variables discriminate among groups and to classify observations into predetermined groups based on these variables. There can be two predetermined groups (e.g., buy or no buy) or more than two. In the latter case, the analysis is termed multiple discriminant analysis. For the sake of simplicity, we begin with a two-group discriminant analysis. In a discriminant analysis, an index is built using the measured characteristics as the independent variables. Thus, for an observation at time t,

$$f_t = x_{1t}\beta_1 + x_{2t}\beta_2 + x_{3t}\beta_3 + \dots + x_{Kt}\beta_K = x_t'\beta \qquad (2)$$

Here, $f_t$ is the index. It is also called the discriminant function. There are K measured characteristics ($x_{1t}, x_{2t}, \dots, x_{Kt}$). The vector $x_t'$ is the transpose of the vector $x_t$ that contains these K variables. There are also K parameters ($\beta_1, \beta_2, \dots, \beta_K$), which are the weights corresponding to these variables. These weights are also termed the discriminant coefficients. The goal of discriminant analysis is to estimate weights such that the index values for the two groups are as far apart as possible. In other words, the weights are derived such that the variation in f scores between the two groups is as large as possible, while the variation in the f scores within the groups is as small as possible. That is, the weights are derived so that the following ratio is maximized.

$$\frac{\text{Between-Group Variation}}{\text{Within-Group Variation}} \qquad (3)$$

Maximizing the above ratio makes the two groups as distinct as possible with respect to the index values. More mathematically oriented readers can see Chapter 11 of Johnson and Wichern (2002) for a description of how the above quantity is maximized.

Discriminant analysis is related to, and yet distinct from, linear regression. In both methods, a weighted linear combination of independent variables is used to predict a dependent variable. Also, like linear regression, discriminant analysis suffers from multicollinearity of the independent variables. The primary difference between the two methods is that in a linear regression, the dependent variable is typically assumed to range from $-\infty$ to $+\infty$, whereas in a discriminant analysis, the dependent variable is group membership, i.e., it is discrete. For an application with two groups, a linear regression can be run with a dummy variable representing group membership as the dependent variable. The estimates from such a regression will be proportional to the weights that are obtained from a discriminant analysis. When the number of groups is greater than two, however, a regression will not yield the same results.

Discriminant analysis is also different from cluster analysis (see Chapter 18, this book). In discriminant analysis, the groups are predetermined and the analysis is focused on which variables best discriminate among these groups. In a cluster analysis, the group memberships are unknown and the focus of the analysis is to form these groups.
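The two-group proportionality between regression estimates and discriminant weights noted above can be checked numerically. The following is a minimal sketch, not the authors' procedure: the age and income figures are hypothetical, and the discriminant weights are computed with the standard Fisher solution (inverse of the within-group scatter times the difference in group means), which is an assumption about how the ratio in (3) is maximized.

```python
# Hypothetical data: (age, income in $000) for each person.
non_purchasers = [(25, 30), (30, 42), (28, 35), (35, 40), (32, 50)]
purchasers = [(45, 60), (50, 85), (42, 70), (55, 75), (48, 90)]


def mean_vec(rows):
    """Mean of each of the two variables."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(2)]


def scatter(rows, center):
    """2x2 matrix of sums of squares and cross-products around `center`."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - center[0], r[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def solve2(a, b):
    """Solve the 2x2 linear system a w = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


m0, m1 = mean_vec(non_purchasers), mean_vec(purchasers)
mean_diff = [m1[0] - m0[0], m1[1] - m0[1]]

# Within-group scatter, pooled over the two groups.
s0, s1 = scatter(non_purchasers, m0), scatter(purchasers, m1)
within = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]

# Fisher discriminant weights (defined up to a scale factor).
lda_weights = solve2(within, mean_diff)

# OLS slopes from regressing a 0/1 group dummy on the same two variables.
all_rows = non_purchasers + purchasers
dummy = [0] * len(non_purchasers) + [1] * len(purchasers)
grand_mean = mean_vec(all_rows)
dummy_mean = sum(dummy) / len(dummy)
total = scatter(all_rows, grand_mean)
s_xy = [sum((r[i] - grand_mean[i]) * (y - dummy_mean)
            for r, y in zip(all_rows, dummy)) for i in range(2)]
ols_slopes = solve2(total, s_xy)

# If the two sets of weights are proportional, both ratios are equal.
ratios = [ols_slopes[i] / lda_weights[i] for i in range(2)]
print("discriminant weights:", lda_weights)
print("OLS slopes:          ", ols_slopes)
print("ratios:              ", ratios)
```

Because discriminant weights are only defined up to scale, what matters is that the two ratios agree, not their common value.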


Consider the following example of a two-group discriminant analysis. Table 14.1 contains data on the fifty US states, which are broken down into two groups: 15 states that are South and 35 that are Non-South (Lehmann, Gupta, & Steckel, 1997). These groups are compared on observable characteristics such as income, population and others. A univariate F-test compares the differences in means across the two groups on each of the independent variables. The big differences between the two groups appear to be in income, tax per capita and mineral production. A discriminant analysis was run and Table 14.1 contains the discriminant coefficients. These are the weights of the independent variables. Another column shows standardized discriminant coefficients. These coefficients are similar to the standardized regression coefficients in an OLS regression. They correct for any scale issues associated with the independent variables. We can calculate these coefficients either by first standardizing the independent variables and then running a discriminant analysis, or by first running a discriminant analysis and then multiplying each discriminant coefficient by the standard deviation of the respective independent variable. Both methods yield standardized coefficients, and these can be used to ascertain how a change of one standard deviation in each independent variable will affect the discriminant function.

[Table 14.1 about here]

From the estimated unstandardized coefficients, we find that population is the most important variable for discrimination, followed by average income. Upon standardizing the variables, we observe a different set of variables that are important. We find that while population is still the most important, college enrollment and manufacturing output are clearly more relevant for discrimination among the states than is average income. Thus, a failure to account for differences in scale can lead to erroneous conclusions about the relative importance of variables.
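The second route to standardized coefficients described above (multiply each unstandardized coefficient by the standard deviation of its variable) can be sketched as follows. The variable names, coefficients and data are hypothetical and are not taken from Table 14.1; they are chosen only to show how a ranking based on raw coefficients can change once scale is accounted for.

```python
import statistics

# Hypothetical unstandardized discriminant coefficients.
coefficients = {"population_millions": 0.40, "income_dollars": 0.0002}

# Hypothetical raw data for the same variables (note the very different scales).
data = {
    "population_millions": [1.2, 4.8, 9.5, 2.1, 6.3],
    "income_dollars": [21000, 34000, 27500, 41000, 30500],
}

# Standardized coefficient = unstandardized coefficient x standard deviation.
standardized = {
    name: coefficients[name] * statistics.stdev(values)
    for name, values in data.items()
}

ranking_raw = sorted(coefficients, key=lambda k: abs(coefficients[k]), reverse=True)
ranking_std = sorted(standardized, key=lambda k: abs(standardized[k]), reverse=True)

print("ranking by raw coefficient:         ", ranking_raw)
print("ranking by standardized coefficient:", ranking_std)
```

In this made-up example the income coefficient looks negligible in raw units only because income is measured in dollars; after standardization it ranks first.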

Measures of Fit
Several measures of fit are used to assess how well the model discriminates between the groups.

Chi-Squared Value
A Chi-Squared value tests whether, overall, the variables help discriminate between the two groups. This is very similar to the F-test for overall significance in a regression setting. Here, the Chi-Squared value is 42.71. For testing the significance, we look at the critical value for 11 degrees of freedom (the number of independent variables). This value is 31.3 at the 0.001 level. Since 42.71 exceeds this critical value, the variables clearly help in discrimination.

Canonical Correlation
The canonical correlation is the multiple correlation from a regression of a dummy dependent variable (group membership) on the independent variables. Its squared value is the $R^2$ from this regression. In this example, the canonical correlation is 0.80. Thus, the $R^2$ is 0.64.

Wilks' Lambda
Wilks' Lambda is the ratio of within-group variance to the total variance. Here, it is essentially $1 - R^2$. Thus, the Wilks' Lambda is 1 - 0.64 = 0.36.

The Hit-Miss Table

The Hit-Miss table indicates how well the discriminant function classifies observations. Table 14.2 is such a hit-miss table. Here we find that 32 out of the 35 Non-South states and 14 out of the 15 South states are correctly classified. Thus, the overall classification rate is (32 + 14)/50, i.e., 92%.


[Table 14.2 about here]
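The fit measures for the two-group example are linked by simple arithmetic, which the short sketch below reproduces from the numbers quoted in the text. The variable names are illustrative, and the critical value is the one stated in the text rather than one computed here.

```python
# Chi-squared test: statistic and critical value (11 df, 0.001 level) as quoted in the text.
chi_squared = 42.71
critical_value = 31.3
variables_discriminate = chi_squared > critical_value

# Canonical correlation, its square, and Wilks' Lambda for the two-group case.
canonical_correlation = 0.80
r_squared = canonical_correlation ** 2
wilks_lambda = 1 - r_squared

# Hit-miss table: correctly classified states and group sizes.
correct = {"Non-South": 32, "South": 14}
group_size = {"Non-South": 35, "South": 15}
hit_rate = sum(correct.values()) / sum(group_size.values())

print(f"significant at 0.001: {variables_discriminate}")
print(f"R^2 = {r_squared:.2f}, Wilks' Lambda = {wilks_lambda:.2f}")
print(f"overall classification rate = {hit_rate:.0%}")
```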

Multiple Discriminant Analysis

A multiple discriminant analysis is carried out when the observations are preclassified into more than two groups. The basic idea is first to find a single function that spreads all groups as far apart as possible. Then, a second function is found that best explains any remaining differences among groups, and so on. If there are K groups, then K-1 discriminant functions are found. To illustrate multiple discriminant analysis, we consider an example described in Lehmann, Gupta, and Steckel (1997). In this example, there are five groups of consumers, defined by how much they spend in dollars on their monthly expenditure for food. Table 14.3 shows the five groups together with the averages of the independent variables. The means appear to indicate that the larger spenders are more educated, are younger, have higher incomes, have larger family sizes and shop more extensively. Table 14.3 also shows F-tests for the significance of differences among the five groups on each variable. These tests suggest that family size and income are the most important (i.e., have the highest F values).

[Table 14.3 about here]

A discriminant analysis is run. In the analysis, a few variables are dropped as they do not contribute to discrimination among the groups. We then obtain the standardized and unstandardized discriminant coefficients. As there are 5 groups, we have 5-1 = 4 discriminant functions. Table 14.4 shows the unstandardized and standardized coefficients. The discriminant functions are ranked according to their usefulness for discrimination. In other words, the first function is the most important for discriminating among the five groups, the second one is the second most important, and so on.

[Table 14.4 about here]


From the results on the standardized coefficients, we find that the most important variables in the first function are family size, income and how often they shop. The second function is related to age and family size. Table 14.5 gives the group means for the groups based on the four discriminant functions. Figure 14.2 plots these means for the first and the second functions. We can see that there is a big spread of the means of the five groups along the horizontal axis (Function 1) and less so along the vertical axis (Function 2).

[Table 14.5 about here]

[Figure 14.2 about here]

Measure of Fit
As a measure of fit of the model, we use the hit-miss table. Table 14.6 is such a hit-miss table for the five categories. From the results, we see that the overall classification rate is (20+106+90+84+40)/(34+284+293+181+61) * 100 = 39.86%.

[Table 14.6 about here]

Discriminant analysis rests on two statistical assumptions. First, the independent variables are assumed to be jointly normally distributed. Second, the covariances are assumed to be the same across all groups. When these assumptions are violated, the statistical interpretation of the results becomes very difficult. For instance, while dummy variables are frequently used as independent variables in practice, in theory this is a problem, because a dummy independent variable cannot be normally distributed. To alleviate such statistical difficulties, the method of logistic regression is used. We motivate this method with a managerial problem that all direct marketers face.

Logistic Regression


Catalog companies regularly keep track of Recency, Frequency and Monetary (RFM) variables. There is an interest in relating these RFM measures to purchase behavior, i.e., a buy / no buy decision. These measures can then be used for predicting purchase and for making strategic intervention decisions to increase retention. Table 14.7 contains the summary statistics of such a dataset, where Recency is measured in months since last purchase and Monetary is in dollar amount. Choice is a variable that takes the value 1 if a consumer made a purchase and 0 if she did not.¹

[Table 14.7 about here]

One strategy for estimating the relationship between choice and the RFM measures would be to use an OLS regression with choice as the dependent variable ($y_t$) and the RFM measures as the independent variables ($x_t$). Table 14.8 shows the results for the OLS regression. The results suggest that Recency and Monetary are significant whereas Frequency is not. Further, the $R^2$ is about 0.61 and the adjusted $R^2$ is around 0.60. Despite the high $R^2$, OLS is not appropriate for several reasons. Figure 14.3 plots the predictions of Choice and the true value of Choice for the 100 data points. We see that there are instances where the predictions for choice are either less than zero or greater than one. While this is not surprising given that OLS assumes that the dependent variable is continuous between $-\infty$ and $+\infty$, in the current context these predictions are clearly inconsistent with the data. For instance, how do we interpret a prediction of 1.32 and compare it with a prediction of 1.82? Are both indicating a purchase decision, i.e., should we assume both are just 1 (buy)? Similarly, it is not clear how to interpret a prediction of -0.18 when a value of 0 reflects no purchase. This example shows that when an assumption of the OLS technique (in this case, the continuous distribution of the dependent variable) is violated, its results cannot be interpreted.
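The out-of-range predictions described above are easy to reproduce. The sketch below fits a one-variable OLS regression to a 0/1 outcome; the ten observations and the single covariate are hypothetical and far simpler than the RFM data in Table 14.7.

```python
# Hypothetical data: one covariate and a binary buy (1) / no buy (0) outcome.
covariate = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
choice = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(covariate)
x_bar = sum(covariate) / n
y_bar = sum(choice) / n

# Closed-form OLS estimates for a simple regression.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, choice))
         / sum((x - x_bar) ** 2 for x in covariate))
intercept = y_bar - slope * x_bar

predictions = [intercept + slope * x for x in covariate]
for x, p in zip(covariate, predictions):
    flag = "  <- outside [0, 1]" if p < 0 or p > 1 else ""
    print(f"x = {x:2d}  predicted choice = {p:6.3f}{flag}")
```

The fitted line is unbounded, so the smallest and largest covariate values produce predictions below 0 and above 1 even though the outcome itself never leaves {0, 1}.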


[Table 14.8 about here]

[Figure 14.3 about here]

The dependent variable in the above example is discrete. Such choice scenarios are extremely common. For instance, pharmaceutical companies are interested in predicting whether a physician would prescribe their drug or not, and in the factors that might increase the prescription rate. Similarly, managers in industries with an online presence are interested in identifying factors that can predict which consumers will purchase online (Bellman, Lohse, & Johnson, 1999). While these questions can also be addressed by discriminant analysis, there are other scenarios, such as when the dependent variable is market share (i.e., lies between 0 and 1) and we want to quantify the effect of price and promotions on it, which need a different method that can accommodate such responses.

Model for Logistic Regression

A logistic regression analysis begins with a dependent variable that is either discrete (e.g., buy / no buy) or lies between 0 and 1 (e.g., market share). If we are modeling a discrete decision such as buy / no buy, then we specify the probabilities of the two possible events, i.e., P(Buy) and P(No Buy). As P(Buy) and P(No Buy) are probabilities, they lie between 0 and 1 and sum to 1. Next, we revisit the example with the discrete choice (buy / no buy) and RFM measures that we discussed earlier. We then briefly discuss how the same framework can be applied for analyzing market shares. In the RFM example, the two events are purchase and no purchase. Using P(Buy) and P(No Buy), we can specify the odds of buying as

$$\text{Odds(Buy)} = \frac{P(\text{Buy})}{P(\text{No Buy})} = \frac{P(\text{Buy})}{1 - P(\text{Buy})} \qquad (4)$$



The odds of buying are constrained between 0 and $+\infty$ and take the value 1 if both outcomes are equally likely, i.e., P(Buy) = 0.5 and P(No Buy) = 0.5. We can make the odds lie between $-\infty$ and $+\infty$ by taking the natural log transform. Thus,

$$\text{Log(Odds(Buy))} = \text{Log}\left(\frac{P(\text{Buy})}{P(\text{No Buy})}\right) \qquad (5)$$

As the log odds lie between $-\infty$ and $+\infty$, we can relate them to any independent variables and interpret the effects of the variables in a manner similar to that in OLS; only now the effect of the variables is on the log odds of the dependent variable. Thus, we can write the following equation relating the log odds of purchase for an observation t to the independent variables ($x_t$).

$$\text{Log(Odds(Buy))}_t = x_t'\beta \qquad (6)$$

This can be rewritten as

$$P(\text{Buy})_t = \frac{1}{1 + \exp(-x_t'\beta)} \qquad (7)$$
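The chain from probability to odds to log odds, and back through equation (7), can be traced in a few lines. This is a minimal sketch: the function names, the index values and the coefficient `beta` are hypothetical and are not estimates from the RFM data.

```python
import math


def odds(p):
    """Odds of an event with probability p, as in equation (4)."""
    return p / (1 - p)


def log_odds(p):
    """Natural log of the odds, as in equation (5)."""
    return math.log(odds(p))


def prob_from_index(index):
    """Probability implied by a linear index x'beta, as in equation (7)."""
    return 1 / (1 + math.exp(-index))


# Equal probabilities give odds of 1 and log odds of 0.
print(odds(0.5), log_odds(0.5))

# Equation (7) inverts the log-odds transform.
recovered = prob_from_index(log_odds(0.8))
print(recovered)

# Whatever the index, the implied probability stays strictly between 0 and 1.
probabilities = [prob_from_index(z) for z in (-10, -1, 0, 1, 10)]
print(probabilities)

# A one-unit increase in a covariate with coefficient beta adds beta to the
# log odds, which multiplies the odds by exp(beta).
beta = 0.7            # hypothetical coefficient
base_index = 0.2      # hypothetical value of x'beta before the change
odds_multiplier = (odds(prob_from_index(base_index + beta))
                   / odds(prob_from_index(base_index)))
print(odds_multiplier, math.exp(beta))
```

The last two printed values agree, which is why a coefficient is usually reported both on the log-odds scale and, after exponentiating, as a multiplicative effect on the odds.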

Recall that P(Buy) is the probability of a purchase and hence should always be between 0 and 1. The above expression ensures that this will be the case irrespective of the values of the covariates. We can now use the above model for our example. Table 14.9 shows the results of two logistic regression models estimated using Maximum Likelihood Estimation (MLE): the intercept-only model, where $x_t$ contains only the intercept, and the full model, where $x_t$ contains the intercept and the RFM variables. The results of the full model show that the RFM variables are significant. Further, an increase of 1 month in Recency is associated with an increase of 3.34 in the log odds of buying. We can also calculate the effect on the odds of buying. This would be exp(3.34) or 28.28, i.e., increasing Recency by 1 month multiplies the odds of buying by a factor of 28.28. A similar analysis can be done for the other variables. Note that the RFM estimates are close to the true values of the sensitivities (see Footnote 1). Also note that the frequency sensitivity is significant in this analysis while it was not so using OLS. Thus, OLS can mask the true relationship between variables, and its results can lead to erroneous interpretations when the dependent variable is not continuous.

[Table 14.9 about here]

Figure 14.4 plots the predicted probabilities of buying with the true value of Choice. Notice that, in contrast to the predictions of the OLS regression (Figure 14.3), all predictions lie between 0 and 1. Also, unlike the case of the OLS regression, a higher predicted value has the interpretation of a higher probability of purchase. To see how this probability of purchase varies with a change in one of the covariates, see Figure 14.5. In this figure, we plot the predicted probability of purchase against frequency. For generating this figure, we fixed the recency and monetary variables at their average values. The figure shows that the probability of purchase follows an S-shaped curve as frequency increases.

[Figure 14.4 about here]

[Figure 14.5 about here]

In the above example, we modeled the purchase decision and then related it to RFM measures. As the purchase variable was discrete, we specified the probabilities of purchase and no purchase, measures that lie between 0 and 1. We then specified the odds of purchase and took a log transform to make it lie between $-\infty$ and $+\infty$. We can apply the above framework to analyze market shares (MS) as well. For instance, a brand manager might want to quantify the effect of region-specific prices and promotions on the market share in these regions. In this case, we can begin the analysis by directly specifying the odds of market share, since market share already lies between 0 and 1. Thus,

$$\text{Log(Odds(MS))}_t = \text{Log}\left(\frac{MS}{1 - MS}\right)_t = x_t'\beta \qquad (8)$$

Here, for a region t, the vector $x_t$ will contain the prices and promotions for that region.
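One simple way to work with the market-share specification is to transform the observed shares to log odds and then fit a linear regression on the transformed values. The sketch below does this with a single price covariate. The regional prices and shares are hypothetical, and fitting by OLS on the log odds is an assumption made for illustration; the text does not state which estimation method the authors use for share data.

```python
import math

# Hypothetical regional data: price and market share (a proportion between 0 and 1).
prices = [2.0, 2.5, 3.0, 3.5, 4.0]
shares = [0.40, 0.33, 0.25, 0.20, 0.14]

# Left-hand side of the market-share equation: the log odds of market share.
log_odds_ms = [math.log(s / (1 - s)) for s in shares]

# Simple OLS of the log odds on price (assumed estimation method).
n = len(prices)
x_bar = sum(prices) / n
y_bar = sum(log_odds_ms) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(prices, log_odds_ms))
         / sum((x - x_bar) ** 2 for x in prices))
intercept = y_bar - slope * x_bar


def predicted_share(price):
    """Map the fitted log odds back to a share between 0 and 1."""
    return 1 / (1 + math.exp(-(intercept + slope * price)))


print(f"price coefficient on the log-odds scale: {slope:.3f}")
for p in (0.0, 3.0, 10.0):
    print(f"price {p:4.1f} -> predicted share {predicted_share(p):.4f}")
```

Even at prices far outside the observed range, the back-transformed prediction remains a valid share, which is the practical benefit of modeling the log odds rather than the share itself.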

Measures of Fit
There are several measures of model fit that are used for testing the suitability of logistic regression models. Most of these measures are based on the log-likelihood, which is as follows.

$$LL(\beta) = \sum_t \ln(L_t) \qquad (9)$$

Here, $\beta$ is the entire set of MLE parameters (the intercept and the coefficients of the other explanatory variables).

Likelihood Ratio Test

The most commonly used likelihood ratio test has the following test statistic:

$$-2\,(LL(\beta_C) - LL(\beta)) \qquad (10)$$

Here, $LL(\beta_C)$ refers to the log-likelihood of the data when an intercept-only model is run. Suppose there are K parameters in the model (including the intercept); then the above statistic is distributed $\chi^2$ with K-1 degrees of freedom (Theil, 1971). Thus, the test statistic measures whether the increase in the likelihood caused by the inclusion of the explanatory variables (over and above the intercept) is statistically significant relative to a model containing only the intercept.


In our example, $-2LL(\beta)$ is 30.489 while $-2LL(\beta_C)$ is 137.628. Thus, the test statistic takes a value of 107.139. The degrees of freedom are 4-1 = 3. The critical value of a $\chi^2$ with 3 degrees of freedom at the 0.001 level is 16.26. Thus, the likelihood of a model that has the RFM measures is significantly better than that of a model with just the intercept.
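The likelihood ratio test above reduces to a subtraction and a comparison. The following sketch reproduces it from the values quoted in the text; the variable names are illustrative, and the critical value is taken from the text rather than computed.

```python
# -2 x log-likelihood for the two models, as quoted in the text.
neg2_ll_intercept_only = 137.628
neg2_ll_full = 30.489

# Test statistic: -2 * (LL(intercept only) - LL(full)).
lr_statistic = neg2_ll_intercept_only - neg2_ll_full

# Degrees of freedom: parameters in the full model minus the intercept.
k_full = 4  # intercept plus Recency, Frequency and Monetary
degrees_of_freedom = k_full - 1

# Chi-squared critical value for 3 df at the 0.001 level, as quoted in the text.
critical_value = 16.26
full_model_is_better = lr_statistic > critical_value

print(f"LR statistic = {lr_statistic:.3f} on {degrees_of_freedom} df")
print(f"reject intercept-only model at 0.001: {full_model_is_better}")
```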

Akaike Information Criterion (AIC)

AIC provides a way of adjusting the log-likelihood of a model for the number of parameters in the model. This adjustment corrects for overfitting of the data. The expression for this statistic is as follows.

$$AIC = -2\,LL(\beta) + 2K \qquad (11)$$

Here, K is the dimension of $\beta$. Lower values of AIC denote a better model. Thus, a model with a very large number of variables might have a high likelihood (and hence a low $-2LL$), but it will also be penalized for the number of variables. In our example, we can calculate the AIC of the intercept-only model ($AIC_{int}$) and the AIC of the model containing the intercept and RFM measures ($AIC_{full}$). These are as follows.

$$AIC_{int} = 137.628 + 2(1) = 139.628 \qquad (12a)$$

$$AIC_{full} = 30.489 + 2(4) = 38.489 \qquad (12b)$$

Thus, the full model has a better (i.e., lower) AIC than the intercept-only model.

Likelihood Ratio Index ($\rho^2$)

The likelihood ratio index is similar to the $R^2$ in regular regression models. It is defined as follows.

$$\rho^2 = 1 - \frac{LL(\beta)}{LL(\beta_C)} \qquad (13)$$

Here, $LL(\beta)$ is -15.24 (= -30.489/2) and $LL(\beta_C)$ is -68.81 (= -137.628/2). Thus, the value of $\rho^2$ is 0.78. Like the $R^2$, the $\rho^2$ of a model will always increase or at least stay the same when new variables are added. There is another statistic, the adjusted likelihood ratio index ($\bar{\rho}^2$), that penalizes for the increase in the number of parameters. This statistic is similar to the adjusted $R^2$.

$$\bar{\rho}^2 = 1 - \frac{LL(\beta) - K}{LL(\beta_C) - 1} \qquad (14)$$

In our example, this statistic will be the following.

$$\bar{\rho}^2 = 1 - \frac{15.24 + 4}{68.81 + 1} = 0.72$$
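The AIC and likelihood ratio index calculations can all be reproduced from the two log-likelihoods. The sketch below follows the formulas as given in the text, including the adjusted index with $LL(\beta_C) - 1$ in the denominator; the variable names are illustrative.

```python
# Log-likelihoods recovered from the -2LL values quoted in the text.
ll_full = -30.489 / 2
ll_intercept_only = -137.628 / 2
k_full = 4            # intercept plus three RFM coefficients
k_intercept_only = 1

# AIC = -2 LL + 2K; lower is better.
aic_intercept_only = -2 * ll_intercept_only + 2 * k_intercept_only
aic_full = -2 * ll_full + 2 * k_full

# Likelihood ratio index and its adjusted version.
rho_squared = 1 - ll_full / ll_intercept_only
rho_squared_adjusted = 1 - (ll_full - k_full) / (ll_intercept_only - 1)

print(f"AIC (intercept only) = {aic_intercept_only:.3f}")
print(f"AIC (full)           = {aic_full:.3f}")
print(f"rho^2                = {rho_squared:.2f}")
print(f"adjusted rho^2       = {rho_squared_adjusted:.2f}")
```

Both log-likelihoods are negative, so the signs cancel in each ratio; this is why the worked example in the text can be written with positive numbers.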


Hit Rate
Another measure that is typically used to test the fit of a model is the hit rate. For computing this measure, we take the predicted probabilities of the events from the logistic regression and employ a cutoff value for making discrete predictions for the occurrence of an event. We then compare the predicted events with the actual events to determine the percentage of times in the dataset that the two are the same. In our example, the two events are buy / no buy. The results from the logistic regression estimation provide the probability of purchase. We put a cutoff at 0.5, i.e., if the predicted probability of purchase for an observation is above 0.5, then we predict a purchase for that observation; otherwise we predict no purchase. We then compare these predictions with actual events. We find that, using the full model, we correctly classify 94 out of the 100 observations. Thus, the hit rate is 94%.

The hit rate as a statistic for model accuracy has a few limitations. First, the cutoff is arbitrary. Here we took a cutoff of 0.5. We could have chosen any other cutoff value as well. Second, the hit rate is not very useful when the data are skewed. Suppose we have a dataset where there are many observations with no purchase and few observations with purchase. Then a model that predicts no purchase for all observations will do well on the hit rate.

In most applications, the data are also typically split into a calibration sample and a holdout sample. The model is estimated on the calibration sample and is then used to predict the observations in the holdout sample. Almost always, the hit rate within the holdout sample is lower than the hit rate within the calibration sample.

Thus far, we have considered instances when the dependent variable is binary (or lies between 0 and 1, e.g., market share) and logistic regression is readily applicable. There are also scenarios where the dependent variable can take multiple values. For instance, in the antihistamine category, there are 4 major drugs: Claritin, Zyrtec, Allegra and Clarinex. A doctor might prescribe one of these drugs to a patient. It is of much interest to pharmaceutical companies to quantify the factors that can predict when a doctor is most likely to prescribe their drug. Situations that have a multinomial dependent variable cannot be analyzed with a binary logistic regression. Next, we describe a method that can analyze such situations.
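The hit-rate calculation described in this section, together with its two limitations (dependence on the cutoff and sensitivity to skewed data), can be sketched as follows. The predicted probabilities and outcomes are hypothetical and are not the 100 observations from the RFM example.

```python
def hit_rate(predicted_probs, actual, cutoff=0.5):
    """Share of observations whose thresholded prediction matches the outcome."""
    predictions = [1 if p > cutoff else 0 for p in predicted_probs]
    hits = sum(1 for pred, act in zip(predictions, actual) if pred == act)
    return hits / len(actual)


# Hypothetical predicted purchase probabilities and actual buy (1) / no buy (0).
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
actual = [1, 1, 1, 0, 0, 0]

# Limitation 1: the hit rate changes with the (arbitrary) cutoff.
rate_at_half = hit_rate(probs, actual, cutoff=0.5)
rate_at_low = hit_rate(probs, actual, cutoff=0.3)
print(f"cutoff 0.5: {rate_at_half:.3f}   cutoff 0.3: {rate_at_low:.3f}")

# Limitation 2: with skewed data, predicting "no purchase" for everyone
# still scores a high hit rate.
skewed_actual = [1] * 5 + [0] * 95
always_no_purchase = [0.0] * 100
naive_rate = hit_rate(always_no_purchase, skewed_actual)
print(f"naive model on skewed data: {naive_rate:.2f}")
```

The naive model never identifies a single purchaser, yet its hit rate is high simply because non-purchasers dominate the sample.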

Multinomial Logit Model

Consider the case of a consumer packaged goods manufacturer in the grocery industry. The company is interested in predicting which brands its customers will choose on a shopping occasion and how prices and promotions might affect this choice. For example, Figure 14.6 shows the variation in market share of a brand with changes in promotion. In this figure, we find that there is an increase in market share (shown in blue) whenever there is a dip in prices (shown in red). Further, the presence of various promotional vehicles such as feature, display and coupons affects these shares. A quantitative analysis of such a problem can help retailers understand the effect of brand promotions (Gupta, 1988), aid in appropriately setting retail prices, and determine the product portfolio that they should carry (Draganska & Jain, 2005). The multinomial logit model is the most popular model for analyzing such scenarios. Next, we develop this model within a random utility framework.

[Figure 14.6 about here]

Random Utility Theory

Assume that a consumer assigns a level of attractiveness to each discrete alternative in her choice set. This attractiveness number for an alternative, a single index, conveys how much the consumer likes that alternative. Thus, all the information present in the attributes of the alternative is collapsed into this single index. This alternative-specific index is typically called utility. For an alternative j and time t, we will specify the utility ($U_{jt}$) to be composed of two components. One component is called the systematic component (denoted by $V_{jt}$). This is deterministic and contains the effects of covariates on the utility. The second component is called the random component (denoted by $\varepsilon_{jt}$). This contains any other random factors that affect the consumer's choice. Thus,

$$U_{jt} = V_{jt} + \varepsilon_{jt} \qquad (16a)$$

or,

$$U_{jt} = x_{jt}'\beta + \varepsilon_{jt} \qquad (16b)$$

Here, for time t, $x_{jt}$ contains covariates associated with alternative j and $\beta$ is a vector of parameters. We assume that decision makers choose the alternative that gives them the maximum utility. Also, for all alternatives the random components are independent and identically Gumbel distributed. This particular choice of the error distribution leads to the following expression for the probability of choice of an alternative j out of the possible J alternatives in a choice set.

$$P(\text{Choice} = j) = \frac{\exp(x_{jt}'\beta)}{\sum_{k=1}^{J} \exp(x_{kt}'\beta)}$$


The above expression is intuitive to understand. The numerator can be interpreted as the strength of alternative j while the denominator is the sum of the strengths of all alternatives. Thus, the probability expression essentially is the relative strength of alternative j. For a detailed description of how this probability expression is attained from the assumptions of the error distributions, see Ben-Akiva and Lerman (1985) or Train (2003).
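The "relative strength" reading of the choice probability translates directly into code. The sketch below computes multinomial logit probabilities from a list of systematic utilities; the utility values are hypothetical. It also checks two properties that follow from the formula: adding the same constant to every utility leaves the probabilities unchanged, and with two alternatives the expression reduces to the logistic form of equation (7).

```python
import math


def mnl_probabilities(systematic_utilities):
    """Choice probabilities: exp(V_j) divided by the sum of exp(V_k) over all k."""
    # Subtracting the maximum avoids overflow and does not change the result,
    # because only differences in utility matter.
    v_max = max(systematic_utilities)
    strengths = [math.exp(v - v_max) for v in systematic_utilities]
    total = sum(strengths)
    return [s / total for s in strengths]


# Hypothetical systematic utilities for four alternatives.
utilities = [1.0, 2.0, 2.5, 3.0]
probs = mnl_probabilities(utilities)
print("choice probabilities:", [round(p, 4) for p in probs])

# Adding 5 to every utility leaves the probabilities unchanged.
shifted_probs = mnl_probabilities([v + 5 for v in utilities])
print("after adding 5:      ", [round(p, 4) for p in shifted_probs])

# With two alternatives, one of them normalized to utility 0, the probability
# of the other is the logistic function of its utility.
binary_prob = mnl_probabilities([0.8, 0.0])[0]
logistic_prob = 1 / (1 + math.exp(-0.8))
print(binary_prob, logistic_prob)
```

The alternative with the highest systematic utility has the highest choice probability, but every alternative retains some positive probability because of the random component.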

The above expression also shows that the logistic regression model is a special case of the multinomial logit model with binary outcomes. Thus, we can also arrive at the expressions for the probabilities of the logistic regression by beginning with a random utility specification for the binary outcomes. We can apply the above model to an example from the grocery industry. The data for this example, made available by A.C. Nielsen, were collected from January 1993 to May 1995. We use a sample of 300 people who purchased in the Breakfast Foods category. There are four major brands in this category.² For each brand, we have the price and promotion variation over time, which enter the vector $x_{jt}$. In this application, promotion is a dummy variable created by combining various promotional vehicles such as feature and display. Table 14.10 shows the summary statistics for the Breakfast Foods data.


We estimate the parameters of the multinomial logit model using MLE on these data. Before looking at the results, we need to discuss identification. Restrictions must be imposed so that the model is identified, i.e., only one set of parameters maximizes the likelihood. The restriction corresponds to setting any one of the brand intercepts to zero. This is because only differences in utility matter in specifying which brand a consumer will choose. This can be seen from the following illustration. Suppose the four brands have the following utilities: $U_{1t}$ = 10, $U_{2t}$ = 20, $U_{3t}$ = 25 and $U_{4t}$ = 30. Then, a consumer will choose Brand 4 as it has the maximum utility. Now, suppose we add 5 units to each brand-specific utility. Then, the utilities for the four brands will be the following: $U_{1t}$ = 15, $U_{2t}$ = 25, $U_{3t}$ = 30 and $U_{4t}$ = 35. This addition of 5 units will not change the chosen brand. A consumer will still choose Brand 4. Thus, the absolute values of the utilities do not matter. It is only the relative differences in the utilities among the brands that do. We will arbitrarily set the intercept of Brand 4 to zero. Thus, the intercepts of the other three brands will be interpreted as being relative to Brand 4. This is similar to the interpretation of a dummy variable in a regression.

[Table 14.10 about here]

Table 14.11 contains the MLE estimates of two multinomial logit models. The brand-specific intercepts only model contains only the alternative-specific intercepts. The full model contains both the intercepts and the price and promotion covariates. The estimates of the full model show that the coefficient of price is negative (as it should be) whereas the coefficient for promotion is positive (again as expected). While these results are intuitive, a more managerially relevant goal is to estimate the impact of changing the price (or promotion) of brand j on the probability of choice of brand j as well as on the probability of choosing any other brand k. A variable used for quantifying such effects is elasticity.

[Table 14.11 about here]

Elasticity from the Logit Model

The systematic component for a brand j contains price and promotion. Thus,

$$V_{jt} = \alpha_j + \beta_{price} \cdot Price_{jt} + \beta_{prom} \cdot Prom_{jt}$$

Here, $\alpha_j$ is the intercept for alternative j, $\beta_{price}$ is the price sensitivity and $\beta_{prom}$ is the promotion sensitivity. The elasticity of any dependent variable with respect to an independent variable is the percent change in the dependent variable following a 1% change in the independent variable. As an example, suppose the price of brand j is changed; then the own-price elasticity can be ascertained by estimating the percent change in the probability of purchasing brand j after a 1% change in its price. Similarly, the cross-price elasticity on a brand k can be evaluated by considering the percent change in the probability of purchasing brand k following a 1% change in the price of brand j. For the multinomial logit model, the expressions for the own-price and cross-price elasticity are closed-form and are determined by the multinomial logit probabilities. These expressions are as follows.
$$\eta^{price}_{jj} = \frac{\partial P(j)}{\partial Price_j} \cdot \frac{Price_j}{P(j)} = \beta_{price}\,(1 - P(j))\,Price_j$$

$$\eta^{price}_{kj} = \frac{\partial P(k)}{\partial Price_j} \cdot \frac{Price_j}{P(k)} = -\beta_{price}\,P(j)\,Price_j$$

Here, $\eta^{price}_{jj}$ denotes the own-price elasticity of brand j and reflects the percentage change in the probability of buying brand j with a 1% change in the price of brand j. And $\eta^{price}_{kj}$ is the cross-price elasticity of brand k and reflects the percentage change in the probability of buying brand k with a 1% change in the price of brand j. Notice that the cross-price elasticity for brand k does not depend on the attributes of brand k. Thus, the cross-price elasticity arising from a change in brand j is the same for all other brands. This property, termed uniform cross-elasticity, is a consequence of the expression of the multinomial logit probabilities.
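The closed-form elasticities can be checked against a direct numerical calculation: raise one brand's price by a tiny percentage and measure the percentage change in each choice probability. The sketch below does this for a price-only specification. The intercepts, prices and price coefficient are hypothetical and are not the estimates in Table 14.11.

```python
import math

# Hypothetical parameters for four brands (Brand 4 intercept normalized to 0).
beta_price = -1.5
intercepts = [0.5, 0.2, 0.8, 0.0]
prices = [2.0, 2.2, 1.8, 2.5]


def choice_probabilities(price_list):
    """Multinomial logit probabilities with V_j = alpha_j + beta_price * Price_j."""
    strengths = [math.exp(a + beta_price * p) for a, p in zip(intercepts, price_list)]
    total = sum(strengths)
    return [s / total for s in strengths]


base = choice_probabilities(prices)
j = 0  # the brand whose price changes

# Closed-form own- and cross-price elasticities with respect to the price of brand j.
own_elasticity = beta_price * (1 - base[j]) * prices[j]
cross_elasticity = -beta_price * base[j] * prices[j]

# Numerical elasticities: percent change in probability per percent change in price.
relative_change = 1e-6
bumped_prices = list(prices)
bumped_prices[j] *= 1 + relative_change
bumped = choice_probabilities(bumped_prices)
numerical = [(bumped[k] - base[k]) / base[k] / relative_change
             for k in range(len(prices))]

print(f"own-price elasticity:   closed form {own_elasticity:.4f}, numerical {numerical[j]:.4f}")
for k in range(len(prices)):
    if k != j:
        print(f"cross elasticity on brand {k + 1}: closed form {cross_elasticity:.4f}, "
              f"numerical {numerical[k]:.4f}")
```

The numerical cross elasticities are identical across the other three brands, which is the uniform cross-elasticity property in action.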

Table 14.12 contains the price elasticity measures for the full model. To estimate these elasticity measures, we calculate the own- and cross-price elasticities for each brand and for every observation. Then, we average these measures over all observations in the dataset. We can use these numbers to interpret the impact of changing the price of a brand on its own share as well as on the shares of other brands. We find that a 1% increase in the price of Brand 1 lowers the probability of choosing Brand 1 by about 4.5%. Similarly, a 1% increase in the price of Brand 1 increases the probability of choosing each of the other brands by 0.85%. A similar analysis can be conducted for the other brands.

[Table 14.12 about here]

The elasticity measures also show an interesting property. From the summary statistics, we know that Brand 3 has the highest share. If we now consider the elasticity measures, we notice that Brand 3 has the lowest own-price elasticity (in absolute value) and the highest cross-price elasticity. This is a limitation of the elasticity measures resulting from the multinomial logit model: because the own-price elasticity is proportional to 1 - P(j) and the cross-price elasticity is proportional to P(j), high market share brands show low own-price elasticity and high cross-price elasticity. Note that we showed elasticity measures for the multinomial logit model. Similar measures can also be calculated for the logistic regression model.
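The closed-form elasticity expressions can be illustrated with a minimal Python sketch. It plugs the full-model estimates from Table 14.11 into the logit probabilities and evaluates the elasticities for a single purchase occasion. The occasion itself is an assumption made for illustration (average prices from Table 14.10, no promotion), so the numbers will not reproduce Table 14.12, which averages over every observation in the data. The function names are hypothetical.

```python
import math

# Full-model estimates from Table 14.11; Brand 4 is the base alternative (intercept 0).
INTERCEPTS = [0.06, -0.64, 1.91, 0.0]
BETA_PRICE = -3.03
BETA_PROM = 0.44


def choice_probabilities(prices, promos):
    """Multinomial logit choice probabilities for one purchase occasion."""
    utilities = [a + BETA_PRICE * p + BETA_PROM * d
                 for a, p, d in zip(INTERCEPTS, prices, promos)]
    top = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def price_elasticities(prices, promos):
    """Return table[k][j]: % change in P(k) for a 1% change in the price of j."""
    probs = choice_probabilities(prices, promos)
    n = len(prices)
    table = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            if k == j:
                # own-price elasticity
                table[k][j] = BETA_PRICE * (1.0 - probs[j]) * prices[j]
            else:
                # cross-price elasticity: does not depend on k
                table[k][j] = -BETA_PRICE * probs[j] * prices[j]
    return table


# Assumed occasion: average prices from Table 14.10 and no brand on promotion.
prices = [1.75, 1.58, 1.91, 1.94]
promos = [0.0, 0.0, 0.0, 0.0]
elasticities = price_elasticities(prices, promos)
for k, row in enumerate(elasticities, start=1):
    print(f"P(brand {k}):", [round(e, 2) for e in row])
```

Each column of the printed table corresponds to a price change for one brand. Within a column, all off-diagonal entries are identical, which is the uniform cross-elasticity property.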


Fit Measures
In this application, we can calculate all the fit measures that we specified in the section on logistic regression.

Likelihood Ratio Test

A typical likelihood ratio test involves comparing a model with only alternative-specific intercepts with a model that contains alternative-specific intercepts together with other explanatory variables. Let $LL(C)$ refer to the log-likelihood of the data when only intercepts are included in the model, while $LL(\beta)$ denotes the log-likelihood when the model contains intercepts together with the price and promotion covariates. In our example, from Table 14.11, $-2LL(\beta)$ is 3033.92 while $-2LL(C)$ is 4321.88. Thus, the test statistic, $-2[LL(C) - LL(\beta)]$, takes a value of 1287.96. The degrees of freedom are 5 - 3 = 2. The critical value of a $\chi^2$ with 2 degrees of freedom at the 0.001 level is 13.82. Thus, a model that contains the price and promotion covariates fits the data significantly better than a model without them.
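The arithmetic of this test can be checked with a short Python sketch. It relies on the fact that a chi-square variable with exactly 2 degrees of freedom has survival function exp(-x/2), so the critical value and p-value have closed forms and no statistics library is needed. The variable names are illustrative.

```python
import math

neg2ll_intercepts = 4321.88  # -2 LL(C), intercepts-only model (Table 14.11)
neg2ll_full = 3033.92        # -2 LL(beta), full model (Table 14.11)
df = 5 - 3                   # parameters in the full model minus intercepts-only

lr_statistic = neg2ll_intercepts - neg2ll_full

# For a chi-square with 2 degrees of freedom, P(X > x) = exp(-x / 2).
alpha = 0.001
critical_value = -2.0 * math.log(alpha)
p_value = math.exp(-lr_statistic / 2.0)

print(round(lr_statistic, 2), round(critical_value, 2), lr_statistic > critical_value)
# prints: 1287.96 13.82 True
```

The closed form applies only to 2 degrees of freedom; for other cases a chi-square table or a statistics package would be needed.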

Likelihood Ratio Index ($\rho^2$)

The likelihood ratio index is described as follows.

$$\rho^2 = 1 - \frac{LL(\beta)}{LL(C)} \qquad (20)$$

In the current application, $LL(\beta)$ is -1516.96 and $LL(C)$ is -2160.94. Thus, $\rho^2$ is 0.30.

As explained earlier, the adjusted $\rho^2$ has the following expression.

$$\bar{\rho}^2 = 1 - \frac{LL(\beta) - K}{LL(C) - P} \qquad (21)$$

Here, K is the total number of parameters including the intercepts and other covariates (K = 5), while P is the number of intercepts (P = 3). Thus, $\bar{\rho}^2$ is 0.297 (about 0.30), marginally below the unadjusted value of 0.298.
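As a quick check of equations (20) and (21), the following Python sketch evaluates both indices from the log-likelihoods reported above, taking K = 5 and P = 3 as in the likelihood ratio test. The variable names are illustrative.

```python
ll_full = -1516.96        # LL(beta): intercepts plus price and promotion
ll_intercepts = -2160.94  # LL(C): intercepts only
k_params = 5              # K: all parameters in the full model
p_intercepts = 3          # P: number of intercepts

rho_sq = 1.0 - ll_full / ll_intercepts
rho_sq_adj = 1.0 - (ll_full - k_params) / (ll_intercepts - p_intercepts)

print(round(rho_sq, 3), round(rho_sq_adj, 3))
# prints: 0.298 0.297
```

With roughly 1,500 log-likelihood units at stake, the penalty for the two extra parameters is small, so the adjusted index is only slightly lower than the unadjusted one.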

Hit Rate
In a multinomial logit model, the probabilities of choice of each alternative have a closed-form expression. The predicted probabilities for choosing each alternative can then be easily calculated by inserting the MLE estimates in the probability expressions. We can then predict the alternative that is most likely to be chosen (the brand with the highest probability) and compare it with the brand that is actually chosen. If the two are the same, we have a hit (i.e., a correct prediction); otherwise, the prediction is wrong. We calculate the hit rate for the intercepts-only model and the full model. Table 14.11 reports these results. We find that the hit rate for the intercepts-only model is around 48.3% while the hit rate for the full model is considerably higher at 63.2%.
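The hit-rate calculation described above can be sketched as follows. The predicted probabilities and choices below are hypothetical values made up for illustration; they are not taken from the Breakfast Foods data, and `hit_rate` is an illustrative helper name.

```python
def hit_rate(predicted_probs, chosen):
    """Share of observations whose most probable alternative was actually chosen."""
    hits = 0
    for probs, actual in zip(predicted_probs, chosen):
        # index of the alternative with the highest predicted probability
        predicted = max(range(len(probs)), key=lambda j: probs[j])
        hits += int(predicted == actual)
    return hits / len(chosen)


# Hypothetical predicted probabilities for five purchase occasions and four brands.
example_probs = [
    [0.10, 0.20, 0.60, 0.10],
    [0.50, 0.20, 0.20, 0.10],
    [0.25, 0.15, 0.45, 0.15],
    [0.05, 0.70, 0.20, 0.05],
    [0.30, 0.10, 0.40, 0.20],
]
example_choices = [2, 0, 0, 1, 3]  # zero-based index of the brand actually chosen

rate = hit_rate(example_probs, example_choices)
print(rate)  # three of the five occasions are predicted correctly
```

In an intercepts-only model the predicted probabilities are the same on every occasion, so the model always predicts the highest-share brand; this is why its hit rate in Table 14.11 is close to Brand 3's market share.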

Independence of Irrelevant Alternatives (I.I.A.)

The multinomial logit model has several properties. One property that we discussed was the uniform cross-elasticity. Another property that has been especially emphasized is that of I.I.A. The property can be best illustrated by revisiting the expressions for the probabilities from the logit model. Suppose we consider the probability of choice of two alternatives, i and j, denoted by P(i) and P(j) respectively, then,

$$\frac{P(i)}{P(j)} = \frac{e^{x_{it}'\beta} \Big/ \sum_{k=1}^{J} e^{x_{kt}'\beta}}{e^{x_{jt}'\beta} \Big/ \sum_{k=1}^{J} e^{x_{kt}'\beta}} = \frac{e^{x_{it}'\beta}}{e^{x_{jt}'\beta}} \qquad (22)$$

Equation (22) shows that the ratio of the probabilities of choosing two alternatives, i and j, is independent of the presence of other alternatives and depends only on the systematic utilities of the two alternatives. Thus, even if a new alternative very similar to i enters the market, it will not make a difference in the relative probabilities of choosing i and j. This result is a direct consequence of the independence assumption among the errors of the alternative-specific utilities. This assumption can be tenuous in many contexts. A famous problem, called the red bus / blue bus problem, illustrates the I.I.A. issue. The problem is as follows. Suppose consumers are choosing between a car and a blue bus as means of transportation and suppose they like both modes of transport equally. The probability of choosing either a car or a blue bus is 0.5. In other words,

$$\frac{P(\text{choose car})}{P(\text{choose blue bus})} = 1. \qquad (23)$$

Recall that the I.I.A. property dictates that this ratio should remain the same irrespective of the choice set. Now, suppose a red bus, similar to the blue bus in all respects except the color, is introduced as a means of transport. Then, we would expect that consumers will be equally likely to choose a red or a blue bus. This equality together with the above equality implies the following.

$$P(\text{choose car}) = P(\text{choose blue bus}) = P(\text{choose red bus}) = 1/3. \qquad (24)$$

This result is not appealing, as consumers will most likely consider both bus types as one alternative. If this is the case, then the following probabilities are more reasonable.

$$P(\text{choose car}) = 1/2; \quad P(\text{choose red bus}) = P(\text{choose blue bus}) = 1/4. \qquad (25)$$

Thus, the I.I.A. property can constrain the probabilities in such a way that, in some contexts, we get results that are unrealistic. There are several ways of correcting this problem. One alternative is to allow for a tree structure for consumer choice. We can achieve this with a nested-logit model (Ben-Akiva, 1973) that allows for correlation among the utilities of alternatives only within a nest. A second alternative is to allow for heterogeneity in customers' parameters; then, at the aggregate level, the IIA property disappears (see Chapter 19 of this book). A third method is to allow for the brand utilities to be correlated, as is done by the multinomial probit model. We discuss this last method later.
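The red bus / blue bus result in equations (23) and (24) can be reproduced with a few lines of Python. The sketch assumes that all modes have the same systematic utility, normalized to zero, which is one way of encoding "equally liked"; the function name is illustrative.

```python
import math


def logit_probabilities(utilities):
    """Multinomial logit probabilities from a dict of systematic utilities."""
    weights = {name: math.exp(v) for name, v in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# Equal systematic utilities (set to zero) for every mode: an assumption.
before = logit_probabilities({"car": 0.0, "blue bus": 0.0})
after = logit_probabilities({"car": 0.0, "blue bus": 0.0, "red bus": 0.0})

ratio_before = before["car"] / before["blue bus"]
ratio_after = after["car"] / after["blue bus"]

print(before["car"], round(after["car"], 4), ratio_before, ratio_after)
```

The car-to-blue-bus ratio stays at 1 after the red bus is added, so the logit model is forced to the one-third split of equation (24) and cannot produce the more plausible probabilities in equation (25).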

Sampling of Alternatives
In the above analysis, we had just four alternatives. There are many instances, however, where the number of alternatives can be much larger. For example, if retail store managers want to evaluate the effect of price and promotion at the UPC level rather than at the brand level, then the number of alternatives can be in the hundreds. In that case, evaluating the denominator (the sum of strengths of all alternatives) in the probability expression of choosing a particular alternative becomes computationally burdensome. One method for circumventing this problem is to sample a set of alternatives from the entire set of possible alternatives and then evaluate the probabilities. The following example illustrates this method.

Suppose we wish to model consumers choosing a mutual fund from all available mutual funds. There are many mutual funds that consumers can choose from: the latest figures suggest that there are more than 8000 mutual funds in the US alone (Investment Company Institute, 2005). It is impractical to use all 8000 or more of these funds while evaluating the probability of choosing a specific one. What we can do is randomly sample a small number of these mutual funds, for example 10, to form the set of alternatives. While sampling, for each observation we have to ensure that the mutual fund chosen for that observation is in the constructed set of alternatives (else how could the consumer have chosen that mutual fund if it were not in her set of alternatives?). To ensure this, we include the mutual fund that was chosen on that observation and randomly sample 9 others from the rest of the alternatives. We can then estimate the parameters of the model in exactly the same manner as described above in the Breakfast Foods example. Under the logit model, the MLE parameter estimates obtained from a set of alternatives constructed with such a random sampling scheme are consistent for the same parameters as estimates obtained from all the alternatives, although they will not be numerically identical in any given sample. For a more detailed description of sampling, see Ben-Akiva and Lerman (1985).

There are other methods for sampling of alternatives, such as importance sampling. This sort of sampling scheme is typically used when there is a need to oversample an alternative. For example, in the previous example of mutual funds, suppose we find that many consumers are choosing a few mutual funds; then a sampling scheme should take these skewed choices into account when selecting samples. An importance sampling scheme does exactly that. Here, while estimating the model, a correction factor is included to account for the non-random sampling. Train, Ben-Akiva, and Atherton (1989) show an application of such a sampling scheme in the context of consumers choosing long distance plans and minutes of consumption.
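The random sampling scheme for the mutual fund example (keep the chosen fund, then draw 9 others at random) can be sketched as follows. The fund identifiers and the helper name `sample_choice_set` are hypothetical and are used only for illustration.

```python
import random


def sample_choice_set(chosen, all_alternatives, set_size=10, rng=random):
    """Build a choice set containing the chosen alternative plus random others."""
    others = [a for a in all_alternatives if a != chosen]
    sampled = rng.sample(others, set_size - 1)  # draw without replacement
    choice_set = [chosen] + sampled
    rng.shuffle(choice_set)  # avoid always placing the chosen fund first
    return choice_set


# Hypothetical universe of 8000 funds and one observed choice.
funds = [f"fund_{i}" for i in range(8000)]
rng = random.Random(0)  # fixed seed so the illustration is reproducible
choice_set = sample_choice_set("fund_42", funds, set_size=10, rng=rng)
print(choice_set)
```

A separate choice set would be drawn for every observation. Because every non-chosen fund has the same chance of being sampled, no correction term is needed in the logit likelihood; an importance sampling scheme would require one.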

Multinomial Probit Model

There are several instances when there is a need to allow the utility errors of the alternatives to be correlated. For example, consumers typically choose between different modes of transport such as bus, car, train and others. They can also be using a combination of these alternatives for commuting e.g., a mix of car and train (Currim, 1982). In such a scenario, the errors in the utility of choosing a car, train and the alternative representing a combination of car and train can be correlated (i.e., cannot be assumed to be independent). Clearly, we need a model that is flexible enough to capture any possible correlation.


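A multinomial probit model allows for the utility errors to be correlated and to have different variances (i.e., different scales for different alternatives). It also requires several identification restrictions (Keane, 1992). We show these restrictions in the simplest setting, a choice model with three alternatives. The utilities for the three alternatives are given as follows.

$$\begin{aligned} U_{1t} &= \alpha_1 + x_{1t}'\beta_1 + \varepsilon_{1t} \\ U_{2t} &= \alpha_2 + x_{2t}'\beta_2 + \varepsilon_{2t} \\ U_{3t} &= \alpha_3 + x_{3t}'\beta_3 + \varepsilon_{3t} \end{aligned} \qquad (26)$$

Here, we have intentionally separated the intercepts from the other covariates to show the identification conditions. Also note that we have assumed a general model where the parameters for the covariates are alternative-specific. The errors are assumed to have the following distributional specification.

$$\begin{pmatrix} \varepsilon_{1t} \\ \varepsilon_{2t} \\ \varepsilon_{3t} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}\right) \qquad (27)$$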


As only differences in utilities matter, we can rewrite the above utilities in the following manner.

$$\begin{aligned} Y_{1t} &= U_{1t} - U_{3t} = (\alpha_1 - \alpha_3) + (x_{1t}'\beta_1 - x_{3t}'\beta_3) + (\varepsilon_{1t} - \varepsilon_{3t}) \\ Y_{2t} &= U_{2t} - U_{3t} = (\alpha_2 - \alpha_3) + (x_{2t}'\beta_2 - x_{3t}'\beta_3) + (\varepsilon_{2t} - \varepsilon_{3t}) \end{aligned} \qquad (28)$$

Let $\eta_{1t}$ be $(\varepsilon_{1t} - \varepsilon_{3t})$ and $\eta_{2t}$ be $(\varepsilon_{2t} - \varepsilon_{3t})$; then the joint distributional specification is as follows.

$$\begin{pmatrix} \eta_{1t} \\ \eta_{2t} \end{pmatrix} \sim N\left(0,\; \begin{pmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{pmatrix}\right) \qquad (29)$$

We now state the identification conditions. First, note that only the differences in the intercepts, $(\alpha_1 - \alpha_3)$ and $(\alpha_2 - \alpha_3)$, enter $Y_{1t}$ and $Y_{2t}$. Thus, it is only these differences among the intercepts and not their absolute values that are estimable. We can, therefore, without loss of generality set $\alpha_3$ to 0. Second, unlike a linear regression where the dependent variable is observable, utilities are latent (i.e., unobservable) and we have to set their scale. We do this by setting one of the variances ($\omega_{11}$ or $\omega_{22}$) to 1. Let $\omega_{11}$ be set to 1; then only $\omega_{12}$ and $\omega_{22}$ are estimable. Here, the parameter $\omega_{12}$ captures any correlation between the differenced utilities and, therefore, the IIA problem is no longer a concern. In empirical applications, the estimate of $\omega_{12}$ will suggest whether there is correlation present among the utilities. If, in an application, the estimate is significantly different from zero, then it implies that a multinomial logit model is inappropriate for that application as it does not allow for utilities to be correlated. Note that not all parameters of the original covariance matrix of the non-differenced utilities are identified. In general, if there are J alternatives then the original covariance matrix contains J(J+1)/2 parameters. Of these, upon taking the difference of utilities and imposing the identification conditions, only (J-1)J/2 - 1 parameters are identified (Train, 2003). In the above formulation, we had 3 alternatives; thus the original covariance matrix has (3)(4)/2 = 6 parameters. Of these, only (3-1)(3)/2 - 1 = 2 parameters are identified.

We now consider an application where a trinomial probit choice model is applied. This application is from Keane (1992). The application considers the employment choices of men and models three choices: manufacturing (M), nonmanufacturing (NM) and unemployment. The data for this model are from a national longitudinal survey of men. Table 14.13 contains the independent variables and the parameter estimates. In this application, the intercept for unemployment is set to zero and the variance for the utility for M is set to 1. Note that independent variables are allowed to have different effects on M and NM. For example, the model allows education to have different effects on manufacturing and non-manufacturing. Finally, there are two sets of parameters: one set is estimated when the correlation among the utilities ($\omega_{12}$) is set to 0 and the variance of NM is set to 1 (Model-1). The other set is obtained when both the correlation and the variance are estimated (Model-2). [Table 14.13 about here]

The results show that Model-2 is marginally better than Model-1 in terms of the log-likelihood. Further, the estimated correlation is positive (0.64, with a standard error of 0.37), which makes it significant at roughly the 10% level rather than at the conventional 5% level. Also note that there are differences in the estimated parameters of Model-1 and Model-2. This implies that allowing for a correlation among the utilities affects the estimation of other parameters in the model. The positive correlation also suggests that a multinomial logit model may not have been appropriate for this setting as it would have failed to capture this correlation. The multinomial probit model provides a flexible way of capturing the correlations that might be present among the utilities. This alleviates the IIA problem that is inherent in the multinomial logit model.
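The differencing and scale normalization used in the identification discussion can be sketched in Python. The example takes a hypothetical 3x3 covariance matrix for the original utility errors (the values are made up), differences against the third alternative, and then divides by the first variance so that it equals 1. The helper names are illustrative.

```python
def difference_covariance(sigma):
    """Covariance of (e1 - e3, e2 - e3) given the 3x3 covariance of (e1, e2, e3)."""
    m = [[1.0, 0.0, -1.0],
         [0.0, 1.0, -1.0]]  # differencing matrix M
    # omega = M * sigma * M'
    m_sigma = [[sum(m[r][i] * sigma[i][c] for i in range(3)) for c in range(3)]
               for r in range(2)]
    return [[sum(m_sigma[r][i] * m[c][i] for i in range(3)) for c in range(2)]
            for r in range(2)]


def normalize_scale(omega):
    """Fix the utility scale by setting the first differenced variance to 1."""
    scale = omega[0][0]
    return [[value / scale for value in row] for row in omega]


def count_parameters(j):
    """Covariance parameters before and after differencing and normalization."""
    return j * (j + 1) // 2, (j - 1) * j // 2 - 1


# Hypothetical covariance matrix of the original (non-differenced) errors.
sigma = [[1.0, 0.5, 0.0],
         [0.5, 2.0, 0.0],
         [0.0, 0.0, 1.0]]

omega = difference_covariance(sigma)
omega_normalized = normalize_scale(omega)
print(omega)             # [[2.0, 1.5], [1.5, 3.0]]
print(omega_normalized)  # [[1.0, 0.75], [0.75, 1.5]]
print(count_parameters(3))  # (6, 2)
```

After normalization only the off-diagonal element and the second variance remain free, which matches the two identified parameters counted in the text for three alternatives.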

Tobit Analysis
Tobit models are part of a general class of models for analyzing censored data. These types of data are encountered when, for a large number of observations, the dependent variable is clustered at a limiting value (Tobin, 1958). For example, in a large-scale study of the number of hours that married women work, it was found that about 66% of respondents reported zero hours (Greene & Quester, 1982). We will show that analyzing such censored data without accounting for censoring generally leads to biased estimates.


There are many other scenarios where such censoring is observed. For example, in grocery settings consumers either do not purchase a brand or purchase a positive quantity (Jedidi, Ramaswamy, & DeSarbo, 1993; Tellis, 1988). Technically, then, demand modeling should employ a tobit framework, as quantity is inherently non-negative (i.e., is censored) and has a cutoff at zero. There are several ways of modeling such a demand situation. If the focus is on modeling the demand for a single alternative, then a censored regression is typically the chosen method. We will show an example of this methodology. If, however, the focus is on modeling both the choice of an alternative and the quantity demanded subsequent to the choice, then a two-stage regression is usually adopted (Tellis, 1988). In this framework, the choice of alternative and the quantity demanded are assumed to be interconnected, i.e., the errors in the utilities of the alternatives are correlated with the error in the demand model. This correlation captures any selectivity bias (Heckman, 1979). For example, consumers may buy more of their preferred brand but less of a brand that is chosen on a promotion.

We now illustrate a censored regression analysis. In general, a censored regression can be expressed as follows.

$$q_t^* = x_t'\beta + \varepsilon_t \qquad (30)$$

Here, for an observation t, the random variable $q_t^*$ is only partially observable. The error $\varepsilon_t$ is normally distributed, $N(0, \sigma^2)$. The observed quantity for observation t, $q_t$, equals $q_t^*$ when $q_t^*$ is greater than zero and equals zero otherwise. In other words,

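$$q_t = \begin{cases} q_t^* & \text{if } q_t^* > 0 \\ 0 & \text{if } q_t^* \le 0 \end{cases}$$

The expected value of $q_t$ is $E(q_t) = E(q_t \mid q_t^* > 0)\,P(q_t^* > 0)$, where the conditional mean is

$$E(q_t \mid q_t^* > 0) = x_t'\beta + E(\varepsilon_t \mid \varepsilon_t > -x_t'\beta) = x_t'\beta + \sigma\,\frac{\phi(x_t'\beta/\sigma)}{\Phi(x_t'\beta/\sigma)}$$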


Here, $\phi(\cdot)$ and $\Phi(\cdot)$ are the density and the cumulative distribution function, respectively, of the standard normal distribution. The above equation is similar to a standard OLS model with an additional term that corrects for censoring. We can estimate the above model together with the correction factor to obtain consistent estimates. Notice that if we did not include the correction factor, a regular OLS estimation would lead to biased estimates due to the omitted variable.

Bomberger (1993) used the above censored regression model to estimate the impact of income and wealth on household deposits. There were 4262 households in the dataset, of which 290 had no deposits. Therefore, for these households the dependent variable is censored at zero. Bomberger estimates the above model and compares the results with an OLS regression. Table 14.14 shows these results. We can make several observations from these estimates. First, the intercept takes very different values in the two equations. Second, we find that wealth is marginally significant in the tobit model while it is non-significant in the OLS regression, and its estimated sign differs across the two models. This implies that a failure to properly model the censoring can alter both the sign and significance of the estimates. [Table 14.14 about here]
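The censoring correction in the conditional mean can be illustrated with a small Python sketch that uses only the standard library. The inputs (an index value of 0 and an error standard deviation of 1) are hypothetical and are not related to Bomberger's estimates; the function names are illustrative.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_means(xb, sigma):
    """Return P(q* > 0), E(q | q* > 0) and E(q) for index xb and error s.d. sigma."""
    z = xb / sigma
    prob_positive = std_normal_cdf(z)
    correction = sigma * std_normal_pdf(z) / prob_positive  # censoring correction
    conditional_mean = xb + correction
    unconditional_mean = prob_positive * conditional_mean
    return prob_positive, conditional_mean, unconditional_mean


# Hypothetical inputs: index of 0 and unit error standard deviation.
prob_positive, conditional_mean, unconditional_mean = tobit_means(0.0, 1.0)
print(round(prob_positive, 4), round(conditional_mean, 4), round(unconditional_mean, 4))
# prints: 0.5 0.7979 0.3989
```

Even when the index is zero, the mean of the positive observations is about 0.8, which is why a regression of observed quantities on the covariates alone omits a relevant term.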


In this chapter, we discussed several methods that are applicable in scenarios wherein the dependent variable is either discrete (e.g., choice of brand) or constrained in such a manner (e.g., market share) that a linear regression with OLS estimation fails to be the best alternative. We began the chapter with a discussion of discriminant analysis. We showed that this method is applicable for a discrete dependent variable (predetermined groups). In this context, we also showed how to determine the independent variables that best discriminate among groups and how to calculate their relative importance for discrimination. Next, we discussed logistic regression. We showed that this method is suitable for both binary discrete dependent variables (e.g., a buy / no buy situation) and dependent variables that lie between 0 and 1 (e.g., market share). Thus, this method is applicable to a wider set of situations than a two-group discriminant analysis. We extended the logistic regression model to multinomial choice models that are suitable for scenarios with a dependent variable that can take multiple values. In this context, we showed the multinomial logit and probit models. The former model is the most frequently used choice model as it provides closed-form probability expressions. It has a few limitations: it suffers from the IIA property, and the elasticity expressions are constrained to show a particular substitution pattern among alternatives (e.g., own-price elasticity is smaller for higher share brands). The multinomial probit model alleviates the IIA problem but at the expense of closed-form probability expressions. We noted that for applications where the unobserved factors affecting the available alternatives are correlated (e.g., the red bus / blue bus problem), a multinomial probit model is more appropriate than a multinomial logit model.


We ended the chapter with a discussion of censored regression models or tobit models. These models are a combination of a binary probit and a multiple regression and are applicable in a wide range of scenarios where there is censoring of the data (e.g. demand of a good).



This is a synthetic dataset. For generating this data, we set the sensitivities to Recency, Frequency and Monetary at 2.3, 0.3 and 0.1 respectively.

We use brand and alternative interchangeably in this example. Here the four alternatives correspond to four different brands.



Bellman, S., Lohse, G. L., & Johnson, E. J. (1999). Predictors of online buying behavior. Communications of the ACM, 42(12), 32-38.

Ben-Akiva, M. (1973). Structure of passenger travel demand models. Ph.D. dissertation, Department of Civil Engineering, MIT, Cambridge, MA.

Ben-Akiva, M., & Lerman, S. (1985). Discrete choice analysis: Theory and application to travel demand. Cambridge, MA: MIT Press.

Bomberger, W. A. (1993). Income, wealth and household demand for deposits. The American Economic Review, 84(4), 1034-1044.

Currim, I. S. (1982). Predictive testing of consumer choice models not subject to independence of irrelevant alternatives. Journal of Marketing Research, 19, 208-222.

Draganska, M., & Jain, D. (2005). Product line length as a competitive tool. Journal of Economics and Management Strategy, 14(1), 1-28.

Greene, W. H., & Quester, A. (1982). Divorce risk and wives' labor supply behavior. Social Science Quarterly, 63, 16-27.

Gupta, S. (1988). Impact of sales promotions on when, what, and how much to buy. Journal of Marketing Research, 25, 342-355.

Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 46, 931-961.

Investment Company Institute (2005). ICI Factbook. Retrieved October 16, 2005, from

Jedidi, K., Ramaswamy, V., & DeSarbo, W. S. (1993). A maximum likelihood method for latent class regression involving a censored dependent variable. Psychometrika, 58(3), 375-394.

Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis. Upper Saddle River, NJ: Prentice Hall.

Keane, M. P. (1992). A note on identification in the multinomial probit model. Journal of Business & Economic Statistics, 10(2), 193-200.

Lehmann, D. R., Gupta, S., & Steckel, J. H. (1998). Marketing research. New York: Addison-Wesley.

Tellis, G. J. (1988). Advertising exposure, loyalty and brand purchase: A two-stage model of choice. Journal of Marketing Research, 25, 134-144.

Theil, H. (1971). Principles of econometrics. New York: Wiley.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.

Train, K. (2003). Discrete choice methods with simulation. Cambridge, MA: Cambridge University Press.

Train, K., Ben-Akiva, M., & Atherton, T. (1989). Consumption patterns and self-selecting tariffs. Review of Economics and Statistics, 71(1), 62-73.


Table 14.1 Southern Versus Non-Southern States

                           Variable Means                    Discriminant Function
Variable               South    Non-South   One-Way F    Standardized   Unstandardized
Average Income          4.95       5.91       17.13         -0.32          -0.43
Population              4.45       4.19        0.05          4.15           0.92
Population Change       1.37       1.19        0.30         -0.39          -0.37
Percent Urban          57.00      58.37        0.03          0.33           0.02
Tax Per Capita        464.13     618.23       26.97         -0.87          -0.01
Government Expen.     286.13     281.54        0.00          0.75           0.00
College Enrollment    165.20     192.45        0.14         -2.03          -0.01
Mineral Production   2006.27     610.49        6.84          0.03           0.00
Forest Acres           15.93      14.70        0.05          0.15           0.01
Manuf. Output           6.77       8.66        0.40         -2.65          -0.27
Farm Receipts        1801.73    1943.37        0.06         -0.40          -0.00

Chi-Square               42.71
Degrees of Freedom       11
Canonical Correlation    0.80
Wilks Lambda             0.36

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 668 (Addison-Wesley Educational Publishers Inc., 1998)

Table 14.2 Hit Miss Table

                   Predicted Group
Actual Group     South    Non-South
South              14         3
Non-South           1        32

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 668 (Addison-Wesley Educational Publishers Inc., 1998)


Table 14.3 Averages for the Five Food Expenditure Groups

                                                   Group
Variables                         1         2         3         4         5
                                < $15   $15-$29   $30-$44   $45-$59    > $60
Education of wife                3.32      4.11      4.29      4.47      4.49
Education of husband             2.79      3.75      4.08      4.57      4.69
Age                              4.09      3.46      3.06      2.50      2.72
Income                           1.62      2.06      2.75      3.47      3.75
Family size                      2.09      2.52      3.13      4.14      5.11
How often they shop              1.91      2.18      2.27      2.29      2.62
Number of brands shopped for     1.82      2.25      2.34      2.25      2.72
Information sought               1.91      1.91      1.81      1.84      1.87
Sample size                        34       284       293       181        61

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 670 (Addison-Wesley Education Publishers Inc., 1998)

Table 14.4 Discriminant Functions

                                    Unstandardized Coeff.            Standardized Coeff.
Variables                          1      2      3      4          1      2      3      4
Education of wife                0.02   0.01  -0.56   0.42       0.02   0.01  -0.70   0.52
Age                             -0.01   0.55   0.20  -0.13      -0.01   0.81   0.29  -0.20
Income                          -0.25  -0.29   0.21  -0.62      -0.41  -0.43   0.30  -0.89
Family size                     -0.58   0.40   0.21   0.38      -0.77   0.56   0.29   0.53
How often they shop             -0.29   0.35  -0.80  -0.25      -0.20   0.24  -0.58  -0.18
Number of brands shopped for    -0.01   0.26  -0.28  -0.30      -0.01   0.37  -0.37  -0.43
Constant                         3.19  -3.62   2.93   0.36         -

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Education Publishers Inc., 1998)


Table 14.5 Means of Groups

                    Functions
Groups        1       2       3       4
1           1.03    0.16    0.65   -0.06
2           0.59    0.06   -0.06    0.05
3           0.04   -0.06   -0.07   -0.07
4          -0.71   -0.19    0.09    0.05
5          -1.44    0.47   -0.01   -0.01

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Education Publishers Inc., 1998)

Table 14.6 Hit Miss Table (Multiple Discriminant Analysis)

                         Predicted Group
Actual Group       1       2       3       4       5
1                 20      13       1       0       0
2                 86     106      59      24       9
3                 50      65      90      57      31
4                  7       7      33      84      50
5                  2       1       6      12      40

Source: Lehmann, Gupta and Steckel, Marketing Research. Page 687 (Addison-Wesley Education Publishers Inc., 1998)


Table 14.7 Summary Statistics for the RFM Data

Variable       Mean     Std. Dev.
Recency        3.87       2.06
Frequency      7.80       2.74
Monetary      73.14      28.87
Choice         0.45

Table 14.8 Parameter Estimates for OLS Regression

Variable        Estimate    Std. Error
Intercept*       -0.92        0.13
Recency*          0.15        0.01
Frequency         0.02        0.01
Monetary*         0.01        0.001
R2                0.61
Adjusted R2       0.60

*Significant at the 0.05 significance level.


Table 14.9 Parameter Estimates for Logistic Regression

                Intercept Only Model            Full Model
Variable       Estimate    Std. Error     Estimate    Std. Error
Intercept       -0.20        0.21          -30.29*       8.55
Recency                                      3.34*       0.93
Frequency                                    0.59*       0.24
Monetary                                     0.17*       0.05
-2LL           137.628                      30.489

*Significant at the 0.05 level.

Table 14.10 Summary Statistics for Breakfast Foods Data

Brand      Average Price ($)    Promotion    Market Share (%)
Brand 1          1.75             0.07           22.19
Brand 2          1.58             0.04           17.18
Brand 3          1.91             0.09           48.36
Brand 4          1.94             0.01           12.28


Table 14.11 Parameter Estimates for Multinomial Logit Model

                     Intercepts Only Model           Full Model
Variable            Estimate    Std. Error     Estimate    Std. Error
Intercept_Brand1      0.59        0.08           0.06        0.11
Intercept_Brand2      0.34        0.09          -0.64        0.11
Intercept_Brand3      1.37        0.07           1.91        0.10
Price                                           -3.03        0.11
Promotion                                        0.44        0.14
-2LL                4321.88                    3033.92
Hit Rate             48.3%                      63.2%


Table 14.12 Price Elasticities from the Full Multinomial Logit Model

Change in                       Change in price of j
probability of k      Brand 1    Brand 2    Brand 3    Brand 4
Brand 1                -4.47       0.67       2.64       0.53
Brand 2                 0.85      -4.14       2.64       0.53
Brand 3                 0.85       0.67      -3.17       0.53
Brand 4                 0.85       0.67       2.64      -5.37


Table 14.13 Parameter Estimates for Multinomial Probit Model

                                   Model 1                          Model 2
Variable                    M               NM               M               NM
Non labor income        0.01 (0.01)    -0.05 (0.01)      0.00 (0.01)    -0.03 (0.03)
Unemployment rate      -0.08 (0.01)    -0.05 (0.01)     -0.09 (0.02)    -0.08 (0.02)
Time trend             -0.02 (0.01)     0.05 (0.01)     -0.01 (0.01)     0.04 (0.03)
Years of education      0.01 (0.01)     0.11 (0.01)      0.03 (0.01)     0.10 (0.06)
Labor experience        0.02 (0.01)    -0.03 (0.01)      0.01 (0.01)    -0.02 (0.02)
Square of Exper.       -0.01 (0.00)     0.00 (0.01)      0.00 (0.01)     0.00 (0.01)
Dummy for race          0.10 (0.05)     0.09 (0.05)      0.15 (0.06)     0.14 (0.07)
Dummy for marriage      0.47 (0.04)     0.95 (0.09)      0.51 (0.07)     0.91 (0.39)
Number of kids          0.12 (0.02)    -0.18 (0.03)      0.09 (0.03)    -0.11 (0.12)
Intercept              -0.06 (0.14)    -0.13 (0.12)      0.46 (0.18)     0.31 (0.35)
Correlation                  0.00 (fixed)                      0.64 (0.37)
Variance                     1.00 (fixed)                      1.16 (0.58)
LL                           -10,300.71                        -10,299.65

Source: Keane, M. P. (1992). A Note on Identification in the Multinomial Probit Model. Journal of Business & Economic Statistics, 10, 2, Page 199


Table 14.14 Parameter Estimates for OLS and Tobit Analysis

                       OLS                          Tobit
Variable      Estimate    Std. Error      Estimate    Std. Error
Intercept      -698.6      1036.10         -9733        3033
Income          0.1015      0.0065          0.145       0.002
Wealth         -0.00001     0.0004          0.0002      0.0001

Source: Bomberger, W. A. (1993). Income, Wealth and Household Demand for Deposits. The American Economic Review. 84, 4, Page 1038.

Figure 14.1 Purchasers and Non-purchasers Versus Age and Income

Figure 14.2 Group Means on First Two Discriminant Functions (axes: Function 1 and Function 2)

Figure 14.3 OLS Predictions Versus Actual Choice (series: Predictions, Choice)

Figure 14.4 Logistic Regression Predictions Versus Actual Choice (series: Predictions, Choice)

Figure 14.5 Probability of Purchase Versus Frequency (axes: Frequency and Probability of purchase)

Figure 14.6 Variation in Market Share with changes in Marketing Mix (axes: Week and Market Share; F = Feature, D = Display, C = Store Coupon)