Harbingers of Failure
Decades of research have emphasized that customer feedback is a critical input throughout the new product development process. A central premise of this customer-focused process is that positive feedback is good news. The more excited that customers are about a prototype, the more likely it is that a firm will continue to invest in it. When firms move to the final stages of testing and launching a new product, metrics of success shift from likes and dislikes to actual product sales. Again, conventional wisdom is that more product sales indicate a greater likelihood of long-term success. This assumption is fundamental to nearly every new product forecasting model (Bass 1969; Mahajan, Muller, and Bass 1990).

In this article, we challenge this commonly held assumption. We show that not all positive feedback should be viewed as a signal of future success. In particular, using detailed transaction data from a chain of convenience stores, we demonstrate that increased sales of a new product by some customers can actually be a strong signal of future failure.

At first glance, the result is striking. How can more product sales signal future failure? After all, the ability to meet sales targets is the litmus test of all new products. We present evidence that this result is driven by the existence of an unrepresentative subset of customers. We label them Harbingers of failure. Harbingers are more likely to purchase products that other customers do not buy, and so a purchase by these customers may indicate that the product appeals to a narrower slice of the marketplace. This yields a signal that the product is more likely to fail.

We identify these customers in two ways. Our primary focus is on customers who have previously purchased new products that have failed. We show that the tendency to buy hits or flops is systematic. If customers tend to buy failures, the next new product they purchase is more likely to be a failure. For example, customers who purchase Diet Crystal Pepsi are more likely to have purchased Frito Lay Lemonade (both of which failed). In contrast, customers who tend to purchase a successful product, such as a Swiffer mop, are more likely to buy other ultimately successful products, such as Arizona Iced Tea.
It is not only the initial purchase of new products by Harbingers that is informative but also the decision to purchase again. A one-time purchase of Diet Crystal Pepsi is partially informative about a consumer's preferences. However, a consumer who repeatedly purchases Diet Crystal Pepsi is even more likely to have unusual preferences and is more likely than other customers to choose other new products that will fail in the future.

*Eric Anderson is Professor of Marketing, Kellogg School of Management, Northwestern University (e-mail: eric-anderson@kellogg.northwestern.edu). Song Lin is Assistant Professor in Marketing, HKUST Business School, Hong Kong University of Science and Technology (e-mail: mksonglin@ust.hk). Duncan Simester is Professor of Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology (e-mail: simester@mit.edu). Catherine Tucker is Professor of Marketing, MIT Sloan School of Management, Massachusetts Institute of Technology (e-mail: cetucker@mit.edu). Christophe Van den Bulte served as associate editor for this article.
The second way to identify Harbingers focuses on purchases of existing products. This approach is motivated by evidence that customers who systematically buy new products that fail are also more likely to buy niche existing products. The findings reveal that both approaches are similarly effective at identifying Harbingers and that distinguishing early adopters of new products using either metric can significantly improve predictions of long-term success or failure.

RELATED LITERATURE

Our results complement several streams of literature, including literature on preference minorities, representativeness, lead users, and new product forecasting. We discuss each of these areas next.

We identify Harbingers through past purchases of either new products that failed or existing products that few other customers buy. In both cases, Harbingers reveal preferences that are unusual compared with the rest of the population. The term "preference minorities" was previously coined by Waldfogel (2009) to describe customers with unusual preferences. The existence of these customers has been used recently to explain the growth of Internet sales in some product categories. Offline retailers tend to allocate their scarce shelf space to the dominant preferences in that market, so customers whose preferences are not representative may not find products that suit their needs (Anderson 1979; Waldfogel 2003). Choi and Bell (2011) show that, as a result, preference minorities are more likely to purchase from the Internet and are less price sensitive when doing so (see also Brynjolfsson, Hu, and Rahman 2009). Preference minorities also help explain why we observe a longer tail of niche items purchased through Internet channels compared with other retail channels (Brynjolfsson, Hu, and Simester 2011; Brynjolfsson, Hu, and Smith 2003).1

1Huang, Singh, and Srinivasan (2014) use a similar explanation to argue why some crowdsourcing participants may offer worse ideas than other participants.

Lack of representativeness of customer preferences also underpins Moore's (1991) explanation that new technology products often fail because they are unable to "cross the chasm." He posits that early adopters of technology are more likely to be technology enthusiasts and visionaries and argues that the mainstream market has different (more risk-averse) preferences. Early success may therefore not be a predictor of future success. We caution that Moore's explanation is focused primarily on the adoption of disruptive new technologies that represent significant innovations over existing products. The role of technology enthusiasts is less apparent in the consumer packaged goods markets that we study.

Van den Bulte and Joshi (2007) formalize Moore's (1991) explanation by modeling the diffusion of innovation in markets with segments of "influentials" and "imitators." They show that diffusion in such a market can exhibit a dip between the early and later parts of the diffusion curve depending on the extent to which the influential segment affects the imitators. They offer five theories of consumer behavior that may explain this result. When extended to our setting, these theories may partially explain why we observe Harbingers making systematically different purchasing decisions than other customers and why the new products they purchase tend to fail. However, many of these theories also predict that one segment of customers will influence the purchasing decisions of other customers. In contrast, our explanation does not require that a group of customers influence the decisions of others. Moreover, in our consumer packaged goods setting, it is not obvious that dependency between different customers' purchasing decisions is as large an effect as it is in technology markets.

This is not the first research to recognize that the feedback of certain customers should be weighted differently in the new product development process. In particular, the lead user literature has argued for giving greater weight to positive feedback from some customers. Rather than relying on information from a random or representative set of customers, the lead user process proposes collecting information from customers on the "leading edges" of the market. The rationale for this approach is that these leading customers are more likely to identify "breakthrough" ideas that will result in product differentiation (Von Hippel 1986). Many researchers have tried to validate these benefits. For example, Urban and Von Hippel (1988) examine the computer-aided design market and show that reliance on lead users results in new product concepts that are preferred by potential users over concepts generated by traditional product development methods. Lilien et al. (2002) report findings from a natural experiment at 3M, exploiting variation in the adoption of lead user practices across 3M's business units. They find that annual sales of product ideas generated using a lead user approach are expected to yield eight times more revenue than products generated using traditional approaches. Our results complement this literature; although early adoption by lead users may presciently signal new product success, there also exist customers whose adoption is an early signal of product failure.

All of the new product introductions we examine had survived initial pilot testing. Yet despite these screens, only 40% of the new products in our data survived for three years. This raises the question: How did the products that failed ever make it through the initial market tests? The new product development literature has identified as possible explanations escalated commitment (Boulding, Morgan, and Staelin 1997; Brockner 1992; Brockner and Rubin 1985), an inability to integrate information (Biyalagorsky, Boulding, and Staelin 2006), and distortions in management incentives (Simester and Zhang 2010). Our identification of this class of Harbingers provides an alternative explanation: if customers who initially adopt the product have unusual preferences that are different from other customers, the product may be more likely to fail despite high initial sales.

Most new product forecasting models focus on predicting new product outcomes using an initial window of sales. Bass (1969) introduced perhaps the best-known new product forecasting model, which has the important characteristic of an assumed interaction between current and potential adopters of the new product. The speed of diffusion depends on the degree to which later adopters imitate the early adopters. As we have discussed, our explanation does not require dependency between different customers' purchasing decisions (and this effect may be relatively weak in the types of markets that we study). Moreover, a central prediction of these models is that a positive initial response is a signal of positive future outcomes.2 Our findings indicate that this central premise may not hold if the positive initial response reflects purchases by Harbingers.

2In a recent study, Morvinski, Amir, and Muller (2014) show that this prediction holds only when the initial positive response is from similar customers and when the uncertainty about the product quality is low.

582 JOURNAL OF MARKETING RESEARCH, OCTOBER 2015

Another stream of new product forecasting models focuses on predicting success before the product has been launched on the market. The absence of market data means that premarket forecasts are usually considered less accurate than forecasts that use an initial window of post-launch sales. However, if the cost of launch is sufficiently high, premarket tests can provide information to evaluate whether to invest in a new product launch. Urban and Hauser (1993) review nine approaches to premarket testing. Some approaches rely on experience with past new products to estimate the relationship between advertising, promotion, and distribution response functions (e.g., the NEWS model proposed by Pringle, Wilson, and Brody 1982). Other approaches obtain estimates of trial and repeat purchase rates, which are used as inputs in a dynamic stochastic model to estimate cumulative sales (e.g., Eskin and Malec 1976). The estimates of the trial and repeat parameters are obtained from various sources. For example, some models use premarket field tests (Parfitt and Collins 1968), while others use laboratory tests. Perhaps the best known of the laboratory models is ASSESSOR, proposed by Silk and Urban (1978), in which respondents are exposed to advertising and given an opportunity to purchase in a simulated store. The laboratory purchase rates are combined with estimates of availability and awareness to predict initial market adoption, while repeat purchase rates are estimated from mail-order repurchases.3

3A final class of premarket forecasting models compares customers' attitudes toward the new product with attitudes toward existing products. The challenge for these attitude approaches is to accurately map customer attitudes to actual purchase probabilities.

The principle underlying models of trial and repeat purchasing is that adoption can be influenced by the firm's investments in advertising, distribution, and promotion. However, long-term success depends on customers accepting the product, often to the exclusion of a product they were previously using (Eskin 1973; Fader and Hardie 2005; Parfitt and Collins 1968). Repeat purchase rates may therefore provide a more accurate predictor of new product success than initial adoption rates. For this reason, we use both initial adoption and repeat purchases to classify customers. Specifically, we ask whether customers who repeatedly purchase new products that fail provide a more accurate signal of new product failure than customers who only purchase the new product once.

The remainder of the article is organized as follows. In the next section, we describe the data in detail, including a summary of how long unsuccessful new products remain in the store and the opportunity cost of these failures to the retailer. Then, we present initial evidence that there are customers whose decision to adopt a new product is a signal that the product will fail. We also conduct a wide range of checks to evaluate the robustness of the findings. Next, we investigate who the Harbingers are and explore whether they can also be identified through purchases of (existing) niche items. We summarize our findings and their implications in the last section.

DATA AND INITIAL RESULTS

The current research uses two data sets: a sample of individual customer transaction data and a sample of aggregate store-level transaction data. Both data sets come from a large chain of convenience stores with many branches across the United States. The store sells products in the beauty, consumer health care, edibles, and general merchandise categories. Customers visit the store frequently (on average almost weekly) and purchase approximately four items per trip at an average price of approximately $4 per item.

The store-level transaction data include aggregate weekly transactions for every item in a sample of 111 stores spread across 14 states in the midwestern and southwestern portions of the United States. The data period extends from January 2003 through October 2009. We use the store-level transaction data to define new product survival and to construct product covariates for our multivariate analysis. We exclude seasonal products that are designed to have a short shelf life, such as Christmas decorations and Valentine's Day candy.

The individual customer data cover more than ten million transactions made using the retailer's frequent shopping card between November 2003 and November 2005 for a sample of 127,925 customers. The customers represent a random sample of all the customers who used the frequent shopping card in the 111 stores during this period. Their purchase histories are complete and record every transaction in any of the firm's stores (in any geographic region) using the retailer's frequent shopping card. We focus on purchases of new products that (1) were made between November 2003 and November 2005 and (2) occurred within 52 weeks of the product's introduction. There are 77,744 customers with new product purchases during this period. They purchased 8,809 different new products, with a total of 439,546 transactions distributed across 608 product categories. Examples of these new products include Paul Mitchell Sculpting Foam Mousse, Hershey's wooden pencils, SpongeBob SquarePants children's shoelaces, and SnackWell's sugar-free shortbread cookies.

New Product Success

We initially define a product as a "failure" if its last transaction date (in the store-level transaction data) is less than three years after its introduction.4

4We evaluate product success using store-level data, which contain purchases by all customers and include customers who purchased without using a loyalty card. To the extent that customers using a loyalty card are different from other customers, we would expect this difference to make it more difficult to predict store-level outcomes. Therefore, any selection bias introduced by the loyalty cards hinders rather than contributes to our findings. Similarly, we use individual customer transactions at all of the firm's stores, not just the 111 stores for which we have aggregate weekly data (recall that all of the customers made purchases not only in the 111 stores during our data period but also in other stores). We subsequently investigate how this affects the results by limiting attention to purchases in the 111 stores or by only considering purchases outside those stores.

If the last transaction date is after this date, the product is a "success." This definition
of success is relatively strict (the product must survive a minimum of 36 months), and so we also investigate the robustness of our findings to using a shorter survival horizon. In addition, we consider several alternative measures of success, including accuracy in a holdout sample, the market share of the new item, and how long the item survived in the market.

Across the full sample of 8,809 new products, 3,508 (40%) survived for three years (12 quarters).5 Other research has reported similar success rates for new products in consumer packaged goods markets. For example, Liutec, Du, and Blair (2012) cite success rates of between 10% and 30%. Similarly, Barbier et al. (2005) report success rates of between 14% and 47%. In evaluating whether a success rate of 40% is high or low, it is important to recognize that these new products have all survived the retailer's initial market tests and are now broadly introduced across the retailer's stores. If we were able to observe the full sample of new products that were either proposed by manufacturers or subjected to initial market tests by this retailer, the success rate would be considerably lower.

5The average survival duration for the 5,301 failed items was 84.68 weeks, or approximately 19.5 months. In the Web Appendix, we report a histogram (Figure WA1) describing how long new products survive.

The Cost of New Product Failure

As a preliminary investigation, we compared profits earned in the first year of a new product's life for new products that ultimately succeeded or failed. Throughout the first year, flops contribute markedly lower profits than hits do. This has an important implication for the retailer. Because shelf space is scarce (the capacity constraint is binding), the retailer incurs an opportunity cost when introducing a new product that fails and keeping it in the stores. Such a cost results in lost profits equivalent to 49% of the average annual profits for an existing item in the category. The implication is that new product mistakes are very costly. The more accurately the retailer can predict which new products will succeed (and the faster it can discontinue flops), the more profitable it will be.

It is important to recognize that throughout this article we do not make any distinctions based on the reasons that a new product fails. Instead, we consider which customers purchase new products that succeed or fail and identify customers (Harbingers) whose purchases signal that a new product is likely to fail. We also show that the retailer can more quickly identify flops if it distinguishes which customers initially purchase a new product rather than just how many customers purchase. We begin this investigation in the next section.

DO PURCHASES BY SOME CUSTOMERS PREDICT PRODUCT FAILURE?

Firms often rely on customer input to make decisions about whether to continue to invest in new products. Our analysis investigates whether the way that firms treat this information should vary for different customers. In particular, we consider the retailer's decision to continue selling a new product after observing a window of initial purchases and show how this decision can be improved if the retailer distinguishes between Harbingers and non-Harbingers. Although our initial analysis focuses on customers' prior purchases of new products that failed, we also investigate whether we can identify Harbingers through their prior purchases of existing products. In particular, we identify customers who tend to purchase niche items that few other customers purchase.

Our initial analysis proceeds in two steps. First, we use a sample of new products to group customers according to how many flops they purchased in the weeks after the product is introduced. We then investigate whether purchases in the first 15 weeks by each group of customers can predict the success of a second sample of new products. We label these 15 weeks the "initial evaluation period." We demonstrate the robustness of the findings by varying how we select the groups of products, the length of the initial evaluation period used to predict new product success, and the metrics we use to measure success.

The unit of analysis is a (new) product, and we assign the products into two groups according to the date of the new product introduction. We initially assign products introduced between November 2003 and July 2004 to Product Set A (the "classification" set). The classification set contains 5,037 new products, of which 1,958 (38.9%) survive three years. New products introduced between July 2004 and July 2005 are assigned to Product Set B (the "prediction" set).6 The prediction set contains 2,935 new products, including 1,240 (42.2%) successes. We subsequently vary these demarcation dates and also randomly assign products to the prediction and classification sets.

6Transactions in the first 15 weeks after a new product is introduced are used to predict the product's success. Therefore, we cannot include products introduced between July 2005 and November 2005 in the prediction set because we do not observe a full 15 weeks of transactions for these items.

If the classification and prediction sets contain new products that are variants of the same item, this may introduce a spurious correlation between the failure rates for the two groups of items. For example, it is possible that the classification set includes a new strawberry-flavored yogurt and the prediction set includes a new raspberry flavor of the same yogurt. It is plausible that the success of these products is correlated because the firm may choose to continue or discontinue the entire product range. For this reason, we restrict attention to new products for which there is only a single color or flavor variant to ensure that products in the validation set are not merely a different color or flavor variant of a product in the classification set. It also ensures that the products are all truly new and not just new variants of an existing product.

Grouping Customers Using the Classification Product Set

To group customers according to their purchases of products in the classification set, we calculate the proportion of new product failures that customers purchased in the initial evaluation period and label this as the customer's FlopAffinityi:

(1) FlopAffinityi = (Total number of flops purchased from classification set) / (Total number of new products purchased from classification set)
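Equation 1 can be computed directly from a table of classification-set purchases. The sketch below is illustrative Python, not the authors' code: the toy customers and products are invented, and the handling of customers whose flop rate falls exactly on a cutoff is an assumption (the article reports the 25%, 50%, and 67% cutoffs but not the boundary convention).

```python
def flop_affinity(purchases, failed):
    """Equation 1: share of a customer's classification-set new-product
    purchases that were flops. `purchases` maps customer -> set of new
    products bought in the initial evaluation period; `failed` is the set
    of products whose last sale came less than three years after launch."""
    scores = {}
    for cust, items in purchases.items():
        if len(items) < 2:  # unclassified "other" customers
            continue
        flops = sum(1 for p in items if p in failed)
        scores[cust] = flops / len(items)
    return scores

def assign_group(affinity):
    """Groups 1-4 cut at the 25%, 50%, and 67% flop-rate thresholds.
    Boundary inclusivity (<=) is an assumption, not taken from the text."""
    if affinity <= 0.25:
        return 1
    if affinity <= 0.50:
        return 2
    if affinity <= 0.67:
        return 3
    return 4

# Toy example (hypothetical customers and products):
purchases = {"c1": {"p1", "p2", "p3", "p4"}, "c2": {"p1", "p5"}, "c3": {"p1"}}
failed = {"p2", "p3", "p5"}
scores = flop_affinity(purchases, failed)          # c3 is too sparse to classify
groups = {c: assign_group(a) for c, a in scores.items()}
```

Customers with fewer than two classification-set purchases are left out of the four groups, matching the article's treatment of the 48,308 "other" customers.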
We classify customers who have purchased at least two new products during the product's first year into four groups, according to their FlopAffinity. These include 29,436 customers, representing approximately 38% of all the customers in the sample. The 25th, 50th, and 75th percentiles have flop rates of 25%, 50%, and 67%, respectively. Therefore, we use the following four groupings:

Group 1: Between 0% and 25% flops (25th percentile) in the classification set.
Group 2: Between 25% and 50% flops (50th percentile) in the classification set.
Group 3: Between 50% and 67% flops (75th percentile) in the classification set.
Group 4: More than 67% flops in the classification set.

Although we use these percentiles to group customers, the number of customers in each group varies because a disproportionately large number of customers have a flop rate of 0%, 50%, or 100%. The groups' sizes (for Groups 1–4, respectively) are 8,151 (28%), 4,692 (16%), 10,105 (34%), and 6,515 (22%). There are also 48,308 "other" customers who are not classified into a group, because they did not purchase at least two new products in the classification set. The focus of our research is to investigate whether these groups of customers can help predict success or failure of products in the prediction set.

Predicting the Success of New Products in the Prediction Set

Recall that the prediction set includes new products introduced after the products in the classification set. We use purchases in the initial evaluation period (the first 15 weeks) to predict the success of the new products in the prediction set. We estimate two competing models. The competing models are both binary logits, where the unit of analysis is a new product indexed by j, and the dependent variable, Successj, is a binary variable indicating whether the new product survived for at least three years. The first model treats all customers equally, while the second model distinguishes between the four groups of customers. In particular, the probability that product j is a success (pj) is modeled as

(2) Model 1: ln[pj/(1 − pj)] = α + β0 Total Salesj, and

(3) Model 2: ln[pj/(1 − pj)] = α + β1 Group 1 Salesj + β2 Group 2 Salesj + β3 Group 3 Salesj + β4 Group 4 Salesj + β5 Sales to Other Customersj.

The Total Sales measure counts the total number of purchases of the new product during the initial evaluation period. The Group Sales measures count the total number of purchases by customers in Groups 1–4. The Sales to Other Customers measure counts purchases by customers who are not classified (because they did not purchase at least two new products in the classification set). Therefore, the difference between the two models is that the first model aggregates all sales without distinguishing between customers. The second model distinguishes purchases according to the outcomes of the customers' classification set purchases.

These sales measures are all calculated using purchases in the first 15 weeks after the new product is introduced. Table 1 reports average marginal effects from both models. We report the likelihood ratio comparing the fit of Models 1 and 2 together with a chi-square test statistic measuring whether the improvement in fit is significant. We also report the area under the receiver operating characteristic (ROC) curve (area under the curve [AUC]). The AUC measure is used in the machine learning literature and is equal to the probability that the model ranks a randomly chosen positive outcome higher than a randomly chosen negative outcome.7

7The ROC curve represents the fraction of true positives out of the total actual positives versus the fraction of false positives out of the total actual negatives.

Table 1
MARGINAL EFFECTS FROM LOGISTIC MODELS

                                   Model 1           Model 2            Model 3            Model 4
Total sales                        .0011** (.0004)                      .0025** (.0006)
Group 1 sales                                        .0113* (.0049)                        .0056 (.0041)
Group 2 sales                                        .0016 (.0055)                         .0004 (.0050)
Group 3 sales                                        –.0067 (.0036)                        –.0018 (.0032)
Group 4 sales                                        –.0258** (.0052)                      –.0165** (.0048)
Sales to other customers                             .0114** (.0023)                       .0098** (.0021)
No sales in the first 15 weeks                                          .1156 (.0769)      .1037 (.0761)
(Log) price paid                                                        .0500** (.0181)    .0432* (.0180)
Profit margin                                                           .0300 (.1226)      .0259 (.1191)
Discount received                                                       –.0389 (.1604)     –.0105 (.1552)
Discount frequency                                                      –.1187 (.0951)     –.1173 (.0953)
Herfindahl index                                                        .1996 (.1113)      .2134* (.1034)
Category sales                                                          –.1025** (.0340)   –.0979** (.0338)
Vendor sales                                                            –.0304 (.0334)     –.0317 (.0338)
Private label                                                           .2499** (.0464)    .2362** (.0469)
Number of customers with one repeat                                     –.0134 (.0081)     –.0063 (.0092)
Number of customers with two repeats                                    –.0390 (.0199)     –.0423 (.0240)
Number of customers with three or more repeats                          .0038 (.0288)      .0181 (.0413)
Log-likelihood                     –1,998            –1,952             –1,823             –1,800
Likelihood ratio test,                               90.24**                               46.66**
chi-square (d.f. = 4)
AUC                                .6035             .6160              .7104              .7242

*p < .05.
**p < .01.
Notes: The table reports average marginal effects from models in which the dependent variable is a binary variable indicating whether the new product succeeded (1 if succeeded, 0 if failed). Robust standard errors (clustered at the category level) appear in parentheses. The unit of analysis is a new product, and the sample size is 2,953 new products.
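The comparison between Models 1 and 2 can be illustrated on simulated data. The sketch below is a pure-NumPy toy, not the paper's estimation: the transaction data are proprietary, so the group-level sales and coefficients (including the negative "harbinger" coefficient on Group 4, mimicking the signs in Table 1) are invented, and the article's average marginal effects and category-clustered standard errors are omitted. It fits both binary logits by gradient descent and computes the AUC exactly as defined in the text.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

def fit_logit(X, y, iters=2000, lr=0.5):
    """Plain gradient-descent logistic regression on standardized
    features, intercept included; a sketch, not the paper's estimator."""
    X = (X - X.mean(0)) / X.std(0)
    X = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w += lr * X.T @ (y - sigmoid(X @ w)) / len(y)
    return w, X

def auc(y, score):
    """P(randomly chosen success outranks a randomly chosen failure)."""
    pos, neg = score[y == 1], score[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

rng = np.random.default_rng(0)
n = 2000  # simulated new products

# Simulated 15-week sales by customer group (Groups 1-4 plus "other").
sales = rng.poisson(lam=[5.0, 4.0, 6.0, 4.0, 8.0], size=(n, 5)).astype(float)
true_logit = (-0.5 + 0.10 * sales[:, 0] + 0.05 * sales[:, 1]
              - 0.08 * sales[:, 2] - 0.25 * sales[:, 3] + 0.12 * sales[:, 4])
success = rng.binomial(1, sigmoid(true_logit))

# Model 1: total sales only.  Model 2: sales split by customer group.
w1, X1 = fit_logit(sales.sum(axis=1, keepdims=True), success)
w2, X2 = fit_logit(sales, success)
auc1 = auc(success, sigmoid(X1 @ w1))
auc2 = auc(success, sigmoid(X2 @ w2))
# Splitting sales by group yields the higher AUC, as in Table 1.
```

Because harbinger-heavy sales pull success probability down while other groups pull it up, aggregate sales carry little signal here, and the group-split model discriminates much better; this is the same mechanism the table's AUC comparison captures.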
The closer this value is to 1, the more accurate the classifier. For both of these metrics (and for the additional success metrics we report subsequently in this section), we compare Models 1 and 2. This represents a direct evaluation of whether we can boost predictive performance over standard models that forecast the outcome of new products using the number of initial purchases. In a subsequent analysis, we extend this comparison to consider not only the initial trial of the product but also repeat purchases (see, e.g., Eskin 1973).

The chi-square statistics and AUC measures both confirm that distinguishing initial purchases by Harbingers from those of other customers can significantly improve decisions about which new products to continue selling. Recall that the dependent variable is a binary variable indicating whether the product succeeded. Positive marginal effects indicate a higher probability of success, while negative marginal effects indicate the reverse. In Model 1, we observe that higher Total Sales are associated with a higher probability of success. This is exactly what we would expect: products that sell more are more likely to be retained by the retailer. In Model 2, we observe positive marginal effects for customers in Groups 1 and 2 but negative marginal effects for customers in Groups 3 and 4. Purchases by customers in each group are informative, but they send different signals. Notably, purchases by Harbingers (represented by customers in Groups 3 and 4) are a signal of failure: if sales to these customers are high, the product is more likely to fail.

Our two models do not contain any controls for product covariates. However, we can easily add covariates to each model. To identify covariates, we focus on variables that have previously been used in the new product development literature (e.g., Henard and Szymanski 2001). Our focus is on controlling for these variables rather than providing a causal interpretation of the relationships. A detailed definition of all the control variables appears in the Web Appendix, together with summary statistics (Tables WA1 and WA2). The findings when including product covariates are also included in Table 1 (as Models 3 and 4). We find that the negative effect of sales from Group 4 on new product success persists.

Are Repeat Purchases More Informative?

Previous research has suggested that not only a customer's initial purchase of a product is informative but also whether the customer returns and repeats that purchase (Eskin 1973; Parfitt and Collins 1968). Therefore, we inves-

equals the total number of new products customer i purchased at least n times in the classification set.

We use this definition to group customers by their Repeated FlopAffinity. To investigate the relationship between the Repeated FlopAffinity groups and the success of products in the prediction set, we vary the minimum number of purchases, n, from one to three. There are fewer customers who make repeat purchases of the products in the classification set, so we aggregate customers in Groups 1 and 2 and customers in Groups 3 and 4. Table 2 reports the findings.

When n is equal to 1, the definition in Equation 4 is equivalent to that in Equation 1. We include it to facilitate comparison. Using the new definition, we find the same pattern that sales from Harbingers significantly reduce the likelihood of success. Notably, the marginal effects for Groups 3 and 4 are larger as n becomes larger. This suggests that a new product is even more likely to fail if the sales come from customers who repeatedly purchase flops.

Embracing the Information that Harbingers Provide

The findings indicate that the firm should not simply ignore purchases by Harbingers, because their purchases are informative about which products are likely to fail. In particular, if we omit purchases by customers in Groups 3 and/or 4, the model is less accurate in explaining which products succeed. When sales to Groups 3 and 4 are included, the model is able to give greater weight to purchases by other customers whose adoption is a strong signal that the product will succeed. This is reflected in the much larger positive marginal effect of sales to customers in Groups 1 and 2 (Model 2) compared with the marginal effect for Total Sales (Model 1). The large, significant effect for Groups 1 and 2 (see also Table 2, Model 2: At Least One Purchase) provides some evidence that there may also exist customers whose purchases signal that a product is more likely to succeed (i.e., Harbingers of Success).

Table 2
GROUPING CUSTOMERS BY REPEAT PURCHASES

                                        Model 2
                  Model 1          At Least One      At Least Two      At Least Three
                                   Purchase          Purchases         Purchases
Total sales       .0011** (.0004)
Groups 1 and 2                     .0064* (.0032)    –.0058 (.0042)    –.0096 (.0087)
Groups 3 and 4                     –.0144**          –.0218**          –.0375**
tigate whether repeat purchases of a new product that subse- (.0027) (.0051) (.0090)
quently failed are more informative about which customers Sales to other customers .0118** .0066** .0045**
are Harbingers than just a single purchase. To address this (.0022) (.0012) (.0008)
question, we identify new products that are purchased Log-likelihood
Likelihood ratio test,
–1,998 –1,959 –1,963 –1,962
78.69** 69.57** 73.06**
repeatedly by the same customer in the classification set chi-square (d.f. = 2)
(during the initial evaluation period). In particular, we rede- AUC .6035 .6128 .6050 .6159
fine FlopAffinity as *p < .05.
**p < .01.
xi ( n ) Notes: The table reports average marginal effects from models in which
(4) Repeated FlopAffinityi ( n ) = ,
yi ( n ) the dependent variable is a binary variable indicating whether the new
product succeeded (1 if succeeded, 0 if failed). Robust standard errors
where xi(n) equals the number of flops customer i pur- (clustered at the category level) appear in parentheses. The unit of analysis
chased at least n times in the classification set and yi(n) is a new product. The sample size is 2,953.
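The Repeated FlopAffinity measure in Equation 4 is straightforward to compute from raw transaction records. The following is a minimal sketch, not the authors' code: it assumes the classification-set purchases arrive as (customer, product) pairs, one pair per purchase, together with the set of products later labeled flops; all function and variable names are ours.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

def repeated_flop_affinity(
    purchases: List[Tuple[str, str]],  # one (customer_id, product_id) pair per purchase
    flops: Set[str],                   # new products that were later labeled flops
    n: int,
) -> Dict[str, float]:
    """Equation 4: x_i(n) / y_i(n), where x_i(n) is the number of flops
    customer i purchased at least n times in the classification set and
    y_i(n) is the number of new products purchased at least n times."""
    counts = Counter(purchases)  # (customer, product) -> number of purchases
    per_customer: Dict[str, Set[str]] = {}
    for (cust, prod), k in counts.items():
        if k >= n:
            per_customer.setdefault(cust, set()).add(prod)
    affinity = {}
    for cust, prods in per_customer.items():
        y = len(prods)                           # new products bought >= n times
        x = sum(1 for p in prods if p in flops)  # of those, how many flopped
        affinity[cust] = x / y
    return affinity
```

With n = 1 this collapses to the original FlopAffinity definition, consistent with the comparison column in Table 2; customers with no product purchased at least n times simply drop out of the result.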
586 JOURNAL OF MARKETING RESEARCH, OCTOBER 2015
We investigate the robustness of the findings to (1) alternative measures of new product success, (2) alternative approaches to constructing product sets in the analysis, and (3) alternative predictors of success. We also assess predictive accuracy using an alternative approach to construct the data. Finally, we report the findings by category and when grouping the items by other product characteristics. We briefly summarize the findings for all of the results in this section and present detailed findings in the Web Appendix.
The next measure of success that we investigate is market share. Recall that the products in the prediction set were all introduced between July 2004 and July 2005. To measure market share, we calculate each item in the prediction set's share of total category sales in calendar year 2008 (approximately three years later). The qualitative conclusions do not change (see Web Appendix Table WA4). Purchases by customers in Group 4 are associated with a significantly lower market share, and both the chi-square and the AUC measures confirm that distinguishing between customers in Model 2 yields a significant improvement in accuracy.

An alternative to measuring whether a new product survives for two or three years is to measure how long a new product survives. A difficulty in doing so is that some products survive beyond the data window, and so we do not have a well-defined measure of how long these products survive. To address this challenge, we estimate a hazard function. We report details of the analysis in the Web Appendix (Tables WA5a and WA5b). The findings reveal the same pattern as our earlier results: increased sales among customers in FlopAffinity Groups 3 and 4 are associated with higher hazards of product failure. The implication is that increased purchases by Harbingers are an indication that a new product will fail faster.

Alternative Constructions of Product Sets

Recall that when predicting the success of new products in the prediction product set, we only consider purchases made within 15 weeks of the new product introduction. We repeated the analysis when using initial evaluation period lengths of 5 or 10 weeks (using the same sample of products). The findings are again qualitatively unchanged (see Web Appendix Table WA6). Even as early as 5 weeks after a new product is introduced, purchases by Harbingers are a significant predictor of new product success.

In our analysis, we omit items that were discontinued during the 15-week initial evaluation period. If we are using the initial evaluation period to predict future success, it seems inappropriate to include items that have already failed. However, these items are not a random sample; they are the items that failed the most quickly. To investigate how the omission of these items affected the results, we repeated the analysis when including these items in our sample of new products. Doing so yields the same pattern of findings (see Web Appendix Table WA7).

We have grouped products according to the timing with which they were introduced. New products purchased in the first 39 weeks of the data period (between November 2003 and July 2004) were assigned to the classification set, while products introduced between July 2004 and July 2005 were assigned to the prediction set. We repeated the analysis when using different time periods to allocate products into these two product sets. In particular, we constructed both a smaller classification set (the first 26 weeks) and a larger classification set (the first 52 weeks). The results confirm that the conclusions are robust to varying the length of the periods used to divide the products (see Web Appendix Table WA8).

We also investigated two alternative approaches to constructing these two product sets. In one approach, we randomly assign the new products into the classification set and the prediction set instead of dividing them by time. Again, we use the classification set to group customers and the prediction set to predict product success. Our qualitative conclusions remain unchanged under this approach. To further confirm that the classification set and prediction set contain new products that are truly different and unrelated to one another, we also repeated the analysis when randomly assigning all of the new products in some product categories to the classification set and all of the new products in the remaining product categories to the prediction set. Product categories were equally likely to be assigned to each set. The findings also survive under this allocation (see Web Appendix Table WA9).

Recall that our transaction data include the complete purchasing histories of each customer in the sample (when using the store's loyalty card). This includes purchases from other stores in the chain, beyond the 111 stores used to construct covariates and identify when a product is introduced and how long it survived. To investigate how the findings are affected by the inclusion of purchases from other stores, we repeated the analysis using three approaches. First, we excluded any purchases from outside the 111 stores when either classifying the customers into FlopAffinity groups or predicting the success of new products in the prediction set. Second, we only considered purchases from outside the 111 stores when classifying customers and predicting new product success. Finally, we obtained a sample of detailed transaction data for different customers located in a completely different geographic area and used purchases by these customers to both classify these customers and predict new product success.8 In the first two approaches, we used our original allocation of products to the classification and prediction sets. In the final approach, we randomly assigned new products into the classification set and the prediction set. As might be expected, the findings are strongest when we focus solely on purchases in the 111 stores and weakest when we use customers from a different geographic area. However, even in this latter analysis, increased initial sales to Harbingers (customers in Groups 3 and 4) are associated with a lower probability of success (see Web Appendix Table WA10).

Alternative Predictors of Success

We restricted attention to customers who have purchased at least two new products in the classification set. For customers with two purchases, FlopAffinity can only take on values of 0, .5, or 1. To investigate whether the findings are distorted by the presence of customers who purchased relatively few new products, we replicated the analysis when restricting attention to customers who purchased at least three, four, and five new products from the classification set. There is almost no qualitative difference in the results when restricting attention to a subset of customers using a minimum number of new product purchases (see Web Appendix Table WA11).

8 These data include 27 million transactions between August 2004 and August 2006, for a sample of 810,514 customers. These customers are a random sample of all of the customers who shopped in 18 stores located in a different geographic region. Their purchase histories are also complete and record every transaction in any of the firm's stores (in any geographic region). Only .03% of the store visits for this separate sample of 810,514 customers were made at the 111 stores that we use in the rest of our analysis.
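The two allocations described above (by introduction date, and randomly by whole product category) can be sketched as follows. This is an illustrative reconstruction under our own naming, not the authors' code; the 39-week cutoff matches the time-based split in the text.

```python
import random
from typing import Dict, List, Tuple

def split_by_time(
    products: List[str], intro_week: Dict[str, int], cutoff_week: int = 39
) -> Tuple[List[str], List[str]]:
    """Products introduced before the cutoff week form the classification
    set; later introductions form the prediction set."""
    classification = [p for p in products if intro_week[p] < cutoff_week]
    prediction = [p for p in products if intro_week[p] >= cutoff_week]
    return classification, prediction

def split_by_category(
    products: List[str], category: Dict[str, str], seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Robustness check: each product category is equally likely to be
    assigned, as a whole, to either set."""
    rng = random.Random(seed)
    chosen = {c for c in sorted({category[p] for p in products})
              if rng.random() < 0.5}
    classification = [p for p in products if category[p] in chosen]
    prediction = [p for p in products if category[p] not in chosen]
    return classification, prediction
```

The category-level split guarantees that no category contributes products to both sets, which is the point of that robustness check.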
We conclude that the results do not seem to be distorted by the presence of customers who purchased relatively few new products.

The variables in Equation 3 focus on the quantity purchased by customers in each FlopAffinity group. Alternatively, we can investigate how the probability of success varies when we focus on the proportion of purchases by customers in each group. In particular, we estimate the following modification to Equation 3:

(5) ln[p_j / (1 – p_j)] = α + β_0 Total Sales_j + β_1 Group 2 Ratio_j + β_2 Group 3 Ratio_j + β_3 Group 4 Ratio_j + β_4 No Sales to Grouped Customers_j.

The ratio measures represent the percentage of sales of product j to customers in Groups 1–4 contributed by customers in each group. The four ratio measures sum to 1 (by definition), so we omit the Group 1 ratio measure from the model. Under this specification, the coefficient for each of the other three ratio measures can be interpreted as the change in the probability of success when there is an increase in the ratio of sales to that group (and a corresponding decrease in the ratio of sales to customers in Group 1). As we discussed previously, some items have no sales to any grouped customers. The ratio measures are all set to zero for these items, and we include a binary indicator flagging these items. Table WA12 in the Web Appendix reports the results. For Groups 3 and 4, we observe significant negative coefficients, indicating that when customers in these groups contribute a higher proportion of sales, there is a smaller probability that the new product will succeed. In particular, if the ratio of sales contributed by customers in Group 3 increases by 10%, the probability of success drops by 1.73%. A 10% increase in the Group 4 ratio leads to a 3% drop in the probability of success.

In our analysis, we examined purchases of new products by each group of customers. We can also investigate whether the decision not to purchase a new product is informative. In particular, we can classify customers according to the number of successful new products in the classification set that they did not purchase in the first year the product is introduced. We calculate the following measure:

(6) Success Avoidance_i = (Number of successful new products not purchased by customer i) / (Total number of new products not purchased by customer i).

As might be expected, the Success Avoidance and FlopAffinity measures are highly correlated (r = .76). We use the same approach to investigate whether distinguishing between customers with high and low Success Avoidance can help predict whether new products in the prediction set will succeed. The findings confirm that purchases by customers who tend to avoid success are also indicative of product failure (see Web Appendix Table WA13).

Predictive Accuracy

An alternative way to measure predictive accuracy is to fit the models to a subset of the data and predict the outcomes in the remaining data. We divided the 2,953 products in the prediction set into an estimation sample and a holdout sample.9 We used the estimation sample to produce coefficients and then used these coefficients to predict the outcome of the products in the holdout sample. The estimates of marginal effects are very close to those from the full sample (see Web Appendix Table WA14).

A baseline prediction would be simply that all of the items in the holdout sample will fail. This baseline prediction is correct 54.08% of the time. Using Total Sales during the 15-week initial evaluation period (Model 1) only slightly improves accuracy, to 54.99%. However, in Model 2, in which we distinguish which customers made those purchases during the initial evaluation period, accuracy improves to 61.42%.

This result highlights the value of knowing who is purchasing the product during an initial trial period. Although higher total sales are an indication of success, this is only a relatively weak signal. The signal is significantly strengthened if the firm can detect how many of those initial purchases are made by Harbingers. It is also useful to recognize that while a 6.43% improvement in accuracy (from 54.99% to 61.42%) may seem small, the difference is significant and meaningful. As we discussed in the "Related Literature" section, the cost to the retailer of retaining products that will subsequently fail is large, in some cases even larger than the cost to the manufacturer. As a result, even small improvements in accuracy are valuable.

A limitation of this holdout analysis is that the outcomes (success or failure) for products in the estimation sample are not known when the items in the holdout sample are introduced (i.e., at the time of the prediction). Given that we require three years of survival to observe the outcome and have only two years of individual purchasing data, there is no way to completely address this limitation. However, we offer two comments. First, in practice, firms have access to longer data periods and will be able to observe the outcomes of items in the calibration sample using data that exist at the time of the predictions. Second, it is important to distinguish information about product outcomes from information about customers' purchasing decisions. Although the predictions use future information about the product outcomes, they only rely on customer purchases made before the date of the predictions.

Results by Product Category

We next compare how the findings varied across product categories. We begin by comparing the results across four "super-categories" of products: beauty products, edibles, general merchandise, and health care products. These super-categories are defined by the retailer10 and comprise 49%, 5%, 20%, and 26% (respectively) of the new products in our sample of 2,953 new products in the prediction set.

9 We use new items that are introduced earlier for estimation (60%) and hold out new items that are introduced later for the prediction test (40%). The sample sizes are 1,740 in the estimation sample and 1,213 in the holdout sample. The findings are robust to randomly assigning the items to the estimation sample and the holdout sample.

10 These categories are considerably broader than the product categories used to allocate products in our previous robustness analysis.
We repeated our previous models separately for the four super-categories (using the new products in the prediction set) and report the findings in the Web Appendix (Table WA15). We replicate the negative effect of sales for Group 4 customers in each category, and the result is statistically significant in three of the four categories. Furthermore, distinguishing purchases by the four customer groups seems to lead to particularly large improvements in accuracy in the health care and edibles categories. It could be argued that these are the categories for which evaluating quality is, on average, more important because the products are intended either for consumption or to improve consumer health. Notably, although Harbingers are generally less likely to purchase health care products, when they do purchase them, it is a particularly strong signal that the product will fail.

We next compare the outcomes for national brands and private label products. Our sample of 2,953 new products includes 18% private label products (the retailer's store brand). The results from reestimating the models separately for private label and national brand new products in the prediction set appear in the Web Appendix (Table WA16). Again, we observe a negative effect for Group 4 sales, which replicates the harbinger effect. We also find that the improvement in predictive performance (measured by the area under the ROC curve) is particularly strong for private label items.

In the Web Appendix (Tables WA17 and WA18), we also report the findings when grouping the items by other product characteristics.

We classify customers as Harbingers or Others on the basis of their classification set purchases. Harbingers include customers in Groups 3 and 4, while the Other customers are in Groups 1 and 2. In Table 3, we compare the purchasing patterns of the two types of customers using the transactions in the period used to identify the classification set (November 2003 to July 2004). We include purchases of all products (new and existing), and in the Web Appendix (Table WA19), we repeat the analysis when focusing solely on new products. The Web Appendix also provides definitions and summary statistics of these purchasing measures (Tables WA20 and WA21).

The findings in Table 3 reveal that, on average, Harbingers purchase more items but visit a similar number of stores. They tend to buy slightly more items per visit but make slightly fewer visits. Although the differences in these measures are statistically significant, they are relatively small. There are larger differences in the prices of the items that Harbingers purchase and the categories from which they purchase. Harbingers tend to choose less expensive items and are more likely to purchase items on sale and items with deeper discounts. They purchase a higher proportion of beauty items but a lower proportion of health care items.

Harbingers tend to purchase new products more quickly after the items are introduced (see Table WA19 in the Web Appendix).

Table 3
HARBINGERS VERSUS OTHERS PURCHASING NEW AND EXISTING PRODUCTS
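A group comparison like the one summarized in Table 3 reduces to averaging per-customer purchasing metrics within each group. The following is a minimal sketch under our own naming and structure, not the authors' code; it assumes both groups are nonempty.

```python
from statistics import mean
from typing import Dict, Tuple

def compare_groups(
    metrics_by_customer: Dict[str, Dict[str, float]],  # e.g., items per visit, price paid
    is_harbinger: Dict[str, bool],                     # Groups 3-4 vs. Groups 1-2
) -> Dict[str, Tuple[float, float]]:
    """Return (Harbinger mean, Other mean) for each purchasing metric."""
    harb = [m for c, m in metrics_by_customer.items() if is_harbinger[c]]
    other = [m for c, m in metrics_by_customer.items() if not is_harbinger[c]]
    return {k: (mean(d[k] for d in harb), mean(d[k] for d in other))
            for k in harb[0]}
```

In practice each mean difference would be paired with a significance test, as the comparisons reported in Table 3 are.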
and Geography Drive Cross-Channel Competition,” Management Science, 55 (11), 1755–65.
———, ———, and Duncan Simester (2011), “Goodbye Pareto Principle, Hello Long Tail: The Effect of Internet Commerce on the Concentration of Product Sales,” Management Science, 57 (8), 1373–87.
———, ———, and Michael D. Smith (2003), “Consumer Surplus in the Digital Economy: Estimating the Value of Increased Product Variety at Online Booksellers,” Management Science, 49 (11), 1580–96.
Choi, Jeonghye and David R. Bell (2011), “Preference Minorities and the Internet,” Journal of Marketing Research, 48 (August), 670–82.
Eskin, Gerald J. (1973), “Dynamic Forecasts of New Product Demand Using a Depth of Repeat Model,” Journal of Marketing Research, 10 (May), 115–29.
——— and John Malec (1976), “A Model for Estimating Sales Potential Prior to the Test Market,” in Proceedings of the American Marketing Association 1976 Fall Educators’ Conference. Chicago: American Marketing Association, 230–33.
Fader, Peter S. and Bruce G.S. Hardie (2005), “The Value of Simple Models in New Product Forecasting and Customer-Base Analysis,” Applied Stochastic Models in Business & Industry, 21 (4/5), 461–73.
Henard, David H. and David M. Szymanski (2001), “Why Some New Products Are More Successful than Others,” Journal of Marketing Research, 38 (August), 362–75.
Huang, Yan, Param V. Singh, and Kannan Srinivasan (2014), “Crowdsourcing New Product Ideas Under Consumer Learning,” Management Science, 60 (9), 2138–59.
Lilien, Gary L., Pamela D. Morrison, Kathleen Searls, Mary Sonnack, and Eric von Hippel (2002), “Performance Assessment of the Lead User Idea-Generation Process for New Product Development,” Management Science, 48 (8), 1042–59.
Liutec, Carmen, Rex Du, and Ed Blair (2012), “Investigating New Product Purchase Behavior: A Multi-Product Individual-Level Model for New Product Sales,” working paper, University of Houston.
Mahajan, Vijay, Eitan Muller, and Frank M. Bass (1990), “New Product Diffusion Models in Marketing: A Review and Directions for Research,” Journal of Marketing, 54 (January), 1–26.
Moore, Geoffrey A. (1991), Crossing the Chasm. New York: HarperBusiness.
Morvinski, Coby, On Amir, and Eitan Muller (2014), “‘Ten Million Readers Can’t Be Wrong!’ or Can They? On the Role of Information about Adoption Stock in New Product Trial,” MSI report, (accessed August 10, 2015), [available at http://www.msi.org/reports/ten-million-readers-cant-be-wrong-or-can-they-the-role-of-information-about/].
Parfitt, J.H. and B.J.K. Collins (1968), “Use of Consumer Panels for Brand-Share Prediction,” Journal of Marketing Research, 5 (May), 131–45.
Pringle, Lewis G., Dale Wilson, and Edward I. Brody (1982), “NEWS: A Decision Oriented Model for New Product Analysis and Forecasting,” Marketing Science, 1 (1), 1–30.
Silk, Alvin J. and Glen L. Urban (1978), “Pre-Test-Market Evaluation of New Packaged Goods: A Model and Measurement Methodology,” Journal of Marketing Research, 15 (May), 171–91.
Simester, Duncan and Juanjuan Zhang (2010), “Why Are Bad Products So Hard to Kill?” Management Science, 56 (7), 1161–79.
Urban, Glen L. and John R. Hauser (1993), Design and Marketing of New Products. Englewood Cliffs, NJ: Prentice Hall.
——— and Eric von Hippel (1988), “Lead User Analyses for the Development of New Industrial Products,” Management Science, 34 (5), 569–82.
Van den Bulte, Christophe and Yogesh V. Joshi (2007), “New Product Diffusion with Influentials and Imitators,” Marketing Science, 26 (3), 400–421.
Von Hippel, Eric (1986), “Lead Users: A Source of Novel Product Concepts,” Management Science, 32 (7), 791–805.
Waldfogel, Joel (2003), “Preference Externalities: An Empirical Study of Who Benefits Whom in Differentiated Product Markets,” RAND Journal of Economics, 34 (3), 557–68.
——— (2009), The Tyranny of the Market: Why You Can’t Always Get What You Want. Cambridge, MA: Harvard University Press.