Analytics
Methods and Applications in Marketing Management, Public Policy, and Litigation Support
Edited by
Natalie Mizik
Professor of Marketing and J. Gary Shansby Endowed Chair
in Marketing Strategy, Foster School of Business, University
of Washington, USA
Dominique M. Hanssens
Distinguished Research Professor of Marketing, Anderson
School of Management, University of California, Los Angeles,
USA
Published by
Edward Elgar Publishing Limited
The Lypiatts
15 Lansdown Road
Cheltenham
Glos GL50 2JA
UK
List of contributors
Overview of the chapters
Introduction
Natalie Mizik and Dominique M. Hanssens
METHODS CHAPTERS
Index
EXPERIMENTAL DESIGNS
reducing the connection between the company and the rumor might be an
effective strategy in response to any future rumors.
In summary, historical, survey, and qualitative data are excellent
sources for hypotheses about relationships between variables, but they are
inadequate to support a strong causal inference. In situations where it is
important to establish causality, an experiment should be conducted.
Testing a Theory
Testing an Intervention
The value of theory ultimately lies in its application to real world situa-
tions in the form of theory-based interventions. Researchers may pilot
test these interventions prior to implementing them on a grand scale. In
an intervention-testing experiment, the focus is on the treatments and out-
comes rather than on the abstract theory that led to the selection of these
variables. The goal is to see whether an intervention or treatment has the
desired effect and, if multiple interventions are under consideration, to
gauge their relative effectiveness. Rather than striving to create interventions that vary along a single dimension while controlling for factors unaddressed by the theory (as would be the goal in a theory test), researchers often design interventions that operationalize the theoretical constructs in multiple ways, so as to maximize the likelihood that the intervention will have the desired impact. They also relax control over factors that lie outside the theory, to better mimic the natural environment to which the results will be generalized.
For example, a retailer may believe that sales of a product are tied to its placement within a grocery store
such that sales are greater when the product is displayed next to comple-
mentary categories rather than potentially competing ones (e.g., peanut
butter shelved next to jams and preserves rather than next to soy nut
butter). A litigator may need to estimate sales that were potentially lost
due to a competitor’s infringement on a patent by isolating the effect of
specific product features on consumer preferences. Or a charity may desire
to select the most effective appeal from several executions for generating
donations. In these situations, a field experiment has some obvious advan-
tages. Nevertheless, a laboratory experiment may be the better choice due
to monetary and time constraints.
In summary, if the primary goal is to establish a clear causal linkage
(versus estimating the magnitude of the relationship in natural settings), a
laboratory experiment is preferred. A laboratory experiment may also be
preferred for a variety of practical reasons detailed earlier. An important
additional advantage of conducting an experiment in the laboratory is
the opportunity to solicit participants’ responses to other questions that
may further shed light on the causal relationship. Information such as
age, gender, income, education, past experiences, and their thoughts
and emotions while being exposed to the treatment may also be useful
in identifying why the effect occurs, when it may dissipate or accentuate,
and what kinds of intervention may be useful to enhance or suppress the
effect.
In the design notation used here, X denotes exposure to a treatment and O an observation or measurement. A post-test-only design comparing two treatments and a pretest-posttest design can be depicted as:

EG 1:  X1  O1
EG 2:  X2  O2

EG 1:  O1  X1  O2
When the main objective of the experiment is to compare the effects of dif-
ferent treatments (as in an intervention or effect test), a single factor design
with as many levels of treatments as desired may be adequate. However,
when the objective of the experiment is to delve into why or how something happens (as in theory testing), a design involving multiple factors
may be needed for at least two reasons.
First, multiple factors may be included for the simple reason that some
theories specify moderators or boundary conditions. The simplest multi-
factor design is a 2 (XA1, XA2) × 2 (XB1, XB2) design, with participants being
randomly assigned to each of the four experimental groups:
        XB1     XB2
XA1     EG 1    EG 2
XA2     EG 3    EG 4
The outcome y_ijk for participant i assigned to level j of XA and level k of XB can be written as

y_ijk = μ + τ_j + λ_k + (τλ)_jk + ε_ijk

where μ is the grand mean, τ_j is the main effect for the jth level of treatment XA, λ_k is the main effect for the kth level of treatment XB, (τλ)_jk is the interaction effect for XAj and XBk, and ε_ijk is a random error term.
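To make the decomposition concrete, the short sketch below (our illustration, with hypothetical scores) computes the grand mean, the two sets of main effects, and the interaction terms for a balanced 2 × 2 design:

```python
from statistics import mean

# Hypothetical scores for a balanced 2 x 2 design; keys are
# (level j of XA, level k of XB), values are participant scores.
cells = {
    (1, 1): [5.0, 6.0, 7.0],   # EG 1
    (1, 2): [4.0, 5.0, 6.0],   # EG 2
    (2, 1): [6.0, 7.0, 8.0],   # EG 3
    (2, 2): [2.0, 3.0, 4.0],   # EG 4
}

def anova_effects(cells):
    """Decompose cell means into grand mean, main effects and interaction."""
    cell_means = {jk: mean(ys) for jk, ys in cells.items()}
    mu = mean(cell_means.values())                        # grand mean
    tau = {j: mean(v for (a, _), v in cell_means.items() if a == j) - mu
           for j in (1, 2)}                               # main effects of XA
    lam = {k: mean(v for (_, b), v in cell_means.items() if b == k) - mu
           for k in (1, 2)}                               # main effects of XB
    inter = {(j, k): cell_means[(j, k)] - mu - tau[j] - lam[k]
             for (j, k) in cell_means}                    # interaction terms
    return mu, tau, lam, inter
```

By construction, the main effects and the interaction terms each sum to zero across levels, which is what makes the decomposition identifiable.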
As an example, the Aaker and Lee (2001) study that was discussed
earlier used a 2 × 2 design to test the hypothesis that individuals’ self-view
moderates whether a promotion or prevention message frame is more
persuasive. The researchers varied the content of a website for Welch’s
Grape Juice that encouraged participants to adopt one of two self-views
(independent or interdependent) and exposed them to a persuasive message evoking one of two goal orientations (promotion or prevention).
[Table omitted from this excerpt: results by condition, where A = Independent self-view/Promotion frame; B = Independent self-view/Prevention frame; C = Interdependent self-view/Promotion frame; D = Interdependent self-view/Prevention frame.]
When the objective of the research is to test for both main and interaction
effects, as is typically the case in theory-testing research, a full factorial
design is used where every level of one factor is crossed with all levels of the
other factors. This was the case for both of the Aaker and Lee (2001) experi-
ments described above. A full factorial design ensures that all the independ-
ent variables in the model, including the interaction terms, are orthogonal to
each other so that each of the effects could be estimated independently of all
other effects. Sometimes for efficiency it is desirable to use just a subset (i.e.,
a fraction) of the experimental conditions of a full factorial design, care-
fully chosen to preserve the orthogonality of the design. With a fractional
factorial design, the researcher will be able to estimate the main effects with
a much smaller sample, but will not be able to estimate all the interaction
effects. One instance of a fractional factorial design is the Latin Square
design described earlier. A common use of fractional factorial designs is in
conjoint studies (see Chapter 3 on conjoint analysis in this volume).
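As an illustration of how a fraction of a factorial can preserve orthogonality, the sketch below (our construction, not taken from the chapter) generates a half fraction of a 2^3 design using the defining relation C = A × B and verifies that the main-effect columns remain orthogonal:

```python
from itertools import product

def half_fraction():
    """Half fraction of a 2^3 factorial: keep the 4 of 8 runs satisfying
    the defining relation C = A * B, with levels coded -1 / +1."""
    return [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

runs = half_fraction()

def dot(c1, c2):
    """Cross-product of two design columns; 0 means orthogonal."""
    return sum(r[c1] * r[c2] for r in runs)
```

The price of the fraction is aliasing: the C column is identical to the A × B interaction column, so the two cannot be estimated separately.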
Another strategy that makes efficient use of participants is to “yoke”
additional cells to a simple factorial design. The Tybout et al. (1981)
experiment illustrates this strategy. The basic design in this study was
a 2 × 2 factorial where the participants were randomly assigned to one
of four conditions created by crossing mention of the worm rumor
(rumor absent, rumor present) with the inclusion of questions prompting
retrieval of prior attitudes toward McDonald’s (questions absent, ques-
tions present). Two additional treatments were yoked to the condition
where the rumor was introduced and the retrieval questions were absent.
In the first yoked treatment condition, McDonald’s refutation of the
rumor was presented. In the second condition, a response designed to
weaken the connection between McDonald’s and worms while making
people’s mental associations to worms more positive was presented. The
design is depicted below.
                         No Rumor    Rumor
No Retrieval Questions   EG 1        EG 2    EG 5*   EG 6**
Retrieval Questions      EG 3        EG 4
Notes:
* Rumor, no retrieval questions, McDonald’s refutation.
** Rumor, no retrieval questions, a message designed to weaken the connection between McDonald’s and worms and to make people’s associations with worms more positive.
Notice that the yoked treatments could have been included as addi-
tional treatments in a fully crossed design by allowing the retrieval ques-
tions variable to assume four rather than two levels. Doing so would have
required eight cells rather than six cells, while allowing the researchers
to examine the effectiveness of dual-approach strategies (e.g., retrieval
questions + McDonald’s refutation). Yet another design could be a single-
factor design with five conditions (EG 1, EG 2, EG 4, EG 5, and EG 6) if
the researchers were not interested at all in people’s attitudes when prior
associations are made salient in the absence of a rumor. The key consid-
eration to bear in mind in design selection is how efficient the design is in
serving the objectives of the research.
There are many types of dependent measures that researchers can use to
assess the effects of the independent variables in a laboratory experiment.
The decision of which measures and how many to include will depend
on the goal of the experiment. Theories specify not only outcomes, but
also processes by which the outcomes occur. Thus, in testing theories, the
researcher may include the outcome measures to capture the proposed
effect, such as participants’ beliefs about or dispositions toward certain
brands or products (i.e., the dependent variable), as well as measures that
allow inferences about the process underlying those outcomes (i.e., the
mediator variable). These process measures serve to strengthen the test
of the theory by allowing the researcher to conduct mediation analyses
to uncover the mechanism that drives the proposed effect. By contrast,
when conducting an intervention test or seeking to establish an effect, the
researcher is primarily interested in whether a desired outcome occurs in
response to the treatments, and is less interested in the process that led to
that outcome, in which case a smaller set of measures may be included. In
the next sections, we describe some of the more commonly used measures
in lab experiments.
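The regression-based mediation analyses referred to above can be sketched as follows. This is a minimal illustration with a hand-rolled least-squares solver and hypothetical data; in practice a bootstrap test of the indirect effect would be added:

```python
def ols(y, X):
    """Least-squares coefficients via the normal equations.
    X is a list of rows; include a leading 1.0 for the intercept."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    for i in range(k):                       # elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(xtx[r][i]))
        xtx[i], xtx[p] = xtx[p], xtx[i]
        xty[i], xty[p] = xty[p], xty[i]
        for r in range(i + 1, k):
            f = xtx[r][i] / xtx[i][i]
            xtx[r] = [a - f * b for a, b in zip(xtx[r], xtx[i])]
            xty[r] -= f * xty[i]
    beta = [0.0] * k
    for i in reversed(range(k)):             # back substitution
        beta[i] = (xty[i] - sum(xtx[i][j] * beta[j]
                                for j in range(i + 1, k))) / xtx[i][i]
    return beta

def mediation(x, m, y):
    """a: effect of x on the mediator m; b: effect of m on y controlling
    for x; indirect effect = a * b; total effect c from y ~ x."""
    a = ols(m, [[1.0, xi] for xi in x])[1]
    b = ols(y, [[1.0, xi, mi] for xi, mi in zip(x, m)])[2]
    c = ols(y, [[1.0, xi] for xi in x])[1]
    return {"a": a, "b": b, "indirect": a * b, "total": c,
            "direct": c - a * b}
```

A non-zero indirect effect a × b alongside a near-zero direct effect is the pattern consistent with full mediation.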
Researchers often measure a construct with multiple rating-scale items, so that a more reliable measure of the underlying construct can be obtained than would occur with a single item. These items are then combined to create an index that serves as the dependent variable in the data analysis.
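A minimal sketch of building such an index, together with Cronbach’s alpha as a common reliability check (hypothetical ratings; each item list holds the scores of the same respondents in the same order):

```python
from statistics import mean, pvariance

def make_index(items):
    """Average per-respondent scores across items into a single index.
    items: one list of scores per item, respondents in the same order."""
    return [mean(scores) for scores in zip(*items)]

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
    of respondent totals)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items)
                          / pvariance(totals))
```

Values of alpha near 1 indicate that the items vary together and can reasonably be averaged into one index.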
Choice/behavior
Participants may also be asked to make choices or engage in certain
behaviors. For example, they may be sent on an online shopping trip
where there are real consequences associated with the choices made (e.g.,
participants receive these products as compensation for participating in
the study). Or participants may be asked to sample a food product, and the amount that they consume is measured as an indicator of their liking.
Or, participants may be asked to serve as a spokesperson for a cause and
the length and detail of their advocacy may serve as an indicator of the
strength of their support for the cause.
Demographic characteristics and established individual difference scales (e.g., Cacioppo and Petty (1982): Need for Cognition Scale; Snyder (1974): Self-monitoring Scale) can also be used to operationalize theoretical concepts. This was the case in the
Aaker and Lee (2001) experiment discussed earlier where participants’
cultural background (American or Chinese) served as one operationaliza-
tion of self-view. Further, demographic characteristics and individual dif-
ferences may be used to partition the data post hoc to explore whether
the same or different effects are observed in subsets of the sample. Thus,
including these measures can be useful in determining the robustness of
effects or in exploring potential moderators post hoc.
When multiple measures are included in the design, the researcher must
consider the order in which they are presented because there is a risk that
initial measures may influence subsequent measures. For example, asking
participants to recall information presented in the treatment just before
expressing their attitude could alter their attitude by encouraging them
to rely on the recalled information that they otherwise may not use. One
approach to addressing these concerns is to present the dependent measure
of greatest interest first and recognize the potential for order effects on
subsequent measures. An alternative strategy is to counterbalance the
order of the measures and make order a blocking variable in the design
to identify potential biases. In the event an order effect is detected, the
researcher may have to consider using dependent variables that are less
likely to have an order effect, such as those used to assess nonconscious
processes (e.g., response time), or collecting data on these variables using
separate experiments.
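A simple way to counterbalance measure order is a cyclic Latin square, in which every measure appears in every serial position exactly once across the order groups; a sketch with hypothetical measure labels:

```python
def latin_square_orders(measures):
    """Cyclic Latin square of presentation orders: each measure appears
    in each serial position exactly once across the order groups."""
    n = len(measures)
    return [[measures[(i + j) % n] for j in range(n)] for i in range(n)]

orders = latin_square_orders(["attitude", "recall", "intent", "demographics"])
```

A cyclic square balances serial position but not immediate precedence; a balanced Latin square would be needed to also equate which measure directly precedes which.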
Selecting a Sample
Concluding Remarks
Note
1. When comparing between two means, Cohen (1988) considered an effect size d = (µ1 – µ0)/s of .20 to be small, d = .50 to be medium, and d = .80 to be large. When comparing between two proportions P, he considered an effect size h = ϕ1 – ϕ2, where ϕi = 2 arcsin(√Pi), of .20 to be small, h = .50 to be medium, and h = .80 to be large. And when assessing correlations, r = .10 is considered small, r = .30 medium, and r = .50 large.
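The effect sizes in the note above can be computed directly; a minimal sketch:

```python
from math import asin, sqrt

def cohens_d(mu1, mu0, s):
    """Standardized mean difference d = (mu1 - mu0) / s."""
    return (mu1 - mu0) / s

def cohens_h(p1, p2):
    """h = phi1 - phi2 with phi = 2 * arcsin(sqrt(P)),
    the arcsine transformation of a proportion."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
```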
References
Aaker, Jennifer L. and Angela Y. Lee (2001), “‘I’ Seek Pleasures and ‘We’ Avoid Pains: The
Role of Self-Regulatory Goals in Information Processing and Persuasion,” Journal of
Consumer Research, 28 (June), 33–49.
Berinsky, Adam J., Gregory A. Huber and Gabriel S. Lenz (2012), “Evaluating Online Labor
Markets for Experimental Research: Amazon.com’s Mechanical Turk,” Political Analysis,
20, 351–368.
Cacioppo, John T. and Richard E. Petty (1982), “The Need for Cognition,” Journal of
Personality and Social Psychology, 42(1), 116–131.
Calder, Bobby J., Lynn W. Phillips and Alice M. Tybout (1981), “Designing Research for
Application,” Journal of Consumer Research, 8(September), 197–207.
Cohen, Jacob (1988), Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Erlbaum.
Cohen, Jacob (1992), “A Power Primer,” Psychological Bulletin, 112(1), 155–159.
Greene, Bob (1978), “Worms? McDonald’s Isn’t Laughing,” Chicago Tribune (November
20), p. 1, Section 2.
Lynch, John G., Joseph W. Alba, Aradhna Krishna, Vicki G. Morwitz and Zeynep Gurhan-
Canli (2012), “Knowledge Creation in Consumer Research: Multiple Routes, Multiple
Criteria,” Journal of Consumer Psychology, 22, 473–485.
Marder, Jenny (2015), “The Internet’s Hidden Science Factory,” PBS, http://www.pbs.org/newshour/updates/inside-amazons-hidden-science-factory/, February 11 (last accessed October 3, 2017).
McShane, Blakeley and Ulf Böckenholt (2014), “You Cannot Step into the Same River Twice: When Power Analyses are Optimistic,” Perspectives on Psychological Science, 9(6), 612–625.
Neff, Jack (2006), “Don’t Study Too Hard: MBA Marketing,” Advertising Age (March 20).
Snyder, Mark (1974), “Self-monitoring of Expressive Behavior,” Journal of Personality and
Social Psychology, 30(4), 526–537.
Tal, Aner and Brian Wansink (2015), “An Apple a Day Brings More Apples Your Way:
Healthy Samples Prime Healthier Choices,” Psychology & Marketing, 35(May), online.
Tybout, Alice M., Bobby J. Calder and Brian Sternthal (1981), “Using Information
Processing Theory to Design Marketing Strategies,” Journal of Marketing Research,
18(February), 73–79.
The digital revolution has led to an explosion of data for marketing. This
‘Big Data’ available to researchers and practitioners has created much excitement about potential new avenues of research. In this chapter, we
argue that an additional large and potentially important part of this revo-
lution is the increased ability for researchers to use data from field experi-
ments facilitated by digital tools.
Marketing as a field, perhaps because of its historical relationship with
psychology, has embraced and idealized field experiments from an early
stage in its evolution. Roberts (1957), when evaluating statistical inference
as a tool for marketing research, wrote a still powerful passage on the merits of field experiments; portions of it are quoted in the notes to this chapter.
In this section, we describe why field experiments are useful from a statisti-
cal point of view and five steps that researchers need to reflect upon when
designing a field experiment and interpreting its results. The focus of this
chapter is field experiments or interventions in the real world, rather than
the laboratory. The Lee and Tybout chapter in this volume discusses the
lab experiment method and we encourage interested readers to read that
chapter for more information.
The difference between yi1 and yi0 is the causal effect. However, it cannot be measured directly, because a single individual i cannot both receive and not receive the treatment. Therefore, only one outcome is observed for each individual; the unobserved outcome is the ‘counterfactual.’ Because counterfactuals are unobservable, simply comparing those who experience x with those who do not is problematic whenever the two groups differ systematically. A field experiment ensures that, ex ante, via random assignment, any differences between the treated and control groups should not matter.
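The logic can be illustrated with a small simulation (a hypothetical outcome model, seeded for reproducibility): every unit has two potential outcomes, only one is ever observed, yet the difference in observed group means recovers the average treatment effect:

```python
import random
from statistics import mean

def simulate_ate(n=10_000, true_effect=2.0, seed=7):
    """Each unit carries both potential outcomes, but only one is observed;
    random assignment makes the group-mean difference estimate the ATE."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(10.0, 3.0)       # potential outcome without treatment
        y1 = y0 + true_effect           # potential outcome with treatment
        if rng.random() < 0.5:          # random assignment
            treated.append(y1)          # y0 stays an unobserved counterfactual
        else:
            control.append(y0)
    return mean(treated) - mean(control)
```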
The above framework makes the motivation for the use of field experi-
ments straightforward. However, the term ‘random assignment’ and its
implementation turn out to be far more challenging than they appear.
The second question that a researcher should tackle after establishing the
unit of randomization is whether to conduct stratified randomization or
complete randomization.
In complete randomization, individuals (or the relevant unit of rand-
omization) are simply allocated at random into a treatment. In stratified randomization, individuals are first divided into subsamples based on covariate values, so that each subsample is more homogeneous with respect to that covariate than the full sample. Then, each individual
in each of these subsets is randomized to a treatment.6 This stratified
technique is useful if a covariate is strongly correlated with an outcome.
For example, household income may be strongly correlated with purchase
behavior towards private label brands. Therefore, it may make sense, if the
researcher has access to household-level data, to stratify the sample prior
to randomization to ensure sufficient randomization occurs within, for
example, the high-income category.
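A sketch of stratified assignment along the lines just described (hypothetical households, with an income band as the stratifying covariate):

```python
import random
from collections import defaultdict

def stratified_assign(units, stratum_of, seed=0):
    """Randomize to treatment/control separately within each stratum,
    so the stratifying covariate is balanced across arms by construction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for u in units:
        strata[stratum_of(u)].append(u)
    assignment = {}
    for members in strata.values():
        rng.shuffle(members)
        half = len(members) // 2
        for i, u in enumerate(members):
            assignment[u] = "treatment" if i < half else "control"
    return assignment

# Hypothetical households stratified into two income bands.
households = [("hh%d" % i, 20_000 + 1_000 * i) for i in range(40)]
band = {name: ("high" if income >= 40_000 else "low")
        for name, income in households}
assign = stratified_assign([name for name, _ in households], band.get)
```

Under complete randomization the high-income households could by chance cluster in one arm; stratifying first rules that out.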
There is a relatively large empirical literature discussing the merits of
different approaches to stratification in the context of schooling experi-
ments and experiments within the developing world. For examples of this
debate, see Bruhn and McKenzie (2008) and Imai et al. (2008, 2009). It is
worth pointing out, though, that the typical school setting on which this
debate is focused is often less relevant to marketing applications. First,
often in marketing it is hard to collect reliable data before an experiment
which would allow stratification and subsequent random assignment
before the experiment. Second, much of the debate is motivated by experi-
mental treatments such as a change in school class size which are very
costly and therefore obtaining statistical efficiency from a small number of
observations is paramount. For example, when randomizing 30 different
schools into different class-size conditions, one might not obtain any sta-
tistical precision in estimates simply because by unlucky chance the richest
schools were all randomly allocated into the lowest class-size condition.
However, for many marketing applications, such as pricing or advertising, the kind of cost constraints that would restrict the researcher to only 30 units of observation are less likely to be present. Furthermore,
reliable data that would allow such stratification may not be present.
Researchers should also consider how long a field experiment ran for, because this will affect the interpretation of its results.8
Anderson and Simester (2004) highlighted the importance of making sure
the researcher has access to a long enough period of data by showing that
the long-run effects of promotional depth were negative for established
customers, though in the short run they could look deceptively attractive
due to their ability to attract new customers. In general, researchers should
try and collect data for as long a period as possible to understand whether
any treatment they measure is stable, dissipates or increases in its effect
over time. However, for many field experiments it is hard to measure long-
run effects as the researcher does not have the ability to monitor treated
and untreated individuals over time. Therefore, in most settings research-
ers should carefully consider whether the causal effect they establish truly
reflects the long-run treatment effect.
The existence or importance of Hawthorne effects, where the mere fact
of being observed as part of a field experiment can alter outcomes, is the
subject of much academic debate (Parsons, 1974; Adair, 1984; Jones, 1992;
McCarney et al., 2007).9 In general, however, this kind of critique invites
a researcher to be thoughtful about what really is the difference between
the ‘treatment’ and the ‘control’ and what specifically they measure. The
researcher should provide reassuring evidence for the reader that the
causal effect they measure between the treatment and control is associated
with the part of the treatment they claim it is. For example, Burtch et al.
(2015) use data from a field experiment which introduced new privacy set-
tings in a crowdfunding setting. They devote much space in their article to
giving the reader evidence that the change they measure in crowdfunding
propensity really was a result of the change in privacy settings rather than
simply the introduction of a new screen or potential navigation costs for
the website user.
One obvious concern that researchers face, especially those who work
with firms, is that there may be compromises or challenges to randomiza-
tion. Firms may only be willing, for example, to experiment with, in their
view, less successful media or sales territories, and unwilling to experi-
ment with more successful ones. Similarly, firms may only be willing
to incur the costs of experimentation for their best customers. Simester
et al. (2009) provide a nice example of how a researcher faced with such
constraints can describe the selection criteria which constrained rand-
omization and provide reassuring evidence and discussion to allow the
reader to understand what the constraints mean. In their particular case,
they used the company’s decision to distinguish between ‘best’ customers
and ‘other’ customers when determining random assignment as a useful
way of exploring the underlying behavioral mechanism. In general,
though, in such circumstances the key procedure for any researcher is to describe the constraints on randomization transparently.
Pricing
Fong et al. (2015) and Andrews et al. (2015) are among a recent body of work exploring when mobile promotions are effective.
While a majority of field experiments focus on B-to-C settings, a study
by Tadelis and Zettelmeyer (2011) demonstrates that field experiments can
likewise be very useful in understanding B-to-B transactions. The authors,
using a large-scale field experiment that randomly discloses quality
information in wholesale automobile auctions, examine how information
disclosure affects auction outcomes.
Last, field experiments have served to understand consumers’ response to
pay-what-you-want pricing. Kim et al. (2009) find in multiple field studies
that prices paid are significantly greater than zero and can even increase rev-
enues. These studies rely on experimentation over time, highlighting the dif-
ficulty for offline stores, specifically restaurants, to concurrently implement
different pricing mechanisms. By contrast, Gneezy et al. (2012) randomized
in several field experiments the price level and structure to which consumers
were exposed. They show that often, when granted the opportunity to name
the price of a product, fewer consumers choose to buy it than when the price
is fixed and low. Jung et al. (2014) demonstrate that when asked to pay as
much as they like, merely reframing payments to be on behalf of others, not
their own, leads people to pay more. Broadly related, Gneezy et al. (2010)
show that a charitable component in a purchase increased sales significantly
when coupled with a ‘pay-what-you-want’ pricing mechanism.
Product
Distribution
Last, we address to what extent field experiments are useful when explor-
ing questions of broader importance to marketers. In general, many of the
most important questions of marketing strategy, such as whether there is
a first-mover advantage, are difficult to analyze using a field experiment
technique.
However, recent research suggests that field experiments can be quite
useful for analyzing the broader policy or welfare context in which mar-
keting occurs and investigating how marketing can help correct societally
charged issues such as inequality in income or across nations. A very
useful example of this is the work of Anderson-Macdonald et al. (2015) investigating what parts of a marketing or entrepreneurial education can benefit small startups in South Africa. They find that, in general, parts of
a curriculum focused on the demand side tended to be more useful than
parts of the curriculum focused on the cost side. Another notable feature
of this experiment is the mix between digital and non-digital methods in
the experimental setting. The educational treatment was done at great
expense offline, but data collected was facilitated and made less costly by
the use of digital survey tools to monitor the effects of the treatment.
Digitization and Big Data have also attracted increasing attention to
consumer privacy. Miltgen and Tucker (2014) provide some evidence from
a field experiment that when money is not involved, people tend to behave
in a privacy-protective way that is consistent with their stated privacy
preferences. However, when pecuniary rewards are in play, consumers
behave inconsistently with their stated privacy preferences, particularly
consumers who have the most online experience.12 A complement to this
work on privacy is understanding what makes consumers behave in a
non-private way and share information online. Toubia and Stephen (2013)
investigate this using a field experiment on Twitter and show that both
image-related and intrinsic utility matter as motivations.
Lastly, field experiments can shed light on a number of broader social
issues and serve as real-world validation of laboratory experiments on a
variety of topics. Gneezy et al. (2012) examine prosocial behavior in the
field and show that initial pro-social acts that come at a cost increase
the likelihood of subsequent prosocial acts. Baca-Motes et al. (2013)
show that a purely symbolic commitment to an environmentally friendly
practice significantly increases this practice. Gneezy and Rustichini (2000)
found that the introduction of fines increased late arrivals by parents at
day-care centers. Based on a field study in an all-you-can-eat restaurant,
Just and Wansink (2011) suggest that individuals are consuming to get their money’s worth rather than consuming until their marginal hedonic utility declines.
Limitations
Any empirical technique has limitations, and given the special status that
field experiments are afforded regarding causal inference in the social sci-
ences, it is particularly important to understand these limitations. We also
point our readers to the broader debate in economics about the usefulness
of field experiments (see, for example, Deaton (2009) and Banerjee and
Duflo (2008)).
Lack of Theory
Examples include Duflo et al. (2012), who combine a structural model with a field experiment on teacher incentives for absenteeism, and, in marketing, Yao et al. (2012), who use a
structural model to evaluate implied discount rates in a field experiment
where consumers were randomly switched from a linear to a three-part
tariff pricing plan, as well as Dube et al. (2016), who use two field experiments and a structural model to analyze the role of self-signaling in choices.
Another kind of work in this vein is researchers who use estimates
from a field experiment to validate their model. For example, Misra and
Nair (2011) used their estimates of differences in dynamic incentives for
sales force compensation to implement a field test of new compensation
schemes which led to $12 million annually in incremental revenues. Li
and Kannan (2014) use a field experiment to evaluate their model for
multichannel attribution.
External Generalizability

A general challenge with field experiments is clarifying the degree of generalizability of any one study and understanding how the lessons of one point in time will apply in the future.13 It is a useful reminder that the aim of a field experiment is not simply to measure a variable at one point in time, but to measure something that has relevance to both managers and academic theory in the future.

One-shot
One practical challenge of field experiments is that they often require sub-
stantial effort and/or expense and so a researcher often has only one shot.
This has two implications. First, a field experiment ‘gone wrong’ because
of a flaw in the setup, be it theoretical or in the practical implementation,
can often not easily be run again, requiring the researcher to carefully con-
sider all possible difficulties and carefully check all practical requirements
(e.g., regarding data collection) upfront. Second, it means that researchers
can usually implement only a limited set of experimental conditions. As
a result, researchers who aim to demonstrate a more complex behavioral
mechanism sometimes complement their field data with laboratory experi-
ments (Berger and Heath, 2008).
Limited Scope
Conclusion
This chapter argues that one of the major advances of the digital age
has been to allow digital experimentation. The main advantage of such
digital experimentation is to allow causal inference. The challenge now
for researchers in this space is to ensure that the causal inferences they
are making are both correct given the setting and limitations of any field
experiment, and useful in terms of advancing marketing practice.
Notes
1. This builds on a large number of books and articles that have covered similar material
(Angrist and Pischke, 2009; Manski, 2007; Meyer, 1995; Cook and Campbell, 1979;
Imbens and Wooldridge, 2009).
2. Stratified randomization can deal with this possibility when there is data on the observ-
able characteristics of different units.
3. optimizely.com
4. Roberts (1957) puts this well by advising the researcher to make sure that the popula-
tion being studied can be broken down into smaller units (families, stores, sales territo-
ries, etc.) for which the experimental stimuli can be measured and for which responses
to the stimuli are not ‘contagious.’
5. Such spillovers are currently attracting the attention of econometricians at the frontier
of the analysis of randomized experiments. We point the interested reader to the work
of Barrios et al. (2012), among others.
6. A special case of such a stratified design is a pairwise design where each stratum con-
tains a matched pair of individuals, one of whom receives the treatment and the other
does not.
7. Roberts (1957) states that ‘The sample size is large enough to measure important
responses to experimental stimuli against the background of uncontrolled sources of
variation.’
8. Roberts (1957) urges researchers to ensure that ‘The experiment is run sufficiently long
that responses to experimental stimuli will have time to manifest themselves.’
9. Roberts (1957) emphasizes that researchers should try and make sure ‘Neither the
stimulus nor the response is changed by the fact that an experiment is being conducted.’
10. Roberts (1957) somewhat anticipates this when he urges researchers to ensure that ‘The
experimentor is able to apply or withhold, as he chooses, experimental stimuli from any
particular unit of the population he is studying.’
11. http://www.mcdonalds.co.uk/ukhome/whatmakesmcdonalds/questions/food/nutritional-information/how-do-you-product-test-new-products.html (last accessed October 3, 2017).
12. Much work on privacy is limited by firms’ unwillingness to experiment with something as legally and ethically sensitive as consumer privacy. Therefore, many studies have taken the approach of Goldfarb and Tucker (2011b) and Tucker (2014b) and mixed field experiment data with quasi-experimental changes in privacy regimes.
13. Roberts (1957) urges researchers to ensure that ‘The underlying conditions of the past
persist into the future.’
References
Cook, T. D. and D. T. Campbell (1979). Quasi-Experimentation: Design & Analysis Issues for
Field Settings. Houghton Mifflin.
Deaton, A. S. (2009). Instruments of development: Randomization in the tropics, and the
search for the elusive keys to economic development. Working Paper 14690, National
Bureau of Economic Research.
Draganska, M., W. R. Hartmann, and G. Stanglein (2014). Internet versus television adver-
tising: A brand-building comparison. Journal of Marketing Research 51 (5), 578–590.
Dube, J.-P., X. Luo, and Z. Fang (2016). Self-signaling and pro-social behavior: a cause
marketing mobile field experiment. Marketing Science 36 (2), 161–186.
Duflo, E., R. Hanna, and S. P. Ryan (2012). Incentives work: Getting teachers to come to
school. American Economic Review 102 (4), 1241–78.
Dupas, P. (2014). Short-run subsidies and long-run adoption of new health products:
Evidence from a field experiment. Econometrica 82 (1), 197–228.
Fong, N. M. (2012). Targeted marketing and customer search. Available at SSRN 2097495.
Fong, N. M., Z. Fang, and X. Luo (2015). Geo-conquesting: Competitive locational target-
ing of mobile promotions. Journal of Marketing Research 52 (5), 726–735.
Gallino, S. and A. Moreno (2014). Integration of online and offline channels in retail: The
impact of sharing reliable inventory availability information. Management Science 60 (6),
1434–1451.
Gneezy, A., U. Gneezy, L. D. Nelson, and A. Brown (2010). Shared social responsibility: A
field experiment in pay-what-you-want pricing and charitable giving. Science 329 (5989),
325–327.
Gneezy, A., U. Gneezy, G. Riener, and L. D. Nelson (2012). Pay-what-you-want, identity,
and self-signaling in markets. Proceedings of the National Academy of Sciences 109 (19),
7236–7240.
Gneezy, A., A. Imas, A. Brown, L. D. Nelson, and M. I. Norton (2012). Paying to be nice:
Consistency and costly prosocial behavior. Management Science 58 (1), 179–187.
Gneezy, U. and A. Rustichini (2000). A fine is a price. Journal of Legal Studies 29 (1), 1–17.
Goldfarb, A. and C. Tucker (2011a). Online display advertising: Targeting and obtrusive-
ness. Marketing Science 30 (3), 389–404.
Goldfarb, A. and C. Tucker (2011b). Privacy regulation and online advertising. Management
Science 57 (1), 57–71.
Goldfarb, A. and C. Tucker (2015). Standardization and the effectiveness of online advertis-
ing. Management Science 61 (11), 2707–2719.
Hildebrand, C., G. Häubl, and A. Herrmann (2014). Product customization via starting solu-
tions. Journal of Marketing Research 51 (6), 707–725.
Hoban, P. R. and R. E. Bucklin (2014). Effects of internet display advertising in the purchase
funnel: Model-based insights from a randomized field experiment. Journal of Marketing
Research 52 (3), 375–393.
Imai, K., G. King, C. Nall, et al. (2009). The essential role of pair matching in cluster-
randomized experiments, with application to the Mexican universal health insurance
evaluation. Statistical Science 24 (1), 29–53.
Imai, K., G. King, and E. A. Stuart (2008). Misunderstandings between experimentalists and
observationalists about causal inference. Journal of the Royal Statistical Society: Series A
(Statistics in Society) 171 (2), 481–502.
Imbens, G. and J. Wooldridge (2009). Recent developments in the econometrics of program
evaluation. Journal of Economic Literature 47 (1), 5–86.
Jones, S. R. (1992). Was there a Hawthorne effect? American Journal of Sociology 98 (3),
451–468.
Jung, M. H., L. D. Nelson, A. Gneezy, and U. Gneezy (2014). Paying more when paying for
others. Journal of Personality and Social Psychology 107 (3), 414.
Just, D. R. and B. Wansink (2011). The flat-rate pricing paradox: Conflicting effects of
‘all-you-can-eat’ buffet pricing. Review of Economics and Statistics 93 (1), 193–200.
Kim, J.-Y., M. Natter, and M. Spann (2009). Pay what you want: A new participative pricing
mechanism. Journal of Marketing 73 (1), 44–58.
Kivetz, R., O. Urminsky, and Y. Zheng (2006). The goal-gradient hypothesis resurrected:
Purchase acceleration, illusionary goal progress, and customer retention. Journal of
Marketing Research 43 (1), 39–58.
Kremer, M. and A. Holla (2009). Improving education in the developing world: What have
we learned from randomized evaluations? Annual Review of Economics 1 (1), 513–542.
Lambrecht, A. and C. Tucker (2012). Paying with money or with effort: Pricing when
customers anticipate hassle. Journal of Marketing Research 49 (1), 66–82.
Lambrecht, A. and C. Tucker (2013). When does retargeting work? Information specificity in
online advertising. Journal of Marketing Research 50 (5), 561–576.
Lambrecht, A., C. Tucker, and C. Wiertz (2017). Advertising to early trend propagators?
Evidence from Twitter. Marketing Science, forthcoming.
Lee, L. and D. Ariely (2006). Shopping goals, goal concreteness, and conditional promotions.
Journal of Consumer Research 33 (1), 60–70.
Levav, J., M. Heitmann, A. Herrmann, and S. S. Iyengar (2010). Order in product customi-
zation decisions: Evidence from field experiments. Journal of Political Economy 118 (2),
274–299.
Lewis, R. A. and J. M. Rao (2015). The Unfavorable Economics of Measuring the Returns
to Advertising, Quarterly Journal of Economics 130 (4), 1941–1973.
Lewis, R. A. and D. H. Reiley (2014a). Advertising effectively influences older users:
How field experiments can improve measurement and targeting. Review of Industrial
Organization 44 (2), 147–159.
Lewis, R. A. and D. H. Reiley (2014b). Online ads and offline sales: measuring the effect
of retail advertising via a controlled experiment on Yahoo! Quantitative Marketing and
Economics 12 (3), 235–266.
Li, H. A. and P. Kannan (2014). Attributing conversions in a multichannel online marketing
environment: An empirical model and a field experiment. Journal of Marketing Research
51 (1), 40–56.
List, J. A. (2011). Why economists should conduct field experiments and 14 tips for pulling
one off. Journal of Economic Perspectives 25 (3), 3–16.
Manski, C. F. (2007). Identification for Prediction and Decision. Harvard University Press.
McCarney, R., J. Warner, S. Iliffe, R. van Haselen, M. Griffin, and P. Fisher (2007). The
Hawthorne effect: A randomised, controlled trial. BMC Medical Research Methodology
7 (1), 30.
Meyer, B. (1995). Natural and quasi-experiments in economics. Journal of Business and
Economic Statistics 13 (2), 151–161.
Miltgen, C. and C. Tucker (2014). Resolving the privacy paradox: Evidence from a field
experiment. Mimeo, MIT.
Misra, S. and H. S. Nair (2011). A structural model of sales-force compensation dynamics:
Estimation and field implementation. Quantitative Marketing and Economics 9 (3), 211–257.
Nosko, C. and S. Tadelis (2015). The limits of reputation in platform markets: An empirical
analysis and field experiment. National Bureau of Economic Research working paper No.
20830.
Parsons, H. M. (1974). What happened at Hawthorne? New evidence suggests the Hawthorne
effect resulted from operant reinforcement contingencies. Science 183 (4128), 922–932.
Ravallion, M. (2009). Should the randomistas rule? The Economists’ Voice 6 (2).
Roberts, H. V. (1957). The role of research in marketing management. Journal of Marketing
22 (1), 21–32.
Rubin, D. B. (2005). Causal inference using potential outcomes. Journal of the American
Statistical Association 100 (469), 322–331.
Sahni, N. (2015). Effect of temporal spacing between advertising exposures: Evidence from
an online field experiment. Quantitative Marketing and Economics 13 (3), 203–247.
Sahni, N., D. Zou, and P. K. Chintagunta (2014). Effects of targeted promotions: Evidence
from field experiments. Available at SSRN 2530290.
Schwartz, E. M., E. Bradlow, and P. Fader (2016). Customer acquisition via display advertis-
ing using multi-armed bandit experiments. Marketing Science 36 (4), 500–522.
Shu, L. L., N. Mazar, F. Gino, D. Ariely, and M. H. Bazerman (2012). Signing at the begin-
ning makes ethics salient and decreases dishonest self-reports in comparison to signing at
the end. Proceedings of the National Academy of Sciences 109 (38), 15197–15200.
Simester, D., Y. J. Hu, E. Brynjolfsson, and E. T. Anderson (2009). Dynamics of retail adver-
tising: Evidence from a field experiment. Economic Inquiry 47 (3), 482–499.
Sun, M., X. M. Zhang, and F. Zhu (2012). To belong or to be different? Evidence from a
large-scale field experiment in China. NET Institute Working Paper (12–15).
Tadelis, S. and F. Zettelmeyer (2011). Information disclosure as a matching mechanism:
Theory and evidence from a field experiment. Available at SSRN 1872465.
Toubia, O. and A. T. Stephen (2013). Intrinsic vs. image-related utility in social media: Why
do people contribute content to Twitter? Marketing Science 32 (3), 368–392.
Tucker, C. (2014a). Social Advertising. Mimeo, MIT.
Tucker, C. (2014b). Social networks, personalized advertising, and privacy controls. Journal
of Marketing Research 51 (5), 546–562.
Yao, S., C. F. Mela, J. Chiang, and Y. Chen (2012). Determining consumers’ discount rates
with field studies. Journal of Marketing Research 49 (6), 822–841.
This chapter assumes the reader has a basic understanding of the work-
ings of Conjoint Analysis. For readers interested in a more comprehensive
coverage of the topic, I recommend the exhaustive reviews of academic
research in Conjoint Analysis in Agarwal et al. (2015); Bradlow (2005);
Green, Krieger and Wind (2001); or Netzer et al. (2008). Conversely,
readers who would like an introduction to the basics of conjoint measurement may want to consult Sawtooth Software's website (see http://www.sawtoothsoftware.com/support/technical-papers#general-conjoint-analysis and http://www.sawtoothsoftware.com/academics/teaching-aids),
or Ofek and Toubia (2014a), Rao (2010), or Green, Krieger and Wind
(2001).
“base price,” etc.), each of which has multiple levels (e.g., “500
minutes,” “unlimited”). The output of a Conjoint Analysis study is an esti-
mation of how much each consumer in a sample values each level of each
attribute. Such preferences are called partworths, because they capture
how much each part of the product is worth to the consumer.
Conjoint Analysis takes a somewhat indirect approach to estimating
partworths. Instead of asking consumers directly how much they
value each level of each attribute, Conjoint Analysis asks consumers to
evaluate profiles, defined by a set of attribute levels. A profile might be
a “$100 plan with unlimited calls and 10 GB of data per month.” Then,
Conjoint Analysis relies on statistical analysis to disentangle the value of
each attribute level based on consumers’ evaluations of profiles. By doing
that, Conjoint Analysis builds a model of consumer behavior, which can
predict each consumer’s preferences for any profiles, even if they were
not included in the survey. For example, suppose we have five attributes
with three levels each. There are 3^5 = 243 possible profiles. We might ask
consumers to evaluate 15 of these profiles, estimate their partworths for
each attribute level based on these data, and then be able to predict market
share for any set of profiles that contains any number of these 243 possible
profiles.
The number of partworths estimated for each attribute is equal to
the number of levels in that attribute minus 1. The loss of one degree
of freedom emerges from statistical considerations, which will become
clear to the statistically minded reader later in the chapter. Intuitively,
each attribute in each profile must be at one level. If there are L levels
in a given attribute, it is possible to describe the level of each profile
on that attribute using only L – 1 variables. (For example, if L = 2 and
we know whether the attribute is at the first level, we can deduce with
certainty whether it is at the second level.) There are different ways
to reduce the degrees of freedom. Interested readers are referred to
Kuhfeld (2005). One simple way is to set one level of each attribute as
the “baseline” and define each other partworth in that attribute with
respect to this baseline. For example, if the partworth for “500 min”
is set as the baseline, the partworth for “1,000 minutes” captures the
additional utility provided to the consumer by an increase from 500
minutes to 1,000 minutes.
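The baseline coding just described can be sketched in a few lines of Python (the attribute and its levels are hypothetical):

```python
# Baseline ("dummy") coding for one attribute with L = 3 levels.
# "500" serves as the baseline, so L - 1 = 2 indicator variables suffice.
LEVELS = ["500", "1000", "unlimited"]
BASELINE = LEVELS[0]

def encode(level):
    """Return the L-1 indicator variables describing `level`."""
    return [1 if level == other else 0 for other in LEVELS[1:]]

# The baseline maps to all zeros; every other level to a unit vector.
encode("500")        # [0, 0]
encode("1000")       # [1, 0]
encode("unlimited")  # [0, 1]
```

The partworth attached to each indicator is then interpreted relative to the baseline level, exactly as in the "500 minutes" example above.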
Mathematically, if consumers are indexed by i, profiles by j, and
attributes by k, Conjoint Analysis assumes that the utility of profile j for
consumer i is given as follows:

u_ij = Σ_k Σ_l β_ikl x_jkl + ε_ij

Where:
x_jkl equals 1 if profile j features level l of attribute k, and 0 otherwise;
β_ikl is consumer i's partworth for level l of attribute k; and
ε_ij is an error term capturing unmodeled influences on utility.
Note that this basic model assumes that all levels of all attributes enter
linearly and independently into the utility function. However, this model
may be easily extended to include interactions between attributes. For
example, if it is believed that consumers value voice minutes more in a
cellular plan when more data are available, an additional interaction term
may be included in the utility function, which would capture the joint
presence of a large number of minutes and a large amount of data. In
practice, these interactions are seldom used. One of the issues related to
the use of interactions is that the number of possible interactions is very
large. Therefore they should only be included if the researcher has a strong
and valid reason to believe that specific interactions are relevant.
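For concreteness, here is a minimal Python sketch of this additive utility computation with one optional interaction term; all partworth values and attribute names are made up for illustration:

```python
# Additive conjoint utility with one illustrative interaction term.
# All partworth values below are hypothetical.
PARTWORTHS = {
    "minutes": {"500": 0.0, "1000": 0.8, "unlimited": 1.5},
    "data":    {"2GB": 0.0, "10GB": 1.0},
    "price":   {"$50": 1.2, "$100": 0.0},
}

# Extra utility when many minutes and much data are jointly present:
INTERACTION_BONUS = 0.3

def utility(profile):
    """Sum the partworths of the profile's levels; add the interaction
    term if the profile pairs unlimited minutes with 10GB of data."""
    u = sum(PARTWORTHS[attr][lvl] for attr, lvl in profile.items())
    if profile["minutes"] == "unlimited" and profile["data"] == "10GB":
        u += INTERACTION_BONUS
    return u

u = utility({"minutes": "unlimited", "data": "10GB", "price": "$100"})
# u = 1.5 + 1.0 + 0.0 + 0.3 = 2.8
```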
Note also that the additivity of the utility function implies that the basic
model is compensatory, i.e., it is possible to “make up” for a lower value
on one attribute by increasing the value on another attribute. However,
in some cases, consumers may evaluate profiles using non-compensatory
rules. Examples of non-compensatory rules include conjunctive rules
(where a profile “passes” the rule if it meets a list of criteria, e.g., a car
has to be of a certain body type and be below a certain price), disjunctive
rules (where a profile “passes” the rule if it meets any criterion from
a list, e.g., a car has to be of a certain body type or be below a certain
price), disjunctions of conjunctions (where a profile “passes” the rule if
it satisfies at least one conjunctive rule from a set of conjunctive rules – see
Hauser et al. 2010), lexicographic (where profiles are ranked based
on criteria that are considered sequentially, e.g., cars are first ranked
according to body type, then according to price), and elimination by
aspect (where profiles are eliminated from the choice set by considering
various criteria sequentially – see Tversky 1972). It has been noted that
non-compensatory decision rules might actually be approximated using
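The conjunctive and disjunctive screening rules above can be sketched as follows (the car criteria and thresholds are hypothetical):

```python
# Illustrative conjunctive vs. disjunctive screening rules for cars.
def is_suv(car):
    return car["body"] == "SUV"

def is_cheap(car):
    return car["price"] <= 25000

CRITERIA = [is_suv, is_cheap]

def passes_conjunctive(car):
    """Conjunctive rule: the profile must meet EVERY criterion."""
    return all(c(car) for c in CRITERIA)

def passes_disjunctive(car):
    """Disjunctive rule: the profile must meet AT LEAST ONE criterion."""
    return any(c(car) for c in CRITERIA)

car = {"body": "SUV", "price": 30000}
passes_conjunctive(car)  # False (too expensive)
passes_disjunctive(car)  # True (it is an SUV)
```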
Readers are referred to Orme (2002) or Ofek and Toubia (2014b) for
guidelines regarding the first step. The second and third steps will be dis-
cussed below.
I close this section by noting that there also exist market research
methods that measure partworths directly instead of taking the indirect
approach followed by Conjoint Analysis. These methods are referred
to as “self-explicated” (Leigh, MacKay and Summers 1984; Netzer and
Srinivasan 2011). Although the self-explicated approach leads to ques-
tions that are probably easier for consumers to answer and produces
data that are easier to analyze, it suffers from one major limitation. In
particular, when asked directly how much they care about each attribute
or level, consumers have a tendency to claim that “everything is impor-
tant.” This leads to partworth estimates that do not discriminate as much
between attributes. By forcing consumers to make tradeoffs (e.g., “this
plan has more data but it is more expensive, is the difference really justi-
fied?”), Conjoint Analysis is believed to provide a more nuanced picture
of consumer preferences. Note, however, that empirical comparisons of
Conjoint Analysis versus the self-explicated approach have produced
mixed results (e.g., Leigh, MacKay and Summers 1984; Netzer and
Srinivasan 2011; Sattler and Hensel-Börner 2001), and the self-explicated
approach remains a viable alternative to Conjoint Analysis.
Survey Implementation
Format
Several formats of Conjoint Analysis have been proposed over the years.
The most traditional format is usually referred to as “ratings-based
Conjoint Analysis.” Ratings-based Conjoint Analysis consists of showing
respondents several profiles (usually between 12 and 20) and asking them
to rate each of them on some response scale. That is, each profile receives
a preference score that may be translated into a numerical value. Profiles
are assumed to be rated independently from each other by the consumer,
i.e., there is no comparison between profiles.
This older format of Conjoint Analysis offers several benefits, but it
suffers from some limitations. One of the main benefits is the ease with
which it may be implemented and the ease with which the results may
be analyzed. It is not an exaggeration to claim that with today’s tools, a
ratings-based Conjoint Analysis survey may be conducted from start to
finish within a day and with virtually no budget. In particular, libraries
exist that will provide the researcher with an efficient experimental design
(see next subsection). Online platforms like Qualtrics or SurveyMonkey
may be used to construct the online survey, i.e., obtain a link to the survey
that may be shared with respondents. This link may be sent to lists main-
tained by the researcher, or panels like Amazon Mechanical Turk may be
used to obtain several hundred respondents within a few hours, for a cost
in the order of $1 per respondent. Finally, the analysis of ratings-based
Conjoint Analysis data may be conducted using standard software such
as Microsoft Excel. These benefits make ratings-based Conjoint Analysis
a good choice for researchers working on a very tight deadline and with a
very tight budget.
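As a sketch of how simple that analysis can be, the following Python snippet recovers partworths from made-up ratings for a toy design that varies a single attribute; with balanced data and baseline coding, the least-squares estimates reduce to differences in mean ratings. Real studies with several attributes would run a regression instead.

```python
# Recovering partworths from ratings (hypothetical single-attribute design).
# Each tuple is (minutes level, rating on a 1-9 scale).
ratings = [
    ("500", 3), ("500", 4),
    ("1000", 5), ("1000", 6),
    ("unlimited", 7), ("unlimited", 8),
]

def mean_rating(level):
    vals = [r for lvl, r in ratings if lvl == level]
    return sum(vals) / len(vals)

# Partworths relative to the "500" baseline level:
baseline = mean_rating("500")                        # 3.5
pw_1000 = mean_rating("1000") - baseline             # 2.0
pw_unlimited = mean_rating("unlimited") - baseline   # 4.0
```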
However, ratings-based Conjoint Analysis also suffers from limitations.
In particular, it does not truly force respondents to make tradeoffs or to
make choices that resemble real-life situations. Indeed, nothing prevents
the respondents from giving the same rating to all profiles. In addition,
rating is not an activity in which consumers engage on a regular basis
in their everyday lives (with a few notable exceptions such as product
reviews). Therefore, it is questionable whether ratings-based Conjoint
Analysis provides data that reflect the real-world decisions made by
consumers.
Experimental Design
Survey Hosting
Data Collection
Most Conjoint Analysis studies are now performed online. Many options
are available today for data collection. Some researchers have access to
proprietary mailing lists of respondents, which may include their per-
sonal contacts, existing customers, etc. Other researchers use traditional
online panels such as Research Now. Those hosting their surveys on
Qualtrics may use that same platform as a source of respondents. In par-
ticular, Qualtrics partners with several online panel companies and offers
Partworths Estimation
All these analyses rely on the same model of consumer behavior, which
specifies a utility function based on partworths, and on a link between
utility and choice. In the case of CBC, the link between utility and choice is
given simply by logistic probabilities. In the case of ratings-based conjoint,
one may, for example, assume that when given a choice between various
alternatives, a consumer would choose the one with the highest utility.
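A minimal sketch of the logistic (logit) link, with illustrative utilities:

```python
import math

def logit_probs(utilities):
    """Logit choice probabilities: P(j) = exp(u_j) / sum_k exp(u_k)."""
    m = max(utilities)                      # stabilize the exponentials
    exps = [math.exp(u - m) for u in utilities]
    s = sum(exps)
    return [e / s for e in exps]

# Higher-utility alternatives receive higher choice probabilities:
probs = logit_probs([1.0, 0.5, 0.0])
```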
Armed with such a model of consumer choice, researchers can simulate
how the market would respond to any set of profiles. In particular,
demand simulators may be built that take as input the partworths of a
representative sample of consumers, and that estimate the market shares
of any profiles given these partworths. See Ofek and Toubia (2014b) for
an example of an Excel-based market share simulator. Such simulators
allow users to specify any number of profiles based on the list of attributes
included in the survey. These profiles may capture existing offerings,
competitors, as well as potential new offerings. Once a market share simu-
lator has been built, it is possible to “play” with the set of profiles and see
the resulting market shares immediately. In addition, several algorithms
have been proposed to find the optimal product or product line, i.e., the
set of profile specifications that will maximize profit (or other objective
functions). See Kohli and Sukumar (1990) or Belloni et al. (2008) for a
review. The implementation of these algorithms often requires customized
programming.
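A bare-bones, first-choice market share simulator can be sketched as follows (the per-consumer utilities are made up; commercial simulators add many refinements):

```python
# First-choice market share simulator: each consumer picks the profile
# with the highest utility; shares are the fraction choosing each profile.
def market_shares(utilities_by_consumer):
    """utilities_by_consumer: list of per-consumer utility lists,
    one entry per profile in the simulated market."""
    n_profiles = len(utilities_by_consumer[0])
    counts = [0] * n_profiles
    for utils in utilities_by_consumer:
        counts[utils.index(max(utils))] += 1
    n = len(utilities_by_consumer)
    return [c / n for c in counts]

# Three consumers, two competing profiles:
shares = market_shares([[2.0, 1.0], [0.5, 1.5], [1.2, 0.3]])
# shares == [2/3, 1/3]
```

Swapping a profile's attribute levels changes the utility inputs, so the effect on simulated shares can be read off immediately, which is exactly the "play" with profiles described above.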
Hauser and Garcia (2007) used a similar method to determine the discount
that should be offered to convince wine customers to switch from cork to
screw caps. A similar approach was used in an expert report on the famous
Apple v. Samsung case, to determine how much consumers value certain
features of smartphones such as “pinch-to-zoom.” Readers are referred
to Netzer and Sambandam (2014) for a short and simplified discussion.
This approach is not without its critics, however. Notably, Allenby et al.
(2014) warn against ignoring competitive response to changes in product
attributes and stress the need to consider equilibrium profits when using
Conjoint Analysis to value product features.
Finally, once partworths have been estimated, researchers sometimes
find it useful to explore the existence of distinct segments in the population.
This may provide valuable insights to marketers and constitutes one viable
way to segment markets (other ways include demographic segmentation,
psychographic segmentation, etc.). For this, any segmentation approach
such as k-means clustering may be used.
In practice, calculations of willingness to pay may be completed very
easily using any data handling software. Market share simulators may be
implemented within Microsoft Excel or more complex technical program-
ming software. Market share simulators may also be used to approximate
the market value of an attribute, by determining the loss in profit (i.e.,
price reduction) for a company that would reduce its offering on this
attribute. Segmentation may be conducted using any available statistical
software.
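As an illustration of such a willingness-to-pay calculation, the following sketch converts a utility gain into dollars using hypothetical partworths; the price partworths pin down a utility-per-dollar rate:

```python
# Back-of-the-envelope willingness-to-pay sketch (illustrative partworths).
pw_price = {50: 1.2, 100: 0.0}  # utility at each price level

# The price partworths imply a utility-per-dollar rate:
util_per_dollar = (pw_price[50] - pw_price[100]) / (100 - 50)  # 0.024

# Dividing any utility gain by that rate converts it into dollars:
gain_unlimited = 1.5                    # partworth gain: unlimited vs. 500 min
wtp = gain_unlimited / util_per_dollar  # = $62.50
```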
behavior, researchers have studied more generally the issue of how much
attention consumers pay in Conjoint Analysis surveys, and whether
their level of attention in the survey is similar to how they would approach
choices in real life. Such evidence will be reviewed later in this section.
First, we review recent attempts to motivate participants to pay more
attention to surveys and take the task more seriously.
Incentive Alignment
Gamification
model links partworths and eye movements, which enables the researcher
to learn about the respondent’s preferences from their eye movements.
Yang, Toubia and De Jong (2015) find that this additional information
makes it possible to shorten Conjoint Analysis questionnaires: in their
study, leveraging eye-tracking data extracted as much information from 12
choice questions as would otherwise require 16. Such a model is becoming
increasingly feasible in practice, as eye-tracking technology becomes more
easily accessible. In particular, it is now possible to conduct eye-tracking
studies using the camera on the respondent’s computer or smartphone
(e.g., www.eyetrackshop.com, www.youeye.com).
In practice: whenever feasible, it is recommended to use incentive
alignment in Conjoint Analysis, despite the costs involved. It is also
recommended to design surveys that are attractive and engaging in order
to motivate respondents to pay more attention to the task. Researchers
should also implement measures and tests of attention and drop respond-
ents who appear to have been inattentive. Despite these best practices, it
is important to keep in mind that Conjoint Analysis remains a marketing
research tool, which can at best approximate real-life decisions. The first-
best option would be to manipulate choice options in real life and observe
the resulting consumer choices. Short of this, incentive-aligned Conjoint
Analysis may be viewed as a second-best solution.
Conclusions
References
Ding, Min, “An incentive-aligned mechanism for Conjoint Analysis.” Journal of Marketing
Research 44.2 (2007): 214–223.
Ding, Min, Rajdeep Grewal, and John Liechty. “Incentive-aligned Conjoint Analysis.”
Journal of Marketing Research 42.1 (2005): 67–82.
Ding, Min, Young-Hoon Park, and Eric T. Bradlow. “Barter markets for Conjoint
Analysis.” Management Science 55.6 (2009): 1003–1017.
Ding, Min, et al. “Unstructured direct elicitation of decision rules.” Journal of Marketing
Research 48.1 (2011): 116–127.
Dong, Songting, Min Ding, and Joel Huber. “A simple mechanism to incentive-align con-
joint experiments.” International Journal of Research in Marketing 27.1 (2010): 25–32.
Elrod, Terry, Jordan J. Louviere, and Krishnakumar S. Davey. “An empirical comparison
of ratings-based and choice-based conjoint models.” Journal of Marketing Research 29.3
(1992): 368–377.
Evgeniou, Theodoros, Massimiliano Pontil, and Olivier Toubia. “A convex optimization
approach to modeling consumer heterogeneity in conjoint estimation.” Marketing Science
26.6 (2007): 805–818.
Gilbride, Timothy J. and Greg M. Allenby. “A choice model with conjunctive, disjunctive,
and compensatory screening rules.” Marketing Science 23.3 (2004): 391–406.
Green, Paul E., A. M. Krieger, and T. Vavra. “Evaluating EZ-Pass: using Conjoint Analysis
to assess consumer response to a new tollway technology.” Marketing Research 11.2 (1999):
5–16.
Green, Paul E., Abba M. Krieger, and Yoram Wind. “Thirty years of Conjoint Analysis:
Reflections and prospects.” Interfaces 31.3 supplement (2001): S56–S73.
Green, Paul E., and Vithala R. Rao. “Conjoint measurement for quantifying judgmental
data.” Journal of Marketing Research (1971): 355–363.
Green, Paul E. and Venkat Srinivasan. “Conjoint Analysis in marketing: new developments
with implications for research and practice.” Journal of Marketing (1990): 3–19.
Hauser, John R. “Consideration-set heuristics.” Journal of Business Research 67.8 (2014):
1688–1699.
Hauser, John R., Olivier Toubia, Theodoros Evgeniou, Rene Befurt, and Daria Dzyabura.
“Disjunctions of conjunctions, cognitive simplicity, and consideration sets.” Journal of
Marketing Research 47.3 (2010): 485–496.
Huber, Joel and Klaus Zwerina. “The importance of utility balance in efficient choice
designs.” Journal of Marketing Research (1996): 307–317.
Jedidi, Kamel and Rajeev Kohli. “Probabilistic subset-conjunctive models for heterogeneous
consumers.” Journal of Marketing Research 42.4 (2005): 483–494.
Johnson, Richard M. “Adaptive Conjoint Analysis.” Sawtooth Software Conference
Proceedings. Sawtooth Software, Ketchum, ID, 1987.
Kamakura, Wagner A. and Gary Russell. “A probabilistic choice model for market segmen-
tation and elasticity structure.” Journal of Marketing Research 26 (1989): 379–390.
Kohli, Rajeev and Kamel Jedidi. “Representation and inference of lexicographic preference
models and their variants.” Marketing Science 26.3 (2007): 380–399.
Kohli, Rajeev and Ramamirtham Sukumar. “Heuristics for product-line design using
Conjoint Analysis.” Management Science 36.12 (1990): 1464–1478.
Kuhfeld, Warren F. “Marketing research methods in SAS.” Experimental Design, Choice,
Conjoint, and Graphical Techniques. Cary, NC, SAS-Institute TS-722 (2005).
Kuhfeld, Warren F., Randall D. Tobias, and Mark Garratt. “Efficient experimental design
with marketing research applications.” Journal of Marketing Research (1994): 545–557.
Leigh, Thomas W., David B. MacKay, and John O. Summers. “Reliability and validity
of Conjoint Analysis and self-explicated weights: A comparison.” Journal of Marketing
Research (1984): 456–462.
Lenk, Peter J., et al. “Hierarchical Bayes Conjoint Analysis: Recovery of partworth
heterogeneity from reduced experimental designs.” Marketing Science 15.2 (1996):
173–191.
Louviere, Jordan J. “Conjoint Analysis modelling of stated preferences: a review of theory,
methods, recent developments and external validity.” Journal of Transport Economics and
Policy (1988): 93–119.
Louviere, Jordan J., David A. Hensher, and Joffre D. Swait. Stated choice methods: analysis
and applications. Cambridge University Press, 2000.
Louviere, Jordan J. and George Woodworth. “Design and analysis of simulated consumer
choice or allocation experiments: an approach based on aggregate data.” Journal of
Marketing Research (1983): 350–367.
Luce, R. Duncan and John W. Tukey. “Simultaneous conjoint measurement: A new type of
fundamental measurement.” Journal of Mathematical Psychology 1.1 (1964): 1–27.
Meißner, Martin, Andres Musalem, and Joel Huber. “Eye-Tracking Reveals Processes that
Enable Conjoint Choices to Become Increasingly Efficient with Practice.” Journal of
Marketing Research. 53.1 (2016): 1–17.
Mitchell, Robert Cameron and Richard T. Carson (1989), Using Surveys to Value Public
Goods: The Contingent Valuation Method, Resources for the Future, Washington, DC.
Moore, William L. “A cross-validity comparison of rating-based and choice-based Conjoint
Analysis models.” International Journal of Research in Marketing 21.3 (2004): 299–312.
Netzer, Oded and Rajan Sambandam. “Apple vs. Samsung: The $2 Billion Case.” Columbia
CaseWorks (2014).
Netzer, Oded and Visvanathan Srinivasan. “Adaptive self-explication of multiattribute pref-
erences.” Journal of Marketing Research 48.1 (2011): 140–156.
Netzer, Oded, et al. “Beyond Conjoint Analysis: Advances in preference measurement.”
Marketing Letters 19.3–4 (2008): 337–354.
Ofek, Elie and Olivier Toubia. “Conjoint Analysis: Online Tutorial.” Harvard Business
School Tutorial 514–712. (2014a).
Ofek, Elie and Olivier Toubia. “Conjoint Analysis: A Do it Yourself Guide.” Harvard
Business School Technical Note 515–024. (2014b).
Oppenheimer, Daniel M., Tom Meyvis, and Nicolas Davidenko. “Instructional Manipulation
Checks: Detecting Satisficing to Increase Statistical Power.” Journal of Experimental
Social Psychology 45 (2009): 867–872.
Orme, Bryan. “Formulating attributes and levels in Conjoint Analysis.” Sawtooth Software
Research Paper (2002): 1–4.
Paolacci, Gabriele, Jesse Chandler, and Panagiotis G. Ipeirotis. “Running experiments on
Amazon Mechanical Turk.” Judgment and Decision Making 5.5 (2010): 411–419.
Park, Young-Hoon, Min Ding, and Vithala R. Rao. “Eliciting preference for complex
products: A web-based upgrading method.” Journal of Marketing Research 45.5 (2008):
562–574.
Pieters, Rik and Luk Warlop. “Visual attention during brand choice: The impact of time
pressure and task motivation.” International Journal of Research in Marketing 16.1 (1999):
1–16.
Pieters, Rik and Michel Wedel. “Attention capture and transfer in advertising: Brand, picto-
rial, and text-size effects.” Journal of Marketing 68.2 (2004): 36–50.
Rao, Vithala R. “Conjoint Analysis.” Wiley International Encyclopedia of Marketing (2010).
Rossi, Peter E., and Greg M. Allenby. “Bayesian statistics and marketing.” Marketing
Science 22.3 (2003): 304–328.
Sándor, Zsolt and Michel Wedel. “Heterogeneous conjoint choice designs.” Journal of
Marketing Research 42.2 (2005): 210–218.
Sattler, Henrik and Susanne Hensel-Börner. “A comparison of conjoint measurement with
self-explicated approaches.” Conjoint Measurement. Springer (2001): 121–133.
Sawtooth Software. “The Adaptive Choice-Based Conjoint (ACBC) Technical Paper.”
Sawtooth Software Technical Paper Series (2014). Available at: http://www.sawtoothsoftware.com/support/technical-papers/adaptive-cbc-papers/acbc-technical-paper-2009 (last accessed October 3, 2017).
Shi, Savannah Wei, Michel Wedel, and F. G. M. Pieters. “Information acquisition during
online decision making: A model-based exploration using eye-tracking data.” Management
Science 59.5 (2013): 1009–1026.
Srinivasan, Venkataraman and Allan D. Shocker. “Linear programming techniques for mul-
tidimensional analysis of preferences.” Psychometrika 38.3 (1973): 337–369
Stüttgen, Peter, Peter Boatwright, and Robert T. Monroe. “A satisficing choice model.”
Marketing Science 31.6 (2012): 878–899.
Todorov, Alexander, Amir Goren, and Yaacov Trope. “Probability as a psychological dis-
tance: Construal and preferences.” Journal of Experimental Social Psychology 43.3 (2007):
473–482.
Toubia, Olivier and John R. Hauser. “Research note-on managerially efficient experimental
designs.” Marketing Science 26.6 (2007): 851–858.
Toubia, Olivier, John Hauser, and Rosanna Garcia. “Probabilistic polyhedral methods for
adaptive choice-based Conjoint Analysis: Theory and application.” Marketing Science
26.5 (2007): 596–610.
Toubia, Olivier, John R. Hauser, and Duncan I. Simester. “Polyhedral methods for adaptive
choice-based Conjoint Analysis.” Journal of Marketing Research 41.1 (2004): 116–131.
Toubia, Olivier, et al. “Fast polyhedral adaptive conjoint estimation.” Marketing Science
22.3 (2003): 273–303.
Toubia, Olivier, et al. “Measuring consumer preferences using conjoint poker.” Marketing
Science 31.1 (2012): 138–156.
Trope, Yaacov and Nira Liberman. “Construal-level theory of psychological distance.”
Psychological review 117.2 (2010): 440.
Tversky, Amos. “Elimination by aspects: A theory of choice.” Psychological Review 79.4
(1972): 281.
Van der Lans, Ralf, Rik Pieters, and Michel Wedel. “Eye-movement analysis of search effec-
tiveness.” Journal of the American Statistical Association 103.482 (2008): 452–461.
Wakslak, Cheryl J., et al. “Seeing the forest when entry is unlikely: probability and the mental
representation of events.” Journal of Experimental Psychology: General 135.4 (2006): 641.
Wedel, Michel and Rik Pieters. “Eye fixations on advertisements and memory for brands: A
model and findings.” Marketing science 19.4 (2000): 297–312.
Wind, Jerry, et al. “Courtyard by Marriott: Designing a hotel facility with consumer-based
marketing models.” Interfaces 19.1 (1989): 25–47.
Yang, Liu, Olivier Toubia, and Martijn G. De Jong. “A Bounded Rationality Model of
Information Search and Choice in Preference Measurement.” Journal of Marketing
Research 52.2 (2015): 166–183.
Yang, Liu, Olivier Toubia, and Martijn G. De Jong. “Attention, Information Processing and
Choice in Incentive-Aligned Choice Experiments.” Working paper. Columbia Business
School (2017).
Yee, Michael, Ely Dahan, John R. Hauser, and James Orlin. “Greedoid-based noncompen-
satory inference.” Marketing Science 26.4 (2007): 532–549.
CLASSICAL ECONOMETRICS
[Figure: persistence modeling decision tree. Are performance and marketing variables stable or evolving? If evolving, a cointegration test asks whether a long-run equilibrium exists between the evolving variables: if yes, estimate a vector error correction model; if no, a VARX model in differences. If the variables are stable, estimate a VARX model in levels.]
Table 4.1 Persistence modeling steps*

1. Unit root test
   Econometrics: Dickey and Fuller (1979); Kwiatkowski et al. (1992); Enders (1995)
   Marketing: Dekimpe and Hanssens (1995a,b); Slotegraaf and Pauwels (2008); Nijs et al. (2001)
   Research question: Are performance and marketing variables stationary (mean/trend reverting) or evolving (unit root)?

2. Cointegration test
   – E&G 2-step approach: Engle and Granger (1987); Baghestani (1991)
   – Johansen’s FIML approach: Johansen (1988); Dekimpe and Hanssens (1999)
   Research question: Do evolving variables move together?

3. Impulse Response Analysis
   – IRF: Lütkepohl (1993); Dekimpe and Hanssens (1995a)
   – GIRF: Pesaran and Shin (1998); Dekimpe and Hanssens (1999)
   Research question: What is the long-term performance impact of a marketing shock?

4. Variance Decomposition Analysis
   – FEVD: Hamilton (1994); Hanssens (1998)
   – GFEVD: Pesaran and Shin (1998); Nijs et al. (2007)
   Research question: What fraction of performance variance comes from each marketing action? (GFEVD: without imposing a causal order?)

Note: * The listed studies are given for illustrative purposes only. As such, the list is not meant to be exhaustive.
Time-series models of short-run and long-run marketing impact
TECHNICAL BACKGROUND
The t-statistic of b is compared with critical values and the unit-root null
hypothesis is rejected if the obtained value is larger in absolute value than
the critical value. Indeed, if b = 0, there is no mean reversion in sales levels,
and vice versa. The m ΔSt−j terms reflect temporary sales fluctuations and
are added to make ut white noise. Because of these additional terms, one
often refers to this test as the “augmented” Dickey–Fuller (ADF) test. The
ADF test was used, for example, in Dekimpe and Hanssens (1999). They
analyzed a monthly sample of five years of market performance (number
of prescriptions), market support (national advertising and number of
sales calls to doctors) and pricing (price differential relative to the main
challenger) data for a major brand in a prescription drug market. Based
on the Schwarz (SBC) criterion (cf. infra), a value of m varying between
0 (price differential and sales-calls series) and 2 (prescription series) was
selected. The t-statistic of the b-parameter in Equation (4.4) was smaller
in absolute value than the 5 percent critical value for each of the variables,
implying the presence of a unit root in each of them.
Key decisions to be made when implementing ADF-like unit-root tests
are (1) the treatment (inclusion/omission) of various deterministic com-
ponents, (2) the determination of the number of augmented (ΔSt−j) terms,
and (3) whether or not allowance is made for structural breaks in the data.
First, Equation 4.4 tests whether or not temporary shocks may cause a
permanent deviation from the series’ fixed mean level. When dealing with
temporally disaggregated (less than annual) data, marketing researchers
may want to add deterministic seasonal dummy variables to the test equa-
tion to allow this mean level to vary across different periods of the year.
Their inclusion does not affect the critical value of the ADF test. This
is not the case, however, when a deterministic trend is added to the test
equation, in which case one tests whether shocks can initiate a permanent
deviation from that predetermined trend line. Assessing whether or not a
deterministic trend should be added is intricate because the unit-root test
is conditional on its presence, while standard tests for the presence of a
deterministic trend are, in turn, conditional on the presence of a unit root.
An often-used test sequence to resolve this issue is described in Enders
(1995, 256–257). Marketing applications include Nijs et al. (2001) and
Srinivasan, Vanhuele and Pauwels (2010), among others.
A second critical issue in the implementation of ADF tests is the
determination of the number of augmented terms. Two popular order-
determination procedures are the application of fit indices such as the AIC
or SBC criterion (see e.g. Nijs et al. 2001; Srinivasan, Pauwels, Hanssens
and Dekimpe 2004), or the top-down approach advocated by Perron
(1994). The latter approach, used in a marketing setting by Deleersnyder,
Geyskens, Gielens and Dekimpe (2002), starts with a maximal value of m,
and successively reduces this value until a model is found where the last lag
is significant, while the next-higher lag is not.
Finally, a decision has to be made whether or not to allow for a struc-
tural break in the data-generating process. Indeed, the shocks considered
in Equations 4.1–4.4 are expected to be regularly occurring, small shocks
that will not alter the underlying data-generating process. This assumption
may no longer be tenable for shocks associated with, e.g., a new product
introduction (see, e.g., Pauwels and Srinivasan 2004; Nijs et al. 2001) or
an internet channel addition (Deleersnyder et al. 2002). Such shocks tend
to be large, infrequent, and may alter the (long-run) properties of the time
series. A failure to account for these special events has been shown to bias
unit-root tests toward finding evolution. In that case, one would errone-
ously conclude that all (regular) shocks have a long-run impact, while (1)
these shocks cause only a temporary deviation from a fixed mean (deter-
ministic trend), and (2) only the special events caused a permanent shift in
the level (intercept and/or slope) of an otherwise level (trend) stationary
series. Appropriate adjustments to Equation 4.4 to account for such spe-
cial event(s) have been proposed by Perron (1994) and Zivot and Andrews
(1992), among others. Different testing procedures are used depending on
whether the presumed structural break is determined a priori (imposed) by
the researcher (as in Deleersnyder et al. 2002) or determined endogenously
(as in Kornelis, Dekimpe and Leeflang 2008).
Importantly, ADF-type tests are characterized by a unit-root null
hypothesis. Many marketing studies (see, for example, Pauwels, Leeflang,
Teerling and Huizingh 2011) also apply the Kwiatkowski, Phillips,
Schmidt and Shin (1992) test, which maintains stationarity as null
hypothesis. Consistency in the conclusion (stationary versus evolving)
increases one’s confidence in the test results. To increase the power of the
tests (which may be especially called for when the time series are not very
long), researchers are increasingly adopting panel versions of the different
unit-root tests (for marketing applications, see, for example, van Heerde,
Gijsenberg, Dekimpe and Steenkamp 2013 or Luo, Raithel and Wiles
2013).
Other developments that are relevant to applied marketing researchers
deal with the design of unit-root tests that incorporate the logical-
consistency requirements of market shares (Franses, Srinivasan and
Boswijk 2001) and the use of outlier-robust unit-root (and cointegra-
tion, cf. infra) tests as described in Franses, Kloek and Lucas (1999).
Pauwels and Hanssens (2007) and Fang, Li, Huang and Palmatier (2015)
implemented rolling-window unit-root tests to identify changing regimes
of, respectively, stability and evolution over time. Unit-root tests are
basically univariate tests. Wang and Zhang (2008), however, argue that
St = b0 + b1 Mt + b2 CMt (4.5)

where J is the order of the model, and where ut = [uS,t uM,t uCM,t]′ ~ N(0, Σ).
This specification is very flexible, and reflects multiple forces or channels of
influence: delayed response (π12(j), j = 1, . . ., J), purchase reinforcement
(π11(j)), performance feedback (π21(j)), inertia in decision making (π22(j))
and competitive reactions (π32(j)). Only instantaneous effects are not included directly,
but these are reflected in the variance–covariance matrix of the residuals
(Σ). Estimation of these models is straightforward: (1) all explanatory
variables are predetermined, so there is no concern over the identification
issues that are often encountered when specifying structural multiple-
equation models, and (2) all equations in the system have the same
explanatory variables so that OLS estimation can be applied without loss
of efficiency.
However, this flexibility comes at a certain cost. First, the number of
parameters may become exorbitant. For J = 8, for example, the VAR
model in Equation 4.7 will estimate 9 × 8 = 72 autoregressive parameters.
If, however, one considers a system with 5 endogenous variables, this
number increases to 25 × 8 = 200. Several authors (see e.g., Pesaran, Pierse
and Lee 1993; Dekimpe and Hanssens 1995a) have therefore restricted
all parameters with |t-statistic| < 1 to zero.5 While this may alleviate the
problem of estimating and interpreting so many parameters, it is unlikely
to fully eliminate it.6 As a consequence, VAR modelers typically do not
interpret the individual parameters themselves, but rather focus on the
impulse-response functions (IRFs) derived from these parameters. As
discussed in more detail in the next section, IRFs trace, over time, the
incremental performance and spending implications of an initial one-
period change in one of the support variables. In so doing, they provide a
concise summary of the information contained in this multitude of param-
eters, a summary that lends itself well to a graphical and easy-to-interpret
representation (cf. infra).
Second, no direct estimate is provided of the instantaneous effects. The
residual correlation matrix can be used to establish the presence of such
an effect, but not its direction. Various procedures have been used in the
marketing literature to deal with this issue, such as an a priori imposi-
tion of a certain causal ordering on the variables (i.e., imposing that an
instantaneous effect can occur in one, but not the other, direction) as in
Dekimpe and Hanssens (1995a), a sensitivity analysis of various causal
orderings (see e.g., Dekimpe, Hanssens and Silva-Risso 1999), or account-
ing for expected instantaneous effects in the other variables when deriving
the impulse-response functions, as implemented in Nijs et al. (2001) and
Steenkamp et al. (2005).
If some of the variables have a unit root, the VAR model in Eq. (4.7) is
specified in the differences; e.g., St, St-1, . . . are replaced by ΔSt, ΔSt-1,. . .
        [ γ11 γ12 γ13 γ14 ]   [ ln (DISt) ]   [ uS,t  ]
. . . + [ γ21 γ22 γ23 γ24 ]   [ ln (Ft)   ] + [ uM,t  ]   (4.9)
        [ γ31 γ32 γ33 γ34 ]   [ ln (Dt)   ]   [ uCM,t ]
                              [ ln (FDt)  ]
[Figure: impulse-response functions traced over 0–25 weeks; the response settles at its long-run impact level.]
FEVDij (t) = Σ_{l=0}^{t} cij(l)² / Σ_{j=1}^{m} Σ_{l=0}^{t} cij(l)²

where cij(l) is the impulse response of variable i to a shock in variable j after
l periods, and m is the number of endogenous variables in the system.
SUBSTANTIVE INSIGHTS
Marketing-mix Effectiveness
Table 4.2 Study | Contribution*

Panel A: Short- and long-run marketing-mix effectiveness

Dekimpe and Hanssens (1995a): Persistence measures quantify marketing’s long-run effectiveness. Image-oriented and price-oriented advertising messages have a differential short- and long-run effect.

Dekimpe and Hanssens (1999): Different strategic scenarios (business as usual, escalation, hysteresis and evolving business practice) have different long-run profitability implications.

Dekimpe, Hanssens, and Silva-Risso (1999): Little evidence of long-run promotional effects is found in CPG markets.

Nijs, Dekimpe, Steenkamp and Hanssens (2001): Limited long-run category expansion effects of price promotions. The impact differs in terms of the marketing intensity, competitive structure, and competitive conduct in the industry.

Pauwels, Hanssens and Siddarth (2002): The decomposition of the promotional sales spike in category-incidence, brand-switching and purchase-quantity effects differs depending on the time frame considered (short versus long run).

Slotegraaf and Pauwels (2008): Both permanent and cumulative sales effects from marketing promotions are greater for brands with higher equity and more product introductions. Brands with low equity gain greater benefits from product introductions.

Srinivasan, Pauwels, Hanssens and Dekimpe (2004): Price promotions have a differential performance impact for retailers versus manufacturers.

Panel B: Marketing/finance interface

Chakravarty and Grewal (2011): The past behavior of firm stock returns and volatility may create investor expectations of short-term financial performance, which drives managers to modify either R&D or marketing budgets or both.

Joshi and Hanssens (2010): Advertising has a direct effect on firm value, beyond its indirect effect through market performance. The advertiser benefits, while competitors of comparable size get hurt.

Luo (2009): Negative word-of-mouth hurts firm value and increases volatility in the short run and in the long run. It takes several months for these effects to wear in.

Luo, Raithel and Wiles (2013): Variance in brand ratings across consumers (brand dispersion) affects stock prices: it harms returns but reduces firm risk. Also, there is an asymmetric effect of downside versus upside dispersion.

Luo and Zhang (2013): Consumer buzz and traffic in social media are useful predictors of firm value.

Pauwels, Silva-Risso, Srinivasan and Hanssens (2004): New product introductions benefit firm value in the short run and the long run, while rebates hurt firm value in the long run. It takes several weeks for these effects to wear in.

Panel C: On- versus offline selling

Deleersnyder, Geyskens, Gielens and Dekimpe (2002): Limited evidence of cannibalization by the Internet channel in the European newspaper industry.

Pauwels, Leeflang, Teerling and Huizingh (2011): The long-run revenue impact of the introduction and marketing efforts of an informational website depends on the product type and the consumer segment.

Pauwels and Neslin (2015): Adding bricks-and-mortar stores cannibalizes existing catalog and Internet channels differently.

Wiesel, Pauwels and Arts (2011): Multiple cross-channel effects exist, with off-line marketing activities affecting online funnel metrics, and online funnel metrics affecting off-line sales.

Panel D: New/social media

Demirci, Pauwels, Srinivasan and Yildirim (2014): Brand strength and the search-versus-experience nature of the category affect the effectiveness of different types of online media, and their synergy with other marketing actions.

Fang, Li, Huang and Palmatier (2015): Attracting existing sellers has a greater effect on click rate than new sellers in the launch stage, but the opposite is true in the mature stage. Attracting new buyers exerts a greater effect on click rate and price than does attracting existing buyers, and this pattern is more pronounced in the mature stage.

Kireyev, Pauwels and Gupta (2016): Display ads significantly increase search conversion. Both search and display ads exhibit significant dynamics that improve their effectiveness and ROI over time. In addition to increasing search conversion, display ad exposure also increases search clicks, thereby increasing search advertising costs.

Luo and Zhang (2013): Consumer buzz and traffic in social media are useful predictors of firm value.

Srinivasan, Rutz and Pauwels (2016): Online owned, (un)earned and paid media can explain a substantial part of the path to purchase, also for CPG brands.

Pauwels and Weiss (2008): Moving from free to fee structure slows the growth of free users directly and reduces the effectiveness of marketing communications in generating free users for online content providers.

Panel E: Inclusion of mindset metrics

Pauwels and van Ewijk (2013): Both attitude survey and online behavior metrics matter for sales explanation and prediction in business-to-consumer categories.

Srinivasan, Vanhuele and Pauwels (2010): Mindset metrics such as advertising awareness, brand consideration and brand liking can add explanatory power in a sales response model that already accounts for short-run and long-run effects of advertising, price, distribution and promotion.

Note: * The listed studies are given for illustrative purposes. As such, the list is not meant to be exhaustive. The current table complements earlier reviews in, among others, Dekimpe and Hanssens (2000, 2010).
Marketing–finance Interface
Time-series methods are well suited to analyze stock-price data, and quan-
tify their sensitivity to new marketing information. Not only can they be
employed without resorting to strong a priori assumptions about investor
behavior (such as full market efficiency), but VAR models are also flexible
enough to accommodate feedforward and feedback loops between investor
and managerial behavior. Given the increasing interest
in understanding the linkage between product markets (“Main Street”)
and financial markets (“Wall Street”), it is not surprising that time-
series models in general, and VAR models in particular, have been used
in that research domain. Some illustrative examples are given in Panel
B of Table 4.2. More extensive reviews are available in Srinivasan and
Hanssens (2009) and Luo, Pauwels and Hanssens (2012).
New/Social Media
The emergence of new media has brought along a new set of marketing
metrics, which can easily be tracked over time. Given the multitude of
these new media (Twitter, Facebook, etc.), the large number of metrics
that can be derived from them (like website visits, paid search clicks,
Facebook likes, Facebook unlikes, etc.), and the large number of feedback
loops that may exist (not only among these online metrics themselves, but
also with more traditional offline metrics), many researchers have opted
for the flexibility of VAR models, with their data-driven identification of
relevant effects, to study these phenomena. Trusov, Bucklin and Pauwels
(2009), for example, studied the effect of word-of-mouth marketing on
member growth at an internet social network, and compared it with more
traditional marketing vehicles. Word-of-mouth referrals were found to
have a substantially longer carryover effect than more traditional mar-
keting actions, and to have higher elasticities as well. Luo and Zhang
(2013) linked various buzz and online traffic measures to the subsequent
performance of a firm’s stock in the market, while Srinivasan, Rutz and
Pauwels (2016) considered the effects of consumer activities on paid,
owned and earned online media on sales, as well as their interdependen-
cies with the more traditional marketing-mix elements of price, advertising
and distribution.
CONCLUSION
Notes
1. Strictly speaking, one could also consider the situation where ϕ > 1, in which case past
shocks become more and more important, causing the series to explode to plus or minus
infinity. Situations where the past becomes ever more important are, however, unrealistic
in marketing.
2. The previous discussion used the first-order autoregressive model to introduce the con-
cepts of stability, evolution and unit roots. The findings can easily be generalized to the
more complex autoregressive moving-average process Φ(L)St = c + Θ(L)ut. Indeed, the
stable/evolving character of a series is completely determined by whether or not some of
the roots of the autoregressive polynomial Φ(L) = (1 − ϕ1L − . . . − ϕpLp) are equal to
one.
3. One could argue that two mean-stationary series are also in long-run equilibrium, as
each series deviates only temporarily from its mean level, and hence, from the other.
However, this situation is conceptually different from a cointegrating equilibrium, in
which a series can wander away from its previously-held positions, but not from the
other.
4. In case only a subset of the variables has a unit root or is cointegrated, mixed models are
specified.
5. Note that this may necessitate the use of SUR, rather than OLS, estimation, as the
equations may now have a different set of explanatory variables.
References
Srinivasan, Shuba, Koen Pauwels, Dominique M. Hanssens and Marnik G. Dekimpe (2004),
“Do Promotions Benefit Manufacturers, Retailers, or Both?” Management Science, 50 (5),
617–629.
Srinivasan, Shuba, Koen Pauwels and Vincent Nijs (2008), “Demand-Based Pricing versus
Past-Price Dependence: A Cost-Benefit Analysis,” Journal of Marketing, 72 (2), 15–27.
Srinivasan, Shuba, Oliver J. Rutz and Koen Pauwels (2016), “Paths to and off Purchase:
Quantifying the Impact of Traditional Marketing and Online Consumer Activity,” Journal
of the Academy of Marketing Science, 44 (4), 440–453.
Srinivasan, Shuba, Marc Vanhuele and Koen Pauwels (2010), “Mind-Set Metrics in Market
Response Models: An Integrative Approach,” Journal of Marketing Research, 47 (4),
672–684.
Steenkamp, Jan-Benedict E. M., Vincent R. Nijs, Dominique M. Hanssens and Marnik
G. Dekimpe (2005), “Competitive Reactions to Advertising and Promotion Attacks,”
Marketing Science, 24 (1), 35–54.
Tellis, Gerard J. and Philip Hans Franses (2006), “Optimal Data Interval for Estimating
Advertising Response,” Marketing Science, 25 (3), 217–229.
Trusov, Michael, Randolph E. Bucklin and Koen Pauwels (2009), “Effects of Word-of-
Mouth versus Traditional Marketing: Findings from an Internet Social Networking Site,”
Journal of Marketing, 73 (5), 90–102.
van Heerde, Harald J., Marnik G. Dekimpe and William P. Putsis, Jr. (2005), “Marketing
Models and the Lucas Critique,” Journal of Marketing Research, 42 (1), 15–21.
van Heerde, Harald J., Maarten Gijsenberg, Marnik G. Dekimpe and Jan-Benedict E. M.
Steenkamp (2013), “Price and Advertising Effectiveness over the Business Cycle,” Journal
of Marketing Research, 50 (2), 177–193.
van Heerde, Harald, J., Kristiaan Helsen and Marnik G. Dekimpe (2007), “The Impact of
a Product-Harm Crisis on Marketing Effectiveness,” Marketing Science, 26 (2), 230–245.
van Heerde, Harald J., Shuba Srinivasan and Marnik G. Dekimpe (2010), “Estimating
Cannibalization Rates for Pioneering Innovations,” Marketing Science, 29 (6), 1024–1039.
Wang, Fang and Xiao-Ping Zhang (2008), “Reasons for Market Evolution and Budgeting
Implications,” Journal of Marketing, 72 (5), 15–30.
Wiesel, Thorsten, Koen Pauwels and Joep Arts (2011), “Marketing’s Profit Impact:
Quantifying Online and Off-line Funnel Progression,” Marketing Science, 30 (4), 604–611.
Zivot, Eric and Donald W. K. Andrews (1992), “Further Evidence on the Great Crash,
the Oil Price Shock and the Unit Root Hypothesis,” Journal of Business and Economic
Statistics, 10 (3), 251–270.
APPENDIX
and computes (simulates) the future values for the various endogenous
variables, i.e.:
The error term uit in Equation 5.1 reflects the influence of omitted factors
affecting yit. Some of these factors reflected in the error term can be posited
to be specific to a particular cross-sectional unit i. As such, the error term
in Equation 5.1 can be expressed as
uit = μi + eit,
Equation 5.2 differs from equation 5.1 in that it allows for the time-
invariant (fixed) unobserved factors that differ across cross-sections i to
be correlated with the explanatory factors xit. The effect of these fixed
factors is reflected in the individual-specific constant ai. To the extent that
fixed effects ai are correlated with the observed explanatory variables xit
included in the model (even if the correlation is with just one of the several
explanatory variables included in the set x, see discussion of bias spread-
ing later in the chapter), the OLS or GLS estimation of Equation 5.2 will
generate biased and inconsistent coefficient estimates.
fixed effects (ai). One approach, the within (i.e., mean-difference) estima-
tor, involves analysis of deviations from the individual-specific mean of
each variable. That is, the following model is estimated:
Here, ȳi = (1/T) Σ_{t=1}^{T} yit and the means of other variables are defined
similarly. Since āi = ai (ai is constant over time for a given cross-sectional
unit), the within transformation of the data eliminates the individual-
specific unobserved effects ai from the equation. The within estimator
for the effects of the time-varying factors b̂ is numerically identical to
the least-squares dummy variable (LSDV) estimator of b̂. The advantage
of the dummy variable approach is that it does not difference out and
provides direct estimates of a^ i . For short panels (small T and large N),
however, the estimates of a^ i are inconsistent (Cameron and Trivedi 2005,
704).
The other common approach to estimating fixed-effects models, the
first-difference estimator, involves taking first differences of the data. That
is, the following model is estimated:
(cov (ai, xit) = 0). Other considerations should not drive the choice between
random-effects versus fixed-effects model specification (Wooldridge 2006,
493). Specification tests for choosing fixed-effects versus random-effects
exist and the Hausman (1978) test is the most popular among them. It is
focused on assessing the validity of the cov (ai, xit) = 0 assumption. We
describe the test, its interpretation, and limitations later in the chapter.
When a lagged dependent variable enters the model with unobserved indi-
vidual effects, standard OLS, within, and random-effects estimators are
not appropriate, as we describe below.
OLS
The OLS estimator generates biased and inconsistent estimates of model
5.5. The intuition is straightforward. Consider the OLS estimation of
model 5.5:
Both yt and yt−1 depend on ai. This means that the lagged dependent
variable yt−1 and ai, which is a part of the composite OLS error (ai + eit), are
correlated. As such, the exogeneity assumption is violated and the estimate
of φ, as well as the estimates for the other explanatory variables correlated
with regressor yt−1, are biased. Hsiao (2014, 86) formally derives the bias for
the OLS estimator of φ in a simple autoregressive model with fixed effects
and reports that OLS tends to overestimate the magnitude of the autoregres-
sive coefficient. Higher variance of individual-specific effects σa² increases
the magnitude of the bias.
Trognon (1978) provides OLS bias formulas for a dynamic panel data
model with exogenous regressors and for an autoregressive process of
order p. Adding exogenous explanatory variables does somewhat reduce
the magnitude, but does not alter the direction or the bias in f: in the first-
order autoregressive model with exogenous regressors, the OLS estimate
of f remains biased upward and the effects of the exogenous factors are
underestimated (their estimates are biased toward zero). The direction of
the asymptotic bias for higher-order autoregressive models is difficult to
postulate a priori.
Within estimator
The within estimator is not appropriate for the dynamic panel data models
with individual-specific effects either. The within transformation of the
data in the dynamic panel data models leads to biased estimates. If we
apply the within estimator to model (5.5), we would regress (yit − ȳi) on
(yi,t−1 − ȳi,−1) and (xit − x̄i), and the resulting estimator has asymptotic bias

plimN→∞ (φ̂ − φ) = − [(1 + φ)/(T − 1)] · [1 − (1/T)(1 − φ^T)/(1 − φ)]
                   / [1 − (2φ/((1 − φ)(T − 1))) · (1 − (1 − φ^T)/(T (1 − φ)))]
The magnitude of the bias can be significant. For example, when the
true value of φ = 0.5 and T = 10, the bias is equal to –0.167. This implies
a 33.4 percent deviation from the true value (i.e., –0.167/0.5). As long as φ
is positive, the sign of the bias is always negative and the within estimator
underestimates the magnitude of φ.
The severity of the bias for the within estimator is greater for shorter
panels. The bias diminishes for longer time series because as T → ∞, the
contribution of ei,t−1 to ēi decreases and (yi,t−1 − ȳi,−1) becomes asymptoti-
cally uncorrelated with (eit − ēi), reducing the dynamic panel bias of the
mean-difference (i.e., within) estimator. For large T, the asymptotic bias
is approximated by:
plimN→∞ (φ̂ − φ) ≈ − (1 + φ) / (T − 1)
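This dynamic panel bias is easy to reproduce by simulation, using φ = 0.5 and T = 10 as in the example above (the simulation design itself is an illustrative assumption):

```python
import numpy as np

# Simulate y_it = a_i + phi * y_i,t-1 + e_it and apply the within estimator
rng = np.random.default_rng(2024)
N, T, phi = 2000, 10, 0.5
a = rng.normal(size=N)
y = np.zeros((N, T + 1))
y[:, 0] = a / (1 - phi) + rng.normal(size=N)   # start near the unit-specific mean
for t in range(1, T + 1):
    y[:, t] = a + phi * y[:, t - 1] + rng.normal(size=N)

# Within estimator: demean current and lagged y by unit, then compute OLS slope
y_curr, y_lag = y[:, 1:], y[:, :-1]
yc = y_curr - y_curr.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
phi_hat = (yc * yl).sum() / (yl * yl).sum()

print(f"true phi = {phi}, within estimate = {phi_hat:.3f}, bias = {phi_hat - phi:.3f}")
print(f"large-T approximation of the bias: {-(1 + phi) / (T - 1):.3f}")
```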
Random effects
A random-effects specification is generally not appropriate in dynamic
panel data models because the assumption of no correlation between the
unobservable factors μi and the explanatory factors is violated. The logic
is straightforward. If we add a lagged dependent variable to the set of
explanatory variables in a random-effects model (5.1), we obtain the fol-
lowing model:
Step 2: Regress (yit − yi,t−1) on Δyi,t−1 and (xit − xi,t−1). The resulting
estimates φ̂ and b̂ are consistent.
Other valid instruments also exist. For example, (yi,t−2 − yi,t−3) is also
a valid instrument for (yi,t−1 − yi,t−2). Using (yi,t−2 − yi,t−3) rather than
yi,t−2, however, requires an additional time period of data and leaves the
researcher with N fewer observations in the final estimation step. The
strength of a particular instrumental variable is an empirical question,
and can be examined in the first stage of 2SLS estimation. The Anderson-
Hsiao estimator is implemented in Stata with the xtivreg, fd command.
Extending this logic of Anderson and Hsiao (1981) further, any level or
difference of yit, appropriately lagged, is a valid instrumental variable for
(yi,t−1 − yi,t−2). The pool of such potential instrumental variables grows
with increasing T. Certain optimal combinations of instrumental vari-
ables might deliver more efficient estimates. Identification of this optimal
combination is at the core of the Arellano and Bond (1991) estimator.
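A minimal numpy sketch of the Anderson-Hsiao idea — first-difference to remove ai, then instrument the endogenous lagged difference with the level yi,t−2 — on simulated data; it is an illustration under assumed parameter values, not a substitute for dedicated panel software such as Stata's xtivreg, fd:

```python
import numpy as np

# Dynamic panel with fixed effects: y_it = a_i + phi * y_i,t-1 + e_it
rng = np.random.default_rng(99)
N, T, phi = 3000, 6, 0.5
a = rng.normal(size=N)
y = np.zeros((N, T))
y[:, 0] = a / (1 - phi) + rng.normal(size=N)
for t in range(1, T):
    y[:, t] = a + phi * y[:, t - 1] + rng.normal(size=N)

# First differences remove a_i; Delta y_i,t-1 is endogenous, y_i,t-2 instruments it
dy = np.diff(y, axis=1)
dep = dy[:, 1:].ravel()      # Delta y_it   for t = 3..T
endog = dy[:, :-1].ravel()   # Delta y_i,t-1
inst = y[:, :-2].ravel()     # y_i,t-2 (levels)

# Just-identified IV estimate (variables demeaned in place of a constant)
z, d, e = inst - inst.mean(), dep - dep.mean(), endog - endog.mean()
phi_iv = (z * d).sum() / (z * e).sum()
print(f"true phi = {phi}, Anderson-Hsiao IV estimate = {phi_iv:.3f}")
```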
The Arellano-Bond GMM estimator specifies a system of equations
(one equation per time period) and allows the instruments to differ for
each equation (e.g., additional lags are available as instruments in later
periods). As we have many instruments and only one variable that requires
instrumentation, (y_{it−1} − y_{it−2}), the system will be overidentified, calling
for the use of the Generalized Method of Moments (GMM).
The method of moments estimator uses moment conditions of the type
E[Z_i′Δe_i] = 0 and minimizes the GMM objective

$$J(b, \phi) = g(b, \phi)'\, W\, g(b, \phi)$$

where g(b, φ) = (1/N) Σᵢ Zᵢ′(Δy_i − φΔy_{i,−1} − ΔX_i b) is the vector of sample
moments, W is a weighting matrix, Δy_i is a vector of the dependent variable
with Δy_it in row t, and Z_i is a matrix of instruments:
$$Z_i = \begin{bmatrix} z_{i3}' & 0 & \cdots & 0 \\ 0 & z_{i4}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & z_{iT}' \end{bmatrix}$$
The z_it element of Z_i is [y_{it−2}, y_{it−3}, . . . , y_{i1}, Δx_it′], and the number of rows
of Z_i equals T − 2. For example, if T = 5,
$$Z_i = \begin{bmatrix} y_{i1} & \Delta x_{i3} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & y_{i2} & y_{i1} & \Delta x_{i4} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & y_{i3} & y_{i2} & y_{i1} & \Delta x_{i5} \end{bmatrix}$$
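The block-diagonal structure of Z_i is easy to build programmatically. A sketch, assuming T = 5 and a single exogenous regressor x as in the example above (the numeric values are made up):

```python
# Sketch of the Arellano-Bond block-diagonal instrument matrix Z_i shown
# above, assuming T = 5 and one exogenous regressor x (values are made up).
import numpy as np

def arellano_bond_Z(y_i, dx_i):
    """Build Z_i for one unit; the row for period t holds
    z_it' = [y_i,t-2, ..., y_i1, dx_it]."""
    T = len(y_i)                # y_i = (y_i1, ..., y_iT)
    rows = []
    for t in range(3, T + 1):   # one equation per period t = 3, ..., T
        rows.append(list(y_i[:t - 2][::-1]) + [dx_i[t - 1]])
    width = sum(len(r) for r in rows)
    Z = np.zeros((T - 2, width))
    col = 0
    for k, r in enumerate(rows):   # each z_it' sits on its own diagonal block
        Z[k, col:col + len(r)] = r
        col += len(r)
    return Z

y_i = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # y_i1, ..., y_i5
dx_i = np.array([0.0, 0.0, 0.1, 0.2, 0.3])   # dx_i1, ..., dx_i5 (first two unused)
print(arellano_bond_Z(y_i, dx_i))            # 3 x 9, matching the example
```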
The consistency of the Arellano-Bond estimator rests on the
assumption that the idiosyncratic errors e_it are not serially correlated. This
assumption is testable through the Arellano-Bond (1991) test for serial
correlation in errors. If the e_it are iid, then the Δe_it exhibit negative first-order
serial correlation and zero serial correlation at higher orders. That is,
when the null hypothesis of no serial correlation is rejected at order 1,
but is not rejected at higher orders, the validity of Arellano-Bond instru-
ments is supported. The test is implemented in Stata with estat abond
command which should be run after xtabond (or xtdpd in case of system
GMM estimation). The Sargan/Hansen test of overidentifying restrictions
(Sargan 1958, Hansen 1982) assesses the joint validity of instruments in
a given model. The xtabond2 command reports Sargan and Hansen statistics
separately after model estimation. Roodman (2009) offers a discussion of
the tests and their interpretation.
The two-step Arellano-Bond estimation has been shown to generate
downward biased standard errors (the one-step implementation does
not have this issue). Arellano and Bond found that “the estimator of the
asymptotic standard errors of GMM2 shows a downward bias of around
20 percent relative to the finite-sample standard deviations” (1991, 285).
The Windmeijer (2005) finite sample correction resolves the issue. It is
available in Stata with the xtabond, twostep vce(robust) command syntax.
SPECIFICATION TESTING
Fixed-effects model estimates are consistent whether the assumption
cov(a_i, x_it) = 0 holds or not, because they directly account for time-invariant
individual-specific unobserved heterogeneity. The random-effects
model estimates are consistent and efficient (i.e., minimum variance)
under the null hypothesis that the fixed effects and the contemporaneous
shocks are uncorrelated with the explanatory factors. However, under the
alternative hypothesis of omitted fixed effects correlated with the explana-
tory factors included in the model, the random-effects estimates will be
biased and inconsistent (see Table 5.1).
Under the null hypothesis of the time-invariant individual-specific effects
a_i being uncorrelated with the explanatory factors x_it (i.e., cov(a_i, x_it) = 0),
the estimates from a random-effects model should not differ significantly
from the estimates obtained from a fixed-effects model. If a statistically
significant discrepancy between random-effects and fixed-effects model
estimates is not detected, the finding is interpreted as evidence in favor of
the assumption that individual effects are (approximately) uncorrelated
with the regressors. In such a case, random-effects estimates are consist-
ent and the random-effects model is preferred to fixed-effects models
because the random-effects estimates are efficient and the coefficients
on time-invariant regressors can be identified. However, if a significant
discrepancy between random-effects and fixed-effects model estimates is
found, random-effects estimates are deemed inconsistent and the fixed-
effects model is preferred.
The Hausman test statistic can be computed as:
$$H = \frac{(\hat{\beta}_{FE} - \hat{\beta}_{RE})^2}{\operatorname{Var}(\hat{\beta}_{FE}) - \operatorname{Var}(\hat{\beta}_{RE})}$$
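For a single coefficient, the statistic can be computed directly. A sketch with made-up illustrative estimates (in practice the test uses the full coefficient vector and covariance matrices, e.g., Stata's hausman fe re):

```python
# Sketch of the scalar Hausman statistic in the formula above. In practice
# the test uses the full coefficient vector and covariance matrices (Stata:
# hausman fe re); the numbers below are made-up illustrative estimates.

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE)); requires the variance
    difference to be positive, as it is when RE is efficient under H0."""
    return (b_fe - b_re) ** 2 / (var_fe - var_re)

H = hausman_scalar(b_fe=0.80, b_re=0.55, var_fe=0.010, var_re=0.004)
print(round(H, 2))   # 10.42
# H is asymptotically chi-square(1); 3.84 is the 5 percent critical value
print(H > 3.84)      # True: reject H0 and prefer the fixed-effects model
```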
Assumption of Consistency of β̂_FE under Both the Null and the Alternative
Hypotheses
The Hausman test relies on the assumption that the fixed-effects estimator
β̂_FE is consistent. That is, it assumes that there is no correlation between
x_it and e_it in any time period once fixed effects are accounted for. This
assumption can be violated. For example, it is violated if relevant variables
are omitted or the unobserved heterogeneity in the model is time-variant,
so that the unobserved effect varies over time (a_it). In this case, a fixed-effects
estimator is not consistent, and cannot serve as an appropriate bench-
mark in the Hausman test. Under time-varying unobserved heterogeneity,
neither fixed-effects nor random-effects estimators are appropriate and
the Hausman test would not indicate that.
In the classic interpretation of the Hausman test, the difference between
the random-effects and fixed-effects model estimates is attributed to
a single issue, namely, the correlation between the unobserved fixed
effects and the explanatory factors. Often, in empirical applications the
discrepancy between the fixed-effects and random-effects estimators can
be driven by other factors.
For example, when the right-hand-side variables are subject to measure-
ment error, a fixed-effects estimator can be subject to a greater attenuation
bias compared to a corresponding cross-section estimate. The fixed-effects
estimator removes all cross-sectional variation in the data, which is good
because it removes the biases due to unobserved individual heterogeneity.
However, it also removes useful information about the variables of
interest. Depending on the characteristics of particular data, the change
in the signal-to-noise ratio as a result of applying a fixed-effects estimator
is ambiguous, and in many cases is disadvantageous. When measurement
error is present, a researcher undertaking a Hausman test might find that
fixed-effects estimates are lower in absolute magnitude compared to the
alternative random-effects or OLS estimates. The difference might be due
to the unobserved heterogeneity biases in random-effects and OLS, or
it can be due to the attenuation bias exacerbated by the differencing of
the data in the fixed-effects estimation. In such a case, rather than relying
on the Hausman test to choose between fixed-effects and random-effects
estimators, a researcher should undertake steps to investigate and tackle
the potential measurement error problem (e.g., through IV methods).
The null and the alternative hypotheses in the Hausman test refer to
extreme cases where either all covariates are exogenous (i.e., the random-
effects estimator is appropriate), or none of the regressors are exogenous
(a fixed-effects model is required). Baltagi (2005, 19) notes that one should
probably not immediately proceed with fixed-effects estimation if the
classic Hausman test rejects H0. Instead, he advises researchers to explore
models that allow only some regressors to be correlated with the fixed
effects a_i, while still maintaining the assumption that all regressors x_it are
uncorrelated with the idiosyncratic shocks e_it.
Hausman and Taylor (1981) developed an estimator which allows
some of the regressors in the set xit to be correlated with ai. The Hausman
and Taylor (HT) estimator is an instrumental variable-based estimator
(implemented in Stata with command xthtaylor). It combines the elements
of both fixed-effects and random-effects estimators and offers a range of
benefits. The HT procedure gives researchers additional flexibility.
Power Issues
The Hausman test is a statistical test derived under large sample properties.
The denominator of the Hausman statistic relies on the asymptotic
variances of the coefficient estimates. The betas are assumed to be normally
distributed with means β̂_FE and β̂_RE and asymptotic variances
Var(β̂_FE) and Var(β̂_RE). The Hausman test computed for small samples
should be viewed with additional caution because the variances Var(β̂_FE) and
Var(β̂_RE) calculated based on small samples can be far from their asymptotic
counterparts.
$$y_i = a_0 + b x_i + e_i \qquad (5.11)$$

$$y_i = a_0 + b(x_i^* - \nu_i) + e_i = a_0 + b x_i^* + (e_i - b\nu_i)$$

Because cov(x_i, ν_i) = 0, var(x*_i) = var(x_i) + var(ν_i) = σ²_x + σ²_ν, and
cov(x*_i, e_i − bν_i) = −b·cov(x*_i, ν_i) = −bσ²_ν, we can derive the probability
limit of the OLS estimator as:

$$\operatorname{plim}\,\hat{b}_{OLS} = b\,\frac{\sigma_x^2}{\sigma_x^2 + \sigma_\nu^2} \qquad (5.12)$$
Consider now the panel version of the model:

$$y_{it} = b x_{it} + a_i + e_{it} \qquad (5.13)$$

Here, x_it is the true regressor of interest, and x*_it is its observed value,
which is measured with measurement error ν_it, where x*_it = x_it + ν_it. For
generality, let us allow the x_it series to be autocorrelated with autocorrelation
parameter γ_x (γ_x < 1) and the measurement error ν_it series to be
autocorrelated with autocorrelation parameter γ_ν (γ_ν < 1), such that
cov(ν_it, ν_{it−1}) = γ_ν σ²_ν, where Var(ν_it) = σ²_ν. Further, let us assume that
the measurement error ν_it is not correlated with the true regressor x_it, the
unobserved individual effect a_i, and the idiosyncratic error e_it. Estimating
model 5.13 by OLS yields the following probability limit for the estimate b̂_OLS:

$$\operatorname*{plim}_{N\to\infty}\,\hat{b}_{OLS} = b\,\frac{\sigma_x^2}{\sigma_\nu^2 + \sigma_x^2} + \frac{\operatorname{Cov}(x_{it}, a_i)}{\sigma_\nu^2 + \sigma_x^2} \qquad (5.14)$$

The total bias of b̂_OLS consists of two components. The first term, with
multiplier σ²_x/(σ²_ν + σ²_x), is the familiar attenuation bias caused by the presence
of the measurement error. The second term, Cov(x_it, a_i)/(σ²_ν + σ²_x), is the
omitted variable bias caused by the failure to account for the individual
heterogeneity.
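A small simulation (all parameter values are illustrative assumptions) confirms that pooled OLS converges to the two-component probability limit in equation 5.14:

```python
# Monte Carlo sketch of equation 5.14 (parameter values are illustrative
# assumptions): pooled OLS on a mismeasured regressor, ignoring a_i,
# converges to attenuation bias plus omitted-variable bias.
import numpy as np

rng = np.random.default_rng(2)
b, N, T = 1.0, 20000, 4
s2_x, s2_v, cov_xa = 1.0, 0.5, 0.3

a = rng.normal(size=(N, 1))   # individual effect a_i, variance 1
# x_it with Var(x_it) = 1 and Cov(x_it, a_i) = 0.3
x = cov_xa * a + rng.normal(scale=np.sqrt(s2_x - cov_xa ** 2), size=(N, T))
x_star = x + rng.normal(scale=np.sqrt(s2_v), size=(N, T))  # mismeasured
y = b * x + a + rng.normal(size=(N, T))                    # model 5.13

# pooled OLS of y on the observed x*, with a_i omitted
b_ols = (x_star * y).sum() / (x_star ** 2).sum()

# probability limit from equation 5.14
plim = b * s2_x / (s2_v + s2_x) + cov_xa / (s2_v + s2_x)
print(round(b_ols, 3), round(plim, 3))  # the two should be close
```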
Individual-specific heterogeneity effects ai can be eliminated from
model 5.13 through first-differencing and estimating the model:
Δy_it = bΔx_it + Δe_it. In this formulation, the probability limit of b̂ can be
derived similarly to that in equation 5.12 as:

$$\operatorname{plim}(\hat{b}) = b\,\frac{\sigma_{\Delta x}^2}{\sigma_{\Delta x}^2 + \sigma_{\Delta\nu}^2},$$

where σ²_Δx = Var(x_it − x_{it−1}) = Var(x_it) − 2cov(x_it, x_{it−1}) + Var(x_{it−1}) =
2σ²_x(1 − γ_x) and, similarly, σ²_Δν = 2σ²_ν(1 − γ_ν). Substituting these expressions yields:

$$\operatorname*{plim}_{N\to\infty}\,\hat{b}_{FD} = b\,\frac{\sigma_x^2(1-\gamma_x)}{\sigma_x^2(1-\gamma_x) + \sigma_\nu^2(1-\gamma_\nu)} \qquad (5.15)$$
We can compare the magnitude of the bias in the OLS (equation 5.14)
and first-difference (equation 5.15) estimates. If there is no measurement
error (σ²_ν = 0), the first-difference estimate is unbiased while OLS is
biased because it fails to account for individual heterogeneity. If σ²_ν > 0,
both estimators are subject to attenuation bias, and the relative size
of the biases depends on γ_ν and γ_x, the degree of autocorrelation in
the measurement error and the explanatory variable, respectively. If x_it is
more strongly autocorrelated than the measurement error ν_it (i.e., γ_x > γ_ν),
first-differencing x_it reduces the signal-to-noise ratio, making
the attenuation bias of b̂_FD more severe than the attenuation bias
component in the OLS estimate. When ν_it resembles white noise (no
persistence), the attenuation bias of the first-difference estimator is large,
especially for higher γ_x. On the other hand, as the persistence in the
measurement error increases (γ_ν goes to 1), the attractiveness of the first-difference
estimator increases.
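The trade-off in equation 5.15 can be seen by evaluating the two probability limits numerically (the parameter values below are assumptions chosen for illustration):

```python
# Numeric sketch of equations 5.14-5.15 (all values assumed): how
# autocorrelation in the regressor (gx) versus in the measurement error
# (gv) governs the attenuation of the first-difference estimator.
b, s2_x, s2_v = 1.0, 1.0, 0.5

def plim_fd(gx, gv):
    """Probability limit of the first-difference estimate, equation 5.15."""
    return b * s2_x * (1 - gx) / (s2_x * (1 - gx) + s2_v * (1 - gv))

plim_levels = b * s2_x / (s2_x + s2_v)  # attenuation term in equation 5.14

# persistent x, white-noise measurement error: differencing hurts
print(round(plim_fd(0.9, 0.0), 3))      # 0.167, far below 0.667
# persistent measurement error: differencing removes most of the noise
print(round(plim_fd(0.9, 0.95), 3))     # 0.8
print(round(plim_levels, 3))            # 0.667
```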
Table 5.2 Conditions when the attenuation bias is smaller for the within
estimator versus the first-difference estimator, under rj = 0
$$\operatorname*{plim}_{N\to\infty}\left(\frac{1}{N}X'X\right)^{-1}\begin{bmatrix} \sigma_{x_1,a_i} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \sigma_{x_1,a_i}\begin{bmatrix} q^{11} \\ q^{21} \\ \vdots \\ q^{K1} \end{bmatrix}$$

where Q = plim (1/N)X′X and q^{j1} denotes the (j, 1) element of Q⁻¹. Effectively,
the bias is smeared over all other estimates. It affects not only the estimate
for x₁ but, to the extent x₁ is correlated with the other explanatory variables,
the estimates for the other explanatory variables as well, even though they are
uncorrelated with the unobserved time-invariant factor a_i.
Assume that the regressors x₁, x₂, and q are uncorrelated with the error
term e, i.e., plim (1/N)q′e = 0 and plim (1/N)x_j′e = 0 for j = 1, 2. Also assume
that x₁ is uncorrelated with q, while x₂ is correlated with q. That is,
plim (1/N)x₁′q = 0 and plim (1/N)x₂′q ≠ 0. Further, assume that q is unobserved and
is omitted in the estimation. The estimating equation becomes

$$y = x_1\beta_1 + x_2\beta_2 + \eta, \qquad (5.17)$$

where η = qγ + e. As such, x₁ is an exogenous regressor, while x₂ is endogenous.
The Frisch–Waugh–Lovell theorem states that coefficients from a
multiple regression can be reconstructed from a series of bivariate
regressions. Specifically, b1 in the equation (5.17) above can be obtained
by first regressing y on x2 (step 1), then regressing x1 on x2 (step 2), and
finally regressing the residuals from step one on residuals from step two
(step 3).
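The three-step construction can be verified numerically. A sketch on simulated data (the coefficients and sample size are arbitrary choices):

```python
# Numeric check of the Frisch-Waugh-Lovell construction described above,
# on simulated data (coefficients and sample size are arbitrary choices).
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x2 = rng.normal(size=(n, 1))
x1 = 0.5 * x2 + rng.normal(size=(n, 1))
y = 2.0 * x1 - 1.0 * x2 + rng.normal(size=(n, 1))

# full multiple regression of y on [x1, x2]
X = np.hstack([x1, x2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# FWL: residualize y and x1 on x2, then regress residual on residual
M2 = np.eye(n) - x2 @ np.linalg.inv(x2.T @ x2) @ x2.T
y_tilde, x1_tilde = M2 @ y, M2 @ x1
b1_fwl = float((x1_tilde.T @ y_tilde) / (x1_tilde.T @ x1_tilde))

print(abs(float(b_full[0, 0]) - b1_fwl) < 1e-8)  # True: the two coincide
```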
Let us define the projection matrix P2 and the residual-making matrix
M2 (aka annihilator matrix) as follows:
$$P_2 = x_2(x_2'x_2)^{-1}x_2', \qquad M_2 = I - P_2$$
Premultiplying equation (5.17) by M₂ gives

$$M_2 y = M_2 x_1 \beta_1 + M_2 x_2 \beta_2 + M_2\eta \qquad (5.18)$$

Because M₂ annihilates x₂ (M₂x₂ = 0), this reduces to

$$M_2 y = M_2 x_1 \beta_1 + M_2\eta \qquad (5.19)$$

Redefining ỹ = M₂y, x̃ = M₂x₁ and η̃ = M₂η, equation (5.19) can be
written as

$$\tilde{y} = \tilde{x}\beta_1 + \tilde{\eta} \qquad (5.20)$$
Then

$$\hat{b}_1 = (\tilde{x}'\tilde{x})^{-1}(\tilde{x}'\tilde{y}) = (x_1'M_2'M_2x_1)^{-1}(x_1'M_2'M_2y) \qquad (5.21)$$

Using M₂′M₂ = M₂ (M₂ is symmetric and idempotent) and substituting y
from equation (5.17):

$$\hat{b}_1 = (x_1'M_2x_1)^{-1}x_1'M_2x_1\beta_1 + (x_1'M_2x_1)^{-1}x_1'M_2x_2\beta_2 + (x_1'M_2x_1)^{-1}x_1'M_2(q\gamma + e) \qquad (5.22)$$

$$= \beta_1 + (x_1'M_2x_1)^{-1}x_1'M_2(q\gamma + e) \qquad (5.23)$$
The term (x₁′M₂x₁)⁻¹x₁′M₂(qγ + e) is the “smeared bias” term. To derive the
probability limit of this bias, let us first simplify the x₁′M₂(qγ + e)
component:

$$x_1'(q\gamma + e) - x_1'x_2(x_2'x_2)^{-1}x_2'q\gamma - x_1'x_2(x_2'x_2)^{-1}x_2'e = -\left(\frac{\operatorname{cov}_{x_1,x_2}\operatorname{cov}_{x_2,q}}{V_{x_2}}\right)\gamma \qquad (5.24)$$

Employing the exogeneity assumptions on x₁ (i.e., plim (1/N)x₁′e = 0 and
plim (1/N)x₁′q = 0) and the assumption plim (1/N)x₂′e = 0, the terms x₁′(qγ + e) and
x₁′x₂(x₂′x₂)⁻¹x₂′e vanish in the probability limit. (x₂′x₂)⁻¹x₂′x₁ is an OLS estimate from a bivariate
regression of x₁ on x₂, which equals cov_{x₁,x₂}/V_{x₂}.
Now, let us rewrite x₁′M₂x₁ as follows:

$$x_1'M_2x_1 = x_1'(I - x_2(x_2'x_2)^{-1}x_2')x_1 = x_1'x_1 - x_1'x_2(x_2'x_2)^{-1}x_2'x_1 = V_{x_1} - \frac{\operatorname{cov}^2_{x_1,x_2}}{V_{x_2}} = \frac{V_{x_1}V_{x_2} - \operatorname{cov}^2_{x_1,x_2}}{V_{x_2}} \qquad (5.25)$$
Consider now the model

$$y = x_1\beta_1 + x_2\beta_2 + e, \qquad (5.27)$$

where x₁ is measured with error and x₂ is measured without error. That is,
we observe x*₁ = x₁ + v. If equation (5.27) is estimated by OLS, then both
estimates, b̂₁ and b̂₂, are biased and inconsistent (Greene 2017):

$$\operatorname{plim}\,\hat{b}_1 = \beta_1\left(\frac{1}{1 + \sigma_v^2 s^{11}}\right) \qquad (5.28)$$

$$\operatorname{plim}\,\hat{b}_2 = \beta_2 - \beta_1\left(\frac{\sigma_v^2 s^{12}}{1 + \sigma_v^2 s^{11}}\right) \qquad (5.29)$$

where s^{ij} is the ij-th element of the inverse of the covariance matrix of the
regressors and σ²_v is the variance of the measurement error v. b̂₁ suffers
attenuation bias, while b̂₂ is biased in a direction that can be either upward or
downward, depending on the sign of β₁ and the covariance between the two regressors.
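A Monte Carlo sketch (made-up parameter values) reproduces the probability limits in equations 5.28 and 5.29, including the upward bias in b̂₂ when the regressors are positively correlated and β₁ > 0:

```python
# Monte Carlo sketch of equations 5.28-5.29 with made-up parameter values:
# measurement error in x1 alone biases both coefficient estimates.
import numpy as np

rng = np.random.default_rng(4)
n = 200000
b1, b2, s2_v = 1.0, 1.0, 0.5
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])  # covariance of the true (x1, x2)

X = rng.multivariate_normal([0.0, 0.0], Sigma, size=n)
x1, x2 = X[:, 0], X[:, 1]
y = b1 * x1 + b2 * x2 + rng.normal(size=n)
x1_star = x1 + rng.normal(scale=np.sqrt(s2_v), size=n)  # mismeasured x1

b_hat = np.linalg.lstsq(np.column_stack([x1_star, x2]), y, rcond=None)[0]

S_inv = np.linalg.inv(Sigma)                     # the s^{ij} elements
denom = 1 + s2_v * S_inv[0, 0]
plim_b1 = b1 / denom                             # equation 5.28
plim_b2 = b2 - b1 * s2_v * S_inv[0, 1] / denom   # equation 5.29
print(np.round(b_hat, 3), round(plim_b1, 3), round(plim_b2, 3))
```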
CONCLUSION
References
Anderson, Theodore Wilbur and Cheng Hsiao (1981), “Estimation of Dynamic Models
with Error Components,” Journal of the American Statistical Association, 76 (January),
598–606.
Angrist, Joshua D. and Jörn-Steffen Pischke (2008), Mostly Harmless Econometrics: An
Empiricist’s Companion. Princeton, NJ: Princeton University Press.
Arellano, Manuel and Stephen Bond (1991), “Some Tests of Specification for Panel Data:
Monte Carlo Evidence and an Application to Employment Equations,” Review of
Economic Studies, 58(2), 277–297.
Arellano, Manuel and Olympia Bover (1995), “Another look at the instrumental variable
estimation of error-components models,” Journal of Econometrics, 68(1), 29–51.
Baltagi, Badi (2005), Econometric Analysis of Panel Data. New York: John Wiley & Sons.
Biørn, Erik (2000), “Panel Data with Measurement Errors: Instrumental Variables and
GMM Procedures Combining Levels and Differences,” Econometric Reviews, 19(4),
391–424.
Blundell, Richard and Stephen Bond (1998), “Initial Conditions and Moment Restrictions in
Dynamic Panel Data Models,” Journal of Econometrics, 87(1), 115–143.
Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and
Applications. New York: Cambridge University Press.
Cameron, A. Colin and Pravin K. Trivedi (2009), Microeconometrics Using Stata (Vol. 5).
College Station, TX: Stata Press.
Chamberlain, Gary (1984), “Panel Data,” in Z. Griliches and M. Intriligator (eds), Handbook
of Econometrics. Amsterdam: North Holland, 1247–1318.
Greene, William (2017), Econometric Analysis. Lecture notes. http://people.stern.nyu.edu/
wgreene/Econometrics/Econometrics-I-13.pdf
Griliches, Zvi and Jerry A. Hausman (1986), “Errors in Variables in Panel Data,” Journal of
Econometrics, 31(1), 93–118.
Hansen, Lars Peter (1982), “Large Sample Properties of Generalized Method of Moments
Estimators,” Econometrica: Journal of the Econometric Society, 50(4), 1029–1054.
Hausman, Jerry A. (1978), “Specification Tests in Econometrics,” Econometrica 46
(November), 1251–1271.
Hausman, Jerry A. and William E. Taylor (1981), “Panel Data and Unobservable Individual
Effects,” Econometrica, 49(6), 1377–1398.
Hsiao, Cheng (2014), Analysis of Panel Data, Cambridge: Cambridge University Press. 3rd
edition.
Jacobson, Robert (1990), “Unobservable Effects and Business Performance,” Marketing
Science, 9 (Winter), 74–85, 92–95.
Kirzner, Israel M. (1976), “On the Method of Austrian Economics,” in E.G. Dolan (ed.), The
Foundations of Modern Austrian Economics, Kansas City: Sheed and Ward, 40–51.
Mizik, Natalie and Robert Jacobson (2004), “Are Physicians ‘Easy Marks’? Quantifying
the Effects of Detailing and Sampling on New Prescriptions,” Management Science,
1704–1715.
Mundlak, Yair (1978), “On the Pooling of Time Series and Cross-Sectional Data,”
Econometrica, 46 (January), 69–86.
Nickell, Stephen (1981), “Biases in Dynamic Models with Fixed Effects,” Econometrica,
1417–1426.
Pischke, Jörn-Steffen (2007), Lecture notes on measurement error. London School of
Economics. http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf.
Roodman, David (2009), “How to do xtabond2: An introduction to difference and system
GMM in Stata,” Stata Journal, 9 (1), 86–136.
Rumelt, Richard (1984), “Towards a Strategic Theory of the Firm,” in B. Lamb (ed.),
Competitive Strategic Management, Englewood Cliffs, NJ: Prentice Hall, 556–570.
Sargan, John D. (1958), “The estimation of economic relationships using instrumental vari-
ables,” Econometrica: Journal of the Econometric Society, 393–415.
Trognon, Alain (1978), “Miscellaneous Asymptotic Properties of Ordinary Least Squares
and Maximum Likelihood Estimators in Dynamic Error Components Models,” Annales
de l’INSEE. Institut National de la Statistique et des Études Économiques, 631–657.
Wernerfelt, Birger (1984), “A Resource-based View of the Firm,” Strategic Management
Journal, 5 (April–June), 171–180.
Windmeijer, Frank (2005), “A Finite Sample Correction for the Variance of Linear Efficient
Two-step GMM Estimators,” Journal of Econometrics, 126 (1), 25–51.
Wooldridge, Jeffrey (2002), Econometric Analysis of Cross Section and Panel Data,
Cambridge, MA: MIT Press.
Wooldridge, Jeffrey (2006), Introductory Econometrics: A Modern Approach, Mason, OH:
Thomson/South-Western.
$$S_t = f(A_t \mid \theta) + e_t$$
A growing literature (cf. Angrist and Pischke (2009) and Imbens and
Rubin (2015)) emphasizes a particular formulation of the problem of
causal inference. Much of this literature re-interprets existing econo-
metric methods in light of this paradigm. The basis for this paradigm of
causal inference was originally suggested by Neyman (1923), who con-
ceived of the notion of potential outcomes for a treatment. The notation
favored by Imbens and Rubin is as follows. Y represents the outcome
random variable. In our case, Y will be sales or some sort of event (like a
conversion or click) which is on the way toward a final purchase. We seek
to evaluate a treatment, denoted D. For now, consider any binary treatment
such as exposure to an ad. We conceive of there being two potential
outcomes: Y_i(1), the outcome for unit i if treated, and Y_i(0), the outcome if
untreated. The difference in means between the treated and untreated then
decomposes as:

$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$$
$$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$$
This equation simply states that what the data identify is the mean
difference in the outcome variable between the treated and the untreated, and
this can be expressed as the sum of two terms.
The first term is the effect on the treated, E[Y_i(1) | D_i = 1] − E[Y_i(0) | D_i = 1],
and the second term is called the selection bias, E[Y_i(0) | D_i = 1] − E[Y_i(0) | D_i = 0].
Selection bias occurs when the potential outcome for those
assigned to the treatment differs in a systematic way from those who are
assigned to the “control” or assigned not to be treated. This selection bias is
what inspired much of the work of Heckman, Angrist and Imbens to obtain
further information. The classic example of this is the so-called ability bias
argument in the literature on education. We can’t simply compare the wages
of college graduates with those who did not graduate from college because it
is likely that college graduates have greater ability even “untreated” with a
college education. Those who argue for the “certification” view of higher
education represent the extreme point of this selection bias – they argue that
the value of education lies not in those courses in Greek philosophy but
simply in the selection of higher-ability individuals.
It is useful to reflect on what sort of situations are likely to have large
selection bias in the evaluation of marketing actions. Mass media like TV
Randomized Experimentation
The problem with observational data is the potential correlation between
“treatment” assignment and the potential outcomes. We have seen that
this is likely to be a huge problem for highly targeted forms of marketing
activities where the targeting is based on customer preferences. More gen-
erally, any situation in which some of the variation in the right-hand side
variables is correlated with the error term in the sales response equation
will make any “regression-style” method inconsistent in estimating the
parameters of the causal function. For example, the classical errors-in-
variables model results in a correlation between the measured values of the
rhs variables and the error term.
In a randomized experiment, the key idea is that assignment to the treat-
ment is random and therefore uncorrelated with any other observable or
unobservable variable. In particular, assignment to the treatment is uncor-
related with the potential outcomes. This eliminates the selection bias term.
$$E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = 0$$
This means that the difference in means between the treated and
untreated populations consistently estimates not only the effect on the
treated, but also the average effect or the effect on the person chosen at
random from the population. However, it is important to understand
that when we say person chosen at random from the “population,” we
are restricting attention to the population of units eligible for assignment
in the experiment. Most experiments have a very limited domain. For
example, if we randomly assign designated market areas (DMAs) in
the northeast portion of the United States, our population is only that
restricted domain. Most of the classic social experiments in economics
have very restricted domains or populations to which the results can be
extrapolated. Generalizability is the most restrictive aspect of randomized
experimentation. Experimentation in marketing applications such as
“geo” or DMA-based experiments conducted by Google and Facebook
is starting to get at experiments which are generalizable to the relevant
population (i.e. all US consumers).
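A short simulation (hypothetical numbers) contrasts the naive treated-untreated comparison under preference-based targeting with a randomized assignment:

```python
# Simulation sketch (hypothetical numbers) contrasting the naive
# treated-untreated comparison under preference-based targeting with a
# randomized assignment.
import numpy as np

rng = np.random.default_rng(5)
n = 100000
affinity = rng.normal(size=n)                  # drives baseline outcomes
y0 = 1.0 + affinity + rng.normal(size=n)       # potential outcome, untreated
y1 = y0 + 0.5                                  # constant treatment effect 0.5

# observational "targeting": high-affinity customers more likely exposed
d_obs = (affinity + rng.normal(size=n)) > 0
naive = y1[d_obs].mean() - y0[~d_obs].mean()   # effect + selection bias

# randomized assignment: independent of the potential outcomes
d_rct = rng.random(n) < 0.5
rand = y1[d_rct].mean() - y0[~d_rct].mean()

print(round(naive, 2), round(rand, 2))  # naive overstates 0.5; rand does not
```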
Another key weakness of randomization is that this idea is really a large
$$S_t = g_1(Z_t) + u_{S,t}$$
$$X_t = g_2(Z_t) + u_{X,t}$$
The Wald estimator makes a great deal of intuitive sense. The numerator
is basically the derivative of the mean of S with respect to Z and the denominator
preferences and ability to buy) who was not exposed to the ad. The differ-
ence in means between the exposed (treatment) group and this synthetic
control population should be a cleaner estimate of the causal effect. In
terms of propensity scores, those with similar propensity scores are consid-
ered “twins.” In this same spirit, there is a large literature on “matching”
estimators that attempt to construct synthetic controls (c.f. Imbens and
Rubin Chapters 15 and 18). Again, any matching estimator is only as good
as the variables used in implementing “matching.”
With aggregate data, the “difference-in-differences” approach to con-
structing a control group has achieved a great deal of popularity. A nice
example of this approach can be found in Blake et al. (2015). Here they
seek to determine the impact of sponsored search ads using a “natural”
experiment in which eBay terminated paid search ads on MSN after a
certain date. The standard analysis would be simply to compare some
outcome measure such as clicks, conversions or revenue before and after
termination of the sponsored search ads. In this approach, the “control” is
the period after termination and the “experimental” or treatment period is
the period before. There are two problems with this approach. First, this does not
control for other time-varying factors influencing interest in the sponsored
search keywords. Second, there can be power problems. The standard
difference-in-differences approach is to find a control condition where
there was no change in sponsored search ads. The authors use Google
organic search results as the control. The difference-in-differences method
is simply to subtract the before-and-after differences on MSN from the
before-and-after differences on Google (the control). The success of this
strategy depends on whether or not Google keyword results constitute a
valid control. Blake et al. are suspicious of this assumption and pursue a
randomized experimentation strategy to estimate the impact of sponsored
search ads.
The difference-in-differences approach owes its popularity to the fact that all that
appears to be required is some subset of the data (typically a geographically
based subset) that was not exposed to the advertisement or policy change.
It is not possible to test the assumption that the changes in the response
variable for the control subset are independent of the “treatment.” There
are also a host of power and statistical inference problems associated with
the difference-in-differences literature (see Chapters 5 and 8 of Angrist
and Pischke). As a practical matter, it is advisable to do a “placebo” test if
a difference-in-differences approach is adopted. That is, take two subsets
of the data where there should be, by definition, no treatment effect and
perform a difference-in-differences analysis on the “placebo” sample.
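A minimal sketch of the difference-in-differences estimator together with the placebo check suggested above (all data simulated; the common trend and the 0.3 treatment effect are assumptions):

```python
# Minimal difference-in-differences sketch with the placebo check suggested
# above. All data are simulated; the common trend (+1.0) and treatment
# effect (+0.3) are assumptions.
import numpy as np

rng = np.random.default_rng(6)
n = 5000
pre_t = 2.0 + rng.normal(size=n)          # treated group, before
post_t = 3.0 + 0.3 + rng.normal(size=n)   # treated group, after (trend + effect)
pre_c = 1.0 + rng.normal(size=n)          # control group, before
post_c = 2.0 + rng.normal(size=n)         # control group, after (trend only)

did = (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())

# placebo: split the control group in half; there should be no "effect"
half = n // 2
placebo = (post_c[:half].mean() - pre_c[:half].mean()) \
        - (post_c[half:].mean() - pre_c[half:].mean())

print(round(did, 2), round(placebo, 2))  # did near 0.3, placebo near 0
```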
Up to this point, I have considered only aggregate time series data. The
problem with this data with respect to causal inference is that there can
be decisions to set the rhs variables that, over time, induce an “endog-
eneity” problem or a correlation with the model errors. The same is true
for pure cross-sectional variables. If the X variables are correlated with
unobserved cross-sectional characteristics, valid causal inferences cannot
be obtained.
If we have panel data and we think that there are unobservables that
are time invariant, then we can adopt a “fixed effects” style approach that
uses only variation within unit over time to estimate causal effects. The
only assumption required here is that the unobservables are time invari-
ant. Given that marketing data sets seldom span more than a few years,
this time invariance assumption seems eminently reasonable. It should be
noted that if the time span increases, a host of non-stationarities arise such
as the introduction of new products and entry of competitors. In sum, it
is not clear that we would want to use a long-time series of data without
modeling the evolution of the industry we are studying.
Consider the example of estimating the effect of a Super Bowl ad.
Aggregate time series data may have insufficient variation in exposure
to estimate ad effects. Pure cross-sectional variation confounds regional
preferences for products with true useful variation in ad exposure. Panel
data, on the other hand, might be very useful to isolate Super Bowl ad
effects. Klapper and Hartmann (2017) exploit a short panel of six years of
data across about 50 different DMAs to estimate effects of CPG ads. They
find that there is a great deal of variation from year to year in the same
DMA in Super Bowl viewership. It is hard to believe that preferences for
these products vary from year to year in a way that is correlated with the
popularity of the Super Bowl broadcast. Far more plausible is that this
variation depends on the extent to which the Super Bowl is judged to be
interesting at the DMA level. This could be because a home team is in the
Super Bowl or it could just be due to the national or regional reputation
of the contestants. Klapper and Hartmann estimate linear models with
Brand-DMA fixed effects (intercepts) and find a large and statistically
significant effect of Super Bowl ads by beer and soft drink advertisers.
This is quite an achievement, given the cynicism in the empirical advertis-
ing literature about ability to have sufficient power to measure advertising
effects without experimental variation.
Many, if not most, of the marketing mix models estimated today are
estimated on aggregate or regional time series data. The success of Klapper
Regression Discontinuity
Model Evaluation
$$\ln Q_t = \alpha + \eta \ln P_t + e_t$$
an OLS estimate of the elasticity will be too small, and we might conclude,
erroneously, that the firm should raise its price even if the firm is setting
prices optimally.
Suppose we reserve a portion of our observational data for out-of-
sample validation. That is, we will fit the log–log regression on observa-
tions, 1, 2, . . . T0, reserving observations T0+1, . . ., T for validation.9 If
we were to compare the performance of the inconsistent and biased OLS
estimator of the price elasticity with any valid causal estimate using our
“validation” data, we would conclude that OLS is superior using anything
like the MSE metric. This is because OLS is a projection-based estimator
that seeks to minimize mean squared error. The only reason OLS will
fare poorly in prediction in this sort of exercise is if the OLS model is
highly over-parameterized and the OLS procedure will over-fit the data.
However, the OLS estimator will yield non-profit maximizing prices if
used in a price optimization exercise because it is inconsistent for the true
causal elasticity parameter.
Thus, we must devise a different validation exercise in evaluating causal
estimates. We must either find different policy regimes in our observa-
tional data or we must conduct a validation experiment.
Conclusions
Notes
1. Many marketing mix models are built with advertising expenditure variables not adver-
tising exposure variables. This confounds the problem of procurement of advertising
with the measurement of exposure. Sales response models must have only exposure vari-
ables on the right-hand side.
2. Bass (1969) constructed such a model of the simultaneous determination of sales and
advertising using cigarette data.
3. The proper way to view propensity score analysis is as a particular example of adding
control variables where the control variable is the propensity score.
4. Randomized assignment to treatment typically means randomized treatment in market-
ing applications. That is to say, there is always full compliance – if you are assigned to a
treatment you take it and if you are not assigned to a treatment you do not take it. An
exception might be leakage in Geo experiments – if subjects work in different areas than
they reside, some who are assigned to non-exposure may become exposed. In biostatistics
and economics, there can be an important distinction between assignment and receiving
the treatment which, fortunately, we can largely ignore in marketing applications.
5. Note that the selection bias discussed above can always be expressed as a correlation
between a treatment variable and the error term.
6. See, for example, Rossi (2014) for a more detailed discussion of this point.
7. See Imbens and Rubin, Chapter 13, for more details on propensity scores.
8. See Dube, Hitsch, and Rossi (2011) and Rossi and Allenby (2011) for examples and
further discussion.
9. It does not matter how sophisticated we are in selecting estimation and validation
subsets, any cross-validation style procedure will be subject to the same vulnerabilities
laid out here.
References
Angrist, J. D. (1990), “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from
Social Security Administrative Records,” American Economic Review 80, 313–335.
Angrist, J. D. and A. B. Krueger (1991), “Does Compulsory School Attendance Affect
Schooling and Earnings?” Quarterly Journal of Economics 106, 976–1014.
Angrist, J. D. and J. Pischke (2009), Mostly Harmless Econometrics, Princeton, NJ:
Princeton University Press.
Bass, F. M. (1969), “A Simultaneous Equation Regression Study of Advertising and Sales of
Cigarettes,” Journal of Marketing Research 6, 291–300.
Blake, T., C. Nosko and S. Tadelis (2015), “Consumer Heterogeneity and Paid Search
Effectiveness: A Large Scale Field Experiment,” Econometrica 83, 155–174.
Dube, J. P., G. Hitsch, and P. E. Rossi (2011), “State Dependence and Alternative
Explanations for Consumer Inertia,” Rand Journal of Economics 41, 417–445.
Hartmann, W. and D. Klapper (2017), “Super Bowl Ads,” Marketing Science, forthcoming.
Hartmann, W., H. Nair, and S. Narayanan (2011), “Identifying Causal Marketing Mix
Effects Using a Regression Discontinuity Design,” Marketing Science 30, 1079–1097.
Heckman, J. J. and E. J. Vytlacil (2007), “Econometric Evaluation of Social Programs,
Part I: Causal Models, Structural Models and Econometric Policy Evaluation,” in J. J.
Heckman and E. E. Leamer, eds, Handbook of Econometrics, Amsterdam: Elsevier, 2007,
4779–4874.
Imbens, G. W. and T. Lemieux (2008), “Regression Discontinuity Designs: A Guide to
Practice,” Journal of Econometrics 142, 807–828.
Imbens, G. and D. Rubin (2015), Causal Inference for Statistics, Social and Biomedical
Sciences: An Introduction, New York: Cambridge University Press.
Lewis, R. and J. Rao (2015), “The Unfavorable Economics of Measuring the Returns to
Advertising,” Quarterly Journal of Economics, 130(4), 1941–1973.
Neyman, J. (1923, 1990), “On the Application of Probability Theory to Agricultural
Experiments. Essay on Principles: Section 9,” translated in Statistical Science 5, 465–480.
Rossi, P. (2014), “Even the Rich Can Make Themselves Poor: A Critical Examination of IV
Methods,” Marketing Science 33, 655–672.
Rossi, P. and G. Allenby (2011), “Bayesian Applications in Marketing,” in Geweke et al. eds,
The Oxford Handbook of Bayesian Econometrics, Oxford: Oxford University Press.
Stephens-Davidowitz, S., H. Varian, and M. D. Smith (2015), “Super Returns to Super Bowl
Ads?” working paper, Google Inc.
DISCRETE CHOICE MODELING
Early work in economics suggested that the utility, Uij, that consumer i
could expect to derive from product j was a function of the attributes k
(k= {1, 2, . . ., K}) that the consumer perceived the product to contain,
yijk, multiplied by how important those attributes were to the consumer, bik
(e.g., Lancaster 1966). Assuming separability of attributes and linearity in
attribute levels, this is frequently expressed as:

U_ij = Σ_{k=1}^{K} b_ik y_ijk   (7.1)
Given that consumer i is assumed to always buy the product with the
highest utility, Uij, but we cannot fully observe this, we need a repre-
sentation of how the unobserved utilities of product j (for j = {1, 2, . . .,
J}) relate to the actual choice and, ultimately, to his or her associated
probabilities of choice, Pij. Early attempts to undertake this task adopted
a share of utility model, in which a product’s probability of being chosen
equaled its utility divided by the sum of the utilities of all of the products
that might have been chosen (e.g., Luce 1959). While simple, this approach
has a number of drawbacks. First, the predicted probability of a product
being selected is not invariant to the origin of the utility scale. That is, if a constant is
added to the utility of each product, the predicted probability of each
product being chosen will change. Second, Luce’s axiom, the foundation
on which this formula is predicated, requires that the ratio of the probabilities of
any two different products being selected does not depend on the presence or absence
of other possible products in the available set (Bradley and Terry 1952).
This assumption, known as the independence of irrelevant alternatives (or
IIA), can be problematic in some applications. For example, assume that a
commuter has the option of driving a car or catching a blue bus, and does
so with equal probability of 0.5. If a new bus is added to the commuting
route, identical in all respects to the blue bus (schedule, comfort, price,
etc.), except that it is red, one might assume that the red bus would draw
(almost) exclusively from the blue bus for which it is a perfect substitute
and negligibly from the car, giving probabilities of PCar, PBlue, and PRed of
0.5, 0.25, and 0.25 respectively. However, a share of utility model would
suggest that the red bus would draw proportionately from the blue bus
and the car, and thus lead to probabilities PCar, PBlue, and PRed of
0.33, 0.33, and 0.33 respectively.
In order to adopt a more axiomatic approach to the relationship
between probability of choice and the underlying utilities on which it is
based, econometricians consider possible distributions of the error term in
equation (7.2), eij, and use these to derive the implied probability that the
utility, Uij, would be greater than the utilities of all of the other available
products, {Uij′, j′ = 1, 2, . . ., J, j′ ≠ j}. This approach, the results of which
are described in the next section, led to the basic choice models that are in
common usage today.
In his Figure 1 (reproduced here as Figure 7.1), McFadden (1986)
describes the relationship between physical attributes and consumer
perceptions of them, past choice (behavior), future choice intentions, and
intermediate constructs such as preferences.
[Figure 7.1 (from McFadden 1986) appears here. Its elements include Market
Behavior, Stated Intentions, Behavioral Intentions, Experimental Constraints,
and a “black box” of intermediate constructs linking them.]
One attractive feature of the EV1 assumption is that by assuming that all
of the error terms for consumer i are independent and identically distrib-
uted across alternatives, it is possible to derive a closed-form solution for
the probability that any product j is chosen, as illustrated in equation (7.3).
P_ij = exp(V_ij) / Σ_{j′∈C} exp(V_ij′)   (7.3)
as the inclusive value, IViB, which specifies how individual brands’ utili-
ties affect the utility of a category purchase as a whole (e.g., Louviere,
Hensher and Swait 2000). IViB may be shown to be equal to the expression
in equation (7.5).
IV_iB = ln ( Σ_{j′∈C} exp(V_ij′) )   (7.5)
In marketing, where strategies may be targeted at either increasing
primary demand (category demand) or secondary demand (brand choice,
given category purchase), this distinction is a particularly useful one. As an
example of this in practice in the US ground coffee market, see Guadagni
and Little (1983) for a model of brand choice (conditioned on purchase) and
Guadagni and Little (1998) for the corresponding category purchase model.
The nested logit model is also extremely useful for understanding the
structure of competition implied by consumer switching. See Urban,
Johnson and Hauser (1984) for an example in the freeze-dried coffee
market, using the nested logit model to determine the best representation
of category structure.
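The logit probabilities of equation (7.3) and the inclusive value of equation (7.5) can be computed directly; the utility values below are hypothetical.

```python
import math

def logit_probs(V):
    """Multinomial logit probabilities, equation (7.3):
    P_ij = exp(V_ij) / sum over j' in C of exp(V_ij')."""
    expV = [math.exp(v) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

def inclusive_value(V):
    """Inclusive value of a nest, equation (7.5): ln(sum of exp(V_ij'))."""
    return math.log(sum(math.exp(v) for v in V))

V = [1.0, 0.5, -0.2]        # hypothetical deterministic utilities
P = logit_probs(V)          # choice probabilities, summing to 1
IV = inclusive_value(V)     # category-level "expected maximum utility"

# Unlike the share-of-utility model, logit probabilities are unchanged
# when the same constant is added to every utility:
P_shifted = logit_probs([v + 5.0 for v in V])
```

The last line illustrates why the logit form escapes the origin-dependence criticism leveled at the share-of-utility model earlier in the chapter.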
P_i1 = Pr(U_i1 > U_i2) = Pr(V_i1 − V_i2 > e_i2 − e_i1) = Φ(V_i1 − V_i2)   (7.6)

where Φ(·) is the standard normal cumulative distribution function.
Decomposing utility
While it is useful to understand an individual’s choice response as a func-
tion of utility, it is far more diagnostic to the manager to decompose that
utility into more actionable measures, such as the product’s attributes
or its price. Substituting equation (7.1) into the probability of choice
(equation (7.3) or equation (7.6), for example) provides a mechanism
by which product attributes or consumers’ perceptions of them may be
related to choice. Price can be treated as an attribute for the purpose of
studying price elasticities. Frequently, more sophisticated response curves
are required than those represented by such a simple substitution. For
example, behavioral economics has suggested that price response may
not be symmetric around some reference price around which a consumer
anchors his or her judgment. In this regard, Lattin and Bucklin (1989)
demonstrate that explanatory power is increased by allowing price elasticities
for price increases to be greater than those for price decreases.
Product attributes may be incorporated either as objectively measured
features (such as brand name, size, or claimed fuel economy) or subjec-
tively measured perceptions. Perceptions may be elicited using surveys of
consumers and relating those perceptions to past reported behavior or to
future behavioral intentions. For example, Danaher et al. (2011) relate the
intended probability of choosing an airline to its perceived performance,
reputation and price which they in turn relate to perceptions of 29 sub-
attributes, allowing management to focus on those perceptions with high
importance weights and performance deficits that can be cost-effectively
addressed. Guadagni and Little (1983) include the objectively measured
variables of brand and pack size. The role of objectively measured
attributes in driving choice can be calibrated not only from studies of
consumers’ past choices, by determining how those choices vary as a
function of the products’ constituent attributes, but also by eliciting
consumers’ intentions toward hypothetical products using choice-based
conjoint analysis (see, for example, Rao 2014), as described below.
Other management decision variables may be incorporated into discrete
choice models, though often in a way that is somewhat arbitrary and a
matter of convenience. Inserting such explanatory variables may often
The multinomial logit in equation (7.3) and probit in equation (7.6) have
been generalized to cover a number of behavioral situations and to accom-
modate panel data where repeated choice occasions are available for each
individual. Fiebig et al. (2010) provide a comprehensive review of the situ-
ations in which extensions to the logit model may be useful. By combining
equations (7.1) and (7.2), they show how the vector of importance weights,
b = {b_k, k = 1, . . ., K}, and the properties of the error term, {e_ij}, can be generalized
to allow a relaxation of the IIA assumption to generate the Generalized
Multinomial Logit model. We write:
G-MNL: b_i = σ_i b + γ η_i + (1 − γ) σ_i η_i

Setting γ = 1 yields G-MNL-I: b_i = σ_i b + η_i; setting γ = 0 yields
G-MNL-II: b_i = σ_i (b + η_i). Imposing var(η_i) = 0 on either form reduces
the model to the scale-heterogeneity logit, S-MNL: b_i = σ_i b, while imposing
σ_i = σ = 1 reduces it to the mixed logit, MIXL: b_i = b + η_i.
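The G-MNL nesting can be simulated. The lognormal form for σ_i and the normal form for η_i, as well as all parameter values, are illustrative assumptions, not prescriptions from the text.

```python
import math
import random

def draw_gmnl_beta(beta, gamma, sigma_eta, tau, rng):
    """One draw of an individual-level coefficient under G-MNL:
    beta_i = sigma_i*beta + gamma*eta_i + (1 - gamma)*sigma_i*eta_i.
    sigma_i is drawn lognormal (scale heterogeneity, spread tau);
    eta_i is drawn normal (taste heterogeneity, spread sigma_eta)."""
    sigma_i = math.exp(rng.gauss(0.0, tau))
    eta_i = rng.gauss(0.0, sigma_eta)
    return sigma_i * beta + gamma * eta_i + (1.0 - gamma) * sigma_i * eta_i

rng = random.Random(0)
# gamma = 1 recovers G-MNL-I; gamma = 0 recovers G-MNL-II;
# sigma_eta = 0 collapses either to S-MNL; tau = 0 (so sigma_i = 1) to MIXL.
draws = [draw_gmnl_beta(beta=1.0, gamma=0.0, sigma_eta=0.5, tau=0.3, rng=rng)
         for _ in range(10000)]
```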
Tobit Model
In many situations observations are not available on all levels of the inde-
pendent variables that form the predictors of utility in equation (7.1). For
example, a supermarket may have a policy of not pricing milk under $1 a
pint. Purchases of milk at prices below $1 are never observed and thus con-
sumer responses are censored above the resultant utility stemming from
that price. Ignoring this censoring results in biased estimates, and so
James Tobin (1958) developed the Tobit model to account for the missing
data. Chandrashekaran and Sinha (1995) provide a nice example in mar-
keting when studying trial-and-repeat. Repeat purchase is predicated on
initial trial and so repeat is not observed for all consumers, in particular
not for those for whom trial never occurs.
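The standard Tobit handles a censored outcome variable. A minimal sketch of its log-likelihood, assuming censoring below at a threshold c, is given below; the functional form is the textbook one, but any data and parameter values used with it here are hypothetical.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(params, x, y, c=0.0):
    """Log-likelihood of a Tobit model censored below at c:
    y* = a + b*x + e, e ~ N(0, s^2), and we observe y = max(y*, c).
    Uncensored observations contribute the normal density; censored
    observations contribute the probability mass at or below c."""
    a, b, s = params
    ll = 0.0
    for xi, yi in zip(x, y):
        mu = a + b * xi
        if yi > c:
            ll += math.log(normal_pdf((yi - mu) / s) / s)
        else:
            ll += math.log(normal_cdf((c - mu) / s))
    return ll
```

In practice the parameters (a, b, s) would be estimated by maximizing this function numerically.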
Estimation
with the error structure. For example, lagged price may provide a good
instrument for price. Wooldridge (2010, Chapter 6.3) provides an excellent
description of the Durbin-Wu-Hausman test to probe the degree of threat
posed by endogeneity.
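The regression-based version of this test can be illustrated in its control-function form on simulated data. The data-generating process, variable names, and numbers are all hypothetical, and the sketch inspects coefficients rather than computing a formal test statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)        # instrument: shifts price, excluded from demand
u = rng.normal(size=n)        # unobserved demand shock -> price is endogenous
price = 1.0 + z + u + rng.normal(scale=0.5, size=n)
sales = 2.0 - 1.5 * price + u + rng.normal(scale=0.5, size=n)

# Naive OLS of sales on price is biased toward zero by the common shock u.
Xn = np.column_stack([np.ones(n), price])
b_naive = np.linalg.lstsq(Xn, sales, rcond=None)[0][1]

# Step 1: first-stage regression of the suspect regressor on the instrument.
X1 = np.column_stack([np.ones(n), z])
v = price - X1 @ np.linalg.lstsq(X1, price, rcond=None)[0]  # first-stage residual

# Step 2: include the residual in the outcome equation. A clearly nonzero
# coefficient on v signals endogeneity (the control-function reading of the
# Durbin-Wu-Hausman idea); with v included, the price coefficient is consistent.
X2 = np.column_stack([np.ones(n), price, v])
_, b_price, b_v = np.linalg.lstsq(X2, sales, rcond=None)[0]
```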
Table 7.1 Leveraging the consumer decision process. Examples: consideration,
self-explicated (Roberts and Lattin 1991), from scanner data (Siddarth, Bucklin
and Morrison 1995; Andrews and Srinivasan 1995), and from psychology
(Hutchinson, Raman and Mantrala 1994); utility thresholds and cost-benefit
models (Hauser and Wernerfelt 1990); choice archetypes (Swait, Popal and Wang
2016); affect and emotions in choice (Roberts et al. 2015); non-compensatory
two-stage choice models (Gilbride and Allenby 2004).

Table 7.2 Heterogeneity and segmentation. Examples: latent segments in choice
models (Kamakura and Russell 1989); discrete versus continuous representations
of heterogeneity (Andrews, Ainslie and Currim 2002); probit models of
heterogeneity (Allenby and Rossi 1998; Chintagunta and Honore 1996); primary
versus secondary demand (Arora, Allenby and Ginter 1998); segment-level
targeting (Kamakura, Kim and Lee 1996).

Table 7.3 Dynamics and market evolution. Examples: variety seeking (Lattin
and McAlister 1985); variety seeking and inertia jointly (Seetharaman and
Chintagunta 1998); loyalty and heterogeneity (Ailawadi, Gedenk and Neslin
1999); consumer learning, survey based (Roberts and Urban 1998) and scanner
based (Erdem and Keane 1996); trial-and-repeat split hazard models
(Chandrashekaran and Sinha 1995).

Table 7.4a Product design and consumer response. Examples: perceptions versus
objective measures (Adamowicz et al. 1997); rating-based and choice-based
conjoint analysis (Moore 2004); adaptive choice-based conjoint (Toubia, Hauser
and Simester 2004); quality reference effects (Hardie, Johnson and Fader 1993);
menu planning (Liechty, Ramaswamy and Cohen 2001).

Table 7.4b Marketing mix response modeling. Examples: brand choice models
(Guadagni and Little 1983); category purchase models (Guadagni and Little
1998); primary and secondary demand (Gupta 1988); reference points for price
and promotion (Lattin and Bucklin 1989); empirical generalizations about
reference points (Kalyanaram and Winer 1995).

Table 7.5 Competitive analysis and strategy. Examples: market structure (Urban,
Johnson and Hauser 1984); customer acquisition and retention (Rust, Lemon and
Zeithaml 2004); portfolio models of choice (Ben-Akiva et al. 2002); prelaunch
defense (Roberts, Nelson and Morrison 2005); growth and defense research
agenda (Hauser, Tellis and Griffin 2006).
physical) cost of its consideration. See also Hauser and Wernerfelt (1990)
and Roberts and Lattin (1991) for the development and testing of similar
models.
For example, Roberts and Lattin (1991, equation 6) demonstrate that
a utility-maximizing consumer should include product j in his or her
consideration set if its utility, uj, passes the following threshold:
Consumers’ evaluation and choice processes are generally not static. First,
they may vary cyclically, depending on the purchase context, and, second,
they may evolve systematically over time. Choice models have been
adapted to represent both of these marketplace phenomena.
Models in which choice in one period of time is dependent on choice
in the previous period are said to exhibit state dependence. Behaviorally,
choice of an alternative in time period t + 1 that is higher than its long-
term average when that alternative was chosen in time t may be driven by
inertia or habit (e.g., Seetharaman and Chintagunta 1998). Conversely,
if a purchase in time t reduces a product’s probability of purchase on the
next occasion, that consumer is said to be exhibiting variety-seeking (e.g.,
Lattin and McAlister 1985). Kahn (1995) provides a nice classification of
the different types of variety-seeking that we might observe. Seetharaman
and Chintagunta (1998) warn of the dangers of only including one of these
phenomena in choice models when both may be present. They demon-
strate that a failure to account for inertia may lead to a false conclusion
that variety seeking is occurring in the marketplace. By the same token,
they note that a failure to adequately account for consumer heterogeneity
applied for decisions of when purchase will take place (in continuous time),
as well as what will be purchased (e.g., Chintagunta 1993). These dynamic
models may be applied to different decision stages of the diffusion process.
For example, Chandrashekaran and Sinha (1995) look at the determinants
of consumer trial and repeat using different dynamic hazard rate models.
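The inertia versus variety-seeking distinction can be illustrated by simulating choices whose utility depends on the previous period's choice. The logistic form and all parameter values here are illustrative, not taken from the models cited above.

```python
import math
import random

def simulate_choices(v, lam, T, rng):
    """Simulate T binary brand-A choices where last period's choice shifts
    current utility by lam: P(buy A) = logistic(v + lam * prev_choice).
    lam > 0 produces inertia (positive state dependence); lam < 0 produces
    variety seeking."""
    prev, seq = 0, []
    for _ in range(T):
        p = 1.0 / (1.0 + math.exp(-(v + lam * prev)))
        prev = 1 if rng.random() < p else 0
        seq.append(prev)
    return seq

rng = random.Random(42)
inertial = simulate_choices(v=0.0, lam=2.0, T=5000, rng=rng)
# With lam = 2 the repeat probability after buying A is logistic(2), about
# 0.88, well above the unconditional logistic(0) = 0.5 for a first purchase.
```

A mirror-image run with lam < 0 would show purchase in t depressing purchase in t + 1, the variety-seeking pattern.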
Perhaps surprisingly, discrete choice models have had less impact at the
strategic level than the operational one. There are exceptions. In the early
days of choice modeling in marketing, Urban, Johnson and Hauser (1984)
used nested choice models to understand the market structure in the
US coffee market. Market structure analysis provides valuable strategic
insight in terms of both competitive analysis and portfolio planning.
More recently, Roberts, Nelson and Morrison (2005) developed a
dynamic brand choice model for market defense. The problems facing a
defendant are different to those facing a new entrant and so a dynamic
model to calibrate the speed and degree of the evolution of the market
is required to allow the incumbent to slow the rate of share loss and
Challenges
The rise of the internet channel has meant that the purchase process is
considerably more multichannel than it was previously. Whereas it used to
be sufficient to understand the effect of the marketing mix on the final pur-
chase decision, increasingly marketers are being asked to identify the effect
of different touch points on the final decision outcome. The ability of the
consumer to engage with the marketer at any place and at any time, and
for the marketer to engage with the consumer at any place and at any time,
means that the consumer experience corridor is attracting considerably
more attention in both academia and industry with touchpoint attribution
models being adopted by many multichannel organizations (Chittilappilly
et al. 2013). A special issue of the Journal of Retailing edited by Verhoef,
Kannan and Inman (2015) explored the challenges posed by consumers’
channel switching, both in terms of changing competitive infringement (as
witnessed by the effect of the ecommerce model of Amazon on the bricks-
and-mortar business of Borders) and channel coordination and multiple
Summary
This chapter has outlined basic choice models and shown how they can
be generalized to handle a more complex set of phenomena. The survey
focused on the application of choice models: the management decisions to
which an understanding of customers might lead. It examined the market-
ing environment for trends, and suggested challenges that will face choice
modelers as consumers become more connected with each other, more
mobile while still in touch, and more fragmented in terms of channels for
information and products and services. These trends apply most directly to
business-to-consumer marketing, but they are applicable to business-to-business
marketing as well (e.g., Bolton, Lemon and Verhoef 2008).
References
Adamowicz, Wiktor, Joffre Swait, Peter Boxall, Jordan Louviere and Michael Williams (1997)
“Perceptions versus objective measures of environmental quality in combined revealed and
stated preference models of environmental valuation.” Journal of Environmental Economics
and Management 32, no. 1: 65–84.
Allenby, Greg M. and Peter E. Rossi (1998) “Marketing models of consumer heterogene-
ity.” Journal of Econometrics 89, no. 1: 57–78.
Andrews, Rick L., Andrew Ainslie and Imran S. Currim (2002) “An Empirical Comparison of
Logit Choice Models with Discrete Versus Continuous Representations of Heterogeneity.”
Journal of Marketing Research 39, no. 4: 479–487.
Andrews, Rick L. and Imran S. Currim (2003) “A comparison of segment retention criteria
for finite mixture logit models.” Journal of Marketing Research 40, no. 2: 235–243.
Andrews, Rick L. and T. C. Srinivasan (1995) “Studying consideration effects in empirical
choice models using scanner panel data.” Journal of Marketing Research 32, no. 1: 30–41.
Arora, Neeraj, Greg M. Allenby and James L. Ginter (1998) “A hierarchical Bayes model of
primary and secondary demand.” Marketing Science 17, no. 1: 29–44.
Arora, Neeraj, Xavier Dreze, Anindya Ghose, James D. Hess, Raghuram Iyengar, Bing Jing,
Yogesh Joshi, V. Kumar, N. Lurie, Scott Neslin and S. Sajeesh (2008) “Putting one-to-one
marketing to work: Personalization, customization, and choice.” Marketing Letters 19,
no. 3–4: 305–321.
Bass, F. M. (1969) “A new product growth for model consumer durables.” Management
Science 15, no. 5: 215–227.
Ben-Akiva, Moshe and Bruno Boccara (1995) “Discrete choice models with latent choice
sets.” International Journal of Research in Marketing 12, no. 1: 9–24.
Ben-Akiva, Moshe E. and Steven R. Lerman (1985) Discrete choice analysis: theory and
application to travel demand. Vol. 9. Cambridge, MA: MIT Press.
Ben-Akiva, Moshe, Daniel McFadden, Kenneth Train, Joan Walker, Chandra Bhat,
Michel Bierlaire and Denis Bolduc (2002) “Hybrid choice models: progress and chal-
lenges.” Marketing Letters 13, no. 3: 163–175.
Berger, Jonah and Katherine L. Milkman (2012) “What makes online content viral?” Journal
of Marketing Research 49, no. 2: 192–205.
Berger, Paul D., Ruth N. Bolton, Douglas Bowman, Elten Briggs, V. Kumar, A. Parasuraman
and Creed Terry (2002) “Marketing Actions and the Value of Customer Assets a
Framework for Customer Asset Management.” Journal of Service Research 5, no. 1: 39–54.
Blattberg, Robert C. and John Deighton (1991) “Interactive marketing: Exploiting the age of
addressability.” Sloan Management Review 33, no. 1: 5.
Bolton, Ruth N., Katherine N. Lemon and Peter C. Verhoef (2008) “Expanding business-
to-business customer relationships: Modeling the customer’s upgrade decision.” Journal of
Marketing 72, no. 1: 46–64.
Bradley, Ralph Allan and Milton E. Terry (1952) “Rank analysis of incomplete block
designs: I. The method of paired comparisons.” Biometrika 39, no. 3/4: 324–345.
Carson, Richard T. and Jordan J. Louviere (2011) “A common nomenclature for stated pref-
erence elicitation approaches.” Environmental and Resource Economics 49, no. 4: 539–559.
Carson, Richard T. and Jordan J. Louviere (2014) “Statistical properties of consideration
sets,” Journal of Choice Modelling 13: 37–48.
Chandrashekaran, Murali and Rajiv K. Sinha (1995) “Isolating the determinants of innova-
tiveness: A split-population tobit (SPOT) duration model of timing and volume of first and
repeat purchase.” Journal of Marketing Research 32, no. 4: 444–456.
Chintagunta, Pradeep K. (1993) “Investigating purchase incidence, brand choice and pur-
chase quantity decisions of households.” Marketing Science 12, no. 2: 184–208.
Chintagunta, Pradeep K. and Bo E. Honore (1996) “Investigating the effects of marketing
variables and unobserved heterogeneity in a multinomial probit model.” International
Journal of Research in Marketing 13, no. 1: 1–15.
Chittilappilly, Anto, Madan Bharadwaj, Payman Sadegh and Darius Jose. “Method, com-
puter readable medium and system for determining weights for attributes and attribute
values for a plurality of touchpoint encounters.” US Patent Application 13/789,453, filed
March 7, 2013.
Danaher, Peter J., John H. Roberts, Alan Simpson and Ken Roberts (2011) “Practice Prize
Paper-Applying a Dynamic Model of Consumer Choice to Guide Brand Development at
Jetstar Airways.” Marketing Science 30, no. 4: 586–594.
Day, George (1994) “The capabilities of market-driven organizations.” Journal of
Marketing 58, no. 4: 37–52.
Elshiewy, O., G. Zenetti, and Y. Boztug (2017) “Differences between classical and Bayesian
estimates for mixed logit models: a replication study.” Journal of Applied Econometrics 32,
no. 2: 470–476.
Erdem, Tülin and Michael P. Keane (1996) “Decision-making under uncertainty: Capturing
dynamic brand choice processes in turbulent consumer goods markets.” Marketing Science
15, no. 1: 1–20.
Fiebig, Denzil G., Michael P. Keane, Jordan Louviere, and Nada Wasi (2010) “The
generalized multinomial logit model: accounting for scale and coefficient heterogene-
ity.” Marketing Science 29, no. 3: 393–421.
Fishbein, Martin (1967) “Attitude and Prediction of Behavior,” in Martin Fishbein (ed.),
Readings in Attitude Theory and Measurement. New York: Wiley, 477–492.
Fishbein, Martin (1976) “Extending the extended model: Some comments,” in B.B. Anderson
(ed.), Advances in Consumer Research, Vol. 3. Chicago: Association for Consumer
Research, 491–497.
Gensch, Dennis H. (1987) “A two-stage disaggregate attribute choice model,” Marketing
Science 6, no. 3: 223–239.
Gilbride, Timothy J. and Greg M. Allenby (2004) “A choice model with conjunctive, disjunc-
tive, and compensatory screening rules.” Marketing Science 23, no. 3: 391–406.
Guadagni, Peter M., and John D. C. Little (1983) “A logit model of brand choice calibrated
on scanner data.” Marketing Science 2, no. 3: 203–238.
Guadagni, Peter M. and John D. C. Little (1998) “When and what to buy: a nested logit model
of coffee purchase.” Journal of Forecasting 17, no. 3–4: 303–326.
Gupta, Sunil (1988) “Impact of sales promotions on when, what, and how much to
buy.” Journal of Marketing Research 25, no. 4: 342–355.
Hardie, Bruce G. S., Eric J. Johnson and Peter S. Fader (1993) “Modeling loss aversion and
reference dependence effects on brand choice.” Marketing Science 12, no. 4: 378–394.
Hauser, John R. and Birger Wernerfelt (1990) “An evaluation cost model of consideration
sets.” Journal of Consumer Research 16, no. 4: 393–408.
Hauser, John, Gerard J. Tellis, and Abbie Griffin (2006) “Research on innovation: A review
and agenda for marketing science.” Marketing Science 25, no. 6: 687–717.
Hausman, Jerry and Daniel McFadden (1984) “Specification tests for the multinomial logit
model.” Econometrica 52, no. 5: 1219–1240.
Hensher, David A., John M. Rose and William H. Greene (2005) Applied choice analysis: a
primer. New York: Cambridge University Press.
Humby, Clive, Terry Hunt and Tim Phillips (2004) Scoring points: How Tesco is winning
customer loyalty. London: Kogan Page.
Hutchinson, J. Wesley, Kalyan Raman and Murali K. Mantrala (1994) “Finding choice
alternatives in memory: Probability models of brand name recall.” Journal of Marketing
Research 31, no. 4: 441–461.
Jain, Dipak C. and Naufel J. Vilcassim (1991) “Investigating household purchase timing
decisions: A conditional hazard function approach.” Marketing Science 10, no. 1: 1–23.
Kahn, Barbara E. (1995) “Consumer variety-seeking among goods and services: An integra-
tive review.” Journal of Retailing and Consumer Services 2, no. 3: 139–148.
Kalyanaram, Gurumurthy and Russell S. Winer (1995) “Empirical generalizations from ref-
erence price research.” Marketing Science 14, no. 3 supplement: G161-G169.
Kamakura, Wagner A., Byung-Do Kim and Jonathan Lee (1996) “Modeling preference and
structural heterogeneity in consumer choice.” Marketing Science 15, no. 2: 152–172.
Kamakura, Wagner A. and Gary Russell (1989) “A probabilistic choice model for market
segmentation and elasticity structure.” Journal of Marketing Research 26: 379–390.
Keane, Michael P. and Nada Wasi (2013) “Comparing alternative models of heterogeneity in
Consumer choice behavior.” Journal of Applied Econometrics 28: 1018–1045.
Lancaster, Kelvin (1966) “A new approach to consumer theory.” Journal of Political
Economy 74, no. 2: 132–157.
Lattin, James M. and Randolph E. Bucklin (1989) “Reference effects of price and promotion
on brand choice behavior.” Journal of Marketing Research 26, no. 3: 299–310.
Lattin, James M. and Leigh McAlister (1985) “Using a variety-seeking model to identify
substitute and complementary relationships among competing products.” Journal of
Marketing Research 23, no. 4: 330–339.
Liechty, John, Venkatram Ramaswamy and Steven H. Cohen (2001) “Choice Menus for
Mass Customization: An Experimental Approach for Analyzing Customer Demand with
an Application to a Web-Based Information Service.” Journal of Marketing Research 38,
no. 2: 183–196.
Linden, Greg, Brent Smith and Jeremy York (2003) “Amazon.com recommendations: Item-
to-item collaborative filtering.” Internet Computing, IEEE 7, no. 1: 76–80.
Louviere, Jordan J., David A. Hensher and Joffre D. Swait (2000) Stated choice methods:
analysis and applications. New York: Cambridge University Press.
Louviere, Jordan J., Terry N., Flynn and Anthony A. J. Marley (2015) Best worst scaling
theory, methods and applications. New York: Cambridge University Press.
Louviere, Jordan J. and George Woodworth (1983) “Design and analysis of simulated con-
sumer choice or allocation experiments: an approach based on aggregate data.” Journal of
Marketing Research 20, no. 4: 350–367.
Luce, R. D. (1959) Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.
McFadden, Daniel (1974) “Conditional logit analysis of qualitative choice behavior,” in Paul
Zarembka (ed.), Frontiers in Econometrics, New York: Wiley, 105–142.
McFadden, Daniel (1986) “The choice theory approach to market research.” Marketing
Science 5, no. 4: 275–297.
Moore, William L. (2004) “A cross-validity comparison of rating-based and choice-based
Basic Bayes
p(θ|D) = p(D, θ) / p(D) = p(D|θ) p(θ) / p(D)

or

p(θ|D) ∝ p(D|θ) p(θ)

where p(D|θ) is the likelihood of the data and p(θ) is the prior distribution.
The denominator, p(D), is left out of the latter expression because θ is
the variable of interest, and inference is unaffected by its value, up to a
constant of proportionality. The expression above is the only theorem that
guides Bayesian analysis. Modern Bayesian computing methods use some
type of simulation method for generating draws of θ from its posterior
distribution, p(θ|D), which summarizes all information from the prior
and the data.
The challenge in conducting Bayesian analysis is in summarizing the
information contained in the posterior distribution. The dimension of
the posterior distribution can be very large in marketing applications,
especially in models that account for heterogeneous response among
the units of analysis, such as key accounts and respondents. A conjoint
analysis involving 500 respondents and 10 partworths leads to a posterior
distribution of 5,000 parameter values, not including parameters from the
prior distribution. Modern Bayesian methods summarize the posterior
distribution via simulation, in particular Markov chain Monte Carlo
(MCMC) methods, which are particularly well suited to the analysis of
hierarchical models.
The advantage of simulation methods is that they facilitate investigation
of particular respondents and cross-sectional units, p(θ_i|D), as well as
functions of interest of these parameters, i.e., p(h(θ)|D).
This ability contrasts with sampling theory methods that are content
with reporting point estimates and standard errors to summarize
information from the data. We believe these summary measures are
somewhat irrelevant because they are based on properties of hypotheti-
cal distributions, not on the observed data (D) . Moreover, the normal
approximation often used when interpreting standard errors can be very
misleading when working with marketing data because of data sparseness.
Prediction from a Bayesian point of view can be thought of in a way that
is similar to inference, where the predictive data (D̃) are unobservable and
one should compute the posterior distribution of the unobservable given
the observed data, p(D̃|D). The Bayesian solution is to obtain the predictive
distribution for a model by integrating over the posterior distribution of
the model parameters:

p(D̃|D) = ∫ p(D̃|θ) p(θ|D) dθ

where we assume that the predictive values of the data are conditionally
independent of the past values given the model and its parameters θ. The
expression above is a reminder that Bayesian analysis employs the entire
posterior distribution in making inferences and predictions, and avoids
plug-in approximations (i.e., p(D̃|θ̂)) because they do not fully reflect the
uncertainty about unobservable, latent quantities such as parameters.
Finally, optimal decisions associated with a Bayesian analysis employ
the concept of a loss function, L(a, θ), which is a function of an action
(a) and an unobserved parameter or state of nature (θ). The Bayesian
approach is to choose the action a so that the posterior expectation of
loss is minimized:

min_a E[L(a, θ)|D] = min_a ∫ L(a, θ) p(θ|D) dθ

If the loss is squared error, L(a, θ) = (a − θ)², then it can be shown that
the optimal point estimate for θ is the mean of the posterior distribution
p(θ|D) (see Zellner, 1971).
Marketing problems employ a wide variety of loss functions beyond squared error. These include functions reflecting the desire to maximize profits, consumer utility, and intermediate constructs such as brand recall, recognition and consideration. Bayesian analysis
provides a flexible tool for addressing a wide range of decisions in
marketing.
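The decision-theoretic recipe above can be checked numerically: minimizing posterior expected loss over a grid of candidate actions recovers the posterior mean under squared-error loss. The sketch below assumes a hypothetical Beta(12, 38) posterior for a response rate; any set of posterior draws could be substituted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical posterior draws of theta (a Beta posterior for a response
# rate); the shape parameters are illustrative assumptions.
theta_draws = rng.beta(12, 38, size=100_000)

def expected_loss(a, draws):
    """Posterior expectation of squared-error loss for action a."""
    return np.mean((a - draws) ** 2)

# Scan a grid of candidate actions and pick the minimizer.
grid = np.linspace(0, 1, 1001)
losses = [expected_loss(a, theta_draws) for a in grid]
a_star = grid[int(np.argmin(losses))]

# Under squared-error loss the optimal action is the posterior mean.
print(a_star, theta_draws.mean())
```

The same machinery applies unchanged to asymmetric or profit-based loss functions: only `expected_loss` changes, not the posterior draws.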
To illustrate these concepts, consider a simple example involving a binary outcome variable from a binomial distribution. The binomial distribution is often used in the analysis of marketing data when respondents either respond or not, such as when they click on a website or purchase a product. The outcome variable can take on two values: zero, implying failure or no action, and one, implying success or purchase. The likelihood for the data can be expressed as:

y_t ~ Bin(θ)

p(y | θ) = ∏_{t=1}^{T} θ^{y_t} (1 − θ)^{1−y_t} = θ^n (1 − θ)^{T−n}

where n = Σ_t y_t is the number of successes in T trials. The parameter θ has support on the unit interval (0,1), and a conjugate beta prior, p(θ) ∝ θ^{a−1} (1 − θ)^{b−1}, is assumed. The posterior is obtained by multiplying the likelihood by the prior:

p(θ | y) ∝ p(y | θ) p(θ)
         = [θ^n (1 − θ)^{T−n}] × [θ^{a−1} (1 − θ)^{b−1}]
         = θ^{n+a−1} (1 − θ)^{T−n+b−1}
         ~ Beta(n + a, T − n + b)

The predictive probability of a future success y_f = 1 is obtained by integrating over the posterior:

p(y_f = 1 | y) = ∫ p(y_f = 1 | θ) p(θ | y) dθ = ∫ θ p(θ | y) dθ = E[θ | y]
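The conjugate update above can be sketched in a few lines; the simulated click data and the uniform Beta(1, 1) prior are illustrative assumptions.

```python
import random

random.seed(7)

# Hypothetical click data: T Bernoulli trials with an assumed true rate.
T, true_theta = 200, 0.30
y = [1 if random.random() < true_theta else 0 for _ in range(T)]
n = sum(y)

# Beta(a, b) prior; the posterior is Beta(n + a, T - n + b).
a, b = 1.0, 1.0
post_a, post_b = n + a, T - n + b

# The posterior mean equals the predictive probability of a future success.
predictive = post_a / (post_a + post_b)
print(f"successes: {n}/{T}, P(y_f = 1 | y) = {predictive:.3f}")
```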
Bayesian Computation
Consider the linear regression model y = Xβ + ε, ε ~ N(0, σ²I), with prior

p(θ) = p(β | σ²) p(σ²)

where the conditional prior for β is assumed Normal(β̄, σ²A⁻¹) and the marginal prior for σ² is assumed to be inverted gamma:

p(β | σ², β̄, A) ∝ (σ²)^{−k/2} exp[ −(β − β̄)′A(β − β̄) / (2σ²) ]

p(σ² | ν₀, s₀²) ∝ (σ²)^{−(ν₀/2 + 1)} exp[ −ν₀ s₀² / (2σ²) ]

The posterior distribution for the model is proportional to the product of the likelihood and the prior:

p(β, σ² | y) ∝ (σ²)^{−n/2} exp[ −(y − Xβ)′(y − Xβ) / (2σ²) ]
             × (σ²)^{−k/2} exp[ −(β − β̄)′A(β − β̄) / (2σ²) ] × p(σ²)
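One standard way to simulate from this posterior is a Gibbs sampler that alternates between the full conditionals β | σ², y (normal) and σ² | β, y (scaled inverse chi-square). The sketch below uses simulated data; the prior settings (bbar, A, nu0, s20) are illustrative assumptions, not the chapter's.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated regression data (illustrative): y = X beta + e, e ~ N(0, sigma2).
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sig2_true = np.array([1.0, -2.0]), 0.25
y = X @ beta_true + rng.normal(0, np.sqrt(sig2_true), size=n)

# Prior: beta | sigma2 ~ N(bbar, sigma2 * inv(A)); sigma2 inverted gamma.
bbar, A = np.zeros(k), 0.01 * np.eye(k)
nu0, s20 = 3.0, 1.0

B = np.linalg.inv(X.T @ X + A)              # reused in every beta draw
btilde = B @ (X.T @ y + A @ bbar)           # conditional posterior mean
sig2, keep = 1.0, []
for r in range(2000):
    # beta | sigma2, y ~ N(btilde, sigma2 * B)
    beta = rng.multivariate_normal(btilde, sig2 * B)
    # sigma2 | beta, y: scaled inverse chi-square with nu0 + n + k d.f.
    resid = y - X @ beta
    scale = nu0 * s20 + resid @ resid + (beta - bbar) @ A @ (beta - bbar)
    sig2 = scale / rng.chisquare(nu0 + n + k)
    if r >= 500:
        keep.append(np.append(beta, sig2))

post = np.mean(keep, axis=0)
print(post)                                  # posterior means of (beta, sigma2)
```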
The random-walk Metropolis–Hastings (MH) algorithm generates draws from the posterior distribution as follows:

1. Generate a candidate value of a parameter θ_new using the old value plus a symmetric disturbance: θ_new = θ_old + N(0, τ²), where τ² is specified by the analyst so that 30–50 percent of the candidates are accepted.
2. Compute the acceptance probability α = min{ 1, p(θ_new | D) / p(θ_old | D) }.
3. Accept the new draw of θ with probability α: draw a Uniform(0,1) random variable U and, if U < α, accept the draw of θ. Otherwise, retain the old value of θ and proceed to the next draw in the recursion.
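The three steps can be sketched for the binomial example, where the target posterior is a known beta distribution so the output can be checked; the counts (12 successes in 50 trials under a uniform prior) are illustrative.

```python
import math
import random

random.seed(11)

# Target: Beta(13, 39) posterior, known only up to a constant here.
def log_post(theta):
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf              # outside the support
    return 12 * math.log(theta) + 38 * math.log(1.0 - theta)

tau = 0.1                             # random-walk step size
theta, draws, accepted = 0.5, [], 0
for r in range(20_000):
    cand = theta + random.gauss(0.0, tau)                          # step 1
    alpha = math.exp(min(0.0, log_post(cand) - log_post(theta)))   # step 2
    if random.random() < alpha:                                    # step 3
        theta, accepted = cand, accepted + 1
    draws.append(theta)

burned = draws[2_000:]                # discard burn-in draws
post_mean = sum(burned) / len(burned)
print(f"acceptance rate: {accepted / 20_000:.2f}")
print(f"posterior mean:  {post_mean:.3f} (exact: {13 / 52:.3f})")
```

Working on the log scale avoids numerical underflow, and the normalizing constant of the posterior cancels in step 2, exactly as the text notes below.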
The algorithm just described is a Markov chain Monte Carlo method; its properties rest on the stationary, long-run distributions of Markov chains and the property of time reversibility. A Markov chain is a stochastic process that describes the evolution of random variables by specifying transition probabilities of moving from one realization to the next. The simplest Markov chain contains just two states, or values, that a variable can assume, and has a matrix of transition probabilities:

P = [ p₁₁  p₁₂ ]
    [ p₂₁  p₂₂ ]

where p_ij is the probability of moving from state i to state j, and the probabilities in each row sum to one, e.g., p₁₁ + p₁₂ = 1.0. The transition probability p_ii is the probability of staying in state i. If the probability of being in each of the two states is initially π₀ = (0.7, 0.3), then the state probabilities after one iteration of the Markov chain are:

π₁ = π₀ P = [0.7  0.3] [ p₁₁  p₁₂ ] = [ 0.7p₁₁ + 0.3p₂₁   0.7p₁₂ + 0.3p₂₂ ]
                       [ p₂₁  p₂₂ ]

For example, with transition probabilities p₁₁ = p₁₂ = 0.50, p₂₁ = 0.25 and p₂₂ = 0.75:

π₁ = π₀ P = [0.7  0.3] [ 0.50  0.50 ] = [ 0.425  0.575 ]
                       [ 0.25  0.75 ]
and we can see that the probability of being in the second state increases from 0.30 to 0.575. As the chain continues to iterate, the state probabilities will converge to long-run or steady-state probabilities:

π₁ = π₀ P,  π₂ = π₁ P = π₀ P², ... , π_r = π₀ P^r

and the effects of the starting distribution π₀ wear off. The chain will converge to what is known as the stationary distribution, π, defined such that:

π P = π

For the transition matrix P defined above, it can be verified that the long-run stationary distribution is π = [1/3  2/3], which is obtained regardless of the initial probabilities π₀.
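Iterating the two-state chain from the text verifies both the one-step calculation and convergence to the stationary distribution [1/3, 2/3]:

```python
# The two-state transition matrix used in the text.
P = [[0.50, 0.50],
     [0.25, 0.75]]

def step(pi, P):
    """One iteration pi_{r+1} = pi_r P of the Markov chain."""
    return [pi[0] * P[0][0] + pi[1] * P[1][0],
            pi[0] * P[0][1] + pi[1] * P[1][1]]

pi = [0.7, 0.3]                   # initial state probabilities
pi = step(pi, P)
print(pi)                         # after one iteration: [0.425, 0.575]

for _ in range(50):               # keep iterating until convergence
    pi = step(pi, P)
print(pi)                         # approaches the stationary [1/3, 2/3]
```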
The goal of the MH algorithm is to construct a Markov chain with stationary distribution equal to the posterior distribution of a specific model. This is accomplished by making the chain time-reversible with respect to the posterior. A time-reversible chain is one where the probability of moving from state i to state j is the same as that of moving from state j to state i. At any point in time, the probability of seeing an i→j transition is π_i p_ij, and so a chain is time reversible if:

π_i p_ij = π_j p_ji

and it can be shown that a time-reversible chain satisfies the stationarity condition π P = π. The MH algorithm constructs such a chain from a candidate-generating transition matrix Q = (q_ij) and the acceptance probability:

α = min{ 1, (π_j q_ji) / (π_i q_ij) }

That is, a candidate state value is generated according to the transition matrix Q and accepted with probability α. With probability 1 − α the candidate value is rejected and the old value is retained. This algorithm results in a Markov chain with stationary distribution π. To verify time reversibility, note that:

π_i p_ij = π_i q_ij min{ 1, (π_j q_ji) / (π_i q_ij) } = min{ π_i q_ij , π_j q_ji }

π_j p_ji = π_j q_ji min{ 1, (π_i q_ij) / (π_j q_ji) } = min{ π_j q_ji , π_i q_ij }

The right sides of the above expressions are the same, and therefore π_i p_ij = π_j p_ji and the resulting Markov chain has stationary distribution π. If we select π as the posterior distribution of our model, and we regard the "states" of the stochastic process as the possible values that our model parameters can assume, then the resulting Markov chain will simulate draws from the posterior distribution. All that is needed is the ability to evaluate the posterior distribution up to the constant of proportionality, which cancels from the numerator and denominator of the acceptance probability:

π_i ∝ p(D | θ_i) p(θ_i)
Bayes in Marketing
Unit-level Models
A leading example is the binary probit model, in which a latent regression z = x′β + ε, ε ~ N(0, 1), determines the observed outcome: y = 1 if z > 0 and y = 0 otherwise. A related example is the censored (Tobit) regression model:

y = 0 if z ≤ 0,  y = z if z > 0

which is used in regression analysis when the data take on positive values with a mass buildup at zero. A final example is the ordered probit model used in the analysis of ranked outcome data:

y = r if c_{r−1} < z ≤ c_r

where the observed data take on integer values depending on the relationship of the censored regression value and the cutoff values {c_r}. This model is often used to model integer data from fixed-point rating scales found in customer satisfaction data.
The above models are all examples of hierarchical models that can be written in the form:

y | z
z | x, β

where all information in the data (y) is transmitted to the model parameters through the latent variable z. In other words, y and β are independent of each other, given z. Models employing conditional independence are known as hierarchical models, and as we will see below they are particularly well suited to estimation by Bayesian MCMC methods.

We can write our model using brackets to denote distributions as:

[y | z] [z | x, β] [β]

where the first factor is the censoring mechanism, the second factor is the latent regression and the third factor is the prior on β.
The traditional analysis of these models typically integrates the latent variable z out of the model likelihood and finds the parameter values that maximize the probability of the observed data. The Bayesian analysis of censored regression models differs in that the latent variable z is introduced as an object of interest and Bayes theorem is used to obtain parameter estimates. The Gibbs sampler for this model involves first generating draws from the full conditional distribution of z given all other parameters:

1. [z | else] ∝ [y | z] [z | x, β]

which takes the form of a censored normal distribution, being greater than zero when y equals one, and negative when y is equal to zero. The second step in model estimation involves drawing the latent regression coefficients:

2. [β | else] ∝ [z | x, β] [β]

which are draws from the standard regression model conditional on the previous draws of z. The advantage of Bayesian estimation is seen here in two ways: (1) the MCMC iterations involve simplified portions of the entire likelihood involving only the parameter of interest; and (2) draws of the latent variable z are obtained as a by-product of estimation, making inference about the latent quantities themselves available.
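A minimal sketch of this two-step sampler for the binary probit case follows; the simulated data, the diffuse normal prior on β, and the inverse-CDF truncated-normal draws are implementation choices for illustration, not the chapter's own code.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(3)
nd = NormalDist()

# Simulated binary data (illustrative): y = 1 when z = x'beta + e > 0.
n, beta_true = 300, np.array([-0.5, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(float)

A = np.eye(2) / 100.0               # diffuse prior: beta ~ N(0, 100 I)
V = np.linalg.inv(X.T @ X + A)      # posterior covariance of beta (sigma = 1)

def draw_z(mu):
    """Step 1: z | else is N(mu, 1) truncated to z > 0 if y = 1, z <= 0 if y = 0."""
    u = rng.uniform(1e-9, 1 - 1e-9, size=n)
    p0 = np.array([nd.cdf(-m) for m in mu])            # P(z <= 0 | mu)
    p = np.where(y == 1, p0 + u * (1 - p0), u * p0)    # uniform on allowed range
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return mu + np.array([nd.inv_cdf(v) for v in p])

beta, keep = np.zeros(2), []
for r in range(800):
    z = draw_z(X @ beta)                                # step 1: latent data
    beta = rng.multivariate_normal(V @ (X.T @ z), V)    # step 2: beta | z
    if r >= 200:
        keep.append(beta)

print(np.mean(keep, axis=0))        # roughly recovers beta_true
```

Note how each step conditions on a simple distribution (truncated normal, then ordinary normal regression), illustrating point (1) above.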
Heterogeneity
p(θ₁, ..., θ_I, τ | η) ∝ [ ∏_i p(θ_i | τ) ] × p(τ | η)

and the hyper-parameters τ are the mean and covariance matrix of the normal distribution. An additional prior distribution is provided on the hyper-parameters, so that the analyst is not forced to specify the location and variability of the distribution of heterogeneity; instead, these can be inferred from the data. The parameters of the prior distribution of the hyper-parameters, η, are specified by the analyst and are not estimated from the data.
The non-Bayesian analysis of models with cross-sectional variation in model parameters is known as random-effects modeling. Since parameters are viewed as fixed but unknown constants in the classical statistics paradigm, the random effects {θ_i} are typically integrated out of the model to obtain the marginal likelihood of the data given the hyper-parameters:

p(D | τ) = ∏_i ∫ p(y_i | θ_i) p(θ_i | τ) dθ_i
MCMC methods are then used to generate draws from the high-dimensional posterior distribution of all model parameters. The posterior distribution then needs to be marginalized to obtain the posterior distribution of any particular parameter, e.g.,

p(θ₁ | D) = ∫ p(θ₁, θ₋₁, τ | D) dθ₋₁ dτ

where "θ₋₁" denotes the set {θ_i} excluding the first respondent. Fortunately, this integration is easy to evaluate with the MCMC draws: the marginal posterior of θ₁ is estimated by simply collecting the simulated draws of θ₁ and ignoring the draws of the remaining parameters.
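These ideas can be sketched with a small hierarchical model of unit-level means; holding the variance hyper-parameters fixed at their true values is a simplification made for illustration, and all data are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated panel (illustrative): I units with n_i observations each,
# y_ij ~ N(theta_i, 1) and unit means theta_i ~ N(mu, tau2).
I, n_i, mu_true, tau2 = 40, 5, 2.0, 0.5
theta_true = rng.normal(mu_true, np.sqrt(tau2), size=I)
y = theta_true[:, None] + rng.normal(size=(I, n_i))
ybar = y.mean(axis=1)

# Gibbs sampler with tau2 and the error variance fixed (a simplification);
# a flat prior is used for the hyper-parameter mu.
mu, keep = 0.0, []
for r in range(3000):
    # theta_i | mu: precision-weighted combination of unit mean and mu
    prec = n_i / 1.0 + 1.0 / tau2
    m = (n_i * ybar + mu / tau2) / prec
    theta = rng.normal(m, np.sqrt(1.0 / prec))
    # mu | theta: normal centered at the mean of the theta draws
    mu = rng.normal(theta.mean(), np.sqrt(tau2 / I))
    if r >= 500:
        keep.append(theta)

# Marginalizing is automatic: the collected draws of theta_1 ARE a sample
# from its marginal posterior p(theta_1 | D).
theta1_draws = np.array(keep)[:, 0]
print(theta1_draws.mean(), ybar[0])
```

The posterior mean of θ₁ is shrunk away from the unit's own sample mean toward the population mean, illustrating the pooling of information across respondents that hierarchical models provide.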
Decision theory is one of the most powerful aspects of the Bayesian para-
digm. Bayesian decision theory identifies the optimal action as the one
that minimizes expected posterior loss, where the loss function can be
broadly construed and can include aspects of profits and consumer utility.
We note that the loss function is completely distinct from the likelihood
or model that is assumed to generate the data. The posterior distribu-
tion arises from the prior and the likelihood, and the loss function can be
chosen independently of the process assumed to generate the data.
A special case of decision theory is model selection. If we assume that the loss function is a 0–1 binary function for choosing the correct model, then the best model is the one that maximizes the posterior probability of the model being correct. The posterior probability of a model can be calculated using Bayes theorem:

p(M_m | D) = p(D | M_m) p(M_m) / p(D)

where M_m denotes model m. The posterior model probabilities are often expressed in terms of a posterior odds ratio that compares two models against each other:

p(M₁ | D) / p(M₂ | D) = [ p(D | M₁) / p(D | M₂) ] × [ p(M₁) / p(M₂) ]

which is equal to the Bayes factor multiplied by the prior odds of the models. The Bayes factor is the ratio of the marginal distributions of the data, where the marginal distribution is the average of the likelihood with respect to the prior:

p(D | M_m) = ∫ p(D | θ, M_m) p(θ | M_m) dθ
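For the beta-binomial model of the earlier example, this marginal likelihood is available in closed form, B(n + a, T − n + b)/B(a, b), so the Bayes factor can be computed exactly. The counts and the two priors below are illustrative assumptions.

```python
from math import lgamma, exp

def log_betafn(a, b):
    """Log of the Beta function via log-gamma."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_marglik(n, T, a, b):
    """Log marginal likelihood of n successes in T Bernoulli trials
    under a Beta(a, b) prior: B(n + a, T - n + b) / B(a, b)."""
    return log_betafn(n + a, T - n + b) - log_betafn(a, b)

# Illustrative data: 30 successes in 40 trials (sample rate 0.75).
n, T = 30, 40

# M1: diffuse Beta(1, 1) prior; M2: Beta(20, 20) prior concentrated
# near theta = 0.5, which the data contradict.
bf_12 = exp(log_marglik(n, T, 1, 1) - log_marglik(n, T, 20, 20))
print(f"Bayes factor in favor of M1: {bf_12:.1f}")
```

With equal prior model probabilities, the posterior odds equal this Bayes factor, so the diffuse-prior model is preferred here.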
Concluding Comments
References
Allenby, Greg M., Neeraj Arora and James L. Ginter (1998) “On the Heterogeneity of
Demand,” Journal of Marketing Research, 35, 384–389.
Allenby, Greg M. and Peter E. Rossi (1999) “Marketing Models of Consumer Heterogeneity,”
Journal of Econometrics, 89, 57–78.
Allenby, Greg M., Jeff D. Brazell, John R. Howell and Peter E. Rossi (2014) “Economic
Valuation of Product Features,” Quantitative Marketing and Economics, 12, 421–456.
Berger, J. O. and R. L. Wolpert (1988) The Likelihood Principle, Institute of Mathematical Statistics Lecture Notes, Vol. 6.
Büschken, Joachim, Thomas Otter and Greg M. Allenby (2013) “The Dimensionality
of Customer Satisfaction Survey Responses and Implications for Driver Analysis,”
Marketing Science, 32(4), 533–553.
Chandukala, Sandeep, Yancy Edwards and Greg M. Allenby (2011) “Identifying Unmet
Demand,” Marketing Science, 30(1), 61–73.
Chib, Siddhartha and Edward Greenberg (1995) “Understanding the Metropolis-Hastings
Algorithm,” American Statistician, 49(4), 327–335.
Gelfand, Alan E. and Dipak K. Dey (1994) “Bayesian Model Choice: Asymptotics and
Exact Calculations,” Journal of the Royal Statistical Society, Series B (Methodological),
501–514.
Gilbride, Timothy J. and Greg M. Allenby (2004) “A Choice Model with Conjunctive,
Disjunctive, and Compensatory Screening Rules,” Marketing Science, 23(3), 391–406.
Gilbride, Timothy J., Greg M. Allenby and Jeff Brazell (2006) “Models of Heterogeneous
Variable Selection,” Journal of Marketing Research, 43, 420–430.
Hansen, Lars Peter (1982) “Large Sample Properties of Generalized Method of Moments
Estimators,” Econometrica: Journal of the Econometric Society, 50(4), 1029–1054.
Howell, John R., Sanghak Lee and Greg M. Allenby (2015) “Price Promotions in Choice
Models,” Marketing Science, 35(2), 319–334.
Kim, Sunghoon, Simon J. Blanchard, Wayne S. DeSarbo and Duncan K.H. Fong (2013)
“Implementing Managerial Constraints in Model-Based Segmentation: Extension of Kim,
Fong, and DeSarbo (2012) with an Application to Heterogeneous Perceptions of Service
Quality,” Journal of Marketing Research, 50, 664–673.
Kim, Jaehwan, Greg M. Allenby and Peter E. Rossi (2002) “Modeling Consumer Demand
for Variety,” Marketing Science, 21(3), 229–250.
Lee, Sanghak, Jaehwan Kim and Greg M. Allenby (2013) “A Direct Utility Model for
Asymmetric Complements,” Marketing Science, 32(3), 454–470.
Lee, Sanghak and Greg M. Allenby (2014) “Modeling Indivisible Demand,” Marketing
Science, 33(3), 364–381.
Manchanda, Puneet, Peter E. Rossi and Pradeep K. Chintagunta (2004) “Response Modeling
with Nonrandom Marketing-mix Variables,” Journal of Marketing Research, 41(4),
467–478.
Marshall, Pablo and Eric T. Bradlow (2002) “A Unified Approach to Conjoint Analysis
Models,” Journal of the American Statistical Association, 97(459), 674–682.
Newton, Michael A. and Adrian E. Raftery (1994) “Approximate Bayesian Inference with
the Weighted Likelihood Bootstrap,” Journal of the Royal Statistical Society, Series B
(Methodological), 3–48.
Otter, Thomas, Timothy J. Gilbride and Greg M. Allenby (2011) “Testing Models of
Strategic Behavior Characterized by Conditional Likelihoods,” Marketing Science, 30(4),
686–701.
Rossi, Peter E. (2014) Bayesian Semi-Parametric and Non-Parametric Methods in Marketing
and Micro-Econometrics. Princeton, NJ: Princeton University Press.
Rossi, Peter E., Zvi Gilula and Greg M. Allenby (2001) “Overcoming Scale Usage
Heterogeneity: A Bayesian Hierarchical Approach,” Journal of the American Statistical
Association, 96, 20–31.
Rossi, Peter E., Robert E. McCulloch and Greg M. Allenby (1996) “The Value of Purchase
History Data in Target Marketing,” Marketing Science, 15, 321–340.
Rossi, Peter E., Greg M. Allenby and Robert McCulloch (2005) Bayesian Statistics and
Marketing. New York: John Wiley & Sons.
Satomura, Takuya, Jaehwan Kim and Greg M. Allenby (2011) “Multiple Constraint Choice
Models with Corner and Interior Solutions,” Marketing Science, 30(3), 481–490.
Schwarz, Gideon (1978) “Estimating the Dimension of a Model,” Annals of Statistics, 6(2),
461–464.
Terui, Nobuhiko, Masataka Ban and Greg M. Allenby (2011) “The Effect of Media
Advertising on Brand Consideration and Choice,” Marketing Science, 30(1), 74–91.
Yang, Sha, Yuxin Chen and Greg M. Allenby (2003) “Bayesian Analysis of Simultaneous
Demand and Supply,” with discussion, Quantitative Marketing and Economics, 1,
251–304.
Zellner, Arnold (1971) An Introduction to Bayesian Inference in Econometrics. New York:
John Wiley & Sons.
Over the past two decades structural models have come to their own in
empirical research in marketing.1 The basic notion of appealing to eco-
nomic theory when building models of consumer (e.g., Guadagni and
Little 1983) and firm behavior (Horsky 1977; Horsky and Nelson 1992) in
marketing has been around for much longer than that. Yet, this idea has
come to the forefront as authors have confronted the challenges associated
with drawing inferences from purely statistical relationships governing
the behaviors of the agents of interest. While these relationships provide
important insights into the correlational structure underlying the data,
they are less useful when one is interested in quantifying the consequences
of a change in either the structure of the market (e.g., what happens when
a retailer closes down its bricks-and-mortar operations to focus solely on
online sales) or in the nature of conduct of one or more players in that
market (e.g., what happens to prices of car insurance when consumers
change the ways in which they search for these prices). Since the econom-
ics underlying the conduct or the behavior of agents in the presence of the
structure are not explicitly built into models that only focus on describing
statistical relationships between agents’ actions and outcomes, it is diffi-
cult if not impossible for those models to provide a prediction when one of
these dimensions actually changes in the marketplace.
As marketers move away from being focused only on “local” effects of
marketing activities, e.g., what happens when I change price by 1 percent,
in order to better understand the consequences of broader shifts in policy,
the need for structural models has also grown. In this chapter, I will
focus on a small subset of such “structural models” and provide brief
discussions of what we mean by structural models, why we need them, the
typical classes of structural models that we see being used by marketers
these days, along with some examples of these models. My objective is
not to provide a comprehensive review. Such an endeavor is far beyond
my current purview. Rather, I would like to provide a basic discussion of
structural models in the context of the marketing literature. In particular,
to keep the discussion focused, I will limit myself largely to models of
demand rather than models of firm behavior.
The definition and key elements of a structural model have been well
established, at least since the important chapter by Reiss and Wolak
(2007). Other papers by Kadiyali et al. (2001), Chan et al. (2009) and
Chintagunta et al. (2004, 2006) have also stayed close to this early work.
I will not depart in any significant way from this earlier work and will draw heavily from it. In simple
terms, a structural model is an empirical model; one that can be taken
to the data. But it is not any empirical model – since an equation that
establishes a statistical relationship between a set of explanatory variables
and an outcome variable is also an empirical model. What distinguishes a
structural model is that the relationship between explanatory and outcome
variables is based on theory – most often economic theory – although
it is not limited just to economic principles and can encompass theories
from other disciplines such as psychology as well (Erdem et al. 2005). The
theory for its part makes a prediction about the behavior of some set of
economic agents (consumers, firms, etc.) and thereby governs how the
outcome variable of interest is influenced by the explanatory variables.
Thus the key ingredients of the model are the (economic) agents involved;
the nature of their behavior (optimizing, satisficing, and so on); and the
relationships between explanatory and outcome variables ensuing from
such behavior. These ingredients stem from the researcher’s beliefs about
how they map onto the specific context of interest.2
Since theories make specific predictions, it is unlikely that these predic-
tions about the explanatory and outcome variables can perfectly rational-
ize the actual data one observes on these variables in the market. The
link between the predictions of the model and the observed outcome data
is provided by the “unobservables” in the model. These unobservables
essentially allow us to convert the economic (or some other theory-based)
model into an econometric model, i.e., the final empirical model that we
take to the data. These unobservables get their nomenclature from vari-
ables that are unobserved to us as researchers but are, in general, known
to the agents whose behavior is being modeled.
As Reiss and Wolak point out, these unobservables can be of different
forms. First, we have “structural” error terms – variables that belong to
the set of explanatory variables in the economic model but constitute the
subset that we do not observe as researchers. For example, we know that
shelf space and shelf location are important determinants of a brand’s
market share in addition to price and advertising. But in many situations
we do not have access to data on these variables. In such situations they
become part of the unobservables and constitute “structural” error in the
sense that they are directly related to the theory we are trying to create an
empirical model for.
The second set of unobservables has a very long history in the marketing
literature – unobserved heterogeneity. These unobservables help explain
differences in the relationship between the explanatory and outcome
variables across different agents whose behavior is being characterized
by the structural model. For instance, when looking at brand choice
behavior, data patterns might reveal that one consumer is very price
sensitive whereas another consumer is not. By allowing the consumers’
utility parameters to differ from one another we can capture some of the
deviations between the theoretical model and the data on hand across
consumers in the market.
The third set of unobservables comes about in structural models that
involve agent uncertainty about a specific parameter in the model. In
these models, agents learn about the parameter they are uncertain about
over time but usually have some prior belief about the parameter, often
characterized via a distribution. Learning is a consequence of “signals”
received by the agent (say a consumer) from another agent (say a firm)
or from the environment that allows the former agent to update his/her
belief about the uncertain parameter. As the agent receives more signals,
the uncertainty gets resolved over time. While there exist instances where
the researcher also observes the signals received by the agent, in most
instances this is not the case. In such situations the signals received become
part of the set of unobservables from the researcher’s perspective.
A fourth set of unobservables comes from measurement error. For
instance one might be interested in studying the relationship between the
level of advertising received by a consumer and the purchases that might
be caused by this advertising. In these cases, one observes advertising at a
level different from the exact exposure that the consumer actually receives.
Rather, one might have proxies for advertising such as the expenditure
on that activity in the market where the consumer resides or the average
exposure of the specific demographic profile to which the consumer
belongs. Such errors in the measurement of variables constitute another
unobservable from the researcher’s perspective.
I begin with the classic brand choice model that has been ubiquitous
in marketing and that is based on the model of consumer utility maxi-
mization. I use the framework from Deaton and Muellbauer (1980) or
Hanemann (1984), for a consumer i on purchase occasion t choosing from among the J brands in a category, with utility defined over two arguments: one being the quality (ψ_ijt) weighted sum of the quantities (x_ijt) of each of the brands in the category (Σ_{j=1}^{J} ψ_ijt x_ijt), and the other being the quality weighted quantity of the outside good.
The next set of unobservables that we can introduce into the above model corresponds to the heterogeneity across consumers in their preferences and responses to marketing activities. Accordingly, several researchers, e.g., Kamakura and Russell (1989), Chintagunta, Jain and Vilcassim (1991), Gonul and Srinivasan (1993), Rossi, McCulloch and Allenby (1996), among many others, have allowed Θ to vary across consumers following some distribution (either discrete or continuous) such that Θ_i ~ f(Θ), where f(.) represents the density of a specified multivariate distribution. Specifically, when the parameters are heterogeneous, the consumer's probability of choosing brand j can be written as:

Pr_ijt = exp(α_ij + Z_jt β_i − θ_i ln(p_jt)) / (1 + Σ_{k=1}^{J} exp(α_ik + Z_kt β_i − θ_i ln(p_kt)))
More recently, the logit model has been used in conjunction with aggregate data – store- or market-level data (e.g., Berry 1994; Berry, Levinsohn and Pakes 1995; Nevo 2001; Sudhir 2001). Assuming for now that there is no heterogeneity in the intrinsic preference or the responsiveness parameters, the probability of a consumer purchasing brand j is once again given by the expression in equation (9.3). Aggregating these probabilities across all consumers (N) visiting the store or purchasing in that market in a given time period t (say a week), we can obtain the market share as follows:

S_jt = exp(α_j + Z_jt β − θ ln(p_jt)) / (1 + Σ_{k=1}^{J} exp(α_k + Z_kt β − θ ln(p_kt)))    (9.4)

The sampling error associated with the share in equation (9.4) is then given as follows:

se_jt = √( S_jt (1 − S_jt) / N )    (9.5)
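Equations (9.4) and (9.5) can be illustrated directly; the parameter values below are hypothetical.

```python
import math

# Hypothetical parameters for J = 3 brands: intercepts and price sensitivity.
alpha = [1.0, 0.5, 0.0]
theta = 2.0                        # assumed price coefficient
prices = [1.2, 1.0, 0.9]

# Equation (9.4): homogeneous-logit market shares with an outside good.
expu = [math.exp(a - theta * math.log(p)) for a, p in zip(alpha, prices)]
denom = 1.0 + sum(expu)
S = [e / denom for e in expu]

# Equation (9.5): sampling error of brand 1's share for increasing N.
for N in (100, 10_000, 1_000_000):
    se = math.sqrt(S[0] * (1 - S[0]) / N)
    print(f"N = {N:>9,}: share = {S[0]:.3f}, sampling error = {se:.4f}")
```

The shrinking sampling error as N grows is exactly the point made in the next paragraph: at the aggregate level the share expression becomes effectively deterministic.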
It is clear that, as the number of consumers in the market becomes
“large,” the sampling error shrinks to zero. And equation (9.4) will rep-
resent the market share of the brand in that week. At the aggregate level,
however, Sjt represents a deterministic relationship between the various
explanatory variables (prices and other marketing variables) and the
outcome variable – market share. Recall that this was not the case at the
individual level. So although the expressions in the two cases are identical,
the nature of the outcome variable has different implications.
At issue is that if the expression in equation (9.4) is to be used as a
predictor of the outcome variable, market share, then it implies that given
a set of parameters and a set of observable variables, researchers will
be able to predict market shares perfectly, i.e., with no associated error.
Clearly such a claim would be inappropriate as one cannot perfectly
predict shares. This brings up a need for another error that can explain the
discrepancies between the model prediction and what we observe in the
data in terms of the brand shares for different time periods. An easy way in which these errors can be introduced is additively in equation (9.4). In other words, we can write the share expression as:

S_jt = exp(α_j + Z_jt β − θ ln(p_jt)) / (1 + Σ_{k=1}^{J} exp(α_k + Z_kt β − θ ln(p_kt))) + ε_jt
One can alternatively argue that these are brand-level factors that have not
been included as part of the vector { p_jt, Z_jt } that we have already introduced
into the model. So these are unobservables like shelf space and shelf loca-
tion that are common across consumers who visit a store, are brand spe-
cific, influence shares, but are not observed by us as researchers (in most
cases). So if the error term captures such factors that have been omitted
in the model, where would they belong? It appears that they should be
included as a brand- and week-specific measure of quality when one is looking at store share data. Denoting these factors as ξ_jt for brand j in week t, the share equation (9.4) can instead be written as:

S_jt = exp(α_j + Z_jt β − θ ln(p_jt) + ξ_jt) / (1 + Σ_{k=1}^{J} exp(α_k + Z_kt β − θ ln(p_kt) + ξ_kt))    (9.6)

Since the ξ_jt are not observed by us as researchers, they qualify for inclusion as unobservables. Further, since they are integral to the utility maximization problem considered earlier, they can also be viewed as being structural in nature.
So the (observed) explanatory variables are the same as those in equa-
tion (9.2) but the outcome variable is the shares of the different brands
in a given market- and time- period. Per se, estimation of the model in
equation (9.6) is straightforward since it can be "linearized" as follows:

ln( S_jt / S_0t ) = α_j + Z_jt β − θ ln(p_jt) + ξ_jt    (9.7)

where S_0t is the share of the outside good. Since firms are likely to observe ξ_jt (e.g., shelf space) and account for it when setting prices, in this case prices are being set "endogenously" and one must address the associated endogeneity issue.
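The linearization in (9.7) can be checked by simulation: generating shares from (9.6) with demand shocks that are, by construction, uncorrelated with prices, OLS on the linearized equation recovers the parameters. The exogeneity assumption is made purely to sidestep the endogeneity problem discussed next; all numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(9)

# Simulate weekly shares from (9.6) for J = 3 brands over T weeks, with
# exogenous log prices and demand shocks xi (an assumption for the sketch).
J, T = 3, 400
alpha = np.array([1.0, 0.5, 0.0])
theta = 2.0                                   # true price coefficient
logp = rng.normal(0.0, 0.3, size=(T, J))
xi = rng.normal(0.0, 0.2, size=(T, J))

v = alpha - theta * logp + xi                 # mean utilities
expv = np.exp(v)
S = expv / (1.0 + expv.sum(axis=1, keepdims=True))
S0 = 1.0 / (1.0 + expv.sum(axis=1))

# Equation (9.7): ln(S_jt / S_0t) = alpha_j - theta * ln(p_jt) + xi_jt.
y = np.log(S / S0[:, None]).ravel()           # rows ordered as (t, j) pairs
D = np.zeros((T * J, J + 1))                  # brand dummies + log price
for j in range(J):
    D[j::J, j] = 1.0
D[:, J] = logp.ravel()
est, *_ = np.linalg.lstsq(D, y, rcond=None)
print(est)                                    # approx [1.0, 0.5, 0.0, -2.0]
```

Were prices correlated with ξ_jt, as the text argues they typically are, this same regression would yield a biased price coefficient, which is precisely the motivation for the instrumental variables and control function strategies discussed below.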
I will not go into the issue of endogeneity and how one goes about
resolving endogeneity in such a model. Others have tackled this issue
(Berry 1994; Berry et al. 1995; Rossi 2014). Briefly, there are two broad
approaches to tackling the issue – one that is agnostic about the specific
data-generating process that leads to the endogeneity issues (sometimes
referred to as a “limited information” approach) and one that considers
the data-generating process more explicitly (sometimes referred to as
a “full information” approach). Under the former category, we have
instrumental variables approaches (e.g., see the discussion in Rossi 2014),
control-functions (Petrin and Train 2010) and so on. Examples of studies
using the latter approach include, e.g., Yang et al. (2003). Thus, while
there are several approaches to addressing the problem, consensus about
a universal best approach is lacking. There are of course pros and cons
associated with each approach and each context within which it is applied.
While the presence of the structural error term jjt in equation (9.7)
addresses the issue of variability of shares from observed outcomes,
there is another form of variability that the model does not account for.
Specifically, the model in equation (9.6) suffers from the Independence of Irrelevant Alternatives (or IIA) problem. In particular, what that
means is that if brand j changes its prices then the shares of the other
brands will change proportional to those brands’ market shares (i.e.,
in a manner consistent with the IIA assumption). In reality, of course,
careful inspection of the share data in conjunction with changes in prices
(for example) might reveal to the researcher that the IIA assumption is
inconsistent with the data on hand. In such instances the logical question
that arises is: how can I modify the model to be able to accommodate
deviations from the IIA?
The answer to this stems from one of the unobservables we have already introduced – heterogeneity in the preference and response parameters. The presence of such heterogeneity results in an aggregate share model that no longer suffers from the IIA problem. To see why, recall that, in the context of consumer data, we allowed consumers to have parameters Θ that vary according to a multivariate normal distribution. The question then
becomes, if such heterogeneity exists at the consumer level, what does
the aggregate share of brand j look like in week t? If the consumer level
probability is given by the expression in equation (9.3) then the aggregate
share of brand j in week (or some other time period) t requires us to
integrate out the heterogeneity distribution in that week. This yields the
following expression.
S_jt = ∫ [ exp(α_ij + Z_jt β_i − θ_i ln(p_jt) + ξ_jt) / (1 + Σ_{k=1}^{J} exp(α_ik + Z_kt β_i − θ_i ln(p_kt) + ξ_kt)) ] f(Θ_i) dΘ_i

     = ∫ [ exp([α_j + Z_jt β − θ ln(p_jt) + ξ_jt] + [Δα_ij + Z_jt Δβ_i − Δθ_i ln(p_jt)]) / (1 + Σ_{k=1}^{J} exp([α_k + Z_kt β − θ ln(p_kt) + ξ_kt] + [Δα_ik + Z_kt Δβ_i − Δθ_i ln(p_kt)])) ] f(ΔΘ_i) dΔΘ_i    (9.8)

In equation (9.8), α_ij = α_j + Δα_ij, where the first term on the right-hand side, α_j, is the mean of that parameter across consumers and the second term is the deviation of consumer i's preference from the mean. The second line of equation (9.8) separates the part that is not consumer-specific from the part that is, so the heterogeneity distribution only pertains to the distribution of consumer deviations ΔΘ_i from the overall mean. Thus, ΔΘ_i ~ MVN(0, Ω).
From the above expression it is clear that the ratio of the shares of two brands, j and k, depends on the levels of the explanatory variables of all other brands, and hence is free of the IIA property. A
clear downside to the model in (9.8) is that it is no longer linearizable as
it once was. Hence other approaches need to be employed to address the
unobservability of ξ_jt. In particular, Berry (1994) proposed the contraction mapping procedure to isolate the component α_j + Z_jt β − θ ln(p_jt) + ξ_jt (or the "linear utility" component in the language of Berry and BLP) in the first square bracket in the numerator and denominator of (9.8) above, conditional on a chosen set of parameters for the "non-linear" part, i.e., that corresponding to the heterogeneity distribution. This restores the linearity we saw in (9.7), and regression methods can once again be employed.
An alternative approach that has been proposed more recently is that by
Dube, Fox and Su (2012) using an MPEC (Mathematical Programming
with Equilibrium Constraints) approach. The identification of the parameters of this model was implicit in my discussion of the motivation for including the "additional" error term (to better fit share variability over time) and heterogeneity in the parameters (to better account for deviations from IIA). Small deviations from IIA will result in finding low variances for the heterogeneity distribution ΔΘ_i ~ MVN(0, Ω).
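Berry's contraction can be sketched as the fixed-point iteration δ ← δ + ln(s_obs) − ln(s_pred(δ)), run conditional on the non-linear heterogeneity parameters. The simulated data and the single random coefficient on log price are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(17)

# Berry (1994) contraction sketch: recover the mean ("linear") utilities
# delta_jt from observed shares, holding the heterogeneity parameter fixed.
J, R = 3, 500                        # brands, simulated consumer draws
sigma = 0.8                          # sd of the random log-price coefficient
logp = np.array([0.2, 0.0, -0.1])    # log prices (illustrative)
nu = rng.normal(size=R)              # consumer taste draws

def predict_shares(delta):
    """Aggregate shares: average logit probabilities over taste draws."""
    v = delta[None, :] - sigma * nu[:, None] * logp[None, :]
    ev = np.exp(v)
    return (ev / (1.0 + ev.sum(axis=1, keepdims=True))).mean(axis=0)

# "Observed" shares generated at a known true delta.
delta_true = np.array([1.0, 0.4, -0.2])
s_obs = predict_shares(delta_true)

# Contraction mapping: delta <- delta + ln(s_obs) - ln(s_pred(delta)).
delta = np.zeros(J)
for _ in range(500):
    delta = delta + np.log(s_obs) - np.log(predict_shares(delta))

print(delta)                          # converges to delta_true
```

In a full estimation routine this inner loop runs inside an outer search over the non-linear parameters (here, sigma), with the recovered δ then used in the linear IV regression of (9.7).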
The above discussion covers the first two types of unobservables identified earlier. It also introduced a third unobservable that arises in the context of aggregate demand data.
neutral and maximizes expected utility, then the probability of the consumer purchasing brand j will be:

Pr_ijt = exp(E(α_j) + Z_jt β − θ ln(p_jt)) / [1 + Σ_{k=1}^{J} exp(E(α_k) + Z_kt β − θ ln(p_kt))]   (9.9)
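A small numerical sketch of the choice probability in (9.9) may help. The values below for E(α_k), Z_kt β, prices, and the price sensitivity θ are hypothetical, chosen only to illustrate the mechanics of the logit formula with an outside option:

```python
import math

# Hypothetical mean preferences E(alpha_k), attribute utilities Z_kt*beta,
# prices, and price sensitivity theta for J = 3 brands (illustrative only).
E_alpha = [1.0, 0.5, 0.2]
Zbeta = [0.3, 0.4, 0.1]
price = [2.5, 2.0, 1.5]
theta = 1.2

# Deterministic utility of each inside brand; the outside good has utility 0.
v = [E_alpha[k] + Zbeta[k] - theta * math.log(price[k]) for k in range(3)]

# Equation (9.9): Pr_j = exp(v_j) / (1 + sum_k exp(v_k)).
denom = 1.0 + sum(math.exp(vk) for vk in v)
pr = [math.exp(vj) / denom for vj in v]
pr_outside = 1.0 / denom
```

By construction the three brand probabilities and the outside-good probability sum to one, which is what gives the model its market-share interpretation.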
seek to resolve his or her uncertainty regarding this quality, and if so how
does (s)he do it? (The following discussion draws heavily from Sriram
and Chintagunta 2009.) Here we consider the case in which the consumer
learns about the unknown quality. The typical assumption is that consum-
ers learn in a Bayesian fashion over time. Let aj be the true quality of the
brand j. Consumers do not know this true quality. And while they know
that it comes from a distribution, unlike the case above, they do not know
the mean of that distribution. In period 0, the consumer starts with a prior belief that the quality is normally distributed with mean α_0j and variance σ²_0j, i.e.,

α_j ~ N(α_0j, σ²_0j)   (9.10)
For now we assume that the above prior belief is common across
consumers. In period 1, the consumer would make a purchase decision
based on these prior beliefs for each of the J brands. If consumer i, i = 1,
2, . . . I, purchases brand j, she can assess the quality of the product from
her consumption experience, a^E_ij1. If we assume that the consumer always
derives the experience of quality that is equal to the true quality, then this
one consumption experience is sufficient to assess the true quality of the
product. However, in reality, this experienced quality might differ from
the true quality, because of (1) intrinsic product variability and/or (2) idi-
osyncratic consumer perceptions. Hence, researchers typically assume that
these experienced quality signals are draws from a normal distribution
whose mean equals the true quality, i.e., that these are unbiased signals. Thus, we have

a^E_ijt ~ N(α_j, σ²_j)   (9.11)

where σ²_j captures the extent to which the signals are noisy. Thus, for learning to extend beyond the initial purchase, we need σ²_j > 0. In (9.11) consumers do not know the mean but are assumed to know the variance.
Subsequent to the first purchase (and consumption experience) the
consumer has some more information than the prior she started with.
Consumers use this new information along with the prior to update
their beliefs about the true quality of the product in a Bayesian fashion.
Specifically, since both the prior and the signal are normally distributed, conjugacy implies that the posterior belief at the end of period 1 would also follow a normal distribution with mean α_ij1 and variance σ²_ij1 such that

α_ij1 = u_ij1 α_0j + v_ij1 a^E_ij1
1/σ²_ij1 = 1/σ²_0j + 1/σ²_j
u_ij1 = σ²_j / (σ²_0j + σ²_j)
v_ij1 = σ²_0j / (σ²_0j + σ²_j)   (9.12)
If none of the other brands is purchased in the first period, the posterior
distributions for those brands will be the same as the prior distributions as
there is no additional information to update the consumer’s beliefs about
these brands.
This posterior belief at the end of period 1 acts as the prior belief at
the beginning of period 2. Thus, when the consumer makes a purchase
decision in period 2, she would expect her quality experience to come from the distribution N(α_ij1, σ²_ij1 + σ²_j), her updated belief about quality plus the signal noise.
On the other hand, a consumer who does not make a purchase in period
1 will use the same prior in period 2 as she did in period 1. Hence, we can
generalize the above equations for any time period t, t = 1, 2, . . ., T, as
follows
α_ijt = u_ijt α_ij(t−1) + v_ijt a^E_ijt
1/σ²_ijt = 1/σ²_ij(t−1) + I_ijt/σ²_j = 1/σ²_0j + (Σ_{τ=1}^{t} I_ijτ)/σ²_j
u_ijt = σ²_j / (I_ijt σ²_ij(t−1) + σ²_j)
v_ijt = I_ijt σ²_ij(t−1) / (I_ijt σ²_ij(t−1) + σ²_j)   (9.13)

where I_ijt is an indicator equal to 1 if consumer i purchases (and hence receives a signal about) brand j in period t, and 0 otherwise.
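The updating recursions in (9.13) are easy to simulate. The sketch below uses hypothetical values for the true quality, the prior, the signal variance, and the purchase probability (none are from the chapter); the posterior mean drifts toward the true quality, and the posterior variance shrinks whenever a signal arrives:

```python
import random

random.seed(0)

alpha_true = 2.0        # true quality alpha_j (hypothetical)
mean, var = 0.0, 1.0    # prior belief (alpha_0j, sigma^2_0j)
sig2 = 1.0              # signal variance sigma^2_j

history = [(mean, var)]
for t in range(1, 501):
    I = 1 if random.random() < 0.8 else 0     # purchase indicator I_ijt
    if I:
        # a^E_ijt ~ N(alpha_j, sigma^2_j), an unbiased but noisy signal
        signal = random.gauss(alpha_true, sig2 ** 0.5)
        u = sig2 / (var + sig2)               # weight on the prior mean
        v = var / (var + sig2)                # weight on the new signal
        mean = u * mean + v * signal
        var = 1.0 / (1.0 / var + 1.0 / sig2)  # posterior precision adds
    # with I_ijt = 0 the belief is unchanged, as (9.13) implies
    history.append((mean, var))
```

Running this, the belief converges on the true quality; rerunning with a larger σ²_j slows the convergence, which is exactly the identification argument used below.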
In such situations the signals received by consumers become part of the set
of unobservables from the researcher’s perspective. Researchers typically
assume, as above, that the signals come from a known distribution with
unknown parameters and then simulate these signals over the course of
the estimation. Accordingly, identification in learning models poses a
challenge. One needs to observe a pattern in the data that suggests that
behavior evolves over time consistent with converging towards some
preference level if indeed there is support for the Bayesian updating
mechanism described above.
For example, one implication of the expression in equation (9.13) is that if the variance of the received signals, σ²_j, is high, then learning will be slower than when the variance is low. As an example of identification
using this idea, Sriram et al. (2015) look at a situation where the variance
of signals received by consumers can be high or low with these variances
being observed by researchers. The context is that of consumers deciding
whether to continue subscribing to a video-on-demand service. Consumers
who receive high (low) quality service are more likely to continue (stop)
subscribing but consumers are uncertain about their quality. They learn
about this quality based on the signals received. If the signals consumers
receive have low variance then consumers receiving either high or low
quality of service learn about this quality quickly; those with high quality
continue with the firm and those with low quality leave, i.e., terminate the
service. But if signals have a high variance, learning is slow and consumers
receiving low quality service may continue with the service. Indeed, the
patterns in the data suggest precisely this nature of behavior. Figures 9.1
and 9.2 below are adapted from Sriram et al. (2015).
Given the nonlinearity associated with learning models, one often
finds evidence of learning even when it is unclear whether such learning is
going on in the data. Thinking about the sources of identification prior to
estimation makes for good practice not just with these models but with all
econometric models in general.
[Figure 9.1: termination probability (%) plotted against the number of periods (1 to 8+) of high/low quality encounters]
structural model. I will now illustrate these two types of applications and
explain why it might be difficult to make the same assessments sans the
structural model.
In the Sriram et al. study (2015) mentioned above, some consumers are
exposed to signals about the quality they receive that have high variance
whereas the signals that others receive have low variance. The latter are
able to learn about the true quality they receive more quickly than those with
high variance. An implication of this is that when consumers are uncertain
about the quality they experience, those experiencing low temporal vari-
ability in quality are likely to be more responsive (in terms of termination)
to the average quality level compared to those experiencing high vari-
ability. Specifically, if, at the time of signing up for the service, a consumer
has a high prior belief on the quality, then it becomes more difficult for
the consumer to learn that the quality is actually low when the variance
of signals received is high. As a consequence these consumers will respond
less, in terms of termination, to the quality they receive. On the other
hand, for consumers receiving higher quality than their prior belief, high
variability will interfere with such learning so termination may be higher
than for those with high quality but low signal variability. In other words,
we would see an interaction effect between average quality and variability
[Figure 9.2: termination probability (%) plotted against the number of periods (1 to 8+) of high/low quality encounters, with curves labeled HQ+, HQ−, LQ+, and LQ− for high and low quality experience]
goes to make the next purchase. This provides an explicit link between
purchasing today and purchasing tomorrow (see Ching et al. 2013). Note
however, that the model I described previously was a “myopic” model of
learning since it did not fully consider this intertemporal link.
The third bucket includes models that have recently seen increased interest in marketing – those involving uncertainty, not about the parameters
of the utility function as in learning models, but about some feature or
characteristic of the product itself. Here I am referring to the models
of search. Specifically, in this case, the consumer may not be perfectly
informed about the price of a product in the market and needs to engage in
costly (defined broadly as including time and psychological costs) search
to uncover information about price (as examples, see Mehta et al. 2003
and Honka 2014). Alternatively, consumers search for a product that
best matches their preferences, as in shopping online for a digital camera
that best suits one’s needs (e.g., Kim et al. 2010). In particular, as online
browsing behavior, visit and purchase information become more widely
available, I expect these models to see increasing application in marketing.
Structural models have certainly made an impact in the field of mar-
keting. While diffusion has taken a while, today they are considered an
integral part of the marketer’s toolbox. Looking ahead there appear to be
three principal domains in which the research seems to be progressing. I
will very briefly mention each of them in turn.
NOTES
1. I thank Anita Rao and S. Sriram for their useful comments on an earlier version. My
thanks to the Kilts Center at the University of Chicago for financial support.
Please note that parts of this chapter appear elsewhere in the chapter “Structural
models in Marketing: Consumer Demand and Search” of the second edition of the
“Handbook of Marketing Decision Models,” edited by B. Wierenga and R. van der Lans.
2. A point to emphasize here relates to causality. If the researcher is interested only in estab-
lishing causality, then a structural model per se may not be required (see e.g., Goldfarb
and Tucker 2014).
Cluster Analysis
input from far more than two variables. For example, online recommenda-
tion engines cluster several millions of customers on thousands of SKUs.
There are several decisions to be made when conducting a cluster
analysis. They are: (1) data preparation, (2) the cluster model to be used,
and (3) the interpretation of the clusters. Each issue is discussed in turn.
Data Preparation
customer ID   Mystery   Bio    DIY
1057          3         2      1
0143          5         3      0
1552          0         1      1
0094          1         0      2
...           (N customers)
Means:        2.25      1.50   1.00
distance. Customers 1 and 2 would be deemed d_12 units apart, where d²_12 = Σ_{k=1}^{r} (x_1k − x_2k)² across the k = 1, 2, . . ., r variables. For more options
see Aldenderfer and Blashfield (1984) and Everitt et al. (2011).
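As a quick sketch, the Euclidean distance just defined can be computed for the four customers of Figure 10.2 (labels A through D are added here for convenience):

```python
import math

# Book-purchase counts from Figure 10.2 (Mystery, Bio, DIY).
customers = {
    "A": (3, 2, 1),   # ID 1057
    "B": (5, 3, 0),   # ID 0143
    "C": (0, 1, 1),   # ID 1552
    "D": (1, 0, 2),   # ID 0094
}

def euclid(x, y):
    # d_12 = sqrt( sum_k (x_1k - x_2k)^2 ) across the k = 1..r variables
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

names = sorted(customers)
dist = {(i, j): euclid(customers[i], customers[j])
        for i in names for j in names if i < j}
```

Customers C and D turn out to be the closest pair (d_CD = √3 ≈ 1.73), which anticipates the first merge in the Ward's method example below.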
Clustering Models
Next, the marketing analyst must choose among the many clustering algorithms. Some cluster models are "hierarchical," and we show a popular example of such a model, Ward's method; others are not, and we show an example of that kind as well, k-means clustering. For each, we shall illustrate using the small data set in Figure 10.2, which depicts the purchase patterns of four customers across three genres of book purchases.
Ward’s method
Ward’s method is a clustering technique that operationalizes the intuition
that segments should consist of similar customers, whereas customers in
different segments should be different. In the statistical parlance, the clus-
tering model minimizes the variability within clusters, and maximizes the
variability between clusters.
Figure 10.3 shows the computation of the total sums of squares in the
Cluster              Possible means for   Mystery  Bio   DIY   Error SS   R²
{A&B} {C} {D}        A&B                  4.0      2.5   0.5   3.00       0.862
{A&C} {B} {D}        A&C                  1.5      1.5   1.0   5.00       0.770
{A&D} {B} {C}        A&D                  2.0      1.0   1.5   4.50       0.793
{B&C} {A} {D}        B&C                  2.5      2.0   0.5   15.00      0.310
{B&D} {A} {C}        B&D                  3.0      1.5   1.0   14.50      0.333
{C&D} {A} {B}        C&D                  0.5      0.5   1.5   1.50       0.931  (min error, max R²)
small illustration data set: SS_total = 21.74; i.e., the amount of variability that may be apportioned across the clusters. Each step of the model seeks to assign customers to groups so as to maximize R². Recall, from regression, R² is a measure of fit that indicates the amount of total variance explained by the model. It is defined as R² = 1 − (SS_error/SS_total), so to say that maximum variance is explained is also to say that error variability is minimized.
Ward’s method begins with each of the N customers in his or her own
cluster (i.e., each cluster is of size 1). In the first iteration, customers are
combined to form clusters of size 2. First, customers A and B are combined,
and C and D are left in their own clusters. Then, customers A and C are com-
bined, with B and D left in their own clusters. Each possible two-customer
segment is created, and the R2 is calculated for each combination. For
example, the SS_error = 3.00 in the first row is derived by comparing customer A's data (and B's data) to their combined means (4.0, 2.5, 0.5), as follows: SS_error = (3 − 4)² + (2 − 2.5)² + (1 − 0.5)² + (5 − 4)² + (3 − 2.5)² + (0 − 0.5)² = 3.00. In Figure 10.4 we see R² maximized when customers C and D form
a segment, with customers A and B in their own individual segments.
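The error SS and R² columns of Figure 10.4 can be reproduced with a few lines (customer labels A through D stand in for the IDs in Figure 10.2):

```python
from itertools import combinations

data = {"A": (3, 2, 1), "B": (5, 3, 0), "C": (0, 1, 1), "D": (1, 0, 2)}

def sse(cluster):
    # error sum of squares of a cluster around its own variable means
    pts = [data[c] for c in cluster]
    means = [sum(col) / len(pts) for col in zip(*pts)]
    return sum((x - m) ** 2 for p in pts for x, m in zip(p, means))

# total SS: all four customers around the grand means (about 21.75)
ss_total = sse(list(data))

# evaluate each candidate two-customer merge, as in Figure 10.4;
# the two singleton clusters contribute zero error
results = {}
for pair in combinations(sorted(data), 2):
    err = sse(pair)
    results[pair] = (err, 1 - err / ss_total)

best = min(results, key=lambda p: results[p][0])
```

The winning merge is {C&D}, with error SS = 1.50 and R² ≈ 0.931, matching the last row of the figure.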
Ward’s method is a “hierarchical” cluster model, which means that once
customers C and D are joined in a segment, they will always be in the same
cluster (whether other customers join that segment or not). Thus in Figure
10.5, the second iteration of the model treats C and D together, and tries
out all remaining possibilities of clusters—that customer A or customer
B might join the {C&D} segment, but the highest R2 is achieved when
customers A and B constitute their own segment.
Given the small size of this illustration data set, the only possible
iteration that remains would be for customer segment {C&D} to join with
{A&B}. The starting and endpoints in cluster analyses are not particularly
insightful—the starting place has all customers in separate segments, and
it is not very efficient for companies to truly customize their offerings for
Cluster          Possible means for   Mystery  Bio   DIY   Error SS   R²
{C&D&A} {B}      C&D&A                1.33     1.00  1.33  7.32       0.663
{C&D&B} {A}      C&D&B                2.00     1.33  1.00  20.67      0.049
{C&D} {A&B}      A&B                  4.00     2.50  0.50  3.00       0.862  (min error, max R²)
each individual, and the endpoint has all customers in one segment, and
presumably a mass marketing strategy would not appeal to the customers
who are heterogeneous across segments. So the question is whether the
company finds more insight and utility in sorting customers into three
segments {C&D,A,B} or two {C&D,A&B}.
Ward’s method is popular and empirically well-behaved. It might be
less advised in application to so-called big data, because it requires large
numbers of combinations to be computed in early iterations.
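The full agglomeration loop described above (merge, at each step, the pair of current clusters that minimizes the total within-cluster error SS, i.e., maximizes R²) is a short greedy procedure. This is a sketch of that logic on the four-customer data, not a production implementation:

```python
from itertools import combinations

data = {"A": (3, 2, 1), "B": (5, 3, 0), "C": (0, 1, 1), "D": (1, 0, 2)}

def sse(cluster):
    pts = [data[c] for c in cluster]
    means = [sum(col) / len(pts) for col in zip(*pts)]
    return sum((x - m) ** 2 for p in pts for x, m in zip(p, means))

clusters = [frozenset([c]) for c in sorted(data)]
merge_order = []
while len(clusters) > 1:
    best = None
    for a, b in combinations(clusters, 2):
        rest = [c for c in clusters if c not in (a, b)]
        # total within-cluster error SS if a and b were merged
        total = sse(a | b) + sum(sse(c) for c in rest)
        if best is None or total < best[0]:
            best = (total, a, b)
    _, a, b = best
    merge_order.append(tuple(sorted(a | b)))
    clusters = [c for c in clusters if c not in (a, b)] + [a | b]
```

The recorded merge order reproduces the text: first {C&D}, then {A&B}, then everything into one segment.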
K-means clustering
In k-means clustering, the marketing analyst has a rough guess that there
might be, say, five segments, and so tells the computer to derive a five-
cluster solution. The model sets k = 5 and proceeds. Obviously it would be
smart to also check k = 4, k = 6, and perhaps more solutions to see what
number of clusters might provide a partitioning of customers that seems
optimal in terms of parsimoniously fitting the data. The k-means solutions
are not hierarchical, so the four clusters when k = 4 might not be four of
the five clusters when k = 5, for example.
The k-means model begins with random assignment. Figure 10.6 shows
the four customers assigned to one of two clusters; k = 2 for this simple
example. The centroid (multivariate means) are computed for cluster 1,
which consists of customers B and C, and for cluster 2, which consists of
customers A and D. Those means are at the top of Figure 10.7.
Next in Figure 10.7, the distances are computed between each customer
and the means of each cluster. If the customer is closer to the cluster he or
she is already assigned to, the customer stays put. If the customer’s data
more closely resemble the other cluster, the model will move the customer
to that other cluster. The distances are computed in Figure 10.7 for all
four customers, to diagnose whether they belong in the B&C cluster or the
A&D cluster. When the customers are reclassified, there are still k = 2 clusters, but they now consist of customers A&B and C&D.
In Figure 10.8, the means of the new clusters are computed, and a new
assessment is conducted regarding whether each customer is in the optimal
cluster or again should be moved. Figure 10.8 shows that in this second
iteration, each customer is in the cluster with the mean profile that is
closest to his or her own individual data. Thus, no more iterations are necessary, and the final partition is comprised of clusters {A,B} and {C,D}.
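The two k-means iterations just described can be reproduced in a few lines, starting from the random assignment of Figure 10.6 (cluster 1 = {B, C}, cluster 2 = {A, D}):

```python
data = {"A": (3, 2, 1), "B": (5, 3, 0), "C": (0, 1, 1), "D": (1, 0, 2)}

def mean(pts):
    return [sum(col) / len(pts) for col in zip(*pts)]

def sqdist(x, c):
    return sum((a - b) ** 2 for a, b in zip(x, c))

# random initial assignment from Figure 10.6: cluster 0 = {B, C}, cluster 1 = {A, D}
assign = {"A": 1, "B": 0, "C": 0, "D": 1}

while True:
    # recompute centroids, then move each customer to its nearest centroid
    centroids = [mean([data[c] for c in data if assign[c] == k]) for k in (0, 1)]
    new = {c: min((0, 1), key=lambda k: sqdist(data[c], centroids[k]))
           for c in data}
    if new == assign:          # no customer moved: converged
        break
    assign = new

clusters = {k: {c for c in data if assign[c] == k} for k in (0, 1)}
```

One reassignment pass moves A and C, after which the partition {A,B} and {C,D} is stable, matching the walkthrough of Figures 10.6 through 10.8.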
A question naturally arises as to how many clusters exist in the data. It is
answered by looking at the tradeoff of a large number of clusters explain-
ing the data better while the marketing analyst simultaneously seeks a
small number of clusters for purposes of parsimonious understanding
and communication. For example, the end R2 in a k-means can be plotted
against k (for various runs on k) to see the point at which the enhancement
of fit diminishes with the extraction of additional clusters. This issue is
relevant in factor analysis and multidimensional scaling as well and will be
revisited in those contexts.
Factor Analysis
Managers say things like, “If you can’t measure it, you can’t manage it”
or “You manage what you measure.” Quantitative indicators are not
the only means of assessing business practices, but they can be extremely
helpful.
There are two major decisions to be made when conducting a factor
analysis. They are: (1) the number of factors to extract and (2) the rotation
of the factors and their interpretation.
Measuring objective indicators like a car’s gas mileage or speed is
relatively easy, but marketers frequently find themselves in the business
of trying to understand customers’ attitudes and behavioral propensities,
asking survey questions such as, “Do you like the car’s style?” or “Does it
[Figure 10.9: path diagram of the factor model; factors such as F2 = Attitude toward brand load on the observed variables (e.g., Abrand3) through the b weights (b52, b62), and uniqueness factors (e.g., U6) enter through the d weights (d6)]
factor loadings, and they reflect the relationships between each factor and
the six variables; e.g., Aad1 will be expected to have a high loading on F1
(Aad) and a low loading on F2 (Abrand). The Us at the right of the figure
represent the uniqueness factors, and the d weights reflect their impact on
their respective observed variables.
A factor analysis model finds the b’s in equation (10.1) to capture
as much of the information contained in the original X1, X2, . . ., Xp
variables as possible. In the factor analytic context, that means capturing
the pattern of correlations among the p variables in the p×p correlation
matrix, R.
The computer or model proceeds as follows. First, the correlation
matrix is adjusted for the uniqueness factors. The obverse of unique-
ness is communality, or the extent of covariability with other variables.
Communalities are estimated for each variable as the squared multiple correlation (SMC) from predicting each variable from the others, in turn, i.e., R²_1·23. . .p, R²_2·13. . .p, . . ., and R²_p·123. . .(p−1) (then the uniqueness of a variable is 1 minus its communality). The SMCs are imputed into the diagonal of R, and we'll call that adjusted matrix R_adjusted = R_SMC. The difference between the two matrices is depicted in Figure 10.10, for our example data set on p = 6 variables.
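The SMCs can be computed without running p separate regressions, using the identity SMC_i = 1 − 1/(R⁻¹)_ii. The correlation matrix below is hypothetical, built only to mimic the structure of the example (high correlations within the ad and brand sets, moderate correlations between them):

```python
import numpy as np

# Hypothetical 6x6 correlation matrix R in the spirit of Figure 10.10:
# three "ad" and three "brand" variables, correlated 0.96 within each
# set and 0.38 between sets (assumed values, not the book's data).
within_r, between_r = 0.96, 0.38
R = np.full((6, 6), between_r)
R[:3, :3] = within_r
R[3:, 3:] = within_r
np.fill_diagonal(R, 1.0)

# SMC_i = 1 - 1/(R^-1)_ii: the R^2 from regressing each variable
# on the other five.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))

# R_SMC: replace the unit diagonal with the communality estimates.
R_smc = R.copy()
np.fill_diagonal(R_smc, smc)
```

With within-set correlations of 0.96, every SMC exceeds 0.96² ≈ 0.92, which is why the diagonal of R_SMC stays close to 1 in examples like Figure 10.10.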
Next, the RSMC is “factored” or decomposed into matrices of “eigen-
values” and “eigenvectors.” Each eigenvector will form a column of the
vector matrix V and its values, v1, v2, . . ., vp comprise the loadings that
indicate the extent to which the variables X1, X2, . . ., Xp load on the
corresponding factor. The first vector or factor is derived to capture the
[Figure 10.10: the 6 × 6 correlation matrices R and R_SMC for the example data. R has 1.000s on the diagonal; in R_SMC the diagonal entries are replaced by the SMC communality estimates (e.g., 0.922, 0.935, 0.958, 0.940). Correlations are high within the ad set and within the brand set (e.g., 0.948, 0.964, 0.967) and moderate between the two sets (roughly 0.35 to 0.40, e.g., 0.387, 0.402)]
maximum covariability among the Xs. The eigenvalue indicates how much (co)variability that eigenvector captured. The second vector or factor is derived to capture the maximum amount of covariability that remains among the Xs, with the constraint that the second vector be orthogonal to (uncorrelated with) the first. The eigenvalue–eigenvector step is written as R_SMC = VΛV′ (V′ is the transpose of V, and the eigenvalues λ1, λ2, . . ., λp form the diagonal elements of Λ).
The eigensolution is broken in two by defining a matrix B = VΛ^0.5 such that R_SMC = BB′. Recall that, to achieve parsimony, the number of common factors retained (r) is fewer than the number of input variables (p), so that while the matrix R_SMC can be perfectly reproduced by BB′, extracting r factors yields an approximation: R_SMC ≈ B_r B_r′. Figure 10.11 presents the first two eigenvectors as the columns of V, and their corresponding eigenvalues in Λ. (For readers rusty in matrix multiplication, calculate (0.409)(2.00115) + (0.407)(0) to obtain 0.819, all values in the solid boxes, and (0.406)(0) + (−0.425)(1.30979) to obtain −0.557, values in the dashed boxes.) Note the sums of squared elements of eigenvectors (columns of V) are 1.0 (within rounding), whereas the sums of squares for B equal the eigenvalues. The B matrix is the raw, "unrotated" (not to be interpreted) factor loadings matrix.
We will address the issue of rotations and the interpretation of
factor loadings shortly, but we are currently steeped in eigenvalues (and
            V (r = 2)            Λ^0.5 (r = 2)           Unrotated factor loadings, B_r
            v1       v2          √λ1       √λ2                     v1       v2
Aad1        0.409    0.407       2.00115   0              Aad1     0.819    0.534
Aad2        0.412    0.412       0         1.30979        Aad2     0.825    0.541
Aad3        0.402    0.406                                Aad3     0.804    0.532
Abrand1     0.412   -0.392                                Abrand1  0.824   -0.514
Abrand2     0.406   -0.425                                Abrand2  0.813   -0.557
Abrand3     0.408   -0.405                                Abrand3  0.816   -0.531
Sum of squares,                                           Sum of squares,
each column: 0.999   0.999                                each column: 4.005  1.716  (= λ)
                                                          √λ:          2.001  1.310
eigenvectors), and they can be used to answer the question, "How many factors are there?" or "What is r?"
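The eigen-decomposition step is compact in code. The matrix below is hypothetical (it is not R_SMC from Figure 10.10), but it is constructed so that its two leading eigenvalues land near the 4.005 and 1.716 of Figure 10.11:

```python
import numpy as np

# Hypothetical reduced correlation matrix: communalities of 0.94 on the
# diagonal, 0.96 within the ad and brand sets, 0.38 between sets.
M = np.full((6, 6), 0.38)
M[:3, :3] = 0.96
M[3:, 3:] = 0.96
np.fill_diagonal(M, 0.94)

# Eigendecomposition M = V Lambda V'; eigh returns ascending eigenvalues,
# so reverse to get the dominant factors first.
lam, V = np.linalg.eigh(M)
lam, V = lam[::-1], V[:, ::-1]

# Unrotated loadings for r = 2 retained factors: B_r = V_r Lambda_r^0.5
B = V[:, :2] * np.sqrt(lam[:2])
```

The column sums of squares of B equal the retained eigenvalues, mirroring the "sum of squares = λ" row of Figure 10.11. (Note that reduced correlation matrices like this one can have small negative trailing eigenvalues; only the retained factors need positive ones.)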
[Figure 10.12: scree plot of eigenvalues (vertical axis, 0.5 to 2.0) against the number of factors (1, 2, 3, . . .); the annotation "take 3" marks the break in the curve beyond which additional factors add little]
[Figure 10.13: the six variables (Aad1–Aad3, Abrand1–Abrand3) plotted in factor space, showing the unrotated factors F1 and F2, the orthogonally rotated axes F1′ and F2′, and the obliquely rotated axes F1″ and F2″]
Factor Rotations
In the unrotated loadings matrix B, the F2 loadings on the three ad Xs are positive while those on the three brand Xs are negative. The loadings indicate that F1 reflects all six
variables, and F2 reflects some kind of contrast between the ad and brand
variables. That interpretation isn’t very enlightening.
One means of rotating factors functions like operating a spinner in a
children’s board game—we take the original factors and rotate the axes a
bit clockwise until the axes are in a location we like better. If we spin the
axes labeled F1 and F2 through an approximate angle of q = 45°, then the
new axes would appear where there are dashed lines labeled F1′ and F2′.
That rotation is said to be an “orthogonal” rotation because F1′ and F2′
are still uncorrelated (the axes are perpendicular to each other). When the
Xs are projected onto these new axes, the rotated factors, it is easier to see that F1′ is defined by the three brand variables having high loadings (with the three ad variables having relatively lower loadings), and F2′ is defined by the three ad variables.
An orthogonal rotation is achieved by a simple transformation. We
can estimate that the angle from F1 to the placement of F1′ is about
45°. In Figure 10.14, the raw, unrotated factor loadings matrix B from
Figure 10.11 is repeated for convenience. The small matrix in the center
contains the sine and cosine of the 45° angle, and the matrix multiplication
yields the orthogonally rotated factors, F1′ and F2′. The matrix at the right
contains the new factor loadings. Note its interpretation, consistent with
Figure 10.13, indicates that F1′ is defined by the brand variables, and F2′
by the ad variables. (It is standard to use a cut-off of 0.3 to determine the
loadings that are large, associated with the variables that help to define a
factor, versus those loadings that are so small as to be sampling variability
or noise.) The most frequently used and best performing orthogonal rotation is the varimax rotation.
Unrotated factors, B_r        Transformation (θ ≈ 45°)                   Orthogonally rotated factors
           F1       F2                                                             F1′      F2′
Aad1       0.819    0.534     [  cos θ   sin θ ]   [  0.708   0.706 ]     Aad1     0.203    0.956
Aad2       0.825    0.540     [ -sin θ   cos θ ] = [ -0.706   0.708 ]     Aad2     0.202    0.965
Aad3       0.804    0.532                                                 Aad3     0.193    0.945
Abrand1    0.824   -0.514                                                 Abrand1  0.946    0.219
Abrand2    0.813   -0.557                                                 Abrand2  0.969    0.180
Abrand3    0.816   -0.531                                                 Abrand3  0.953    0.201
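The rotation is a single matrix product. Using the unrotated loadings from Figure 10.11 and an exact 45° angle (the figure rounds the sines and cosines to 0.706/0.708):

```python
import numpy as np

# Unrotated loadings B_r from Figure 10.11.
B = np.array([
    [0.819,  0.534],   # Aad1
    [0.825,  0.540],   # Aad2
    [0.804,  0.532],   # Aad3
    [0.824, -0.514],   # Abrand1
    [0.813, -0.557],   # Abrand2
    [0.816, -0.531],   # Abrand3
])

theta = np.deg2rad(45.0)
# Transformation matrix (numerically about [[0.708, 0.706], [-0.706, 0.708]]).
T = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

rotated = B @ T   # orthogonally rotated loadings F'
```

Because T is orthonormal, the rotation leaves communalities intact; only the axes move, and the rotated row for Aad1 comes out near (0.203, 0.956) as in the figure.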
Orthogonally rotated factors        Oblique factor loadings
           F1′      F2′                        F1″      F2″
Aad1       0.203    0.956            Aad1      0.008    0.874
Aad2       0.202    0.965            Aad2      0.008    0.898
Aad3       0.193    0.945            Aad3      0.007    0.843
Abrand1    0.946    0.219            Abrand1   0.848    0.010
Abrand2    0.969    0.180            Abrand2   0.910    0.006
Abrand3    0.953    0.201            Abrand3   0.865    0.008
Factor correlation: ϕ = 0.385
produce?” and “Do you believe that Whole Foods offers good value?”
juxtaposed with “How important is freshness when you shop for grocer-
ies?” and “How important is value?” Means over survey respondents
are calculated and plotted to see whether a brand excels on dimensions
that consumers consider to be important. Many brand attributes may be
plotted, and competitor brands may be superimposed on the plots.
This approach to creating perceptual maps is appealing for its simplic-
ity. Yet the map can only reflect the attributes measured on the survey,
and if consumers distinguish among brands using features and benefits
that the brand manager does not anticipate, those features will not be
reflected in the brand positions.
By comparison, perceptual maps derived from multidimensional scaling
(MDS) pose an omnibus question to consumers, simply, “How similar are
brands A and B?” (asked for all pairs of brands). Consumers proceed to
make brand comparisons along whatever attributes they care about, and
marketing managers infer them using MDS.
The heart of the MDS model is the analogy between distance and (dis)
similarity. A map is created so that brands thought to be similar will be
represented as points close together on the map, and brands thought to
be different will be farther apart. The map is studied for its configuration
as well as its dimensions. The configuration (i.e., relative brand locations) helps inform numerous marketing questions: market structure analysis, given that close brands are the most competitive and likely substitutes; verification that marketing communications have properly positioned a brand vis-à-vis its competition; the necessity for repositioning; and strategic opportunities for brand development where there currently exist empty spaces in the map. The dimensions in a perceptual map can also be informative, just as labels of North, South, East, and West are in a geo-map, and we'll show how to find their perceptual equivalents.
There are several major decisions to be made when conducting an MDS.
They are: (1) the nature of the data to be modeled, (2) the MDS model to
be used, (3) the number of dimensions to extract, and (4) the interpretation
of the configuration and dimensions. We discuss each.
Dissimilarities Data
If the basic model or metaphor underlying MDS is that distances are used
to represent dissimilarities, the marketing analyst usually simply asks con-
sumers to fill out survey questions of the form, “How similar are these two
brands?” cycling through all pairs of p brands. Consumers use a scale such
as 1 = “very similar” and 9 = “very different.”
MDS Models
With proximities data in hand, the MDS model begins to fit them onto a
map. Say consumers think brands A and B are very similar (call the dissimilarity judgement δ_AB, and say δ_AB = 1), B and C a little less similar (δ_BC = 2), and A and C still less similar (δ_AC = 3). The brands could be placed along a line, with A at point 1, B at point 2, and C at point 4. That 1-d model would capture the data perfectly with d_AB = 1, d_BC = 2, d_AC = 3.
Naturally, real data are noisier and real brands are more complex, so the data are unlikely to be fit perfectly in 1-d. For example, imagine the data were δ_AB = 1, δ_BC = 2, δ_AC = 2.24. These dissimilarity judgments wouldn't be represented perfectly in 1-d, but they would be in 2-d (the two smaller ds forming the legs of a right triangle and the third its hypotenuse: 1² + 2² ≈ 2.24², per the Pythagorean theorem).
Alternatively, we can assume that there is likely measurement error
in consumer judgements, and note that while the values are different, these
ds still follow the same rank order as in the 1-d example. If we take the ds at face value, we are fitting a "metric" MDS model, whereas if we simply wish to render their relative size, we would fit a "nonmetric" MDS model.
In the classic metric MDS model, the data values δ_ij, representing the dissimilarity judgment for brands i and j, are squared and centered by removing the effects of the row means, column means, and the grand mean (see Figure 10.16):
δ_ij                      δ²_ij                          Row means, δ²_i·
    A   B   C   D             A    B    C    D
A   0                     A   0    9    36   25          17.5
B   3   0                 B   9    0    9    16          8.5
C   6   3   0             C   36   9    0    25          17.5
D   5   4   5   0         D   25   16   25   0           16.5
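A sketch of the double-centering and eigensolution, using the δ_ij of Figure 10.16 (the assertion-worthy intermediate quantities, such as the row means, match the figure):

```python
import numpy as np

# Dissimilarities delta_ij from Figure 10.16 (brands A-D).
delta = np.array([
    [0, 3, 6, 5],
    [3, 0, 3, 4],
    [6, 3, 0, 5],
    [5, 4, 5, 0],
], dtype=float)

d2 = delta ** 2
row = d2.mean(axis=1, keepdims=True)   # row means: 17.5, 8.5, 17.5, 16.5
col = d2.mean(axis=0, keepdims=True)   # column means (same, by symmetry)
grand = d2.mean()                      # grand mean: 15.0

# Double centering: D_ij = -0.5 * (delta^2_ij - rowmean_i - colmean_j + grand)
D = -0.5 * (d2 - row - col + grand)

# D = X X' is then eigendecomposed and X = V Lambda^(1/2) gives coordinates.
lam, V = np.linalg.eigh(D)
lam, V = lam[::-1], V[:, ::-1]                       # descending order
X = V[:, :2] * np.sqrt(np.clip(lam[:2], 0, None))    # 2-d coordinates
```

Double centering forces every row of D to sum to zero (the configuration is centered at the origin), which is why the recovered map is only determined up to reflection and rotation, as noted below.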
[Figure 10.17: configuration plot, with C, B, and A arrayed along dimension I and D below them on dimension II]

Configuration coordinates, X:
       I      II
A     1.2    0.5
B     0.0    0.5
C    -1.2    0.5
D     0.0   -1.5
The matrix D is factored into D = XX′, where the matrix X contains the coordinates for p points (brands) in r-dimensional space (thus p × r, read "p by r," meaning p rows and r columns). This problem is solved as D = VΛV′ (an eigensolution with V being the matrix of eigenvectors and Λ the diagonal matrix of eigenvalues, much like in factor analysis). The matrix of MDS coordinates is defined X = VΛ^(1/2).
Figure 10.17 contains the 2-d solution (after standardizing the dimensions), both plotted and in matrix form. Given that the MDS model works
on configurations of distances, the model would be equally valid if the “T”
appearance of the four brands were reflected vertically or horizontally, or
rotated through an angle.
By comparison, in nonmetric MDS, the input data are translated
to ranks and then modeled. In addition, whereas for metric MDS, the
t = dimension, and k = consumer. The model then produces the usual p×r
matrix X, containing the coordinates of the brands in space, along with a
N×r matrix W, which contains the “subject weights” wkt representing the
weight that person k puts on the tth dimension. Those subject weights can
then be correlated with any additional information we had collected on the
consumers, such as demographic information or other attitudinal ratings
to learn, say, that consumers who weight dimension 1 heavily tend to be
male, whereas the consumers for whom dimension 2 is more salient are
older, for example.
As is true for many statistical models (e.g., as we discussed for factor analysis),
MDS has its own version of the tradeoff between model fit and the parsimony
of the model. Ideally, the perceptual map fits the data “as best as possible”
and does so in “minimal dimensionality.” As more dimensions are extracted,
the data fit improves, but parsimony declines. Furthermore, human beings
are so used to seeing 2-d geo-maps that 2-d perceptual maps dominate as
well, even if 3-d or 4-d perceptual maps might describe the data better.
Different MDS models use different measures of fit. Classic metric
MDS often produces a series of eigenvalues, and INDSCAL usually
produces a model R2. Both of these are “goodness of fit” indices (higher
numbers mean better fits). Nonmetric MDS usually produces a measure
called “Stress,” and it is a “badness of fit” index.
[Figure 10.18: two scree-type plots of a fit index (vertical axis, roughly 0.1 to 0.2) against the number of dimensions (1, 2, 3, . . .), the MDS run once as 1-d, another run as 2-d, etc.; one panel for a goodness-of-fit index and one for a badness-of-fit index]
Figure 10.18 shows examples of plots for each. For either kind of fit
index, the goal is still to identify a break in the curve. For goodness-of-
fit indices, the number of dimensions to extract lies to the left (or above)
the break; the argument of diminishing returns suggests that taking yet
another dimension does not sufficiently enhance the fit. For badness of
fit indices, the number of dimensions to extract lies to the right (or below)
the break; the argument of diminishing returns suggests that taking yet
another dimension does not sufficiently reduce the lack of fit.
[Perceptual map of the six soft drinks: Diet Coke and Diet Pepsi sit high on dimension II, Coke and Pepsi to the east on dimension I, and 7Up and Sprite to the west]

             Coordinates on      Standardized       diet             cola
             dimensions          coordinates        (0 = nondiet,    (0 = uncola,
             I        II         I        II        1 = diet)        1 = cola)
Coke         0.5     -0.5        0.641   -0.862     0                1
Pepsi        0.6     -0.4        0.808   -0.637     0                1
Diet Coke    0.4      0.5        0.474    1.387     1                1
Diet Pepsi   0.5      0.4        0.641    1.162     1                1
7Up         -0.7     -0.3       -1.366   -0.412     0                0
Sprite      -0.6     -0.4       -1.198   -0.637     0                0
Mean:                            0.000    0.000
Standard deviation:              1.000    1.000
z-scores. Those two standardized columns will serve as the two predictor
variables in the regressions. The remaining columns represent attributes of
the brands—here they are binary just for simplicity.
One multiple regression is run for each attribute. When running
the regression in Figure 10.20 on the diet versus non-diet property,
specifically d̂iet = b₁ z_dimI + b₂ z_dimII, the regression R² = 0.987,
and the coefficient estimates are d̂iet = 0.117 z_dimI + 0.949 z_dimII. For
the cola–uncola attribute, specifically ĉola = b₁ z_dimI + b₂ z_dimII,
the regression R² = 0.993, and the coefficient estimates are
ĉola = 0.964 z_dimI + 0.086 z_dimII.
The betas from these regressions are the coordinates for the head of an
attribute vector emanating from the origin. In Figure 10.21, we see that the
[Figure 10.21: the perceptual map of the six brands, with the cola attribute vector and a marker for segment 1 overlaid.]
cola attribute vector points roughly to the "east," indicating the direction
in which that attribute is maximized (brands farther east are those per-
ceived to have more of that attribute). Similarly, the diet attribute vector
points almost due north, such that brands at the top of the perceptual map
are the diet drinks; by implication, heading through the origin in the
opposite direction, the brands toward the south are the non-diet drinks.
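A sketch of these property-fitting regressions, using the standardized coordinates and binary attributes from the table above. The exact coefficients depend on details of the specification (for example, how the intercept is handled), so they need not reproduce the chapter's reported values, but the directions of the resulting attribute vectors do:

```python
import numpy as np

# Standardized coordinates (dimensions I and II) and binary attributes,
# taken from the table above.
z = np.array([[ 0.641, -0.862],   # Coke
              [ 0.808, -0.637],   # Pepsi
              [ 0.474,  1.387],   # Diet Coke
              [ 0.641,  1.162],   # Diet Pepsi
              [-1.366, -0.412],   # 7Up
              [-1.198, -0.637]])  # Sprite
diet = np.array([0, 0, 1, 1, 0, 0])
cola = np.array([1, 1, 1, 1, 0, 0])

def attribute_vector(attribute):
    """Regress an attribute on the standardized dimensions; the slope
    coefficients give the head of the attribute vector from the origin."""
    X = np.column_stack([np.ones(len(attribute)), z])  # intercept, dim I, dim II
    coefs, *_ = np.linalg.lstsq(X, attribute, rcond=None)
    return coefs[1:]                                   # drop intercept: (b_I, b_II)

b_diet = attribute_vector(diet)   # dominated by dimension II: points "north"
b_cola = attribute_vector(cola)   # dominated by dimension I: points "east"
```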
Chapter Summary
Notes
model formulation looks similar, but it has no uniqueness factors, in part because users
typically do not care about measurement error on the variables.
References
Cluster Analysis
Aggarwal, Charu C. (2013), Data Clustering: Algorithms and Applications, Boca Raton, FL:
Chapman & Hall/CRC.
Aldenderfer, Mark S. and Roger K. Blashfield (1984), Cluster Analysis, Newbury Park, CA:
Sage.
Everitt, Brian S., Sabine Landau, Morven Leese, and Daniel Stahl (2011), Cluster Analysis,
5th ed., New York: Wiley.
King, Ronald S. (2014), Cluster Analysis and Data Mining: An Introduction, Herndon, VA:
Mercury Learning and Information.
McCutcheon, Allan L. (1987), Latent Class Analysis, Newbury Park, CA: Sage.
Romesburg, Charles (2004), Cluster Analysis for Researchers, Lulu.
Smithson, Michael and Jay Verkuilen (2006), Fuzzy Set Theory, Thousand Oaks, CA:
Sage.
Factor Analysis
Cliff, Norman (1987), Analyzing Multivariate Data, San Diego: Harcourt Brace Jovanovich.
Comrey, Andrew L. and Howard B. Lee (1992), A First Course in Factor Analysis, 2nd ed.,
Hillsdale, NJ: Erlbaum.
Fabrigar, Leandre R. and Duane T. Wegener (2011), Exploratory Factor Analysis, New
York: Oxford University Press.
Gorsuch, Richard L. (1983), Factor Analysis, 2nd ed., Hillsdale, NJ: Erlbaum.
Iacobucci, Dawn (1994), “Classic Factor Analysis,” in Richard Bagozzi (ed.), Principles of
Marketing Research, Cambridge, MA: Blackwell, 279–316.
Kim, Jae-On and Charles W. Mueller (1978a), Introduction to Factor Analysis: What It Is and
How to Do It, Beverly Hills, CA: Sage.
Kim, Jae-On and Charles W. Mueller (1978b), Factor Analysis: Statistical Methods and
Practical Issues, Beverly Hills, CA: Sage.
Long, J. Scott (1983), Confirmatory Factor Analysis, Newbury Park, CA: Sage.
Pett, Marjorie A., Nancy R. Lackey, and John J. Sullivan (2003), Making Sense of Factor
Analysis: The Use of Factor Analysis for Instrument Development in Health Care Research,
Thousand Oaks, CA: Sage.
Thompson, Bruce (2004), Exploratory and Confirmatory Factor Analysis, Washington, DC:
American Psychological Association.
Walkey, Frank and Garry Welch (2010), Demystifying Factor Analysis: How it Works and
How to Use It, Bloomington, IN: Xlibris.
Multidimensional Scaling
Borg, Ingwer and Patrick J. F. Groenen (2005), Modern Multidimensional Scaling: Theory
and Applications, New York: Springer.
Borg, Ingwer, Patrick J. F. Groenen, and Patrick Mair (2012), Applied Multidimensional
Scaling, New York: Springer.
Clausen, Sten Erik (1998), Applied Correspondence Analysis, Thousand Oaks, CA: Sage.
Cox, Trevor F. and Michael A. A. Cox (2000), Multidimensional Scaling, 2nd ed., Boca
Raton, FL: Chapman & Hall/CRC.
Coxon, A. P. M. (1982), The User’s Guide to Multidimensional Scaling, Exeter, UK:
Heinemann.
Davison, Mark L. (1983), Multidimensional Scaling, New York: Wiley.
DeSarbo, Wayne and Jaewun Cho (1989), “A Stochastic Multidimensional Scaling Vector
Threshold Model for the Spatial Representation of ‘Pick Any/N’ Data,” Psychometrika,
54(1), 105–129.
DeSarbo, Wayne, Ajay K. Manrai, and Lalita A. Manrai (1994), “Latent Class
Multidimensional Scaling: A Review of Recent Developments in the Marketing and
Psychometric Literature,” in Richard P. Bagozzi (ed.), Advanced Methods of Marketing
Research, New York: Blackwell Publishers, 190–222.
Green, Paul E., Frank J. Carmone Jr., and Scott M. Smith (1989), Multidimensional Scaling:
Concepts and Applications, Boston: Allyn & Bacon.
Greenacre, Michael (2007), Correspondence Analysis in Practice, 2nd ed., New York:
Chapman & Hall/CRC Interdisciplinary Statistics.
Kruskal, Joseph B. and Myron Wish (1978), Multidimensional Scaling, Beverly Hills, CA:
Sage.
General References
Grimm, Laurence G. and Paul R. Yarnold (1995), Reading & Understanding Multivariate
Statistics, Washington, DC: American Psychological Association.
Iacobucci, Dawn (2017), Marketing Models: Multivariate Statistics and Marketing Analytics,
3rd ed., Nashville, TN: Earlie Lite Books.
Johnson, Richard A. and Dean W. Wichern (2007), Applied Multivariate Statistical Analysis,
6th ed., Upper Saddle River, NJ: Pearson.
Kachigan, Sam Kash (1991), Multivariate Statistical Analysis: A Conceptual Introduction,
2nd ed., New York: Radius Press.
Rencher, Alvin C. and William F. Christensen (2012), Methods of Multivariate Analysis, 3rd
ed., New York: Wiley.
Tabachnick, Barbara G. and Linda S. Fidell (2012), Using Multivariate Statistics, 6th ed.,
Upper Saddle River, NJ: Pearson.
MACHINE LEARNING
AND BIG DATA
feature selection and efficient optimization help achieve scale and effi-
ciency. Scalability is increasingly important for marketers because many
of these algorithms need to run in real time.
To illustrate these points, consider the problem of predicting whether
a user will click on an ad. We do not have a comprehensive theory of
users’ clicking behavior. We can, of course, come up with a parametric
specification for the user’s utility of an ad, but such a model is unlikely
to accurately capture all the factors that influence the user’s decision to
click on a certain ad. The underlying decision process may be extremely
complex and potentially affected by a large number of factors, such as
all the text and images in the ad, and the user’s entire previous browsing
history. ML methods can automatically learn which of these factors affect
user behavior and how they interact with each other, potentially in a
highly non-linear fashion, to derive the best functional form that explains
user behavior virtually in real time. ML methods typically assume a model
or structure to learn, but they use a general class of models that can be
very rich.
Broadly speaking, ML models can be divided into two groups: super-
vised learning and unsupervised learning. Supervised learning requires
input data that has both predictor (independent) variables and a target
(dependent) variable whose value is to be estimated. By various means,
the process learns how to predict the value of the target variable based
on the predictor variables. Decision trees, regression analysis, and neural
networks are examples of supervised learning. If the goal of an analysis
is to predict the value of some variable, then supervised learning is used.
Unsupervised learning does not identify a target (dependent) variable,
but rather treats all of the variables equally. In this case, the goal is not to
predict the value of a variable, but rather to look for patterns, groupings,
or other ways to characterize the data that may lead to an understanding
of the way the data interrelate. Cluster analysis, factor analysis (principal
components analysis), EM algorithms, and topic modeling (text analysis)
are examples of unsupervised learning.
In this chapter, we first discuss the bias–variance tradeoff and regu-
larization. Then we present a detailed discussion of two key supervised
learning techniques: (1) decision trees and (2) support vector machines
(SVM). We focus on supervised learning, because marketing researchers
are already familiar with many of the unsupervised learning techniques.
We then briefly discuss recent applications of decision trees and SVM in
the marketing literature. Next, we present some common themes of ML
such as feature selection, model selection, and scalability, and, finally, we
conclude the chapter.
Bias–Variance Tradeoff
The last term, σ²_ε, is inherent noise in the data, so it cannot be minimized
and is not affected by our choice of f̂(x). The first term is the squared
bias of the estimator; the second term is the variance. Both the bias
and the variance contribute to predictive error. Therefore, when we
are trying to come up with the best predictive model, an inherent tradeoff
exists between the bias and the variance of the estimator. Restricting
attention to unbiased estimators forecloses this tradeoff entirely. We refer
readers to Hastie et al. (2009) for the formal derivation of the above.
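The pull between the two terms can be seen in a small simulation (all data hypothetical): in-sample error alone always rewards the more flexible model, which is why it cannot, by itself, identify the model with the lowest predictive error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the true signal is sin(2*pi*x); we repeatedly draw
# noisy samples and fit polynomials of low and high degree.
x = np.linspace(0.0, 1.0, 40)

def avg_train_error(degree, n_sims=100):
    """Average in-sample squared error of a degree-`degree` polynomial fit."""
    total = 0.0
    for _ in range(n_sims):
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)
        coefs = np.polyfit(x, y, degree)
        total += np.mean((np.polyval(coefs, x) - y) ** 2)
    return total / n_sims

tr_rigid = avg_train_error(1)     # high bias, low variance
tr_flexible = avg_train_error(9)  # low bias, high variance
# In-sample error always falls as the model grows more flexible; the point of
# the decomposition is that out-of-sample error need not, because the variance
# term grows even as the squared bias shrinks.
```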
To allow for a tradeoff, we introduce the concept of regularization.
Instead of minimizing in-sample error alone, we introduce an additional
term and solve the following problem:
minimize_f  Σ_{i=1}^n (y_i − f̂(x_i))² + λR(f̂)    (11.2)
[Figure 11.1: a decision tree that splits first on x₁ ≤ t₁, yielding leaf y = 1, and then on x₂ ≤ t₂, yielding leaves y = 2 and y = 3.]
of the predictions in the case of the decision tree being used in a regres-
sion setting, or the misclassification rate in a classification setting. The
split procedure evaluates the costs of using all of the input variables at
every possible value that a given input variable can assume, and chooses a
variable ( j*) and the value (u*) that yields the lowest cost. The stopping
criteria for the tree construction can either be based on the cost function or
on desired properties of the tree structure. For example, tree construction
can be stopped when the reduction in cost as a consequence of introduc-
ing a new tree node becomes small or when the tree grows to a predefined
number of leaves or a predefined depth.
The greedy algorithm implies that at each split, the previous splits are
taken as given, and the cost function is minimized going forward. For
instance, at node B in Figure 11.1, the algorithm does not revisit the split
at node A. However, it considers all possible splits on all the variables at
each node, even if some of the variables have already been used at previous
nodes. Thus, the split points at each node can be arbitrary, the tree can
be highly unbalanced, and variables can potentially repeat at later child
nodes. All of this flexibility in tree construction can be used to capture a
complex set of flexible interactions, which are learned using the data.
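A minimal sketch of the exhaustive split search at a single node of a regression tree (toy data; the helper names are our own):

```python
def sse(ys):
    """Sum of squared errors around the mean: the regression-tree cost."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def best_split(X, y):
    """Evaluate every variable j at every candidate threshold t and return
    the (j*, t*) pair with the lowest total cost over the two child nodes."""
    n_features = len(X[0])
    best = (None, None, float("inf"))
    for j in range(n_features):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            cost = sse(left) + sse(right)
            if cost < best[2]:
                best = (j, t, cost)
    return best

X = [[1.0], [2.0], [10.0], [11.0]]
y = [0.0, 0.0, 1.0, 1.0]
j_star, t_star, cost = best_split(X, y)
print(j_star, t_star, cost)  # prints 0 6.0 0.0: a clean split between 2 and 10
```

The greedy algorithm simply applies this search recursively to each child node, never revisiting earlier splits.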
CART is popular in the ML literature for many reasons. The main
advantage of a simple decision tree is that it is very interpretable—infer-
ring the effect of each variable and its interaction effects is easy. Trees
can accept both continuous and discrete explanatory variables, can work
with variables that have many different scales, and allow any number of
interactions between features (Murphy, 2012). A key advantage of CART
over regression models is the ability to capture rich non-linear patterns in
data, such as disjunctions of conjunctions (Hauser et al., 2010). CART
models are also robust to errors, both in the output and in the explanatory
variables, as well as missing explanatory variable values for some of the
observations. Further, CART can do automatic variable selection in the
sense that CART uses only those variables that improve accuracy in
the regression or classification task. Finally, because the CART technique
is non-parametric, it does not require data to be linearly separable, and
outliers do not unduly influence its accuracy. These features make CART
the best off-the-shelf classifier available.
Nevertheless, CART has accuracy limitations because of its discon-
tinuous nature and because it is trained using greedy algorithms and thus
can converge to a local optimum. Also, decision trees tend to overfit
data and provide the illusion of high accuracy on training data, only to
underperform on the out-of-sample data, particularly on small training
sets. Some of these drawbacks can be addressed (while preserving all of the
advantages) through boosting, which gives us MART.
Boosting or MART
where fk (x, bk) is the function modeled by the kth regression tree and ak
is the weight associated with the kth tree. Both fk (.) s and aks are learned
during the training or estimation.
We choose fk (x, bk) to minimize a prespecified cost function, which is
usually the least-squares error in the case of regressions and an entropy or
logit loss function in the case of classification or discrete choice models.
Given the set of data points {(x_i, y_i) | 1 ≤ i ≤ n} and a loss function L(y_i, ŷ_i)
corresponding to making a prediction of ŷ_i for y_i, the boosting technique
minimizes the average value of the loss function. It does so by starting
with a base model F₁(x) and incrementally refining the model in a greedy
fashion:
where γ_k is the step length chosen so as to best fit the residual value:
Note the gradients are easy to compute for the traditional loss functions.
For example, when the loss function is the squared-error loss function
½(y_i − F(x_i))², the gradient is simply the residual y_i − F(x_i). In general,
boosting techniques can accommodate a broad range of loss functions and
can be customized by plugging in the appropriate functional form for the
loss function and its gradient.
MART can be viewed as performing gradient descent in the function
space using “shallow” regression trees (i.e., trees with a small number
of leaves). MART works well because it combines the positive aspects
of CART with those of boosting. CART, especially shallow regression
trees, tends to have high bias but low variance. Boosting CART models
addresses the bias problem while retaining the low variance. Thus, MART
produces high-quality classifiers.
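A bare-bones sketch of this boosting loop for squared-error loss, with stumps (depth-1 trees) as the weak learners; the data are hypothetical, and real MART implementations add line searches for the step length, deeper trees, and other refinements:

```python
def fit_stump(x, residuals):
    """Fit a depth-1 regression tree (stump) to 1-d inputs by least squares."""
    best = None
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        cost = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if best is None or cost < best[0]:
            best = (cost, t, ml, mr)
    _, t, ml, mr = best
    return lambda xi: ml if xi <= t else mr

def mart(x, y, n_trees=50, shrinkage=0.1):
    """Gradient boosting with squared-error loss: each stump is fit to the
    current residuals y_i - F(x_i), which are the negative gradients."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]
    return lambda xi: sum(shrinkage * s(xi) for s in stumps)

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]   # a nonlinear target
F = mart(x, y)
train_mse = sum((F(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(x)
```

Each shallow tree on its own is badly biased, but the ensemble of fifty small corrections tracks the nonlinear target closely.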
Classification Problems
Linear Classifiers
[Figure 11.2: (a) many linear classifiers can correctly classify this set of points; (b) the maximum-margin classifier, shown with the optimal hyperplane and the maximum margin, is the strongest.]
be to find a line that passes as far as possible from all the points, as shown
in Figure 11.2b.
That is, we seek the classifier that gives the largest minimum distance
to all the training examples; this distance is called the “margin” in SVM
theory. For now, we rely on intuition to motivate this choice of the clas-
sifier; theoretical support for this choice is provided below. The optimal
separating hyperplane maximizes the margin of the training data, as in
Figure 11.2b. The training examples that are closest to the hyperplane are
called support vectors. Note that the margin in Figure 11.2b, M, is twice
the distance to the support vectors. The distance between a point xi and
the hyperplane (b, b0) is given by
distance = |β₀ + βᵀx_i| / ‖β‖    (11.11)

Thus, the margin is given by M = 2|β₀ + βᵀx_i| / ‖β‖, which is twice the dis-
tance to the support vectors. The maximum-margin classifier is found by solving

minimize_{β, β₀}  ½‖β‖²    (11.12)

subject to y_i(β₀ + βᵀx_i) ≥ 1, i = 1, . . ., N.
Because the optimal separating hyperplane is drawn as far away from the
training examples as possible, the MML is only robust to noisy predictors,
not to noisy labels. Because it does not allow for misclassified examples,
even a single misclassification error in the training data can radically affect
the solution. To address this problem, the above approach can be relaxed
to allow for misclassified examples. The main idea is this: instead of con-
straining the problem to classify all the points correctly, explicitly penalize
incorrectly classified points. The magnitude of the penalty attached to
a misclassification will determine the tradeoff between misclassifying a
training example and the potential benefit of improving the classification
of other examples. The penalization is done by introducing slack variables
for each constraint in the optimization problem in equation (11.12), which
measure how far on the wrong side of the hyperplane a point lies—the
degree to which the margin constraint is violated. The optimization
problem then becomes
minimize_{β, β₀, ξ}  ½‖β‖² + C Σ_{i=1}^N ξ_i    (11.13)
Now, if the margin constraint is violated, we will have to set ξ_i > 0 for
some data points. The penalty for this violation is given by C·ξ_i, and it is
traded off with the possibility of decreasing ‖β‖². Note that for linearly
separable data, if C is set to a sufficiently large value, the optimal solution
will have all the ξ_i = 0, corresponding to the MML classifier. In general,
the larger the value of C, the fewer margin constraints will be violated.
Users typically choose the value of C by cross-validation. Note that in
this more general formulation, many more data points affect the choice
of the hyperplane: in addition to the points that lie on the margin, the
misclassified examples also affect it. We will come back to this formulation
shortly and see how this formulation can be seen from the point of view
of regularization.
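The regularization view can be made concrete by minimizing the penalized hinge-loss objective of equation (11.13) directly with (sub)gradient descent. The tiny data set below is hypothetical, and this is a sketch of the primal objective, not of the dual solvers used in practice:

```python
# Hypothetical 2-d training set: three points per class, linearly separable.
X = [(2.0, 2.0), (3.0, 2.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -2.0), (-2.0, -3.0)]
y = [1, 1, 1, -1, -1, -1]

def train_soft_margin(X, y, C=1.0, lr=0.01, epochs=500):
    """Minimize 0.5*||b||^2 + C * sum(max(0, 1 - y_i*(b0 + b.x_i)))
    by (sub)gradient descent: the regularization view of the soft margin."""
    b = [0.0, 0.0]
    b0 = 0.0
    for _ in range(epochs):
        gb, gb0 = [b[0], b[1]], 0.0   # gradient of the 0.5*||b||^2 term
        for xi, yi in zip(X, y):
            margin = yi * (b0 + b[0] * xi[0] + b[1] * xi[1])
            if margin < 1:            # hinge is active: point violates the margin
                gb[0] -= C * yi * xi[0]
                gb[1] -= C * yi * xi[1]
                gb0 -= C * yi
        b = [b[0] - lr * gb[0], b[1] - lr * gb[1]]
        b0 -= lr * gb0
    return b, b0

b, b0 = train_soft_margin(X, y)
preds = [1 if b0 + b[0] * xi[0] + b[1] * xi[1] > 0 else -1 for xi in X]
```

Raising C puts more weight on the hinge penalty (fewer violated margins); lowering it puts more weight on the ½‖β‖² regularizer.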
The above problem is also a quadratic optimization problem that has
a convex objective function and therefore can be efficiently solved. One
common method for solving it is by introducing Lagrange multipliers and
forming a dual problem. The Lagrange function resulting from the opti-
mization problem in equation (11.13) is obtained by introducing Lagrange
multipliers to the objective function for the constraints:
(11.14)
β = Σ_{i=1}^N α_i y_i x_i ,    (11.15)

subject to 0 ≤ α_i ≤ C and Σ_{i=1}^N α_i y_i = 0.
Note that in the above optimization problem, the input features xi only
enter via inner products. This property of SVM is critical to the com-
putational efficiency for nonlinear classifiers. Next, we show how the
SVM machinery can be used to efficiently solve nonlinear classification
problems.
Suppose now that our data are not separable by a linear boundary, but can
be separated by a non-linear classifier, such as in Figure 11.3a. The kernel
[Figure 11.3: (a) points that cannot be correctly separated by a linear classifier, but the nonlinear classifier f(x) = −2 + x₁² + x₂² separates them perfectly; (b) the same points in the transformed (z₁, z₂) space are linearly separable.]
method, also known as the “kernel trick,” is a way to transform the data
into a different space, and construct a linear classifier in this space. If the
transformation is non-linear, and the transformed space is high dimen-
sional, a classifier that is linear in the transformed space may be nonlinear
in the original input space.
Consider the example of the circle shown in Figure 11.3a, which
represents the equation x₁² + x₂² = 2. That is, the non-linear classifier
f(x) = −2 + x₁² + x₂² separates the data set perfectly. Let us now apply the
following nonlinear transformation to x:
Σ_{i=1}^N α_i − ½ Σ_{i=1}^N Σ_{i′=1}^N α_i α_{i′} y_i y_{i′} ⟨φ(x_i), φ(x_{i′})⟩.    (11.18)
Thus, the solution involves φ(x) only through inner products. Therefore,
we never need to specify the transform φ(x), but only the function that
computes inner products in the transformed space:
The function K (x, xr) is known as the kernel function. The most com-
monly used choices for K are polynomial kernels:
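A sketch of the kernel idea on the circle example, with hypothetical points: the explicit map φ makes the data linearly separable, and a polynomial kernel computes inner products in a quadratic feature space without ever forming φ explicitly:

```python
import math

# Hypothetical points: class -1 inside the circle x1^2 + x2^2 = 2, class +1 outside.
inside = [(0.5, 0.5), (-1.0, 0.3), (0.0, -1.2), (0.8, -0.6)]
outside = [(2.0, 0.0), (-1.5, 1.5), (0.0, -2.5), (1.8, 1.1)]

def phi(x):
    """The transform from Figure 11.3: (x1, x2) -> (z1, z2) = (x1^2, x2^2)."""
    return (x[0] ** 2, x[1] ** 2)

def linear_in_z(x):
    """A linear rule z1 + z2 - 2 in the transformed space; it has the same
    sign as the nonlinear classifier f(x) = -2 + x1^2 + x2^2 in the original."""
    z = phi(x)
    return 1 if z[0] + z[1] - 2 > 0 else -1

assert all(linear_in_z(p) == -1 for p in inside)
assert all(linear_in_z(p) == 1 for p in outside)

def kernel(x, xp):
    """Homogeneous quadratic polynomial kernel (x . x')^2: it equals the inner
    product under the feature map (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] * xp[0] + x[1] * xp[1]) ** 2

phi_full = lambda v: (v[0] ** 2, math.sqrt(2) * v[0] * v[1], v[1] ** 2)
x, xp = (0.5, 0.5), (2.0, 0.0)
lhs = sum(a * b for a, b in zip(phi_full(x), phi_full(xp)))
assert abs(lhs - kernel(x, xp)) < 1e-12
```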
P_err ≤ m/N + (E/2)(1 + √(1 + 4m/(NE))),    (11.23)

where

E = 4(h(ln(2N/h) + 1) − ln(δ/4)) / N.    (11.24)

N is the number of points in the training sample, m is the number
of training examples misclassified by the hyperplane, and h is the VC
dimension of the classifier.
Regularization
Dividing the data into separate sets for the purpose of training, valida-
tion, and testing is common. Researchers use the training data to estimate
models, the validation data to choose a model, and the testing data to
evaluate how well the model performs. We discuss below the reasons
for splitting the data into the constituent parts and issues related to this
framework.
We first examine the need for using a testing data set. As discussed
earlier, the goal of ML techniques is to provide the best out-of-sample
predictions as opposed to simply improving the model fit on the sample
data set. Given this need, the predictive ability of ML techniques is
evaluated by first constructing a model on a training data set and then
evaluating its accuracy on a testing data set, whose corresponding data
items weren’t included in the training data set. This approach provides
a meaningful estimate of the expected accuracy of the model on out-of-
sample data.
Let us now examine the need for having a validation data set. Consider
an ML technique that trains multiple models on a training set S, and
picks the model that provides the best in-sample accuracy (the lowest
error on the set S). This approach will prefer larger and more detailed
models to less detailed ones, even though the less detailed ones might
have better predictive performance on out-of-sample data. For example,
if we are approximating a variable y using a polynomial function applied
on inputs x, then, if we determine the order of the polynomial based on
the accuracy of prediction on the training set S, we would always pick a
very high-degree, high-variance polynomial model that overfits the data
in S and may, as a consequence, perform poorly on the testing data. To
address this issue, cross-validation splits the input data set S into two
components: St (training) and Sv (validation). It then uses the training set
St to generate candidate models, and then picks a model that performs best
on Sv as opposed to basing the decision solely on St fit. Cross-validation
thus ensures the chosen model does not overfit St and performs well on
out-of-sample data.
The cross-validation enhancement can be applied to any ML algorithm.
For example, in the case of boosted MART, cross-validation typically
works as follows. After each tree is computed based on St and added to the
MART ensemble, the MART is evaluated on the validation data set Sv .
Although the additional tree would have improved the accuracy on St , it
might not have necessarily improved the accuracy on Sv . So the algorithm
could introduce a stopping rule for MART construction that terminates
the MART construction when k consecutive iterations (or trees) have not
yielded accuracy improvements on Sv. The algorithm would then select
the MART (or the output of an intermediate step) that yielded the best
accuracy on Sv as the best model.
Note that validation is not free. The algorithm has to split the input
data set into two smaller components and use only one of them for the
purpose of training. This procedure leaves fewer samples to train a model,
resulting in a suboptimal model. However, the accuracy gains realized
from avoiding overfitting typically trump the reduction in the size of the
training data, particularly for large data sets. As a consequence, most ML
practitioners use cross-validation as part of their modeling toolkit.
Regularization
(1) Divide the data into k equal subsets (folds) and label them s = 1, . . ., k.
Start with s = 1.
(2) Pick an initial value for the tuning parameter.
(3) Fit your model using the k − 1 subsets other than s.
(4) Predict the outcome variable for subset s and measure the associated
loss.
(5) Stop if s = k; otherwise, increment s by 1 and go to step 3.
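The steps above can be sketched as follows, using the polynomial degree as the tuning parameter (all data hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: a quadratic signal with a little noise.
x = rng.uniform(-1, 1, 60)
y = x ** 2 + rng.normal(0, 0.05, x.size)

def kfold_cv_error(x, y, degree, k=5):
    """Steps (1)-(5) above for one tuning value: fit on the k-1 folds other
    than s, accumulate squared-error loss on held-out fold s, for s = 1..k."""
    idx = np.arange(x.size)
    folds = np.array_split(idx, k)
    loss = 0.0
    for s in range(k):
        held_out = folds[s]
        train = np.concatenate([folds[j] for j in range(k) if j != s])
        coefs = np.polyfit(x[train], y[train], degree)
        loss += np.sum((np.polyval(coefs, x[held_out]) - y[held_out]) ** 2)
    return loss / x.size

# Repeat for each candidate tuning value and keep the lowest-loss model.
errors = {d: kfold_cv_error(x, y, d) for d in (1, 2, 3, 4, 5)}
best_degree = min(errors, key=errors.get)
```

The linear fit is heavily penalized on every held-out fold, so cross-validation steers the choice toward a degree that can represent the curvature.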
Feature selection
Feature selection is a standard step in ML settings that involve supervised
learning (Guyon and Elisseeff, 2003). Feature selection typically provides
a faster and more computationally efficient model by eliminating less rele-
vant features with minimal loss in accuracy. It is thus particularly relevant
for training large data sets that are typical in various target application
settings. Feature selection also provides more comprehensible models that
offer a better understanding of the underlying data-generating process.
When the data sets are modest in size and the number of features is large,
feature selection can actually improve the predictive accuracy of the model
by eliminating irrelevant features whose inclusion often results in overfit-
ting. Many ML algorithms, including neural networks, decision trees,
CART, and naive Bayes learners, have been shown to have significantly
worse accuracy when trained on small data sets with superfluous features
(Duda and Hart, 1973; Aha et al., 1991; Breiman et al., 1984; Quinlan,
1993).
The goal of feature selection is to find the smallest set of features that
can deliver a given level of predictive accuracy. In principle, this problem is
straightforward because it simply involves an exhaustive search of the
feature space. However, with even a moderately large number of features,
an exhaustive search is practically impossible. With F features, an exhaus-
tive search requires 2^F runs of the algorithm on the training data set,
which grows exponentially in F. In fact, this problem is known to be
NP-hard (Amaldi and Kann, 1998).
The wrapper method addresses this problem by using a greedy algo-
rithm (Kohavi and John, 1997). Wrappers can be categorized into two
types: forward selection and backward elimination. In forward selection,
features are added one at a time, starting from the empty set; at each step,
the feature whose addition most improves accuracy is retained, and the
search stops when no candidate offers a meaningful improvement.
Backward elimination instead starts with the full feature set and greedily
removes features.
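A sketch of a forward-selection wrapper on hypothetical data in which only two of five features matter; `val_mse` and the stopping tolerance are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 5 candidate features, but y depends only on features 0 and 2.
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 2]

X_train, y_train = X[:70], y[:70]
X_val, y_val = X[70:], y[70:]

def val_mse(features):
    """Fit OLS on the training split with the given features; score on validation."""
    A = X_train[:, features]
    coefs, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    resid = X_val[:, features] @ coefs - y_val
    return float(np.mean(resid ** 2))

def forward_selection(n_features, tol=1e-6):
    """Greedy wrapper: add the feature that most reduces validation error,
    stopping when no candidate improves the error by more than `tol`."""
    selected, current = [], float("inf")
    while len(selected) < n_features:
        scores = {j: val_mse(selected + [j])
                  for j in range(n_features) if j not in selected}
        j_best = min(scores, key=scores.get)
        if current - scores[j_best] <= tol:
            break
        selected.append(j_best)
        current = scores[j_best]
    return selected

print(sorted(forward_selection(5)))  # recovers the two relevant features
```

Each pass trains at most F candidate models, so the greedy search needs O(F²) model fits in total rather than the 2^F of an exhaustive search.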
Conclusion
Notes
1. The authors thank Bryan Bollinger, Shahryar Doosti, Theodoros Evgeniou, John
Hauser, Panos Ipeirotis, Lan Luo, Eugene Pavlov, Omid Rafieian, and Amin ZadKazemi
for their comments.
2. For a detailed discussion of the roles of causal, predictive, and descriptive research in
social sciences, please see Shmueli (2010).
3. In a comparison of logistic regression and decision trees, Perlich et al. (2003) examined
several data sets. Taking different-sized subsamples of the data, they estimated both
models and traced learning curves, that is, how each model's predictive accuracy improves
as the sample size increases. They found that logistic regressions work better for smaller
data sets, and trees work better for larger data sets. Interestingly, they found this pattern
holds even for training sets from the same domain.
References
The field of “Big Data” is vast, and rapidly evolving. It is fueled by the
explosion of opportunities and technologies to collect, store, and analyze
vast amounts of data on consumers, firms, and other entities. Many fields
of enquiry are germane to the analysis of big data, including statistical
inference, optimization, machine learning, networking, and visualization.
Given the vastness of the intellectual terrain that is involved, and the
variety of perspectives that one can use to harness big data, only a limited
understanding can be gained via any particular lens. The analysis of
big data requires handling challenges associated with a number of
areas, such as data storage, data processing, and rapid access to data, and an
understanding of the emerging advances and technologies in these areas is
critical for unleashing the promise of big data. In this chapter, we restrict
attention to challenges that are associated with making statistical infer-
ences from big data. In doing so, it is useful to characterize big data by the
four Vs: volume, velocity, variety, and veracity.
VOLUME
Volume refers to the fact that a big data set contains a large quantity of
data. In a typical rectangular dataset, volume can be expressed in terms of
the total number of observations N. When N is very large, we have what
is called a tall dataset. In panel data settings, each individual has multiple
observations. Such data are typically analyzed via hierarchical models in
which the number of model parameters grows with the number of individ-
uals. Both the number of individuals and the total number of observations
characterize volume in these settings.
Many marketing contexts generate tall datasets. Retailers routinely
collect data on the purchases of millions of customers and aim to generate
insights and predictions, both at the individual and the population levels.
Consumers are immersed in a highly connected world and their activities
and interactions leave traces that can be of value to marketers. Click-
stream data generated from online interactions is one example of big data.
The internet is also a ready source of data on sequences of advertising
exposures for consumers and their consequent responses. Similarly, user-
VELOCITY
The modern world generates data at high velocity. For example, in retail
contexts, each browsing session, or each purchase occasion generates
new information about a consumer’s preferences. Companies such as
Netflix and Amazon have access to the viewing and purchasing habits of
their users as these dynamically evolve over time. Firms need to integrate
such new information with the existing profile of the customer in a timely
manner to account for shifts in preferences and tastes. More often than
not, recent shifts in preferences could be the best predictors of future
behavior and therefore timely integration of new information is of con-
siderable importance to firms. As a result, the arrival of new information
requires quickly updating customer-specific parameters in the statisti-
cal models that analyze the big data. Moreover, the aggregation of the
new observations across all customers also shifts information about the
entire customer base, and thus population level parameters also need to
be updated to reflect the changing preference structure. Such streaming
datasets can be considered infinite, as the number of observations grows
with time. Analysts therefore need online methods of inference to handle
these streaming contexts.
VARIETY
contained in the data. While data are available in many different modali-
ties, in the end, such data gets converted to numbers, and then analyzed,
either using traditional methods or via newer approaches. Relational data
garnered from social networks also poses its own challenges, in terms
of sampling, clustering, and data analysis. These data require complex
models as the dependency structure needs to be properly modeled, and in
many cases, modeling of heterogeneity becomes very important.
Marketing academics have begun to leverage data of variegated forms.
Methods of text mining and natural language processing, and approaches
based on topic modeling are gaining currency in marketing. Similarly,
information contained in images can be parsed and analyzed using image
processing methodologies and deep learning technologies that apply
hierarchical models composed of multiple layers to capture different levels
of representation.
Another aspect of big data variety is reflected in high-dimensional
datasets. Such datasets are characterized by the number of variables
(dimensions) p being very large, in some cases much larger than N,
and are termed wide datasets. In these instances, dimension reduction
and regularization (i.e., the ability to tradeoff model complexity and fit)
become crucial to avoid overfitting and to adequately summarize the
information within the wide data.
VERACITY
Finally, veracity refers to the inferential challenges that stem from the
way big data is sometimes pieced together from disparate sources. For
instance, firms can bring together data on consumer reviews, user gener-
ated content such as product tags, as well as traditional numerical indica-
tors of preferences in the form of ratings. Moreover, data germane to a
particular marketing context could be available across different levels of
aggregation. Such data that differs in modality and source of origin needs
to be fused appropriately to unearth meaningful insights, and uncertainty
about the quality of the data needs to be reflected in the analysis.
We now briefly describe the computational challenges that arise in esti-
mating complex models in big data settings and highlight a few strategies
that are being actively pursued to handle these challenges.
Optimization-based Approaches
Stochastic Approximation
θ_{t+1} = θ_t + (γ/N) C_t Σ_{i=1}^N ∇_θ log f(y_i; x_i, θ_t).    (12.2)
This second-order gradient descent is a variant of the Newton method for
optimization.
When the data are massive (i.e., large N), or we have a streaming dataset
for which N is not known (i.e., infinite N), gradient descent and other tra-
ditional optimization approaches do not scale well or are not applicable,
due to the following two reasons. First, these methods typically require an
evaluation of the objective function using the entire dataset. For example,
the gradient descent method requires the computation of the gradient
using all the observations in the sample. In other words, across iterations,
multiple passes over all the observations are needed. This makes such
methods computationally prohibitive. Second, methods such as Fisher
scoring and the Newton algorithm require an inversion of p × p matrices at
each iteration, which significantly adds to the computational complexity
when the data is high dimensional, i.e., when p is large.
Given the above reasons, stochastic approximation methods based on
noisy estimates of the gradient become useful in reducing the computa-
tional burden.
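As a concrete illustration, a plain stochastic gradient descent (SGD) pass for a logistic-regression log-likelihood might look as follows. This is a minimal sketch: the simulated data, true coefficients, and learning-rate schedule are our own illustrative choices, not specifications from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated logistic-regression data (hypothetical example)
N, p = 5000, 3
X = rng.normal(size=(N, p))
theta_true = np.array([1.0, -0.5, 0.25])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ theta_true)))

def grad_i(theta, x, yi):
    """Gradient of log f(y_i; x_i, theta) for a single observation."""
    return (yi - 1.0 / (1.0 + np.exp(-x @ theta))) * x

theta = np.zeros(p)
for t in range(N):
    gamma_t = 1.0 / (1.0 + 0.1 * t)  # decaying learning rate
    i = rng.integers(N)              # one noisy gradient evaluation per step
    theta = theta + gamma_t * grad_i(theta, X[i], y[i])
```

Each step touches a single observation, so no pass over the full dataset is ever required; the decaying learning rate trades off fast early progress against stability near the solution.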
The SGD method can suffer from numerical instability if the learning
rate γ_t is set too high, and the algorithm can diverge instead of converging
to the true parameter value. Setting the learning rate too low, however,
can result in slow convergence. Moreover, as this is an approximate
algorithm, there is an efficiency loss compared to more traditional
optimization methods. The loss in efficiency can be handled by averaging
the parameter estimates across iterations. Toulis and Airoldi (2015)
and Toulis and Airoldi (2016) show that the instability issues can be tackled
using an implicit stochastic gradient descent method. The parameter update
in the implicit method differs from the above SGD update as follows:
\[
\theta_t^{im} = \theta_{t-1}^{im} + \gamma_t \, \nabla_\theta \log f(y_t; x_t, \theta_t^{im}), \qquad
\bar{\theta}_t = \frac{1}{t} \sum_{k=1}^{t} \theta_k^{im}. \tag{12.6}
\]
In the above, the first equation represents the implicit update. It is
an implicit update because θ_t^{im} occurs on both sides of the equation. The
second equation represents the parameter averaging. Upon completion,
the averaged parameter provides an estimate of the true parameter.
While the above shows gradients based on a single observation, oftentimes
the gradients are based on random subsets of observations. This again
improves the stability of the algorithm.
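For a linear model the implicit update can be solved in closed form, which makes the scheme easy to sketch. The simulated data, decay exponent, and noise level below are illustrative assumptions, not the chapter's:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated linear-model data (hypothetical example)
N, p = 4000, 2
X = rng.normal(size=(N, p))
theta_true = np.array([2.0, -1.0])
y = X @ theta_true + rng.normal(scale=0.5, size=N)

theta = np.zeros(p)
running_sum = np.zeros(p)
for t in range(N):
    gamma_t = 1.0 / (1.0 + t) ** 0.6  # slowly decaying learning rate
    x, yt = X[t], y[t]
    resid = yt - x @ theta
    # Implicit update: for the Gaussian linear model, solving for theta_t
    # on both sides yields this closed-form damped step.
    theta = theta + gamma_t * resid / (1.0 + gamma_t * (x @ x)) * x
    running_sum += theta

theta_bar = running_sum / N  # Polyak-Ruppert averaging of the iterates
```

The damping factor 1/(1 + γ_t x'x) is what makes the implicit step robust to an overly large learning rate, and the averaged iterate recovers the efficiency lost to the noisy single-observation gradients.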
Variational Bayes
where Cov_q denotes the covariance with respect to the variational distribution.
Instead of approximating \(\mathrm{Cov}_q[S(\theta)]^{-1}\) and \(\mathrm{Cov}_q[S(\theta), \log p(y,\theta)]\)
directly, one can iteratively evaluate these terms using weighted Monte Carlo
with random samples \(\hat{\theta}\) generated from the latest variational approximation
\(q(\theta \mid \eta)\). In particular, when a multivariate normal is used to approximate
the posterior, i.e., \(q(\theta \mid \eta) = N(\mu_{q(\theta)}, \Sigma_{q(\theta)})\), where \(\eta = \{\mu_{q(\theta)}, \Sigma_{q(\theta)}\}\), Minka
(2001) and Opper and Archambeau (2009) show that (12.10) implies
\[
\Sigma_{q(\theta)} = H^{-1}, \qquad \mu_{q(\theta)} = m + \Sigma_{q(\theta)} \, g,
\]
where \(\partial/\partial\theta\) and \(\partial^2/\partial\theta^2\) denote the gradient vector and Hessian matrix of
\(\log p(y,\theta)\), respectively. As in the general case, one can use weighted Monte
Carlo to stochastically approximate the quantities \(H = -\mathrm{E}_q[\partial^2 \log p(y,\theta)/\partial\theta^2]\),
\(g = \mathrm{E}_q[\partial \log p(y,\theta)/\partial\theta]\), and \(m = \mathrm{E}_q[\theta]\). Due to non-conjugacy, an analytical
expression for the KL divergence is unavailable for FFVB; therefore, we
assess convergence based on the relative change in the estimates of the
variational parameters.
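The fixed-point updates for a normal variational approximation (Σ_q = H⁻¹ and μ_q = m + Σ_q g, with H, g, and m Monte Carlo estimates of the expected negative Hessian, expected gradient, and mean under q) can be sketched for a toy model. We use a Poisson likelihood with a normal prior and plain, unweighted draws from q, a simplification of the weighted scheme described in the text; all numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical model: y_i ~ Poisson(exp(theta)), theta ~ N(0, prior_var)
n = 50
y = rng.poisson(np.exp(1.0), size=n)
prior_var = 10.0

def grad_log_p(theta):
    """Gradient of log p(y, theta); vectorized over draws."""
    return y.sum() - n * np.exp(theta) - theta / prior_var

def hess_log_p(theta):
    """Second derivative of log p(y, theta); vectorized over draws."""
    return -n * np.exp(theta) - 1.0 / prior_var

mu, sigma2 = 0.0, 1.0
for _ in range(40):
    draws = rng.normal(mu, np.sqrt(sigma2), size=4000)  # theta-hat ~ q
    H = -np.mean(hess_log_p(draws))  # estimate of -E_q[Hessian]
    g = np.mean(grad_log_p(draws))   # estimate of E_q[gradient]
    m = np.mean(draws)               # estimate of E_q[theta]
    sigma2 = 1.0 / H
    mu = m + sigma2 * g
```

At convergence the Gaussian q sits near the posterior mode with curvature-matched variance; convergence is monitored here, as in the text, by how little mu and sigma2 change between iterations.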
Next, we discuss two simulation studies that implement MFVB, FFVB,
and the combination of these two to handle hierarchical marketing models.
where y_ij indicates the response for person i on item j, i = 1, ..., I, and the
vectors λ_i and γ_j represent individual and product heterogeneities, respectively.
The covariate x_ij characterizes the individual and the item, z_j consists
of item-specific variables, and w_i contains individual-specific covariates such
as demographics. Each person is assumed to respond to an idiosyncratic set
of j ∈ J_i items, yielding an unbalanced data set with a total of Σ_{i=1}^I J_i = N
observations. Such a model arises, for instance, in recommendation systems
where users rate different items (products). Ansari and Li (2017) detail the
derivation of closed-form variational distributions for this model.
To assess the speed, scalability, and accuracy of the MFVB approach,
we now compare it to Gibbs sampling on simulated data sets of varying
sizes. For MFVB, we use a tolerance of 10^{-4} as the convergence criterion.
For Gibbs sampling, we run the chain for 5,000 iterations, which reflects
a conservative estimate of convergence given the multiple sources of
heterogeneity in the model.
Table 12.1 shows the comparison results for simulated data sets of
different sizes. One can see that MFVB requires very few iterations for
convergence. It is also clearly apparent that the MFVB approach is
considerably faster than MCMC and results in a substantial reduction in
computational time.1 The last column of Table 12.1 reports the ratio of
the time required for Gibbs sampling to that of MFVB. As the MFVB
approach requires fewer iterations for larger data sets, we see that this
ratio increases with data set size. Therefore MFVB scales much better than
MCMC for larger data sets.
To assess the accuracy, we simulate 10 different data sets with I 53,000
and J 550, and compute the root mean squared errors (RMSE) between
the estimated and the true parameters. Across the 10 simulations, the mean
and standard deviation of RMSE across model parameters are 0.338 and
We next consider a hierarchical logit model, in which the utility that
individual i derives from alternative j at occasion t is
\[
U_{ijt} = x_{ijt}' \lambda_i + e_{ijt}, \tag{12.13}
\]
so that the choice probabilities take the multinomial logit form
\[
P(y_{ijt} \mid \theta, \Lambda) = \frac{\exp(x_{ijt}' \lambda_i)}{\sum_{k=1}^{J} \exp(x_{ikt}' \lambda_i)}. \tag{12.14}
\]
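Multinomial logit choice probabilities of this form can be computed with a numerically stable softmax; the attribute values and individual-level coefficients below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical example: one individual, J = 4 alternatives, K = 2 attributes
J, K = 4, 2
x_it = rng.normal(size=(J, K))   # x_ijt for alternatives j = 1..J at occasion t
lam_i = np.array([0.8, -1.2])    # individual-level coefficients lambda_i

def choice_probs(x, lam):
    """Multinomial logit: exp(x_j' lam) / sum_k exp(x_k' lam)."""
    v = x @ lam
    v = v - v.max()              # subtract the max to avoid overflow in exp
    ev = np.exp(v)
    return ev / ev.sum()

p = choice_probs(x_it, lam_i)    # probabilities over the J alternatives
```

Subtracting the maximum utility before exponentiating leaves the probabilities unchanged but prevents numerical overflow when utilities are large.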
The variational update for the population-level parameter β involves the
covariance
\[
\Sigma_{q(\beta)} = \left( \Sigma_\beta^{-1} + I \, r_{q(\Lambda)} R_{q(\Lambda)}^{-1} \right)^{-1}.
\]
Note that Algorithm 12.1 updates the variational parameters
associated with each individual in an inner loop (Step 3) and then updates
the variational parameters for the population quantities in Steps 4–5.
However, this is wasteful early in the iterative process as the individual
level updates are based on population values that are far from the truth.
Therefore, in stochastic VB, the inner loop involves updating the param-
eters for a mini-batch of randomly selected individuals. The size of the
mini-batch can be adaptively increased over time such that the final
estimates are based on the entire data.
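A growing mini-batch schedule of the kind just described can be sketched as follows; the number of individuals, initial batch size, and growth factor are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(4)

I = 10_000             # number of individuals (hypothetical)
batch0, growth = 100, 1.15

def minibatch_schedule(n_iters):
    """Yield a randomly sampled mini-batch of individual indices per
    iteration, adaptively enlarging the batch over time."""
    size = batch0
    for _ in range(n_iters):
        yield rng.choice(I, size=min(int(size), I), replace=False)
        size *= growth  # grow the batch geometrically

batches = list(minibatch_schedule(40))
```

Early iterations touch only a small random subset of individuals, which avoids wasted work while the population-level estimates are still poor; by the final iterations the batch covers the entire sample, so the final estimates are based on all the data.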
Table 12.2 Estimation time (seconds) for the hierarchical logit model
Table 12.3 Total variation error for the hierarchical logit model
Table 12.4 True and estimated covariance matrices

True Covariance          Hybrid VB                MCMC
0.250 0.125 0.125 0.125  0.260 0.127 0.120 0.128  0.248 0.127 0.123 0.123
0.125 0.250 0.125 0.125  0.127 0.262 0.129 0.127  0.127 0.244 0.121 0.127
0.125 0.125 0.250 0.125  0.120 0.129 0.261 0.121  0.123 0.121 0.241 0.124
0.125 0.125 0.125 0.250  0.128 0.127 0.121 0.253  0.123 0.127 0.124 0.249
The larger the data set, the smaller the total variation errors for the VB
methods, reflecting the suitability of VB for big data settings.
As characterizing consumer heterogeneity is very important for target-
ing and personalization in marketing, we also examine the recovery of
the population covariance matrix L. Table 12.4 presents the estimates
as well as the true covariance in the simulation with I = 50,000 and
T = 200. It is clear from the diagonal and off-diagonal entries that
hybrid VB and MCMC yield variance and covariance estimates at
similar levels of accuracy, relative to the truth. Thus, to the extent that the
population distribution is useful for targeting and personalization, we
have shown how the hybrid VB approach can support these marketing
actions.
Until now we have focused on the computational challenges that arise
from tall datasets. We now shift attention to wide data and illustrate
briefly dimension reduction approaches that are useful in such settings.
Wide Data
Consider, for example, text data from consumer reviews, where the
number of distinct words (also called the vocabulary size) can be significantly
greater than the number of reviews.
When modeling high dimensional data, analysts are interested in both
understanding the patterns that are present in the data and in predicting the
outcome of interest for future observations. A regression for the ratings, or
a logistic regression for the sentiment, that uses all the features/variables
as independent variables can result in an unwieldy statistical model that
overfits the noise in the data, and is therefore unlikely to predict well in the
future. Such a model is also not very useful in developing a proper under-
standing of the data, given the large number of coefficients that appear
relevant. In such situations, one is interested in sparse representations of the
data, i.e., to identify a statistical model in which relatively few parameters
are shown to be important or relevant. Such sparsity is achieved via regu-
larization approaches that result in automatic feature/variable selection.
More formally, consider a linear regression setup in which an outcome y_i
is related to p predictors x_{i1}, ..., x_{ip}.

Lasso

The lasso estimates the regression coefficients by solving
\[
\min_{\beta} \; \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} \lvert \beta_j \rvert \le t.
\]
The \(\ell_1\) regularization constrains the coefficients so that the sum of their
absolute values lies within a "budget" t. The budget controls the complexity
of the model: a larger budget implies greater leeway for the parameters, and
therefore more parameters are allowed to be non-zero. The value of the
tuning parameter t that results in the best predictions can be determined
separately, typically via cross-validation. Predictive performance is best when
the model is complex enough to capture the signal in the data, without at the
same time overfitting. The above optimization can alternatively be written
using a Lagrangian specification as follows:
\[
\min_{\beta} \; \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert.
\]
The tuning parameter λ controls the relative impact of the loss function
and the penalty term. When λ = 0, the penalty term has no impact, and
the lasso provides the least squares estimates. Notice that the shrinkage
is not applied to the intercept, which measures the mean value of the
outcome variable.
The lasso is similar to a more traditional regularizer, ridge regression,
which is popular in robust regression contexts. In ridge regression, the
coefficients are obtained via the following optimization:
\[
\min_{\beta} \; \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2
+ \lambda \sum_{j=1}^{p} \beta_j^2.
\]
The \(\ell_2\) penalty shrinks all of the coefficients towards zero as λ → ∞. This
helps in improving prediction, but the \(\ell_2\) penalty merely reduces the
magnitude of the coefficients; it does not set any of them to zero. In
contrast, the \(\ell_1\) norm associated with the lasso is special in that it yields
sparse (or corner) solutions: it not only shrinks the magnitude of the
coefficients but also ensures that only some of the parameters take non-zero
values, by shrinking the remaining coefficients exactly to zero. The lasso
thus provides automatic relevance determination, or variable selection,
and yields sparse models.
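The sparsity-inducing behavior of the ℓ1 penalty can be demonstrated with a small coordinate-descent implementation of the lasso, which applies a soft-thresholding operator to one coefficient at a time. The simulated design, noise level, and penalty value below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical sparse-regression example: only 2 of 10 coefficients non-zero
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[3] = 3.0, -2.0
y = X @ beta_true + rng.normal(scale=0.5, size=n)

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; set it exactly to zero if |z| <= lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    """Coordinate descent for (1/2n)||y - X beta||^2 + lam * ||beta||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.copy()                       # current residual y - X @ beta
    for _ in range(n_iters):
        for j in range(p):
            r = r + X[:, j] * beta[j]  # remove feature j from the fit
            rho = X[:, j] @ r / n
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j] / n)
            r = r - X[:, j] * beta[j]  # add the updated feature back
    return beta

beta_hat = lasso_cd(X, y, lam=0.2)
```

The soft-threshold step is the corner solution in miniature: coefficients whose partial correlation with the residual falls below λ are set exactly to zero, so irrelevant predictors drop out automatically.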
One can study the geometry of the optimization setup to understand
why the lasso results in corner solutions. Figure 12.1 represents the
Figure 12.1 Constraint regions and contours of the error for lasso and
ridge regressions
situation for the two-dimensional case. The constraint region |β_1| + |β_2| ≤ t
for the lasso is represented by the grey diamond, and the constraint region
β_1² + β_2² ≤ t² for the ridge regression is represented by the grey circle. The
ellipses represent regions of constant loss (i.e., regression sum of squares).
The optimization solution is the first point at which the elliptical contours
touch the constraint region. It is clear from the left panel that the optimum
can occur at a corner of the constraint set, whereas such a corner solution
is not possible in the ridge regression setup.
While we looked at the lasso in the context of regression, it can also be
used for non-linear models, including generalized linear models. Extensions
of the lasso can be used for popular marketing models such as the multi-
nomial logit. We refer the reader to Hastie, Tibshirani, and Wainwright
(2015) for an extensive discussion of lasso and its generalizations.
Conclusions
While the discussion in this chapter centers on tall and wide datasets, the
other characteristics of big data, including velocity and variety, are
becoming increasingly relevant. A number
of exciting untapped research opportunities exist in modeling marketing
data in streaming contexts as well. Similarly, marketers can benefit from
modeling approaches that handle data of multiple modalities, such as text,
numbers, images and sound tracks. It is our hope that marketing researchers
will enthusiastically embrace these emerging and promising opportunities.
Note
1. For a fair comparison, we code both VB and MCMC in Mathematica 11 and use the just-in-
time compilation capability of Mathematica to compile the programs to C. We run all pro-
grams on a Mac computer with 3GHz 8-Core Intel Xeon E5 processor and 32GB of RAM.
References
GENERALIZATIONS AND
OPTIMIZATIONS
At its most basic level, meta analysis is an attempt to codify what we can
learn from multiple past experiences.
Meta analysis and replication are closely related. Both focus on establish-
ing generalizations. In general, replications create data points for use in
meta analysis.
In marketing (and many other fields), meta-analysis has come to mean
a quantitative integration of past research projects, i.e., the analysis of a
number of related "primary" analyses. At least three types of meta analyses
have been employed, which differ in their objectives.
One objective is to combine significance levels across studies. For example,
Fisher's combined test aggregates the p-values p_i from k studies via the
statistic
\[
-2 \sum_{i=1}^{k} \ln(p_i),
\]
which follows a chi-square distribution with 2k degrees of freedom under
the null hypothesis of no effect.
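A minimal computation of this combined statistic, using hypothetical p-values:

```python
import math

# Hypothetical example: p-values from k = 4 studies
p_values = [0.04, 0.20, 0.11, 0.30]
k = len(p_values)

# Fisher's combined statistic: -2 * sum(ln p_i),
# compared against a chi-square distribution with 2k degrees of freedom.
statistic = -2.0 * sum(math.log(p) for p in p_values)
```

Here the statistic is about 16.48 on 2k = 8 degrees of freedom, which exceeds the 5 percent critical value of 15.51, so the combined evidence is significant even though three of the four individual p-values are not.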
In marketing, there are two types of focal variables of interest. One is the
level of a variable, for example the percentage of people who adhere to a
drug regimen. The other is the magnitude of the impact of one variable
on another, as assessed by a coefficient in some statistical model such as
regression analysis. This type of meta analysis seems most managerially
relevant. We focus on it for the rest of this chapter because managers base
decisions on the size of the marginal impact rather than on the correlation
or whether it is significant.
There are two basic reasons for doing a meta analysis. The first is knowl-
edge development and learning. It is interesting to learn about empirical
generalizations (see Hanssens 2015) including both a sense of what a
typical/average effect is and which factors make it larger or smaller. The
second is to use the results to make predictions about what would happen
if a certain situation arises or to discover which situation produces the
largest (or smallest) effect.
This seemingly trivial step is still necessary. For example, if you are interested
in the effect of price on the dependent variable (e.g., sales), you
need to decide whether it is the regular price or a promotional price, and
whether to study absolute price or price relative to competition. If the
answer is all of the above, then you need to include additional variables (Z)
in the meta-analysis "design" to account/control for the differences.
Practically, what to study depends on what data (studies) are available.
For example, studying how a particular result depends on a specific
variable may be very desirable but not feasible given the paucity (or even
absence) of studies that report it. This leaves a choice: either set out on a
major effort to run studies or switch topics/focus. Realistically the latter is
typically the chosen (and wisest) approach. The scarcity of data typically
leads researchers to include different types of studies in the meta analysis
and in effect combine “apples and oranges,” i.e., conceptual/imperfect
replications.
Meta analysis has two components, the model of the effect of interest used
in the individual studies and the meta-analysis model of factors that influ-
ence its key outputs.
Assume a number of studies have been run that assess the effect of a
variable X on the criterion variable Y:

Y = B0 + B1X + B2W + e1

where W stands for other variables that were included in the estimation of
B1. Here B1 is the effect of interest. The meta-analysis model then expresses
B1 as a function of other variables (Z):

B1 = C0 + C1Z + e2
Meta analysis focuses on both finding the “typical” B1, i.e., the average
effect, and, more heavily, on those factors (Zs) that influence it, i.e. the C
values. (Because the average is potentially influenced by the particular Z
values in the available observations, some researchers “de-bias” the aver-
age by using B0 to estimate it when the W values are effect coded.)
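The two-stage structure above can be mimicked on simulated data: estimate B1 within each study, then regress the study-level estimates on a moderator Z. All numbers below are hypothetical illustrations, not results from the literature:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical example: 30 studies, each estimating the effect B1 of X on Y;
# the true effect depends on a binary study-level moderator Z.
n_studies, n_obs = 30, 80
Z = rng.binomial(1, 0.5, size=n_studies).astype(float)  # e.g., field vs. lab
C0, C1 = 0.5, 0.8
B1_hat = np.empty(n_studies)
for s in range(n_studies):
    X = rng.normal(size=n_obs)
    true_b1 = C0 + C1 * Z[s] + rng.normal(scale=0.05)   # study-level effect
    Y = 1.0 + true_b1 * X + rng.normal(scale=0.5, size=n_obs)
    B1_hat[s] = np.polyfit(X, Y, 1)[0]  # slope estimate from study s

# Meta-analysis model: regress the study-level slopes on the moderator Z.
design = np.column_stack([np.ones(n_studies), Z])
(c0_hat, c1_hat), *_ = np.linalg.lstsq(design, B1_hat, rcond=None)
```

The second-stage regression recovers both the baseline effect (C0) and the moderator's influence (C1), which is exactly the information a meta-analyst reports as "typical effect" and "what makes it larger or smaller."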
Variables to Include
“Design” Inefficiency
It would be ideal, once you specified the variables to include and coded
each observation on them, to simply run a regression (or some other pro-
cedure) on the data set and be done. Unfortunately, this is rarely possible
when you have several predictors (e.g., of the size of the effect).
The first problem is sample size. Many levels of variables (e.g., studies
done in South America) typically have few observations. Although there is
no hard rule, when you have fewer than five observations, the coefficient
of their effect tends to be unstable. This leaves two choices: drop the
variable (and risk omitted variable bias) or group the variable with similar
ones. While it is possible to do this by verifying which other variables seem
to have a similar effect and grouping them together, it is generally fine to
just group a variable with another on logical/theoretical grounds.
The second problem is non-significance of coefficients driven by limited
sample size plus collinearity (confounding) of the predictor (design)
variables. Here again you face the option to drop the variable (which
again may be insignificant because of its relation to another variable, thus
producing biased coefficients) or combine it with others in an index for,
perhaps, income and education. While this won’t separate the effects of
income and education, it also won’t produce a possible false interpretation
that only one matters.
Taking the two previous points together, this strongly suggests that the
first step in analysis should be to examine frequencies and the correlations
among the variables.
After an initial estimation of the meta-analysis model, one typically
alters the variables in the model and re-estimates it. Depending
on the results, this may result in further modifications. The basic point is
that developing a meta analysis is a craft involving sequential adjustments
rather than a set of pre-determined steps.
Estimation Issues
Correlated Observations
Including a fixed effect (dummy variable) to account for the mean effect of
each study typically performs quite well (Bijmolt and Pieters 2001).
Weighting Observations
Not all observations are of equal quality. One approach is to weight the B_i
by the sample size used to estimate them. A more sophisticated approach
is to weight the B_i by the inverse of their variance. Fortunately, in many
cases this does not materially alter the results.
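Inverse-variance weighting can be sketched in a few lines; the effect estimates and variances below are hypothetical:

```python
import numpy as np

# Hypothetical example: effect estimates B_i and their sampling variances
b = np.array([0.30, 0.45, 0.25, 0.60])
var = np.array([0.010, 0.040, 0.020, 0.090])

w = 1.0 / var                      # inverse-variance weights
b_bar = np.sum(w * b) / np.sum(w)  # precision-weighted mean effect
se = np.sqrt(1.0 / np.sum(w))      # standard error of the weighted mean
```

More precisely estimated effects receive proportionally more weight, so the pooled estimate sits closest to the studies with the smallest sampling variance.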
Ancillary Statistics
Fail-safe n
This statistic (Rosenthal 1979) calculates the number of zero-effect studies
that would have to be added before the result becomes non-significant. It
has some value if the objective is to "prove" an effect is significant, as it
suggests how many non-significant studies would have to have been excluded
from the analysis (i.e., left in a file drawer) to invalidate the finding.
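One common formulation of the fail-safe N, based on Stouffer's combined Z, can be computed as follows. The Z-scores are hypothetical, and readers should confirm the exact formula against Rosenthal (1979):

```python
import math

# Hypothetical example: Z-scores from k = 5 studies in the meta analysis
z_scores = [2.1, 1.4, 2.5, 0.9, 1.8]
k = len(z_scores)
z_alpha = 1.645  # one-tailed 5 percent critical value

# Fail-safe N: the number of zero-effect (z = 0) studies that, when added,
# drives the combined Stouffer Z, sum(z) / sqrt(k + N), down to z_alpha.
sum_z = sum(z_scores)
n_fs = (sum_z / z_alpha) ** 2 - k
```

Here roughly 23 null studies would have to be sitting in file drawers before the combined result loses significance, which gives a rough sense of the finding's robustness to publication bias.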
Some researchers examine the pattern of results to see if there appears
to be a discontinuity at a specific level of result or statistical significance
(e.g., 5 percent). Essentially this involves “backcasting” a forecast of how
many small(er) results should exist for them to form a smooth curve (Rust,
Farley and Lehmann 1990). As a basic check, it is useful to simply plot the
distribution of effects and see if it looks reasonable.
Other tests
A number of other tests are sometimes reported. For example, Cochran's Q
tests whether results are equal (homogeneous), and I² measures whether
the variability is non-random (Huedo-Medina et al. 2006; Higgins and
Thompson 2002). If the meta analysis does not explain a significant amount
of the variance, this suggests all the results come from the same distribution
and can simply be averaged. Equivalently, if a particular design variable is
not significant, it may have no effect. Importantly, these tests are subject to
the low power available in most meta analyses; indeed, some fairly large
coefficients can be non-significant.
Equivalent tests can be done with regression. If the overall R² of the
meta-analysis regression is not significant, this means you cannot reject
the hypothesis that there are no significant differences in the results (based
on the variables examined) and hence that the studies are poolable, i.e., can
be combined and averaged.
The simplest way to assess effects is to assume they are “fixed,” i.e., deter-
ministic. Alternatively, you can assume there is unexplained (random)
variation in them; that is, random effects. As in the general econometric
literature, there are proponents of both in meta analysis (Hunter and
Schmidt 2000). Given a bias toward parsimony, I prefer simpler methods
(fixed effects). Put differently, if one wants to get a reasonable (ball park)
sense, fixed effects should suffice, at least as a starting point. For those
interested in more precision or those who believe effects vary, random
coefficients are frequently employed.
Postscript
Advertising Assmus, Farley, and Lehmann (1984) Advertising elasticity
Aurier and Broz-Giroux (2014) Long-term effects of campaigns
Batra et al. (1995) Advertising effectiveness
Brown, Homer, and Inman (1998) Ad-evoked feelings
Brown and Stayman (1992) Attitude toward the ad
Capella, Webster, and Kinard (2011) Cigarette advertising
Compeau and Grewal (1998) Comparative advertising
Eisend (2011) Humor in advertising
Eisend (2006) Two-sided advertising
Grewal et al. (1997) Competitive advertising
Hite and Fraser (1998) Attitude toward the ad
Keller and Lehmann (2008) Health communication
Lodish et al. (1995) TV advertising
Sethuraman, Tellis, and Briesch (2011) Brand advertising elasticities
Vakratsas and Ambler (1999) How advertising works
Witte and Allen (2000) Fear appeals in health campaigns
Brands Eisend and Stokburger-Sauer (2013) Brand personality
Heath and Chatterjee (1995) Decoy effects
Capabilities Cano, Carrillat, and Jaramillo (2004) Market orientation
Kirca, Jayachandran, and Bearden (2005) Market orientation
Krasnikov and Jayachandran (2008) Marketing, R&D and Operations Capabilities
Consumer Behavior Beatty and Smith (1987) External Search
Carlson, Vincent, Hardesty, and Bearden (2009) Relation of Objective and Subjective Knowledge
Farley, Lehmann, and Ryan (1981) (Fishbein) attitude models
Table 13.1 (continued)
Topic Paper Focus
Farley, Lehmann, and Ryan (1982) Howard-Sheth model
Holden and Zlatevska (2015) Partitioning paradox
Janiszewski, Noel, and Sawyer (2003) Spacing Effects and Verbal Learning
Peterson, Albaum, and Beltramini (1985) Effect size in consumer behavior experiments
Scheibehenne, Greifeneder, and Todd (2010) Choice Overload
Sheppard, Hartwick, and Warshaw (1988) Theory of Reasoned Action
Szymanski and Henard (2001) Customer satisfaction
van Laer, de Ruyter, Visconti, and Wetzels (2014) Narrative Transportation
Völckner and Hofmann (2007) Price-Perceived Quality Relationship
Zlatevska, Dubelaar, and Holden (2014) Effect of Portion Size
New Products Arts, Frambach, and Bijmolt (2011) Consumer innovation adoption
Bahadir, Bharadwaj, and Parzen (2009) Organic sales growth
Chang and Taylor (2016) Consumer participation in new product development
Evanschitzky, Eisend, Calantone, and Jiang (2012) New product success
Henard and Szymanski (2001) New product success
Krishna et al. (2002) Effect of price presentation
Montoya-Weiss and Calantone (1994) New product performance
Noseworthy and Trudel (2011) Evaluation of incongruous product forms
Rubera and Kirca (2012) Innovativeness and firm performance
Sultan, Farley, and Lehmann (1990) Diffusion (Bass) models
Szymanski, Troy, and Bharadwaj (1995) Order of Entry Effect
Troy, Hirunyawipada, and Paswan (2008) Cross-functional integration
Van den Bulte and Stremersch (2004) Social contagion and income inequality
Method Churchill and Peter (1984) Rating scale reliability
Cooper, Hedges, and Valentine (2009) General Reference
Eisend (2015) Effect Size
Eisend and Tarrahi (2014) Selection bias
Farley, Lehmann, and Mann (1998) Study design
Farley and Lehmann (1986) General Reference
Farley, Lehmann, and Sawyer (1995) General Reference
Glass, McGaw, and Smith (1981) General Reference
Hedges and Olkin (1985) General Reference
Homburg, Klarmann, Reimann, and Schilke (2012) Key informant accuracy
Hunter and Schmidt (2004) General Reference
Kepes et al. (2013) General Reference
Peterson (2001) Use of college students
Peterson, Albaum, and Beltramini (1985) Effect size in consumer behavior experiments
Rosenthal (1991) General Reference
Schmidt (1992) General Reference
Price Bell, Chiang, and Padmanabhan (1999) Promotional response
Bijmolt, Heerde, and Pieters (2005) Price elasticity
Estelami, Lehmann, and Holden (2001) Macro-economic determinants of price knowledge
Kremer, Bijmolt, Leeflang, and Wieringa (2008) Price promotions
Nijs, Dekimpe, Steenkamp, and Hanssens (2001) Price promotions
Rao and Monroe (1989) Impact on perceived quality
Sethuraman (1995) National and store brand promotional price elasticity
Sethuraman, Srinivasan, and Kim (1999) Cross-price effects
Tellis (1988) Price elasticity
Geyskens, Steenkamp, and Kumar (1999) Channel relationship satisfaction
Other Blut, Frennea, Mittal, and Mothersbaugh (2015) Switching costs impact on satisfaction and repurchase
Geyskens, Steenkamp, and Kumar (1998) Trust in channel relationship
Gelbrich and Roschk (2011) Complaint compensation and satisfaction
Palmatier, Dant, Grewal, and Evans (2006) Relationship marketing
You, Vadakkepatt, and Joshi (2015) Electronic word of mouth elasticity
Zablah, Franke, Brown, and Bartholomew (2012) Customer orientation impact on frontline employees
References
Albers, S., Mantrala, M. K., & Sridhar, S. (2010). Personal selling elasticities: a meta-analy-
sis. Journal of Marketing Research, 47(5), 840–853.
Arts, J. W., Frambach, R. T., & Bijmolt, T. H. (2011). Generalizations on consumer innova-
tion adoption: A meta-analysis on drivers of intention and behavior. International Journal
of Research in Marketing, 28(2), 134–144.
Assmus, G., Farley, J. U., & Lehmann, D. R. (1984). How advertising affects sales: Meta-
analysis of econometric results. Journal of Marketing Research, 21(1), 65–74.
Aurier, P. & Broz-Giroux, A. (2014). Modeling advertising impact at campaign level:
Empirical generalizations relative to long-term advertising profit contribution and its
antecedents. Marketing Letters, 25(2), 193–206.
Bahadir, S. C., Bharadwaj, S., & Parzen, M. (2009). A meta-analysis of the determinants of
organic sales growth. International Journal of Research in Marketing, 26(4), 263–275.
Batra, R., Lehmann, D. R., Burke, J., & Pae, J. (1995). When does advertising have an
impact? A study of tracking data. Journal of Advertising Research, 35(5), 19–33.
Beatty, S. E. & Smith, S. M. (1987). External search effort: An investigation across several
product categories. Journal of Consumer Research, 14(1), 83–95.
Bell, D. R., Chiang, J., & Padmanabhan, V. (1999). The decomposition of promotional
response: An empirical generalization. Marketing Science, 18(4), 504–526.
Bijmolt, T. H., Heerde, H. J. V., & Pieters, R. G. (2005). New empirical generalizations on
the determinants of price elasticity. Journal of Marketing Research, 42(2), 141–156.
Bijmolt, T. H. & Pieters, R. G. (2001). Meta-analysis in marketing when studies contain
multiple measurements. Marketing Letters, 12(2), 157–169.
Blut, M., Frennea, C. M., Mittal, V., & Mothersbaugh, D. L. (2015). How procedural,
financial and relational switching costs affect customer satisfaction, repurchase intentions,
and repurchase behavior: A meta-analysis. International Journal of Research in Marketing,
32(2), 226–229.
Brown, S. P. & Peterson, R. A. (1993). Antecedents and Consequences of Salesperson Job
Satisfaction: Meta-Analysis and Assessment of Causal Effects. Journal of Marketing
Research, 30 (February), 63–77.
Brown, S. P., Homer, P. M., & Inman, J. J. (1998). A meta-analysis of relationships between ad-
evoked feelings and advertising responses. Journal of Marketing Research, 35(1), 114–126.
Brown, S. P. & Stayman, D. M. (1992). Antecedents and Consequences of Attitude Toward
the Ad: A Meta-Analysis. Journal of Consumer Research, 19(1), 34–51.
Bucklin, R. E., Lehmann, D. R., & Little, J. D. C. (1998). From decision support to decision
automation: a 2020 vision. Marketing Letters, 9(3), 235–246.
Cano, C. R., Carrillat, F. A., & Jaramillo, F. (2004). A meta-analysis of the relationship
between market orientation and business performance: evidence from five continents.
International Journal of Research in Marketing, 21(2), 179–200.
Capella, M. L., Webster, C., & Kinard, B. R. (2011). A review of the effect of cigarette adver-
tising. International Journal of Research in Marketing, 28(3), 269–279.
Carlson, J. P., Vincent, L. H., Hardesty, D. M., & Bearden, W. O. (2009). Objective and
subjective knowledge relationships: A quantitative analysis of consumer research findings.
Journal of Consumer Research, 35(5), 864–876.
Chang, W. & Taylor, S. A. (2016). The Effectiveness of Customer Participation in New
Product Development: A Meta-Analysis. Journal of Marketing, 80(1), 47–64.
Churchill, G. A., Ford, N. M., Hartley, S. W., & Walker, O. C. (1985). The Determinants of
Salesperson Performance: A Meta-Analysis. Journal of Marketing Research, 22(2), 103–118.
Churchill, G. A. & Peter, J. P. (1984). Research Design Effects on the Reliability of Rating
Scales: A Meta-Analysis. Journal of Marketing Research, 21(4), 360–375.
Clarke, D. G. (1976). Econometric Measurement of the Duration of Advertising Effect on
Sales. Journal of Marketing Research, 13 (November), 345–357.
Compeau, L. D. & Grewal, D. (1998). Comparative price advertising: an integrative review.
Journal of Public Policy & Marketing, 17(2), 257–273.
Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and
meta-analysis (2nd ed.). New York: Russell Sage Foundation.
Eisend, M. (2015). Have We Progressed Marketing Knowledge? A Meta-Meta-Analysis of
Effect Sizes in Marketing Research. Journal of Marketing, 79(3), 23–40.
Eisend, M. (2011). How humor in advertising works: A meta-analytic test of alternative
models. Marketing Letters, 22(2), 115–132.
Eisend, M. (2006). Two-sided advertising: A meta-analysis. International Journal of Research
in Marketing, 23(2), 187–198.
Eisend, M. & Stokburger-Sauer, N. E. (2013). Brand personality: A meta-analytic review of
antecedents and consequences. Marketing Letters, 24(3), 205–216.
Eisend, M. & Tarrahi, F. (2014). Meta-analysis selection bias in marketing research.
International Journal of Research in Marketing, 31(3), 317–326.
Estelami, H., Lehmann, D. R., & Holden, A. C. (2001). Macro-economic determinants of
consumer price knowledge: A meta-analysis of four decades of research. International
Journal of Research in Marketing, 18(4), 341–355.
Evanschitzky, H., Eisend, M., Calantone, R. J., & Jiang, Y. (2012). Success factors of product
innovation: An updated meta-analysis. Journal of Product Innovation Management,
29(S1), 21–37.
Farley, John U. & Donald R. Lehmann (1986). Meta-Analysis in Marketing: Generalization
of Response Models. Lexington, MA: Lexington Books.
Farley, J. U., Lehmann, D. R., & Sawyer, A. (1995). Empirical marketing generalization
using meta-analysis. Marketing Science, 14(3, Supplement), G36–G46.
Farley, J. U., Lehmann, D. R., & Ryan, M. J. (1981). Generalizing from “imperfect” replica-
tion. Journal of Business, 54(4), 597–610.
Farley, J. U., Lehmann, D. R., & Mann, L. H. (1998). Designing the next study for maximum
impact. Journal of Marketing Research, 35(4), 496–501.
Farley, J. U., Lehmann, D. R. & Ryan, M. J. (1982). Pattern in Parameters of Buyer
Behavior Models: Generalization from Sparse Replication. Marketing Science, 1 (Spring),
181–204.
Franke, G. R., & Park, J. E. (2006). Salesperson adaptive selling behavior and customer
orientation: a meta-analysis. Journal of Marketing Research, 43(4), 693–702.
Troy, L. C., Hirunyawipada, T., & Paswan, A. K. (2008). Cross-functional integration and
new product success: an empirical investigation of the findings. Journal of Marketing,
72(6), 132–146.
Vakratsas, D. & Ambler, T. (1999). How advertising works: what do we really know? Journal
of Marketing, 63(1), 26–43.
Van den Bulte, C. & Stremersch, S. (2004). Social contagion and income heterogeneity in new
product diffusion: A meta-analytic test. Marketing Science, 23(4), 530–544.
van Laer, T., de Ruyter, K., Visconti, L. M., & Wetzels, M. (2014). The Extended
Transportation-Imagery Model: A Meta-Analysis of the Antecedents and Consequences
of Consumers’ Narrative Transportation. Journal of Consumer Research, 40(5), 797–817.
Völckner, F. & Hofmann, J. (2007). The price-perceived quality relationship: A meta-ana-
lytic review and assessment of its determinants. Marketing Letters, 18(3), 181–196.
Witte, K. & Allen, M. (2000). A meta-analysis of fear appeals: Implications for effective
public health campaigns. Health Education & Behavior, 27(5), 591–615.
You, Y., Vadakkepatt, G. G., & Joshi, A. M. (2015). A meta-analysis of electronic word-of-
mouth elasticity. Journal of Marketing, 79(2), 19–39.
Zablah, A. R., Franke, G. R., Brown, T. J., & Bartholomew, D. E. (2012). How and when
does customer orientation influence frontline employee job outcomes? A meta-analytic
evaluation. Journal of Marketing, 76(3), 21–40.
Zlatevska, N., Dubelaar, C., & Holden, S. S. (2014). Sizing up the effect of portion size on
consumption: a meta-analytic review. Journal of Marketing, 78(3), 140–154.
324
(or services), product lines and assortments, the pricing of these offer-
ings, and the investment and allocation of resources towards activities
such as advertising and promotion, personal selling, distribution and
display involved in marketing these offerings. However, the emphasis
on optimization methods in Marketing Science clearly declined between
1980 and 2010. One indicator is that the term ‘optimization’ does not
figure among the top 20 most popular keywords associated with articles
in the leading journal Marketing Science since the beginning of the 1980s
(Mela et al. 2013). Rather, marketing scientists’ attention clearly shifted
to the empirical ‘estimation’ aspects of marketing problems as indicated
by keywords like ‘choice models’, ‘econometric models’, ‘forecasting’,
‘conjoint analysis’, ‘hierarchical Bayes’, and ‘Bayesian analysis’ in the top
20 keyword list of Mela et al. (2013).
There are signs, however, that research on marketing optimization methods
has been making a comeback since the beginning of the new millennium,
with the proliferation of new marketing technologies, channels, media,
markets and competitors even as marketing budgets and resources remain
constrained. Clearly, marketers have many more options and factors to
consider and trade off in marketing decision-making with respect to prices
and limited resources. That is, marketing optimization problems facing
marketers have been rapidly multiplying in recent years, calling for greater
expertise in this domain for marketing success. Therefore, it is hoped that
this chapter’s review of classical as well as new marketing optimization
problems and solutions contributes to improving knowledge and stimulat-
ing research in this area.
In the next section, we begin with two important typologies of market-
ing optimization problems around which we organize the rest of the
content in this chapter. The first typology is a classification of these
problems according to the number (‘single’ or ‘multiple’) of ‘sales entities’
and marketing input variables involved in the problem. In the second
typology, we classify optimization problems according to the nature of the
objective function (e.g., static or dynamic) involved.
The sales entities involved in a problem could be single, e.g., the firm's entire 'market', or multiple, e.g.,
customer segments of this market, geographic areas in this market, prod-
ucts or services being marketed or time periods or intervals of a planning
horizon. Lastly, sales entities can be individual customers and households
or more aggregate groupings of customers, e.g., market segments or
markets. The distinguishing feature of any sales entity is that it is charac-
terized by a sales response function relating the marketing input/s directed
at it and the outcome/s of interest from it (typically taken to be ‘sales’
units, e.g., number or dollar value of customers or orders or physical units
of a product sold, unless otherwise stated). Sales response function is syn-
onymous with ‘demand function’, especially when the decision variable of
interest is price.
Next, a problem may involve single or multiple inputs. They typically
are one or more of the famous ‘4 Ps’ of the marketing mix – product,
price, promotion, and place (distribution). Here it is useful to distin-
guish between three types of common marketing inputs, namely product
‘attribute’, ‘price’ or ‘resource’. In general, an attribute, e.g., ‘convenience’
or 'durability', is a feature of a product that has one or more 'levels',
and the decision-maker can choose which level of the attribute to include
in the product. Naturally, the inclusion or exclusion of the
attribute-level will impact customer demand for the product as would
be represented by its demand function – which could be specified at the
individual- or more aggregate-level. Similarly, price is the payment per
unit of a good or service that is expected or required by the supplier. Price,
price discounts, markdowns (magnitudes and/or timing), shipping fee
reductions, are all price-related marketing decision variables. Notably,
price can also be viewed as a product attribute whose level will affect the
demand for the product. However, the price level is ‘special’ because it also
appears in the product margin per unit and, therefore, will have a second
effect on the level of profit made by the decision-maker. Because it appears
twice in a profit-focused decision-maker’s objective function – once in the
demand function, and a second time in the gross margin per unit demand –
in a multiplicative way, the decision-maker’s profit outcomes are typically
very sensitive to price changes.
Lastly, a resource is a source or supply from which benefit is produced.
Typically resources are stocks or supplies of materials, efforts, time,
services, staff, knowledge, or other assets that are transformed to produce
benefit and in the process may be consumed or made unavailable. Thus,
resources have utility, limited availability and can be depleted. A resource,
however, has a cost or monetary expenditure related to it that will enter
the objective function of the optimization problem. In marketing, the
common resources of interest include advertising and direct marketing
expenditures. Under uncertainty, the decision-maker's objective of inter-
est will be modified to its 'expected' value (e.g., expected profit, expected
utility etc.) and his/her goal will be to choose the values of the input vari-
ables that optimize his/her expected value or utility objective function.
Depending on the form of this expected value objective function, the
variability (or variance) in the realized objective that is acknowledged by
the decision-maker may still not impact his/her optimal decisions. This
often occurs when uncertainty enters the objective function only in an
additive manner and is independent of the level of the marketing input,
and/or the decision-maker is ‘risk-neutral’, i.e., his/her expected utility
effectively does not give any weight to the variance in response. In all
other situations, the optimal decisions should be impacted by uncertainty,
i.e., the optimal solutions in the deterministic versus stochastic cases
are different depending on the decision-maker’s risk attitude (e.g., risk-
neutral or risk-averse) (e.g., Aykac et al. 1989). Interestingly, deterministic
optimization problems tend to dominate in both academic research and
practice – probably because of the analytical tractability of deterministic
problems and/or the complexity in conceptualizing and solving stochastic
optimization problems.
Table 14.1a  Illustrative static marketing optimization models

Single entity:
- Single non-price input: Dean (1951). Problem studied: determining the profit-maximizing advertising budget. Approach: marginal analysis.
- Single price input: Monroe and Della Bitta (1978). Problem studied: determining the profit-maximizing price for a new product. Approach: marginal analysis.
- Multiple non-price inputs (IMC problems): Gatignon and Hanssens (1987). Problem studied: determining the profit-maximizing mix of interacting advertising and sales force efforts. Approach: numerical optimization.
- Price and non-price inputs (marketing mix problems): Dorfman and Steiner (1954). Problem studied: determining the profit-maximizing mix of price, advertising, and product quality. Approach: marginal analysis.

Multiple entities:
- Single non-price input: Lodish (1980). Problem studied: determining profit-maximizing sales resource allocation across products and customers. Approach: repetitive incremental analysis solution to a knapsack problem.
- Single price input: Reibstein and Gatignon (1984). Problem studied: determining expected profit-maximizing product line pricing. Approach: numerical optimization.
- Multiple non-price inputs: Mantrala et al. (2007). Problem studied: determining platform firm profit-maximizing distribution, product quality, and sales investments. Approach: analytical (marginal analysis).
- Price and non-price inputs: Kanuri et al. (2017). Problem studied: determining platform firm profit-maximizing design and pricing of a menu of subscription plans. Approach: mixed integer nonlinear program.
Table 14.1b  Illustrative dynamic marketing optimization models

Single entity:
- Single non-price input: Nerlove and Arrow (1962). Problem studied: determining the discounted cumulative profit-maximizing advertising expenditure policy over an infinite horizon. Approach: calculus of variations.
- Single price input: Nair (2007). Problem studied: determining the price sequence that maximizes the expected discounted value of future profits from a durable good. Approach: dynamic programming (numerical procedure).
- Multiple non-price inputs (IMC problems): Naik and Raman (2003). Problem studied: determining the discounted cumulative profit-maximizing mix of TV and print advertising. Approach: deterministic optimal control theory.
- Price and non-price inputs (marketing mix problems): Naik et al. (2005). Problem studied: determining interactive advertising and price promotion policies maximizing discounted cumulative profit over a finite horizon when facing oligopolistic competition. Approach: specialized 'marketing mix algorithm' allowing for interactions, based on deterministic differential game theory.

Multiple entities:
- Single non-price input: Aravindakshan et al. (2014). Problem studied: determining the spatiotemporal allocation of an ad budget maximizing the expected discounted value of future profits over an infinite horizon. Approach: stochastic optimal control theory.
- Single price input: Bayus (1992). Problem studied: determining the price of two overlapping generations of products to maximize total discounted profit over the second generation's time horizon. Approach: deterministic optimal control theory.
- Multiple non-price inputs: Sridhar et al. (2011). Problem studied: determining platform firm discounted cumulative profit-maximizing investment policies for product quality and sales force investments over a finite horizon. Approach: deterministic optimal control theory.
- Price and non-price inputs: Fischer et al. (2011). Problem studied: finding discounted cumulative profit-maximizing pricing and allocations of marketing budget across a mix of countries, products, and marketing activities. Approach: calculus of variations with Lagrange approach.
336 Handbook of marketing analytics
[Figure 14.1: Common specifications of concave and S-shaped sales response functions. Four panels plot sales ($) against marketing effort ($ or hours); the labeled panels include S-shaped and semi-log specifications.]
concave over the entire range of effort or is S-shaped, i.e., initially convex
exhibiting increasing returns, and then decreasing returns after some level
of effort (known as the ‘inflection point’) (see Figure 14.1). S-shaped func-
tions actually seemed more consistent with marketing managers’ intuitive
beliefs as reflected by many subjective judgment-based or ‘decision
calculus’ measurements, as well as observed practices such as pulsing or
flighting in expending advertising budgets (Little 1979). Figure 14.1 shows
common examples of specifications of concave and S-shaped response
functions. In proposing his early ‘ADBUDG’ specification of the sales
response model, Little (1970) clearly felt it appropriate to allow for both
possibilities and let the data decide.
Subsequently, however, the bulk of empirical evidence supported the
view that aggregate sales response functions are predominantly concave,
not S-shaped in form (e.g., Simon and Arndt 1980). This was a very useful
empirical finding for later models and research because concave functions
are not only easier to estimate but also easier to optimize using marginal
analysis methods of convex programming (a special case of NLP). Unless
otherwise stated, we shall hereafter assume that sales response models
are concave in this chapter's exposition. Mathematically, we express the
concave sales response function as s = f(x), where f is a continuous, dif-
ferentiable function with a positive first derivative or slope, f'(x) > 0, and
a negative second derivative, f''(x) < 0.
The objective function: The standard assumption is that the outcome of
interest is net profit = dollar margin per unit times sales units less the cost
of resource. Now, if the resource being invested is the monetary equivalent
of some physical units (e.g., number of ads or number of sales reps) then
the cost of the resource is simply the same as the resource expenditure.
However, if physical units is the measure of the resource being allocated
then the cost of the resource could be a linear or nonlinear function of
these units. In the latter case, the usual assumption is that the resource cost
function is convex in form, i.e., cost per unit of the resource increases as
more units of it are consumed. Hereafter, unless otherwise stated, we shall
assume the resource input choice variable in the optimization problem is
measured in dollars rather than physical units.
Assuming that we are considering a setting where competition is
absent or not active, the monopoly profit objective function can then be
expressed as: π = (p − c) f(x) − x, where p = price per unit and c is unit
production cost. Thus, m = (p − c) is the gross margin or 'contribution'
per unit. In the present discussion, we assume both price and production
cost are held constant.
The constraints: In this problem of determining the optimal budget, the
only constraint is that x ≥ 0. The mathematical statement of the profit-
maximizing resource budget-setting problem is then: maximize
π = (p − c) f(x) − x subject to x ≥ 0.
The solution and optimality conditions: Notably, given the sales response
function is concave, the objective net profit as a function of the input
resource is also concave in form – specifically quadratic or inverted-U
in shape. This allows the use of convex programming (a special case of
NLP) approach to find the optimal budget. More specifically, because
the objective function is concave, we can simply find the maximum by
setting the first derivative of the objective function to zero, implying the
optimum lies at the point where the incremental or marginal contribution
dollars gained equal the marginal dollar of resource spent, i.e., m f'(x*) = 1.
[Figure 14.2: The flat maximum. Profit ($) plotted against marketing investment level (hours); the profit curve is nearly flat over a wide region around the optimal investment level.]
The problem: The fundamental question in this class of problems is: what
is the optimal price to set for a product assuming other marketing vari-
ables are held fixed?
The choice variable in this problem is the price p > 0.
The response or demand function in this case is s = f(p). The common
assumption for this price response function is that sales decrease as price
increases. Some common specifications for such downward-sloping price
response functions are shown in Figure 14.3.
The objective function for this problem is then the gross contribution
= margin times demand as a function of price. Note that the price variable
enters the margin as well as the demand function, making the objective
[Figure 14.3: Common downward-sloping price response function specifications, plotting sales ($) against price ($). Notes: Linear response function: S = 25 – 2P. Nonlinear response function: S = e25p–10.]
Setting the derivative of this objective with respect to price equal to zero
yields the optimality condition (p* − c)/p* = 1/e (14.4), where e is the
price elasticity; the left-hand side (LHS) of (14.4) is commonly known as
the Lerner Index. This index (which is bounded between 0 and 1) is
interpreted as a measure of the market power of a monopolist, and the
main insight is that the Lerner Index reduces in magnitude as the
elasticity increases. That is, the higher the market's price elasticity, the
lower is the firm's market power. Note that (14.4) implies: p* = [e/(e − 1)]c.
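The markup rule p* = [e/(e − 1)]c applies directly once the elasticity is known. A minimal sketch, with hypothetical cost and elasticity values:

```python
def optimal_price(c, e):
    """Profit-maximizing monopoly price under constant price elasticity e > 1:
    p* = [e / (e - 1)] * c, so the Lerner index (p* - c)/p* equals 1/e."""
    if e <= 1.0:
        raise ValueError("a profit maximum requires elasticity e > 1")
    return e / (e - 1.0) * c

p_star = optimal_price(c=4.0, e=3.0)   # -> 6.0
lerner = (p_star - 4.0) / p_star       # -> 1/3, falling as e rises
```

Note how a higher elasticity shrinks both the markup and the Lerner index, consistent with the market-power interpretation above.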
The solution: Dorfman and Steiner (1954) were the first to derive the
conditions for optimal values of the decision variables in this problem.
Continuing with the assumption that the marginal cost is fixed, the three
optimality conditions (one for each of p, u, and v) are derived by setting
the partial derivatives of the objective function with respect to each choice
variable equal to zero, i.e., their FOCs:

(p − c) ∂f/∂p + f(p, u, v) = 0   (14.6)

(p − c) ∂f/∂u − 1 = 0   (14.7)

(p − c) ∂f/∂v − 1 = 0   (14.8)

Let us denote the gross margin as a fraction of price by L, i.e.,
L = (p − c)/p; the marginal revenue product of advertising by h, i.e.,
h = p ∂f/∂u; the marginal revenue product of personal selling by ϑ, i.e.,
ϑ = p ∂f/∂v; and recall that the price elasticity is e = (p/f) ∂f/∂p. Then
the above first-order conditions can be compactly and meaningfully
summarized in the form of the famous Dorfman-Steiner (D-S) (1954)
conditions for marketing mix optimality. Specifically, the optimal levels
of the marketing mix variables are those that simultaneously satisfy the
following conditions:

1/L = e = h = ϑ   (14.9)
That is, the optimal levels of the price and resources are those at which
the reciprocal of the gross margin as a fraction of price equals the price
elasticity as well as the marginal revenue products of the marketing
resources. Note that with further manipulation of (14.9), the conditions
for the optimal levels of the resources can be expressed in terms of their
respective elasticities. Specifically, Albers (2000) has provided the follow-
ing two versions of the rule for the optimal ‘marketing resource’ (advertis-
ing or personal selling etc.) budget level:
v*/[m f(v*)] = µ = (v/f)(∂f/∂v)   (14.10)

i.e., [optimal marketing resource budget / gross margin revenues (or profit
contribution)] = marketing resource elasticity (µ). Alternatively,

v*/f(v*) = µ/e   (14.11)
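Equation (14.11) translates directly into a budgeting rule of thumb: the optimal resource budget, as a share of sales revenue, equals the ratio of the resource elasticity to the price elasticity. A minimal sketch with hypothetical elasticity values:

```python
def optimal_budget_share(mu, e):
    """Albers' (2000) rule (14.11): the optimal marketing resource budget,
    expressed as a share of sales revenue f(v*), equals the resource
    elasticity mu divided by the price elasticity e."""
    return mu / e

# e.g., an advertising elasticity of 0.10 and a price elasticity of 2.0
# imply spending 5% of sales revenue on advertising.
share = optimal_budget_share(0.10, 2.0)  # -> 0.05
```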
The problem: There are two versions of these problems – the constrained
budget allocation problem; and the unconstrained budget allocation
problem. In the former, the question is: How should a given budget be
allocated across n different sales entities such as markets, customer seg-
ments, products, etc.? In the unconstrained problem, the optimal alloca-
tions are freely determined and their sum amounts to the optimal budget.
Below we consider the constrained budget allocation problem in a static,
deterministic, and monopoly decision-making setting. Some pioneering
examples of such marketing optimization problems published early on
include:
Notes:
a. LSTE is the long-term share of total effect, which is defined as (carryover effect/total effect) = carryover effect/(current period effect + carryover effect).
b. The '90% duration interval' is the number of periods during which 90% of the expected total or cumulative marcom effort's effect has taken place.
Choice variables: These are the allocations (in either physical units, e.g.,
number of ads, number of sales calls etc. or their monetary equivalents) of
the total resource budget made to each sales entity, xi for i = 1. . .n.
The sales response models: As in the case of the budgeting problem, the
sales response functions characterizing the market entities (e.g., geographic
areas, products and media) competing for the resource lie at the heart of allo-
cation models. Frequently these disaggregate response functions are hetero-
geneous in their parameters if not shapes and can be concave or S-shaped.
Again, however, positive allocations to units are likely to fall in the concave
portions of the sales response curves. Therefore, we shall continue with the
assumption that functions are concave unless otherwise stated.
The objective function: The objective function is the sum of the contribu-
tions from each sales entity. Note that if all the individual entities’ sales
response functions are concave then their sum, i.e., the objective function,
is also a concave function of the allocations. Also, let the margin per unit
be constant in time although it may vary across the sales entities.
The constraints: In the constrained budget allocation problem, the
allocations should be greater than or equal to zero and the sum of the
allocations across the sales entities should be less than or equal to the total
budget B. The manager’s optimization problem then can be stated as:
should be equal at these allocations; and (2) these allocations should sum
to the total budget, i.e., Σi x*i = B. Note, however, that if the size of B is
sufficiently small, one or more entities may receive zero allocations in the
optimal solution.
In the unconstrained problem, when the total budget has not been
set, we can simultaneously determine the optimal total budget as well
as its optimal allocations by applying these optimality conditions:
m1 f1′(x*1) = . . . = mn fn′(x*n) = k, i.e., the marginal contributions of all enti-
ties at their optimal allocations should equal the marginal cost (k) of the
resource; and the optimal budget is equal to the sum of these optimal
allocations. Alternatively, the optimal allocations across the sales entities
are those at which the ratios of each pair of allocations is equal to the
ratio of their corresponding sales response elasticities, and the sum of the
allocations equals the total budget. Qualitatively, the key insight is that the
allocations to the sales entities should be proportionate to their response
elasticities or, more simply, the more responsive entities should receive
higher allocations.
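The equal-marginal-contribution condition has a closed form for some concave specifications. Assuming, hypothetically, semi-log responses s_i = a_i + b_i ln(x_i) with margins m_i, equating the marginal contributions m_i b_i/x_i across entities under the constraint Σ x_i = B yields allocations proportional to the responsiveness weights m_i b_i:

```python
def allocate_budget(budget, margins, slopes):
    """Constrained allocation across entities with concave semi-log responses
    s_i = a_i + b_i*ln(x_i). Equal marginal contributions, m_i*b_i/x_i equal
    for all i, plus sum(x_i) = budget, give allocations proportional to the
    responsiveness m_i*b_i, not to entity size or potential."""
    weights = [m * b for m, b in zip(margins, slopes)]
    total = sum(weights)
    return [budget * w / total for w in weights]

# Two entities with equal margins but 4x different responsiveness:
alloc = allocate_budget(100.0, margins=[50.0, 50.0], slopes=[0.4, 0.1])
# -> [80.0, 20.0]; marginal contributions 50*0.4/80 = 50*0.1/20 = 0.25
```

The contrast with a CPI rule is immediate: a rule keyed to market size would split the budget by potentials, while the optimal split here is driven entirely by the response slopes.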
Unfortunately, in practice, allocation decisions are often made by apply-
ing constant proportion of investment (CPI) allocation rules. Examples
include allocation of budgets according to the ratio of entities’ sales
potentials, consumer population sizes etc. The basic problem with such
allocation rules is that they often confuse potentials or market sizes with
responsiveness. Consequently, the optimal allocation ratios considering
responsiveness are often quite different from those of CPI allocations.
Further, under CPI rules all entities receive positive allocations regardless
of the size of the budget and, also, all entities’ allocations increase pro-
portionately as the budget is increased or decreased. However, given sales
response heterogeneity, optimization prescribes that for budgets below a
certain critical size, only some entities should receive positive allocations
while others should get nothing. Moreover, even when the given budget is
greater than the critical budget size, optimal allocations to entities often
increase disproportionately as the budget size is increased. This means
there can be reversals in the ratios of divisions of incremental budgets
among the entities. (Indeed, if the response functions are S-shaped, there
may even be reversals in not just allocation ratios but allocation levels as
well as budget increases.)
Furthermore, we have noted earlier that the ‘flat maximum effect’ can
mitigate to some extent the adverse consequences of budgeting errors.
However, as Mantrala et al. (1992) demonstrate, allocation errors are
usually much more consequential. Specifically, the authors show examples
where allocation errors can lead to such large losses that the flat maximum
principle can be comforting for budgeters only when they can trust or
rely on allocators to make careful and optimal decisions. A number of
studies in the operations research literature have presented algorithms and
procedures for solving the distribution of effort problems when the sales
response functions are concave or S-shaped (see, e.g., Charnes and Cooper
1958; Freeland and Weinberg 1980; Koopman 1953; Sinha and Zoltners
1979).
Before concluding, we wish to highlight another interesting marketing
budget allocation problem that is a variation of equation (14.12) where
the objective metric is customer equity, e.g., Berger and Bechwati (2001)
(see also, Blattberg and Deighton 1996; Kumar and George 2007). More
specifically, customer equity is the sum of two customer-level net present
values: the return from customer acquisition spending and the return
from retention spending. Berger and Bechwati express customer equity
as: am − A + a(m − R/r)[r′/(1 − r′)], where a is the acquisition rate (i.e.,
the proportion of solicited prospects acquired) and depends on A, the level
of acquisition spending (i.e., dollars spent per solicited prospect), m is the
margin (in monetary units) on a transaction, and R is the retention
spending per customer per year. Further, r′ = r/(1 + d), where r is the
yearly retention rate (as a proportion) and d is the yearly discount rate
appropriate for marketing investments (again, as a proportion).
modeled as concave (modified exponential) functions of the acquisition
spending and retention spending respectively. Then the firm’s problem
is to allocate its promotion budget between acquisition spending and
retention spending so as to maximize its customer equity, subject to the
following constraints: A + (a·R) = B; A ≥ 0, R ≥ 0.
Berger and Bechwati (2001) solve this optimization problem using Excel's
Solver add-in, which applies the NLP approach via the Generalized
Reduced Gradient (GRG) technique. Solver proceeds by first
finding a ‘feasible’ solution, i.e., a solution for which all the constraints are
satisfied. Then, Solver seeks to improve upon this first solution by
changing the decision variable values, moving from one feasible solution
to another until the objective function reaches its maximum or minimum.
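The same optimization can be reproduced outside Excel. Below is a minimal sketch with illustrative parameter values (the response-function parameters are assumptions, not taken from Berger and Bechwati), replacing the GRG solver with a simple grid search over the acquisition spend, with retention spend then pinned down by the budget constraint:

```python
import math

# Illustrative (assumed) parameter values, not from the original study:
M, D = 200.0, 0.15           # margin per transaction, yearly discount rate
A_MAX, K1 = 0.30, 0.10       # ceiling / shape of the acquisition response
R_MAX, K2 = 0.80, 0.05       # ceiling / shape of the retention response

def acq_rate(a_spend):
    """Concave (modified exponential) acquisition rate a(A)."""
    return A_MAX * (1.0 - math.exp(-K1 * a_spend))

def ret_rate(r_spend):
    """Concave (modified exponential) retention rate r(R)."""
    return R_MAX * (1.0 - math.exp(-K2 * r_spend))

def customer_equity(A, R):
    """CE = a*m - A + a*(m - R/r)*[r'/(1 - r')] with r' = r/(1 + d)."""
    a, r = acq_rate(A), ret_rate(R)
    r_prime = r / (1.0 + D)
    return a * M - A + a * (M - R / r) * (r_prime / (1.0 - r_prime))

def best_split(budget, step=0.5):
    """Grid search over acquisition spend A; retention spend R then follows
    from the budget constraint A + a(A)*R = budget (per solicited prospect)."""
    best = (float("-inf"), 0.0)
    A = step
    while A < budget:
        R = (budget - A) / acq_rate(A)
        best = max(best, (customer_equity(A, R), A))
        A += step
    return best  # (maximized customer equity, optimal acquisition spend)

ce_star, a_star = best_split(20.0)
```

A gradient-based method such as GRG reaches the same interior optimum far more efficiently; the grid search is used here only to keep the sketch self-contained.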
More generally, Excel's Solver can handle a wide variety of nonlinear
optimization problems with many types of restrictions (Fylstra et al.
1998). As noted by Albers (2000), the availability of Solver in a readily
accessible spreadsheet environment has greatly facilitated the formulation
and numerical solution of the wide range of nonlinear optimization
problems that arise in marketing decision-making.
The solution to the problem can be found by taking the derivatives of the
objective function with respect to p1 and p2, respectively, and setting the
resulting expressions equal to zero. These two first-order conditions can
then be simultaneously solved to obtain the optimal prices. Upon doing
so, we obtain the following results as indicated by Reibstein and Gatignon
(1984):
p*1 = [b1/(b1 + 1)]c1 − [b21/(b1 + 1)](S*2/S*1)(p*2 − c2)   (14.14)

p*2 = [b2/(b2 + 1)]c2 − [b12/(b2 + 1)](S*1/S*2)(p*1 − c1)   (14.15)
Note that if the two cross-price elasticities are zero, i.e., the products are
independent, then each product’s optimal price can be found independ-
ently according to equation (14.4).
The key insight from this solution is that the optimal price for each
product is a function of (1) its own elasticity; (2) its own marginal cost;
(3) the price of the other product; (4) the cross-price elasticity; (5) the
scale factors for each product; (6) the other product's cross-elasticity;
and (7) the other product's marginal cost.
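Because (14.14) and (14.15) define each optimal price in terms of the other, they can be solved as a fixed point by simple iteration. A sketch with hypothetical elasticities and costs, treating the equilibrium sales ratio S*2/S*1 as fixed for simplicity:

```python
def optimal_prices(b1, b2, b21, b12, c1, c2, s_ratio, iters=200):
    """Iterate the coupled first-order conditions (14.14)-(14.15):
      p1 = [b1/(b1+1)]*c1 - [b21/(b1+1)]*(S2*/S1*)*(p2 - c2)
      p2 = [b2/(b2+1)]*c2 - [b12/(b2+1)]*(S1*/S2*)*(p1 - c1)
    b1, b2 are own-price elasticities (< -1); b21, b12 are cross elasticities.
    s_ratio = S2*/S1* is treated as a fixed constant in this sketch."""
    p1, p2 = c1, c2
    for _ in range(iters):
        p1 = (b1 / (b1 + 1)) * c1 - (b21 / (b1 + 1)) * s_ratio * (p2 - c2)
        p2 = (b2 / (b2 + 1)) * c2 - (b12 / (b2 + 1)) / s_ratio * (p1 - c1)
    return p1, p2

# Independent products (zero cross elasticities) recover the single-product
# markup rule p* = [e/(e-1)]*c of equation (14.4):
p1, p2 = optimal_prices(-3.0, -3.0, 0.0, 0.0, 4.0, 4.0, 1.0)  # -> (6.0, 6.0)
# Substitutes (positive cross elasticities) push both prices above 6:
q1, q2 = optimal_prices(-3.0, -3.0, 0.5, 0.5, 4.0, 4.0, 1.0)
```

With moderate cross elasticities the iteration is a contraction, so the fixed point is reached quickly; the substitute case illustrates how internalizing cross effects raises both prices relative to independent pricing.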
∂π/∂d = km1 ∂S/∂d + m2 ∂R/∂d − 1 = 0   (14.18)

∂π/∂a = km1 ∂S/∂a + m2 ∂R/∂a − 1 = 0   (14.19)
Pr_igq = exp(x′gq bix + pgq bip + z′gq biz) / { Σ_{g′=1}^{G} [exp(x′g′q bix + pg′q bip + z′g′q biz)] + exp(ai) },  ∀ i ∈ I and ∀ q ∈ Q   (14.21)
where,
xgq = a vector of 1s and 0s representing multi-format versions available in plan g
and choice set q
pgq = weekly subscription price of plan g in choice set q
zgq = a vector representing interactions between the formats
bix = a vector of parameter coefficients (partworths) corresponding to format-
version x for reader i
bip = parameter coefficient (partworth) of price p for reader i
biz = a vector of parameter coefficients (partworths) corresponding to the
interactions between the formats
eigq = random component of reader i’s utility
ai = constant term representing the utility of the no-choice option for reader i.
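The choice rule in (14.21) is a standard multinomial logit with an outside (no-choice) option. A minimal sketch for one reader and one choice set, with hypothetical deterministic utilities:

```python
import math

def plan_choice_probs(utilities, a_i):
    """Multinomial logit probabilities as in (14.21): `utilities` holds the
    deterministic utility x'b_ix + p*b_ip + z'b_iz of each plan g in the
    choice set for reader i; a_i is the utility of the no-choice option."""
    denom = sum(math.exp(u) for u in utilities) + math.exp(a_i)
    return [math.exp(u) / denom for u in utilities]

probs = plan_choice_probs([1.0, 0.5], a_i=0.0)
# The plan probabilities sum to less than 1; the remainder is the
# probability that the reader chooses no plan at all.
```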
Subsequently, the analyst can use the preference data to measure the
WTP for each multi-format plan g using Kohli and Mahajan's (1991)
piecewise linear approach: Uij|−p + Ui(p) ≥ ai + ẽ, where Uij|−p represents
the total utility of the plan configuration j excluding reader i's utility of
price, Ui(p) is the utility of a price point p, ai is the reader's utility of the
status quo or the no-choice option, and ẽ is an arbitrary positive number
used to round the price p.
Next, the analyst can use the following 4-equation simultaneous
response model system to obtain reader and advertiser demand function
elasticities by print and digital formats:
(14.22)
where,
PAt, OAt = print and digital advertising demand in time period t
PRt, ORt = print and digital reader demand in time period t
PAMM, OAMM = marketing investments that affect print and digital advertiser demand
PRMM, ORMM = marketing investments that affect print and digital reader demand
PMP, OMP = number of potential print and digital readers in the NDMA
Note that this system extends the one noted in the previous example
(i.e., by Mantrala et al. (2007)) to multiple formats.
The objective function: The primary objective of the newspaper is
to maximize profits from readers and advertisers, which can be expressed
as:
where, Bj indicates whether or not the newspaper is offering the jth sub-
scription plan, PFj is the subscription profit, PA, OA are the forecasted
print and digital advertising revenues, Ma is the margin on print and
digital advertising revenue, Skj and RPkj are the consumer surplus and
reservation prices, and Pj is the price of the subscription plan j.
The constraints: While maximizing the objective function, the analyst
needs to account for the way in which readers self-select their subscription
plans (equations 14.24–14.26) (Moorthy 1984). In particular, the analyst
needs to account for the fact that a reader will select a plan only if: (1) the
surplus she derives from subscribing to plan j is strictly positive, and (2)
the surplus she derives from plan j is greater than the surplus she derives
from all the other plans offered in the menu.
The solution: Real-world product line design and pricing problems (with more than two products) generally belong to a class of NP-hard problems, so analytical, closed-form solutions are not feasible. Moreover, this particular problem presents the analyst with a discrete combinatorial challenge over an extremely large search space. Therefore, to obtain a solution in a reasonable amount of time, the authors propose a novel heuristic-based approach to finding profit-maximizing plans.
The heuristic, which resembles a coordinated gradient ascent approach,
assists the newspaper in building its menu by sequentially assigning a
profit-maximizing plan to each segment, subject to plans assigned to prior
segments. The authors implemented their heuristic on real newspaper
data and obtained profit maximizing plans for several newspaper business
models. The key insights are: (1) the optimal product-line composition and prices are influenced by the customer group that contributes the highest revenue (i.e., advertisers) even though the product line is designed for the customer group that contributes the least revenue (i.e., readers); (2) total profits are maximized when marketing investments in each market are aimed at jointly maximizing total profits from the two customer groups (integrated strategy) rather than at separately maximizing profits from each customer group ('siloed' strategy); and (3) the profit-maximizing menu under a siloed business model comprises a partial mixed bundle of print and digital subscription plans, while under an integrated business model it comprises a pure bundle of print and digital subscription plans.
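The sequential-assignment idea behind the heuristic can be sketched as follows. This is a hypothetical simplification: the segments, candidate plans, and profit function are placeholders, and the authors' actual heuristic additionally enforces the self-selection constraints and richer plan structures.

```python
# Hedged sketch of a greedy, segment-by-segment menu-building heuristic.
# profit(menu) is assumed to evaluate the total profit of a tentative menu.

def build_menu(segments, candidate_plans, profit):
    """Assign one profit-maximizing plan per segment, in sequence.

    Plans assigned to earlier segments are held fixed when choosing
    the plan for each subsequent segment.
    """
    menu = []
    for _ in segments:
        best_plan, best_profit = None, float("-inf")
        for plan in candidate_plans:
            trial = menu + [plan]          # tentative menu for evaluation
            p = profit(trial)
            if p > best_profit:
                best_plan, best_profit = plan, p
        menu.append(best_plan)             # fixed for later segments
    return menu
```

With a toy additive profit function, each segment simply receives the highest-value plan; the interesting behavior arises when plans interact through self-selection.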
The problem: The dynamic analog of Problem 1a in the static case has been discussed at length by Sethi (1977). This is the problem of determining or characterizing an optimal policy for expending a marketing resource, say advertising, over time (as opposed to the static problem of finding the one-time optimal advertising budget).
The choice or ‘control’ variable in this problem is advertising expenditure
rate u(t) over time.
The state equation: The dynamic version of the static sales response
model is expressed by the state equation which is typically a differential
equation (in continuous-time) or difference equation (in discrete-time)
with sales as the ‘state’ variable, which evolves over time, under the
influence of the ‘control’ variable, specifically, ad expenditure rate. Two
famous versions of the state equation employed in these models are:
dS/dt = a u(t) (1 − S/M)    (14.28)
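The saturating behavior implied by a state equation of the form dS/dt = a u(t)(1 − S/M) can be checked with a simple Euler integration. This is an illustrative sketch with invented parameter values, not code from the chapter.

```python
# Minimal Euler-integration sketch of dS/dt = a*u*(1 - S/M) under a
# constant advertising rate u; all parameter values are illustrative.

def simulate_sales(a=0.5, u=1.0, M=100.0, S0=10.0, dt=0.05, T=1000.0):
    S = S0
    for _ in range(int(T / dt)):
        dS = a * u * (1.0 - S / M)   # response slows as S nears potential M
        S += dS * dt
    return S
```

Under sustained constant spending and fixed market potential, simulated sales approach M, the long-run equilibrium discussed in the chapter's note on steady states.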
Note that a consumer will buy the product in the current period only if his/her utility from purchasing now exceeds the value of waiting for the future state (St+1). This can be represented mathematically as:

ar/(1 − δc)² − br pt + εt > Wr(St+1) + ε0t − εrt    (14.32)
where a is the base sales, (b1, b2) denote unequal independent effectiveness parameters, k, the coefficient of the interaction term, denotes synergy between the two media when k > 0, and λ is the carryover coefficient (Koyck form). Hereafter, we switch from a discrete-time to a continuous-time horizon, as this simplifies the analytics and exposition to some extent. Given the focus on synergy as well as dynamics, the continuous-time version of the state equation is specified as follows:
dS/dt = lim Δt→0 (ΔSt/Δt)    (14.38)

dS/dt = b1 u(t) + b2 v(t) + k u(t) v(t) − (1 − λ) S    (14.39)
The objective function is now the dynamic analog of the static objective in equation (14.5). Specifically, the flow of profit at each instant in time is given by π(S, u, v) = mS − u² − v², where it is assumed that the cost of each resource is a convex quadratic function of the physical units expended. The decision-maker's objective is then to choose u and v to maximize cumulative discounted profit over an infinite horizon:

J(u, v) = ∫₀^∞ e^(−rt) (mS − u² − v²) dt

where r denotes the discount rate and J(u, v) is the net present value of any multimedia policy (u(t), v(t)).
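A discrete-time simulation makes the objective concrete: for constant expenditure rates u and v, it accumulates the discounted profit flow mS − u² − v² while updating sales via equation (14.39). All parameter values below are illustrative, not estimates.

```python
import math

# Discrete-time sketch of the synergy state equation with the
# discounted-profit objective; b1, b2, k, lam, m, r are invented values.

def discounted_profit(u, v, b1=0.4, b2=0.4, k=0.2, lam=0.8,
                      m=1.0, r=0.05, S0=0.0, dt=0.1, T=200.0):
    """Approximate J(u, v) for constant expenditure rates u and v."""
    S, J, t = S0, 0.0, 0.0
    for _ in range(int(T / dt)):
        J += math.exp(-r * t) * (m * S - u**2 - v**2) * dt  # profit flow
        S += (b1 * u + b2 * v + k * u * v - (1.0 - lam) * S) * dt
        t += dt
    return J
```

With equally effective media (b1 equal to b2), splitting a fixed budget symmetrically beats an asymmetric split, illustrating the equal-allocation result discussed below the solution.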
The solution: Naik and Raman (2003) solve the maximization problem defined by the state equation (14.39) and the objective function above by applying optimal control theory. The optimal solutions for the two control variables imply that if the two media are equally effective, then the advertiser should allocate the media budget equally between them, regardless of the magnitude of synergy.
In reality, however, marketing effectiveness can vary over time, e.g.,
consumer segments, values and tastes change as products age and com-
petitive landscape or economic conditions change, making the aggregated
market less or more responsive over time to marketing efforts (e.g.,
Mahajan et al. 1980). In the context of online marketing investments,
Biyalogorsky and Naik (2003, 30) state that 'with the changing nature of the Internet, it is possible that . . . [the effectiveness of online marketing investments] . . . may change over time in predictable ways'. Therefore,
next, we provide an illustration of a problem of marketing mix optimiza-
tion with time-varying effectiveness that also involves a finite rather than
infinite planning horizon.
Conclusion
Table 14.3 (continued)

. . . marketing elasticity, i.e., m = m fx*(x*).

The flat maximum principle: the realized profit is relatively insensitive to fairly wide deviations from the optimum budget.

Static Single Entity Single Price Optimization Problems
Lerner's Index, (p* − c)/p*: this index, which is bounded between 0 and 1, is a measure of the market power of a monopolist. It should equal the reciprocal of the price elasticity at optimality.

Static Single Entity Multi-variable Optimization Problems
The Dorfman-Steiner rule: the optimal levels of the marketing mix variables are those that simultaneously satisfy the conditions 1/L = e = h = u.

Static Multiple Entity Single Resource Allocation Problems
For an unconstrained problem, the allocations to the sales entities should be proportionate to their response elasticities or, more simply, the more responsive entities should receive higher allocations.

Static Multiple Entity Multi-variable Optimization Problems
For a firm offering two products at different prices, the optimal price for each product is a function of: (i) its own elasticity; (ii) its own marginal cost; (iii) the price of the other product; (iv) the cross-price elasticity; (v) scale factors for each product; (vi) the other product's cross-elasticity; and (vii) the other product's marginal cost.
For firms with cross-market network effects, the optimum marketing resource budget is a function of the 'cross-market dependency coefficient' (d).
For firms offering multiple products and experiencing cross-market network effects: the optimal product-line composition and prices are influenced by the customer group that contributes the highest revenue even though the product line is for the customer group that contributes the least revenue; and total profits are maximized when marketing investments in each market are aimed at jointly maximizing total profits from the two customer groups (integrated strategy) rather than aimed at separately maximizing profits from each customer group ('siloed' strategy).

Dynamic Single Resource Single Entity Optimization Problems
The optimal ad expenditure in the steady state is directly proportional to sales in the steady state.

Dynamic Single Entity Single Price Optimization Problems
For manufacturers producing durable goods: the optimal pricing policy is to charge a higher price in the initial period and lower prices in the subsequent period; the optimal price to charge in each period is contingent on the discount factor of the consumers (as the discount factor increases, the optimal price decreases and the rate of price decline in future periods decreases); and forward-looking consumers with low WTP can be beneficial because, when they defer their decision to future periods, they end up competing with forward-looking consumers with high WTP, which consequently increases the WTP of consumers with low valuations.

Dynamic Single-Entity Multi-variable Optimization Problems
A firm determining the profit-maximizing mix of marketing communication expenditures over time, where the effectiveness of marketing inputs is time-invariant, should increase the total budget but decrease (increase) the proportion of the media budget allocated to the more (less) effective communications activity as the synergy between multiple media increases, and should allocate the media budget equally among the media, regardless of the magnitude of synergy, if the various media are equally effective.
For a firm determining the profit-maximizing mix of marketing communication expenditures over time, where the effectiveness of marketing inputs is time-varying: the optimal allocations are proportional to the effectiveness parameters; the optimal allocation ratio will change over time, thereby directing managers to emphasize different marketing mix elements at different times over the planning horizon; and the allocation ratio can switch over the planning horizon, causing complete reversals in the emphasis placed on one instrument versus the other.
Marketing optimization methods 365
Note
1. Little (1975) proposed a discrete-time version of a dynamic sales response model (BRANDAID), which he later showed to be a generalization of the discrete-time versions of the Nerlove–Arrow and Vidale–Wolfe models (Little 1979). It is useful to note here that if a constant level of advertising expenditure is continuously applied and the market potential is fixed, then sales will reach a long-run equilibrium; the form of the sales-advertising response function in this 'steady state' is linear (concave) under the Nerlove–Arrow (Vidale–Wolfe) model.
References
Mantrala, Murali K., Prasad A. Naik, Shrihari Sridhar, and Esther Thorson (2007), “Uphill
or Downhill? Locating the Firm on a Profit Function,” Journal of Marketing, 71 (2),
26–44.
Mantrala, Murali K. and Surya Rao (2001), “A Decision-Support System that Helps
Retailers Decide Order Quantities and Markdowns for Fashion Goods,” Interfaces, 31
(3_supplement), S146-S65.
Mantrala, Murali K., Prabhakant Sinha, and Andris A. Zoltners (1992), “Impact of
Resource Allocation Rules on Marketing Investment-level Decision and Profitability,”
Journal of Marketing Research, 29 (2), 162.
Mela, Carl F., Jason Roos, and Yiting Deng (2013), “Invited Paper–A Keyword History of
Marketing Science,” Marketing Science, 32 (1), 8–18.
Monroe, Kent B. and Albert J. Della Bitta (1978), “Models for Pricing Decisions,” Journal
of Marketing Research, 413–28.
Montgomery, David B. and Alvin J. Silk (1972), “Estimating Dynamic Effects of Market
Communications Expenditures,” Management Science, 18 (10), B-485-B-501.
Montgomery, David B., Alvin J. Silk, and Carlos E. Zaragoza (1971), “A Multiple-Product
Sales Force Allocation Model,” Management Science, 18 (4-part-ii), P-3-P-24.
Moorthy, K. Sridhar (1984), “Market Segmentation, Self-selection, and Product Line
Design,” Marketing Science, 3 (4), 288–307.
Naik, Prasad A. and Kalyan Raman (2003), “Understanding the Impact of Synergy in
Multimedia Communications,” Journal of Marketing Research, 40 (4), 375–88.
Naik, Prasad A., Kalyan Raman, and Russell S. Winer (2005), “Planning Marketing-Mix
Strategies in the Presence of Interaction effects,” Marketing Science, 24 (1), 25–34.
Nair, Harikesh (2007), “Intertemporal Price Discrimination with Forward-looking
Consumers: Application to the US Market for Console Video-games,” Quantitative
Marketing and Economics, 5 (3), 239–92.
Nerlove, Marc and Kenneth J. Arrow (1962), “Optimal Advertising Policy under Dynamic
Conditions,” Economica, 129–42.
Raman, Kalyan and Rabikar Chatterjee (1995), “Optimal Monopolist Pricing under
Demand Uncertainty in Dynamic Markets,” Management Science, 41 (1), 144–62.
Raman, Kalyan, Murali K. Mantrala, Shrihari Sridhar, and Yihui Elina Tang (2012),
“Optimal Resource Allocation with Time-Varying Marketing Effectiveness, Margins and
Costs,” Journal of Interactive Marketing, 26 (1), 43–52.
Reibstein, David J. and Hubert Gatignon (1984), “Optimal Product Line Pricing: The
Influence of Elasticities and Cross-elasticities,” Journal of Marketing Research, 21 (3),
259–67.
Robinson, Bruce and Chet Lakhani (1975), “Dynamic Price Models for New-Product
Planning,” Management Science, 21 (10), 1113–22.
Seitz, Jürgen and Steffen Zorn (2016), “Perspectives of Programmatic Advertising,” in
Programmatic Advertising, Oliver Busch, ed. New York: Springer.
Sethi, Suresh P. (1977), “Optimal Advertising for the Nerlove–Arrow Model under a Budget
Constraint,” Operational Research Quarterly, 28 (3), 683–93.
Sethuraman, Raj, Gerard J. Tellis, and Richard A. Briesch (2011), “How Well Does
Advertising Work? Generalizations from Meta-analysis of Brand Advertising Elasticities,”
Journal of Marketing Research, 48 (3), 457–71.
Shankar, Venkatesh, Alladi Venkatesh, Charles Hofacker, and Prasad Naik (2010),
"Mobile Marketing in the Retailing Environment: Current Insights and Future Research
Avenues," Journal of Interactive Marketing, 24 (2), 111–20.
Simon, Hermann (1982), “ADPULS: An Advertising Model with Wearout and Pulsation,”
Journal of Marketing Research, 19 (3), 352–63.
Simon, Julian L. and Johan Arndt (1980), “The Shape of the Advertising Response
Function,” Journal of Advertising Research, 20 (4), 11–28.
Sinha, Prabhakant and Andris A. Zoltners (1979), “The Multiple-Choice Knapsack
Problem,” Operations Research, 27 (3), 503–15.
Sridhar, Shrihari, Murali K. Mantrala, Prasad A. Naik, and Esther Thorson (2011),
“Dynamic Marketing Budgeting for Platform Firms: Theory, Evidence, and Application,”
Journal of Marketing Research, 48 (6), 929–43.
Su, Xuanming (2007), “Intertemporal Pricing with Strategic Customer Behavior,”
Management Science, 53 (5), 726–41.
Thomas, Jerry W. (2006), “Marketing Optimization,” Decision Analyst.
Tull, Donald S., Van R. Wood, Dale Duhan, Tom Gillpatrick, Kim R. Robertson, and
James G. Helgeson (1986), “’Leveraged’ Decision Making in Advertising: The Flat
Maximum Principle and Its Implications,” Journal of Marketing Research, 23 (1), 25–32.
Urban, Glen L. (1975), “Allocating Ad Budgets Geographically,” Journal of Advertising
Research, 15 (6), 7–16.
Urban, Glen L. (1969), “A Mathematical Modeling Approach to Product Line Decisions,”
Journal of Marketing Research, 6 (1), 40–47.
Van Ittersum, Koert, Brian Wansink, Joost M. E. Pennings, and Daniel Sheehan (2013),
“Smart Shopping Carts: How Real-Time Feedback Influences Spending,” Journal of
Marketing, 77 (6), 21–36.
Verhoef, Peter C., P. K. Kannan, and J. Jeffrey Inman (2015), “From Multi-channel
Retailing to Omni-channel Retailing: Introduction to the Special Issue on Multi-Channel
Retailing,” Journal of Retailing, 91 (2), 174–81.
Vidale, M. L. and H. B. Wolfe (1957), “An Operations-Research Study of Sales Response to
Advertising,” Operations Research, 5 (3), 370–81.
Store Location
market share of the new store. The estimate of market potential needs to
include the likely market expansion due to the presence of the new store.
The expected market share for the new store depends on the strength of
competing stores in the area. While historical data can provide estimates of the current market potential and the market shares of existing stores, judgment is called for in estimating market expansion and market share.
Conjoint methods have been applied in this context. One model in the
franchising context (Ghosh and Craig 1991) considers both the potential
to take market share from existing competitors and the market expansion
potential in the geographic area due to the new store. We will first describe
a mathematical model to estimate expected market share and then show
how judgment is used for estimating its components as described in
Durvasula, Jain, and Andrews (1992).
Let us consider a geographic area with n existing stores and the introduction of another store (n+1). Let Mi denote the market share of the i-th store, and let ME denote the market expansion due to the presence of the new store. The post-entry market shares of the n existing stores are given by: MSi = (Mi − PMSi·Mi + ki·ME) / (1 + ME). Here, the market shares of the n existing stores are typically known, and the other quantities (the PMSs, the ks, and ME) need to be estimated by another model or judged by the decision makers.
One model used for estimating the PMS quantities is: PMSi = PMIN + (PMAX − PMIN)(1 − f(Si)); i = 1, . . ., n, where PMIN (≥ 0) and PMAX (≤ 1) are the minimum and maximum share an outlet can obtain and Si is the relative strength of the existing stores in the area. Typically, f(Si) is modeled as a logistic function in Si. PMIN and
PMAX are judgmentally obtained. The relative strength construct (Si)
depends on various store attributes and can be modeled using conjoint
analysis.
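The two-stage calculation, a logistic PMS model followed by the expected-share formula, can be sketched as below; PMIN, PMAX, the logistic coefficients, and the numeric inputs are all invented for illustration.

```python
import math

# Hedged numeric sketch of the store-location share model described above.

def pms(strength, pmin=0.05, pmax=0.6, a=0.0, b=1.0):
    """PMS_i = PMIN + (PMAX - PMIN) * (1 - f(S_i)), with f logistic in S_i."""
    f = 1.0 / (1.0 + math.exp(-(a + b * strength)))
    return pmin + (pmax - pmin) * (1.0 - f)

def post_entry_share(M_i, PMS_i, k_i, ME):
    """MS_i = (M_i - PMS_i * M_i + k_i * ME) / (1 + ME)."""
    return (M_i - PMS_i * M_i + k_i * ME) / (1.0 + ME)
```

As the model intends, a strong existing store (high relative strength Si) loses a smaller proportion of its share to the new entrant than a weak one does.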
Durvasula, Jain, and Andrews (1992) applied this model for the case of
banks and showed how conjoint analysis can be used in estimation. The
context is that of a firm, called ABC Commerce, evaluating the potential
of four locations, L1, L2, L3, and L4 in a certain geographic region. The
firm currently has 16 branches in the region. In order to evaluate relative
strength, the authors identified five attributes (by an exploratory study).
A Bidding Application
to different terrains and angles. In the conjoint study, each attribute was
varied at three levels and 302 subjects ranked 18 full profiles. The authors
estimated the MVAI for each of the five attributes when changes are made
in each of the three products. Their results show that the benefits from
improving all attributes except set-up time exceed the cost of making the
improvement. The authors found the MVAI values calculated using a commonly used approach, averaging the ratio of the weights of an attribute and price across the individuals in the sample, to be considerably upward biased and possibly incorrect. Further, the profitability of different
attribute improvements is much lower when competitive reactions are
considered in the computations. Note that such reaction calculations are
possible with simulations in conjoint studies.
This application describes how conjoint analysis was used to set marketing initiatives (largely push marketing strategies) in a B2B context, drawing on Levy, Webster, and Kerin (1983), who applied conjoint analysis to the problem of determining profit functions for alternative push strategies for a margarine manufacturer. They described
each push strategy in terms of four marketing mix variables: coopera-
tive advertising (3 levels described as 3 times at 15 cents/lb.; 4 times at 10
cents/lb.; and 6 times at 7 cents/lb.), coupons in local newspapers (3 levels
described as 2 times at 25 cents/lb., 4 times at 10 cents/lb. and 3 times at 15
cents/lb), financial terms of sale (2 levels described as 2 percent/10 days/net
30 and 2 percent/30 days), and service level defined in terms of percentage
of items shipped that were ordered by the retailer (3 levels described as 96
percent, 98 percent, and 99.5 percent). While the costs for a push strategy
could be computed from internal records of the firm, sales response could
not be estimated from past data. The authors utilized conjoint analysis
to determine the retailers’ sales response to different push strategies. For
this purpose, nine profiles, developed using a partial factorial orthogonal
design, were presented to a sample of 68 buyers and merchandising man-
agers. The judgment by the respondent was the expected change from last
year’s sales due to the push marketing mix defined by each profile. All the
retail buyers were classified into small, medium, and large buyers, with
respective levels of past purchases of 5,000, 15,000, and 30,000 cases. The
sales level used in the questionnaires was changed according to the size
of past buying by the retail buyer. The judged sales changes were used in
computing the expected sales revenues and profits from each marketing
mix and average partworth values were computed as dollar sales.
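The profit computation for a given profile can be sketched as follows, assuming a margin per case and treating push-strategy costs as known from internal records; all figures are invented.

```python
# Hedged sketch: a retailer's judged sales change translates base volume
# into expected cases, from which the cost of the push mix is deducted.

def expected_profit(base_cases, judged_change, margin_per_case, push_cost):
    """Expected profit for one push-mix profile and one retail buyer."""
    expected_cases = base_cases * (1.0 + judged_change)
    return expected_cases * margin_per_case - push_cost
```

For example, a medium buyer (15,000 cases of past purchases) judged to grow 4 percent under a given push mix, with a hypothetical margin of 1 dollar per case and 500 dollars of push costs, yields an expected profit of 15,100 dollars.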
Based on this analysis, the authors concluded that the least profitable
marketing mix is cooperative advertising offered three times a year at 15
cents per pound, coupons in newspapers offered two times a year at 25
cents per pound, terms of sale 2 percent/10 days/ net 30, and 96 percent
level of service. The most profitable marketing mix consisted of coopera-
tive advertising six times a year at 7 cents per pound, coupons four times
a year at 10 cents per pound, 2 percent/30 day terms and a 98 percent
service level. Although the particular results are specific to the situation
considered, the application shows how conjoint analysis can be employed
to determine the allocation of a marketing mix budget for a brand.
Place yourself in a situation where you have just settled down in a new city, and you are thinking of purchasing a new 17-inch computer monitor for yourself, since you sold the old one when you moved. You have a budget of three hundred U.S. dollars for this purchase, and you have other uses for any funds left over. You also wish to get the monitor soon because you need it for work at hand. After some initial information search, you have narrowed your choice down to a favorite
model. Your search has also identified three retailers, each of which is the best
in each of the three channels from which you may consider purchasing the
monitor, bricks & mortar, print catalog, and the Internet/online. Fortunately, all
of them carry the model you want.
All three retailers are described on five attributes of average price, prod-
uct trial/evaluation, sales assistance, speed of acquiring purchased monitor,
and convenience of acquisition and return, described on 3, 2, 3, 3, and 3
levels respectively. The definitions of the levels were as shown in Table 15.3.
This study was conducted among 146 graduate and senior undergraduate students (78 males and 68 females) at a major Northeastern university; respondents were compensated for their participation in the study. Each survey took about half an hour and consisted of 11 conjoint choice tasks on channel choices for the purchase of a computer monitor; in each task, respondents were asked to choose the one option from which they would make the purchase.
Table 15.3 Attributes and levels for the computer monitor conjoint study

Average price: 1. around $230; 2. around $250; 3. around $270
Product trial/evaluation: 1. display only; 2. display AND physical/virtual trial
Sales assistance: 1. not available; 2. only minimal technical support; 3. very helpful with rich technical information
Speed of acquiring purchased monitor: 1. same day; 2. within 2–7 days; 3. longer than 7 days
Acquisition and return: 1. in store only; 2. mail only; 3. in store OR mail
The estimated choice model had a rho-square of 0.37. The majority of the respondents had more than three years of online experience (93.8 percent of the 146 respondents) and spent less than 20 hours per week online (72.4 percent). One-third (32.4 percent) of the respondents spent less than $200 per year online; another third (37.9 percent) spent between $200 and $1,000 annually online; the rest spent more than $1,000.
Conclusion
Notes
1. This material is drawn from Chapter 9 and Section 8.6.1 of Vithala R. Rao, Applied Conjoint
Analysis, Berlin Heidelberg: Springer Verlag, 2014; used with the permission of Springer.
2. The catering company also sets fixed fees for setting up the catering arrangement and
arranging special banquets, but these were outside the scope of this study.
3. While the authors developed their theory using continuous changes in the attributes, we
use discrete changes for the purpose of exposition.
References
Levy, Michael, John Webster, and Roger Kerin (1983), “Formulating Push Marketing
Strategies: A Method and Application,” Journal of Marketing, 47 (Winter), 25–34.
Ofek, E. and V. Srinivasan (2002), “How Much Does the Market Value an Improvement in
a Product Attribute?” Marketing Science, 21 (4), 398–411.
Rao, Vithala R. (2014), Applied Conjoint Analysis. New York: Springer.
● Phone style: candy bar, slide phone, flip phone, or touch screen (4 levels);
● Brand: Samsung, Google, Nokia, and LG (4 levels);
● Weight: 100 gm, 115 gm, 130 gm, and 145 gm (4 levels);
● Talk time: 5 hours, 7 hours, 9 hours, and 11 hours (4 levels); and
● Camera quality (in megapixels): 8, 12, 16, 20 (4 levels).
These part utilities will then be used to estimate the overall utility of any
product. Normally, the estimated model is “validated” using additional
data collection.
An example of a utility model is:
where DS1, DS2, and DS3 are dummy variables (taking values of 1 or zero) for the phone styles of slide phone, touch screen, and flip phone, respectively, and DB1, DB2, and DB3 are dummy variables (taking values of 1 or zero) for the brands of Samsung, Nokia, and Google, respectively. This estimated utility model has face validity: the touch screen is preferred relative to other phone styles, the Google brand is preferred, lighter phones are preferred, and more hours of talk time and more megapixels are preferred.
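A numerical sketch of such a utility model is shown below. The partworths are invented (the chapter's estimated coefficients are not reproduced), but they are signed to match the face-validity pattern just described; the dictionaries are equivalent to the dummy-variable coding, with candy bar and LG as base levels.

```python
# Illustrative utility calculation with hypothetical partworths.

def utility(style, brand, weight_gm, talk_hours, megapixels):
    # Invented partworths; base levels (candy bar, LG) carry zero utility,
    # mirroring the dummy-variable coding in the text.
    style_pw = {"candy bar": 0.0, "slide": 0.3, "touch": 1.2, "flip": 0.1}
    brand_pw = {"LG": 0.0, "Samsung": 0.4, "Nokia": 0.2, "Google": 0.9}
    return (style_pw[style] + brand_pw[brand]
            - 0.01 * weight_gm        # negative sign: lighter is better
            + 0.10 * talk_hours       # positive: longer talk time preferred
            + 0.05 * megapixels)      # positive: better camera preferred
```

With these signs, a light touch-screen Google phone with long talk time and a good camera scores higher than a heavy flip-phone LG, as the face-validity argument requires.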
Figure 15A.1 shows the steps involved in the conjoint methodology;
this figure shows only two of the many options available for implementing
conjoint analysis.
[Figure 15A.1: the process starts by identifying product attributes and levels; a ratings-based design then generates profiles, collects preference data, and analyzes the data via regression, while a choice-based design generates choice sets, collects choice data, and analyzes the data via logit.]
[Figure: five-step analytics process: (1) defining the managerial problem; (2) organizing, leveraging data; (3) analyzing data; (4) validating insights and discussing strategy; and (5) infrastructure and training to improve decision making.]
[Figure 16.2: purchase-funnel framework. Level 1: marketing activities (Adwords, email, catalog, fax, flyer) generate benefits and costs. Level 2: the online funnel (web visits, leads/info requests, quote requests, orders) and the offline funnel (leads, quote requests, orders), connected by cross-channel effects. Level 3: profits.]
Both online and offline marketing activity may ultimately generate profits
(level 3 in Figure 16.2) by inducing prospective customers to start/finish
their purchase process either online or offline. Customers may search
online when the need arises for office furniture, visit the website to ask for
information, but then call up the salesforce for the final quote and order
(cross-funnel effects). Moreover, a marketing exposure or touch point
may increase conversion down the funnel. For instance, being exposed
to paid-search ads may increase the prospect’s familiarity with the brand,
while a well-designed catalog in the mail can signal the high quality of the
company and its product. Both instances may increase customer conver-
sion in later stages. In our framework and model, we account for both:
marketing activities can affect the beginning but also later stages of the
purchase funnel.
Variable: Operationalization

Marketing activity:
Catalog: daily cost of catalogs (0 on days with no catalogs sent)
Fax: daily cost of faxes (0 on days with no faxes sent)
Flyers: daily cost of flyers (0 on days with no flyers sent)
Adwords: daily costs of pay-per-click referrals
eMail: daily number of net emails (sent minus bounced back)
Discounts: percentage of revenue given as a discount

Online funnel:
Web visits: daily total number of visits to the website
Online leads: daily requests for information received via the website
Online quotes: daily requests for offers received via the website
Online orders: daily number of orders received via the website

Offline funnel:
Offline leads: daily requests for information received via sales reps, telephone or mail
Offline quotes: daily requests for offers received via sales reps, telephone or mail
Offline orders: daily number of orders received via sales reps, telephone or mail

Performance:
Sales revenues: daily sales revenues
(Gross) profit: daily revenues minus cost of goods sold
Figure 16.3 shows estimated impulse response functions, i.e., the profit
effects for €1 spent on the three main marketing activities. Table 16.2
derives from these figures the total (cumulative) profit effect, including the
number of days till the peak effect (wear-in period) and the total number
of days with significant profit effects (wear-out period).
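Given an estimated IRF and flags marking which daily effects are significant, summary statistics of this kind can be computed as sketched below; the boolean flags stand in for the standard-error bands used in the actual analysis, and the numbers are invented.

```python
# Sketch of reading total effect, wear-in, and wear-out off an IRF.

def irf_summary(irf, significant):
    """Summarize an impulse response function.

    irf[d] is the profit effect d days after spending 1 euro;
    significant[d] is True when the day-d effect is significant.
    Returns (total effect, wear-in in days, wear-out in days).
    """
    total = sum(e for e, s in zip(irf, significant) if s)
    wear_in = max(range(len(irf)), key=lambda d: irf[d])  # day of peak effect
    sig_days = [d for d, s in enumerate(significant) if s]
    wear_out = sig_days[-1] if sig_days else 0            # last significant day
    return total, wear_in, wear_out
```

A fax-like pattern, peaking on the day sent with effects gone after a day, yields a wear-in of 0 and a short wear-out.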
Catalogs showed no significant profit effects. While faxes achieved their peak impact on the day sent (wear-in of 0), Adwords took one day and flyers took two days to do so (wear-in of 1 and 2, respectively). Interestingly, the effect of
faxes also wore out quickly, while Adwords and Flyers continued to affect
purchases for at least one week. In response to Inofec’s questions about
these differences, we proposed that these temporal patterns were driven by
the effect of different marketing activities on different stages of the pur-
chase funnel. Based on the restricted impulse response analysis (Pauwels
2004), we estimated the separate effects of each marketing activity on the
online and offline funnel stages, as shown in Figure 16.4.
Faxes hardly “feed the funnel” at all: they are unlikely to get the atten-
tion of prospective customers early on in the purchase funnel. However,
they directly increase online information requests and quotes, and offline
orders. The latter direct path represents 83 percent of faxes’ total profit
impact. Because of this direct effect on later funnel stages, the profit
impact of faxes materializes and dissipates quickly. Higher spending on
Google Adwords both feeds the funnel, in the form of online visits, and
increases online quotes and orders, even keeping online visits constant.
This illustrates the “billboard” or “inferred quality” effects of Google
Adwords: we infer (in the absence of individual-level data, which Google
does not share) that high paid-search rankings increase the likelihood that
a prospective customer, after having checked and dismissed competitive
offerings, progresses towards a purchase. Two-thirds (66 percent) of
Google Adwords’ impact is through the visits-offline orders path, explain-
ing the longer wear-in of the profit effect of Adwords versus faxes. Finally,
flyers feed both the online and the offline funnels and yield profit through
many paths, none of which dominate and all of which yield rather small
profit effects in the end. As a result, flyers take longer to wear-in and have
a smaller total impact on profits than either faxes or Adwords.
Finally, Figure 16.4 shows a clear directionality of cross-channel effects.
Offline marketing may affect online funnel metrics, but not vice versa.
Conceivably, many prospective customers prefer to start the purchase
decision process online, even when they noticed the firm’s offline market-
ing activities. In contrast, online funnel metrics significantly affect offline
funnel metrics, but not vice versa. In other words, some customers move
from online to offline as their decision process moves from information to
evaluation and finally to action. This is consistent with prospects enjoying
the search convenience of the Internet at early stages, and personal contact with salespeople at later stages of the purchase cycle.

[Figure 16.3: estimated impulse response functions over 12 days for each marketing activity. Note: profit effect estimate of 1 euro spent in solid line, standard error bands in dotted lines.]

Time series econometrics to quantify funnel progression

Table 16.2 Marketing's total profit effect, sales elasticity and its timing in days
[Figure 16.4: estimated paths from Adwords, flyers, and faxes through the online funnel (web visits, leads, quotes, orders) and the offline funnel (leads, quotes, orders) to profits, with path coefficients and the share of each activity's total profit impact carried by its dominant path.]
Table 16.3 Daily net profit changes during the experiment versus before the experiment

                Adwords High    Adwords Base
Flyers Base     € 81.39         € 10.84
Flyers Low      € 153.71        € 135.45
could simply cut the least efficient activity of flyers, while maintaining
spending on Adwords. To validate that our estimated effect sizes would
still hold up after such a substantial policy change, we re-estimated our
model on the 91 days of data during the experiment, and indeed found
similar coefficient estimates. The one exception was that each euro spent
on flyers now returned 0.92 euros in the lowest marketing spend condition.
This was consistent with Inofec’s explanation that diminishing returns
were to blame for the original findings and suggested that flyers should
not be cut much more.
This case study changed the organization, as it led Inofec to rethink how
it makes decisions. Since its inception, the company had been managed by
intuition, so it was unlikely to abandon "gut feel" in decision making
altogether. Given the complexity of marketing problems, the literature
suggests that a combination of marketing analytics and managerial
intuition provides the best results for many marketing decisions (Lilien
and Rangaswamy 2008). Accordingly, Inofec now uses both scientific
approaches and intuition in making its decisions. Moreover,
our work became a basis for discussing the operational dimensions of
Inofec’s marketing activities, affecting the mental models of decision
makers throughout the organization (Kayande et al. 2009). We devel-
oped a spreadsheet-driven dashboard tool – including a rolling windows
approach to update the model estimates – that allows easy entry of
potential marketing allocation plans and then uses the model estimates
to project likely profit consequences (Pauwels et al. 2009). Finally, the
ongoing training and increasing clout of a new employee, in charge of
marketing analytics, is expected to help institutionalize the marketing
scientific approach to allocating marketing resources – the final step in
model adoption according to Davenport (2009). As Inofec’s CEO con-
cluded: “We are going to design way more elaborate marketing strategies.
In doing so, we will focus on the linkages between online and offline activi-
ties, explicitly distinguish the effects, and explore new opportunities due to
new technical developments.”
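The rolling-windows logic behind such a dashboard can be sketched as follows. This is a minimal illustration with simulated daily data; the variable names, the two-activity setup, and the 90-day window are assumptions for the example, not Inofec's actual tool.

```python
import numpy as np

def rolling_window_estimates(spend, revenue, window):
    """Re-estimate response coefficients on each rolling window of days."""
    coefs = []
    for start in range(len(revenue) - window + 1):
        X = np.column_stack([np.ones(window), spend[start:start + window]])
        beta, *_ = np.linalg.lstsq(X, revenue[start:start + window], rcond=None)
        coefs.append(beta)
    return np.array(coefs)

def project_daily_profit(plan, beta, margin):
    """Project daily gross profit of a spend plan from the latest estimates."""
    projected_revenue = beta[0] + plan @ beta[1:]
    return margin * projected_revenue - plan.sum()

# Simulated history: 120 days of spend on two activities and daily revenue.
rng = np.random.default_rng(0)
spend = rng.uniform(0, 100, size=(120, 2))
revenue = 500 + spend @ np.array([5.0, 0.5]) + rng.normal(0, 10, size=120)

betas = rolling_window_estimates(spend, revenue, window=90)
latest = betas[-1]                      # estimates from the most recent window
profit = project_daily_profit(np.array([80.0, 10.0]), latest, margin=0.4)
```

Entering a candidate allocation plan and reading off the projected profit is exactly the dashboard interaction described above; updating the window as new days arrive keeps the estimates current.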
Note
1. For each condition, we subtract the gross profits in the three months preceding the
experiment from gross profits in the three months of the experiment, and then scale each
condition’s profit change by the national average profit change (to control for seasonal
and general economy factors that may boost or depress profits in all conditions).
References
Davenport, T. 2009. Make better decisions. Harvard Business Review (November), 117–123.
Dekimpe, M. G. and D. M. Hanssens. 1999. Sustained spending and persistent response:
A new look at long-term marketing profitability. Journal of Marketing Research 36(4),
397–412.
Kayande, U., A. De Bruyn, G. L. Lilien, A. Rangaswamy, and G. H. van Bruggen. 2009.
How incorporating feedback mechanisms in a DSS affects DSS evaluations. Information
Systems Research 20(4), 527–546.
Lilien, G. L. and A. Rangaswamy. 2008. Marketing engineering: Models that connect
with practice. In B. Wierenga, ed. Handbook of Marketing Decision Models. New York:
Springer Science+Business Media, 527–559.
Pauwels, K. H. 2004. How dynamic consumer response, competitor response, company
support and company inertia shape long-term marketing effectiveness. Marketing Science
23(4), 596–610.
Pauwels, K. H., T. Ambler, B. H. Clark, P. LaPointe, D. Reibstein, B. Skiera, B. Wierenga,
and T. Wiesel. 2009. Dashboards as a service: Why, what, how, and what research is
needed? Journal of Service Research 12(2), 175–189.
Wiesel, T., K. Pauwels, and J. Arts. 2011. Practice prize paper: Marketing's profit impact:
Quantifying online and offline funnel progression. Marketing Science 30(4), 604–611.
The Data
The dataset comes from Mizik and Jacobson (2004) and covers a 24-month
period for a well-established and widely prescribed drug in the primary
care category. It contains information on the number of new prescrip-
tions for the studied drug and its competitors issued by 55,896 US-based
physicians and detailing and sampling activity by the focal pharmaceutical
firm during each month. The dataset also contains information about the
physician’s specialty area.
To illustrate the use of panel data methods, we present three sets of analy-
ses. The first set contains models making use of five different panel data
estimators for contemporaneous effects of detailing and sampling on pre-
scriptions. The second set of models does not limit the effect of detailing
and sampling to be strictly contemporaneous. Rather, these models allow
for the fact that the effects of PSR activity are unlikely to be limited to the
month when the visit occurred but may exhibit delayed and/or carryover
effects into subsequent months. While allowing for dynamic effects, the
models in the second set ignore potential physician-specific heterogeneity
in the level of prescribing (i.e., they do not explicitly model physician-
specific effects). The third set contains models that allow for both dynamic
effects and physician-specific effects and presents the final complete model
we recommend for these data.
Table 17.1 provides the results from five different panel data estima-
tors that link monthly prescriptions of the drug to PSR activity taking
place during that month. Model 1 is the "population average" estimator,
which pools all observations in a single least-squares analysis. Model 2 is the
“between” estimator that makes use of the mean values for each physician.
Unlike the population average estimator, which makes use of both time-
series and cross-sectional variation to estimate the model, the between
model makes use of only cross-sectional variation. As such, the between
estimator is analogous to cross-sectional regressions.
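The contrast between these estimators can be sketched on simulated data. The data-generating process below (a physician effect correlated with detailing) is hypothetical and is chosen only to show how the pooled, between, and within estimators diverge.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 500, 24                        # physicians, months
u = rng.normal(0, 1, N)               # physician-specific effect
# Detailing is correlated with the physician effect (heavy prescribers get visits).
x = rng.normal(0, 1, (N, T)) + 0.8 * u[:, None]
y = 2.0 + 0.5 * x + u[:, None] + rng.normal(0, 1, (N, T))

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x (with intercept)."""
    x, y = np.ravel(x), np.ravel(y)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

pooled = ols_slope(x, y)                              # "population average" (Model 1)
between = ols_slope(x.mean(axis=1), y.mean(axis=1))   # between estimator (Model 2)
within = ols_slope(x - x.mean(axis=1, keepdims=True), # fixed-effects (within)
                   y - y.mean(axis=1, keepdims=True))
```

Because x is correlated with the unit effect u, the pooled and between slopes are biased upward, while the within estimator recovers the true value of 0.5.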
Models 3, 4, and 5 incorporate the heterogeneity in physician prescrib-
ing. Model 3 is “the random effects” estimator. It allows for physician-
specific effects (ui) but posits any such effects to be uncorrelated with
the regressors in the model. Such an assumption, however, might not be
warranted in these data.
Notes:
‡
Model specifications are provided below. Results are presented as estimate (standard
error). Time, specialty, and specialty-specific trend estimates are not reported for brevity.
The number of observations differs across the models due to the averaging, taking of first-
or mean-differences, and removing outliers.
** p-value < 0.01.
Models legend:
Model 1: Prescribe_it = α + β0·Details_it + γ0·Samples_it + Σ_{t=1..T} δt·Time_t
  + Σ_{s=1..11} κs·Specialty_s + Σ_{s=1..11} ωs·Specialty_s·Trend_t + η_it
Model 2 (between): physician means of the Model 1 variables, with
  Σ_{s=1..11} κs·Specialty_s and error η_i
Model 3 (random effects): as Model 1, with composite error (u_i + η_it)
Models 4 and 5 (within and first-difference): as Model 1, with the
  physician-specific effect removed by mean- and first-differencing, respectively
The fact that the Hausman test rejects the null hypothesis does not neces-
sarily mean that a fixed effect correlated with the regressors is present.
Rather, other types of mis-specification may be inducing the significant
differences in the model estimates. For example, some time-varying vari-
ables may have been omitted from the model and their exclusion can be
causing bias.
Indeed, it can be expected that marketing activities have effect not just
in the contemporaneous month but rather may exhibit delayed and/or car-
ryover effects. Further, physician prescribing behavior may exhibit habit
persistence that would induce current prescribing behavior to be related
to past prescribing behavior. To assess the presence of these factors (as an
alternative to physician-specific effects), Table 17.2 provides the results
from three models that include not just contemporaneous marketing
effects but also allow for carryover effects and persistence in physician
prescribing behavior. Because these models allow for an influence of
lagged prescriptions on current period prescriptions, the assumptions of
the random effects models are violated and random effects estimation is
not appropriate for these models. Therefore, these models are estimated
through ordinary least squares.
Model 6 augments the current effects specification with one-month
lagged prescriptions to capture habit persistence (i.e., a state-dependency).
Model 6 can also depict carryover effects to the extent that it reflects a
geometric decay in marketing effects (i.e., a Koyck distributed lag model).
Model 7 adds 12 lags of both detailing and samples, in addition to the
contemporaneous effect, so as to explicitly model carryover effects. Model
8 also has 12 lags of detailing and sampling but additionally includes 12
lags of past prescriptions.
Notes:
‡
Model specifications are provided below. Results are presented as estimate (standard
error). Time, specialty, and specialty-specific trend estimates are not reported for brevity.
The number of observations differs across the models due to the inclusion of lagged terms
and removing outliers.
** p-value < 0.01, * p value < 0.05.
Models legend:
Model 6: Prescribe_it = α0 + β0·Details_it + γ0·Samples_it + φ1·Prescribe_it−1
  + Σ_{t=1..T} δt·Time_t + Σ_{s=1..11} κs·Specialty_s
  + Σ_{s=1..11} ωs·Specialty_s·Trend_t + η_it
Model 7: Prescribe_it = α0 + Σ_{j=0..12} βj·Details_it−j + Σ_{j=0..12} γj·Samples_it−j
  + φ1·Prescribe_it−1 + Σ_{t=1..T} δt·Time_t + Σ_{s=1..11} κs·Specialty_s
  + Σ_{s=1..11} ωs·Specialty_s·Trend_t + η_it
Model 8: Prescribe_it = α0 + Σ_{j=0..12} βj·Details_it−j + Σ_{j=0..12} γj·Samples_it−j
  + Σ_{j=1..12} φj·Prescribe_it−j + Σ_{t=1..T} δt·Time_t + Σ_{s=1..11} κs·Specialty_s
  + Σ_{s=1..11} ωs·Specialty_s·Trend_t + η_it
Model 6 shows implied total effects of detailing (.632) and sampling
(.116) that are very similar to those of the population average current-
effects model (Model 1), but it does not attribute the effect solely to
the month of the PSR activity.1 Rather, the model depicts smaller current-
term effects of .165 for detailing and .130 for sampling that persist at a
monthly rate of .739 (i.e., dissipate at a monthly rate of .261). One of the advantages
of Model 6 is that imposing a geometric decay structure/specification for
habit persistence allows for a parsimonious model of possible carryover
effects. However, the parsimony is not required with a sufficient number
of observations and it may, in fact, come at the cost of accuracy (as the
imposed structure may not be accurately reflecting the data).
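Under the Koyck structure, the implied total effect follows directly from the geometric-decay formula in note 1: the current-term effect divided by one minus the persistence rate. For detailing, .165/(1 − .739) ≈ .632:

```python
def implied_total_effect(current_effect, persistence):
    """Long-run (total) effect under geometric decay: beta0 / (1 - phi)."""
    return current_effect / (1.0 - persistence)

detailing_total = implied_total_effect(0.165, 0.739)
print(round(detailing_total, 3))  # 0.632
```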
Model 7 allows for direct modeling of the delayed and carryover effects
by adding 12 monthly lags of detailing and sampling. The results from
Model 7 show the constraints imposed by Model 6 are not reflective of
the data. The pattern of the estimates for the lagged effects of detailing
and sampling shows that the assumption of geometric decay implicit in
a Koyck specification does not hold: Marketing effects are not decaying
geometrically from the current period. In fact, it appears that the effects
of sampling do not dissipate at all and remain relatively constant over the
12 months. The total implied effect of detailing (.464) decreases notably
compared to the Model 6 specification, while the total implied effect of
sampling (.235) more than doubles.
Model 8 allows for direct modeling of the delayed and carryover effects
while also capturing higher-order persistence through its 12 lags of past
prescriptions.
The models presented in Table 17.3 allow for the presence of fixed
physician-specific effects correlated with the regressors. Model 9 aug-
ments the Model 6 state-dependency specification with the inclusion of
a fixed effect. The within (mean-difference) estimator used to estimate
fixed-effects Model 4 is no longer appropriate as this estimator generates
downward-biased estimates for the lagged dependent variable (Nickell
1981). However, the first-difference estimator provides an approach both
for controlling for fixed effects and for obtaining consistent estimates
for the lagged dependent variable. Taking first-differences of the data
removes the fixed effects from the estimating equation. But it also induces
correlation between the lagged dependent variable and the error term:
DPrescribeit-1 will be correlated with the differenced error term (hit – hit-1)
by construction. As such, least squares estimation of a first-difference
model with a lagged dependent variable would generate biased estimates.
An instrumental variable approach can be used to generate consistent
estimates. Following Anderson and Hsiao (1982), we use lagged values of
the levels of the series (values at time period t–2 and earlier) to generate
instrumental variable estimates for DPrescribeit-1. This procedure gener-
ates consistent (i.e., asymptotically unbiased) estimates of the parameters
and their standard errors.
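The Anderson–Hsiao procedure can be sketched on simulated data as follows. The data-generating process and sample sizes are hypothetical; the just-identified IV step uses the level y_{t−2} as the instrument for the differenced lagged dependent variable, as described above.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, phi, beta = 2000, 8, 0.3, 0.5
u = rng.normal(0, 1, N)                          # physician fixed effect
x = rng.normal(0, 1, (N, T)) + 0.5 * u[:, None]  # detailing, correlated with u
y = np.zeros((N, T))
y[:, 0] = u + rng.normal(0, 1, N)
for t in range(1, T):
    y[:, t] = phi * y[:, t - 1] + beta * x[:, t] + u + rng.normal(0, 1, N)

# First-differencing removes the fixed effect, but least squares on the
# differenced data is inconsistent: dy_lag contains y_{t-1}, which is
# correlated with the differenced error by construction.
dy = (y[:, 3:] - y[:, 2:-1]).ravel()        # Δy_t
dy_lag = (y[:, 2:-1] - y[:, 1:-2]).ravel()  # Δy_{t-1}
dx = (x[:, 3:] - x[:, 2:-1]).ravel()        # Δx_t
z = y[:, 1:-2].ravel()                      # instrument: level y_{t-2}

# Just-identified IV (2SLS): estimate = (Z'W)^{-1} Z'y.
Z = np.column_stack([np.ones_like(z), z, dx])
W = np.column_stack([np.ones_like(z), dy_lag, dx])
est = np.linalg.solve(Z.T @ W, Z.T @ dy)
phi_hat, beta_hat = est[1], est[2]          # consistent for phi = 0.3, beta = 0.5
```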
Notes:
‡
Model specifications are provided below. Models are estimated in first-differences. Results
are presented as estimate (standard error). Time and specialty effects estimates are not
reported for brevity. The number of observations differs across the models due to the
taking of first differences, the inclusion of lagged terms, and removing outliers.
** p-value < 0.01, * p value < 0.05.
§
Instrumental variable estimate utilized.
Models legend:
Model 9: Prescribe_it = α_i + β0·Details_it + γ0·Samples_it + φ1·Prescribe_it−1
  + Σ_{t=1..T} δt·Time_t + Σ_{s=1..11} κs·Specialty_s
  + Σ_{s=1..11} ωs·Specialty_s·Trend_t + η_it
Model 10: Prescribe_it = α_i + Σ_{j=0..12} βj·Details_it−j + Σ_{j=0..12} γj·Samples_it−j
  + Σ_{j=1..12} φj·Prescribe_it−j + Σ_{t=1..T} δt·Time_t + Σ_{s=1..11} κs·Specialty_s
  + Σ_{s=1..11} ωs·Specialty_s·Trend_t + η_it
Model 11: Model 10 plus Σ_{j=0..6} λj·Competitor_it−j
Column 1 of Table 17.3 reports the estimation results for Model 9. The
estimated coefficients are markedly different from those of Model 6
(i.e., a habit persistence model without fixed effects). The current period
effects of detailing (.042) and sampling (.005) are significantly lower, as
is the coefficient for lagged prescriptions (.023), than those in Model 6.
That is, Model 6 is mistaking the unmodeled fixed effect for persistence.
Since under the null hypothesis of no fixed effects both models would
yield consistent coefficient estimates, the marked difference between them
means the null can be rejected. Since the estimated effect
of prescriptions lagged one month is very small (.023), the implied totals
from Model 9 are virtually indistinguishable from the contemporaneous
fixed effect Model 5.
Model 10 augments Model 9 by including additional lagged terms
of detailing, sampling, and past prescriptions in the specification.
Unlike in the results of Model 8, with the physician-specific fixed
effect included, the effects of detailing, sampling, and lagged prescriptions
dissipate and all but vanish for lags greater than six months. The
difference in estimated coefficients between Model 10 and Model 9
highlights the importance of the inclusion of additional lagged values
of the series. The estimated implied total effects of detailing (.211) and
sampling (.024) are approximately five times larger than those reported
in Model 9.
The effects of lagged detailing, sampling, and prescriptions shown in
Model 10 indicate one reason (i.e., omitted variable bias) that can account
for the difference in coefficient estimates between the mean-difference
(Model 4) and first-difference (Model 5) estimators in Table 17.1. An
additional potential consideration is the role of measurement error.
Conclusion
Panel data studies are increasingly used as researchers have come to
appreciate the additional insights they offer relative to cross-sectional
studies and the estimation precision they achieve relative to pure time
series. Effective panel data analysis involves understanding heterogeneity
both across units and across time-series dynamics. Carefully comparing
estimation results across various estimators and models provides research-
ers a mechanism to better model effects and understand the nature of
underlying relationships in the data.
Notes
1. The total effects of detailing and sampling can be calculated as
   Σ_{j=0..J} βj / [1 − Σ_{l=1..L} φl] and Σ_{k=0..K} γk / [1 − Σ_{l=1..L} φl],
   respectively.
2. Conversely, the effects of measurement error may also be reduced in the fixed effects
   panel data estimator to the extent that the measurement error is autocorrelated.
3. Just as substitution effects cause competitor prescriptions to influence own prescriptions,
own prescriptions will influence the amount of competitor prescriptions. To account for
this simultaneity, we make use of instrumental variable estimation by using lagged values
of the levels of the series (values at time period t – 2 and earlier) to generate
instrumental variable estimates for ∆Competitor_it.
References
Anderson, T.W. and Cheng Hsiao (1982), “Formulation and Estimation of Dynamic Models
Using Panel Data,” Journal of Econometrics, 18, 47–82.
Griliches, Zvi and Jerry A. Hausman (1986), “Errors in Variables in Panel Data,” Journal of
Econometrics, 31, 93–118.
conditions to all their dealers in a local market (i.e., they cannot alternate
sales promotions among retailers in a local market).
There are two main findings. First, in durable goods markets, consum-
ers are heterogeneous with respect to transaction types as well as brand
preferences. Second, consumers are heterogeneous in their relative
sensitivity to the different pricing instruments, not just in their overall
price sensitivity. Thus, some consumers are more responsive to a cash discount,
others to a reduced interest rate, etc. Hence, price discounts of the same
magnitude may lead to different effects, depending on what instruments
are used and the idiosyncratic price sensitivities of the target consumers.
A menu of pricing options tends to be most profitable, given the
constraint of blanket pricing. The best combination of pricing instruments
and their respective levels depends on the consumers’ transaction type
preferences and price sensitivities in the target market. Hence, a profit
maximizing manufacturer needs to find the “optimal” structure for its
pricing program, not just an overall “optimal” price level.
[Figure 18.1: nested structure of transaction types and finance terms:
cash, lease, and dealer-financed transactions; stand-alone rebate,
stand-alone APR, and rebate/APR combo programs; promotional versus
market-rate APRs.]
420 Handbook of marketing analytics
P_tm^h(i) = exp(V_tm,i^h) / Σ_k exp(V_tm,k^h)                    (18.4)
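Equation (18.4) is a standard multinomial logit probability and can be computed with a numerically stable softmax; the utilities below are illustrative only.

```python
import numpy as np

def choice_probabilities(v):
    """Multinomial logit, eq. (18.4): P(i) = exp(v_i) / sum_k exp(v_k)."""
    v = np.asarray(v, dtype=float)
    e = np.exp(v - v.max())        # subtract the max for numerical stability
    return e / e.sum()

p = choice_probabilities([1.0, 0.0, -1.0])   # probabilities sum to 1
```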
Finally, in the third stage the national mean is assumed to come from a
distribution defined by the hyperpriors.9
Empirical Illustration
We illustrate the modeling approach with an empirical application to
entry-level SUVs in the Western region10 (Arizona, California, Hawaii,
Idaho, Nevada, Oregon, Washington).
The PIN database has data from 22 DMAs in the Western region. Note
that this empirical application does not correspond to any actual client
implementation. Confidentiality prevents us from publishing details of
actually implemented models. However, this illustration is realistic in that
it follows the current model methodology used in the implemented models.
Data Description
The main data source is new car sales transactions collected by the Power
Information Network, a division of J.D. Power and Associates. PIN
Transaction Types
Plots of parameter estimates are presented in Figure 18.2. The posterior
means of the parameters have the expected signs, and only rarely does a
95 percent interval include a sign change. The plots also
reveal substantial differences in response parameters across local markets.
[Figure 18.2: posterior estimates of the rebate coefficient and the log of
monthly payment (lease) coefficient across the Western-region DMAs.]

Simulations
(in this case $3,600) is the result of the weighted average of the cost of all
transaction types (see Figure 18.1).
We built a market simulator based on the sample of consumers used for
calibration. We updated the environment (i.e., the pricing and incentive
programs for all products and markets) to reflect market conditions in
May 2016. Then, we created a set of scenarios in which Model X would
change the incentive offerings.13 Drawing from the posterior distributions
of the response parameters, we obtained distributions for the expected
share and program cost (price discount) for Model X. We used the means
of the resulting distributions (share and cost) to evaluate programs. For
example, increasing customer cash to $3,500, while lowering the APRs to
0.9 percent (36 months), 1.9 percent (48 months), 2.9 percent (60 months);
adding a combination of $2,500 customer cash and a 1.9 percent (36
months), 2.9 percent (60 months) APR program; lowering lease cash to
$1,250; and discontinuing $700 in dealer cash, would result in an increase
of sales of 2.9 percent with a reduction in unit cost of $278. We also found
programs that would increase sales by 6 percent for the same cost, or that
would keep sales at the same volume with savings greater than $300 per
unit.
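Scenario evaluation of this kind can be sketched as follows: draw response parameters from their posterior, compute the focal product's logit share under the modified incentive, and average over draws. The utilities, the number of draws, and the single rebate parameter below are illustrative assumptions, not the implemented model.

```python
import numpy as np

rng = np.random.default_rng(3)

def expected_share(base_utilities, rebate, beta_draws):
    """Average the focal product's logit share over posterior draws of the
    rebate response parameter (rebate expressed in $1,000s)."""
    shares = []
    for b in beta_draws:
        v = base_utilities.copy()
        v[0] += b * rebate / 1000.0          # focal product gets the rebate
        p = np.exp(v - v.max())
        shares.append(p[0] / p.sum())
    return float(np.mean(shares))

base = np.array([0.0, 0.3, 0.1])               # focal product and two competitors
beta_draws = rng.normal(0.4, 0.05, size=1000)  # posterior draws (illustrative)

share_no_rebate = expected_share(base, 0.0, beta_draws)
share_rebate = expected_share(base, 2500.0, beta_draws)  # $2,500 customer cash
```

Pairing each share draw with the implied program cost yields the share and cost distributions used to compare scenarios.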
[Figure: program cost per unit (from $0 to $5,500) plotted against January
2016 retail sales (6,500 to 11,500 units), comparing the program at the
beginning of January 2016 with the proposed program.]
“captive” at the market standard rate, as well as consumers who take the
promotional APR or lease program, qualify for the $1,000 captive cash.
The lower financing interest rates in the proposed program result in a
greater discount (net present value) of about $2,400. Thus, after account-
ing for the elimination of the captive cash, APR transactions enjoyed
a net enhancement of $1,400, while the promotional money for rebate
and lease transactions got reduced by $1,000 (through the elimination of
captive cash). In sum, the efficiency gains hinge on reducing promotional
Concluding Remarks
only on choice effects and does not capture the peaks and troughs driven
by consumers accelerating or postponing their decisions (not necessarily
affecting choice). Even though incremental sales are driven by choice
effects, it is also relevant for proper planning to capture the up and down
waves driven by consumers' timing decisions.16
Notes
1. We consider that referring to price promotions as a “cost” is a misnomer. In fact, price
promotions are a tool to customize pricing and increase revenues through price discrim-
ination among consumers with different degrees of price sensitivity (cf. Varian 1980).
We use the term “cost,” in this chapter, to be consistent with the usage and accounting
practices in the automobile industry.
2. There are a few cases in which dual dealerships are allowed, e.g., for low-share makes.
Note also that some automakers allow dealers to carry more than one of the auto-
maker’s nameplates (e.g., Chrysler and Jeep).
3. Note that this specification does not assume that consumers in a local market are
homogenous. We capture observed heterogeneity, first, through information of the car
traded-in and consumer demographics. Second, we capture within-DMA unobserved
heterogeneity through the posterior distribution of the DMA response parameters
(analogous to estimating DMA-level random coefficients).
4. Weights are used to project the PIN data sample to the volumes and shares of each
DMA, then to project the respective DMAs to the corresponding region shares and
volume and to project regions to the US market, using a procedure similar to the one
described by Maddala (1993) for choice-based samples.
5. Other examples of the use of nested logit and related models are Ainslie et al. (2005);
   Cui and Curry (2005); Nair et al. (2005); Sriram et al. (2006) and Yang et al. (2006).
6. Note that, as illustrated in Figure 18.1, we tested a four-level nested logit (product,
acquisition type, program type, term) and a three-level nested logit (product, acquisi-
tion/program type, term). However, in the empirical analysis, dissimilarity coefficients
(i.e., inclusive value parameters) for financing terms and transaction types were not
significantly different from 1, and the model reduced to the two-level nested logit illus-
trated here. Dasgupta et al. (2007) found a similar result. However, in other applica-
tions, e.g., at the national level with a larger number of local markets, we have found
3- and (in a few cases) 4-level structures.
7. For simplicity, we omitted the error terms. The multinomial nested logit assumes a
   generalized extreme-value distribution for the error structure (McFadden 1978; Maddala
   1993, 70), i.e., that the error terms in each nest are correlated (Train 2003, 83).
8. The dissimilarity parameter is the coefficient of the inclusive value: ln(Σ_tr exp(U_tm,itr^h)).
The inclusive value represents the overall attractiveness of the corresponding lower
nest, expressed as the natural log of the denominator of the corresponding multinomial
logit in equation (18.2). McFadden (1978) showed that the dissimilarity coefficient is
approximately equal to 1 minus the pairwise correlation between the error terms of the
alternatives in that node, which in this case are the transaction-type utilities in equa-
tion (18.3). Hence, the value of the dissimilarity coefficient should be in the [0,1] range.
Values outside the [0,1] range are indicative of model misspecification. A value of
ν_m = 1 indicates complete independence, and the nested logit reduces to the standard
multinomial logit (Train 2003).
9. Given this hierarchical set up, the posterior distributions for all unknown parameters
can be obtained using either Gibbs or Metropolis-Hastings steps. r, h, R and C are
set to be the number of parameters plus one, 0 (null matrix), I (Identity Matrix),
and I*1000, respectively, which represents a fairly diffuse prior yet proper posterior
distribution.
10. This “Western” region is for illustrative purposes only and does not correspond to any
actual specific automaker region definition.
11. A major source of fleet sales is vehicles sold to rental car companies, which are often
affiliated with or owned by a car manufacturer. Hence, fleet sales are frequently
“managed” by automakers to partially offset supply-demand gaps. Using total sales,
including fleet sales, as was done by Berry, Levinsohn and Pakes (1995, 2004) and
Sudhir (2001) would bias the response parameter estimates.
12. Because the cost (effective average price discount) of an incentive program depends on
the proportion of consumers who will choose each component of the program (e.g.,
cash rebates, reduced interest rate, lease), the effective cost is not known a priori. We
need to estimate the impact on sales (or share) and the cost for each program using the
model.
13. These scenarios were created by modifying the levels of the components of the incen-
tives offered by Model X and searching for better programs in a trial and error mode.
For simplicity, we kept the pricing and incentives offered by competitors fixed at the
May 2016 levels. However, competitive programs could be modified simultaneously
with the target product (in this case, Model X).
14. Note that of the 49 percent of consumers who prefer to take the rebate of $2,500 at the
beginning of January 2016, 33 percent also finance through the captive and qualify for
the additional $1,000 captive cash. The remaining 16 percent, either pay out of their
pockets or finance through other financing institutions (e.g., a credit union).
15. This result is consistent with the finding of Bruce et al. (2006) about rebates being used
to enhance the “ability to pay,” particularly for consumers who have “negative equity”
in the car they are trading in.
16. Additionally, making predictions for peaks and troughs explicit and linked to purchase
acceleration would help prevent a misleading read of outcomes (e.g., if a purchase accel-
eration peak is interpreted as a higher incremental volume than true).
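The inclusive value and dissimilarity coefficient described in notes 7 and 8 can be sketched as follows; the utilities and the dissimilarity value of 0.6 are illustrative assumptions.

```python
import numpy as np

def inclusive_value(utilities):
    """Inclusive value of a nest: ln(sum of exp utilities), computed stably."""
    u = np.asarray(utilities, dtype=float)
    m = u.max()
    return m + np.log(np.exp(u - m).sum())

def nest_shares(inclusive_values, dissimilarity):
    """Upper-level nest choice: P(nest) proportional to exp(nu * IV)."""
    s = np.asarray(inclusive_values) * dissimilarity
    e = np.exp(s - s.max())
    return e / e.sum()

iv_cash = inclusive_value([1.0, 0.5])        # e.g., stand-alone rebate, combo
iv_lease = inclusive_value([0.2, 0.1, 0.0])  # e.g., three lease terms
p = nest_shares([iv_cash, iv_lease], dissimilarity=0.6)
```

With the dissimilarity coefficient equal to 1, this structure collapses to the standard multinomial logit, matching the reduction described in note 8.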
References
Ainslie, Andrew, Xavier Drèze and Fred Zufryden (2005), “Modeling Movie Life Cycles and
Market Share,” Marketing Science, 24 (3), 508–517.
Berry, Steven T., J. Levinsohn and A. Pakes (1995), "Automobile Prices in Market
Equilibrium," Econometrica, 63 (4), 841–890.
Berry, Steven T., J. Levinsohn and A. Pakes (2004), "Differentiated Products Demand
Systems from a Combination of Micro and Macro Data: The New Car Market,"
Journal of Political Economy, 112 (1), 68–105.
Bruce, Norris, Preyas Desai and Richard Staelin (2006), “Enabling the Willing: Consumer
Rebates for Durable Goods,” Marketing Science, 25 (4), 350–366.
Bucklin, Randolph E. and James M. Lattin (1991), “A two-state model of purchase incidence
and brand choice,” Marketing Science, 10 (Winter), 24–39.
Busse, Meghan, Jorge Silva-Risso and Florian Zettelmeyer (2006), "$1,000 Cash Back:
The Pass-Through of Auto Manufacturer Promotions," American Economic Review, 96 (4),
1253–1270.
Cui, Dapeng and David Curry (2005), “Prediction in Marketing Using the Support Vector
Machine,” Marketing Science, 24 (4), 595–615.
Dasgupta, Srabana, S. Siddarth and Jorge Silva-Risso (2007), “Lease or Buy? A Structural
Model of a Consumer’s Vehicle and Contract Choice Decisions,” Journal of Marketing
Research, (August), 490–502.
Maddala, G. S. (1983), Limited-Dependent and Qualitative Variables in Econometrics, New
York: Cambridge University Press.
markets. For instance, in 2012 consumers could choose among 920 digital
cameras, 1,196 washing machines or 1,514 vacuum cleaners (Ringel and
Skiera 2016). Yet, mapping techniques provided by marketing scholars
at that point in time uncovered and visualized competitive relations only
among a limited number of products (e.g., 7 detergents, 62 digital cameras
or 169 cars, see Ringel and Skiera 2016).
While it is relatively easy to visualize competitive market structure for
small markets by mapping bubbles onto a two-dimensional space, where
each bubble represents a single product, the graphical representation of
larger markets quickly takes the form of a dense lump of bubbles, making
the resulting map difficult to decipher (Netzer, Feldman, Goldenberg and
Fresko 2012). Such lumping among hundreds of products is especially
severe when the visual representation is generated using multidimen-
sional scaling techniques (MDS) that have become popular in marketing
research over the past decades. Moreover, a circular bending effect, which
refers to objects being mapped in a circular shape or “horseshoe,” is
common to MDS solutions and can lead to an inaccurate interpretation of
competitive relationships, since products that have weak or non-existent
competitive relationships with one another may appear closer together
than they should (Kendall et al. 1970; Clark, Carroll, Yang and Janal
1986; Diaconis, Goel and Holmes 2008).
The main reason such horseshoes appear when mapping large markets
using MDS is that large markets typically consist of several submarkets
with the products of one submarket having no or only very weak relations
to products of other submarkets. For instance, assume that display size
is a submarket-defining criterion for TV sets. Someone wanting to buy a
TV for a large space in the living room will probably choose only among
very large TVs (e.g., 60-inch display) and not consider smaller TVs (e.g.,
all TVs smaller than 55 inches). Consequently, most competitive relation-
ships among products in very large markets are either very weak, or most
often, even zero, leading to what we refer to as a very sparse dataset. When
MDS then attempts to position the products of a sparse data set in a map in
such a way that all these zero or nearly zero relationships are reflected in
similar distances of the corresponding products to one another, it arranges
them in a circular, horseshoe shape.
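The horseshoe effect can be reproduced in a few lines with classical (Torgerson) MDS: embed a chain of products whose dissimilarities are capped, so that all distant pairs look equally unrelated, as in a sparse market. The setup below is a synthetic illustration, not the chapter's data.

```python
import numpy as np

def classical_mds(d, dim=2):
    """Classical (Torgerson) MDS of a dissimilarity matrix d."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    b = -0.5 * j @ (d ** 2) @ j                 # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:dim]          # keep the largest eigenvalues
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

# 40 products along one latent attribute (e.g., display size). Capping the
# dissimilarities makes all distant pairs equally unrelated: a sparse market.
latent = np.arange(40, dtype=float)
d_full = np.abs(latent[:, None] - latent[None, :])
d_capped = np.minimum(d_full, 8.0)

xy_line = classical_mds(d_full)          # true chain: embeds on a straight line
xy_horseshoe = classical_mds(d_capped)   # capped chain: bends into a horseshoe

bend = xy_horseshoe[:, 1].std() / xy_horseshoe[:, 0].std()
```

With exact chain distances the second embedding dimension is essentially zero; with capped distances it carries substantial variance, which is the circular bending described above.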
To solve the above problems, analysts can confine their analysis to
individual submarkets. However, unlike in the above example of small and
large TV sets, it is not always clear what the true submarket-separating
criteria are. Therefore, an analyst can easily make a mistake when defining
individual submarkets up front, leading to an incomplete and perhaps
even incorrect competitive market structure map. And finally, when
only individual submarkets are analyzed, no insight is created as to how
these individual submarkets relate to one another and where exactly they are
separated.
Another important aspect of competitive analysis is competitive asym-
metry. It exists when the degree of competition between two firms is
not equal, such as when Firm A competes more intensely with Firm B
than Firm B competes with Firm A (DeSarbo and Grewal 2007). For
example, Apple is the largest and best-known manufacturer of MP3 players
(i.e., iPods), whereas iRiver supplies only a few models and is less well known.
From iRiver’s perspective, the competition with Apple is quite intense.
From Apple’s point of view, however, iRiver is hardly a competitor worth
noting. A complete visualization of competitive market structure must
therefore also include competitive asymmetries.
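One simple way to quantify such asymmetry is from consumers' consideration sets: the share of one brand's considerers who also consider the other brand is inherently directional. The sketch below uses hypothetical data and plain Python; the measure and numbers are illustrative, not the DRMABS procedure itself.

```python
from collections import defaultdict

# Hypothetical consideration sets: each entry is one consumer's considered brands.
consideration_sets = [
    {"Apple", "iRiver"},
    {"Apple"},
    {"Apple", "Sony"},
    {"Apple"},
    {"iRiver", "Apple"},
]

# Directional competition: comp[(a, b)] = share of a's considerers who also consider b.
counts = defaultdict(int)
pair_counts = defaultdict(int)
for s in consideration_sets:
    for a in s:
        counts[a] += 1
        for b in s:
            if a != b:
                pair_counts[(a, b)] += 1

comp = {(a, b): pair_counts[(a, b)] / counts[a] for (a, b) in pair_counts}

# Asymmetry: Apple matters more to iRiver than iRiver matters to Apple.
print(round(comp[("iRiver", "Apple")], 2))  # 1.0 -> every iRiver considerer also considers Apple
print(round(comp[("Apple", "iRiver")], 2))  # 0.4 -> only 2 of 5 Apple considerers consider iRiver
```

The two directional values differ sharply, mirroring the Apple/iRiver example in the text.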
Data Collection
Map Generation
Map Exploration
Figure 19.2 Four solutions to represent the development of asymmetric competitive market structure map construction with DRMABS
Figure 19.3 Visualization of asymmetric competitive market structure map of 1,124 LED-TVs
Legend
Bubbles represent individual products (SKUs)
Bubble color indicates brand
Bubble size indicates display size
Top 10 brands by market share (GfK):
Samsung Philips LG Panasonic Sony Toshiba Sharp Grundig Loewe Telefunken
Figure 19.4 Using brand and display size to understand competitive market structure of 1,124 LED-TVs
442 Handbook of marketing analytics
Model Comparison
Figure 19.5 3D TVs in the competitive market structure map of 1,124 LED-TVs
Figure 19.6 Comparison of different models to display the competitive structure of the LED-TV market (panels: Kamada-Kawai, Ordinal MDS, Cluster-centric Kamada-Kawai, Fruchterman-Reingold, VOS, DRMABS; bubble size indicates global competitive asymmetry (consideration frequency); bubble color indicates cluster membership; mean top-10 hit-rate in %)
Visualizing asymmetric competitive market structure in large markets 445
DRMABS, but the hit-rate of VOS is less than half as high (19 percent).
The cluster-centric Kamada-Kawai solution, which does not optimize
submarket rotation and dilation, suffers from heavy overlapping of
submarkets. Clearly, DRMABS outperforms all other models in terms
of hit-rate (41 percent), shows clear submarket separation and does not
exhibit circular bending or lumping of dominant products.
Conclusion
Note
1. This article is based upon Ringel DM, Skiera B (2016) Visualizing asymmetric com-
petition among more than 1,000 products using big search data. Marketing Science
References
Digital display advertising has established itself as the primary outlet for
advertising dollars spent by marketers online and reached $27 billion in
2015 (eMarketer, 2015). The key to display advertising is user informa-
tion, which feeds into an ad-targeting engine to improve responses to
advertising (e.g., click-through rate or other forms of interaction). One
of the main constituents of user data is web browsing information. As
consumers navigate through the web, advertising networks, such as
Advertising.com or ValueClick.com, can track their online activities
across multiple sites participating in their network, building behavioral
profiles of each individual. One popular way of describing consumers’
interests and preferences revealed through online activities is to represent
individual profiles as a vector of count data that captures number of visits
to corresponding types of websites. For example, a profile dimension on
“Interests in Sports” will be high for a person who is frequent to ESPN.
com; in turn, regular visits to Netflix.com would serve as a proxy for
“Interest in Entertainment.” The observed online activities of an indi-
vidual consumer are thus a collection of visitations to many websites of
different categories, which reflects a combination of her various interests
and behavioral patterns.
This approach to constructing behavioral profiles, while straightfor-
ward, faces some important challenges. First, individual consumer-level
records are massive and call for scalable, high-performance processing
algorithms; second, advertising networks can only observe a consumer’s
browsing activities on the sites participating in the network, potentially
missing site categories not adequately covered. The latter, in particular,
results in a biased view of the consumer’s profile that could lead to subop-
timal advertising targeting.
We present a method that aims to address these challenges. Extending
the Correlated Topic Model (Blei and Lafferty 2007), we develop a
modeling approach that augments individual-level ad network data with
anonymized third-party data that significantly improves profile recovery
performance and helps to correct for potential biases. The approach is
scalable and easily parallelized, improving almost linearly in the number of CPUs.
Figure 20.1 Example consumer profile fragment: actual profile vs. advertiser's view (number of visits by site category)
Our data are obtained from a leading global information and measurement
company that wishes to remain anonymous. The dataset contains detailed
website browsing information of a large panel of more than 45,000 house-
holds over a 12-month period, from January 2012 to December 2012. For
each household in the panel, a detailed log of browsing activities at session
level is recorded. Each website being visited is assigned a unique category,
with a total of 29 categories. The most popular categories include “Social
Media,” “Entertainment,” “Portals,” “News/Information,” and “Retail.”
Figure 20.1 shows an example of a profile fragment. Each solid bar rep-
resents the number of visits to the corresponding site category over a
certain period of time. This consumer shows a high level of engagement with
Entertainment, Games and Social Media sites and fairly low interest in
Business Finance and Lifestyles sites.
The consumer profile depicted in Figure 20.1 represents an unbiased
view of the consumer’s online browsing activities, as it was collected using
tracking software installed on the consumer’s computer. An advertiser’s
view of this profile may be quite different, as it depends on the advertiser’s
tracking ability (or the ad network coverage). For example, if Netflix and
Facebook are not part of the advertiser's network, the profile view may
look like that depicted by the pattern-filled bars in Figure 20.1, where the
advertiser underestimates the consumer's interests in the Entertainment and
Social Media categories. This could affect the decision of what type of ads
to serve to this consumer.
Figure 20.2 Profile prediction based on the advertiser's data; profile distribution from third-party data (anonymized); bias-corrected profile
The approach presented in this case study addresses this problem as
follows (Figure 20.2). First, we develop a statistical model that describes
a consumer’s profile and, importantly, captures dependencies across
different dimensions of the profile. Second, we calibrate this model using
anonymized third-party data available from market research firms that
employ large online user panels and collect their browsing activities. As
a key outcome of this step, we learn various relationships in cross-site
category activities that exist on the population level. Finally, we combine
the profile information extracted from the advertiser’s own records (pre-
sumably incomplete) with the relationships inferred from the previous step
to arrive at the bias-corrected view of the individual profile.
Our statistical model for describing consumer profiles extends from
the Correlated Topic Model, or CTM (Blei and Lafferty 2007), which
is among the latest developments in the family of Topic Models. Topic
models were originally used to identify the mixture of topics present in a
large number of documents. Just like a document can be considered as a
combination of multiple topics, a consumer’s website visit activities can
be considered as the combination of multiple “roles” or objectives. For
example, the consumer may play a “social” role, where she visits places
like Facebook or Twitter; she may play a “shopper” role at another time
and visit places like eBay or Amazon; she may also play an “information
seeker” role at yet another time, visiting CNN and blogs, etc. Topic
models thus are a good conceptual fit to our task of user profiling using
website visit data.
The most commonly used topic model is the Latent Dirichlet Allocation
model, or LDA (Blei et al. 2003; Griffiths and Steyvers 2004). LDA
models the generation of mixed-topic documents in two steps, from
document to topic and then from topic to word, with each step modeled
In the vector, V_itc is the number of times a consumer visits websites that
belong to category c in time period t, and C is the total number of
categories.
Following the conceptual framework of topic models, each individual
visit takes place in a two-step process. First, the consumer decides on
the role for the website visit. Next, according to the role decided on in
the first step, the consumer decides on the website to visit. For example,
a consumer may decide that she wants to do some online shopping and
then visits Amazon.com. A consumer is expected to have multiple needs,
such as shopping, social, education, etc. The overall website visit profile is
the combination of the different roles the consumer plays to satisfy those
needs. Different consumers would have different emphasis on individual
roles. A college student, for example, may spend more time playing educa-
tion and social roles than a retired person does. The role-composition of
consumer i in time period t is denoted as:
In the vector, R is the total number of roles and p_itr is the probability
that she plays role r in time period t.
When playing different roles, a consumer is expected to visit different
categories of websites with different probabilities. Someone who is doing
online shopping may visit Amazon and eBay, while someone who is
studying may visit a university website. Each role is thus represented as
In the vector, φ_rc is the probability that a consumer playing role r will
visit a website that belongs to category c.
Furthermore, the total number of visits of consumer i at time t, representing
the consumer's internet usage intensity, is denoted as N_it and is
drawn from a Poisson distribution:
N_it ~ Poisson(exp(λ_i))

p_itr = exp(u_itr) / (1 + Σ_{r′=1..R−1} exp(u_itr′)),   r = 1, . . ., R−1   (20.5)

p_itR = 1 / (1 + Σ_{r′=1..R−1} exp(u_itr′))

u_itr = u_ir + X_it′ψ_r   (20.6)
In equation (20.6), u_ir is consumer i's baseline propensity for role
r. A positive value of u_ir indicates that the r-th role accounts for a bigger
portion of website visitation than the last role, role R. X_it is a vector of
observed characteristics that can be consumer-specific, time-specific,
or both. The corresponding coefficients are captured in ψ_r. Admitting
observed heterogeneity this way allows us to analyze how observed
consumer characteristics and other observed characteristics determine role
composition. For example, if age is observed and we expect a younger con-
sumer to spend more time playing a “social” role, then the coefficient for
age for the social role should be positive. Firms that possess large amounts
of data on such characteristics can thus leverage such information to
(u_i1, . . ., u_i,R−1, λ_i)′ ~ N((ū_1, . . ., ū_{R−1}, ū_λ)′, Σ)   (20.7)
In equation (20.7), Σ encodes the variance of the distribution of each
role across consumers, and the correlations among roles and between roles
and the web site usage intensity.
As discussed earlier, each consumer visit is generated from a two-step
process. For each visit v, v = 1..N_it, she first decides on a role:

z_itv ~ Multinomial(p_it)

Then, based on the chosen role, she decides on the category of the web
site to visit:

c_itv ~ Multinomial(φ_z_itv)

where each role's vector of category-visit probabilities is drawn from a
Dirichlet prior:

φ_r ~ Dir(α)   (20.11)
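The two-step generative process described above can be sketched in numpy as follows. This is a minimal illustration under standard topic-model conventions, not the chapter's implementation: the exp() link for the Poisson rate, the parameter values, and the variable names are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
R, C = 3, 5            # roles and site categories (illustrative sizes)

# Role-specific category distributions, Dirichlet-distributed as in (20.11).
alpha = np.ones(C)
phi = rng.dirichlet(alpha, size=R)               # R x C, rows sum to 1

# Consumer-level role utilities and usage intensity, jointly normal as in (20.7).
mean = np.zeros(R)                               # (R-1 utilities + intensity)
cov = np.eye(R)
u_and_lam = rng.multivariate_normal(mean, cov)
u, lam = u_and_lam[:R - 1], u_and_lam[-1]

# Role probabilities via the logit transform of (20.5), last role as baseline.
expu = np.exp(u)
p = np.append(expu, 1.0) / (1.0 + expu.sum())

# Number of visits: Poisson, with rate exp(lambda_i) assumed for illustration.
n_visits = rng.poisson(np.exp(lam))

# Two-step generation of each visit: pick a role, then a category given the role.
roles = rng.choice(R, size=n_visits, p=p)
categories = np.array([rng.choice(C, p=phi[r]) for r in roles], dtype=int)

profile = np.bincount(categories, minlength=C)   # the V_it count vector
print(profile.sum() == n_visits)  # True
```

Running the sketch forward yields a synthetic count vector of visits by category, i.e., exactly the kind of profile the model is later inverted to recover.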
The improvement in user profiling afforded by our model may have sig-
nificant profit implications for firms. We now present an economic simu-
lation that illustrates potential gains the proposed model may offer to a
firm if used in individual-level targeting of display ads.
Consider a hypothetical digital advertising agency that generates traf-
fic to the website of their client using display (banner) advertising. The
agency distributes ads through an ad exchange, paying $2.80 per thousand
impressions served (CPM) and getting an industry average click-through
rate of 0.5 percent (Johnston 2014). Accordingly, the agency’s effective
cost of generating a click to the client’s website is $0.56. The agency
charges the client a pre-negotiated rate of $0.67 per site visit. The agency
operates on a set daily budget of $1,000, which helps to generate about
1,786 visits per day with the baseline click-through rate of 0.5 percent.
Clearly, the agency’s profitability will improve if it can produce more
clicks. While several factors contribute to the click-through rate of a
given ad (e.g., ad creativity, page placement, context), profile-based
targeting is one of the key drivers of ad performance (Hazan and Banfi
2013).
Targeting approach    Base     Histogram    Proposed model
Precision             30%      42%          54%
CTR                   0.50%    0.52%        0.54%
As is common practice in this industry, the agency employs its own proprietary
scoring model that links a user's online behavioral profile to the propensity
to click on the ad. For the sake of this simulation we assume that,
for the top 30 percent most active users in the target profile category, the
click-through probability is 25 percent higher than the average, while for
the remaining 70 percent of users the CTR is correspondingly 11 percent
lower than average. These numbers are selected to preserve the average
rate of 0.5 percent (Table 20.1).
With the help of our model, the agency should be able to improve
the performance of this campaign by targeting individuals in the “Top
users” segment. In the extreme case, all the ads should be served only to
the “Top users” segment achieving a click-through rate of 0.63 percent
(Table 20.1). Clearly such performance is unrealistic, and the effective
CTR would depend on classification accuracy, which in turn depends on
the information available to the agency and the targeting model. As part
of our study, we analyzed the information content of the data available
to several prominent advertising networks, and evaluated the potential
gains using our modeling approach.1 For example, assuming the agency
is DoubleClick (or has information of similar quality as DoubleClick),
according to the data we have, Table 20.2 presents the results of effective
CTR when different targeting models are used. Using the histogram
approach the agency is able to accurately identify 42 percent of active
users, resulting in an effective click-through rate of 0.52 percent. Our
model produces further improvement with 0.54 percent effective CTR.
Finally, substituting the effective click-through rates from Table 20.2 into
the profit calculations, we get an improvement in profit of 25.5 percent for
the histogram model and 51.5 percent for the proposed model:
Targeting approach              Base         Histogram    Proposed model
CTR                             0.50%        0.52%        0.54%
Effective CPC                   $0.56        $0.54        $0.52
Traffic                         1,786        1,860        1,937
Price to client                 $0.67        $0.67        $0.67
Revenue                         $1,196.43    $1,246.51    $1,297.57
Profit                          $196.43      $246.51      $297.57
Profit improvement over base                 25.5%        51.5%
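The simulation arithmetic above can be reproduced directly. The sketch below recomputes the base case exactly and derives the effective click-through rates implied by the stated precision levels; dollar figures for the histogram and proposed models differ slightly from the table, presumably because the table works with unrounded rates. All constants come from the text.

```python
BUDGET = 1_000.0        # daily budget, $
CPM = 2.80              # cost per thousand impressions, $
PRICE_PER_VISIT = 0.67  # rate charged to the client, $
BASE_CTR = 0.005

# Segment CTRs from the text: the top 30% of users click 25% more than average,
# the remaining 70% correspondingly less, preserving the 0.5% average.
TOP_SHARE = 0.30
ctr_top = BASE_CTR * 1.25
ctr_rest = (BASE_CTR - TOP_SHARE * ctr_top) / (1 - TOP_SHARE)

def effective_ctr(precision):
    """Blend of segment CTRs when `precision` of served ads hit the top segment."""
    return precision * ctr_top + (1 - precision) * ctr_rest

def daily_profit(ctr):
    impressions = BUDGET / CPM * 1000
    clicks = impressions * ctr
    return clicks * PRICE_PER_VISIT - BUDGET

print(round(daily_profit(BASE_CTR), 2))     # 196.43, the base case
print(round(effective_ctr(0.42) * 100, 2))  # 0.52 (histogram approach)
print(round(effective_ctr(0.54) * 100, 2))  # 0.54 (proposed model)
```

Note that a precision of 30 percent (random serving at the segment base rate) recovers the 0.5 percent average exactly, confirming the internal consistency of the stated segment rates.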
Conclusion
As “digital” has established itself as a key medium for reaching and inter-
acting with consumers, one-on-one marketing is becoming a norm for
online businesses. Fueling this process is the ability to collect, analyze and
act upon individual-level data. This case study focuses on a fundamental
component of online marketing – user profiling. Valued by most online
businesses, user profile data have broad application across different areas
of digital marketing. McKinsey & Company regards online user profiling
as one of the promising opportunities companies should take advantage
of to unlock “big data's” potential (Hazan and Banfi, 2013). Our proposed
approach extends the Correlated Topic Model (Blei and Lafferty
2007) for user profiling. The proposed approach augments individual-
level first-party data with anonymized third-party data that significantly
improves profile recovery performance and helps to correct for biases. The
approach is highly scalable and easily parallelized, improving almost lin-
early in the number of CPUs. It produces easily interpretable and intuitive
results, while taking into account both observed and unobserved heterogeneity.
Using an economic simulation, we demonstrate the potential gains the
proposed model may offer to a firm if used in individual-level targeting of
display ads.
Note
1. In our study, we collected the website coverage information of several prominent adver-
tising networks.
References
Ahmed, Amr, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, and Alexander
Smola (2012), “Scalable inference in latent variable models,” in Proceedings of the Fifth
ACM International Conference on Web Search and Data Mining, Seattle, WA: ACM,
123–132.
Blei, David M. and John D. Lafferty (2007), “A correlated topic model of science,” Annals
of Applied Statistics, 1 (1), 17–35.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003), “Latent dirichlet allocation,”
Journal of Machine Learning Research, 3: 993–1022.
eMarketer (2015), “US Digital Ad Spending, by Format, 2013–2019” (accessed September 2015),
[available at http://acquisio.com/blog/display-advertising/display-surpasses-search-2015/].
Griffiths, Thomas L. and Mark Steyvers (2004), “Finding scientific topics,” Proceedings
of the National Academy of Sciences of the United States of America, 101 (Suppl. 1),
5228–5235.
Hazan, Eric and Francesco Banfi (2013), “Leveraging big data to optimize digital marketing”
(accessed March 4, 2015), [available at http://www.mckinsey.com/client_service/marketing_
and_sales/ latest_thinking/leveraging_big_data_to_optimize_digital_marketing].
Hofmann, Thomas (1999), “Probabilistic latent semantic indexing,” Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development in information
retrieval. ACM.
Johnston, Michael (2014), “Display Ad CPM Rates” (accessed February 9, 2015), [available
at http://monetizepros.com/cpm-rate-guide/display/].
Papadimitriou, Christos H., Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala (2000),
“Latent Semantic Indexing: A Probabilistic Analysis,” Journal of Computer and System
Sciences, 61 (2): 217–235.
The global annual marketing budget of a company is usually set in the pre-
vious year; that is, it is fixed. If companies offer a broad product portfolio
to customers from various countries and use a variety of communication
channels they need to break down the fixed annual budget into expendi-
tures across countries, products, and communication activities. For many
firms this task requires determining individual budgets for hundreds of
allocation units. As a result, firms face a complex decision problem: they
need to allocate a fixed budget across a multitude of allocation units by
evaluating the impact of these investment decisions on future cash flows.
Since marketing expenditures are immediately recognized as costs on the
income statement, but their total impact on sales often fully unfolds only
in future periods, they need to be evaluated in terms of an investment deci-
sion and based on the principle of marginal returns. Technically speaking,
management needs to solve a dynamic optimization problem for an invest-
Unit sales, marketing elasticity, and growth elasticity are labeled “optimal”
in Figure 21.2 because this information is endogenous and depends on the
budget and resulting unit sales in the optimum. The exact numbers can
only be determined in an iterative process by applying dynamic numeri-
cal optimization techniques. However, the structure of the optimal solu-
tion provides the basis for deriving a heuristic rule that does not require
numerical optimization. We describe this heuristic rule subsequently.
Figure 21.3 shows how the allocation weights are determined using the
simplified decision rule. Data for the carryover coefficient, sales elasticity,
and the growth multiplier are not readily available but must be estimated.
For example, if historical sales and marketing time-series are available,
econometric methods can be used to estimate marketing elasticity and
carryover.
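One way to sketch the econometric idea is a multiplicative (log-log) sales-response model with a lagged-sales carryover term, estimated by OLS on the logs. The functional form, parameter values, and simulated data below are illustrative assumptions, not the model actually used at Bayer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a multiplicative sales-response model with carryover:
# log S_t = c + beta * log A_t + lam * log S_{t-1} + noise,
# where beta is the marketing elasticity and lam the carryover coefficient.
T, c, beta, lam = 200, 1.0, 0.2, 0.6
log_a = rng.normal(3.0, 0.5, size=T)   # log marketing spend (hypothetical)
log_s = np.zeros(T)
for t in range(1, T):
    log_s[t] = c + beta * log_a[t] + lam * log_s[t - 1] + rng.normal(0, 0.01)

# OLS on logs recovers elasticity and carryover from the two time series.
X = np.column_stack([np.ones(T - 1), log_a[1:], log_s[:-1]])
coef, *_ = np.linalg.lstsq(X, log_s[1:], rcond=None)
print(np.round(coef, 2))  # close to the true values [c, beta, lam] = [1.0, 0.2, 0.6]
```

With real data, of course, the noise is far larger and identification issues (endogenous budgets, omitted drivers) require more careful specification, but the mechanics are the same.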
The allocation weight of a unit is the product of three components:

(1) (Discounted) long-term marketing effectiveness: last period's marketing elasticity / (1 + discount rate − marketing carryover)
(2) Size of profit contribution: profit contribution margin (%) × last period's revenue
(3) Growth potential: expected revenues in T periods / last period's revenues (T = planning horizon)
Current values of revenues are available from last year and the contribu-
tion margin is a target figure decided by management. The growth poten-
tial is calculated as a multiplier that divides expected revenues in 5 years
(planning horizon) by the current revenue level. In this way, products receive a
greater share of the total budget as long as they are expected to grow.
In contrast, when they are expected to enter their decline stage, their
budget is reduced.
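The heuristic rule described above can be sketched in a few lines: each product's weight multiplies (1) discounted long-term marketing effectiveness, (2) size of profit contribution, and (3) growth potential, and the fixed budget is then split proportionally to these weights. All product names and figures below are hypothetical.

```python
DISCOUNT_RATE = 0.05  # assumed annual discount rate

products = [
    # name, elasticity, carryover, margin, revenue (m EUR), expected revenue in T years
    ("Hypertension product A", 0.20, 0.50, 0.60, 100.0, 130.0),
    ("Hypertension product B", 0.15, 0.40, 0.55, 80.0, 70.0),
    ("Diabetes product C",     0.25, 0.60, 0.65, 60.0, 90.0),
]

def weight(elasticity, carryover, margin, revenue, expected_revenue):
    long_term_effect = elasticity / (1 + DISCOUNT_RATE - carryover)
    profit_contribution = margin * revenue
    growth_potential = expected_revenue / revenue
    return long_term_effect * profit_contribution * growth_potential

budget = 10.0  # total fixed budget in m EUR
weights = {name: weight(*rest) for name, *rest in products}
total = sum(weights.values())
allocation = {name: budget * w / total for name, w in weights.items()}

for name, amount in allocation.items():
    print(f"{name}: {amount:.2f} m EUR")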
By definition, the heuristic solution is likely to differ from the optimal
solution, but it should not deviate too much to be useful. The performance
of the heuristic was tested in an experimental simulation study and found
to provide very good results that even improve after several planning
cycles and converge to the optimum if applied consecutively (Fischer,
Wagner, and Albers 2013).
Although the tool was applied to prescription drugs (see below), it is
suitable for many other industries such as consumer durables, consumer
packaged goods, etc. In all these markets, rich information is available at
the aggregate product level that allows the calibration of market response
models.
the allocation tool was targeted at the five main European countries that
contribute the largest share to total sales. The application was developed
in the period 2005–2006 and budget recommendations for 2007 were
derived.
At that point in time, the three therapeutic areas diabetes, hypertension,
and infectious diseases represented established areas that are in their satu-
ration stage. Due to the aging of the population in industrialized societies
and innovative new-product introductions, they are, however, expected to
continue to grow at moderate rates in the future. The biggest challenge for
Bayer in these areas is to keep its market position. Existing and new drugs
by other global players are the main competitors for the Bayer drugs.
In contrast, the market for the treatment of erectile dysfunction is a
new category that was pioneered by Pfizer with its Viagra brand in 1998.
Bayer and Eli Lilly followed in 2003 with the introduction of their brands
Levitra and Cialis. This market is still growing and does not face generic
competitors yet.
MIZIK_9781784716745_t.indd 465
dysfunction
Mean SD Mean SD Mean SD Mean SD
Unit sales in thousand standard units 16,319 20,674 11,391 16,649 1,008 649 5,291 8,004
Elapsed time since launch in years 14.50 12.69 10.00 7.42 2.75 1.91 12.25 10.45
Order of entry (Median) 3 4 2 3
Price in EUR per standard unit .16 .26 .50 2.96 7.00 .48 2.01 1.97
Martketing stock variables:
Detailing at general practitioners 22,519 36,566 64,595 87,134 55,026 30,326 44,259 34,930
in thousand EUR.
465
Detailing at specialists in thousand EUR. 2,081 4,068 8,803 13,701 14,498 12,771 10,380 11,353
Detailing at pharmacies in thousand EUR 588 1,453 1,930 3,039 1,766 2,598
Professional journal advertising in thousand 149 341 458 502 165 295
EUR.
Meeting invitations in thousand EUR. 730 2,030 1,361 3,062 3,884 2,481 471 837
Other Martketing expenditures in thousand 2,558 9,278 3,912 4,404
EUR.
# of countries 5 5 5 5
# of subcategories 6 10 1 12
# of products 104 306 15 100
# of observations 2,398 7,908 233 2,916
14/02/2018 16:38
466 Handbook of marketing analytics
(1) Products that generate more incremental sales with the same budget
should get a larger slice of the total budget. Of course, relative incre-
mental sales tend to decline as sales and budgets increase due to satu-
ration effects.
(2) Products with a higher level of profit contribution generate more
financial resources to cover their own marketing expenditures and
contribute more to overall profits.
(3) Marketing should support growing and not declining products and
shift resources over the life cycle.
Organizational impact
Although the allocation tool is not the only source used by Bayer to
generate budget options, it has significantly improved the efficiency and
quality of the decision process. Because of its transparency and top–down
perspective, the allocation tool ameliorates a decision process that often
appears emotional and inefficient. Since it is strictly based on a range
of verifiable input information, the allocation tool adds an independent
perspective and its recommendations are fully fact-based. The budgeting
project contributed substantially to an organizational transformation that
eventually resulted in the creation of a completely new marketing intel-
ligence unit called Global Business Support. This unit supports global
marketing management and sales, including the global management board
with tools, results, and recommendations for a more efficient and effective
use of marketing resources.
+6.7 m
4.5 m
1.5 m
Budget after
2.3 m
2.2 m
Hypertension Hypertension
Hypertension Hypertension product A product B
product A product B
Conclusion
Note
1. This chapter is an adapted version of Marc Fischer, Sönke Albers, Nils Wagner,
and Monika Frie (2011), “Dynamic Marketing Budget Allocation across Countries,
Products, and Marketing Activities,” Marketing Science, 30 (4), pp. 568–585, and
appeared slightly modified under the title “Dynamically Allocating the Marketing
Budget: How to Leverage Profits across Markets, Products and Marketing Activities,”
in Marketing Intelligence Review, 4 (1), 2012, 50–59.
References
Albers, Sönke, Murali K. Mantrala, and Srihari Sridhar (2010), “Personal Selling Elasticities:
A Meta-Analysis,” Journal of Marketing Research, 47 (5), 840–853.
Bayer (2009), Annual Report, 2008. Bayer AG, Leverkusen, Germany.
Fischer, Marc and Sönke Albers (2010), “Patient- or Physician-Oriented Marketing: What
Drives Primary Demand for Prescription Drugs?” Journal of Marketing Research, 47 (2),
103–121.
Fischer, Marc, Peter S. H. Leeflang, and Peter C. Verhoef (2010), “Drivers of Peak Sales for
Pharmaceutical Brands,” Quantitative Marketing and Economics, 8 (4), 429–460.
Fischer, Marc, Nils Wagner, and Sönke Albers (2013), “Investigating the Performance
of Budget Allocation Rules: A Monte Carlo Study,” MSI Report Series No. 13-114,
Cambridge: MA: Marketing Science Institute.
Hanssens, Dominique M., Leonard J. Parsons, and Randall L. Schultz (2001), Market
Response Models: Econometric and Time Series Analysis. 2nd ed., Boston: Kluwer.
Tull, Donald S., Van R. Wood, Dale Duhan, Tom Gillpatick, Kim R. Robertson, and James
G. Helgeson (1986), “’Leveraged’ Decision Making in Advertising: The Flat Maximum
Principle and Its Implications,” Journal of Marketing Research, 23 (1), 25–32.
Consumers often “misbehave” (Thaler 2015).1 They save and exercise too
little; they spend, eat, and drink too much and take on too much debt;
they work too hard (or too little); they smoke, take drugs (but not their
prescription medicine), have unprotected sex, and carelessly expose their
private lives on social media. These misbehaviors may entail large costs
not only to society but also to the individuals concerned. Hence, policy-
makers feel compelled to regulate these behaviors along with the extent to
which companies are allowed to cater to, or take advantage of, consumer
preferences to engage in these behaviors. Examples abound. Witness, for
example, the widespread regulatory constraints imposed by governments
on both companies and consumers such as bans on smoking and taking
drugs, curbs on alcohol consumption, or borrowing limits based on dis-
posable income. Prominent examples of regulatory constraints imposed
on marketers include Australia’s Tobacco Plain Packaging Act, which,
beginning in December 2012, requires cigarette manufacturers to use
generic, undifferentiated packaging; or New York City’s proposed so-
called soda ban of sales of sugar-sweetened drinks in cups of more than
16 ounces (ultimately rejected by the courts in 2014); or the United States
Credit Card Act of 2009, which limits how credit card companies can
charge consumers and make them pay off their debt balances.
What is it about consumer financial decision-making, eating and drink-
ing, smoking, online behavior, and other (mis)behaviors that can make
them problematic? How can empirical methods and findings from mar-
keting science be used to help marketers, consumers, and policy makers
evaluate and control these misbehaviors? In this chapter, I will focus and
build on an approach developed in Wertenbroch (1998) to outline how the
theory-guided use of experimental methods, complemented by field data,
can provide both a criterion for evaluating the need for policy intervention
and a tool, offered by government as well as private enterprise, for allow-
ing consumers to avoid or limit their own misbehaviors without imposing
heavy-handed, intrusive constraints on market participants’ freedom of
choice (Thaler and Sunstein 2003).
473
Negative Externalities
not consistent with what they choose to do, an inconsistency between their
stated and revealed preferences (Wertenbroch and Skiera 2002).
Many such cases of what we might label misbehavior, or misconsump-
tion, involve intertemporal tradeoffs, which consumers make between
consequences of their consumption choices that occur over time. People
give in to the temptation to consume or do something unhealthy (e.g.,
drink sugary soft drinks, smoke, have unprotected sex, fail to exercise
or take one’s prescription drugs) for its immediate benefits even though
they know that their choice entails much larger negative long-term conse-
quences, which they anticipate they will regret. They thus choose a sooner,
smaller reward (e.g., immediate taste benefits, pleasure, leisure, present
consumption) over a larger, later one (e.g., better health outcomes, suf-
ficient retirement savings), when the sooner, smaller reward is imminent,
even though they prefer the larger, later reward when both occur in the
future. Strotz (1955–56) showed that such intertemporally, or dynami-
cally, inconsistent preferences cannot be characterized by discounting
the future at a constant rate, which is commonly regarded as normative.2
Instead, consumers discount the future consequences of their present
choices disproportionately, or hyperbolically, relative to the immediate
consequences, entailing myopic or present-biased preferences (Ainslie
1975; Frederick, Loewenstein and O’Donoghue 2001; Laibson 1997).
Such present-biased preferences that disproportionately overvalue imme-
diate outcomes can be said to yield negative internalities, that is, costly
consequences for consumers’ own future selves (Bartels and Urminsky
2011; Herrnstein et al. 1993; Hershfield et al. 2011).
Consumers differ—not only across individuals but also intra-individ-
ually across situations—in the degree to which their choices are present-
biased and also in the extent to which they exercise self-control, that is,
in the extent to which they attempt to curb their present-bias to minimize
the negative future consequences of their present choices. Consistent with
Strotz’s (1955–56) analysis, O’Donoghue and Rabin (1999) distinguish
between rational, time-consistent consumers (who may also include those
who use willpower to resist temptation and thus do not exhibit present-
biased choices; Baumeister and Vohs 2003) and others whose choices are
characterized by present-bias. The latter encompass naïfs who do not
foresee the self-control problems that arise from their present-biased pref-
erences and sophisticates who are aware of their present-bias and hence
foresee these self-control problems. Sophisticates may exercise self-control
to curb their present-biased impulses by engaging in precommitment: at
a time when they are not yet tempted to choose a smaller, sooner reward
over a larger, later one, they foresee that they will be tempted when that
smaller, sooner reward becomes imminent. They therefore self-impose
The previous section outlined three different criteria for policy interven-
tion and consumer protection: the presence of negative externalities,
third-party assessments of individual consumer welfare, and evidence of
consumer precommitment. Of these, precommitment offers the only crite-
rion that reveals the consumer’s own preferences for controlling his or her
consumption, as opposed to relying on the consumer’s stated preferences
or on third-party assessments of the consumer’s welfare. How can firms
and policy makers detect such precommitment in consumer markets?
The first empirical analysis of consumer precommitment in the mar-
ketplace was offered by Wertenbroch (1998), providing a template for a
theory-guided, empirical identification of instances of precommitment as
a behavioral criterion to detect a need for policy intervention. The paper
introduced a formal distinction into the marketing literature between
so-called vice and virtue goods (318–319). Vices are defined as goods that
dynamically inconsistent consumers are tempted to overconsume (e.g.,
alcohol, sweets, etc.), whereas virtues are defined as goods that dynami-
cally inconsistent consumers are tempted to underconsume (e.g., exercise,
spinach, etc.), due to how the costs and benefits of consuming them are
distributed over time. For example, snacking on cookies (a vice) yields an
immediate taste benefit but may make you gain weight over time, while
doing your homework (a virtue) is effortful but helps you achieve better
subsequent grades.
Wertenbroch (1998) hypothesized that consumers who worry about
being tempted to overconsume vices ration their purchase quantities of
these vices, relative to those of comparable virtues. That is, they prefer
to buy these vices in smaller package sizes at a time. For example, many
smokers prefer to buy their cigarettes in packs rather than in cartons
(Wertenbroch 2003). This imposes additional transaction costs on mar-
ginal consumption—they have to take another shopping trip to buy a
new pack when the initial pack is finished. Hence, rationing is a form of
precommitment—at the time of purchase, when consumers are not yet
tempted to overconsume a vice (e.g., in the store), they themselves strategi-
cally change the incentives, which they expect to face later on at the time
of consumption (e.g., at home), self-imposing constraints on marginal vice
consumption. To illustrate, when you have finished a bag of potato chips,
a prototypical impulse good, it is a lot more difficult for you to eat more
chips if you have to go out and buy another bag than if you can simply
grab one from your pantry. Such strategically motivated preferences to
buy vices in smaller package sizes imply that demand for vices ought to
be less price-elastic than demand for comparable virtues: In response to
a given price reduction, demand for vices increases at a slower rate than
demand for virtues (subject to the condition that consumers do not prefer
virtues to vices at all prices). Sophisticated consumers who recognize their
need for self-control will be reluctant to buy more of a vice in response to
a price discount.
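The elasticity implication can be made concrete with a small numerical sketch. The demand figures below are invented for illustration and are not Wertenbroch's data: given the same 10 percent price cut, a weaker quantity response for the vice yields a price elasticity that is smaller in magnitude.

```python
def price_elasticity(q0, q1, p0, p1):
    """Point elasticity of demand: percentage change in quantity
    demanded divided by percentage change in price."""
    return ((q1 - q0) / q0) / ((p1 - p0) / p0)

# Same 10 percent price cut for both goods (hypothetical numbers):
e_virtue = price_elasticity(q0=100, q1=120, p0=1.00, p1=0.90)  # demand +20%
e_vice   = price_elasticity(q0=100, q1=112, p0=1.00, p1=0.90)  # demand +12%

# Vice demand responds less to the price cut, so it is less price-elastic.
assert abs(e_vice) < abs(e_virtue)
```

Here the virtue's elasticity works out to about -2.0 and the vice's to about -1.2; the sophisticated consumer's reluctance to stock up on the vice shows up as the smaller magnitude.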
In an early example of the application of multiple methods in marketing
science, Wertenbroch (1998) employed a combination of experimental
data, field study data, and aggregate store-level scanner data analysis to
test this hypothesis and to enhance the external validity of the experimen-
tal findings. In an incentive-compatible experiment, 304 MBA student
participants were given an opportunity to buy potato chips. They could
choose between a small purchase quantity (one 6-oz. bag) for $1, or a
larger purchase quantity (three 6-oz. bags) at a quantity discount, or none
at all. The quantity discount depth varied between participants, either
shallow (three bags for $2.80) or deep (three bags for $1.80). To manipu-
late how tempting the chips were (and thus how strong the potential need
for self-control by precommitment was), they were described either as 25
percent fat (a more tempting vice frame) or as 75 percent fat-free (a less
tempting virtue frame), also between participants. Manipulation checks
showed that participants’ perceptions of the two price discount levels and
of the intertemporal costs and benefits differed accordingly. The results
were as predicted: For those 151 participants who bought potato chips,
a logistic regression analysis to predict purchase quantity probabilities
showed that increasing the quantity discount depth was much less effec-
tive at inducing the purchase of the large quantity under the vice frame
(25 percent fat) than under the virtue frame (75 percent fat-free). At the
same time, participants did not exhibit a stronger preference for the chips
when they were framed as a virtue than when they were framed as a vice,
indicating that the reluctance to buy the large size under the vice frame did
not arise because the chips were less preferred overall when framed as 25
percent fat. These results provided initial support for the hypothesis that
consumers control their consumption of tempting vice goods by buying
these vices in smaller package sizes at a time than comparable virtues.
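The structure of that logistic regression can be sketched with a binary discount-depth indicator, a binary frame indicator, and their interaction. The coefficients below are hypothetical, chosen only to reproduce the reported sign pattern, not estimates from the paper: a negative interaction term makes deepening the discount raise the probability of the large-quantity purchase far less under the vice frame than under the virtue frame.

```python
import math

def p_large_quantity(deep_discount, vice_frame,
                     b0=-1.0, b_disc=1.5, b_vice=0.2, b_interact=-1.2):
    """Logistic model of the probability of buying the large quantity.
    Coefficients are hypothetical, for illustration only."""
    z = (b0 + b_disc * deep_discount + b_vice * vice_frame
         + b_interact * deep_discount * vice_frame)
    return 1 / (1 + math.exp(-z))

# Lift in purchase probability from deepening the discount, by frame:
lift_virtue = p_large_quantity(1, 0) - p_large_quantity(0, 0)
lift_vice   = p_large_quantity(1, 1) - p_large_quantity(0, 1)

# The quantity discount is much less effective under the vice frame.
assert lift_vice < lift_virtue
```

With these illustrative values the deeper discount adds roughly 35 percentage points to the large-quantity purchase probability under the virtue frame but only about 7 points under the vice frame, mirroring the qualitative result described above.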
A second experiment provided additional evidence linking participants’
package size preferences to a measure of their need for self-control. A
different group of 310 MBA student participants recruited for this experi-
ment indicated whether they wanted to buy zero, one, or two packs of
Oreo chocolate chip cookies at each of 20 different package prices (from
25¢ to $5 in 25¢ increments). Using an incentive-compatible lottery proce-
dure, 10 percent of the participants were randomly selected to receive $10
across these 21 pairs (e.g., doubling package size decreased unit price by an
average of 57 percent for relative vices versus only 45 percent for relative
virtues). This finding suggests that marketers’ actual pricing policies are in
line with consumer preferences for rationing purchase quantities of vices.
Finally, Wertenbroch (1998) examined 52 weeks of store-level sales data
from 86 stores of Dominick’s Finer Foods, a leading supermarket chain in
metropolitan Chicago with a 20 percent market share at the time, for four
of these matched categories, in which UPCs could be unambiguously iden-
tified as regular and light products. Carefully matching regular and light
UPCs and adjusting for the effects of various control variables, the analyses
showed that aggregate consumer demand for the relative vices was almost
30 percent less price-elastic than demand for the relative virtues. This
result presented additional suggestive evidence of the presence of consumer
precommitment by purchase quantity rationing in the marketplace.
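Elasticity comparisons of this kind typically rest on a log-log demand specification, in which the coefficient on log price is the price elasticity. The sketch below uses synthetic weekly data to show only that core idea; the published analysis matched regular and light UPCs and included various control variables that are omitted here.

```python
import math

def log_log_elasticity(prices, quantities):
    """OLS slope of ln(quantity) on ln(price); in a log-log demand
    model this slope is the price elasticity of demand."""
    x = [math.log(p) for p in prices]
    y = [math.log(q) for q in quantities]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

prices = [0.80, 0.90, 1.00, 1.10, 1.20]           # hypothetical shelf prices
virtue_sales = [100 * p ** -2.0 for p in prices]  # synthetic: elasticity -2.0
vice_sales   = [100 * p ** -1.4 for p in prices]  # synthetic: elasticity -1.4

assert abs(log_log_elasticity(prices, virtue_sales) + 2.0) < 1e-9
assert abs(log_log_elasticity(prices, vice_sales) + 1.4) < 1e-9
```

The synthetic elasticities were chosen so that the vice is 30 percent less price-elastic than the virtue (1.4 versus 2.0), loosely echoing the roughly 30 percent gap reported in the store-level analysis.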
All four studies showed or implied that consumer demand for relative
vices is less price-elastic than demand for relative virtues, as implied by
Wertenbroch’s (1998) purchase quantity rationing hypothesis. Consumers
do not generally prefer virtues over vices, yet demand for vices increases
less than demand for virtues in response to given unit price reductions.
This suggests that consumers self-impose inventory constraints on their
vice consumption, not because they like vices less, but for strategic
reasons, revealing a preference for precommitment. By forgoing unit price
reductions from quantity discounts, they end up paying higher unit prices
for small package sizes (relative to unit prices for large package sizes) of
vices than of virtues—put loosely, paying more to buy less of what they
want too much—a self-control premium. Wertenbroch (1998) illustrates
that the key to detecting consumer precommitment in the marketplace is to
assess whether consumers are willing to pay such a premium to ration
themselves or to self-impose any other costly constraint on their own
freedom of choice (e.g., Ariely and Wertenbroch 2002). Such behavioral
evidence of precommitment allows marketers and policymakers to detect
a need for policy intervention purely based on consumers’ revealed prefer-
ences for self-imposing constraints, not on their (possibly biased) stated
preferences or third-party assessments.
NOTES
1. This chapter draws on and extends ideas introduced and discussed in Wertenbroch
(2014). I am grateful to Janet Schwartz for helpful comments.
2. Frederick, Loewenstein and O’Donoghue (2002, 356) point out that Samuelson’s (1937)
standard discounted utility model, which uses constant discounting, entails no normative
claim, but that Koopmans (1960) showed that it “could be derived from a superficially
plausible set of axioms.”
3. Benartzi and Lewin (2012) offer details on practical applications of Save-More-Tomorrow™.
4. Dean Karlan is also co-founder of www.stickk.com, launched in 2007, which helps con-
sumers and organizations create precommitment contracts to reach their own or their
members’ personal goals, providing a commercial example of detecting and facilitating
consumer demand for precommitment.
References
Akerlof, George A. (1970), “The Market for ‘Lemons’: Quality Uncertainty and the Market
Mechanism,” Quarterly Journal of Economics, 84 (August), 488–500.
Ariely, Dan and Klaus Wertenbroch (2002), “Procrastination, Deadlines, and Performance:
Self-Control by Precommitment,” Psychological Science, 13 (May), 219–224.
Ashraf, Nava, Dean Karlan, and Wesley Yin (2006), “Tying Odysseus to the Mast: Evidence
from a Commitment Savings Product in the Philippines,” Quarterly Journal of Economics,
121 (May), 635–672.
Baca-Motes, Katie, Amber Brown, Ayelet Gneezy, Elizabeth A. Keenan, and Leif D.
Nelson (2013), “Commitment and Behavior Change: Evidence from the Field,” Journal of
Consumer Research, 39 (February), 1070–1084.
Bartels, Daniel M. and Oleg Urminsky (2011), “On Intertemporal Selfishness: How the
Perceived Instability of Identity Underlies Impatient Consumption,” Journal of Consumer
Research, 38 (1), 182–198.
Baumeister, Roy F., and Kathleen D. Vohs (2003), “Willpower, Choice, and Self-Control,”
in Time and Decision: Economic and Psychological Perspectives on Intertemporal Choice,
ed. George Loewenstein, Daniel Read, and Roy Baumeister, New York, NY: Russell Sage
Foundation, 201–216.
Becker, Gary S. and Kevin M. Murphy (1988), “A Theory of Rational Addiction,” Journal
of Political Economy, 96 (4), 675–700.
Benartzi, Shlomo and Roger Lewin (2012), Save More Tomorrow: Practical Behavioral
Finance Solutions to Improve 401(k) Plans, New York: Penguin.
Börsch-Supan, Axel and Reinhold Schnabel (1998), “Social Security and Declining Labor-
Force Participation in Germany,” American Economic Review, 88 (2), 173–178.
Brune, Lasse, Xavier Giné, Jessica Goldberg, and Dean Yang (2011), “Commitments to Save:
A Field Experiment in Rural Malawi,” World Bank Policy Research Working Paper Series
5748.
Cawley, John and Chad Meyerhoefer (2012), “The Medical Care Costs of Obesity: An
Instrumental Variables Approach,” Journal of Health Economics, 31 (1), 219–230.
Coase, Ronald H. (1960), “The Problem of Social Cost,” Journal of Law and Economics, 3
(October), 1–44.
Frederick, Shane, George F. Loewenstein, and Ted O’Donoghue (2002), “Time Discounting
and Time Preference: A Critical Review,” Journal of Economic Literature, 40 (June), 351–401.
Grover, Steven A., et al. (2015), “Years of Life Lost and Healthy Life-Years Lost from
Diabetes and Cardiovascular Disease in Overweight and Obese People: A Modelling
Study,” Lancet Diabetes & Endocrinology, 3 (2), 114–122.
Gul, Faruk and Wolfgang Pesendorfer (2001), “Temptation and Self-Control,” Econometrica,
69 (6), 1403–1435.
Herrnstein, Richard J., George F. Loewenstein, Dražen Prelec, and William Vaughan,
Jr. (1993), “Utility Maximization and Melioration: Internalities in Individual Choice,”
Journal of Behavioral Decision Making, 6 (September), 149–185.
Hershfield, Hal E., Dan G. Goldstein, William F. Sharpe, Jesse Fox, Leo Yeykelvis, Laura
L. Carstensen, and Jeremy N. Bailenson (2011), “Increasing Saving Behavior Through
Age-Progressed Renderings of the Future Self,” Journal of Marketing Research, 48, S23–S37.
Kahneman, Daniel (2011), Thinking, Fast and Slow, New York, NY: Farrar, Straus & Giroux.
Kahneman, Daniel and Amos Tversky (1979), “Prospect Theory: An Analysis of Decisions
under Risk,” Econometrica, 47 (2), 263–291.
Kast, Felipe, Stephan Meier, and Dina Pomeranz (2012), “Under-Savers Anonymous:
Evidence on Self-Help Groups and Peer Pressure as a Savings Commitment Device,”
NBER Working Paper No. 18417.
Koopmans, Tjalling C. (1960), “Stationary Ordinal Utility and Impatience,” Econometrica
28 (2), 287–309.
Laibson, David (1997), “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of
Economics, 112 (2), 443–477.
Mill, John Stuart (1859/1975), On Liberty, New York, NY: Norton.
O’Donoghue, Ted and Matthew M. Rabin (1999), “Doing It Now or Later,” American
Economic Review, 89 (1), 103–124.
Puri, Radhika (1996), “Measuring and Modifying Consumer Impulsiveness: A Cost–Benefit
Accessibility Framework,” Journal of Consumer Psychology, 5 (2), 87–113.
Rogers, Todd, Katherine L. Milkman, and Kevin G. Volpp (2014), “Commitment Devices:
Using Initiatives to Change Behavior,” Journal of the American Medical Association, 311
(20), 2065–2066.
Samuelson, Paul A. (1937), “A Note on Measurement of Utility,” Review of Economic
Studies, 4 (2), 155–161.
Schelling, Thomas C. (1984), “Self-Command in Practice, in Policy and in a Theory of
Rational Choice,” American Economic Review, 74 (May), 1–11.
Schwartz, Janet, Daniel Mochon, Lauren Wyper, Josiase Maroba, Deepak Patel, and Dan
Ariely (2014), “Healthier by Precommitment,” Psychological Science, 25 (2), 538–546.
Stigler, George J. (1961), “The Economics of Information,” Journal of Political Economy,
69 (3), 213–225.
Stigler, George J. and Gary S. Becker (1977), “De Gustibus Non Est Disputandum,”
American Economic Review, 67 (2), 76–90.
Strotz, Robert H. (1955–56), “Myopia and Inconsistency in Dynamic Utility Maximization,”
Review of Economic Studies, 23, 165–180.
Sunstein, Cass R. (2015), “Fifty Shades of Manipulation,” Journal of Marketing Behavior, 1
(3–4), 213–244.
Thaler, Richard H. (1980), “Toward a Positive Theory of Consumer Choice,” Journal of
Economic Behavior & Organization, 1 (1), 39–60.
Thaler, Richard H. (2015), Misbehaving: The Making of Behavioral Economics, New York,
NY: Norton.
Thaler, Richard H. and Shlomo Benartzi (2004), “Save More Tomorrow™: Using Behavioral
Economics to Increase Employee Saving,” Journal of Political Economy, 112 (1, pt. 2),
S164–S187.
Thaler, Richard H. and Cass R. Sunstein (2003), “Libertarian Paternalism,” American
Economic Review, 93 (2), 175–179.
Thaler, Richard H. and Cass R. Sunstein (2008), Nudge: Improving Decisions About Health,
Wealth, and Happiness, New Haven, CT: Yale University Press.
Tversky, Amos, and Daniel Kahneman (1974), “Judgment under Uncertainty: Heuristics
and Biases,” Science, 185 (4157), 1124–1131.
von Neumann, John and Oskar Morgenstern (1944), Theory of Games and Economic
Behavior, New York: Wiley.
Wertenbroch, Klaus (1998), “Consumption Self-Control by Rationing Purchase Quantities
of Virtue and Vice,” Marketing Science, 17 (4), 317–337.
Wertenbroch, Klaus (2003), “Self-Rationing: Self-Control in Consumer Choice,” in Time
and Decision: Economic and Psychological Perspectives on Intertemporal Choice, eds.
George Loewenstein, Daniel Read, & Roy Baumeister, New York, NY: Russell Sage
Foundation, 491–516.
Wertenbroch, Klaus (2014), “How (Not) to Protect Meta-Rational Consumers from
Themselves,” Journal of Consumer Protection and Food Safety, 9 (3), 266–268.
Wertenbroch, Klaus and Bernd Skiera (2002), “Measuring Consumer Willingness to Pay at
the Point of Purchase,” Journal of Marketing Research, 39 (May), 228–241.
Anyone who has made a New Year’s resolution and failed to make a
lasting behavior change is intimately familiar with the “intention-behavior
gap” (Sheeran, 2002). When it comes to following through on our best-
laid plans, we often fall short—most intentions to change behavior end in
failure (Sheeran, Webb, and Gollwitzer, 2005). There are a multitude of
situations in which people behave seemingly irrationally—acting against
their own intentions, for example—but nonetheless predictably. The promise of
behavioral science is that these anomalies can be exploited opportunisti-
cally to nudge people in the direction of making better choices. To help
people make desired behaviors easier for themselves and others, we have
formed an academic–industry collaboration to develop and implement a
new framework, the 4Ps Framework for Behavior Change. It offers strate-
gies and tactics for helping close the intention-behavior gap, organizing a
variety of “nudges” from marketing, psychology, and behavioral econom-
ics. These nudges can help practitioners and consumers design interven-
tions across multiple domains. The framework is consistent with Richard
Thaler and Cass Sunstein’s ideal of “libertarian paternalism”—nudging
people in directions that align their behaviors with their long-term self-
interest, without curtailing their ultimate freedom to choose (Thaler and
Sunstein, 2003). Focusing on actionable, high-impact levers of change, it
combines common sense with novel ways to make desirable behavior the
path of least resistance. In this chapter, we present the framework, along
with supporting research findings, and describe how it is being applied in
the field: encouraging healthy food choices at Google.
Most people report a desire to eat healthfully (Share Our Strength,
2014), but people eat more and eat more fattening foods than they did
20 years ago, with rates of obesity skyrocketing as a result. In addition
to increasing the personal risks of heart disease, diabetes, and other
chronic illnesses (Flegal, Graubard, Williamson, and Gail, 2007), obesity
is estimated to account for almost 10 percent of total annual medical
expenditures in the USA (Finkelstein, Trogdon, Cohen, and Dietz, 2009).
Millions of dollars are being spent on nutritional and wellness education,
and American consumers spend more than $50 billion a year on weight-
loss attempts (Market Data Enterprises, 2009), but desire and information
are clearly not enough. It is in the public interest to help make healthier
food choices easier for everyone. And in many cases, it is in the interest of
corporations as well.
In 2015, Google celebrated its sixth year holding the number one spot
on Fortune’s list of 100 Best Companies to Work For (Fortune, 2015).
And in all those years, Googlers mentioned the free, delicious food as one
of the keys to their satisfaction. The biggest challenge for the food team
was figuring out how to help Googlers stay simultaneously healthy and
satisfied: failing on either dimension would mean loss of productivity and
morale, which could hurt business outcomes and employee retention. And
inducing satisfaction meant not just providing a variety of foods (includ-
ing some less healthy ones), but treating employees as adults in control of
their own decisions about their bodies and their health. Therefore, gentle
nudges that did not restrict choices were appealing to the food team.
When the Google food team engaged Yale School of Management to
help them apply the 4Ps framework, they had already been using many
“tweaks” inspired by behavioral economists that were consistent with the
framework. In fact, they were on the vanguard of applying behavioral
economics to the food environment. Here, we describe how the framework
is being applied at Google, with results of some field experiments. Our
hope is that describing how the framework can be applied to one challenge
(serve food that keeps people healthy and satisfied) in one type of location
(Google offices) will inspire ideas for applying the framework to other
challenges and locations.
and Dhar, 2006). For all these reasons, it is possible and helpful to nudge
them in the right direction, through the types of simple interventions sug-
gested by the 4Ps framework.
The intervention domains of the 4Ps framework are: Process (how
choices are made), Persuasion (how choices are communicated),
Possibilities (what choices are offered), and Person (how intentions are
reinforced). (See Figure 23.1 for a summary of the framework.) Each lever
of change provides different paths to reduce resistance and nudge indi-
viduals toward healthy choices, offering ways to make intuitive choices
healthier and rational choices easier. Together, the four levers provide
comprehensive suggestions for engineering the environment to make the
healthy choice the easy choice. The framework’s levers can be used in any
combination; it is not necessary to use all of them. And although we focus on
health and food choices in this chapter, the framework can be applied to
any type of behavior.
[Figure 23.1 The 4Ps Framework for Behavior Change: Process, Persuasion, Possibilities, and Person]

Order

In a visual set, the privileged position is the first item in a pair or the
middle item in a set of three. The privileged
positions in an experiential or auditory set (like a set of stockings to touch
or a list of daily specials to hear) are both the first and the last items.
When options are ordered by an alignable attribute such as size, people
with weak preferences tend to compromise by choosing the middle option
because it is easier to rationalize (Sharpe, Staelin, and Huber, 2008). These
biases can serve health goals, if healthy options are offered in the advan-
taged positions in comparative choices.
Defaults
Due to a bias toward the status quo, and also the ease of not making a
decision, defaults have proven extremely effective in guiding choices, even
in domains as weighty as organ donations (Johnson and Goldstein, 2003)
and retirement savings (Thaler and Benartzi, 2004). Often people are not
even aware of any alternative to the default. For example, in one study at
a Chinese takeout restaurant, patrons were asked if they would prefer a
half-serving of rice (without any price discount). Many chose this option,
which had not occurred to them when the full-sized entrée was offered
as the default (Schwartz, Riis, Elbel, and Ariely, 2012). Defaults are less
effective when preferences are strong. When preschool children were
offered apple slices as the default side but allowed to switch to French
fries, their strong preference for fries led the vast majority to reject the
apples (Just and Wansink, 2009).
Accessibility
Vividness
Vivid messaging and imagery grab the attention of the intuitive, emo-
tional mind. Triggering emotions such as delight or disgust can help the
gut instinct be the right one. Vividness can be achieved with words or with
a visual or tactile experience.
Names play an important role in expectations and evaluations.
Understanding this, marketers have changed the names of some popular
products. To avoid vivid and negative images of oiliness, Kentucky Fried
Chicken has been officially shortened to KFC®, and Oil of Olay has been
shortened to Olay®. To escape the vivid connection with constipation,
prunes have become “dried plums.” Healthy choices can be assisted by
vivid names as well. Adding adjectives like “succulent” or “homemade”
can make food not only more appealing but also tastier and more filling
(Wansink, van Ittersum, and Painter, 2005). Even fruit consumption
can be nudged—a sign reading “fresh Florida oranges” increased fruit
consumption (Wansink, 2006). However, food names can spur over-
consumption, too: dieters thought a “salad special” was healthier and
thus ate more of it than an identical “pasta special” (Irmak, Vallen, and
Robinson, 2011). And people eat more when portions are called “small”
or “medium,” believing they have eaten less (Aydinoglu, Krishna, and
Wansink, 2009).
Using pictures or objects is another vivid way to engage the emotions,
which can encourage persistence in healthy behaviors. For example, look-
ing at bacteria cultured from their own hands led doctors to wash their
hands more often. And seeing a vial of fat from a gallon of whole milk caused many
milk drinkers to switch to skim (Heath and Heath, 2010). Visuals can also
simplify the decision process. In one cafeteria intervention, implementing
a simple green/yellow/red color-coding system improved sales of healthy
items (green) and reduced sales of unhealthy items (red) (Thorndike et
al., 2012). Google has implemented stoplight labels as well, with many
Googlers reporting that the colored labels helped them make healthy
choices.
Comparisons
Moments of Truth
A “moment of truth” is the time and place when people will be most
receptive to persuasive messaging (Dhar and Kim, 2007). The evaluation
of choice alternatives depends on which goals are active in any particular
moment. Therefore, decision processes are quite sensitive to timing—and
Although most people in a different study had predicted that seeing ads for some
commonly disliked vegetables wouldn’t get them to eat more of those vegetables,
it appears they may have been wrong. In one high-traffic café where Googlers
eat free meals, we promoted a series of unpopular vegetables (beets, parsnips,
squash, Brussels sprouts, and cauliflower) as the Vegetable of the Day! with dis-
plays of colorful photos and trivia facts next to a dish containing that vegetable as
its main ingredient. By placing the campaign posters at the moment of truth—right
next to the dish—we increased the number of employees trying the featured dish
by 74 percent and increased the average amount each person served themselves
by 64 percent.
Possibilities provide the most obvious lever of change, yet they are often
overlooked. Possibilities refers to the composition of the choice set: before
trying to steer choices, the planner might first improve the options. While it may
in rare cases be effective to ban undesirable behavior (such as smoking in
restaurants) or to legislate desirable behavior (such as wearing seatbelts),
the negative reactions against paternalism can often outweigh its benefits.
Therefore, we advocate a gentler approach, maintaining freedom of choice
while improving the options. When designing a choice set to facilitate
Assortment
The first decision a planner must make is what the assortment will be.
Availability has a strong impact on consumption: people tend to eat
whatever is in front of them. Sometimes the existing options can be made
healthier, either by modifying components (e.g., white to wholegrain
pasta) or by switching the mode of delivery (e.g., salt shakers that dispense
less salt per shake). One study found people were more likely to choose
a healthy option (fruit over a cookie) from a larger assortment than a
smaller one (Sela, Berger, and Liu, 2009). Relative appeal can also be
manipulated. In the Healthy Lunchrooms Initiative, Wansink found that
placing fruit in a nice bowl or under a light increased fruit sales by more
than 100 percent (“Nutrition advice,” 2014).
Variety in an assortment is a powerful stimulant of consumption.
Generally, when consuming more than one thing is possible, more options
mean more consumption. This is true even when variation is purely
perceptual. For example, people ate more M&Ms from a bowl containing
more colors of M&Ms, even though the total quantity and flavors were
identical to a bowl with fewer colors (Kahn and Wansink, 2004). One
way to reduce consumption without restricting choice altogether is by
rotating variety over time, with healthy or desirable options switching
more frequently, to encourage sampling or consumption, with unhealthy
or undesirable options switching less frequently, to encourage satiation.
Bundling
Quantity
Goals
Setting explicit goals can increase healthy choices by reducing the think-
ing required for engaging in a behavior. Effective goals are personal,
motivational and measurable—challenging, specific, and concrete (Locke
and Latham, 1990). “Getting in shape” is a wish, whereas a goal to “run 3
miles 3 times a week until the wedding” entails both a reasonable challenge
and a means of measuring success—and is more likely to yield the desired
outcome (Strecher et al., 1995). Goals also become more manageable
when broken into smaller steps. Like paying for a new car in monthly pay-
ments, a goal of losing four pounds per month becomes easier than losing
50 pounds in a year. And another important benefit of setting intermedi-
ate goals is building momentum by tracking small wins along the way—
perception of progress toward a goal can itself be motivating (Kivetz,
Urminsky, and Zheng, 2006). Tracking goals, with tools for accomplish-
ment and measurement, increases the chance of success.
Precommitment
Habits
Conclusion
In this chapter, we have shared the 4Ps Framework for Behavior Change,
designed to organize research findings to make them more easily appli-
cable in the real world. We have described many levers the well-meaning
planner can employ to support the healthy intentions of others, and we
have shared some examples of how the 4Ps Framework is being applied
at Google. The examples here focused on nudging people toward healthy
food choices, but similar strategies can be used to nudge people’s behavior
in any direction that supports their own intentions. The framework offers
a toolbox of interventions leveraging a contextual approach aimed at
influencing specific decisions via (1) the combination of choices people are
exposed to, (2) the choice environment, and (3) communication about the
In a field experiment at Google, we helped employees turn goals into healthy eating
habits. Volunteers set personal diet and body goals and were randomly assigned
to one of three groups. The first received information on the link between blood
glucose and weight gain. The second also received tools for using that information:
blood glucose monitoring devices, data sheets, and advice on measuring glucose,
weight, BMI, and body composition. The third was the control group, receiving no
information or tools. Weekly surveys showed those who had received tools in addi-
tion to information made the greatest progress on their goals. After three months,
there was no difference between the information group and the control in achiev-
ing personal goals, while among those who had received the tools, 10 percent
more reported making progress on their body goals and 27 percent more reported
making progress on their diet goals. By the end of the study, those in the tools group
reported healthy choices becoming habitual: “After doing the first blood tests, I didn’t
need to prick myself much more.” Information was not enough to facilitate change,
but tools and measurement gave insight that closed the intention-behavior gap.
ultimately making data collection seem like a simple extension of the tasks
room attendants were used to completing.
Also, knowing that a language barrier might pose a challenge, the team
asked the room attendant supervisor for permission to go into a room and
take pictures of towels in various places to eventually be used in a guide that
pictorially demonstrated what should and should not count as towel reuse.
In addition to in-person training by the researchers, the team also wrote out
instructions in English and then paid a translator to translate the instructions
into Spanish (the native language of the majority of the room attendants).
The room attendants were being given new instructions that differed
from their well-established habits, the instructions were somewhat
complicated, and the attendants personally had little incentive to follow
them. Therefore, the room attendant supervisors were asked to
occasionally “test” the room attendants and to report back to the
researchers if there were any attendants whose data they believed
would be inaccurate. After a few weeks the supervisors named several
room attendants whom they did not endorse and whose data the team
did not use. Had the room attendant supervisor not conducted these tests,
these room attendants’ data likely would have added noise to the experi-
ment and reduced the likelihood of detecting statistically significant differences
between the message conditions.
Prior to this field experiment comparing provincial and global norms,
the researchers conducted an initial field experiment. This experiment
simply tested the difference in compliance between a control message that
was akin to the standard messages hotels already employed (focusing on
the importance of conservation to the environment) and a descriptive
social norm-based message indicating that most hotel guests participate
in the program (these norm data were based on a small study the authors
had previously discovered). There were several benefits of conducting this
study prior to proposing the provincial norm study. First, from a purely
applied standpoint, demonstrating that a novel sign designed by psycholo-
gists was superior to the standard ones used by almost all hotels would
provide a key applied insight to practitioners in the hospitality industry.
Second, this would provide the hotel manager with tangible results,
further highlighting the utility of research. Finally, and most important,
this experiment helped iron out the kinks, so to speak, of the logistics
and coordination necessary to run future studies jointly with the hotel
management and staff.
After collecting data for this initial experiment for nearly three months,
the researchers found that the social norm message indicating that most
of the hotel guests reuse their towels did indeed yield significantly greater
participation in the hotel’s towel reuse program than the standard
Reciprocity by Proxy
potential mechanism(s) that drove the initial findings. This is where fol-
low-up laboratory studies prove so useful: Not only can they help reduce
confounds, but they also typically give researchers far greater insight into
the psychological underpinnings of the field experiment effects.
Rewards
Although field experiments present more challenges than many other
forms of research, they can also provide many more rewards. One major
benefit of field research is that it is conducted in a real-life setting, so its
results are often viewed as more convincing than those of lab studies. There is no leap of faith
required in making the jump from theory to practice. This is not to say that
real change happens quickly after field experiments are publicized—it took
years after publication of the first hotel study for us to observe any hotels
actually making use of the findings. But sometimes large-scale changes
do occur shortly after field experiments are published. For example, rela-
tively soon after Schultz and colleagues (2007) published their paper on
the benefits of providing normative feedback to home energy users, the
company Opower was founded using the same principles demonstrated in
that work (Cuddy, Doherty, and Bos, 2010). To date, Opower’s feedback
on homeowners’ energy reports has cumulatively saved approximately 11
trillion watt-hours of energy and reduced customers’ energy bills by about
$1.1 billion. It seems very likely that interest in the power of normative
feedback was a direct result of running a field experiment rather than a
survey or lab experiment. Given the potential major impact of field experi-
mentation on scholarship and practice, we look forward to seeing more of
it conducted by consumer researchers in the future.
References
Cialdini, R. B. (2008). Influence: Science and practice (5th ed.). Boston: Allyn & Bacon.
Cialdini, R. B. (2009). We have to break up. Perspectives on Psychological Science, 4(1), 5–6.
Cialdini, R. B., & Goldstein, N. J. (2004). Social influence: Compliance and conformity.
Annual Review of Psychology, 55(1), 591–621.
Cuddy, A. J. C., Doherty, K. T., & Bos, M. W. (2010). “OPOWER: Increasing Energy
Efficiency through Normative Influence (A).” Harvard Business School Case 911-016
(Revised January 2012).
Freedman, J. L., & Fraser, S. C. (1966). Compliance without pressure: The foot-in-the-door
technique. Journal of Personality and Social Psychology, 4, 195–203.
Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using
social norms to motivate environmental conservation in hotels. Journal of Consumer
Research, 35, 472–482.
Goldstein, N. J., Griskevicius, V., & Cialdini, R. B. (2011). Reciprocity by proxy: A novel
Privacy
Web users in both the treatment and control groups were asked to fill out
a survey about their opinions of the brand advertised to the treatment group.
Thus, people who saw the branded ad and people who saw the public
service announcement were both asked their opinion of the same brand.
The difference in favorability and stated purchase intention between the
treatment and control groups can be seen as the effect of the ad on brand
favorability and purchase intent. In other words, the experiment allows
the marketing research firm (and us researchers!) to assess the causal
impact of the advertisement on stated opinions.
It is important to note some limitations of this method. First, we do not
know the impact of the ad on actual purchasing, only on stated intentions
to purchase and stated opinion of the brand. Second, a large fraction of con-
sumers did not fill out the survey. While the response rate for the treatment
and control groups is similar, it is generally low. This suggests the measure
of effectiveness we have could be narrowly seen as a measure of the effect of
an advertisement on the type of people who are willing to fill out surveys.
Nevertheless, the field experiments give us measures of the effectiveness
of thousands of different advertising campaigns across many countries
and over many years. We could use this information to look at changes in
the effectiveness of advertising campaigns in Europe before and after the
2004 implementation of the privacy regulation; however, such an analysis
would be incomplete. It would not help solve the second requirement for
measuring the impact of the regulation: a comparison group to provide a
relevant benchmark.
As a benchmark, we use non-EU countries (the non-EU data come
primarily from the United States, with a small number of campaigns in
each of Canada, Mexico, Brazil, and Australia). We use the change in
EU privacy policy to conduct a “difference-in-differences” analysis
that treats the change in policy as a natural or quasi-experiment.
We compare the change in effectiveness of EU ads before and after the
policy change to the change in the effectiveness of non-EU ads, before
and after the EU policy change. This is called a difference-in-differences
analysis because it looks at the difference in the change in ad effectiveness
across locations over time. The changes in ad effectiveness are, themselves,
differences between the before and after periods. While it is possible to
conduct difference-in-differences estimation by comparing the four aver-
ages (ad effectiveness in the EU before the policy change, ad effectiveness
in the EU after the policy change, ad effectiveness outside the EU before
the EU policy change and ad effectiveness outside the EU after the EU
policy change), it is more common and often more informative to conduct
regression analysis that emphasizes an interaction term between the policy
change timing and the treatment group.
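The four-averages version of the calculation is easy to sketch in code. The numbers below are hypothetical, purely for illustration (they are not results from the study):

```python
# Difference-in-differences from the four cell averages.
# All numbers are made up for illustration only (not from the study).
eu_before, eu_after = 2.8, 1.6            # hypothetical EU ad effectiveness
non_eu_before, non_eu_after = 2.5, 2.4    # hypothetical non-EU ad effectiveness

change_eu = eu_after - eu_before              # change in the treated group
change_non_eu = non_eu_after - non_eu_before  # change in the comparison group

# The comparison-group change nets out the common time trend,
# leaving the estimated effect of the policy change.
did = change_eu - change_non_eu
print(f"DiD estimate of the policy effect: {did:+.2f}")  # prints -1.10
```

The same number is the coefficient on the EU-by-after interaction term in the regression formulation, which additionally yields standard errors and accommodates control variables.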
Next, I discuss another study that used part of this same data set of field
experiments from a marketing research company to assess whether the
digital channel limits the ability of local governments to change behavior
by restricting advertising.
Castells (2001) highlighted the potential of the internet to reduce state
control by allowing information to flow freely across borders. While
national governments have been able to erect barriers to the international
flow of information online (Zhang and Zhu 2011), such barriers have
proven challenging within countries. The point that local government
policies can be undermined by the online channel has received a great deal
of attention in the context of local sales taxes (Goolsbee 2000; Ellison and
Ellison 2009; Einav et al. 2012; Anderson et al. 2010). One common thread
in these studies is that consumers are much more likely to buy online in
locations with high offline sales taxes.
In Goldfarb and Tucker (2011b), we examine whether this reduced
potential of government control applies to advertising regulation. Many
local governments ban certain types of advertising within their juris-
diction. Particularly common in the United States is the banning of
alcohol advertising using billboards and other outdoor media. During the
2001–2008 time period, 17 states regulated such out-of-home advertising
of alcoholic products. To test whether the internet makes government
regulation less effective, we compared the effectiveness of online advertis-
ing campaigns for alcohol within the 17 states that restricted out-of-home
Antitrust
intellectual property law, etc.). We argue that prices for these words were
unlikely to be affected by a ban on direct solicitation, but are likely to
be affected by other drivers of the price of law-related keywords such as
litigiousness and local competition between lawyers.
We conducted a difference-in-differences analysis, comparing the differ-
ence between the prices of personal injury keywords and those of other
law keywords in states with direct solicitation bans to the same difference
in states without such bans.
We found substantial substitution between search engine advertising
and direct solicitation: when direct solicitation is banned, prices for per-
sonal injury keywords are much higher. We interpret this to suggest that
search engine advertising competes directly with offline direct solicitation
(a form of advertising).
This research has been used to argue that online and offline advertising
markets should not be seen as separate markets, but as part of a larger
advertising market (Ratliff and Rubinfeld 2010). If the relevant market
is all advertising, rather than search engine advertising, it is harder to see
how Google can be an antitrust target based on its share of the search
advertising market alone.
Conclusion
This chapter has summarized three studies that used experiments and
difference-in-differences regression modeling to inform policy debates
around privacy, local jurisdiction, and antitrust. Much work remains to
be done to improve the empirical content of these debates, as well as other
discussions in marketing policy.
References
Acquisti, Alessandro, Curtis Taylor, and Liad Wagman. 2016. The Economics of Privacy.
Journal of Economic Literature 54(2), 442–492.
Anderson, E., N. Fong, D. Simester, and C. Tucker. 2010. How sales taxes affect customer and firm
behavior: The role of search on the internet. Journal of Marketing Research 47(2), 229–239.
Castells, Manuel. 2001. The Internet Galaxy: Reflections on the Internet, Business, and
Society. London: Oxford University Press.
Einav, L., D. Knoepfle, J. D. Levin and N. Sundaresan. 2012. Sales taxes and internet com-
merce. Working Paper 18018, National Bureau of Economic Research.
Ellison, Glenn and Sara Fisher Ellison. 2009. Tax Sensitivity and Home State Preferences in
Internet Purchasing. American Economic Journal: Economic Policy 1(2), 53–71.
Goldfarb, Avi. 2004. Concentration in Advertising-Supported Online Markets: An Empirical
Approach. Economics of Innovation and New Technology 13(6), 581–594.
Goldfarb, Avi and Catherine Tucker. 2011a. Privacy Regulation and Online Advertising,
Management Science 57(1), 57–71.
Goldfarb, Avi and Catherine Tucker. 2011b. Advertising Bans and the Substitutability of
Online and Offline Advertising. Journal of Marketing Research 48(2), 207–227.
Goldfarb, Avi and Catherine Tucker. 2011c. Search engine advertising: Channel substitution
when pricing ads to context. Management Science 57(3), 458–470.
Goldfarb, Avi and Catherine Tucker. 2011d. Online Advertising. In Advances in Computers
vol. 81, ed. Marvin Zelkowitz. New York: Elsevier.
Goolsbee, A. 2000. In a world without borders: The impact of taxes on internet commerce.
Quarterly Journal of Economics 115 (2), 561–576.
Manne, Geoffrey, and Joshua Wright. 2011. Google and the Limits of Antitrust: The
Case Against the Case Against Google. Harvard Journal of Law and Public Policy 34(1),
171–244.
Nissenbaum, Helen. 2010. Privacy in Context: Technology, policy, and the integrity of social
life. Palo Alto CA: Stanford Law Books.
Ratliff, James D. and Daniel L. Rubinfeld. 2010. Online Advertising: Defining Relevant
Markets. Journal of Competition Law and Economics 6(3), 653–686.
Solove, Daniel J. 2008. Understanding Privacy. Cambridge MA: Harvard University Press.
Zhang, X. and F. Zhu. 2011. Group Size and Incentives to Contribute: A Natural Experiment
at Chinese Wikipedia. American Economic Review 101(4), 1601–1615.
data. The chapter concludes with an analysis of the empirical results and a
discussion of their policy implications.
Background
The two main systems that society currently uses to respond to individu-
als with illegal drug use problems are health-system interventions and
legal-system controls. The health system deals with physical, mental, and
some behavioral aspects of drug use but does not necessarily address crime
and violence. The legal system, which views drug use from the perspective
of criminal justice, focuses on the criminality of drug users and imposes
penalties for illegal activities, including incarceration. Both the medical
and the criminal aspects of drug use, however, are intricately related. The
strong linkage between narcotics addiction and crime has been well docu-
mented (see e.g., reviews by Speckart and Anglin 1986; Powers 1990).
Studies evaluating the effectiveness of treatment, especially methadone
maintenance, consistently show that treatment reduces narcotics use and
related crime among chronic narcotic addicts (Anglin and Hser 1990).
Evidence for the direct effects of legal supervision, while promising, is
more equivocal (Simpson and Friend 1988). Even fewer studies have
investigated the joint effectiveness of criminal justice system interventions
and community drug treatment on drug use and crime, especially over a
long period of time (Collins and Allison 1983). As a result, the relative con-
tributions of methadone maintenance and legal supervision to combatting
drug use and crime remain unclear. Nor is it known how these two types
of intervention should be combined for maximum efficacy. Furthermore,
before policy decisions can be made, it is necessary to determine whether
such interventions continue to have beneficial effects over the long run
for a sufficiently large number of drug-dependent individuals to be cost
effective.
In order to explore these questions, the present study will develop a
multivariate time-series model, using a cointegration and error-correction
approach to understand the long-term and the short-term relationships
among the intervention and behavioral variables (Engle and Granger
1987). Long-term, or “permanent,” relationships refer to how a stochastic
trend in a given variable is related to the stochastic trends of other vari-
ables. Short-term relationships measure how temporary fluctuations from
the means, or trends, of the measured variables are related to each other.
From the literature, it is clear that methadone maintenance and legal
supervision do not typically operate in isolation from each other, and
both are often imposed, either alone or in combination, in response to
Data
Sample
The data for the present analysis were taken from extensive retrospective
longitudinal interviews with 720 heroin addicts who entered methadone
maintenance programs in Southern California in the years 1971–1978.
Detailed descriptions of sample selection and sample characteristics are
available elsewhere (Anglin and McGlothlin 1984; Hser, Anglin and Chou
1988). The original sample consisted of 251 Anglo men, 283 Anglo women,
141 Chicanos, and 45 Chicanas. Because the length of the observation
period had to be sufficiently long for the results of time-series analysis to
be reliable and because it was necessary to retain a sufficient number of
subjects for the results to be generalizable, subjects who did not have at
least 80 months of observation were eliminated, providing 627 subjects
(87 percent of the original sample) for the time-series analysis. To ensure
that the reduced sample was representative of the original group, back-
ground characteristics of both samples were compared and are presented
in Table 26.1. No apparent differences were observed between the two
samples. The selected sample consisted of Anglo (74 percent) and Chicano
(26 percent) chronic narcotic addicts, both men (57 percent) and women
(43 percent). All the following analyses are based on the selected sample.
Variables
Table 26.1 Background characteristics of the original and selected samples

Background Characteristics                  Original Sample (N=720)   Selected Sample (N=627)
                                            N       %                 N       %
Ethnicity
  Chicano                                   186     25.8              163     26.0
  Anglo                                     534     74.2              464     74.0
Gender
  Men                                       392     54.4              357     56.9
  Women                                     328     45.6              270     43.1
Socioeconomic status of family (%)
  Poor                                              7.1                       7.1
  Working class                                     33.4                      33.4
  Middle                                            45.5                      44.9
  Upper-middle                                      13.9                      14.6
Problems in family a                                2.8                       2.8
Gang membership (%)                                 17.7                      18.7
Problems in school (%)                              72.0                      72.0
Mean highest grade completed                        10.9                      10.9
Main occupation (%)
  Skilled                                           19.6                      19.9
  Semi-skilled                                      56.3                      57.6
  Unskilled                                         19.0                      17.5
  Never worked                                      5.1                       4.9
Mean age at b
  First arrest                                      17.4 (671)                17.3 (587)
  Time left home                                    17.7 (706)                17.4 (616)
  First narcotic use (FNU)                          19.5                      19.2
  First daily use (FDU)                             20.8                      20.6
  First legal supervision                           22.4 (549)                22.3 (484)
  First MM entry                                    26.6                      26.9
  Interview                                         31.9                      32.5
Incarcerated >30 days prior to FNU (%)              25.1                      25.6
No. of mos. incarcerated prior to FNU (%)
  None                                              75.0                      74.5
  1–12                                              17.4                      18.0
  13–24                                             5.2                       5.4
  25 or more                                        2.4                       2.1
No. of incarcerations prior to FNU (%) c
  None                                              66.7                      65.6
  1–5                                               28.3                      29.5
  6 or more                                         5.0                       4.9

a Measured by self-reported problematic relationships with parents; a higher value indicates
  more serious problems (range 1–6).
b The values in parentheses are the number of cases for mean computation after exclusion of
  missing cases. When not specified, the entire sample was used.
c Includes incarcerations <30 days.
Methodology
Zt = c + ϕZt-1 + at
[Figure: Percent of time per month, over periods 1–99, that the sample spent in
daily narcotics use, no narcotics use, methadone maintenance, legal supervision,
and property crime.]
[Figure: three-stage modeling strategy]
Stage I (examination of unit roots). Do the variables contain long-term
components? Test: unit-roots test.
Stage II (assessment of long-term equilibrium). If unit roots are present:
are the variables cointegrated? Test: equilibrium regression.
Stage III (assessment of short-term dynamics). If the variables are
cointegrated, use an error correction model (long-term effect: yes;
short-term effect: maybe). If they contain unit roots but are not
cointegrated, use a model in changes (long-term effect: no; short-term
effect: yes). If they are stationary, use a model in levels (long-term
effect: cannot be inferred; short-term effect: yes).
Measuring the long-term effects of public policy 527
(1 – ϕL)Zt = c + at (26.1)
where:
ϕ is the parameter relating the present to the past of Z,
L is the lag operator such that L^k Zt = Zt−k with k being a positive integer,
Zt is a random variable measured at time t with t = 1, 2, . . ., T,
c is a constant, and
at is a white noise random shock at time t, which is assumed to have a
normal distribution with mean 0 and constant variance σ²a.
When |ϕ| < 1 holds for this model, the series {Zt} is said to be
stationary, having finite mean E(Zt) = c/(1−ϕ), and variance Var(Zt) =
σ²a/(1−ϕ²). In this case, all observed fluctuations in {Zt} are temporary
in the sense that the series does not systematically depart from its mean
value, but rather reverts to it. On the other hand, if |ϕ| = 1, the series
is said to be a non-stationary, or evolutionary, series (a random walk,
in this case) whose mean and variance are functions of time t. For this
condition, the observed fluctuations are permanent in the sense that
the series wanders freely without any mean reversion. If |ϕ| > 1, the
series explodes toward + ∞ or – ∞, which is also non-stationary. For
the above model, determining whether the series is stationary or not
is equivalent to testing whether the root of the characteristic equation,
1 – ϕL = 0, is greater than one. When |ϕ| < 1, we conclude that the series
is stationary.
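The distinction can be made concrete by simulating the AR(1) process above. This is a minimal sketch with arbitrary parameter values (c = 0.5, unit-variance shocks), not part of the study’s analysis:

```python
import random

def simulate_ar1(phi, c=0.5, n=5000, seed=1):
    """Simulate Z_t = c + phi * Z_{t-1} + a_t with standard-normal shocks a_t."""
    rng = random.Random(seed)
    z, path = 0.0, []
    for _ in range(n):
        z = c + phi * z + rng.gauss(0.0, 1.0)
        path.append(z)
    return path

stationary = simulate_ar1(phi=0.6)  # |phi| < 1: fluctuations are temporary
walk = simulate_ar1(phi=1.0)        # phi = 1: a random walk, no mean reversion

# For |phi| < 1 the sample mean settles near c / (1 - phi) = 0.5 / 0.4 = 1.25;
# the random walk has no fixed level to revert to, so its mean drifts with t.
print(sum(stationary) / len(stationary))
```

Plotting the two paths shows the stationary series hugging 1.25 while the random walk wanders freely, which is exactly the visual signature of a unit root.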
If MM and DNU are stationary, this implies that no long-term change
in these variables is observed over the observation period. Thus, if MM
has an effect at all on DNU, then the effect must be transitory, or short-
term, since the level of DNU will eventually return to its mean. Under
these conditions, we would argue that methadone treatment has only
temporary effects on narcotics use. On the other hand, if MM and DNU
are non-stationary, then we may investigate whether the observed random
walk, or stochastic trend, in DNU can be explained by the stochastic
trend in MM. For example, can a gradual decrease in DNU be explained
by a gradual increase in MM? A positive answer would imply that there
is a long-term, or equilibrium, relationship between the two. A negative
answer still does not rule out the effectiveness of methadone maintenance,
but it would imply that the treatment produces only temporary deviations
in the level of narcotics use. Finally, it is possible that a mixed scenario
occurs, such as the presence of a stochastic trend in narcotics abuse, but
not in methadone treatment. If the change in narcotics use could be related
to the level of methadone treatment, that would imply an even stronger
narcotics use, would establish that the two time series representing these
variables are related to each other in the long run. In theory, if the equilib-
rium relationship holds between MM and DNU, then they relate to each
other under the linear constraint
where êt−1 is the estimate of the equilibrium error-correction term obtained
from equation (26.6), and w(L) and s(L) are parameter polynomials in L:
The results would reveal the short-term dynamics of the system, but
they would not explain the long-term behavior of the variables. Notice
that equation (26.10) is a restricted form of equation (26.7), where the
error correction term is absent.
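In code, estimating a bivariate error-correction equation of this form amounts to regressing the first difference of one series on the first difference of the other plus the lagged equilibrium error. The sketch below uses simulated data and, for clarity, plugs in the true equilibrium errors; in practice êt−1 would come from the fitted equilibrium regression. It is an illustration of the technique, not the chapter’s estimation code:

```python
import random

def ecm_fit(y, x, e):
    """Least-squares fit of: delta_y_t = w * delta_x_t + s * e_{t-1} + a_t."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    dx = [x[t] - x[t - 1] for t in range(1, len(x))]
    el = e[:-1]  # lagged equilibrium error, aligned with the differences
    # Solve the 2x2 normal equations for (w, s).
    a11 = sum(v * v for v in dx)
    a12 = sum(v * u for v, u in zip(dx, el))
    a22 = sum(u * u for u in el)
    b1 = sum(v * d for v, d in zip(dx, dy))
    b2 = sum(u * d for u, d in zip(el, dy))
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det

# Simulated example: x is a random walk and y = 0.8x + e with white-noise e,
# so y and x are cointegrated and e is the equilibrium error.
rng = random.Random(3)
x = [0.0]
for _ in range(1999):
    x.append(x[-1] + rng.gauss(0, 1))
e = [rng.gauss(0, 1) for _ in x]
y = [0.8 * xi + ei for xi, ei in zip(x, e)]

w, s = ecm_fit(y, x, e)
# w recovers the short-term response (about 0.8); s is negative (about -1):
# deviations from the long-term equilibrium are corrected in later periods.
```

The negative coefficient on the lagged error term is the “error correction”: when the system sits above its equilibrium, the next change pulls it back down, and vice versa.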
Stationary system
Finally, if the data are stationary, we develop a model on the levels of
narcotics use and methadone treatment,
Z̃t = c̃ + Σi=1..J Φ̃i Z̃t−i + ãt (26.12)
where
Z̃t = a (k x 1) random vector observed at time t for t = 1, 2, . . ., T,
c̃ = a (k x 1) vector of constants,
Φ̃i = a (k x k) parameter matrix, and
ãt = a (k x 1) white-noise vector assumed to be i.i.d. N(0, Σ).
The dynamics of the VAR(J) model are specified as follows: the jth
sample partial autoregression matrix P(j) can be obtained as the estimated
last coefficient matrix, Φ̂jj, from a fitted VAR of order j, with j = 1, 2, . . ., J,
Results
The Box-Jenkins modeling approach was applied to each of the five outcome
variables. Dickey-Fuller unit roots tests were carried out to statistically
examine the existence of unit roots in each of the five variables. The resulting
five univariate ARIMA models indicated that a unit root is present in all the
variables, and the outcomes of the Dickey-Fuller tests were consistent with
these results. Because a unit root was present in each of the five outcome
variables, as well as in the control variable AGE, the next step is to test the
long-term relationships among the variables using equilibrium regressions.
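This two-step sequence (unit-root tests, then equilibrium regressions on the levels with a unit-root test on their residuals) can be sketched end to end on simulated data. The helpers below are deliberately bare-bones illustrations — no intercept or lag augmentation in the Dickey-Fuller regression — and the critical value mentioned is approximate, so this is a sketch of the logic rather than a substitute for a statistical package:

```python
import random

def ols(y, x):
    """Simple OLS of y on x with an intercept; returns (intercept, slope, residuals)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]

def df_tstat(e):
    """t-statistic from the Dickey-Fuller regression delta_e_t = rho * e_{t-1} + u_t."""
    lag, diff = e[:-1], [e[t] - e[t - 1] for t in range(1, len(e))]
    rho = sum(l * d for l, d in zip(lag, diff)) / sum(l * l for l in lag)
    resid = [d - rho * l for l, d in zip(lag, diff)]
    s2 = sum(r * r for r in resid) / (len(diff) - 1)
    return rho / (s2 / sum(l * l for l in lag)) ** 0.5

# Simulate two cointegrated series: x is a random walk (a stochastic trend),
# and y follows x's trend up to stationary noise.
rng = random.Random(7)
x = [0.0]
for _ in range(999):
    x.append(x[-1] + rng.gauss(0, 1))
y = [2.0 + 0.8 * xi + rng.gauss(0, 1) for xi in x]

_, slope, e = ols(y, x)   # step 1: the equilibrium (levels) regression
t = df_tstat(e)           # step 2: unit-root test on its residuals
# A t-statistic well below the approximate 5% critical value of about -3.4
# rejects a unit root in the residuals, supporting cointegration.
print(round(slope, 2), t < -3.4)
```

Because both series share one stochastic trend, the levels regression recovers the long-term relationship and its residuals are stationary; run the same test on two independent random walks and the residuals keep their unit root.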
Equilibrium Regressions
Table 26.2 summarizes the results of equilibrium regressions for the five
outcome variables. The unit-root tests performed on the error terms of
these five equilibrium regressions confirmed that all residuals were station-
ary, indicating the presence of long-term associations among the depend-
ent variables. The R2 for each of the five regressions show that significant
amounts of variance, ranging from 88 percent to 97 percent, are explained
by the models. Examining the coefficients of the equilibrium regressions
provides the following results. Long-term movements of narcotics use and
property crime go hand-in-hand. As the crime level rises, abstinence from
narcotics use eventually decreases, and daily use increases. Furthermore,
increased crime is associated with lower methadone maintenance involve-
ment and higher legal supervision. Reciprocally, narcotics use has a positive
long-term association with crime involvement. In terms of social interven-
tion effects, methadone maintenance has a significant long-term association
with no narcotics use and property crime, indicating its desirable effects.
Addict involvement in either methadone maintenance or legal supervision
increases the likelihood of involvement in the other. Finally, contrary to our
expectation, legal supervision shows a positive long-term association with
narcotics abuse and crime involvement; that is, as legal status persists, so do
narcotics use and property crime. Some possible justification and explana-
tion for this last finding will be presented in the discussion section.
Overall, the five outcome variables form a cointegrated system. While
each variable individually may move up or down over time without mean
reversion, there exists a dynamic equilibrium toward which the variables
jointly adjust. Therefore, an error-correction model can be used to
examine the short-term relationships within the system in conjunction with
partial adjustment for the long-term behavior of the variables.
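The two-step logic just described — an equilibrium regression in levels, followed by a stationarity check on its residuals, as in Engle and Granger (1987) — can be sketched as follows (the data are simulated for illustration; note that the residual-based test uses more negative critical values than the ordinary Dickey-Fuller ones):

```python
import numpy as np

def equilibrium_regression(y, x):
    """Step 1: OLS regression in levels; returns the coefficients and
    the residual series (the 'equilibrium error')."""
    Z = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return beta, y - Z @ beta

def df_tstat(u):
    """Step 2: Dickey-Fuller t-statistic on the residuals (no constant,
    since OLS residuals already have mean zero)."""
    du, lag = np.diff(u), u[:-1]
    rho = (lag @ du) / (lag @ lag)
    resid = du - rho * lag
    s2 = resid @ resid / (len(du) - 1)
    return rho / np.sqrt(s2 / (lag @ lag))

# Simulated pair: x is a random walk and y tracks it, so the two series
# are cointegrated even though each has a unit root individually.
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=400))
y = 2.0 + 0.5 * x + rng.normal(size=400)

beta, u = equilibrium_regression(y, x)
resid_stat = df_tstat(u)  # strongly negative: residuals are stationary
```

A strongly negative residual statistic is what licenses the move to an error-correction representation in the next step.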
The procedure advanced by Tiao and Box (1981) was used to estimate
a VAR model augmented with equilibrium error-correction terms. In
order to determine how many lags were needed for developing a model,
the pattern of the partial autoregression matrices was examined. Based
on the Akaike Information Criterion, specifying one lag was found to be
sufficient to represent the short-term dynamics in the system.
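Lag selection by the Akaike Information Criterion, as applied above, amounts to fitting the VAR at several candidate lag lengths and keeping the one that minimizes AIC. A compact sketch (numpy only; the two-variable simulated system and the standard multivariate AIC formula, log|Σ̂| + 2·(number of coefficients)/T, are illustrative, not the chapter's data):

```python
import numpy as np

def var_aic(data, p):
    """Fit a VAR(p) by equation-by-equation OLS and return its AIC."""
    T_full, K = data.shape
    Y = data[p:]  # left-hand side, rows t = p .. T-1
    X = np.column_stack(
        [np.ones(T_full - p)]
        + [data[p - j:T_full - j] for j in range(1, p + 1)]  # lags 1..p
    )
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    Sigma = E.T @ E / len(Y)            # residual covariance matrix
    _, logdet = np.linalg.slogdet(Sigma)
    return logdet + 2.0 * B.size / len(Y)

# Simulated two-variable VAR(1) system for illustration.
rng = np.random.default_rng(2)
y = np.zeros((300, 2))
for t in range(1, 300):
    y[t] = 0.5 * y[t - 1] + rng.normal(size=2)

best_lag = min(range(1, 5), key=lambda p: var_aic(y, p))
```

Comparing AIC across lag orders on a common estimation sample is the more careful convention; the sketch refits on each order's maximal sample for brevity.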
The error-correction equations for the five outcome variables were
estimated simultaneously. The generalized least-squares parameter esti-
mates and the residual correlation matrix are given in Table 26.3. The
error-correction terms in the five equations were all significant at p < 0.05
Table 26.2  Equilibrium regressions

              NNU                DNU                C                  MM                 LS
Const.        14.874 (4.404)*    −20.127 (5.965)    14.930 (1.196)     36.258 (5.560)     −14.383 (3.600)
DNU           –**                –                  0.213 (0.023)a     0.062 (0.109)      0.120 (0.062)c
C             −0.656 (0.178)a    2.236 (0.242)a     –                  −2.412 (0.252)a    0.925 (0.183)a
MM            0.196 (0.072)a     0.055 (0.097)      −0.204 (0.021)a    –                  0.509 (0.029)a
LS            0.057 (0.121)      0.317 (0.164)c     0.232 (0.046)a     1.509 (0.085)a     –
AGE           0.746 (0.091)a     0.075 (0.123)      −0.085 (0.037)b    0.066 (0.131)      0.074 (0.076)
R2***         0.958              0.927              0.972              0.969              0.880
F(4,94)       506.455a           298.034a           810.362a           738.956a           171.750a
t****         −3.763             −4.649             −5.245             −5.56              −5.069
Unit Root?    No                 No                 No                 No                 No

Table 26.3  Error correction models on first differences

Lag 1         ∆NNU               ∆DNU               ∆C                 ∆MM                ∆LS
∆NNU          0.112 (0.095)*     –**                –                  –                  –
∆DNU          –                  0.438 (0.104)a     0.200 (0.052)a     −0.124 (0.085)     0.096 (0.071)
∆C            0.243 (0.138)c     −0.372 (0.246)     −0.151 (0.119)     −0.075 (0.187)     −0.161 (0.146)
∆MM           −0.050 (0.093)     0.144 (0.146)      0.019 (0.069)      0.226 (0.110)b     0.134 (0.087)
∆LS           0.129 (0.109)      −0.018 (0.172)     0.001 (0.076)      0.079 (0.128)      −0.008 (0.101)
∆AGE          0.208 (1.932)      2.630 (3.048)      −0.033 (1.341)     2.635 (2.224)      1.545 (1.783)
∆EQ Error     −0.178 (0.053)a    −0.143 (0.058)b    −0.347 (0.110)a    −0.126 (0.049)b    −0.273 (0.072)a
R2***         0.167              0.126              0.272              0.136              0.212
F(6,90)       2.995a             2.163c             5.606a             2.367b             4.046a

Residual Correlations****
              ∆NNU       ∆DNU       ∆C         ∆MM        ∆LS
∆NNU          1
∆DNU          −0.594     1
∆C            −0.372     0.519      1
∆MM           0.376      −0.471     −0.206     1
∆LS           −0.134     0.087      0.198      −0.109     1
or better. On the other hand, only a few parameter estimates for the short-
term effects were significant (4 out of 25 estimates, one of which was only
marginally significant). These significant estimates reflect the persistence
of narcotics abuse over time and the contribution of narcotics-use behavior
to subsequent crime involvement. However, it should be emphasized that
the observed changes in the five outcome variables were explained mainly
by the error-correction terms, i.e., partial adjustments toward equilibrium.
In the present study, a major focus was the assessment of the dynamic
equilibrium relationship between narcotics use and property crime within
the larger social context. The results demonstrate that, at least at the
group aggregate level, there is an interlocked reciprocal response between
the two behaviors that persists over time. Criminal activity contributes to
long-term narcotics use, while, at the same time, narcotics use increases
long-term property crime. This implies that addicts develop a special life-
style commitment from their long-term involvement in both narcotics use
and criminal activities. When the long-term component is partialed out,
current changes in the crime level are driven by the changes in narcotics
use in the immediately previous period, but not vice versa. The contempo-
raneous relationship (where causal direction cannot be statistically speci-
fied) is strong, as has been shown in most of the previous research.
References
Anglin, M. Douglas and Yih-Ing Hser (1990), “Treatment of Drug Abuse,” Crime and
Justice, 13, 393–460.
Anglin, M. Douglas and William H. McGlothlin (1984), “Outcome of Narcotic Addict
Treatment in California,” Drug Abuse Treatment Evaluation: Strategies, Progress, and
Prospects, National Institute on Drug Abuse Research Monograph, 51, 106–128.
Collins, James J. and Margaret Allison (1983), “Legal Coercion and Retention in Drug
Abuse Treatment,” Psychiatric Services, 34(12), 1145–1149.
Dickey, David A., William R. Bell, and Robert B. Miller (1986), “Unit Roots in Time Series
Models: Tests and Implications,” American Statistician, 40(1), 12–26.
Dickey, David A. and Wayne A. Fuller (1979), “Distribution of the Estimators for
Autoregressive Time Series with a Unit Root,” Journal of the American Statistical
Association, 74, 427–431.
Engle, Robert F. and Clive W. J. Granger (1987), “Co-integration and Error Correction:
Representation, Estimation, and Testing,” Econometrica: Journal of the Econometric
Society, 55, 251–276.
Hser, Yih-Ing, M. Douglas Anglin, and Chih-Ping Chou (1988), “Evaluation of Drug Abuse
Treatment: A Repeated Measures Design Assessing Methadone Maintenance,” Evaluation
Review, 12(5), 547–570.
Powers, Keiko I. (1990), “A Multivariate Time Series Analysis of the Long- and Short-term
Effects of Treatment and Legal Interventions on Narcotics Use and Property Crime.”
Ph.D. diss., University of California at Los Angeles.
Powers, Keiko I., Dominique M. Hanssens, Yih-Ing Hser, and M. Douglas Anglin (1991),
“Measuring the Long-Term Effects of Public Policy: The Case of Narcotics Use and
Property Crime,” Management Science, 37, 627–644.
Powers, Keiko I., Dominique M. Hanssens, Yih-Ing Hser and M. Douglas Anglin (1993),
“Policy Analysis with a Long-Term Time Series Model: Controlling Narcotics Use and
Property Crime,” Mathematical and Computer Modelling, 17(2), 89–107.
Priestley, Maurice Bertram (1981), Spectral Analysis and Time Series. London: Academic
Press.
Simpson, D. Dwayne, and H. Jed Friend (1988), “Legal Status and Long-Term
Outcomes for Addicts in the DARP Follow-up Project,” In C. G. Leukefeld and F. M.
Tims (eds), Compulsory Treatment of Drug Abuse: Research and Clinical Practice, 86,
81–98.
Sims, Christopher A. (1980), “Macroeconomics and Reality,” Econometrica: Journal of the
Econometric Society, 48, 1–48.
Speckart, George and M. Douglas Anglin (1986), “Narcotics Use and Crime: An Overview
of Recent Research Advances,” Contemporary Drug Problems, 13, 741–769.
Stock, James H. (1987), “Asymptotic Properties of Least Squares Estimators of Cointegrating
Vectors,” Econometrica: Journal of the Econometric Society, 55, 1035–1056.
Tiao, George C. and George E. P. Box (1981), “Modeling Multiple Time Series with
Applications,” Journal of the American Statistical Association, 76, 802–816.
Model
model helps predict the pass-through of the subsidy and the net outcome
in the industry.
Consumers
where Pr_i(m) is the probability of choosing the car type m or the outside
good; the term Pr_i(b(m) | m) captures the probability of choosing brand
b, given the choice of car type m; and finally Pr_i(j | b(m)) is the
probability of buying alternative j, given the choice of brand b and type
m. Each probability is given by

$$\mathrm{Pr}_i\,(j \mid b(m)) = \frac{\exp\!\left(\dfrac{V_{ij}}{(1-\sigma_B)(1-\sigma_M)}\right)}{\displaystyle\sum_{j' \in b(m)} \exp\!\left(\dfrac{V_{ij'}}{(1-\sigma_B)(1-\sigma_M)}\right)}, \qquad (27.3)$$

where IV_ib(m) and IV_im are the inclusive values of brand nest b and type m,
equal to

$$IV_{ib(m)} = \ln \sum_{j \in b(m)} \exp\!\left(\frac{V_{ij}}{(1-\sigma_B)(1-\sigma_M)}\right). \qquad (27.6)$$
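Equations (27.3) and (27.6) are easy to compute once utilities are in hand. A small numerical sketch (the utilities and the nesting parameters σ_B and σ_M are made-up illustrative values, not estimates from AB):

```python
import math

def within_nest_probs(V, sigma_B, sigma_M):
    """Eq. (27.3): choice probabilities Pr(j | b(m)) within a brand nest,
    with utilities scaled by (1 - sigma_B) * (1 - sigma_M)."""
    scale = (1.0 - sigma_B) * (1.0 - sigma_M)
    expV = [math.exp(v / scale) for v in V]
    total = sum(expV)
    return [e / total for e in expV]

def inclusive_value(V, sigma_B, sigma_M):
    """Eq. (27.6): inclusive value (log-sum) of a brand nest, which then
    feeds the higher-level brand and type choice probabilities."""
    scale = (1.0 - sigma_B) * (1.0 - sigma_M)
    return math.log(sum(math.exp(v / scale) for v in V))

V = [1.0, 0.4, -0.2]  # utilities of the alternatives in one brand nest
probs = within_nest_probs(V, sigma_B=0.3, sigma_M=0.2)
iv = inclusive_value(V, sigma_B=0.3, sigma_M=0.2)
```

As the nesting parameters approach zero, the scaling disappears and the within-nest probabilities collapse to a plain multinomial logit.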
the total number of vehicles sold for alternative j. The fixed costs incurred
by the manufacturer are denoted by fk.
In equation (27.8), the manufacturer costs cj are unobserved by
researchers – firm costs are typically not part of available data sets – but
can be estimated through first-order profit maximizing conditions of
manufacturers. The fixed costs fk drop out of the estimation and are
assumed to be uncorrelated with the pricing decisions.
Dealers take manufacturer prices as given and choose consumer prices,
obtaining the following profit function:
The terms in parentheses give the unit margin for each car sold: the
difference between consumer price Pj and manufacturer price Wj, plus (or
minus) any additional cash flows dj (such as car service net revenues) that
need to be estimated. AB assume that dj are fixed quantities set based on
industry standards and not strategically decided by each retailer. The term
fd represents the fixed costs of dealer d that drop out of the estimation.
To solve this problem, AB start with the dealer maximization problem
and define the first-order conditions (in vector form) as
To empirically apply their model and test the impact of the public policy,
AB use data on a large number of individual car transactions occurring in
San Diego and its suburbs between 2004 and 2006. For each transaction,
the researchers observe the car make and model, engine size, fuel, and
transmission type, as well as the zip code of dealer and consumer loca-
tions. For prices, they observe retail and wholesale prices for each car,
including any manufacturer rebates. The data are complemented by US
Census demographic data on income and population density at the zip
code level, used to implement consumer heterogeneity. The authors apply
the model to 15,795 transactions, concentrating the analysis on the most
important manufacturers in the area, which are General Motors (with
brands Cadillac, Chevrolet, and General Motors Cars), Ford, Honda,
Hyundai, Chrysler, Toyota, and Volkswagen. The data include 22 differ-
ent dealerships with a total of J = 62 alternatives.
The authors estimate the demand and supply model in two steps: first,
they estimate the demand parameters; second, using these parameter
values, the supply-side first-order conditions and respective parameters
are obtained. Since the demand model is fully identified from the choice
data, AB estimate the demand side without any assumptions on the
behavior of dealers and manufacturers. They use the control function
approach (Pancras and Sudhir, 2007; Petrin and Train, 2010) to control
for endogeneity of xjt and obtain the demand parameters using simulated
maximum likelihood. The “simulation” comes from including consumer
heterogeneity as draws from the distributions of demographic character-
istics to approximate the integrals in the demand model (for more details,
see Berry et al., 1995, and Nevo, 2001). The likelihood function takes the
following form:
where yijt is an indicator variable that takes the value of one for the alter-
native chosen by individual i and zero otherwise. The term q is the vector
Using their model, AB show that the optimal pricing decisions of manu-
facturer and dealers would be to drop prices by $3,000 to $6,000 over two
years in the absence of any public policy, as a response to the financial
crisis and the large negative industry demand shock. The resulting severe
profit reductions and the then tough financial situation of the US car man-
ufacturers would have threatened their survival. The Cash for Clunkers
program was intended to offer some relief to auto companies by funding
a direct decrease in the prices paid by consumers, while keeping the higher
pre-crisis margins.
A subsequent question emerges: given that manufacturers and retailers
know that consumers have $4,500 of additional disposable income to
spend on a new car, do they adjust final prices to account for that subsidy?
The advantage of a structural model as proposed by AB is that it can be
used to measure how much of the subsidy offered to consumers stays in
Conclusion
NOTE
References
Albuquerque, P. and B.J. Bronnenberg (2012), “Measuring the impact of negative demand
shocks on car dealer networks,” Marketing Science, 31 (1), 4–23.
Berry S., J. Levinsohn and A. Pakes (1995), “Automobile Prices in Market Equilibrium,”
Econometrica, 63, 841–890.
Bollinger, B. (2015), “Green Technology Adoption: An Empirical Study of the Southern
California Garment Cleaning Industry,” Working Paper.
Cardell, N.S. (1997), “Variance Components Structures for the extreme value and logis-
tic distributions with applications to models of heterogeneity,” Econometric Theory, 13,
185–213.
Dubé, Jean-Pierre, Guenter J. Hitsch, and Puneet Manchanda (2005), “An Empirical
Model of Optimal Dynamic Product Launch and Exit Under Demand Uncertainty,”
Quantitative Marketing and Economics, 3 (2), 107–144.
Goettler, Ronald L. and Brett R. Gordon (2011), “Does AMD Spur Intel to Innovate
More?” Journal of Political Economy, 119 (6), 1141–1200.
Hanssens, D.M., D. Purohit, R. Staelin, P. Albuquerque, and B.J. Bronnenberg (2012),
“Commentaries and Rejoinder to ‘Measuring the Impact of Negative Demand Shocks
on Car Dealer Networks’ by Paulo Albuquerque and Bart J. Bronnenberg,” Marketing
Science, 31 (1), 24–35.
Misra, Sanjog and Harikesh S. Nair (2011), “A Structural Model of Sales-Force
Compensation Dynamics, Estimation and Field Implementation,” Quantitative Marketing
and Economics, 9 (3), 211–257.
Pakes, A., J. Porter, K. Ho, and J. Ishii (2015), “Moment Inequalities and Their Application,”
Econometrica, January, 315–334.
Pancras, J. and K. Sudhir (2007), “Optimal Marketing Strategies for a Customer Data
Intermediary,” Journal of Marketing Research, 44 (4), 560–578.
Petrin, A. and K. Train (2010), “A Control Function Approach to Endogeneity in Consumer
Choice Models,” Journal of Marketing Research, 47 (1), 3–13.
Richards, T. (2007), “A Nested Logit Model of Strategic Promotion,” Quantitative Marketing
and Economics, 5, 63–91.
Shriver, S. (2015), “Network Effects in Alternative Fuel Adoption: Empirical Analysis of the
Market for Ethanol,” Marketing Science, 34, (1), 78–97.
Sudhir, K. (2001), “Competitive Pricing Behavior in the Auto Market: A Structural
Analysis,” Marketing Science, 20, 42–60.
Tuchman, A. (2015), “Advertising and Demand for Addictive Goods: The Effects of
E-Cigarette Advertising,” Working Paper.
Villas-Boas, S. (2007), “Vertical Relationships between Manufacturers and Retailers:
Inference with Limited Data,” Review of Economic Studies, 74 (2), 625–652.
What is Bias?
Valid surveys require a survey expert to ask the right people the right ques-
tions in the right way. In other words, a survey expert must implement an
appropriate method to accurately measure the construct of interest – all
while sampling from an appropriate population. If a survey fails in any
one of these areas – method, implementation, and population sampled – it
may suffer from one or several biases.
In order to demonstrate that potential biases have been avoided and
to encourage acceptance by courts, the survey expert must take affirma-
tive steps to demonstrate that careful and relevant design and sampling
techniques were used. For purposes of this chapter, we will put potential
biases into three categories: (1) selection biases, (2) information-related
biases, and (3) analytical biases. The first category relates to the popula-
tion studied (i.e., did the expert seek out and ask the right people using
statistically valid sampling techniques?). The second category relates to
which questions are asked, how the questions are asked, and what answers
are offered. The third category relates to how the data are analyzed, such
as implementing criteria for respondent inclusion in the analytical sample,
or the interpretation of open-ended responses. In some cases, if biases are
introduced through the analyses of the results, alternative analyses could
be conducted using the same data. Experts may even recover from errors
resulting from information-related biases – an imperfect question, for
example, may still provide relevant information. On the other hand, it is
nearly impossible to recover from selection-related biases that result in a
failure to identify the right population. A valid survey must study the right
population – otherwise the results are irrelevant.3
These potential biases may exist when any survey is implemented in any
context; however, the incentives present in litigation, as well as the need to
demonstrate rigor in such contexts, make the assessment of bias in court
cases particularly critical. Recent expert reports and court opinions have
will have greater probative value if the expert can document and support
the choice of question, sample, and method, while minimizing the possibil-
ity for or existence of biases that can “tweak” the survey method in his or
her favor.
The survey expert’s decision to use open-ended or closed-ended ques-
tions can have implications in terms of relevance, analysis, and potential
for or perception of bias. Open-ended questions increase analytical
complexity and may make it difficult to group responses effectively, given
the array of words and phrases respondents may use to express the same
concept. Alternatively, closed-ended questions might “push” respondents
into an answer they would not otherwise have given, a concern expressed
by the Seventh Circuit in Hubbard v. Midland Credit Mgmt.21 Qualitative
research to justify closed-ended responses or a two-stage approach (i.e.,
open-ended followed by closed-ended questions) can help to alleviate
concerns of such biases.
When phrasing questions, social science researchers have long empha-
sized the importance of understandable language. The survey expert should
be wary of “unexpected meanings and ambiguities to potential respond-
ents”22 and adopt “a critical attitude toward [their] own questions.”23
Questions should be reviewed for clarity – and should test one concept at a
time. If questions are unclear or attempt to test too many factors at once,
they “may threaten the validity of the survey by systematically distorting
responses if respondents are misled in a particular direction.”24 Examples
of distortion include questions that are framed in a way to prompt a “yes”
or questions that inadvertently “tip off” the respondent to the researcher’s
hypothesis. Results from a classic experiment illustrate this effect. In
this experiment, respondents were presented with three identical product
samples, but were told that the samples were different. Unsurprisingly,
the respondents “acted as a demand bias explanation would assert and
obligingly varied their ratings of three identical samples.”25 In the recent
NetAirus patent litigation, the judge reasoned that the survey evidence was
affected by informational biases (among other issues), and excluded the
survey evidence in part because the expert’s hypothesis was exposed within
the survey instrument.26
An additional way to minimize potential bias is to conduct surveys
and experiments in a manner that is “double-blind,” thus eliminating the
chance that the interviewer could influence the results. Research indicates
that respondents generally want to please those conducting the survey;
therefore, to ensure objectivity, both “the interviewer and the respondent
should be blind to the sponsor of the survey and its purpose.”27 Expert use
of online surveys has reduced the possibility of unobservable interviewer
bias since the interviewer is a computer program.
have associated the plaintiff’s products or brands with the products of the
defendants by reviewing results to questions such as “Who makes or puts
[this at-issue product] out?” and “Why do you say that?” In addition, mis-
spellings, abbreviations, or colloquialisms used by respondents may make
such analyses difficult and subject to biased interpretation.
To avoid introducing researcher bias, open-ended responses can be
carefully analyzed by coders who are blind to the purpose of the study.
Such coding “requires a detailed set of instructions so that decision
standards are clear and responses can be scored consistently and accu-
rately.”33 Often, it may be important to involve two coders in the analysis
to compare results, cross-check response categorization, and ensure
consistency. If relevant, the expert may choose to include in his or her
production materials the instructions and decision standards provided
to the coders. Regardless, the raw open-ended response data and ensuing
analysis should be provided in order to allow for independent review and
confirmation by opposing parties.
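The two-coder cross-check described above is commonly summarized with a raw agreement rate and a chance-corrected statistic such as Cohen's kappa. A minimal sketch (the category labels are hypothetical):

```python
from collections import Counter

def agreement_rate(codes_a, codes_b):
    """Share of responses the two blind coders categorized identically."""
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

def cohens_kappa(codes_a, codes_b):
    """Chance-corrected agreement between two coders: how much better
    the observed agreement is than what matching at random would give."""
    n = len(codes_a)
    p_obs = agreement_rate(codes_a, codes_b)
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_obs - p_chance) / (1.0 - p_chance)

coder_1 = ["brand", "brand", "other", "unsure", "brand"]
coder_2 = ["brand", "other", "other", "unsure", "brand"]
```

Disagreements flagged this way can then be resolved against the written decision standards rather than ad hoc, which is exactly the documentation trail the text recommends producing.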
Biases may also be introduced during the analysis of the data. When
analyzing data, it may be necessary to exclude certain categories of
respondents with appropriate justification. One example would be to
exclude “straight-liners,” or respondents who always select the first option
in multiple-choice answers, because the expert may suspect that these
respondents were not paying sufficient attention to the survey task.34
Experts may also choose to exclude those who take too much or too little
time to answer the survey questions. Generally, the analytical results are
unlikely to be affected by such exclusions. If, on the other hand, the expert
excludes larger categories of respondents, such as consumers of particular
products, or consumers residing in certain regions, the reasons for such
exclusions should be well documented and appropriately justified, and the
effect of such exclusions should be tested and understood.
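Exclusion rules of this kind are straightforward to implement, and coding them explicitly makes them easy to document and justify. A sketch with hypothetical respondent records (the field names and timing thresholds are illustrative, not prescribed):

```python
def keep_respondent(answers, seconds, min_s=120, max_s=1800):
    """Apply two documented exclusion rules:
    1. drop straight-liners who picked the same option throughout;
    2. drop respondents who finished implausibly fast or slow."""
    straight_liner = len(set(answers)) == 1
    bad_timing = not (min_s <= seconds <= max_s)
    return not straight_liner and not bad_timing

respondents = [
    {"id": 1, "answers": [1, 1, 1, 1], "seconds": 400},  # straight-liner
    {"id": 2, "answers": [2, 1, 3, 2], "seconds": 45},   # too fast
    {"id": 3, "answers": [3, 1, 2, 4], "seconds": 600},  # kept
]
kept = [r["id"] for r in respondents
        if keep_respondent(r["answers"], r["seconds"])]
```

Reporting results both with and without the exclusions is a simple way to show, as the text urges, that the conclusions do not hinge on the screening choices.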
Conclusion
NOTES
1. Sentius International LLC v. Microsoft Corporation, 2015 US Dist. LEXIS 8782 (9th
Cir. N.D. Cal. Jan. 23, 2015).
2. Kraft Foods Group Brands LLC v. Cracker Barrel Old Country Store, Inc., 735 F.3d 735
(7th Cir. Ill. 2013).
3. “A survey is inadmissible when the sample is clearly not representative of the universe it
is intended to reflect.” Bank of Utah v. Commercial Security Bank, 369 F.2d 19 (10th Cir.
1966).
4. Similar to Lanham Act matters, under California laws relating to “misleading” repre-
sentations, the standard is the “reasonable” consumer. See, e.g., Committee on Children’s
Television v. General Foods Corp. (1983) 35 Ca1.3d at 212; Chern v. Bank of America
(1976) 15 Cal.3d 866, 876; Colgan v. Leatherman Tool Group, Inc. (2006) 135 Cal.App.4th
663, 680. In Canada, a recent Supreme Court ruling stated that the standard should apply
to a “credulous and inexperienced” consumer. See Richard v. Time Inc. (2012) SCC 8.
5. J.T. McCarthy, McCarthy on Trademarks and Unfair Competition, 4th ed., Thomson
Reuters/West, 2012, pp. 376–379.
6. Ubiquitous internet use and the decline of land telephone line usage within certain
demographic categories have led to the increasing acceptance of internet-based surveys
that sample from online panels. For extensive discussion of the advantages and dis-
advantages of Internet surveys, please see Shari S. Diamond, “Reference Guide on
Survey Research,” in Reference Manual on Scientific Evidence, 3rd ed., The National
Academies Press, 2011, pp. 359–423, at pp. 406–409.
7. Diamond, “Reference Guide on Survey Research,” at pp. 386–387.
8. Competitive Edge v. Staples, 763 F. Supp. 2d 997; 2010 U.S. Dist. LEXIS 29678 (7th
Cir. N.D. Ill. 2010).
9. “A survey is inadmissible when the sample is clearly not representative of the universe
it is intended to reflect.” Bank of Utah v. Commercial Security Bank, 369 F.2d 19 (10th
Cir. 1966).
10. In Re: Front Loading Washing Machine Class Action Litigation, [Daubert hearings
opinion], July 10, 2013. http://www.lieffcabraser.com/Documents/front-loading-opinion-
daubert.pdf.
also demonstrates that external validity can alter how a specific consumer
behavior standard is understood. In so doing, the study demonstrates
how the requirements of the courtroom can actually contribute to the
understanding of consumer behavior.
I begin with the premise that the differing ways that validity criteria are
applied in litigation experiments stem from the differing goals they have
relative to academic consumer research.
Construct Validity
dilution. I would argue that this result is less than surprising. Nike is one
of the world’s best-known brand names. It would be hard to imagine that
the mention of Nikepal would not bring Nike to mind. But this measure
lacks construct validity in a dilution case. The court did not explain how
this association impairs the distinctiveness or harms the reputation of Nike.
More recent research has proposed and implemented a response
latency approach to measuring dilution (Morrin and Jacoby 2000; Pullig,
Simmons, and Netemeyer 2006). Subjects are presented with a series of
paired words and/or phrases on a computer screen. One word is a brand
and the other is a thing that might be associated with that brand. The
subjects’ task is to identify, yes or no, whether the association is appropri-
ate for the brand presented. For example, a study might present the target
brand of study paired with the product category it belongs to embedded
in a longer series of pairs (e.g., Heineken – beer). The proposed measure
of dilution in this case would be the speed and accuracy of the response to
the target brand – category association.
Unlike the association measurement in Nikepal, response latency and
accuracy represent an attempt to operationalize a difference in brand
associations. However, the connection between response latency and
impairing distinctiveness or harming reputation in a marketplace remains
tenuous at best.
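At the analysis stage, a response-latency measure of this kind reduces to comparing latencies between a group exposed to the junior mark and a control group. A sketch with made-up reaction times in milliseconds (a Welch t-statistic is one conventional way to summarize the gap; none of these numbers come from the studies cited):

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def latency_gap(test_ms, control_ms):
    """Difference in mean response latency (test minus control) and a
    Welch t-statistic; slower test-group responses are read as
    suggestive of a weakened brand-category association."""
    gap = mean(test_ms) - mean(control_ms)
    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    se = math.sqrt(var(test_ms) / len(test_ms)
                   + var(control_ms) / len(control_ms))
    return gap, gap / se

test = [930, 1010, 980, 1100, 990]   # hypothetical: exposed to junior mark
control = [820, 860, 900, 870, 840]  # hypothetical: not exposed
gap, t = latency_gap(test, control)
```

Even so, as the text notes, a statistically reliable latency gap does not by itself establish impaired distinctiveness or reputational harm in the marketplace.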
Internal Validity
External Validity
Concluding Remarks
Over the years I have developed the view that good science is good science,
in our journals and in the courtroom. Experiments in both arenas are held
to similar “classes” of standards: construct, internal, and external validity.
The biggest differences in my view are twofold. First, academic consumer
research experiments must be interesting in the sense that they are held to
a standard of advancing the state of human knowledge. Litigation experi-
ments must be interesting in the sense that they must address particular
issues important for the matter at hand. Second, litigation experiments
place a much greater emphasis on external validity for the same reason.
They must address particular issues important for the matter at hand.
I have written elsewhere about the need for greater cooperation between
academia and practice (Steckel and Brody 2001). The courtroom is but
another example. I hope the results of the Overstock experiment persuade
the reader that the external validity required by the courtroom provides
scholars with the opportunity to test and (as in this case, possibly) revise
their theories in that and other contexts. As Lambrecht and Tucker (this
volume) point out, as the world becomes digitally enabled, the ease of
doing this can only increase.
REFERENCES
Introduction
First developed formally in the early 1970s, Conjoint Analysis has been
used widely in marketing science to study and measure preference.
Numerous corporations have used and continue to use Conjoint Analysis
to make business decisions. With its use in the global smartphone litiga-
tion wars, Conjoint Analysis has enjoyed more than its 15 minutes of
fame in recent high-stakes litigation.1 Variants of it are routinely used
in intellectual property disputes, in product liability class actions, and in
consumer protection food-labeling matters.
Conjoint Analysis as a method of proof in the courtroom may have
arisen as a response to more stringent standards for admissibility of expert
opinions related to damages. Courts increasingly demand sophisticated
damages models tied to facts of a case. Take patent infringement, where
the call for market-based evidence is particularly strident. Uniloc v.
Microsoft2 and its progeny have laid to rest such venerable expert career
platforms as the 25 percent “rule of thumb.” This so-called rule, which is
more accurately thought of as a “wink wink, say no more” assumption,
used 25 percent as a “reasonable” percentage applied to the profit margin
of the accused product or service that was implicated by the intellectual
property in question.
Other decisions, notably Cornell v. HP,3 Lucent v. Gateway,4 and Laser
Dynamics v. Quanta5 have catechized unfounded entitlements to the
so-called Entire Market Value (EMV) of a product or service as a basis
for calculating damages. (EMV is the market price of the accused product
or service in question, instead of, say, a component of the device that
“contains” the accused functionality.) The doctrine of apportionment,
enshrined in the nineteenth century Supreme Court Garretson decision,
wherein the royalty base is “apportioned” to a portion of the product or
service, is back in vogue.6 And courts have also tightened standards with
respect to “comparable” license agreements in Georgia Pacific damages
analyses (see, for example, the Federal Circuit’s guidance in ResQNet.com
v. Lansa).7 The ResQNet decision held that “the trial court must carefully
tie proof of damages to the claimed invention’s footprint in the market
place.”8
Courts have also demanded rigor in class actions. The Supreme Court’s
Comcast decision calls for a “rigorous analysis” in the class certification
stage so that plaintiffs’ damages methodologies should be sufficiently
tied to the asserted liability theories.9 Given this increased scrutiny and
concomitant uncertainty with respect to what methods pass Daubert
screens, Conjoint Analysis has emerged as a potential option for litigants
and their experts.10 Like any scientific method, it is subject to misuse and
misinterpretation. With that caution in mind, I discuss some overarching
features of Conjoint Analysis.
are willing to pay for increasing from one level of an attribute to the
next level. WTP is related to but distinct from market price. WTP,
when properly constructed, gives a demand side measure and not
equilibrium market price impact.
● Preference shares: Partworths can also be used in “market simula-
tions” to estimate what share of the population would be willing
to buy a product, based on the attributes and levels specified for
the product. An extension of this concept is a willingness-to-buy
(“WTB”) measure. The WTB measure starts by calculating the
share of the population who would be willing to buy a product and
assesses the decline in that share when a certain feature is removed
from the product. Therefore, the WTB measure can be useful in
addressing what-if questions such as “what percent of the purchas-
ers would cease to buy a product if the product did not incorporate
an infringing feature?”
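Both quantities described above fall directly out of estimated partworths. A sketch under a standard logit market simulator (the partworth values, price coefficient, and product utilities are hypothetical, chosen only to show the arithmetic):

```python
import math

def wtp(partworth_gain, price_coef):
    """Dollar value of moving up one attribute level: the utility gain
    converted to money through the (negative) price coefficient."""
    return partworth_gain / abs(price_coef)

def logit_shares(utilities):
    """Predicted share choosing each alternative in a logit simulation."""
    expu = [math.exp(u) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical utilities: the product with the feature, without it, and
# an outside (no-purchase) option normalized to zero.
u_with, u_without, u_outside = 1.2, 0.4, 0.0
share_with = logit_shares([u_with, u_outside])[0]
share_without = logit_shares([u_without, u_outside])[0]
wtb_drop = share_with - share_without  # WTB: decline in share willing to buy

dollar_value = wtp(partworth_gain=0.8, price_coef=-0.02)
```

The WTB drop answers the what-if question in the text — what fraction of purchasers would cease to buy absent the feature — while WTP prices the feature itself, holding demand-side behavior fixed.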
Planning stage
In this stage, the survey expert decides whether Conjoint Analysis is
likely to be useful for the litigation issue at hand and, if so, estimates
how long it will take to design and implement the survey and analyze
the results. This stage typically takes several days; however, the timeline
may extend to many weeks depending on the complexities of the matter,
the theory of liability, and the products and features at issue. The key
consideration for the expert in this stage is to have enough understand-
ing of the technology at issue and the marketplace to assess whether it
would be possible to design attributes and levels in a meaningful way.
In complex cases, such assessment may require discussions with techni-
cal experts to understand the patented technology and how it impacts
consumer-facing features of the product. It may also require review of
publicly available information to understand how consumers approach
the purchase process in the relevant market and the information they
are exposed to.
Design stage
In this stage, the conjoint expert prepares and vets the “survey question-
naire,” decides which questions to ask, how to word questions, what
information and instructions to provide to the respondents, and so on.
At this stage, the conjoint expert should ensure that the survey does not
suffer from design problems (such as unclear or leading questions) and
document pretest results and outcomes. A second task in the design stage is identifying the features to be included in the survey, finalizing feature descriptions, and choosing attributes and levels. As I discuss below, feature
selection is a theme that repeatedly appears in motions attempting to
exclude Conjoint Analysis results and is carefully addressed by courts in
numerous Daubert orders. Accordingly, this stage requires care and documentation.
and food and beverage labeling.29 My search also showed a recent trend
in product liability class action cases: plaintiffs propose conjoint analysis
and/or a hedonic regression as means to estimate damages during Class
Certification Stage.30 Below, I summarize salient aspects in a selection of
matters where conjoint analysis has been used by litigants.
Apple v. Samsung I
health risks are ranked above every other feature, with the exception of
price.48
The district court determined that the expert’s testimony met the
requirements of Rules 702 and 703 of the Federal Rules of Evidence.49 And
the district court certified the class sought by the plaintiffs.50 However,
the US Court of Appeals overturned this decision and decertified the class.51
Notably, the Court of Appeals ruled that the class cannot be certified
because “[i]ndividualized proof is needed to overcome the possibility that
a member of the purported class purchased Lights for some reason other
than the belief that Lights were a healthier alternative.”52
court granted plaintiffs’ motion for class certification and allowed the
conjoint survey.67 On October 8, 2015, the parties participated in media-
tion and agreed to settle.68
Notes
1. See, for example, Apple Inc. v. Samsung Elecs. Co. Ltd. et al. No. 5:11-cv-01846, (N.D.
Cal. June 30, 2012) and Apple Inc. v. Samsung Elecs. Co. Ltd. No. 12-cv-00630, (N.D.
Cal. Feb. 25, 2014).
2. Uniloc USA, Inc. v. Microsoft Corp., 632 F.3d (Fed. Cir. 2011).
3. Cornell Univ. v. Hewlett-Packard Co., 609 F. Supp. 2d (2009).
4. Lucent Techs., Inc. v. Gateway, Inc., 580 F.3d (Fed. Cir. 2009) (“Lucent v. Gateway”).
5. LaserDynamics, Inc. v. Quanta Computer, Inc., 694 F.3d (Fed. Cir. 2012).
6. For details of these decisions and a general overview of developments in reasonable
royalty damages, see Shankar Iyer, “Patent Damages in the Wake of Uniloc,” Spring 2012, Vol. 23, No. 3, Damages in IP Litigation, ABA Intellectual Property Section. A more recent discussion is Zalin Yang, “Damaging Royalties: An Overview of Reasonable Royalty Damages,” Berkeley Technology Law Journal.
7. ResQNet.com, Inc. v. Lansa, Inc., 594 F.3d (Fed. Cir. 2010).
8. ResQNet.com, Inc. v. Lansa, Inc., 594 F.3d 869 (Fed. Cir. 2010).
9. Comcast Corp. v. Behrend, 133 S. Ct. 1426 (2013).
10. The Daubert standard is a rule of evidence regarding the admissibility of expert testimony.
11. For a general overview of early business applications of Conjoint Analysis, see Green,
Paul E., Abba M. Krieger, and Yoram Wind. “Thirty years of Conjoint Analysis:
Reflections and Prospects,” Interfaces, 31:3 (2001), S56-S73.
12. Orme, Bryan K., Getting Started with Conjoint Analysis: Strategies for Product Design and Pricing Research, Research Publishers, 2010, p. vii.
13. See, e.g., Hauser, John R., Olivier Toubia, Theodoros Evgeniou, Rene Befurt, and
Daria Dzyabura. “Disjunctions of conjunctions, cognitive simplicity, and considera-
tion sets.” Journal of Marketing Research 47, no. 3, 2010, pp. 485-496.
14. See John R. Hauser & Vithala Rao, “Conjoint Analysis, Related Modeling, and
Applications,” in Advances in Marketing Research: Progress and Prospects 141–168
(Jerry Wind & Paul Green eds., 2004).
15. Daubert, 509 U.S. at 594.
16. Apple v. Samsung II, Order granting in part and denying in part motions to exclude
certain expert opinions, February 25, 2014, p. 27.
17. Apple v. Samsung II, Order granting in part and denying in part motions to exclude
certain expert opinions, February 25, 2014, p. 28.
18. Apple v. Samsung II, Order granting in part and denying in part motions to exclude
certain expert opinions, February 25, 2014, p. 29.
19. See, for example, Diamond, Shari S., “Reference Guide on Survey Research,” in
Reference Manual on Scientific Evidence, Third Edition, Federal Judicial Center, 2011.
20. See, for example, Lucent v. Gateway, 1301, 1333-34.
21. Oracle v. Google, “Order Granting in Part and Denying in Part Google’s Daubert Motion
to Exclude Dr. Cockburn’s Third Report,” March 13, 2012, p. 15 (emphasis added).
22. Feature selection is a crucial step in the design of a conjoint survey, which should be done carefully and supported adequately. Failure to do so may be grounds for exclusion of the survey. See Oracle Am., Inc. v. Google, Inc., No. C 10-03561, 2012 WL 850705 (N.D. Cal. Mar. 13, 2012) (“Oracle v. Google”).
23. See Vithala R. Rao, Conjoint Analysis, Springer (2014), Chapters 3 and 4.
24. Orme, Bryan K., Getting Started with Conjoint Analysis: Strategies for Product Design and Pricing Research, Research Publishers, 2010, p. 45.
25. See for example, Hauser, John R., and Vithala R. Rao. “Conjoint analysis, related
modeling, and applications.” In Marketing Research and Modeling: Progress and
Prospects, pp. 141-168. Springer, 2004.
26. Orme, Bryan K. Getting Started with Conjoint Analysis: Strategies for Product Design and Pricing Research, Research Publishers, 2010, p. 84. Chapter 9 has an explanation of WTPs.
27. Diamond, S. S. (2011) “Reference Guide on Survey Research.” Reference Manual on
Scientific Evidence, 3rd edition, (Federal Judicial Center). p. 388.
28. Diamond, S. S. (2011) “Reference Guide on Survey Research.” Reference Manual on
Scientific Evidence, 3rd edition, (Federal Judicial Center). p. 389.
29. The search found 10 and 14 cases in the intellectual property and consumer protection
fields, respectively. The search also found an antitrust case involving conjoint analysis:
U.S. v. H & R Block, Inc. No. 11-00948 (BAH). In this matter, the plaintiffs’ expert
proposed conjoint analysis during the class certification stage.
30. See, for example, In re NJOY, Inc. Consumer Class Action, Scotts EZ Seed Litigation,
and Miller v. Fuhu Inc.
31. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 1.
32. Ryan, Christopher V., Avelyn M. Ross, and Kristen P. Foster. “4 Tips for Using Consumer Surveys In Patent Cases – Law360.” Accessed April 14, 2016. http://www.law360.com/articles/536189/4-tips-for-using-consumer-surveys-in-patent-cases.
33. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 14.
34. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 14.
35. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 15.
36. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 16.
37. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 16.
38. Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No. 5:11-cv-01846, “Order Denying Motion
for Permanent Injunction,” December 17, 2012, p. 1.
39. See Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No.
5:11-cv-01846-LHK, Exhibits D and E.
40. See Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No.
5:11-cv-01846-LHK, Exhibits D and E.
41. Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No. 5:11-cv-
01846-LHK, p. 34. The “rubberband” feature refers to the “scrolling effect that occurs when [a user] reach[es] the end of a webpage” and the screen bounces back. See “Steve Jobs
and The ‘rubber Band’ patent.” Engadget. Accessed April 15, 2016. http://www.engadget.
com/2012/08/07/steve-jobs-and-the-rubber-band-patent.
42. See Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No.
5:11-cv-01846-LHK, Exhibits D and E.
43. Ryan, Christopher V., Avelyn M. Ross, and Kristen P. Foster. “4 Tips for Using Consumer Surveys In Patent Cases – Law360.” Accessed April 14, 2016. http://www.law360.com/articles/536189/4-tips-for-using-consumer-surveys-in-patent-cases; Bishop,
Bryan. “Apple Expert: Smartphone Owners Are Willing to Pay $100 Premium
for Features Samsung Copied.” The Verge, August 10, 2012. http://www.theverge.
com/2012/8/10/3234453/apple-expert-smartphone-owners-100-premium-copied-samsu
ng-trial; Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No. 5:11-cv-01846, “Order
Granting-In-Part and Denying-In-Part Motions to Exclude Expert Testimony,” June
29, 2012, p. 7.
44. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 16.
45. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, pp. 308–309.
46. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, pp. 309–310.
47. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 311.
48. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 311.
49. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 316.
50. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 27.
51. McLaughlin v. Philip Morris USA, Inc. (Philip Morris v. Schwab), 522 F.3d 215 (2nd Cir. 2008), Public Health Law Center. Accessed April 15, 2016. http://publichealthlawcenter.org/resources/mclaughlin-v-philip-morris-usa-inc-philip-morris-v-schwab-522-f3d-215-2nd-cir-2008.
52. McLaughlin v. Philip Morris USA, Inc. (Philip Morris v. Schwab), 522 F.3d 215 (2nd Cir. 2008), Public Health Law Center. Accessed April 15, 2016. http://publichealthlawcenter.org/resources/mclaughlin-v-philip-morris-usa-inc-philip-morris-v-schwab-522-f3d-215-2nd-cir-2008.
53. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 1.
54. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 1.
55. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 4.
56. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 37.
57. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 38.
58. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer
Products Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014,
pp. 40–42.
59. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 41.
60. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer
Products Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014,
pp. 41–42.
61. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer
Products Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014,
p. 44.
62. Verdict Form, In re: Whirlpool Corp. Front-loading Washer Products Liability Litigation
Case, No. 1:08-WP-65000 (MDL 2001), October 30, 2014.
63. Amended Memorandum Opinion and Order, Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc., Civil No. 11-180 (JRT/TNL), March 19, 2015.
64. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification, Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc., Civil No. 11-180 (JRT/TNL), March 31, 2014, p. 9.
65. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification, Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc., Civil No. 11-180 (JRT/TNL), March 31, 2014, p. 9.
66. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification, Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc., Civil No. 11-180 (JRT/TNL), March 31, 2014, p. 9.
67. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification, Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc., Civil No. 11-180 (JRT/TNL), March 31, 2014, p. 27.
68. “Symantec Norton Insurance Class Action Lawsuit Settlement.” Top Class Actions, July
14, 2014. https://topclassactions.com/lawsuit-settlements/closed-settlements/34134-syma
ntec-norton-insurance-class-action-lawsuit.
69. For a discussion of the technical terms in this discussion, the reader is referred to the
chapters by Toubia and Howell, Allenby, Rossi in this volume.
The Apple v. Samsung patent infringement cases are among the most high-
profile cases to use Conjoint Analysis. In Apple v. Samsung I, it was used
to estimate a price premium for various touchscreen features. Apple’s
conjoint expert focused on customers who were known to have purchased
the infringing smartphones (and tablets).
The experimental design included six features plus price: the capabilities
of the touchscreen, size and weight, camera, storage, connectivity, number
of apps, and price. The levels of the first feature were designed to represent
the benefit to the customer of the patents. The other features were used
to “distract” the customer to minimize focus on the touchscreen features
alone. Questions instructed respondents to hold all other features constant. The customer was asked to focus on the smartphone
(tablet) that he/she had bought and to assume that only the indicated
features and price varied. Respondents were drawn from a professional
panel and screened for membership in the relevant population. Survey craft was emphasized: pretesting, layout of the features, video instructions for both the features and
the task, security controls, and tests of respondents’ care in answering the
questions. All estimation was by standard hierarchical Bayes with price
partworths constrained to be monotonic.
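To illustrate how partworths of this kind map into dollar terms, the sketch below scales a feature’s utility against the price partworths to obtain a price premium. All numbers are hypothetical and are not the expert’s actual estimates:

```python
# Hypothetical mean partworths (in utils) from a hierarchical Bayes fit;
# the price partworths decrease monotonically, as in the constrained model.
price_levels = [199, 249, 299]        # tested prices, in dollars
price_partworths = [1.2, 0.5, -0.4]   # utils at each tested price

feature_partworth = 0.8    # utility of having the patented feature
baseline_partworth = 0.0   # utility without it (reference level)

# Utils per dollar, approximated linearly over the tested price range.
utils_per_dollar = (price_partworths[0] - price_partworths[-1]) / (
    price_levels[-1] - price_levels[0])

# Dollar value a respondent implicitly places on the feature.
premium = (feature_partworth - baseline_partworth) / utils_per_dollar
```

With these assumed numbers the implied premium is about $50, but the conversion is only as credible as the linearity assumption over the tested price range.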
Price Premium
The damages calculations were done by another expert. The conjoint analysis was used “as an indicator of demand.” The conjoint expert noted that “I just have market demand and . . . the actual price that you pay
depends upon both the demand and also what Apple and Samsung would
be willing to supply.” The court endorsed this use of conjoint analysis as
relevant to the case (Judge Koh’s decision, June 29, 2012).
Permanent Injunction
After the jury awarded damages, the plaintiff sought to use the con-
joint analysis to justify a permanent injunction against the infringing
smartphones (tablets). Initially, the court judged that conjoint analysis measured demand for features, not products (see Judge Koh’s decision, December 17, 2012), but that decision was remanded to the district court (Court of Appeals for the Federal Circuit (“CAFC”) decision, November 18, 2013). In a follow-up decision, the district court endorsed the conjoint survey for patent evaluation but questioned its use for a permanent injunction, citing the need to account for market price and noting that the expert had put forth the study (appropriately) for market demand only (Judge Koh’s decision, March 6, 2014). However, this decision, too, was remanded (CAFC decision, September 17, 2015).
The use of consumer surveys and conjoint analysis has become increas-
ingly common in complex litigation, especially in intellectual property
disputes. In trademark litigation, for example, consumer surveys are often
used to assess the extent of consumer confusion across similarly marked
products. In patent infringement matters, consumer surveys are used to
assess customer demand for patented features in complex, multi-featured
products, to apportion value between patented and non-patented features,
and to estimate consumer willingness to pay for products that are provided
free in the marketplace. In such applications, survey respondents are often
asked to choose between hypothetical bundles of products (some of which
include the patented technology at issue) that have been assigned reason-
able prices. Statistical methodologies are then applied to estimate how
much a consumer is willing to pay to have the patented feature included
in the product or to estimate the increase in consumer demand associated
with including the patented feature in the product.
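One common methodology is a multinomial logit “share of preference” simulation, sketched below with hypothetical utilities (the +0.6 utils assigned to the patented feature is an assumption for illustration, not an estimate from any case):

```python
import math

def logit_shares(utilities):
    """Multinomial logit choice probabilities ('share of preference')."""
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical total utilities for three competing products; adding the
# patented feature to product 0 is assumed to be worth +0.6 utils.
base = [1.0, 0.8, 0.5]
enhanced = [1.0 + 0.6, 0.8, 0.5]

s0 = logit_shares(base)[0]
s1 = logit_shares(enhanced)[0]
increase = s1 - s0   # predicted increase in product 0's demand share
```

In practice such simulations are run respondent by respondent over the estimated individual-level partworths rather than at a single set of aggregate utilities.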
Although the use of consumer surveys and conjoint analysis has been
less common in antitrust litigation, there is a fairly long history of the use
of these techniques in the antitrust context. Diamond (2011) discusses
a 1985 antitrust case in which the plaintiff used a consumer survey to
identify product characteristics that affected consumer preferences and to
estimate alleged damages.1 Rubinfeld (2008) notes that US government
antitrust authorities have relied on conjoint analysis on numerous occa-
sions, including in reaching a consent decree with ski resort operators
in the 1997 United States v. Vail Resorts matter.2 Walter and Reynolds
(2008) and Hurley (2010) discuss the UK Competition Commission’s use
of customer surveys in defining relevant antitrust markets and assessing
competitive effects in the case of proposed mergers.3
Economists often prefer using market data (based on revealed prefer-
ence) rather than survey data (based on stated preference). However, in
some instances market data relevant to the issue at hand are unavailable.
For example, consumer surveys and conjoint analysis can be valuable in
assessing the possible price effects of a proposed merger before it has been
Once the purpose of the survey has been established and the conjoint
survey instrument has been designed, the next step is to identify the
target population for the survey – i.e., the universe of individuals that
are relevant to answering the question at hand. In the case of estimat-
ing the importance of a particular product characteristic in relation to
overall consumer demand for that product, the relevant target population might be current purchasers of the product, or both current and likely future purchasers. For example, when assessing how consumer demand for infant
formula would be affected by the addition of certain additives, the target
population might be current or expectant parents (or other caretakers of
infants).
Defining the relevant target population is often a point of contention
in surveys used in litigation. In recent patent litigation between Apple
required because the market context is often relevant to addressing the key
issues in the case. For example, under certain assumptions conjoint survey
data can be used to estimate systems of demand equations, to estimate
market-wide responses to changes in price or product characteristics, or
to simulate different “but-for” market outcomes.20 As another example,
simulations can be conducted to test how a change in product attributes
changes the utility of different products and how consumers substitute
between products with different sets of features. Hildebrand (2006) out-
lines a methodology for using conjoint analysis to implement a hypotheti-
cal monopolist test for the purposes of market definition.21
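As an illustration, one input to such a hypothetical monopolist (SSNIP-style) test is how much demand diverts to substitutes after a small but significant price increase. The sketch below uses a simple logit demand system; the price sensitivity, prices, and quality levels are all assumed values for illustration:

```python
import math

# Hypothetical logit demand for three candidate-market products:
# utility = alpha * price + non-price quality (all values assumed).
alpha = -0.02                    # utils per dollar
prices = [250.0, 240.0, 230.0]
quality = [6.0, 5.6, 5.2]

def shares(p):
    utils = [alpha * pi + q for pi, q in zip(p, quality)]
    weights = [math.exp(u) for u in utils]
    total = sum(weights)
    return [w / total for w in weights]

base = shares(prices)
# Raise product 0's price by 5 percent (the SSNIP) and measure how much
# of its demand diverts to the substitutes in the candidate market.
bumped = list(prices)
bumped[0] *= 1.05
new = shares(bumped)
diverted = base[0] - new[0]
```

A full test would combine the diverted share with margins to ask whether the price increase is profitable for a hypothetical monopolist over the candidate market.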
In the next section, we present two case studies that illustrate how con-
joint survey data and models have been used in recent antitrust litigation.
Our first case study on the use of conjoint analysis in antitrust litiga-
tion relates to the US payment card industry and allegations of antitrust
foreclosure related to certain rules adopted by the two largest payment
card associations (or network systems): Visa and MasterCard. Visa and
MasterCard were organized as joint ventures, owned by the numerous
banking institutions that are members of the networks. Member banks of
the MasterCard and Visa networks can function either as card “issuers,”
merchant “acquirers,” or both. An “issuer” member bank issues cards to
cardholders; it serves as the liaison between the network and the individual
cardholder. An “acquirer” member bank acquires the card-paid transac-
tions of a merchant; a particular acquiring bank acts as liaison between the
network and those merchants accepting the network’s payment cards with
whom it has contracted.
Visa and MasterCard imposed certain restrictions on member banks
that wanted to issue Visa and MasterCard credit and debit cards.
Specifically, both MasterCard and Visa had rules that prohibited their
members from issuing American Express or Discover cards. Those rules
(Visa’s by-law 2.10(e) and MasterCard’s Competitive Programs Policy
(“CPP”)) were the focus of a civil lawsuit filed by the US Department of
Justice against Visa and MasterCard (the “DOJ Case”) in October 1998.22
The Court in the DOJ Case ordered the repeal of Bylaw 2.10(e) and the
CPP with respect to third-party issuing, finding that those restrictions
Along with some of the survey design issues noted above, the simulation
model was also a point of some controversy in the litigation. There was
substantial debate among the experts in the case as to whether the simula-
tion models, which were used to estimate relative market shares (for both
card networks and issuing banks), were a sufficiently rich and realistic
characterization of the actual market environment faced by cardholders.
Since the actual market environment involved many more card features
and options than could be incorporated practically into the simulation
models, the models necessarily required abstraction from many real-world
complexities. As such, the perceived validity and interpretation of the
relative market share estimates were colored substantially by the extent
to which the abstractions were deemed to be reasonable, given the market
circumstances at issue.26
Our second case study on the use of conjoint analysis in antitrust litigation
comes from allegations of antitrust liability and damages from exclusive
distribution contracts in the infant formula supplements industry. One of
the more important developments in the infant formula industry in recent
years has been the introduction of DHA and ARA additives produced
from various sources other than breast milk.27 DHA and ARA are types of
fats that are found naturally in breast milk.28 Research suggests that DHA
and ARA from breast milk provide substantial benefits for infant eye and
brain development.29
Infant formula supplemented with DHA and ARA has been substan-
tially more expensive than un-supplemented infant formula. For example,
the US Department of Agriculture estimated that a supplemented Mead
Johnson infant formula was about 9.4 percent more expensive per ounce
in January 2006 than an un-supplemented infant formula. Despite the
higher prices, these additives have been well-received by US consumers
and sales of DHA- and ARA- supplemented infant formula expanded
rapidly following their initial introduction. In 2004, about 65 percent of
US infant formula dollar sales were DHA- and ARA- supplemented and
by 2008 the rate was over 95 percent.30
Martek Biosciences Corporation (“Martek”) is a US producer of food
ingredients from microbial sources (e.g., algae and fungi).31 Martek has
developed and patented fermentable strains of microalgae which produce
oils rich in DHA.32 A similar Martek-patented process was developed for
a fungus that produces an oil containing ARA.33 Substantially all DHA
and ARA supplements used in US infant formula have been produced and
sold by Martek,34 at least in part because Martek has had long-term sole
source (exclusive) supply agreements in place with the large infant formula
manufacturers operating in the United States.35
BNLfood (“BNL”) sells DHA and ARA supplements (derived using
egg phospholipids) for use in infant formula. BNL sells its DHA and
ARA products in Europe, Asia and the United States. BNL’s DHA and
ARA supplements are advertised as “completely natural egg derived
fatty acids for all life stages, i.e., infancy, adulthood & ageing.”36 This
characteristic may resonate with some end-purchasers of infant formula,
who would prefer to avoid bio-engineered DHA and ARA, as third-party
market research has identified strong consumer demand for organic and
natural infant nutrition products and anticipates market growth in this
area.37
In addition to being derived from natural sources, BNL’s fatty acids
were alleged to be more effective than Martek’s oils for some functions.
For example, studies have suggested that ARA and DHA are more
bioavailable when they are delivered in phospholipid form as opposed to
triglyceride form38 and that egg phospholipids are more bio-effective than
triglycerides (algal oil-type).39
In 2011, BNL filed an antitrust lawsuit against Martek related to
Martek’s exclusive contracts with infant formula manufacturers for the
supply of ARA and DHA. According to the Amended Complaint in that
case, the matter involved:
an action against Martek for its efforts to monopolize the manufacture and
sale of DHA and ARA for use in infant formula in the US market. Faced with
an emerging competitive threat from BNLfood, Martek has acted to protect
its monopoly position by extending its exclusive contracts in violation of US
antitrust laws.40
The economic experts in the case analyzed (1) whether Martek’s alleged anticompetitive conduct substantially limited competition in the market(s) in which its products are sold; (2) whether that conduct caused injury to BNL; and (3) the extent of damages to BNL arising from that conduct. One of the
key elements of the antitrust analysis was an estimate of BNL’s long-run
US market share but-for Martek’s exclusive agreements, which allegedly
delayed BNL’s entry into the US market. One basis for estimating this
figure was market data on the shares achieved by a variety of organic or
natural products, which reflect the demand for natural, non-bioengineered
products in the United States.
Another estimate of the long-run market share that BNL’s egg-based
DHA and ARA could be expected to achieve in the but-for world was
derived using an internet-based survey of recent infant formula purchas-
ers in the United States and a series of conjoint exercises that were used
to estimate the relative demand for different additives types at different
price points. The survey was thus designed to measure the demand for
Conclusion
Notes
final on October 4, 2004. US v. Visa USA, Inc., et al., 344 F.3d 229 (2d Cir. 2003), cert.
denied, 543 U.S. 811 (2004).
24. Discover Financial Services Inc. and Discover Bank v. Visa USA Inc., et al, 04-cv-
07844, U.S. District Court, Southern District of New York (Manhattan). American
Express also sued Visa and MasterCard on similar grounds. For further background,
see also http://www.bloomberg.com/apps/news?pid=newsarchive&sid=a7pbgZn.610c
&refer=finance and http://www.nytimes.com/2007/11/08/business/08visa.html?_r=0.
25. Cannibalization was measured as the ratio of the difference in take rate for proprietary
Discover cards (i.e., Discover-branded payment cards that were also issued by Discover
as the issuing bank) from the two scenarios to the overall take rate for non-proprietary
Discover cards (i.e., Discover-branded cards issued by other banks). For example,
suppose the take rate for proprietary Discover cards was 19 percent in the base case,
and in the new scenario the take rate for proprietary Discover cards was 18 percent and
third-party Discover cards take rate was 5 percent. In this case, the cannibalization rate
is 20 percent [(19–18)/5].
26. The litigation between Discover and Visa settled on the eve of trial, so it is unclear how
the competing interpretations of the various experts would have been received by the
jury.
27. Abbott first introduced these additives into its US product lines in 2002, with Mead
Johnson and Nestle following in 2003. V. Oliveira, E. Frazao, and D. Smallwood,
“Rising Infant Formula Costs to the WIC Program: Recent Trends in Rebates and
Wholesale Prices / ERR-93,” USDA, February 2010, p. 9. (Hereinafter “Rising Infant
Formula Costs to the WIC Program”.)
28. See A. Abad-Jorge, “The Role of DHA and ARA in Infant Nutrition and
Neurodevelopmental Outcomes,” Today’s Dietitian, 10 (10), p. 66.
29. See, e.g., R. Uauy, D. R Hoffman, P. Mena, A. Llanos, E. E Birch, “Term infant
studies of DHA and ARA supplementation on neurodevelopment: results of rand-
omized controlled trials,” Journal of Pediatrics, 143 (4), Supplement, October 2003,
pp. 17–25.
30. “Rising Infant Formula Costs to the WIC Program,” p. 9.
31. In December 2010, Martek was acquired for $1.09 billion by Dutch vitamin maker
Royal DSM N.V. R. Sharrow, “Martek to be acquired by Royal DSM for $1.09B,”
Baltimore Business Journal, December 21, 2010, http://www.bizjournals.com/baltimore/
news/2010/12/21/martek-to-be-acquired-by-royal-dsm.html.
32. Martek Biosciences Corporation Form 10-K for the Fiscal Year Ended October 31,
2005, p. 2. See also US Patent No. 5,374,657.
33. Martek Biosciences Corporation Form 10-K for the Fiscal Year Ended October 31,
2005, p. 2.
34. Martek’s DHA oil is the only source of DHA currently used in infant formula in the
United States and represents “nearly 100% of the estimated $4.5 billion US retail
market for infant formula.” See Martek Biosciences Corporation Form 10-K For the
Fiscal Year Ended October 31, 2009, p. 15.
35. See, e.g., http://www.bloomberg.com/apps/news?pid=newsarchive&sid=aSv1xqLUNRQ
A; http://www.prnewswire.com/news-releases/martek-biosciences-announces-extended-
global-sole-source-supply-agreement-with-mead-johnson-96792844.html; and http://
www.prnewswire.com/news-releases/martek-signs-multi-year-worldwide-sole-source-sup
ply-agreement-with-abbott-58472377.html.
36. See http://www.ovolife.eu/.
37. “US Infant Nutrition Market: N5BD-88,” Frost and Sullivan, 2009.
38. See, e.g., F. Thies et al., “Unsaturated fatty acids esterified in 2-acyl lysophosphati-
dylcholine bound to albumin are more efficiently taken up by the young rat brain
than the unesterified form,” Journal of Neurochemistry, 59, pp. 1110–1116 (1992); and
V. Wijendran et al., “Efficacy of dietary arachidonic acid provided as triglyceride or
phospholipid as substrates for brain arachidonic acid accretion in baboon neonates,”
Pediatric Research, 51, pp. 265–272 (2002).
39. See, e.g., V. Wijendran et al., Pediatric Research, 51, pp. 265–272 (2002) and M.
Lagarde et al., Journal of Molecular Neuroscience, 16, pp. 201–204 (2001).
40. Amended Complaint for Injunctive Relief and Damages, BNLFood Investments
Limited SARL v. Martek Biosciences Corp., May 5, 2011.
41. “Rising Infant Formula Costs to the WIC Program,” p. 10.
42. See, e.g., Kenneth Train, Discrete Choice Methods with Simulation, 2nd ed., New York:
Cambridge University Press, 2009, pp. 55–56.
the profits of the patent holder would be higher, and these lost profits are
a measure of damages.
In order to calculate potential or lost profits we need to represent
the demand system and the competitive environment of the firm. In a
real marketplace, companies change their prices in response to different
market situations. The problem with many methods of calculating optimal
prices and the associated demands is that they assume a static marketplace
where the set of competitive products remains fixed when a firm enhances
its product. In reality, as features are introduced into the marketplace, competitors respond by adjusting either their prices or their feature sets. These responses can dampen the advantage a company can achieve from the new product introduction. If a company does not take this competitive reaction into account, static models will overstate the potential benefit of the product introduction.
Competitive reaction can be taken into account using a concept called
market equilibrium. Equilibrium occurs when the market participants do
not have an incentive to change their current offerings. When a market is not in equilibrium, some firm could increase its profits by adjusting the prices or features it offers. Economic theory suggests that markets settle into equilibrium as profit-maximizing firms pursue their respective advantage. If we have an accurate demand system and
information on the marginal costs of the participating firms, we can
mathematically simulate a market’s equilibrium and capture both the first
and second order effects of a feature introduction on profitability.
This chapter will discuss the equilibrium calculations in more detail and how an analyst can carry them out. An important input to these calculations is a high-quality demand model. Discrete choice models are commonly used for this purpose, so we will discuss the specifics of using this technique to build a basic demand model. The chapter will then illustrate the technique with a hypothetical patent setting involving digital cameras. The
conclusion will discuss the broader application of these techniques as well
as the limitations and challenges that these techniques engender.
Equilibrium Calculations
An equilibrium occurs when none of the firms under study have a profit
incentive to change the price or features they offer. Any equilibrium cal-
culation must take into account a firm’s incentives as well as consumer
responses. Many different theories of equilibrium have been developed
using various assumptions about the decision-making process of the firms. Alternative measures have been proposed, but they largely ignore competitive pressures or rely solely on a measure of a customer’s willingness-to-pay for the feature rather than the value that the firm can ultimately capture. Commonly used methods include calculating a customer’s willingness-to-pay (WTP) or using a pseudo-WTP.
Willingness-to-pay is a social welfare concept that does not directly
relate to the value provided to the firm; however, it is a commonly used
metric to measure the value of a feature. In order to calculate WTP we
need to define two products, a feature-poor product, A, and a feature-rich
product, A*. WTP is the welfare surplus that a customer receives when confronted with a choice between the feature-poor product, A, and the feature-rich product, A*. This surplus is measured in terms of money rather than in terms of utility. WTP is defined as the amount of additional
money that would have to be given to the consumer before they would be
indifferent between the feature-poor product with the additional money
and the feature-rich product. This is represented as:
\[ V(p, y + \mathrm{WTP} \,|\, A) = V(p, y \,|\, A^{*}), \]

where \( V(p, y \,|\, A) \) is the indirect utility function of the consumer, defined as

\[ V(p, y \,|\, A) = E\big[\max_{j} U_j \,\big|\, A\big]. \]

Under the logit demand model this yields

\[ \mathrm{WTP} = \frac{\ln \sum_{j=1}^{J} \exp(a_j^{*\prime}\beta - \beta_p p_j)}{\beta_p} - \frac{\ln \sum_{j=1}^{J} \exp(a_j^{\prime}\beta - \beta_p p_j)}{\beta_p}. \]
It should be noted that this value for WTP is slightly different from the WTP measure used in the traditional conjoint literature (Orme 2001). In the traditional conjoint literature, WTP is defined as the additional price a firm can charge such that a consumer is indifferent between choice A and A*; that is, the price premium \( p_{\mathrm{WTP}} \) solves

\[ \frac{\exp(a_k^{\prime}\beta - p\beta_p)}{\sum_{j=1,\, j\neq k}^{J} \exp(a_j^{\prime}\beta - p_j\beta_p) + \exp(a_k^{\prime}\beta - p\beta_p)} = \frac{\exp(a_k^{*\prime}\beta - (p + p_{\mathrm{WTP}})\beta_p)}{\sum_{j=1,\, j\neq k}^{J} \exp(a_j^{\prime}\beta - p_j\beta_p) + \exp(a_k^{*\prime}\beta - (p + p_{\mathrm{WTP}})\beta_p)}. \]
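The economic (log-sum) WTP defined above can be computed directly. A minimal sketch, assuming a logit demand system in which the deterministic utilities \( a_j^{\prime}\beta - \beta_p p_j \) and the price coefficient are hypothetical values chosen only to make the formula concrete:

```python
import numpy as np

def logsum_wtp(v_base, v_rich, beta_p):
    """Difference in expected maximum utility (the log-sum) between the
    feature-rich and feature-poor markets, converted to dollars by the
    price coefficient beta_p."""
    return (np.log(np.exp(v_rich).sum()) - np.log(np.exp(v_base).sum())) / beta_p

beta_p = 0.02                       # price coefficient, utils per dollar (assumed)
v_base = np.array([1.0, 0.5, 0.8])  # market containing the feature-poor product A
v_rich = np.array([1.0, 0.5, 1.3])  # same market with the feature-rich product A*

print(round(logsum_wtp(v_base, v_rich, beta_p), 2))
```

Note that only the focal product's utility changes between the two markets; the log-sum difference nonetheless accounts for substitution from every product in the choice set.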
The random utility model specifies consumer utility for product j as

\[ u_j = \beta^{\prime} a_j - \beta_p p_j + \varepsilon_j. \]
The vector of attributes of the product, aj, includes the feature that
requires valuation. This vector can represent discrete attributes in which
case the values are dummy coded, or continuous attribute quantities. Identifying the model requires fixing a reference utility and specifying the scale factor. The reference utility is usually set by assigning one of the products a utility of zero. This is generally accomplished by dummy coding the attributes and assigning the associated
levels for the reference product a design code of zero. It is also necessary
to account for scale shifts. Typically, this is resolved by setting the scale
parameter for the extreme value distribution to 1.0.
Assuming all consumers have enough money to purchase any alterna-
tive, the random utility model yields the standard multinomial logit
specification commonly used with choice-based conjoint studies:
\[ E(Q_j) = M \, \frac{1}{R} \sum_{r=1}^{R} \Pr(j \,|\, \beta_r) = M \, \frac{1}{R} \sum_{r=1}^{R} \frac{\exp(a_j^{\prime}\beta_r - \beta_p p_j)}{\sum_{k=1}^{J} \exp(a_k^{\prime}\beta_r - \beta_p p_k)} \]
where A represents the design matrix for the market being studied such
that
\[ A = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{pmatrix}. \]
In this specification, a_k is the design for the kth product and j indexes the focal product. Based on the aggregate demand it is possible to consider the firm’s problem.
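Before turning to the firm’s problem, the demand specification above can be sketched numerically. A minimal homogeneous-logit sketch in which the attribute matrix, part-worths, price coefficient, prices, and market size are all hypothetical:

```python
import numpy as np

# Minimal homogeneous-logit demand sketch; all parameter values are assumed.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])            # each row a_k is one product's design
b = np.array([0.6, 0.4])              # attribute part-worths (assumed)
beta_p = 0.02                         # price coefficient (assumed)
prices = np.array([30.0, 25.0, 40.0])
M = 100_000                           # total market size (assumed)

v = A @ b - beta_p * prices           # deterministic utilities a_k'b - beta_p p_k
shares = np.exp(v) / np.exp(v).sum()  # multinomial logit choice probabilities
expected_q = M * shares               # expected unit demand per product
print(shares.round(3))
```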
Equilibrium calculations
\[ \pi(p_j \,|\, p_{-j}) = E[Q_j]\,(p_j - c_j). \]
The marginal cost for firm j is represented by cj, the price for the firm pj ,
the price for the competitors p2j , and the quantity sold Q. Note that the
choice probability component of the profit function depends on the entire set of competing firms’ prices, p, and product offerings, A. This profit
function forms the basis for calculating the best response for a given firm.
A firm’s best response is the price that maximizes this profit:

\[ \max_{p_j} \pi(p_j \,|\, p_{-j}). \]
Since the total market size, M, is simply a scaling factor for the total
profit, it does not affect the final solutions and can be ignored in the cal-
culation of the Nash equilibrium price. If the profit function is a concave
function then we could calculate each firm’s conditional best response
analytically by finding the partial derivative of each firm’s profit function, setting it equal to zero, and solving the resulting system of equations.
There are a few challenges, however, that make this impractical. The first
is that it is not possible to show that the profit function is strictly concave
in a heterogeneous logit setting. The concavity of the function depends
on the parameter values derived from the choice models. In practice this
appears to be primarily driven by the distribution of the price coefficient.
The heterogeneous logit model will often lead to a concave profit func-
tion; however, we have observed cases where the profit function is not
concave. When the profit function is not concave, a common pattern is a reasonable local maximum followed by a local minimum and then an extremely large global maximum. This appears to occur when
there is a large mass of price coefficients close to zero. The implication
here is that there is a small set of very price-insensitive consumers. These
price-insensitive consumers would buy the product regardless of the price,
so a firm would find it most profitable to charge very high prices to just
this small set of consumers. In most settings this is not a practical solution
and we see it as an artifact of the choice exercise.
A second computational challenge stems from the integral in the profit
function of the firm. Recall that:
\[ E(Q_j) = M \int \Pr(j \,|\, \beta) \Pr(\beta \,|\, \theta)\, d\beta \]
The equilibrium first-order conditions stack each firm’s own-price profit derivative into a single function:

\[ h(p) = \begin{pmatrix} \partial \pi_1 / \partial p_1 \\ \partial \pi_2 / \partial p_2 \\ \vdots \\ \partial \pi_J / \partial p_J \end{pmatrix} \]
and the equilibrium price vector, p*, is the zero of the function h (p) .
As previously discussed, the profit functions for firms are often, but not always, concave. Because of this it is necessary to independently verify any computed root of the first-order conditions, and we demonstrate two methods for finding equilibrium prices. The first method involves directly
computing equilibrium prices using the first-order conditions. This opti-
mization involves using a quasi-Newton method to find the roots directly.
The optimization problem is finding the minimum of the norm of h(p):

\[ \min_{p} \, \lVert h(p) \rVert. \]
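A sketch of this direct approach under a simplified homogeneous-logit market with an outside good. All parameter values are hypothetical, and scipy’s general-purpose root finder stands in for the quasi-Newton step described in the text:

```python
import numpy as np
from scipy.optimize import root

a = np.array([1.0, 0.8, 1.2])      # quality intercepts (assumed)
c = np.array([10.0, 10.0, 12.0])   # marginal costs (assumed)
beta_p = 0.05                      # price coefficient (assumed)

def shares(p):
    v = np.exp(a - beta_p * p)
    return v / (1.0 + v.sum())     # outside-good utility normalized to zero

def h(p):
    """Stacked first-order conditions for pi_j = (p_j - c_j) s_j(p):
    dpi_j/dp_j = s_j * (1 - beta_p * (p_j - c_j) * (1 - s_j))."""
    s = shares(p)
    return s * (1.0 - beta_p * (p - c) * (1.0 - s))

res = root(h, x0=c + 25.0)         # solve h(p*) = 0
p_star = res.x
print(p_star.round(2))
```

At the solution, each firm’s markup satisfies the familiar logit condition p_j − c_j = 1 / (β_p (1 − s_j)), which is a useful sanity check on the computed root.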
3. Update the price for the first firm in the price vector.
4. Repeat steps 2–3 for the remaining firms, one at a time.
5. Calculate the difference between the starting price vector in 1 and the
updated price vector from step 4.
6. If the difference between the price vectors is greater than the tolerance, set the price vector from step 4 as the starting price vector and go to step 2.
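The iterative best-response steps above can be sketched as follows, again under a hypothetical homogeneous-logit market with an outside good (all parameter values are assumed, not taken from the chapter’s study):

```python
import numpy as np
from scipy.optimize import minimize_scalar

a = np.array([1.0, 0.8, 1.2])      # quality intercepts (assumed)
c = np.array([10.0, 10.0, 12.0])   # marginal costs (assumed)
beta_p = 0.05                      # price coefficient (assumed)

def share_j(j, p):
    v = np.exp(a - beta_p * p)
    return v[j] / (1.0 + v.sum())

def best_response(j, p):
    """Price for firm j that maximizes its profit, competitors' prices fixed."""
    neg_profit = lambda pj: -(pj - c[j]) * share_j(j, np.r_[p[:j], pj, p[j + 1:]])
    return minimize_scalar(neg_profit, bounds=(c[j], c[j] + 200.0),
                           method="bounded", options={"xatol": 1e-8}).x

p = c + 5.0                                  # step 1: starting price vector
for _ in range(100):
    p_new = p.copy()
    for j in range(len(p)):                  # steps 2-4: update one firm at a time
        p_new[j] = best_response(j, p_new)
    if np.max(np.abs(p_new - p)) < 1e-6:     # steps 5-6: check convergence
        p = p_new
        break
    p = p_new
print(p.round(2))
```

When the iteration stops, every firm’s price is (numerically) a best response to the others, which is exactly the equilibrium condition described earlier.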
task where the experimentally designed concepts are presented to the user.
The respondent then chooses their most preferred option. A follow-up
task is then included which asks the respondent if they would actually pur-
chase the product chosen. If the respondent indicates that yes, they would
purchase the product, the chosen product is recorded as the final choice.
If the respondent would not purchase the product the previous response is
discarded and the no-purchase option is recorded.
While the two methods for eliciting the no-purchase decision should
lead to logically identical results, in practice the two methods often lead
to significantly different choice patterns. Asking a conjoint question as
a dual-response none question generally increases the prevalence of the
“would not purchase” option. It is not uncommon to see the “would not
purchase” share more than double with the use of the dual-response none.
While we won’t speculate on the respondent psychology leading to the
change in none share, we generally feel that the dual-response none meth-
odology leads to a none share that is more in line with the non-purchase
option observed in actual purchase situations. For this reason we recom-
mend that the dual-response none option be used when designing conjoint
studies for equilibrium calculations.
An additional important criterion to consider when fielding a con-
joint study is the experimental design used. A conjoint study should
be considered an experiment and designed to allow for the maximum
discrimination between features. Experimental design is a highly technical
subject (see, for example, Box and Draper 1987, Chapters 4 and 5) and
we will not cover it fully here. Existing experimental-design software
used to create choice-based conjoint studies is readily available and will
create high-quality designs. The combination of features, brands, and prices in a conjoint study is specifically designed to have a high degree of variation. This variation is necessary to increase the power of the study to
discriminate the value of different features and attributes. The problem
is that an unrestricted experimental design will create offerings that are not currently represented in the marketplace and may vary significantly from existing product offerings. This variation is nonetheless a necessary component of the survey technique, and care should be taken to avoid artificially restricting offerings by constraining or modifying the generated experimental design.
can use these empirical distributions to create a sample from the posterior
distribution. The process draws a series of samples from a normal distribution, using each draw of the mean and variance as the distribution parameters. In this way it is possible to draw a sample of arbitrary size from the
posterior distribution. While this method requires recalculating the sample
after the completion of the MCMC chain, the resulting distribution is less
sensitive to outliers and can be easily reweighted to match population
characteristics. We use this method for calculating the posterior distribu-
tion for this example exercise. We sampled 1,000 draws of the mean and
variance parameters and then created a sample of 10,000 draws from each
of those samples, for a total of 1,000 × 10,000 = 10,000,000 draws from
the posterior distribution. These draws can then be used to empirically
sample from the distribution of expected demand. Recall that the formula
for expected demand is:
\[ E(Q_j) = M \, \frac{1}{I} \sum_{i=1}^{I} \Pr(j \,|\, \beta_i) \]
where I indexes the draws from the posterior distribution. From a com-
putational standpoint we need to calculate the relative choice probability
for each draw from the posterior distribution and then scale the average
of those choice probabilities by the total market size. This step forms an
input into the profit maximizing routine.
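A sketch of this demand step, sampling part-worths from an assumed normal heterogeneity distribution, computing each draw’s logit choice probabilities (with a no-purchase option), and averaging. All numbers are illustrative, not the chapter’s estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

I = 10_000                                   # number of posterior draws
mu = np.array([1.0, 0.8, 1.2, 0.05])         # means: 3 intercepts + price coef (assumed)
sd = np.array([0.3, 0.3, 0.3, 0.01])         # standard deviations (assumed)
beta = rng.normal(mu, sd, size=(I, 4))       # draws beta_i from the heterogeneity dist.
beta[:, 3] = np.maximum(beta[:, 3], 1e-4)    # keep price sensitivity positive

prices = np.array([30.0, 25.0, 40.0])
M = 100_000                                  # total market size (assumed)

v = beta[:, :3] - beta[:, [3]] * prices      # utilities per draw and product
pr = np.exp(v) / (1.0 + np.exp(v).sum(axis=1, keepdims=True))  # Pr(j | beta_i)
expected_q = M * pr.mean(axis=0)             # E(Q_j) = M (1/I) sum_i Pr(j | beta_i)
print(expected_q.round(0))
```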
Using the camera study previously described, we applied these tech-
niques to calculate an equilibrium price change for one of the features.
Consider the following scenario. Nikon currently holds a patent on a
unique swivel screen feature for a camera. They are considering develop-
ing the feature for introduction into a new camera model and want to
determine the potential return on the investment they could expect. They
recognize that competitors are likely to respond to the feature introduc-
tion by adjusting their prices to compensate for the new market situation.
Nikon fields the conjoint study described above. Using the equilibrium
calculation they can determine that they are likely to increase profits by
about 42 percent. The full distribution of the expected change in profits for
Nikon is shown in Figure 32.2. This change in profits can be calculated as:

\[ \%\Delta\pi = \frac{\pi_{\mathrm{Nikon}}(p^{*}_{\mathrm{SS}}) - \pi_{\mathrm{Nikon}}(p^{*}_{\mathrm{w/o\,SS}})}{\pi_{\mathrm{Nikon}}(p^{*}_{\mathrm{w/o\,SS}})} \]
This is simply the difference in profit when Nikon implements the swivel
screen and when it does not implement the swivel screen, normalized by
the profit without the investment.
The 95 percent posterior interval (represented by the short vertical bars
in the figure) for the change in profits is quite wide, going from 26 percent
to 58 percent. This reflects the high uncertainty in the result even with a
study including nearly 500 respondents.
Consider an alternative scenario where Sony, instead of responding
strictly by adjusting prices, decides to also implement a swivel screen.
After litigation it is determined that Sony has infringed on Nikon’s patent.
The question then is how much harm has Nikon suffered due to the
infringement by Sony. The answer is clearly that Nikon has been damaged
to the extent that Sony’s infringement hurt the profits that Nikon would
have received in the absence of the infringement. This can be calculated as:
Conjoint analysis has proved attractive not only in patent litigation but
also in consumer fraud cases. In consumer fraud litigation, the plaintiffs
identify some alleged false product representation or an omission of
negative product information such as side effects from the use of a drug
or mildew in a washing machine. The plaintiffs then proceed to argue that
a class action is the proper way of adjudicating the dispute and award-
ing damages. The first step in this process is to seek certification of the
class under federal statutes governing class litigation. In class certification, the plaintiffs must show that damages can be proven on the basis of
arguments that apply in some “predominant” sense over the entire class
and that damages can be calculated in a “uniform manner.” Class action
plaintiffs are attracted to conjoint as a way to isolate the impact of the
alleged fraud on the price of the product. That is, the view is that damages
should be based on the difference between the “value” of the product as
represented and the value as received. The problem becomes one of how to
assess this difference in “value.”
For example, in Saavedra v. Eli Lilly, the plaintiffs sought damages
for what they alleged was Eli Lilly’s failure to disclose a higher incidence
of withdrawal symptoms from discontinuing use of the anti-depressant,
Cymbalta. The “value” as represented could be the market price of the
Cymbalta medicine. The real question is what is the value as received. The
plaintiffs proposed to use conjoint analysis to “value” the product in the
counterfactual world in which the full extent of withdrawal symptoms was
known. In particular, the plaintiffs proposed to undertake a “willingness
to pay” calculation for the purpose of valuation. The court denied the
plaintiffs’ motion for class certification (Case No. 2:12-CV-9366-SVW,
12/18/2014), arguing that this proposed method was inappropriate. The
court stated “the Plaintiffs used the term ‘value’ to mean consumer
utility – a concept distinct from price. . . . It appears that consumer value
is a subjective concept distinct from the fair market value concept com-
monly used.” The court further states that the “Plaintiffs’ theory of injury
is distinct from the typical benefit-of-the-bargain claim because it focuses
only on the demand side of the equation, rather than the intersection of
demand and supply.”
In short, the court in Saavedra v. Eli Lilly endorses our view that we
must use market prices as the basis for damages in litigation and that
WTP is not appropriate. The court even understood the basic point
that a WTP analysis will tend to overstate the damages, stating: “he [the
plaintiffs’ expert] forgets that a rational consumer would surely pay less
than she believes a drug is worth.” The plaintiffs failed to show the court
that conjoint surveys can be used as an input to the process of estimating
a market price for the product as received.
In the US District Court in California, the court recently denied a
motion for certification of a class in a consumer fraud case involving
e-cigarettes, endorsing the same logic we have outlined in our work. In
Ben Z. Halberstam v. NJOY, Inc et al, Judge Margaret Morrow (Case No.
14-CV-00428) stated (citing a decision in Werdebaugh v. Blue Diamond
Growers) that the correct measure of restitution “can be determined by
taking the difference between the market price actually paid by consum-
ers and the true market price that reflects the impact of the unlawful,
unfair, or fraudulent business practices.” Judge Morrow rejected the
Conclusion
The first difficulty is obtaining the marginal cost of the products, and especially the marginal cost of the feature. This information is often closely guarded by companies, but should be a part of any investment decision. The second difficulty is primarily computational. Solving the equilibrium analysis increases the computational
burden and there is an absence of commercial software that implements
the technique. Both of these problems are easily surmounted and should not be seen as a major impediment to the use of equilibrium analysis for
feature valuation.
Feature valuation is an important element of the marketing analytics
tool kit and one of the primary motivations behind the popularity of con-
joint analysis. Our hope is that we have called attention to an important
deficiency in current, consumer-centric, approaches and demonstrated
that equilibrium calculations are feasible.
Notes
1. For details and further discussion, see “Valuation of Patented Product Features”
(2014), Journal of Law and Economics, 57, 629–663 and “Economic Valuation of Product
Features” (2014), Quantitative Marketing and Economics 12, 4, 421–456.
2. This survey was part of a wave that included four other very similar conjoint
cameras. Across all studies, 16,185 invitations were sent and 6,384 individuals responded.
Of those who responded, 2,818 passed screening and of those that passed screening,
2,503 completed the study. The other four studies were not considered in the analysis.
References
Allenby, Greg M., Geraldine Fennell, Joel Huber, Thomas Eagle, Tim Gilbride, Dan
Horsky, Jaehwan Kim, Peter Lenk, Rich Johnson, Elie Ofek, Bryan K. Orme, Thomas
Otter, and Joan Walker (2005), “Adjusting Choice Models to Better Predict Market
Behavior,” Marketing Letters, 16 (3/4), 197–208.
Apple Inc. v. Samsung Electronics Co. Ltd., No. 11-CV-1846 [N.D. Cal.
December 2, 2011].
Barry, Chris, Ronen Arad, and Kristofer Swanson (2013), “2013 Patent Litigation Study,”
PWC Research Report.
Ben Z. Halberstam v. NJoy Inc et al, No. 2:14-CV-00428 [C.D. Cal. August 14, 2015].
Ben-Akiva, Moshe, Daniel McFadden, and Kenneth Train (2015), “Foundations of Stated
Preference Elicitation.” Working Paper, http://eml.berkeley.edu/~train/foundations.pdf
(accessed January 4, 2016).
Jennifer L Saavedra v. Eli Lilly and Company, No. 2:12-CV-09366 [C.D. Cal. December 18,
2014].
Jeruss, S., R. Feldman and J. H. Walker (2012), “The America Invents Act 500: Effects of
Patent Monetization Entities on US Litigation,” Duke Law and Technology Review, 11
(2), 357–388.
Orme, Bryan K. (2001), “Assessing the Monetary Value of Attribute Levels with Conjoint
Analysis: Warnings and Suggestions.” Unpublished manuscript. Sawtooth Software,
Sequim, Wash.
Orme, Bryan K. (2009), Getting Started with Conjoint Analysis: Strategies for Product Design
and Pricing Research. Madison, WI: Research Publishers.
Phonographic Performance Co. of Australia Ltd. (ACN 000 680 704) under section 154(1) of
the Copyright Act 1968(CTH) (2007), available at Federal Court of Australia, http://www.
judgments.fedcourt.gov.au/judgments/Judgments/tribunals/acopyt/2007/2007acopyt0001.
R Core Team (2015), “R: A language and environment for statistical computing.” R
Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/.
Wlömert, Nils and Felix Eggers (2016), “Predicting new service adoption with conjoint
analysis: external validity of BDM-based incentive-aligned and dual-response choice
designs,” Marketing Letters, 27: 195–210.
Allegations
On July 26, 2007, the Citri-Lite Company (“Citri-Lite”) filed suit against
Cott Beverages, Inc. (“Cott”) for breach of contract due to Cott’s alleged
failure to promote a licensed product in a commercially reasonable
manner.2 Citri-Lite, a company that produced and marketed “Slim-Lite,”
a non-carbonated fruit-flavored zero-calorie weight-loss drink, entered
into an exclusive licensing agreement with Cott on September 17, 2003.3
As a consequence of the agreement, Cott agreed to manufacture, produce,
distribute, sell, and market Slim-Lite with the following terms:
1. Cott would pay a royalty rate of $0.50 to Citri-Lite for each case sold;4
2. Cott would maintain the level of marketing support Citri-Lite was
previously providing; and
3. Cott would provide some method to protect Citri-Lite if the product
did not fare as well as expected.5
His analysis included weekly sales data and weekly demo data at Sam’s
Club for the whole period of the agreement.16
First, Dr. Bucklin regressed weekly aggregate Slim-Lite sales for Sam’s
Club locations on the number of demos run in that particular week and
found that demos led to an increase in sales of approximately 12 cases
during the weeks in which demos were performed compared with the
weeks in which the demos were not.17 Second, Dr. Bucklin determined
that Cott spent, on average, approximately $143 per demo, which included
both the cost of the demo and the price of the Slim-Lite cases used for sam-
pling purposes, and that the increase in the contribution margin brought
by the sale of the 12 additional cases in weeks in which the demo occurred
was only $16.20, resulting in a net loss of $126.80 due to each demo in
weeks in which demos were performed compared with weeks in which a
demo was not performed.18 Third, Dr. Bucklin tested whether demos in
prior weeks affected current weekly aggregate Slim-Lite sales for Sam’s
Club locations and found that demos did not have any long-term effect on
sales of Slim-Lite using either a four- or eight-week horizon.19 Dr. Bucklin
also ran an analysis to determine if the effects of the demos were different
from January to August 2004, when the Sam’s Club distribution was
constant, versus September 2004 to April 2005, when the number of Sam’s
Club locations in which Slim-Lite was carried increased significantly. He
found the same effect for these two subsamples as for the full sample.20
Therefore, Dr. Bucklin argued that the demo activity at Sam’s Club only
increased sales in weeks in which the demo activity was performed and not
in subsequent weeks, and that each demo resulted in a net loss in weeks
in which demos were performed.21 Since the demos did not result in short-
term profits, and did not increase long-term sales, Dr. Bucklin concluded
that the demos were not effective marketing tools for Slim-Lite in Sam’s
Club locations and hence that Cott’s decision to reduce and ultimately
cancel demos at Sam’s Club was reasonable.22
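The structure of the expert’s analysis can be sketched with simulated data. The actual Sam’s Club data are not reproduced here; only the $143 cost per demo and the $16.20 contribution margin on 12 incremental cases are taken from the text, and everything else is assumed:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated weekly data standing in for the case's actual sales and demo counts.
weeks = 80
demos = rng.integers(0, 6, size=weeks)            # demos run each week (assumed)
sales = 500.0 + 12.0 * demos + rng.normal(0, 10, size=weeks)  # cases sold (simulated)

# OLS regression of weekly sales on demo count, mirroring the expert's first step.
X = np.column_stack([np.ones(weeks), demos])      # intercept + demo count
coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
est_lift = coef[1]                                # estimated incremental cases per demo

# Profitability test mirroring the expert's logic: demo cost vs. the contribution
# margin earned on the incremental cases.
cost_per_demo = 143.0
margin_per_case = 16.20 / 12.0                    # $16.20 total margin on 12 cases
net = est_lift * margin_per_case - cost_per_demo  # net gain (loss) per demo
print(round(est_lift, 1), round(net, 2))
```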
The agreement specified that Cott spend $0.80 per case sold on market-
ing efforts, and that Cott make commercially reasonable efforts to pro-
mote Slim-lite.23 Dr. Bucklin argued that the three alternative marketing
strategies used by Cott besides demos were reasonable:
Finally, as Cott spent at least $0.80 per case when considering all these
marketing mechanisms, including the price reduction, and because demos
were not commercially reasonable efforts, Dr. Bucklin argued that Cott’s
efforts were appropriate and reasonable from a marketing perspective.25
Plaintiff’s Rebuttal
Case Outcome
After reviewing the experts’ reports, rebuttal reports, and a number of legal
motions submitted by or on behalf of both Citri-Lite and Cott, Judge
Oliver W. Wanger decided that Cott was not liable to Citri-Lite for any
of the claims and argued that Cott had established, with a preponderance
of evidence, the fact that it acted with reasonable justification to protect
its own economic interest.43 More specifically, Judge Wanger concluded
that the licensing agreement between Cott and Citri-Lite did not guaran-
tee Slim-Lite’s commercial or marketing success, and that Cott did spend
a significant amount of “money, time and effort” to market Slim-Lite,
which falls under the umbrella of commercially reasonable marketing
efforts.44 Directly related to the expert reports, Judge Wanger concluded
that the industry marketing expert’s opinions were “old school” and
“not founded in modern marketing science, economics, or quantifiable
approaches” and were thus less persuasive than Dr. Bucklin’s opinions.45
Notes
In this case study, we discuss the use of consumer surveys to evaluate con-
sumer confusion in a trademark infringement case. In trademark cases, a
plaintiff alleging infringement must provide evidence of the infringement,
including the likelihood of consumer confusion. Since trademark owners
are often unable to provide evidence of actual confusion, consumer
surveys can be used to evaluate the likelihood of consumer confusion
over similarity of trademarks or products1 because a survey gauges the
“subjective mental associations and reactions of prospective purchasers.”2
Courts can rely on survey evidence to establish likelihood of confusion
in trademark infringement cases. The admissibility of a survey and the
weight given to survey evidence depend in part on the survey design and
the manner in which the survey was conducted.3 Below we summarize
the role surveys have played in trademark infringement cases and discuss
how consumer surveys were used by both the plaintiffs and defendants in
a trademark infringement case involving artesian bottled water from the
Republic of Fiji.
ship of the goods or services offered under the parties’ marks.5 Critical
aspects of trademark infringement cases are, typically, the degree of
similarity between the marks at issue and whether the parties’ goods and/
or services are sufficiently related that consumers are likely to mistakenly
assume that they come from a common source. Trademark owners who
can successfully prove their case in court can obtain an injunction against
the infringer to prevent further use of the mark. One common approach
for measuring likelihood of consumer confusion is through consumer
surveys.
There are two common types of survey formats for addressing likelihood
of confusion due to similarity of marks or products:
This case study focuses on the Eveready format, which has been
accepted in numerous trademark infringement cases.6 Originally used in
Union Carbide Corp. v. Ever-Ready, Inc.,7 the Eveready format has become
the “gold standard”8 for evaluating likelihood of consumer confusion in
cases with a strong, top-of-mind senior mark (i.e., highly accessible in
memory).9 In a typical Eveready format, a respondent is first shown the
defendant’s mark in context (e.g., picture or product, advertisement, etc.)
and then asked a series of open-ended questions, such as “who makes or
puts out the product(s) shown here,” “why do you say that,” and “what
other product(s) does the company put out.” The last question could be
asked in the form of closed-ended “sponsorship” or “affiliation” ques-
tions, e.g., “do you believe that whoever makes or puts out the product(s)
1) is sponsored by (or affiliated with) another company, 2) is not spon-
sored by (or affiliated with) any other company.”10 There is some variation
in the phrasing of these questions and care must be taken to ensure that the
questions are not leading or suggestive.
The last question allows the researcher to identify whether consumers
believe that the mark they were shown is related to similar brands the com-
pany produces, even if consumers do not know the name of the company
itself. For example, in the original Union Carbide Corp. v. Ever-Ready, Inc.
case, while consumers were not able to directly name Union Carbide as the
company that makes Eveready batteries, they did indicate that the product
came from the same company that made Eveready batteries.11
When designing an Eveready format survey, it is critical that the survey
respondents are drawn from the proper universe (i.e., individuals who
make the ultimate purchase decision), and that the sample of respondents
is representative of the universe it is intended to reflect.12 Surveys may
be conducted by mall-intercept, telephone, or online.13 Regardless of
the format, the survey must include a screener to select the appropriate
respondents who represent typical purchasers of the product in question
(i.e., the product associated with the defendant’s allegedly infringing
mark). In addition to sampling respondents from the proper universe, it
is important to select the appropriate control for the survey. Controls can
be used to eliminate background noise. A strong mark is more likely to
be remembered and associated with more products (even when there is no
similarity of marks) than a mark that is relatively unknown. For example,
if a respondent is shown a soda can with a Pepsi mark, and then asked who
makes or puts out this product or what other brand this product brings to
mind, he or she may guess Coke even though there is no similarity in the
marks. This is due, in part, to the strength of the Coke brand. The control
group “functions as a baseline and provides a measure of the degree to
which respondents are likely to give an answer . . . not as a result of the
[product at issue], but because of other factors, such as the survey’s ques-
tions, the survey’s procedures . . . or some other potential influence on a
respondent’s answer such as preexisting beliefs.”14
In implementing an Eveready survey, it is common to use a test and
control group and compare the levels of confusion between the two
groups. Respondents in the test group are shown a picture or advertise-
ment with the stimulus or defendant’s mark and asked the series of
questions described above. In the test group, confusion is measured by the
proportion of respondents who associated the defendant’s mark with the
plaintiff’s either by naming the plaintiff specifically or naming other prod-
ucts produced by the plaintiff. Respondents in the control group are asked
the same questions as the test group but receive a slightly varied control
mark without the at-issue features. Similarly, confusion is measured by
the proportion of respondents who associate the control mark with the
plaintiff. There may be some baseline level of confusion in both groups
due to other factors, such as brand awareness or brand association, and
thus the difference in confusion levels between the test and control group
is a measure of consumer confusion due to the defendant’s mark.
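With hypothetical survey counts, the net-confusion calculation (and a simple two-proportion check that the difference is not noise) looks like this:

```python
from math import sqrt

# Hypothetical counts: respondents who associated the mark with the plaintiff.
test_confused, test_n = 78, 300          # test group shown the defendant's mark
ctrl_confused, ctrl_n = 24, 300          # control group shown the varied mark

p_test = test_confused / test_n
p_ctrl = ctrl_confused / ctrl_n
net_confusion = p_test - p_ctrl          # confusion attributable to the mark

# Two-proportion z-statistic as a rough significance check.
p_pool = (test_confused + ctrl_confused) / (test_n + ctrl_n)
se = sqrt(p_pool * (1 - p_pool) * (1 / test_n + 1 / ctrl_n))
z = net_confusion / se
print(round(net_confusion, 3), round(z, 2))
```

Here the control group’s 8 percent would be the baseline (noise) level, so the 18-point difference, not the raw 26 percent test-group figure, is the measure of confusion due to the mark.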
If the difference is positive, then consumers are more likely to confuse
the defendant’s mark with the plaintiff’s products or services than they are
with a slightly varied control mark, indicating the defendant’s mark is a
likely source of consumer confusion.
On October 6, 2009, FIJI filed suit against VITI for federal law claims of
trademark infringement, trade dress infringement, trademark dilution,
and California statutory and common law claims of unfair competition.15
FIJI subsequently filed a motion for a preliminary injunction against
VITI. Plaintiffs alleged that Defendant’s VITI bottled water was confused with
FIJI bottled water due to similarities between the trademarks and trade
dress of FIJI’s bottled water products and the packaging and labeling of
VITI’s bottled water products.16 Figure 34.1 depicts the packaging for
selected FIJI and VITI bottles.
In order for FIJI to succeed on the merits of its trade dress infringe-
ment claim, FIJI must show that the VITI bottle design will likely cause
consumers to believe that VITI is produced by or affiliated with FIJI.
Confusion can be established using guidelines commonly known as the
Sleekcraft factors. The eight factors are: (1) the similarity of the mark(s)
or trade dress; (2) the strength of the mark(s) or trade dress; (3) evidence
of actual confusion; (4) the proximity or relatedness of the goods; (5) the
degree to which the marketing channels used for the goods converge; (6)
the type of goods and the degree of care likely to be exercised by the pur-
chasers; (7) the defendant’s intent in selecting the mark or trade dress; and
(8) the likelihood of expansion of the product lines.17 In this case, the court
relied on evidence from a consumer survey for (2) the strength of the trade
dress, and (3) evidence of actual confusion.18
FIJI retained Dr. Hal Poret, a survey expert, to measure the likelihood
of consumer confusion between the VITI label and packaging and the
FIJI trade dress. Dr. Poret’s assignment was to determine whether or
not consumers who viewed the VITI label and packaging confused the
bottled water with FIJI bottled water, or any other related products produced by Plaintiffs.19 Dr. Poret, the FIJI expert, addressed this question
After adjusting for confusion found in the control group, the FIJI expert
found a 24.3 percent confusion level as a result of the defendant’s label and
packaging.23 That is, after controlling for the baseline level of confusion,
24.3 percent of respondents answered that the VITI bottled water was
made by FIJI or the same company that makes FIJI bottled water. Thus,
the FIJI expert concluded that “there is a high likelihood that VITI bottled
water will be confused with FIJI.”24
In the test group of the FIJI expert’s survey, respondents were shown the
defendant’s label and packaging of the VITI bottled water, an image of
which is provided in Figure 34.2.
The respondents were then asked the Eveready format questions such
as: “what company or brand puts out the product I just showed you?”
“does the company that puts out the product I just showed you put out
any other product or products that you know of?” “what other product or
products?”25 Respondents who mentioned the FIJI brand or company in
response to any of the questions were coded as confused.
Figure 34.2 Test stimulus used in Dr. Poret’s (FIJI expert) survey
In the control group of the FIJI expert’s survey, respondents were shown
a similar product, but without the at-issue features of the label and pack-
aging of VITI bottled water. Specifically, the control bottle excluded the
square shape of the bottle, the blue bottle cap, the three-dimensional effect
of two transparent labels and the stylization of the VITI logo, among
others. The control bottle retained other, not-at-issue features, such as
the information that the product was mineral water from the Fiji Islands
and a label that conveyed images of water and tropical islands, among
others.26 Figure 34.3 depicts the image of the bottle packaging utilized in
the control survey.
The respondents in the control group were asked the same questions
as the respondents in the test group. Respondents who mentioned the
Figure 34.3 Control stimulus used in Dr. Poret’s (FIJI expert) survey
Level of confusion
Test group 32.1%
Control group 7.8%
Difference in confusion level 24.3%
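The table above reports only percentages. Assuming, hypothetically, equal group sizes of around 200 respondents each, a standard two-proportion z-test shows how one would check whether such a gap could plausibly arise by chance:

```python
import math

def two_proportion_z(p1, n1, p2, n2):
    """z statistic for the difference of two independent proportions,
    pooled under the null hypothesis of no difference."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Percentages from the survey; group sizes of 200 are an assumption.
z = two_proportion_z(0.321, 200, 0.078, 200)
print(round(z, 2))  # comfortably above 1.96, i.e. significant at the 5% level
```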
Based on his survey and analysis, the VITI expert concluded that only 8
percent of respondents confused the VITI label and packaging with the
FIJI brand.32
The Plaintiffs described key differences between the FIJI expert’s and the
VITI expert’s surveys as flaws that undermined the VITI expert’s conclusion.
Those flaws, and the corresponding biases, included the following:
you think this is?”40 In deciding whether or not a respondent confused the
VITI label and packaging with the FIJI trade dress, the VITI expert did not
consider 10 responses that misspelled “Fiji” as “Figi,” “Fugi,” or “Fuji.”41
Plaintiffs found that, in total, the VITI expert failed to account for 79
respondents who named FIJI in response to any of his three main questions.
Using the VITI expert’s data, Plaintiffs found that, with the most conservative approach, at least 18–20 percent of respondents (upwards of 28 percent)
confused the VITI label and packaging as being associated with the FIJI
brand or company.45 Due to these biases, among others, Plaintiffs argued that
the VITI expert’s survey artificially deflated the confusion level findings.46
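The coding dispute above turns partly on misspelled brand mentions (“Figi,” “Fuji”). Approximate string matching can catch such responses; this sketch uses Python’s standard difflib, and the 0.75 similarity cutoff is an illustrative assumption, not a standard from the case.

```python
import difflib

def mentions_brand(response, brand="fiji", cutoff=0.75):
    """True if any word in a free-text answer approximately matches the
    brand name; the 0.75 cutoff is an illustrative assumption."""
    return any(
        difflib.SequenceMatcher(None, word, brand).ratio() >= cutoff
        for word in response.lower().split()
    )

for answer in ["FIJI water", "I think Figi makes it", "Coke", "Fuji maybe"]:
    print(answer, "->", mentions_brand(answer))
```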
Case Outcome
Notes
1. Bird, C. R., and Steckel, J. H., “The Role of Consumer Surveys in Trademark
Infringement: Empirical Evidence from the Federal Courts,” U. Pa. J. Bus. L. 14 (2011):
1016.
2. McCarthy, T. J., McCarthy on Trademarks and Unfair Competition, § 32:158 at 32-189
(4th ed. 2003).
3. Thornburg, R. H. Trademark Surveys: Development of Computer-Based Survey
Methods, 4 J. Marshall Rev. Intell. Prop. L. 91 (2005), 93.
4. http://www.uspto.gov/page/about-trademark-infringement, accessed August 15, 2015.
5. http://www.uspto.gov/page/about-trademark-infringement, accessed August 15, 2015.
6. In contrast to the Eveready format, the Squirt format is commonly used in cases in
which the mark is not as well-known (i.e., not easily accessible in memory) and must be
included in the survey design as part of a line-up of brands. See Squirtco v. Seven-Up
Co., 628 F. 2d 1086, 1089 n.4, 1091 (8th Cir. 1980).
7. Union Carbide Corp. v. Ever-Ready, Inc., 531 F. 2d 366 (7th Cir. 1976).
8. Swann, J. B., Brewster, W. H., Mayberry, J. D., and Henn, Jr., R. C., “Likelihood
of Confusion Surveys,” Intellectual Property Desk Reference: Patents, Trademarks,
Copyrights and Related Topics, 171.
9. Swann, Brewster, Mayberry, and Henn, “Likelihood of Confusion Surveys,” 171.
10. Swann, Brewster, Mayberry, and Henn, “Likelihood of Confusion Surveys,” 171–182.
11. Union Carbide Corp. v. Ever-Ready, Inc., 531 F. 2d 381 (7th Cir. 1976).
12. See, e.g., 1-800 Contacts, Inc. v. WhenU.com, 309 F. Supp. 2d 467, 499 (S.D.N.Y. 2003).
13. Regardless of the format, it is standard procedure to validate that a survey was actu-
ally completed by a real respondent, the demographic information provided by the
respondent was accurate, and the respondent who completed the survey actually met
the screening criteria. See, e.g., Lavrakas, P. J., “Telephone Surveys,” in Handbook
of Survey Research, eds. P. V. Marsden and J. D. Wright, 2nd edition, Bingley, UK:
Emerald Group Publishing, 2010, at 493.
14. Novartis Consumer Health Inc. v. Johnson & Johnson Merck Consumer Pharms. Co., 129
F. Supp. 2d 351, 365 n.10 (D.N.J. 2000).
15. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., 741 F. Supp. 2d
1165, United States District Court, Central District of California, Southern Division,
Case 8:09-cv-01148-CJC-MLG, filed September 30, 2010, at 1.
16. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., at 1.
17. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., 741 F. Supp. 2d
1165, at 7.
18. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., 741 F. Supp. 2d
1165, at 8–9.
19. Expert Report of Hal Poret, at 35–36.
20. Expert Report of Hal Poret, at 39, 48, 74.
21. Expert Report of Hal Poret, at 39.
22. Expert Report of Hal Poret, at 39–42.
23. Expert Report of Hal Poret, at 46.
24. Expert Report of Hal Poret, at 46.
25. Expert Report of Hal Poret, at 40.
26. Expert Report of Hal Poret, at 41–44.
27. Expert Report of Hal Poret, at 45.
28. Expert Report of Hal Poret, at 54.
29. Industry practice is to validate 15–20% of respondents. Expert Report of Hal Poret, at
54.
30. Rebuttal Report of Hal Poret Regarding Gentry Bottled Water Survey, Fiji Water
Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., United States District
Court, Central District of California, Southern Division, Case 8:09-cv-01148-CJC-
MLG, filed August 2, 2010, (“Rebuttal Report of Hal Poret”), at 5.
“All Natural” claims have increasingly become the subject matter of law-
suits in recent years.3 These cases may involve allegations that the adver-
tising of various manufacturers is deceptive and misleading. For example,
the complaints may interpret “All Natural” labels very narrowly and claim
Dr. Kent Van Liere, a survey expert on behalf of Ben & Jerry’s, filed
an expert report on October 2, 2013, employing a survey on consumers’
knowledge and beliefs about Ben & Jerry’s ice cream products.7 The goal
of such a report and its included analyses is to assist the judge in his or
her evaluation of consumers’ interpretation of and reliance on the “All
Natural” claims. An expert report is somewhat comparable to an academic
article in the sense that it explains the methodologies and data sources
and sets forth the results of the survey in a manner that is transparent and
reproducible.
As with any academic research paper, Dr. Van Liere was given a
Ben & Jerry’s Survey Expert: Survey Design – the Diamond Principles
Dr. Van Liere’s survey was conducted according to a set of principles that
were established by Professor Shari Diamond, a renowned professor of
law and a social psychologist. Dr. Van Liere adhered to five principles set
forth by Professor Diamond pertaining to (1) the relevant survey popula-
tion; (2) procedures for sampling from the relevant population; (3) ques-
tion design and interviewing procedures; (4) the nature of the specific test
and control stimuli shown to survey participants; and (5) the protocol
for calculating the results from the survey. These principles are described
below.
Jerry’s Cherry Garcia ice cream carton, yet with an altered label
containing the phrase “Vermont’s Finest” and without the claim “All
Natural” on the label.
Except for the altered label (i.e., “Vermont’s Finest” used in place
of “All Natural”) the test and control stimuli were the same. In his
expert report, Dr. Van Liere justifies the choice of this survey design
as a method to counteract potential “background noise” that may be
caused by elements of the test condition that do not constitute alleg-
edly deceptive content. While this is a true and important statement,
experimental setups such as the one used by Dr. Van Liere can also
Figure 35.1 Test and control stimuli used in the Van Liere survey (with
added illustrations)
Plaintiffs took issue with Dr. Van Liere’s survey and critiqued it on a
number of dimensions. One expert, Dr. Stephen A. Schneider, attacked
the design of the survey, while another expert, Dr. Elizabeth Howlett, pro-
vided other conceptual critiques. Both attempted to undermine Dr. Van
Liere’s credibility and the reliability of the results of his survey. Critique
points addressing the survey craftsmanship included, among others, an
alleged failure to measure the impact of the general “All Natural” claim on
consumer purchasing decisions; targeting of the wrong population; assess-
ment of the wrong time period because Dr. Van Liere included consumers
who purchased Ben & Jerry’s in the past 10 years; lack of representative-
ness of the sample because Dr. Van Liere sampled consumers across the
country, rather than focusing only on the California customers at issue;
and insufficient sample size. Critique points addressing the survey on a
conceptual level included a failure to control for the effects of marketing,
news coverage, FDA warnings, popular culture, advertising, and other
sources of information that influenced customers to believe that Ben &
Jerry’s is “All Natural,” above and beyond being exposed to the packag-
ing in a single showing in a mall; a lack of testing of whether some custom-
ers had already come to believe that all Ben & Jerry’s ice cream products
are “All Natural” and all ingredients and processes are natural; and the
lack of control for the fact that some consumers were exposed to adver-
tisements or warnings that Ben & Jerry’s was not “All Natural” while
others were exposed to messages that Ben & Jerry’s was “All Natural.”
In a response declaration, Dr. Van Liere addressed these critiques one by
one and insisted that his survey was ultimately reliable; he also argued that
neither of the opposing experts was qualified to evaluate his survey based
on their educational background.
Case Outcome
Notes
1. Vithala R. Rao and Joel H. Steckel (1998), “Analysis for Strategic Marketing,”
pp. 23–75.
2. William M. Pride and O. C. Ferrell (2010), “Marketing,” p. 130.
3. For example, an article in the Wall Street Journal pointed out that at least 100 “All
Natural” lawsuits were filed in the years 2012 to 2013. See Mike Esterl, “Some Food
Companies Ditch ‘Natural’ Label,” Wall Street Journal, November 6, 2013.
4. Specifically, Judge Walter stated in his order to dismiss an “All Natural” law suit against
Nestle: “For example, Plaintiff offers the Webster’s Dictionary definition of ‘natural,’
meaning ‘produced or existing in nature’ and ‘not artificial or manufactured.’ [. . .]
However, even Plaintiff admits that this definition clearly does not apply to the Buitoni
Pastas because they are a product manufactured in mass [. . .], and the reasonable con-
sumer is aware that Buitoni Pastas are not ‘springing fully-formed from Ravioli trees
and Tortellini bushes.’” Order Granting Defendants’ Motion to Dismiss First Amended
Complaint (filed 9/23/13; Docket No. 30), Case No. CV 13-5213-JFW (AJWx), October
25, 2013.
5. Judge Dimitrouleas stated in his order to dismiss a Defendant’s motion in an “All
Natural” law suit against Bodacious Food Company: “Specifically, Plaintiff alleges
that ‘the Products contain synthetic, artificial, and/or genetically modified ingredients,
including, but not limited to, Sugar, Canola Oil, Dextrose, Corn Starch, and Citric
Acid.’ [. . .] The Court finds no basis to disregard those allegations, which identify the
specific compounds that are purportedly not ‘natural.’” Order Denying Defendant’s
Motion to Dismiss, Case No. 14-80627-CIV-DIMITROULEAS, September 14, 2014.
6. A “class action” lawsuit refers to lawsuits brought by one or more plaintiffs on behalf
of many similarly situated individuals who are able to present evidence that “the ques-
tions of law or fact common to the members of the class predominate over any ques-
tions affecting only individual members, and that a class action is superior to other
available methods for the fair and efficient adjudication of the controversy” (“The Use
of Econometrics in Class Certification,” American Bar Association, Econometrics:
Legal, Practical, and Technical Issues, ABA Publishing, 2005, pp. 179–224, p. 180). For
example, if it were discovered that manufacturers were conspiring to raise prices on a
common consumer good, it is possible that the consumers of this good will have been
affected in a similar way. It may be more efficient for a single lawsuit representing these
similarly-affected consumers to be filed, rather than thousands of separate lawsuits, one
for each consumer, for both the court and the individuals.
7. Expert Report of Dr. Kent Van Liere, Skye Astiana on behalf of herself and all others
similarly situated v. Ben & Jerry’s Homemade, Inc., Case No. 4:10-cv-04387-PJH,
October 2, 2013 (hereafter “Van Liere Report”).
8. Van Liere Report, p. 6.
9. Surveys and interviews screen respondents by asking them a series of background ques-
tions before presenting them with the full questionnaire. These background questions
ensure that only individuals in the target population are surveyed. Dr. Van Liere asked
screener questions to identify American consumers who were over 18 and who had
purchased Ben & Jerry’s ice cream or frozen yogurt in the past ten years. Dr. Van Liere
excluded consumers who worked in a market research company, a store in the mall, or
a dairy product manufacturer, as well as consumers who had participated in a market
research study in the past three months. See Van Liere Report, p. 9 and Exhibit C.
10. Surveys include filter questions to reduce the likelihood of respondents guessing at
answers if they do not have an opinion on the topic of interest. Filter questions can
include a “no opinion” option in the answer or they can explicitly ask respondents
whether or not they have an opinion on the topic before asking follow-up questions.
See Shari S. Diamond, “Reference Guide on Survey Research,” in Reference Manual
on Scientific Evidence, 359–423, Federal Judicial Center/National Academy of Sciences,
2011.
11. Respondents were also asked a follow-up question: “What makes you say that?”
12. Open-ended questions allow respondents to answer using their own words, while close-
ended questions require respondents to select one (or more) answer option(s) provided
to them in the survey.
13. To avoid respondent bias, some surveys include distractors, which are questions unre-
lated to the central research topic, in order to obscure the true purpose of the survey.
For example, Dr. Van Liere’s primary topic of interest is the alkali content of the Ben
& Jerry’s product, but his survey includes questions on the use of pasteurized milk and
the carrageenan content of the product to avoid placing explicit emphasis on the topic
of interest, which could bias responses.
14. All of the questions described here were followed up with an open-ended question of
“why do you say that?”
In civil law, the United States’ Federal Rules of Civil Procedure (“FRCP”)
require that parties to a lawsuit provide documents to the opposing side
that are relevant to the matter at hand, as long as they are not excluded by
attorney–client privilege or similar restrictions and as long as the request
is not overly burdensome.2 Since the adoption of the FRCP in 1938, the
scope and volume of discovery have steadily increased in lockstep with
technological innovations such as the office copier.3 Amendments to the
FRCP in 2006 made it clear that all “electronically stored information”
is within the scope of civil discovery. As one recent review of discovery
rules notes, “[d]iscoverable information is now found not only on desktop
computers and network servers, but on PDAs, smart cards, cell phones,
thumb drives and backup tapes, as well as in bookmarked files, temporary
files, activity logs, Facebook accounts, and text messages, to name just
a few examples.”4 As of 2013, a single lawsuit could involve more than
100 million pages of discovery documents, requiring over 20 terabytes of
storage.5 By one estimate, discovery costs represent 35–50 percent of the
cost of litigation.6
The need to deal with this deluge of information has created an industry
around electronic discovery or e-discovery through which “electronic
data is sought, located, secured and searched with the intent of using it
as evidence in a civil or criminal legal case.”7 A large group of software
and consulting firms with different business models and different levels
of technological sophistication assist companies and law firms in the
e-discovery process.8
Machine learning plays an increasingly important role in one aspect
of e-discovery: document review.9 In a typical document review exercise,
documents requested are classified in terms of their likely relevance to the
case. Traditional methods for accomplishing this classification typically
involve keyword or Boolean searches, followed by manual review of the
search results.10 It is hardly surprising, however, that methods utilizing
machine learning have been applied to make this classification faster,
cheaper, and more consistent.
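One common machine-learning approach, often marketed as predictive coding, ranks unreviewed documents by their similarity to documents a human reviewer has already marked relevant. The following is a crude, standard-library sketch of that idea (real e-discovery systems use trained classifiers; the seed text, document names, and contents here are invented):

```python
import math
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented seed: text from documents a reviewer already marked relevant.
seed = vectorize("pricing agreement emails with competitor")

docs = {
    "doc1": "emails discussing a pricing agreement with a competitor",
    "doc2": "office party planning and catering menu",
}
# Rank unreviewed documents by similarity to the seed, most relevant first.
ranked = sorted(docs, key=lambda d: cosine(vectorize(docs[d]), seed),
                reverse=True)
print(ranked)  # → ['doc1', 'doc2']
```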
In common law systems such as the judicial system in the United States,
previous court rulings can establish a principle or rule that can be binding
or persuasive to a court. Black’s Law Dictionary defines precedent as a “rule
of law established for the first time by a court for a particular type of case
and thereafter referred to in deciding similar cases.”13 Identifying the best
precedent to cite in support of legal arguments can make the difference
between winning or losing a case. In principle, lawyers must evaluate
thousands of cases in order to identify the most relevant legal precedent
for a given case. More often, they rely on their experience and training to
narrow their search to a more manageable number.
However, if a case presents factual circumstances that are somewhat
[Figure: example decision tree for predicting a justice’s vote. The first
split asks whether the case comes from the 2nd, 3rd, D.C., or Federal
Circuit; a “yes” leads to a predicted vote to affirm, and subsequent splits
lead to predicted votes to affirm or reverse.]
Source: T. Ruger, P. Kim, A. Martin, and K. Quinn (2004), “The Supreme Court
Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court
Decisionmaking,” Columbia Law Review, 104, 4, 1150–1210, figure 1.
Ruger et al. (2004) found, for instance, that if the United States was a respondent in the case, Justice O’Connor would
vote to reverse precedent if the primary issue in the case was related to civil
rights, the First Amendment, economic activity, or federalism.19 In some
cases, Ruger et al. (2004) incorporated other justices’ predicted decisions
into a given justice’s decision tree. For example, according to Ruger
et al. (2004), Justice Thomas’ decision tree predicts that he would vote to
reaffirm precedent if Justice Scalia’s predicted vote was not liberal and the
lower court’s ruling was conservative.20
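A decision tree of this kind is, at bottom, ordinary conditional logic. The sketch below is a simplified paraphrase of the branch described above, not the published tree; the function name and issue labels are illustrative.

```python
def predict_vote(us_is_respondent, primary_issue):
    """Illustrative paraphrase of one branch of the O'Connor decision tree
    reported by Ruger et al. (2004); not the published tree itself."""
    reverse_issues = {"civil rights", "first amendment",
                      "economic activity", "federalism"}
    if us_is_respondent and primary_issue in reverse_issues:
        return "reverse"
    return "affirm"

print(predict_vote(True, "federalism"))   # → reverse
print(predict_vote(True, "tax"))          # → affirm
```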
More recently, Katz et al. (2014) have applied the extremely randomized
tree method, a close relative of the random forest approach, to identify
optimal decision trees while considering over 90 potential explanatory var-
iables.21 The random forest approach creates a “forest” of decision trees,
each tree trained on a different randomly selected subset of the overall
dataset. The model then makes a prediction by aggregating the predictions
of all the random trees in the forest by majority rule. The extremely rand-
omized tree method modifies this by randomizing the subset of attributes
used in each tree and the thresholds for those attributes.
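A toy illustration of the ensemble idea, using one-split “stumps” in place of full trees and standard-library Python only (scikit-learn’s ExtraTreesClassifier is a real implementation of the method; the data here is invented):

```python
import random

def train_stump(data, features):
    """One 'extremely randomized' split: pick a random feature and a
    random threshold, then label each side with its majority class."""
    feat = random.choice(features)
    values = [x[feat] for x, _ in data]
    thresh = random.uniform(min(values), max(values))
    left = [y for x, y in data if x[feat] <= thresh]
    right = [y for x, y in data if x[feat] > thresh]

    def majority(labels):
        return max(set(labels), key=labels.count) if labels else 0

    return feat, thresh, majority(left), majority(right)

def forest_predict(stumps, x):
    """Aggregate the stumps' predictions by majority rule."""
    votes = [left if x[feat] <= thresh else right
             for feat, thresh, left, right in stumps]
    return max(set(votes), key=votes.count)

random.seed(0)
# Invented toy data: the label is 1 exactly when feature "a" is large.
data = [({"a": a, "b": random.random()}, int(a > 0.5))
        for a in [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]]
stumps = [train_stump(data, ["a", "b"]) for _ in range(25)]
print(forest_predict(stumps, {"a": 0.85, "b": 0.5}))
```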
Notes
1. The authors gratefully acknowledge excellent assistance from Robert Meyer, Ryann
Noe, Jacob Ryan, and Justin Ying. The views expressed in this article are solely those
of the authors, who are responsible for the content, and do not necessarily reflect the
views of Cornerstone Research.
2. FRCP Rule 26(b). “Parties may obtain discovery regarding any nonprivileged matter
that is relevant to any party’s claim or defense—including the existence, description,
nature, custody, condition, and location of any documents or other tangible things.”
FRCP Rule 26(b)(2)(C) “On motion or on its own, the court must limit the frequency
or extent of discovery otherwise allowed by these rules or by local rule if it determines
that: the burden or expense of the proposed discovery outweighs its likely benefit.”
3. Milberg LLP and Hausfeld LLP (2011), “E-Discovery Today: The Fault Lies Not In
Our Rules,” Federal Courts Law Review, 4, 2, 1–52, p. 20.
4. “E-Discovery Today,” p. 6.
5. “E-Discovery Today,” p. 7.
24. See, e.g., W. Antweiler and M. Z. Frank (2004), “Is All That Talk Just Noise? The
Information Content of Internet Stock Message Boards,” Journal of Finance, 59 (3),
1259–1294; N. Archak, A. Ghose, and P. G. Ipeirotis (2011), “Deriving the Pricing
Power of Product Features by Mining Consumer Reviews,” Management Science, 57
(8), 1485–1509; O. Netzer, R. Feldman, J. Goldenberg, and M. Fresko (2012) “Mine
Your Own Business: Market-Structure Surveillance through Text Mining,” Marketing
Science, 31 (3), 521–543. Other studies have applied similar techniques to study the
effect of media coverage on stock returns and volume, the relative informational
content of social media commentary from different groups of stakeholders, among
others. See, e.g., P. Tetlock, (2007), “Giving Content to Investor Sentiment: The Role
of Media in the Stock Market,” Journal of Finance, 62 (3), 1139–1168; S. Jian, H. Chen,
J. Nunamaker, and D. Zimbra (2014) “Analyzing Firm-Specific Social Media and
Market: A Stakeholder-Based Event Analysis Framework,” Decision Support Systems,
67, 30–39. For a review see A. Nassirtoussi, S. Aghabozorgi, T. Wah, and D. Ngo
(2014), “Text Mining for Market Prediction: A Systematic Review,” Expert Systems
with Applications, 41 (16), 7653–7670.