
Handbook of Marketing Analytics
Methods and Applications in Marketing Management, Public Policy, and Litigation Support

Edited by

Natalie Mizik
Professor of Marketing and J. Gary Shansby Endowed Chair
in Marketing Strategy, Foster School of Business, University
of Washington, USA

Dominique M. Hanssens
Distinguished Research Professor of Marketing, Anderson
School of Management, University of California, Los Angeles,
USA

Cheltenham, UK • Northampton, MA, USA



© Natalie Mizik and Dominique M. Hanssens 2018

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical or photocopying, recording, or otherwise without the prior permission of the publisher.

Published by
Edward Elgar Publishing Limited
The Lypiatts
15 Lansdown Road
Cheltenham
Glos GL50 2JA
UK

Edward Elgar Publishing, Inc.
William Pratt House
9 Dewey Court
Northampton
Massachusetts 01060
USA

A catalogue record for this book is available from the British Library

Library of Congress Control Number: 2017950469

This book is available electronically in the Business subject collection
DOI 10.4337/9781784716752

ISBN 978 1 78471 674 5 (cased)


ISBN 978 1 78471 675 2 (eBook)

Typeset by Servis Filmsetting Ltd, Stockport, Cheshire



Contents

List of contributors ix
Overview of the chapters xviii

Introduction 1
Natalie Mizik and Dominique M. Hanssens

METHODS CHAPTERS

PART I  Experimental Designs

  1 Laboratory experimentation in marketing 11
    Angela Y. Lee and Alice M. Tybout
  2 Field experiments 32
    Anja Lambrecht and Catherine E. Tucker
  3 Conjoint Analysis 52
    Olivier Toubia

PART II  Classical Econometrics

  4 Time-series models of short-run and long-run marketing impact 79
    Marnik G. Dekimpe and Dominique M. Hanssens
  5 Panel data methods in marketing research 107
    Natalie Mizik and Eugene Pavlov
  6 Causal inference in marketing applications 135
    Peter E. Rossi

PART III  Discrete Choice Modeling

  7 Modeling choice processes in marketing 155
    John Roberts and Denzil G. Fiebig
  8 Bayesian econometrics 181
    Greg M. Allenby and Peter E. Rossi

  9 Structural models in marketing 200
    Pradeep K. Chintagunta

PART IV  Latent Structure Analysis

 10 Multivariate statistical analyses: cluster analysis, factor analysis, and multidimensional scaling 227
    Dawn Iacobucci

PART V  Machine Learning and Big Data

 11 Machine learning and marketing 255
    Daria Dzyabura and Hema Yoganarasimhan
 12 Big data analytics 280
    Asim Ansari and Yang Li

PART VI  Generalizations and Optimizations

 13 Meta analysis in marketing 305
    Donald R. Lehmann
 14 Marketing optimization methods 324
    Murali K. Mantrala and Vamsi K. Kanuri

CASE STUDIES AND APPLICATIONS

PART VII  Case Studies and Applications in Marketing Management

 15 Industry applications of conjoint analysis 375
    Vithala R. Rao
 16 How time series econometrics helped Inofec quantify online and offline funnel progression and reallocate marketing budgets for higher profits 390
    Koen Pauwels
 17 Panel data models for evaluating the effectiveness of direct-to-physician pharmaceutical marketing activities 402
    Natalie Mizik and Robert Jacobson


 18 A nested logit model for product and transaction-type choice for planning automakers’ pricing and promotions 415
    Jorge Silva-Risso, Deirdre Borrego and Irina Ionova
 19 Visualizing asymmetric competitive market structure in large markets 431
    Daniel M. Ringel and Bernd Skiera
 20 User profiling in display advertising 448
    Michael Trusov and Liye Ma
 21 Dynamic optimization for marketing budget allocation at Bayer 458
    Marc Fischer and Sönke Albers

PART VIII  Case Studies and Applications in Public Policy

 22 Consumer (mis)behavior and public policy intervention 473
    Klaus Wertenbroch
 23 Nudging healthy choices with the 4Ps framework for behavior change 486
    Zoë Chance, Ravi Dhar, Michelle Hatzis, Michiel Bakker, Kim Huskey and Lydia Ash
 24 Field experimentation: promoting environmentally friendly consumer behavior 502
    Noah J. Goldstein and Ashley N. Angulo
 25 Regulation and online advertising markets 511
    Avi Goldfarb
 26 Measuring the long-term effects of public policy: the case of narcotics use and property crime 519
    Keiko I. Powers
 27 Applying structural models in a public policy context 539
    Paulo Albuquerque and Bart J. Bronnenberg

PART IX  Case Studies and Applications in Litigation Support

 28 Avoiding bias: ensuring validity and admissibility of survey evidence in litigations 549
    Rebecca Kirk Fair and Laura O’Laughlin


 29 Experiments in litigation 561
    Joel H. Steckel
 30 Conjoint analysis in litigation 572
    Sean Iyer
 31 Conjoint analysis: applications in antitrust litigation 590
    Michael P. Akemann, Rebbecca Reed-Arthurs and J. Douglas Zona
 32 Feature valuation using equilibrium conjoint analysis 609
    John R. Howell, Greg M. Allenby and Peter E. Rossi
 33 Regression analysis to evaluate harm in a breach of contract case: the Citri-Lite Company, Inc., Plaintiff v. Cott Beverages, Inc., Defendant 633
    Rahul Guha, Darius Onul and Sally Woodhouse
 34 Consumer surveys in trademark infringement litigation: FIJI vs. VITI case study 640
    T. Christopher Borek and Anjali Oza
 35 Survey evidence to evaluate a marketing claim: Skye Astiana, Plaintiff v. Ben & Jerry’s Homemade, Inc., Defendant 652
    Alan G. White and Rene Befurt
 36 Machine learning in litigation 661
    Vildan Altuglu and Rainer Schwabe

Index 671



Contributors

Michael P. Akemann is a Managing Director at Berkeley Research Group, LLC, a strategic advisory and expert consulting firm. He is an economist
who consults and testifies on antitrust, intellectual property, and general
commercial damages issues.
Sönke Albers is Professor of Marketing and Innovation, Kühne Logistics
University, Hamburg, Germany. Professor Albers’ research interests include
marketing planning, sales management, and diffusion of innovations.
Paulo Albuquerque is Associate Professor at INSEAD. His research
interests focus on firm decisions to introduce new products and consumer
decisions to search, use, and buy products online.
Greg M. Allenby is Helen C. Kurtz Chair in Marketing, Professor of
Marketing, and Professor of Statistics at the Ohio State University Fisher
College of Business. Professor Allenby’s research focuses on the develop-
ment and application of quantitative methods in marketing. His research
is used to improve product, pricing, promotion and targeting strategies at
leading firms.
Vildan Altuglu is Principal at Cornerstone Research. She specializes in
applying economic analysis and marketing research techniques to con-
sumer fraud and product liability, privacy and data breach, antitrust,
intellectual property, and general business litigation matters. She holds a
Ph.D. in Marketing from Columbia Business School.
Ashley N. Angulo is Postdoctoral Associate at Disney Research within the
Behavioral Science unit studying social influence and decision-making.
Asim Ansari is the William T. Dillard Professor of Marketing at the
Columbia Business School. Professor Ansari’s research addresses customer
relationship management, customization of marketing activities and
product recommendations over the internet, social networks modeling and
Bayesian modeling of consumer actions.
Lydia Ash is Director of People Operations at Google and people strate-
gist for many of the key growth businesses. She is an organizational design expert whose current key projects include architecting an innovative incentive structure system for performance management and career development.


Michiel Bakker is Director of Google’s Global Food program. Michiel and his team take pride in fueling the minds and bodies behind Google’s inno-
vative products, serving over 150,000 delicious, nutritious, responsibly
sourced meals around the world each day. Michiel has over 25 years of
global food and beverage operations experience, including 17 years with
Starwood Hotels and Resorts.
Rene Befurt is a Vice President in the Boston office of Analysis Group,
Inc. Dr. Befurt specializes in applying marketing research methodologies
to both litigation and strategy casework. His expertise lies in cases involv-
ing false advertising and trademark disputes, the assessment of disclosures
on consumer decision-making, choice modeling, valuations of product
features, and general marketing and branding strategies.
T. Christopher Borek is a Managing Principal in the Washington, DC
office of Analysis Group, Inc. as well as a Senior Policy Scholar in
the Center for Business and Public Policy in Georgetown University’s
McDonough School of Business. Dr. Borek specializes in the application
of microeconomics, finance, and statistics to litigation and complex busi-
ness problems. His expertise is in intellectual property, antitrust, consumer
harm, finance, and tax disputes.
Deirdre Borrego is Senior Vice President and General Manager of the Data
and Analytics Division at J.D. Power. In this role, she is the senior leader
responsible for managing all aspects of business strategy and product
development, operations, talent development and financial performance.
The Data and Analytics Division provides the automotive, financial serv-
ices and insurance industries with unparalleled insights, helping clients make more informed decisions ranging from marketing and inventory management to pricing and vehicle valuations.
Bart J. Bronnenberg is Professor of Marketing and the GSB Trust Faculty
Fellow for 2017–2018 at Stanford University, and a research fellow at
the Center for Economic Policy Research (CEPR) in London. He studies
distribution and retailing, the persistence of branding effects, and (online)
search behavior.
Zoë Chance is an assistant professor in marketing at the Yale School of
Management. She is an expert in persuasion, focusing on tiny tweaks that
help people lead richer, healthier, happier lives.
Pradeep K. Chintagunta is the Joseph T. and Bernice S. Lewis Distinguished
Service Professor of Marketing at the Booth School of Business, University
of Chicago. He conducts research into the analysis of household purchase
behavior, pharmaceutical markets, and technology products.


Marnik G. Dekimpe is Research Professor of Marketing at Tilburg University and Professor of Marketing at KU Leuven. His research focuses
on econometric models of marketing effectiveness, with particular reference
to international retailing and to the impact of macro-economic fluctuations
on marketing.
Ravi Dhar is the George Rogers Clark Professor of Management and direc-
tor of the Center for Customer Insights at the Yale School of Management
and Professor of Psychology in the Department of Psychology, Yale
University. He is an expert in consumer behavior and branding, marketing
management, marketing strategy, and consumer decision-making.
Daria Dzyabura is an Assistant Professor of Marketing at the New York
University Stern School of Business. Her research focuses on methods
for analyzing complex decision-making rules and developing efficient
forecasting methods for multi-faceted buying situations.
Denzil G. Fiebig is a Professor of Economics at the UNSW Australia
Business School. His primary expertise is in micro-econometrics and in
particular discrete choice modelling. He also has research interests in
forecasting and applied econometric methods. His recent empirical work
has concentrated on health economics with an emphasis on modelling
the behavior of individuals and health practitioners as they interact to
determine utilization of health care and services.
Marc Fischer is Professor of Marketing and Market Research, University
of Cologne, Germany, and Professor of Marketing, University of
Technology Sydney, Australia. His expertise includes the measurement
and management of marketing performance, brand management, and the
optimization of marketing mix.
Avi Goldfarb is Ellison Professor of Marketing at Rotman School of
Management, University of Toronto. Dr. Goldfarb’s research focuses on
understanding the impact of information technology on marketing, on
universities, and on the economy. His research has also explored the value
of brands and the role of experience in managerial decision-making.
Noah J. Goldstein is Associate Professor of Management and Organizations,
Psychology, and Medicine at the UCLA Anderson School of Management.
He studies social influence in a number of contexts, including consumer
behavior, management, and medicine.
Rahul Guha is Senior Vice President at Cornerstone Research. He heads the
firm’s antitrust and competition practice and is the former head of its life
sciences practice. He holds a PhD in Marketing from Cornell University.


Dominique M. Hanssens is Distinguished Research Professor of Marketing at the UCLA Anderson School of Management. His research focuses on
strategic marketing problems, in particular marketing productivity, to
which he applies his expertise in econometrics and time-series analysis.
Michelle Hatzis is Google Food’s Global Health and Wellness program
manager. A licensed clinical psychologist specializing in Behavioral
Medicine, she designs workplace programs for optimal performance and
vitality specializing in food choice architecture, food/culinary literacy,
movement and optimizing stress and resiliency.
John R. Howell is Assistant Professor of Marketing at the Penn State
Smeal College of Business. Professor Howell’s areas of expertise include
Pricing, Product Design, Conjoint Analysis, and Bayesian Statistics.
Kim Huskey is regional leader for Google Food Services. Huskey’s
responsibilities in global corporate food services and restaurant consulting
include strategic programming and macro planning, business strategy and
concept development.
Dawn Iacobucci is E. Bronson Ingram Professor of Management in
Marketing at the Owen Graduate School of Management, Vanderbilt
University. Professor Iacobucci conducts research on networks, customer
satisfaction and service marketing, quantitative psychological research
and high dimensional data models.
Irina Ionova is a former Chief Science Officer at the Power Information
Network, J.D. Power and Associates. She holds doctoral and under-
graduate degrees in Applied Mathematics and Computer Science from the
Moscow Institute of Physics and Technology. Her research interests and
accomplishments are in the field of mathematical modeling and computer
simulations of complex systems, and related development of optimization
algorithms. Currently she focuses on consumer choice modeling in the
automobile market, and the effects of pricing and promotions within the
framework of consumer heterogeneity and transaction level modeling.
Sean Iyer is an Executive Vice President at Compass Lexecon. He
has worked on numerous matters in intellectual property, consumer fraud,
product liability, and deceptive advertising litigation where conjoint
analysis and other market research techniques have been used or critiqued.
Robert Jacobson is a consultant specializing in marketing strategy, brand
valuation, and litigation support. From 1984 until 2009, he was on the
faculty at the University of Washington, where he was Evert McCabe
Distinguished Professor of Marketing. His research has focused on marketing strategy, with an emphasis on the interactions between firm strategy and the financial markets.
Vamsi K. Kanuri is Assistant Professor of Marketing at the School of
Business Administration of the University of Miami. His research focuses
on marketing decision models, digital and multi-channel marketing strate-
gies, business model innovation and performance implications of firm and
marketing communications.
Rebecca Kirk Fair is a Managing Principal in the Boston office of Analysis
Group, Inc. Ms. Kirk Fair specializes in matters involving intellectual
property, corporate valuation, patent infringement, false advertising, tax,
class certification, and major antitrust litigation. She often serves as an
expert witness in matters involving the design, implementation, and evalu-
ation of consumer surveys.
Anja Lambrecht is an Associate Professor of Marketing at London Business
School. Her research focuses on digital marketing, with particular emphasis on online targeting and advertising as well as promotion and pricing.
Angela Y. Lee is the Mechthild Esser Nemmers Professor of Marketing at
the Kellogg School of Management, Northwestern University. Professor
Lee is a consumer psychologist, with special expertise in consumer learn-
ing, emotions and goals. Her research focuses on consumer motivation
and affect, cross-cultural consumer psychology, and non-conscious influ-
ences of memory on judgment and choice.
Donald R. Lehmann is the George E. Warren Professor of Business at the
Columbia Graduate School of Business. His research focuses on indi-
vidual and group choice and decision-making, the adoption of innovation
and new product development, and the management and valuation of
marketing assets (brands, customers).
Yang Li is Associate Professor of Marketing at Cheung Kong Graduate
School of Business. His research focuses on big data marketing analytics,
with emphases related to product recommendation systems, pricing, and
consumer choices.
Liye Ma is Associate Professor, Robert H. Smith School of Business,
University of Maryland. His research focuses on the dynamic interactions
of consumers and firms on the internet, social media, and mobile platforms.
He develops quantitative models to analyze the drivers of consumer
actions in the digital economy.
Murali K. Mantrala is the Sam M. Walton Distinguished Professor of
Marketing at the Trulaske College of Business of the University of Missouri. His research focuses on topics such as marketing by two-sided platforms, sales resource allocation, compensation design, and retail pricing strategies.
Natalie Mizik is Professor of Marketing and J. Gary Shansby Endowed
Chair in Marketing Strategy at the University of Washington Foster
School of Business. Her research centers on assessing financial perform-
ance consequences of marketing strategies and activities and valuation of
intangible marketing assets.
Laura O’Laughlin is a Manager in the Montreal office of Analysis Group,
Inc., conducting economic analysis and research in both litigation and
non-litigation contexts. She has extensive experience in the development,
administration, and analysis of surveys and experiments in antitrust, false
advertising, strategy, and intellectual property matters.
Darius Onul is a Senior Analyst at Cornerstone Research. He works
on intellectual property, antitrust, financial institutions, valuation, and
product liability cases. He has a BA with a double major in Economics and
Mathematics from Amherst College.
Anjali Oza is a Vice President in the Menlo Park office of Analysis Group,
Inc. Dr. Oza specializes in the application of economic, statistical, and
market research methods to litigation and strategy matters. She is an
expert in designing and evaluating qualitative and quantitative surveys,
including conjoint analysis, for applications in patent litigation, false
advertising, class action, and Lanham Act matters.
Koen Pauwels is Professor of Marketing at Northeastern University and BI
Oslo, and Honorary Professor at the University of Groningen. Professor
Pauwels’ current research interests include predictive analytics, sentiment
analysis and online versus offline long-term marketing effectiveness.
Eugene Pavlov is a PhD candidate at the University of Washington Foster
School of Business. He studies consumer engagement with online content and the quantification of the value of consumer brand perceptions using econometric, machine learning, and computer vision techniques.
Keiko I. Powers is a Senior Group Director, Analytical Insights, at
MarketShare in Los Angeles. She holds a Ph.D. in psychometrics from
UCLA and was previously employed by Power Information Network and
J.D. Power and Associates. Most recently, she has been involved in mar-
keting research focusing on the Japanese market as a member of the Japan
Institute of Marketing Science.
Vithala R. Rao is the Deane Malott Professor of Management and Professor of Marketing and Quantitative Methods, Samuel Curtis Johnson Graduate School of Management, Cornell University, Ithaca, New York. He is an expert on several topics including conjoint
analysis, multidimensional scaling, pricing, bundling design, brand
equity, market  structure, corporate acquisition, branding, and trade
promotions.
Rebbecca Reed-Arthurs is a Director at Berkeley Research Group, LLC.
She is an economist who often consults in matters related to survey design,
implementation, and the use of consumer surveys and conjoint analysis
in the estimation of economic damages.
Daniel M. Ringel is Assistant Professor of Marketing at the University of
North Carolina at Chapel Hill. Daniel gathered extensive management
and e-commerce consulting experience prior to his doctoral degree at
Goethe-University Frankfurt (Germany).
John Roberts is Scientia Professor of Marketing in the UNSW Australia
Business School, and a Fellow at the London Business School. He is inter-
ested in the intersection between marketing science advances and manage-
ment practice.
Peter E. Rossi is James Collins Professor of Marketing, Statistics and
Economics at the UCLA Anderson School of Management. He is
an expert in the areas of Pricing and Promotion, Target Marketing,
Direct Marketing, Micro-Marketing, Econometrics of Limited Dependent
Variable Models, and Bayesian Statistical Methods.
Rainer Schwabe is Manager at Cornerstone Research. He works on anti-
trust and competition matters nationally and internationally. His work
has spanned a range of industries, including telecommunications, pharma-
ceuticals, finance, and automotive.
Jorge Silva-Risso is Professor of Marketing at University of California,
Riverside. Previously he was Executive Director of Marketing Science at
J.D. Power, a group he developed that specializes in building and imple-
menting quantitative models of consumer-level response to marketing
programs offered by the automobile industry. Prof. Silva-Risso’s current
research interests include econometric models of consumer response, mar-
keting effectiveness, pricing and the effects of the Internet on marketing,
information and search.
Bernd Skiera is Chaired Professor of Electronic Commerce at the depart-
ment of marketing at Goethe University Frankfurt (Germany). Professor
Skiera is an expert in electronic commerce and online marketing, customer
management, and pricing. He is a co-founder of Marini Media, which develops and implements IT solutions that integrate online marketing with offline sales solutions.
Joel H. Steckel is Professor of Marketing and Vice Dean for Doctoral
Education at the Stern School of Business at New York University. His
current research focuses on applications of marketing research and theory
to the law. He was the founding President of the INFORMS Society
on Marketing Science. He currently serves as a co-editor-in-chief of
Marketing Letters.
Olivier Toubia is the Glaubinger Professor of Business at Columbia
Business School. His research focuses on various aspects of innovation
(including idea generation, preference measurement, and the diffusion of
innovation), social networks and behavioral economics.
Michael Trusov is Associate Professor, Robert H. Smith School of Business,
University of Maryland. His research focuses on digital marketing
including such topics as search engines, recommendation systems, social
media and networks, electronic word-of-mouth, e-commerce, consumer-
generated content, text analysis, eye-tracking and data mining.
Catherine E. Tucker is the Sloan Distinguished Professor of Management
Science at the MIT Sloan School of Management and Research Associate
at the NBER. She has particular expertise in online advertising, digital
health, social media, and electronic privacy. Her research interests lie
in how technology allows firms to use digital data to improve their
operations and marketing, and in the challenges this poses for regulations
designed to promote innovation.
Alice M. Tybout is the Harold T. Martin Professor of Marketing at the
Kellogg School of Management, Northwestern University. Professor
Tybout conducts research related to how individuals process, organize,
and utilize information to make judgments and choices.
Klaus Wertenbroch is Professor of Marketing at INSEAD and editor-in-chief
of the European Marketing Academy’s (EMAC) Journal of Marketing
Behavior. Dr. Wertenbroch is an expert in behavioral economics and con-
sumer decision-making, strategic brand management, and pricing.
Alan G. White is Managing Principal in the Boston office of Analysis
Group, Inc. as well as an Adjunct Faculty member in the Department
of Economics at Northeastern University in Boston. Dr. White focuses
on antitrust, intellectual property, and tax/transfer pricing matters in a
range of industries, with a particular interest in health care issues. He
has extensive experience with matters involving class certification and quantification of damages, allegations of false advertising, breach of contract, off-label promotion of prescription drugs, and the economic impact
of generic entry and substitution.
Sally Woodhouse is Vice President at Cornerstone Research, where she
leads the firm’s life sciences practice. She provides consulting services on
antitrust, intellectual property, False Claims Act, and breach of contract
cases. She has a PhD in Economics from the University of California,
Berkeley.
Hema Yoganarasimhan is an Assistant Professor of Marketing at the
University of Washington Foster School of Business. Her research focuses
on substantive issues in digital marketing and social influence using econo-
metric, machine learning and analytical models.
J. Douglas Zona is an economist at Square Z Research, LLC.  He has
consulted on economics, marketing and antitrust matters for over two
decades, and often serves as an expert witness on these matters.



Overview of the chapters

1: Laboratory experimentation in marketing
Marketing academics, managers, public policy makers, and litigators often
ponder questions that involve relationships between alternative treat-
ments or strategies and people’s responses. Among the variety of research
approaches available to them, only experimental designs afford strong
causal inferences about such relationships. The chapter reviews the nature
of such experiments, discusses the role of laboratory versus field experi-
ments and explores the design of lab experiments along various dimensions.
2: Field experiments
In a digitally enabled world, experimentation is easier. This chapter
explores what this means for marketing researchers, and the subtleties of
designing field experiments for research. It gives guidelines for interpre-
tation and describes the potential advantages and disadvantages of this
methodology for classic areas of marketing.
3: Conjoint Analysis
This chapter offers an overview of Conjoint Analysis, with an eye toward
implementation and practical issues. After reviewing the basic assump-
tions of Conjoint Analysis, I discuss issues related to implementation;
data analysis and interpretation; and issues related to ecological validity.
In particular, I discuss recent evidence regarding consumers’ attention in
Conjoint Analysis surveys, how it may be increased and modeled, and
whether responses in Conjoint Analysis surveys are predictive of real-life
behavior. Each section concludes with practical recommendations.
4: Time-series models of short-run and long-run marketing impact
Determining the long-term impact of marketing actions is strategically
important, yet more challenging than uncovering short-term results. This
chapter describes persistence modeling on time-series data as a promis-
ing method for long-term impact detection, especially as longitudinal
databases in marketing are becoming more prevalent. We provide a brief
technical introduction to each step in persistence modeling, along with a
set of illustrative marketing studies that have used such models. Next, we
summarize various marketing insights that have been derived from the use
of persistence models in marketing.
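
For readers who want a concrete feel for this workflow before reaching the chapter, the sketch below (ours, not the chapter authors') runs two canonical steps on simulated data: a unit-root test for evolving behavior, followed by a vector autoregression whose impulse-response functions trace short- and long-run effects. The library choices and the simulated series are illustrative assumptions, not the chapter's code.

```python
# Illustrative sketch only (not from Chapter 4); assumes numpy, pandas,
# and statsmodels are available, and uses simulated data.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.api import VAR

rng = np.random.default_rng(1)
adv = rng.normal(size=200).cumsum()              # simulated evolving (unit-root) series
sales = 0.5 * adv + rng.normal(size=200)         # sales tied to advertising
df = pd.DataFrame({"sales": sales, "adv": adv})

print(adfuller(df["adv"])[1])                    # high p-value: unit root not rejected
results = VAR(df).fit(maxlags=4)                 # fit a VAR(4) in levels
irf = results.irf(12)                            # impulse responses, 12 periods ahead
print(irf.irfs[-1])                              # responses at the longest horizon
```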


5: Panel data methods in marketing research
We review panel data models popular in marketing applications and
highlight some issues, potential solutions, and trade-offs that arise in their
estimation. Panel data studies controlling for unobservables often show
dramatically different estimates from cross-sectional studies. We focus on
models with unobservable individual-specific effects and address some
misconceptions appearing in marketing applications.
6: Causal inference in marketing applications
This chapter summarizes the major methods of causal inference and com-
ments on the applicability of these methods to marketing problems.
7: Modeling choice processes in marketing
This chapter examines the use of choice models in marketing. After briefly
describing the genesis of choice modeling, we introduce the two basic work-
horses in choice modeling, the logit and probit models. We use these two
models as a platform from which to show how additional phenomena can
be introduced, including multistage decision processes, dynamic models,
and heterogeneity. After a description of some more advanced models, we
close by illustrating how these models may be used to provide insight to mar-
keting managers by discussing a number of choice modeling applications.
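
For reference, the logit workhorse mentioned above assigns to alternative $i$ in choice set $C$ a probability of the standard form (textbook notation, not necessarily the chapter's):

$$P(i \mid C) = \frac{\exp(V_i)}{\sum_{j \in C} \exp(V_j)},$$

where $V_j$ is the deterministic part of the utility of alternative $j$. The probit model instead assumes normally distributed utility errors, at the cost of choice probabilities that lack a closed form.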
8: Bayesian econometrics
Bayesian econometric methods are particularly well suited for analysis
of marketing data. Bayes theorem provides exact, small-sample inference
within a flexible framework for assessing particular parameters and func-
tions of parameters. We first review the basics of Bayesian analysis and
examine three areas where Bayesian methods have contributed to market-
ing analytics – models of choice, heterogeneity, and decision theory. We
conclude with a discussion of limitations and common errors in the appli-
cation of Bayes theorem to marketing analytics.
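
The engine behind these methods is Bayes theorem which, for data $y$ and parameters $\theta$, reads (in generic notation):

$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta).$$

The exact, small-sample character noted above follows because the posterior $p(\theta \mid y)$ is a complete inferential object at any sample size, with no appeal to asymptotics.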
9: Structural models in marketing
In this chapter, I provide brief discussions of what we mean by structural
models, why we need them, the typical classes of structural models that we
see being used by marketers these days, along with some examples of these
models. I provide a basic discussion of structural models in the context
of the marketing literature and limit myself largely to models of demand
rather than models of firm behavior.


10: Multivariate statistical analyses: cluster analysis, factor analysis, and multidimensional scaling
In this chapter I present three techniques—cluster analysis, factor analy-
sis, and multidimensional scaling—popular with marketing researchers
and consultants because they help achieve frequently encountered market-
ing goals. Cluster analysis is useful in finding customer segments, factor
analysis is useful for survey research, and multidimensional scaling is
useful in creating perceptual maps.
11: Machine learning and marketing
Machine learning (ML) refers to the study of methods or algorithms
designed to learn the underlying patterns in the data and make predictions
based on these patterns. A key characteristic of ML techniques is their
ability to produce accurate out-of-sample predictions. We review two
popular machine-learning methods – decision trees and Support Vector Machines (SVM) – in detail.
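
As a concrete illustration of that out-of-sample emphasis, the following sketch (ours, not the chapter's, using scikit-learn on synthetic data as an assumed setup) fits both reviewed methods and scores them on held-out observations:

```python
# Illustrative sketch: fit the two methods reviewed in Chapter 11 and
# evaluate out-of-sample accuracy. Assumes scikit-learn; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (DecisionTreeClassifier(max_depth=5, random_state=0),
              SVC(kernel="rbf", C=1.0)):
    model.fit(X_train, y_train)                  # learn patterns in-sample
    print(type(model).__name__,
          model.score(X_test, y_test))           # out-of-sample accuracy
```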
12: Big data analytics
The field of “Big Data” is vast and rapidly evolving. In this chapter, attention is restricted to the challenges associated with making statistical
inferences from big data. We characterize big data by the four Vs (volume,
velocity, variety and veracity) and discuss the computational challenges in
marketing applications using big data. We review stochastic approxima-
tion, variational Bayes, and the methods for wide data models.
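
Stochastic approximation, one of the techniques reviewed, updates an estimate from small random subsets of the data, so the full data set never has to be processed at once. A minimal sketch (ours, assuming only NumPy and simulated data) for a linear model:

```python
# Illustrative sketch of stochastic approximation: stochastic gradient
# descent for least squares, one random mini-batch per update.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 10
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)
y = X @ beta_true + rng.normal(size=n)

beta = np.zeros(p)
for t in range(2000):
    idx = rng.integers(0, n, size=64)            # random mini-batch of 64 rows
    grad = X[idx].T @ (X[idx] @ beta - y[idx]) / len(idx)
    beta -= grad / (1 + t) ** 0.6                # decaying step size (Robbins-Monro)

print(np.max(np.abs(beta - beta_true)))          # error should be small
```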
13: Meta analysis in marketing
This chapter discusses important methods and issues in using meta-
analysis to develop a knowledge base in marketing. After defining meta-
analysis and explaining its role in marketing, the author discusses various
steps in a meta-analytic study, focusing both on design and statistical
issues. He then presents a comprehensive tabular overview of published
marketing meta-analyses in various subfields of marketing.
14: Marketing optimization methods
We survey the methods, advances, and insights from research and applica-
tions pertaining to Marketing Optimization Methods over the past 70 years.
Specifically, we classify extant marketing optimization problems into two
key typologies based on: (1) the number (“single” or “multiple”) of “sales
entities” and marketing input variables involved in the problem, and (2)
the nature of the objective function (e.g., static or dynamic). We discuss
the modeling and solving of optimization problems that fall under these typologies. In each example, we summarize the problem; the choice variables; the constraints; the sales response model; the objective function; the
solution approach/technique; and optimization insights/principles from the
solution.
15: Industry applications of conjoint analysis
This chapter reviews five applications to provide the unique flavor and
demonstrate the versatility of the conjoint analysis method. The following
applications are discussed: store location selection, bidding for contracts,
evaluating the market value of a change in a product attribute (MVAI),
push marketing strategy in a B2B context, and choice of a distribution
channel.
16: How time series econometrics helped Inofec quantify online and offline
funnel progression and reallocate marketing budgets for higher profits
In order to better allocate its limited marketing resources, Inofec, a small
and medium-sized enterprise (SME) in the business-to-business sector, needed
to quantify how its marketing actions drive offline and online funnel
progression. We developed a conceptual framework and econometric
time-series model and found evidence of many cross-channel effects, in
particular offline marketing effects on online funnel metrics and online
funnel metrics on offline purchases. Moreover, marketing communica-
tion activities directly affected both early and later purchase funnel stages
(website visits, online and offline information and quote requests). Finally,
we found that online customer-initiated contacts had substantially higher
profit impact than offline firm-initiated contacts. Shifting marketing
budgets towards these activities in a field experiment yielded net profit
increases 14 times larger than those for the status-quo allocation.
17: Panel data models for evaluating the effectiveness of direct-to-physician pharmaceutical marketing activities
We illustrate the application of dynamic panel data methods using the
direct-to-physician (DTP) pharmaceutical promotions data described in
an article by Mizik and Jacobson (2004). Specifically, we focus on using
panel data methods to determine appropriate model specification and
to demonstrate how dramatically the estimates of the DTP effectiveness
change across various common model (mis)-specifications.
18: A nested logit model for product and transaction-type choice for planning automakers’ pricing and promotions
We develop a consumer response model to evaluate and plan pricing
and promotions in durable good markets. We discuss its implementation in the US automotive industry, which “spends” about $50 billion each year in price promotions. The approach is based on a random effects
multinomial nested logit model of product and transaction-type choice.
Consumers differ in their overall price sensitivity as well as in their relative
sensitivity to alternative pricing instruments, which has to be taken into
account to design effective pricing programs. We estimate the model using
Hierarchical Bayes methods to capture response heterogeneity at the local
market level. We illustrate the model through an empirical application to
a sample of data drawn from J.D. Power transaction records.
19: Visualizing asymmetric competitive market structure in large markets
Visualizing competitive relationships in large markets (i.e., markets con-
taining over 1,000 products) is challenging. We discuss a new model
called DRMABS (Decomposition and Re-assembly of MArkets By
Segmentation) for such applications. DRMABS combines methods from
multiple research disciplines such as biology, physics, computer science,
and sociology with a new method of submarket-centric mapping to visual-
ize asymmetric competition in large markets in a single two-dimensional
map.
20: User profiling in display advertising
Constructing behavioral profiles from consumer online browsing activities
is challenging: first, individual consumer-level records are massive and call
for scalable high performance processing algorithms; second, advertising
networks only observe consumer’s browsing activities on the sites partici-
pating in the network, potentially missing site categories not covered by
the network. The latter issue can lead to a biased view of the consumer’s
profile and to suboptimal advertising targeting. We present a method
that augments individual-level ad network data with anonymized third-
party data to improve consumer profile recovery and correct for potential
biases. The approach is scalable and easily parallelized, with performance improving almost
linearly in the number of CPUs. Using economic simulation, we illustrate
the potential gains the proposed model may offer to a firm when used in
individual-level targeting of display ads.
21: Dynamic optimization for marketing budget allocation at Bayer
We present an Excel-based decision-support model that allows determin-
ing near-optimal marketing budgets and represents an innovative and
feasible solution to the dynamic marketing budget allocation problem for
multi-product, multi-country firms. The model accounts for marketing
dynamics and a product’s growth potential as well as for trade-offs with
respect to marketing effectiveness and profit contribution. It was successfully implemented at Bayer, one of the world’s largest pharmaceutical


and chemical firms. The profit improvement potential in this company
was more than 50 percent and worth nearly €500 million in incremental
discounted cash flows.
22: Consumer (mis)behavior and public policy intervention
Consumers often “misbehave.” They save and exercise too little; they spend,
eat, and drink too much and take on too much debt; they work too hard
(or too little); they smoke, take drugs, have unprotected sex, and carelessly
expose their private lives on social media. These misbehaviors, often char-
acterized as time-inconsistent choices, may entail large costs not only to
the individuals concerned, but also to society as a whole. In this chapter, I
discuss how policy makers can take a theory-guided experimental approach,
complemented by field data, to demonstrate consumer precommitment both
as a revealed preference-based criterion for evaluating the need for policy
intervention and as a tool for allowing consumers to limit their misbehaviors
without imposing constraints on market participants’ freedom of choice.
23: Nudging healthy choices with the 4Ps framework for behavior change
In this chapter, we share the 4Ps Framework for Behavior Change,
designed to organize research findings to make them more easily applicable
in the real world. We offer levers the well-meaning planner can employ to
support the healthy intentions of others, and share examples of how the 4Ps
Framework is being applied at Google. Although our examples focus on
nudging people toward healthy food choices, similar strategies can be used
to nudge people’s behavior in any direction that supports their own inten-
tions. We offer advice for influencing one-time decisions via (1) the combina-
tion of choices offered, (2) the choice environment, and (3) communication
about the choices. We also offer advice on supporting individuals in the
development of good habits, to make better choices at any time or place.
24: Field experimentation: promoting environmentally friendly consumer
behavior
This chapter discusses the challenges and rewards of conducting field
experiments by sharing the details that went into conducting several
large-scale field experiments within hotels. In discussion of the studies,
we document three stages of conducting field experiments with outside
organizations. The first stage is devoted to advice on outreach, including
communication strategies to reach potential organizations. The second
stage refers to securing buy-in from key stakeholders and organization
partners. Lastly, we detail methodological advice in the implementation
stage by highlighting potential concerns and safeguards.


25: Regulation and online advertising markets
Online advertising has grown rapidly in recent years. The rise of this new
form of advertising has generated a number of policy questions around
privacy, the ability of local governments to regulate information, and
antitrust in online markets. This chapter reviews three studies using a com-
bination of field experiments and quasi-experimental variation to answer
policy questions related to online advertising.
26: Measuring the long-term effects of public policy: the case of narcotics
use and property crime
This chapter uses multivariate time-series methods to study one of the
most serious public policy problems, the fight against narcotics abuse.
The effects of methadone treatment and legal supervision of narcotics
use and criminal activities were assessed by applying cointegration and
error correction methods that disentangle the long-term (permanent) and
the short-term (temporary) effects of intervention. Overall, the system
dynamics among these variables were characterized by long-term rather
than short-term relationships. Methadone maintenance treatment demon-
strated long-term benefits by reducing narcotics use and criminal activi-
ties. Legal supervision, on the other hand, did not reduce either narcotics
use or property crime in the long run. The chapter explores the policy
implications of these findings.
27: Applying structural models in a public policy context
We present an illustration of how marketing and structural models can
be applied in a public policy context. We describe the demand model in
Albuquerque and Bronnenberg (2012) to evaluate the impact of the 2009
federal policy measure known as the “Car Allowance Rebate System”
program (or “Cash for Clunkers”) on prices and demand in the auto
sector.
28: Avoiding bias: ensuring validity and admissibility of survey evidence
in litigations
Despite the wide scope for survey evidence used in litigation, the relevance
and usefulness of expert-submitted surveys in any legal context is depend-
ent on how they are designed and implemented. The avoidance of bias
in survey evidence is central to a survey’s admissibility and the probative
weight accorded to the survey expert’s testimony. This chapter discusses
possible sources of bias and describes methods and techniques that a
survey expert can use to minimize this bias.


29: Experiments in litigation
Often litigation outcomes hinge on very specific questions of consumer
behavior (e.g., how consumers interpret a specific advertisement).
Randomized experiments are instrumental in these contexts. Courts use
the same criteria as academics to judge these experiments: construct,
internal, and external validity. However, they place different emphases on
them. For example, external validity is much more crucial in a courtroom
than in an academic setting. This article discusses the similarities and dif-
ferences between experiments conducted in academic social science and
litigation. Furthermore, it points to a potential of the courtroom to inform
academic social science that has heretofore gone unexplored.
30: Conjoint analysis in litigation
This chapter discusses the use of conjoint analysis in litigation. The author summarizes key court decisions and motivates the use of conjoint analysis as a method of proof in specific litigation settings. The chapter then describes the basic elements of conjoint analysis and addresses several tactical considerations in using conjoint analysis. The specific use of conjoint analysis in a variety of litigation contexts is then summarized, including an extended summary of the use of conjoint analysis in a landmark smartphone dispute.
31: Conjoint analysis: applications in antitrust litigation
We outline some basic considerations and implementation strategies regarding the use of consumer surveys and conjoint analysis in the context of complex litigation. We also describe two applications of these techniques in antitrust disputes in the payment card and infant formula supplements industries.
32: Feature valuation using equilibrium conjoint analysis
Feature valuation is an important element of the marketing analytics toolkit and one of the primary motivations behind the popularity of conjoint analysis. We call attention to an important deficiency in current, consumer-centric, approaches. Surveys used for feature valuation need to include a reasonable competitive set. We demonstrate that equilibrium calculations are both necessary and feasible.
33: Regression analysis to evaluate harm in a breach of contract case: the Citri-Lite Company, Inc., Plaintiff v. Cott Beverages, Inc., Defendant
We discuss the use of regression analysis to evaluate harm in a breach of contract case involving allegations that the licensor of a product failed to use commercially reasonable efforts to promote and sell the product. Regression analysis has been widely used and accepted by US courts across a large variety of different types of cases, including labor discrimination cases, antitrust cases, and intellectual property cases. In cases involving marketing issues, regression analysis is frequently used to determine the effect of promotion on sales.
34: Consumer surveys in trademark infringement litigation: FIJI vs. VITI case study
We discuss the use of consumer surveys to evaluate consumer confusion in a trademark infringement case. Because trademark owners are often unable to provide evidence of actual confusion, consumer surveys can be used to evaluate the likelihood of consumer confusion over similarity of trademarks or products. We summarize the role surveys play in trademark infringement cases and discuss their use in a trademark infringement case involving artesian bottled water from the Republic of Fiji.
35: Survey evidence to evaluate a marketing claim: Skye Astiana, Plaintiff v. Ben & Jerry’s Homemade, Inc., Defendant
This chapter describes an application of consumer surveys in the litigation context. This particular application of a survey differs from the typical use of market research conducted for new product development, consumer satisfaction studies, or the assessment of consumers’ willingness-to-pay for a good or service. We describe and explain why and how a survey can be an important means for either Plaintiffs or Defendants to present evidence on the interpretation of a claim (here, a so-called All Natural claim displayed on the packaging of Ben & Jerry’s ice cream), as well as to evaluate the role that such a claim can play in the consumer’s decision-making process.
36: Machine learning in litigation
Litigation presents significant challenges involving the identification, sorting, and analysis of large amounts of data. Machine learning, which utilizes algorithms and systems that improve their performance with experience to classify information and to make predictions, is well-suited to these tasks. In this chapter, we discuss current machine learning applications in legal practice, as well as some potential applications of these techniques in support of expert witness testimony in commercial litigation.



Introduction
Natalie Mizik and Dominique M. Hanssens

Marketing Science contributes significantly to the development and validation of analytical tools with a wide range of applications in busi-
ness, public policy and litigation support. The Handbook of Marketing
Analytics showcases the analytical marketing methods and their high-
impact real-life applications. Fourteen methods chapters provide an over-
view of specific marketing analytic methods in some technical detail and
22 case studies present thorough examples of the use of each method in
marketing management, public policy, or litigation support. The contrib-
uting authors are recognized authorities in their area of specialty.
Marketing is both a science (academic discipline) and a managerial
practice. Its basic tenet is that customer-oriented managerial actions—
including product, pricing, communication, and distribution decisions—
should generate value for their targeted audiences. Since these actions are
generally costly, value creation efforts on the part of the firms have to
result in customer response (such as consumer purchases) that is strong
enough to justify the costs and to generate profits for the firm. Many
factors influence consumer demand, and not all of these are under control
of the marketer. As such, disentangling the effects of multiple factors and
assessing the top-line and bottom-line impact of marketing has been and
remains a critical challenge.
The academic discipline of marketing has developed and adopted a
number of scientific techniques that enable the assessment of marketing
impact. Implementation of these techniques in academia is often
referred to as marketing science. Many of these scientific techniques have
been transferred to the world of marketing practice, where they are now
generally referred to as marketing analytics. Importantly, the range of
applications has reached beyond the marketing function in companies
and non-profit organizations to include the domain of public policy and
to serve as a means of conflict resolution in litigation support. While the
definition of the intended beneficiaries in marketing (management),
public policy (regulators), and litigation (plaintiffs and defendants) differs,
the challenges facing the marketing scientist, policy analyst, and expert
witness are rather similar, be it predicting consumer response to a new
product introduction or information campaign, assessing the value of an intangible asset, or establishing a causal link between firm or policy maker’s actions and consumer behavior.
For example, advertising informs consumers of the benefits of purchas-
ing and using a certain product or service. Advertising is costly to the firm,
and a typical marketing analytics task is to determine to what extent the
additional revenue generated by the advertising campaign exceeds its cost.
In a public policy setting, marketing analytics may be used to address
a similar question when the targeted audience is society at large: did a
communications campaign to educate citizens about the advantages of
healthy eating habits make a meaningful difference on health outcomes in
the population? Finally, in a litigation support setting, marketing analytics
may be used to assess the loss of revenue and profitability of one brand as
a result of false advertising initiated by a competitor.
Marketing analytics has been successful in adopting and refining
techniques from several academic disciplines, including economics,
econometrics, operations research, statistics, psychology, sociology and
computer science. In particular, marketing analytics is equally adept at
using primary and secondary data sources, and is equally motivated by
research objectives of description, prediction, and causal inference. This
multi-disciplinary nature of the field has motivated us, via this Handbook
of Marketing Analytics, to showcase the various analytical marketing
methods and their high-impact real-life applications.
As a guide to our readers, the accompanying table presents an overview
of how the applications chapters relate to the methods chapters. Note that
the correspondence is not always one-to-one, i.e., in many cases the appli-
cations chapter illustrates more than one marketing science method. We
hope that this collection of outstanding contributions to methodology and
application will be educational and inspirational to our readers, whether
they are academics or practitioners in the areas of marketing, public policy
or litigation.


Methods chapter    Applications in Marketing Management / Public Policy / Litigation Support

 1. Laboratory experiments
      Public policy: Consumer (mis)behavior and public policy intervention (Chapter 22)
      Litigation support: Avoiding bias (Chapter 28); Experiments in litigation (Chapter 29)
 2. Field experiments
      Public policy: Consumer (mis)behavior and public policy intervention (Chapter 22); Nudging healthy choices (Chapter 23); Promoting environmentally friendly consumer behavior (Chapter 24); Regulation in online advertising markets (Chapter 25)
      Litigation support: Experiments in litigation (Chapter 29)
 3. Conjoint analysis
      Marketing management: Industry applications (Chapter 15)
      Litigation support: Conjoint analysis in litigation (Chapter 30); Applications in antitrust (Chapter 31); Feature valuation using equilibrium conjoint analysis (Chapter 32)
 4. Time-series models
      Marketing management: Online and offline funnel progression (Chapter 16)
      Public policy: Narcotics use and property crime (Chapter 26)
 5. Panel data models
      Marketing management: Effectiveness of direct-to-physician pharmaceutical marketing (Chapter 17)
      Litigation support: Evaluating harm in a breach of contract (Chapter 33)
 6. Causality and endogeneity
      Marketing management: Effectiveness of direct-to-physician pharmaceutical marketing (Chapter 17)
 7. Choice models
      Marketing management: Automakers’ pricing and promotion planning (Chapter 18)
 8. Bayesian econometrics
      Public policy: Impact of the “Cash for Clunkers” policy (Chapter 27)
 9. Structural models
      Public policy: Impact of the “Cash for Clunkers” policy (Chapter 27)
      Litigation support: Feature valuation using equilibrium conjoint analysis (Chapter 32)
10. Latent structure analysis
      Marketing management: Visualizing competitive market structure (Chapter 19)
      Litigation support: Avoiding bias (Chapter 28); Surveys in trademark infringement (Chapter 34); Surveys to evaluate a claim (Chapter 35)
11. Machine learning
      Litigation support: Machine learning in litigation (Chapter 36)
12. Big data
      Marketing management: Visualizing competitive market structure (Chapter 19); User profiling in display advertising (Chapter 20)
13. Meta analysis
      Marketing management: Generalizations in eight marketing areas
14. Optimization
      Marketing management: Online and offline funnel progression (Chapter 16); Optimization for marketing budget allocation at Bayer (Chapter 21)

MIZIK_9781784716745_t.indd 5 14/02/2018 16:38


MIZIK_9781784716745_t.indd 6 14/02/2018 16:38
METHODS CHAPTERS

PART I

EXPERIMENTAL DESIGNS

1.  Laboratory experimentation in marketing
Angela Y. Lee and Alice M. Tybout

Marketing academics, managers, public policy makers, and litigators
often ponder questions that involve relationships between alternative
treatments or strategies and people’s responses. For example, an academic
may want to test predictions about how individuals’ thinking style may
influence perceptions of brand extensions. Or, a brand manager may want
to know whether an advertisement highlighting a brand’s features is more
effective than one highlighting its emotional benefits in generating positive
attitudes and intentions to purchase among consumers. A public policy
maker may wonder whether a communication using an authority figure
or one using “the person next door” will result in a higher percentage of
people getting tested for colon cancer. And a litigator contesting patent
infringement may seek to establish the extent of lost sales caused by a
competitor incorporating a patented design into its products.
A variety of research approaches, including examination of historical
data, qualitative research, and consumer surveys, may shed some light
on these questions. However, only experiments afford strong causal
inferences about such relationships. Although experiments conducted in
the field often capture the richness of some real-world situations of inter-
est, experiments conducted in the laboratory can provide a much more
rigorous test of a causal relationship and often do so in a manner that
contains costs, saves time, and minimizes the risks of competitor response
or consumer backlash.
Consider McDonald’s, which, like many large companies, has been a
frequent target for rumors and myths that can negatively impact sales.
A well-known case was the rumor that McDonald’s used red worm meat
in its hamburgers (Greene, 1978). The company launched heavy TV and
print campaigns to counter this false information by using highly cred-
ible spokespersons and referencing objective data to debunk the rumor.
Although such a response seems intuitively reasonable and is consistent
with some basic notions of persuasion, it is not without risk. Theories
of information processing suggest pathways by which a direct refutation
could be ineffective and may even backfire. For example, if the rumor is
deemed to be implausible or not credible, then its refutation could have the
undesirable effect of prompting rehearsal of the rumor, thus reinforcing
rather than weakening it. Following this line of reasoning, Tybout, Calder,
and Sternthal (1981) conducted a laboratory experiment to examine the
effectiveness of three different strategies—the direct refutation message
strategy that McDonald’s employed, a reframing message strategy that
weakened the connection between McDonald’s and worms while also
suggesting some favorable associations to worms, and a retrieval mes-
sage strategy that required people to activate prior mental associations
toward McDonald’s that were unrelated to the rumor. They documented
the negative impact of a rumor that McDonald’s hamburgers were made
with worm meat and the ineffectiveness of the direct refutation strategy
McDonald’s employed. Further, they demonstrated that the reframing
and retrieval strategies that were grounded in information processing
theories were effective in countering the negative effect of the rumor on
people’s attitudes toward McDonald’s. Not only did their experiment
establish a clear causal relationship between the various damage-control
strategies and consumers’ attitudes toward McDonald’s, it did so in a con-
trolled setting that reduced monetary costs and minimized the potential
for adverse publicity or competitive interference that might have occurred
had the research been conducted in the field.

The Nature of Experiments

What is an experiment? At the most basic level, an experiment is a study in
which participants are randomly assigned by the researcher to be exposed
to different levels of one or more variables (i.e., independent variables),
and the subsequent effect of this exposure on one or more outcome vari-
ables (i.e., dependent variables) is observed. Thus, an experiment requires
that the researcher identify independent and dependent variables that are
of interest for theoretical or practical purposes and seeks to determine
whether and how these variables are causally linked.
Why do researchers choose to conduct experiments? Experiments are
the best method for establishing a causal relationship between independ-
ent and dependent variables because the researcher controls participants’
exposure to the independent variable(s), thereby ensuring that three
conditions required to draw a strong conclusion about causality are met.
First, there must be covariation such that changes in the independent
variable are associated with changes in the dependent variable. Second,
the change in the independent variable or cause must precede the change
in the dependent variable or effect in time, a condition referred to as
temporal precedence. Finally, no variable other than the independent
variable should provide a plausible account for the effect on the dependent
variable.

In practice, causal relationships are often posited on the basis of covari-
ation observed in historical data, survey responses, or qualitative research.
For example, a manager may examine sales records over time and note
that sales declined following price increases. Or, a writer may seize on
an association between the level of education of a company’s marketing
staff and its market share performance, as was done in an Advertising Age
article announcing, “marketers from companies with significant market-
share gains are far less likely to have M.B.A.s than those from companies
posting significant share losses” (Neff, 2006). Should the conclusion be
that price increases cause sales declines and an MBA education leads to
poorer market share performance? Of course not! Although fundamental
principles of economics might tempt the manager to conclude that, indeed,
raising prices reduces sales, alternative explanations are plausible. Perhaps
competitors dropped their prices at the same time the company raised its
prices or maybe demand for the company’s product varies throughout the
year and the price increases happened to coincide with seasonal downturns
in demand. Likewise, there are undoubtedly numerous differences between
firms that gain versus lose market share other than whether they employ
M.B.A.s to manage the marketing function. The share-gaining and share-
losing firms may vary in terms of size, industry, geographic location, etc.,
and these factors could plausibly affect the intensity of competition, as
well as many other factors that influence market share. In fact, the causal
relationship could be in the opposite direction—low performing firms
might be more motivated to hire M.B.A.s than high performing firms. In
many situations, managers inferring causality from correlation might seek
additional data to rule out alternative explanations, but the alternatives
considered are limited to those they can imagine and the possibility of
additional rivals not addressed by the data always remains.
Ruling out rival explanations is not the only challenge when historical
data serve as the basis for causal inferences. It may also be difficult to
establish temporal precedence because the determination of the start
date of observations is necessarily arbitrary. For example, although
most people would expect advertising to influence sales and hence would
gauge the effectiveness of advertising by examining sales as a function of
advertising expenditure in the same and/or previous period, this approach
may distort the true effect of advertising if the firm’s budgeting strategy is
to spend a certain percentage of last period’s sales on advertising. Thus,
conducting an experiment in which participants are randomly assigned to
treatments and the independent variables of interest are systematically
manipulated is the best way to establish causality.
Returning to the McDonald’s worm rumor study, participants were
recruited to come to a lab setting where, under the guise of evaluating a
television program, the rumor was introduced in the treatment condition
but not in the control condition. Those who heard the rumor were ran-
domly assigned either to hear a direct refutation of the rumor, a message
designed to weaken the association between McDonald’s and the rumor,
or an assertion that activated associations to the McDonald’s brand that
were unrelated to the rumor. Their attitudes toward McDonald’s were
then assessed. Thus, the conditions for establishing causality were met:
first, the independent variable (strategy to counter the rumor) was varied
before the dependent variable (attitude toward McDonald’s) was meas-
ured, and a statistically significant covariation between the independent
and dependent variables was observed. Further, because participants were
randomly assigned to the different treatments or levels of the independent
variable, the groups exposed to each treatment were presumably equiva-
lent in the aggregate a priori (i.e., any differences between and within the
groups such as age, gender, education level, liking for McDonald’s, etc.
would not influence the dependent variable systematically). As a result,
the sole difference between the groups was the treatment to which they
were exposed, making the treatment the only plausible cause for any dif-
ferences in the dependent variable—attitude toward McDonald’s.
Suppose McDonald’s management relied on historical sales data to
make inferences about the impact of the worm rumor and the effectiveness
of the company’s refutation strategy. If the data showed a decline in sales
following circulation of the worm rumor, and that sales rebounded several
months after the company aggressively refuted the rumor, management
might conclude that the rumor caused a downturn in sales, and further
infer that refutation was an effective strategy for combatting the negative
effect of the rumor on sales.
Tybout et al.’s laboratory experiment suggests that the first, but not
the second, inference is warranted. Participants randomly assigned to be
exposed to the rumor evaluated McDonald’s less favorably than those not
exposed to the rumor, ruling out possible rival explanations for the sales
decline based on actions by a competitor, or a general downturn in sales
for the fast food industry, etc. However, participants randomly assigned to
the rumor plus refutation treatment viewed McDonald’s just as negatively
as those exposed to the rumor but who heard no refutation, suggesting the
refutation was not effective in countering the rumor’s effects and that this
strategy should not be used in response to future rumors. The rebound of
sales might instead have occurred because over time consumers recalled
the numerous positive associations they had with McDonald’s prior to
the rumor, and these associations swamped the impact of the rumor. This
interpretation is consistent with the strategies that were found to be effec-
tive in the laboratory experiment and suggests that strategies focused on
reducing the connection between the company and the rumor might be an
effective strategy in response to any future rumors.
In summary, historical, survey, and qualitative data are excellent
sources for hypotheses about relationships between variables, but they are
inadequate to support a strong causal inference. In situations where it is
important to establish causality, an experiment should be conducted.

Choosing between Laboratory and Field Experiments

The distinction between laboratory and field experiments is the setting in
which the research is conducted. Laboratory experiments occur in settings
created by the researcher for the explicit purpose of testing one or more
hypotheses. Volunteers are recruited and come to a designated physical or
online location where they typically receive some form of compensation
in exchange for reacting to certain stimuli presented by the researcher.
Although steps are typically taken to disguise the independent variables
that are of interest and the researcher’s hypotheses, laboratory experi-
ment participants are well aware that they are participating in research
and that their responses may have consequences beyond reflecting their
own desires. At the same time, when the experimental design exposes
participants to a single treatment, the lack of awareness of other condi-
tions reduces the likelihood of hypothesis-guessing even if the induction is
relatively transparent.
By contrast, field experiments occur in natural settings where par-
ticipants encounter treatments and provide responses in what they believe
is the normal course of their everyday life. As a result, field experiments
allow the researcher to assess the impact of a treatment on real world
behavior and not just antecedents of or surrogates for behavior (e.g.,
attitudes, intentions). However, although the field experimenter may design
different treatments and take pains to administer them following random
assignment, she has little control over the natural variation of a myriad of
variables that are not of particular interest, and the presence of which may
make it difficult to pinpoint the relationship of interest even though it exists.
Moreover, because participants in field experiments are unaware of their
role, ethical issues may arise if the research comes to light at a later point in
time. Such was the case when Facebook systematically varied the favorable-
ness of stories in 700,000 users’ newsfeeds in order to determine the effect of
these stories on users’ emotions as reflected in their own postings; or when
OKCupid management randomly suggested bad matches to its users in a
purported effort to test the validity of its date-matching algorithm.

Whether an experiment is better conducted in the laboratory or in the
field depends on how the research findings will be used, as well as the prac-
tical concerns mentioned earlier. An experiment may be conducted with
one of several goals in mind: (1) testing a theory, (2) testing a theory-based
intervention, and (3) establishing a phenomenon or effect and estimating
the magnitude of the effect.

Testing a Theory

In a theory-testing experiment, the goal is to examine predictions derived
from an articulated theory in order to draw conclusions about its merits.
The independent and dependent variables are chosen to test the relation-
ships between abstract constructs posited by the theory. The interest lies
not in the variables per se, but in the relationships between the theoreti-
cal constructs that the variables are assumed to represent. Accordingly,
the focus is not on generalizing the magnitude of the specific outcomes
observed in the experiment; rather, inferences are made about whether
the outcomes are best explained by the theory. If the theory is supported,
it may then be applied to situations within a set of relevant domains (see
Calder, Phillips, and Tybout 1981; Lynch, Alba, Krishna, Morwitz, and
Gurhan-Canli 2012 for more detailed discussions).
In order to provide a strong test of a theory, the researcher strives to
control extraneous factors that might obscure the relationship between
the independent and dependent variables if one actually exists. Failing to
detect a causal relationship that exists between the variables is commonly
referred to as a Type II error, which is closely related to how much statisti-
cal power is afforded given the size of the sample (see later discussion on
power). If participants are very heterogeneous, or if variables unrelated to
the relationship of interest vary dramatically in the natural environment,
the chance of detecting the relationship of interest may be significantly
reduced. For this reason, laboratory experiments, which enable the
recruitment of a relatively homogeneous sample of participants and afford
the researcher control over many variables that are not of theoretical
interest, are typically preferred to field experiments when the goal is to test
theory.
To illustrate theory-testing laboratory experiments, let’s consider the
work of Aaker and Lee (2001), which tested hypotheses grounded in
regulatory focus theory. Regulatory focus theory distinguishes between
two types of goals—promotion goals that involve the pursuit of growth
and accomplishment, and prevention goals that involve the pursuit of
safety and security. The authors proposed that individuals’ view of the
self, which may be either independent or interdependent, would moder-
ate whether a message framed in terms of a promotion or a prevention
goal would be more persuasive. Specifically, they hypothesized that a
promotion goal would be more compatible with an independent self-view,
whereas a prevention goal would be more compatible with an interdepend-
ent self-view; and that compatibility between self-view and goal would
lead to greater persuasion.
Aaker and Lee (2001, exp. 1) tested their hypothesis using a labora-
tory experiment. Type II error was reduced by using a homogeneous
sample—college students from a single university. Participants were
randomly assigned to view one of four versions of a fictional website for
Welch’s Grape Juice that the researchers had constructed to manipulate
the two independent variables, self-view and goal type, while holding
other features of the website constant. After viewing the website, partici-
pants responded to a standard set of questions measuring their attitudes
toward and interest in the product. The findings were consistent with the
regulatory focus-based hypothesis and no alternative interpretation was
apparent. So this research is viewed as supporting and refining regulatory
focus theory. The researchers had no interest in the particular sample of
participants or in Welch’s Grape Juice per se, nor did they attempt to gen-
eralize the specific effects (i.e., evaluations of the website) to other samples
and stimuli. From the standpoint of their goal of testing a hypothesis
grounded in regulatory focus theory, some other homogeneous sample
and website or even a print ad for a different brand in a different category
could provide an equally rigorous test.

Testing an Intervention

The value of theory ultimately lies in its application to real world situa-
tions in the form of theory-based interventions. Researchers may pilot
test these interventions prior to implementing them on a grand scale. In
an intervention-testing experiment, the focus is on the treatments and out-
comes rather than on the abstract theory that led to the selection of these
variables. The goal is to see whether an intervention or treatment has the
desired effect and, if multiple interventions are under consideration, to
gauge their relative effectiveness. Rather than striving to create interven-
tions that vary along a single dimension and controlling for factors unad-
dressed by the theory (as would be the goal in a theory test), researchers
often design interventions that operationalize the theoretical constructs in
multiple ways so as to maximize the likelihood that the intervention will
have the desired impact and relax control over factors that lie outside the
theory to better mimic the natural environment to which the results will
be generalized.

An intervention-testing experiment may be conducted in either a
laboratory or a field setting. The desire to obtain results that generalize
to a natural setting would seem to favor conducting intervention tests
in the field where the implementation of the intervention and contextual
factors cannot be tightly controlled and individuals are unaware of their
role as participants. However, testing an intervention in the field can be
expensive and time-consuming because it may necessitate implementation
on a large scale, and may require the cooperation of a variety of parties
whose interests are not readily aligned. Further, companies that operate
in a competitive environment may fear that conducting a field experiment
could tip their hand to competitors, perhaps allowing them to take actions
that distort the test results and even rush a similar competitive product
to market. In addition, conducting an intervention test in the field where
individuals are unwitting participants can raise ethical concerns and
create backlash, as occurred in the case of field experiments conducted by
Facebook and OKCupid mentioned earlier. As a result, a researcher may
elect to conduct an intervention test in the laboratory. The McDonald’s
worm rumor study is one such example (Tybout et al. 1981). The research-
ers drew on theories of information processing to design potential inter-
ventions and introduced them in a setting that mimicked one where people
might encounter the rumor and McDonald’s response to it.
Work by Tal and Wansink (2015) illustrates the use of both laboratory
and field experiments to test interventions. These authors drew upon
theory about the mental activation of concepts in memory to design
interventions that encouraged consumers to make healthy food purchases.
Their interventions involved priming either healthy or unhealthy food
choices through asking participants to taste (or imagine tasting) food sam-
ples (e.g., apple or cookie) and then observing choices they made either on
a virtual (laboratory) or actual (field) shopping trip. In all experiments,
consumers who were primed to think about healthy choices chose more
fruits and vegetables than those primed to think about unhealthy foods,
leading the authors to recommend consumers having a small healthy
snack before shopping, or grocers offering healthy snack samples in store
to promote healthy living.

Establishing a Phenomenon and its Magnitude

Although the desire to test or apply theory is a common motivation for


laboratory experiments, researchers may conduct such experiments with
the goal of establishing a phenomenon or the magnitude of an effect in the
absence of a well-articulated, abstract theory. For example, a manager
may have an intuition based on sales data across different retail outlets
that sales of a product are tied to its placement within a grocery store
such that sales are greater when the product is displayed next to comple-
mentary categories rather than potentially competing ones (e.g., peanut
butter shelved next to jams and preserves rather than next to soy nut
butter). A litigator may need to estimate sales that were potentially lost
due to a competitor’s infringement on a patent by isolating the effect of
specific product features on consumer preferences. Or a charity may desire
to select the most effective appeal from several executions for generating
donations. In these situations, a field experiment has some obvious advan-
tages. Nevertheless, a laboratory experiment may be the better choice due
to monetary and time constraints.
In summary, if the primary goal is to establish a clear causal linkage
(versus estimating the magnitude of the relationship in natural settings), a
laboratory experiment is preferred. A laboratory experiment may also be
preferred for a variety of practical reasons detailed earlier. An important
additional advantage of conducting an experiment in the laboratory is
the opportunity to solicit participants’ responses to other questions that
may further shed light on the causal relationship. Information such as
age, gender, income, education, past experiences, and their thoughts
and emotions while being exposed to the treatment may also be useful
in identifying why the effect occurs, when it may dissipate or accentuate,
and what kinds of intervention may be useful to enhance or suppress the
effect.

Designing a Lab Experiment

When designing a laboratory experiment, researchers must make a variety
of decisions including determining the number of treatments, the manner
in which these treatments will be administered, the measures that will be
taken to assess the effect of these treatments, how participants will be
chosen, and how many participants will be necessary to achieve a reliable
inference. Key considerations in making these decisions are discussed
below.

Choosing a Passive vs. Active Control Treatment

All experiments have the following elements: independent variables
(operationalized by exposure to treatments, denoted by X) and depend-
ent variables (reflecting the observed effect, denoted by O). The simplest
design has one independent variable (sometimes referred to as factor)
with two levels of treatment, with one of the levels serving as the control
condition. And participants are randomly assigned to each of the treat-
ment conditions.

Experimental Group (EG)    X    O1
Control Group (CG)              O2

Participants in the control condition may receive no treatment (i.e.,
passive control), or they may be exposed to an alternative treatment
(i.e., active control). The no treatment control option is often included to
provide a natural baseline condition to capture the situation where partici-
pants behave as they would in the absence of any treatment; although it
should be recognized that the mere awareness of participating in research
may constitute a treatment of sorts. A no treatment control may be of
particular interest when one is considering an intervention and a realistic
alternative is to do nothing. If the intervention does not perform substan-
tially better than the no treatment control, it will be difficult to justify
allocating any significant time or monetary resources to the intervention.
When the objective of the experiment is to compare the effects of differ-
ent treatments (e.g., two different versions of an advertisement), the design
necessarily involves two alternative treatments. An alternative treatment
may also be used to achieve a tighter control of the experiment even when
the objective is not to test different treatments.

EG 1 X1 O1
EG 2 X2 O2

For example, a researcher interested in the influence of positive mood
on brand choice may prefer to contrast the effects of a positive mood
induction (e.g., asking participants to write about a happy event) with a
neutral mood induction (e.g., asking participants to write about their most
recent trip to the grocery store), rather than a no mood induction. In the
absence of any mood induction, participants might arrive at the labora-
tory varying considerably in their mood based on factors unrelated to the
experiment. In general, it is more difficult to detect an effect with a passive,
no treatment control than with an active, alternative treatment control.
To examine the effect of salient healthy food choices, Tal and Wansink
(2015; exp. 3) conducted a laboratory experiment that included both
a passive and an active control group. In the experiment, participants
were randomly assigned to one of three treatments: one group consumed
a sample of chocolate milk labeled as healthy and wholesome (healthy
prime treatment), a second group consumed the same chocolate milk
but labeled as rich and indulgent (unhealthy prime treatment/active
control), and a third group received no prime (passive control). The
dependent measure was the degree to which participants made healthy
food selections in a subsequent shopping trip at an online grocery. The
passive control provided a baseline measure of participants’ preference
for healthy items in the absence of any prime, whereas the active control
enabled the researchers to control for the effect of the product used in the
prime (i.e., chocolate milk), and to determine the effect of the nature of
the prime (i.e., healthy vs. indulgent) relative to the baseline. The findings
revealed that the healthy prime significantly increased the number of
healthy food choices made relative to both the indulgent prime and the no
prime treatments; whereas the number of unhealthy food choices did not
vary across treatments. These outcomes suggest that people’s food choices
are influenced by the salience of healthy options, but not the salience of
unhealthy or indulgent options.
While randomly assigning participants to the different conditions is meant
to ensure that any effects observed are due to the difference in treatment,
random assignment is sometimes unintentionally violated when research-
ers assign groups of participants to each of the treatment conditions
sequentially over a period of time. This practice is problematic because
participants’ responses may vary depending on conditions that are not
randomly assigned such as the weather, time of day, events reported in
the news, and so on. Thus, a better practice is to concurrently assign
participants to different treatments each time the experiment is run until
the requisite number of participants is achieved.

Between vs. Within-participant Design

When the effect of the treatment is measured by comparing the depend-
ent measures across two different groups as described above, the design
is referred to as a between-participant design. Alternatively, the researcher
may choose a within-participant design in which a single group of partici-
pants is employed for each level of treatment, and measures of the depend-
ent variable are taken both before and after the treatment.

EG 1 O1 X1 O2

The primary advantage of using a within-participant design is efficiency.
By controlling for individual differences, the within-participant design
offers the same statistical power in detecting differences using a smaller
sample. The disadvantage of the within-participant design is that any
effect observed may be open to alternative interpretations. In particular,
the measurement preceding the treatment (O1) may alert participants to
the experimenter’s hypothesis or it may simply encourage participants
to ruminate about their thoughts and feelings. These factors, alone or in
combination with the treatment (X), may account for the change in the
dependent measures observed after the treatment (O2), compromising the
ability to make a strong causal inference. However, these concerns can be
mitigated if the dependent measures are unobtrusive (e.g., the length of time
a participant spends engaging in a task) or are not under participants’ con-
scious control (see discussion of dependent measures later in the chapter).
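To illustrate the efficiency argument, the following sketch (with invented numbers) simulates a pre/post design in which stable individual differences dominate the variance; a paired test on the same data is far more sensitive than an unpaired one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30
baseline = rng.normal(5.0, 2.0, n)           # stable individual differences
o1 = baseline + rng.normal(0, 0.5, n)        # O1: measure before treatment
o2 = baseline + 0.4 + rng.normal(0, 0.5, n)  # O2: true treatment effect of 0.4

# The paired test uses each participant as his or her own control . . .
print("paired:  ", stats.ttest_rel(o1, o2))
# . . . while treating the same data as independent groups leaves the large
# between-person variance in the error term and weakens the test.
print("unpaired:", stats.ttest_ind(o1, o2))
```
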

Single vs. Multiple Factors

When the main objective of the experiment is to compare the effects of dif-
ferent treatments (as in an intervention or effect test), a single factor design
with as many levels of treatments as desired may be adequate. However,
when the objective of the experiment is to delve into the why or how some-
thing happens (as in theory testing), a design involving multiple factors
may be needed for at least two reasons.
First, multiple factors may be included for the simple reason that some
theories specify moderators or boundary conditions. The simplest multi-
factor design is a 2 (XA1, XA2) × 2 (XB1, XB2) design, with participants being
randomly assigned to each of the four experimental groups:

          XB1      XB2
XA1       EG 1     EG 2
XA2       EG 3     EG 4

The model of this two-factor design is:

$y_{ijk} = \mu + \tau_j + \lambda_k + (\tau\lambda)_{jk} + \varepsilon_{ijk}$

where $\mu$ is the grand mean, $\tau_j$ is the main effect for the $j$th level of treatment $X_A$, $\lambda_k$ is the main effect for the $k$th level of treatment $X_B$, $(\tau\lambda)_{jk}$ is the interaction effect for $X_{Aj}$ and $X_{Bk}$, and $\varepsilon_{ijk}$ is the error term.
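To make the model concrete, the sketch below simulates a 2 × 2 between-participant experiment and estimates the two main effects and their interaction with statsmodels; the factor labels, cell means, and sample size are all invented for the illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_per_cell = 64
rows = []
for a in ("XA1", "XA2"):
    for b in ("XB1", "XB2"):
        # Invented cell means: two small main effects plus a crossover interaction.
        mu = (5.0 + 0.3 * (a == "XA2") + 0.2 * (b == "XB2")
              + 0.8 * ((a == "XA1") == (b == "XB1")))
        for y in rng.normal(mu, 1.0, n_per_cell):
            rows.append({"A": a, "B": b, "y": y})
df = pd.DataFrame(rows)

# Estimates the grand mean, both main effects, and the interaction term.
fit = smf.ols("y ~ C(A) * C(B)", data=df).fit()
print(fit.summary().tables[1])
```
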
As an example, the Aaker and Lee (2001) study that was discussed
earlier used a 2 × 2 design to test the hypothesis that individuals’ self-view
moderates whether a promotion or prevention message frame is more
persuasive. The researchers varied the content of a website for Welch’s
Grape Juice that encouraged participants to adopt one of two self-views
(independent or interdependent) and exposed them to a persuasive message evoking one of two goal orientations (promotion or prevention).
Another reason to include multiple factors is to help rule out alternative
explanations. While random assignment to experimental treatments serves
to isolate the causal variable, the interpretation of this variable in terms
of the construct it represents is not unique. This is because a variable can
operationalize multiple constructs (and the reverse is also true—a con-
struct can be operationalized by multiple variables). Thus, simply showing
an effect does not allow the researcher to unambiguously establish the
proposed relationship. Returning to the Aaker and Lee (2001) study,
consider how these researchers represented the construct of self-view
in their initial experiment. They did so by varying whether the website
for Welch’s Grape Juice highlighted benefits of the beverage for oneself
(intended to activate an independent self-view) or one’s family (intended
to activate an interdependent self-view). Although it is reasonable to argue
that these treatments represented the construct in the intended manner,
they might also have varied participants’ involvement in the task, with
participants being more involved when the site focused on the benefits of
grape juice to themselves rather than to their families. If so, an alternative
explanation for the findings could be presented in which involvement and
goal focus rather than self-view and goal focus explained the findings. To
rule out alternative explanations, multiple variables that might represent
the construct could be employed. If the effects of these variables converge,
then the plausibility of rival explanations is reduced. This strategy was
employed by Aaker and Lee, who used a more elaborate three-factor
design in their Experiment 2 to test the relationship between self-view and
goal focus, using people’s ability to recall the information as the depend-
ent variable. In this study, self-view was varied by priming an independent
or interdependent view (as in Experiment 1) as well as by recruiting
participants from two different cultures known to be associated with dif-
ferent self-views (American-independent, Chinese-interdependent). They
found that American participants as well as those whose independent
self-view was activated had better recall of the promotion-framed than
the prevention-framed message, whereas Chinese participants as well as
those whose interdependent self-view was activated had better recall of the
prevention-framed than the promotion-framed message. The convergence
of the effects of culture and self-view priming on participants’ memory of
the message strengthened the theory test that different goal orientations
are associated with distinct self-views by limiting the likelihood of a rival
explanation of involvement for the results.
In general, adding independent variables to an experiment may increase
the rigor of the theory test by ruling out rival interpretations and identify-
ing the specific conditions under which the hypothesized effect occurs.
However, this benefit comes with a cost. As the model becomes more
complex, the interpretation of the interaction effects can get progressively
more difficult. An alternative to expanding a design to include more
factors is to conduct several experiments, each employing a simple 2 × 2
design but differing in context or in the variables used to operationalize
the constructs.
Irrespective of the number of factors in the basic design, there may be
times when it is desirable to control for the effects of some “nuisance”
variables (i.e., factors that lie outside the theory but are likely to introduce
systematic variation in participants’ responses). For example, if Aaker and
Lee (2001) had recruited participants from four different universities or
employed websites for not one but four brands, they might wish to control
for the idiosyncratic effects of these variables by randomly assigning
participants to one of the 16 conditions according to a Latin Square design
as illustrated below:

               Brand 1   Brand 2   Brand 3   Brand 4
University 1      A*        B         C         D
University 2      B         C         D         A
University 3      C         D         A         B
University 4      D         A         B         C

Notes:
* A = Independent self-view/Promotion frame.
B = Independent self-view/Prevention frame.
C = Interdependent self-view/Promotion frame.
D = Interdependent self-view/Prevention frame.

This design assumes there is no interaction between the variables of
interest (self-view and message frame in this example) and the nuisance
variables (participant's university and brand). That is, the effect of
self-view × message frame does not vary by university or by brand. And
each participant’s response is modeled as follows:

$y_{ijkl} = \mu + \rho_i + \beta_j + \tau_k + \lambda_l + (\tau\lambda)_{kl} + \varepsilon_{ijkl}$

where $\mu$ is the grand mean, $\rho_i$ is the effect of the participant's university $i$, $\beta_j$ is the effect of brand block $j$, $\tau_k$ is the treatment effect of self-view, $\lambda_l$ is the treatment effect of message frame, and $(\tau\lambda)_{kl}$ is the interaction effect for the combination of the $k$th level of self-view and the $l$th level of message frame.
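The cyclic structure of such a square makes it straightforward to generate in code; the sketch below reproduces the 4 × 4 assignment in the illustration above (condition labels A–D as defined in the notes).

```python
conditions = ["A", "B", "C", "D"]  # the four self-view x frame combinations
universities = [f"University {i}" for i in range(1, 5)]
brands = [f"Brand {j}" for j in range(1, 5)]

# Cyclic construction: each row rotates the condition list one step further,
# so every condition appears exactly once in each row and each column.
latin_square = {
    uni: {brand: conditions[(i + j) % 4] for j, brand in enumerate(brands)}
    for i, uni in enumerate(universities)
}

for uni in universities:
    print(uni, [latin_square[uni][b] for b in brands])
```
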
Full vs. Fractional Factorial Design

When the objective of the research is to test for both main and interaction
effects, as is typically the case in theory-testing research, a full factorial
design is used where every level of one factor is crossed with all levels of the
other factors. This was the case for both of the Aaker and Lee (2001) experi-
ments described above. A full factorial design ensures that all the independ-
ent variables in the model, including the interaction terms, are orthogonal to
each other so that each of the effects could be estimated independently of all
other effects. Sometimes for efficiency it is desirable to use just a subset (i.e.,
a fraction) of the experimental conditions of a full factorial design, care-
fully chosen to preserve the orthogonality of the design. With a fractional
factorial design, the researcher will be able to estimate the main effects with
a much smaller sample, but will not be able to estimate all the interaction
effects. One instance of a fractional factorial design is the Latin Square
design described earlier. A common use of fractional factorial designs is in
conjoint studies (see Chapter 3 on conjoint analysis in this volume).
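As a minimal illustration of the idea, a half fraction of a 2³ design can be obtained by aliasing the third factor with the interaction of the first two (defining relation C = AB); the sketch below, using coded ±1 levels for three unnamed factors, verifies that the main-effect columns remain orthogonal.

```python
from itertools import product

# Full 2^3 factorial in coded -1/+1 units for three unnamed factors.
full = list(product((-1, 1), repeat=3))

# Half fraction 2^(3-1) with defining relation C = AB: keep only the runs
# in which the third factor equals the product of the first two.
half = [(a, b, c) for a, b, c in full if c == a * b]
print("runs:", half)  # 4 runs instead of 8

# The main-effect columns remain orthogonal: every pairwise dot product is 0.
for i, j in ((0, 1), (0, 2), (1, 2)):
    print(f"columns {i} and {j}:", sum(run[i] * run[j] for run in half))
```
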
Another strategy that makes efficient use of participants is to “yoke”
additional cells to a simple factorial design. The Tybout et al. (1981)
experiment illustrates this strategy. The basic design in this study was
a 2 × 2 factorial where the participants were randomly assigned to one
of four conditions created by crossing mention of the worm rumor
(rumor absent, rumor present) with the inclusion of questions prompting
retrieval of prior attitudes toward McDonald’s (questions absent, ques-
tions present). Two additional treatments were yoked to the condition
where the rumor was introduced and the retrieval questions were absent.
In the first yoked treatment condition, McDonald’s refutation of the
rumor was presented. In the second condition, a response designed to
weaken the connection between McDonald’s and worms while making
people’s mental associations to worms more positive was presented. The
design is depicted below.

                          No Rumor      Rumor
No Retrieval Questions    EG 1          EG 2    EG 5*    EG 6**
Retrieval Questions       EG 3          EG 4

Notes:
* Rumor, no retrieval questions, McDonald’s refutation.
** Rumor, no retrieval questions, a message designed to weaken the connection between
McDonald’s and worms and making people’s associations to worms more positive.
Notice that the yoked treatments could have been included as addi-
tional treatments in a fully crossed design by allowing the retrieval ques-
tions variable to assume four rather than two levels. Doing so would have
required eight cells rather than six cells, while allowing the researchers
to examine the effectiveness of dual-approach strategies (e.g., retrieval
questions + McDonald’s refutation). Yet another design could be a single-
factor design with five conditions (EG 1, EG 2, EG 4, EG 5, and EG 6) if
the researchers were not interested at all in people’s attitudes when prior
associations are made salient in the absence of a rumor. The key consid-
eration to bear in mind in design selection is how efficient the design is in
serving the objectives of the research.

Choosing Dependent Variables

There are many types of dependent measures that researchers can use to
assess the effects of the independent variables in a laboratory experiment.
The decision of which measures and how many to include will depend
on the goal of the experiment. Theories specify not only outcomes, but
also processes by which the outcomes occur. Thus, in testing theories, the
researcher may include the outcome measures to capture the proposed
effect, such as participants’ beliefs about or dispositions toward certain
brands or products (i.e., the dependent variable), as well as measures that
allow inferences about the process underlying those outcomes (i.e., the
mediator variable). These process measures serve to strengthen the test
of the theory by allowing the researcher to conduct mediation analyses
to uncover the mechanism that drives the proposed effect. By contrast,
when conducting an intervention test or seeking to establish an effect, the
researcher is primarily interested in whether a desired outcome occurs in
response to the treatments, and is less interested in the process that led to
that outcome, in which case a smaller set of measures may be included. In
the next sections, we describe some of the more commonly used measures
in lab experiments.

Self-reported thoughts, mood, beliefs, attitudes, and intentions
Participants may be asked to write down their thoughts in response to
different treatments; but more typically, they are asked to report their
mood or express their beliefs, attitudes, and intentions using multiple-
item rating scales. Some common examples include the Likert scale
(strongly disagree–strongly agree), semantic differential scale (e.g., cheap–
expensive; very ineffective–very effective), and behavioral intention scale
(e.g., definitely would not buy–definitely would buy). Multiple items are
often used for each dependent variable so that a more stable indicator of
the underlying construct can be obtained than would occur with a single
item. These items are then combined to create an index that serves as the
dependent variable in the data analysis.
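A minimal sketch of this step, with made-up seven-point ratings, computes Cronbach's alpha as a check of internal consistency and then averages the items into the index:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants x n_items) rating matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)

# Made-up 7-point ratings: three attitude items from five participants.
ratings = np.array([
    [6, 7, 6],
    [4, 4, 5],
    [2, 3, 2],
    [5, 5, 6],
    [7, 6, 7],
])
print("alpha:", round(cronbach_alpha(ratings), 2))   # internal consistency
index = ratings.mean(axis=1)                         # the attitude index
print("index:", index)
```
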

Choice/behavior
Participants may also be asked to make choices or engage in certain
behaviors. For example, they may be sent on an online shopping trip
where there are real consequences associated with the choices made (e.g.,
participants receive these products as compensation for participating in
the study). Or participants may be asked to sample a food product and
the amount that they consume is measured as an indicator of their liking.
Or, participants may be asked to serve as a spokesperson for a cause and
the length and detail of their advocacy may serve as an indicator of the
strength of their support for the cause.

Memory and process measures
Participants typically have some control over their responses when they
self-report their attitudes and behavioral intentions or make conscious
choices. The implicit assumption is that participants have access to their
attitudes and feelings, which is not always true. Further, their responses
may be subject to the social desirability response bias. The laboratory
setting allows the administration of other measures over which partici-
pants have less conscious control. These include recall and recognition of
stimuli presented in the experiment, reaction times to questions, and
physiological measures of attention and arousal such as eye-tracking,
galvanic skin response (GSR), electromyogram (EMG), electroencepha-
logram (EEG), and functional magnetic resonance imaging (fMRI).
Inclusion of these measures is particularly useful when the researcher is
trying to capture automatic responses. However, physiological measures
are expensive to administer on a large scale and their obtrusiveness may be
distracting to participants.

Measures of demographic characteristics and individual differences
As noted earlier, when theory testing is the goal, the sample should be
relatively homogenous on dimensions not of theoretical interest; whereas
when intervention or effects testing is the goal, the sample should reflect
the heterogeneity observed in the natural setting to which the researcher
hopes to apply the findings. Measures of demographic variables such as
age, gender, education, country of origin and income are often included to
determine whether the sample has the desired homogeneity/heterogeneity.
Demographic variables as well as scales that measure individual differ-
ences in personality traits or disposition (e.g., Cacioppo and Petty (1982):

Need for Cognition Scale; Snyder (1974): Self-monitoring Scale) can also
be used to operationalize theoretical concepts. This was the case in the
Aaker and Lee (2001) experiment discussed earlier where participants’
cultural background (American or Chinese) served as one operationaliza-
tion of self-view. Further, demographic characteristics and individual dif-
ferences may be used to partition the data post hoc to explore whether
the same or different effects are observed in subsets of the sample. Thus,
including these measures can be useful in determining the robustness of
effects or in exploring potential moderators post hoc.
When multiple measures are included in the design, the researcher must
consider the order in which they are presented because there is a risk that
initial measures may influence subsequent measures. For example, asking
participants to recall information presented in the treatment just before
expressing their attitude could alter their attitude by encouraging them
to rely on the recalled information that they otherwise may not use. One
approach to addressing these concerns is to present the dependent measure
of greatest interest first and recognize the potential for order effects on
subsequent measures. An alternative strategy is to counterbalance the
order of the measures and make order a blocking variable in the design
to identify potential biases. In the event an order effect is detected, the
researcher may have to consider using dependent variables that are less
likely to have an order effect, such as those used to assess nonconscious
processes (e.g., response time), or collecting data on these variables using
separate experiments.

Selecting a Sample

Historically, participation in a laboratory experiment required people
to show up at a physical location. Today, many experiments are still
conducted in the physical lab, but a growing number of experiments are
conducted online where participants can provide their responses anywhere
via a computer or a mobile device.
Online labor markets such as Amazon’s Mechanical Turk (AMT),
Freelancer, and Guru are now used to recruit research participants. The
possibility of conducting research online allows researchers to access a
more diverse population than university students or shoppers inter-
cepted at shopping malls. A recent study comparing samples in political
science research found that AMT respondents are more representative of
the US population than the convenience samples typically used in in-per-
son experiments, although they are not as representative as, say, a national
probability sample (see Berinsky, Huber, and Lenz, 2012). Further, the
anonymity afforded by online studies may encourage participants to be
more candid in their responses. However, the biggest disadvantage of using
an online labor market for research participants is the loss of control.
When responses are collected online, the researcher has little knowledge
of or control over the environment surrounding the participants. Further,
the identity of the participant is difficult to verify (Marder, 2015). There is
also a growing concern that participants recruited from online pools are
savvy, professional survey takers who participate in hundreds of studies
per week. As a result, they often become familiar with commonly used
experimental manipulations and scales, and the responses they provide
may be different from those of a naïve participant that researchers observe
in a lab experiment. Thus, researchers using online pools are advised to use
novel manipulations to operationalize variables of interest, include differ-
ent attention checks in the survey to identify those who may be responding
to the questions mindlessly without even reading the instructions, and to
use a larger sample to reduce the within-cell variance.

Determining Sample Size

How many participants one needs for an experiment depends on several
considerations: What is the significance criterion (α)? How much statisti-
cal power is desired (1 – β)? What is the likely effect size (ES)? What test
statistic will be used to analyze and interpret the data?
The criterion of statistical significance is the researcher’s desire to
control for Type I error—the probability of mistakenly “discovering” an
effect that does not exist. Typically the maximum risk of committing this
error is set to α = .05. Another sample size consideration has to do with the
power of the experiment. Power refers to the researcher's desire to control
for Type II error—the probability of failing to detect an effect that exists.
The conventional specification of the Type II error is β = .20, and the
power of the test is 1 – β = .80. The sample size is a function of α, β, and
the magnitude of the effect (i.e., ES). Some simple guidelines with illustra-
tive sample sizes are provided by Cohen (1992). For example, to detect a
medium difference in means between two groups at α = .05 and β = .20,
a sample size of 64 in each condition (i.e., total of 128) is needed; and to
detect a small (large) difference, a sample size of 393 (26) per condition is
needed.1
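These figures can be reproduced with standard power routines; for example, the following sketch uses statsmodels to solve for the per-cell sample size at Cohen's (1992) small, medium, and large effect sizes.

```python
from statsmodels.stats.power import TTestIndPower

power_analysis = TTestIndPower()
for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    n = power_analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                                   alternative="two-sided")
    print(f"{label} effect (d = {d}): about {n:.0f} participants per cell")
```

The solver returns roughly 393, 64, and 26 participants per cell, matching the guidelines above.
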
In the August 2015 issue of the Journal of Consumer Research, of the 49
lab experiments reported across the eight empirical papers, the maximum
sample size per cell was 189, and the minimum was 9, with a mean of 50
and a median of 42. With most of the effect sizes typically studied in the
literature being medium or small, it seemed that many of these studies
might be underpowered. However, when researchers use multiple studies
to examine the phenomenon of interest to demonstrate robustness or to
identify boundary conditions, the aggregate sample size would likely be
adequately powered to detect the effect. Further, there may be additional
benefits in running multiple small studies to examine a particular phenom-
enon over running one large study—it allows the researcher to quantify
between-study variation in their quest to test for robustness of the effect
across different contexts, thereby allowing for a more efficient estimate of
the population average effect size and a better calibration of Type I error
(McShane and Böckenholt 2014).

Concluding Remarks

The focus of this chapter is on when it is appropriate to conduct a labo-
ratory experiment and how to design such an experiment. Experiments
are valued for their ability to support strong causal inferences about the
relationship between independent and dependent variables. In compari-
son to field experiments, lab experiments typically afford the researcher
greater control over factors that are not of interest and the ability to detect
a relationship of interest if it indeed exists. By contrast, field experiments
prioritize assessing whether the relationship of interest is powerful enough
to emerge despite the “noise” created by the variation in non-focal factors
in a natural setting.
To illustrate when a laboratory versus a field setting may be more
appropriate for examining a causal relationship, we have described
three possible goals that a researcher may have in mind: theory-testing,
intervention-testing, and effects-estimation. In theory-testing experiments,
the data are valued as evidence for or against some abstract construct
relationship; whereas in intervention-testing and effects-estimation experi-
ments, the specific findings are of interest in their own right, either because
they indicate how an intervention is likely to perform in a natural set-
ting, or they estimate the magnitude of an effect that is of interest. It is
important that this characterization of the three distinct goals not obscure
the necessity of some explanation regardless of the researcher’s goal. The
selection of the independent and dependent variables for investigation
presupposes some theoretical explanation, even if the causal model may
not be theoretically formalized, as any application of the findings beyond
the research setting relies not just on statistical generalization but also the
validity of the explanation.

Note

1. When comparing between means, Cohen (1988) considered an ES ($d = (\mu_1 - \mu_0)/\sigma$) of .20
   to be small, $d = .50$ to be medium, and $d = .80$ to be large. When comparing between two
   proportions ($P$), he considered an ES ($h = \phi_1 - \phi_2$, where $\phi_i = 2\arcsin(\sqrt{P_i})$) of .20 to
   be small, $h = .50$ to be medium, and $h = .80$ to be large. And when assessing correlations,
   $r = .10$ is considered small, $r = .30$ is medium, and $r = .50$ is large.

References

Aaker, Jennifer L. and Angela Y. Lee (2001), “‘I’ Seek Pleasures and ‘We’ Avoid Pains: The
Role of Self-Regulatory Goals in Information Processing and Persuasion,” Journal of
Consumer Research, 28 (June), 33–49.
Berinsky, Adam J., Gregory A. Huber and Gabriel S. Lenz (2012), “Evaluating Online Labor
Markets for Experimental Research: Amazon.com’s Mechanical Turk,” Political Analysis,
20, 351–368.
Cacioppo, John T. and Richard E. Petty (1982), “The Need for Cognition,” Journal of
Personality and Social Psychology, 42(1), 116–131.
Calder, Bobby J., Lynn W. Phillips and Alice M. Tybout (1981), “Designing Research for
Application,” Journal of Consumer Research, 8(September), 197–207.
Cohen, Jacob (1988), Statistical Power Analysis for the Behavior Sciences. Hillsdale, NJ:
Erlbaum.
Cohen, Jacob (1992), “A Power Primer,” Psychological Bulletin, 112(1), 155–159.
Greene, Bob (1978), “Worms? McDonald’s Isn’t Laughing,” Chicago Tribune (November
20), p. 1, Section 2.
Lynch, John G., Joseph W. Alba, Aradhna Krishna, Vicki G. Morwitz and Zeynep Gurhan-
Canli (2012), “Knowledge Creation in Consumer Research: Multiple Routes, Multiple
Criteria,” Journal of Consumer Psychology, 22, 473–485.
Marder, Jenny (2015), “The Internet’s Hidden Science Factory,” PBS, http://www.pbs.
org/newshour/updates/inside-amazons-hidden-science-factory/, February 11 (last accessed
October 3, 2017).
McShane, Blakeley and Ulf Böckenholt (2014), "You Cannot Step into the Same River
Twice: When Power Analyses are Optimistic," Perspectives on Psychological Science, 9(6), 612–625.
Neff, Jack (2006), “Don’t Study Too Hard: MBA Marketing,” Advertising Age (March 20).
Snyder, Mark (1974), “Self-monitoring of Expressive Behavior,” Journal of Personality and
Social Psychology, 30(4), 526–537.
Tal, Aner and Brian Wansink (2015), “An Apple a Day Brings More Apples Your Way:
Healthy Samples Prime Healthier Choices,” Psychology & Marketing, 35(May), online.
Tybout, Alice M., Bobby J. Calder and Brian Sternthal (1981), “Using Information
Processing Theory to Design Marketing Strategies,” Journal of Marketing Research,
18(February), 73–79.



2.  Field experiments
Anja Lambrecht and Catherine E. Tucker

The digital revolution has led to an explosion of data for marketing. This
‘Big Data’ available to researchers and practitioners has created much
excitement about potential new avenues of research. In this chapter, we
argue that an additional large and potentially important part of this revo-
lution is the increased ability for researchers to use data from field experi-
ments facilitated by digital tools.
Marketing as a field, perhaps because of its historical relationship with
psychology, has embraced and idealized field experiments from an early
stage in its evolution. Roberts (1957), when evaluating statistical inference
as a tool for Marketing Research, wrote the following still powerful pas-
sage on the merits of field experiments:

In experimental applications, managerial actions are actually tried out with the aim of discovering the responses to these actions. All other applications are nonexperimental or ‘observational.’ [. . .]

The key to modern statistical design of experiments is withholding experimental stimuli at random. To the extent that randomization and the other conditions above are met, the responses actually observed will reflect the ‘true’ effects of the stimuli plus random or chance variation. Statistical procedures then need cope only with the interpretation of chance variation.

In other words, marketing research has from the beginning drawn a
clear and favorable line between experimental techniques which allow a
causal interpretation and everything else. Therefore, we emphasize that
the aim of this chapter is not to claim any novelty in our procedural
guide to the use of field experiments in marketing research, but instead
to attempt to update these techniques for a digital world that has made
their implementation easier, and to provide a guide to the pitfalls of such
techniques for researchers who are new to them.
In this chapter, we set out the field experiment methodology and its main advantages, and lay out some general guidance for the interpretation of statistical results from field experiments. We then consider various applications of field experiments to marketing, and conclude by emphasizing the limitations of this methodology.


A Description of Field Experiment Methodology

In this section, we describe why field experiments are useful from a statisti-
cal point of view and five steps that researchers need to reflect upon when
designing a field experiment and interpreting its results. The focus of this
chapter is field experiments or interventions in the real world, rather than
the laboratory. The Lee and Tybout chapter in this volume discusses the
lab experiment method and we encourage interested readers to read that
chapter for more information.

Why a Field Experiment?

The raison d’être of a field experiment is to provide causal inference. List
(2011, 8), in his justification of the use of field experiments, puts it well
when he says that ‘The empirical gold standard in the social sciences is to
estimate a causal effect of some action.’ Therefore, it is useful for market-
ing researchers to understand the econometric framework, upon which
basis field experiments make their claim to provide causal inference that is
superior to other techniques.
A useful approach is that of ‘potential outcomes’ (Rubin, 2005).1 In this
approach, for any treatment x, each individual i has two potential outcomes:

●  yi1 if individual i experiences x
●  yi0 if individual i does not experience x

The difference between yi1 and yi0 is the causal effect. However, this is
problematic to measure, because a single individual i cannot both receive
and not receive the treatment. Therefore, only one outcome is observed for
each individual. The unobserved outcome for any individual is the ‘coun-
terfactual.’ Because counterfactuals are unobservable, the researcher must compare people who experience x with different people who do not, even in a field experiment. What a field experiment ensures is that, ex ante, via random assignment, any differences between the treated and control groups should not matter.
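This logic can be illustrated with a short simulation. The following Python sketch uses entirely hypothetical data; it shows that, under random assignment, a simple difference in group means recovers the average treatment effect even though no individual’s counterfactual is ever observed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical potential outcomes: yi0 without treatment, yi1 with treatment
y0 = rng.normal(10.0, 2.0, n)
y1 = y0 + 0.5                      # assume a true causal effect of 0.5

# Random assignment: an independent digital coin toss for each individual
treated = rng.random(n) < 0.5

# Only one potential outcome is observed per individual
y_obs = np.where(treated, y1, y0)

# Difference in means is an unbiased estimate of the average treatment effect
ate_hat = y_obs[treated].mean() - y_obs[~treated].mean()
print(round(ate_hat, 3))           # close to the true 0.5
```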

Step 1: Decide on Unit of Randomization

The above framework makes the motivation for the use of field experi-
ments straightforward. However, the term ‘random assignment’ and its
implementation turn out to be far more challenging than they appear

in this theoretical setting. Before random assignment can occur, the
researcher needs to decide at what degree of granularity random assign-
ment should occur. Theoretically, randomization could happen, for
example, at the level of the individual, household, town, website, store or
firm. Often, this choice of the ‘unit of randomization’ will determine the
success of a field experiment in terms of statistical power as well as how
convincing the results of the field experiment are.
At the highest level of generality, the statistical power of a randomized
experiment is likely to increase with greater granularity of the unit of
randomization. To consider why, contemplate the following scenario:
Imagine a firm selling bottled water wants to use a field experiment to test
different pricing strategies. It decides (at random) to test ‘everyday low
pricing’ west of the Mississippi and ‘hi–lo’ pricing east of the Mississippi.
In other words, there are just two units—in this case geographical clusters
of stores—that are randomized. Imagine too, that a drought hits the
territory west of the Mississippi at the same time as the experiment. Then,
even if everyday low pricing appears to be selling more bottled water, it
is not clear whether this was due to the randomized experiment or to the
drought. Put differently, the lack of granularity in randomization made it
more likely that a systematic shock would be associated with one territory,
undermining the ex ante guarantee that unobserved differences between
the groups do not matter.2 Given this challenge, a researcher might
constraints that argue against granularity. First, there are the constraints
imposed by the costs and logistics of having a finely grained unit of
observation. Second, the researcher needs to minimize the potential for
spillovers and crossovers between experimental treatments.
In a non-digital environment, randomization is often constrained
simply by the ability to identify an individual and deploy a randomization
algorithm. However, the digital environment makes the conduct of very
granular field experiments straightforward and easy. The ease of such
a procedure has led to a new managerial language of ‘split tests’ or ‘a/b
testing’; commercial firms such as Optimizely3 now allow managers to
independently and easily run field tests to evaluate the effects of different
landing pages or website content, using the highly granular randomization
unit of an individual website visit.
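As a minimal sketch of such a ‘digital coin toss’ (the variant names are hypothetical, and this is not the implementation any particular vendor uses):

```python
import random

def assign_visit():
    # Each website visit gets its own independent coin toss, making the
    # individual visit the unit of randomization
    return random.choice(["control_page", "treatment_page"])

print(assign_visit())
```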
However, in an offline environment, maintaining more units for rand-
omization could potentially still be very costly or logistically difficult. For
example, suppose a researcher wanted to evaluate the effect of different ‘sales
scripts’ on the performance of a sales department. Potentially, it might be
attractive to randomize which sales script was used for each call. However,

practically and logistically it might be simpler and cheaper if instead each
sales person would be randomly allotted to perform a single sales script when
making calls. This would reduce training costs and organizational complex-
ity. However, it introduces the risk of systematic bias if, for example, more
able sales people were accidentally randomized into one condition rather
than another. Of course, it is possible to use stratified randomization if such
ability is observable in advance, but potentially it may not be.

Step 2: Ensure No Spillover and Crossover Effects

When choosing the unit of randomization, a more pressing problem than costs or logistical complexity is the need to minimize spillovers and crossovers between experimental treatments. A spillover occurs when a treated individual (or other unit) affects the outcomes of untreated individuals.4
Suppose a firm randomly selected an individual to receive a free mobile
phone. Potentially their adoption of a mobile phone could affect the adop-
tion outcomes of their relatives and friends, even if those relatives and
friends were supposedly untreated. If such spillovers are a large concern,
then one way of addressing them would be to randomize at the level
of plausibly isolated social networks such as a community, rather than
randomizing at the level of the individual.5
A crossover occurs when an individual who was supposed to be assigned
to one treatment is accidentally exposed to another treatment. Suppose,
for example, a canned soup company is testing different advertising
messages in different cable markets, and an individual is exposed to a
different advertising message from that of their home market because
they are travelling. This could potentially lead to mismeasurement of the
treatment, especially if there were systematic patterns in travel which led
such crossovers to not simply be random noise. Indeed, this is one issue
we faced even in a digital context in Lambrecht and Tucker (2013), where
randomization was implemented on an individual-day level rather than
at the level of the individual. When an individual arrived at a website, a
digital coin-toss determined whether they were exposed to a personalized
ad, taking no account of what type of ad the individual had previously
been exposed to. So an individual could be placed into different condi-
tions on different days, and the number of different conditions they were
placed into was itself related to their frequency of website use. Here, we
took care to include appropriate control variables, but this potential for
crossover between advertising conditions could have been addressed in the
experimental design if the firm we were working with had randomized at
a less granular level.
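One common way to rule out such crossovers is to randomize at the level of the individual by hashing a stable user identifier, so the same user always lands in the same condition across visits and days. A minimal sketch, with hypothetical identifiers and salt:

```python
import hashlib

def assign_user(user_id, salt="ad_test_2017", n_arms=2):
    # Deterministic assignment: hashing the user id with an
    # experiment-specific salt maps each user to a fixed arm
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % n_arms

# The same user keeps the same condition on every visit
assert assign_user("user-552") == assign_user("user-552")
print(assign_user("user-552"))
```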


Step 3: Decide on Complete or Stratified Randomization

The second question that a researcher should tackle after establishing the
unit of randomization is whether to conduct stratified randomization or
complete randomization.
In complete randomization, individuals (or the relevant unit of rand-
omization) are simply allocated at random into a treatment. In stratified
randomization, individuals are first divided into subsamples based on
covariate values so that each of the subsamples are more homogenous
relative to that covariate than the full sample. Then, each individual
in each of these subsets is randomized to a treatment.6 This stratified
technique is useful if a covariate is strongly correlated with an outcome.
For example, household income may be strongly correlated with purchase
behavior towards private label brands. Therefore, it may make sense, if the
researcher has access to household-level data, to stratify the sample prior
to randomization to ensure sufficient randomization occurs within, for
example, the high-income category.
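A minimal sketch of stratified assignment, using a hypothetical household panel with an income covariate:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

households = pd.DataFrame({
    "household_id": range(12),
    "income": ["high", "low"] * 6,   # covariate correlated with the outcome
})

def stratified_assign(df, stratum):
    # Shuffle units within each stratum, then alternate arms so that
    # every income group is split evenly between treatment and control
    parts = []
    for _, group in df.groupby(stratum):
        group = group.sample(frac=1, random_state=int(rng.integers(10**9)))
        group["arm"] = np.where(np.arange(len(group)) % 2 == 0,
                                "treatment", "control")
        parts.append(group)
    return pd.concat(parts).sort_values("household_id")

print(stratified_assign(households, "income"))
```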
There is a relatively large empirical literature discussing the merits of
different approaches to stratification in the context of schooling experi-
ments and experiments within the developing world. For examples of this
debate, see Bruhn and McKenzie (2008) and Imai et al. (2008, 2009). It is
worth pointing out, though, that the typical school setting on which this
debate is focused is often less relevant to marketing applications. First,
often in marketing it is hard to collect reliable data before an experiment
which would allow stratification and subsequent random assignment
before the experiment. Second, much of the debate is motivated by experi-
mental treatments such as a change in school class size which are very
costly and therefore obtaining statistical efficiency from a small number of
observations is paramount. For example, when randomizing 30 different
schools into different class-size conditions, one might not obtain any sta-
tistical precision in estimates simply because by unlucky chance the richest
schools were all randomly allocated into the lowest class-size condition.
However, for many marketing applications such as pricing or advertising,
the kind of cost constraints that would restrict the researcher to looking
at only 30 units of observation are less likely to be present. Furthermore,
reliable data that would allow such stratification may not be present.

Step 4: Ensure that Appropriate Data Are Collected

After ensuring that randomization is appropriate, researchers should
carefully consider what type of data they need for their later analysis
and ensure the practical set-up allows them to collect this data. This is
particularly important in digital environments where different parties have
access to different types of data and it is not always obvious how these can
be collected and linked. For example, advertising networks have access to
ad exposure data but it may require additional steps to ensure that they
likewise capture purchase data and can link those to ad exposures. In
Lambrecht et al. (2017), we were unable to provide this link. By contrast,
in Lambrecht and Tucker (2012) we worked with the web hosting provider
conducting the field experiment to implement Google Analytics to track
consumers arriving from Google’s search engine at the website of the web
hosting provider. Additionally, researchers should carefully consider data
points that are not directly linked to measuring the outcome of the ran-
domization, but they may help the researcher understand the behavioral
mechanism or rule out alternative interpretations. For example, while
conducting a field experiment on Twitter, Lambrecht et al. (2017) concur-
rently collected data from an independent source, on the size of all Twitter
trends their study was focusing on, on every day of the field experiment
from an additional, independent source. This data served to later rule out
that the size of the trends studied led to the effect of interest.
Any researcher interested in field experiment techniques should be aware of the potential need for a large sample when conducting a field experiment, especially when the magnitude, direction, and heterogeneity of the treatment effect are unknown.7 It is devastating to run a field experiment and obtain statistically imprecise estimates of the causal effect due to an insufficient sample size. There are many settings where this may be a concern. For example, Lewis and Rao (2015) show that for many online advertising campaigns the effect is so small and heterogeneous that measurement even with millions of observations can result in imprecise estimates. It may be possible to identify such contexts by reference to the explanatory power of different variables in prior observational (non-randomized) studies. In general, though, it is difficult to give practical advice to researchers beyond aiming for as expansive a sample and data collection effort as possible.
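One piece of advance planning that is always possible is a power calculation. The following sketch uses the standard closed-form sample-size formula for a two-arm comparison of means, with effect sizes expressed in Cohen's d units; the specific values are purely illustrative:

```python
from scipy.stats import norm

def n_per_arm(d, alpha=0.05, power=0.80):
    # Sample size per arm for a two-sided test of a standardized
    # mean difference d: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * (z_alpha + z_beta) ** 2 / d ** 2

for d in (0.5, 0.1, 0.01):
    print(d, round(n_per_arm(d)))   # tiny effects require enormous samples
```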

Step 5: Interpreting Results from a Field Experiment

Though in theory the ‘potential outcomes’ approach means that interpretation should be straightforward, in practice there are numerous issues
that the researcher should be aware of when interpreting their statistical
results. In general, the key issue is understanding exactly what is different
between the groups who were treated and those who were not, and being
careful about how to generalize this difference.
A key consideration for researchers is how the length of time the field experiment ran will affect the interpretation of its results.8
Anderson and Simester (2004) highlighted the importance of making sure
the researcher has access to a long enough period of data by showing that
the long-run effects of promotional depth were negative for established
customers, though in the short run they could look deceptively attractive
due to their ability to attract new customers. In general, researchers should
try and collect data for as long a period as possible to understand whether
any treatment they measure is stable, dissipates or increases in its effect
over time. However, for many field experiments it is hard to measure long-
run effects as the researcher does not have the ability to monitor treated
and untreated individuals over time. Therefore, in most settings research-
ers should carefully consider whether the causal effect they establish truly
reflects the long-run treatment effect.
The existence or importance of Hawthorne effects, where the mere fact
of being observed as part of a field experiment can alter outcomes, is the
subject of much academic debate (Parsons, 1974; Adair, 1984; Jones, 1992;
McCarney et al., 2007).9 In general, however, this kind of critique invites
a researcher to be thoughtful about what really is the difference between
the ‘treatment’ and the ‘control’ and what specifically they measure. The
researcher should provide reassuring evidence for the reader that the
causal effect they measure between the treatment and control is associated
with the part of the treatment they claim it is. For example, Burtch et al.
(2015) use data from a field experiment which introduced new privacy set-
tings in a crowdfunding setting. They devote much space in their article to
giving the reader evidence that the change they measure in crowdfunding
propensity really was a result of the change in privacy settings rather than
simply the introduction of a new screen or potential navigation costs for
the website user.
One obvious concern that researchers face, especially those who work
with firms, is that there may be compromises or challenges to randomiza-
tion. Firms may only be willing, for example, to experiment with, in their
view, less successful media or sales territories, and unwilling to experi-
ment with more successful ones. Similarly, firms may only be willing
to incur the costs of experimentation for their best customers. Simester
et al. (2009) provide a nice example of how a researcher faced with such
constraints can describe the selection criteria which constrained rand-
omization and provide reassuring evidence and discussion to allow the
reader to understand what the constraints mean. In their particular case,
they used the company’s decision to distinguish between ‘best’ customers
and ‘other’ customers when determining random assignment as a useful
way of exploring the underlying behavioral mechanism. In general,
though, in such circumstances the key procedure for any researcher

is to be upfront about the limitation and discuss its implications for generalizability.10

What Marketing Contexts Can Use Field Experiments?

Promotion and Marketing Communications

Marketing communications, especially advertising, is perhaps the area
that has been most revolutionized by the ability to conduct field experi-
ments in the digital space.
Some work has focused on measuring the effectiveness of different
forms of advertising. Lewis and Reiley (2014b) measure the effect of
online advertising on offline sales and find positive effects. Draganska
et al. (2014) use field test data to compare the effectiveness of television
and internet advertising. Blake et al. (2015) examine the impact of paid
search advertising on purchases in a large-scale field experiment at eBay.
Sahni (2015) studies how the different timing of ads moderates their
effectiveness. Offline, Bertrand et al. (2010) measure the effectiveness of
advertising in the developing world.
Other work has used field experiments to measure the effectiveness of
advertising for different kind of users and product contexts, such as older
internet users (Lewis and Reiley, 2014a) and different kinds of products
(Bart et al., 2014). Yet another way in which field experiments can be
useful in the context of marketing communications is to explore which
groups of consumers are most responsive to targeted ads. Lambrecht et al.
(2017) show that early trend propagators are on average less responsive to
promoted tweets (advertising messages on Twitter) than consumers who
post on the same trends later on. Hoban and Bucklin (2014) find that users
in most stages of the purchase funnel are receptive to ads, but not those
who previously visited the site without creating an account.
Researchers have also used digital experiments to explore optimal ad
content and design. Fong (2012) explores the content of targeted email
offers and finds that a closely matched offer may weaken a customer’s
incentives to search beyond the targeted items. Lambrecht and Tucker
(2012) explore how consumers respond to different prices advertised in
Google search ads. Ascarza et al. (2016) find that customers who were
randomly offered recommendations as to their mobile phone plan were
more likely to churn than those who were not offered recommendations.
Much of this literature has emphasized that not all digital enhance-
ments of ad content are positive. Aral and Walker (2011) show that viral

ad design is only of limited success. Goldfarb and Tucker (2011a) show
that there is a tradeoff between the level of targeting of a display ad’s
content and the ad’s intrusiveness. Goldfarb and Tucker (2015) found a
tradeoff between the degree of standardization of digital ad formats and
how effective they are at attracting viewers’ attention—for most ads,
recall of banner advertising declines the more ads conform to standard
formats, especially for ads that focus on brand logos, and less so for
ads designed by advertising agencies. Tucker (2014a) shows that social
endorsements are only of limited effectiveness in enhancing ad content.
Lambrecht and Tucker (2013) demonstrate that highly personalized ad content can backfire unless a consumer’s browsing history indicates that they have reached a stage in their purchase process where they are ready to buy.
One of the challenges of optimizing online advertising is identifying and
implementing optimal policies in real time. Schwartz et al. (2016) solve the
problem of maximizing customer acquisition rates by testing many ads
on many websites while learning which ad works best on each website by
implementing a multi-armed bandit policy that adjusts in real time in a
large adaptive field experiment.
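To give a flavor of such a policy, here is a minimal Thompson-sampling sketch for choosing among three ads with hypothetical click rates; this is a textbook bandit illustration, not necessarily the policy Schwartz et al. (2016) implement:

```python
import numpy as np

rng = np.random.default_rng(3)
true_ctr = [0.010, 0.014, 0.022]     # hypothetical click rates of three ads
wins = np.ones(3)                    # Beta(1, 1) prior on each ad's rate
losses = np.ones(3)

for _ in range(100_000):
    # Draw a plausible click rate for each ad; show the ad with the best draw
    ad = int(np.argmax(rng.beta(wins, losses)))
    click = rng.random() < true_ctr[ad]
    wins[ad] += click
    losses[ad] += 1 - click

print(wins + losses - 2)             # impressions concentrate on the best ad
```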

Pricing

Firms and researchers can use field experiments to understand consumer
response to different prices and set optimal prices. Offline, Anderson and
Simester (2003) looked at the effect of $9 price endings, and Anderson and
Simester (2001) show that sale signs are less effective the more products
have them.
The effect of promotions on sales has attracted much attention in both
offline and online settings. Anderson and Simester (2010) extend earlier
work to show that discounts can lead to customer antagonism, especially
among loyal customers. Lee and Ariely (2006) report on a series of field
experiments in a convenience store where consumers were randomly
exposed to different treatments such as varying when during the shopping
process conditional coupons (of the form ‘Spend $X and get $1 off’) were
handed to them and the amount of the coupon. They find that conditional
coupons are more effective in influencing consumers’ spending when
consumer goals are less concrete. Sahni et al. (2014) find a positive effect of
promotions that largely comes not from redemption of the offers but from
a carryover to the following week. Their study also highlights, however,
that higher risks of crossover and spillover effects exist when experiment-
ing with prices online, especially when price differences between test
conditions become large and social networks are prevalent. Fong et al.
(2015) and Andrews et al. (2015) are among a recent body of work explor-
ing when mobile promotions are effective.
While a majority of field experiments focus on B-to-C settings, a study
by Tadelis and Zettelmeyer (2011) demonstrates that field experiments can
likewise be very useful in understanding B-to-B transactions. The authors,
using a large-scale field experiment that randomly discloses quality
information in wholesale automobile auctions, examine how information
disclosure affects auction outcomes.
Last, field experiments have served to understand consumers’ response to
pay-what-you-want pricing. Kim et al. (2009) find in multiple field studies
that prices paid are significantly greater than zero and can even increase rev-
enues. These studies rely on experimentation over time, highlighting the dif-
ficulty for offline stores, specifically restaurants, to concurrently implement
different pricing mechanisms. By contrast, Gneezy et al. (2012) randomized, in several field experiments, the price level and structure to which consumers were exposed. They show that often, when granted the opportunity to name
the price of a product, fewer consumers choose to buy it than when the price
is fixed and low. Jung et al. (2014) demonstrate that when asked to pay as
much as they like, merely reframing payments to be on behalf of others, not
their own, leads people to pay more. Broadly related, Gneezy et al. (2010)
show that a charitable component in a purchase increased sales significantly
when coupled with a ‘pay-what-you-want’ pricing mechanism.

Product

It can be challenging to implement field experiments to better understand the relative performance of alternative new products, to design new products, or to test them against the competition. In many industries,
operational constraints prevent firms from launching different product
alternatives concurrently, especially in the non-digital economy where
such field experiments can be very costly. In addition, experimenting
with products can confuse customers and lead to spillover and crosso-
ver effects. It may also lead to competitive response prior to a full-scale
product introduction.
One potential avenue for researchers is to work with firms that already
test the profitability of new products and their effect on existing product
lines. For example, McDonald’s regularly tests new menu items by rolling
out a new product to a small subset of stores.11 Additionally, there are
possibilities for field experiments regarding products in the developing
world. For example, using the example of antimalarial bed nets, Dupas
(2014) shows that rather than deterring future purchases, one-off subsidies
can actually encourage willingness to pay.


Additionally, researchers have used field experiments to better understand customer needs in the design of new products, product customization, and the presentation of product information. Boudreau et al. (2011) show the possibility of using field experiment techniques in product design using data on software contests. Hildebrand et al. (2014) find that customers who are randomly assigned to a condition where they create a customized product from a starting solution are more satisfied with their purchase than customers who are assigned to a condition that requires an attribute-by-attribute configuration. Relatedly, Levav et al. (2010) demonstrate in
a field experiment that when consumers customize products, the order in
which attributes are presented changes their revealed preferences. When
users of a social networking site can choose product characteristics, Sun et
al. (2012) find that subjects were more likely to diverge from the popular
choice among their friends as the popularity of that choice increased.
A broadly related question is how consumers respond to different infor-
mation provided in search results. Nosko and Tadelis (2015) implement
a field experiment where they change the search results for a randomly
chosen subset of buyers on eBay using a new suggested measure of quality.
They find that their suggested measure of quality increases the quality of
transactions and, consequently, the retention of buyers.

Distribution

Distribution decisions often involve conflicts of interest, are long-term, difficult to change, and costly to implement. As a result, the use of field experiments tends to be difficult. However, digital technology, specifically the online channel, opens up new avenues for researchers.
Though there are few field experiments focused on channels, we high-
light a subset of papers that use natural experiments to indicate the kind of
questions that could be answered using field experiments.
Gallino and Moreno (2014) use data from a quasi-experiment that relies
on a new ‘buy-online, pickup-in-store’ functionality being implemented
in the United States but not in Canada and find that the introduction of
‘buy-online, pickup-in-store’ leads to a reduction in online sales but an
increase in store sales and traffic. Such a study could have presumably
been done by randomizing the deployment of a ‘buy-online, pickup-
in-store’ functionality across different US states. Relatedly, Bell et al.
(2015) use a quasi-experiment to show that the introduction of an offline channel increases demand both overall and through the online channel. Again, it may
have been possible to operationalize this as a field experiment, in particu-
lar if the ‘offline channel’ was of a less costly form such as a popup shop.


Broader Context of Marketing

Last, we address to what extent field experiments are useful when explor-
ing questions of broader importance to marketers. In general, many of the
most important questions of marketing strategy, such as whether there is
a first-mover advantage, are difficult to analyze using a field experiment
technique.
However, recent research suggests that field experiments can be quite
useful for analyzing the broader policy or welfare context in which mar-
keting occurs and investigating how marketing can help correct societally
charged issues such as inequality in income or across nations. A very
useful example of this is the work of Anderson-Macdonald et al. (2015)
investigating what parts of a marketing or entrepreneurial education can
benefit small startups in South Africa. They find that, in general, parts of
a curriculum focused on the demand side tended to be more useful than
parts of the curriculum focused on the cost side. Another notable feature
of this experiment is the mix between digital and non-digital methods in
the experimental setting. The educational treatment was done at great
expense offline, but data collected was facilitated and made less costly by
the use of digital survey tools to monitor the effects of the treatment.
Digitization and Big Data have also attracted increasing attention to
consumer privacy. Miltgen and Tucker (2014) provide some evidence from
a field experiment that when money is not involved, people tend to behave
in a privacy-protective way that is consistent with their stated privacy
preferences. However, when pecuniary rewards are in play, consumers
behave inconsistently with their stated privacy preferences, particularly
consumers who have the most online experience.12 A complement to this
work on privacy is understanding what makes consumers behave in a
non-private way and share information online. Toubia and Stephen (2013)
investigate this using a field experiment on Twitter and show that both
image-related and intrinsic utility matter as motivations.
Lastly, field experiments can shed light on a number of broader social
issues and serve as real-world validation of laboratory experiments on a
variety of topics. Gneezy et al. (2012) examine prosocial behavior in the
field and show that initial pro-social acts that come at a cost increase
the likelihood of subsequent prosocial acts. Baca-Motes et al. (2013)
show that a purely symbolic commitment to an environmentally friendly
practice significantly increases this practice. Gneezy and Rustichini (2000)
found that the introduction of fines increased late arrivals by parents at
day-care centers. Based on a field study in an all-you-can-eat restaurant,
Just and Wansink (2011) suggest that individuals are consuming to get
their money’s worth rather than consuming until their marginal hedonic


utility of consumption is zero. Shu et al. (2012) partner with an automobile insurance company and find that signing official documents at
the top rather than at the bottom makes ethics more salient and reduces
dishonesty. Kivetz et al. (2006) demonstrate in the field that consumption
increases as consumers approach a reward. Anderson and Simester (2008)
used a field experiment that randomized whether there was a surcharge for
larger sizes to show that customers respond negatively toward attempts to
stigmatize a group by charging a higher price to them.

Limitations

Any empirical technique has limitations, and given the special status that
field experiments are afforded regarding causal inference in the social sci-
ences, it is particularly important to understand these limitations. We also
point our readers to the broader debate in economics about the usefulness
of field experiments (see, for example, Deaton (2009) and Banerjee and
Duflo (2008)).

Lack of Theory

A common critique of field experiments is that they lack theoretical
grounding. However, this appears to be a critique of implementation
rather than a critique of method, since a field experiment is purely a sta-
tistical technique for obtaining causal inference. It is perfectly viable and
indeed desirable for a field experiment to both test and enhance theory.
Indeed List (2011) states that ‘Experimental results are most generalizable
when they are built on tests of [economic] theory.’
One practical way that many field experiments test and enhance theory
is by examining heterogeneous treatment effects in their data, and showing that the treatment effect is larger where theory predicts it should be and absent where theory predicts it should not appear. Of course, one limitation of this approach is that if there is uncertainty about the exact outcome, it is very hard to build a test of the behavioral mechanism into the design of the initial field experiment.
It is worth noting that structural econometric techniques can be com-
bined very well with field experiment data. There is nothing that forces
a structural research project to use observational data, and indeed great
insights can be gained from combining an economic model and its associated estimation with the clarity about the data-generating process that is afforded by a field experiment. Examples of researchers who have pur-
sued this path include, in economics, Duflo et al. (2012) who model dynamic


incentives for absenteeism, and, in marketing, Yao et al. (2012) who use a
structural model to evaluate implied discount rates in a field experiment
where consumers were randomly switched from a linear to a three-part
tariff pricing plan, as well as Dube et al. (2016), who use two field experiments
and a structural model to analyze the role of self-signaling in choices.
Another kind of work in this vein is researchers who use estimates
from a field experiment to validate their model. For example, Misra and
Nair (2011) used their estimates of differences in dynamic incentives for
sales force compensation to implement a field test of new compensation
schemes which led to $12 million annually in incremental revenues. Li
and Kannan (2014) use a field experiment to evaluate their model for
multichannel attribution.
A general challenge with field experiments is clarifying the degree of
generalizability of any one study and understanding how the lessons of
one point in time will apply in the future.13 It is perhaps a useful reminder
in particular that the aim of a field experiment is not simply to measure
a variable at one point in time, but instead to try and measure something
that has relevance to both managers and academic theory in the future.

External Generalizability

An obvious question is how the results of a field experiment conducted,
for example, in Mexico will generalize to, say, Norway or India. Without
knowledge of the precise primitives that condition a behavioral response
among a population, such generalizations are impossible. The same
critique would be true of a study based on observational data, and it is
important to recognize that a field experiment does not solve this general-
izability problem.
Another more subtle critique regarding generalizability is the extent to
which the culture of the firm that is willing to experiment may affect the
results. For example, a firm that is willing to embrace digital experimenta-
tion might have other attributes such as superior staff or design interface
which aid in unobserved ways the success of the field test. This may
potentially limit the generalizability of the findings in other commercial
contexts.
Of course, one solution to both these challenges is to replicate field
experiments across multiple different domains, customers and firms. Such
replications allow researchers to better understand the boundaries of any
effect measured in a field experiment context. A good example of
the advantages of such an approach is provided by Kremer and Holla
(2009), who summarize the learning of several field experiments for
the ­developing world. We also point to Lambrecht et al. (2017), who

implement a field experiment with both a charity for homeless people as well as with a fashion firm to confirm their results.

One-shot

One practical challenge of field experiments is that they often require sub-
stantial effort and/or expense and so a researcher often has only one shot.
This has two implications. First, a field experiment ‘gone wrong’ because
of a flaw in the setup, be it theoretical or in the practical implementation,
can often not easily be run again, requiring the researcher to carefully con-
sider all possible difficulties and carefully check all practical requirements
(e.g., regarding data collection) upfront. Second, it means that researchers
can usually implement only a limited set of experimental conditions. As
a result, researchers who aim to demonstrate a more complex behavioral
mechanism sometimes complement their field data with laboratory experi-
ments (Berger and Heath, 2008).

Limited Scope

In the current debate about how appropriate field experiments are
for understanding poverty interventions, the director of the World
Bank’s research department wrote the provocatively entitled ‘Should the
Randomistas Rule?’, making the following point:

From the point of view of development policy-making, the main problem in the randomistas’ agenda is that they have put their preferred method ahead of the questions that emerge from our knowledge gaps. Indeed, in some respects (such as the sectoral allocation of research) the randomistas’ success may have
made things worse. The risk is that we end up with lots of social experiments
that provide evidence on just one or two parameters for a rather narrow set
of assigned interventions and settings. The knowledge gaps persist and even
widen. (Ravallion, 2009)

The same argument could be made within marketing. Field experiment methods are a wonderful way of accurately measuring a causal
effect. However, as this article has highlighted, there are some domains
of marketing enquiry such as communication and pricing where field
experiments are particularly apt, and other areas such as strategy, product
development, and distribution where field experiment techniques are often
more difficult to implement and less likely to be useful. Obviously, this
does not mean that such questions should not be asked, but instead that
we should be mindful that field experiments have many advantages as a
technique but a potentially limited range of applications.


Conclusion

This chapter argues that one of the major advances of the digital age
has been to allow digital experimentation. The main advantage of such
digital experimentation is to allow causal inference. The challenge now
for researchers in this space is to ensure that the causal inferences they
are making are both correct given the setting and limitations of any field
experiment, and useful in terms of advancing marketing practice.

Notes

  1. This builds on a large number of books and articles that have covered similar material
(Angrist and Pischke, 2009; Manski, 2007; Meyer, 1995; Cook and Campbell, 1979;
Imbens and Wooldridge, 2009).
  2. Stratified randomization can deal with this possibility when there is data on the observ-
able characteristics of different units.
  3. optimizely.com
  4. Roberts (1957) puts this well by advising the researcher to make sure that the popula-
tion being studied can be broken down into smaller units (families, stores, sales territo-
ries, etc.) for which the experimental stimuli can be measured and for which responses
to the stimuli are not ‘contagious.’
  5. Such spillovers are currently attracting the attention of econometricians at the frontier
of the analysis of randomized experiments. We point the interested reader to the work
of Barrios et al. (2012), among others.
  6. A special case of such a stratified design is a pairwise design where each stratum con-
tains a matched pair of individuals, one of whom receives the treatment and the other
does not.
  7. Roberts (1957) states that ‘The sample size is large enough to measure important
responses to experimental stimuli against the background of uncontrolled sources of
variation.’
  8. Roberts (1957) urges researchers to ensure that ‘The experiment is run sufficiently long
that responses to experimental stimuli will have time to manifest themselves.’
  9. Roberts (1957) emphasizes that researchers should try and make sure ‘Neither the
stimulus nor the response is changed by the fact that an experiment is being conducted.’
10. Roberts (1957) somewhat anticipates this when he urges researchers to ensure that ‘The
experimentor is able to apply or withhold, as he chooses, experimental stimuli from any
particular unit of the population he is studying.’
11. http://www.mcdonalds.co.uk/ukhome/whatmakesmcdonalds/questions/food/nutritional-information/how-do-you-product-test-new-products.html (last accessed October 3, 2017).
12. Much work on privacy is limited by firms’ unwillingness to experiment with something
as legally and ethically sensitive as consumer privacy. Therefore, many studies have
taken the approach of Goldfarb and Tucker (2011b); Tucker (2014b) and mixed field
experiment data with quasi-experimental changes in privacy regimes.
13. Roberts (1957) urges researchers to ensure that ‘The underlying conditions of the past
persist into the future.’


References

Adair, J. G. (1984). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology 69 (2), 334–345.
Anderson, E. T. and D. I. Simester (2001). Are sale signs less effective when more products
have them? Marketing Science 20 (2), 121–142.
Anderson, E. T. and D. Simester (2003). Effects of $9 price endings on retail sales: Evidence
from field experiments. Quantitative Marketing and Economics 1 (1), 93–110.
Anderson, E. T. and D. I. Simester (2004). Long-run effects of promotion depth on new
versus established customers: Three field studies. Marketing Science 23 (1), 4–20.
Anderson, E. T. and D. I. Simester (2008). Research note: Does demand fall when custom-
ers perceive that prices are unfair? The case of premium pricing for large sizes. Marketing
Science 27 (3), 492–500.
Anderson, E. T. and D. I. Simester (2010). Price stickiness and customer antagonism.
Quarterly Journal of Economics 125 (2), 729–765.
Anderson-Macdonald, S., R. Chandy, and B. Zia (2015). Returns to business education: The
impact of marketing (versus finance) skills on the performance of small firm owners in
South Africa, Unpublished Manuscript, Stanford University.
Andrews, M., X. Luo, Z. Fang, and A. Ghose (2015). Mobile ad effectiveness: Hyper-
contextual targeting with crowdedness. Marketing Science 35 (2), 218–233.
Angrist, J. D. and J.-S. Pischke (2009). Mostly Harmless Econometrics: An Empiricist’s
Companion. Princeton University Press.
Aral, S. and D. Walker (2011). Creating social contagion through viral product design: A
randomized trial of peer influence in networks. Management Science 57 (9), 1623–1639.
Ascarza, E., R. Iyengar, and M. Schleicher (2016). The perils of proactive churn prevention
using plan recommendations: Evidence from a field experiment. Journal of Marketing
Research 53 (1), 46–60.
Baca-Motes, K., A. Brown, A. Gneezy, E. A. Keenan, and L. D. Nelson (2013). Commitment
and behavior change: Evidence from the field. Journal of Consumer Research 39 (5),
1070–1084.
Banerjee, A. V. and E. Duflo (2008). The experimental approach to development economics.
Working Paper 14467, National Bureau of Economic Research.
Barrios, T., R. Diamond, G. W. Imbens, and M. Kolesar (2012). Clustering, spatial
correlations, and randomization inference. Journal of the American Statistical Association
107 (498), 578–591.
Bart, Y., A. T. Stephen, and M. Sarvary (2014). Which products are best suited to mobile
advertising? A field study of mobile display advertising effects on consumer attitudes and
intentions. Journal of Marketing Research 51 (3), 270–285.
Bell, D., S. Gallino, and A. Moreno (2015). Showrooms and information provision in omni-
channel retail. Production and Operations Management 24 (2), 360–362.
Berger, J. and C. Heath (2008). Who drives divergence? Identity signaling, outgroup
dissimilarity, and the abandonment of cultural tastes. Journal of Personality and Social
Psychology 95 (3), 593.
Bertrand, M., D. Karlan, S. Mullainathan, E. Shafir, and J. Zinman (2010). What’s advertis-
ing content worth? Evidence from a consumer credit marketing field experiment. Quarterly
Journal of Economics 125 (1), 263–305.
Blake, T., C. Nosko, and S. Tadelis (2015). Consumer heterogeneity and paid search effectiveness: A large-scale field experiment. Econometrica 83 (1), 155–174.
Boudreau, K. J., N. Lacetera, and K. R. Lakhani (2011). Incentives and problem uncertainty
in innovation contests: An empirical analysis. Management Science 57 (5), 843–863.
Bruhn, M. and D. McKenzie (2008). In pursuit of balance: Randomization in practice in
development field experiments. World Bank Policy Research Working Paper Series WPS
4752.
Burtch, G., A. Ghose, and S. Wattal (2015). The hidden cost of accommodating crowdfunder
privacy preferences: A randomized field experiment. Management Science 61 (5), 949–962.


Cook, T. D. and D. T. Campbell (1979). Quasi-Experimentation: Design & Analysis Issues for
Field Settings. Houghton Mifflin.
Deaton, A. S. (2009). Instruments of development: Randomization in the tropics, and the
search for the elusive keys to economic development. Working Paper 14690, National
Bureau of Economic Research.
Draganska, M., W. R. Hartmann, and G. Stanglein (2014). Internet versus television adver-
tising: A brand-building comparison. Journal of Marketing Research 51 (5), 578–590.
Dube, J.-P., X. Luo, and Z. Fang (2016). Self-signaling and pro-social behavior: a cause
marketing mobile field experiment. Marketing Science 36 (2), 161–186.
Duflo, E., R. Hanna, and S. P. Ryan (2012). Incentives work: Getting teachers to come to
school. American Economic Review 102 (4), 1241–78.
Dupas, P. (2014). Short-run subsidies and long-run adoption of new health products:
Evidence from a field experiment. Econometrica 82 (1), 197–228.
Fong, N. M. (2012). Targeted marketing and customer search. Available at SSRN 2097495.
Fong, N. M., Z. Fang, and X. Luo (2015). Geo-conquesting: Competitive locational target-
ing of mobile promotions. Journal of Marketing Research 52 (5), 726–735.
Gallino, S. and A. Moreno (2014). Integration of online and offline channels in retail: The
impact of sharing reliable inventory availability information. Management Science 60 (6),
1434–1451.
Gneezy, A., U. Gneezy, L. D. Nelson, and A. Brown (2010). Shared social responsibility: A
field experiment in pay-what-you-want pricing and charitable giving. Science 329 (5989),
325–327.
Gneezy, A., U. Gneezy, G. Riener, and L. D. Nelson (2012). Pay-what-you-want, identity,
and self-signaling in markets. Proceedings of the National Academy of Sciences 109 (19),
7236–7240.
Gneezy, A., A. Imas, A. Brown, L. D. Nelson, and M. I. Norton (2012). Paying to be nice:
Consistency and costly prosocial behavior. Management Science 58 (1), 179–187.
Gneezy, U. and A. Rustichini (2000). A fine is a price. Journal of Legal Studies 29 (1), 1–17.
Goldfarb, A. and C. Tucker (2011a). Online display advertising: Targeting and obtrusive-
ness. Marketing Science 30 (3), 389–404.
Goldfarb, A. and C. Tucker (2011b). Privacy regulation and online advertising. Management
Science 57 (1), 57–71.
Goldfarb, A. and C. Tucker (2015). Standardization and the effectiveness of online advertis-
ing. Management Science 61 (11), 2707–2719.
Hildebrand, C., G. Häubl, and A. Herrmann (2014). Product customization via starting solu-
tions. Journal of Marketing Research 51 (6), 707–725.
Hoban, P. R. and R. E. Bucklin (2014). Effects of internet display advertising in the purchase
funnel: Model-based insights from a randomized field experiment. Journal of Marketing
Research 52 (3), 375–393.
Imai, K., G. King, C. Nall, et al. (2009). The essential role of pair matching in cluster-­
randomized experiments, with application to the Mexican universal health insurance
evaluation. Statistical Science 24 (1), 29–53.
Imai, K., G. King, and E. A. Stuart (2008). Misunderstandings between experimentalists and
observationalists about causal inference. Journal of the Royal Statistical Society: Series A
(Statistics in Society) 171 (2), 481–502.
Imbens, G. and J. Wooldridge (2009). Recent developments in the econometrics of program
evaluation. Journal of Economic Literature 47 (1), 5–86.
Jones, S. R. (1992). Was there a Hawthorne effect? American Journal of Sociology 98 (3),
451–468.
Jung, M. H., L. D. Nelson, A. Gneezy, and U. Gneezy (2014). Paying more when paying for
others. Journal of Personality and Social Psychology 107 (3), 414.
Just, D. R. and B. Wansink (2011). The flat-rate pricing paradox: Conflicting effects of
‘all-you-can-eat’ buffet pricing. Review of Economics and Statistics 93 (1), 193–200.
Kim, J.-Y., M. Natter, and M. Spann (2009). Pay what you want: A new participative pricing
mechanism. Journal of Marketing 73 (1), 44–58.


Kivetz, R., O. Urminsky, and Y. Zheng (2006). The goal-gradient hypothesis resurrected:
Purchase acceleration, illusionary goal progress, and customer retention. Journal of
Marketing Research 43 (1), 39–58.
Kremer, M. and A. Holla (2009). Improving education in the developing world: What have
we learned from randomized evaluations? Annual Review of Economics 1 (1), 513–542.
Lambrecht, A. and C. Tucker (2012). Paying with money or with effort: Pricing when
­customers anticipate hassle. Journal of Marketing Research 49 (1), 66–82.
Lambrecht, A. and C. Tucker (2013). When does retargeting work? Information specificity in
online advertising. Journal of Marketing Research 50 (5), 561–576.
Lambrecht, A., C. Tucker, and C. Wiertz (2017). Advertising to early trend propagators?
Evidence from Twitter. Marketing Science, forthcoming.
Lee, L. and D. Ariely (2006). Shopping goals, goal concreteness, and conditional ­promotions.
Journal of Consumer Research 33 (1), 60–70.
Levav, J., M. Heitmann, A. Herrmann, and S. S. Iyengar (2010). Order in product customi-
zation decisions: Evidence from field experiments. Journal of Political Economy 118 (2),
274–299.
Lewis, R. A. and J. M. Rao (2015). The Unfavorable Economics of Measuring the Returns
to Advertising, Quarterly Journal of Economics 130 (4), 1941–1973.
Lewis, R. A. and D. H. Reiley (2014a). Advertising effectively influences older users:
How field experiments can improve measurement and targeting. Review of Industrial
Organization 44 (2), 147–159.
Lewis, R. A. and D. H. Reiley (2014b). Online ads and offline sales: measuring the effect
of retail advertising via a controlled experiment on Yahoo! Quantitative Marketing and
Economics 12 (3), 235–266.
Li, H. A. and P. Kannan (2014). Attributing conversions in a multichannel online marketing
environment: An empirical model and a field experiment. Journal of Marketing Research
51 (1), 40–56.
List, J. A. (2011). Why economists should conduct field experiments and 14 tips for pulling
one off. Journal of Economic Perspectives 25 (3), 3–16.
Manski, C. F. (2007). Identification for Prediction and Decision. Harvard University Press.
McCarney, R., J. Warner, S. Iliffe, R. van Haselen, M. Griffin, and P. Fisher (2007). The
Hawthorne effect: A randomised, controlled trial. BMC Medical Research Methodology
7 (1), 30.
Meyer, B. (1995). Natural and quasi-experiments in economics. Journal of Business and
Economic Statistics 13 (2) 151–161.
Miltgen, C. and C. Tucker (2014). Resolving the privacy paradox: Evidence from a field
experiment. Mimeo, MIT.
Misra, S. and H. S. Nair (2011). A structural model of sales-force compensation dynamics:
Estimation and field implementation. Quantitative Marketing and Economics 9 (3), 211–257.
Nosko, C. and S. Tadelis (2015). The limits of reputation in platform markets: An empirical
analysis and field experiment. National Bureau of Economic Research working paper No.
20830.
Parsons, H. M. (1974). What happened at Hawthorne? New evidence suggests the Hawthorne
effect resulted from operant reinforcement contingencies. Science 183 (4128), 922–932.
Ravallion, M. (2009). Should the randomistas rule? The Economists’ Voice 6 (2).
Roberts, H. V. (1957). The role of research in marketing management. Journal of Marketing
22 (1), 21–32.
Rubin, D. B. (2005). Causal inference using potential outcomes. Journal of the American
Statistical Association 100 (469), 322–331.
Sahni, N. (2015). Effect of temporal spacing between advertising exposures: Evidence from
an online field experiment. Quantitative Marketing and Economics 13 (3), 203–247.
Sahni, N., D. Zou, and P. K. Chintagunta (2014). Effects of targeted promotions: Evidence
from field experiments. Available at SSRN 2530290.
Schwartz, E. M., E. Bradlow, and P. Fader (2016). Customer acquisition via display advertis-
ing using multi-armed bandit experiments. Marketing Science 36 (4), 500–522.


Shu, L. L., N. Mazar, F. Gino, D. Ariely, and M. H. Bazerman (2012). Signing at the begin-
ning makes ethics salient and decreases dishonest self-reports in comparison to signing at
the end. Proceedings of the National Academy of Sciences 109 (38), 15197–15200.
Simester, D., Y. J. Hu, E. Brynjolfsson, and E. T. Anderson (2009). Dynamics of retail adver-
tising: Evidence from a field experiment. Economic Inquiry 47 (3), 482–499.
Sun, M., X. M. Zhang, and F. Zhu (2012). To belong or to be different? evidence from a
large-scale field experiment in China. NET Institute Working Paper (12–15).
Tadelis, S. and F. Zettelmeyer (2011). Information disclosure as a matching mechanism:
Theory and evidence from a field experiment. Available at SSRN 1872465.
Toubia, O. and A. T. Stephen (2013). Intrinsic vs. image-related utility in social media: Why
do people contribute content to Twitter? Marketing Science 32 (3), 368–392.
Tucker, C. (2014a). Social Advertising. Mimeo, MIT.
Tucker, C. (2014b). Social networks, personalized advertising, and privacy controls. Journal
of Marketing Research 51 (5), 546–562.
Yao, S., C. F. Mela, J. Chiang, and Y. Chen (2012). Determining consumers’ discount rates
with field studies. Journal of Marketing Research 49 (6), 822–841.

MIZIK_9781784716745_t.indd 51 14/02/2018 16:38


3.  Conjoint Analysis
Olivier Toubia

This chapter assumes the reader has a basic understanding of the work-
ings of Conjoint Analysis. For readers interested in a more comprehensive
coverage of the topic, I recommend the exhaustive reviews of academic
research in Conjoint Analysis in Agarwal et al. (2015); Bradlow (2005);
Green, Krieger and Wind (2001); or Netzer et al. (2008). Conversely,
readers who would like an introduction to the basics of conjoint meas-
urement may want to consult Sawtooth Software’s website (see http://
www.sawtoothsoftware.com/support/technical-papers#general-conjoint-analysis
and http://www.sawtoothsoftware.com/academics/teaching-aids),
or Ofek and Toubia (2014a), Rao (2010), or Green, Krieger and Wind
(2001).

Conjoint Analysis: Overview

Conjoint Analysis is probably one of the most used quantitative market-
ing research methods. Its history started in the early 1970s (Green and
Rao 1971), and it has foundations in Mathematical Psychology (Luce
and Tukey 1964). Many managerial applications of Conjoint Analysis
have been documented over the years (e.g., Green, Krieger and Wind
2001). “Classic” applications include the design of Marriott’s Courtyard
Hotels (Wind et al. 1989) and the design and evaluation of the New Jersey
and New York EZ-Pass system (Green, Krieger and Vavra 1999). More
recent high-profile applications include the Apple v. Samsung patent trial
(see Netzer and Sambandam 2014 for a description). Conjoint Analysis
has also been adapted in creative ways that have extended the scope of
its applications. For example, Yahoo! used a modified form of Conjoint
Analysis to understand users’ preferences for various types of news articles
(Chu et al. 2009). Based on this understanding, Yahoo! was able to better
customize the news articles shown on its landing page and increase the
click-through rates on these articles.
Conjoint Analysis is a method for quantifying consumer preferences,
i.e., for estimating utility functions. The premise of Conjoint Analysis is to
decompose a product or service into attributes (e.g., “number of minutes
included,” “number of GB of data,” “charge for additional minutes,”
“base price,” etc.) that each has different levels (e.g., “500 minutes,” “1,000
minutes,” “unlimited”). The output of a Conjoint Analysis study is an esti-
mation of how much each consumer in a sample values each level of each
attribute. Such preferences are called partworths, because they capture
how much each part of the product is worth to the consumer.
Conjoint Analysis takes somewhat of an indirect approach to estimat-
ing partworths. Instead of asking consumers directly how much they
value each level of each attribute, Conjoint Analysis asks consumers to
evaluate profiles, defined by a set of attribute levels. A profile might be
a “$100 plan with unlimited calls and 10 GB of data per month.” Then,
Conjoint Analysis relies on statistical analysis to disentangle the value of
each attribute level based on consumers’ evaluations of profiles. By doing
that, Conjoint Analysis builds a model of consumer behavior, which can
predict each consumer’s preferences for any profiles, even if they were
not included in the survey. For example, suppose we have five attributes
with three levels each. There are $3^5 = 243$ possible profiles. We might ask
consumers to evaluate 15 of these profiles, estimate their partworths for
each attribute level based on these data, and then be able to predict market
share for any set of profiles that contains any number of these 243 possible
profiles.
The number of partworths estimated for each attribute is equal to
the number of levels in that attribute minus 1. The loss of one degree
of freedom emerges from statistical considerations, which will become
clear to the statistically minded reader later in the chapter. Intuitively,
each attribute in each profile must be at one level. If there are L levels
in a given attribute, it is possible to describe the level of each profile
on that attribute using only L – 1 variables. (For example, if L = 2 and
we know whether the attribute is at the first level, we can deduce with
certainty whether it is at the second level.) There are different ways
to reduce the degrees of freedom. Interested readers are referred to
Kuhfeld (2005). One simple way is to set one level of each attribute as
the “baseline” and define each other partworth in that attribute with
respect to this baseline. For example, if the partworth for “500 min”
is set as the baseline, the partworth for “1,000 minutes” captures the
additional utility provided to the consumer by an increase from 500
minutes to 1,000 minutes.
Mathematically, if consumers are indexed by i, profiles by j, and
attributes by k, Conjoint Analysis assumes that the utility of profile j for
consumer i is given as follows:

$u_{ij} = \alpha_i + \sum_k \beta_{ik} x_{jk} + \varepsilon_{ij}$   (3.1)

Where:

• $\alpha_i$ is an intercept that captures the baseline utility for consumer i. Note that this intercept is not included when using Choice-Based Conjoint Analysis (see below).
• $\beta_{ik}$ is a vector that captures the partworths of consumer i for attribute k. Because of the reduction in degrees of freedom mentioned earlier, if there are L levels in attribute k, this vector has one row and L – 1 columns.
• $x_{jk}$ is a vector that captures the level of profile j on attribute k. If there are L levels in attribute k, this vector also has one row and L – 1 columns.
• $\varepsilon_{ij}$ captures random variations.

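To make the dummy coding and model (3.1) concrete, the following minimal sketch simulates ratings-based data for one respondent and recovers the partworths by ordinary least squares. All attribute levels and partworth values are hypothetical, and a real study would use an efficient experimental design (discussed later in this chapter) rather than random profiles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three attributes; the first level of each serves as the baseline.
levels = [3, 3, 2]                    # e.g., minutes, data, price tier

# Hypothetical "true" partworths, one per non-baseline level (L - 1 per attribute).
true_beta = np.array([0.5, 1.2, 0.8, 1.5, -1.0])
true_alpha = 3.0                      # baseline utility (intercept)

def dummy_code(profile):
    """Map a profile (one level index per attribute) to L - 1 dummies per attribute."""
    row = []
    for k, L in enumerate(levels):
        d = [0.0] * (L - 1)
        if profile[k] > 0:            # level 0 is the baseline
            d[profile[k] - 1] = 1.0
        row.extend(d)
    return row

# One respondent rates 15 random profiles; ratings = utility plus noise, as in (3.1).
profiles = [[rng.integers(L) for L in levels] for _ in range(15)]
X = np.array([dummy_code(pr) for pr in profiles])
y = true_alpha + X @ true_beta + rng.normal(0, 0.3, size=len(profiles))

# OLS with an intercept column recovers (alpha, beta).
Xd = np.column_stack([np.ones(len(y)), X])
est, *_ = np.linalg.lstsq(Xd, y, rcond=None)
print("estimated intercept:", est[0].round(2))
print("estimated partworths:", est[1:].round(2))
```
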
Note that this basic model assumes that all levels of all attributes enter
linearly and independently into the utility function. However, this model
may be easily extended to include interactions between attributes. For
example, if it is believed that consumers value voice minutes more in a
cellular plan when more data are available, an additional interaction term
may be included in the utility function, which would capture the joint
presence of a large number of minutes and a large amount of data. In
practice, these interactions are seldom used. One of the issues related to
the use of interactions is that the number of possible interactions is very
large. Therefore they should only be included if the researcher has a strong
and valid reason to believe that specific interactions are relevant.
Note also that the additivity of the utility function implies that the basic
model is compensatory, i.e., it is possible to “make up” for a lower value
on one attribute by increasing the value on another attribute. However,
in some cases, consumers may evaluate profiles using non-compensatory
rules. Examples of non-compensatory rules include conjunctive rules
(where a profile “passes” the rule if it meets a list of criteria, e.g., a car
has to be of a certain body type and be below a certain price), disjunctive
rules (where a profile “passes” the rule if it meets any criterion from
a list, e.g., a car has to be of a certain body type or be below a certain
price), disjunctions of conjunctions (where a profile “passes” the rule if
it satisfies at least one conjunctive rule from a set of conjunctive rules – see
Hauser et al. 2010), lexicographic (where profiles are ranked based
on criteria that are considered sequentially, e.g., cars are first ranked
according to body type, then according to price), and elimination by
aspect (where profiles are eliminated from the choice set by considering
various criteria sequentially – see Tversky 1972). It has been noted that
non-compensatory decision rules might actually be approximated using
additive utility functions, such as the one assumed typically in conjoint
analysis, by allowing extreme weights on certain subsets of attributes
(see, for example, Bröder 2000). Nevertheless, a literature has developed
for dealing specifically with non-compensatory rules (see, for example,
Gilbride and Allenby 2004; Jedidi and Kohli 2005; Kohli and Jedidi 2007;
Hauser 2014; Yee et al. 2007). This literature often considers the use of
non-compensatory rules by consumers when they form their considera-
tion sets (i.e., the relatively small set of alternatives to consider seriously),
and assumes that choices among the alternatives in the consideration set
follow a compensatory process.
There exists a wide range of options for running a Conjoint Analysis
study. Surveys may be run literally within a day with very limited budget.
Other surveys, in particular in litigation contexts, can take months and
cost hundreds of thousands of dollars. While Conjoint Analysis surveys
vary in many ways, they all involve the following steps:

1. Attribute and level selection.
2. Survey implementation and data collection.
3. Partworths estimation and inference.

Readers are referred to Orme (2002) or Ofek and Toubia (2014b) for
guidelines regarding the first step. The second and third steps will be dis-
cussed below.
I close this section by noting that there also exist market research
methods that measure partworths directly instead of taking the indirect
approach followed by Conjoint Analysis. These methods are referred
to as “self-explicated” (Leigh, MacKay and Summers 1984; Netzer and
Srinivasan 2011). Although the self-explicated approach leads to ques-
tions that are probably easier for consumers to answer and produces
data that are easier to analyze, it suffers from one major limitation. In
particular, when asked directly how much they care about each attribute
or level, consumers have a tendency to claim that “everything is impor-
tant.” This leads to partworth estimates that do not discriminate as much
between attributes. By forcing consumers to make tradeoffs (e.g., “this
plan has more data but it is more expensive, is the difference really justi-
fied?”), Conjoint Analysis is believed to provide a more nuanced picture
of consumer preferences. Note, however, that empirical comparisons of
Conjoint Analysis versus the self-explicated approach have produced
mixed results (e.g., Leigh, MacKay and Summers 1984; Netzer and
Srinivasan 2011; Sattler and Hensel-Börner 2001), and the self-explicated
approach remains a viable alternative to Conjoint Analysis.


Survey Implementation

In this section I discuss some issues related to choosing a Conjoint
Analysis format, constructing an experimental design, hosting the survey
and collecting the data.

Format

Several formats of Conjoint Analysis have been proposed over the years.
The most traditional format is usually referred to as “ratings-based
Conjoint Analysis.” Ratings-based Conjoint Analysis consists of showing
respondents several profiles (usually between 12 and 20) and asking them
to rate each of them on some response scale. That is, each profile receives
a preference score that may be translated into a numerical value. Profiles
are assumed to be rated independently from each other by the consumer,
i.e., there are no comparisons between profiles.
This older format of Conjoint Analysis offers several benefits, but it
suffers from some limitations. One of the main benefits is the ease with
which it may be implemented and the ease with which the results may
be analyzed. It is not an exaggeration to claim that with today’s tools, a
ratings-based Conjoint Analysis survey may be conducted from start to
finish within a day and with virtually no budget. In particular, libraries
exist that will provide the researcher with an efficient experimental design
(see next subsection). Online platforms like Qualtrics or SurveyMonkey
may be used to construct the online survey, i.e., obtain a link to the survey
that may be shared with respondents. This link may be sent to lists main-
tained by the researcher, or panels like Amazon Mechanical Turk may be
used to obtain several hundred respondents within a few hours, for a cost
in the order of $1 per respondent. Finally, the analysis of ratings-based
Conjoint Analysis data may be conducted using standard software such
as Microsoft Excel. These benefits make ratings-based Conjoint Analysis
a good choice for researchers working on a very tight deadline and with a
very tight budget.
However, ratings-based Conjoint Analysis also suffers from limitations.
In particular, it does not truly force respondents to make tradeoffs or to
make choices that resemble real life situations. Indeed, nothing prevents
the respondents from giving the same rating to all profiles. In addition,
rating is not an activity in which consumers engage on a regular basis
in their everyday lives (with a few notable exceptions such as product
reviews). Therefore, it is questionable whether ratings-based Conjoint
Analysis provides data that reflect the real-world decisions made by
consumers.


Another popular format of Conjoint Analysis, which has become
the state of the art, is called Choice-Based Conjoint Analysis (CBC).
(See Louviere and Woodworth 1983 for an early reference on CBC and
Louviere, Hensher and Swait 2000 for a more recent and exhaustive treat-
ment of CBC). This format asks consumers to choose between profiles.
That is, the respondent is presented with a series of choice questions (often
about 12 to 20) one after the other, where each question asks the respondent
to select which profile from a small set (usually two to four) they would
be most likely to choose or purchase. Each choice question may also offer
a “no choice” alternative, i.e., the respondent is able to indicate that they
would not purchase any option in the set.
The main benefit of this format is that it is closer to the type of decisions
that consumers make in real life. Indeed, most consumption decisions
involve choosing one alternative over others. Accordingly, this format
is considered more realistic. In addition, when a “no choice” option is
included, this format does not only allow the researcher to predict relative
preferences for various profiles, it also allows predicting the proportion of
consumers who would actually purchase each profile. In other words, this
format allows estimating primary demand.
The main disadvantage of this format is that it requires more resources
to implement. In particular, the theory behind optimal experimental
designs and the practical identification of optimal experimental designs are
more challenging with CBC than with ratings-based Conjoint Analysis.
The implementation of the survey and the data collection are not sig-
nificantly more challenging. The statistical analysis of CBC data requires
more advanced statistical software, and it may not be done using built-in
functions in Microsoft Excel. Some studies have compared ratings-based
Conjoint Analysis to CBC in terms of their ability to predict choices, with
mixed results (e.g., Elrod, Louviere and Davey 1992; Moore 2004).
Several other formats of Conjoint Analysis are also worth mentioning.
These include paired-comparisons (Johnson 1987; and Toubia et al. 2003)
and rankings (Green and Rao 1971; Srinivasan and Shocker 1973). These
formats are not used as frequently in today’s environment.
In practice, researchers on a tight budget who would like to run a
Conjoint Analysis study without the need for specialized software or
advanced statistical knowledge would be best advised to settle for a
ratings-based format. Researchers with more resources should favor
a Choice-Based Conjoint format, with the realization that it tends to
significantly increase the total cost of the survey.


Experimental Design

The experimental design behind a Conjoint Analysis survey specifies
the set of profiles to be included. In the case of ratings-based Conjoint
Analysis, it specifies the set of profiles to be rated by respondents, i.e., it
specifies the level of each attribute for each profile. In the case of Choice-
Based Conjoint Analysis, it specifies the sets of profiles to be included in
each choice question.
Experimental designs should not be chosen randomly. First, a poorly
designed set of profiles may lead to data that cannot be estimated using
regression analysis. For example, if two attribute levels are perfectly cor-
related (e.g., all profiles with unlimited voice also have unlimited data),
it will not be possible statistically to estimate the partworths of these two
attribute levels separately. Second, even if the set of profiles is compat-
ible with a regression, the confidence intervals around the estimates may
be larger than optimal. That is, the experimental design may not be as
statistically efficient as it could be. The statistical efficiency of a conjoint
experimental design is a measure of the accuracy with which it allows
estimating the partworths. See Kuhfeld, Tobias and Garratt (1994) for
formal definitions of statistical efficiency, and Toubia and Hauser (2007)
for measures of statistical efficiency that take into account the managerial
goals of the study.
A large academic literature has studied ways to find optimal experimen-
tal designs, i.e., experimental designs with maximum statistical efficiency.
This literature is not unique to marketing. Indeed, the issue of optimally
designing experiments is relevant in many fields, including agriculture,
physics, biology, psychology, etc. Interested readers are referred to
Kuhfeld, Tobias and Garratt (1994) and Kuhfeld (2005).
In the case of ratings-based Conjoint Analysis, well-developed libraries
of optimal experimental designs are readily accessible. Examples include
the %MktEx routine in SAS (Kuhfeld 2005) and the Excel-based library
provided by Ofek and Toubia (2014b). Optimal designs tend to have
certain properties. For example, they tend to be “orthogonal,” meaning
that, for any two attributes, each pair of levels occurs in the same number
of profiles (e.g., three profiles have attribute 1 at level 1 and attribute 2 at
level 1, three profiles have attribute 1 at level 1 and attribute 2 at level 2,
etc.).
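To make "statistical efficiency" concrete, the sketch below computes one common D-efficiency-type summary, the geometric mean of the eigenvalues of the information matrix X'X/n, for two candidate designs; see Kuhfeld, Tobias and Garratt (1994) for the formal definitions, of which this is only a simplified variant. The designs themselves are hypothetical three-attribute examples with ±1 coding.

```python
import numpy as np

def d_efficiency(X):
    """D-efficiency-type summary of a design matrix X (n profiles x p columns):
    the geometric mean of the eigenvalues of the information matrix X'X/n.
    Higher values mean tighter confidence intervals around the partworths."""
    n, p = X.shape
    info = X.T @ X / n
    return np.linalg.det(info) ** (1.0 / p)

# Two hypothetical 8-profile designs for three 2-level attributes (+/-1 coding).
orthogonal = np.array([[ 1,  1,  1], [ 1,  1, -1], [ 1, -1,  1], [ 1, -1, -1],
                       [-1,  1,  1], [-1,  1, -1], [-1, -1,  1], [-1, -1, -1]], float)
rng = np.random.default_rng(2)
random_design = rng.choice([-1.0, 1.0], size=(8, 3))

print("orthogonal design:", d_efficiency(orthogonal))    # = 1.0, the maximum here
print("random design:   ", d_efficiency(random_design))  # typically lower
```
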
In the case of Choice-Based Conjoint Analysis, optimizing experi-
mental designs is more challenging, because the statistical efficiency of
a CBC design depends on the true value of the partworths (Huber and
Zwerina 1996; Arora and Huber 2001). It is advisable to use specialized
software to create the designs in such cases. Examples include Sawtooth
Software’s CBC offering (see http://www.sawtoothsoftware.com/products/
conjoint-choice-analysis/cbc).
Note that adaptive designs have been proposed in an effort to reduce the
length of conjoint questionnaires and further increase the efficiency of the
designs. These methods leverage the ability to do computations on the fly
in order to customize each question based on that particular respondent’s
answers up to that point. Examples include Sawtooth Software ACA
(Johnson 1987) and ACBC (Sawtooth Software 2014), and FastPace
(Toubia et al. 2003, 2004, 2007). Other researchers have proposed inter-
mediate solutions, in which different experimental designs are used across
respondents (e.g., Sándor and Wedel 2005). Although these methods have
been shown to work well, their implementation often requires custom-
ized programming, which may require additional time and programming
resources.
In practice, researchers using ratings-based Conjoint Analysis should
take advantage of existing libraries of optimal experimental designs.
Researchers using CBC are advised to use specialized software to con-
struct their experimental designs, such as Sawtooth. Researchers with
sufficient resources may also use adaptive experimental designs, which
may require customized programming.

Survey Hosting

Many options are easily accessible today to host a Conjoint Analysis
survey. Some specialized software exists, such as Sawtooth Software’s
SSI Web suite. Alternatively, these surveys may be programmed using
general online survey software such as Qualtrics (www.qualtrics.com)
and SurveyMonkey (www.surveymonkey.com). Ofek and Toubia (2014b)
provide examples of online Conjoint Analysis surveys developed in these
platforms. Note that because Conjoint Analysis surveys tend to contain
several questions, they are usually not suitable for “pre-scroll” surveys
such as Google Consumer Surveys.

Data Collection

Most Conjoint Analysis studies are now performed online. Many options
are available today for data collection. Some researchers have access to
proprietary mailing lists of respondents, which may include their per-
sonal contacts, existing customers, etc. Other researchers use traditional
online panels such as Research Now. Those hosting their surveys on
Qualtrics may use that same platform as a source of respondents. In par-
ticular, Qualtrics partners with several online panel companies and offers
competitive panel services. Another alternative is Amazon Mechanical
Turk (AMT). AMT is a panel maintained by Amazon. Unlike with tradi-
tional panels that tend to give “reward points” to their members, members
of the AMT panel (referred to as “workers”) receive well-defined financial
compensation for each survey (or “HIT”) that they complete. Moreover,
AMT allows researchers (“requesters”) to “reject” data coming from
any respondent due to poor quality. This gives panel members a strong
incentive to provide thoughtful answers. Accordingly, evidence suggests
that the quality of the data provided by AMT is at least as good, if not
superior, compared to traditional online panels (Buhrmester, Kwang
and Gosling 2011; Paolacci, Chandler and Ipeirotis 2010). AMT is also
very convenient, as it only takes a few hours to collect data from several
hundred respondents. However, AMT does not allow researchers to limit
their respondents to specific demographic groups. In particular, tradi-
tional online panels maintain basic demographic data on their members,
and allow researchers to specify quotas based on these characteristics (e.g.,
limit the sample to specific age groups or geographical locations, or ensure
that the sample of respondents matches specific distributions). AMT
mainly allows researchers to limit the sample of respondents to specific
countries and to recruit “master workers” with very high approval rates
(i.e., their data have almost never been rejected). However, if a researcher
wanted to screen respondents based on other criteria, they would need to
either announce in the survey description that this survey should only be
completed by certain groups of people, or include screening
questions within the survey. The former option suffers from the issue that
it is very hard to enforce and verify that only the “right” consumers took
the survey. The latter option suffers from the limitation that all respond-
ents must be compensated, even those who end up not qualifying. This
oversampling greatly increases the cost per respondent. AMT has become
a very common source of respondents in academia, but its adoption in
industry (and in particular in litigation contexts) has been quite limited.
Note that even researchers who are reluctant to use AMT for their main
survey may still find it a very convenient and inexpensive way to collect
pretest responses.
In practice: traditional online panels offer the “safest” source of
respondents for Conjoint Analysis surveys. Amazon Mechanical Turk can
be faster and cheaper and provide data of higher quality, but it does not
offer as much in terms of imposing quotas based on demographics. AMT
tends to be preferred by academics, while consultants and practitioners
often rely on traditional online panels.


Partworths Estimation and Inference

Partworths Estimation

The data collected in a Conjoint Analysis survey consist of some evaluations
(usually ratings or choices) by a group of consumers on a set of profiles.
Regression analysis is used to estimate the impact of each attribute level on
each respondent’s evaluations. The dependent variable captures the con-
sumers’ evaluations, and the independent variables capture the description
of the profiles. In the case of ratings-based Conjoint Analysis, the depend-
ent variable is usually treated as a continuous variable, and Ordinary Least
Square (OLS) regression may be used. In the case of CBC, the dependent
variable is a discrete choice, and logistic regression is typically used.
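As an illustration of the CBC case, the sketch below fits a simple multinomial logit (the multi-alternative analogue of the logistic regression mentioned above) to choice data by maximum likelihood. It is a bare-bones aggregate version with no heterogeneity, and all data and parameter values are hypothetical; in practice one would use specialized software or the hierarchical Bayes approach discussed below.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical CBC data: 200 choice questions, 3 alternatives each,
# profiles described by 5 dummy-coded attribute-level columns.
n_tasks, n_alts, n_feat = 200, 3, 5
X = rng.integers(0, 2, size=(n_tasks, n_alts, n_feat)).astype(float)

true_beta = np.array([0.5, 1.2, 0.8, 1.5, -1.0])
util = X @ true_beta + rng.gumbel(size=(n_tasks, n_alts))
choice = util.argmax(axis=1)          # chosen alternative in each task

def neg_log_lik(beta):
    v = X @ beta                                   # deterministic utilities
    v -= v.max(axis=1, keepdims=True)              # for numerical stability
    log_probs = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(n_tasks), choice].sum()

fit = minimize(neg_log_lik, x0=np.zeros(n_feat), method="BFGS")
print("estimated partworths:", fit.x.round(2))
```
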
One key aspect related to partworth estimation in Conjoint Analysis
is how heterogeneity is addressed. Simple approaches include ignoring
heterogeneity altogether by running a single aggregate regression to esti-
mate average preferences in the market. Consumers may also be grouped
based on demographic or other variables, and separate regressions may
be run for each group. In the case of ratings-based Conjoint Analysis, one
separate regression may be run for each respondent, providing partworth
estimates at the individual level. Ofek and Toubia (2014b) provide an
Excel spreadsheet that contains an example of such a regression.
However, the state of the art consists in providing individual-level
estimates of partworths that are informed by the entire sample. This is
typically achieved using hierarchical Bayes (Lenk et al. 1996; Rossi and
Allenby 2003). Readers interested in a simple introduction to hierarchical
Bayes are referred to Sawtooth Software’s technical papers on this topic
(see www.sawtoothsoftware.com/support/technical-papers#hierarchical-
bayes-estimation). In a nutshell, hierarchical Bayes simultaneously esti-
mates each respondent’s partworths, together with the distribution of
partworths among respondents. A set of partworths is estimated for each
respondent, which is shrunk toward the population average. This shrink-
age reduces the risk of overfitting, by imposing a penalty on parameter
estimates that deviate too much from the mean. Other approaches include
latent class analysis (Kamakura and Russell 1989; Andrews, Ansari and
Currim 2002; Moore 2004), as well as approaches based on Machine
Learning (Evgeniou, Pontil and Toubia 2007). Despite the promise held
by these alternative methods, hierarchical Bayes has become the method of
choice. Its implementation, which used to require extensive programming,
is now much more accessible. Open-source software includes Stan (www.
mc-stan.org) and OpenBUGS (www.openbugs.net). Sawtooth Software
offers commercial software tailored to Conjoint Analysis.
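Hierarchical Bayes itself is best run in dedicated software such as the packages just mentioned, but the shrinkage idea can be illustrated with a toy two-stage calculation: noisy individual-level estimates are pulled toward the population mean in proportion to their noisiness. This is a simplified empirical-Bayes caricature with made-up numbers, not the full MCMC machinery.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical individual-level OLS estimates of one partworth for 200
# respondents, each contaminated by known sampling noise from a short survey.
true_mean, true_sd = 1.0, 0.5
individual_truth = rng.normal(true_mean, true_sd, size=200)
sampling_sd = 0.8
ols_est = individual_truth + rng.normal(0, sampling_sd, size=200)

# Shrink each estimate toward the population mean; the weight depends on the
# ratio of between-respondent variance to total variance (normal-normal Bayes).
pop_mean = ols_est.mean()
between_var = max(ols_est.var() - sampling_sd**2, 1e-6)
w = between_var / (between_var + sampling_sd**2)
shrunken = w * ols_est + (1 - w) * pop_mean

rmse = lambda e: np.sqrt(((e - individual_truth) ** 2).mean())
print("RMSE raw estimates:     ", rmse(ols_est).round(3))
print("RMSE shrunken estimates:", rmse(shrunken).round(3))  # typically smaller
```
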


In practice: researchers performing a ratings-based Conjoint Analysis
study with limited resources may use Excel for analysis, perhaps analyzing
data at the aggregate or segment level. Researchers performing a CBC
study and/or researchers with access to enough resources are advised
to estimate partworths using hierarchical Bayes, perhaps using existing
statistical software.

Inference Based on Partworths

Estimating partworths opens many opportunities to address various
managerial questions. Some of the most common types of inference based
on Conjoint Analysis include:

• Optimizing the design of a single product/service,
• Optimizing the design of a line of products/services,
• Inferring willingness to pay for particular features of products/services,
• Predicting market share,
• Segmenting the market based on preferences.

All these analyses rely on the same model of consumer behavior, which
specifies a utility function based on partworths, and on a link between
utility and choice. In the case of CBC, the link between utility and choice is
given simply by logistic probabilities. In the case of ratings-based conjoint,
one may, for example, assume that when given a choice between various
alternatives, a consumer would choose the one with the highest utility.
Armed with such a model of consumer choice, researchers can simulate
how the market would respond to any set of profiles. In particular,
demand simulators may be built that take as input the partworths of a
representative sample of consumers, and that estimate the market shares
of any profiles given these partworths. See Ofek and Toubia (2014b) for
an example of an Excel-based market share simulator. Such simulators
allow users to specify any number of profiles based on the list of attributes
included in the survey. These profiles may capture existing offerings,
competitors, as well as potential new offerings. Once a market share simu-
lator has been built, it is possible to “play” with the set of profiles and see
the resulting market shares immediately. In addition, several algorithms
have been proposed to find the optimal product or product line, i.e., the
set of profile specifications that will maximize profit (or other objective
functions). See Kohli and Sukumar (1990) or Belloni et al. (2008) for a
review. The implementation of these algorithms often requires customized
programming.
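A bare-bones version of such a simulator is sketched below: it takes a matrix of individual-level partworths and a set of dummy-coded profiles, and predicts shares either by a first-choice (highest-utility) rule, as one might for ratings-based data, or by averaging logit probabilities, as in CBC. All inputs are hypothetical placeholders.

```python
import numpy as np

def market_shares(partworths, profiles, rule="first_choice"):
    """partworths: (n_consumers x p) matrix of estimated partworths.
    profiles:   (n_profiles x p) matrix of dummy-coded competing profiles.
    Returns the predicted share of each profile."""
    utils = partworths @ profiles.T          # utility of each profile per consumer
    if rule == "first_choice":
        picks = utils.argmax(axis=1)         # everyone buys their top profile
        return np.bincount(picks, minlength=len(profiles)) / len(partworths)
    elif rule == "logit":
        expu = np.exp(utils - utils.max(axis=1, keepdims=True))
        return (expu / expu.sum(axis=1, keepdims=True)).mean(axis=0)
    raise ValueError("unknown rule")

# Hypothetical inputs: 500 consumers, 5 partworth columns, 3 competing profiles.
rng = np.random.default_rng(3)
betas = rng.normal(size=(500, 5))
offers = np.array([[1, 0, 1, 0, 1],
                   [0, 1, 0, 1, 0],
                   [1, 1, 0, 0, 1]], float)
print(market_shares(betas, offers, rule="first_choice"))
print(market_shares(betas, offers, rule="logit"))
```

Re-running after editing one row of `offers` (say, downgrading a feature or lowering a price level) immediately shows the share impact, which is the mechanics behind the price-compensation exercise described later in this section.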
Beyond predicting market shares and optimizing product lines, Conjoint
Analysis is often used to infer willingness to pay for features of a product
or service. This is feasible as long as price is one of the attributes in the
survey. Suppose that the partworth for a price level $p_1$ is $b_{p_1}$, and that the
partworth for a price level $p_0 < p_1$ is $b_{p_0}$. Because consumers should prefer
lower prices, holding every other attribute constant, we should expect the
following inequality to hold: $b_{p_0} > b_{p_1}$. In other words, a reduction in price
from $p_1$ to $p_0$ provides a utility of $(b_{p_0} - b_{p_1})$ to that consumer. If we assume
that utility for money is linear in the range $[p_0, p_1]$, then we can infer that a
reduction of price of $1 provides a utility of $\frac{b_{p_0} - b_{p_1}}{p_1 - p_0}$. If we further assume
that utility for money is symmetric in gains versus losses (i.e., we assume no
loss aversion), this quantity captures the "utility equivalent" of $1 for that
consumer. Conversely, we can argue that each "unit" of utility is worth
$\frac{p_1 - p_0}{b_{p_0} - b_{p_1}}$ in dollars for that consumer. This quantity may be referred to as an
"exchange rate" between utility and money. Consider another attribute where
the partworth for level $l_1$ is $b_{l_1}$, and the partworth for level $l_0$ is $b_{l_0}$. A change
from level $l_0$ to $l_1$ provides a utility of $(b_{l_1} - b_{l_0})$ to that consumer. If each
"unit" of utility is worth $\frac{p_1 - p_0}{b_{p_0} - b_{p_1}}$ in dollars for that respondent, then if we
again assume that utility is linear, we can infer that the respondent should be
willing to pay $(b_{l_1} - b_{l_0}) \times \frac{p_1 - p_0}{b_{p_0} - b_{p_1}}$ in dollars for a change from level $l_0$
to $l_1$. This gives us an estimate of the Willingness to Pay (WTP) for level $l_1$
relative to level $l_0$ for that consumer. Once WTP is computed for each
consumer in the panel, it may be relevant to compute the mean, median and
standard deviation of the WTP. It is also possible to build a demand curve
for that attribute, i.e., the proportion of consumers in the sample who would
be willing to pay at least price $p$ for that attribute, where $p$ varies.
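Under the linearity assumptions above, the per-consumer WTP computation is a one-liner; the sketch below applies it across a panel of hypothetical partworth values and summarizes the distribution.

```python
import numpy as np

# Hypothetical estimated partworths for 500 consumers.
rng = np.random.default_rng(4)
b_l1 = rng.normal(1.0, 0.5, size=500)   # partworth of level l1 relative to l0 (b_l0 = 0)
b_p0 = rng.normal(2.0, 0.4, size=500)   # partworth of the lower price p0
b_p1 = rng.normal(0.5, 0.4, size=500)   # partworth of the higher price p1
p0, p1 = 50.0, 100.0

# Exchange rate between utility and dollars, then WTP for l1 vs. l0.
# In practice one would first check that b_p0 > b_p1 holds for each respondent.
dollars_per_util = (p1 - p0) / (b_p0 - b_p1)
wtp = b_l1 * dollars_per_util

print("mean WTP:  ", wtp.mean().round(2))
print("median WTP:", np.median(wtp).round(2))
print("std WTP:   ", wtp.std().round(2))
```
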
Another approach for making monetary inferences based on the output
of a Conjoint Analysis survey is to rely again on a market share simula-
tor. In particular, instead of estimating a WTP for each consumer in the
panel for a specific feature, we can determine by how much price would
have to be decreased in order to make up for a reduction in one feature
(or a combination of features). In order to achieve this, we can specify
a set of competing alternatives, e.g., five existing plans offered by our
competitors, and a focal alternative, e.g., a plan offered by our company.
We can estimate the market share of our plan assuming certain levels for
each attribute, e.g., unlimited voice and unlimited data. Then, we can
reduce one of the features of our plan, e.g., only 10 GB of data instead
of unlimited data. Naturally, we would expect the predicted share of our
plan to drop. We can then use the simulator to determine by how much
we would need to decrease the price of our plan with 10 GB in order to
raise the share back to the original level (with unlimited data). Toubia,
Hauser and Garcia (2007) used a similar method to determine the discount
that should be offered to convince wine customers to switch from cork to
screw caps. A similar approach was used in an expert report on the famous
Apple v. Samsung case, to determine how much consumers value certain
features of smartphones such as “pinch-to-zoom.” Readers are referred
to Netzer and Sambandam (2014) for a short and simplified discussion.
This approach is not without its critics, however. Notably, Allenby et al.
(2014) warn against ignoring competitive response to changes in product
attributes and stress the need to consider equilibrium profits when using
Conjoint Analysis to value product features.
Finally, once partworths have been estimated, researchers sometimes
find it useful to explore the existence of distinct segments in the population.
This may provide valuable insights to marketers and constitutes one viable
way to segment markets (other ways include demographic segmentation,
psychographic segmentation, etc.). For this, any segmentation approach
such as k-means clustering may be used.
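As a sketch of this last use, the snippet below clusters hypothetical individual-level partworths with k-means; each cluster centroid can then be read as the preference profile of a segment.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical partworth matrix: 500 respondents x 5 attribute-level columns,
# drawn from two latent segments (price-sensitive vs. feature-seeking).
rng = np.random.default_rng(5)
seg1 = rng.normal([2.0, 0.5, 0.5, 0.3, -2.0], 0.3, size=(250, 5))
seg2 = rng.normal([0.5, 2.0, 1.5, 1.0, -0.5], 0.3, size=(250, 5))
partworths = np.vstack([seg1, seg2])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(partworths)
print("segment sizes:", np.bincount(km.labels_))
print("segment centroids (average partworths):")
print(km.cluster_centers_.round(2))
```
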
In practice, calculations of willingness to pay may be completed very
easily using any data handling software. Market share simulators may be
implemented within Microsoft Excel or more complex technical program-
ming software. Market share simulators may also be used to approximate
the market value of an attribute, by determining the loss in profit (i.e.,
price reduction) for a company that would reduce their offering on this
attribute. Segmentation may be conducted using any available statistical
software.

Ecological Validity and the Issue of Attention

The first question on many people’s minds is whether Conjoint Analysis
does a good job predicting real-life choices, i.e., whether it has good eco-
logical validity. Ideally, testing ecological validity requires comparing
predictions from a Conjoint Analysis survey to choices made by consum-
ers in the real world. Creating such a situation is challenging, as real-life
environments rarely mimic the sterilized and simplified format of Conjoint
Analysis. However, several studies have been able to test the ecological
validity of Conjoint Analysis, and their results have been quite positive.
See Louviere (1988) or Green and Srinivasan (1990) for a review. In addi-
tion, many studies have tested the external validity of Conjoint Analysis,
i.e., its ability to predict choices in other contexts, which are not necessar-
ily real-world decisions.
In addition to comparing predictions from Conjoint Analysis to actual
behavior, researchers have studied more generally the issue of how much
attention consumers devote to Conjoint Analysis surveys, and whether
their level of attention in the survey is similar to how they would approach
choices in real life. Such evidence will be reviewed later in this section.
First, we review recent attempts to motivate participants to pay more
attention to surveys and take the task more seriously.

Incentive Alignment

Traditional surveys do not link the respondents’ compensation to their
answers. That is, from the perspective of the respondent, there are often
very few consequences to their answers. While most respondents prob-
ably have a good nature and good intentions, there are so many demands
on consumers’ time and attention today that it is hard to assume that all
consumers will spontaneously answer all survey questions in a way that
is exactly consistent with how they would behave in real life. Why would
a rational person care to think hard about the questions in a survey, if
there is nothing to gain from it? In addition to attention, social desirability
is another obvious concern. Consumers may be embarrassed to reveal
certain preferences and to admit to the researcher (and to themselves)
that they care more or less about certain attributes. Examples include
price sensitivity (consumers may not want to admit their true level of price
sensitivity), and any other preference that is related to some social norms
(e.g., how much consumers care about the environment, fair trade, etc.).
One way to start tackling these issues is incentive alignment, i.e., link-
ing the consumer’s compensation for taking a survey to their answers
in the survey. Incentive alignment has a long tradition in economics.
Some of the first documented uses in the marketing context of Conjoint
Analysis include Toubia et al. (2003) and Ding, Grewal and Liechty
(2005). In particular, Ding, Grewal and Liechty (2005) proposed an
incentive-aligned conjoint mechanism, whereby each choice made by
each respondent during the Conjoint Analysis survey has some positive
probability of being realized (i.e., the respondent may actually receive his
or her chosen alternative). That is, each respondent has some probability
of being selected as a “winner.” When that happens, one of their choices
is randomly selected, and they receive their favorite alternative from that
choice. Ding, Grewal and Liechty (2005) showed that this mechanism
increases external validity in choice-based conjoint (CBC) experiments,
compared to a benchmark with no incentive alignment. While it paved
the way for incentive alignment research in Conjoint Analysis, the initial
mechanism proposed by Ding, Grewal and Liechty (2005) is not very
practical, as it requires being able to offer any possible profile as a possible
compensation. Consequently, Ding (2007) extended this method by allow-
ing researchers to reward respondents from a limited set of products. Ding
(2007)’s mechanism involves inferring the respondent’s willingness to pay
for one or a few reward profiles. Dong, Ding and Huber (2010) further
improved the practicality of incentive alignment by proposing an alterna-
tive approach, based on an inferred rank order of the potential reward
profiles, which does not require the estimation of willingness to pay. One
potential concern with incentive alignment would be that consumers
systematically select more expensive alternatives, in order to increase the
market (and therefore resale) value of the prize they will receive if they
are selected as winners. This is addressed by giving each winner a fixed
monetary prize, using that money to purchase their preferred alternative
from one of their choice questions, and giving them the change in cash.
For example, Toubia et al. (2003) gave each winner $100, with which they
purchased a laptop bag priced between $70 and $100 which was selected
based on respondents’ answers, and gave the difference between $100 and
the price of the laptop bag as cash to respondents.
Incentive alignment has become the gold standard in Conjoint Analysis.
Indeed, it has been shown to lead to significant improvements in the real-
ism of Conjoint Analysis surveys, although some eye-tracking evidence
reviewed later in this section suggests it may not be enough to induce
consumers to treat Conjoint Analysis choices exactly like they would treat
real-life choices. One key limitation of incentive alignment is logistical.
The costs and logistics of distributing products to consumers may become
prohibitive, in particular as the sample grows (although usually only a
fraction of consumers are randomly selected to get a prize), and for more
expensive product categories. One creative solution was provided by Ding
et al. (2011). These authors studied preference for automobiles, where
incentive alignment required putting a positive probability on the event
that one respondent would receive $40,000 toward the purchase of an
actual automobile. In order to offer such incentives, the authors purchased
prize indemnity insurance on the open market, for a fixed fee. That is,
the authors paid the insurance company a fixed fee, and the insurance
company was responsible for paying the $40,000 prize if a respondent
actually won it.

Gamification

Incentive alignment provides an extrinsic motivation to respondents to
be truthful in their answers and to take surveys seriously. Another way
to increase attention is to increase intrinsic motivation, by gamifying the
experience. In particular, the first use of online surveys was to perform
the same type of surveys that used to be conducted offline, in an online
environment. However, with online studies, it is possible to perform com-
putations on the fly during the survey, and to connect respondents with
one another as they go through the task. Researchers are now starting to
leverage the web more fully to invent new tasks that take advantage of its
capabilities. For example, Ding, Park and Bradlow (2009) proposed an
online incentive-aligned method inspired by barter markets. Park, Ding
and Rao (2008) introduced a preference measurement mechanism that
relies on upgrading decisions: respondents state their willingness to pay for
an upgrade, and the transaction is realized if a randomly generated price is
smaller than stated willingness to pay. Toubia et al. (2012) developed and
tested an incentive-aligned conjoint poker game to measure preferences.
This game collects data that are similar to CBC, but in a gamified context.
Traditional poker uses regular playing cards. From a Conjoint Analysis
perspective, playing cards are profiles with two attributes (Color with
four levels, and Number with 13 levels). These authors develop a version
of poker where cards may have any number of attributes and levels (e.g.,
Design, Color and Price). Similar to poker, players create hands based on
similarities and differences between cards. In the process of creating these
hands, players are required to pay attention to the profiles captured on
these cards, which increases their motivation to process all the available
information.

Screening for Attention

In addition to providing incentives to respondents and making the survey-
taking experience more enjoyable, several routine measures exist to check
for attention and screen out inattentive respondents. First, it is common to
start an online survey with a “CAPTCHA.” While the primary purpose of
this type of question is to ensure that the survey is completed by humans
instead of by internet bots, it also provides a very basic attention check.
Second, it is advisable to insert at least one “attention check” question
(also called “Instructional Manipulation Check”) at the end of the survey.
These questions are often multiple-choice questions with an open-ended
option. The instructions to these questions are often a few lines long and
may include a statement like: “If you have read this question carefully,
please . . .” These questions are designed such that only respondents
who have carefully read the instructions are able to provide a “correct”
response, and those who fail to do that may be dropped from the sample.
Oppenheimer, Meyvis and Davidenko (2009) show that the inclusion of
such questions can increase the statistical power and reliability of a survey
dataset. Third, respondents who completed the survey suspiciously fast
may be automatically discarded. There is no universal cutoff for response
time. Some researchers like to drop respondents whose log response time
is more than 1 or 1.5 standard deviations below the mean. The
commercial survey-hosting platform Qualtrics drops respondents with
response time less than one-third of the average from the initial “soft
launch” of the survey (i.e., the first 60 or so respondents).
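A minimal version of such a speed screen, assuming an array of completion times in seconds, might look like the following; the 1.5 standard deviation cutoff on log response time is one of the conventions mentioned above, not a universal rule.

```python
import numpy as np

# Hypothetical completion times in seconds for 500 respondents.
rng = np.random.default_rng(6)
seconds = rng.lognormal(mean=6.0, sigma=0.5, size=500)

log_t = np.log(seconds)
cutoff = log_t.mean() - 1.5 * log_t.std()   # flag unusually fast completions
speeders = seconds < np.exp(cutoff)

print(f"flagged {speeders.sum()} of {len(seconds)} respondents as too fast")
```
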

Eye Tracking Evidence

Eye-tracking research has a long tradition in advertising and branding
(e.g., Pieters and Warlop 1999; Wedel and Pieters 2000; Pieters and Wedel
2004; Van der Lans, Pieters, and Wedel 2008). More recently, researchers
have started using eye tracking in Conjoint Analysis in order to directly
measure how respondents allocate their attention during surveys.
Eye-tracking data are composed of fixations and saccades (Wedel and
Pieters 2000). Fixations represent the time periods in which participants
fix their eyesight on a specific location; saccades represent eye movements
between two fixations. As mentioned above, Toubia et al. (2012) used eye
tracking to measure attention in regular CBC versus their Conjoint Poker
game. Profile information is usually presented in a matrix format (e.g.,
one column per choice alternative and one row per attribute). Toubia et al.
(2012) found that participants in their Conjoint Poker had on average at
least one fixation on approximately 90 percent of the cells in the matrix
containing the choice-relevant information. However, this proportion
dropped to 60–70 percent for participants in an incentive-aligned CBC
condition. Yang, Toubia and De Jong (2015) found similar results.
That is, even when incentives are aligned, participants in CBC tend to
ignore 30–40 percent of the choice-relevant information provided to
them. Meißner, Musalem and Huber (2016) present eye-tracking evidence
that suggests that respondents tend to adjust their decision processes to
increase speed while maintaining reliability. Shi, Wedel and Pieters (2013)
show that the information acquired by respondents is influenced by the
format in which the information is presented (i.e., whether attributes are
in rows and alternatives in columns, or the other way around). Stüttgen,
Boatwright and Monroe (2012) provide eye tracking evidence that sup-
ports a satisficing model of choice, according to which respondents stop
evaluating choice alternatives once they have found one that is satisfac-
tory. In such a model, the final choice is influenced by the order in which
alternatives are considered.
The eye-tracking evidence provided in these studies suggests that
respondents in Conjoint Analysis surveys do not process all the relevant
information presented to them even in the presence of incentive align-
ment, and that their information processing may be easily influenced by
incidental factors. This raises the question of whether consumers ignore
some relevant information in real-life choices as well. In other words, do
consumers also ignore 30–40 percent of the relevant information when
making real-life choices?
All incentive-aligned preference measurement methods follow an
approach known in economics as the random lottery mechanism (RLM).
In an RLM, each choice has some probability of being realized and at most
one choice is realized per subject. In other words, incentive alignment uses
tasks that are “probabilistically” incentive aligned, i.e., each choice only
has some (usually small) probability of being realized. In contrast, most
real-life decisions involve what may be labeled as “deterministic” incen-
tives, i.e., the transaction will happen with probability 1. Yang, Toubia
and De Jong (2017) argue that if it takes effort for consumers to process
information during a Conjoint Analysis task, we should expect attention
levels in probabilistically incentive-aligned tasks to be lower than they are
in deterministically incentive-aligned tasks. Indeed, the cognitive costs
involved in processing information are the same irrespective of the incen-
tives. On the other hand, the benefits from these efforts are larger when
choices are more likely to be realized. Therefore, a boundedly rational
consumer should invest less effort in processing information when choices
are less likely to be realized. In order to test this hypothesis, Yang, Toubia
and De Jong (2017) ran an eye-tracking study in which each respondent
makes a single choice that may be realized with probability 0, 0.01, 0.50,
0.99, or 1. They find that, indeed, the amount of information processed
and the time taken to make a decision are positively correlated with this
probability, and that the probabilistic incentives that are typically used in
Conjoint Analysis (where the probability that each choice will be realized
is usually on the order of 0.01) are not enough to motivate consumers
to treat these choices as they would treat real-life choices. Nevertheless,
incentive alignment remains the state of the art in choice experiments.
One may wonder whether a solution to this problem would be to make
all Conjoint Analysis choices deterministically incentive-aligned. That is,
each choice question would be realized with certainty. In addition to being
prohibitively costly, this approach would also be incorrect methodologi-
cally. Indeed, when multiple questions are asked in a Conjoint Analysis
survey, a basic assumption is that these choices are independent. However,
if each choice is realized, this assumption would be violated. For example,
a consumer who chose an SLR camera in the first question may choose
a compact camera in the next question, since their utility for a new SLR
camera diminishes once they already have one.
A more promising solution to the attention problem would be to
develop models of information search and choice such as the ones of
Stüttgen et al. (2012) or Yang, Toubia and De Jong (2015). These models
capture both how consumers acquire information and how they choose
based on this information. Such models may be extended to allow for
counterfactual simulation, or extrapolation, where real-life search and
choices would be predicted based on data coming from probabilistically
incentive-aligned choices.
However, this approach may not be enough to close the gap between
probabilistic and deterministic incentives. Indeed, Yang, Toubia and
De Jong (2017) show that the probability that the choice will be realized
does not only impact what and how much information consumers pay
attention to, it also impacts how they choose. In particular, these authors
find that respondents for whom choices are more likely to be realized also
tend to choose more familiar products and tend to be more price sensitive.
These findings are consistent with previous findings by Ding, Grewal
and Liechty (2005) who report that consumers show a greater willingness
to try new things, exhibit less price sensitivity, and exhibit more socially
desirable behaviors when choices are purely hypothetical as opposed to
probabilistically incentive-aligned. These effects may be explained using
the concept of Psychological Distance (Trope and Liberman, 2010). It
has been shown that improbable events tend to be more psychologically
distant than probable ones, i.e., the lower the probability of the event,
the greater its psychological distance (Todorov, Goren and Trope 2007;
Wakslak et al. 2006). In turn, it has been shown that when choices are
more psychologically distant, consumers are more likely to choose based
on abstract, high-level, positive considerations (referred to as desirability
concerns), versus more concrete, practical, negative ones (referred to
as feasibility concerns in the literature). This theory explains the results
reported by Yang, Toubia and De Jong (2017) and by Ding, Grewal and
Liechty (2005). Indeed, price is a pragmatic, negative, feasibility-oriented
attribute, and therefore we should expect consumers to be more price
sensitive when choices are less psychologically distant (i.e., more likely to
be realized). Similarly, trying new things and behaving in a socially desir-
able manner tend to be desirability-oriented features, which should receive
more weight when choices are more psychologically distant (i.e., less likely
to be realized). These findings imply that it may not be enough to predict
the level of attention that consumers would pay in real-life choices in order
to predict these choices. It may also be necessary to model how preferences
are impacted by probabilistic versus deterministic incentives.
To close on a positive note, eye tracking also provides valuable informa-
tion that may be leveraged to improve our ability to measure consumers’
preferences efficiently. For example, Yang, Toubia and De Jong’s (2015)
model links partworths and eye movements, which enables the researcher
to learn about the respondent’s preferences from their eye movements.
Yang, Toubia and De Jong (2015) find that this additional information
allows reducing the length of Conjoint Analysis questionnaires. In their
study, they find that leveraging eye tracking data allows extracting as
much information in 12 choice questions as would be extracted in 16
choice questions without eye tracking data. Such a model is becoming
increasingly feasible in practice, as eye-tracking technology becomes more
easily accessible. In particular, it is now possible to conduct eye-tracking
studies using the camera on the respondent’s computer or smartphone
(e.g., www.eyetrackshop.com, www.youeye.com).
In practice: whenever feasible, it is recommended to use incentive
alignment in Conjoint Analysis, despite the implied costs. It is also
recommended to design surveys that are attractive and engaging in order
to motivate respondents to pay more attention to the task. Researchers
should also implement measures and tests of attention and drop respond-
ents who appear to have been inattentive. Despite these best practices, it
is important to keep in mind that Conjoint Analysis remains a marketing
research tool, which can at best approximate real-life decisions. The first-
best option would be to manipulate choice options in real-life and observe
the resulting consumer choices. Short of this, incentive-aligned Conjoint
Analysis may be viewed as a second-best solution.

Conclusions

After 45 years, Conjoint Analysis remains a major quantitative market-
ing research method and a major area of academic research in market-
ing. New, exciting research is expected, enabled by new technological
developments that make the collection of physiological data feasible on
a large scale (e.g., eye tracking, skin conductance, brain responses). This
chapter has reviewed a selected set of issues related to implementing a
Conjoint Analysis survey and making quantitative, managerially relevant
inferences based on the data. Particular emphasis was placed on issues of
ecological validity and attention. Recent tools for motivating respondents
to behave in Conjoint Analysis surveys like they would behave in real life
were reviewed, including incentive alignment and gamification. Despite
these advances, it is important to keep in mind that a Conjoint Analysis
survey will always remain a survey tool, which at best approximates real-
life choices. Conjoint Analysis may not be perfect, but it may also be one
of the most efficient and reliable methods available today for quantifying
consumer preferences.

The Apple v. Samsung case provided another demonstration of the value
of Conjoint Analysis, which greatly increased interest in this method, in
particular among the legal community. Hopefully this chapter will help
prospective users decide whether Conjoint Analysis is the right approach
for them. Such a decision requires awareness of other available options.
In particular, it is important to keep in mind that Conjoint Analysis
is particularly suited for situations in which customers routinely make
tradeoffs between various attributes of a product or service, and when these
attributes may be described in objective terms (e.g., number of minutes,
number of pixels, miles per gallon). In some situations, tradeoffs are less
relevant, perhaps because there is only one main attribute in the product/
service, or the focal attribute is not really comparable to other attributes.
In such cases, simpler methods may be considered, such as the Contingent
Valuation Method (Arrow et al. 1993; Mitchell and Carson 1989). In other
cases, attributes are harder to define objectively, perhaps because they
involve aesthetics and/or sensory considerations. In such cases, more quali-
tative approaches may be considered as alternatives to Conjoint Analysis.

References

Agarwal, James, et al. “An Interdisciplinary Review of Research in Conjoint Analysis:
Recent Developments and Directions for Future Research.” Customer Needs and Solutions
2.1 (2015): 19–40.
Allenby, Greg M., et al. “Valuation of Patented Product Features.” Journal of Law and
Economics 57.3 (2014): 629–663.
Andrews, Rick L., Asim Ansari, and Imran S. Currim. “Hierarchical Bayes versus finite
mixture Conjoint Analysis models: A comparison of fit, prediction, and partworth recov-
ery.” Journal of Marketing Research 39.1 (2002): 87–98.
Arora, Neeraj and Joel Huber. “Improving parameter estimates and model prediction by
aggregate customization in choice experiments.” Journal of Consumer Research 28.2 (2001):
273–283.
Arrow, Kenneth, et al. “Report of National Oceanic and Atmospheric Administration panel
on the reliability of natural resource damage estimates derived from contingent valua-
tion.” Federal Register 58 (1993): 4601–4614.
Belloni, Alexandre, et al. “Optimizing product line designs: Efficient methods and compari-
sons.” Management Science 54.9 (2008): 1544–1552.
Bradlow, Eric T. “Current issues and a ‘wish list’ for Conjoint Analysis.” Applied Stochastic
Models in Business and Industry 21.4–5 (2005): 319–323.
Bröder, Arndt. “Assessing the empirical validity of the ‘Take-the-best’ heuristic as a model of
human probabilistic inference.” Journal of Experimental Psychology: Learning, Memory,
and Cognition 26.5 (2000): 1332.
Buhrmester, Michael, Tracy Kwang, and Samuel D. Gosling. “Amazon’s Mechanical Turk:
a new source of inexpensive, yet high-quality, data?” Perspectives on Psychological Science
6.1 (2011): 3–5.
Chu, Wei, et al. “A case study of behavior-driven Conjoint Analysis on Yahoo!: Front
Page Today Module.” Proceedings of the 15th ACM SIGKDD international conference on
Knowledge discovery and data mining. ACM, 2009.
Ding, Min, “An incentive-aligned mechanism for Conjoint Analysis.” Journal of Marketing
Research 44.2 (2007): 214–223.
Ding, Min, Rajdeep Grewal, and John Liechty. “Incentive-aligned Conjoint Analysis.”
Journal of Marketing Research 42.1 (2005): 67–82.
Ding, Min, Young-Hoon Park, and Eric T. Bradlow. “Barter markets for Conjoint
Analysis.” Management Science 55.6 (2009): 1003–1017.
Ding, Min, et al. “Unstructured direct elicitation of decision rules.” Journal of Marketing
Research 48.1 (2011): 116–127.
Dong, Songting, Min Ding, and Joel Huber. “A simple mechanism to incentive-align con-
joint experiments.” International Journal of Research in Marketing 27.1 (2010): 25–32.
Elrod, Terry, Jordan J. Louviere, and Krishnakumar S. Davey. “An empirical comparison
of ratings-based and choice-based conjoint models.” Journal of Marketing Research 29.3
(1992): 368–377.
Evgeniou, Theodoros, Massimiliano Pontil, and Olivier Toubia. “A convex optimization
approach to modeling consumer heterogeneity in conjoint estimation.” Marketing Science
26.6 (2007): 805–818.
Gilbride, Timothy J. and Greg M. Allenby. “A choice model with conjunctive, disjunctive,
and compensatory screening rules.” Marketing Science 23.3 (2004): 391–406.
Green, Paul E., A. M. Krieger, and T. Vavra. “Evaluating EZ-Pass: using Conjoint Analysis
to assess consumer response to a new tollway technology.” Marketing Research 11.2 (1999):
5–16.
Green, Paul E., Abba M. Krieger, and Yoram Wind. “Thirty years of Conjoint Analysis:
Reflections and prospects.” Interfaces 31.3 supplement (2001): S56–S73.
Green, Paul E., and Vithala R. Rao. “Conjoint measurement for quantifying judgmental
data.” Journal of Marketing Research (1971): 355–363.
Green, Paul E. and Venkat Srinivasan. “Conjoint Analysis in marketing: new developments
with implications for research and practice.” Journal of Marketing (1990): 3–19.
Hauser, John R. “Consideration-set heuristics.” Journal of Business Research 67.8 (2014):
1688–1699.
Hauser, John R., Olivier Toubia, Theodoros Evgeniou, Rene Befurt, and Daria Dzyabura.
“Disjunctions of conjunctions, cognitive simplicity, and consideration sets.” Journal of
Marketing Research 47.3 (2010): 485–496.
Huber, Joel and Klaus Zwerina. “The importance of utility balance in efficient choice
designs.” Journal of Marketing research (1996): 307–317.
Jedidi, Kamel and Rajeev Kohli. “Probabilistic subset-conjunctive models for heterogeneous
consumers.” Journal of Marketing Research 42.4 (2005): 483–494.
Johnson, Richard M. “Adaptive Conjoint Analysis.” Sawtooth Software Conference
Proceedings. Sawtooth Software, Ketchum, ID, 1987.
Kamakura, Wagner A. and Gary Russell. “A probabilistic choice model for market segmen-
tation and elasticity structure.” Journal of Marketing Research 26 (1989): 379–390.
Kohli, Rajeev and Kamel Jedidi. “Representation and inference of lexicographic preference
models and their variants.” Marketing Science 26.3 (2007): 380–399.
Kohli, Rajeev and Ramamirtham Sukumar. “Heuristics for product-line design using
Conjoint Analysis.” Management Science 36.12 (1990): 1464–1478.
Kuhfeld, Warren F. “Marketing research methods in SAS.” Experimental Design, Choice,
Conjoint, and Graphical Techniques. Cary, NC, SAS-Institute TS-722 (2005).
Kuhfeld, Warren F., Randall D. Tobias, and Mark Garratt. “Efficient experimental design
with marketing research applications.” Journal of Marketing Research (1994): 545–557.
Leigh, Thomas W., David B. MacKay, and John O. Summers. “Reliability and validity
of Conjoint Analysis and self-explicated weights: A comparison.” Journal of Marketing
Research (1984): 456–462.
Lenk, Peter J., et al. “Hierarchical Bayes Conjoint Analysis: Recovery of partworth
heterogeneity from reduced experimental designs.” Marketing Science 15.2 (1996):
173–191.
Louviere, Jordan J. “Conjoint Analysis modelling of stated preferences: a review of theory,
methods, recent developments and external validity.” Journal of Transport Economics and
Policy (1988): 93–119.
Louviere, Jordan J., David A. Hensher, and Joffre D. Swait. Stated choice methods: analysis
and applications. Cambridge University Press, 2000.
Louviere, Jordan J. and George Woodworth. “Design and analysis of simulated consumer
choice or allocation experiments: an approach based on aggregate data.” Journal of
Marketing Research (1983): 350–367.
Luce, R. Duncan and John W. Tukey. “Simultaneous conjoint measurement: A new type of
fundamental measurement.” Journal of Mathematical Psychology 1.1 (1964): 1–27.
Meißner, Martin, Andres Musalem, and Joel Huber. “Eye-Tracking Reveals Processes that
Enable Conjoint Choices to Become Increasingly Efficient with Practice.” Journal of
Marketing Research. 53.1 (2016): 1–17.
Mitchell, Robert Cameron and Richard T. Carson (1989), Using Surveys to Value Public
Goods: The Contingent Valuation Method, Resources for the Future, Washington, DC.
Moore, William L. “A cross-validity comparison of rating-based and choice-based Conjoint
Analysis models.” International Journal of Research in Marketing 21.3 (2004): 299–312.
Netzer, Oded and Rajan Sambandam. “Apple vs. Samsung: The $2 Billion Case.” Columbia
CaseWorks (2014).
Netzer, Oded and Visvanathan Srinivasan. “Adaptive self-explication of multiattribute pref-
erences.” Journal of Marketing Research 48.1 (2011): 140–156.
Netzer, Oded, et al. “Beyond Conjoint Analysis: Advances in preference measurement.”
Marketing Letters 19.3–4 (2008): 337–354.
Ofek, Elie and Olivier Toubia. “Conjoint Analysis: Online Tutorial.” Harvard Business
School Tutorial 514–712. (2014a).
Ofek, Elie and Olivier Toubia. “Conjoint Analysis: A Do it Yourself Guide.” Harvard
Business School Technical Note 515–024. (2014b).
Oppenheimer, Daniel M., Tom Meyvis, and Nicolas Davidenko. “Instructional Manipulation
Checks: Detecting Satisficing to Increase Statistical Power.” Journal of Experimental
Social Psychology 45 (2009): 867–872.
Orme, Bryan. “Formulating attributes and levels in Conjoint Analysis.” Sawtooth Software
Research Paper (2002): 1–4.
Paolacci, Gabriele, Jesse Chandler, and Panagiotis G. Ipeirotis. “Running experiments on
Amazon Mechanical Turk.” Judgment and Decision Making 5.5 (2010): 411–419.
Park, Young-Hoon, Min Ding, and Vithala R. Rao. “Eliciting preference for complex
products: A web-based upgrading method.” Journal of Marketing Research 45.5 (2008):
562–574.
Pieters, Rik and Luk Warlop. “Visual attention during brand choice: The impact of time
pressure and task motivation.” International Journal of Research in Marketing 16.1 (1999):
1–16.
Pieters, Rik and Michel Wedel. “Attention capture and transfer in advertising: Brand, picto-
rial, and text-size effects.” Journal of Marketing 68.2 (2004): 36–50.
Rao, Vithala R. “Conjoint Analysis.” Wiley International Encyclopedia of Marketing (2010).
Rossi, Peter E., and Greg M. Allenby. “Bayesian statistics and marketing.” Marketing
Science 22.3 (2003): 304–328.
Sándor, Zsolt and Michel Wedel. “Heterogeneous conjoint choice designs.” Journal of
Marketing Research 42.2 (2005): 210–218.
Sattler, Henrik and Susanne Hensel-Börner. “A comparison of conjoint measurement with
self-explicated approaches.” Conjoint Measurement. Springer (2001): 121–133.
Sawtooth Software. “The Adaptive Choice-Based Conjoint (ACBC) Technical Paper.”
Sawtooth Software Technical Paper Series (2014). Available at: http://www.sawtoothsoftware.com/support/technical-papers/adaptive-cbc-papers/acbc-technical-paper-2009 (last
accessed October 3, 2017).
Shi, Savannah Wei, Michel Wedel, and F. G. M. Pieters. “Information acquisition during
online decision making: A model-based exploration using eye-tracking data.” Management
Science 59.5 (2013): 1009–1026.
Srinivasan, Venkataraman and Allan D. Shocker. “Linear programming techniques for mul-
tidimensional analysis of preferences.” Psychometrika 38.3 (1973): 337–369.
Stüttgen, Peter, Peter Boatwright, and Robert T. Monroe. “A satisficing choice model.”
Marketing Science 31.6 (2012): 878–899.
Todorov, Alexander, Amir Goren, and Yaacov Trope. “Probability as a psychological dis-
tance: Construal and preferences.” Journal of Experimental Social Psychology 43.3 (2007):
473–482.
Toubia, Olivier and John R. Hauser. “Research note-on managerially efficient experimental
designs.” Marketing Science 26.6 (2007): 851–858.
Toubia, Olivier, John Hauser, and Rosanna Garcia. “Probabilistic polyhedral methods for
adaptive choice-based Conjoint Analysis: Theory and application.” Marketing Science
26.5 (2007): 596–610.
Toubia, Olivier, John R. Hauser, and Duncan I. Simester. “Polyhedral methods for adaptive
choice-based Conjoint Analysis.” Journal of Marketing Research 41.1 (2004): 116–131.
Toubia, Olivier, et al. “Fast polyhedral adaptive conjoint estimation.” Marketing Science
22.3 (2003): 273–303.
Toubia, Olivier, et al. “Measuring consumer preferences using conjoint poker.” Marketing
Science 31.1 (2012): 138–156.
Trope, Yaacov and Nira Liberman. “Construal-level theory of psychological distance.”
Psychological review 117.2 (2010): 440.
Tversky, Amos. “Elimination by aspects: A theory of choice.” Psychological Review 79.4
(1972): 281.
Van der Lans, Ralf, Rik Pieters, and Michel Wedel. “Eye-movement analysis of search effec-
tiveness.” Journal of the American Statistical Association 103.482 (2008): 452–461.
Wakslak, Cheryl J., et al. “Seeing the forest when entry is unlikely: probability and the mental
representation of events.” Journal of Experimental Psychology: General 135.4 (2006): 641.
Wedel, Michel and Rik Pieters. “Eye fixations on advertisements and memory for brands: A
model and findings.” Marketing science 19.4 (2000): 297–312.
Wind, Jerry, et al. “Courtyard by Marriott: Designing a hotel facility with consumer-based
marketing models.” Interfaces 19.1 (1989): 25–47.
Yang, Liu, Olivier Toubia, and Martijn G. De Jong. “A Bounded Rationality Model of
Information Search and Choice in Preference Measurement.” Journal of Marketing
Research 52.2 (2015): 166–183.
Yang, Liu, Olivier Toubia, and Martijn G. De Jong. “Attention, Information Processing and
Choice in Incentive-Aligned Choice Experiments.” Working paper. Columbia Business
School (2017).
Yee, Michael, Ely Dahan, John R. Hauser, and James Orlin. “Greedoid-based noncompen-
satory inference.” Marketing Science 26.4 (2007): 532–549.

PART II

CLASSICAL ECONOMETRICS

4.  Time-series models of short-run and
long-run marketing impact
Marnik G. Dekimpe and Dominique M. Hanssens

Marketing data appear in a variety of forms. A frequently occurring
form is time-series data. Examples include the number of web clicks or
new Facebook likes per hour, daily category sales, weekly measures of a
brand’s aided advertising awareness, private-label value shares per month,
and the evolution of prices or advertising spending levels for several compet-
ing brands over the last few years. The main feature of time-series data is
that the observations are ordered over time, and hence earlier observations
likely have predictive content for future observations.
Time series can refer to a single variable, such as sales or advertising, but
can also cover a vector of variables, like sales, prices and advertising that
are considered jointly. In some instances, marketing modelers may want
to build a univariate model for a time series, and analyze the series strictly
as a function of its own past. This is, for example, the case when one has
to forecast (or extrapolate) exogenous variables or when the number of
variables to be analyzed (e.g. the number of items in a broad assortment) is
so large that building multivariate models for each of them is too unwieldy
(Hanssens, Parsons and Schultz 2001). However, univariate time-series
models do not address the cause-and-effect questions that are central to
marketing planning. To specify the lag structure in response models, one
extends the techniques of univariate extrapolation to the case of multiple
time series.
Time-series data can be summarized in time-series models. However,
not all models built on time-series data are referred to as time-series
models. Unlike most econometric approaches to dynamic model specifica-
tion, time-series modelers take a more data-driven approach. Specifically,
one looks at historically observed patterns in the data to help in model
specification, rather than imposing a priori a certain structure (such as a
geometric decay pattern in the popular Koyck specification) derived from
marketing or economic theory.
Over the last two decades, time-series techniques specially designed to
disentangle short- from long-run relationships have become popular in
the marketing literature. This fits well with one of marketing’s main fields
of interest: to quantify the long-run impact of marketing’s tactical and
strategic decisions. Indeed, long-run market response is a central concern
of any marketing strategy that tries to create a sustainable competitive
advantage. However, this is easier said than done, as only short-run results
of marketing actions are readily observable. An excellent discussion of dif-
ferent time-series methods and their ability to derive long-term marketing
impact may be found in Leeflang et al. (2009).
This chapter will focus on the use of persistence modeling to address the
problem of long-run market-response identification by combining into one
metric the net long-run impact of a chain reaction of consumer response,
firm feedback and competitor response that emerges following an initial
marketing action. This marketing action could be an unexpected increase
in advertising support (e.g., Dekimpe and Hanssens 1995a), a price
promotion (e.g., Pauwels, Hanssens and Siddarth 2002) or a competitive
activity (e.g., Steenkamp et al. 2005), and the performance metric could be
category demand (Nijs, Dekimpe, Steenkamp and Hanssens 2001), brand
sales (Dekimpe and Hanssens 1995a), brand profitability (Dekimpe and
Hanssens 1999) or stock returns (Pauwels, Silva-Risso, Srinivasan and
Hanssens 2004), among others.
Persistence modeling is a multi-step process, as depicted in Figure 4.1
(taken from Dekimpe and Hanssens 2004). In a first step, one applies unit-
root tests to the different performance and marketing-support variables of
interest to determine whether they are stable (mean or trend-stationary)
or evolving. In the latter case, the series have a stochastic trend, and
one has to test whether a long-run equilibrium exists between them.
This is done through cointegration testing. Depending on the outcome
of these preliminary (unit-root and cointegration) tests, one specifies a
vector-autoregressive (VARX) model in the levels, a VARX model in
the differences or a Vector Error Correction Model. From these VARX
models, one can derive impulse-response functions (IRFs), which trace the
incremental effect of a one-unit (or one-standard-deviation) shock in one
of the variables on the future values of the other endogenous variables.
Relatedly, one can use variance-decomposition approaches to quantify the
dynamic explanatory power of different endogenous drivers.
Below, we provide a brief technical introduction to each of these steps,
along with a set of illustrative marketing studies that have used them. Next,
we summarize various marketing insights that have been derived from
their use. The current review builds upon and complements earlier book
chapters on the topic, such as Dekimpe and Hanssens (2004), Hanssens
and Dekimpe (2012) and Dekimpe, Franses, Hanssens and Naik (2008).

[Figure 4.1  Overview of persistence modeling procedure. Flowchart:
unit-root testing (are performance and marketing variables stable or
evolving?); if evolving, cointegration test (does a long-run equilibrium
exist between the evolving variables?); model choice: vector error
correction model (cointegration), VARX model in differences (evolving,
no cointegration), or VARX model in levels (stable); derive
impulse-response functions (IRFs) and associated persistence levels;
(generalized) forecast error variance decomposition (GFEVD).]

Table 4.1  Persistence modeling steps*

1. Unit-root test. Econometrics: Dickey and Fuller (1979); Kwiatkowski
   et al. (1992); Enders (1995). Marketing: Dekimpe and Hanssens (1995a, b);
   Slotegraaf and Pauwels (2008); Nijs et al. (2001). Research question: are
   performance and marketing variables stationary (mean/trend reverting) or
   evolving (unit root)?
2. Cointegration test.
   – E&G two-step approach. Econometrics: Engle and Granger (1987).
     Marketing: Baghestani (1991).
   – Johansen’s FIML approach. Econometrics: Johansen (1988). Marketing:
     Dekimpe and Hanssens (1999).
   Research question: do evolving variables move together?
3. Impulse response analysis.
   – IRF. Econometrics: Lütkepohl (1993). Marketing: Dekimpe and
     Hanssens (1995a).
   – GIRF. Econometrics: Pesaran and Shin (1998). Marketing: Dekimpe and
     Hanssens (1999).
   Research question: what is the long-term performance impact of a
   marketing shock?
4. Variance decomposition analysis.
   – FEVD. Econometrics: Hamilton (1994). Marketing: Hanssens (1998).
     Research question: what fraction of performance variance comes from
     each marketing action?
   – GFEVD. Econometrics: Pesaran and Shin (1998). Marketing: Nijs et al.
     (2007). Research question: the same, without imposing a causal order.

Note:  * The listed studies are given for illustrative purposes only. As
such, the list is not meant to be exhaustive.

TECHNICAL BACKGROUND

Unit-root Testing: Are Performance and Marketing Variables Stable or
Evolving?

The distinction between stability and evolution is formalized through the
unit-root concept. Following Dekimpe and Hanssens (1995a), we consider
first the simple case where the over-time behavior of the variable of ­interest
(e.g., a brand’s sales St) is described by a first-order autoregressive process:

(1 − ϕL)St = c + ut, (4.1)

where ϕ is an autoregressive parameter, L the lag operator (i.e., LkSt =
St−k), ut a residual series of zero-mean, constant-variance (σu2) and uncor-
related random shocks, and c a constant. Note that Equation 4.1 may also
be written in the more familiar form

St = c + ϕ St−1 + ut, (4.2)

which corresponds to a simple regression model of St on its own past, with
ut the usual i.i.d. residuals. Applying successive backward substitutions
allows us to write Equation 4.2 as

St = c/(1 − ϕ) + ut + ϕut−1 + ϕ2ut−2 + . . . , (4.3)

in which the present value of St is explained as a weighted sum of random
shocks. Depending on the value of ϕ, two scenarios can be distinguished.1
When ϕ < 1, the impact of past shocks diminishes and eventually becomes
negligible. Hence, each shock has only a temporary impact. In that case,
the series has a fixed mean c/(1 − ϕ) and a finite variance σu2/(1 − ϕ2). Such
a series is called stable or stationary. When ϕ = 1, however, Equation 4.3 is subject to
a division by zero, so the series no longer has a fixed mean. Instead, each
random shock has a permanent effect on the subsequent values of S. Sales
do not revert to a historical level, but instead wander freely in one direc-
tion or another, i.e., they evolve. Distinguishing between both situations
involves checking whether the parameter ϕ in Equation 4.1 is smaller than
or equal to one.2
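
To make this distinction concrete, the contrast between ϕ < 1 and ϕ = 1 is easily visualized by simulation. The sketch below (a Python illustration we add for expository purposes; the parameter values are arbitrary) generates a stationary AR(1) series and a random walk from the same shock sequence:

```python
import numpy as np

rng = np.random.default_rng(42)
T, c = 200, 1.0
u = rng.normal(size=T)                 # one common shock sequence u_t

def simulate(phi):
    """Simulate S_t = c + phi * S_{t-1} + u_t (Equation 4.2)."""
    S = np.zeros(T)
    for t in range(1, T):
        S[t] = c + phi * S[t - 1] + u[t]
    return S

stable = simulate(0.7)    # phi < 1: shocks die out; fixed mean c/(1 - phi)
evolving = simulate(1.0)  # phi = 1: every shock shifts the series permanently

print(stable[-20:].mean(), evolving[-20:].mean())
```
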
Numerous tests have been developed to distinguish stable from evolving
patterns. One popular test, due to Dickey and Fuller (1979), is based on
the following equation:

(1 − L) St = ΔSt = a0 + bSt−1 + a1ΔSt−1 + . . . + amΔSt−m + ut. (4.4)

The t-statistic of b is compared with critical values and the unit-root null
hypothesis is rejected if the obtained value is larger in absolute value than
the critical value. Indeed, if b = 0, there is no mean reversion in sales levels,
and vice versa. The m ΔSt−j terms reflect temporary sales fluctuations and
are added to make ut white noise. Because of these additional terms, one
often refers to this test as the “augmented” Dickey–Fuller (ADF) test. The
ADF test was used, for example, in Dekimpe and Hanssens (1999). They
analyzed a monthly sample of five years of market performance (number
of prescriptions), market support (national advertising and number of
sales calls to doctors) and pricing (price differential relative to the main
challenger) data for a major brand in a prescription drug market. Based
on the Schwarz (SBC) criterion (cf. infra), a value of m varying between
0 (price differential and sales-calls series) and 2 (prescription series) was
selected. The t-statistic of the b-parameter in Equation (4.4) was smaller
in absolute value than the 5 percent critical value for each of the variables,
implying the presence of a unit root in each of them.
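
For readers who wish to replicate such a test, the sketch below shows a minimal implementation using the adfuller routine of the Python statsmodels library (our illustration, not the code used in the studies cited; the sales series is a simulated placeholder):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# placeholder performance series (here: a simulated random walk); in an
# application this would be, e.g., monthly prescriptions or weekly sales
rng = np.random.default_rng(0)
sales = np.cumsum(rng.normal(size=260))

# ADF regression with intercept; the number of augmented terms m is chosen
# by an information criterion, mirroring the SBC-based selection above
stat, pvalue, usedlag, nobs, crit, icbest = adfuller(
    sales, regression="c", autolag="BIC")
print(f"ADF t = {stat:.2f}, p = {pvalue:.3f}, m = {usedlag}")
# a t-statistic smaller in absolute value than the critical value (large
# p-value) means the unit-root null cannot be rejected: the series evolves
```
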
Key decisions to be made when implementing ADF-like unit-root tests
are (1) the treatment (inclusion/omission) of various deterministic com-
ponents, (2) the determination of the number of augmented (ΔSt−j) terms,
and (3) whether or not allowance is made for structural breaks in the data.
First, Equation 4.4 tests whether or not temporary shocks may cause a
permanent deviation from the series’ fixed mean level. When dealing with
temporally disaggregated (less than annual) data, marketing researchers
may want to add deterministic seasonal dummy variables to the test equa-
tion to allow this mean level to vary across different periods of the year.
Their inclusion does not affect the critical value of the ADF test. This
is not the case, however, when a deterministic trend is added to the test
equation, in which case one tests whether shocks can initiate a permanent
deviation from that predetermined trend line. Assessing whether or not a
deterministic trend should be added is intricate because the unit-root test
is conditional on its presence, while standard tests for the presence of a
deterministic trend are, in turn, conditional on the presence of a unit root.
An often-used test sequence to resolve this issue is described in Enders
(1995, 256–257). Marketing applications include Nijs et al. (2001) and
Srinivasan, Vanhuele and Pauwels (2010), among others.
A second critical issue in the implementation of ADF tests is the
determination of the number of augmented terms. Two popular order-
determination procedures are the application of fit indices such as the AIC
or SBC criterion (see e.g. Nijs et al. 2001; Srinivasan, Pauwels, Hanssens
and Dekimpe 2004), or the top-down approach advocated by Perron
(1994). The latter approach, used in a marketing setting by Deleersnyder,
Geyskens, Gielens and Dekimpe (2002), starts with a maximal value of m,
and successively reduces this value until a model is found where the last lag
is significant, while the next-higher lag is not.
Finally, a decision has to be made whether or not to allow for a struc-
tural break in the data-generating process. Indeed, the shocks considered
in Equations 4.1–4.4 are expected to be regularly occurring, small shocks
that will not alter the underlying data-generating process. This assumption
may no longer be tenable for shocks associated with, e.g., a new product
introduction (see, e.g., Pauwels and Srinivasan 2004; Nijs et al. 2001) or
an internet channel addition (Deleersnyder et al. 2002). Such shocks tend
to be large, infrequent, and may alter the (long-run) properties of the time
series. A failure to account for these special events has been shown to bias
unit-root tests toward finding evolution. In that case, one would errone-
ously conclude that all (regular) shocks have a long-run impact, while (1)
these shocks cause only a temporary deviation from a fixed mean (deter-
ministic trend), and (2) only the special events caused a permanent shift in
the level (intercept and/or slope) of an otherwise level (trend) stationary
series. Appropriate adjustments to Equation 4.4 to account for such spe-
cial event(s) have been proposed by Perron (1994) and Zivot and Andrews
(1992), among others. Different testing procedures are used depending on
whether the presumed structural break is determined a priori (imposed) by
the researcher (as in Deleersnyder et al. 2002) or determined endogenously
(as in Kornelis, Dekimpe and Leeflang 2008).
Importantly, ADF type tests are characterized by a unit-root null
hypothesis. Many marketing studies (see, for example, Pauwels, Leeflang,
Teerling and Huizingh 2011) also apply the Kwiatkowski, Phillips,
Schmidt and Shin (1992) test, which maintains stationarity as null
hypothesis. Consistency in the conclusion (stationary versus evolving)
increases one’s confidence in the test results. To increase the power of the
tests (which may be especially called for when the time series are not very
long), researchers are increasingly adopting panel versions of the different
unit-root tests (for marketing applications, see, for example, van Heerde,
Gijsenberg, Dekimpe and Steenkamp 2013 or Luo, Raithel and Wiles
2013).
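
A minimal sketch of this ADF–KPSS cross-check, again using statsmodels (our illustration; the significance levels are conventional choices):

```python
from statsmodels.tsa.stattools import adfuller, kpss

def classify(series):
    """Combine the ADF (unit-root null) and KPSS (stationarity null) verdicts."""
    adf_p = adfuller(series, regression="c", autolag="BIC")[1]
    kpss_p = kpss(series, regression="c", nlags="auto")[1]
    if adf_p < 0.05 and kpss_p > 0.05:
        return "stationary (tests agree)"
    if adf_p >= 0.05 and kpss_p <= 0.05:
        return "evolving (tests agree)"
    return "inconclusive: tests disagree, inspect further"

print(classify(sales))   # 'sales' from the previous sketch
```
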
Other developments that are relevant to applied marketing researchers
deal with the design of unit-root tests that incorporate the logical-
consistency requirements of market shares (Franses, Srinivasan and
Boswijk 2001) and the use of outlier-robust unit-root (and cointegra-
tion, cf. infra) tests as described in Franses, Kloek and Lucas (1999).
Pauwels and Hanssens (2007) and Fang, Li, Huang and Palmatier (2015)
implemented rolling-window unit-root tests to identify changing regimes
of, respectively, stability and evolution over time. Unit-root tests are
basically univariate tests. Wang and Zhang (2008), however, argue that
performance series can evolve because of an intrinsic-evolving market or
because of continuous marketing support, and show how this distinction
has important budgeting implications. To that extent, they extend the
univariate tests described in Equations 4.1 and 4.4 by explicitly adding the
level of marketing support to the right-hand side of the test equation. A
similar reasoning was recently used in Hanssens, Wang and Zhang (2016)
in their study on opportunistic marketing spending.

Cointegration Tests: Does a Long-run Equilibrium Exist between Evolving
Series?

Evolving variables are said to be cointegrated when a linear combination
exists between them that results in stable residuals. Even though each of
the individual variables can move far away from its previously held posi-
tions, this long-run equilibrium prevents them from wandering apart.3
Such long-run equilibria can emerge because of a variety of reasons.
Among them, certain budgeting rules (e.g., percentage-of-sales allocation
rules) imply that sales successes eventually translate into higher marketing
spending. Similarly, competitive decision rules can result in firms’ market-
ing spending levels never deviating too far from each other. Finally, cus-
tomers’ limited budgets may cause different price levels to be associated
with different long-run demand levels, which would imply a cointegration
relationship between sales and prices.
Consider, without loss of generality, a three-variable example where a
brand’s sales (S), marketing support (M) and its competitors’ marketing
support (CM) are all evolving (i.e., they all have a unit root). The existence
of a perfect equilibrium relationship between these three variables would
imply (see Powers et al. 1991 for a more in-depth discussion):

St = b0 + b1 Mt + b2 CMt (4.5)

In practice, however, we are unlikely to observe a perfect equilibrium in
every single period. A more realistic requirement is that its deviations are
mean-reverting (stable) around zero, i.e., eS,t in Eq. (4.6) should no longer
be evolving, even though each of the other variables in the equation is:

St = b0 + b1 Mt + b2 CMt + eS,t . (4.6)

A simple testing procedure for cointegration, proposed by Engle and
Granger (1987), is to estimate (4.6) using OLS, and test the residuals eS,t
for a unit root using standard unit-root tests (without intercept in the
test equation and using updated critical values as listed in Engle and Yoo
1987). A marketing application of the Engle-and-Granger (EG) approach
to cointegration testing can be found in Baghestani (1991), among others.
Lately, Johansen’s Full Information Maximum Likelihood (FIML)
approach has become the more popular procedure to test for cointegra-
tion. The latter test was applied by Dekimpe and Hanssens (1999, 406)
in their analysis of a prescription drugs market (see before). It was found
that even though each of the individual series (prescriptions, advertising,
sales calls and price differential) was evolving, the four variables were tied
together in a long-run equilibrium that prevented them from wandering
too far apart from each other. Other marketing applications include,
among others, Nijs et al. (2001) and Steenkamp et al. (2005).
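
Both procedures are available in standard software. The sketch below (a Python/statsmodels illustration we add; all series are simulated placeholders) applies the Engle–Granger two-step test and Johansen's trace test to the three-variable example of Equation 4.5:

```python
import numpy as np
from statsmodels.tsa.stattools import coint
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# simulated evolving series: own sales S, own marketing M, competitor CM,
# constructed so that the deviations from Equation 4.5 are stable
rng = np.random.default_rng(1)
M = np.cumsum(rng.normal(size=300))
CM = np.cumsum(rng.normal(size=300))
S = 2.0 + 0.8 * M - 0.3 * CM + rng.normal(size=300)

# Engle-Granger two-step test: OLS of S on (M, CM), unit-root test on e_S,t
eg_stat, eg_pvalue, _ = coint(S, np.column_stack([M, CM]))
print(f"Engle-Granger t = {eg_stat:.2f}, p = {eg_pvalue:.3f}")

# Johansen FIML: trace statistics for the cointegration rank
jres = coint_johansen(np.column_stack([S, M, CM]), det_order=0, k_ar_diff=1)
print("trace statistics:", jres.lr1)   # compare against jres.cvt
```
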
As with the unit-root tests, cointegration tests have also been extended
to allow for structural breaks; see e.g. Gregory and Hansen (1996) for
a technical discussion. Similar to the panel-based unit-root tests, panel
cointegration tests have also been developed. A recent marketing applica-
tion can be found in Luo et al. (2013). Grewal, Mills, Mehta and
Mujumdar (2001) discuss in more detail some methodological issues when
using cointegration analysis to model marketing interactions in dynamic
environments.

VAR Models: How to Capture the Dynamics in a System of Variables?

The third step in persistence modeling is to specify a vector-autoregressive
model to link the (short-run) movements of the different variables under
consideration. Depending on the outcomes of the preceding unit-root and
cointegration tests, these VAR models are specified in the levels (no unit
roots), in the differences (unit roots without cointegration), or in error-
correction format (cointegration).4
For expository purposes, we first consider a model in levels, and focus
on a simple three-equation model linking own sales performance (S), own
marketing spending (M) and competitive marketing spending (CM). The
corresponding VAR model (in which, for ease of notation, all determinis-
tic components are omitted) becomes:

$$
\begin{bmatrix} S_t \\ M_t \\ CM_t \end{bmatrix} =
\begin{bmatrix} \pi^{1}_{11} & \pi^{1}_{12} & \pi^{1}_{13} \\ \pi^{1}_{21} & \pi^{1}_{22} & \pi^{1}_{23} \\ \pi^{1}_{31} & \pi^{1}_{32} & \pi^{1}_{33} \end{bmatrix}
\begin{bmatrix} S_{t-1} \\ M_{t-1} \\ CM_{t-1} \end{bmatrix}
+ \cdots +
\begin{bmatrix} \pi^{J}_{11} & \pi^{J}_{12} & \pi^{J}_{13} \\ \pi^{J}_{21} & \pi^{J}_{22} & \pi^{J}_{23} \\ \pi^{J}_{31} & \pi^{J}_{32} & \pi^{J}_{33} \end{bmatrix}
\begin{bmatrix} S_{t-J} \\ M_{t-J} \\ CM_{t-J} \end{bmatrix}
+ \begin{bmatrix} u_{S,t} \\ u_{M,t} \\ u_{CM,t} \end{bmatrix} \qquad (4.7)
$$


where J is the order of the model, and where $u_t = [u_{S,t}\ u_{M,t}\ u_{CM,t}]' \sim N(0, \Sigma)$.
This specification is very flexible, and reflects multiple forces or channels of
influence: delayed response ($\pi^{j}_{12}$, j = 1, . . ., J), purchase reinforcement ($\pi^{j}_{11}$),
performance feedback ($\pi^{j}_{21}$), inertia in decision making ($\pi^{j}_{22}$) and competi-
tive reactions ($\pi^{j}_{32}$). Only instantaneous effects are not included directly,
but these are reflected in the variance–covariance matrix of the residuals
($\Sigma$). Estimation of these models is straightforward: (1) all explanatory
variables are predetermined, so there is no concern over the identification
issues that are often encountered when specifying structural multiple-
equation models, and (2) all equations in the system have the same
explanatory variables so that OLS estimation can be applied without loss
of efficiency.
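
A minimal estimation sketch using statsmodels' VAR class follows (our illustration; it reuses the simulated series of the previous sketch in first differences, so that the system is stationary):

```python
import pandas as pd
from statsmodels.tsa.api import VAR

# stationary system: first differences of the simulated series above
df = pd.DataFrame({"S": S, "M": M, "CM": CM}).diff().dropna()

model = VAR(df)
results = model.fit(maxlags=8, ic="bic")   # order J selected by SBC/BIC
print(results.summary())

# instantaneous effects are not parameterized; they show up in the
# residual variance-covariance matrix Sigma
print(results.sigma_u)
```
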
However, this flexibility comes at a certain cost. First, the number of
parameters may become exorbitant. For J = 8, for example, the VAR
model in Equation 4.7 will estimate 9 × 8 = 72 autoregressive parameters.
If, however, one considers a system with 5 endogenous variables, this
number increases to 25 × 8 = 200. Several authors (see e.g., Pesaran, Pierse
and Lee 1993; Dekimpe and Hanssens 1995a) have therefore restricted
all parameters with |t-statistic| < 1 to zero.5 While this may alleviate the
problem of estimating and interpreting so many parameters, it is unlikely
to fully eliminate it.6 As a consequence, VAR modelers typically do not
interpret the individual parameters themselves, but rather focus on the
impulse-response functions (IRFs) derived from these parameters. As
discussed in more detail in the next section, IRFs trace, over time, the
incremental performance and spending implications of an initial one-
period change in one of the support variables. In so doing, they provide a
concise summary of the information contained in this multitude of param-
eters, a summary that lends itself well to a graphical and easy-to-interpret
representation (cf. infra).
Second, no direct estimate is provided of the instantaneous effects. The
residual correlation matrix can be used to establish the presence of such
an effect, but not its direction. Various procedures have been used in the
marketing literature to deal with this issue, such as an a priori imposi-
tion of a certain causal ordering on the variables (i.e., imposing that an
instantaneous effect can occur in one, but not the other, direction) as in
Dekimpe and Hanssens (1995a), a sensitivity analysis of various causal
orderings (see e.g., Dekimpe, Hanssens and Silva-Risso 1999), or account-
ing for expected instantaneous effects in the other variables when deriving
the impulse-response functions, as implemented in Nijs et al. (2001) and
Steenkamp et al. (2005).
If some of the variables have a unit root, the VAR model in Eq. (4.7) is
specified in the differences; e.g., St, St-1, . . . are replaced by ΔSt, ΔSt-1,. . .
If the variables are cointegrated as well, this model in differences is aug-
mented with the lagged residuals of the respective long-run equilibrium
relationships (cf. Eq. 4.6), resulting in the following specification:

$$
\begin{bmatrix} \Delta S_t \\ \Delta M_t \\ \Delta CM_t \end{bmatrix} =
\begin{bmatrix} \alpha_S & 0 & 0 \\ 0 & \alpha_M & 0 \\ 0 & 0 & \alpha_{CM} \end{bmatrix}
\begin{bmatrix} e_{S,t-1} \\ e_{M,t-1} \\ e_{CM,t-1} \end{bmatrix}
+ \sum_{j=1}^{J}
\begin{bmatrix} \pi^{j}_{11} & \pi^{j}_{12} & \pi^{j}_{13} \\ \pi^{j}_{21} & \pi^{j}_{22} & \pi^{j}_{23} \\ \pi^{j}_{31} & \pi^{j}_{32} & \pi^{j}_{33} \end{bmatrix}
\begin{bmatrix} \Delta S_{t-j} \\ \Delta M_{t-j} \\ \Delta CM_{t-j} \end{bmatrix}
+ \begin{bmatrix} u_{S,t} \\ u_{M,t} \\ u_{CM,t} \end{bmatrix} \qquad (4.8)
$$

The addition of the error-correction terms $[\alpha_S e_{S,t-1}\ \ \alpha_M e_{M,t-1}\ \ \alpha_{CM} e_{CM,t-1}]'$
implies that in every period there is a partial adjustment towards restor-
ing the underlying, temporarily disturbed, long-run equilibrium. Said
differently, the system partially corrects for the previously observed
deviations $[e_{S,t-1}\ e_{M,t-1}\ e_{CM,t-1}]'$, and the respective α-coefficients reflect the
speed of adjustment of the corresponding dependent variable toward the
equilibrium. A good review on the implementation issues involved can
be found in Franses (2001). In the earlier prescription-drugs example,
Dekimpe and Hanssens (1999) had identified that all four series in their
sample were evolving, and that a long-run equilibrium relationship existed
between them. They therefore estimated a four-equation VAR model that
was specified in the differences, whereby each equation was augmented
with a lagged error-correction term (i.e., the lagged residuals from the
equilibrium relationship).7
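
In software, the error-correction specification of Equation 4.8 can be estimated directly. A minimal sketch with statsmodels' VECM class, assuming the simulated level series introduced in the cointegration sketch above:

```python
import pandas as pd
from statsmodels.tsa.vector_ar.vecm import VECM, select_coint_rank

levels = pd.DataFrame({"S": S, "M": M, "CM": CM})  # evolving series in levels

# cointegration rank from Johansen's trace test
rank = select_coint_rank(levels, det_order=0, k_ar_diff=1).rank

# error-correction model: alpha = adjustment speeds, beta = long-run relation
res = VECM(levels, k_ar_diff=1, coint_rank=rank, deterministic="co").fit()
print("adjustment speeds (alpha):\n", res.alpha)
print("cointegrating vectors (beta):\n", res.beta)
```
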
In Equations 4.7 and 4.8, all three variables are incorporated as
endogenous. Adding more endogenous variables quickly increases the
dimension of the autoregressive parameter matrices. To still control for
the impact of some other variables, yet keep this dimension from becoming
excessive, one can add them as exogenous variables, to arrive
at a VARX specification. For example, when augmenting Equation 4.7
with four exogenous variables: distribution (DISt), feature (Ft), display
(Dt) and feature and display (FDt), 12 extra γ-parameters are estimated:

$$
\begin{bmatrix} S_t \\ M_t \\ CM_t \end{bmatrix} =
\begin{bmatrix} \pi^{1}_{11} & \pi^{1}_{12} & \pi^{1}_{13} \\ \pi^{1}_{21} & \pi^{1}_{22} & \pi^{1}_{23} \\ \pi^{1}_{31} & \pi^{1}_{32} & \pi^{1}_{33} \end{bmatrix}
\begin{bmatrix} S_{t-1} \\ M_{t-1} \\ CM_{t-1} \end{bmatrix}
+ \cdots +
\begin{bmatrix} \pi^{J}_{11} & \pi^{J}_{12} & \pi^{J}_{13} \\ \pi^{J}_{21} & \pi^{J}_{22} & \pi^{J}_{23} \\ \pi^{J}_{31} & \pi^{J}_{32} & \pi^{J}_{33} \end{bmatrix}
\begin{bmatrix} S_{t-J} \\ M_{t-J} \\ CM_{t-J} \end{bmatrix}
$$
$$
+\;
\begin{bmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} & \gamma_{14} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} & \gamma_{24} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} & \gamma_{34} \end{bmatrix}
\begin{bmatrix} \ln(DIS_t) \\ \ln(F_t) \\ \ln(D_t) \\ \ln(FD_t) \end{bmatrix}
+ \begin{bmatrix} u_{S,t} \\ u_{M,t} \\ u_{CM,t} \end{bmatrix} \qquad (4.9)
$$

If these variables had been treated as endogenous, 49 π-parameters
would have to be estimated for each autoregressive lag.
The decision whether or not to treat a variable as endogenous (in the
VAR-part) or exogenous (in the X-part) is either made a priori by
the researcher (depending on whether the variables are central to the
research question at hand; see, for example, Nijs et al. 2001) or empiri-
cally through prior Granger causality tests (see, for example, Fang et al.
2015).

Impulse-response Function Derivation

An impulse-response function (IRF) traces the incremental effect of a one-
unit (or one-standard deviation) shock in one of the variables on the future
values of the other endogenous variables. The first steps of this process are
depicted in the Appendix (where we consider, for expository purposes, a
VAR model of order 1). IRFs can also be seen as the difference between
two forecasts: a first extrapolation based on an information set that does
not take the marketing shock into account, and another prediction based
on an extended information set that takes this action into account. As
such, IRFs trace the incremental effect of the marketing action reflected
in the shock. Note that marketing actions (e.g., a price promotion) are
operationalized as deviations from a benchmark, which is derived as the
expected value of the marketing mix-variable (e.g., the price) as predicted
through the dynamic structure of the VAR model. See Pauwels et al.
(2002) for an extensive discussion on this issue.
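
Given a fitted VAR, impulse-response functions are obtained directly. A minimal sketch continuing the earlier illustration (statsmodels also provides orthogonalized IRFs, which account for the instantaneous residual correlations discussed later):

```python
# impulse-response functions over 25 periods from the fitted VAR
irf = results.irf(periods=25)

# incremental response of S to a one-unit shock in M, period by period
i_S, i_M = df.columns.get_loc("S"), df.columns.get_loc("M")
response_S_to_M = irf.irfs[:, i_S, i_M]
print(response_S_to_M)

# orthogonalized IRFs adjust for contemporaneous residual correlations
irf.plot(orth=True)
```
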
A graphical illustration of some IRFs, taken from Nijs et al. (2001), is
given in Figure 4.2. The top panel shows the IRF tracing the incremental
performance impact of a price-promotion shock in the stationary Dutch
detergent market. Because of the chain reaction of events reflected in
this IRF, we see various fluctuations over time; for example, a typical
stockpiling effect, feedback rules and competitive reactions. Eventually,
however, any incremental effect disappears. This does not imply that
no more detergents are sold, but rather that no additional sales can be
attributed to the initial promotion. In contrast, in the evolving dairy-
creamer market shown in the bottom panel of Figure 4.2, we see that
this incremental effect stabilizes at a non-zero, or persistent, level.

[Figure 4.2  Impulse response functions. Panel A: impulse response
function for a stationary market (detergent); the price-promotion
elasticity fluctuates over the first weeks and converges to a zero
long-run impact within the 25-week window shown. Panel B: impulse
response function for an evolving market (dairy creamer); the
price-promotion elasticity converges to a positive, persistent long-run
impact. Horizontal axes: weeks; vertical axes: price promotion
elasticity.]

In that case, we have identified a long-run effect, as the initial
promotion keeps on generating extra sales. Behavioral explanations for
this phenomenon
could be that newly attracted customers make regular repeat purchases, or
that the existing customer base has increased its usage rate.
While impulse-response functions are useful summary devices, the mul-
titude of numbers (periods) involved still makes them somewhat awkward
to compare across brands, markets or marketing-mix instruments. To
reduce this set of numbers to a more manageable size, one often (see Nijs
et al. 2001; Srinivasan et al. 2004; Pauwels and Srinivasan 2004) derives
various summary statistics from them, such as:

1. the immediate performance impact of the marketing-mix shock;
2. the long-run or permanent (persistent) impact, which is the value to
   which the IRF converges;
3. the cumulative effect before this convergence level is obtained. This
   cumulative effect is often called the total short-run effect. For station-
   ary series, this reflects the area under the curve. In case of a persistent
   effect, one can compute the combined (cumulative) effect over the
   time span it takes before the persistent effect is obtained. The time
   interval before convergence is often referred to as the dust-settling
   period (Dekimpe and Hanssens 1999; Nijs et al. 2001).8 (A computational
   sketch of these summary statistics follows the list.)
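
Continuing the earlier illustration, these summary statistics can be read off an estimated IRF as sketched below (the tolerance band used to operationalize convergence is an arbitrary choice):

```python
import numpy as np

irf_values = response_S_to_M      # IRF point estimates (previous sketch)

immediate = irf_values[0]         # (1) immediate impact
long_run = irf_values[-1]         # (2) persistent impact: convergence value
                                  #     (approximately zero when stationary)

# (3) dust-settling period: last period at which the IRF is still outside
# a small band around the convergence value (band width is arbitrary here)
outside = np.where(np.abs(irf_values - long_run) > 0.01)[0]
dust_settling = int(outside[-1]) + 1 if outside.size else 0
cumulative = irf_values[: dust_settling + 1].sum()  # total short-run effect

print(f"immediate={immediate:.3f}, persistent={long_run:.3f}, "
      f"cumulative={cumulative:.3f}, dust-settling={dust_settling}")
```
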

In the impulse-response derivation in the Appendix, no instantaneous
effects are captured, i.e., a shock in one of the variables does not result in a
non-zero shock value in the other variables. Moreover, since all variables in
the VAR model are predetermined, instantaneous effects are not captured
through any of the πij parameters. In order to capture such instantaneous
effects, the approach by Evans and Wells (1983) has become popular in
recent marketing applications (see e.g., Nijs et al. 2001; Srinivasan et al.
2004). The information in the residual variance-covariance matrix of the
VAR model is used to derive a vector of expected instantaneous shock
values following an initiating shock in one of the variables.9 This expected
shock vector, rather than the [0 1 0]′ vector used in the Appendix, is subse-
quently traced through the system in order to derive its incremental impact
on the future values of the various endogenous variables. This procedure
(referred to as Generalized Impulse Response Functions or GIRFs) was
adopted in Dekimpe and Hanssens’s (1999) analysis of a prescription drug
market. Impulse-response functions were used to quantify the immediate,
short- and long-run performance, spending and profit implications of
changes in, respectively, advertising support, the number of sales calls, and
the price differential with a major competitor. Focusing on their long-term
conclusions, increases in calling support failed to produce persistent sales
gains, but were costly in the long run. Narrowing the price gap with its
competitors improved the brand’s long-run profitability, even though this
strategy contributed to the long-run sales erosion of the brand. Finally,
the observed reductions in advertising support had a negative impact on
long-run sales levels as well.
Following the VAR(X) estimation and the derivation of the associated
GIRFs, Generalized Forecast Error Variance Decomposition (GFEVD)
can be used to quantify the relative importance of (current and past
fluctuations in) a given marketing instrument (or other shock component).
Following Nijs, Srinivasan and Pauwels (2007), the GFEVD can be
quantified as:

$$
\theta^{g}_{ij}(t) \;=\; \frac{\displaystyle\sum_{l=0}^{t} \psi^{g}_{ij}(l)^2}{\displaystyle\sum_{l=0}^{t} \sum_{j=1}^{m} \psi^{g}_{ij}(l)^2}\,, \qquad (4.10)
$$

where $\psi^{g}_{ij}(l)$ is the value of a generalized impulse response function
(GIRF) following a shock to variable j on performance variable i at time l.
Importantly, the GFEVD quantifies the dynamic explanatory value of
each endogenous variable j on variable i, akin to a “dynamic R²” (Nijs
et al. 2007). FEVD has been used in recent marketing studies by Hanssens
(1998), Joshi and Hanssens (2010), Srinivasan, Pauwels and Nijs (2008),
Srinivasan et al. (2010), and Fang et al. (2015), among others.
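
A minimal decomposition sketch on the fitted VAR from the earlier illustration (note that statsmodels implements the orthogonalized FEVD rather than the generalized variant of Equation 4.10):

```python
# forecast error variance decomposition at horizons 1..25
fevd = results.fevd(25)
fevd.summary()

# share of the forecast error variance of S attributable to shocks in M,
# per horizon: a reading of M's dynamic explanatory power
print(fevd.decomp[i_S, :, i_M])
```
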

SUBSTANTIVE INSIGHTS

Marketing-mix Effectiveness

Initial applications of the persistence-modeling approach in marketing
focused on the quantification of short- and long-run effectiveness of dif-
ferent marketing-mix instruments on a variety of performance metrics.
Marketing-mix instruments included, for example, advertising support
(e.g., Dekimpe and Hanssens 1995a; van Heerde et al. 2013), price pro-
motions (e.g., Slotegraaf and Pauwels 2008; Srinivasan et al. 2004), assort-
ments (Bezawada and Pauwels 2013), or competitive activities (Steenkamp
et al. 2005), and the performance metrics have been primary demand
(Nijs et al. 2001; Dekimpe, Hanssens and Silva-Risso 1999) or second-
ary demand (Dekimpe and Hanssens 1995a), profitability (Dekimpe and
Hanssens 1999), or stock prices (Pauwels et al. 2004), among others. While
many studies have focused on the aggregate performance metrics, others
explored the heterogeneity in response across performance components
such as category incidence, brand choice and purchase quantity (Pauwels
et al. 2002), or across consumer segments (Lim, Currim and Andrews
2005; Sismeiro et al. 2012). In combination, these studies have resulted in
a rich set of empirical generalizations on marketing’s short- and long-run
effectiveness (see also Hanssens 2015 for a review). The key insights from
some of these studies are summarized in Table 4.2, Panel A.
Following this initial wave of studies, persistence modeling has received
a new impetus from a number of research streams: (1) the interest in the
marketing–finance interface (see panel B in Table 4.2 for some illustrative
studies), (2) the potential cannibalization when adding online (or offline)
stores to a firm’s channel portfolio, (3) the emergence of numerous new/
social media, and (4) the possibility to include mindset metrics in response
models.

Table 4.2  Strategic insights from persistence modeling*

Panel A: Short- and long-run marketing-mix effectiveness
  Dekimpe and Hanssens (1995a): Persistence measures quantify marketing’s
    long-run effectiveness. Image-oriented and price-oriented advertising
    messages have a differential short- and long-run effect.
  Dekimpe and Hanssens (1999): Different strategic scenarios (business as
    usual, escalation, hysteresis and evolving business practice) have
    different long-run profitability implications.
  Dekimpe, Hanssens and Silva-Risso (1999): Little evidence of long-run
    promotional effects is found in CPG markets.
  Nijs, Dekimpe, Steenkamp and Hanssens (2001): Limited long-run category-
    expansion effects of price promotions. The impact differs in terms of
    the marketing intensity, competitive structure, and competitive conduct
    in the industry.
  Pauwels, Hanssens and Siddarth (2002): The decomposition of the
    promotional sales spike into category-incidence, brand-switching and
    purchase-quantity effects differs depending on the time frame considered
    (short versus long run).
  Slotegraaf and Pauwels (2008): Both permanent and cumulative sales
    effects from marketing promotions are greater for brands with higher
    equity and more product introductions. Brands with low equity gain
    greater benefits from product introductions.
  Srinivasan, Pauwels, Hanssens and Dekimpe (2004): Price promotions have
    a differential performance impact for retailers versus manufacturers.

Panel B: Marketing/finance interface
  Chakravarty and Grewal (2011): The past behavior of firm stock returns
    and volatility may create investor expectations of short-term financial
    performance, which drives managers to modify either R&D or marketing
    budgets or both.
  Joshi and Hanssens (2010): Advertising has a direct effect on firm value,
    beyond its indirect effect through market performance. The advertiser
    benefits, while competitors of comparable size get hurt.
  Luo (2009): Negative word-of-mouth hurts firm value and increases
    volatility in the short run and in the long run. It takes several months
    for these effects to wear in.
  Luo, Raithel and Wiles (2013): Variance in brand ratings across consumers
    (brand dispersion) affects stock prices: it harms returns but reduces
    firm risk. Also, there is an asymmetric effect of downside versus upside
    dispersion.
  Luo and Zhang (2013): Consumer buzz and traffic in social media are
    useful predictors of firm value.
  Pauwels, Silva-Risso, Srinivasan and Hanssens (2004): New product
    introductions benefit firm value in the short run and the long run,
    while rebates hurt firm value in the long run. It takes several weeks
    for these effects to wear in.

Panel C: On- versus offline selling
  Deleersnyder, Geyskens, Gielens and Dekimpe (2002): Limited evidence of
    cannibalization by the Internet channel in the European newspaper
    industry.
  Pauwels, Leeflang, Teerling and Huizingh (2011): The long-run revenue
    impact of the introduction and marketing efforts of an informational
    website depends on the product type and the consumer segment.
  Pauwels and Neslin (2015): Adding bricks-and-mortar stores cannibalizes
    existing catalog and Internet channels differently.
  Wiesel, Pauwels and Arts (2011): Multiple cross-channel effects exist,
    with off-line marketing activities affecting online funnel metrics, and
    online funnel metrics affecting off-line sales.

Panel D: New/social media
  Demirci, Pauwels, Srinivasan and Yildirim (2014): Brand strength and the
    search-versus-experience nature of the category affect the effectiveness
    of different types of online media, and their synergy with other
    marketing actions.
  Fang, Li, Huang and Palmatier (2015): Attracting existing sellers has a
    greater effect on click rate than new sellers in the launch stage, but
    the opposite is true in the mature stage. Attracting new buyers exerts a
    greater effect on click rate and price than does attracting existing
    buyers, and this pattern is more pronounced in the mature stage.
  Kireyev, Pauwels and Gupta (2016): Display ads significantly increase
    search conversion. Both search and display ads exhibit significant
    dynamics that improve their effectiveness and ROI over time. In addition
    to increasing search conversion, display ad exposure also increases
    search clicks, thereby increasing search advertising costs.
  Luo and Zhang (2013): Consumer buzz and traffic in social media are
    useful predictors of firm value.
  Srinivasan, Rutz and Pauwels (2015): Online owned, (un)earned and paid
    media can explain a substantial part of the path to purchase, also for
    CPG brands.
  Pauwels and Weiss (2008): Moving from a free to a fee structure slows the
    growth of free users directly and reduces the effectiveness of marketing
    communications in generating free users for online content providers.

Panel E: Inclusion of mindset metrics
  Pauwels and van Ewijk (2013): Both attitude survey and online behavior
    metrics matter for sales explanation and prediction in business-to-
    consumer categories.
  Srinivasan, Vanhuele and Pauwels (2010): Mindset metrics such as
    advertising awareness, brand consideration and brand liking can add
    explanatory power in a sales response model that already accounts for
    short-run and long-run effects of advertising, price, distribution and
    promotion.

Note:  * The listed studies are given for illustrative purposes. As such,
the list is not meant to be exhaustive. The current table complements
earlier reviews in, among others, Dekimpe and Hanssens (2000, 2010).

Marketing–finance Interface

Time-series methods are well suited to analyze stock-price data, and quan-
tify their sensitivity to new marketing information. Not only can they be
employed without having to resort to strong a priori assumptions about
investor behavior such as full market efficiency, VAR models are also
flexible enough to accommodate feedforward and feedback loops between
investor behavior and managerial behavior. Given the increasing interest
in understanding the linkage between product markets (“Main Street”)
and financial markets (“Wall Street”), it is not surprising that time-
series models in general, and VAR models in particular, have been used
in that research domain. Some illustrative examples are given in Panel
B of Table  4.2. More extensive reviews are available in Srinivasan and
Hanssens (2009) and Luo, Pauwels and Hanssens (2012).

Online versus Offline Selling

Since the commercialization of the World Wide Web, many companies
have set up websites to increase their revenues (Pauwels et al. 2011).
Similarly, companies that were originally online sellers increasingly add
physical (bricks-and-mortar) stores to their channel portfolio (see, e.g.,
Pauwels and Neslin 2015). Such channel additions are infrequent discrete
events that can permanently lift baseline sales, but that may also raise
considerable cannibalization concerns (Deleersnyder et al. 2002), and
structurally alter existing relationships among input and output variables.
Structural-break unit-root and cointegration tests, along with pre- and
post-event VARX estimations and conditional forecasts, have been widely
used approaches to gain insight into the performance implications of these
additions, as illustrated in panel C of Table 4.2.

New/Social Media

The emergence of new media has brought along a new set of marketing
metrics, which can easily be tracked over time. Given the multitude of
these new media (Twitter, Facebook, etc.), the large number of metrics
that can be derived from them (like website visits, paid search clicks,
Facebook likes, Facebook unlikes, etc.), and the large number of feedback
loops that may exist (not only among these online metrics themselves, but

MIZIK_9781784716745_t.indd 97 14/02/2018 16:38


98   Handbook of marketing analytics

also with more traditional offline metrics), many researchers have opted
for the flexibility of VAR models, with their data-driven identification of
relevant effects, to study these phenomena. Trusov, Bucklin and Pauwels
(2009), for example, studied the effect of word-of-mouth marketing on
member growth at an internet social network, and compared it with more
traditional marketing vehicles. Word-of-mouth referrals were found to
have a substantially longer carryover effect than more traditional mar-
keting actions, and to have higher elasticities as well. Luo and Zhang
(2013) linked various buzz and online traffic measures to the subsequent
performance of a firm’s stock in the market, while Srinivasan, Rutz and
Pauwels (2016) considered the sales effects of consumer activity on paid, owned and earned online media, as well as their interdependencies with the more traditional marketing-mix elements of price, advertising and distribution.

Inclusion of Mindset Metrics

While mindset metrics such as awareness, liking and consideration have a long history in marketing (e.g., as building blocks in hierarchy-of-effects models), doubts about their long-term sales effects through brand building have long prevailed. Not only were time-series data on these metrics often missing, prior evidence on the exact inter-relationships and sequence of these effects was mixed (Srinivasan et al. 2010). Indeed, marketing theory appears insufficiently developed to posit unequivocally one specific sequence. A flexible modeling approach that does not impose an a priori sequence on the effects, yet can capture multiple interactions among the various measures, is therefore called for. VAR models are ideally placed to do so, and were used in, among others, Srinivasan et al. (2010) and Pauwels and van Ewijk (2013). Using French data from Prométhée, a brand performance tracker developed by Kantar Worldpanel, Srinivasan and co-authors added, for more than 60 CPG brands, various mindset metrics to a VAR model that already accounted for the short- and long-run effects of advertising, price, distribution and promotions. Importantly, the mindset metrics add considerable explanatory power and can be used by managers as early performance indicators. Pauwels and van Ewijk, in turn, combine slower-moving attitudinal survey measures with rapidly changing online behavioral metrics to explain the sales evolution of over 30 brands across a diverse set of categories (CPG as well as services and durables).


CONCLUSION

In this chapter, we reviewed the persistence modeling approach, which has received considerable attention in the recent marketing literature.
However, this by no means offers an exhaustive discussion of all time-
series applications in marketing. Because of space limitations, we did not
review the use of “more traditional” time-series techniques in market-
ing, such as univariate ARIMA modeling, multivariate transfer-function
modeling, or Granger-causality testing. A review of these applications is
given in Table 1 of Dekimpe and Hanssens (2000). Similarly, we did not
discuss the frequency-domain approach to time-series modeling (see, e.g.,
Bronnenberg, Mela and Boulding 2006 for a recent application on the
periodicity of pricing, or Lemmens, Croux and Dekimpe 2007 for a study
on the periodicity of the European integration in consumer confidence),
nor did we review recent applications of band-pass filtering to isolate
business-cycle fluctuations in marketing time series (see Deleersnyder and
Dekimpe 2017 for a review of this research stream), the use of smooth-transition regression models to capture different elasticity regimes (see, e.g., Pauwels, Srinivasan and Franses 2007), or the use of state-space modeling, an approach especially suitable for deriving normative implications (see Dekimpe et al. 2008 or Naik 2015 for reviews along that dimension). Indeed, the use of time-series techniques in marketing is expanding rapidly, encompassing too many techniques and applications to be covered fully in a single chapter.
Referring to the expanding size of marketing data sets, the accelerat-
ing rate of change in the market environment, the opportunity to study
the marketing–finance relationship, and the emergence of internet data
sources, Dekimpe and Hanssens argued in 2000 that “for time-series
modelers in marketing, the best is yet to come” (192). Pauwels, Currim,
Dekimpe, Ghysels, Hanssens, Mizik and Naik (2004) identified a number
of additional research opportunities, including ways to (1) capture asym-
metries in market response, (2) allow for different levels of temporal
aggregation between the different variables in a model, (3) cope with
the Lucas Critique, (4) handle the short time series often encountered in
many applications, and (5) incorporate Bayesian inference procedures in
time-series modeling. In each of these areas, we have recently seen impor-
tant developments. For example, Lamey, Deleersnyder, Dekimpe and
Steenkamp (2007) developed an asymmetric growth model to capture the
differential impact of economic expansions and recessions on private-label
growth, while Gijsenberg, van Heerde and Verhoef (2015) introduced a
Double-Asymmetric Structural VAR model to allow for the possibility
that negative shocks, followed by same-size positive shocks, lead to a net short- or long-run loss/gain. Ghysels, Pauwels and Wolfson (2006) discussed Mixed Data Sampling (MIDAS) regression models to dynamically relate hourly advertising to daily sales; see also Tellis and Franses (2006), who derive for some basic models the optimal level of temporal aggregation. Tests for the Lucas Critique are becoming more widely accepted in marketing (see, e.g., van Heerde et al. 2005, 2007). Krider, Li, Liu and Weinberg (2005) developed graphical procedures to test for Granger causality between short time series, and Chakravarty and Grewal (2011) used a Bayesian VARX model to combine information across many short cross-sections. Bayesian VAR models are also used by Demirci, Pauwels, Srinivasan and Yildirim (2014) in their study on conditions for owned, paid and earned media impact and synergy.
In sum, the use of time-series procedures in marketing is rapidly
expanding, not only because more extensive (in terms of both the included
variables and the length of the time window covered) data sets become
available, but also because various research questions have come to the
fore that (1) potentially/likely involve multiple feedback loops, and (2)
where marketing theory is insufficiently developed to specify a priori all
temporal precedence relationships. In those instances, the flexibility of
VAR models to capture dynamic inter-relationships, and to quantify the
short- and long-run net effects of the various influences at hand, becomes
very valuable. We hope the current chapter will contribute to a further
diffusion of these techniques in the marketing community.

Notes

1. Strictly speaking, one could also consider the situation where ϕ > 1, in which case past
shocks become more and more important, causing the series to explode to plus or minus
infinity. Situations where the past becomes ever more important are, however, unrealistic
in marketing.
2. The previous discussion used the first-order autoregressive model to introduce the concepts of stability, evolution and unit roots. The findings can easily be generalized to the more complex autoregressive moving-average process $\Phi(L)S_t = c + \Theta(L)u_t$. Indeed, the stable/evolving character of a series is completely determined by whether or not some of the roots of the autoregressive polynomial $\Phi(L) = (1 - \phi_1 L - \ldots - \phi_p L^p)$ are equal to one.
3. One could argue that two mean-stationary series are also in long-run equilibrium, as
each series deviates only temporarily from its mean level, and hence, from the other.
However, this situation is conceptually different from a cointegrating equilibrium, in
which a series can wander away from its previously-held positions, but not from the
other.
4. In case only a subset of the variables has a unit root or is cointegrated, mixed models are
specified.
5. Note that this may necessitate the use of SUR, rather than OLS, estimation, as the equations may now have a different set of explanatory variables.


6. Another way to deal with the degrees-of-freedom problem is to impose a variety of restrictions to limit the number of parameters (see, for example, Pauwels 2004).
Alternatively, panel data can be used to increase the degrees of freedom under appro-
priate pooling assumptions (see, for example, Horváth and Wieringa 2008; Horváth,
Leeflang, Wieringa and Wittink 2005). Sismeiro, Mizik and Bucklin (2012) use panel
VAR models to investigate whether different dynamic business scenarios coexist across a
firm’s customer base. Chakravarty and Grewal (2011) also pool across a cross-section of
shorter time series, and apply a hierarchical Bayesian (random effect) parameterization
of the relevant coefficients.
7. Error-correction models can be specified when the series are cointegrated, but also when
all variables in the system are stationary (see Fok, Horváth, Paap and Franses 2006 for
an in-depth discussion). Recent applications in a stationary environment include van
Heerde et al. (2007, 2010, 2013) and Gijsenberg (2014), among others.
8. In panel B, the dust-settling period is defined in terms of the last period that has an impact
significantly different from the non-zero asymptotic value (see Nijs et al. 2001 for details).
9. Assuming multivariate normality of the residuals of the VAR model, it is easy to show that the expected shock values in the other variables after a one-unit shock to the i-th variable are given by $[\sigma_{ij}/\sigma_{ii}]$, with the $\sigma$ elements derived from the estimated residual variance-covariance matrix of the VAR model.

References

Baghestani, Hamid (1991), "Cointegration Analysis of the Advertising–Sales Relationship," Journal of Industrial Economics, 39 (6), 671–681.
Bezawada, Ram and Koen Pauwels (2013), “What Is Special About Marketing Organic
Products? How Organic Assortment, Price and Promotions Drive Retailer Performance,”
Journal of Marketing, 77 (1), 31–51.
Bronnenberg, Bart J., Carl F. Mela and William Boulding (2006), “The Periodicity of
Pricing,” Journal of Marketing Research, 43 (3), 477–493.
Chakravarty, Anindita and Rajdeep Grewal (2011), “The Stock Market in the Driver’s Seat!
Implications for R&D and Marketing,” Management Science, 57 (9), 1594–1609.
Dekimpe, Marnik G., Philip Hans Franses, Dominique M. Hanssens and Prasad A. Naik
(2008), “Time-Series Models in Marketing,” in B. Wierenga (ed.), Handbook of Marketing
Decision Models, Springer, 373–398.
Dekimpe, Marnik G. and Dominique M. Hanssens (1995a), “The Persistence of Marketing
Effects on Sales,” Marketing Science, 14 (1), 1–21.
Dekimpe, Marnik G. and Dominique M. Hanssens (1995b), “Empirical Generalizations
about Market Evolution and Stationarity," Marketing Science, 14 (3, Supplement), G109–G121.
Dekimpe, Marnik G. and Dominique M. Hanssens (1999), “Sustained Spending and Persistent
Response: A New Look at Long-Term Marketing Profitability,” Journal of Marketing
Research, 36 (4), 397–412.
Dekimpe, Marnik G. and Dominique M. Hanssens (2000), “Time-Series Models in
Marketing: Past, Present and Future,” International Journal of Research in Marketing, 17
(2–3), 183–193.
Dekimpe, Marnik G. and Dominique M. Hanssens (2004), “Persistence Modeling for
Assessing Marketing Strategy Performance,” in D. Lehmann and C. Moorman (eds.),
Assessing Marketing Strategy Performance, Marketing Science Institute, 69–93.
Dekimpe, Marnik G. and Dominique M. Hanssens (2010), “Time Series Models in Marketing:
Some Recent Developments,” Marketing Journal of Research and Management, 6 (1),
93–98.
Dekimpe, Marnik G., Dominique M. Hanssens and Jorge M. Silva-Risso (1999), “Long-
Run Effects of Price Promotions in Scanner Markets,” Journal of Econometrics, 89 (1–2),
269–291.


Deleersnyder, Barbara and Marnik G. Dekimpe (2017), "Business-Cycle Research in Marketing," in B. Wierenga and R. van der Lans (eds.), Handbook of Marketing Decision Models, Springer.
Deleersnyder, Barbara, Inge Geyskens, Katrijn Gielens and Marnik G. Dekimpe (2002),
“How Cannibalistic is the Internet Channel? A Study of the Newspaper Industry in the
United Kingdom and the Netherlands,” International Journal of Research in Marketing,
19 (4), 337–348.
Demirci, Ceren, Koen Pauwels, Shuba Srinivasan and Gokhan Yildirim (2014), “Conditions
for Owned, Paid, and Earned Media Impact and Synergy,” Marketing Science Institute
Working Paper Series No. 14–101.
Dickey, David A. and Wayne A. Fuller (1979), “Distribution of the Estimators for
Autoregressive Time Series with a Unit Root,” Journal of the American Statistical
Association, 74 (366), 427–431.
Enders, Walter (1995), Applied Econometric Time Series. New York: John Wiley & Sons.
Engle, Robert F. and Clive W.J. Granger (1987), “Cointegration and Error Correction:
Representation, Estimation and Testing,” Econometrica, 55 (2), 251–276.
Engle, Robert F. and Byung S. Yoo (1987), “Forecasting and Testing in Co-Integrated
Systems,” Journal of Econometrics, 35 (1), 143–159.
Evans, Lewis and Graeme Wells (1983), "An Alternative Approach to Simulating VAR Models," Economics Letters, 12 (1), 23–29.
Fang, Eric (ER), Xiaoling Li, Minxue Huang and Robert W. Palmatier (2015), “Direct and
Indirect Effects of Buyers and Sellers on Search Advertising Revenues in Business-to-
Business Electronic Platforms,” Journal of Marketing Research, 52 (3), 407–422.
Fok, Dennis, Csilla Horváth, Richard Paap and Philip Hans Franses (2006), “A Hierarchical
Bayes Error Correction Model to Explain Dynamic Effects of Price Changes,” Journal of
Marketing Research, 43 (3), 443–461.
Franses, Philip Hans (2001), “How to Deal with Intercept and Trend in Practical
Cointegration Analysis,” Applied Economics, 33 (5), 577–579.
Franses, Philip Hans, Teun Kloek and André Lucas (1999), “Outlier Robust Analysis of
Long-Run Marketing Effects for Weekly Scanner Data,” Journal of Econometrics, 89
(1/2), 293–315.
Franses, Philip Hans, Shuba Srinivasan and Peter Boswijk (2001), “Testing for Unit Roots in
Market Shares,” Marketing Letters, 12 (4), 351–364.
Ghysels, Eric, Koen H. Pauwels, and Paul J. Wolfson (2006), “The MIDAS Touch: Linking
Marketing to Performance at Different Frequencies,” working paper.
Gijsenberg, Maarten J. (2014), “Going for Gold: Investing in the (Non)Sense of Increased
Advertising Around Major Sports Events,” International Journal of Research in Marketing,
31 (1), 2–15.
Gijsenberg, Maarten J., Harald J. van Heerde and Peter C. Verhoef (2015), “Losses Loom
Longer than Gains: Modeling the Impact of Service Crises on Perceived Service Quality
over Time,” Journal of Marketing Research, 52 (5), 642–656.
Gregory, Allen W. and Bruce E. Hansen (1996), “Tests for Cointegration in Models with
Regime and Trend Shifts,” Oxford Bulletin of Economics and Statistics, 58 (3), 555–560.
Grewal, Rajdeep, Jeffrey A. Mills, Raj Mehta and Sudesh Mujumdar (2001), “Using
Cointegration Analysis for Modeling Marketing Interactions in Dynamic Environments:
Methodological Issues and an Empirical Illustration,” Journal of Business Research, 51
(2), 127–144.
Hamilton, James (1994), Time Series Analysis. Princeton, NJ: Princeton University Press.
Hanssens, Dominique M. (1998), “Order Forecasts, Retail Sales and the Marketing Mix for
Consumer Durables,” Journal of Forecasting, 17 (3/4), 327–346.
Hanssens, Dominique M. (2015), Empirical Generalizations about Marketing Impact, 2nd
Edition. Cambridge, MA: Marketing Science Institute.
Hanssens, Dominique M. and Marnik G. Dekimpe (2012), “Short-Term and Long-Term Effects
of Marketing Strategy,” in V. Shankar and G. Carpenter (eds.), Handbook of Marketing Stra-
tegy, Cheltenham, UK and Northampton, MA, USA: Edward Elgar Publishing, 457–469.


Hanssens, Dominique M. and Ming Ouyang (2002), "Hysteresis in Marketing Response: When is Marketing Spending an Investment?" Review of Marketing Science, 419.
Hanssens, Dominique M., Leonard J. Parsons and Randall L. Schultz (2001), Market
Response Models, 2nd Edition. Boston, MA: Kluwer Academic Publishers.
Hanssens, Dominique M., Fang Wang and Xiao-Ping Zhang (2016), “Performance Growth
and Opportunistic Marketing Spending,” International Journal of Research in Marketing,
33 (4), 711–724.
Horváth, Csilla, Peter S.H. Leeflang, Jaap E. Wieringa and Dick R. Wittink (2005),
“Competitive Reaction- and Feedback Effects based on VARX Models of Pooled Store
Data,” International Journal of Research in Marketing, 22 (4), 415–426.
Horváth, Csilla and Jaap Wieringa (2008), “Pooling Data for the Analysis of Dynamic
Marketing Systems,” Statistica Neerlandica, 62 (2), 208–229.
Johansen, Søren (1988), “Statistical Analysis of Cointegration Vectors,” Journal of Economic
Dynamics and Control, 12 (2–3), 231–254.
Joshi, Amit M. and Dominique M. Hanssens (2010), “The Direct and Indirect Effects of
Advertising Spending on Firm Value,” Journal of Marketing, 74 (1), 20–33.
Kireyev, Pavel, Koen Pauwels and Sunil Gupta (2016), “Do Display Ads Influence Search?
Attribution and Dynamics in Online Advertising,” International Journal of Research in
Marketing, 33 (3), 475–490.
Kornelis, Marcel, Marnik G. Dekimpe and Peter S. H. Leeflang (2008), “Does Competitive
Entry Structurally Change Key Marketing Metrics?” International Journal of Research in
Marketing, 25 (3), 173–182.
Krider, Robert E., Tieshan Li, Yong Liu and Charles B. Weinberg (2005), “The Lead-Lag
Puzzle of Demand and Distribution: A Graphical Method Applied to Movies,” Marketing
Science, 24 (4), 635–645.
Kwiatkowski, Denis, Peter C.B. Phillips, Peter Schmidt and Yongcheol Shin (1992), “Testing
the Null Hypothesis of Stationarity against the Alternative of a Unit Root,” Journal of
Econometrics, 54 (1–3), 159–178.
Lamey, Lien, Barbara Deleersnyder, Marnik G. Dekimpe and Jan-Benedict E.M. Steenkamp
(2007), “How Business Cycles Contribute to Private-Label Success: Evidence from the
United States and Europe,” Journal of Marketing, 71 (1), 1–15.
Leeflang, Peter S.H., Tammo H.A. Bijmolt, Jenny van Doorn, Dominique M. Hanssens, Harald J. van Heerde, Peter C. Verhoef and Jaap E. Wieringa (2009), "Lift versus Base: Current Trends in Marketing Dynamics," International Journal of Research in Marketing, 26 (1), 13–20.
Lemmens, Aurélie, Christophe Croux and Marnik G. Dekimpe (2007), “Consumer
Confidence in Europe: United in Diversity?” International Journal of Research in
Marketing, 24 (2), 113–127.
Lim, Jooseop, Imran S. Currim and Rick L. Andrews (2005), "Consumer Heterogeneity
in the Longer-Term Effects of Price Promotions,” International Journal of Research in
Marketing, 22 (4), 441–457.
Luo, Xueming (2009), “Quantifying the Long-Term Impact of Negative Word of Mouth on
Cash Flows and Stock Prices,” Marketing Science, 28 (1), 148–165.
Luo, Xueming, Koen H. Pauwels and Dominique Hanssens (2012), “Time-Series Models
of Pricing the Impact of Marketing on Firm Value,” in S. Ganesan (ed.), Handbook of
Marketing and Finance, Cheltenham, UK and Northampton, MA, USA: Edward Elgar
Publishing, 43–65.
Luo, Xueming, Sascha Raithel and Michael A. Wiles (2013), “The Impact of Brand Rating
Dispersion on Firm Value," Journal of Marketing Research, 50 (3), 399–415.
Luo, Xueming and Jie Zhang (2013), “How Do Consumer Buzz and Traffic in Social Media
Marketing Predict the Value of the Firm?” Journal of Management Information Systems,
30 (2), 213–238.
Lütkepohl, Helmut (1993), Introduction to Multiple Time Series Analysis. Berlin:
Springer-Verlag.
Naik, Prasad A. (2015), “Marketing Dynamics: A Primer on Estimation and Control,”
Foundations and Trends in Marketing, 9 (3), 175–266.


Nijs, Vincent, Marnik G. Dekimpe, Jan-Benedict E. M. Steenkamp and Dominique M. Hanssens (2001), "The Category Demand Effects of Price Promotions," Marketing
Science, 20 (1), 1–22.
Nijs, Vincent R., Shuba Srinivasan and Koen Pauwels (2007), “Retail-Price Drivers and
Retailer Profits,” Marketing Science, 26 (4), 473–487.
Pauwels, Koen (2004), “How Dynamic Consumer Response, Competitor Response,
Company Support, and Company Inertia Shape Long-Term Marketing Effectiveness,”
Marketing Science, 23 (4), 596–610.
Pauwels, Koen, Imran Currim, Marnik G. Dekimpe, Eric Ghysels, Dominique M. Hanssens,
Natalie Mizik and Prasad Naik (2004), “Modeling Marketing Dynamics by Time Series
Econometrics,” Marketing Letters, 15 (4), 167–183.
Pauwels, Koen and Dominique M. Hanssens (2007), "Performance Regimes and Marketing
Policy Shifts,” Marketing Science, 26 (3), 293–311.
Pauwels, Koen, Dominique M. Hanssens and S. Siddarth (2002), “The Long-Term Effects of
Price Promotions on Category Incidence, Brand Choice and Purchase Quantity,” Journal
of Marketing Research, 39 (4), 421–439.
Pauwels, Koen, Peter S.H. Leeflang, Marije Teerling and K.R. Eelko Huizingh (2011),
“Does Online Information Drive Offline Revenues? Only for Specific Products and
Consumer Segments!” Journal of Retailing, 87 (1), 1–17.
Pauwels, Koen and Scott A. Neslin (2015), “Building with Bricks and Mortar: The Revenue
Impact of Opening Physical Stores in a Multichannel Environment,” Journal of Retailing,
91 (2), 182–197.
Pauwels, Koen, Jorge Silva-Risso, Shuba Srinivasan and Dominique M. Hanssens (2004), "New Products, Sales Promotions, and Firm Value: The Case of the Automobile Industry," Journal of Marketing, 68 (4), 142–156.
Pauwels, Koen and Shuba Srinivasan (2004), “Who Benefits from Store Brand Entry?”
Marketing Science, 23 (3), 364–390.
Pauwels, Koen, Shuba Srinivasan, and Philip Hans Franses (2007), “When Do Price
Thresholds Matter in Retail Categories?” Marketing Science, 26 (1), 83–100.
Pauwels, Koen and Bernadette van Ewijk (2013), “Do Online Behavior Tracking or Attitude
Survey Metrics Drive Brand Sales? An Integrative Model of Attitudes and Actions on the
Consumer Boulevard,” Marketing Science Institute Working Paper Series No. 13–118.
Pauwels, Koen and Alan Weiss (2008), “Moving from Free to Fee: How Online Firms
Market to Change Their Business Model Successfully,” Journal of Marketing, 72 (3),
14–31.
Perron, Pierre (1994), “Trend, Unit Root and Structural Change in Macro-Economic Time
Series,” in B. Rao (ed.), Cointegration for the Applied Economist, New York: St. Martin’s,
113–146.
Pesaran, Hashem H. and Yongcheol Shin (1998), “Generalized Impulse Response Analysis
in Linear Multivariate Models," Economics Letters, 58 (1), 17–29.
Pesaran, M. H., R. Pierse and K.C. Lee (1993), “Persistence, Cointegration and Aggregation:
A Disaggregated Analysis of Output Fluctuations in the US Economy,” Journal of
Econometrics, 56 (1–2), 57–88.
Powers, Keiko, Dominique M. Hanssens, Yih-Ing Hser and M. Douglas Anglin (1991), "Measuring the Long-Term Effects of Public Policy: The Case of Narcotics Use and Property Crime," Management Science, 37 (6), 627–644.
Sismeiro, Catarina, Natalie Mizik and Randolph E. Bucklin (2012), “Modeling Coexisting
Business Scenarios with Time-Series Panel Data: A Dynamics-Based Segmentation
Approach,” International Journal of Research in Marketing, 29 (2), 134–147.
Slotegraaf, Rebecca J. and Koen Pauwels (2008), “The Impact of Brand Equity and
Innovation on the Long-Term Effectiveness of Promotions,” Journal of Marketing
Research, 45 (3), 293–306.
Srinivasan, Shuba and Dominique M. Hanssens (2009), “Marketing and Firm Value:
Metrics, Methods, Findings and Future Directions,” Journal of Marketing Research, 46
(3), 293–312.


Srinivasan, Shuba, Koen Pauwels, Dominique M. Hanssens and Marnik G. Dekimpe (2004),
“Do Promotions Benefit Manufacturers, Retailers, or Both?” Management Science, 50 (5),
617–629.
Srinivasan, Shuba, Koen Pauwels and Vincent Nijs (2008), “Demand-Based Pricing versus
Past-Price Dependence: A Cost-Benefit Analysis,” Journal of Marketing, 72 (2), 15–27.
Srinivasan, Shuba, Oliver J. Rutz and Koen Pauwels (2016), “Paths to and off Purchase:
Quantifying the Impact of Traditional Marketing and Online Consumer Activity,” Journal
of the Academy of Marketing Science, 44 (4), 440–453.
Srinivasan, Shuba, Marc Vanhuele and Koen Pauwels (2010), “Mind-Set Metrics in Market
Response Models: An Integrative Approach,” Journal of Marketing Research, 47 (4),
672–684.
Steenkamp, Jan-Benedict E. M., Vincent R. Nijs, Dominique M. Hanssens and Marnik
G. Dekimpe (2005), “Competitive Reactions to Advertising and Promotion Attacks,”
Marketing Science, 24 (1), 35–54.
Tellis, Gerard J. and Philip Hans Franses (2006), “Optimal Data Interval for Estimating
Advertising Response,” Marketing Science, 25 (3), 217–229.
Trusov, Michael, Randolph E. Bucklin and Koen Pauwels (2009), “Effects of Word-of-
Mouth versus Traditional Marketing: Findings from an Internet Social Networking Site,”
Journal of Marketing, 73 (5), 90–102.
van Heerde, Harald J., Marnik G. Dekimpe and William P. Putsis, Jr. (2005), “Marketing
Models and the Lucas Critique,” Journal of Marketing Research, 42 (1), 15–21.
van Heerde, Harald J., Maarten Gijsenberg, Marnik G. Dekimpe and Jan-Benedict E. M.
Steenkamp (2013), “Price and Advertising Effectiveness over the Business Cycle,” Journal
of Marketing Research, 50 (2), 177–193.
van Heerde, Harald J., Kristiaan Helsen and Marnik G. Dekimpe (2007), "The Impact of
a Product-Harm Crisis on Marketing Effectiveness,” Marketing Science, 26 (2), 230–245.
van Heerde, Harald J., Shuba Srinivasan and Marnik G. Dekimpe (2010), “Estimating
Cannibalization Rates for Pioneering Innovations,” Marketing Science, 29 (6), 1024–1039.
Wang, Fang and Xiao-Ping Zhang (2008), “Reasons for Market Evolution and Budgeting
Implications,” Journal of Marketing, 72 (5), 15–30.
Wiesel, Thorsten, Koen Pauwels and Joep Arts (2011), “Marketing’s Profit Impact:
Quantifying Online and Off-line Funnel Progression,” Marketing Science, 30 (4), 604–611.
Zivot, Eric and Donald W. K. Andrews (1992), “Further Evidence on the Great Crash,
the Oil Price Shock and the Unit Root Hypothesis,” Journal of Business and Economic
Statistics, 10 (3), 251–270.


APPENDIX

Impulse-response Functions: Mathematical Derivation

Given the VAR model

$\begin{bmatrix} S_t \\ M_t \\ CM_t \end{bmatrix} = \begin{bmatrix} \pi_{11} & \pi_{12} & \pi_{13} \\ \pi_{21} & \pi_{22} & \pi_{23} \\ \pi_{31} & \pi_{32} & \pi_{33} \end{bmatrix} \begin{bmatrix} S_{t-1} \\ M_{t-1} \\ CM_{t-1} \end{bmatrix} + \begin{bmatrix} u_{S,t} \\ u_{M,t} \\ u_{CM,t} \end{bmatrix},$

one sets

$[u_S, u_M, u_{CM}] = [0,0,0]$ prior to $t$, $[0,1,0]$ at time $t$, and $[0,0,0]$ after $t$,

and computes (simulates) the future values for the various endogenous variables, i.e.:

$\begin{bmatrix} S_t \\ M_t \\ CM_t \end{bmatrix} = \Pi \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix},$

$\begin{bmatrix} S_{t+1} \\ M_{t+1} \\ CM_{t+1} \end{bmatrix} = \Pi \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \pi_{12} \\ \pi_{22} \\ \pi_{32} \end{bmatrix},$

$\begin{bmatrix} S_{t+2} \\ M_{t+2} \\ CM_{t+2} \end{bmatrix} = \Pi \begin{bmatrix} \pi_{12} \\ \pi_{22} \\ \pi_{32} \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} \pi_{11}\pi_{12} + \pi_{12}\pi_{22} + \pi_{13}\pi_{32} \\ \pi_{21}\pi_{12} + \pi_{22}\pi_{22} + \pi_{23}\pi_{32} \\ \pi_{31}\pi_{12} + \pi_{32}\pi_{22} + \pi_{33}\pi_{32} \end{bmatrix},$

etc., where $\Pi$ denotes the $3 \times 3$ coefficient matrix above.
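For readers who wish to trace the simulation numerically, the following sketch reproduces the three steps with Stata's matrix commands; the Π values are purely illustrative placeholders, not estimates from any study discussed in this chapter.

  * Hypothetical VAR(1) coefficient matrix (illustrative values only)
  matrix PI = (0.5, 0.2, 0.1 \ 0.1, 0.4, 0.2 \ 0.0, 0.1, 0.3)
  * One-unit shock to the second variable (M) at time t
  matrix y0 = (0 \ 1 \ 0)
  * Responses at t+1 and t+2: premultiply by PI each period
  matrix y1 = PI * y0
  matrix y2 = PI * y1
  matrix list y1    // equals the second column of PI
  matrix list y2

The first-period response equals the second column of Π, exactly as in the derivation above.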



5. Panel data methods in marketing research
Natalie Mizik and Eugene Pavlov

The increased availability of longitudinal marketing data collected at the individual level has offered marketing researchers the option to utilize
panel data methods to better study marketing phenomena. The term
“panel data” refers to data sets that pool time-series data over multiple
cross-sections: individuals, households, firms, business units, or brands.
Panel data are more informative than cross-sectional data because
they allow addressing individual heterogeneity, modeling dynamic proc-
esses, and assessing effects that are not detectable in pure cross-sections.
Panel data are more informative than aggregate time series because they
allow tracking of individual histories and eliminate biases resulting from
aggregation. Panel data offer more variability and greater efficiency and
allow estimation of more complex and insightful models. Importantly,
panel data allow researchers to design models that control for omitted and
unobservable factors which can often mask causal effects of interest.
Indeed, the role of unobservables (such as individual ability, firm
culture, management quality) has long been debated in the marketing
and economics literature. A number of strategy perspectives, for example,
the Resource-Based and Austrian economics perspectives, highlight the
central role of unobservable factors in explaining business performance.
Marketing is largely concerned with the development and deployment of intangible assets. These assets often fall into the category of unobservables. Unobservable factors (which include both true unobservables and factors that are simply difficult to measure) can be posited to be the most influential determinants of business performance (Jacobson 1990).
As we discuss later in this chapter, modeling and controlling for
unobservables in panel data often comes at the expense of efficiency.
Kirzner (1976), for example, notes that studies placing great emphasis on
unobservable factors are often criticized as incapable of saying anything
about observed strategic factors. He feels, however, that the truth is the
other way around. Only by controlling for unobservables can insights
into strategic factors be adequately assessed. According to Kirzner (1976),
“The real world includes a whole range of matters beyond the scope of
the measuring instruments of the econometrician. Economic science must

encompass this realm.” As such, empirically assessing marketing impact hinges critically on controlling for the role of unobservable factors, and panel data methods offer tools to achieve this.
In this chapter, we review panel data models popular in marketing
applications and highlight some issues, potential solutions, and trade-offs
that arise in their estimation. Panel data studies controlling for unob-
servables often show dramatically different estimates than cross-sectional
studies (Mizik and Jacobson 2004). We focus on estimation of models with
unobservable individual-specific effects and address some misconceptions
appearing in marketing applications. The choice of discussed topics is
highly selective and reflects the authors’ review of the panel data methods
used in the marketing field. We do not cover some important issues (e.g.,
the weak instruments problem) and recent developments in the causal
modeling as these are presented in Chapter 6, “Causal Inference in
Marketing Applications.” Furthermore, Chapter 17, using pharmaceuti-
cal marketing activity and drug prescriptions data, presents an empirical
illustration of the models, methods, and issues discussed here.

STATIC PANEL DATA MODELS

Time-invariant Random Effects: The Random-effects Model

Marketing researchers are frequently confronted with data comprising observations of multiple units (firms, stores, customers) over time. Let $y_{it}$ be the value of the dependent variable for individual or firm i at time t, and let the set of predictor variables be represented by the vector $x_{it}$.

$y_{it} = \alpha_0 + \beta x_{it} + u_{it}$  (5.1)

The error term $u_{it}$ in Equation 5.1 reflects the influence of omitted factors affecting $y_{it}$. Some of these factors reflected in the error term can be posited to be specific to a particular cross-sectional unit i. As such, the error term in Equation 5.1 can be expressed as

$u_{it} = \mu_i + \varepsilon_{it}$,

where $\mu_i$ is an unobservable time-invariant individual-specific factor and $\varepsilon_{it}$ is a contemporaneous (idiosyncratic) shock. This structure of the error term induces a block-diagonal variance-covariance matrix and calls for the use of generalized least squares (GLS). As long as $\mu_i$ and $\varepsilon_{it}$ are uncorrelated with the explanatory factors $x_{it}$ included in the model, OLS and
GLS estimation generate consistent coefficient estimates. However, the residuals for a given cross-section i are correlated across periods and, as a result, the reported standard errors from OLS estimation will be biased and inconsistent. The GLS model, known as the random-effects model in the panel data literature (e.g., Chamberlain 1984; Hsiao 1986), not only generates consistent standard errors but is also asymptotically efficient.

For the random-effects model specification to be valid, it should be plausible that all individual effects $\mu_i$ are drawn from the same probability distribution. Strong heterogeneity across cross-sections invalidates the random-effects specification. Generally, random-effects models are unattractive for panels with a small number of cross-sectional units N and for panels with a large time dimension T.

Time-invariant Fixed Effects: The Fixed-effects Model

The random-effects model assumes zero correlation between the explanatory factors $x_{it}$ and the unobserved individual-specific factor $\mu_i$. Many researchers (e.g., Mundlak 1978) have criticized the random-effects specification because of the restrictiveness of this assumption. Indeed, many theories of firm performance (e.g., the resource-based perspective, Rumelt 1984; Wernerfelt 1984) emphasize the inter-relatedness of invisible assets and strategic choices. The fixed-effects model takes into account the likely correlation of strategic factors with the unobservable factors that persist over time. Allowing for fixed effects of this type requires modeling these effects explicitly:

$y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}$  (5.2)

Equation 5.2 differs from Equation 5.1 in that it allows the time-invariant (fixed) unobserved factors that differ across cross-sections i to be correlated with the explanatory factors $x_{it}$. The effect of these fixed factors is reflected in the individual-specific constant $\alpha_i$. To the extent that the fixed effects $\alpha_i$ are correlated with the observed explanatory variables $x_{it}$ included in the model (even if the correlation is with just one of the several explanatory variables included in the set x; see the discussion of bias spreading later in the chapter), the OLS or GLS estimation of Equation 5.2 will generate biased and inconsistent coefficient estimates.

Consistent estimation of the static fixed-effects models


For static panel data models, researchers typically choose one of two common estimation approaches for obtaining consistent estimates of the effects of the observed strategic factors $x_{it}$ in the presence of unobservable fixed effects ($\alpha_i$). One approach, the within (i.e., mean-difference) estimator, involves analysis of deviations from the individual-specific mean of each variable. That is, the following model is estimated:

$y_{it} - \bar{y}_i = (\alpha_i - \alpha_i) + \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) = \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$  (5.3)

Here, $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, and the means of the other variables are defined similarly. Since $\bar{\alpha}_i = \alpha_i$ ($\alpha_i$ is constant over time for a given cross-sectional unit), the within transformation of the data eliminates the individual-specific unobserved effects $\alpha_i$ from the equation. The within estimator of the effects of the time-varying factors, $\hat{\beta}$, is numerically identical to the least-squares dummy variable (LSDV) estimator of $\hat{\beta}$. The advantage of the dummy variable approach is that it does not difference out the fixed effects and provides direct estimates of $\hat{\alpha}_i$. For short panels (small T and large N), however, the estimates of $\hat{\alpha}_i$ are inconsistent (Cameron and Trivedi 2005, 704).
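The numerical equivalence of the within and LSDV estimators is easy to verify in Stata; the sketch below uses hypothetical variable names (y, x, firm, year):

  xtset firm year
  * Within (mean-difference) estimator
  xtreg y x, fe
  * LSDV: same slope estimate, with explicit cross-section dummies
  regress y x i.firm

The slope on x is identical across the two commands; only the LSDV run reports the individual intercepts.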
The other common approach to estimating fixed-effects models, the first-difference estimator, involves taking first differences of the data. That is, the following model is estimated:

$y_{it} - y_{it-1} = (\alpha_i - \alpha_i) + \beta(x_{it} - x_{it-1}) + (\varepsilon_{it} - \varepsilon_{it-1}) = \beta(x_{it} - x_{it-1}) + (\varepsilon_{it} - \varepsilon_{it-1})$  (5.4)

Taking either the first-differences or the mean-differences removes all time-invariant factors, including the fixed effects $\alpha_i$. Equation 5.3 assesses how deviations of the outcome variable $y_{it}$ from its mean are affected by the explanatory variables $x_{it}$ deviating from their mean values. Equation 5.4 assesses how the first-difference of the outcome variable $y_{it}$ is affected by the explanatory variables $x_{it}$ deviating from their previous values. If the model is specified correctly (no mis-specification issues are present), these estimators will generate statistically identical estimates. Under certain conditions (discussed below), however, one estimator may be preferred to the other.

The choice of the estimator for a fixed-effects model: first-difference vs. mean-difference
If the panel consists of two periods only, the within and the first-difference
estimators (equations 5.3 and 5.4, respectively) are algebraically identi-
cal. For T > 2, mean-differencing (the within estimator) is more efficient
under the assumption of homoscedastic and serially uncorrelated distur-
bances. The within estimator also has an advantage in that it does not

eliminate a portion of the data as a result of differencing. First-order differencing eliminates N out of N×T observations, second-order differencing (i.e., $y_{it} - y_{it-2}$) eliminates 2N, and so on. For these reasons, the within (mean-difference) estimator is a more popular method of removing $\alpha_i$ in static panel data models and is the default method of fixed-effects panel data regressions in many software packages. It is implemented in Stata with the xtreg, fe command.
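As a concrete illustration, the sketch below runs both fixed-effects variants on a hypothetical firm-year panel (the data set firm_panel.dta and the variables sales and advertising are placeholders); the first-difference command anticipates the discussion that follows:

  use firm_panel.dta, clear
  xtset firm year
  * Within (mean-difference) fixed-effects estimator
  xtreg sales advertising, fe
  * First-difference estimator (d1. takes first differences)
  regress d1.sales d1.advertising

If the model is correctly specified, the two advertising coefficients should differ only by sampling error.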
The relative efficiency of the within versus the first-difference estimator depends on the statistical properties of the idiosyncratic error term $\varepsilon_{it}$. The within estimator is more efficient when the idiosyncratic errors $\varepsilon_{it}$ are serially uncorrelated. If $\varepsilon_{it} \sim \text{iid}[0, \sigma^2_\varepsilon]$, then taking first differences generates the error term $\Delta\varepsilon_{it}$, which follows an MA(1) process and has a first-order autocorrelation coefficient of −0.5. As such, the first-difference estimator, while still unbiased, is less efficient. However, if $\varepsilon_{it}$ follows a random walk (exhibits high levels of autocorrelation), the first-differenced error term $\Delta\varepsilon_{it}$ is serially uncorrelated, and the first-difference estimation is more efficient. The first-difference estimation can be implemented in Stata with the regress d1.Y d1.X command, where the operator d1 denotes first-differencing. In situations where the error term $\varepsilon_{it}$ is somewhere between a random walk and the iid process, it is more difficult to decide between the first-difference and the within estimators. Wooldridge (2006, 487) suggests examining the autocorrelation patterns of the differenced errors $\Delta\varepsilon_{it}$ in order to decide between the first-difference and the mean-difference estimators. He also suggests performing the estimation using both methods, comparing the results, and then trying to identify the sources of any differences in the estimates. If the first-difference and the mean-difference estimates differ significantly (i.e., the difference cannot be attributed to sampling error), the strict exogeneity assumption ($E(\varepsilon_{it} \mid x_{is}, \alpha_i) = 0$, s = 1, . . ., T) might be violated. Any of the standard endogeneity problems (measurement error, omitted variables, simultaneity) can induce contemporaneous correlation between the error term $\varepsilon_{it}$ and the explanatory variables $x_{it}$. A contemporaneous correlation causes the first-difference and the within estimators to be inconsistent and to have different probability limits. In some applications, it is also possible for the errors $\varepsilon_{it}$ to be correlated with past or future values of $x_{it}$. Correlation between $\varepsilon_{it}$ and $x_{is}$, for s ≠ t, also causes both estimators to be inconsistent. If s < t (the error is correlated with past values of the explanatory variables), including lags of $x_{it}$ and interpreting the equation as a distributed lag model solves the problem. Correlation of $\varepsilon_{it}$ with future values of the explanatory variables $x_{is}$ (i.e., s > t) is more problematic, as it rarely makes economic sense to include future explanatory variables in the estimation model.
Another consideration when choosing the estimator for a fixed-effects model is the potential presence of measurement error. When measurement error is present in the explanatory variables, the severity of the attenuation bias differs between the first-difference and the within estimators (Griliches and Hausman 1986). We address this issue in more detail later in the chapter. In summary, the higher the autocorrelation in the mismeasured explanatory variable, the greater the attenuation bias under the first-difference estimator compared to the bias under the within estimator. However, if the time dimension T is sufficiently large, taking higher-order differences can potentially remedy the problem.
Our discussion of the first-difference versus the within estimator so far
pertained to static panels only. Once dynamics are introduced into the
model and a lagged dependent variable is added to the right-hand-side
of a model, the time-difference-based estimator becomes the estimator of
choice. In dynamic panels, the within estimator is always biased (Nickell
1981). Time-differencing is the core of instrumental variable-based estima-
tion in dynamic panels (e.g., Anderson and Hsiao 1981; Arellano and
Bond 1991).

Choosing Between Random-effects and Fixed-effects Specification in Static Panel Data Models

An important issue in static panel data models is whether a random-effects or a fixed-effects model is appropriate. The most important advantage of the fixed-effects model is that it allows for a non-zero correlation between the unobserved individual effects $\alpha_i$ and the explanatory variables $x_{it}$, hence delivering consistent estimates regardless of whether the assumption $\text{cov}(\alpha_i, x_{it}) = 0$ truly holds. The random-effects model, on the other hand, relies on the zero-correlation assumption and delivers inconsistent estimates if this assumption is violated. Only if the zero-correlation assumption $\text{cov}(\alpha_i, x_{it}) = 0$ holds is the random-effects specification more desirable than the fixed-effects specification, because it generates more efficient parameter estimates.
Some researchers prefer random-effects models because they allow
identifying parameters on time-invariant regressors (e.g., gender). Indeed,
in the fixed-effects model, where all time-invariant effects are differenced
out, it is impossible to distinguish between the effects of time-invariant
observables (individual-specific characteristics) and the unobservable
fixed effects. This motivation alone, however, is never a legitimate reason
for selecting random-effects over fixed-effects specification.
The choice between random-effects and fixed-effects model specification should be driven by the validity of the assumption of no correlation between the unobservable factors $\alpha_i$ and the explanatory factors $x_{it}$ (i.e., $\text{cov}(\alpha_i, x_{it}) = 0$). Other considerations should not drive the choice between the random-effects and the fixed-effects specification (Wooldridge 2006, 493). Specification tests for choosing between fixed effects and random effects exist, and the Hausman (1978) test is the most popular among them. It is focused on assessing the validity of the $\text{cov}(\alpha_i, x_{it}) = 0$ assumption. We describe the test, its interpretation, and limitations later in the chapter.
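In practice the comparison takes only a few lines of Stata; this is a minimal sketch with placeholder variable names:

  * Fit and store both specifications
  xtreg y x1 x2, fe
  estimates store fixed
  xtreg y x1 x2, re
  estimates store random
  * A significant Hausman statistic rejects cov(a_i, x_it) = 0,
  * favoring the fixed-effects specification
  hausman fixed random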

DYNAMIC PANEL DATA MODELS

In dynamic panel data models, a lag of the dependent variable enters the right-hand side of the estimating equation as another explanatory variable. Researchers are often compelled to include a lagged dependent variable as a predictor when estimating regression models for longitudinal panel data. The reason is that, in most situations, the best predictor of what happens at time t is what happened at time t − 1. Many marketing processes and data series marketing researchers work with (sales, earnings, etc.) have fixed effects and also exhibit high levels of persistence (autocorrelation) and, as such, warrant the inclusion of lagged dependent variables in the model:

$y_{it} = \alpha_i + \phi y_{it-1} + \beta x_{it} + \varepsilon_{it}$  (5.5)

Models with lagged dependent variables are known as dynamic panel data models, and econometricians have long emphasized that lagged dependent variables can cause major estimation problems and lead to severe biases, particularly when individual-specific effects are present. OLS, random-effects, and within estimators generate biased estimates in dynamic panel data models, and instrumental variable-based estimators (Anderson and Hsiao 1981; Arellano and Bond 1991) are preferred for dynamic panel data models with individual-specific effects. Unfortunately, some of the estimation issues in dynamic panel data models are not widely known or appreciated in marketing applications.

Problems with OLS, Within, and Random-effects Estimators in Dynamic Panel Data Models

When a lagged dependent variable enters the model with unobserved indi-
vidual effects, standard OLS, within, and random-effects estimators are
not appropriate, as we describe below.

OLS
The OLS estimator generates biased and inconsistent estimates of model
5.5. The intuition is straightforward. Consider the OLS estimation of
model 5.5:

$y_{it} = \alpha_0 + \phi y_{it-1} + \beta x_{it} + \alpha_i + \varepsilon_{it}$  (5.6)

Both $y_t$ and $y_{t-1}$ depend on $\alpha_i$. This means that the lagged dependent variable $y_{t-1}$ and $\alpha_i$, which is part of the composite OLS error $(\alpha_i + \varepsilon_{it})$, are correlated. As such, the exogeneity assumption is violated and the estimate of $\phi$, as well as the estimates for the other explanatory variables correlated with the regressor $y_{t-1}$, are biased. Hsiao (2014, 86) formally derives the bias of the OLS estimator of $\phi$ in a simple autoregressive model with fixed effects and reports that OLS tends to overestimate the magnitude of the autoregressive coefficient. A higher variance of the individual-specific effects, $\sigma^2_\alpha$, increases the magnitude of the bias.

Trognon (1978) provides OLS bias formulas for a dynamic panel data model with exogenous regressors and for an autoregressive process of order p. Adding exogenous explanatory variables does somewhat reduce the magnitude, but does not alter the direction, of the bias in $\phi$: in the first-order autoregressive model with exogenous regressors, the OLS estimate of $\phi$ remains biased upward and the effects of the exogenous factors are underestimated (their estimates are biased toward zero). The direction of the asymptotic bias for higher-order autoregressive models is difficult to postulate a priori.

Within estimator
The within estimator is not appropriate for dynamic panel data models with individual-specific effects either. The within transformation of the data in dynamic panel data models leads to biased estimates. If we apply the within estimator to model 5.5, we would regress $(y_{it} - \bar{y}_i)$ on $(y_{it-1} - \bar{y}_{i,-1})$ and $(x_{it} - \bar{x}_i)$:

$y_{it} - \bar{y}_i = (\alpha_i - \alpha_i) + \phi(y_{it-1} - \bar{y}_{i,-1}) + \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) = \phi(y_{it-1} - \bar{y}_{i,-1}) + \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$  (5.7)

This regression has an error term equal to $(\varepsilon_{it} - \bar{\varepsilon}_i)$. By construction, $y_{it}$ is a function of $\varepsilon_{it}$, and $y_{it-1}$ is a function of $\varepsilon_{it-1}$. But $\varepsilon_{it-1}$ enters the calculation of the mean of the errors ($\bar{\varepsilon}_i$) and, as such, the lagged mean-differenced dependent variable regressor $(y_{it-1} - \bar{y}_{i,-1})$ is correlated with the mean-differenced error term $(\varepsilon_{it} - \bar{\varepsilon}_i)$. Specifically, $y_{it-1}$ and $\bar{\varepsilon}_i$ are correlated because they share a common component ($\varepsilon_{it-1}$). This correlation of the lagged mean-differenced dependent variable with the mean-differenced error term gives rise to the dynamic panel bias (Nickell 1981).
Nickell (1981, 1422) derives the general expression for the within estimator bias in dynamic panels. For arbitrary $T$ and $\phi$, the bias equals

$\text{plim}_{N \to \infty}(\hat{\phi} - \phi) = \frac{-(1+\phi)}{T-1} \left\{ 1 - \frac{1}{T} \frac{1-\phi^T}{1-\phi} \right\} \left\{ 1 - \frac{2\phi}{(1-\phi)(T-1)} \left( 1 - \frac{1-\phi^T}{T(1-\phi)} \right) \right\}^{-1}$

The magnitude of the bias can be significant. For example, when the true value of $\phi = 0.5$ and $T = 10$, the bias is approximately –0.167. This implies a 33.4 percent deviation from the true value (i.e., –0.167/0.5). As long as $\phi$ is positive, the sign of the bias is always negative and the within estimator underestimates the magnitude of $\phi$.

The severity of the bias for the within estimator is greater for shorter panels. The bias diminishes for longer time series because, as $T \to \infty$, the contribution of $\varepsilon_{it-1}$ to $\bar{\varepsilon}_i$ decreases and $(y_{it-1} - \bar{y}_{i,-1})$ becomes asymptotically uncorrelated with $(\varepsilon_{it} - \bar{\varepsilon}_i)$, reducing the dynamic panel bias of the mean-difference (i.e., within) estimator. For large $T$, the asymptotic bias is approximated by

$\text{plim}_{N \to \infty}(\hat{\phi} - \phi) \approx \frac{-(1+\phi)}{T-1}$

Random effects
A random-effects specification is generally not appropriate in dynamic
panel data models because the assumption of no correlation between the
unobservable factors μi and the explanatory factors is violated. The logic
is straightforward. If we add a lagged dependent variable to the set of
explanatory variables in a random-effects model (5.1), we obtain the fol-
lowing model:

$y_{it} = \alpha_0 + \phi y_{it-1} + \beta x_{it} + \mu_i + \varepsilon_{it}$  (5.8)

In the random-effects model the random intercept ($\mu_i$) is assumed to be independent of all other variables on the right-hand side; $\mu_i$ represents the combined effect on $y_{it}$ of all unobserved variables that are constant over time. Because model 5.8 applies at all time points, $\mu_i$ also has a direct effect on $y_{it-1}$:

$y_{it-1} = \alpha_0 + \phi y_{it-2} + \beta x_{it-1} + \mu_i + \varepsilon_{it-1}$  (5.9)

That is, $y_{it-1}$ is not statistically independent of $\mu_i$, which is a component of the composite error in Equation 5.8 above. This violation of the zero-correlation assumption in the random-effects model biases both the coefficient of the lagged dependent variable $y_{it-1}$ and the coefficients of all other explanatory variables $x_{it}$ correlated with $y_{it-1}$.
For a summary discussion of the required assumptions about the initial
conditions, and the resulting consistency/inconsistency of the maximum
likelihood (MLE), generalized least-squares (GLS), instrumental variables
(IV), and generalized method of moments (GMM) estimators in models
with individual effects, see Hsiao (2014). Different assumptions about
initial conditions (Hsiao 2014, 87, outlines four different cases and six
subcases) imply different likelihood functions and generate different
results. It is often not possible to make an informed choice regarding the
initial conditions, and an incorrect choice results in inconsistent estimates.
Anderson and Hsiao (1981) proposed a simple consistent estimator that
is independent of initial conditions, and it became the foundation for
the development of a set of consistent estimators preferred in empirical
applications with dynamic panel data models.

Consistent Instrumental Variable-based Estimation of Dynamic Panel Data Models with Individual-specific Effects

The first-difference instrumental variable-based estimator developed by Anderson and Hsiao (1981) and its extensions (e.g., Arellano and Bond 1991) became dominant for estimating dynamic panel data models with individual effects.
Consider the first-difference transformation of Equation 5.5:

$y_{it} - y_{it-1} = \phi(y_{it-1} - y_{it-2}) + \beta(x_{it} - x_{it-1}) + (\varepsilon_{it} - \varepsilon_{it-1})$  (5.10)

By construction, $y_{it-1}$ is correlated with $\varepsilon_{it-1}$ and $\hat{\phi}$ is biased. As such, an instrument Z is required for the regressor $(y_{it-1} - y_{it-2})$. An instrumental variable candidate should exhibit the properties of relevance (i.e., $\text{cov}(Z, y_{it-1} - y_{it-2}) \neq 0$) and validity (i.e., $\text{cov}(Z, \varepsilon_{it} - \varepsilon_{it-1}) = 0$). Anderson and Hsiao (1981) pointed out that $y_{it-2}$ is a valid instrument for $(y_{it-1} - y_{it-2})$ because it is not correlated with $\varepsilon_{it-1}$. The estimation can be carried out in a two-stage least squares (2SLS) procedure:

Step 1: Regress $(y_{it-1} - y_{it-2})$ on $y_{it-2}$ and obtain the predicted values $\widehat{\Delta y}_{it-1}$. Since $y_{it-2}$ is a valid instrument, $\widehat{\Delta y}_{it-1}$ is the portion of $(y_{it-1} - y_{it-2})$ uncorrelated with $\varepsilon_{it-1}$.

Step 2: Regress $(y_{it} - y_{it-1})$ on $\widehat{\Delta y}_{it-1}$ and $(x_{it} - x_{it-1})$. The resulting estimates $\hat{\phi}$ and $\hat{\beta}$ are consistent.
Other valid instruments also exist. For example, $(y_{it-2} - y_{it-3})$ is also a valid instrument for $(y_{it-1} - y_{it-2})$. Using $(y_{it-2} - y_{it-3})$ rather than $y_{it-2}$, however, requires an additional time period of data and leaves the researcher with N fewer observations in the final estimation step. The strength of a particular instrumental variable is an empirical question and can be examined in the first stage of the 2SLS estimation. The Anderson-Hsiao estimator is implemented in Stata with the xtivreg, fd command.
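A minimal sketch of the Anderson-Hsiao estimator, again with placeholder names, instruments the lagged dependent variable with its second lag in the first-differenced equation:

  xtset firm year
  * Anderson-Hsiao: first-difference IV estimation;
  * L.y is instrumented by the level L2.y, per the discussion above
  xtivreg y x (L.y = L2.y), fd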
Extending this logic of Anderson and Hsiao (1981) further, any level or difference of $y_{it}$, appropriately lagged, is a valid instrumental variable for $(y_{it-1} - y_{it-2})$. The pool of such potential instrumental variables grows with increasing T. Certain optimal combinations of instrumental variables might deliver more efficient estimates. Identification of this optimal combination is at the core of the Arellano and Bond (1991) estimator.

The Arellano-Bond GMM estimator specifies a system of equations (one equation per time period) and allows the instruments to differ for each equation (e.g., additional lags are available as instruments in later periods). As we have many instruments and only one variable that requires instrumentation, $(y_{it-1} - y_{it-2})$, the system will be overidentified, calling for the use of the Generalized Method of Moments (GMM).
The method of moments estimator uses moment conditions of the type

$E[Z_{1it}'(\Delta y_{it} - \phi \Delta y_{it-1} - \beta \Delta x_{it})] = 0,$

which reflect the validity of a particular instrument: $Z_{1it} \perp (\varepsilon_{it} - \varepsilon_{it-1})$. The sample analogues of these moment conditions are $\frac{1}{N}\sum_{i=1}^{N} Z_{1it}' \Delta \hat{\varepsilon}_{it} = 0$. The goal of the method of moments estimator is to find values of $\beta$ and $\phi$ such that the sample moment conditions are satisfied. If the system is overidentified (i.e., there are more instruments than variables that require instrumentation), it is often impossible to find values of $\beta$ and $\phi$ that strictly satisfy all orthogonality conditions. Instead, the idea underlying the GMM approach is to find the $\beta$ and $\phi$ that minimize a certain (loss) function of all sample moment conditions. Such an objective function often takes the form

$J(\beta, \phi) = g(\beta, \phi)' \, W \, g(\beta, \phi)$

Here, $g(\beta, \phi)$ is the $l \times 1$ vector of $l$ stacked sample moment conditions, $l$ is the number of instruments, and $W$ is an $l \times l$ weighting matrix. As long as $W$ is positive-definite, the GMM estimates of $\beta$ and $\phi$ will be consistent (Wooldridge 2002, 422). However, certain choices of $W$ can also deliver efficient estimates of $\beta$ and $\phi$. The optimal weight corresponding to a specific moment condition is typically inversely proportional to the variance of that moment condition.

The Arellano-Bond (1991) estimator is defined as

$\hat{\beta}_{AB} = \left[ \left( \sum_{i=1}^{N} \tilde{X}_i' Z_i \right) W_N \left( \sum_{i=1}^{N} Z_i' \tilde{X}_i \right) \right]^{-1} \left( \sum_{i=1}^{N} \tilde{X}_i' Z_i \right) W_N \left( \sum_{i=1}^{N} Z_i' \tilde{y}_i \right)$

$\tilde{X}_i$ is the matrix of regressors in which row $t$ is $[\Delta y_{it-1}, \Delta x_{it}']$ ($t = 3, \ldots, T$), $\tilde{y}_i$ is the vector of the dependent variable with $\Delta y_{it}$ in row $t$, and $Z_i$ is the matrix of instruments:

$Z_i = \begin{bmatrix} z_{i3}' & 0 & \cdots & 0 \\ 0 & z_{i4}' & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & z_{iT}' \end{bmatrix}$

The element $z_{it}$ of $Z_i$ is $[y_{it-2}, y_{it-3}, \ldots, y_{i1}, \Delta x_{it}']$, and the number of rows of $Z_i$ equals $T - 2$. For example, if $T = 5$,

$Z_i = \begin{bmatrix} y_{i1} & \Delta x_{i3} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & y_{i2} & y_{i1} & \Delta x_{i4} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & y_{i3} & y_{i2} & y_{i1} & \Delta x_{i5} \end{bmatrix}$

The intuition underlying this structure is as follows. Suppose we observe


five years of panel data, 2011 to 2015. For 2011 and 2012 we do not have valid
instruments (e.g., we do not observe years 2009 and 2010). Thus, only years
2013–2015 will enter the Arellano-Bond estimation procedure. We have only
one valid instrument for 2013 coming from 2011. For 2014 we have two valid
instruments from 2011 and 2012. For 2015 we have three valid instruments.
The Arellano-Bond GMM-based estimator utilizes information more efficiently (compared to Anderson and Hsiao 1981), especially for longer panels, as the pool of available instruments grows in T. When T is large, a researcher might
wish to limit the maximum number of lags of an instrument. The Arellano-
Bond (1991) estimator is implemented in Stata with the xtabond routine.
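A minimal sketch with the same hypothetical variables; maxldep() caps the number of lagged-dependent-variable instruments, as discussed above:

    * Difference GMM with one lag of the dependent variable and
    * at most three lags of y used as instruments
    xtabond y x, lags(1) maxldep(3)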
One weakness of the Arellano-Bond (“Difference GMM”) estimator
is that lagged levels sometimes can be rather weak instruments for the
first-differenced variables. The problem is particularly pronounced when
the variables exhibit high autocorrelation (e.g., random walk). Arellano
and Bover (1995) and Blundell and Bond (1998) developed the so-called
System GMM estimator, which incorporates lagged differences, along
with lagged levels of $y_{it}$, into the matrix of instruments $Z_i$. Incorporating the additional information contained in lagged $\Delta y_{it}$ further increases the efficiency of the estimator. The Blundell and Bond (1998) estimator is
implemented in Stata with xtdpd.
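A corresponding sketch: xtdpdsys is the convenience command for the Blundell-Bond System GMM estimator, while xtdpd allows fully customized instrument sets (variable names hypothetical):

    * System GMM: lagged differences also instrument the levels equation
    xtdpdsys y x, lags(1) twostep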
Lags of independent variables as instruments are consistent under the


assumption that idiosyncratic errors eit are not serially correlated. This
assumption is testable through the Arellano-Bond (1991) test for serial
correlation in errors. If eit are iid, then ∆eit exhibit negative first-order
serial correlation and zero serial correlation at higher orders. That is,
when the null hypothesis of no serial correlation is rejected at order 1,
but is not rejected at higher orders, the validity of Arellano-Bond instru-
ments is supported. The test is implemented in Stata with estat abond
command which should be run after xtabond (or xtdpd in case of system
GMM estimation). The Sargan/Hansen test of overidentifying restrictions
(Sargan 1958, Hansen 1982) assesses the joint validity of instruments in
a given model. The user-written xtabond2 command reports Sargan and Hansen statistics separately after model estimation. Roodman (2009) offers a discussion of the tests and their interpretation.
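Continuing the earlier xtabond sketch, a minimal version of this testing sequence (hypothetical variables):

    xtabond y x, lags(1)
    estat abond      // expect rejection at order 1, no rejection at order 2
    estat sargan     // joint test of the overidentifying restrictions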
The two-step Arellano-Bond estimation has been shown to generate
downward biased standard errors (the one-step implementation does
not have this issue). Arellano and Bond found that “the estimator of the
asymptotic standard errors of GMM2 shows a downward bias of around
20 percent relative to the finite-sample standard deviations” (1991, 285).
The Windmeijer (2005) finite sample correction resolves the issue. It is
available in Stata with the xtabond, twostep vce(robust) command syntax.

SPECIFICATION TESTING

How can a researcher choose an appropriate model specification and esti-


mator for the data at hand? Hausman (1978) suggested a specification test
designed to assist researchers in choosing between potential alternative
estimators. The test relies on the observation that two consistent estimates
will not differ systematically. The Hausman specification test can be used
to determine the possible presence of the different types of unobservable
factors and their correlation with the explanatory factors. The hypothesis
of no time-invariant effects, for example, can be assessed by comparing
the estimates of the fixed-effects model with the random-effects model.
Similarly, the fixed-effects estimator can be compared to the fixed-effects/
instrumental variable estimator to test for the presence of contempora-
neous shocks correlated with the error term and the fixed-effects/instru-
mental variable model can be compared to the fixed-effects/instrumental
variable/serial correlation model to test for the presence of an autocorre-
lated error term. In the discussion below we use the test for random versus
fixed-effects specification as an illustration.
The following logic underlies the Hausman specification test. Fixed-
effects estimates are assumed to be consistent whether the assumption of


Table 5.1  Hausman test for fixed-effects vs. random-effects specification

                                  FE estimator   RE estimator               Implication
H0: cov(α_i, x_it) = 0            Consistent     Consistent and efficient   RE model preferred
H1: cov(α_i, x_it) ≠ 0            Consistent     Inconsistent               FE model preferred

$cov(\alpha_i, x_{it}) = 0$ holds or not, because they directly account for time-invariant individual-specific unobserved heterogeneity. The random-effects
model estimates are consistent and efficient (i.e., minimum variance)
under the null hypothesis that the fixed effects and the contemporaneous
shocks are uncorrelated with the explanatory factors. However, under the
alternative hypothesis of omitted fixed effects correlated with the explana-
tory factors included in the model, the random-effects estimates will be
biased and inconsistent (see Table 5.1).
Under the null hypothesis of the time-invariant individual-specific effects $\alpha_i$ being uncorrelated with the explanatory factors $x_{it}$ (i.e., $cov(\alpha_i, x_{it}) = 0$),
the estimates from a random-effects model should not differ significantly
from the estimates obtained from a fixed-effects model. If a statistically
significant discrepancy between random-effects and fixed-effects model
estimates is not detected, the finding is interpreted as evidence in favor of
the assumption that individual effects are (approximately) uncorrelated
with the regressors. In such a case, random-effects estimates are consist-
ent and the random-effects model is preferred to fixed-effects models
because the random-effects estimates are efficient and the coefficients
on time-invariant regressors can be identified. However, if a significant
discrepancy between random-effects and fixed-effects model estimates is
found, random-effects estimates are deemed inconsistent and the fixed-
effects model is preferred.
The Hausman test statistic can be computed as:

$H = \frac{(\hat{\beta}_{FE} - \hat{\beta}_{RE})^2}{Var(\hat{\beta}_{FE}) - Var(\hat{\beta}_{RE})}$

Under the null hypothesis, $H$ follows a $\chi^2_M$ distribution, where $M$ is the dimensionality of the coefficient vector. The test can be performed for the
whole set of coefficients on time-varying regressors (time-invariant regres-
sors are not identified in the fixed-effects model) or for a subset of the
coefficients of interest. In Stata, this test is implemented with the hausman
command.
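A minimal sketch of the test for the fixed- versus random-effects choice (hypothetical variables):

    xtreg y x, fe
    estimates store fe
    xtreg y x, re
    estimates store re
    hausman fe re    // rejection of H0 favors the fixed-effects model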
Before interpreting the Hausman test and using it to choose between


estimators, however, it is important to understand the underlying assump-


tions and limitations of this test.

Assumption of Consistency of $\hat{\beta}_{FE}$ under Both the Null and the Alternative Hypotheses

The Hausman test relies on the assumption that the fixed-effects estimator $\hat{\beta}_{FE}$ is consistent. That is, it assumes that there is no correlation between $x_{it}$ and $e_{it}$ in any time period once fixed effects are accounted for. This assumption can be violated. For example, it is violated if relevant variables are omitted or if the unobserved heterogeneity in the model is time-varying ($\alpha_{it}$). In this case, the fixed-effects estimator is not consistent and cannot serve as an appropriate benchmark in the Hausman test. Under time-varying unobserved heterogeneity, neither fixed-effects nor random-effects estimators are appropriate, and the Hausman test would not indicate that.
In the classic interpretation of the Hausman test, the difference between
the random-effects and fixed-effects model estimates is attributed to
a single issue, namely, the correlation between the unobserved fixed
effects and the explanatory factors. Often, in empirical applications the
discrepancy between the fixed-effects and random-effects estimators can
be driven by other factors.
For example, when the right-hand-side variables are subject to measure-
ment error, a fixed-effects estimator can be subject to a greater attenuation
bias compared to a corresponding cross-section estimate. The fixed-effects
estimator removes all cross-sectional variation in the data, which is good
because it removes the biases due to unobserved individual heterogene-
ity. However, it also removes useful information about the variables of
interest. Depending on the characteristics of particular data, the change
in the signal-to-noise ratio as a result of applying a fixed-effects estimator
is ambiguous, and in many cases is disadvantageous. When measurement
error is present, a researcher undertaking a Hausman test might find that
fixed-effects estimates are lower in absolute magnitude compared to the
alternative random-effects or OLS estimates. The difference might be due
to the unobserved heterogeneity biases in random-effects and OLS, or
it can be due to the attenuation bias exacerbated by the differencing of
the data in the fixed-effects estimation. In such a case, rather than relying
on the Hausman test to choose between fixed-effects and random-effects
estimators, a researcher should undertake steps to investigate and tackle
the potential measurement error problem (e.g., through IV methods).


Assumption of Efficiency for the Random-effects Estimator

A fundamental assumption of the Hausman test for the random-effects


estimator is that individual effects are distributed independently of the
idiosyncratic error and regressors. The assumption of efficiency is violated
when the data are clustered. In empirical applications where cluster-robust
standard errors are preferred over classical errors, a robust Hausman test
procedure might be required (Cameron and Trivedi 2009, 261). Such a
situation might occur, for example, if there are no distinct individual fixed
effects, but rather the errors uit for a given panelist i exhibit significant
autocorrelation.
Cameron and Trivedi (2009) suggest the following procedure for a robust Hausman test. Test $H_0: \gamma = 0$ in the following regression:

$(y_{it} - \hat{\theta} \bar{y}_i) = (1 - \hat{\theta}) \alpha + (x_{it} - \hat{\theta} \bar{x}_i) \beta + (x_{it} - \bar{x}_i) \gamma + v_{it},$

where $x_{it}$ refers to the time-varying regressors and $\hat{\theta}$ is an estimate of $\theta = 1 - \sqrt{\sigma_e^2 / (T \sigma_\alpha^2 + \sigma_e^2)}$, the relative proportion of how much between versus within variation is used by the random-effects estimator ($\theta = 0$ corresponds to a pooled OLS estimate, $\theta = 1$ corresponds to a fixed-effects estimate, i.e., within variation only). $\hat{\theta}$ can be estimated beforehand using random-effects estimation (e.g., it is part of the standard output of the xtreg, re command in Stata). The interpretation of rejecting $H_0: \gamma = 0$ is similar to that in the classic Hausman test.
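A sketch of this auxiliary regression, assuming a balanced panel with T = 10 periods and hypothetical variables; theta is computed from the variance components stored by xtreg, re:

    quietly xtreg y x, re
    scalar theta = 1 - sqrt(e(sigma_e)^2/(10*e(sigma_u)^2 + e(sigma_e)^2))
    bysort id: egen ybar = mean(y)
    bysort id: egen xbar = mean(x)
    gen yq = y - theta*ybar             // quasi-demeaned outcome
    gen xq = x - theta*xbar             // quasi-demeaned regressor
    gen xd = x - xbar                   // within deviation of the regressor
    regress yq xq xd, vce(cluster id)
    test xd                             // robust test of H0: gamma = 0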

“All or Nothing” Assumption Regarding Exogeneity in the Model

The null and the alternative hypotheses in the Hausman test refer to
extreme cases where either all covariates are exogenous (i.e., the random-
effects estimator is appropriate), or none of the regressors are exogenous
(a fixed-effects model is required). Baltagi (2005, 19) notes that one should
probably not immediately proceed with fixed-effects estimation if the
classic Hausman test rejects H0. Instead, he advises researchers to explore models that allow for only some regressors to be correlated with the fixed effects $\alpha_i$, while still maintaining the assumption that all regressors $x_{it}$ are uncorrelated with the idiosyncratic shocks $e_{it}$.
Hausman and Taylor (1981) developed an estimator which allows
some of the regressors in the set xit to be correlated with ai. The Hausman
and Taylor (HT) estimator is an instrumental variable-based estimator
(implemented in Stata with command xthtaylor). It combines the elements
of both fixed-effects and random-effects estimators and offers a range of
benefits. The HT procedure gives researchers additional flexibility: when it


is appropriate, it delivers consistent estimates that are more efficient than


fixed-effects and it allows for identification of time-invariant regressors.
As such, it generates better estimates than either the random-effects or the
fixed-effects estimators.
Baltagi (2005, 132) suggests the following sequence of steps in applying
the HT pre-test estimator:

Step 1: If H0 of the standard Hausman test (fixed-effects vs. random-


effects) is not rejected, a random-effects model should be chosen.

Step 2: If H0 of the standard Hausman test is rejected, HT estimation is


implemented, and another Hausman test (fixed-effects vs HT) is run.

If H0 of the second Hausman test is not rejected (no systematic difference between fixed-effects and HT estimates), the HT model should be used. If H0 of the second Hausman test is rejected, a fixed-effects model should be used.

Power Issues

The Hausman test is a statistical test derived under large-sample properties. The denominator of the Hausman statistic relies on the asymptotic variances of the coefficient estimates. The betas are assumed to be normally distributed with means $\hat{\beta}_{FE}$ and $\hat{\beta}_{RE}$ and asymptotic variances $Var(\hat{\beta}_{FE})$ and $Var(\hat{\beta}_{RE})$. The Hausman test computed for small samples should be viewed with additional caution because $Var(\hat{\beta}_{FE})$ and $Var(\hat{\beta}_{RE})$ calculated based on small samples can be far from their asymptotic counterparts.

MEASUREMENT ERROR IN PANEL DATA MODELS

Measurement error is a well-known problem in the empirical literature. Its


consequences can be more severe in a panel data setting.
The error-in-variables problem typically refers to measurement error
in the independent variables. An immediate consequence of the error-
in-variables problem is the so-called attenuation bias in the estimated
coefficient of interest. That is, a bias toward zero. Measurement error
in the dependent variable has less severe consequences. It causes loss of
efficiency, but it does not cause bias in the estimates. In the discussion that
follows we focus on the measurement error in the independent variables
and potential solutions for obtaining consistent estimates.


Errors in Variables in Cross-sectional Settings

To introduce the problem, let us begin with a simple cross-sectional illustration of measurement error in the independent variable. Consider the following model:

$y_i = \alpha_0 + \beta x_i + e_i$  (5.11)

We are interested in estimating $\beta$, which measures the relationship between $x_i$ and $y_i$. However, we can only observe $x_i^*$, which is our measure of $x_i$ combined with a classical measurement error $\nu_i$ ($x_i^* = x_i + \nu_i$). That is, $\nu_i$ is iid noise with a mean of zero and variance $\sigma_\nu^2$ and is uncorrelated with $x_i$ and $e_i$. Because $cov(x_i, \nu_i) = 0$ and because $x_i^* = x_i + \nu_i$, it follows that our observed measure $x_i^*$ is correlated with $\nu_i$. The magnitude of their covariance is equal to the variance of the measurement error $\nu_i$: $cov(x_i^*, \nu_i) = E(x_i^* \nu_i) = E(x_i \nu_i) + E(\nu_i^2) = \sigma_\nu^2$.

The covariance between our observed measure $x_i^*$ and the measurement error $\nu_i$ causes a non-zero correlation between the regressor and the composite error in the model:

$y_i = \alpha_0 + \beta (x_i^* - \nu_i) + e_i = \alpha_0 + \beta x_i^* + (e_i - \beta \nu_i)$

Because $cov(x_i, \nu_i) = 0$, $var(x_i^*) = var(x_i) + var(\nu_i) = \sigma_x^2 + \sigma_\nu^2$, and $cov(x_i^*, e_i - \beta \nu_i) = -\beta \, cov(x_i^*, \nu_i) = -\beta \sigma_\nu^2$, we can derive the OLS estimator as:

$plim(\hat{\beta}) = \beta + \frac{cov(x_i^*, e_i - \beta \nu_i)}{var(x_i^*)} = \beta - \frac{\beta \sigma_\nu^2}{\sigma_x^2 + \sigma_\nu^2} = \beta \left( \frac{\sigma_x^2}{\sigma_x^2 + \sigma_\nu^2} \right)$  (5.12)

Unless $\sigma_\nu^2 = 0$, the multiplier term $\sigma_x^2 / (\sigma_x^2 + \sigma_\nu^2)$ is always less than 1 and $\hat{\beta}$ is inconsistent and biased toward zero. This result is known as attenuation bias. The magnitude of the bias depends on the signal-to-noise ratio: the greater the variance of the measurement error (noise) relative to the variance of the true regressor $x_i$ (signal), the greater the magnitude of the bias.
bias.
Inclusion of additional regressors into model 5.11 increases the magnitude of the attenuation bias, and the bias spreads to the additional regressors. See the section on bias spreading in multivariate models below for a discussion of measurement error bias in a multivariate setting.
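The attenuation result in (5.12) is easy to verify in a small simulation; the sketch below uses arbitrary values ($\beta = 0.5$ and $\sigma_x^2 = \sigma_\nu^2 = 1$, so the plim of the naive estimate is 0.25):

    clear
    set seed 12345
    set obs 10000
    gen x = rnormal()                 // true regressor, variance 1
    gen y = 1 + 0.5*x + rnormal()     // true beta = 0.5
    gen xstar = x + rnormal()         // observed regressor, error variance 1
    regress y x                       // recovers roughly 0.5
    regress y xstar                   // attenuated toward 0.5*(1/2) = 0.25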


Errors in Variables in Static Panel Data Models

Measurement error can be significant in the cross-sectional setting, but


in the panel data setting, the attenuation bias due to measurement error
can become even more severe, particularly when the researcher utilizes the
mean-difference or the first-difference panel data estimators to control for
time-invariant individual-specific fixed effects $\alpha_i$. Under strict exogeneity
in the classical errors-in-variables model, differencing removes the omitted
variable (fixed effects) bias but exacerbates measurement error bias. The
intuition behind this phenomenon is straightforward: while eliminating
the effect of ai , the within and the first-differencing estimators remove a
large portion of variation in the data, both the noise and the signal. For a
wide variety of data generating processes underlying the $x_{it}$ and $\nu_{it}$ series,
the signal-to-noise ratio decreases when the within or the first-difference
estimators are applied, making the attenuation bias in the estimates more
pronounced. The measurement error and the resulting attenuation bias
may be responsible for the within and the first-difference estimators gener-
ating small and insignificant estimates in many empirical settings (Angrist
and Pischke 2008).

Measurement error bias in OLS and first-difference estimators in static panels
Let us consider the following static panel data model with measurement error in the independent variable:

$y_{it} = \alpha_i + \beta x_{it} + e_{it}$  (5.13)

Here, $x_{it}$ is the true regressor of interest, and $x_{it}^*$ is its observed value, which is measured with measurement error $\nu_{it}$, where $x_{it}^* = x_{it} + \nu_{it}$. For generality, let us allow the $x_{it}$ series to be autocorrelated with autocorrelation parameter $\gamma_x$ ($\gamma_x < 1$) and the measurement error $\nu_{it}$ series to be autocorrelated with autocorrelation parameter $\gamma_\nu$ ($\gamma_\nu < 1$), such that $cov(\nu_{it}, \nu_{it-1}) = \gamma_\nu \sigma_\nu^2$, where $Var(\nu_{it}) = \sigma_\nu^2$. Further, let us assume that the measurement error $\nu_{it}$ is not correlated with the true regressor $x_{it}$, the unobserved individual effect $\alpha_i$, or the idiosyncratic error $e_{it}$. Estimating model 5.13 by OLS yields the following probability limit for the estimate $\hat{\beta}_{OLS}$:

$plim_{N \to \infty} \, \hat{\beta}_{OLS} = \beta \frac{\sigma_x^2}{\sigma_\nu^2 + \sigma_x^2} + \frac{Cov(x_{it}, \alpha_i)}{\sigma_\nu^2 + \sigma_x^2}$  (5.14)

The total bias of $\hat{\beta}_{OLS}$ consists of two components. The first term, with the multiplier $\sigma_x^2 / (\sigma_\nu^2 + \sigma_x^2)$, is the familiar attenuation bias caused by the presence

of the measurement error. The second term, $Cov(x_{it}, \alpha_i) / (\sigma_\nu^2 + \sigma_x^2)$, is the omitted variable bias caused by the failure to account for the individual heterogeneity.
Individual-specific heterogeneity effects $\alpha_i$ can be eliminated from model 5.13 through first-differencing and estimating the model $\Delta y_{it} = \beta \Delta x_{it} + \Delta e_{it}$. In this formulation, the probability limit of $\hat{\beta}$ can be derived similarly to that in equation 5.12 as:

$plim(\hat{\beta}) = \beta \left( \frac{\sigma_{\Delta x}^2}{\sigma_{\Delta x}^2 + \sigma_{\Delta \nu}^2} \right),$ where

$\sigma_{\Delta x}^2 = Var(x_{it} - x_{it-1}) = Var(x_{it}) - 2 cov(x_{it}, x_{it-1}) + Var(x_{it-1})$

Assuming that $x_{it}$ is stationary means that the moments of the $x_{it}$ distribution are the same for any $t$. In particular, $Var(x_{it}) = Var(x_{it-1})$. Then, $\sigma_{\Delta x}^2 = 2\sigma_x^2 - 2 cov(x_{it}, x_{it-1}) = 2\sigma_x^2 (1 - \gamma_x)$. If $\nu_{it}$ is stationary as well, then $\sigma_{\Delta \nu}^2 = 2\sigma_\nu^2 (1 - \gamma_\nu)$. Hence, the probability limit of the first-difference estimate under measurement error (Pischke 2007) is

$plim_{N \to \infty} \, \hat{\beta}_{FD} = \beta \frac{\sigma_x^2 (1 - \gamma_x)}{\sigma_x^2 (1 - \gamma_x) + \sigma_\nu^2 (1 - \gamma_\nu)}$  (5.15)

We can compare the magnitude of the bias in the OLS (equation 5.14) and first-difference (equation 5.15) estimates. If there is no measurement error ($\sigma_\nu^2 = 0$), the first-difference estimate is unbiased while OLS is biased because it fails to account for individual heterogeneity. If $\sigma_\nu^2 > 0$, both estimators are subject to attenuation bias, and the relative size of the biases depends on $\gamma_\nu$ and $\gamma_x$, the degree of autocorrelation in the measurement error and the explanatory variable, respectively. If $x_{it}$ is more strongly autocorrelated than the measurement error $\nu_{it}$ (i.e., $\gamma_x > \gamma_\nu$), first-differencing $x_{it}$ results in a reduction in the signal-to-noise ratio, making the attenuation bias in $\hat{\beta}_{FD}$ more severe compared to the attenuation bias component in the OLS estimate. When $\nu_{it}$ resembles white noise (no persistence), the attenuation bias of the first-difference estimator is large, especially for higher $\gamma_x$. On the other hand, as the persistence in the measurement error increases ($\gamma_\nu$ goes to 1), the attractiveness of the first-difference estimator increases.

Measurement error bias in mean-difference and first-difference estimators in static panels
Griliches and Hausman (1986) compared attenuation biases of the mean-
difference (the within) and first-difference estimators. Both estimators


Table 5.2  Conditions when the attenuation bias is smaller for the within estimator versus the first-difference estimator, under $s_j = 0$

  T = 2         Biases are identical
  T = 3         $r_2 < r_1$
  T = 4         $\frac{2}{3} r_2 + \frac{1}{3} r_3 < r_1$
  . . .
  General T     $\frac{2}{T} (r_1 + r_2 + \ldots) < r_1$

(Here $r_j$ denotes the j-th order autocorrelation of the true regressor and $s_j$ that of the measurement error.)

Source:  Adapted from Griliches and Hausman 1986, p. 99.

address the individual heterogeneity issue by differencing out $\alpha_i$, but they have different implications for the magnitude of the measurement error bias. Griliches and Hausman (1986) point out that, while the attenuation bias in the first-difference estimator does not depend on the length of the time-series dimension T (as $N \to \infty$), it does for the within estimator, because the mean-differencing transformation for the within estimation is calculated taking into account all periods. As such, the relative advantage of a particular estimator depends on T, $r_j$ (the j-th order autocorrelation coefficient of the true regressor), and $s_j$ (the j-th order autocorrelation coefficient of the measurement error).

Under $s_j = 0$ (for all j), higher $r_j$ results in a larger attenuation bias for the first-difference estimator, since first-differencing removes "more of a signal" in a variable with higher autocorrelation (Griliches and Hausman 1986, 98). The relationship between the biases under the within and the first-difference estimators is summarized in Table 5.2 above.
The condition for the within estimator to be less biased than the first-
difference estimator depends on the decay pattern in the xit correlogram:
the steeper the decline in the autocorrelation function of xit , the greater the
attenuation bias under first-differencing, compared to the bias under the
within estimator.
The intuition of this result generalizes to the case when the measurement error is autocorrelated with coefficients $s_j$. Generally, if $r_j > s_j > 0$ for all j (i.e., the serial correlation is greater in the explanatory variable than in the measurement error) and the decline in the autocorrelation function of $x_{it}$ is steeper than that in the autocorrelation function of $\nu_{it}$, the within estimator is less biased than the first-difference estimator. For the exact conditions under which the within estimator is less biased than the first-difference estimator under correlated errors, see Griliches and Hausman (1986, 101).
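A simulation sketch of this comparison, with arbitrary values chosen so that the true regressor is persistent ($\gamma_x = 0.8$) while the measurement error is white noise; in this configuration the first-difference estimate should be noticeably more attenuated than the within estimate:

    clear
    set seed 12345
    set obs 500
    gen id = _n
    gen alpha = rnormal()                       // fixed effect
    expand 10                                   // T = 10 periods per unit
    bysort id: gen year = _n
    xtset id year
    gen x = rnormal()
    bysort id (year): replace x = 0.8*x[_n-1] + rnormal() if _n > 1
    gen y = alpha + 0.5*x + rnormal()           // true beta = 0.5
    gen xstar = x + rnormal()                   // iid measurement error
    xtreg y xstar, fe                           // within estimate
    regress D.y D.xstar                         // first-difference estimate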


Errors in Variables in Dynamic Panel Data Models

In many empirical settings with measurement error problems, the within estimator may be less biased than the first-difference estimator. However, the within estimator is not appropriate in dynamic panel data models. In dynamic models where measurement error is suspected, the researcher can consider long-difference estimators to assess the problem and reduce the measurement error bias.

If the measurement error is not autocorrelated ($s_j = 0$ for all j), then a long-difference estimator of order $j = T - 1$ (i.e., based on $x_{iT} - x_{i1}$) is optimal (it is also less inconsistent than the within estimator in static models). For differences of orders longer than 1 and shorter than $T - 1$, the situation is more ambiguous, and the outcome depends on T and the speed of the autocorrelation decay of $x_{it}$. If the measurement error is autocorrelated, then the optimal order of the difference estimator (i.e., the differencing of order j that minimizes the attenuation bias in the long-difference estimator) is the one that maximizes the expression $(1 - r_j)/(1 - s_j)$ (Griliches and Hausman 1986, 101). Depending on the data-generating processes underlying $x_{it}$ and $\nu_{it}$, the optimal j might be 1, $T - 1$, or something in between.

Assessing and Managing the Measurement Error Problem in Panel Data Models

To assess the potential presence of measurement error, the researcher


can compare results from the within, the first-difference, and the long-
difference estimators. Under no measurement error in static fixed-effects
models, the estimates should be roughly the same since all three estimators
are consistent as they eliminate the unobserved individual effect $\alpha_i$. If first-
difference estimates are lower in magnitude compared to within estimates,
and the discrepancy in magnitude dissipates/reverses when longer differ-
ences are used, this pattern might indicate the presence of a measurement
error. Similarly, in dynamic models, an increase in the magnitude of the
estimates between the first-difference and long-difference estimators may
indicate the presence of measurement error.
Dealing with the measurement error problem in panel data models typically requires finding instruments. First, one can look for external instruments that are correlated with the true underlying variable $x_{it}$ but uncorrelated with the measurement error $\nu_{it}$. Such instruments are often difficult to find. Second, depending on the statistical properties of $x_{it}$ and $\nu_{it}$, one might be able to use certain lags/leads of the observed variable $x_{it}^*$ as instruments. In particular, if $\nu_{it}$ is iid and $x_{it}$ is serially correlated, one could potentially use $x_{it-2}^*$ and/or $\Delta x_{it-2}^*$ to instrument for $\Delta x_{it}$ in first-difference estimation

(Hsiao 2014, 456). In general, if $\nu_{it}$ is known/assumed to exhibit a certain structure, a consistent IV-based estimation should be available, provided that the panel at hand is long enough. For further reading and applications
of IV-based measurement error treatments in panels, we refer the reader to
Hsiao (2014), Griliches and Hausman (1986), and Biørn (2000).
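A minimal sketch of the second approach, assuming iid measurement error and a serially correlated true regressor (hypothetical variables; the data must be xtset so the lag and difference operators work):

    * Instrument the differenced mismeasured regressor with its second lag in levels
    ivregress 2sls D.y (D.xstar = L2.xstar), vce(cluster id)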

BIAS SPREADING IN MULTIVARIATE MODELS


One common misconception about the violation of the exogeneity (zero
correlation) assumption is that if only one of the variables in the set of
explanatory factors is correlated with the error term, then the other coef-
ficients will still be consistently estimated. This is incorrect. The estimates
for all explanatory variables included in the model will be biased, unless
they are perfectly orthogonal. The bias effectively spreads from the endog-
enous regressor to the other estimates.
To provide a quick intuition for bias spreading, consider the fixed-effects model (5.2) and assume that only one of the variables in the set $x_{it}$, the first one ($x_{1it}$), is correlated with the individual-specific effect $\alpha_i$. That is, $cov(\alpha_i, x_{1it}) \neq 0$. If the researcher chooses to estimate the model $y_{it} = \alpha_0 + \beta x_{it} + u_{it}$ without explicitly addressing the fixed effects $\alpha_i$, we have the situation where $u_{it} = \alpha_i + e_{it}$ and $E(u_{it} \mid x_{1it}) \neq 0$, with $\sigma_{x_1, u} = \sigma_{x_1, \alpha_i} \neq 0$.

Because $\hat{\beta} = \beta + (X'X/N)^{-1} (X'U/N)$, $plim(\hat{\beta}) = \beta$ requires $plim(X'U/N) = 0$. If this does not hold, the estimator is inconsistent. In our case,

$plim \frac{1}{N} X'U = \begin{bmatrix} \sigma_{x_1, \alpha_i} \\ 0 \\ \vdots \\ 0 \end{bmatrix}$

and

$plim(\hat{\beta} - \beta) = plim \, (X'X/N)^{-1} (X'U/N) = plim \left( \frac{X'X}{N} \right)^{-1} \begin{bmatrix} \sigma_{x_1, \alpha_i} \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \sigma_{x_1, \alpha_i} \begin{bmatrix} q^{11} \\ q^{21} \\ \vdots \\ q^{K1} \end{bmatrix} = \sigma_{x_1, \alpha_i} \times [\text{1st column of } Q^{-1}],$

where $Q = plim \frac{1}{N} X'X$. Effectively, the bias is smeared over all other estimates. It affects not only the estimate for $x_1$, but, to the extent $x_1$ is

c­ orrelated with the other explanatory variables, the estimates for the other
explanatory variables are affected as well, even though they are uncorre-
lated with the unobserved time-invariant factor ai .

Endogeneity Bias Spreading in Multivariate Setting

The following illustrates bias spreading from endogenous to exogenous


variables in a two-variable model. Consider the following true model:

$y = x_1 \beta_1 + x_2 \beta_2 + q \gamma + e$  (5.16)

Assume that the regressors $x_1$, $x_2$, and $q$ are uncorrelated with the error term $e$, i.e., $plim \frac{1}{N} q'e = 0$ and $plim \frac{1}{N} x_j'e = 0$ for $j = 1, 2$. Also assume that $x_1$ is uncorrelated with $q$, while $x_2$ is correlated with $q$. That is, $plim \frac{1}{N} x_1'q = 0$ and $plim \frac{1}{N} x_2'q \neq 0$. Further, assume that $q$ is unobserved and is omitted in the estimation. The estimating equation becomes

$y = x_1 \beta_1 + x_2 \beta_2 + \eta,$  (5.17)

where $\eta = q \gamma + e$.
As such, x1 is an exogenous regressor, while x2 is endogenous.
The Frisch–Waugh–Lovell theorem states that coefficients from a
multiple regression can be reconstructed from a series of bivariate
­regressions. Specifically, b1 in the equation (5.17) above can be obtained
by first regressing y on x2 (step 1), then regressing x1 on x2 (step 2), and
finally regressing the residuals from step one on residuals from step two
(step 3).
Let us define the projection matrix $P_2$ and the residual-making matrix $M_2$ (aka the annihilator matrix) as follows:

$P_2 = x_2 (x_2' x_2)^{-1} x_2'$

$M_2 = I - P_2,$

where $I$ is an identity matrix. $P_2$ and $M_2$ are symmetric ($M_2 = M_2'$, $P_2 = P_2'$) and idempotent ($P_2 = P_2 P_2$, $M_2 = M_2 M_2$), and $P_2 x_2 = x_2$, $M_2 x_2 = 0$ by construction (Hayashi 2000, 9).

Applying the projection and annihilator matrices to estimating equation (5.17) yields a representation of $\beta_1$ as a function of residuals from two bivariate regressions. To see this, multiply both sides of (5.17) by $M_2$:

$M_2 y = M_2 x_1 \beta_1 + M_2 x_2 \beta_2 + M_2 \eta$  (5.18)


Because $M_2 x_2 = 0$ ($M_2 x_2 = (I - x_2 (x_2' x_2)^{-1} x_2') x_2 = x_2 - x_2 (x_2' x_2)^{-1} x_2' x_2 = 0$), equation (5.18) becomes

$M_2 y = M_2 x_1 \beta_1 + M_2 \eta$  (5.19)

Redefining $M_2 y = \tilde{y}$, $M_2 x_1 = \tilde{x}$, and $M_2 \eta = \tilde{\eta}$, equation (5.19) can be written as

$\tilde{y} = \tilde{x} \beta_1 + \tilde{\eta}$  (5.20)

Then

$\hat{\beta}_1 = (\tilde{x}' \tilde{x})^{-1} (\tilde{x}' \tilde{y}) = (x_1' M_2' M_2 x_1)^{-1} (x_1' M_2' M_2 y)$  (5.21)

Because $M_2$ is symmetric ($M_2 = M_2'$) and idempotent ($M_2 = M_2 M_2$), (5.21) can be rewritten as:

$\hat{\beta}_1 = (x_1' M_2 x_1)^{-1} (x_1' M_2 y) = (x_1' M_2 x_1)^{-1} \left( x_1' M_2 (x_1 \beta_1 + x_2 \beta_2 + q \gamma + e) \right) =$

$(x_1' M_2 x_1)^{-1} x_1' M_2 x_1 \beta_1 + (x_1' M_2 x_1)^{-1} x_1' M_2 x_2 \beta_2 + (x_1' M_2 x_1)^{-1} x_1' M_2 (q \gamma + e)$  (5.22)

Since $M_2 x_2 = 0$, the second term becomes zero, and (5.22) simplifies to

$\hat{\beta}_1 = \beta_1 + (x_1' M_2 x_1)^{-1} x_1' M_2 (q \gamma + e)$  (5.23)

$(x_1' M_2 x_1)^{-1} x_1' M_2 (q \gamma + e)$ is the "smeared bias" term. To derive the probability limit of this bias, let us first simplify the $x_1' M_2 (q \gamma + e)$ component:

$x_1' M_2 (q \gamma + e) = x_1' (I - x_2 (x_2' x_2)^{-1} x_2') (q \gamma + e) =$

$x_1' (q \gamma + e) - x_1' x_2 (x_2' x_2)^{-1} x_2' q \gamma - x_1' x_2 (x_2' x_2)^{-1} x_2' e = \left( -\frac{cov_{x_1, x_2} cov_{x_2, q}}{V_{x_2}} \gamma \right)$  (5.24)

Employing the exogeneity assumption on $x_1$ (i.e., $plim \frac{1}{N} x_1'e = 0$ and $plim \frac{1}{N} x_1'q = 0$) and the assumption $plim \frac{1}{N} x_2'e = 0$, the terms $x_1' (q \gamma + e)$ and

$x_1' x_2 (x_2' x_2)^{-1} x_2' e$ cancel out. $(x_2' x_2)^{-1} x_2' x_1$ is the OLS estimate from a bivariate regression of $x_1$ on $x_2$, which equals $cov_{x_1, x_2} / V_{x_2}$.

Now, let us rewrite $x_1' M_2 x_1$ as follows:

$x_1' M_2 x_1 = x_1' (I - x_2 (x_2' x_2)^{-1} x_2') x_1 = x_1' x_1 - x_1' x_2 (x_2' x_2)^{-1} x_2' x_1 =$

$V_{x_1} - \frac{cov_{x_1, x_2}^2}{V_{x_2}} = \frac{V_{x_1} V_{x_2} - cov_{x_1, x_2}^2}{V_{x_2}}$  (5.25)

Combining (5.24) and (5.25), the asymptotic bias is equal to

$plim(\hat{\beta}_1 - \beta_1) = \left( \frac{V_{x_1} V_{x_2} - cov_{x_1, x_2}^2}{V_{x_2}} \right)^{-1} \left( -\frac{cov_{x_1, x_2} cov_{x_2, q}}{V_{x_2}} \gamma \right) = -\gamma \frac{cov_{x_1, x_2} cov_{x_2, q}}{V_{x_1} V_{x_2} - cov_{x_1, x_2}^2}$

This expression can be further simplified to aid interpretation. Since $cov_{x_1, x_2} = \rho_{x_1, x_2} \sigma_{x_1} \sigma_{x_2}$ and $cov_{x_2, q} = \rho_{q, x_2} \sigma_q \sigma_{x_2}$ (where $\rho$ is the correlation coefficient):

$plim(\hat{\beta}_1 - \beta_1) = -\gamma \frac{\rho_{x_1, x_2} \sigma_{x_1} \sigma_{x_2} \, \rho_{q, x_2} \sigma_q \sigma_{x_2}}{V_{x_1} V_{x_2} (1 - \rho_{x_1, x_2}^2)} = -\gamma \frac{\rho_{x_1, x_2} \rho_{q, x_2} \sigma_q}{\sigma_{x_1} (1 - \rho_{x_1, x_2}^2)}$  (5.26)

This expression is generally non-zero. Hence, even though $x_1$ is exogenous, the coefficient $\hat{\beta}_1$ will still be biased as long as $x_1$ is correlated with the endogenous regressor $x_2$. The sign of the bias is determined by the signs of $cov_{x_1, x_2}$, $cov_{x_2, q}$, and $\gamma$. The magnitude of the bias is amplified when $x_1$ and $x_2$ are highly correlated. Only in the special case when $x_1$ and $x_2$ are orthogonal ($\rho_{x_1, x_2} = 0$) does the bias equal zero. Strict orthogonality of $x_1$ and $x_2$, however, almost never holds in economic settings.
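A simulation sketch that checks expression (5.26) with arbitrary parameter values; x1 is constructed to be correlated with x2 but uncorrelated with the omitted q, and the formula implies $plim(\hat{\beta}_1 - \beta_1) \approx -0.19$ for these values:

    clear
    set seed 12345
    set obs 100000
    gen q  = rnormal()
    gen u  = rnormal()
    gen x2 = 0.5*q + u                // endogenous: correlated with omitted q
    gen x1 = 0.5*u + rnormal()        // exogenous: correlated with x2, not q
    gen y  = x1 + x2 + q + rnormal()  // beta1 = beta2 = gamma = 1
    regress y x1 x2                   // b1 converges to about 0.81, not 1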

Measurement Error Bias Spreading in Multivariate Setting

Smearing of the bias also occurs in multivariate regression in the case


of measurement error. Consider a two-variable model where one of the
regressors is mismeasured:

$y = \alpha_0 + \beta_1 x_1 + \beta_2 x_2 + e$  (5.27)


where $x_1$ is measured with error and $x_2$ is measured without error. That is, we observe $x_1^* = x_1 + \nu$. If equation (5.27) is estimated by OLS, then both estimates, $\hat{\beta}_1$ and $\hat{\beta}_2$, are biased and inconsistent (Greene 2017):

$plim \, \hat{\beta}_1 = \beta_1 \left( \frac{1}{1 + \sigma_\nu^2 s^{11}} \right)$  (5.28)

$plim \, \hat{\beta}_2 = \beta_2 - \beta_1 \left( \frac{\sigma_\nu^2 s^{12}}{1 + \sigma_\nu^2 s^{11}} \right)$  (5.29)

where $s^{ij}$ is the ij-th element of the inverse of the covariance matrix and $\sigma_\nu^2$ is the variance of the measurement error $\nu$.
$\hat{\beta}_1$ is still subject to attenuation bias as in the bivariate case: the magnitude of the estimate is smaller than the true $\beta_1$. As long as $x_1$ and $x_2$ are correlated, the magnitude of the attenuation bias is greater in the multivariate setting than in the bivariate setting. The intuition for this result is that the additional variable $x_2$ in the regression will serve as a proxy for a part of the signal in the mismeasured regressor $x_1$. As such, the partial correlation between $y$ and $x_1$ will be attenuated even more.

$\hat{\beta}_2$ is biased, and the direction of the bias can be either upward or downward, depending on the sign of $\beta_1$ and the covariance between the two regressors.

CONCLUSION

Panel data allow researchers to design insightful models and control


for the effects of unobservable factors. We advise caution and careful
testing of alternative specifications before selecting models and estima-
tors and suggest steps to avoid common errors in panel data modeling.
Misspecification can lead to significant biases and erroneous conclusions
about the economic effects of marketing or public policy activities.

References
Anderson, Theodore Wilbur and Cheng Hsiao (1981), “Estimation of Dynamic Models
with Error Components,” Journal of the American Statistical Association, 76 (January),
598–606.
Angrist, Joshua D. and Jörn-Steffen Pischke (2008), Mostly Harmless Econometrics: An
Empiricist’s Companion. Princeton, NJ: Princeton University Press.
Arellano, Manuel and Stephen Bond (1991), “Some Tests of Specification for Panel Data:
Monte Carlo Evidence and an Application to Employment Equations,” Review of
Economic Studies, 58(2), 277–297.
Arellano, Manuel and Olympia Bover (1995), “Another look at the instrumental variable
estimation of error-components models,” Journal of Econometrics, 68(1), 29–51.


Baltagi, Badi (2005), Econometric Analysis of Panel Data. New York: John Wiley & Sons.
Biørn, Erik (2000), “Panel Data with Measurement Errors: Instrumental Variables and
GMM Procedures Combining Levels and Differences,” Econometric Reviews, 19(4),
391–424.
Blundell, Richard and Stephen Bond (1998), “Initial Conditions and Moment Restrictions in
Dynamic Panel Data Models,” Journal of Econometrics, 87(1), 115–143.
Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and
Applications. New York: Cambridge University Press.
Cameron, A. Colin and Pravin K. Trivedi (2009), Microeconometrics Using Stata (Vol. 5).
College Station, TX: Stata Press.
Chamberlain, Gary (1984), “Panel Data,” in Z. Griliches and M. Intriligator (eds), Handbook
of Econometrics. Amsterdam: North Holland, 1247–1318.
Greene, William (2017), Econometric Analysis. Lecture notes. http://people.stern.nyu.edu/wgreene/Econometrics/Econometrics-I-13.pdf
Griliches, Zvi and Jerry A. Hausman (1986), “Errors in Variables in Panel Data,” Journal of
Econometrics, 31(1), 93–118.
Hansen, Lars Peter (1982), “Large Sample Properties of Generalized Method of Moments
Estimators,” Econometrica: Journal of the Econometric Society, 50(4), 1029–1054.
Hausman, Jerry A. (1978), “Specification Tests in Econometrics,” Econometrica 46
(November), 1251–1271.
Hausman, Jerry A. and William E. Taylor (1981), “Panel Data and Unobservable Individual
Effects,” Econometrica, 49(6), 1377–1398.
Hsiao, Cheng (2014), Analysis of Panel Data, Cambridge: Cambridge University Press. 3rd
edition.
Jacobson, Robert (1990), “Unobservable Effects and Business Performance,” Marketing
Science, 9 (Winter), 74–85, 92–95.
Kirzner, Israel M. (1976), “On the Method of Austrian Economics,” in E.G. Dolan (ed.), The
Foundations of Modern Austrian Economics, Kansas City: Sheed and Ward, 40–51.
Mizik, Natalie and Robert Jacobson (2004), “Are Physicians ‘Easy Marks’? Quantifying
the Effects of Detailing and Sampling on New Prescriptions,” Management Science,
1704–1715.
Mundlak, Yair (1978), “On the Pooling of Time Series and Cross-Sectional Data,”
Econometrica, 46 (January), 69–86.
Nickell, Stephen (1981), “Biases in Dynamic Models with Fixed Effects,” Econometrica,
1417–1426.
Pischke, Jörn-Steffen (2007), Lecture notes on measurement error. London School of
Economics. http://econ.lse.ac.uk/staff/spischke/ec524/Merr_new.pdf.
Roodman, David (2009), “How to do xtabond2: An introduction to difference and system
GMM in Stata,” Stata Journal, 9 (1), 86–136.
Rumelt, Richard (1984), “Towards a Strategic Theory of the Firm,” in B. Lamb (ed.),
Competitive Strategic Management, Englewood Cliffs, NJ: Prentice Hall, 556–570.
Sargan, John D. (1958), “The estimation of economic relationships using instrumental vari-
ables,” Econometrica: Journal of the Econometric Society, 393–415.
Trognon, Alain (1978), “Miscellaneous Asymptotic Properties of Ordinary Least Squares
and Maximum Likelihood Estimators in Dynamic Error Components Models,” Annales
de l’INSEE. Institut National de la Statistique et des Études Économiques, 631–657.
Wernerfelt, Birger (1984), “A Resource-based View of the Firm,” Strategic Management
Journal, 5 (April–June), 171–180.
Windmeijer, Frank (2005), "A Finite Sample Correction for the Variance of Linear Efficient Two-step GMM Estimators," Journal of Econometrics, 126 (1), 25–51.
Wooldridge, Jeffrey (2002), Econometric Analysis of Cross Section and Panel Data,
Cambridge, MA: MIT Press.
Wooldridge, Jeffrey (2006), Introductory Econometrics: A Modern Approach, Mason, OH:
Thomson/South-Western.



6.  Causal inference in marketing
applications
Peter E. Rossi

The fundamental goal of marketing analytics is to inform decisions that


firms make about the deployment of marketing resources. Marketing mix
decisions in which firms optimize the allocation of the marketing budget
over various marketing activities are one classic example of such a deci-
sion problem. More recently, digital marketing methods have radically
increased the number of possible ways that a firm may “touch” or inter-
act with a customer. These myriad methods pose the simpler problem of
“attributing” a sales response to a specific marketing action such as expo-
sure to a paid search advertisement. At the core, all firm decisions regard-
ing marketing involve counterfactual reasoning. For example, we must
estimate what a potential customer would do had they not been exposed to
a paid search ad in order to “attribute” the correct sales response estimate
to this action. Marketing mix models pose much more difficult problems
of valid counterfactual estimates of what would happen to sales and
profits if marketing resources were re-allocated in a different manner than
observed in the past.
The importance of counterfactual reasoning in any problem related
to optimization of resources raises the ante for any model of customer
behavior. Not only must this model match the co-variation of key vari-
ables in the historical data, but the model must provide accurate and valid
forecasts of sales in a new regime with a different set of actions. This
means that we must identify the causal relationship between marketing
variables and firm sales/profits and this causal relationship must be valid
over a wide range of possible actions, including actions outside of the
support of historical data.
The problem of causal inference has received a great deal of attention
in the bio-statistics and economic literatures, but relatively little attention
in the marketing literature. Given that marketing is, by its very nature, a
decision-theoretic field, this is somewhat surprising. The problems in the
bio-statistics and economics applications are usually evaluating the causal
effect of a “treatment” such as a new drug or a job-training program.
Typically, the models used in these literatures are simple linear models.
Often the goal is to estimate a “local” treatment effect. That is, a treatment



136   Handbook of marketing analytics

effect for those induced by an experiment or other incentives to become


treated.
A classic example from this literature is the Angrist and Krueger (1991)
study, which starts with the goal of estimating the returns to an additional
year of schooling but ends up only estimating (with a great deal of uncer-
tainty) the effect of additional schooling for those induced to complete the
10th grade (instead of leaving school in mid-year). To make any policy
decisions regarding investment in education, we would need to know the
entire causal function (or at least more than one point) for the relationship
between years of education and wages. The analogy in marketing analytics
is to estimate the causal relationship between exposures to advertising and
sales. In order to optimize the level of advertising, we require the whole
function, not just a derivative at a point.
Much of the highly influential work of Heckman and Vytlacil (2007)
has focused on the problem of evaluating job training programs where
the decision to enroll in the program is voluntary. This means that those
people who are most likely to benefit from the job training program or
who have the least opportunity cost of enrolling (such as the recently
unemployed) are more likely to be treated. This raises a host of thorny
inference problems. The analogy in marketing analytics is to evaluate the
effect of highly targeted advertising.
This chapter summarizes the major methods of causal inference and
comments on the applicability of these methods to marketing problems.

The Problem of Observational Data

Consider the generic problem of building a sales response model that


links sales to various input variables that measure price, promotion and
advertising, broadly construed. Assembling even observational data to
fit such a model can be very demanding. At least three or possibly four
different sources are required: (1) Sales data, (2) Pricing and promotional
data, (3) Digital advertising and (4) Traditional advertising such as TV,
print and outdoor. Typically, these sources feature data at various levels of
temporal, geographic and product aggregation. For example, advertising
is typically not associated with a specific product but with a line of prod-
ucts and may only be available at the monthly or quarterly level. Since, at
its core, inference with observational data is about exploiting variance in
marketing input to identify causal effects, the limitations of the data can
be severe.
Consider a very simple problem in which we have aggregate time series
data on the sales of a product and some measure of advertising exposure.1


$S_t = f(A_t \mid \theta) + e_t$

Our goal is to infer the function, f, which can be interpreted as a causal


function. That is, we can use this function to make valid predictions of
expected sales for a wide range of possible values of advertising. In order
to consider optimizing advertising, we require a non-linear function
which, at least at some point, exhibits diminishing returns. Given that we
wish to identify a non-linear relationship, we will require more extensive
variation in A than if we assume a linear approximation. The question
from the point of view of causal inference is whether or not we can use the
variation in the observed data to make causal inferences.
The statistical theory behind any standard inference procedure
for such a model (non-linear least squares, maximum likelihood, or
Bayesian methods) assumes the observed variation in A is as though
obtained via random experimentation. In a likelihood-based approach,
we make the assumption that the marginal distribution of A is unrelated
to the parameters, $\theta$, which drive the conditional mean function. An
implication of this assumption is that the conditional mean function
is identified only via the effect of changes in A rather than levels. In
practice, this may not be true. For example, it may be that both A and
S are determined simultaneously or that there is some sort of feedback
relationship between sales and advertising. Suppose each quarter, the
level of advertising is set as a function of the last quarter’s sales or as
a function of this quarter’s sales. In this situation, we may not be able
to obtain valid (consistent) estimates of the sales response function
parameters.2
Another possibility is that there is some unobservable variable that
influences both advertising and sales. For example, suppose there are
advertising campaigns for a competing product that is a close substitute
and we, as data scientists, are not aware of or cannot observe this activity.
It is possible that, when there is intensive activity from competitive adver-
tising, the firm increases that scale of its advertising to counter or blunt the
effects of competitive advertising. This means that we no longer estimate
the parameters of the sales response function consistently. In general,
anytime the firm sets A with knowledge of some factor that also affects
sales and we do not observe this factor, we will have difficulty recovering
the sales response function parameters. In some sense, this is a generic and
non-falsifiable critique. How do we know that such an unobservable does
not exist? We can’t prove it.
Typically, the way we might deal with this problem is to include as large a set of covariates as possible in the sales equation as control variables.3 The
problem in sales-response model-building is that we often do not observe


any actions of competing products or we only observe these imperfectly


and possibly at a different time frequency. Thus, one very important set of
potential control variates is often not available. Of course, this is not the
only possible set of variables observable to the firm but not observable to
the data scientist.
There are three possible ways to deal with this problem of “simultane-
ity” or “endogeneity.”

1. We might consider using data sampled at a much higher frequency


than the decisions regarding A are made. For example, if advertis-
ing decisions are made only quarterly, we might use weekly data and
argue that the lion’s share of variation in our data holds the strategic
decisions of the firm constant.
2. We might attempt to partition the variation in A into that which is
“clean” or unrelated to factors driving sales and that which is related.
This is the logical extension of the conditioning approach of adding
more observables to the model. We would then use an estimation
method that uses only the “clean” portion of the variation.
3. We could consider experimentation to break whatever dependence
there is between the advertising and sales.

Each of these ideas will be discussed in detail below. Before we embark


on a more detailed discussion of these methods, we will relate our discus-
sion of simultaneity or endogeneity to the literature on causal inference for
treatment effects.

The Fundamental Problem of Causal Inference

A growing literature (c.f. Angrist and Pischke (2009) and Imbens and
Rubin (2015)) emphasizes a particular formulation of the problem of
causal inference. Much of this literature re-interprets existing econo-
metric methods in light of this paradigm. The basis for this paradigm of
causal inference was originally suggested by Neyman (1923), who con-
ceived of the notion of potential outcomes for a treatment. The notation
favored by Imbens and Rubin is as follows. Y represents the outcome
random variable. In our case, Y will be sales or some sort of event (like a
conversion or click) which is on the way toward a final purchase. We seek
to evaluate a treatment, denoted D. For now, consider any binary treat-
ment such as exposure to an ad. We conceive of there being two potential
outcomes:


Yi(1): potential outcome if unit i is exposed to the treatment


Yi(0): potential outcome if unit i is not exposed to the treatment.

We would like to estimate the causal effect of the treatment which is


defined as
$\Delta_i = Y_i(1) - Y_i(0)$

The fundamental problem of causal inference is that we only see one


of two potential outcomes for each unit being treated. That is, we only
observe Yi(1) for Di = 1 and Yi(0) for Di = 0. Without further assumptions
or information, this statistical problem is unidentified. Note that we
have already simplified the problem greatly by assuming a linear model
or restricting our analysis to only one “level” of treatment. Even if we
simplify the model by assuming a constant treatment effect, $\Delta_i = \Delta$, the
model is still not identified.
To see this problem, let’s take the mean differences in Y between those
who were treated and not treated and express this in terms of potential
outcomes.

$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$
$= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$

This equation simply states that what the data identifies is the mean
difference in the outcome variable between the treated and untreated, and
this can be expressed as the sum of two terms.
The first term is the effect on the treated, $E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1]$, and the second term is called the selection bias, $E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]$. Selection bias occurs when the potential outcome for those
assigned to the treatment differs in a systematic way from those who are
assigned to the “control” or assigned not to be treated. This selection bias is
what inspired much of the work of Heckman, Angrist and Imbens to obtain
further information. The classic example of this is the so-called ability bias
argument in the literature on education. We can’t simply compare the wages
of college graduates with those who did not graduate from college because it
is likely that college graduates have greater ability even “untreated” with a
college education. Those who argue for the “certification” view of higher
education are the extreme point of this selection bias – they argue that the
only point of education is not those courses in Greek philosophy but simply
the selection bias of finding higher ability individuals.
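A small simulation sketch of this decomposition, with arbitrary values: the treatment effect is a constant 1, so targeting treatment at high-baseline units inflates the difference in means by exactly the selection-bias term, while random assignment recovers the causal effect:

    clear
    set seed 12345
    set obs 10000
    gen y0 = rnormal()                    // potential outcome if untreated
    gen y1 = y0 + 1                       // constant treatment effect of 1
    gen d_target = (y0 > 0)               // "treat" the high-baseline units
    gen d_random = (runiform() > 0.5)     // randomized assignment
    gen y_t = cond(d_target == 1, y1, y0)
    gen y_r = cond(d_random == 1, y1, y0)
    regress y_t d_target                  // about 2.6 = 1 + selection bias
    regress y_r d_random                  // about 1, the causal effect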
It is useful to reflect on what sort of situations are likely to have large
selection bias in the evaluation of marketing actions. Mass media like TV


or print are typically only targeted at a very broad demographic group.


For example, advertisers on the Super Bowl are paying a great deal of
money to target men aged 25–45. There is year-to-year variation in Super
Bowl viewership that, in principle, would allow us to estimate some sort of
regression-based model of the effect of exposure to Super Bowl ads. The
question is what is the possible selection bias? It is true that the effective-
ness of a beer ad on those who view the Super Bowl versus a random
consumer may be very different, but that may not be relevant to the Super
Bowl advertiser. The SB advertiser cares more about the effect on the
treated; that is, the effect of exposure on those in the target audience who
view the Super Bowl. Are those who choose not to view the Super Bowl
in year X different from those who view the Super Bowl in year Y? Not necessarily; viewership is probably driven by differences in the popularity of the teams in the Super Bowl. Thus, if our interest is the effect on the treated Super
Bowl fan, there probably is little selection bias (under the assumption that
the demand for beer is similar across the national population of Super
Bowl fans).
However, selection bias is a probably a very serious problem in other
situations. Consider a firm like North Face that markets outdoor clothing.
This is a highly seasonal industry with two peaks in demand each year:
one in the Spring as people anticipate summer outdoor activities and
another in the late fall as consumers are purchasing holiday gifts. North
Face is aware of these peaks in demand and typically schedules much of
its promotional and advertising activity to coincide with these peaks in
demand. This means we can’t simply compare sales in periods of high
advertising activity to sales in periods of low as we are confounding the
seasonal demand shift with the effect of marketing.
In the example of highly seasonal demand and coordinated marketing,
the marketing instruments are still mass or untargeted for the most part
(other than demographic and, possibly, geographic targeting rules).
However, the problem of selection bias can also be created by various
forms of behavioral targeting. The premier example of this is the paid
search advertising products that generate much of Google Inc.’s profits.
Here the ad is triggered by the consumer’s search actions. Clearly, we can’t
compare the subsequent purchases of someone who uses search keywords
related to cars with those not exposed to these paid search ads. There is
apt to be a huge selection bias as most of those not exposed to the car
keyword-search ad are not in the market to purchase a car. Correlational
analyses of the impact of paid search ads are apt to show a huge impact
that is largely selection bias (see Blake et al. (2015) for analysis of paid
search ads for eBay in which they conclude that they have little effect).
There is no question that targeting ads based on the preferences of custom-


ers as revealed in their behavior is apt to become even more prevalent in


the future. This means that, for all the talk of “big data,” we are creating
more and more data that is not amenable to analysis with our standard
bag of statistical tricks.

Randomized Experimentation
The problem with observational data is the potential correlation between
“treatment” assignment and the potential outcomes. We have seen that
this is likely to be a huge problem for highly targeted forms of marketing
activities where the targeting is based on customer preferences. More gen-
erally, any situation in which some of the variation in the right-hand side
variables is correlated with the error term in the sales response equation
will make any “regression-style” method inconsistent in estimating the
parameters of the causal function. For example, the classical errors-in-
variables model results in a correlation between the measured values of the
rhs variables and the error term.
In a randomized experiment, the key idea is that assignment to the treat-
ment is random and therefore uncorrelated with any other observable or
unobservable variable. In particular, assignment to the treatment is uncor-
related with the potential outcomes. This eliminates the selection bias term.

$E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] = 0$

This means that the difference in means between the treated and
untreated populations consistently estimates not only the effect on the
treated, but also the average effect or the effect on the person chosen at
random from the population. However, it is important to understand
that when we say person chosen at random from the “population,” we
are restricting attention to the population of units eligible for assignment
in the experiment. Most experiments have a very limited domain. For
example, if we randomly assign designated market areas (DMAs) in
the northeast portion of the United States, our population is only that
restricted domain. Most of the classic social experiments in economics
have very restricted domains or populations to which the results can be
extrapolated. Generalizability is the most restrictive aspect of randomized
experimentation. Experimentation in marketing applications such as
“geo” or DMA-based experiments conducted by Google and Facebook
is starting to get at experiments which are generalizable to the relevant
population (i.e. all US consumers).
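To see the role of the selection-bias term concretely, the following Python
sketch (all numbers hypothetical) contrasts the difference-in-means
estimator under random assignment with the same estimator when exposure is
targeted toward high-baseline customers.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: Y(0) is baseline sales, Y(1) adds a true effect of 2.
y0 = rng.normal(10, 5, n)
y1 = y0 + 2.0

# (a) Random assignment: D is independent of (Y(0), Y(1)).
d = rng.binomial(1, 0.5, n)
ate_rand = y1[d == 1].mean() - y0[d == 0].mean()

# (b) Behavioral targeting: high-baseline customers are more likely exposed,
# so E[Y(0)|D=1] - E[Y(0)|D=0] != 0 and the difference in means is biased.
p = 1.0 / (1.0 + np.exp(-(y0 - 10)))
d_t = rng.binomial(1, p)
ate_tgt = y1[d_t == 1].mean() - y0[d_t == 0].mean()

print(f"true effect 2.00 | randomized {ate_rand:.2f} | targeted {ate_tgt:.2f}")
```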
Another key weakness of randomization is that this idea is really a large
sample concept. It is of little comfort to the analyst that treatments were
randomly assigned if it turns out that randomization “failed” and did not
give rise to a random realized sample of treated and untreated units. With
a very small N, this is a real possibility. In some sense, all we know is that
statements based on randomization only work asymptotically.
A practical limitation to experimentation is that there can be situations
in which randomization results in samples with low power to resolve causal
effects. This can happen when the effects of the variables being tested are
small, the sales response model has low explanatory power, and the sales
dependent variable is highly variable. A simple case might be where you are
doing an analysis of the effect of an ad using individual data and no other
covariates in the sales response model. The standard errors of the causal
effect (here just the coefficient on the binary treatment variables) of course
are decreasing only at rate √n and increasing in the standard deviation
of the error term. If the effects are small, then the standard deviation of
the error term is about the same as the standard deviation of sales. Simple
power calculations in these situations can easily result in experimental
designs with thousands or even tens of thousands of subjects, a point made
recently by Lewis and Rao (2015). Lewis and Rao neglect to say that if there
are other explanatory variables (such as price and promotion) included in
the model, then even though sales may be highly variable, we still may be
able to design experiments with adequate power even with smallish N. It is
important to note that the error variance is not the variance of sales. Other
explanatory variables will, by definition, be orthogonal to the advertising
treatment variable but still helpful in increasing power.
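The power calculation below is a minimal sketch of this argument, using the
standard two-sample formula; the effect size, sales standard deviation, and
covariate R-squared are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def n_per_arm(effect, sigma, alpha=0.05, power=0.8):
    """Standard two-sample power calculation: subjects per arm needed to
    detect `effect` when the response error has std dev `sigma`."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(2 * (z * sigma / effect) ** 2))

# Hypothetical ad test: lift of $0.05 per customer, sales std dev of $10.
print(n_per_arm(0.05, 10.0))          # ~ 628,000 per arm: Lewis and Rao's point

# If covariates (price, promotion) explain 75% of sales variance, the
# residual std dev halves and the required sample falls by a factor of four.
print(n_per_arm(0.05, 10.0 * np.sqrt(1 - 0.75)))
```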
While randomization might seem the panacea4 for estimation of causal
effects, it has severe limitations for situations in which a large number
or a continuum of causal effects are required. For example, consider the
situation of two marketing variables and a possibly non-linear causal
function:

    $S_t = f(X_{1,t}, X_{2,t} \mid \theta) + \varepsilon_t$

In order to maximize profits for choice of the two variables, we must
estimate not just the gradient of f() at some point (or the average gradient
over the joint distribution of the two variables) but the entire func-
tion. Clearly, this would require a continuum of experimental conditions.
Even if we discretized the values of the variables used in the experiments,
the experimental paradigm clearly suffers from the curse of dimensionality
as we add variables to the problem. For example, the typical marketing
mix model might include at least five or six marketing variables resulting
in experiments with hundreds of cells.

Poor Man's Randomization or Instrumental Variables

In many situations, we do not have the luxury of injecting true randomized
variation into our data via experimentation. If we have strong reason to
believe there is “selection on unobservables” or large measurement errors,
what can be done short of experimentation? The answer most econometri-
cians would offer would be to use instrumental variable methods. These
have their origin in work done shortly after World War II at the Cowles
Commission (then housed at University of Chicago). This work was moti-
vated by the desire to estimate demand and supply equations from data on
equilibrium quantities. The key observation was that such a system of equa-
tions could be identified if there are “exclusion” restrictions. In other words,
if we could identify a variable which moved around Demand without affect-
ing Supply and vice versa, we might be able to estimate the slopes of the
demand and supply curve. In marketing, we might focus on estimating the
Demand curve alone, arguing that our job is to solve the “supply” or firm
profit maximization problem. In the case of only one "structural" demand
equation that relates sales to price and other marketing variables,
we seek one or more variables that affect a rhs variable but have no direct
effect on sales. These are called Instrumental Variables or IVs.
The idea of an IV is that, while some of the variation of a rhs X variable
is contaminated (in the sense of being correlated5 with the error term), that
portion which is driven by the instrument Z is not. Clearly, this variation
(that due to Z) can be used to estimate causal effects. The only question
is how to do so. The key idea here also comes from the early work on
this problem. While I cannot “regress” the dependent variable on X to
get a causal estimate, I can project both the dependent and independent
variables on the instrument, Z. This is called the “reduced” form, a term
invented at the Cowles Commission.

    $S_t = g_1(Z_t) + u_{S,t}$

    $X_t = g_2(Z_t) + u_{X,t}$

At a given value of Z, Z_0, I can estimate the impact of X on S using what
has become known as the "Wald" estimator (in a linear model this would
be called Indirect Least Squares).

    $\Delta = \dfrac{\partial g_1 / \partial Z}{\partial g_2 / \partial Z} \Big|_{Z = Z_0}$

The Wald estimator makes a great deal of intuitive sense. The numerator
is basically the derivative of the mean of S with respect to Z and the
denominator the derivative of the conditional mean of X with respect to Z.
Thus, the quotient is the derivative of the causal function with respect to
X based on a perturbation at Z_0.
In a constant effects linear model, the Wald estimator will consistently
recover the true constant linear causal effect of X on S. In heterogenous
effect models, the IV estimator has the so-called LATE (local average
treatment effect) interpretation. It only estimates the causal impact of
X on S for those units whose values of X are affected by the treatment
assignment. In a non-linear heterogeneous model, it is difficult to interpret
the IV estimator. For non-linear homogeneous models, the estimator
of the non-linear IV will require that the instrument be independent (or
conditionally independent) of the Sales equation error term instead of
merely mean independent.
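A stylized simulation may help fix ideas. In the sketch below (hypothetical
numbers throughout), a cost shifter serves as the instrument Z; the linear
Wald (indirect least squares) estimator recovers the true price coefficient,
while OLS is badly biased because the firm sets price with partial knowledge
of the demand shock.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Demand shock observed by the firm but not by the analyst.
shock = rng.normal(0, 1, n)
cost = rng.normal(0, 1, n)            # instrument: shifts price, not demand

# Firm raises price with cost and with the anticipated demand shock.
price = 5 + 0.5 * cost + 0.8 * shock + rng.normal(0, 0.5, n)
sales = 20 - 2.0 * price + 3.0 * shock + rng.normal(0, 1, n)  # true slope -2

# OLS: inconsistent because price is correlated with the demand shock.
b_ols = np.cov(price, sales)[0, 1] / np.var(price)

# Wald / indirect least squares: ratio of the two reduced-form slopes on Z.
b_iv = (np.cov(cost, sales)[0, 1] / np.var(cost)) / \
       (np.cov(cost, price)[0, 1] / np.var(cost))

print(f"true -2.00 | OLS {b_ols:.2f} | IV {b_iv:.2f}")
```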
Since the IV estimator is only using a portion of the variation in the X
variable to estimate causal effects, IV estimates may have large standard
errors. As the strength of an instrument or instruments declines, the usual
asymptotic approximation used to compute standard errors worsens. In
the case of weak instruments, the IV estimators can be very biased and
have enormous confidence intervals6 (including infinite length intervals).
There is no real consensus on how to estimate standard errors or confi-
dence intervals for weak to moderately weak instruments.
The problem with the IV approach to causal inference is that we rarely
have access to any variable that can be argued to be a valid instrument. In
the case of instruments for prices, we might argue that cost factors should
be valid instruments, but these variables are rarely very strong instruments.
I am not aware of any generic arguments supporting any set of variables
as valid instruments for advertising variables. Sometimes a natural
"randomization" occurs, but this is extremely rare. Two
examples of this are Angrist’s (1990) study, which considers the effect
of serving in the US armed forces on wages, using as an instrument the
draft lottery for the Vietnam War. Here there was a true randomization
of draft eligibility. More recently, Stephens-Davidowitz et al. (2015) used
whether the home team of a DMA is in the Super Bowl as an instrument
to measure the effect of movie ads.
There is a fairly large cottage industry of economists who try to find
instruments to estimate causal effects. Many of these efforts fall short
as the arguments for validity of these instruments are undermined by
subsequent research. In almost all situations, the inference conducted
with IVs assumes or conditions on the validity of the instrument and,
therefore, understates the bias and uncertainty in these estimates. I do not
think that IV methods have much promise in marketing applications due
to the problems in finding valid instruments and the inference problems
associated with IV methods.

Other Control Methods

We have seen that randomization either by direct intervention (i.e.,
experimentation) or appeal to “naturally” occurring randomization (IVs)
can help solve the fundamental problem of causal inference. Another
approach is to add additional covariates to the analysis in hopes of achiev-
ing independence of the treatment exposure conditional on these sets
of covariates. If we can find covariates that are highly correlated with
the unobservables and then add these to the sales response model, then
the estimate on the treatment or marketing variables of interest can be
“cleaner” or less confounded with selection bias.
If we have individual level data and are considering a binary treat-
ment such as ad exposure, then conditioning on covariates to achieve
approximate independence simplifies to the use of propensity scores as a
covariate. The propensity score7 is nothing more than the probability that
the individual is exposed to the ad as a function of covariates (typically the
fitted probability from a logit/probit model of exposure). For example,
suppose we want to measure the effectiveness of a YouTube ad for an
electronic device. The ad is shown on a YouTube channel whose theme is
electronics. Here the selection bias problem can be severe – those exposed
to the ad may be pre-disposed to purchase the product. The propensity
score method attempts to adjust for these biases by modeling the prob-
ability of exposure to the ad based on covariates such as demographics
and various “techno-graphics” such as browser type and previous viewing
of electronics YouTube channels. The propensity score estimate of the
treatment or ad exposure effect would be from a response model that
includes the treatment variable as well as the propensity score. Typically,
effect sizes are reduced by inclusion of the propensity score in the case of
positive selection bias.
Of course, the propensity score method is only as good as the set of
covariates used to form the propensity score. There is no way to test that
a propensity score fully adjusts for selection bias other than confirma-
tion via true randomized experimentation. Goodness-of-fit or statistical
significance of the propensity score model is reassuring but not conclusive.
There is a long tradition of empirical work in marketing that demonstrates
that demographic variables are not predictive of brand choice or brand
preference. This implies that propensity score models built on standard
demographics are apt to be of little use in reducing selection bias and obtain-
ing better causal effect estimates.
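The following sketch illustrates the mechanics under an assumed
data-generating process (the covariate name and all coefficients are
hypothetical). Consistent with the discussion above, including the fitted
propensity score in the response model reduces, though need not eliminate,
the positive selection bias.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

# Hypothetical "techno-graphic" covariate: affinity for electronics content.
affinity = rng.normal(0, 1, n)

# Exposure to the ad is more likely for high-affinity viewers.
exposed = rng.binomial(1, 1.0 / (1.0 + np.exp(-2.0 * affinity)))

# Purchases depend on affinity (the selection) plus a true ad effect of 0.5.
purchase = 1.0 + 1.5 * affinity + 0.5 * exposed + rng.normal(0, 1, n)

naive = purchase[exposed == 1].mean() - purchase[exposed == 0].mean()

# Propensity score: fitted probability of exposure given the covariates.
ps = LogisticRegression().fit(affinity.reshape(-1, 1), exposed)
pscore = ps.predict_proba(affinity.reshape(-1, 1))[:, 1]

# Response model with the treatment and the propensity score as covariates.
X = np.column_stack([np.ones(n), exposed, pscore])
beta = np.linalg.lstsq(X, purchase, rcond=None)[0]

# The adjusted estimate is far closer to the true 0.5 than the naive one,
# though a linear control for the score need not remove all of the bias.
print(f"true 0.50 | naive {naive:.2f} | pscore-adjusted {beta[1]:.2f}")
```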
Another way of understanding the propensity score method is to think
about a “synthetic” control population. That is, for each person who is
exposed to the ad, we find a "twin" who is identical (in terms of product
preferences and ability to buy) who was not exposed to the ad. The differ-
ence in means between the exposed (treatment) group and this synthetic
control population should be a cleaner estimate of the causal effect. In
terms of propensity scores, those with similar propensity scores are consid-
ered “twins.” In this same spirit, there is a large literature on “matching”
estimators that attempt to construct synthetic controls (cf. Imbens and
Rubin, Chapters 15 and 18). Again, any matching estimator is only as good
as the variables used in implementing “matching.”
With aggregate data, the “difference-in-differences” approach to con-
structing a control group has achieved a great deal of popularity. A nice
example of this approach can be found in Blake et al. (2015). Here they
seek to determine the impact of sponsored search ads using a “natural”
experiment in which eBay terminated paid search ads on MSN after a
certain date. The standard analysis would be simply to compare some
outcome measure such as clicks, conversions or revenue before and after
termination of the sponsored search ads. In this approach, the “control” is
the period after termination and the “experimental” or treatment period is
the before. There are two problems with this approach. First, this does not
control for other time-varying factors influencing interest in the sponsored
search keywords. Second, there can be power problems. The standard
difference-in-differences approach is to find a control condition where
there was no change in sponsored search ads. The authors use Google
organic search results as the control. The difference-in-differences method
is simply to subtract the before-and-after differences on MSN from the
before-and-after differences on Google (the control). The success of this
strategy depends on whether or not Google keyword results constitute a
valid control. Blake et al. are suspicious of this assumption and pursue a
randomized experimentation strategy to estimate the impact of sponsored
search ads.
The appeal of the difference-in-differences approach is that all that
appears to be required is some subset of the data (typically a geographically
based subset) that was not exposed to the advertisement or policy change.
It is not possible to test the assumption that the changes in the response
variable for the control subset are independent of the “treatment.” There
are also a host of power and statistical inference problems associated with
the difference-in-differences literature (see Chapters 5 and 8 of Angrist
and Pischke). As a practical matter, it is advisable to do a “placebo” test if
a difference-in-differences approach is adopted. That is, take two subsets
of the data where there should be, by definition, no treatment effect and
perform a difference-in-differences analysis on the “placebo” sample.
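A minimal difference-in-differences sketch, including the placebo check just
described, might look as follows; the regions, weeks, and effect sizes are
invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical weekly revenue: the treated region stops paid search after
# week 26; both regions share a common seasonal shift (+5) in the second half.
weeks = np.arange(52)
post = (weeks >= 26).astype(float)
trend = 5.0 * post
control = 100 + trend + rng.normal(0, 2, 52)
treated = 110 + trend - 1.5 * post + rng.normal(0, 2, 52)  # true effect -1.5

def did(y_treat, y_ctrl, post):
    """Difference-in-differences: (after - before) for treated minus control."""
    d_treat = y_treat[post == 1].mean() - y_treat[post == 0].mean()
    d_ctrl = y_ctrl[post == 1].mean() - y_ctrl[post == 0].mean()
    return d_treat - d_ctrl

print(f"DiD estimate: {did(treated, control, post):.2f}")       # ~ -1.5

# Placebo: pretend the change happened at week 13, using pre-period data only.
pre = weeks < 26
placebo_post = (weeks[pre] >= 13).astype(float)
print(f"placebo DiD: {did(treated[pre], control[pre], placebo_post):.2f}")  # ~ 0
```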

Panel Data and Selection on Unobservables

Up to this point, I have considered only aggregate time series data. The
problem with this data with respect to causal inference is that there can
be decisions to set the rhs variables that, over time, induce an “endog-
eneity” problem or a correlation with the model errors. The same is true
for pure cross-sectional variables. If the X variables are correlated with
unobserved cross-sectional characteristics, valid causal inferences cannot
be obtained.
If we have panel data and we think that there are unobservables that
are time invariant, then we can adopt a “fixed effects” style approach that
uses only variation within unit over time to estimate causal effects. The
only assumption required here is that the unobservables are time invari-
ant. Given that marketing data sets seldom span more than a few years,
this time invariance assumption seems eminently reasonable. It should be
noted that if the time span increases, a host of non-stationarities arise such
as the introduction of new products and entry of competitors. In sum, it
is not clear that we would want to use a long-time series of data without
modeling the evolution of the industry we are studying.
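The within (demeaning) transformation that implements the fixed-effects idea
is simple to code. The sketch below uses a simulated panel in which a
time-invariant unobservable drives both exposure and sales; all numbers are
hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n_units, n_years = 50, 6                     # e.g., 50 DMAs over 6 years

# Time-invariant unobservable (regional preference) correlated with exposure.
alpha = rng.normal(0, 2, n_units)
exposure = alpha[:, None] + rng.normal(0, 1, (n_units, n_years))
sales = alpha[:, None] + 0.7 * exposure + rng.normal(0, 1, (n_units, n_years))

# Pooled OLS ignores the unit effects and is biased.
b_pooled = np.cov(exposure.ravel(), sales.ravel())[0, 1] / np.var(exposure.ravel())

# Within estimator: demean each unit over time, removing anything time invariant.
ex_w = exposure - exposure.mean(axis=1, keepdims=True)
sa_w = sales - sales.mean(axis=1, keepdims=True)
b_fe = (ex_w * sa_w).sum() / (ex_w ** 2).sum()

print(f"true 0.70 | pooled OLS {b_pooled:.2f} | fixed effects {b_fe:.2f}")
```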
Consider the example of estimating the effect of a Super Bowl ad.
Aggregate time series data may have insufficient variation in exposure
to estimate ad effects. Pure cross-sectional variation confounds regional
preferences for products with true useful variation in ad exposure. Panel
data, on the other hand, might be very useful to isolate Super Bowl ad
effects. Hartmann and Klapper (2017) exploit a short panel of six years of
data across about 50 different DMAs to estimate effects of CPG ads. They
find that there is a great deal of variation from year to year in the same
DMA in Super Bowl viewership. It is hard to believe that preferences for
these products vary from year to year in a way that is correlated with the
popularity of the Super Bowl broadcast. Far more plausible is that this
variation depends on the extent to which the Super Bowl is judged to be
interesting at the DMA level. This could be because a home team is in the
Super Bowl or it could just be due to the national or regional reputation
of the contestants. Hartmann and Klapper estimate linear models with
Brand-DMA fixed effects (intercepts) and find a large and statistically
significant effect of Super Bowl ads by beer and soft drink advertisers.
This is quite an achievement, given the cynicism in the empirical advertis-
ing literature about ability to have sufficient power to measure advertising
effects without experimental variation.
Many, if not most, of the marketing mix models estimated today are
estimated on aggregate or regional time series data. The success of Hartmann
and Klapper in estimating effects using more disaggregate panel data is
an important source of hope for the future of marketing analytics.
It is well known that the idea of using fixed effects or unit-specific
intercepts does not generalize to non-linear models. If we want to opti-
mize the selection of marketing variables then we will have to use more
computationally intensive hierarchical modeling approaches to allow
response parameters to vary over cross-sectional units. Advocates of the
fixed-effects approach argue that the use of fixed effects does not require
any distributional assumptions nor the assumption that unit parameters
are independent of the rhs variables. Given that it is possible to construct
hierarchical models with a general distributional form as well as to allow
unit characteristics to affect these distributions,8 it seems the time is ripe
to move to hierarchical approaches for marketing analytics with non-
linear response models. This approach exploits the advantage we have in
marketing of having comprehensive datasets without adding the difficult
to verify assumptions used in the IV literature.

Regression Discontinuity

Many promotional activities in marketing are conducted via some sort of
threshold rule or discretized into various “buckets.” For example, consider
the loyalty program of a gambling casino. The coin of the realm in this
industry is the expected win for each customer, which is simply a function
of the volume of gambling and type of game. The typical loyalty program
encourages customers to gamble more and come back to the casino by
establishing a set of thresholds. As customers increase their expected win,
they “move” from one tier or “bucket” in this program to the next. In the
higher tiers, the customer receives various benefits like complimentary
rooms or meals. The key is that there is a discrete jump in benefits by
design of the loyalty program. On the other hand, it is hard to believe
that the response function of the customer to the level of complimentary
benefits is non-smooth or discontinuous. Thus, it would seem that we can
“select” on the observables to compare those customers whose volume
of play is just on either side of each discontinuity in the reward program.
As Hartmann et al. (2011) point out, as long as the customer is not
aware of the threshold or the benefits from “selecting in” or moving to
the next tier are small relative to the cost of greater play, this constitutes a
valid Regression Discontinuity (RD) design. Other examples in marketing
include direct mail activity (those who receive offers and or contact are a
discontinuous function of past order history) and geographic targeting (it
is unlikely people will move to get the better offer).

Regression discontinuity analysis has received a great deal of attention
in economics as well (see Imbens and Lemieux 2008). The key assumption
is that the response function is continuous in the neighborhood of the
discontinuity in the assignment of the treatment. There are both para-
metric and non-parametric forms of analysis, reflecting the importance
of estimating the response function without bias that would adversely
affect the RD estimates. Parametric approaches require a great deal of
flexibility that may compromise power, while non-parametric methods
rest on the promise to narrow the window of responses used in the vicinity
of the threshold(s) as the sample size increases. This is not much comfort
to the analyst with one finite sample. Non-parametric RD methods are
profligate with data as, ultimately, most of the data is not used in forming
treatment effect estimates.
RD designs result in only local estimates of the derivative of the response
function. For this reason, unless the ultimate treatment is really discrete,
RD designs do not offer a solution to the marketing analytics problem
of optimization. RD designs may be helpful to corroborate the estimates
based on response models fit to the entire dataset (the RD estimate and the
derivative of the response function at the threshold should be comparable).
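A bare-bones local linear RD estimator for the loyalty-program example might
look like the following sketch; the threshold, bandwidth, and response
function are hypothetical, and a serious application would choose the
bandwidth with more care.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# Running variable: expected win; a tier upgrade (perk) kicks in at 100.
win = rng.uniform(50, 150, n)
perk = (win >= 100).astype(float)
# Smooth response in the running variable plus a true jump of 4 at the cutoff.
revisits = 0.05 * win + 4.0 * perk + rng.normal(0, 2, n)

def rd_estimate(x, y, cutoff, bandwidth):
    """Local linear RD: fit a line on each side within the bandwidth and
    take the difference of the two predictions at the cutoff."""
    est = {}
    for side, mask in [("below", (x < cutoff) & (x > cutoff - bandwidth)),
                       ("above", (x >= cutoff) & (x < cutoff + bandwidth))]:
        X = np.column_stack([np.ones(mask.sum()), x[mask] - cutoff])
        est[side] = np.linalg.lstsq(X, y[mask], rcond=None)[0][0]
    return est["above"] - est["below"]

print(f"true jump 4.00 | RD estimate {rd_estimate(win, revisits, 100, 10):.2f}")
```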

Model Evaluation

The purpose of causal inference in marketing applications is to inform
firm decisions. As I have argued, in order to optimize actions of the firm,
we must consider counterfactual scenarios. This means that the causal
model must predict well in conditions that can be different from those
observed in the data. The model evaluation exercise must validate the
model’s predictions across a wide range of different policy regimes. If we
validate the model under a policy regime that is the same or similar to the
observational data, then that validation exercise will be uninformative or
even misleading.
To see this point clearly, consider the problem of making causal infer-
ences regarding a price elasticity. The object of causal inference is the true
price elasticity in a simple log–log approximation.

    $\ln Q_t = \alpha + \eta P_t + \varepsilon_t$

Imagine that there is an "endogeneity" problem in the observational
data in which the firm has been setting price with partial knowledge of the
demand shocks that are in the error term. Suppose further that the firm
raises price when it anticipates a positive demand shock. This means that
an OLS estimate of the elasticity will be too small, and we might conclude,
erroneously, that the firm should raise its price even if the firm is setting
prices optimally.
Suppose we reserve a portion of our observational data for out-of-
sample validation. That is, we will fit the log–log regression on observa-
tions 1, 2, . . ., T_0, reserving observations T_0 + 1, . . ., T for validation.9 If
we were to compare the performance of the inconsistent and biased OLS
estimator of the price elasticity with any valid causal estimate using our
“validation” data, we would conclude that OLS is superior using anything
like the MSE metric. This is because OLS is a projection-based estimator
that seeks to minimize mean squared error. The only reason OLS will
fare poorly in prediction in this sort of exercise is if the OLS model is
highly over-parameterized and the procedure over-fits the data.
However, the OLS estimator will yield non-profit maximizing prices if
used in a price optimization exercise because it is inconsistent for the true
causal elasticity parameter.
Thus, we must devise a different validation exercise in evaluating causal
estimates. We must either find different policy regimes in our observa-
tional data or we must conduct a validation experiment.
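The sketch below simulates the log–log example under the endogenous pricing
policy just described (hypothetical parameter values). It shows both halves
of the argument: the OLS elasticity is badly biased toward zero, yet OLS
beats the true causal parameters on holdout MSE drawn from the same policy
regime.

```python
import numpy as np

rng = np.random.default_rng(6)
T, eta = 200, -2.0                      # true price elasticity of -2

# Firm raises (log) price when it anticipates a positive demand shock.
shock = rng.normal(0, 0.5, T)
lnp = 1.0 + 0.8 * shock + rng.normal(0, 0.1, T)
lnq = 3.0 + eta * lnp + shock

# Fit OLS on the first half; validate on the second half of the same regime.
tr, te = slice(0, 100), slice(100, 200)
X = np.column_stack([np.ones(100), lnp[tr]])
a_ols, b_ols = np.linalg.lstsq(X, lnq[tr], rcond=None)[0]

mse = lambda a, b: np.mean((lnq[te] - a - b * lnp[te]) ** 2)
print(f"OLS elasticity {b_ols:.2f} (true {eta})")
print(f"holdout MSE | OLS {mse(a_ols, b_ols):.3f} | "
      f"true causal params {mse(3.0, eta):.3f}")
# OLS wins the forecasting contest under the same policy regime even though
# its elasticity is badly biased and would misprice the product.
```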

Conclusions

The goal of marketing analytics is to inform the decisions of firms in
optimally setting their marketing input variables. Optimization is the
ultimate exercise in causal or counterfactual reasoning that requires valid
causal estimates of the entire sales response function. In this chapter, I
have reviewed the problem of causal inference and many of the popular
methods. In order to make headway on this important problem, we must
exploit the rich possibilities of highly detailed and disaggregate data and
stop pretending that aggregate time series data are sufficient.
Marketing activities that are targeted based on customer preferences
present the most difficult challenge in causal reasoning. The canonical
example of this is paid search advertising. Since these ads are directly trig-
gered by the keyword searches of the customers, the possibility of selection
bias is maximized. Correlational or regression-style analyses in which sales
are correlated with paid search activity will inevitably over-estimate the
impact of paid search ads. However, this problem extends to the increas-
ingly sophisticated set of advertising products that are triggered based on
estimates of the preferences of customers and is not specific to paid search.
At this point, there is no substitute for properly conducted experimental
evidence to evaluate the causal impact of behaviorally targeted marketing.

Notes

1. Many marketing mix models are built with advertising expenditure variables not adver-
tising exposure variables. This confounds the problem of procurement of advertising
with the measurement of exposure. Sales response models must have only exposure vari-
ables on the right-hand side.
2. Bass (1969) constructed such a model of the simultaneous determination of sales and
advertising using cigarette data.
3. The proper way to view propensity score analysis is as a particular example of adding
control variables where the control variable is the propensity score.
4. Randomized assignment to treatment typically means randomized treatment in market-
ing applications. That is to say, there is always full compliance – if you are assigned to a
treatment you take it and if you are not assigned to a treatment you do not take it. An
exception might be leakage in Geo experiments – if subjects work in different areas than
they reside, some who are assigned to non-exposure may become exposed. In biostatistics
and economics, there can be an important distinction between assignment and receiving
the treatment which, fortunately, we can largely ignore in marketing applications.
5. Note that the selection bias discussed above can always be expressed as a correlation
between a treatment variable and the error term.
6. See, for example, Rossi (2014) for a more detailed discussion of this point.
7. See Imbens and Rubin, Chapter 13, for more details on propensity scores.
8. See Dube, Hitsch, and Rossi (2011) and Rossi and Allenby (2011) for examples and
further discussion.
9. It does not matter how sophisticated we are in selecting estimation and validation
subsets, any cross-validation style procedure will be subject to the same vulnerabilities
laid out here.

References

Angrist, J. D. (1990), “Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from
Social Security Administrative Records,” American Economic Review 80, 313–335.
Angrist, J. D. and A. B. Krueger (1991), "Does Compulsory School Attendance Affect
Schooling and Earnings?" Quarterly Journal of Economics 106, 976–1014.
Angrist, J. D. and J. Pischke (2009), Mostly Harmless Econometrics, Princeton, NJ:
Princeton University Press.
Bass, F. M. (1969), “A Simultaneous Equation Regression Study of Advertising and Sales of
Cigarettes,” Journal of Marketing Research 6, 291–300.
Blake, T., C. Nosko and S. Tadelis (2015), “Consumer Heterogeneity and Paid Search
Effectiveness: A Large Scale Field Experiment,” Econometrica 83, 155–174.
Dube, J. P., G. Hitsch, and P. E. Rossi (2011), “State Dependence and Alternative
Explanations for Consumer Inertia,” Rand Journal of Economics 41, 417–445.
Hartmann, W. and D. Klapper (2017), “Super Bowl Ads,” Marketing Science, forthcoming.
Hartmann, W., H. Nair, and S. Narayanan (2011), “Identifying Causal Marketing Mix
Effects Using a Regression Discontinuity Design,” Marketing Science 30, 1079–1097.
Heckman, J. J. and E. J. Vytlacil (2007), "Econometric Evaluation of Social Programs,
Part I: Causal Models, Structural Models and Econometric Policy Evaluation,” in J. J.
Heckman and E. E. Leamer, eds, Handbook of Econometrics, Amsterdam: Elsevier, 2007,
4779–4874.
Imbens, G. W. and T. Lemieux (2008), “Regression Discontinuity Designs: A Guide to
Practice,” Journal of Econometrics 142, 807–828.
Imbens, G. and D. Rubin (2015), Causal Inference for Statistics, Social and Biomedical
Sciences: An Introduction, New York: Cambridge University Press.
Lewis, R. and J. Rao (2015), “The Unfavorable Economics of Measuring the Returns to
Advertising,” Quarterly Journal of Economics, 130(4), 1941–1973.
Neyman, J. (1923, 1990), “On the Application of Probability Theory to Agricultural
Experiments. Essay on Principles: Section 9,” translated in Statistical Science 5, 465–480.
Rossi, P. (2014), “Even the Rich Can Make Themselves Poor: A Critical Examination of IV
Methods,” Marketing Science 33, 655–672. 
Rossi, P. and G. Allenby (2011), “Bayesian Applications in Marketing,” in Geweke et al. eds,
The Oxford Handbook of Bayesian Econometrics, Oxford: Oxford University Press.
Stephens-Davidowitz, S., H. Varian, and M. D. Smith (2015), “Super Returns to Super Bowl
Ads?” working paper, Google Inc.



PART III

DISCRETE CHOICE MODELING

7.  Modeling choice processes in marketing
John Roberts and Denzil G. Fiebig

Choice Modeling in a Management Decision Making Context
There are many definitions of marketing. One to which we are drawn is
that marketing is the management of the customer-facing activities of
the organization. George Day (1994) suggests that to successfully under-
take this task marketers need two sets of skills: the ability to understand
customer needs better than their competitors (what he terms “market
sensing”) and that of harnessing the resources of the firm to better meet
those identified needs (“market linking” or “market relating”).
It follows that marketers need tools that help them understand what
consumers value and the decisions that they will make based on those
values, as well as how consumers will react to different stimuli as a result
of both the internal actions of the firm and the external changes in the
environment. Marketers need both prognostic tools to forecast how the
market will react given a certain set of conditions, and diagnostic ones that
allow their organization to design its products and services to influence
those consumer reactions in a direction that meets the objectives of the
firm and creates value for the consumer.
Central to both of these tasks, prognostics and diagnostics, is the
concept of choice. How many consumers will choose the organization’s
offering, and how does that depend on the firm’s actions and environ-
mental turbulence? Therefore, it is unsurprising that the subject of choice
has a long pedigree in the field of marketing, borrowing strongly from the
fields of economics and econometrics on the one hand, and psychology on
the other. This research has been conducted at both the aggregate market
level and at the level of the individual. The two are related. Market-level
analysis must have some understanding of heterogeneity if it is to avoid
the fallacy of averages, while individual-level analysis must have some
method of aggregation across individuals if it is to inform the organization
about the overall effects of its marketing activity.
This chapter has as its focus individual-level choice processes, but it also
discusses how these might be used at the market level. We concentrate
on the consumer purchase decision, but these models may be used for
other decision-making processes such as consumption, managerial option
evaluation, and other types of decision making. We call these choices
“discrete” because we examine indivisible products where a consumer
must decide whether or not to buy a product (or more generally take
an action), and cannot choose a continuous amount to reflect his or her
utility (although an integral number of units may be selected). For a useful
summary of the economic perspective and theoretical basis of choice
models, see Ben-Akiva and Lerman (1985, Chapter 3).

The origins of choice models

Early work in economics suggested that the utility, Uij, that consumer i
could expect to derive from product j was a function of the attributes k
(k= {1, 2, . . ., K}) that the consumer perceived the product to contain,
yijk, multiplied by how important those attributes were to the consumer, bik
(e.g., Lancaster 1966). Assuming separability of attributes and linearity in
attribute levels, this is frequently expressed as:

    $V_{ij} = \sum_{k=1}^{K} b_{ik} y_{ijk}$                            (7.1)

When evaluating products, we may represent consumers as having a
ranked set of utilities for the alternative products from which they might
choose, and developing an intention to select the product that they prefer
most. However, a number of other factors may intervene before a pur-
chase can be made (e.g., Sheth 2011). One factor is the attitude of others
(e.g., Fishbein 1967). Consumers’ purchase intentions are also influenced
by changes in family incomes, environmental factors at the point of pur-
chase (such as availability, competitive activity, etc.) and a variety of other
variables. Finally, measurement error may arise when we try to estimate
both utility and intended behavior. Thus, estimated preferences and inten-
tions are not completely reliable predictors of actual buying behavior;
while they indicate likely future behavior, they fail to include a number
of additional factors that may intervene. For a review as to when these
factors are likely to be significant, see Morwitz, Steckel and Gupta (2007).
To address the noise or error that is introduced between utility
measurement and a consumer’s later actual behavior it is common to
decompose the utility that consumer i expects to obtain from product j,
Uij, into a deterministic component that represents the part of utility that
an observer can estimate at the time of purchase, Vij, and an error term eij,
that is a component of a consumer’s true utility that is not observed. The
resulting representation given by equation (7.2) is called a random utility
model.


    $U_{ij} = V_{ij} + e_{ij}$                                          (7.2)

Given that consumer i is assumed to always buy the product with the
highest utility, Uij, but we cannot fully observe this, we need a repre-
sentation of how the unobserved utilities of product j (for j = {1, 2, . . .,
J}) relate to the actual choice and ultimately to his or her associated
probabilities of choice, Pij. Early attempts to undertake this task adopted
a share of utility model, in which a product’s probability of being chosen
equaled its utility divided by the sum of the utilities of all of the products
that might have been chosen (e.g., Luce 1959). While simple, this approach
has a number of drawbacks. First, the predicted probability of a product
being selected is not invariant to the scale used. That is, if a constant is
added to the utility of each product, the predicted probability of each
product being chosen will change. Second, Luce’s axiom, the foundation
on which this formula is predicated, requires that the ratio of the
probabilities of choosing any two different products does not depend on the presence or absence
of other possible products in the available set (Bradley and Terry 1952).
This assumption, known as the independence of irrelevant alternatives (or
IIA), can be problematic in some applications. For example, assume that a
commuter has the option of driving a car or catching a blue bus, and does
so with equal probability of 0.5. If a new bus is added to the commuting
route, identical in all respects to the blue bus (schedule, comfort, price,
etc.), except that it is red, one might assume that the red bus would draw
(almost) exclusively from the blue bus for which it is a perfect substitute
and negligibly from the car, giving probabilities of PCar, PBlue, and PRed of
0.5, 0.25, and 0.25 respectively. However, a share of utility model would
suggest that the red bus would draw proportionately from the blue bus
and the car and thus lead to probabilities of PCar, PBlue, and
PRed of 0.33, 0.33, and 0.33 respectively.
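The arithmetic of the example is easy to reproduce. The sketch below
implements the share-of-utility rule with the (hypothetical) equal utilities
of the commuting example.

```python
import numpy as np

def luce_shares(utilities):
    """Share-of-utility (Luce) rule: utility divided by the sum of utilities."""
    u = np.asarray(utilities, dtype=float)
    return u / u.sum()

print(luce_shares([1.0, 1.0]))        # car, blue bus: [0.5 0.5]
print(luce_shares([1.0, 1.0, 1.0]))   # add an identical red bus: all one-third
# IIA forces the red bus to draw proportionately from car and blue bus alike,
# although intuitively it should cannibalize (almost) only the blue bus.
```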
In order to adopt a more axiomatic approach to the relationship
between probability of choice and the underlying utilities on which it is
based, econometricians consider possible distributions of the error term in
equation (7.2), eij, and use these to derive the implied probability that the
utility, Uij, would be greater than the utilities of all of the other available
products, {Uij’, j’=1, 2, . . .., J and ≠ j}. This approach, the results of which
are described in the next section, led to the basic choice models that are in
common usage today.
In his Figure 1 (reproduced here as Figure 7.1), McFadden (1986)
describes the relationship between physical attributes and consumer
perceptions of them, past choice (behavior), future choice intentions, and
intermediate constructs such as preferences.


[Figure: a path diagram. External factors (historical experience and
socioeconomic effects), market information and product attributes feed
generalized attitudes (values) and perceptions (beliefs), which drive
preferences and behavioral intentions inside the "black box" of the
consumer; attitude inventories, judgement data, stated protocols and stated
preferences are the corresponding measurements, and stated intentions
translate into market behavior subject to experimental and market
constraints (budget, availability).]

Figure 7.1  Path diagram for the customer decision process

The workhorses of discrete choice modeling

The Logit Model of Choice

The first model to axiomatically derive the probability of purchase, based
on an assumption about the distribution of eij, was the multinomial logit
model, developed by McFadden (1974). It is still the predominant model
used in practical marketing applications today, although it frequently
needs modification, and often we must move beyond it. McFadden
assumed that eij follows an extreme value type 1 (EV1) or Weibull
distribution.

One attractive feature of the EV1 assumption is that by assuming that all
of the error terms for consumer i are independent and identically distrib-
uted across alternatives, it is possible to derive a closed-form solution for
the probability that any product j is chosen, as illustrated in equation (7.3).

    $P_{ij} = e^{V_{ij}} \Big/ \sum_{j' \in C} e^{V_{ij'}}$             (7.3)

where C is the set of alternatives evaluated by the consumer.


Equation (7.3) is known as the multinomial logit model, and an
examination of its functional form indicates that it suffers from the
independence of irrelevant alternatives (IIA) (stemming from its iid
error assumption). One attractive feature of the multinomial logit model
is that the utility-to-choice probability transformation specified in equa-
tion (7.3) allows us to substitute the determinants of utility and examine
their predicted effect on choice. For example, the effect of the perceived
product attributes described by equation (7.1) on choice probabilities can
be directly modeled.
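As a minimal sketch of this substitution, the following code maps
hypothetical attribute levels and importance weights through equations (7.1)
and (7.3) to choice probabilities.

```python
import numpy as np

beta = np.array([1.2, -0.8])                 # importance weights (quality, price)
attributes = np.array([[3.0, 2.0],           # product 1
                       [2.5, 1.0],           # product 2
                       [1.0, 0.5]])          # product 3

v = attributes @ beta                        # deterministic utilities, eq. (7.1)
p = np.exp(v - v.max())                      # subtract max for numerical safety
p /= p.sum()                                 # multinomial logit, eq. (7.3)

print(np.round(p, 3))                        # choice probabilities summing to 1
```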

Nested Logit Model

As illustrated by the red bus–blue bus example, the IIA assumption
may not always be a good one. If some products are closer substitutes
than others, one might expect their proportional draw to be greater.
Fortunately, there is a test to see whether this property has been violated
(Hausman and McFadden 1984). If the IIA assumption is not tenable,
researchers have taken two main approaches to addressing the result-
ant problem. For a detailed discussion of such approaches see Louviere,
Hensher and Swait (2000, Chapter 6).
The first remedy to IIA violations is to consider sub-choices by the
individual: first model the choice among the elements of a sub-set of
alternatives that are likely to be similar, and then model the choice
between the different sub-sets. The second approach is to assume a more
flexible error structure,
in which the errors in equation (7.2) are not assumed to be independently
and identically distributed. Similar choices are likely to have associated
unobservable features that are correlated and that can be accommodated
by allowing for a full (or at least less constrained) covariance matrix of the
error terms.
To illustrate the first approach, consider a consumer choosing a brand
of breakfast cereal. It may not be realistic to believe that pre-sweetened
cereals such as Cheerios or Nutrigrain would draw as much share from a
health-focused cereal such as All Bran as would another health-focused

[Figure: a two-level choice tree. The first level of choice is whether to
buy in the category (do not buy vs. buy); the second level, conditional on a
category purchase, is brand choice among Brand 1, Brand 2, . . ., Brand J.]

Figure 7.2  Example of a nested choice model: category purchase and brand
choice

cereal such as granola. However, we can structure our representation of


the decision process so that the consumer first chooses between healthy
cereals or pre-sweetened ones and then, conditional on that choice,
chooses a product within the healthy or pre-sweetened class. That way, the
decision at each level may be amenable to representation by a logit model,
giving rise to what we call the nested logit model.
As well as providing a means to overcome the IIA problem while still
maintaining the simplicity of the logit model, the nested logit model is
an excellent way to represent a number of different choice processes. For
example, such nesting can be used to describe the decision to buy in the
category or not, followed by the decision as to which brand to buy, given a cat-
egory purchase (e.g., Roberts and Lilien 1993), as illustrated in Figure 7.2.
In the category purchase/brand choice example in Figure 7.2, consumer
i’s probability of choosing j, Pij, may be written as

    $P_{ij} = P_{iB} \, P_{ij|B}$                                       (7.4)

where PiB is the probability of consumer i buying in the category, while
Pij|B is the probability of him/her selecting brand j, given a category pur-
chase. One attractive feature of the nested logit model lies in the modeling
of the inter-relationship between the two decision levels. The utility of
individual brands should affect the utility of the category as a whole. If
a new car is launched that is highly appealing to consumer i, that should
increase the expected utility of his/her buying a car, which in turn should
increase the probability of a category purchase. In the nested logit model,
there is a term in the upper-level choice utility, UiB, known
as the inclusive value, IViB, which specifies how individual brands’ utili-
ties affect the utility of a category purchase as a whole (e.g., Louviere,
Hensher and Swait 2000). IViB may be shown to be equal to the expression
in equation (7.5).

    $IV_{iB} = \ln \Big( \sum_{j' \in C} e^{V_{ij'}} \Big)$             (7.5)
In marketing, where strategies may be targeted at either increasing
primary demand (category demand) or secondary demand (brand choice,
given category purchase), this distinction is a particularly useful one. As an
example of this in practice in the US ground coffee market, see Guadagni
and Little (1983) for a model of brand choice (conditioned on purchase) and
Guadagni and Little (1998) for the corresponding category purchase model.
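A small numerical sketch of equations (7.3)–(7.5) follows; the brand
utilities, the category constant, and the inclusive-value coefficient are
all hypothetical (a coefficient between 0 and 1 is the conventional
restriction).

```python
import numpy as np

v_brand = np.array([0.4, 0.1, -0.3])          # brand utilities given purchase

p_brand = np.exp(v_brand) / np.exp(v_brand).sum()    # P(j | buy), eq. (7.3)

iv = np.log(np.exp(v_brand).sum())            # inclusive value, eq. (7.5)
v_buy = -0.5 + 0.6 * iv                       # category utility rises with IV
p_buy = 1.0 / (1.0 + np.exp(-v_buy))          # binary logit: buy vs. do not buy

print("P(buy) =", round(p_buy, 3))
print("P(j) =", np.round(p_buy * p_brand, 3)) # eq. (7.4)
```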
The nested logit model is also extremely useful for understanding the
structure of competition implied by consumer switching. See Urban,
Johnston and Hauser (1984) for an example in the freeze-dried coffee
market, using the nested logit model to determine the best representation
of category structure.

The Probit Model of Choice

As an alternative to nesting as a means of ameliorating the effects of
unequal draws between products (the IIA assumption), it is possible to
model the nature of these interactions directly. If we relax the assumption
that the error terms, eij, are independent of each other, then the constraint
of one product drawing share proportionately from all others can cor-
respondingly be relaxed. The most common assumption with correlated
error structures is that the error terms follow a multivariate normal
distribution. Such a representation is known as the probit model (e.g.,
Wooldridge 2010). The cost of this more generalized formulation is that it
no longer leads to a closed-form solution for the probabilities and hence
complicates estimation.
To illustrate the probit model, it is useful to examine one specific
example of it, the binary probit, when the consumer considers just two
alternatives. From equation (7.2), the probability that consumer i chooses
product 1 over product 2, Pi1, may be written as equation (7.6)

    $P_{i1} = \Pr(U_{i1} > U_{i2}) = \Pr(e_{i2} - e_{i1} < V_{i1} - V_{i2}) = \Phi(V_{i1} - V_{i2})$   (7.6)

where $\Phi$ is the cumulative distribution function of the standard normal
distribution, N(0, 1), with the normalization to unity of the variance of the
differenced errors necessary for identification.
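Equation (7.6) is straightforward to evaluate; the sketch below computes the
closed form with hypothetical utilities and verifies it by simulating the
latent utilities directly.

```python
import numpy as np
from scipy.stats import norm

v1, v2 = 1.2, 0.7
p1 = norm.cdf(v1 - v2)                       # Phi(V_i1 - V_i2)
print(round(p1, 3))                          # ~ 0.69

# Monte Carlo check: simulate the differenced error directly.
rng = np.random.default_rng(7)
e = rng.normal(0, 1, 1_000_000)              # e_i2 - e_i1 ~ N(0, 1)
print(round(np.mean(v1 - v2 > e), 3))        # agrees with the closed form
```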


Unfortunately, the probit probability has no closed form solution and
as the number of alternatives in the choice set increases its parameters
become increasingly difficult to estimate. Recent advances in numerical
methods have reduced the barriers that this imposes, but the multinomial
probit is still applied in a minority of real world marketing applications.

Decomposing utility
While it is useful to understand an individual’s choice response as a func-
tion of utility, it is far more diagnostic to the manager to decompose that
utility into more actionable measures, such as the product’s attributes
or its price. Substituting equation (7.1) into the probability of choice
(equation (7.3) or equation (7.6), for example) provides a mechanism
by which product attributes or consumers’ perceptions of them may be
related to choice. Price can be treated as an attribute for the purpose of
studying price elasticities. Frequently, more sophisticated response curves
are required than those represented by such a simple substitution. For
example, behavioral economics has suggested that price response may
not be symmetric around some reference price around which a consumer
anchors his or her judgment. In this regard, Lattin and Bucklin (1989)
demonstrate that explanatory power is increased by allowing price elastici-
ties for price increases to be greater than those for price decreases.
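For the simple case in which price enters utility linearly with coefficient
b_price, the own-price elasticity of a logit choice probability is
b_price · price · (1 − P_j); a short sketch with hypothetical numbers:

```python
import numpy as np

b_price = -0.8
prices = np.array([2.0, 2.5, 3.0])
v = 1.5 + b_price * prices                   # utility with price as an attribute
p = np.exp(v) / np.exp(v).sum()              # logit choice probabilities

elasticity = b_price * prices * (1 - p)      # own-price elasticities
print(np.round(elasticity, 2))               # more negative for pricier items
```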
Product attributes may be incorporated either as objectively measured
features (such as brand name, size, or claimed fuel economy) or subjec-
tively measured perceptions. Perceptions may be elicited using surveys of
consumers and relating those perceptions to past reported behavior or to
future behavioral intentions. For example, Danaher et al. (2011) relate the
intended probability of choosing an airline to its perceived performance,
reputation and price which they in turn relate to perceptions of 29 sub-
attributes, allowing management to focus on those perceptions with high
importance weights and performance deficits that can be cost-effectively
addressed. Guadagni and Little (1983) include the objectively measured
variables of brand and pack size. The role of objectively measured
attributes in driving choice can not only be calibrated by this type of study
of consumers' past choices, determining how choices vary as a function
of their constituent attributes; it can also be gauged by seeking a con-
sumer’s intent toward hypothetical products, using choice-based conjoint
analysis (see, for example, Rao 2014), as described below.
Other management decision variables may be incorporated into discrete
choice models, though often in a way that is somewhat arbitrary and a
matter of convenience. Inserting such explanatory variables may often


provide a reasonable representation of the effect of marketing mix vari-
ables, but it may also undermine the elegance of the assumptions that led
to the choice model in the first place. For example, Erdem and Keane
(1996) allow advertising to shift consumer perceptions, which then affect
choice through a variant of equation (7.1).

Advanced models of choice


Generalized Logit Model

The multinomial logit in equation (7.3) and probit in equation (7.6) have
been generalized to cover a number of behavioral situations and to accom-
modate panel data where repeated choice occasions are available for each
individual. Fiebig et al. (2010) provide a comprehensive review of the situ-
ations in which extensions to the logit model may be useful. By combining
equations (7.1) and (7.2), they show how the vector of importance weights,
β = {β_k}, k = 1, . . ., K, and the properties of the error term, {e_ij}, can be generalized
to allow a relaxation of the IIA assumption to generate the Generalized
Multinomial Logit model. We write:

    $\beta_i = \sigma_i \beta + \gamma \eta_i + (1 - \gamma) \sigma_i \eta_i$    (7.7)

where σ_i is a scale parameter for the error term and η_i is a measure of
individual-level heterogeneity. The generalized logit shares with the
probit model the characteristic of being difficult to estimate. However,
by restricting the sources of individual-level heterogeneity, it is possible
to alleviate most problems with IIA, while still maintaining tractability.
In Fiebig et al.'s Figure 1, reproduced here as Figure 7.3, by restricting
the parameter γ to 0 or 1, one can derive the more tractable Generalized
Multinomial Logit Model Type I or II; while, by not allowing variance in
η_i, or limiting variance in σ_i across individuals, one can derive the Scale
Multinomial Logit model and Mixed (Heterogeneous) Multinomial Logit
model, respectively. Fiebig et al. (2010) suggest that while the Mixed
(Heterogeneous) Multinomial Logit model has enjoyed considerable
success in marketing, on the 10 data sets that they examined, it was out-
performed by the Generalized Multinomial Logit model on seven, and the
Scale Multinomial Logit model on three.

[Figure 7.3: the G-MNL model, β_i = σ_i β + γ η_i + (1 − γ) σ_i η_i, and its
special cases. Setting γ = 1 gives G-MNL-I (β_i = σ_i β + η_i); setting γ = 0
gives G-MNL-II (β_i = σ_i (β + η_i)). Imposing var(η_i) = 0 then yields
S-MNL (β_i = σ_i β), while imposing σ_i = σ = 1 yields MIXL (β_i = β + η_i).]

Source:  Reproduced from Fiebig et al. (2010) with permission.

Figure 7.3  The G-MNL model and its special cases

Tobit Model

In many situations observations are not available on all levels of the inde-
pendent variables that form the predictors of utility in equation (7.1). For
example, a supermarket may have a policy of not pricing milk under $1 a
pint. Purchases of milk at prices below $1 are never observed and thus con-
sumer responses are censored above the resultant utility stemming from
that price. To ignore this censoring will result in biased estimators and so
James Tobin (1958) developed the Tobit model to account for the missing
data. Chandrashekaran and Sinha (1995) provide a nice example in mar-
keting when studying trial-and-repeat. Repeat purchase is predicated on
initial trial and so repeat is not observed for all consumers, in particular
not for those for whom trial never occurs.
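A minimal Tobit sketch follows: it simulates left-censored data and
maximizes the standard Tobit likelihood (normal density for uncensored
observations, normal CDF mass at the censoring point for censored ones).
The data-generating values are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(8)
n = 5_000

# Hypothetical latent response, never observed below the floor c = 1.
x = rng.normal(0, 1, n)
latent = 1.5 + 0.5 * x + rng.normal(0, 1, n)
c = 1.0
y = np.maximum(latent, c)                    # observed, censored at the floor
cens = y <= c                                # in practice: flagged as y == c

def neg_loglik(theta):
    """Tobit log-likelihood: normal density for uncensored observations,
    normal CDF mass at the censoring point for censored ones."""
    a, b, log_s = theta
    s = np.exp(log_s)
    mu = a + b * x
    ll_unc = norm.logpdf(y[~cens], mu[~cens], s)
    ll_cen = norm.logcdf((c - mu[cens]) / s)
    return -(ll_unc.sum() + ll_cen.sum())

fit = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(np.round(fit.x[:2], 2))                # ~ [1.5, 0.5]; OLS on y is biased
```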


Calibrating choice models

Choice models have enjoyed considerable popularity in marketing in two
particular areas. One is in the study of actual consumer behavior, while
the other involves observing consumers’ intentions toward hypothetical or
real products in a given future scenario. We briefly examine both of these
approaches below.

Models Using Scanner Data

The advent of store scanners in the 1980s in developed countries led to
the availability of large amounts of data at the individual level that choice
models are well-equipped to harness. Not only did a large quantity of
binary choice data become available regarding consumers’ shopping of
specific stock keeping units (SKUs), but associated marketing activity in
terms of price, promotions and in-store advertising was also recorded,
allowing its effect on purchase behavior to be gauged. To insert variables
such as advertising as explanatory variables into equation (7.1) to deter-
mine their effect on preference and choice makes an implicit assumption
that advertising will be translated (usually linearly) into beliefs that, in
turn, will influence preference. There is some small irony that, while the
behavioral underpinnings of the transformation from utility to choice
have undergone considerable research and debate, little process justifica-
tion is given for slapping marketing mix variables into the utility function.
Because behavioral field data tend to be automatically captured, the
major dependent variable of these choice models tends to be actual
purchase or consumption, and the independent variables tend to be envi-
ronmental ones (such as competitive actions) and management control
ones (such as price). While rich in these dimensions, such data often
have limited information on the consumer characteristics that are highly
influential in choice (for example, the member of the household for whom
the purchase is being made).

Models Using Choice Experiments

Conjoint analysis is an approach to understanding consumer evalua-
tion designed to infer consumers’ implied trade-offs between different
attribute levels in terms of utility or preferences. The development of
choice models allows conjoint analysis to progress from understanding
the drivers of consumer preference to explaining their intentions and likely
future behavior. Louviere and Woodworth (1983) took the techniques
of conjoint analysis combined with experimental design to show how
choice-based conjoint analysis could be used to estimate the partworths of
different attributes. For an excellent review of recent advances of conjoint
analysis see Rao (2014). Carson and Louviere (2011) suggest that many
of the terms used in conjoint analysis and experimental choice modeling
and measurement may be subject to ambiguity. We commend that paper
to those interested in understanding where those sources of ambiguity are
likely to arise, but we try to adopt most frequently used meanings. The
design of choice experiments has evolved considerably over the past 10
years. Commercial software such as Sawtooth has made choice experi-
ments extremely accessible to a wide range of analysts. Orme (2013) sug-
gests that ratings-based measures have given way to tasks in which the
respondent picks the best and the worst from a choice set (MaxDiff and
Best–Worst) (see Louviere, Flynn and Marley 2015 for the development
of these approaches). Presenting adaptive choice sets has reduced the
respondent burden as software determines the next choice set that will
yield the most amount of information for each respondent, given his or her
answers to previous choice tasks. See Louviere, Hensher and Swait (2000,
Chapter 10) for a series of marketing applications using choice models in
an experimental setting.

Estimation

As well as programs to facilitate experimental design, discrete choice
estimation programs are also readily available. Hensher, Rose and Greene
(2005, Chapters 10, 14 and 16), for example, provide the ideas and tech-
niques behind maximum likelihood estimation of multinomial logit and
nested logit and simulated maximum likelihood for mixed logit models,
respectively. Bayesian methods have also been proposed but compari-
sons with simulated maximum likelihood suggest little difference in the
estimates (Elshiewy et al., 2017).
In specifying choice models, the analyst must obviously take care to
ensure that the assumptions underlying the model do indeed hold.
One major threat to the validity of estimation results is that of endogene-
ity. Endogeneity arises when variables used as predictors are themselves
endogenous (internal) to the system being estimated. For example, if
individual choice is thought to be a function of price, but the price is set (at
least partially) to clear demand, then biased and inconsistent estimators
may arise because the explanatory variables may be correlated with the
error terms of the latent utilities. Villas-Boas and Winer (1999) show that
substantial estimation errors can arise as a result of failing to account for
endogeneity. The remedy for this problem is usually to find a surrogate or
“instrument” for the independent variable that is unlikely to be correlated
with the error structure. For example, lagged price may provide a good
instrument for price. Wooldridge (2010, Chapter 6.3) provides an excellent
description of the Durbin-Wu-Hausman test to probe the degree of threat
posed by endogeneity.
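As an illustration of this logic, the sketch below applies a control-function variant of the test to simulated data: price is first regressed on the instrument (lagged price), and the first-stage residual is then included in a binary logit, where a significant residual coefficient signals endogeneity. All variable names and parameter values here are invented for the example.

```python
# A minimal control-function sketch of an endogeneity check on simulated
# data, in the spirit of the Durbin-Wu-Hausman logic discussed above.
# All names and parameter values are illustrative assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
T = 500
lag_price = 2.0 + rng.normal(size=T)            # instrument: last period's price
shock = rng.normal(size=T)                      # demand shock observed by the firm
price = 1.0 + 0.8 * lag_price + 0.5 * shock + rng.normal(scale=0.3, size=T)
utility = 1.0 - 1.5 * price + 2.0 * shock + rng.logistic(size=T)
y = (utility > 0).astype(int)                   # observed binary choice

# Stage 1: project the suspect regressor on the instrument; keep the residual.
stage1 = sm.OLS(price, sm.add_constant(lag_price)).fit()

# Stage 2: logit with the first-stage residual added as a control. A
# significant residual coefficient indicates that price is endogenous.
X = sm.add_constant(np.column_stack([price, stage1.resid]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params, fit.pvalues)
```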

Applying choice models to represent marketplace phenomena

The basic discrete choice models described in the previous sections have
been applied to address a number of additional behavioral phenomena
beyond the effect of perceived attributes on an individual’s choice process,
as well as to inform a variety of management marketing decisions. In this
section, we will look at a number that have attracted considerable atten-
tion. In terms of behavioral aspects, we look at multistage choice models,
models accounting for heterogeneity, and dynamic models. In terms of
informing managerial decisions we consider product design, marketing
mix response and strategic decisions. These are summarized in Table 7.1.

Multi-stage Choice Models and Consideration

Many scholars have spoken of the advantage of regarding choice not as
a single decision, but as a process with a number of stages (e.g., Wright
and Barbour 1977). Equation (7.4) demonstrated one such example: that
of understanding need arousal (that is, category purchase or primary
demand), followed by brand choice (secondary demand). Roberts and
Lilien (1993, Table 2.1) provide a categorization of the marketing models
typically used at different stages of the consumer choice process: need
arousal, information search, evaluation, purchase and post-purchase.
One stage of the decision process that has attracted much attention is
that of consideration. In many categories there are many possible brands
from which the consumer can choose and it may not be feasible for him
or her to evaluate all of the brands of which s/he is aware. Nor may it pass
a cost–benefit test for him/her to do so, if evaluation and search has an
associated physical or psychological cost (see Shugan 1980).
Gensch (1987) demonstrated not only that model fit could be improved
by the inclusion of a second stage in the choice process, but also that
different managerial drivers may be present at each, providing diagnostic
information as to where marketers should focus their attention at different
stages of the sales process. Using the inclusive value of the consideration
set (equation 7.5), Roberts (1983) was able to derive the level of threshold
utility necessary for a product to be able to justify the (psychological and

MIZIK_9781784716745_t.indd 167 14/02/2018 16:38


Table 7.1  Applications of choice modeling

MIZIK_9781784716745_t.indd 168
Type of Application Examples of application
7.1 Leveraging Consideration Consideration Consideration Choice Affect in choice Non-
the Consumer (Self-explicated) (Scanner data) (Psychology)   archetypes Roberts et al.  compensatory
Decision Roberts and Lattin Siddarth, Hutchinson, Swait Popal and  (2015) two stage
Process  (1991) Utility  Bucklin&  Raman and  Wang (2016) Emotions choice models
Thresholds Morrison (1995) Mantrala Information in choice Gilbride and
Hauser and Heterogeneity (1994) context models   Allenby (2004)
 Wernerfelt Andrews and Retrieval/ of choice
(1990) Cost  Srinivasan salience processes
benefit (1995)

168
Dynamics
7.2 Heterogeneity Latent segments Discrete Heterogeneity on Primary vs Discrete Use of probit
and  in choice models  Continuous  Probit models  secondary  segment  choice for
Segmentation Kamakura and segments Allenby and demand targeting segmentation
Russell (1989) Andrews Ainslie  Rossi (1998) Arora Allenby Kamakura, Chintagunta and
 and Currim  and Ginter  Kim and Lee   Honore (1996)
(2002) (1998) (1996)
7.3 Dynamics Variety seeking Variety seeking Loyalty and Consumer Consumer Trial–repeat
and Market Lattin and  and inertia   heterogeneity   learning   learning   models
Evolution  McAlister (1985) jointly Ailawadi, Roberts and Erdem and Chandrashekaran
Seetharaman and  Gedenk and  Urban (1998)  Keane (1996)  and Sinha
 Chintagunta Neslin (1999) Survey based Scanner (1995) Split
(1998) based hazard model

14/02/2018 16:38
MIZIK_9781784716745_t.indd 169
7.4a Product Perceptions Rating and choice Adaptive choice- Quality Menu planning
design and  versus objective  based conjoint   based conjoint  Reference Liechty,
consumer measures analysis Toubia, Hauser effects  Ramaswamy,
response Adamowicz et al. Moore (2004)  & Simester Hardie, Johnston and Cohen
  (1997) (2004)  and Fader (2001)
(1993)
7.4b Marketing Brand choice Category purchase Primary and Reference Generalizations
mix response  models of choice  models of  secondary  points price &  of Reference
modeling Guadagni and choice demand promotion points

169
  Little (1983) Guadagni and Gupta (1988) Lattin and Kalyanaram
  Little (1995)  Bucklin (1989)  and Winer
(1995)
7.5 Competitive Market structure Acquisition and Portfolio models Defence Growth and
analysis and  Urban, Johnson   retention   of choice   prelaunch  defence
strategy and Hauser Rust, Lemon & Ben Akiva et al. Roberts, Nelson agenda
(1984)  Zeithaml (2004)   (2002)  and Morrison Hauser, Tellis
(2005)  and Griffin
(2006)

14/02/2018 16:38
170   Handbook of marketing analytics

physical) cost of its consideration. See also Hauser and Wernerfelt (1990)
and Roberts and Lattin (1991) for the development and testing of similar
models.
For example, Roberts and Lattin (1991, equation 6) demonstrate that
a utility-maximizing consumer should include product j in his or her
consideration set if its utility, $u_j$, passes the following threshold:

$$u_j > \ln \left[ \left( \sum_{j' \in C} e^{u_{j'}} \right) \left( e^{c_j} - 1 \right) \right] \qquad (7.8)$$

where $u_{j'}$ is the utility of product $j'$ in the consumer’s consideration set, $C$,
and $c_j$ is the search and processing cost associated with considering $j$.
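As a numeric illustration of equation (7.8), the following sketch computes the threshold for a hypothetical candidate brand; the utilities and search cost are invented values.

```python
# Numeric sketch of the consideration threshold in equation (7.8).
# Utilities of brands already in C and the search cost are invented values.
import numpy as np

u_in_set = np.array([1.2, 0.4, -0.3])     # u_j' for brands already considered
c_j = 0.15                                # search/processing cost of adding j

threshold = np.log(np.exp(u_in_set).sum() * (np.exp(c_j) - 1.0))
u_j = 0.9                                 # utility of candidate brand j
print(f"threshold = {threshold:.3f}; include j: {u_j > threshold}")
```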
While Roberts and Lattin adopt a cost–benefit approach to ascertain
whether a particular product justifies entering into (or staying in) a
consumer’s consideration set, Ben Akiva and Boccara (1995) model
the composition of the set as a whole, not just the value of incremental
changes to it.
The survey approach to consideration sets adopted by Roberts and
Lattin lends itself well to the elicitation of self-stated consideration sets
although this may require care at the estimation stage because of issues
related to a form of self-selection (see Carson and Louviere 2014). Where
only behavioral data are used in analysis, it may be preferable to treat con-
sideration as a latent construct and to infer its membership by estimation.
Siddarth, Bucklin and Morrison (1995) and Andrews and Srinivasan
(1995) show that such sets could be inferred from scanner data and the
resultant two-stage representation of consumer behavior improved model
fit. In a review of consideration models, Roberts and Lattin (1997) discuss
the relative merits of elicited and inferred consideration sets. While previ-
ous models of consideration focused on the cost–benefit of consideration,
Hutchinson, Raman and Mantrala (1994) modeled the probability of
retrieval from memory, incorporating the role of salience in evaluation.
In other extensions to the consumer decision process in choice, Swait,
Popa and Wang (2016) compared limited and full information processing
representations of the consumer, while Roberts et al. (2015) incorpo-
rated emotions to complement cognitive evaluation choice processes.
Gilbride and Allenby (2004) also extended the flexibility of choice models
when they moved from compensatory choice processes to conjunctive,
­disjunctive and lexicographic screening.

Accounting for Heterogeneity

One of the advantages of models of individual choice is that, at least in
principle, they allow us to study differences between consumer evaluation
and choice processes. In practice, we may not have enough degrees of
freedom to estimate models of choice at the individual level, nor may it be
managerially useful to do so, and thus the challenge becomes to segment
the population into groups of similar consumers within each segment, but
with meaningful differences between segments. Early attempts to do this
on the basis of observable characteristics had at best mixed usefulness, and
so researchers decided to look for a discrete number of latent segments,
based on consumers’ behavior within the group (e.g., Kamakura and
Russell 1989). By and large a discrete representation of consumer differ-
ences has given way to a process that describes tastes and attribute impor-
tance as a continuous distribution across the population (e.g., Andrews, Ainslie
and Currim 2002). Segmentation models have been used to understand
differences between primary (category) demand and secondary demand
(brand choice) (Arora, Allenby and Ginter 1998), and to target consumers
based on their likely location on the taste distribution (Kamakura, Kim
and Lee 1996). While the mixed logit model has been a common method
of representing heterogeneity, probit models with mixing distributions
have also proved popular (e.g., Allenby and Rossi 1998; Chintagunta and
Honore 1996) and more recently both approaches have been combined
(e.g., Keane and Wasi 2013). For a comprehensive view of approaches to
segmentation and heterogeneity see Wedel and Kamakura (2012).
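To make the mixed logit representation concrete, the following sketch computes choice probabilities by averaging standard logit probabilities over Monte Carlo draws of the taste vector from a Normal population distribution; the attributes and distribution parameters are invented for the example.

```python
# Minimal sketch of a mixed logit choice probability: average standard logit
# probabilities over draws of the taste vector from a Normal population
# distribution. Attribute values and taste parameters are invented.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[1.0, 2.5],                # attributes of alternative 1
              [0.5, 3.0],                # alternative 2
              [0.0, 2.0]])               # alternative 3
beta_mean = np.array([1.0, -0.4])        # population mean of tastes
beta_sd = np.array([0.5, 0.2])           # population std. dev. of tastes

R = 10_000
betas = beta_mean + beta_sd * rng.normal(size=(R, 2))
v = betas @ X.T                          # R x 3 matrix of utilities
p = np.exp(v - v.max(axis=1, keepdims=True))
p /= p.sum(axis=1, keepdims=True)        # logit probabilities per draw
print("simulated choice probabilities:", p.mean(axis=0).round(3))
```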

Dynamics in Choice Models

Consumers’ evaluation and choice processes are generally not static. First,
they may vary cyclically, depending on the purchase context, and, second,
they may evolve systematically over time. Choice models have been
adapted to represent both of these marketplace phenomena.
Models in which choice in one period of time is dependent on choice
in the previous period are said to exhibit state dependence. Behaviorally,
choice of an alternative in time period t + 1 that is higher than its long-
term average when that alternative was chosen in time t may be driven by
inertia or habit (e.g., Seetharaman and Chintagunta 1998). Conversely,
if a purchase in time t reduces a product’s probability of purchase on the
next occasion, that consumer is said to be exhibiting variety-seeking (e.g.,
Lattin and McAlister 1985). Kahn (1995) provides a nice classification of
the different types of variety-seeking that we might observe. Seetharaman
and Chintagunta (1998) warn of the dangers of only including one of these
phenomena in choice models when both may be present. They demon-
strate that a failure to account for inertia may lead to a false conclusion
that variety seeking is occurring in the marketplace. By the same token,
they note that a failure to adequately account for consumer heterogeneity
in a model estimated across consumers may lead to a conclusion of state
dependence, whereas what the model may be picking up is idiosyncratic
differences in preferences across the sample. Thus, when Guadagni and
Little (1983) used geometrically decaying state dependence as an explanatory
variable in their choice model, they appropriately attributed its influence
to differences in brand loyalty between consumers, rather than habit.
As well as studying choice that fluctuates systematically around previous
purchases, choice models have also been adapted to study the evolution of
choice as new products diffuse through the population. Rogers (2003) in
his seminal book on diffusion identified the factors that drive the sales of
new products and other innovations to start slowly, go through a growth
phase and then slow down. This work relates to sociology, agricultural
economics, marketing and many other disciplines. In marketing, early
work concentrated on the effects of diffusion at the aggregate or market
level, with the Bass (1969) model describing the S-shaped sales curve along
which many new products were observed to evolve. Roberts and Urban
(1988) provide an individual analog to this aggregate phenomenon by use
of a dynamic discrete choice model. They suggest that if, at time t, a con-
sumer can be assumed to have a normally distributed set of beliefs about
the product attributes that determine a product’s utility (equation (7.1))
then we can assume that these beliefs will be updated in a Bayesian way as
the consumer gathers more information, shifting his or her beliefs toward
some true level and reducing associated uncertainty. They further show
that a constantly risk-averse consumer will linearly discount uncertainty
(or the variance of beliefs). Substituting the Bayesian updating of mean
beliefs and their uncertainty into the expected utility function and then
the expected utility function into the discrete choice model allows them to
study how a consumer’s probability of choice will evolve, providing the
individual analog to the aggregate level diffusion curve of Bass.
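A minimal sketch of this updating logic follows: a Normal belief about a new product's quality is revised as each signal arrives, and the variance-discounted expected utility enters a binary logit against a known incumbent. The single-attribute setup, known signal variance, and all numbers are simplifying assumptions for illustration rather than the full Roberts and Urban specification.

```python
# Sketch of Normal belief updating feeding a risk-discounted choice
# probability, in the spirit of Roberts and Urban (1988). Invented values.
import numpy as np

mu, var = 0.0, 1.0            # prior belief about the new product's quality
true_q, sig2 = 1.0, 0.5       # true quality; variance of each new signal
r = 0.8                       # risk-aversion coefficient (discount on variance)
u_incumbent = 0.6             # known utility of the incumbent brand

rng = np.random.default_rng(2)
for t in range(1, 11):
    signal = true_q + rng.normal(scale=np.sqrt(sig2))
    post_var = 1.0 / (1.0 / var + 1.0 / sig2)        # Bayesian (Normal) update
    mu = post_var * (mu / var + signal / sig2)
    var = post_var
    u_new = mu - r * var                             # risk-discounted utility
    p_new = 1.0 / (1.0 + np.exp(-(u_new - u_incumbent)))
    print(f"t={t:2d}  mean={mu:5.2f}  var={var:.3f}  P(choose new)={p_new:.3f}")
```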
Roberts and Urban assume that the consumer is myopic. A further
advance to their approach is contained in the work of Erdem and Keane
(1996). Using a structural model, they allow for the fact that the consumer
may be forward-looking and calibrate their model in a packaged-goods
setting, as opposed to the durable one of Roberts and Urban.
Highly related to dynamic discrete choice models is the use of hazard
rate models (e.g., Jain and Vilcassim 1991). Hazard rate models describe
the probability of an event occurring in any time interval and thus are
analogous to choice models. As with the probability of choice, the hazard
rate can be expressed as a function of a product’s constituent attributes, or
other management or environmental decision variables. One advantage of
hazard rates lies in their flexibility regarding the periods of time over
which the purchase decision may take place, making them more easily
applied to decisions of when purchase will take place (in continuous time),
as well as what will be purchased (e.g., Chintagunta 1993). These dynamic
models may be applied to different decision stages of the diffusion process.
For example, Chandrashekaran and Sinha (1995) look at the determinants
of consumer trial and repeat using different dynamic hazard rate models.
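For intuition, the sketch below implements the simplest such structure: a constant baseline hazard scaled by a promotion covariate in proportional-hazards form, with the probability of purchase within a week following from the exponential survival function. All parameter values are invented.

```python
# Minimal sketch of a proportional-hazards purchase-timing model: a baseline
# purchase rate scaled by a covariate, with purchase probabilities from the
# exponential survival function. Parameter values are invented.
import numpy as np

base_rate = 0.10                        # baseline weekly purchase hazard
beta_promo = 0.9                        # promotion effect on the log-hazard
promo = np.array([0, 1, 0, 1])          # promotion indicator, four weeks

hazard = base_rate * np.exp(beta_promo * promo)
p_buy_within_week = 1.0 - np.exp(-hazard)    # P(T <= 1 week | hazard)
print(p_buy_within_week.round(3))
```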

Product Design and Marketing Mix Modeling

One of the most popular applications of choice models in marketing is
to calibrate the relationship between marketing actions and the resultant
market share through the construct of individual brand choice. Choice-
based conjoint analysis has proved popular as a means of calibrating the
effect of price and product attributes, including brand, on probability of
purchase (e.g., Moore 2004). Both objectively measured attributes, and
consumer perceptions of their value have proven valuable in predicting
choice (e.g., Adamowicz et al. 1997). Methodologically, choice-based con-
joint has evolved (e.g., the adaptive polyhedral designs of Toubia, Hauser
and Simester 2004). It has also developed to account for new phenomena
(e.g., the asymmetric quality response function of Hardie, Johnson and
Fader 1993). Finally, the technique has broadened the range of man-
agement problems for which it has been used (e.g., the menu planning
problem addressed by Lietchy, Ramaswamy and Cohen 2001).
Other elements of the marketing mix have also been studied using
discrete choice models, including the effect of pricing, promotions and
displays on both primary and secondary demand (see, for example,
Guadagni and Little 1983, 1998; Gupta 1988). The incorporation of refer-
ence points has improved fits in keeping with suggestions from prospect
theory (e.g., see Lattin and Bucklin 1989; Kalyanaram and Winer 1995).

Competitive Analysis and Strategy

Perhaps surprisingly, discrete choice models have had less impact at the
strategic level than the operational one. There are exceptions. In the early
days of choice modeling in marketing, Urban, Johnson and Hauser (1984)
used nested choice models to understand the market structure in the
US coffee market. Market structure analysis provides valuable strategic
insight in terms of both competitive analysis and portfolio planning.
More recently, Roberts, Nelson and Morrison (2005) developed a
dynamic brand choice model for market defense. The problems facing a
defender are different from those facing a new entrant and so a dynamic
model to calibrate the speed and degree of the evolution of the market
is required to allow the incumbent to slow the rate of share loss and
minimize its ultimate equilibrium level. Somewhat unusually, Roberts
and his colleagues calibrated this dynamic model prior to the launch of
the new entrant. Hauser, Tellis and Griffin (2006) provide a nice review
of the issues facing companies using such models strategically for growth
and defense.
Choice models have been used for other strategic problems (such as
portfolio planning by Ben Akiva et al. 2002), but only on a limited basis.
Given their suitability in terms of calibration potential and formulation
for strategic questions such as new market entry and balancing the firm’s
product-market portfolio, this is perhaps surprising.

Challenges

Marketing is going through a period of great turbulence, much of which
is addressed elsewhere in this volume. These changes include the rise
of social networks, the increased ability of marketers to automatically
capture data on identifiable consumer groups and to tailor their offerings
to them, and the fragmentation of distribution (including the rise of the
mobile consumer and multichannel purchasing). Each of these will have
implications for how choice models are applied. We briefly provide our
view as to where some of these changes will occur.

The Rise of Social Networks

The advent of the internet and increased penetration of smart phones
has led to a much greater degree of electronic connectedness (e.g., Berger
and Milkman 2012; Stephen and Toubia 2010). Social networking sites
such as Facebook and blogs have become highly influential in driving
consumer choice. To some extent this is not new, but the degree of its
influence is new and it is growing. One issue that this raises for marketing
modelers is how to represent this social influence. Martin Fishbein (1976)
suggested that in addition to the evaluation of the personal consequences
of choosing a course of action, represented in our case by equation (7.1),
many decision-makers’ choices will be influenced by the views of others
(Ryan and Bonfield 1975). He advocated adding another set of terms to
the measurement of attitude that relate to the views of other stakeholders
important to the decision maker. These were the social normative beliefs
of others and the decision maker’s motivation to comply with these views.
It would seem useful if we adopted a similar approach to understand-
ing the views of others in determining utility. However, not all others’
views should be given equal weight. Trusov, Bucklin and Pauwels (2009)
consider how network structures shape the spread of social influence through
the population, affecting different members’ choices. This is clearly an
area of high potential as current methods of collaborative filtering become
finer in their ability to forecast the effects of others’ choices on the focal
consumer (Linden, Smith and York 2003).

Advent of the Addressable Consumer

Blattberg and Deighton (1991) predicted the advent of an era
of addressability, leading to a more interactive relationship between the
marketer and consumer. This has certainly occurred, with companies like
Tesco customizing their communications and offerings to consumers at
a more and more granular level (Humby, Hunt and Phillips 2004). The
ability of discrete-choice models to add rigor to the modeling of purchase
histories is obvious, and this area does reveal the opportunity for us to
fine-tune our dynamic models to represent evolving manufacturer-con-
sumer interactions.
The addressable customer affects both of Day’s (1994) marketing capa-
bilities: market sensing and market linking. In terms of market sensing,
the ability to calibrate individual customers’ responses allows more focused prospecting at the
acquisition stage, pre-emptive strategies to be put in place at the retention
stage, and more effective initiatives at the account growth stage. Arora et
al. (2008) explore the role that choice models may have in personalizing
product offerings in this “big data” world.

Distribution Fragmentation and the Mobile Customer

The rise of the internet channel has meant that the purchase process is
considerably more multichannel than it was previously. Whereas it used to
be sufficient to understand the effect of the marketing mix on the final pur-
chase decision, increasingly marketers are being asked to identify the effect
of different touch points on the final decision outcome. The ability of the
consumer to engage with the marketer at any place and at any time, and
for the marketer to engage with the consumer at any place and at any time,
means that the consumer experience corridor is attracting considerably
more attention in both academia and industry with touchpoint attribution
models being adopted by many multichannel organizations (Chittilappilly
et al. 2013). A special issue of the Journal of Retailing edited by Verhoef,
Kannan and Inman (2015) explored the challenges posed by consumers’
channel switching, both in terms of changing competitive infringement (as
witnessed by the effect of the ecommerce model of Amazon on the bricks-
and-mortar business of Borders) and channel coordination and multiple
touch points for both business to consumer and business to business
marketers (see Berger et al. 2002 for a framework and a summary of the
literature in this area).

Summary

This chapter has outlined basic choice models and shown how they can
be generalized to handle a more complex set of phenomena. The survey
focused on the application of choice models: the management decisions to
which an understanding of customers might lead. It examined the market-
ing environment for trends, and suggested challenges that will face choice
modelers as consumers become more connected with each other, more
mobile while still in touch, and more fragmented in terms of channels for
information and products and services. These trends apply to business to
consumer marketing, but they are applicable to business to business mar-
keting as well (e.g., Bolton, Lemon and Verhoef 2008).

References
Adamowicz, Wiktor, Joffre Swait, Peter Boxall, Jordan Louviere and Michael Williams (1997)
“Perceptions versus objective measures of environmental quality in combined revealed and
stated preference models of environmental valuation.” Journal of Environmental Economics
and Management 32, no. 1: 65–84.
Allenby, Greg M. and Peter E. Rossi (1998) “Marketing models of consumer heterogene-
ity.” Journal of Econometrics 89, no. 1: 57–78.
Andrews, Rick L., Andrew Ainslie, Imran S. Currim (2002) “An Empirical Comparison of
Logit Choice Models with Discrete Versus Continuous Representations of Heterogeneity.”
Journal of Marketing Research 39, no. 4: 479–487.
Andrews, Rick L. and Imran S. Currim (2003) “A comparison of segment retention criteria
for finite mixture logit models.” Journal of Marketing Research 40, no. 2: 235–243.
Andrews, Rick L. and T. C. Srinivasan (1995) “Studying consideration effects in empirical
choice models using scanner panel data.” Journal of Marketing Research 32, no. 1: 30–41.
Arora, Neeraj, Greg M. Allenby and James L. Ginter (1998) “A hierarchical Bayes model of
primary and secondary demand.” Marketing Science 17, no. 1 (1998): 29–44.
Arora, Neeraj, Xavier Dreze, Anindya Ghose, James D. Hess, Raghuram Iyengar, Bing Jing,
Yogesh Joshi, V. Kumar, N. Lurie, Scott Neslin and S. Sajeesh (2008) “Putting one-to-one
marketing to work: Personalization, customization, and choice.”  Marketing Letters  19,
no. 3–4: 305–321.
Bass, F. M. (1969) “A new product growth for model consumer durables.” Management
Science 15, no. 5: 215–227.
Ben-Akiva, Moshe and Bruno Boccara (1995) “Discrete choice models with latent choice
sets.” International Journal of Research in Marketing 12, no. 1: 9–24.
Ben-Akiva, Moshe E. and Steven R. Lerman (1985)  Discrete choice analysis: theory and
application to travel demand. Vol. 9. Cambridge, MA: MIT Press.
Ben-Akiva, Moshe, Daniel McFadden, Kenneth Train, Joan Walker, Chandra Bhat,
Michel Bierlaire and Denis Bolduc (2002) “Hybrid choice models: progress and chal-
lenges.” Marketing Letters 13, no. 3: 163–175.

Berger, Jonah and Katherine L. Milkman (2012) “What makes online content viral?” Journal
of Marketing Research 49, no. 2: 192–205.
Berger, Paul D., Ruth N. Bolton, Douglas Bowman, Elten Briggs, V. Kumar, A. Parasuraman
and Creed Terry (2002) “Marketing Actions and the Value of Customer Assets: A
Framework for Customer Asset Management.” Journal of Service Research 5, no. 1: 39–54.
Blattberg, Robert C. and John Deighton (1991) “Interactive marketing: Exploiting the age of
addressability.” Sloan Management Review 33, no. 1: 5.
Bolton, Ruth N., Katherine N. Lemon and Peter C. Verhoef (2008) “Expanding business-
to-business customer relationships: Modeling the customer’s upgrade decision.” Journal of
Marketing 72, no. 1: 46–64.
Bradley, Ralph Allan and Milton E. Terry (1952) “Rank analysis of incomplete block
designs: I. The method of paired comparisons.” Biometrika 39, no. 3/4: 324–345.
Carson, Richard T. and Jordan J. Louviere (2011) “A common nomenclature for stated pref-
erence elicitation approaches.” Environmental and Resource Economics 49, no. 4: 539–559.
Carson, Richard T. and Jordan J. Louviere (2014) “Statistical properties of consideration
sets,” Journal of Choice Modelling 13: 37–48.
Chandrashekaran, Murali and Rajiv K. Sinha (1995) “Isolating the determinants of innova-
tiveness: A split-population tobit (SPOT) duration model of timing and volume of first and
repeat purchase.” Journal of Marketing Research 32, no. 4: 444–456.
Chintagunta, Pradeep K. (1993) “Investigating purchase incidence, brand choice and pur-
chase quantity decisions of households.” Marketing Science 12, no. 2: 184–208.
Chintagunta, Pradeep K. and Bo E. Honore (1996) “Investigating the effects of marketing
variables and unobserved heterogeneity in a multinomial probit model.”  International
Journal of Research in Marketing 13, no. 1: 1–15.
Chittilappilly, Anto, Madan Bharadwaj, Payman Sadegh and Darius Jose. “Method, com-
puter readable medium and system for determining weights for attributes and attribute
values for a plurality of touchpoint encounters.” US Patent Application 13/789,453, filed
March 7, 2013.
Danaher, Peter J., John H. Roberts, Alan Simpson and Ken Roberts (2011) “Practice Prize
Paper-Applying a Dynamic Model of Consumer Choice to Guide Brand Development at
Jetstar Airways.” Marketing Science 30, no. 4: 586–594.
Day, George (1994) “The capabilities of market-driven organizations.”  Journal of
Marketing 58, no. 4: 37–52.
Elshiewy, O., G. Zenetti, and Y. Boztug (2017) “Differences between classical and Bayesian
estimates for mixed logit models: a replication study.” Journal of Applied Econometrics 32,
no. 2: 470–476.
Erdem, Tülin and Michael P. Keane (1996) “Decision-making under uncertainty: Capturing
dynamic brand choice processes in turbulent consumer goods markets.” Marketing Science
15, no. 1: 1–20.
Fiebig, Denzil G., Michael P. Keane, Jordan Louviere, and Nada Wasi (2010) “The
generalized multinomial logit model: accounting for scale and coefficient heterogene-
ity.” Marketing Science 29, no. 3: 393–421.
Fishbein, Martin (1967) “Attitude and Prediction of Behavior,” in Martin Fishbein (ed.),
Readings in Attitude Theory and Measurement. New York: Wiley, 477–492.
Fishbein, Martin (1976) “Extending the extended model: Some comments,” in B.B. Anderson
(ed.), Advances in Consumer Research, Vol. 3. Chicago: Association for Consumer
Research, 491–497.
Gensch, Dennis H. (1987) “A two-stage disaggregate attribute choice model,” Marketing
Science 6, no. 3: 223–239.
Gilbride, Timothy J. and Greg M. Allenby (2004) “A choice model with conjunctive, disjunc-
tive, and compensatory screening rules.” Marketing Science 23, no. 3: 391–406.
Guadagni, Peter M., and John D. C. Little (1983) “A logit model of brand choice calibrated
on scanner data.” Marketing Science 2, no. 3: 203–238.
Guadagni, Peter M. and John D. C. Little (1998) “When and what to buy: a nested logit model
of coffee purchase.” Journal of Forecasting 17, no. 3–4: 303–326.

Gupta, Sunil (1988) “Impact of sales promotions on when, what, and how much to
buy.” Journal of Marketing Research 25, no. 4: 342–355.
Hardie, Bruce G. S., Eric J. Johnson and Peter S. Fader (1993) “Modeling loss aversion and
reference dependence effects on brand choice.” Marketing Science 12, no. 4: 378–394.
Hauser, John R. and Birger Wernerfelt (1990) “An evaluation cost model of consideration
sets.” Journal of Consumer Research 16, no. 4: 393–408.
Hauser, John, Gerard J. Tellis, and Abbie Griffin (2006) “Research on innovation: A review
and agenda for marketing science.” Marketing Science 25, no. 6: 687–717.
Hausman, Jerry and Daniel McFadden (1984) “Specification tests for the multinomial logit
model.” Econometrica 52, no. 5: 1219–1240.
Hensher, David A., John M. Rose and William H. Greene (2005) Applied choice analysis: a
primer. New York: Cambridge University Press.
Humby, Clive, Terry Hunt and Tim Phillips (2004) Scoring points: How Tesco is winning
customer loyalty. London: Kogan Page.
Hutchinson, J. Wesley, Kalyan Raman and Murali K. Mantrala (1994) “Finding choice
alternatives in memory: Probability models of brand name recall.” Journal of Marketing
Research 31, no. 4: 441–461.
Jain, Dipak C. and Naufel J. Vilcassim (1991) “Investigating household purchase timing
decisions: A conditional hazard function approach.” Marketing Science 10, no. 1: 1–23.
Kahn, Barbara E. (1995) “Consumer variety-seeking among goods and services: An integra-
tive review.” Journal of Retailing and Consumer Services 2, no. 3: 139–148.
Kalyanaram, Gurumurthy and Russell S. Winer (1995) “Empirical generalizations from ref-
erence price research.” Marketing Science 14, no. 3 supplement: G161-G169.
Kamakura, Wagner A., Byung-Do Kim and Jonathan Lee (1996) “Modeling preference and
structural heterogeneity in consumer choice.” Marketing Science 15, no. 2: 152–172.
Kamakura, Wagner A. and Gary Russell (1989) “A probabilistic choice model for market
segmentation and elasticity structure.” Journal of Marketing Research 26: 379–390.
Keane, Michael P. and Nada Wasi (2013) “Comparing alternative models of heterogeneity in
Consumer choice behavior.” Journal of Applied Econometrics 28: 1018–1045.
Lancaster, Kelvin (1966) “A new approach to consumer theory.” Journal of Political
Economy 74, no. 2: 132–157.
Lattin, James M. and Randolph E. Bucklin (1989) “Reference effects of price and promotion
on brand choice behavior.” Journal of Marketing Research 26, no. 3: 299–310.
Lattin, James M. and Leigh McAlister (1985) “Using a variety-seeking model to identify
substitute and complementary relationships among competing products.” Journal of
Marketing Research 23, no. 4: 330–339.
Liechty, John, Venkatram Ramaswamy and Steven H. Cohen (2001) “Choice Menus for
Mass Customization: An Experimental Approach for Analyzing Customer Demand with
an Application to a Web-Based Information Service.” Journal of Marketing Research 38,
no. 2: 183–196.
Linden, Greg, Brent Smith and Jeremy York (2003) “Amazon.com recommendations: Item-
to-item collaborative filtering.” Internet Computing, IEEE 7, no. 1: 76–80.
Louviere, Jordan J., David A. Hensher and Joffre D. Swait (2000) Stated choice methods:
analysis and applications. New York: Cambridge University Press.
Louviere, Jordan J., Terry N., Flynn and Anthony A. J. Marley (2015) Best worst scaling
theory, methods and applications. New York: Cambridge University Press.
Louviere, Jordan J. and George Woodworth (1983) “Design and analysis of simulated con-
sumer choice or allocation experiments: an approach based on aggregate data.” Journal of
Marketing Research 20, no. 4: 350–367.
Luce, R. D. (1959) Individual Choice Behavior: A Theoretical Analysis. New York: Wiley.
McFadden, Daniel (1974) “Conditional logit analysis of qualitative choice behavior,” in Paul
Zarembka (ed.), Frontiers in Econometrics, New York: Wiley, 105–142.
McFadden, Daniel (1986) “The choice theory approach to market research.” Marketing
Science 5, no. 4: 275–297.
Moore, William L. (2004) “A cross-validity comparison of rating-based and choice-based

conjoint analysis models.” International Journal of Research in Marketing 21, no. 3:
299–312.
Morwitz, Vicki G., Joel H. Steckel, and Alok Gupta (2007) “When Do Purchase Intentions
Predict Sales?” International Journal of Forecasting 23, no. 3: 347–364.
Orme, Bryan (2013) “Advances and trends in marketing science from the Sawtooth Software
perspective.” Working Paper, Orem, UT: Sawtooth Software, Inc.
Rao, Vithala R. (2014) Applied Conjoint Analysis, Heidelberg: Springer Verlag.
Roberts, John H. (1983) “A Multi-Attribute Utility Diffusion Model: Theory and Application
to the Pre-Launch Forecasting of Automobile.” Unpublished Ph.D. thesis, Cambridge,
MA: Massachusetts Institute of Technology.
Roberts, John H. and James M. Lattin (1991) “Development and testing of a model of con-
sideration set composition.” Journal of Marketing Research 28, no. 4: 429–440.
Roberts, John H. and James M. Lattin (1997) “Consideration: Review of research and pros-
pects for future insights.” Journal of Marketing Research 34, no. 3: 406–410.
Roberts, John H. and Gary Lilien (1993) “Explanatory and predictive models of consumer
behavior,” in Jehoshua Eliashberg and Gary Lilien (eds), Handbooks in Operations
Research and Management Science, Vol. 5, Amsterdam: North Holland, 27–82.
Roberts, John H., Charles J. Nelson, and Pamela D. Morrison (2005) “A prelaunch diffusion
model for evaluating market defense strategies.” Marketing Science 24, no. 1: 150–164.
Roberts, John H. and Glen L. Urban (1988) “Modeling multiattribute utility, risk, and
belief dynamics for new consumer durable brand choice.” Management Science 34, no. 2:
167–185.
Roberts, Ken, John H. Roberts, Peter J. Danaher, and Rohan Raghavan (2015) “Practice
Prize Paper—Incorporating Emotions into Evaluation and Choice Models: Application to
Kmart Australia.” Marketing Science 34, no. 6: 815–824.
Rogers, Everett M. (2003) Diffusion of innovation 5th ed. New York: Free Press.
Ryan, Michael J. and Edward H. Bonfield (1975) “The Fishbein extended model and con-
sumer behavior.” Journal of Consumer Research 2, no. 2: 118–136.
Seetharaman, P. B. and Pradeep Chintagunta (1998) “A model of inertia and variety-seeking
with marketing variables.” International Journal of Research in Marketing 15, no. 1: 1–17.
Sheth, Jagdish N. (2011) Models of Buyer Behavior: Conceptual, Quantitative, and Empirical.
Decatur, GA: Marketing Classics Press.
Shugan, Steven M. (1980) “The cost of thinking.” Journal of Consumer Research 7, no. 2:
99–111.
Siddarth, S., Randolph E. Bucklin, and Donald G. Morrison (1995) “Making the cut:
Modeling and analyzing choice set restriction in scanner panel data.” Journal of Marketing
Research 32, no. 3: 255–266.
Stephen, Andrew T. and Olivier Toubia (2010) “Deriving value from social commerce net-
works.” Journal of Marketing Research 47, no. 2: 215–228.
Swait, Joffre, Monica Popa, and Luming Wang (2016) “Capturing Context-Sensitive
Information Usage in Choice Models via Mixtures of Information Archetypes.” Journal
of Marketing Research, https://www.ama.org/publications/JournalOfMarketingResearch/
Pages/capturing-context-sensitive-information-usage.aspx (last accessed October 3, 2017).
Tobin, James (1958) “Estimation of relationships for limited dependent variables.”
Econometrica 26, no. 1: 24–36.
Toubia, Olivier, John R. Hauser, and Duncan I. Simester (2004) “Polyhedral methods for
adaptive choice-based conjoint analysis.” Journal of Marketing Research 41, no. 1: 116–131.
Trusov, Michael, Randolph E. Bucklin, and Koen Pauwels (2009) “Effects of word-of-
mouth versus traditional marketing: findings from an internet social networking site.”
Journal of Marketing 73, no. 5: 90–102.
Urban, Glen L., Philip L. Johnson, and John R. Hauser (1984) “Testing competitive market
structures.” Marketing Science 3, no. 2: 83–112.
Verhoef, Peter C., P. K. Kannan, and J. Jeffrey Inman (2015) “From multi-channel retail-
ing to omni-channel retailing: Introduction to the special issue on multi-channel retail-
ing.” Journal of Retailing 91, no. 2: 174–181.

Villas-Boas, J. Miguel, and Russell S. Winer (1999) “Endogeneity in brand choice
models.” Management Science 45, no. 10: 1324–1338.
Wedel, Michel and Wagner A. Kamakura (2012) Market Segmentation: Conceptual and
Methodological Foundations. 2nd ed. Vol. 8. Springer Science & Business Media.
Wooldridge, Jeffrey M.  (2010) Econometric Analysis of Cross Section and Panel Data.
Cambridge, MA: MIT Press.
Wright, Peter and Fredrick Barbour (1977) Phased decision strategies: Sequels to an initial
screening. Working Paper, Graduate School of Business, Stanford University.

8.  Bayesian econometrics
Greg M. Allenby and Peter E. Rossi

The goal of statistical inference is to make statements regarding the value
of unknown quantities using information that is available to the analyst.
These statements, or inferences, are made in terms of probability state-
ments such as an interval of probable values of a parameter or latent con-
structs such as utility and preference. When the latent construct is discrete,
inference is conveyed in terms of probability statements of the hypoth-
esized value being true. Inference can also be made about a parameter
taking on certain values, such as a consumer’s having an elastic response
to prices. The construct of interest may also be a yet unobserved value
from the model in the form of a prediction or the outcome of some action.
The information supporting inferences such as these can come in two
forms – from data and from non-data-based information, such as theories of
researchers may believe that consumer price responsiveness is such that
people would rather pay less for an offering than more for it, resulting in
downward-sloping demand curves. This information can be incorporated
into an analysis by specifying a particular functional form for a model,
and by restricting parameters to take on values only within a particular
domain.
Marketing presents some unique types of data and challenges to con-
ducting statistical inference. Marketing is characterized by many “units”
of analysis, none of which is associated with data that are particularly
informative. One example is a conjoint survey, where hundreds of
respondents are asked to provide preferences for hypothetical offerings,
typically in the form of discrete choices, across a dozen or so choice
tasks. For any one respondent, the amount of information provided is
scant, although there may be many respondents included in the study.
Respondent heterogeneity complicates the analysis, as it is preferable to allow
each respondent to be represented by a unique set of parameters and coef-
ficients. Another example is retail scanner data, where there are thousands
of offerings on shelves and a large number of geographical markets, or
stores, with marketing mix variables (e.g., displays or local advertising)
that may not vary much. The goal of employing both types of data is to
make plausible predictions for decision-making across a large number of
units.

In this chapter, we argue that Bayesian econometric methods are
particularly well suited for the analysis of marketing data. Bayes’ theorem
provides exact, small-sample inference within a flexible framework for
assessing particular parameters and functions of parameters. We first
review the basics of Bayesian analysis, and then examine issues associated
with modern Bayes computation responsible for the increased develop-
ment of Bayesian methods in marketing. We then examine three areas
where Bayesian methods have contributed to marketing analytics – models
of choice, heterogeneity and decision theory. This last area includes
issues associated with simultaneity and strategically defined covariates.
We conclude with a discussion of limitations and common errors in the
application of Bayes theorem to marketing analytics.

Basic Bayes

In the analysis of data, all Bayesians adhere to a principle known as the
likelihood principle (Berger and Wolpert, 1988), which states that all
information about model parameters contained in the data is expressed in
terms of the model likelihood. The likelihood is a description of the gen-
erative mechanism for the data (i.e., the distribution of the data) expressed
as a function of model parameters. If two models have the same likeli-
hood function, then an analyst should make the same inference about the
unknowns of the model.
The likelihood principle distinguishes Bayesian analysis from many
modern econometric methods, such as Generalized Method of Moments
(Hansen, 1982), that rely on other conditions to make statistical inference.
GMM methods, for example, can be used to estimate a standard regres-
sion model using discrete outcome data (e.g., 0–1). Such a model would
have zero likelihood for any set of regression coefficients.
In addition, Bayesian analysis is conducted conditional on the data, in
contrast to the frequentist approach where the sampling distribution is
determined prior to seeing the data. Bayesian inference is based on the one
dataset on hand, while non-Bayesian inference involves many hypothetical
datasets when constructing confidence intervals and calculating p-values
to test hypotheses.
Bayesian analysis is based on Bayes theorem, which states that the
posterior distribution of model parameters $\theta$ given the data $D$, $p(\theta \mid D)$,
is obtained from the definition of conditional probability:

$$p(\theta \mid D) = \frac{p(D, \theta)}{p(D)} = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$

or

$$p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta)$$

where $p(D \mid \theta)$ is the likelihood of the data and $p(\theta)$ is the prior distribution.
The denominator, $p(D)$, is left out of the latter expression because $\theta$
is the variable of interest, and inference is unaffected by its value, up to a
constant of proportionality. The expression above is the only theorem that
guides Bayesian analysis. Modern Bayesian computing methods use some
type of simulation method for generating draws of $\theta$ from its posterior
distribution, $p(\theta \mid D)$, which summarizes all information from the prior
and the data.
The challenge in conducting Bayesian analysis is in summarizing the
information contained in the posterior distribution. The dimension of
the posterior distribution can be very large in marketing applications,
especially in models that account for heterogeneous response among
the units of analysis, such as key accounts and respondents. A conjoint
analysis involving 500 respondents and 10 partworths leads to a posterior
distribution of 5,000 parameter values, not including parameters from the
prior distribution. Modern Bayesian methods summarize the posterior
distribution via simulation methods, and in particular Markov chain
Monte Carlo (MCMC) methods that are particularly well suited for the
analysis of hierarchical models.
The advantage of simulation methods is that they facilitate investigation
of particular respondents and cross-sectional units, $p(\theta_i \mid D)$,
as well as functions of interest of these parameters, i.e., $p(h(\theta) \mid D)$.
This ability contrasts with sampling theory methods that are content
with reporting point estimates and standard errors to summarize
information from the data. We believe these summary measures are
somewhat irrelevant because they are based on properties of hypothetical
distributions, not on the observed data $D$. Moreover, the normal
approximation often used when interpreting standard errors can be very
misleading when working with marketing data because of data
sparseness.
Prediction from a Bayesian point of view can be thought of in a way that
is similar to inference: the predictive data $\tilde{D}$ are unobservable, and
one should compute the posterior distribution of the unobservable given
the observed data, $p(\tilde{D} \mid D)$. The Bayesian solution is to obtain the predictive
distribution for a model by integrating over the posterior distribution
of model parameters:

$$p(\tilde{D} \mid D) = \int p(\tilde{D} \mid \theta)\, p(\theta \mid D)\, d\theta$$

where we assume that the predictive values of the data are conditionally
independent of the past values given the model and its parameters $\theta$. The
expression above is a reminder that Bayesian analysis employs the entire
posterior distribution in making inferences and making predictions, and
avoids the use of plug-in approximations (i.e., $p(\tilde{D} \mid \hat{\theta})$), which do
not fully reflect the uncertainty of unobservable, latent quantities such as
parameters.
Finally, optimal decisions associated with a Bayesian analysis employ
the concept of a loss function $L(a, \theta)$, which is a function of an action
$a$ and an unobserved parameter or state of nature $\theta$. The Bayesian
approach to the problem is to choose the action $a$ so that the posterior
expectation of loss is minimized:

$$\min_a \; \bar{L}(a) = E_{\theta \mid D}\big[L(a, \theta)\big] = \int L(a, \theta)\, p(\theta \mid D)\, d\theta$$

This formulation recognizes that we do not know for certain the true
state of nature, $\theta$, and we must account for this uncertainty in our
decision. Statistical estimation can be viewed as a special case of Bayesian
decision theory where the decision is to pick a point estimate for $\theta$. If we
assume a squared-error loss function, where over-prediction and under-
prediction are assigned equal penalty:

$$L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)'\, A\, (\hat{\theta} - \theta)$$

then it can be shown that the optimal point estimate for $\theta$ under squared-
error loss is the mean of the posterior distribution $p(\theta \mid D)$ (see Zellner,
1971).
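A one-line derivation, sketched here for the scalar case with $A = 1$, shows why: differentiating the expected posterior loss with respect to the estimate and setting the result to zero gives

```latex
% Sketch for the scalar case with A = 1: expected posterior loss is
% minimized at the posterior mean.
\frac{\partial}{\partial \hat{\theta}}\, E_{\theta \mid D}\!\left[(\hat{\theta}-\theta)^2\right]
  = 2\hat{\theta} - 2\, E[\theta \mid D] = 0
  \quad \Longrightarrow \quad \hat{\theta} = E[\theta \mid D].
```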
Marketing problems employ a wide variety of loss functions that
are of interest to analysts beyond squared error loss. These include the
desire to maximize profits, consumer utility and intermediate constructs
such as brand recall, recognition and consideration. Bayesian analysis
provides a flexible tool for addressing a wide range of decisions in
marketing.
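As an illustration of a decision-theoretic loss beyond squared error, the sketch below chooses a price to maximize expected profit over posterior draws of a price coefficient in a log-linear demand model, rather than plugging in a point estimate; the draws, intercept, and unit cost are invented values.

```python
# Minimal sketch of Bayesian decision theory for pricing: maximize expected
# profit over posterior draws of the price coefficient in a log-linear demand
# model. Posterior draws, intercept, and unit cost are invented values.
import numpy as np

rng = np.random.default_rng(3)
beta_draws = rng.normal(-2.0, 0.4, size=5000)   # posterior draws, price coef.
a, cost = 4.0, 1.0                              # demand intercept, unit cost

prices = np.linspace(1.1, 3.0, 100)
demand = np.exp(a + np.outer(beta_draws, prices))        # 5000 x 100 draws
expected_profit = ((prices - cost) * demand).mean(axis=0)
best = prices[expected_profit.argmax()]
print("price maximizing expected profit:", round(float(best), 2))
```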
To illustrate these concepts, consider a simple example involving a
binary outcome variable from a binomial distribution. The binomial dis-
tribution is often used in the analysis of marketing data when respondents
either respond or not, such as when they click on a website or purchase
a product. The outcome variable can take on two values: zero, implying
failure or no action, and one, implying success or purchase. The likelihood
for the data can be expressed as:

$$y_t \sim \text{Bin}(\theta)$$

with a likelihood function over $T$ observations:

$$p(y \mid \theta) = \prod_{t=1}^{T} \theta^{y_t} (1-\theta)^{1-y_t} = \theta^{n} (1-\theta)^{T-n}$$

where $n = \sum_{t=1}^{T} y_t$ is the total number of successes. A convenient prior
distribution for $\theta$ that has a similar form to the likelihood is the Beta
distribution:

$$p(\theta \mid a, b) \propto \theta^{a-1} (1-\theta)^{b-1}$$

with support on the unit interval (0,1). The posterior is obtained by multiplying
the likelihood by the prior:

$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta) = \big[\theta^{n} (1-\theta)^{T-n}\big] \times \big[\theta^{a-1} (1-\theta)^{b-1}\big] = \theta^{n+a-1} (1-\theta)^{T-n+b-1} \sim \text{Beta}(n+a,\; T-n+b)$$

which is also of the form of a Beta distribution. The parameters of the
prior distribution (a, b) are specified by the analyst and are seen to act
like “data” in the calculations. The posterior distribution is a compromise
between the likelihood and prior, which accounts for the shrinkage nature
of Bayesian estimates.
The prediction of a new outcome $y_f$ in the Beta-Binomial model is
obtained by employing the predictive formula described above:

$$p(y_f \mid y) = \int p(y_f \mid \theta)\, p(\theta \mid y)\, d\theta = \int \theta\, p(\theta \mid y)\, d\theta = E[\theta \mid y]$$
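A short numeric sketch of this Beta-Binomial updating, with invented prior settings and data:

```python
# Numeric sketch of the Beta-Binomial example: update a Beta(a, b) prior
# with n successes in T trials; the predictive probability of a new success
# is the posterior mean. Prior parameters and data are invented.
from scipy import stats

a, b = 2.0, 2.0            # prior "pseudo-data"
T, n = 20, 6               # e.g., 6 purchases in 20 exposures

posterior = stats.beta(a + n, b + T - n)
print("posterior mean = predictive P(y_f = 1):", round(posterior.mean(), 3))
print("95% credible interval:",
      tuple(round(q, 3) for q in posterior.interval(0.95)))
```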

In summary, the Bayesian approach to statistics and econometrics
provides the right answer to the right question, i.e., the posterior distribu-
tion of unknown quantities (e.g., parameters) given quantities that are
observed (e.g., data), provides a full accounting of uncertainty and pro-
vides an integrated approach to inference and decision making. The costs
of the Bayesian approach are increased computing costs when dealing
with models that do not have simple expressions for the posterior
distribution, and the need to specify a prior distribution for the model
parameters. We examine computational costs and prior specifications in
the context of marketing applications below.

Bayesian Computation

We consider methods for simulating from the posterior distribution for
the regression model. The outcomes y in a regression model are assumed
to be distributed multivariate Normal with mean equal to the regression
line and variance equal to the residual variance:

$$y = X\beta + \varepsilon; \qquad \varepsilon \sim N(0, \sigma^2 I)$$

where $X$ are explanatory variables that are assumed to be distributed
independently of the model parameters. The likelihood for the regression model
can be shown to be:

$$p(D \mid \theta) = p(y \mid X, \beta, \sigma^2) \propto (\sigma^2)^{-n/2} \exp\left[\frac{-1}{2\sigma^2}\,(y - X\beta)'(y - X\beta)\right]$$
The prior distribution for our illustration is factored into a conditional
and marginal distribution:

$$p(\theta) = p(\beta \mid \sigma^2)\, p(\sigma^2)$$

where the conditional prior for $\beta$ is assumed Normal$(\bar{\beta}, \sigma^2 A^{-1})$ and the marginal
prior for $\sigma^2$ is assumed to be inverted gamma:

$$p(\beta \mid \sigma^2, \bar{\beta}, A) \propto (\sigma^2)^{-k/2} \exp\left[\frac{-1}{2\sigma^2}\,(\beta - \bar{\beta})'\, A\, (\beta - \bar{\beta})\right]$$

$$p(\sigma^2 \mid \nu_0, s_0^2) \propto (\sigma^2)^{-(\nu_0/2 + 1)} \exp\left[\frac{-\nu_0 s_0^2}{2\sigma^2}\right]$$
The posterior distribution for the model is proportional to the product
of the likelihood and the prior:

$$p(\theta \mid D) = p\big(\beta, \sigma^2 \mid (y, X), (\bar{\beta}, A), (\nu_0, s_0^2)\big)$$
$$\propto (\sigma^2)^{-n/2} \exp\left[\frac{-1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right] \times (\sigma^2)^{-k/2} \exp\left[\frac{-1}{2\sigma^2}(\beta - \bar{\beta})'\, A\, (\beta - \bar{\beta})\right] \times (\sigma^2)^{-(\nu_0/2 + 1)} \exp\left[\frac{-\nu_0 s_0^2}{2\sigma^2}\right]$$
One approach to conducting Bayesian inference is to work with the
posterior distribution and determine analytic expressions for its form and
moments, such as the mean of the distribution. Alternatively, simulation
methods can be used to generate Monte Carlo draws from the posterior
distribution. One strategy for the standard regression model is:

1. Generate a draw of $\sigma^2$ from its marginal, inverted gamma distribution.
2. Use the draw of $\sigma^2$ as a conditioning argument to draw $\beta$ from its conditional posterior distribution given $\sigma^2$.
3. Repeat.
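The sketch below implements this recursion for the conjugate model on simulated data, drawing $\sigma^2$ from its marginal inverted-gamma posterior (via a scaled inverse chi-square) and then $\beta$ from its conditional Normal posterior; the data and prior settings are invented.

```python
# Minimal sketch of posterior simulation for the conjugate Bayesian
# regression model above. Simulated data; prior settings are invented.
import numpy as np

rng = np.random.default_rng(4)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([1.0, -0.5]), 0.7
y = X @ beta_true + sigma_true * rng.normal(size=n)

beta_bar, A = np.zeros(k), 0.01 * np.eye(k)      # diffuse Normal prior
nu0, s0_sq = 3.0, 1.0                            # inverted-gamma prior

V_inv = X.T @ X + A
V = np.linalg.inv(V_inv)
beta_tilde = V @ (X.T @ y + A @ beta_bar)
ssq = y @ y + beta_bar @ A @ beta_bar - beta_tilde @ V_inv @ beta_tilde

draws = []
for _ in range(2000):
    # sigma^2 from its marginal posterior: scaled inverse chi-square
    sigma_sq = (nu0 * s0_sq + ssq) / rng.chisquare(nu0 + n)
    # beta from its conditional Normal posterior given sigma^2
    beta = rng.multivariate_normal(beta_tilde, sigma_sq * V)
    draws.append(beta)
print("posterior mean of beta:", np.mean(draws, axis=0).round(3))
```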

It is often the case that it is either difficult or impossible to simulate
directly from the posterior full conditional distributions in steps 1–3
above. This occurs when the prior distribution and the likelihood do not
conform to each other, as in discrete choice models where the likelihood
comprises discrete mass points and the prior is a density. In these cases,
the Metropolis–Hastings (MH) algorithm can be used to simulate draws
(see Chib and Greenberg, 1995).
The simplest form of the MH algorithm uses a random walk to generate
candidate draws that are accepted with probability $\alpha$. If the candidate
draw is rejected, the parameter is not updated and retains its current
value. The MH algorithm works by setting $\alpha$ so that the acceptance
probability of a new draw makes the
Markov chain “time reversible” with respect to the posterior distribution
of a model, so that the stationary distribution of the Markov chain is also
the posterior distribution. This allows us to use the MH algorithm as a
device for simulating from the posterior. The random-walk MH chain
proceeds as follows:

1. Generate a candidate value of a parameter $\theta^{new}$ using the old value plus a symmetric disturbance: $\theta^{new} = \theta^{old} + N(0, \tau^2)$, where $\tau^2$ is specified by the analyst so that 30–50 percent of the candidates are accepted.
2. Compute the acceptance probability $\alpha = \min\left\{1, \frac{p(\theta^{new} \mid D)}{p(\theta^{old} \mid D)}\right\}$.
3. Accept the new draw of $\theta$ with probability $\alpha$: draw a Uniform(0,1) random variable $U$ and if $U < \alpha$ accept the draw. Otherwise, retain the old value of $\theta$ and proceed to the next draw in the recursion.
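The sketch below implements steps 1–3 for the Beta posterior of the binomial example above, so the simulated draws can be checked against the known exact answer; all settings are invented.

```python
# Minimal sketch of the random-walk Metropolis-Hastings recursion, targeting
# the Beta(n+a, T-n+b) posterior of the binomial example. Invented settings.
import numpy as np

a, b, T, n = 2.0, 2.0, 20, 6          # invented prior settings and data

def log_post(theta):
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf                # zero posterior outside (0, 1)
    return (n + a - 1) * np.log(theta) + (T - n + b - 1) * np.log(1 - theta)

rng = np.random.default_rng(5)
tau, theta, draws, accepted = 0.2, 0.5, [], 0
for _ in range(20_000):
    cand = theta + tau * rng.normal()                            # step 1
    alpha = np.exp(min(0.0, log_post(cand) - log_post(theta)))   # step 2
    if rng.uniform() < alpha:                                    # step 3
        theta, accepted = cand, accepted + 1
    draws.append(theta)

print("acceptance rate:", accepted / len(draws))
print("posterior mean, MCMC vs exact:",
      round(float(np.mean(draws[2000:])), 3), round((n + a) / (a + b + T), 3))
```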

To understand why the MH algorithm works it is first necessary to
describe a Markov chain more formally and then to establish two facts
about them with regard to their stationary, long-run distributions and the
property of time reversibility. A Markov chain is a stochastic process that
describes the evolution of random variables by specifying transition prob-
abilities of moving from one realization to the next. The simplest Markov
chain contains just two states, or values, that a variable can assume and
has a matrix of transition probabilities:
$$P = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix}$$
where $p_{ij}$ is the probability of moving from state $i$ to state $j$, and the
probabilities in each row sum to one, e.g., $p_{11} + p_{12} = 1$. The transition
probability $p_{ii}$ is the probability of staying in state $i$. If the probability
of being in each of the two states is initially $\pi_0 = (0.7, 0.3)$, then the
state probabilities after one iteration of the Markov chain are:

$$\pi_1 = \pi_0 P = [\,0.7 \;\; 0.3\,] \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = [\,0.7 p_{11} + 0.3 p_{21} \;\;\; 0.7 p_{12} + 0.3 p_{22}\,]$$

The transition matrix $P$ is therefore the key component of the Markov
chain as it describes how the state probabilities change over time. If

$$P = \begin{bmatrix} 0.50 & 0.50 \\ 0.25 & 0.75 \end{bmatrix}$$
then

$$\pi_1 = \pi_0 P = [\,0.7 \;\; 0.3\,] \begin{bmatrix} 0.50 & 0.50 \\ 0.25 & 0.75 \end{bmatrix} = [\,0.425 \;\; 0.575\,]$$

and we can see that the probability of being in the second state increases
from 0.30 to 0.575. As the chain continues to iterate, the state probabilities
will converge to long-run or steady-state probabilities:

$$\pi_1 = \pi_0 P \qquad \pi_2 = \pi_1 P = \pi_0 P^2 \qquad \ldots \qquad \pi_r = \pi_0 P^r$$

and the effects of the starting distribution $\pi_0$ would wear off. The chain
will converge to what is known as the stationary distribution, $\pi$, defined
such that:

MIZIK_9781784716745_t.indd 188 14/02/2018 16:38


Bayesian econometrics  ­189

\[ \pi P = \pi \]

For the transition matrix P defined above, it can be verified that the long-run stationary distribution is π = [1/3  2/3], which is obtained regardless of the initial probabilities π_0.
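This convergence is easy to check numerically. A minimal Python sketch, using the P defined above:

import numpy as np

P = np.array([[0.50, 0.50],
              [0.25, 0.75]])
pi = np.array([0.7, 0.3])      # initial state probabilities pi_0
for _ in range(50):
    pi = pi @ P                # pi_{r+1} = pi_r P
print(pi)                      # -> [0.3333... 0.6666...], whatever pi_0 was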
The goal of the MH algorithm is to construct a Markov chain with sta-
tionary distribution equal to the posterior distribution of a specific model.
This is accomplished by making the chain time-reversible with respect
to the posterior. A time reversible chain is one where the probability of
moving from state i to state j is the same as moving from state j to state i.
At any point in time, the probability of seeing an i→j transition is π_i p_ij, and so a chain is time reversible if

\[ \pi_i p_{ij} = \pi_j p_{ji} \]

Furthermore, since the row probabilities in the transition matrix P sum to one, we have:

\[ \sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j \sum_i p_{ji} = \pi_j \]

or

\[ \pi P = \pi \]

In other words, π is the stationary distribution.


The property of time reversibility provides us with an alternative to the complicated task of searching for the transition matrix P with the stationary distribution we desire, i.e., the posterior distribution of our model. Instead of a direct search for P, we can use the property of time reversibility to modify an arbitrary chain with a transition matrix Q so that it produces the stationary distribution we desire. This is accomplished by modifying the transition probabilities of an arbitrary "candidate-generating" distribution q_ij such that:

\[ p_{ij} = q_{ij}\,\alpha(i, j) \]

where

\[ \alpha(i, j) = \min\left\{ 1, \frac{\pi_j q_{ji}}{\pi_i q_{ij}} \right\} \]

That is, a candidate state value is generated according to the transition matrix Q and accepted with probability α. With probability 1 − α the candidate value is rejected and the old value is retained. This algorithm results in a Markov chain with stationary distribution π.

We prove this assertion by showing that an i→j transition is equal to a j→i transition with respect to π regardless of the candidate-generating distribution Q.
\[ \pi_i p_{ij} = \pi_i q_{ij} \min\left\{ 1, \frac{\pi_j q_{ji}}{\pi_i q_{ij}} \right\} = \min\{\pi_i q_{ij},\, \pi_j q_{ji}\} \]

\[ \pi_j p_{ji} = \pi_j q_{ji} \min\left\{ 1, \frac{\pi_i q_{ij}}{\pi_j q_{ji}} \right\} = \min\{\pi_j q_{ji},\, \pi_i q_{ij}\} \]
The right-hand sides of the above expressions are the same, and therefore π_i p_ij = π_j p_ji and the resulting Markov chain has stationary distribution π. If we select π as the posterior distribution of our model, and we regard the "states" of the stochastic process as the possible values that our model parameters can assume, then the resulting Markov chain will simulate draws from the posterior distribution π. All that is needed is the ability to evaluate the posterior distribution up to the constant of proportionality that cancels from the numerator and denominator of the above expression:

\[ \pi_i \propto p(D \mid \theta_i)\, p(\theta_i) \]

The candidate-generating probabilities q_ji and q_ij cancel in the expression for α for the random-walk MH chain that employs a symmetric distribution (e.g., the Normal distribution) to generate the candidates. If θ_j = θ_i + ε with ε symmetric, then the resulting transition probabilities are such that q_ij = q_ji. Other variants of the MH algorithm generate candidate values of θ in other ways, and can result in faster convergence and better mixing properties of the Markov chain. These versions do not result in q_ij = q_ji and lead to different values of α. The Gibbs sampler can be shown to be a special case of the MH algorithm with α = 1. However, regardless of which variation of the MH algorithm is employed, the result is a general method of employing a Markov chain to simulate draws from the posterior distribution of model parameters.
As noted, the Gibbs sampler is a special case of the Metropolis–Hastings algorithm. Its candidate-generating mechanism differs from the random-walk mechanism described above in that candidates are generated sequentially from the full conditional distributions of the posterior, and it can be shown (see Rossi, Allenby and McCulloch, 2005) that the acceptance probability for the Gibbs sampler (α) is always equal to one.


Bayes in Marketing

Bayesian statistics has made significant inroads into marketing because of its ability to deliver exact, small-sample inference in a scarce data environment characterized by discrete outcomes and heterogeneous decision makers. We discuss three aspects of Bayesian analysis in marketing: models of decision making, models of heterogeneity, and models that examine the loss function or optimal decisions that flow from Bayesian analysis.

Unit-level Models

Marketing data at the disaggregate consumer level are characterized by having many zeros, indicating consumers not doing something. The standard regression model described earlier is not appropriate for disaggregate analysis because the regression error term is assumed to follow a continuous distribution, which is not consistent with the dependent variable having a mass buildup at zero. However, the regression model can be modified in various ways to deal with the discreteness of marketing data.
The simplest example is to assume that the output from a regression model
is not directly observed, and that what is observed is a censored realization
of a continuous latent variable:

\[ z = X\beta + \varepsilon, \quad \varepsilon \sim N(0, \sigma^2) \]

and

\[ y = \begin{cases} 0 & \text{if } z \le 0 \\ 1 & \text{if } z > 0 \end{cases} \]

where the indicator variable y takes on values depending on the latent variable z. This model, known as a binomial probit model, is useful when modeling yes/no decisions of consumers.
Another example of a censored regression model is the Tobit model:

\[ y = \begin{cases} 0 & \text{if } z \le 0 \\ z & \text{if } z > 0 \end{cases} \]

which is used in regression analysis when the data take on positive values with a mass buildup at zero. A final example is the ordered probit model used in the analysis of ranked outcome data:

\[ y = r \quad \Longleftrightarrow \quad c_{r-1} \le z < c_r \]

where the observed data take on integer values depending on the relationship of the censored regression value and the cutoff values {c_r}. This model is often used to model integer data from fixed-point rating scales found in customer satisfaction data.
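The three models share the same latent regression and differ only in the censoring rule applied to z. A short data-generating sketch in Python makes this explicit; the coefficients and cutoff values are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
z = 0.5 + 1.2 * x + rng.normal(size=n)       # latent regression z = X beta + e

y_probit = (z > 0).astype(int)               # binomial probit: censor to {0, 1}
y_tobit = np.where(z > 0, z, 0.0)            # Tobit: mass buildup at zero
cuts = np.array([0.0, 1.0, 2.0])             # interior cutoffs {c_r}
y_ordered = np.digitize(z, cuts) + 1         # ordered probit: y = r iff c_{r-1} <= z < c_r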
The above models are all examples of hierarchical models that can be
written in the form:

y | z
z | x, β

where all information in the data (y) is transmitted to the model param-
eters through the latent variable z. In other words, y and β are independent
of each other, given z. Models employing conditional independence are
known as hierarchical models, and as we will see below they are particu-
larly well suited to be estimated by Bayesian MCMC methods.
We can write our model using brackets to denote distributions as:

[y | z] [z | x, β] [β]

where the first factor is the censoring mechanism, the second factor is the latent regression and the third factor is the prior on β.
The traditional analysis of these models typically integrates the latent
variable z from the model likelihood and finds the parameter values that
maximize the probability of the observed data. The Bayesian analysis of
censored regression models differs in that the latent variable z is intro-
duced as a latent object of interest and Bayes theorem is used to obtain
parameter estimates. The Gibbs sampler for this model involves first
generating draws from the full conditional distribution of z given all other
parameters:

1. [z | else] ∝ [y | z] [z | x, β]

which takes the form of a censored normal distribution, being greater than zero when y equals one, and negative when y is equal to zero. The second step in model estimation involves a draw of the latent regression coefficients:

2. [β | else] ∝ [z | x, β] [β]

which are draws from the standard regression model conditional on the
previous draws of z. The advantage of Bayesian estimation is seen here
in two ways: (1) the MCMC iterations involve simplified portions of the
entire likelihood involving only the parameter of interest; and (2) draws of

latent variables such as z depend entirely on the Bayes theorem to determine the distribution from which to draw, or equivalently the acceptance probability in the general MCMC procedure.
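To make the two steps concrete, the following Python sketch implements this Gibbs sampler for the binomial probit model on simulated data; the prior precision A and the data-generating values are hypothetical, chosen only so that the example is self-contained:

import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)

# simulated data: y_i = 1 if x_i'beta + e_i > 0, e_i ~ N(0, 1)
n, beta_true = 500, np.array([-0.5, 1.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

A = 0.01 * np.eye(2)                 # prior precision: beta ~ N(0, A^{-1})
V = np.linalg.inv(X.T @ X + A)       # posterior covariance of [beta | z]
L = np.linalg.cholesky(V)
beta, draws = np.zeros(2), []
for r in range(2000):
    # step 1: [z | else] -- normal around X beta, truncated by the sign of y
    mu = X @ beta
    lo = np.where(y == 1, -mu, -np.inf)       # z > 0 when y = 1
    hi = np.where(y == 1, np.inf, -mu)        # z <= 0 when y = 0
    z = truncnorm.rvs(lo, hi, loc=mu, scale=1.0, random_state=rng)
    # step 2: [beta | else] -- standard Bayes regression draw given z
    beta = V @ (X.T @ z) + L @ rng.normal(size=2)
    draws.append(beta)
print(np.mean(draws[500:], axis=0))           # posterior mean after burn-in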
Unit-level models in marketing include anything related to consumer
response, including the choice of products in the marketplace and opinions
expressed in surveys. The ease of Bayesian estimation stems from the need
to only evaluate the prior and likelihood at specific proposed parameter
values, and does not require the computation of a gradient and Hessian
as in maximum likelihood estimation. In addition, hierarchical models allow condition-
ing on other model parameters, such as latent utilities, that simplify the
computations.
Bayesian models of consideration sets (Gilbride and Allenby 2004;
Terui et al. 2011), and economic models of choice involving satiation
(Kim et al. 2002), multiple constraints (Satomura et al. 2011), indivisibility
(Lee and Allenby, 2014), kinked budget sets (Howell et al. 2015), and
complementary products (Lee et al. 2013) provide examples of the versatil-
ity of Bayesian methods in dealing with complex computational issues in
model development. Outside of formal choice models, Bayesian methods
have been used to address the analysis of data collected on fixed point
ratings scales (Rossi et al. 2001; Büschken et al. 2013) and other scales (see
Marshall and Bradlow 2002).

Heterogeneity

Bayesian methods have made the biggest impact in marketing by allowing models of demand and demand formation to vary by respondent through
variation in model coefficients. Bayesian methods offer a flexible set of
tools for allowing consumers to be represented with unique tastes, prefer-
ences and sensitivities to variables like prices. The advantage of Bayesian
models of heterogeneity is their ability to pool information across indi-
viduals while not requiring that all respondents have the same model
coefficients.
A challenge in allowing for heterogeneity in Bayesian models is in
specifying the prior distribution of the parameters for the cross-sectional
units {θ_i}. It has proven to be beneficial to specify the prior distributions
across the cross-sectional units hierarchically, using a random-effects
model.

\[ p(\theta_1, \ldots, \theta_I, \tau \mid h) \propto \left[ \prod_i p(\theta_i \mid \tau) \right] \times p(\tau \mid h) \]

where τ are referred to as hyper-parameters because they describe the variation of other parameters and not variation of the observed data.


A multivariate Normal distribution is often used for the distribution of heterogeneity:

\[ p(\theta_i \mid \tau) = \text{Normal}(\tau = \{\bar{\theta}, V_\theta\}) \]

and the hyper-parameters τ are the mean and covariance matrix of the Normal distribution. An additional prior distribution is provided on the hyper-parameters, so that the analyst is not forced to specify the location and variability of the distribution of heterogeneity; these are instead inferred from the data. The parameters of the prior distribution of hyper-parameters, h, are specified by the analyst and are not estimated from the data.
In non-Bayesian analysis, models with cross-sectional variation of model parameters are known as random-effects models. Since parameters are viewed as fixed but unknown constants in the classical statistics paradigm, the random effects {θ_i} are typically integrated out of the model to obtain the marginal likelihood of the data given the hyper-parameters:

\[ p(D \mid \tau) = \prod_i \int p(y_i \mid \theta_i)\, p(\theta_i \mid \tau)\, d\theta_i \]

In this formulation, the first-stage prior is viewed as part of the likelihood instead of as the first stage of the prior distribution. The distinction
between likelihood and prior is blurred in non-Bayesian analysis and both
are considered part of the model. The resulting marginalized likelihood
is a function of the “fixed but known” hyper-parameters. A challenge in
conducting inference about the hyper-parameters is that the marginalized
likelihood involves an integral that can sometimes be difficult to evaluate.
The Bayesian analysis of the random-effect model includes the prior
on the hyper-parameters and does not involve any marginalization of the
likelihood. Instead, all parameters are viewed as latent, unobserved quanti-
ties and Bayes theorem is used to derive their posterior distribution:

\[ p(\{\theta_i\}, \tau \mid D, h) \propto \left[ \prod_i p(y_i \mid \theta_i)\, p(\theta_i \mid \tau) \right] \times p(\tau \mid h) \]

MCMC methods are then used to generate draws from the high-
dimensional posterior distribution of all model parameters. The posterior
distribution then needs to be marginalized to obtain the posterior distribu-
tion of any particular parameters, e.g.,

\[ p(\theta_1 \mid D, h) = \int p(\{\theta_i\}, \tau \mid D, h)\, d\theta_{-1}\, d\tau \]

where θ_{−1} denotes the set {θ_i} except for the first respondent. Fortunately, this integration is easy to evaluate with the MCMC estimator

by ignoring, or discarding, the draws of parameters which are not of interest. The posterior distribution of any specific parameter in the joint poste-
rior distribution is obtained by running the Markov chain and saving the
specific parameter draws of interest. The posterior distribution of specific parameters, or of functions of specific parameters (e.g., market share estimates), is obtained by building up the posterior distribution of interest from the draws of the full joint posterior distribution.
The Markov chain Monte Carlo algorithm for the random-effects
model proceeds in two steps:

1. Generate draws of {θ_i}: [θ_i | else] ∝ [y_i | θ_i] [θ_i | τ], i = 1, . . ., I
2. Generate draws of the hyper-parameters τ: [τ | else] ∝ [∏_i [θ_i | τ]] [τ | h]
The presence of conditional independence associated with the hierarchical specification of the model leads to simplification of the draws: the first draw does not depend on the parameters of the second-stage prior [τ | h], and the second draw does not depend on the data [y_i | θ_i].
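A minimal Python sketch of this two-step recursion, under deliberately simple assumptions (normal data y_it ~ N(θ_i, s²) with known variances and a flat prior on τ), illustrates the alternation:

import numpy as np

rng = np.random.default_rng(2)

# illustrative data: I units, T observations each
I, T, s2, V, tau_true = 100, 5, 1.0, 0.5, 2.0
theta_true = rng.normal(tau_true, np.sqrt(V), I)
y = rng.normal(theta_true[:, None], np.sqrt(s2), (I, T))
ybar = y.mean(axis=1)

tau, taus = 0.0, []
for r in range(1000):
    # step 1: [theta_i | else] -- conjugate normal, combining y_i and the prior
    post_var = 1.0 / (T / s2 + 1.0 / V)
    post_mean = post_var * (T * ybar / s2 + tau / V)
    theta = rng.normal(post_mean, np.sqrt(post_var))
    # step 2: [tau | else] -- depends only on the theta_i, not on the data
    tau = rng.normal(theta.mean(), np.sqrt(V / I))
    taus.append(tau)
print(np.mean(taus[200:]))    # posterior mean of tau, close to tau_true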
Historically, the normal model for heterogeneity has been employed.
That is, p(θ_i | τ) is a normal distribution (see Allenby and Rossi 1999). While the normal distribution is a flexible distribution allowing for arbitrary location, scale and correlation, there are several notable limitations of
the normal distribution as used in marketing applications. For example,
consider the distribution of marketing mix sensitivities over consumers.
We observe, for example, that some consumers are extremely sensitive to
price while other consumers are virtually insensitive to price. This gives
rise to highly skewed distributions of price sensitivity across customers.
In many unit-level models, there are brand or product specific intercept
parameters. For many products, we might expect that the distribution of
brand preferences might exhibit more than one mode. For example, there
might be one mode corresponding to those who have a strong preference
for the product over others and another mode near zero, which represents
the group of consumers who regard the focal product as similar to others
in the product category. One approach for dealing with weak preferences
is to employ a model of heterogeneous variable selection (Gilbride et al.
2006). An alternative flexible generalization of the normal distribution is
a mixture of normal distributions (see Allenby et al. 1998; Chandukala et al. 2011).
With even a relatively small number of components, mixtures of normal
distributions can easily accommodate skewness and multimodality. In a
Bayesian context, proper priors on the mixture of normal components
and mixture probabilities enforce strong shrinkage and parsimony on the
resulting mixture. This means that a full Bayesian approach to a mixture of normals can accommodate a large or even, potentially, infinite number
of mixture components. Rossi (2014) provides a self-contained discussion
of mixture models including both finite and infinite mixtures of normals.
Allenby et al. (2014) show that using a mixture of normal distributions can yield materially different conclusions in the valuation of product features.
Kim et al. (2013) demonstrate the usefulness of combining mixture models
and variable selection methods.
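As a simple illustration of this flexibility, even a two-component mixture produces a bimodal distribution of a brand intercept across consumers; the weights and component parameters below are purely illustrative:

import numpy as np

rng = np.random.default_rng(3)
comp = rng.choice(2, size=10_000, p=[0.4, 0.6])      # component indicators
alpha_i = np.where(comp == 0,
                   rng.normal(2.5, 0.5, 10_000),     # strong-preference mode
                   rng.normal(0.0, 0.3, 10_000))     # weak-preference mode near zero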

Decision Theory and Strategic Covariates

Decision theory is one of the most powerful aspects of the Bayesian para-
digm. Bayesian decision theory identifies the optimal action as the one
that minimizes expected posterior loss, where the loss function can be
broadly construed and can include aspects of profits and consumer utility.
We note that the loss function is completely distinct from the likelihood
or model that is assumed to generate the data. The posterior distribu-
tion arises from the prior and the likelihood, and the loss function can be
chosen completely distinct from the process assumed to generate the data.
A special case of decision theory is model selection. If we assume that
the loss function is a 0–1 binary function for choosing the correct model,
then the best model is the one that maximizes the posterior probability
of the model being correct. The posterior probability of a model can be
calculated using the Bayes theorem:
\[ p(M_m \mid D) = \frac{p(D \mid M_m)\, p(M_m)}{p(D)} \]
where M_m denotes model m. The posterior model probabilities are often expressed in terms of a posterior odds ratio that compares two models against each other:

\[ \frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{p(D \mid M_1)}{p(D \mid M_2)} \times \frac{p(M_1)}{p(M_2)} \]
which equals the Bayes factor multiplied by the prior odds of the models. The Bayes factor is the ratio of the marginal distributions of the data, where the marginal distribution is the average of the likelihood with respect to the prior:

\[ p(D \mid M_m) = \int p(D \mid M_m, \theta)\, p(\theta)\, d\theta \]

Calculating the marginal distribution of the data can be difficult, and there exist numerical methods (Schwarz 1978; Newton and Raftery 1994; Gelfand and Dey 1994) for its evaluation.
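The integral also suggests the simplest (if noisy) simulation-based estimate: average the likelihood over draws from the prior. A Python sketch under an illustrative normal model and normal prior:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
D = rng.normal(1.0, 1.0, size=20)        # illustrative data

def marg_lik(prior_mean, prior_sd, R=100_000):
    theta = rng.normal(prior_mean, prior_sd, R)              # theta ~ p(theta)
    loglik = norm.logpdf(D[:, None], theta, 1.0).sum(axis=0)
    return np.exp(loglik).mean()                             # E_prior[ p(D | theta) ]

bayes_factor = marg_lik(1.0, 1.0) / marg_lik(0.0, 1.0)       # M1 vs. M2
posterior_odds = bayes_factor * 1.0                          # times prior odds

In practice the cited methods (e.g., the Newton–Raftery or Gelfand–Dey estimators) are preferred, since this naive average can have very high variance.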
Decision theory can be used to obtain optimal marketing decisions or

actions (x) by considering the outcomes (y) conditional on actions and parameter values (θ). The goal is to find the action that maximizes the objective function (π), using the posterior distribution of parameters to predict outcomes that can be valued in terms of π:

\[ \max_x \pi(x) = E_\theta\left[ E_{y \mid x, \theta}\left[ \pi(y \mid x, \theta) \right] \right] \]
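A Python sketch of this maximization: expected profit is approximated by averaging the predicted outcome over posterior draws of θ and searching over a grid of actions. The logit demand, unit cost, and posterior draws below are illustrative placeholders, not output from a fitted model:

import numpy as np

rng = np.random.default_rng(5)
theta_draws = rng.multivariate_normal([2.0, -1.5], 0.05 * np.eye(2), 1000)
cost, prices = 1.0, np.linspace(1.0, 4.0, 61)

def expected_profit(x):
    a, b = theta_draws[:, 0], theta_draws[:, 1]
    share = np.exp(a + b * x) / (1 + np.exp(a + b * x))   # E_{y|x,theta} of demand
    return np.mean((x - cost) * share)                    # averaged over theta draws

best_price = prices[np.argmax([expected_profit(x) for x in prices])]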

A marketing example of the application of Bayesian decision theory is discussed by Rossi, McCulloch and Allenby (1996) in the context of disag-
gregate couponing strategies, where the retailer can determine to whom to
offer a discount and the extent of the discount.
Many of the decisions made by marketers affect the variables typically
viewed as explanatory in models of demand and sales. Price is an example
of a variable that can be optimized using Bayesian decision theory by
forecasting profits associated with different values of price. When this
occurs, price can no longer be considered an independent variable since
it is determined, in part, by the same parameters as in the conditional
demand model. Models for such strategically determined covariates
involve a multi-equation likelihood with shared parameters (e.g., sales as a function of θ and price also as a function of θ). Manchanda et al. (2004)
provide an example of dealing with strategically determined covariates
where salesforce effort is a function of expected return.
The Bayesian analysis of demand and supply, and simultaneous systems
in general, provide a rich area for future research. Otter et al. (2011) dis-
cuss inferences about supply-side issues, and Yang et al. (2003) discuss an
analysis involving supply-side pricing behavior. The field of strategy has
historically focused on decisions of the firm (i.e., the supply side), while
marketing has focused on decisions of the consumer (demand side). A rich
set of issues for study is present at the intersection of these fields.

Concluding Comments

Bayesian methods have made great inroads in marketing because of the need to work with disaggregate data and the decision orientation of our field. Bayesian analysis delivers exact, finite-sample inference in sparse data environments, and the presence of a prior distribution serves to stabilize inference to avoid likelihoods with unbounded values.
The practicality of the Bayesian approach to inference has influenced
not only the academic marketing literature but also industry practices.
Each year many thousands of conjoint studies are designed and analyzed
using Bayesian methods and are applied to a wide range of industries

and marketing problems, such as design of new products, forecasting
demand for new products, and optimal pricing of existing products. The
industry leader in conjoint software, Sawtooth Software Inc., provides
a full Bayesian treatment for the most popular Choice-Based Conjoint
Model. General purpose software such as R and SAS includes extensive
implementations of Bayesian approaches to the analysis of choice data.
Bayesian methods have also been picked up in the area of marketing mix
optimization and advertising attribution. Here observational data is used
to build models that attempt to estimate the effects of exposure to different
sorts of advertising such as TV, print and digital. These effect estimates are
then used to consider re-allocation of the marketing budget of a firm to
various modes of advertising. Practitioners have recognized that Bayesian
methods for estimating marketing mix models and aggregate demand
models in general are superior to standard non-Bayesian methods. Other
firms that seek to optimize retail pricing and promotion also use Bayesian
methods due to the very large number of parameters in pricing and promo-
tion models and the relative sparseness of the data. The superior sampling
properties of Bayes estimators are often overlooked in the academic litera-
ture, which tends to be more focused on model specification. In practice,
however, obtaining reasonable and reliable estimates is very important.
In sum, Bayesian methods have had considerable influence on both aca-
demic and industry researchers. The appeal is a practical one, motivated
by superior inference capabilities and the ease by which it is possible to
analyze almost any model that can be specified by a researcher.

References

Allenby, Greg M., Neeraj Arora and James L. Ginter (1998) “On the Heterogeneity of
Demand,” Journal of Marketing Research, 35, 384–389.
Allenby, Greg M. and Peter E. Rossi (1999) “Marketing Models of Consumer Heterogeneity,”
Journal of Econometrics, 89, 57–78.
Allenby, Greg M., Jeff D. Brazell, John R. Howell and Peter E. Rossi (2014) “Economic
Valuation of Product Features,” Quantitative Marketing and Economics, 12, 421–456.
Berger, James O. and Robert L. Wolpert (1988) The Likelihood Principle. Institute of Mathematical Statistics Lecture Notes – Monograph Series, Vol. 6.
Büschken, Joachim, Thomas Otter and Greg M. Allenby (2013) “The Dimensionality
of Customer Satisfaction Survey Responses and Implications for Driver Analysis,”
Marketing Science, 32(4), 533–553.
Chandukala, Sandeep, Yancy Edwards and Greg M. Allenby (2011) “Identifying Unmet
Demand,” Marketing Science, 30(1), 61–73.
Chib, Siddhartha and Edward Greenberg (1995) “Understanding the Metropolis-Hastings
Algorithm,” American Statistician, 49(4), 327–335.
Gelfand, Alan E. and Dipak K. Dey (1994) “Bayesian Model Choice: Asymptotics and Exact Calculations,” Journal of the Royal Statistical Society, Series B (Methodological), 501–514.


Gilbride, Timothy J. and Greg M. Allenby (2004) “A Choice Model with Conjunctive,
Disjunctive, and Compensatory Screening Rules,” Marketing Science, 23(3), 391–406.
Gilbride, Timothy J., Greg M. Allenby and Jeff Brazell (2006) “Models of Heterogeneous
Variable Selection,” Journal of Marketing Research, 43, 420–430.
Hansen, Lars Peter (1982) “Large Sample Properties of Generalized Method of Moments
Estimators,” Econometrica: Journal of the Econometric Society, 50(4), 1029–1054.
Howell, John R., Sanghak Lee and Greg M. Allenby (2015) “Price Promotions in Choice
Models,” Marketing Science, 35(2), 319–334.
Kim, Sunghoon, Simon J. Blanchard, Wayne S. DeSarbo and Duncan K.H. Fong (2013)
“Implementing Managerial Constraints in Model-Based Segmentation: Extension of Kim,
Fong, and DeSarbo (2012) with an Application to Heterogeneous Perceptions of Service
Quality,” Journal of Marketing Research, 50, 664–673.
Kim, Jaehwan, Greg M. Allenby and Peter E. Rossi (2002) “Modeling Consumer Demand
for Variety,” Marketing Science, 21(3), 229–250.
Lee, Sanghak, Jaehwan Kim and Greg M. Allenby (2013) “A Direct Utility Model for
Asymmetric Complements,” Marketing Science, 32(3), 454–470.
Lee, Sanghak and Greg M. Allenby (2014) “Modeling Indivisible Demand,” Marketing
Science, 33(3), 364–381.
Manchanda, Puneet, Peter E. Rossi and Pradeep K. Chintagunta (2004) “Response Modeling
with Nonrandom Marketing-mix Variables,”  Journal of Marketing Research,  41(4),
467–478.
Marshall, Pablo and Eric T. Bradlow (2002) “A Unified Approach to Conjoint Analysis
Models,” Journal of the American Statistical Association, 97(459), 674–682.
Newton, Michael A. and Adrian E. Raftery (1994) “Approximate Bayesian Inference with the Weighted Likelihood Bootstrap,” Journal of the Royal Statistical Society, Series B (Methodological), 3–48.
Otter, Thomas, Timothy J. Gilbride and Greg M. Allenby (2011) “Testing Models of
Strategic Behavior Characterized by Conditional Likelihoods,” Marketing Science, 30(4),
686–701.
Rossi, Peter E. (2014) Bayesian Semi-Parametric and Non-Parametric Methods in Marketing
and Micro-Econometrics. Princeton, NJ: Princeton University Press.
Rossi, Peter E., Zvi Gilula and Greg M. Allenby (2001) “Overcoming Scale Usage
Heterogeneity: A Bayesian Hierarchical Approach,” Journal of the American Statistical
Association, 96, 20–31.
Rossi, Peter E., Robert E. McCulloch and Greg M. Allenby (1996) “The Value of Purchase
History Data in Target Marketing,” Marketing Science, 15, 321–340.
Rossi, Peter E., Greg M. Allenby and Robert McCulloch (2005) Bayesian Statistics and
Marketing. New York: John Wiley & Sons.
Satomura, Takuya, Jaehwan Kim and Greg M. Allenby (2011) “Multiple Constraint Choice
Models with Corner and Interior Solutions,” Marketing Science, 30(3), 481–490.
Schwarz, Gideon (1978) “Estimating the Dimension of a Model,” Annals of Statistics, 6(2),
461–464.
Terui, Nobuhiko, Masataka Ban and Greg M. Allenby (2011) “The Effect of Media
Advertising on Brand Consideration and Choice,” Marketing Science, 30(1), 74–91.
Yang, Sha, Yuxin Chen and Greg M. Allenby (2003) “Bayesian Analysis of Simultaneous
Demand and Supply,” with discussion, Quantitative Marketing and Economics, 1,
251–304.
Zellner, Arnold (1971) An Introduction to Bayesian Inference in Econometrics. New York:
John Wiley & Sons.



9.  Structural models in marketing
Pradeep K. Chintagunta

Over the past two decades structural models have come into their own in
empirical research in marketing.1 The basic notion of appealing to eco-
nomic theory when building models of consumer (e.g., Guadagni and
Little 1983) and firm behavior (Horsky 1977; Horsky and Nelson 1992) in
marketing has been around for much longer than that. Yet, this idea has
come to the forefront as authors have confronted the challenges associated
with drawing inferences from purely statistical relationships governing
the behaviors of the agents of interest. While these relationships provide
important insights into the correlational structure underlying the data,
they are less useful when one is interested in quantifying the consequences
of a change in either the structure of the market (e.g., what happens when
a retailer closes down its bricks-and-mortar operations to focus solely on
online sales) or in the nature of conduct of one or more players in that
market (e.g., what happens to prices of car insurance when consumers
change the ways in which they search for these prices). Since the econom-
ics underlying the conduct or the behavior of agents in the presence of the
structure are not explicitly built into models that only focus on describing
statistical relationships between agents’ actions and outcomes, it is diffi-
cult if not impossible for those models to provide a prediction when one of
these dimensions actually changes in the marketplace.
As marketers move away from being focused only on “local” effects of
marketing activities, e.g., what happens when I change price by 1 percent,
in order to better understand the consequences of broader shifts in policy,
the need for structural models has also grown. In this chapter, I will
focus on a small subset of such “structural models” and provide brief
discussions of what we mean by structural models, why we need them, the
typical classes of structural models that we see being used by marketers
these days, along with some examples of these models. My objective is
not to provide a comprehensive review. Such an endeavor is far beyond
my current purview. Rather, I would like to provide a basic discussion of
structural models in the context of the marketing literature. In particular,
to keep the discussion focused, I will limit myself largely to models of
demand rather than models of firm behavior.




What is a structural model?

The definition and key elements of a structural model have been well
established, at least since the important chapter by Reiss and Wolak
(2007). Other papers by Kadiyali et al. (2001), Chan et al. (2009) and
Chintagunta et al. (2004, 2006) have also stayed close to this early work.
And I will not depart in any significant way from the work that precedes this chapter, and will draw heavily from it. In simple
terms, a structural model is an empirical model; one that can be taken
to the data. But it is not any empirical model – since an equation that
establishes a statistical relationship between a set of explanatory variables
and an outcome variable is also an empirical model. What distinguishes a
structural model is that the relationship between explanatory and outcome variables is based on theory – most often economic theory – although
it is not limited just to economic principles and can encompass theories
from other disciplines such as psychology as well (Erdem et al. 2005). The
theory for its part makes a prediction about the behavior of some set of
economic agents (consumers, firms, etc.) and thereby governs how the
outcome variable of interest is influenced by the explanatory variables.
Thus the key ingredients of the model are the (economic) agents involved;
the nature of their behavior (optimizing, satisficing, and so on); and the
relationships between explanatory and outcome variables ensuing from
such behavior. These ingredients stem from the researcher’s beliefs about
how they map onto the specific context of interest.2
Since theories make specific predictions, it is unlikely that these predic-
tions about the explanatory and outcome variables can perfectly rational-
ize the actual data one observes on these variables in the market. The
link between the predictions of the model and the observed outcome data
is provided by the “unobservables” in the model. These unobservables
essentially allow us to convert the economic (or some other theory-based)
model into an econometric model, i.e., the final empirical model that we
take to the data. These unobservables get their nomenclature from vari-
ables that are unobserved to us as researchers but are, in general, known
to the agents whose behavior is being modeled.
As Reiss and Wolak point out, these unobservables can be of different
forms. First, we have “structural” error terms – variables that belong to
the set of explanatory variables in the economic model but constitute the
subset that we do not observe as researchers. For example, we know that
shelf space and shelf location are important determinants of a brand’s
market share in addition to price and advertising. But in many situations
we do not have access to data on these variables. In such situations they
become part of the unobservables and constitute “structural” error in the


sense that they are directly related to the theory we are trying to create an
empirical model for.
The second set of unobservables has a very long history in the marketing
literature – unobserved heterogeneity. These unobservables help explain
differences in the relationship between the explanatory and outcome
variables across different agents whose behavior is being characterized
by the structural model. For instance, when looking at brand choice
behavior, data patterns might reveal that one consumer is very price
sensitive whereas another consumer is not. By allowing the consumers’
utility parameters to differ from one another we can capture some of the
deviations between the theoretical model and the data on hand across
consumers in the market.
The third set of unobservables comes about in structural models that
involve agent uncertainty about a specific parameter in the model. In
these models, agents learn about the parameter they are uncertain about
over time but usually have some prior belief about the parameter, often
characterized via a distribution. Learning is a consequence of “signals”
received by the agent (say a consumer) from another agent (say a firm)
or from the environment that allows the former agent to update his/her
belief about the uncertain parameter. As the agent receives more signals,
the uncertainty gets resolved over time. While there exist instances where
the researcher also observes the signals received by the agent, in most
instances this is not the case. In such situations the signals received become
part of the set of unobservables from the researcher’s perspective.
A fourth set of unobservables comes from measurement error. For
instance one might be interested in studying the relationship between the
level of advertising received by a consumer and the purchases that might
be caused by this advertising. In these cases, one observes advertising at a
level different from the exact exposure that the consumer actually receives.
Rather, one might have proxies for advertising such as the expenditure
on that activity in the market where the consumer resides or the average
exposure of the specific demographic profile to which the consumer
belongs. Such errors in the measurement of variables constitute another
unobservable from the researcher’s perspective.

Structural models: A simple illustration

I begin with the classic brand choice model that has been ubiquitous
in marketing and that is based on the model of consumer utility maxi-
mization. I use the framework from Deaton and Muellbauer (1980) or
Hanemann (1984), for a consumer i on purchase occasion t choosing from

among J brands in a category (j = 1, 2, . . ., J). The consumer starts out with a bivariate direct utility function, with one argument being a “quality” (ψ_ijt) weighted sum of the quantities (x_ijt) of each of the brands in the category (Σ_{j=1}^{J} ψ_ijt x_ijt) and the other being the quality-weighted quantity of an “outside” good. When the consumer maximizes this utility function subject to a budget constraint, the condition under which a single brand, j, is picked from the category is given by the following expression (see, e.g., Hanemann 1984):

\[ \frac{p_{jt}}{\psi_{ijt}} = \min\left\{ \min_{k=1,2,\ldots,J}\left( \frac{p_{kt}}{\psi_{ikt}} \right), \frac{1}{\psi_{i0t}} \right\} \]  (9.1)

where p_jt denotes the price of brand j, the price of the outside good has been normalized to 1, and ψ_i0t denotes the quality of the outside good.

The First Unobservable: An Aspect of Quality Known to the Consumer, but Not Observed by the Researcher

Since the quality is a positive quantity, we can define the quality as ψ_ijt = exp(α̃_j + Z_jt β̃ + e_ijt), where α̃_j denotes the intrinsic utility that consumers have for brand j; Z_jt are the marketing variables (other than price) associated with brand j on occasion t; β̃ is the vector denoting the effects of these marketing variables on the indirect utility; and e_ijt denotes other factors that are observed by the consumer but not by the researcher that affect quality for the brand at that occasion for the consumer (some of the “unobservables” referred to earlier). Further, I write the quality of the outside good or the “no purchase” alternative as ψ_i0t = exp(e_i0t). Now taking the logarithm of both sides of equation (9.1) and simplifying, we can write u_ijt as the following equation:

\[ u_{ijt} = \tilde{\alpha}_j + Z_{jt}\tilde{\beta} - \ln(p_{jt}) + e_{ijt} \]  (9.2)

Following in the long tradition of logit models starting with McFadden (1974) in economics and Guadagni and Little (1983) in marketing, I make the assumption that the e_ijt terms (for alternatives 0 through J) have the i.i.d. extreme value distribution with scale factor θ. We can therefore obtain the probability that consumer i chooses brand j on purchase occasion t, Pr_ijt = P(u_ijt ≥ u_ikt, ∀k = 0, 1, 2, . . ., J), as follows:

\[ Pr_{ijt} = \frac{\exp(\alpha_j + Z_{jt}\beta - \theta \ln(p_{jt}))}{1 + \sum_{k=1}^{J} \exp(\alpha_k + Z_{kt}\beta - \theta \ln(p_{kt}))} \]  (9.3)


where α_j (referred to as the intrinsic preference parameter) and β (referred to as the responsiveness parameters) are scaled versions of the original parameters in the quality functions.
Why is the logit model as described above a “structural” model? Recall,
a key ingredient of a structural model is the presence of an economic
agent  – in this case, the consumer. Further, the consumer engages in
optimizing behavior – in this case that of utility maximization. Based on
this behavior, a relationship between outcomes that we observe in the data (purchases of the different brands) and the various explanatory variables such as prices and other marketing variables is obtained.
Estimation of the parameters Θ = {α_j, j = 1, . . ., J; β; θ} of the above model usually proceeds with consumer-level choice data over time. While other approaches have been used as well, a popular means of estimating the model parameters is via maximum likelihood estimation: we write out the joint likelihood of purchases across purchase occasions, brands and consumers that corresponds to the actual purchases one observes in the data and then choose the Θ that maximizes this likelihood function. The model parameters are identified
as follows. The share of purchases in the data corresponding to each brand and to the outside good identifies the α_j parameters, whereas the {β, θ} parameters are identified off how the choices made by consumers vary with changes in the prices and other marketing activities across consumers, brands and purchase occasions. Even in the absence of panel data, i.e., only with consumer choices on one purchase occasion, the parameters are identified due to variation across brands and consumers.
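A self-contained Python sketch of this estimation for the model in (9.3), using simulated data with (log) price as the only covariate; all data-generating values are illustrative:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
N, J = 2000, 3
alpha_true, theta_true = np.array([1.0, 0.5, 0.2]), 2.0
logp = np.log(rng.uniform(0.8, 1.5, (N, J)))
util = alpha_true - theta_true * logp
probs = np.exp(util) / (1 + np.exp(util).sum(axis=1, keepdims=True))
full = np.column_stack([1 - probs.sum(axis=1), probs])   # prepend outside good
y = np.array([rng.choice(J + 1, p=p) for p in full])     # 0 = outside good

def neg_loglik(par):
    alpha, theta = par[:J], par[J]
    v = alpha - theta * logp                              # N x J brand utilities
    denom = 1 + np.exp(v).sum(axis=1)
    num = np.where(y == 0, 0.0, v[np.arange(N), np.maximum(y - 1, 0)])
    return -(num - np.log(denom)).sum()

fit = minimize(neg_loglik, np.zeros(J + 1), method="BFGS")
print(fit.x)     # close to (1.0, 0.5, 0.2, 2.0)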

The Second Unobservable: Consumers are Heterogeneous in Their Preferences and How They Respond to Marketing Activities

The next set of unobservables that we can introduce into the above
model corresponds to the heterogeneity across consumers in their
preference and responses to marketing activities. Accordingly, several
researchers, e.g., Kamakura and Russell (1989), Chintagunta, Jain and
Vilcassim (1991), Gonul and Srinivasan (1993), Rossi, McCulloch and
Allenby (1996), among many others, have allowed Q to vary across
consumers following some distribution (either discrete or continu-
ous) across consumers such that Qi , f (Q) ; where f(.) represents the
density of a specified multivariate distribution. Specifically, when the
parameters are heterogeneous, the consumer’s probability of choosing
brand j can be written as:


\[ Pr_{ijt} = \frac{\exp(\alpha_{ij} + Z_{jt}\beta_i - \theta_i \ln(p_{jt}))}{1 + \sum_{k=1}^{J} \exp(\alpha_{ik} + Z_{kt}\beta_i - \theta_i \ln(p_{kt}))} \]  (9.3)

A popular choice for the distribution of Θ is the multivariate normal, such that Θ_i ~ MVN(Θ̄, W), where Θ̄ denotes the mean vector of the multivariate normal distribution and W is the associated covariance matrix.
Identification of the parameters of this model, in contrast with those
from the previous model, requires the presence of panel data. Why? As before, the mean parameters of the heterogeneity distribution Θ̄ require, in principle, only data such as those required for model (9.3). However, the identification of the parameters of the W matrix comes from how an individual consumer’s purchase shares of the various brands vary across consumers (for the α_j parameters), and how that consumer’s purchases
change with changes in prices and other marketing activities vis-à-vis the
purchases of other consumers. The more the variation in within-consumer
behavior across consumers in the sample, the larger the estimated het-
erogeneity across consumers. However, if the nature of variation for one
consumer is very much like that for any other consumer, then there is little
to distinguish between the behaviors of the different consumers, leading to
the finding of limited heterogeneity in the data.
The estimation of the parameters of this model once again proceeds by
constructing the likelihood function. Since a given consumer is assumed to carry the same vector of Θ parameters across purchase occasions, the
likelihood function is first constructed for an individual consumer across
his or her purchases, conditional on the parameters for that consumer
(which represents a draw from the heterogeneity distribution). The uncon-
ditional likelihood for the consumer is then just the conditional likelihood
integrated over the distribution of heterogeneity across consumers. The
sample likelihood will then be the product of the individual unconditional
likelihoods across consumers.
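A sketch of the unconditional likelihood for a single consumer, with the integral over the heterogeneity distribution approximated by simulation. Here util_fn is a hypothetical user-supplied function returning the T × (J+1) matrix of utilities (outside good included) implied by one draw of Θ_i:

import numpy as np

def consumer_likelihood(util_fn, choices, Theta_bar, W, R=500, seed=0):
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(Theta_bar, W, R)      # draws of Theta_i
    lik = np.zeros(R)
    for r, th in enumerate(draws):
        v = util_fn(th)                                   # T x (J+1) utilities
        pr = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
        lik[r] = pr[np.arange(len(choices)), choices].prod()  # conditional likelihood
    return lik.mean()          # unconditional: averaged over heterogeneity

The sample likelihood is then the product of these unconditional likelihoods across consumers.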
An important point to note is that for the model in (9.2) either with or
without heterogeneity, the model prediction for a given set of marketing
variables and prices will be a probability that a consumer purchases a
brand at that purchase occasion. This prediction will not be perfect since
we as researchers never observe the error term, or unobservable, e_ijt.

A Detour: Discrete-choice Demand Models for Aggregate Data

More recently, the logit model has been used in conjunction with aggre-
gate data – store or market (e.g., Berry 1994; Berry, Levinsohn and Pakes


1995; Nevo 2001; Sudhir 2001) level data. Assuming for now that there is
no heterogeneity in the intrinsic preference or the responsiveness param-
eters, the probability of a consumer purchasing brand j is once again given
by the expression in equation (9.3). Aggregating these probabilities across
all consumers (N) visiting the store or purchasing in that market in a given
time period t (say a week) we can obtain the market share as follows:

\[ S_{jt} = \frac{1}{N} \sum_{i=1}^{N} Pr_{ijt} = Pr_{ijt} = \frac{\exp(\alpha_j + Z_{jt}\beta - \theta \ln(p_{jt}))}{1 + \sum_{k=1}^{J} \exp(\alpha_k + Z_{kt}\beta - \theta \ln(p_{kt}))} \]  (9.4)

The sampling error associated with the share in equation (9.4) is then given as follows:

\[ se_{jt} = \sqrt{\frac{S_{jt}(1 - S_{jt})}{N}} \]  (9.5)
It is clear that, as the number of consumers in the market becomes
“large,” the sampling error shrinks to zero. And equation (9.4) will rep-
resent the market share of the brand in that week. At the aggregate level,
however, Sjt represents a deterministic relationship between the various
explanatory variables (prices and other marketing variables) and the
outcome variable – market share. Recall that this was not the case at the
individual level. So although the expressions in the two cases are identical,
the nature of the outcome variable has different implications.
At issue is that if the expression in equation (9.4) is to be used as a
predictor of the outcome variable, market share, then it implies that given
a set of parameters and a set of observable variables, researchers will
be able to predict market shares perfectly, i.e., with no associated error.
Clearly such a claim would be inappropriate as one cannot perfectly
predict shares. This brings up a need for another error that can explain the
discrepancies between the model prediction and what we observe in the
data in terms of the brand shares for different time periods. An easy way
in which these errors can be introduced is additively in equation (9.4). In
other words we can write the share expression as:

\[ S_{jt} = \frac{\exp(\alpha_j + Z_{jt}\beta - \theta \ln(p_{jt}))}{1 + \sum_{k=1}^{J} \exp(\alpha_k + Z_{kt}\beta - \theta \ln(p_{kt}))} + e_{jt} \]  (9.5)

But would such an error term be viewed as being “structural”? Perhaps the error can be viewed as measurement error in shares. However, the source of the deviation is unclear.


Unobserved Demand Factors at the Aggregate Level (i.e., Common Across Consumers)

One can alternatively argue that these are brand-level factors that have not been included as part of the vector {p_jt, Z_jt} that we have already introduced into the model. So these are unobservables like shelf space and shelf location that are common across consumers who visit a store, are brand specific, influence shares, but are not observed by us as researchers (in most cases). So if the error term captures such factors that have been omitted in the model, where would they belong? It appears that they should be included as a brand- and week-specific measure of quality when one is looking at store share data. Denoting these factors as ξ_jt for brand j in week t, the share equation in (9.5) can instead be written as:

\[ S_{jt} = \frac{\exp(\alpha_j + Z_{jt}\beta - \theta \ln(p_{jt}) + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(\alpha_k + Z_{kt}\beta - \theta \ln(p_{kt}) + \xi_{kt})} \]  (9.6)

Since the ξ_jt are not observed by us as researchers, they qualify for inclusion as unobservables. Further, since they are integral to the utility maximization problem considered earlier, they can also be viewed as being structural in nature.
So the (observed) explanatory variables are the same as those in equa-
tion (9.2) but the outcome variable is the shares of the different brands
in a given market- and time- period. Per se, estimation of the model in
equation (9.6) is straightforward since it can be “linearized” as follows:
\[ \ln\left( \frac{S_{jt}}{S_{0t}} \right) = \alpha_j + Z_{jt}\beta - \theta \ln(p_{jt}) + \xi_{jt} \]  (9.7)

In general, given the observables in the above model, it would appear that the unknown parameters can be estimated within a regression framework. Indeed, that is the case. The structural error term ξ_jt plays the role of the error term in this regression.
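A minimal Python sketch of this step; with endogenous prices, the least-squares line below would be replaced by an instrumental-variables estimator:

import numpy as np

def estimate_linear_logit(S, S0, brand_dummies, Z, p):
    y = np.log(S / S0)                                  # left-hand side of (9.7)
    X = np.column_stack([brand_dummies, Z, -np.log(p)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)        # [alpha_j, beta, theta]
    resid = y - X @ coef                                # recovered xi_jt
    return coef, resid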
One issue to be cognizant of when estimating the parameters using the
aggregate data is to make sure that one understands how managers are set-
ting their levels of Z_jt, p_jt, and ξ_jt. Consider a store manager who provides prime shelf space for a product that she then wants to charge a premium price for. In this case, p_jt is being set based on the ξ_jt for that brand. In such a situation, one of the explanatory variables in the model, i.e., price, will be correlated with the error term in the model, ξ_jt. In other words, in


this case, prices are being set “endogenously” and one must address the
associated endogeneity issue.
I will not go into the issue of endogeneity and how one goes about
resolving endogeneity in such a model. Others have tackled this issue
(Berry 1994; Berry et al. 1995; Rossi 2014). Briefly, there are two broad
approaches to tackling the issue – one that is agnostic about the specific
data-generating process that leads to the endogeneity issues (sometimes
referred to as a “limited information” approach) and one that considers
the data-generating process more explicitly (sometimes referred to as
a “full information” approach). Under the former category, we have
instrumental variables approaches (e.g., see the discussion in Rossi 2014),
control-functions (Petrin and Train 2010) and so on. Examples of studies
using the latter approach include, e.g., Yang et al. (2003). Thus, while
there are several approaches to addressing the problem, consensus about
a universal best approach is lacking. There are of course pros and cons
associated with each approach and each context within which it is applied.
While the presence of the structural error term ξ_jt in equation (9.7)
addresses the issue of variability of shares from observed outcomes,
there is another form of variability that the model does not account for.
Specifically, the model in equation (9.6) suffers from the Independence
from Irrelevant Alternatives (or IIA) problem. In particular, what that
means is that if brand j changes its prices then the shares of the other
brands will change proportional to those brands’ market shares (i.e.,
in a manner consistent with the IIA assumption). In reality, of course,
careful inspection of the share data in conjunction with changes in prices
(for example) might reveal to the researcher that the IIA assumption is
inconsistent with the data on hand. In such instances the logical question
that arises is: how can I modify the model to be able to accommodate
deviations from the IIA?
The answer to this stems from one of the unobservables we have
already introduced – that of heterogeneity in the preferences and response
parameters. The presence of “heterogeneity” in preferences and respon-
siveness parameters results in an aggregate share model that no longer
suffers from the IIA problem. This is how. Recall that, in the context of
consumer data, we allowed these consumers to have parameters Θ that vary according to a multivariate normal distribution. The question then
becomes, if such heterogeneity exists at the consumer level, what does
the aggregate share of brand j look like in week t? If the consumer level
probability is given by the expression in equation (9.3) then the aggregate
share of brand j in week (or some other time period) t requires us to
integrate out the heterogeneity distribution in that week. This yields the
following expression.


\[ S_{jt} = \int \frac{\exp(\alpha_{ij} + Z_{jt}\beta_i - \theta_i \ln(p_{jt}) + \xi_{jt})}{1 + \sum_{k=1}^{J} \exp(\alpha_{ik} + Z_{kt}\beta_i - \theta_i \ln(p_{kt}) + \xi_{kt})}\, f(\Theta_i)\, d\Theta \]

\[ = \int \frac{\exp\big( [\alpha_j + Z_{jt}\beta - \theta \ln(p_{jt}) + \xi_{jt}] + [\Delta\alpha_{ij} + Z_{jt}\Delta\beta_i - \Delta\theta_i \ln(p_{jt})] \big)}{1 + \sum_{k=1}^{J} \exp\big( [\alpha_k + Z_{kt}\beta - \theta \ln(p_{kt}) + \xi_{kt}] + [\Delta\alpha_{ik} + Z_{kt}\Delta\beta_i - \Delta\theta_i \ln(p_{kt})] \big)}\, f(\Delta\Theta_i)\, d\Delta\Theta \]  (9.8)

In equation (9.8), α_ij = α_j + Δα_ij, where the first term on the right-hand side, α_j, is the mean of that parameter across consumers and the second term is the deviation of consumer i’s preference from the mean. The second line of equation (9.8) separates the part that is not consumer-specific from the part that is, so the heterogeneity distribution only pertains to the distribution of consumer deviations ΔΘ_i from the overall mean. Thus, ΔΘ_i ~ MVN(0, W).
From the above expression it is clear that the ratio of the shares of two brands, j and k, depends on the levels of the explanatory variables of all other brands and hence is free from the effects of the IIA property. A clear downside to the model in (9.8) is that it is no longer linearizable as it once was. Hence other approaches need to be employed to address the unobservability of ξ_jt. In particular, Berry (1994) proposed the contraction mapping procedure to isolate the component α_j + Z_jt β − θ ln(p_jt) + ξ_jt (or the “linear utility” component in the language of Berry and BLP) in the first square bracket in the numerator and denominator of (9.8) above, conditional on a chosen set of parameters for the “non-linear” part, i.e., that corresponding to the heterogeneity distribution. This restores the linearity we saw in (9.7) and regression methods can once again be employed. An alternative approach that has been proposed more recently is that by Dube, Fox and Su (2012) using an MPEC (Mathematical Programming with Equilibrium Constraints) approach. The identification of the parameters of this model was implicit in my discussion of the motivation for including the “additional” error term (to better fit share variability over time) and heterogeneity in the parameters (to better account for deviations from IIA). Small deviations from IIA will result in finding low variances for the heterogeneity distribution, ΔΘ_i ~ MVN(0, W).
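A sketch of Berry’s contraction mapping in Python; predict_shares is a hypothetical function that aggregates the heterogeneous logit probabilities in (9.8) over the simulated draws of ΔΘ_i at a given linear-utility vector δ:

import numpy as np

def contraction(delta0, observed_shares, predict_shares, tol=1e-10, max_iter=1000):
    delta = delta0.copy()
    for _ in range(max_iter):
        new = delta + np.log(observed_shares) - np.log(predict_shares(delta))
        if np.abs(new - delta).max() < tol:
            break
        delta = new
    return delta   # regress delta on the X's to recover alpha, beta, theta and xi_jt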


Back to the consumer demand model

The above discussion covers the first two types of unobservables identified
earlier. It also introduced a third unobservable identified in the context of
aggregate demand data.

A Third Unobservable: Consumption (and other) Signals Received by Consumers as They Seek to Learn about the Quality of a Product

The third set alluded to previously involves agent uncertainty about a specific parameter in the model. In the logit model, this is often assumed to be the preference for a product, i.e., α_j. What is a context within which such uncertainty could occur? One obvious case would be when a consumer encounters purchasing in a new category (s)he has not purchased from before. Take, for example, a consumer who has newly become a first-time parent and has never purchased diapers before. In this instance, the consumer might not know the quality of each of the brands of diapers available in the marketplace. When this happens, α_j is not known to the consumer and can hence be thought of as a random variable from the consumer’s perspective, ᾰ_j. Now, if we assume that the consumer is risk-neutral and maximizes expected utility then the probability of the consumer purchasing brand j will be:

\[ Pr_{ijt} = \frac{\exp(E(\breve{\alpha}_j) + Z_{jt}\beta - \theta \ln(p_{jt}))}{1 + \sum_{k=1}^{J} \exp(E(\breve{\alpha}_k) + Z_{kt}\beta - \theta \ln(p_{kt}))} \]  (9.9)

where E(·) is the expectation operator.

The question is: what happens when the consumer does not know the mean of the distribution of ᾰ_j? In such a situation, does the consumer seek to resolve his or her uncertainty regarding this quality, and if so how does (s)he do it? (The following discussion draws heavily from Sriram and Chintagunta 2009.) Here we consider the case in which the consumer learns about the unknown quality. The typical assumption is that consumers learn in a Bayesian fashion over time. Let α_j be the true quality of brand j. Consumers do not know this true quality. And while they know that it comes from a distribution, unlike the case above, they do not know the mean of that distribution. In period 0, the consumer starts with a prior belief that the quality is normally distributed with mean α_0j and variance σ²_0j, i.e.,

\[ \breve{\alpha}_j \sim N(\alpha_{0j}, \sigma^2_{0j}) \]  (9.10)


For now we assume that the above prior belief is common across
consumers. In period 1, the consumer would make a purchase decision
based on these prior beliefs for each of the J brands. If consumer i, i = 1,
2, . . . I, purchases brand j, she can assess the quality of the product from
her consumption experience, α^E_ij1. If we assume that the consumer always
derives the experience of quality that is equal to the true quality, then this
one consumption experience is sufficient to assess the true quality of the
product. However, in reality, this experienced quality might differ from
the true quality, because of (1) intrinsic product variability and/or (2) idi-
osyncratic consumer perceptions. Hence, researchers typically assume that
these experienced quality signals are draws from a normal distribution
whose mean equals the true quality, i.e., that these are unbiased signals.
Thus, we have

\[ \alpha^{E}_{ij1} \sim N(\alpha_j, \sigma^2_j) \]  (9.11)

where σ²_j captures the extent to which the signals are noisy. Thus, for learning to extend beyond the initial purchase, we need σ²_j > 0. In (9.11), consumers do not know the mean but are assumed to know the variance.
Subsequent to the first purchase (and consumption experience) the
consumer has some more information than the prior she started with.
Consumers use this new information along with the prior to update
their beliefs about the true quality of the product in a Bayesian fashion.
Specifically, since both the prior and the signal are normally distributed,
conjugacy implies that the posterior belief at the end of period 1 would also follow a normal distribution with mean α_ij1 and variance σ²_ij1, such that

\[ \alpha_{ij1} = u_{ij1}\,\alpha_{0j} + v_{ij1}\,\alpha^{E}_{ij1} \]
\[ \sigma^2_{ij1} = \left( \frac{1}{\sigma^2_{0j}} + \frac{1}{\sigma^2_{j}} \right)^{-1} \]
\[ u_{ij1} = \frac{\sigma^2_{j}}{\sigma^2_{0j} + \sigma^2_{j}}, \qquad v_{ij1} = \frac{\sigma^2_{0j}}{\sigma^2_{0j} + \sigma^2_{j}} \]  (9.12)
If none of the other brands is purchased in the first period, the posterior
distributions for those brands will be the same as the prior distributions as
there is no additional information to update the consumer’s beliefs about
these brands.


This posterior belief at the end of period 1 acts as the prior belief at the beginning of period 2. Thus, when the consumer makes a purchase decision in period 2, she would expect her quality experience to come from the distribution

$$\tilde{\alpha}_{ij2} \sim N(\alpha_{ij1}, \sigma_{ij1}^2)$$
On the other hand, a consumer who does not make a purchase in period 1 will use the same prior in period 2 as she did in period 1. Hence, we can generalize the above equations for any time period t, t = 1, 2, . . ., T, as follows:

$$\alpha_{ijt} = u_{ijt}\,\alpha_{ij(t-1)} + v_{ijt}\,\alpha^E_{ijt}$$
$$\sigma_{ijt}^2 = \frac{1}{\dfrac{1}{\sigma_{ij(t-1)}^2} + \dfrac{I_{ijt}}{\sigma_j^2}} = \frac{1}{\dfrac{1}{\sigma_{0j}^2} + \dfrac{\sum_{\tau=1}^{t} I_{ij\tau}}{\sigma_j^2}}$$
$$u_{ijt} = \frac{\sigma_j^2}{I_{ijt}\,\sigma_{ij(t-1)}^2 + \sigma_j^2}$$
$$v_{ijt} = \frac{I_{ijt}\,\sigma_{ij(t-1)}^2}{I_{ijt}\,\sigma_{ij(t-1)}^2 + \sigma_j^2} \qquad (9.13)$$

where $I_{ijt}$ is an indicator variable that takes on the value 1 if consumer i makes a purchase of brand j in period t and 0 otherwise. Similarly, when the consumer makes a purchase in period t+1, she would assume that the quality of the product comes from the posterior distribution at the end of period t. The above equations also imply that as the number of consumption experiences increases, the consumer learns more and more about the
true quality of the product. As a result, her posterior mean would shift
away from her initial prior and move closer to the true mean quality.
Similarly, as she receives more information, her posterior variance would
decrease. It is in this sense that the consumer “learns” about quality in this
model.
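The recursion in equation (9.13) is easy to simulate. The sketch below is my own illustration (not code from Sriram and Chintagunta 2009): one consumer buys brand j every period, and her posterior mean and variance are updated after each signal; all parameter values are invented.

```python
# A sketch of the Bayesian learning recursion in equation (9.13).
# Parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_quality = 2.0               # alpha_j, unknown to the consumer
mean, var = 0.0, 4.0             # prior alpha_0j and sigma_0j^2
signal_var = 1.0                 # sigma_j^2: noise in experience signals

for t in range(1, 11):
    I_ijt = 1                    # suppose the consumer buys every period
    if I_ijt:
        signal = rng.normal(true_quality, np.sqrt(signal_var))  # alpha^E_ijt
        u = signal_var / (var + signal_var)   # weight on the prior mean
        v = var / (var + signal_var)          # weight on the new signal
        mean = u * mean + v * signal
        var = 1.0 / (1.0 / var + 1.0 / signal_var)  # posterior variance shrinks
    print(f"t={t}: posterior mean {mean:.3f}, posterior variance {var:.3f}")
# The mean drifts toward true_quality while the variance falls -- the
# sense in which the consumer "learns"; a larger signal_var slows both.
```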
In learning models as described above, the consumer actually observes the signals $\alpha^E_{ijt}$ in each time period; so this quantity is known to the consumer. However, the signal observed by the consumer is seldom observed by the researcher (for an exception see Sriram et al. 2015). Thus


in such situations the signals received by consumers become part of the set
of unobservables from the researcher’s perspective. Researchers typically
assume, as above, that the signals come from a known distribution with
unknown parameters and then simulate these signals over the course of
the estimation. Accordingly, identification in learning models poses a
challenge. One needs to observe a pattern in the data that suggests that
behavior evolves over time consistent with converging towards some
preference level if indeed there is support for the Bayesian updating
mechanism described above.
For example, one implication of the expression in equation (9.13) is that if the variance of the received signals $\sigma_j^2$ is high, then learning will
be slower than when the variance is low. As an example of identification
using this idea, Sriram et al. (2015) look at a situation where the variance
of signals received by consumers can be high or low with these variances
being observed by researchers. The context is that of consumers deciding
whether to continue subscribing to a video-on-demand service. Consumers
who receive high (low) quality service are more likely to continue (stop)
subscribing but consumers are uncertain about their quality. They learn
about this quality based on the signals received. If the signals consumers
receive have low variance then consumers receiving either high or low
quality of service learn about this quality quickly; those with high quality
continue with the firm and those with low quality leave, i.e., terminate the
service. But if signals have a high variance, learning is slow and consumers
receiving low quality service may continue with the service. Indeed, the
patterns in the data suggest precisely this nature of behavior. Figures 9.1
and 9.2 below are adapted from Sriram et al. (2015).
Given the nonlinearity associated with learning models, one often
finds evidence of learning even when it is unclear whether such learning is
going on in the data. Thinking about the sources of identification prior to
estimation makes for good practice not just with these models but with all
econometric models in general.

Why do we need structural models?

Structural models are useful in many contexts; I highlight two of them here. The first is in quantifying the effects of various marketing interven-
tions by estimating the underlying parameters of the structural model of
interest. The second is using the estimated parameters from the model to
assess the consequences of changing one of the ingredients of the model.
For example, one might be interested in understanding the consequences
of changing the nature of interactions among the agents involved in the


[Figure 9.1: Evidence of differential learning among consumers experiencing low variability. The plot shows termination probability (%) against the number of periods of high/low quality encounters.]

structural model. I will now illustrate these two types of applications and
explain why it might be difficult to make the same assessments sans the
structural model.
In the Sriram et al. study (2015) mentioned above, some consumers are
exposed to signals about the quality they receive that have high variance
whereas the signals that others receive have low variance. The latter are able to learn about the true quality they receive more quickly than those with
high variance. An implication of this is that when consumers are uncertain
about the quality they experience, those experiencing low temporal vari-
ability in quality are likely to be more responsive (in terms of termination)
to the average quality level compared to those experiencing high vari-
ability. Specifically, if, at the time of signing up for the service, a consumer
has a high prior belief on the quality, then it becomes more difficult for
the consumer to learn that the quality is actually low when the variance
of signals received is high. As a consequence these consumers will respond
less, in terms of termination, to the quality they receive. On the other
hand, for consumers receiving higher quality than their prior belief, high
variability will interfere with such learning so termination may be higher
than for those with high quality but low signal variability. In other words,
we would see an interaction effect between average quality and variability


[Figure 9.2: Evidence of differential learning among consumers experiencing high variability. The plot shows termination probability (%) against the number of periods of high/low quality encounters, separately for high quality (HQ+/HQ–) and low quality (LQ+/LQ–) experiences.]

on termination in the data. Indeed, the authors find such an interaction effect in the data. Interestingly, the data also reveal that the main effect
of variability is negative, which is indicative of a form of “risk aversion”
among the consumers. Such a risk aversion effect would also translate to
higher termination at high levels of variability. To quantify the level of
quality sensitivity and risk aversion, however, requires a model that also
controls for other factors that could be affecting termination behavior.
This is the role that the structural model plays in that article. Estimating
the quality effect for different consumers in such a model provides insights
to managers interested in lowering termination rates for their service.
Now consider the case when one did not use a structural model based on
the data patterns but instead specified a functional relationship between
termination behavior and the level of quality received by a consumer. Such
a model would be entirely plausible for the data on hand since the interest
would be on quantifying the effects of raising or lowering quality on termi-
nation behavior. While such a model can be made extremely flexible, it is
unclear whether it would have included variability as a covariate. Suppose the researcher had chosen to include variability; the likely conclusion would have corresponded to the main effect of variability mentioned above – that of higher variability leading to a higher termination rate. What would have been critical to include is the interaction effect. Even if the
researcher chooses to include an interaction effect, it would be unclear
where such an effect would be coming from and what the consequences of
such an effect would be for a manager trying to change the level of quality
available in the marketplace. As the structural model reveals, variability
aids retention at low quality levels so the manager would have to assess
the consequence of affecting quality in such a scenario. The structural
model is useful in assessing what would happen in this context. Of course,
structural models are not infallible – an incorrectly specified model would
lead to incorrect inferences being drawn about the behavior of consumers.
Hence it is crucial to base the model on patterns observed in the data and
to then check for robustness of the results to alternative specifications that
might also be consistent with patterns in the data.
Next, I turn to an example where the structural model can help answer
a question dealing with a change in agent interaction or the structure of
the market in which the agents make decisions. An important article in
this area that showcases this role of structural models is Misra and Nair
(2011). The article looks at the topic of salesforce compensation and asks:
What is the likely consequence of modifying the compensation scheme
provided to the salesforce? Companies may be interested in answering
this question but may be reluctant to experiment with alternative schemes
for several reasons. First, changing the compensation scheme could be, at
least in the short-run, a very expensive proposition for the firm. Second,
an inappropriate change in schemes might have a negative impact on the
morale of the salespeople. Thus, if there is a way for the firm to under-
stand the consequences of changing the compensation scheme, such an
approach would be very valuable to the firm. This is where the Misra and
Nair article comes in.
The authors have access to rich individual salesperson-level perform-
ance data (in terms of sales calls made and sales generated) from a specific
firm. This allows them to build a rich dynamic structural model of agent
behavior that captures the specifics of the compensation scheme that the
firm had in place as well as the data patterns that characterize the behavior
of the salespeople. Next, Misra and Nair estimate the model parameters
(using recent techniques for the estimation of such dynamic models). The
important aspect of this article is what it does next. It does not content itself with simply estimating the model parameters; rather, the authors
first conduct counterfactuals with alternative compensation schemes to
understand specific schemes in which firm profits would go up. Next, they


implement a new compensation scheme for the employees of the firm.


The behavior of the salespeople as well as their output levels change in a
manner as predicted by the counterfactual analysis under this new com-
pensation plan. The new plan results in a 9 percent improvement in overall
revenues. Such an increase corresponds to about $12 million incremental
revenues annually. In addition, the article shows an improvement in
performance and satisfaction among the salespersons after the implemen-
tation of the new program. This provides a very strong vindication of the
use of structural models to improve outcomes for firms as well as their
employees. Further, the insights from the structural model are critical for
identifying and evaluating alternative schemes and their consequences.
Clearly, a field implementation of the output of a structural model is
quite novel; indeed, this is a direction in which the literature in structural
models appears to be progressing. In addition to the above study, there are
a few other studies that have assessed the external validity of predictions
from structural models – Cho and Rust (2008), in the context of imple-
menting new auto rental policies, and Bajari and Hortacsu (2005), in the
context of estimating bidder valuations in auctions, to name a couple. My
expectation is that such studies will gather steam in the future.
Next, I discuss two more recent articles, Rossi and Chintagunta (2015,
2016), where the context is more slanted toward public policy. The idea
behind the first study is as follows. On the Italian highway, drivers are
faced with the problem of not knowing the prices at the gasoline stations
that are located on the highway. Price information can only be obtained
by getting off the highway and driving to the rest stop. Drivers in other
countries face a similar problem, i.e., while information on the location
of the next station is posted on the highway, prices at the station are
not known to the drivers. To engender price transparency and make the
information more accessible to drivers, the Italian government required
the management of the highway system to install price signs on the high-
way. These signs, located every four stations, were required to provide the
prevailing prices at the four gas stations following the sign in the direction
of travel. The signs were installed between July 2007 and 2009. What is of
interest here is whether the introduction of the signs resulted in a change in
prices charged by the stations whose prices are posted on the signs relative
to those whose prices are not posted.
In order to measure the impact of the price signs, it is important to con-
trol for a variety of confounding factors that might affect the identifica-
tion and estimation of the effect of signs on prices. Rossi and Chintagunta
(2016) find that the installation of signs indeed lowers prices charged by
stations whose prices are posted on the signs. Curiously, however, the level
of dispersion across prices on a given sign does not diminish significantly


as a consequence of sign installation. A potential explanation for this is that while 94 percent of those driving past the first station on the sign
also drive past the sign, the number drops to 64 percent for the second
station, 49 percent for the third station and only 39 percent for the fourth
station. This means that having signs every fourth station does not inform
a majority of consumers driving past a station about prices at that station.
A question that then arises is: by how much further would prices at the stations fall if drivers were informed about prices at all stations? Such a
scenario can, e.g., occur if signs were installed prior to each and every
station on the highway. Since there is a cost associated with installing
these signs, a related question that arises is whether the benefits outweigh
the costs in this situation and whether we can determine this even prior to
the installation of the signs. This is where the structural model comes in.
Rossi and Chintagunta (2015) develop a structural model that incor-
porates consumers’ uncertainty about prices when driving down the
motorway. Resolving the uncertainty in the absence of price signs requires
consumers to engage in costly search, i.e., they need to drive to the gas
station to obtain price information. This could lead to higher prices at the
pump since the gas station recognizes that if the consumer leaves without
filling gas, they will need to expend the search cost again to visit another
station. For drivers transiting in front of the sign, price uncertainty is
resolved due to the presence of the sign. The authors then leverage the
difference in pre- and post-disclosure prices to recover the cost that a
fraction of consumers (who are exposed to the price signs and whose data
are available to the authors) incur to obtain price information before
the signs are installed. A second component of the structural model that
Rossi and Chintagunta propose involves the oligopolistic price-setting
behavior of gas stations given the above demand model. This component
of the model allows them to predict the level of prices that would prevail
if all consumers have access to price information in the counterfactual
scenario. The authors find that, compared with the case of perfect price
information, in the absence of mandatory price disclosure, gas stations
increase their margins by about 31 percent thereby indicating the benefits
of installing the signs. This approach therefore provides valuable input to
policy makers considering the costs and benefits of installing additional
signs on the highway.

Looking back and looking ahead

A large fraction of structural models in marketing has tended to fall into three main buckets. The first of these is models of “demand” and “supply.” Such models have a long history in the economics literature.
According to Reiss and Wolak (2007), such models have been popular
since the time of the Cowles Commission for Research in Economics – an
economic research institute founded by the businessman and economist
Alfred Cowles in Colorado Springs in 1932. The commission, which also
had a home at the University of Chicago from 1939 to 1955 and now
is located at Yale University, emphasized econometrics in the context
of “economic equilibrium.” It is in this light that a vast majority of
early structural models in marketing developed and flourished (see e.g.,
Bronnenberg et al. 2005 for a discussion of models built in this tradi-
tion). The typical structure of these models entails a demand specification
derived from the underlying utility behavior of consumers; and a supply
model of firm behavior that characterizes firms’ actions for a variety of
marketing mix decisions – prices, advertising, etc. In this bucket, I also
include studies that focus on simple and more complex demand models
(e.g., Berry et al. 2014) that explicitly account for supply-side considera-
tions in the estimation of demand parameters (e.g., Nevo 2001).
As a second bucket of structural models that have been popular in
marketing, I include those in the dynamic structural tradition. On the
demand side, dynamics can arise for several reasons (see Chintagunta and
Nair 2010 for a discussion) – storability, durability and experience goods,
among others. Why does storability result in dynamics in behavior? The
main reason is that a purchase today by a consumer increases his or her inventory of the product. In turn, this makes the consumer less likely to buy the
product tomorrow. Thus a marketer who encourages a customer to make
a purchase today needs to explicitly take into account the consequences of
this purchase for the future. Some examples of studies in this vein in mar-
keting are Erdem et al. (2003) and Sun (2005). Durable good demand, at
least as it refers to the first time adoption of a product, on the other hand,
is a dynamic problem because if a consumer makes a purchase today it
implies that the consumer is out of the market tomorrow. The consumer in
this case is explicitly trading off making a purchase today (at a potentially
higher price and lower quality) and enjoying the utility from consuming
the product for one day with waiting till tomorrow and buying the product
at a potentially lower price and higher quality. A good exemplar of this
research in marketing is Nair (2007). Experience goods I have referred to
previously under the nomenclature of learning models. Experience goods
are therefore characterized by ex ante uncertainty about some aspect of
the product (say its quality). This uncertainty is then resolved by consump-
tion. In this case, dynamics arise because if a consumer makes a purchase
today, it provides that customer with a signal of the uncertain aspect
(quality), which provides the consumer with new information when (s)he


goes to make the next purchase. This provides an explicit link between
purchasing today and purchasing tomorrow (see Ching et al. 2013). Note
however, that the model I described previously was a “myopic” model of
learning since it did not fully consider this intertemporal link.
The third bucket includes models that have recently seen an interest
in marketing – those involving uncertainty, not about the parameters
of the utility function as in learning models, but about some feature or
characteristic of the product itself. Here I am referring to the models
of search. Specifically, in this case, the consumer may not be perfectly
informed about the price of a product in the market and needs to engage in
costly (defined broadly as including time and psychological costs) search
to uncover information about price (as examples, see Mehta et al. 2003
and Honka 2014). Alternatively, consumers search for a product that
best matches their preferences, as in shopping online for a digital camera
that best suits one’s needs (e.g., Kim et al. 2010). In particular, as online
browsing behavior, visit and purchase information become more widely
available, I expect these models to see increasing application in marketing.
Structural models have certainly made an impact in the field of mar-
keting. While diffusion has taken a while, today they are considered an
integral part of the marketer’s toolbox. Looking ahead there appear to be
three principal domains in which the research seems to be progressing. I
will very briefly mention each of them in turn.

Combining Multiple Data Sources

I alluded to this first topic in an article with Harikesh Nair (Chintagunta and Nair 2010). As structural models get more complicated, they place an
increasingly bigger burden on the data used for parameter identification
and estimation. While one ideally seeks patterns in the data that can iden-
tify the key parameters of interest (see Einav and Levin 2010), researchers
in marketing are increasingly recognizing that one can leverage multiple
sources of data – outcomes data from the marketplace, survey data on
consumers, experimental data from the lab – to improve the credibility
of estimates and to relax assumptions made by structural models. For
example, in the context of dynamic structural models it is notoriously
difficult to identify the discount factor of consumers (separately from
the other parameters in the model). Dube et al. (2014) show how we can
combine inputs from conjoint analysis to better inform the estimates of
such models (see also Rao 2015).


Combining Multiple Methods

Second, when identification depends critically on some variation in the data, it may make sense to first establish that such a variation actually exists before constructing a complicated structural model. Often
ally exists before constructing a complicated structural model. Often
the presence of the variation can be established via other methods, say
a ­difference-in-differences analysis in the first stage as a prelude to the
estimation of the structural model. Previously, I described the Rossi and
Chintagunta (2016 and 2015) papers. A key parameter of interest in the
latter article is the search cost incurred by customers when shopping for
gasoline. This parameter is identified off the change in prices charged by
gas stations after information provision via price signs. So it was impor-
tant to first establish that prices did change with the introduction of the
signs before attempting to identify the search costs from the structural
model. This required a “pre-analysis” using a different approach. My
sense is that going forward there will be a bigger need to bring multiple
methods to bear when dealing with increasingly more complex structural
models.

Using Field Experiments to Validate and Implement Recommendations Based on Counterfactuals

Finally, the real power of structural models as a useful tool to improve managerial practice is only now being seen. As field implementations of
recommendations from these models such as the one carried out by Misra
and Nair become more widespread, the power of structural models to aid
decision-making will increasingly become clear. Such implementations
are, however, not without associated costs. Consequently the availability
of a company willing to field-test model-based counterfactuals should not
be a substitute for carefully thought out structural models to obtain these
counterfactuals.
To summarize, I feel that while we have come a long way, there is still
much to be discovered in the realm of structural models in marketing.
The second and third points above make me particularly optimistic about bridging
the gap between the more economics-oriented researchers in marketing
and the more psychology-oriented researchers. First, as I alluded to
earlier, the models underlying structural methods can draw from beyond
the discipline of economics. Second, it is clear that knowledge and imple-
mentation of experimental methods will likely enrich our understanding
of markets using structural methods. This provides an excellent platform
for researchers with economics and psychology backgrounds to come
together to make contributions to the field of marketing.


NOTES

1. I thank Anita Rao and S. Sriram for their useful comments on an earlier version. My thanks to the Kilts Center at the University of Chicago for financial support. Please note that parts of this chapter appear elsewhere in the chapter “Structural Models in Marketing: Consumer Demand and Search” of the second edition of the Handbook of Marketing Decision Models, edited by B. Wierenga and R. van der Lans.
2. A point to emphasize here relates to causality. If the researcher is interested only in establishing causality, then a structural model per se may not be required (see, e.g., Goldfarb and Tucker 2014).

References

Berry, S. (1994), “Estimating Discrete-Choice Models of Product Differentiation,” RAND Journal of Economics, 25(2), 242–262.
Berry, S., J. Levinsohn and A. Pakes (1995), “Automobile prices in market equilibrium,” Econometrica, 63(4), 841–890.
Berry, S., A. Khwaja, V. Kumar, B. Anand, A. Musalem, K. C. Wilbur, G. Allenby and P.
Chintagunta (2014), “Structural Models of Complementary Choices,” Marketing Letters,
25(3), 245–256.
Bronnenberg, B. J., P. E. Rossi and N. J. Vilcassim (2005), “Structural Modeling and Policy
Simulation,” Journal of Marketing Research, 42(1), 22–26.
Chan, T., V. Kadiyali, and P. Xiao (2009), “Structural Models of Pricing,” in Handbook of
Pricing Research in Marketing, Northampton, MA, USA and Cheltenham, UK: Edward
Elgar Publishing.
Ching, A. T., T. Erdem and M. P. Keane (2013), “Learning Models: An Assessment of Progress, Challenges and New Developments,” Marketing Science, 32(6), 913–938.
Chintagunta, P. K., D. C. Jain and N. J. Vilcassim (1991), “Investigating heterogeneity in brand preferences in logit models for panel data,” Journal of Marketing Research, 28(4), 417–428.
Chintagunta, P. K., V. Kadiyali, N. Vilcassim and J. Naufel (2004), “Structural Models of
Competition: A Marketing Strategy Perspective,” in Christine Moorman and Donald R.
Lehmann eds. Assessing Marketing Strategy Performance, Marketing Science Institute.
Chintagunta, P. K., T. Erdem, P.E. Rossi and M. Wedel (2006), “Structural Modeling In
Marketing: Review and Assessment,” Marketing Science, 25(6), 604–616.
Chintagunta, P. K. and H. Nair (2010), “Discrete Choice Models of Consumer Demand in
Marketing,” Marketing Science, 30(6), 977–996.
Cho, S. and J. Rust (2008), “Is econometrics useful for private policy making? A case study of
replacement policy at an auto rental company,” Journal of Econometrics, 145(1–2), 243–257.
Deaton, A. and J. Muellbauer (1980), Economics and Consumer Behavior, New York:
Cambridge University Press.
Dube, J.-P., J. T. Fox and C.-L. Su (2012), “Improving the numerical performance of
static and dynamic aggregate discrete choice random coefficients demand estimation,”
Econometrica, 80(5), 2231–2267.
Einav, L. and J. Levin (2010), “Empirical Industrial Organization: A Progress Report,”
Journal of Economic Perspectives, 24(2), 145–162.
Erdem, T., S. Imai, and M. P. Keane. (2003), “Brand and quantity choice dynamics under
price uncertainty,” Quantitative Marketing and Economics, 1(1), 5–64.
Erdem, T., K. Srinivasan, W. Amaldoss, P. Bajari, H. Che, Teck H. Ho, W. Hutchinson,
M. Katz, M.P. Keane, R. Meyer, and P. Reiss (2005), “Theory-Driven Choice Models,”
Marketing Letters, 16(3), 225–237.
Goldfarb, A. and C. E. Tucker (2014), “Conducting Research with Quasi-Experiments: A
Guide for Marketers,” Rotman School Working Paper, Toronto, Ontario.


Gonul, F. and K. Srinivasan (1993), “Modeling multiple sources of heterogeneity in multinomial logit models: Methodological and managerial issues,” Marketing Science, 12(3), 213–229.
Guadagni, P. and J. D. C. Little (1983), “A logit model of brand choice calibrated on scanner
data,” Marketing Science, 2(3) 203–238.
Hanemann, M. W. (1984), “Discrete / Continuous Models of Consumer Demand,”
Econometrica, 52, 541–561.
Honka, E. (2014), “Quantifying search and switching costs in the US auto insurance indus-
try,” RAND Journal of Economics, 45(4), 847–884.
Horsky, D. (1977), “An empirical analysis of the optimal advertising policy,” Management
Science, 23(10), 1037–1049.
Horsky, D. and P. Nelson (1992), “New Brand Positioning and Pricing in an Oligopolistic
Market,” Marketing Science, 11(2), 133–153.
Kadiyali, V., K. Sudhir, and V. R. Rao (2001),  “Structural Analysis of Competitive
Behavior: New Empirical Industrial Organization Methods in Marketing,” International
Journal of Research in Marketing, 18(1), 161–186.
Kamakura, W. A. and G. J. Russell (1989), “A Probabilistic Model for Market Segmentation and Elasticity Structure,” Journal of Marketing Research, 26(4), 379–390.
Kim, J., P. Albuquerque and B. Bronnenberg (2010), “Online demand under limited con-
sumer search,” Marketing Science, 29(6), 1001–1023.
McFadden, D. (1974), “Conditional Logit Analysis of Qualitative Choice Behavior,” in P. Zarembka ed. Frontiers in Econometrics, New York: Academic Press, 105–142.
Mehta, N., S. Rajiv and K. Srinivasan (2003), “Price uncertainty and consumer search: A
structural model of consideration set formation,” Marketing Science, 22(1), 58–84.
Misra, S. and H. Nair (2011), “A structural model of sales-force compensation dynamics:
Estimation and field implementation,” Quantitative Marketing and Economics, 9(3), 211–257.
Nair, H. (2007), “Intertemporal price discrimination with forward-looking consumers:
Application to the US market for console video-games,” Quantitative Marketing and
Economics, 5(3), 239–292.
Nevo, A. (2001), “Measuring Market Power in the Ready-to-Eat Cereal Industry,”
Econometrica, 69(2), 307–342.
Petrin A. and K. Train (2010), “A Control Function Approach to Endogeneity in Consumer
Choice Models,” Journal of Marketing Research, 47(1), 3–13.
Rao, A. (2015), “Online Content Pricing: Purchase and Rental Markets,” Marketing Science, 34(3), 430–451.
Reiss, P. C. and F. A. Wolak (2007), “Structural econometric modeling: Rationales and
examples from industrial organization,” in J. J. Heckman and E. E. Leamer eds. Handbook
of Econometrics, Vol. 6A, Amsterdam: North-Holland, 4277–4415.
Rossi, P. E., R. McCulloch and G. M. Allenby (1996), “The Value of Purchase History Data
in Target Marketing,” Marketing Science, 15(4), 321–340.
Rossi, F. and P. K. Chintagunta (2016), “Price Transparency and Retail Prices: Evidence from
Fuel Price Signs in the Italian Motorway,” Journal of Marketing Research, 53(3), 407–423.
Rossi, F. and P. K. Chintagunta (2015), “Price Uncertainty and Market Power in Retail
Gasoline,” working paper, University of Chicago.
Rossi, P. E. (2014), “Even the Rich Can Make Themselves Poor: A Critical Examination of
IV Methods in Marketing Applications,” Marketing Science, 33(5), 655–672.
Sriram, S. and P. K. Chintagunta (2009), “Learning Models,” Review of Marketing Research,
6, 63–83.
Sriram, S., P. K. Chintagunta and P. Manchanda (2015), “Service Quality Variability and
Termination Behavior,” Management Science, 61(11), 2739–2759.
Sudhir, K. (2001), “Competitive Pricing Behavior in the Auto Market: A Structural Analysis,”
Marketing Science, 20(1), 42–60.
Sun, B. (2005), “Promotion effect on endogenous consumption,” Marketing Science, 24(3),
430–443.
Yang, S., Y. Chen and G. M. Allenby (2003), “Bayesian Analysis of Simultaneous Demand
and Supply,” Quantitative Marketing and Economics, 1(3), 251–275.

PART IV

LATENT STRUCTURE ANALYSIS
10.  Multivariate statistical analyses: cluster analysis, factor analysis, and multidimensional scaling
Dawn Iacobucci

Cluster analysis, factor analysis, and multidimensional scaling are three extremely useful techniques for academic and industry marketing researchers and consultants. Cluster analysis is useful in finding customer segments, factor analysis is useful for survey research, and multidimensional scaling (MDS) is useful in creating perceptual maps. This chapter presents and illustrates the basic logic and goals of each and suggests references for further inquiry.

Cluster Analysis1

Market segments are composed of groups of customers who are similar to each other with respect to their demographics, attitudes, brand preferences, or purchases, and those profiles differ from group to group. Cluster analysis is perfectly suited to this goal, because it is designed to identify groups of similar entities, with differences between clusters.
Figure 10.1 shows a simple example based on customer ratings of their
preference for power in a laptop or design in a tablet. A cluster analysis
will indicate that there is a segment of each kind of customer as well as
customers who might not belong to either group. Naturally, real data are
less clear-cut than Figure 10.1, and cluster analyses are typically based on
input from far more than two variables. For example, online recommendation engines cluster several million customers on thousands of SKUs.

[Figure 10.1: Cluster analysis identifies clouds of similar data points. Customers are plotted on X1 = ‘I need power in my laptop’ against X2 = ‘I like tablets with sleek design’; the numbered points form two clouds of similar customers, with a few customers belonging to neither group.]
There are several decisions to be made when conducting a cluster
analysis. They are: (1) data preparation, (2) the cluster model to be used,
and (3) the interpretation of the clusters. Each issue is discussed in turn.

Data Preparation

In preparation for a cluster analysis, one question is which variables should be selected for inclusion. This issue is important in most types of
analyses, because a model’s results can only be as good as the quality and
coverage of the input variables informing the model. Obviously the content
mapping is important—decisions about launching a new product cannot be based on cluster-derived segments if the input variables reflect only customer demographics and not customers’ preferences and behaviors.
Companies may have internal data that are immediately relevant, through
their customer relationship management databases or their captures of
customer media consumption. Companies can also supplement their data,
e.g., zip codes of customer contact data, with free, online secondary data,
e.g., median household incomes for those zip codes from census.gov, or with
customized marketing research survey data.
In this selection stage, it is also important to note that if 10 input
variables measure, say, customers’ attitudes toward their favorite football
team and one input variable captures how much they spent on special
online sporting events, then the results will naturally be more a reflection
of attitudes than viewing expenditures. (In this case, the mean over the 10
attitude measures might be taken and used as a single input score along
with the media variable. The standard error of the scale based on 10 items
will likely be smaller than the standard deviation of the media variable,
implicitly still weighting the attitudes more than the media variable, but
the effect will be more subtle.)
Once the variables have been selected that will form the basis of the
segmentation, the cluster analysis needs to compute some measure of
similarity. Correlations are frequently used as an index of similarity
(e.g., r = +1.0 indicates two customers with identical patterns, r = −1.0 indicates two customers with the opposite patterns). Correlations are
popular because of their familiarity and the ease with which they may be
interpreted. They reflect patterns (e.g., customers 1 and 2 are frequent
consumers of items X, Y, and Z, but infrequent purchasers of A, B, C),
but correlations do not reflect means. In business, those mean differ-
ences often reflect purchasing volume (e.g., customer 1 might purchase
X, Y, and Z twice as often as customer 2), so when volume matters, a

MIZIK_9781784716745_t.indd 228 14/02/2018 16:38


Multivariate statistical analyses  ­229

Customer ID   Mystery   Bio   DIY
1057             3       2     1
0143             5       3     0
1552             0       1     1
0094             1       0     2
. . .
N
Means:         2.25    1.50  1.00

Figure 10.2  Online purchase data

better choice for an index of similarity (or dissimilarity) is the Euclidean distance. Customers 1 and 2 would be deemed $d_{12}$ units apart, where $d_{12}^2 = \sum_{k=1}^{r}(x_{1k} - x_{2k})^2$ across the k = 1, 2, . . ., r variables. For more options see Aldenderfer and Blashfield (1984) and Everitt et al. (2011).
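A small numerical sketch (with invented purchase counts) makes the contrast concrete: two customers with the same pattern but different volumes are nearly perfectly correlated yet far apart in Euclidean distance.

```python
# Correlation captures the *pattern* of two customers' purchases, while
# Euclidean distance is also sensitive to *volume*. Data are illustrative.
import numpy as np

cust1 = np.array([4, 6, 8, 1, 1, 1])   # heavy buyer of X, Y, Z; light on A, B, C
cust2 = np.array([2, 3, 4, 0, 0, 0])   # same pattern at roughly half the volume

r = np.corrcoef(cust1, cust2)[0, 1]            # near +1: nearly identical pattern
d = np.sqrt(np.sum((cust1 - cust2) ** 2))      # d_12: large, because volume differs
print(f"correlation = {r:.2f}, Euclidean distance = {d:.2f}")
```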

Clustering Models

Next, the marketing analyst must choose among the many clustering algorithms. Some cluster models are “hierarchical,” and we show a popular example of such a model—Ward’s method; others are not, and we show an example of that as well—k-means clustering. For each, we shall illustrate using the small data set in Figure 10.2, which depicts the purchase patterns of four customers across three genres of book purchases.

Ward’s method
Ward’s method is a clustering technique that operationalizes the intuition
that segments should consist of similar customers, whereas customers in
different segments should be different. In the statistical parlance, the clus-
tering model minimizes the variability within clusters, and maximizes the
variability between clusters.
Figure 10.3 shows the computation of the total sums of squares in the small illustration data set: $SS_{Total} = 21.74$; i.e., the amount of variability that may be apportioned across the clusters.

Customer   ID     Mystery  Bios  DIY   Computing Sum of Squares (SS)
A          1057      3      2     1    (3–2.25)² + (2–1.50)² + (1–1.00)² = 0.81
B          0143      5      3     0    (5–2.25)² + (3–1.50)² + (0–1.00)² = 10.81
C          1552      0      1     1    (0–2.25)² + (1–1.50)² + (1–1.00)² = 5.31
D          0094      1      0     2    (1–2.25)² + (0–1.50)² + (2–1.00)² = 4.81
Means:            2.25   1.50  1.00   SS_Total = 21.74

Figure 10.3  Entertainment data: preparation for Ward’s method


Cluster            Possible means for   Mystery  Bios  DIY   Error SS    R²
{A&B} {C} {D}      A&B                    4.0    2.5   0.5     3.00     0.862
{A&C} {B} {D}      A&C                    1.5    1.5   1.0     5.00     0.770
{A&D} {B} {C}      A&D                    2.0    1.0   1.5     4.50     0.793
{B&C} {A} {D}      B&C                    2.5    2.0   0.5    15.00     0.310
{B&D} {A} {C}      B&D                    3.0    1.5   1.0    14.50     0.333
{C&D} {A} {B}      C&D                    0.5    0.5   1.5     1.50     0.931  (min error, max R²)

Figure 10.4  Ward’s method: 1st iteration

Each step of the model seeks to assign customers to groups so as to maximize R². Recall from regression that R² is a measure of fit that indicates the amount of total variance explained by the model. It is defined as $R^2 = 1 - (SS_{error}/SS_{total})$, so to say that maximum variance is explained is also to say that error variability is minimized.
Ward’s method begins with each of the N customers in his or her own
cluster (i.e., each cluster is of size 1). In the first iteration, customers are
combined to form clusters of size 2. First, customers A and B are combined,
and C and D are left in their own clusters. Then, customers A and C are com-
bined, with B and D left in their own clusters. Each possible two-customer
segment is created, and the R2 is calculated for each combination. For
example, the $SS_{error} = 3.00$ in the first row is derived by comparing customer A’s data (and B’s data) to their combined means (4.0, 2.5, 0.5), as follows: $SS_{error} = (3-4)^2 + (2-2.5)^2 + (1-0.5)^2 + (5-4)^2 + (3-2.5)^2 + (0-0.5)^2 = 3.00$. In Figure 10.4 we see R² maximized when customers C and D form
a segment, with customers A and B in their own individual segments.
Ward’s method is a “hierarchical” cluster model, which means that once
customers C and D are joined in a segment, they will always be in the same
cluster (whether other customers join that segment or not). Thus in Figure
10.5, the second iteration of the model treats C and D together, and tries
out all remaining possibilities of clusters—that customer A or customer
B might join the {C&D} segment, but the highest R2 is achieved when
customers A and B constitute their own segment.
Given the small size of this illustration data set, the only possible
iteration that remains would be for customer segment {C&D} to join with
{A&B}. The starting and endpoints in cluster analyses are not particularly
insightful—the starting place has all customers in separate segments, and
it is not very efficient for companies to truly customize their offerings for


Cluster          Possible means for   Mystery  Bio   DIY   Error SS    R²
{C&D&A} {B}      C&D&A                  1.33   1.00  1.33    7.32     0.663
{C&D&B} {A}      C&D&B                  2.00   1.33  1.00   20.67     0.049
{C&D} {A&B}      A&B                    4.00   2.50  0.50    3.00     0.862  (min error, max R²)

Figure 10.5  Ward’s method: 2nd iteration

each individual, and the endpoint has all customers in one segment, and
presumably a mass marketing strategy would not appeal to the customers
who are heterogeneous across segments. So the question is whether the
company finds more insight and utility in sorting customers into three
segments {C&D,A,B} or two {C&D,A&B}.
Ward’s method is popular and empirically well-behaved. It might be
less advised in application to so-called big data, because it requires large
numbers of combinations to be computed in early iterations.
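In practice these iterations are delegated to software. The following sketch applies scipy’s standard Ward implementation to the four customers of Figure 10.2; it reproduces the merge order found by hand above (C&D first, then A&B).

```python
# A sketch of Ward's method on the four customers of Figure 10.2, using
# scipy in place of the hand calculations in Figures 10.3-10.5.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[3, 2, 1],   # customer A (1057): mystery, bio, DIY counts
              [5, 3, 0],   # customer B (0143)
              [0, 1, 1],   # customer C (1552)
              [1, 0, 2]])  # customer D (0094)

Z = linkage(X, method="ward")  # agglomeration history: C&D merge first, then A&B
print(Z)
print(fcluster(Z, t=2, criterion="maxclust"))  # two segments: {A,B} and {C,D}
```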

K-means clustering
In k-means clustering, the marketing analyst has a rough guess that there
might be, say, five segments, and so tells the computer to derive a five-
cluster solution. The model sets k = 5 and proceeds. Obviously it would be
smart to also check k = 4, k = 6, and perhaps more solutions to see what
number of clusters might provide a partitioning of customers that seems
optimal in terms of parsimoniously fitting the data. The k-means solutions
are not hierarchical, so the four clusters when k = 4 might not be four of
the five clusters when k = 5, for example.
The k-means model begins with random assignment. Figure 10.6 shows
the four customers assigned to one of two clusters; k = 2 for this simple
example. The centroid (multivariate means) are computed for cluster 1,
which consists of customers B and C, and for cluster 2, which consists of
customers A and D. Those means are at the top of Figure 10.7.
Next in Figure 10.7, the distances are computed between each customer

Customer   Random Number   Assign to
A               0.8        cluster 2
B               0.3        cluster 1
C               0.4        cluster 1
D               0.6        cluster 2

Figure 10.6  K-means method: starting configuration


Means for   Mystery   Bio    DIY
A&D           2.00    1.00   1.50
B&C           2.50    2.00   0.50

Customer   Cluster   Distance²
A          A&D         2.25
A          B&C         0.50    move A to join B&C
B          A&D        15.25
B          B&C         7.50    keep B in B&C
C          A&D         4.25
C          B&C         7.50    move C
D          A&D         2.25
D          B&C         8.50    keep D

Figure 10.7  K-means method: 1st iteration

and the means of each cluster. If the customer is closer to the cluster he or
she is already assigned to, the customer stays put. If the customer’s data
more closely resemble the other cluster, the model will move the customer
to that other cluster. The distances are computed in Figure 10.7 for all
four customers, to diagnose whether they belong in the B&C cluster or the
A&D cluster. When the customers are reclassified, there still exist k = 2 clusters, but they now consist of customers A&B and C&D.
In Figure 10.8, the means of the new clusters are computed, and a new
assessment is conducted regarding whether each customer is in the optimal
cluster or again should be moved. Figure 10.8 shows that in this second
iteration, each customer is in the cluster with the mean profile that is
closest to his or her own individual data. Thus, no more iterations are nec-
essary, and the final partition is comprised of clusters {A,D} and {B,C}.
A question naturally arises as to how many clusters exist in the data. It is
answered by looking at the tradeoff of a large number of clusters explain-
ing the data better while the marketing analyst simultaneously seeks a
small number of clusters for purposes of parsimonious understanding
and communication. For example, the end R2 in a k-means can be plotted
against k (for various runs on k) to see the point at which the enhancement
of fit diminishes with the extraction of additional clusters. This issue is
relevant in factor analysis and multidimensional scaling as well and will be
revisited in those contexts.


Means for   Mystery   Bio    DIY
A&B           4.00    2.50   0.50
C&D           0.50    0.50   1.50

Customer   Cluster   Distance²
A          A&B         1.50
A          C&D         8.75    keep A where it is
B          A&B         1.50
B          C&D        28.75    keep B where it is
C          A&B        18.50
C          C&D         0.75    keep C where it is
D          A&B        17.50
D          C&D         0.75    keep D where it is

Figure 10.8  K-means method: 2nd iteration
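As with Ward’s method, the iterations are normally left to software. A sketch using scikit-learn on the same four customers follows, including the fit-versus-k comparison suggested above; the parameter choices (n_init, random_state) are routine settings rather than anything prescribed here.

```python
# A sketch of k-means on the same four customers, plus within-cluster
# sums of squares (inertia) for several k to judge fit vs. parsimony.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[3, 2, 1], [5, 3, 0], [0, 1, 1], [1, 0, 2]])  # A, B, C, D

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)           # A,B together and C,D together
print(km.cluster_centers_)  # the cluster means shown in Figure 10.8

for k in (1, 2, 3):
    fit = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(fit.inertia_, 2))  # look for where improvement flattens
```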

Interpretation and Verification

The interpretation of the clusters begins by examining the means on each variable; in the example, the means are at the top of Figure 10.8. Those means indicate that customers in cluster 1 (customers A and B) buy mysteries almost twice as much as biographies, and the profile of means indicates that customers in cluster 2 (customers C and D) do not buy at the same volume as the first segment and, if they buy, they purchase do-it-yourself manuals. Those profiles of means can be presented in table or figure form, for each segment across all the input variables.
The profiles can be substantiated by running an analysis of variance
(ANOVA) in which the cluster membership serves as a predictor variable,
and each variable that had served as input to the cluster analysis serves as
a dependent variable in a separate ANOVA. With many input variables,
many ANOVAs must be run, so the critical value might be reduced from the usual α = 0.05 level to something more conservative, say, α = 0.05/p, where p is the number of input variables.
The profiles can also be supplemented if there exist additional data.
For example, whereas the variables used as inputs to the cluster analysis
to derive segments are likely to convey purchase behaviors or attitudes
toward brands and ads, marketers would then desire to know what kind
of people are in each cluster (i.e., demographic variables) and how they
might be reached (i.e., media consumption habits). These supplemental
variables would also be run through ANOVAs to see whether segments


differ significantly in their gender proportions, average household incomes, frequency of PBS viewing, and so on.
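A sketch of this profiling step on the illustration data, with one one-way ANOVA per input variable and the Bonferroni-style critical value α = 0.05/p:

```python
# A sketch of the profiling ANOVAs: cluster membership predicts each
# input variable in turn, tested at the adjusted alpha = 0.05/p level.
import numpy as np
from scipy import stats

X = np.array([[3, 2, 1], [5, 3, 0], [0, 1, 1], [1, 0, 2]], dtype=float)
labels = np.array([0, 0, 1, 1])        # clusters {A,B} and {C,D}
alpha = 0.05 / X.shape[1]              # Bonferroni-adjusted critical value

for j, name in enumerate(["Mystery", "Bio", "DIY"]):
    groups = [X[labels == g, j] for g in np.unique(labels)]
    F, p = stats.f_oneway(*groups)     # one-way ANOVA for variable j
    print(f"{name}: F = {F:.2f}, p = {p:.3f}, significant: {p < alpha}")
```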

Summary of Cluster Analysis

The goal of marketing segmentation is to find groups of customers who are similar to each other, and the groups themselves are different (if the groups were similar, the segments would be combined). Cluster analysis
is perfectly suited to this goal. Cluster analysis can also group SKUs into
recommendation sets.
In terms of limitations, many algorithms are based on combinatorics,
which may be problematic for large data sets. A solution might be to
cut the data, e.g., into frequent and infrequent purchases, cluster in each
group, and inspect the two solutions to see if there is convergence or if
the segment structures are slightly different such that purchase frequency
functions as a moderator.
Cluster analyses based on purchase data are the models underlying
online recommendation agents (cf., Amazon, Netflix, Pandora). In that
application, clustering is often called “collaborative filtering” because
the purchase data for other customers who are similar (vis-à-vis those
purchases) are used to generate recommendations. Cluster analysis also
underlies Match.com and other dating sites. People answer survey ques-
tions and the model locates others whose profiles are similar.
There are many clustering models, and many excellent resources such
as Aldenderfer and Blashfield (1984) and Everitt et al. (2011). For related
techniques, see McCutcheon (1987) on latent class analysis, or Smithson
and Verkuilen (2006) on fuzzy set theory.

Factor Analysis

Managers say things like, “If you can’t measure it, you can’t manage it”
or “You manage what you measure.” Quantitative indicators are not
the only means of assessing business practices, but they can be extremely
helpful.
There are two major decisions to be made when conducting a factor
analysis. They are: (1) the number of factors to extract and (2) the rotation
of the factors and their interpretation.
Measuring objective indicators like a car’s gas mileage or speed is
relatively easy, but marketers frequently find themselves in the business
of trying to understand customers’ attitudes and behavioral propensities,
asking survey questions such as, “Do you like the car’s style?” or “Does it

MIZIK_9781784716745_t.indd 234 14/02/2018 16:38


Multivariate statistical analyses  ­235

feel luxurious?” When a customer is asked such a question, the response is a data point, X, that is assumed to reflect a true attitude, t, as well as some measurement error, e; i.e., $X = t + e$. Measurement error is assumed to be random, so that high and low errors cancel each other out ($\bar{e} = 0$), and the average is thought to be a decent estimate of the truth, $E(X) = t$.
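A tiny simulation (all numbers invented) illustrates why the averaging works:

```python
# A simulation of the measurement model X = t + e: averaging several
# noisy items recovers the true attitude. All numbers are invented.
import numpy as np

rng = np.random.default_rng(1)
t = 4.2                                    # the customer's true attitude
items = t + rng.normal(0.0, 0.8, size=10)  # ten items, each with error e
print(items.mean())                        # close to t, since E(e) = 0
```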
Two attitudes that are perennial favorites of marketing managers
are “attitude toward the ad” (Aad) and “attitude toward the brand”
(Abrand). So, imagine a study in which customers are shown an advertise-
ment and asked for their opinions about the ad on three survey questions,
Aad1, Aad2, Aad3. Next the customers are asked for their opinions about
the brand featured in the ad, again using three survey questions, Abrand1,
Abrand2, Abrand3. The marketing analyst uses multi-item scales so that
if there is anything strange about one or two of the questions (or one
or two of a respondent’s answers), given the measurement model just
stated, the average should nevertheless produce a reasonable facsimile
of customers’ attitudes. It will be expected that the three ad attitudes are
likely correlated among themselves, and the three brand attitudes will be
correlated among themselves. It is also highly likely that there will be some
cross-correlations, between the Aad and Abrand variables.
The factor analytical model expresses each measured variable as a func-
tion of the underlying factors, F, weights of the variable on each factor,
b’s, and a final term, U, with its own weight, d:

$$x_j = b_{j1}F_1 + b_{j2}F_2 + \ldots + b_{jr}F_r + d_jU_j. \qquad (10.1)$$

In the model, the Fs are the “factors” or “common factors,” so named to reflect the goal of factor analysis of capturing the common variability, i.e., the covariability across a set of items. Factor analysis is sometimes referred to as
a “data reduction” tool because the number of common factors, r, is usu-
ally much less than the number of observed variables, p. In our example,
we are positing six variables as a function of two factors. (We shall discuss
shortly how r is determined in general.) The U term is called a “unique-
ness” factor and it reflects any specific and systematic idiosyncrasy of a
variable as well as a random element of measurement error.2
Figure 10.9 depicts the factor analytical model for our example.
There are six variables (in the boxes), three of which measure customers’
attitudes toward the advertisement shown in the study (Aad), and three
measure customers’ attitudes toward the brand (Abrand) featured in the
ad. The factors, F1 and F2 (in the ovals), are said to be “latent” or not
directly observable, rather we infer them from the data patterns among
the six Aad and Abrand items. The factors themselves are likely to be
correlated, and that correlation is labeled ϕ. The b weights are called

MIZIK_9781784716745_t.indd 235 14/02/2018 16:38


236   Handbook of marketing analytics

[Figure 10.9: Factors reflected in measured variables. A path diagram in which two common factors, F1 = Attitude toward Ad and F2 = Attitude toward Brand (connected by the factor intercorrelation ϕ), point to the six measured variables Aad1–Aad3 and Abrand1–Abrand3 via the loadings b11–b62, and each measured variable also receives a uniqueness factor U1–U6 with weight d1–d6.]

factor loadings, and they reflect the relationships between each factor and
the six variables; e.g., Aad1 will be expected to have a high loading on F1
(Aad) and a low loading on F2 (Abrand). The Us at the right of the figure
represent the uniqueness factors, and the d weights reflect their impact on
their respective observed variables.
A factor analysis model finds the b’s in equation (10.1) to capture
as much of the information contained in the original X1, X2, . . ., Xp
variables as possible. In the factor analytic context, that means capturing
the pattern of correlations among the p variables in the p×p correlation
matrix, R.
The computer or model proceeds as follows. First, the correlation
matrix is adjusted for the uniqueness factors. The obverse of unique-
ness is communality, or the extent of covariability with other variables.
Communalities are estimated for each variable as the squared multiple
correlation (SMC) from predicting each variable from the others, in turn,
i.e., $R^2_{1 \cdot 2,3,\ldots,p}$, $R^2_{2 \cdot 1,3,\ldots,p}$, . . ., and $R^2_{p \cdot 1,2,3,\ldots,p-1}$ (then the uniqueness of a variable is 1 minus its communality). The SMCs are imputed into the diagonal of R, and we’ll call that adjusted matrix: $R_{adjusted} = R_{SMC}$. The
difference between the two matrices is depicted in Figure 10.10, for our
example data set on p = 6 variables.
Next, the RSMC is “factored” or decomposed into matrices of “eigen-
values” and “eigenvectors.” Each eigenvector will form a column of the
vector matrix V and its values, v1, v2, . . ., vp comprise the loadings that
indicate the extent to which the variables X1, X2, . . ., Xp load on the
corresponding factor. The first vector or factor is derived to capture the


          Aad1   Aad2   Aad3   Abrand1  Abrand2  Abrand3
R =    [ 1.000  0.971  0.944   0.402    0.371    0.382
         0.971  1.000  0.957   0.404    0.369    0.386
         0.944  0.957  1.000   0.387    0.355    0.379
         0.402  0.404  0.387   1.000    0.964    0.948
         0.371  0.369  0.355   0.964    1.000    0.967
         0.382  0.386  0.379   0.948    0.967    1.000 ]

SMCs ($R^2_{1 \cdot 2,3,4,5,6}$ = 0.946, $R^2_{2 \cdot 1,3,4,5,6}$ = 0.958, . . .):

R_SMC = [ 0.946  0.971  0.944   0.402    0.371    0.382
          0.971  0.958  0.957   0.404    0.369    0.386
          0.944  0.957  0.922   0.387    0.355    0.379
          0.402  0.404  0.387   0.935    0.964    0.948
          0.371  0.369  0.355   0.964    0.958    0.967
          0.382  0.386  0.379   0.948    0.967    0.940 ]

Figure 10.10  Adjusting the correlation matrix in preparation for factor analysis, R → R_SMC

maximum covariability among the Xs. The eigenvalue indicates how much (co)variability that eigenvector captured. The second vector or factor is derived to capture the maximum amount of covariability that remains among the Xs with the constraint that the second vector be orthogonal to (uncorrelated with) the first. The eigenvalue–eigenvector step is written as $R_{SMC} = V \Lambda V'$ ($V'$ is the transpose of V, and the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ form the diagonal elements in $\Lambda$).
The eigensolution is broken in two by defining a matrix B = VΛ^0.5
such that R_SMC = BB′. Recall that, to achieve parsimony, the number of
common factors retained (r) is fewer than the number of input variables
(p), so that while the matrix R_SMC can be perfectly reproduced by BB′,
extracting r factors yields an approximation: R_SMC ≈ B_r B_r′. Figure 10.11
presents the first two eigenvectors as the columns of V, and their
corresponding eigenvalues in Λ. (For readers rusty in matrix multiplication:
(0.409)(2.00115) + (0.407)(0) yields the first loading, 0.819, and
(0.406)(0) + (−0.425)(1.30979) yields −0.557.) Note the sums of squared
elements of the eigenvectors (columns of V) are 1.0 (within rounding),
whereas the sums of squares of the columns of B equal the eigenvalues.
The B matrix is the raw, "unrotated" (not to be interpreted) factor
loadings matrix.


            V (r = 2)          Λ^0.5                Unrotated factor loadings, B_r
             v1      v2         √λ1      √λ2                     F1       F2
  Aad1      0.409   0.407      2.00115   0            Aad1      0.819    0.534
  Aad2      0.412   0.412      0         1.30979      Aad2      0.825    0.541
  Aad3      0.402   0.406                             Aad3      0.804    0.532
  Abrand1   0.412  −0.392                             Abrand1   0.824   −0.514
  Abrand2   0.406  −0.425                             Abrand2   0.813   −0.557
  Abrand3   0.408  −0.405                             Abrand3   0.816   −0.531

  Sums of squares of the columns of V: 0.999, 0.999 (1.0 within rounding).
  Sums of squares of the columns of B: 4.005, 1.716 (= λ1, λ2); their square roots, 2.001 and 1.310, are √λ1 and √λ2.

Figure 10.11  Matrix multiplication of eigenvectors to factor loadings
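The eigensolution itself is a one-liner in most environments. A minimal sketch (again assuming Python with numpy; R_SMC transcribed from Figure 10.10, and keeping in mind that eigenvector signs are arbitrary, so columns may come out negated relative to Figure 10.11):

```python
import numpy as np

# Sketch: extract r = 2 unrotated loadings from R_SMC via B = V * sqrt(lambda).
R_smc = np.array([
    [0.946, 0.971, 0.944, 0.402, 0.371, 0.382],
    [0.971, 0.958, 0.957, 0.404, 0.369, 0.386],
    [0.944, 0.957, 0.922, 0.387, 0.355, 0.379],
    [0.402, 0.404, 0.387, 0.935, 0.964, 0.948],
    [0.371, 0.369, 0.355, 0.964, 0.958, 0.967],
    [0.382, 0.386, 0.379, 0.948, 0.967, 0.940],
])

eigvals, eigvecs = np.linalg.eigh(R_smc)   # eigenvalues in ascending order
idx = np.argsort(eigvals)[::-1][:2]        # keep the two largest
lam, V = eigvals[idx], eigvecs[:, idx]
B = V * np.sqrt(lam)                       # unrotated loadings, p x r
print(np.round(B, 3))                      # compare with B_r in Figure 10.11
```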

We will address the issue of rotations and the interpretation of factor
loadings shortly, but we are currently steeped in eigenvalues (and
eigenvectors), and they can be used to answer the question, "How many
factors are there?" or "What is r?"

Selecting “r,” the Number of Factors

For many statistical models, there is a tradeoff between fully explaining a
sample of data and doing so parsimoniously. In factor analysis, the tradeoff
is between extracting more factors (r approaches p) to capture as much
of the covariability among the Xs as possible, and extracting few factors
(r is as small as possible) to capture the covariability "reasonably well."
One simple and well-behaved heuristic is to consider diminishing
returns—if an eigenvalue reflects the amount of (co)variability captured
by an eigenvector (or factor), we can examine the point at which extracting
an additional factor doesn’t seem to pay off much in terms of how much
(co)variability it explains in the data.
This judgment is made by examining a plot of the eigenvalues (output
by default in most statistical computing packages). Figure 10.12 shows
a plot of eigenvalues, and the idea is to look for a break or an elbow in
the curve. In this figure, we see that extracting F1 explains some amount
of covariability, and extracting F2 explains some covariability as well,
even if not as much as F1. F3 explains some more covariability. However,
F4 and those that follow explain only negligible amounts of additional
covariability. Thus, the break between the 3rd and 4th eigenvalues suggests
we should take r = 3 factors.


[Figure 10.12  How many factors? A scree plot of the eigenvalues (vertical axis, roughly 0.5 to 2.0) against the number of factors (1, 2, 3, 4, 5, 6, …). The curve breaks after the third eigenvalue, suggesting "take 3," i.e., r = 3.]

[Figure 10.13  Factors and rotations. The six variables plotted on the unrotated factors F1 (horizontal) and F2 (vertical): Aad1–Aad3 sit in the northeast quadrant and Abrand1–Abrand3 in the southeast. Dashed axes F1′ and F2′ show an orthogonal rotation; axes F1″ and F2″ show an oblique rotation.]

Factor Rotations

While factors are extracted to optimize a certain mathematical property
(eigenvectors explain maximum (co)variability), they are rotated to
enhance interpretability. The raw loadings in matrix B from Figure 10.11
are plotted in Figure 10.13. All three ad variables are in the northeast
quadrant, and all three brand variables are in the southeast. F1 is the
horizontal axis, and all six Xs will have a positive first coordinate—all six
Xs load on F1. F2 is the vertical axis; the three ad Xs are positive, and the
three brand Xs are negative. The loadings indicate that F1 reflects all six
variables, and F2 reflects some kind of contrast between the ad and brand
variables. That interpretation isn't very enlightening.
One means of rotating factors functions like operating a spinner in a
children's board game—we take the original factors and rotate the axes a
bit clockwise until the axes are in a location we like better. If we spin the
axes labeled F1 and F2 through an approximate angle of θ = 45°, then the
new axes would appear where there are dashed lines labeled F1′ and F2′.
That rotation is said to be an "orthogonal" rotation because F1′ and F2′
are still uncorrelated (the axes are perpendicular to each other). When the
Xs are projected onto these new axes, the rotated factors, it is clear that
F1′ is defined by the three brand variables having high loadings (with
the three ad variables having relatively lower loadings), and F2′ is defined by
the three ad variables.
An orthogonal rotation is achieved by a simple transformation. We
can estimate that the angle from F1 to F1′ is about 45°. In Figure 10.14,
the raw, unrotated factor loadings matrix B from Figure 10.11 is repeated
for convenience. The small matrix in the center contains the sine and
cosine of the 45° angle, and the matrix multiplication yields the
orthogonally rotated factors, F1′ and F2′. The matrix at the right contains
the new factor loadings. Its interpretation, consistent with Figure 10.13,
indicates that F1′ is defined by the brand variables, and F2′ by the ad
variables. (It is standard to use a cut-off of 0.3 to determine the loadings
that are large, associated with the variables that help to define a factor,
versus those loadings that are so small as to be sampling variability or
noise.) The most frequently used and best-performing orthogonal rotation
is called "varimax," and it is available through most statistical computing
packages.

            Unrotated factors, B_r     Transformation          Orthogonally rotated factors
              F1       F2                                                    F1′      F2′
  Aad1       0.819    0.534            0.708   0.706     =     Aad1        0.203    0.956
  Aad2       0.825    0.540           −0.706   0.708           Aad2        0.202    0.965
  Aad3       0.804    0.532                                    Aad3        0.193    0.945
  Abrand1    0.824   −0.514                                    Abrand1     0.946    0.219
  Abrand2    0.813   −0.557                                    Abrand2     0.969    0.180
  Abrand3    0.816   −0.531                                    Abrand3     0.953    0.201

  The transformation matrix contains cos θ and ±sin θ, with θ ≈ 45°.

Figure 10.14  Matrices for orthogonal factor rotation
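The rotation is a small matrix multiplication. A sketch (Python with numpy assumed; B transcribed from Figure 10.11):

```python
import numpy as np

# Sketch of the "spinner" rotation of Figure 10.14.
B = np.array([[0.819,  0.534],
              [0.825,  0.541],
              [0.804,  0.532],
              [0.824, -0.514],
              [0.813, -0.557],
              [0.816, -0.531]])

theta = np.deg2rad(45.0)
T = np.array([[ np.cos(theta), np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])  # spins the axes clockwise
B_rot = B @ T
print(np.round(B_rot, 3))   # rows near [[0.203, 0.956], [0.202, 0.965], ...]
```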


            Orthogonally rotated factors            Oblique factor loadings
              F1′      F2′                                       F1″      F2″
  Aad1       0.203    0.956                          Aad1       0.008    0.874
  Aad2       0.202    0.965       (cube each         Aad2       0.008    0.898
  Aad3       0.193    0.945        loading)          Aad3       0.007    0.843
  Abrand1    0.946    0.219                          Abrand1    0.848    0.010
  Abrand2    0.969    0.180                          Abrand2    0.910    0.006
  Abrand3    0.953    0.201                          Abrand3    0.865    0.008

  Factor inter-correlation: ϕ = 0.385.

Figure 10.15  Matrices for oblique factor rotation

In many uses of factor analysis, it is a little restrictive to assume that
the factors are uncorrelated. In our example, it is extremely likely that
customers’ attitudes toward advertisements and brands are correlated.
The oddity is that the resulting axes (representing the factors) will no
longer be perpendicular. That’s okay; we are not building houses, we are
modeling customer data.
A factor rotation that allows factors to be correlated is called an oblique
rotation. The best algorithm is called "promax"; it is available in most
statistics packages and is easy to understand. Promax begins with
an orthogonal rotation; thus, the rotation we just saw in Figure 10.14 is
repeated for convenience at the left of Figure 10.15. Next, every loading
is simply raised to a power, very often to the power of 3. When a factor
loading is raised to a power, large loadings get a little smaller, but small
loadings get very tiny, with the result being an even clearer delineation
of the variables that load, and don't load, on each factor. (Using an odd
number like 3 as the power ensures that the positive and negative signs on
the loadings are maintained.)
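The cubing step is equally compact. A sketch (Python with numpy assumed; the loadings are transcribed from Figure 10.14); a full promax would follow the cube with an oblique rotation toward this target matrix:

```python
import numpy as np

# Sketch of the promax cubing step from Figure 10.15.
B_rot = np.array([[0.203, 0.956], [0.202, 0.965], [0.193, 0.945],
                  [0.946, 0.219], [0.969, 0.180], [0.953, 0.201]])

target = np.sign(B_rot) * np.abs(B_rot) ** 3   # sign-preserving power of 3
print(np.round(target, 3))                     # first row ~ [0.008, 0.874]
```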
For these data, the factor inter-correlation is moderate, ϕ = 0.385, but
greater than zero, which suggests that an oblique rotation represents the
data better than an orthogonal rotation would. If ϕ is very close to zero,
the orthogonal factors may be used. If ϕ is very large (0.7 or higher), we
may have extracted too many factors.


Summary of Factor Analysis

Factor analysis is very useful to marketing managers for understanding
customer survey data. Factor models can simplify a large number of variables
to a smaller number of factors based on the correlations among the survey
questions. The number of factors is determined as a trade-off between
goodness of fit on the data (requiring more factors), and parsimony (requir-
ing fewer). Factor rotations facilitate the interpretation of the solutions.
One arena in which the data reduction goal is useful is in regression.
If a marketing manager desired to predict willingness to purchase from
attitudes about ads and brands, and per our example, each predictor was
measured using three-item scales, the inclusion of all six variables (or
either set of three) would certainly create multicollinearity problems. If,
instead, the six variables were reduced to two factors (each being essen-
tially the average of its three items), those two predictors would be less
likely to cause multicollinearity problems.
Finally, this coverage would be remiss if we did not at least mention
confirmatory factor analysis. The model just described is the classic
“exploratory” factor analysis model. In confirmatory factor analysis,
we hypothesize that certain variables will load on one factor and not
on others, and the non-loadings are not estimated but fixed at zero.
Confirmatory factor analyses are especially important as an integral part,
along with path models, of structural equation models.
In terms of limitations, sometimes the hopes of what factor analysis can
achieve overreach what it can in fact achieve. Specifically, if there is little
thought and planning in data collection and some arbitrary set of vari-
ables is measured with no particular theoretical expectation of how they
might map onto common constructs, the factor analysis will still seek week
patterns of correlated sets of variables, but results might not be very clear.
There needs to be thoughtful planning in data collection if there is to be a
hope that the data analysis might be clear and informative.
There are many excellent resources on factor analysis, including Cliff
(1987), Comrey and Lee (1992), Iacobucci (1994), Gorsuch (1983), Kim
and Mueller (1978a, 1978b). Long (1983) is an excellent introduction to
confirmatory factor analysis.

Multidimensional Scaling (MDS)

Marketing managers frequently use perceptual maps to understand their
positioning in the marketplace. Simple maps may be drawn from survey
questions such as, “Do you believe that Whole Foods offers fresh


produce?” and “Do you believe that Whole Foods offers good value?”
juxtaposed with “How important is freshness when you shop for grocer-
ies?” and “How important is value?” Means over survey respondents
are calculated and plotted to see whether a brand excels on dimensions
that consumers consider to be important. Many brand attributes may be
plotted, and competitor brands may be superimposed on the plots.
This approach to creating perceptual maps is appealing for its simplic-
ity. Yet the map can only reflect the attributes measured on the survey,
and if consumers distinguish among brands using features and benefits
that the brand manager does not anticipate, those features will not be
reflected in the brand positions.
By comparison, perceptual maps derived from multidimensional scaling
(MDS) pose an omnibus question to consumers, simply, “How similar are
brands A and B?” (asked for all pairs of brands). Consumers proceed to
make brand comparisons along whatever attributes they care about, and
marketing managers infer them using MDS.
The heart of the MDS model is the analogy between distance and (dis)
similarity. A map is created so that brands thought to be similar will be
represented as points close together on the map, and brands thought to
be different will be farther apart. The map is studied for its configuration
as well as its dimensions. The configuration (i.e., relative brand locations)
helps inform numerous marketing questions, such as market structure
analysis, given that close brands are most competitive and likely substi-
tutes, verification of the effectiveness of marketing communications in
having properly positioned a brand vis-à-vis its competition, the necessity
for repositioning, strategic opportunities for brand development where
there currently exist empty spaces in the map, etc. The dimensions in a per-
ceptual map can also be informative, just as labels of North, South, East,
and West are in a geo-map, and we’ll show how to find their perceptual
equivalents.
There are several major decisions to be made when conducting an MDS.
They are: (1) the nature of the data to be modeled, (2) the MDS model to
be used, (3) the number of dimensions to extract, and (4) the interpretation
of the configuration and dimensions. We discuss each.

Dissimilarities Data

If the basic model or metaphor underlying MDS is that distances are used
to represent dissimilarities, the marketing analyst usually simply asks con-
sumers to fill out survey questions of the form, “How similar are these two
brands?” cycling through all pairs of p brands. Consumers use a scale such
as 1 = “very similar” and 9 = “very different.”


Other data collection options are available depending on the context.
For example, when brand managers run "blind taste tests," they are
obtaining “confusions data.” Two soft drinks, Coke and Coke Zero,
will be mistaken for each other frequently if they taste similar to
consumers. Another kind of data that can serve as inputs to an MDS is
“co-purchase” data. For example, if most households buy cheese with
crackers, ice cream with toppings, chips with dips, the proportions of
co-purchasing can be modeled to provide a perceptual map of proximal
consumption, to assist in promotion opportunities. An analogous data
situation arises when consumers use checklists to indicate which brands
they have tried, and aggregating across consumers indicates the fre-
quency with which any pair of brands has been considered (cf. DeSarbo
and Cho, 1989).

MDS Models

With proximities data in hand, the MDS model begins to fit them onto a
map. Say consumers think brands A and B are very similar (call the
dissimilarity judgment δ_AB, and say δ_AB = 1), B and C a little less similar
(δ_BC = 2), and A and C still less similar (δ_AC = 3). The brands could be
placed along a line, with A at point 1, B at point 2, and C at point 4. That
1-d model would capture the data perfectly, with distances d_AB = 1,
d_BC = 2, and d_AC = 3.
Naturally, real data are noisier and real brands are more complex, so
the data are unlikely to be fit perfectly in 1-d. For example, imagine the
data were δ_AB = 1, δ_BC = 2, δ_AC = 2.24. These dissimilarity judgments
wouldn't be represented perfectly in 1-d, but they would be in 2-d: because
1² + 2² ≈ 2.24², the three δs define the legs and hypotenuse of a right
triangle, per the Pythagorean theorem.
Alternatively, we can assume that there is likely measurement error
in consumer judgments, and note that while the values are different, these
δs still follow the same rank order as in the 1-d example. If we take the
δs at face value, we are fitting a "metric" MDS model, whereas if we simply
wish to render their relative sizes, we fit a "nonmetric" MDS model.
In the classic metric MDS model, the data values δ_ij, representing the
dissimilarity judgment for brands i and j, are squared and centered by
removing the effects of the row means, column means, and the grand mean
(see Figure 10.16):

  δ*_ij = −0.5 [ δ²_ij − (δ²_i· − δ²_··) − (δ²_·j − δ²_··) − δ²_·· ]
        = −0.5 [ δ²_ij − δ²_i· − δ²_·j + δ²_·· ].


  δij:                      δ²ij:                         row means δ²i·:
        A   B   C   D             A    B    C    D
   A    0                    A    0    9   36   25             17.5
   B    3   0                B    9    0    9   16              8.5
   C    6   3   0            C   36    9    0   25             17.5
   D    5   4   5   0        D   25   16   25    0             16.5

   column means δ²·j:            17.5  8.5  17.5  16.5   grand mean δ²··: 15.0

  Δ matrix:
        A    B    C    D
   A   10    1   −8   −3
   B    1    1    1   −3
   C   −8    1   10   −3
   D   −3   −3   −3    9

Figure 10.16  Classic metric MDS: data preparation

  Configuration coordinates, X:
          I      II
   A     1.2    0.5
   B     0.0    0.5
   C    −1.2    0.5
   D     0.0   −1.5

  Plotted, the brands form a "T": C, B, and A run from west to east along dimension I, and D sits below B on dimension II.

Figure 10.17  Classic metric MDS: results

The matrix Δ is factored into Δ = XX′, where the matrix X contains the
coordinates for the p points (brands) in r-dimensional space (thus X is
p × r, read "p by r," meaning p rows and r columns). This problem is solved
as Δ = VΛV′ (an eigensolution with V being the matrix of eigenvectors and
Λ the diagonal matrix of eigenvalues, much as in factor analysis). The
matrix of MDS coordinates is defined as X = VΛ^1/2.
Figure 10.17 contains the 2-d solution (after standardizing the dimensions),
both plotted and in matrix form. Given that the MDS model works
on configurations of distances, the model would be equally valid if the "T"
appearance of the four brands were reflected vertically or horizontally, or
rotated through an angle.
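The whole classic metric MDS pipeline can be sketched in a few lines (Python with numpy assumed; dissimilarities transcribed from Figure 10.16):

```python
import numpy as np

# Sketch of classic (Torgerson) metric MDS: square, double-center, eigensolve.
delta = np.array([[0., 3., 6., 5.],
                  [3., 0., 3., 4.],
                  [6., 3., 0., 5.],
                  [5., 4., 5., 0.]])          # brands A, B, C, D

n = delta.shape[0]
J = np.eye(n) - np.ones((n, n)) / n
Delta = -0.5 * J @ (delta ** 2) @ J           # the centered matrix of Figure 10.16

eigvals, eigvecs = np.linalg.eigh(Delta)
idx = np.argsort(eigvals)[::-1][:2]           # keep the two largest dimensions
X = eigvecs[:, idx] * np.sqrt(eigvals[idx])   # p x r coordinates
X_std = X / X.std(axis=0, ddof=1)             # standardize the dimensions
print(np.round(X_std, 1))                     # matches Figure 10.17 up to reflection
```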
By comparison, in nonmetric MDS, the input data are translated
to ranks and then modeled. In addition, whereas for metric MDS the
model-derived distances d_ij are a linear function of the dissimilarities
data δ_ij, in nonmetric MDS the model-derived distances are a monotonic
function of the data. The monotonic function is compatible with the
assumption that the data increase by relative (ranked) amounts but without
the precision of intervals imposed on the more roughly measured data.
The assumptions of metric MDS may seem more stringent than those of
nonmetric MDS, but in practice, the perceptual maps that result from the
two approaches are often very similar.
Another popular MDS model is called INDSCAL, which stands for
"Individual Differences Scaling." Whereas the classic metric and nonmetric
MDS models are used on single data sets (e.g., one consumer at a time, or
more typically, one matrix representing the average δs over a sample of
consumers), INDSCAL takes as input data multiple layers of dissimilarity
judgments, one for each consumer, δ_ijk, where i, j = 1, …, p brands as
before, and k = 1, …, N consumers.
The INDSCAL model proceeds like the metric model, but rather than
working with Euclidean distances, d_ij = √(Σ_{t=1}^{r} (x_it − x_jt)²), it uses
weighted Euclidean distances, d_ijk = √(Σ_{t=1}^{r} w_kt (x_it − x_jt)²), defined
for i = stimulus, t = dimension, and k = consumer. The model then produces
the usual p × r matrix X, containing the coordinates of the brands in space,
along with an N × r matrix W, which contains the "subject weights" w_kt
representing the weight that person k puts on the tth dimension. Those
subject weights can then be correlated with any additional information we
have collected on the consumers, such as demographic information or
other attitudinal ratings, to learn, say, that consumers who weight
dimension 1 heavily tend to be male, whereas the consumers for whom
dimension 2 is more salient are older.
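As a one-function sketch of the INDSCAL distance (Python with numpy assumed; the function name and example values are hypothetical):

```python
import numpy as np

# Sketch of the weighted Euclidean distance at the heart of INDSCAL.
def indscal_distance(x_i, x_j, w_k):
    """Distance between brands i and j as perceived by consumer k.

    x_i, x_j: length-r coordinate vectors; w_k: consumer k's dimension weights.
    """
    return np.sqrt(np.sum(w_k * (x_i - x_j) ** 2))

# A consumer who weights dimension 1 heavily sees brands that differ on
# dimension 1 as farther apart:
print(indscal_distance(np.array([1.2, 0.5]), np.array([0.0, 0.5]),
                       np.array([2.0, 0.5])))
```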

How to Determine Dimensionality

As is true for many statistical models (e.g., as we discussed for factor analysis),
MDS has its own version of the tradeoff between model fit and the parsimony
of the model. Ideally, the perceptual map fits the data “as best as possible”
and does so in “minimal dimensionality.” As more dimensions are extracted,
the data fit improves, but parsimony declines. Furthermore, human beings
are so used to seeing 2-d geo-maps that 2-d perceptual maps dominate as
well, even if 3-d or 4-d perceptual maps might describe the data better.
Different MDS models use different measures of fit. Classic metric
MDS often produces a series of eigenvalues, and INDSCAL usually
produces a model R². Both of these are "goodness of fit" indices (higher
numbers mean better fits). Nonmetric MDS usually produces a measure
called “Stress,” and it is a “badness of fit” index.


[Figure 10.18  Determining number of dimensions. Left panel: goodness of fit (variance accounted for, R², eigenvalue) plotted against the number of dimensions, with a break suggesting "take 3-d." Right panel: badness of fit (Stress) plotted against the number of dimensions, with a break suggesting "take 4-d." In each case the MDS is run once as 1-d, again as 2-d, and so on.]

Figure 10.18 shows examples of plots for each. For either kind of fit
index, the goal is still to identify a break in the curve. For goodness-of-fit
indices, the number of dimensions to extract lies to the left of (or above)
the break; the argument of diminishing returns suggests that taking yet
another dimension does not sufficiently enhance the fit. For badness-of-fit
indices, the number of dimensions to extract lies to the right of (or below)
the break; the argument of diminishing returns suggests that taking yet
another dimension does not sufficiently reduce the lack of fit.

How to Interpret the Dimensions: Attribute Vector Fitting

In Figure 10.19, we see a simple MDS plot of an easily interpretable
perceptual map. The first dimension seems to differentiate the "colas" from
the "uncolas," and the second dimension captures the diet versus non-diet
drinks. Real perceptual maps can be more ambiguous, so regressions are
used to fit vectors representing various brand attributes onto the map to
aid the interpretation.
To conduct this “vector fitting,” we would have asked consumers sev-
eral survey questions on each brand about attributes that we hypothesize
are important. Thus, in the survey, after collecting the dissimilarities data
about pairs of soft drinks, we would ask a series of questions about 7Up,
and then ask the same attribute questions about Diet Coke, and so on.
Figure 10.20 shows the little data set we will create. The first two
columns are the coordinates from the 2-d MDS solution. These coordinates
are what are mapped in Figure 10.19. The next two columns are simply
the first two columns standardized—the dimensions have been turned into
z-scores. Those two standardized columns will serve as the two predictor
variables in the regressions. The remaining columns represent attributes of
the brands—here they are binary just for simplicity.


[Figure 10.19  Simple soft drinks example to interpret. A 2-d perceptual map: Coke and Pepsi sit to the east (dimension I positive, dimension II negative), Diet Coke and Diet Pepsi to the northeast (dimension II positive), and 7Up and Sprite to the west.]

               Coordinates on      Standardized
               dimensions:         coordinates:       0 = nondiet   0 = uncola
                 I       II          I        II       1 = diet      1 = cola
  Coke          0.5    −0.5        0.641   −0.862         0             1
  Pepsi         0.6    −0.4        0.808   −0.637         0             1
  Diet Coke     0.4     0.5        0.474    1.387         1             1
  Diet Pepsi    0.5     0.4        0.641    1.162         1             1
  7Up          −0.7    −0.3       −1.366   −0.412         0             0
  Sprite       −0.6    −0.4       −1.198   −0.637         0             0

  (The standardized columns each have mean 0.000 and standard deviation 1.000.)

Figure 10.20  Vector fitting to interpret MDS

One multiple regression is run for each attribute. When the regression
in Figure 10.20 is run on the diet versus non-diet property (regressing
the diet indicator on z_dimI and z_dimII), the regression R² = 0.987,
and the fitted equation is: predicted diet = 0.117 z_dimI + 0.949 z_dimII.
For the cola–uncola attribute, the regression R² = 0.993, and the fitted
equation is: predicted cola = 0.964 z_dimI + 0.086 z_dimII.
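These regressions are straightforward to reproduce. A minimal sketch (Python with numpy assumed; data transcribed from Figure 10.20) recovers the slopes that serve as the vector coordinates:

```python
import numpy as np

# Sketch of vector fitting: regress each binary attribute on the
# standardized MDS dimensions.
Z = np.array([[ 0.641, -0.862], [ 0.808, -0.637], [ 0.474,  1.387],
              [ 0.641,  1.162], [-1.366, -0.412], [-1.198, -0.637]])
diet = np.array([0., 0., 1., 1., 0., 0.])
cola = np.array([1., 1., 1., 1., 0., 0.])

Zc = np.column_stack([np.ones(6), Z])               # add an intercept
b_diet, *_ = np.linalg.lstsq(Zc, diet, rcond=None)
b_cola, *_ = np.linalg.lstsq(Zc, cola, rcond=None)
print(b_diet[1:], b_cola[1:])   # slopes roughly (0.117, 0.949) and (0.964, 0.086)
```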
The betas from these regressions are the coordinates for the head of an
attribute vector emanating from the origin.


[Figure 10.21  Overlaying attribute vectors in standardized space. The soft-drink map of Figure 10.19 with a "cola" vector pointing roughly east, a "diet" vector pointing almost due north, and a star marking the ideal point of segment 1 near the diet drinks.]

In Figure 10.21, we see that the cola attribute vector points roughly to the
"east," indicating the direction in which that attribute is maximized
(brands farther east are those perceived to have much of that attribute).
Similarly, the diet attribute vector points almost due north, such that
brands at the top of the perceptual map are the diet drinks; by implication,
heading through the origin in the opposite direction, toward the south, are
the non-diet drinks.

Ideal Point Preference Models

MDS aids the marketing manager in understanding consumer perceptions
about brand positions, but marketers also care about consumer preferences.
When preference data are collected, e.g., by rating or ranking each
brand in the study, they may be modeled in the form of an "ideal point"
and also overlain on the map.
A customer's ideal point is the location in the MDS space having just
the right amount of dimension 1's attribute and dimension 2's attribute,
and the distance between the ideal point and the existing brands represents
the model's predictions of how much the consumer will like each brand.
For example, the star in Figure 10.21 shows a segment of consumers who
like diet soft drinks, preferring Diet Pepsi just a bit more than Diet Coke,
but both of these drinks to all the others. Ideal points are located using
regression, much like what was done for the brand attributes.


Summary of Multidimensional Scaling

Perceptual maps are useful to marketers as they consider their brand's
positioning in the marketplace. Several kinds of MDS models exist,
including classic metric, non-metric, and individual differences models.
Attribute vectors help guide interpretation of the MDS solutions.
A related model that has become popular for creating perceptual maps
is called Correspondence Analysis (CA). One reason for its popularity is
that MDS requires dissimilarities data as inputs, whereas CA can be used
on data matrices that may be brands as rows and attributes as columns,
with the matrix elements being the means over a sample of consumers of
ratings on a nine-point scale indicating the extent to which the attribute
is descriptive of each brand. CA models yield coordinates for brands
and attributes. The distance between brands indicates dissimilarity, as in
MDS. Brands closer to an attribute are perceived to be characterized by
that attribute (the CA version of attribute vectors in MDS).
In terms of limitations, perhaps the greatest resistance to MDS is
the requirement of collecting dissimilarities data that are not likely to
already be a part of a planned survey. It is more standard for surveys
to ask consumers to rate one or more brands on several attributes, and
some marketing analysts derive dissimilarities from such data, e.g., two
brands would be highly correlated if they have similar profiles across the
attributes.
There are many excellent resources on MDS, including Coxon (1982),
Davison (1983), Green, Carmone and Smith (1989), Kruskal and Wish
(1978), and anything Wayne DeSarbo writes, such as DeSarbo, Manrai,
and Manrai (1994). In addition, Clausen (1998) and Greenacre (2007) are
excellent introductions to correspondence analysis.

Chapter Summary

Cluster analysis, factor analysis, and multidimensional scaling are three
extremely useful techniques for academic and industry marketing researchers
and consultants. The basic logic and goals of each were presented and
illustrated, and references were suggested to pursue further inquiry.

Notes

1. For more information on each technique, please see Iacobucci (2017).
2. A model related to factor analysis is called "principal components analysis," and its
   model formulation looks similar, but it has no uniqueness factors, in part because users
   typically do not care about measurement error on the variables.

References

Cluster Analysis

Aggarwal, Charu C. (2013), Data Clustering: Algorithms and Applications, Boca Raton, FL:
Chapman & Hall/CRC.
Aldenderfer, Mark S. and Roger K. Blashfield (1984), Cluster Analysis, Newbury Park, CA:
Sage.
Everitt, Brian S., Sabine Landau, Morven Leese, and Daniel Stahl (2011), Cluster
Analysis, 5th ed., New York: Wiley.
King, Ronald S. (2014), Cluster Analysis and Data Mining: An Introduction, Herndon, VA:
Mercury Learning and Information.
McCutcheon, Allan L. (1987), Latent Class Analysis, Newbury Park, CA: Sage.
Romesburg, Charles (2004), Cluster Analysis for Researchers, Lulu.
Smithson, Michael and Jay Verkuilen (2006), Fuzzy Set Theory, Thousand Oaks, CA:
Sage.

Factor Analysis References

Cliff, Norman (1987), Analyzing Multivariate Data, San Diego: Harcourt Brace Jovanovich.
Comrey, Andrew L. and Howard B. Lee (1992), A First Course in Factor Analysis, 2nd ed.,
Hillsdale, NJ: Erlbaum.
Fabrigar, Leandre R. and Duane T. Wegener (2011), Exploratory Factor Analysis, New
York: Oxford University Press.
Gorsuch, Richard L. (1983), Factor Analysis, 2nd ed., Hillsdale, NJ: Erlbaum.
Iacobucci, Dawn (1994), “Classic Factor Analysis,” in Richard Bagozzi (ed.), Principles of
Marketing Research, Cambridge, MA: Blackwell, 279–316.
Kim, Jae-On and Charles W. Mueller (1978a), Introduction to Factor Analysis: What It Is and
How to Do It, Beverly Hills, CA: Sage.
Kim, Jae-On and Charles W. Mueller (1978b), Factor Analysis: Statistical Methods and
Practical Issues, Beverly Hills, CA: Sage.
Long, J. Scott (1983), Confirmatory Factor Analysis, Newbury Park, CA: Sage.
Pett, Marjorie A., Nancy R. Lackey, and John J. Sullivan (2003), Making Sense of Factor
Analysis: The Use of Factor Analysis for Instrument Development in Health Care Research,
Thousand Oaks, CA: Sage.
Thompson, Bruce (2004), Exploratory and Confirmatory Factor Analysis, Washington, DC:
American Psychological Association.
Walkey, Frank and Garry Welch (2010), Demystifying Factor Analysis: How it Works and
How to Use It, Bloomington, IN: Xlibris.

Multidimensional Scaling References

Borg, Ingwer and Patrick J. F. Groenen (2005), Modern Multidimensional Scaling: Theory
and Applications, New York: Springer.
Borg, Ingwer, Patrick J. F. Groenen, and Patrick Mair (2012), Applied Multidimensional
Scaling, New York: Springer.
Clausen, Sten Erik (1998), Applied Correspondence Analysis, Thousand Oaks, CA: Sage.


Cox, Trevor F. and Michael A. A. Cox (2000), Multidimensional Scaling, 2nd ed., Boca
Raton, FL: Chapman & Hall/CRC.
Coxon, A. P. M. (1982), The User’s Guide to Multidimensional Scaling, Exeter, UK:
Heinemann.
Davison, Mark L. (1983), Multidimensional Scaling, New York: Wiley.
DeSarbo, Wayne and Jaewun Cho (1989), “A Stochastic Multidimensional Scaling Vector
Threshold Model for the Spatial Representation of ‘Pick Any/N’ Data,” Psychometrika,
54(1), 105–129.
DeSarbo, Wayne, Ajay K. Manrai, and Lalita A. Manrai (1994), “Latent Class
Multidimensional Scaling: A Review of Recent Developments in the Marketing and
Psychometric Literature,” in Richard P. Bagozzi (ed.), Advanced Methods of Marketing
Research, New York: Blackwell Publishers, 190–222.
Green, Paul E., Frank J. Carmone Jr., and Scott M. Smith (1989), Multidimensional Scaling:
Concepts and Applications, Boston: Allyn & Bacon.
Greenacre, Michael (2007), Correspondence Analysis in Practice, 2nd ed., New York:
Chapman & Hall/CRC Interdisciplinary Statistics.
Kruskal, Joseph B. and Myron Wish (1978), Multidimensional Scaling, Beverly Hills, CA:
Sage.

General References

Grimm, Laurence G. and Paul R. Yarnold (1995), Reading & Understanding Multivariate
Statistics, Washington, DC: American Psychological Association.
Iacobucci, Dawn (2017), Marketing Models: Multivariate Statistics and Marketing Analytics,
3rd ed., Nashville, TN: Earlie Lite Books.
Johnson, Richard A. and Dean W. Wichern (2007), Applied Multivariate Statistical Analysis,
6th ed., Upper Saddle River, NJ: Pearson.
Kachigan, Sam Kash (1991), Multivariate Statistical Analysis: A Conceptual Introduction,
2nd ed., New York: Radius Press.
Rencher, Alvin C. and William F. Christensen (2012), Methods of Multivariate Analysis, 3rd
ed., New York: Wiley.
Tabachnick, Barbara G. and Linda S. Fidell (2012), Using Multivariate Statistics, 6th ed.,
Upper Saddle River, NJ: Pearson.



PART V

MACHINE LEARNING AND BIG DATA

11.  Machine learning and marketing
Daria Dzyabura and Hema Yoganarasimhan

Machine learning (ML) refers to the study of methods or algorithms
designed to learn the underlying patterns in the data and make
predictions based on these patterns.1 ML tools were initially developed in the
computer science literature and have recently made significant headway
into business applications. A key characteristic of ML techniques is their
ability to produce accurate out-of-sample predictions.
Academic research in marketing has traditionally focused on causal
inference. The focus on causation stems from the need to make counter-
factual predictions. For example, will increasing advertising expenditure
increase demand? Answering this question requires an unbiased estimate
of advertising impact on demand.
However, the need to make accurate predictions is also important to
marketing practices. For example, which consumers to target, which
product configuration a consumer is most likely to choose, which version
of a banner advertisement will generate more clicks, and what the market
shares and actions of competitors are likely to be. All of these are predic-
tion problems. These problems do not require causation; rather, they
require models with high out-of-sample predictive accuracy. ML tools can
address these types of problems.
ML methods differ from econometric methods both in their focus and
the properties they provide. First, ML methods are focused on obtaining
the best out-of-sample predictions, whereas causal econometric methods
aim to derive the best unbiased estimators. Therefore, tools that are opti-
mized for causal inference often do not perform well when making out-of-
sample predictions. As we will show below, the best unbiased estimator
does not always provide the best out-of-sample prediction, and in some
instances, a biased estimator performs better for out-of-sample data.2
Second, ML tools are designed to work in situations in which we do
not have an a priori theory about the process through which outcomes
observed in the data were generated. This aspect of ML contrasts with
econometric methods that are designed for testing a specific causal theory.
Third, unlike many empirical methods used in marketing, ML techniques
can accommodate an extremely large number of variables and uncover
which variables should be retained and which should be dropped. Finally,
scalability is a key consideration in ML methods, and techniques such as


feature selection and efficient optimization help achieve scale and effi-
ciency. Scalability is increasingly important for marketers because many
of these algorithms need to run in real time.
To illustrate these points, consider the problem of predicting whether
a user will click on an ad. We do not have a comprehensive theory of
users’ clicking behavior. We can, of course, come up with a parametric
specification for the user’s utility of an ad, but such a model is unlikely
to accurately capture all the factors that influence the user’s decision to
click on a certain ad. The underlying decision process may be extremely
complex and potentially affected by a large number of factors, such as
all the text and images in the ad, and the user’s entire previous browsing
history. ML methods can automatically learn which of these factors affect
user behavior and how they interact with each other, potentially in a
highly non-linear fashion, to derive the best functional form that explains
user behavior virtually in real time. ML methods typically assume a model
or structure to learn, but they use a general class of models that can be
very rich.
Broadly speaking, ML models can be divided into two groups: super-
vised learning and unsupervised learning. Supervised learning requires
input data that has both predictor (independent) variables and a target
(dependent) variable whose value is to be estimated. By various means,
the process learns how to predict the value of the target variable based
on the predictor variables. Decision trees, regression analysis, and neural
networks are examples of supervised learning. If the goal of an analysis
is to predict the value of some variable, then supervised learning is used.
Unsupervised learning does not identify a target (dependent) variable,
but rather treats all of the variables equally. In this case, the goal is not to
predict the value of a variable, but rather to look for patterns, groupings,
or other ways to characterize the data that may lead to an understanding
of the way the data interrelate. Cluster analysis, factor analysis (principal
components analysis), EM algorithms, and topic modeling (text analysis)
are examples of unsupervised learning.
In this chapter, we first discuss the bias–variance tradeoff and regu-
larization. Then we present a detailed discussion of two key supervised
learning techniques: (1) decision trees and (2) support vector machines
(SVM). We focus on supervised learning, because marketing researchers
are already familiar with many of the unsupervised learning techniques.
We then briefly discuss recent applications of decision trees and SVM in
the marketing literature. Next, we present some common themes of ML
such as feature selection, model selection, and scalability, and, finally, we
conclude the chapter.


Bias–Variance Tradeoff

The bias–variance tradeoff demonstrates the key difference between
prediction and causal-inference problems. In causal-inference problems, the
goal is to obtain unbiased estimates of the model parameters. However,
when the goal is the best out-of-sample prediction, parameter values do
not need to be unbiased. Therefore, methods built for causal inference are
not optimized for prediction, because they restrict themselves to unbiased
estimators.
When assessing how good a model will be at making predictions, we
distinguish between two different sources of error: bias and variance.
Error due to bias is the systematic error we can expect from estimating
the model on a new data set. That is, if we were to collect new data and
estimate the model several times, how far off would these models’ predic-
tions be, on average? The error due to variance is the extent to which
predictions for a point differ across different realizations of the data. For
example, a model that overfits to the training data will have high variance
error because it would produce very different estimates on different data
sets. Overfitting occurs when a model is fit too closely to a finite sample
data set. Thus, when the model is applied to a different finite sample, it
performs poorly.
Let us now examine how these two sources of error affect a model's
predictive ability. Let y be the variable we want to predict, and let x1, …, xn be
the predictors. Suppose a function exists that relates y to x, y = f(x) + ε,
where ε is normally distributed with mean 0 and variance σ²_ε. We would
like to estimate a model, f̂(x), to minimize the mean squared error
of the prediction. The expected squared prediction error at point x is
MSE(x) = E[(y − f̂(x))²], which can be decomposed as follows:

  MSE(x) = (E[f̂(x)] − f(x))² + E[(f̂(x) − E[f̂(x)])²] + σ²_ε   (11.1)
The last term, σ²_ε, is inherent noise in the data, so it cannot be minimized
and is not affected by our choice of f̂(x). The first term is the squared
bias of the estimator; the second term is the variance. We can see that both
the bias and the variance contribute to predictive error. Therefore, when we
are trying to come up with the best predictive model, an inherent tradeoff
exists between the bias and variance of the estimator. By ensuring no bias,
unbiased estimators allow no tradeoff. We refer readers to Hastie et al.
(2009) for the formal derivation of the above.
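The decomposition is easy to see by simulation. The sketch below (Python with numpy assumed; the data-generating process and the two competing models are hypothetical choices for illustration) refits a high-bias model and a high-variance model on repeated samples and reports the squared bias and variance of their predictions at a single point:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)     # the true (unknown) function
x0 = 0.25                               # evaluation point, f(x0) = 1
preds = {0: [], 7: []}                  # intercept-only vs. degree-7 polynomial
for _ in range(500):
    x = rng.uniform(size=30)
    y = f(x) + rng.normal(scale=0.3, size=30)
    for deg in preds:
        preds[deg].append(np.polyval(np.polyfit(x, y, deg), x0))
for deg, p in preds.items():
    p = np.asarray(p)
    # squared bias and variance at x0: degree 0 is biased but stable,
    # degree 7 is nearly unbiased but variable
    print(deg, round((p.mean() - f(x0)) ** 2, 4), round(p.var(), 4))
```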
To allow for a tradeoff, we introduce the concept of regularization.
Instead of minimizing in-sample error alone, we introduce an additional
term and solve the following problem:


  minimize_f̂  Σ_{i=1}^{n} (y_i − f̂(x_i))² + λR(f̂)   (11.2)

The term R(f̂) is a regularizer. It penalizes functions that create
substantial variance. The specific form of R(f̂) will depend on the model to
be estimated, f̂, and is typically chosen a priori. The weight given to the
regularizer relative to in-sample fit is captured by λ, which controls the
amount of regularization and allows us to maximize predictive performance
by optimally trading off bias and variance. A key idea in ML is that
λ can be optimally derived from the data itself instead of being imposed
exogenously. Usually it is selected using cross-validation, by splitting the
data into several training and validation sets. By repeatedly holding out
some subset of the data for validation, we can determine the value of λ
that leads to the best prediction for the holdout data. Therefore, the model
is explicitly optimized to make the best out-of-sample prediction given the
data. Note that by introducing regularization, we have sacrificed the
unbiasedness of the estimator in favor of getting better out-of-sample
predictions. A more formal treatment of regularization follows later.
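As a concrete sketch of equation (11.2), the snippet below (Python with numpy and scikit-learn assumed; the simulated data-generating process is hypothetical) uses a ridge regularizer, R(f̂) = sum of squared coefficients, and selects λ (scikit-learn calls it alpha) by five-fold cross-validation:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                 # many candidate predictors
beta = np.zeros(50)
beta[:5] = 1.0                                 # only five of them matter
y = X @ beta + rng.normal(scale=2.0, size=200)

model = RidgeCV(alphas=np.logspace(-3, 3, 25), cv=5).fit(X, y)
print(model.alpha_)                            # the lambda that wins cross-validation
```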
By empirically making the bias–variance tradeoff, regularization allows
us to consider a much broader class of models. For example, we can have
models with many more predictors than observations, or models with
many parameters, such as high-degree polynomials, or highly non-linear
models, such as decision trees or random forests.
The ability to consider a rich class of models is important for applica-
tions with no hypothesized parametric model that can be estimated on the
data. For example, in the computer science literature, a commonly studied
problem is image recognition, where the goal is to recognize the object in a
picture, and the data are pixels. Of course, this case has many more predic-
tors than data points, and we have no model for how the pixels actually
combine to make an image of, say, a dog or a house. As such, classical
ML applications focus much less on modeling than does econometrics
or classical statistics. Rather, the focus is on “learning” from the data. In
such settings, weak assumptions about model structure combined with
large data sets that are often characterized by high dimensions and a lot
of missing data lead to natural concerns in (1) computation and (2) data
overfitting. To deal with these challenges, several techniques have been
developed, including regularization, cross-validation, and approximate
optimization methods.


Decision Tree-based Models

In the most general formulation of a statistical prediction problem, we are
interested in the conditional distribution of some variable y given a set of
other variables x = (x1, …, xp). In ML, the x variables are often referred
to as "predictors" or "features" (in marketing, these are usually called
explanatory variables), and the focus of many ML problems is to find a
function f(x) that provides a good prediction of y. We typically have some
observed data {x, y} and want to compute a good prediction of y for a new
draw of x. The definition of a good predictor is based on its ability to
minimize a user-defined loss function, such as the sum of squared residuals. The
relevant loss in a prediction problem is associated with new out-of-sample
observations of x, not the observations used to fit the model.
There are two main types of supervised learning models: (1) decision
trees and (2) support vector machines. We discuss decision trees here and
support vector machines in the next section.
Linear regression (for continuous variables) and logistic regression
(for discrete data) are popular tools used for summarizing relationships
in the data. An alternative way to build a predictor is to use a decision
tree. We start by describing the simplest class of tree-based models, called
classification and regression trees (CART) (Breiman et al., 1984). We
discuss the advantages and disadvantages of CART and then conclude with a
description of the boosting technique that alleviates some of the issues
with CART.

Classification and Regression Trees (CART)

CART recursively partitions the input space corresponding to a set of
explanatory variables into multiple regions and defines a local model on
each region, which could be as simple as assigning an output value for each
region. This type of partitioning can be represented by a tree structure,
where each leaf of the tree represents an output region. Consider a data
set with two input variables {x1, x2} that are used to predict or model an
output variable y using a CART. An example tree with three leaves (or
output regions) is shown in Figure 11.1. This tree first asks if x1 is less than
or equal to a threshold t1. If yes, it assigns the value of 1 to the output y.
If not (i.e., if x1 > t1), it then asks if x2 is less than or equal to a threshold
t2. If yes, it assigns y = 2 to this region. If not, it assigns the value y = 3.
The chosen y value for a region corresponds to the mean value of y in that
region in the case of a continuous output and the dominant y in the case of
discrete outputs.
A general tree model can be expressed as follows:


[Figure 11.1  Example of a CART model. The root node (call it A) asks whether x1 ≤ t1; if yes, y = 1. Otherwise a second node (B) asks whether x2 ≤ t2; if yes, y = 2, and if not, y = 3.]

  y = f(x) = Σ_{k=1}^{K} w_k I(x ∈ R_k) = Σ_{k=1}^{K} w_k φ(x; v_k),   (11.3)

where x denotes the vector of features or explanatory variables, R_k is the
kth of the K regions used to partition the space, w_k is the predicted
value of y in region k, and v_k encodes the choice of variables to split on,
as well as their threshold values, on the path to the kth leaf. When y is
continuous, w_k is the mean response in the kth region. For classification
problems, where the outcome is discrete, w_k refers to the distribution of
the y's in the kth leaf.
Growing a tree requires optimally partitioning the data to derive the
points of split (threshold values of x at each tree node) as well as the value
of y in each leaf, which is an NP-complete problem (Hyafil and Rivest,
1976). It is commonly solved using a greedy algorithm that incrementally
builds the tree by choosing the best feature and the best split value for that
feature at each step of the tree-construction process. That is, the greedy
algorithm makes the locally optimal choice at each stage of the optimiza-
tion process with the hope of finding a global optimum.
Trees are trained (or "grown") by specifying a cost function that is
minimized at each step of the tree using a greedy algorithm. For a tree that
uses two-way splits, the split function determines the best feature (j*) and
its corresponding split value (u*) as follows:

  (j*, u*) = arg min_{j ∈ {1,…,d}, u ∈ X_j} [ cost(x_i, y_i : x_ij ≤ u) + cost(x_i, y_i : x_ij > u) ],   (11.4)

where d is the number of input variables, X_j is the domain of values
assumed by x_j, and cost is a function that characterizes the loss in
prediction accuracy due to a given split. The cost function used for
evaluating splits depends on the setting in which the decision tree will
be used. For example, the cost function could be the mean squared error


of the predictions when the decision tree is used in a regression
setting, or the misclassification rate in a classification setting. The
split procedure evaluates the costs of using all of the input variables at
every possible value that a given input variable can assume, and chooses the
variable (j*) and the value (u*) that yield the lowest cost. The stopping
criteria for the tree construction can be based either on the cost function or
on desired properties of the tree structure. For example, tree construction
can be stopped when the reduction in cost from introducing a new tree
node becomes small, or when the tree grows to a predefined number of
leaves or a predefined depth.
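A minimal sketch of growing such a tree (Python with scikit-learn assumed, whose DecisionTreeRegressor implements this style of greedy splitting; the simulated data mimic Figure 11.1, with hypothetical thresholds t1 = 0.4 and t2 = 0.6):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
X = rng.uniform(size=(500, 2))                      # features x1 and x2
y = np.where(X[:, 0] <= 0.4, 1.0,                   # y = 1 left of t1
             np.where(X[:, 1] <= 0.6, 2.0, 3.0))    # y = 2 or 3 to the right

tree = DecisionTreeRegressor(max_leaf_nodes=3).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))  # recovers both splits
```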
The greedy algorithm implies that at each split, the previous splits are
taken as given, and the cost function is minimized going forward. For
instance, at node B in Figure 11.1, the algorithm does not revisit the split
at node A. However, it considers all possible splits on all the variables at
each node, even if some of the variables have already been used at previous
nodes. Thus, the split points at each node can be arbitrary, the tree can
be highly unbalanced, and variables can potentially repeat at later child
nodes. All of this flexibility in tree construction can be used to capture a
complex set of flexible interactions, which are learned using the data.
CART is popular in the ML literature for many reasons. The main
advantage of a simple decision tree is that it is very interpretable—infer-
ring the effect of each variable and its interaction effects is easy. Trees
can accept both continuous and discrete explanatory variables, can work
with variables that have many different scales, and allow any number of
interactions between features (Murphy, 2012). A key advantage of CART
over regression models is the ability to capture rich non-linear patterns in
data, such as disjunctions of conjunctions (Hauser et al., 2010). CART
models are also robust to errors, both in the output and in the explanatory
variables, as well as missing explanatory variable values for some of the
observations. Further, CART can do automatic variable selection in the
sense that CART uses only those variables that provide better accuracy in
the regression or classification task. Finally, because the CART technique
is non-parametric, it does not require data to be linearly separable, and
outliers do not unduly influence its accuracy. These features make CART
the best off-the-shelf classifier available.
Nevertheless, CART has accuracy limitations because of its discon-
tinuous nature and because it is trained using greedy algorithms and thus
can converge to a local maximum. Also, decision trees tend to overfit
data and provide the illusion of high accuracy on training data, only to
underperform on the out-of-sample data, particularly on small training
sets. Some of these drawbacks can be addressed (while preserving all of the
advantages) through boosting, which gives us MART.


Boosting or MART

Boosting is a technique that can be applied to any classification or
prediction algorithm to improve its accuracy (Schapire, 1990). Applying the
additive boosting technique to CART produces MART (multiple additive
regression trees), which has been shown empirically to be the best classifier
available (Caruana and Niculescu-Mizil, 2006; Hastie et al., 2009). MART
can be interpreted as a weighted linear combination of a series of
regression trees, each trained sequentially to improve the final output using a
greedy algorithm. MART's output F_N(x) can be written as

  F_N(x) = Σ_{k=1}^{N} α_k f_k(x, β_k),   (11.5)

where f_k(x, β_k) is the function modeled by the kth regression tree and α_k
is the weight associated with the kth tree. Both the f_k(·)s and the α_ks are
learned during the training or estimation.
We choose f_k(x, β_k) to minimize a prespecified cost function, which is
usually the least-squares error in the case of regressions and an entropy or
logit loss function in the case of classification or discrete choice models.
Given the set of data points {(x_i, y_i)}, 1 ≤ i ≤ n, and a loss function L(y_i, ŷ_i)
corresponding to making a prediction of ŷ_i for y_i, the boosting technique
minimizes the average value of the loss function. It does so by starting
with a base model F_1(x) and incrementally refining the model in a greedy
fashion:

  F_1(x) = arg min_{f_1} Σ_{i=1}^{n} L(y_i, f_1(x_i)),   (11.6)

  F_k(x) = F_{k−1}(x) + arg min_{f_k} Σ_{i=1}^{n} L(y_i, F_{k−1}(x_i) + f_k(x_i))   (11.7)

At each step, f_k(x, β_k) is computed so as to best predict the residual
value y − F_{k−1}(x). In particular, boosting techniques use gradient descent
to compute f_k(·) at each step using g_k, which is the gradient of L(y, F(x))
evaluated at F(x) = F_{k−1}(x):

  g_ik = [ ∂L(y_i, F(x_i)) / ∂F(x_i) ]_{F(x) = F_{k−1}(x)}   (11.8)

Given g_k, gradient boosting makes the following update:

  F_k(x) = F_{k−1}(x) − γ_k · g_k,   (11.9)


where γ_k is the step length chosen so as to best fit the residual value:

  γ_k = arg min_γ Σ_{i=1}^{n} L(y_i, F_{k−1}(x_i) − γ g_k(x_i))   (11.10)

Note the gradients are easy to compute for the traditional loss functions.
For example, when the loss function is the squared-error loss
½(y_i − F(x_i))², the gradient is simply the residual y_i − F(x_i). In general,
boosting techniques can accommodate a broad range of loss functions and
can be customized by plugging in the appropriate functional form for the
loss function and its gradient.
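The loop is compact enough to hand-roll for the squared-error case, where the negative gradient is simply the residual. In the sketch below (Python with numpy and scikit-learn assumed; the simulated data are hypothetical), a fixed shrinkage factor stands in for the step length γ_k of equation (11.10), a common simplification in practice:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(size=(1000, 3))
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(scale=0.1, size=1000)

F = np.full(1000, y.mean())        # the constant base model F_1
trees, nu = [], 0.1                # nu: shrinkage in place of gamma_k
for _ in range(100):
    residual = y - F               # negative gradient for squared-error loss
    t = DecisionTreeRegressor(max_depth=2).fit(X, residual)  # a "shallow" tree
    trees.append(t)
    F = F + nu * t.predict(X)      # the additive update of equation (11.9)
print(np.mean((y - F) ** 2))       # in-sample squared error after boosting
```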
MART can be viewed as performing gradient descent in the function
space using “shallow” regression trees (i.e., trees with a small number
of leaves). MART works well because it combines the positive aspects
of CART with those of boosting. CART, especially shallow regression
trees, tends to have high bias but low variance. Boosting CART models
addresses the bias problem while retaining the low variance. Thus, MART
produces high-quality classifiers.

Application of Boosted Decision Trees in Marketing

Two recent studies use boosted trees in marketing applications. In a study
involving millions of searches, Yoganarasimhan (2017) used boosted
regressions (MART) to show that personalized rank orderings for each
consumer (and each instance of search) can improve the likelihood of
consumers clicking and dwelling on search results. Further, she finds
that logistic regression provides no improvement over the baseline.3 She
uses the predictive model to examine the heterogeneity in returns from
personalization as a function of user-history and query-type. Rafieian
and Yoganarasimhan (2017) also use boosted trees to build a targeting
model for mobile in-app advertisements. In their study, they use data from
over 27 million impressions in mobile apps. They show that boosted trees
perform better than other commonly used models such as OLS regres-
sions, logistic regressions, LASSO, and Random Forests for predicting
click-through rates of consumers for mobile advertisements. They use
their results to examine the relative value of behavioral and contextual
targeting in mobile ads, and to explore the impact of targeting on com-
petition among advertisers and the incentives of the platform to share
data with advertisers. Together, these studies establish the effectiveness of
decision-tree-based models in improving marketing decisions.


Support Vector Machines

A support vector machine, or SVM, is a semi-parametric method typically
used for a specific kind of prediction problem—the classification problem.
SVMs are robust to a large number of variables and small samples, can
learn both simple (e.g., linear) and complex classification models, and
have built-in regularizers that help avoid overfitting. They also produce
classifiers with theoretical guarantees of good predictive performance
(on unseen data). The theoretical foundations of this method come from
statistical learning theory.

Classification Problems

Classification problems are prediction problems in which the variable of
interest is discrete, such as which product(s) the consumer will consider or
purchase, or whether or not a consumer will purchase. A general form of
a binary (two-class) classification problem is described as follows: given
a set S of labeled data points, S = {(x_i, y_i)}, |S| = N, where the x_i ∈ R^d are
vectors of predictor variables and the y_i ∈ {+1, −1} are class labels, construct
a rule that correctly assigns a new point x to one of the classes. A classifier
is a rule that is trained on the labeled data and applied to new data to
predict the labels. A classifier is typically represented as a function f(x):
R^d → R, called the classifier function. In the case of binary classifiers, a
point is assigned the label +1 if f(x) ≥ 0, and the label −1 otherwise.

Linear Classifiers

We start by describing the SVM methodology for the simple case of linear classifiers, where the classifying function $f(x)$ has the form $f(x) = \beta_0 + \beta^T x$. A set of points $\{(x_i, y_i)\}$ is linearly separable if all the points in the set can be correctly classified using a linear classifier. That is, if $y_i \in \{-1, +1\}$, the set is linearly separable if a linear function $f(x)$ exists such that $y_i \cdot f(x_i) > 0$ for all $i = 1, \ldots, N$. For example, the set of points in Figure 11.2 is linearly separable. To aid visual exposition, the example depicts a simple case with two continuous predictors, $x_1$ and $x_2$. However, the same concepts apply to tasks in which the problem is higher dimensional. Note that in this example, several lines (or, more generally, hyperplanes) exist that correctly classify the data; see Figure 11.2a. We
can ask whether some are better than others. To help us choose a clas-
sifier, we define the concept of a margin, which captures this intuition: a
line is a weak classifier if it passes too close to the points, because it will be
sensitive to noise and will not generalize well. Therefore, our goal should be to find a line that passes as far as possible from all the points, as shown in Figure 11.2b.

[Figure 11.2  A linearly separable set of points. (a) Many linear classifiers can correctly classify this set of points. (b) The maximum margin classifier, with its optimal hyperplane and maximum margin, is the strongest.]

That is, we seek the classifier that gives the largest minimum distance
to all the training examples; this distance is called the “margin” in SVM
theory. For now, we rely on intuition to motivate this choice of the clas-
sifier; theoretical support for this choice is provided below. The optimal
separating hyperplane maximizes the margin of the training data, as in
Figure 11.2b. The training examples that are closest to the hyperplane are
called support vectors. Note that the margin in Figure 11.2b, $M$, is twice the distance to the support vectors. The distance between a point $x_i$ and the hyperplane $(\beta, \beta_0)$ is given by

$$\text{distance} = \frac{|\beta_0 + \beta^T x_i|}{\|\beta\|}. \tag{11.11}$$

Thus, the margin is given by $M = 2\cdot\frac{|\beta_0 + \beta^T x_i|}{\|\beta\|}$, which is twice the distance to the closest points. Because a single hyperplane can be defined in infinitely many ways by scaling with $\|\beta\|$, the parameters of the hyperplane are normalized such that $|\beta_0 + \beta^T x| = 1$ for the closest points. Then the margin is simply given by $M = \frac{2}{\|\beta\|}$. A hyperplane $(\beta, \beta_0)$ is called a $\gamma$-margin separating hyperplane if $y_i \cdot f(x_i) > \gamma$ for all $(x_i, y_i) \in S$.
We can now write the problem of finding the maximum margin linear (MML) classifier as an optimization problem that maximizes the margin $M$ subject to some constraints. It is typically written as minimizing $1/M^2$, which is a function of $\beta$, and the constraints require that the hyperplane correctly classify all the training examples $x_i$:

$$\min_{\beta, \beta_0} \ \frac{1}{2}\|\beta\|^2 \tag{11.12}$$

subject to

$$y_i(\beta^T x_i + \beta_0) \ge 1 \quad \forall i = 1, \ldots, N.$$

The MML has several noteworthy properties. First, it can be efficiently solved because it is a quadratic optimization problem that has a convex
objective function. Second, it has a unique solution for any linearly sepa-
rable set of points. Third, the solution to the MML classifier depends only
on the subset of points that act as the support vectors. The other points
can lie anywhere outside the margin, and their positions do not affect the
solution.
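As an illustration, the sketch below fits a (near) hard-margin linear SVM on a separable toy data set and inspects the support vectors; it assumes scikit-learn, and the very large value of C is our own device for approximating the MML classifier.

```python
# Fitting a (near) hard-margin linear SVM, assuming scikit-learn.
# A very large C makes margin violations prohibitively costly, so the
# solution approximates the MML classifier on separable data.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),     # class +1
               rng.normal([-2, -2], 0.5, (20, 2))])  # class -1
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print("beta:", clf.coef_[0], " beta_0:", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)    # only these pin down the solution
print("margin M = 2/||beta||:", 2 / np.linalg.norm(clf.coef_))
```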

Allowing Misclassified Examples

Because the optimal separating hyperplane is drawn as far away from the
training examples as possible, the MML is only robust to noisy predictors,
not to noisy labels. Because it does not allow for misclassified examples,
even a single misclassification error in the training data can radically affect
the solution. To address this problem, the above approach can be relaxed
to allow for misclassified examples. The main idea is this: instead of con-
straining the problem to classify all the points correctly, explicitly penalize
incorrectly classified points. The magnitude of the penalty attached to
a misclassification will determine the tradeoff between misclassifying a
training example and the potential benefit of improving the classification
of other examples. The penalization is done by introducing slack variables
for each constraint in the optimization problem in equation (11.12), which
measure how far on the wrong side of the hyperplane a point lies—the
degree to which the margin constraint is violated. The optimization
problem then becomes

$$\min_{\beta, \beta_0, \xi} \ \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{N} \xi_i \tag{11.13}$$

subject to

$$y_i(\beta^T x_i + \beta_0) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i = 1, \ldots, N.$$

Now, if the margin constraint is violated, we will have to set $\xi_i > 0$ for some data points. The penalty for this violation is given by $C \cdot \xi_i$, and it is traded off against the possibility of decreasing $\|\beta\|^2$. Note that for linearly separable data, if $C$ is set to a sufficiently large value, the optimal solution will have all the $\xi_i = 0$, corresponding to the MML classifier. In general, the larger the value of $C$, the fewer margin constraints will be violated. Users typically choose the value of $C$ by cross-validation. Note that in this more general formulation, many more data points affect the choice of the hyperplane: in addition to the points that lie on the margin, the misclassified examples also affect it. We will come back to this formulation shortly and see how it can be viewed from the point of view of regularization.
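As a brief aside, this is what choosing $C$ by cross-validation can look like in practice; the sketch assumes scikit-learn, and the simulated data and candidate grid are purely illustrative.

```python
# Choosing the soft-margin penalty C by 5-fold cross-validation,
# assuming scikit-learn. The grid of candidate C values is illustrative.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)),
               rng.normal(-1.0, 1.0, (50, 2))])   # noisy, overlapping classes
y = np.array([1] * 50 + [-1] * 50)

search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10, 100]}, cv=5)
search.fit(X, y)
print("best C:", search.best_params_["C"], " CV accuracy:", round(search.best_score_, 3))
```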
The above problem is also a quadratic optimization problem that has
a convex objective function and therefore can be efficiently solved. One
common method for solving it is by introducing Lagrange multipliers and
forming a dual problem. The Lagrange function resulting from the opti-
mization problem in equation (11.13) is obtained by introducing Lagrange
multipliers to the objective function for the constraints:

$$L_P = \frac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\big(y_i(\beta^T x_i + \beta_0) - (1 - \xi_i)\big) - \sum_{i=1}^{N}\mu_i\xi_i, \quad \alpha_i, \mu_i, \xi_i \ge 0, \tag{11.14}$$

where $\mu_i$ and $\alpha_i$ are Lagrange multipliers. We obtain first-order conditions by taking derivatives with respect to $\beta$, $\beta_0$, and $\xi_i$:

$$\beta = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad 0 = \sum_{i=1}^{N} \alpha_i y_i, \qquad \alpha_i = C - \mu_i, \quad \forall i = 1, \ldots, N. \tag{11.15}$$

Plugging these into the Lagrangian function in (11.14), we obtain the Lagrangian dual problem:

$$\max_{\alpha} \ \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{i'=1}^{N} \alpha_i \alpha_{i'} y_i y_{i'} x_i^T x_{i'} \tag{11.16}$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0.$$

Note that in the above optimization problem, the input features $x_i$ enter only via inner products. This property of SVM is critical to the computational efficiency of nonlinear classifiers. Next, we show how the SVM machinery can be used to efficiently solve nonlinear classification problems.

Non-linear Classification—Kernel Method

Suppose now that our data are not separable by a linear boundary, but can be separated by a non-linear classifier, such as in Figure 11.3a.

[Figure 11.3  Nonlinear classification. (a) Points cannot be correctly separated with a linear classifier, but a nonlinear classifier $f(x) = -2 + x_1^2 + x_2^2$ separates them perfectly. (b) The same points in the transformed space are now linearly separable.]

The kernel method, also known as the "kernel trick," is a way to transform the data into a different space, and construct a linear classifier in this space. If the
transformation is non-linear, and the transformed space is high dimen-
sional, a classifier that is linear in the transformed space may be nonlinear
in the original input space.
Consider the example of the circle shown in Figure 11.3a, which represents the equation $x_1^2 + x_2^2 = 2$. That is, the non-linear classifier $f(x) = -2 + x_1^2 + x_2^2$ separates the data set perfectly. Let us now apply the following nonlinear transformation to $x$:

$$z = \Phi(x): \quad z_1 = x_1^2, \ z_2 = x_2^2. \tag{11.17}$$

After the transformation, the classifier becomes a linear one defined as follows: $\tilde{f}(z) = -2 \cdot 1 + 1 \cdot z_1 + 1 \cdot z_2 = \tilde{\beta}^T z$. Now, if we plot the data in terms of $z$, we have linear separation, as shown in Figure 11.3b. The transformed space that contains the $z$ vectors is called the feature space, because its dimensions are higher-level features derived from the raw input $x$. The transform $\Phi$, typically referred to as the feature transform, is useful because the non-linear classifier (circle) in the $X$-space can be represented by a linear classifier in the $Z$-space. Let $d$ be the dimensionality of the $X$ space and $\tilde{d}$ the dimensionality of the $Z$ space; similarly, we let $\tilde{\beta}$ represent the weight vector. Then a linear classifier $\tilde{f}$ in $z$ corresponds to a classifier in $x$, $f(x) = \tilde{f}(\Phi(x))$.
If the transformed data are linearly separable, we can apply methods developed for linear classifiers to obtain the solution in the transformed space, $\tilde{\beta}$, and then transform it back to the $X$ space. Note that the in-sample error in the original space $X$ is the same as in the feature space $Z$.
The feature transform can be general, but as it becomes more complex,
the dimensionality of the feature space increases, which in turn affects the
guarantees on the classifier’s performance on new data. The kernel trick
addresses this issue by using so-called kernel functions, the mapping does
not have to be explicitly computed, and computations with the mapped
features remain efficient. This efficiency is obtained by noting that the
Lagrangian dual formulation in equation (11.16) only involves the inner
products of input features. The objective function in the transformed
feature space becomes

$$\sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{i'=1}^{N} \alpha_i \alpha_{i'} y_i y_{i'} \langle \Phi(x_i), \Phi(x_{i'}) \rangle. \tag{11.18}$$

Thus, the solution involves $\Phi(x)$ only through inner products. Therefore, we never need to specify the transform $\Phi(x)$, but only the function that computes inner products in the transformed space:

$$K(x, x') = \langle \Phi(x), \Phi(x') \rangle. \tag{11.19}$$

The function $K(x, x')$ is known as the kernel function. The most commonly used choices for $K$ are polynomial kernels,

$$K(x, x') = (1 + \langle x, x' \rangle)^d, \tag{11.20}$$

and Gaussian kernels,

$$K(x, x') = \exp\big(-\|x - x'\|^2 / (2\sigma^2)\big). \tag{11.21}$$

By replacing the inner product in the SVM dual formulation in equation (11.16) by the kernel, we obtain an MML classifier in the transformed feature space defined by the kernel, which is non-linear in the original space.
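The circle example of Figure 11.3 can be reproduced directly with a degree-2 polynomial kernel, as in the sketch below (assuming scikit-learn; the simulated data are our own). Setting gamma=1 and coef0=1 matches the polynomial kernel in equation (11.20).

```python
# Kernel SVM on the circular example of Figure 11.3, assuming
# scikit-learn. A degree-2 polynomial kernel can represent the
# classifier f(x) = -2 + x1^2 + x2^2 without ever forming the
# feature map z = (x1^2, x2^2) explicitly.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, (300, 2))
y = np.where(X[:, 0]**2 + X[:, 1]**2 < 2, 1, -1)   # inside vs. outside the circle

clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1, C=10.0).fit(X, y)
print("training accuracy:", clf.score(X, y))        # near 1.0: separable in feature space
```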

Margin, VC Dimension, and Generalization

Generalization refers to an ML model's predictive power outside the training data, that is, its ability to make the best prediction $\hat{y}$ for a new data point $x$ that is not part of the training set. In this context, we
present the Vapnik-Chervonenkis generalization theorem, which provides
a bound on the ability of a model fit to a training set to generalize to new
data points.


The Vapnik-Chervonenkis (VC) dimension measures the richness, or flexibility, of a classifier. The VC dimension measures how complex a classifier can be through the maximum number $k$ of data points that can be separated into all possible $2^k$ ways using the model, a process which is referred to as "shattering" the set of data points. A classifier $f(x)$ with parameter vector $\theta$ shatters a set of data points if, for all possible labels of those points, a $\theta$ exists such that $f$ correctly classifies all the data points.
The more complex the set of classifiers captured by f, the higher the VC
dimension. For example, the VC dimension of a line in two dimensions is
three, because any three points (that are not collinear) can be shattered
using this model, but no set of four points can be shattered. In higher
dimensions, the VC dimension of hyperplanes in $\mathbb{R}^d$ is known to be $d + 1$. The VC dimension can be viewed as a measure of the number of a model's hypotheses. We have the following result, which provides an upper bound on the VC dimension $h$ for the set of $\gamma$-margin separating hyperplanes.
Let $\{x_i\}$ be a set of points in $\mathbb{R}^d$ that belong to a sphere of radius $R$. Then the set of $\gamma$-margin separating hyperplanes has VC dimension $h$ satisfying

$$h \le \min\left(\left(\frac{R}{\gamma}\right)^2,\, d\right) + 1. \tag{11.22}$$

Note that the upper bound is inversely proportional to the margin $\gamma$, suggesting that the larger the margin, the lower the VC dimension of the corresponding set of classifiers.
In evaluating a classification algorithm, we are interested in the number
of errors the classifier will make when classifying unseen, out-of-sample,
data when all we know for sure is the number of errors made on the train-
ing, or in-sample, data. This number cannot be computed exactly, but it
can be upper-bounded using the VC dimension. The VC generalization bound gives an upper bound on the probability of a test sample being misclassified by a $\gamma$-margin hyperplane. With probability $1 - \delta$, the probability of a test sample being misclassified is

$$P_{err} \le \frac{m}{N} + \frac{E}{2}\left(1 + \sqrt{1 + \frac{4m}{NE}}\right), \tag{11.23}$$

where

$$E = 4\,\frac{h\left(\ln\frac{2N}{h} + 1\right) - \ln\frac{\delta}{4}}{N}. \tag{11.24}$$
$N$ is the number of points in the training sample, $m$ is the number of training examples misclassified by the hyperplane, and $h$ is the VC dimension. Vapnik and Chervonenkis (1971) developed the unifying relationship between the VC dimension, sample size, and classification errors.
The first term in equation (11.23) is the proportion of misclassified data
points in the training sample; the second term is a function of the model
complexity, which increases with the VC dimension, h. Therefore, the
bound on the probability of misclassifying a new data point is proportional
to the VC dimension of the set of classifiers. Thus, all else being equal, a
more complex classifier (one with a higher VC dimension) is likely to be
a worse predictor than a simple classifier. We have also seen in equation
(11.22) that the VC dimension decreases as the margin ($\gamma$) increases; this
finding provides a theoretical foundation for looking for classifiers with
the maximum margin, such as the MML. More generally, it motivates
regularization, which is a method used to prevent model overfitting.

Regularization

The VC generalization bound tells us that, as far as out-of-sample prediction is concerned, we should be better off fitting the data using a "simpler"
model. Therefore, rather than simply finding a model that minimizes
error, we introduce a term to the optimization that penalizes for model
complexity, called the regularization penalty. This approach avoids over-
fitting by constraining the algorithm to fit the data using a simpler model.
Consider the SVM optimization problem in equation (11.13). $\xi_i$ is set to $1 - y_i(\beta^T x_i + \beta_0)$ if a data point in the training set is misclassified, and 0 if it is classified correctly. The optimization problem can be rewritten as

$$\min_{\beta, \beta_0} \ \frac{1}{2}\|\beta\|^2 + C \sum_{i=1}^{N} \big(1 - y_i(\beta^T x_i + \beta_0)\big)_+, \tag{11.25}$$

where $(\cdot)_+$ denotes the positive part.

Here, we can view the second term, $C\sum_{i=1}^{N}\big(1 - y_i(\beta^T x_i + \beta_0)\big)_+$, as the loss for misclassifying a data point, and can view the first term, $\frac{1}{2}\|\beta\|^2$ (the inverse of the squared geometric margin, up to a constant), as the regularization penalty that helps stabilize the objective. Regularization thus helps us select the solution with the largest geometric margin, corresponding to lower VC dimension, or model complexity.
This type of regularizer, which penalizes the squared or L2 norm of the
parameter values, is sometimes referred to as weight decay, because it forces
the weights to decay toward 0. Note that when applied to linear regression,
it results in what is called ridge regression in econometrics. Similarly, the
L1 regularizer $\|\beta\|_1$ corresponds to lasso regression when applied to linear
regression. With the L1 regularizer, many of the less relevant features
will be set exactly to 0, resulting in feature selection. Other than linear
regression, regularization is also used for logistic regression, neural nets, and some matrix-decomposition methods. In the more general form of this regularization, called Tikhonov regularization, different penalties can be placed on different weights being large, resulting in the form $\beta^T \Gamma^T \Gamma \beta$ (Tikhonov and Arsenin, 1977).
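The contrast between the L2 and L1 penalties is easy to see in a small simulation; the sketch below assumes scikit-learn, and the data and penalty strengths are illustrative.

```python
# Contrasting L2 (ridge) and L1 (lasso) regularization, assuming
# scikit-learn. Ridge shrinks all coefficients toward 0; lasso sets
# many of the irrelevant ones exactly to 0 (feature selection).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * 7)   # only 3 relevant features
y = X @ beta_true + rng.normal(scale=0.5, size=200)

ridge = Ridge(alpha=10.0).fit(X, y)   # alpha plays the role of lambda below
lasso = Lasso(alpha=0.1).fit(X, y)
print("ridge:", np.round(ridge.coef_, 2))   # all nonzero, shrunk
print("lasso:", np.round(lasso.coef_, 2))   # mostly exact zeros
```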
Typically, the optimization is written in the form of minimizing in-sample errors plus the regularization penalty, that is,

$$\min_{w} \ E_{in}(w) + \lambda\, w^T w \tag{11.26}$$

for L2 regularization. The functional form of the regularizer is usually chosen ahead of time, whereas the parameter $\lambda$, which determines the amount of regularization, needs to be trained. Such training is necessary because the type of regularization is usually known based on the type of data and the type of model to be fit, but the data themselves should dictate the amount of regularization. We want to pick the $\lambda$ that will result in the best out-of-sample prediction (the best in-sample fit is achieved using $\lambda = 0$). To determine which value of $\lambda$ leads to the best out-of-sample prediction, we train it using a validation method, which we describe next.
In general, regularization is necessary if the class of models is too rich
for the data. Then we can combat overfitting by regularization, which
penalizes the sizes of the parameters. For example, Hauser et al. (2010)
estimated a very rich model of non-compensatory consideration-set
formation, called disjunctions of conjunctions, which allows for non-
compensatory rules of the form (fuel efficient AND Toyota AND sedan)
OR (powerful AND BMW AND sports car). Note the complexity of this
model is exponential in product attributes and is prone to overfitting. In
fact, any training data consisting of considered and non-considered prod-
ucts can be perfectly fit with a separate conjunction for all the features
of each considered product. The authors use regularization to combat
overfitting and look for “simple” decision rules to fit the data, resulting in
good out-of-sample performance.
Note that, because we are explicitly penalizing for complexity, we can
consider a much broader class of models (e.g., many more predictors than
data points, high-degree interactions, etc.) because the regularizer will
guarantee we find the best predictive model that does not overfit.

Applications of SVM in Marketing

Because of SVM’s robustness and ability to handle large, high dimensional


data, it has become one of the most popular classification algorithms over
the past 20 years, with applications in image recognition, text mining,
and disease diagnosis. Cui and Curry (2005) introduced it to marketing

MIZIK_9781784716745_t.indd 272 14/02/2018 16:38


Machine learning and marketing  ­273

and provide an excellent overview of SVM theory and implementations.


They also compare the predictive performance of SVM to that of the
multinomial logit model on simulated choice data, and demonstrate SVM
performs better, particularly when data are noisy and products have many
attributes (i.e., high dimensionality). They also see that when predicting
choices from larger choice sets, SVM more significantly outperforms the
multinomial logit model. Although both methods’ predictive ability falls
as the size of the choice set increases, because the first-choice prediction
task becomes more difficult, the decline is much steeper for multinomial
logit than for SVM. Evgeniou et al. (2005) present and test a family of
preference models, including highly non-linear ones, which are estimated
using SVM methodology. The estimation procedure uses regularization, similar to that of SVM, to prevent the complex models from overfitting. For linear utility models, they find the SVM significantly outperforms logistic regression on out-of-sample hit rates. The improvement of
using SVM versus logistic regression is particularly large when the choice
design is random; the methods perform approximately equally well for a
balanced orthogonal choice design. Similar to Cui and Curry (2005), they
find SVM performs significantly better when noise increases, suggest-
ing SVM is more robust to noise. Next, they test the performance of the
methods on utility models that involve interactions among attributes. For
these models, they show SVM performs similarly to hierarchical Bayes (HB)
estimation of a correctly specified nonlinear model. However, SVM better
captures the nonlinear parts of the model. Additionally, SVM can handle
more complex models with more interactions than HB can, because it is
computationally efficient.
Evgeniou et al. (2007) extend SVM to develop a framework for
modeling choice data for multi-attribute products, which allows the
capturing of respondent heterogeneity and the pooling of choice data
across respondents. The attribute partworths are shrunk to the mean with
regularization parameters that are trained using cross-validation.
More recently, Huang and Luo (2015) used fuzzy SVM, an extension of
SVM methodology, for preference elicitation of complex products with a
large number of features. They proposed an adaptive question-selection process that uses fuzzy SVM active learning to select each subsequent question. They showed that, due to the convex nature of SVM
optimization, such an approach is computationally efficient for preference
elicitation of complex products on the fly.
Another extension is the latent-class SVM model, which allows the
use of latent variables within SVM. Liu and Dzyabura (2016) develop an
algorithm for estimating multi-taste consumer preferences by building on
the convex–concave procedure used to estimate latent-class SVM while capturing respondent heterogeneity. They show their model's predictions are better than those of single-taste benchmarks.

Common Issues in ML Methods

Training, Validation, and Testing

Dividing the data into separate sets for the purpose of training, valida-
tion, and testing is common. Researchers use the training data to estimate
models, the validation data to choose a model, and the testing data to
evaluate how well the model performs. We discuss below the reasons
for splitting the data into the constituent parts and issues related to this
framework.
We first examine the need for using a testing data set. As discussed
earlier, the goal of ML techniques is to provide the best out-of-sample
predictions as opposed to simply improving the model fit on the sample
data set. Given this need, the predictive ability of ML techniques is
evaluated by first constructing a model on a training data set and then
evaluating its accuracy on a testing data set whose data items were not included in the training data set. This approach provides
a meaningful estimate of the expected accuracy of the model on out-of-
sample data.
Let us now examine the need for having a validation data set. Consider
an ML technique that trains multiple models on a training set S, and
picks the model that provides the best in-sample accuracy (the lowest
error on the set S). This approach will prefer larger and more detailed
models to less detailed ones, even though the less detailed ones might
have better predictive performance on out-of-sample data. For example,
if we are approximating a variable y using a polynomial function applied
on inputs x, then, if we determine the order of the polynomial based on
the accuracy of prediction on the training set S, we would always pick a
very high-degree, high-variance polynomial model that overfits the data
in S and may, as a consequence, perform poorly on the testing data. To
address this issue, cross-validation splits the input data set S into two
components: St (training) and Sv (validation). It then uses the training set
St to generate candidate models, and then picks a model that performs best
on Sv as opposed to basing the decision solely on St fit. Cross-validation
thus ensures the chosen model does not overfit St and performs well on
out-of-sample data.
The cross-validation enhancement can be applied to any ML algorithm.
For example, in the case of boosted MART, cross-validation typically works as follows. After each tree is computed based on St and added to the
MART ensemble, the MART is evaluated on the validation data set Sv .
Although the additional tree would have improved the accuracy on St , it
might not have necessarily improved the accuracy on Sv . So the algorithm
could introduce a stopping rule for MART construction that terminates
the MART construction when k consecutive iterations (or trees) have not
yielded accuracy improvements on Sv. The algorithm would then select
the MART (or the output of an intermediate step) that yielded the best
accuracy on Sv as the best model.
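A sketch of this validation-based selection appears below, assuming scikit-learn; rather than the k-consecutive-iterations stopping rule just described, it simply scores every intermediate ensemble on the validation set and keeps the best one, which serves the same purpose.

```python
# Selecting the number of boosted trees on a validation set, assuming
# scikit-learn. staged_predict yields predictions after each added
# tree, so we can pick the ensemble size with the lowest validation error.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X[:, 0]**2 + X[:, 1] + rng.normal(scale=0.1, size=1000)
X_t, X_v, y_t, y_v = train_test_split(X, y, test_size=0.3, random_state=0)

model = GradientBoostingRegressor(n_estimators=500, max_leaf_nodes=8).fit(X_t, y_t)
val_errors = [mean_squared_error(y_v, p) for p in model.staged_predict(X_v)]
best_k = int(np.argmin(val_errors)) + 1
print("trees selected on the validation set:", best_k)
```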
Note that validation is not free. The algorithm has to split the input
data set into two smaller components and use only one of them for the
purpose of training. This procedure leaves fewer samples to train a model,
resulting in a suboptimal model. However, the accuracy gains realized
from avoiding overfitting typically trump the reduction in the size of the
training data, particularly for large data sets. As a consequence, most ML
practitioners use cross-validation as part of their modeling toolkit.

Additional Techniques to Avoid Overfitting

In addition to the cross-validation method discussed above, additional techniques exist for reducing the effect of overfitting. We now discuss
some of those techniques.

Regularization

Because simple models tend to work better for out-of-sample forecasts, ML researchers have come up with ways to penalize models for excessive
complexity. This process is known in ML as “regularization” or “complex-
ity control,” and we will give examples when we discuss specific methods.
Although economists also tend to prefer simpler models (for the same
reason), they have not been as explicit about quantifying complexity costs.

Tuning regularization parameters using cross-validation


If we have an explicit numeric measure of model complexity, we can
view it as a parameter that can be tuned to produce the best out-of-
sample predictions. The standard way to tune a parameter is to use k-fold
cross-validation:

(1) Divide the data into k equal subsets (folds) and label them s = 1, . . ., k. Start with s = 1.
(2) Pick an initial value for the tuning parameter.
(3) Fit your model using the k − 1 subsets other than s.
(4) Predict the outcome variable for subset s and measure the associated loss.
(5) Stop if s = k; otherwise, increment s by 1 and go to step 2.
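A compact sketch of this loop, assuming scikit-learn and using the lasso penalty weight as the tuning parameter; the simulated data and candidate values are illustrative.

```python
# k-fold tuning of a regularization parameter (the lasso alpha),
# assuming scikit-learn. Each fold s is held out once; the model is
# fit on the other k-1 folds and its loss is measured on fold s.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=300)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in [0.01, 0.1, 1.0]:                    # candidate tuning values
    losses = []
    for train, test in kf.split(X):
        model = Lasso(alpha=alpha).fit(X[train], y[train])
        losses.append(mean_squared_error(y[test], model.predict(X[test])))
    print(f"alpha={alpha}: mean CV loss = {np.mean(losses):.3f}")
```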

After cross-validation, we end up with k values of the tuning parameter and the associated loss, which we can then examine to choose an appro-
priate value for the tuning parameter. Even if no tuning parameter exists,
using cross-validation to report goodness-of-fit measures is generally a
good idea, because it measures out-of-sample performance, which is gen-
erally more meaningful than in-sample performance (such as $R^2$). Using
the test–train cycle and cross-validation in ML is common, particularly
when large data sets are available. If the data are large enough that a
model can be estimated on a subset of the data, using separate training and
testing sets provides a more realistic measure of prediction performance.

Feature selection
Feature selection is a standard step in ML settings that involve supervised
learning (Guyon and Elisseeff, 2003). Feature selection typically provides
a faster and more computationally efficient model by eliminating less rele-
vant features with minimal loss in accuracy. It is thus particularly relevant
for training large data sets that are typical in various target application
settings. Feature selection also provides more comprehensible models that
offer a better understanding of the underlying data-generating process.
When the data sets are modest in size and the number of features is large,
feature selection can actually improve the predictive accuracy of the model
by eliminating irrelevant features whose inclusion often results in overfit-
ting. Many ML algorithms, including neural networks, decision trees,
CART, and naive Bayes learners, have been shown to have significantly
worse accuracy when trained on small data sets with superfluous features
(Duda and Hart, 1973; Aha et al., 1991; Breiman et al., 1984; Quinlan,
1993).
The goal of feature selection is to find the smallest set of features that
can provide a fixed predictive accuracy. In principle, this problem is
straightforward because it simply involves an exhaustive search of the
feature space. However, with even a moderately large number of features,
an exhaustive search is practically impossible. With F features, an exhaus-
tive search requires $2^F$ runs of the algorithm on the training data set,
which is exponentially increasing in F. In fact, this problem is known to be
NP-hard (Amaldi and Kann, 1998).
The wrapper method addresses this problem by using a greedy algo-
rithm (Kohavi and John 1997). Wrappers can be categorized into two
types: forward selection and backward elimination. In forward selection, features are progressively added until a desired prediction accuracy is reached or until the incremental improvement is very small. By contrast, a
backward-elimination wrapper starts with all the features and sequentially
eliminates the least valuable features. Both wrappers are greedy in the
sense that they do not revisit former decisions to include (in forward selec-
tion) or exclude features (in backward elimination). More importantly,
they are “black box” techniques in the sense that they can work with any
ML algorithm by invoking them without needing to understand their
internal structure.
To enable a wrapper algorithm, the researcher needs to specify a selec-
tion as well as a stopping rule. A commonly used and robust selection rule
is the best-first selection rule (Ginsberg, 1993), wherein the most promising
node is selected at every decision point. For example, in a forward-selec-
tion algorithm with 10 features, at the first node, this algorithm considers
10 versions of the model (each with one of the features added) and then
picks the feature whose addition offers the highest prediction accuracy.
The process continues until a stopping-rule condition is satisfied. A stop-
ping rule consists of a cut-off point for the incremental gain obtained at
each step of the algorithm, and when the incremental gain is less than this
cut-off point, the feature-selection process ends and emits the currently
selected set of features.
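A sketch of a forward-selection wrapper with a best-first selection rule and an incremental-gain stopping rule follows; it assumes scikit-learn, and the choice of logistic regression, the cross-validated accuracy metric, and the cut-off value are our own illustrative choices.

```python
# Greedy forward-selection wrapper, assuming scikit-learn. At each
# step, add the single feature whose inclusion most improves
# cross-validated accuracy; stop when the incremental gain falls
# below a cut-off. The wrapper treats the learner as a black box.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, cutoff=0.005):
    selected, best_score = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = {j: cross_val_score(LogisticRegression(max_iter=1000),
                                     X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        j_best = max(scores, key=scores.get)      # best-first selection rule
        if scores[j_best] - best_score < cutoff:  # stopping rule
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_score = scores[j_best]
    return selected, best_score
```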
Wrappers offer many advantages. First, they are agnostic to the
underlying learning algorithm and the accuracy metric used for evaluating
the predictor. Second, greedy wrappers have been shown to be robust
to overfitting and computationally advantageous (Reunanen, 2003); the
resulting model requires fewer features to be computed during testing,
and the testing-classification process itself is faster because the model is
compact.

Conclusion

ML methods are gaining traction in both marketing practice and academic research. They provide a set of valuable tools to help us increase
the out-of-sample performance of marketing models and thereby improve
the quality of marketing decisions. In this chapter, we presented a brief
overview of the two most commonly used ML methods, decision trees and
SVM, as well as a discussion of their applications in marketing. With the
advent of large data sets, focus on real-time performance, and the avail-
ability of cheap and fast computing (e.g., Amazon EC2), we hope market-
ers can use ML techniques to answer a new set of exciting and challenging
substantive questions going forward.


Notes

1. The authors thank Bryan Bollinger, Shahryar Doosti, Theodoros Evgeniou, John
Hauser, Panos Ipeirotis, Lan Luo, Eugene Pavlov, Omid Rafieian, and Amin ZadKazemi
for their comments.
2. For a detailed discussion of the roles of causal, predictive, and descriptive research in
social sciences, please see Shmueli (2010).
3. In a comparison of logistic regression and decision trees, Perlich et al. (2003) examined
several data sets. Taking different sized subsamples of the data, they estimated both
models using learning curves, that is, how the model’s predictive accuracy improves as
the sample size increases. They found that logistic regressions work better for smaller
data sets, and trees work better for larger data sets. Interestingly, they found this pattern
holds even for training sets from the same domain.

References

Aha, W., D. Kibler, and M. K. Albert. Instance-based Learning Algorithms. Machine Learning, 6(1): 37–66, 1991.
Amaldi, E. and V. Kann. On the Approximability of Minimizing Nonzero Variables or
Unsatisfied Relations in Linear Systems. Theoretical Computer Science, 209(1): 237–260,
1998.
Breiman, L., J. Friedman, C. Stone, and R. Olshen. Classification and Regression Trees. The
Wadsworth and Brooks-Cole statistics-probability series. Taylor & Francis, 1984.
Caruana, R. and A. Niculescu-Mizil. An Empirical Comparison of Supervised Learning
Algorithms. In Proceedings of the 23rd International Conference on Machine Learning,
ACM, 2006, 161–168.
Cui, D. and D. Curry. Prediction in Marketing using the Support Vector Machines.
Marketing Science, 24(4): 595–615, 2005.
Duda, R. O. and P. E. Hart. Pattern Recognition and Scene Analysis, New York: Wiley, 1973.
Evgeniou, T., C. Boussios, and G. Zacharia. Generalized Robust Conjoint Estimation.
Marketing Science, 24(3): 415–429, 2005.
Evgeniou, T., M. Pontil, and O. Toubia. A Convex Optimization Approach to Modeling
Consumer Heterogeneity in Conjoint Estimation. Marketing Science, 26(6): 805–818,
2007.
Ginsberg, M. Essentials of Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, 1993.
Guyon, I. and A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of
Machine Learning Research, 3:1157–1182, 2003.
Hastie, T., R. Tibshirani, J. Friedman, T. Hastie, J. Friedman, and R. Tibshirani. The
Elements of Statistical Learning, Vol. 2. New York: Springer, 2009.
Hauser, J. R., O. Toubia, T. Evgeniou, R. Befurt, and D. Dzyabura. Disjunctions
of Conjunctions, Cognitive Simplicity, and Consideration Sets. Journal of Marketing
Research, 47(3): 485–496, 2010.
Huang, D. and L. Luo. Consumer Preference Elicitation of Complex Products using Fuzzy
Support Vector Machine Active Learning. Marketing Science, 35(3): 445–464, 2015.
Hyafil, L. and R. L. Rivest. Constructing Optimal Binary Decision Trees is NP-complete.
Information Processing Letters, 5(1): 15–17, 1976.
Kohavi, R. and G. H. John. Wrappers for Feature Subset Selection. Artificial Intelligence,
97(1): 273–324, 1997.
Lemmens, A. and C. Croux. Bagging and Boosting Classification Trees to Predict Churn.
Journal of Marketing Research. 43(2): 276–286, 2006.
Liu, L. and D. Dzyabura. Capturing Multi-taste Preferences: A Machine Learning Approach.
Working Paper, 2016.

Murphy, K. P. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press, 2012.
Perlich, C., F. Provost, and J. S. Simonoff. Tree Induction vs. Logistic Regression: A
Learning-curve Analysis. Journal of Machine Learning Research, 4:211–255, 2003.
Quinlan, J. R. C4.5: Programs for Machine Learning, Volume 1. San Mateo, CA: Morgan Kaufmann, 1993.
Rafieian, O. and H. Yoganarasimhan. Targeting and Privacy in Mobile Advertising.
Working Paper, 2017.
Reunanen, J. Overfitting in Making Comparisons between Variable Selection Methods.
Journal of Machine Learning Research, 3:1371–1382, 2003.
Schapire, R. E. The Strength of Weak Learnability. Machine Learning, 5(2): 197–227, 1990.
Shmueli, G. To Explain or to Predict? Statistical Science, 25(3): 289–310, 2010.
Tikhonov, A. N. and V. Y. Arsenin. Solutions of Ill-posed Problems. Washington, DC:
Winston, 1977.
Vapnik, V. N. and A. Y. Chervonenkis. On the Uniform Convergence of Relative
Frequencies of Events to their Probabilities. Theory of Probability & Its Applications,
16(2): 264–280, 1971.
Yoganarasimhan, H. Search Personalization using Machine Learning. Working Paper, 2017.



12.  Big data analytics
Asim Ansari and Yang Li

The field of “Big Data” is vast, and rapidly evolving. It is fueled by the
explosion of opportunities and technologies to collect, store, and analyze
vast amounts of data on consumers, firms, and other entities. Many fields
of enquiry are germane to the analysis of big data, including statistical
inference, optimization, machine learning, networking, and visualization.
Given the vastness of the intellectual terrain that is involved, and the
variety of perspectives that one can use to harness big data, only a limited
understanding can be gained via a particular lens. While the analysis of
big data requires handling challenges that are associated with a number of
areas such as data storage, data processing, and rapid access to data, an
understanding of the emerging advances and technologies in these areas is
critical for unleashing the promise of big data. In this chapter, we restrict
attention to challenges that are associated with making statistical infer-
ences from big data. In doing so, it is useful to characterize big data by the
four Vs: volume, velocity, variety, and veracity.

VOLUME

Volume refers to the fact that a big data set contains a large quantity of
data. In a typical rectangular dataset, volume can be expressed in terms of
the total number of observations N. When N is very large, we have what
is called a tall dataset. In panel data settings, each individual has multiple
observations. Such data are typically analyzed via hierarchical models in
which the number of model parameters grows with the number of individ-
uals. Both the number of individuals and the total number of observations
characterize volume in these settings.
Many marketing contexts generate tall datasets. Retailers routinely
collect data on the purchases of millions of customers and aim to generate
insights and predictions, both at the individual and the population levels.
Consumers are immersed in a highly connected world and their activities
and interactions leave traces that can be of value to marketers. Click-
stream data generated from online interactions is one example of big data.
The internet is also a ready source of data on sequences of advertising
exposures for consumers and their consequent responses. Similarly, user-generated content, in the form of reviews and opinions of consumers, and data that social networks spawn on a continuous basis are increasingly
being used by firms for targeting purposes. A number of computational
challenges are associated with the analysis of such data. Traditional
iterative estimation methods that require many passes through the entire
dataset do not scale well, and therefore approximate algorithms that
reduce computational time and tradeoff some bias against efficiency gains
are needed.

VELOCITY

The modern world generates data at high velocity. For example, in retail
contexts, each browsing session, or each purchase occasion generates
new information about a consumer’s preferences. Companies such as
Netflix and Amazon have access to the viewing and purchasing habits of
their users as these dynamically evolve over time. Firms need to integrate
such new information with the existing profile of the customer in a timely
manner to account for shifts in preferences and tastes. More often than
not, recent shifts in preferences could be the best predictors of future
behavior and therefore timely integration of new information is of con-
siderable importance to firms. As a result, the arrival of new information
requires quickly updating customer-specific parameters in the statisti-
cal models that analyze the big data. Moreover, the aggregation of the
new observations across all customers also shifts information about the
entire customer base, and thus population level parameters also need to
be updated to reflect the changing preference structure. Such streaming
datasets can be considered infinite, as the number of observations grow
with time. Analysts therefore need online methods of inference to handle
these streaming contexts.

VARIETY

Marketing data nowadays is also available in many different forms. Much of user-generated content does not arrive in the neat rectangular arrays of
numbers that we were used to in traditional modeling contexts. Customers
write reviews about products and firms, contribute to blogs, use Twitter
to comment on unfolding events and situations, and use social networking
platforms to interact with each other and with firms. Data are therefore
available in the form of text, images, video, and relations among indi-
viduals, and marketers need to be able to decipher the information that is contained in the data. While data are available in many different modali-
ties, in the end, such data gets converted to numbers, and then analyzed,
either using traditional methods or via newer approaches. Relational data
garnered from social networks also poses its own challenges, in terms
of sampling, clustering, and data analysis. These data require complex
models as the dependency structure needs to be properly modeled, and in
many cases, modeling of heterogeneity becomes very important.
Marketing academics have begun to leverage data of variegated forms.
Methods of text mining and natural language processing, and approaches
based on topic modeling are gaining currency in marketing. Similarly,
information contained in images can be parsed and analyzed using image
processing methodologies and deep learning technologies that apply
hierarchical models composed of multiple layers to capture different levels
of representation.
Another aspect of big data variety is reflected in high dimensional
datasets. Such datasets are characterized by the number of variables
(dimensions), p being very large, and in some case much larger than N,
and are termed wide datasets. In these instances, dimension reduction
and regularization (i.e., the ability to tradeoff model complexity and fit)
become crucial to avoid overfitting and to adequately summarize the
information within the wide data.

VERACITY

Finally, veracity refers to the inferential challenges that stem from the
way big data is sometimes pieced together from disparate sources. For
instance, firms can bring together data on consumer reviews, user gener-
ated content such as product tags, as well as traditional numerical indica-
tors of preferences in the form of ratings. Moreover, data germane to a
particular marketing context could be available across different levels of
aggregation. Such data that differs in modality and source of origin needs
to be fused appropriately to unearth meaningful insights, and uncertainty
about the quality of the data needs to be reflected in the analysis.
We now briefly describe the computational challenges that arise in esti-
mating complex models in big data settings and highlight a few strategies
that are being actively pursued to handle these challenges.


Computational Challenges in Big Data

Large volume implies difficulties in managing the data and making it suitable for analysis. Moreover, statistical analysis needs to confront
head-on the challenges associated with developing and using computa-
tional methods that scale well. Conventional methods that are suitable in
small data contexts are often not scalable in big data settings.
Marketers are often interested in leveraging the heterogeneity in con-
sumer preferences for targeting and personalization. An understanding
of the distribution of consumer responses to product attributes can guide
product design decisions – an insight that would be lost if the preference
is examined only at the mean (Allenby and Rossi 1999). Also, modeling
individual-level responses to marketing actions allows firms to adjust
allocation of resources across regions, stores, and consumers (Rossi et al.
1996). In most marketing data settings, hierarchical Bayesian models are
used to recover consumer heterogeneity, as these models appropriately
pool information across consumers to generate both individual-level and
population-level inference.
Simulation-based Markov Chain Monte Carlo (MCMC) has been
the method of choice for estimating hierarchical Bayesian models in
marketing. MCMC methods are iterative algorithms that yield samples of
parameter draws from the posterior distribution and parameter inference
is based on these samples. A variety of MCMC methods have been used
in the literature to handle different types of hierarchical Bayesian models.
Gibbs sampling is suitable for conjugate models (posteriors of these
models come from the same distributional family as the priors; Gelfand
and Smith 1990; Gelfand et al. 1990), whereas Metropolis–Hasting
methods (Metropolis et al. 1953; Chib and Greenberg 1995) and their
extensions such as Hamiltonian Monte Carlo and Langevin Methods are
useful for non-conjugate models. However, these MCMC methods are not
suitable for big data settings as they require a very large number of itera-
tions for convergence. Big data analysis thus needs approximate methods
that scale linearly with the number of observations.
A number of different estimation strategies are being actively investi-
gated to handle the computational challenges spawned by large volume.
Ansari, Li, and Zhang (2017) outline these strategies and describe their
utility in various settings. These strategies include: (1) the use of approxi-
mate models that require less computation but provide a high-fidelity
representation of the model likelihood, via polynomial approximations
and Gaussian processes (Rasmussen and Williams 2006), (2) the use
of optimization-based methods instead of simulation-based estimation,
as optimization requires much fewer iterations to obtain estimates, (3)

MIZIK_9781784716745_t.indd 283 14/02/2018 16:38


284   Handbook of marketing analytics

the use of subsampling of data so that each iteration of the estimation


algorithm is based on a random subset of the entire dataset, and (4)
“divide-and-conquer” approaches in which the dataset is partitioned into
distinct subsets, the subsets are analyzed in parallel and the results are then
appropriately combined.

Approximate MCMC Algorithms

Many different approaches that rely on subsampling and the "divide-and-conquer" strategy are being actively investigated in the context of MCMC
estimation to reduce its computational burden. Korattikara et al. (2014)
and Bardenet et al. (2014) show how to reduce the estimation time of the
Metropolis–Hastings (MH) algorithm using subsampling. In a typical
MH algorithm, a proposed parameter draw in an iteration is accepted or
rejected using a MH test that is based on the entire dataset. These studies
show how approximate MH steps that rely on subsets of the data can
be used instead. Similarly, Maclaurin and Adams (2014) develop Firefly
Monte Carlo sampling that operates on random subsets of the data in each
iteration of the MCMC algorithm.
A number of stochastic gradient methods have been recently proposed
to reduce the computational cost that arises from the need to use gradients
in Langevin and Hamiltonian Monte Carlo methods. Welling and Teh
(2011) develop a Stochastic Gradient Langevin Dynamics procedure that
uses noisy gradients to generate candidate parameter draws. Chen et al.
(2014) use a similar idea in the context of Hamiltonian Monte Carlo.
A number of approaches for distributed MCMC based on the “divide-
and-conquer” principle have also been recently proposed. These methods
divide the entire dataset into a number of disjoint subsets. Similarly,
the posterior is also factored into sub-posteriors. MCMC inference
is performed on each subset separately and in parallel, without any
communication between these computational processes, resulting in an
embarrassingly parallel approach (i.e., without the need for communica-
tion among sub-processes). Samples from the sub-posteriors are then
appropriately combined. Scott et al. (2016) show how consensus Monte
Carlo can be used to obtain a weighted average of the separate posterior
samples. Neiswanger et al. (2014) use density estimation, where a density
estimator is fit to each sub-posterior based on the MCMC output. The full
posterior density is then obtained as the product of these sub-posterior
estimators, which allows sampling from this approximation. In a similar
vein, Wang and Dunson (2013) propose the Weierstrass sampler, which
uses Weierstrass transforms on sub-posterior densities.


Optimization-based Approaches

Parallel to the above-described developments for MCMC methods, stochastic approximation is gaining ascendance in the context of optimization. These stochastic approximation approaches rely on subsets of the
data to cut down the computation involved in optimization. Prominent
among these approaches are stochastic gradient descent as well as stochas-
tic variational inference. Similarly, methods that are capable of leveraging
distributed computation in an embarrassingly parallel fashion are being
actively developed in the machine learning communities to analyze data-
sets that do not fit in memory or are distributed across many different
locations. Ansari and Li (2017) describe these approaches and illustrate
their use in the context of marketing models.
In this chapter, we will focus on optimization approaches for computing
the posterior. In particular, we will explore stochastic gradient approaches and stochastic variational Bayesian optimization methods for handling tall datasets and consider their potential use for marketing models and problems. We will also briefly discuss how dimension-reduction methods such as the Lasso and the Elastic-Net can be useful in dealing with wide data.

Stochastic Approximation

The computational complexity associated with traditional estimation methods limits their applicability to large data sets. Algorithms that scale
linearly with the number of observations are more useful in big data set-
tings. Methods based on stochastic approximation, such as stochastic
gradient descent (SGD) and stochastic variational inference (SVI), there-
fore, are becoming increasingly popular. In this section, we will describe
how and why stochastic gradient descent works. We focus on its utility in
optimization of likelihoods in frequentist settings as well as in maximum-
a-posteriori estimation within the Bayesian context.
Let us consider data $D$ containing iid samples, $D = \{x_i, y_i\}$, $i = 1, \ldots, N$, where the outcome $y_i \in \mathbb{R}$ is distributed conditional on a vector $x_i \in \mathbb{R}^p$ according to a density $f(y_i; x_i, \theta)$. In order to understand the data-generating process, and also to make predictions, we either minimize a loss function (such as least squares for regression) or, more generally, maximize a log-likelihood, which is given by $l(\theta; D) = \sum_{i=1}^{N} \log f(y_i; x_i, \theta)$, so as to obtain an MLE estimate of the true parameter $\theta^*$. In Bayesian contexts, an additional term that reflects the prior is also part of the optimization objective function, yielding $p(\theta; D) = \sum_{i=1}^{N} \log f(y_i; x_i, \theta) + \log p(\theta)$.
If the dataset is small, we can use conventional methods such as EM, Fisher scoring, or gradient descent to obtain the model parameters. In gradient descent for MLE, for example, each iteration updates the parameter estimates $\theta$ using the gradient of the log-likelihood:

$$\theta_{t+1} = \theta_t + \frac{\gamma}{N} \sum_{i=1}^{N} \nabla_\theta \log f(y_i; x_i, \theta_t), \tag{12.1}$$

where $t$ denotes the iteration, $\nabla$ is the gradient operator, and $\gamma$ is a scalar called the learning rate, which needs to be chosen properly. The above optimization can be improved by using a positive-definite matrix $C_t$ instead of a scalar learning rate, i.e.,

$$\theta_{t+1} = \theta_t + \frac{\gamma}{N}\, C_t \sum_{i=1}^{N} \nabla_\theta \log f(y_i; x_i, \theta_t). \tag{12.2}$$
This second-order gradient descent is a variant of the Newton method for
optimization.
When the data are massive (i.e., large N), or we have a streaming dataset
for which N is not known (i.e., infinite N), gradient descent and other tra-
ditional optimization approaches do not scale well or are not applicable,
due to the following two reasons. First, these methods typically require an
evaluation of the objective function using the entire dataset. For example,
the gradient descent method requires the computation of the gradient
using all the observations in the sample. In other words, across iterations,
multiple passes over all the observations are needed. This makes such
methods computationally prohibitive. Second, methods such as Fisher scoring and the Newton algorithm require an inversion of $p \times p$ matrices in each iteration, which significantly adds to the computational complexity when the data are high dimensional, i.e., when $p$ is large.
Given the above reasons, stochastic approximation methods based on
noisy estimates of the gradient become useful in reducing the computa-
tional burden.

Stochastic Gradient Descent

The SGD algorithm simplifies the parameter update in equation (12.1) by using a stochastic approximation of the gradient that is based on a single observation:

$$\theta_t = \theta_{t-1} + \gamma_t \nabla_\theta \log f(y_i; x_i, \theta_{t-1}). \tag{12.3}$$

The above update eventually goes through every observation, and therefore the SGD method is also suitable for streaming contexts. The learning rate sequence $\gamma_t > 0$ must satisfy $\sum_t \gamma_t = \infty$ and $\sum_t \gamma_t^2 < \infty$ for the procedure to converge. It is typical to specify $\gamma_t = \gamma_1 / t^{\tau}$, where $\tau \in [1/2, 1]$. We can see that SGD replaces the inversion of a $p \times p$ matrix with a scalar sequence $\gamma_t$. Also, instead of computing the gradient over the entire dataset, SGD computes the gradient on a single observation. In practice, multiple passes, with the observations randomly shuffled, can be made until convergence is achieved. However, it has been shown across multiple contexts that even a single pass over the data can result in a fairly good estimate of $\theta$.
We now illustrate the SGD method for a very simple regression context. Let $y_i = x_i' \theta + \epsilon_i$, $\epsilon_i \sim N(0, \sigma^2)$, for $i = 1, \ldots, N$, a typical regression setup for a dataset of $N$ observations. For simplicity, assume that the error variance $\sigma^2$ is known. The log-likelihood for a single observation $i$ is given (up to an additive constant) by

$$\log f(y_i; x_i, \theta) = -\frac{1}{2\sigma^2}(y_i - x_i' \theta)^2,$$

and the gradient of the log-likelihood at observation $i$ is given by

$$\nabla_\theta \log f(y_i; x_i, \theta) = \frac{1}{\sigma^2}(y_i - x_i' \theta)\, x_i.$$

Then, assuming $\gamma_t = \gamma_1 / t$, where $\gamma_1 > 0$, and starting the algorithm with a value $\theta_0$, the SGD algorithm updates $\theta$ as follows:

$$\theta_t = \theta_{t-1} + \gamma_t (y_i - x_i' \theta_{t-1})\, x_i = (I - \gamma_t x_i x_i')\, \theta_{t-1} + \gamma_t y_i x_i. \tag{12.4}$$

The SGD method can suffer from numerical instability problems if the learning rate $\gamma_t$ is set too high, and the algorithm can diverge instead of converging to the true parameter value. Setting the learning parameter too low, however, can result in slow convergence. Moreover, as this is an approximate algorithm, there is an efficiency loss compared to more traditional optimization methods. The loss in efficiency can be handled by averaging the parameter estimates across the iterations. Toulis and Airoldi (2015) and Toulis and Airoldi (2016) show that the instability issues can be tackled using an implicit stochastic gradient descent method. The parameter update in the implicit method differs from the above SGD update as follows:

$$\theta_t^{im} = \theta_{t-1}^{im} + \gamma_t \nabla_\theta \log f\big(y_i; x_i, \theta_t^{im}\big), \tag{12.5}$$

$$\bar{\theta}_t = \frac{1}{t} \sum_{k=1}^{t} \theta_k^{im}. \tag{12.6}$$

MIZIK_9781784716745_t.indd 287 14/02/2018 16:38


288   Handbook of marketing analytics

In the above, the first equation represents the implicit update. This is
an implicit update because imt occurs on both sides of the equation. The
second equation represents the parameter averaging. Upon completion,
the averaged parameter provides an estimate of the true parameter.
While the above shows gradients based on a single observation, oftentimes
the gradients are based on random subsets of observations. This again
improves the stability of the algorithm.
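For the Gaussian regression example above, the implicit update (12.5) can be solved in closed form: substituting the linear-model gradient and solving for the residual at $\theta_t$ gives $\theta_t = \theta_{t-1} + \frac{\gamma_t}{1 + \gamma_t x_i^{\prime}x_i}(y_i - x_i^{\prime}\theta_{t-1})\,x_i$. The short Python sketch below pairs this with the averaging step (12.6); the one-line algebra and all simulation settings are ours, offered as an illustration rather than taken from Toulis and Airoldi.

import numpy as np

rng = np.random.default_rng(1)
N, p = 100_000, 5
theta_true = np.array([1.0, -0.5, 0.25, 2.0, -1.0])
X = rng.normal(size=(N, p))
y = X @ theta_true + rng.normal(size=N)

theta = np.zeros(p)
theta_bar = np.zeros(p)               # averaged iterates, equation (12.6)
gamma1 = 0.5
for t in range(1, N + 1):
    x_i, y_i = X[t - 1], y[t - 1]
    g = gamma1 / t
    # Implicit update (12.5) in closed form for the Gaussian model; the
    # shrinkage factor 1 / (1 + g * x'x) <= 1 is what keeps it stable
    theta = theta + g / (1.0 + g * (x_i @ x_i)) * (y_i - x_i @ theta) * x_i
    theta_bar += (theta - theta_bar) / t   # running mean of the iterates

print(np.round(theta_bar, 2))              # close to theta_true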

Variational Bayes

The stochastic approximation method described above works directly on
the model likelihood or on the posterior distribution to yield parameter
estimates. However, in many complex models, the posterior distribution
or the model likelihood is not available in closed-form because of the
presence of latent variables. In such settings, a variational approxima-
tion to the posterior can be used for fast estimation of model parameters.
Variational Bayesian (VB) methods for Bayesian models tackle the scal-
ability challenge via a deterministic optimization approach that approxi-
mates the posterior distribution and yields accurate parameter estimates
at a fraction of the computational cost associated with simulation-based
MCMC methods. VB methods are particularly suitable for estimating
complex models in which the number of parameters increases with data
size, a setting in which MCMC suffers because it must sample a very large
number of parameters.
Below we elaborate and extend recent developments in variational
Bayesian inference and highlight how two VB estimation approaches  –
Mean-field VB for conjugate models and Fixed-form VB for non-
conjugate models – can be effectively combined for estimating complex
hierarchical marketing models.

Mean-field Variational Bayes (MFVB)

The essence of Bayesian inference is to summarize the posterior
distribution of the unknown parameters, $p(\theta | y)$. For almost all practical problems,
closed-form solutions are not available, necessitating approximation
methods such as MCMC and VB to summarize the posterior distribution.
MCMC uses simulation to sample from the probability distributions of
a Markov chain that has the desired posterior as its equilibrium distribution.
In contrast, variational inference seeks to deterministically approximate
the posterior with a simpler distribution, $q(\theta)$, called the variational
distribution (Bishop 2006; Ormerod and Wand 2010). The variational
distribution represents a family of distributions of a certain functional form.
The goal is to find a member of the family that is closest to the posterior of
interest. In short, VB recasts Bayesian model inference as an optimization
problem, therefore making it possible to obtain advantages in speed and
scalability.
The objective function in VB optimization is the dissimilarity (or
distance) between the candidate variational distribution $q(\theta)$ and the
posterior of interest $p(\theta | y)$. In probability theory, a measure of dissimilarity
between distributions is the Kullback-Leibler (KL) divergence, defined
as follows,

$KL[\,q(\theta)\,\|\,p(\theta | y)\,] = E_q[\log q(\theta)] - E_q[\log p(\theta | y)] \geq 0,$   (12.7)

where the expectation $E_q[\cdot]$ is with respect to the variational distribution
$q(\theta)$, and the equality holds if and only if $q(\theta) = p(\theta | y)$ almost everywhere
(Kullback and Leibler 1951).
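As a small numerical check on (12.7), the Python snippet below evaluates the KL divergence between two arbitrary univariate normals twice: once from the well-known closed form for Gaussians, and once as the Monte Carlo average of $\log q(\theta) - \log p(\theta)$ under draws from q. Both distributions and the draw count are illustrative choices.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
q = norm(loc=0.0, scale=1.0)       # variational distribution q
p = norm(loc=1.0, scale=2.0)       # stand-in for the posterior p

# Closed form for KL between two univariate normals
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0
kl_exact = np.log(s2 / s1) + (s1**2 + (m1 - m2)**2) / (2 * s2**2) - 0.5

# Equation (12.7): E_q[log q] - E_q[log p], approximated by Monte Carlo
draws = q.rvs(size=1_000_000, random_state=rng)
kl_mc = np.mean(q.logpdf(draws) - p.logpdf(draws))
print(kl_exact, kl_mc)             # the two agree up to Monte Carlo error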
Remember that our goal is to find a proper approximating variational $q(\theta)$
that makes the KL divergence as close to zero as possible. But the posterior $p(\theta | y)$
in the KL is unknown to begin with, so we need to impose certain
restrictions on $q(\theta)$ for the inference to proceed. Such restrictions serve to
structure the approximating distribution so that its functional form can be
either inferred or set.
The mean-field approximation represents the most useful restriction for
conjugate or semi-conjugate marketing models (Ormerod and Wand 2010;
Grimmer 2010). Specifically, the variational distribution $q(\theta)$ is assumed
to have a factorized product form $\prod_{i=1}^{D} q_i(\theta_i)$, over some partition
$\{\theta_1, \ldots, \theta_D\}$ of $\theta$. By setting $\partial KL(q)/\partial q = 0$, it is easy to show the factors in the
product satisfy (Murphy 2012),

$q_i^{*}(\theta_i) \propto \exp\{\,E_{\theta_{-i}}[\log p(\theta_i | y, \theta_{-i})]\,\},$   (12.8)

where the expectation $E_{\theta_{-i}}[\cdot]$ is over the variational distributions of the
remaining parameters except for $\theta_i$, and $p(\theta_i | y, \theta_{-i})$ is the posterior full
conditional distribution. When we use conjugate priors for $\{\theta_i\}_{i=1}^{D}$, the
posterior full conditional distributions have closed form, and (12.8)
then leads to a closed-form solution for the optimal density $q_i^{*}(\theta_i)$. Also,
the variational distribution and the full conditional belong to the same
distributional family. Because of the use of conjugate or semi-conjugate priors,
MFVB is usually considered the deterministic counterpart to Gibbs sampling in
MCMC.
Rewrite $q_i(\theta_i) = q_i(\theta_i | \eta_i)$, where $\eta_i$ is the parameter of the i-th
variational distribution. Then finding the optimal variational density only

requires optimization to obtain the variational parameters $\{\eta_i\}_{i=1}^{D}$. As we
will see in the following sections, this can be done using simple coordinate
ascent optimization in which different variational parameters are updated
sequentially in an iterative and deterministic fashion, until convergence is
achieved. Within MFVB, the KL divergence is also a closed-form function
of the variational parameters; therefore we can directly assess the
convergence of MFVB by monitoring the change in the magnitude of the KL
in (12.7).
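As a minimal illustration of the coordinate-ascent updates implied by (12.8), consider the textbook conjugate model $y_i \sim N(\mu, \tau^{-1})$ with priors $\mu \sim N(\mu_0, (\kappa_0 \tau)^{-1})$ and $\tau \sim \mathrm{Gamma}(a_0, b_0)$, and the factorization $q(\mu)q(\tau)$. The closed-form updates in the Python sketch below follow the standard treatment of this model (e.g., Bishop 2006, sec. 10.1); the data and all prior settings are illustrative.

import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(loc=2.0, scale=0.5, size=500)   # data from N(2, 0.5^2)
N, ybar = y.size, y.mean()

# Illustrative priors: mu ~ N(mu0, (k0*tau)^-1), tau ~ Gamma(a0, b0)
mu0, k0, a0, b0 = 0.0, 1.0, 1.0, 1.0

# Coordinate ascent on q(mu) = N(muN, 1/lamN) and q(tau) = Gamma(aN, bN)
E_tau = a0 / b0
muN = (k0 * mu0 + N * ybar) / (k0 + N)         # fixed across iterations
aN = a0 + (N + 1) / 2.0
for _ in range(50):                             # iterate the (12.8) updates
    lamN = (k0 + N) * E_tau
    bN = b0 + 0.5 * (np.sum((y - muN) ** 2) + N / lamN
                     + k0 * ((muN - mu0) ** 2 + 1.0 / lamN))
    E_tau = aN / bN
print(muN, np.sqrt(bN / aN))   # approx. posterior mean of mu and of sigma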

Fixed-form Variational Bayes (FFVB)

In MFVB, the optimal functional form as well as the optimal variational
parameters are inferred given only the likelihood and the conjugate priors.
Thus the mean-field assumption is nonparametric in spirit. When conju-
gacy is not available, however, one has to make parametric assumptions
that fix the functional form of the variational distributions (Honkela
et al. 2010; Wang and Blei 2013; Knowles and Minka 2011; Salimans
and Knowles 2013). This variational approximation for non-conjugate
models is often called fixed-form VB, and is analogous to the Metropolis–
Hastings algorithm within the umbrella of MCMC in its applicability to a
wide variety of non-conjugate models.
A number of different approaches have been used to implement fixed-
form variational Bayes. Wang and Blei (2013) suggest Laplace variational
inference, which is based on the Laplace approximation of the posterior.
Knowles and Minka (2011) use non-conjugate variational message pass-
ing with the delta method (see also Bickel and Doksum 2007; Braun and
McAuliffe 2010). Salimans and Knowles (2013) propose stochastic linear
regression, which we adopt in the current chapter for fixed-form VB,
thanks to its generality and accuracy for marketing applications.
In many empirical marketing settings, we can approximate the data-
generation process using the exponential family of distributions such as
normal, exponential and Poisson (Wedel and Kamakura 2001). When
fixed to this distributional family, the variational density can be written as

$q(\theta | \eta) = \nu(\theta)\, \exp(S(\theta)^{\prime}\eta - Z(\eta)),$   (12.9)

where $\eta$ is the vector of natural parameters, $S(\theta)$ represents the sufficient
statistics of $\theta$, $Z(\eta)$ ensures normalization, and $\nu(\theta)$ is the base measure.
As the goal of FFVB is to find the $\eta$ that minimizes the KL divergence in
(12.7), Salimans and Knowles (2013) show in the stochastic linear regression
framework that a fixed-point solution exists to the optimization
problem, namely,


$\eta = \mathrm{Cov}_q[S(\theta)]^{-1}\, \mathrm{Cov}_q[S(\theta), \log p(y, \theta)],$   (12.10)

where $\mathrm{Cov}_q$ denotes the covariance with respect to the variational
distribution. Instead of approximating $\mathrm{Cov}_q[S(\theta)]^{-1}$ and $\mathrm{Cov}_q[S(\theta), \log p(y,\theta)]$
directly, one can iteratively evaluate these terms using weighted Monte Carlo
with random samples of $\hat{\theta}$ generated from the latest variational
approximation $q(\theta | \eta)$. In particular, when a multivariate normal is used to approximate
the posterior, i.e., $q(\theta | \eta) = N(\mu_{q(\theta)}, \Sigma_{q(\theta)})$, where $\eta = \{\mu_{q(\theta)}, \Sigma_{q(\theta)}\}$, Minka
(2001) and Opper and Archambeau (2009) show that (12.10) implies

$\Sigma_{q(\theta)}^{-1} = -E_q\!\left[\frac{\partial^2 \log p(y,\theta)}{\partial \theta^2}\right] \quad \text{and} \quad \mu_{q(\theta)} = E_q[\theta] + \Sigma_{q(\theta)}\, E_q\!\left[\frac{\partial \log p(y,\theta)}{\partial \theta}\right],$   (12.11)

where $\partial/\partial\theta$ and $\partial^2/\partial\theta^2$ denote the gradient vector and Hessian matrix of
$\log p(y,\theta)$, respectively. As in the general case, one can use weighted Monte
Carlo to stochastically approximate the quantities $H = -E_q[\partial^2 \log p(y,\theta)/\partial\theta^2]$,
$g = E_q[\partial \log p(y,\theta)/\partial\theta]$, and $m = E_q[\theta]$. Due to non-conjugacy, an analytical
expression for the KL divergence is unavailable for FFVB; we therefore
assess convergence based on the relative change in the estimates of the
variational parameters.
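To show how the fixed-point conditions in (12.11) translate into the weighted Monte Carlo recursion just described, the Python sketch below fits a Gaussian variational approximation to a Bayesian logistic regression posterior: draw from the current q, evaluate the gradient and Hessian of $\log p(y, \theta)$, smooth them, and reassemble $\Sigma$ and $\mu$. This mirrors the inner FFVB loop used later in Algorithm 12.1, but the model, the prior scale, and the smoothing weight w = 0.05 are our own illustrative choices.

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(4)
N, p = 2_000, 3
theta_true = np.array([1.0, -1.0, 0.5])
X = rng.normal(size=(N, p))
y = rng.binomial(1, expit(X @ theta_true))

def grad_hess(theta, prior_var=100.0):
    # gradient and Hessian of log p(y, theta): logit likelihood + N(0, c*I) prior
    prob = expit(X @ theta)
    g = X.T @ (y - prob) - theta / prior_var
    H = -(X.T * (prob * (1 - prob))) @ X - np.eye(p) / prior_var
    return g, H

mu, Sigma = np.zeros(p), np.eye(p)
Hbar, gbar, mbar = np.eye(p), np.zeros(p), np.zeros(p)
w = 0.05                                   # smoothing weight
for _ in range(2_000):
    theta_hat = rng.multivariate_normal(mu, Sigma)
    g, H = grad_hess(theta_hat)
    gbar = (1 - w) * gbar + w * g          # estimate of E_q[gradient]
    Hbar = (1 - w) * Hbar - w * H          # estimate of -E_q[Hessian]
    mbar = (1 - w) * mbar + w * theta_hat  # estimate of E_q[theta]
    Sigma = np.linalg.inv(Hbar)            # reassemble per equation (12.11)
    mu = mbar + Sigma @ gbar
print(np.round(mu, 2))                     # should settle near theta_true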
Next, we discuss two simulation studies that implement MFVB, FFVB,
and the combination of these two to handle hierarchical marketing models.

A Cross-nested Mixed Linear Model

Marketing research environments are replete with panel data which
require careful modeling of multiple sources of unobserved heterogeneity
(Allenby and Rossi 1999). In many settings, data are available
on multiple consumers and on many different products. For instance,
data from recommender systems include ratings from different users on
many different items. A proper accounting of the variation in such data
sets requires the use of random effects for products as well as for cus-
tomers (Ansari, Essegaier, and Kohli 2000), resulting in a cross-nested
structure.
The specification below gives a linear model with cross-nested random
coefficients (Rasbash and Browne 2008):

$y_{ij} = x_{ij}^{\prime}\beta + z_j^{\prime}\lambda_i + w_i^{\prime}\gamma_j + e_{ij},$
$e_{ij} \sim N(0, \sigma^2), \quad \lambda_i \sim N(0, \Lambda), \quad \gamma_j \sim N(0, \Gamma),$   (12.12)


where $y_{ij}$ indicates the response of person i, $i = 1, \ldots, I$, to item j, and the
vectors $\lambda_i$ and $\gamma_j$ represent individual and product heterogeneity, respectively.
The covariate vector $x_{ij}$ characterizes the individual and the item, $z_j$ consists
of item-specific variables, and $w_i$ contains individual-specific covariates such
as demographics. Each person is assumed to respond to an idiosyncratic set
of $J_i$ items, yielding an unbalanced data set with a total of $\sum_{i=1}^{I} J_i = N$
observations. Such a model arises, for instance, in recommendation systems
where users rate different items (products). Ansari and Li (2017) detail the
derivation of closed-form variational distributions for this model.
To assess the speed, scalability, and accuracy of the MFVB approach,
we now compare it to Gibbs sampling on simulated data sets of varying
sizes. For MFVB, we use a tolerance of 10^-4 as the convergence criterion.
For Gibbs sampling, we run the chain for 5,000 iterations, which reflects
a conservative estimate of convergence given the multiple sources of
heterogeneity in the model.
Table 12.1 shows the comparison results for simulated data sets of
different sizes. One can see that MFVB requires very few iterations for
convergence. It is also clearly apparent that the MFVB approach is
considerably faster than MCMC and results in a substantial reduction in
computational time.1 The last column of Table 12.1 reports the ratio of
the time required for Gibbs sampling to that of MFVB. As the MFVB
approach requires fewer iterations for larger data sets, we see that this
ratio increases with data set size. Therefore MFVB scales much better than
MCMC for larger data sets.
To assess the accuracy, we simulate 10 different data sets with I = 3,000
and J = 50, and compute the root mean squared errors (RMSE) between
the estimated and the true parameters. Across the 10 simulations, the mean
and standard deviation of RMSE across model parameters are 0.338 and

Table 12.1  Comparing MFVB to MCMC for the cross-nested mixed linear model

Persons   Products   Observations   MFVB (Tol = 10^-4)      Gibbs Sampling       Speed
   I          J          I × J      # Iter   Time (sec)     (5,000 iter)         Ratio
                                                            Time (sec)
   300        50         15,000       7          0.26           136.56          525.23
 3,000        50        150,000       6          2.05         1,338.84          653.1
 3,000       500      1,500,000       3          6.64        13,642.13         2054.54
30,000       500     15,000,000       3        114.44       593,138.32         5182.96


0.006, respectively, for MFVB estimation, when compared to 0.338 and


0.005, respectively, for Gibbs sampling. The similarity indicates that the
MFVB method produces parameter estimates that are as accurate as Gibbs
sampling.

A Hierarchical Logit Model

Understanding how consumers make choices has been of enduring interest
to marketers. Among the various methods of choice modeling, hierarchical
logit models have arguably received the widest range of recent applications,
as they can flexibly approximate any random utility model (McFadden
and Train 2000). Suppose for individual i, where $i = 1, \ldots, I$, we observe $T_i$
choice events, and at each event the individual selects one option from J
alternatives. We can write the utility that individual i receives from option
j at the t-th choice event as follows:

$U_{ijt} = x_{ijt}^{\prime}\lambda_i + e_{ijt}.$   (12.13)

In the above utility function, $x_{ijt}$ represents the observed variables
relating to individual i and alternative j at choice occasion t. The random
coefficient $\lambda_i$ captures individual preferences and is usually assumed to
follow a multivariate normal population distribution, $\lambda_i \sim N(\beta, \Lambda)$, that
characterizes the unobserved heterogeneity in the preferences. Also, let
$y_{ijt}$ be the binary variable indicating whether or not option j is chosen by
individual i at event t. When the unobserved utility component $e_{ijt}$ is iid
extreme value, we obtain the conditional logit choice probability

$P(y_{ijt} | \lambda_i) = \frac{\exp(x_{ijt}^{\prime}\lambda_i)}{\sum_{k=1}^{J} \exp(x_{ikt}^{\prime}\lambda_i)}.$   (12.14)

We adopt typical semi-conjugate priors for the population parameters,
$\beta \sim N(b, \Sigma_b)$ and $\Lambda \sim IW(r_\Lambda, R_\Lambda)$, which constitute the conjugate part
of the hierarchical model. In contrast, the individual parameters $\{\lambda_i\}_{i=1}^{I}$
are non-conjugate, i.e., without closed-form posterior full conditionals.
The resulting hierarchical logit model contains both conjugate and
non-conjugate components. We can therefore use MFVB to update the
semi-conjugate population parameters and FFVB methods to update the
non-conjugate individual-level coefficients.
We assume a factorized form for the variational approximation to the
true posterior:

$q(\beta, \Lambda, \{\lambda_i\}) = q(\beta)\, q(\Lambda) \prod_{i=1}^{I} q(\lambda_i).$   (12.15)

Conditional on the $\lambda_i$, one can apply MFVB to derive closed-form variational
distributions, $q(\beta) = N(\mu_{q(\beta)}, \Sigma_{q(\beta)})$ and $q(\Lambda) = IW(r_{q(\Lambda)}, R_{q(\Lambda)})$. For the
non-conjugate parameters, we resort to the exponential family and fix the
variational distribution to a multivariate normal, $q(\lambda_i) = N(\mu_{q(\lambda_i)}, \Sigma_{q(\lambda_i)})$.
This FFVB updating is then embedded as an inner loop within the outer
MFVB iterations to estimate the hierarchical logit model. Algorithm 12.1
provides the details of the hybrid VB process.
Algorithm 12.1: Hybrid VB for the Hierarchical Logit Model

1.  Initialize $\mu_{q(\beta)}$, $\Sigma_{q(\beta)}$, $\{\mu_{q(\lambda_i)}\}_{\forall i}$, $\{\Sigma_{q(\lambda_i)}\}_{\forall i}$, $r_{q(\Lambda)}$ and $R_{q(\Lambda)}$.
2.  Set the number of FFVB inner iterations M, and step size w.
3.  FFVB updates for $\lambda_i$, $\forall i$, as follows:
    (1)  Initialize $H_{\lambda_i} = \Sigma_{q(\lambda_i)}^{-1}$, $g_{\lambda_i} = \Sigma_{q(\lambda_i)}^{-1}\mu_{q(\lambda_i)}$, $m_{\lambda_i} = 0$.
    (2)  Initialize $\bar{H}_{\lambda_i} = 0$, $\bar{g}_{\lambda_i} = 0$, $\bar{m}_{\lambda_i} = 0$.
    (3)  At each iteration $n = 1, \ldots, M$:
         (a)  Generate a draw $\hat{\lambda}_i$ from $N(\mu_{q(\lambda_i)}, \Sigma_{q(\lambda_i)})$.
         (b)  Calculate the gradient $\hat{g}_{\lambda_i}$ and Hessian $\hat{H}_{\lambda_i}$ of
              $\log p(y, \{\lambda_i\}_i)$ at $\hat{\lambda}_i$.
         (c)  Set $g_{\lambda_i} = (1-w)\, g_{\lambda_i} + w\, \hat{g}_{\lambda_i}$, $H_{\lambda_i} = (1-w)\, H_{\lambda_i} - w\, \hat{H}_{\lambda_i}$,
              and $m_{\lambda_i} = (1-w)\, m_{\lambda_i} + w\, \hat{\lambda}_i$.
         (d)  Update $\Sigma_{q(\lambda_i)} = H_{\lambda_i}^{-1}$ and $\mu_{q(\lambda_i)} = \Sigma_{q(\lambda_i)}\, g_{\lambda_i} + m_{\lambda_i}$.
         (e)  If $n > M/2$, then
              $\bar{g}_{\lambda_i} = \bar{g}_{\lambda_i} + \frac{2}{M}\hat{g}_{\lambda_i}$, $\bar{H}_{\lambda_i} = \bar{H}_{\lambda_i} - \frac{2}{M}\hat{H}_{\lambda_i}$,
              and $\bar{m}_{\lambda_i} = \bar{m}_{\lambda_i} + \frac{2}{M}\hat{\lambda}_i$.
    (4)  Set $\Sigma_{q(\lambda_i)} = \bar{H}_{\lambda_i}^{-1}$ and $\mu_{q(\lambda_i)} = \Sigma_{q(\lambda_i)}\, \bar{g}_{\lambda_i} + \bar{m}_{\lambda_i}$.
4.  MFVB update for $\beta$:
    $\Sigma_{q(\beta)} = \big(\Sigma_b^{-1} + I\, r_{q(\Lambda)} R_{q(\Lambda)}^{-1}\big)^{-1}$ and
    $\mu_{q(\beta)} = \Sigma_{q(\beta)} \big(\Sigma_b^{-1} b + r_{q(\Lambda)} R_{q(\Lambda)}^{-1} \sum_{i=1}^{I} \mu_{q(\lambda_i)}\big)$.
5.  MFVB update for $\Lambda$:
    $r_{q(\Lambda)} = r_\Lambda + I$ and
    $R_{q(\Lambda)} = R_\Lambda + I\, \Sigma_{q(\beta)} + \sum_{i=1}^{I} \big((\mu_{q(\lambda_i)} - \mu_{q(\beta)})(\mu_{q(\lambda_i)} - \mu_{q(\beta)})^{\prime} + \Sigma_{q(\lambda_i)}\big)$.
6.  Repeat Steps 3–5 until convergence.

The computational time associated with the hybrid VB procedure
can be reduced by resorting to stochastic variational Bayesian methods,
which use stochastic approximation involving random samples
of the data. Note that Algorithm 12.1 updates the variational parameters
associated with each individual in an inner loop (Step 3) and then updates
the variational parameters for the population quantities in Steps 4–5.
However, this is wasteful early in the iterative process, as the individual-level
updates are based on population values that are far from the truth.
Therefore, in stochastic VB, the inner loop involves updating the parameters
for a mini-batch of randomly selected individuals. The size of the
mini-batch can be adaptively increased over time such that the final
estimates are based on the entire data; a schematic of this schedule is
sketched below.
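The following Python skeleton illustrates the adaptive mini-batch idea only; the two update functions are hypothetical placeholders for Steps 3 and 4–5 of Algorithm 12.1 (their bodies depend on the model), and the doubling schedule starting at 100 individuals is just one possible choice.

import numpy as np

rng = np.random.default_rng(5)
I = 50_000                                # number of individuals

def ffvb_update_individual(i, pop_params):
    """Hypothetical stand-in for Step 3 of Algorithm 12.1 (person i)."""
    pass

def mfvb_update_population(batch, pop_params):
    """Hypothetical stand-in for Steps 4-5, based on the current batch."""
    return pop_params

pop_params = {}                           # variational population parameters
batch_size = 100
while True:
    batch = rng.choice(I, size=batch_size, replace=False)
    for i in batch:                       # inner FFVB loop on the mini-batch
        ffvb_update_individual(i, pop_params)
    pop_params = mfvb_update_population(batch, pop_params)
    if batch_size == I:                   # final pass used the entire data
        break
    batch_size = min(2 * batch_size, I)   # adaptively grow the mini-batch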

Estimating Hierarchical Logit via Hybrid VB

In this simulation study, we generate multiple datasets from the hierarchical
logit choice model in (12.14). We fix the values of $\beta$ and $\Lambda$, and draw the
random coefficient $\lambda_i$ for each of the I individuals. The attribute covariates
of each choice option are simulated from a uniform distribution on each
choice occasion. Different datasets are generated with varying numbers of
individuals I and numbers of observations per individual T, reflecting big
data settings with lengthy and wide panels, respectively. Below we present
the primary results of this study.
First, we assess the speed and scalability of hybrid VB and stochastic
VB with adaptive mini-batch sizes, and compare them with MCMC runs
for 5,000 iterations. Table 12.2 reports the time to completion in seconds.
It can be seen from the table that the two VB schemes are much faster
and more scalable than MCMC. In particular, the VB with stochastic
optimization leads to the most significant improvement in estimation
speed. For instance, on the largest data set with 50,000 individuals and
200 observations per individual, the conventional MCMC with 5,000 runs
takes 41,262.3 seconds, or 11.5 hours, to finish, whereas the stochastic VB
with adaptive mini-batch sizes merely takes 1,779.7 seconds, or less than


Table 12.2  Estimation time (seconds) for the hierarchical logit model

     I      T     Hybrid VB    Adaptive Mini-batch    MCMC (5,000 iter)
 1,000     50          91.6                   24.3                442.1
50,000     50       5,903.6                1,060.8             18,838.7
 1,000    200         142.6                   42.5                952.4
50,000    200      12,550.4                1,779.7             41,262.3

30 minutes, to converge. This roughly 23-fold speed ratio highlights the drastic
enhancement made possible by VB in handling big data situations.
We examine the accuracy of the hybrid VB and its variants through
the total variation error (TVE) in the choice probability estimates (Braun
and McAuliffe 2010). TVE assesses the distance between the estimated
predictive choice distribution $\hat{p}(y_n | X_n)$ and the true predictive choice
distribution $p^{*}(y_n | X_n)$, computed at a new attribute matrix $X_n$, i.e.,

$TVE = \frac{1}{2}\sum_{j=1}^{J} \big|\, \hat{p}(y_{nj} | X_n) - p^{*}(y_{nj} | X_n) \,\big|.$   (12.16)
For the two VB schemes, we take Monte Carlo draws from the estimated
variational distributions $q(\beta)$ and $q(\Lambda)$ to approximate $\hat{p}(y_n | X_n)$.
For MCMC, we use the empirical distribution of the resulting Markov
chain for this approximation.
We calculate TVEs for 20 replications under every simulation scenario.
Table 12.3 reports the mean and standard deviation (in parentheses) of the
TVE results. We can see that both versions of VB generate TVEs similar
to those of MCMC, indicating that VB is as precise as the gold-standard
MCMC in predicting choice probabilities. It is also apparent that the

Table 12.3  Total variation error for the hierarchical logit model

     I      T     Hybrid VB        Adaptive Mini-batch    MCMC (5,000 iter)
 1,000     50     0.78% (0.30)     0.76% (0.28)           0.73% (0.28)
50,000     50     0.62% (0.15)     0.65% (0.15)           0.67% (0.29)
 1,000    200     0.73% (0.21)     0.71% (0.26)           0.83% (0.22)
50,000    200     0.64% (0.23)     0.62% (0.19)           0.70% (0.19)


Table 12.4  Estimated population covariance matrix for the hierarchical logit model

True Covariance               Hybrid VB                     MCMC
0.250 0.125 0.125 0.125       0.260 0.127 0.120 0.128       0.248 0.127 0.123 0.123
0.125 0.250 0.125 0.125       0.127 0.262 0.129 0.127       0.127 0.244 0.121 0.127
0.125 0.125 0.250 0.125       0.120 0.129 0.261 0.121       0.123 0.121 0.241 0.124
0.125 0.125 0.125 0.250       0.128 0.127 0.121 0.253       0.123 0.127 0.124 0.249

larger the data set, the smaller the total variation error for the VB
methods, reflecting the suitability of VB for big data settings.
As characterizing consumer heterogeneity is very important for target-
ing and personalization in marketing, we also examine the recovery of
the population covariance matrix $\Lambda$. Table 12.4 presents the estimates
as well as the true covariance in the simulation with I = 50,000 and
T = 200. It is clear from the diagonal and off-diagonal entries that
hybrid VB and MCMC yield variance and covariance estimates at
similar levels of accuracy, relative to the truth. Thus, to the extent that
the population distribution is useful for targeting and personalization,
the hybrid VB approach can support these marketing actions.
Until now we have focused on the computational challenges that arise
from tall datasets. We now shift attention to wide data and illustrate
briefly dimension reduction approaches that are useful in such settings.

Wide Data

Marketers nowadays have access to a large collection of variables that
describe consumer behavior. If p denotes the number of variables (or
dimensions) and N denotes the number of observations, the term “wide
data” refers to situations where p is large. In fact, the easy availability of
textual and image data on the internet has resulted in datasets where the
number of dimensions can be larger than the number of observations, i.e.,
p . N, the so-called high dimensional datasets. For instance, consider a
marketer interested in understanding the relationship between the text of
a review and the overall rating or sentiment that a customer assigns to a
product/brand. In analyzing such textual data, the actual text of the review
is represented as a bag of words. Looking across the entire set of reviews
in the dataset (also called a corpus in Natural Language Processing), the


number of distinct words (also called the vocabulary size) can be signifi-
cantly greater than the number of reviews.
When modeling high dimensional data, analysts are interested in both
understanding the patterns that are present in the data and in predicting the
outcome of interest for future observations. A regression for the ratings, or
a logistic regression for the sentiment, that uses all the features/variables
as independent variables can result in an unwieldy statistical model that
overfits the noise in the data, and is therefore unlikely to predict well in the
future. Such a model is also not very useful in developing a proper under-
standing of the data, given the large number of coefficients that appear
relevant. In such situations, one is interested in sparse representations of the
data, i.e., to identify a statistical model in which relatively few parameters
are shown to be important or relevant. Such sparsity is achieved via regu-
larization approaches that result in automatic feature/variable selection.
More formally, consider a linear regression setup:

$y_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j + e_i, \quad i = 1, \ldots, N,$   (12.17)

where the error $e_i$ is iid, with $E[e_i] = 0$ and $\mathrm{var}[e_i] = \sigma^2$.
When p > N, ordinary least squares cannot be used, as it overfits the data
and causes identifiability problems. In situations where p is large, but less
than N, least squares results in parameter estimates that are all non-zero,
making interpretation difficult. Least squares estimates have low bias, but
have large variance, and therefore will not predict well. One can improve
both interpretability as well as predictive performance by using some form
of regularization. Regularization is akin to using a “prior” on the coeffi-
cients so as to shrink their values by constraining them such that only a few
of the coefficients are nonzero. This involves estimating parameters using
a combination of a loss-function (e.g., squared loss, or negative log-likeli-
hood) and a penalty or regularizing function. Regularization offers both
computational and statistical advantages. For instance, regularization lends
numerical stability to the optimization problem, and can result in faster
solutions. From the statistical perspective, regularization avoids overfitting.
The statistical and machine learning literature on high dimensional
modeling has developed a number of penalized likelihood methods for
regularization. Among these, the lasso (Tibshirani 1996) and the elastic
net are the most popular. We now describe the lasso briefly.

Lasso

In lasso, or $\ell_1$-regularized regression, the regression coefficients are
obtained via the optimization

$\underset{\beta}{\text{minimize}} \; \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} \quad \text{subject to} \quad \|\beta\|_1 \leq t,$

where the $\ell_1$ norm is defined as $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$. Notice that the
optimization constrains the coefficients so that the sum of their absolute values
lies within a "budget" t. The budget controls the complexity of the model. A larger
budget implies that there is greater leeway for the parameters and therefore
more parameters are allowed to be non-zero. The value of the tuning
parameter t that results in the best predictions can be determined separately,
typically via cross-validation. Predictive performance is best when the
model is complex enough to capture the signal in the data, without at the
same time overfitting. The above optimization can alternatively be written
using a Lagrangian specification as follows:

$\underset{\beta}{\text{minimize}} \; \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda \sum_{j=1}^{p} |\beta_j|.$

The tuning parameter $\lambda$ controls the relative impact of the loss function
and the penalty term. When $\lambda = 0$, the penalty term has no impact, and
the lasso will provide the least squares estimates. Notice that the shrinkage
is not applied to the intercept, which measures the mean value of the
outcome variable.
The lasso is similar to the more traditional regularizer, ridge regression,
which is popular in robust regression contexts. In ridge regression, the
coefficients are obtained via the following optimization:

$\underset{\beta}{\text{minimize}} \; \sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2}.$

In ridge regression, the $\ell_2$ penalty $\sum_{j=1}^{p} \beta_j^{2}$ shrinks the coefficients
towards zero as $\lambda \rightarrow \infty$. This helps in improving prediction, but the $\ell_2$
penalty only reduces the magnitude of the coefficients; it does not set any
of the coefficients to zero. In contrast, the $\ell_1$ norm associated with the
lasso is special as it yields sparse (or corner) solutions, i.e., it not only
shrinks the magnitude of the coefficients but also ensures that only some
of the parameters are assigned non-zero values, by shrinking some of the
coefficients exactly to zero. Thus the lasso provides automatic relevance
determination, or variable selection, and yields sparse models.
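The Python sketch below contrasts the two penalties on simulated sparse data, assuming scikit-learn is available; sklearn's alpha plays the role of $\lambda$ (up to a scaling of the loss), and all data and penalty values are illustrative.

import numpy as np
from sklearn.linear_model import Lasso, LassoCV, Ridge

rng = np.random.default_rng(6)
N, p = 200, 50                            # p large relative to N
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, -1.0, 0.5]    # only 5 truly non-zero coefficients
X = rng.normal(size=(N, p))
y = X @ beta + rng.normal(size=N)

lasso = Lasso(alpha=0.1).fit(X, y)        # alpha plays the role of lambda
ridge = Ridge(alpha=1.0).fit(X, y)
print((lasso.coef_ != 0).sum())           # few non-zero coefficients: sparse
print((ridge.coef_ != 0).sum())           # all 50 shrunk but still non-zero

# Choosing the tuning parameter by cross-validation, as described above
cv = LassoCV(cv=5).fit(X, y)
print(cv.alpha_)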
One can study the geometry of the optimization setup to understand
why the lasso results in corner solutions. Figure 12.1 represents the

[Figure 12.1 near here: panel (a) shows the lasso constraint region, a diamond; panel (b) shows the ridge constraint region, a circle; axes are $\beta_1$ and $\beta_2$.]

Figure 12.1  Constraint regions and contours of the error for lasso and ridge regressions

situation for the two-dimensional case. The constraint region $|\beta_1| + |\beta_2| \leq t$
for the lasso is represented by the grey diamond, and the constraint region
$\beta_1^{2} + \beta_2^{2} \leq t^{2}$ for the ridge regression is represented by the grey circle. The
ellipses represent regions of constant loss (i.e., regression sum of squares).
The optimization solution is given by the first point at which the elliptical
contours touch the constraint region. It is clear from the left panel that
the optimum can occur at a corner of the constraint set, whereas such a
corner solution is not possible in the ridge regression setup.
While we looked at the lasso in the context of regression, it can also be
used for non-linear models, including generalized linear models. Extensions
of the lasso can be used for popular marketing models such as the multi-
nomial logit. We refer the reader to Hastie, Tibshirani, and Wainwright
(2015) for an extensive discussion of lasso and its generalizations.

Conclusions

In this chapter, we discussed how different marketing settings result in big
data and how the different characteristics of big data, i.e., the 4Vs, create
computational challenges for modeling such data. In particular, we focused
on how stochastic approximation approaches and stochastic variational
Bayesian methods can be used by marketers to handle the challenges that
arise from high volume. We also looked at the potential of regularization
approaches for handling high dimensional datasets. While our discussion


centers on tall and wide datasets, the other characteristics of big data, includ-
ing velocity and variety, are becoming increasingly relevant. A number
of exciting untapped research opportunities exist in modeling marketing
data in streaming contexts as well. Similarly, marketers can benefit from
modeling approaches that handle data of multiple modalities, such as text,
numbers, images and sound tracks. It is our hope that marketing researchers
will enthusiastically embrace these emerging and promising opportunities.

Note

1. For a fair comparison, we code both VB and MCMC in Mathematica 11 and use the just-in-
time compilation capability of Mathematica to compile the programs to C. We run all pro-
grams on a Mac computer with 3GHz 8-Core Intel Xeon E5 processor and 32GB of RAM.

References

Allenby, G. M. and P. E. Rossi (1999). "Marketing models of consumer heterogeneity," Journal of Econometrics, 89, 57–78.
Ansari, A., S. Essegaier and R. Kohli (2000). “Internet recommendation systems,” Journal of
Marketing Research, 37(3), 363–375.
Ansari, A. and Y. Li (2017). “Stochastic Variational Bayesian Inference for Big Data
Marketing Models,” Working Paper.
Ansari, A., Y. Li and J. Zhang (2017). “Probabilistic Topic Model for Hybrid Recommender
Systems: A Stochastic Variational Bayesian Approach,” Working Paper.
Bardenet, R., A. Doucet and C. Holmes (2014). “Towards scaling up Markov chain Monte
Carlo: an adaptive subsampling approach,” Proceedings of the International Conference on
Machine Learning.
Bickel, P. J. and K. A. Doksum (2007). Mathematical Statistics: Basic Ideas and Selected
Topics, 2nd ed., vol. 1, Upper Saddle River, NJ: Pearson Prentice Hall.
Bishop, C. (2006). Pattern Recognition and Machine Learning, New York: Springer.
Braun, M. and J. McAuliffe (2010). “Variational inference for large-scale models of discrete
choice,” Journal of the American Statistical Association, 105(489), 324–335.
Chen, T., E. B. Fox and C. Guestrin (2014). “Stochastic gradient Hamiltonian Monte
Carlo,” Proceeding of the 31st International Conference on Machine Learning.
Chib, S. and E. Greenberg (1995). “Understanding the Metropolis-Hastings algorithm,”
American Statistician, 49(4), 327–335.
Gelfand, A. E., S. E. Hills, A. Racine-Poon and A. F. M. Smith (1990). "Illustration of
Bayesian inference in normal data models using Gibbs sampling,” Journal of the American
Statistical Association, 85(412), 972–985.
Gelfand, A. E. and A. F. M. Smith (1990). “Sampling-based approaches to calculating mar-
ginal densities,” Journal of the American Statistical Association, 85(410), 398–409.
Grimmer, J. (2010). “An introduction to Bayesian inference via variational approximations,”
Political Analysis, 19(1), 32–47.
Hastie, T., R. Tibshirani and M. Wainwright (2015). Statistical Learning with Sparsity: The
Lasso and Generalizations, Boca Raton, FL: CRC Press.
Honkela, A., T. Raiko, M. Kuusela, M. Tornio and J. Karhunen (2010). “Approximate
Riemannian conjugate gradient learning for fixed-form variational Bayes,” Journal of
Machine Learning Research, 11, 3235–3268.


Knowles, D. A. and T. P. Minka (2011). "Non-conjugate variational message passing for multinomial and binary regression," Advances in Neural Information Processing Systems, 24.
Korattikara, A., Y. Chen, and M. Welling (2014). “Austerity in MCMC land: cutting the
Metropolis-Hastings budget,” Proceedings of the International Conference on Machine
Learning.
Kullback, S. and R. A. Leibler (1951). “On information and sufficiency,” Annals of
Mathematical Statistics, 22, 79–86.
Li, Y. and A. Ansari (2014). “A Bayesian semiparametric approach for endogeneity and
heterogeneity in choice models,” Management Science, 60(5), 1161–1179.
Maclaurin, D. and R. P. Adams (2014). “Firefly Monte Carlo: exact MCMC with subsets of
data,” arXiv:1403.5693.
McFadden, D. and K. Train (2000). “Mixed MNL models for discrete response,” Journal of
Applied Econometrics, 15(5), 447–470.
Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller and E. Teller (1953).
“Equation of state calculations by fast computing machines,” Journal of Chemical Physics,
21, 1087.
Minka, T. P. (2001). “A family of algorithms for approximate Bayesian inference,” Ph.D.
Thesis, MIT.
Murphy K. P. (2012). Machine Learning: A Probabilistic Perspective, Cambridge, MA: MIT
Press.
Neiswanger, W., C. Wang and E. Xing (2014). "Asymptotically exact, embarrassingly parallel
MCMC,” Proceedings of the 30th International Conference on Conference on Uncertainty
in Artificial Intelligence.
Opper, M. and C. Archambeau (2009). “The variational Gaussian approximation revisited,”
Neural Computation, 21(3), 786–792.
Ormerod, J. T. and M. P. Wand (2010). “Explaining variational approximations,” American
Statistician, 64(2), 140–153.
Rasbash J. and W. J. Browne (2008). “Non-hierarchical multilevel models,” Handbook of
Multilevel Analysis, New York: Springer, 303–336.
Rasmussen, C. E. and C. K. I. Williams (2005). Gaussian Processes for Machine Learning,
Cambridge, MA: MIT Press.
Rossi, P. E., R. E. McCulloch and G. M. Allenby (1996). “The value of purchase history data
in target marketing,” Marketing Science, 15(4), 321–340.
Salimans, T. and D. A. Knowles (2013). “Fixed-form variational posterior approximation
through stochastic linear regression,” Bayesian Analysis, 8(4), 837–882.
Scott, S. L., A. W. Blocker and F. V. Bonassi (2016). “Bayes and big data: the consensus
Monte Carlo algorithm,” International Journal of Management Science and Engineering
Management, 11(2), 78–88.
Tibshirani, R. (1996). “Regression shrinkage and selection via the lasso,” Journal of the
Royal Statistical Society, Series B, 267–288.
Toulis, P. and E. M. Airoldi (2015). “Scalable estimation strategies based on stochastic approx-
imations: classical results and new insights,” Statistics and Computing, 25(4), 781–795.
Toulis, P. and E. M. Airoldi (2016). “Implicit stochastic gradient descent,” Annals of
Statistics, forthcoming.
Wang, C. and D. M. Blei (2013). “Variational inference in nonconjugate models,” Journal of
Machine Learning Research, 14(1), 1005–1031.
Wang, X. and D. B. Dunson (2014). “Parallelizing MCMC via Weierstrass sampler,”
arXiv:1312.4605.
Wedel, M. and W. A. Kamakura (2001). “Factor analysis with (mixed) observed and latent
variables in the exponential family,” Psychometrika, 66(4), 515–530.
Welling, M. and Y. W. Teh (2011). “Bayesian learning via stochastic gradient Langevin
dynamics,” Proceedings of the International Conference on Machine Learning.



PART VI

GENERALIZATIONS AND
OPTIMIZATIONS

13.  Meta analysis in marketing
Donald R. Lehmann

At its most basic level, meta analysis is an attempt to codify what we can
learn from multiple past experiences.

Types of Meta Analysis

Meta analysis and replication are closely related. Both focus on establish-
ing generalizations. In general, replications create data points for use in
meta analysis.
In marketing (and many other fields), meta-analysis has come to mean
a quantitative integration of past research projects, i.e., the analysis of a
number of related “primary” analyses. At least three types of meta analy-
ses have been employed which differ in their objectives.

Establishing the Statistical Significance of a Phenomenon

One goal is to see if the cumulative evidence demonstrates whether a
correlation, an effect, or level is "significant", i.e., different from zero. Two
basic approaches have been used to test for non-zero effects. The first,
more conservative one is to count across cases in order to establish how
often (for example) an effect is significant. The second, and more appro-
priate, approach is to pool all the available information. In the rare case
when the raw data are available, this simply means combining the data
from each observation, estimating the effect of interest, and verifying
whether it is significant. In the more likely case that all that is available are
the significance levels from a number of studies, a pooling test can be used.
The test statistic for combining p values across k studies is

$-2\sum_{i=1}^{k} \ln(p_i).$

It is distributed approximately chi-squared with 2k degrees of freedom.
For example, assume the p
values for five studies relating two variables to each other were .12, .34, .08,
.15, and .06. (Note that none of these studies is significant at the “stand-
ard” .05 level.) The corresponding natural logarithms are –2.12, –1.08,
–2.53, –1.90, and –2.81, respectively. These sum to –10.44. Multiplying
–10.44 by –2 gives 20.88. The critical chi-squared value for significance at the
.05 level with 2k = 10 degrees of freedom is 18.3. Thus, the combined effect
is significant, even though it was not significant in any of the five studies.
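In code, this pooling test takes a few lines; the Python snippet below replicates the worked example (scipy's combine_pvalues implements the same Fisher procedure).

import numpy as np
from scipy.stats import chi2, combine_pvalues

p_values = [0.12, 0.34, 0.08, 0.15, 0.06]
stat = -2 * np.sum(np.log(p_values))     # about 20.87 (20.88 with rounded logs)
df = 2 * len(p_values)                   # chi-squared with 2k = 10 df
print(stat > chi2.ppf(0.95, df))         # 20.87 > 18.31: significant at .05
print(chi2.sf(stat, df))                 # combined p-value, about .02

print(combine_pvalues(p_values, method='fisher'))   # same statistic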
To a large extent, statistical significance is not an interesting question to
a practitioner. Essentially everything is related to everything else, however
remotely, so there is at least an infinitesimal relation or effect between any
two things (constructs, variables). A more interesting question is how large
is the relation or effect.

Establishing the Degree of Correlation among Variables

Here the focus is on the scale-less relation between variables. It makes
sense when neither variable is measured in units which matter per
se (for example, most multi-item scales measuring constructs such as
open-mindedness).

Establishing the Magnitude (Size) of an Effect

In marketing, there are two types of focal variables of interest. One is the
level of a variable, for example the percentage of people who adhere to a
drug regimen. The other is the magnitude of the impact of one variable
on another, as assessed by a coefficient in some statistical model such as
regression analysis. This type of meta analysis seems most managerially
relevant. We focus on it for the rest of this chapter because managers base
decisions on the size of the marginal impact rather than on the correlation
or whether it is significant.

Why Do a Meta Analysis?

There are two basic reasons for doing a meta analysis. The first is knowl-
edge development and learning. It is interesting to learn about empirical
generalizations (see Hanssens 2015) including both a sense of what a
typical/average effect is and which factors make it larger or smaller. The
second is to use the results to make predictions about what would happen
if a certain situation arises or to discover which situation produces the
largest (or smallest) effect.


Meta Analysis and Replications

Meta analysis is closely related to the concept of replications. In marketing,
"exact" replications rarely exist. In fact, realistically, a replication can
never be exact since time, researcher, participants, and so on necessarily
vary across studies. To the extent that the studies are treated as exact rep-
lications, this leads to an emphasis on statistical significance and/or the
average result.
By contrast, the type of meta-analysis most used and useful in market-
ing explicitly recognizes that differences exist in method and measure-
ments, analysis, the exact product or behavior examined, and the situation
(e.g., location, sample). Especially for post hoc meta analyses, this means
meta analysis treats the different studies and conditions as conceptual
replications. Thus, the focus is as much or more on identifying systematic
variation in results than on establishing a single (mean) empirical gener-
alization. Put differently, the appropriate focus is on finding systematic
variation in the results and its sources.

Steps in a Meta Analysis

Step 1: Deciding What to Analyze

This seemingly trivial step is still necessary. For example, if you are inter-
ested in the effect of price on the dependent variable (e.g., sales), you
need to decide if it is the regular price or price promotion and whether to
study absolute price or relative-to-competition price. If the answer is all of
the above, then you need to include additional variables (Z) in the meta-
analysis “design” to account/control for the differences.
Practically, what to study depends on what data (studies) are available.
For example, studying how a particular result depends on a specific
variable may be very desirable but not feasible given the paucity (or even
absence) of studies that report it. This leaves a choice: either set out on a
major effort to run studies or switch topics/focus. Realistically the latter is
typically the chosen (and wisest) approach. The scarcity of data typically
leads researchers to include different types of studies in the meta analysis
and in effect combine “apples and oranges,” i.e., conceptual/imperfect
replications.


Step 2: Sampling/Assembling a Data Set

A key task in meta analysis is to assemble a series of "similar" studies. The
goal is to collect all the relevant studies. Various forms of literature search
(now aided by online search tools) of key words, bibliographies, key jour-
nals, and so on are pretty standard. In addition, requests for unpublished
work (again often available online) are also useful. Alternatively, in some
cases an individual organization may have done a sufficient number of
related studies to support a meta analysis on their own.
Like any statistical analysis, the value of a meta analysis depends on the
representativeness of the data used in it. The problem is that studies are
often not available in the public domain. One category of the unavailable
studies stems from the so-called file drawer problem. The basic idea is that,
due to the publication process, studies with weak results are not published.
While weak is usually defined as statistically non-significant, this tends to
be related to having smaller effects or small sample sizes. When the data
come from company records, a similar tendency to hide failures has the
same consequences.
Efforts to deal with this problem include requests for unpublished (file
drawer) papers to “de-bias” the sample. One approach, similar to that
used to deal with non-response bias, is to collect a sample of unpublished
studies and test to see if their results differ from those included in the
meta analysis. This is particularly important when the dependent variable
is a level (e.g., percent). Ironically, when the meta analysis focuses on a
measure of impact (e.g., elasticity or regression coefficient), it may be
unaffected by a biased sample even if the values of the dependent variable
are.
One way to assess the seriousness of sample bias is to test for how many
null (zero) results would be needed to make the average effect non-sig-
nificant. Unfortunately, this is dependent on the number of observations
in the meta analysis and concerned mostly with statistical significance.
A more promising approach is to use the available studies to estimate
the distribution of effect sizes (e.g., as a gamma distribution) or just plot
them, and then see if the pattern suggests that a large number of small size
effects are likely to exist (Rust, Farley and Lehmann 1990). There is also
a question about which results are of higher quality: those that are easily
available or those that have in effect been buried. Sometimes results are
not published because of design flaws or some form of contamination.
Thus, even if one finds studies that differ from those included in a meta
analysis, it is not always clear what to do with them.


Step 3: Creating the Meta Analysis Model

Meta analysis has two components, the model of the effect of interest used
in the individual studies and the meta-analysis model of factors that influ-
ence its key outputs.
Assume a number of studies have been run that assess the effect of a
variable X on the criterion variable Y:

Y = B0 +B1X +B2W +e1,

where W stands for other variables that were included in the estimation of
B1. Here B1 is the effect of interest. The meta-analysis model then expresses
B1 as a function of other variables (Z):

B1 = C0 +C1Z +e2

Meta analysis focuses on both finding the “typical” B1, i.e., the average
effect, and, more heavily, on those factors (Zs) that influence it, i.e. the C
values. (Because the average is potentially influenced by the particular Z
values in the available observations, some researchers “de-bias” the aver-
age by using B0 to estimate it when the W values are effect coded.)

Variables to Include

The size of an effect (e.g., advertising elasticity) is determined by a variety
of factors. Roughly, these can be broken down into four categories, two
technical and two substantive. What follows is a highly abbreviated list of
the types of variables which fall in each category.

1. Technical (Methodological) Variables


a. Measurement: How each variable has been measured can
impact the results and therefore needs to be accounted for.
This includes
i. Scale/number of scale points
ii. Self vs. other assessed; Unobtrusive vs. obtrusive
iii. Researcher/author (these have been found to have signifi-
cant effects)
b. Analysis: How the original paper/study examined its data is
critical. Aspects include


i. Other variables included in the model


ii. Functional form of the relationship (linear, non-linear)
iii. Estimation method (OLS, Hierarchical model, Bayesian,
etc.)
2. Substantive Variables
a. What was studied, for example:
i. Product (which may be broken down by, for example,
durable vs. consumable, high vs. low sales volume)
ii. Phenomenon (Sales, choice, attitude)
iii. Intervention/Manipulation
b. Situation, for example:
i. Location
ii. Sample characteristics
1. Age, etc.
2. Expertise
iii.  Context
1. Lab, online, field
2. Time period

Step 4: Data Discovery and Preparation

At least in the case of meta analysis based on others' (i.e., published)
research, the information available is often not in the desired form or not
explicitly included. Consider first the dependent variable (e.g., the size of
the effect you are interested in). Some studies report standardized coef-
ficients, others unstandardized ones, elasticities, or correlations. The first
step is to convert these to a common metric.
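Two conversions that often come up at this step, stated here as the standard textbook formulas rather than anything specific to this chapter: a reported t-statistic with df degrees of freedom implies a correlation r = sqrt(t²/(t² + df)) carrying the sign of t, and correlations are commonly placed on Fisher's z scale, z = arctanh(r), with approximate sampling variance 1/(n − 3).

import numpy as np

def t_to_r(t, df):
    # correlation effect size implied by a reported t-statistic
    return np.sign(t) * np.sqrt(t**2 / (t**2 + df))

def fisher_z(r, n):
    # Fisher's z transform of r and its approximate sampling variance
    return np.arctanh(r), 1.0 / (n - 3)

r = t_to_r(2.5, df=98)      # e.g., t = 2.5 from a study with n = 100
print(r)                    # about 0.245
print(fisher_z(r, n=100))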
The next step has to do with the variables you wish to associate with the
dependent variable. Here you will encounter missing data as well as varied
operationalizations. Standard approaches (all imperfect) for dealing with
missing data include simply deleting the observation (thereby reducing
power and risking biasing the results), treating the data as missing in the
analysis, replacing it with the mean from the available observations, or
inferring/imputing the value based on relating the variable with missing
data to others in those observations that included both and then using
the value of the particular observation on those other variables to predict
(infer) what the value is on the missing variable.
One important consideration is that you are not limited to using
variables that were used in the individual studies. For example, if age was


not included in the previous analyses/models, you can still include it in


the meta analysis, assuming it is reported in the paper or project report.
This is an often overlooked opportunity to create “new” knowledge via
meta analysis. Of course, one can always go back to the authors (or their
web appendices, etc.) to try to obtain data on other variables as well. A
less arduous but sometimes useful approach is to ask a panel of experts to
assess other variables, for example how innovative the products used in
the individual studies were. Thus it is quite feasible to estimate the effects
of variables not included in the individual studies (in our example age and
product innovativeness).

“Design” Inefficiency

A major statistical problem in meta-analysis on published data is that the
sample is not only limited in size but also in coverage of the design implied
by the set of variables used in it. Typically, only a small fraction of possible
combinations of the variables have been employed in studies. For example,
even if the “design” only contains eight binary variables, it would require
256 studies (observations) just to have a single observation in each cell.
Further, there is a strong tendency for studies to be concentrated
in a small number of combinations. This occurs partly because future
researchers gravitate toward methods that “work,” and the review process
favors established procedures. In the limit this means a subject is primarily
studied with one data collection method, set of covariates, measure of
each variable, estimation method, and so on, which makes it impossible to
determine which aspect leads to the results.
More generally, there are a few clusters of studies that follow similar
designs. This leads to a collinearity problem and the need to decide which
variables to include (using cluster membership as a variable is one option)
and which to exclude.
In order to reduce this problem in the future, one can resort to “prospec-
tive” meta analyses. In these, the design is laid out up front and studies are
then executed to fill the various cells in a specific design. Unfortunately,
this requires a large budget and/or a large number of cooperative research-
ers who, even with the best of intentions, inevitably introduce additional
variance into the data.
A more manageable approach is to simply design the next study to
provide the maximum additional information possible (Farley, Lehmann
and Mann 1998). The result is to basically see what is the most typical
combination of (design) variables in the meta analysis and then to change
about half of them.


Step 5: Refining the Meta Analysis Model

It would be ideal, once you specified the variables to include and coded
each observation on them, to simply run a regression (or some other pro-
cedure) on the data set and be done. Unfortunately, this is rarely possible
when you have several predictors (e.g., of the size of the effect).
The first problem is sample size. Many levels of variables (e.g., studies
done in South America) typically have few observations. Although there is
no hard rule, when you have fewer than five observations, the coefficient
of their effect tends to be unstable. This leaves two choices: drop the
variable (and risk omitted variable bias) or group the variable with similar
ones. While it is possible to do this by verifying which other variables seem
to have a similar effect and grouping them together, it is generally fine to
just group a variable with another on logical/theoretical grounds.
The second problem is non-significance of coefficients driven by limited
sample size plus collinearity (confounding) of the predictor (design)
variables. Here again you face the option to drop the variable (which
again may be insignificant because of its relation to another variable, thus
producing biased coefficients) or combine it with others in an index for,
perhaps, income and education. While this won’t separate the effects of
income and education, it also won’t produce a possible false interpretation
that only one matters.
Taking the two previous points together, this strongly suggests that the
first step in analysis should be to examine frequencies and the correlations
among the variables.
After an initial estimation of the meta-analysis model, one
typically alters the variables in the model and re-estimates it. Depending
on the results, this may result in further modifications. The basic point is
that developing a meta analysis is a craft involving sequential adjustments
rather than a set of pre-determined steps.

Estimation Issues

Correlated Observations

Often multiple observations originate from the same paper, study or
author. Any of these have a large number of aspects that could influence
the results. Therefore, it is useful to control for these in the analysis. While
some researchers have dealt with this by either averaging the B1s within a
study or picking a representative one, this discards information. While the
correlated errors can be accounted for in a hierarchical model, a simple


fixed effect (dummy variable) to account for the mean effect of each study
typically performs quite well (Bijmolt and Pieters 2001).

Weighing Observations

Not all observations are of equal quality. One approach is to weight the B1s
by the sample size used to estimate them. A more sophisticated approach
is to weight the B1s by the inverse of their variance. Fortunately, in many
cases this choice does not materially alter the results.
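As a sketch of this weighted estimation, the Python snippet below runs the meta-analysis regression B1 = C0 + C1Z + e2 by weighted least squares with inverse-variance weights, using statsmodels; the effect sizes, the single moderator Z, and the standard errors are all simulated for illustration.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
k = 60                                     # number of studies (observations)
Z = rng.binomial(1, 0.5, size=k)           # e.g., lab (1) vs. field (0) study
se = rng.uniform(0.05, 0.3, size=k)        # reported standard errors
B1 = 0.5 + 0.3 * Z + rng.normal(scale=se)  # observed effect sizes

design = sm.add_constant(Z)                # B1 = C0 + C1*Z + e2
fit = sm.WLS(B1, design, weights=1.0 / se**2).fit()
print(fit.params)                          # estimates of C0 and C1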

Ancillary Statistics

Over time some researchers have begun calculating and reporting a
number of statistics related to the meta analysis.

Fail-safe n
This statistic (Rosenthal 1979) calculates the number of zero-effect studies
that would have to be added before a result becomes non-significant.
It has some value if the objective is to "prove" an effect is significant
(it indicates how many non-significant studies would have to have been
left out of the analysis, i.e., in a file drawer, to invalidate a finding).
Some researchers examine the pattern of results to see if there appears
to be a discontinuity at a specific level of result or statistical significance
(e.g., 5 percent). Essentially this involves “backcasting” a forecast of how
many small(er) results should exist for them to form a smooth curve (Rust,
Farley and Lehmann 1990). As a basic check, it is useful to simply plot the
distribution of effects and see if it looks reasonable.

Other tests
A number of other tests are sometimes reported. For example, Cochran's Q
tests whether results are equal (homogeneous), and I² tests whether
the variability is non-random (Huedo-Medina et al. 2006; Higgins and
Thompson 2002). If the meta analysis does not explain a significant amount
of the variance, it suggests all the results come from the same distribution
and can simply be averaged. Equivalently, if a particular design variable is
not significant, then it means that it may not have any effect. Importantly,
these tests are subject to the low power available in most meta analyses.
Indeed, some fairly large coefficients can be non-significant.
Equivalent tests can be done with regression. If the overall R² of the
meta-analysis regression is not significant, you cannot reject the
hypothesis that there are no significant differences in the results (based
on the variables examined) and hence the studies are poolable, i.e., can
simply be averaged. Overall, it makes sense to perform the meta-analytic
regression and interpret the results with caution.
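For reference, both heterogeneity statistics are easy to compute directly from the effect sizes and their variances, as in the sketch below (illustrative numbers only).

```python
# A sketch of Cochran's Q and I^2 from effect sizes and inverse-variance
# weights, following Higgins and Thompson (2002); numbers are illustrative.
import numpy as np

def q_and_i2(b, var_b):
    b, w = np.asarray(b, float), 1.0 / np.asarray(var_b, float)
    b_bar = np.sum(w * b) / np.sum(w)        # precision-weighted mean effect
    q = np.sum(w * (b - b_bar) ** 2)         # Cochran's Q (chi-square, df = k-1)
    i2 = max(0.0, (q - (len(b) - 1)) / q) * 100.0  # % variability beyond chance
    return q, i2

print(q_and_i2([0.8, 1.2, 0.5, 1.0, 0.9], [0.04, 0.10, 0.02, 0.06, 0.05]))
```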

Fixed vs. Random Effects

The simplest way to assess effects is to assume they are "fixed," i.e.,
deterministic. Alternatively, you can assume there is unexplained (random)
variation in them, that is, random effects. As in the general econometric
literature, there are proponents of both in meta analysis (Hunter and
Schmidt 2000). Given a bias toward parsimony, I prefer simpler methods
(fixed effects). Put differently, if one wants a reasonable (ballpark) sense,
fixed effects should suffice, at least as a starting point. For those
interested in more precision, or those who believe effects vary, random
coefficients are frequently employed.
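To make the contrast concrete, the sketch below computes both the fixed-effect pooled mean and a random-effects version, assuming the DerSimonian-Laird estimator of the between-study variance (tau²); all values are illustrative.

```python
# A sketch contrasting fixed- and random-effects pooling, assuming the
# DerSimonian-Laird estimate of between-study variance (tau^2).
import numpy as np

def pooled_effects(b, var_b):
    b, v = np.asarray(b, float), np.asarray(var_b, float)
    w = 1.0 / v
    fixed = np.sum(w * b) / np.sum(w)           # fixed-effect pooled mean
    q = np.sum(w * (b - fixed) ** 2)            # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(b) - 1)) / c)     # between-study variance
    w_re = 1.0 / (v + tau2)                     # random-effects weights
    return fixed, np.sum(w_re * b) / np.sum(w_re)

print(pooled_effects([0.8, 1.2, 0.5, 1.0, 0.9], [0.04, 0.10, 0.02, 0.06, 0.05]))
```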

Meta Analysis as Predictive Simulator

It is both interesting and theoretically important to understand the
average size of variables (e.g., percents) and effects (e.g., elasticities), how
they vary, and what they depend on. It is also possible to use these as the
basis for simulations to answer "what if" questions. The answers to "what
if" provide both hypotheses for the results of future research and a basis
for managerial projection and decision optimization. For example, Punam
Keller has developed, with the Centers for Disease Control (CDC), a site
called MessageWorks, which allows a person to compare the likely effects
of different types of health communications. Such an approach, along
with the use of "big data," can be used to automate many marketing
decisions (Bucklin, Lehmann, and Little 1998). Of course, as an approach
is used, its effectiveness will change, for example due to competitive
reactions. This highlights the need to update meta analyses periodically as
new data become available (and to include time as a variable in the analysis).
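As a stylized illustration, a fitted meta-regression becomes a simple predictor: plug the design of a contemplated study or campaign into the estimated equation. The coefficients and predictors below are purely hypothetical, not estimates from any published meta analysis.

```python
# A sketch of "what if" prediction from a (hypothetical) fitted
# meta-regression of price elasticities on study-design variables.
import numpy as np

coef = {"intercept": -1.8, "durable_good": -0.6, "europe": 0.3, "log_year": 0.1}

def predict_elasticity(durable_good, europe, log_year):
    return (coef["intercept"]
            + coef["durable_good"] * durable_good
            + coef["europe"] * europe
            + coef["log_year"] * log_year)

# e.g., the likely elasticity for a durable good marketed in Europe
print(predict_elasticity(durable_good=1, europe=1, log_year=np.log(25.0)))
```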

Postscript

Empirical generalizations, the output of meta analyses, have a long
tradition in marketing. Early examples include Clarke's (1976) analysis of
the duration (long-term) effect of advertising on sales and Leone and
Schultz's (1980) summary of sales response effects. They are also widely
utilized in fields such as medicine (where an early example uncovered the
value of aspirin in treating heart problems) and management. Table 13.1
provides a sample of published meta analyses, with a heavy focus on
marketing.



Table 13.1  Examples of meta analysis applications in marketing

Advertising
  Assmus, Farley, and Lehmann (1984): Advertising elasticity
  Aurier and Broz-Giroux (2014): Long-term effects of campaigns
  Batra et al. (1995): Advertising effectiveness
  Brown, Homer, and Inman (1998): Ad-evoked feelings
  Brown and Stayman (1992): Attitude toward the ad
  Capella, Webster, and Kinard (2011): Cigarette advertising
  Compeau and Grewal (1998): Comparative advertising
  Eisend (2011): Humor in advertising
  Eisend (2006): Two-sided advertising
  Grewal et al. (1997): Competitive advertising
  Hite and Fraser (1988): Attitude toward the ad
  Keller and Lehmann (2008): Health communication
  Lodish et al. (1995): TV advertising
  Sethuraman, Tellis, and Briesch (2011): Brand advertising elasticities
  Vakratsas and Ambler (1999): How advertising works
  Witte and Allen (2000): Fear appeals in health campaigns

Brands
  Eisend and Stokburger-Sauer (2013): Brand personality
  Heath and Chatterjee (1995): Decoy effects

Capabilities
  Cano, Carrillat, and Jaramillo (2004): Market orientation
  Kirca, Jayachandran, and Bearden (2005): Market orientation
  Krasnikov and Jayachandran (2008): Marketing, R&D, and operations capabilities

Consumer Behavior
  Beatty and Smith (1987): External search
  Carlson, Vincent, Hardesty, and Bearden (2009): Relation of objective and subjective knowledge
  Farley, Lehmann, and Ryan (1981): (Fishbein) attitude models
  Farley, Lehmann, and Ryan (1982): Howard-Sheth model
  Holden and Zlatevska (2015): Partitioning paradox
  Janiszewski, Noel, and Sawyer (2003): Spacing effects and verbal learning
  Peterson, Albaum, and Beltramini (1985): Effect size in consumer behavior experiments
  Scheibehenne, Greifeneder, and Todd (2010): Choice overload
  Sheppard, Hartwick, and Warshaw (1988): Theory of reasoned action
  Szymanski and Henard (2001): Customer satisfaction
  van Laer, de Ruyter, Visconti, and Wetzels (2014): Narrative transportation
  Völckner and Hofmann (2007): Price-perceived quality relationship
  Zlatevska, Dubelaar, and Holden (2014): Effect of portion size

New Products
  Arts, Frambach, and Bijmolt (2011): Consumer innovation adoption
  Bahadir, Bharadwaj, and Parzen (2009): Organic sales growth
  Chang and Taylor (2016): Consumer participation in new product development
  Evanschitzky, Eisend, Calantone, and Jiang (2012): New product success
  Henard and Szymanski (2001): New product success
  Krishna et al. (2002): Effect of price presentation
  Montoya-Weiss and Calantone (1994): New product performance
  Noseworthy and Trudel (2011): Evaluation of incongruous product forms
  Rubera and Kirca (2012): Innovativeness and firm performance
  Sultan, Farley, and Lehmann (1990): Diffusion (Bass) models
  Szymanski, Troy, and Bharadwaj (1995): Order of entry effect
  Troy, Hirunyawipada, and Paswan (2008): Cross-functional integration
  Van den Bulte and Stremersch (2004): Social contagion and income heterogeneity

Method
  Churchill and Peter (1984): Rating scale reliability
  Cooper, Hedges, and Valentine (2009): General reference
  Eisend (2015): Effect size
  Eisend and Tarrahi (2014): Selection bias
  Farley, Lehmann, and Mann (1998): Study design
  Farley and Lehmann (1986): General reference
  Farley, Lehmann, and Sawyer (1995): General reference
  Glass, McGaw, and Smith (1981): General reference
  Hedges and Olkin (1985): General reference
  Homburg, Klarmann, Reimann, and Schilke (2012): Key informant accuracy
  Hunter and Schmidt (2004): General reference
  Kepes et al. (2013): General reference
  Peterson (2001): Use of college students
  Peterson, Albaum, and Beltramini (1985): Effect size in consumer behavior experiments
  Rosenthal (1991): General reference
  Schmidt (1992): General reference

Price
  Bell, Chiang, and Padmanabhan (1999): Promotional response
  Bijmolt, van Heerde, and Pieters (2005): Price elasticity
  Estelami, Lehmann, and Holden (2001): Macro-economic determinants of price knowledge
  Kremer, Bijmolt, Leeflang, and Wieringa (2008): Price promotions
  Nijs, Dekimpe, Steenkamp, and Hanssens (2001): Price promotions
  Rao and Monroe (1989): Impact on perceived quality
  Sethuraman (1995): National and store brand promotional price elasticity
  Sethuraman, Srinivasan, and Kim (1999): Cross-price effects
  Tellis (1988): Price elasticity

Sales
  Albers, Mantrala, and Sridhar (2010): Personal selling elasticities
  Brown and Peterson (1993): Salesperson job satisfaction
  Churchill, Ford, Hartley, and Walker (1985): Salesperson performance
  Franke and Park (2006): Adaptive selling and customer orientation
  Geyskens, Steenkamp, and Kumar (1999): Channel relationship satisfaction

Other
  Blut, Frennea, Mittal, and Mothersbaugh (2015): Switching costs impact on satisfaction and repurchase
  Geyskens, Steenkamp, and Kumar (1998): Trust in channel relationships
  Gelbrich and Roschk (2011): Complaint compensation and satisfaction
  Palmatier, Dant, Grewal, and Evans (2006): Relationship marketing
  You, Vadakkepatt, and Joshi (2015): Electronic word of mouth elasticity
  Zablah, Franke, Brown, and Bartholomew (2012): Customer orientation impact on frontline employees

Many have begun to see meta analysis as a series of pre-specified
procedures (including tests for file drawer problems, tests for homogeneity
of variances, and development of a hierarchical model) executed using a
particular software routine or package. The problem with this is that it
puts the analyst farther from the data and obscures the decisions being
made implicitly. Therefore, I prefer a more hands-on and iterative
approach whereby you collect data, do some analysis, revise your data or
model, do some more analysis, and so on. I also favor using standard OLS
regression, at least until I find the meta-analysis model I feel is most useful.
In other words, performing meta analysis is more craft than science (or
art), and one learns as much from encountering problems (e.g., about what
to study next) as from the final model output. Indeed, I have observed that
true experts are rarely dogmatic about specific procedures, even if their
writings suggest they are.

References

Albers, S., Mantrala, M. K., & Sridhar, S. (2010). Personal selling elasticities: a meta-analy-
sis. Journal of Marketing Research, 47(5), 840–853.
Arts, J. W., Frambach, R. T., & Bijmolt, T. H. (2011). Generalizations on consumer innova-
tion adoption: A meta-analysis on drivers of intention and behavior. International Journal
of Research in Marketing, 28(2), 134–144.
Assmus, G., Farley, J. U., & Lehmann, D. R. (1984). How Advertising Affects Sales: Meta-
Analysis of Econometric Results. Journal of Marketing Research, 21 (February), 65–74.
Aurier, P. & Broz-Giroux, A. (2014). Modeling advertising impact at campaign level:
Empirical generalizations relative to long-term advertising profit contribution and its
antecedents. Marketing Letters, 25(2), 193–206.
Bahadir, S. C., Bharadwaj, S., & Parzen, M. (2009). A meta-analysis of the determinants of
organic sales growth. International Journal of Research in Marketing, 26(4), 263–275.
Batra, R., Lehmann, D. R., Burke, J., & Pae, J. (1995). When does advertising have an
impact? A study of tracking data. Journal of Advertising Research, 35(5), 19–33.
Beatty, S. E. & Smith, S. M. (1987). External search effort: An investigation across several
product categories. Journal of Consumer Research, 14(1), 83–95.
Bell, D. R., Chiang, J., & Padmanabhan, V. (1999). The decomposition of promotional
response: An empirical generalization. Marketing Science, 18(4), 504–526.
Bijmolt, T. H., Heerde, H. J. V., & Pieters, R. G. (2005). New empirical generalizations on
the determinants of price elasticity. Journal of Marketing Research, 42(2), 141–156.
Bijmolt, T. H. & Pieters, R. G. (2001). Meta-analysis in marketing when studies contain
multiple measurements. Marketing Letters, 12(2), 157–169.
Blut, M., Frennea, C. M., Mittal, V., & Mothersbaugh, D. L. (2015). How procedural,
financial and relational switching costs affect customer satisfaction, repurchase intentions,
and repurchase behavior: A meta-analysis. International Journal of Research in Marketing,
32(2), 226–229.
Brown, S. P. & Peterson, R. A. (1993). Antecedents and Consequences of Salesperson Job
Satisfaction: Meta-Analysis and Assessment of Causal Effects. Journal of Marketing
Research, 30 (February), 63–77.


Brown, S. P., Homer, P. M., & Inman, J. J. (1998). A meta-analysis of relationships between ad-
evoked feelings and advertising responses. Journal of Marketing Research, 35(1), 114–126.
Brown, S. P. & Stayman, D. M. (1992). Antecedents and Consequences of Attitude Toward
the Ad: A Meta-Analysis. Journal of Consumer Research, 19(1), 34–51.
Bucklin, R. E., Lehmann, D. R., & Little, J. D. C. (1998). From decision support to decision
automation: a 2020 vision. Marketing Letters, 9(3), 235–246.
Cano, C. R., Carrillat, F. A., & Jaramillo, F. (2004). A meta-analysis of the relationship
between market orientation and business performance: evidence from five continents.
International Journal of research in Marketing, 21(2), 179–200.
Capella, M. L., Webster, C., & Kinard, B. R. (2011). A review of the effect of cigarette adver-
tising. International Journal of Research in Marketing, 28(3), 269–279.
Carlson, J. P., Vincent, L. H., Hardesty, D. M., & Bearden, W. O. (2009). Objective and
subjective knowledge relationships: A quantitative analysis of consumer research findings.
Journal of Consumer Research, 35(5), 864–876.
Chang, W. & Taylor, S. A. (2016). The Effectiveness of Customer Participation in New
Product Development: A Meta-Analysis. Journal of Marketing, 80(1), 47–64.
Churchill, G. A., Ford, N. M., Hartley, S. W., & Walker, O. C. (1985). The Determinants of
Salesperson Performance: A Meta-Analysis. Journal of Marketing Research, 22(2), 103–118.
Churchill, G. A. & Peter, J. P. (1984). Research Design Effects on the Reliability of Rating
Scales: A Meta-Analysis. Journal of Marketing Research, 21(4), 360–375.
Clarke, D. G. (1976). Econometric Measurement of the Duration of Advertising Effect on
Sales. Journal of Marketing Research, 13 (November), 345–357.
Compeau, L. D. & Grewal, D. (1998). Comparative price advertising: an integrative review.
Journal of Public Policy & Marketing, 17(2), 257–273.
Cooper, H., Hedges, L. V., & Valentine, J. C. (2009). The handbook of research synthesis and
meta-analysis (2nd ed.). New York: Russell Sage Foundation.
Eisend, M. (2015). Have We Progressed Marketing Knowledge? A Meta-Meta-Analysis of
Effect Sizes in Marketing Research. Journal of Marketing, 79(3), 23–40.
Eisend, M. (2011). How humor in advertising works: A meta-analytic test of alternative
models. Marketing Letters, 22(2), 115–132.
Eisend, M. (2006). Two-sided advertising: A meta-analysis. International Journal of Research
in Marketing, 23(2), 187–198.
Eisend, M. & Stokburger-Sauer, N. E. (2013). Brand personality: A meta-analytic review of
antecedents and consequences. Marketing Letters, 24(3), 205–216.
Eisend, M. & Tarrahi, F. (2014). Meta-analysis selection bias in marketing research.
International Journal of Research in Marketing, 31(3), 317–326.
Estelami, H., Lehmann, D. R., & Holden, A. C. (2001). Macro-economic determinants of
consumer price knowledge: A meta-analysis of four decades of research. International
Journal of Research in Marketing, 18(4), 341–355.
Evanschitzky, H., Eisend, M., Calantone, R. J., & Jiang, Y. (2012). Success factors of product
innovation: An updated meta-analysis. Journal of Product Innovation Management,
29(S1), 21–37.
Farley, J. U. & Lehmann, D. R. (1986). Meta-Analysis in Marketing: Generalization of
Response Models. Lexington, MA: Lexington Books.
Farley, J. U., Lehmann, D. R., & Sawyer, A. (1995). Empirical marketing generalization
using meta-analysis. Marketing Science, 14(3, Supplement), G36–G46.
Farley, J. U., Lehmann, D. R., & Ryan, M. J. (1981). Generalizing from “imperfect” replica-
tion. Journal of Business, 54(4), 597–610.
Farley, J. U., Lehmann, D. R., & Mann, L. H. (1998). Designing the next study for maximum
impact. Journal of Marketing Research, 35(4), 496–501.
Farley, J. U., Lehmann, D. R. & Ryan, M. J. (1982). Pattern in Parameters of Buyer
Behavior Models: Generalization from Sparse Replication. Marketing Science, 1 (Spring),
181–204.
Franke, G. R., & Park, J. E. (2006). Salesperson adaptive selling behavior and customer
orientation: a meta-analysis. Journal of Marketing Research, 43(4), 693–702.

Gelbrich, K. & Roschk, H. (2011). Do complainants appreciate overcompensation? A meta-
analysis on the effect of simple compensation vs. overcompensation on post-complaint
satisfaction. Marketing Letters, 22(1), 31–47.
Geyskens, I., Steenkamp, J. B. E., & Kumar, N. (1998). Generalizations about trust in
marketing channel relationships using meta-analysis. International Journal of Research in
marketing, 15(3), 223–248.
Geyskens, I., Steenkamp, J. B. E., & Kumar, N. (1999). A meta-analysis of satisfaction in
marketing channel relationships. Journal of Marketing Research, 223–238.
Glass, G. V., McGaw, B., & Smith, M. L. (1981). Meta-Analysis in Social Research. Beverly
Hills, CA: Sage Publications.
Grewal, D., Kavanoor, S., Fern, E. F., Costley, C., & Barnes, J. (1997). Comparative versus
noncomparative advertising: a meta-analysis. Journal of Marketing, 61(4), 1–15.
Hanssens, D. M., ed. (2015). Empirical Generalizations about Marketing Impact. Cambridge,
MA: Marketing Science Institute.
Heath, T. B. & Chatterjee, S. (1995). Asymmetric decoy effects on lower-quality versus
higher-quality brands: Meta-analytic and experimental evidence. Journal of Consumer
Research, 268–284.
Hedges, L. V. & Olkin, I. (1985). Statistical Methods for Meta-Analysis. San Diego, CA:
Academic Press.
Henard, D. H. & Szymanski, D. M. (2001). Why some new products are more successful than
others. Journal of Marketing Research, 38(3), 362–375.
Higgins, J. P. T. & Thompson, S.G. (2002). Quantifying heterogeneity in a meta-analysis.
Statistics in Medicine, 21, 1539–1558.
Hite, R. E., & Fraser, C. (1988). Meta-Analyses of Attitudes toward Advertising by
Professionals. Journal of Marketing, 52(3), 95–103.
Holden, S. S. & Zlatevska, N. (2015). The partitioning paradox: The big bite around small
packages. International Journal of Research in Marketing, 32(2), 230–233.
Homburg, C., Klarmann, M., Reimann, M., & Schilke, O. (2012). What drives key inform-
ant accuracy? Journal of Marketing Research, 49(4), 594–608.
Huedo-Medina, T. B., Sanchez-Meca, J., Marin-Martinez, F., & Botella, J. (2006). Assessing
Heterogeneity in Meta-Analysis: Q Statistic or I^2 Index? Psychological Methods, 11(2),
193–206.
Hunter, J. E. & Schmidt, F. L. (2000). Fixed effects vs. random effects meta-analysis models:
Implications for cumulative research knowledge. International Journal of Selection and
Assessment 8, 275–292.
Hunter, J. E. & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in
research findings (2nd ed.). Newbury Park: Sage.
Janiszewski, C., Noel, H., & Sawyer, A. G. (2003). A meta-analysis of the spacing effect in
verbal learning: Implications for research on advertising repetition and consumer memory.
Journal of Consumer Research, 30(1), 138–149.
Keller, P. A. & Lehmann, D. R. (2008). Designing effective health communications: a meta-
analysis. Journal of Public Policy & Marketing, 27(2), 117–130.
Kepes, S., McDaniel, M. A., Brannick, M. T., & Banks, G. C. (2013). Meta-analytic reviews
in the organizational sciences: Two meta-analytic schools on the way to MARS (the Meta-
Analytic Reporting Standards). Journal of Business and Psychology, 28(2), 123–143.
Kirca, A. H., Jayachandran, S., & Bearden, W. O. (2005). Market orientation: A meta-
analytic review and assessment of its antecedents and impact on performance. Journal of
Marketing, 69(2), 24–41.
Krasnikov, A. & Jayachandran, S. (2008). The relative impact of marketing, research-and-devel-
opment, and operations capabilities on firm performance. Journal of Marketing, 72(4), 1–11.
Kremer, S. T., Bijmolt, T. H., Leeflang, P. S., & Wieringa, J. E. (2008). Generalizations on
the effectiveness of pharmaceutical promotional expenditures. International Journal of
Research in Marketing, 25(4), 234–246.
Krishna, A., Briesch, R., Lehmann, D. R., & Yuan, H. (2002). A meta-analysis of the impact
of price presentation on perceived savings. Journal of Retailing, 78(2), 101–118.

Leone, R. P. & Schultz, R. L. (1980). A Study of Marketing Generalizations. Journal of
Marketing, 44 (Winter), 10–18.
Lodish, L. M., Abraham, M., Kalmenson, S., Livelsberger, J., Lubetkin, B., Richardson, B.,
& Stevens, M. E. (1995). How TV advertising works: A meta-analysis of 389 real world
split cable TV advertising experiments. Journal of Marketing Research, 32(2), 125–139.
Lynch, J. G., Bradlow, E. T., Huber, J. C., & Lehmann, D. R. (2015). Reflections on the
replication corner: In praise of conceptual replications. International Journal of Research
in Marketing, 32(4), 333–342.
Montoya-Weiss, M. M. & Calantone, R. (1994). Determinants of new product performance:
a review and meta-analysis. Journal of Product Innovation Management, 11(5), 397–417.
Nijs, V. R., Dekimpe, M. G., Steenkamp, J. B. E., & Hanssens, D. M. (2001). The category-
demand effects of price promotions. Marketing Science, 20(1), 1–22.
Noseworthy, T. J., & Trudel, R. (2011). Looks interesting, but what does it do? Evaluation of
incongruent product form depends on positioning. Journal of Marketing Research, 48(6),
1008–1019.
Palmatier, R. W., Dant, R. P., Grewal, D., & Evans, K. R. (2006). Factors influencing the effec-
tiveness of relationship marketing: a meta-analysis. Journal of Marketing, 70(4), 136–153.
Peterson, R. A., Albaum, G., & Beltramini, R. F. (1985). A Meta-Analysis of Effect Size in
Consumer Behavior Experiments. Journal of Consumer Research, 12 (June), 97–103.
Peterson, R. A. (2001). On the use of college students in social science research: Insights from
a second-order meta-analysis. Journal of Consumer Research, 28(3), 450–461.
Rao, A. R. & Monroe, K. B. (1989). The effect of price, brand name, and store name on
buyers’ perceptions of product quality: An integrative review. Journal of Marketing
Research, 351–357.
Rosenthal, R. (1991). Meta-analytic procedures for social research (vol. 6). Thousand Oaks,
CA: Sage.
Rosenthal, R. (1979). The "File Drawer Problem" and Tolerance for Null Results.
Psychological Bulletin, 86(3), 638–641.
Rubera, G. & Kirca, A. H. (2012). Firm innovativeness and its performance outcomes: A
meta-analytic review and theoretical integration. Journal of Marketing, 76(3), 130–147.
Rust, R. T., Lehmann, D. R., & Farley, J. U. (1990). Estimating Publication Bias in Meta-
Analysis. Journal of Marketing Research, 27(May), 220–227.
Scheibehenne, B., Greifeneder, R., & Todd, P. M. (2010). Can there ever be too many
options? A meta-analytic review of choice overload. Journal of Consumer Research, 37(3),
409–425.
Schmidt, F. L. (1992). What Do Data Really Mean? Research Findings, Meta-Analysis, and
Cumulative Knowledge in Psychology. American Psychologist, 47 (October), 1173–1181.
Sethuraman, R. (1995). A meta-analysis of national brand and store brand cross-promo-
tional price elasticities. Marketing Letters, 6(4), 275–286.
Sethuraman, R., Srinivasan, V., & Kim, D. (1999). Asymmetric and neighborhood cross-
price effects: Some empirical generalizations. Marketing Science, 18(1), 23–41.
Sethuraman, R., Tellis, G. J., & Briesch, R. A. (2011). How well does advertising work?
Generalizations from meta-analysis of brand advertising elasticities. Journal of Marketing
Research, 48(3), 457–471.
Sheppard, B. H., Hartwick, J., & Warshaw, P. R. (1988). The Theory of Reasoned Action:
A Meta-Analysis of Past Research with Recommendations for Modifications and Future
Research. Journal of Consumer Research, 15(3), 325–343.
Sultan, F., Farley, J. U., & Lehmann, D. R. (1990). A Meta-Analysis of Applications of
Diffusion Models. Journal of Marketing Research, 27 (February), 70–77.
Szymanski, D. M. & Henard, D. H. (2001). Customer satisfaction: A meta-analysis of the
empirical evidence. Journal of the Academy of Marketing Science, 29(1), 16–35.
Szymanski, D. M., Troy, L. C., & Bharadwaj, S. G. (1995). Order of entry and business per-
formance: An empirical synthesis and reexamination. Journal of Marketing, 17–33.
Tellis, G. J. (1988). The Price Elasticity of Selective Demand: A Meta-Analysis of Econometric
Models of Sales. Journal of Marketing Research, 25 (November), 331–342.


Troy, L. C., Hirunyawipada, T., & Paswan, A. K. (2008). Cross-functional integration and
new product success: an empirical investigation of the findings. Journal of Marketing,
72(6), 132–146.
Vakratsas, D. & Ambler, T. (1999). How advertising works: what do we really know? Journal
of Marketing, 63(1), 26–43.
Van den Bulte, C. & Stremersch, S. (2004). Social contagion and income heterogeneity in new
product diffusion: A meta-analytic test. Marketing Science, 23(4), 530–544.
van Laer, T., de Ruyter, K., Visconti, L. M., & Wetzels, M. (2014). The Extended
Transportation-Imagery Model: A Meta-Analysis of the Antecedents and Consequences
of Consumers’ Narrative Transportation. Journal of Consumer Research, 40(5), 797–817.
Völckner, F. & Hofmann, J. (2007). The price-perceived quality relationship: A meta-ana-
lytic review and assessment of its determinants. Marketing Letters, 18(3), 181–196.
Witte, K. & Allen, M. (2000). A meta-analysis of fear appeals: Implications for effective
public health campaigns. Health Education & Behavior, 27(5), 591–615.
You, Y., Vadakkepatt, G. G., & Joshi, A. M. (2015). A meta-analysis of electronic word-of-
mouth elasticity. Journal of Marketing, 79(2), 19–39.
Zablah, A. R., Franke, G. R., Brown, T. J., & Bartholomew, D. E. (2012). How and when
does customer orientation influence frontline employee job outcomes? A meta-analytic
evaluation. Journal of Marketing, 76(3), 21–40.
Zlatevska, N., Dubelaar, C., & Holden, S. S. (2014). Sizing up the effect of portion size on
consumption: a meta-analytic review. Journal of Marketing, 78(3), 140–154.



14.  Marketing optimization methods
Murali K. Mantrala and Vamsi K. Kanuri

In this era of big data and growing attention to 'marketing analytics',
there is much buzz about 'optimization' and 'Marketing Optimization'
today in the trade press, in marketing analytics software vendors’ and
consultants’ offerings, email campaigns, ‘white papers’, webinars, and
blogs – especially those related to ‘lead generation’, ‘customer engage-
ment’, ‘multichannel marketing’, ‘social media optimization’ or SMO
and ‘digital marketing’ (e.g., Chaffey and Patron 2012). However, most
of these discussions tend to concentrate on what one industry consultant,
Decision Analyst, calls ‘micro’ optimization issues (Thomas 2006). These
are typically narrowly focused steps that can be taken to improve various
facets of outbound marketing campaigns or programs – e.g., ‘marketing
automation’, generation, tracking and conversions of leads to sales or
clicks in sync with inbound queries or visit behavior of prospects, includ-
ing measurements and lots of ‘testing’, e.g., ‘A/B data testing’ of website
features, copy testing, product testing, etc. There seems to be a nebulous
but prevalent idea that if some micro measure or facet of market response
to a marketing effort improves then the whole campaign becomes more
profitable and/or improves the firm’s achievement of its overall objectives
(Edelman 2010). This, unfortunately, may not be the case, once both the
real and opportunity costs of the marketing actions are accounted for, and
the overall objective is more precisely defined.
Further, most of these 'micro' marketing optimization activities, which
focus on improving or refining steps in data collection, measurement, and
analysis, really belong to what can be termed the empirical 'Estimation
Phase' of an effort to improve marketing performance. However, once the
measurements have been made, marketing management still has to decide
how products and services should be configured and priced, marketing
budgets set and resources allocated across various uses and marketing
instruments. It is these types of questions and decisions that really con-
stitute the true or ‘macro’ ‘Optimization Phase’ of marketing strategy or
campaign improvement, and the way they are resolved can greatly impact
a firm’s overall profitability and performance. This chapter is concerned
with such Optimization Phase issues.
More precisely, this chapter’s subject, marketing optimization meth-
ods, encompasses the models, approaches, and techniques involved in

324

MIZIK_9781784716745_t.indd 324 14/02/2018 16:38


Marketing optimization methods  ­325

­ etermining and understanding the optimal solution to a marketing


d
decision problem. In our discussion of these issues, we will usually assume
some underlying predictive model of how the marketing output of interest
responds to changes in the relevant marketing input(s) is in place from the
Estimation Phase. Based on this assumption, we focus on modeling of a
marketing optimization decision problem, the derivation of the optimal
solution, and analytical insights into this solution’s properties as well as
the effects of deviations from the optimal actions by way of simulations
and ‘what-if’ analyses.
By emphasizing the nuances of macro or aggregate Marketing
Optimization Methods in this chapter, it is not our intention to minimize
the critical importance of the predictive model estimation phase in the
applications of these methods. However, it is quite clear that estimation
issues tend to get more attention and dominate optimization questions in
both marketing analytics research and practice. Consider, for example,
the continuing use of various short-cuts and ‘rules of thumb’ rather than
more economically sound analytical optimization principles in making
key marketing decisions such as setting budgets and product prices
(Doctorow et al. 2009) even as the use of more sophisticated data mining
and measurements has grown. This suggests a failure to grasp that richer
data and better measurements are of little value if the decisions based on
them are made poorly in the end. Further, some decision rules that seem to
work well in some situations or time periods may severely reduce realized
profits, waste resources, or leave much money on the table in others. The
fact is that careful optimization matters as much as good estimation in the
pursuit of maximal profit, and this reality must not be lost amidst all the
current excitement about exploiting 'big data' and 'data analytics'.
Consequently, this chapter seeks to provide a better understanding of
macro marketing optimization methods to interested marketing analysts
and high-level marketing decision-makers. We endeavor to accomplish
this objective via a survey of the methods, advances, and insights from
research and applications pertaining to Marketing Optimization Methods
by both academic and industry scholars over the last 70 years, with more
attention to several illustrative models within the last 20 years. Before
describing the content of this chapter in more detail, however, it is useful
to briefly review some basic definitions and history of the development of
Marketing Optimization Methods.
The Oxford dictionary’s meaning of ‘optimization’ is the ‘action of
making the best or most effective use of a situation or resource.’ The word
‘optimum’ is of Latin origin and means ‘the ultimate ideal’ or ‘the best of
something’. Thus, we can say optimization is concerned with determining
the action in some situation, with respect to some resource/s or activity/s

MIZIK_9781784716745_t.indd 325 14/02/2018 16:38


326   Handbook of marketing analytics

or instrument/s, i.e., choice or decision variables, that finds or achieves the


optimum (the ‘maximum’ or ‘minimum’) with respect to some specified
objective metric (e.g., profit). In general, an optimization problem is the
problem of finding the best solution from all feasible solutions. Given
quantitative inputs and outputs, the formulation of the problem is usually
mathematical in nature.
The typical mathematical optimization problem has the following
components. The objective is usually maximizing or minimizing a real
function over a feasible region, i.e., the 'objective function'
that relates action/s with respect to the decision variables to the objective
metric of interest. The objective function embeds a predictive response
function of some form, relating inputs to immediate outcomes like ‘sales’
along with other components such as the costs of the inputs. It is optimized
by systematically choosing decision variable values from within a feasible
set of alternatives (i.e., decisions are usually subject to some constraints)
and computing the realized values of the objective function corresponding
to each of these input values. Subsequently, depending on the nature of
the decision variables, objective function, and constraints, there are a
variety of optimization ‘approaches’ available in the operations research
discipline to solve the problem including, e.g., linear programming (LP),
nonlinear programming (NLP), integer programming (IP), mixed integer
linear and nonlinear programming (MILP and MINLP), dynamic pro-
gramming (DP) and stochastic programming (SP). Within each approach,
there are a variety of techniques (‘algorithms’ or ‘heuristics’) for perform-
ing the computations involved, e.g., ‘simplex’ in LP, Newton’s method or
gradient descent method in NLP, simulated annealing in IP, ‘backward
induction’ in DP, genetic algorithms in SP and so on.
To summarize, in this chapter, we take Marketing Optimization
Methods as (1) encompassing the formulations or ‘models’ of macro
optimization decision problems, primarily comprised of ‘sales response
models’, objective functions, and constraints, that involve actions and
outputs which are of interest to marketers, (2) the optimality conditions
that should be satisfied by the decisions, as well as (3) the ‘approaches’ and
‘techniques’ used to determine the optimal solutions.
Marketing Optimization Methods have been an essential feature of
marketing decision model building and have played a prominent role
in spawning the field of Marketing Science in the last century. Indeed,
many of the early advances in the field tackled real marketing decision
problems that emphasized optimization more than estimation, e.g., Little
and Lodish (1969), Lodish (1971), Montgomery and Silk (1972) and
Urban (1969). This is because, at its core, marketing management is
about optimization with regard to the creation of offerings of products
(or services), product lines and assortments, the pricing of these offer-
ings, and the investment and allocation of resources towards activities
such as advertising and promotion, personal selling, distribution and
display involved in marketing these offerings. However, the emphasis
on optimization methods in Marketing Science clearly declined between
1980 and 2010. One indicator is that the term ‘optimization’ does not
figure among the top 20 most popular keywords associated with articles
in the leading journal Marketing Science since the beginning of the 1980s
(Mela et al. 2013). Rather, marketing scientists’ attention clearly shifted
to the empirical ‘estimation’ aspects of marketing problems as indicated
by keywords like ‘choice models’, ‘econometric models’, ‘forecasting’,
‘conjoint analysis’, ‘hierarchical Bayes’, and ‘Bayesian analysis’ in the top
20 keyword list of Mela et al. (2013).
But there are signs that research on marketing optimization methods
has been making a comeback since the beginning of the new millennium, with
the proliferation of new marketing technologies, channels, media, mar-
kets and competitors even as marketing budgets and resources remain
constrained. Clearly, marketers have many more options and factors to
consider and trade off in marketing decision-making with respect to prices
and limited resources. That is, marketing optimization problems facing
marketers have been rapidly multiplying in recent years, calling for greater
expertise in this domain for marketing success. Therefore, it is hoped that
this chapter’s review of classical as well as new marketing optimization
problems and solutions contributes to improving knowledge and stimulat-
ing research in this area.
In the next section, we begin with two important typologies of market-
ing optimization problems around which we organize the rest of the
content in this chapter. The first typology is a classification of these
problems according to the number (‘single’ or ‘multiple’) of ‘sales entities’
and marketing input variables involved in the problem. In the second
typology, we classify optimization problems according to the nature of the
objective function (e.g., static or dynamic) involved.

Typologies of Marketing Optimization Problems

By Number of Sales Entities and Marketing Input Variables

Marketing optimization problems typically involve 'sales entities' that
generate the outputs of interest, e.g., 'sales', when acted upon or impacted
by the optimization decision variables. Sales entities in an optimization
problem could be single, e.g., the firm’s entire ‘market’, or multiple, e.g.,
customer segments of this market, geographic areas in this market, prod-
ucts or services being marketed or time periods or intervals of a planning
horizon. Lastly, sales entities can be individual customers and households
or more aggregate groupings of customers, e.g., market segments or
markets. The distinguishing feature of any sales entity is that it is charac-
terized by a sales response function relating the marketing input/s directed
at it and the outcome/s of interest from it (typically taken to be ‘sales’
units, e.g., number or dollar value of customers or orders or physical units
of a product sold, unless otherwise stated). Sales response function is syn-
onymous with ‘demand function’, especially when the decision variable of
interest is price.
Next, a problem may involve single or multiple inputs. They typically
are one or more of the famous ‘4 Ps’ of the marketing mix – product,
price, promotion, and place (distribution). Here it is useful to distin-
guish between three types of common marketing inputs, namely product
'attribute', 'price', or 'resource'. In general, an attribute, e.g., 'convenience'
or 'durability', is a feature of a product that has one or more 'levels', and
the decision-maker can choose to include one or another level of the
attribute in the product. Naturally, the inclusion or exclusion of the
attribute-level will impact customer demand for the product as would
be represented by its demand function – which could be specified at the
individual- or more aggregate-level. Similarly, price is the payment per
unit of a good or service that is expected or required by the supplier. Price,
price discounts, markdowns (magnitudes and/or timing), shipping fee
reductions, are all price-related marketing decision variables. Notably,
price can also be viewed as a product attribute whose level will affect the
demand for the product. However, the price level is ‘special’ because it also
appears in the product margin per unit and, therefore, will have a second
effect on the level of profit made by the decision-maker. Because it appears
twice in a profit-focused decision-maker’s objective function – once in the
demand function, and a second time in the gross margin per unit demand –
in a multiplicative way, the decision-maker’s profit outcomes are typically
very sensitive to price changes.
Lastly, a resource is a source or supply from which benefit is produced.
Typically resources are stocks or supplies of materials, efforts, time,
services, staff, knowledge, or other assets that are transformed to produce
benefit and in the process may be consumed or made unavailable. Thus,
resources have utility, limited availability and can be depleted. A resource,
however, has a cost or monetary expenditure related to it that will enter
the objective function of the optimization problem. In marketing, the
common resources of interest include advertising and direct marketing
expenditures, personal selling effort, trade promotion investments,
distribution channel investments, numbers of distribution outlets,
salespeople, service staff, and shelf space. We shall distinguish between price
and resource decision variables when discussing illustrations of various
optimization problems (see Tables 14.1a and 14.1b) in the next sections.

By Type of Objective and Objective Function

There is a plethora of marketing output variables or metrics that could
be the 'objective' of interest in a marketing optimization problem. Some
common objective metrics at the firm-level include sales, shares, rev-
enues, gross and net profits, customer equity (e.g., Berger and Nasr 1998;
Blattberg and Deighton 1996) and at the individual-level include agents’
utilities and customer lifetime values. Typically, the optimal solution to
an optimization problem will change as the objective metric changes, even
as the sales entities, decision variables, and response functions involved
in the problem stay the same. For example, in general, the solution to a
constrained resource allocation problem changes when the objective is
changed from maximizing revenues to maximizing net profit contribu-
tion; or maximizing expected profit versus expected value that includes
outcome uncertainty.
Three other, and perhaps more fundamental, dimensions along which the
objective functions of marketing optimization problems vary are the following:

Static versus dynamic objective functions


A ‘static’ problem is one in which the decision-maker is interested in
choosing the levels of marketing input/s to maximize the objective metric
of interest, say profit, for just the short-term (current) time period.
Alternatively, if the decision-maker recognizes and accounts for the ‘car-
ryover effect’ of his/her decision beyond the current period, then s/he has a
dynamic or long-term objective function. The latter may be defined over a
finite or infinite time horizon. Most real-world decision makers have finite
time horizons. Analytically, however, it may be simpler at times to solve an
infinite horizon problem to derive insights into the nature of the optimal
decision. Further, time itself can be viewed as a discrete or continuous
variable. Continuous-time models are common in theoretical and analyti-
cal research. However, in practice, optimization problems are usually cast
as discrete-time problems because decision-makers think of making and
implementing decisions with regard to resources over discrete periods
of time like hours, days, weeks, months, quarters and years that are the
operational units of time typically used in their planning. However,
continuous-time problems and methods are becoming increasingly important
in practice even as technologies enabling 'real-time' decision-making
proliferate (e.g., Van Ittersum et al. 2013).
Lastly, a problem with a multi-period objective function can still reduce
to a static optimization problem if the decision-maker is interested only in
setting the level of a marketing input for the first period (i.e., a one-time
‘impulse’ decision). The problem can become a dynamic optimization
problem when the decision-maker is interested in not just choosing the
input level once in the beginning of the horizon, but is in fact interested in
determining the optimal sequence of actions (or ‘policy’) with respect to a
marketing input over the whole time horizon – finite or infinite – of inter-
est. Again, not all problems that involve choosing an optimal sequence of
actions are truly dynamic optimization problems. This is because there
actually might be no impact of the decision taken in one time period
or instant of time on the outcome in a future period or instant in time.
Then again, the problem reduces to making a series of static optimization
decisions. For example, if the price I set today has no impact on demand
in the next period because the customers involved in each period are com-
pletely different sets – say each set comes in, buys, and leaves the market
in the same period with no contact or communication with the next set
of customers – then what looked like a dynamic optimization problem
involving selecting a sequence of prices reduces to a series of separable
or independent one-period price decisions. Thus, a truly or inherently
dynamic optimization problem is one that has a dynamic objective func-
tion and the decision-maker’s aim is to choose a sequence of actions to
maximize some objective over a specified time horizon and where there are
intertemporal effects of the decision/s made in each time period or instant
that make them inter-temporally interdependent. (Note, however, that the
optimal solution to a real dynamic optimization problem could be keeping
the price or resource level constant over the planning horizon.)

Deterministic versus stochastic objective functions


Once sales response functions and other inputs involved in the objective
function have been calibrated in the estimation phase, many decision-
makers view the objective function as ‘deterministic’, i.e., the relationship
between the objective function metric and selected chosen input is taken as
certain, and the decision-maker proceeds with the optimization based on
that assumption. However, a more realistic decision-maker may recognize
that response functions are estimated with error and there may be other
random factors in the environment that may impact the realization of the
objective. In other words, the objective function is more likely than not
actually ‘stochastic’ in nature. Now if the decision-maker takes cognizance
of this uncertainty in the outcome, then his/her objective metric of inter-
est will be modified to its 'expected' value (e.g., expected profit, expected
utility etc.) and his/her goal will be to choose the values of the input vari-
ables that optimize his/her expected value or utility objective function.
Depending on the form of this expected value objective function, the
variability (or variance) in the realized objective that is acknowledged by
the decision-maker may still not impact his/her optimal decisions. This
often occurs when uncertainty enters the objective function only in an
additive manner and is independent of the level of the marketing input,
and/or the decision-maker is ‘risk-neutral’, i.e., his/her expected utility
effectively does not give any weight to the variance in response. In all
other situations, the optimal decisions should be impacted by uncertainty
i.e., the optimal solutions in the deterministic versus stochastic cases
are different depending on the decision-maker’s risk attitude (e.g., risk-
neutral or risk-averse) (e.g., Aykac et al. 1989). Interestingly, deterministic
optimization problems tend to dominate in both academic research and
practice – probably because of the analytical tractability of deterministic
problems and/or the complexity in conceptualizing and solving stochastic
optimization problems.
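A small numerical sketch can illustrate the point. Assume multiplicative response noise, s = f(x)·ε with E[ε] = 1, and a mean-variance (risk-averse) objective; the optimal budget then falls below the risk-neutral one. All parameter values below are illustrative.

```python
# A sketch of how risk attitude shifts the optimal budget when response
# noise is multiplicative; parameters are illustrative.
from scipy.optimize import minimize_scalar

m, a, b, sigma2, lam = 2.0, 10.0, 0.5, 0.25, 0.05  # margin, response, risk

def neg_objective(x, risk_averse):
    mean_profit = m * a * x ** b - x
    penalty = lam * (m * a * x ** b) ** 2 * sigma2 if risk_averse else 0.0
    return -(mean_profit - penalty)                 # negate: we minimize

for ra in (False, True):
    res = minimize_scalar(neg_objective, bounds=(1e-6, 500.0), args=(ra,),
                          method="bounded")
    print("risk-averse x*:" if ra else "risk-neutral x*:", round(res.x, 1))
```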

Monopolistic versus competitive situations


In most real-world markets there is competition and, therefore, market-
ing optimization problems should be modeled as competitive decision
problems with the goal of finding equilibrium solutions which repre-
sent the decision-making firm’s or agent’s optimal choice of marketing
input taking into account the best response actions (reactions) of the
competitor/s involved in the problem (e.g., the competitive ‘Nash equilib-
rium'). However, finding equilibrium solutions involves sophisticated game
theory and fairly strict assumptions about the settings and the active com-
petition between rivals in them. Solving such problems is typically very
complicated – especially so when the optimization problems are dynamic
and differential games. Further, not all marketing decisions involve or
impact active outside competitors and many can be reasonably made
assuming competition’s choices of their marketing inputs remain fixed.
The latter types of problems involve ‘monopolistic’ objective functions.
Lastly, it should be noted that the above three dimensions may combine
in different ways in defining the nature of the objective functions: For
example, some problems may involve static and stochastic objective func-
tions while others may involve dynamic and competitive objective func-
tions and still others may be dynamic, stochastic as well as competitive.
Problems involving the latter are naturally the most complex optimization
problems to solve.
There are other criteria that may be applied to classifying marketing
optimization problems and models. However, the above six domains of
problems combined with the three forms of objective functions defined
above suffice to encompass a preponderance of marketing optimization
problems seen in practice. We now proceed to discuss in more depth the
‘methods’ related to modeling and solving selected examples of optimiza-
tion problem types that we have identified above. In each example, we
summarize the problem; the choice variables; the constraints; the sales
response model; the objective function; the solution approach/technique;
and optimization insights/principles from the solution. More specifically,
due to space constraints, we review only a few illustrative and instructive
examples of ‘Static’ and ‘Dynamic’ objective function problems in the
next two Sections. Most of these problems assume deterministic and
monopolistic objective functions but we believe these selections will
suffice to provide marketing analysts with useful insights into important
optimization methods, principles and approaches that they can apply in
fairly stable marketing environments where competitive reactions may
not an immediate concern. Tables 14.1a and 14.1b classify and summarize
several illustrative classic as well as recent examples of static and dynamic
marketing optimization problems using our first typology.

Static Marketing Optimization Problems

Single Resource Single Entity Optimization Problems

The most prevalent and recurring example of this optimization problem
for firms is the periodic, e.g., annual, marketing 'budgeting' problem.
The problem: The basic managerial question here is: 'What should be
the total marketing (or advertising, or personal selling, etc.) budget ($)
invested in a specified market in the current planning period (denoted t)?'
Not surprisingly, this was among the earliest problems to be tackled in
Marketing Science, with the focus on advertising (Dean 1951).
The choice variable: This is the total amount of the specified resource or
budget, denoted x, to be invested in the single decision period.
The sales response model: As already mentioned, this is the relationship
between the output (usually ‘sales’ in physical units denoted s) from the
single entity or market and the marketing input or effort. There is general
agreement that while sales increase with marketing input, a sales-market-
ing effort response function overall exhibits diminishing returns, i.e., is
concave as the marketing effort or resource expended increases. However,
at the time the initial budgeting models were proposed, there was some
debate (e.g., Little 1979) whether the aggregate sales response function is



Table 14.1a  Illustrative static marketing optimization models

Single entity problems
  Single non-price input: Dean (1951). Problem studied: determining the profit-maximizing advertising budget. Optimization approach: marginal analysis.
  Single price input: Monroe and Della Bitta (1978). Problem studied: determining the profit-maximizing price for a new product. Optimization approach: marginal analysis.
  Multiple non-price inputs (IMC problems): Gatignon and Hanssens (1987). Problem studied: determining the profit-maximizing mix of interacting advertising and sales force efforts. Optimization approach: numerical optimization.
  Price and non-price inputs (marketing mix problems): Dorfman and Steiner (1954). Problem studied: determining the profit-maximizing mix of price, advertising, and product quality. Optimization approach: marginal analysis.

Multiple entity problems
  Single non-price input: Lodish (1980). Problem studied: determining the profit-maximizing allocation of sales resources across products and customers. Optimization approach: repetitive incremental analysis solution to a knapsack problem.
  Single price input: Reibstein and Gatignon (1984). Problem studied: determining expected profit-maximizing product line pricing. Optimization approach: numerical optimization.
  Multiple non-price inputs (IMC problems): Mantrala et al. (2007). Problem studied: determining platform firm profit-maximizing distribution, product quality, and sales investments. Optimization approach: analytical (marginal analysis).
  Price and non-price inputs (marketing mix problems): Kanuri et al. (2017). Problem studied: determining platform firm profit-maximizing design and pricing of a menu of subscription plans. Optimization approach: mixed integer nonlinear programming.
Table 14.1b  Illustrative dynamic marketing optimization models

Single entity problems
  Single non-price input: Nerlove and Arrow (1962). Problem studied: determining the discounted cumulative profit-maximizing advertising expenditure policy over an infinite horizon. Optimization approach: calculus of variations.
  Single price input: Nair (2007). Problem studied: determining the price sequence that maximizes the expected discounted value of future profits from a durable good. Optimization approach: dynamic programming.
  Multiple non-price inputs (IMC problems): Naik and Raman (2003). Problem studied: determining the discounted cumulative profit-maximizing mix of TV and print advertising. Optimization approach: deterministic optimal control theory (numerical procedure).
  Price and non-price inputs (marketing mix problems): Naik et al. (2005). Problem studied: determining interactive advertising and price promotion policies maximizing discounted cumulative profit over a finite horizon when facing oligopolistic competition. Optimization approach: specialized 'marketing mix algorithm' allowing for interactions, based on deterministic differential game theory.

Multiple entity problems
  Single non-price input: Aravindakshan et al. (2014). Problem studied: determining the spatiotemporal allocation of the ad budget maximizing the expected discounted value of future profits over an infinite horizon. Optimization method: stochastic optimal control theory.
  Single price input: Bayus (1992). Problem studied: determining the prices of two overlapping generations of products to maximize total discounted profit over the second generation's time horizon. Optimization method: deterministic optimal control theory.
  Multiple non-price inputs (IMC problems): Sridhar et al. (2011). Problem studied: determining platform firm discounted cumulative profit-maximizing investment policies for product quality and sales force over a finite horizon. Optimization method: deterministic optimal control theory.
  Price and non-price inputs (marketing mix problems): Fischer et al. (2011). Problem studied: finding discounted cumulative profit-maximizing pricing and allocations of the marketing budget across a mix of countries, products, and marketing activities. Optimization method: calculus of variations with a Lagrange approach.
[Figure 14.1 appears here: four panels plotting sales against marketing effort for common response specifications: Diminishing Returns (Concave), Quadratic, S-Shaped, and Semi-log.]

Figure 14.1  Common specifications of sales response models

concave over the entire range of effort or is S-shaped, i.e., initially convex
exhibiting increasing returns, and then decreasing returns after some level
of effort (known as the ‘inflection point’) (see Figure 14.1). S-shaped func-
tions actually seemed more consistent with marketing managers’ intuitive
beliefs as reflected by many subjective judgment-based or ‘decision
calculus’ measurements, as well as observed practices such as pulsing or
flighting in expending advertising budgets (Little 1979). Figure 14.1 shows
common examples of specifications of concave and S-shaped response
functions. In proposing his early ‘ADBUDG’ specification of the sales
response model, Little (1970) clearly felt it appropriate to allow for both
possibilities and let the data decide.
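To make these functional forms concrete, the sketch below codes two of them: a concave power function and Little's (1970) ADBUDG form, which is S-shaped for exponents above one. All parameter values are assumed for illustration.

```python
# A sketch of two assumed sales response specifications: a concave power
# function and an ADBUDG-style form (S-shaped when c > 1).
import numpy as np

def power_response(x, a=10.0, b=0.5):
    """Concave for 0 < b < 1: diminishing returns throughout."""
    return a * x ** b

def adbudg_response(x, s_min=5.0, s_max=60.0, c=2.0, d=50.0):
    """S-shaped for c > 1: convex at low effort, concave past the inflection."""
    return s_min + (s_max - s_min) * x ** c / (d + x ** c)

x = np.linspace(0.0, 20.0, 5)
print(power_response(x))
print(adbudg_response(x))
```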
Subsequently, however, the bulk of empirical evidence supported the
view that aggregate sales response functions are predominantly concave,
not S-shaped, in form (e.g., Simon and Arndt 1980). This was a very useful
empirical finding for later models and research because concave functions
are not only easier to estimate but also easier to optimize using marginal
analysis methods of convex programming (a special case of NLP). Unless
otherwise stated, we shall hereafter assume that sales response models
are concave in this chapter’s exposition. Mathematically, we express the
concave sales response function, as s = f(x), where f is a continuous, dif-
ferentiable function, with a positive first derivative or slope, f’(x) > 0, and
a negative second derivative, f’’ < 0.
The objective function: The standard assumption is that the outcome of
interest is net profit = dollar margin per unit times sales units less the cost
of resource. Now, if the resource being invested is the monetary equivalent
of some physical units (e.g., number of ads or number of sales reps) then
the cost of the resource is simply the same as the resource expenditure.
However, if physical units are the measure of the resource being allocated,
then the cost of the resource could be a linear or nonlinear function of
these units. In the latter case, the usual assumption is that the resource cost
function is convex in form, i.e., cost per unit of the resource increases as
more units of it are consumed. Hereafter, unless otherwise stated, we shall
assume the resource input choice variable in the optimization problem is
measured in dollars rather than physical units.
Assuming that we are considering a setting where competition is
absent or not active, the monopoly profit objective function can then be
expressed as: π = (p − c) f(x) − x, where p = price per unit and c is unit
production cost. Thus, m = (p − c) is the gross margin or 'contribution'
per unit. In the present discussion, we assume both price and production
cost are held constant.
The constraints: In this problem of determining the optimal budget, the
only constraint is that x ≥ 0. The mathematical statement of the profit-
maximizing resource budget-setting problem is then:

Maximize (x): π = (p − c) f(x) − x = m f(x) − x, subject to x ≥ 0  (14.1)

The solution and optimality conditions: Notably, given the sales response
function is concave, the objective net profit as a function of the input
resource is also concave in form – specifically quadratic or inverted-U
in shape. This allows the use of convex programming (a special case of
NLP) approach to find the optimal budget. More specifically, because
the objective function is concave, we can simply find the maximum by
setting the first derivative of the objective function to zero, implying the
point where the incremental or marginal contribution dollars gained

from an additional unit of effort equals its incremental or marginal cost
(which is $1). In other words, the 'optimality condition' is simply marginal
contribution = marginal cost at the optimum x*, i.e., m f′(x*) = 1.
Note that the optimality condition emerged from the so-called first-
order condition (FOC) of the optimization problem. Because the objective
function is concave, the second-order condition is automatically satisfied,
i.e., the second derivative of the objective function at the optimum budget
is negative, so the optimum is a maximum and not a minimum.
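To make the optimality condition concrete, the following minimal sketch (not from the chapter) assumes an illustrative concave response f(x) = a√x, with hypothetical values for a and m; it solves m f′(x*) = 1 by root-finding and checks the result against the analytic answer x* = (ma/2)².

import numpy as np
from scipy.optimize import brentq

a, m = 4.0, 10.0                                  # hypothetical response scale and unit margin

def marginal_condition(x):
    # m * f'(x) - 1 for f(x) = a * sqrt(x), so f'(x) = a / (2 * sqrt(x))
    return m * a / (2.0 * np.sqrt(x)) - 1.0

x_star = brentq(marginal_condition, 1e-9, 1e9)    # root of the first-order condition
profit = m * a * np.sqrt(x_star) - x_star         # pi = m f(x) - x
print(f"x* = {x_star:.1f}, profit = {profit:.1f}")  # analytic optimum: (m*a/2)**2 = 400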
Insights from the solution: Upon examining the optimality condition,
the first insight is that the optimum budget increases as the gross margin
increases, which makes sense when the goal is to maximize profit. Next,
with a little manipulation, the optimality condition can be expressed in
terms of the marketing resource elasticity, denoted µ, where the elasticity
is defined as the percentage change in the sales output resulting from a one
percent change in resource input. Mathematically, we can denote marketing
elasticity as: µ = (x/f)(∂f/∂x). Then we can express the optimality condition
as follows: the ratio of the optimal budget to its resulting contribution
dollars should equal the marketing elasticity, i.e., µ = x*/(m f(x*)).
The flat maximum principle: A key insight emerges from
performing sensitivity (or 'what-if') analyses with respect to departures
from optimality. Specifically, one can investigate the percentage deviation
from optimized profit corresponding to some percentage deviation upward
or downward from the optimum budget. Tull et al. (1986) performed such
analyses assuming in turn one of three types of underlying sales response
models. The major insight from their analysis was that realized profit is
relatively insensitive to fairly wide deviations from the optimum budget.
Specifically, the profit is not more than 2.5 percent lower even with ± 25
percent deviations from the optimum budget (see Figure 14.2). Because
net profit is relatively insensitive while sales increase as effort is increased,
Tull et al. (1986) concluded that overspending errors are not as bother-
some as underspending errors. This finding, known as the flat maximum
principle, may appear reassuring to managers but as we shall show later,
the flat profit maximum response to budget changes can be very mislead-
ing because it can mask some very severe effects of poor optimization in
the realm of underlying budget allocation decisions.
The budgeting problem we have considered above assumes the goal of
the decision-maker is to choose the budget today that maximizes current
period net profit and does not allow for any long-term or carryover effects
of the marketing investment in the current period. However, virtually
all empirical studies have demonstrated that the effect of an impulse of
marketing effort in some time period or data interval, e.g., week, month,
year, is not instantaneous but carries over to subsequent periods. In this

[Figure 14.2 here: sales and profit ($) plotted against the marketing investment level (hours) under optimal allocations and under constant-proportion-of-investment (CPI) allocations, showing the flat profit maximum.]

Source:  Figure obtained from Mantrala et al. (1992, Figure 6).

Figure 14.2  The flat maximum principle

connection, a meta-analysis of the carryover effects estimated by available
empirical studies of various marketing communication instruments has
been conducted by Kohler et al. (2017). The measure of carryover effect
used by these researchers as the dependent variable in their meta-analysis
is the ratio of the carryover effect to the total (current period + carryover
effect) that they call ‘long-term share of the total effect’ (LSTE). This
measure is dimensionless and allows pooling and comparison of LSTE
estimates from diverse forms of response model specifications. Kohler et
al. (2017) find that the mean value of 918 estimates of LSTE from prior
empirical studies of various marketing communication efforts (e.g., mass
media advertising, personal selling, targeted advertising) is about 0.61.
That is, on average, the carryover effect is roughly one and a half times as
large as the short-term effect. Thus, a more far-sighted manager who is
aware of carryover effects
may wish to choose the budget that maximizes long-term net profit rather
than short-term profit.
The most common formulation to incorporate carryover effects in the
sales response model is the Koyck model, which assumes geometric decay
of the short-term effect of marketing, i.e., st = f(x) + λst−1, where 0 < λ
< 1 is the carryover parameter. If such carryover effects are allowed for
in the sales response model, then the objective function for determining
the long-term profit-maximizing marketing investment or budget in the
current period is modified to:

Maximize (x): π = (p − c) (f(x)/(1 − λ)) − x = m f(x)/(1 − λ) − x, subject to x ≥ 0  (14.2)

where the factor 1/(1 − λ) is known as the marketing multiplier (Simon
1982). The solution to (14.2) must satisfy the optimality condition:
(m/(1 − λ)) f′(x) = 1. It is evident that the current period optimum budget is
significantly larger when the carryover effect is taken into account than
when it is ignored or overlooked.
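A companion sketch (the same illustrative response and assumed parameter values as above, restated so it runs on its own) shows how the marketing multiplier 1/(1 − λ) scales the optimal budget as the carryover parameter λ grows.

import numpy as np
from scipy.optimize import brentq

a, m = 4.0, 10.0                        # as in the budgeting sketch above
for lam in (0.0, 0.3, 0.5):
    # long-run FOC: (m / (1 - lam)) * f'(x) = 1, with f(x) = a * sqrt(x)
    x_opt = brentq(lambda x: (m / (1.0 - lam)) * a / (2.0 * np.sqrt(x)) - 1.0,
                   1e-9, 1e12)
    print(f"lambda = {lam:.1f}: optimal budget = {x_opt:,.0f}")
# the optimum quadruples between lambda = 0 (400) and lambda = 0.5 (1,600),
# because the multiplier enters the closed-form optimum squared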

Single Entity Single Price Optimization Problems

The problem: The fundamental question in this class of problems is: what
is the optimal price to set for a product assuming other marketing vari-
ables are held fixed?
The choice variable in this problem is the price p > 0.
The response or demand function in this case is: s = f(p). The common
assumption for this price response function is that sales decrease as price
increases. Some common specifications for this downward-sloping price
response function are shown in Figure 14.3.
The objective function for this problem is then the gross contribution
= margin times demand as a function of price. Note that price variable
enters in the margin as well as the demand function making the objective

[Figure 14.3 here: sales ($) plotted against price ($) for a linear and a nonlinear downward-sloping price response function.]

Notes:
Linear response function: S = 25 − 2P.
Nonlinear response function: S = e^2.5 P^−1.0.

Figure 14.3  Price response functions

Mathematically, the decision maker's problem is then:

Maximize (p): π = (p − c) f(p), subject to p ≥ 0  (14.3)

The optimal solution p* can be found by taking the first derivative of
(14.3) and setting it equal to zero. This FOC for optimality can then be
qualitatively stated as: the optimal price level is that at which marginal
revenue equals the marginal cost. Mathematically, this can be expressed in
terms of the price elasticity as follows:

(p* − c)/p* = −f(p*)/(p* f′(p*)) = 1/ε  (14.4)

where ε is the price elasticity and the left-hand side (LHS) of (14.4) is
commonly known as the Lerner Index. This index (which is bounded between 0
and 1) is interpreted as a measure of the market power of a monopolist,
and the main insight is that the Lerner index declines in magnitude as the
elasticity increases. That is, the higher the market's price elasticity, the
lower is the firm's market power. Note that (14.4) implies: p* = (ε/(ε − 1))c.

Single Entity Multi-variable Optimization Problems

The problem: Firms typically use multiple marketing variables – prices,
product features, and marketing resources – together to influence demand
in a market. This leads to the classic marketing mix optimization
problem: how should all these marketing inputs be simultaneously set so
as to maximize the net profit of the firm?
The choice variables are the multiple marketing inputs, i.e., price as well
as resources such as advertising (u), personal selling (v), distribution (d),
etc.
The sales response model is then a joint function of the multiple inputs,
e.g., in the case of price and two resource variables, s = f(p, u, v). It is
common to assume that the function f is concave with respect to the
resource variables (holding other variables constant) and downward-
sloping with respect to price (holding other variables constant). The objective
function of this decision problem is then the net profit as a function
of all the marketing inputs, and (assuming only two resources u and v) the
decision-maker's problem can then be expressed as:

Maximize (p, u, v): π = (p − c) f(p, u, v) − u − v, subject to p ≥ 0, u ≥ 0, v ≥ 0  (14.5)

The solution: Dorfman and Steiner (1954) were the first to derive the
conditions for optimal values of the decision variables in this problem.
Continuing with the assumption that the marginal cost is fixed, the three
optimality conditions – one for each of p, u, and v – are derived from setting
the partial derivatives of the objective function with respect to each choice
variable equal to zero, i.e., their FOC:

(p − c) ∂f/∂p + f(p, u, v) = 0  (14.6)

(p − c) ∂f/∂u − 1 = 0  (14.7)

(p − c) ∂f/∂v − 1 = 0  (14.8)
Let us denote the gross margin as a fraction of price by L, i.e.,
L = (p − c)/p; the marginal revenue product of advertising by η, i.e., η = p ∂f/∂u;
the marginal revenue product of personal selling by ϑ, i.e., ϑ = p ∂f/∂v; and
recall that the price elasticity is ε = −(p/f) ∂f/∂p. Then the above first-order
conditions can be compactly and meaningfully summarized in the form of
the famous Dorfman–Steiner (D-S) (1954) conditions for marketing mix
optimality. Specifically, the optimal levels of the marketing mix variables
are those that simultaneously satisfy the following conditions:

1/L = ε = η = ϑ  (14.9)
That is, the optimal levels of the price and resources are those at which
the reciprocal of the gross margin as a fraction of price equals the price
elasticity as well as the marginal revenue products of the marketing
resources. Note that with further manipulation of (14.9), the conditions
for the optimal levels of the resources can be expressed in terms of their
respective elasticities. Specifically, Albers (2000) has provided the follow-
ing two versions of the rule for the optimal ‘marketing resource’ (advertis-
ing or personal selling etc.) budget level:
v*/(m f(v*)) = µ = (v/f)(∂f/∂v)  (14.10)

i.e., [Optimal marketing resource budget/Gross margin revenues (or profit
contribution)] = marketing resource elasticity (µ). Alternatively,

v*/f(v*) = µ/ε  (14.11)

i.e., [Optimal marketing resource budget/Sales Revenue] = marketing
resource elasticity/price elasticity.
The appeal of the D-S conditions is twofold. First, they apply for all sales
response functions that have the properties specified above. Second, they
are readily implementable if elasticities and gross margins are known. That
is, firms can go a long way toward optimizing their marketing mix if they
have, and can use, some approximate estimates of response elasticities
along with their product margin information. As we have already noted,
however, many firms still take recourse to fairly arbitrary decision rules of
thumb – such as percentage-of-past-or-forecasted-sales and 'affordable'
methods – rather than the above optimal rules to set marketing budgets
(Bigne 1995).
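To illustrate how implementable the rules are, the following sketch applies Albers' rule (14.11) with the meta-analytic elasticity benchmarks quoted below in Table 14.2; the revenue figure is a hypothetical assumption, and the resulting budgets are ballpark guides rather than precise optima.

revenue = 50_000_000.0                 # hypothetical annual sales revenue ($)
price_elasticity = 2.62                # Bijmolt et al. (2005), absolute value
benchmarks = {
    "advertising (short-term)": 0.12,  # Sethuraman et al. (2011)
    "advertising (long-term)": 0.24,   # Sethuraman et al. (2011)
    "personal selling": 0.34,          # Albers et al. (2010)
}
for resource, mu in benchmarks.items():
    # rule (14.11): budget / revenue = resource elasticity / price elasticity
    budget = revenue * mu / price_elasticity
    print(f"{resource:>26}: near-optimal budget = ${budget:,.0f}")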
One explanation for the use of such decision rules is that firms find it
challenging to perform the measurements and analytics to empirically
determine marketing variable elasticities for their settings. It would be
surprising if larger firms that take pride in their analytics capability still
have this problem but the challenges are understandable in the cases of
smaller enterprises with less analytical capabilities or when businesses
enter new markets where not much historical data has accumulated
and field experiments are difficult. Interestingly though, there has been
considerable academic work in recent years on meta-analyses of numerous
past studies to provide robust benchmark estimates and empirical gener-
alizations (EGs) with respect to estimates of response elasticities and other
parameters like carryover estimates that could assist marketers making
marketing mix decisions. Some major meta-analyses and their findings are
summarized in Table 14.2. Plugging them into the D-S optimality condi-
tions, these benchmark estimates can be taken advantage of to determine
near optimal mixes in practice. In the next section, we move to multiple
sales entity problems.

Multiple Entity Single Resource Optimization Problems

The problem: There are two versions of these problems – the constrained
budget allocation problem; and the unconstrained budget allocation
problem. In the former, the question is: How should a given budget be
allocated across n different sales entities such as markets, customer seg-
ments, products, etc.? In the unconstrained problem, the optimal alloca-
tions are freely determined and their sum amounts to the optimal budget.
Below we consider the constrained budget allocation problem in a static,
deterministic, and monopoly decision-making setting. Some pioneering
examples of such marketing optimization problems published early on
include:

Table 14.2  Empirical generalizations from most recent meta analyses useful for marketing optimization

Aggregate marketing communication (marcom) effort
- The LSTE(a) of marcom is 0.607. That is, on average, the carryover effect of a marcom effort is 1.545 times the short-term effect. (Kohler et al. 2017)
- The median 90% implied duration interval(b) is 8.75 months. (Kohler et al. 2017)
- The LSTE of pharma products is 0.700 and that of non-pharma products is 0.571. (Kohler et al. 2017)

Mass media advertising
- The LSTE of mass media advertising is 0.523. (Kohler et al. 2017)
- The short-term advertising elasticity is between 0.09 and 0.120. (Sethuraman et al. 2011; Henningsen et al. 2011)
- The long-term advertising elasticity is 0.240. (Sethuraman et al. 2011)

Personal selling
- The current-period personal selling elasticity is 0.340. (Albers et al. 2010)
- The current-period personal selling elasticity in Europe is 0.426 and that in the United States is 0.318. (Albers et al. 2010)
- The LSTE of personal selling is 0.684. (Kohler et al. 2017)

Price
- The price elasticity is −2.62. (Bijmolt et al. 2005)
- Price elasticities are the strongest in the growth stage of product categories, both for durables and for groceries. (Bijmolt et al. 2005)

Targeted advertising (includes online, direct mail, email)
- The LSTE of targeted advertising is 0.642. (Kohler et al. 2017)

Notes:
(a) LSTE is the long-term share of total effect, which is defined as (carryover effect/total effect) = carryover effect/(current period effect + carryover effect).
(b) The '90% duration interval' is the number of periods during which 90% of the expected total or cumulative marcom effort's effect has taken place.

1. Determining the optimal print advertising budget and/or its allocation across different geographic areas or different print media (e.g., Little and Lodish (1969) and Urban (1975));
2. Determining the optimal sales force size and/or its allocation across different territories or products (e.g., Lodish (1980) and Mantrala et al. (1992));
3. Determining the optimal selling time and/or call allocation across different customers (e.g., Lodish (1971) and Montgomery et al. (1971)).

Choice variables: These are the allocations (in either physical units, e.g.,
number of ads, number of sales calls etc. or their monetary equivalents) of
the total resource budget made to each sales entity, xi, for i = 1, . . ., n.
The sales response models: As in the case of the budgeting problem, the
sales response functions characterizing the market entities (e.g., geographic
areas, products and media) competing for the resource lie at the heart of allo-
cation models. Frequently these disaggregate response functions are hetero-
geneous in their parameters if not shapes and can be concave or S-shaped.
Again, however, positive allocations to units are likely to fall in the concave
portions of the sales response curves. Therefore, we shall continue with the
assumption that functions are concave unless otherwise stated.
The objective function: The objective function is the sum of the contribu-
tions from each sales entity. Note that if all the individual entities’ sales
response functions are concave then their sum, i.e., the objective function,
is also a concave function of the allocations. Also, let the margin per unit
be constant in time although it may vary across the sales entities.
The constraints: In the constrained budget allocation problem, the
allocations should be greater than or equal to zero and the sum of the
allocations across the sales entities should be less than or equal to the total
budget B. The manager’s optimization problem then can be stated as:

Maximize (x): a mi fi (xi ) , subject to a xi # B (14.12)

The solution: The problem can be solved using convex programming
and the Lagrange multiplier technique. The optimal allocation solution
should satisfy the following: because all functions are concave, the
budget will be exhausted under optimality, and the optimality conditions
for maximizing the objective, i.e., total contribution, are (1) that the
marginal contributions mi fi′(xi*) of all entities that receive positive allocations
should be equal at these allocations; and (2) that these allocations should
sum to the total budget, i.e., Σ xi* = B. Note, however, that if the size of B is
sufficiently small, one or more entities may receive zero allocations in the
optimal solution.
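As a minimal numerical sketch of problem (14.12), assume three hypothetical entities with concave power-form responses fi(x) = ai x^bi and the margins, scales, and exponents below; SciPy's SLSQP routine then exhausts the budget and (approximately) equalizes the entities' marginal contributions, as the optimality conditions require.

import numpy as np
from scipy.optimize import minimize

m = np.array([5.0, 8.0, 6.0])          # unit margins by entity (assumed)
a = np.array([3.0, 1.5, 2.0])          # response scales (assumed)
b = np.array([0.5, 0.6, 0.4])          # response elasticities, 0 < b < 1
B = 100.0                              # total budget

def neg_total_contribution(x):
    return -np.sum(m * a * np.power(x, b))

res = minimize(
    neg_total_contribution,
    x0=np.full(3, B / 3),              # start from an equal split
    bounds=[(0.0, B)] * 3,
    constraints={"type": "eq", "fun": lambda x: x.sum() - B},
    method="SLSQP",
)
x = res.x
marginals = m * a * b * np.power(x, b - 1.0)   # should be (nearly) equal
print("allocations:", np.round(x, 2))
print("marginal contributions:", np.round(marginals, 3))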

In the unconstrained problem, when the total budget has not been
set, we can simultaneously determine the optimal total budget as well
as its optimal allocations by applying these optimality conditions:
m1 f1′(x1*) = . . . = mn fn′(xn*) = k, i.e., the marginal contributions of all
entities at their optimal allocations should equal the marginal cost (k) of the
resource; and the optimal budget is equal to the sum of these optimal
allocations. Alternatively, the optimal allocations across the sales entities
are those at which the ratio of each pair of allocations equals the
ratio of their corresponding sales response elasticities, and the sum of the
allocations equals the total budget. Qualitatively, the key insight is that the
allocations to the sales entities should be proportionate to their response
elasticities or, more simply, the more responsive entities should receive
higher allocations.
Unfortunately, in practice, allocation decisions are often made by applying
constant-proportion-of-investment (CPI) allocation rules. Examples
include allocation of budgets according to the ratio of entities’ sales
potentials, consumer population sizes etc. The basic problem with such
allocation rules is that they often confuse potentials or market sizes with
responsiveness. Consequently, the optimal allocation ratios considering
responsiveness are often quite different from those of CPI allocations.
Further, under CPI rules all entities receive positive allocations regardless
of the size of the budget and, also, all entities’ allocations increase pro-
portionately as the budget is increased or decreased. However, given sales
response heterogeneity, optimization prescribes that for budgets below a
certain critical size, only some entities should receive positive allocations
while others should get nothing. Moreover, even when the given budget is
greater than the critical budget size, optimal allocations to entities often
increase disproportionately as the budget size is increased. This means
there can be reversals in the ratios of divisions of incremental budgets
among the entities. (Indeed, if the response functions are S-shaped, there
may even be reversals in not just allocation ratios but allocation levels as
well as budget increases.)
Furthermore, we have noted earlier that the ‘flat maximum effect’ can
mitigate to some extent the adverse consequences of budgeting errors.
However, as Mantrala et al. (1992) demonstrate, allocation errors are
usually much more consequential. Specifically, the authors show examples
where allocation errors can lead to such large losses that the flat maximum
principle can be comforting for budgeters only when they can trust or
rely on allocators to make careful and optimal decisions.
studies in the operations research literature have presented algorithms and
procedures for solving the distribution of effort problems when the sales
response functions are concave or S-shaped (see, e.g., Charnes and Cooper


1958; Freeland and Weinberg 1980; Koopman 1953; Sinha and Zoltners
1979).
Before concluding, we wish to highlight another interesting marketing
budget allocation problem that is a variation of equation (14.12), where
the objective metric is customer equity, e.g., Berger and Bechwati (2001)
(see also Blattberg and Deighton 1996; Kumar and George 2007). More
specifically, customer equity is the sum of two customer-level net present
values: the return from customer acquisition spending and the return
from retention spending. Berger and Bechwati express customer equity
as: am − A + a(m − R/r)[r′/(1 − r′)], where a is the acquisition rate (i.e.,
the proportion of solicited prospects acquired) and depends on A, the level
of acquisition spending (i.e., dollars spent per solicited prospect), m is the
margin (in monetary units) on a transaction, and R is the retention spending
per customer per year. Further, r′ = r/(1 + d), where r is the yearly retention
rate (as a proportion) and d is the yearly discount rate appropriate for
marketing investments (again, as a proportion). The acquisition rate and
the retention rate are both modeled as concave (modified exponential)
functions of acquisition spending and retention spending, respectively.
Then the firm's problem is to allocate its promotion budget between
acquisition spending and retention spending so as to maximize its customer
equity, subject to the following constraints: A + (a × R) = B; A ≥ 0, R ≥ 0.
Berger and Bechwati (2001) solve this optimization problem using the
Solver add-in in Excel, which applies an NLP approach based on the
Generalized Reduced Gradient (GRG) technique. Solver proceeds by first
finding a 'feasible' solution, i.e., a solution for which all the constraints are
satisfied. Then, Solver seeks to improve upon the first solution by changing
the decision variable values, moving from one feasible solution to another
until the objective function has reached its maximum or minimum.
More generally, Excel's Solver can handle a wide range of nonlinear
optimization problems with many types of restrictions (Fylstra et al. 1998).
As noted by Albers (2000), the availability of Solver in a readily accessible
spreadsheet software environment has certainly facilitated the formulation
and solution of a wide range of common nonlinear optimization problems
requiring numerical solution techniques that arise in marketing decision-making.

Multiple Entity Multi-variable Optimization Problems

Product line pricing problems


The problem: We now address a very common but complicated
multivariable multiple-entity pricing optimization problem: how should a

firm price a line of interdependent products? Here we shall discuss a fairly


straightforward version of the problem facing a retailer pricing a line of
eggs presented and solved by Reibstein and Gatignon (1984) that is quite
instructive.
The choice variables in the problem are the prices of individual products
in the product line (the products being the multiple entities).
The demand function (sales response to price changes) for each product
is assumed to be a multiplicative (concave downward) function of its own
price and the prices of the other products. This demand function allows for
demand interdependencies as reflected by cross-price elasticities, i.e., the
impact of one product's price on another product's demand (as distinct
from the own-price elasticity that captures the effect of a product's own
price on its own demand). If the price of one product has a positive effect
on the demand of another product, then they are substitutes; if the cross-price
effect is negative, then they are complements. More specifically, consider
just two products with the following demand functions: S1 = a1 p1^b1 p2^b12
and S2 = a2 p1^b21 p2^b2, where p1 and p2 are the prices of the two products,
b1 and b2 are the own-price elasticities of products 1 and 2, and b12 and b21
are the cross-price elasticities.
The objective function is then: π = (p1 − c1)S1 + (p2 − c2)S2, and the
decision-maker's problem is:

Maximize (p1, p2): π = (p1 − c1) a1 p1^b1 p2^b12 + (p2 − c2) a2 p1^b21 p2^b2, subject to pi > 0  (14.13)

The solution to the problem can be found by taking the derivatives of the
objective function with respect to p1 and p2, respectively, and setting the
resulting expressions equal to zero. These two first-order conditions can
then be simultaneously solved to obtain the optimal prices. Upon doing
so, we obtain the following results as indicated by Reibstein and Gatignon
(1984):
p1* = [(b1/(b1 + 1)) c1] − [(b21/(b1 + 1)) (S2*/S1*) (p2* − c2)]  (14.14)

p2* = [(b2/(b2 + 1)) c2] − [(b12/(b2 + 1)) (S1*/S2*) (p1* − c1)]  (14.15)

Note that if the two cross-price elasticities are zero, i.e., the products are
independent, then each product’s optimal price can be found independ-
ently according to equation (14.4).
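Because (14.14) and (14.15) are simultaneous, one simple approach is to iterate them to a fixed point. The sketch below uses hypothetical parameter values for two complementary products (for which the iteration converges); setting the cross-elasticities to zero recovers the independent-product prices of equation (14.4).

a1, a2 = 100.0, 80.0                   # demand scale factors (assumed)
b1, b2 = -3.0, -2.5                    # own-price elasticities
b12, b21 = -0.4, -0.4                  # cross-price elasticities (complements)
c1, c2 = 1.0, 1.2                      # marginal costs

p1, p2 = 1.5, 2.0                      # start at the independent-product prices
for _ in range(200):
    S1 = a1 * p1**b1 * p2**b12
    S2 = a2 * p1**b21 * p2**b2
    p1_new = (b1 / (b1 + 1)) * c1 - (b21 / (b1 + 1)) * (S2 / S1) * (p2 - c2)
    p2_new = (b2 / (b2 + 1)) * c2 - (b12 / (b2 + 1)) * (S1 / S2) * (p1 - c1)
    if abs(p1_new - p1) + abs(p2_new - p2) < 1e-12:
        break
    p1, p2 = p1_new, p2_new
print(f"p1* = {p1:.3f}, p2* = {p2:.3f}")   # below 1.500 and 2.000: complements are priced lower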

The key insights from this solution are that the optimal price for each
product is a function of (1) its own elasticity; (2) its own marginal cost; (3) the
price of the other product; (4) the cross-price elasticity; (5) the scale factors
for each product; (6) the other product's cross-elasticity; and (7) the other
product's marginal cost.

Resource allocation with cross-market network effects


The problem: We now consider a variant of the multivariable-multiple
entity marketing optimization problem where the multiple entities are two
distinct customer groups or sources of revenues of a platform firm in a
‘two-sided market’. For example, a daily newspaper firm obtains revenues
from two sources: (1) paying subscribers (readers) and (2) advertisers
who wish to reach the readers on the other ‘side’ of the platform. The
daily newspaper firm invests in marketing resources separately aimed at
each of its sources, e.g., investments in product quality and distribution
to increase its reader revenue; and investments in sales force effort to
increase its advertising revenues. The problem for the firm is to determine
the optimal levels of each marketing investment that maximize its total
profit from both sides. This problem was discussed and solved in a study
by Mantrala et al. (2007). There are numerous such platform firms in the
marketplace (Evans 2003).
The choice variables in the problem are the investment levels in product
quality, distribution, and sales force efforts.
The sales response model for this problem is novel as it comprises two
interrelated demand functions – one representing reader demand and the
other advertiser demand.
demands potentially are impacted by not only the marketing inputs
aimed at them but also by the level of demand of the other group, i.e., the
presence of cross-market network effects. It is the consideration of such
network effects that marks marketing optimization in two-sided markets
as a novel and special but actually very prevalent category of marketing
analytics problems found in practice today.
More specifically, let (q, d, a) denote dollars invested in quality,
distribution, and advertising sales, respectively. Let S denote the number
of subscribers for the year, m1 = margin ($) per issue (price minus cost),
and k = number of issues subscribed to per year. Then, the number of
subscribers can be represented as S = f1(q,d, R) and advertising revenue
($) as R = f2(a, S) where f1(.) and f2(.) are general diminishing returns
response functions, or concave, as assumed in previous models. Specifically,
∂fi/∂x > 0 and ∂²fi/∂x² ≤ 0 for i = 1, 2 and x = q, d, a, R, S as necessary, with
the system allowing for interrelated demands, i.e., advertising revenue R
directly affects the number of subscribers in the subscription response
function, and

subscriptions directly influence advertising revenue in the advertising
revenue response function.
The constraints are that marketing inputs should be non-negative and
sum to the total marketing budget. However, below, we consider the
unconstrained budget problem (with all other inputs including prices and
marginal costs held fixed) as in Mantrala et al. (2007).
The objective function of the problem, assuming again that the platform
operator's goal is to maximize net profit, is then: π(q, d, a) = m1kS + m2R
− q − d − a, where m1 and m2 are the respective constant gross margins ($)
per subscribed issue and per dollar of advertising revenue. Thus,
the decision-maker's problem is:

Maximize (q, d, a): π = m1kS + m2R − q − d − a, subject to q, d, a ≥ 0  (14.16)

The solution is found as follows: the first-order conditions that maximize
net profit in (14.16) are:

πq = km1 ∂S/∂q + m2 ∂R/∂q − 1 = 0  (14.17)

πd = km1 ∂S/∂d + m2 ∂R/∂d − 1 = 0  (14.18)

πa = km1 ∂S/∂a + m2 ∂R/∂a − 1 = 0  (14.19)

However, the subscribers S = f1(q, d, R) and advertising revenue R =
f2(a, S) constitute a system of two recursive equations. Therefore, to
obtain the optimal investment levels, as shown by Mantrala et al. (2007),
the derivatives in equations (14.17), (14.18) and (14.19) can be decomposed
and rearranged to obtain the following more interpretable version
of the above three first-order conditions:

m1k ∂f1/∂q + m2 (∂f2/∂S)(∂f1/∂q) = δ

m1k ∂f1/∂d + m2 (∂f2/∂S)(∂f1/∂d) = δ  (14.20)

m1k (∂f1/∂R)(∂f2/∂a) + m2 ∂f2/∂a = δ

where δ = 1 − (∂f1/∂R)(∂f2/∂S), which the authors call the 'cross-market
dependency coefficient'.
Insights: The presence of δ ≠ 1 in (14.20) makes these FOCs different
from the standard ones obtained in single-revenue markets (e.g.,
Dorfman and Steiner 1954; Hanssens et al. 2001, 358–361). They are, in
fact, a generalization of the standard D-S results for a two-sided market.
Specifically, note that if either or both of the cross-market effects are zero,
then (14.20) reduces to the standard D-S results (see equations (14.6) and
(14.7) above). Second, four types of markets can be identified based on
the cross-market dependency coefficient: unrelated, partially related,
interrelated with opposing feedback effects, and interrelated with
positive feedback effects (see Mantrala et al. (2007)). Most critically,
managers of platform firms need to know in which type of market they
operate in order to optimize marketing investment decisions.
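As a stylized numerical sketch of problem (14.16) – with all response forms and parameter values assumed – the code below solves the recursive demand system by fixed-point iteration for each candidate (q, d, a), maximizes profit numerically, and reports the implied cross-market dependency coefficient δ.

import numpy as np
from scipy.optimize import minimize

k, m1, m2 = 30.0, 0.4, 0.3               # issues per year and the two gross margins

def demands(q, d, a):
    S, R = 1.0, 1.0
    for _ in range(500):                  # fixed-point iteration (a contraction here)
        S_new = 2.0 * q**0.3 * d**0.2 * (1.0 + R)**0.1    # S = f1(q, d, R)
        R_new = 5.0 * a**0.4 * S_new**0.5                 # R = f2(a, S)
        if abs(S_new - S) + abs(R_new - R) < 1e-10:
            break
        S, R = S_new, R_new
    return S, R

def neg_profit(x):
    q, d, a = x
    S, R = demands(q, d, a)
    return -(m1 * k * S + m2 * R - q - d - a)

res = minimize(neg_profit, x0=[50.0, 50.0, 10.0],
               bounds=[(1e-3, None)] * 3, method="L-BFGS-B")
q, d, a = res.x
S, R = demands(q, d, a)
delta = 1.0 - (0.1 * S / (1.0 + R)) * (0.5 * R / S)       # 1 - (df1/dR)(df2/dS)
print(f"q* = {q:.1f}, d* = {d:.1f}, a* = {a:.1f}, "
      f"profit = {-res.fun:.1f}, delta = {delta:.3f}")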

Multi-format product line and pricing problem with cross-market network effects
The problem: Kanuri et al. (2017) present and solve a version of this
problem facing a newspaper platform firm such as the one in the last
example. Specifically, the newspaper wishes to simultaneously determine
the configuration and pricing of a menu (i.e., product line) of multi-
format (i.e., print and digital) subscription plans to offer its heterogeneous
market of readers that maximizes its total profit from both subscribers and
advertisers.
The choice variables for the analyst are the price and configuration of
each plan in the menu.
The response models: To solve the platform’s menu design problem,
the analyst would need: (1) segment-level reader preferences and
willingness to pay (WTP) for various multi-format subscription plans;
and (2) estimates of inter-related reader and advertiser demand func-
tion elasticities by format. Because there are several print and digital
subscription bundles that are new to the market, the analyst would
not have a priori knowledge about preferences and WTP for all multi-
format plans. Therefore, the authors propose a logit model of plan
preferences to model the probability that reader i chooses plan g in
choice set q:

Prigq = exp(x′gq bix + pgq bip + z′gq biz) / [ Σg′=1..G exp(x′g′q bix + pg′q bip + z′g′q biz) + exp(ai) ], ∀ i ∈ I and ∀ q ∈ Q  (14.21)

where,
xgq = a vector of 1s and 0s representing multi-format versions available in plan g
and choice set q
pgq = weekly subscription price of plan g in choice set q
zgq = a vector representing interactions between the formats
bix = a vector of parameter coefficients (partworths) corresponding to format-
version x for reader i
bip = parameter coefficient (partworth) of price p for reader i
biz = a vector of parameter coefficients (partworths) corresponding to the
interactions between the formats
eigq = random component of reader i’s utility
ai = constant term representing the utility of the no-choice option for reader i.
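Once part-worths are in hand, the plan-choice probabilities in (14.21) are straightforward to compute. Below is a small sketch with hypothetical part-worths for a three-plan choice set (print-only, digital-only, and a print-digital bundle) plus the no-choice option; the function name and all values are illustrative.

import numpy as np

def plan_choice_probs(X, p, Z, b_x, b_p, b_z, alpha_i):
    """Choice probabilities for one reader i over the G plans in a choice set.
    X: (G, K) format dummies; p: (G,) prices; Z: (G, L) format interactions."""
    utilities = X @ b_x + p * b_p + Z @ b_z
    expu = np.exp(utilities)
    denom = expu.sum() + np.exp(alpha_i)          # include the no-choice option
    return expu / denom, np.exp(alpha_i) / denom

X = np.array([[1, 0], [0, 1], [1, 1]])            # print-only, digital-only, bundle
Z = np.array([[0], [0], [1]])                     # print x digital interaction
p = np.array([3.0, 2.0, 4.0])                     # weekly prices (assumed)
b_x, b_p, b_z, alpha_i = np.array([1.2, 0.8]), -0.5, np.array([0.4]), 0.2
probs, p_none = plan_choice_probs(X, p, Z, b_x, b_p, b_z, alpha_i)
print("plan probabilities:", np.round(probs, 3), "no-choice:", round(p_none, 3))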

Subsequently, the analyst can use the preference data to measure the
WTP for each multi-format plan using Kohli and Mahajan's (1991)
piecewise linear approach: Uij|−p + Ui(p) ≥ ai + ẽ, where Uij|−p represents
the total utility of the plan configuration j excluding reader i's utility of
price, Ui(p) is the utility of a price point p, ai is the reader's utility of the
status quo or the no-choice option, and ẽ is an arbitrary positive number
used to round the price p.
Next, the analyst can use the following 4-equation simultaneous
response model system to obtain reader and advertiser demand function
elasticities by print and digital formats:

 (14.22)

where,
PAt,OAt Print and digital advertising demand at time period t
PRt,ORt Print and digital reader demand at time period t
PAMM,OAMM Marketing investments that affect print and digital advertiser
demand
PRMM,ORMM Marketing investments that affect print and digital reader
demand
PMP,OMP Number of potential print and digital readers in the NDMA

Note that this system extends the one noted in the previous example
(i.e., by Mantrala et al. (2007)) to multiple formats.
The objective function: The primary objective of the newspaper is
to maximize profits from readers and advertisers, which can be expressed
as:

Maximize (B, p): π = Σj Bj (PFj) + (PAt + OAt) × Ma  (14.23)

s.t. dkj = 1 if Skj ≥ maxi∈J Ski and Skj ≥ 0, and dkj = 0 otherwise, ∀ j, k  (14.24)

Skj = RPkj − Pj ∀ j, k  (14.25)

Σj∈J dkj ≥ 1 ∀ k  (14.26)

where, Bj indicates whether or not the newspaper is offering the jth sub-
scription plan, PFj is the subscription profit, PA, OA are the forecasted
print and digital advertising revenues, Ma is the margin on print and
digital advertising revenue, Skj and RPkj are the consumer surplus and res-
ervation prices and Pj is the price of the subscription plan j.
The constraints: While maximizing the objective function, the analyst
needs to account for the way in which readers self-select their subscription
plans (equations 14.24–14.26) (Moorthy 1984). In particular, the analyst
needs to account for the fact that a reader will select a plan only if: (1) the
surplus she derives from subscribing to plan j is strictly positive, and (2)
the surplus she derives from plan j is greater than the surplus she derives
from all the other plans offered in the menu.
The solution: Real-world product line design and pricing problems
(with the number of products > 2) generally belong to a class of NP hard
problems and therefore, analytical, closed-form solutions are not feasible.
Moreover, this particular problem presents the analyst with a discrete
combinatorial challenge over an extremely large search space. Therefore,
to obtain a solution in a reasonable amount of time, the authors propose a
novel heuristic-based procedure to obtain profit-maximizing plans.
The heuristic, which resembles a coordinated gradient ascent approach,
assists the newspaper in building its menu by sequentially assigning a
profit-maximizing plan to each segment, subject to plans assigned to prior
segments. The authors implemented their heuristic on real newspaper
data and obtained profit maximizing plans for several newspaper business
models. The key insights are: (1) the optimal product-line composition and

prices are influenced by the customer group that contributes the highest
revenue (i.e., advertisers) even though the product line is for the customer
group that contributes the least revenue (i.e., readers), (2) total profits
are maximized when marketing investments in each market are aimed at
jointly maximizing total profits from the two customer groups (integrated
strategy) rather than aimed at separately maximizing profits from each
customer group (‘siloed’ strategy), and (3) the profit maximizing menu,
under a siloed business model comprises of a partial mixed bundle of print
and digital subscription plans and that under an integrated business model
comprises of a pure bundle of print and digital subscription plans.

Dynamic Optimization Problems

As already explained, truly dynamic optimization problems involve
determining a sequence or policy of marketing actions that optimizes a
long-term objective over some finite or infinite planning horizon. All such
problems are inherently 'multiple sales entity' in nature if we conceptualize
each upcoming period in a discrete-time horizon or instant of time in a
continuous-time horizon as an ‘entity’. Below we consider several illustra-
tions of  dynamic marketing optimization problems that we believe are
instructive.

Dynamic Single Resource Single Entity Optimization Problems

The problem: The dynamic analog of Problem 1a in the static case has been
discussed at length by Sethi (1977). This is the problem of determining or
characterizing an optimal policy for expending a marketing resource, say
advertising, over time (as opposed to the static problem of finding the
one-time optimal advertising budget).
The choice or ‘control’ variable in this problem is advertising expenditure
rate u(t) over time.
The state equation: The dynamic version of the static sales response
model is expressed by the state equation which is typically a differential
equation (in continuous-time) or difference equation (in discrete-time)
with sales as the ‘state’ variable, which evolves over time, under the
influence of the ‘control’ variable, specifically, ad expenditure rate. Two
famous versions of the state equation employed in these models are:

● the Nerlove–Arrow (1962) 'advertising capital' (or goodwill stock) model; and
● the Vidale–Wolfe (1957) direct sales–advertising response model.

More specifically, the Nerlove–Arrow (1962) model assumes that advertising
expenditures affect the present and future demand for the product
and, hence, the present and future net revenue of the firm. Consequently,
advertising can be treated as an investment in building up some sort of
advertising capital or stock of goodwill which, however, depreciates over
time. Nerlove and Arrow (1962) assume that the goodwill stock depreciates
at a constant proportional rate δ. Mathematically, their goodwill state
equation is:

dA/dt = x − δA  (14.27)
where, A(0) = A0 and x = x(t) is current advertising expenditure in dollars.
Sales are then modeled as a function of goodwill, and other variables such
as price that we assume are held constant in this illustration, i.e., S =
S(A,. . .). In contrast, Vidale and Wolfe (1957) bypass the issue of goodwill
and directly model changes in rate of sales of a product as the result of two
effects, (1) response to advertising which acts (via the response constant
a) on the unsold portion of the market, and (2) loss due to forgetting
which acts (via the decay constant b) on the sold portion of the market.
Assuming a saturation level or market potential M, the Vidale–Wolfe state
equation for a monopolistic firm can be expressed as:1

dS S
5 axa1 2 b (14.28)
dt M

The objective function of the monopoly firm in this problem is the
discounted cumulative profit over an infinite horizon. Then the problem
of the firm can be stated as follows: determine the policy x*(t) that
maximizes the discounted cumulative net profit over an infinite horizon,
i.e.,

Maximize (x): J = ∫0^∞ e^(−at) [R̂(x, A, z) − x] dt  (14.29)

subject to the above Nerlove–Arrow state equation, where a is the
discount factor and R̂(x, A, z) = pS(A, . . .) − cS(A, . . .), i.e., the contribution
or gross margin dollars in period t, which is a function of marketing
communication spending x through the accumulated goodwill A, and of
other variables z (e.g., price, discount, etc.).
The solution to the N–A dynamic optimization problem can be obtained
by using the ‘calculus of variations’ (e.g., Kamien and Schwartz 2012) or
Pontryagin’s maximum principle as summarized by Sethi (1977). Sethi
shows the solution (optimal policy) has the form of a ‘bang-bang’ control,

i.e., apply an appropriate impulse of ad expenditure to take the goodwill
to its long-term or steady-state value Ā instantaneously, and then
switch the control to x*(t) = δĀ and stay at this level to sustain the
stock of goodwill at Ā. An important insight from the N–A model is
the dynamic counterpart of the Dorfman and Steiner (1954) theorem.
Specifically, in the long run, the advertising expenditures should be
proportional to sales. The result offers support for the practice by some
companies of setting this period’s advertising expenditure in proportion
to the previous period’s sales. However, companies must get this propor-
tion reasonably right (see Mantrala 2002 for an example). When we shift
to the Vidale–Wolfe state equation, Sethi (1977) shows the solution to the
infinite-horizon problem is a feedback optimal control policy such that
the optimal ad expenditure in the steady state is directly proportional to
the decay constant and inversely proportional to the discount rate and
market potential.
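A small simulation sketch (with assumed parameter values) illustrates this bang-bang logic for the Nerlove–Arrow state equation: spend intensively until goodwill reaches its steady-state target Ā, then maintain it with x = δĀ.

import numpy as np

delta, A_bar, dt, T = 0.2, 100.0, 0.01, 25.0      # decay rate, target, step, horizon
A, t, path = 40.0, 0.0, []
while t < T:
    # impulse approximation: spend heavily until A reaches A_bar, then maintain
    x = 50.0 if A < A_bar else delta * A_bar
    A += (x - delta * A) * dt                     # Euler step of dA/dt = x - delta*A
    path.append(A)
    t += dt
print(f"goodwill after {T:.0f} periods: {path[-1]:.2f} (target {A_bar:.0f})")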

Dynamic Single Entity Single Price Optimization Problems

There are many variants of dynamic price optimization problems – with
and without uncertainty, with and without inventory constraints, for new
versus old products (e.g., Mantrala and Rao 2001; Raman and Chatterjee
1995; Robinson and Lakhani 1975). A classic optimization problem in
this category is the one involving intertemporal price discrimination by
durable goods manufacturers (e.g., television and car manufacturers). We
discuss a discrete-time version of this problem below.
The problem: The key issue confronting a manufacturer producing
durable goods is determining an optimal pricing policy for the good over a
finite or an infinite horizon in the presence of forward looking consumers.
Typically, consumers who buy the durable product in the current period
are not in the market for the same product in subsequent periods. This
gives manufacturers an incentive to adopt a price-skimming strategy,
whereby they sell the product at a premium to consumers who have
high WTP in the initial period and then sell the product at a lower price
to consumers who have lower WTP in subsequent periods. However, a
potential concern is that consumers who have high WTP and are forward-
looking could delay their purchases to avail of the manufacturer’s lower
prices in subsequent periods. If so, how should a durable goods manufac-
turer set its prices optimally?
The choice variables available to the manufacturer are prices in the cur-
rent and future periods.
The response model: To ascertain the optimal pricing schedule, the
manufacturer needs to first model consumer demand for the durable good.

As outlined by Nair (2007), such a demand function can be modeled using
a random-coefficient logit demand system:

urt = ar/(1 − dc)² − br pt + et  (14.30)

where urt is the utility of consumers of type r at time t, ar is the utility
that a consumer derives from the use of the product per period of consumption,
dc is the consumer discount factor, br is the price sensitivity, pt is the
price of the product in time period t, and et is a time-specific error term
that controls for unobservable product characteristics. The utility (ur0t)
of deferring purchase to the future period can then be modeled using the
discounted expected value of waiting until the future period.

ur0t = dc Et[max(ur,t+1, ur0,t+1)] + er0t − ert  (14.31)

Note that a consumer will buy the product in the current period only if
his/her utility from purchase exceeds that of waiting for the future state
(St+1). This can be mathematically represented as:

ar/(1 − dc)² − br pt + et > Wr(St) + er0t − ert  (14.32)

where

Wr(St) = dc ∫ ln[ exp(ar/(1 − dc)² − p(St+1) + et+1) + exp(Wr(St+1)) ] dF(St+1 | St)  (14.33)

The objective function: The firm’s objective is to maximize its profits


over any infinite horizon:

Maximize  (st ,pt) 5 c a r 5 Mrt sr ( st, pt) d ( pt 2c) 


R
(14.34)
1

where, Mrt is the market potential of the customer type ‘r’.


The solution to this infinite horizon profit maximization problem is a
value function that denotes the present discounted profit of current and
future profits when firm sets its prices in both the periods optimally:

V (St) 5 max [ p (St,pt) 1 df3 V (St) dF (et11 0 et) ] (14.35)

and the optimal pricing policy is:

p*(St) = arg max [ V(St) ]  (14.36)

While this dynamic pricing problem can be solved using traditional
game-theoretic techniques for stylized demand models (e.g., Besanko
and Winston 1990), for more realistic demand models, such as the one
employed by Nair (2007), an analyst needs to resort to numerical dynamic
programming methods. The issue of intertemporal price discrimination
has spawned a rich field of literature that offers several interesting insights.
For example, the manufacturer’s optimal policy is indeed to charge a
higher price in the initial period and lower prices in the subsequent period.
However, the optimal price to charge in each period is contingent on
the discount factor of the consumers. Specifically, as the discount factor
increases, the optimal price decreases and the rate of price decline in future
periods decreases (Besanko and Winston 1990). Moreover, manufacturers
can benefit from having forward-looking consumers with low WTP (Su
2007). The rationale behind this counterintuitive finding is that when
forward looking consumers with low WTP defer their decision to future
periods, they end up competing with forward looking consumers with
high WTP, which consequently increases the WTP of consumers with low
valuations.
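The flavor of these results can be reproduced with a deliberately stylized sketch – two consumer types rather than Nair's full random-coefficient model, and committed prices found by grid search rather than dynamic programming; all parameter values are assumptions.

import numpy as np

dc, df = 0.6, 0.9                       # consumer and firm discount factors
types = [(1.0, 0.5), (0.5, 0.5)]        # (valuation, mass) for types H and L
grid = np.round(np.arange(0.05, 1.01, 0.05), 2)

def profit(p1, p2):
    total = 0.0
    for v, mass in types:
        wait = dc * max(v - p2, 0.0)    # discounted surplus from deferring purchase
        if v - p1 >= wait and v >= p1:
            total += mass * p1          # buys in period 1
        elif v >= p2:
            total += df * mass * p2     # buys in period 2
    return total

best = max(((profit(p1, p2), p1, p2) for p1 in grid for p2 in grid))
print(f"p1* = {best[1]:.2f}, p2* = {best[2]:.2f}, profit = {best[0]:.3f}")
# with dc = 0.6 the firm skims (p1* > p2*); raising dc compresses p1* toward p2*

Because the firm here commits to (p1, p2) in advance, the sketch sidesteps the time-consistency issues that make the full problem a dynamic program; with the assumed values the firm skims, and a higher consumer discount factor lowers the initial price, consistent with Besanko and Winston (1990).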

Dynamic Single Entity Multi-variable Optimization Problems

Dynamic single entity multi-variable optimization without time-varying effectiveness
The problem: The focus of this problem is to determine the mix of
­marketing communication activity expenditures over time that maxi-
mizes the cumulative or long-term return from an advertising campaign
assuming an infinite planning horizon and time-invariant resource effec-
tiveness. This problem was addressed by Naik and Raman (2003) and is
in effect the dynamic version of the static multivariable marketing-mix
optimization problem discussed earlier (holding prices and marginal costs
fixed).
The choice or control variables: In the present problem, let there be
physical units of advertising expenditures over time on two different and
distinct communications media – say print advertising and TV advertising
– denoted by ut and vt, respectively.
The state equation: Given a media plan {(ut, vt): t ∈ (1, 2, . . .)}, the
advertiser generates the sales sequence {S1, S2, . . ., St, . . .}. The discrete-time
version of the state equation specified by Naik and Raman (2003) is then:

St = α + β1ut + β2vt + κutvt + λSt−1 + νt  (14.37)

where α is the base sales, (β1, β2) denote unequal independent
effectiveness parameters, κ, the coefficient of the interaction term,
denotes synergy between the two media when κ > 0, and λ is the carryover
coefficient (Koyck form). Hereafter, we switch from a discrete-time to a
continuous-time horizon as this simplifies the analytics and exposition
to some extent. Given the focus on synergy as well as dynamics, the
continuous-time version of the state equation is specified as follows:

dS/dt = limΔt→0 (ΔSt/Δt)  (14.38)

dS/dt = β1u(t) + β2v(t) + κu(t)v(t) − (1 − λ)S  (14.39)

The objective function is now the dynamic version of the static one
in equation (14.5). Specifically, the profit at each instant in time
is given by π(S, u, v) = mS − u² − v², where it is assumed the cost of each
resource is a convex quadratic function of the physical units expended.
The decision-maker's objective is then to choose u and v to maximize
cumulative discounted profit over an infinite horizon:

Maximize J(u, v) = ∫0^∞ e^(−rt) π(S(t), u(t), v(t)) dt  (14.40)

where r denotes the discount rate and J(u, v) is the net present value of
any multimedia policy (u(t), v(t)).
The solution: Naik and Raman (2003) solve the maximization problem
induced by equations (14.39) and (14.40) by applying optimal control theory.
The optimal solutions for the two control variables are the following:

u* = m(β2κm + 2β1(1 + r − λ)) / (4(1 + r − λ)² − κ²m²)  (14.41)

v* = m(β1κm + 2β2(1 + r − λ)) / (4(1 + r − λ)² − κ²m²)  (14.42)

Notably, the above solutions imply constant, evenly spread expenditures in
the two media over time. Second, some of the key insights from these
results, as derived by Naik and Raman (2003), are: as synergy increases,
the advertiser should increase the total budget but decrease (increase) the
proportion of media budget allocated to the more (less) effective com-
munications activity. Furthermore, if the various activities are equally

effective, then the advertiser should allocate the media budget equally
among them, regardless of the magnitude of synergy.
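Since (14.41)–(14.42) are closed-form, their comparative statics are easy to trace numerically. The short sketch below (illustrative parameter values only) shows that as synergy κ rises, the total budget grows while the share allocated to the more effective medium falls.

def optimal_media_mix(m, b1, b2, k, r, lam):
    # direct transcription of the closed forms (14.41)-(14.42)
    denom = 4.0 * (1.0 + r - lam) ** 2 - (k ** 2) * (m ** 2)
    u = m * (b2 * k * m + 2.0 * b1 * (1.0 + r - lam)) / denom
    v = m * (b1 * k * m + 2.0 * b2 * (1.0 + r - lam)) / denom
    return u, v

m, b1, b2, r, lam = 2.0, 0.6, 0.3, 0.1, 0.5   # medium 1 is twice as effective
for k in (0.0, 0.1, 0.2):
    u, v = optimal_media_mix(m, b1, b2, k, r, lam)
    print(f"k = {k:.1f}: u* = {u:.3f}, v* = {v:.3f}, share of u = {u/(u+v):.3f}")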
In reality, however, marketing effectiveness can vary over time, e.g.,
consumer segments, values and tastes change as products age and com-
petitive landscape or economic conditions change, making the aggregated
market less or more responsive over time to marketing efforts (e.g.,
Mahajan et al. 1980). In the context of online marketing investments,
Biyalogorsky and Naik (2003, 30) state that 'with the changing nature of
the Internet, it is possible that . . . [the effectiveness of online marketing
investments] . . . may change over time in predictable ways'. Therefore,
next, we provide an illustration of a problem of marketing mix optimiza-
tion with time-varying effectiveness that also involves a finite rather than
infinite planning horizon.

Dynamic single entity multi-variable optimization with time-varying effectiveness
The problem is how should optimal marketing-mix levels be set over
a finite planning horizon when the effectiveness of the marketing
inputs is time-varying? Raman et al. (2012) formulate and solve such a
problem.
The choice or control variables are the marketing mix elements, e.g., the
expenditures of two resources like advertising and personal selling over
time.
The state equation is

dS/dt = −δS + β1(t)ut + β2(t)vt  (14.43)

where S is the sales of the product, δ represents the rate of decay in
sales, and u and v represent the units of the two marketing activities (e.g.,
number of sales calls, ad exposures). Lastly, β1(t) and β2(t) reflect the
time-varying effectiveness of u and v, respectively, which can take the form
of any of a variety of functions of time, such as polynomial functions, in a
specific application.
The objective function represents the discounted cumulative profit over a
finite horizon T as a function of the policies for two marketing inputs over
time. Consequently, the optimization problem of the firm can be expressed
as determining optimally u(t) and v(t) over its planning horizon T to
maximize discounted long-term profits. Mathematically, this is:

Maximize J(u, v) = ∫0^T e^(−rt) π(S(t), u(t), v(t)) dt + mS(T)e^(−rT)  (14.44)

where J is the objective functional of the firm and π(S, u, v) = m(t)S −
c1(t)u² − c2(t)v², subject to the dynamics in equation (14.43) and the
salvage value mS(T)e^(−rT).
The solution: Raman et al. (2012) solve the problem using optimal
control theory and produce the following insights. First, the optimal
allocations are proportional to the effectiveness parameters (consistent
with earlier results of Naik and Raman (2003)) but there is a finite
horizon effect. Second, due to the time-varying parameters, the optimal
allocation ratio (i.e., u/v) will change over time, thereby directing
managers to emphasize different marketing mix elements at different
times over the planning horizon. Third, the allocation ratio can switch
over the planning horizon, causing complete reversals in the emphasis
placed on one instrument versus the other. Raman et al.’s (2012) results and insights are very useful because conventional wisdom on the product life cycle (PLC) concept recommends switching emphasis – e.g., the recommendation that advertising should be emphasized over personal selling in the introductory phase while personal selling should receive greater weight later in the PLC. Raman et al. (2012) provide analytical proof that such switches can indeed be optimal and establish the precise nature – quantitatively and qualitatively – of the optimal variation in spending on different marketing instruments over time, such as offline and online media. Managers can combine these rules with
empirically derived parameter estimates to improve their marketing
resource allocation.
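A simple way to work with this formulation in practice is to evaluate the objective (14.44) numerically under the dynamics (14.43) for candidate spending paths. The sketch below does this with Euler integration; it is not Raman et al.’s (2012) optimal-control derivation, and all parameter values and the functional forms chosen for β1(t) and β2(t) are hypothetical.

    import numpy as np

    T, dt = 10.0, 0.01
    t = np.arange(0.0, T, dt)
    delta, r, mu = 0.4, 0.10, 1.0          # decay, discount rate, salvage weight
    m_t = np.full_like(t, 2.0)             # margin m(t), held constant here
    c1, c2 = 0.5, 0.5                      # quadratic cost coefficients c1(t), c2(t)
    b1 = 0.8 - 0.05 * t                    # time-varying effectiveness beta1(t)
    b2 = 0.3 + 0.04 * t                    # time-varying effectiveness beta2(t)

    def J(u, v):
        """Discounted cumulative profit (14.44) for spending paths u(t), v(t)."""
        S = np.empty_like(t)
        S[0] = 1.0
        for i in range(1, len(t)):         # Euler step of dS/dt in (14.43)
            dS = -delta * S[i - 1] + b1[i - 1] * u[i - 1] + b2[i - 1] * v[i - 1]
            S[i] = S[i - 1] + dS * dt
        pi = m_t * S - c1 * u ** 2 - c2 * v ** 2
        return np.sum(np.exp(-r * t) * pi) * dt + mu * S[-1] * np.exp(-r * T)

    # Compare a constant ('even') split with one that tracks effectiveness:
    even_u = even_v = np.full_like(t, 0.5)
    track_u, track_v = b1 / (b1 + b2), b2 / (b1 + b2)
    print(J(even_u, even_v), J(track_u, track_v))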
To summarize, most marketing inputs have dynamic effects, consumers
in most markets are not myopic, and firms are paying more attention to
long-term results. Hence, there is a pronounced need for dynamic optimi-
zation thinking in marketing decision making. We have only summarized
a few illustrative cases of such optimization problems above and several
more are summarized in Table 14.1b. Through these selections, we hope to give the reader a better sense of the many twists and variants possible in what may sound like the same or similar dynamic optimization problems. For example, a question such as ‘how should I price my product optimally over time?’ can seem relatively straightforward, but the solution can differ widely depending on all the other factors and conditions in the
problem setting. However, as the optimization problems become richer,
more realistic and interesting, they also become much more complicated
to formulate and solve. That is, increasing complexity typically calls for
increasingly sophisticated knowledge and expertise in applying methods
such as dynamic programming, calculus of variations, and deterministic and stochastic optimal control theory. Not everyone, of course, can be such an optimization specialist. However, having a good grasp of the main principles and insights that have been discovered by specialist analysts, such as those covered above, can be very helpful.

Conclusion

A key task of a marketing manager is to determine the optimum levels of various marketing instruments (e.g., advertising, sales force, and prices) in order to maximize firm- and customer-level outcomes (e.g., profits, revenues, customer equity, customer lifetime value). To accomplish this task, the marketing manager needs to (1) identify the key instrument(s) to optimize; (2) develop a predictive model or response function (for the entire market or for each entity within the firm’s market) that relates the outcome of interest to the key marketing instrument(s); (3) calibrate the response function using an appropriate estimation technique; and (4) set up the objective function and constraints on the decision variables and solve the optimization problem using a relevant optimization technique to determine the appropriate levels of the marketing instruments.
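As a minimal end-to-end illustration of these four steps, with hypothetical data, one can calibrate a multiplicative response function S = a·x^b by log–log regression and then solve the resulting static single-resource budget problem max over x of m·a·x^b − x; the printout also illustrates the flat maximum principle noted in Table 14.3.

    import numpy as np

    x_hist = np.array([10., 20., 40., 80., 160.])      # past spend levels
    s_hist = np.array([110., 150., 205., 280., 380.])  # observed sales (hypothetical)

    b, log_a = np.polyfit(np.log(x_hist), np.log(s_hist), 1)  # elasticity, intercept
    a, m = np.exp(log_a), 0.40                                # scale factor, gross margin

    x_star = (m * a * b) ** (1.0 / (1.0 - b))   # first-order condition of m*a*x**b - x
    print(f"elasticity = {b:.2f}, optimal budget x* = {x_star:.1f}")

    # Flat maximum principle: profit is insensitive to wide deviations from x*.
    for x in (0.5 * x_star, x_star, 1.5 * x_star):
        print(f"x = {x:6.1f}   profit = {m * a * x ** b - x:7.2f}")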
It is within the context of accomplishing this key marketing task that we
approached this chapter on Marketing Optimization Methods. While our
illustration of various marketing problems is by no means exhaustive, we
hope that it demonstrates how an analyst can formulate and solve various
contemporary marketing optimization problems that confront marketers
in implementing their strategies.
Looking ahead, the proliferation of marketing channels and rapid
advances in digital and information technology are presenting market-
ers with a plethora of new opportunities to engage with their customers
and maximize the probability of a sale. However, these contemporary
developments are also posing some difficult questions with regards to
marketing resource allocations. For example, in the realm of multi-
channel and omni-channel marketing, marketers are facing questions
such as: (1) how to optimally allocate marketing resources at various
customer touchpoints in order to maximize profits and customer life
time value (Kumar and Rajan 2012), (2) how to use marketing-mix
instruments (e.g., price and promotion) in order to deter (or leverage)
‘showrooming’ behavior and maximize profits from both online and
offline channels (Verhoef et al. 2015), and (3) how to time mobile promotions optimally so as to increase cross-channel synergies and, ultimately, maximize profits (Shankar et al. 2010). Similarly, in the sales force domain, programmatic advertising is eliminating the need for human interaction (Seitz and Zorn 2016).



Table 14.3  Key takeaways of optimization problems surveyed in this chapter

Static Single Resource Single Entity Optimization Problems
• The optimum budget increases as the gross margin on the resource increases.
• The ratio of the optimal budget to its resulting contribution dollars should equal the marketing elasticity, i.e., x*/(m·f(x*)) equals the elasticity of the response function at x*.
• The flat maximum principle: the realized profit is relatively insensitive to fairly wide deviations from the optimum budget.

Static Single Entity Single Price Optimization Problems
• Lerner’s Index, (p* − c)/p*: this index, which is bounded between 0 and 1, is a measure of the market power of a monopolist; at optimality it should equal the reciprocal of the price elasticity.

Static Single Entity Multi-variable Optimization Problems
• The Dorfman–Steiner rule: the optimal levels of the marketing mix variables are those that simultaneously satisfy 1/L = ε = η = θ, i.e., the reciprocal of the Lerner index equals the price elasticity which, at the optimum, also equals the marginal revenue products of the nonprice mix instruments.

Static Multiple Entity Single Resource Allocation Problems
• For an unconstrained problem, the allocations to the sales entities should be proportionate to their response elasticities or, more simply, the more responsive entities should receive higher allocations.

Static Multiple Entity Multi-variable Optimization Problems
• For a firm offering two products at different prices, the optimal price for each product is a function of: (i) its own elasticity; (ii) its own marginal cost; (iii) the price of the other product; (iv) the cross-price elasticity; (v) scale factors for each product; (vi) the other product’s cross-elasticity; and (vii) the other product’s marginal cost.
• For firms with cross-market network effects, the optimum marketing resource budget is a function of the ‘cross-market dependency coefficient’ (δ).
• For firms offering multiple products and experiencing cross-market network effects: optimal product-line composition and prices are influenced by the customer group that contributes the highest revenue, even when the product line is designed for the customer group that contributes the least revenue; and total profits are maximized when marketing investments in each market are aimed at jointly maximizing total profits from the two customer groups (an integrated strategy) rather than at separately maximizing profits from each customer group (a ‘siloed’ strategy).

Dynamic Single Resource Single Entity Optimization Problems
• The optimal ad expenditure in the steady state is directly proportional to sales in the steady state.

Dynamic Single Entity Single Price Optimization Problems
• For manufacturers producing durable goods: the optimal pricing policy is to charge a higher price in the initial period and lower prices in subsequent periods; the optimal price to charge in each period is contingent on the discount factor of the consumers (specifically, as the discount factor increases, the optimal price decreases and the rate of price decline in future periods decreases); and forward-looking consumers with low WTP can be beneficial, because when they defer their decision to future periods, they end up competing with forward-looking consumers with high WTP, which consequently increases the WTP of consumers with low valuations.

Dynamic Single Entity Multi-variable Optimization Problems
• A firm determining the profit-maximizing mix of marketing communication expenditures over time, where the effectiveness of marketing inputs is time-invariant, should increase the total budget but decrease (increase) the proportion of the media budget allocated to the more (less) effective communications activity as the synergy between the media increases, and should allocate the media budget equally among them, regardless of the magnitude of synergy, if the various media are equally effective.
• For a firm determining the profit-maximizing mix of marketing communication expenditures over time, where the effectiveness of marketing inputs is time-varying: the optimal allocations are proportional to the effectiveness parameters; the optimal allocation ratio changes over time, directing managers to emphasize different marketing mix elements at different times over the planning horizon; and the allocation ratio can switch over the planning horizon, causing complete reversals in the emphasis placed on one instrument versus the other.

Consequently, marketers are frequently confronted with the challenge of determining the optimal mix of outside and inside sales forces that will maximize customer
experience and net profits from offline (i.e., face-to-face) and online
channels (Mantrala and Albers 2012). Moreover, disruptive technologies such as augmented and virtual reality bring with them a distinctive ability to enhance the shopping experience through virtual fitting rooms and to boost customer participation and conversion in the case of virtual B2B trade shows. Consequently, how should retailers allocate their resources
between in-store and virtual marketing to maximize customer equity?
Similarly, how should trade show management firms allocate their
resources between offline and virtual events in order to boost leads?
More importantly, how can they time offline and virtual show events so
as to maximize their ROI (Gopalakrishna and Lilien 2012)? Likewise,
in the realm of social media, what content strategy (e.g., content of
the posts, sentiment of the text, timing of posts) can media publishing
firms such as newspapers and magazines adopt in order to maximize
engagement, click-through rate and, subsequently, advertising revenue
resulting from impressions generated by click-throughs on the firm’s
website?
We hope that this chapter will motivate marketing practitioners and
academics to solve such contemporary marketing problems using tradi-
tional and newer marketing optimization approaches. The key takeaways
outlined in Table 14.3 along with the elasticity and carryover estimates
shown in Table 14.2 should provide marketing managers with a starting
point to answer some of the questions listed above. For example, if we treat channels as sales entities, one could use the takeaway that allocations to sales entities should be proportionate to their response elasticities (Little and Lodish 1969) to distribute advertising dollars between print and online channels. Alternatively, marketers can formulate
their own objective functions that better suit their respective institutional
settings and solve the optimization problems using a wide array of free
and commercial optimization software available in the market today
(see Table 14.4). Moreover, through this chapter, we also urge marketing
scholars to take a closer look at the exciting research avenues listed above
and continue with the tradition of proposing normative guidelines to help
practitioners address these important problems and improve marketing
productivity.
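As a concrete illustration of the proportional-to-elasticity rule just mentioned, here is a minimal sketch with a hypothetical budget and hypothetical channel elasticities:

    budget = 100_000.0
    elasticity = {"print": 0.05, "online": 0.15}   # hypothetical response elasticities

    total = sum(elasticity.values())
    allocation = {ch: budget * e / total for ch, e in elasticity.items()}
    print(allocation)   # online receives three times the print allocation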




Table 14.4  Some commercial and free optimization software

Name of the Software    License Type    Description and Key Capabilitiesa
AIMMS Commercial A software system designed for modeling and
solving large-scale optimization and scheduling-
type problems. It is known for its GUI building
capabilities.
AMPL Commercial An algebraic modeling language for describing
and solving high-complexity problems for large-
scale mathematical computing. It supports LP,
QP, MILP, MINLP, and SP.
CPLEX Commercial An optimization software package for solving
very large LP, IP, MIP, MINLP and QP
problems.
FortSP Commercial A software package dedicated for solving SP
problems.
Gurobi Commercial A commercial optimization solver for solving LP,
QP, QCP, MILP, MIQP, and MIQCP problems.
ADMB Free A software suite for solving non-linear
optimization problems. It is known for its ability
to integrate MCMC methods for problems
involving Bayesian modeling.
EvA2 Free A software system that uses evolutionary
algorithms to optimize linear and nonlinear
objective functions.
OpenOpt Free A numerical optimization framework written in
Python. It supports NLP, LP, MIP, MINLP, and
QP problems.
PPL Free A software system that provides numerical
abstractions for large scale IP optimization
problems. It is known for its convex polyhedral
abstractions.
TAO Free A software for large scale optimization LP
and NLP problems. It is known for its ability
to parallel process while solving complex
optimization problems.

Note:  a Abbreviations: GUI – Graphical User Interface, IP – Integer Programming, LP – Linear Programming, MCMC – Markov chain Monte Carlo, MILP – Mixed Integer Linear Programming, MINLP – Mixed Integer Nonlinear Programming, MIP – Mixed Integer Programming, MIQCP – Mixed Integer Quadratically Constrained Programming, MIQP – Mixed Integer Quadratic Programming, NLP – Nonlinear Programming, QCP – Quadratically Constrained Programming, QP – Quadratic Programming, SP – Stochastic Programming.




Note

1. Little (1975) proposed a discrete-time version of a dynamic sales response model (BRANDAID), which he later showed to be a generalization of the discrete-time versions of the Nerlove–Arrow and Vidale–Wolfe models (Little 1979). It is useful to note here that if a constant level of advertising expenditure were continuously applied and the market potential were fixed, then sales would reach a long-run equilibrium, and the form of the sales–advertising response function in this ‘steady state’ is linear (concave) under the Nerlove–Arrow (Vidale–Wolfe) model.

References

Albers, Sönke (2000), “Impact of Types of Functional Relationships, Decisions, and


Solutions on the Applicability of Marketing Models,” International Journal of Research in
Marketing, 17 (2), 169–75.
Albers, Sönke, Murali K. Mantrala, and Shrihari Sridhar (2010), “Personal Selling
Elasticities: A Meta-analysis,” Journal of Marketing Research, 47 (5), 840–53.
Aravindakshan, Ashwin, Olivier Rubel, and Oliver Rutz (2014), “Managing Blood
Donations with Marketing,” Marketing Science, 34 (2), 269–80.
Aykac, Ahmet, Marcel Corstjens, David Gautschi, and Ira Horowitz (1989), “Estimation
uncertainty and optimal advertising decisions,” Management Science, 35 (1), 42–50.
Bayus, Barry L. (1992), “The Dynamic Pricing of Next Generation Consumer Durables,”
Marketing Science, 11 (3), 251–65.
Berger, Paul D. and Nada Nasr Bechwati (2001), “The Allocation of Promotion Budget to
Maximize Customer Equity,” Omega, 29 (1), 49–61.
Berger, Paul D. and Nada I. Nasr (1998), “Customer Lifetime Value: Marketing Models and
Applications,” Journal of Interactive Marketing, 12 (1), 17–30.
Besanko, David and Wayne L. Winston (1990), “Optimal Price Skimming by a Monopolist
Facing Rational Consumers,” Management Science, 36 (5), 555–67.
Bigne, J. Enrique (1995), “Advertising Budget Practices: A Review,” Journal of Current
Issues & Research in Advertising, 17 (2), 17–31.
Bijmolt, Tammo H.A., Harald J. van Heerde, and Rik G.M. Pieters (2005), “New Empirical
Generalizations on the Determinants of Price Elasticity,” Journal of Marketing Research,
42 (2), 141–56.
Biyalogorsky, Eyal and Prasad Naik (2003), “Clicks and Mortar: The Effect of Online
Activities on Offline Sales,” Marketing Letters, 14 (1), 21–32.
Blattberg, Robert C. and John Deighton (1996), “Manage Marketing by the Customer
Equity Test,” Harvard Business Review, 74 (4), 136–44.
Chaffey, Dave and Mark Patron (2012), “From web analytics to digital marketing optimi-
zation: Increasing the commercial value of digital analytics,” Journal of Direct, Data and
Digital Marketing Practice, 14 (1), 30–45.
Charnes, Abraham and William W. Cooper (1958), “The Theory of Search: Optimum
Distribution of Search Effort,” Management Science, 5 (1), 44–50.
Dean, Joel (1951), “How Much to Spend on Advertising,” Harvard Business Review, 29 (1),
65–74.
Doctorow, David, Robert Hoblit, and Archana Sekhar (2009), “Measuring Marketing:
McKinsey Global Survey Results,” McKinsey Quarterly, 5 (March), 13.
Dorfman, Robert and Peter O. Steiner (1954), “Optimal Advertising and Optimal Quality,”
American Economic Review, 44 (5), 826–36.
Edelman, David C. (2010), “Four Ways to Get More Value from Digital Marketing,”
McKinsey Quarterly, 6 (March), 1–8.


Evans, David S. (2003), “Some Empirical Aspects of Multi-sided Platform Industries,”


Review of Network Economics, 2 (3).
Fischer, Marc, Sönke Albers, Nils Wagner, and Monika Frie (2011), “Practice Prize Winner-
Dynamic Marketing Budget Allocation Across Countries, Products, and Marketing
Activities,” Marketing Science, 30 (4), 568–85.
Freeland, James R. and Charles B. Weinberg (1980), “S-Shaped Response Functions:
Implications for Decision Models,” Journal of the Operational Research Society, 31 (11),
1001–7.
Fylstra, Daniel, Leon Lasdon, John Watson, and Allan Waren (1998), “Design and Use of
the Microsoft Excel Solver,” Interfaces, 28 (5), 29–55.
Gatignon, Hubert and Dominique M. Hanssens (1987), “Modeling Marketing Interactions
with Application to Salesforce Effectiveness,” Journal of Marketing Research, 247–57.
Gopalakrishna, Srinath and Gary L. Lilien (2012), Trade Shows in the Business Marketing
Communications Mix. Northampton, MA: Edward Elgar.
Hanssens, Dominique M., Leonard J. Parsons, and Randall L. Schultz (2001), Market
Response Models: Econometric and Time Series Analysis. New York: Springer Science &
Business Media.
Henningsen, Sina, Rebecca Heuke, and Michel Clement (2011), “Determinants of Advertising
Effectiveness: The Development of an International Advertising Elasticity Database and a
Meta-analysis,” BuR-Business Research, 4 (2), 193–239.
Kamien, Morton I. and Nancy Lou Schwartz (2012), Dynamic Optimization: The Calculus
of Variations and Optimal Control in Economics and Management. Amsterdam: Elsevier.
Kanuri, Vamsi K., Murali K. Mantrala and Esther Thorson (2017), “Optimizing a Menu
of Multi-format Subscription Plans for Ad Supported Media Platforms: A Model and
Application in the Daily Newspaper Industry,” Journal of Marketing, 81(2), 45–63.
Kohler, Christine, Murali K. Mantrala, Sönke Albers, and Vamsi K. Kanuri (2017), “A
Meta-Analysis of Marketing Communication Carryover Effects,” Journal of Marketing
Research, forthcoming.
Kohli, Rajeev and Vijay Mahajan (1991), “A Reservation-price Model for Optimal Pricing
of Multiattribute Products in Conjoint Analysis,” Journal of Marketing Research,
347–54.
Koopman, Bernard O. (1953), “The Optimum Distribution of Effort,” Journal of the
Operations Research Society of America, 1 (2), 52–63.
Kumar, V. and Morris George (2007), “Measuring and Maximizing Customer Equity: A
Critical Analysis,” Journal of the Academy of Marketing Science, 35 (2), 157–71.
Kumar, V. and Bharath Rajan (2012), Customer Lifetime Value Management: Strategies to
Measure and Maximize Customer Profitability. Cheltenham, UK and Northampton, MA:
Edward Elgar Publishing.
Little, John D. C. (1979), “Aggregate Advertising Models: The State of the Art,” Operations
Research, 27 (4), 629–67.
Little, John D. C. (1975), “BRANDAID: A Marketing-Mix Model,” Operations Research,
23 (4), 628–55.
Little, John D. C. (1970), “Models and Managers: The Concept of a Decision Calculus,”
Management Science, 16, B466–85.
Little, John D. C. and Leonard M. Lodish (1969), “A Media Planning Calculus,” Operations
Research, 17 (1), 1–35.
Lodish, Leonard M. (1971), “CALLPLAN: An Interactive Salesman’s Call Planning
System,” Management Science, 18 (4-part-ii), P-25-P-40.
Lodish, Leonard M. (1980), “A User-oriented Model for Sales force Size, Product, and
Market Allocation Decisions,” Journal of Marketing, 44 (3), 70–78.
Mahajan, Vijay, Stuart I. Bretschneider, and John W. Bradford (1980), “Feedback Approaches
to Modeling Structural Shifts in Market Response,” Journal of Marketing, 71–80.
Mantrala, Murali K. (2002), Allocating Marketing Resources. London: Sage Publications.
Mantrala, Murali K. and Sönke Albers (2012), Impact of the Internet on B2B Sales Force
Size and Structure. Cheltenham, UK and Northampton, MA: Edward Elgar Publishing.


Mantrala, Murali K., Prasad A. Naik, Shrihari Sridhar, and Esther Thorson (2007), “Uphill
or Downhill? Locating the Firm on a Profit Function,” Journal of Marketing, 71 (2),
26–44.
Mantrala, Murali K. and Surya Rao (2001), “A Decision-Support System that Helps
Retailers Decide Order Quantities and Markdowns for Fashion Goods,” Interfaces, 31
(3, supplement), S146–S165.
Mantrala, Murali K., Prabhakant Sinha, and Andris A. Zoltners (1992), “Impact of
Resource Allocation Rules on Marketing Investment-level Decision and Profitability,”
Journal of Marketing Research, 29 (2), 162.
Mela, Carl F., Jason Roos, and Yiting Deng (2013), “Invited Paper–A Keyword History of
Marketing Science,” Marketing Science, 32 (1), 8–18.
Monroe, Kent B. and Albert J. Della Bitta (1978), “Models for Pricing Decisions,” Journal
of Marketing Research, 413–28.
Montgomery, David B. and Alvin J. Silk (1972), “Estimating Dynamic Effects of Market
Communications Expenditures,” Management Science, 18 (10), B-485-B-501.
Montgomery, David B., Alvin J. Silk, and Carlos E. Zaragoza (1971), “A Multiple-Product
Sales Force Allocation Model,” Management Science, 18 (4-part-ii), P-3-P-24.
Moorthy, K. Sridhar (1984), “Market Segmentation, Self-selection, and Product Line
Design,” Marketing Science, 3 (4), 288–307.
Naik, Prasad A. and Kalyan Raman (2003), “Understanding the Impact of Synergy in
Multimedia Communications,” Journal of Marketing Research, 40 (4), 375–88.
Naik, Prasad A., Kalyan Raman, and Russell S. Winer (2005), “Planning Marketing-Mix
Strategies in the Presence of Interaction effects,” Marketing Science, 24 (1), 25–34.
Nair, Harikesh (2007), “Intertemporal Price Discrimination with Forward-looking
Consumers: Application to the US Market for Console Video-games,” Quantitative
Marketing and Economics, 5 (3), 239–92.
Nerlove, Marc and Kenneth J. Arrow (1962), “Optimal Advertising Policy under Dynamic
Conditions,” Economica, 129–42.
Raman, Kalyan and Rabikar Chatterjee (1995), “Optimal Monopolist Pricing under
Demand Uncertainty in Dynamic Markets,” Management Science, 41 (1), 144–62.
Raman, Kalyan, Murali K. Mantrala, Shrihari Sridhar, and Yihui Elina Tang (2012),
“Optimal Resource Allocation with Time-Varying Marketing Effectiveness, Margins and
Costs,” Journal of Interactive Marketing, 26 (1), 43–52.
Reibstein, David J. and Hubert Gatignon (1984), “Optimal Product Line Pricing: The
Influence of Elasticities and Cross-elasticities,” Journal of Marketing Research, 21 (3),
259–67.
Robinson, Bruce and Chet Lakhani (1975), “Dynamic Price Models for New-Product
Planning,” Management Science, 21 (10), 1113–22.
Seitz, Jürgen and Steffen Zorn (2016), “Perspectives of Programmatic Advertising,” in
Programmatic Advertising, Oliver Busch, ed. New York: Springer.
Sethi, Suresh P. (1977), “Optimal Advertising for the Nerlove–Arrow Model under a Budget
Constraint,” Operational Research Quarterly, 28 (3), 683–93.
Sethuraman, Raj, Gerard J. Tellis, and Richard A. Briesch (2011), “How Well Does
Advertising Work? Generalizations from Meta-analysis of Brand Advertising Elasticities,”
Journal of Marketing Research, 48 (3), 457–71.
Shankar, Venkatesh, Alladi Venkatesh, Charles Hofacker, and Prasad Naik (2010),
“Mobile Marketing in the Retailing Environment: Current Insights and Future Research
Avenues,” Journal of Interactive Marketing, 24 (2), 111–20.
Simon, Hermann (1982), “ADPULS: An Advertising Model with Wearout and Pulsation,”
Journal of Marketing Research, 19 (3), 352–63.
Simon, Julian L. and Johan Arndt (1980), “The Shape of the Advertising Response
Function,” Journal of Advertising Research, 20 (4), 11–28.
Sinha, Prabhakant and Andris A. Zoltners (1979), “The Multiple-Choice Knapsack
Problem,” Operations Research, 27 (3), 503–15.
Sridhar, Shrihari, Murali K. Mantrala, Prasad A. Naik, and Esther Thorson (2011),


“Dynamic Marketing Budgeting for Platform Firms: Theory, Evidence, and Application,”
Journal of Marketing Research, 48 (6), 929–43.
Su, Xuanming (2007), “Intertemporal Pricing with Strategic Customer Behavior,”
Management Science, 53 (5), 726–41.
Thomas, Jerry W. (2006), “Marketing Optimization,” Decision Analyst.
Tull, Donald S., Van R. Wood, Dale Duhan, Tom Gillpatrick, Kim R. Robertson, and
James G. Helgeson (1986), “’Leveraged’ Decision Making in Advertising: The Flat
Maximum Principle and Its Implications,” Journal of Marketing Research, 23 (1), 25–32.
Urban, Glen L. (1975), “Allocating Ad Budgets Geographically,” Journal of Advertising
Research, 15 (6), 7–16.
Urban, Glen L. (1969), “A Mathematical Modeling Approach to Product Line Decisions,”
Journal of Marketing Research, 6 (1), 40–47.
Van Ittersum, Koert, Brian Wansink, Joost M. E. Pennings, and Daniel Sheehan (2013),
“Smart Shopping Carts: How Real-Time Feedback Influences Spending,” Journal of
Marketing, 77 (6), 21–36.
Verhoef, Peter C., P. K. Kannan, and J. Jeffrey Inman (2015), “From Multi-channel
Retailing to Omni-channel Retailing: Introduction to the Special Issue on Multi-Channel
Retailing,” Journal of Retailing, 91 (2), 174–81.
Vidale, M. L. and H. B. Wolfe (1957), “An Operations-Research Study of Sales Response to
Advertising,” Operations Research, 5 (3), 370–81.



CASE STUDIES AND APPLICATIONS

PART VII  CASE STUDIES AND APPLICATIONS IN MARKETING MANAGEMENT
15.  Industry applications of conjoint analysis
Vithala R. Rao

While conjoint analysis was originally developed to estimate utility values for attribute levels, it quickly became clear how versatile and useful the
methodology is for marketing decision making (Green and Rao 1971).
It has been applied with significant benefit to a large array of marketing
decision problems such as product and service design, market segmenta-
tion, competitive analysis, pricing decisions, and sales/distribution analy-
sis. Table 15.1 shows a selection of such applications. Appendix A to this
chapter provides a brief description of the conjoint analysis method.
This chapter reviews five applications to provide the unique flavor and
demonstrate the versatility of the conjoint analysis method. The following
applications are discussed: store location selection, bidding for contracts,
evaluating the market value of a change in a product attribute (MVAI),
push marketing strategy in a B2B context, and choice of a distribution
channel.1

Store Location

Retailers expand their business by expanding their presence in new geographic areas. They evaluate the potential of several new store locations
using estimates of expected sales (or profits) and select a few locations for
their geographic expansion. The estimate of expected sales in any location
is simply the product of total market potential in the area and expected
Table 15.1  A selection of domain areas of applications

Application Domain Products Services


Product design Electric car Hotels (Courtyard by Marriott)
Carpet cleaners Electronic toll systems (E-Z Pass)
Personal computers Consumer discount cards
Market segmentation Copying machines Car rental agencies
Product positioning Ethical drugs Banking services
Competitive analysis Ethical drugs Transcontinental airlines
Pricing Gasoline pricing Telephone services pricing
Health insurance policies




market share of the new store. The estimate of market potential needs to
include the likely market expansion due to the presence of the new store.
The expected market share for the new store depends on the strength of
competing stores in the area. While historical data can provide estimates
of the current market potential and market shares of existing stores,
judgment is called for estimating market expansion and market share.
Conjoint methods have been applied in this context. One model in the
franchising context (Ghosh and Craig 1991) considers both the potential
to take market share from existing competitors and the market expansion
potential in the geographic area due to the new store. We will first describe
a mathematical model to estimate expected market share and then show
how judgment is used for estimating its components as described in
Durvasula, Sharma, and Andrews (1992).
Let us consider a geographic area with n existing stores and introduc-
tion of another store, the (n+1)-th. Let Mi denote the market share of the i-th store. Let ME denote the market expansion due to the presence of the new store. Let ki denote the proportion of the market expansion potential captured by the i-th store (i = 1, . . ., n+1), with Σ_{i=1}^{n+1} ki = 1. All the ki values are non-negative.
The new store will capture some market share of each of the existing
stores, and PMSi is the proportion of current market share of i-th store
(Mi) captured by the new store. With these symbols, an estimate of the market share of the (n+1)-th store can be derived as

MSn+1 = [Σ_{i=1}^{n} (PMSi·Mi) + kn+1·ME] / (1 + ME),

and the revised market shares of the existing stores are given by

MSi = (Mi − PMSi·Mi + ki·ME) / (1 + ME).
Here, market shares of the n existing stores are typically known and the
other quantities (PMSs, ks, and ME) need to be estimated by another
model or judged by the decision makers.
One model used for estimating the PMS quantities is PMSi = PMIN + (PMAX − PMIN)(1 − f(Si)), i = 1, . . ., n, where PMIN (≥ 0) and PMAX (≤ 1) are the minimum and maximum share an outlet can obtain and Si is the relative strength of the existing stores in the area. Typically, f(Si) is modeled as a logistic function of Si. PMIN and PMAX are judgmentally obtained. The relative strength construct (Si) depends on various store attributes and can be modeled using conjoint analysis.
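The pieces above assemble into a simple share calculator. The sketch below is a minimal version with hypothetical inputs, in which f(S) is a logistic function of conjoint-derived strength scores and PMIN, PMAX, the ki values, and ME stand in for the managerial judgments described in the text.

    import numpy as np

    M = np.array([0.30, 0.25, 0.45])        # current shares of the n = 3 stores
    strength = np.array([0.2, 0.8, 1.5])    # conjoint-based strength scores S_i
    PMIN, PMAX = 0.05, 0.60                 # judgmental bounds on capturable share
    ME = 0.10                               # judged market expansion
    k = np.array([0.2, 0.1, 0.2, 0.5])      # expansion split k_i, sums to one

    f = 1.0 / (1.0 + np.exp(-strength))     # logistic f(S_i)
    PMS = PMIN + (PMAX - PMIN) * (1.0 - f)  # share of store i captured by new store

    MS_new = (np.sum(PMS * M) + k[-1] * ME) / (1.0 + ME)
    MS_old = (M - PMS * M + k[:-1] * ME) / (1.0 + ME)
    print(MS_new, MS_old, MS_new + MS_old.sum())   # all shares sum to one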
Durvasula, Sharma, and Andrews (1992) applied this model for the case of
banks and showed how conjoint analysis can be used in estimation. The
context is that of a firm, called ABC Commerce, evaluating the potential
of four locations, L1, L2, L3, and L4 in a certain geographic region. The
firm currently has 16 branches in the region. In order to evaluate relative
strength, the authors identified five attributes (by an exploratory study).




The attributes are: competitor’s market share, growth of competitor’s deposits, aggressiveness of the competitor in attracting deposits, age of
the competitor’s branch, and type of financial institution. The first three
attributes were each at three levels described as “below,” “about,” and
“above average of the ABC firm”; the fourth attribute was described
by two levels as “relatively new” and “relatively established.” The fifth
attribute, type of financial institution, was described by two levels of
“statewide” and “local.” Using these five attributes, 16 descriptions of the
competitive situation were developed using a fractional factorial design
and four experienced managers rank ordered the sixteen profiles on the
relative competitor strength. Based on these judgments, partworth values
were computed for each of the five attributes for each manager separately.
There was some heterogeneity among the partworths across managers.
The authors used these results to evaluate the market potential for the
four locations using the models described earlier; the conjoint results for
competitor’s strength were the major input into the analysis. Managers
also provided additional inputs (e.g., PMIN, PMAX etc.) judgmentally.
The logistic functions, f(S), were estimated individually from the estimates
of competitive strength obtained for the competitive branches in each
location calculated using the partworth values. There was a reason-
able agreement among the managers in their site evaluation. The market
expansion (ME) was assumed to be zero in this application and the values
of kis were not estimated. The average market share potential for the
proposed branches at locations L1, L2, L3, and L4 were 27.3, 11.1, 17.0,
and 23.6, respectively. Based on this analysis, locations L1 and L4 were
judged as offering higher potential. One should note that this analysis was
conducted at one particular point in time, and expected growth factors
were not included in these assessments. A dynamic conjoint study is called
for to assess growth as well. Nevertheless, this illustration shows how
conjoint analysis can be employed for retail location decisions.

A Bidding Application

The Alpha catering firm, located in Scandinavia, was experiencing a decline in market share. Alpha faces competition from four other firms in this market, which we call Beta, Gamma, Delta, and Phi; all but one of these are large firms, and the fifth (Phi) is a small entrepreneurial firm. These catering firms set up cafeterias on customers’ (or client companies’) premises and run these cafeterias. They set prices for each item sold in
cafeteria meals2 at the company facility, and the client firms offer some
subsidy to employees for lunch.




Pricing mechanisms in this catering supplier market are very complicated. Potential suppliers submit competitive bids that propose a fixed
(one-time) payment for set-up costs for a cafeteria at the customer firm’s
location. These set-up costs are to be borne by the customer firm for the
contract and are the basis for choosing a catering supplier. In order to
understand the clients’ trade-offs, the research firm conducted marketing
research using conjoint analysis as the main technique for understanding
the various trade-offs involved among the bids presented by the suppliers.
The attributes in the conjoint study were the setup costs specific to each
supplier. The research firm used prior knowledge of the setup costs of the
five competing firms to come up with a range of set-up costs. Rather than
using actual possible values of set-up costs for each supplier, an index
was used to describe the set-up costs (excluding the costs of catering and
banquets) of each catering firm. These indexes varied from a low of 85 to
a high of 120. For each supplier, five levels of the index were developed;
for example, the levels for one catering firm, Gamma, were 85, 90, 95, 100,
and 110. For a different supplier, Alpha, the levels were 90, 95, 100, 110, and 120. Using an orthogonal fractional factorial design drawn from a 5⁵ factorial design, the researchers constructed 25 profiles of bid costs for each of
the five competing firms; one profile was repeated three times, resulting in
a total of 27 profiles; these were divided into three rotation sets A, B, and
C of nine each. Each respondent received one of these rotation sets in a
random order; the nine profiles within the rotation set also were admin-
istered randomly to each respondent. A respondent in a client company
indicated the catering firm he or she will offer the contract for the cafeteria
business for each choice set.
The researchers in this study first conducted preliminary interviews and
focus groups to identify the factors that decision makers in the customer
companies paid attention to. These variables fall into three groups: (1) cus-
tomer characteristics (size, percent managerial and white-collar personnel,
etc.) and preferences for menu and frequency of repetition; (2) restaurant
factors (food quality, ambiance and service offered); and (3) pricing vari-
ables (lunch price and company subsidy). These data were collected from
each client company in addition to the choice data. In all, a sample of 207 respondents was contacted in the study; each respondent was chosen to represent his or her company and was responsible for making the decision on the choice of a catering firm for that company.
An aggregated logit model was developed to describe the choices made
by the respondents. In this model, the bid price indexes and other vari-
ables were used as predictors. The model was estimated using maximum
likelihood methods. The fit was quite good (model chi square was 286.44
with 34 degrees of freedom, with a p-value close to zero); several of the




Table 15.2  Predictions for three competitive bid profiles

Catering    Bid profile Set 1      Bid profile Set 2      Bid profile Set 3
company     Bid index   P(win)     Bid index   P(win)     Bid index   P(win)
Alpha       110         0.10       115         0.04       105         0.15
Beta        100         0.78        95         0.81       100         0.75
Gamma        95         0.005       95         0.005       95         0.005
Delta       102         0          100         0          102         0
Phi         100         0.115      100         0.145      100         0.095

Note: P(win) denotes the predicted probability of winning the contract.

variables turned out to be significant, as expected. The partworth values for the bid price attribute were in the expected direction; i.e., the probability of winning a bid decreased with increases in bid price. But these relationships differed across the five suppliers. The analysis revealed the
impact on the probability of winning a contract for the Alpha Company
for changes in the three sets of variables noted above. The probability of choosing the Alpha firm decreases with an increase in the number of managerial and white-collar employees in the customer firm
and when the customer firm prefers a dining room environment relative to
a cafeteria. Similarly, the probability increases with changes in the weekly
menus and lower lunch prices.
A decision support system was developed using the estimated logit
model to predict the probability of winning a contract for the Alpha
Company for a potential client under the assumptions of potential bids
by the competing firms. The Alpha Company manager simply had to
input the characteristics of the potential client and his or her assumptions
about  the possible competitive bids. Table 15.2 is an example of such a
prediction for one client company, Omega. In this example, it is clear that
the entrepreneurial firm will not be able to win the contract unless it drasti-
cally reduces its costs. Also, the chances of the Alpha Company winning
fall when its bid goes up and rise when its bid goes down.
The Alpha Company used this decision support system in its bids and
experienced great success in landing new contracts.
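A decision support system of this kind is straightforward to implement once the logit model has been estimated. The sketch below scores one competitive bid scenario with a multinomial logit; the coefficients are hypothetical stand-ins for the published estimates, and for brevity the client-characteristic terms and supplier-specific bid-price effects of the actual model are omitted.

    import numpy as np

    suppliers = ["Alpha", "Beta", "Gamma", "Delta", "Phi"]
    intercepts = np.array([0.5, 2.0, -3.0, -5.0, 0.0])  # supplier-specific constants
    beta_bid = -0.08                                    # sensitivity to the bid index

    def win_probabilities(bids):
        """Multinomial logit probabilities of winning for one client scenario."""
        utility = intercepts + beta_bid * np.asarray(bids, dtype=float)
        expu = np.exp(utility - utility.max())          # numerically stable softmax
        return expu / expu.sum()

    scenario = [110, 100, 95, 102, 100]                 # cf. bid profile Set 1
    for name, bid, p in zip(suppliers, scenario, win_probabilities(scenario)):
        print(f"{name:6s}  bid index = {bid:3d}  P(win) = {p:.3f}")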




Market Value of an Attribute Improvement (MVAI)

As firms improve the attributes of their products, a question arises whether the attribute improvement, measured in terms of profitability, is worth the
cost. This question can be answered with the help of conjoint analysis, as
shown by Ofek and Srinivasan (2002). We now describe their approach.3
It is possible to derive a mathematical expression for the market value of
an attribute improvement. For this purpose, we consider a market consist-
ing of J firms, each offering one product in a category. Each product has K
attributes in addition to its price. Let xjk be the value of the k-th attribute
for the j-th product and let pj be the price of the j-th product. Consumers
have the choice of buying any one of the J products or not buying at all.
Let mj denote the market share for the j-th product (j= 1, . . ., J) and m0 be
the market share of the no-purchase option. Further, let cjk be the change
in the cost of the j-th product for a unit change in the k-th attribute. The
authors consider the ratio of the positive change in market share due to the
improvement in an attribute to the negative change in market share due
to an increase in price as the market value of an attribute improvement.
Mathematically,

MVAI = −(∂mj / ∂xjk) / (∂mj / ∂pj)

It would be worthwhile for the firm to undertake the attribute improvement if this quantity exceeds the cost of the attribute improvement (cjk).
Naturally, the market share of a brand depends upon the choice set,
competitive reactions, heterogeneity of the sample of individuals whose
responses are used to calibrate the conjoint model, the particular specification used for the conjoint model, and the rule used to translate utilities into probabilities of choice. The changes in market share can be
estimated using a conjoint study. This is what Ofek and Srinivasan used to
empirically evaluate attribute improvements in a product under two sce-
narios: (1) no reaction by competition and (2) competitors react by making
appropriate changes in their own products. They used a logit model to
specify the probabilities of choice at the individual level and aggregated
them to obtain market shares at the aggregate level.
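A minimal sketch of this computation, with hypothetical partworths: individual-level logit probabilities are aggregated into market shares, the two share derivatives are approximated by finite differences, and their (negative) ratio gives MVAI, which can then be compared with the improvement cost cjk.

    import numpy as np

    rng = np.random.default_rng(0)
    n_resp, n_prod = 200, 3
    beta_x = rng.normal(1.0, 0.4, n_resp)    # individual partworths for attribute k
    beta_p = rng.normal(-2.0, 0.5, n_resp)   # individual price coefficients
    x = np.array([0.5, 0.7, 0.6])            # attribute levels of the J products
    p = np.array([1.0, 1.2, 1.1])            # prices

    def shares(x, p):
        """Aggregate logit market shares (no-purchase option has utility 0)."""
        u = beta_x[:, None] * x + beta_p[:, None] * p      # n_resp x n_prod utilities
        expu = np.exp(u)
        probs = expu / (1.0 + expu.sum(axis=1, keepdims=True))
        return probs.mean(axis=0)

    j, eps = 0, 1e-4                         # focal product and step size
    dx = np.zeros(n_prod); dx[j] = eps
    dp = np.zeros(n_prod); dp[j] = eps
    dm_dx = (shares(x + dx, p)[j] - shares(x, p)[j]) / eps
    dm_dp = (shares(x, p + dp)[j] - shares(x, p)[j]) / eps
    print(f"MVAI for product {j}: {-dm_dx / dm_dp:.3f}")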
We use the authors’ example to illustrate the approach. The product
category for this example is portable camera mount products. The set of
competing products consists of UltraPod, Q-Pod, GorillaPod, Camera
Critter, and Half Dome; the third product is a hypothetical one under
development. These products are described on five attributes: weight, size,
set-up time in minutes, stability, and positioning flexibility for adaptation




to different terrains and angles. In the conjoint study, each attribute was
varied at three levels and 302 subjects ranked 18 full profiles. The authors
estimated the MVAI for each of the five attributes when changes are made
in each of the three products. Their results show that the benefits from
improving all attributes except set-up time exceed the cost of making the
improvement. The authors found the MVAI values calculated using a commonly used approach – averaging the ratio of the attribute and price weights across the individuals in the sample – to be considerably upward biased and possibly incorrect. Further, the profitability of different
attribute improvements is much lower when competitive reactions are
considered in the computations. Note that such reaction calculations are
possible with simulations in conjoint studies.

Marketing Initiatives in a B2B Context

This application will describe how conjoint analysis was applied in setting
marketing initiatives (largely push marketing strategies) in a B2B context
using the published article by Levy, Webster, and Kerin (1983), who
applied conjoint analysis to the problem of determining profit functions for
alternative push strategies for a margarine manufacturer. They described
each push strategy in terms of four marketing mix variables: coopera-
tive advertising (3 levels described as 3 times at 15 cents/lb.; 4 times at 10
cents/lb.; and 6 times at 7 cents/lb.), coupons in local newspapers (3 levels
described as 2 times at 25 cents/lb., 4 times at 10 cents/lb. and 3 times at 15
cents/lb), financial terms of sale (2 levels described as 2 percent/10 days/net
30 and 2 percent/30 days), and service level defined in terms of percentage
of items shipped that were ordered by the retailer (3 levels described as 96
percent, 98 percent, and 99.5 percent). While the costs for a push strategy
could be computed from internal records of the firm, sales response could
not be estimated from past data. The authors utilized conjoint analysis
to determine the retailers’ sales response to different push strategies. For
this purpose, nine profiles, developed using a partial factorial orthogonal
design, were presented to a sample of 68 buyers and merchandising man-
agers. Each respondent judged the expected change from last year’s sales due to the push marketing mix defined by each profile. All the
retail buyers were classified into small, medium, and large buyers, with
respective levels of past purchases of 5,000, 15,000, and 30,000 cases. The
sales level used in the questionnaires was changed according to the size
of past buying by the retail buyer. The judged sales changes were used in computing the expected sales revenues and profits from each marketing mix, and average partworth values were computed in dollar sales.




Based on this analysis, the authors concluded that the least profitable
marketing mix is cooperative advertising offered three times a year at 15
cents per pound, coupons in newspapers offered two times a year at 25
cents per pound, terms of sale 2 percent/10 days/ net 30, and 96 percent
level of service. The most profitable marketing mix consisted of coopera-
tive advertising six times a year at 7 cents per pound, coupons four times
a year at 10 cents per pound, 2 percent/30 day terms and a 98 percent
service level. Although the particular results are specific to the situation
considered, the application shows how conjoint analysis can be employed
to determine the allocation of a marketing mix budget for a brand.
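The profit comparison itself is simple arithmetic once the judged sales responses and internal costs are in hand. Here is a minimal sketch with hypothetical figures and, for brevity, only two of the four mix variables:

    from itertools import product

    # Each entry: (judged incremental sales in $, program cost in $); hypothetical.
    coop_ad = {"3x @ 15c/lb": (300_000, 90_000), "4x @ 10c/lb": (380_000, 85_000),
               "6x @ 7c/lb": (520_000, 110_000)}
    coupons = {"2x @ 25c/lb": (200_000, 95_000), "4x @ 10c/lb": (360_000, 80_000),
               "3x @ 15c/lb": (280_000, 90_000)}
    margin = 0.25

    def mix_profit(mix):
        """Expected profit of one mix: margin on judged sales minus program costs."""
        return sum(margin * sales - cost for _, (sales, cost) in mix)

    best = max(product(coop_ad.items(), coupons.items()), key=mix_profit)
    print([name for name, _ in best], mix_profit(best))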

Choice of a Distribution Channel for Purchase of a Durable Item

This is based on an empirical study conducted by Foutz, Rao, and Yang (2002); while the authors’ purpose was to test some behavioral decision
theories, we use it simply to show an application of choice-based conjoint
analysis to the problem of an individual choosing an outlet (a conventional bricks-and-mortar store, a catalog, or an internet store) for purchasing a computer monitor. The choice context given to respondents of the study was
as follows:

Place yourself in a situation where you have just settled down in a new city, and
you are thinking of purchasing a new 17-inch computer monitor for yourself, since
you sold the old one when you moved. You have a budget of three hundred U.S.
dollars for this purchase, and you have other uses for any funds left over. You
also wish to get the monitor soon due to the need of some work at hand. After
some initial information search, you have narrowed down to your most favorite
model. Your search has also identified three retailers, each of which is the best
in each of the three channels from which you may consider purchasing the
monitor, bricks & mortar, print catalog, and the Internet/online. Fortunately, all
of them carry the model you want.

All three retailers are described on five attributes – average price, product trial/evaluation, sales assistance, speed of acquiring the purchased monitor, and convenience of acquisition and return – with 3, 2, 3, 3, and 3 levels, respectively. The definitions of the levels are shown in Table 15.3.
This study was conducted among 146 graduate and senior undergradu-
ate students (78 males and 68 females) at a major Northeastern university;
respondents were compensated for their participation in the study. Each
survey took about half an hour and consisted of 11 conjoint choice
tasks on channel choices for the purchase of a computer monitor and
respondents were asked to choose the one option from which he/she would




Table 15.3  Attributes and levels for the computer monitor conjoint study

Attribute Levels
Average price 1. around $230
2. around $250
3. around $270
Product trial/evaluation 1. display only
2. display AND physical/virtual trial
Sales assistance 1. not available
2. only minimal technical support
3. very helpful with rich technical information
Speed of acquiring purchased 1. same day
monitor 2. within 2–7 days
3. longer than 7 days
Acquisition and return 1. in store only
2. mail only
3. in store OR mail

Table 15.4  Attributes and levels for the competitive options in the computer monitor study

                          Bricks and mortar        Print catalog            Internet/online
Average price             Around $270              Around $250              Around $230
Product trial/evaluation  Display AND physical/    Display AND physical/    Display AND physical/
                            virtual trial            virtual trial            virtual trial
Sales assistance          Very helpful with rich   Very helpful with rich   Only minimal
                            technical information    technical information    technical support
Speed of acquiring
  purchased monitor       Same day                 Within 2–7 days          Same day
Acquisition and return    Mail only                In store only            Mail only

actually purchase a monitor. An example of a purchase situation is shown in Table 15.4.
In addition, a short questionnaire was used to collect information
on demographics and other important individual characteristics. The




Table 15.5  Logit estimates for the choice-based conjoint study of channel choice

Attribute and levels    Coefficient    Standard Error    t-value    p-level
Channel:
Bricks and mortar 0.112 0.882 1.27 0.20
Catalog −0.221 0.096 −2.29 0.02
Internet 0
Price:
$230 2.702 0.138 19.57 0.00
$250 1.598 0.129 12.37 0.00
$270 0
Trial and evaluation:
Display only −0.730 0.095 −7.70 0.00
Display and physical trial 0
Sales assistance:
Not available −1.692 0.119 −14.23 0.00
Only minimal technical support −0.763 0.113 −6.71 0.00
Very helpful rich technical 0
information
Speed of acquisition:
Same day 2.000 0.121 16.48 0.00
Within 2–7 days 1.564 0.125 12.46 0.00
Longer than 7 days 0
Acquisition and return:
In store only −0.136 0.106 −1.28 0.20
Mail only −0.873 0.113 −7.70 0.00
In store or mail 0
Log-likelihood of the model −901.15

Rho-square 0.37

Number of observations 1,305

majority of the respondents had more than three years of online experi-
ence (93.8 percent of the 146 respondents) and spent less than 20 hours per
week online (72.4 percent). One-third (32.4 percent) of the respondents
spent less than $200 per year online; another third (37.9 percent) spent
between $200 and $1,000 annually online; the rest of them spent more than




$1,000. Although 64.8 percent of the respondents had purchased computer monitors before, only 20.7 percent claimed that they had adequate techni-
cal knowledge about computer monitors. In addition, 71 percent of the
respondents had purchased from catalogs before.
The choice data were analyzed using a simple multinomial logit model.
The fit of the model, as described by the rho-square (a measure analogous to R-square for multinomial logit analysis), was 0.37; this moderate fit indicates heterogeneity among the respondents. The estimates for the sample as a
whole, shown in Table 15.5, represent average partworth values for the
attributes used in the study; there were few surprises in the partworth
values. After appropriate validation, these estimates can be employed in
identifying the attribute levels deemed important in a new store on any one
of the three distribution channels. We should note that the attribute levels
implied different resource commitments in the design of a store.
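For readers who want to replicate this kind of analysis, the sketch below estimates a multinomial logit by maximum likelihood on simulated choice-based conjoint data; the design, the dummy coding, and the coefficient values are hypothetical simplifications of the study above.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(1)
    n_tasks, n_alts, n_attrs = 1305, 3, 5
    X = rng.integers(0, 2, (n_tasks, n_alts, n_attrs)).astype(float)  # dummy-coded design
    true_beta = np.array([2.0, -0.7, -1.5, 1.8, -0.9])
    utility = X @ true_beta + rng.gumbel(size=(n_tasks, n_alts))      # MNL data process
    y = utility.argmax(axis=1)                                        # chosen alternative

    def neg_loglik(beta):
        v = X @ beta                                   # systematic utilities per task
        v = v - v.max(axis=1, keepdims=True)           # numerical stabilization
        logp = v - np.log(np.exp(v).sum(axis=1, keepdims=True))
        return -logp[np.arange(n_tasks), y].sum()

    fit = minimize(neg_loglik, np.zeros(n_attrs), method="BFGS")
    print(fit.x)   # estimated partworths; they should lie near true_beta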

Conclusion

This chapter has summarized a set of five applications of conjoint analysis to show the versatility of the method. In general, the methodology of
conjoint analysis is extremely useful in conceptualizing and implementing
research for a variety of marketing decision problems. It is the imagination
of researchers that may limit the usefulness of conjoint methods.

Notes

1. This material is drawn from Chapter 9 and Section 8.6.1 of Vithala R. Rao, Applied Conjoint
Analysis, Berlin Heidelberg: Springer Verlag, 2014; used with the permission of Springer.
2. The catering company also sets fixed fees for setting up the catering arrangement and
arranging special banquets, but these were outside the scope of this study.
3. While the authors developed their theory using continuous changes in the attributes, we
use discrete changes for the purpose of exposition.

References

Durvasula, S., S. Sharma, and J. C. Andrews (1992), “STORELOC: A Retail Store Location Model Based on Managerial Judgments,” Journal of Retailing, 68 (4), 420–444.
Foutz, Y. N. Z, V. R. Rao and S. Yang (2002), “Incorporating Reference Effects into
Conjoint Choice Models,” Working paper, Cornell University, March.
Ghosh, A. and S. Craig (1991), “FRANSYS: A Franchise Distribution System Location
Model,” Journal of Retailing, 67 (4), 467–495.
Green, P. E. and V. R. Rao (1971), “Conjoint Measurement for Quantifying Judgmental
Data,” Journal of Marketing Research, 8 (August), 355–363.




Levy, Michael, John Webster, and Roger Kerin (1983), “Formulating Push Marketing
Strategies: A Method and Application,” Journal of Marketing, 47 (Winter), 25–34.
Ofek, E. and V. Srinivasan (2002), “How Much Does the Market Value an Improvement in
a Product Attribute?” Marketing Science, 21 (4), 398–411.
Rao, Vithala R. (2014), Applied Conjoint Analysis. New York: Springer.




Appendix: Brief Discussion of the Conjoint Analysis Method

The methodology of conjoint analysis is similar to that of other methods of marketing research. Once the managerial problem is defined, the researcher
translates it into a research problem and implements it with the conjoint approach. We will describe this in the context of new product design
or product modification for a smartphone company like Samsung. For
this purpose, the researcher first identifies the set of relevant alternatives
(brands) in the product category and then determines their attributes and
different levels (or values) they take. Let us assume that the set of relevant
brands consists of four competing brands: Apple, Samsung, Google, and
LG. Let us assume that this study is planned for Samsung. The attributes
can be several. A preliminary study enables the researcher to choose a
subset of important attributes. This initial study may lead to five attributes
and levels as: style, weight, talk time, camera quality and brand. Talk time
is proxy for battery life. In this study, let us assume that the following
levels (or values) are identified for the attributes:

• Phone style: candy bar, slide phone, flip phone, or touch screen (4 levels);
• Brand: Samsung, Google, Nokia, and LG (4 levels);
• Weight: 100 gm, 115 gm, 130 gm, and 145 gm (4 levels);
• Talk time: 5 hours, 7 hours, 9 hours, and 11 hours (4 levels); and
• Camera quality (in megapixels): 8, 12, 16, 20 (4 levels).

Given these attributes and levels, there can be as many as 1,024 (= 4 × 4 × 4 × 4 × 4) alternative profiles of brands to consider. The conjoint
methodology enables the researcher to reduce this set to 16 profiles
selected according to experimental design procedures (see the OPTEX
algorithm in the SAS system for one such procedure). An example of a
profile is: (Touch screen, Google, 130 gm, 9 hours, 12 Megapixels).
The profiles thus developed are then administered to a sample of respondents to elicit their preferences for each. Basically, there are two procedures for this task; one is called the ratings method, wherein
the respondent evaluates each profile individually. The other is called
choice-based method; this method involves presenting choice sets to the
respondents (each set consisting of four or five profiles similar to the
one illustrated above) and eliciting which option he or she will choose.
The data collected are then analyzed according to a statistical method
(multiple regression for ratings method and logit for choices). These
analyses will yield (part-) utility values for each level of the attributes.


These part utilities will then be used to estimate the overall utility of any
product. Normally, the estimated model is “validated” using additional
data collection.
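To make the ratings-based analysis concrete, the sketch below recovers part-worths by ordinary least squares on dummy-coded profiles. The data are simulated and all names are illustrative; a real study would use each respondent's ratings of the 16 designed profiles.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # hypothetical number of (respondent, profile) ratings

# Randomly drawn profiles over the five attributes described above.
df = pd.DataFrame({
    "style":  rng.choice(["candy", "slide", "flip", "touch"], n),
    "brand":  rng.choice(["Samsung", "Google", "Nokia", "LG"], n),
    "weight": rng.choice([100, 115, 130, 145], n),   # grams
    "talk":   rng.choice([5, 7, 9, 11], n),          # hours of talk time
    "mp":     rng.choice([8, 12, 16, 20], n),        # camera megapixels
})
# Synthetic ratings built from assumed "true" part-worths plus noise,
# standing in for survey responses.
df["rating"] = (0.7 * (df.style == "touch") + 0.5 * (df.brand == "Google")
                + 0.25 * df.talk - 0.012 * df.weight + 0.09 * df.mp
                + rng.normal(0, 0.5, n))

# OLS on dummy-coded attributes: each coefficient is the estimated part-worth
# of an attribute level relative to the omitted baseline level.
fit = smf.ols("rating ~ C(style) + C(brand) + talk + weight + mp", data=df).fit()
print(fit.params)
```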
An example of a utility model is:

U = 0.26*DS1 + 0.74*DS2 + 0.28*DS3 + 0.07*DB1 – 0.04*DB2 + 0.46*DB3 + 0.24*Talk time – 0.012*Weight + 0.09*Megapixels,

where DS1, DS2, and DS3 are dummy variables (taking values of 1 or zero) for the phone styles of slide phone, touch screen, and flip phone, respectively, and DB1, DB2, and DB3 are dummy variables (taking values of 1 or zero) for the brands of Samsung, Nokia, and Google, respectively. This estimated utility model has face validity: the touch screen is preferred to the other phone styles, the Google brand is preferred, a lighter phone is preferred, and more hours of talk time and more megapixels are preferred.
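Plugging the example profile from above (touch screen, Google, 130 gm, 9 hours, 12 megapixels) into this equation gives a quick worked check of how part-worths combine into an overall utility: DS2 = 1 and DB3 = 1, while all other dummies are zero.

```python
# Overall utility of the profile (touch screen, Google, 130 gm, 9 h, 12 MP),
# using the coefficients reported in the utility model above.
utility = 0.74 + 0.46 + 0.24 * 9 - 0.012 * 130 + 0.09 * 12
print(round(utility, 2))  # 2.88
```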
Figure 15A.1 shows the steps involved in the conjoint methodology;
this figure shows only two of the many options available for implementing
conjoint analysis.


[Figure 15A.1 is a flowchart: purpose of the conjoint study → decide on the major approach for implementation → identify product attributes and levels → either the ratings-based path (design profiles → collect preference data → analyze data via regression) or the choice-based path (design choice sets → collect choice data → analyze data via logit) → part-worth functions (utility values for attribute levels) → use results for the study purpose.]

Figure 15A.1  Major steps in a conjoint study



16.  How time series econometrics helped Inofec quantify online and offline funnel progression and reallocate marketing budgets for higher profits
Koen Pauwels

Analytical marketing is not very common in small- and medium-size enterprises in the business-to-business sector. As such, if we had a model or decision support system to enable us to decide how to allocate resources across communication activities and channels, we will have a huge advantage compared to our competitors.
Leon Suijkerbuijk, CEO of Inofec

The company and its challenges

Inofec BV, a family-run European office furniture supplier with about 80 employees, offers an array of over 7,000 SKUs to professional end users. Having just taken over the helm from the company founder (his father), CEO Leon Suijkerbuijk saw a key opportunity for more profitable growth from analyzing Inofec's own financial and marketing data. So far, long-term effects or cross-effects between channels had not been considered, and allocation decisions were based mainly on gut feeling or "that's how we did it last time." Against this background, Leon was looking for another perspective and was willing to adopt a marketing science approach to answer the following specific questions: (1) Do Inofec's marketing communication activities only "feed the funnel" or do they also affect later stages of the purchase funnel? (2) What is the (net) profit effect of their marketing communication activities? In particular, what is the effect of "customer-initiated contacts" versus "firm-initiated contacts"? (3) When does the effect "hit in" and how long does it last? (4) How can Inofec improve its profits by reallocating budgets?
To answer these questions, we (Wiesel, Arts and Pauwels 2011) worked
with the company in the several phases outlined in Figure 16.1. The first
phase consisted of jointly defining the managerial problem and mapping
out the online and offline funnel for this company. The second phase lev-
eraged data from the distinct databases, which turned out to be the most


time-consuming part of the project. In the third phase, we established the right fit among organizational problem, data and methodology, and estimated the time-series model. The fourth phase saw the design and use of an analytic dashboard based on the model estimates, which created the enthusiasm for running a field experiment. The fifth, ongoing phase involves training employees in the use of analytics and further improving the model, dashboard and decision making.

[Figure 16.1 summarizes the collaboration in five phases: (1) defining the managerial problem in collaboration with the company (approx. 3 months); (2) organizing and leveraging data from existing systems, e.g., transaction and marketing databases (approx. 20 months); (3) employing marketing science approaches to derive insights from the data (approx. 3 months); (4) validating insights and discussing strategy: discussing results, designing strategies on data-driven insights, and a field experiment to gain further insights and validate results (approx. 7 months and ongoing); and (5) infrastructure and training to improve decision making with existing information (throughout the whole collaboration and ongoing).]

Figure 16.1  Collaboration process in key phases

mapping out Inofec’s offline and online


purchase funnels

Our conceptual framework (Figure 16.2) focuses on the effect of marketing communication activity on profits, accounting for dynamic effects among purchase funnel stages in both offline and online channels, and feedback effects within and across channels.


[Figure 16.2 is a three-level diagram. Level 1 (marketing): Adwords, email, catalog, fax and flyer activities feed benefits into, and incur costs against, both channels. Level 2 (channels): the online channel (web visits, leads/info requests, quote requests, orders) and the offline channel (leads/info requests, quote requests, orders), connected by cross-channel effects and feedback loops, each generating financial results. Level 3: profits.]

Figure 16.2  Conceptual framework

Marketing Activity: Firm-initiated Contacts and Customer-initiated Contacts

Depicted as level 1 in Figure 16.2, organizations use different marketing communication activities to generate revenue and move customers through the purchase funnel. Broadly speaking, we distinguish "firm-initiated contacts" (FICs) from "customer-initiated contacts" (CICs); the latter require the prospective customer to take an action (e.g., click on an ad) before the company is charged. Inofec had only recently started to spend on CICs in the form of search engine ads (about 13 percent of the total marketing budget), and management was doubtful about the incremental revenues generated. In contrast, it had always spent heavily (about 70 percent of the budget) on direct mail (flyers), followed by fax and email campaigns to prospective customers. Finally, the percent discount given to customers was believed to strongly drive demand.

Channels and Purchase Funnel Stages

Depicted as level 2 in Figure 16.2, customers' channel preferences can switch as they move closer to purchase. For the online funnel, web visits and leads (information requests) signal the beginning of the purchase process. Requests for quotes (via the website) indicate that the prospective customer is evaluating the offer. Finally, orders (via the website) are a straightforward variable representing actual purchase. For the offline funnel, the variables are similar, except that we do not observe an equivalent measure to web visits.

Marketing Effects on Purchase Funnel Stages

Both online and offline marketing activity may ultimately generate profits (level 3 in Figure 16.2) by inducing prospective customers to start or finish their purchase process either online or offline. Customers may search online when the need for office furniture arises, visit the website to ask for information, but then call up the salesforce for the final quote and order (cross-funnel effects). Moreover, a marketing exposure or touch point may increase conversion down the funnel. For instance, exposure to paid-search ads may increase the prospect's familiarity with the brand, while a well-designed catalog in the mail can signal the high quality of the company and its products. Both may increase customer conversion in later stages. In our framework and model, marketing activities can therefore affect the beginning as well as later stages of the purchase funnel.

Organizing and Leveraging the Data

Before the time-series model could be estimated, we had to prepare the data coming from four databases: transactional (order volume, sales price and cost of goods sold), marketing spending, online purchase funnel and offline purchase funnel. The analysis was at the daily level, since marketing actions varied daily and we aimed to identify funnel progression, which typically occurs over a few days. Operationalizing the variables as shown in Table 16.1, our data covered 876 days (over 2.5 years) across 12,000 customers. Leveraging the data for model-free insights, we observed that the online channel was more popular for information requests (online leads exceeded offline leads), but the offline channel was more popular for quote requests and orders. In addition, the average offline order was slightly larger than the average online order.


Table 16.1  Variable operationalization

Marketing activity
  Catalog: Daily cost of catalogs (0 on days with no catalogs sent)
  Fax: Daily cost of faxes (0 on days with no faxes sent)
  Flyers: Daily cost of flyers (0 on days with no flyers sent)
  Adwords: Daily costs of pay-per-click referrals
  eMail: Daily number of net emails (sent minus bounced back)
  Discounts: Percentage of revenue given as a discount
Online funnel
  Web visits: Daily total number of visits to the website
  Online leads: Daily requests for information received via the website
  Online quotes: Daily requests for offers received via the website
  Online orders: Daily number of orders received via the website
Offline funnel
  Offline leads: Daily requests for information received via sales reps, telephone or mail
  Offline quotes: Daily requests for offers received via sales reps, telephone or mail
  Offline orders: Daily number of orders received via sales reps, telephone or mail
Performance
  Sales revenues: Daily sales revenues
  (Gross) profit: Daily revenues minus cost of goods sold

Analysis and results

We extended the persistence modeling approach (Dekimpe and Hanssens 1999) to account for dynamic and cross-channel effects. Specifically, we estimated a vector-autoregressive (VAR) model with 14 regression equations, explaining online (Google Adwords, email) and offline (fax, flyer, catalog and discounts) marketing, online purchase funnel metrics (web visits, online leads, quote requests and orders), offline purchase funnel metrics (offline leads, quote requests and orders) and profits (revenues minus cost of goods sold). As control variables, we included an intercept C, a time trend T, day-of-week seasonal dummy variables (using Friday as the benchmark), and dummy variables for holidays. The model explained 77 percent of the variation in profits (adjusted R² = 0.76).
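As an illustration of the mechanics (not the original implementation), the sketch below estimates such a VAR with statsmodels, assuming the 14 daily endogenous series sit in a pandas DataFrame df with a daily DatetimeIndex and a 0/1 holiday column; all column names are invented for the example.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical column names for the 14 endogenous daily series.
endog_cols = ["adwords", "email", "fax", "flyer", "catalog", "discount",
              "web_visits", "onl_leads", "onl_quotes", "onl_orders",
              "off_leads", "off_quotes", "off_orders", "profit"]

# Deterministic controls: day-of-week dummies (Friday as benchmark),
# a linear trend, and a holiday indicator, mirroring the specification.
exog = pd.get_dummies(df.index.dayofweek, dtype=float)
exog = exog.drop(columns=[4])               # weekday 4 = Friday (benchmark)
exog["trend"] = np.arange(len(df))
exog["holiday"] = df["holiday"].to_numpy()  # assumed 0/1 column in df

results = VAR(df[endog_cols], exog=exog.to_numpy()).fit(maxlags=7, ic="aic")

# Impulse responses over 12 days; element [h, i, j] is the response of
# variable i at horizon h to a one-unit shock in variable j.
irf = results.irf(12)
p, a = endog_cols.index("profit"), endog_cols.index("adwords")
cum_profit_from_adwords = irf.irfs[:, p, a].cumsum()  # cumulative effect
```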


Figure 16.3 shows the estimated impulse response functions, i.e., the profit effects of €1 spent on each of the three main marketing activities. Table 16.2 derives from these figures the total (cumulative) profit effect, including the number of days until the peak effect (the wear-in period) and the total number of days with significant profit effects (the wear-out period).
Catalogs showed no significant profit effects. While faxes achieved their peak impact on the day sent (wear-in of 0), Adwords took one day (wear-in of 1) and flyers took two days (wear-in of 2) to do so. Interestingly, the effect of faxes also wore out quickly, while Adwords and flyers continued to affect purchases for at least one week. In response to Inofec's questions about these differences, we proposed that these temporal patterns were driven by the effect of different marketing activities on different stages of the purchase funnel. Based on restricted impulse response analysis (Pauwels 2004), we estimated the separate effects of each marketing activity on the online and offline funnel stages, as shown in Figure 16.4.

Faxes hardly "feed the funnel" at all: they are unlikely to get the attention of prospective customers early in the purchase funnel. However, they directly increase online information requests and quotes, and offline orders. The latter direct path represents 83 percent of faxes' total profit impact. Because of this direct effect on later funnel stages, the profit impact of faxes materializes and dissipates quickly. Higher spending on Google Adwords both feeds the funnel, in the form of online visits, and increases online quotes and orders, even keeping online visits constant. This illustrates the "billboard" or "inferred quality" effect of Google Adwords: we infer (in the absence of individual-level data, which Google does not share) that high paid-search rankings increase the likelihood that a prospective customer, after having checked and dismissed competitive offerings, progresses towards a purchase. Two-thirds (66 percent) of Google Adwords' impact runs through the visits-to-offline-orders path, explaining the longer wear-in of the profit effect of Adwords versus faxes. Finally, flyers feed both the online and the offline funnels and yield profit through many paths, none of which dominates and all of which yield rather small profit effects in the end. As a result, flyers take longer to wear in and have a smaller total impact on profits than either faxes or Adwords.

Finally, Figure 16.4 shows a clear directionality of cross-channel effects. Offline marketing may affect online funnel metrics, but not vice versa. Conceivably, many prospective customers prefer to start the purchase decision process online, even when they have noticed the firm's offline marketing activities. In contrast, online funnel metrics significantly affect offline funnel metrics, but not vice versa. In other words, some customers move from online to offline as their decision process moves from information to evaluation and finally to action. This is consistent with prospects enjoying the search convenience of the Internet at early stages, and personal contact with salespeople at later stages of the purchase cycle.

[Figure 16.3 contains three panels plotting the profit effect of €1 spent on faxes, Adwords and flyers, respectively, over days 0–12 after the spend; point estimates are shown as solid lines with standard error bands as dotted lines.]

Figure 16.3  Wear-in and wear-out of the marketing activities' profit effects
Table 16.2  Marketing's total profit effect, sales elasticity and its timing in days

Variable         Profit effect    Sales elasticity    Wear-in    Wear-out
Fax (€)               3.33              0.05              0           6
Flyers (€)            0.57              0.04              2           9
Adwords (€)          55.72              4.35              1           9
eMail (each)          0.71              0.12              2           5
Discount (1%)       789                 0.75              0           2

Discussing Strategy Options and Validating Insights in a Field Experiment

Discussing our results, Inofec concluded it is unwise to credit a marketing activity only for orders in 'its' channel, a practice typical for companies with different managers for different channels. This approach would be especially suboptimal for Google Adwords, which obtains 73 percent of its total profit impact from offline orders. In contrast, faxes and flyers obtain only 6 percent and 20 percent of their profit impact, respectively, from the "other" channel. Moreover, managers were surprised to learn that flyers, the activity that consumes 70 percent of the marketing budget, bring in less money than is spent on them. Upon reflection, they attributed this finding not to inherent issues with the marketing channel or its ad message (which is basically the same across channels), but to overspending: when anticipating a sales slump, Inofec had often started sending flyers to contacts on third-party lists of new businesses, many of which are not in the market for its products. In contrast, people searching for "office furniture" online (and then clicking on a paid search ad) have self-revealed to be in the market for such furniture.

The results and subsequent discussion allowed us to design the marketing dashboard in Figure 16.5, which enabled decision makers to perform "what-if" analyses showing the projected profit implications of considered budget changes. Using the dashboard led managers to the risky strategy recommendations of (1) decreasing spending on flyers and (2) increasing spending on Adwords. As to the other actions, managers saw increasing emails as low-cost and relatively risk-free, while they knew that increasing spending on faxes was not feasible due to a new Dutch law

[Figure 16.4 is a set of path diagrams, one per marketing activity (Adwords, flyers, faxes), tracing estimated effects through the online funnel (web visits → online leads → online quotes → online orders) and the offline funnel (offline leads → offline quotes → offline orders) to profit, including cross-funnel paths. The share of each activity's total profit impact arriving via online versus offline orders is 27/73 percent for Adwords, 20/80 percent for flyers, and 6/94 percent for faxes.]

Figure 16.4  How marketing activities affect purchase funnel metrics and profits


against unsolicited faxes. Rather than rolling out the strategy recommendations immediately, we first validated our model in a field experiment.

Figure 16.5  Marketing dashboard showing the projected profits of spending allocations
Specifically, we divided Inofec’s market in four comparable regions and
ran a 2 × 2 field experiment with a base (no changes in the planned flyer
campaigns) and low-spend condition (halving flyers spending), and a
base and high condition (doubling spending) for Adwords. This allowed
us to separately test the impact of reducing spending on the ineffective
marketing action (keeping others constant) – as managers contemplate
in crunch times with cost savings demands, and the impact of increasing
spending on the effective marketing action – as managers contemplate in
boom times with revenue growth demands. After the experiment had run
for three months, we compared daily net profits (net of marketing costs)
with a difference-in-differences approach.1 Table 16.3 shows the results.
While the control conditions saw daily net profits increase by €11 during
the experiment (likely due to increased furniture demand), the experimen-
tal condition applying both recommendations saw profits increase by
€154, i.e, a 14-fold higher profit increase than the status quo. Interestingly,
only applying one part of the recommendation also substantially increased
net profits. When the company’s strategy focused on higher growth,
Inofec could double Adwords without decreasing flyers (yielding €81
more net daily profits). In contrast, when the focus was on efficiency (e.g.,
because budgets were tight or needed for other actions), the company

could simply cut the least efficient activity (flyers) while maintaining spending on Adwords. To validate that our estimated effect sizes would still hold up after such a substantial policy change, we re-estimated our model on the 91 days of data during the experiment, and indeed found similar coefficient estimates. The one exception was that each euro spent on flyers now returned 0.92 euros in the lowest marketing-spend condition. This was consistent with Inofec's explanation that diminishing returns were to blame for the original findings, and it suggested that flyers should not be cut much more.

Table 16.3  Daily net profit changes during the experiment versus before the experiment

                          Adwords
                    High          Base
Flyers   Base     € 81.39       € 10.84
         Low      € 153.71      € 135.45
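The difference-in-differences logic behind these comparisons (see note 1) reduces to simple arithmetic once the before/during profit levels are in hand. A minimal sketch with invented numbers, using the plain subtraction variant of the comparison (the note describes a variant that scales by the national average change):

```python
# Hypothetical average daily net profits (EUR), before vs. during the test.
treated_before, treated_during = 1000.0, 1154.0  # e.g., an experimental region
control_before, control_during = 1000.0, 1011.0  # status-quo regions

treated_change = treated_during - treated_before  # 154.0
control_change = control_during - control_before  #  11.0
# Difference-in-differences: change attributable to the policy after netting
# out seasonal / economy-wide movements common to all regions.
print(treated_change - control_change)            # 143.0
```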

Ongoing learning and organizational impact

This case study changed the organization, as it led Inofec to rethink how it makes decisions. Since its inception, the company had been managed by intuition, so it was unlikely to totally abandon "gut feel" in decision making. Given the complexity of marketing problems, the literature suggests that a combination of marketing analytics and managerial intuition provides the best results for many marketing decisions (Lilien and Rangaswamy 2008). Accordingly, Inofec now uses both scientific approaches and intuition to make its decisions. Moreover, our work became a basis for discussing the operational dimensions of Inofec's marketing activities, affecting the mental models of decision makers throughout the organization (Kayande et al. 2009). We developed a spreadsheet-driven dashboard tool, including a rolling-windows approach to update the model estimates, that allows easy entry of potential marketing allocation plans and then uses the model estimates to project likely profit consequences (Pauwels et al. 2009). Finally, the ongoing training and increasing clout of a new employee in charge of marketing analytics is expected to help institutionalize the marketing-scientific approach to allocating marketing resources, the final step in model adoption according to Davenport (2009). As Inofec's CEO concluded: "We are going to design way more elaborate marketing strategies. In doing so, we will focus on the linkages between online and offline activities, explicitly distinguish the effects, and explore new opportunities due to new technical developments."

Note
1. For each condition, we subtract the gross profits in the three months preceding the experiment from the gross profits in the three months of the experiment, and then scale each condition's profit change by the national average profit change (to control for seasonal and general-economy factors that may boost or depress profits in all conditions).

References

Davenport, T. 2009. Make better decisions. Harvard Business Review (November), 117–123.
Dekimpe, M. G. and D. M. Hanssens. 1999. Sustained spending and persistent response:
A new look at long-term marketing profitability. Journal of Marketing Research 36(4),
397–412.
Kayande, U., A. De Bruyn, G. L. Lilien, A. Rangaswamy, and G. H. van Bruggen. 2009.
How incorporating feedback mechanisms in a DSS affects DSS evaluations. Information
Systems Research 24(4), 527–546.
Lilien, G. L. and A. Rangaswamy. 2008. Marketing engineering: Models that connect
with practice. In B. Wierenga, ed. Handbook of Marketing Decision Models. New York:
Springer Science Business Media, 527–559.
Pauwels, K. H. 2004. How dynamic consumer response, competitor response, company
support and company inertia shape long-term marketing effectiveness. Marketing Science
23(4), 596–610.
Pauwels, K. H., T. Ambler, B. H. Clark, P. LaPointe, D. Reibstein, B. Skiera, B. Wierenga,
and T. Wiesel. 2009. Dashboards as a service: Why, what, how, and what research is
needed? Journal of Service Research 12(2), 175–189.
Wiesel, T., K. Pauwels, and J. Arts. 2011. Practice prize paper: Marketing's profit impact: Quantifying online and off-line funnel progression. Marketing Science 30(4), 604–611.



17.  Panel data models for evaluating the effectiveness of direct-to-physician pharmaceutical marketing activities
Natalie Mizik and Robert Jacobson

The impact of pharmaceutical industry marketing practices is of great interest to policy makers, the business community, and the general public. Direct-to-physician (DTP) marketing activities and their effects on physicians' prescribing behavior have generated heated debates. Many public policy organizations and consumer advocacy groups believe DTP marketing activities compromise physicians' integrity and significantly influence their prescribing decisions. Those who hold this view argue that this influence has a negative impact on patients' welfare because marketing effort induces physicians to prescribe more expensive branded drugs needlessly, even when generic medications are available. The pharmaceutical industry, for its part, does not dispute that its marketing efforts significantly influence physicians' prescribing decisions. But it argues that this influence benefits patients because physicians are provided with valuable information about drugs and, as a result, can make better choices for their patients.

Earlier research assessing the effectiveness of pharmaceutical marketing effort directed at physicians suggested very large effects. These studies generally relied on cross-sectional data and, as such, suffered from the inability to model the dynamics of the marketing impact and to control for unobservable physician-specific effects. The availability of individual-level panel data on detailing (visits by pharmaceutical sales representatives, PSRs) and sampling (free drug samples dispensed by PSRs during the sales call) for a large number of physicians over an extended period of time provides the opportunity to model more accurately and better assess the impact of DTP on physicians' prescribing. Further, these panel data allow us to examine how various modeling choices affect the estimates of DTP effectiveness.


The Data

The dataset comes from Mizik and Jacobson (2004) and covers a 24-month period for a well-established and widely prescribed drug in the primary care category. It contains information on the number of new prescriptions for the studied drug and its competitors issued by 55,896 US-based physicians, and on the focal pharmaceutical firm's detailing and sampling activity during each month. The dataset also contains information about each physician's specialty area.

Modeling the Data

To illustrate the use of panel data methods, we present three sets of analyses. The first set contains models making use of five different panel data estimators for contemporaneous effects of detailing and sampling on prescriptions. The second set of models does not limit the effect of detailing and sampling to be strictly contemporaneous. Rather, these models allow for the fact that the effects of PSR activity are unlikely to be limited to the month in which the visit occurred but may exhibit delayed and/or carryover effects into subsequent months. While allowing for dynamic effects, the models in the second set ignore potential physician-specific heterogeneity in the level of prescribing (i.e., they do not explicitly model physician-specific effects). The third set contains models that allow for both dynamic effects and physician-specific effects, and presents the final, complete model we recommend for these data.

Contemporaneous Effects Models

Table 17.1 provides the results from five different panel data estimators that link monthly prescriptions of the drug to PSR activity taking place during that month. Model 1 is the "population average" estimator, a least-squares analysis of each data point. Model 2 is the "between" estimator, which makes use of the mean values for each physician. Unlike the population average estimator, which uses both time-series and cross-sectional variation to estimate the model, the between model uses only cross-sectional variation. As such, the between estimator is analogous to cross-sectional regressions.

Models 3, 4, and 5 incorporate heterogeneity in physician prescribing. Model 3 is the "random effects" estimator. It allows for physician-specific effects (u_i) but posits any such effects to be uncorrelated with the regressors in the model.



Table 17.1  Contemporaneous effects models‡

                    Model 1       Model 2       Model 3       Model 4       Model 5
                    OLS           Between       RE            FE            FE
                                                              mean-diff     first-diff
Details_it          0.630**       .740**        .145**        .117**        .043**
                    (0.004)       (.029)        (.003)        (.003)        (.003)
Samples_it          0.114**       .337**        .021**        .017**        .006**
                    (0.001)       (.004)        (.000)        (.000)        (.000)
F-statistic         F(45,         F(12,         F(45,         F(35,         F(34,
                    1326205) =    55843) =      1326205) =    1326205) =    1262049) =
                    9455.76       3093.07       808.22        278.23        274.77
Implied total       .630          .740          .143          .117          .043
  detailing effect
Implied total       .114          .337          .021          .017          .006
  sampling effect

Notes:
‡ Model specifications are provided below. Results are presented as estimate (standard error). Time, specialty, and specialty-specific trend estimates are not reported for brevity. The number of observations differs across the models due to the averaging, the taking of first- or mean-differences, and the removal of outliers.
** p-value < 0.01.

Models legend:
Model 1: Prescribe_it = α0 + β0·Details_it + γ0·Samples_it + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it
Model 2 (variables are physician-level means): Prescribe_i = α0 + β0·Details_i + γ0·Samples_i + Σ_{s=1}^{11} κs·Specialty_s + η_i
Model 3: Prescribe_it = α0 + β0·Details_it + γ0·Samples_it + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + (u_i + η_it)
Models 4 and 5: Prescribe_it = α_i + β0·Details_it + γ0·Samples_it + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it



Such an assumption, however, might not be valid: pharmaceutical companies might be directing more detailing and sampling at physicians with higher prescribing levels (e.g., PSRs might be targeting larger practices to promote the drug because larger practices have higher prescribing levels). Model 4 relaxes this assumption and allows a physician-specific effect (α_i) to be correlated with detailing and sampling. It is estimated with the "within" estimator, which analyzes the data taken as deviations from the physician-specific averages (mean-differencing of the data). Model 5, the "first-difference" estimator, also allows a physician-specific effect correlated with the regressors, but removes the fixed effects from the estimating equation through first-differencing of the data.
Under the null hypothesis of no fixed effects, each of the five estimators is consistent, with the random effects estimator being efficient (it is a feasible GLS estimator). Under the alternative hypothesis of a fixed effect, however, only the within and the first-difference estimators (Models 4 and 5) generate consistent estimates of the coefficients.

Table 17.1 shows significant divergence in the coefficient estimates across the five estimators. The effect of PSR activity is largest for the between estimator (.740 for detailing; .337 for sampling) and smallest for the first-difference estimator (.043 for detailing; .006 for sampling).

Under the null hypothesis of no fixed effects, the random effects model is both a consistent and an efficient estimator. Its estimated coefficients can be compared to those of a fixed-effects estimator (the within or the first-difference estimator), which are not efficient but are consistent under both the null (no fixed effects) and the alternative (fixed effects) hypotheses. Model mis-specification (e.g., the presence of a fixed effect correlated with the regressors) would be evidenced by the coefficients from the random effects estimator being statistically different from those of a fixed-effects estimator. This difference is typically assessed with a Hausman test.

A Hausman (1978) test shows a statistically significant difference at the 1 percent level between the random effects estimates and the within estimates. Although the within estimates (.117 for detailing; .017 for sampling) appear similar to the random effects estimates (.145 for detailing; .021 for sampling), the large sample size provides very small standard errors and is thus able to discern significant differences across coefficient estimates.

While the Hausman test allows us to reject the hypothesis that the random effects model (Model 3) is properly specified, it does not indicate the source of mis-specification or confirm that the fixed-effects model is properly specified.
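A minimal sketch of the Hausman statistic itself, assuming the coefficient vectors and covariance matrices from the two estimators are already in hand (names are illustrative):

```python
import numpy as np
from scipy import stats

def hausman(b_consistent, V_consistent, b_efficient, V_efficient):
    """Hausman (1978) statistic comparing a consistent estimator (e.g., the
    within estimator) with an efficient one (random effects). Under the null
    of no correlated fixed effects it is chi-squared with k degrees of freedom."""
    diff = b_consistent - b_efficient
    V_diff = V_consistent - V_efficient  # variance of the difference under H0
    stat = float(diff @ np.linalg.pinv(V_diff) @ diff)  # pinv guards rank problems
    return stat, stats.chi2.sf(stat, df=diff.size)

# e.g., stat, pval = hausman(b_within, V_within, b_re, V_re)
```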
Indeed, comparison of the coefficients from the within estimator (Model 4) with those from the first-difference estimator (Model 5) suggests a mis-specification in the model of contemporaneous marketing effects with fixed effects. Under the null hypothesis that the contemporaneous fixed effects model is correctly specified, the within estimator and the first-difference estimator both provide consistent estimates. But as shown in Table 17.1, the coefficient estimates from the two fixed effects estimators (Models 4 and 5) differ substantially. The first-difference estimator generates coefficients for detailing (.043) and sampling (.006) that are roughly one-third the size of the within estimates (.117 for detailing and .017 for sampling). This discrepancy highlights the presence of a mis-specification in the model of contemporaneous marketing effects with fixed effects that may be attributed to, for example, omitted variable and/or measurement error bias.

Dynamic Models in the Absence of Physician-specific Effects

The fact that the Hausman test rejects the null hypothesis does not necessarily mean that a fixed effect correlated with the regressors is present. Rather, other types of mis-specification may be inducing the significant differences in the model estimates. For example, some time-varying variables may have been omitted from the model and their exclusion may be causing bias.

Indeed, marketing activities can be expected to have an effect not just in the contemporaneous month but also through delayed and/or carryover effects. Further, physician prescribing behavior may exhibit habit persistence, which would induce current prescribing behavior to be related to past prescribing behavior. To assess the presence of these factors (as an alternative to physician-specific effects), Table 17.2 provides the results from three models that include not just contemporaneous marketing effects but also allow for carryover effects and persistence in physician prescribing behavior. Because these models allow for an influence of lagged prescriptions on current-period prescriptions, the assumptions of the random effects models are violated and random effects estimation is not appropriate. Therefore, these models are estimated through ordinary least squares.

Model 6 augments the current-effects specification with one-month-lagged prescriptions to capture habit persistence (i.e., state dependence). Model 6 can also depict carryover effects to the extent that it reflects a geometric decay in marketing effects (i.e., a Koyck distributed lag model). Model 7 adds 12 lags of both detailing and sampling in addition to the contemporaneous effect, so as to model carryover effects explicitly. Model 8 also has 12 lags of detailing and sampling, but includes 12 lags of past prescriptions as well; the lag construction is sketched below.
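Mechanically, building these dynamic specifications amounts to adding lagged columns computed within each physician. A minimal sketch of the data preparation, assuming a long-format DataFrame panel with columns physician, month, prescriptions, details, and samples (names illustrative):

```python
import pandas as pd

def add_lags(panel: pd.DataFrame, var: str, n_lags: int) -> pd.DataFrame:
    """Append lag 1..n_lags of `var`, computed within each physician."""
    g = panel.groupby("physician")[var]
    for j in range(1, n_lags + 1):
        panel[f"{var}_lag{j}"] = g.shift(j)
    return panel

panel = panel.sort_values(["physician", "month"])
for var in ("details", "samples"):
    panel = add_lags(panel, var, 12)           # Models 7 and 8: 12 lags each
panel = add_lags(panel, "prescriptions", 12)   # Model 8 adds 12 outcome lags
panel = panel.dropna()  # drop the initial months that lack full lag histories
```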


Table 17.2  Dynamic models neglecting physician-specific fixed effects‡

                   Model 6            Model 7             Model 8
Details_it         .165 (.003)**      0.060 (0.005)**     0.051 (0.005)**
Details_it−1                          −0.002 (0.006)      0.015 (0.005)**
Details_it−2                          0.007 (0.006)       0.004 (0.005)
Details_it−3                          0.006 (0.006)       −0.001 (0.005)
Details_it−4                          0.006 (0.006)       −0.007 (0.005)
Details_it−5                          −0.005 (0.006)      −0.014 (0.005)**
Details_it−6                          0.002 (0.006)       −0.014 (0.005)**
Details_it−7                          0.017 (0.006)**     0.002 (0.005)
Details_it−8                          −0.000 (0.006)      −0.009 (0.005)
Details_it−9                          0.004 (0.006)       −0.014 (0.005)**
Details_it−10                         0.016 (0.006)**     −0.006 (0.005)
Details_it−11                         0.007 (0.006)       −0.017 (0.005)**
Details_it−12                         0.019 (0.006)**     −0.024 (0.005)**
Samples_it         .030 (.000)**      0.013 (0.001)**     0.011 (0.001)**
Samples_it−1                          0.004 (0.001)**     0.005 (0.001)**
Samples_it−2                          0.005 (0.001)**     0.002 (0.001)**
Samples_it−3                          0.005 (0.001)**     0.001 (0.001)
Samples_it−4                          0.005 (0.001)**     0.001 (0.001)
Samples_it−5                          0.004 (0.001)**     0.000 (0.001)
Samples_it−6                          0.004 (0.001)**     0.000 (0.001)
Samples_it−7                          0.003 (0.001)**     −0.002 (0.001)**
Samples_it−8                          0.005 (0.001)**     −0.001 (0.001)
Samples_it−9                          0.004 (0.001)**     −0.001 (0.001)
Samples_it−10                         0.003 (0.001)**     −0.003 (0.001)**
Samples_it−11                         0.005 (0.001)**     −0.001 (0.001)*
Samples_it−12                         0.007 (0.001)**     0.001 (0.001)*
Prescribe_it−1     .739 (.001)**      0.706 (0.001)**     0.276 (0.001)**
Prescribe_it−2                                            0.182 (0.001)**
Prescribe_it−3                                            0.134 (0.001)**
Prescribe_it−4                                            0.085 (0.001)**
Prescribe_it−5                                            0.048 (0.001)**
Prescribe_it−6                                            0.053 (0.001)**
Prescribe_it−7                                            0.034 (0.001)**
Prescribe_it−8                                            0.017 (0.001)**
Prescribe_it−9                                            0.016 (0.001)**
Prescribe_it−10                                           0.010 (0.001)**
Prescribe_it−11                                           0.012 (0.001)**
Prescribe_it−12                                           0.031 (0.001)**
F-statistic        F(45, 1269064) =   F(58, 631105) =     F(69, 628151) =
                   55579.7            18740.0             20583.7
Implied total      0.632              0.464               −0.333
  detailing effect
Implied total      0.116              0.235               0.139
  sampling effect



Table 17.2  (continued)

Notes:
‡ Model specifications are provided below. Results are presented as estimate (standard error). Time, specialty, and specialty-specific trend estimates are not reported for brevity. The number of observations differs across the models due to the inclusion of lagged terms and the removal of outliers.
** p-value < 0.01, * p-value < 0.05.

Models legend:
Model 6: Prescribe_it = α0 + β0·Details_it + γ0·Samples_it + φ1·Prescribe_it−1 + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it
Model 7: Prescribe_it = α0 + Σ_{j=0}^{12} βj·Details_it−j + Σ_{j=0}^{12} γj·Samples_it−j + φ1·Prescribe_it−1 + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it
Model 8: Prescribe_it = α0 + Σ_{j=0}^{12} βj·Details_it−j + Σ_{j=0}^{12} γj·Samples_it−j + Σ_{j=1}^{12} φj·Prescribe_it−j + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it

Model 6 shows implied total effects of detailing (.632) and sampling (.116) very similar to those of the population-average current-effects model (Model 1), but it attributes the effect not solely to the month of the PSR activity.1 Rather, the model depicts smaller current-term effects of .165 for detailing and .030 for sampling that persist at a monthly rate of .739 (i.e., dissipate at a monthly rate of .261). One advantage of Model 6 is that imposing a geometric decay structure for habit persistence allows for a parsimonious model of possible carryover effects. However, this parsimony is not required when a sufficient number of observations is available, and it may in fact come at the cost of accuracy (as the imposed structure may not accurately reflect the data).

Model 7 models the delayed and carryover effects directly by adding 12 monthly lags of detailing and sampling. The results from Model 7 show that the constraints imposed by Model 6 are not reflective of the data. The pattern of the estimates for the lagged effects of detailing and sampling shows that the assumption of geometric decay implicit in a Koyck specification does not hold: marketing effects are not decaying geometrically from the current period. In fact, it appears that the effects of sampling do not dissipate at all and remain relatively constant over the 12 months. The total implied effect of detailing (.464) decreases notably compared to the Model 6 specification, while the total implied effect of sampling (.235) more than doubles.
Model 8 models the delayed and carryover effects of the marketing activities directly, and captures higher-order state dependence in prescribing behavior by adding 12 monthly lags of past prescribing. The estimated total effect of sampling from Model 8 (.139) is close to that estimated in Model 6. However, the effect of detailing is markedly different: the implied total effect of detailing in Model 8 is negative (−.333). Negative lagged effects of detailing overwhelm the positive effects taking place in the initial couple of months.

The fact that statistically significant effects of past prescribing are present for all lags in Model 8 might suggest the need for the inclusion of even higher-order lags in the model. Alternatively, these significant higher-order lag effects may stem from unmodeled physician-specific effects. To the extent that physician-specific effects correlated with the regressors are present in the data, the coefficient estimates will be biased and inconsistent.

Dynamic Panel Data Models with Physician-specific Effects

The models presented in Table 17.3 allow for the presence of fixed physician-specific effects correlated with the regressors. Model 9 augments the Model 6 state-dependence specification with the inclusion of a fixed effect. The within (mean-difference) estimator used to estimate fixed-effects Model 4 is no longer appropriate, as this estimator generates downward-biased estimates for the lagged dependent variable (Nickell 1981). However, the first-difference estimator provides an approach both for controlling for fixed effects and for obtaining consistent estimates for the lagged dependent variable. Taking first differences of the data removes the fixed effects from the estimating equation. But it also induces correlation between the lagged dependent variable and the error term: ΔPrescribe_it−1 will be correlated with the differenced error term (η_it − η_it−1) by construction. As such, least squares estimation of a first-difference model with a lagged dependent variable would generate biased estimates. An instrumental variable approach can be used to generate consistent estimates. Following Anderson and Hsiao (1982), we use lagged values of the levels of the series (values at time period t−2 and earlier) to generate instrumental variable estimates for ΔPrescribe_it−1. This procedure generates consistent (i.e., asymptotically unbiased) estimates of the parameters and their standard errors.
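A minimal numpy sketch of this Anderson–Hsiao step for the just-identified case, assuming the arrays have already been first-differenced and aligned within physician (with initial periods dropped); dX stands in for the differenced detailing, sampling, and control terms:

```python
import numpy as np

def anderson_hsiao_iv(dy, dy_lag1, y_level_lag2, dX):
    """IV (2SLS) for the first-differenced dynamic panel model: instrument the
    differenced lag dPrescribe_{t-1}, which is correlated with the differenced
    error by construction, with the level Prescribe_{t-2} (Anderson and Hsiao
    1982). Returns [constant, coefficient on dPrescribe_{t-1}, coefficients on dX]."""
    n = len(dy)
    W = np.column_stack([np.ones(n), dy_lag1, dX])       # regressors
    Z = np.column_stack([np.ones(n), y_level_lag2, dX])  # instruments
    return np.linalg.solve(Z.T @ W, Z.T @ dy)            # b_IV = (Z'W)^-1 Z'y
```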


Table 17.3  Dynamic fixed effects models‡

                   Model 9            Model 10            Model 11
ΔDetails_it        0.042 (0.004)**    0.053 (0.005)**     .054 (.005)**
ΔDetails_it−1                         0.027 (0.006)**     .033 (.006)**
ΔDetails_it−2                         0.021 (0.006)**     .026 (.006)**
ΔDetails_it−3                         0.020 (0.007)**     .023 (.007)**
ΔDetails_it−4                         0.012 (0.006)**     .014 (.006)*
ΔDetails_it−5                         0.002 (0.006)       .002 (.006)
ΔDetails_it−6                         −0.002 (0.005)      −.001 (.005)
ΔSamples_it        0.005 (0.000)**    0.006 (0.001)**     .006 (.0006)**
ΔSamples_it−1                         0.002 (0.001)**     .003 (.0008)**
ΔSamples_it−2                         0.001 (0.001)       .002 (.0009)*
ΔSamples_it−3                         0.002 (0.001)       .002 (.0009)*
ΔSamples_it−4                         0.002 (0.001)       .002 (.0009)**
ΔSamples_it−5                         0.002 (0.001)*      .002 (.0008)**
ΔSamples_it−6                         0.001 (0.001)       .001 (.0006)*
ΔPrescribe_it−1§   0.023 (0.002)**    0.161 (0.007)**     .208 (.008)**
ΔPrescribe_it−2                       0.101 (0.005)**     .143 (.006)**
ΔPrescribe_it−3                       0.069 (0.004)**     .099 (.004)**
ΔPrescribe_it−4                       0.032 (0.003)**     .060 (.003)**
ΔPrescribe_it−5                       0.002 (0.002)       .012 (.002)**
ΔPrescribe_it−6                       0.004 (0.001)**     .007 (.001)**
ΔCompet_it§                                               .738 (.050)**
ΔCompet_it−1                                              −.022 (.0005)**
ΔCompet_it−2                                              −.014 (.0007)**
ΔCompet_it−3                                              −.014 (.0006)**
ΔCompet_it−4                                              .0014 (.0011)
ΔCompet_it−5                                              .005 (.0009)**
ΔCompet_it−6                                              −.001 (.0005)**
F-statistic        F(28, 873577) =    F(45, 851340) =     F(52, 851166) =
                   210.14             140.12              169.34
Implied total      0.043              0.211               0.321
  detailing effect
Implied total      0.005              0.024               0.039
  sampling effect

Notes:
‡ Model specifications are provided below. Models are estimated in first differences. Results are presented as estimate (standard error). Time and specialty effects estimates are not reported for brevity. The number of observations differs across the models due to the taking of first differences, the inclusion of lagged terms, and the removal of outliers.
** p-value < 0.01, * p-value < 0.05.
§ Instrumental variable estimate utilized.



Table 17.3  (continued)

Models legend:
Model 9: Prescribe_it = α_i + β0·Details_it + γ0·Samples_it + φ1·Prescribe_it−1 + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it
Model 10: Prescribe_it = α_i + Σ_{j=0}^{6} βj·Details_it−j + Σ_{j=0}^{6} γj·Samples_it−j + Σ_{j=1}^{6} φj·Prescribe_it−j + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it
Model 11: Prescribe_it = α_i + Σ_{j=0}^{6} βj·Details_it−j + Σ_{j=0}^{6} γj·Samples_it−j + Σ_{j=1}^{6} φj·Prescribe_it−j + Σ_{j=0}^{6} λj·Competitor_it−j + Σ_{t=1}^{T} δt·Time_t + Σ_{s=1}^{11} κs·Specialty_s + Σ_{s=1}^{11} ωs·Specialty_s·Trend_t + η_it

Column 1 of Table 17.3 reports the estimation results for Model 9. The estimated coefficients are markedly different from those of Model 6 (the habit-persistence model without fixed effects). The current-period effects of detailing (.042) and sampling (.005) are significantly lower, as is the coefficient for lagged prescriptions (.023). That is, Model 6 mistakes the unmodeled fixed effect for persistence. Since under the null hypothesis of no fixed effects the two models should both yield consistent coefficient estimates, the null hypothesis of no fixed effects can be rejected. Since the estimated effect of prescriptions lagged one month is very small (.023), the implied totals from Model 9 are virtually indistinguishable from those of the contemporaneous fixed effects Model 5.

Model 10 augments Model 9 by including additional lagged terms of detailing, sampling, and past prescriptions in the specification. Unlike the results of Model 8, the inclusion of the physician-specific fixed effect shows the effects of detailing, sampling, and lagged prescriptions dissipating and all but vanished for lags greater than 6 months. The difference in estimated coefficients between Model 10 and Model 9 highlights the importance of including additional lagged values of the series. The estimated implied total effects of detailing (.211) and sampling (.024) are approximately five times larger than those reported in Model 9.

The effects of lagged detailing, sampling, and prescriptions shown in Model 10 indicate one reason (omitted variable bias) that can account for the difference in coefficient estimates between the mean-difference (Model 4) and first-difference (Model 5) estimators in Table 17.1. An additional potential consideration is the role of measurement error.

Measurement error in the regressors attenuates effects (i.e., creates a bias toward zero in estimated coefficients), and its influence can be particularly pronounced in fixed-effects panel data models. In these models, because "signal" is removed via, for example, taking mean differences or first differences, the effect of measurement error "noise" can become more pronounced (Griliches and Hausman 1986).2 The use of a long-difference estimator (transforming the variables as X_it − X_it−j, with j > 1) provides a mechanism to lessen the inconsistency (attenuation) in coefficient estimates. With a diminishing autocorrelogram, less "signal" is removed through longer-lag differencing, and thus the inconsistency caused by measurement error is reduced. Estimating Model 10 through the use of seven-month differencing (i.e., X_it − X_it−7) generates results in very close correspondence to those reported in Table 17.3, Model 10 (the first-difference estimator). As such, we can rule out measurement error as a substantial cause of the variation in the Table 17.1 models and instead link the variation to omitted variable bias attributable to fixed effects and time-varying effects.
Model 10 can be further enhanced by including current and lagged competitor prescriptions in the analysis. To the extent that own and competitor prescriptions at the individual physician level are correlated, omitting competitor prescriptions would lead to biased estimates of own prescribing and to erroneous conclusions about the total effects of detailing and sampling.
The final, complete Model 11 includes both lagged own prescriptions and lagged competitors' prescriptions and is able to separate the total demand dynamics into two key components: competitive substitution and own demand growth. Lagged own prescriptions reflect persistence and have a positive effect on current prescriptions. Lagged competitors' prescriptions have a negative effect on current prescriptions, as they capture substitution effects, i.e., physicians making choices among competing drugs. Current competitors' prescriptions, however, may have either a positive or a negative effect because they capture two different phenomena with opposite effects. In addition to the negative substitution effect, current competitors' prescriptions also reflect the positive effect of changes in total demand due to overall market expansion or contraction (i.e., own and competitor sales moving in the same direction because of industry-wide effects). As such, the current-term coefficient (λ0) depends on the relative magnitude of the two conflicting effects, and its sign cannot be postulated a priori.3
The estimated effects for detailing and sampling in Model 11 are similar to those estimated for Model 10. The difference arises in the estimated coefficients for lagged own prescriptions. Each of the lagged prescription coefficients in Model 11 is higher than its counterpart in Model 10. Because of this, the estimated implied total effects of detailing (.321) and sampling (.039) are larger in Model 11 than in Model 10. As expected, and consistent with brand switching, we observe negative effects for lagged competitor prescriptions. The inclusion of these competitive effects is important not only in helping explain new prescriptions, but also in allowing us to better isolate the degree of persistence in physician behavior. That is, because competitor prescriptions are correlated with own prescriptions, failure to model these competitive effects results in biased estimates of the autocorrelation coefficients and, as a result, biased estimates of the total detailing and sampling effects.
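The implied totals reported in Table 17.3 follow from the formula in note 1: the sum of the marketing-mix lag coefficients divided by one minus the sum of the own-prescription lag coefficients. A quick arithmetic check using the Model 11 detailing estimates:

```python
# Model 11 coefficients from Table 17.3 (detailing lags 0-6 and
# own-prescription lags 1-6).
beta_detail = [0.054, 0.033, 0.026, 0.023, 0.014, 0.002, -0.001]
phi_own     = [0.208, 0.143, 0.099, 0.060, 0.012, 0.007]

total = sum(beta_detail) / (1 - sum(phi_own))
print(round(total, 3))  # 0.321, matching the reported implied total detailing effect
```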

Conclusion

Panel data studies are increasingly used as researchers have come to appreciate the additional insights they offer compared to cross-sectional studies and the estimation precision they achieve compared to time series. Effective panel data analysis involves understanding both heterogeneity across units and time-series dynamics. Carefully comparing estimation results across various estimators and models gives researchers a mechanism to better model effects and understand the nature of the underlying relationships in the data.

Notes

1. The total effects of detailing and sampling can be calculated as Σ_{j=0}^{J} βj / [1 − Σ_{l=1}^{L} φl] and Σ_{k=0}^{K} γk / [1 − Σ_{l=1}^{L} φl], respectively.
2. Conversely, the effects of measurement error may also be reduced in a fixed effects panel data estimator to the extent that the measurement error is autocorrelated.
3. Just as substitution effects cause competitor prescriptions to influence own prescriptions, own prescriptions will influence the amount of competitor prescriptions. To account for this simultaneity, we make use of instrumental variable estimation, using lagged values of the levels of the series (values at time period t−2 and earlier) to generate instrumental variable estimates for ΔCompetitor_it.

References

Anderson, T.W. and Cheng Hsiao (1982), “Formulation and Estimation of Dynamic Models
Using Panel Data,” Journal of Econometrics, 18, 47–82.
Griliches, Zvi and Jerry A. Hausman (1986), “Errors in Variables in Panel Data,” Journal of
Econometrics, 31, 93–118.


Hausman, Jerry A. (1978), "Specification Tests in Econometrics," Econometrica, 46 (6), 1251–1271.
Mizik, Natalie and Robert Jacobson (2004), "Are Physicians 'Easy Marks'? Quantifying the Effects of Detailing and Sampling on New Prescriptions," Management Science, 1704–1715.
Nickell, Stephen (1981), "Biases in Dynamic Models with Fixed Effects," Econometrica, 49, 1417–1426.



18.  A nested logit model for product and transaction-type choice for planning automakers' pricing and promotions
Jorge Silva-Risso, Deirdre Borrego and Irina Ionova

Price promotions play an important role in the marketing mix plans of most companies, and especially of US automakers. Pricing promotion decisions are particularly vital given variation in capacity utilization, the long cycle to design and initiate production of new vehicles, and the numerous tools available to customize automotive pricing (cash incentives, promotional financing, and leasing).

At the time our research was developed, there had been relatively little work characterizing price and promotion responsiveness in durable goods markets, and particularly in the automobile market. Building on the extant literature, we develop a consumer response model to evaluate and plan pricing and promotions in such durable-goods markets. We discuss its implementation in the US automotive industry and illustrate the model through an empirical application on a sample of data drawn from J.D. Power transaction records in the entry SUV segment. Finally, we discuss an example of an actual implementation. We offer insights into the underlying drivers of consumer heterogeneity in preferences for the promotion types used to customize prices for a durable product such as an automobile. Differences from frequently purchased products in data, consumer decisions, and the long inter-purchase interval necessitate a specific model structure (see Silva-Risso et al. 1999). First, consumers choose from a menu of alternative price promotions. Second, consumers also choose how to structure their car acquisition (e.g., purchase or lease, and how long to finance). Third, a new-car acquisition may involve the trade-in of a used car, which adds complexity to pricing. Fourth, except for information about the product traded in, the transaction data available for modeling do not contain any information about consumers' previous purchase history. Fifth, with a few exceptions, retailers sell only one brand, so product and price comparisons must be performed across stores. Sixth, automakers are constrained to offer the same pricing and promotional

conditions to all their dealers in a local market (i.e., they cannot alternate sales promotions among retailers in a local market).

There are two main findings. First, in durable goods markets, consumers are heterogeneous with respect to transaction types as well as brand preferences. Second, consumers are heterogeneous in their relative sensitivity to the different pricing instruments, not just in their overall price sensitivity. Thus, some consumers are more responsive to a cash discount, others to a reduced interest rate, and so on. Hence, price discounts of the same magnitude may lead to different effects, depending on which instruments are used and the idiosyncratic price sensitivities of the target consumers.

A menu of pricing options tends to be most profitable, given the constraint of blanket pricing. The best combination of pricing instruments and their respective levels depends on the consumers' transaction-type preferences and price sensitivities in the target market. Hence, a profit-maximizing manufacturer needs to find the "optimal" structure for its pricing program, not just an overall "optimal" price level.

Modeling Objective and Specification

Our modeling objective is to develop a decision support system that would help automobile manufacturers increase the effectiveness and efficiency
of their pricing and other marketing activities. The modeling approach
leveraged the extant literature on response models, but also took into
account the unique properties of the data and the product category. The
Power Information Network (PIN) database (see Silva-Risso and Ionova
2008) captures all the transactions recorded at each participating dealer
and does not rely on panels, which may significantly differ from the
overall population. However, in contrast to scanner panel data, the long
inter-purchase times in the automobile industry result in having only one
observation per buyer in the sample. Instead of having a history of pur-
chases and shopping trips, the only information available about previous
consumer purchases is the vehicle the consumer traded in (and only in the
40 percent of cases where there is a trade-in). Thus, for transactions with
a trade-in, we capture observed heterogeneity through variables similar to
the “last brand” variable used in several CPG scanner panel data models
(e.g. Bucklin and Lattin 1991). It should be noted, though, that, in the
several years since their typical last-car purchase, car buyers are likely to
have changed their preferences and needs.
The acquisition of a car involves multiple consumer decisions: the
choice of a product (vehicle model, such as Honda Accord), whether to
purchase or lease (cf. Dasgupta et al. 2007), and the term of the financing


contract (e.g., 36, 48, 60, 72 months). Furthermore, automakers offer a menu of promotional programs (sales incentives) from which the con-
sumer may choose, e.g., customer cash rebates (cash discounts paid by the
manufacturer), promotional interest rates (with a schedule for different
terms), or lease “support.” Some of these programs can be combined
(e.g., in some cases automakers offer reduced interest rates in addition to
a cash rebate). Consumer response models need to include these decisions
and measure the effects of the multiple marketing offerings available to
consumers.
Modeling the consumer decision of transaction type is important for
several reasons. First, some promotional programs are structured to
increase or decrease the penetration of specific types of transactions. For
example, a manufacturer may want to increase (or decrease) the propor-
tion of leases. In some cases, the objective is to shorten the financing
period with promotional programs that target shorter-term contracts (e.g.,
a substantially lower interest rate for 36 or 48 months compared with 60
months or longer). Second, because promotional programs may affect the
penetration of the different types of transactions, a good prediction of
these changes is necessary for cost[1] and profit estimation.
New-car retailing is different from other product categories in that it
is based on a heavily regulated franchise system. Franchised-car retailers
(dealers) sell only vehicles of one automaker.[2] Furthermore, within the
same local market (e.g., DMA), car manufacturers must offer exactly the
same pricing and promotional conditions to all their dealers. Additionally,
all new car sales or leases have to be processed by a franchised dealer. State
laws prevent automakers from selling directly to consumers, discounters,
or wholesalers. Hence, we need to take into account that local markets are
the finest geographical unit for price customization, and that all retail sales
should be channeled through franchised dealers.
Our approach is based on a random effects multinomial nested logit
model of product (vehicle model, such as Hyundai Tucson), acquisition
(cash, finance with multiple terms, lease) and program-type choice (e.g.,
customer cash rebate, promotional APR, cash/promotional APR combi-
nation), see the model structure in Figure 18.1. Geographic location plays
an important role in segmenting consumer preferences in the automobile
industry. Consumers in California, for example, are more likely than
those living in the Midwest to purchase Japanese brands. Buyers in rural
areas are more likely than those in urban areas to purchase pickup trucks.
Furthermore, as mentioned before, other factors, such as state-specific franchise laws, constrain manufacturers to offer the same pricing and promotional conditions to all retailers (dealerships) in the same local market. Assessing the price and promotion response of a geographical area is, therefore, an analytically convenient and managerially useful basis on which to develop a promotional planning system.
Our approach to overcoming the lack of purchase histories at the indi-
vidual level is to estimate choice model parameters at a DMA level using a
hierarchical Bayes structure. We specify a panel structure (see Rossi et al.
1996; Rossi and Allenby 2003) where the units of analysis are local mar-
kets (DMAs).[3] Car manufacturers typically set promotional programs at
the national or regional level, and customize those programs for specific
local markets (e.g., New York). Region definitions are specific to each
manufacturer. However, because regions are a set of local markets, DMA-
level coefficients allow us to estimate program effects at the desired level
of analysis: local market (DMA), region, and national, for all automakers.[4]
Most implementations of the model have been at the national level, in
which we structure the prior distribution of the DMA-level parameters to
be distributed around an overall national mean. However, in some cases,
automakers are interested in focusing just on one or a few regions. In that
case, the DMA-level parameters are structured to be distributed around
that specific region mean.
The basic building block of our modeling approach is a nested logit[5]
model of automobile and transaction-type choice behavior in which the
utility of a particular vehicle is a function of the marketing mix and other
transaction-specific variables (Figure 18.1). In this model, the first stage
of the hierarchical Bayes structure is a nested logit choice model in which
the probability that consumer h in DMA m chooses automobile i and transaction type \tau at time t is given by:[6]

    P^h_{tm}(i, \tau) = P^h_{tm}(\tau \mid i) \, P^h_{tm}(i)    (18.1)

where the probability of choosing transaction type \tau, conditional on automobile i, at time t is given by:

    P^h_{tm}(\tau \mid i) = \frac{\exp(U^h_{tm,i\tau})}{\sum_{\tau'} \exp(U^h_{tm,i\tau'})}    (18.2)

with the utility of transaction type \tau given by:

    U^h_{tm,i\tau} = \alpha_{m,i\tau} + \beta_{m,\tau} X^h_{tm,i\tau}    (18.3)

where \alpha_{m,i\tau} are transaction-type-specific intercepts to be estimated, X^h_{tm,i\tau} is a vector of consumer-specific and marketing variables, and \beta_{m,\tau} is a vector of parameters to be estimated.
In turn, the probability of choosing automobile i is given by:

    P^h_{tm}(i) = \frac{\exp(V^h_{tm,i})}{\sum_{k} \exp(V^h_{tm,k})}    (18.4)

with the utility of automobile i for consumer h, in local market (DMA) m, at time t given by:

    V^h_{tm,i} = \delta_{m,i} + \gamma_m Y^h_{tm,i} + \nu_m \ln\Big( \sum_{\tau'} \exp(U^h_{tm,i\tau'}) \Big)    (18.5)[7]

where \delta_{m,i} are product-specific intercepts to be estimated, Y^h_{tm,i} is a vector of consumer-specific and marketing variables, \gamma_m is a vector of parameters to be estimated, and \nu_m is the nested logit dissimilarity coefficient[8] to be estimated.

[Figure 18.1  Nested logit model structure. For household h in region m at time t: first the vehicle model i = 1, . . . , k is chosen; then the transaction type / finance term \tau (lease, cash, or dealer financed); dealer-financed transactions branch into a stand-alone rebate with market-rate financing, a stand-alone promotional APR program, and a rebate/APR combo program, each over terms of 24 to 72 months at promotional or market rates.]
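As a concrete illustration of equations (18.1)–(18.5), the following sketch evaluates the two-level choice probabilities for one consumer in one DMA; the utilities and the dissimilarity coefficient are made-up inputs, not estimates from the chapter.

```python
import numpy as np

def nested_logit_probs(V_bar, U, nu):
    """Two-level nested logit of equations (18.1)-(18.5), as a sketch.

    V_bar : (I,) non-nest part of vehicle utility, delta_{m,i} + gamma_m * Y
    U     : (I, T) transaction-type utilities within each vehicle nest
    nu    : dissimilarity coefficient nu_m, expected in [0, 1]
    Returns the (I, T) joint probabilities P(i, tau).
    """
    iv = np.log(np.exp(U).sum(axis=1))                # inclusive values, eq. (18.5)
    V = V_bar + nu * iv                               # vehicle utilities
    p_vehicle = np.exp(V) / np.exp(V).sum()           # eq. (18.4)
    p_type = np.exp(U) / np.exp(U).sum(axis=1, keepdims=True)  # eq. (18.2)
    return p_vehicle[:, None] * p_type                # eq. (18.1)

# Toy example: 3 vehicles, 4 transaction types.
rng = np.random.default_rng(0)
P = nested_logit_probs(rng.normal(size=3), rng.normal(size=(3, 4)), nu=0.7)
assert np.isclose(P.sum(), 1.0)
```

Note that with \nu_m = 1 this collapses to a standard multinomial logit over all (i, \tau) pairs, consistent with note 8 below.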
In the second stage of the hierarchical structure we specify a multivariate normal prior over the DMA parameters \alpha_{m,i\tau}, \delta_{m,i}, \beta_{m,\tau}, \gamma_{m,i}, \nu_m:

    (\alpha_{m,i\tau}, \delta_{m,i}, \beta_{m,\tau}, \gamma_{m,i}, \nu_m) \sim \mathrm{MVN}(\mu_n, \Sigma_n).    (18.6)

Finally, in the third stage the national mean is assumed to come from a distribution defined by the hyperpriors as follows:[9]

    \mu_n \sim \mathrm{MVN}(\eta, C),    (18.7)

    \Sigma_n^{-1} \sim \mathrm{Wishart}((rR)^{-1}, r).    (18.8)
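To make the hyperprior settings concrete, here is a sketch that simulates from the prior only, using the values described in note 9; the parameter dimension and DMA count are illustrative, and the chapter's actual estimation draws from the posterior via Gibbs or Metropolis-Hastings steps.

```python
import numpy as np
from scipy.stats import multivariate_normal, wishart

k = 5                       # number of DMA-level parameters (illustrative)
r = k + 1                   # Wishart degrees of freedom (note 9)
R = np.eye(k)               # prior scale matrix, the identity (note 9)
eta, C = np.zeros(k), 1000.0 * np.eye(k)   # diffuse hyperprior (note 9)
rng = np.random.default_rng(1)

# Third stage: national-level mean and covariance, eqs. (18.7)-(18.8).
mu_n = multivariate_normal(mean=eta, cov=C).rvs(random_state=rng)
Sigma_inv = wishart(df=r, scale=np.linalg.inv(r * R)).rvs(random_state=rng)
Sigma_n = np.linalg.inv(Sigma_inv)

# Second stage: one parameter vector per DMA, eq. (18.6).
theta = multivariate_normal(mean=mu_n, cov=Sigma_n).rvs(size=22, random_state=rng)
print(theta.shape)          # (22, 5): 22 Western-region DMAs
```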

Empirical Illustration
We illustrate the modeling approach with an empirical application to entry-level SUVs in the Western region[10] (Arizona, California, Hawaii, Idaho, Nevada, Oregon, Washington).
The PIN database has data from 22 DMAs in the Western region. Note
that this empirical application does not correspond to any actual client
implementation. Confidentiality prevents us from publishing details of
actually implemented models. However, this illustration is realistic in that
it follows the current model methodology used in the implemented models.

Data Description

The main data source is new car sales transactions collected by the Power
Information Network, a division of J.D. Power and Associates. PIN collects sales transaction data from a sample of dealerships in the major metropolitan areas in the United States. These are retail transactions, i.e., sales or leases to final consumers, excluding fleet sales.[11] Each observa-
tion in the PIN database contains the transaction date, the manufacturer,
model year, make, model, trim and other vehicle information, the trans-
action price, consumer rebates, the interest rate, term, amount financed
(when the vehicle is financed or leased), etc.
We complemented the sales transactions with a database, compiled by J.D. Power, containing full details of the promotional programs (incentives) offered by automakers. For example, this database contains details of
the term structure of promotional APRs (e.g., 1.9 percent for 24 months,
2.9 percent for 36 months, 3.9 percent for 48 months and 4.9 percent for
60 months), several types of dealer and customer cash programs (e.g.,
loyalty, captive, conquest), etc. The data are also augmented with demographics, by linking PIN transactions with updated census data at the block-group level (see Scott Morton et al. 2001 for more details).

Transaction Types

Auto sales transactions are typically classified into three categories: (1) cash, which are those transactions in which the consumer purchased the
vehicle, but did not arrange financing through the dealer; (2) finance, if
the consumer buys a car and finances it through the dealer; and (3) lease,
if the consumer contracts a lease instead of purchasing the car. For price
promotion planning and budgeting we need to estimate the proportion
of consumers who choose each promotion type. Hence, these three basic
transaction types need to be expanded to include the specific type of pro-
motion the consumer opted for (Figure 18.1).
The three basic types of promotions are customer cash rebates, reduced
interest rate finance programs, and lease promotions. Those programs are
commonly offered as alternatives that cannot be combined. For example,
an automaker may offer consumers the option of taking $2,000 in
customer cash rebate, or promotional financing with rates of 0.9 percent,
1.9 percent, 2.9 percent and 3.9 percent for 24, 36, 48 and 60 months, or
a reduction of $30 in monthly lease payments. Consumers can choose to
take the customer cash (rebate) and finance the transaction through the
dealer at market rates. That consumer executes a finance transaction (at
the market rate), but receives a rebate instead of a financing incentive.
Another consumer who decides to take the 1.9 percent APR and finance
at 36 months opts for the promotional APR.
Additionally, automakers also offer combinations of customer cash
and promotional APRs, and they may do so while offering stand-alone (not combinable) rebates and promotional APRs. To accommodate these offerings, we need to expand each financing term into three alternatives:
stand-alone customer cash and financing at market rate, stand-alone
promotional APR (no rebate), or a combination of both (see Figure
18.1). There are other types of programs, such as loyalty cash and captive
cash (to promote business for the automaker’s financing arm), but all of
them can be addressed with this expanded set of transaction types. (For
a description of the model variables and intercepts as well as the detailed
structure of the random utility specification for each branch of the nested
logit, see Silva-Risso and Ionova 2008.)
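A minimal sketch of this expansion, classifying one transaction record into the transaction types of Figure 18.1; the record fields and labels below are hypothetical stand-ins, not the actual PIN schema.

```python
def transaction_type(record):
    """Classify one transaction into the expanded types of Figure 18.1.
    The record fields are hypothetical stand-ins for the PIN fields."""
    if record["acquisition"] == "lease":
        return "lease"
    if record["acquisition"] == "cash":
        return "cash"                        # no financing arranged at the dealer
    term = record["term_months"]             # e.g., 24, 36, 48, 60, 72
    if record["took_rebate"] and record["promo_apr"]:
        return f"rebate/APR combo, {term} months"
    if record["promo_apr"]:
        return f"stand-alone promotional APR, {term} months"
    return f"stand-alone rebate, market-rate financing, {term} months"

print(transaction_type({"acquisition": "finance", "term_months": 36,
                        "took_rebate": False, "promo_apr": True}))
```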

Estimation and Implementation Results

Plots of parameter estimates are presented in Figure 18.2. The mean of the
posterior distribution of each parameter has the expected sign, and a sign change within the 95 percent interval is rare. The plots also
reveal substantial differences in response parameters across local markets.

Simulations

We use the model to improve the promotional offerings as of the beginning of May 2016. Ideally, we should seek an increase in profits but,
because manufacturing variable costs and margins are not publicly avail-
able, we focused on searching for pricing programs that would deliver a
similar volume at a lower cost per unit, more volume at a similar cost, or
both a higher volume at a lower cost. In this case, cost represents the price
discount offered by the automaker through a specific menu of incentives.[12]
Model X is the vehicle in this set with the highest spending on price promotions, about $3,600 per unit. Model X was offering consum-
ers a choice among $3,000 in customer cash or a promotional APR of 3.9
percent through 60 months or a lease program with $1,500 in lease cash
and a lease rate of 1.08 percent. Additionally, Model X offered $700 in
dealer incentives, $500 in loyalty cash and $1,000 in captive cash (promo-
tional money applied when a consumer finances through the financing
captive arm).
The cost of the finance promotions is computed by discounting the
cash flows (i.e., monthly payments) at the market rate (at the time of
the transaction) and subtracting that net present value from the amount
financed (see Silva-Risso and Ionova 2008: Appendix B Synthetic Monthly
Payment for Financing Loans, equation B1). A similar procedure is fol-
lowed to compute the cost of lease promotions. The average cost per unit (in this case $3,600) is the weighted average of the cost of all transaction types (see Figure 18.1).

[Figure 18.2  Sample of model estimates: plots of the posterior estimates of the rebate coefficient and the log of monthly payment (lease) coefficient across the 22 Western-region DMAs. Source: details of model estimates from Silva-Risso and Ionova (2008).]
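The discounting logic just described can be sketched as follows; the level-payment formula and the inputs are illustrative, and this is not claimed to reproduce equation B1 of Silva-Risso and Ionova (2008) exactly.

```python
def monthly_payment(principal, annual_rate, n_months):
    """Level payment on a fully amortizing loan."""
    r = annual_rate / 12.0
    return principal / n_months if r == 0 else principal * r / (1 - (1 + r) ** -n_months)

def promo_apr_cost(amount_financed, promo_apr, market_apr, n_months):
    """Automaker cost of a promotional APR: the amount financed minus the
    net present value, at the market rate, of the promotional payments."""
    pmt = monthly_payment(amount_financed, promo_apr, n_months)
    m = market_apr / 12.0
    npv = pmt * (1 - (1 + m) ** -n_months) / m
    return amount_financed - npv

# Illustrative only: $25,000 financed at a 1.9% promo APR for 60 months,
# against an assumed 6.5% market rate.
print(round(promo_apr_cost(25_000, 0.019, 0.065, 60), 2))
```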
We built a market simulator based on the sample of consumers used for
calibration. We updated the environment (i.e., the pricing and incentive
programs for all products and markets) to reflect market conditions in
May 2016. Then, we created a set of scenarios in which Model X would
change the incentive offerings.[13] Drawing from the posterior distributions
of the response parameters, we obtained distributions for the expected
share and program cost (price discount) for Model X. We used the means
of the resulting distributions (share and cost) to evaluate programs. For
example, increasing customer cash to $3,500, while lowering the APRs to
0.9 percent (36 months), 1.9 percent (48 months), 2.9 percent (60 months);
adding a combination of $2,500 customer cash and a 1.9 percent (36
months), 2.9 percent (60 months) APR program; lowering lease cash to
$1,250; and discontinuing $700 in dealer cash, would result in an increase
of sales of 2.9 percent with a reduction in unit cost of $278. We also found
programs that would increase sales by 6 percent for the same cost, or that
would keep sales at the same volume with savings greater than $300 per
unit.
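In outline, each scenario evaluation works like the sketch below, where `simulator` is a hypothetical stand-in for the market simulator and `posterior_draws` for draws of the response parameters.

```python
import numpy as np

def evaluate_scenario(posterior_draws, scenario, simulator):
    """Monte Carlo evaluation of one incentive scenario: propagate posterior
    uncertainty in the response parameters into share and per-unit cost."""
    results = np.array([simulator(theta, scenario) for theta in posterior_draws])
    shares, costs = results[:, 0], results[:, 1]
    return shares.mean(), costs.mean()   # means used to compare programs

# Toy stand-in: share and cost react linearly to a cash-rebate level.
toy = lambda theta, s: (0.10 + theta * s["rebate"] / 1e5, 3_000 + 0.2 * s["rebate"])
draws = np.random.default_rng(2).normal(1.0, 0.1, size=500)
print(evaluate_scenario(draws, {"rebate": 3_500}, toy))
```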

Mid-size Domestic SUV – Improving Efficiency

In January 2016, a change in incentive programs was recommended to automaker X to move mid-size SUV Y to the efficient frontier. There are
multiple alternatives to structure an incentive program that may result in
a similar "cost" (or price discount, or net price). However, these different
structures may result in a wide range of incremental volume (or profits).
The efficient frontier analysis helps identify the most promising programs
along the chosen dimensions (e.g., the most profitable program for a given
net price, the least costly program for a market share objective, etc.).
In Figure 18.3, we show the efficient frontier for the MY2016 mid-size
SUV Y indicating the position of the price promotion program being
offered at the beginning of January 2016 and the proposed program (along
with the cost per unit and unit volume dimensions). The estimated impact
was a reduction in incentive cost (i.e., a smaller price discount) without
a decrease in sales volume. The economic effect was estimated as an
efficiency gain of $4.6 million per month (see Table 18.1).
The two programs are similar, except that the proposed program offers a much lower promotional APR for financed purchases, instead of the $1,000 in "captive" cash. Captive cash is an additional cash bonus offered to consumers for financing or leasing through the automaker's financing arm. As such, consumers who take the cash rebate and finance through the "captive" at the standard market rate, as well as consumers who take the promotional APR or lease program, qualify for the $1,000 captive cash.

[Figure 18.3  Mid-size domestic SUV – incentive change/volume relationship: the efficiency frontier of estimated total company expense per unit against January 2016 retail sales, with the program at the beginning of January 2016 lying above the frontier and the proposed program on it.]

Table 18.1  Mid-size domestic SUV – effects of proposed incentive program (1/16)

Program Structure                 Beginning of January '16    Proposed
Customer Cash Rebate              $2,500                      $2,500
APR up to 60 months               4.90%                       0.00%
APR 72 months                     market rate                 2.90%
Lease Cash                        $2,500                      $2,500
Lease APR 36 months               3.74%                       3.74%
Lease Loyalty Cash                $1,000                      $1,000
Captive Cash                      $1,000                      0

Cost Structure                    Beginning of January '16    Proposed
                                  Penetration   Cost per      Penetration   Cost per
                                  affected      unit          affected      unit
Customer Cash Rebate              49%           $2,500        38%           $2,500
Promotional Finance Rate          11%           $1,509        29%           $3,927
Captive Cash                      84%           $1,000         0%           $0
Lease Cash                        40%           $2,500        33%           $2,500
Promotional Lease Rate            40%           $1,499        33%           $1,499
Lease Loyalty Cash                11%           $1,000         9%           $1,000

Average Cost Per Unit             $3,937                      $3,488
Estimated Retail Sales (1/16)     9,360                       9,445
Estimated Savings per Unit        $439
Estimated Incremental Sales       85
Estimated Cost Savings            $4,147,452
Estimated Margin from Incremental Units   $459,000
Total Efficiency Gains (1/16)     $4,606,452
The lower financing interest rates in the proposed program result in a
greater discount (net present value) of about $2,400. Thus, after account-
ing for the elimination of the captive cash, APR transactions enjoyed
a net enhancement of $1,400, while the promotional money for rebate
and lease transactions got reduced by $1,000 (through the elimination of
captive cash). In sum, the efficiency gains hinge on reducing promotional

money from rebate and lease transactions by $1,400, while enhancing promotional APR transactions by about $1,400. Note that the 84 percent
penetration of captive cash results from the 40 percent of consumers
who lease, the 11 percent of consumers who take the promotional APR
program and 33 percent[14] of consumers who take the rebate and finance at
the dealer through the automaker’s financing arm (captive).
Despite a reduction in the effective price discount of $1,000, enough
consumers were expected to stay with the rebate[15] and lease programs,
so that the average cost of the proposed program was lower than the
program at the beginning of January 2016. These effects are captured
by the transaction type intercepts (idiosyncratic preference for a specific
transaction type) together with the respective response parameters. The
computation of the estimated efficiency gains is presented in the lower panel of Table 18.1.

Concluding Remarks

We document in this chapter the development and implementation of a consumer response model to evaluate and plan pricing and promotions in
the automotive market. The PIN Incentive Planning System, as this model
is known, is based on a multinomial nested logit model of car and trans-
action-type choices. The system is currently being used by most major
automobile manufacturers. It has been credited with helping several automakers save hundreds of millions of dollars.
We found that consumers are heterogeneous in their preferences for
products as well as transaction types, which may be a characteristic unique
to durable goods markets. Interestingly, consumers differ in their overall
price sensitivity as well in their relative sensitivities to specific pricing
instruments (e.g., cash discounts, reduced interest financing, reduced
lease payments). This phenomenon results in some consumers being more
responsive to cash discounts, while other consumers are more responsive
to low interest financing, and so on. Hence, automakers find it more effec-
tive to offer a menu of alternative “incentives” for consumers to choose
from (e.g., a choice among cash discount, reduced interest financing,
discounted lease payments, etc.). The specific levels at which each pricing
instrument (or “incentive”) should be offered depends on the specific
combination of consumer preferences and relative sensitivities in a given
market, as well as product categories, channel effects, etc. The search for
efficient pricing programs is not trivial and this is a core competency that
this model has brought to the automotive industry.
We should note some limitations of our work. Our model focuses only on choice effects and does not capture the peaks and troughs driven
by consumers accelerating or postponing their decisions (not necessarily
affecting choice). Even though incremental sales are driven by choice
effects, it is also relevant to capture the up-and-down waves driven by consumers' timing decisions for proper planning.[16]

Notes
  1. We consider that referring to price promotions as a “cost” is a misnomer. In fact, price
promotions are a tool to customize pricing and increase revenues through price discrim-
ination among consumers with different degrees of price sensitivity (cf. Varian 1980).
We use the term “cost,” in this chapter, to be consistent with the usage and accounting
practices in the automobile industry.
  2. There are a few cases in which dual dealerships are allowed, e.g., for low-share makes.
Note also that some automakers allow dealers to carry more than one of the auto-
maker’s nameplates (e.g., Chrysler and Jeep).
  3. Note that this specification does not assume that consumers in a local market are
homogenous. We capture observed heterogeneity, first, through information of the car
traded-in and consumer demographics. Second, we capture within-DMA unobserved
heterogeneity through the posterior distribution of the DMA response parameters
(analogous to estimating DMA-level random coefficients).
  4. Weights are used to project the PIN data sample to the volumes and shares of each
DMA, then to project the respective DMAs to the corresponding region shares and
volume and to project regions to the US market, using a procedure similar to the one
described by Maddala (1983) for choice-based samples.
  5. Other examples of the use of nested logit and related models are Ainslie et al. (2005);
Cui and Curry (2005); Nair et al. (2005); Sriram et al. (2006) and Yang et al. (2006).
  6. Note that, as illustrated in Figure 18.1, we tested a four-level nested logit (product,
acquisition type, program type, term) and a three-level nested logit (product, acquisi-
tion/program type, term). However, in the empirical analysis, dissimilarity coefficients
(i.e., inclusive value parameters) for financing terms and transaction types were not
significantly different from 1, and the model reduced to the two-level nested logit illus-
trated here. Dasgupta et al. (2007) found a similar result. However, in other applica-
tions, e.g., at the national level with a larger number of local markets, we have found
3- and (in a few cases) 4-level structures.
  7. For simplicity, we omitted the error terms. The multinomial nested logit assumes a generalized extreme-value distribution for the error structure (McFadden 1978; Maddala 1983, 70), i.e., the error terms in each nest are correlated (Train 2003, 83).
  8. The dissimilarity parameter is the coefficient of the inclusive value: \ln\big( \sum_{\tau'} \exp(U^h_{tm,i\tau'}) \big).
The inclusive value represents the overall attractiveness of the corresponding lower
nest, expressed as the natural log of the denominator of the corresponding multinomial
logit in equation (18.2). McFadden (1978) showed that the dissimilarity coefficient is
approximately equal to 1 minus the pairwise correlation between the error terms of the
alternatives in that node, which in this case are the transaction-type utilities in equa-
tion (18.3). Hence, the value of the dissimilarity coefficient should be in the [0,1] range.
Values outside the [0,1] range are indicative of model misspecification. A value of \nu_m = 1 indicates complete independence and the nested logit reduces to the standard multinomial logit (Train 2003).
  9. Given this hierarchical set up, the posterior distributions for all unknown parameters
can be obtained using either Gibbs or Metropolis-Hastings steps. r, h, R and C are
set to be the number of parameters plus one, 0 (null matrix), I (Identity Matrix), and I*1000, respectively, which represents a fairly diffuse prior yet proper posterior
distribution.
10. This “Western” region is for illustrative purposes only and does not correspond to any
actual specific automaker region definition.
11. A major source of fleet sales is vehicles sold to rental car companies, which are often
affiliated with or owned by a car manufacturer. Hence, fleet sales are frequently
“managed” by automakers to partially offset supply-demand gaps. Using total sales,
including fleet sales, as was done by Berry, Levinsohn and Pakes (1995, 2004) and
Sudhir (2001) would bias the response parameter estimates.
12. Because the cost (effective average price discount) of an incentive program depends on
the proportion of consumers who will choose each component of the program (e.g.,
cash rebates, reduced interest rate, lease), the effective cost is not known a priori. We
need to estimate the impact on sales (or share) and the cost for each program using the
model.
13. These scenarios were created by modifying the levels of the components of the incen-
tives offered by Model X and searching for better programs in a trial and error mode.
For simplicity, we kept the pricing and incentives offered by competitors fixed at the
May 2016 levels. However, competitive programs could be modified simultaneously
with the target product (in this case, Model X).
14. Note that of the 49 percent of consumers who prefer to take the rebate of $2,500 at the
beginning of January 2016, 33 percent also finance through the captive and qualify for
the additional $1,000 captive cash. The remaining 16 percent, either pay out of their
pockets or finance through other financing institutions (e.g., a credit union).
15. This result is consistent with the finding of Bruce et al. (2006) about rebates being used
to enhance the “ability to pay,” particularly for consumers who have “negative equity”
in the car they are trading in.
16. Additionally, making predictions for peaks and troughs explicit and linked to purchase acceleration would help prevent a misleading read of outcomes (e.g., a purchase-acceleration peak being interpreted as more incremental volume than is actually the case).

References

Ainslie, Andrew, Xavier Drèze and Fred Zufryden (2005), “Modeling Movie Life Cycles and
Market Share,” Marketing Science, 24 (3), 508–517.
Berry, Steven T., J. Levinsohn and A. Pakes (1995), "Automobile Prices in Market Equilibrium," Econometrica, 63 (4), 841–890.
Berry, Steven T., J. Levinsohn and A. Pakes (2004), “Differentiated Products Demand
Systems from a Combination of Micro and Macro Data: The New Vehicle Market,”
Journal of Political Economy, 89, 400–430.
Bruce, Norris, Preyas Desai and Richard Staelin (2006), “Enabling the Willing: Consumer
Rebates for Durable Goods,” Marketing Science, 25 (4), 350–366.
Bucklin, Randolph E. and James M. Lattin (1991), “A two-state model of purchase incidence
and brand choice,” Marketing Science, 10 (Winter), 24–39.
Busse, Meghan, Jorge Silva-Risso and Florian Zettelmeyer (2006), “$1000 Cash Back:
The Pass-Through of Auto Manufacturer Promotions,” American Economic Review
(September), 1253–1270.
Cui, Dapeng and David Curry (2005), “Prediction in Marketing Using the Support Vector
Machine,” Marketing Science, 24 (4), 595–615.
Dasgupta, Srabana, S. Siddarth and Jorge Silva-Risso (2007), “Lease or Buy? A Structural
Model of a Consumer’s Vehicle and Contract Choice Decisions,” Journal of Marketing
Research, (August), 490–502.
Maddala, G. S. (1983), Limited-Dependent and Qualitative Variables in Econometrics, New
York: Cambridge University Press.


McFadden, Daniel (1978), "Modeling the choice of residential location," in A. Karlqvist, L. Lundqvist, F. Snickars and J. Weibull, eds., Spatial Interaction Theory and Planning Models, Amsterdam: North Holland, 75–96.
Nair, Harikesh, Jean-Pierre Dubé and Pradeep Chintagunta (2005), “Accounting for
Primary and Secondary Demand Effects with Aggregate Data,” Marketing Science, 24
(3), 444–460.
Neslin, Scott A., Caroline Henderson and John Quelch (1985), “Consumer Promotions and
the Acceleration of Product Purchases,” Marketing Science, 4 (3), 147–165.
Rossi, Peter E. and Greg M. Allenby (2003), "Bayesian Statistics and Marketing," Marketing Science, 22 (3), 304–328.
Rossi, Peter E., Robert E. McCulloch and Greg M. Allenby (1996), "The Value of Purchase History Data in Target Marketing," Marketing Science, 15 (4), 321–340.
Scott Morton, Fiona, Florian Zettelmeyer and Jorge Silva-Risso (2001), “Internet Car
Retailing,” Journal of Industrial Economics, 49 (4), 501–519.
Scott Morton, Fiona, Florian Zettelmeyer and Jorge Silva-Risso (2003), “Consumer
Information and Price Discrimination: Does the Internet Affect the Pricing of New Cars to
Women and Minorities?” Quantitative Marketing and Economics, 1 (1), 65–92.
Silva-Risso, Jorge, Randolph E. Bucklin and Donald G. Morrison (1999), “A Decision
Support System for Planning Manufacturers’ Sales Promotion Calendars,” Marketing
Science, 18 (3), 274–300.
Silva-Risso, Jorge and Irina Ionova (2008), “Practice Prize Winner: Nested Logit Model
for Planning Automakers’ Pricing and Promotions,” Marketing Science, 27 (4), 545–566.
Sriram, S., Pradeep K. Chintagunta and Ramya Neelameghan (2006), “Effects of Brand
Preference, Product Attributes, and Marketing Mix Variables in Technology Product
Markets,” Marketing Science, 25 (5), 440–456.
Sudhir, K. (2001), “Competitive Pricing Behavior in the Auto Market: A Structural
Analysis,” Marketing Science, 20 (1), 42–60.
Train, Kenneth (2003), Discrete Choice Methods with Simulation, New York: Cambridge
University Press.
Varian, Hal R. (1980), “A Model of Sales,” American Economic Review, 70 (4), 651–659.
Yang, Sha, Vishal Narayan and Henry Assael (2006), “Estimating the Interdependence of
Television Program Viewership Between Spouses: A Bayesian Simultaneous Equation
Model,” Marketing Science, 25 (4), 336–349.
Zettelmeyer, Florian, Fiona Scott Morton and Jorge Silva-Risso (2006), “Scarcity Rents in
Car Retailing: Evidence from Inventory Fluctuations at Dealerships,” National Bureau of
Economic Research, Working Paper 12177.



19. Visualizing asymmetric competitive
market structure in large markets[1]
Daniel M. Ringel and Bernd Skiera

Understanding competition and competitive market structure is essential for firms to derive a good competitive strategy (Rao and Sabavala 1986)
that supports pricing policies, product design, product differentiation and
communication strategies (DeSarbo, Manrai and Manrai 1993; Urban,
Johnson and Hauser 1984; Bergen and Peteraf 2002; Lattin, Carrol and
Green 2003). Although firms can obtain some insight into the competi-
tive landscape by analyzing their own product sales data or by purchasing
reports on market shares, such information does not provide answers
to the questions of who their key competitors are, in which part of the
market they are and what the overall competitive structure of the market
looks like. Moreover, large markets typically consist of several submarkets
whose identification and analysis provides better explanations of con-
sumer behavior than is apparent from the full market (Urban et al. 1984).
Such knowledge about submarkets is valuable to manufacturers because
they need to know which of their products reaches which submarket,
which submarkets they currently do not cover well, which competitors are
strongest in which submarkets and at the expense of which other products
do their new products gain market share. In addition, it helps retailers
answer the question of how to serve the needs of many different types of
potential customers with as few products as possible so they do not tie
up excessive capital in inventory. Finally, legislators must understand
competitive market structures to determine the size of markets, to detect
early warning signals of potential market failure (e.g., the formation of
cartels or monopolistic structures) as well as to answer the question of
whether a merger or an acquisition would provide a single firm with too
much market power.
Obtaining answers to the above questions is a complex undertaking
since the analyst must consider each competitor in relation to all others. To
do so, analysts can resort to a number of perceptual mapping techniques
that visualize competitive relations and thus facilitate decisions (Lilien
and Rangaswamy 2004; Smelcer and Carmel 1997) and enhance decision
quality (Ozimec, Natter and Reutterer 2010). The major challenge that
remains is the growing number of competing brands and products within

431

MIZIK_9781784716745_t.indd 431 14/02/2018 16:38


432   Handbook of marketing analytics

markets. For instance, in 2012 consumers could choose among 920 digital
cameras, 1,196 washing machines or 1,514 vacuum cleaners (Ringel and
Skiera 2016). Yet, mapping techniques provided by marketing scholars
at that point in time uncovered and visualized competitive relations only
among a limited number of products (e.g., 7 detergents, 62 digital cameras
or 169 cars, see Ringel and Skiera 2016).
While it is relatively easy to visualize competitive market structure for
small markets by mapping bubbles onto a two-dimensional space, where
each bubble represents a single product, the graphical representation of
larger markets quickly takes the form of a dense lump of bubbles, making
the resulting map difficult to decipher (Netzer, Feldman, Goldenberg and
Fresko 2012). Such lumping among hundreds of products is especially
severe when the visual representation is generated using multidimen-
sional scaling techniques (MDS) that have become popular in marketing
research over the past decades. Moreover, a circular bending effect, which
refers to objects being mapped in a circular shape or “horseshoe,” is
common to MDS solutions and can lead to an inaccurate interpretation of
competitive relationships, since products that have weak or non-existent
competitive relationships with one another may appear closer together
than they should (Kendall et al. 1970; Clark, Carroll, Yang and Janal
1986; Diaconis, Goel and Holmes 2008).
The main reason such horseshoes appear when mapping large markets
using MDS is that large markets typically consist of several submarkets
with the products of one submarket having no or only very weak relations
to products of other submarkets. For instance, assume that display size
is a submarket-defining criterion for TV sets. Someone wanting to buy a
TV for a large space in his living room will probably only choose among
very large TVs (e.g., 60-inch display) and not consider smaller TVs (e.g.,
all TVs smaller than 55 inches). Consequently, most competitive relation-
ships among products in very large markets are either very weak, or most
often, even zero, leading to what we refer to as a very sparse dataset. When
MDS now attempts to position products of a sparse data set in a map in
such a way that all these zero or nearly zero relationships are reflected in
similar distances of the corresponding products to another, it arranges
them in a circular, horseshoe shape.
To solve the above problems, analysts can confine their analysis to
individual submarkets. However, unlike in the above example of small and
large TV sets, it is not always clear what the true submarket-separating
criteria are. Therefore, an analyst can easily make a mistake when defining
individual submarkets up front, leading to an incomplete and perhaps
even incorrect competitive market structure map. And finally, when
only individual submarkets are analyzed, no insight is created as to how

these individual submarkets relate to one another and where exactly they are separated.
Another important aspect of competitive analysis is competitive asym-
metry. It exists when the degree of competition between two firms is
not equal, such as when Firm A competes more intensely with Firm B
than Firm B competes with Firm A (DeSarbo and Grewal 2007). For
example, Apple is a large and best-known manufacturer of MP3 players
(i.e., iPods) whereas iRiver only supplies a few models and is less known.
From iRiver’s perspective, the competition with Apple is quite intense.
From Apple’s point of view, however, iRiver is hardly a competitor worth
noting. A complete visualization of competitive market structure must
therefore also include competitive asymmetries.

Decomposition and Re-Assembly of Markets by Segmentation

Given the need and the challenge of visualizing competitive relationships in large markets (i.e., markets containing over 1,000 products), Ringel and
Skiera (2016) developed a new model called DRMABS (Decomposition
and Re-assembly of MArkets By Segmentation). DRMABS combines
methods from multiple research disciplines such as biology, physics,
computer science and sociology with a new method of submarket-centric
mapping to visualize asymmetric competition in large markets in a
single, two-dimensional map. Moreover, DRMABS uncovers submarket
structures without requiring a priori submarket definitions and depicts submarket separation clearly in its graphical output.
DRMABS is based on the idea of breaking up a large problem into
smaller problems, solving each smaller problem and putting all smaller
solutions back together in such a way that they fit optimally together.
Thus, DRMABS consists of two parts: decomposition and reassembly.
Each part, in turn, consists of two steps for a total of four steps
(see Figure 19.1). What follows is a more detailed description of each step.

Step 1: Find Submarkets

In Step 1, DRMABS identifies submarkets that, taken together, make up the entire competitive landscape of the market under analysis. To do so,
DRMABS uses multilevel coarsening and refinement Louvain community
detection, which generates a coarse-grained representation of the submar-
kets that together represent the market. A submarket is defined as a group

of products that compete intensely among themselves and weakly with products outside the group.

[Figure 19.1  The two parts and four steps of DRMABS. Part 1, decomposition: Step 1, find submarkets (multilevel coarsening and refinement Louvain community detection); Step 2, map submarkets globally (aggregate to representatives; visualization of similarity (VOS) mapping). Part 2, re-assembly: Step 3, map products locally, optimize globally (submarket-centric mapping); Step 4, add asymmetry (global by consideration frequency; local by conditional probability).]
The major advantage of this method over methods commonly used in
marketing (e.g., k-means or WARD clustering) is that it: (1) identifies
the number of existing submarkets, (2) handles very large markets with
very heterogeneous submarket sizes, and (3) does not erroneously merge products or entire submarkets into lumps of overall weakly related products.
The required input is a symmetric relationship matrix that captures the
(normalized) relationship strength (i.e., the similarity) of each product
with all other products in the market. The output is a list indicating the
submarket membership of each product.
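As a sketch of Step 1, here is the plain Louvain algorithm shipped with NetworkX (a stand-in for the multilevel coarsening and refinement variant DRMABS uses) applied to a toy similarity matrix:

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities

# Toy symmetric similarity matrix for six products (two obvious submarkets).
S = np.array([[0, 5, 4, 0, 0, 0],
              [5, 0, 6, 0, 0, 0],
              [4, 6, 0, 1, 0, 0],
              [0, 0, 1, 0, 7, 5],
              [0, 0, 0, 7, 0, 6],
              [0, 0, 0, 5, 6, 0]], dtype=float)
G = nx.from_numpy_array(S)                       # edge weights = similarities
communities = louvain_communities(G, weight="weight", seed=42)
membership = {n: k for k, nodes in enumerate(communities) for n in nodes}
print(membership)                                # product -> submarket id
```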

Step 2: Map Submarkets Globally

In Step 2, DRMABS selects the most central product of each identified submarket as the submarket representative, using harmonic centrality (Boldi
and Vigna 2014) and aggregates all between-submarket relations (i.e., similarities) to these representatives. The full market of hundreds of products is thus reduced to a small set of submarket representatives.
Once all submarket representatives are identified and all relations (i.e.,
similarities) are aggregated into a new (much smaller) symmetric matrix
of similarity, DRMABS uses a method called Visualization of Similarities
(VOS) to map all submarket locations relative to each other in a full
market map. The main advantage of VOS over traditional MDS is that:
(1) VOS does not suffer under circular bending effects (i.e., horseshoes)
and (2) VOS does not have the tendency to lump market dominating
products together (van Eck, Waltman, Dekker and van den Berg 2010).
The final outputs of Step 2 are the coordinates of the submarket centers in
the global (i.e., complete) market map.
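A sketch of the representative-selection part of Step 2, using NetworkX's harmonic centrality; the VOS layout itself is not reproduced here (see van Eck et al. 2010), and inverting similarities into distances is our simplifying assumption.

```python
import networkx as nx

def submarket_representatives(G, membership):
    """Pick each submarket's most central product as its representative,
    using harmonic centrality on the submarket subgraph (cf. Boldi and
    Vigna 2014). Similarities are inverted to act as distances."""
    for _, _, d in G.edges(data=True):
        d["dist"] = 1.0 / d["weight"]          # high similarity = short distance
    reps = {}
    for k in set(membership.values()):
        nodes = [n for n, c in membership.items() if c == k]
        centrality = nx.harmonic_centrality(G.subgraph(nodes), distance="dist")
        reps[k] = max(centrality, key=centrality.get)
    return reps

# e.g., with G and membership from the Step 1 sketch:
# print(submarket_representatives(G, membership))
```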

Step 3: Map Products Locally, Optimize Globally

In Step 3, DRMABS conducts a submarket-centric mapping that maps


submarkets locally and optimizes them globally across all products to
obtain a single visualization of the entire competitive market structure
while preserving the local structure of each submarket. First, VOS maps
each submarket. Then, all local submarkets are placed in the global map
derived in Step 2. To do so, the coordinates of each submarket representa-
tive are set to the submarket position in the global map and the product
coordinates of all other products are geometrically transformed (i.e.,
shifted) according to their relative positions to their respective submarket
representatives in the local submarket maps.
Although all submarkets are now in a common map, their orientation
(i.e., rotation) relative to another is not optimal. Since products of differ-
ent submarkets can still have weak competitive relations with one another,
submarkets must be rotated to account for such relations. Furthermore,
the ratio between distance and similarity is not necessarily the same across
the (local) maps of all submarkets. Additionally, submarkets are likely
to heavily overlap in a joint space since they were originally configured
locally with far fewer products to fill the map space.
DRMABS solves these problems by applying a common scale to
the distances in all submarkets as well as by optimally rotating and
re-scaling them in such a way that between-submarket product relations
are accounted for and the resulting global map configuration has little
overlap with clear separation of the submarkets. The outputs of Step 3 are
the coordinates for each product in a single, competitive market structure
map.
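One simplified way to implement the rotation part of Step 3 is a grid search over angles that keeps strongly related cross-submarket products close; this sketch ignores the re-scaling and overlap reduction also described above, and `cross_pairs` is a hypothetical input.

```python
import numpy as np

def rotate_about(points, center, theta):
    """Rotate 2-D points about a center by angle theta (radians)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return (points - center) @ R.T + center

def best_rotation(local_xy, center, cross_pairs, n_angles=360):
    """Grid-search the rotation of one submarket that best honors its
    between-submarket relations. `cross_pairs` is a hypothetical list of
    (own_index, other_xy, similarity) triples for ties to other submarkets."""
    best_theta, best_score = 0.0, np.inf
    for theta in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
        xy = rotate_about(local_xy, center, theta)
        # Weighted squared distances: strong ties should end up close together.
        score = sum(s * np.sum((xy[i] - oxy) ** 2) for i, oxy, s in cross_pairs)
        if score < best_score:
            best_theta, best_score = theta, score
    return best_theta
```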


Step 4: Add Asymmetry

DRMABS visualizes two types of competitive asymmetry: (1) global competitive asymmetry across all products in a market (i.e., market share) and
(2) local competitive asymmetry that measures the intensity of competi-
tion between pairs of products as in the case of Apple’s iPod and iRiver’s
MP3 player.
Global competitive asymmetry is indicated by bubble size in DRMABS’s
visual output. The larger a product’s bubble in the map, the greater its
market share. Local competitive asymmetry relates the similarity of pairs
of products to their respective market shares using conditional probabil-
ity. It is visualized using arrows where an arrow originates in one product
and points at its competitor (and vice versa). The heavier the arrow, the
stronger the competitor to the originating product.
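A sketch of one plausible operationalization of local asymmetry from joint-consideration counts; the chapter's exact conditional-probability formula is not reproduced here.

```python
import numpy as np

def local_asymmetry(joint, totals):
    """Conditional-probability view of pairwise competition: the strength of
    the arrow from product a to product b is P(b considered | a considered).
    joint[a, b] counts joint considerations; totals[a] counts sets with a."""
    with np.errstate(divide="ignore", invalid="ignore"):
        A = joint / totals[:, None]
    return np.nan_to_num(A)                  # A[a, b] != A[b, a] in general

joint = np.array([[0, 30], [30, 0]], dtype=float)
totals = np.array([40.0, 400.0])             # niche product vs. market leader
print(local_asymmetry(joint, totals))        # heavy arrow from niche to leader
```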
The final output of DRMABS is a two-dimensional representation
of asymmetric competitive market structure with a clear separation of
submarket structures. The market structure analysis can now be extended
by introducing further elements such as product attributes to the map. For
instance, brand can be visualized using bubble color, performance such
as the display size of TVs can be visualized by bubble size and additional
features such as 3D capability of TVs can be visualized using different
shapes (e.g., triangles instead of bubbles).

Empirical Application of DRMABS to the LED-TV Market

The objective of this empirical application of DRMABS is to analyze the asymmetric competitive market structure of the German LED-TV market
in September 2012 containing 1,124 individual products. The basic idea
of the analysis is to use consideration sets of consumers to identify com-
petitive relations among products as input to DRMABS. A consideration
set is defined here as a set of products that are viable substitutes for the consumer. However, with 1,124 different LED-TVs available in the
market, traditional approaches for data collection, such as surveys and
scanner panels, are not viable for the analysis (Netzer et al. 2012). Surveys
are limited by the cognitive capacity of interviewed consumers, who are
unlikely to remember all products that they considered for purchase, while
scanner panels require repeat purchases, making them inappropriate for
consumer durables.
An alternative approach is the use of big search data, specifically,
clickstreams of thousands of consumers searching for and comparing

products at a price comparison site. This approach is based on the notion that clickstream data of consumers searching for and comparing products
online can be used to construct consideration sets (Moe 2006). And since
consideration sets are the ultimate arbiters of competition (Peter and
Olson 1993), they can be used to uncover competitive market structure
(Roberts and Lattin 1991; DeSarbo and Jedidi 1995; Paulssen and Bagozzi
2006).

Data Collection

For the empirical study of the LED-TV market, clickstreams of over 100,000 consumers are collected in September 2012 in real time and at very
low cost by means of a tracking pixel installed at a price comparison site.
Price comparison sites (e.g., Pricegrabber, Idealo or Google shopping)
provide consumers with platforms on which to search for and objectively
compare various products and product offers of thousands of retailers. A
major advantage of using price comparison site data is that, by definition,
such data span across hundreds of retailers and are therefore a better
representation of the market than the inventory of only a single retailer.
Further, since price comparison sites generate revenue with every click
on any retailer offer, regardless of which product the offer is for, they are
indifferent to which products are viewed by consumers, making them an
unbiased data source for product consideration. Finally, price comparison
sites capture revealed measures of consumer search at an individual level,
offering insight into individual customer clickstreams, whereas other
sources of online search (e.g., Google) can only provide summary infor-
mation (e.g., total keyword searches).
Overall, a total of 105,606 individual consideration sets are identi-
fied and, based on the underlying notion that the more frequently two
products are jointly considered, the more similar they are, aggregated
into a symmetric matrix of joint-consideration as input to DRMABS.
The generated symmetric matrix of joint-consideration consists of 1,124
rows and 1,124 columns resulting in a total of 631,126 individual product
relationships (i.e., similarities). The mean consumer consideration set size
of 3.19 with a standard deviation of 2.005 is in line with past studies on
consumer consideration sets.
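Building that symmetric joint-consideration matrix from the identified consideration sets is straightforward; a sketch with toy sets of product indices:

```python
from itertools import combinations
import numpy as np

def joint_consideration_matrix(consideration_sets, n_products):
    """Aggregate consumer consideration sets (lists of product ids) into the
    symmetric joint-consideration matrix used as input to DRMABS."""
    M = np.zeros((n_products, n_products))
    for s in consideration_sets:
        for a, b in combinations(set(s), 2):
            M[a, b] += 1
            M[b, a] += 1
    return M

sets = [[0, 1, 2], [1, 2], [2, 3], [0, 2, 3]]
print(joint_consideration_matrix(sets, 4))
```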
A first glance at the data shows that Samsung clearly dominates the
LED-TV market at the time with a consideration set share of nearly 43
percent. Overall, the products of the top 10 brands (e.g., Samsung, Philips,
LG, Sony, etc.) jointly capture 96.85 percent of the market with the
remaining 46 brands capturing only 3.15 percent.


Map Generation

Based on the collected data and following DRMABS, a single asymmetric competitive market structure map for 1,124 LED-TVs is now generated.
The multilevel coarsening and refinement Louvain community detec-
tion algorithm identifies 30 distinct submarkets, for which representative products are identified using harmonic centrality and local submarket
configurations (i.e., maps) are generated using VOS. Figure 19.2 docu-
ments the assembly of the full map starting with the initial positions of the
30 identified submarkets (Solution I), continuing with the introduction
of the optimized local map configurations of all 1,124 products (Solution
II), and finishing with the addition of global competitive asymmetry by
bubble size (Solution III) and local competitive asymmetry by arrows
(Solution IV).
For better readability, the final map (Solution IV of Figure 19.2) is dis-
played in larger format in Figure 19.3. All 1,124 LED-TVs are organized
into submarkets such that within-submarket competitive relations among
products are stronger than between-submarket competitive relations.
Further, submarkets whose products have stronger between-submarket
competitive relations are located closer to each other. Likewise, these sub-
markets are oriented such that individual products (in different submar-
kets) that compete more strongly with one another are positioned closer
together. Furthermore, the largest products in terms of market share
(bubble size) are spread across different submarkets where they serve dif-
ferent consumer needs. An instance of strong local competitive asymmetry appears in submarket 2 (Figure 19.3): two very heavy arrows originate in a tiny-bubbled product (Orion TV24LB860) at the very edge of the map and point at two larger-bubbled products (Orion 24LB890 and Telefunken T24EP970CT) toward the center of submarket 2, while the arrows returning to the Orion TV24LB860 are so light that one would need to zoom in heavily on the map to see them.

Map Exploration

To better understand the competitive situation in a market and what its drivers are, additional information such as product attributes can be
transposed onto the DRMABS output using color, shape and size. The
attributes selected in this empirical study are brand, display size, and 3D
capability, although additional attributes could easily be included.
Figure 19.4 depicts competitive market structure with brand (by bubble
color) and display size (by bubble size) transposed onto the product
coordinates. Note that a number of same-brand products are clustered tightly together, as indicated by bubble color. Overall, brand seems to contribute to the organization of competitive market structure in the LED-TV market, as many submarkets are made up of only a few brands. Display size appears to be another driver of market structure, as small displays are concentrated toward the upper left of the map, with display size increasing toward the right (see Figure 19.4). Note that the products with larger displays (top right) are predominantly offered by leading brands such as Samsung, LG and Philips, while small displays (top left) are offered by a very large number of smaller brands.

[Figure 19.2  Four solutions representing the development of the asymmetric competitive market structure map with DRMABS: (I) map submarket locations (color); (II) submarket members added; (III) global asymmetry by bubble size added; (IV) local asymmetry by arrows added.]

[Figure 19.3  Visualization of the asymmetric competitive market structure map of 1,124 LED-TVs.]

[Figure 19.4  Using brand and display size to understand the competitive market structure of 1,124 LED-TVs. Legend: bubbles represent individual products (SKUs); bubble color indicates brand; bubble size indicates display size; top 10 brands by market share (GfK): Samsung, Philips, LG, Panasonic, Sony, Toshiba, Sharp, Grundig, Loewe, Telefunken.]
Finally, a new and innovative product attribute, 3D capability, is
transposed onto the competitive market structure map (triangles in
Figure 19.5) to determine whether it is possible to use such an attribute
for an upfront definition of smaller submarkets that can be analyzed
independently of other submarkets with traditional mapping methods. A
well-defined submarket (or group of contiguous submarkets) consisting
of only 3D LED-TVs would lend support to such an upfront market
definition.
Clearly, 3D capability is not a submarket-defining feature, since 3D
LED-TVs are scattered across most submarkets. Consequently, an upfront
market definition of 3D LED-TVs would have led to an incorrect
representation of the competitive market structure.

Model Comparison

To demonstrate the advantage of using DRMABS for analyzing
competitive market structure in large markets, several traditional models for
competitive market structure mapping are also applied to the LED-TV
data. The objective is both to visually inspect each mapping solution for
potential weaknesses, such as circular bending, lumping of dominant
products and poor submarket recovery, and to measure the quality of the
model output.
Since each model optimizes its own quality metric, a common quality
metric across all models must be used for comparison. Given that the
overarching objective of a competitive market structure map is to position
the strongest competitors of any product as close to the product as pos-
sible, we calculate a top 10 hit-rate indicating how many of each product’s
closest competitors are also positioned closest in the mapping solution.
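One way to compute such a hit-rate is sketched below in Python; the inputs (a product-by-product competitive-strength matrix and two-dimensional map coordinates) and the function name are hypothetical, chosen only to illustrate the metric, not taken from the DRMABS implementation.

```python
import numpy as np

def top_k_hit_rate(strength, coords, k=10):
    """Mean share of each product's k strongest competitors that also rank
    among its k nearest neighbors in the mapping solution.

    strength: (n, n) competitive-strength matrix, e.g., co-consideration counts
    coords:   (n, 2) product coordinates produced by a mapping method
    """
    n = strength.shape[0]
    # Pairwise Euclidean distances on the map
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)            # a product is not its own neighbor
    s = strength.astype(float)
    np.fill_diagonal(s, -np.inf)              # ...nor its own competitor
    hits = 0
    for i in range(n):
        strongest = set(np.argsort(-s[i])[:k])    # k strongest competitors
        nearest = set(np.argsort(dist[i])[:k])    # k nearest map neighbors
        hits += len(strongest & nearest)
    return hits / (n * k)

# Usage with hypothetical data:
# rate = top_k_hit_rate(co_consideration_matrix, map_coordinates)
```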
Figure 19.6 depicts the mapping solutions and top 10 hit-rates of six
popular models.

[Figure 19.5: 3D TVs in the competitive market structure map of 1,124 LED-TVs. Legend: bubbles represent individual products (SKUs); bubble color indicates submarket membership; triangles indicate 3D capability; submarkets are numbered 1 through 30.]

[Figure 19.6: Comparison of different models to display the competitive structure of the LED-TV market. Panels: Kamada-Kawai, Ordinal MDS, cluster-centric Kamada-Kawai, Fruchterman-Reingold, VOS, and DRMABS. Bubble size indicates global competitive asymmetry (consideration frequency); bubble color indicates cluster membership; mean top 10 hit-rate reported in percent.]

Kamada-Kawai, Fruchterman-Reingold and ordinal multidimensional
scaling all suffer from circular bending and heavy lumping of dominant
products. Using VOS alone leads to a mapping solution whose shape and
general submarket positions resemble that of
DRMABS, but the hit-rate of VOS is less than half as high (19 percent).
The cluster-centric Kamada-Kawai solution, which does not optimize
submarket rotation and dilation, suffers from heavy overlapping of
submarkets. Clearly, DRMABS outperforms all other models in terms
of hit-rate (41 percent), shows clear submarket separation and does not
exhibit circular bending or lumping of dominant products.

Conclusion

The combination of DRMABS and clickstream data from price comparison
sites provides manufacturers, retailers and legislators with fast and
inexpensive insights into today’s large markets that they cannot obtain
from other sources such as market share or sales reports.
Manufacturers can use asymmetric competitive market structure maps
to quickly see how a market is organized, how many submarkets exist,
which competitors they face in each submarket and how strong these
competitors are. Retailers, on the other hand, can use these maps to make
better purchasing and inventory management decisions by covering many
market segments without stocking too many products. Legislators can
monitor the competitive market structure as well as analyze what conse-
quences a merger or acquisition could have on free markets.
Using the example of the LED-TV market, manufacturers can learn from
Figure 19.3 that there are 30 submarkets and that most submarkets are
dominated by a few products, as indicated by these products’ large bubble
sizes (where bubble size captures global competitive asymmetry). In fact,
the top 10 LED-TVs do not compete primarily against one another, but
rather against products in their respective submarkets, which is an insight
that managers cannot attain solely by considering the products’ market
shares.
Further, the competitive market structure map in Figure 19.4 reveals
that a given brand might face different competitors in different areas
of the market, and it enables manufacturers to observe who these
competitors are. For instance, Sony’s closest competitors in the area of
34” to 37” TVs (bottom left of Figure 19.4) are Grundig and Panasonic.
However, in the 40” to 46” area of the market (bottom right of Figure
19.4), Sony faces different competitors, namely LG, Sharp and Philips.
Consequently, product line managers must align their targeting, product
differentiation and communication strategies to the specific competitors
they face in a specific area of the market, especially when competitors
have different strengths and follow different strategies. Note that the
orientation of individual submarkets relative to one another is crucial

in correctly assessing who the closest competitors in nearby submarkets
are. Since DRMABS accounts for between-submarket relations by rotating
submarkets, it provides insights beyond a mere series of individual
submarket maps.
Furthermore, manufacturers can attain insight into the positioning of
different brands. For instance, while Samsung products are present across
the entire market (see Figure 19.4), products of the premium brand Loewe
are concentrated in one central submarket (14). Although Loewe’s manag-
ers may consider it to be good news that the brand practically defines its
own submarket, Figure 19.3 and Figure 19.4 jointly show that the “Loewe
submarket” is isolated from other submarkets and draws relatively little
consumer consideration. This insight should alarm Loewe managers since
it essentially means that the once highly popular Loewe brand is dropping
out of consumers’ consideration sets. Indeed, in line with this troubling
insight, Loewe filed for bankruptcy in 2013, only one year after the data
collection for this study.
Retailers receive guidance from the presented competitive market struc-
ture map in selecting and managing their product inventory. Most retailers
have both budget and space constraints when stocking products for sale.
Within these constraints they must decide which LED-TVs to order from
more than 1,000 products offered by manufacturers. Wrong decisions can
leave them either with overstock that does not sell, or shortages of “hot
products” that prevent them from meeting the demand of their customers.
In both cases they end up losing money. By identifying submarkets and
their respective most popular products, retailers can easily serve a broad
spectrum of consumer needs with a relatively small number of products
(e.g., 30 LED-TVs if they select the most popular one in each submarket).
Further, retailers can obtain an indication of how great the overall market
demand for each product is (global asymmetry) and balance order quanti-
ties accordingly. Retailers wishing to offer some alternatives to any given
product can use the DRMABS output to find the respective substitutes.
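As an illustration, selecting one product per submarket can be scripted directly from the DRMABS output; the array contents below are randomly generated stand-ins for the model's submarket labels and consideration frequencies.

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sub = 1124, 30
submarket = rng.integers(1, n_sub + 1, size=n)   # hypothetical submarket labels
popularity = rng.random(n)                       # hypothetical consideration frequencies

# One assortment slot per submarket: pick the most popular product in each
assortment = {s: np.flatnonzero(submarket == s)[np.argmax(popularity[submarket == s])]
              for s in range(1, n_sub + 1)}
print(sorted(assortment.values())[:10])          # product indices to stock
```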
Finally, legislators can, for instance, learn from Figure 19.4 that Samsung,
Philips and LG have built strong presences in most submarkets and that the
potential merger of any of the three would likely lead to undesirable levels
of market power. Figure 19.5 also shows that TVs with 3D capability do
not create a new market but compete very strongly with other TVs.

Note

1. This article is based upon Ringel DM, Skiera B (2016) Visualizing asymmetric com-
petition among more than 1,000 products using big search data. Marketing Science

35(3): 511–534. For a full technical description of DRMABS and a formal
specification of the model, please refer to the original article.

References

Bergen M, Peteraf MA (2002) Competitor identification and competitor analysis: A broad-
based managerial approach. Managerial and Decision Economics 23(4–5): 157–169.
Boldi P, Vigna S (2014) Axioms for centrality. Internet Mathematics 10(3–4): 222–262.
Clark WC, Carroll JD, Yang JC, Janal MN (1986) Multidimensional scaling reveals two
dimensions of thermal pain. Journal of Experimental Psychology: Human Perception and
Performance 12(1): 103.
DeSarbo W, Jedidi K (1995) The spatial representation of heterogeneous consideration sets.
Marketing Science 14(3): 326–342.
DeSarbo WS, Grewal R (2007) An alternative efficient representation of demand-based
competitive asymmetry. Strategic Management Journal 28(7): 755–766.
DeSarbo WS, Manrai AK, Manrai LA (1993) Non-spatial tree models for the assessment
of comparative market structure: An integrated review of the marketing and psychometric
literature. Eliashberg J, Lilien G, eds. Handbook in Operations Research and Marketing
Science (North Holland, Amsterdam), 193–257.
Diaconis P, Goel S, Holmes S (2008) Horseshoes in multidimensional scaling and local
kernel methods. The Annals of Applied Statistics 2(3): 777–807.
Kendall M, Cockel R, Becker J, Hawkins C (1970) Raised serum alkaline phosphatase in rheu-
matoid disease. An index of liver dysfunction? Annals of the Rheumatic Diseases 29(5): 537.
Lattin JM, Carroll JD, Green PE (2003) Analyzing multivariate data (Duxbury Resource
Center, Pacific Grove).
Lilien GL, Rangaswamy A (2004) Marketing engineering: Computer-assisted marketing
analysis and planning (DecisionPro, Victoria, BC, Canada).
Moe WW (2006) An empirical two-stage choice model with varying decision rules applied to
internet clickstream data. Journal of Marketing Research 43(4): 680–692.
Netzer O, Feldman R, Goldenberg J, Fresko M (2012) Mine your own business: Market
structure surveillance through text mining. Marketing Science 31(3): 521–543.
Ozimec A-M, Natter M, Reutterer T (2010) Geographical information systems-based
marketing decisions: Effects of alternative visualizations on decision quality. Journal of
Marketing 74(6): 94–110.
Paulssen M, Bagozzi RP (2006) Goal hierarchies as antecedents of market structure.
Psychology and Marketing 23(8): 689–709.
Peter JP, Olson JC (1993) Consumer behavior and marketing strategy, 3rd ed. (Irwin,
Homewood).
Rao VR, Sabavala DJ (1986) Measurement and use of market response functions for allocating
marketing resources (Marketing Science Institute, Boston).
Ringel DM, Skiera B (2016) Visualizing asymmetric competition among more than 1,000
products using big search data. Marketing Science 35(3): 511–534.
Roberts JH, Lattin JM (1991) Development and testing of a model of consideration set com-
position. Journal of Marketing Research 28(4): 429–440.
Smelcer JB, Carmel E (1997) The effectiveness of different representations for managerial
problem solving: Comparing tables and maps. Decision Sciences 28(2): 391–420.
Urban G, Johnson PL, Hauser JR (1984) Testing competitive market structures. Marketing
Science 3(2): 83–112.
van Eck NJ, Waltman L, Dekker R, van den Berg J (2010) A comparison of two techniques
for bibliometric mapping: Multidimensional scaling and VOS. Journal of the American
Society for Information Science and Technology 61(12): 2405–2416.



20.  User profiling in display
advertising
Michael Trusov and Liye Ma

Digital display advertising has established itself as the primary outlet for
advertising dollars spent by marketers online and reached $27 billion in
2015 (eMarketer, 2015). The key to display advertising is user informa-
tion, which feeds into an ad-targeting engine to improve responses to
advertising (e.g., click-through rate or other forms of interaction). One
of the main constituents of user data is web browsing information. As
consumers navigate through the web, advertising networks, such as
Advertising.com or ValueClick.com, can track their online activities
across multiple sites participating in their network, building behavioral
profiles of each individual. One popular way of describing consumers’
interests and preferences revealed through online activities is to represent
individual profiles as a vector of count data that captures the number of
visits to corresponding types of websites. For example, a profile dimension
on "Interests in Sports" will be high for a person who frequently visits
ESPN.com; in turn, regular visits to Netflix.com would serve as a proxy for
“Interest in Entertainment.” The observed online activities of an indi-
vidual consumer are thus a collection of visitations to many websites of
different categories, which reflects a combination of her various interests
and behavioral patterns.
This approach to constructing behavioral profiles, while straightfor-
ward, faces some important challenges. First, individual consumer-level
records are massive and call for scalable, high-performance processing
algorithms; second, advertising networks can only observe a consumer’s
browsing activities on the sites participating in the network, potentially
missing site categories not adequately covered. The latter, in particular,
results in a biased view of the consumer’s profile that could lead to subop-
timal advertising targeting.
We present a method that aims to address these challenges. Extending
the Correlated Topic Model (Blei and Lafferty 2007), we develop a
modeling approach that augments individual-level ad network data with
anonymized third-party data that significantly improves profile recovery
performance and helps to correct for potential biases. The approach is
scalable and easily parallelized, improving almost linearly in the number

of CPUs. Using economic simulation, we illustrate the potential gains the
proposed model may offer to a firm when used in individual-level targeting
of display ads.

[Figure 20.1: Profile example – actual vs. advertiser's view. Bar chart comparing the number of visits (0–40) per site category in the consumer's actual profile (solid bars) against the advertiser's view (pattern-filled bars).]

Modeling User Profile

Our data are obtained from a leading global information and measurement
company that wishes to remain anonymous. The dataset contains detailed
website browsing information of a large panel of more than 45,000 house-
holds over a 12-month period, from January 2012 to December 2012. For
each household in the panel, a detailed log of browsing activities at session
level is recorded. Each website being visited is assigned a unique category,
with a total of 29 categories. The most popular categories include “Social
Media,” “Entertainment,” “Portals,” “News/Information,” and “Retail.”
Figure 20.1 shows an example of a profile fragment. Each solid bar rep-
resents the number of visits to the corresponding site category over a
certain period of time. This consumer shows a high level of engagement with
Entertainment, Games and Social Media sites and fairly low interest in
Business Finance and Lifestyles sites.
The consumer profile depicted on Figure 20.1 represents an unbiased
view of the consumer’s online browsing activities, as it was collected using
tracking software installed on the consumer’s computer. An advertiser’s
view of this profile may be quite different, as it depends on the advertiser’s
tracking ability (or the ad network coverage). For example, if Netflix and

Facebook are not part of the advertiser's network, the profile view may
look like that depicted by the pattern-filled bars in Figure 20.1, where the
advertiser underestimates the consumer's interests in the Entertainment and
Social Media categories. This could affect the decision of what type of ads
to serve to this consumer.

[Figure 20.2: Intuition behind the proposed approach. DATA (a profile prediction based on the advertiser's data) is combined with a PRIOR (the profile distribution from anonymized third-party data) to form the POSTERIOR (a bias-corrected profile).]
The approach presented in this case study addresses this problem as
follows (Figure 20.2). First, we develop a statistical model that describes
a consumer’s profile and, importantly, captures dependencies across
different dimensions of the profile. Second, we calibrate this model using
anonymized third-party data available from market research firms that
employ large online user panels and collect their browsing activities. As
a key outcome of this step, we learn various relationships in cross-site
category activities that exist on the population level. Finally, we combine
the profile information extracted from the advertiser’s own records (pre-
sumably incomplete) with the relationships inferred from the previous step
to arrive at the bias-corrected view of the individual profile.
Our statistical model for describing consumer profiles extends from
the Correlated Topic Model, or CTM (Blei and Lafferty 2007), which
is among the latest developments in the family of Topic Models. Topic
models were originally used to identify the mixture of topics present in a
large number of documents. Just like a document can be considered as a
combination of multiple topics, a consumer’s website visit activities can
be considered as the combination of multiple “roles” or objectives. For
example, the consumer may play a “social” role, where she visits places
like Facebook or Twitter; she may play a “shopper” role at another time
and visit places like eBay or Amazon; she may also play an “information
seeker” role at yet another time, visiting CNN and blogs, etc. Topic
models thus are a good conceptual fit to our task of user profiling using
website visit data.
The most commonly used topic model is the Latent Dirichlet Allocation
model, or LDA (Blei et al. 2003; Griffiths and Steyvers 2004). LDA
models the generation of mixed-topic documents in two steps, from
document to topic and then from topic to word, with each step modeled

as multinomial random draws with Dirichlet priors. While powerful in
extracting meaningful topics, LDA has a major limitation of not being
able to account for correlations in topic composition. The CTM model
(Blei and Lafferty 2007) was developed to address this limitation, where
a multivariate normal prior is used for document-topic composition so
correlations can be explicitly modeled.
Our model further extends the standard CTM model in three aspects,
by accounting for website visitation intensity, by including explanatory
variables such as consumer demographics that give rich descriptions of
consumer profiles, and by modeling the evolution of roles over time to
account for changes in consumers’ website visitation tendencies.
Formally, there are $I$ consumers, indexed by $i = 1, \ldots, I$, and $T$ time
periods, indexed by $t = 1, \ldots, T$. For each consumer at each time period,
we observe a vector of category-specific website visit counts, denoted as:

$$V_{it} = (V_{it1}, \ldots, V_{itC}) \qquad (20.1)$$

In the vector, $V_{itc}$ is the number of times consumer $i$ visits websites that
belong to category $c$ in time period $t$, and $C$ is the total number of
categories.
Following the conceptual framework of topic models, each individual
visit takes place in a two-step process. First, the consumer decides on
the role for the website visit. Next, according to the role decided on in
the first step, the consumer decides on the website to visit. For example,
a consumer may decide that she wants to do some online shopping and
then visits Amazon.com. A consumer is expected to have multiple needs,
such as shopping, social, education, etc. The overall website visit profile is
the combination of the different roles the consumer plays to satisfy those
needs. Different consumers would have different emphasis on individual
roles. A college student, for example, may spend more time playing educa-
tion and social roles than a retired person does. The role-composition of
consumer i in time period t is denoted as:

$$P_{it} = (p_{it1}, \ldots, p_{itR}) \qquad (20.2)$$

In the vector, $R$ is the total number of roles and $p_{itr}$ is the probability
that she plays role $r$ in time period $t$.
When playing different roles, a consumer is expected to visit different
categories of websites with different probabilities. Someone who is doing
online shopping may visit Amazon and eBay, while someone who is
studying may visit a university website. Each role is thus represented as

a distribution over different website categories. The distribution of role $r$
over the website categories is denoted as:

$$\phi_r = (\phi_{r1}, \ldots, \phi_{rC}) \qquad (20.3)$$

In the vector, $\phi_{rc}$ is the probability that a consumer playing role $r$ will
visit a website belonging to category $c$.
Furthermore, the total number of visits of consumer $i$ at time $t$,
representing the consumer's internet usage intensity, is denoted as $N_{it}$
and is drawn from a Poisson distribution:

$$N_{it} \sim \text{Poisson}(\lambda_{it}) \qquad (20.4)$$

We perform a logit transformation of the role composition probabilities,
so that $P_{it}$ is generated from the parameter vector
$\Theta_{it} = (\theta_{it1}, \ldots, \theta_{itR})$ (we normalize $\theta_{itR} = 0$) as follows:

$$p_{itr} = \frac{\exp(\theta_{itr})}{1 + \sum_{r'=1}^{R-1} \exp(\theta_{itr'})}, \quad r = 1, \ldots, R-1; \qquad p_{itR} = \frac{1}{1 + \sum_{r'=1}^{R-1} \exp(\theta_{itr'})} \qquad (20.5)$$

Re-parameterizing this way enables the incorporation of observed
heterogeneity, unobserved heterogeneity, and dynamics in the consumer's
role composition. In this setup, $\theta_{itr}$ represents the propensity of
consumer $i$ to play role $r$ at time $t$, relative to role $R$. We model
$\theta_{itr}$ as:

$$\theta_{itr} = \theta_{ir} + X_{it}' \gamma_r + \delta_{itr}, \quad r = 1, \ldots, R-1 \qquad (20.6)$$

In equation (20.6), $\theta_{ir}$ is consumer $i$'s baseline propensity for role
$r$. A positive value of $\theta_{ir}$ indicates that the $r$-th role accounts for a bigger
portion of website visitation than the last role, role $R$. $X_{it}$ is a vector of
observed characteristics that can be consumer-specific, time-specific,
or both. The corresponding coefficients are captured in $\gamma_r$. Admitting
observed heterogeneity this way allows us to analyze how observed
consumer characteristics and other observed characteristics determine role
composition. For example, if age is observed and we expect a younger con-
sumer to spend more time playing a “social” role, then the coefficient for
age for the social role should be positive. Firms that possess large amounts
of data on such characteristics can thus leverage such information to

improve user profiling accuracy. $\delta_{itr}$ is an individual- and time-specific
factor that captures the evolution of the consumer's role propensities over
time. It is parameterized to capture both population-level dynamics using
fixed effects and individual-consumer-level dynamics using autoregressive
terms. The usage intensity parameter $\lambda_{it}$ is modeled in a similar way.
Our model further admits unobserved heterogeneity by treating the
individual consumer-specific baseline role composition parameters and
the usage intensity parameter of consumer i as drawn from a population-
level multivariate normal distribution. The inclusion of the usage intensity
parameter gives us the ability to understand how different roles are related
to the amount of web activities.

$$\begin{pmatrix} \theta_{i1} \\ \vdots \\ \theta_{i,R-1} \\ \lambda_i \end{pmatrix} \sim N\!\left( \begin{pmatrix} \bar{\theta}_1 \\ \vdots \\ \bar{\theta}_{R-1} \\ \bar{\theta}_{\lambda} \end{pmatrix}\!, \; \Sigma \right) \qquad (20.7)$$

In equation (20.7), $\Sigma$ encodes the variance of the distribution of each
role across consumers, and the correlations among roles and between roles
and the website usage intensity.
As discussed earlier, each consumer visit is generated from a two-step
process. For each visit $v$, $v = 1, \ldots, N_{it}$, she first decides on a role:

$$r_{itv} \sim \text{Multinomial}(P_{it}) \qquad (20.8)$$

Then, based on the chosen role, she decides on the category of the website
to visit:

$$c_{itv} \sim \text{Multinomial}(\phi_{r_{itv}}) \qquad (20.9)$$

The overall visit profile is then summarized as

$$V_{it} = (V_{it1}, \ldots, V_{itC}) \qquad (20.10)$$

Finally, the role-category mapping is drawn from a Dirichlet distribution:

$$\phi_r \sim \text{Dirichlet}(\alpha) \qquad (20.11)$$
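To make the generative process of equations (20.4)–(20.11) concrete, the following minimal Python sketch simulates visit profiles under the model. All dimensions and parameter values are illustrative, and the dynamic terms $\delta_{itr}$ and covariates $X_{it}$ of equation (20.6) are omitted for brevity; this is not the authors' estimation code.

```python
import numpy as np

rng = np.random.default_rng(0)
I, T, C, R = 100, 4, 29, 5          # consumers, periods, site categories, roles

# Hypothetical population-level parameters
mu = np.append(rng.normal(0.0, 0.5, R - 1), np.log(50.0))  # mean of (theta_1..theta_{R-1}, log lambda)
Sigma = 0.3 * np.eye(R)                                    # covariance in eq. (20.7)
phi = rng.dirichlet(0.5 * np.ones(C), size=R)              # eq. (20.11): role-category distributions

def role_probs(theta):
    # Eq. (20.5): logit transform with theta_R normalized to zero
    e = np.exp(np.append(theta, 0.0))
    return e / e.sum()

V = np.zeros((I, T, C), dtype=int)
for i in range(I):
    draw = rng.multivariate_normal(mu, Sigma)    # eq. (20.7): heterogeneity across consumers
    theta_i, lam_i = draw[:-1], np.exp(draw[-1])
    P_i = role_probs(theta_i)                    # role composition (static in this sketch)
    for t in range(T):
        N_it = rng.poisson(lam_i)                # eq. (20.4): usage intensity
        roles = rng.choice(R, size=N_it, p=P_i)  # eq. (20.8): a role for each visit
        for r in roles:
            c = rng.choice(C, p=phi[r])          # eq. (20.9): a category given the role
            V[i, t, c] += 1                      # eq. (20.10): the visit-count profile
```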

The model can be estimated using a hierarchical Bayesian approach
with data augmentation. Our model and estimation approach are particu-
larly suitable for large-scale datasets (“Big Data”) due to its scalability.
Our proposed model falls into the general framework of Latent Variable
models discussed in Ahmed et al. (2012), which demonstrates how this

entire category of models can be efficiently estimated in parallel using,
for example, Hadoop. To further demonstrate the scalability, we also
developed a parallel estimation algorithm and tested its implementation
in a multiprocessor environment, which shows that estimation speeds up
almost linearly as the number of CPUs increases, while processing time
per profile record remains stable as the size of the database grows.
A key question of interest is to predict a consumer’s overall visit
profile using partial information. Using the estimates of consumers’ role
composition and the category composition of each role, this prediction
can be conducted through standard Bayesian updating. Assume that we
know a subset of a consumer $i$'s website visits, denoted as $\tilde{V}_i$. The
prediction task involves finding the posterior of
$\theta_i = (\theta_{i1}, \ldots, \theta_{i,R-1}, \lambda_i)'$ given the data $\tilde{V}_i$, using
the population-level estimates as the prior. This posterior can be generated
in the same way as the model is estimated. Intuitively, this approach uses
the subset of website visits to refine the understanding of the consumer's
profile. For example, if $\tilde{V}_i$ contains many visits to social networking
websites, the composition of the social role for the consumer would be
adjusted upward. Using this approach, we can generate a prediction of a
consumer's overall visit profile from any subset of data about the
consumer.
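A minimal sketch of this updating step is shown below, using a random-walk Metropolis sampler over the role propensities and treating the population-level estimates (mu and Sigma over the thetas) and the role-category distributions phi as known inputs; the usage intensity is left out for brevity, so this is a simplified illustration rather than the authors' estimation routine.

```python
import numpy as np

rng = np.random.default_rng(1)

def category_probs(theta, phi):
    # Role composition implied by theta (eq. 20.5), mixed over roles
    e = np.exp(np.append(theta, 0.0))
    return (e / e.sum()) @ phi

def predict_profile(partial_counts, phi, mu, Sigma, n_iter=5000, step=0.1):
    """Posterior-mean category profile for one consumer given a partial
    visit-count vector, with a N(mu, Sigma) prior on theta."""
    Sigma_inv = np.linalg.inv(Sigma)
    def log_post(theta):
        d = theta - mu
        log_prior = -0.5 * d @ Sigma_inv @ d
        log_lik = partial_counts @ np.log(category_probs(theta, phi) + 1e-12)
        return log_prior + log_lik
    theta, lp = mu.copy(), log_post(mu)
    keep = []
    for it in range(n_iter):
        prop = theta + rng.normal(0.0, step, size=theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:   # Metropolis accept/reject
            theta, lp = prop, lp_prop
        if it >= n_iter // 2:                     # discard burn-in draws
            keep.append(category_probs(theta, phi))
    return np.mean(keep, axis=0)
```

For a consumer whose observed partial counts lean toward social-media categories, the returned profile shifts weight toward the categories favored by the "social" role, mirroring the intuition described above.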

Method Application: Scenario Analysis

The improvement in user profiling afforded by our model may have sig-
nificant profit implications for firms. We now present an economic simu-
lation that illustrates potential gains the proposed model may offer to a
firm if used in individual-level targeting of display ads.
Consider a hypothetical digital advertising agency that generates traf-
fic to the website of their client using display (banner) advertising. The
agency distributes ads through ad exchange, paying $2.80 per thousand
impressions served (CPM) and getting an industry average click-through
rate of 0.5 percent (Johnston 2014). Accordingly, the agency’s effective
cost of generating a click to the client’s website is $0.56. The agency
charges the client a pre-negotiated rate of $0.67 per site visit. The agency
operates on a set daily budget of $1,000, which helps to generate about
1,786 visits per day with the baseline click-through rate of 0.5 percent.
Clearly, the agency’s profitability will improve if it can produce more
clicks. While several factors contribute to the click-through rate of a
given ad (e.g., ad creativity, page placement, context), profile-based
targeting is one of the key drivers of ad performance (Hazan and Banfi
2013).


Table 20.1  Segment sizes and click-through probabilities

                   Segment size   Targeting premium   CTR
Top users          30%            25%                 0.63%
Everybody else     70%            −11%                0.45%

Table 20.2  Effective CTR under different targeting approaches

               Targeting approach
               Base      Histogram   Proposed model
Precision      30%       42%         54%
CTR            0.50%     0.52%       0.54%

As is common practice in this industry, the agency employs its own
proprietary scoring model that links a user's online behavioral profile to
her propensity to click on the ad. For the sake of this simulation we assume that,
for the top 30 percent most active users in the target profile category, the
click-through probability is 25 percent higher than the average, while for
the remaining 70 percent of users the CTR is correspondingly 11 percent
lower than average. These numbers are selected to preserve the average
rate of 0.5 percent (Table 20.1).
With the help of our model, the agency should be able to improve
the performance of this campaign by targeting individuals in the “Top
users” segment. In the extreme case, all the ads should be served only to
the “Top users” segment achieving a click-through rate of 0.63 percent
(Table 20.1). Clearly such performance is unrealistic, and the effective
CTR would depend on classification accuracy, which in turn depends on
the information available to the agency and the targeting model. As part
of our study, we analyzed the information content of the data available
to several prominent advertising networks, and evaluated the potential
gains using our modeling approach.1 For example, assuming the agency
is DoubleClick (or has information of similar quality as DoubleClick),
according to the data we have, Table 20.2 presents the results of effective
CTR when different targeting models are used. Using the histogram
approach the agency is able to accurately identify 42 percent of active
users, resulting in an effective click-through rate of 0.52 percent. Our
model produces further improvement with 0.54 percent effective CTR.
Finally, substituting effective click-through rates from Table 20.2 into
profit calculations, we get an improvement in profit of 25 percent for the

histogram-based targeting and 51 percent improvement for the proposed
model (Table 20.3).

Table 20.3  Profit calculations

                                    Targeting approach
                              Base         Histogram    Proposed model
CTR                           0.50%        0.52%        0.54%
Effective CPC                 $0.56        $0.54        $0.52
Traffic                       1,786        1,860        1,937
Price to client               $0.67        $0.67        $0.67
Revenue                       $1,196.43    $1,246.51    $1,297.57
Profit                        $196.43      $246.51      $297.57
Profit improvement over base  –            25.5%        51.5%
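The arithmetic behind Tables 20.2 and 20.3 can be verified in a few lines; the sketch below recomputes the daily campaign economics from the reported effective CTRs (small discrepancies with the table's traffic figures come from rounding those CTRs to two decimals):

```python
BUDGET, CPM, PRICE = 1_000.0, 2.80, 0.67   # daily budget, $ per 1,000 impressions, $ per visit
impressions = BUDGET / CPM * 1_000         # about 357,143 impressions per day

for approach, ctr in [("Base", 0.0050), ("Histogram", 0.0052), ("Proposed model", 0.0054)]:
    clicks = impressions * ctr             # daily visits delivered to the client
    cpc = BUDGET / clicks                  # effective cost per click
    revenue = clicks * PRICE
    profit = revenue - BUDGET
    print(f"{approach:15s} visits={clicks:7.0f}  CPC=${cpc:.2f}  profit=${profit:7.2f}")
```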

Conclusion

As “digital” has established itself as a key medium for reaching and inter-
acting with consumers, one-on-one marketing is becoming a norm for
online businesses. Fueling this process is the ability to collect, analyze and
act upon individual-level data. This case study focuses on a fundamental
component of online marketing – user profiling. Valued by most online
businesses, user profile data have broad application across different areas
of digital marketing. McKinsey & Company regards online user profiling
as one of the promising opportunities companies should take advantage
of to unlock "big data's" potential (Hazan and Banfi, 2013). Our pro-
posed approach extends the Correlated Topic Model (Blei and Lafferty
2007) for user profiling. The proposed approach augments individual-
level first-party data with anonymized third-party data that significantly
improves profile recovery performance and helps to correct for biases. The
approach is highly scalable and easily parallelized, improving almost lin-
early in the number of CPUs. It produces easily interpretable and intuitive
results, while taking into account both observed and unobserved hetero-
geneities. Using economic simulation, we demonstrate potential gains the
proposed model may offer to a firm if used in individual-level targeting of
display ads.


Note
1. In our study, we collected the website coverage information of several prominent adver-
tising networks.

References

Ahmed, Amr, Mohamed Aly, Joseph Gonzalez, Shravan Narayanamurthy, and Alexander
Smola (2012), “Scalable inference in latent variable models,” in Proceedings of the Fifth
ACM International Conference on Web Search and Data Mining, Seattle, WA: ACM,
123–132.
Blei, David M. and John D. Lafferty (2007), "A correlated topic model of science," Annals
of Applied Statistics, 1(1): 17–35.
Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003), “Latent dirichlet allocation,”
Journal of Machine Learning Research, 3: 993–1022.
eMarketer (2015), “US Digital Ad Spending, by Format, 2013–2019” (accessed September 2015),
[available at http://acquisio.com/blog/display-advertising/display-surpasses-search-2015/].
Griffiths, Thomas L. and Mark Steyvers (2004), "Finding scientific topics," Proceedings
of the National Academy of Sciences of the United States of America, 101 (Suppl. 1):
5228–5235.
Hazan, Eric and Francesco Banfi (2013), “Leveraging big data to optimize digital marketing”
(accessed March 4, 2015), [available at http://www.mckinsey.com/client_service/marketing_
and_sales/latest_thinking/leveraging_big_data_to_optimize_digital_marketing].
Hofmann, Thomas (1999), “Probabilistic latent semantic indexing,” Proceedings of the 22nd
annual international ACM SIGIR conference on Research and development in information
retrieval. ACM.
Johnston, Michael (2014), “Display Ad CPM Rates” (accessed February 9, 2015), [available
at http://monetizepros.com/cpm-rate-guide/display/].
Papadimitriou, Christos H., Prabhakar Raghavan, Hisao Tamaki, Santosh Vempala (2000),
“Latent Semantic Indexing: A Probabilistic Analysis,” Journal of Computer and System
Sciences, 61 (2): 217–235.



21.  Dynamic optimization for marketing
budget allocation at Bayer
Marc Fischer and Sönke Albers

Rule of Thumb for Marketing Budgeting is Common

Determining the marketing budget has been of paramount importance to
marketers for many decades.1 Global players such as Procter & Gamble
spend billions of dollars on advertising per year. Nevertheless, marketing
practitioners frequently use rules of thumb when it comes to determining
marketing budgets. By far the most often used budget rules across regions
and industries are the “percentage-of-sales,” “objective-and-task,” and
“affordability” methods. In addition, budget decisions are often based
on gut feelings or on the negotiation skills of individual managers.
Consequently, politics and individual opinions tend to shape the decision
process instead of fact-based discussions. Obviously, these rules and
practices bear the risk of producing results far from the optimal,
profit-maximizing budget.

Challenges of Optimal Budget Allocation

The global annual marketing budget of a company is usually set in the pre-
vious year; that is, it is fixed. If companies offer a broad product portfolio
to customers from various countries and use a variety of communication
channels they need to break down the fixed annual budget into expendi-
tures across countries, products, and communication activities. For many
firms this task requires determining individual budgets for hundreds of
allocation units. As a result, firms face a complex decision problem: they
need to allocate a fixed budget across a multitude of allocation units by
evaluating the impact of these investment decisions on future cash flows.
Since marketing expenditures are immediately recognized as costs on the
income statement, but their total impact on sales often fully unfolds only
in future periods, they need to be evaluated in terms of an investment deci-
sion and based on the principle of marginal returns. Technically speaking,
management needs to solve a dynamic optimization problem for an
investment portfolio under a budget constraint. This management challenge
recurs on a regular basis as marketing budgets are set annually.
Therefore, we developed a new allocation approach. In a first step, we
present a theoretical solution that provides important insights into how
individual budgets should be set so that they account for differences in
profit contribution, marketing effectiveness, and growth potential. In a
second step, we derive a near-optimal allocation rule from that solution
that addresses the demand for simple allocation rules by practitioners. It
can be used with a spreadsheet. While easy to understand and to imple-
ment, the heuristic goes beyond widespread budgeting rules such as the
“percentage-of sales” rule.

Developing a Dynamic Budget Allocation Approach

According to previous findings, the profit improvement potential from
a better allocation of a total marketing budget is much higher than from
optimizing the total budget (Tull et al. 1986). Therefore, the approach does
not tackle how to determine the overall budget, but how to allocate a fixed
budget that is constant over the planning horizon. The model provides a
solution for an international firm that offers a broad portfolio of products to
customers from different countries, using different marketing tools or activi-
ties to promote its products (e.g., traditional advertising, personal selling,
etc.). The portfolio is composed of products that differ in their life-cycle stage
and the firm wishes to maximize the discounted total profits of its portfolio.

Specifically, the model integrates and trades off information about

●  the size of the business,
●  the profit contribution margin,
●  the (short-term) effectiveness of marketing investments,
●  the carryover effect of marketing investments,
●  the growth potential, and
●  the time value of money.

In the model, sales are represented by a general growth function (a
product life cycle) and the response to marketing investments. The
growth function describes the evolution of new product sales over time
and is assumed to be influenced by marketing investments. The effect
of marketing investments is determined by a marketing stock that arises
from previous investments and depreciates over time (decay factor), plus

the marketing investments in the current period. Based on these
specifications, the discounted net value of the product portfolio is
maximized. Figure 21.1 shows the formulation of the maximization
problem and its restrictions in mathematical terms.

$$\max \; \sum_{t \in T} \text{discount factor}_t \sum_{k \in K} \sum_{i \in I_k} \Big( \text{profit contribution}_{kit} \times \text{unit sales}_{kit} - \sum_{n \in N_i} \text{marketing expense}_{kint} \Big)$$

(the discounted net value of the product portfolio), where unit sales = f(life cycle, marketing expense, etc.), subject to:

(1) $\sum_k \sum_i \sum_n \text{marketing expense}_{kint} = \text{total budget}_t$
(2) marketing effects decay at a constant rate over time
(3) boundary conditions (e.g., positive marketing budgets)

Figure 21.1  Constrained dynamic profit maximization problem
The optimal solution considers dynamics in two different ways. First, it
incorporates the dynamic effects of building and leveraging the marketing
stock, which is reflected in the marketing carryover coefficient. Second, it
accounts for the growth potential of a product that is related to marketing
investments as reflected in the growth elasticity.
The growth elasticity measures the power of marketing to shape the life
cycle. Hence, based on prior evidence it is assumed that the growth process
is not predetermined but can be influenced by the level of marketing expen-
ditures in different phases of the life of a product (Fischer, Leeflang, and
Verhoef 2010). In particular, marketing investments in the growth potential
of a new product have a strong impact on future cumulative sales and
discounted cash flows. On the basis of a parametric growth model, it can be
demonstrated in the subsequent case study how the optimal solution favors
shifting marketing resources to young products so that they can leverage
their endogenous growth potential.
The optimal solution is based on the principle of relative attractiveness
of an allocation unit to get a share of the total marketing budget. The idea
is to calculate the optimal allocation weight for a product, as an example,
and relate this weight to all allocation weights of the portfolio. This share
is proportional to the profit contribution margin, current sales, marketing
budget elasticity, and growth multiplier.
Figure 21.2 explains how the optimal allocation weights for individual
countries, products and marketing activities can be determined in detail.

$$\text{optimal budget}_{kint} = \frac{\text{optimal allocation weight}_{kint}}{\sum_{k \in K} \sum_{i \in I_k} \sum_{n \in N_i} \text{optimal allocation weight}_{kint}} \times \text{total budget}_t$$

$$\text{optimal allocation weight}_{kint} = \frac{\text{profit contribution}_{kit} \times \text{optimal unit sales}_{kit} \times \text{optimal marketing elasticity}_{kint} \times \text{optimal growth elasticity}_{kit}}{1 + \text{discount rate} - \text{marketing carryover}_{kin}}$$

Figure 21.2  Optimal solution

Unit sales, marketing elasticity, and growth elasticity are labeled “optimal”
in Figure 21.2 because this information is endogenous and depends on the
budget and resulting unit sales in the optimum. The exact numbers can
only be determined in an iterative process by applying dynamic numeri-
cal optimization techniques. However, the structure of the optimal solu-
tion provides the basis for deriving a heuristic rule that does not require
numerical optimization. We describe this heuristic rule subsequently.

Implications for Budget Allocation

The optimal solution (Figure 21.2) provides a number of intuitive insights
into the allocation problem.

●  The optimal budget for a product relative to other products increases
   with its contribution margin and its sales base.
●  Similarly, the larger a product's long-term marketing effectiveness for
   a certain activity, the higher its optimal budget.
●  The long-term marketing effectiveness is composed of the short-term
   sales elasticity, the discount rate, and the marketing carryover.
   Consequently, if long-term marketing effectiveness is larger across all
   activities of a product compared to other products, the total budget
   for that product increases.
●  Finally, the sales growth elasticity varies over the life cycle. It is
   largest at the beginning, when most of the sales are yet to come.
   Hence, the potential impact of marketing expenditures on future cash
   flows is greatest at this stage, which is why young products get a
   higher allocation weight and thus a larger share of the total budget.

Because of the growth potential of a new product, the optimal marketing
budget might even be higher than revenues at the beginning of its life.
Therefore the solution may suggest spending money on products that
involve a temporary loss in such a case.

Adapting the Approach for Practical Application
For managers, it is more transparent and easier to use an allocation rule
instead of a numerical solution of such a model. Therefore, an allocation
heuristic is derived directly from the theoretical solution that produces
near-optimal budgets, is easy to understand for managers, and can be imple-
mented in a simple spreadsheet. Basically, the proposed heuristic is a simple
proportional rule that integrates relevant information from three areas:

●  the long-term effectiveness of marketing investments in the focal
   product,
●  the profit contribution of the focal product, and
●  the focal product's growth expectations.

Figure 21.3 shows how the allocation weights are determined using the
simplified decision rule. Data for the carryover coefficient, sales elasticity,
and the growth multiplier are not readily available but must be estimated.
For example, if historical sales and marketing time-series are available,
econometric methods can be used to estimate marketing elasticity and
carryover.

$$\text{heuristic allocation weight} = \underbrace{\frac{\text{last period's marketing elasticity}}{1 + \text{discount rate} - \text{marketing carryover}}}_{(1)\ \text{(discounted) long-term marketing effectiveness}} \times \underbrace{\text{profit contribution margin (\%)} \times \text{last period's revenues}}_{(2)\ \text{size of profit contribution}} \times \underbrace{\frac{\text{expected revenues in } T \text{ periods}}{\text{last period's revenues}}}_{(3)\ \text{growth potential } (T = \text{planning horizon})}$$

Figure 21.3  Heuristic allocation weight

Current values of revenues are available from last year and the contribu-
tion margin is a target figure decided by management. The growth poten-
tial is calculated as a multiplier that divides expected revenues in 5 years
(planning horizon) by the current revenue level. By this, products get a
greater share of the total budget as long as they are expected to grow.
In contrast, when they are expected to turn into their decline stage, their
budget is reduced.
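A small Python sketch of this proportional rule is given below; the two-product numbers are invented for illustration (they are not Bayer data), and the function simply implements the three multiplicative components of Figure 21.3.

```python
import numpy as np

def heuristic_weight(elasticity, carryover, margin, revenue, expected_revenue_T,
                     discount_rate=0.10):
    """Heuristic allocation weight as in Figure 21.3."""
    lt_effectiveness = elasticity / (1.0 + discount_rate - carryover)  # (1)
    profit_contribution = margin * revenue                             # (2)
    growth_potential = expected_revenue_T / revenue                    # (3)
    return lt_effectiveness * profit_contribution * growth_potential

# Hypothetical example: a young, growing product A vs. a mature product B
elasticity = np.array([0.20, 0.10])     # last period's marketing elasticities
carryover  = np.array([0.60, 0.40])     # marketing carryover coefficients
margin     = np.array([0.80, 0.70])     # profit contribution margins
revenue    = np.array([10.0, 40.0])     # last period's revenues (EUR million)
revenue_T  = np.array([30.0, 35.0])     # expected revenues at the planning horizon

w = heuristic_weight(elasticity, carryover, margin, revenue, revenue_T)
budgets = 12.0 * w / w.sum()            # split a fixed EUR 12 million budget
print(budgets)   # product A gets the larger share despite its smaller sales base
```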
By definition, the heuristic solution is likely to differ from the optimal
solution, but it should not deviate too much to be useful. The performance
of the heuristic was tested in an experimental simulation study and found
to provide very good results that even improve after several planning
cycles and converge to the optimum if applied consecutively (Fischer,
Wagner, and Albers 2013).
Although the tool was applied to prescription drugs (see below), it is
suitable for many other industries such as consumer durables, consumer
packaged goods, etc. In all these markets, rich information is available at
the aggregate product level that allows the calibration of market response
models.

Practical application in the pharmaceutical industry: The Bayer Case

Company Background and Challenge

Bayer is among the world's leading companies in the pharmaceuticals and
chemicals business sector. As of 2008, the company had €32.9 billion
in sales and around 108,600 employees (Bayer 2009). The company invests
substantial resources in marketing and sales activities. Total marketing
and selling expenditures were €7.6 billion (~23.1 percent of total sales)
in 2008. Bayer consists of three major business areas, Healthcare being
the largest area in terms of sales (contributing almost 50 percent). Within
Healthcare, the Primary Care Unit (€3.1 billion) is the largest within the
prescription drug business (€10.7 billion). The unit operates in four sepa-
rate competitive market environments or therapeutic areas, respectively:
diabetes, hypertension, infectious diseases, and erectile dysfunction.
The challenge for management was to find a balance in the allocation
of marketing resources that trades off the size of the business, the growth
expectations, and eventually the effectiveness of marketing expenditures.
The main objective was to improve the process and results of annual
budget allocation in order to maximize discounted profits from the prod-
uct portfolio over a planning horizon of five years. The implementation of

the allocation tool was targeted at the five main European countries that
contribute the largest share to total sales. The application was developed
in the period 2005–2006 and budget recommendations for 2007 were
derived.
At that point in time, the three therapeutic areas diabetes, hypertension,
and infectious diseases represented established areas that are in their satu-
ration stage. Due to the aging of the population in industrialized societies
and innovative new-product introductions, they are, however, expected to
continue to grow at moderate rates in the future. The biggest challenge for
Bayer in these areas is to keep its market position. Existing and new drugs
by other global players are the main competitors for the Bayer drugs.
In contrast, the market for the treatment of erectile dysfunction is a
new category that was pioneered by Pfizer with its Viagra brand in 1998.
Bayer and Eli Lilly followed in 2003 with the introduction of their brands
Levitra and Cialis. This market is still growing and does not face generic
competitors yet.

Data and Model Estimation

To obtain relevant input information such as sales elasticities and growth
parameters, the authors estimated a market response model for each
product market (Fischer and Albers 2010; Hanssens, Parsons, and Schultz
2001). Quarterly marketing and sales data at the product level of the pre-
vious 10 years (1996–2006) were available. The market response model is
a mathematical representation of how sales evolve over time and react to
marketing and other investments. Estimating the parameters of this model
from the observed sales time-series provides the data input to compute
marketing elasticity and other input data, which are not observed. Bayer
management helped to identify the relevant subcategories and competi-
tors within each therapeutic area by country. Subcategories vary from 12
for Anti-infectives to one for Erectile Dysfunction. Products vary from
15 for the Erectile Dysfunction area to 306 for the Hypertension area.
Table 21.1 gives an overview of the key input variables used to calibrate
the heuristic allocation tool.
Each therapeutic area is specified as a double-log sales response func-
tion that accommodates nonlinear and interaction effects. Marketing mix
data in each area were reflected by including marketing stocks (a combina-
tion of all marketing expenditure types) for Bayer and its competitors (in
total), own and competitive prices, and brand/quality effects. A double-log
market response model was used to ensure diminishing marginal returns
and get estimated parameter values associated with marketing-mix vari-
ables that correspond to elasticities, which indicate the effectiveness of the

specific activities.

Table 21.1  Overview of input variables for the heuristic allocation tool for Bayer

                                            Antidiabetes     Hypertension     Erectile         Anti-infectives
                                                                              dysfunction
                                            Mean     SD      Mean     SD      Mean     SD      Mean     SD
Unit sales in thousand standard units       16,319   20,674  11,391   16,649  1,008    649     5,291    8,004
Elapsed time since launch in years          14.50    12.69   10.00    7.42    2.75     1.91    12.25    10.45
Order of entry (Median)                     3                4                2                3
Price in EUR per standard unit              .16      .26     .50      2.96    7.00     .48     2.01     1.97
Marketing stock variables:
  Detailing at general practitioners
    in thousand EUR                         22,519   36,566  64,595   87,134  55,026   30,326  44,259   34,930
  Detailing at specialists in thousand EUR  2,081    4,068   8,803    13,701  14,498   12,771  10,380   11,353
  Detailing at pharmacies in thousand EUR   588      1,453   1,930    3,039   –        –       1,766    2,598
  Professional journal advertising
    in thousand EUR                         149      341     458      502     –        –       165      295
  Meeting invitations in thousand EUR       730      2,030   1,361    3,062   3,884    2,481   471      837
  Other marketing expenditures in
    thousand EUR                            –        –       –        –       2,558    9,278   3,912    4,404
# of countries                              5                5                5                5
# of subcategories                          6                10               1                12
# of products                               104              306              15               100
# of observations                           2,398            7,908            233              2,916

Note:  All units and EUR figures are on a quarterly basis.

An elasticity is a dimensionless measure of the relative
change of a dependent variable such as sales divided by the relative
change of an independent variable such as the marketing budget. Thus,
if sales increases by 5 percent as a result of increasing the marketing
budget by 20 percent, then the elasticity is 5 percent/20 percent = 0.25. It
can be compared across products, countries and marketing instruments.
Further, the model incorporates a number of control variables that have
been shown to impact sales of pharmaceuticals, such as order of entry,
country or seasonal effects and asymmetric life cycle functions. In-sample
model fit and predictive validity were very good across all four therapeutic
areas.
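As a sketch of how such an elasticity can be recovered from historical time series, the following simulation builds a marketing stock that decays at a constant rate and then estimates the elasticity as the slope of a double-log regression (all parameter values are invented for the example; this is not the authors' estimation code):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated quarterly history for one product
T, carryover, true_elasticity, intercept = 40, 0.5, 0.25, 3.0
spend = rng.uniform(50.0, 150.0, T)
stock = np.zeros(T)
for t in range(T):
    stock[t] = (carryover * stock[t - 1] if t > 0 else 0.0) + spend[t]  # decaying marketing stock
sales = np.exp(intercept + true_elasticity * np.log(stock) + rng.normal(0.0, 0.05, T))

# Double-log regression: the slope on log(stock) is the marketing elasticity
X = np.column_stack([np.ones(T), np.log(stock)])
coef, *_ = np.linalg.lstsq(X, np.log(sales), rcond=None)
print(f"estimated elasticity: {coef[1]:.3f}")  # close to the true value of 0.25
```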
The effectiveness of detailing and other marketing activities varies sub-
stantially across the different therapeutic areas. In general, these activities
worked best in the Erectile Dysfunction category, which is not surprising,
as this was the youngest category and still in its growth phase. In detail-
ing, visiting general practitioners appears to work better than detailing at
specialists and pharmacists. However, considering that specialists account
only for a share of about 20 percent in Diabetes and about 27 percent in
Hypertension, segment-specific specialist detailing effectiveness is four-to-
five times higher. These findings are consistent with findings from other
pharmaceutical studies (Albers, Mantrala, and Sridhar 2010). Own price
effects were significant, but price changes did not have strong effects. The
impact of competitive marketing expenditures was negative across all
therapeutic areas, although it was not always statistically significant. An
earlier market entry was favorable, as expected. Seasonal effects were only
relevant to Anti-infectives, which experience a high season in autumn and
winter.

The Bayer Implementation

To ensure that management can easily use the allocation formula in
everyday business life, the authors developed an Excel-based Decision
Support Tool. The tool provides budget scenarios and their implications
for the development of market shares and profits over five years and pro-
duces a recommendation for the allocation of the total marketing budget.
It uses input data at the quarterly level.
The heuristic rule requires computing an allocation weight for each mar-
keting spending category and each drug. Input data have been obtained
either from econometric analysis or internal records. The plausibility of
input data, especially the estimated sales elasticities, has been extensively
discussed with different groups of managers in several workshops (global

marketing, market research, product management, sales management,
controlling, etc.).
Following the needs of management, the tool was extended in two
ways. First, a threshold for product budgets was included because of
internal setup costs that are fixed at the product and marketing-activity
level. Second, manual adjustments to budgets recommended by the
heuristic were made possible. By this feature, management could account
for exogenous restrictions to budget setting, e.g., to counter competitive
attacks in a predetermined way. In addition, it enabled management to
investigate the effects of budget scenarios on market share and profit, as
well as on the recommended budgets for other products and marketing
activities. The tool is easy to use and flexible enough to adapt to varying
conditions of decision making. The effort to develop and implement the
budget allocation tool had a significant impact on managerial decision
making.

Impact on Managerial Decision Making

Providing structure to the allocation problem
Obviously, it is a challenging task to allocate a total budget across six
spending categories for 36 drugs that are marketed in different countries
and therapeutic areas. The suggested allocation heuristic provides struc-
ture to this complex decision problem. It specifies that information and
data from three fields are necessary (data on the long-term effectiveness of
marketing, information on a product’s contribution to profit, and growth
potential of the product).

Providing a solution to the problem
The allocation rule suggests that these three fields of information are to be
combined in a multiplicative fashion so that the budgets are proportional
to these three information pieces. Implications from this rule are straight-
forward, in particular:

(1) Products that generate more incremental sales with the same budget
should get a larger slice of the total budget. Of course, relative incre-
mental sales tend to decline as sales and budgets increase due to satu-
ration effects.
(2) Products with a higher level of profit contribution generate more
financial resources to cover their own marketing expenditures and
contribute more to overall profits.
(3) Marketing should support growing and not declining products and
shift resources over the life cycle.

The rule also teaches that the drivers of a product's near-optimal
budget share interact with each other, i.e., there exist synergies between
them. Finally, it makes the tradeoffs in budget allocation transparent. For
example, a product with high marketing effectiveness but a low profit con-
tribution level could get a lower budget than a product with a high level
of profit contribution but lower marketing effectiveness. Even though that
product’s spending is less effective, it may still contribute more to overall
profit because of its larger sales base.

Understanding the limitations of separate ROI analyses


Profit calculations with the allocation tool quickly revealed the limita-
tions of comparing incremental ROIs that result from raising/decreasing
marketing expenditures for individual products and marketing activities.
First, separate ROI analyses for individual marketing activities do not
consider synergies between marketing activities nor do they consider the
trade-offs that exist with respect to potential profit improvements by other
products and activities. Further, they do not inform about the optimal
magnitude of budget changes for products and activities, given a fixed
total budget. All three requirements are met by the allocation heuristic
within one step.

Organizational impact
Although the allocation tool is not the only source used by Bayer to
generate budget options, it has significantly improved the efficiency and
quality of the decision process. Because of its transparency and top–down
perspective, the allocation tool ameliorates a decision process that often
appears emotional and inefficient. Since it is strictly based on a range
of verifiable input information, the allocation tool adds an independent
perspective and its recommendations are fully fact-based. The budgeting
project contributed substantially to an organizational transformation that
eventually resulted in the creation of a completely new marketing intel-
ligence unit called Global Business Support. This unit supports global
marketing management and sales, including the global management board
with tools, results, and recommendations for a more efficient and effective
use of marketing resources.

Last and most important: financial impact


The tool enables its users to simulate the financial impact of different
budget allocation options. By analyzing the simulation results, it provides
transparency about the impact of different assumptions on financial
results. Figure 21.4 shows an example of the budget-shift recommenda-
tions of the model in the hypertension market. A budget shift between the
two products implying an overall budget reduction can cause increased
profits for both products.

Figure 21.4  Examples of budget re-allocation across products in euros
(hypertension product A: budget 4.5m before, 2.3m after, discounted profit
+6.7m; product B: budget 1.5m before, 2.2m after, discounted profit +4.0m)
Based on the year 2007, the simulation suggested an increase in dis-
counted profits of 55 percent over the next five years due to an optimized
allocation. This is worth €493 million. In contrast, changing the overall
budget by 20 percent promised a profit impact of less than 5 percent. Even
if only a small portion of this increase can be realized, the additional profit
for a business unit such as Primary Care with €3 billion worldwide sales is
substantial.
The actual profit improvements are hard to evaluate. First, management
did not completely follow the suggested reallocation by the tool for several
reasons (e.g., varying personal experiences, concerns about errors in some
data from third-party data providers). Second, activities by competitors
and exogenous influences on market dynamics impact profit results.
Nevertheless, the business area Bayer HealthCare reports an increase in
EBIT of 12 percent (€273 million) compared to a four percent revenue
increase for the year 2008 (Bayer 2009). Although there is no validation
from a field test, these results are consistent with prior observations that
budget reallocation, more than changes in total spending, moves the
bottom line.


Conclusion

The innovative budget allocation approach provides a simple but
comprehensive heuristic that accounts for dynamics in marketing effects
and product growth. Allocating a budget proportionally to the size of the
business (sales and profit contribution margin), the effectiveness of the
marketing activities (short-term elasticity and carryover coefficient), and
the growth potential of the product (growth multiplier accounting for time
discounting) revealed substantial profit improvement potentials compared
to a simple allocation dominated by rules of thumb, separate ROI analy-
ses for different products or subjective evaluations. It is suitable for many
other industries such as consumer durables and consumer packaged goods,
provided that rich information is available at the aggregate product level.

Note

1. This chapter is an adapted version of Marc Fischer, Sönke Albers, Nils Wagner,
and Monika Frie (2011), “Dynamic Marketing Budget Allocation across Countries,
Products, and Marketing Activities,” Marketing Science, 30 (4), pp. 568–585, and
appeared slightly modified under the title “Dynamically Allocating the Marketing
Budget: How to Leverage Profits across Markets, Products and Marketing Activities,”
in Marketing Intelligence Review, 4 (1), 2012, 50–59.

References

Albers, Sönke, Murali K. Mantrala, and Srihari Sridhar (2010), “Personal Selling Elasticities:
A Meta-Analysis,” Journal of Marketing Research, 47 (5), 840–853.
Bayer (2009), Annual Report, 2008. Bayer AG, Leverkusen, Germany.
Fischer, Marc and Sönke Albers (2010), “Patient- or Physician-Oriented Marketing: What
Drives Primary Demand for Prescription Drugs?” Journal of Marketing Research, 47 (2),
103–121.
Fischer, Marc, Peter S. H. Leeflang, and Peter C. Verhoef (2010), “Drivers of Peak Sales for
Pharmaceutical Brands,” Quantitative Marketing and Economics, 8 (4), 429–460.
Fischer, Marc, Nils Wagner, and Sönke Albers (2013), “Investigating the Performance
of Budget Allocation Rules: A Monte Carlo Study,” MSI Report Series No. 13-114,
Cambridge: MA: Marketing Science Institute.
Hanssens, Dominique M., Leonard J. Parsons, and Randall L. Schultz (2001), Market
Response Models: Econometric and Time Series Analysis. 2nd ed., Boston: Kluwer.
Tull, Donald S., Van R. Wood, Dale Duhan, Tom Gillpatrick, Kim R. Robertson, and James
G. Helgeson (1986), “’Leveraged’ Decision Making in Advertising: The Flat Maximum
Principle and Its Implications,” Journal of Marketing Research, 23 (1), 25–32.

part viii

case studies and applications in public policy
22.  Consumer (mis)behavior and public
policy intervention
Klaus Wertenbroch

Consumers often “misbehave” (Thaler 2015).1 They save and exercise too
little; they spend, eat, and drink too much and take on too much debt;
they work too hard (or too little); they smoke, take drugs (but not their
prescription medicine), have unprotected sex, and carelessly expose their
private lives on social media. These misbehaviors may entail large costs
not only to society but also to the individuals concerned. Hence, policy-
makers feel compelled to regulate these behaviors along with the extent to
which companies are allowed to cater to, or take advantage of, consumer
preferences to engage in these behaviors. Examples abound. Witness, for
example, the widespread regulatory constraints imposed by governments
on both companies and consumers such as bans on smoking and taking
drugs, curbs on alcohol consumption, or borrowing limits based on dis-
posable income. Prominent examples of regulatory constraints imposed
on marketers include Australia’s Tobacco Plain Packaging Act, which,
beginning in December 2012, requires cigarette manufacturers to use
generic, undifferentiated packaging; or New York City’s proposed so-
called soda ban of sales of sugar-sweetened drinks in cups of more than
16 ounces (ultimately rejected by the courts in 2014); or the United States
Credit Card Act of 2009, which limits how credit card companies can
charge consumers and make them pay off their debt balances.
What is it about consumer financial decision-making, eating and drink-
ing, smoking, online behavior, and other (mis)behaviors that can make
them problematic? How can empirical methods and findings from mar-
keting science be used to help marketers, consumers, and policy makers
evaluate and control these misbehaviors? In this chapter, I will focus and
build on an approach developed in Wertenbroch (1998) to outline how the
theory-guided use of experimental methods, complemented by field data,
can provide both a criterion for evaluating the need for policy intervention
and a tool, offered by government as well as private enterprise, for allow-
ing consumers to avoid or limit their own misbehaviors without imposing
heavy-handed, intrusive constraints on market participants’ freedom of
choice (Thaler and Sunstein 2003).


Criteria for Policy Intervention

How do we know whether a consumer does too much or too little of
something—spends or eats or drinks too much? What are possible diag-
nostic criteria for detecting such misbehavior, or misconsumption, that
might call for regulatory intervention?

Negative Externalities

A seemingly straightforward criterion is whether individual consumer
behavior generates negative externalities; that is, whether one person's
consumption choices affect other consumers or society negatively (Coase
1960). For example, obesity-related health care costs in the United States
have been estimated at $190 billion in 2005 alone (Cawley and Meyerhoefer
2012). Consumers’ primary reliance on pay-as-you-go public pension
systems, as reflected in low individual retirement savings rates, entails
substantial intra- and intergenerational redistribution effects and nega-
tive incentive effects on labor participation (Börsch-Supan and Schnabel
1998). Less spectacularly, a consumer smoking a cigarette in a bar may
bother other patrons who do not want to be exposed to smelly or harmful
cigarette smoke. While most people will agree that individual consump-
tion choices should be limited to avoid such negative externalities, modern
democratic societies also acknowledge and protect individuals’ freedom of
choice. Policy intervention to curb individual behaviors to protect third
parties thus entails tradeoffs between protecting an individual consumer’s
right to choose freely and other consumers’ rights to protect their own
welfare. Such tradeoffs in determining the need for policy intervention
evolve with the arrival of new scientific information (e.g., about the soci-
etal costs of obesity) but also with changes in societies’ beliefs and prefer-
ences. In the end, they reflect the public’s tastes and weighting of different
consumer segments’ interests more than they provide an easy-to-pin-down
objective criterion for intervention.

Individual Consumer Welfare

Another yardstick for determining the need for intervention is the
consumer's own welfare. Does a consumption behavior harm the indi-
vidual engaged in it? For instance, obesity has been estimated to reduce
a patient’s life expectancy by as much as eight years (Grover et al. 2015).
Mere common sense suggests that that is enough of a cost to warrant inter-
vention to prevent consumers from becoming obese. Yet, the prevailing
legal and political view in Western societies, grounded in Enlightenment
thinking, is of consumers as sovereign and rational decision-makers who
are the best judges of their own welfare and hence should best be left free
to choose (Mill 1859/1975; Sunstein 2015; Wertenbroch 2014). That view
has been formalized in standard theorizing in neoclassical economics.
Consumer choice maximizes utility based on one’s preferences and subject
to a budget constraint (Stigler and Becker 1977). What consumers choose
is not a criterion for assessing the rationality of their choices; instead,
consistency of a consumer’s choices with a set of simple, intuitive behav-
ioral principles, so-called choice axioms, ensures that these choices maxi-
mize utility (von Neumann and Morgenstern 1944). Although somewhat
counterintuitive, it is thus possible to describe an obese or even addicted
consumer as a rationally self-interested, forward-looking utility maximizer
with stable preferences (Becker and Murphy 1988), that is, as a rational
consumer who does not over- or mis-consume but simply deeply discounts
the future consequences of his or her current choices.
Historically, such a view of consumers as fundamentally rational
­decision-makers has focused policymakers on the need to reduce, or
manage, information asymmetries in consumer choice (Akerlof 1970;
Stigler 1961). Yet, four decades of research into heuristics and biases in
human judgment and choice (Kahneman and Tversky 1979; Kahneman
2011; Thaler 2015; Tversky and Kahneman 1974) have shown that con-
sumer choice systematically deviates from the standard axiomatic model
of rationality; therefore, merely providing consumers with more com-
prehensive information about their choice options and the probabilistic
consequences of choosing these options is not enough to allow people to
make optimal choices. Whether and how policymakers can help improve
consumers’ welfare by policy intervention in the face of such decision-
making biases without infringing on individuals’ freedom of choice has
been the subject of much debate in recent years (Thaler and Sunstein 2003,
2008; Sunstein 2015).

Internalities and Precommitment

Negative externalities or third-party assessments of individual consumer
welfare offer a rather pragmatic, if crude, guide to identifying a need
for policy intervention and regulation. They require subjective political
judgments by policymakers of how to balance the negative consequences
of individual choices with preserving individual freedom of choice; these
judgments are not fundamentally based on the preferences of those whose
behavior is being regulated. When asked, many overweight consumers, for
instance, may express unhappiness with being overweight, yet they fail to
reduce their calorie intake and/or exercise. Hence, what they say is often
not consistent with what they choose to do, an inconsistency between their
stated and revealed preferences (Wertenbroch and Skiera 2002).
Many such cases of what we might label misbehavior, or misconsump-
tion, involve intertemporal tradeoffs, which consumers make between
consequences of their consumption choices that occur over time. People
give in to the temptation to consume or do something unhealthy (e.g.,
drink sugary soft drinks, smoke, have unprotected sex, fail to exercise
or take one’s prescription drugs) for its immediate benefits even though
they know that their choice entails much larger negative long-term conse-
quences, which they anticipate they will regret. They thus choose a sooner,
smaller reward (e.g., immediate taste benefits, pleasure, leisure, present
consumption) over a larger, later one (e.g., better health outcomes, suf-
ficient retirement savings), when the sooner, smaller reward is imminent,
even though they prefer the larger, later reward when both occur in the
future. Strotz (1955–56) showed that such intertemporally, or dynami-
cally, inconsistent preferences cannot be characterized by discounting
the future at a constant rate, which is commonly regarded as normative.2
Instead, consumers discount the future consequences of their present
choices disproportionately, or hyperbolically, relative to the immediate
consequences, entailing myopic or present-biased preferences (Ainslie
1975; Frederick, Loewenstein and O’Donoghue 2001; Laibson 1997).
Such present-biased preferences that disproportionately overvalue imme-
diate outcomes can be said to yield negative internalities, that is, costly
consequences for consumers’ own future selves (Bartels and Urminsky
2011; Herrnstein et al. 1993; Hershfield et al. 2011).
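
As a minimal numerical illustration of such present-biased preferences, assuming the quasi-hyperbolic beta-delta form associated with Laibson (1997) and arbitrarily chosen parameter values, the same decision-maker can prefer the larger, later reward when both rewards lie in the future, yet flip to the smaller, sooner reward once it becomes immediate:

beta, delta = 0.7, 0.99  # assumed present-bias and per-day discount factors

def value(amount, days_away):
    # Quasi-hyperbolic discounting: only immediate rewards escape the beta penalty
    return amount if days_away == 0 else beta * delta ** days_away * amount

print(value(100, 30) < value(110, 37))  # True: 110 in 37 days beats 100 in 30
print(value(100, 0) > value(110, 7))    # True: 100 today beats 110 in a week

Under exponential discounting (beta = 1) no such reversal can occur, which is the sense in which constant-rate discounting is dynamically consistent.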
Consumers differ—not only across individuals but also intra-individ-
ually across situations—in the degree to which their choices are present-
biased and also in the extent to which they exercise self-control, that is,
in the extent to which they attempt to curb their present-bias to minimize
the negative future consequences of their present choices. Consistent with
Strotz’s (1955–56) analysis, O’Donoghue and Rabin (1999) distinguish
between rational, time-consistent consumers (who may also include those
who use willpower to resist temptation and thus do not exhibit present-
biased choices; Baumeister and Vohs 2003) and others whose choices are
characterized by present-bias. The latter encompass naïfs who do not
foresee the self-control problems that arise from their present-biased pref-
erences and sophisticates who are aware of their present-bias and hence
foresee these self-control problems. Sophisticates may exercise self-control
to curb their present-biased impulses by engaging in precommitment: at
a time when they are not yet tempted to choose a smaller, sooner reward
over a larger, later one, they foresee that they will be tempted when that
smaller, sooner reward becomes imminent. They therefore self-impose
constraints on their future ability to give in to temptation, committing
themselves to smaller, rather than larger, choice sets that limit the future
availability of tempting options (Gul and Pesendorfer 2001; Strotz 1955–
56). The prototypical example is Ulysses, who tied himself to the mast of
his ship to keep himself from giving in to the Sirens’ deadly temptation.
More modern-day examples include Christmas Club bank accounts, in
which consumers choose to save money for their Christmas purchases
and that are subject to restrictions on early withdrawals (Thaler 1980), or
placing one’s alarm clock out of reach so that one needs to get up to reach
it to turn off the alarm (Schelling 1984).
I propose that precommitment may be used as a revealed preference-
based criterion to resolve the policymaker’s dilemma of separating merely
impatient yet rational, time-consistent consumers (i.e., those who may
appear to be misconsuming but who do so out of their own free will, dis-
counting the future consequences of their choices deeply but constantly)
from those consumers who worry about the internalities created by their
present-biased preferences (i.e., those who feel that they are misconsuming
and wish they weren’t). The former group would suffer from regulation
that intrudes on their freedom of choice, whereas the latter is sophisticated
enough to value and self-impose restrictions on their choice sets.
Ariely and Wertenbroch (2002) provided an early illustration of an
intervention that allows policymakers to both identify and help these
sophisticates in the context of a particularly widespread time-inconsistent
behavior, procrastination. Present-biased consumers will put off tedious
tasks that involve small immediate costs (e.g., effort) yet larger long-term
benefits (e.g., good grades from well-written homework assignments),
thereby harming their long-term welfare. Ariely and Wertenbroch (2002)
offered course participants the option to self-impose costly external dead-
lines for when they wanted to submit required class assignments, to help
them spend sufficient time and effort on each assignment. In one study, for
example, students could choose to self-impose deadlines throughout the
semester such that they would lose one percent of their grade for each day
by which they would miss turning in their respective assignment. Because
missing these deadlines entailed costly consequences, students who chose
to self-impose such external deadlines could only make themselves worse
off, compared to setting their own, non-binding private deadlines. Yet
a significant percentage of students chose the costly, binding deadlines,
violating rules of standard rationality. These students preferred to limit
their own freedom of choice, in this case their freedom to procrastinate,
to create incentives for themselves to work on their assignments more effi-
ciently. They thus revealed by virtue of their choices that they were afraid
of otherwise procrastinating, that is, of giving in to their present bias.


A Need for Intervention: Detecting Consumer Precommitment in the Marketplace

The previous section outlined three different criteria for policy interven-
tion and consumer protection: the presence of negative externalities,
third-party assessments of individual consumer welfare, and evidence of
consumer precommitment. Of these, precommitment offers the only crite-
rion that reveals the consumer’s own preferences for controlling his or her
consumption, as opposed to relying on the consumer’s stated preferences
or on third-party assessments of the consumer’s welfare. How can firms
and policy makers detect such precommitment in consumer markets?
The first empirical analysis of consumer precommitment in the mar-
ketplace was offered by Wertenbroch (1998), providing a template for a
theory-guided, empirical identification of instances of precommitment as
a behavioral criterion to detect a need for policy intervention. The paper
introduced a formal distinction into the marketing literature between
so-called vice and virtue goods (318–319). Vices are defined as goods that
dynamically inconsistent consumers are tempted to overconsume (e.g.,
alcohol, sweets, etc.), whereas virtues are defined as goods that dynami-
cally inconsistent consumers are tempted to underconsume (e.g., exercise,
spinach, etc.), due to how the costs and benefits of consuming them are
distributed over time. For example, snacking on cookies (a vice) yields an
immediate taste benefit but may make you gain weight over time, while
doing your homework (a virtue) is effortful but helps you achieve better
subsequent grades.
Wertenbroch (1998) hypothesized that consumers who worry about
being tempted to overconsume vices ration their purchase quantities of
these vices, relative to those of comparable virtues. That is, they prefer
to buy these vices in smaller package sizes at a time. For example, many
smokers prefer to buy their cigarettes in packs rather than in cartons
(Wertenbroch 2003). This imposes additional transaction costs on mar-
ginal consumption—they have to take another shopping trip to buy a
new pack when the initial pack is finished. Hence, rationing is a form of
precommitment—at the time of purchase, when consumers are not yet
tempted to overconsume a vice (e.g., in the store), they themselves strategi-
cally change the incentives, which they expect to face later on at the time
of consumption (e.g., at home), self-imposing constraints on marginal vice
consumption. To illustrate, when you have finished a bag of potato chips,
a prototypical impulse good, it is a lot more difficult for you to eat more
chips if you have to go out and buy another bag than if you can simply
grab one from your pantry. Such strategically motivated preferences to
buy vices in smaller package sizes imply that demand for vices ought to
be less price-elastic than demand for comparable virtues: In response to
a given price reduction, demand for vices increases at a slower rate than
demand for virtues (subject to the condition that consumers do not prefer
virtues to vices at all prices). Sophisticated consumers who recognize their
need for self-control will be reluctant to buy more of a vice in response to
a price discount.
In an early example of the application of multiple methods in marketing
science, Wertenbroch (1998) employed a combination of experimental
data, field study data, and aggregate store-level scanner data analysis to
test this hypothesis and to enhance the external validity of the experimen-
tal findings. In an incentive-compatible experiment, 304 MBA student
participants were given an opportunity to buy potato chips. They could
choose between a small purchase quantity (one 6-oz. bag) for $1, or a
larger purchase quantity (three 6-oz. bags) at a quantity discount, or none
at all. The quantity discount depth varied between participants, either
shallow (three bags for $2.80) or deep (three bags for $1.80). To manipu-
late how tempting the chips were (and thus how strong the potential need
for self-control by precommitment was), they were described either as 25
percent fat (a more tempting vice frame) or as 75 percent fat-free (a less
tempting virtue frame), also between participants. Manipulation checks
showed that participants’ perceptions of the two price discount levels and
of the intertemporal costs and benefits differed accordingly. The results
were as predicted: For those 151 participants who bought potato chips,
a logistic regression analysis to predict purchase quantity probabilities
showed that increasing the quantity discount depth was much less effec-
tive at inducing the purchase of the large quantity under the vice frame
(25 percent fat) than under the virtue frame (75 percent fat-free). At the
same time, participants did not exhibit a stronger preference for the chips
when they were framed as a virtue than when they were framed as a vice,
indicating that the reluctance to buy the large size under the vice frame did
not arise because the chips were less preferred overall when framed as 25
percent fat. These results provided initial support for the hypothesis that
consumers control their consumption of tempting vice goods by buying
these vices in smaller package sizes at a time than comparable virtues.
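
To illustrate the form of this analysis, the following minimal sketch fits a logistic regression with a discount-depth by frame interaction on simulated choices; the coefficients used to generate the data are arbitrary assumptions, not estimates from the original experiment.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 150
deep = rng.integers(0, 2, n)  # 1 = deep quantity discount
vice = rng.integers(0, 2, n)  # 1 = vice frame ("25 percent fat")

# Assumed data-generating process: discounts induce large-quantity purchases,
# but much less so under the vice frame (negative interaction)
logit = -0.5 + 1.5 * deep - 1.2 * deep * vice
buy_large = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([deep, vice, deep * vice]))
fit = sm.Logit(buy_large, X).fit(disp=False)
print(fit.params)  # a negative interaction term mirrors the pattern reported above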
A second experiment provided additional evidence linking participants’
package size preferences to a measure of their need for self-control. A
different group of 310 MBA student participants recruited for this experi-
ment indicated whether they wanted to buy zero, one, or two packs of
Oreo chocolate chip cookies at each of 20 different package prices (from
25¢ to $5 in 25¢ increments). Using an incentive-compatible lottery proce-
dure, 10 percent of the participants were randomly selected to receive $10
worth of experimental subject compensation, and for each of the selected
participants, the experimenter also randomly chose one of the 20 prices, at
which the participant had to buy the number of packs they had indicated,
receiving the Oreos and the remaining balance of $10. The experiment
manipulated between participants whether the Oreos were regular or
reduced fat. A manipulation check confirmed that participants perceived
the intertemporal costs and benefits of consuming the Oreos in line with
conceptualizing regular Oreos as a relative vice and reduced fat Oreos as
a relative virtue. Finally, the experiment determined participants’ need
for self-imposing constraints by measuring participants’ impulsiveness
(i.e., their chronic disposition to yield to temptation, an indicator of their
present-bias), using a scale adapted from Puri (1996). A repeated-measures
ANOVA showed that participants’ decline in per-unit willingness to pay
for two packs rather than one pack (i.e., their preference for quantity
discounts) was more pronounced for regular than for reduced-fat Oreos as
participants’ impulsiveness scores increased. That is, vice buyers were less
price-sensitive than buyers of (comparable) virtues and demanded deeper
quantity discounts for the vice than for the virtue the more impulsive
they were. These findings confirmed that consumers prefer to buy vices in
smaller amounts at a time than comparable virtues and demonstrated that
this preference for rationing vices is a function of consumers’ underlying
need for self-control and thus a form of precommitment.
To examine the external validity of these experimental results, a third
study then compared the depth of actual quantity discounts of relative vices
and virtues in the marketplace. If vice consumers are less responsive to
declining unit prices from quantity discounts than virtue consumers because
they prefer to buy vices in smaller quantities and are therefore reluctant to
trade up to larger purchase quantities, sellers have to offer deeper quantity
discounts for vices than for virtues to encourage sales of larger quantities.
Study 3 examined a convenience sample of price and package size data for
30 matched pairs of regular and light, diet, or otherwise tempered versions
of the same or similar product categories (e.g., regular versus light salad
dressing, regular versus diet soft drinks, sugared versus low-sugar cereal,
etc.), with a maximum of five different package sizes and 15 brands per cat-
egory from a total of seven stores in metropolitan Chicago. Manipulation
check measures from a sample of 136 MBA students showed that consumer
perceptions of the intertemporal costs and benefits of consuming these
products were in line with a conceptualization of the regular products
as relative vices and the light products as relative virtues for 21 of the 30
matched pairs. Regressing logged unit prices (e.g., the price per ounce) on
the logged number of units (e.g., ounces) per pack confirmed that the rela-
tive vices were priced at deeper quantity discounts than the relative virtues
across these 21 pairs (e.g., doubling package size decreased unit price by an
average of 57 percent for relative vices versus only 45 percent for relative
virtues). This finding suggests that marketers’ actual pricing policies are in
line with consumer preferences for rationing purchase quantities of vices.
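
The following minimal sketch shows this type of log-log regression on simulated price schedules; the discount exponents are merely set to echo the reported 57 percent and 45 percent averages and are not the study's data.

import numpy as np

rng = np.random.default_rng(0)
pack_sizes = np.array([6.0, 12.0, 24.0, 48.0])  # hypothetical ounces per pack

def doubling_discount(unit_prices, sizes):
    # Fit ln(unit_price) = a + b * ln(size); report the unit-price drop per doubling
    b, _ = np.polyfit(np.log(sizes), np.log(unit_prices), 1)
    return 1.0 - 2.0 ** b

# Simulated unit-price schedules: a steeper discount for the 'vice' category
vice_prices = pack_sizes ** -1.22 * np.exp(rng.normal(0, 0.02, 4))
virtue_prices = pack_sizes ** -0.86 * np.exp(rng.normal(0, 0.02, 4))

print(f"vice:   {doubling_discount(vice_prices, pack_sizes):.0%} per doubling")
print(f"virtue: {doubling_discount(virtue_prices, pack_sizes):.0%} per doubling")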
Finally, Wertenbroch (1998) examined 52 weeks of store-level sales data
from 86 stores of Dominick’s Finer Foods, a leading supermarket chain in
metropolitan Chicago with a 20 percent market share at the time, for four
of these matched categories, in which UPCs could be unambiguously iden-
tified as regular and light products. The analyses showed that aggregate
consumer demand for the relative vices was almost 30 percent less price-
elastic than demand for the relative virtues, carefully matching regular and
light UPCs and adjusting for the effects of various control variables. This
result presented additional suggestive evidence of the presence of consumer
precommitment by purchase quantity rationing in the marketplace.
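
In the same spirit, a stripped-down elasticity comparison on simulated weekly data looks as follows; the true elasticities below are set so that the vice is about 30 percent less price-elastic, mirroring the reported pattern, whereas the actual study matched UPCs and included numerous control variables.

import numpy as np

rng = np.random.default_rng(2)
weeks = 52
prices = np.exp(rng.normal(0.0, 0.15, weeks))  # simulated weekly prices

def elasticity(p, units):
    # Slope of ln(units) on ln(price) is the price elasticity of demand
    b, _ = np.polyfit(np.log(p), np.log(units), 1)
    return b

vice_units = 100 * prices ** -1.4 * np.exp(rng.normal(0, 0.1, weeks))
virtue_units = 100 * prices ** -2.0 * np.exp(rng.normal(0, 0.1, weeks))

e_vice, e_virtue = elasticity(prices, vice_units), elasticity(prices, virtue_units)
print(f"vice elasticity:   {e_vice:.2f}")
print(f"virtue elasticity: {e_virtue:.2f}")
print(f"vice demand is {1 - e_vice / e_virtue:.0%} less price-elastic")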
All four studies showed or implied that consumer demand for relative
vices is less price-elastic than demand for relative virtues, as implied by
Wertenbroch’s (1998) purchase quantity rationing hypothesis. Consumers
do not generally prefer virtues over vices, yet demand for vices increases
less than demand for virtues in response to given unit price reductions.
This suggests that consumers self-impose inventory constraints on their
vice consumption, not because they like vices less, but for strategic
reasons, revealing a preference for precommitment. By forgoing unit price
reductions from quantity discounts, they end up paying higher unit prices
for small package sizes (relative to unit prices for large package sizes) of
vices than of virtues—put loosely, paying more to buy less of what they
want too much—a self-control premium. Wertenbroch (1998) illustrates
that key to detecting consumer precommitment in the marketplace is to
assess whether consumers are willing to pay such a premium to ration
themselves or to self-impose any other costly constraint on their own
freedom of choice (e.g., Ariely and Wertenbroch 2002). Such behavioral
evidence of precommitment allows marketers and policymakers to detect
a need for policy intervention purely based on consumers’ revealed prefer-
ences for self-imposing constraints, not on their (possibly biased) stated
preferences or third-party assessments.

A Tool for Intervention: Applying Consumer Precommitment in the Marketplace

Wertenbroch (1998) provided the first empirical demonstration of detect-
ing the operation of consumer self-control by precommitment in the
marketplace. Since then, research into applications of precommitment has
ranged from economics to consumer behavior to psychology and medi-
cine, helping consumers obtain better long-term outcomes with respect
to, for example, savings, health, and environmental behaviors. One of the
most prominent applications has been Thaler and Benartzi’s (2004) Save-
More-Tomorrow™ program, in which employees are offered an option
to precommit to save a percentage of their future pay raises as retirement
savings. Because they commit only future raises, employees are less likely
to fall victim to their present-bias at the time when they choose to precom-
mit. Findings from the first implementation showed that the participation
rate was high (78 percent), that 80 percent of those enrolled continued
in the plan for four consecutive pay raises, and that participants almost
quadrupled their average savings rate from 3.5 percent to 13.6 percent
during the 40-month observation period.3
Detecting and using consumer preferences for precommitment in
another savings example, Ashraf, Karlan and Yin (2006) conducted a field
experiment in the Philippines, in which they offered a random subsample
(N = 710) of a larger group of 1,777 retail banking clients a choice to
save money in a regular savings account or in a “commitment” savings
account, which placed restrictions on withdrawing the money (similar to a
certificate of deposit or CD), holding other characteristics constant.4 Two
hundred and two (28 percent) of those randomly selected participants who
had been offered the choice saved their money in a commitment account
rather than in a regular savings account without restrictions on with-
drawal. After one year, average savings balances were 81 percent higher in
the treatment group that included these 202 commitment savers, attesting
to the power of precommitment as a self-control device (for additional
examples of randomized field experiments on precommitment to encour-
age savings, see Brune et al. 2011 and Kast, Meier and Pomeranz 2012).
Rogers, Milkman and Volpp (2014) discuss the use of various com-
mitment devices to change exercise, eating and other health-related
behaviors. An empirical example of offering precommitment contracts
to motivate consumers to eat healthier food is a large-scale field experi-
ment by Schwartz et al. (2014). The authors offered shoppers who were
enrolled in an incentive program that discounted prices of eligible grocery
purchases by 25 percent a choice to precommit to increase their purchases
of healthy food items by five percent above their household baseline in
each of six months. They would forfeit their entire 25 percent discount
for each month that they missed their goal. Thirty-six percent (N = 632)
of those households that were offered the precommitment option chose it;
they subsequently increased their healthy food purchases by 3.5 percent,
whereas households in a control group and those who had declined the
precommitment option showed no increase. Interestingly, the precommit-
ment contract was successful in inducing a desired behavioral change, even
though many households missed the goal and consequently forfeited their
discount, suggesting that goals and penalties or constraints in precom-
mitment contracts need to be carefully calibrated to ensure long-term
effectiveness.
Extending the concept of precommitment to non-binding symbolic
promises, which involve psychological rather than economic constraints, Baca-
Motes et al. (2013) provided a subtle intervention to motivate environmen-
tally responsible consumer behavior. Their large-scale field experiment (N
= 2,416) showed that hotel guests who made a specific commitment-like
promise at check-in, symbolized by a lapel pin that they received in return,
to re-use towels in their rooms during their stay exhibited a more than 25
percent higher probability of towel re-usage.
As these examples illustrate, there is ample room for marketers and
policy makers to offer consumers voluntary precommitment mechanisms
(e.g., contracts) to help them engage in behaviors that improve their own
or society’s long-term well-being. What all the examples have in common is
that they offer consumers a choice of precommitting, without forcing them
to do so. Dynamically (time-)consistent consumers have no reason to take
up these offers as they can only make themselves worse off. It is present-
biased, self-aware sophisticates who anticipate their own temptation and
time-inconsistency who can therefore benefit from choosing to precommit.
Their preference to voluntarily impose constraints on their own future free-
dom of choice (e.g., in the form of transaction costs, penalties, or feelings
of guilt when they fail to do what's in their own or in society's long-term
interest) reveals that they are concerned about the risk of misbehaving
by giving in to their temptations. Wertenbroch’s (1998) multi-method
analysis of consumer price sensitivity in the face of temptation in the
marketplace demonstrated that precommitment offers not only a tool for
policy intervention but also a criterion—based on consumers’ own revealed
preferences—to detect a need for intervention in the first place.

NOTES

1. This chapter draws on and extends ideas introduced and discussed in Wertenbroch
(2014). I am grateful to Janet Schwartz for helpful comments.
2. Frederick, Loewenstein and O'Donoghue (2002, 356) point out that Samuelson's (1937)
standard discounted utility model, which uses constant discounting, entails no normative
claim, but that Koopmans (1960) showed that it “could be derived from a superficially
plausible set of axioms.”
3. Benartzi and Lewin (2012) offer details on practical applications of Save-More-Tomorrow™.


4. Dean Karlan is also co-founder of www.stickk.com, launched in 2007, which helps con-
sumers and organizations create precommitment contracts to reach their own or their
members’ personal goals, providing a commercial example of detecting and facilitating
consumer demand for precommitment.

References
Akerlof, George A. (1970), “The Market for ‘Lemons’: Quality Uncertainty and the Market
Mechanism,” Quarterly Journal of Economics, 84 (August), 488–500.
Ariely, Dan and Klaus Wertenbroch (2002), “Procrastination, Deadlines, and Performance:
Self-Control by Precommitment,” Psychological Science, 13 (May), 219–224.
Ashraf, Nava, Dean Karlan, and Wesley Yin (2006), “Tying Odysseus to the Mast: Evidence
from a Commitment Savings Product in the Philippines,” Quarterly Journal of Economics,
121 (May), 635–672.
Baca-Motes, Katie, Amber Brown, Ayelet Gneezy, Elizabeth A. Keenan, and Leif D.
Nelson (2013), “Commitment and Behavior Change: Evidence from the Field,” Journal of
Consumer Research, 39 (February), 1070–1084.
Bartels, Daniel M. and Oleg Urminsky (2011), “On Intertemporal Selfishness: How the
Perceived Instability of Identity Underlies Impatient Consumption,” Journal of Consumer
Research, 38 (1), 182–198.
Baumeister, Roy F., and Kathleen D. Vohs (2003), “Willpower, Choice, and Self-Control,”
in Time and Decision: Economic and Psychological Perspectives on Intertemporal Choice,
ed. George Loewenstein, Daniel Read, and Roy Baumeister, New York, NY: Russell Sage
Foundation, 201–216.
Becker, Gary S. and Kevin M. Murphy (1988), “A Theory of Rational Addiction,” Journal
of Political Economy, 96 (4), 675–700.
Benartzi, Shlomo and Roger Lewin (2012), Save More Tomorrow: Practical Behavioral
Finance Solutions to Improve 401(k) Plans, New York: Penguin.
Börsch-Supan, Axel and Reinhold Schnabel (1998), “Social Security and Declining Labor-
Force Participation in Germany,” American Economic Review, 88 (2), 173–178.
Brune, Lasse, Xavier Giné, Jessica Goldberg, and Dean Yang (2011), “Commitments to Save:
A Field Experiment in Rural Malawi,” World Bank Policy Research Working Paper Series
5748.
Cawley, John and Chad Meyerhoefer (2012), “The Medical Care Costs of Obesity: An
Instrumental Variables Approach," Journal of Health Economics, 31 (1), 219–230.
Coase, Ronald H. (1960), “The Problem of Social Cost,” Journal of Law and Economics, 3
(October), 1–44.
Frederick, Shane, George F. Loewenstein, and Ted O’Donoghue (2002), “Time Discounting
and Time Preference: A Critical Review,” Journal of Economic Literature, 40 (June), 351–401.
Grover, Steven A., et al. (2015), “Years of Life Lost and Healthy Life-Years Lost from
Diabetes and Cardiovascular Disease in Overweight and Obese People: A Modelling
Study,” Lancet Diabetes & Endocrinology, 3 (2), 114–122.
Gul, Faruk and Wolfgang Pesendorfer (2001), “Temptation and Self-Control,” Econometrica,
69 (6), 1403–1435.
Herrnstein, Richard J., George F. Loewenstein, Dražen Prelec, and William Vaughan,
Jr. (1993), “Utility Maximization and Melioration: Internalities in Individual Choice,”
Journal of Behavioral Decision Making, 6 (September), 149–185.
Hershfield, Hal E., Dan G. Goldstein, William F. Sharpe, Jesse Fox, Leo Yeykelvis, Laura
L. Carstensen, and Jeremy N. Bailenson (2011), “Increasing Saving Behavior Through
­Age-Progressed Renderings of the Future Self,” Journal of Marketing Research, 48, S23–S37.
Kahneman, Daniel (2011), Thinking Fast and Slow, New York, NY: Farrar, Straus & Giroux.
Kahneman, Daniel and Amos Tversky (1979), "Prospect Theory: An Analysis of Decision
under Risk," Econometrica, 47 (2), 263–291.
Kast, Felipe, Stephan Meier, and Dina Pomeranz (2012), “Under-Savers Anonymous:
Evidence on Self-Help Groups and Peer Pressure as a Savings Commitment Device,”
NBER Working Paper No. 18417.
Koopmans, Tjalling C. (1960), “Stationary Ordinal Utility and Impatience,” Econometrica
28 (2), 287–309.
Laibson, David (1997), “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of
Economics, 112 (2), 443–477.
Mill, John Stuart (1859/1975), On Liberty, New York, NY: Norton.
O’Donoghue, Ted and Matthew M. Rabin (1999), “Doing It Now or Later,” American
Economic Review, 89 (1), 103–124.
Puri, Radhika (1996), “Measuring and Modifying Consumer Impulsiveness: A Cost–Benefit
Accessibility Framework,” Journal of Consumer Psychology, 5 (2), 87–113.
Rogers, Todd, Katherine L. Milkman, and Kevin G. Volpp (2014), “Commitment Devices:
Using Initiatives to Change Behavior,” Journal of the American Medical Association, 311
(20), 2065–2066.
Samuelson, Paul A. (1937), “A Note on Measurement of Utility,” Review of Economic
Studies, 4 (2), 155–161.
Schelling, Thomas C. (1984), “Self-Command in Practice, in Policy and in a Theory of
Rational Choice,” American Economic Review, 74 (May), 1–11.
Schwartz, Janet, Daniel Mochon, Lauren Wyper, Josiase Maroba, Deepak Patel, and Dan
Ariely (2014), “Healthier by Precommitment,” Psychological Science, 25 (2), 538–546.
Stigler, George J. (1961), “The Economics of Information,” Journal of Political Economy,
69 (3), 213–225.
Stigler, George J. and Gary S. Becker (1977), “De Gustibus Non Est Disputandum,”
American Economic Review, 67 (2), 76–90.
Strotz, Robert H. (1955–56), “Myopia and Inconsistency in Dynamic Utility Maximization,”
Review of Economic Studies, 23, 165–180.
Sunstein, Cass R. (2015), “Fifty Shades of Manipulation,” Journal of Marketing Behavior, 1
(3–4), 213–244.
Thaler, Richard H. (1980), “Toward a Positive Theory of Consumer Choice,” Journal of
Economic Behavior & Organization, 1 (1), 39–60.
Thaler, Richard H. (2015), Misbehaving: The Making of Behavioral Economics, New York,
NY: Norton.
Thaler, Richard H. and Shlomo Benartzi (2004), “Save More Tomorrow™: Using Behavioral
Economics to Increase Employee Saving,” Journal of Political Economy, 112 (1, pt. 2),
S164–S187.
Thaler, Richard H. and Cass R. Sunstein (2003), “Libertarian Paternalism,” American
Economic Review, 93 (2), 175–179.
Thaler, Richard H. and Cass R. Sunstein (2008), Nudge: Improving Decisions About Health,
Wealth, and Happiness, New Haven, CT: Yale University Press.
Tversky, Amos, and Daniel Kahneman (1974), “Judgment under Uncertainty: Heuristics
and Biases,” Science, 185 (4157), 1124–1131.
von Neumann John and Oskar Morgenstern (1944), Theory of Games and Economic
Behavior, New York: Wiley.
Wertenbroch, Klaus (1998), “Consumption Self-Control by Rationing Purchase Quantities
of Virtue and Vice,” Marketing Science, 17 (4), 317–337.
Wertenbroch, Klaus (2003), “Self-Rationing: Self-Control in Consumer Choice,” in Time
and Decision: Economic and Psychological Perspectives on Intertemporal Choice, eds.
George Loewenstein, Daniel Read, and Roy Baumeister, New York, NY: Russell Sage
Foundation, 491–516.
Wertenbroch, Klaus (2014), “How (Not) to Protect Meta-Rational Consumers from
Themselves,” Journal of Consumer Protection and Food Safety, 9 (3), 266–268.
Wertenbroch, Klaus and Bernd Skiera (2002), “Measuring Consumer Willingness to Pay at
the Point of Purchase,” Journal of Marketing Research, 39 (May), 228–241.



23.  Nudging healthy choices with the 4Ps
framework for behavior change
Zoë Chance, Ravi Dhar, Michelle Hatzis,
Michiel Bakker, Kim Huskey and Lydia Ash

Anyone who has made a New Year’s resolution and failed to make a
lasting behavior change is intimately familiar with the “intention-behavior
gap” (Sheeran, 2002). When it comes to following through on our best-
laid plans, we often fall short—most intentions to change behavior end in
failure (Sheeran, Webb, and Gollwitzer, 2005). There exist a multitude of
situations in which people behave seemingly irrationally—going against
their own intentions, for example—but nonetheless predictably. The promise of
behavioral science is that these anomalies can be exploited opportunisti-
cally to nudge people in the direction of making better choices. To help
people make desired behaviors easier for themselves and others, we have
formed an academic–industry collaboration to develop and implement a
new framework, the 4Ps Framework for Behavior Change. It offers strate-
gies and tactics for helping close the intention-behavior gap, organizing a
variety of “nudges” from marketing, psychology, and behavioral econom-
ics. These nudges can help practitioners and consumers design interven-
tions across multiple domains. The framework is consistent with Richard
Thaler and Cass Sunstein’s ideal of “libertarian paternalism”—nudging
people in directions that align their behaviors with their long-term self-
interest, without curtailing their ultimate freedom to choose (Thaler and
Sunstein, 2003). Focusing on actionable, high-impact levers of change, it
combines common sense with novel ways to make desirable behavior the
path of least resistance. In this chapter, we present the framework, along
with supporting research findings, and describe how it is being applied in
the field: encouraging healthy food choices at Google.
Most people report a desire to eat healthfully (Share Our Strength,
2014), but people eat more and eat more fattening foods than they did
20 years ago, with rates of obesity skyrocketing as a result. In addition
to increasing the personal risks of heart disease, diabetes, and other
chronic illnesses (Flegal, Graubard, Williamson, and Gail, 2007), obesity
is estimated to account for almost 10 percent of total annual medical
expenditures in the USA (Finklestein, Trogdon, Cohen, and Dietz, 2009).
Millions of dollars are being spent on nutritional and wellness education,
and American consumers spend more than $50 billion a year on weight-
loss attempts (Market Data Enterprises, 2009), but desire and information
are clearly not enough. It is in the public interest to help make healthier
food choices easier for everyone. And in many cases, it is in the interest of
corporations as well.
In 2015, Google celebrated its sixth year holding the number one spot
on Fortune’s list of 100 Best Companies to work for (Fortune, 2015).
And in all those years, Googlers mentioned the free, delicious food as one
of the keys to their satisfaction. The biggest challenge for the food team
was figuring out how to help Googlers stay simultaneously healthy and
satisfied: failing on either dimension would mean loss of productivity and
morale, which could hurt business outcomes and employee retention. And
inducing satisfaction meant not just providing a variety of foods (includ-
ing some less healthy ones), but treating employees as adults in control of
their own decisions about their bodies and their health. Therefore, gentle
nudges that did not restrict choices were appealing to the food team.
When the Google food team engaged Yale School of Management to
help them apply the 4Ps framework, they had already been using many
“tweaks” inspired by behavioral economists that were consistent with the
framework. In fact, they were on the vanguard of applying behavioral
economics to the food environment. Here, we describe how the framework
is being applied at Google, with results of some field experiments. Our
hope is that describing how the framework can be applied to one challenge
(serve food that keeps people healthy and satisfied) in one type of location
(Google offices) will inspire ideas for applying the framework to other
challenges and locations.

The 4Ps Framework for Behavior Change

The 4Ps Framework for Behavior Change leverages principles of behav-
ioral economics, psychology, and marketing to restructure the environ-
ment in ways that (1) maximize the benefits arising from sporadic efforts
to achieve health goals and (2) minimize the effort, time, and willpower
needed to make good choices. These resources are in short supply, and
in everyday life consumers face conflicting pressures in their pursuit of
good choices. Frequently time pressure (Dhar and Nowlis, 1999), deple-
tion of self-control (Pocheptsova, Amir, Dhar, and Baumeister, 2009), or
distractions (Shiv and Nowlis, 2004) limit people’s processing capacity,
which impacts their decisions. Often, they browse without planning ahead,
failing to consider possible alternatives. In many cases they succumb to
temptation in the clash between short-term and long-term goals (Khan
and Dhar, 2006). For all these reasons, it is possible and helpful to nudge
them in the right direction, through the types of simple interventions sug-
gested by the 4Ps framework.
The intervention domains of the 4Ps framework are: Process (how
choices are made), Persuasion (how choices are communicated),
Possibilities (what choices are offered), and Person (how intentions are
reinforced). (See Figure 23.1 for a summary of the framework.) Each lever
of change provides different paths to reduce resistance and nudge indi-
viduals toward healthy choices, offering ways to make intuitive choices
healthier and rational choices easier. Together, the framework provides
comprehensive suggestions for engineering the environment to make the
healthy choice the easy choice. Any aspects of the framework can be used
together; it is not necessary to use all of them. And although we focus on
health and food choices in this chapter, the framework can be applied to
any type of behavior.

Process: How Are Choices Made?

Process interventions can influence behavior by understanding choice heu-
ristics relied upon by consumers (Dhar and Gorlin, 2013) in order to make
the healthier options easier to choose. These nudges reposition options in
physical or psychological space, affecting their relative appeal or ease of
selection. This can involve changing the physical location of the options
(order and accessibility) or the structure of the choice (defaults). Because
they involve changes to the context in which a person makes a choice,
behavioral economists call Process interventions “choice architecture”
(Thaler and Sunstein, 2008).

Order

Sequence matters: order has a strong impact on preferences and choices
between options. In a classic marketing study, researchers found consum-
ers who had touched and evaluated four pairs of stockings were four times
as likely to choose the pair on the right as the one on the left—yet they
had no awareness of any order effects (Nisbett and Wilson, 1977). More
meaningfully, a political candidate whose name is listed first gains 3.5 per-
centage points in an election (Koppell and Steen, 2004). And sometimes
the middle option can have an advantage, too—“extremeness aversion”
leads many consumers to avoid, for example, the largest or smallest drink
sizes (Dhar and Simonson, 2003). There are some conflicting findings, but
in general, the privileged position in a visual set (like a buffet line or menu)
is the first item in a pair or the middle item in a set of three. The privileged
positions in an experiential or auditory set (like a set of stockings to touch
or a list of daily specials to hear) are both the first and the last items.
When options are ordered by an alignable attribute such as size, people
with weak preferences tend to compromise by choosing the middle option
because it is easier to rationalize (Sharpe, Staelin, and Huber, 2008). These
biases can serve health goals, if healthy options are offered in the advan-
taged positions in comparative choices.

PROCESS: HOW ARE CHOICES MADE?
• Order: Relative position in a set
• Defaults: Choice that doesn't require action
• Accessibility: Easy to see, reach, choose, or think of

PERSUASION: HOW ARE CHOICES COMMUNICATED?
• Vividness: Emotional connection through words, images, or experience
• Comparisons: Framing relevant trade-offs, contrasts, or similarities
• Moments of truth: Time and place in which message will be most persuasive

POSSIBILITIES: WHAT CHOICES ARE OFFERED?
• Assortment: Selection and relative appeal of items in the choice set
• Bundling: Strategic pairing of complementary items
• Quantity: Real or perceived volume or number

PERSON: HOW ARE INTENTIONS REINFORCED?
• Goals: Motivational and measurable desired outcomes
• Precommitment: Actions planned or committed to in advance
• Habits: Automatic behaviors requiring little mental effort

Figure 23.1  4Ps framework for behavior change

Defaults

Due to a bias toward the status quo, and also the ease of not making a
decision, defaults have proven extremely effective in guiding choices, even
in domains as weighty as organ donations (Johnson and Goldstein, 2003)
and retirement savings (Thaler and Benartzi, 2004). Often people are not
even aware of any alternative to the default. For example, in one study at
a Chinese takeout restaurant, patrons were asked if they would prefer a
half-serving of rice (without any price discount). Many chose this option,
which had not occurred to them when the full-sized entrée was offered
as the default (Schwartz, Riis, Elbel, and Ariely, 2012). Defaults are less
effective when preferences are strong. When preschool children were
offered apple slices as the default side but allowed to switch to French
fries, their strong preference for fries led the vast majority to reject the
apples (Just and Wansink, 2009).

Accessibility

Accessibility, or convenience, exerts a gentle but powerful influence on
choices. Often, tempting options are too accessible; for example, when
fast-food restaurants offer free refills on sodas, they encourage consump-
tion of empty calories not only through the price discount but also by elim-
inating the need to wait in line again and pay at the counter. But people
also drink more water when it is easily accessible on their table, rather than
20 feet away (Engell et al., 1996). Conversely, cafeteria visitors purchased
fewer junk foods when they were less accessible, requiring waiting in a
separate line (Meiselman et al., 1994), and in another study, people were
less likely to serve themselves ice cream when it was less accessible, in a
closed rather than an open freezer (Levitz, 1976). Perceived accessibility
affects behavior as well. For example, moving healthy foods to eye level
increases their consumption (Thorndike et al., 2012), even though they
were already visible before. At Google, stocking water bottles in coolers
at eye level while moving sugary beverages to lower shelves behind frosted
glass increased water consumption by 47 percent, decreasing calories con-
sumed from sugary beverages by 6 percent (Kang, 2013).

A small difference in accessibility can have a major impact on snacking. In one of


Google’s large and busy “microkitchen” breakrooms stocked with free drinks and
snacks, undercover observers recorded the number of drinkers who also took a
snack. One beverage station lay 6.5 feet from the snack bar, the other 17.5 feet from
the snack bar. Each beverage station had cold drinks and hot drinks. The snack
bar offered nuts, crackers, candies, dried fruit, chips, and cookies. Observations of
more than 1,000 people found that drinkers who used the beverage station near
the snacks were 50 percent more likely to grab a snack with their drink. For men,
the estimated “penalty” in increased annual snack calorie consumption for using
the closer beverage station was calculated to yield about a pound of fat per year
for each daily cup of coffee!
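
The arithmetic behind such an estimate is straightforward. A minimal sketch in Python of the conversion logic; the probability lift, snack calories, and the common 3,500-calories-per-pound rule of thumb are illustrative assumptions, not the study’s actual inputs.

# Rough sketch: converting a proximity effect on snacking into annual fat gain.
# All numbers below are illustrative assumptions, not the study's actual data.

KCAL_PER_POUND_FAT = 3500  # common rule-of-thumb conversion

def annual_fat_pounds(prob_lift, kcal_per_snack, drinks_per_day=1, days=365):
    """Extra pounds of fat per year from a higher chance of grabbing a snack
    with each drink at the closer beverage station."""
    extra_kcal = prob_lift * kcal_per_snack * drinks_per_day * days
    return extra_kcal / KCAL_PER_POUND_FAT

# Hypothetical: proximity raises the chance of taking a ~130-kcal snack
# with each drink by 8 percentage points.
print(round(annual_fat_pounds(prob_lift=0.08, kcal_per_snack=130), 2))
# -> about 1.08 pounds per year per daily drink, consistent with the
#    "about a pound of fat per year for each daily cup of coffee" estimate.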

Persuasion: How Are Choices Communicated?

In addition to nudging behavior through the choice process, there are
many opportunities for nudging through persuasive communication.
Persuasion interventions can make healthy options more appealing and
unhealthy options less appealing through the fine-tuning of message deliv-
ery. Persuasion interventions are the least invasive and lowest cost way to
nudge people toward better choices. Effective persuasion uses vividness,
comparisons, and “moments of truth.”

Vividness

Vivid messaging and imagery grab the attention of the intuitive, emo-
tional mind. Triggering emotions such as delight or disgust can help the
gut instinct be the right one. Vividness can be achieved with words or with
a visual or tactile experience.
Names play an important role in expectations and evaluations.
Understanding this, marketers have changed the names of some popular
products. To avoid vivid and negative images of oiliness, Kentucky Fried
Chicken has been officially shortened to KFC®, and Oil of Olay has been
shortened to Olay®. To escape the vivid connection with constipation,
prunes have become “dried plums.” Healthy choices can be assisted by
vivid names as well. Adding adjectives like “succulent” or “homemade”
can make food not only more appealing but also tastier and more filling
(Wansink, van Ittersum, and Painter, 2005). Even fruit consumption
can be nudged—a sign reading “fresh Florida oranges” increased fruit
consumption (Wansink, 2006). However, food names can spur over-
consumption, too: dieters thought a “salad special” was healthier and
thus ate more of it than an identical “pasta special” (Irmak, Vallen, and
Robinson, 2011). And people eat more when portions are called “small”
or “medium,” believing they have eaten less (Aydinoglu, Krishna, and
Wansink, 2009).
Using pictures or objects is another vivid way to engage the emotions,
which can encourage persistence in healthy behaviors. For example, look-
ing at bacteria cultured from their own hands led doctors to wash more
often. And seeing a vial of fat from a gallon of whole milk caused many
milk drinkers to switch to skim (Heath and Heath, 2010). Visuals can also
simplify the decision process. In one cafeteria intervention, implementing
a simple green/yellow/red color-coding system improved sales of healthy
items (green) and reduced sales of unhealthy items (red) (Thorndike et al., 2012). Google has implemented stoplight labels as well, with many
Googlers reporting that the colored labels helped them make healthy
choices.

Comparisons

A persuasive message might quantify the effects of a behavior, apply
standards, or frame the outcome as a loss or gain. A quantifying message
could note, “Taking the stairs for 5 minutes a day 5 days a week burns
off 2.5 pounds of fat in a year” or “1 Snickers bar = 20 minute run.”
Standards can increase goal compliance by making progress measurable.
Using a pedometer with a stated goal (e.g., 10,000 steps) increases physical
activity (Bravata et al., 2007); and 8 glasses of water or 5 fruits and veg-
etables per day provide helpful benchmarks for measuring desired health
behaviors. Sometimes the comparison is implied, framed as a loss or a
gain. Although there are subtle qualifications, people are generally more
sensitive to losses than gains, and more motivated by fear than pleasure
(Baumeister, Bratslavsky, Finkenauer, and Vohs, 2001; Kahneman and
Tversky, 1979). Perneger and Agoritsas (2011) surveyed more than 1,000
physicians to find that their beliefs about the effectiveness of a new drug
depended on whether outcomes were framed as a loss (the mortality rate)
or a gain (the survival rate). As marketers know, multiple messages should
be tested to find the one most effective in a given situation.
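
Such quantified claims are easy to sanity-check. A minimal sketch, assuming a stair-climbing burn rate of roughly 7 calories per minute and the same 3,500-calories-per-pound conversion (both our assumptions, not figures from the message itself):

# Sanity-checking the "5 minutes of stairs, 5 days a week = 2.5 pounds a year"
# message. Burn rate and conversion factor are assumptions for illustration.

KCAL_PER_POUND_FAT = 3500
KCAL_PER_MIN_STAIRS = 7  # rough figure for a typical adult

minutes_per_year = 5 * 5 * 52          # 5 min/day, 5 days/week
kcal_per_year = minutes_per_year * KCAL_PER_MIN_STAIRS
print(round(kcal_per_year / KCAL_PER_POUND_FAT, 1))  # -> 2.6 pounds/year

Under these assumptions the stated figure of about 2.5 pounds per year checks out.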

Moments of Truth

A “moment of truth” is the time and place when people will be most
receptive to persuasive messaging (Dhar and Kim, 2007). The evaluation
of choice alternatives depends on which goals are active in any particular
moment. Therefore, decision processes are quite sensitive to timing—and
for some marketing campaigns, timing is everything. One creative campaign illustrates the power of the moment of truth. In Beirut, Procter & Gamble’s laundry detergent marketing team wanted to reach consumers when the goal of having clean clothes was already on their mind. Because most Beirut residents live in tall apartment buildings and hang their laundry on balconies to dry, they happen to see the street traffic below while thinking about clean clothes. Seizing the moment, Procter & Gamble rented space on the tops of buses to advertise laundry detergent. Planners of behavioral change can take a page from the marketing playbook by asking themselves when the goal relevant to the desired behavior will be most salient. For example, in an office building, signs reminding employees to take the stairs can be placed at the elevators, when people are thinking about their goal of getting upstairs. In the right locations, stair prompts with messages such as “Burn calories, not electricity” have been found to be highly effective, increasing stair use by as much as 40 percent, even 9 months later (Lee et al., 2012).

Although most people in a different study had predicted that seeing ads for some commonly disliked vegetables wouldn’t get them to eat more of those vegetables, it appears they may have been wrong. In one high-traffic café where Googlers eat free meals, we promoted a series of unpopular vegetables (beets, parsnips, squash, Brussels sprouts, and cauliflower) as the Vegetable of the Day! with displays of colorful photos and trivia facts next to a dish containing that vegetable as its main ingredient. By placing the campaign posters at the moment of truth—right next to the dish—we increased the number of employees trying the featured dish by 74 percent and increased the average amount each person served themselves by 64 percent.

The key to Persuasion is communicating the right message, the right way, at the right time—when the individual will be most receptive to it.

Possibilities: What Choices Are Offered?

Possibilities provide the most obvious lever of change, yet they are often
overlooked. Possibilities refers to the composition of the choice set: before
trying to steer choices, the planner might improve options. While it may
in rare cases be effective to ban undesirable behavior (such as smoking in
restaurants) or to legislate desirable behavior (such as wearing seatbelts),
the negative reactions against paternalism can often outweigh its benefits.
Therefore, we advocate a gentler approach, maintaining freedom of choice
while improving the options. When designing a choice set to facilitate
healthy choices, the goals should be to make options healthier and to make healthy options more appealing (or make unhealthy options less appealing), through assortment, bundling, and quantity. Tempting but unhealthy options can be reduced or made less available without eliminating them altogether.

Assortment

The first decision a planner must make is what the assortment will be.
Availability has a strong impact on consumption: people tend to eat
whatever is in front of them. Sometimes the existing options can be made
healthier, either by modifying components (e.g., white to wholegrain
pasta) or by switching the mode of delivery (e.g., salt shakers that dispense
less salt per shake). One study found people were more likely to choose
a healthy option (fruit over a cookie) from a larger assortment than a
smaller one (Sela, Berger, and Liu, 2009). Relative appeal can also be
manipulated. In the Healthy Lunchrooms Initiative, Wansink found that
placing fruit in a nice bowl or under a light increased fruit sales by more
than 100 percent (“Nutrition advice,” 2014).
Variety in an assortment is a powerful stimulant of consumption.
Generally, when consuming more than one thing is possible, more options
mean more consumption. This is true even when variation is purely
perceptual. For example, people ate more M&Ms from a bowl containing
more colors of M&Ms, even though the total quantity and flavors were
identical to a bowl with fewer colors (Kahn and Wansink, 2004). One
way to reduce consumption without restricting choice altogether is by
rotating variety over time, with healthy or desirable options switching
more frequently, to encourage sampling or consumption, and unhealthy
or undesirable options switching less frequently, to encourage satiation.

Bundling

To encourage healthier choices, healthy options can be strategically paired
with other healthy options, or even with less-healthy options. Balancing
the combination of items that satisfy two goals has been shown to be
desirable (Dhar and Simonson, 1999). In many cases, healthy but less tasty
and tasty but unhealthy options may be consumed simultaneously, and
creative bundling can nudge people toward health—“lesser evils” might be
paired with “greater goods.” Bundling a healthy salad with a small portion
of fries to create a “vice-virtue” bundle can persuade some people who
would have ordered fries instead of salad to choose a bundle of one-fourth
fries and three-fourths salad (Liu et al., 2015). In another field experiment, Milkman, Minson, and Volpp (2014) bundled addictive audiobooks with gym workouts to encourage exercise.

In a field experiment in another Google microkitchen, we targeted the most popular snack item: M&Ms. These had been self-served from bulk bins into four-ounce cups; most employees filled the cup. After taking a baseline measure of consumption, we replaced loose M&Ms with small, individually-wrapped packages. This simple intervention reduced the average amount of M&Ms employees served themselves by 58 percent, from 308 calories to 130 calories.

Quantity

Although most choice research has focused on which option is chosen
(Nowlis, Dhar, and Simonson, 2010), the quantity consumed is also
influenced by nudges. People tend to believe the appropriate amount to
consume is an entire portion (e.g., plate, bowl, or package). As a result,
they serve themselves more food and eat more when dishes or utensils
are large. In one experiment, nutrition academics at an ice cream social
served themselves 31 percent more ice cream when given larger bowls and
57 percent more when given both larger bowls and larger serving spoons
(Wansink, van Ittersum, and Painter, 2006). Ice cream in a small cone is
perceived to be more ice cream, and more satisfying, than the same amount
in a large cone (Hsee, 1998). A small, full container conveys abundance,
which leads to satisfaction. At Google, the food team switched 22-ounce
cups to 16-ounce cups to reduce consumption of caloric beverages, offered
smaller to-go boxes to help with portion control, and served desserts either
plated or cut in small quantities.
With Process, Persuasion and Possibilities, behavior can be influenced
in a specific context. It is only through the Person, however, that behavior
can potentially be influenced across contexts over time and across multiple
locations.

Person: How Are Intentions Reinforced?


Person is the most challenging lever of change. Most behavior change
initiatives already focus on the individual person and fail to change behav-
ior even when they succeed in changing intentions. A key reason for the
inconsistency between intentions and behavior is that resisting temptation
requires resources such as attention and willpower, which are often in
short supply. Fortunately, there are ways to support intentions that rely
less on processing and willpower, and more on supportive tools. We can
provide some suggestions for influencing a person through goal setting
and precommitment in order to reinforce healthy intentions. The object of
these interventions is to maintain healthy behaviors over time, eventually
making them habitual and automatic.

Goals

Setting explicit goals can increase healthy choices by reducing the think-
ing required for engaging in a behavior. Effective goals are personal,
motivational and measurable—challenging, specific, and concrete (Locke
and Latham, 1990). “Getting in shape” is a wish, whereas a goal to “run 3
miles 3 times a week until the wedding” entails both a reasonable challenge
and a means of measuring success—and is more likely to yield the desired
outcome (Strecher et al., 1995). Goals also become more manageable
when broken into smaller steps. Like paying for a new car in monthly pay-
ments, a goal of losing four pounds per month becomes easier than losing
50 pounds in a year. And another important benefit of setting intermedi-
ate goals is building momentum by tracking small wins along the way—
perception of progress toward a goal can itself be motivating (Kivetz,
Urminsky, and Zheng, 2006). Tracking goals, with tools for accomplish-
ment and measurement, increases the chance of success.

Precommitment

Willpower is a depletable mental resource; when people are tired, hungry,
stressed, or focused on something else, they are less likely to perform
actions requiring willpower (Baumeister and Tierney, 2011). So, there will
be times in which a desired behavior is particularly difficult or temptation
is particularly strong. Knowing that their willpower may falter, individu-
als can preplan when possible or create their own “commitment devices.”
Researchers have found that, when people make decisions for the distant
future, they save more money (Thaler and Benartzi, 2004) and choose
healthier food (Milkman, Rogers, and Bazerman, 2010; Read and van
Leeuwen, 1998). Commitment devices increase the cost or difficulty of
engaging in undesirable behaviors, thus reducing reliance on willpower.
Many field experiments have asked participants to put their own money
at risk as an incentive for following through on their intended behaviors,
for example losing weight (John et al., 2011), or quitting smoking (Giné,
Karlan, and Zinman, 2010). Observing the power of such interventions,
behavioral economists Dean Karlan and Ian Ayres founded a website,
http://www.stickk.com, that helps users create their own commitment
devices, staking their money or reputation on following through on their
good intentions. The key to the long-term success of goal setting and meas-
urement of health behaviors lies in making those new behaviors habitual.

Habits

Although people experience their own behavior as conscious and inten-
tional, the majority of all actions are automatic, bypassing the conscious
decision-making process entirely (Bargh and Chartrand, 1999). Because
habits are cued automatically and enacted effortlessly, turning healthy
behaviors into habits is the ideal way to sustain them. Implementation
intentions use cues to serve as reminders for triggering a desired behav-
ior, and they can help to develop the behavior into a habit. Research has
shown implementation intentions to be effective in developing healthy
habits such as performing breast self-exams (Prestwich et al., 2005), exer-
cising (Luszczynska, Sobczyk, and Abraham, 2007), and eating vegeta-
bles (Chapman, Armitage, and Norman, 2009)—simply by asking study
participants to decide where, when, and how they plan to take action.
Habits are more easily formed and broken in new environments, because
they lack the contextual cues that triggered old habits (Wood, Tam, and
Guerrero Witt, 2005). Therefore, behavior change efforts launched in
coincidence with other changes such as moves, promotions, reorganiza-
tions, new relationships, new jobs, or even seasonal changes have a greater
chance of success (Verplanken and Wood, 2006). Even in familiar environ-
ments, contextual cues can facilitate habit formation—laying out exercise
clothes the night before can prompt a morning jog, or setting twice-a-day
medications next to the toothbrush can improve medication compliance.

Conclusion

In this chapter, we have shared the 4Ps Framework for Behavior Change,
designed to organize research findings to make them more easily appli-
cable in the real world. We have described many levers the well-meaning
planner can employ to support the healthy intentions of others, and we
have shared some examples of how the 4Ps Framework is being applied
at Google. The examples here focused on nudging people toward healthy
food choices, but similar strategies can be used to nudge people’s behavior
in any direction that supports their own intentions. The framework offers
a toolbox of interventions leveraging a contextual approach aimed at
influencing specific decisions via (1) the combination of choices people are
exposed to, (2) the choice environment, and (3) communication about the choices. Additionally, we have offered advice on supporting the individual in the development of good habits, to make better choices in any time or place. There is great potential in the contextual spheres of influence outlined here that will enable planners to make good choices easy choices.

In a field experiment at Google, we helped employees turn goals into healthy eating habits. Volunteers set personal diet and body goals and were randomly assigned to one of three groups. The first received information on the link between blood glucose and weight gain. The second also received tools for using that information: blood glucose monitoring devices, data sheets, and advice on measuring glucose, weight, BMI, and body composition. The third was the control group, receiving no information or tools. Weekly surveys showed those who had received tools in addition to information made the greatest progress on their goals. After three months, there was no difference between the information group and the control in achieving personal goals, while among those who had received the tools, 10 percent more reported making progress on their body goals and 27 percent more reported making progress on their diet goals. By the end of the study, those in the tools group reported healthy choices becoming habitual: “After doing the first blood tests, I didn’t need to prick myself much more.” Information was not enough to facilitate change, but tools and measurement gave insight that closed the intention-behavior gap.

References

Aydinoglu, N. Z., Krishna, A., and Wansink, B. (2009). Do size labels have a common
meaning among consumers? In A. Krishna (ed.), Sensory marketing: Research on the sen-
suality of products. New York, NY: Routledge, 343–360.
Bargh, J. A. and Chartrand, T. L. (1999). The unbearable automaticity of being. American
Psychologist, 54, 462–479.
Baumeister, R. F., Bratslavsky, E., Finkenauer, C., and Vohs, K. D. (2001). Bad is stronger
than good. Review of General Psychology, 5(4), 323–370.
Baumeister, R. F. and Tierney, J. (2011). Willpower: Rediscovering the greatest human
strength. New York: Penguin Press.
Bravata, D. M., Smith-Spangler, C., Sundaram, V., Gienger, A. L., Lin, N., Lewis, R.,
Sirard, J. R. (2007). Using pedometers to increase physical activity and improve health: A
systematic review. Journal of the American Medical Association, 298, 2296–2304.
Chapman, J., Armitage, C. J., and Norman, P. (2009). Comparing implementation intention
interventions in relation to young adults’ intake of fruit and vegetables. Psychology and
Health, 24(3), 317–332.
Dhar, R. and Gorlin, M. (2013). A dual-system framework to understand preference con-
struction processes in choice. Journal of Consumer Psychology, 23(4), 528–542.
Dhar, R. and Kim, E. Y. (2007). Seeing the forest or the trees: Implications of construal level
theory for consumer choice. Journal of Consumer Psychology, 17(2), 96–100.
Dhar, R. and Nowlis, S. M. (1999). The effect of time pressure on consumer choice defer-
ral. Journal of Consumer Research, 25(4), 369–384.
Dhar, R. and Simonson, I. (2003). The effect of forced choice on choice. Journal of Marketing
Research, 40(2), 146–160.
Dhar, R. and Simonson, I. (1999). Making complementary choices in consumption episodes: Highlighting versus balancing. Journal of Marketing Research, 36(1), 29–44.
Engell, D., Kramer, M., Malafi, T., Salomon, M., and Lesher, L. (1996). Effects of effort and
social modeling on drinking in humans. Appetite, 26(2): 129–138.
Finkelstein, E. A., Trogdon, J. G., Cohen, J. W., and Dietz, W. (2009). Annual medical
spending attributable to obesity: payer- and service-specific estimates. Health Affairs,
28(5), 822–831.
Flegal, K. M., Graubard, B. I., Williamson, D. F., and Gail, M. H. (2007). Cause-specific
excess deaths associated with underweight, overweight, and obesity. Journal of the
American Medical Association, 17, 2028–2037.
Fortune 100 best companies to work for (2007–2015) (2015). Fortune. Retrieved December
2015, http://fortune.com/best-companies/.
Giné, X., Karlan, D., and Zinman, J. (2010). Put your money where your butt is: A com-
mitment contract for smoking cessation. American Economic Journal: Applied Economics,
2(4) 213–235.
Heath, C. and Heath, D. (2010). Switch: How to change things when change is hard. New
York, NY: Crown Business.
Hsee, C. K. (1998). Less is better: When low-value options are valued more highly than high-
value options. Journal of Behavioral Decision Making, 11, 107–121.
Irmak, C., Vallen, B., and Robinson, S. R. (2011). The impact of product name on dieters’
and nondieters’ food evaluations and consumption. Journal of Consumer Research, 38(2),
390–405.
John, L. K., Loewenstein, G., Troxel, A. B., Norton, L., Fassbender, J. E., and Volpp, K.
G. (2011). Financial incentives for extended weight loss: A randomized, controlled trial.
Journal of General Internal Medicine, 26(6), 621–626.
Johnson, E. J. and Goldstein, D. (2003). Do defaults save lives? Science, 302, 1338–1339.
Just, D. R. and Wansink, B. (2009). Smarter lunchrooms: Using behavioral economics to
improve meal selection. Choices, 24(3), 1–7.
Kahn, B. E. and Wansink, B. (2004). The influence of assortment structure on perceived
variety and consumption quantities. Journal of Consumer Research, 30(4), 519–533.
Kahneman, D. and Tversky, A. (1979). Prospect theory: An analysis of decision under risk.
Econometrica, 47, 263–291.
Kang, C. (2013). Google crunches data on munching in office. Washington Post (September 1). Retrieved December 2015, from http://www.washingtonpost.com/business/technology/google-crunches-data-on-munching-in-office/2013/09/01/3902b444-0e83-11e3-85b6-d27422650fd5_story.html.
Khan, U. and Dhar, R. (2006). Licensing effect in consumer choice. Journal of Marketing
Research, 43(2), 259–266.
Kivetz, R., Urminsky, O., and Zheng, Y. (2006). The goal-gradient hypothesis resurrected:
Purchase acceleration, illusionary goal progress and customer retention. Journal of
Marketing Research, 43, 39–58.
Koppell, J. and Steen, J. A. (2004). The effects of ballot position on election outcomes.
Journal of Politics, 66(1), 267–281.
Lee, K. K., Perry, A. S., Wolf, S. A., Agarwal, R., Rosenblum, R., Fischer, S., Silver, L. D.
(2012). Promoting routine stair use: Evaluating the impact of a stair prompt across build-
ings. American Journal of Preventive Medicine, 42(2), 136–141.
Levitz, L. S. (1976). The susceptibility of human feeding behavior to external controls. Obesity Perspectives, 53–60.
Liu, P. J., Haws, K. L., Lamberton, C., Campbell, T. H., and Fitzsimons, G. J. (2015). Vice-
virtue bundles. Management Science, 61(1), 204–228.
Locke, E. A. and Latham, G. P. (1990). A theory of goal setting and task performance.
Englewood Cliffs, NJ: Prentice-Hall.
Luszczynska, A., Sobczyk, A., and Abraham, C. (2007). Planning to lose weight: Randomized
controlled trial of an implementation intention prompt to enhance weight reduction
among overweight and obese women. Health Psychology, 26(4), 507–512.
Market Data Enterprises (2009). The Weight Loss and Diet Control Market.
Meiselman, H. L., Hedderley, D., Staddon, S. L., Pierson, B. J., Symonds, C. R. (1994).
Effect of effort on meal selection and meal acceptability in a student cafeteria. Appetite,
23(1), 43–55.
Milkman, K., Minson, J. A., and Volpp, K. G. (2014). Holding the Hunger Games hostage
at the gym: An evaluation of temptation bundling. Management Science, 60(2), 283–299.
Milkman, K. L., Rogers, T., and Bazerman, M. H. (2010). I’ll have the ice cream soon and
the vegetables later: A study of online grocery purchases and order lead time. Marketing
Letters, 21(1), 17–35.
Nisbett, R. E., and Wilson, T. D. (1977). Telling more than we can know: Verbal reports on
mental processes. Psychological Review, 84, 231–259.
Nowlis, S. M., Dhar, R., and Simonson, I. (2010). The effect of decision order on purchase
quantity decisions. Journal of Marketing Research, 47(4), 725–737.
Nutrition advice from nutrition expert Brian Wansink. (2014). Smarter Lunchrooms Movement. Retrieved December 2015, http://smarterlunchrooms.org/news/nutrition-advice-nutrition-expert-brian-wansink.
Perneger, T. V., and Agoritsas, T. (2011). Doctors and patients’ susceptibility to framing bias:
A randomized trial. Journal of General Internal Medicine, 26(12), 1411–1417.
Pocheptsova, A., Amir, O., Dhar, R., and Baumeister, R. F. (2009). Deciding without
resources: Resource depletion and choice in context. Journal of Marketing Research, 46(3),
344–355.
Prestwich, A., Conner, M., Lawton, R., Bailey, W., Litman, J., and Molyneaux, V. (2005).
Individual and collaborative implementation intentions and the promotion of breast self-
examination. Psychology and Health, 20, 743–760.
Read, D., and van Leeuwen, B. (1998). Predicting hunger: The effects of appetite and delay
on choice. Organizational Behavior and Human Decision Processes, 76(2), 189–205.
Schwartz, J., Riis, J., Elbel, B., and Ariely, D. (2012). Inviting consumers to downsize fast-
food portions significantly reduces calorie consumption. Health Affairs, 31(2) 399–407.
Sela, A., Berger, J., and Liu, W. (2009). Variety, vice, and virtue: How assortment size influ-
ences option choice. Journal of Consumer Research, 35(6), 941–951.
Share Our Strength (2014). It’s dinnertime: a report on low-income families’ efforts to plan,
shop for and cook healthy meals. Retrieved December 2015, https://www.nokidhungry.
org/images/cm-study/report-highlights.pdf.
Sharpe, K., Staelin, R., and Huber, J. (2008). Using extremeness aversion to fight obesity:
Policy implications of context dependent demand. Journal of Consumer Research, 35,
406–422.
Sheeran, P. (2002). Intention—behavior relations: A conceptual and empirical review. European
review of social psychology, 12(1), 1–36.
Sheeran, P., Webb, T. L., and Gollwitzer, P. M. (2005). The interplay between goal intentions
and implementation intentions. Personality and Social Psychology Bulletin, 31, 87–98.
Shiv, B., and Nowlis, S. M. (2004). The effect of distractions while tasting a food sample:
The interplay of informational and affective components in subsequent choice. Journal of
Consumer Research, 31(3), 599–608.
Strecher, V. J., Seijts, G. H., Kok, G. J., Latham, G. P., Glasgow, R., DeVellis, B., and
Bulger, D. W. (1995). Goal setting as a strategy for health behavior change. Health
Education Quarterly, 22, 190–200.
Thaler, R. H. and Benartzi, S. (2004). Save More Tomorrow™: Using behavioral economics
to increase employee saving. Journal of Political Economy, 112(S1), S164–S187.
Thaler, R. H. and Sunstein, C. R. (2003). Libertarian paternalism. American Economic
Review Papers and Proceedings, 93, 175–179.
Thaler, R. H. and Sunstein, C. R. (2008). Nudge: improving decisions about health, wealth, and
happiness. New Haven, CT: Yale University Press.
Thorndike, A. N., Sonnenberg, L., Riis, J., Barraclough, S., and Levy, D. E. (2012). A
2-phase labeling and choice architecture intervention to improve healthy food and bever-
age choices. American Journal of Public Health, 102(3), 527–533.
Verplanken, B., and Wood, W. (2006). Interventions to break and create consumer habits.
Journal of Public Policy and Marketing, 25(1), 90–103.
Wansink, B. (2006). Mindless eating: Why we eat more than we think. New York, NY:
Bantam.
Wansink, B., Van Ittersum, K., and Painter, J. E. (2005). How descriptive food names bias
sensory perceptions in restaurants. Food Quality and Preference, 16(5), 393–400.
Wansink, B., Van Ittersum, K., and Painter, J. E. (2006). Ice cream illusions: bowls, spoons,
and self-served portion sizes. American Journal of Preventive Medicine, 31(3), 240–243.
Wood, W., Tam, L., and Guerrero Witt, M. (2005). Changing circumstances, disrupting
habits. Journal of Personality and Social Psychology, 88, 918–933.



24.  Field experimentation: promoting
environmentally friendly consumer
behavior
Noah J. Goldstein and Ashley N. Angulo

Field experimentation in consumer behavior research presents enormous
challenges, but when pursued with persistence, a creative problem-solv-
ing orientation, and some luck, it also affords sizable rewards. In this
chapter we detail a number of these challenges and rewards specifically
through the lens of field experiments published in two studies examining
the effectiveness of different persuasive messages urging hotel guests to
reuse their towels (Goldstein, Cialdini, and Griskevicius, 2008; Goldstein,
Griskevicius, and Cialdini, 2011). We describe these experiments and
their findings, detail a number of challenges and how the research team
responded to those challenges, and then discuss some of the rewards that
came about as the result of this research. Moreover, we discuss three
central stages of field experimentation in partnership with outside organi-
zations: initial outreach, securing buy-in, and implementation.
The purpose of Goldstein et al.’s (2008) article was to examine the
types of norms to which individuals are most likely to conform. The
psychological research on social identity had previously demonstrated
that people are most likely to conform to the norms of a reference group
with which they strongly identify. This literature tended to explore
how personal similarities (e.g., in gender, attitudes, ethnicity, values, age,
social class) between a target individual and a group of people influence
the target’s adherence to the group’s social norms (e.g. Terry and Hogg,
1996; Terry and Hogg, 1999). However, Goldstein and colleagues (2008)
noted that comparatively little research existed at the time exploring
the role contextual similarities (e.g., similarities in contexts, situations,
circumstances, and physical locations) play in adherence to reference
group norms.
One of the central aims of their study was to examine this question
by exploring whether the physical location in which a reference group’s
behavior takes place influences others’ conformity to that behavior. The
researchers aimed to show that what they call provincial norms—the
norms of what occurs in one’s local setting and circumstances—tend to
lead to greater conformity than more general or distal norms, which the
authorship team referred to as global norms. They also sought to examine
whether such messages might also be more influential than those com-
municating the norms of reference groups with which individuals typically
identify strongly (e.g., their own gender).
Goldstein and colleagues sought to test these and other ideas out of
the lab, in a real-world context that was likely to have a meaningful
societal outcome. Hotel towel reuse fit the bill for a number of obvious
reasons. First, they noted that signs urging guests to engage in conservation
behaviors were becoming increasingly popular in hotels, making them
more and more societally relevant each year. They also had never previ-
ously observed any hotel towel reuse materials communicating social
norms; thus, if successful, they would be able to make an applied case and
offer specific recommendations to promote environmentally conscious
consumer behaviors. The author team was immediately presented with
a number of challenges, including how they would secure a hotel as a
research partner in the first place.
The first stage of field experimentation with outside organizations is
the initial outreach. The easiest way to gain cooperation with a potential
field experiment partner is through pre-existing relationships within
one’s network. Unfortunately, in this particular case, the team had no
connections to the hotel industry, so a cold-call-style letter was written
and sent to almost half a dozen local managers in charge of all the hotels
geographically surrounding the university where the research team was
based. These hotels were picked not just because they would be convenient
for the team to visit but also due to the assumption that the hotels’ pre-
existing associations with the university would facilitate cooperation and
trust (i.e., most hotel guests had some university-related reason for staying
at these hotels). Following the social psychological literature on factors
that promote perceptions of being part of an in-group (e.g., Heider, 1958;
Tajfel, 1978), this pre-existing relationship between the university and
each hotel was highlighted. The potential benefits to the hotel were also
highlighted; it was convenient for this research that conservation behaviors
directly influenced the bottom line for the hotel, but even if there were no
clear and direct benefit to the prospective field experiment collaborator,
one can almost always point to at least some indirect benefit of the study
to the prospective partner (e.g., better understanding the organization’s
­customers). Finally, following the social psychological literature on strate-
gies that increase compliance with requests (e.g., Cialdini and Goldstein,
2004), a small gift was also included with the letter—a copy of Cialdini’s
Influence: Science and Practice (Cialdini, 2008). The hope was the book
would be simultaneously useful to each manager while also signaling the
team’s credentials. More generally, perhaps the inclusion of an article or
blog post written by someone on the research team—one understandable
to a layperson—would also act as a gift that simultaneously offers creden-
tials to would-be partners.
After securing a meeting with various hotel managers, the next stage
included getting buy-in from key stakeholders (in this case, convincing
the hotel management to partner with the research team). These meetings
were much like a negotiation, where knowing the other side’s underlying
interests (i.e., not creating more work for their staff or discomfort for their guests, and not violating any union or other binding contracts) was
important in demonstrating an understanding of where they were coming
from and that great care would be taken not to violate their wishes. In
addition, from a persuasion point of view, one of the major challenges is
trying to determine which motives will resonate most with one’s potential
field experiment partner. There are four typical motives for participation
in field experimentation that we have observed. First, prospects may wish
to be involved in an academic field experiment to help them understand
their business better and make better decisions in the future. Second,
some prospects find collaborating with academics to provide themselves
with a sense of personal prestige—something that credentials them within
their organization or possibly outside of it. Third, some prospects are
excited by the potential that the collaboration might benefit society in
some way, even if there is little possibility of benefitting their own bottom
line. Fourth, some prospects agree to collaborate out of pure personal
interest—they are curious individuals who want to find out the answer
to the questions being asked. In the case of this particular project, the
first and fourth motives were the ones that surfaced most prominently in
the discussions, and therefore more of the conversation focused on those
motives.
Once buy-in from management was obtained, the third stage was imple-
mentation. Of course, with such partnerships it is inevitable that one must
cede some control. Because it was obviously not possible to have typical
university research assistants walking into guests’ bathrooms while guests
were out to collect data on towel reuse, the authorship team needed to
rely on the room attendants to collect the data for them. It was extremely
important to train the room attendants to understand completely what
counted and did not count as an indication of the guests’ desire to reuse
their towel. When running field experiments it is best to take advantage of
existing systems to ensure accuracy and compliance with data collection.
In the hotel with which we partnered, room attendants already had paper
forms that they used to indicate that a room had been cleaned and included
a space for extra notes. Rather than generating a brand new form, the
research team made some very small changes to these pre-existing forms,
ultimately making data collection seem like a simple extension of the tasks
room attendants were used to completing.
Also, knowing that a language barrier might pose a challenge, the team
asked the room attendant supervisor for permission to go into a room and
take pictures of towels in various places to eventually be used in a guide that
pictorially demonstrated what should and should not count as towel reuse.
In addition to in-person training by the researchers, the team also wrote out
instructions in English and then paid a translator to translate the instructions
into Spanish (the native language of the majority of the room attendants).
The room attendants were being given new instructions that differed from their well-established habits, the instructions were somewhat complicated, and the attendants personally had little incentive to follow them. Therefore, the room attendant supervisors were asked to
occasionally “test” the room attendants and to report back to the
researchers if there were any room attendants whose data they believed
would be inaccurate. After a few weeks the supervisors named several
room attendants whom they did not endorse and whose data the team
did not use. Had the room attendant supervisor not conducted these tests,
these room attendants’ data likely would have added noise to the experi-
ment and reduced the likelihood of detecting statistically significant effects
between the different message conditions.
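
Analytically, comparing two message conditions in a study like this reduces to comparing towel-reuse proportions across rooms. A minimal sketch of one standard approach, a two-proportion z-test on hypothetical counts (the published studies’ actual data and tests may differ):

# Two-proportion z-test comparing towel reuse under two message conditions.
# Counts are hypothetical; the published studies' data and tests may differ.
import math

def two_prop_ztest(reused_a, n_a, reused_b, n_b):
    """z statistic and two-sided p-value for a difference in proportions."""
    p_a, p_b = reused_a / n_a, reused_b / n_b
    p_pool = (reused_a + reused_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return z, p_value

# Hypothetical: 44% reuse under a descriptive-norm sign vs. 35% under
# the standard environmental sign, ~220 rooms per condition.
z, p = two_prop_ztest(97, 220, 77, 220)
print(f"z = {z:.2f}, p = {p:.3f}")  # -> z = 1.95, p = 0.051

Noisy data from unreliable recorders inflates the standard error in such a test, which is why excluding the unendorsed attendants’ data mattered for detecting effects.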
Prior to this field experiment comparing provincial and global norms,
the researchers conducted an initial field experiment. This experiment
simply tested the difference in compliance between a control message that
was akin to the standard messages hotels already employed (focusing on
the importance of conservation to the environment) and a descriptive
social norm-based message indicating that most hotel guests participate
in the program (these norm data were based on a small study the authors
had previously discovered). There were several benefits of conducting this
study prior to proposing the provincial norm study. First, from a purely
applied standpoint, demonstrating that a novel sign designed by psycholo-
gists was superior to the standard ones used by almost all hotels would
provide a key applied insight to practitioners in the hospitality industry.
Second, this would provide the hotel manager with tangible results,
further highlighting the utility of research. Finally, and most important,
this experiment helped iron out the kinks, so to speak, of the logistics
and coordination necessary to run future studies jointly with the hotel
­management and staff.
After collecting data for this initial experiment for nearly three months,
the researchers found that the social norm message indicating that most
of the hotel guests reuse their towels did indeed yield significantly greater
participation in the hotel’s towel reuse program than the standard
environmental message. The team wanted to demonstrate its appreciation
to the manager and staff, so they purchased two plaques—one for the man-
ager and one for the room attendant supervisors and staff. These plaques
had an award-like appearance and expressed appreciation to the manager
and his employees for playing a major role in the university’s research
and the generation of new scientific knowledge. This accomplished two
goals other than gratitude for its own sake. First, research clearly shows
that demonstrations of gratitude have many positive effects on one’s
relationship with others (Grant and Gino, 2010); thus, these plaques likely
helped further buttress the relationship between the research team and
their field experiment collaborators. This enhanced relationship and trust
between all parties is what allows gatekeepers such as managers and hotel
staffing directors to transition from interested prospects to advocates.
Second, these plaques helped to explicitly identify the manager and staff
as not just helpful to the researchers or even to the larger university, but
as being willing contributors to scientific exploration more generally. This
is consistent with research from psychology and consumer behavior that
finds labeling others with particular traits, values, attitudes, or attributes
increases the likelihood they will act consistently with those labels (e.g.,
Tybout and Yalch, 1980).
The positive relationship the authorship team established with the
manager deepened over time and this proved crucial in facilitating the
manager’s acceptance of the experiment designed to test the provincial
norms hypothesis we described above. Recall the provincial norms experi-
ment was designed to test the hypothesis that messages communicating the
norms of one’s local environment (the provincial norm) would engender
more conformity than norms that are more distal (the global norm), even
when the former is no more diagnostic of effective or acceptable behavior
than the latter. In order to test this idea in the context of towel reuse
programs, the research team came up with what seemed like an unusual
idea. If you recall, the social norm message in the first experiment they ran
at the hotel indicated most of the other guests at the hotel had participated
in the hotel’s conservation program—what seems a fairly provincial norm
(i.e., relevant to the surroundings and location of the individual). The
research team proposed making that norm even more provincial by indi-
cating that most other guests who specifically previously stayed in one’s
own room participated in the conservation program. The researchers also
included several other conditions in their field experiment that included
the same normative data but paired that information with different refer-
ence groups. These reference groups (e.g., those based on gender, being a
good citizen, etc.) were ones with which hotel guests were far more likely
to identify than with guests who previously stayed in their room. The pur-
pose was to examine whether or not the provincial reference group—with
whom few hotel guests would identify—would counter-intuitively result in
more conformity to the towel reuse norm than groups based on identities
individuals typically consider important (e.g., those based on one’s gender,
being a good citizen, etc.).
Imagine if this experiment were the first one proposed to the hotel
manager in the very first meeting. The idea of telling hotel guests about the
behavior that occurred in their own room is unusual at best and unpleasant
at worst. This is why it was so important to build a foundation of respect,
trust, and experience working together before proposing ideas that might
otherwise be considered outrageous and lead to the potential partner
slamming the door in one’s face. Ultimately, the hotel manager agreed to
allow the team to conduct the proposed follow-up experiment—a strategy
that no doubt was helped along by a sense of consistency (Freedman and
Fraser, 1966).
The data for this second field experiment conceptually replicated the
initial experiment. When compared against the standard environmental
message hotels had been using, the four different norm-based messages
were more effective in increasing towel reuse. That is, merely informing
hotel guests that many others generally reused their towels significantly
increased towel reuse compared to focusing guests on the importance of
environmental protection. In addition, consistent with our hypotheses, the
provincial descriptive norm message (the one that highlighted the behavior
of guests in the same room the participants were staying in) outperformed
the other three normative messages in towel reuse. That is, even though
the provincial norm for the frequency of guests’ towel reuse in a particular
hotel room is not any more diagnostic of effective or approved behavior
than the norms paired with the other reference groups—and the provincial
norm message references the norms of the least personally meaningful
reference group—this condition produced the highest level of towel reuse.

Reciprocity by Proxy

Goldstein, Griskevicius, and Cialdini (2011) also tested a completely different hypothesis, published several years later. In their personal observa-
tion of hotel conservation programs they noticed that some hotels offered
an indirect incentive to hotel guests if they participated in the conservation
program. Specifically, the hotel would make a donation to a non-profit
environmental protection organization for every guest who participated
in the program. This idea struck the authors as an interesting tactic, but
one they believed could be made far more effective with a seemingly minor
tweak: instead of making the donation contingent on guests’ behavior, the
most effective use of the norm of reciprocation suggests that hotels might
be more successful by first making a donation to such an organization on
behalf of its guests and then asking the guests to participate in return.
Of course, in addition to some of the challenges we described above,
field experiments have their limitations. Often one of the biggest limita-
tions is a constraint on the number of different ideas that can be tested.
Another is that field experiments are typically limited in helping under-
stand underlying psychological processes, especially compared to lab
experiments or surveys. Yet another is the question of how conservative
to be when choosing an experiment’s conditions as well as designing those
conditions to avoid confounds. For example, in the Goldstein et al. (2011)
research, the authorship team had an important decision to make when
designing the wording of the signs that make future donations contingent
on guests’ behavior (which they called the Incentive-by-Proxy condition)
versus making the donation with no strings attached and asking the guests
to reciprocate (which they called the Reciprocity-by-Proxy condition).
There was a concern over whether the small wording differences between
conditions might not be enough to be noticed by guests, so the decision
was made to strengthen the difference between conditions with condition-
relevant text in bold. For example, in the Incentive-by-Proxy condition
this wording read, “PARTNER WITH US TO HELP SAVE THE
ENVIRONMENT” (followed by text explaining how a donation would
be made for each guest who participated in the program) whereas the
wording for the Reciprocity-by-Proxy condition read as follows: “WE’RE
DOING OUR PART FOR THE ENVIRONMENT. CAN WE COUNT
ON YOU?” (followed by text explaining how the hotel had already made
a donation on behalf of the hotel and its guests). This was certainly not
the experimentally “cleanest” comparison. If the Reciprocity-by-Proxy
condition were found to be more successful than the Incentive-by-Proxy
condition (which it ultimately was), there could be a number of reasons
that have nothing to do with the central theoretical difference between
the two conditions (e.g., “you” was mentioned in the one that was more
effective, it was asked in the form of a question, etc.). However, the team
felt that it was important to create a strong difference that honored the
respective intentions of each of the conditions, which would give it the
greatest likelihood of working in the limited field experiment context.
Also, after this study was successfully completed, the team moved their
experimentation to the laboratory and conceptually replicated the results
with messages that have far fewer confounds. In addition, because hotel
guests were not aware that they were enrolled in a study, the researchers
did not have the opportunity to follow up with them to tease apart the
potential mechanism(s) that drove the initial findings. This is where fol-
low-up laboratory studies prove so useful: Not only can they help reduce
confounds, but they also typically give researchers far greater insight into
the psychological underpinnings of the field experiment effects.

Rewards

Although field experiments present more challenges than many other
forms of research, they can also provide many more rewards. One major
benefit of field research is that it is conducted in a real-life setting and is
viewed as more convincing than lab study results. There is no leap of faith
required in making the jump from theory to practice. This is not to say that
real change happens quickly after field experiments are publicized—it took
years after publication of the first hotel study for us to observe any hotels
actually making use of the findings. But sometimes large-scale changes
do occur shortly after field experiments are published. For example, rela-
tively soon after Schultz and colleagues (2007) published their paper on
the benefits of providing normative feedback to home energy users, the
company Opower was founded using the same principles demonstrated in
that work (Cuddy, Doherty, and Bos, 2010). To date, Opower’s feedback
on homeowners’ energy reports has cumulatively saved approximately 11
trillion watt-hours of energy and reduced customers’ energy bills by about
$1.1 billion. It seems very likely that interest in the power of normative
feedback was a direct result of running a field experiment rather than a
survey or lab experiment. Given the potential major impact of field experi-
mentation on scholarship and practice, we look forward to seeing more of
it conducted by consumer researchers in the future.
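
A quick back-of-the-envelope check relates those two Opower figures; the average retail electricity rate of about $0.10 per kilowatt-hour is our assumption, not a number from the chapter:

# Back-of-envelope consistency check on the Opower figures cited above.
# The $0.10/kWh average retail rate is our assumption, not from the chapter.

saved_wh = 11e12                 # 11 trillion watt-hours
saved_kwh = saved_wh / 1000      # = 1.1e10 kilowatt-hours
print(saved_kwh * 0.10)          # -> 1.1e9, i.e., about $1.1 billion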

References

Cialdini, R. B. (2008). Influence: Science and practice (5th ed.). Boston: Allyn & Bacon.
Cialdini, R. B. (2009). We have to break up. Perspectives on Psychological Science, 4(1), 5–6.
Cialdini, R.B. & Goldstein, N.J. (2004). Social Influence: Compliance and Conformity.
Annual Review of Psychology, 55(1), 591–621.
Cuddy, A. J. C., Doherty, K. T., & Bos, M. W. (2010). “OPOWER: Increasing Energy
Efficiency through Normative Influence (A).” Harvard Business School Case 911-016
(Revised January 2012).
Freedman, J. L., & Fraser, S. C. (1966). Compliance without pressure: The foot-in-the-door
technique. Journal of Personality and Social Psychology, 4, 195–203.
Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using
social norms to motivate environmental conservation in hotels. Journal of Consumer
Research, 35, 472–482.
Goldstein, N. J., Griskevicius, V., & Cialdini, R. B. (2011). Reciprocity by proxy: A novel
influence strategy for stimulating cooperation. Administrative Science Quarterly, 56,
441–473.
Grant, A. M. & Gino, F. (2010). A little thanks goes a long way: Explaining why gratitude
expressions motivate prosocial behavior. Journal of Personality and Social Psychology,
98(6), 946–955.
Heider, F. (1958). The Psychology of Interpersonal Relations. New York, NY: Wiley.
Schultz, P. W., Nolan, J. M., Cialdini, R. B., Goldstein, N. J., & Griskevicius, V. (2007). The constructive, destructive, and reconstructive power of social norms. Psychological Science, 18(5), 429–434.
Terry, D. J. & Hogg, M. A. (1996). Group norms and the attitude – behavior relationship:
A role for group identification. Personality and Social Psychology Bulletin, 22, 776–793.
Terry, D. J. & Hogg, M. A. (eds.). (1999). Attitudes, behavior, and social context: The role of
norms and group membership. Psychology Press.
Tajfel, H. (1978). “Social Categorization, Social Identity and Social Comparison.” In
Differentiation Between Social Groups: Studies in the Social Psychology of Intergroup
Relations, ed. H. Tajfel. London, England: Academic Press.
Tybout, A.M. & Yalch, R.R. (1980). The Effect of Experience: A Matter of Salience? Journal
of Consumer Research, 6, 406–413.



25.  Regulation and online advertising
markets
Avi Goldfarb

The first standard format online banner advertisement was displayed on the HotWired website for Zima, an alcoholic beverage, in October
1994 (Goldfarb 2004). Since then, online advertising has grown rapidly.
Facebook and Google, two of the world’s most valuable companies, earn
most of their revenue from online advertising. More generally, as con-
sumer attention moves to computers and mobile devices, online advertis-
ing is an increasingly large share of all advertising expenditures.
The rise of this new form of advertising has generated a number of
policy questions around privacy, the ability of local governments to
regulate information, and antitrust in online markets. In this chapter, I
review three studies I conducted in collaboration with Catherine Tucker of
Massachusetts Institute of Technology. These studies use a combination
of field experiments and quasi-experimental variation to answer policy
questions related to online advertising. The article “Privacy Regulation
and Online Advertising” (Goldfarb and Tucker 2011a) informs the privacy
regulation debate by measuring the impact of European privacy regula-
tion on advertising effectiveness in Europe. The articles “Advertising Bans
and the Substitutability of Online and Offline Advertising” (Goldfarb and
Tucker 2011b) and “Search Engine Advertising: Channel Substitution when
Pricing Ads to Context” (Goldfarb and Tucker 2011c) both demonstrate
substitution between online and offline advertising markets. “Advertising
Bans” emphasizes the limitations of local advertising policy in the
presence of national online advertising markets while “Search Engine
Advertising” emphasizes that antitrust policy should consider online and
offline advertising to be substitutes.

Privacy

Internet use involves a one-to-one relationship between the end user’s
computer and the computer serving the digital content. This one-to-one
relationship means that it is relatively easy to collect information on
the behavior of individual users at a website. Because collecting such
information is straightforward online, online advertising is distinct from
other forms of advertising, as it can be targeted and its effectiveness can be
measured (Goldfarb and Tucker 2011d).
Therefore, the key distinction between online and offline advertising is
the use of data by online advertisers to target and measure their advertis-
ing. This is useful because it allows advertisers to advertise more effec-
tively, potentially increasing the match quality between consumers and
the products they consume. However, the use of such data raises privacy
concerns. The privacy concerns may be driven by a fundamental right to
privacy or the potential harm that may come to some consumers through
the use of the collected information in the form of higher prices, embar-
rassment, or discrimination (Solove 2008; Nissenbaum 2010; Acquisti,
Taylor, and Wagman 2016).
In response to these concerns, there has been a growing pressure on
regulators to restrict the ability of firms to collect and use information
about consumers. European privacy regulation has been relatively strict
and broad, while American privacy regulation has focused more on
finance and health than on digital advertising.
Goldfarb and Tucker (2011a) examined the consequences of the first
major implementation of European privacy regulation with respect to
online advertising. In particular, we documented how advertising effec-
tiveness changed in the United Kingdom, the Netherlands, Italy, France,
and Germany after the 2004 implementation of Directive 2002/58/EC.
This implementation banned the use of “web bugs” and related measures
for tracking consumer behavior without cookies. It also placed some
restrictions on the use of cookies and the use of data about consumer
clicks on websites.
To measure the impact of the ban, we needed two types of information.
First, we needed a measure of the effectiveness of advertising. Second, we
needed a comparison group in order to assess whether advertising effective-
ness in the European Union changed relative to some relevant benchmark.
To measure the effectiveness of advertising, we used data from thou-
sands of field experiments conducted by a marketing research firm from
2001 to 2008. The marketing research firm specialized in measuring the
effectiveness of ongoing advertising campaigns. In particular, advertisers
hire this firm to assess whether an online (banner) ad campaign is work-
ing. The research firm randomly changes several of the advertisements to
a “placebo” advertisement (typically a public service announcement for
an organization like the Red Cross). Web users who saw the company’s
advertisement are said to be in the “treatment group,” while web users
who were targeted for the company’s advertisement but instead saw the
public service announcement are said to be in the “control group.”

Web users in the treatment and control groups were asked to fill out
a survey that asked about opinions of the brand in the treatment group.
Thus people who saw the branded ad and people who saw the public
service announcement were both asked about their opinion on the brand.
The difference in favorability and stated purchase intention between the
treatment and control groups can be seen as the effect of the ad on brand
favorability and purchase intent. In other words, the experiment allows
the marketing research firm (and us researchers!) to assess the causal
impact of the advertisement on stated opinions.
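
For concreteness, the per-campaign estimate reduces to a comparison of mean survey responses across the two randomized groups. Below is a minimal sketch with simulated placeholder data (the underlying survey data are proprietary, and all names here are illustrative):

```python
import numpy as np
from scipy import stats

# Placeholder data: stated purchase intent (1-5 scale) for respondents who
# saw the brand ad (treatment) vs. the PSA placebo (control).
rng = np.random.default_rng(1)
treatment = rng.integers(1, 6, size=500).astype(float)
control = rng.integers(1, 6, size=500).astype(float)

# Randomization makes the difference in means an estimate of the causal
# effect of the ad on stated purchase intent for this campaign.
lift = treatment.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
print(f"lift = {lift:.3f}, t = {t_stat:.2f}, p = {p_value:.3f}")
```
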
It is important to note some limitations of this method. First, we do not
know the impact of the ad on actual purchasing, only on stated intentions
to purchase and stated opinion of the brand. Second, a large fraction of con-
sumers did not fill out the survey. While the response rate for the treatment
and control groups is similar, it is generally low. This suggests the measure
of effectiveness we have could be narrowly seen as a measure of the effect of
an advertisement on the type of people who are willing to fill out surveys.
Nevertheless, the field experiments give us measures of the effectiveness
of thousands of different advertising campaigns across many countries
and over many years. We could use this information to look at changes in
the effectiveness of advertising campaigns in Europe before and after the
2004 implementation of the privacy regulation; however, such an analysis
would be incomplete. It would not help solve the second requirement for
measuring the impact of the regulation: a comparison group to provide a
relevant benchmark.
As a benchmark, we use non-EU countries (the non-EU data come
primarily from the United States, with a small number of campaigns in
each of Canada, Mexico, Brazil, and Australia). We use the change in
EU privacy policy to conduct a “difference-in-differences”
analysis that treats the change in policy as a natural or quasi-experiment.
We compare the change in effectiveness of EU ads before and after the
policy change to the change in the effectiveness of non-EU ads, before
and after the EU policy change. This is called a difference-in-differences
analysis because it looks at the difference in the change in ad effectiveness
across locations over time. The changes in ad effectiveness are, themselves,
differences between the before and after periods. While it is possible to
conduct difference-in-differences estimation by comparing the four aver-
ages (ad effectiveness in the EU before the policy change, ad effectiveness
in the EU after the policy change, ad effectiveness outside the EU before
the EU policy change and ad effectiveness outside the EU after the EU
policy change), it is more common and often more informative to conduct
regression analysis that emphasizes an interaction term between the policy
change timing and the treatment group.
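
A minimal sketch of that regression on simulated campaign-level data follows; the column names (effect, eu, post) are illustrative, and the simulated effect is wired in only to mimic the direction of the published finding:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the campaign-level data: effect = measured ad
# effectiveness, eu = campaign ran in the EU, post = campaign ran after the
# 2004 implementation of the privacy directive.
rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({"eu": rng.integers(0, 2, n), "post": rng.integers(0, 2, n)})
df["effect"] = 1.0 - 0.65 * df["eu"] * df["post"] + rng.normal(0, 0.5, n)

did = smf.ols("effect ~ eu * post", data=df).fit()
print(did.params["eu:post"])  # the difference-in-differences estimate
```
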

Using such regression analysis, we found that advertising in the EU
became 65 percent less effective after the policy change, compared to
before the change and to the rest of the world. In other words, the EU
policy had a substantial negative effect on whether advertising worked.
The policy implications of this depend on one’s perspective. Less
effective online advertising certainly hurts online advertising platforms,
and it likely also hurts advertisers (though the latter point depends on
equilibrium prices). It may also hurt consumers if advertising is informa-
tive. On the other hand, if privacy is a fundamental human right, our
results suggest that the EU policy was quite effective in limiting what
companies were able to do with consumer data. Regardless of the spin on
the interpretation, since publication, this research has been used in policy
discussions in the EU, United States, and elsewhere in assessing the costs
and benefits of increased privacy regulation.

Advertising regulation and local control

Next, I discuss another study that used part of this same data set of field
experiments from a marketing research company to assess whether the
digital channel limits the ability of local governments to change behavior
by restricting advertising.
Castells (2001) highlighted the potential of the internet to reduce state
control by allowing information to flow freely across borders. While
national governments have been able to erect barriers to the international
flow of information online (Zhang and Zhu 2011), such barriers have
proven challenging within countries. The point that local government
policies can be undermined by the online channel has received a great deal
of attention in the context of local sales taxes (Goolsbee 2000; Ellison and
Ellison 2009; Einav et al. 2012; Anderson et al. 2010). One common thread
in these studies is that consumers are much more likely to buy online in
locations with high offline sales taxes.
In Goldfarb and Tucker (2011b), we examine whether this reduced
potential of government control applies to advertising regulation. Many
local governments ban certain types of advertising within their juris-
diction. Particularly common in the United States is the banning of
alcohol advertising using billboards and other outdoor media. During the
2001–2008 time period, 17 states regulated such out-of-home advertising
of alcoholic products. To test whether the internet makes government
regulation less effective, we compared the effectiveness of online advertis-
ing campaigns for alcohol within the 17 states that restricted out-of-home
alcohol advertising to the 33 states without such regulations. If states with
restricted out-of-home advertising had more effective online advertising,
then the online advertising was, in effect, blunting the ability of the ban to
change behavior.
Of the thousands of experiments in our data, there were 275 US-based
campaigns for alcoholic beverages. This gave us measures of the effective-
ness of these campaigns. For the people who filled out the survey, some
were based in states with bans and some were based in states without such
bans. Comparing ad effectiveness in states with and without the ban, we
found that consumers in states with alcohol advertising bans are much
more responsive to ads (in terms of stated intention to purchase) than
consumers in other states. This analysis can be seen as a difference-in-
differences analysis that combines experimental and non-experimental
data. The experiment generates ad effectiveness: the difference between
the treatment group that saw the ads and the control group that did not.
The non-experiment generates the impact of the bans: the difference
between the states with and without bans.
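
As a sketch under assumed variable names, this combined design can be written as a single regression in which the experimental exposure indicator is interacted with the non-experimental ban indicator; the interaction coefficient is the quantity of interest. The data and names below are illustrative only:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated respondent-level stand-in: intent = stated purchase intent,
# saw_ad = randomized exposure (1 = alcohol ad, 0 = PSA), ban = respondent's
# state restricts out-of-home alcohol advertising.
rng = np.random.default_rng(3)
n = 2000
df = pd.DataFrame({"saw_ad": rng.integers(0, 2, n), "ban": rng.integers(0, 2, n)})
df["intent"] = (3 + 0.1 * df["saw_ad"] + 0.2 * df["saw_ad"] * df["ban"]
                + rng.normal(0, 1, n))

# saw_ad:ban asks whether the experimental ad effect is larger in ban states.
m = smf.ols("intent ~ saw_ad * ban", data=df).fit()
print(m.params["saw_ad:ban"])
```
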
Of course, alcohol bans cannot be interpreted as random. Therefore, the
underlying variation is not quasi-experimental. To address this point, we
conducted two types of additional analysis. First, we included covariates
for many state attributes and interacted them with the treatment group.
This allowed us to control for any systematic differences across states
in terms of alcohol consumption and advertising regulation. Second,
we examined four local-level advertising bans that changed during our
sample period: A December 2003 ban on some kinds of alcohol advertis-
ing in Philadelphia, the lifting of a July 2004 ban on alcohol advertising in
Pennsylvania college newspapers, the lifting of a November 2007 ban on
hard liquor advertising in New York, and the December 2007 ban on some
kinds of alcohol advertising in San Francisco.
These changes in policy allow us to compare another difference:
Changes over time in ad effectiveness in these places relative to other
locations. The results hold, though statistical significance is sometimes
lower: offline advertising bans increase the effectiveness of online banner
ad campaigns.
In terms of understanding the mechanism through which the online ad
campaigns substitute for potential offline campaigns, we show that the
results are driven by new products and products with relatively low levels
of awareness.
While such policies might achieve their intended purpose of reducing
exposure of school children to alcohol advertising, we interpret these
results to suggest that the internet can enable firms and consumers to
circumvent (sub-national) offline advertising restrictions. More generally,
our results demonstrate that local offline regulation can be challenging in
the presence of a nationally accessible internet.

Antitrust

Advertising bans can also be used to understand the interaction between
online and offline advertising markets. Such interactions are important to
understand because they help inform antitrust policy by helping to define
relevant markets and relevant margins of competition.
Google is a large company and, in many countries, it has a large share of
the internet search market generally and of the search advertising market
in particular. Being large is not per se illegal, and neither is having a large
share of a market. Still, having a large share of a particular market is often
seen as a necessary condition for a company to be the target of antitrust
action.
Some of the early antitrust arguments against Google relied on their
large share in the search advertising market (Ratliff and Rubinfeld 2010;
Manne and Wright 2011). Because search advertising is a relatively
small share of the overall market for advertising, Google’s share of the
search advertising market is only a concern for the antitrust authorities if
search advertising is a distinct advertising market from other advertising
markets. In other words, for Google to be a target of antitrust regulation
in search advertising, search advertising cannot be a close substitute for
other types of advertising.
In Goldfarb and Tucker (2011c), we examined substitution between
search engine advertising and a particular form of offline advertising:
direct solicitation of customers. Focusing on personal injury lawyers (a
particularly lucrative segment of search advertising), we compared the
prices for search advertising in places that prohibit lawyers from contact-
ing customers directly with prices for search advertising in places that
allowed lawyers to engage in such direct solicitation (pejoratively called
“ambulance chasing”).
We collected data on advertising prices for dozens of law-related key-
words. These data were not drawn from an experiment. Instead, we have
to infer the causal effect of bans by comparing an artificial “treatment
group” that are affected by the bans with an artificial “control group”
that would not be affected by the bans but should be otherwise similar.
For the artificial treatment group, we used keywords related to personal
injury law, and so the lawyers conducting the advertising could be affected
by a ban on direct solicitation. For the artificial control group, we used
other law-related keywords, unrelated to personal injury law (family law,
intellectual property law, etc.). We argue that prices for these words were
unlikely to be affected by a ban on direct solicitation, but are likely to
be affected by other drivers of the price of law-related keywords such as
litigiousness and local competition between lawyers.
We conducted difference-in-differences analysis, comparing the differ-
ence in the prices of personal injury keywords with other law keywords
in states with direct solicitation bans to the difference in the prices of
personal injury keywords with other law keywords in states without direct
solicitation bans.
We found substantial substitution between search engine advertising
and direct solicitation: when direct solicitation is banned, prices for per-
sonal injury keywords are much higher. We interpret this to suggest that
search engine advertising competes directly with offline direct solicitation
(a form of advertising).
This research has been used to argue that online and offline advertising
markets should not be seen as separate markets, but as part of a larger
advertising market (Ratliff and Rubinfeld 2010). If the relevant market
is all advertising, rather than search engine advertising, it is harder to see
how Google can be an antitrust target based on its share of the search
advertising market alone.

Conclusion

This chapter has summarized three studies that used experiments and
difference-in-differences regression modeling to inform policy debates
around privacy, local jurisdiction, and antitrust. Much work remains to
be done to improve the empirical content of these debates, as well as other
discussions in marketing policy.

References
Acquisti, Alessandro, Curtis Taylor, and Liad Wagman. 2016. The Economics of Privacy.
Journal of Economic Literature 54(2), 442–492.
Anderson, E., N. Fong, D. Simester, and C. Tucker. 2010. How sales taxes affect customer and firm
behavior: The role of search on the internet. Journal of Marketing Research 47(2), 229–239.
Castells, Manuel. 2001. The Internet Galaxy: Reflections on the Internet, Business, and
Society. London: Oxford University Press.
Einav, L., D. Knoepfle, J. D. Levin and N. Sundaresan. 2012. Sales taxes and internet com-
merce. Working Paper 18018, National Bureau of Economic Research.
Ellison, Glenn and Sara Fisher Ellison. 2009. Tax Sensitivity and Home State Preferences in
Internet Purchasing. American Economic Journal: Economic Policy 1(2), 53–71.
Goldfarb, Avi. 2004. Concentration in Advertising-Supported Online Markets: An Empirical
Approach. Economics of Innovation and New Technology 13(6), 581–594.
Goldfarb, Avi and Catherine Tucker. 2011a. Privacy Regulation and Online Advertising,
Management Science 57(1), 57–71.
Goldfarb, Avi and Catherine Tucker. 2011b. Advertising Bans and the Substitutability of
Online and Offline Advertising. Journal of Marketing Research 48(2), 207–227.
Goldfarb, Avi and Catherine Tucker. 2011c. Search engine advertising: Channel substitution
when pricing ads to context. Management Science 57(3), 458–470.
Goldfarb, Avi and Catherine Tucker. 2011d. Online Advertising. In Advances in Computers
vol. 81, ed. Marvin Zelkowitz. New York: Elsevier.
Goolsbee, A. 2000. In a world without borders: The impact of taxes on internet commerce.
Quarterly Journal of Economics 115 (2), 561–576.
Manne, Geoffrey, and Joshua Wright. 2011. Google and the Limits of Antitrust: The
Case Against the Case Against Google. Harvard Journal of Law and Public Policy 34(1),
171–244.
Nissenbaum, Helen. 2010. Privacy in Context: Technology, policy, and the integrity of social
life. Palo Alto CA: Stanford Law Books.
Ratliff, James D. and Daniel L. Rubinfeld. 2010. Online Advertising: Defining Relevant
Markets. Journal of Competition Law and Economics 6(3), 653–686.
Solove, Daniel J. 2008. Understanding Privacy. Cambridge, MA: Harvard University Press.
Zhang, X. and F. Zhu. 2011. Group Size and Incentives to Contribute: A Natural Experiment
at Chinese Wikipedia. American Economic Review 101(4), 1601–1615.

26.  Measuring the long-term effects of public policy: the case of narcotics use and property crime
Keiko I. Powers

A critical issue in the evaluation of public policy effectiveness is the dis-
tinction between short-term and long-term effects. In the former case, an
action (e.g., the provision of a health care service) has a temporary or tran-
sitory effect on some desired outcome (e.g., a reduction in the incidence
of a communicable disease), and, in the latter case, it has a permanent or
trend-setting effect. The difference is of fundamental importance in decid-
ing whether or not the benefits of public policy programs outweigh their
costs.
Major advances in the field of multivariate time-series analysis have
made it possible to empirically differentiate long-term and short-term
effects when equal-interval time series data are available. First, econo-
metric techniques have been developed that measure the presence of per-
manent versus transitory movements in individual time-series data. These
methods are known as tests for unit roots in time series (e.g., Dickey, Bell
and Miller 1986). Second, if long-term movements in individual time series
are discovered, then the existence of long-term relationships among vari-
ables can be investigated using a method known as cointegration (Engle
and Granger 1987). Finally, the long-run and short-run relationships
among a set of variables may be combined in one model, known as an
error-correction model (Engle and Granger 1987).
The empirical investigation used long-term multivariate time-series
modeling to understand one of modern society’s most pressing prob-
lems: narcotics abuse and the associated property crime and how these
behaviors can be influenced by social intervention. In examining possible
strategies to curb the narcotics abuse problem, the main interest lies in
the permanent as well as the temporary effect of social interventions.
To address these key notions, the chapter is organized as follows. First,
it provides a brief overview on how social interventions are designed to
control narcotics abuse and property crime. Next, the focus is on the
methodological issues involved in unit-root testing, cointegration, and
error-correction modeling, which then leads to descriptions of a multi-step
approach for measuring long-term and short-term relationships in the
data. The chapter concludes with an analysis of the empirical results and a
discussion of their policy implications.

Background

The two main programs that society currently uses to respond to individu-
als with problems of illegal drug use are health-system interventions and
legal-system controls. The health system deals with physical, mental, and
some behavioral aspects of drug use but does not necessarily address crime
and violence. The legal system, which views drug use from the perspective
of criminal justice, focuses on the criminality of drug users and imposes
penalties for illegal activities, including incarceration. Both the medical
and the criminal aspects of drug use, however, are intricately related. The
strong linkage between narcotics addiction and crime has been well docu-
mented (see e.g., reviews by Speckart and Anglin 1986; Powers 1990).
Studies evaluating the effectiveness of treatment, especially methadone
maintenance, consistently show that treatment reduces narcotics use and
related crime among chronic narcotic addicts (Anglin and Hser 1990).
Evidence for the direct effects of legal supervision, while promising, is
more equivocal (Simpson and Friend 1988). Even fewer studies have
investigated the joint effectiveness of criminal justice system interventions
and community drug treatment on drug use and crime, especially over a
long period of time (Collins and Allison 1983). As a result, the relative con-
tributions of methadone maintenance and legal supervision to combatting
drug use and crime remain unclear. Nor is it known how these two types
of intervention should be combined for maximum efficacy. Furthermore,
before policy decisions can be made, it is necessary to determine whether
such interventions continue to have beneficial effects over the long run
for a sufficiently large number of drug-dependent individuals to be cost
effective.
In order to explore these questions, the present study will develop a
multivariate time-series model, using a cointegration and error-correction
approach to understand the long-term and the short-term relationships
among the intervention and behavioral variables (Engle and Granger
1987). Long-term, or “permanent,” relationships refer to how a stochastic
trend in a given variable is related to the stochastic trends of other vari-
ables. Short-term relationships measure how temporary fluctuations from
the means, or trends, of the measured variables are related to each other.
From the literature, it is clear that methadone maintenance and legal
supervision do not typically operate in isolation from each other, and
both are often imposed, either alone or in combination, in response to
illicit drug use or criminal involvement. Therefore, their effects should
be evaluated within a system framework. In the present case, this system
approach allows us to assess the dynamic interplay between narcotics use
and property crime and to examine how this relationship influences and
is influenced by methadone maintenance and legal supervision. Because
the current study examines the interrelationships within the system over a
long period of time, the model will also consider the possible interaction of
maturation, or aging, with the relevant variables.

Data

Sample

The data for the present analysis were taken from extensive retrospective
longitudinal interviews with 720 heroin addicts who entered methadone
maintenance programs in Southern California in the years 1971–1978.
Detailed descriptions of sample selection and sample characteristics are
available elsewhere (Anglin and McGlothlin 1984; Hser, Anglin and Chou
1988). The original sample consisted of 251 Anglo men, 283 Anglo women,
141 Chicanos, and 45 Chicanas. Because the length of the observation
period had to be sufficiently long for the results of time-series analysis to
be reliable and because it was necessary to retain a sufficient number of
subjects for the results to be generalizable, subjects who did not have at
least 80 months of observation were eliminated, providing 627 subjects
(87 percent of the original sample) for the time-series analysis. To ensure
that the reduced sample was representative of the original group, back-
ground characteristics of both samples were compared and are presented
in Table 26.1. No apparent differences were observed between the two
samples. The selected sample consisted of Anglo (74 percent) and Chicano
(26 percent) chronic narcotic addicts, both men (57 percent) and women
(43 percent). All the following analyses are based on the selected sample.

Variables

Five outcome variables were selected for the present analysis:

1. abstinence or no narcotics use (NNU),
2. addictive use or daily narcotics use (DNU) for at least 30 days,
3. property crime (C),
4. methadone maintenance treatment (MM), and
5. legal supervision (LS).

Table 26.1  Sample characteristics

                                            Original Sample (N = 720)   Selected Sample (N = 627)
Background Characteristics                  N        %                  N        %
Ethnicity
  Chicano                                   186      25.8               163      26.0
  Anglo                                     534      74.2               464      74.0
Gender
  Men                                       392      54.4               357      56.9
  Women                                     328      45.6               270      43.1
Socioeconomic status of family (%)
  Poor                                               7.1                         7.1
  Working class                                      33.4                        33.4
  Middle                                             45.5                        44.9
  Upper-middle                                       13.9                        14.6
Problems in family (a)                               2.8                         2.8
Gang membership (%)                                  17.7                        18.7
Problems in school (%)                               72.0                        72.0
Mean highest grade completed                         10.9                        10.9
Main occupation (%)
  Skilled                                            19.6                        19.9
  Semi-skilled                                       56.3                        57.6
  Unskilled                                          19.0                        17.5
  Never worked                                       5.1                         4.9
Mean age at (b)
  First arrest                                       17.4 (671)                  17.3 (587)
  Time left home                                     17.7 (706)                  17.4 (616)
  First narcotic use (FNU)                           19.5                        19.2
  First daily use (FDU)                              20.8                        20.6
  First legal supervision                            22.4 (549)                  22.3 (484)
  First MM entry                                     26.6                        26.9
  Interview                                          31.9                        32.5
Incarcerated >30 days prior to FNU (%)               25.1                        25.6
No. of mos. incarcerated prior to FNU (%)
  None                                               75.0                        74.5
  1–12                                               17.4                        18.0
  13–24                                              5.2                         5.4
  25 or more                                         2.4                         2.1
No. of incarcerations prior to FNU (c) (%)
  None                                               66.7                        65.6
  1–5                                                28.3                        29.5
  6 or more                                          5.0                         4.9

a  Measured by self-reported problematic relationships with parents; a higher value indicates more serious problems (range 1–6).
b  The values in parentheses are the number of cases for mean computation after exclusion of missing cases. When not specified, the entire sample was used.
c  Includes incarcerations <30 days.

Because abstinence is a traditional goal for social intervention and because
addictive use is highly associated with property crime, these two conditions,
NNU and DNU, were chosen as major indicators of level of drug use. The
value of each of these variables was the percentage of time engaged in the
activity (or the percentage of time in the status) aggregated up among the
627 subjects during 99 successive two-month periods starting at first nar-
cotics use. Variables were measured in terms of the percentage of time to
quantify the amount of each behavior or time-in-status rather than simply
noting whether or not it occurred. In addition, the mean age of the group
at each two-month period was included as a control variable.
The time-series plots of the five outcome variables are given in
Figure 26.1.

Methodology

The above-stated research objectives called for a multivariate time-series,
or “systems” analysis of the dynamic relationships among narcotics use,
criminal behavior, and intervention programs, while controlling for age.
Using aggregate data allows us to distinguish between program or policy
response effects (such as the impact of methadone treatment on narcot-
ics use) and policy feedback effects (such as the presence of narcotics
use leading to methadone treatment or legal supervision). In particular,
the approach allows an examination of the existence versus lack of long-
term and short-term policy response and policy feedback effects within
this system. The following is a step-by-step description of the analytic
procedure, which is graphically depicted in Figure 26.2.

Overview of Analytic Procedure

To explain the difference between long-term and short-term effects, let us
focus on the hypothesized relationship between methadone maintenance
treatment (MM) and narcotics abuse or, in this case, daily use of narcotics
(DNU). From a time-series perspective, the first question to be answered
is whether the observed levels of DNU and MM are stationary or non-
stationary. The distinction between the two terms can be explained as
follows:
Assume that the over-time behavior of a series $\{Z_t\}$ representing a
variable such as MM or DNU can be modeled as a simple stochastic time
series process

$Z_t = c + \phi Z_{t-1} + a_t$

[Figure 26.1  Sample data. Time-series plots over the 99 two-month periods of the percentage of time in daily narcotics use, no narcotics use, methadone maintenance, legal supervision, and property crime.]

[Figure 26.2  Analytical procedures. Three-stage flowchart: Stage I (examination of unit roots) asks whether the variables contain long-term components (test: unit-root test). If yes, Stage II (assessment of long-term equilibrium) asks whether the variables are cointegrated (test: equilibrium regression). Stage III (assessment of short-term dynamics) then applies an error-correction model if the variables are cointegrated (long-term effect: yes; short-term effect: maybe), a model in changes if they are non-stationary but not cointegrated (long-term effect: no; short-term effect: yes), or a model in levels if they are stationary (long-term effect: cannot be inferred; short-term effect: yes).]

or, using lag operator notation,

$(1 - \phi L)\, Z_t = c + a_t$  (26.1)

where:
$\phi$ is the parameter relating the present to the past of $Z$,
$L$ is the lag operator such that $L^k Z_t = Z_{t-k}$ with $k$ being a positive integer,
$Z_t$ is a random variable measured at time $t$ with $t = 1, 2, \ldots, T$,
$c$ is a constant, and
$a_t$ is a white noise random shock at time $t$, which is assumed to have a
normal distribution with mean 0 and constant variance $\sigma_a^2$.
When $|\phi| < 1$ holds for this model, the series $\{Z_t\}$ is said to be
stationary, having finite mean $E(Z_t) = c/(1-\phi)$ and variance $\mathrm{Var}(Z_t) =
\sigma_a^2/(1-\phi^2)$. In this case, all observed fluctuations in $\{Z_t\}$ are temporary
in the sense that the series does not systematically depart from its mean
value, but rather reverts to it. On the other hand, if $|\phi| = 1$, the series
is said to be a non-stationary, or evolutionary, series (a random walk,
in this case) whose mean and variance are functions of time $t$. For this
condition, the observed fluctuations are permanent in the sense that
the series wanders freely without any mean reversion. If $|\phi| > 1$, the
series explodes toward $+\infty$ or $-\infty$, which is also non-stationary. For
the above model, determining whether the series is stationary or not
is equivalent to testing whether the root of the characteristic equation,
$1 - \phi L = 0$, is greater than one. When $|\phi| < 1$, we conclude that the series
is stationary.
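
The distinction is easy to see in a short simulation of equation (26.1); this sketch is purely illustrative and is not part of the original analysis:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ar1(phi, c=0.0, T=99):
    """Simulate Z_t = c + phi * Z_{t-1} + a_t with standard normal shocks."""
    z = np.zeros(T)
    for t in range(1, T):
        z[t] = c + phi * z[t - 1] + rng.normal()
    return z

stationary = simulate_ar1(phi=0.8)   # |phi| < 1: shocks die out, mean reversion
random_walk = simulate_ar1(phi=1.0)  # |phi| = 1: shocks accumulate permanently
```
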
If MM and DNU are stationary, this implies that no long-term change
in these variables is observed over the observation period. Thus, if MM
has an effect at all on DNU, then the effect must be transitory, or short-
term, since the level of DNU will eventually return to its mean. Under
these conditions, we would argue that methadone treatment has only
temporary effects on narcotics use. On the other hand, if MM and DNU
are non-stationary, then we may investigate whether the observed random
walk, or stochastic trend, in DNU can be explained by the stochastic
trend in MM. For example, can a gradual decrease in DNU be explained
by a gradual increase in MM? A positive answer would imply that there
is a long-term, or equilibrium, relationship between the two. A negative
answer still does not rule out the effectiveness of methadone maintenance,
but it would imply that the treatment produces only temporary deviations
in the level of narcotics use. Finally, it is possible that a mixed scenario
occurs, such as the presence of a stochastic trend in narcotics abuse, but
not in methadone treatment. If the change in narcotics use could be related
to the level of methadone treatment, that would imply an even stronger

MIZIK_9781784716745_t.indd 527 14/02/2018 16:38


528   Handbook of marketing analytics

long-term effectiveness of treatment. For example, a gradual decrease in


narcotics abuse could be related to the steady maintenance of treatment at
a certain level. This same type of development applies to legal supervision
when we evaluate its effect on the dependent variables.

Testing the presence of unit roots (Stage I)


In order to disentangle the various scenarios mentioned above, we start by
performing a univariate analysis of the data, i.e., we examine the pattern
over time of each of the five variables separately. We investigate whether a
variable such as narcotics use behaves as a stationary (i.e., mean-reverting
process) or as a non-stationary (e.g., random-walk) process. We identify
the variable’s behavior by applying the well-known Box-Jenkins method
for univariate ARIMA modeling to each series, with particular attention
to the existence of unit roots, or non-stationary components, in the data
(e.g., Dickey and Fuller 1979). The general integrated autoregressive
moving average, or ARIMA (p, d, q), model is defined as

$\Phi(L)\, \Delta^d Z_t = c + \Theta(L)\, a_t$  (26.2)

where

$\Phi(L) = 1 - \phi_1 L - \cdots - \phi_p L^p$,  (26.3)

$\Theta(L) = 1 - \theta_1 L - \cdots - \theta_q L^q$  (26.4)

are polynomials in the lag operator $L$ for autoregressive parameters and
moving average parameters, respectively, and $\Delta^d = (1 - L)^d$ is the difference
operator. Note that if we specify $p = 1$ and $d = q = 0$ for equation (26.2),
the resulting ARIMA(1, 0, 0) model is equivalent to equation (26.1).
If the data are generated by an ARIMA model with d = 0, they are
stationary; then all movements in the data should be interpreted as
temporary deviations from a fixed mean, which would limit our ability to
derive long-term inferences from the results. In this case, only short-term
relationships can be assessed. If, on the other hand, one or more unit roots
are found (i.e., d ≥ 1), then we may investigate whether these nonstation-
ary components, or stochastic trends, are related to each other.
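
In current software, the Stage I Dickey-Fuller test is a single call. A minimal sketch, using a simulated random walk as a stand-in for one of the five outcome series:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Stand-in for one outcome series (99 bi-monthly values): a simulated random
# walk, so the example runs on its own.
rng = np.random.default_rng(4)
dnu = np.cumsum(rng.normal(size=99))

stat, pvalue, *rest = adfuller(dnu, regression="c", autolag="AIC")
# A large p-value means the unit-root null cannot be rejected: the series
# behaves as non-stationary and the analysis proceeds to Stage II.
print(f"ADF statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```
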

Assessment of long-term equilibrium (Stage II)


The analysis of non-stationary components is accomplished by specifying
the “equilibrium regression” proposed by Engle and Granger (1987). An
equilibrium regression, for example between methadone treatment and
narcotics use, would establish that the two time series representing these
variables are related to each other in the long run. In theory, if the equilib-
rium relationship holds between MM and DNU, then they relate to each
other under the linear constraint

$DNU_t - \beta MM_t = c$  (26.5)

where $\beta$ is a constant. Suppose $\beta < 0$; then, if the level of MM increases,
DNU must eventually decrease in order to maintain the equilibrium. On
the other hand, with $\beta > 0$, if DNU is on the rise, the amount of treatment
will eventually increase. In reality, the linear constraint (26.5) may not
exactly hold in each time period. The difference between the observed level
of, say, DNU, and its equilibrium level given the observed level of MM, is
called the equilibrium error. It may be estimated by calculating the residu-
als from an equilibrium regression, for example,

$DNU_t = c + \beta MM_t + e_t$  (26.6)

where $\beta$ is called the cointegrating constant. The existence of a long-term
relationship implies that the equilibrium errors $e_t$ do not have permanent
components in them, i.e., $e_t$ is a stationary time series even though $DNU_t$
and $MM_t$ are not. Indeed, if $e_t$ were nonstationary, then there would be no
mechanism for tying DNU and MM together in the long run.
The statistical test determining an equilibrium relationship amounts
to estimating the hypothesized equilibrium regressions by ordinary least
squares (Stock 1987) and verifying that the residuals of these regressions
have only transitory components, i.e., unit roots are not present in the
residual series. This regression interpretation is unusual and innovative
in the sense that we are not testing for the usual condition of uncor-
related residuals over time. Instead, we verify that the non-stationary
movement in one variable removes the non-stationary fluctuations in
another variable, such that only transitory (though possibly autocor-
related) components are left in the residuals. Such a condition is called
“cointegration.”
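
A sketch of this two-step (Engle-Granger) logic on simulated series that are cointegrated by construction follows; in practice the residual test statistic must be compared against cointegration critical values (e.g., Engle and Yoo 1987) rather than the standard Dickey-Fuller tables:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

# Simulated stand-ins: mm is a random walk, and dnu shares its stochastic
# trend, so the pair is cointegrated by construction.
rng = np.random.default_rng(5)
mm = np.cumsum(rng.normal(size=99))
dnu = 2.0 - 0.5 * mm + rng.normal(scale=0.3, size=99)

eq_reg = sm.OLS(dnu, sm.add_constant(mm)).fit()  # equilibrium regression (26.6)
resid = eq_reg.resid                             # estimated equilibrium errors

stat = adfuller(resid, regression="n", autolag="AIC")[0]
print(f"cointegrating constant = {eq_reg.params[1]:.3f}, residual ADF = {stat:.3f}")
```
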

Assessment of short-term dynamics (Stage III)


Next, we proceed to modeling the short-term dynamic relationships in the
system while controlling for long-term effects where applicable. Depending
on the outcomes from Stage I (the presence/ absence of unit roots in each
univariate time series) and from Stage II (the existence/nonexistence of
cointegration among the variables), the analytical procedure for assessing
short-term dynamics will take one of the following three approaches:

1. an error-correction model for cointegrated variables,
2. a model in changes for non-stationary but non-cointegrated variables,
3. a model in levels for stationary variables.

Each of the three approaches is described below.


For the purpose of illustration, we will concentrate on the relationship
between MM and DNU and assume that MM and DNU are an input and
an output series, respectively.

Non-stationary system with cointegration


If cointegration has been established between MM and DNU, then the long-
term relationship between the variables must be incorporated in their short-
term behavior. Engle and Granger (1987) have shown that the existence of
an equilibrium relationship implies that the data are generated according to
a special partial adjustment, or error-correction mechanism. For example,
observed changes in narcotics-use levels could be explained not only by
lagged changes in narcotics use and by changes in methadone treatment, but
also by the “equilibrium error” in the previous period. The equilibrium error
is the amount of excessive, or insufficient, narcotics use given the observed
level of methadone treatment. A fraction of this error is corrected in the
subsequent period so that the system partially adjusts toward equilibrium.
The error-correction model for MM and DNU is expressed as

$\Delta DNU_t = c_0 + \tau \hat{e}_{t-1} + \omega(L)\, \Delta MM_t + \sigma(L)\, \Delta DNU_{t-1} + u_t$  (26.7)

where $\hat{e}_{t-1}$ is the estimate of the equilibrium error correction term obtained
from equation (26.6), and $\omega(L)$ and $\sigma(L)$ are parameter polynomials in $L$:

$\omega(L) = \omega_0 + \omega_1 L + \cdots + \omega_r L^r$  (26.8)

$\sigma(L) = 1 - \sigma_1 L - \cdots - \sigma_s L^s$.  (26.9)

The contemporaneous and lagged effects of MM are measured by the
terms of $\omega(L)$. Any additional autocorrelation in DNU is captured by the
terms of $\sigma(L)$ so that the error term $u_t$ is a white noise series. The error-
correction model posits that, in each period, the dependent variable will
adjust itself partially (by a factor $\tau$) toward the equilibrium level.
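
A single-equation sketch of (26.7), estimated by OLS with $r = 0$ and $s = 1$: the simulated series from the cointegration sketch are re-created so the block runs on its own, and this is an illustration rather than the chapter's full-system estimator.

```python
import numpy as np
import statsmodels.api as sm

# Re-create the simulated cointegrated pair and its equilibrium errors.
rng = np.random.default_rng(5)
mm = np.cumsum(rng.normal(size=99))
dnu = 2.0 - 0.5 * mm + rng.normal(scale=0.3, size=99)
resid = sm.OLS(dnu, sm.add_constant(mm)).fit().resid

d_dnu = np.diff(dnu)  # change in DNU
d_mm = np.diff(mm)    # change in MM

y = d_dnu[1:]
X = sm.add_constant(np.column_stack([
    resid[1:-1],   # lagged equilibrium error, e-hat_{t-1}
    d_mm[1:],      # contemporaneous change in MM (r = 0)
    d_dnu[:-1],    # lagged change in DNU (s = 1)
]))
ecm = sm.OLS(y, X).fit()
print(f"adjustment speed tau = {ecm.params[1]:.3f}")
```
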

Non-stationary system without cointegration


If the data are non-stationary but not cointegrated, we first perform a
simple transformation to stationarity (differencing) and then develop a
model on these differences. For example, we may empirically investigate
the effect of a change in methadone treatment level on a change in narcot-
ics use using the model,

$\Delta DNU_t = c_0 + \omega(L)\, \Delta MM_t + \sigma(L)\, \Delta DNU_{t-1} + u_t$.  (26.10)

The results would reveal the short-term dynamics of the system, but
they would not explain the long-term behavior of the variables. Notice
that equation (26.10) is a restricted form of equation (26.7), where the
error correction term is absent.

Stationary system
Finally, if the data are stationary, we develop a model on the levels of
narcotics use and methadone treatment,

$DNU_t = c_0 + \omega(L)\, MM_t + \sigma(L)\, DNU_{t-1} + u_t$  (26.11)

and the results are, again, interpreted as short-term dynamics.

Parameter Estimation Methods for Short-term Dynamics

Parameter estimation for short-term relationships can be carried out
either by using separate distributed-lag models or by developing a system
of equations in vector-autoregressive (VAR) form. In the first case,
we make an a priori distinction between input (exogenous) and output
(endogenous) variables; in the second case, this distinction is not neces-
sary. Equations (26.7), (26.10), and (26.11) are examples of distributed-lag
models, and these models were used to illustrate the underlying concepts
of cointegration and error-correction mechanisms. For the present analy-
sis, however, it would be inappropriate to develop a set of distributed-lag
structural models of narcotics use, crime, and intervention variables,
because such a system would likely be under-identified due to a lack of
specified exogenous variables. Indeed, the present database contains five
possibly jointly endogenous variables (no narcotics use, daily narcotics
use, property crime, methadone maintenance treatment, and legal super-
vision) and only one strictly exogenous variable (age). Therefore, the
vector-autoregressive (VAR) approach advocated by Sims (1980) is more
suitable. For k times series {Z1t, . . ., Zkt}, the VAR(J) model is defined as

Zt = c| + a i Zt-i + |
J
| | |
a t (26.12)
i51
where
|
Zt = a (k x 1) random vector observed at time t for t = 1, 2, . . ., T,

MIZIK_9781784716745_t.indd 531 14/02/2018 16:38


532   Handbook of marketing analytics

|
c = a (k x 1) vector of constants,
|
i = a (k x k) parameter matrix, and
| | |
at = a (k x 1) white-noise vector assumed to be i.i.d. N(O, S).

The dynamics of the VAR (J) model are specified as follows: the jth
sample partial autoregression matrix P(j) can be obtained from
5 5
P (j) = j with j = 1, 2, . . ., J,

when a VAR(J) is fitted by generalized least squares. If a VAR(p) model


| 5 |
holds for Zt, then for j > p, P (j) = O. I , and therefore the corresponding
5
matrix of estimates P (j) is expected to have all elements near zero. The well-
known Akaike Information Criterion is used to establish the maximum
needed value of j (e.g., Priestley 1981, p.372). The VAR approach focuses on
the lagged structures in the data, both within and across time series, leaving
any contemporaneous effects directionally unspecified. However, the cov-
ariance matrix of the residuals of a VAR model contains information that
may be interpreted as contemporaneous effects among the variables.
In summary, the analytic plan of this study is as follows: First, we
develop univariate ARIMA models for each of the five variables in the
system. If unit roots are not found, then a simple VAR model on the
levels in the data would conclude the analysis. If unit roots are found,
we perform an equilibrium regression test to establish the presence of
long-term relationships in the system. If the data pass the test, the model
combining long-term and short-term effects would be a VAR system on
the differences, augmented by the equilibrium error term. If the data do
not pass, then a simple VAR model on the differences in the data will be
used to estimate short-term dynamics.

Results

Univariate ARIMA Models

The Box-Jenkins modeling approach was applied to each of the five outcome
variables. Dickey-Fuller unit roots tests were carried out to statistically
examine the existence of unit roots in each of the five variables. The resulting
five univariate ARIMA models indicated that a unit root is present in all the
variables, and the outcomes of the Dickey-Fuller tests were consistent with
these results. Because a unit root was present in each of the five outcome
variables, as well as in the control variable AGE, the next step is to test the
long-term relationships among the variables using equilibrium regressions.


Equilibrium Regressions

Table 26.2 summarizes the results of equilibrium regressions for the five
outcome variables. The unit-root tests performed on the error terms of
these five equilibrium regressions confirmed that all residuals were station-
ary, indicating the presence of long-term associations among the depend-
ent variables. The R² values for each of the five regressions show that significant
amounts of variance, ranging from 88 percent to 97 percent, are explained
by the models. Examining the coefficients of the equilibrium regressions
provides the following results. Long-term movements of narcotics use and
property crime go hand-in-hand. As the crime level rises, abstinence from
narcotics use eventually decreases, and daily use increases. Furthermore,
increased crime is associated with lower methadone maintenance involve-
ment and higher legal supervision. Reciprocally, narcotics use has a positive
long-term association with crime involvement. In terms of social interven-
tion effects, methadone maintenance has a significant long-term association
with no narcotics use and property crime, indicating its desirable effects.
Addict involvement in either methadone maintenance or legal supervision
increases the likelihood of involvement in the other. Finally, contrary to our
expectation, legal supervision shows a positive long-term association with
narcotics abuse and crime involvement; that is, as legal status persists, so do
narcotics use and property crime. Some possible justification and explana-
tion for this last finding will be presented in the discussion section.
Overall, the five outcome variables form a cointegrated system. While
each variable individually may move up or down over time without mean
reversion, there exists a dynamic equilibrium state toward which all other
variables will adjust. Therefore, an error-correction model can be used to
examine the short-term relationships within the system in conjunction with
partial adjustment for the long-term behavior of the variables.

Combining Short-term and Long-term Dynamics

The procedure advanced by Tiao and Box (1981) was used to estimate
a VAR model augmented with equilibrium error-correction terms. In
order to determine how many lags were needed for developing a model,
the pattern of the partial autoregression matrices was examined. Based
on the Akaike Information Criterion, specifying one lag was found to be
sufficient to represent the short-term dynamics in the system.
The error-correction equations for the five outcome variables were
estimated simultaneously. The generalized least-squares parameter esti-
mates and the residual correlation matrix are given in Table 26.3. The
error-correction terms in the five equations were all significant at p < 0.05

Table 26.2  Equilibrium regressions

            NNU               DNU               C                 MM                LS
Const.      14.874 (4.404)*   −20.127 (5.965)   14.930 (1.196)    36.258 (5.560)    −14.383 (3.600)
DNU         –**               –                 0.213 (0.023)a    0.062 (0.109)     0.120 (0.062)c
C           −0.656 (0.178)a   2.236 (0.242)a    –                 −2.412 (0.252)a   0.925 (0.183)a
MM          0.196 (0.072)a    0.055 (0.097)     −0.204 (0.021)a   –                 0.509 (0.029)a
LS          0.057 (0.121)     0.317 (0.164)c    0.232 (0.046)a    1.509 (0.085)a    –
AGE         0.746 (0.091)a    0.075 (0.123)     −0.085 (0.037)b   0.066 (0.131)     0.074 (0.076)
R²***       0.958             0.927             0.972             0.969             0.880
F(4,94)     506.455a          298.034a          810.362a          738.956a          171.750a
t****       −3.763            −4.649            −5.245            −5.56             −5.069
Unit Root?  No                No                No                No                No

a  Significant at p < 0.01.
b  Significant at p < 0.05.
c  Significant at p < 0.10.
*  The standard error of the estimate is included in parentheses.
**  The sign '–' indicates that the row variable is assumed to have no effect on the dependent variable in the corresponding column.
***  Results are based on the individual regressions.
****  Dickey-Fuller unit-root test for cointegration on the residuals, with critical values obtained from Engle and Yoo (1987).

Table 26.3  Error correction models on first differences

Parameter Estimates of the Lagged Structure

            ∆NNU              ∆DNU              ∆C                ∆MM               ∆LS
Lag 1
∆NNU        0.112 (0.095)*    –**               –                 –                 –
∆DNU        –                 0.438 (0.104)a    0.200 (0.052)a    −0.124 (0.085)    0.096 (0.071)
∆C          0.243 (0.138)c    −0.372 (0.246)    −0.151 (0.119)    −0.075 (0.187)    −0.161 (0.146)
∆MM         −0.050 (0.093)    0.144 (0.146)     0.019 (0.069)     0.226 (0.110)b    0.134 (0.087)
∆LS         0.129 (0.109)     −0.018 (0.172)    0.001 (0.076)     0.079 (0.128)     −0.008 (0.101)
∆AGE        0.208 (1.932)     2.630 (3.048)     −0.033 (1.341)    2.635 (2.224)     1.545 (1.783)
∆EQ Error   −0.178 (0.053)a   −0.143 (0.058)b   −0.347 (0.110)a   −0.126 (0.049)b   −0.273 (0.072)a
R²***       0.167             0.126             0.272             0.136             0.212
F(6,90)     2.995a            2.163c            5.606a            2.367b            4.046a

Residual Correlations****

            ∆NNU      ∆DNU      ∆C        ∆MM       ∆LS
∆NNU        1
∆DNU        −0.594    1
∆C          −0.372    0.519     1
∆MM         0.376     −0.471    −0.206    1
∆LS         −0.134    0.087     0.198     −0.109    1

a  Significant at p < 0.01.
b  Significant at p < 0.05.
c  Significant at p < 0.10.
*  The standard error of the estimate is included in parentheses.
**  The sign '–' indicates the row variable is assumed to have no effect on the dependent variable in the corresponding column.
***  Results are based on the individual transfer functions.
****  The approximate standard error for the estimated correlations is 0.10.

or better. On the other hand, only a few parameter estimates for the short-
term effects were significant (4 out of 25 estimates, one of which was only
marginally significant). These significant estimates reflect the persistence
of narcotics abuse over time and the contribution of narcotics-use behavior
to subsequent crime involvement. However, it should be emphasized that
the observed changes in the five outcome variables were explained mainly
by the error-correction terms, i.e., partial adjustments toward equilibrium.

Discussion and Conclusions

A conceptual framework using the systems approach and the analytical
techniques applied here has successfully characterized the dynamic inter-
play among narcotics-using behaviors, criminal involvement, and social
interventions over time. The techniques have allowed the disentanglement
of long-term, short-term, and contemporaneous effects of these variables on
each other. In the following discussion, we compare the results found in the
present study, which are pertinent to Southern California heroin addicts,
with our understanding of how current interventions operate in reality.

Narcotics Use and Property Crime: Reciprocal Dynamics

In the present study, a major focus was the assessment of the dynamic
equilibrium relationship between narcotics use and property crime within
the larger social context. The results demonstrate that, at least at the
group aggregate level, there is an interlocked reciprocal response between
the two behaviors that persists over time. Criminal activity contributes to
long-term narcotics use, while, at the same time, narcotics use increases
long-term property crime. This implies that addicts develop a special life-
style commitment from their long-term involvement in both narcotics use
and criminal activities. When the long-term component is partialed out,
current changes in the crime level are driven by the changes in narcotics
use in the immediately previous period, but not vice versa. The contempo-
raneous relationship (where causal direction cannot be statistically speci-
fied) is strong, as has been shown in most of the previous research.

Methadone Maintenance Treatment: Its Impact and Its Role as an Outcome Measure

Addicts qualify themselves for admission to methadone maintenance
treatment because of problems associated with narcotics dependence.
Admission may also be coerced by referral from the legal system. The
present study confirms previous evaluation studies showing that metha-
done maintenance has significant long-term effects in reducing narcotics
use and related crime. However, no short-term effectiveness of methadone
maintenance was observed.
In addition to individual needs motivating treatment entry, treatment
retention at a group level may depend on program policy and legal pres-
sure. Because treatment results in positive effects, retaining clients over
suitable periods has been one of the goals of treatment or has been con-
sidered itself as an outcome measure. The social benefits demonstrated
by methadone maintenance cannot be maximally obtained without
further commitment of resources to increasing treatment availability.
This study also suggests that legal supervision may increase long-term
methadone maintenance involvement, both in motivating entry and
in prolonging retention. However, the negative long-term association
between property crime and methadone maintenance indicates that
narcotics abusers who are heavily involved in criminal activity tend
to resist methadone maintenance treatment. Therefore, more coercive
intervention efforts may be necessary to first bring them into treat-
ment and then to retain them for a sufficiently long period in order to
maximize social benefits.

Legal Supervision: Maximizing Effectiveness

In contrast to methadone maintenance, legal supervision operates solely
in a mandatory, or imposed, manner. The period of legal sanction is
determined by the detected levels of deviant behaviors such as drug use or
crime. Continued offenses or violations of probation or parole may result
in prolonged or recurrent sentences. The positive relationship observed
in the equilibrium regressions between narcotics-related behaviors and
legal  supervision reflects the response of the legal system to continued
antisocial acts. As for the impact of legal supervision on narcotics use
and property crime, no direct effect was demonstrated in the five-variable
system. Only indirect effects, mostly through methadone maintenance
treatment, were observed. Overall, the above results imply that effec-
tive legal supervision can occur only in conjunction with methadone
maintenance treatment.
The present study has demonstrated that the techniques based on
unit-root testing, cointegration, and error-correction modeling can be
applied for evaluating the public policy in addressing the societal problems
associated with drug abuse. The effects of methadone treatment and legal
supervision on narcotics use and criminal activities were assessed, and
the results provided strong evidence of the effectiveness of methadone
maintenance treatment, particularly in the long term. The findings on the
effectiveness of methadone maintenance combined with the importance
of legal coercion in forcing individuals into treatment suggest that
compulsory treatment should be considered for chronic narcotic addicts
convicted of crimes.
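To make the sequence of tests concrete, the following is a minimal sketch of the unit-root/cointegration/error-correction workflow in Python's statsmodels. The two series are simulated stand-ins for the study's narcotics-use and treatment-involvement measures, and all variable names are hypothetical.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller, coint

# Simulated monthly series sharing one stochastic trend (placeholders only).
rng = np.random.default_rng(0)
n = 200
trend = np.cumsum(rng.normal(size=n))            # common random-walk component
methadone = trend + rng.normal(size=n)           # treatment involvement
narcotics_use = 2.0 - 0.5 * trend + rng.normal(size=n)

# Step 1: augmented Dickey-Fuller unit-root test on each series.
for name, y in [("methadone", methadone), ("narcotics_use", narcotics_use)]:
    stat, pval, *_ = adfuller(y)
    print(f"ADF {name}: stat={stat:.2f}, p={pval:.3f}")

# Step 2: Engle-Granger cointegration test and the equilibrium regression.
stat, pval, _ = coint(narcotics_use, methadone)
print(f"Engle-Granger: stat={stat:.2f}, p={pval:.3f}")
eq = sm.OLS(narcotics_use, sm.add_constant(methadone)).fit()
ect = eq.resid                    # deviation from the long-run equilibrium

# Step 3: error-correction model in first differences.
dy, dx = np.diff(narcotics_use), np.diff(methadone)
ecm = sm.OLS(dy, sm.add_constant(np.column_stack([dx, ect[:-1]]))).fit()
print(ecm.params)                 # [const, short-run effect, adjustment speed]
```

The last coefficient, on the lagged equilibrium error, is the speed-of-adjustment parameter; a significantly negative estimate indicates a stable long-run relationship of the kind the study reports for methadone maintenance.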

References
Anglin, M. Douglas and Yih-Ing Hser (1990), “Treatment of Drug Abuse,”  Crime and
Justice, 13, 393–460.
Anglin, M. Douglas and William H. McGlothlin (1984), “Outcome of Narcotic Addict
Treatment in California,”  Drug Abuse Treatment Evaluation: Strategies, Progress, and
Prospects, National Institute on Drug Abuse Research Monograph, 51, 106–128.
Collins, James J. and Margaret Allison (1983), “Legal Coercion and Retention in Drug
Abuse Treatment,” Psychiatric Services, 34(12), 1145–1149.
Dickey, David A., William R. Bell, and Robert B. Miller (1986), “Unit Roots in Time Series
Models: Tests and Implications,” American Statistician, 40(1), 12–26.
Dickey, David A. and Wayne A. Fuller (1979), “Distribution of the Estimators for
Autoregressive Time Series with a Unit Root,”  Journal of the American Statistical
Association, 74, 427–431.
Engle, Robert F. and Clive W. J. Granger (1987), “Co-integration and Error Correction:
Representation, Estimation, and Testing,”  Econometrica: Journal of the Econometric
Society, 55, 251–276.
Hser, Yih-Ing, M. Douglas Anglin, and Chih-Ping Chou (1988), “Evaluation of Drug Abuse
Treatment: A Repeated Measures Design Assessing Methadone Maintenance,” Evaluation
Review, 12(5), 547–570.
Powers, Keiko I. (1990), “A Multivariate Time Series Analysis of the Long- and Short-term
Effects of Treatment and Legal Interventions on Narcotics Use and Property Crime.”
Ph.D. diss., University of California at Los Angeles.
Powers, Keiko I., Dominique M. Hanssens, Yih-Ing Hser, and M. Douglas Anglin (1991),
“Measuring the Long-Term Effects of Public Policy: The Case of Narcotics Use and
Property Crime,” Management Science, 37, 627–644.
Powers, Keiko I., Dominique M. Hanssens, Yih-Ing Hser and M. Douglas Anglin (1993),
“Policy Analysis with a Long-Term Time Series Model: Controlling Narcotics Use and
Property Crime,” Mathematical and Computer Modelling, 17(2), 89–107.
Priestley, Maurice Bertram (1981), Spectral Analysis and Time Series. London: Academic
Press.
Simpson, D. Dwayne and H. Jed Friend (1988), “Legal Status and Long-Term
Outcomes for Addicts in the DARP Follow-up Project,” In C. G. Leukefeld and F. M.
Tims (eds), Compulsory Treatment of Drug Abuse: Research and Clinical Practice, 86,
81–98.
Sims, Christopher A. (1980), “Macroeconomics and Reality,” Econometrica: Journal of the
Econometric Society, 48, 1–48.
Speckart, George and M. Douglas Anglin (1986), “Narcotics Use and Crime: An Overview
of Recent Research Advances,” Contemporary Drug Problems, 13, 741–769.
Stock, James H. (1987), “Asymptotic Properties of Least Squares Estimators of Cointegrating
Vectors,” Econometrica: Journal of the Econometric Society, 55, 1035–1056.
Tiao, George C. and George E. P. Box (1981), “Modeling Multiple Time Series with
Applications,” Journal of the American Statistical Association, 76, 802–816.



27.  Applying structural models in a public
policy context
Paulo Albuquerque and Bart J. Bronnenberg

Structural models have been used in marketing to evaluate public policy measures (e.g., Tuchman, 2015), firm investments (Goettler and Gordon, 2011), advertising policies (Dubé, Hitsch, and Manchanda, 2005), and dynamic sales force compensation plans (Misra and Nair, 2011).1
Structural models represent consumption and policy data as outcomes of
decisions by economic agents. The idea is that the underlying primitives
of these decisions are policy invariant and can be used to evaluate how
agents react to a counterfactual marketplace in which the conditions of
purchase have changed. Structural models thus aid in making forecasts of
how demand and supply systems adjust to changes in the state variables.
In this chapter, we describe the structural approach presented in
Albuquerque and Bronnenberg (2012, AB henceforth) to measure the
impact of an influential governmental action in the automobile sector –
the Car Allowance Rebate System (CARS, also known as “Cash for
Clunkers” program) – on car sales and prices. CARS was a stimulus
program introduced in July of 2009 by the US government to counteract
the effects of the economic crisis on the auto industry, providing $3,500 or
$4,500 to a consumer who traded an old car for a new one.

Model

A structural model designed to test the implementation of a public policy starts by describing how agents make decisions. In the case of
the auto industry, AB model consumer purchasing decisions, dealer
and manufacturer choices of price, and manufacturer choices of deal-
ership network size. With parameter estimates in hand, the approach
simulates the impact of price changes resulting from the public policy
discount implemented by the stimulus program. While the objective of
the CARS program was to increase demand by subsidizing consumer
purchases, providing rebates also gives firms the incentive to increase
prices to the final consumer and absorb part of the subsidy. In this
setting, simulating both consumer and firm behavior using a structural model helps predict the pass-through of the subsidy and the net outcome

Consumers

AB assume that there are I consumers in the market. Each individual i chooses to either purchase a car j or to use a different means of transportation (the outside good). The indirect utility for consumer i of purchasing car j is given by

$$U_{ijt} = \lambda_i X_{jt} + \beta_i P_{jt} + \xi_{jt} + \varepsilon_{ijt} = V_{ijt} + \varepsilon_{ijt}. \quad (27.1)$$

Each purchase option j is further decomposed as a vehicle of brand b, type m (e.g., midsize sedan, compact SUV), sold at dealer d. The term $X_{jt}$ includes observed characteristics, such as engine size, transmission type, and distance between the individual and dealer locations, while $P_{jt}$ represents the price for alternative j at time t. The term $\xi_{jt}$ captures the impact of car attributes unobserved to the researcher but considered by consumers and supply agents. These unobserved shocks are necessary because, if positively correlated with prices, they create endogeneity bias in the parameter estimates when left unaccounted for. Individual consumers are likely to react differently to attributes and price changes, and AB account for these individual differences by allowing the price response $\beta_i$ and the response to other car characteristics $\lambda_i$ to follow known distributions (e.g., income distributions at the zip code level).
In categories where differentiated products form groups of similar
alternatives, it is important to allow for stronger correlation in the utility
for alike products. In this case, a natural approach is to use a nested logit
formulation, with nests based on car type and brand in the unobservable
term $\varepsilon_{ijt}$ (Cardell, 1997; Richards, 2007). The resulting probability of
household i choosing alternative j – a car of type m and brand b – takes
the form of

$$\Pr_i(j) = \Pr_i(j \mid b(m)) \times \Pr_i(b(m) \mid m) \times \Pr_i(m), \quad (27.2)$$

where $\Pr_i(m)$ is the probability of choosing the car type m or the outside good; the term $\Pr_i(b(m) \mid m)$ captures the probability of choosing brand b, given the choice of car type m; and finally $\Pr_i(j \mid b(m))$ is the probability of buying alternative j, given the choice of brand b and type m. Each probability is given by

$$\Pr_i(j \mid b(m)) = \frac{\exp\!\left(\frac{1}{(1-\sigma_B)(1-\sigma_M)}\, V_{ij}\right)}{\sum_{j' \in b(m)} \exp\!\left(\frac{1}{(1-\sigma_B)(1-\sigma_M)}\, V_{ij'}\right)}, \quad (27.3)$$

$$\Pr_i(b(m) \mid m) = \frac{\exp\!\left((1-\sigma_B)\, IV_{ib(m)}\right)}{\sum_{b' \in m} \exp\!\left((1-\sigma_B)\, IV_{ib'}\right)}, \quad \text{and} \quad (27.4)$$

$$\Pr_i(m) = \frac{\exp\!\left((1-\sigma_M)\, IV_{im}\right)}{1 + \sum_{m'} \exp\!\left((1-\sigma_M)\, IV_{im'}\right)}, \quad (27.5)$$

where $IV_{ib(m)}$ and $IV_{im}$ are the inclusive values of brand nest b and type m, equal to

$$IV_{ib(m)} = \ln \sum_{j \in b(m)} \exp\!\left(\frac{1}{(1-\sigma_B)(1-\sigma_M)}\, V_{ij}\right) \quad (27.6)$$

and

$$IV_{im} = \ln \sum_{b \in m} \exp\!\left((1-\sigma_B)\, IV_{ib}\right). \quad (27.7)$$
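For readers who want to see eqs. (27.2)–(27.7) operationalized, the following is a minimal Python sketch that computes the nested logit choice probabilities for one consumer. The grouping of alternatives and the parameter values are invented for illustration.

```python
import numpy as np

def nested_logit_probs(V, brand_of, type_of, sigma_B, sigma_M):
    """Two-level nested logit probabilities, eqs. (27.2)-(27.7).
    V: deterministic utilities V_ij; brand_of/type_of: nest labels per j.
    The outside good enters through the '1 +' term in eq. (27.5)."""
    scale = (1 - sigma_B) * (1 - sigma_M)
    # Inclusive value of each brand nest, eq. (27.6).
    terms = {}
    for j, v in enumerate(V):
        terms.setdefault((type_of[j], brand_of[j]), []).append(np.exp(v / scale))
    IV_b = {key: np.log(sum(t)) for key, t in terms.items()}
    # Inclusive value of each type nest, eq. (27.7).
    by_type = {}
    for (m, b), iv in IV_b.items():
        by_type.setdefault(m, []).append(np.exp((1 - sigma_B) * iv))
    IV_m = {m: np.log(sum(t)) for m, t in by_type.items()}

    denom = 1.0 + sum(np.exp((1 - sigma_M) * iv) for iv in IV_m.values())
    probs = np.empty_like(V, dtype=float)
    for j, v in enumerate(V):
        m, b = type_of[j], brand_of[j]
        pr_m = np.exp((1 - sigma_M) * IV_m[m]) / denom                   # eq. (27.5)
        pr_b = np.exp((1 - sigma_B) * IV_b[(m, b)]) / sum(
            np.exp((1 - sigma_B) * IV_b[k]) for k in IV_b if k[0] == m)  # eq. (27.4)
        pr_j = np.exp(v / scale) / np.exp(IV_b[(m, b)])                  # eq. (27.3)
        probs[j] = pr_j * pr_b * pr_m                                    # eq. (27.2)
    return probs

# Illustration: four cars in two types; the sigma values are arbitrary.
p = nested_logit_probs(np.array([1.0, 0.8, 0.5, 0.2]),
                       ["A", "A", "B", "C"], ["suv", "suv", "suv", "sedan"],
                       sigma_B=0.3, sigma_M=0.5)
print(p, "outside good:", 1 - p.sum())
```

As the sigma parameters approach zero the model collapses to a standard logit; larger values imply stronger correlation in utility within a nest.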

Manufacturers and Dealers

To predict managers' pricing decisions when faced with alternative demand conditions and the incentive provided by the CARS program,
AB model firm choices in response to consumer demand. Manufacturers
decide on the number of dealers in the market and then set wholesale
prices, while dealers take wholesale prices as given and choose final prices.
A subsidy from the government makes the final price to consumers lower
than the one chosen by dealers, like a discount.
Consider that the supply side has K manufacturers and D dealers.
Manufacturer k sets wholesale prices of all car alternatives in his portfolio,
for each period t (we do not include the time subscript in the following
equations for clarity), to maximize profit

$$\pi_k = \sum_{j \in k} (W_j - c_j) \cdot s_j \cdot I - f_k, \quad (27.8)$$

where $W_j$ is the wholesale price of alternative j and $c_j$ the manufacturer variable cost. The product of market share $s_j$ and market size I represents
the total number of vehicles sold for alternative j. The fixed costs incurred by the manufacturer are denoted by $f_k$.
In equation (27.8), the manufacturer costs $c_j$ are unobserved by researchers – firm costs are typically not part of available data sets – but can be estimated through the first-order profit-maximizing conditions of manufacturers. The fixed costs $f_k$ drop out of the estimation and are assumed to be uncorrelated with the pricing decisions.
Dealers take manufacturer prices as given and choose consumer prices,
obtaining the following profit function:

$$\pi_d = \sum_{j \in d} (P_j - W_j + \delta_j) \cdot s_j \cdot I - f_d. \quad (27.9)$$

The terms in parentheses give the unit margin for each car sold: the difference between consumer price $P_j$ and manufacturer price $W_j$, plus (or minus) any additional cash flows $\delta_j$ (such as car service net revenues) that need to be estimated. AB assume that the $\delta_j$ are fixed quantities set based on industry standards and not strategically decided by each retailer. The term $f_d$ represents the fixed costs of dealer d that drop out of the estimation.
To solve this problem, AB start with the dealer maximization problem
and define the first-order conditions (in vector form) as

$$P = W - \Delta - (\Theta_D \odot \Omega_p)^{-1} S. \quad (27.10)$$

The terms P and W are vectors of consumer and manufacturer prices respectively, while $\Delta$ captures additional cash flows of dealers, which the retailer considers when setting final prices. The term $\Delta$ is non-standard: AB observe both final and wholesale prices, which allows them to estimate $\Delta$. The term $\Theta_D$ is a dealer ownership matrix where $\Theta_D(j, j') = 1$ if alternatives j and j′ are sold by the same dealer and 0 otherwise. $\Omega_p$ is a matrix of derivatives of share with respect to final price, where the element $(j, j')$ of matrix $\Omega_p$ is $\partial s_{j'} / \partial p_j$. The symbol $\odot$ is used to represent element-by-element multiplication and S is a vector of market shares. Equation (27.10) defines the price charged by dealers as a function of manufacturer prices.
Turning to the manufacturer pricing decisions, manufacturers maxi-
mize profits and play a Bertrand-Nash pricing game that accounts for
dealer pricing strategies. The optimal manufacturer prices are given by the
first order conditions

$$W = C - (\Theta_K \odot \Omega_w)^{-1} S, \quad (27.11)$$

where C is a vector of manufacturer variable costs, S is a vector of market shares, and $\Theta_K$ is a manufacturer ownership matrix. In this matrix, $\Theta_K(j, j') = 1$ if alternatives j and j′ are sold by the same manufacturer and zero otherwise. The term $\Omega_w$ is a matrix of derivatives of share with respect to wholesale price, and a typical element $(j, j')$ of the matrix $\Omega_w$ is $\partial s_{j'} / \partial w_j$. This derivative can be obtained numerically once the demand-side parameters have been estimated (for a similar approach, see Villas-Boas 2007, 633–634).

Data and Estimation

To empirically apply their model and test the impact of the public policy,
AB use data on a large number of individual car transactions occurring in
San Diego and its suburbs between 2004 and 2006. For each transaction,
the researchers observe the car make and model, engine size, fuel, and
transmission type, as well as the zip code of dealer and consumer loca-
tions. For prices, they observe retail and wholesale prices for each car,
including any manufacturer rebates. The data are complemented by US
Census demographic data on income and population density at the zip
code level, used to implement consumer heterogeneity. The authors apply
the model to 15,795 transactions, concentrating the analysis on the most
important manufacturers in the area, which are General Motors (with
brands Cadillac, Chevrolet, and General Motors Cars), Ford, Honda,
Hyundai, Chrysler, Toyota, and Volkswagen. The data include 22 differ-
ent dealerships with a total of J = 62 alternatives.
The authors estimate the demand and supply model in two steps: first,
they estimate the demand parameters; second, using these parameter
values, the supply-side first-order conditions and respective parameters
are obtained. Since the demand model is fully identified from the choice
data, AB estimate the demand side without any assumptions on the
behavior of dealers and manufacturers. They use the control function
approach (Pancras and Sudhir, 2007; Petrin and Train, 2010) to control
for the endogeneity of $P_{jt}$ and obtain the demand parameters using simulated
maximum likelihood. The “simulation” comes from including consumer
heterogeneity as draws from the distributions of demographic character-
istics to approximate the integrals in the demand model (for more details,
see Berry et al., 1995, and Nevo, 2001). The likelihood function takes the
following form:

$$L = \prod_i \prod_j \prod_t \left(\Pr_{ijt} \mid \text{data}, \theta\right)^{y_{ijt}}, \quad (27.12)$$

where $y_{ijt}$ is an indicator variable that takes the value of one for the alternative chosen by individual i and zero otherwise. The term $\theta$ is the vector of demand parameters to be estimated. In the optimization algorithm, the authors maximize the respective log likelihood function

$$\log L = \sum_i \sum_t \sum_j y_{ijt} \cdot \log\left(\Pr_{ijt} \mid \text{data}, \theta\right). \quad (27.13)$$
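A schematic sketch of this simulated maximum likelihood step follows; it reuses the nested_logit_probs function from the earlier sketch, and draw_tastes is a hypothetical stub standing in for draws of $(\lambda_i, \beta_i)$ from the zip-code-level demographic distributions.

```python
import numpy as np

def draw_tastes(theta, rng):
    # Placeholder for draws from the known taste distributions in AB;
    # here simply normal draws around mean parameters stored in theta.
    lam = theta["lam_mean"] + theta["lam_sd"] * rng.normal(size=len(theta["lam_mean"]))
    beta = theta["beta_mean"] + theta["beta_sd"] * rng.normal()
    return lam, beta

def sim_log_likelihood(theta, consumers, n_draws=50, seed=0):
    """Simulated version of eq. (27.13): choice probabilities are averaged
    over n_draws taste draws before taking logs."""
    rng = np.random.default_rng(seed)
    ll = 0.0
    for cons in consumers:            # one record per consumer-period (i, t)
        pr = np.zeros(len(cons["P"]))
        for _ in range(n_draws):
            lam, beta = draw_tastes(theta, rng)
            V = cons["X"] @ lam + beta * cons["P"]
            pr += nested_logit_probs(V, cons["brand_of"], cons["type_of"],
                                     theta["sigma_B"], theta["sigma_M"])
        pr /= n_draws
        ll += np.log(pr[cons["chosen"]])   # y_ijt selects the chosen option
    return ll
```

In practice this function is passed to a numerical optimizer, and the same set of random draws is held fixed across evaluations so that the objective is smooth in $\theta$.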

AB obtain manufacturer variable costs C and dealer revenues $\Delta$ from the first-order conditions of the optimization problem of these agents. For the
variable costs of manufacturers, they use equation (27.11) along with
the observed wholesale prices. That is, with the estimates of the demand
parameters in hand, all terms on the right-hand side of the expression

$$C = W + \left[(\Theta_K \odot \Omega_w)^{-1} S\right] \quad (27.14)$$

are either observed or can be numerically computed.
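Operationally, eq. (27.14) is a single linear solve once the demand side has been estimated. A minimal numpy sketch follows; the ownership matrix and the numerically computed share-derivative matrix are toy placeholders, not AB's estimates.

```python
import numpy as np

def recover_costs(W, S, Omega_w, Theta_K):
    """Manufacturer variable costs via eq. (27.14); * is the
    element-by-element product and the solve applies the inverse."""
    markup = np.linalg.solve(Theta_K * Omega_w, S)   # negative, since own
    return W + markup                                # derivatives are negative

# Toy inputs for two single-product manufacturers (values invented).
W = np.array([25000.0, 27000.0])                 # observed wholesale prices
S = np.array([0.04, 0.03])                       # market shares
Omega_w = np.array([[-1.5e-5, 4.0e-6],           # element (j, j') = ds_j'/dw_j
                    [3.0e-6, -1.2e-5]])
Theta_K = np.eye(2)                              # each product its own firm
print(recover_costs(W, S, Omega_w, Theta_K))     # implied costs lie below W
```

The implied price-cost margins are exactly those that rationalize the observed wholesale prices as Bertrand-Nash best responses given the estimated demand curvature.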


Because the authors are also interested in testing different dealer
network configurations, AB use the approach in Pakes et al. (2015) to esti-
mate the fixed costs of dealerships using as input the observed decisions in
terms of size and location of the dealer networks. As the stimulus program
affected prices but not dealership closing or opening decisions, this last
part of the estimation is not important for the evaluation of the CARS
public policy. Nevertheless, with this last model stage, AB’s approach can
be used to quantify the impact of other governmental measures that may
affect the size of dealer networks.

Evaluating the Impact of CARS on Prices and Demand

Using their model, AB show that the optimal pricing decisions of manufacturers and dealers would be to drop prices by $3,000 to $6,000 over two
years in the absence of any public policy, as a response to the financial
crisis and the large negative industry demand shock. The resulting severe
profit reductions and the then tough financial situation of the US car man-
ufacturers would have threatened their survival. The Cash for Clunkers
program was intended to offer some relief to auto companies by funding
a direct decrease in the prices paid by consumers, while keeping the higher
pre-crisis margins.
A subsequent question emerges: given that manufacturers and retailers
know that consumers have $4,500 of additional disposable income to
spend on a new car, do they adjust final prices to account for that subsidy?
The advantage of a structural model as proposed by AB is that it can be
used to measure how much of the subsidy offered to consumers stays in their hands and how much is transferred to retailers and manufacturers by way of price changes.
Hence, AB investigate the simultaneous impact of an economic crisis
and a car allowance rebate program through a counterfactual scenario. Using the model estimates, the authors first reduce demand for
automobiles by the observed drop in car purchases during the economic
recession and pre-CARS introduction. They then apply the governmental
subsidy. In modeling terms, simulating the negative demand shock is done
by raising the appeal of the outside good option – which means not spend-
ing income on cars. They reduce retailer-set prices faced by consumers by
$4,500, the amount equivalent to the average subsidy offered by the US
government.
Given the cost primitives, the authors simulate pricing decisions in this
setting. Consumers take into account the benefit of the car allowance
discount, causing overall car demand to increase. Therefore, retailers face
more demand and respond by increasing prices. With these two changes – an increase in the attractiveness of the outside good because of the crisis and a subsidy that makes prices faced by consumers $4,500 lower – the results show that dealer prices would increase by an average of $1,542 per car. Therefore, out of the subsidy total of $4,500, an average of $2,958 ends up in the hands of consumers. Only about two-thirds of the subsidy impacts demand, providing significantly less support to the market than if the subsidy were fully "passed through" to consumers.
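The counterfactual computation itself amounts to re-solving the dealers' first-order condition (27.10) at the shifted demand. A schematic sketch, assuming the caller supplies share and share-derivative functions built from the estimated demand model (all names here are hypothetical):

```python
import numpy as np

def counterfactual_prices(P0, W, Delta, Theta_D, share_fn, dshare_fn,
                          subsidy=4500.0, tol=1e-6, max_iter=500):
    """Fixed point of the dealer FOC (27.10) under the CARS subsidy.
    share_fn(P) and dshare_fn(P) return shares S and the matrix Omega_p
    when consumers face effective prices P; demand primitives stay fixed."""
    P = P0.copy()
    for _ in range(max_iter):
        eff = P - subsidy                   # consumers see subsidized prices
        S, Omega = share_fn(eff), dshare_fn(eff)
        P_new = W - Delta - np.linalg.solve(Theta_D * Omega, S)
        if np.max(np.abs(P_new - P)) < tol:
            break
        P = P_new
    return P   # the average rise over P0 is the dealers' share of the subsidy
```

Comparing the fixed-point prices with and without the subsidy (and with the outside good's appeal raised to mimic the crisis) reproduces the pass-through decomposition reported above.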

Conclusion

In this chapter, we described an approach based on Albuquerque and Bronnenberg (2012) that can be used to evaluate the impact of governmental subsidies on the behavior of consumers and on the pricing decisions of
firms. These authors use transactional data, including wholesale and final
car prices to identify consumer utility and cost primitives at the dealer and
manufacturing level. Using structural assumptions, the authors forecast
the demand and supply reactions to a government subsidy and how much
of this subsidy is passed on to consumers.
More recent work has started to use a combination of structural models,
panel data sets, and/or natural or quasi-natural experimental variation
to provide insights in industries likely to be the target of governmental
policies. For example, Shriver (2015) looks at the adoption of alterna-
tive fuel, Tuchman (2015) studies advertising decisions in the context of
e-cigarettes, and Bollinger (2015) analyzes green technology introduction. Together these papers serve to further demonstrate the increasing practical relevance and usefulness of structural models in marketing.

NOTE

1. Adapted by permission, Paulo Albuquerque and Bart J. Bronnenberg (2012), “Measuring the Impact of Negative Demand Shocks on Car Dealer Networks,” Marketing Science, 31 (1), 4–23. © 2012, Institute for Operations Research and the Management Sciences, 5521 Research Park Drive, Suite 200, Catonsville, MD 21228, USA.

References
Albuquerque, P. and B.J. Bronnenberg (2012), “Measuring the impact of negative demand
shocks on car dealer networks,” Marketing Science, 31 (1), 4–23.
Berry S., J. Levinsohn and A. Pakes (1995), “Automobile Prices in Market Equilibrium,”
Econometrica, 63, 841–890.
Bollinger, B. (2015), “Green Technology Adoption: An Empirical Study of the Southern
California Garment Cleaning Industry,” Working Paper.
Cardell, N.S. (1997), “Variance Components Structures for the Extreme Value and Logistic Distributions with Applications to Models of Heterogeneity,” Econometric Theory, 13, 185–213.
Dubé, Jean-Pierre, Guenter J. Hitsch, and Puneet Manchanda (2005), “An Empirical
Model of Optimal Dynamic Product Launch and Exit Under Demand Uncertainty,”
Quantitative Marketing and Economics, 3 (2), 107–144.
Goettler, Ronald L. and Brett R. Gordon (2011), “Does AMD Spur Intel to Innovate
More?” Journal of Political Economy, 119 (6), 1141–1200.
Hanssens,  D.M., D. Purohit,  R. Staelin,  P. Albuquerque, and  B.J. Bronnenberg (2012),
“Commentaries and Rejoinder to ‘Measuring the Impact of Negative Demand Shocks
on Car Dealer Networks’ by Paulo Albuquerque and Bart J. Bronnenberg,” Marketing
Science, 31 (1), 24–35. 
Misra, Sanjog and Harikesh S. Nair (2011), “A Structural Model of Sales-Force Compensation Dynamics: Estimation and Field Implementation,” Quantitative Marketing
and Economics, 9 (3), 211–257.
Pakes, A., J. Porter, K. Ho, and J. Ishii (2015), “Moment Inequalities and Their Application,”
Econometrica, January, 315–334.
Pancras, J. and K. Sudhir (2007), “Optimal Marketing Strategies for a Customer Data
Intermediary,” Journal of Marketing Research, 44 (4), 560–578.
Petrin, A. and K. Train (2010), “A Control Function Approach to Endogeneity in Consumer
Choice Models,” Journal of Marketing Research, 47 (1), 3–13.
Richards, T. (2007), “A Nested Logit Model of Strategic Promotion,” Quantitative Marketing
and Economics, 5, 63–91.
Shriver, S. (2015), “Network Effects in Alternative Fuel Adoption: Empirical Analysis of the
Market for Ethanol,” Marketing Science, 34, (1), 78–97.
Sudhir, K. (2001), “Competitive Pricing Behavior in the Auto Market: A Structural
Analysis,” Marketing Science, 20, 42–60.
Tuchman, A. (2015), “Advertising and Demand for Addictive Goods: The Effects of
E-Cigarette Advertising,” Working Paper.
Villas-Boas, S. (2007), “Vertical Relationships between Manufacturers and Retailers:
Inference with Limited Data,” Review of Economic Studies, 74 (2), 625–652.



PART IX

CASE STUDIES AND APPLICATIONS IN LITIGATION SUPPORT

28.  Avoiding bias: ensuring validity and
admissibility of survey evidence in
litigations
Rebecca Kirk Fair and Laura O’Laughlin

Consumer surveys, long offered as evidence in trademark infringement matters, are increasingly presented as evidence in civil courts, in the
antitrust analysis of merger activity, and in agency investigations. The
Ninth Circuit noted in 2015 that surveys are now “de rigueur in patent
cases”1 as a tool to evaluate and quantify damages relating to alleged
infringement, highlighting the increasing acceptance of surveys across
many areas of practice. Notably, recent high-profile litigations involv-
ing patents and technology owned by firms such as Apple, Microsoft,
Samsung, Oracle, and Google show consumer surveys being used in patent
damages matters. In such matters, the intent of consumer surveys is to
provide evidence on drivers of consumer demand, to determine the impact
of particular actions by competitors, and to evaluate “but-for” choices
under alternative competitive conditions. In antitrust matters, survey
experts might evaluate and estimate the impact of collusive or exclusion-
ary activity on consumer demand. Such studies might explore how con-
sumer purchase decisions may change if a new competitor were to enter
a product or geographic market. In trademark litigation, survey experts
might use well-established techniques to evaluate consumer confusion or
assess interpretations of claims made on product packaging.
Despite the wide scope for survey evidence across practice areas, the
relevance and usefulness of expert-submitted surveys in any legal context depend on how they are designed and implemented. A recent deci-
sion from the Seventh Circuit, in which Judge Richard Posner affirmed
a preliminary injunction sought by plaintiff Kraft Foods Group, Inc.
to block  Cracker Barrel Old Country Store, Inc. (CBOCS) from selling
branded grocery products, highlights some of the pitfalls of using surveys
in litigation and exemplifies the skeptical view some have expressed with
regard to their “probative significance.”
Consumer surveys conducted by party-hired expert witnesses are prone to
bias. There is such a wide choice of survey designs, none foolproof, involving
such issues as sample selection and size, presentation of the allegedly confusing
products to the consumers involved in the survey, and phrasing of questions in a way that is intended to elicit the surveyor’s desired response – confusion or lack thereof – from the survey respondents.2

As Judge Posner’s opinion makes clear, the avoidance of bias, either in fact or appearance, is central not only to a survey’s admissibility, but also
to the probative weight accorded to the survey expert’s testimony. Bias
may sometimes be obvious; at other times, it may be difficult to detect.
This chapter discusses possible sources of bias and describes methods and
techniques that a survey expert can use to minimize this bias.

What is Bias?

Valid surveys require a survey expert to ask the right people the right ques-
tions in the right way. In other words, a survey expert must implement an
appropriate method to accurately measure the construct of interest – all
while sampling from an appropriate population. If a survey fails in any
one of these areas – method, implementation, and population sampled – it
may suffer from one or several biases.
In order to demonstrate that potential biases have been avoided and
to encourage acceptance by courts, the survey expert must take affirma-
tive steps to demonstrate that careful and relevant design and sampling
techniques were used. For purposes of this chapter, we will put potential
biases into three categories: (1) selection biases, (2) information-related
biases, and (3) analytical biases. The first category relates to the popula-
tion studied (i.e., did the expert seek out and ask the right people using
statistically valid sampling techniques?). The second category relates to
which questions are asked, how the questions are asked, and what answers
are offered. The third category relates to how the data are analyzed, such
as implementing criteria for respondent inclusion in the analytical sample,
or the interpretation of open-ended responses. In some cases, if biases are
introduced through the analyses of the results, alternative analyses could
be conducted using the same data. Experts may even recover from errors
resulting from information-related biases – an imperfect question, for
example, may still provide relevant information. On the other hand, it is
nearly impossible to recover from selection-related biases that result in a
failure to identify the right population. A valid survey must study the right
population – otherwise the results are irrelevant.3
These potential biases may exist when any survey is implemented in any
context; however, the incentives present in litigation, as well as the need to
demonstrate rigor in such contexts, make the assessment of bias in court
cases particularly critical. Recent expert reports and court opinions have revealed an increasing emphasis on demonstrating that best practices were followed, including the presentation of affirmative evidence that steps
were taken to identify and reach the appropriate population, to minimize
the possibility of leading questions or answer options, and to deliver
accurate results.
In the remainder of this chapter we discuss approaches to sampling,
implementation, design, and analysis that should be considered in order to
avoid introducing biases. We further discuss expert evidence, which may
include documentation of the survey design process, evidence of adherence
to best practices and an academically sound methodology, and informa-
tion from focus groups and/or pre-testing to illustrate that every effort
was undertaken to avoid biasing the results. Finally, we discuss the pos-
sibility of using external validation – if data are available – to strengthen
an expert’s position with confirmatory results from other external data
sources.

Were the Opinions of Relevant People Sought?

The first key element to a survey’s reliability is identifying the appropriate respondent universe from which to draw. That is, the expert must define,
target, and sample from the segment of the population whose beliefs are
relevant to the issues in the case – otherwise, the survey may be open to
critiques of selection bias. In patent infringement matters, the appropriate
universe may include all current and possible future consumers of the at-
issue product. In class action matters, the respondent universe may include
consumers of the relevant product during the class period, or sufficiently
comparable consumers to those who purchased the relevant product
during the class period. In matters relating to false advertising, the stand-
ard may be a “reasonable” or “credulous” consumer to determine whether
a statement is interpreted in a false or misleading manner, although such a
consumer can prove to be an elusive target.4
The selection of the proper universe as well as the method in which the
universe is sampled are critical, and must fit the facts and any other case
requirements. Even if every other step was taken appropriately, if the
wrong people are asked, the results are likely to be irrelevant and the data
may be excluded.5
Proper screening protocols should aim to yield a survey sample consist-
ing of relevant respondents (i.e., respondents with sufficient knowledge
and experience to inform the trier of fact). The first step to accessing
relevant respondents is determining the mode of outreach – phone, mall intercept, email, internet, or traditional mail. The mode of outreach should contemplate the target sample – if the analysis requires reaching consum-
ers of postage stamps, a traditional mail approach might be preferred. If
the analysis requires reaching smartphone users, an internet-based panel
would be preferable to traditional mail.6 The appropriate mode of out-
reach, combined with screening questions and demographically relevant
quotas, are typically used to qualify representative members of the target
population. To avoid bias when selecting the target sample, screeners
should be “drafted so that they do not appeal to or deter specific groups
within the target population, or convey information that will influence the
respondent’s answer on the main survey.”7 Improperly drafted screening
questions – such as only asking whether the potential respondent has
an iPhone (rather than asking the respondent to select among possible
smartphones) – can tip off the respondent to the purpose of the study and
can bias survey results.
If the target universe is not appropriately defined, the resulting sample
of respondents may be either overly broad (over-inclusive) or narrow
(under-inclusive). In Competitive Edge v. Staples, the survey expert not
only failed to define a target universe, “greatly” harming the relevance
of his survey, but he also chose to survey an “under-inclusive” sample
of college students, “seriously diminishing the reliability of the survey.”8
In this instance, the sampling errors affected the weight of the evidence,
but in other cases, inappropriately defined samples – or samples that
fail to match the definition of the appropriate universe – may lead to
the exclusion of survey results from evidence.9 To further illustrate, the
survey expert in a recent class action matter defined the target universe
as “the population of [appliance] owners” but failed to provide a viable
method to sample from this population to obtain reliable results. In the
order excluding this expert’s testimony, the court found that the “[expert]
cannot say much of anything about who answered his internet survey . . .
[The expert] can’t say for sure whether any survey-takers actually owned
[the appliance at issue]. Identifying data was not requested, such as serial
number or other criteria tending to establish that the survey responder
really owned the product.”10
Errors related to sampling are particularly problematic because
there is no way to know, with any degree of certainty, whether these
selection-related errors bias the results, and whether the bias overstates
or understates the results. If the sampled population is found to be
“under-inclusive,” as in Competitive Edge v. Staples, then “the survey’s
value depends on the extent to which the excluded population is likely
to react differently from the included population.”11 Without conducting
or relying upon additional research, neither the expert nor the court can determine whether the relevant but excluded population is sufficiently similar to the relevant and included (or even the irrelevant and included)
population to draw any conclusions.
An over-inclusive sample may include individuals who do not have
experience relevant to the issue at hand, an error that requires one to
question whether the opinions of the sampled respondents are even
relevant to the case. As the court found in NetAirus Techs. LLC v. Apple
Inc., an over-inclusive survey may be irrelevant: “[a] survey that generates
answers from respondents who have no basis to provide them is not one
conducted according to accepted principles.”12 Similarly, in affirming dis-
missal in a recent debt collection practices litigation, the Seventh Circuit
found a survey sample size to be over-inclusive and likely inappropriate,
“especially when one considers the mismatch between the population to be
sampled – people who receive dunning letters from debt collectors – and
the sample, which consisted of mall patrons none of whom, for all one
knows, may ever have received such a letter.”13
Although in some instances analytical solutions can be used to reconcile
an over-inclusive sample or one with an inappropriate distribution of
respondents, it is impossible to do so if the target population is under-
inclusive, poorly defined, or if relevant characteristics of the sampled
population are unknown. Although re-weighting can be used to “correct
random samples for participant loss that may be systematic and for
improving the match between nonrandom samples and populations,”14
one cannot weight respondents who have not been included.
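Where re-weighting is feasible, a simple post-stratification adjustment scales each respondent by the ratio of the group’s population share to its sample share. A minimal pandas sketch (the groups and shares are invented):

```python
import pandas as pd

sample = pd.DataFrame({"age_group": ["18-34", "18-34", "35-54", "55+"]})
population_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}  # e.g., from Census

sample_share = sample["age_group"].value_counts(normalize=True)
sample["weight"] = sample["age_group"].map(
    lambda g: population_share[g] / sample_share[g])
# Weighted estimates then use 'weight'; but no weight can be computed for a
# group that was never sampled, which is why under-inclusion is fatal.
```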

Are the Methodologies Academically Rigorous and Unbiased?

An appropriate and admissible survey should be grounded in academically rigorous and unbiased methodologies. Once the key questions are identi-
fied, the survey expert should consider the most appropriate approach to
assess these questions. For example, if the objective is to assess the impact
on consumer behavior of particular product logos or claims in advertising
in a trademark or consumer confusion matter, a test-and-control experi-
mental design is often the best choice, as it can help isolate whether there is
a causal link between the logos or claims and consumer behavior.
The “Eveready” trademark survey design, which is based on a survey
used in Union Carbide Corp. v. Ever-Ready Inc.,15 is an early example of the
acceptance of test-and-control design. In that matter, the Seventh Circuit
determined that the district court had erred when it found that surveys
were entitled “to little, if any, weight” and affirmed the value of surveys in determining whether there exists a likelihood of confusion between two products.16
If the task is to evaluate the relative importance or value of various
attributes to consumer choice in a patent infringement case, a conjoint
study – a market research technique used to determine how people value the
features that make up a product or service – may be optimal. For example,
in TV Interactive Data Corp. v. Sony Corp.,17 TVI sued Sony and a number
of other Blu-ray and DVD player manufacturers, alleging infringement
of four patents related to certain automatic playback technologies. TVI’s
expert conducted three surveys, including one choice-based conjoint to
measure the “market’s willingness to pay” for the technology in question,
and then analyzed his findings using conjoint analysis.
In other instances, a more direct set of questions may suffice. For exam-
ple, an Eveready study in a trademark infringement survey is designed to
determine brand and product affiliations and ask questions such as “Who
do you think puts out the product shown here?” and “What makes you
think so?”18
Regardless of which methodology is employed, one must take care to
match the design and the questions to the objective. A survey written in an
overly broad manner, even if based on a standard methodology, is likely
to be inadmissible; in Fractus v. Samsung, for example, a broad survey was
excluded because it confused the issue, risking a jury award based on the
value of an internal antenna rather than the value of the at-issue aspect of
the internal antenna.19 And in Oracle v. Google, Judge William Alsup ruled
that conjoint survey results presented by Oracle’s expert were too narrow
to be allowed to determine market share; however, Judge Alsup did allow
that they could be used in the “determination of relative importance
between application startup time and availability of applications.”20

Is the Implementation Appropriate and Unbiased?

As Judge Posner noted in his Kraft v. CBOCS decision, survey evidence, like most expert-presented evidence, is generally sponsored by a party in
litigation. To avoid informational biases, the right survey questions must
be asked in the right way, a process which encompasses multiple design
choices. Based on recent litigation, one can conclude that the survey
expert’s decision process in determining how questions are asked should
be made as transparent as possible to the trier of fact. Key design choices
include question phrasing, survey methodology, experimental design, and
survey administration. Practically speaking, a survey in aid of litigation will have greater probative value if the expert can document and support
the choice of question, sample, and method, while minimizing the possibil-
ity for or existence of biases that can “tweak” the survey method in his or
her favor.
The survey expert’s decision to use open-ended or closed-ended ques-
tions can have implications in terms of relevance, analysis, and potential
for or perception of bias. Open-ended questions increase analytical
complexity and may make it difficult to group responses effectively, given
the array of words and phrases respondents may use to express the same
concept. Alternatively, closed-ended questions might “push” respondents
into an answer they would not otherwise have given, a concern expressed
by the Seventh Circuit in Hubbard v. Midland Credit Mgmt.21 Qualitative
research to justify closed-ended responses or a two-stage approach (i.e.,
open-ended followed by closed-ended questions) can help to alleviate
concerns of such biases.
When phrasing questions, social science researchers have long empha-
sized the importance of understandable language. The survey expert should
be wary of “unexpected meanings and ambiguities to potential respond-
ents”22 and adopt “a critical attitude toward [their] own questions.”23
Questions should be reviewed for clarity – and should test one concept at a
time. If questions are unclear or attempt to test too many factors at once,
they “may threaten the validity of the survey by systematically distorting
responses if respondents are misled in a particular direction.”24 Examples
of distortion include questions that are framed in a way to prompt a “yes”
or questions that inadvertently “tip off” the respondent to the researcher’s
hypothesis. Results from a classic experiment illustrate this effect. In
this experiment, respondents were presented with three identical product
samples, but were told that the samples were different. Unsurprisingly,
the respondents “acted as a demand bias explanation would assert and
obligingly varied their ratings of three identical samples.”25 In the recent
NetAirus patent litigation, the judge reasoned that the survey evidence was
affected by informational biases (among other issues), and excluded the
survey evidence in part because the expert’s hypothesis was exposed within
the survey instrument.26
An additional way to minimize potential bias is to conduct surveys
and experiments in a manner that is “double-blind,” thus eliminating the
chance that the interviewer could influence the results. Research indicates
that respondents generally want to please those conducting the survey;
therefore, to ensure objectivity, both “the interviewer and the respondent
should be blind to the sponsor of the survey and its purpose.”27 Expert use
of online surveys has reduced the possibility of unobservable interviewer
bias since the interviewer is a computer program.


Was Pre-testing Used to Validate the Survey Instrument?

Additional steps can be taken to demonstrate that the survey method, implementation, and sample selection process is unbiased and does not
drive results in a particular direction. To evaluate various design decisions,
the survey may be pre-tested before a full launch “to increase the likeli-
hood that questions are clear and unambiguous.”28 Similarly, pre-testing
can also minimize the possibility of demand artifacts (e.g., unintended
implications from a survey design, such as a respondent’s ability to guess
the sponsor or purpose of a study), which may arise from an aspect of the
survey or experiment and “cause the subject to perceive, interpret, and act
upon what he believes is expected or desired of him by the experimenter.”29
Pre-testing may also provide information on the availability of relevant
respondent groups. Given the importance of reaching the right people, the
expert may choose to oversample or make other adjustments to ensure
that the research questions can be answered by the relevant population.
In some instances, experts may choose to provide information on the
pre-testing process. If the targeted sample is difficult to reach (e.g., rare
stamp collectors), the expert may test alternative methods of reaching
the target sample, such as recruiting respondents through special interest
associations or professional organizations, and provide statistics on the
improved incidence of stamp collectors in the sample. In other instances,
pre-testing may help to refine the language used in questions in order to
“maximize the likelihood that respondents understand the questions they
are being asked.”30 Even though “survey professionals generally do not
describe pilot testing in their survey reports,”31 there have been instances
in which the survey expert has provided testimony and, in some cases,
documentation of pre-test results to the court.32 In the Kraft v. CBOCS
matter, for example, Judge Posner expressed concern that the Kraft survey
might have encouraged guessing; a pre-test may have provided support
that such guessing did not occur.

Is the Analysis of the Survey Data Appropriate and Unbiased?

Different survey and experimental designs require different methods to analyze the data; these methods can be affected by analytical biases. In
particular, surveys that include open-ended responses typically require
careful and often subjective analysis in order to determine the results. For
example, in a trademark study, one must determine whether respondents have associated the plaintiff’s products or brands with the products of the
defendants by reviewing responses to questions such as “Who makes or puts
[this at-issue product] out?” and “Why do you say that?” In addition, mis-
spellings, abbreviations, or colloquialisms used by respondents may make
such analyses difficult and subject to biased interpretation.
To avoid introducing researcher bias, open-ended responses can be
carefully analyzed by coders who are blind to the purpose of the study.
Such coding “requires a detailed set of instructions so that decision
standards are clear and responses can be scored consistently and accu-
rately.”33 Often, it may be important to involve two coders in the analysis
to compare results, cross-check response categorization, and ensure
consistency. If relevant, the expert may choose to include in his or her
production materials the instructions and decision standards provided
to the coders. Regardless, the raw open-ended response data and ensuing
analysis should be provided in order to allow for independent review and
confirmation by opposing parties.
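When two blind coders categorize the same open-ended responses, their agreement can be summarized with a chance-corrected statistic such as Cohen’s kappa. A minimal sketch (the response codes are invented):

```python
from sklearn.metrics import cohen_kappa_score

coder_1 = ["plaintiff", "other", "plaintiff", "dont_know", "other"]
coder_2 = ["plaintiff", "other", "other", "dont_know", "other"]

print(f"Cohen's kappa: {cohen_kappa_score(coder_1, coder_2):.2f}")
# Chance-corrected agreement; disagreements (here, the third response) are
# flagged for reconciliation under the documented decision standards.
```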
Biases may also be introduced during the analysis of the data. When
analyzing data, it may be necessary to exclude certain categories of
respondents with appropriate justification. One example would be to
exclude “straight-liners,” or respondents who always select the first option
in multiple-choice answers, because the expert may suspect that these
respondents were not paying sufficient attention to the survey task.34
Experts may also choose to exclude those who take too much or too little
time to answer the survey questions. Generally, the analytical results are
unlikely to be affected by such exclusions. If, on the other hand, the expert
excludes larger categories of respondents, such as consumers of particular
products, or consumers residing in certain regions, the reasons for such
exclusions should be well documented and appropriately justified, and the
effect of such exclusions should be tested and understood.
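A short pandas sketch of such exclusion rules follows; the cutoffs and column names are hypothetical and would need to be justified for the survey at hand.

```python
import pandas as pd

df = pd.DataFrame({
    "respondent": [1, 2, 3, 4],
    "duration_sec": [310, 45, 2900, 480],
    "answers": [[1, 3, 2, 4], [1, 1, 1, 1], [2, 3, 1, 2], [4, 2, 3, 1]],
})

# Straight-liners: identical answer to every multiple-choice item.
df["straight_liner"] = df["answers"].apply(lambda a: len(set(a)) == 1)
# Speeders and laggards, using pre-specified duration cutoffs.
df["bad_duration"] = ~df["duration_sec"].between(60, 2400)

analytic = df[~df["straight_liner"] & ~df["bad_duration"]]
# Reporting results both with and without these exclusions helps show
# they are not driving the conclusions.
```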

Are the Survey Results Cross-Validated?

To demonstrate that the results of a survey are consistent with other data or economic theory, survey experts and their teams can also provide
complementary evidence. For example, surveys and market research
conducted in the normal course of business by the parties in suit or by
third parties may support (or refute) the findings of a survey conducted
in a litigation context. Similarly, data analyses – such as a hedonic pricing
analysis or a before-and-after sales data analysis – may provide results
consistent with those found in a survey. If a conjoint design is used to
evaluate several product features, and the market price for one or more of the tested features can be determined from transaction data, comparisons can be drawn to confirm and/or scale survey results to match with historical pricing.
Fact witnesses, deposition testimony, and the evidentiary record – as
well as economic theory – can also corroborate survey results. For exam-
ple, communication between customers and manufacturers, or third-party
product reviews, may indicate that particular features are of importance
in a purchase decision. But if these features appear irrelevant in the
survey, one might conclude that the survey design was flawed or an inap-
propriate sample was surveyed. Design flaws and a disconnection from
the marketplace realities of purchase decisions were among the Seventh
Circuit’s issues in Kraft v. CBOCS. Such evidence may also be helpful in
demonstrating that data and conclusions are only minimally affected (if
at all) by possible sources of selection bias, informational bias, and/or
analytical bias.
While most experts would agree that marketplace conditions should
factor into the choice of survey or experimental method, how well specific
methods reflect actual consumer choice processes is a matter of debate.
A conjoint experiment may be viewed by a traditional economist as a
close approximation of the consumer decision-making process, but by a
behavioral economist as not reflective of how consumers make decisions
and therefore of little value. So, it is important at the design stage that
the expert considers possible ways to validate data and methodologies.
As Judge Posner noted, “[courts have failed] to develop a framework
for understanding the conditions that may affect the attention that can
be expected to be given to a particular purchase.”35 If results from a
litigation-sponsored survey are confirmed with other data, the convergent
results may help to strengthen the survey’s evidentiary weight and may
demonstrate that distinctions between the survey and the marketplace do
not affect results.

Conclusion

As surveys have gained prominence in court cases, proper vetting of survey evidence can be a crucial component of a litigation strategy. Hiring
the right experts and following best practices can help ensure that survey
evidence reaches the jury. Meanwhile, identification of design failures or
biased samples and analyses can help to have faulty surveys excluded.
However, even if a survey contains notable flaws in implementation, anal-
ysis, or validation, case law in the Ninth Circuit and elsewhere establishes
that juries are able to assess the impact of possible technical deficiencies on the probative value of a survey. In a recent order in Sentius Int’l LLC v. Microsoft Corp., Judge Paul Grewal noted that “surveys are not exactly
unusual or unfamiliar to the layperson.”36 Citing the Federal Circuit’s
2014 opinion in Apple Inc. v. Motorola, Judge Grewal maintained that
despite several methodological shortcomings, “questions regarding which
facts are most relevant or reliable to calculating a reasonable royalty are
‘for the jury.’”37
Overall, surveys have been shown in many circumstances to be a useful
method through which to deliver evidence, and can be particularly valu-
able when other sources of data are not available. Nonetheless, courts have
been and are likely to remain skeptical of surveys – and methodological
flaws can hurt both admissibility and weight of impact. Recent decisions
relating to the validity and admissibility of survey evidence, along with
other high-profile litigation outcomes, highlight the necessity for adher-
ence to best practice at every step.

NOTES

  1. Sentius International LLC v. Microsoft Corporation, 2015 US Dist. LEXIS 8782 (9th
Cir. N.D. Cal. Jan. 23, 2015).
  2. Kraft Foods Group Brands LLC v. Cracker Barrel Old Country Store, Inc., 735 F.3d 735
(7th Cir. Ill. 2013).
  3. “A survey is inadmissible when the sample is clearly not representative of the universe it
is intended to reflect.” Bank of Utah v. Commercial Security Bank, 369 F.2d 19 (10th Cir.
1966).
  4. Similar to Lanham Act matters, under California laws relating to “misleading” repre-
sentations, the standard is the “reasonable” consumer. See, e.g., Committee on Children’s
Television v. General Foods Corp. (1983) 35 Ca1.3d at 212; Chern v. Bank of America
(1976) 15 Cal.3d 866, 876; Colgan v. Leatherman Tool Group, Inc. (2006) 135 Cal.App.4th
663, 680. In Canada, a recent Supreme Court ruling stated that the standard should apply
to a “credulous and inexperienced” consumer. See Richard v. Time Inc. (2012) SCC 8.
  5. J.T. McCarthy, McCarthy on Trademarks and Unfair Competition, 4th ed., Thomson
Reuters/West, 2012, pp. 376–379.
  6. Ubiquitous internet use and the decline of land telephone line usage within certain
demographic categories have led to the increasing acceptance of internet-based surveys
that sample from online panels. For extensive discussion of the advantages and dis-
advantages of Internet surveys, please see Shari S. Diamond, “Reference Guide on
Survey Research,” in Reference Manual on Scientific Evidence, 3rd ed., The National
Academies Press, 2011, pp. 359–423, at pp. 406–409.
  7. Diamond, “Reference Guide on Survey Research,” at pp. 386–387.
  8. Competitive Edge v. Staples, 763 F. Supp. 2d 997; 2010 U.S. Dist. LEXIS 29678 (7th
Cir. N.D. Ill. 2010).
  9. “A survey is inadmissible when the sample is clearly not representative of the universe
it is intended to reflect.” Bank of Utah v. Commercial Security Bank, 369 F.2d 19 (10th
Cir. 1966).
10. In Re: Front Loading Washing Machine Class Action Litigation, [Daubert hearings
opinion], July 10, 2013. http://www.lieffcabraser.com/Documents/front-loading-opinion-
daubert.pdf.

11. Diamond, “Reference Guide on Survey Research,” at pp. 376–379.


12. NetAirus Techs. LLC v. Apple Inc., No. LA CV10-03257, Dkt. No. 524 (C.D. Cal. Oct.
23, 2013).
13. DeKoven v. Plaza Associates, 599 F. 3d 578 – Court of Appeals (7th Cir. 2010).
14. William R. Shadish, Thomas D. Cook, and Donald T. Campbell, Experimental and
Quasi-Experimental Designs for Generalized Causal Inference, Wadsworth, Belmont,
CA, 2002, p. 386.
15. Union Carbide Corp. v. Ever-Ready, Inc., 531 F.2d 366 (7th Cir. Ill. 1976).
16. Ibid.
17. TV Interactive Data Corp. v. Sony Corp., 929 F. Supp. 2d 1006 (N.D. Cal. 2013).
18. Union Carbide Corp. v. Ever-Ready, Inc., 531 F.2d 366 (7th Cir. Ill. 1976).
19. Fractus, S.A. v. Samsung Electronics Co., Ltd., Civ. No. 6:09-cv-203-LED-JDL (E.D.
Tex. Apr. 29, 2011).
20. Oracle Am., Inc. v. Google Inc., 2012 U.S. Dist. LEXIS 33619 (N.D. Cal. Mar. 13, 2012).
21. “More fundamentally, it is not clear that closed-end questions are the appropriate way
to test for the type of alleged deception in this case. The court perceives a significant
risk that the closed-end questions would push respondents to read more into the dis-
puted letters than is actually there.” Hubbard v. Midland Credit Mgmt., 2009 U.S. Dist.
LEXIS 13938 (S.D. Ind. Feb. 23, 2009).
22. Diamond, “Reference Guide on Survey Research,” at pp. 387–388.
23. S.L. Payne, The Art of Asking Questions, Princeton University Press, Princeton, NJ,
1951, at p. 16.
24. Diamond, “Reference Guide on Survey Research,” at p. 389.
25. The format of the question [i.e., Yes, No, Don’t Know] does not follow best practices
because it does not give “[c]omparable [e]xplicit [e]mphasis to the [a]ffirmative, [n]
egative, and [n]eutral positions.” Additionally, questions that can be answered by a
mere “yes” or “no,” are more likely to be leading because “all other things
being equal, respondents – generally, agreeable people who have agreed to participate
in the first place – are more inclined to be agreeable and answer ‘yes’ than to answer
‘no.’” Alan G. Sawyer, “Demand Artifacts in Laboratory Experiments in Consumer
Research,” Journal of Consumer Research, Vol. 1, No. 4, March 1975, pp. 20–30, at
p. 26.
26. “[T]here was no effort made . . . to shield participants from study goals.” NetAirus
Techs. LLC v. Apple Inc., No. LA CV10-03257, Dkt. No. 524 (C.D. Cal. Oct. 23, 2013).
27. Diamond, “Reference Guide on Survey Research,” at p. 411.
28. “Texts on survey research generally recommend pretests as a way to increase the likeli-
hood that questions are clear and unambiguous.” Diamond, “Reference Guide on
Survey Research,” at p. 388.
29. Alan G. Sawyer, “Demand Artifacts in Laboratory Experiments in Consumer Research,”
Journal of Consumer Research, Vol. 1, No. 4, March 1975, pp. 20–30, at p. 20.
30. Diamond, “Reference Guide on Survey Research,” at p. 416.
31. Ibid.
32. See, for example, Hershey Foods Corp. and Homestead, Inc. v. Mars, Inc., 998 F. Supp.
500 (M.D. Pa. 1998).
33. Diamond, “Reference Guide on Survey Research,” at p. 413.
34. A case that requires particularly careful consideration is when respondents answer
“don’t know” for all questions. By including such respondents who straight-line on
“don’t know,” the analytical results may either overstate or understate the measure of
interest.
35. Kraft Foods Group Brands LLC v. Cracker Barrel Old Country Store, Inc.
36. Sentius International LLC v. Microsoft Corporation, 2015 U.S. Dist. LEXIS 8782 (9th
Cir. N.D. Cal. Jan. 23, 2015).
37. Op cit., citing Apple Inc. v. Motorola, Inc., 757 F.3d 1286 (Fed. Cir. 2014).



29.  Experiments in litigation
Joel H. Steckel

Courtroom disputes often hinge on questions of causality. The questions generally surround allegations that the actions of one party (usually
a company) directly caused harm or injury to another (either another
company or consumers). In our legal system, when one party believes he/
she/it has been caused harm, that party has the opportunity to become a
plaintiff in a lawsuit. The party who allegedly causes the harm takes on
the role of defendant. The two parties then set out to argue questions of
causality.
Examples of such questions along with a parenthetical mention of the
nature of the legal allegation that gave rise to that question include:

l Would the marketing of a chocolate under the brand name SwissKiss
cause consumers to think the chocolate was a Hershey product?
(trademark infringement)
l Did the marketing of a Marlboro sub-brand with the designators
“Lights” and “Lowered Tar & Nicotine” cause consumers to per-
ceive a health advantage and either smoke when they otherwise
would not have or switch to that brand from another? (consumer
fraud)
l Did certain design features of Samsung’s Galaxy phone cause
consumers to buy the Galaxy instead of Apple’s iPhone? (patent
infringement)
l Did a manufacturer/distributor’s placement of cK Calvin Klein
jeans in warehouse clubs cause consumers to think less of the Calvin
Klein brand? (license noncompliance)
l Would an adult novelty and gift store called Victor’s Little Secret
cause harm to Victoria’s Secret’s reputation in general or its ability to
distinguish its products? (trademark dilution)
l Would a merger of Staples and Office Depot reduce competition
and cause prices to rise for consumers? (antitrust).

In each case, a plaintiff, be it a company, a class of consumers, or the
government on behalf of consumers, must produce evidence of the causal
proposition while the defendant either produces evidence that the causal
proposition is not true or simply that the plaintiff’s evidence is insufficient.
As any social scientist realizes, the standard and most compelling
approach for providing evidence of causality is a randomized experiment.
Randomized experiments, widely credited to Sir Ronald Fisher, began
in the study of agriculture (Fisher 1925, 1926), but soon spread to other
spheres of application because they enabled control of sources of vari-
ation apart from the hypothesized causal construct. Various conditions
or treatments being studied including no treatment at all (e.g., exposure
to an advertisement, an alternative advertisement, or no ad at all) are
randomly assigned to experimental units (e.g., people). If implemented
correctly, random assignment creates two or more groups of units that
are probabilistically equivalent to each other on the average. Hence any
outcome differences (e.g., sales, judgments) observed at the end of a study
are likely to be due to the differences in conditions or treatments and not
to differences among the groups that existed prior to the study.
Randomized experiments have spread from psychology to a wide
variety of arenas. Researchers in medicine have referred to randomized
experiments as “the gold standard for treatment outcome research”
(Shadish, Cook, and Campbell 2002, 15). The practicing legal community
has come to a similar conclusion, albeit with different parlance. What
social scientists call randomized experiments, the legal community refers
to as a type of consumer survey. Practicing attorneys do not carefully
distinguish among different types of studies that collect primary data from
consumers. They are all called consumer surveys.
Randomized experiments and consumer surveys conducted for a litiga-
tion must conform to the same scientific standards as traditional academic
randomized experiments: internal validity, external validity, and construct
validity. However, the specific ways in which these criteria are applied
differ. As those of us who have conducted consumer surveys for litigation
know, the degree of scrutiny they receive is usually greater than the scru-
tiny applied in the typical journal review process. As my colleague John
Hauser has remarked with respect to the studies he has done, attorneys
(armed with the assistance of qualified experts) are like reviewers on
steroids. Journal reviewers tend to be open minded as to whether a study
constitutes good science; opposing attorneys and experts begin with the
conviction that there must be something wrong with the study and we have
to find it!
This chapter outlines how the three classes of validity (internal, external,
and construct) manifest themselves in litigation experiments. In particular,
I highlight the heightened importance of external validity. Furthermore, I
identify those aspects of litigation experiments that are most vulnerable to
criticism in the courtroom. The chapter concludes with a description of a
recent litigation experiment that not only passes all validity standards, but
also demonstrates that external validity can alter how a specific consumer
behavior standard is understood. In so doing, the study demonstrates
how the requirements of the courtroom can actually contribute to the
understanding of consumer behavior.
I begin with the premise that the differing ways that validity criteria are
applied in litigation experiments stem from the differing goals they have
relative to academic consumer research.

Goals of Academic and Litigation Experiments

Academic consumer research experiments can be characterized as basic
and universalistic. They aim to develop and refine new knowledge. They
investigate theoretically predictable causal associations (i.e., hypotheses)
between and among abstractly specified constructs that generally hold
true. Such studies are often conducted on convenience (usually student)
samples, with the justification that, since the hypotheses examined are
general (unless specific boundary conditions are specified), it matters little
who they are tested on.
In contrast, litigation experiments are of necessity applied and particu-
laristic. Each is intended to apply to a particular concrete setting, popula-
tion, and often time period. They are usually of little if any interest outside
of the litigation for which they were conducted. For example, Church &
Dwight Co., Inc. v. SPD Swiss Precision Diagnostics GMBH (Case No.
1:14-cv-00585-AJN, United States District Court, Southern District of
New York) centered around whether specific television ads and product
packaging for the ClearBlue pregnancy test implemented between August
2013 and February 2014 caused consumers to have the impression that the
product estimated how long a woman has been pregnant the same way
that a doctor would estimate how long a woman has been pregnant (i.e.,
since the beginning of her last cycle). Note here that the ClearBlue test
measured time since ovulation. The court largely sided with the plaintiff
and against ClearBlue in this matter. As usual with litigation experiments,
the studies conducted in this case are of little general interest. This is the
general circumstance, though I will present an exception later.
The important point here is that the differences between the goals of
academic and litigation experiments have a direct bearing on how study
validity is assessed.

Validities in Experimental Research

As noted above, academic consumer research is often evaluated with
respect to (at least) three types of validity: construct, internal, and exter-
nal. External validity refers to the degree to which data collected in the
research environment can be generalized to the outside world. Consumer
research experiments performed in a natural context (i.e., field experi-
ments) enjoy high external validity. However, most such experiments take
place in more artificial contexts such as a laboratory and/or in front of a
computer screen. In such circumstances, efforts need to be made to ensure
that the critical aspects of reality are captured in the artificial context and
that procedures used in such a context do not compound any inherent
limitations of such a design.
In contrast, internal validity refers to the degree to which specific experi-
mental procedures allow for causal inferences with respect to the stimuli
in the study. Finally, construct validity refers to the degree to which the
underlying constructs are appropriately operationalized in an experi-
ment. While internal, external, and construct validity are all important in
academic laboratory experiments, internal and construct validity occupy
a higher priority because of the basic and universalistic goals of such
research. If appropriate constructs cannot be measured or studies do not
allow for causal inferences to be made, then it does not matter whether or
not the results generalize beyond the research context. Indeed, as a hedge
against concerns about external validity, academic researchers often couch
the results as what “is likely to” or “may” happen outside of the context
of their studies. Litigation experts do not have that luxury. The results of
their work are judged against a standard of what did happen or is happen-
ing in the real world.

Validity in Litigation Experiments

Litigation experiments are evaluated by rabid opposing attorneys and
experts with respect to the three validities noted above. However, attor-
neys and courts rarely refer to the criteria they use to evaluate litigation
experiments as construct, internal, and external validity. Instead, there
are several principles used by the legal community that are examples of
the three validities. This section discusses the three validities as they are
typically applied in litigation experiments and, in so doing, illuminates the
principles used by the legal community.

Construct Validity

In litigation experiments, construct validity refers to the ability of a pro-
posed measure of a dependent variable to reflect a construct referred to in
a statute.
In many cases, construct validity is relatively straightforward to estab-
lish. For example, suppose a new product uses another’s established brand
name in a way that could confuse consumers into thinking that the value
of the brand to its established owner would suffer. In such circumstances,
the owner of the brand name could sue the challenger for trademark
infringement in violation of the Lanham Act, which prohibits an unau-
thorized user of a trademark from using it in a way that “is likely to cause
confusion, or to cause mistake, or to deceive . . .” (15 U.S.C § 1114 (2005)).
In a landmark case (Union Carbide Corp. v. Ever-Ready, Inc., 531 F. 2d
366 (7th Cir. 1976)), experimental subjects were shown an example of a
lamp with the Ever-Ready brand name attached. Confusion was assessed
through a series of questions of the type: “Who puts out the product
shown here?” and “What other products does the company put out?”
The rationale for the last question stemmed from the belief that subjects
might not be able to identify Union Carbide as the company that makes
Eveready batteries but did think that the defendant’s lamp came from the
same company that made Eveready batteries. If the subject responded to
one of the questions with Eveready, Union Carbide, or batteries, that was
taken as a very strong indication that the respondent was confused as to
the source of the lamp.
While it would be difficult to argue against the construct validity in the
Eveready case, another class of trademark violations, trademark dilu-
tion, presents a construct validity problem. Trademark dilution involves
situations in which an entity other than a trademark’s original owner uses
that trademark (e.g., a brand name or logo) in a way that weakens the
strength and uniqueness of the brand associations that consumers hold to
the original brand. The Trademark Dilution Revision Act of 2006 allows
for two ways dilution can happen. It can occur because acts of one party
either “impaired the distinctiveness” or “harmed the reputation” of the
original brand. These are commonly referred to as “dilution by blurring”
and “dilution by tarnishment.”
The most common approach to measuring dilution is that used by the
plaintiff in Nike, Inc. v. Nikepal Int’l, Inc. (Case No. 2:05-1468-GEB-JFM,
United States District Court, Eastern District of California.). Nikepal
used its brand on glass syringes and the like. The study produced in that
case measured dilution by citing that, when subjects were presented with
the word Nikepal, Nike came to mind. The court took this as evidence of
dilution. I would argue that this result is less than surprising. Nike is one
of the world’s best-known brand names. It would be hard to imagine that
the mention of Nikepal would not bring Nike to mind. But this measure
lacks construct validity in a dilution case. The court did not explain how
this association impairs the distinctiveness or harms the reputation of Nike.
More recent research has proposed and implemented a response
latency approach to measuring dilution (Morrin and Jacoby 2000; Pullig,
Simmons, and Netemeyer 2006). Subjects are presented with a series of
paired words and/or phrases on a computer screen. One word is a brand
and the other is a thing that might be associated with that brand. The
subjects’ task is to identify, yes or no, whether the association is appropri-
ate for the brand presented. For example, a study might present the target
brand of study paired with the product category it belongs to embedded
in a longer series of pairs (e.g., Heineken – beer). The proposed measure
of dilution in this case would be the speed and accuracy of the response to
the target brand – category association.
Unlike the association measurement in NikePal, response latency and
accuracy represent an attempt to operationalize a difference in brand
associations. However, the connection between response latency and
impairing distinctiveness or harming reputation in a marketplace remains
tenuous at best.
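To make the mechanics of such a comparison concrete, the sketch below contrasts simulated response latencies for an exposed group and a control group using Welch’s t-test. It is purely illustrative: the group sizes, latency distributions, and the choice of a simple t-test are my assumptions, not features of any study cited above.

```python
# A minimal sketch of a response-latency dilution comparison.
# All numbers are hypothetical; a real study would use recorded
# per-respondent latencies for the target brand-category pair
# (e.g., Heineken - beer), along with accuracy rates.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated reaction times (milliseconds) for the target association.
control = rng.normal(loc=850, scale=120, size=200)  # not shown the junior mark
exposed = rng.normal(loc=905, scale=130, size=200)  # shown the junior mark

# Welch's t-test: is the exposed group reliably slower (more "diluted")?
t, p = stats.ttest_ind(exposed, control, equal_var=False)
print(f"mean slowdown: {exposed.mean() - control.mean():.0f} ms, "
      f"t = {t:.2f}, p = {p:.4f}")
```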

Internal Validity

The common litigation experiment follows the test/control paradigm.
The appropriate study design to measure consumer perceptions that
indicate a legal violation associated with the use of a hypothesized causal
variable (e.g., being misled by a “false” advertisement, being confused
by an infringed trademark, choosing a competitor’s product because of
an infringed patent) involves dividing respondents into two groups, test
and control. The respondents in each group are shown one of two sets of
stimuli. The test group is shown a set of test stimuli containing the hypoth-
esized causal variable, and the control group is shown a set of control
stimuli not containing the hypothesized causal variable. The difference
between the levels of “behavior” exhibited by the test and control groups
commonly represents the net effect of the causal variable inherent in the
test stimuli. The control group literally controls for any random or system-
atic error that may be introduced, thereby serving as a baseline for deter-
mining the net effect of the causal variable introduced in the test stimuli.
The measure of “behavior” found within the control group represents the
behavior caused by all alternative explanations for responses that could be
interpreted as examples of the alleged legal violation as well as whatever
random noise is in the system. For the control stimulus to be effective, it
should be as similar to the test stimulus as possible, except for the legally
challenged causal variable. The greater the difference between the test and
control stimuli with respect to non-challenged elements, the greater the
chance that some non-challenged element could have caused responses
that might appear to have been caused by the challenged element.
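The arithmetic behind this paradigm is simple enough to sketch. In the example below, the counts are hypothetical; the net effect is the difference in response rates between the two groups, and a two-proportion z-test (one common choice among several) gauges whether that difference exceeds sampling noise.

```python
# A minimal sketch of the test/control "net effect" computation.
# Counts are hypothetical and not drawn from any case discussed here.
import numpy as np
from scipy.stats import norm

confused_test, n_test = 64, 200  # respondents shown the challenged stimulus
confused_ctrl, n_ctrl = 22, 200  # respondents shown the control stimulus

p_t, p_c = confused_test / n_test, confused_ctrl / n_ctrl
net_effect = p_t - p_c  # responses attributable to the challenged element

# Two-proportion z-test under the pooled null of no difference.
p_pool = (confused_test + confused_ctrl) / (n_test + n_ctrl)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_test + 1 / n_ctrl))
z = net_effect / se
p_value = 2 * norm.sf(abs(z))  # two-sided

print(f"net effect = {net_effect:.1%}, z = {z:.2f}, p = {p_value:.4f}")
```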
For example, in litigation experiments surrounding false advertising
allegations, researchers are generally charged with examining whether
a particular component of an ad caused a specific (false) impression in
consumers’ minds. In Pharmacia Corp. v. GlaxoSmithKline Consumer
Healthcare, L.P. (Case No. 3:02-cv-05292-MLC, United States District
Court, District of New Jersey, 2003), Pharmacia’s litigation experiment
used a control that, unlike the challenged ad, was not comparative. The
court rejected the experiment because the study failed to control for
the possibility that consumers would have pre-existing beliefs that any
comparative ad implies superiority.

External Validity

As noted above, the emphasis on external validity provides a major distinc-
tion between litigation experiments and academic consumer research ones.
Two important experimental considerations feed into this distinction, the
target population for the study and the experimental task and context. Both
must match the circumstances surrounding the claim as closely as possible.
In general, litigation experiments collect data from a sample of con-
sumer respondents with an eye toward generalizing the results from
those data toward a larger group of consumers. The target to which the
survey is designed to generalize is called the relevant target universe
or population. The selection of the proper universe is crucial because,
even if appropriate questions are asked in an appropriate manner, if the
questions are asked of the wrong individuals, the results of the survey
will be irrelevant to the issue at hand. Different groups will come to the
experiment with different sets of knowledge, perceptions, and experiences.
Each group will make judgments about experimental stimuli through the
lenses of these sets of knowledge, perceptions, and experiences. As such,
judgments made by different groups in the same situation may well differ.
The law has taken the (appropriate) position that, for a litigation experi-
ment to be relevant for a given dispute, the experiment must be conducted
on the correct group. That group must be inferred from the legal statute
under which the claim is lodged in the matter under dispute. The guiding
principle is that the target universe should consist of those people who are
likely to be harmed by the alleged infraction.
For example, in trademark cases involving likelihood of confusion,
the appropriate group is current and potential customers of the alleged
infringer’s product. Those are the people who are most likely to be
confused by the alleged infringement. In Smith v. Wal Mart Stores, Inc.
(Case No. 1:06-cv-526-TCB, United States District Court, Northern
District of Georgia – Atlanta Division 2008), Wal-Mart filed a trademark
infringement suit against the marketer of T-shirts (and other items) with
“Wal*ocaust” and “Wal-queda” on them, alleging that consumers would
think these T-shirts were somehow related to Wal-Mart. The items were
sold on the website www.cafepress.com. Wal-Mart conducted a survey
designed to generalize to all people teenaged and above who used the
internet to search for product information and either used or would
consider using the internet to buy a t-shirt or other novelty with a slogan
on it. The court rejected the study on the grounds of an overly broad
universe. It concluded that the most relevant group of consumers were
the customers of the CafePress website. In hindsight, the logic is clear.
Only those customers would encounter the t-shirts and other products.
Furthermore, those customers likely have a different view of the website
and what it offers than the population at large. As such, they would use
that lens to interpret the allegedly infringing products as they likely would
in the marketplace.
In addition, context is critical to the external validity of a litigation
experiment. Contexts and settings govern the extent to which results from a
survey can be generalized from the research sample to the target universes,
populations, and settings outside of the study. Consumer judgments are
(externally) valid only insofar as they are made about stimuli presented in
the manner in which consumers would encounter them in the real world.
vides survey respondents with information they use in forming judgments
when faced with a question about that stimulus (Sudman, Bradburn, and
Schwarz 1996, Chapter 5). As the noted trademark scholar J. Thomas
McCarthy writes, “a survey is designed to prove the state of mind of a
prospective purchaser.” Therefore, “the closer the survey context comes
to marketplace conditions, the greater the evidentiary weight it has”
(McCarthy 2016, §32:163).
Litigation experiments are often criticized for not adequately replicating
marketplace contexts. In particular, different contexts provide different
sets of information that the experimental subject must use in forming a
judgment. For external validity to hold, the subject must have access to the
same information that s/he would have in the actual marketplace.
In Fancaster, Inc. v. Comcast Corporation, etc. (Case No. 08–2922-
DRD, United States District Court, District of New Jersey), the court
rejected the defendant’s experiment because it presented the context of its
fancast.com website via a printout of static screenshots instead of a live
version. The court remarked that the website was meant to allow consum-
ers to browse and interact with the website via hyperlinks. This certainly
deprived subjects of information they would have had access to that could
have enabled them to identify Comcast as the sponsor of the website.

How Academics Can Learn from the Courtroom

The recent trial, People of the State of California v. Overstock.com (Case
No. RG10–546833, Superior Court of the State of California in and for
the County of Alameda), centered on the application of (what is now
relatively old) research on how consumers interpret reference prices.
Overstock has always engaged in comparative advertising techniques
on its website (http://www.overstock.com and later http://www.o.co). It
displays on each of its product pages a higher price for the product above
the price offered by Overstock. At the time of trial (2013), the terminol-
ogy applied to the higher price was “compare.” The use of such pricing
comparisons is referred to in the industry as “advertised reference price(s)”
or “ARP(s).” The State of California alleged that Overstock’s ARPs had
been false or misleading because consumers thought that the ARP was
the prevailing market price or the manufacturer’s list price. It was neither.
Overstock instructed its employees to choose the highest price they could
find as an ARP or to construct an ARP using a formula that applied an
arbitrary multiplier to Overstock’s wholesale cost.
How the presence and level of ARPs impact consumer perceptions
of value and purchase intent was a significant topic in the literature 20
years ago. The general findings were that increases in ARPs increased
the consumer’s perception of product quality, increased purchase inten-
tion, and decreased inclination to search (Compeau and Grewal 1998).
However, that research was done in an environment in which the internet and online
shopping were either nascent or non-existent. It is an open question
as to whether these findings still hold as we close in on 2020, at a time
when online shopping and the ability to compare through comparison
engines and individual browsing have become widespread. Indeed, in an
attempt to mitigate any damages, Overstock claimed that the statement
“compare” had little or no effect. Fortunately, this could be tested with
unimpeachable construct, internal, and external validity.
Visitors to the Overstock website were randomly directed to different
webpages for the same product, with only one variable on the two sets
of pages changed. One set of visitors who clicked on a product were
shunted to the existing standard format displaying a “Compare at” price,
a “Today’s price” and a “You save” calculation of the difference in dollars
and percentage terms; the other set of visitors were directed to a page that
deleted the “Compare at” price and “You save” calculation.
Visitor responses to the two sets of pages were then tracked, tabulated,
and compared in an attempt to measure the significance of the variable
being tested.
This experiment provided significant insight into consumer behavior
because, unlike the early academic studies that tested consumers’ stated
intentions, the litigation study examined actual purchase decisions (i.e.,
it had external validity). The results showed a conversion rate of 3.74
percent when the ARP information was present compared to a 3.71 per-
cent conversion rate when such information was absent — or a 1 percent
“uplift” in conversion rate when ARPs were present. Overstock used these
results to argue that the ARP had very little (if any) effect. The court was
not persuaded. It concluded that, even if the tests accurately captured the
relatively small size of the “bump” engendered by the use of ARPs, the
small bump grows large when accumulated over time. Furthermore, the
court opined that this bump had been larger in the past.
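Both the reported uplift and the court’s accumulation point can be reproduced with back-of-the-envelope arithmetic, as in the sketch below. Only the two conversion rates come from the case; the annual visitor volume is a hypothetical figure of my own.

```python
# Reproducing the reported "uplift" and the court's accumulation point.
# Conversion rates are from the text; the visitor volume is hypothetical.
p_with_arp, p_without_arp = 0.0374, 0.0371

uplift = p_with_arp / p_without_arp - 1
print(f"relative uplift: {uplift:.1%}")  # roughly 1 percent

# A small per-visitor bump accumulates. With, say, 10 million product-page
# visitors a year (an assumed figure), the ARP display would add:
visitors = 10_000_000
extra_orders = visitors * (p_with_arp - p_without_arp)
print(f"extra orders per year: {extra_orders:,.0f}")  # 3,000 at these rates
```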
But that is not the point here. The point is that these results call into
question the findings of prior academic literature in a meaningful way.
Indeed, the external validity demanded by the courtroom has raised
doubts as to whether the scholarly findings of prior years hold at all
beyond the time and experimental setting they were conducted in. Future
research is indeed called for. The applied particularistic research required
here provides inspiration for future basic universalistic research.

Concluding Remarks

Over the years I have developed the view that good science is good science,
in our journals and in the courtroom. Experiments in both arenas are held
to similar “classes” of standards: construct, internal, and external validity.
The biggest differences in my view are twofold. First, academic consumer
research experiments must be interesting in the sense that they are held to
a standard of advancing the state of human knowledge. Litigation experi-
ments must be interesting in the sense that they must address particular
issues important for the matter at hand. Second, litigation experiments
place a much greater emphasis on external validity for the same reason.
They must address particular issues important for the matter at hand.
I have written elsewhere about the need for greater cooperation between
academia and practice (Steckel and Brody 2001). The courtroom is but
another example. I hope the results of the Overstock experiment persuade
the reader that the external validity required by the courtroom provides
scholars with the opportunity to test and (as in this case, possibly) revise
their theories in that and other contexts. As Lambrecht and Tucker (this
volume) point out, as the world becomes digitally enabled, the ease of
doing this can only increase.

REFERENCES

Compeau, Larry D. and Dhruv Grewal (1998), “Comparative Price Advertising: An
Integrative Review,” Journal of Public Policy and Marketing, 17 (Fall), 257–73.
Fisher, Ronald A. (1925), Statistical Methods for Research Workers, Edinburgh: Oliver and
Boyd.
Fisher, Ronald A. (1926), “The Arrangement of Field Experiments,” Journal of the Ministry
of Agriculture of Great Britain, 33, 505–13.
Hauser, John R. (2012), “The Future of Branding and Intellectual Property in Marketing.”
Presentation made at the Conference on Brands and Branding in Law, Accounting, and
Marketing, Kenan-Flagler Business School, University of North Carolina, April 13.
Lambrecht, Anja and Catherine E. Tucker (this volume), “Field experiments,” in Handbook
of Marketing Analytics.
McCarthy, J. Thomas (2016), McCarthy on Trademarks and Unfair Competition, 4th ed.,
Eagan, MN: Thomson Reuters.
Morrin, Maureen and Jacob Jacoby (2000), “Trademark Dilution: Empirical Measures for
an Elusive Concept,” Journal of Public Policy and Marketing, 19 (2), 265–76.
Pullig, Chris, Carolyn J. Simmons, and Richard G. Netemeyer (2006), “Brand Dilution:
When Do New Brands Hurt Existing Brands?” Journal of Marketing, 70 (April), 52–66.
Shadish, William R., Thomas D. Cook, and Donald T. Campbell (2002), Experimental and
Quasi-Experimental Designs for Generalized Causal Inference, Boston: Houghton-Mifflin.
Steckel, Joel H. and Ed Brody (2001), “2001: A Marketing Odyssey,” Marketing Science, 20
(4), 331–36.
Sudman, Seymour, Norman M. Bradburn, and Norbert Schwarz (1996), Thinking About
Answers: The Application of Cognitive Processes to Survey Methodology, San Francisco:
Jossey-Bass.

30.  Conjoint analysis in litigation
Sean Iyer

Introduction
First developed formally in the early 1970s, Conjoint Analysis has been
used widely in marketing science to study and measure preference.
Numerous corporations have used and continue to use Conjoint Analysis
to make business decisions. With its use in the global smartphone litiga-
tion wars, Conjoint Analysis has enjoyed more than its 15 minutes of
fame in recent high-stakes litigation.1 Variants of it are routinely used
in intellectual property disputes, in product liability class actions, and in
consumer protection food-labeling matters.
Conjoint Analysis as a method of proof in the courtroom may have
arisen as a response to more stringent standards for admissibility of expert
opinions related to damages. Courts increasingly demand sophisticated
damages models tied to facts of a case. Take patent infringement, where
the call for market-based evidence is particularly strident. Uniloc v.
Microsoft2 and its progeny have laid to rest such venerable expert career
platforms as the 25 percent “rule of thumb.” This so-called rule, which is
more accurately thought of as a “wink wink, say no more” assumption,
used 25 percent as a “reasonable” percentage applied to the profit margin
of the accused product or service that was implicated by the intellectual
property in question.
Other decisions, notably Cornell v. HP,3 Lucent v. Gateway,4 and Laser
Dynamics v. Quanta5 have catechized unfounded entitlements to the
so-called Entire Market Value (EMV) of a product or service as a basis
for calculating damages. (EMV is the market price of the accused product
or service in question, instead of, say, a component of the device that
“contains” the accused functionality.) The doctrine of apportionment,
enshrined in the nineteenth century Supreme Court Garretson decision,
wherein the royalty base is “apportioned” to a portion of the product or
service, is back in vogue.6 And courts have also tightened standards with
respect to “comparable” license agreements in Georgia Pacific damages
analyses (see, for example, the Federal Circuit’s guidance in ResQNet.com
v. Lansa).7 The ResQNet decision held that “the trial court must carefully
tie proof of damages to the claimed invention’s footprint in the market
place.”8
Courts have also demanded rigor in class actions. The Supreme Court’s
Comcast decision calls for a “rigorous analysis” in the class certification
stage, requiring that plaintiffs’ damages methodologies be sufficiently
tied to the asserted liability theories.9 Given this increased scrutiny and
concomitant uncertainty with respect to what methods pass Daubert
screens, Conjoint Analysis has emerged as a potential option for litigants
and their experts.10 Like any scientific method, it is subject to misuse and
misinterpretation. With that caution in mind, I discuss some overarching
features of Conjoint Analysis.

It Is Widely Used and Accepted in Business and Academia

Conjoint Analysis, as a general methodology, is recognized by academ-
ics and industry practitioners as one of the most widely studied and
applied quantitative methods in marketing science.11 Variants of Conjoint
Analysis have been used by dozens of companies. It was used to design
AT&T’s first cellular phone, the EZ-Pass toll collection system, and
new services for the Ritz Carlton and Marriott hotel chains.12 The basic
mathematics behind Conjoint Analysis has been developed, vetted, and
peer-reviewed in academic studies in academic journals. In academia,
applications of Conjoint Analysis have been used to measure consumer
preferences for features of numerous products, including smartphones,
automobiles, and GPS devices.13 Over the years, Conjoint Analysis, when
properly implemented, has predicted consumer preferences.14
In short, Conjoint Analysis—as an overall methodology—is tested,
used, and relied upon in academia and business. It has been used in numer-
ous and varied product and service segments. Because widespread accept-
ance in the scientific community is an important factor in determining the
admissibility of expert evidence,15 this makes Conjoint Analysis—again,
as an overall methodology—appealing for litigators who wish to avoid
untested methods with limited applicability.
Even in cases where Conjoint Analysis is used in a manner that may be
novel in litigation, comparable uses in related contexts outside of litigation
may be persuasive to a judge or jury. For example, in Apple v. Samsung
II, Samsung argued that Apple’s Conjoint Analysis evidence should be
excluded since “no court has allowed an expert to use conjoint surveys to
quantify demand as [Apple’s expert] does here.”16 The court noted that
“the test for admission of expert opinion is not whether a court has admit-
ted such an opinion in a previous case” and that the analysis presented
by Apple “is adequately supported by marketing literature.”17 Citing a
number of academic studies, the court decided that the methodology used
in the case “is sufficiently reliable to survive Daubert scrutiny because [the
Apple expert’s] methodology is substantially similar to those employed
in these studies.”18 But other applications of Conjoint Analysis have not
fared as well, as I discuss below.

It Relies upon Consumer Surveys, Which Have Long Been Accepted by Courts

Consumer surveys, either through fact discovery or sponsored by an
expert, have been used in various types of litigation. Properly implemented
consumer surveys have the benefit of collecting input from an appropri-
ate group of consumers and can target a representative sample of actual
purchasers of the product at issue to generate feedback from the relevant
market. Best practices in survey design have been widely discussed and
debated and are available for an expert to consult when designing a con-
joint survey or when defending against Daubert challenges.19 While courts
have excluded survey evidence when the survey in question was not prop-
erly designed or executed, survey evidence as a general evidentiary tool is
not controversial; properly conducted surveys continue to be accepted by
courts.20
In Oracle v. Google, the court identified problems with the design of the
survey and excluded certain results based on the Conjoint Analysis per-
formed by plaintiff’s expert, finding that the expert’s “conjoint
analysis in this particular instance is an unreliable predictor.” Notably,
however, the judge held that “consumer surveys are not inherently
unreliable for damages calculation.”21

It Can be Tailored to Address the At-issue Feature in a Particular Litigation

The central idea behind Conjoint Analysis is that consumers’ prefer-
ences for a product can be decomposed into the features of the product
or service. Features can be physical characteristics (such as the size of an
automobile) or intangible consumer benefits (such as brand association).
The expert designs a consumer survey and asks respondents to choose
between different hypothetical products that vary in one or more features.
This gives so-called conjoint data, which is then analyzed to quantify the
contribution of an individual feature to the overall product or offering.
What makes Conjoint Analysis attractive is the flexibility the expert has
in creating the hypothetical products and specifying the underlying fea-
tures.22 For example, in patent infringement matters, a properly designed
and well executed survey can—at least in theory—include a feature that
maps to the allegedly infringing functionality. In a case with allegations of
misleading labeling, a feature can be designed to map onto the allegedly
“misleading” label (and used to capture the incremental valuation of the
consumers over a “non-misleading” label). Therefore, Conjoint Analysis
may be a useful methodology so long as the expert is careful in designing
the survey and interpreting its results.

Using Conjoint Analysis in Litigation

The Basics

At the heart of conjoint analysis is the insight that consumers’ preferences
for a product can be decomposed into attributes that make up the product.
So, a natural first step in Conjoint Analysis is to identify the product
attributes. For example, a hypothetical Conjoint Analysis of smartphones
may include attributes such as storage capacity, battery life, display size,
and price. Each attribute in the survey can have multiple values, known as
levels. For example, the levels could be:

l Storage capacity: 16GB, 32GB, 64GB
l Battery life (measured in talk time): 300 minutes, 600 minutes, 900
minutes
l Display size: 3.5’’, 4.7’’, 5.5’’
l Price: $199, $399, $599.

A reasonably representative set of attributes and associated levels can
then be used to create hypothetical products called profiles, which are
then presented to the consumers for evaluation. Marketing researchers
have proposed and implemented different methods for this evaluation and
for designing the underlying survey, such as Full Profile (or ratings-based)
Conjoint Analysis, paired comparisons, Choice-Based Conjoint Analysis
(“CBC”), and Adaptive Conjoint Analysis.23
CBC, regarded as the state-of-the-art,24 is appealing because its mechan-
ics closely resemble an actual purchase decision made by consumers.
In CBC, survey respondents are asked to choose between hypothetical
profiles each representing a product. This exercise is called a choice task.
The profiles presented to the respondents all have the same attributes, but
differ in some of the levels. This forces consumers to make trade-offs when
making their choice. For example, in a simplified choice task, a respondent
may be shown the following profiles:

l Smartphone A: 16GB storage, 300 minutes of battery talk time, 4.7”
display screen, price of $199.
l Smartphone B: 16GB storage, 900 minutes of battery talk time, 4.7”
display screen, price of $399.

In this simplified example, Smartphone B offers 600 additional minutes
of battery talk time but it also comes with a $200 increase in price. (In
this example, the hypothetical phones do not differ in any other attribute
levels.)
A respondent faced with this choice task will have to decide which
one he values more and choose accordingly. If the respondent chooses
Smartphone B, one can infer that the value of 600 minutes of additional
talk time is at least worth the extra $200 for that respondent. This choice
gives one data point (technically, an “inequality constraint”) that can be
used in the statistical analysis and estimation of consumer valuations.
Each choice exercise may include more profiles to choose from (four pro-
files are routinely used), as well as a “None of these” alternative, also
known as an “outside option.” The number of attributes and levels may
be larger and more varied. Additionally, each respondent is given multiple
(potentially as many as 12 to 16) different choice exercises and hundreds
of respondents are included in the survey. In this way, the researcher
generates a large number of inequality constraints (at the respondent
level each respondent does a number of choice tasks and several hundred
respondents take the survey).
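To make the inequality-constraint idea concrete, the sketch below encodes the Smartphone A versus B task from above under one common coding scheme: dummy-coded attribute levels plus a linear price term. The coding is illustrative; practitioners use several variants (e.g., effects coding).

```python
# Encoding the Smartphone A vs. B choice task as data.
# Columns (one common dummy coding; illustrative only):
# [32GB, 64GB, 600min, 900min, price_in_dollars]
import numpy as np

X_A = np.array([0, 0, 0, 0, 199.0])  # 16GB and 300 min are the base levels
X_B = np.array([0, 0, 0, 1, 399.0])  # 900 min dummy on, higher price

# A respondent who picks B reveals X_B @ beta >= X_A @ beta for his or her
# utility weights beta, i.e., beta_900min >= -beta_price * 200: the extra
# talk time is worth at least the $200 premium to that respondent.
# Stacking one such comparison per task, per respondent, yields the raw
# material for the statistical estimation described next.
```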
The next step in Conjoint Analysis is to analyze the data collected
by the consumer survey. Depending on the type of Conjoint Analysis
used in collecting data, the estimation can be performed via a variety of
methods, such as regression-based models, random utility models, and
Hierarchical Bayes estimation.25 The output of the statistical analysis is
a set of estimates called partworths, which are measures of utility. The
analysis generates partworths for each level of each attribute, measuring
how much value consumers place on that level. In litigation applications,
partworths can then be used to calculate metrics of interest. Examples of
such metrics include the following (a worked sketch appears after the list):

l Willingness-to-pay (“WTP”): When price is included as one of the
attributes of the survey, Conjoint Analysis generates partworths
for price levels, similar to partworths for levels of other attributes.
In such cases, it would be possible to infer monetary valuations by
using information from trade-offs between levels of other attributes
and levels of prices.26 With additional analysis, WTP data derived
from conjoint analysis can be used to measure how much consumers
are willing to pay for increasing from one level of an attribute to the
next level. WTP is related to but distinct from market price. WTP,
when properly constructed, gives a demand side measure and not
equilibrium market price impact.
l Preference shares: Partworths can also be used in “market simula-
tions” to estimate what share of the population would be willing
to buy a product, based on the attributes and levels specified for
the product. An extension of this concept is a willingness-to-buy
(“WTB”) measure. The WTB measure starts by calculating the
share of the population who would be willing to buy a product and
assesses the decline in that share when a certain feature is removed
from the product. Therefore, the WTB measure can be useful in
addressing what-if questions such as “what percent of the purchas-
ers would cease to buy a product if the product did not incorporate
an infringing feature?”
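The sketch below pulls these pieces together on simulated data: it estimates partworths with a pooled conditional logit (a deliberately simplified stand-in for the Hierarchical Bayes estimation mentioned above) and then computes a WTP figure and a preference-share comparison of the WTB type. All attribute levels, sample sizes, and “true” partworths are invented for illustration.

```python
# A minimal end-to-end sketch on simulated CBC data: pooled conditional
# logit partworths, then WTP and preference shares. All numbers are
# invented; real studies typically use Hierarchical Bayes estimation.
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(7)

# Coding: [32GB, 64GB, 600min, 900min, price], with 16GB / 300min as bases.
TRUE_BETA = np.array([0.4, 0.8, 0.5, 0.9, -0.005])
PRICES = [199.0, 399.0, 599.0]

def random_profile():
    x = np.zeros(5)
    storage, battery = rng.integers(3), rng.integers(3)
    if storage:
        x[storage - 1] = 1.0       # 32GB or 64GB dummy
    if battery:
        x[battery + 1] = 1.0       # 600min or 900min dummy
    x[4] = PRICES[rng.integers(3)]
    return x

# 300 respondents x 12 tasks, 3 profiles per task; choices follow a logit rule.
tasks = []
for _ in range(300 * 12):
    X = np.stack([random_profile() for _ in range(3)])
    u = X @ TRUE_BETA + rng.gumbel(size=3)
    tasks.append((X, int(np.argmax(u))))

def neg_log_lik(beta):
    return -sum(X[c] @ beta - logsumexp(X @ beta) for X, c in tasks)

beta = minimize(neg_log_lik, np.zeros(5), method="BFGS").x

# Willingness-to-pay for moving from 300 to 900 minutes of talk time.
wtp_900 = beta[3] / -beta[4]
print(f"WTP for 900 vs. 300 min battery: ${wtp_900:,.0f}")

# Preference shares / willingness-to-buy: share choosing a focal product
# with and without the feature, against a fixed competitor profile.
competitor   = np.array([0, 1, 0, 0, 399.0])  # 64GB, 300 min, $399
with_feat    = np.array([1, 0, 0, 1, 399.0])  # 32GB, 900 min, $399
without_feat = np.array([1, 0, 0, 0, 399.0])  # same, but 300 min

def share(focal):
    u = np.array([focal @ beta, competitor @ beta])
    return float(np.exp(u[0]) / np.exp(u).sum())

print(f"share with feature: {share(with_feat):.1%}, "
      f"without: {share(without_feat):.1%}")
```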

Some Tactical Considerations

Conjoint Analysis, when done properly, can be expensive and time-
consuming. A threshold question for counsel is whether a different type
of survey, say a direct elicitation survey, will do the job. The virtue of
direct elicitation surveys is simplicity. They are direct and relatively easy
to administer, do not usually require involved statistical analysis of survey
data, and are reasonably inexpensive. Therefore, they can be a simple way
to present hypothetical scenarios to respondents and query the extent to
which there is demand for a patented feature. But direct elicitation surveys
may improperly elevate the relevance or importance of a particular feature
by focusing the respondent on that feature. Another challenge in direct
elicitation surveys is to reasonably approximate the purchase context. To
the extent that a direct elicitation survey strays too far from the “moment of
truth” of the purchase decision, it is subject to attacks regarding validity.
Notwithstanding the relative attractiveness of Conjoint Analysis, I
briefly discuss some tactical considerations with the use of Conjoint
Analysis in a litigation context. These include whether to use Conjoint
Analysis as an affirmative club or as a defensive shield; timelines for con-
ducting a conjoint survey; whether to run a pilot study; and coordinating
with the damages team. I summarize these considerations below.
Conjoint Analysis appears to have found more favor on the plaintiff’s
side. For example, Conjoint Analysis results have been used to support
Georgia-Pacific analysis conducted by the damages expert in patent
infringement. In some cases, the outputs from Conjoint Analysis have been
used as mathematical inputs to a plaintiff’s damages model. It has also
been used as an offensive move in class certification. In theory, Conjoint
Analysis can be used as a “defensive shield,” for example, in rebutting
improperly designed surveys that generate biased estimates by improperly
elevating the relevance or importance of a particular feature.
An important consideration when contemplating Conjoint Analysis in
litigation is the time it takes to properly design and conduct such a study
and analyze the results. Typically, the more complex the product or
service being tested, the longer it takes to put together a conjoint survey.
Accordingly, I expect defensive uses to be more limited, not least because
the paucity of “response time” on the defensive side constrains the poten-
tial use of a properly-done conjoint analysis.
Conjoint Analysis typically has the following stages:

Planning stage
In this stage, the survey expert decides whether Conjoint Analysis is
likely to be useful for the litigation issue at hand and, if so, estimates
how long it will take to design and implement the survey and analyze
the results. This stage typically takes several days; however, the timeline
may extend to many weeks depending on the complexities of the matter,
the theory of liability, and the products and features at issue. The key
consideration for the expert in this stage is to have enough understand-
ing of the technology at issue and the marketplace to assess whether it
would be possible to design attributes and levels in a meaningful way.
In complex cases, such assessment may require discussions with techni-
cal experts to understand the patented technology and how it impacts
consumer-facing features of the product. It may also require review of
publicly available information to understand how consumers approach
the purchase process in the relevant market and the information they
are exposed to.

Design stage
In this stage, the conjoint expert prepares and vets the “survey question-
naire,” decides which questions to ask, how to word questions, what
information and instructions to provide to the respondents, and so on.
At this stage, the conjoint expert should ensure that the survey does not
suffer from design problems (such as unclear or leading questions) and
document pretest results and outcomes. A second task in the design stage
is identifying the features to be included in the survey, finalizing feature
descriptions, and choosing the attributes and levels. As I discuss below, feature
selection is a theme that repeatedly appears in motions attempting to
exclude Conjoint Analysis results and is carefully addressed by courts in
numerous Daubert orders. Accordingly, this stage requires care and docu-
mentation of evidence supporting particular feature selection decisions.
The design stage can take several weeks.

Sampling and administration stage
This is the data collection stage where the survey is fielded. A repre-
sentative sample of the appropriate universe of consumers is targeted and
responses of those who complete the survey are recorded for analysis. This
stage typically takes one to three weeks, though this period can be longer
for harder to reach populations.

Analysis and reporting stage
In the final stage, the survey expert undertakes statistical analysis using the
collected data, estimates partworths and calculates the relevant metrics.
This stage can take anywhere between one and three weeks.
Another consideration is whether or not to run a pilot study. Pilot
studies can be thought of as small-scale dress rehearsals for the main
survey and can be useful to detect potential issues or to test various design
alternatives to assess which is best received by respondents and is likely to
generate the most reliable results.27 Designing and running a pilot study
adds time and expense to the process. Also, parties should be mindful of
issues related to discoverability of pilot studies.28
In summary, putting together a defensible Conjoint Analysis can be
time consuming, particularly in cases that involve complex products. A
conjoint study that is an afterthought to a damages claim and comes late
in the case-planning calendar can severely handicap the expert sponsoring
the survey. Ideally, the survey expert should be involved in the early stages
of the litigation and have sufficient time to interact with technical experts
to understand the tested features. Also, early coordination with the dam-
ages expert may be useful to ensure that the survey methodology is well
understood and properly applicable to and relevant for the particular
model of damages.

Examples of Conjoint Analysis in Litigation

To identify cases involving conjoint analysis, I searched Westlaw for litigation
cases in the last 10 years and used the search term “conjoint analysis.” This
resulted in 25 distinct cases. Applications of Conjoint Analysis in litigation
are focused in two areas: intellectual property, which includes cases related
to copyright infringement and patent infringement, and consumer protec-
tion, which includes cases related to product liability, false advertising,
and food and beverage labeling.29 My search also showed a recent trend
in product liability class action cases: plaintiffs propose conjoint analysis
and/or a hedonic regression as a means to estimate damages during the class
certification stage.30 Below, I summarize salient aspects of a selection of
matters where conjoint analysis has been used by litigants.

Oracle America, Inc. v. Google Inc.

Oracle America, Inc. v. Google Inc. is a long-running dispute related
to Oracle’s copyright and patent claims regarding Google’s Android
operating system.31 In this case, the Oracle expert used conjoint analysis
to predict the preference shares of smartphones.32 Prior to running his
survey, the expert conducted focus group interviews to understand which
features consumers consider while purchasing a smartphone. These inter-
views showed 39 features “that real-world consumers said they would
have considered when purchasing a smartphone, including battery life and
cellular network.”33 Of these 39 features, the Oracle expert included seven
features in his conjoint survey. Three out of these seven features were
related to the patented functionality.
The court opined that the survey produced unreliable market share cal-
culations and excluded the conjoint survey.34 The Court held that Oracle’s
expert failed to apply reasonable criteria to choose the distraction features
in his survey. Specifically, the court noted that the Oracle expert omit-
ted important features, such as battery life, Wi-Fi, weight, and cellular
network in his survey design.
To address the court’s concerns, the Oracle expert explained that “it
is not necessary in a conjoint analysis to test every distinguishing feature
that may matter to consumers because study participants are told to hold
all other features constant.”35 But the court determined that “the conjoint
study’s own irrational results shows that study participants did not hold
all other, non-tested features constant” and gave the following example
for study’s irrational results: “the results show that one-quarter of all par-
ticipants preferred (9%), or were statistically indifferent between (16%), a
smartphone costing $200 to a theoretically identical smartphone costing
$100 . . . The likely explanation for this irrational result is that survey
respondents were not holding non-specified features constant and instead
placing implicit attributes on features such as price.” Consequently, the
court ruled that “[t]he conjoint analysis’ determination of market share is
STRICKEN.”36
Notably, the court was not against the use of conjoint analysis as a
methodology. In fact, the court stated: “[i]f the conjoint analysis had been
expanded to test more features that were important to smartphone buyers
(instead of the four non-patented features selected for litigation purposes),
then the study participants may not have placed implicit attributes on the
limited number of features tested.”37

Apple v. Samsung I

The sprawling Apple v. Samsung I case involved use of conjoint analysis
to analyze consumer WTP for certain smartphone and tablet features.38
Apple’s expert conducted two conjoint surveys for this matter, one for
smartphones and one for tablets.39 In both surveys, the expert included non-
patented features in his survey design along with the patents-of-interest.40
Apple’s conjoint analysis expert measured consumers’ WTP for Apple’s
iOS system touchscreen features, such as “rubberbanding,” tap to re-
center after zoom, and two finger gestures, used in allegedly infringing
Samsung smartphones and tablets.41 Apple’s expert also included distrac-
tion features such as size and weight, storage/memory, connectivity,
and number of apps. Apple’s expert used a rich design that relied on
graphical representations and animation videos to explain each feature
to the respondents and made sure that the respondents understood each
feature in the survey.42 The conjoint surveys survived Daubert and Apple’s
expert testified in court during the high-profile trial and retrial.43 (See the
Appendix for a more complete discussion of the conjoint-related issues.)

Schwab v. Philip Morris

In Schwab v. Philip Morris, plaintiffs claimed that the tobacco companies
deceived smokers into believing that “light” cigarettes “were less harmful
than ‘regular’ cigarettes, when in fact they were at least as dangerous and
defendants knew of their dangers.”44 Plaintiffs’ expert used conjoint analy-
sis to determine “whether health risks are a significant contributing factor
in consumer decisions to smoke ‘light’ cigarettes; what proportion of
‘light’ cigarette-smoking consumers relied on health risks as a significant
contributing factor to their decision; and how consumers and the market
would react to cigarettes with different levels of health risks.”45
Plaintiffs’ expert included the following features for cigarettes: “(i)
pack type (hard or soft); (ii) degree of perceived health risks; (iii) taste;
and (iv) price.” For the perceived health risks feature, the expert included
the following feature levels: “less than ‘ultra-light’ cigarettes, the same as
‘ultra-light’ cigarettes, the same as ‘light’ cigarettes, the same as regular
cigarettes, and greater than regular cigarettes.”46 The expert concluded
that health risks are a contributing factor in the choice of 90.1 percent
of “light” cigarette consumers.47 In addition, the expert determined that
health risks are ranked above every other feature, with the exception of
price.48
The district court determined that the expert’s testimony met the
requirements of Rules 702 and 703 of the Federal Rules of Evidence.49 And
the district court certified the class sought by the plaintiffs.50 However,
US Court of Appeals overturned this decision and decertified the class.51
Notably, the Court of Appeals ruled that the class cannot be certified
because “[i]ndividualized proof is needed to overcome the possibility that
a member of the purported class purchased Lights for some reason other
than the belief that Lights were a healthier alternative.”52

In re Whirlpool Corp. Front-Loading Washer Products Liability Litigation

In In re Whirlpool Corp. Front-Loading Washer Products Liability Litigation,
plaintiffs were Ohio purchasers of a certain type of Whirlpool’s front-
loading washing machines.53 Defendant was the Whirlpool Corporation.54
Plaintiffs asserted that the relevant washing machines “were designed
with inherent defects that cause them to accumulate residue, mold and/
or mildew, leading in some cases to accompanying odors.”55 Although
Whirlpool recommended a set of maintenance steps to prevent and reduce
the buildup in the machines, plaintiffs claimed that Whirlpool failed to dis-
close these maintenance tasks at the time of purchase to the consumers.56
Plaintiffs’ expert relied on conjoint analysis to estimate how much less
customers would be willing to pay for their washing machines had they
known prior to their purchase the additional maintenance Whirlpool
recommends to deter mold-related problems in their front-loading
machines.57 The plaintiffs’ expert chose six features for his conjoint survey:
type of machine (front- or top-loading), brand, price, efficiency, capac-
ity, and required maintenance.58 The maintenance attribute was the
feature-of-interest in the study. While the top-loading washers included
in the conjoint survey had only one maintenance attribute level—“No
additional maintenance required,” front-loading washers had six pos-
sible maintenance attribute levels. These attribute levels were: “(i) ‘No
additional maintenance required;’ (ii) ‘Must leave washer door open after
every wash;’ (iii) ‘Must inspect under door seal monthly and, if stained,
clean with bleach/water;’ (iv) ‘Must run a monthly clean or empty cycle
with bleach;’ (v) ‘Must purchase cleaning product that costs $2.33 a month
for use in empty wash cycle;’ and (vi) ‘Must leave washer door open after
every wash, inspect and clean under the door seal monthly, and run a
monthly clean or empty cycle with bleach or an approved washer cleaning
product.’”59
Plaintiffs’ conjoint expert concluded that “if Whirlpool had informed
consumers, pre-sale, of the additional maintenance Whirlpool recom-


mends to deter mold-related problems, consumers would have been will-
ing to pay between $143 and $419 less for those front-loading machines.”60
The Court denied Whirlpool’s motion to exclude the conjoint study.61 The
jury returned a verdict for Whirlpool in this matter.62

Khoday v. Symantec Corp.

The defendants in this matter were Symantec, a software company “that
sells internet security software products under the Norton brand,” and
Digital River, an “ecommerce website designer for online retailers.”63
From 2000 through June 2010, Digital River managed the online store
for Symantec’s Norton products. While running this online store, Digital
River offered a download insurance product—Electronic Download
Service (“EDS”)—which allowed customers to re-download Norton soft-
ware for up to one year if the customers “lost their original software by
purchasing a new computer or if their computer crashed.”
Symantec began the transition to manage its own storefront in
October 2009 and started to offer a similar download insurance product—
Norton Download Insurance (“NDI”)—which allegedly operated on
the same principles as EDS. NDI and EDS were automatically added
to customers’ shopping carts when they purchased a Norton software
product. Norton and Digital River charged “between $4.99 and $16.99 for
the download insurance products, depending on the value [they] believed
customers would be willing to pay for download insurance for a particular
type of Norton software.”
Plaintiffs, purchasers of these download insurance products, argued that
there were many free alternative options for customers to re-download the
purchased Norton software. Plaintiffs’ survey expert considered the value
of the download insurance products arising from the convenience these
products provide by automatically injecting a software key. If customers
did not purchase a download insurance product, they could find their
software keys in their software purchase confirmation email or by contact-
ing Symantec customer service via phone.64 The plaintiffs’ conjoint survey
expert used four different features in his survey design: (1) the number of
times the software could be re-downloaded, (2) the re-download process
(i.e., whether the product key is automatically injected), (3) whether a
computer security newsletter is provided, and (4) price.65
Based on his conjoint analysis, the plaintiffs’ expert concluded that “the
fair market value of automatic product key injection offered in conjunc-
tion with the redownload of Norton computer security software products
is between $0.05 and $0.16 per transaction.”66 On March 31, 2015, the
court granted plaintiffs’ motion for class certification and allowed the
conjoint survey.67 On October 8, 2015, the parties participated in media-
tion and agreed to settle.68

Notes

  1. See, for example, Apple Inc. v. Samsung Elecs. Co. Ltd. et al. No. 5:11-cv-01846, (N.D.
Cal. June 30, 2012) and Apple Inc. v. Samsung Elecs. Co. Ltd. No. 12-cv-00630, (N.D.
Cal. Feb. 25, 2014).
  2. Uniloc USA, Inc. v. Microsoft Corp., 632 F.3d (Fed. Cir. 2011).
  3. Cornell Univ. v. Hewlett-Packard Co., 609 F. Supp. 2d (2009).
  4. Lucent Techs., Inc. v. Gateway, Inc., 580 F.3d (Fed. Cir. 2009) (“Lucent v. Gateway”).
  5. LaserDynamics, Inc. v. Quanta Computer, Inc., 694 F.3d (Fed. Cir. 2012).
  6. For details of these decisions and a general overview of developments in reasonable
royalty damages, see Shankar Iyer, “Patent Damages in the Wake of Uniloc,” Spring
2012, Vol. 23, No. 3, Damages in IP Litigation, ABA Intellectual Property Section.
A more recent discussion is Zalin Yang, “Damaging Royalties: An Overview of
Reasonable Royalty Damages,” Berkeley Technology Law Journal.
  7. ResQNet.com, Inc. v. Lansa, Inc., 594 F.3d (Fed. Cir. 2010).
  8. ResQNet.com, Inc. v. Lansa, Inc., 594 F.3d 869 (Fed. Cir. 2010).
  9. Comcast Corp. v. Behrend, 133 S. Ct. 1426 (2013).
10. The Daubert standard is a rule of evidence regarding the admissibility of expert testimony.
11. For a general overview of early business applications of Conjoint Analysis, see Green,
Paul E., Abba M. Krieger, and Yoram Wind. “Thirty years of Conjoint Analysis:
Reflections and Prospects,” Interfaces, 31:3 (2001), S56-S73.
12. Orme, Bryan K., Getting Started with Conjoint Analysis: Strategies for Product Design
and Pricing Research, Research Publishers, 2010, p. vii.
13. See, e.g., Hauser, John R., Olivier Toubia, Theodoros Evgeniou, Rene Befurt, and
Daria Dzyabura. “Disjunctions of conjunctions, cognitive simplicity, and considera-
tion sets.” Journal of Marketing Research 47, no. 3, 2010, pp. 485-496.
14. See John R. Hauser & Vithala Rao, “Conjoint Analysis, Related Modeling, and
Applications,” in Marketing Research and Modeling: Progress and Prospects 141–168
(Jerry Wind & Paul Green eds., 2004).
15. Daubert, 509 U.S. at 594.
16. Apple v. Samsung II, Order granting in part and denying in part motions to exclude
certain expert opinions, February 25, 2014, p. 27.
17. Apple v. Samsung II, Order granting in part and denying in part motions to exclude
certain expert opinions, February 25, 2014, p. 28.
18. Apple v. Samsung II, Order granting in part and denying in part motions to exclude
certain expert opinions, February 25, 2014, p. 29.
19. See, for example, Diamond, Shari S., “Reference Guide on Survey Research,” in
Reference Manual on Scientific Evidence, Third Edition, Federal Judicial Center, 2011.
20. See, for example, Lucent v. Gateway, 1301, 1333-34.
21. Oracle v. Google, “Order Granting in Part and Denying in Part Google’s Daubert Motion
to Exclude Dr. Cockburn’s Third Report,” March 13, 2012, p. 15 (emphasis added).
22. Feature selection is a crucial step in the design of a conjoint survey which should be
done carefully and supported adequately. Failure to do so may be grounds for exclu-
sion of the survey. (See Oracle Am., Inc. v. Google, Inc., No. C 10-03561, 2012 WL
850705 (N.D. Cal. Mar. 13, 2012) (“Oracle v. Google”).)
23. See Vithala R. Rao, Conjoint Analysis, Springer (2014), Chapters 3 and 4.
24. Orme, Bryan K., Getting Started with Conjoint Analysis: Strategies for Product Design
and Pricing Research, Research Publishers, 2010, p. 45.
25. See for example, Hauser, John R., and Vithala R. Rao. “Conjoint analysis, related
modeling, and applications.” In Marketing Research and Modeling: Progress and
Prospects, pp. 141-168. Springer, 2004.
26. Orme, Bryan K. Getting Started with Conjoint Analysis: Strategies for Product Design
and Pricing Research, Research Publishers, 2010, p. 84. Chapter 9 has an explanation of
WTPs.
27. Diamond, S. S. (2011) “Reference Guide on Survey Research.” Reference Manual on
Scientific Evidence, 3rd edition, (Federal Judicial Center). p. 388.
28. Diamond, S. S. (2011) “Reference Guide on Survey Research.” Reference Manual on
Scientific Evidence, 3rd edition, (Federal Judicial Center). p. 389.
29. The search found 10 and 14 cases in the intellectual property and consumer protection
fields, respectively. The search also found an antitrust case involving conjoint analysis:
U.S. v. H & R Block, Inc. No. 11-00948 (BAH). In this matter, the plaintiffs’ expert
proposed conjoint analysis during the class certification stage.
30. See, for example, In re NJOY, Inc. Consumer Class Action, Scotts EZ Seed Litigation,
and Miller v. Fuhu Inc.
31. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 1.
32. Ryan, Christopher V., Avelyn M. Ross, and Kristen P. Foster. “4 Tips for Using Consumer
Surveys In Patent Cases – Law360.” Accessed April 14, 2016. http://www.law360.com/art​
icles/536189/4-tips-for-using-consumer-surveys-in-patent-cases.
33. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 14.
34. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 14.
35. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 15.
36. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 16.
37. Oracle America, Inc. v. Google, Inc., No. 3:10-cv-03561, “Order Granting in Part and
Denying in Part Google’s Daubert Motion to Exclude Dr. Cockburn’s Third Report,”
March 13, 2012, p. 16.
38. Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No. 5:11-cv-01846, “Order Denying Motion
for Permanent Injunction,” December 17, 2012, p. 1.
39. See Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No.
5:11-cv-01846-LHK, Exhibits D and E.
40. See Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No.
5:11-cv-01846-LHK, Exhibits D and E.
41. Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No. 5:11-cv-
01846-LHK, p. 34. The “rubberband” feature refers to the “scrolling effect that occurs
when [a user] reach[es] the end of a webpage” and the screen bounces back. See “Steve Jobs
and the ‘Rubber Band’ Patent.” Engadget. Accessed April 15, 2016. http://www.engadget.
com/2012/08/07/steve-jobs-and-the-rubber-band-patent.
42. See Expert Report of John R. Hauser, Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No.
5:11-cv-01846-LHK, Exhibits D and E.
43. Ryan, Christopher V., Avelyn M. Ross, and Kristen P. Foster. “4 Tips for Using
Consumer Surveys In Patent Cases – Law360.” Accessed April 14, 2016. http://www.
law360.com/articles/536189/4-tips-for-using-consumer-surveys-in-patent-cases; Bishop,
Bryan. “Apple Expert: Smartphone Owners Are Willing to Pay $100 Premium
for Features Samsung Copied.” The Verge, August 10, 2012. http://www.theverge.
com/2012/8/10/3234453/apple-expert-smartphone-owners-100-premium-copied-samsung-trial;
Apple Inc. v. Samsung Elecs. Co. Ltd. et al., No. 5:11-cv-01846, “Order
Granting-In-Part and Denying-In-Part Motions to Exclude Expert Testimony,” June
29, 2012, p. 7.
44. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 16.
45. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, pp. 308–309.
46. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, pp. 309–310.
47. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 311.
48. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 311.
49. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 316.
50. Schwab v. Philip Morris USA, Inc., No. 04-CV-1945(JBW), “Memorandum and Order,”
September 25, 2006, p. 27.
51. “McLaughlin v. Philip Morris USA, Inc. (Philip Morris v. Schwab), 522 F.3d 215 (2nd Cir.
2008) | Public Health Law Center.” Accessed April 15, 2016. http://publichealthlawcenter.
org/resources/mclaughlin-v-philip-morris-usa-inc-philip-morris-v-schwab-522-f3d-215-
2nd-cir-2008.
52. “McLaughlin v. Philip Morris USA, Inc. (Philip Morris v. Schwab), 522 F.3d 215 (2nd Cir.
2008) | Public Health Law Center.” Accessed April 15, 2016. http://publichealthlawcenter.
org/resources/mclaughlin-v-philip-morris-usa-inc-philip-morris-v-schwab-522-f3d-215-
2nd-cir-2008.
53. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 1.
54. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 1.
55. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 4.
56. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 37.
57. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 38.
58. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer
Products Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014,
pp. 40–42.
59. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer Products
Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014, p. 41.
60. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer
Products Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014,
pp. 41–42.
61. Order Regarding Daubert Motions, In re: Whirlpool Corp. Front-loading Washer
Products Liability Litigation Case, No. 1:08-WP-65000 (MDL 2001), October 3, 2014,
p. 44.
62. Verdict Form, In re: Whirlpool Corp. Front-loading Washer Products Liability Litigation
Case, No. 1:08-WP-65000 (MDL 2001), October 30, 2014.
63. Amended Memorandum Opinion and Order, Devi Khoday and Danise Townsend, et al.,
v. Symantec Corp. and Digital River, Inc. Civil No. 11-180 (JRT/TNL), March 19, 2015.
64. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification,
Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc. Civil
No. 11-180 (JRT/TNL), March 31, 2014, p. 9.
65. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification,
Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc. Civil
No. 11-180 (JRT/TNL), March 31, 2014, p. 9.
66. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification,
Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc. Civil
No. 11-180 (JRT/TNL), March 31, 2014, p. 9.
67. Memorandum Opinion and Order Granting Plaintiffs’ Motion for Class Certification,
Devi Khoday and Danise Townsend, et al., v. Symantec Corp. and Digital River, Inc. Civil
No. 11-180 (JRT/TNL), March 31, 2014, p. 27.
68. “Symantec Norton Insurance Class Action Lawsuit Settlement.” Top Class Actions, July
14, 2014. https://topclassactions.com/lawsuit-settlements/closed-settlements/34134-
symantec-norton-insurance-class-action-lawsuit.
69. For a discussion of the technical terms in this discussion, the reader is referred to the
chapters by Toubia and by Howell, Allenby, and Rossi in this volume.

Appendix: The Use of Conjoint Analysis in Apple v. Samsung I

The Apple v. Samsung patent infringement cases are among the most high-
profile cases to use Conjoint Analysis. In Apple v. Samsung I, it was used
to estimate a price premium for various touchscreen features. Apple’s
conjoint expert focused on customers who were known to have purchased
the infringing smartphones (and tablets).
The experimental design included six features plus price: the capabilities
of the touchscreen, size and weight, camera, storage, connectivity, and
number of apps. The levels of the first feature were designed to represent
the benefit to the customer of the patents. The other features were used
to “distract” the customer to minimize focus on the touchscreen features
alone. Questions were worded so that all other features were instructed
to be held constant. The customer was asked to focus on the smartphone
(tablet) that he/she had bought and to assume that only the indicated
features and price varied. Respondents were drawn from a professional
panel and screened to be relevant. Survey craft was emphasized—pretest-
ing, layout of the features, video instructions for both the features and
the task, security controls, and tests of respondents’ care in answering the
questions. All estimation was by standard hierarchical Bayes with price
partworths constrained to be monotonic.

Price Premium

The price premium was calculated using a conjoint-analysis simulator
which compared two Samsung smartphones (tablets): one product with
the feature enabled by the patent and the current price and another
product with the highest level that was possible with a non-infringing
alternative. The price was lowered until the sample of customers was indif-
ferent between the two products. The expert testified that the price premi-
ums indicated that customers valued the patented features.
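To make the mechanics concrete, the logic of such an indifference-price search can be sketched in a few lines. The code below is a minimal illustration only: the respondent-level part-worths, price slope, and starting price are invented for exposition and are not estimates from the case, where the part-worths came from the hierarchical Bayes estimation described above and the simulator carried the full set of survey features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical respondent-level part-worths (illustrative only): the utility
# of the patented feature relative to the best non-infringing alternative,
# and a negative price slope (utility per dollar).
n_respondents = 500
feature_worth = rng.normal(1.0, 0.5, n_respondents)
price_slope = -np.abs(rng.normal(0.03, 0.01, n_respondents))

def share_patented(price_patented, price_alternative):
    """Average logit probability of choosing the patented product."""
    u_pat = feature_worth + price_slope * price_patented
    u_alt = price_slope * price_alternative
    return float(np.mean(1.0 / (1.0 + np.exp(u_alt - u_pat))))

def price_premium(current_price, tol=0.01):
    """Bisection: lower the non-infringing product's price until the sample
    is indifferent (a 50/50 split) between the two products."""
    lo, hi = 0.0, current_price          # bounds on the price discount
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if share_patented(current_price, current_price - mid) > 0.5:
            lo = mid                      # patented product still preferred
        else:
            hi = mid
    return (lo + hi) / 2.0

print(f"Indifference price premium: ${price_premium(199.0):.2f}")
```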
The expert also calculated, but did not testify about, willingness-to-pay
(WTP). WTP was defined as the change in price necessary for a customer
to accept the non-infringing alternative rather than the feature enabled
by the patent. This WTP was based on draws from the posterior distribu-
tion, but was not based on ratios of partworths as in Howell, Allenby,
and Rossi (see Chapter 32 in this volume). The expert had to assure that
calculations were based only on prices within the range of the conjoint
experiment and had to account for (prospect-theory-based) non-linearity
in the price response. The WTP calculations gave similar results to the
price premium calculations and were used as a check for convergent
validity.69

Use in the Trial

The damages calculations were done by another expert. The conjoint anal-
ysis was used “as an indicator of demand.” The conjoint expert noted
that “I just have market demand and . . . the actual price that you pay
depends upon both the demand and also what Apple and Samsung would
be willing to supply.” The court endorsed this use of conjoint analysis as
relevant to the case (Judge Koh’s decision, June 29, 2012).

Permanent Injunction

After the jury awarded damages, the plaintiff sought to use the con-
joint analysis to justify a permanent injunction against the infringing
smartphones (tablets). Initially, the court judged that conjoint analysis
measured demand for features, not products (see Judge Koh’s decision,
December 17, 2012), but that decision was remanded back to the court
(Court of Appeals for the Federal Circuit (“CAFC”) decision, November
18, 2013). In a follow-up decision, the district court endorsed the con-
joint survey for patent evaluation, but questioned its use for a permanent
injunction, citing the need to account for market price and citing that the
expert put forth the study (appropriately) for market demand only (Judge
Koh’s decision, March 6, 2014). However, this decision was remanded
back to the court (CAFC decision, September 17, 2015).

31.  Conjoint analysis: applications in
antitrust litigation
Michael P. Akemann, Rebbecca Reed-Arthurs
and J. Douglas Zona

The use of consumer surveys and conjoint analysis has become increas-
ingly common in complex litigation, especially in intellectual property
disputes. In trademark litigation, for example, consumer surveys are often
used to assess the extent of consumer confusion across similarly marked
products. In patent infringement matters, consumer surveys are used to
assess customer demand for patented features in complex, multi-featured
products, to apportion value between patented and non-patented features,
and to estimate consumer willingness to pay for products that are provided
free in the marketplace. In such applications, survey respondents are often
asked to choose between hypothetical bundles of products (some of which
include the patented technology at issue) that have been assigned reason-
able prices. Statistical methodologies are then applied to estimate how
much a consumer is willing to pay to have the patented feature included
in the product or to estimate the increase in consumer demand associated
with including the patented feature in the product.
Although the use of consumer surveys and conjoint analysis has been
less common in antitrust litigation, there is a fairly long history of the use
of these techniques in the antitrust context. Diamond (2011) discusses
a 1985 antitrust case in which the plaintiff used a consumer survey to
identify product characteristics that affected consumer preferences and to
estimate alleged damages.1 Rubinfeld (2008) notes that US government
antitrust authorities have relied on conjoint analysis on numerous occa-
sions, including in reaching a consent decree with ski resort operators
in the 1997 United States v. Vail Resorts matter.2 Walter and Reynolds
(2008) and Hurley (2010) discuss the UK Competition Commission’s use
of customer surveys in defining relevant antitrust markets and assessing
competitive effects in the case of proposed mergers.3
Economists often prefer using market data (based on revealed prefer-
ence) rather than survey data (based on stated preference). However, in
some instances market data relevant to the issue at hand are unavailable.
For example, consumer surveys and conjoint analysis can be valuable in
assessing the possible price effects of a proposed merger before it has been
consummated.4 In matters involving alleged anticompetitive foreclosure,
such techniques may be useful in assessing the potential impact of the
challenged conduct when such conduct has been sufficient to preclude the
introduction of a new product into the marketplace. More generally, since
antitrust cases often involve the construction and analysis of the so-called
but-for world (i.e., a hypothetical set of market circumstances in which the
alleged antitrust misconduct is assumed not to have occurred), conjoint
survey data can help generate targeted evidence relevant to the analysis of
such hypothetical circumstances.
Moreover, in some instances conjoint survey data has advantages over
market data. For example, since market prices are typically determined
simultaneously by the often-complex interaction of supply and demand
factors, it can sometimes be difficult to disentangle or isolate the impact
of one particular subset of factors (e.g., alleged actions taken to further a
price-fixing conspiracy). As Rubinfeld (2008) notes, “A properly designed
survey (including the statistical design of the research and the crafting of
the survey instrument) can avoid a number of problems that are inherent
in market-based data. Thus, the judicious design of the survey questions
and the selection of those to be surveyed can ensure that explanatory
variables are exogenous, rather than endogenous.”5
We focus in this chapter on the use of consumer surveys and conjoint
analysis in antitrust litigation. Next, we describe the basic techniques and
implementation strategies with an emphasis on areas where disputes are
likely to arise. We then present two case studies based on our experiences
in litigation-related consulting before concluding the chapter.

Conjoint Survey Implementation Strategies

Economists often use market data to assess the impact of a policy or
conduct, as such data reflect the real-world choices of customers and
firms and thereby reveal the preferences of those entities. However,
there are various situations where market data are unavailable or are
insufficient for the task. A common example is when the research goal is
to value a hypothetical product that does not yet exist in the real world.
In this instance, an appropriately conducted conjoint survey designed
to elicit stated preferences of respondents can be used to provide insight
as to the value of the hypothetical product. Another example is when
the product feature of interest consistently varies at the same time as
other product features, making it difficult or impossible to separately
identify the feature’s value using market data. A conjoint survey can
be used to forecast consumer demand for products with certain feature
combinations and to predict the marketplace impact of new product
offerings.
In a choice-based conjoint (“CBC”) survey, respondents are shown a set
of products that vary across several features or attributes, including price,
and asked to choose the product they would be most likely to purchase
within that set. Respondents are then asked to repeat this exercise with
different sets of products (or choice sets). While conjoint surveys can be
a useful tool in estimating what would have happened in certain coun-
terfactual situations, such surveys typically require a simplified model of
the world. When constructing the conjoint survey instrument, it is often
not possible, as a practical matter, to enumerate all product features
and to present the full set of possible product variations available in the
marketplace. Attempting to do so, particularly for multifaceted and dif-
ferentiated products such as smartphones and other consumer electronic
devices, would be both impractical (from a data gathering perspective)
and overly mentally taxing and time consuming for survey respondents.
Instead, survey respondents are typically presented with a stylized and
simplified set of product characteristics that represent the salient facets
of the marketplace and any counterfactual worlds that are necessary to
address the issues at hand. The conjoint survey results are then extrapo-
lated to real-world settings.
Conjoint survey analysis performed for litigation also typically requires
another simplification relative to the real world. While the ultimate
question the researcher wishes to answer typically involves all purchasers
(or potential purchasers) of a product or all consumers impacted (or
potentially impacted) by a particular policy, it is generally impractical to
survey the entire population of decision makers. Instead, the researcher
must determine how to obtain responses from an appropriate sample of
decision makers that accurately represents the broader pool of real world
decision makers.
The ultimate goal of the analysis will determine the structure of the
conjoint survey instrument, the design of the questions, the sample that
should be surveyed, and how the data should be analyzed. However,
designing and implementing most conjoint surveys for use in litigation
generally require three broad and interrelated steps. First, determine the
research question to be addressed with the survey exercise and develop the
questions (or choice exercises) to present to survey respondents. Second,
identify the relevant target population and an appropriate sample from
that population. Third, analyze the choice data generated by the conjoint
survey. In each broad step, there are important research design and
implementation issues that must be addressed, many of which are often
a source of some contention in the litigation environment. We highlight
some of these issues below.

Determining the Goal and Designing the Conjoint Survey Instrument

In some litigation matters, a consumer survey can be designed to produce
an end result that is directly relevant to one or more of the core issues in
the case. For example, in a trademark dispute one of the central liability
issues is whether consumers were confused by similarly marked prod-
ucts; a consumer survey can be used to test that proposition directly. As
another example, determining what fraction of a relevant population has
used a device or service in antitrust or intellectual property disputes can be
relevant to identifying the sales base at issue; a consumer survey can help
identify that fraction directly.
Conjoint survey results, however, are more often used in litigation mat-
ters as inputs into subsequent analyses. For example, in patent infringe-
ment matters, conjoint surveys are sometimes used to develop evidence on
consumer demand for patented features, which is relevant to assessing a
reasonable royalty for a patented technology. In antitrust cases, conjoint
surveys can be used to help assess product substitution patterns or the
competitive advantages conferred by particular product features, which
may be relevant to analyzing the ultimate liability and damages issues in
the case.
If the objective of the analysis is to evaluate the market outcomes in the
counterfactual (or “but-for”) world, care should be taken to design the
conjoint survey to support that objective. If the end goal is an analysis
of potential market outcomes, that goal might affect the relevant sample
population (potential consumers of the product at issue), which product
features and prices to include (features that are representative of models
available on the market which differ across models and firms), and
whether to include an outside option (i.e., to allow consumers to purchase
nothing and retain the status quo).
In generating a conjoint survey for use in litigation, the goal is gener-
ally to create a survey instrument that is understood by respondents,
accurately addresses the issue in question, and avoids biasing respondents
towards particular answers.6 While these principles are generally not
debated, there is often substantial contention over the survey instrument
design on various dimensions. If the goal of the conjoint survey is, for
example, to determine how potential consumers would react to a change in
the available features of a product in the real world, an important instru-
ment design question is the degree to which the survey should attempt to
educate respondents about those product features.
One approach is to start with the same amount of information that
would typically be available to consumers when making those decisions,
while also explaining more fully any product aspects that are relevant to
the fundamental question. For example, if the conjoint survey is designed
to ask potential purchasers of infant formula to choose between different
formulations with different additives, the question might provide standard
information found on product packaging along with further summary
information from an academic study comparing the impact of the differ-
ent additives on nutritional content. This approach, however, runs a risk
of creating bias in the survey respondents’ answers if there is inordinate
attention drawn to certain product features (in this case, the presence or
absence of certain additives). Another approach is to provide little or no
explanation of the included product features beyond that which might be
available on the product packaging itself. While such an approach might
better mimic (at least some) actual purchase decisions, it might also fail
to probe consumer preferences over features as deeply as an approach
that provided more detailed information about product performance and
characteristics.
Another common point of debate when implementing CBC surveys is
the selection and number of features to include in the survey instrument.
Including fewer features runs the risk of unduly focusing respondents on
those features that are included in the survey (thereby potentially inflating
their value), while including too many features becomes unwieldy and
difficult for the respondent to process. Ensuring that the right features are
included in the survey is also important and often debated. For example,
in order to use conjoint survey results to help estimate a reasonable royalty
for the patent at issue in a patent infringement matter, it is important that
the survey correctly characterizes the product feature(s) that are enabled
by using the patented technology.
A final issue worth highlighting in the implementation of CBC sur-
veys is whether to allow respondents to select “None of the Above” or
another outside option. In some cases, the focus of the survey is on the
relative value of different product characteristics, so the presence of an
outside option may be less important, but in other circumstances, such
as determining market shares, it may be more important. For example, if
the survey asks potential purchasers of infant formula to choose between
a variety of products, and an outside option is indicated in the particular
circumstances, including a “None of the Above” might be the most
appropriate research choice. Allenby et al. (2014) contend that allowing
respondents to select “None of the Above” creates a more realistic choice
set for survey respondents and is particularly important in estimating
market share impacts and consumer responses to the introduction of new
product features.7 In contrast, Haaijer et al. (2001) explain that including
a “None of the Above” option allows survey respondents to avoid making
difficult choices and can complicate the statistical modeling and estima-
tion process.8 One method for addressing the outside option issue is to
implement a two-stage question design in which a respondent is forced to
choose between a set of products in the first part of the question and then
asked whether he or she would actually purchase their selected product in
the second stage.
CBC analysis provides survey respondents with realistic, if somewhat
simplified, choice sets designed to more closely approximate the real-world
customer experience (as compared to some other survey approaches), and
then uses consumer choices to estimate the implied value that consumers
place on different product features. Instead of using the somewhat indirect
conjoint approach, some researchers prefer to ask respondents more
directly about their reactions to product features and/or changes in market
circumstances – for example, how would they respond (if at all) to a five
percent increase in the price of a relevant product? As described in Reynolds
and Walters (2008), this direct approach is common in surveys commis-
sioned by the UK Competition Commission. To improve the accuracy
of this approach, the Commission guides consumers to relive their actual
purchase decisions by structuring its surveys to first pose questions about
factual matters (e.g., what are you shopping for today?) to establish a
context for the question, then about their current behavior and choices
(e.g., why did you come to this store relative to others?), and finally about
what they would have done in alternate circumstances (e.g., what would
you do if prices were 5 percent higher in this location?).9

Identifying and Sampling from a Relevant Target Population

Once the purpose of the survey has been established and the conjoint
survey instrument has been designed, the next step is to identify the
target population for the survey – i.e., the universe of individuals that
are relevant to answering the question at hand. In the case of estimat-
ing the importance of a particular product characteristic in relation to
overall consumer demand for that product, the relevant target population
might be either current or current and likely future purchasers for those
products. For example, when assessing how consumer demand for infant
formula would be affected by the addition of certain additives, the target
population might be current or expectant parents (or other caretakers of
infants).
Defining the relevant target population is often a point of contention
in surveys used in litigation. In recent patent litigation between Apple


purchasers of other brands of phones may have considered Samsung
phones and may have purchased differently if a different set of features
had been available. In the context of antitrust market definition, Reynold
and Walters (2008) caution that the sample should include members of
the proposed relevant antitrust market, and not just users of the merging
firms’ products as it is the elasticity of market demand, not the residual
demand facing the merging firms, that is generally relevant for market
definition.10
Once the relevant target population is determined, members of that
population must be selected and contacted for potential inclusion in the
survey. A commonly used approach is to employ a sampling procedure
based on probability or random sampling – either using stratified or non-
stratified approaches. When using random sampling, each member of the
target population has an equal probability of being included in the sample.
For example, if the relevant target population is all current Visa card
holders, each card holder would have an equal chance of being selected for
inclusion in the sample.
However, there are instances where it is not possible, or it is cost
prohibitive, to obtain a true random sample of the relevant population.
According to Diamond (2011), some forms of nonprobability sampling
are regularly used in certain fields (e.g., marketing) and courts generally
accept evidence generated under such approaches when they are consistent
with the approaches reasonably relied upon by experts in the field.11
When non-random samples are used, or when substantial numbers
of contacted individuals choose not to reply to the survey (creating the
potential for non-response bias), there is likely to be substantial debate as
to whether the sample is likely to be representative of the relevant target
population. One way to assess the representativeness is to compare the
frequency of demographic characteristics of the survey respondents to that
of the broader population or the characteristics appearing in other surveys
that did use random sampling. Another approach, put forth in Allenby et
al. (2014), is to collect information on variables that are closely related to
the variables at issue. The authors suggest, by way of example, that “If we
were doing a survey of smartphones, we might insert questions about own-
ership of smartphones by make or model and compare the market shares
of our survey with those known in the US market.”12 Such responses and
demographic characteristics can be used to assess the representativeness
of the sample, and in some circumstances, be used to reweight the survey
responses to make the survey more representative of the relevant target
population.
Some of the common methods of conducting a survey are in-per-
son interviews, phone surveys, mail surveys, or online. Each of these
approaches comes with strengths and weaknesses that are outlined in
detail in Diamond (2011). In brief, in-person interviews and phone surveys
can be expensive and require well-trained personnel, but often produce
higher response rates and lend themselves to random sampling techniques.
It is also relatively easy to use random sampling for mail surveys (e.g., by
randomly selecting households in a relevant geographic area), but these
surveys have historically suffered from low response rates. Internet-based
surveys are becoming increasingly common in the litigation context. They
are relatively inexpensive and quick to complete, and they provide the
researcher with various ways to avoid common sources of bias due to the
design and implementation of the survey instrument (e.g., by rotating the
order in which options appear to avoid giving an advantage to the first or
last option in a list and eliminating the chance that interviewer intonation
or preference might inadvertently sway survey respondents). They also
allow the researcher to contact a sample that is pre-screened for certain
characteristics (e.g., parents or smartphone owners) and to provide a
greater variety of information to respondents than might be possible using
mail or telephone surveys, but can result in greater respondent fatigue.13
When conducting an internet survey, however, particular care should
be used in ensuring that the sampled population adequately reflects the
relevant target population. Ideally, the sample frame (i.e., the population
that the survey is drawn from) and the relevant target population would
match perfectly. However, this is rarely the case. Internet surveys are often
conducted using a standing pool of panellists willing to respond to such
surveys. Depending on the methods used to recruit panel participants,
the panel may be more or less representative of the broader population of
interest. Panel providers that recruit participants on an opt-in basis using
social media advertisements may, for example, disproportionately contain
younger members with lower incomes and lower workforce attachment
rates. In contrast, panel providers that operate on an invitation-only
basis and actively recruit harder-to-reach segments of the population
may achieve a more representative panel. Yeager et al. (2011) have shown
variation in the quality of online panels as compared to results found using
a simple random sampling approach conducted via alternative means.14
Because of this variation in quality, a researcher using online survey
panels should undertake a careful comparison of the difference between
the target population and the sampling frame to determine to what degree
this difference is likely to bias or otherwise invalidate the results of the
survey. Sometimes, this step is combined with an assessment of the quality
of the final sample, by comparing the known characteristics of the target
population either as produced by other surveys of a known quantity or
through census and/or other means, with the characteristics of the final
survey sample as discussed in the prior section.

Analyzing the Conjoint Survey Data

CBC-based approaches assume that consumers make trade-offs between
different products based on the level of utility they can derive from those
products. Instead of directly asking respondents to rate or rank possible
products or features, CBC analyses use data on how respondents trade
off between hypothetical choices to draw inferences about the utility each
respondent derives from various attributes of a product. A key assumption
of this analytical approach is that the value of a product to an individual
can be expressed as the sum of predictable components that represent pref-
erences for each attribute of the product and a random, individual-specific
unobservable component.15 In other words, the sum of the individual
utilities associated with the attributes of a product gives the total utility
of that product for each individual. Given this assumption, analysis of the
survey responses can yield measurements of the utilities associated with
specific product attributes. Because the random component cannot be
observed, the model cannot predict each individual’s selection with cer-
tainty. However, the model does allow one to estimate the probability that
individuals will choose a particular product.16
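As a minimal numerical illustration of this additive-utility logic, the sketch below computes multinomial logit choice probabilities for a small hypothetical choice task; the attribute levels, part-worths, and prices are invented for exposition.

```python
import numpy as np

# Hypothetical part-worths (illustrative only): utility contributions of
# discrete attribute levels plus a linear price slope, as in the additive
# utility model described above.
partworths = {"brand_A": 0.8, "brand_B": 0.3, "feature_X": 0.5}
price_slope = -0.02  # utility per dollar

def deterministic_utility(profile):
    """Sum the part-worths of a profile's attribute levels and add the
    price term; the random component is captured by the logit formula."""
    return (sum(partworths[a] for a in profile["levels"])
            + price_slope * profile["price"])

def choice_probabilities(choice_set):
    """With i.i.d. extreme-value errors, the probability of choosing
    alternative j is exp(V_j) normalized over the choice set."""
    v = np.array([deterministic_utility(p) for p in choice_set])
    ev = np.exp(v - v.max())  # subtract the max for numerical stability
    return ev / ev.sum()

choice_set = [
    {"levels": ["brand_A", "feature_X"], "price": 120.0},
    {"levels": ["brand_B"], "price": 90.0},
]
print(choice_probabilities(choice_set))  # approximately [0.60, 0.40]
```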
Different research choices can be made regarding the random compo-
nent in the model, which lead to somewhat different model specifications
and preferred estimation methods. For example, a conditional logit model
estimates a single coefficient for each product attribute and assumes the
error term is IID extreme-value and exhibits independence of irrelevant
alternatives. As another example, in a mixed logit model (also known
as random parameter or random coefficient logit model), the coefficient
associated with each attribute is allowed to vary randomly across respond-
ents and can be estimated using simulation methods (e.g., Maximum
Simulated Likelihood17 or Bayesian methods18). Cameron and Trivedi
(2005) provide a detailed description of various multinomial models and
estimation approaches.19
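To illustrate the mixed logit case, the sketch below approximates a choice probability by simulation: the conditional logit probability is averaged over random draws of the coefficient vector. The design matrix, coefficient means, and spreads are hypothetical; a maximum simulated likelihood estimator would sum the logs of such probabilities across respondents and tasks and maximize over the mean and spread parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulated_choice_prob(x_alts, chosen, mean, sd, n_draws=1000):
    """Mixed logit probability by simulation: average the conditional logit
    probability of the chosen alternative over random draws of the
    coefficient vector (independent normal coefficients assumed here)."""
    draws = rng.normal(mean, sd, size=(n_draws, len(mean)))  # (draws, k)
    v = draws @ x_alts.T                                     # (draws, alts)
    v -= v.max(axis=1, keepdims=True)                        # stability
    p = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
    return float(p[:, chosen].mean())

# Toy choice task: three alternatives described by a feature dummy and price.
x_alts = np.array([[1.0, 120.0],
                   [0.0,  90.0],
                   [1.0, 150.0]])
print(simulated_choice_prob(x_alts, chosen=0, mean=[0.6, -0.02], sd=[0.3, 0.005]))
```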
Calculating the utilities associated with various product attributes may
be sufficient for certain purposes. For example, consumer willingness to
pay for a product feature can be calculated after implementing a logit
model by dividing the utility coefficient of that feature by the utility
coefficient of price. In a patent infringement matter, this evidence may be
sufficient to establish demand for the patented feature.
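In symbols, under the linear utility specification sketched above (and with the usual convention that the estimated price coefficient is negative), the calculation is simply:

```latex
% Willingness to pay for feature j from logit coefficients: the feature's
% part-worth divided by the magnitude of the price coefficient converts
% utility units into dollars.
\[
  \mathrm{WTP}_j \;=\; \frac{\beta_j}{\lvert \beta_{\mathrm{price}} \rvert}
\]
% Illustrative arithmetic: beta_j = 0.5 and beta_price = -0.02 per dollar
% imply a WTP of 0.5 / 0.02 = 25 dollars.
```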
However, in many antitrust cases further analysis of the data will be
required because the market context is often relevant to addressing the key
issues in the case. For example, under certain assumptions conjoint survey
data can be used to estimate systems of demand equations, to estimate
market-wide responses to changes in price or product characteristics, or
to simulate different “but-for” market outcomes.20 As another example,
simulations can be conducted to test how a change in product attributes
changes the utility of different products and how consumers substitute
between products with different sets of features. Hildebrand (2006) out-
lines a methodology for using conjoint analysis to implement a hypotheti-
cal monopolist test for the purposes of market definition.21
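As a stylized illustration of such a simulation, the sketch below uses invented part-worths for a hypothetical two-product candidate market, computes aggregate logit shares with a no-purchase outside option, and applies a SSNIP-style five percent price increase of the kind used in hypothetical monopolist analyses:

```python
import numpy as np

# Hypothetical coefficients (columns: feature dummy, price); the outside
# (no-purchase) option has utility normalized to zero.
beta = np.array([0.7, -0.01])

products = {"A": np.array([1.0, 100.0]), "B": np.array([0.0, 80.0])}

def shares(prods):
    """Aggregate logit shares over the products and the outside option."""
    v = np.array([beta @ x for x in prods.values()])
    ev = np.append(np.exp(v), 1.0)  # last entry: outside option, V = 0
    s = ev / ev.sum()
    return dict(zip(list(prods) + ["outside"], s))

base = shares(products)

# SSNIP-style exercise: raise all candidate-market prices by 5 percent and
# measure how much demand leaks to the outside option; limited leakage is
# evidence that the candidate products constitute a relevant market.
raised = {k: x * np.array([1.0, 1.05]) for k, x in products.items()}
print(base)
print(shares(raised))
```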
In the next section, we present two case studies that illustrate how con-
joint survey data and models have been used in recent antitrust litigation.

Applying Conjoint Analysis in Antitrust Litigation

Valuing Hypothetical Products and Predicting But-for Sales in the Payment Card Industry

Our first case study on the use of conjoint analysis in antitrust litiga-
tion relates to the US payment card industry and allegations of antitrust
foreclosure related to certain rules adopted by the two largest payment
card associations (or network systems): Visa and MasterCard. Visa and
MasterCard were organized as joint ventures, owned by the numerous
banking institutions that are members of the networks. Member banks of
the MasterCard and Visa networks can function either as card “issuers,”
merchant “acquirers,” or both. An “issuer” member bank issues cards to
cardholders; it serves as the liaison between the network and the individual
cardholder. An “acquirer” member bank acquires the card-paid transac-
tions of a merchant; a particular acquiring bank acts as liaison between the
network and those merchants accepting the network’s payment cards with
whom it has contracted.
Visa and MasterCard imposed certain restrictions on member banks
that wanted to issue Visa and MasterCard credit and debit cards.
Specifically, both MasterCard and Visa had rules that prohibited their
members from issuing American Express or Discover cards. Those rules
(Visa’s by-law 2.10(e) and MasterCard’s Competitive Programs Policy
(“CPP”)) were the focus of a civil lawsuit filed by the US Department of
Justice against Visa and MasterCard (the “DOJ Case”) in October 1998.22
The Court in the DOJ Case ordered the repeal of Bylaw 2.10(e) and the
CPP with respect to third-party issuing, finding that those restrictions
were an antitrust violation.23 Following the judicial appeals process of the
DOJ Case, Discover sued both Visa and MasterCard for various alleged
antitrust violations related to the card associations’ rules involving the
issuance of credit and debit cards.24
As with the DOJ Case, a key focus of the antitrust allegations in the
Discover matter was on the Visa and MasterCard rules prohibiting asso-
ciation member banks (i.e., banks that were issuing Visa or MasterCard
payment cards) from also issuing payment cards over the Discover or
American Express networks (the “exclusionary rules”). Under Discover’s
theory, these exclusionary rules, in essence, forced issuing banks to
make an “all-or-nothing” choice between continuing to issue Visa and/
or MasterCard branded cards or issuing cards over the Discover or
American Express networks. Discover’s theory of injury was that, without
the exclusionary rules, more banks would have issued more Discover (and
American Express) cards, which would have led them to be accepted
more widely and rendered them more attractive to consumers and mer-
chants. Discover therefore claimed antitrust liability and damages based
on a theory of market foreclosure.
In assessing these antitrust allegations, the experts in the case (which
included economists, statisticians, and marketing experts) analyzed the
impact of the exclusionary rules on competition among card networks (and
among issuing banks) and on end consumers. These analyses focused on
overall output in the payment card industry, Discover’s sales and market
share in the industry, the degree of product and brand differentiation across
payment networks and issuing banks, and the extent of product variety
available to consumers. Evidence relevant to these issues was available from
a number of sources, including testimony from company fact witnesses and
business documents focused on strategy and marketing issues.
One key issue in the case, however, was how consumers might value
product offerings that did not then exist in the marketplace – e.g., a
Discover-branded card issued by Bank of America. A second key issue
was how Discover might have fared in a competitive environment in which
the exclusionary rules did not exist. In other words, what would Discover’s
sales (and market share) have been in a “but-for world” in which it had
been able to contract more easily with issuing banks to offer Discover-
branded payment cards.
A conjoint survey was conducted to help assess these issues. The
conjoint survey asked respondents to choose between four different
credit card offerings, with varying features. Some of the tested features
included: association or network type (i.e., Visa, MasterCard, Discover, or
American Express), issuing bank (Bank of America, Chase, etc.), annual
fee, and other card features (e.g., type of rewards program, if any).
There were a number of controversial features regarding the conjoint
survey design. For example, the study did not ask respondents to assume a
so-called empty wallet (i.e., to assume that they did not have their existing
actual payment cards available). The study also did not present consumers
with a “none of the above” option.
The relevant target population for the conjoint survey was current
and prospective credit card holders. A random sample was drawn from a
standing internet panel of potential respondents and an online survey was
conducted. The sampling process was designed to reflect a broad cross-
section of credit and debit card users in the United States.
Once the survey was complete, the CBC survey data were analyzed
using simulation techniques to estimate but-for card shares for the
Discover network and for third-party issued Discover cards. The study
measured a base set of payment card “take rates” (interpreted as a meas-
ure of market share) to determine base market shares for Discover, Amex,
and various Visa and MasterCard products. The study also measured
take rates with new Discover-branded payment cards issued by other
banks added to the marketplace mix. In considering Discover’s but-for
sales, the experts examined the extent to which any new card types issued
by Visa and MasterCard network banks would have been expected to
“cannibalize” existing Discover card business.25 In other words, would
additional Discover-branded cards issued by Visa or MasterCard network
banks have expanded the overall sales through Discover’s network, or just
diverted sales from one Discover branded card to another?
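The cannibalization question lends itself to a simple share-simulator decomposition. The sketch below uses invented utilities (not estimates from the litigation) to show how a new bank-issued Discover card's simulated take rate can be split into volume cannibalized from the existing Discover card versus share drawn from rival networks and from non-users:

```python
import numpy as np

# Hypothetical card utilities (illustrative only); the no-card outside
# option has utility normalized to zero.
base_utilities = {"Visa": 0.9, "MasterCard": 0.8, "Discover_proprietary": 0.4}

def take_rates(utilities):
    """Logit take rates over the offered cards plus a no-card option."""
    ev = {k: np.exp(v) for k, v in utilities.items()}
    denom = sum(ev.values()) + 1.0  # +1 for the outside option (V = 0)
    return {k: e / denom for k, e in ev.items()}

before = take_rates(base_utilities)
after = take_rates({**base_utilities, "Discover_bank_issued": 0.6})

# Decompose the new card's share: how much came out of the proprietary
# Discover card (cannibalization) versus rivals and non-users (expansion).
new_share = after["Discover_bank_issued"]
cannibalized = before["Discover_proprietary"] - after["Discover_proprietary"]
print(f"new card share: {new_share:.3f}; cannibalized: {cannibalized:.3f}")
```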
The CBC data were used to construct a number of simulation exercises
that were used to address a wide variety of antitrust issues in the case. For
example, the simulation results were used to argue that the exclusionary
rules restricted output, as the results suggested that the availability of non-
proprietary Discover cards would increase total output in the US payment
cards industry. These results were also used to help analyze the following:

● Expected profitability to third-party banks of issuing payment cards
  on Discover’s network;
● Extent to which Discover was a differentiated brand;
● Attractiveness of non-proprietary Discover cards to consumers (i.e.,
  consumer demand for the allegedly foreclosed products); and
● Alleged damages suffered by Discover from the exclusionary rules.

Along with some of the survey design issues noted above, the simulation
model was also a point of some controversy in the litigation. There was
substantial debate among the experts in the case as to whether the simula-
tion models, which were used to estimate relative market shares (for both
card networks and issuing banks), were a sufficiently rich and realistic
characterization of the actual market environment faced by cardholders.
Since the actual market environment involved many more card features
and options than could be incorporated practically into the simulation
models, the models necessarily required abstraction from many real-world
complexities. As such, the perceived validity and interpretation of the
relative market share estimates were colored substantially by the extent
to which the abstractions were deemed to be reasonable, given the market
circumstances at issue.26

Estimating But-for Market Shares and Antitrust Damages in the Infant Formula Supplements Industry

Our second case study on the use of conjoint analysis in antitrust litigation
comes from allegations of antitrust liability and damages from exclusive
distribution contracts in the infant formula supplements industry. One of
the more important developments in the infant formula industry in recent
years has been the introduction of DHA and ARA additives produced
from various sources other than breast milk.27 DHA and ARA are types of
fats that are found naturally in breast milk.28 Research suggests that DHA
and ARA from breast milk provide substantial benefits for infant eye and
brain development.29
Infant formula supplemented with DHA and ARA has been substan-
tially more expensive than un-supplemented infant formula. For example,
the US Department of Agriculture estimated that a supplemented Mead
Johnson infant formula was about 9.4 percent more expensive per ounce
in January 2006 than an un-supplemented infant formula. Despite the
higher prices, these additives have been well-received by US consumers
and sales of DHA- and ARA-supplemented infant formula expanded
rapidly following their initial introduction. In 2004, about 65 percent of
US infant formula dollar sales were DHA- and ARA-supplemented and
by 2008 the rate was over 95 percent.30
Martek Biosciences Corporation (“Martek”) is a US producer of food
ingredients from microbial sources (e.g., algae and fungi).31 Martek has
developed and patented fermentable strains of microalgae which produce
oils rich in DHA.32 A similar Martek-patented process was developed for
a fungus that produces an oil containing ARA.33 Substantially all DHA
and ARA supplements used in US infant formula have been produced and
sold by Martek,34 at least in part because Martek has had long-term sole
source (exclusive) supply agreements in place with the large infant formula
manufacturers operating in the United States.35
BNLfood (“BNL”) sells DHA and ARA supplements (derived using
egg phospholipids) for use in infant formula. BNL sells its DHA and
ARA products in Europe, Asia and the United States. BNL’s DHA and
ARA supplements are advertised as “completely natural egg derived
fatty acids for all life stages, i.e., infancy, adulthood & ageing.”36 This
characteristic may resonate with some end-purchasers of infant formula,
who would prefer to avoid bio-engineered DHA and ARA, as third-party
market research has identified strong consumer demand for organic and
natural infant nutrition products and anticipates market growth in this
area.37
In addition to being derived from natural sources, BNL’s fatty acids
were alleged to be more effective than Martek’s oils for some functions.
For example, studies have suggested that ARA and DHA are more
bioavailable when they are delivered in phospholipid form as opposed to
triglyceride form38 and that egg phospholipids are more bio-effective than
triglycerides (algal oil-type).39
In 2011, BNL filed an antitrust lawsuit against Martek related to
Martek’s exclusive contracts with infant formula manufacturers for the
supply of ARA and DHA. According to the Amended Complaint in that
case, the matter involved:

an action against Martek for its efforts to monopolize the manufacture and
sale of DHA and ARA for use in infant formula in the US market. Faced with
an emerging competitive threat from BNLfood, Martek has acted to protect
its monopoly position by extending its exclusive contracts in violation of US
antitrust laws.40

The economic experts in the case analyzed whether: (1) Martek’s alleged
anticompetitive conduct has substantially limited competition in the
market(s) in which its products are sold; (2) Martek’s alleged anticompeti-
tive conduct has caused injury to BNL; and (3) the extent of damages to
BNL arising from Martek’s alleged anticompetitive conduct. One of the
key elements of the antitrust analysis was an estimate of BNL’s long-run
US market share but-for Martek’s exclusive agreements, which allegedly
delayed BNL’s entry into the US market. One basis for estimating this
figure was market data on the shares achieved by a variety of organic or
natural products, which reflect the demand for natural, non-bioengineered
products in the United States.
Another estimate of the long-run market share that BNL’s egg-based
DHA and ARA could be expected to achieve in the but-for world was
derived using an internet-based survey of recent infant formula purchas-
ers in the United States and a series of conjoint exercises that were used
to estimate the relative demand for different additives types at different
price points. The survey was thus designed to measure the demand for
infant formula supplemented with egg-based DHA and ARA relative to
the demand for infant formula supplemented with algae- or fish-oil-based
DHA and ARA.
The survey involved more than 400 parents with children under the age of
three who had purchased infant formula within the last year. The sample was
reweighted to account for differences between the internet-based sample and
the general population of interest using the distributions of income, educa-
tion and age. The respondents were asked to undertake a series of conjoint
exercises. The survey instrument was designed to rotate the order of ques-
tions and potential responses to questions to avoid bias from order effects.
In this case, the conjoint survey instrument was relatively simple – a
series of bilateral choice exercises with products represented by three
characteristics. There was a dispute as to whether this abstraction was
an adequate representation of actual consumer choices for purposes of
addressing the issues in the case. There was also a dispute over whether
additional features should have been included in the survey instrument in
order to “mask” the specific feature of interest.
For purposes of addressing the relevant antitrust questions in this
case, the model results were used to estimate long-run market shares
assuming BNL was able to enter the US market. In order to implement
the model, the product alternatives available in the marketplace were
defined (Martek’s and BNL’s), including expected price differentials. The
FDA has estimated that the cost of algal- and fungal-supplemented infant
formula in 2006 was about 10 percent higher than non-supplemented
formula.41 Consequently, assuming that egg-based supplements were three
times more expensive than algal- and fungal-based supplements, under
certain assumptions the price of infant formula supplemented with egg-
based DHA and ARA would be about 30 percent higher than other sup-
plemented infant formula. When the survey respondents were confronted
with an egg-based product that was 30 percent more expensive, about 20
percent of respondents selected the more expensive egg-based product.
Thus, the survey supported an estimated long-run market share for BNL
in the but-for world of approximately 20 percent.
The results from the survey also allow quantification of the reduction in
consumer welfare from Martek’s alleged foreclosure strategy by estimating
a logit model on the conjoint survey results.42 This result was used to argue
that Martek’s exclusive contracts imposed a substantial anticompetitive
effect in the marketplace.
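A calculation of this type typically uses the logit “log-sum” formula (see Train 2009, cited in note 42): with \(V_j^0\) and \(V_j^1\) denoting each option’s representative utility in the actual and but-for worlds and \(\beta_p\) the price coefficient, the change in expected consumer surplus per consumer is

\[ E[\Delta CS] = \frac{1}{\beta_p} \Big[ \ln \sum_{j} e^{V_j^1} - \ln \sum_{j} e^{V_j^0} \Big]. \]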
Finally, damages in antitrust matters are often calculated by computing
the change in plaintiff’s profits associated with two different states of the
world: one with the anticompetitive behavior at issue (e.g., monopolized
market or degraded access to an essential input) and another which
reflects the world without the presumed anticompetitive behavior (the
“but-for world”). Because the but-for world typically cannot be observed,
inference from observed or generated data is typically required; one source
for such inference is conjoint survey data. As such, the conjoint
survey data and model, and subsequent long-run market share estimate,
were also used to support BNL’s damages claims in the case.

Conclusion

Consumer surveys and conjoint analysis have a number of potential
applications in antitrust litigation, especially in circumstances where analysis
of consumer substitution patterns across products is likely to be relevant.
These include merger investigations, analysis of foreclosure allegations,
and damages estimation.
Although economists often prefer using market data rather than survey
data, in some instances market data are unavailable (e.g., for certain hypo-
thetical products) and in other instances conjoint survey data can have
certain advantages over market data. As such, conjoint analysis can often
be a useful tool to supplement other sources of evidence in antitrust cases.

Notes

  1. S. Diamond, 2011, “Reference Guide on Survey Research,” Reference Manual on Scientific Evidence, pp. 359–423 at p. 365.
  2. D. Rubinfeld, 2008, “Quantitative Methods in Antitrust,” Competition Law and Policy,
pp. 723–742 at p. 736.
  3. G. Reynolds and C. Walters, 2008, “The Use of Customer Surveys for Market Definition and the Competitive Assessment of Horizontal Mergers,” Journal of Competition Law & Economics, 4(2), pp. 411–431; S. Hurley, 2010, “The Use of Surveys in Merger and Competition Analysis,” Journal of Competition Law & Economics, 7(1), pp. 45–68.
  4. Relatedly, consumer surveys and conjoint analysis can be used to help merging compa-
nies formulate new product offerings and refine pricing strategies.
  5. D. Rubinfeld, 2008, “Quantitative Methods in Antitrust,” Competition Law and Policy,
pp. 723–742 at p. 735.
  6. A survey instrument can be pre-tested with a small sample of respondents who meet the
criteria for inclusion in the sample. During the pre-test, the respondents are asked to
complete the survey and questioned about, for example, how they interpreted the ques-
tions and whether they found them confusing, what they understood the features to be,
whether they could identify the purpose or sponsor of the survey, whether they could
tell what features, products, or characteristics were of primary interest, etc. The survey
instrument can then be revised in response to such feedback.
  7. G. Allenby et al., 2014, “Valuation of Patented Product Features,” Journal of Law and
Economics, 57(3), pp. 629–663 at p. 644.
  8. R. Haaijer et al., 2001, “The ‘no-choice’ alternative in conjoint choice experiments,”
International Journal of Market Research, 43(1), pp. 93–106 at pp. 93 and 105.
  9. G. Reynolds and C. Walters, 2008, “The Use of Customer Surveys,” at pp. 418–419.
10. G. Reynolds and C. Walters, 2008, “The Use of Customer Surveys,” at p. 422.
11. S. Diamond, 2011, “Reference Guide on Survey Research,” Reference Manual on
Scientific Evidence, 3, pp. 359–423 at p. 361.
12. G. Allenby et al., 2014, “Valuation of Patented Product Features,” Journal of Law and
Economics, 57(3), pp. 629–663 at p. 641.
13. Savage and Waldman (2008) discuss some of the advantages and disadvantages of mail
versus online surveys with an emphasis on the potential for respondent fatigue and the
number of choice tasks to present using each mode. (S. Savage and D. Waldman, 2008,
“Learning and Fatigue During Choice Experiments: A Comparison of Online and Mail
Survey Modes,” Journal of Applied Econometrics, 23, pp. 351–371.)
14. D. Yeager et al., 2011, “Comparing the Accuracy of RDD Telephone Surveys and
Internet Surveys Conducted with Probability And Non-Probability Samples,” Public
Opinion Quarterly, 75(4), pp. 709–747.
15. The economic theory behind this approach is based on random utility theory which has
its origins in work by Thurstone and tied closely to work by Daniel McFadden on dis-
crete choice modeling. (See, L. Thurstone, 1927, “A Law of Comparative Judgment,”
Psychological Review, 34, pp. 273–286; and, e.g., D. McFadden, 1974, “Conditional
Logit Analysis of Qualitative Choice Analysis,” in Frontiers in Econometrics, ed.
P. Zarembka, 105–142. New York: Academic Press.) Louviere, Flynn, and Carson
(2010) provide a historical perspective on the evolution of CBC methods (which they
refer to as Discrete Choice Experiments) and how CBC methods differ from other
forms of conjoint analysis. (See J. Louviere, T. Flynn and R. Carson, 2010, “Discrete
Choice Experiments Are Not Conjoint Analysis,” Journal of Choice Modelling, 3(3),
pp. 57–72.)
16. This probability equals the probability that the sum of the systematic and random
utility components for that product is greater than the sum of the systematic and
random components for each competing option.
17. D. McFadden and K. Train, 2000, “Mixed MNL Models for Discrete Response,”
Journal of Applied Econometrics, 15, pp. 447–470; D. Revelt and K. Train, 1998,
“Mixed Logit With Repeated Choices: Households’ Choices of Appliance Efficiency
Level,” Review of Economics and Statistics, pp. 647–657; D. Brownstone and K. Train,
1999, “Forecasting New Product Penetration With Flexible Substitution Patterns,”
Journal of Econometrics, 89, pp. 109–129.
18. Hierarchical Bayesian methods can be implemented by statistical packages such as
those offered by Sawtooth Software. See http://www.sawtoothsoftware.com/products/
conjoint-choice-analysis.
19. A.C. Cameron and P.K. Trivedi, 2005, Microeconometrics: Methods and Applications,
New York: Cambridge University Press, Chapter 15.
20. Dube et al. (2002) describe the derivation of a mixed multinomial logit discrete
choice demand system, discuss issues arising from the estimation of demand systems
using individual choice level data, and review several empirical applications of related
methodologies. (See J.P. Dube et al., 2002, “Structural Applications of the Discrete
Choice Model,” Marketing Letters 13:3, pp. 207–220.) Rubinfeld (2011) provides a
more general overview of econometric issues common in antitrust settings, including a
description of several approaches to merger simulation.
21. D. Hildebrand, 2006, “Using Conjoint Analysis for Market Definition: Application
of Modern Market Research Tools to Implement the Hypothetical Monopolist Test,”
World Competition, 29(2), pp. 315–336.
22. US v. Visa USA, Inc., et al., 163 F. Supp. 232, 327-329 (S.D.N.Y. 2001). The DOJ also
sought the repeal of the card association governance rules which permitted “members
of each association to sit on the Board of Directors of either Visa or MasterCard,
although they may not sit on both.”
23. US v. Visa USA, Inc., et al., 163 F. Supp. 232, 410-411 modified, 183 F. Supp. 613,
614-619 (S.D.N.Y. 2001). The District Court’s order was upheld on appeal and became
final on October 4, 2004. US v. Visa USA, Inc., et al., 344 F.3d 229 (2d Cir. 2003), cert.
denied, 543 U.S. 811 (2004).
24. Discover Financial Services Inc. and Discover Bank v. Visa USA Inc., et al, 04-cv-
07844, U.S. District Court, Southern District of New York (Manhattan). American
Express also sued Visa and MasterCard on similar grounds. For further background,
see also http://www.bloomberg.com/apps/news?pid=newsarchive&sid=a7pbgZn.610c
&refer=finance and http://www.nytimes.com/2007/11/08/business/08visa.html?_r=0.
25. Cannibalization was measured as the ratio of the difference in take rate for proprietary
Discover cards (i.e., Discover-branded payment cards that were also issued by Discover
as the issuing bank) from the two scenarios to the overall take rate for non-proprietary
Discover cards (i.e., Discover-branded cards issued by other banks). For example,
suppose the take rate for proprietary Discover cards was 19 percent in the base case,
and in the new scenario the take rate for proprietary Discover cards was 18 percent and
third-party Discover cards take rate was 5 percent. In this case, the cannibalization rate
is 20 percent [(19–18)/5].
26. The litigation between Discover and Visa settled on the eve of trial, so it is unclear how
the competing interpretations of the various experts would have been received by the
jury.
27. Abbott first introduced these additives into its US product lines in 2002, with Mead
Johnson and Nestle following in 2003. V. Oliveira, E. Frazao, and D. Smallwood,
“Rising Infant Formula Costs to the WIC Program: Recent Trends in Rebates and
Wholesale Prices / ERR-93,” USDA, February 2010, p. 9. (Hereinafter “Rising Infant
Formula Costs to the WIC Program”.)
28. See A. Abad-Jorge, “The Role of DHA and ARA in Infant Nutrition and
Neurodevelopmental Outcomes,” Today’s Dietitian, 10 (10), p. 66.
29. See, e.g., R. Uauy, D. R Hoffman, P. Mena, A. Llanos, E. E Birch, “Term infant
studies of DHA and ARA supplementation on neurodevelopment: results of rand-
omized controlled trials,” Journal of Pediatrics, 143 (4), Supplement, October 2003,
pp. 17–25.
30. “Rising Infant Formula Costs to the WIC Program,” p. 9.
31. In December 2010, Martek was acquired for $1.09 billion by Dutch vitamin maker
Royal DSM N.V. R. Sharrow, “Martek to be acquired by Royal DSM for $1.09B,”
Baltimore Business Journal, December 21, 2010, http://www.bizjournals.com/baltimore/
news/2010/12/21/martek-to-be-acquired-by-royal-dsm.html.
32. Martek Biosciences Corporation Form 10-K for the Fiscal Year Ended October 31,
2005, p. 2. See also US Patent No. 5,374,657.
33. Martek Biosciences Corporation Form 10-K for the Fiscal Year Ended October 31,
2005, p. 2.
34. Martek’s DHA oil is the only source of DHA currently used in infant formula in the
United States and represents “nearly 100% of the estimated $4.5 billion US retail
market for infant formula.” See Martek Biosciences Corporation Form 10-K For the
Fiscal Year Ended October 31, 2009, p. 15.
35. See, e.g., http://www.bloomberg.com/apps/news?pid=newsarchive&sid=aSv1xqLUNRQ​
A; http://www.prnewswire.com/news-releases/martek-biosciences-announces-extended-
global-sole-source-supply-agreement-with-mead-johnson-96792844.html; and http://
www.prne​wswire.com/news-releases/martek-signs-multi-year-worldwide-sole-source-sup​
ply-agreement-with-abbott-58472377.html.
36. See http://www.ovolife.eu/.
37. “US Infant Nutrition Market: N5BD-88,” Frost and Sullivan, 2009.
38. See, e.g., F. Thies et al., “Unsaturated fatty acids esterified in 2-acyl lysophosphati-
dylcholine bound to albumin are more efficiently taken up by the young rat brain
than the unesterified form,” Journal of Neurochemistry, 59, pp. 1110–1116 (1992); and
V. Wijendran et al., “Efficacy of dietary arachidonic acid provided as triglyceride or
phospholipid as substrates for brain arachidonic acid accretion in baboon neonates,”
Pediatric Research, 51, pp. 265–272 (2002).
39. See, e.g., V. Wijendran et al., Pediatric Research, 51, pp. 265–272 (2002) and M.
Lagarde et al., Journal of Molecular Neuroscience, 16, pp. 201–204 (2001).
40. Amended Complaint for Injunctive Relief and Damages, BNLFood Investments
Limited SARL v. Martek Biosciences Corp., May 5, 2011.
41. “Rising Infant Formula Costs to the WIC Program,” p. 10.
42. See, e.g., Kenneth Train, Discrete Choice Methods with Simulation, 2nd ed., New York:
Cambridge University Press, 2009, pp. 55–56.

32.  Feature valuation using equilibrium conjoint analysis
John R. Howell, Greg M. Allenby and Peter E. Rossi

In many settings it is necessary to determine the value of a feature of a
product rather than the value of a product as a whole. Feature valuation
is relevant in both product design and development as well as in litiga-
tion settings. In order to make decisions regarding investment in product
research and development, companies must know the price premium com-
manded by a new product feature. Feature valuation is also relevant in
legal settings, particularly in patent disputes (Barry, Arad, and Swanson
2013; Jeruss, Feldman, and Walker 2012). One of the core issues in patent
and other intellectual property litigation is determining profits lost due
to infringement or a reasonable royalty rate. This involves determining
the value of the feature that the patent enables in the product since many
products are a bundle of features, some of which may or may not be
covered by the patent.
Consider the recent high-profile case between Samsung and Apple
(Apple Inc. v. Samsung Electronics Co. 2011). This dispute involved
features of smartphones, including so-called smart gestures. The jury was
asked to consider the market value of the patent in question. In another
example, the court was asked to determine the fair licensing rights for
recorded music played in bars and nightclubs (Phonographic Performance
Co. 2007). The common problem in these situations is that the object to be
valued is not a standalone product that is readily available, but rather a
feature of a larger product.
In this chapter, we outline an approach to valuing a feature of a product
based on the principle of incremental profits.1 A feature or invention is
only valuable to a firm if it can translate the feature into additional profits.
The amount of additional profits that a firm can capture by implementing
a feature represents the maximum amount a firm would be willing to pay
to develop that feature internally or purchase it from an external source.
The premise of feature valuation in a litigation context is very similar.
One important aspect of damages is the profits lost by the firm whose
patents are infringed upon by other firms. In the absence of infringement,
the profits of the patent holder would be higher, and these lost profits are
a measure of damages.
In order to calculate potential or lost profits we need to represent
the demand system and the competitive environment of the firm. In a
real marketplace, companies change their prices in response to different
market situations. The problem with many methods of calculating optimal
prices and the associated demands is that they assume a static marketplace
where the set of competitive products remains fixed when a firm enhances
its product. In reality, as features are introduced into the market place,
competitors respond to those introductions by either adjusting prices or
feature sets. These price changes can dampen the advantage that a com-
pany can achieve by the introduction of the new product. If a company
does not take this competitive reaction into account, static models will
overstate the potential benefit of the product introduction.
Competitive reaction can be taken into account using a concept called
market equilibrium. Equilibrium occurs when the market participants do
not have an incentive to change their current offerings. When a market
is not in equilibrium firms would be able to increase profits by adjust-
ing prices or features offered. Economic theory suggests that markets
will settle into equilibrium as profit maximizing firms seek to maximize
their respective advantage. If we have an accurate demand system and
information on the marginal costs of the participating firms, we can
mathematically simulate a market’s equilibrium and capture both the first
and second order effects of a feature introduction on profitability.
This chapter will discuss in more detail the equilibrium calculations
and how an analyst can carry those out. An important part of these
equilibrium calculations will be the creation of a high-quality demand
model. Discrete choice models are commonly used for the creation of these
demand models so we will discuss the specifics of using this technique for
the creation of a basic demand model. The chapter will then illustrate the
technique with a hypothetical patent setting using digital cameras. The
conclusion will discuss the broader application of these techniques as well
as the limitations and challenges that these techniques engender.

Equilibrium Calculations

An equilibrium occurs when none of the firms under study have a profit
incentive to change the price or features they offer. Any equilibrium cal-
culation must take into account a firm’s incentives as well as consumer
responses. Many different theories of equilibrium have been developed
using various assumptions about the decision-making process of the
competing firms. With many of the proposed equilibrium theories,
mathematically solving for the equilibrium can be complicated and require
specialized algorithms. While the basic premise of using equilibrium as
the basis for a lost profit calculation applies to any equilibrium calcula-
tion, actually solving for equilibrium may not be practical depending on
the specific context and assumptions. Rather than considering specialized
equilibrium settings, we demonstrate the technique using a common and
relatively simple equilibrium concept called a Nash equilibrium.
A Nash equilibrium has two major simplifying assumptions that make
the equilibrium calculations fast and stable. The first is that all the firms
have access to the same information. This means that all the firms have
complete information about the consumer demand system and cost struc-
ture of the various firms in the study. While this may seem to be a strong
assumption, it is fairly mild in this specific situation. When the demand
system is constructed using a discrete choice survey, all the firms should
have access to similar survey responses. Software to create and field
discrete choice surveys is readily available and commonly used in market-
ing research studies. Because the survey data are relatively inexpensive
to collect, it is reasonable to assume that all the firms participating in the
market would have access to the demand system. The cost structure of
the participating firms is more closely guarded and difficult to ascertain.
Given publicly available information and the similar cost structure of
competing firms it is expected that the marginal costs can be reconstructed
for the firms in question. In intellectual property cases, such as the recent
cases over software patents, it is even less of a concern since the marginal
cost for including a feature is near zero and marginal costs could be a
subject of discovery.
The second major assumption in a Nash equilibrium is that firms
maximize profits for a single period and do not engage in forward looking
or dynamic maximization. This assumption is important for bounding
the equilibrium algorithm and simplifying computations. While there are
some extensions that could be made to handle multi-period optimizations,
we do not address them in this chapter.
In order to calculate the economic value of a product feature, we calcu-
late the equilibrium profits before and after adding the feature to a prod-
uct. This change in profit represents the value to the firm of implementing
the feature. Alternatively, we could consider the equilibrium profits of a
firm with and without a competitor implementing the feature. This would
give a measure of the damages that a firm incurred due to the competitor
implementing a patented product feature.
The difference in profit measure is directly related to the value that
the firm can capture from the implementation of a feature. A number of
alternative measures have been proposed, but they largely ignore competi-
tive pressures or rely solely on a measure of a customer’s willingness-to-pay
for the feature rather than the value that the firm can ultimately capture.
Commonly used methods include calculating a customer’s willingness-to-
pay (WTP) or using a pseudo-WTP.
Willingness-to-pay is a social welfare concept that does not directly
relate to the value provided to the firm; however, it is a commonly used
metric to measure the value of a feature. In order to calculate WTP we
need to define two products, a feature-poor product, A, and a feature-rich
product, A*. WTP is the welfare surplus that a customer receives when
confronted with a choice between the feature poor product, A, and the
feature rich product, A*. This surplus is measured in terms of money
rather than in terms of utility. WTP is defined as the amount of additional
money that would have to be given to the consumer before they would be
indifferent between the feature-poor product with the additional money
and the feature-rich product. This is represented as:

\[ V(p, y + \mathrm{WTP} \mid A) = V(p, y \mid A^{*}) \]

where V(p, y | A) is the indirect utility function of the consumer defined as:

\[ V(p, y \mid A) = \max_{x} U(x \mid A) \quad \text{subject to} \quad p'x \le y. \]

In the previous equation, y represents the consumer’s budget constraint,
p a vector of prices, and x a vector of quantities for the product. A con-
sumer receives utility from three sources: (1) consumption of the product,
(2) consumption of an outside alternative, and (3) the random utility error.
Under a logit demand system, the indirect utility function is found by
solving for the expected maximum utility (see Ben-Akiva et al. (2015) for
an example).

\[ V(p, y \mid A) = E\Big[ \max_{j} U_j \mid A \Big] = \beta_p y + \ln \sum_{j=1}^{J} \exp\big(a_j'\beta - \beta_p p_j\big) \]

It should be noted that V(p, y | A) is expressed in terms of utility rather than in
monetary terms. In order to convert utility to monetary terms we divide
by the marginal utility of income. For the logit model this is commonly
assumed to be the price coefficient.
We can then solve for WTP:
\[ \mathrm{WTP} = \frac{\ln \Big[ \sum_{j=1}^{J} \exp\big(a_j^{*\prime}\beta - \beta_p p_j\big) \Big]}{\beta_p} - \frac{\ln \Big[ \sum_{j=1}^{J} \exp\big(a_j'\beta - \beta_p p_j\big) \Big]}{\beta_p} \]

It should be noted that this value for WTP is slightly different from the
WTP measure used in the traditional conjoint literature (Orme 2001). In the
traditional conjoint literature, WTP is defined as the additional price
a firm can charge such that a consumer is indifferent between choice A
and A*.
\[ \frac{\exp\big(a_k'\beta - p\beta_p\big)}{\sum_{j=1,\, j\ne k}^{J} \exp\big(a_j'\beta - p_j\beta_p\big) + \exp\big(a_k'\beta - p\beta_p\big)} = \frac{\exp\big(a_k^{*\prime}\beta - (p + p_{\mathrm{WTP}})\beta_p\big)}{\sum_{j=1,\, j\ne k}^{J} \exp\big(a_j'\beta - p_j\beta_p\big) + \exp\big(a_k^{*\prime}\beta - (p + p_{\mathrm{WTP}})\beta_p\big)} \]

This simplifies to:

\[ p_{\mathrm{WTP}} = \frac{(a_k' - a_k^{*\prime})\beta}{-\beta_p} \]
When the products differ on only one attribute, and the attributes are
dummy coded, this further simplifies to the ratio of the partworth for the
feature and the price coefficient, as is commonly seen. Since this does not
take into account the changing consumption of the outside good or
the utility value due to the omitted characteristics represented by the error
term, it cannot be considered a true willingness-to-pay measure. In addi-
tion, it is invariant to which product receives the enhancement. For these
reasons we call this a pseudo-WTP (pWTP). It remains a useful method
for interpreting and comparing coefficients in a logit model; however, it is
not a rigorous way of measuring consumer surplus.
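To make the contrast concrete, the following minimal sketch (our illustration, not the authors’ code; all object names are hypothetical) computes the log-sum WTP above and the pseudo-WTP for a single draw of the partworths. It is written in R, the environment used for the estimation discussed later in the chapter:

```r
## Minimal sketch (hypothetical names): WTP vs. pseudo-WTP for one draw of
## the partworths. A_base and A_star are J x p attribute matrices for the
## market without and with the feature enhancement on product k; 'beta' is
## the partworth vector and 'bp' the (positive) price coefficient.
logsum <- function(A, prices, beta, bp) {
  log(sum(exp(drop(A %*% beta) - bp * prices)))
}

wtp <- function(A_base, A_star, prices, beta, bp) {
  (logsum(A_star, prices, beta, bp) - logsum(A_base, prices, beta, bp)) / bp
}

## Pseudo-WTP: for a dummy-coded focal feature this reduces to the ratio
## of the feature partworth to the price coefficient.
pseudo_wtp <- function(A_base, A_star, k, beta, bp) {
  drop((A_star[k, ] - A_base[k, ]) %*% beta) / bp
}
```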
Both WTP and pWTP suffer from similar problems in that they are
consumer measures rather than a measure of value that can be extracted by
the firm. Competitive markets prevent firms from capturing the full value
of the consumer surplus. When firms compete they lose market power rela-
tive to the consumer and the price of an offering falls. In fact, in perfectly
competitive markets, the price falls to the marginal cost and consumers
are able to retain the entire consumer surplus. When firms are offering dif-
ferentiated products, the prices settle to be somewhere between a perfectly
competitive market and a monopoly market, and the consumer surplus
is split between the consumer and the firms. The exact ratio of this split
depends on the relative substitutability of the competitive offerings and the
market power of the firms. For this reason, WTP as a measure of consumer
surplus will tend to overstate the value of a specific product feature.
Even in a monopoly market, WTP and related measures are a poor
basis for setting prices and thus a poor measure of feature value. WTP
is a heterogeneous, consumer-specific measure, while firms are generally
limited to offering a product at a uniform price. If a firm were to price its
product based on a WTP measure, it must choose a single summary
measure as the basis for the pricing decision. For example, if the firm
priced the product at the median WTP value for consumers, then half
the market would purchase the product and half the market would not
purchase the product. It is unlikely that segmenting the market in this way
would be optimal. Depending on the slope of the demand curve, a higher-
priced niche product or a lower-priced mass-market product would lead
to greater profitability. In reality, consumers purchase products based on
which products will maximize their total consumer surplus rather than a
feature-specific surplus, further complicating the use of WTP to determine
purchases and thus feature value. Because the purchase rate depends on
the specific demand curve it is impossible to determine whether WTP will
over-state or understate the true value of a feature, although in practice
WTP generally overstates the true value of a feature.
An equilibrium analysis avoids these problems by directly incorporating
the firm’s problem into the analysis. An equilibrium analysis calculates
the optimal prices of all the competitive firms and uses those prices to
forecast market shares. This market-share measure and the total size of
the market can be used to predict the profitability of each firm under
various scenarios. By properly constructing the scenarios to reflect the
market place both with and without the feature under study, it is possible
to determine the difference in expected profit. This difference in expected
profit is a true measure of the value of developing or acquiring the given
feature for the firm. A firm should be willing to pay up to the difference in
profit for the feature in question as it is the highest price that represents a
positive return on investment.
In order to undertake an equilibrium calculation a number of assump-
tions are necessary. We need to define the nature of competition, the
competitive set, and the equilibrium concept. In addition we will need a
parametric demand system, such as the heterogeneous logit model, where
the products can be described as a bundle of attributes. We will also need
cost information in order to calculate optimal profit. Together these
components form the basis of the equilibrium calculation.
Specifically we will assume the following:

1. A demand specification: A standard heterogeneous logit demand that
is linear in the attributes (including price)
2. Cost specification: Constant marginal cost (this assumption can be
easily relaxed)
3. Single product firms (this assumption can be easily relaxed)
4. Feature exclusivity: The feature can be added to only one product
(this assumption can be easily relaxed)
5. No entry-exit: Firms cannot enter or exit the market after the product
enhancement takes place
6. Static Nash Price Competition.

Assumption 1 can be replaced with any valid demand system.
Assumption 5 avoids the combinatorial problem of determining at which
point a firm would leave the market and how the remaining firms would
respond. The proposed method will suggest if any firm is in danger of
exit, but it is not clear how to allow for firm entry. Assumption 6 cannot
be relaxed without considerable difficulty. Alternative equilibrium settings
have been proposed; however, it is not obvious how to implement those
strategies with discrete choice data.

Using Discrete Choice to Create a Logit Demand System

Discrete choice or choice-based conjoint analysis studies have been used
extensively in marketing and economics to study preferences and con-
sumer choice. When consumers purchase at most one unit of a product
the standard choice model can be used to create a valid demand system.
The demand system mirrors a choice problem when consumers are faced
with choice alternatives, each described by a set of characteristics, aj, and a
price pj. The standard random utility model suggests that the utility for the
j th alternative consists of a deterministic portion driven by the character-
istics and price and a stochastic portion, which is assumed to follow a Type
I extreme value distribution (Ben-Akiva et al. 2015).

\[ u_j = \beta' a_j - \beta_p p_j + \varepsilon_j \]

The vector of attributes of the product, aj, includes the feature that
requires valuation. This vector can represent discrete attributes in which
case the values are dummy coded or continuous attribute quantities. The
focal feature is denoted as af. Feature enhancement is modeled as alterna-
tive levels for the focal feature, af (a single element of the vector): af = 1 if
the product includes the feature enhancement and af = 0 otherwise.
There are a number of assumptions and conditions that are important
for use in feature valuation. The first is that, contrary to common practice,
the price of the offering must be represented as a linear variable rather than
an attribute with dummy variable coding. In addition the coefficient for
the price attribute must be strictly negative. In many conjoint settings,
if the price attribute regularly takes on K values, the researcher includes
K − 1 dummy variables for each of those values. This leads to a non-
continuous demand function and can create a situation where there is not
an equilibrium price as the best response functions will not be continuous.
In addition, if the price coefficient, bp, is positive, there will not exist
an equilibrium since utility will increase as prices increase. This leads to
infinite prices and profits.
The second important assumption is that the model is compensatory
with linear utility. This means that the consumers trade-off prices and
features when making purchasing decisions. Thus firms can adjust prices
to compensate for the presence or absence of a given feature, especially the
focal feature.
The final important assumption is that the choice exercise is a reason-
able representation of the actual purchasing process. It is possible for
consumers to express preferences based on any choice set put before them.
This does not necessarily mean that those preferences represent a good
proxy to the actual choice process. The ability of choice models using
conjoint analysis to accurately predict market choices has been the subject
of extensive research (Allenby et al. 2005). The prevailing view is that a
well-designed choice exercise can provide a reasonable approximation
of market place demand. We consider a well-designed choice study as
one that has a reasonable set of competitors, a properly screened and
informed sample, appropriately defined attributes and levels, and properly
designed choice alternatives. In general the conjoint task should mimic the
real purchase decision as closely as possible while respecting the limits of
respondents’ processing ability.
To calculate the standard choice model we must calculate the prob-
ability that the jth alternative has the maximum utility and is thus the
chosen alternative given the assumed error distribution. As is well known,
utilities in a choice model are identified only up to scale and location. This means that all the utili-
ties in a choice set can be multiplied by any constant (scale shift) without
changing the preference order. The same is true for location shifts, where
an arbitrary constant is added to all the utility values. This means that it
is not possible to identify the utility model without specifying a reference
utility and specifying the scale factor. The reference utility is usually set
by assigning one of the products a utility of zero. This is generally accom-
plished by dummy coding the attributes and assigning the associated
levels for the reference product a design code of zero. It is also necessary
to account for scale shifts. Typically, this is resolved by setting the scale
parameter for the extreme value distribution to 1.0.
Assuming all consumers have enough money to purchase any alterna-
tive, the random utility model yields the standard multinomial logit
specification commonly used with choice-based conjoint studies:

\[ \Pr(j) = \frac{\exp\big(\beta' a_j - \beta_p p_j\big)}{\sum_{k=1}^{K} \exp\big(\beta' a_k - \beta_p p_k\big)} \]
where k indexes the choice options. In order to compute the equilibrium,
we will need to compute the aggregate demand for each choice alterna-
tive. Aggregate demand is calculated based on the probability of purchase
times the total market size. As discussed below, the inclusion of an outside
good is necessary in the logit model. This is done by modifying the design
matrix A to include a good representing the outside alternative. If the
outside good is constructed to have a utility of zero, this leads to the com-
monly seen +1 in the denominator of the multinomial logit formulation.
Because the b parameters are heterogeneous, we need to account for those
individual specific differences in the calculation of the aggregate probabil-
ity of purchase. We can calculate expected aggregate demand as:

\[ E(Q) = M \int \Pr(j \mid \beta)\, \Pr(\beta \mid \theta)\, d\beta \]

where M is the market size, Q is the quantity demanded, and θ is a vector
of hyper-parameters. If we use simulation-based Bayesian statistics, we
don’t need to calculate the integral directly. We simply need draws from
the posterior distribution of β, which are output by the MCMC estimation
algorithm or can easily be simulated from the posterior distribution of θ.
With the draws of b, the formula for aggregate demand becomes:

E (Q) 5 M a Pr ( j 0 br) 5 M a
R R exp (Ajbr 2 bppj)

a k51 exp (Akbr 2 bppk)


K
r51 r51

where A represents the design matrix for the market being studied such
that
\[ A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_K \end{bmatrix}. \]

In this specification, ak is the design for the kth product and j indexes the
focal product. Based on the aggregate demand it is possible to consider the
firm’s problem.

Equilibrium calculations

The firm’s problem is to maximize profits given a set of competitive offer-
ings. The challenge is that it is not reasonable to assume that the market
place will remain static in the face of price or product feature changes.
This means that it is not possible to simply compute a profit maximiz-
ing solution for the firm. If any competitor decides to react by changing
features or, more likely, prices, then the profit maximizing solution is no
longer valid. We use an equilibrium concept to account for the possibility
that firms will adjust their prices or products in the face of a new com-
petitive offering. In equilibrium there is no incentive for firms to change
their product offerings as any change will result in strictly lower profits.
It is possible to consider many different types of equilibriums where firms
try to simultaneously maximize product offerings and prices. However,
an equilibrium where firms attempt to optimize product offerings is a
significant computational task. The number of solutions that need to
be considered expands multiplicatively with the number of features. For
simplicity we restrict our investigation to a price equilibrium where firms
simply respond by changing prices. The principles and methods discussed
in a price equilibrium extend to other types of equilibriums and provide a
strong foundation for understanding the basic concept.
The solution to an equilibrium relies on a concept called a conditional
demand curve. The conditional demand curve represents the demand that
a firm would expect given a set of competitive offerings and prices. This is
based on a firm’s profit function represented as:

\[ \pi(p_j \mid p_{-j}) = E[Q]\,(p_j - c_j). \]

The marginal cost for firm j is represented by cj, the price for the firm pj,
the price for the competitors p−j, and the quantity sold Q. Note that the
choice probability component of the profit function depends on the entire
set of competitive firms’ prices, p, and product offerings, A. This profit
function forms the basis for calculating the best response for a given firm.
A firm’s best response is the price that maximizes its profit:

\[ \max_{p_j}\ \pi(p_j \mid p_{-j}) \]

Since the total market size, M, is simply a scaling factor for the total
profit, it does not affect the final solutions and can be ignored in the cal-
culation of the Nash equilibrium price. If the profit function is a concave
function then we could calculate each firm’s conditional best response
analytically by finding the partial derivative of each firm’s profit function,
setting it equal to zero, and solving the resulting systems of equations.
There are a few challenges, however, that make this impractical. The first
is that it is not possible to show that the profit function is strictly concave
in a heterogeneous logit setting. The concavity of the function depends
on the parameter values derived from the choice models. In practice this
appears to be primarily driven by the distribution of the price coefficient.
The heterogeneous logit model will often lead to a concave profit func-
tion; however, we have observed cases where the profit function is not
concave. When the profit function is not concave, a common variation is
to observe a reasonable local maximum followed by a local minimum and
then an extremely large global maximum. This appears to occur when
there is a large mass of price coefficients close to zero. The implication
here is that there is a small set of very price-insensitive consumers. These
price-insensitive consumers would buy the product regardless of the price,
so a firm would find it most profitable to charge very high prices to just
this small set of consumers. In most settings this is not a practical solution
and we see it as an artifact of the choice exercise.
A second computational challenge stems from the integral in the profit
function of the firm. Recall that:

\[ E(Q) = M \int \Pr(j \mid \beta)\, \Pr(\beta \mid \theta)\, d\beta \]

This means that there will be an integral in any maximization problem
that is undertaken. It is therefore necessary to use simulation to solve the
optimization problem. Given that both the market share and the derivative
of market share are computationally easy to calculate, it is possible to use a
large number of draws to approximate the integral as previously described.
Given the profit function for the firm, the first-order conditions for firm
j are:
\[ \frac{d\pi}{dp_j} = E\Big[ \frac{d}{dp_j} \Pr(j \mid p, A) \Big] (p_j - c_j) + E\big[ \Pr(j \mid p, A) \big]. \]
We therefore define the first-order conditions for all firms as

\[ h(p) = \begin{bmatrix} h_1(p) \\ h_2(p) \\ \vdots \\ h_J(p) \end{bmatrix} = \begin{bmatrix} d\pi/dp_1 \\ d\pi/dp_2 \\ \vdots \\ d\pi/dp_J \end{bmatrix} \]

and the equilibrium price vector, p*, is the zero of the function h(p).
As previously discussed, the profit functions for firms are often, but
not always, concave. Because of this it is necessary to independently verify
the computed root to the first-order conditions and we demonstrate two
methods for finding equilibrium prices. The first method involves directly
computing equilibrium prices using the first-order conditions. This opti-
mization involves using a quasi-Newton method to find the roots directly.
The optimization problem is finding the minimum of the norm of h(p):

\[ \min_{p}\ \| h(p) \| \]

At a true root the norm of h(p) should be exactly equal to zero.
Numerical errors due to the root-finding algorithm will often prevent
the norm from completely reaching zero. This method allows you to
leverage the numerical optimization methods built into computational
software.
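A minimal R sketch of this direct approach (our illustration with hypothetical names; it reuses expected_share() from the earlier sketch and the numDeriv package for numerical gradients):

```r
## Sketch (hypothetical names; reuses expected_share() defined earlier):
## each firm's profit, the stacked first-order conditions h(p), and a
## quasi-Newton search for the minimum of ||h(p)||.
library(numDeriv)  # provides grad()

profit_j <- function(j, p, design, beta_draws, bp_draws, cost) {
  expected_share(j, design, p, beta_draws, bp_draws) * (p[j] - cost[j])
}

h <- function(p, design, beta_draws, bp_draws, cost) {
  sapply(seq_along(p), function(j) {
    grad(function(pj) {                 # numerical d(profit_j)/d(p_j)
      pp <- p; pp[j] <- pj
      profit_j(j, pp, design, beta_draws, bp_draws, cost)
    }, p[j])
  })
}

norm_h <- function(p, design, beta_draws, bp_draws, cost) {
  sqrt(sum(h(p, design, beta_draws, bp_draws, cost)^2))
}

## Candidate equilibrium: minimize ||h(p)|| from starting prices p0, then
## verify that norm_h() at the solution is numerically close to zero.
## p_star <- optim(p0, norm_h, method = "BFGS", design = design,
##                 beta_draws = beta_draws, bp_draws = bp_draws,
##                 cost = cost)$par
```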
An alternative is an iterative method that leverages the concept of a
best response curve. Starting with an arbitrary set of prices, find the profit
maximizing solution for the 1st firm. This can either be done directly using
the conditional profit function or by using the first-order conditions.
Using the conditional profit functions directly does not necessarily assume
any specific shape of the demand function and is generally more robust,
albeit slower. The price vector is then updated with the profit-maximizing
price and the process is repeated with the second firm and so forth. Once
the profit-maximizing price for the Jth firm is computed, repeat the process
starting from the first firm. The process ends when the difference between
optimal price vectors for subsequent iterations is less than a predetermined
tolerance. The full algorithm is described below:

1. Define a vector of starting prices.
2. Calculate the profit optimizing price for the 1st firm.
3. Update the price for the first firm in the price vector.
4. Repeat steps 2–3 for the remaining firms, one at a time.
5. Calculate the difference between the starting price vector in 1 and the
updated price vector from step 4.
6. If the difference between the price vectors is greater than the tolerance,
set the price vector from step 4 as the starting price vector and go to
step 2.

The final price vector represents a Nash equilibrium. If there is concern
about multiple equilibriums, repeat the process from different starting
points and compare the results.
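A compact R sketch of this iterative scheme (ours, with hypothetical names; it reuses profit_j() from the previous sketch, and the price search interval is an assumption):

```r
## Sketch (hypothetical names; reuses profit_j() from the previous sketch):
## iterate best responses one firm at a time until prices converge.
nash_prices <- function(p0, design, beta_draws, bp_draws, cost,
                        p_max, tol = 1e-6) {
  p <- p0
  repeat {
    p_old <- p
    for (j in seq_along(p)) {                    # steps 2-4
      best <- optimize(function(pj) {
        pp <- p; pp[j] <- pj
        profit_j(j, pp, design, beta_draws, bp_draws, cost)
      }, interval = c(cost[j], p_max[j]), maximum = TRUE)
      p[j] <- best$maximum
    }
    if (max(abs(p - p_old)) < tol) break         # steps 5-6: converged
  }
  p                                              # candidate Nash equilibrium
}
```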
In general, we run both methods and compare the results. This gives us
greater confidence that the computed equilibriums are reasonable and stable.
If the results disagree, we engage in further investigation to determine if profit
functions are concave in the neighborhood of the computed equilibriums.

Use of Equilibrium Calculations to Assess Feature Value

Equilibrium calculations can be used to assess the value of a feature to a
firm. The researcher simply needs to calculate the equilibrium profit under
a base setting where the feature is not included in the firm’s product and
then recalculate the equilibrium with the feature included. The difference
in profit between the two equilibrium conditions represents the poten-
tial value that a firm can receive from the feature implementation after
accounting for possible competitive responses. This value could be used
for determining the potential budget to develop and implement the feature
or it could be used in licensing or damage calculations.

What Makes a Good Conjoint Analysis?

Because equilibrium analysis relies heavily on a stable and well-designed
conjoint analysis, it is necessary to take the extra effort to create a quality
conjoint analysis model. A quality conjoint analysis model is driven by the
trade-off between realism of the exercise and capabilities of the respond-
ent and survey platform. Data quality is compromised as the realism of
the tasks decreases and as the survey becomes more complex. These two
dimensions generally work in opposite directions and thus a trade-off is
necessary. This section addresses some of the most important considera-
tions. For a more general treatment of the best practices in discrete choice
conjoint analysis, see Orme (2009).
Surveys used for feature valuation need to include a reasonable com-
petitive set. While it is not possible or practical to include the complete
universe of possible competitors, a representative sample of the com-
petitive environment should be used. Competition serves as an important
component of an equilibrium model, and the quality of the equilibrium
analysis depends heavily on the included set of competitors. In the digital
camera study discussed below we used four companies to represent the
competitive set. These competitors were chosen to be representative of
the market as a whole. The competitive set included the two leading
digital camera manufacturers. These manufacturers represent a significant
portion of the digital camera market and have a very loyal following. In
addition they offer a wide range of camera models including many models
in the target range. This makes the task more plausible since it is likely that
the companies offer a camera similar to any of the experimentally designed
cameras. We chose two additional companies to represent a mid-tier
competitor and a less popular competitor. Both companies had significant
brand recognition in the broader consumer electronics space, but were not
in a dominant position in the digital camera market. This was done in an
attempt to represent different types of competitors to cover the majority of
the market without overloading the study design with many digital camera
makers. The nature of the competitive set used will vary from category to
category. In general we don’t feel it is necessary to completely cover the
market, but an attempt should be made to represent the major market
segments.
An additional important consideration is the total number of features
included in the conjoint study. Experienced conjoint researchers generally
recommend between four and eight variable features in a conjoint analy-
sis. This figure includes the brand and price attributes. In contrast, digital
cameras can include hundreds of features that differentiate one model of
camera from another. Because of this it becomes necessary to carefully
consider which features to include in the study. A respondent is expected
to hold constant all the features that are not explicitly included in the
conjoint study. We find it helpful to remind consumers that the cameras
are identical except for the described features and include a description of
the basic model. An alternative to describing non-included features is to
state that all the features not described are acceptable to the respondent.
We generally find it is preferable to explicitly describe the non-included
attributes as it makes the choices more concrete and helps respondents
hold the omitted attributes constant.
The feature that we were valuing in the exercise is the presence or
absence of a swivel screen. This feature was described using the image
in Figure 32.1. We also included a feature for the number of megapixels
of the camera sensor, 10 or 16 megapixels; the degree of optical zoom,
4x or 10x; the quality of the video, HD Video (720p) or Full HD Video
(1080p) with stereo sound; and the presence or absence of built-in WiFi
connectivity.

Figure 32.1  Swivel screen attribute
We considered a number of additional attributes, but these were eventu-
ally excluded. All the included attributes were fully described using a recal-
lable glossary including pictures to describe the features where necessary.
These attributes were selected to include a balance of important and less
important features and were also commonly included in the marketing
materials of the firms selling the cameras. Also note that it is not necessary
to include an exhaustive list of feature levels for the cameras. Our features
have two levels each even though more could be represented.
One of the features of a conjoint study is the ability to include an
additional option for “I would not choose any of these.” While the use of
this “none” option is a matter of debate for preference-based studies, it is
necessary for equilibrium calculations. It represents an outside good and
allows respondents to completely opt out of the market as the price of the
offerings increases. Without a none option, the study would not be able to
consider how the market expands or contracts as the prices and features
of the offerings change. This expansion and contraction of the market is
necessary to create a real demand curve.
The traditional method of including a none option is to present the
option alongside the other offerings as an additional concept to consider.
An alternative method, called “dual-response none,” for collecting the
no-purchase option is to include a follow-up question that directly asks the
respondent whether they would purchase the chosen product (Brazell et al.
2006; Wlömert and Eggers 2016). The first step presents a forced choice
task where the experimentally designed concepts are presented to the user.
The respondent then chooses their most preferred option. A follow-up
task is then included which asks the respondent if they would actually pur-
chase the product chosen. If the respondent indicates that yes, they would
purchase the product, the chosen product is recorded as the final choice.
If the respondent would not purchase the product the previous response is
discarded and the no-purchase option is recorded.
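In the data file this recoding is a one-line operation; a sketch with hypothetical variable names:

```r
## Sketch (hypothetical names): recode a dual-response none task. 'chosen'
## is the alternative picked in the forced-choice step, 'would_buy' the
## follow-up answer (TRUE/FALSE), and K + 1 codes the no-purchase
## (outside) option used in estimation.
final_choice <- ifelse(would_buy, chosen, K + 1)
```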
While the two methods for eliciting the no-purchase decision should
lead to logically identical results, in practice the two methods often lead
to significantly different choice patterns. Asking a conjoint question as
a dual-response none question generally increases the prevalence of the
“would not purchase” option. It is not uncommon to see the “would not
purchase” share more than double with the use of the dual-response none.
While we won’t speculate on the respondent psychology leading to the
change in none share, we generally feel that the dual-response none meth-
odology leads to a none share that is more in line with the non-purchase
option observed in actual purchase situations. For this reason we recom-
mend that the dual-response none option be used when designing conjoint
studies for equilibrium calculations.
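The recoding implied by this two-step procedure is mechanical. A minimal
R sketch (all variable names are hypothetical) is:

    # Keep the forced choice if the respondent would actually buy;
    # otherwise discard it and record the no-purchase option.
    recode_dual_response <- function(chosen, would_buy, none_code) {
      ifelse(would_buy, chosen, none_code)
    }

    # Three tasks: concepts 2, 4, 1 chosen; "no" on the second follow-up
    recode_dual_response(chosen    = c(2, 4, 1),
                         would_buy = c(TRUE, FALSE, TRUE),
                         none_code = 5)   # returns 2 5 1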
An additional important criterion to consider when fielding a con-
joint study is the experimental design used. A conjoint study should
be considered an experiment and designed to allow for the maximum
discrimination between features. Experimental design is a highly technical
subject (see, for example, Box and Draper 1987, Chapters 4 and 5) and
we will not cover it fully here. Existing experimental-design software
used to create choice-based conjoint studies is readily available and will
create high-quality designs. The combination of features, brands, and
prices in a conjoint study is specifically designed to have a high degree of
variation. This variation is necessary to increase the power of the study to
discriminate the value of different features and attributes. The problem
is that an unrestricted experimental design will create offerings that are
not currently represented in the market place and may vary significantly
from existing product offerings. This is a necessary component of the
survey technique and care should be taken to avoid artificially restricting
offerings by constraining or modifying the generated experimental design.
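For reference, the unrestricted space spanned by the two-level attributes
described above can be enumerated in a few lines of R (attribute labels
are paraphrased; constructing balanced, efficient choice sets from this
space is the job of dedicated design software):

    attributes <- list(
      swivel = c("no swivel screen", "swivel screen"),
      mp     = c("10 MP", "16 MP"),
      zoom   = c("4x optical", "10x optical"),
      video  = c("HD 720p", "Full HD 1080p with stereo"),
      wifi   = c("no WiFi", "built-in WiFi")
    )
    profiles <- expand.grid(attributes)
    nrow(profiles)   # 2^5 = 32 candidate profiles before brand and price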

Example equilibrium value calculation

We demonstrate the procedure using the digital camera study described
in the previous section. The study was fielded using Survey Sampling
International’s internet panel in August of 2013.2 We received 501 com-
pleted questionnaires. We performed a number of data quality checks to
ensure that the data received were legitimate. One measure that is com-
monly used to detect poor respondents is response time. The median
time to complete the survey was 220 seconds. We reran the analysis
eliminating the fastest quartile of respondents and found little change in
the results. Another common measure used to detect respondents that
did not carefully consider the choice tasks is to remove those respondents
who answered the same choice for all the tasks (e.g. always selected the
left-most task). Of our 501 respondents, only two displayed this behav-
ior and were eliminated. We also eliminated six respondents who always
chose the same brand and two respondents who always chose the highest-
priced product. It did not appear that these respondents took the choice
task seriously. In addition, 23 respondents answered “none” to all their
tasks. While these responses may be legitimate, we cannot identify those
respondents’ preference parameters from the data provided, so they were
removed as well. This leaves a final sample of 468 respondents out of the
original sample of 501.
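A sketch of these screens in R, applied to a toy respondent-level data
frame (all names and values are hypothetical, not the study data):

    resp <- data.frame(
      id                = 1:6,
      completion_secs   = c(220, 45, 310, 180, 90, 400),
      always_same_side  = c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE),
      always_same_brand = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
      always_top_price  = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
      always_none       = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
    )
    keep <- with(resp, !(always_same_side | always_same_brand |
                         always_top_price | always_none))
    resp_clean <- resp[keep, ]

    # Robustness check: drop the fastest quartile and re-run the analysis
    cutoff    <- quantile(resp_clean$completion_secs, 0.25)
    resp_fast <- subset(resp_clean, completion_secs >= cutoff)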
We analyze the conjoint data using a Bayesian hierarchical model. In the
hierarchical model we assume that the conjoint-partworths are drawn from
a normal prior distribution. This is similar to a random coefficients model
and leads to a posterior distribution that is symmetric, but has thicker tails
than a normal distribution. The model used was a slight variation of the
rhierMnlRwMixture routine in the bayesm R Package (R 2015) described
in Rossi, Allenby and McCulloch (2005). The underlying algorithm was
modified to enforce the constraint that the price coefficient be strictly nega-
tive. The constraint was made by reparameterizing the price parameter to
be the negative exponential of the draw from a normal distribution. The
full procedure can be found in Allenby, Brazell, Howell and Rossi (2013).
With the estimates of the Bayesian hierarchical model, we are able to
construct a set of draws from the posterior distribution that form the foun-
dation of our approximation to the full posterior predictive distribution.
There are two ways to create this set of draws. The most common way is
to simply use the draws of β_h from the converged MCMC chain. These
draws represent a set of draws tied to each individual responding to the
survey. The advantage of using these draws is that they are a byproduct of
the MCMC calculation and do not require any additional calculation to
compute. On the other hand, they may be more sensitive to outliers and do
not allow for adjusting the sample weights to better reflect the population
characteristics.
The second technique involves sampling from the posterior distribution
for the upper-level parameters. The MCMC sampling procedure creates
an empirical distribution for the mean and variance of the population. We
can use these empirical distributions to create a sample from the posterior
distribution. The process draws a series of samples from a normal distribu-
tion with the draw of the mean and variance as the distribution param-
eters. In this way it is possible to draw a sample of arbitrary size from the
posterior distribution. While this method requires recalculating the sample
after the completion of the MCMC chain, the resulting distribution is less
sensitive to outliers and can be easily reweighted to match population
characteristics. We use this method for calculating the posterior distribu-
tion for this example exercise. We sampled 1,000 draws of the mean and
variance parameters and then created a sample of 10,000 draws from each
of those samples, for a total of 1,000 × 10,000 = 10,000,000 draws from
the posterior distribution. These draws can then be used to empirically
sample from the distribution of expected demand. Recall that the formula
for expected demand is:

    E(Q_j) = M \int \Pr(j \mid \beta) \, \Pr(\beta \mid \Theta) \, d\beta

Using the draws from the posterior distribution we can approximate
this integral by computing:

    \hat{E}(Q_j) = M \times \frac{1}{I} \sum_{i=1}^{I} \Pr(j \mid \beta_i)

where i indexes the I posterior draws and Θ denotes the upper-level
population parameters. From a com-
putational standpoint we need to calculate the relative choice probability
for each draw from the posterior distribution and then scale the average
of those choice probabilities by the total market size. This step forms an
input into the profit maximizing routine.
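A sketch of this computation in R, with hypothetical draws, offerings,
and market size:

    # Approximate E(Q_j) by averaging logit choice probabilities over the
    # posterior draws. 'betas' is an I x k matrix of draws; 'X' is a J x k
    # matrix describing the offerings, with the outside ("none") option as
    # a row of zeros; M is the total market size.
    expected_demand <- function(j, X, betas, M) {
      util <- betas %*% t(X)                  # I x J utilities
      prob <- exp(util) / rowSums(exp(util))  # choice probabilities per draw
      M * mean(prob[, j])                     # average over draws, scaled by M
    }

    set.seed(2)
    betas <- matrix(rnorm(10000 * 3), ncol = 3)
    X <- rbind(c(1, 0, -2.5),   # offering 1 (hypothetical attribute rows)
               c(0, 1, -2.0),   # offering 2
               c(0, 0,  0))     # outside option
    expected_demand(j = 1, X = X, betas = betas, M = 1e6)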
Using the camera study previously described, we applied these tech-
niques to calculate an equilibrium price change for one of the features.
Consider the following scenario. Nikon currently holds a patent on a
unique swivel screen feature for a camera. They are considering develop-
ing the feature for introduction into a new camera model and want to
determine the potential return on the investment they could expect. They
recognize that competitors are likely to respond to the feature introduc-
tion by adjusting their prices to compensate for the new market situation.
Nikon fields the conjoint study described above. Using the equilibrium
calculation they can determine that they are likely to increase profits by
about 42 percent. The full distribution of the expected change in profits for
Nikon is shown in Figure 32.2. This change in profits can be calculated as:

    \%\Delta\pi = \frac{\pi_{Nikon}(p^*_{SS}) - \pi_{Nikon}(p^*_{w/o\,SS})}{\pi_{Nikon}(p^*_{w/o\,SS})}

Figure 32.2  Posterior distribution of percentage change in profit for
Nikon when introducing a swivel screen feature

This is simply the difference in profit when Nikon implements the swivel
screen and when it does not implement the swivel screen, normalized by
the profit without the investment.
The 95 percent posterior interval (represented by the short vertical bars
in the figure) for the change in profits is quite wide, going from 26 percent
to 58 percent. This reflects the high uncertainty in the result even with a
study including nearly 500 respondents.
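The profit comparisons above rest on re-solving for equilibrium prices
under each scenario. A self-contained toy illustration of such a best-
response computation under aggregate logit demand follows; the
intercepts, price sensitivity, costs, and market size are hypothetical, and
the code mirrors, rather than reproduces, the procedure used here:

    logit_share <- function(price, a, b) {
      v <- c(a - b * price, 0)                # inside goods plus outside good
      exp(v) / sum(exp(v))
    }
    best_response <- function(j, price, a, b, cost, M) {
      neg_profit <- function(p) {
        pr <- replace(price, j, p)
        -(p - cost[j]) * M * logit_share(pr, a, b)[j]
      }
      optimize(neg_profit, interval = c(cost[j], cost[j] + 50))$minimum
    }
    solve_equilibrium <- function(a, b, cost, M, start, tol = 1e-6) {
      price <- start
      for (it in 1:500) {                     # iterate to a fixed point
        new <- sapply(seq_along(price),
                      function(j) best_response(j, price, a, b, cost, M))
        if (max(abs(new - price)) < tol) break
        price <- new
      }
      new
    }

    a <- c(2.0, 1.5); b <- 0.5; cost <- c(1, 1); M <- 1e6
    solve_equilibrium(a, b, cost, M, start = cost + 1)

A feature change enters through the intercepts; solving once with and
once without the feature, and plugging the two equilibrium price vectors
into the profit function, yields a percentage change of the kind reported
above.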
Consider an alternative scenario where Sony, instead of responding
strictly by adjusting prices, decides to also implement a swivel screen.
After litigation it is determined that Sony has infringed on Nikon’s patent.
The question then is how much harm has Nikon suffered due to the
infringement by Sony. The answer is clearly that Nikon has been damaged
to the extent that Sony’s infringement hurt the profits that Nikon would
have received in the absence of the infringement. This can be calculated as:

    \%\Delta\pi_{Nikon} = \frac{\pi_{Nikon}(p^*_{Nikon,\,Sony\,SS}) - \pi_{Nikon}(p^*_{Nikon\,only\,SS})}{\pi_{Nikon}(p^*_{Nikon\,only\,SS})}

The resulting distribution can be seen in Figure 32.3.

Figure 32.3  Posterior distribution of change in Nikon profit when Sony
introduces an infringing swivel screen camera

The posterior mean (represented by a dashed vertical line) indicates that
Nikon would suffer an 11 percent drop in profit due to Sony’s infringe-
ment. If Sony adds the swivel screen to its camera, the demand for Sony’s
product would increase dramatically. Much of that increase would come
at the expense of Nikon if Nikon were to maintain the same price it had
before Sony were to introduce the swivel screen. However, we would not
expect Nikon to maintain its price and not react to the increased competi-
tion. Nikon would be forced to reduce prices significantly to combat the
threat. This would lead to a decrease in prices for Nikon from $264 to $225
while maintaining approximately the same share as was present before
Sony introduced the infringing product. This illustrates the importance of
considering the equilibrium profit calculations rather than simply holding
prices or shares constant.

Recent Court Decisions Supporting Our View

Conjoint analysis has proved attractive not only in patent litigation but
also in consumer fraud cases. In consumer fraud litigation, the plaintiffs
identify some alleged false product representation or an omission of
negative product information such as side effects from the use of a drug
or mildew in a washing machine. The plaintiffs then proceed to argue that
a class action is the proper way of adjudicating the dispute and award-
ing damages. The first step in this process is to seek certification of the
class under federal statutes governing class litigation. In class certifica-
tion, the plaintiffs must show that damages can be proven on the basis of
arguments that apply in some “predominant” sense over the entire class
and that damages can be calculated in a “uniform manner.” Class action
plaintiffs are attracted to conjoint as a way to isolate the impact of the
alleged fraud on the price of the product. That is, the view is that damages
should be based on the difference between the “value” of the product as
represented and the value as received. The problem becomes one of how to
assess this difference in “value.”
For example, in Saavedra v. Eli Lilly, the plaintiffs sought damages
for what they alleged was Eli Lilly’s failure to disclose a higher incidence
of withdrawal symptoms from discontinuing use of the anti-depressant,
Cymbalta. The “value” as represented could be the market price of the
Cymbalta medicine. The real question is what is the value as received. The
plaintiffs proposed to use conjoint analysis to “value” the product in the
counterfactual world in which the full extent of withdrawal symptoms was
known. In particular, the plaintiffs proposed to undertake a “willingness
to pay” calculation for the purpose of valuation. The court denied the
plaintiffs’ motion for class certification (Case No. 2:12-CV-9366-SVW,
12/18/2014), arguing that this proposed method was inappropriate. The
court stated “the Plaintiffs used the term ‘value’ to mean consumer
utility – a concept distinct from price. . . . It appears that consumer value
is a subjective concept distinct from the fair market value concept com-
monly used.” The court further states that the “Plaintiffs’ theory of injury
is distinct from the typical benefit-of-the bargain claim because it focuses
only on the demand side of the equation, rather than the intersection of
demand and supply.”
In short, the court in Saavedra v. Eli Lilly endorses our view that we
must use market prices as the basis for damages in litigation and that
WTP is not appropriate. The court even understood the basic point
that a WTP analysis will tend to overstate the damages, stating: “he [the
plaintiffs’ expert] forgets that a rational consumer would surely pay less
than she believes a drug is worth.” The plaintiffs failed to show the court
that conjoint surveys can be used as an input to the process of estimating
a market price for the product as received.
In the US District Court in California, the court recently denied a
motion for certification of a class in a consumer fraud case involving
e-cigarettes, endorsing the same logic we have outlined in our work. In
Ben Z. Halberstam v. NJOY, Inc et al, Judge Margaret Morrow (Case No.
14-CV-00428) stated (citing a decision in Werdebaugh v. Blue Diamond
Growers) that the correct measure of restitution “can be determined by
taking the difference between the market price actually paid by consum-
ers and the true market price that reflects the impact of the unlawful,
unfair, or fraudulent business practices.” Judge Morrow rejected the
plaintiffs’ use of a WTP analysis to estimate this difference in market
value. Referencing Apple, Inc. v. Samsung Electronics Co., she found
that the plaintiffs’ expert only considered “the demand side of the market
equation” and “the ultimate price of a product is a combination of market
demand and supply.”
While conjoint analysis has enjoyed some success in the patent arena,
the courts are also picking up on the limitations of a conjoint-based WTP
analysis even in those cases. In her decision to deny Apple injunctive relief
in Apple v. Samsung (Case No. 11-CV-01846), Judge Koh criticized the
Apple conjoint expert’s WTP metrics, stating: “the survey leaves the Court
with no way to compare [the plaintiffs’ conjoint expert’s] willingness to pay
metrics . . . to the market price of the infringing devices, which reflects the
real-world interaction of supply and demand for the infringing and non-
infringing devices.” She further stated that “the serious market competition
in the smartphone and tablet industry works to depress prices, whereas the
plaintiffs’ [conjoint expert’s] survey did not account at all for competitor
products.” The court, in this case, also understood that any WTP analysis
will overstate the value of a product as measured by market price.
In summary, the courts in both the patent and consumer fraud class
action areas are starting to appreciate our basic point that any measure
of economic damages must be based on market or equilibrium prices and,
further, that WTP does not measure a market price. What is missing is a
methodology that combines conjoint survey data with cost and competi-
tive information to provide an estimate of market price premiums as we
have outlined here.

Conclusion

The use of equilibrium analysis is an important distinction of our approach


to product feature valuation. Prior to our work, approaches to feature
valuation only considered consumer valuation (such as Willingness To
Pay) and did not consider the ability of the firm to capture this increased
WTP. In many cases, the use of WTP and other demand-only measures
will overstate the value of product features. This occurs because firms face
competitive pressure that limits their ability to capture the full increase in
consumers’ willingness to pay.
A proper approach to feature valuation must embed the firm’s pricing
decision in a realistic competitive environment. An equilibrium analysis
naturally incorporates competition and competitive reactions that limit
the pricing power of the firm. There are two difficulties in implementing
an equilibrium analysis. The first is that an equilibrium analysis requires
the marginal cost of the products and especially the marginal cost of the
feature. This is often closely guarded by companies, but should be a part
of any investment decision. The second difficulty is primarily compu-
tational. Solving the equilibrium analysis increases the computational
burden and there is an absence of commercial software that implements
the technique. Both of these problems are easily surmounted and should
not be seen as a major impediment to the use of equilibrium analysis for
feature valuation.
Feature valuation is an important element of the marketing analytics
tool kit and one of the primary motivations behind the popularity of con-
joint analysis. Our hope is that we have called attention to an important
deficiency in current, consumer-centric, approaches and demonstrated
that equilibrium calculations are feasible.

Notes

1. For details and further discussion, see “Valuation of Patented Product Features”
(2014), Journal of Law and Economics, 57, 629–663 and “Economic Valuation of Product
Features” (2014), Quantitative Marketing and Economics 12, 4, 421–456. 
2. This survey was part of a wave of four other very similar conjoint studies on digital
cameras. Across all studies, 16,185 invitations were sent and 6,384 individuals responded.
Of those who responded, 2,818 passed screening and of those that passed screening,
2,503 completed the study. The other four studies were not considered in the analysis.

References

Allenby, Greg M., Geraldine Fennell, Joel Huber, Thomas Eagle, Tim Gilbride, Dan
Horsky, Jaehwan Kim, Peter Lenk, Rich Johnson, Elie Ofek, Bryan K. Orme, Thomas
Otter, and Joan Walker (2005), “Adjusting Choice Models to Better Predict Market
Behavior,” Marketing Letters, 16 (3/4), 197–208.
Apple Inc. v. Samsung Electronics Co. Ltd., No. 11-CV-1846 [N.D. Cal.
December 2, 2011].
Barry, Chris, Ronen Arad, and Kristofer Swanson (2013), “2013 Patent Litigation Study,”
PWC Research Report.
Ben Z. Halberstam v. NJoy Inc et al, No. 2:14-CV-00428 [C.D. Cal. August 14, 2015].
Ben-Akiva, Moshe, Daniel McFadden, and Kenneth Train (2015), “Foundations of Stated
Preference Elicitation.” Working Paper, http://eml.berkeley.edu/~train/foundations.pdf
(accessed January 4, 2016).
Jennifer L Saavedra v. Eli Lilly and Company, No. 2:12-CV-09366 [C.D. Cal. December 18,
2014].
Jeruss, S., R. Feldman and J. H. Walker (2012), “The America Invents Act 500: Effects of
Patent Monetization Entities on US Litigation,” Duke Law and Technology Review, 11
(2), 357–388.
Orme, Bryan K. (2001), “Assessing the Monetary Value of Attribute Levels with Conjoint
Analysis: Warnings and Suggestions.” Unpublished manuscript. Sawtooth Software,
Sequim, Wash.


Orme, Bryan K. (2009), Getting Started with Conjoint Analysis: Strategies for Product Design
and Pricing Research. Madison, WI: Research Publishers.
Phonographic Performance Co. of Australia Ltd. (ACN 000 680 704) under section 154(1) of
the Copyright Act 1968(CTH) (2007), available at Federal Court of Australia, http://www.
judgments.fedcourt.gov.au/judgments/Judgments/tribunals/acopyt/2007/2007acopyt0001.
R Core Team (2015), “R: A language and environment for statistical computing.” R
Foundation for Statistical Computing, Vienna, Austria, http://www.R-project.org/.
Wlömert, Nils and Felix Eggers (2016), “Predicting new service adoption with conjoint
­analysis: external validity of BDM-based incentive-aligned and dual-response choice
designs,” Marketing Letters, 27: 195–210.



33.  Regression analysis to evaluate harm
in a breach of contract case: the Citri-
Lite Company, Inc., Plaintiff v. Cott
Beverages, Inc., Defendant
Rahul Guha, Darius Onul and Sally Woodhouse

We discuss the use of regression analysis to evaluate harm in a breach of


contract case involving allegations that the licensor of a product failed
to use commercially reasonable efforts to promote and sell the product.
Regression analysis has been widely used and accepted by United States
courts across a large variety of different types of cases, including labor
discrimination cases, antitrust cases, and intellectual property cases.1 In
cases involving marketing issues, regression analysis is frequently used to
determine the effect of promotion on sales.

Allegations

On July 26, 2007, the Citri-Lite Company (“Citri-Lite”) filed suit against
Cott Beverages, Inc. (“Cott”) for breach of contract due to Cott’s alleged
failure to promote a licensed product in a commercially reasonable
manner.2 Citri-Lite, a company that produced and marketed “Slim-Lite,”
a non-carbonated fruit-flavored zero-calorie weight-loss drink, entered
into an exclusive licensing agreement with Cott on September 17, 2003.3
As a consequence of the agreement, Cott agreed to manufacture, produce,
distribute, sell, and market Slim-Lite with the following terms:

1. Cott would pay a royalty rate of $0.50 to Citri-Lite for each case sold;4
2. Cott would maintain the level of marketing support Citri-Lite was
previously providing; and
3. Cott would provide some method to protect Citri-Lite if the product
did not fare as well as expected.5

In particular, Cott agreed to spend, on average over each rolling year


during the royalty period, the amount of $0.80 per case sold for marketing
purposes and to use commercially reasonable efforts to promote and sell
Slim-Lite in order to maintain or enhance the value of Slim-Lite’s good-
will.6 Cott also agreed to pay Citri-Lite a minimum royalty of $350,000
for each year of the contract term.7 The agreement specified that each
party could unilaterally terminate the agreement with 60 days’ notice.8 By
October 2005, Cott informed Citri-Lite that it intended to terminate the
agreement.9,10
Prior to the agreement, the marketing strategy most widely employed by
Citri-Lite was the use of in-store demos, which involved free product sam-
pling by customers.11 Although the agreement did not explicitly require
the use of demos or the use of any specific marketing program,12 Citri-Lite
alleged that Cott violated the terms of the agreement by reducing and
then stopping all demo activity at Sam’s Club in 2005, claiming this had a
negative impact on Slim-Lite’s success at Sam’s Club.13

Defendant’s Expert Assignment and Findings

Cott retained Dr. Randolph Bucklin, a marketing expert, to measure the


effectiveness of the in-store demos run by Cott in Sam’s Club locations on
sales and to determine whether the costs and benefits of demos support the
hypothesis that demos were paying for themselves over time.14 Dr. Bucklin
conducted a regression analysis to assess the issue. Dr. Bucklin evaluated
demo activity at Sam’s Club on three dimensions:

1. the effect of demos on Slim-Lite’s sales at Sam’s Club during the
weeks in which the demos were performed;
2. a comparison between the cost of demos and the increase in the con-
tribution margin (the difference between the sale price of additional
cases and the costs of production of the additional cases) resulting
from increased sales in the weeks in which demos were performed; and
3. the effect of demos on sales in following weeks in which demos were
not run to determine whether there was any carrying effect on sales.15

His analysis included weekly sales data and weekly demo data at Sam’s
Club for the whole period of the agreement.16
First, Dr. Bucklin regressed weekly aggregate Slim-Lite sales for Sam’s
Club locations on the number of demos run in that particular week and
found that demos led to an increase in sales of approximately 12 cases
during the weeks in which demos were performed compared with the
weeks in which the demos were not.17 Second, Dr. Bucklin determined
that Cott spent, on average, approximately $143 per demo, which included
both the cost of the demo and the price of the Slim-Lite cases used for sam-
pling purposes, and that the increase in the contribution margin brought
by the sale of the 12 additional cases in weeks in which the demo occurred
was only $16.20, resulting in a net loss of $126.80 due to each demo in
weeks in which demos were performed compared with weeks in which a
demo was not performed.18 Third, Dr. Bucklin tested whether demos in
prior weeks affected current weekly aggregate Slim-Lite sales for Sam’s
Club locations and found that demos did not have any long-term effect on
sales of Slim-Lite using either a four- or eight-week horizon.19 Dr. Bucklin
also ran an analysis to determine if the effects of the demos were different
from January to August 2004, when the Sam’s Club distribution was
constant, versus September 2004 to April 2005, when the number of Sam’s
Club locations in which Slim-Lite was carried increased significantly. He
found the same effect for these two subsamples as for the full sample.20
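The underlying data are not reproduced here, but the form of these
regressions is easy to sketch in R on simulated data (the 12-case demo
effect is built in purely for illustration):

    set.seed(3)
    weeks <- 70
    demos <- rpois(weeks, lambda = 20)                 # demos run per week
    sales <- 500 + 12 * demos + rnorm(weeks, sd = 40)  # ~12 cases per demo
    d <- data.frame(sales, demos,
                    demos_lag1 = c(NA, head(demos, -1)),
                    demos_lag4 = c(rep(NA, 4), head(demos, -4)))

    summary(lm(sales ~ demos, data = d))                      # same-week lift
    summary(lm(sales ~ demos + demos_lag1 + demos_lag4,
               data = d))                                     # carry-over test

On the reported figures, the implied contribution margin is $16.20 / 12 =
$1.35 per case, so a $143 demo generating 12 incremental cases runs a
$126.80 loss in the week it is performed.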
Therefore, Dr. Bucklin argued that the demo activity at Sam’s Club only
increased sales in weeks in which the demo activity was performed and not
in subsequent weeks, and that each demo resulted in a net loss in weeks
in which demos were performed.21 Since the demos did not result in short-
term profits, and did not increase long-term sales, Dr. Bucklin concluded
that the demos were not effective marketing tools for Slim-Lite in Sam’s
Club locations and hence that Cott’s decision to reduce and ultimately
cancel demos at Sam’s Club was reasonable.22
The agreement specified that Cott spend $0.80 per case sold on market-
ing efforts, and that Cott make commercially reasonable efforts to pro-
mote Slim-Lite.23 Dr. Bucklin argued that the three alternative marketing
strategies used by Cott besides demos were reasonable:

1. the presence of Slim-Lite at Sam’s Club had a positive marketing effect
since Sam’s Club is a selective retailer;
2. increasing the number of Sam’s Club locations that carried Slim-Lite,
along with obtaining distribution at other retailers, are valid market-
ing mechanisms; and
3. reducing the price at which Sam’s Club acquired Slim-Lite from Cott
also should have had a positive marketing effect, assuming that Sam’s
Club decreased the prices that customers faced as well.24

Finally, as Cott spent at least $0.80 per case when considering all these
marketing mechanisms, including the price reduction, and because demos
were not commercially reasonable efforts, Dr. Bucklin argued that Cott’s
efforts were appropriate and reasonable from a marketing perspective.25


Plaintiff’s Rebuttal

Thomas M. Neches, CPA, was retained on behalf of Citri-Lite to deter-


mine Plaintiff’s economic losses, assuming liability had been proved.
Mr. Neches had three main critiques of Dr. Bucklin’s linear regression
analysis.26 First, Mr. Neches argued that Dr. Bucklin incorrectly included
585 demos within his population of 8,158 purported Slim-Lite demos.27
Second, Mr. Neches contended that Dr. Bucklin’s analysis failed to distin-
guish between demos presented on Friday, Saturday or Sunday and demos
presented on other days of the week.28 An industry marketing expert sepa-
rately had testified that Monday through Thursday demos are expected to
be less effective than Friday through Sunday demos. Third, Mr. Neches
disagreed with Dr. Bucklin’s measure of the cost effectiveness of Slim-Lite
demos, defined as the cost of a single demo compared to the expected
increase in gross profits during the week of the demo.29 Mr. Neches
argued that a better measure of the cost effectiveness of demos would be
to analyze the cumulative effect of demos on cumulative Slim-Lite sales.30
He based this opinion on the testimony of the industry marketing expert
that the goal of in-store demos is not to create a single-week increase in a
product’s sales, but to maintain current sales levels and to promote long-
term sales growth.
Using the same Sam’s Club weekly Slim-Lite sales data and demo data
that Dr. Bucklin used, Mr. Neches performed a least-squares multivariate
linear regression analysis to estimate the cumulative weekly number of
Slim-Lite cases sold as a function of three factors—the cumulative number
of weekly Friday–Sunday demos, the cumulative number of weekly
Monday–Thursday demos, and the number of stores carrying Slim-Lite
in the week.31 Mr. Neches concluded that each additional Friday–Sunday
demo costing $143 would be expected to generate $263 additional profit,
hence these demos were cost effective and it was not “commercially
reasonable” for Cott to have discontinued the demos.32 On the other
hand, Monday–Thursday demos were not cost effective because they only
increased sales by 53.8 additional cases, resulting in a net loss of $53.80.33
Thus, Mr. Neches concluded that it was not “commercially reasonable” to
have demos on those days.34
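The form of this cumulative specification can likewise be sketched on
simulated data; note that no true demo effect is built into the simulation
below, which foreshadows the critique that follows:

    set.seed(5)
    weeks    <- 70
    demos_fs <- rpois(weeks, 12)                     # Friday-Sunday demos
    demos_mt <- rpois(weeks, 8)                      # Monday-Thursday demos
    stores   <- pmin(248 + cumsum(rpois(weeks, 4)), 500)
    sales    <- 2 * stores + rnorm(weeks, sd = 40)   # driven by stores only

    summary(lm(cumsum(sales) ~ cumsum(demos_fs) + cumsum(demos_mt) + stores))

Because both cumulative sales and cumulative demos trend upward, the
cumulated regressors will typically appear strongly “significant” even
though the simulated demos have no effect on sales.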

Defendant’s Response to Plaintiff’s Rebuttal

In response, Dr. Bucklin argued that Mr. Neches’ decision to regress


cumulative sales on cumulated demos resulted in a spurious regression.35
A spurious regression, as defined by Barreto and Howland’s Introductory
Econometrics, occurs when a researcher mistakenly believes that X and Y
are related. Dr. Bucklin noted that both cumulative sales and cumula-
tive demos were growing over time, or “trending,” simply due to the fact
that a cumulative sum was constructed by adding past quantities to the
current period’s quantities.36 Subsequently, he contended that, even if
there were no relationship between sales and demos, the Neches model
would have shown a positive correlation between cumulative sales and
cumulative demos since both statistics would trend upward over time.37
Hence, Mr. Neches’ results failed to show conclusively that demos were
related to sales. Moreover, applying standard regression techniques to
variables that grow over time can lead to misleading statistical conclu-
sions including R2 and t-statistics.38 An artificially high R2 might mislead
the researcher to believe in the existence of a strong relationship, whereas
erroneous t-statistics can lead to incorrect hypothesis testing.39 Finally,
even if one were to accept Mr. Neches’ premise that cumulative demos are
a good measure of demo activity over time, one would also expect to see
a positive relationship between cumulative demos and current sales.40 Dr.
Bucklin estimated a regression to test this and concluded that cumulative
demos had very little effect on current sales.41 He argued that this was
further proof of a spurious regression since it made no sense for cumula-
tive demos to affect cumulative sales if they failed to impact current sales.42
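The point is easy to demonstrate by simulation: two independent white-
noise series are unrelated in levels, yet their cumulative sums routinely
produce a large R2:

    set.seed(4)
    n <- 100
    x <- rnorm(n)   # stand-in for weekly demos
    y <- rnorm(n)   # stand-in for weekly sales, independent of x

    summary(lm(y ~ x))$r.squared                   # near zero, as it should be
    summary(lm(cumsum(y) ~ cumsum(x)))$r.squared   # often large despite no link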

Case Outcome

After reviewing the experts reports, rebuttal reports and a number of legal
motions submitted by or on behalf of both Citri-Lite and Cott, Judge
Oliver W. Wanger decided that Cott was not liable to Citri-Lite for any
of the claims and argued that Cott had established, with a preponderance
of evidence, the fact that it acted with reasonable justification to protect
its own economic interest.43 More specifically, Judge Wanger concluded
that the licensing agreement between Cott and Citri-Lite did not guaran-
tee Slim-Lite’s commercial or marketing success, and that Cott did spend
a significant amount of “money, time and effort” to market Slim-Lite,
which falls under the umbrella of commercially reasonable marketing
efforts.44 Directly related to the expert reports, Judge Wanger concluded
that the industry marketing expert’s opinions were “old school” and
“not founded in modern marketing science, economics, or quantifiable
approaches” and were thus less persuasive than Dr. Bucklin’s opinions.45


Notes

  1. Daniel L. Rubinfeld, “Reference Guide on Multiple Regression,” Reference Manual on
Scientific Evidence, 3rd ed., 2011, pp. 306–307.
  2. The Citri-Lite Company v. Cott Beverages, Inc., (hereinafter “Citri-Lite v. Cott
Beverages”) US District Court for the Eastern District of California, Case 1:07-cv-01075.
  3. Citri-Lite v. Cott Beverages Findings of Fact and Conclusions of Law Following Bench
Trial, Findings of Fact, September 30, 2011, (“Findings of Fact”) ¶¶ 2, 27.
  4. A case is defined in the agreement as “the quantity of twelve (12) containers of the
product, where each container holds twenty (20) ounces or any configuration of con-
tainers.” (Findings of Fact, ¶ 29.)
  5. Findings of Fact, ¶ 27.
  6. Findings of Fact, ¶¶ 28, 36. The notion of commercially reasonable efforts was not
explicitly defined in the agreement. Dr. Bucklin argued that because marketing involves
a certain degree of risk, a commercially reasonable marketing effort need not be suc-
cessful. (Findings of Fact, ¶ 32; Citri-Lite v. Cott Beverages, Trial Testimony Transcript
of Randolph Bucklin, July 12, 2011, (“Bucklin Trial Testimony”) p. 18. While goodwill
is not defined in the agreement, it is generally understood to relate to building the brand
as opposed to harming it. (Findings of Fact, ¶ 31.)
  7. Findings of Fact, ¶ 39.
  8. Findings of Fact, ¶ 39.
  9. Findings of Fact, ¶ 146.
10. At the start of the agreement, Slim-Lite was carried in 248 Sam’s Club locations. By
January 2005, Slim-Lite was carried in approximately 500 Sam’s Clubs nationwide.
However, by May 2005 distribution was cut drastically by Sam’s Club and, as a conse-
quence, Cott began to consider an exit strategy from Slim-Lite. (Findings of Fact, ¶¶ 44,
56.)
11. Findings of Fact, ¶ 23.
12. Findings of Fact, ¶¶ 34, 35.
13. Between January and July 2004, Cott consistently used demos for Slim-Lite twice a
month in each Sam’s Club location, which was then reduced to one demo per month
between August and December 2004. Cott also ran two demos at all Sam’s Club
locations during January 2005, in order to increase declining sales, and cancelled all
remaining demos in March 2005. (Findings of Fact, ¶¶53, 56, 58–59.) Citri-Lite also
alleged that Cott breached the agreement by failing to implement a repackaging change
requested by Sam’s Club toward the end of 2004 and by neglecting to conduct promo-
tional activity at retailers other than Sam’s Club and Walmart that carry Slim-Lite,
such as Food Lion. (Findings of Fact, ¶¶ 47–49.)
14. Findings of Fact, ¶¶ 62, 64; Bucklin Trial Testimony, p. 8.
15. Findings of Fact, ¶ 67; Bucklin Trial Testimony, pp. 21–22.
16. Findings of Fact, ¶ 66; Bucklin Trial Testimony, pp. 20–21.
17. Findings of Fact, ¶ 68; Bucklin Trial Testimony, p. 22.
18. Findings of Fact, ¶ 69; Bucklin Trial Testimony, pp. 22–23, 25–27.
19. Findings of Fact, ¶ 70; Bucklin Trial Testimony, pp. 22, 31.
20. Findings of Fact, ¶ 71; Bucklin Trial Testimony, p. 97.
21. Findings of Fact, ¶¶ 20, 65; Bucklin Trial Testimony, pp. 21–24.
22. Findings of Fact, ¶¶20, 65, 153; Bucklin Trial Testimony, pp. 23–24.
23. Findings of Fact, ¶ 28.
24. Findings of Fact, ¶ 78.
25. Findings of Fact, ¶¶ 107, 153–154.
26. Updated Expert Report of Thomas M. Neches, CPA., October 10, 2010, Exhibit A to
Declaration of Gregory Ellis in Support of Cott Beverages, Inc.’s (1) Motion to Strike
Expert Opinions of Thomas Neches and (2) Motion to Strike Certain Opinions of John
Carson, Citri-Lite v. Cott Beverages, (“Updated Neches Report”), pp. 21–23.
27. Updated Neches Report, p. 22.


28. Updated Neches Report, p. 22.
29. Updated Neches Report, p. 22.
30. Updated Neches Report, p. 23.
31. Updated Neches Report, p. 23.
32. Updated Neches Report, p. 24.
33. Updated Neches Report, p. 23.
34. Updated Neches Report, p. 24.
35. Second Rebuttal Expert Report of Randolph E. Bucklin, November 5, 2010, Exhibit D
to Declaration of Gregory Ellis in Support of Cott Beverages, Inc’s (1) Motion to Strike
Expert Opinions of Thomas Neches and (2) Motion to Strike Certain Opinions of John
Carson, Citri-Lite Company v. Cott Beverages, (“Bucklin Second Rebuttal Report”), p. 2.
36. Howard Barreto and Frank M. Howland, Introductory Econometrics (New York:
Cambridge University Press, 2006), pp. 611–612, as cited in Bucklin Second Rebuttal
Report, p. 2.
37. Bucklin Second Rebuttal Report, p. 2.
38. Bucklin Second Rebuttal Report, pp. 2–3.
39. Bucklin Second Rebuttal Report, p. 3.
40. Bucklin Second Rebuttal Report, p. 4.
41. Bucklin Second Rebuttal Report, pp. 4–5.
42. Bucklin Second Rebuttal Report, p. 5.
43. Citri-Lite Company v. Cott Beverages, Findings of Fact and Conclusions of Law Following
Bench Trial, Order, September 30, 2011, ¶¶ 1–2.
44. Citri-Lite Company v. Cott Beverages, Findings of Fact and Conclusions of Law Following
Bench Trial, Conclusions of Law, September 30, 2011, ¶ 73.
45. Findings of Fact, ¶ 115.



34.  Consumer surveys in trademark
infringement litigation: FIJI vs. VITI case
study
T. Christopher Borek and Anjali Oza

In this case study, we discuss the use of consumer surveys to evaluate con-
sumer confusion in a trademark infringement case. In trademark cases, a
plaintiff alleging infringement must provide evidence of the infringement,
including the likelihood of consumer confusion. Since trademark owners
are often unable to provide evidence of actual confusion, consumer
surveys can be used to evaluate the likelihood of consumer confusion
over similarity of trademarks or products1 because a survey gauges the
“subjective mental associations and reactions of prospective purchasers.”2
Courts can rely on survey evidence to establish likelihood of confusion
in trademark infringement cases. The admissibility of a survey and the
weight given to survey evidence depend in part on the survey design and
the manner in which the survey was conducted.3 Below we summarize
the role surveys have played in trademark infringement cases and discuss
how consumer surveys were used by both the plaintiffs and defendants in
a trademark infringement case involving artesian bottled water from the
Republic of Fiji.

Trademark Infringement Cases

Trademark infringement is the “unauthorized use of a trademark or


service mark on or in connection with goods and/or services in a manner
that is likely to cause confusion, deception, or mistake about the source
of the goods/and or services.”4 Infringement can involve a wide range of
brand names, mottos, symbols, logos, or other forms of identification, and
infringing marks do not have to be exact replicas of the protected mark in
order to cause confusion, deception, or mistake. Typically, the plaintiff
has senior use of the mark and alleges that the defendant’s junior use of the
same or similar mark causes a likelihood of confusion.
In trademark litigation, a plaintiff must prove that it owns a valid mark,
the plaintiff’s mark has priority, and the defendant’s mark is likely to
cause confusion in the minds of consumers about the source or sponsor-
ship of the goods or services offered under the parties’ marks.5 Critical
aspects of trademark infringement cases are, typically, the degree of
similarity between the marks at issue and whether the parties’ goods and/
or services are sufficiently related that consumers are likely to mistakenly
assume that they come from a common source. Trademark owners who
can successfully prove their case in court can obtain an injunction against
the infringer to prevent further use of the mark. One common approach
for  measuring likelihood of consumer confusion is through consumer
surveys.

Common Survey Formats for Measuring Likelihood of Consumer


Confusion

There are two common types of survey formats for addressing likelihood
of confusion due to similarity of marks or products:

Eveready Format: Asks respondents to name the company that
they think puts out the junior mark; and
Squirt Format: Asks whether the junior and senior marks are
put out by the same or different companies.

This case study focuses on the Eveready format, which has been
accepted in numerous trademark infringement cases.6 Originally used in
Union Carbide Corp. v. Ever-Ready, Inc.,7 the Eveready format has become
the “gold standard”8 for evaluating likelihood of consumer confusion in
cases with a strong, top-of-mind senior mark (i.e., highly accessible in
memory).9 In a typical Eveready format, a respondent is first shown the
defendant’s mark in context (e.g., picture or product, advertisement, etc.)
and then asked a series of open-ended questions, such as “who makes or
puts out the product(s) shown here,” “why do you say that,” and “what
other product(s) does the company put out.” The last question could be
asked in the form of closed-ended “sponsorship” or “affiliation” ques-
tions, e.g., “do you believe that whoever makes or puts out the product(s)
1) is sponsored by (or affiliated with) another company, 2) is not spon-
sored by (or affiliated with) any other company.”10 There is some variation
in the phrasing of these questions and care must be taken to ensure that the
questions are not leading or suggestive.
The last question allows the researcher to identify whether consumers
believe that the mark they were shown is related to similar brands the com-
pany produces, even if consumers do not know the name of the company
itself. For example, in the original Union Carbide Corp. v. Ever-Ready, Inc.
case, while consumers were not able to directly name Union Carbide as the
company that makes Eveready batteries, they did indicate that the product
came from the same company that made Eveready batteries.11
When designing an Eveready format survey, it is critical that the survey
respondents are drawn from the proper universe (i.e., individuals who
make the ultimate purchase decision), and that the sample of respondents
is representative of the universe it is intended to reflect.12 Surveys may
be conducted by mall-intercept, telephone, or online.13 Regardless of
the format, the survey must include a screener to select the appropriate
respondents who represent typical purchasers of the product in question
(i.e., the product associated with the defendant’s allegedly infringing
mark). In addition to sampling respondents from the proper universe, it
is important to select the appropriate control for the survey. Controls can
be used to eliminate background noise. A strong mark is more likely to
be remembered and associated with more products (even when there is no
similarity of marks) than a mark that is relatively unknown. For example,
if a respondent is shown a soda can with a Pepsi mark, and then asked who
makes or puts out this product or what other brand this product brings to
mind, he or she may guess Coke even though there is no similarity in the
marks. This is due, in part, to the strength of Coke brand. The control
group “functions as a baseline and provides a measure of the degree to
which respondents are likely to give an answer . . . not as a result of the
[product at issue], but because of other factors, such as the survey’s ques-
tions, the survey’s procedures . . . or some other potential influence on a
respondent’s answer such as preexisting beliefs.”14
In implementing an Eveready survey, it is common to use a test and
control group and compare the levels of confusion between the two
groups. Respondents in the test group are shown a picture or advertise-
ment with the stimulus or defendant’s mark and asked the series of
questions described above. In the test group, confusion is measured by the
proportion of respondents who associated the defendant’s mark with the
plaintiff’s either by naming the plaintiff specifically or naming other prod-
ucts produced by the plaintiff. Respondents in the control group are asked
the same questions as the test group but receive a slightly varied control
mark without the at-issue features. Similarly, confusion is measured by
the proportion of respondents who associate the control mark with the
plaintiff. There may be some baseline level of confusion in both groups
due to other factors, such as brand awareness or brand association, and
thus the difference in confusion levels between the test and control group
is a measure of consumer confusion due to the defendant’s mark.
If the difference is positive, then consumers are more likely to confuse
the defendant’s mark with plaintiff’s products or services than they are
with a slightly varied control mark, indicating the defendant’s mark is
causing consumer confusion. If there is no difference or a negative dif-
ference in confusion between the test and control group, then there is no
evidence that respondents confuse the defendant’s mark with plaintiff’s
products or services.

Fiji Water Company, LLC, et al. (“FIJI”) v. Fiji
Mineral Water USA, LLC, et al. (“VITI”)

Allegations

On October 6, 2009, FIJI filed suit against VITI for federal law claims of
trademark infringement, trade dress infringement, trademark dilution,
and California statutory and common law claims of unfair competition.15
FIJI subsequently filed a motion for a preliminary injunction against
VITI. Plaintiffs allege Defendant’s VITI bottled water was confused with
FIJI bottled water due to similarities between the trademarks and trade
dress of FIJI’s bottled water products and the packaging and labeling of
VITI’s bottled water products.16 Figure 34.1 depicts the packaging for
selected FIJI and VITI bottles.
In order for FIJI to succeed on the merits of its trade dress infringe-
ment claim, FIJI must show that the VITI bottle design will likely cause
consumers to believe that VITI is produced by or affiliated with FIJI.
Confusion can be established using guidelines commonly known as the
Sleekcraft factors. The eight factors are: (1) the similarity of the mark(s)
or trade dress; (2) the strength of the mark(s) or trade dress; (3) evidence
of actual confusion; (4) the proximity or relatedness of the goods; (5) the
degree to which the marketing channels used for the goods converge; (6)
the type of goods and the degree of care likely to be exercised by the pur-
chasers; (7) the defendant’s intent in selecting the mark or trade dress; and
(8) the likelihood of expansion of the product lines.17 In this case, the court
relied on evidence from a consumer survey for (2) the strength of the trade
dress, and (3) evidence of actual confusion.18

FIJI Survey Expert Assignment and Findings

FIJI retained Dr. Hal Poret, a survey expert, to measure the likelihood
of consumer confusion between the VITI label and packaging and the
FIJI trade dress. Dr. Poret’s assignment was to determine whether or
not consumers who viewed the VITI label and packaging confused the
bottled water for FIJI bottled water, or any other related products pro-
duced by Plaintiffs.19 Dr. Poret, the FIJI expert, addressed this question


Source:  Expert Report of Hal Poret on Survey to Determine Whether There is a
Likelihood of Confusion Between VITI Bottled Water and FIJI Bottled Water, Fiji Water
Company, LLC et. al. v. Fiji Mineral Water USA, LLC et. al., United States District
Court, Central District of California, Southern Division, Case 8:09-cv-01148-CJC-MLG,
filed February 8, 2010 (“Expert Report of Hal Poret”), at 36.

Figure 34.1  FIJI (left) and VITI (right) artesian bottled water
packaging

through a mall-intercept Eveready consumer confusion survey of 415


male and female respondents 16 years of age or older who were likely to
purchase bottled water either for themselves or someone else in the next
three months.20 The mall-intercept covered 12 markets around the United
States.21 Dr. Poret used a test and control group design with a bottle of
water as the stimulus. The test group was handed VITI bottled water (the
defendant’s product) and the control group was shown a similar stimulus
of artesian bottled water without the VITI label and packaging.22
In his study, the FIJI expert found that 67 out of 209 respondents (32.1
percent) in the test group confused the VITI bottled water with the FIJI
brand. In the control group, the FIJI expert found that 16 out of 206 respond-
ents (7.8 percent) confused the control bottled water with the FIJI brand.


After adjusting for confusion found in the control group, the FIJI expert
found a 24.3 percent confusion level as a result of the defendant’s label and
packaging.23 That is, after controlling for the baseline level of confusion,
24.3 percent of respondents answered that the VITI bottled water was
made by FIJI or the same company that makes FIJI bottled water. Thus,
the FIJI expert concluded that “there is a high likelihood that VITI bottled
water will be confused with FIJI.”24

FIJI Survey Expert Eveready Survey Design

In the test group of the FIJI expert’s survey, respondents were shown the
defendant’s label and packaging of the VITI bottled water, an image of
which is provided in Figure 34.2.
The respondents were then asked the Eveready format questions such
as: “what company or brand puts out the product I just showed you?”
“does the company that puts out the product I just showed you put out
any other product or products that you know of?” “what other product or
products?”25 Respondents who mentioned the FIJI brand or company in
response to any of the questions were coded as confused.

Source:  Expert Report of Hal Poret, at 163.

Figure 34.2  Test stimulus used in Dr. Poret’s (FIJI expert) survey


In the control group of the FIJI expert’s survey, respondents were shown
a similar product, but without the at-issue features of the label and pack-
aging of VITI bottled water. Specifically, the control bottle excluded the
square shape of the bottle, the blue bottle cap, the three-dimensional effect
of two transparent labels and the stylization of the VITI logo, among
others. The control bottle retained other, not-at-issue features, such as
the information that the product was mineral water from the Fiji Islands
and a label that conveyed images of water and tropical islands, among
others.26 Figure 34.3 depicts the image of the bottle packaging utilized in
the control survey.
The respondents in the control group were asked the same questions
as the respondents in the test group. Respondents who mentioned the

Source:  Expert Report of Hal Poret, at 165.

Figure 34.3  Control stimulus used in Dr. Poret’s (FIJI expert) survey


Table 34.1  Summary of Dr. Poret’s (FIJI expert) Eveready survey results

                                        Level of confusion
Test group                                    32.1%
Control group                                  7.8%
Difference in confusion level                 24.3%

Source:  Expert Report of Hal Poret, at 46.

FIJI brand or company in response to any of the questions were coded
as confused. It is expected that, when shown a bottle of water from the
Fiji islands, some consumers will naturally assume that it is related to the
FIJI brand. Thus, the difference between the rate of confusion in the test
and control groups is the rate of confusion that can be attributed to the
defendant’s labeling and packaging.27
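The arithmetic behind the net confusion figure, together with one
conventional gauge of its sampling error, can be reproduced in a few lines
of R from the reported counts (the significance test shown is an
illustration, not part of the expert reports):

    confused <- c(test = 67, control = 16)
    n        <- c(test = 209, control = 206)
    rates    <- confused / n
    round(rates, 3)                              # 0.321 and 0.078
    round(rates["test"] - rates["control"], 3)   # 0.243 net confusion
    prop.test(confused, n)                       # two-sample proportion test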
After the mall-intercept survey was conducted, follow-up validation
calls were conducted by a third-party company to verify the respondent’s
identities, satisfaction of the screening criteria, and participation in the
study.28 A total of 281 out of 415 respondents were successfully contacted
(68 percent validation) and no discrepancies were found.29

VITI Survey Expert Assignment and Findings

VITI retained Dr. Kevin Gentry, a survey expert, to perform a separate


survey to measure the likelihood of consumer confusion of the VITI pack-
aging and labeling with the FIJI trade dress. Dr. Gentry, the VITI expert,
conducted an Eveready survey similar in design – but not identical – to the
one used by the FIJI expert, Dr. Poret. Dr. Gentry conducted an online
survey of 401 respondents, who were first asked to identify the brands
of bottled water they purchase from a list of 19 brands.30 Then, respond-
ents were shown an image of VITI bottled water and asked three main
questions:

Q2) Where do you think this product comes from?
Q3) Whose product do you think this is?
Q4) What brand names are also used by the company that makes this
product?31

Based on his survey and analysis, VITI expert concluded that only 8
percent of respondents confused the VITI label and packaging with the
FIJI brand.32


FIJI Criticisms of VITI Survey

The Plaintiffs described key differences between the FIJI expert’s and the
VITI expert’s surveys as flaws that undermined VITI expert’s conclusion.
Those flaws, and the corresponding biases, included the following:

Improper sample selection


To qualify for the VITI expert’s online survey,33 respondents were required
to do the majority or equal amount of the grocery shopping for their
household.34 Plaintiffs argued that this sample selection excludes a large
portion of the sample of likely consumers of bottled water.35 FIJI and
VITI bottled water is sold not only at grocery stores, but also at delis,
convenience stores, movie theaters, and such. Plaintiffs argued that the
survey screening criteria should select a sample of respondents that is
representative of the universe of consumers that are likely and able to pur-
chase bottled water. By restricting the sample to members of households
who do the majority of the grocery shopping, the VITI expert improperly
limited the sample.36

Biased respondents with brand names in the screener


The first question in the VITI expert’s survey asked respondents which
brands of bottled water they had purchased in the last year, and included
a long list of bottled water brand names including FIJI and VITI.37 By
asking this question, Plaintiffs claimed that VITI expert implicitly sug-
gested to respondents that FIJI and VITI were two separate brands
without any affiliation. Furthermore, Plaintiffs argued that the VITI
expert focused the respondent on different brands of bottled water, teach-
ing the respondent names of new brands he or she may not have heard of
yet, and changing the overall frame of reference of the respondent.38

Ignored all responses to Q2 indicating that VITI comes from FIJI


The VITI expert completely disregarded any answer to Q2 “where do you
think this product comes from” that indicated VITI “comes from” FIJI
(even if it was clear the respondent meant the company or brand FIJI and
not the geographic location).39 As the brand name FIJI also refers to a
geographic location, Plaintiffs argue that it is difficult, if not impossible,
to tell whether or not respondents who answer “Fiji” are referring to the
water coming from the country Fiji, or the brand FIJI.

Ignored misspelled responses to Q3 indicating that VITI is made by FIJI


The VITI expert’s unadjusted 9 percent finding was based on the 35 respond-
ents who named FIJI (and spelled it correctly) in Q3, “whose product do


you think this is?”40 In deciding whether or not a respondent confused the
VITI label and packaging with the FIJI trade dress, VITI expert did not
consider 10 responses that misspelled “Fiji” as “Figi,” “Fugi,” or “Fuji.”41

Ignored all responses to Q4 indicating that VITI is made by the company
that uses the FIJI brand
The VITI expert did not count any answer to Q4 “what brand names are
also used by the company that makes this product” that indicated that
VITI is made by the company that uses the FIJI brand.42

Failed to ask two standard questions commonly used in the Eveready
format

The VITI expert failed to ask whether the maker of VITI is sponsored by
(or affiliated with) the maker of FIJI or whether VITI is put out with the
approval of FIJI.43

Failed to verify or validate respondents


The VITI expert failed to verify the demographic characteristics of the
respondents who took his survey and whether or not they met his screen-
ing criteria.44

Plaintiffs found that, in total, the VITI expert failed to account for 79
respondents who named FIJI in response to any of his three main questions.
Using the VITI expert’s data, Plaintiffs found that, with the most conserva-
tive approach, at least 18–20 percent of respondents (upward of 28 percent)
confused the VITI label and packaging as being associated with the FIJI
brand or company.45 Due to these biases, among others, Plaintiffs argued that
the VITI expert’s survey artificially deflated the confusion level findings.46
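
A minimal sketch of the arithmetic behind these competing confusion estimates (the respondent counts and sample size come from the case record; the variable names, rounding, and output format are ours):

```python
# Confusion-rate arithmetic behind the competing expert analyses.
# Respondent counts come from the case record; names are illustrative.
SAMPLE_SIZE = 401          # respondents in Dr. Gentry's survey
EXACT_FIJI_Q3 = 35         # named "FIJI" (spelled correctly) in Q3
TOTAL_FIJI_MENTIONS = 79   # named FIJI in any of the three main questions

unadjusted = round(100 * EXACT_FIJI_Q3 / SAMPLE_SIZE)         # ~8.7 -> 9%
noise_adjusted = unadjusted - 1                               # 9% - 1% noise = 8%
plaintiffs_recount = 100 * TOTAL_FIJI_MENTIONS / SAMPLE_SIZE  # ~19.7%

print(f"VITI expert, unadjusted:     {unadjusted}%")
print(f"VITI expert, noise-adjusted: {noise_adjusted}%")
print(f"Plaintiffs' recount:         {plaintiffs_recount:.1f}%")
```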

Case Outcome

The court found evidence of consumer confusion based on FIJI expert
Dr. Poret’s survey and his analysis of VITI expert Dr. Gentry’s survey.47
Furthermore, the court agreed with Dr. Poret’s critique of Dr. Gentry’s
analysis stating, “[t]he Court was troubled by several aspects of Dr. Gentry’s
survey, all of which indicate that his report understated the confusion level.
Indeed, FIJI expert Dr. Hal Poret found that after adjusting for some of
these aspects, Dr. Gentry’s survey would support finding at minimum a
17.5% confusion level, and perhaps as high as a 28% confusion level.”48 Due
to this evidence of confusion and other factors, Judge Cormac J. Carney
granted FIJI’s motion for a preliminary injunction against VITI.49

Notes

  1. Bird, C. R., and Steckel, J. H., “The Role of Consumer Surveys in Trademark
Infringement: Empirical Evidence from the Federal Courts,” U. Pa. J. Bus. L. 14 (2011):
1016.
  2. McCarthy, T. J., McCarthy on Trademarks and Unfair Competition, § 32:158 at 32-189
(4th ed. 2003).
  3. Thornburg, R. H., Trademark Surveys: Development of Computer-Based Survey
Methods, 4 J. Marshall Rev. Intell. Prop. L. 91 (2005), 93.
  4. http://www.uspto.gov/page/about-trademark-infringement, accessed August 15, 2015.
  5. http://www.uspto.gov/page/about-trademark-infringement, accessed August 15, 2015.
  6. Contrary to the Eveready format, the Squirt format is commonly used in cases in
which the mark is not as well-known (i.e., not easily accessible in memory) and must be
included in the survey design as part of a line-up of brands. See Squirtco v. Seven-Up
Co., 628 F. 2d 1086, 1089 n.4, 1091 (8th Cir. 1980).
  7. Union Carbide Corp. v. Ever-Ready, Inc., 531 F. 2d 366 (7th Cir. 1976).
  8. Swann, J. B., Brewster, W. H., Mayberry, J. D., and Henn, Jr., R. C., “Likelihood
of Confusion Surveys,” Intellectual Property Desk Reference: Patents, Trademarks,
Copyrights and Related Topics, 171.
  9. Swann, Brewster, Mayberry, and Henn, “Likelihood of Confusion Surveys,” 171.
10. Swann, Brewster, Mayberry, and Henn, “Likelihood of Confusion Surveys,” 171–182.
11. Union Carbide Corp. v. Ever-Ready, Inc., 531 F. 2d 381 (7th Cir. 1976).
12. See, e.g., 1-800 Contacts, Inc. v. WhenU.com, 309 F. Supp. 2d 467, 499 (S.D.N.Y. 2003).
13. Regardless of the format, it is standard procedure to validate that a survey was actu-
ally completed by a real respondent, the demographic information provided by the
respondent was accurate, and the respondent who completed the survey actually met
the screening criteria. See, e.g., Lavrakas, P. J., “Telephone Surveys,” in Handbook
of Survey Research, eds. P. V. Marsden and J. D. Wright, 2nd edition, Bingley, UK:
Emerald Group Publishing, 2010, at 493.
14. Novartis Consumer Health Inc. v. Johnson & Johnson Merck Consumer Pharms. Co., 129
F. Supp. 2d 351, 365 n.10 (D.N.J. 2000).
15. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., 741 F. Supp. 2d
1165, United States District Court, Central District of California, Southern Division,
Case 8:09-cv-01148-CJC-MLG, filed September 30, 2010, at 1.
16. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., at 1.
17. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., 741 F. Supp. 2d
1165, at 7.
18. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., 741 F. Supp. 2d
1165, at 8–9.
19. Expert Report of Hal Poret, at 35–36.
20. Expert Report of Hal Poret, at 39, 48, 74.
21. Expert Report of Hal Poret, at 39.
22. Expert Report of Hal Poret, at 39–42.
23. Expert Report of Hal Poret, at 46.
24. Expert Report of Hal Poret, at 46.
25. Expert Report of Hal Poret, at 40.
26. Expert Report of Hal Poret, at 41–44.
27. Expert Report of Hal Poret, at 45.
28. Expert Report of Hal Poret, at 54.
29. Industry practice is to validate 15–20% of respondents. Expert Report of Hal Poret, at
54.
30. Rebuttal Report of Hal Poret Regarding Gentry Bottled Water Survey, Fiji Water
Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., United States District
Court, Central District of California, Southern Division, Case 8:09-cv-01148-CJC-
MLG, filed August 2, 2010, (“Rebuttal Report of Hal Poret”), at 5.

31. Rebuttal Report of Hal Poret, at 5.
32. Dr. Gentry finds 9% of respondents confused VITI with FIJI but subtracts one
percentage point for noise, resulting in 8% confusion. Rebuttal Report of Hal Poret, at 5.
33. Plaintiffs also criticized Dr. Gentry’s use of an online survey, claiming that “a two-
dimensional photo did not capture the overall impression of the VITI bottle and all
aspects of the relevant trade dress as would be seen by a real consumer in three-dimen-
sions.” The court did not take this criticism into account in the final decision. Rebuttal
Report of Hal Poret, at 7, 20.
34. Rebuttal Report of Hal Poret, at 8.
35. Rebuttal Report of Hal Poret, at 8.
36. Rebuttal Report of Hal Poret, at 23.
37. Rebuttal Report of Hal Poret, at 21.
38. Rebuttal Report of Hal Poret, at 21–22.
39. Rebuttal Report of Hal Poret, at 12–16, 22.
40. Rebuttal Report of Hal Poret, at 8–10.
41. Rebuttal Report of Hal Poret, at 9–10.
42. Rebuttal Report of Hal Poret, at 10–12.
43. Rebuttal Report of Hal Poret, at 23.
44. Rebuttal Report of Hal Poret, at 24.
45. Rebuttal Report of Hal Poret, at 16, 24.
46. Rebuttal Report of Hal Poret, at 6.
47. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., at 8–9.
48. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., at 9.
49. Fiji Water Company, LLC et al. v. Fiji Mineral Water USA, LLC et al., at 12.

35.  Survey evidence to evaluate a marketing
claim: Skye Astiana, Plaintiff v. Ben &
Jerry’s Homemade, Inc., Defendant
Alan G. White and Rene Befurt

This case study describes an application of market research methods – in
this case, a consumer survey – in the context of business litigation, where
two parties are engaged in a legal dispute. This particular application of
a survey differs from the more typical use of market
research conducted for new product development, consumer satisfaction
studies, or the assessment of consumers’ willingness-to-pay for a good or
service. Below we describe a particular type of claim made for certain food
products (here, Ben & Jerry’s ice cream products) – so-called All Natural
claims – and explain why and how a survey can be an important means for
either Plaintiffs or Defendants to present evidence on the interpretation of
an “All Natural” claim, as well as to evaluate the role that such a claim can
play in the consumer’s decision-making process.
Consumer surveys have played an important role in market research
for several decades. Companies and marketing managers have long recog-
nized that consumer preferences are heterogeneous and the doctrine that
“one size fits all” is empirically false.1 As a consequence, understanding
preferences, purchase drivers, perceived satisfaction, and numerous other
consumer decision-making factors became tremendously important to a
product’s commercial success, which led a vast number of companies to
engage in market research and use surveys regularly in order to better
understand their current and prospective customers.2 Results from such
surveys can be crucial for other audiences, too. For example, they may
assist a judge or jury in answering important questions of interpretation
and reliance on advertising messages.

“All Natural” Food Products

“All Natural” claims have increasingly become the subject matter of law-
suits in recent years.3 These cases may involve allegations that the adver-
tising of various manufacturers is deceptive and misleading. For example,
the complaints may interpret “All Natural” labels very narrowly and claim
that, because of food processing or artificial ingredients, the products, as
characterized, are not “All Natural.” Because the term “All Natural” has
not been defined (yet) by a government institution such as the Food and
Drug Administration or the Federal Trade Commission, neither consum-
ers nor producers can follow an objective definition or guideline as to the
meaning of this term. It is this ambiguity and lack of clarity on interpreta-
tion that may well be the genesis of many of these lawsuits. In addition to
the question of how to interpret an “All Natural” claim, the question arises
whether or not – and to what extent – damages should be awarded to those
who have been affected by the allegedly false and misleading claim, as con-
sumers may have purchased products that do not provide the advertised
characteristic(s). Therefore, in a typical false advertising case, Plaintiffs
may argue that the alleged deception “caused” the sales of a product or at
least increased its sales because the false promise signaled additional value
to the consumer. However, it may not be clear whether this allegation of
impact is true; that is, without data one can only speculate whether there
was any cause-and-effect relationship between the alleged false claim and
sales of the product. In litigation, it is not sufficient to demonstrate con-
sumer confusion; one also needs to assess to what extent the “All Natural”
claim drove the sales of a product, if at all.
Recent court decisions reflect the ambiguity about the “right” inter-
pretation of the “All Natural” claims, and a review of judges’ decisions
suggests that there is no universal rule that can be applied to all cases.
For example, courts have adopted both narrow and broad interpretations
of “All Natural”. One of the best-known decisions relates to the “All
Natural” claim on pasta products sold under the Buitoni brand. Judge
John F. Walter in the Central District of California favored a rather
general definition of “All Natural” and pointed out that a reasonable
consumer would be aware that pastas don’t grow on ravioli trees and
tortellini bushes – meaning that consumers expect a certain amount of
food processing but still consider the product “All Natural.”4
On the other hand, Judge William P. Dimitrouleas in the Geraldine’s
Cookies matter relied on a more literal and narrow interpretation of
the “All Natural” claim and allowed allegations stating that consumers
may have been deceived because genetically modified ingredients such
as canola oil, dextrose, or corn starch were not natural.5 Given these
decisions and the different reasoning behind each, it appears that there is
no hard-and-fast rule on what constitutes consumers’ reasonable beliefs
about and reliance on “All Natural” claims as displayed on product labels
and advertised by firms. As a consequence, the extent to which a product
is “All Natural” – at least in consumers’ minds – has to be determined on
a case-by-case basis. Similarly, it is not immediately evident to what extent
consumers rely on the label in its entirety – or even a specific element of
it (e.g., a specific claim) – when they purchase any given product. Surveys
are an important tool in answering these questions as they can elicit
consumers’ different interpretations of “All Natural” claims, and establish
a causal relationship (if any) between a given claim and ­consumers’
purchase intent.

The Ben & Jerry’s “All Natural” Case

On September 29, 2010, Plaintiffs brought a “class action” lawsuit6 on
behalf of individuals who purchased ice cream products produced by
Defendant Ben & Jerry’s Homemade, Inc. (“Ben & Jerry’s”) on or after
September 29, 2006, and that were labeled “All Natural” but allegedly
contained alkalized cocoa processed with a synthetic agent. Plaintiffs
claimed that the packaging and advertisements for the ice cream prod-
ucts were deceptive and misleading because of the use of the words “All
Natural” to characterize the ingredients of the product. According to
Plaintiffs, using a “synthetic” agent to alkalize the cocoa contained in the
ice cream was counter to Ben & Jerry’s claiming its ice cream products as
being “All Natural” and would therefore need to be corrected and reme-
died. Specifically, the complaint that was filed in the United States District
Court in the Northern District of California alleged six causes of action,
which were: (1) unlawful business practices, (2) unfair business practices,
(3) fraudulent business practices, (4) false advertising, (5) restitution based
on unjust enrichment, and (6) common law fraud. For the purpose of this
case study, we will focus on the false advertising claim pertaining to the
“All Natural” promise that was made on the packaging of specific Ben &
Jerry’s ice cream products.

Ben & Jerry’s Survey Expert: Assignment and Findings

Dr. Kent Van Liere, a survey expert on behalf of Ben & Jerry’s, filed
an expert report on October 2, 2013, presenting a survey of consumers’
knowledge and beliefs about Ben & Jerry’s ice cream products.7 The goal
of such a report and its included analyses is to assist the judge in his or
her evaluation of consumers’ interpretation of and reliance on the “All
Natural” claims. An expert report is somewhat comparable to an aca-
demic article in the sense that it explains the methodologies, data sources,
and sets forth the results of the survey in a manner that is transparent and
reproducible.
As with any academic research paper, Dr. Van Liere was given a
research question – specifically, he was asked “to conduct research to
determine whether or not consumers perceive that the words ‘All Natural’
on Ben & Jerry’s product labels mean that the cocoa contained in the
product is processed ‘naturally.’” In addition, Dr. Van Liere evaluated
the extent to which consumers’ perceptions of the “All Natural” claims
affected purchase intentions, if at all. Based on his survey, Dr. Van Liere
concluded that “[m]ore than half of consumers surveyed . . . do not know
or do not perceive that the Ben and Jerry’s product shown contains alka-
lized cocoa,” and that “[a]fter using a control condition to net out guess-
ing, demand effects, and preexisting beliefs that may have occurred when
shown the “All Natural” label, only 13 percent of respondents believe the
cocoa is alkalized using a natural ingredient.” Ultimately, he concluded
that “using the control to net out responses not related to the ‘All Natural’
label, [his] results demonstrate that a net 3 percent of respondents believe
the cocoa is alkalized naturally and indicate that this fact would make
them more likely to purchase the product.”8

Ben & Jerry’s Survey Expert: Survey Design – the Diamond Principles

Dr. Van Liere’s survey was conducted according to a set of principles that
were established by Professor Shari Diamond, a renowned professor of
law and a social psychologist. Dr. Van Liere adhered to five principles set
forth by Professor Diamond pertaining to (1) the relevant survey popula-
tion; (2) procedures for sampling from the relevant population; (3) ques-
tion design and interviewing procedures; (4) the nature of the specific test
and control stimuli shown to survey participants; and (5) the protocol
for calculating the results from the survey. These principles are described
below.

1. Dr. Van Liere defined his relevant population to be consumers 18
years or older who have purchased Ben & Jerry’s ice cream, ice cream
bars, or frozen yogurt in the past 10 years prior to taking the survey.
2. Dr. Van Liere created his sample using a mall intercept. The advan-
tages of a mall-intercept design are twofold: consumers are already in
a shopping environment and can take the stimuli and control packag-
ing in their hands, therefore emulating an actual shopping experience
more closely. The Van Liere survey ultimately relied on a sample of
consumers from 10 different shopping malls across the United States
and employed quotas for respondents’ age and gender.
3. Dr. Van Liere’s question design and interviewing procedures relied on
common practice survey methods. Each qualified respondent in the
mall intercept was seated in a private room and asked a number of
questions by the interviewer, including screener questions, filter ques-
tions, and questions related to “All Natural.”9,10 Respondents who
passed the screener were instructed to “[p]lease look at this [ice cream]
product as you would if you were considering making a purchase.
Take as much time as you would like to review it. You may pick up the
product but please do not open it. When you are finished looking at
it, please let me know,” and were then asked an open-ended question
about what message or messages the ice cream’s packaging conveyed
to them.11,12 After this assessment of the packaging’s primary message,
respondents answered a series of closed-ended questions. First, a ques-
tion addressed their knowledge or beliefs about the product’s produc-
tion processes. This question asked respondents whether they agreed
or disagreed with a series of statements, which included the main
research interest (“The product uses cocoa processed with an alkali”),
mixed among distractors that were included to disguise the ultimate
purpose of the survey.13 Further, Dr. Van Liere used three randomly
rotating questions for three product categories – nine questions in
total – that addressed the knowledge or beliefs about the production
process of the ice cream in more detail. Among these nine questions,
six were included as distractors – again, to disguise the goal of the
research. The remaining three questions targeted the topic of interest,
including the key question that asked, “The cocoa in this product is
processed with an alkali. If you have an opinion, which of the follow-
ing types of alkali do you think is used to process the cocoa in this
product?” and provided the following answer choices “A synthetic
alkali,” “A natural alkali,” “Either a natural or a synthetic alkali,”
and “Don’t know/No opinion.” Finally, the survey followed up with
two questions regarding purchase intent. First, a filter question deter-
mined whether or not the most recently provided answer about the
type of alkali used influenced respondents’ intent to purchase the Ben
& Jerry’s ice cream presented to them. If it did, respondents were then
asked to reveal whether they were more, neither more nor less, or less
likely to purchase the ice cream in front of them. Lastly, the survey
assessed whether the word “natural” meant to respondents that a
product is “organic” or is “not organic.”14
4. Dr. Van Liere employed a test and control group setup. The pro-
grammed survey randomly assigned respondents to a test or a control
group and each respondent was exposed to a stimulus – a Ben &
Jerry’s ice cream carton. For the test group, the stimulus was a Ben
& Jerry’s Cherry Garcia ice cream carton with the at-issue claim “All
Natural” embedded on the label (as allegedly purchased by class
members); for the control group, the stimulus was the same Ben &
Jerry’s Cherry Garcia ice cream carton, yet with an altered label
containing the phrase “Vermont’s Finest” and without the claim “All
Natural” on the label.
  Except for the altered label (i.e., “Vermont’s Finest” used in place
of “All Natural”) the test and control stimuli were the same. In his
expert report, Dr. Van Liere reasons the choice of this survey design
as a method to counteract potential “background noise” that may be
caused by elements of the test condition that do not constitute alleg-
edly deceptive content. While this is a true and important statement,
experimental setups such as the one used by Dr. Van Liere can also
establish a causal nexus between a stimulus and a target variable, and
have therefore become valuable tools in litigation.

Figure 35.1  Test and control stimuli used in the Van Liere survey (with
added illustrations): the “All Natural” label as sold in retail locations
(test group) and the “Vermont’s Finest” label used as the control
stimulus (control group)
5. Dr. Van Liere followed Professor Diamond’s guidance on the protocol
for calculating the results from the survey. In this case, Dr. Van Liere
focused on counting the occurrences of confusion related to the “All
Natural” packaging and netting out potential noise; he did not use
the experiment to conduct any further statistical analyses.
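
A minimal sketch of this counting-and-netting protocol; the raw test- and control-group rates below are hypothetical, since the case record reports only the netted figures (a 13 percent net belief in natural alkalization and a net 3 percent purchase effect):

```python
# Netting out noise with a test/control design: the control group's
# incidence estimates guessing, demand effects, and preexisting beliefs,
# and is subtracted from the test group's incidence.

def net_rate(test_pct: float, control_pct: float) -> float:
    """Test-group incidence net of the control-group 'noise' rate."""
    return test_pct - control_pct

# Hypothetical illustration: 18% of test-group respondents give the
# target answer versus 5% in the control group, for a 13% net rate.
print(f"Net rate: {net_rate(0.18, 0.05):.0%}")  # -> 13%
```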

Ben & Jerry’s Survey Expert: Plaintiffs’ Criticisms

Plaintiffs took issue with Dr. Van Liere’s survey and critiqued it on a
number of dimensions. One expert, Dr. Stephen A. Schneider, attacked
the design of the survey, while another expert, Dr. Elizabeth Howlett, pro-
vided other conceptual critiques. Both attempted to undermine Dr. Van
Liere’s credibility and the reliability of the results of his survey. Critique
points addressing the survey craftsmanship included, among others, an
alleged failure to measure the impact of the general “All Natural” claim on
consumer purchasing decisions; targeting of the wrong population; assess-
ment of the wrong time period because Dr. Van Liere included consumers
who purchased Ben & Jerry’s in the past 10 years; lack of representative-
ness of the sample because Dr. Van Liere sampled consumers across the
country, rather than focusing only on the California customers at issue;
and insufficient sample size. Critique points addressing the survey on a
conceptual level included a failure to control for the effects of marketing,
news coverage, FDA warnings, popular culture, advertising, and other
sources of information that influenced customers to believe that Ben &
Jerry’s is “All Natural,” above and beyond being exposed to the packag-
ing in a single showing in a mall; a lack of testing of whether some custom-
ers had already come to believe that all Ben & Jerry’s ice cream products
are “All Natural” and all ingredients and processes are natural; and the
lack of control for the fact that some consumers were exposed to adver-
tisements or warnings that Ben & Jerry’s was not “All Natural” while
others were exposed to messages that Ben & Jerry’s was “All Natural.”
In a response declaration, Dr. Van Liere addressed these critiques one by
one and insisted that his survey was ultimately reliable; he also argued that
neither of the opposing experts was qualified to evaluate his survey based
on their educational background.

Case Outcome

After her review of experts’ reports, rebuttal reports, and a number of
legal motions submitted by each party, Judge Phyllis J. Hamilton denied
Plaintiffs’ motion for class certification, which would have allowed
the lawsuit to proceed as a single action covering all purchasers of the
relevant Ben & Jerry’s products. In her opinion, the Judge explicitly
acknowledged the results of Dr. Van Liere’s survey, stating that the
“Defendant also asserts that the class is overbroad, because at most
only 13% of consumers surveyed expected that the ‘All Natural’ label
meant that the alkali was ‘natural’ and only 3% said it would affect their
purchasing decision.” Aside from relying on Dr. Van Liere’s survey, the
Court considered further evidence by Defendants that showed the lack
of a price premium associated with Ben & Jerry products labeled as “All
Natural” compared with those that did not carry such a claim on the
label. Taking the survey results and price premium analyses together,
the Court concluded that Defendant had provided sufficient evidence
suggesting that consumers were not likely deceived by the “All Natural”
claim, while Plaintiffs had presented no evidence in opposition and could
not speak to consumers’ beliefs as reliably as Dr. Van Liere’s survey
could.

Notes

  1. Vithala R. Rao and Joel H. Steckel (1998), “Analysis for Strategic Marketing,”
pp. 23–75.
  2. William M. Pride and O. C. Ferrell (2010), “Marketing,” p. 130.
  3. For example, an article in the Wall Street Journal pointed out that at least 100 “All
Natural” lawsuits were filed in the years 2012 to 2013. See Mike Esterl, “Some Food
Companies Ditch ‘Natural’ Label,” Wall Street Journal, November 6, 2013.
  4. Specifically, Judge Walter stated in his order to dismiss an “All Natural” law suit against
Nestle: “For example, Plaintiff offers the Webster’s Dictionary definition of ‘natural,’
meaning ‘produced or existing in nature’ and ‘not artificial or manufactured.’ [. . .]
However, even Plaintiff admits that this definition clearly does not apply to the Buitoni
Pastas because they are a product manufactured in mass [. . .], and the reasonable con-
sumer is aware that Buitoni Pastas are not ‘springing fully-formed from Ravioli trees
and Tortellini bushes.’” Order Granting Defendants’ Motion to Dismiss First Amended
Complaint (filed 9/23/13; Docket No. 30), Case No. CV 13-5213-JFW (AJWx), October
25, 2013.
  5. Judge Dimitrouleas stated in his order to dismiss a Defendant’s motion in an “All
Natural” law suit against Bodacious Food Company: “Specifically, Plaintiff alleges
that ‘the Products contain synthetic, artificial, and/or genetically modified ingredients,
including, but not limited to, Sugar, Canola Oil, Dextrose, Corn Starch, and Citric
Acid.’ [. . .] The Court finds no basis to disregard those allegations, which identify the
specific compounds that are purportedly not ‘natural.’” Order Denying Defendant’s
Motion to Dismiss, Case No. 14-80627-CIV-DIMITROULEAS, September 14, 2014.

  6. A “class action” lawsuit refers to lawsuits brought by one or more plaintiffs on behalf
of many similarly situated individuals who are able to present evidence that “the ques-
tions of law or fact common to the members of the class predominate over any ques-
tions affecting only individual members, and that a class action is superior to other
available methods for the fair and efficient adjudication of the controversy” (“The Use
of Econometrics in Class Certification,” American Bar Association, Econometrics:
Legal, Practical, and Technical Issues, ABA Publishing, 2005, pp. 179–224, p. 180). For
example, if it were discovered that manufacturers were conspiring to raise prices on a
common consumer good, it is possible that the consumers of this good will have been
affected in a similar way. It may be more efficient for a single lawsuit representing these
similarly-affected consumers to be filed, rather than thousands of separate lawsuits, one
for each consumer, for both the court and the individuals.
  7. Expert Report of Dr. Kent Van Liere, Skye Astiana on behalf of herself and all others
similarly situated v. Ben & Jerry’s Homemade, Inc., Case No. 4:10-cv-04387-PJH,
October 2, 2013 (hereafter “Van Liere Report”).
  8. Van Liere Report, p. 6.
  9. Surveys and interviews screen respondents by asking them a series of background ques-
tions before presenting them with the full questionnaire. These background questions
ensure that only individuals in the target population are surveyed. Dr. Van Liere asked
screener questions to identify American consumers who were over 18 and who had
purchased Ben & Jerry’s ice cream or frozen yogurt in the past ten years. Dr. Van Liere
excluded consumers who worked in a market research company, a store in the mall, or
a dairy product manufacturer, as well as consumers who had participated in a market
research study in the past three months. See Van Liere Report, p. 9 and Exhibit C.
10. Surveys include filter questions to reduce the likelihood of respondents guessing at
answers if they do not have an opinion on the topic of interest. Filter questions can
include a “no opinion” option in the answer or they can explicitly ask respondents
whether or not they have an opinion on the topic before asking follow-up questions.
See Shari S. Diamond, “Reference Guide on Survey Research,” in Reference Manual
on Scientific Evidence, 359–423, Federal Judicial Center/National Academy of Sciences,
2011.
11. Respondents were also asked a follow-up question: “What makes you say that?”
12. Open-ended questions allow respondents to answer using their own words, while close-
ended questions require respondents to select one (or more) answer option(s) provided
to them in the survey.
13. To avoid respondent bias, some surveys include distractors, which are questions unre-
lated to the central research topic, in order to obscure the true purpose of the survey.
For example, Dr. Van Liere’s primary topic of interest was the alkali content of the Ben
& Jerry’s product, but his survey included questions on the use of pasteurized milk and the car-
rageenan content of the product to avoid placing explicit emphasis on the topic of inter-
est, which could bias responses.
14. All of the questions described here were followed up with an open-ended question of
“why do you say that?”

36.  Machine learning in litigation
Vildan Altuglu and Rainer Schwabe

Machine Learning Applications in Legal Practice1

Classification: E-Discovery

In civil law, the United States’ Federal Rules of Civil Procedure (“FRCP”)
require that parties to a lawsuit provide documents to the opposing side
that are relevant to the matter at hand, as long as they are not excluded by
attorney–client privilege or similar restrictions and as long as the request
is not overly burdensome.2 Since the adoption of the FRCP in 1938, the
scope and volume of discovery have steadily increased in lockstep with
technological innovations such as the office copier.3 Amendments to the
FRCP in 2006 made it clear that all “electronically stored information”
is within the scope of civil discovery. As one recent review of discovery
rules notes, “[d]iscoverable information is now found not only on desktop
computers and network servers, but on PDAs, smart cards, cell phones,
thumb drives and backup tapes, as well as in bookmarked files, temporary
files, activity logs, Facebook accounts, and text messages, to name just
a few examples.”4 As of 2013, a single lawsuit could involve more than
100 million pages of discovery documents, requiring over 20 terabytes of
storage.5 By one estimate, discovery costs represent 35–50 percent of the
cost of litigation.6
The need to deal with this deluge of information has created an industry
around electronic discovery or e-discovery through which “electronic
data is sought, located, secured and searched with the intent of using it
as evidence in a civil or criminal legal case.”7 A large group of software
and consulting firms with different business models and different levels
of technological sophistication assist companies and law firms in the
e-discovery process.8
Machine learning plays an increasingly important role in one aspect
of e-discovery: document review.9 In a typical document review exercise,
documents requested are classified in terms of their likely relevance to the
case. Traditional methods for accomplishing this classification typically
involve keyword or Boolean searches, followed by manual review of the
search results.10 It is hardly surprising, however, that methods utilizing
machine learning have been shown to outperform traditional methods
in both accuracy and cost.
A typical example of an e-discovery document review process involving
machine learning tools operates as follows.11 Given a document request,
a human operator uses traditional tools (typically a keyword search or
random selection within a class of documents) to identify documents to
show to human reviewers. These human reviewers label each document as
responsive or unresponsive (i.e., relevant or not relevant to the questions at
issue in the case), creating a training set for the machine learning algorithm.
The machine learning algorithm then reviews additional documents, using
the presence of terms or word patterns to predict the likelihood that each
of these added documents is responsive. The additional documents that
the algorithm deems most likely to be responsive are manually assessed
for responsiveness by human reviewers and added to the training set, thus
improving the algorithm’s precision. This process continues until enough
responsive documents have been identified.
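
A minimal sketch of one round of this loop, assuming scikit-learn; the toy documents, the logistic-regression relevance model, and the batch size are illustrative rather than any vendor's actual protocol:

```python
# One iteration of a relevance-feedback review loop: fit a model on the
# human-labeled documents, score the unreviewed pool, and route the
# highest-scoring documents to human reviewers, whose labels would then
# be added to the training set before the next iteration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
import numpy as np

labeled_docs = ["contract amendment draft", "lunch menu for friday"]
labels = [1, 0]                      # 1 = responsive, 0 = unresponsive
pool = ["amended pricing contract", "holiday party signup", "pricing terms"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(labeled_docs), labels)

scores = clf.predict_proba(vec.transform(pool))[:, 1]  # P(responsive)
batch = np.argsort(scores)[::-1][:2]  # top-scoring documents for review
print([(pool[i], round(float(scores[i]), 2)) for i in batch])
```
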
In a related approach, human reviewers evaluate the responsiveness of
those documents about which the algorithm is least certain rather than
those about which it is most certain, a procedure known as uncertainty
sampling.12 This procedure for expanding the training set is designed to
maximize the improvement in the algorithm’s accuracy at each step. The
procedure continues until the algorithm is judged to be accurate enough,
where sufficiency is determined by a cost–benefit analysis that weighs
the cost of continuing to employ human reviewers against the increased
accuracy that would be gained through additional manual review of bor-
derline documents. At this point, the final algorithm is run one last time,
and human reviewers go through all documents that score above a given
threshold relevance score.
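
Under uncertainty sampling only the routing rule changes. A sketch, reusing `scores` and `pool` from the block above:

```python
# Uncertainty sampling: route to reviewers the documents whose predicted
# probability of responsiveness is closest to 0.5 -- those the current
# model is least sure about -- rather than the top-scoring ones.
uncertainty_order = np.argsort(np.abs(scores - 0.5))
most_uncertain = uncertainty_order[:2]  # next batch for human review
print([(pool[i], round(float(scores[i]), 2)) for i in most_uncertain])
```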

Classification: Identifying Precedent

In common law systems such as the judicial system in the United States,
previous court rulings can establish a principle or rule that can be binding
or persuasive to a court. Black’s Law Dictionary defines precedent as “rule
of law established for the first time by a court for a particular type of case
and thereafter referred to in deciding similar cases.”13 Identifying the best
precedent to cite in support of legal arguments can make the difference
between winning or losing a case. In principle, lawyers must evaluate
thousands of cases in order to identify the most relevant legal precedent
for a given case. More often, they rely on their experience and training to
narrow their search to a more manageable number.
However, if a case presents factual circumstances that are somewhat
different than those present in commonly cited case precedent, a lawyer
may be unable to narrow her search based on her experience. In such a
situation, the problem of identifying relevant precedent from hundreds
of years of case law is not unlike the problem of identifying relevant
documents among the millions produced, or identifying spam among the
thousands of emails in an inbox. Methods similar to those described above
for reviewing produced documents in e-discovery are being applied to
identifying relevant precedent.14 Further, computer scientists are develop-
ing “semantic searching” techniques to identify documents that may not
include the specific keywords used in the search, but include synonyms
or equivalent words. For example, a search might flag cases discussing
“debentures” when the keyword itself was “bonds.”15
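
A toy illustration of the idea behind semantic search, with fabricated three-dimensional “embeddings”; real systems use learned, high-dimensional vectors:

```python
# Semantic search ranks terms (or documents) by embedding similarity
# rather than exact keyword overlap, so "debentures" can match a query
# for "bonds". The vectors below are fabricated for illustration.
import numpy as np

embedding = {
    "bonds":      np.array([0.90, 0.10, 0.05]),
    "debentures": np.array([0.85, 0.15, 0.10]),  # near-synonym: close vector
    "ravioli":    np.array([0.02, 0.95, 0.30]),  # unrelated: distant vector
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for term, vec in embedding.items():
    print(term, round(cosine(embedding["bonds"], vec), 3))
# "debentures" scores near 1.0; "ravioli" does not.
```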

Prediction: Predicting Case Outcomes

Lawyers are routinely asked questions about exposure, litigation costs,
or the likelihood of prevailing on their clients’ claims, and typically
provide answers based on their professional experience. Demand for
more precise, data-driven analysis of these questions has led to the
emergence of software and consulting companies that provide services
such as legal cost benchmarking, predicting exposure to damages, and
predicting judicial decisions.16 Academics have also taken an interest.
The most sophisticated of these quantitative approaches use machine
learning algorithms.
A good example is the prediction of judicial decisions, where prediction
of US Supreme Court decisions, in particular, has garnered considerable
attention. Ruger et al. (2004), for instance, developed a forecasting model
using classification tree algorithms that correctly predicted 75 percent of
the Supreme Court’s decisions during the 2002 Term compared with a
success rate of 59 percent for a group of 83 legal experts making similar
predictions.17 Specifically, Ruger et al.’s approach identifies the independ-
ent variables most likely to influence the votes in the Supreme Court
and models how changes in these independent variables would affect the
outcome. The variables were ordered sequentially according to their influ-
ence on the decision, with each possible value of each variable representing
a separate branch of the decision tree.18 The estimated decision tree for
Justice O’Connor’s vote is shown in Figure 36.1.
The estimated decision tree for Justice O’Connor’s vote predicts that
the Justice would vote to reverse a previous decision whenever the lower
court’s decision was liberal. Further down the tree, the model predicts
that if (a) the lower court’s decision was not liberal, (b) the case was not
from the Second, Third, District of Columbia or Federal Circuit, and c)

MIZIK_9781784716745_t.indd 663 14/02/2018 16:38


664   Handbook of marketing analytics

Start

Is the lower Yes


court decision Reverse
liberal?
No

Case from
2nd, 3rd, D.C., Yes
Affirm
or Federal
Circuit?
No

Affirm
o

Is the
N

Is the primary issue civil


respondent the Yes
rights, First Amendment,economic
United States? activity, or federalism?
Ye

Reverse
s
No

Reverse

Source:  T. Ruger, P. Kim, A. Martin, and K. Quinn (2004), “The Supreme Court
Forecasting Project: Legal and Political Science Approaches to Predicting Supreme Court
Decisionmaking,” Columbia Law Review, 104, 4, 1150–1210, figure 1.

Figure 36.1  Decision tree for Supreme Court Justice O’Connor

the United States was a respondent in the case, Justice O’Connor would
vote to reverse precedent if the primary issue in the case was related to civil
rights, the First Amendment, economic activity, or federalism.19 In some
cases, Ruger et al. (2004) incorporated other justices’ predicted decisions
into a given justice’s decision tree. For example, according to Ruger
et al. (2004), Justice Thomas’ decision tree predicts that he would vote to

MIZIK_9781784716745_t.indd 664 14/02/2018 16:38


Machine learning in litigation  ­665

r­ eaffirm precedent if Justice Scalia’s predicted vote was not liberal and the
lower court’s ruling was conservative.20
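
The O’Connor tree is simple enough to transcribe as a plain function. The sketch below follows the splits described above and in Figure 36.1; it is our paraphrase for illustration, not the authors’ code:

```python
# Justice O'Connor's estimated decision tree from Ruger et al. (2004),
# transcribed from the splits described in the text and Figure 36.1.
KEY_CIRCUITS = {"2nd", "3rd", "D.C.", "Federal"}
KEY_ISSUES = {"civil rights", "First Amendment",
              "economic activity", "federalism"}

def predict_oconnor(lower_court_liberal: bool, circuit: str,
                    respondent_is_us: bool, primary_issue: str) -> str:
    if lower_court_liberal:
        return "reverse"
    if circuit in KEY_CIRCUITS:
        return "affirm"
    if not respondent_is_us:
        return "affirm"
    if primary_issue in KEY_ISSUES:
        return "reverse"
    # Per the reproduced figure, the remaining branch also predicts reverse.
    return "reverse"

print(predict_oconnor(False, "9th", True, "federalism"))  # -> reverse
```
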
More recently, Katz et al. (2014) have applied the extremely randomized
tree method, a close relative of the random forest approach, to identify
optimal decision trees while considering over 90 potential explanatory var-
iables.21 The random forest approach creates a “forest” of decision trees,
each tree trained on a different randomly selected subset of the overall
dataset. The model then makes a prediction by aggregating the predictions
of all the random trees in the forest by majority rule. The extremely rand-
omized tree method modifies this by randomizing the subset of attributes
used in each tree and the thresholds for those attributes.
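
A short sketch of this approach, assuming scikit-learn’s ExtraTreesClassifier; the synthetic features and labels merely stand in for the 90-plus case-level variables Katz et al. actually use:

```python
# Extremely randomized trees: an ensemble of decision trees in which the
# split attribute and threshold are randomized within each tree, with the
# forest's prediction aggregated across trees by majority vote.
from sklearn.ensemble import ExtraTreesClassifier
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))  # 500 past cases, 10 synthetic features
y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)  # 1 = reverse

model = ExtraTreesClassifier(n_estimators=300, random_state=0).fit(X, y)
new_case = rng.normal(size=(1, 10))
print(model.predict(new_case), model.predict_proba(new_case))
```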

Machine Learning Applications in Support of Expert Witness Testimony

An Assessment of the Current Use of Machine Learning in Expert Testimony

We begin by discussing simple searches we undertook to identify litigation
cases where machine learning methods have been used in support of expert
witness opinions. Using the Bloomberg Law database, we conducted
keyword searches of all publicly available court opinions through October
17, 2016, for the following broad terms: machine learning, data mining,
and text mining.22
Of the 164 hits, we identified only one in which expert witness
testimony applied machine learning techniques.23 This case was a
criminal litigation where a computer science professor testified about his
academic research on validating the uniqueness of a person’s handwriting
(i.e., the idea that “each individual has consistent handwriting that is
distinct from the handwriting of another individual”) and used machine
learning techniques to identify individual writers.
We found that most references to machine learning or data mining in
court opinions relate to the allegations or the challenged conduct in the
case (e.g., alleged data scraping from the plaintiff’s website via the use of
data mining techniques). The second most frequent references to these
terms related to description of a party involved in the litigation (e.g., as
part of the job description of the plaintiff in a sexual harassment case).
The third most frequent references were mentions of the use of data
mining and/or machine learning as part of evidence-gathering, either
at the pre-trial phase or during trial (e.g., an insurance company using
data mining techniques to uncover evidence of fraud in claims data).


Table 36.1  Data mining, text mining, and machine learning in court opinions

Category            Description                                        Cases   Percent
Practice at issue   Data mining, text mining, or machine learning        91      55%
                    was used by plaintiffs or defendants and is a
                    component of the conduct at issue in the case.
                    Many cases involve unauthorized use of private
                    data.
Participant         Data mining, text mining, or machine learning        41      25%
                    is pertinent to the description of the background
                    of a plaintiff, defendant, or another interested
                    party.
Evidence            Data mining, text mining, or machine learning        11       7%
                    was used to collect evidence. This could be
                    during trial or as part of participants’ normal
                    surveillance.
Legislation         Reference to legislation regarding data mining.       5       3%
Testimony           Use of data mining as part of expert testimony.       1       1%
Other               Case about a data mining patent; citation to a       15       9%
                    data mining article; unknown.
Total                                                                    164

Source:  Authors’ classification based on search results from Bloomberg Law.

Table 36.1 provides a summary of the frequency of keyword mentions in
court opinions.
While these are not exhaustive searches, this simple exercise suggests
that the use of machine learning methods in support of expert opinions,
especially in the context of commercial litigation, is in very early stages of
adoption.


Potential Applications of Machine Learning to Expert Testimony

We believe that machine learning methods could be well-suited to several
areas of expert testimony, either as a standalone method or as a way to
validate the results of commonly used empirical tools such as event studies
or conjoint analysis. For instance, text mining and sentiment analysis
could be used to analyze extensive textual data in the public domain (e.g.,
years of public press coverage, social media coverage, or equity analyst
coverage). This could be in the context of a defamation case where state-
ments about a company, product, or person are alleged to have negatively
impacted the party of interest; the goal of the analysis would be to assess
the level of dissemination of the statements and the context in which the
statements were disseminated (e.g., negative versus positive connotation,
linking statements to a specific event, issue or descriptor of interest).
Similarly, machine learning techniques can be used to collect and ana-
lyze the incredibly rich trail of consumer commentary and online reviews in
certain intellectual property rights cases, consumer class action cases, and
securities class action cases. For example, academic studies have applied
text mining and sentiment analysis to both conventional and social media
in order to assess links between stock price movements and measures of
investor sentiment or buzz, and links between product reviews, consumer
demand, and the structure of market competition.24 Similar techniques
can be used to assess the relative importance of investors’ comments or
various company news events for determining share price movements, to
analyze the weight consumers place on specific product features in making
purchase decisions, or to help assess possible substitution patterns within
a product category.
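
A toy sketch of lexicon-based sentiment scoring over press coverage; the lexicon and articles are fabricated, and a production analysis would use a validated lexicon or a trained classifier:

```python
# Lexicon-based sentiment scoring: count positive and negative cue words
# per document and link the scores to dates or events of interest.
POSITIVE = {"innovative", "reliable", "praised"}
NEGATIVE = {"defective", "misleading", "fraud"}

def sentiment_score(text: str) -> int:
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

coverage = {
    "2015-03-01": "analysts praised the innovative product line",
    "2015-06-12": "lawsuit calls the labeling misleading and defective",
}
for date, article in coverage.items():
    print(date, sentiment_score(article))  # +2, then -2
```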

Notes

  1. The authors gratefully acknowledge excellent assistance from Robert Meyer, Ryann
Noe, Jacob Ryan, and Justin Ying. The views expressed in this article are solely those
of the authors, who are responsible for the content, and do not necessarily reflect the
views of Cornerstone Research.
  2. FRCP Rule 26(b). “Parties may obtain discovery regarding any nonprivileged matter
that is relevant to any party’s claim or defense—including the existence, description,
nature, custody, condition, and location of any documents or other tangible things.”
FRCP Rule 26(b)(2)(C) “On motion or on its own, the court must limit the frequency
or extent of discovery otherwise allowed by these rules or by local rule if it determines
that: the burden or expense of the proposed discovery outweighs its likely benefit.”
  3. Milberg LLP and Hausfeld LLP (2011), “E-Discovery Today: The Fault Lies Not In
Our Rules,” Federal Courts Law Review, 4, 2, 1–52, p. 20.
  4. “E-Discovery Today,” p. 6.
  5. “E-Discovery Today,” p. 7.

  6. “The IT Manager’s Indispensable Guide to E-Discovery,” LiveOffice LLC, October,


2010, p. 4. (https://www.insight.com/content/dam/insight/en_US/pdfs/insight/resource-
center/whitepapers/liveoffice-e-discovery-whitepaper.pdf.)
  7. “The IT Manager’s Indispensable Guide to E-Discovery,” p. 4.
  8. “Magic Quadrant for E-Discovery Software,” Gartner, June 19, 2014.
  9. P. Flach, Machine Learning: The Art and Science of Algorithms that Make Sense of
Data. Cambridge University Printing House, 5th ed., 2014.
10. G. Cormack and M. Grossman (2014), “Evaluation of Machine-Learning Protocols
for Technology-Assisted Review in Electronic Discovery,” Proceedings of the 37th
International ACM SIGIR Conference on Research & Development in Information
Retrieval, New York City, ACM.
11. This description is based on G. Cormack and M. Grossman (2014), “Evaluation
of Machine-Learning Protocols for Technology-Assisted Review in Electronic
Discovery,” Proceedings of the 37th International ACM SIGIR Conference on Research
& Development in Information Retrieval, New York City, ACM.
12. D. Lewis and W. Gale (1994), “A Sequential Algorithm for Training Text Classifiers,”
Proceedings of the 17th Annual International ACM SIGIR Conference on Research and
Development in Information Retrieval, pp. 3–12, 1994.
13. Black’s Law Dictionary, West Publishing, 5th ed., 1979, p. 1059.
14. J. McGinnis and R. Pierce (2014), “The Great Disruption: How Machine Intelligence
Will Transform the Role of Lawyers in the Delivery of Legal Services,” Fordham Law
Review, 82, 6, 3041–3066.
15. McGinnis and Pierce, “The Great Disruption,” 3049.
16. D. Katz (2013), “Quantitative Legal Prediction—or—How I Learned to Stop Worrying
and Start Preparing for the Data-Driven Future of the Legal Services Industry,” Emory
Law Journal, 62, 909–941, p. 928.
17. T. Ruger, P. Kim, A. Martin, and K. Quinn (2004), “The Supreme Court Forecasting
Project: Legal and Political Science Approaches to Predicting Supreme Court
Decisionmaking,” Columbia Law Review, 104, 4, 1150–1210. The paper reports the
results of an interdisciplinary project where, for every argued case during the 2002
Term, the authors obtained predictions of the outcome prior to oral arguments from
a set of independent legal specialists and predictions from a statistical model using
classification trees. For a description of classification and regression tree methods,
see H. Varian (2014), “Big Data: New Tricks for Econometrics,” Journal of Economic
Perspectives, 28, 2, 3–27, pp. 7–14.
18. The approach employed in this study used eleven decision trees: one to predict the deci-
sion of each Supreme Court justice, one to predict whether there will be a unanimous
“liberal” decision, and one to predict whether there will be a unanimous “conservative”
decision.
19. Ruger, Kim, Martin, and Quinn, “The Supreme Court Forecasting Project” p. 1166.
20. Ruger, Kim, Martin, and Quinn, “The Supreme Court Forecasting Project” p. 1198.
21. D. Katz, M. Bommarito, and J. Blackman (2014), “Predicting the Behavior of the
Supreme Court of the United States: A General Approach” Unpublished manuscript,
p. 6. (http://ssrn.com/abstract=2463244).
22. The search was conducted as of October 17, 2016; the earliest document in our search
results is a 1962 court opinion. An analysis of references to machine learning, text
mining, and data mining in Google Scholar Case Law yields similar results. Data mining
is “the process of discovering interesting patterns and knowledge from large amounts of
data,” and often involves the use of machine learning methods. See J. Han, M. Kamber,
and J. Pei Data Mining: Concepts and Techniques, Morgan Kaufmann; 3rd edition, 2012,
p. 33. Text mining is essentially the same principle applied to text: analyzing text data
and extracting useful information from it. See I.H. Witten, (2005), “Text mining,” in
Practical Handbook of Internet Computing, M.P. Singh ed., Chapman & Hall/CRC
Press, pp. 14–1–14-22.
23. Pettus v. United States, 37 A.3d 213 (D.C. 2012), Court Opinion (02/09/2012).

24. See, e.g., W. Antweiler and M. Z. Frank (2004), “Is All That Talk Just Noise? The
Information Content of Internet Stock Message Boards,” Journal of Finance, 59 (3),
1259–1294; N. Archak, A. Ghose, and P. G. Ipeirotis (2011), “Deriving the Pricing
Power of Product Features by Mining Consumer Reviews,” Management Science, 57
(8), 1485–1509; O. Netzer, R. Feldman, J. Goldenberg, and M. Fresko (2012) “Mine
Your Own Business: Market-Structure Surveillance through Text Mining,” Marketing
Science, 31 (3), 521–543. Other studies have applied similar techniques to study the
effect of media coverage on stock returns and volume, the relative informational
content of social media commentary from different groups of stakeholders, among
others. See, e.g., P. Tetlock, (2007), “Giving Content to Investor Sentiment: The Role
of Media in the Stock Market,” Journal of Finance, 62 (3), 1139–1168; S. Jian, H. Chen,
J. Nunamaker, and D. Zimbra (2014) “Analyzing Firm-Specific Social Media and
Market: A Stakeholder-Based Event Analysis Framework,” Decision Support Systems,
67, 30–39. For a review see A. Nassirtoussi, S. Aghabozorgi, T. Wah, and D. Ngo
(2014), “Text Mining for Market Prediction: A Systematic Review,” Expert Systems
with Applications, 41, 7653–7670.

MIZIK_9781784716745_t.indd 669 14/02/2018 16:38


MIZIK_9781784716745_t.indd 670 14/02/2018 16:38
Index

A/B tests 324 price premium 584–5


ability bias 139 Apple v. Samsung II 573
active vs. passive control treatment Arellano-Bond GMM estimator
19–21 117–19
adaptive conjoint analysis 575 ARIMA model 99, 528, 532
addressable consumer, choice models asymmetric competition 433, 434,
and 175 436–7, 445
ad-targeting engine 448 attitudes 26–7
advanced models of choice 163–4 see automotive pricing 415–16
also specific models autoregressive process
advertised reference price (ARP) 569 first-order 83
advertising 2, 39–40 see also marketing
bans 516 B2B context, marketing 381–2
effectiveness of 39 bans, advertising 516
networks 448, 455 Bayes
online 39–40 structure 418
prices 516 theorem 182–3, 194, 196
targeting 448 Bayesian
Adwords 398 analysis 182–6, 186–90, 453, 598
aggregate data 146, 210–13, 524 basics of 182–6
discrete-choice demand models for Beta distribution in 185
205–6 binominal distribution for 184
Akaike Information Criterion 532, 533 challenge in 183
Akiva, Ben 174 covariates and 197
alternative hypotheses, in Hausman decision theory and 184, 196–7
test 120, 122, 405 heterogeneity 193–6
alternative treatment 20 likelihood principle in 182
Amazon 281 loss function and 184
ecommerce model of 175 in marketing 191–7
Mechanical Turk (AMT) 28, 56, 60 of random-effect model 194–5
American privacy regulation 512 unit-level models 191–3
analysis of variance (ANOVA) 233, 480 econometrics 181–98
analytical biases 550, 553, 555, 556 application in marketing 191–7
antitrust 516–17 computation 186–90
litigation estimation 192–3
but-for sales in the payment card hierarchical model 625
industry 599–602 inference 187
infant formula supplements statistics 191, 617
industry 602–5 VARX model 100
Apple Inc. v. Motorola 559 behavior
Apple Inc. v. Samsung Electronics Co. change framework 486–7
2001 52, 64, 72, 581, 584, 609, 630 decision theories 382
conjoint analysis 584–5 intention scale 26

671

MIZIK_9781784716745_t.indd 671 14/02/2018 16:38


672   Handbook of marketing analytics

beliefs 26–7 defendant’s


Ben & Jerry’s 652, 654, 665 expert assignment and findings
All Natural claims 652, 659 634–5
consumers’ perceptions of 655 response to plaintiff’s rebuttal
Bertrand-Nash pricing game 542 636–7
Beta distribution, in Bayesian analysis plaintiff’s rebuttal 636
185 B-to-B transactions 41
Beta-Binomial model 185 B-to-C settings 41
between-participant design, laboratory budget constraint 203, 459, 475, 612
experiments 21–2 business-to-business marketing 390
bias 549–50 but-for sales in the payment card
academically rigorous and unbiased industry 599–602
methodologies 553–4
analytical biases 550, 553, 555, 556 car allowance rebate system (CARS)
defined 550–51 539
implementation, unbiased 554–5 cash for clunkers 539, 544
information-related 550 cash rebates 417, 421, 424
pre-testing and survey instrument causal inference 11, 13, 15
556 “difference-in-differences” approach
selection-related 550–51 146
survey analysis 556–7 fundamental problem of 138–41
survey results, cross-validated 557–8 instrumental variable methods and
survey’s reliability 551–3 143–4
bias spreading 129–33 in marketing applications 135–50
endogeneity 130–32 model evaluation and 149–50
measurement error 132–3 observational data, problem of
in multivariate setting 132–3 136–8
bias–variance tradeoff 257–8 propensity score method 145–6
bidding application 377–9 randomized experimentation and
big data 32, 43, 141, 231, 280, 283, 301, 141–2
314, 325, 436, 453, 456 CBC see choice-based conjoint (CBC)
computational challenges in 283–5 analysis
approximate MCMC algorithms channels and purchase funnel stages
284 392–3
optimization-based approaches choice architecture 488
285 choice axioms 475
stochastic approximation and 285–8 choice experiments, models using
variational Bayes and 288–97 165–6
variety and 281–2 choice models, in marketing 155–76,
velocity and 281 187
veracity and 282 applications of 168–9
volume and 280–81 challenges 174–6
binomial probit model 191 for competitive analysis 173–4
binominal distribution, for Bayesian conjoint analysis and 165–6
analysis 184 decomposing utility of 162–3
boosting 262–3 dynamics in 171–3
Box-Jenkins method 528, 532 estimation 166–7
brand loyalty 172 generalized logit model 163–4
breach of contract 633 heterogeneity, accounting for 170–71
allegations 633–4 logit model of choice 158–9

MIZIK_9781784716745_t.indd 672 14/02/2018 16:38


Index  ­673

  marketing mix modeling and 173
  multi-stage 167, 170
  nested logit model 159–61
  origins of 156–8
  probit model of choice 161–2
  product design and 173
  for strategic problems 173–4
  Tobit model 164
  using choice experiments 165–6
  using scanner data 165
choice-based conjoint (CBC) analysis 57, 58, 61, 62, 382, 384, 575, 615, 617
  survey 592, 598
Church & Dwight Co., Inc. v. SPD Swiss Precision Diagnostics GMBH 563
circular bending effect 432, 435
Citri-Lite Company, Inc., Plaintiff v. Cott Beverages, Inc. 633–7
class action 551–2, 572, 580, 628–9, 667
classification and regression trees (CART) 259–61
classification problems 264
classifier 264
  linear 264–6
  margin for 264–5
  MML 265–6
ClearBlue pregnancy test 563
closed-ended questions 555, 656
cluster analysis 227–34, 434
  clustering models for 229
  data preparation for 228–9
  interpretation of clusters 233–4
  k-means clustering and 231–3
  verification of clusters 233–4
  Ward's method and 229–31
Cochran's Q tests 313
cointegration 86–7, 519, 520, 529
collaboration process 391
commercial litigation 666
communication strategies 431, 445, 503
community drug treatment 520
competitive analysis 375
Competitive Edge v. Staples 552
competitive market structure 431–3
  DRMABS 433
  DRMABS, LED-TV market 436–7
    map generation 438
    model comparison 442–5
competitive markets 613
competitive structure 431, 444
competitors 622
competitors' marketing support (CM) 86
complete randomization 36
conditional demand curve 618
conjoint analysis
  and choice models 165–6
  Choice-Based Conjoint Analysis (CBC) 57
  data collection for 59–60
  ecological validity of 64–71
  experimental design of 58–9
  external validity of 64
  eye tracking in 68–71
  formats of 56–7
  gamification in 66–7
  incentive alignment in 65–6
  industry applications of 375
    B2B context, marketing 381–2
    bidding application 377–9
    distribution channel 382–5
    market value of attribute improvement (MVAI) 380–81
    store location 375–7
  managerial applications of 52
  "no choice" alternative 57
  overview of 52–5
  partworths in 53
    estimation 61–2
    inference based on 62–4
  ratings-based 56
  screening for attention in 67–8
  self-explicated approach and 55
  surveys 55, 59
conjoint analysis in litigation 572–3
  analysis and reporting stage 579
  antitrust litigation
    but-for sales in the payment card industry 599–602
  Apple v. Samsung I 581, 584
    price premium 584–5
  basics 575–7
  consumer surveys 574
  design stage 578–9
  Khoday v. Symantec Corp. 583–4
  Oracle America, Inc. v. Google Inc. 580–81

  planning stage 578
  In re Whirlpool Corp. Front-Loading Washer Products Liability Litigation 582–3
  sampling and administration stage 579
  Schwab v. Philip Morris 581–2
  survey implementation strategies 591–3
    conjoint survey data, analysis 598–9
    goal and designing the conjoint survey instrument 593–5
    identifying and sampling 595–8
  tactical considerations 577–8
conjoint surveys 181
  analysis 592–3
  data, analysis 598–9
  instrument 593–5
constant proportion of investment (CPI) allocation rules 346
construct validity 562, 564, 565–6
consumer behavior 62, 474, 482, 483
consumer (mis)behavior 473
  consumer precommitment 478–81
    intervention, tool 481–3
  policy intervention 474
    individual consumer welfare 474–5
    internalities and precommitment 475–7
    negative externalities 474
consumer confusion 549, 553, 590, 640, 642–4, 647, 653
consumer decision making 558, 652
consumer demand models 210–13
consumer precommitment 478–81
  in marketplace 481–3
consumer price responsiveness 181
consumer profiling see user profiling, in display advertising
consumer purchase funnel 391
consumer surveys 549–50, 574 see also bias
  and conjoint analysis see conjoint analysis in litigation
  Fiji vs. Viti case 643
    allegations 643
    Eveready survey design 645–7
    improper sample selection 648
  trademark infringement 640–41
    formats for measuring likelihood of consumer confusion 641–3
consumer welfare 474–5
contingent valuation method 72
Cornell v. HP 572
correlated topic model (CTM) 448, 450, 456
correspondence analysis (CA) 250
cost specification 615
Cott Beverages, Inc. 633–7
counterfactuals 221
Cowles Commission for Research in Economics 143, 219
Cracker Barrel Old Country Store, Inc. (CBOCS) 549
criminal justice system interventions 520
cross-channel effects 395, 397
cross-check response categorization 557
cross-nested mixed linear model 291–3
  Gibbs sampling in 292
  MFVB vs. MCMC for 292
crossover effects 35
cross-sectional regressions 403
cross-sectional settings, measurement error in 124
cross-validation 551
crowdfunding 38
customer cash rebates 421
customer decision process, path diagram for 158
customer-initiated contacts (CIC) 390, 392
Cymbalta 629

daily narcotics use (DNU) 521, 524, 527, 529, 530
damages, calculation of 572
dashboard, marketing 399
data augmentation 453
data collection
  for conjoint analysis 59–60
  for field experiments 36–7
data modeling 403
  contemporaneous effects models 403–6
  dynamic models 406–9
  dynamic panel data models 409–13
  physician-specific effects

    dynamic models in absence of 406–9
    dynamic panel data models with 409–13
data quality 621
Day, George 155
decision support system 379, 416
decision theory 182, 184
  and strategic covariates 196–7
decision tree-based models 259–63
  boosting 262–3
  classification and regression trees 259–61
decomposition and re-assembly of markets by segmentation (DRMABS) 433
  LED-TV market 436–7
    data collection 437
    map exploration 438–42
    map generation 438
    model comparison 442–5
degrees of freedom 53–4
demand model 543
demand specification 615
demand system mirrors 615
demographic variables designing, laboratory experiments 27–8
dependent variables 19
  dynamic panel data models 113–19
  selection 26
depletion of self-control 487
Diamond, Shari 590, 596–7, 655, 658
Dickey–Fuller (ADF) test 84, 532
"difference-in-differences" analysis 146, 513, 517
digital marketing 135, 324, 393, 456
Digital River 583
digitization, consumer privacy and 43
Directive 2002/58/EC, Germany 512
direct-to-physician (DTP) marketing 402
  data 403
  data modeling 403
    contemporaneous effects models 403–6
    dynamic models 406–9
    dynamic panel data models 409–13
    physician-specific effects
      dynamic models in absence of 406–9
      dynamic panel data models with 409–13
Dirichlet distribution 453
discrete choice models 158–62, 610, 615 see also specific models
discrete-choice demand models, for aggregate data 205–6, 210–13
disjunction of conjunctions 272
display advertising, user profiling in 448–9
  modeling user profile 449–54
  scenario analysis 454–6
distribution channels 382–5
distribution, field experiments and 42
Dorfman-Steiner (D-S) optimality conditions 342, 343
Double-Asymmetric Structural VAR model 99
DRMABS see decomposition and re-assembly of markets by segmentation (DRMABS)
drug abuse see narcotics abuse
Durbin-Wu-Hausman test 167
dynamic choice models 171–3
dynamic marketing optimization problems
  single resource single entity 354–6
  single-entity multi-variable 358–62
  single-entity single-price 356–8
dynamic models in absence of physician-specific effects 406–9
dynamic optimization 330, 354–62, 459–61
dynamic panel data models 113–19
  errors in variables in 128
  instrumental variable-based estimation of 116–19
  OLS estimator for 114
  random-effects specification for 115–16
  with physician-specific effects 409–13
  within estimator for 114–15
dynamic profit maximization 460
dynamic single resource single entity optimization problems 354–6

dynamic single-entity multi-variable optimization problems 358–62
  with time-varying effectiveness 360–62
  without time-varying effectiveness 358–60
dynamic single-entity single-price optimization problems 356–8

eBay 39, 42
ecological validity, conjoint analysis of 64–71
economic simulation 449, 454, 456
e-discovery 661–2
electronic download service (EDS) 583
empirical generalizations 93, 306–7, 314, 343, 344
endogeneity 111, 138, 147, 166–7, 208, 540
  bias spreading, in multivariate setting 130–32
Engle-and-Granger (EG) approach, cointegration testing 87
entire market value (EMV) 572
equilibrium analysis 614, 621–2, 630–31
equilibrium conjoint analysis, feature valuation 609–10
  discrete choice to create a logit demand system 615–18
  equilibrium calculations 610–15, 618–21
    to assess feature value 621–4
    court decisions 628–30
    example 624–8
equilibrium model 622
equilibrium regressions 528–9, 532, 533, 537
error-correction approach 520, 530, 531, 533–6
error-in-variables problem 123
European privacy regulation 511, 512
Eveready format 641–2, 645
Excel-based decision support tool 466
exogeneity, in specification testing 122–3
experimental designs 19–30, 473, 479, 564
  between vs. within-participant 21–2
  choice/behavior 27
  of conjoint analysis 58–9
  demographic characteristics, measures of 27–8
  dependent variables selection 26
  full vs. fractional factorial 25–6
  individual differences, measures of 27–8
  memory and process measures 27
  passive vs. active control treatment 19–21
  of ratings-based conjoint analysis 58
  sample selection 28–9
  sample size 29–30
  self-reported thoughts, mood, beliefs, attitudes, and intentions 26–7
  single vs. multiple factors 22–4
external validity 567–9
extreme value type 1 (EV1) 158–9
extremeness aversion 488
eye tracking evidence 27, 68–71
EZ-Pass system 52, 573

Facebook 97, 174, 450, 511
factor analysis 234–42
  factor rotations in 239–41
  number of factors in 238
false advertising 2, 551, 567, 579, 628, 653, 654
Fancaster, Inc. v. Comcast Corporation, etc. 568
feature exclusivity 615
feature selection, in machine learning 276–7
feature space 268
feature transform 268–9
feature valuation 609–10
  equilibrium calculations 610–15, 618–21
    to assess feature value 621–4
    court decisions 628–30
    example 624–8
  logit demand system 615–18
feature-specific surplus 614
Federal Rules of Evidence 582
Federal Trade Commission 653
field experimentation 32–47, 397–400, 502–7, 511–14
  complete/stratified randomization in 36

  context of marketing and 43–4
  crossover effects and 35
  data collection in 36–7
  distribution and 42
  vs. laboratory experiments 15–19
  limitations 44–6
    external generalizability 45–6
    lack of theory 44–5
    limited scope 46
    one shot 46
  marketing communications and 39–40
  need for 33
  pricing and 40–41
  products and 41–2
  promotion communications and 39–40
  randomization in 33–5
  reciprocity by proxy 507–9
  results interpretation in 37–9
  rewards 509
  spillover effects and 35
  on Twitter 37
Fiji vs. Viti case 643
  allegations 643
  biased respondents with brand names 648
  improper sample selection 648
  standard questions, failed to ask 649
firm-initiated contacts (FIC) 390, 392
first-difference estimator 405–6, 409, 412
  fixed-effects models 110
    vs. mean-difference estimator 110–12
  measurement error bias in 125–6
first-difference instrumental variable-based estimator 116–17
Fishbein, Martin 174
Fisher, Ronald 562
Fisher scoring 286
fixations, eye-tracking data 68
fixed-effects models 109, 405
  advantage of 112
  estimation of 109–10
  first-difference vs. mean-difference estimator 110–12
  vs. random-effects model 112–13
  specification testing for 119–23
fixed-form variational Bayes (FFVB) 290–91
flat maximum principle 338–9
Food and Drug Administration 653
4 Ps of marketing mix 328
Fractus v. Samsung 554
franchised-car retailers 417
Frisch–Waugh–Lovell theorem 130
frontier analysis 424
Full Information Maximum Likelihood (FIML) approach, cointegration testing 87
full vs. fractional factorial design 25–6
functional magnetic resonance imaging (fMRI) 27

galvanic skin response (GSR) 27
gamification, in conjoint analysis 66–7
generalization, SVM 269–71
generalized forecast error variance decomposition (GFEVD) 92–3
generalized impulse response functions (GIRFs) 92–3
generalized least squares (GLS) 108–9, 116
generalized logit model 163–4
generalized method of moments (GMM) estimators 116, 117–19, 182
generalized multinomial logit model 163–4
Generalized Reduced Gradient (GRG) technique 347
Georgia-Pacific analysis 577
Gibbs sampling 283, 289, 292, 293
global norms 503, 505–6
goodness-of-fit propensity score model 145
Google 146, 511
Google Adwords 395
Google Analytics 37
Google search ads 39
government regulation and online advertising market 511
  antitrust 516–17
  local control 514–16
  privacy 511–14
government stimulus program 539, 544
growth elasticity 460, 461

Hartman, W. 147
Hausman and Taylor (HT) estimator 122–3
Hausman specification test see specification testing
Hausman test statistic 120
Hawthorne effects 38
hazard rate models 172, 173
healthy food choices 18, 20–21, 486, 497
heroin addicts 521, 536
heterogeneity
  Bayesian models of 193–6
  choice models of 170–71
heuristic solutions 459, 462, 463, 464, 465
hierarchical Bayes (HB) models 61, 273, 283, 418
hierarchical logit model 293–5
  population covariance matrix for 297
  total variation error for 296
  via hybrid VB 294–7
hotel towel usage 503
Howlett, Elizabeth 658
Hubbard v. Midland Credit Mgmt 555
hybrid VB procedure 294–7
hypertension market 468

ideal point preference models 249
illegal drug use 519 see also narcotics abuse
impulse-response functions (IRFs) 80, 88, 90–93, 106
incentive programs 415, 417, 418, 421, 422, 424, 482
incentive-aligned conjoint analysis 65–6, 69
incentive-by-proxy condition 508
incentive-compatible experiment 479
independence of irrelevant alternatives (IIA) 157
indirect least squares see Wald estimator
individual consumer welfare 474–5
individual differences
  individual differences scaling (INDSCAL) 246
  in personality traits 27–8
individual-consumer-level dynamics 453
individual-specific random effects 402
industry applications of conjoint analysis 375
  B2B context, marketing 381–2
  bidding application 377–9
  distribution channel 382–5
  market value of attribute improvement (MVAI) 380–81
  store location 375–7
inequality constraint 576
infant formula litigation 594–5
  market shares and antitrust damages in 602–5
infant formula supplements industry 602–5
inferred quality 395
information-related bias 550
infringement 640–49
Inofec BV
  challenges 390–91
  channels and purchase funnel stages 392–3
  marketing activity 392
  marketing effects on purchase funnel stages 393
  offline and online purchase funnels 391–2
in-person interviews 597
instrumental variables (IVs) 116, 143–4
intellectual property 517, 572, 579, 590, 593, 609, 611, 633, 667
intentions 26–7
internal validity, experimental research 564, 566–7
internalities and precommitment 475–7
internet-based surveys 597, 603
interpreting field experiment results 37–9
intervention-testing experiment 17–18

kernel functions 269
kernel method 267–9
Khoday v. Symantec Corp. 583–4
Kirzner, Israel M. 107
Klapper, D. 147
k-means clustering 231–3
Koyck model 79, 339–40, 406, 408
Kraft Food Group, Inc. 549
Kraft v. CBOCS 554, 556, 558
Kullback-Leibler (KL) divergence 289

labor participation 474
laboratory experiments 11–30
  advantage of 19
  designing 19–30
    between vs. within-participant 21–2
    choice/behavior 27
    demographic characteristics, measures of 27–8
    dependent variables selection 26
    full vs. fractional factorial 25–6
    individual differences, measures of 27–8
    memory and process measures 27
    passive vs. active control treatment 19–21
    sample selection 28–9
    sample size 29–30
    self-reported thoughts, mood, beliefs, attitudes, and intentions 26–7
    single vs. multiple factors 22–4
  vs. field experiments 15–19
    intervention-testing 17–18
    phenomenon establishment 18–19
    theory-testing 16–17
  nature of 12–15
  need for 12
  for relationships between variables 12–15
  rumor and 11–12
Lagrange multipliers 267, 345
Lanham Act 565
Laplace approximation of posterior 290
Laser Dynamics v. Quanta 572
Lasso regression 298–300
Latent Dirichlet Allocation (LDA) model 450–51
Latin Square design 24
least-squares analysis 403
least-squares dummy variable (LSDV) 110
legal practice
  e-discovery 661–2
  in identifying precedent 662–3
  machine learning applications in 661–5
  in predicting case outcomes 663–5
legal supervision (LS) 521, 537–8
Lerner index 341
libertarian paternalism 486
likelihood principle 182
Likert scale 26
limitations
  of field experiments 44–6
    external generalizability 45–6
    lack of theory 44–5
    limited scope 46
    one shot 46
  of normal distribution 195
linear classifiers 264–6
linear regression
  analysis 636
  for continuous variables 259
litigation see antitrust
litigation experiments 561–3
  courtroom, learning from 569–70
  experimental research 564
  goals 563
  validity 564
    construct 565–6
    external 567–9
    internal 564, 566–7
litigation support 1–2
local competitive asymmetry 436
local markets (DMAs) 418–20
location selection 375–7
logistic regression, for discrete data 259
logit demand system 612, 615–18
logit model 158–9, 417, 419
logit-transformation 452
long-difference estimators 128, 412
long-run impact 79–80, 85
long-term equilibrium 528–9
long-term impact 519–38
  time-series models 79–98
long-term share of the total effect (LSTE) 339
lower-priced mass-market product 614
Lucas Critique 99–100
Lucent v. Gateway 572
Luce's axiom 157

McFadden, Daniel 157, 158
machine learning (ML) 255–77
  bias–variance tradeoff 257–8
  characteristics of 255
  decision tree-based models 259–63
  vs. econometric methods 255

  feature selection in 276–7
  in litigation see machine learning, in litigation
  methods 255–6
  predictors in 259
  as regularization 275–6
  scalability in 255–6
  supervised 256
  support vector machines 264–74
  testing 274–5
  tools 255
  training and 274–5
  unsupervised 256
  validation and 274–5
machine learning, in litigation 661
  applications 665–6
  expert testimony 667
  classification
    e-discovery 661–2
    identifying precedent 662–3
    predicting case outcomes 663–5
macro marketing optimization 325
mail surveys 597
mall-intercept design 655
mall-intercept survey 647
mapping 431–2
margin, in SVM 264–5, 269–71
market equilibrium 610
market expansion (ME) 376, 377, 412
market linking 155, 175
market price 557, 569, 572, 577, 591, 629–30
market response 80, 99, 324, 394, 463–4
market segmentation 375
market sensing 155, 175
market share simulators 62–4
market simulations 577
market structure 173, 243, 431–3, 435, 438, 442, 445
market value of attribute improvement (MVAI) 380–81
marketing 1 see also advertising
  analytics 1, 2
  applications, causal inference in 135–50
  Bayesian econometric methods for 181–98
  boosted decision trees in 263
  communications for field experiments 39–40
  context of 43–4
  field experiments in 32–47
  laboratory experimentation in 11–30
  machine learning and 255–77
  meta analysis in 305–19
  modeling choice processes in 155–76
  optimization methods 324–66
  panel data methods in 107–33
  rumor and 11–12
  science 1
  structural models in 200–221
  time-series models in 79–100
  unobservable factors in 107
marketing activities 394
marketing budget allocation 390, 392, 397, 458
  Bayer case 463–4
    data and model estimation 464–6
  Bayer implementation 466–7
    managerial decision making 467–9
  dynamic approach 459–61
  implications 461–2
  optimal 458–9
  practical application 462–3
marketing communication activity 39, 391–2
  channels and purchase funnel stages 392–3
  customer-initiated contacts 392
  firm-initiated contacts 392
  marketing effects on purchase funnel stages 393
marketing dashboard 397, 399
marketing effects, on purchase funnel stages 393
marketing elasticity 338, 461, 462, 464
marketing input variables, optimization problems 328–9
marketing mix 98, 382
  instruments 93, 97
  modeling, choice models and 173
marketing optimization problems 324–66
  class of 363–4
  dynamic
    single resource single entity 354–6
    single-entity multi-variable 358–62
    single-entity single-price 356–8

    empirical generalization for 344
  macro 324
  micro 324
  software for 366
  static
    models 333–5
    multiple entity multi-variable 347–54
    multiple entity single resource 343, 345–7
    single entity multi-variable 341–3
    single entity single price 340–41
    single resource single entity 332, 336–40
  typologies of
    marketing input variables 328–9
    objective function 329–32
    sales entities 327–8
    type of objective 329–32
Markov chain 187–90, 195
Marlboro 561
Marriott hotel chains 573
Marriott's Courtyard Hotels 52
Martek Biosciences Corporation 602
MasterCard 599–602
maximum likelihood estimation (MLE) 116, 137, 166, 204, 543
maximum margin linear (MML) classifier 265–6
mean-difference estimator, fixed-effects models 110
  vs. first-difference estimator 110–112
  measurement error bias in 126–7
measurement error bias 125, 406
  in first difference-estimators 125–7
  in mean-difference estimators 126–7
  in OLS 125–6
  spreading in multivariate setting 132–3
measurement error, in panel data models 123–9
  assessment of 128–9
  first-difference estimators 126–7
  in static panels 125–6
  management of 128–9
  in mean-difference 126–7
  in OLS 125–6
  variables
    cross-sectional settings 124
    dynamic panel data models 128
    static panel data models 125
MessageWorks 314
meta analysis, in marketing 305–19
  applications 315–18, 319
  estimation issues in
    ancillary statistics 313
    Cochran's Q tests 313
    correlated observations 312–13
    equivalent tests 313–14
    fail-safe n 313
    fixed vs. random effects 314
    weighing observations 313
  post-hoc 307
  as predictive simulator 314
  replications and 305, 307
  steps in 307–12
  types of 305–6
  use 306
  variables for 309–10
methadone maintenance (MM) treatment 521, 524, 527, 529, 530, 536–7
  legal supervision 537–8
Metropolis–Hastings (MH) algorithm 187–90, 283, 284
Microsoft Excel 56, 62, 64
  Solver function in 347
mind-set metrics, in marketing 98
Mixed Data Sampling (MIDAS) regression models 100
mobile customers
  choice models and 175–6
  internet channel, rise of 175
monopoly market 614
Monte Carlo Markov Chain (MCMC) methods 183, 195, 283–4
mood 26–7
multidimensional scaling (MDS) 242–50, 432
  dimensionality in 246–7
  dissimilarities data and 243–4
  ideal point preference models 249
  MDS model and 244–6
  vector fitting to interpret 247–9
multi-format product line and pricing problem 351–4
  with cross-market network effects 351–4

multinomial logit model 158–9, 163, 273, 385
multi-period optimizations 611
multiple additive regression trees (MART) 262–3, 274–5
multiple entity multi-variable optimization problems 347–54
  multi-format product line 351–4
  pricing problem 351–4
  product line pricing 347–9
  resource allocation with cross-market network effects 349–51
multiple entity single resource optimization problems 343, 345–7
multiple factors design, laboratory experiments 22–4
multi-stage choice models 167–70
multivariate models, bias spreading in 129–33
multivariate statistical analyses
  cluster analysis 227–34
  factor analysis 234–42
  multidimensional scaling 242–50
multivariate time-series analysis 519, 524
MVAI see market value of attribute improvement (MVAI)

narcotics abuse 519, 524, 527–8, 533, 536, 537
narcotics use and property crime 519–21
  legal supervision 537–8
  methadone maintenance (MM) treatment 536–7
  methodology 524
    long-term equilibrium 528–9
    non-stationary system with cointegration 530
    short-term dynamics 529–30
    unit roots, presence of 528
  parameter estimation methods for short-term dynamics 531–2
  reciprocal dynamics 533–6
Nash equilibrium 611, 619, 620
Neches model 637
Need for Cognition Scale 28
negative externalities 474, 475, 478
Nerlove–Arrow model 354–6
nested logit model 159–61, 417, 427
  structure 419
NetAirus Techs. LLC v. Apple Inc. 553
Netflix.com 281, 448
Newton method for optimization 286
Neyman, J. 138
Nickell, Stephen 115
Nike, Inc. v. Nikepal Int'l, Inc. 565
non-linear classification, SVM 267–9
nonprobability sampling 596
non-random samples 596
non-stationary system
  with cointegration 530
  without cointegration 530–31
Norton Download Insurance (NDI) 583
nudging behavior 486, 491
nuisance, effects of 24
null hypotheses, in Hausman test 121

objective function 329
  deterministic vs. stochastic 330–31
  monopolistic vs. competitive situations 331–2
  static vs. dynamic 329–30
oblique rotation 241
observational data, problem of 136–8
Office Depot 561
offline advertising 512 see also online advertising
offline funnel 390, 393, 394, 395
omitted variable bias 126, 312, 406, 411, 412
online advertising 39–40, 511 see also offline advertising
  antitrust 516–17
  local control 514–16
  privacy 511–14
online funnel 392, 394, 395
online labor markets 28–9
online surveys 56, 597
online vs. offline selling 97
OpenBUGS 61
open-ended questions 555, 641, 655, 656
optimization see also marketing optimization problems
  meaning of 325–6
  principles of 325

  problems 326 see also specific problems
Oracle America, Inc. v. Google Inc. 554, 574, 580–81
ordinary least square (OLS)
  estimator 108–9, 114
  measurement error bias in 125–6
  regression 61
outlier-robust unit-root tests 85
out-of-home advertising 514–15

paid search advertisement 135
paired-comparisons based conjoint analysis 57
panel data 107
  and unobservables 147–8
panel data models, in marketing 107–33
  dynamic 113–19
  measurement error in 123–9
    assessment of 128–9
    cross-sectional settings 124
    dynamic panel data models 128
    in first difference-estimators in static panels 125–7
    management of 128–9
    mean-difference 126–7
    in OLS 125–6
    variables in static panel data models 125
  specification testing in 119–23
  static 109–13
partworths 53, 576
  estimation 61–2
  inference based on 62–4
passive vs. active control treatment 19–21
patent infringement 11, 551, 554, 572, 579, 584, 590, 593–4, 598
paternalism 493, 586
pay-what-you-want pricing 41
People of the State of California v. Overstock.com 569
perceptual maps 242–4, 246–7, 249–50, 431
persistence modeling 80–81
  steps 82
    cointegration tests 86–7
    impulse-response function derivation 90–93
    unit-root testing 83–6
    VAR models 87–90
  strategic insights from 93–8
    marketing–finance interface 97
    marketing-mix effectiveness 93, 97
    mind-set metrics 98
    new/social media 97–8
    online vs. offline selling 97
persuasive communication 491
  comparisons 492
  moments of truth 492–3
  vividness 491–2
persuasive messages 22, 502
Pharmacia Corp. v. GlaxoSmithKline Consumer Healthcare, L.P. 567
phone surveys 597
physician-specific effects
  dynamic models in absence of 406–9
  dynamic panel data models with 409–13
PIN incentive planning system 427
"pinch-to-zoom," smartphones feature 64
"placebo" sample 146
Poisson distribution 452
policy intervention
  consumer (mis)behavior 474
  individual consumer welfare 474–5
  internalities and precommitment 475–7
  negative externalities 474
possibilities 493–4
  assortment 494
  bundling 494–5
  quantity 495
posterior distribution 182–7, 189, 194–7, 212, 283, 288, 422, 424, 617, 625–6, 627
potential outcomes framework 138–9, 141
power information network (PIN) database 416, 420–21
prediction models 259
predictors, in machine learning 259
present-biased consumers 477
pre-testing and survey instrument 556
pre-testing of survey 551, 556
price coefficient 612
price customization 415, 417
price effects, on demand 543, 590

price premium 584–5, 609, 630, 659
price-insensitive consumers 619
pricing 375
  field experiments and 40–41
  mechanisms 378
pricing promotion decisions 415–16
  empirical illustration 420
    data description 420–21
    transaction types 421–2
  estimation and implementation results 422
    mid-size domestic SUV 424–7
    simulations 422–4
  modeling objective and specification 416–20
principal components analysis 250–51
privacy 511–14
probabilistic modeling 451, 452
probit model of choice 161–2
process interventions 488
  accessibility 490
  defaults 490
  order 488–90
product attribute valuation 382
product design 375
  choice models and 173
  and marketing mix modeling 173
product life cycle (PLC) 361, 459
product line pricing optimization problems 347–9
product positioning 375
product-market portfolio 174
products, field experiments and 41–2
profile fragment 449
promax 241
promotion communications, for field experiments 39–40
promotions on sales, effect of 40
propensity score method 145–6
property crime and narcotics use 519–21
  legal supervision 537–8
  methadone maintenance (MM) treatment 536–7
  methodology 524
    analytic procedure 524–8
    long-term equilibrium 528–9
    non-stationary system, with cointegration 530
    short-term dynamics 529–30
    stationary system 531
    unit roots, presence of 528
  parameter estimation methods for short-term dynamics 531–2
  reciprocal dynamics 533–6
pseudo-WTP 612–13
psychological distance 70
public policy context, structural models in 539–40
  consumers 540–41
  data and estimation 543–4
  manufacturers and dealers 541–3
  pricing decisions of car manufacturer 544–5
public policy effectiveness 519 see also narcotics use and property crime

Qualtrics 56, 59–60, 68
quasi-experiment 42, 511, 513
quasi-Newton method 620

random coefficients
  logit model 598
  model 625
random lottery mechanism (RLM) 69
random parameter 598
random sampling 596–7
random utility model 156–7, 293, 576, 617
random-effects estimators
  assumption of efficiency for in Hausman test 122
random-effects models 108–9, 194
  advantage of 112
  Bayesian analysis of 194–5
  for dynamic panel data models 115–16
  vs. fixed-effects models 112–13
  Markov chain Monte Carlo algorithm for 195
  specification testing for 119–23
randomization, in field experiments 33–5
  complete/stratified 36
  in non-digital environment 34
  in offline environment 34–5
rankings-based conjoint analysis 57
ratings-based conjoint analysis 56–9, 61, 62
reference utility 617

regression analysis 58, 61, 256, 306, 479, 513–14, 633–7
regression discontinuity 148–9
regression model 182, 186–7, 191
regularization 298
  machine learning as 275–6
  penalty 271–2
  Tikhonov 272
  for tradeoff 257–8
relative substitutability 614
re-parameterizing 452
replications, meta analysis and 305, 307
research participants 28, 29
resource allocation, with cross-market network effects 349–51
resources, in marketing 328–9
ResQNet.com v. Lansa 572
retail scanner data 181
rhierMnlRwMixture routine 625
ridge regression 271, 299–300
risk-averse consumer 172
Ritz Carlton 573
role-category mapping 453
rolling-window unit-root tests 85
root mean squared errors (RMSE) 292
root-finding algorithm 620
rumor 11–12, 13–15

Saavedra v. Eli Lilly 629
saccades, eye-tracking data 68
sales entities, in optimization problems 327–8
sample selection 28–9
sample size 29–30
Samsung 561, 596
Sawtooth Software's SSI Web suite 59, 166
Scale Multinomial Logit model 163
scanner data, models using 165
Schneider, Stephen A. 658
Schwab v. Philip Morris 581–2
Schwartz (SBC) criterion 84
screening for attention 67–8
screening protocols 551
search advertising 39, 140, 150, 511, 516–17
selection on unobservables 143, 147–8
selection-related bias 550–51
self-control premium 481
self-explicated approach 55
self-imposed restrictions 477
Self-monitoring Scale 28
self-reported thoughts 26–7
selling, online vs. offline 97
semantic differential scale 26
Sentius Int'l LLC v. Microsoft Corp. 559
serial correlation 119
shattering, notion of 270
short-run impact, time-series models 79–98
short-term dynamics 529–30
signal-to-noise ratio 125–6
simulated maximum likelihood 166
simulation methods 183
simulation-based Bayesian statistics 617
simultaneity, problem of 111, 138, 182
single entity multi-variable optimization problems 341–3
single entity single price optimization problems 340–41
single factor design, laboratory experiments 22–4
single product firms 615
single resource single entity optimization problems 332, 336–40
Skye Astiana, Plaintiff v. Ben & Jerry's Homemade, Inc. 652
slack variables 266
Sleekcraft factors 643
smart gestures 609
smartphones 575–6
Smith v. Wal Mart Stores, Inc. 568
social networks 40
  choice models and 174–5
social norms 502, 503
Solver function, Microsoft Excel 347
specification testing 119–23
  alternative and null hypothesis 121
  computation 120
  exogeneity in 122–3
  for fixed-effects vs. random-effects 119–21
  power issues in 123
  random-effects estimator, assumption of efficiency for 122
spillover effects 35
spreadsheet-driven dashboard tool 400

Squirt format 641
S-shaped functions 336
Staples 561
Stata 111, 117, 118, 119
static marketing optimization problems
  models 333–5
  multiple entity
    multi-variable 347–54
    single resource 343, 345–7
  single entity
    multi-variable 341–3
    single price 340–41
    single resource 332, 336–40
  single resource single entity 332, 336–40
static Nash price competition 615
static panel data models
  errors in variables in 125
  fixed-effects model 109–13
  random-effects model 108–9, 112–13
stationary system 531
statistical inferences 181
stochastic approximation 285–8
stochastic gradient descent (SGD) 285–8
stochastic variational inference (SVI) 285
stock keeping units (SKUs) 165
strategically defined covariates 182, 197
stratified randomization 35, 36
structural models, in marketing 200–221
  definition 200
  for demand and supply 219–20
  for dynamic tradition 219–20
  elements of 200–201
  field-test models based counterfactuals and 221
  illustration 202–9
    consumers heterogeneity 204–5
    discrete-choice demand models for aggregate data 205–6, 210–13
    market structure 203–4
    unobserved demand factors at aggregate level 207–9
  multiple data sources for, combined 220
  multiple methods for, combined 221
  for search 220
  unobservables in 200–201, 202–9
  usefulness 213–18
structural models, in public policy context 539–40
  consumers 540–41
  data and estimation 543–4
  manufacturers and dealers 541–3
  pricing decisions of car manufacturer 544–5
submarket 433–4, 436, 438
submarket-separating criteria 432
supervised machine learning models 256
  decision trees 259–63
  support vector machines 264–74
support vector machines (SVM) 264–74
  applications of 272–4
  classification problems 264
  generalization 269–71
  latent-class 273–4
  linear classifiers 264–6
  margin in 269–71
  misclassified examples 266–7
  non-linear classification 267–9
  optimization problem 271–2
  regularization 271–2
  VC dimension in 269–71
survey admissibility 550, 559
survey cross-validation 248, 273–5, 299
survey evidence 652
  "All Natural" food products 652–4
  Ben & Jerry's "All Natural" case 654
    assignment and findings 654–5
    Diamond principles 655–8
    Plaintiff's criticisms 658
survey implementation strategies 591–3
  conjoint survey data, analysis 598–9
  goal and designing the conjoint survey instrument 593–5
  identifying and sampling 595–8
survey pre-testing 551, 556
survey questionnaire 578
SurveyMonkey 56, 59
surveys
  analysis 556–7
  Eveready format 641
  interpretation of open-ended responses 550

  reliability 551–3
  results, cross-validated 557–8
  Squirt format 641
swivel screen attribute 623
Symantec 583

tall dataset 280, 285, 297
television markets 431–45
Tesco 175
theory-based interventions 17
theory-testing experiment 16–17, 30
Tikhonov regularization 272
time pressure 487
time-consistent consumers 476–7
time-series analysis 519, 521
time-series data 79
time-series econometrics 390
  data, organizing and leveraging 393–4
  Inofec BV
    challenges 390–91
    channels and purchase funnel stages 392–3
    marketing activity 392
    marketing effects on purchase funnel stages 393
    offline and online purchase funnels 391–2
time-series models 79–100, 393
  long-term marketing 79–80
  persistence modeling 80–98
  short-run marketing 79–80
  for stock-price data analysis 97
Tobacco Plain Packaging Act (2012), Australia 473
Tobin, James 164
Tobit model 164, 191–2
total variation error (TVE) 296
tracking goals 496
trademark dilution 561, 565, 643
Trademark Dilution Revision Act (2006) 565
trademark infringement 640–41
  for measuring likelihood of consumer confusion 641–3
trademark infringement suit 568
trademark litigation 549, 590, 640
trade-offs 108, 165, 242, 378, 468, 575–6, 598, 616, 621
training 257, 265–6, 269–70, 272, 274, 276
  costs of 35
  in-person 505
  job-training program 135–6
  machine learning and 274–5
  in use of analytics 391
Trognon, Alain 114
TV Interactive Data Corp. v. Sony Corp. 554
Twitter 37, 97, 450
  advertising on 39
  to comment 281
Type I error 29–30
Type II error 16–17, 29

Uniloc v. Microsoft 572
Union Carbide Corp. v. Ever-Ready, Inc. 553, 565, 641
unit roots
  presence of 528
  testing of 83
    design of 85
United States Credit Card Act (2009) 473
United States v. Vail Resorts 590
unit-level models, in marketing 191–3, 195
unit-root testing 83–6, 537
unobservables
  from explanatory variables 201–2, 203–4
  factors in marketing 107
  heterogeneity 202, 204–5
  from measurement error 202
  and panel data 147–8
  in structural models 200–201
unobserved demand factors at aggregate level 207–9
unsupervised machine learning models 256
user profiling, in display advertising 448–9
  modeling user profile 449–54
  scenario analysis 454–6
utility-to-choice probability transformation 159

validation 274–5
  machine learning and 274–5

Valueclick.com 448
Vapnik-Chervonenkis (VC)
  dimension 270–71
  generalization theorem 269–71
variable coding 616
variable operationalization 394
variables, in cross-sectional settings 124
variational Bayesian (VB) methods 288–97
  cross-nested mixed linear model 291–3
  FFVB 290–91
  hierarchical logit model 293–7
  MFVB 288–90
variational distribution, MFVB 288–9
variety, big data and 281–2
Vector Error Correction Model 80
vector-autoregressive (VAR) models 80, 87–90, 97, 100, 394, 531, 532, 533
velocity, big data and 281
veracity, big data and 282
vice and virtue goods 478
"vice-virtue" bundle 494
Victor's Little Secret 561
Vidale–Wolfe state equation 355, 356
Visa 599–602
visualization 433, 435
visualization of similarities (VOS) 435
vividness 491–2
volume 280–81
  big data and 280–81

Wald estimator 143–4
Ward's method 229–31
washing machines 432, 582, 628
Weibull distribution see extreme value type 1 (EV1)
Weierstrass sampler 284
weight decay 271
Werdebaugh v. Blue Diamond Growers 629
Whirlpool Corporation 582
Whirlpool Corp. Front-Loading Washer Products Liability Litigation 582–3
wide data 297–300
willingness-to-pay (WTP) 63, 351–2, 576–7, 584, 612–13
within estimator 114–15, 405–6
within-participant design
  disadvantage of 21
  laboratory experiments 21–2
word-of-mouth marketing 98
World Wide Web, commercialization of 97
wrapper method 276–7

Yahoo! 52

Zima 511
