Ray Poynter
January 2014
Introduction
Nate Silver's recent bestseller, The Signal and the Noise, highlighted a number of important and disruptive implications for anybody trying to understand markets, customers, and brands, especially for anybody looking to make predictions based on data. This white paper is designed to extract the key messages for marketers and insight professionals.
1. Big Data has less potential to help businesses than most people seem to be claiming.
2. Many things can't be forecast or predicted accurately.
3. Humans and machines working together tend to beat just humans and just machines.
4. We need to move our statistics on from Gauss towards Bayes.
Big Data's Feet of Clay
Nobody, including Nate Silver, is saying that Big Data has no uses. Indeed, Silver's election forecasts are an application of multiple data sources, as are his predictions of baseball teams' successes and failures. In his book, Silver talks about the great strides that have been made in weather forecasting because of one type of Big Data. But Silver sounds a massive note of caution, commenting that "our predictions may be more prone to failure in the era of Big Data."
Silver's concerns about Big Data stem from two interrelated aspects: noise (defined as unhelpful and possibly misleading information) and scale.
The first concern is that in the era of Big Data, noise is growing faster than the signal. If there is
an exponential increase in the amount of available information, there is likewise an exponential
increase in the number of hypotheses to investigate. With the noise growing faster than the
signal, messages will become harder to find, not easier.
The second concern is that the sheer scale of Big Data will make people think that the old rules, the rules of ordinary data, no longer apply. Silver criticizes Chris Anderson (editor of Wired magazine and author of The Long Tail), who wrote in 2008 that Big Data would obviate the need for theory and even the scientific method. Silver points out that when there is an almost infinite number of possible connections, ones based on prior knowledge, theory, and experiments where variables can be controlled are the keys to success.
Silver also highlights the problem that with more data we are tempted to create more complex models, but this is often beyond us. As a case in point, Silver looks at climate change. He shows that simple models of global warming have been very accurate. However, attempts to predict the impact of climate change on specific regions, countries, and particularly cities have been much less successful, which in turn has provided ammunition to those who try to deny climate change.
Silver identifies four key reasons why forecasts and predictions fail:
1. Chaotic systems
2. Missing data
3. Extrapolation
4. Feedback loops
Chaotic Systems
Chaotic systems are ones where a tiny change in the input data can result in a massive change in the outcome. The best known example of this is the weather, with the well-known metaphor that the fluttering of a butterfly's wings in the Amazon can cause a hurricane in another part of the world. Silver shows how the quality (i.e. accuracy) of weather forecasts has improved radically over the last twenty-five years, but he also highlights two tests that are highly relevant to anybody seeking to make forecasts about markets and consumers, which he terms No Change and Climate.
One way of predicting the weather tomorrow, or next week, is to say it will be the same as today. This is the No Change prediction. Any good prediction scheme should be able to beat No Change. The Climate prediction for tomorrow, or next week, or next month, is the average of the same day over the last few years. What Silver showed was that despite the improvements in weather forecasting, its chaotic nature means that forecasts only beat No Change and Climate for about a week; further out than that, No Change and Climate win.
In market research, a forecast of no change in a market, or of average performance for that type of product (with a given distribution, ad spend, promotion budget, etc.), is the benchmark that market predictions need to beat. Similarly, market research needs to assess the time frame over which its predictions can beat its equivalents of the No Change and Climate tests.
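The No Change and Climate benchmarks described above are easy to sketch in code. The numbers and the `mae` helper below are purely illustrative (none of this data comes from Silver's book); the point is simply that a forecast only adds value if its error beats both naive baselines:

```python
# Compare a forecast against Silver's two naive baselines:
# "No Change" (persistence: tomorrow equals today) and
# "Climate" (the historical average for that day).

def mae(forecast, actual):
    """Mean absolute error between two equal-length series."""
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(actual)

# Illustrative daily temperatures, not real data.
history_avg = [14.0, 15.0, 15.5, 16.0, 16.5]  # average of past years, per day
yesterday = 15.0
actual = [15.2, 14.8, 16.1, 15.9, 17.0]
model = [15.1, 15.0, 15.8, 16.2, 16.6]        # the forecast being evaluated

no_change = [yesterday] * len(actual)         # "No Change" baseline
climate = history_avg                         # "Climate" baseline

print(f"model:     {mae(model, actual):.2f}")
print(f"no change: {mae(no_change, actual):.2f}")
print(f"climate:   {mae(climate, actual):.2f}")
```

A forecast is only worth paying for while its error stays below both baselines; Silver's point is that for weather this holds for roughly a week, and each market needs to find its own horizon.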
Missing Data
One of the issues with the scale of Big Data is that it can blind people to what is not being measured. For example, a project might collect a respondent's location through every moment of the day, their online connections, their purchases, and their exposure to advertising. Surely that is enough to estimate their behavior? Not if their behavior depends on things such as their childhood experiences, their genes, conversations overheard, behavior seen, and so on.
Silver highlights this issue in the context of earthquakes. Despite a massive effort by scientists, data scientists, and researchers, we have almost no ability to forecast an earthquake. This is probably because an earthquake is a chaotic event and we don't have enough information about what is happening under the ground and around the world; i.e. the failure is caused by missing data and compounded by chaos theory.
The limited explanatory power of many market research models raises the question of how much the lack of consistent accuracy arises from missing information. As the earthquake illustration shows, missing information, especially in a complex system, can have massive consequences.
Extrapolation
Extrapolation is when data is collected for one range or area and the results are then forecast for some other range or area: for example, collecting data for the UK and forecasting sales for Europe, or collecting data on Monday and Tuesday and using that to forecast behavior over a seven-day week.
Silver highlights this problem by showing how his models are good at forecasting how major
league baseball players will perform next season, and how bad they are at forecasting how a
minor league player will perform in the majors next year. The major league performance is
within the box, but predicting what happens to a minor league player is extrapolation, i.e. out of
the box.
This within-the-box versus extrapolation problem also happens within marketing and market
research. Models do a pretty good job of predicting the performance of a line extension, but a
less good job when faced with truly new products, such as the Apple iPhone or the Apple
Newton. (The Apple Newton was a personal digital assistant with handwriting recognition, launched by Apple in 1993 to rave reviews; it bombed and was discontinued in 1998.)
Feedback Loops
In social systems, a feedback loop occurs when the cause and the outcome become correlated with each other, removing clear cause and effect and removing the ability of researchers to find enduring laws that govern the market. Silver points out that most economic models developed during one cycle of the economy fail when applied to the next cycle. The market factors in the knowledge of the rules discovered, so the response to the levers of the economy changes.
In marketing and market research this means that when new laws are discovered they soon start to cease to be true. As marketers start to employ a new method, competitors and customers start to adapt to it. When the large retailers first introduced sales, it was to sell surplus stock and to boost expenditure during periods when spending was low. However, patterns adapted: people started deferring their expenditure, waiting for the sales; stores started stocking some items specifically for the sales; then the sales were moved forward into the lull created by deferred expenditure. Now customers and retailers are in a cat-and-mouse game: as soon as retailers think they have found a successful pattern, competitors and customers change their behavior and create new patterns.
This issue was captured by Einstein when he commented, "No problem can be solved from the same level of consciousness that created it."
A Bayes Approach?
Suppose a research agency you appointed has just delivered a prediction that proved wrong. With a Bayesian approach we need to assess the prior probabilities. You did not select the agency at random, so we want to factor that into the assessment. Let's assume that you feel the chance that you appointed a good agency, given all the checks and processes you went through, was 75%.
With that prior probability, and the assumption that a bad agency is simply spinning a coin (50% of the time they are right, 50% of the time they are wrong), Bayes would suggest that there is a 55% chance the agency was simply unlucky; the arithmetic follows directly from Bayes' theorem.
Frequentist statistics say the odds are 20% that the agency was unlucky, so the agency should probably be fired. However, Bayes* would suggest that the odds they were unlucky are 55%, so they should probably be given a second chance.
*Note: the Bayes estimate includes two subjective numbers, the chance that you made a mistake in appointing the agency (we set this at 25%) and the chance that a poor agency gets the result right (we set this at 50%). Change one or both of these numbers and the result changes.
In a world of frequentist statistics there is no place for prior probabilities based on our knowledge of the world, but there is also no scope for blame if the assessment is wrong. With Bayes we can take prior knowledge into account, but we have to take responsibility for creating the best prior probabilities we can.
Bayes is an iterative approach. Before this test we thought there was a 75% chance the agency was a good agency; we now think there is a 55% chance they are a good agency. This number will be used when we next assess their project.
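The arithmetic behind the 55% figure can be sketched with Bayes' theorem, using the numbers from the text: a 75% prior that the agency is good, a good agency missing this badly only 20% of the time (the frequentist figure), and a bad agency missing 50% of the time. The function name below is our own; the numbers are the ones given above:

```python
# Bayes' theorem applied to the agency example:
# P(good | miss) = P(good) * P(miss | good) /
#                  [P(good) * P(miss | good) + P(bad) * P(miss | bad)]

def posterior_good(prior_good, p_miss_given_good, p_miss_given_bad):
    """Probability the agency is good, given it just missed."""
    num = prior_good * p_miss_given_good
    den = num + (1 - prior_good) * p_miss_given_bad
    return num / den

p = posterior_good(0.75, 0.20, 0.50)
print(f"P(good | miss) = {p:.0%}")  # prints: P(good | miss) = 55%
```

Because Bayes is iterative, this 55% posterior would become the prior the next time the agency's work is assessed.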
Closing thoughts
It is not certain that Nate Silver's prescriptions and recommendations for the process of forecasting human behavior and intentions will prove to be as useful or as long lasting as George Gallup's were in 1936.
However, it is clear that the age of representative samples, long timelines, and statistics based
on the normal distribution is passing. New approaches such as insight communities, social
media research, Big Data, mobile market research and Bayesian analytics are the tools of the
day, demanding new mindsets.
Marketers, market researchers, and insight professionals need to find ways of using the new tools and approaches to improve their ability to forecast future outcomes, to understand what can and cannot be forecast, and to make this information useful to decision makers.
The arrival of Big Data is not a reason for businesses to sit back and assume that it will solve all their problems. In many ways Big Data creates as many problems as it solves, and solving these new problems is likely to require the involvement of the human mind, the development of hypotheses, and the application of market research.
One major challenge for marketers, market researchers, and insight professionals is to become knowledgeable users of Big Data and Bayesian thinkers. Companies that do not gain an overview of what Big Data is, and what it can and can't do, are likely to be seen as easy targets by aggressive sales teams looking to create momentum and to sell what can, in some cases, be the modern-day equivalent of snake oil.
References
The Economist. (2006). "Bayes Rules." 5 June 2006. Available from: http://www.economist.com/node/5354696 [accessed October 2013].
Taleb, Nassim. (2007). The Black Swan. Random House, New York.
Silver, Nate. (2012). The Signal and the Noise. Penguin, New York.
Contact Us
info@visioncritical.com
North America: 1 877 669 8895
UK: +44 (0) 20 7633 2900
Australia: +61 (2) 9256 2000
Hong Kong: +852 3489 7009