
HBR.ORG MAY 2016
REPRINT R1605E

SPOTLIGHT ON MANAGING FOR AN UNPREDICTABLE FUTURE

Superforecasting:
How to Upgrade
Your Company’s
Judgment
by Paul J.H. Schoemaker and Philip E. Tetlock


SPOTLIGHT ARTWORK Sarah Morris, Esther (São Paulo), 2014, household gloss paint on canvas



Paul J.H. Schoemaker is the former research director of the Wharton School’s Mack Institute and a coauthor of Peripheral Vision (Harvard Business Review Press, 2006). Philip E. Tetlock is the Annenberg University Professor at the University of Pennsylvania and a coauthor of Superforecasting (Crown, 2015). Schoemaker served as an adviser to, and Tetlock co-led, the Good Judgment Project.


Imagine that you could dramatically improve your firm’s forecasting ability, but to do so you’d have to expose just how unreliable its predictions—and the people making them—really are. That’s exactly what the U.S. intelligence community did, with dramatic results.

Back in October 2002, the National Intelligence Council issued its official opinion that Iraq possessed chemical and biological weapons and was actively producing more weapons of mass destruction. Of course, that judgment proved colossally wrong. Shaken by its intelligence failure, the $50 billion bureaucracy set out to determine how it could do better in the future, realizing that the process might reveal glaring organizational deficiencies.

The resulting research program included a large-scale, multiyear prediction tournament, co-led by one of us (Phil), called the Good Judgment Project. The series of contests, which pitted thousands of amateurs against seasoned intelligence analysts, generated three surprising insights: First, talented generalists often outperform specialists in making forecasts. Second, carefully crafted training can enhance predictive acumen. And third, well-run teams can outperform individuals. These findings have important implications for the way organizations and businesses forecast uncertain outcomes, such as how a competitor will respond to a new-product launch, how much revenue a promotion will generate, or whether prospective hires will perform well.

ABOUT THE GOOD JUDGMENT PROJECT
In 2011, Philip Tetlock teamed up with Barbara Mellers, of the Wharton School, to launch the Good Judgment Project. The goal was to determine whether some people are naturally better than others at prediction and whether prediction performance could be enhanced. The GJP was one of five academic research teams that competed in an innovative tournament funded by the Intelligence Advanced Research Projects Activity (IARPA), in which forecasters were challenged to answer the types of geopolitical and economic questions that U.S. intelligence agencies pose to their analysts. The IARPA initiative ran from 2011 to 2015 and recruited more than 25,000 forecasters who made well over a million predictions on topics ranging from whether Greece would exit the eurozone to the likelihood of a leadership turnover in Russia to the risk of a financial panic in China. The GJP decisively won the tournament—besting even the intelligence community’s own analysts.

The approach we’ll describe here for building an ever-improving organizational forecasting capability is not a cookbook that offers proven recipes for success. Many of the principles are fairly new and have only recently been applied in business settings. However, our research shows that they can help leaders discover and nurture their organizations’ best predictive capabilities wherever they may reside.

Find the Sweet Spot
Companies and individuals are notoriously inept at judging the likelihood of uncertain events, as studies show all too well. Getting judgments wrong, of course, can have serious consequences. Steve Ballmer’s prognostication in 2007 that “there’s no chance that the iPhone is going to get any significant market share” left Microsoft with no room to consider alternative scenarios. But improving a firm’s forecasting competence even a little can yield a competitive advantage. A company that is right three times out of five on its judgment calls is going to have an ever-increasing edge on a competitor that gets them right only two times out of five.

Before we discuss how an organization can build a predictive edge, let’s look at the types of judgments that are most amenable to improvement—and those not worth focusing on. We can dispense with predictions that are either entirely straightforward or seemingly impossible. Consider issues that are highly predictable: You know where the hands of your clock will be five hours from now; life insurance companies can reliably set premiums on the basis of updated mortality tables. For issues that can be predicted with great accuracy using econometric and operations-research tools, there is no advantage to be gained by developing subjective judgment skills in those areas: The data speaks loud and clear.

At the other end of the spectrum, we find issues that are complex, poorly understood, and tough to quantify, such as the patterns of clouds on a given day or when the next game-changing technology will pop out of a garage in Silicon Valley. Here, too, there’s little advantage in investing resources in systematically improving judgment: The problems are just too hard to crack.

The sweet spot that companies should focus on is forecasts for which some data, logic, and analysis can be used but seasoned judgment and careful questioning also play key roles. Predicting the commercial potential of drugs in clinical trials requires scientific expertise as well as business judgment. Assessors of acquisition candidates draw on formal scoring models, but they must also gauge intangibles such as cultural fit, the chemistry among leaders, and the likelihood that anticipated synergies will actually materialize.

Consider the experience of a UK bank that lost a great deal of money in the early 1990s by lending to U.S. cable companies that were hot but then tanked. The chief lending officer conducted an audit of these presumed lending errors, analyzing the types of loans made, the characteristics of clients and loan officers involved, the incentives at play, and other factors. She scored the bad loans on each factor and then ran an analysis to see which ones best explained the variance in the amounts lost. In cases where the losses were substantial, she found problems in the underwriting process that resulted in loans to clients with poor financial health or no prior relationship with the bank—issues for which expertise and judgment were important. The bank was able to make targeted improvements that boosted performance and minimized losses.
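As a rough illustration of the kind of audit analysis the lending officer ran (scoring loans on several factors and testing which ones best explain the variance in losses), here is a minimal sketch in Python using ordinary least squares. The factor names and figures are hypothetical; the article does not disclose the bank’s data or exact method.

```python
import numpy as np

# Hypothetical scores for six bad loans (rows) on three audit factors
# (columns): underwriting rigor, client financial health, prior relationship.
X = np.array([
    [2.0, 3.0, 0.0],
    [1.0, 2.0, 0.0],
    [4.0, 6.0, 1.0],
    [2.0, 4.0, 1.0],
    [1.0, 1.0, 0.0],
    [5.0, 7.0, 1.0],
])
losses = np.array([8.2, 9.5, 1.1, 4.0, 9.8, 0.7])  # illustrative, in millions

# Ordinary least squares: which factors best explain the variance in losses?
X1 = np.column_stack([np.ones(len(X)), X])            # add an intercept
coef, *_ = np.linalg.lstsq(X1, losses, rcond=None)

pred = X1 @ coef
r2 = 1 - ((losses - pred) ** 2).sum() / ((losses - losses.mean()) ** 2).sum()
for name, c in zip(["intercept", "underwriting", "fin_health", "prior_rel"], coef):
    print(f"{name:>12}: {c:+.2f}")
print(f"R^2 = {r2:.2f}")  # share of loss variance the factors explain
```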

Idea in Brief

THE PROBLEM
Organizations and individuals are notoriously poor at judging the likelihood of uncertain events. Predictions are often colored by the forecaster’s susceptibility to cognitive biases, desire to influence others, and concerns about reputation. Getting judgments wrong can of course have serious consequences.

THE RESEARCH
On the basis of research involving 25,000 forecasters and a million predictions, the authors identified a set of practices that can improve companies’ prediction capability: training in the basics of statistics and biases; debating forecasts in teams; and tracking performance and giving rapid feedback.

IN PRACTICE
To improve prediction capability, companies should keep real-time accounts of how their top teams make judgments, including underlying assumptions, data sources, external events, and so on. Keys to success include requiring frequent, precise predictions and measuring accuracy for comparison.

On the basis of our research and consulting experience, we have identified a set of practices that leaders can apply to improve their firms’ judgment in this middle ground. Our recommendations focus on improving individuals’ forecasting ability through training; using teams to boost accuracy; and tracking prediction performance and providing rapid feedback. The general approaches we describe should of course be tailored to each organization and evolve as the firm learns what works in which circumstances.

Train for Good Judgment
Most predictions made in companies, whether they concern project budgets, sales forecasts, or the performance of potential hires or acquisitions, are not the result of cold calculus. They are colored by the forecaster’s understanding of basic statistical arguments, susceptibility to cognitive biases, desire to influence others’ thinking, and concerns about reputation. Indeed, predictions are often intentionally vague to maximize wiggle room should they prove wrong. The good news is that training in reasoning and debiasing can reliably strengthen a firm’s forecasting competence. The Good Judgment Project demonstrated that as little as one hour of training improved forecasting accuracy by about 14% over the course of a year. (See the exhibit “How Training and Teams Improve Prediction.”)

Learn the basics. Basic reasoning errors (such as believing that a coin that has landed heads three times in a row is likelier to land tails on the next flip) take a toll on prediction accuracy. So it’s essential that companies lay a foundation of forecasting basics: The GJP’s training in probability concepts such as regression to the mean and Bayesian revision (updating a probability estimate in light of new data), for example, boosted participants’ accuracy measurably. Companies should also require that forecasts include a precise definition of what is to be predicted (say, the chance that a potential hire will meet her sales targets) and the time frame involved (one year, for example). The prediction itself must be expressed as a numeric probability so that it can be precisely scored for accuracy later. That means asserting that one is “80% confident,” rather than “fairly sure,” that the prospective employee will meet her targets.
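To make “Bayesian revision” concrete, here is a minimal worked sketch of updating a probability estimate in light of new data via Bayes’ rule. The scenario and numbers are invented for illustration; the GJP’s actual training exercises are not reproduced in this article.

```python
def bayes_update(prior: float, p_evidence_if_true: float,
                 p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) by Bayes' rule."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Hypothetical example: a forecaster starts out 30% confident that a
# prospective hire will meet her first-year sales targets. A strong result
# on a trial project is seen in 80% of hires who go on to meet targets,
# but also in 40% of those who don't.
posterior = bayes_update(prior=0.30,
                         p_evidence_if_true=0.80,
                         p_evidence_if_false=0.40)
print(f"Revised probability: {posterior:.0%}")  # about 46%, up from 30%
```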
Understand cognitive biases. Cognitive biases are widely known to skew judgment, and some have particularly pernicious effects on forecasting. They lead people to follow the crowd, to look for information that confirms their views, and to strive to prove just how right they are. It’s a tall order to debias human judgment, but the GJP has had some success in raising participants’ awareness of key biases that compromise forecasting. For example, the project trained beginners to watch out for confirmation bias, which can create false confidence, and to give due weight to evidence that challenges their conclusions. And it reminded trainees not to look at problems in isolation but, rather, to take what Nobel laureate Daniel Kahneman calls “the outside view.” For instance, in predicting how long a project will take to complete, trainees were counseled to first ask how long it typically takes to complete similar projects, to avoid underestimating the time needed.

Training can also help people understand the psychological factors that lead to biased probability estimates, such as the tendency to rely on flawed intuition in lieu of careful analysis. Statistical intuitions are notoriously susceptible to illusions and superstition. Stock market analysts may see patterns in the data that have no statistical basis, and sports fans often regard basketball free-throw streaks, or “hot hands,” as evidence of extraordinary new capability when in fact they’re witnessing a mirage caused by capricious variation in a small sample.

Another technique for making people aware of the psychological biases underlying skewed estimates is to give them “confidence quizzes.”


Participants are asked for range estimates about general-interest questions (such as “How old was Martin Luther King Jr. when he died?”) or company-specific ones (such as “How much federal tax did our firm pay in the past year?”). The predictors’ task is to give their best guess in the form of a range and assign a degree of confidence to it; for example, one might guess with 90% confidence that Dr. King was between 40 and 55 when he was assassinated (he was 39). The aim is to measure not participants’ domain-specific knowledge but, rather, how well they know what they don’t know. As Will Rogers wryly noted: “It is not what we don’t know that gets us into trouble; it is what we know that ain’t so.” Participants commonly discover that half or more of their 90% confidence ranges don’t contain the true answer.
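Scoring such a quiz is simple arithmetic: count how often the true answer falls inside each 90% range. Here is a minimal sketch with made-up quiz data (only the Dr. King question and answer come from the article).

```python
# Each tuple: (low end of range, high end of range, true answer).
# Illustrative data for a five-question confidence quiz.
quiz = [
    (40, 55, 39),          # Dr. King's age at death (he was 39): a miss
    (1_000, 5_000, 7_400),  # hypothetical company-specific question
    (10, 25, 12),
    (100, 300, 450),
    (5, 9, 7),
]

hits = sum(low <= truth <= high for low, high, truth in quiz)
print(f"90% ranges contained the truth {hits / len(quiz):.0%} of the time")
# Well-calibrated respondents would land near 90%; most land far lower.
```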
Again, there’s no one-size-fits-all remedy for avoiding these systematic errors; companies should tailor training programs to their circumstances. Susquehanna International Group, a privately held global quantitative trading firm, has its own idiosyncratic approach. Founded in 1987 by poker aficionados, the company, which transacts more than a billion dollars in trades a year, requires new hires to play lots of poker—on company time. In the process, trainees learn about cognitive traps, emotional influences such as wishful thinking, behavioral game theory, and, of course, options theory, arbitrage, and foreign exchange and trading regulations. The poker-playing exercises sensitize the trainees to the value of thinking in probability terms, focusing on information asymmetry (what the opponent might know that I don’t), learning when to fold a bad hand, and defining success not as winning each round but as making the most of the hand you are dealt.

Companies should also engage in customized training that focuses on narrower prediction domains, such as sales and R&D, or areas where past performance has been especially poor. If your sales team is prone to hubris, that bias can be systematically addressed. Such tailored programs are more challenging to develop and run than general ones, but because they are targeted, they often yield greater benefits.

HOW TRAINING AND TEAMS IMPROVE PREDICTION
The Good Judgment Project tracked the accuracy of participants’ forecasts about economic and geopolitical events. The control group, made up of motivated volunteers, received no training about the biases that can plague forecasters. Its members performed at about the same level as most employees in high-quality companies—perhaps even better, since they were self-selected, competitive individuals. The second group benefited from training on biases and how to overcome them. Teams of trained individuals, who debated their forecasts (usually virtually), performed even better. When the best forecasters were culled, over successive rounds, into an elite group of superforecasters, their predictions were nearly twice as accurate as those made by untrained forecasters—representing a huge opportunity for companies.

(% more accurate than random guesses)
NO TRAINING: 36%
TRAINING: 41%
TEAMS: 44%
ELITE TEAMS: 66%

Build the Right Kind of Teams
Assembling forecasters into teams is an effective way to improve forecasts. In the Good Judgment Project, several hundred forecasters were randomly assigned to work alone and several hundred to work collaboratively in teams. In each of the four years of the IARPA tournament, the forecasters working in teams outperformed those who worked alone. Of course, to achieve good results, teams must be deftly managed and have certain distinctive features.

Composition. The forecasters who do the best in GJP tournaments are brutally honest about the source of their success, appreciating that they may have gotten a prediction right despite (not because of) their analysis. They are cautious, humble, open-minded, analytical—and good with numbers. (See the sidebar “Who Are These Superforecasters?”) In assembling teams, companies should look for natural forecasters who show an alertness to bias, a knack for sound reasoning, and a respect for data.

It’s also important that forecasting teams be intellectually diverse. At least one member should have domain expertise (a finance professional on a budget forecasting team, for example), but nonexperts are essential too—particularly ones who won’t shy away from challenging the presumed experts. Don’t underestimate these generalists. In the GJP contests, nonexpert civilian forecasters often beat trained intelligence analysts at their own game.


Who Are These Superforecasters?
The Good Judgment Project identified the traits shared by the best-performing forecasters in the Intelligence Advanced Research Projects Activity tournament. A public tournament is ongoing at gjopen.com; join to see if you have what it takes.

PHILOSOPHICAL APPROACH AND OUTLOOK
CAUTIOUS They understand that few things are certain
HUMBLE They appreciate their limits
NONDETERMINISTIC They don’t assume that what happens is meant to be

ABILITIES AND THINKING STYLE
OPEN-MINDED They see beliefs as hypotheses to be tested
INQUIRING They are intellectually curious and enjoy mental challenges
REFLECTIVE They are introspective and self-critical
NUMERATE They are comfortable with numbers

METHODS OF FORECASTING
PRAGMATIC They are not wedded to any one idea or agenda
ANALYTICAL They consider other views
SYNTHESIZING They blend diverse views into their own
PROBABILITY-FOCUSED They judge the probability of events not as certain or uncertain but as more or less likely
THOUGHTFUL UPDATERS They change their minds when new facts warrant it
INTUITIVE SHRINKS They are aware of their cognitive and emotional biases

WORK ETHIC
IMPROVEMENT-MINDED They strive to get better
TENACIOUS They stick with a problem for as long as needed

Diverging, evaluating, and converging. Whether a team is making a forecast about a single event (such as the likelihood of a U.S. recession two years from now) or making recurring predictions (such as the risk each year of recession in an array of countries), a successful team needs to manage three phases well: a diverging phase, in which the issue, assumptions, and approaches to finding an answer are explored from multiple angles; an evaluating phase, which includes time for productive disagreement; and a converging phase, when the team settles on a prediction. In each of these phases, learning and progress are fastest when questions are focused and feedback is frequent.

The diverging and evaluating phases are essential; if they are cursory or ignored, the team develops tunnel vision—focusing too narrowly and quickly locking into a wrong answer—and prediction quality suffers. The right norms can help prevent this, including a focus on gathering new information and testing assumptions relevant to the forecasts. Teams must also focus on neutralizing a common prediction error called anchoring, wherein an early—and possibly ill-advised—estimate skews subsequent opinions far too long. This often happens unconsciously because easily available numbers serve as convenient starting points. (Even random numbers, when used in an initial estimate, have been shown to anchor people’s final judgments.)

One of us (Paul) ran an experiment with University of Chicago MBA subjects that demonstrated the impact of divergent exploration on the path to a final prediction. In one test, subjects in the control group were asked to estimate how many gold medals the U.S. would win relative to another top country in the next summer Olympics and to provide their 90% confidence ranges around these estimates. The other group was asked to first sketch out various reasons why the ratio of medals might be lower or higher than in years past and then make an estimate. This group naturally thought back to terrorist attacks and boycotts, and considered other factors that might influence the outcome, from illness to improved training to performance-enhancing drugs. As a consequence of this divergent thinking, this group’s ranges were significantly wider than the control group’s, often by more than half. In general, wider ranges reflect more carefully weighed predictions; narrow ranges commonly indicate overconfident—and often less accurate—forecasts.

Trust. Finally, trust among members of any team is required for good outcomes. It is particularly critical for prediction teams because of the nature of the work. Teams that are predicting the success or failure of a new acquisition, or handicapping the odds of successfully divesting a part of the business, may reach conclusions that raise turf issues or threaten egos and reputations. They are also likely to expose areas of the firm, and perhaps individuals, with poor forecasting abilities. To ensure that forecasters share their best thinking, members must trust one another and trust that leadership will defend their work and protect their jobs and reputations. Few things chill a forecasting team faster than a sense that its conclusions could threaten the team itself.

Track Performance and Give Feedback
Our work on the Good Judgment Project and with a range of companies shows that tracking prediction outcomes and providing timely feedback is essential to improving forecasting performance.


Consider U.S. weather forecasters, who, though much maligned, excel at what they do. When they say there’s a 30% chance of rain, 30% of the time it rains on those days, on average. Key to their superior performance is that they receive timely, continual, and unambiguous feedback about their accuracy, which is often tied to their performance reviews. Bridge players, internal auditors, and oil geologists also shine at prediction thanks in part to robust feedback and incentives for improvement.

The purest measure for tracking the accuracy of predictions over time is the Brier score. It allows companies to make direct, statistically reliable comparisons among forecasters across a series of predictions. Over time, the scores reveal those who excel, be they individuals, members of a team, or entire teams competing with others. (See the sidebar “Brier Scores Reveal Your Best—and Worst—Predictors.”)

BRIER SCORES REVEAL YOUR BEST—AND WORST—PREDICTORS
It’s important that forecasters make precise estimates of probability—for example, pegging at 80% the likelihood that their firm will sell between 9,000 and 11,000 units of a new product in the first quarter. That way, the predictions can be analyzed and compared using a method called Brier scoring, allowing managers to reliably rank forecasters on the basis of skill.

Brier scores are calculated by squaring the difference between a probability prediction and the actual outcome, scored as 1 if the event happened and 0 if not. For example, if a forecaster assigns a 0.9 probability (a 90% confidence level) that the firm will exceed a sales target and the firm then does, her Brier score for that forecast is (0.9 – 1)², or 0.01. If the firm misses the target, her score is (0.9 – 0)², or 0.81. The closer to zero the score is, the smaller the forecast error and the better the prediction.

Brier scoring makes it readily apparent who’s good at forecasting and who isn’t. By enabling direct comparison among forecasters, the tool encourages thoughtful analysis while exposing “shooting from the hip” and biased prognostications.
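The sidebar’s arithmetic translates directly into code. Here is a minimal sketch for computing and comparing mean Brier scores across a series of yes/no forecasts; the names and track records are hypothetical.

```python
def brier(probability: float, outcome: int) -> float:
    """Squared error between a probability forecast and a 0/1 outcome."""
    return (probability - outcome) ** 2

# Hypothetical track records: (forecast probability, what actually happened).
forecasters = {
    "Ana":   [(0.90, 1), (0.70, 1), (0.20, 0), (0.60, 0)],
    "Boris": [(0.50, 1), (0.50, 1), (0.50, 0), (0.50, 0)],
    "Chen":  [(0.95, 0), (0.80, 1), (0.10, 0), (0.30, 1)],
}

for name, record in sorted(forecasters.items()):
    mean_score = sum(brier(p, o) for p, o in record) / len(record)
    print(f"{name}: mean Brier score = {mean_score:.3f}")  # lower is better

# Sanity checks against the sidebar's worked examples:
assert abs(brier(0.9, 1) - 0.01) < 1e-9
assert abs(brier(0.9, 0) - 0.81) < 1e-9
```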
But simply knowing a team’s score does little to improve performance; you have to track the process it used as well. It’s important to audit why outcomes were achieved—good or bad—so that you can learn from them. Some audits may reveal that certain process steps led to a good or a bad prediction. Others may show that a forecast was correct despite a faulty rationale (that is, it was lucky), or that a forecast was wrong because of unusual circumstances rather than a flawed analysis. For example, a retailer may make very accurate forecasts of how many customers will visit a store on a given day, but if a black-swan event—say, a bomb threat—closes the store, its forecast for that day will be badly off. Its Brier score would indicate poor performance, but a process audit would show that bad luck, not bad process, accounted for the outlying score.

Gauging group dynamics is also a critical part of the process audit. No amount of good data and by-the-book forecasting can overcome flawed team dynamics. Consider the discussions that took place between NASA and engineering contractor Morton Thiokol before the doomed launch of the space shuttle Challenger in 1986. At first, Thiokol engineers advised against the launch, concerned that cold temperatures could compromise the O-rings that sealed the rocket boosters’ joints. They predicted a much higher than usual chance of failure because of the temperature. Ultimately, and tragically, Thiokol reversed its stance.

The engineers’ analysis was good; the organizational process was flawed. A reconstruction of the events that day, based on congressional hearings, revealed the interwoven conditions that compromised the forecast: time pressure, directive leadership, failure to fully explore alternate views, silencing of dissenters, and a sense of infallibility (after all, 24 previous flights had gone well).

To avoid such catastrophes—and to replicate successes—companies should systematically collect real-time accounts of how their top teams make judgments, keeping records of assumptions made, data used, experts consulted, external events, and so on. Videos or transcripts of meetings can be used to analyze process; asking forecasters to record their own process may also offer important insights. Recall Susquehanna International Group, which trains its traders to play poker. Those traders are required to document their rationale for entering or exiting a trade before making a transaction. They are asked to consider key questions: What information might others have that you don’t that might affect the trade? What cognitive traps might skew your judgment on this transaction? Why do you believe the firm has an edge on this trade? Susquehanna further emphasizes the importance of process by pegging traders’ bonuses not just to the outcome of individual trades but also to whether the underlying analytic process was sound.

Well-run audits can reveal post facto whether forecasters coalesced around a bad anchor, framed the problem poorly, overlooked an important insight, or failed to engage (or even muzzled) team members with dissenting views. Likewise, they can highlight the process steps that led to good forecasts and thereby provide other teams with best practices for improving predictions.

EACH OF the methods we’ve described—training, team building, tracking, and talent spotting—is essential to good forecasting. The approach must be customized across businesses, and no firm, to our knowledge, has yet mastered them all to create a fully integrated program. This presents a great opportunity for companies that take the lead—particularly those with a culture of organizational innovation and those that embrace the kind of experimentation the intelligence community did.

But companies will capture this advantage only if respected leaders champion the effort, by broadcasting an openness to trial and error, a willingness to ruffle feathers, and a readiness to expose “what we know that ain’t so” in order to hone the firm’s predictive edge.

HBR Reprint R1605E
