Superforecasting:
How to Upgrade
Your Company’s
Judgment
by Paul J.H. Schoemaker and Philip E. Tetlock
This document is authorized for use only in Pavel Atanasov's MIM-EN_Ene2024_J1&J2&J3 - Data Analytics for decision making -- OPE at IE Business School from Jan 2024 to Jan 2025.
SPOTLIGHT ON MANAGING FOR AN UNPREDICTABLE FUTURE
Imagine that you could dramatically improve your firm’s forecasting ability, but to do so you’d have to expose just how unreliable its predictions—and the people making them—really are. That’s exactly what the U.S. intelligence community did, with dramatic results.
May 2016 Harvard Business Review 3
Back in October 2002, the National Intelligence Council issued its official opinion that Iraq possessed chemical and biological weapons and was actively producing more weapons of mass destruction. Of course, that judgment proved colossally wrong. Shaken by its intelligence failure, the $50 billion bureaucracy set out to determine how it could do better in the future, realizing that the process might reveal glaring organizational deficiencies.

The resulting research program included a large-scale, multiyear prediction tournament, co-led by one of us (Phil), called the Good Judgment Project. The series of contests, which pitted thousands of amateurs against seasoned intelligence analysts, generated three surprising insights: First, talented generalists often outperform specialists in making forecasts. Second, carefully crafted training can enhance predictive acumen. And third, well-run teams can outperform individuals. These findings have important implications for the way organizations and businesses forecast uncertain outcomes, such as how a competitor will respond to a new-product launch, how much revenue a promotion will generate, or whether prospective hires will perform well.

The approach we’ll describe here for building an ever-improving organizational forecasting capability is not a cookbook that offers proven recipes for success. Many of the principles are fairly new and have only recently been applied in business settings. However, our research shows that they can help leaders discover and nurture their organizations’ best predictive capabilities wherever they may reside.

ABOUT THE GOOD JUDGMENT PROJECT
In 2011, Philip Tetlock teamed up with Barbara Mellers, of the Wharton School, to launch the Good Judgment Project. The goal was to determine whether some people are naturally better than others at prediction and whether prediction performance could be enhanced. The GJP was one of five academic research teams that competed in an innovative tournament funded by the Intelligence Advanced Research Projects Activity (IARPA), in which forecasters were challenged to answer the types of geopolitical and economic questions that U.S. intelligence agencies pose to their analysts. The IARPA initiative ran from 2011 to 2015 and recruited more than 25,000 forecasters who made well over a million predictions on topics ranging from whether Greece would exit the eurozone to the likelihood of a leadership turnover in Russia to the risk of a financial panic in China. The GJP decisively won the tournament—besting even the intelligence community’s own analysts.

Find the Sweet Spot
Companies and individuals are notoriously inept at judging the likelihood of uncertain events, as studies show all too well. Getting judgments wrong, of course, can have serious consequences. Steve Ballmer’s prognostication in 2007 that “there’s no chance that the iPhone is going to get any significant market share” left Microsoft with no room to consider alternative scenarios. But improving a firm’s forecasting competence even a little can yield a competitive advantage. A company that is right three times out of five on its judgment calls is going to have an ever-increasing edge on a competitor that gets them right only two times out of five.

Before we discuss how an organization can build a predictive edge, let’s look at the types of judgments that are most amenable to improvement—and those not worth focusing on. We can dispense with predictions that are either entirely straightforward or seemingly impossible. Consider issues that are highly predictable: You know where the hands of your clock will be five hours from now; life insurance companies can reliably set premiums on the basis of updated mortality tables. For issues that can be predicted with great accuracy using econometric and operations-research tools, there is no advantage to be gained by developing subjective judgment skills in those areas: The data speaks loud and clear.

At the other end of the spectrum, we find issues that are complex, poorly understood, and tough to quantify, such as the patterns of clouds on a given day or when the next game-changing technology will pop out of a garage in Silicon Valley. Here, too, there’s little advantage in investing resources in systematically improving judgment: The problems are just too hard to crack.

The sweet spot that companies should focus on is forecasts for which some data, logic, and analysis can be used but seasoned judgment and careful questioning also play key roles. Predicting the commercial potential of drugs in clinical trials requires scientific expertise as well as business judgment. Assessors of acquisition candidates draw on formal scoring models, but they must also gauge intangibles such as cultural fit, the chemistry among leaders, and the likelihood that anticipated synergies will actually materialize.

Consider the experience of a UK bank that lost a great deal of money in the early 1990s by lending to U.S. cable companies that were hot but then tanked. The chief lending officer conducted an audit of these presumed lending errors, analyzing the types of loans made, the characteristics of clients and loan officers involved, the incentives at play, and other factors. She scored the bad loans on each factor and then ran an analysis to see which ones best explained the variance in the amounts lost. In cases where the losses were substantial, she found problems in the underwriting process that resulted in loans to clients with poor financial health or no prior relationship with the bank—issues for which expertise and judgment were important. The bank was able to make targeted improvements that boosted performance and minimized losses.

On the basis of our research and consulting experience, we have identified a set of practices that leaders can apply to improve their firms’ judgment

COPYRIGHT © 2016 HARVARD BUSINESS SCHOOL PUBLISHING CORPORATION. ALL RIGHTS RESERVED.
Idea in Brief

THE PROBLEM
Organizations and individuals are notoriously poor at judging the likelihood of uncertain events. Predictions are often colored by the forecaster’s susceptibility to cognitive biases, desire to influence others, and concerns about reputation. Getting judgments wrong can of course have serious consequences.

THE RESEARCH
On the basis of research involving 25,000 forecasters and a million predictions, the authors identified a set of practices that can improve companies’ prediction capability: training in the basics of statistics and biases; debating forecasts in teams; and tracking performance and giving rapid feedback.

IN PRACTICE
To improve prediction capability, companies should keep real-time accounts of how their top teams make judgments, including underlying assumptions, data sources, external events, and so on. Keys to success include requiring frequent, precise predictions and measuring accuracy for comparison.
in this middle ground. Our recommendations focus on improving individuals’ forecasting ability through training; using teams to boost accuracy; and tracking prediction performance and providing rapid feedback. The general approaches we describe should of course be tailored to each organization and evolve as the firm learns what works in which circumstances.

Train for Good Judgment
Most predictions made in companies, whether they concern project budgets, sales forecasts, or the performance of potential hires or acquisitions, are not the result of cold calculus. They are colored by the forecaster’s understanding of basic statistical arguments, susceptibility to cognitive biases, desire to influence others’ thinking, and concerns about reputation. Indeed, predictions are often intentionally vague to maximize wiggle room should they prove wrong. The good news is that training in reasoning and debiasing can reliably strengthen a firm’s forecasting competence. The Good Judgment Project demonstrated that as little as one hour of training improved forecasting accuracy by about 14% over the course of a year. (See the exhibit “How Training and Teams Improve Prediction.”)

Learn the basics. Basic reasoning errors (such as believing that a coin that has landed heads three times in a row is likelier to land tails on the next flip) take a toll on prediction accuracy. So it’s essential that companies lay a foundation of forecasting basics: The GJP’s training in probability concepts such as regression to the mean and Bayesian revision (updating a probability estimate in light of new data), for example, boosted participants’ accuracy measurably. Companies should also require that forecasts include a precise definition of what is to be predicted (say, the chance that a potential hire will meet her sales targets) and the time frame involved (one year, for example). The prediction itself must be expressed as a numeric probability so that it can be precisely scored for accuracy later. That means asserting that one is “80% confident,” rather than “fairly sure,” that the prospective employee will meet her targets.

Understand cognitive biases. Cognitive biases are widely known to skew judgment, and some have particularly pernicious effects on forecasting. They lead people to follow the crowd, to look for information that confirms their views, and to strive to prove just how right they are. It’s a tall order to debias human judgment, but the GJP has had some success in raising participants’ awareness of key biases that compromise forecasting. For example, the project trained beginners to watch out for confirmation bias that can create false confidence, and to give due weight to evidence that challenges their conclusions. And it reminded trainees to not look at problems in isolation but, rather, take what Nobel laureate Daniel Kahneman calls “the outside view.” For instance, in predicting how long a project will take to complete, trainees were counseled to first ask how long it typically takes to complete similar projects, to avoid underestimating the time needed.

Training can also help people understand the psychological factors that lead to biased probability estimates, such as the tendency to rely on flawed intuition in lieu of careful analysis. Statistical intuitions are notoriously susceptible to illusions and superstition. Stock market analysts may see patterns in the data that have no statistical basis, and sports fans often regard basketball free-throw streaks, or “hot hands,” as evidence of extraordinary new capability when in fact they’re witnessing a mirage caused by capricious variations in a small sample size.

Another technique for making people aware of the psychological biases underlying skewed estimates is to give them “confidence quizzes.” Participants are asked for range estimates about general-interest questions (such as “How old was Martin Luther King Jr. when he died?”) or company-specific ones (such as “How much federal tax did our firm pay in the past year?”). The predictors’ task is to give their best guess in the form of a range and assign a degree of confidence to it; for example, one might guess with 90% confidence that Dr. King was between 40 and 55 when he was assassinated (he was 39). The aim is to measure not participants’ domain-specific knowledge, but, rather, how well they know what they don’t know. As Will Rogers wryly noted: “It is not what we don’t know that gets us into trouble; it is what we know that ain’t so.” Participants commonly discover that half or more of their 90% confidence ranges don’t contain the true answer.

Again, there’s no one-size-fits-all remedy for avoiding these systematic errors; companies should tailor training programs to their circumstances. Susquehanna International Group, a privately held global quantitative trading firm, has its own idiosyncratic approach. Founded in 1987 by poker aficionados, the company, which transacts more than a billion dollars in trades a year, requires new hires to play lots of poker—on company time. In the process, trainees learn about cognitive traps, emotional influences such as wishful thinking, behavioral game theory, and, of course, options theory, arbitrage, and foreign exchange and trading regulations. The poker-playing exercises sensitize the trainees to the value of thinking in probability terms, focusing on information asymmetry (what the opponent might know that I don’t), learning when to fold a bad hand, and defining success not as winning each round but as making the most of the hand you are dealt.

Companies should also engage in customized training that focuses on narrower prediction domains, such as sales and R&D, or areas where past performance has been especially poor. If your sales team is prone to hubris, that bias can be systematically addressed. Such tailored programs are more challenging to develop and run than general ones, but because they are targeted, they often yield greater benefits.

collaboratively in teams. In each of the four years of the IARPA tournament, the forecasters working in teams outperformed those who worked alone. Of course, to achieve good results, teams must be deftly managed and have certain distinctive features.

Composition. The forecasters who do the best in GJP tournaments are brutally honest about the source of their success, appreciating that they may have gotten a prediction right despite (not because of) their analysis. They are cautious, humble, open-minded, analytical—and good with numbers. (See the sidebar “Who Are These Superforecasters?”) In assembling teams, companies should look for natural forecasters who show an alertness to bias, a knack for sound reasoning, and a respect for data.

It’s also important that forecasting teams be intellectually diverse. At least one member should have domain expertise (a finance professional on a budget forecasting team, for example), but nonexperts are essential too—particularly ones who won’t shy away from challenging the presumed experts. Don’t underestimate these generalists. In the GJP contests, nonexpert civilian forecasters often beat trained intelligence analysts at their own game.

How Training and Teams Improve Prediction
The Good Judgment Project tracked the accuracy of participants’ forecasts about economic and geopolitical events. The control group, made up of motivated volunteers, received no training about the biases that can plague forecasters. Its members performed at about the same level as most employees in high-quality companies—perhaps even better, since they were self-selected, competitive individuals. The second group benefited from training on biases and how to overcome them. Teams of trained individuals, who debated their forecasts (usually virtually), performed even better. When the best forecasters were culled, over successive rounds, into an elite group of superforecasters, their predictions were nearly twice as accurate as those made by untrained forecasters—representing a huge opportunity for companies.
(% more accurate than random guesses)
NO TRAINING 36%
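The confidence-quiz exercise described above can be scored mechanically: count how often participants’ 90%-confidence ranges actually contain the true answer. The sketch below is illustrative, not GJP tooling; the quiz entries (other than the Dr. King range quoted in the text) and the `coverage_rate` helper are hypothetical.

```python
# Score a "confidence quiz": each participant gives a 90%-confidence
# range for a factual question. Good calibration means about 90% of
# the ranges should contain the true answer; novices typically hit
# far fewer, revealing overconfidence.

def coverage_rate(answers):
    """answers: list of (low, high, truth) tuples.
    Returns the fraction of ranges that contain the true value."""
    hits = sum(1 for low, high, truth in answers if low <= truth <= high)
    return hits / len(answers)

# Hypothetical quiz results for one participant (low, high, truth).
# The first entry mirrors the text's example: guessing Dr. King was
# 40-55 at his death misses the true answer (39).
quiz = [
    (40, 55, 39),        # miss: true value below the range
    (1950, 1970, 1963),  # hit
    (30, 60, 75),        # miss: true value above the range
    (100, 300, 212),     # hit
]

rate = coverage_rate(quiz)
print(f"Coverage: {rate:.0%} of 90% ranges contained the truth")
```

Here the participant’s ranges cover the truth only half the time, the kind of result the article says is common, and a concrete signal of overconfidence to feed back in training.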
say there’s a 30% chance of rain, 30% of the time it rains on those days, on average. Key to their superior performance is that they receive timely, continual, and unambiguous feedback about their accuracy, which is often tied to their performance reviews. Bridge players, internal auditors, and oil geologists also shine at prediction thanks in part to robust feedback and incentives for improvement.

The purest measure for the accuracy of predictions and tracking them over time is the Brier score. It allows companies to make direct, statistically reliable comparisons among forecasters across a series of predictions. Over time, the scores reveal those who excel, be they individuals, members of a team, or entire teams competing with others. (See the sidebar “Brier Scores Reveal Your Best—and Worst—Predictors.”)

BRIER SCORES REVEAL YOUR BEST—AND WORST—PREDICTORS
It’s important that forecasters make precise estimates of probability—for example, pegging at 80% the likelihood that their firm will sell between 9,000 and 11,000 units of a new product in the first quarter. That way, the predictions can be analyzed and compared using a method called Brier scoring, allowing managers to reliably rank forecasters on the basis of skill.
Brier scores are calculated by squaring the difference between a probability prediction and the actual outcome, scored as 1 if the event happened and 0 if not. For example, if a forecaster assigns a 0.9 probability (a 90% confidence level) that the firm will exceed a sales target and the firm then does, her Brier score for that forecast is (0.9 – 1)², or 0.01. If the firm misses the target, her score is (0.9 – 0)², or 0.81. The closer to zero the score is, the smaller the forecast error and the better the prediction.
Brier scoring makes it readily apparent who’s good at forecasting and who isn’t. By enabling direct comparison among forecasters, the tool encourages thoughtful analysis while exposing “shooting from the hip” and biased prognostications.

But simply knowing a team’s score does little to improve performance; you have to track the process it used as well. It’s important to audit why outcomes were achieved—good or bad—so that you can learn from them. Some audits may reveal that certain process steps led to a good or a bad prediction. Others may show that a forecast was correct despite a faulty rationale (that is, it was lucky), or that a forecast was wrong because of unusual circumstances rather than a flawed analysis. For example, a retailer may make very accurate forecasts of how many customers will visit a store on a given day, but if a black-swan event—say, a bomb threat—closes the store, its forecast for that day will be badly off. Its Brier score would indicate poor performance, but a process audit would show that bad luck, not bad process, accounted for the outlying score.

Gauging group dynamics is also a critical part of the process audit. No amount of good data and by-the-book forecasting can overcome flawed team dynamics. Consider the discussions that took place between NASA and engineering contractor Morton Thiokol before the doomed launch of the space shuttle Challenger in 1986. At first, Thiokol engineers advised against the launch, concerned that cold temperatures could compromise the O-rings that sealed the rocket boosters’ joints. They predicted a much higher than usual chance of failure because of the temperature. Ultimately, and tragically, Thiokol reversed its stance.

The engineers’ analysis was good; the organizational process was flawed. A reconstruction of the events that day, based on congressional hearings, revealed the interwoven conditions that compromised the forecast: time pressure, directive leadership, failure to fully explore alternate views, silencing of dissenters, and a sense of infallibility (after all, 24 previous flights had gone well).

To avoid such catastrophes—and to replicate successes—companies should systematically collect real-time accounts of how their top teams make judgments, keeping records of assumptions made, data used, experts consulted, external events, and so on. Videos or transcripts of meetings can be used to analyze process; asking forecasters to record their own process may also offer important insights. Recall Susquehanna International Group, which trains its traders to play poker. Those traders are required to document their rationale for entering or exiting a trade before making a transaction. They are asked to consider key questions: What information might others have that you don’t that might affect the trade? What cognitive traps might skew your judgment on this transaction? Why do you believe the firm has an edge on this trade? Susquehanna further emphasizes the importance of process by pegging traders’ bonuses not just to the outcome of individual trades but also to whether the underlying analytic process was sound.

Well-run audits can reveal post facto whether forecasters coalesced around a bad anchor, framed the problem poorly, overlooked an important insight, or failed to engage (or even muzzled) team members with dissenting views. Likewise, they can highlight the process steps that led to good forecasts and thereby provide other teams with best practices for improving predictions.

EACH OF the methods we’ve described—training, team building, tracking, and talent spotting—is essential to good forecasting. The approach must be customized across businesses, and no firm, to our knowledge, has yet mastered them all to create a fully integrated program. This presents a great opportunity for companies that take the lead—particularly those with a culture of organizational innovation and those who embrace the kind of experimentation the intelligence community did.

But companies will capture this advantage only if respected leaders champion the effort, by broadcasting an openness to trial and error, a willingness to ruffle feathers, and a readiness to expose “what we know that ain’t so” in order to hone the firm’s predictive edge. HBR Reprint R1605E
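For readers who want to try the Brier scoring described in the sidebar, it reduces to a few lines of code. The sketch below reproduces the sidebar’s worked example (a 0.9 forecast scoring 0.01 or 0.81) and then ranks forecasters by mean score; the forecaster names and probability data are hypothetical, not from the article.

```python
# Brier score for a binary forecast: square the difference between the
# stated probability and the outcome (1 if the event happened, 0 if not).
# Lower is better; 0 is a perfect forecast.

def brier(probability, outcome):
    return (probability - outcome) ** 2

# The sidebar's worked example: a 0.9 forecast scores 0.01 if the firm
# exceeds its sales target, and 0.81 if it misses.
print(round(brier(0.9, 1), 2))  # 0.01
print(round(brier(0.9, 0), 2))  # 0.81

# Ranking forecasters, as the sidebar suggests: average each person's
# score across a series of predictions, then sort ascending so the
# best-calibrated forecaster comes first.
forecasts = {  # hypothetical (probability, outcome) pairs
    "Ana": [(0.8, 1), (0.3, 0), (0.6, 1)],
    "Ben": [(0.9, 0), (0.5, 1), (0.7, 1)],
}

def mean_brier(preds):
    return sum(brier(p, o) for p, o in preds) / len(preds)

ranking = sorted(forecasts.items(), key=lambda item: mean_brier(item[1]))
for name, preds in ranking:
    print(f"{name}: mean Brier {mean_brier(preds):.3f}")
```

Averaging over a series of forecasts, rather than judging single calls, is what makes the comparison statistically meaningful; one unlucky black-swan miss, as the article notes, should be caught by a process audit rather than counted as poor skill.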