Philip Tetlock, author of Superforecasting, got famous by studying prediction. His first major experiment, the Expert Political Judgment experiment, is frequently cited as showing that top pundits’ predictions are no more accurate than a chimp throwing darts at a list of possibilities, although Tetlock takes great pains to confess to us that no chimps were actually involved, and this phrasing just sort of popped up as a flashier way of saying “random”.
Although this was generally true, he was able to distinguish a small subset of people who did a little better than chance. His investigation into the secrets of their very moderate success led to his famous “fox” versus “hedgehog” dichotomy, based on the saying that “the fox knows many things, the hedgehog knows one big thing”. Hedgehog pundits/experts are people who operate off a single big idea: for example, an economist who says that government intervention is always bad, predicts doom for any interventionist policy, and predicts great success for any noninterventionist one. Foxes are people who don’t have much of a narrative or ideology, but try to find the right perspective to approach each individual problem. Tetlock found that the hedgehogs did worse than the chimp and the foxes did a little better.
Cut to the late 2000s. The US intelligence community has just been seriously embarrassed by its disastrous declaration that there were weapons of mass destruction in Iraq. They set up the Intelligence Advanced Research Projects Activity (IARPA) to try crazy things and see if any of them worked. IARPA approached a bunch of scientists, handed them a list of important world events that might or might not happen, and told them to create some teams and systems for themselves and compete against each other to see who could predict them the best.
Tetlock was one of these scientists, and his entry into the competition was called the Good Judgment Project. The plan was simple: get a bunch of people to sign up and try to predict things, then find the ones who did the best. This worked pretty well. 2,800 people showed up, and a few of them turned out to be…
…okay, now we’re getting to a part I don’t understand. When I read Tetlock’s paper, all he says is that he took the top sixty forecasters, declared them superforecasters, and then studied them intensively. That’s fine; I’d love to know what puts someone in the top 2% of forecasters. But it’s important not to phrase this as “Philip Tetlock discovered that 2% of people are superforecasters”. This suggests a discontinuity, a natural division into two groups. But unless I’m missing something, there’s no evidence for this. Two percent of forecasters were in the top two percent. Then Tetlock named them “superforecasters”. We can discuss what skills help people make it this high, but we probably shouldn’t think of it as a specific phenomenon.
Anyway, the Good Judgment Project then put these superforecasters on teams with other superforecasters, averaged their forecasts, slightly increased the final confidence levels (to represent the fact that it was 60 separate people, all of whom were that confident), and presented that to IARPA as their final answer. Not only did they beat all the other groups in IARPA’s challenge in a landslide, but they actually did 30% better than professional CIA analysts working off classified information.
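The “slightly increased the final confidence levels” step is usually called extremizing. Here is a minimal Python sketch, assuming one common formulation from the aggregation literature: average the team’s probabilities, convert the average to log-odds, and multiply by a factor a greater than 1. The function name and the default a = 2.5 are hypothetical choices for illustration, not the Good Judgment Project’s actual parameters.

```python
import math


def extremized_mean(probs, a=2.5):
    """Average a team's probability forecasts, then push the result away from 50%.

    Scaling the log-odds by a > 1 is one common way to extremize; the default
    a = 2.5 is a hypothetical value chosen for illustration only.
    """
    p = sum(probs) / len(probs)  # plain average of the team's forecasts
    if p <= 0.0 or p >= 1.0:
        return p  # log-odds are undefined at exactly 0 or 1
    log_odds = math.log(p / (1 - p))
    return 1 / (1 + math.exp(-a * log_odds))


# Three forecasters at 70%, 75% and 80% average to 75%; extremizing moves that to about 94%.
print(round(extremized_mean([0.70, 0.75, 0.80]), 2))  # → 0.94
```

With a = 1 the function just returns the plain average. The usual rationale for a > 1 is that each forecaster holds some independent information, so a plain average of their answers ends up underconfident.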
Having established that this is all pretty neat, Tetlock turns to figuring out how superforecasters are so successful.
First of all, is it just luck? After all, if a thousand chimps throw darts at a list of stocks, one of them will hit the next Google, after which we can declare it a “superchimp”. Is that what’s going on here? No. Superforecasters one year tended to remain superforecasters the next. The year-to-year correlation in who was most accurate was 0.65; about 70% of superforecasters in the first year remained superforecasters in the second. This is definitely a real thing.
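To see why year-to-year persistence argues against the superchimp explanation, here is a small Python simulation. The 2,800 forecasters and top-60 cutoff come from the numbers above; the normally distributed skill and luck, and the skill_weight knob, are assumptions made up for illustration, so the output will not reproduce the GJP’s 0.65 correlation or 70% retention.

```python
import random


def top_k(scores, k):
    """Indices of the k highest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def retention(skill_weight, n=2800, k=60, seed=0):
    """Fraction of year-1 top-k forecasters who are top-k again in year 2.

    Each yearly score is skill_weight * (fixed personal skill) + fresh luck.
    skill_weight = 0 is the pure-luck "superchimp" world.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(0, 1) for _ in range(n)]
    year1 = [skill_weight * s + rng.gauss(0, 1) for s in skill]
    year2 = [skill_weight * s + rng.gauss(0, 1) for s in skill]
    return len(top_k(year1, k) & top_k(year2, k)) / k


print(retention(0.0))  # pure luck: expect roughly k/n, about 2%
print(retention(3.0))  # skill dominates luck: a much larger share repeats
```

With pure luck, the overlap between the two years’ top 60 hovers around 60/2,800, roughly 2%; once stable skill matters, far more of the same people repeat.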
Are superforecasters just really smart? Well, sort of. The superforecasters whom Tetlock profiles in his book include a Harvard physics PhD who speaks 6 languages, an assistant math professor at Cornell, a retired IBM programmer and data wonk, et cetera. But the average superforecaster is only at the 80th percentile for IQ – about 113. And there are a lot of people who are very smart but not very good at predicting. So while IQ definitely helps, it isn’t the whole story.
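For reference, here is the percentile-to-IQ conversion in Python, assuming the conventional IQ scale (normally distributed, mean 100, standard deviation 15):

```python
from statistics import NormalDist

# Assumes the conventional IQ scale: normally distributed, mean 100, SD 15.
iq_scale = NormalDist(mu=100, sigma=15)
print(round(iq_scale.inv_cdf(0.80), 1))  # IQ at the 80th percentile → 112.6
```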
Are superforecasters just really well-informed about the world? Again, sort of. The correlation between well-informedness and accuracy was about the same as the correlation between IQ and accuracy. None of them are remarkable for spending every single moment behind a newspaper, and none of them had as much data available as the CIA analysts with access to top secret information. Even when they made decisions based on limited information, they still beat other forecasters. Once again, this definitely helps, but it’s not the whole story.
Are superforecasters just really good at math? Again, kind of. A lot of them are math PhDs or math professors. But they all tend to say that they don’t do explicit calculations when forecasting. And some of them don’t have any kind of formal math background at all. The correlation between math skills and accuracy was about the same as all the other correlations.
So what are they really good at? Tetlock concludes that the most important factor in being a superforecaster is really understanding logic and probability.
Part of it is just understanding the basics. Superforecasters are less likely to think in terms of things being 100% certain, and – let’s remember just how far left the bell curve stretches – less likely to assign anything they’re not sure about a 50-50 probability. They’re less likely to believe that things happen because they’re fated to happen, or that the good guys always win, or that things that happen will necessarily teach a moral lesson. They’re more likely to admit they might be wrong and correct themselves after an error is discovered. They’re more likely to debate with themselves, challenge their original perception, and start asking “What could be wrong about this thing I believe?” rather than “How can I prove I’m right?”
But they’re also more comfortable actively using probabilities. Like my predictions, the Good Judgment Project made forecasters give their answers as numerical probability estimates – for example, 15% chance of a war between North and South Korea in the next ten years killing > 1,000 people. Poor forecasters tend to make a gut decision based on feelings that are superficially related to the question, like “Well, North Korea is pretty crazy, so they’re pretty likely to declare war, let’s say 90%” or “War is pretty rare these days, how about 10%?”. Superforecasters tend to focus on the specific problem in front of them and reason about it systematically. For example, they might start with the Outside View – it’s been more than 60 years since the Koreas last fought, so their war probability per decade shouldn’t be more than about 20% – and then adjust that based on Inside View information – “North Korea has a lot fewer foreign allies these days, so they’re less likely to start something than they once were – maybe 15%”.
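The Outside View / Inside View move can be written as a base rate followed by an update on the odds scale, which is the same “updating in proportion to the weight of the evidence” idea Tetlock attributes to Bayes. A minimal Python sketch: the base rate is a crude one-war-in-roughly-six-decades estimate (the Korean War armistice was signed in 1953), and the likelihood ratio of 0.85 for “fewer foreign allies” is a made-up number chosen only to show the mechanics.

```python
def base_rate_per_decade(events, decades):
    """Outside view: historical frequency of the event per decade."""
    return events / decades


def adjust(p, likelihood_ratio):
    """Inside view: rescale the odds of p by a likelihood ratio, then convert back."""
    odds = p / (1 - p) * likelihood_ratio
    return odds / (1 + odds)


prior = base_rate_per_decade(events=1, decades=6)  # crude: one war in roughly six decades
posterior = adjust(prior, likelihood_ratio=0.85)   # hypothetical discount for "fewer foreign allies"
print(f"{prior:.0%} -> {posterior:.0%}")           # prints 17% -> 15%
```

Working on the odds scale keeps the adjusted answer between 0% and 100% no matter how strong the evidence is, which a plain “subtract five points” adjustment does not guarantee.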
Or they might break the problem down into pieces: “There would have to be some sort of international incident, and then that incident would have to erupt into total war, and then that war would have to kill > 1,000 people. There are about two international incidents between the Koreas every year, but almost none of them end in war; on the other hand, because of all the artillery aimed at Seoul, probably any war that did happen would have an almost 100% chance of killing > 1,000 people” … and so on. One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, and so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious: Tetlock found that when you rounded their forecasts off to the nearest 5 (or 10, or whatever), their accuracy actually decreased significantly. That extra 2% is actually doing good work.
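Accuracy in the tournament was measured with Brier scores, which penalize the squared gap between the stated probability and what actually happened. Below is a minimal Python sketch of a simplified, one-outcome form of the score (mean squared error against 1 for “happened” and 0 for “didn’t”, lower is better; the version in Tetlock’s book counts both outcomes and so runs from 0 to 2, double this one for yes/no questions), plus a hypothetical round_to helper that coarsens forecasts to the nearest 5 points. The forecasts and outcomes are made-up toy data: they show the mechanics of the rounding test, not Tetlock’s result.

```python
def brier(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def round_to(p, step=0.05):
    """Round a probability to the nearest multiple of step, e.g. 0.82 -> 0.8."""
    return round(round(p / step) * step, 10)  # outer round trims float noise


forecasts = [0.82, 0.13, 0.67]  # toy fine-grained forecasts
outcomes = [1, 0, 1]            # toy results: 1 = happened, 0 = didn't
coarse = [round_to(p) for p in forecasts]
print(coarse, round(brier(forecasts, outcomes), 4), round(brier(coarse, outcomes), 4))
```

On this toy data the rounded forecasts score worse; Tetlock’s finding is that the same held, on average, across superforecasters’ real forecasts.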
Most interestingly, they seem to be partly immune to cognitive bias. The strongest predictor of forecasting ability (okay, fine, not by much, it was pretty much the same as IQ and well-informedness and all that – but it was a predictor) was the Cognitive Reflection Test, which includes three questions with answers that are simple, obvious, and wrong. The test seems to measure whether people take a second to step back from their System 1 judgments and analyze them critically. Superforecasters seem especially good at this.
Tetlock collaborated with Daniel Kahneman on an experiment to elicit scope insensitivity in forecasters. Remember, scope insensitivity is where you give a number-independent answer to a numerical question. For example, how much should an organization pay to save the lives of 100 endangered birds? Ask a hundred people, and maybe the average answer is “$10,000”. Ask a different group of a hundred people how much the same organization should pay to save the lives of 1,000 endangered birds, and maybe the average answer will still be $10,000. So the implied value of each bird’s life drops tenfold, from $100 to $10, just because the number in the question changed. Poor forecasters do the same thing on their predictions. For example, a hundred poor forecasters might on average predict a 15% chance of war in Korea in the next five years, and a different group of a hundred poor forecasters might on average predict a 15% chance of war in Korea in the next fifteen years. They’re ignoring the question and just going off of a vague feeling of how likely another Korean war seems. Superforecasters, in contrast, showed much less scope insensitivity, and their probability of a war in five years was appropriately lower than their probability of a war in fifteen.
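The consistency check the superforecasters passed can be made concrete. If you assume a constant, independent chance of war each year (a simplifying assumption, not something from the study), a 15% five-year forecast implies a specific fifteen-year forecast. A minimal Python sketch, with a hypothetical helper name:

```python
def rescale_horizon(p, years, new_years):
    """Convert a probability of at least one event within `years` into the
    probability for `new_years`, assuming a constant, independent yearly risk."""
    yearly_survival = (1 - p) ** (1 / years)  # chance of no war in any one year
    return 1 - yearly_survival ** new_years


print(round(rescale_horizon(0.15, years=5, new_years=15), 3))  # → 0.386
```

Even without the constant-risk assumption, the fifteen-year probability can never be lower than the five-year one, since any war in the next five years is also a war in the next fifteen.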
Maybe all this stuff about probability calibration, inside vs. outside view, willingness to change your mind, and fighting cognitive biases is starting to sound familiar? Yeah, this is pretty much the same stuff as in the Less Wrong Sequences and a lot of CFAR work. They’re both drawing from the same tradition of cognitive science and rationality studies.
So as I said before, Superforecasting is not necessarily too useful for people who are already familiar with the cognitive science/rationality tradition, but great for people who need a high-status and official-looking book to justify it. The next time some random person from a terrible forum says that everything we’re doing is stupid, I’m already looking forward to pulling out Tetlock quotes like:
The superforecasters are a numerate bunch: many know about Bayes’ theorem and could deploy it if they felt it was worth the trouble. But they rarely crunch the numbers so explicitly. What matters far more to the superforecasters than Bayes’ theorem is Bayes’ core insight of gradually getting closer to the truth by constantly updating in proportion to the weight of the evidence. That’s true of Tim Minto [the top superforecaster]. He knows Bayes’ theorem, but he didn’t use it even once to make his hundreds of updated forecasts. And yet Minto appreciates the Bayesian spirit. “I think it is likely that I have a better intuitive grasp of Bayes’ theorem than most people,” he said, “even though if you asked me to write it down from memory I’d probably fail.” Minto is a Bayesian who does not use Bayes’ theorem. That paradoxical description applies to most superforecasters.
And if you’re interested, there’s a current version of the Good Judgment Project running that you can sign up for to see whether you’re a superforecaster.
EDIT: A lot of people have asked the same question: am I being too dismissive? Isn’t it really important to have this book as evidence that these techniques work? Yes. It is important that the Good Judgment Project exists. But you might not want to read a three-hundred-page book that explains lots of stuff like “Here’s what a cognitive bias is” just to hear that things work. If you already know what the techniques are, it might be quicker to read a study or a popular news article on GJP or something.