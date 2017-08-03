There’s a new ad on the sidebar for Metaculus, a crowd-sourced prediction engine that tries to get well-calibrated forecasts on mostly scientific topics. If you’re wondering how likely it is that a genetically-engineered baby will be born in the next few years, that SpaceX will reach Mars by 2030, or that Sci-Hub will get shut down, take a look.
(there are also the usual things about politics, plus some deliberately wacky ones like whether Elon Musk will build more kilometers of tunnel than Trump does border wall)
They’re doing good work – online for-cash prediction markets are limited to a couple of bets by the government, and they usually focus on the big-ticket items like who’s going to win elections. Metaculus is run by a team affiliated with the Foundational Questions Institute, and as their name implies they’re really interested in using the power of prediction aggregators to address some of the important questions for the future of humanity – like AI, genetic engineering, and pandemics.
Which makes me wonder: what’s everyone else’s excuse?
Back when it looked like prediction markets were going to change everything (was it really as recent as two months ago?), various explanations were floated for why they hadn’t caught on yet. The government was regulating betting too much. The public was creeped out by the idea of profiting off of terrorist attacks. Or maybe people were just too dumb to understand the arguments in favor.
Now there are these crowd-sourced aggregator things. They’re not regulated by the government. Nobody’s profiting off of anything. And you don’t have to have faith that they’ll work – Philip Tetlock’s superforecasting experiments proved it, and Metaculus tracks its accuracy over time. I know the intelligence services are working with the Good Judgment Project, but I’m still surprised it hasn’t gone further.
Robin Hanson is the acknowledged expert on this, thanks to his long career of trying to get institutions to adopt prediction markets and generally failing. He attributes the whole problem to signaling and prestige, which I guess makes as much sense as anything. Tyler Cowen says something similar here. But I’m still surprised there aren’t consultant superforecaster think tanks hanging up their shingles. Forget prestige – aren’t there investors who would pay for their wisdom? And why can’t I hire Philip Tetlock to tell me whether my relationship is going to last?
I asked Prof. Aguirre of Metaculus, and he said (slightly edited for flow):
I don’t think Tetlock has any “secret sauce”, though I think he did a good job. The Metaculus track record is pretty good and will continue to improve. There’s definitely real predictive power. Our main challenge is that all the personnel involved are very time-limited and we’re also operating on a shoestring, probably 1/50th of what Tetlock spent of IARPA’s money.
If you are an individual company or investor, you don’t really get that much from “crowdsourcing” because you don’t really have a crowd (unless you’re a big business and can force your employees to make the predictions); so I’d guess most companies probably just fall back on asking some group of people to get together and make projections etc. My personal view is that the power really comes when you get a *lot* of people, questions, and data; you can then start to leverage that to improve the calibration (by recalibrating based on past accuracy), identify the really good predictors, and build up a large enough corpus of results that the predicted probabilities become grounded in an actual ensemble — the ensemble of questions on the platform.
Relatedly, the ability to make use of probabilistic predictions is, sadly, confined to a rather small fraction of people. I think typical decision-makers want to know what *will* happen, and 60-40 or 75-25 or even 80-20 is not the certainty they want. In the face of this uncertainty, I think people mentally substitute a different question to which they feel that they have some sort of good instinct, or privileged understanding. I think there’s also an element where they just don’t really *believe* the numbers because they are sometimes “wrong.” This sometimes frustrates me, but is not wholly irrational, I would say, because in the absence of a real grounding of the probabilities, if you have some analyst come and tell you that there’s a 70-30 chance, what exactly do you do with that? How would you possibly trust that those are the right numbers if the question is something “squishy” without a solid mathematical model?
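The "recalibrating based on past accuracy" step mentioned in the quote can be pictured with a small sketch. This is only an illustration under assumptions: it uses simple probability bins, and the names `fit_recalibration` and `recalibrate` are made up here. It is not a description of how Metaculus actually does it.

```python
from collections import defaultdict

def fit_recalibration(history, n_bins=10):
    """Build a lookup table from past forecasts (hypothetical helper).

    history: list of (stated_probability, outcome) pairs, outcome is 0 or 1.
    Returns {bin_index: observed frequency of outcome 1 in that bin}.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for p, outcome in history:
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        hits[b] += outcome
        counts[b] += 1
    return {b: hits[b] / counts[b] for b in counts}

def recalibrate(p, table, n_bins=10):
    """Replace a stated probability with the frequency observed in its bin.

    Falls back to the stated probability when the bin has no history.
    """
    b = min(int(p * n_bins), n_bins - 1)
    return table.get(b, p)

# Toy history: forecasts of 95% that came true only 3 times out of 4.
history = [(0.95, 1), (0.95, 1), (0.95, 1), (0.95, 0)]
table = fit_recalibration(history)
adjusted = recalibrate(0.95, table)    # pulled down toward the track record
untouched = recalibrate(0.55, table)   # no history in this bin, left as stated
```

With only a handful of resolved questions the bins are mostly empty or noisy, which is one way to see why the approach needs a lot of people, questions, and data before it pays off.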
I wonder if there’s data about how accuracy changes with number of predictors and predictor quality. There are so many smart people around who are interested in probability and willing to answer a few questions that this seems like a really stupid bottleneck. I’m happy I can do my part pushing Metaculus, but someone seriously needs to find a way to make this scale.
I wonder if allowing comments on the site will affect its accuracy due to a recency bias invoked by the top comments.
+1 Meta points to you, sir!
Also to answer your question, most likely yes. While debating the subject and adding a variety of views may in fact increase overall accuracy, the recency bias effect caused by the limitations of discussion boards is very likely to produce a larger magnitude effect in the opposite direction. The benefits of discussion are much diminished if people do not take in the whole of the discussion.
I use the site frequently. Based on my experience, this is a valid concern, though I’m not sure that the recency bias is as much of an issue as the lack of comments.
Also, there’s going to be an even more significant bias from whoever is the first to submit a prediction.
I’m kind of annoyed that it requires a lot of covering the screen up with my hand to avoid seeing everyone else’s prediction before I formulate mine. I wonder if anyone has tested whether making everyone predict independently would be better.
Metaculus has done a few questions in the past that have hidden the predictions, but those were mainly meta questions (e.g. “what will be 2/3 of the average response to this question?”), so they have the technology but haven’t really done the studies.
I think someone in Tetlock’s team has done this experiment on GJOpen, though that site does not display comments + forecasts on the same screen as the forecast. (Instead you have to scroll down.) I’ll ask around and get back to you on the result.
Meanwhile, back in 1977 someone is wondering why more businesses are not adopting personal microcomputers given their obvious productivity benefits. Technology takes time to disseminate, so the question you need to be asking is not why this specific technology is taking time to disseminate, but why it takes time in general. Likely it’s some complicated mix of signalling, prestige, institutional inertia, and a general feeling that these methods are still unproven. The last one is very important because nobody wants to be the fool who advocated for Hot New Thing over Proven Old Method only to have it blow up in their face. This is a legitimate concern given the large number of Hot New Things that sputter out and amount to nothing.
Also few people have even heard of Philip Tetlock’s super-forecasting experiments, and fewer still are going to be convinced by them. Knowledge also takes time to disseminate, and people are notoriously hard to convince. Superforecasting was published in 2015, for crying out loud; this is like being shocked that businessmen were still using dumb phones in 2009. If prediction aggregation remains obscure in the next Presidential election, then you can start to wonder if there’s something strange going on. As it is right now, this is normal.
Oh! It occurs to me that my examples above are physical devices, while prediction aggregation is a technique. Consider instead poll aggregation. The very first poll aggregator was Real Clear Politics in 2002, but poll aggregation didn’t become widely known until the 2008 election at the earliest, and I wouldn’t say it was widespread until 2012. That’s a full decade. Again it’s normal for things to take time to spread.
Here’s my excuse. I eagerly hopped on the bandwagon of forecasting and prediction markets, and tried talking to lots of rationalists and effective altruists about it over the last couple years. It’s one of those things that wasn’t taken up as an idea to actually do something about by many in the rationality community besides Robin Hanson. You, Scott, have blogged about it too. Frankly, I expect it wasn’t being said by the right people in the community in the right way to be taken seriously. The rationalist diaspora is hard to read these days, and to the extent the rationality community runs on a heuristic of concluding the next big idea must be what everyone else in the community is currently talking about, it’s hard to tell why some memes get taken more seriously than others.
There was a time last year when the rationality community’s interest in Kegan levels spiked for a while. I wish I knew how to get people excited about prediction markets and forecasting the way they get excited about Kegan levels. Anyway, I gave up on trying to get people to care about forecasting and prediction markets, as I wasn’t well-placed to do something all by myself, and I was hoping Metaculus would take off. Maybe that’s happening now. I’ll believe people will stop making excuses for not doing anything about forecasting when I see it, though.
I’ve been trying to think what kind of prediction market/aggregator might catch on more in general. I think you might be able to get one to snowball if it had the following features:
1. Anybody could post a question to be predicted
2. Predictions are scored in some kind of market-like way, where the successful people’s winnings come out of the pocket of the unsuccessful people, and you get more credit for getting counterintuitive things (as measured in everyone else disagreeing with you) than obvious things.
PredictionBook has 1 but not 2. It had a great economy of people putting in silly things like “My relationship is going to last more than a year” or “There will be a twist ending to the latest Unsong chapter”, but there wasn’t a meaningful way to aggregate a person’s record across all of their predictions. If you wanted to look like a “good” predictor, you could just add “the sun will come up tomorrow”, predict 100%, and look great. Without knowing who the best predictors were, you couldn’t do any post-hoc adjustments to make things like superforecasting possible.
Metaculus has 2 but not 1. I wouldn’t be surprised if it gets good answers to the things it’s asking, but it’s never going to be the same kind of fun tool for little real-life questions that PredictionBook is, and it probably doesn’t have the same potential to encourage community participation.
Is there anything with both 1 and 2? Does combining these have as much potential as I think?
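To make feature 2 a bit more concrete, here is one way such scoring could work, as a rough sketch rather than anything PredictionBook or Metaculus actually implements. The assumption is that each person's payout on a question is their log score minus the average log score of everyone who predicted it. The function name `zero_sum_payouts` and the clamping constant `eps` are invented for the example.

```python
import math

def zero_sum_payouts(probs, outcome, eps=1e-6):
    """Score one resolved yes/no question (hypothetical scheme).

    probs: {name: probability assigned to 'yes'}
    outcome: True if the question resolved 'yes', else False
    Returns {name: own log score minus the group's mean log score}.
    """
    def log_score(p):
        p = min(max(p, eps), 1 - eps)  # clamp so 0% or 100% cannot give -inf
        return math.log(p if outcome else 1 - p)

    scores = {name: log_score(p) for name, p in probs.items()}
    avg = sum(scores.values()) / len(scores)
    return {name: s - avg for name, s in scores.items()}

# "The sun will come up tomorrow": everyone agrees, so nobody gains anything.
obvious = zero_sum_payouts({"alice": 0.99, "bob": 0.99, "carol": 0.99}, True)

# A question where alice disagreed with the crowd and turned out to be right.
contrarian = zero_sum_payouts({"alice": 0.8, "bob": 0.2, "carol": 0.2}, True)
```

Because the average is subtracted, the payouts on every question sum to zero, so winnings come out of the losers' pockets, and padding your record with sure things earns essentially nothing when everyone else sees them as sure things too.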
Obviously, we need to get a superforecaster on the job of predicting which kind of prediction aggregator is most likely to catch on.
Maybe a model like steemit where people are paid in a cryptocurrency for posting questions, voting on questions, and making correct predictions, and where this cryptocurrency derives its value from giving holders extra voting power.
I think you might be underestimating the difficulty of implementing 1 while keeping the tournament fair. Phrasing and resolving questions takes a lot of effort on the part of the organizers, and I think it would take at least two clever ideas to find a way to automate away this labor to the point where everyone can post questions (or at least a lot more investment into hiring people to oversee tournaments and markets).
Though note that the people at Augur and Steemit, among others, think they’ve basically solved the problem, so maybe I’m grossly miscalibrated.
I’m worried the cryptocurrency aspect of Augur makes it too high-barrier-to-entry compared to a meaningless point system like Metaculus. And maybe I’m missing something, but Steemit doesn’t seem to be a prediction market at all.
You’re right on Steemit not being a prediction market. Whoops. My bad.
I think it would still be interesting to see what Augur’s solution to question resolution is, and to see if it can be salvaged for a point-based system.
Augur is a prediction market on Ethereum. Anyone can pose questions and anyone can bet. It’s decentralized, so it can’t (easily) be shut down by a government. It’s currently in beta with a release expected this year, I think. You can already use it with toy money.
Yeah, I’m familiar with what they claim to be able to do, though I haven’t had the time to look into how they resolve the question resolution problem.
FWIW, Augur is a ripoff of http://bitcoinhivemind.com/. Bitcoin hivemind is technically superior and made by less sketchy people.
Thanks!
Didn’t Robin Hanson claim that a fair number of companies had tried internal prediction markets, found that they worked, and abandoned them?
Yes, I think a lot of binary questions are bad for those reasons. But I think the typical commercial applications are for continuous variables, like ship dates or sales forecasts, which should be easier to understand.
So my boyfriend posited an interesting idea: If prediction markets work, then they would diminish the value and prestige of company executives, who are the very people who will decide whether or not to implement them. Now in a free market one supposes that eventually some company will care more about the value of accurate prediction than about the prestige of its executives, and then use the competitive advantage to attain market dominance, forcing everyone else to catch up. However, until that happens, it’s unlikely that executives will implement measures that they feel threaten their positions.
Which I guess is what Robin Hanson was getting at.
That’s an explanation of why they don’t adopt it in the first place, but I don’t think it makes sense as an explanation of why they abandon it. Do they learn something using it? Does it diminish their prestige more than they expected?
Most likely they thought that these markets would be useful tools that would enhance their abilities as decision makers. When it became evident that they could in fact replace them as decision makers, at least in some domains, they did not seem quite as appealing. So they discontinued them on spurious but plausible sounding reasons.
This probably did not happen at a conscious level. People are very good at rationalizing reasons for why things that are in their self-interest are in fact in everyone’s interest. They are also very good at convincing themselves this is the case, because the most effective lie is one you believe in.
I remember hearing that Google cancelled their prediction markets at least in part due to OPSEC reasons. After all, you didn’t want to make so much information on project dates and new programs available to everyone inside of the company, many of whom might be tempted to leak it.
I’ve also heard from Tetlock that managers have found it difficult to motivate people in the presence of prediction tournaments – either it was a sure thing, in which case why bother, or it was sure to fail, in which case also why bother. The tournaments created self-fulfilling prophecies, in other words, and I’m sure you can imagine how you wouldn’t want those in your company.
Maybe knowing what will happen is easier than executing correctly on that knowledge. How many events truly blindside people, especially people in the relevant industry?
I’m kind of reminded of Microsoft in the 1990s. They knew the Internet would be important, and they took steps that they thought would help them, but it didn’t work out. Was the problem lack of foreknowledge, or inability to execute on that foreknowledge?
I think that’s what the prediction markets are missing, someone who conclusively demonstrates how to leverage that knowledge.
Isn’t this just a prediction subproblem? “We know the Internet will be big, but we don’t know whether to support HTTP or Gopher”. Sounds like a problem a prediction market could help with. “Okay, it’s HTTP, but we don’t know whether paid subscriptions or ads will be more monetizable”. Well…
I think the Microsoft problem is typical, and it is only survivor bias that makes it look otherwise.
Saying that Robin Hanson attributes something to signaling and prestige is kind of like saying that Joseph McCarthy suspects the influence of Communism.
Mostly true, but I think this was the thing that started him being interested in signaling, so it escapes that problem.
I find Hanson’s and Tetlock’s work on prediction markets and aggregators very exciting, but have never participated in one, and might be representative of the marginal user that Metaculus would try to attract.
I think the biggest piece missing for me is incentives, either financial or reputational along the lines of a StackOverflow points system or GitHub commit history.
Making good predictions takes a lot of time. I need a good reason to justify spending that time. If not significant financial returns, at least some badge that lets me credibly signal “I am an unusually good predictor of the future for hard, important questions in X domain” to employers, collaborators, or internet people I get into arguments with.
But without a prediction market, figuring out what questions merit the “hard, important” criteria is a central-planning-complete problem. You can design criteria like “well calibrated” or “correct 60% of the time against consensus on questions with at least 10 other people predicting” but these will be incommensurable between people who answer different sets of questions.
It’s interesting, because both Metaculus and GJOpen (which grew out of Tetlock’s work on IARPA) have reputational mechanisms. Metaculus has a cumulative score, while GJOpen tracks your average Brier score and how much better you do than the average forecaster who forecasted on the same questions on the same days you did. Clearly these are not perfect scoring mechanisms and can be gamed, but I think it’s a start.
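For anyone who hasn't seen these scores before, here is a minimal sketch of the two ideas: a Brier score, and a score relative to the other forecasters on the same questions. It uses the single-probability binary form of the Brier score (0 is perfect, 1 is worst), and the function names and toy data are made up for illustration. The sites' exact formulas, including GJOpen's per-day averaging, are not reproduced here and may differ.

```python
from statistics import mean

def brier(prob, outcome):
    """Squared error between a forecast probability and the 0/1 outcome."""
    return (prob - outcome) ** 2

def mean_brier(forecasts):
    """forecasts: list of (probability, outcome) pairs. Lower is better."""
    return mean(brier(p, o) for p, o in forecasts)

def relative_brier(mine, crowd):
    """Compare one forecaster with the crowd, question by question.

    mine:  {question_id: (my_probability, outcome)}
    crowd: {question_id: [other forecasters' probabilities]}
    Returns the mean of (my Brier - average crowd Brier); negative is better.
    """
    diffs = []
    for qid, (p, outcome) in mine.items():
        crowd_score = mean(brier(cp, outcome) for cp in crowd[qid])
        diffs.append(brier(p, outcome) - crowd_score)
    return mean(diffs)

# Toy data: two resolved questions, two other forecasters on each.
mine = {"q1": (0.9, 1), "q2": (0.2, 0)}
crowd = {"q1": [0.6, 0.7], "q2": [0.5, 0.3]}

my_average = mean_brier(list(mine.values()))
my_edge = relative_brier(mine, crowd)  # negative: better than the crowd
```

The relative version only compares people on questions they both answered, which partly addresses the commensurability worry above, though it still says nothing about which questions were hard or important.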
New Zealand had a pretty good set up for a few years until it was banned: https://en.wikipedia.org/wiki/IPredict
There are, or at least there are superforecasters trying. Good Judgment Inc is kind of a superforecaster-run consulting firm.
In addition, several superforecasters are involved in finance, and I believe at least one runs their own hedge fund. It’s also not clear how super superforecasters are – my impression is that they’re merely smart people who were placed in an environment where they were incentivized to be accurate, and that many hedge funds and prop shops already internally implement a lot of the guidelines that are suggested in Superforecasting.
I think it’s true that prediction tournaments are better than not aggregating information, or aggregating information by passing around lengthy internal memos. (Certainly better than passing around dubiously-sourced articles on social media. And prediction markets, or tournaments with smarter aggregation rules, should be significantly better than modern tournaments, at least in theory.) But I think one possible contributor for why prediction tournaments have not caught on is simply that current techniques don’t work that well – their superiority over getting together a team of smart people with some sane incentives is not sufficient (or at least not clearly sufficient) to overcome their novelty and prestige issues.
Damn, a fourth article in two days?
I think moving to Ward Street has made Scott more productive. Or perhaps he hasn’t started working full time yet?
Creepy. A world of rationalists and autists using prediction markets to optimize their lives for a crowdsourced utility function which perfectly fits the aggregate but never an individual.
The focus on AI risk in the rationalist community makes perfect sense, actually, but it should be directed inwards.
I know your comment is at least partially in jest, but there are people at MIRI working on examining theoretical models of prediction markets…
30% chance of Trump being out of office by February 2019!? That does not seem remotely realistic.
Which way? Is it too low or too high?
Way too high (sadly).
Mueller’s investigation seems to have teeth, but a year and a half is just not enough time for a complicated fraud/collusion investigation to pan out, let alone be tried and processed through a friendly Congress. Trump might resign if some unambiguously incriminating bombshell exists and it’s low-hanging enough for Mueller to find in a year and a half, but that is nowhere near a double digit chance.
Assassination is unlikely, certainly not a double digit chance. Health issues might be 4-5% according to this table, if even that; that table is from the social security population, which doesn’t have access to the best health care the US can provide on prompt notice.
That’s enough risk for an optimistic 10%, but 30% is plain ludicrous.
I think there’s a fair chance he could get fed up and flounce out. Am I alone?
Most businesses don’t even have staff suggestion boxes. Bottom-up information is a very under-implemented area.