Why Not More Excitement About Prediction Aggregation?

There’s a new ad on the sidebar for Metaculus, a crowd-sourced prediction engine that tries to get well-calibrated forecasts on mostly scientific topics. If you’re wondering how likely it is that a genetically-engineered baby will be born in the next few years, that SpaceX will reach Mars by 2030, or that Sci-Hub will get shut down, take a look.

(there are also the usual things about politics, plus some deliberately wacky ones like whether Elon Musk will build more kilometers of tunnel than Trump does border wall)

They’re doing good work – online for-cash prediction markets are limited by the government to a couple of bets, and they usually focus on the big-ticket items like who’s going to win elections. Metaculus is run by a team affiliated with the Foundational Questions Institute, and as their name implies they’re really interested in using the power of prediction aggregators to address some of the important questions for the future of humanity – like AI, genetic engineering, and pandemics.

Which makes me wonder: what’s everyone else’s excuse?

Back when it looked like prediction markets were going to change everything (was it really as recent as two months ago?), various explanations were floated for why they hadn’t caught on yet. The government was regulating betting too much. The public was creeped out by the idea of profiting off of terrorist attacks. Or maybe people were just too dumb to understand the arguments in favor.

Now there are these crowd-sourced aggregator things. They’re not regulated by the government. Nobody’s profiting off of anything. And you don’t have to have faith that they’ll work – Philip Tetlock’s superforecasting experiments proved it, and Metaculus tracks its accuracy over time. I know the intelligence services are working with the Good Judgment Project, but I’m still surprised it hasn’t gone further.

Robin Hanson is the acknowledged expert on this, thanks to his long career of trying to get institutions to adopt prediction markets and generally failing. He attributes the whole problem to signaling and prestige, which I guess makes as much sense as anything. Tyler Cowen says something similar here. But I’m still surprised there aren’t consultant superforecaster think tanks hanging up their shingles. Forget prestige – aren’t there investors who would pay for their wisdom? And why can’t I hire Philip Tetlock to tell me whether my relationship is going to last?

I asked Prof. Aguirre of Metaculus, and he said (slightly edited for flow):

I don’t think Tetlock has any “secret sauce”, though I think he did a good job. The Metaculus track record is pretty good and will continue to improve. There’s definitely real predictive power. Our main challenge is that all the personnel involved are very time-limited and we’re also operating on a shoestring, probably 1/50th of what Tetlock spent of IARPA’s money.

If you are an individual company or investor, you don’t really get that much from “crowdsourcing” because you don’t really have a crowd (unless you’re a big business and can force your employees to make the predictions); so I’d guess most companies probably just fall back on asking some group of people to get together and make projections etc. My personal view is that the power really comes when you get a *lot* of people, questions, and data; you can then start to leverage that to improve the calibration (by recalibrating based on past accuracy), identify the really good predictors, and build up a large enough corpus of results that the predicted probabilities become grounded in an actual ensemble — the ensemble of questions on the platform.

Relatedly, the ability to make use of probabilistic predictions is, sadly, confined to a rather small fraction of people. I think typical decision-makers want to know what *will* happen, and 60-40 or 75-25 or even 80-20 is not the certainty they want. In the face of this uncertainty, I think people mentally substitute a different question to which they feel that they have some sort of good instinct, or privileged understanding. I think there’s also an element where they just don’t really *believe* the numbers because they are sometimes “wrong.” This sometimes frustrates me, but is not wholly irrational, I would say, because in the absence of a real grounding of the probabilities, if you have some analyst come and tell you that there’s a 70-30 chance, what exactly do you do with that? How would you possibly trust that those are the right numbers if the question is something “squishy” without a solid mathematical model?

I wonder if there’s data about how accuracy changes with number of predictors and predictor quality. There are so many smart people around who are interested in probability and willing to answer a few questions that this seems like a really stupid bottleneck. I’m happy I can do my part pushing Metaculus, but someone seriously needs to find a way to make this scale.

172 Responses to Why Not More Excitement About Prediction Aggregation?

  1. apaperperday says:

    The problem is right there in your post.

    Nobody is profiting off of anything.

    So why get involved? It takes a lot of time, and the gains don’t accrue to you.
    Moreover, why start one? It takes a lot of time, involves paying for hosting, etc., and you can’t make money on it.

    Scott Sumner has put a fair amount of effort into several incarnations of a platform. Each one seems (as far as I can tell) to persist until it runs out of money, and then sort of die off.

    As a final comment, some of the places they might be most useful are along the lines of predicting the probabilities or timelines of unlikely or far-off events. Knowing whether there is a 0.1% or 0.2% chance of an asteroid devastating the planet in the next 1000 years is hugely important. But payouts are difficult on these timelines, and making money is hard at these probabilities (a 0.1% margin for profit is tiny even if it does represent double the chance of armageddon). Beyond all of which is the fact that money is near-valueless in that state of the world.
    Put together, the places where we might most like to have good predictions, or even just a good understanding of the uncertainty, are also the places where it is really hard to make it worth someone’s while to spend money and time.

  2. Reasoner says:

    Is the deficiency on the demand side (no one needs predictions made) or on the supply side (no one is good at making them)?

  3. captjparker says:

    if you have some analyst come and tell you that there’s a 70-30 chance, what exactly do you do with that?

    I’d go further and say that if the prediction is that there is a 70% chance of a unique, non-repeating event happening, the “70% chance” doesn’t have very much meaning and there is no way to judge the “accuracy” of that prediction.

    • andrewflicker says:

      That *can* be true, captjparker, but I don’t think it has to be. I can imagine a scenario where a 70/30 prediction can be retroactively decomposed into the probabilities of several non-unique, iterated events that combined into a unique, non-repeating form.

  4. Steve Sailer says:

    I predicted on November 28, 2000 that a future Republican Presidential candidate could win the Electoral College by appealing to Rust Belt states’ blue collar whites rather than by trying to win over Hispanics via promoting more immigration, as Republicans were being advised to do by their good friends in the Democratic Party and the mass media:

    http://www.vdare.com/articles/gop-future-depends-on-winning-larger-share-of-the-white-vote

    Sixteen years later, that turned out to be how Trump won.

    But, it’s not like there was an active betting interest in my prediction for most of those 16 years until it came true. Heck, barely anybody besides me talked about this strategy over all those years.

    Instead, it simply never occurred to most people interested in Republican prospects that there was a potential alternative GOP path to the White House, in contrast to the Democrats’ and NYT’s insistence that the only hope for the GOP was to turn illegal aliens into voters.

    Sure, the GOP would lose on each one, but they’d make it up on volume!

    I mean, would Chuck Schumer lie to John McCain, Lindsey Graham, and Marco Rubio about what was in the Republican Party’s best interest?

    Eventually, my more realistic way of thinking persuaded Ann Coulter, and Donald Trump happened to see Ann on TV in the spring of 2015, and here we are.

    But it took forever.

    My impression about the real world is not that there is a shortage of answers that prediction markets could solve, but that there is a shortage of questions. Orwell’s “1984” insists that the key to maintaining political power is to emasculate the language of tools for asking questions inconvenient to the powerful. In “1984,” there are only two ways to think about things: the Party’s way, and via Crimethink, which triggers Crimestop or protective stupidity.

    • Deiseach says:

      Eventually, my more realistic way of thinking persuaded Ann Coulter, and Donald Trump happened to see Ann on TV in the spring of 2015, and here we are.

      So it’s all your fault, Steve Sailer? 🙂

    • Lirio says:

      From what i remember of Trump’s first speech when he announced his intention to run, he actually mooted a large number of policies and talking points. The Mexicans thing just happened to be the one that the talking heads on TV hated most, so of course they talked about it endlessly, giving him assloads of free publicity. If it had been something else, i expect he would have focused his politics on that something else. So i don’t think Trump initially saw the possibilities of the coalition he wound up winning with, though once he did see the opening he seized it and stuck with it.

      The question now is just how solid that coalition is. 78k votes across three medium-to-large states is a very narrow margin to win by, and it still looks to me like the election came down primarily to random variation. Though Trump’s strategy was successful in getting to the point where random variation could be a factor at all. There really doesn’t seem to have been any other strategy on the table that could have possibly gotten a Republican candidate close enough to bet on the dice. At least in that sense, Steve Sailer, your observation was rather astute.

      Going back to the margins, both Trump and Clinton were very problematic and weak candidates. So the question is which candidate’s problems had the larger effect. Given stronger candidates for both parties, would the margin get larger or smaller? If it gets larger, it should keep the Republicans competitive via the electoral college for several election cycles at least, but if it gets smaller, the strategy is non-viable unless the Democrats field a weak candidate and the Republicans a strong one.

      (Also, there’s decent evidence that as far as the popular vote is concerned, candidates matter little. It’s more a referendum on the parties. However, given that the victory condition is the EC, it’s possible for it to get close enough to the wire for candidate effects and random variation to throw it one way or the other.)

  5. Steve Sailer says:

    One of Tetlock’s superforecasters explained to me how he did it a few years ago:

    http://isteve.blogspot.com/2013/12/tetlocks-good-judgment-project.html

    Short answer: you need to be smart and sensible and work extremely hard for a whole year or more.

    My guesstimate would be that it would take me 500 to 1000 hours of work in a year to have a shot at becoming a superforecaster, and I didn’t want to do that much work.

    Tetlock’s superforecasters are kind of the opposite of an aggregation system. They are the result of a system of winnowing away all the chaff.

    And even after you’ve been identified as a superforecaster, how do you prove that you aren’t coasting this year on your reputation?

    I don’t think that is an insoluble problem, by the way.

  6. lukeprog says:

    > why can’t I hire Philip Tetlock to tell me whether my relationship is going to last?

    Well, you sort of can, but it’s expensive.

    The other best option I know of, in the sense that they selected the best predictors among hundreds, is Hypermind. Emile has probably spent even more time trying to get companies to use prediction markets than Robin has.

  7. RohanV says:

    Perhaps someone should set up a hedge fund using a prediction market. It has simple questions with obvious answers: “One month from now, will ABC’s stock price be higher or lower?” Then execute a trade based on the answer. Hedge funds traditionally charge an incentive fee of 20% (20% of profit, 0 if the fund loses money), so take that amount and distribute it to the participants in the prediction market.

    If the fund consistently makes money, compared to a standard benchmark, I think that would go a fair ways to validating the concept.

    • kominek says:

      the already publicly available option chain information is equivalent to what the market thinks the stock will do over a number of time periods. now, many stocks have only thinly traded options, and few time periods, so the prices might not be efficiently distributing information, but you’ve also got fewer options to do anything about it. and if trading volume on them picks up, it probably won’t go unnoticed for long.

      • sflicht says:

        this +10000

        People outside of finance, I think, generally don’t appreciate just how amazingly efficient the market is at valuing stocks etc.

        While I would be in favor of publicly subsidized PMs on matters of great policy relevance, it’s not because there’s a lack of people betting on these things already. The financial system is a huge prediction market already. It’s just that standard financial instruments are more difficult to interpret in terms of “policy relevant” variables. And I think there’s actually probably a direct tradeoff between such interpretability and the issue of “ambiguous questions / hard-to-adjudicate edge cases / controversial resolutions” that inevitably plagues PMs.

        Furthermore I think there’s compelling evidence that high interpretability is *causally* linked to low liquidity. For example, the most important, easily interpretable, readily available financial instrument widely traded today might be the Federal Funds futures, which amount to a direct bet on the probability of a Fed interest rate hike at any given meeting. These are incredibly illiquid, relative to other relevant instruments. People still ascribe great weight to the implied probabilities from the prices of these futures, but it’s striking to think about how much less efficiently priced these are compared to, say, 10 year Treasury futures. Famously, Goldman Sachs attempted to create a range of macroeconomic derivatives linked to observables (GDP etc) in the early 2000s, and completely failed to get people interested in trading these.

        The jury’s still out, but I suspect there are deep, intrinsically Hansonian, reasons why these “serious” attempts at applying PM ideas in the context of highly liquid financial markets have failed to take off.
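
        For anyone curious how those implied probabilities get backed out: a fed funds futures contract settles at 100 minus the month’s average effective fed funds rate, so under the simplifying assumption that the month brings either “hike” or “no hike” (and ignoring the intra-month averaging), it’s just a linear interpolation. A rough Python sketch with made-up numbers, not how a desk would actually do it:

            def implied_hike_probability(futures_price, rate_no_hike, rate_hike):
                # futures settle at 100 - rate, so back out the implied rate...
                implied_rate = 100.0 - futures_price
                # ...and interpolate between the two possible policy outcomes
                return (implied_rate - rate_no_hike) / (rate_hike - rate_no_hike)

            # hypothetical: a price of 98.90 implies a 1.10% average rate; if
            # "no hike" means 1.00% and a hike means 1.25%, the market is
            # pricing a 40% chance of a hike
            print(implied_hike_probability(98.90, 1.00, 1.25))  # 0.4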

  8. Guy in TN says:

    Is there any empirical evidence that prediction markets are more accurate than polling, in terms of predicting events? Because if the purpose of prediction markets is to harness the wisdom-of-the-crowd, then regular polling seems like the most direct and obvious way of doing this.

    Prediction markets are not a random sample of “the crowd”; they are a sample of people who have extra money and time lying around to play with. For example, it would be of little loss of utility to an extremely wealthy person to put down $100,000 on a bet, while the loss of utility for a poor person putting down $100 would be enormous. In addition to the problem of a non-random sample, the “voting with money” aspect also means that some people influence the prediction many times more than others. So even if ten poor people put down $100 each, and the rich person put down his utility-loss equivalent of $100,000, the betting market would predict that the event the rich person supported would happen, even if the crowd was heavily against it.

    • random832 says:

      Sure, but after the event comes to pass, the rich guy has $100,000 less, and the poor people each have $10,000 more. They may spend it on necessities rather than folding it back into the prediction market, but if this happens enough times the rich person won’t be able to put down $100,000 anymore. So it’s self-correcting.

      • Guy in TN says:

        I question the self-correcting aspect, since that seems to rely on prediction markets being universally played, and played to such an extent that it significantly changes wealth distribution. As long as the outside economy still exists, you are going to be constantly bringing in people from that world and their associated wealth disparities, ensuring that the wealth equilibrium inside the prediction market is never achieved, thus ensuring that the markets don’t ever gain predictive power.

        If the prediction market is only useful once we reach a certain level of economic equality, I wouldn’t hold my breath for it.

        • Adam Berman says:

          If the only thing a prediction market achieves is taking money from dumb rich people and giving it to smart poor people, I’d call that a wild success.

          • alchemy29 says:

            In poker, if you have many more chips than your opponent, you can play badly and still take all of their chips. I think there is an analogous situation with prediction markets.

          • Guy in TN says:

            @Adam Berman
            The market rewards people for betting correctly, which is best achieved by either having insider knowledge or the power to manipulate world events. Neither of those traits correlate with being poor. Random832’s hypothetical was the rich guy losing, but the rich guy winning seems like the more normal scenario.

            This is all tangential to the question of the supposed predictive abilities of the market, of course.

          • Deiseach says:

            Unfortunately, most markets take money from poor people and give it to rich people. When an Irish semi-state organisation was privatised, a lot of people thought that (in the same way as when Thatcher privatised a lot of British companies) this was the equivalent of free money. They sank savings and even took out loans to buy shares, expecting that they could then sell them on at a hefty profit when large investors wanted to buy.

            However, the large investors knew a trick worth two of that; they sat back, waited for the share prices to fall, then swooped in and gobbled them up at bargain rates, which left a couple of businessmen very well off and a lot of people very angry.

            I found it difficult to sympathise with the losers because they started demanding the government compensate them for their losses; they were the aspiring bourgeoisie who liked to complain about the lazy parasites on the dole and the working-class expecting too much in pay raises, but when it came to them being outwitted by professional investors on the stock market, suddenly free market enterprise was a bad thing and the government owed them. Plainly they forgot that share prices can go down as well as up, and a fool and his money are soon parted.

            Also, “poor” is a relative term here: if you have enough money to put substantial bets on prediction markets (which you will need to do to make any high returns), then you’re only ‘poor’ by comparison to someone who can dump a million dollars on a bet.

          • kominek says:

            Unfortunately, most markets take money from poor people and give it to rich people.

            can you suggest a mechanism by which wealthy but low-information investors in a prediction market can take money from poor but high-information investors?

            in stock markets, retail investors frequently become excited about a stock going up, and purchase some in hopes it will continue to go up without bound, well beyond any realistic valuation of the security. but in a prediction market every security has a bounded maximum value (the payout if the corresponding event occurs), and a known pay-out date. the closer the security’s price gets to that bound, the less your potential payout becomes.

            prediction market securities are much more like government bonds than corporate stocks. you’re buying something that will pay out $1 if an event happens, guaranteed. any manipulations which occur after you’ve purchased your security are irrelevant. if you continue to think the security is worth at least what you paid for it, then any drop in price is simply an opportunity to make more money faster.

          • random832 says:

            Random832’s hypothetical was the rich guy losing, but the rich guy winning seems like the more normal scenario.

            Maybe, but the rich guy losing (i.e. “a much larger total monetary volume of wrong bets are made, attributable to rich guys preferring to bet on those, and therefore the market fails to predict the outcome”) was your scenario above, not mine.

            Whether it’ll be corrected by dumb rich guys dropping out of the market or them getting better at predicting or influencing things doesn’t really change my point.

    • kominek says:

      In addition to the problem of a non-random sample, the “voting with money” aspect also means that some people influence the prediction many times more than others. So even if ten poor people put down $100 each, and the rich person put down his utility-loss equivalent of $100,000, the betting market would predict that the event the rich person supported would happen, even if the crowd was heavily against it.

      if the participants are as you described, the $100k bettor has put a $100k loss up against $1000 of earnings, while the $100 bettors have put $100 of loss up against $10000 of earnings. as simple expectation maximizers, the $100k bettor must think the event is >99.009% likely to go their way, while the $100 bettors need only think the outcome is >0.99% likely to go theirs.

      so you’ve got a room with 11 people in it, and 10 of them are saying “well, there exists a possibility that X won’t happen” and the 11th says “i’m extremely confident that X is going to happen.” i feel that the consensus of the room is that X is likely to happen, and the market should reflect that.

      (if these 11 people are kelly bettors, then the $100 bettors are just wild speculators, and the 11th fellow probably has more confidence in the outcome than they have in their love for their spouse.)
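
      the arithmetic as a quick python sketch, for anyone who wants it: a risk-neutral bettor breaks even when p * win = (1 - p) * stake, i.e. at p = stake / (stake + win).

          def breakeven_probability(stake, potential_win):
              # smallest win probability at which the bet has non-negative
              # expected value: p * potential_win - (1 - p) * stake >= 0
              return stake / (stake + potential_win)

          # the $100k bettor stands to win the ten $100 stakes, i.e. $1,000;
          # each $100 bettor stands to win a $10,000 slice of the $100k stake
          print(breakeven_probability(100_000, 1_000))  # ~0.9901 -> needs >99.01%
          print(breakeven_probability(100, 10_000))     # ~0.0099 -> needs >0.99%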

      • Guy in TN says:

        In this scenario, all you know about the $100 bettors is that they think the outcome is >1% likely. They may think it could be as high as 99%, and they would not have the ability to reflect that in their betting.

        So it’s more like walking into a room of 11 people, 1 of whom says “x event will surely not happen”, and the other 10 saying “our position is somewhere between being completely unsure and being completely sure that x event will happen, although we cannot tell you where on that spectrum we fall.”

        Is the consensus of this room that x event won’t happen? Or is there another consensus, but the betting market is unable to discover it?

    • anthonynaguirre says:

      I don’t know of good sources for the honest track records of prediction markets, akin to what Metaculus does on its page. But if anyone knows of them it would be quite interesting to analyze and compare.

      Metaculus did one fun question about a prediction market (PredictIt) vs. 538 on a slate of US primary elections. PredictIt edged out 538, but both did well and it could easily have gone the other way. I think political prediction markets pretty closely follow polls anyway, as polls are obviously the most pertinent information source.

  9. baconbacon says:

    One simple fact that often goes ignored in the prediction market hype is that some events cannot work well with them. The market has to be either segregated from or positively correlated with individual bets, or else it will fall apart. Sports events are great for prediction markets because there are very few ways a bet can alter the outcome of a game (and those ways are known and can be policed reasonably well). A prediction market for terrorism has serious issues: if it ever got accurate enough to be useful information, the bets become self-defeating. Someone drops $10,000 on a terrorist attack on Hamburg in the next 2 weeks, law enforcement reacts to that bet, and the chances of preventing the actual attack increase. The act of betting on a specific event makes it less likely that the event happens, and prevents one of the virtuous cycles of markets (correct people win more money and thus have more money to bet in the next round, increasing their influence) from occurring.

    This is one of 3 major flaws with nGDP targeting through an nGDP futures market for central banks. If a CB actually controlled nGDP well enough to avoid recessions and acted based on the predictions of the market, you would have to be a fool to bet against the CB, and the CB will eventually lose the information that it needs to function.

    • captjparker says:

      If a CB actually controlled nGDP well enough to avoid recessions and acted based on the predictions of the market, you would have to be a fool to bet against the CB, and the CB will eventually lose the information that it needs to function.

      Not everyone buying futures is trying to bet against the market. Most are buying insurance to lower the risk on assets they hold. The real question is: are there enough actors wanting to use NGDP futures for risk mitigation to make a market?

      • baconbacon says:

        The value of market predictions is that you get the sum (weighted) expectation of all market participants in a single price. Knocking out downside speculators automatically makes the market less robust, and less valuable.

        The real question is: are there enough actors wanting to use NGDP futures for risk mitigation to make a market?

        No, the question (for market monetarists) is: will the market be robust and accurate enough to guide the Fed’s actions?

  10. andrewflicker says:

    I’ll echo the others in that I started using Metaculus a week or two back, but without a financial incentive, or a reputational one strong enough to actually affect my life, there’s little drive to either keep using it or spend the necessary amounts of time and effort to make *good* predictions. I can toss off low-effort arbitrage style bets and probably end up net-positive, but to really “superforecast” would mean taking time away from my real work, my relationships, or my personal studies… which means it needs to offer something worthwhile in return for that labor.

    • Reasoner says:

      Yep. To take this even further, what if there isn’t actually a forecasting skill, just domain expertise in lots of different areas? In that case, the main contributor to forecasting accuracy would be willingness to spend a lot of time reading and acquiring domain expertise.

  11. AnthonyC says:

    I think typical decision-makers want to know what *will* happen, and 60-40 or 75-25 or even 80-20 is not the certainty they want…because in the absence of a real grounding of the probabilities, if you have some analyst come and tell you that there’s a 70-30 chance, what exactly do you do with that?

    I’m not sure who counts as a decision-maker for the purposes of this statement, and I think it is probably true overall, but my job involves making lots of predictions for other people about future advances related to new technologies coming to market, and it doesn’t align well with my experience. At minimum, a large minority of the decision-makers I personally interact with accept that we both know any number I give them about the future is wrong. This market will not be exactly $X billion in 2025, that coating will not cost exactly $Y/m2, and that product probably won’t be on the market in exactly Z years. But those numbers are decent summaries of the expected/likely consequences of a large chain of assumptions and data points, and here’s how changing the assumptions affects the outcome range (with alternate scenarios, for example), etc.

  12. Sam Reuben says:

    So I was interested, and opened up the results-reporting link on Metaculus to see what kind of predictions they were making and how those results were doing. I really shouldn’t have been surprised at how disappointing they were. In general, the results tended to fall into a few main categories:

    1) Remarkably inconsequential topics which experts know more about, such as the rotting sea-thing or whether the Switch would be using some particular patent or other. For these, it’s not clear that we’re getting better results than just asking an expert or asking Nintendo, and this isn’t a good selling point for real predictive power.

    2) Really, really easy questions, like “will housing prices go up?” or “will 2016 be the warmest year on record?” There’s a possibility of the answers here being wrong, but seriously, easy questions don’t prove the potency of the system one bit. If this system is actually useful, then it will (with tuning) give us accurate answers with high confidence to things which ordinary people and experts have a hard time coming to consensus on. If it can’t do that, then why not just use polls and expert opinion the way we always have?

    3) Nebulous questions that use “soon” and other words to make the results much more difficult to judge objectively. This is just poor practice. I mean, “how important are time crystals?” Really? How on earth is that supposed to be judged through an objective lens, rather than just trusting an expert on whether they really are or aren’t important?

    4) Questions about how humans are going to act, such as through voting, which seems like a rather pointless abstraction. If you want to know what people are going to do, why not ask them through polling and construct a meta-analytic algorithm to interpret those polled results? Oh, wait, people already do that, and it tends to be rather successful.

    There are also a few silly community-oriented questions, like “will Metaculus add a meta discussion feature,” which really should not be included in an attempted demonstration of how good Metaculus is at predicting things. Only a few questions, generally about the stock market and AlphaGo, seem to qualify as things which there’s some real contention over and which can’t be effectively predicted through traditional means. Meanwhile, on the failure side, the incorrect answers seem to be things that most people got wrong as well, and the Metaculus scores tend to be identical to what the community was saying! This doesn’t seem like an advance in prediction in the slightest.

    Metaculus can be defended by claiming that it’s not highly tuned yet, and that it doesn’t yet know who tends to be right and wrong. I’m not so certain it’s ever going to know that, with how it’s built. If its accuracy relies on taking the opinions of those people who tend to be correct and rating them highly, then what it should really be seeking to do isn’t so much giving individual predictions as identifying experts. In fact, a critical system to identify experts who tend to be right over experts who tend to be wrong would be extraordinarily useful. That isn’t what a prediction market seems to be doing, though, and until we get some actual results, it doesn’t seem like there’s much of a place for them.

    To make the goalposts clear: in order for Metaculus or other prediction markets to be useful, they must be able to generate accurate answers with good confidence to questions which:
    -Are clearly defined enough for the results to be incontestable.
    -Can’t be answered effectively through standard polling and poll analysis.
    -Are a subject of contention among experts, and as such impossible to answer by deferring to their opinions.
    -Hopefully have some serious relevance to the state of the world.
    It’s fine if early results leave out the fourth qualification and just serve as proof of concept, but the finished product should actually change how we interact with the world. Without the first three, however, it’s not clear that the prediction market has done anything new. I think it goes without saying that the answers should be correct, but they also have to have significant confidence behind them to be worthwhile. If Metaculus makes a ton of 51% confidence predictions and gets 51% of them correct, it is indeed very reliable at analyzing the odds, but you still wouldn’t get much advantage from using its predictions over just tossing a quarter.

    So yeah, not convincing. It’s probably worth participating in prediction markets, for the sake of testing out the tech, but it’s highly misleading to say that they’re any good at all right now. There’s probably a reason why corporations and governments aren’t using them, and the answer isn’t always stodgy, nervous conservatism. (Not the political kind, I’ll clarify.)

    • anthonynaguirre says:

      Sam,

      Thanks for taking a look! A few comments on your concerns:

      1) Remarkably inconsequential topics which experts know more about, such as the rotting sea-thing or whether the Switch would be using some particular patent or other. For these, it’s not clear that we’re getting better results than just asking an expert or asking Nintendo, and this isn’t a good selling point for real predictive power.

      Yes, these are inconsequential. Part of the aim of the site is to be fun, and part is to provide fodder to hone people’s skill and identify who is good at predicting; those are the main purposes of questions like these.

      2) Really, really easy questions, like “will housing prices go up?” or “will 2016 be the warmest year on record?” There’s a possibility of the answers here being wrong, but seriously, easy questions don’t prove the potency of the system one bit. If this system is actually useful, then it will (with tuning) give us accurate answers with high confidence to things which ordinary people and experts have a hard time coming to consensus on. If it can’t do that, then why not just use polls and expert opinion the way we always have?

      No question is “easy” if you are trying to get the right probability for it. Knowing whether 2016 will be the warmest year at 99% vs. 95% vs. 90% confidence is hard. (And for 2017, it’s not even clear whether the answer is above or below 50%.) The Brier score is not the best in this sense, in that it does not really distinguish a 90% from a 99% prediction that is correct. That’s why our actual scoring system is different, and it really punishes you if you say 99% or 1% and are wrong.
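
      To make that concrete, here’s the standard contrast in a few lines of Python (a sketch only; I’m using the plain log score as a stand-in for our actual rule, which is more complicated):

          import math

          def brier(p, outcome):
              # squared error of the forecast; lower is better
              return (p - outcome) ** 2

          def log_score(p, outcome):
              # log of the probability assigned to what happened;
              # higher (closer to zero) is better
              return math.log(p if outcome == 1 else 1 - p)

          for p in (0.90, 0.99):
              print(p, brier(p, 1), log_score(p, 1))  # correct prediction
              print(p, brier(p, 0), log_score(p, 0))  # wrong prediction

          # Brier barely separates a correct 90% from a correct 99% (0.010 vs
          # 0.0001) and charges a wrong 99% only ~0.17 more than a wrong 90%
          # (0.9801 vs 0.81). The log score falls from -2.30 to -4.61: a wrong
          # 99% hurts twice as much as a wrong 90%.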

      3) Nebulous questions that use “soon” and other words to make the results much more difficult to judge objectively. This is just poor practice. I mean, “how important are time crystals?” Really? How on earth is that supposed to be judged through an objective lens, rather than just trusting an expert on whether they really are or aren’t important?

      I’m a bit confused here. All of the questions aim to have precise resolution criteria. These may be flawed and may not capture exactly what you want, but there is a big effort to put them there. They aren’t always in the title, though, so you have to read the actual question body.

      4) Questions about how humans are going to act, such as through voting, which seems like a rather pointless abstraction. If you want to know what people are going to do, why not ask them through polling and construct a meta-analytic algorithm to interpret those polled results? Oh, wait, people already do that, and it tends to be rather successful.

      I agree! I don’t think sites like this really add that much to good poll aggregation. Same goes for prediction markets. The strength is for questions that are not amenable to polling, and are not really tracked by any major market. As such, I think asking something like Metaculus to predict the stock market (which we did in a couple of places, but really just for fun) is going to be ineffective: that’s a system that has specific dynamics to defeat prediction of many of its properties.

      -Are clearly defined enough for the results to be incontestable.
      -Can’t be answered effectively through standard polling and poll analysis.
      -Are a subject of contention among experts, and as such impossible to answer by deferring to their opinions.
      -Hopefully have some serious relevance to the state of the world.

      I pretty much agree with this list! And indeed this is what we’re aiming for. But it also has to be fun and interesting. I’d also add that an “expert prediction” is often hard to find, elicit, or combine across experts, so providing a central aggregation point for such predictions is useful in and of itself.
      I’d also agree that no small part of the use is identifying good predictors in different domains.

      • Sam Reuben says:

        Wow, didn’t expect to get a response directly from the folks! Thanks for looking and commenting. I feel like I should try and retract some of the grumpiness from my initial post; I think the idea of a prediction market is neat, but as with most other neat ideas that are in their early stages, it gets oversold by some major degree. I wanted to, as it were, rein in some of the excitement and make things a bit more critical. All of which is to say: I highly respect what you all are doing, and most of all, how hard it is. I’m not against the project one bit, just against the naive understanding that with just this “one weird trick” we can predict the future. (I think that something similar happens on this site with regards to AI, for what it’s worth.)

        On to your specific responses:

        Your answer to my 1) is good. I have to say, that’s an excellent reason for them being there. It would worry me if they were all that there is, but they aren’t, so that’s that.

        I’m a little worried about the probability-measuring that’s mentioned in 2. It seems like it could be interesting and powerful, but it means that your site by nature will tend to give far higher rankings to people who are good at math and in particular putting their understanding of the likelihood of an event to numbers. This means that the site could be liable to getting poor results for things that lend themselves poorly to mathematical thinking, like judging some social responses. Still, that’s a hard problem either way, and it doesn’t have a single magical solution. Yours is a solid one.

        Thank you for clearing up my misunderstanding on 3). I hadn’t gone into the questions to check for detailed qualifications. That was entirely my fault. There might still be questions which are harder to answer as yes or no, but you all are certainly working well with them.

        For 4: glad we think alike! I’d be interested in chatting about what precisely makes a question poorly amenable to polling but vulnerable to this kind of approach, or else reading something you’ve written on it. I have some suspicions, but there’s nothing like hearing from someone who’s thought about it more.

        I guess the last thing I’d say is that I’d suggest that the exact task of identifying good predictors might be more or less the same thing as obtaining expert predictions and analyzing them. If someone is able to correctly predict a ton of things about what’s going to happen in a domain, that makes them a meta-expert in that domain – exactly the sort of person who you’d want to ask about it! (Regular experts, of course, can be terrible at predictions: does the average farmer know much about the future of agriculture? The knowledge they hold is valuable, but not always relevant to predicting things.) That’s what really excites me about this kind of work, rather than any use for oracular purposes. It would be amazing if we could isolate a new type of human expertise and have a good way of identifying people who have it.

  13. John Schilling says:

    Tetlockian Superforecasting is hard work requiring unusual talent, whereas pontificating and pointing to a wall of credentials is much easier and more open. The results, and thus the rewards, are hard to distinguish in the short term.

    And the achievable rewards in existing prediction markets are IMO too meager to motivate people to put in the time for years on end. A very good stockpicker can “retire” by turning middle-class life savings into an actively-managed equities portfolio; the limits on the prediction markets I know of do not allow for the same. Neither do the reward scales of any plausible internal corporate prediction-aggregation scheme. Which makes this a hobby being pursued primarily for the benefit of others and with the prospect of a little bit of money on the side. I don’t think that’s enough.

    A wealthy individual or institution might be able to hire a superforecaster for bignum $$$ to e.g. predict oil prices, but they aren’t going to tell the rest of us. A political or PR institution might hire a superforecaster to make relevant predictions about e.g. the plausible measures necessary to prevent AGW from turning catastrophic, but the rest of us aren’t going to be able to tell them apart from lying pundits.

    Unfortunately, anything that does allow proportionate rewards to forecasters of e.g. commodity prices is functionally indistinguishable from a commodities market – and we already have a vast bureaucracy devoted to enforcing maximal skepticism and suspicion re new and inadequately-recognized commodities markets. Not seeing an easy solution to this one.

  14. the verbiage ecstatic says:

    So I think what we’re missing is laws about the input -> output ratio of prediction markets. I would not predict that as resources go to infinity, predictions go to 99% confidence. Rather, I would predict that calibration (in a well-designed market) goes to complete accuracy, and confidence levels vary based on problem domain / amount of public information about that problem domain. (A good prediction market with infinite resources will give you a prediction with 50% probability that a fair coin toss will come out heads).

    So in other words, scaling up markets doesn’t give you a magical oracle, it just gives you a good understanding of how uncertain outcomes actually are. Which is still useful, but not as useful.
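
    One way to see the limit: a perfectly calibrated forecaster who states a question’s true base rate p earns an expected Brier score of p*(1-p), the irreducible uncertainty of that question. More resources can push you toward that floor, never below it. A quick check in Python:

        # expected Brier score of forecasting p when the event really does
        # occur with probability p: p*(1-p)**2 + (1-p)*p**2 = p*(1-p)
        for p in (0.5, 0.8, 0.95):
            print(p, p * (1 - p) ** 2 + (1 - p) * p ** 2)  # 0.25, 0.16, 0.0475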

    Which brings you to cost. My understanding is that super-predictors have to invest time learning about problem domains and absorbing information. For prediction markets to work for the long haul, someone’s gotta be compensated for that work. I can imagine that in situations where the predictors themselves value knowing the outcome of the market, you’ll get natural participation without too much money changing hands. But that limits it to domains where questions are broad and generally interesting, so you’re not going to get great personalized advice. If you want predictions about things primarily of interest to you, you’ll need to pony up, and that means we need to know how costly prediction markets are compared to, say, hiring a couple of researchers full time. Moreover, if you want to gain a competitive advantage in some domain from the predictions, you’ll need to keep the whole process confidential, which means a lot more $$.

    So, I think the lack of excitement is because there are some hard practical problems in the way of making them into an economically useful tool.

    I would love to see a non-profit prediction market run by an effective altruist organization… that would be a lot of fun and possibly quite valuable. But that’s definitely charity, not business.

    • Quixote says:

      Prediction markets add a lot of value even when no one has to do any additional work by making correct statements profitable and making false statements costly.
      For example, suppose EvilCorp already knows that dumping Chemical X in a river will lead to the cities downstream having higher than average cancer mortality over the next 20 years. This knowledge is also known by DoGooder.Org and Local University. The research has already been done and no one needs to do any additional work to make a confident prediction.
      Without a prediction market, what happens is DoGooder.Org issues a well-crafted PR statement like, “oMg eviLCorP is goNna give us cnacer and we’re ALL goNNA DIE!!!”, EvilCorp issues a statement saying that this is alarmist, that the evidence is currently unclear, that the relevant chemical is legal and used in over a hundred countries, and that the FDA has never issued a definitive ruling finding against Chemical X. Local University then probably issues some long careful academic statement that no one reads or reports on.
      With a prediction market, someone poses the proposition, “contingent on chemical X being dumped in the river, cancer rates in Downriverville will be 20% greater than the national average in the average of the 5 years leading to 2040.” Then DoGooder.Org buys positions indicating a high risk. Some number of informed professors at Local University buy positions with their personal accounts. Then EvilCorp uses its much deeper pockets to buy a significantly larger number of positions pointing the other way, and the market shows a clear low risk from dumping chemical X.
      Over time, each year EvilCorp needs to buy more positions to influence the market and cancel out the influence of experts, doctors, informed citizens, etc. It builds up a sizable position and then needs to report it in the ‘Financial Instruments, Futures Contracts, and Hedges’ section of its financial statements. Once it’s there, some financial analyst reads it and mentions it in a report. Some biotech hedge fund reads the report and starts buying futures on this question. EvilCorp no longer has the relatively deep pockets that allow it to distort the market, and the market now clearly shows what will happen. EvilCorp’s investors see that it has a huge losing position in futures contracts and also that it’s done something that will predictably make a lot of people sick. They demand that EvilCorp start taking reserves for the inevitable day of reckoning in 2040. This winds up being a significant drain on EvilCorp’s profits.
      In 2035, seeing how all this played out, EvilButPragmaticCorp decides not to dump Chemical Y into a local river.

      • the verbiage ecstatic says:

        That’s an awesome scenario :-). Little bit of a chicken and egg problem, though, because EvilCorp will feel no pressure to participate unless prediction markets are already an extremely important societal institution.

      • aiguille says:

        Unfortunately, prediction markets can’t always make false conditionals costly.

        Consider an alternative strategy. Instead of spending lots of money on predictions that will be proven false, EvilCorp buys the proposition “contingent on chemical X not being dumped in the river, cancer rates in Downriverville will be 40% greater than the national average in the average of the 5 years leading to 2040.” They have deeper pockets than DoGooder.Org and the professors, so the market shows clearly that while dumping is bad, it’s better than not dumping.

        What happens in 2040 after decades of dumping depends on how the market handles conditionals. If, as I’ve seen suggested, trades on “if A then B” are undone when A is false, then EvilCorp gets all their money back. If the conditionals were made by combining propositions, then EvilCorp has bought a lot of “Either X will be dumped or cancer rates will be 40% above average”, which is true, so they make money.

        EvilCorp will lose if someone forces them to stop dumping, but anyone wanting to do that has to go against the always-accurate market’s prediction of dire consequences.

        I haven’t seen an implementation of conditional prediction markets that isn’t manipulable like this.
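
        To spell out the two resolution rules with toy numbers (the stakes here are made up):

            # EvilCorp buys "if X is NOT dumped, cancer rates will be 40% higher"
            # and then dumps X itself, so the condition fails.

            def void_rule_payout(condition_holds, claim_true, stake, win):
                # trades on "if A then B" are undone when A turns out false
                if not condition_holds:
                    return 0  # stake returned: EvilCorp loses nothing
                return win if claim_true else -stake

            def combined_prop_payout(condition_holds, claim_true, stake, win):
                # "if A then B" held as the single proposition "not-A or B"
                return win if (not condition_holds) or claim_true else -stake

            print(void_rule_payout(False, False, 100_000, 50_000))      # 0
            print(combined_prop_payout(False, False, 100_000, 50_000))  # 50000

        Either way the false conditional costs EvilCorp nothing, and under the combined-proposition rule it actually pays.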

      • Deiseach says:

        If EvilCorp has any brains, it will have its own experts lined up to talk about cancer clusters and how correlation is not causation, see the Sellafield cancer cluster.

        “Just because Downriverville has 20% greater than the national average cancer rates in the average of the five years leading to 2040 does not mean Chemical X in the river is responsible,” says EvilCorp. “Go ahead and prove it. Despite the public outcry fostered by do-gooder groups, the childhood leukaemia was not caused by radiation but by viruses. Same here with Chemical X, which is only being used as a scapegoat by DoGooder.org, which wants to close us down. If we close down, the loss to the local economy will be massive. Will DoGooder.org and Professor Brainiac from Local University pay your wages every week?”

        I like this excerpt:

        And they found that the amount of extra radiation from the sites was dwarfed by the amount of radiation from natural sources – such as radon gas from the ground and naturally occurring radioactivity in foods, such as brazil nuts and bananas.

        Bananas give you cancer/bananas are radioactive? This would be perfect for EvilCorp: set up its own rival PR group along the lines of “DoGooder.Org issues a well-crafted PR statement like, ‘oMg eviLCorP is goNna give us cnacer and we’re ALL goNNA DIE!!!’” and bombard the public with BANANAS GIVE YOU CANCER!!! in order to ridicule DoGooder.org’s efforts and lose them public support, especially in the prediction markets (if you’re allowing “informed citizens” to buy positions, you’re allowing citizens to buy positions, and a lot of “oh no my radioactive bananas will give me cancer, yeah right, sure” citizens will buy the “no, dumping chemical X won’t raise cancer rates 20% above the national average” position).

        Also, what if it’s only 18% above the national rates? Does EvilCorp make its money back then? Unless you’re stone-cold sure it’s going to be 20% (no less and no more), there appear to be ways around the conditionals that let EvilCorp come out at least not a loser.

        • Nornagest says:

          Yeah, bananas are (very mildly) radioactive. It’s because of potassium: bananas are high in it, and potassium-40 is one of the more abundant naturally occurring radioactive isotopes. It’s not very significant — living in an area with a lot of granite irradiates you more, and both are nothing compared to a dentist’s X-ray — but it is detectable.

          I’ve heard that bioaccumulated heavy metals are responsible for a largeish fraction of tobacco smoking’s cancer-causing potential, by a similar mechanism.

        • meltedcheesefondue says:

          >“Just because Downriverville has 20% greater than the national average cancer rates in the average of the five years leading to 2040 does not mean Chemical X in the river is responsible”

          That’s why you need a prediction market, not a retrospective justification market. If it’s well calibrated, you’ll see the difference in cancer rates between the two scenarios.

  15. baconbacon says:

    Nobody’s profiting off of anything

    I think this is a major problem. To get good predictions you need to sift through a lot of bad predictions, and if no one is getting paid then you are mostly going to get hobbyists. The places where it is easy to pay for good predictions are already heavily saturated (i.e. the stock market) and function as prediction markets already.

  16. Quixote says:

    People are not charities. For it to be worthwhile for me to post predictions, I need to get paid when my predictions are right. The prediction markets provided that reward; to my understanding, these aggregators do not.

    As to why they never caught on, that seems simple. Like most things in a pseudo-capitalist world, you understand the subject by asking yourself if it would be beneficial to large corporations and hereditary wealthy families. If it’s beneficial to those groups it will probably happen; if it’s not, then absent a massive social push it won’t.

    In this case, both those groups benefit greatly by making truth more difficult to access. The tobacco execs said, “our product is doubt.” The energy companies are running the same playbook on climate change. Real estate developers probably don’t want a clear line on when various cities will flood. CEOs don’t want investors to have visibility into which projects might be overdue. And so on.

    • anthonynaguirre says:

      I think this is super important.

      Most prediction markets (probably all, at the moment) are run as a negative-sum game, since there are trading fees and a “tax” from the platform. So unless you really do have privileged information, they are irrational to use except as a hedge (or some other similar role in an overarching plan). In effect, both the prediction makers and prediction consumers are paying the platform. It’s quite different from investing in almost any real market (with a positive overall growth rate), and really more like actual gambling.
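
      To put toy numbers on “negative-sum”: buying a contract at price q that pays $1, on a platform that takes a fraction f of winnings (a made-up fee structure, just to show the shape of it), is only positive-expectation if your true probability beats the price by a fee-dependent margin:

          def min_edge(q, fee):
              # EV = p * (1 - q) * (1 - fee) - (1 - p) * q >= 0
              # solves to p >= q / (q + (1 - q) * (1 - fee));
              # the "edge" is how far above the price q that threshold sits
              return q / (q + (1 - q) * (1 - fee)) - q

          for fee in (0.00, 0.05, 0.10):
              print(fee, round(min_edge(0.60, fee), 4))
          # 0.00 -> 0.0     fee-free market: any true edge is worth trading
          # 0.05 -> 0.0122  you must beat the price by >1.2 points
          # 0.10 -> 0.025   by >2.5 points, or rationally stay out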

      I think the only way things actually make sense economically is for the prediction consumers, who value the predictions, to pay, and the prediction makers to be paid for their work. This is possible in a market through market subsidies but is a bit awkward. It’s definitely part of the Metaculus long-term plan (and I think there are pretty clean/nice ways to implement it) if we can get large/effective enough that there is a subset of questions that people are willing to pay for answers to.

      In the meantime we may try out some prizes and other fun incentives to spice things up.

    • tscharf says:

      I specifically went there looking for climate change predictions and it was disappointing. Sea level rise is by far the most poorly covered aspect of climate change; it is routinely exaggerated by the media to the point of absurdity. I’ll take the under bet on whatever the world thinks it will be, especially if they are getting their information from the media.

      I strongly wish climate science would open a betting market on climate impacts, restricted to climate scientists. Almost everything you hear through the media consists of low-probability outcomes that rarely, if ever, come with the probability in the story. The most egregious recent example is here.

      • anthonynaguirre says:

        Right now I find 10 climate-related questions, though only three are still open. But there are many more possibilities. If you’ve got specific climate-change questions, I’d definitely encourage you to suggest them. If Metaculus can get a critical mass of interesting ones, we can reach out in a more concerted way to the climate science community.

        That being said, this is a place where you really want to much more heavily weight established predictors, especially those with a track record on similar types of questions. The politicization of the issue would lead people to make ideologically-based predictions. Such people tend to get weeded out fairly quickly (though not instantaneously) by losing a lot of points and becoming irrelevant.

        • tscharf says:

          I saw those, but was disappointed. As far as global warming goes, the biggest questions are related to carbon sensitivity, the amount of warming through 2100, and more importantly their associated impacts, such as sea level rise this century and predicted effects on droughts, storms, etc. Questions such as the warmest year on record are very uninteresting in the grand scheme. Climate is highly variable year to year, so these are just bad questions.

          They probably don’t like a lot of these questions because they won’t be resolved for a long time, but I’m more interested in what people believe relative to IPCC predictions, to gauge the effect of media propaganda and ideology. It’s likely that the answers wouldn’t be scientifically useful except sociologically, so maybe it is a bad match.

  17. tgb says:

    I won $150 worth of Amazon gift certificates, which I used to help buy an electronic keyboard, via my participation in the (now-defunct) DAGGRE. Playing the piano went on to become one of my favorite hobbies, so I’ve profited nicely from prediction markets already!

    • sflicht says:

      DAGGRE was great, and its successor (SciCast) was also pretty good. They had some genuine innovations in prediction market design. I was sad that they couldn’t get funding to continue the project. The PI (Twardy) is now working in the private sector on unrelated research.

      • Quixote says:

        Yeah. That was a great experiment and yielded many Amazon gift cards. Does anyone know if there are archives of the monthly rankings on that site? I was a pretty consistent top-30 predictor, and would like to be able to link to it for online bragging and CV purposes.

  18. Anon. says:

    My personal view is that the power really comes when you get a *lot* of people, questions, and data

    The GJP data indicates the opposite: teams outperform crowds. People intelligently combining info is important.

    • anthonynaguirre says:

      Actually I think that too: no amount of so-so predictions is better than a few really good ones. But knowing which ones are really good, and putting them together on teams (as GJP did), takes lots of people and lots of data.

      • Christian Kleineidam says:

        I don’t think you need to know a lot to integrate a feature where people can build teams on the website. Teams can find their team members organically.

  19. drachefly says:

    70-30 is just weak information. Both outcomes are possible as far as we know, and neither is so unlikely that you can completely neglect it. In a lot of circumstances, it really isn’t that different a scenario from 30-70.

    A lot of gains in planning come from ruling out possibilities altogether so you can abandon preparations for them. With weak probability ratios like this, you don’t get that.

    • meltedcheesefondue says:

      70-30, with enough payoff, is bet your short-term career on it and expect to do well. 30-70 is not.

  20. Most businesses don’t even have staff suggestion boxes. Bottom-up information is a very under-implemented area.

    • andrewflicker says:

      To be fair, if a small or medium-sized business is well-run, it doesn’t really *need* suggestion boxes, as employees already share information and suggestions. Suggestion boxes really only help if the system is large enough or dysfunctional enough that the “normal” communication lines have broken down.

  21. ManyCookies says:

    30% chance of Trump being out of office by February 2019!? That does not seem remotely realistic.

    • justanotherlaw says:

      Which way? Is it too low or too high?

      • ManyCookies says:

        Way too high (sadly).

        Mueller’s investigation seems to have teeth, but a year and a half is just not enough time for a complicated fraud/collusion investigation to pan out, let alone be tried and processed through a friendly Congress. Trump might resign if some unambiguously incriminating bombshell exists and it’s low-hanging enough for Mueller to find in a year and a half, but that is not a double-digit chance.

        Assassination is unlikely, certainly not a double-digit chance. Health issues might be 4-5% according to this table, if even that; that table is from the social security population, which doesn’t have access to the best health care the US can provide on prompt notice.

        That’s enough risk for an optimistic 10%, but 30% is plain ludicrous.

        • rocurley says:

          The market doesn’t pay out in the case of assassination, to avoid incentivizing it.

        • sandoratthezoo says:

          February 2019 because it’s possible that Congress could change hands in 2018 to Democrats, and then (the thinking goes) a Democratic Congress would be much more likely to impeach.

          • baconbacon says:

            IIRC you need a 2/3rds vote to actually force the president out of office; it would take a hell of a win for the Ds to grab that much of congress.

          • sandoratthezoo says:

            The procedure is a simple majority of the House to “impeach” (which doesn’t actually remove from office), then a “trial” in the Senate presided over by the Chief Justice, then the Senate votes to convict – and yes, 2/3rds are needed to convict.

            The Senate is, I think correctly, seen as less friendly to Trump than the House is. But on the other hand, I believe that the seats up for contention in 2018 make a Democratic taking of the majority unlikely in the extreme, and taking 67 seats obviously ludicrous (in fact, upon review, only 8 Republicans are up for election in the 2018 Senate elections, so it is completely impossible for Democrats to take 2/3rds of the Senate without, like, massive deaths, resignations, or defections from the Republicans). Still, I could imagine that if it got that far, some non-trivial number of Republicans might defect.

            Nonetheless, I think that the current odds on Trump being out of office (or dead) by 2018 on Metaculus are too high.

          • sandoratthezoo says:

            I think that my prediction is something like:

            chance that he dies of natural causes: 1-2%

            chance that he experiences health problems severe enough that they force him to step down without dying: 1-2%

            chance that he flounces out: 0.1%

            chance that he has some health problem severe enough that it makes a plausible excuse for him to flounce out and he takes the excuse: 1%

            chance that he is impeached by a Republican congress because strong evidence comes out that holy shit he actually straight up did something grotesquely illegal: 2%

            chance that he is impeached by a Republican congress because he does a bunch of shit that suggests that he’s guilty as sin and is trying to cover it up (fires Mueller or obviously obstructs the investigation): 1%

            chance that he is impeached by a Democratic house and confirmed by a Republican senate on weaker evidence: 0.5%

            So if those are basically independent odds, and taking the least-Trump-friendly versions of those odds: 0.98 * 0.98 * 0.999 * 0.99 * 0.98 * 0.99 * 0.995 = 91.7% that Donald Trump is the president as of February 2019.
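
            A quick way to sanity-check that arithmetic (a minimal sketch in Python; the probabilities are the “least-Trump-friendly” ones from the list above):

            ```python
            # Treat each removal scenario as independent and multiply the
            # per-scenario survival probabilities.
            removal_probs = [0.02, 0.02, 0.001, 0.01, 0.02, 0.01, 0.005]

            survival = 1.0
            for p in removal_probs:
                survival *= 1 - p  # chance this particular scenario does NOT happen

            print(round(survival, 3))  # 0.917, i.e. ~91.7% still in office in Feb 2019
            ```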

      • I think there’s a fair chance he could get fed up and flounce out. Am I alone?

        • ManyCookies says:

          I think he’d be too proud to outright quit, though he could definitely pull a Coolidge and just let his VP/staff/cabinet run things.

        • Paul Brinkley says:

          You’re not alone. I’ve thought the same thing, along the lines of the Chief Executive Office not working like a private chief executive office. The culture clash might be enough to drive him out, a la Jim Webb.

        • Conrad Honcho says:

          My model of Trump has him slightly infatuated with his own name. Going down in history as “President who talked big and quit” does not fit that model.

    • Glen Raphael says:

      PredictIt currently thinks Trump has only a 2-in-3 chance of being president at the end of 2018. Buying opportunity!

    • meltedcheesefondue says:

      Feels reasonable to me. Remember we’re not talking about current house/senate, but the house/senate after a year and a half and a new election.

      If Democrats get the house, that means subpoena power, including likely for Trump’s taxes. Then all that’s needed is something unambiguously illegal – possibly Russia-related, much more likely something financial. It’s more about how it resonates than anything else. Then if Trump does something dramatic and stupid in response – pardon a family member or himself? – it could destroy any pretences and force the Senate to act.

      That, combined with the possibility of extra Trump stupidity in the meantime, means a 30% estimate seems fair to me.

  22. AeXeaz says:

    Creepy. A world of rationalists and autists using prediction markets to optimize their lives for a crowdsourced utility function which perfectly fits the aggregate but never an individual.

    The focus on AI risk in the rationalist community makes perfect sense, actually, but it should be directed inwards.

    • justanotherlaw says:

      I think moving to Ward Street has made Scott more productive. Or perhaps he hasn’t started working full time yet?

      • Debug says:

        In a separate post (or perhaps on the subreddit) he confirmed that he has yet to start working since the move.

  23. justanotherlaw says:

    But I’m still surprised there aren’t consultant superforecaster think tanks hanging up their shingles.

    There are, or at least there are superforecasters trying. Good Judgment Inc is kind of a superforecaster-run consulting firm.

    In addition, several superforecasters are involved in finance, and I believe at least one runs their own hedge fund. It’s also not clear how super superforecasters are – my impression is that they’re merely smart people who were placed in an environment where they were incentivized to be accurate, and that many hedge funds and prop shops already internally implement a lot of the guidelines that are suggested in Superforecasting.

    I think it’s true that prediction tournaments are better than not aggregating information, or aggregating information by passing around lengthy internal memos. (Certainly better than passing around dubiously-sourced articles on social media. And prediction markets, or tournaments with smarter aggregation rules, should be significantly better than modern tournaments, at least in theory.) But I think one possible contributor to why prediction tournaments have not caught on is simply that current techniques don’t work that well – their superiority over getting together a team of smart people with some sane incentives is not sufficient (or at least not clearly sufficient) to overcome their novelty and prestige issues.

    • kruemelmobster says:

      You’re right. Besides Good Judgment Inc., which offers to host questions on either Good Judgment Open or the separate private superforecaster platform (or both), there are a few other superforecasters-for-hire pages. There’s http://www.super-powered.com and http://www.alephinsights.com (consultancies) and http://www.smart-forecast.com (a ratings agency for forecasts). Moreover, a stock club by and for superforecasters exists. And there are two podcasts run by superforecasters, the NonProphets pod and Cognitive Engineering.

  24. DGW says:

    New Zealand had a pretty good set up for a few years until it was banned: https://en.wikipedia.org/wiki/IPredict

    • sflicht says:

      Disagree. I was a user of that site briefly and it was terrible. Ramshackle UI and a laughable attempt at automated liquidity provision.

  25. Ivy says:

    I find Hanson’s and Tetlock’s work on prediction markets and aggregators very exciting, but have never participated in one, and might be representative of the marginal user that Metaculus would try to attract.

    I think the biggest piece missing for me is incentives, either financial or reputational along the lines of a StackOverflow points system or GitHub commit history.

    Making good predictions takes a lot of time. I need a good reason to justify spending that time. If not significant financial returns, at least some badge that lets me credibly signal “I am an unusually good predictor of the future for hard, important questions in X domain” to employers, collaborators, or internet people I get into arguments with.

    But without a prediction market, figuring out which questions meet the “hard, important” bar is a central-planning-complete problem. You can design criteria like “well calibrated” or “correct 60% of the time against consensus on questions with at least 10 other people predicting”, but these will be incommensurable between people who answer different sets of questions.

    • justanotherlaw says:

      It’s interesting, because both Metaculus and GJOpen (which grew out of Tetlock’s work on IARPA) have reputational mechanisms. Metaculus has a cumulative score, while GJOpen tracks your average Brier score and how much better you do than the average forecaster who forecasted on the same questions on the same days you did. Clearly these are not perfect scoring mechanisms and can be gamed, but I think it’s a start.
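
      For readers who haven’t seen it: the Brier score is just the mean squared error of your probability forecasts, so lower is better and an always-say-50% forecaster scores 0.25. A minimal sketch (numbers invented; GJOpen’s same-question, same-day comparison is more involved):

      ```python
      def brier(forecasts, outcomes):
          # forecasts: probabilities assigned to "yes"; outcomes: 1 for yes, 0 for no
          return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

      me    = brier([0.9, 0.3, 0.7], [1, 0, 1])  # my forecasts on three questions
      crowd = brier([0.6, 0.5, 0.5], [1, 0, 1])  # average forecaster, same questions
      print(me, crowd)  # 0.063... vs 0.22 -- lower, so I beat the crowd here
      ```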

  26. Nornagest says:

    Saying that Robin Hanson attributes something to signaling and prestige is kind of like saying that Joseph McCarthy suspects the influence of Communism.

    • Scott Alexander says:

      Mostly true, but I think this was the thing that started him being interested in signaling, so it escapes that problem.

    • Quixote says:

      I love this comment. The phrasing is perfect and I just wanted to say +1.

  27. RohanV says:

    Maybe knowing what will happen is easier than executing correctly on that knowledge. How many events truly blindside people, especially people in the relevant industry?

    I’m kind of reminded of Microsoft in the 1990s. They knew the Internet would be important, and they took steps that they thought would help them, but it didn’t work out. Was the problem lack of foreknowledge, or inability to execute on that foreknowledge?

    I think that’s what the prediction markets are missing: someone who conclusively demonstrates how to leverage that knowledge.

    • Scott Alexander says:

      Isn’t this just a prediction subproblem? “We know the Internet will be big, but we don’t know whether to support HTTP or Gopher”. Sounds like a problem a prediction market could help with. “Okay, it’s HTTP, but we don’t know whether paid subscriptions or ads will be more monetizable”. Well…

      • RicardoCruz says:

        Exactly. In reinforcement learning (e.g. training AI to play video games), decision problems are treated as prediction problems. The “AI” predicts the probability of each move being successful in attaining the goal and acts on the one it judges as most successful.
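
        In miniature, that decision rule is just an argmax over predicted success probabilities (a toy sketch; the hard-coded numbers stand in for a learned value function):

        ```python
        # Toy "policy": score each available move by its predicted probability
        # of attaining the goal, then act greedily on the best one.
        predicted_success = {"left": 0.20, "right": 0.55, "jump": 0.70}

        best_move = max(predicted_success, key=predicted_success.get)
        print(best_move)  # "jump" -- the move judged most likely to succeed
        ```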

      • baconbacon says:

        No, it’s not just a prediction subproblem, because a large issue is about things that barely exist or don’t exist at all. If you asked in early 1997 who would be the dominant search company you couldn’t phrase it as “Askjeeves, Altavista or Google” because Google didn’t exist yet. The best you could do is something like “Askjeeves, Altavista or some future startup”. There is going to be a pretty short window (if there is one) where Google is both acquirable and in the consciousness of your forecasters.

        Also there is the confounder that perhaps Google wouldn’t have become so dominant if it had been acquired by a major company like Microsoft.

        • AnthonyC says:

          So? Yes, framing matters and there are things no one knows yet, but is that worse than the failure rate you get without prediction markets?

        • Deiseach says:

          If you asked in early 1997 who would be the dominant search company you couldn’t phrase it as “Askjeeves, Altavista or Google” because Google didn’t exist yet.

          Oh, this takes me back! When I was first doing keyboarding skills/IT skills courses, and the instructors would tell us about “search engines” and recommend ones like Ask Jeeves, Lycos, and the rest.

          I’d heard online about this thing called “Google” which I tried and found fantastic, much better than the Official Recommendations, and I’d proselytise about “You should try this new one called Google, it’s great!”

          And in response there would be a lot of “Hmmm, yes, well, it’s not on the textbook handouts and the top-ranked ones are Ask Jeeves etc.” 🙂

          This article is great – I had totally forgotten about DogPile! One of my instructors loved DogPile!

      • Paul Brinkley says:

        It could also easily be a timing problem. Sun Microsystems pushed the “the network is the computer” idea for years, committed to server room hardware, had their lunch eaten by local hardware models until Oracle bought them up, and now here we are, touting cloud this and cloud that… Sun arguably could have been fine had they been able to hold out ten more years, or waited ten years before committing.

    • I think the Microsoft problem is typical, and it is only survivorship bias that makes it look otherwise.

  28. Douglas Knight says:

    Didn’t Robin Hanson claim that a fair number of companies had tried internal prediction markets, found that they worked, and abandoned them?

    I think typical decision-makers want to know what *will* happen, and 60-40 or 75-25 or even 80-20 is not the certainty they want. In the face of this uncertainty, I think people mentally substitute a different question to which they feel that they have some sort of good instinct, or privileged understanding.

    Yes, I think a lot of binary questions are bad for those reasons. But I think the typical commercial applications are for continuous variables, like ship dates or sales forecasts, which should be easier to understand.

    • Lirio says:

      So my Boyfriend posited an interesting idea: If prediction markets work, then they would diminish the value and prestige of company executives, who are the very people who will decide whether or not to implement them. Now in a free market one supposes that eventually some company will care more about the value of accurate prediction over the prestige of its executives, and then use the competitive advantage to attain market dominance, forcing everyone else to catch up. However until that happens, it’s unlikely that executives will implement measures that they feel threaten their positions.

      Which I guess is what Robin Hanson was getting at.

      • Douglas Knight says:

        That’s an explanation of why they don’t adopt it in the first place, but I don’t think it makes sense as an explanation of why they abandon it. Do they learn something using it? Does it diminish their prestige more than they expected?

        • Lirio says:

          Most likely they thought that these markets would be useful tools that would enhance their abilities as decision makers. When it became evident that the markets could in fact replace them as decision makers, at least in some domains, they did not seem quite as appealing. So they discontinued them for spurious but plausible-sounding reasons.

          This probably did not happen at a conscious level. People are very good at rationalizing reasons why things that are in their self-interest are in fact in everyone’s interest. They are also very good at convincing themselves this is the case, because the most effective lie is one you believe in.

        • justanotherlaw says:

          I remember hearing that Google cancelled their prediction markets at least in part due to OPSEC reasons. After all, you didn’t want to make so much information on project dates and new programs available to everyone inside of the company, many of whom might be tempted to leak it.

          I’ve also heard from Tetlock that managers have found it difficult to motivate people in the presence of prediction tournaments – either it was a sure thing, in which case why bother, or it was sure to fail, in which case also why bother. The tournaments created self-fulfilling prophecies, in other words, and I’m sure you can imagine how you wouldn’t want those in your company.

          • Steve Sailer says:

            The long-running Hollywood Stock Exchange for predicting movie box-office grosses encourages the use of insider information:

            http://takimag.com/article/betting_on_the_hollywood_stock_exchange_better_play_roulette/print#axzz4opn1Vslv

            This became a problem in 2010 when HSX’s owner, Cantor Fitzgerald, tried to get permission to turn betting on movie grosses for pretend money into a real futures market using real money. Congress eventually outlawed movie futures.

            It would be kind of silly for the SEC to have to try to police insider trading about movies, since Hollywood depends upon nonstop gossip for its decisionmaking. Heck, I learned a lot of stuff from my late stuntman neighbor about which seemingly rising stars were unreliable due to a drinking problem or whose careers were unlikely to thrive in larger roles because they weren’t as masculine as they had managed to seem in early small roles.

            For example, my son made a killing on HSX in fake money a few years ago by betting on Chris Pratt and upcoming movies he was in. This was largely a bet on whether Chris could shed his extra weight and develop a leading man’s body. That’s a pretty public process if you happen to live near Chris Pratt. So should the SEC police gossip so that investors in Mobile, AL have a level playing field with people who live near Pratt and his wife Anna Faris?

            All this is a real ball of twine.

          • Robin Hanson says:

            I can’t reply directly to Steve, so I’m doing this to reply indirectly. Yes, there would have been insider trading in HSX, but no, that wasn’t a problem. Congress outlawed it due to pressure from the industry; there was no legal or procedural problem otherwise. It was the CFTC who regulated HSX, not the SEC, and they don’t worry about insider trading.

        • Robin Hanson says:

          A common trigger is when a market contradicts a high status person, and is proven right, embarrassing that person. For example, there’s a project with a deadline and the project lead says yes we’ll make it, and the market says no, with high confidence, and then no they don’t make it. That’s when they kill the market.

      • Paul Brinkley says:

        Now in a free market one supposes that eventually some company will care more about the value of accurate prediction over the prestige of its executives, and then use the competitive advantage to attain market dominance, forcing everyone else to catch up. However until that happens, it’s unlikely that executives will implement measures that they feel threaten their positions.

        Something about this argument feels not quite right. I think executives don’t necessarily think of themselves as a group; any one of them will not reflexively support the group of executives as a whole. To wit, any one of them will be competitive enough to seek accurate prediction and gain more market share at the expense of executives of companies in the same industry.

        Or maybe there’s a bit of camaraderie going on even so. They network, after all. Maybe a more accurate cut on this is that executives will undercut competing executives, but gladhand and network cheerfully with non-competing executives, and so here, they’ll back away from prediction aggregators out of support for their non-competing executive fellows, which outnumber their competitors?

        • Lirio says:

          This is not a coordinated action; this is a bunch of people individually seeing a threat to them personally and deciding to shut it down. It makes sense that the executive put in charge of creating a better forecasting method was already generally in charge of forecasting. So on finding that the prediction market doesn’t enhance his expertise so much as replace it, he goes and convinces the CEO that it’s no good. The CEO himself might also notice that the prediction market could make some of his decision-making functions redundant, and so is well inclined to agree. Unless there is competitive pressure from other companies which are using such markets, there’s not a lot of incentive to stick with them if they’re a threat to your prestige.

          Also as pointed out by others, prediction markets make it difficult to keep company secrets, and small groups of professionals outperform large groups of amateurs anyway.

          EDIT – Since I started on this thread due to my Boyfriend’s thoughts on the matter: he has on further consideration concluded that opsec and the effectiveness of small professional groups likely play a bigger role than executive ego, though it’s still a factor. I’m inclined to agree with him on this.

          • Deiseach says:

            I imagine it might be more down to “Who cares what Igor in the mailroom thinks is the winning marketing strategy?” and when they look at who the best predictors are, it turns out to be the guys in marketing/R&D/logistics anyway, who they already have working on the problems as part of their day-to-day work, so the prediction market doesn’t really give them that extra value worth the money.

            If it turns out the canteen staff, receptionists, and maintenance staff do really well on predictions then great, but how often does that happen versus “the best predictors are the guys already doing this stuff anyway”?

            What I mean is, the more and better information you have, and the nearer the question is to your area of expertise (whether professional or “it’s an interest of mine”), the better your prediction is going to be. So Marjorie the receptionist is not likely to have as good an understanding of the “which is the better marketing strategy?” as Philip the MBA, so the better predictions come from the ‘experts’, and then the company asks “But isn’t this what we’re paying these guys for anyway?”, in which case a prediction market isn’t doing anything more for them than they’re getting already.

          • Steve Sailer says:

            Right, for example, if the national sales manager is making worse sales forecasts than the six regional sales managers who report to him, then most firms will probably eventually fire the top guy and replace him with one of his underlings.

    • Chalid says:

      I feel like an obvious issue with corporate internal prediction markets is that you don’t want to give employees incentives to make projects fail. How might such an issue be addressed?

      • Chris Hibbert says:

        The usual answer is to make the stakes be trivial. People play many play money or small stakes prediction markets for the fun and the (small) glory. If contributing information to the company’s pool of predictions is also good for the company, then it’s all to the good, right?

        I’ve heard details of several internal markets, and most gave either small prizes (tee shirts and nominal Amazon gift certificates) or just had a leader board and gave no rewards at all.

        • Chalid says:

          Is there any reason to think that a prediction market with only trivial stakes will be any good?

        • Robin Hanson says:

          You can make the stakes higher if you give traders a lopsided initial endowment. For example, give everyone “$100 if project succeeds” to start, then let people bet that up or down, but no one is allowed to go below $0 if the project succeeds. Then everyone trading has an incentive to make the project succeed.
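
          A minimal sketch of that floor constraint (trade mechanics simplified; the point is just that no position may go negative conditional on success):

          ```python
          class Trader:
              def __init__(self):
                  self.if_success = 100.0  # endowment paid only if the project succeeds
                  self.if_failure = 0.0

              def bet_against_success(self, amount, odds):
                  # Selling "success": lose `amount` on success, gain amount * odds on failure.
                  if self.if_success - amount < 0:
                      raise ValueError("rejected: would create an incentive to sabotage")
                  self.if_success -= amount
                  self.if_failure += amount * odds

          t = Trader()
          t.bet_against_success(60, odds=1.5)    # fine: still +$40 if the project succeeds
          # t.bet_against_success(50, odds=1.5)  # would go below $0, so it is rejected
          ```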

        • John Schilling says:

          Trivial stakes do not motivate people to engage in the level of effort that superforecasting requires, and in this context the largest gain one can plausibly achieve with an initial $100 stake will be trivial.

          If there’s also some more substantial but intangible gain, like a large amount of status accruing to the guy whose prediction-market balance is seen to have reached $500, that might do it, but it also might get back to the “motivated to make projects fail” problem.

    • Steve Sailer says:

      I built internal sales forecast systems for a couple of firms about 25 years ago, and they functioned like structured prediction markets.

      Lower level people with sales responsibility listed all the potential deals they were working on, their dollar value, and the probability of closing them this quarter. They submitted them to their bosses, who would review them, and revise them after discussions with underlings, then submit them to the national sales manager. He’d discuss them with the regional bosses, then submit the national sales prediction to the COO.

      All the numbers would get updated weekly.
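
      The aggregation itself was nothing fancy, essentially a probability-weighted sum rolled up the hierarchy (a minimal sketch; offices and numbers invented):

      ```python
      # Each deal carries a dollar value and the rep's probability of closing it
      # this quarter; the forecast at any level is the probability-weighted sum.
      deals = [
          {"office": "Manhattan Beach", "value": 200_000, "p_close": 0.75},
          {"office": "Manhattan Beach", "value":  80_000, "p_close": 0.50},
          {"office": "Phoenix",         "value": 150_000, "p_close": 0.30},
      ]

      def forecast(deals):
          return sum(d["value"] * d["p_close"] for d in deals)

      print(f"Expected quarterly sales: ${forecast(deals):,.0f}")  # $235,000
      ```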

      How well did this work?

      Better than not having a system, or one just based on finger-to-the-wind guesswork.

      Much of the value was that it made sales forecasting not a black box. All the working parts were visible to everybody on the line side of the business. People who routinely made bad forecasts of how much business they would close that quarter were upbraided or worse if they kept it up.

      Would it have helped forecast accuracy to get the opinions of more staff people as well as line people?

      I dunno. I was the staff guy who built the system and ran it until I handed it off to underlings, so I knew all the numbers. But I wasn’t really that interested in whether the Manhattan Beach office had a 75% chance or only a 50% chance of closing that $200,000 deal this quarter. That was a line responsibility.

      A big part of the success of prediction markets is that a lot of people are interested in being able to boast about predicting elections, sports, or Academy Awards correctly. But a lot of business forecasting isn’t very interesting to anybody who lacks line responsibility.

  29. Evan says:

    Here’s my excuse. I eagerly hopped on the bandwagon of forecasting and prediction markets, and tried talking to lots of rationalists and effective altruists about it over the last couple years. It’s one of those ideas that hardly anyone in the rationality community besides Robin Hanson actually took up and did something about. You, Scott, have blogged about it too. Frankly, I expect it wasn’t being said by the right people in the community in the right way to be taken seriously. The rationalist diaspora is hard to read these days, and to the extent the rationality community runs on a heuristic of concluding the next big idea must be what everyone else in the community is currently talking about, it’s hard to tell why some memes get taken more seriously than others.

    There was a time last year when the rationality community’s interest in Kegan levels spiked for a while. I wish I knew how to get people excited about prediction markets and forecasting the way they get excited about Kegan levels. Anyway, I gave up on trying to get people to care about forecasting and prediction markets, as I wasn’t well-placed to do something all by myself, and I was hoping Metaculus would take off. Maybe that’s happening now. I’ll believe people will stop making excuses for not doing anything about forecasting when I see it, though.

    • Scott Alexander says:

      I’ve been trying to think what kind of prediction market/aggregator might catch on more in general. I think you might be able to get one to snowball if it had the following features:

      1. Anybody could post a question to be predicted
      2. Predictions are scored in some kind of market-like way, where the successful people’s winnings come out of the pockets of the unsuccessful people, and you get more credit for getting counterintuitive things right (as measured by everyone else disagreeing with you) than for obvious things.

      PredictionBook has 1 but not 2. It had a great economy of people putting in silly things like “My relationship is going to last more than a year” or “There will be a twist ending to the latest Unsong chapter”, but there wasn’t a meaningful way to aggregate a person’s record across all of their predictions. If you wanted to look like a “good” predictor, you could just add “the sun will come up tomorrow”, predict 100%, and look great. Without knowing who the best predictors were, you couldn’t do any post-hoc adjustments to make things like superforecasting possible.
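
      Feature 2 is what closes that loophole: score each prediction relative to everyone else’s on the same question, and a 100% on “the sun will come up” earns nothing. A minimal sketch using a log-score difference (illustrative only – Metaculus’s actual scoring formula is different):

      ```python
      import math

      def relative_score(my_p, crowd_p, outcome):
          # Positive iff I gave more probability to what actually happened
          # than the crowd did; sure things everyone agrees on score ~0.
          mine  = my_p if outcome else 1 - my_p
          crowd = crowd_p if outcome else 1 - crowd_p
          return math.log(mine) - math.log(crowd)

      print(relative_score(0.99, 0.99, 1))  # "sun will come up": 0.0, no free points
      print(relative_score(0.80, 0.30, 1))  # correct contrarian call: ~ +0.98
      ```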

      Metaculus has 2 but not 1. I wouldn’t be surprised if it gets good answers to the things it’s asking, but it’s never going to be the same kind of fun tool for little real-life questions that PredictionBook is, and it probably doesn’t have the same potential to encourage community participation.

      Is there anything with both 1 and 2? Does combining these have as much potential as I think?

      • Lirio says:

        Obviously, we need to get a superforecaster on the job of predicting which kind of prediction aggregator is most likely to catch on.

      • Sanchez says:

        Maybe a model like Steemit, where people are paid in a cryptocurrency for posting questions, voting on questions, and making correct predictions, and where this cryptocurrency derives its value from giving holders extra voting power.

      • justanotherlaw says:

        I think you might be underestimating the difficulty of implementing 1 while keeping the tournament fair. Phrasing and resolving questions takes a lot of effort on the part of the organizers, and I think it would take at least two clever ideas to automate away this labor (or at least a lot more investment into hiring people to oversee tournaments and markets).

        Though note that the people at Augur and Steemit, among others, think they’ve basically solved the problem, so maybe I’m grossly miscalibrated.

        • Scott Alexander says:

          I’m worried the cryptocurrency aspect of Augur makes it too high-barrier-to-entry compared to a meaningless point system like Metaculus. And maybe I’m missing something, but Steemit doesn’t seem to be a prediction market at all.

          • justanotherlaw says:

            You’re right on Steemit not being a prediction market. Whoops. My bad.

            I think it would still be interesting to see what Augur’s solution to question resolution is, and to see if it can be salvaged for a point-based system.

      • tmk17 says:

        Augur is a prediction market on Ethereum. Anyone can pose questions and anyone can bet. It’s decentralized, so it can’t (easily) be shut down by a government. It’s currently in beta with a release expected this year, I think. You can already use it with toy money.

      • Deiseach says:

        Somebody has to ask the stupid questions around here, and that somebody is me!

        Has anybody compared the kind of novelty bets bookmakers offer to prediction markets? You can bet on politics, current affairs and other things. These sound like very crude prediction markets and they’re open to every idiot willing to risk a few bob. If you track something like “who will resign first – Theresa May or Jeremy Corbyn?”, watch how the odds change, and see whether the final result lines up with where the money went, wouldn’t that give you some kind of coarse data to work with about “yes the public can make predictions” or “it would be an absolute disaster to run the government and the economy on the results of prediction markets unless those markets were confined to a set of experts”?

        • rlms says:

          Betfair basically *is* a prediction market for some topics (individuals can lay bets as well as make them normally). AFAIK, it’s generally quite accurate. However, prediction markets aren’t only interesting because of their potential to predict things well. There is also the interesting possibility of using them to govern/manage countries/companies/whatever by trading “the ‘value’ of the country/whatever conditional on policy one” against “the ‘value’ conditional on policy two” (where the value is calculated with some complicated formula featuring things like GDP per capita).
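
          The decision rule at the end of that is simple enough to fit in a few lines (a toy sketch with invented numbers; real proposals void the trades on the branch not taken):

          ```python
          # Price of a contract paying out some value metric, conditional on each policy.
          conditional_prices = {
              "GDP per capita | policy A": 52_300,
              "GDP per capita | policy B": 51_100,
          }

          chosen = max(conditional_prices, key=conditional_prices.get)
          print("Adopt:", chosen)  # pick the policy the market values more highly
          ```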

        • imoatama says:

          As someone who trades on PredictIt, but is always keeping an eye on Betfair for Dutch-book opportunities, I can say that *most* of the time they tend to line up with each other. I haven’t seen research on the performance of either in terms of predicting outcomes, but I would be surprised if it was much different from the large body of work developed by Tetlock and others showing good performance vs. other (non wisdom-of-crowd) methods.

        • 1soru1 says:

          The basic problem with paying attention to Betfair-style markets is that one person can spend a few thousand to move the market in such a way that they can make millions from the way real financial markets react to that movement.

          There is some evidence that actually happened ahead of Brexit.

      • anthonynaguirre says:

        Scott,

        Metaculus has 1, in that anyone can suggest a question. It then has to be moderated and approved before going live, though. This is important, as most people suggest questions that are vague, ambiguous, or not fleshed out.

        We’re also planning a “second tier” of questions that would be private, like PredictionBook, where you get to resolve a question yourself and open it (or not) to others to predict. (This is actually 2nd on our to-do list of major features.) But you’d not get “leaderboard” points for that, just better self-knowledge and skill 😉

        • Evan says:

          Is there a process by which questions can be heuristically evaluated for how resolvable they are? A friend and I have been talking about this. It’s Andrew McKnight, if you know him. If Metaculus has a process for approving questions beyond just moderator intuition, and you’d be willing to share it with us, we could share some of our ideas on how to improve it. Alternatively, we had some ideas in mind for how to reliably generate good forecasting questions. If you tell us what roadblocks you run into while trying to get good questions, Andrew and I can try incorporating a solution or patch for that while brainstorming. I think the key is to think of it like an algorithm, though we haven’t figured out exactly how to transform that into a website yet.

          • anthonynaguirre says:

            @Evan Happy to talk more; yes, I’ve met Andrew. Why don’t you get in touch directly (anthony@metaculus.com will work) and we can discuss.

          • jakubsimek says:

            From Tetlock’s Edge Masterclass – first the question needs to pass the “clairvoyant test”: an oracle who could see the future on an exact date needs to be able to confirm whether your prediction came true. So, for example, the question needs to have a deadline and avoid ambiguous words.

            The second problem is how to cluster questions together to allow for some granularity.

      • Sam Reuben says:

        I’m just going to say, this sounds an awful lot like how the stock market works, given that we understand the stock market to just be asking “how well will this company do?” over and over again. The stock market is not known for being unusually sane and reliable, and I think some of the issues might carry over to betting-oriented prediction markets.

        • disciplinaryarbitrage says:

          I am not an expert on either stock markets or prediction markets/aggregators, but it seems to me like the latter has a couple advantages. For one, the ease of making a ‘prediction’ in a stock market is asymmetric on long vs. short: if you think a stock is undervalued you just buy it, but if you think it’s overvalued, shorting it is a more costly transaction. More abstractly, the fact that stocks never really resolve to their ‘true’ value means that (most) traders aren’t really asking “how well will this company do in the long run?” so much as “will this company, in the near term, do better or worse than the market’s expectation, as expressed by the price?” If you’re an active trader, you never get feedback on the true underlying value of an asset you’re trading; you just get feedback on whether your predictions of near-term movements in the value of that asset were correct or not. This (naively?) seems liable to add volatility relative to prediction markets/aggregators where your predictions regularly resolve to yes/no.

          • Sam Reuben says:

            These are all fair and reasonable points, most especially the lack of “feedback on the true underlying value.” That, if anything, sets prediction markets apart, and comfortably makes the case that they’ll operate differently.

            At the same time, there’s one element that’s the same in each, and it’s the tendency of humans to mold into crowds and mobs at the drop of a hat, often doing quite irrational and destructive things in the process. Just as the stock market is vulnerable to “noise” in its signalling of market value, with people buying and selling massive amounts of stocks based on rumor, so too might a prediction market.

            Let’s say, for example, that we have a developed prediction market with a cadre of extremely reliable predictors. They all base their accurate predictions on a given set of knowledge, especially on which sources to believe about things. Suddenly, one of their sources is compromised, or they receive a rumor which seems highly convincing to them for whatever reason. They make predictions accordingly, using their same system, and get those predictions catastrophically wrong. If the prediction market is a game, that’s no big deal, but if it’s something we’re actually using to predict things with (which is, I believe, the goal), we’ve got a major problem on our hands.

            To sum up, the idea is: these predictors aren’t good because of magic, or because of absolute understanding of the subject in question. The first is absurd, and if they had the second, we wouldn’t be wasting our time with a prediction market, we’d have known about them already and just be asking about things directly. They’re good at prediction because they have a very reliable system of shortcuts to figure out what’s likely to happen. Unfortunately, every system of shortcuts has its vulnerabilities, places where the shortcuts lead you off a cliff. (This is where the standard criticisms of adages come in: haste makes waste, unless you’re an ambulance driver.) As such, we can be decently confident that there’s going to be some incidents where even reliable prediction markets can have the equivalent of the 1929 stock market crash. That’s what I’m getting at with the comparison.

          • disciplinaryarbitrage says:

            @Sam

            we can be decently confident that there’s going to be some incidents where even reliable prediction markets can have the equivalent of the 1929 stock market crash

            Fully granted, but this sounds like making the perfect the enemy of the good. No one’s saying that a prediction market or prediction aggregation system is going to be impervious to bad information or some level of groupthink, but it might be a little better than other options.

            The 2016 election obviously was a massive prediction error for pretty much everyone: prediction markets were at about 75/25 for Clinton, 538 was about 70/30, but many forecasters were much more confidently predicting Clinton at 90-99%! Most of the people forecasting a Trump win did not come off as voices of reason. And yet here we are.

            It’s worth noting that while prediction markets were pretty far off on the election (about 75/25 per PredictIt), they were less confidently wrong than forecasters and poll aggregators collectively (about 90/10 per this report). From a certain standpoint, wrong is wrong; on the other hand, an organization that has various options for hedging against consequences of uncertain outcomes can benefit from better-calibrated predictions even if they miss badly once in a while.
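
            To put a number on “less confidently wrong” (a minimal sketch; the event “Clinton wins” resolved to 0):

            ```python
            def brier_penalty(p_clinton):
                return (p_clinton - 0) ** 2  # squared error when the forecast event fails

            print(brier_penalty(0.75))  # 0.5625 -- prediction markets
            print(brier_penalty(0.90))  # ~0.81  -- forecasters/aggregators per that report
            ```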

            (On a side note, I wonder if a big factor in why prediction markets have not taken off is that many organizations have incentive structures that favor leaning into risks rather than hedging against them. If there’s a 30% chance our competitors beat us to market with a new product, a risk-neutral organization might want a contingency plan, but the CEO would look weak and indecisive running around asking managers “what’s our plan B?”)

          • Lirio says:

            On the subject of the 2016 election, it’s important to note that nearly everyone was using their prediction of the popular vote result as a proxy for predicting the election result. Keep in mind here that the 2000 election was widely seen as a fluke caused by a poorly designed butterfly ballot, and the last EC-only victory before that had been 112 years prior. As far as I know, only 538 was really bothering to keep track of the effect of the Electoral College, pointing out that Trump had an advantage there, and saying shortly before election day that Clinton was within polling error of losing. Nonetheless, like everyone else, they were weighing the popular vote much more heavily.

            So if you look at it from the perspective of how pundits, forecasters, and prediction markets predicted the popular vote, you’ll find that they were actually pretty on the money. In fact 538’s final popular vote prediction was 48.5% Clinton, 45.0% Trump, and 5% Gary Johnson. Actual results were 48.2% Clinton, 46.1% Trump, and 3.3% Johnson. They did only put 10.5% on Clinton winning the popular vote but losing the Electoral College, but as mentioned there were good reasons to have a low prior on that.

            The lesson to be learned here is that prediction markets and aggregators can fail when everyone is using a proxy for a thing rather than the actual thing, and circumstances conspire to make that proxy no longer useful. Generally proxies are used because they are easier to measure, and indeed an Electoral-College-only victory presents a difficult prediction problem because it can only happen when a handful of key states are within polling error of each other. Which means that if people had been weighing the Electoral College correctly, they would not have predicted a Trump win, but “too close to call, odds slightly favour Clinton”. This doesn’t really feel like much of a prediction, even though it certainly is one.

            It certainly would be interesting to see into an alternate universe where Clinton won. Currently I attribute the election result to basically random variation, but alt-me almost certainly attributes it to Trump never having had a realistic chance at all, and would be heaping strong praise on forecasters and predictors. It would still be true that they were setting themselves up for failure by using a proxy measure, but since the failure didn’t happen, the failure mode would remain invisible.

          • Sam Reuben says:

            @disciplinaryarbitrage

            Perhaps I’m making my point too strongly. Rather than trying to say that prediction markets are dangerous and shouldn’t be used, I’m trying to say that they seem like they ought to be approached from the mindset that they have some catastrophic failure points. Knowing that they have these allows safeguards to be used, which is the difference between the massive numbers of nuclear reactors which have never had a problem in their careers and, say, Chernobyl.

            I think Lirio has said some great things about the recent presidential election, so I’ll leave it at that. The CEO idea is speculative, and I can’t provide good feedback on it without going further into speculation.

          • FeepingCreature says:

            I think stock markets just need an easier way to implement the “Buy U\X” transaction.

          • Deiseach says:

            But if you’re opening your prediction market to the general public, then how do you prevent, or take into account, that Joe Ordinary looks at the question, maybe hasn’t a strong opinion one way or the other on it, looks at where the money is going, and decides “okay, everyone is betting that X will happen, that’s where I’ll put my money too”? That seems to me to introduce distortions into finding the ‘true’ value of the question.

            And if you confine your market to a cadre of experts, then I think the grandiose hopes of “we will set government policy for the nation/the economy by the results of the prediction market” will never happen. The public may accept ministers setting foreign policy because they voted that government into power, but a bunch of unelected guys making money off “is it the right time to bomb Foreignstan back to the Stone Age or not?” is not going to look appealing to anyone.

          • baconbacon says:

            But if you’re opening your prediction market to the general public, then how do you prevent, or take into account, that Joe Ordinary looks at the question, maybe hasn’t a strong opinion one way or the other on it, looks at where the money is going, and decides “okay, everyone is betting that X will happen, that’s where I’ll put my money too”? That seems to me to introduce distortions into finding the ‘true’ value of the question.

            If all the money is going one way then you get better odds to bet the other way. Unless there is a strong bias for Joe Ordinary to follow the crowd that cannot be overcome with disproportionate payouts this isn’t an issue.

        • Douglas Knight says:

          Stock options and commodity futures have definite endpoints, so their accuracy can be easily measured, and they’re pretty good.

          • Chalid says:

            I think bonds are a better example than stock options. Stock options are about market psychology every bit as much as stocks themselves are, whereas bonds in general more greatly reward careful analysis of the company.

      • kominek says:

        For context: I was in the top 3 of the SciCast leaderboard towards the end of its life, and in the top 10 for quite a while.

        I don’t think #1 is workable, at least in a market setting. Good questions for a market are much harder to write than you’d think. You’ve got to divide the space of all possible outcomes into a small number of bins, and then convey to a bunch of different people how that division was performed.

        For instance, I just looked at Metaculus again and one of the top questions was about whether or not Britain’s exit from the EU is ratified by a certain date. OK, seems straightforward. But what happens if the negotiations result in the creation of some new category of EU-relatedness that looks a lot like being in the EU, but isn’t technically? How do you resolve the question? If that’s not all addressed up front, and something odd happens, somebody is going to be upset at the end.

        SciCast had trouble with a question about the robotic exoskeleton being used to kick a soccer ball at the opening of the Brazil World Cup. It was asking how many meters the ball would travel, and had categories like “0-1”, “1-10”, “10-25”, “25+”. Well, what happened was that the ball was kicked, travelled a bit, and then a kid ran up and grabbed the ball before it came to a rest. What’s the resolution?

        If you just retroactively void the question, so that the trades on it never happened, then parties (like me) who played both sides (and potentially had completely exited their position on the question!) could lose money.

        You can take suggestions from anyone for questions, but they need to be written by someone who knows what they’re doing. And they probably need to be assigned to a specific known party for resolution, and include guidelines for how the question will most likely be resolved if something weird happens. (“We’ll split it 50/50”, “we’ll resolve according to final market value”, “we’ll just pick something we like between 1 and 99%”, etc.)

        • anthonynaguirre says:

          I agree with this. For Metaculus, resolving a question as “ambiguous” at present never does any harm – the questions don’t “collect” any money, so withholding a payout is easy. This does happen, though it can usually be avoided by tweaks to the question early on (indeed a lot of the discussion is pushback on and refinement of the question before and just after opening). It’s very hard to see all these ambiguities upfront, before a lot of people start thinking about the question. We had an annoying one where the doomsday clock advanced toward midnight, but (for the first time ever) by a half minute. The question asked whether the clock would advance in one place, and whether it would advance to ‘2’ in another. The spirit was clearly satisfied but not the letter.

          As we scale up, we’ll need to come up with a good way to crowd-source the question itself and its resolution criteria a bit better; we have some ideas for this but I think it’s basically going to come down to some trusted moderators.

          A silver lining to this is that once you do it for a while you get a *lot* better at detecting ambiguous questions, and thinking through what could go wrong. (And a lot less patient with things like “30 predictions in tech for 2017” where you can’t find a single falsifiable one…)

      • Ben says:

        I’ve been toying around with the idea of a fact-based social network that would naturally extend itself to prediction markets. I have no actual plans or ability to create it (I did think of a name! “Factotum”) but I’m interested in hearing if something similar exists or if it’s just infeasible even on a limited scale.

        Round-about but to start with the problem: I dislike the lack of granular control in my feeds on social networks (twitter; what I remember of facebook 7 years ago). I might care that close-friend Bill is no longer in a relationship, but not that he has posted another picture of his cat. The reverse may be true about a distant-relative who has a much cuter cat. I want to know if Harrison Ford dies the day it happens, but I don’t care if he’s had a photo taken at a red carpet event. There are public and well-structured data I think could be scraped from places like IMDb / Wikipedia and fed into a website for consumption as posts for a starting news feed. For the most part, we wouldn’t bet on this scraped data, it would be given as true. On the social network input, people could post and self-tag (and have posts tagged by others) under general fact categories, “Bill is no longer in a relationship” / “Bill posted something funny” / “Bill died”.

        Leaving aside that there are obvious limitations to what knowledge you could easily-and-sufficiently organize (a la this site’s sidebar classifications), where prediction markets come into play is when people make “public posts”, which would naturally be taken as conjectures either on the content or on the tag (maybe a tag would be “states a fact”): Bill tagged a post as “Bill posted something funny” but his social group voted that tag down in favor of their own “Bill posted something offensive”, and it stays off the oft-offended friend’s feeds. Perhaps at an admin level, global “public posts” would be made: “Trump’s revealed as a reptile!” / “Trump will be revealed as a reptile by 2017”, with various rules or weights on voting/betting, including monetary stakes, that then determine visibility in feeds and an accuracy value presented beside the post.

        This is, I guess, a long way of saying that perhaps decision markets can be fun when mixed together with socially local and relevant information; that what people do on social networks can be classified; and that the accuracy of that post/classification is not only knowledge, but also an indication of whether other people will be interested in it.

      • Robin Hanson says:

        I’m skeptical that there is really much interest in people just making predictions in order to be scored for accuracy. I’m much more excited about the prospect of organizations subsidizing such systems in order to answer questions of interest to them. But that requires, eventually, the support of an organization, and that has been the hard thing to create and maintain.

        • Reasoner says:

          Organizations are willing to pay consultants like McKinsey to help them make decisions. But McKinsey has inside information about the internal operations of the other companies that they consult for. They also have a steady stream of ex-McKinsey folks who graduate to big companies and then hire their old buddies.

      • jakubsimek says:

        1. Anybody could post a question, but that question needs to be clear and pass a “clairvoyant test” – a hypothetical oracle must be able to validate it in the future on a specific date. Then there is a need to effectively cluster those questions and break them into more granular ones.

        2. Tetlock uses the Brier score to rate forecasters. GJP could beat prediction markets because Tetlock managed to support effective *teams* of superforecasters and boost their output with an extremizing algorithm. A prediction market alone would create healthy competition and incentives, but one also needs effective collaboration.
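
        Extremizing, in sketch form (the exponent is a tunable parameter, with a > 1 pushing the pooled forecast away from 0.5; GJP’s actual algorithm is more elaborate):

        ```python
        def extremize(p, a=2.5):
            # Correct for forecasters sharing only part of their evidence by
            # pushing the aggregated probability away from 0.5.
            return p**a / (p**a + (1 - p)**a)

        print(round(extremize(0.7), 2))  # 0.89 -- sharper than the raw crowd mean of 0.7
        ```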

    • Robin Hanson says:

      A big problem is that people misjudge what it takes for them to be well-placed. The main issue is to find an organization near you with decisions that could be better informed.

    • Christian Kleineidam says:

      There’s a lot of work invested in forecasting within this community over the years. We built PredictionBook (developed by TrikeApps, who also host LessWrong). There’s the Credence game (developed by CFAR). There’s an Android app (by LW member SquirrelInHell).

      OpenPhil gave Tetlock $500,000 to do research.

      We have LW discussions about the math behind calibration (http://lesswrong.com/lw/n55/a_note_about_calibration_of_confidence/). I like my Prediction-Based-Medicine outline.

      All of that takes more effort than talking about Kegan levels. Talking about Kegan levels means using existing concepts that are already out there; there’s no need to invent anything new. On the other hand, there’s more of a requirement to do something new when you want to move forecasting forward. It’s a lot more work.

  30. Lirio says:

    Meanwhile, back in 1977 someone is wondering why more businesses are not adopting personal microcomputers, given their obvious productivity benefits. Technology takes time to disseminate, so the question you need to be asking is not why this specific technology is taking time to disseminate, but why it takes time in general. Likely it’s some complicated mix of signalling, prestige, institutional inertia, and a general feeling that these methods are still unproven. The last one is very important, because nobody wants to be the fool who advocated for Hot New Thing over Proven Old Method only to have it blow up in their face. This is a legitimate concern given the large number of Hot New Things that sputter out and amount to nothing.

    Also, few people have even heard of Philip Tetlock’s superforecasting experiments, and fewer still are going to be convinced by them. Knowledge also takes time to disseminate, and people are notoriously hard to convince. Superforecasting was published in 2015, for crying out loud; this is like being shocked that businessmen were still using dumb phones in 2009. If prediction aggregation remains obscure through the next Presidential election, then you can start to wonder if there’s something strange going on. As it is right now, this is normal.

    Oh! It occurs to me that my examples above are physical devices, while prediction aggregation is a technique. Consider instead poll aggregation. The very first poll aggregator was Real Clear Politics in 2002, but poll aggregation didn’t become widely known until the 2008 election at the earliest, and I wouldn’t say it was widespread until 2012. That’s a full decade. Again, it’s normal for things to take time to spread.

    • romeostevens says:

      Agree with this. People have a sensible caution about things that sound good on paper. Paper is two-dimensional; reality is high-dimensional. Prediction markets haven’t hammered out their free parameters or domain of calibration sharply enough for people to want them. Think of it like any other product: it’s not enough to kind of appeal across a broad range of features, you have to be the killer app for a particular feature/use case. You can broaden out to other use cases only by reinvesting resources from that first success.

    • watsonbladd says:

      They were using smartphones like the BlackBerry very early. The modern smartphone was more a change in form factor than in capabilities.

      • John Schilling says:

        The BlackBerry predates the iPhone by less than a decade, IIRC, and things like the camera and the media player were substantial changes in capabilities.

        More important was the change in target market. The BlackBerry was designed to be the smartphone of corporate America and the military-industrial complex, obviously the only people who would need such a device enough to pay the extra price. The iPhone and its successors were designed to be the smartphone of mass-market consumers, on the theory that if you marketed it as the hip cool thing and hid most of the price in the service contract, you could get enough people to buy it to bring the price down.

        This was a great gamble, but it was transformative, and in ways that go beyond the added features(*). If you stick with what can be reliably predicted, you get stuck with nothing better than a 2005 BlackBerry.

        * Note that for corporate America and the military-industrial complex, features like cameras may have negative value.

    • Steve Sailer says:

      Here’s my January 2016 review of Tetlock’s book Superforecasting that considers why nobody asked ahead of time about the really decisive event of 2015 — Merkel’s decision to let in all those migrants, which helped lead to Brexit and Trump in 2016.

      http://takimag.com/article/forecasting_a_million_muslim_mob_steve_sailer/print#axzz4opn1Vslv

      … from the perspective of early 2016, a really interesting question about Syrian refugees would have been: While this book’s manuscript is at the printers, will a European national leader decide upon a whim to invite in a million of them?

      The name “Merkel” doesn’t appear in the index.

      Let’s think through the process of what it would have taken to generate questions about the most interesting events of 2015: For instance, to have asked contestants to predict the probability of Merkel’s Boner, you would first have had to forecast yourself the possibility of it even happening. But inviting home a million-Muslim mob was a remarkably stupid decision by Dr. Merkel. And the universe of possible stupid decisions by politicians is perhaps too large to pester contestants to evaluate.

      Nonetheless, Merkel’s blunderkrieg was more or less accurately foreseen in 1973 by French novelist Jean Raspail in his book The Camp of the Saints, based on his sense of the direction the zeitgeist was headed. In hindsight, Raspail’s prophecy appears brilliant. Still, you can imagine the technical problems in phrasing questions ahead of time to be both broad enough and specific enough. Raspail focused, for example, on a French-Hindu-impoverished-by-sea immersion rather than a German-Muslim-smartphone-by-land hegira. Is that close enough?

      Moreover, Raspail being right 42 years ahead of time isn’t much use in an annual contest that, by its nature, can’t look more than 12 months ahead.

      Also, Raspail missed key aspects of what happened in 2015. He imagined that the refugees would be starving masses who overcame European resistance by their pitifulness. But instead, the invaders turned out to be strutting military-age youths with smartphones, giving Germany’s surrender a weird sexual vibe that nobody yet has explained satisfactorily even in retrospect.

      • pdbarnlsey says:

        Yes, it was the strutting of the million muslim mob that bothered me, and I imagine, Brexit and Trump supporters, the most. Smartphone ownership was secondary, but significant, particularly since I strongly suspect that they were all Obamaphones, or some sort of European equivalent (MerkelMobiles?).

        So a good point well made, Steve. For future reference, can I clarify when we should refer to a group of people as “military-age”? That would seem to cover quite a lot of ground. Is this, for example, a discussion between “military-age” commenters? If so, it feels important that we mention it.

  31. Collin says:

    I wonder if allowing comments on the site will affect its accuracy, due to a recency bias induced by the top comments.

    • Lirio says:

      +1 Meta points to you, sir!

      Also, to answer your question: most likely yes. While debating the subject and adding a variety of views may in fact increase overall accuracy, the recency-bias effect caused by the limitations of discussion boards is very likely to produce a larger effect in the opposite direction. The benefits of discussion are much diminished if people do not take in the whole of it.

      • Evan says:

        Has Metaculus or anyone else considered testing out forecasters in smaller, closed teams? Phil Tetlock did that in his experiments. That way, you could have a discussion board which is more focused and can delegate different fact-finding missions to different team members. This could increase accuracy while decreasing the probability that a large crowd biases a single forecaster. On the other hand, if interpersonal biases are magnified in smaller groups, that may magnify the biasing effects teams have.

        • anthonynaguirre says:

          We have something new in development that will replicate some of the dynamics of teaming, but in a more fluid way. We’re still working out the details of the idea and implementation. A major goal of Metaculus is to find a balance between cooperative and competitive dynamics. Competition is important to avoid a bandwagon/groupthink effect and to keep things interesting, but pure competition discourages information sharing and collaboration, which are extremely powerful. The scientific establishment often does a good job of this balancing act, and we’re working on how we can incentivize it. Right now Metaculus has a cooperative tone, but the actual incentives are pretty much set against collaboration and information sharing.

    • gbear605 says:

      I use the site frequently. Based on my experience, this is a valid concern, though I’m not sure recency bias is as much of an issue as the lack of comments.

      Also, there’s going to be an even more significant bias from whoever is the first to submit a prediction.

    • Scott Alexander says:

      I’m kind of annoyed that it requires a lot of covering the screen up with my hand to avoid seeing everyone else’s prediction before I formulate mine. I wonder if anyone has tested whether making everyone predict independently would be better.

      • gbear605 says:

        Metaculus has done a few questions in the past that have hidden the predictions, but those were mainly meta questions (e.g. “what will be 2/3 of the average response to this question?”), so they have the technology but haven’t really done the studies.
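
        That “2/3 of the average” question is the classic beauty-contest game; a quick illustrative loop shows why iterated reasoning drives the answer toward zero (the 0–100 scale and the number of levels are just assumptions for the demo):

          # If everyone expects the average answer to be x, the best reply is (2/3)*x,
          # so each extra level of reasoning shrinks the answer; the fixed point is 0.
          guess = 50.0  # naive starting point on an assumed 0-100 scale
          for level in range(1, 11):
              guess = (2 / 3) * guess
              print(f"level-{level} reasoning: {guess:.2f}")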

        • Evan says:

          Could they try this out differently, like A/B testing it on different users? If it could improve UX a lot, it could be worth looking into.

      • justanotherlaw says:

        I think someone on Tetlock’s team has done this experiment on GJOpen, though that site does not display comments and forecasts on the same screen as the forecast input. (Instead you have to scroll down.) I’ll ask around and get back to you on the result.

      • anthonynaguirre says:

        Hi Scott, thanks for writing this up!

        We’ve thought about rolling out a feature where the community prediction is hidden until some threshold number of predictions, so there is not some big ‘first mover’ effect. After that, it’s tricky to discern what is most effective. It might be good to reveal the community prediction after you’ve made your own, or something. When the number of questions and users is bigger, we can run some controlled experiments. This flexibility is another thing that you get with aggregation rather than a market system, as in a market there is always a standing price and no way to ignore it.

        But certainly we could add a toggle so that people could hide the community (and individual) predictions from themselves.
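
        As a minimal sketch of the threshold idea (nothing here is Metaculus code; the threshold value and median pooling are illustrative assumptions):

          import statistics

          def visible_community_prediction(predictions, user_has_predicted, threshold=20):
              # Hide the pooled number until enough independent predictions exist
              # and the viewer has committed their own, to blunt anchoring effects.
              if len(predictions) < threshold or not user_has_predicted:
                  return None
              return statistics.median(predictions)

          print(visible_community_prediction([0.6] * 5, True))             # None -- crowd too thin
          print(visible_community_prediction([0.55, 0.6, 0.7] * 8, True))  # 0.6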

      • Ilya Shpitser says:

        Hmm. To the extent that prediction markets work at all, the phenomenon might be related to “boosting” in machine learning. Boosting is sort of like combining a lot of fairly stupid but better-than-random predictors into a very smart predictor.

        The reason boosting works is still somewhat mysterious.
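
        For readers who haven’t seen it, here is a toy from-scratch illustration of the idea (decision stumps on synthetic data; this is plain AdaBoost, not anything a prediction aggregator literally does):

          import numpy as np

          rng = np.random.default_rng(0)

          # Synthetic data: the label is the sign of a noisy linear score, so no
          # single one-feature threshold ("stump") predicts it well on its own.
          X = rng.normal(size=(500, 5))
          y = np.sign(X @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) + 0.3 * rng.normal(size=500))

          def best_stump(X, y, w):
              # Pick the feature/threshold/polarity with the lowest weighted error.
              best = (0, 0.0, 1, np.inf)
              for j in range(X.shape[1]):
                  for t in np.quantile(X[:, j], np.linspace(0.1, 0.9, 9)):
                      for s in (1, -1):
                          err = np.sum(w[s * np.sign(X[:, j] - t) != y])
                          if err < best[3]:
                              best = (j, t, s, err)
              return best

          w = np.full(len(y), 1 / len(y))  # start with uniform example weights
          ensemble = []
          for _ in range(25):
              j, t, s, err = best_stump(X, y, w)
              alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # this stump's vote weight
              pred = s * np.sign(X[:, j] - t)
              w *= np.exp(-alpha * y * pred)  # upweight examples this stump got wrong
              w /= w.sum()
              ensemble.append((j, t, s, alpha))

          # The weighted vote of many weak stumps beats any single stump.
          F = sum(a * s * np.sign(X[:, j] - t) for j, t, s, a in ensemble)
          print("ensemble training accuracy:", np.mean(np.sign(F) == y))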

        If people’s predictions become dependent, I think it becomes highly non-obvious what happens (in fact, most statistical procedures become hard to analyze with dependent samples, because much of statistical theory needs independence to work).

        I am not an expert on this stuff like Hanson is, but I wonder if the lack of adoption is a combination of worries about the gameability of these aggregators and worries about a lack of theory.

        As prevalent as signaling etc. is, my worry about Hanson’s explanation is that it makes him look good and everyone else look bad :).

    • pilgrimoftheeast says:

      definitely – and I’m speaking from my own experience, as I was predicting on (or more like playing) GJOpen in the first half of last year.
      In the beginning I made a couple of not-so-good predictions, then actually spent a lot of time on research for some other questions (with better results), and then, because I wanted to “win”, I basically chose a set of (super)forecasters and based my predictions mostly on their forecasts and comments. When I returned after 7 months of inactivity to check my scores, I was still quite a bit better than the median, even though I hadn’t had a chance to update my predictions as events changed or end dates approached.
      Am I a superforecaster? Definitely not. Was I able to (at least partly) game the system and achieve a good score? Yes.