Utopian Science

I.

Pre-emptive plagiarism is the worst. I was all set to write about how I thought the problems I brought up in The Control Group Is Out Of Control could be addressed.

Then Josh Haas wrote A Modest Proposal To Fix Science, which took the words right out of my mouth. Separate out exploratory and confirmatory research, have the latter done by different people with no stake in the matter.

So if I want to wring a blog post out of this I’m going to have to go way further than that, come up with something really outlandish.

So here is how science works in the utopian culture of Raikoth.

II.

Anyone can do exploratory research. It can be experiments published in special exploratory research journals. Or it can be a collection of anecdotes supporting a theory published in a magazine. Or it can be a list of arguments on a website. The point is to get an idea out there, build interest.

Remember the Angel of Evidence? The centralized nationwide prediction market? Anyone with a theory can list it there. The goal of exploratory research is to get people interested enough in the idea to bet about it on the Angel.

Suppose you become convinced that eating grapes cures cancer. So you submit a listing to the Angel: “Eating grapes cures cancer”. Probably most people doubt this proposition and the odds are around zero. So you do some exploratory research. You conduct a small poorly controlled study of a dozen cancer patients who sign up, and feed them a lot of grapes. You report that all of them seemed to have their cancer go away. You talk about how chemicals in grapes are known tumor inhibitors. Gradually a couple of people start thinking there’s something to what you’re saying. They make bets on the prediction market – maybe saying there’s only a 10% chance that you’re right, but it’s enough. The skeptics, and there are many, gladly bet against them, hoping to part gullible fools from their money. Business on the bet starts to go up.

These research prediction markets are slightly negative-sum. Maybe the loser loses $10, but the winner only gets $9. When enough people have bet on the market, the value of this “missing money” becomes considerable. This is the money that funds a confirmatory experiment.
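
As a rough sketch of that arithmetic (illustrative numbers; even-odds matched bets for simplicity, and the fee rate is my assumption):

```python
# Each matched bet escrows equal stakes on both sides. Whoever loses,
# the winner is paid the losing stake minus a fixed haircut, so the
# haircut is known in advance and can be banked as the experiment fund.
FEE = 0.10  # the loser loses $10, but the winner gains only $9

def experiment_fund(matched_stakes):
    """matched_stakes: dollar size of each matched even-odds bet."""
    return sum(FEE * stake for stake in matched_stakes)

print(experiment_fund([10, 250, 740]))  # 100.0 -> $100 to fund the experiment
```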

What this means is that it is interest in – and disagreement about – the question at hand that makes an experiment possible. When no one believes grapes cure cancer, everyone’s probability is around zero and so no one bets against anyone else and there is no money available for the experiment. When you do your exploratory research and come up with good arguments why grapes should work, then if you really make your case some people should be willing to bet on it – not at even odds, maybe, but at least at 10:1 odds favoring them or whatever.

Suppose the experiment returns positive results (what qualifies as “positive results” is predefined – maybe the bet specifies “effect size > 0.4, p < 0.05”). Now one of two things happens. Either everyone is entirely convinced that grapes cure cancer, and people stop doing science and start eating more grapes. Or the controversy continues. If the controversy continues, a bet can be placed on the prediction market for the success or failure of a replication. No doubt people will want to take or short this bet at very different odds than they did the last one. Maybe you could get 10:1 odds against grapes curing cancer the first time, when you were going on a tiny exploratory study, but now you can only get even odds. No problem. The pro-grape side bets in favor, the anti-grape side is still willing to bet against, and the replication takes place.
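
Since the success criterion is fixed before any data exists, you can think of it as a pure function agreed on at bet time – a minimal sketch, with the thresholds taken from the example bet above (the function name is mine):

```python
def bet_resolves_yes(effect_size: float, p_value: float) -> bool:
    """Pays the pro-grape side iff the pre-registered criteria hold."""
    return effect_size > 0.4 and p_value < 0.05

print(bet_resolves_yes(0.55, 0.01))  # True  -> pro-grape bettors win
print(bet_resolves_yes(0.30, 0.01))  # False -> anti-grape bettors win
```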

Rinse and repeat. At every step, one of three things is true. First, there is still controversy, in which case the controversy funds more experiments, and the odds at which people will bet on those experiments are the degree of credence we should have in the scientific prediction involved. Second, there isn’t enough money in favor of the proposition to get a market going, in which case the proposition has been soundly disproven. Third, there isn’t enough money against the proposition to get a market going, in which case the proposition is universally accepted scientific fact.
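
A toy classifier for those three states, assuming some minimum matched stake is needed before an experiment can be funded (the threshold and all names are illustrative):

```python
MIN_STAKE = 10_000  # matched money needed to pay for an experiment

def proposition_status(pro_stake: float, con_stake: float) -> str:
    if pro_stake >= MIN_STAKE and con_stake >= MIN_STAKE:
        return "controversial: the market funds another experiment"
    if pro_stake < MIN_STAKE:
        return "soundly disproven"
    return "universally accepted scientific fact"

print(proposition_status(50_000, 80_000))  # still controversial
print(proposition_status(200, 80_000))     # no one will back it
print(proposition_status(80_000, 200))     # no one will bet against it
```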

In practice things are not this easy. The system is excellent at resolving controversies, and you can easily get as much money as you need to study whether guns decrease crime or whatever. But science includes not just controversies but basic research. Things like particle physics might suffer – who is going to bet for or against the proposition that the Higgs boson has a mass greater than 140 GeV? Only a couple of physicists even understand the question, and physicists as a group don’t command large sums of spare capital.

So what happens is that scientific bodies – the Raikothin equivalent of our National Science Foundation – subsidize the prediction markets. This is very important. Instead of donating $1 million to CERN to do boson research, they donate $1 million to the Angel of Evidence to make the prediction market more lucrative. Suddenly the market is positive-sum; maybe you lose $10 if you’re wrong, but gain $11 if you’re right. The lure of free money is very attractive. Some ordinary people jump in, not really sure what a boson is but knowing that the odds are in their favor. But more important, so do “science hedge funds” that hire consultant physicists to maximize their likely return. Just as hedge funds in the US might do lots of research into copper mining even though they don’t care about copper at all in order to figure out which mining company is the best buy, so these “science hedge funds” would try to figure out what mass the Higgs boson is likely to have, knowing they will win big if they’re right. Although the National Science Foundation-type organization funds the experiments indirectly, it is the money of these investors that directly goes to CERN to buy boson-weighing machinery.
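
The subsidy arithmetic, under the simplest scheme I can imagine (a pari-mutuel pot top-up; all figures illustrative):

```python
def gain_per_stake(winner_pool, loser_pool, subsidy, stake):
    """Winners keep their stakes and split the losing pool plus the
    subsidy, pro rata."""
    return stake * (loser_pool + subsidy) / winner_pool

# Equal $500k pools plus a $50k subsidy: a correct $10 bet now gains
# $11, where the unsubsidized, fee-skimmed market paid only $9.
print(gain_per_stake(500_000, 500_000, 50_000, 10))  # 11.0
```

A pot top-up this naive is gameable – you could bet both sides and pocket the subsidy – so a deployed version would want a market scoring rule that pays only for shifting the odds; see the LMSR sketch in the comments below.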

III.

So much for funding. How are the actual experiments conducted?

They are conducted by consultant scientists. The number one rule of being a consultant scientist is you do not care about the hypothesis.

The Raikolin would have a lot of reasons to react in horror if someone pointed them to Earth, but one of the bigger ones is that the person who invented a hypothesis is responsible for testing it. Or at least someone in the same field, who has been debating it for years and whose entire career depends upon it. This makes no more sense than asking criminals to judge their own trials, or having a candidate count the votes in their own election.

Having any strong opinion on the issue at hand is immediate disqualification for a consultant scientist to perform a confirmatory experiment.

The consultant scientist is selected by the investors in the prediction market. Corporate-governance-style rules are used to select a representative from each side (those who will profit if the theory is debunked, and those who will profit if it is confirmed). The two representatives then meet and agree on a consultant. If they cannot agree, sometimes they will each hire their own consultant scientist and perform two independent experiments, with the caveat that a result only counts if the two experiments return the same verdict.

As the consultant plans the experiment, she receives input from both the pro- and the con- investors. Finally, she decides upon an experimental draft and publishes it in a journal.

This publication is a form of pre-registration, but it’s also more than that. It is the exact published paper that will appear in the journal when the experiment is over, except that all numbers in the results section have been replaced by a question mark, i.e., “We compared three different levels of grape-eating and found that the highest level had ? percent less cancer than the lowest, p < ?”. The only difference between this draft and the real paper is that the real one fills in the numbers and adds a Discussion section. This gives zero degrees of freedom in what tests are done and in how the results are presented.
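
One way to picture the question-mark draft: a fill-in-the-blanks template frozen at pre-registration, where the final paper can only substitute numbers into slots fixed in advance (the Template mechanism and field names here are mine):

```python
from string import Template

RESULTS = Template(
    "We compared three different levels of grape-eating and found that "
    "the highest level had $effect percent less cancer than the lowest, "
    "p < $pvalue."
)

# The pre-registered draft, as published before the experiment:
print(RESULTS.safe_substitute(effect="?", pvalue="?"))

# The final paper just fills in the numbers (and adds a Discussion):
print(RESULTS.substitute(effect=31, pvalue=0.04))
```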

Two things happen after the draft is published.

First, investors get one final chance to sell their bets or bow out of the experiment without losses. Perhaps some investors thought that grapes cured cancer, but now that they see the experimental protocol, they don’t believe it is good enough to detect this true fact. They bow out. Yes, this decreases the amount of money available for the experiment. That comes out of the consultant scientist’s salary, giving her an incentive to make as few people bow out as possible.
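
A minimal sketch of that incentive, assuming the fee pool scales with matched stakes as above and the shortfall is docked from the consultant’s fee (the base fee and figures are my assumptions):

```python
FEE = 0.10  # fraction of each matched stake that feeds the fee pool

def consultant_fee(base_fee, withdrawn_stakes):
    """Every dollar of fee money lost to bow-outs comes out of the
    consultant's base fee, floored at zero."""
    lost_fund = FEE * sum(withdrawn_stakes)
    return max(0.0, base_fee - lost_fund)

# A $20k base fee; $120k of matched stakes bow out after reading the
# protocol, taking $12k of fee money with them.
print(consultant_fee(20_000, [70_000, 50_000]))  # 8000.0
```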

Second, everyone in the field is asked to give a statement (and make a token bet) on the results. This is the most important part. It means that if you believe grapes cure cancer, and the experiment shows that grapes have no effect, you can’t come back and say “Well, OBVIOUSLY this experiment didn’t detect it, they used overly ripe grapes, that completely negates the anti-tumor effect, this study was totally useless and doesn’t discredit my theory at all”. No. When the draft is published, if you think there are flaws in the protocol, you speak then or forever hold your peace. If you are virtuous, you even say something like “Well, right now I think grapes cure cancer with 90% probability, but if this experiment returns a null result, I guess I’ll have to lower that to 10%.”

These statements are made publicly and recorded publicly. If you say an experiment will prove something, and it doesn’t, and this happens again and again, then people will start noticing you don’t actually know any science.

(If you’re always right, you immediately get hired by a science hedge fund at an obscenely high salary.)

Finally, the consultant scientist does her experiment. Some result is obtained. The question marks in the draft are filled in, and it is resubmitted as a published paper. The appropriate people make or lose money. The appropriate scientific experts gain or lose prestige as people who are or aren’t able to predict natural processes. The appropriate consultant scientists gain or lose prestige as people whose results were or weren’t replicated. The exploratory scientist who proposed the hypothesis in the first place gains or loses prestige that makes people more or less likely to bet money on the next idea she comes up with.

IV.

There are no homeopaths in Raikoth.

I mean, there were, ages ago. They proposed experiments that could be done to prove homeopathy, and put their money where their mouth was. They lost lots of money when it turned out not to work. They added epicycles, came up with extra conditions that had to be in place before homeopathy would have an effect. Their critics were more than happy to bet the money it took to test those conditions as well, and the critics ended up rich. Eventually, the homeopaths were either broke, or sufficiently mugged by reality that they stopped believing homeopathy worked.

But homeopathy is boring. The real jewel of this system is to be able to go online, access the Angel of Evidence, and see a list of every scientific hypothesis that anyone considers worth testing, along with the probability estimate that each is true. To watch as the ones in the middle gradually, after two or three experiments, end up getting so close to zero or one hundred as makes no difference, and dropping off the “active” list. To hear the gnashing of the teeth of people whose predictions have been disconfirmed and who no longer have a leg to stand on.

Also, if you can predict the masses of bosons consistently enough, you get crazy rich.


68 Responses to Utopian Science

  1. Pingback: Not Even A Real Links Post, Just A Link | Slate Star Codex

  2. James Babcock says:

    I have a general objection to prediction markets, which is that while they help greatly with the questions they tackle directly, they create negative information externalities for any questions they don’t tackle directly.

    Suppose this system were being used. To make a concrete example, let’s say it’s being used for pharmaceutical trials – markets go up for “drug X will work for condition Y without unacceptable side-effects”. Now suppose you’ve made a surprising discovery: Every molecule with exactly 29 atoms will fail as a pharmaceutical. (This is an unrealistic example, but there are in fact realistic versions of this which have been found.)

    Option one: You use this to make money on the prediction markets, while keeping it secret. You make a lot of money, society no longer wastes money on certain non-working drugs. This is the world with prediction markets.

    Option two: You publish your discovery. You make little to no money, society no longer wastes money on certain non-working drugs, and the exploratory researchers now know about your model and can look for similar ideas. This is the world without prediction markets.

    • Anonymous says:

      We have a few prediction markets, such as commodity futures. Do you see this happening?

      • gwern says:

        Arguably most financial markets are prediction markets in a sense. And people do complain about Wall Street and hedge funds sucking away talented scientists & mathematicians and keeping any interesting findings trade secrets.

        • Anonymous says:

          Listening to people complain about hedge funds has never contributed anything to my understanding of the world.

          But I do know an example: Ed Thorp claims to have invented Black-Scholes but kept it secret for the first hedge fund. That’s a pretty trivial example.

        • gwern says:

          “Listening to people complain about hedge funds has never contributed anything to my understanding of the world.”

          You asked a question, I answered it. Don’t get snotty. Do you really think that all the talent diverted to Wall Street has discovered *nothing*, that all the trade secrets of the hedge funds are completely useless to the outside world? That seems like a tough row to hoe.

        • Anonymous says:

          You didn’t answer my question. You just said exactly the same thing as James. Since my question followed his comment, I clearly don’t care about his fantasies.

        • Anonymous says:

          “You didn’t answer my question. You just said exactly the same thing as James. Since my question followed his comment, I clearly don’t care about his fantasies.”

          Calling a reasonable argument-from-incentives a “fantasy” seems a little off.

        • Anonymous says:

          Sorry James, that was harsh. I do care about your speculations, which is why I asked for evidence. But I don’t see any value in hearing about other people saying the same thing. And, really, “fantasies” is nowhere near harsh enough for people complaining about hedge funds.

    • Your option one misses off the final step – somebody spots the pattern in your trades, after which your secret rapidly becomes public knowledge.

      Whether this is sufficient to balance things is a separate matter, but I don’t feel you’re comparing like to like without acknowledging that step as at least a possible final result of the option one path.

  3. Many experiments are replicable, but fail because they don’t prove what they were meant to prove. I’d guess that this is true of the vast majority of journal publications, actually.

    The direct output of an experiment is something like “under the following conditions, the foometer reads 17.” Successful replication means finding that if you create those conditions, the foometer does say 17 (plus or minus epsilon). But no one cares what a foometer says. What we want to know is “does piracetam make you smarter” or something. Generally, there is a long chain of inference from the foometer value to the hypothesis, and this virtually always has faults in it. (Missing controls are a typical example.)

    These faults only become visible later when related but distinct experiments give divergent answers. No single form of experiment, however successfully replicable, can conclusively decide any question of actual interest.

  4. Gavin says:

    As many commenters have noted, the process of scientific research requires constant adjustment and experimentation. There’s no way to specify everything before running the experiment and exploring the territory. So the process needs to look like this:

    1. Experimental scientists do science much the way it’s practiced now, researching and exploring something until they believe they have a worthy result.
    2. They publish their result with the exact procedure in a “Preliminary Results Journal.”
    3. Betting happens on the preliminary results with their fully specified experiments.
    4. Experiments deemed likely by the market are sent to be tested by independent labs that do nothing but replication. Presumably government and nonprofits have to fund these labs. I don’t see a way around this. Even if there’s betting involved, the money to run the experiment has to come from somewhere.
    5. The experiment is replicated with the exact method specified.
    6. Only independently confirmed results are considered final conclusions.

    The only reasonable way to run science is to let scientists both generate and test hypotheses, but only trust them to publish once an independent lab has confirmed the final experiment. This would probably screw up the current system of doctoral education, but it’s an embarrassment of indentured servitude anyway so that can be dealt with separately.

    I don’t think the betting system for Raikoth as specified really adds much value. It’s not helping with funding, since it requires the NSF to kick in a lot of money to keep the market attractive. And it sets up things to be tested only if they are controversial, not if they are likely.

    In order to interest “science hedge funds” you need to be giving a good return on investments, so they would be taking a fair amount of money out of the system. Though presumably it could be earning some interest while it’s in there. Maybe you bet shares of index funds, not cash?

    There’s something to the betting idea, but I think the setup needs a rework. How about this?
    1. Everyone in the relevant field is given free money to bet with that must be wagered at least once before it can be cashed out.
    2. Outsiders would be welcome to bet as well, but it wouldn’t be required for the system to run.
    3. And only those experiments that are deemed “likely” by the market are tested.
    4. If you think the market is doing things wrong and missing an opportunity, you can always bet it up until it’s considered “likely” (and lose money if you’re a homeopath!).

  5. Scott, this might be of interest to you: http://www.randomiseme.org

  6. Alexander Stanislaw says:

    This system seems incapable of generating relativity, on the basis that Riemannian geometry is not an inductively testable hypothesis. Actually it seems quite underpowered compared to our existing system.

    I get the appeal of a system that is capable of ruling out homeopathy and other silly or appealing-but-false theories with ruthless efficiency. That is a big plus; however, in practice, it’s not going to prevent people from using homeopathy – because most people aren’t rationalists (I know this is Raikoth – I’m assuming this post serves a dual purpose of suggesting directions for our science to evolve towards). Either way, it wouldn’t dissuade real humans from believing silly things; they just won’t rely on evidence to defend them.

    However, this system just doesn’t look very practical to me. Think about _any_ advancement in, say, molecular biology. Your system doesn’t seem capable of generating advancements on a reasonable timescale. If you can give a plausible mechanism by which this system could have, say, figured out the way the lac operon works, I would be intrigued. Any other example of a nontrivial discovery that this system could have produced would be helpful.

    For one thing the incentives are too weak. In the real world scientists have big egos and a reputation to defend. This is what motivates them to test and refine their hypotheses, this is what generates huge debates in science. Money doesn’t seem like an adequate replacement for this incentive.

    Secondly, it’s really slow. Many fields in science are narrow, with just a handful (fewer than 10) of experts working on them. This disqualifies them from being consultants under your system. Every time they want to refine or test their hypotheses, they would have to train someone to be able to carry out the experiments, and even that would be fishy under your system. Given that science consists of lots of back and forth between different hypotheses, this would further slow down progress. Every single iteration of a theory would have to undergo a consultation phase, which would likely be bogged down by the sheer volume of hypotheses.

    Thirdly, it excludes certain types of research. I already mentioned general relativity and mathematics, but technologies don’t fall under the schematic of hypothesis testing either. How would this system have funded the discovery of PCR?

    I have other issues, but in a nutshell, I happen to think that science has done a phenomenal job. It’s produced a lot of garbage, but it’s also produced a lot of incredible discoveries in a very short time span. And frankly I don’t care much about the garbage – I don’t care if 80% of the papers about drugs are false (making this number up, of course) – I care that we actually have drugs that work! Science isn’t epistemology, and that is fine by me.

    I will admit that in fields with low epistemic standards like psychology, or in fields in which it is hard to tell which discoveries/claims are advancements, perhaps this would be helpful.

    Perhaps the main purpose of this system could be to settle disputes. Consultation is reserved for things people disagree on. I could buy that. For example, in the lac operon case, by the time experimenters were clear enough on what was going on to form a hypothesis, everyone could see that the lac repressor works by binding two regions on the DNA, forming a loop. Consultation would either not be necessary, or it would be saved until the very end of the research to check that the proposed mechanism is correct. But under your system, the many years that it took just to generate a clear hypothesis would be unfunded, since the researchers wouldn’t have had any predictions by that time.

    • Douglas Knight says:

      Special relativity is basically purely mathematics, not experimentally distinguishable from competing theories. So maybe this system would not deal well with it. But our system’s treatment of special relativity was ridiculous.

      Sure, Riemannian geometry is mathematics, but general relativity is physics and does make testable predictions. In particular, it explained the precession of Mercury. In Scott’s world, the theory would be introduced and immediately accepted because of that, which is pretty much what happened in our world, except in France, because the French hate the Germans. If there had been a bidding war between the French and English+Germans over general relativity, that could have funded new observations, like Eddington’s expedition to observe gravitational lensing during an eclipse. Of course, all the calculations would have had to be done ahead of time and Eddington would have been barred from participating, which is good because in our world the experiment was a fraud.

      • Ken Arromdee says:

        If you already know X, coming up with a theory that says that X should happen does not make that into a testable prediction of X. In order to count as a prediction, it has to predict something you don’t already know (or that you’re not already certain of). Having something *explain* the precession of Mercury is different from having it *predict* the precession of Mercury.

        • Douglas Knight says:

          Ken, I’m aware that some people hold the position that predictions of future events are the heart of scientific discovery. However, that is relevant neither to Alexander’s comment nor mine, which was about the difference between math and a physical claim. The precession of Mercury distinguishes Newton from Einstein, in a way that nothing distinguishes Lorentz-Fitzgerald from Einstein.

          Much of the point of Scott’s proposal is to sidestep such debates about what constitutes “real science.” In Scott’s world, the only question is what the betting markets endorse. I think they would be convinced by explaining Mercury, just as they were in the real world. If some people could not be convinced without new data, they could bet on the market to fund Eddington’s expedition. Of course, that didn’t convince the French, either.

      • Alexander Stanislaw says:

        I’m going to remain silent on general relativity until Scott says how math gets funded in his world. Scott’s system as a way of settling questions sounds fine to me. It’s getting to the point where we have well-formed questions to ask that I’m unclear on, since there is nothing to bet on until then.

      • Pawel Aleksander Fedorynski says:

        “Special relativity is basically purely mathematics, not experimentally distinguishable from competing theories.”

        What are you talking about? Special relativity has a lot of experimental consequences; particle accelerators wouldn’t work if not for special relativity. Not to mention that time dilation was even measured directly, with a very accurate clock in an airplane.

  7. Shmi Nux says:

    But who funds mathematicians?

    Certainly it is a nice feature of this setup that no one would bother financing any work on interpretations of QM and other Tegmark-style untestables. On the other hand, no one would bother betting on basic mathematical research, either. Unless it can be neatly formulated in terms of proving a theorem or two (how do you even bet on a theorem?). Actually, I am not sure if this is a bug or a feature.

    • Presumably, mathematics would be funded differently than science. Any thoughts about how mathematics gets funded in Raikoth?

    • Geekethics says:

      You can bet on a few things:

      1) Theorem is true (pays out when there’s a proof/disproof; a lot of people have strong opinions about this before there is a proof).
      2) Theorem will be proved/disproved before T (pays out at T).
      3) This proof will be validated by this body (pays out when they publish their analysis).
      4) Research into this topic will yield a proof of the theorem before T.

      The last of these seems like a very good basis to run maths departments on. You find the research area the market says is most fruitful for your pet theorem, fire grad students at it, then collect the proofs.

  8. William Newman says:


    “the person who invented a hypothesis is responsible for testing it”

    That is presented as a description of science, full stop. I think that’s mistaken. It does describe a lot of modern state-funded science, but it’s not part of the character of science, unless your definition of science somehow excludes a whole lot of important science, including almost everything done before 1900 (and indeed, quite a few decades afterwards, but 1900 is a round number which suffices to make my point). I think that limitation is so strong that it justifies adding scare quotes and/or a “cargo cult” qualifier to the word “science”.

    Even before 1900, people took lots of shortcuts, a big one being how someone who was widely understood to be a careful worker got his observations taken fairly seriously even when he was known to be well pleased with the results. But even famous embarrassments like the history of N-rays (just after 1900) and of canals on Mars (well before) seem to falsify any strong form of the “is responsible for testing it” claim. And the history of famous influential basically-correct results like Pasteur’s long-necked intended-to-be-sterile flasks suggests that the main stream of scientific influence was through people who characterized a phenomenon so well that it was straightforward for other investigators to check it for themselves.

    So I agree that the phrase I initially quoted is not a strawman characterization of “science” today. (E.g., check out http://judithcurry.com/2014/04/29/ipcc-tar-and-the-hockey-stick/ with one Lead Author to rule them all, largely responsible for choosing which evidence applies to his conclusion, held to only a superficial standard of replicability in his work at time of publication, and after publication not required to clarify or even to respond honestly to questions about how the conclusion was reached.) But something is amiss when that phrase seems like such a bad fit to how science worked for many centuries before. I believe it should not be considered a criticism of science, but of a term like “[cargo cult] science”, or of a phrase like “how the victors of the long march through the institutions have been cashing in on the accumulated credibility of the old institutions of science.”

    (Or: Google translate obstinately refuses to translate “oh puhleeze” — little known fact: “nullius in verba” means “puhleeze is *so* not a word” — but it deigns to tell me that early scientists might have written “dicunt, quod?” So maybe “dicunt quod science”.:-)

  9. ADifferentAnonymous says:

    If I created a utopian conworld, public policy would work kinda like this, since money isn’t really what you need to study whether guns decrease crime. Rather, the legislature would have to vote for experiments. Instituting a policy without running an experiment first would generally be way outside the Overton Window. To the extent that the two parties have different factual anticipations, they’ll both expect to benefit from experiments, so they’ll be happy to trade them; e.g., the conservatives will allow a public health care experiment in exchange for getting their gun liberalization experiment, and both parties will be confident they’ll be vindicated though somewhat sad that the other side’s dumb idea will cause some harm in the short term. The populace would consider it normal to have their community randomly assigned a gun or health policy at the opposite extreme of its people’s overwhelming preference and unpatriotic to object to this.

    A big part of the fun of this would be trying to make real human behavior support this instead of making the people have to be impartial scientists. For example, “You didn’t have any issues with the protocol before the results came in” could be an Unanswerable Soundbyte making it politically inadvisable to nitpick results. If an experiment really is flawed in a way no one foresaw that gets discovered later but is too subtle to explain to the public, it’ll have to wait for a new generation of legislators who weren’t there to approve the flawed experiment. Not perfect, but a reasonable ad hoc solution that doesn’t require a super-intelligent populace.

    • John says:

      What happens if the community subject to a gun liberalization experiment doesn’t actually buy any additional guns?

      • ADifferentAnonymous says:

        Then we know that liberalizing guns in a community like that has no effect. If we want to know the effects of gun ownership, we can require gun ownership.

        • Deiseach says:

          I dislike guns. Your experiment would mean I would be compelled by law to buy one? How will you make me do that?

          Send me to prison – okay, I’ll go.
          Fines – I won’t pay. (See: prison, sending to)
          Give me a free gun – I’ll stick it in the locked garden shed. If that gives you usable results, then cut out the middleman and just put a big cache of guns in a government owned depot (now, what are those places called again?)

          Though such laws have been tried before; a succession of English kings imposed laws about archery practice, especially as people began to prefer playing games to practicing shooting bows, ending with Henry VIII who in 1515 passed a law requiring men to regularly practice archery on pain of fines.

          Seeing as how, despite all the preceding laws which imposed fines and banned other sports, it was necessary to keep passing new laws to force the populace to purchase arms and practice with them, I think we can see that an “own a gun” law mightn’t tell you what you want to know after all (does gun ownership have an effect on reducing crime?). You might end up spending more time and effort bringing people to court for fines and chasing up the die-hards refusing to buy or use guns, and that would drive up your crime figures.

  10. Deiseach says:

    Re: the disproving of homeopathy, how do the Raikoth handle the placebo effect? If some of your test subjects show an improvement after taking the tincture, how can they distinguish between the homeopaths saying “See, we told you it works!” and the placebo effect?

    Or what if the tincture really works? I’m thinking here of St John’s Wort, which used to be sold in health shops and was snorted at in disdain along with the other ‘folk remedies’ by the Real Medical Science crew, until somebody found out “Hey, this stuff really does work for depression!” and then it was hauled off the shelves by law (because now it was A Drug, not Hippy-Dippy Nonsense, and had to be properly regulated).

    • Cyan says:

      “Re: the disproving of homeopathy, how do the Raikoth handle the placebo effect?”

      The same way we handle the placebo effect, I expect — with placebo (or current-standard-of-care) controls.

  11. gattsuru says:

    I’m not convinced finding such consultant scientists on a large scale is possible, let alone desirable. There are certainly /people/ who don’t believe most anything, up to and including the law of gravity, but after the third or fourth published trial most relevant experts are either going to at least know the data or have spent months with their hands over their ears. And this only gets worse the more specialized the field: if there are only two labs in the world that can run a CERN-like test, by trial #3 you’re asking people to not believe their own published data and their own lying eyes.

    I think I can believe things I know aren’t strictly true, but that’s not exactly a popular or culturally acceptable view among modern scientists, and for good reasons.

    And there are advantages to motivation. Like status or cash, ideological motivation prompts scientists to work. You could set status higher, but hours worked a week tends to funge better to belief or cash. If removing motivated reasoning buys you 5% better science at 20% higher cost, are you winning?

    At a deeper level, there are two types of mistakes: those you cannot correct and so must be extraordinarily careful to avoid, and those you can correct. It’s important to recognize the latter not only from an efficiency standpoint, but because it’s harder to recognize recoverable errors than catastrophic ones. I’m not sure motivated reasoning falls into the first category. Compared to many other confounders we think we can adjust for, motivated reasoning /shouldn’t/ be especially complex – and accepting and adjusting for it doesn’t require The Right People in the important slots.

    ((There are some other, structural issues. Comparing your pseudo-NSF model to the US medical research field is interesting, but not encouraging. And if your utopian results require a better class of people – most obviously, folk who can accept when they’ve been shown wrong publicly and change their mind as a result – it’s not necessarily illustrative for the set of humans we’re stuck with today.))

    Of course, if you absolutely need unmotivated results, finding scientists unaffected by motivated thinking seems like an extra step. We’ve got a /huge/ amount of unmotivated results already: experimental procedures. Create a cultural expectation of actually writing them up honestly, in detail, and without regard to the results, plus the tools to push all the data together, and the software to analyze it…

    And you’ll probably get a lot of information on grades of rat chow, admittedly. But not just that. Lots of hard effort, though.

    • Scott Alexander says:

      Maybe I came on too strong with the “no opinion”. I just meant that they can’t be someone who’s staked their career on one side, or published books arguing for it or anything like that.

  12. Rob says:

    Some very nice ideas indeed. Though I’ll agree with other commenters that writing the entire paper in advance wouldn’t work as specified, because experiments pretty much always need significant tweaking once they make contact with reality. Perhaps some kind of iterative system, where the experimenter tries to run the experiment and, when they hit a problem that requires them to change the protocol, they can submit a revised protocol and people get a chance to duck out again? The overhead for that might be too high, though; I’m thinking out loud.

  13. Some Guy on the Internet says:

    If I wanted to be a real jerk, I’d bet gigantic amounts of money on random experimental results, then always bow out in the bowing-out phase, thus screwing over the poor scientist who performs the experiment. The amount of money I’d “commit” would be almost enough to pay for the whole experiment all by itself, so that once I pulled out the experimenter would be out of zir entire salary and then some.

    It wouldn’t cost me a penny, and I could use this to really screw with people. Either as an end in itself (if I were a real jerk), or in order to suppress some type or another of research that I oppose.

    • Desertopa says:

      Here’s a way to be a jerk and potentially make some actual money at it. Place bets on a large number of hypotheses, always bow out *unless* you decide, when the protocol is released, that it’s sufficiently biased in favor of your bet.

      • Daniel H says:

        My first thought was that this kind of bias would rarely happen. Then I realized that of course it would happen, because now those experimenters are being paid to create biased experiments.

  14. “Having any strong opinion on the issue at hand is immediate disqualification for a consultant scientist to perform a confirmatory experiment.”

    This assumes that pretty much any scientist can carry out pretty much any experiment. Every tiny slice of science requires specialist knowledge which would be key to designing the right experiment. Any scientist who spends enough time immersed in a subject to be able to design and carry out that experiment _will_ end up with a strong opinion on it.

    Now, if you’re having robots design and carry out the experiments, that’s different, but with humans your system is unworkable.

    • I’m not sure there’s an assumption that any experiment can be performed by any scientist, but I don’t know how you’d guard against experimenters acquiring opinions about what outcomes they consider more plausible.

      Would it make sense to have experimenter as a separate profession?

      • Daniel H says:

        I’m pretty sure it is a separate profession here. I agree that often experimenters would start to expect certain results, but a lot of human psychology experiments, for example, could theoretically be done by anybody*, even if that person doesn’t have any experience in the relevant area of psychology. As some examples, consider Milgram, Robbers Cave, the Ultimatum Game, the Dictator Game, most experiments related to biases, etc. This might not apply to all psychology experiments, but it still applies to a very large number.

        I’m not sure this same argument applies to physics, though. I believe many physics experiments require specialized equipment, and by the time you were able to design that equipment and know why it does what you think it does, you might get an opinion on the results. Still, it’s better than the person who came up with the hypothesis running the experiment. They are literally one of the worst choices as far as bias goes.

        * The term “anybody”, in this case, means anybody who knows how to design an ethical experiment, knows about blinding, knows about the IRB or Raikothian equivalent, knows about statistics, etc. The point is that most good human psychologists can run most human psychology studies, even if they don’t know enough about the specific area of psychology to come up with hypotheses about it.

  15. Morendil says:

    “As the consultant plans the experiment, she receives input from both the pro- and the con- investors. Finally, she decides upon an experimental draft and publishes it in a journal. […] It is the exact published paper that will appear in the journal when the experiment is over, except that all numbers in the results section have been replaced by a question mark.”

    In software engineering we have a similar concept, which we call “waterfall”. It doesn’t really work.

    I don’t know how things work on Raikoth, but on Earth if you tried something like that, I’d expect you would rarely get to the point where you replace question marks with actual numbers, because the proposed experimental protocol would turn out, when you actually carried it out, to require a zillion small but significant changes, a.k.a. “debugging”.

    (I don’t actually do science in any official capacity, but I’ve read a fair bit about this kind of problem, for instance Latour’s “Laboratory Life” or Collins’ “Gravity’s Shadow”, which take a really close look at how experimenters actually go about the business of science.)

    For instance you would find that the grapes picked for the experiment had fermented during the time between approving the experimental consultant’s draft and the betting window closing. Most patients would declare the grapes inedible and withdraw, except an adventurous few who would report totally unexpected results unrelated to cancer, such as lowered inhibitions, some loss of motor control and mild euphoria. (And all of a sudden the funding for cancer-grape research dries up as everyone realizes the greater potential of more exploratory research in this promising area…)

    • John says:

      The same occurred to me. Perhaps the consultant scientist could do the experiment in secret, then publish the writeup while omitting the data?

  16. lmm says:

    How does the NSF funding the prediction markets work? If they just fund anything on there, why would people bet on basic research over things they care about? If the NSF is already deciding what’s basic research and what isn’t, why not just fund it directly?

  17. Michael Keenan says:

    I love almost all of this, but I don’t think the positive-sum bets can work, because you’d bet on both sides and be guaranteed a return (or if you couldn’t be that blatant, you’d buy a diversified portfolio of positive-sum bets). It wouldn’t just be a few ordinary people – it’d be everyone, with all the money they can borrow. I think that would leak unlimited money.

    • somnicule says:

      Hanson’s LMSR (logarithmic market scoring rule) requires subsidised betting, but the way it works is that payouts depend on how a bet shifts the odds. So you can’t win money without adding information to the market.
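
      For concreteness, a minimal LMSR sketch (the liquidity parameter b and the trade sizes are illustrative). The market maker quotes prices from a cost function; a trader pays C(q′) − C(q) to move the outstanding shares from q to q′, so the only way to profit is to move prices toward the eventual outcome, and the sponsor’s subsidy is bounded by the worst-case loss b·ln(n):

      ```python
      import math

      def cost(q, b):
          """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b))."""
          return b * math.log(sum(math.exp(qi / b) for qi in q))

      def price(q, b, i):
          """Instantaneous price (probability) of outcome i."""
          z = sum(math.exp(qj / b) for qj in q)
          return math.exp(q[i] / b) / z

      def buy(q, b, i, shares):
          """Return (new share state, amount the trader pays)."""
          q2 = list(q)
          q2[i] += shares
          return q2, cost(q2, b) - cost(q, b)

      b = 100.0
      q = [0.0, 0.0]              # [yes, no] shares outstanding
      print(price(q, b, 0))       # 0.5 at launch
      q, paid = buy(q, b, 0, 50)  # buy 50 "yes" shares
      print(round(paid, 2))       # ~28.10 paid to the market maker
      print(round(price(q, b, 0), 3))  # ~0.622: odds shifted toward "yes"
      print(b * math.log(2))      # ~69.3: the sponsor's maximum subsidy
      ```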

  18. Anonymous says:

    What makes you think the hedge fundies will give more than the subsidized 1 million to research the answer? If my intuition is correct, the only way they could ever squeeze more than 1 million out of this deal is if the laypeople were systematically getting the answer wrong. (EDIT: Nevermind, my intuition was wrong)

    And wouldn’t they necessarily need to withhold the findings for a period of time in order for it to be useful?

  19. Jai says:

    Who is judging the “grapes cure cancer” contract? Under what conditions does it pay out?

    Might be better to bet solely on experimental outcomes rather than theories.

    • somnicule says:

      Could a combinatorial prediction market let you make bets on theories, which shift the odds of the various predictions the theories make regarding specific experiments? Then the market should be able to update the odds of different theories based on experimental outcomes.

      Theories themselves wouldn’t be able to be checked directly, though, so it might require something fancy to tether the theory node to the actual theory.

      ETA: Or do it at a meta-level; have it so only theory-bots of some kind can bet directly on the market of experimental results, and individuals can choose how they distribute their funds between theory-bots.

  20. Brandon says:

    If there is money involved, people will game it. One method I can see is these rich science hedge funds doing the experiments early, in secret, and then betting on the public version.

    • Icicle says:

      Then if people can see who bet what, everyone joins in with the bets the hedge funds make, reducing their advantage.

    • Deiseach says:

      I’m presuming Raikoth doesn’t have big pharmaceutical companies, because otherwise I imagine certain firms that might love to get a patent on Grape Extract for Whatever as a large-scale commercial property would plough money into the prediction market and fund research that way (and also endow chairs at universities which train the scientists who will be running the labs doing the confirmation testing).

      I’m very sceptical about pure, untainted, absolutely free of all bias scientists because they’re humans living in the human world where one hand washes another. (I also am very sceptical of lie detectors, souped-up or no, and consider their use as reliable as tealeaves and a lot more pernicious for biasing evidence, or the jury’s perception of such, in law courts).

      • Daniel H says:

        You might not be able to find actual unbiased humans, but you can do better than the current system. The current system chooses some of the most biased humans to run the experiment. Perfect lack of bias may be impossible, but the perfect is the enemy of the good.

        Also, according to this study (which I’ve briefly skimmed), lie detectors are significantly better than chance for all (EDIT: tested) applications, but still not very good for this particular kind where the questions involved would be necessarily vague.

        EDIT: I would need to look into the evidence on lie detectors and other types of evidence to form an opinion on their use in courts, but for the moment I agree that they are probably viewed as more reliable than they are and thus should be used sparingly if at all in courtroom situations.

        • Deiseach says:

          My problem with lie detectors is that they’re touted (in popular culture at least, whatever about their actual use) as these infallible indicators of innocence or guilt.

          And, apart from the Pope*, I don’t accept that anyone or anything is infallible. I think (and this is only personal impressions gleaned from whatever I’ve read in the general media) that their use in actuality is more as a tool to chivvy confessions out of suspects; the police put on an elaborate kabuki of “Okay, you take this test and it will prove you are lying and you are going down the river for fifty years. On the other hand, co-operate with us, cut a deal, and you’ll be out in five.”

          Changes in galvanic skin response, stress patterns in the voice and all the rest of it can be down to other things than being a liar; very nervous people, people with particular physical or mental illnesses, people who wouldn’t blink an eye at lying about cutting their granny’s throat if they were caught on live TV doing it, and drugs/meditation techniques/the kind of fudging Clark Kent did to dodge the question “Are you Superman?” would all, I submit, knock the results off.

          The cops are as fallible as anyone else, and faced with a suspect they’re convinced is guilty, I don’t think they’d stick at using vague or ambiguous results from such tests to bolster a case.

          But I’m probably biased by cases such as the Birmingham Six, where the forensic evidence was ‘massaged’ to give the ‘right’ results; the tests were inconclusive for the presence of nitro-glycerine, the cops came back to the scientist and as much as said “We know these guys did it but without corroboration they’ll walk free, run the tests again okay?” and he fiddled around with the settings and parameters until hey, traces of explosives definitely found!

          *Within the limits of the charism as defined; no pope can, for instance, infallibly declare that the cashew is the official nut of Catholicism.

  21. I did once make 25 dollars by successfully predicting the discovery of the Higgs boson. But I opine that the completely pre-written papers thing wouldn’t actually work. Things go wrong; you discover hidden flaws in the code; six of the experimental subjects have psychotic breaks and shoot up the lab computer where the blinding strings were kept.

    • Yes: often the only way you can figure out how to do an experiment, in complete detail, is to actually do the experiment. This is true of nearly all difficult tasks, in science as elsewhere; there is always a degree of improvisation involved.

      • James James says:

        Similarly, hypothesis generation and hypothesis testing are not independent tasks.

      • Cyan says:

        And yet Statistical Analysis Plans still get written. (By me, for instance — it’s part of my day job.) When you write the report, you just write down all of the amendments to the plan and why they were necessary.

        In preparing for battle, I have always found that plans are useless but planning is indispensable.

        – Dwight D. Eisenhower

    • Daniel H says:

      What I think would work best is:
      1. Write the draft as described in the post.
      2. As you do the experiment, keep a “changes to experiment design” log. This should have all the changes that were made to the experimental design, from “We boiled the water for 3 milliseconds longer than planned due to buggy timing equipment” to “Half of our test subjects were killed by a serial killer while leaving the facility on the first day, and half of the rest decided not to continue, so only 1/4 of our initial subjects completed the experiment”. You should write down these changes as they happen, along with the actual reasons why.
      3. Fill in all the numbers where a result makes sense. Leave the others as question marks.
      4. Publish the filled in original paper, the discussion section, and the change log together as the complete scientific paper. If the changes make other analyses useful, they can be performed, but only in the discussion section.

      EDIT: Due to SSC not letting us use the <li> and <ol> tags, I messed up the numbering on my list.

  22. Sniffnoy says:

    The obvious question that occurs to me is, why are there still journals? 🙂

  23. I’d like to see something about standards for reporting on one’s own experiments. As soon as the idea that grapes might cure cancer (and I hesitate to type this because I expect that the notion will get detached from the thought experiment and people will start thinking it’s true) gets some traction, people will be trying it out themselves.

    I realize you’re trying to shield experimentation from bias by the experimenters, but I also think that since Raikothians are more conscientious than Earth humans, it wouldn’t hurt to push them to record such things as how many grapes they were eating before they started the experiment and how many they were eating when they thought grapes were worth a try.

    Is it reasonable to think there are experimenters who don’t care about the outcome of an experiment? If they exist, how do you identify them?

  24. gwern says:

    “Having any strong opinion on the issue at hand is immediate disqualification for a consultant scientist to perform a confirmatory experiment.”

    You forgot the part where all the consultants are administered souped-up lie detectors and implicit bias tests to make sure they really have no strong opinion on it!

    • anon says:

      What happens if they can’t find anyone without a strong opinion on something? In our world that wouldn’t be a problem, there’s always ignorance, but in Raikoth people are smart enough that I’d expect stronger uniformity of opinion.

      • gwern says:

        Science is too important to be left to bigots, I say. Clearly Raikoth will raise children in isolation just for this use-case (if an issue is so divisive that they literally can’t find any neutral consultants, then that seems like an issue on which such heroic measures are justified).

  25. Daniel H says:

    What about when the original draft isn’t detailed enough, or is impossible to follow? I suppose in the first case that’s handled by the publishing journal, but the second one sometimes happens unexpectedly. For example, suppose you are attempting to measure the speed of the Earth through the luminiferous æther, or trying to find out how much phlogiston is in an un-bored cannon barrel. You’ll get nonsensical results, because there is no luminiferous æther or phlogiston. Would they just leave the relevant fields as question marks in the final paper and address this in the discussion section?

    I’m also surprised to see you still using p-values in Raikoth instead of some better alternative.

  26. suntzuanime says:

    I don’t really understand the purpose or logistics behind this section:

    “First, investors get one final chance to sell their bets or bow out of the experiment without losses. Perhaps some investors thought that grapes cured cancer, but now that they see the experimental protocol, they don’t believe it is good enough to detect this true fact. They bow out. Yes, this decreases the amount of money available for the experiment. That comes out of the consultant scientist’s salary, giving her an incentive to make as few people bow out as possible.”

    What goal is it achieving, and how is it not completely invalidating all bets?

    • Anonymous says:

      It’s an incentive for a good protocol; it lets you bow out if the protocol only used raisins, or only used radioactive grapes, or otherwise wouldn’t prove the proposition to your satisfaction.

      If a protocol seems reasonable to those betting, they should stay with it.

    • Dan says:

      I share this confusion.

      Suppose I buy into the market when it’s at 30%, then it drops to 20%, then the protocol gets written. Does the “bow out” option mean:

      A. I can now sell at 30%
      B. I can now sell at 20%
      C. I can now sell for whatever a buyer is now willing to pay
      D. Something else

      And if it’s A or B, who is on the other side of that transaction? The consulting scientist (who is running the experiment)?

      • Blaine says:

        It’s C, I’m fairly sure. The whole point of being able to bow out is noticing a discrepancy between the hypothesis proposed and what’s actually being tested.

        If the protocol is clearly and obviously biased, suddenly the expected results aren’t so controversial anymore. Everyone begins thinking “This is total bunk, so now I think it’s 1% instead of 50%” so everyone starts shorting it. Remember that the consulting scientist’s pay is a function of how controversial the expected results will be, so crappy tests with easily predictable crappy results will have non-controversial market odds about them.