If You Can’t Make Predictions, You’re Still In A Crisis

A New York Times article by Northeastern University professor Lisa Feldman Barrett claims that Psychology Is Not In Crisis:

Is psychology in the midst of a research crisis?

An initiative called the Reproducibility Project at the University of Virginia recently reran 100 psychology experiments and found that over 60 percent of them failed to replicate — that is, their findings did not hold up the second time around. The results, published last week in Science, have generated alarm (and in some cases, confirmed suspicions) that the field of psychology is in poor shape.

But the failure to replicate is not a cause for alarm; in fact, it is a normal part of how science works.

Suppose you have two well-designed, carefully run studies, A and B, that investigate the same phenomenon. They perform what appear to be identical experiments, and yet they reach opposite conclusions. Study A produces the predicted phenomenon, whereas Study B does not. We have a failure to replicate.

Does this mean that the phenomenon in question is necessarily illusory? Absolutely not. If the studies were well designed and executed, it is more likely that the phenomenon from Study A is true only under certain conditions. The scientist’s job now is to figure out what those conditions are, in order to form new and better hypotheses to test […]

When physicists discovered that subatomic particles didn’t obey Newton’s laws of motion, they didn’t cry out that Newton’s laws had “failed to replicate.” Instead, they realized that Newton’s laws were valid only in certain contexts, rather than being universal, and thus the science of quantum mechanics was born […]

Science is not a body of facts that emerge, like an orderly string of light bulbs, to illuminate a linear path to universal truth. Rather, science (to paraphrase Henry Gee, an editor at Nature) is a method to quantify doubt about a hypothesis, and to find the contexts in which a phenomenon is likely. Failure to replicate is not a bug; it is a feature. It is what leads us along the path — the wonderfully twisty path — of scientific discovery.

Needless to say, I disagree with this rosy assessment.

The first concern is that it ignores publication bias. At the usual p < 0.05 threshold, one out of every twenty studies of an effect that doesn’t exist will be positive by pure chance – more if you’re willing to play fast and loose with your methods. Probably quite a lot of the research we see is that 1/20. Then when it gets replicated in a preregistered trial, it fails. This is not because the two studies were applying the same principle to different domains. It’s because the first study posited something that simply wasn’t true, in any domain. This may be the outright majority of replication failures, and you can’t just sweep this under the rug with paeans to the complexity of science.
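To get a feel for the 1/20 figure, here is a minimal simulation sketch. It assumes the bleakest case, in which every hypothesis tested is false, and it treats a study as "positive" whenever a uniformly random p-value lands below the threshold. The function name `simulate_null_studies`, the seed, and the study count are illustrative choices, not anything taken from the Reproducibility Project.

```python
import random


def simulate_null_studies(n_studies, alpha=0.05, seed=0):
    """Count 'positive' results when every tested hypothesis is false.

    Assumption: under a true null, a study's p-value is uniform on [0, 1),
    so the study comes back positive whenever the draw falls below alpha.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.random() < alpha for _ in range(n_studies))


n = 20_000
positives = simulate_null_studies(n)
print(f"{positives} of {n} null studies came back positive ({positives / n:.1%})")
```

If only the positives get written up, every one of those papers describes an effect that does not exist, and a careful replication should be expected to fail.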

The second concern is experimenter effects. Why do experimenters who believe in and support a phenomenon usually find it occurs, and experimenters who doubt the phenomenon usually find that it doesn’t? That’s easy to explain through publication bias and other forms of bias, but if we’re just positing that there are some conditions where it does work and others where it doesn’t, the ability of experimenters to so often end up in the conditions that flatter their preconceptions is a remarkable coincidence.

The third and biggest concern is the phrase “it is more likely”. Read that sentence again: “If the studies were well designed and executed, it is more likely that the phenomenon from Study A is true only under certain conditions [than that it is illusory]”. Really? Why? This is exactly the thing that John Ioannidis has spent so long arguing against! Suppose that I throw a dart at the Big Chart O’ Human Metabolic Pathways and when it hits a chemical I say “This! This is the chemical that is the key to curing cancer!”. Then I do a study to check. There’s a 5% chance my study comes back positive by coincidence, an even higher chance that a biased experimenter can hack it into submission, but a much smaller chance that out of the thousands of chemicals I just so happened to pick the one that really does cure cancer. So if my study comes back positive, but another team’s study comes back negative, it’s not “more likely” that my chemical does cure cancer but only under certain circumstances. Given the base rate – that most hypotheses are false – it’s more likely that I accidentally proved a false hypothesis, a very easy thing to do, and now somebody else is correcting me.
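The base-rate argument can be made concrete with a few lines of Bayes’ rule. This is a deliberately simplified sketch, not a model from Barrett’s article or from Ioannidis: it assumes a hypothesis is either true or false everywhere, that the studies are independent, and it plugs in illustrative numbers (a prior of 1 in 1,000 for the dart-picked chemical, 80% power, a 5% false-positive rate). The function name `posterior_true` and every parameter value are assumptions chosen for illustration.

```python
def posterior_true(prior, results, power=0.8, alpha=0.05):
    """Posterior probability that a hypothesis is true after some studies.

    prior   -- assumed probability the hypothesis is true before any study
    results -- iterable of booleans, True for a positive study
    power   -- assumed chance a study is positive when the hypothesis is true
    alpha   -- assumed chance a study is positive when the hypothesis is false
    """
    like_true = 1.0   # likelihood of the observed results if it is true
    like_false = 1.0  # likelihood of the observed results if it is false
    for positive in results:
        like_true *= power if positive else 1.0 - power
        like_false *= alpha if positive else 1.0 - alpha
    numerator = prior * like_true
    return numerator / (numerator + (1.0 - prior) * like_false)


# One positive study (mine) followed by one negative study (the other team's).
print(posterior_true(prior=0.001, results=[True, False]))
```

Under these assumptions the chemical remains well under one percent likely to work after the two studies, which is why "I accidentally proved a false hypothesis" should be the default explanation. A more optimistic prior changes the number, but that is exactly the point: the conclusion depends on the prior, not on an automatic presumption that both studies are right.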

Given that many of the most famous psychology results are either extremely counterintuitive or highly politically motivated, there is no reason at all to choose a prior probability of correctness such that we should try to reconcile our prior belief in them with a study showing they don’t work. It would be like James Randi finding Uri Geller can’t bend spoons, and saying “Well, he bent spoons other times, but not around Randi, let’s try to figure out what feature of Randi’s shows interferes with the magic spoon-bending rays”. I am not saying that we shouldn’t try to reconcile results and failed replications of those results, but we should do so in an informed Bayesian way instead of automatically assuming it’s “more likely” that they deserve reconciliation.

Yet even ignoring the publication bias, and the low base rates, and the statistical malpractice, and the couple of cases of outright falsification, and concentrating on the ones that really are differences in replication conditions, this is still a crisis.

A while ago, Dijksterhuis and van Knippenberg published a famous priming study showing that people who spend a few minutes before an exam thinking about brilliant professors will get better grades; conversely, people who spend a few minutes thinking about moronic soccer hooligans will get worse ones. They did four related experiments, and all strongly confirmed their thesis. A few years later, Shanks et al tried to replicate the effect and couldn’t. They did the same four experiments, and none of them replicated at all. What are we to make of this?

We could blame differences in the two experiments’ conditions. But the second experiment made every attempt to match the conditions of the first experiment as closely as possible. Certainly they didn’t do anything idiotic, like switch from an all-female sample to an all-male sample. So if we want to explain the difference in results, we have to think on the level of tiny things that the replication team wouldn’t have thought about. The color of the wallpaper in the room where the experiments were taking place. The accents of the scientists involved. The barometric pressure on the day the study was conducted.

We could laboriously test the effect of wallpaper color, scientist accent, and barometric pressure on priming effects, but it would be extraordinarily difficult. Remember, we’ve already shown that two well-conducted studies can get diametrically opposite results. Who is to say that if we studied the effect of wallpaper color, the first study wouldn’t find that it made a big difference and the second study find that it made no difference at all? What we’d probably end up with is a big conflicting morass of studies that’s even more confusing than the original smaller conflicting morass.

But as far as I know, nobody is doing this. There is not enough psychology to devote time to teasing out the wallpaper-effect from the barometric-pressure effect on social priming. Especially given that maybe at the end of all of these dozens of teasing-apart studies we would learn nothing. And that quite possibly the original study was simply wrong, full stop.

Since we have not yet done this, and don’t even know if it would work, we can expect even strong and well-accepted results not to apply in even very slightly different conditions. But that makes claims of scientific understanding very weak. When a study shows that Rote Memorization works better than New Math, we hope this means we’ve discovered something about human learning and we can change school curricula to reflect the new finding and help children learn better. But if we fully expect that the next replication attempt will show New Math is better than Rote Memorization, then that plan goes down the toilet and we shouldn’t ask schools to change their curricula at all, let alone claim to have figured out deep truths about the human mind.

Barrett states that psychology is not in crisis, because it’s in a position similar to physics, where gravity applies at the macroscopic level but not the microscopic level. But if you ask a physicist to predict whether an apple will fall up or down, she will say “Down, obviously, because we’re talking about the macroscopic level.” If you ask a psychologist to predict whether priming a student with the thought of a brilliant professor will make them do better on an exam or not, the psychologist will have no idea, because she won’t know what factors cause the prime to work sometimes and fail other times, or even whether it really ever works at all. She will be at the level of a physicist who says “Apples sometimes fall down, but equally often they fall up, and we can’t predict which any given apple will do at any given time, and we don’t know why – but our field is not in crisis, because in theory some reason should exist. Maybe.”

If by physics you mean “the practice of doing physics experiments”, then perhaps that is justified. If by physics you mean “a collection of results that purport to describe physical reality”, then it’s clear you don’t actually have any.

So the Times article is not an argument that psychology is not in crisis. It is, at best, an IOU, saying that we should keep doing psychology because maybe if we work really hard we will reach a point where the crisis is no longer so critical.

On the other hand, there’s one part of this I agree with entirely. I don’t think we can do a full post-mortem on every failed replication. But we ought to do them on some failed replications. Right now, failed replications are deeply mysterious. Is it really things like the wallpaper color or barometric pressure? Or is it more sinister things, like failure to double-blind, or massive fraud? How come this keeps happening to us? I don’t know. If we could solve one or two of these, we might at least know what we’re up against.


318 Responses to If You Can’t Make Predictions, You’re Still In A Crisis

  1. Ton says:


    “On the other hand, there’s one part this it I agree with entirely.”

  2. Tracy W says:

    While I agree that there are some serious problems with lack of replication, I also find myself thinking that, well, we know that radios work, but when my engineering class got to build a radio for a lab assignment, the only radios that worked first off were those built by the people who already had technical certificates.

    Although on the other hand, a radio that requires precise steps to build is more useful than a psychology discovery that only works under equally precise building for obvious reasons. Which brings me back to your conclusion.

    • Scott Alexander says:

      I think your “other hand” paragraph is really important. There are cases where it’s important that a phenomenon exists even if it’s vanishingly rare and precise – for example, that anything can transmit radio waves at all, even if you have to get everything just right.

      But psychology often tries to generate laws of human behavior. They’re usually trying to say that something is relevant to the real world, or explains some particular phenomenon. The fact that it’s very hard to make a radio right is relevant to our observation that things aren’t constantly forming radios and broadcasting everything we say to people thousands of miles away, and that we don’t have to take all these natural radios into account when trying to explain the world.

      • discursive2 says:

        It seems to me like the test of a discipline is whether the general principles it proposes about how the world works have corollaries that allow you to accomplish very specific things. The difference between Greek philosophers saying that everything is made of the 5 elements and modern physicists saying everything is made up of elementary particles is that modern physicists figured out how to build radios. If your story about physics doesn’t let you build radios or something equally impressive, the odds of it being grounded in anything remotely true are pretty poor.

        Social psychology seems to be at the level of ancient Greek philosophy… there’s a lot of theories, but people can’t do anything with them. Where are the revolutions in education, or running a business? Obligatory XKCD link

        • PSJ says:

          This seems to fail on the level of how complex the phenomenon you are studying is. It’s hard to say that genetics is untrue in any real way, yet it hasn’t done astounding things in terms of commercialization (at least nothing compared to the radio, computers, or the bomb). The human brain is significantly more complex, so it wouldn’t seem too surprising that we haven’t found a way to commercialize the huge amount of well-validated research yet. (it has been commercialized, but just not on a grand scale)

          And to say you can’t do anything with it at the level of Greek philosophy seems absurd. There’s a huge difference between literally not being able to predict anything and not revolutionizing some aspect of commerce yet.

          • Ptoliporthos says:

            You’re restricting yourself to human genetics. Think about plant and animal genetics, where companies *are* using it to make a lot of money.

          • PSJ says:

            You’re absolutely right, but I think the point still stands about the truth/usability disconnect in that area.

          • roystgnr says:

            Even in the case of human genetics, where the manipulation isn’t all there yet, the diagnostic tools are pretty astounding, don’t you think? Give 23andme some of your spit. If your long-lost brother from across the country did too, they can introduce you. If your dad did too, but his spit doesn’t match yours, the spit is more trustworthy than your sainted mother.

          • Paul Kinsky says:

            > The human brain is significantly more complex, so it wouldn’t seem too surprising that we haven’t found a way to commercialize the huge amount of well-validated research yet.

            Only if you don’t count machine learning using neural networks, specifically convolutional neural networks which are based on structures found in the visual cortex.

          • PSJ says:

            I would absolutely count it, but I wanted to argue from the least convenient world. It can be reasonably argued that a majority of such research lies outside the umbrella of psychology in general and especially outside the study of social psychology and human behavior.

            Edit: not entirely sure why I changed colors 😛

          • kerani says:

            It’s hard to say that genetics is untrue in any real way, yet it hasn’t done astounding things in terms of commercialization (at least nothing compared to the radio, computers, or the bomb).

            I think that you’d have to discount the Green Revolution, the Innocence Project (and DNA testing in crime scenes in general), and food supply/food safety issues in general in order to keep this statement accurate.

          • Earthly Knight says:

            Not to mention testing for heritable diseases. Our investment in genetics has really paid huge dividends, by any measure.

        • Dave says:

          The book “Nudge” by Thaler and Sunstein points at some applications. (I am only barely starting it and laid it aside for a while, so I’m not remembering a good example.) Guy named Sutherland has a couple of TED talks with examples.

      • WWE says:

        I really like the radio example here, and I think that is what is going on to a large extent – there’s a complex system that is hard to hit precisely. But, I wouldn’t go to say that it couldn’t be useful if you managed to figure it out (imagine a computerized testing format that gives a customized priming to optimize testing results).

        As I see it, priming is interesting only because the priming activity is supposed to be something subtle and yet still have a large effect on the results (yelling angrily at your participants is almost certainly going to influence their testing results, so that’s not so interesting).

        Since the priming is somewhat subtle, I think you should be precise to replicate exactly that priming and measurement as much as possible before you even think of starting to generalize. Instead, the replications that exist might not actually be very good replications. I’m happy to initially ignore some differences like barometric pressure and wall color, but when Shanks et al modified the procedure in multiple ways, first by showing a video to extend the priming session and then by having them take an entirely different type of test to gauge the effect of the priming… well, it’s not so unimaginable that they get different results! (what do studies say about watching TV before taking a test?)

        The priming example seems like an example of premature generalization rather than replication. I haven’t looked much at the other attempts (and failures) at replication to see if they are at all like this.

        • Deiseach says:

          I think there may be people for whom priming works, and people for whom it does not work, and figuring out which is which is going to take a whole lot more work on a finer level than “Well, if we tell women to think like men…” 🙂

        • AJD says:

          Replicating “exactly” can be troublesome also. I can’t find it again now, but I recently read an article dealing with a failure to replicate the priming study in which subjects read a bunch of words related to old age (“elderly”, “Florida”, “retirement”, etc.) and then ended up walking slower. The article I was reading observed that the replication study had taken place 20 years (or whatever) later than the original study using the same methodology—but in 20 years, the kind of background factors that would cause priming effects have changed. The set of words that are stereotypically associated with aging are different now than they were 20 years ago, the relative frequency with which one encounters those words in speech or text is different, societal attitudes to and stereotypes about the elderly are different, etc. So even if the priming found in the earlier study was a real effect, we need a well-developed theory of whether we would even therefore expect the same effect to be found 20 years later.

          My field is sociolinguistics. It’s certainly the case that in sociolinguistics, if an experiment conducted on the same population speaking the same language using the same methodology has a different result 20 years later, the smart money is usually on the population and language having changed in the intervening time, rather than the original study having found a false result. (Not to say that false results don’t happen! But changing populations and languages always happen.)

          • HeelBearCub says:

            I think this is a very good point. I have to imagine, also, that priming also needs to have some novelty to it to work well. IOW, if one tries to give a prime that people hear all the time as a prime this will not be nearly as effective.

          • Steve Sailer says:

            Right, the priming college students to walk slightly slower back to the elevator experiment was made famous by Malcolm Gladwell, who made a lot of money blurring the boundaries between marketing research and psychology.

            I spent a long time in the marketing research business, and one thing we learned was that effective marketing wears off. What was good marketing a few years ago might be boring and trite today, just like fashions from a few years ago don’t seem fresh anymore.

            You’ll notice that the marketing research industry is in no danger of going out of business because it’s developed replicable methods to predict the success or failure of future marketing. Instead, marketers continue to have to hire marketing researchers to test whether their new ideas are going to work or not under the latest conditions.

          • Deiseach says:

            When I read the humorous article which mentioned “college students walking slower”, I thought it was all part of the leg-pull.

            And now you are telling me this was a genuine real serious study.

            I’m boggling at this.

            Did anyone check the phrasing? Were the students told (or did they pick up that this is what they were supposed to do) “Imagine you’re old” and then they walked slower because they thought “If I’m an old person, I’ll walk slowly”?

            Because yes, all this is sounding much less like psychology and more like “If you have the smell of fresh baked bread wafting through the store when customers walk in, they’re more likely to buy pastries” type of marketing.

          • houseboatonstyx says:

            @ Steve Sailer
            “Right, the priming college students to walk slightly slower back to the elevator experiment was made famous by Malcolm Gladwell, who made a lot of money blurring the boundaries between marketing research and psychology. …. [the marketing research industry has] developed replicable methods to predict the success or failure of future marketing.”

            Where were the boundaries supposed to be?

            From way outside those forests, it sounds like psychology findings that work on one side get claimed by neuroscience, and those that work on the other side get claimed by marketing research. No wonder psychology never gets a break.

            Psychologists may scorn focus groups, but don’t market researchers crunch a lot of numbers also? The smell of bread in supermarkets would be easy to test: how strong the smell (from zero to X) is and how the cash register receipts add up — from one hour to another, if the researchers like.

          • houseboatonstyx says:

            @ Deiseach
            When I read the humorous article which mentioned “college students walking slower”, I thought it was all part of the leg-pull.

            But actually, that could make an easy way to scout the territory. Set up a camera to film all the students walking out of all classes, see how many ‘slow walks’ follow what kind of classes, look up the content of each class’s session. If you find a lot of slow walks following other classes that bore students to sleep, or that are so interesting that the students are still absorbed in thinking about the subject, then slow walking is not a good way to test priming.

      • Deiseach says:

        I was nodding along in agreement (because come on, reproducibility is one of the cornerstones of validation: someone else repeats your experiment, they didn’t screw up, it doesn’t come out the same way, you go “Welp, better scrap that idea and start again”).

        But then it hit me: this is psychology we’re talking about, and people. This is not like setting up a titration where a set molarity of base is neutralised by a particular volume of a set molarity of acid every time. People do have vagaries. Was the professor/soccer hooligan experiment set up so that for the first lot of experimenters, the students thought “I know the correct answer to this question, but I’m supposed to be a soccer hooligan so I’ll answer it wrongly”? You don’t know the inside of people’s heads and how they think, and maybe the first lot of experimenters got their results by the way they instructed the participants the same way in all four tests, while the second lot phrased or explained it slightly differently.

        I think you’re right about a crisis, if such a high percentage of results are simply not reproducible, especially as psychology results get used to implement policies on school children or the mentally ill in the community or any other vulnerable group that the government has been pressured into Something Must Be Done.

        But I also hope that this will knock on the head the notion that there is some Grand Universal Theory of Mind and once we figure that out, plus we have a handle on the genes, then we can deal with humans as if humans are wind-up toys and you feed input/stimulus A in and reliably get output/reaction B all the time, every time.

        We’re a bundle of contradictions and easily influenced by different moods. I’ve been doing the Mood Monitor exercise for this CBT nonsense and one day I was at 5 on the “How are you feeling?” scale and the next it plunged down to 2, the only reason being a fit of melancholy triggered by overhearing a conversation that had nothing to do with me (a work colleague telling another about a family holiday they’d been on).

        We are not yet easily reducible to a neat scheme. I think maybe the main problem is that the studies are trying to be too scientific, where you are never going to get that neat parallel with classical laboratory experiments on cell cultures or organic chemistry or grating diffraction.

        • HeelBearCub says:

          Where you ended up seems to point at an individual being variable. But where you started seems far more salient.

          Imagine one was handed a bunch of beakers. They are opaque and filled with unknown substances. You can add liquid, but only through filters, each of individual, unknown properties. You can extract some liquid, but through a different set of filters. You do some experiments, publish some results.

          Your beakers are taken away and brand new beakers are given to you, with new filters. You repeat the experiment exactly, and get different results. Hardly surprising.

          • Setsize says:

            And it’s a whole lot easier if you can keep the same beakers for your second experiment. Collect enough data about the same beakers and you can get a handle on what’s in them and what the filters are, and actually find results that generalize to chemistry.

            This is why the more replicable subfields of psychology, like cognitive and sensory/motor, rely heavily on within-subject designs instead of between-subject designs.

            [subvocal grumbling about bloggers who use “psychology” as an anti-synecdoche for “social psychology”]

          • Earthly Knight says:

            Synecdoche is a contronym, the opposite of a synecdoche is a synecdoche.

          • Winter Shaker says:

            Earthly Knight:

            the opposite of a synecdoche is a synecdoche.

            And did you know, you can use the word for part of a synecdoche to refer to the whole synecdoche?

      • Steve Sailer says:

        If in 1996 an experiment succeeded in priming college students to dance the Macarena and in 2015 a replication experiment fails to prime students into dancing the Macarena, is that a crisis in science?

        Well, it is if you have marketed psychology as the Science of Getting People to Do Things They Otherwise Wouldn’t Do. But if you assume, more realistically, that much of priming — even when it works — isn’t long-term Science with a capital S but just fads and fashions and marketing, well then maybe people would start to realize that a lot of what is labeled these days as the Science of Psychology is really just the business of marketing research.

      • FullMeta_Rationalist says:

        But psychology often tries to generate laws of human behavior.

        I wonder… is this even the right way to view psychology? Because the relevant analogy doesn’t compare the science of the psyche to the science of radio waves (a simple physical phenomenon). That would be neuroscience. The relevant analogy compares the science of the psyche to the science of transistor radios (complex, arbitrarily-designed consumer-products).

        In other words, “discovering the laws of transistor radios” sounds awfully silly to me. “Discovering statistical regularities in transistor radios” sounds more apt. The space of transistor radios is wide and deep. Not every possible kind is actually engineered, mass-produced, and marketed. This is probably what Feynman meant when he said “All science is either physics or stamp collecting.”

        This is important because maybe some aspects of psychology aren’t meant to be universally generalized. I think someone down thread mentioned something about same-subject vs cross-subject studies. If we think of people as stamps or transistor radios, how much variance can we expect from cross-subject studies? Maybe a U.S. stampologist notices that lots of stamps have eagles and then a Russian stampologist tries to replicate and says “are you high? I didn’t find a single eagle!”

        I don’t know anything about statistics. So I don’t know what the correct solution is. But it probably involves context. E.g. heritability, nature vs nurture, the color of the wallpaper. And then maybe we can predict stuff like “Russian stamp? 1% chance of eagle” Or “INTJ? 5% chance s/he’s a scientist” Or “vacuum-tube radio? It’s over 30 years old, 60% chance”.


        Why is priming even a psychology thing? Shouldn’t that be like, a linguistics thing?

        • FullMeta_Rationalist says:

          n.b. I have never seen a Russian stamp before. I pulled those numbers out of the air.

          In other news, I think I now understand why assigning probabilities without models might be counterproductive.

        • AJD says:

          Priming is studied in psycholinguistics, but not all topics on which priming research is done have anything to do with language.

        • James Picone says:

          This is probably what Feynman meant when he said “All science is either physics or stamp collecting.”

          That one’s Rutherford, after he got the Nobel prize for chemistry.

    • Andy says:

      Since people with technical certificates could replicate working radios, your class can be sure radios work. As a general rule, instructables written by technicians work when reproduced by other technicians. There is no replication crisis in radios; the instructions clearly were not detailed enough for a non-technician to follow, and that is all. Plus, technicians would be able to tell what was wrong with the non-functioning radios.

      Put another way, everything missing from radio instructables is written in some textbook, and all certified technicians know what it is.

      Other psychologists should be the equivalent of students with technical certificates, though. That is, an experiment described by one psychologist should be reproducible by another. What we have here is the equivalent of certified technicians not being able to build machines, not knowing what the problem is, and not knowing whether they are in the radio situation (possible to build) or rather the telepathy-machine situation (impossible to build).

      • Tracy W says:

        Not quite comparable though. You don’t learn electronics purely from a textbook; nearly everyone seems to need the experience of actually putting components together, seeing what happens, and learning that yes, you do need to be careful to avoid dry joints and whatnot. These things might even be in the textbook, but it takes some practice to remember to do them in the actual real world.
        I asked one of my classmates who did have the certificate what
        So if someone does a replication experiment and fails it’s possible that they’re just at that early stage of learning to do procedures that do fundamentally work. Or of course that it didn’t work in the first place.

        • Alex Godofsky says:

          Tracy, there is clearly no replication crisis in radios because our civilization successfully produces millions of radios every single year. “Replicates” them, if you will.

          • Tracy W says:

            That’s why I used radios as a counter-example. Replicated millions of times a year, but still people not experienced with electronics can fail to build one.

          • Alex Godofsky says:

            Yes, but no one is consistently replicating these psychology results except, occasionally, the original discoverers.

        • Anthony says:

          One would expect a psychology professor at a research university to be at least as competent at running psychological experiments as a certified radio technician is at assembling radios. And one would expect that the professor has a good amount of hands-on experience running such experiments.

          • Tracy W says:

            If one’s expectations were a reliable guide to reality one would be on the way to receiving one’s Nobel Prize.

            And, more prosaically, it is a lot easier to get experience building electronic circuits than experimenting on real live people. No one objects if you keep piles of capacitors in a box in the lab and occasionally fry one, but for some reason they get all uptight about the treatment of students, even art students. So I doubt that psychology professors get that much experience; in particular I doubt they get the experience of being up until 1am determinedly struggling to make the damn thing work.

    • FeepingCreature says:

      No relation, I just want to note here that you can turn a Raspberry Pi into a radio transmitter that can hit the FM band by sticking a wire in a specific port and running a certain piece of software.

      (Though keep in mind that depending where you live, this may be illegal.)

      • Who wouldn't want to be anonymous says:

        For reference: Unlicensed FM transmitters are legal in the US, but are extremely restricted in power. According to the FCC’s FAQ, unlicensed operation (and construction) is limited to a 200ft operating range.

        If you know enough about the subject to be really, really confident that your raspberry’ll not exceed those limits (or have $75000 to blow on fines and/or a few years to waste in jail)… Have fun?

    • Alex Z says:

      I think the difference is whether or not you have a “known good state”. If the students try to build a radio and fail, we can say: “OK, we know experienced technicians can build radios most of the time, but students can’t. Why?” And you can go back and check that experienced technicians can indeed build radios. For some of those experiments, there is no such known good state where you can consistently reproduce the results and compare to other experiments where the results are not found. So it seems weird to say: “There is no known good state, but there is a good state. We just can’t find it anymore.”

    • Kyle says:

      Maybe journals should have two stages. The interesting phase 1 journal, and a subsequent replication/pre-announced methodology/higher sample size phase 2 journal, which would be done by the original team 1 year later. I think you want to allow some flexible conclusions and adaptation while doing studies. As you observe data, hypotheses change, you notice interesting effects, or realize you weren’t thinking about it correctly at the beginning. I think those are completely legitimate things that a truth seeker would do as they investigate a phenomenon; although it might make the p-stats not as accurate, done well and thoughtfully it will minimize that issue. However, having the same team do the replication study has the twin benefits of preventing claims that the replication study didn’t match the original, and limiting the amount of scientific bullshitting that just looks for interesting results they can get published, because the scientists need to really believe that their own study will hold up to replication since the original can get pulled a year later.

      • Matt says:

        Having the same team do the replication seems to run into many of the same problems Scott discussed at the beginning of the article.

        • Kyle says:

          Well, Scott lays out three categories. The first, publication bias + being aggressive around statistics, he describes as “may be the outright majority of replication failures.” This method solves both by forcing replication and by making it costly to submit articles that are “statistically significant,” but the authors aren’t confident actually are statistically significant (publication bullshit). It’s a commitment device by the two entities closest to the data – the scientists and the journal – that they are willing to bet on their results.

          The second category of experimenter effects it doesn’t solve. Agree there.

          The third category I would summarize as: two different scientists do roughly the same experiment and get different results. You could say that this is an interesting tension and we should figure out what’s causing the difference. Or you could say that given the huge problems with replication, the first study was probably wrong. The latter makes a lot of sense to me, along with, it seems, everyone in this discussion. However, that was why I liked this method of replication. By getting one study with decent sample sizes, consistent methodology, preregistration (because it’s the same study), and the same lab team you can really narrow down the set of differences in the experiment. So, you can’t say “oh well, guess we need to find the differences” if the replication fails. You have to say “I guess we shouldn’t rely on that first study.”

  3. Sam says:

    You mention things like barometric pressure that might have an effect on priming, but isn’t it much more important what subject population you’re testing? Priming is exactly the sort of thing I would expect to work better on students who went to a lower-ranked university (like University of Nijmegen, which the original study tested) than students who went to a higher-ranked university (like University College London, one of the places the replication study tested). I would expect that top students are more likely able to focus on the task at hand and ignore previous distracting priming attempts (negative priming) or would already take tests at their peak ability and not need any additional inspiration (positive priming).

    There’s definitely a difference between “Students can be primed to do better on tests” and “Lower-performing students can be primed to do better on tests” but this distinction feels a lot closer to the physics analogy than throwing exceptions for barometric pressure does.

    • pneumatik says:

      It’s certainly possible that certain types of priming work better (or at all) on people with certain mental traits. There definitely seems to be evidence that people can be influenced mentally by their environment. But the specificity of effect should be itself testable. First create two groups of people, one with the trait suggesting sensitivity as the experimental group and one without it as the control group. Then test how much each group is influenced by the same priming experiment. But these sorts of tests can only come about if researchers try to replicate experiments.
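
      The comparison described above can be sketched in a few lines. This is only a rough illustration, assuming each group is split into primed and unprimed subjects and that only per-cell summary statistics are available. The names (`Cell`, `priming_effect`, `interaction`) and every number below are hypothetical, made up for the example and not taken from any study.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    """Summary statistics for one cell of the 2x2 design (hypothetical)."""
    mean: float  # mean test score
    sd: float    # standard deviation of scores
    n: int       # number of subjects


def priming_effect(primed: Cell, unprimed: Cell) -> tuple[float, float]:
    """Return (primed mean - unprimed mean, standard error of that difference)."""
    effect = primed.mean - unprimed.mean
    se = math.sqrt(primed.sd ** 2 / primed.n + unprimed.sd ** 2 / unprimed.n)
    return effect, se


def interaction(trait_primed: Cell, trait_unprimed: Cell,
                other_primed: Cell, other_unprimed: Cell) -> tuple[float, float]:
    """Difference between the two groups' priming effects, with a z statistic.

    A large |z| suggests the priming effect depends on the trait; a small one
    gives no evidence that the trait matters.
    """
    e_trait, se_trait = priming_effect(trait_primed, trait_unprimed)
    e_other, se_other = priming_effect(other_primed, other_unprimed)
    diff = e_trait - e_other
    se = math.sqrt(se_trait ** 2 + se_other ** 2)
    return diff, diff / se


# Made-up numbers: priming helps the "sensitive" group by 10 points
# and the other group by 1 point.
diff, z = interaction(
    Cell(60, 10, 100), Cell(50, 10, 100),   # group with the trait
    Cell(51, 10, 100), Cell(50, 10, 100),   # group without the trait
)
print(f"difference in priming effects = {diff:.1f}, z = {z:.2f}")
```

      The point of testing the difference between effects directly, rather than checking whether one group is significant and the other is not, is that the latter comparison can mislead when the two effects are actually similar in size.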

      • Sam says:

        Yes, this explanation would be easy to test, about as easy as testing barometric pressure or anything else. (You’d need to find a large heterogeneous sample, but that’s not unheard-of.) My point is that while we’re all armchair-theorizing, it’s a much more believable alternative to propose that the intelligence of the students matters. Unfortunately, I don’t see it being offered, and I’m curious if anyone has a good reason why.

        Hasn’t anyone else here read some psychology study and thought, “I would never fall for that” or “If I were doing that experiment, I would be able to tell exactly what they were studying and be able to tune it out”? Sure, we’re probably overconfident, but I’m guessing that’s what some of the UCL students in the replication studies thought.

  4. Sergey Kishchenko says:

    I believe there is another simple attack on Barrett’s statement. Let’s assume that failed replication means “it is not false, it just requires some unknown conditions to replicate”. It still means that the published paper was wrong in stating there was a connection between A and B. It still means that the experiment wasn’t well designed if it didn’t capture those conditions. So Barrett’s statement cannot prove that psychology is ok. In fact the best it can do is to prove that “psychology is fucked up but not as fucked up as you might think”.

    • Earthly Knight says:

      This was my thought as well. The conclusion of the original brilliant professor priming study was that the priming increased students’ test scores, not that there was a priming X wallpaper-color interaction. This conclusion was wrong.

    • RCF says:

      But there is a connection between A and B.

      Suppose I run a study on priming, and find that in the experimental group, 700 out of 1000 pass the test, and in the control group, 300 out of 1000 pass. That’s a highly significant result. Now suppose someone says: “Well, there were 300 people in the experimental group that the priming didn’t work on, so there must be some X-factor that differs between the people it worked on, and those it didn’t. So you didn’t find a connection between priming and passing the test, you found a connection between priming+X and passing the test.” But to say that there’s a connection between priming and passing the test is not to say that priming ensures passing; it means that priming interacts with all the other factors such that, overall, it results in more people passing. If priming makes people sitting in a room with blue wallpaper pass, and doesn’t cause people sitting in a room with red wallpaper to pass, then priming still has a connection with passing.
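
      For what it’s worth, the “highly significant” part of the hypothetical above is easy to check. Here is a minimal sketch using a standard two-proportion z-test with a pooled standard error; the function name `two_proportion_z` is just a label for this example, and the counts are the made-up 700/1000 and 300/1000 from the comment, not real data.

```python
import math


def two_proportion_z(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    """Two-proportion z-test with pooled standard error.

    Returns (z statistic, two-sided p-value under the normal approximation).
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Experimental group: 700 of 1000 pass. Control group: 300 of 1000 pass.
z, p = two_proportion_z(700, 1000, 300, 1000)
print(f"z = {z:.2f}, two-sided p = {p:.3g}")
```

      With these counts the z statistic comes out near 18, so the overall difference is not in doubt. The test says nothing about why the priming failed for 300 of the primed subjects, which is the part being argued over.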

      • Earthly Knight says:

        In the example you give, the presumptive explanation of the variation is differences in the background psychology of subjects in the experimental condition. That’s okay. But if the explanation turns out to be some incidental feature of the experimental set-up, like “a janitor was vacuuming loudly in the hallway for the 300 who failed the test”, you’ve conducted a poor experiment, because you’re not identifying the causal structure which produced or failed to produce the effect.

        • RCF says:

          So it’s legitimate to not account for causes that occurred before the experiment, but causes that occur during the experiment must be accounted for? That seems a bit too restrictive to me.

          • Earthly Knight says:

            You’re right that it can’t be quite so simple. The claim the experimenters are implicitly making is something like “under such-and-such conditions, if we intervene in such-and-such a fashion, ceteris paribus, undergraduates will behave thus-and-so x% of the time.” What we’re really quibbling about here is the scope of the ceteris paribus clause. If we draw it too broadly, the experiment will have no external validity– we do not want the results to fail to generalize to other wallpaper colors or ambient temperatures. If we draw it too narrowly, we will wind up including causes that are intuitively spurious– all effects in psychology depend on the subjects not being set afire during the experiment, but this is beside the point.

            That being said, it seems pretty clear to me that if your experimental result depends on the wallpaper color and this doesn’t come out during the study, you’ve bungled it pretty badly.

  5. Taradino C. says:

    Speaking of priming studies, I’ve seen this going around recently: “Picture yourself as a stereotypical male”

    Summary: When men and women were primed with a story about the day in the life of a stereotypical woman, then given a mental rotation test, the expected gap between men and women was observed. When they were primed to think about a stereotypical man, the gap vanished.

    This result is quite surprising, since the gap in scores on mental rotation tests seems pretty universal, and the evidence for a strong stereotype threat effect (of which this is given as an example) seems pretty shaky. Any idea if this has been investigated further?

    • Deiseach says:

      I am extremely surprised by that result, as I am (a) female (b) spectacularly bad at mental rotation and spatial tests in general.

      I could imagine I’m Joe (not Josephine) Soap till I’m blue in the face and I don’t see how it would make me do any better. The only thing I can think of to explain that (other than “It’s wrong”) is that they used the same test and by giving people a second chance at it, they changed their minds (because they knew the first answer they’d given before was wrong) and did better that way.

      EDIT: They used four different groups, instead of the same two groups twice? Okay, I have no idea how they got that result. Seeing as how this was conducted in Austria, should we be pondering why Austrians find the idea of Real Manly Men who are tough and do weight training after work so inspirational? (Arnold Schwarzenegger, can you shed any light on your compatriots’ views?) 🙂

      • PSJ says:

        I’m about 80% confident that a study with that problem wouldn’t have been done, let alone passed review. That’s like, first day psych 101 WhatNotToDo.

        Edit: I checked the abstract. They didn’t make that mistake. Priming is supposed to be hard to notice consciously. While it’s been taken overboard in terms of power and relevance, the principle itself is pretty hard to deny given the age old physiological and word-completion data.

        • Deiseach says:

          But does it work on anything more than “If you make people anxious before a test, they don’t do as well”?

          I think if I was instructed before a test that “This is big important test telling us all about your own personal intelligence, flaws and weaknesses”, I’d do a lot worse than being instructed “This is just a test, if you’re interested in your results, ask the supervisor later” (e.g. those stereotype threat tests the linked article was talking about).

          I really would like to see that men/women study done elsewhere to see if something is going on.

          • PSJ says:

            Semantic priming is the most obvious example. Say I have a list of words and non-words. Your task is to tell me if each item is a word or not as fast as possible. It turns out that if I show you MAT SNOW and RAIN SNOW, people on average will respond correctly to the second case faster than the first. In other words, seeing RAIN primes your brain to be able to process related terms faster. This much is noncontroversial and is a fairly textbook example of priming.

            The question “does it work on anything more?” is a very difficult question to answer. On one hand, I am fairly convinced that the mechanisms by which priming occurs are vitally important to the structure and algorithms of thought and emerge almost necessarily from the function of neural networks. On the other hand, it would be academically irresponsible to say so. So all I can say is that priming is a well documented effect in myriad domains, although a lot of work in priming of social situations has come under heavy suspicion and doesn’t seem to have the same theoretical foundation. (hence the skepticism towards the current study)

          • Ariel Ben-Yehuda says:

            The linked article shows the priming staying for 2.5s in a lab environment. It does not say much about the effect staying for multiple minutes.

          • PSJ says:


            Did I claim it did? I’ve studied short-term priming more, but if you’re interested, search for long-term priming. Here is an example. I remember some linguistics paper showing an effect lasting a few weeks off of a single presentation. I think it had something to do with choosing which shape best fits a made up word, but I can’t recall the details.

            Neural models of short-term and long-term priming are different though, IIRC

  6. Wrong Species says:

    So how well do the studies supporting The Nurture Assumption hold up when it comes to replication? Is this a problem for all studies in psychology or are there some areas that are able to withstand the challenge?

    • Tracy W says:

      There’s two parts to The Nurture Assumption. One part is criticising the idea that parents’ behaviour affects children’s adult outcomes (leaving aside extremely bad parenting); this part Judith Harris supports by attacking existing studies, including on the basis of their failure to replicate. The second part, the hypothesis that children are socialised by their peers, Judith Harris recognises as being more speculative because scarcely any research has been done on it, although the claim that immigrant children acquire their peers’ accent, not their parents’, is easily observable in everyday life (including my own child).

      • Wrong Species says:

        She didn’t just criticize existing studies in part one, she pointed to alternative studies that suggested that the effects of shared environments (once accounting for genetics and differentiating shared and non-shared environments) were essentially zero. I’m wondering how well those hold up.

        • Tracy W says:

          I don’t have my copy of the book with me now, but from memory Harris referenced multiple studies on this point.

          • Tibor says:

            Including some metastudies, most of them based on twin studies if I remember correctly. Of course not even metastudies are impenetrable holy truths, but it is definitely more solid. I don’t remember which parts exactly were supported by the metastudies though.

    • gwern says:

      They hold up fine. The replicability of behavioral genetics, by which I mean the twin studies and family designs, has never been an issue. If you run a twin study on criminality or IQ or whatever, then (subject to sampling error) you will get the usual results of the outcome being highly heritable, low shared-environment, and the rest non-shared-environment. There are hundreds or thousands of such studies – see for example the recent meta-analysis by Polderman et al 2015 covering them all.

      The problem has always been that the people who don’t want to accept the results criticize the interpretation of the results and argue they are being driven by systematic biases/flaws in the study design such as large violations in the real-world of the core assumptions. Replication of a biased algorithm may only replicate the bias, and so the 100th twin study showing low shared-environment adds little over & above the 50th twin study showing low shared-environment: the sampling error is already small, and repeating the same design doesn’t resolve the critics’ objections.

      To make progress, you need to do something like check the assumptions directly or remove the need for them using a different study design. This is why GCTA was so important: by avoiding family design approaches entirely (and thus all the various arguments against them in general) and estimating genetic-similarity directly from genomes of unrelated people and finding lower bounds consistent with all the family designs, it destroys the remaining criticisms.
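
      For readers unfamiliar with the “heritable / shared-environment / non-shared-environment” split mentioned above, here is a rough sketch of how it falls out of twin correlations using Falconer’s classical approximation. This is an assumption about which calculation to illustrate: the comment does not name a method, published twin studies typically fit fuller structural models, and GCTA works differently again. The function name and the correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer's approximation for the ACE variance decomposition.

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    Returns (heritability, shared environment, non-shared environment),
    which sum to 1 by construction.
    """
    h2 = 2 * (r_mz - r_dz)   # additive genetic share
    c2 = 2 * r_dz - r_mz     # shared (family) environment share
    e2 = 1 - r_mz            # non-shared environment plus measurement error
    return h2, c2, e2


# Made-up correlations, chosen only to show the typical pattern.
h2, c2, e2 = falconer_ace(r_mz=0.75, r_dz=0.40)
print(f"h2 = {h2:.2f}, c2 = {c2:.2f}, e2 = {e2:.2f}")
```

      The formulas rest on the assumptions the critics dispute, such as identical and fraternal twins sharing their environments to the same degree, which is why rerunning the same design cannot settle the argument.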

      • PGD says:

        But GCTA estimates of heritability are significantly lower than twin study estimates of heritability, right?

        Also, GCTA is still subject to environmental confounding — genetic similarities are associated with environmental similarities, since populations cluster together physically and socially due to migration and evolutionary history, etc.

        • Douglas Knight says:

          Nope, GCTA estimates of heritability are the same as other methods.

        • Stezinech says:

          I hadn’t heard of this point against GCTA before. Google gave me this result:


          I’m not sure about the credibility of the author or website; perhaps someone can enlighten us?

        • Douglas Knight says:

          If you write out in your own words what you think it says, I’ll comment on that. If you just think it says “I don’t like heritability,” I agree that it says that, but I don’t see why I should care, regardless of the credentials of the author.

          As to the precise issue of the heritability computed by GCTA, that link gives 16 citations, 2-17. If you are interested, you might consider looking at the 16 papers and seeing how the heritability found in those papers compares to older measures of heritability. The link only provides 2 examples, one that “callous-unemotional” had a twin estimate of 65% and a GCTA estimate of 7%, which it claims is representative. Yet it is pretty different from the other example, IQ, where it cites a twin estimate of 50% but gives a GCTA estimate of 35%.

          • Douglas Knight says:

            I should add a caveat: using SNPs is not using the whole genome. A GCTA using widely dispersed samples cannot detect the effect of mutational load. But if the samples are taken from a closely related population, it will capture almost all the information in the genome. That is, the common SNPs will be in linkage disequilibrium with mutational load and thus predict its effects, even though the SNPs play no more causal a role than with a more dispersed population.

  7. James D. Miller says:

    “People respond to incentives” is the best one sentence definition of economics, and I’d bet it explains psychology’s failure to replicate problem. If it’s much harder to get the results of a psychology experiment published if you limit yourself to running experiments that produce replicable results, and if there is almost no cost to authoring a study that doesn’t replicate, then you would expect the field to be dominated by science-as-attire non-reproducible studies.

    • nope says:

      The crisis isn’t about people testing hypotheses that turn out not to be replicable later. You’re right about the incentives part, but the incentive problem actually centers around the fact that there aren’t really any positive ones for doing replication studies, while there are a lot of negative ones, such as the fact that a lot of prestigious journals will turn you away at the door for not being “original” or “significant” enough. We need all sorts of ideas tested, even out-there ones. But we also need results verified, and that’s why the replication that isn’t being done right now is so important.

      • 27chaos says:

        Why does getting a bad study into some journal matter in the first place? Because many decisionmakers are stupid, and judge based on broad metrics or how excited they feel about an idea, rather than the actual quality and rigor of someone’s work. I think trying to reform journals is a doomed strategy. Instead, we should make it so that publication count itself matters less. This would require educated administrators who care about the field’s truth seeking enough that they are willing to insist on quality. If we don’t have such people in charge, any changes made to incentives will only be minor tweaks.

        • nope says:

          This will never happen. Incompetent administration is a self-perpetuating problem, because who makes decisions about administration structure, hiring, etc? Administrators. Administrators, by and large, are going to be on the less intelligent side of university staff. Publication count is really the only quick and straightforward metric for productivity, and you’re not going to make administrators stop using it. Furthermore, it doesn’t solve the problem that there are basically no incentives for scientists to focus on replication. If you can’t motivate people to do different things, they’re not going to do those things.

          • Anthony says:

            Furthermore, it doesn’t solve the problem that there are basically no incentives for scientists to focus on replication. – right, so what’s needed is to get journals to publish replication studies more. Or to demand that someone try a replication before publishing the initial paper. Or something, but that just pushes the problem somewhere else – how do you incent publication editors to reward replication work more?

          • Dennis Ochei says:

            You need to push the problem all the way back to “I need to do x” if you want to solve it

    • baconbacon says:

      “People make choices” or “people have preferences” would be a far better one-sentence summary of economics.

      • Marc Whipple says:

        I see your point, but I think those are a little too basic. They are pretty trivial observations, which don’t give any predictive power. People could be making random choices, or have irrational preferences.

        “People respond to incentives,” while it also seems trivial, tells us where to look for something. If economics works, and we see people doing something we don’t expect, we know to look for an incentive we haven’t noticed.

        • “People tend to take those actions that best achieve their objectives”

          aka “rationality.”

          • Jeffrey Soreff says:

            >“People tend to take those actions that best achieve their objectives”

            Depending on how much of a disclaimer “tend” is in that statement, it isn’t true. People have all sorts of biases (salience, status quo, many many more) which prevent them from picking actions which best achieve their objectives.

            >“People respond to incentives”
            is a weaker claim.

          • Marc Whipple says:

            @JS: I respectfully suggest that the original observation is true, and that the point you are trying to make would be better expressed as, “People aren’t very good at figuring out what their objectives are or how to achieve them.”

          • Anonymous says:

            @Jeffrey Soreff

            Of course people have biases. But I think extrapolating that to suggest that people are mostly irrational is an extraordinary claim. Think for a moment what the world would look like if people really did not manage to find ways to achieve their objectives most of the time. Someone wants to get somewhere a reasonable distance away: do they drive their car, or attempt to drive their lawnmower? Do they look up the route or do they drive to the nearest city and hope that happens to be where their destination is? Do they follow traffic laws and norms or do they just point their car in the right direction and hit the gas? When they want to turn left, do they turn the wheel left or right?

            Trying to correct your biases of course requires you to have biases in the first place, but does not mean that your actions are mostly irrational rather than mostly rational.

          • keranih says:

            @ Jeffrey Soreff –

            People have all sorts of biases (salience, status quo, many many more) which prevent them from picking actions which best achieve their objectives.

            Perhaps it would help to look at biases/preferences/etc as “competing objectives”, so that a person going after goal A while trying also to reach goal B doesn’t quite travel in a straight line.

          • Dennis Ochei says:

            Humans only have a bounded form of friction-off, perfect knowledge rationality.

            We can’t consider every possible choice, we don’t know perfectly the outcomes of our actions, we can’t perform our actions without mistakes, we have fuzzy or ill-defined objectives, and even when these things aren’t true we still sometimes make choices against our best interest. Furthermore, the execution of our rational faculties is costly in terms of time and utils. So I can’t actually just consider all possible actions and choose the one that maximizes my utility, that costs like a bazillion utils to do.

            All that said, the fiction of the rational human is very useful. But it’s still a fiction. I don’t think it’s as trivial as I think you are saying it is, unless by “people behave rationally,” you mean the lay meaning of rational which might as well be replaced with the word “normal”

    • Splodhopper says:

      Economics suffers from much the same problems as psychology. In fact, one might go so far as to say that it is in an even worse spot, since some people appear to consider it more credible than psychology.

  8. merzbot says:

    So, how could we solve the crisis? Would things like pre-registration be sufficient, or are there just so many variables involved in the study of human behavior that psychology is inherently screwed?

    • LTP says:

      I do think experimental psychology is possibly screwed. It seems like human psychology is just too complicated to study in an experimental way comparable to the hard sciences. This doesn’t mean psychology is worthless. For instance, I think there may be value to therapeutic techniques developed by practitioners over time through experience, or theories developed through extensive examination of case studies by an academic. But this isn’t scientific in the way that experimental psychology aspires to be; it is more of a humanistic-oriented approach to the discipline.

      Maybe this is irrational, but at this point I don’t trust *any* experimental psychological results, especially those that lead to conclusions that are counterintuitive, politically convenient, or sensationally covered in the popular science press.

      ETA: I will note, though, that I’m no expert on psychology.

      • Douglas Knight says:

        Barrett is basically saying that’s the situation, yet that psychology is not screwed. Out of the frying pan into the fire, because she hasn’t given any thought to what any of this means, just what she wants her conclusion to be.

        Whereas, those complaining about a replication crisis think it’s just fraud. People who know what they want their conclusion to be.

      • xq says:

        Medicine doesn’t replicate any better. This isn’t a psych-specific problem, it’s a problem with many fields (roughly, all of them that make heavy use of statistical hypothesis testing).

    • James D. Miller says:

      Genetic data, brain scans, and wearable tracking devices could turn psychology into a real data-driven discipline. But it probably won’t be professors who find the interesting results; rather it will be business people who figure out how to profit from making us happier, more productive, and more willing to buy targeted products.

      • PSJ says:

        Have you heard of Neuroscience? Psychology and related fields have been a large progenitor of new statistical techniques for a long time. I’m not sure why you think business people would be better at it. Start here if you want some examples for psych in general, here for social psych in particular.

        • Steve Johnson says:

          Because of natural selection.

          If you make wrong predictions in psychology you get published as long as you make the right wrong predictions.

          In business if you make the wrong predictions you have to keep persuading people to give you money so wrong predictions will keep getting made. On the other hand, if you make correct predictions and they’re revolutionary you make massive amounts of money and get to have sex with beautiful women and live in nice houses and swim in the ocean on weekends.

          • PSJ says:

            I…I’m not sure you have a great picture of what psychology actually looks like from the inside. Since Scott likes to criticize bad psychology, I know you’ve been exposed to that side of things, but we all make fun of it too.

            If you click through the links in the post you’re replying to, I’d challenge you to find anything that feels even slightly politically motivated. You probably just hear about the most outwardly political parts of it (stereotype threat/social priming). Other parts are inwardly political, but that more has to do with actual scientific disagreements (theory of mind/language acquisition). But it’s nothing like Economics or Sociology where politics plays a central role in the field.

            I’d also point you to Thinking, Fast and Slow. Which is a psychology book about how to make correct predictions in general. And summarized work that won the Nobel (memorial) Prize for Economics. Which seems like the kind of thing you’d respect.

      • vV_Vv says:

        Business data analytics may lead to better products, but it will probably not lead to new scientific discoveries, at least not without substantial involvement of academic researchers.

        For instance, consider web advertising click-through prediction: from your search history, the content of your gmail mailbox, your browsing history as tracked by cookies, etc., Google can estimate a better-than-chance probability that you will click on any specific ad.

        Presumably, they do this by pre-processing the raw data into some engineered features and then feeding them to a black box machine learning system like a neural network or a random forest. The recent trend in machine learning seems to be towards using less engineered features and more raw data.

        Strictly speaking, this is a form of psychological prediction. However, we can’t say that it advances psychology as a science: All the details about the system are closed-source trade secrets while science is normally done in the public domain. But even if the system were released as open source, it still wouldn’t change much, because these models are task-specific and opaque: we wouldn’t be able to peek inside a trained neural network or big random forest and gain useful and generalizable knowledge about how the human mind works.

    • nope says:

      Pre-registration doesn’t do anything to solve the replication crisis. Our federal-level science funding organizations need to start mandating that the people who receive public funding take part in the replication process. I can’t think of any possible way that this problem could be solved from the bottom up, but a top-down solution would actually be quite easy, and it baffles me that this isn’t being done.

    • Deiseach says:

      I think it’s valuable in a very broad brush way, but where the trouble comes is when those results are then taken and bruited abroad as “Aha! This proves all X are Y and we should do Z to inhibit/encourage them!”, which then gets translated into policies, as in education, then every so often the whole system gets turned upside-down again when a new study with a new result which results in a new fad comes along.

      It’s like all those management guru books. I’m sure everyone has experience of working somewhere where every so often, a manager gets a burst of enthusiasm and everyone has to stop doing things the old way and do them the New Fancy More Productive way, until just as the disruption has settled down, a different manager gets promoted and then decides “No, I prefer this colour of umbrella for my cheese” and it’s all change again 🙂

  9. Is a failure to replicate more common in some fields of psychology than others?

    • nope says:

      I’m pretty sure social psychology is at the bottom of the barrel in terms of being wrong about things.

      That’s what you get when the only people who go into your field are stupid and can’t math.

      • Anon says:

        I would expect the causality to be the other way around; the more rigorous fields that signal lots of intelligence, attract the more capable people, and then the less capable ones end up in social science (on average).

        • Psych says:

          The “soft sciences” like psychology are easier to do poorly than the hard sciences, but also harder to do well. So yeah, there may be fewer capable people in these disciplines, but there are also some incredibly brilliant people who are not flattered by the higher status of the hard sciences and not afraid of how hard the work is going to be.

          • Douglas Knight says:

            There are ways in which the soft sciences are inherently more difficult, such as it being easier to fool yourself, especially because you have lots of intuitions about psychology. But how much are these the problems and how much is the problem p-hacking, which is no easier in one field than another?

        • Sylocat says:

          Since when does a field “signal(ing) lots of intelligence” mean it attracts more capable people?

    • Jacob says:

      We really don’t have enough data to say for sure, but I can tell you in biology replication is the exception, not the norm. See for instance http://blogs.nature.com/news/2011/09/reliability_of_new_drug_target.html and http://www.nature.com/nature/journal/v483/n7391/full/483531a.html. Which is pretty appalling; if we get priming wrong I don’t know how much it matters, but cancer research?

      • Setsize says:

        Well, if you want to get depressed about the state of replicability in cancer research, watch this lecture.


        Summary: A study can have a good idea and a good method, and get screwed by making one sign error and accidentally copying the header row alongside your data. Then someone else can come along and heroically debug the study, and be pointedly dismissed. All up until one fudged line on someone’s CV is discovered, but that doesn’t lead to more truthful outcomes either.

    • Vilgot Huhn says:

      In the study in question they found that cognitive psychology replicated more than social psychology (50% vs 25%). It’s open access so you can read for yourself if you want to.

  10. John Sidles says:

    There is a saying among chemists, physicists and engineers that “When you’re confused, there’s likely more than one thing wrong [with your apparatus/protocol/software].” The following passage points to just one thing (among many) that can go wrong with STEAM communities in general, and with psychology and psychiatry in particular.

    For “mathematics” read “psychology and psychiatry” …

    The work of Nicholas Bourbaki
    by Jean Dieudonné [*] (1970)

    Here is my [Bourbaki’s] picture of mathematics now. It is a ball of wool, a tangled hank where all mathematics reacts upon one another in an almost unpredictable way. Unpredictable, because a year almost never passes without our finding new reactions of this kind. And then, in this ball of wool, there are a certain number of threads coming out in all directions and not connecting up with anything else. Well, the Bourbaki method is very simple—we cut the threads. […]

    There I wish to explain myself a little. I absolutely do not mean that in making this distinction Bourbaki makes the slightest evaluation on the ingeniousness and strength of theories catalogued in this way. […]

    If I had to make an evaluation I should probably say that the most ingenious mathematics is excluded from Bourbaki, the results most admired because they display the ingenuity and penetration of the discoverer.

    We are not talking about classification then, the good on my right, the bad on my left — we are not playing God. I just mean that if we want to be able to give an account of modern mathematics which satisfies this idea of establishing a center from which all the rest unfolds, it is necessary to eliminate many things. […]

    Bourbaki can only and only wants to set forth theories which are rationally organized, where the methods follow naturally from the premises, and where there is hardly any room for ingenious stratagems.

    Open questions  Are we approaching an epoch in which — to borrow phrases from Bourbaki — the “craftsmanlike ingenuity” that is associated to 20th century psychology and psychiatry can be distilled to a “coherent theory, logically arranged, easily set forth and easily used”? And in this epoch, will the too-numerous “loose threads” of present-day psychological and psychiatric research — be they ever so ingenious — none-the-less be cut-and-discarded, without being sorely missed?

    The real optimists  People who believe in the STEM-feasibility and enlightened desirability of a “thread-cutting” Bourbakian medical synthesis are (as it seems to me) the real optimists of 21st century medical practice.

    Conclusion  There are many urgent and difficult problems with modern psychological and psychiatric practice, yet for so long as the hopes that real optimists cherish for a Bourbakian medical synthesis remain unfulfilled, it is scarcely likely that much progress can be made overall.

    Remark  These optimistic medical hopes and sobering medical concerns are relevant to quantum information theorists too … the respective hopes and concerns of these two disciplines being (from a Bourbakian perspective) naturally entangled.

    [*] Historical note  In 1971, Dieudonné’s often-hilarious article received the Paul R. Halmos — Lester R. Ford Awards, given for “articles of expository excellence,” from the Mathematical Association of America.

    • Deiseach says:

      I think the problem is that, instead of getting into a thread-cutting STEM-style synthesis, as we discover more about human physiology and psychology, we are discovering more individual differences, not fewer.

      For example, that men and women experience the symptoms of heart attacks differently. Even my female GP, when investigating my chest pains, asked me did I have the “shooting or tingling pains down the left arm”. I didn’t (and luckily, whatever the trouble was, at least it turned out not to be my heart) but that does not mean I would not have been having cardiac trouble.

      So instead of being able to simplify “Diagnose a heart attack by shooting pains down left arm”, we’ve gone further in knowledge and found out that it’s more complicated and unique. More threads are sticking out of the hank of wool, and snipping them off may mean (for example) treating women as if they’re men and missing when they’re having a heart attack and so more deaths, not fewer.

      I think the 21st century and beyond optimists may be those who decide “Damn it, it’s going to get to the point where individual patients need an individually designed slate of medicines because one size does not fit all for the same condition”.

      • John Sidles says:

        Jack Vance’s oft-reprinted short story The Men Return (1957) is an account of the world you describe, namely, a world whose phenomena are unique and irreproducible.

        • Deiseach says:

          Well, it’s like Scott has described about anti-depressants; you try the most common/popular one first, because generally it works for most patients, you twiddle around adjusting the dose, and if it doesn’t work or has undesirable side-effects you switch to another one and keep going till you hit one that works for this particular patient 🙂

          Or painkillers: there’s aspirin, paracetamol, and ibuprofen which work for me in that order; aspirin will take down any pain but kills my stomach, paracetamol is next, and ibuprofen does nothing at all. Someone else might find it better than aspirin for them.

          On the broad level, “Pain-Go” will work for 80% of people with no effects; a further 10% won’t find it as effective as “Stop-Ache” and for 2% it makes their eyebrows turn orange. Future medicine may be more “Let’s check you’re not one of the 2% before we prescribe this” rather than “Oh yeah, ‘Pain-Go’ will fix you right up!”

          I think the hey-day of Big Psychological Explain-All Theories was, as in SF, the Golden Age of the 40s/50s. You certainly see it in things like Asimov’s psychohistory or van Vogt’s adoption of non-Aristotelian logic; this notion that with increasing scientific knowledge, we could work out the psychological drives and the neurological areas of behaviour and then it would only be a matter of inputting the correct stimulus to get the desired reaction when dealing with people en masse as a society. That we could plan and construct a world of progress and order and we’d understand all our impulses and evolutionary holdovers and could prune and govern them as desired.

          I think that attitude now survives mainly in sociology, which has never quite gotten over the 70s 🙂

  11. PSJ says:

    The original paper had a very good addition to this discussion, so I’ll just copy it here

    After this intensive effort to reproduce a sample of published psychological findings, how many of the effects have we established are true? Zero. And how many of the effects have we established are false? Zero. Is this a limitation of the project design? No. It is the reality of doing science, even if it is not appreciated in daily practice. Humans desire certainty, and science infrequently provides it. As much as we might wish it to be otherwise, a single study almost never provides definitive resolution for or against an effect and its explanation. The original studies examined here offered tentative evidence; the replications we conducted offered additional, confirmatory evidence. In some cases, the replications increase confidence in the reliability of the original results; in other cases, the replications suggest that more investigation is needed to establish the validity of the original findings. Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims.

    The present results suggest that there is room to improve reproducibility in psychology. Any temptation to interpret these results as a defeat for psychology, or science more generally, must contend with the fact that this project demonstrates science behaving as it should. Hypotheses abound that the present culture in science may be negatively affecting the reproducibility of findings. An ideological response would discount the arguments, discredit the sources, and proceed merrily along. The scientific process is not ideological. Science does not always provide comfort for what we wish to be; it confronts us with what is. Moreover, as illustrated by the Transparency and Openness Promotion (TOP) Guidelines (http://cos.io/top) (37), the research community is taking action already to improve the quality and credibility of the scientific literature.

    We conducted this project because we care deeply about the health of our discipline and believe in its promise for accumulating knowledge about human behavior that can advance the quality of the human condition. Reproducibility is central to that aim. Accumulating evidence is the scientific community’s method of self-correction and is the best available option for achieving that ultimate goal: truth.

    • PSJ says:

      I also want to mention that most of your response to John Ioannidis is valid for psychology as well. Most academic psychologists are very well aware that a large portion of published studies are flawed and spend a good amount of time trying to falsify them. The psychologists that are read by the population at large, however, tend to be the starry-eyed, politically driven idealists, so people get the impression that the whole field should be discarded and replaced with whatever they think is true about human nature. (and this coming from someone who hates most social psych research with a burning passion)

      • Douglas Knight says:

        There’s a big gap between “are aware of problems” and “try to falsify them”. Of course, we’re discussing this because of a big project to thoroughly replicate a bunch of studies, but that is exceptional. The normal behavior is to avoid the bad neighborhoods and try to make a positive contribution in unrelated areas.

        Some of the territory that has been abandoned is inherently worthless, but some would admit useful experiments if they weren’t overshadowed by exaggerations of plausible programs. In medicine there is more replication and correction of errors because there is more agreement on what types of questions are useful in the first place.

        • PSJ says:

          I’m not sure I understand you well enough to give a proper response. What do you mean by “The normal behavior is to avoid the bad neighborhoods and try to make a positive contribution in unrelated areas?”

          Going by my best guess, if that were the case, then we wouldn’t have a problem as bad results wouldn’t be accepted by the mainstream community anyway as they are known as “bad neighborhoods” and thus wouldn’t sully the larger theoretical frameworks. And, if the people who are willing to stay come up with convincing enough results, they can always be accepted later.

          In my experience, this isn’t true of psychology (but I tend to stay on the cog/neuro side). Most “bad” theory is disputed rather than ignored. See theory of mind, songbird song acquisition, modular vs non-modular systems, statistical vs logical/theory-based learning of language, and mirror neurons as good examples of the phenomenon. These are all highly controversial fields, but the controversies tend to be fought out through new experimental designs rather than direct replication. Replication would be a good tool, but it is not the only one that can correct errors.

          And even then, large replication projects are not a new thing in psychology. See: Many Labs project papers like this.

  12. You mentioned a number of incidental factors that may affect the results of a study, such as barometric pressure and the colour of the wallpaper of the room in which the study was conducted.

    Would a ‘protocol’ that rigorously specifies each and every incidental down to the last relevant detail be sufficient to remove this problem (except perhaps in exceptional cases) – akin to a ‘clean room’ for psychology experiments? If such a protocol did come about, should greater importance be placed on studies which conform to it, and replications/non-replications of such studies be taken more seriously?

    (I vaguely imagine something which specifies the colour of the walls, desks, their exact specifications and measurements, their layout(s) within the rooms, the size(s) of the rooms, the exact appearance of the computer(s) to be used, a specification of the interface as well, down to the widgets and colours, how human contact can itself be eliminated or standardised (if it’s a cause of possible experimental bias), and so on.)

    • Tracy W says:

      How could any protocol do this? You’d not only need to specify what happened within the room, but also what happened outside it, and everything that happened to all of your experimental subjects (eg rain, traffic problems, exam deadlines) leading up to the study, including throughout their lives (where were you when you heard of September 11?)

    • I’m not 100% sold on it, but I got the impression that Scott’s argument was basically saying if you had to choose between blaming it on the wallpaper/barometric pressure etc, or just assuming the study itself was flawed, it’s probably more reasonable to do the latter. Otherwise you’re going to risk chasing limitless non-obvious but possible confounding factors.

    • Deiseach says:

      It could come down to something as subtle as “The desks provided were not the kind I’m used to sitting at so I found it tougher to settle down to write comfortably and this distracted me during the test”.

      I definitely think variables such as style of desks, setting, etc. should be taken into account: a dimly-lit, cold test hall might make a difference to somewhere clean and bright. If these were specified as part of the environment and all was as standardised as possible (where “standardised” means “don’t ask for non-Americans to use American desks/numbering etc. and vice versa”), it might at least cut out some of the “Was it the fact that this study was done at 9 a.m. on Monday when the students had been out on the beer the night before that meant they didn’t test as well? We just don’t know” confusion 🙂

    • Scott Alexander says:

      For each unit of effort you invest in this, you should expect discrepancies to decrease, but you’ll never get it so perfect that nobody will be able to concern-troll you. Are the experimenters in the two settings the same height? Was the research done on a Monday or a Tuesday? Full moon or new moon? Etc.

      • Deiseach says:

        Monday or Tuesday could actually make a difference, as well as morning or afternoon (the post mid-day meal slump).

        As for full moon versus new moon – well, if you get me on the full moon, I am more likely to be aggressive (for reasons of female physiology say no more say no more) 🙂

        That’s the problem; people are not bottles of chemicals or capacitors. But if you’re going to make Big Sweeping Statements based on “Our study shows that…”, then the experiments need to be reproducible in some way, shape or form.

        • Jiro says:

          As for full moon versus new moon – well, if you get me on the full moon, I am more likely to be aggressive (for reasons of female physiology say no more say no more)

          For the phases of the moon to repeat requires 29 1/2 days. 29 1/2 days is within the possible lengths of a woman’s period, but it’s not the average, and even being a couple of hours different would make it not match after a few years. Even if the study were to catch you, by chance, when your period matched the phases of the moon, on the average the periods of the women in the study would be distributed randomly with respect to moon phases and the overall study should find no effect.
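          The drift arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function name `drift_days` and the two-hour offset are made up for the example, and elapsed cycles are counted in synodic months, which is a small approximation.

```python
SYNODIC_MONTH_DAYS = 29.530588  # mean length of one full cycle of moon phases


def drift_days(cycle_offset_hours, years):
    """Accumulated offset, in days, between a fixed-length cycle and the
    moon's phases when each cycle differs from the synodic month by
    `cycle_offset_hours` (hypothetical parameter for this sketch)."""
    # Approximate the number of cycles elapsed by the number of synodic months.
    cycles_elapsed = years * 365.25 / SYNODIC_MONTH_DAYS
    return cycles_elapsed * cycle_offset_hours / 24.0


if __name__ == "__main__":
    for years in (1, 3, 10):
        print(years, "years:", round(drift_days(2.0, years), 1), "days of drift")
```

          With a two-hour difference per cycle the two schedules are already about three days apart after three years, and the gap keeps growing from there.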

          • Good Burning Plastic says:

            Another possible mechanism is full moon → brighter nights → harder to sleep well.

          • Deiseach says:

            Jiro, that individual quirk is exactly the point. If every female went with the moon, it would be an environmental given that full moon = increased levels of aggression in test subjects, and you could take it into account when doing a study on “Chocolate: soothing the savage breast or not?”

            But when everybody has unique characteristics, then doing a study on Monday may indeed get you different results than if you ran the same study on the same group on Tuesday.

            People are not extruded plastic products, is the point. We can probably trust broad general statements from psychology because it’s dealing with the mass, but when it comes down to “Women score higher on maths tests if you remind them they’re Asian but they do worse if you remind them they’re women” then we need a bit of confirmation by running a few replication tests.

          • Jiro says:

            Deiseach: Taking into account the fact that someone has a period on the day of the full moon by coincidence is exactly like taking into account that someone had a big argument on the day of the full moon by coincidence–you shouldn’t be taking it into account at all, since over a large group of subjects both the arguments and the periods will be randomly distributed.

            You wouldn’t say “doing a study on Monday would get you different results than on Tuesday if someone happened to have a big argument on Monday”. Same with periods.

          • Deiseach says:

            Jiro, to establish a baseline for comparison (e.g. in the “priming students to walk slower”), I think you’d want to make sure beforehand that your control group weren’t wearing tight shoes, or had sprained ankles, or were all 85 year old arthritis sufferers when comparing them with your test subjects.

            So things like arguments, or hormone levels, or the like would affect baseline moods in the control group/test subjects, and not taking them into consideration might mean results in either the original or the replication “Our test group exhibited elevated aggression after being punched in the face, but our control group also exhibited elevated aggression even when not punched in the face, so the results are inconclusive” study.

          • Jiro says:

            If the things in question are randomly distributed, no, you don’t take them into consideration when establishing a baseline for comparison. You use your general knowledge that levels of aggression fluctuate randomly and you require that your experiment produce differences greater than is likely from random fluctuation. We have statistics for that.

            The fact that a particular individual in your experimental group happened to have had an argument (or a period) on Monday is just a piece of the random fluctuation and does not need to be considered separately from it.
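            A toy simulation makes the same point. It is a sketch under made-up assumptions, not a model of any real study: every number here (a 10% chance of a "bad day", a 2-point bump in aggression, 100 subjects per group, a 1.96 cutoff on a Welch-style z statistic) is invented for illustration. Bad days hit both groups at random, so they only add noise.

```python
import random
import statistics


def simulate_subject(rng, treated, effect):
    """One subject's aggression score (all parameters are illustrative)."""
    score = rng.gauss(0.0, 1.0)      # ordinary person-to-person variation
    if rng.random() < 0.1:           # random "bad day" (argument, hormones, ...)
        score += 2.0
    if treated:
        score += effect              # true effect of the experimental treatment
    return score


def experiment_is_significant(rng, n_per_group, effect):
    """Run one two-group experiment and apply a rough two-sided 5% test."""
    control = [simulate_subject(rng, False, effect) for _ in range(n_per_group)]
    treated = [simulate_subject(rng, True, effect) for _ in range(n_per_group)]
    diff = statistics.mean(treated) - statistics.mean(control)
    se = (statistics.variance(treated) / n_per_group
          + statistics.variance(control) / n_per_group) ** 0.5
    return abs(diff / se) > 1.96


def significant_rate(effect, n_experiments=2000, n_per_group=100, seed=0):
    """Fraction of simulated experiments that come out 'significant'."""
    rng = random.Random(seed)
    hits = sum(experiment_is_significant(rng, n_per_group, effect)
               for _ in range(n_experiments))
    return hits / n_experiments


if __name__ == "__main__":
    print("no real effect:", significant_rate(0.0))
    print("real effect   :", significant_rate(1.0))
```

            With no true effect, the bad days do not push the false-positive rate much away from the nominal 5%; with a real effect of this size, nearly every simulated experiment detects it despite them.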

  13. Seth says:

    Many of those psychology experiments strike me less like “dropping an apple” and more like “blowing a soap bubble”. The soap bubble doesn’t drop straight down. Depending on the air currents, it may go up, or down, or drift sideways for a while. Any particular bubble may not behave exactly the same as the previous ones. How big are they? Exactly what kind of soap are they made of? (differences in chemical composition can affect how heavy they are, and how long they hold together). Someone walking by might stir the air enough to affect things. Or maybe the room heating system cycles on. Or off.

    Maybe you shouldn’t try to understand gravity using soap bubbles.

  14. suntzuanime says:

    My barometer for how screwed experimental psychology is is how employed Dr. Jason Mitchell still is. It seems like he’s taken down his article about how replication is mean and evil and unfair to respected scientists like Dr. Jason Mitchell, but he still seems to be in charge of an experimental psychology lab at Harvard, so further work is needed.

    • Addict says:

      I don’t suppose you have an archive of that page? It sounds like an excellent opportunity for me to get in my daily dose of outrage.

        • Marc Whipple says:

          Oh, wow.

          The self-righteous, self-satisfied narcissistic illogic, it burns. It burns us, Precious!

          Although the fact that a highly-placed psychology professor at one of the most elite universities in the world can publish that with a straight face, and not be immediately laughed out of town, does illustrate part of the problem better than a thousand failed replication experiments.

          • 27chaos says:

            What should one do if they are face to face with people like this while they spew bad arguments? I’ve tried speaking up before, but it didn’t go well at all, since they were Authority. Yet I hate the idea of sitting silently while someone does low quality or unethical work.

          • Earthly Knight says:

            If it makes you feel better, that piece provoked dozens and dozens of critical responses. He basically was laughed out of town.

          • Steve Johnson says:

            Earthly Knight-

            He’s still a Harvard professor.

            He wasn’t laughed out of anywhere.

            The actual message here is that you can always try to push fraud as long as it’s progressive fraud and the worst thing that can happen is that people will disagree.

            The upside is that maybe you become the next superstar for inventing the next stage in progressivism. It’s a free bet, might as well take it.

            Of course, this also explains how the field is so contaminated by fraud.

          • PSJ says:

            I’m not sure I see how being an arrogant prick about your achievements is a progressive move? Simply because he stayed in his position (likely tenured), does not mean he has retained the respect of the community.

          • Earthly Knight says:

            He has tenure. For someone with tenure, the worst thing you can do to them is have no one else take their research seriously anymore. That’s all the comeuppance he’s going to get, which is okay, all he did was voice some foolish opinions on this one topic.

            I don’t know why you’re making this political, I can pretty much guarantee that both Mitchell and everyone dogpiling on him are leftists. If leftist academia is full of agitprop junk science, the best thing that can happen is leftist academics noticing this and trying to correct it, no?

          • nyccine says:

            @PSJ: You’re missing some context. Social Priming here is the supporting theory of how Stereotype Threat works, you know, how the underperformance of women/non-Asian minorities in comparison to straight white men in certain fields is caused by internalization of negative stereotypes. There’s a depressingly large section of social science pretty much devoted to explaining away achievement gaps (and some hard science as well; epigenetics is being pushed in some corners as a biological cause).

            So, when Steve Johnson says “The actual message here is that you can always try to push fraud as long as it’s progressive fraud and the worst thing that can happen is that people will disagree.” he is emphatically *not* saying that being a dick is what makes one progressive, he’s saying that as long as you’re a dick in support of progressive causes, you’re much less likely to suffer for it. I can’t imagine, for example, people like Cochran and Harpending writing something like “On the Emptiness Of Failed Replications” in defense of some errors in their work, and there not being demands in academia that they be fired, as fast and as soon as possible. I think you also overstate just how much criticism Mitchell has gotten, compared to what he would have gotten if he were writing it in support of theories supporting right-wing causes.

        • Deiseach says:

          On one side are those who believe more strongly in their own infallibility than in the existence of reported effects; on the other are those who continue to believe that positive findings have infinitely greater evidentiary value than negative ones and that one cannot prove the null.

          I needed to look at that sentence a couple of times before I could understand it, but it seems to be saying that “If I perform an experiment and get a stated result, and you replicate it and don’t get the same result, then you’re wrong because you are denying real results happened” which is a bit jaw-dropping. The whole point of replication experiments is to see if the claimed results are really happening; otherwise, we’re on the level of psychic experimentation where the medium cannot produce the same effects because of the negative thought waves of sceptics in the audience (maybe Mesdames Putt and Whitton should contact Dr Mitchell about how to be a gracious loser when your results can’t be reproduced independently?)

          His white swans/black swans example is topsy-turvy; his (or anybody else’s) experiment is the white swan. He says “Effect A happens”. Ten other people copy his experiment and say “No, it doesn’t.” The negative results are the black swans in this instance.

          His cook-book recipe example isn’t great either; sure, you won’t turn out a dish exactly the same as the illustration, but if you follow the recipe, you should end up with “Here, try my Jamie Oliver’s caramelised onions”, not “Hang on, Jason, these are glazed carrots”. “No, they’re onions! I followed the recipe exactly!” “Um – they’re orange and cylindrical and they taste like carrots”. “Well, you’re just mean-spirited and have a chilling effect on cookery!”

        • James D. Miller says:

          “Whether they mean to or not, authors and editors of failed replications are publicly impugning the scientific integrity of their colleagues. Targets of failed replications are justifiably upset, particularly given the inadequate basis for replicators’ extraordinary claims.”

          Very consistent with academic thinking on the importance of self esteem.

        • Who wouldn't want to be Anonymous says:

          Wow. And here I thought Dr. Oz had filled the quack quota for all of the Ivies.

    • Zykrom says:

      That “this guy’s opinion was so bad that his continued employment reflects badly on everyone in his entire field” feeling is (probably) what hardcore SJWism feels like from the inside.

      • suntzuanime says:

        This guy’s opinion directly reflects on his ability to do his job. If he worked at Foot Locker I wouldn’t care.

        • Nita says:

          Hey, at least he’s merely incompetent and wasting money, rather than putting people at risk by insisting that learning about their existence is dangerous to children.

          • suntzuanime says:

            I’m not at all sure that undermining the reliability of scientific results does not put people at risk, especially if someone foolishly teaches them that it’s only the bad people who don’t trust in Science and we mustn’t be like them so let’s Fucking Love Science with all our hearts.

            To be fair, there’s not much you can do that doesn’t put people at risk. It’s a shitty heuristic overall.

          • Deiseach says:

            “That mean ol’ experimenter says they couldn’t get the same results when they reproduced your study? Well, never you mind them, lil’ professor! You know you got a result, and that’s all that matters! Believe in yourself and keep your heart light shining! That’s what real science is all about!”

        • Zykrom says:

          that part too

          • suntzuanime says:

            That’s where you’re wrong; the Foot Locker example was chosen specifically because Ben Kuchera tried to get somebody fired from Foot Locker for supporting GamerGate. Or to take a more recent case, I don’t think anybody is under any illusions that shooting a lion makes you bad at dentistry. There’s something different at work.

          • Zykrom says:

            Fair enough.

          • suntzuanime says:

            Correction: it was a Dick’s Sporting Goods, not a Foot Locker. I apologize for the error.

  15. I thought this was an interesting post and makes a number of good points.
    One nitpick: in the Dijksterhuis and van Knippenberg paper, the dependent variable in the four experiments was number of correct answers on tests of general knowledge, rather than exam grades, i.e. the results of formal university exams. Although, if it really is a true effect, it might apply in real exams. I am not saying that I think that it is a true effect though, I am remaining neutral for now.

    Sanjay Srivastava has written a thoughtful article in which he also disagrees with Lisa Feldman Barrett’s interpretation of failures to replicate. In brief, he argues that an explicit design goal of replication studies is to eliminate extraneous factors that could produce different results from the original experiment. Therefore, it is highly unlikely that most of the failures to replicate occurred due to such factors. Furthermore, the studies were pre-registered and the original experimenters were consulted in advance and thus had the opportunity to make predictions about anything that might cause the replication results to diverge from their own results.

    In another article that I can’t remember the link to, Rolf Zwaan argued that trying to explain away replication failures with post hoc explanations about minor differences between experimental conditions (e.g. Scott’s example of wallpaper color) is like arguing that the effects in question have very low generalizability in that they can be expected to occur only under very highly circumscribed circumstances and therefore are probably not all that interesting.

    Personally, I am impressed by the work of the Many Labs project which involved multiple simultaneous replication attempts in different countries with great care to make the experiments as uniform as possible. This produces many data-points, so it is not just a matter of “do we believe study A or study B.” If an effect fails to replicate in multiple instances like this, I think it is much harder to argue credibly that it is because of wallpaper effects. On the other hand, effects that are replicated in multiple instances have more robust support.

  16. Daniel Armak says:

    Re macro vs. micro: perhaps the psychological studies are at the micro level when they work with individual people. How fare our studies of large groups of people, i.e. psychohistory?

  17. Steve Johnson says:

    There’s an elephant in the living room of course.

    As Greg Cochran succinctly put it:

    Back in 1940, the Soviet powers that be wanted more wheat (and more dead kulaks, of course) . Today, our most desired product is excuses.

    He said this when discussing epigenetics. It applies equally well to priming and stereotype threat.

    • This is bizarre to me. I think you are massively overestimating the politicisation and leftism of psychology. Maybe it’s because most members of the far-left spout psychology, but that doesn’t mean most psychologists are members of the far-left.

      • Zebram says:

        I don’t think the overall set of social psychologists is politicized to the degree that conservatives generally believe, but there is some. I think the main issue is that studies which support progressive causes are touted in the media, whether mainstream or science media, while those supporting other causes are not.

        For example, I’ve heard repeatedly that studies show liberals are more intelligent than conservatives from many different sources. I haven’t looked at all the studies, but generally speaking, there does seem to be a small difference. However, other studies seem to relatively consistently show that libertarians are more intelligent than liberals. We don’t hear about those.

        • PSJ says:

          If you are talking about this, these are all sociologists. Psychometrics was developed in psychology, but using it to talk about culture and politics as a whole is more of a sociology thing.

          Psychologists tend to associate it with other biological and psychological features, how smaller interventions affect measurement, or psychopathologies.

          • Zebram says:

            Ah, I see. I suppose this confirms what others have been saying above. To those of us in the more ‘hard sciences’, such as physics and chemistry, we see fields like sociology and psychology as lumped together into one incoherent mess. I didn’t even think to differentiate between sociology and social psychology.

          • Sociology is highly politicised, especially since the 70s. It’s a shame, there is some interesting content hidden behind the (mostly left wing) politics.

          • PSJ says:

            What would your reaction be if somebody said that about chemistry and biology 🙂

          • Zebram says:

            @ PSJ:


          • Not Usually Anonymous says:

            Sociology and social psych: my naive outside view gives me a lot more respect for the latter than the former. Some people have mentioned the “cargo cult” concept, and the current replication crisis does make me a bit twitchy about this, but in terms of the metaphor, large parts of sociology don’t even look like an airport.

            To put it another way, some people have a two-part distinction between the sciences and humanities, and some have a three-part distinction between natural sciences, social sciences and humanities (we’ll forget about maths and allied fields for now). IMO the border between social sciences and humanities runs roughly through the middle of sociology, with most of the “exciting” lefty stuff being on the humanities side, and there being a much more dull-but-worthy numerical side that you rarely get to hear about. I’d go as far as to say that humanities-sociology has more in common with fields such as continental philosophy than it does with the other sort of sociology – of course I’m getting deep into the realm of vague impressions and well out of my expertise here.

            It’s all a lot further from chemistry than, for example, linguistics is.

        • Yes, but what has that got to do with the far-left and the USSR? Conflating liberals and communists is silly. It’s like lefties that go around calling anything they don’t agree with “fascist”.

        • Wrong Species says:

          I love the hypocrisy in that. Let’s make sure we hype the studies showing conservatives are dumb but let’s ignore the studies showing differences in IQ between races.

          • Fairhaven says:

            they are contradictory studies, since most liberal voters are black and hispanic

          • Deiseach says:

            government bureaucrats (average to low)

            As a low-level public service minion, I should probably resent that – just as soon as my feeble intellect can work out whether I’ve been insulted or not 🙂

            The majority of liberals are black and Hispanic and white poor (not proven to have superior intelligence).

            Be careful if you’re conflating “vote Democrat” and “liberal”. African-American churches, for instance, are very conservative on “gay rights” and have strongly resisted attempts to identify activism with the Civil Rights movement of the 60s or “Gay rights are the Civil Rights of our day”.

            Their congregants may vote Democrat, but a lot of that would be in the same vein as old-style Irish Catholic voting Democrat; when the Democrats moved strongly socially liberal, a lot of those were the ones who then went Republican reluctantly.

        • Donttellmewhattothink says:

          A bit of common sense: liberals are over represented in academia (smart), Hollywood (not smart), teachers (average), lawyers (average to smart), government bureaucrats (average to low). The majority of liberals are black and Hispanic and white poor (not proven to have superior intelligence). Conservatives are over represented in the middle and working class, farmers, businessmen, doctors. Millionaires are evenly divided. It doesn’t seem plausible that these studies are accurately measuring intelligence, only that they are designed by academics.

        • Fairhaven says:

          Zebram says:
          “For example, I’ve heard repeatedly that studies show liberals are more intelligent than conservatives from many different sources. I haven’t looked at all the studies, but generally speaking, there does seem to be a small difference. However, other studies seem to relatively consistently show that libertarians are more intelligent than liberals. We don’t hear about those”

          Does anyone have links to those studies or study? I have a vague memory of looking at one that was cited and dropping it when I saw the study subjects were all college students, as if that would be a representative sample of conservatives.

          My niece is going to the University of Wisconsin. Like so many kids of her generation, raised in creches and afterschool programs and on social media, she looks left and right at what everyone her age is thinking before she dares form a thought. To be a conservative in that milieu is tying a sign around your neck saying: “I am a social pariah. Give me failing grades. I’m okay with no sex and no friends for the next four years.”

          More seriously, no conservative would think up a study to prove conservatives are smarter. That’s a liberal trope.

          Since the parties are so polarized these days (see last post), is it legit to use Democrat and Republican for a rough rule of thumb for liberal and conservative?

          Political science actually has some hard data that would give you more meaningful results than an academic study on liberalism and IQ. For example, lawyers are the most consistently Democrat voting bloc. There are four times more Democrat lawyers in Congress than doctors, and five times more Republican doctors than Democrat doctors.

          No one would set out to prove lawyers are smarter than doctors. The reason they sort that way is that lawyers’ economic self-interest leads them to want big government and no medical tort reform. Many doctors work for themselves, and across the board, most people who work for themselves are Republican. In 2012 doctors voted Republican by a 19 point margin, which I would guess is more than usual, and is presumably because they don’t believe Obamacare is good for them or their patients.

          It turns out, self-interest and life experience account very well for predicting how you vote.

          Within an economic class, political sorting correlates with other factors.
          In 2008 Obama carried the majority of the richer rich, those making $200,000 or more per year. These Democrat super-rich live on the two coasts, and have a different value system. They are not religious. They are not family oriented. In fact, they are rather bigoted and feel superior to people who are religious or family oriented, and don’t want to be in a group with them.

          Maggie Gallagher in Human Events:
          “A 2009 Quinnipiac poll notes that socially liberal values rise with income – “support for same-sex marriage also rises with income, as those making less than $50,000 per year oppose it 54 to 39 percent, while voters making more than $100,000 per year support it 58 to 36 percent.” The very rich are disproportionately strong social liberals….”

          The very richest are Democrats:
          “…there seems to be a tipping point where the ultra-wealthy begin leaning Democratic. The most famous example would be the entertainment industry, where star-studded events have become a significant part of Democratic culture. …. A review of the 20 richest Americans… found that 60 percent affiliate with the Democratic Party…Among the richest families, the Democratic advantage rises even higher, to 75 percent.”

          Peter Schweizer at National Review explained in 2006 that Democrat millionaires and billionaires earn their money differently than rich Republicans.
          … the answer may lie in the way much of this wealth was accumulated. Some of these individuals (Kerry, Dayton, Rockefeller, etc.) inherited their wealth … they haven’t spent time building a business or even holding down a demanding job in corporate America. Others, particularly in the high-tech sector and Hollywood, amassed their wealth quickly and faced fewer challenges in dealing with invasive government and regulations.

          In Hollywood and high tech, there is a sudden jump into wealth. It can seem unearned and unfair. Taxes touch them less than the Republican two-income “rich” family making one hundred thousand.

          Über-rich Democrats often see good fortune as a lottery:

          The Silicon Valley 30-year-old worth $200 million on a stock IPO after six years in the business is likely to have a different view of wealth accumulation than the industrialist who amassed a similar fortune over the course of a lifetime.

          It might be true that liberals include America’s highest IQ cohort – that is, people with advanced degrees, but they are not the majority of liberals. Voting in real life sorts by class, race and economic self-interest, not IQ, although education and region have a role to play. Democrats are the super-rich, half of the rich, the upper middle class and the poor. Most of the middle class Democrat voters are government employees (including teachers). Republicans are half the rich, the middle class and working class. Most liberals are non-white, most conservatives are white.

          If ‘the liberals have a higher IQ’ study were true, it would also be true that poor people of color are overall higher IQ than working and middle class white people who don’t live on the west coast or the northeast.

          Don’t “studies” have to pass any kind of reality testing (apples do fall down when dropped from a tower) before getting taken seriously in academia?

          Nor do they necessarily have good brains, or even a good education, when it comes to public policy, morals, people or political issues: the things that inform party affiliation.

      • Cauê says:

        This is bizarre to me. I think you are massively overestimating the politicisation and leftism of psychology. Maybe it’s because most members of the far-left spout psychology, but that doesn’t mean most psychologists are members of the far-left.


      • Sastan says:

        No, most are not members of the hard left. All are members of the hard left. To be fair, they’re very nice about it, and most are perfectly decent people. And they do show the almost-universal tendency to get very libertarian about the stuff they really care about. But seriously, the farthest right politically you can be known as being and hold a job in academic psychology is somewhere between Trotsky and Castro. It’s so bad that professors will straight up tell you that they would never hire or vote to hire someone who didn’t share their hard-left views. In an age where discrimination has such a strong adverse reaction, the fact that strong majorities will say things like this is extremely indicative.

        If you own your own practice, or are in industry, I have no idea, but I assume politics don’t matter nearly as much.

        • Peter says:

          “All are members of the hard left.”: Citation needed.

          For example, explain to me how Jonathan Haidt could be described as “hard left”. This article shows Haidt showing a bias in psychology, but he doesn’t show anything as ridiculously one-sided as you state.

          • Sastan says:

            Thank you for bringing up Haidt, he’s a perfect exemplar! If you read his books, you see he is quite open about being very liberal. In fact, his whole moral psychology of political groupings began as opposition research specifically to help Democrats win office. He was so disheartened after Kerry’s loss in 2004 he dedicated his whole research to cracking the moral code of conservatives, the better to lobby them with keywords that played to their prejudices!

            However, once he dug into the material, he found it more nuanced than that. And once he developed a baseline respect for the ideals of non-liberals, he was finished. He’s never worked in academic psychology since. He was hired at the NYU School of Business, as there isn’t a psychology program in the nation that will touch him now that he showed them all how biased they all are. And he’s still quite a liberal! He just doesn’t hate conservatives and libertarians enough to be kosher.

            Edit: You should read your own links, they make my point very well! http://www.yourmorals.org/blog/2011/02/discrimination-hurts-real-people/

          • Earthly Knight says:

            And once he developed a baseline respect for the ideals of non-liberals, he was finished. He’s never worked in academic psychology since. He was hired at the NYU School of Business, as there isn’t a psychology program in the nation that will touch him now that he showed them all how biased they all are.

            This is bollocks. Haidt is a celebrity, he could get hired just about anywhere. Presumably he went to a business school because they pay better.

          • Sastan says:

            I dispute your assertion. Perhaps you have evidence? I have the fact that he used to work in a psych department, and now doesn’t. And that anywhere between forty and eighty percent of psych professors will admit in a survey they discriminate in hiring decisions.

          • Earthly Knight says:


            But this is not really a question where evidence is needed, anyone with a passing familiarity with academia could tell you the same (among other things, Haidt was tenured at Virginia). The conservatives who are victims of liberal bias in the academy will be graduate students, adjuncts, and junior hires, not the Haidts of the world, who can fend just fine for themselves.

          • Marc Whipple says:

            EK: I don’t see any evidence. I see speculation from someone who has no obvious connection to the matter. I think “he used to be in a psych department and now he isn’t” is still the only hard fact we have at this point.

            Also, “this is not a question where evidence is needed” is a huge, screaming, flashing, red light with a siren on it.


          • Peter says:


            Oh boy, where do I start?

            a) Haidt identifies as moderate these days. I have read some of his books – one of the episodes that stuck in my mind was him going to India, doing his best to fit into the culture like a good little liberal and thereby absorbing a whole bunch of conservative ideas.

            b) “very liberal” is not “hard left”. Haidt is no Trotsky or Castro, he’s hardly even Jeremy Corbyn. I take it you’re not familiar with those parts of the radical left who use “liberal” as a pejorative… much as many parts of the American right do. I suspect you’re suffering from a serious case of outgroup homogeneity.

            c) You should read your own links, they make my point very well. There’s a comment from someone griping about being a member of a “closeted conservative minority”. “All are members of the hard left” denies that there is any such minority.

          • Earthly Knight says:

            I think “he used to be in a psych department and now he isn’t” is still the only hard fact we have at this point.

            “Hard facts” are great. Understanding how institutions work is sometimes more valuable. If you poke around the internet you can find Haidt getting a target article published in BBS, Haidt getting marquee billing at various university-run conferences and speaking engagements (including at Virginia), and interviews with Haidt where he talks about how excited he is to move to NYU but somehow forgets to mention that he now faces a coordinated blacklist at every psychology department in the country. You will also notice a distinct absence of articles with alarming headlines like “superstar Virginia professor has tenure revoked” or “Haidt forced out at Virginia” which would have cropped up everywhere if the scenario Sastan envisions had any basis in reality. If you look especially hard, you can even find some data on comparative salary ranges of professors in psych departments versus professors in business schools. But this is only evidence given a background of knowledge which I cannot concisely transmit to you.

            Edit: For reference, here is what it looks like when a university tries to fire a tenured professor for his political views:


          • Sastan says:

            So allow me to grant for the sake of argument your theses.

            1: There are a vanishing few non-liberals in academia, underground, closeted, fearful for their careers so they dare not say so publicly, but there’s a couple!

            2: If you’re well known enough and a “superstar”, the university probably can’t directly fire you, since you were so liberal when they gave you tenure.

            Anyone else see how the best face you all are able to put on things is kind of……….horrible?

            And as to the quibbling over terminology of “hard left”, there’s really no objective measure, but in my experience any professor who calls himself a raving right-wing lunatic is more liberal than 90% of the American public. Anyone who calls himself a “moderate” generally means moderate between the Dems and the Communist party. And the self-styled “liberals” are farther to the left than any group in the US except journalists and actors. This especially holds for social issues, less so for financial.

          • Peter says:

            So if my theses go through, then you’ve been caught exaggerating, and it follows that we can’t trust that the other things you say aren’t hyperbole either.

            There is a problem. Ridiculous overblown hyperbole won’t help with it.

          • John Schilling says:

            So allow me to grant for the sake of argument your theses.

            I don’t think you have accurately described his theses, which almost certainly allow for a greater range of political opinion in academia than your capsule description, and also a broader range of status than just a handful of superstars and everyone else cowering in fear.

            But we’re also talking about your thesis, which was simply “All are members of the hard left.” And you made that in specific disagreement with the alternate thesis that “most [psychologists] are members of the far left”, so you clearly weren’t using “all” as a colloquialism for “most”.

            Most psychologists being members of the far left is a reasonable thesis. It may even be true if you squint and tilt your head just right when looking for far-leftness. But literally all of them? That thesis is disproven by a single example of a not-far-left psychologist, even if it is a moderate-leftist with superstar status. Again, I’m not a fan of argument by single anecdote, but you raised an argument that called for a single anecdote in rebuttal. You score no points now by saying, “Aha, but aside from your single anecdote I’m right about everything else!”

            Try not to do that next time.

          • Jiro says:

            But we’re also talking about your thesis, which was simply “All are members of the hard left.” And you made that in specific disagreement with the alternate thesis that “most [psychologists] are members of the far left”, so you clearly weren’t using “all” as a colloquialism for “most”.

            This is wrong, because you failed to consider that “most” was also interpreted as a colloquialism. If “all” is a colloquialism for “most”, “most” can be taken as a colloquialism for “most, but not so much that the weaker side has no practical influence”.

    • Scott Alexander says:

      Most of the studies in the replication project were not political by any stretch of the imagination. They were things like “do complicated combinations of stimuli have longer sensory processing times than a different complicated combination of stimuli?”

      I agree that problems with science feed into problems with politics, but the problems with science got there independently.

  18. I broadly agree that psychology at the moment is in the grips of a crisis, for basically the reasons presented in the article and comments – publication bias, lack of talent amongst many of its participants, etc.

    I’d disagree only with this: Given that many of the most famous psychology results are either extremely counterintuitive or highly politically motivated
    Huh? This is not obviously more true of psychology than of economics, sociology or any other related field. In my opinion economics is the one where people’s theories line up most directly with their politics. Also, the Milgram experiment has been replicated in a large number of different settings in multiple different countries/cultures/demographics. Afaik, Zimbardo’s prisoner experiment, the obvious famous one that is a bit wtf, is not widely regarded as true or useful within the field, apart from as a teaching tool for thinking about methodology.

    and this: Since we have not yet done this, and don’t even know if it would work, we can expect even strong and well-accepted results not to apply in even very slightly different conditions.

    Conclusion does not obviously follow from premises? This seems like a massive leap.

    The general thesis of this article is true I think though. Publication bias in particular seems to damage psychology worse than many other fields because there’s so much scope for subtle fiddling with the setup, measurement etc.

    I think the danger however is to react in the way I think many STEM people do – by saying “psychology is pseudoscience stay away from it”, which makes the problem much much worse. I think the more appropriate reaction is “psychology is in a crisis, we need to invest more talent in the field to get better results”. The reasons why I think this more measured response is correct are:

    -There’s faulty expectations/conceptions of what psychology is. Psychology isn’t like physics, it’s like meteorology or climatology – massive complex systems are at work. There’s often sensitive dependence on initial conditions, and a massive multicausal clust**** of factors at play most of the time. If you expect something like the laws of physics, you’re going to be disappointed. Or, comparing psychiatry, while many of the “mechanical” principles of neurobiology are totally solid, actually applying it in real circumstances has quite variable results, even though the psychiatrist has extraordinary legal powers to directly intervene. That’s because stuff to do with humans is really really hard.
    -Like PSJ said “most academic psychologists are very well aware that a large portion of published studies are flawed and spend a good amount of time trying to falsify them.” In my experience this is true. There’s a lot of idiots that seem to be doing their best to make the field look rubbish.
    -Chances are, you probably underestimate the achievements and usefulness of psychology, because to pinch Scott’s phrase, there’s psychology in the water supply. A lot of theories have sort of disseminated into the general populace, often through the hit and miss pathway of pop-psychology, and so now the bar has been raised for what can be considered novel and what gets attention. Hence the psychology you probably need to know is hidden amongst a huge load of awful rubbish. You might be tempted to decide the field is no longer worth investing in, except…
    -Psychology is very useful because it applies to more situations than most other fields, because it covers not only most domains of work, it also remains useful in social and private life, as well as in macro level political, economic and cultural situations.
    -STEM people tend to accept theories that are essentially psychology, provided they don’t intuitively sound or feel too much like psychology. The main examples of this, neither of which I dislike, are memes and signalling. But it’s worth noting that theories in social science/sociology/psychology have existed for a long time that are remarkably similar – conspicuous consumption is the obvious one that comes to mind for signalling status. They also have an imbalance in skepticism for “sciency” feeling versions, such as evolutionary psychology, which amongst the social sciences is mostly regarded as untestable and extremely vague in its actual predictions of human behaviour. I’m not saying I think it’s worthless either, but I think “sciency” feeling theories tend to get a bit of a free pass.
    -If we don’t consider at least some empirical psychology, people often end up with whatever flimsy alternatives suit their politics. Using mostly philosophy for the prediction of human behaviour is probably the worst but most common way to screw up in this regard.

    I totally agree with the idea that psychology is in a crisis. But I think it’s worthwhile clarifying the appropriate reaction to that – it’s not about problematising it so we can legitimise any old conception of behaviour, but methodically working through the flaws and trying to fix them.

    • suntzuanime says:

      I would say that if you’re defending the robustness of scientific findings in your field by comparing it to economics and friggin’ sociology, that’s a sign your field is in fact in serious crisis. Can you imagine what, say, climatologists would say if you said “well, the state of climate science isn’t really any worse than sociology”?

      • I agree that the field has a crisis, but the comparison point isn’t amongst the reasons. The point is that it’s a complex system you’re studying, and studying complex systems doesn’t look like studying isolated systems like in physics or chem. I don’t think it’s sensible to expect it to.

        • HeelBearCub says:

          It’s a bit like saying physics has a problem because it can’t predict the weather.

          It’s also a bit like trying to develop physics when you can only study the weather.

          • Odoacer says:

            But we can predict the weather with decent accuracy for the next several days.


          • PSJ says:


            This is my new favorite analogy.


            But we’re also pretty good at predicting behavior for the next few seconds-minutes.

          • HeelBearCub says:

            But you won’t do it using physics. Not directly.

            High praise!

          • Odoacer says:


            You’re really downplaying the achievement of accurately predicting the weather (see the article I linked). I’d wager the average person would be very good at predicting behavior for the next seconds-minutes, w/o the use/foreknowledge of any psych studies. Whereas

            Still, most people take their forecasts for granted. Like a baseball umpire, a weather forecaster rarely gets credit for getting the call right. Last summer [2011], meteorologists at the National Hurricane Center were tipped off to something serious when nearly all their computer models indicated that a fierce storm was going to be climbing the Northeast Corridor. The eerily similar results between models helped the center amplify its warning for Hurricane Irene well before it touched down on the Atlantic shore, prompting thousands to evacuate their homes. To many, particularly in New York, Irene was viewed as a media-manufactured nonevent, but that was largely because the Hurricane Center nailed its forecast. Six years earlier, the National Weather Service also made a nearly perfect forecast of Hurricane Katrina, anticipating its exact landfall almost 60 hours in advance. If public officials hadn’t bungled the evacuation of New Orleans, the death toll might have been remarkably low.

          • PSJ says:

            I don’t believe I am. I’m simply considering the achievements of psychology to be greater than you believe. Here is just one example.

          • vV_Vv says:


            That’s a neuroscience finding, not a psychology finding. And anyway, I think that the philosophical implications about “free will” that the authors engage in are overly speculative.

          • PSJ says:


            No. There is a largely false separation between cognitive psychology and cognitive neuroscience. My university doesn’t even have separate departments and most people in each field consider themselves as both.

          • Richard says:


            No. There is a largely false separation between cognitive psychology and cognitive neuroscience.

            I honestly believe that psychology is conflated with neuroscience the same way astrology used to be conflated with astronomy and I predict that in 50 years or so, this will be obvious.

            Any evidence to the contrary appreciated.

          • PSJ says:

            Frank Tong, one of the authors of the paper that sparked this discussion, is in the psychology department of Vanderbilt.

            I am a psychology student at my university. Most of my current work would be considered neuroscience, but most of the work that inspired it is in psychology. I have not to this date heard a coherent definition that clearly separates cognitive neuroscience and cognitive psychology.

          • Odoacer says:


            I’m having difficulty pinning you down here. First you say psychology has achievements similar to weather forecasting, but you use a neuroscience example. Then you say that there’s no difference between cognitive psychology and cognitive neuroscience. You’ve gone from psychology -> neuroscience -> cog neuroscience -> cog psych. What exactly are you defending? Psychology as a field or only subsections like cognitive neuroscience?

            Regardless, the original argument is against psychological studies like those by Diederik Stapel, priming effects, etc. This is about the reproducibility crisis in psychology, as identified by people like Brian Nosek as written here:

            Abstract: Reproducibility is a defining feature of science, but the extent to which it characterizes current research is unknown. We conducted replications of 100 experimental and correlational studies published in three psychology journals using high-powered designs and original materials when available. Replication effects (Mr = .197, SD = .257) were half the magnitude of original effects (Mr = .403, SD = .188), representing a substantial decline. Ninety-seven percent of original studies had significant results (p < .05). Thirty-six percent of replications had significant results; 47% of original effect sizes were in the 95% confidence interval of the replication effect size; 39% of effects were subjectively rated to have replicated the original result; and, if no bias in original results is assumed, combining original and replication results left 68% with significant effects. Correlational tests suggest that replication success was better predicted by the strength of original evidence than by characteristics of the original and replication teams.


          • PSJ says:

            Apologies. Frank Tong was an author of the paper that vV_Vv claimed to be a neuroscience study in order to not allow it to count towards psychological success. That is the line of argument my most recent comment was addressing.

            So, from my point of view I went psychology -> psychology subdiscipline -> same subdiscipline -> same subdiscipline.

            I wasn’t aiming to only defend that subdiscipline, but to give examples of psychological successes, I have to give an example that falls under some subdiscipline of psychology.

            None of my comments not directly below yours were responding to your arguments directly.

          • Setsize says:

            “Psychology” and “neuroscience” are two umbrella terms, each covering several subfields with disparate lineages, whose Venn diagram consists largely of overlap. There is not a meaningful boundary between them.

            The above referenced study (Haynes et al) might be called “neuroscience” because it uses functional imaging in addition to behavioral report. But techniques do not establish disciplinary boundaries — psychologists have always used neurophysiological measures when they are available and applicable. In fact that study is a followup to studies that used EEG rather than fMRI, which in turn were a followup to studies that merely asked observers to note the time on the clock when they “chose” to press the button. So this is a line of research that originates in psychology. It is a pattern that psychology first notices things that are later investigated using neurophysiological techniques.

            There are psychology journals and neuroscience journals; journals are better differentiated by technique than scientists are. If a psychologist does a study on visual perception that uses behavioral measures, they might publish in Journal of Vision or Perception or JOSA:A. If the study also includes neurophysiological measures they might publish in Journal of Neuroscience or Neuron or Visual Neuroscience. If you try to say that people who publish on one group of journals are “psychologists” and people who publish in the other group are “neuroscientists” you will find that all the neuroscientists cite psychology journals and all the psychologists cite neuroscience journals and most of them publish in both; this demarcation fails to carve the disciplines at their joints.

            One example of a historical success of psychology is the identification of trichromacy in color vision, and the identification (up to a linear transform) of the basis vectors that map electromagnetic radiation to color sensation (Young-Helmholtz theory), which led to the standardized CIE coordinate spaces that we use to calibrate our monitors.

            Another historical success is the development of signal detection theory and its application to perceptual judgements, which took place simultaneously in psychology and electrical engineering (the historical motivation was radar; the performance of 1950s radar systems being a combination of the radar apparatus and the trained operator who interprets it). This leads to too many developments to list; for instance the models underlying lossy media compression like MPEG/MP3/JPEG.

            We’re hearing about a reproducibility project in psychology because (a) a subset of psychologists noticed the issue and started investigating it (somewhat later than people in biostatistics; see link to a cancer biology talk I posted upthread) and (b) replication of behavioral studies is cheap. I would wager the replicability of neuroimaging studies will turn out just as bad, but scanner time is more expensive.

          • PSJ says:

            Thank you for saying all this much better than I did! 🙂

            I also want to add reinforcement learning and certain planning methods in AI as successes of psychology.

          • @HeelBearCub

            But are we claiming actual (modern) psychologists think they’re developing the physics of people? Wouldn’t that be a massive straw man? I think everyone knows psychology is kinda like studying the weather. It’s just an attempt to correlate behaviours, thoughts and circumstances (sometimes with neuroscience thrown in) into a complex model, which is, well, obviously quite hard. It’s still worth it because psychology is much more useful than knowing tomorrow’s weather (unless you’re a farmer). Yet people, especially STEM people, seem to beat up on psychology, calling it pseudoscience etc., in ways they’d never talk about meteorology, which gets it wrong all the time. It’s weird, and I think probably a mistake. I agree there’s a massive crisis, but imo that’s different from there being something wrong with the field itself.

          • vV_Vv says:


            I guess that experimental psychology can be described as the study of human behavior at the level of external stimuli and externally observable actions, while neuroscience studies how these stimuli and actions correlate with brain activity.

            Obviously, the two fields have an overlap, since you can design a study that includes all these elements, and ultimately a theory of human behavior should be consistent with both neurological and behavioral evidence.

            But the study you cited is clearly in the neuroscience camp, since it investigates the correlation between neural activity and actions, not how external stimuli influence actions, while priming studies seem mostly in the psychology camp, since they investigate the correlation between external stimuli and actions, without measuring neural activity.


            Your response is typical of how people try to defend “soft” sciences like psychology or economics when they are attacked for their lack of empirical rigor: “We are studying complex systems, you can’t expect the deterministic precision of classical physics, our field is more like meteorology/seismology/quantum mechanics.”

            I find this kind of response flawed. Meteorology/seismology/quantum mechanics don’t make deterministic predictions, but they do make lots of stochastic predictions which are statistically falsifiable, and in practice they are constantly tested and verified to a high accuracy.

            “Soft” sciences, on the other hand, often make vague predictions. When they make falsifiable predictions they are generally difficult to test, and when somebody bothers to test them, we see disastrous results such as the one discussed in this thread. Therefore, the analogy is invalid.

          • Urstoff says:

            @vV_Vv: I think that last bit is right as long as it is qualified to encompass only certain parts of economics and psychology. Microeconomics does make some pretty solid predictions, and there are even a (very) few solid predictions in macroeconomics (printing more money leads to inflation). In psychology, there are lots of solid predictions made in cognitive psychology; there is no crisis of confidence in areas researching memory, perception, etc. When you get to social psychology, the fuzzier areas of cognitive psychology (e.g., social priming, if that even counts as cognitive psychology), etc., the predictions start to fall apart. That isn’t to say that the theoretical work done in macroeconomics, social psychology, etc., isn’t important. It is, as it provides us with ways to think about these things, but confidence in predictions should be pretty low.

            Of course, what this doesn’t mean is that you then should have more confidence in “common sense” or your gut or some other heterodox theory just because the mainstream social science doesn’t make solid predictions. Just because macro (for example) doesn’t really know what’s going on doesn’t mean we can resort to economic predictions rooted in Marxism, ideology, etc. Macro isn’t a great science, but it’s the only one we’ve got.

          • vV_Vv says:

            few solid predictions in macroeconomics (printing more money leads to inflation).

            Isn’t this a prediction of macroeconomics in the same sense that “a dropped apple will fall down” is a prediction of physics?

          • Urstoff says:

            No, because it was not self-evident (even though it may seem self-evident to us now).

        • 27chaos says:

          Are you saying that psychology is in a crisis because psychology is complex and difficult to understand, or are you saying that psychology is in a crisis, and also psychology happens to be complex and difficult to understand? In other words, are you trying to say complexity exonerates the non-replications, or no? Because I can’t tell.

          • I’m saying the second one – the two facts are unrelated. IMO psychology isn’t fundamentally flawed if you don’t expect it to be the physics of people, and it separately is in crisis at the moment because of other reasons like publication bias etc listed in my OP and the article.

    • Zebram says:

      I’d agree with your statement regarding economics. When it comes to empirical evidence, you can find loads of evidence for almost any theoretical position. When I read Austrian and Keynesian economists citing empirical evidence, they both seem to make sense to me. You can make strong cases for either side with empirical evidence. It’s confirmation bias all over the place if you ask me.

      • Yeah, a lot of stuff in the humanities and social sciences seems to be like that – the theories sound fine when you’re reading them in isolation, but they contradict each other and can’t all be true. I find sometimes having intuitive reactions like that to stuff I’m reading can actually be harmful, because you assume you have some idea of the truth and get lazy about checking stuff, unlike with physics, where you know for a fact you have no real idea about what a particle does and why.

      • Peter says:

        Austrian economists cite empirical evidence? I thought they were all “economics is a priori and if the evidence disagrees with theory then the evidence is wrong”.

        • Zebram says:

          I wrote that incorrectly. They don’t believe in empirical data as evidence, but they start from theory and use it to explain empirical data. They do look at the empirical data, just not for proof of their theory’s validity. Probably because most other economists look at empirical data as evidence, and so they try to demonstrate how that same data could be used as evidence for Austrian theory if one so desired.

          • Wrong Species says:

            The nice thing about Austrians is that the evidence is so widely against what they suggest that it’s easy to stop taking them seriously. I thought what they were saying had some validity until I realized that the hyperinflation wasn’t coming. With everyone else, the evidence is more mixed and their pronouncements aren’t as strong.

          • MichaelM says:

            Wrong Species:

            It’s really, deeply important to distinguish between three things:

            1. People who read a little bit of Austrian economics from mises.org and think that makes them Austrian economists.

            2. Some Austrian economists who fall on one side of a theoretical debate with respect to the money supply.

            3. Some Austrian economists who fall on the other side of this theoretical debate.

            Austrian theory can indeed understand the possibility of a flight to liquidity (i.e. cash hoarding, which is why we HAVEN’T had serious inflation since the monetary base tripled seven years ago). Some modern Austrians (and a metric ton of internet commentator ‘Austrians’) disagree with what some others say about this possibility and so they think runaway inflation is right around the corner.

            You can find yourself disagreeing with the one side and find other Austrians agreeing with you on that. It’s just unfortunate that the ones you disagree with are numerous and loud and the ones who agree with you are not quite so noisy.

            Austrian economics has a great deal to offer a modern student of the subject without that student having to become a raving mises.org style loony.

      • I expect the Austrians and Keynesians are disagreeing about macroeconomics, that being the field for which “Keynesian” is a meaningful label. Macro is the part of economics that economists don’t understand. I like to describe a course in Macro as a tour of either a cemetery or a construction site.

        Economists do just fine at predicting the effects of price control, or tariffs, or … .

        • Nathan says:

          While I’m not about to disagree with you on this point, I’m a little surprised to hear you making it. As I recall your father had rather a lot to say about macro, at least as far as monetary policy went. Did he just not know what he was talking about?

          • Zebram says:

            I don’t think that’s the same David Friedman.

          • It is the same David Friedman.

            My father is one of the reasons that 60’s Keynesianism got relegated to the cemetery, although it has a tendency to rise from the dead when politically convenient. We know more about macro as a result of his work, but not nearly enough to make it a solved problem in even the limited sense in which price theory is (for the limits of the price theory case, read Coase).

            After I concluded that I was better at economics than at physics, which is what my doctorate is in, I decided that one way of slightly reducing the degree to which I spent my life categorized as my father’s son would be to stay out of the area where he had made his major contributions.

            I ended up in a field of economics in part invented by my uncle—but, fortunately for me, he is much less famous.

          • Steve Sailer says:

            Thank you, Dr. Friedman, for the charming personal note.

            I presume Dr. Friedman is referring to his uncle Aaron Director, 1901-2004(!):


    • Peter says:

      Afaik, Zimbardo’s prisoner experiment, the obvious famous one that is a bit wtf, is not widely regarded as true or useful within the field, apart from as a teaching tool for thinking about methodology.

      A psychologist of my acquaintance said, “it’s a way of teaching you about research ethics – it’s a good exercise to try to spot all of the unethical things in the experiment”.

    • Setsize says:

      Another phenomenon is that results in psychology which are robust, replicable, and support reduction to underlying mechanisms, have been reclassified (in the popular imagination) as being results in neuroscience, because psychology can’t get a break.

  19. Peter says:

    It’s interesting that the term “conceptual replication” hasn’t come up yet.

    With organic chemistry, often you need to adapt some reaction to your needs. You find some reaction from a paper from the 1970s, say, “well, one of my starting materials has a couple of extra methyl groups in it but hopefully they won’t get in the way, and they use benzene as their solvent but I’ll use toluene because it’s safer and thus the paperwork is easier, and I’m going to scale it up by a few hundred percent because I need lots of the product” but apart from that you follow the description. This wouldn’t count as a replication attempt to a chemist. It would only count if you used the original starting material and solvent, and tried to follow the recipe as faithfully as possible. If we talked like psychologists, we would call this a “conceptual replication attempt”.

    The Barrett article appears to be saying “reproduction failures aren’t a problem because you don’t expect conceptual replications to work all the time”, whereas as far as I can tell the Reproducibility Project isn’t doing conceptual replications – it’s attempting to do good-old-fashioned proper plain old replications. When most of those come back with different results, you have a problem.

    • Alexander Stanislaw says:

      How often do organic chemistry replications fail? I couldn’t find any data on that (replication is apparently not a good search term in this context). I’ve heard mentions of troubleshooting limbo in organic chemistry. I’d imagine the first attempt success rate is quite low, and the eventual success rate depends on the researcher’s patience.

      Of course the problem with psychology is that there is no way to check if you screwed up, whereas chemists have all sorts of tools. Could this be the main problem with psychology as a field? It’s not necessarily that its practitioners are any less competent (I have no opinion on whether this is true, I don’t know much about psychology).

      • Peter says:

        Oooh, erm. You’ll no doubt be unsurprised to learn that there’s no major effort to ensure reproducibility of academic results. There are some areas of analytical chemistry in industry where they consider reproducibility – I did a summer job in an industrial analytic lab, they had a distinction between “repeatability” (same experimenter, same lab) and “reproducibility” (different experimenter, different lab) and if I recall right the error bars on the second tended to be twice the size of those on the first.

        Chemistry is a big field, and I expect that things differ by subfield. In organic synthesis, an amount of repetition often happens because you often want to make something in order to use it in some later experiment; the general feeling seems to be that the reactions usually work, but the yield you get is often smaller than reported, especially if you’re a not-very-skilled grad student. Also the yields reported from certain countries which will remain un-named have a reputation for being inflated.

        There’s this journal, Organic Syntheses, which collects and curates synthesis procedures, and they actually repeat the syntheses in-house before they publish anything. Apparently their rejection rate due to non-reproducibility is about 3-5% – but in the period of 1982-2005 it was 12%.

        • Ant says:

          In the field of materials, experiments are often partially replicated between studies, in order to check the validity of the new content. If you want to test a new measurement system, you will use the result of previous studies to calibrate it. Likewise, if you want to check the properties of a new material, you will perform the same tests as some other team to show that you are correctly measuring what you are supposed to measure. Maybe this sort of calibration could work for psychology. Never do a study with completely new data; always have something to compare your data with.

      • Anthony says:

        How often do organic chemistry replications fail?

        More often than you’d think. Derek Lowe and his commenters make comments about the irreproducibility of academic results quite often, and the problem is so bad they had a symposium about it.

        Retraction Watch has 251 posts on physical science retractions, though not all are for reproducibility. (The top one is for plagiarism, for example.)

  20. T. says:

    You can make anything statistically significant in psychology with enough “tricks” (http://neuron4.psych.ubc.ca/~schaller/528Readings/SimmonsNelsonSimonsohn2011.pdf).

    Some people still think it’s perfectly fine to use those tricks. Unfortunately, all these little tricks are allowed and only a few journals actually try and do something about them (like insisting on providing the reader with details). I don’t think that will help in preventing false-positives: just because you report what tricks you used doesn’t make a subsequent false-positive less likely.

    I fear the only thing that will help is setting new standards, which everyone will have to adhere to (like pre-registration), otherwise nothing will change. Imagine how many resources are wasted with how things are going now.

    • 27chaos says:

      but meta analyses mean it’s okay if there are a lot of fraudulent papers so really there’s no problem here at all /s

  21. piero says:

    It seems to me that trying to replicate a result in psychology is like trying to replicate a weather forecast: you can never be absolutely certain that all relevant factors have been considered, and how much each factor contributes to the outcome. That short-term weather forecasts are pretty accurate only goes to show how much simpler fluid dynamics is compared to psychology.

  22. Zakharov says:

    Incidentally, are successful replications always published, or are they put in the file drawer because failed replications are much more interesting?

    • PSJ says:

      Depending on your definition of replication, they are often not. Many labs will do “pilot” studies before engaging in a research project to test viability. These often constitute a partial replication of the main thrust of the foundation papers (they are generally small scale). They are almost never published (usually for good reasons, but nonetheless). This is one way researchers build up an awareness of which new (or old) ideas are most promising as they, or other labs, may have tried a few out already without publication.

  23. Alex Zavoluk says:

    ” One out of every twenty studies will be positive by pure chance ”

    What? I think you’re alluding to the .05 critical p-value threshold, but that’s not what that tells you. That p-value tells you that of tests of nonexistent phenomena, 1 in 20 will appear true by chance–but what fraction of all published studies these make up depends on quite a lot of other factors.
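
    To make that dependence concrete, here is a minimal sketch of how the share of false positives among significant results moves with two of those "other factors": the fraction of tested hypotheses that are actually true and the statistical power of the studies. The function name and the example numbers are illustrative assumptions, not figures from the article or the Reproducibility Project, and the sketch assumes every significant result is published and no p-hacking occurs.

```python
def false_positive_share(prior_true, power, alpha=0.05):
    """Fraction of significant results that are false positives.

    prior_true: assumed fraction of tested hypotheses that are really true
    power:      assumed probability that a study detects a true effect
    alpha:      significance threshold (false-positive rate for null effects)
    """
    true_positives = prior_true * power
    false_positives = (1.0 - prior_true) * alpha
    return false_positives / (true_positives + false_positives)


# Hypothetical field where few tested ideas are true and power is low.
risky = false_positive_share(prior_true=0.1, power=0.5)

# Hypothetical field where half the tested ideas are true and power is high.
safe = false_positive_share(prior_true=0.5, power=0.8)

print(f"long-shot hypotheses, low power: {risky:.1%} of significant results are false")
print(f"plausible hypotheses, high power: {safe:.1%} of significant results are false")
```

    Under these made-up inputs the same .05 threshold gives roughly 47% false positives in the first case and roughly 6% in the second, which is the sense in which the threshold alone does not fix the fraction.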

    • Adam says:

      I believe what he means is any individual researcher can select 20 hypotheses at random and get at least 1 significant result just by chance. Since this will be the one they publish, the prevalence of hypotheses significant only because of chance in the published literature will be much greater than 1 in 20.
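
      As a rough check on the arithmetic, the sketch below computes the chance of at least one significant result among 20 independent tests of effects that do not exist, along with the expected number of such results. Independence of the tests and the function names are assumptions made for illustration.

```python
def prob_at_least_one_false_positive(n_tests, alpha=0.05):
    """Chance that at least one of n independent null tests comes out significant."""
    return 1.0 - (1.0 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Average number of significant results among n independent null tests."""
    return n_tests * alpha


p_any = prob_at_least_one_false_positive(20)
mean_hits = expected_false_positives(20)

print(f"P(at least one significant result in 20 null tests) = {p_any:.3f}")
print(f"Expected number of significant results = {mean_hits:.1f}")
```

      So with 20 null hypotheses a researcher expects one spurious hit on average, and gets at least one about 64% of the time rather than with certainty. If only the hits are written up, the published record over-represents them either way.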

  24. Typo: In the paragraph starting “The third and biggest concern”, you wrote ” I just so happened to pick the one that really does cause cancer”, when you meant to write “…. cure cancer”.

  25. terdragon says:

    So if my study comes back positive, but another team’s study comes back negative, it’s not “more likely” that my chemical does cure cancer but only under certain circumstances.

    Oh gosh, you’re right, it’s the conjunction fallacy: “Chemical X cures cancer, and only under certain very finicky conditions!” trying to be a more likely statement than “Chemical X cures cancer!”

    • PSJ says:

      This doesn’t seem quite right. It wouldn’t be surprising to find that the probability of (x plus any one of unnamed factors) is more likely to be true than (x on its own) to cure cancer. Not a real conjunction fallacy.

      • HeelBearCub says:

        Necessary, but not sufficient.

        Like saying it should be more likely that turning the handle would open a locked door than: inserting key, turning key, then turning handle.

    • RCF says:

      The term “conjunction fallacy” refers to only top-level conjunctions. Believing that “If (A and B) then C” is more likely than “If A then C” is not an example of the conjunction fallacy, because the “and” resides within an if-then statement. It is quite possible for the first statement to be true but the second false.

      • HeelBearCub says:

        Isn’t conjunction fallacy believing simply that A and B is more likely than A? Remove the “and then C”.

        • AJD says:

          The conjunction fallacy is believing that A and B is more likely than A, and it’s a fallacy because A and B is in fact less likely than A.

          ((A & B) –> C) actually is logically more likely than (A –> C), so believing it isn’t a fallacy.
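
          The claim can be checked by brute force over truth assignments. This is a minimal sketch that reads "–>" as material implication (an assumption; the thread does not say which reading is intended) and enumerates all eight combinations of A, B and C.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# Every possible world: one truth value each for A, B and C.
worlds = list(product([False, True], repeat=3))

# Worlds in which "if (A and B) then C" holds.
narrow_rule = {w for w in worlds if implies(w[0] and w[1], w[2])}

# Worlds in which "if A then C" holds.
broad_rule = {w for w in worlds if implies(w[0], w[2])}

print("worlds where (A and B) -> C holds:", len(narrow_rule))
print("worlds where A -> C holds:", len(broad_rule))
print("every A -> C world is also an (A and B) -> C world:",
      broad_rule <= narrow_rule)
```

          Because every world satisfying (A –> C) also satisfies ((A & B) –> C), the second statement is at least as probable as the first under any probability assignment to the worlds, and strictly more probable whenever the one extra world (A true, B false, C false) has nonzero probability.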

  26. Q says:

    Is psychology any worse than biosciences regarding replication? I dimly recollect they are in the same or similar situation. You would, however, still consider biosciences a promising field.

  27. Professor Frink says:

    I don’t think psychology’s replication crisis is worse than any other field’s. The issue is that some weird, counterintuitive result pops up and suddenly the media is running with it and reporting breathlessly. Meanwhile, the sober psychologists actually are attempting the replications, trying to figure out how the phenomena extend, etc.

    Psychology would have a replication crisis if none of these things were being replicated. But they are! Statistics isn’t a perfect tool. With garden of forking paths problems and mild, unintentional p-hacking you can easily end up expecting lots of published results to fail replication (especially if you replicate the most counterintuitive, which are the most likely to fail). As long as replication and extension keep fixing these problems we have nothing to worry about.

    • Sastan says:

      I think it is, merely because humans are infinitely more complex than pretty much anything else we study. It’s not that psychological methods are terrible, it’s that the subjects are so sensitive. The problem Psychology has is fifty years of overconfidence and media coverage of that overconfidence. We’ve wrecked whole areas of society by broadcasting unproven new ideas (suppressed memories!) to a gullible public. What Psychology needs is the humility to tell everyone we’re a very young science, we barely know anything.

  28. Fairhaven says:

    After 15 years of private practice doing long term psychoanalytic psychotherapy, I concluded that the only solid thing I was taught was the benefit of the limits of the 50 minute hour and not having a personal relationship, and some very basic things about listening skills. Other than that, I think the main thing any therapist has to offer (outside of drugs, which I didn’t prescribe) is your own personal wisdom and kindness, and the relief of a place to express yourself. For some people, sadly, the encouragement to express grievances and dwell on sorrow, and blame parents for everything, is so seductive, it is actually counterproductive. Plus you’re getting all that lovely attention for remaining whiny and miserable. This sounds nasty, and I don’t really mean it that way – it’s just that therapy can reward being negative and stuck, and your heart breaks for the people who get no relief and no help despite your best and sometimes heroic efforts to help them. I finally stopped being a therapist and became a novelist.

    • LTP says:

      Might that be an artifact of psychoanalytic style therapy? I’ve been in therapy a lot, and honestly therapists have often actively discouraged me from wallowing and blaming my parents and so on if I did it too much.

    • Scott Alexander says:

      Can you explain the “not having a personal relationship?” I don’t feel like I really understand this well enough yet. I’m obviously not talking about “go on dates with them”, but things like “ask them how their kids are doing and show some sympathy and say you’ve been there too”.

  29. Steven says:

    The fact that the planes don’t land doesn’t prove there’s a crisis in our cargo cult!

  30. 27chaos says:

    I hate how the authors of that paper keep insisting that maybe other fields of science do just as badly, so there’s no justification to be irritated with psychologists or more skeptical of their results than of the results of someone else in another field. Like, it’s true that fraud happens everywhere, but that doesn’t lessen the importance of fraud anywhere. Psychology should get less credit after this result, even though it’s true that other fields should also get less credit. Plus, it’s entirely possible psychology is indeed worse than most other fields; this study constitutes decent evidence for that possibility. I suppose what I’m saying is that they’re refusing to update incrementally, and instead just denying that the results have any practical importance at all. But jumping to the idea that everyone is unfairly bullying psychologists seems very rich, after finding such a major flaw in the field’s research. The fact that you’re criticizing others in your field doesn’t mean your field gets a gold star sticker that magically immunizes it to generalized criticism. It should be the default, not something to be extremely proud about.

  31. RCF says:

    “The second concern is experimenter effects. Why do experimenters who believe in and support a phenomenon usually find it occurs, and experimenters who doubt the phenomenon usually find that it doesn’t?”

    Are experimenter effects found by people who don’t believe in them? If so, would that confirm or refute the concept?

  32. pf says:

    Only about 40% of psychology experiments replicate: have there been other systematic attempts at replicating many studies that found a similar proportion? Are there any planned? I’m not sure I trust such a dramatic outcome as that in a field where only 40% of studies have reproducible results….

    (…and if you systematically go through modern psychology textbooks and remove all conclusions drawn from experiments that haven’t been reproduced half a dozen times, what changes would that produce? How much of what is treated as “accepted knowledge” in psychological fields, on a par with Newtonian mechanics in physics prior to quantum theory and relativity, is built on results that haven’t been sufficiently replicated? I’ve been wondering about that since high school, but I’ve never known how to find out.)

  33. Rick G says:

    Read enough Andrew Gelman, and you’ll become aware of a bunch of examples of and terms for statistical problems (e.g. Garden of Forking Paths) that are probably already floating around in your head. Then you can come to terms with the fact that the overwhelming cause of failed replications is that the original result was the product of noise-mining, i.e. wasn’t true to begin with.

  34. (Most of this is lifted from a comment I wrote at Andrew Gelman’s blog, on the same topic.) I agree with your issues with this weak “defense” of replication failures, except perhaps “Right now, failed replications are deeply mysterious.” Reading a lot of papers with poor statistics, and following the literature on the crisis of non-reproducibility, false results, it seems that in most of these cases, people are fitting models to very noisy data with very few data points, which is always a dangerous thing to do.
    Doesn’t everyone realize that noise exists? After asking myself this a lot, I’ve concluded that the answer is no, at least at the intuitive level that is necessary to do meaningful science. This points to a failure in how we train students in the sciences. (Or at least, the not-very-quantitative sciences, which actually are quantitative, though students don’t want to hear that.)

    If I measured the angle that ten twigs on the sidewalk make with North, plotted this versus the length of the twigs, and fit a line to it, I wouldn’t get a slope of zero. This is obvious, but I increasingly suspect that it isn’t obvious to many people. What’s worse, if I have some “theory” of twig orientation versus length, and some freedom to pick how many twigs I examine, and some more freedom to prune (sorry) outliers, I’m pretty sure I can show that this slope is “significantly different” from zero. I suspect that most of the people [that perform irreproducible studies] have never done an exercise like this, and have also never done the sort of quantitative lab exercises that one does repeatedly in the “hard” sciences, and hence they never absorb an intuition for noise, sample sizes, etc. This “sense” should be a pre-requisite for adopting any statistical toolkit. If it isn’t, delusion and nonsense are the result.
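The twig exercise is easy to try on a computer. What follows is only a rough sketch under stated assumptions: twig lengths and angles are drawn independently (so the true slope is exactly zero), ten twigs per "study", and the only analytic freedom is dropping one twig as an "outlier". The function names and the ranges (5 to 30 cm, 0 to 180 degrees) are made up for illustration.

```python
import math
import random

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom.
T_CRIT = {8: 2.306, 7: 2.365}


def slope_and_t(xs, ys):
    """Least-squares slope of ys on xs, plus the slope's t statistic."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / std_err


def is_significant(xs, ys):
    """True if the fitted slope differs from zero at the 5% level."""
    _, t = slope_and_t(xs, ys)
    return abs(t) > T_CRIT[len(xs) - 2]


def simulate(trials=2000, seed=0):
    """Return (fixed-analysis hit rate, hit rate with one-twig pruning)."""
    rng = random.Random(seed)
    honest = flexible = 0
    for _ in range(trials):
        lengths = [rng.uniform(5, 30) for _ in range(10)]   # cm
        angles = [rng.uniform(0, 180) for _ in range(10)]   # degrees, unrelated to length
        if is_significant(lengths, angles):
            honest += 1
            flexible += 1
            continue
        # "Prune outliers": drop each twig in turn and keep any version that works.
        for i in range(10):
            if is_significant(lengths[:i] + lengths[i + 1:],
                              angles[:i] + angles[i + 1:]):
                flexible += 1
                break
    return honest / trials, flexible / trials


honest_rate, flexible_rate = simulate()
print(f"fixed analysis:        {honest_rate:.1%} of studies 'significant'")
print(f"with one-twig pruning: {flexible_rate:.1%} of studies 'significant'")
```

The fixed analysis should come out "significant" in roughly one study in twenty, as the 5% threshold promises. The pruned version can only do better than that, since it keeps every fixed-analysis hit and adds more, and this is with a single, mild kind of flexibility.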

  35. gwern says:

    I am not saying that we shouldn’t try to reconcile results and failed replications of those results, but we should do so in an informed Bayesian way

    http://alexanderetz.com/2015/08/30/the-bayesian-reproducibility-project/ is a Bayesian interpretation of the results: he treats it as a Bayes factor problem, and calculates the BF of each replication, finding that a lot increase the posterior probability of a non-zero effect but a lot also decrease the posterior probability (sometimes by an enormous amount). Unfortunately, I don’t think he takes it a step further by calculating the original BF of the studies and seeing what the net posterior probabilities look like. (I think a good definition of a failure to replicate is if the replication’s BF is so small it completely neutralizes the original.)
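To make the "net posterior" step concrete, here is a small sketch of how the two Bayes factors would combine. It assumes the original study and the replication are independent and that both BFs compare the same pair of hypotheses, so the posterior odds are just the prior odds times both factors. The numbers are invented for illustration and are not taken from the linked analysis.

```python
def posterior_probability(prior_prob, bayes_factors):
    """Update a prior probability by a sequence of independent Bayes factors."""
    odds = prior_prob / (1.0 - prior_prob)
    for bf in bayes_factors:
        odds *= bf
    return odds / (1.0 + odds)


def neutralizes(bf_original, bf_replication):
    """True when the replication cancels the original's evidence, or worse."""
    return bf_original * bf_replication <= 1.0


# Hypothetical numbers: a modest original result and a strongly negative replication.
prior = 0.5
bf_original, bf_replication = 5.0, 0.1

print(posterior_probability(prior, [bf_original]))                  # after the original alone
print(posterior_probability(prior, [bf_original, bf_replication]))  # after both studies
print(neutralizes(bf_original, bf_replication))
```

Under these made-up inputs the belief in a non-zero effect ends up below where it started, which is the "completely neutralizes the original" case described above.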

  36. P. George Stewart says:

    Well, what have been the known motives of scientific fraud in the past? Use those criteria to spot-check on results going forward. No, they might not cover all possible cases, but at least we have somewhere to start.

    At a rough guess, I’d say bias arising from desire to preserve or enhance one’s social status, desire for money and desire to prove one’s ideology are going to be things you’ll want to test for.

  37. Urstoff says:

    Okay, so given all the hullabaloo in the last few years, do any types of priming still hold up? Perceptual priming seems pretty obvious to anyone with senses. Semantic priming was (seemingly) established in the literature long before social priming ever became a research topic. Presumably those types of priming are still legit, right?

    • 27chaos says:

      You might consider anchoring a type of priming.

    • Scott Alexander says:

      Stroop effect is still rock solid. I think the kind where if you see the word “doctor” you’re more likely to read _URSE as “nurse” rather than “curse” works too. But I think it has to be very short-term and very closely related.

  38. First comment in this very, very excellent blog of yours, so let me start by sending big congratulations for the insightful, mostly funny and always entertaining posts.

    I very much agree with this one, and was especially moved by this sentence, which summarizes the problem with psychology (and, I’d dare to say, with every “social science”) today:

    I dare to say what most psychologists, both in clinical practice and in academia, consider to be the purpose of psychology is not to gain a deeper understanding of how the human mind actually works (“actually” has those nasty objectivistic, cognitivist connotations), but to get papers published and accumulate citations, as that is what will be most beneficial for their careers. It is easy to forget, within the hubbub of preparing those papers, submitting them, reading what others publish to stay up to date, reviewing their papers in turn, attending conferences where one can network and gain access to additional publication venues… that the papers should be about something “real”, or, using an even more tainted word, have to be “true”.

    Of course, part of the problem is that truth (and the objective existence of an independent reality we can gain knowledge about) seems to be pretty unfashionable these days…

  39. Lemminkainen says:

    It seems that the obvious solution for this sort of problem would be for some philanthropist (or perhaps the government?) to set up an institution dedicated to replicating studies in fields with replication problems like psychology, biology, medicine, etc. From what I understand about the academic labor market, graduate schools in biology and psychology produce a lot more PhDs than the set of science-doing institutions actually needs, so you could probably pick up some people who failed to get academic or research-institute or science-industry jobs but who still want to work as scientists fairly cheaply. (Medicine would be a bit harder for this) You could also create an endowment to pay a bunch of eminent people in the field who are concerned about replication a bunch of money to edit a peer-reviewed journal that publishes nothing but replication attempts.

  40. Ustun Sunay says:

    As a practicing chemist with a degree in psychology, it has come to my attention that a non-reproducible result in chemistry leads to more questioning than one in psychology by its practitioners. This may have something to do with the research culture in each field. And then maybe not…

    • Tom Womack says:

      In chemistry you’re much more confident about the consistency of the things you’re working with, and you’ve got a big set of anecdotal evidence about the ways that things go wrong … if the reaction works with one bottle of samarium triflate and doesn’t work with another, you can buy a new bottle much more easily than you can round up another thirty undergrads; if it doesn’t work with the new bottle, then unless it was an exceptionally fascinating reaction you mutter ‘unknown contaminants’ and go on to do something else.

      • Peter says:

        Alternatively you can try to find out what it was that was in the old bottle. There was an episode in my PhD lab where someone did some experiments using chloroform as the solvent, got some results, spent a few months doing something else, went back to those experiments and found she couldn’t repeat the results… until she realised that previously she’d been using “technical grade” chloroform and now she was using the purer “analytical grade” chloroform. The former contains traces of methanol. So she added traces of methanol to the analytical grade chloroform and then she was able to reproduce her old results.

        There seem to be some areas of inorganic chemistry, where leaving a half-empty half-sealed bottle of something in a cupboard for a few years is a good way to let just enough moisture and oxygen at things to make something with interesting, exciting and hard-to-reproduce activity…

        • Deiseach says:

          There seem to be some areas of inorganic chemistry, where leaving a half-empty half-sealed bottle of something in a cupboard for a few years is a good way to let just enough moisture and oxygen at things to make something with interesting, exciting and hard-to-reproduce activity…

          As with “Doctor Jekyll and Mr Hyde”:

          My provision of the salt, which had never been renewed since the date of the first experiment, began to run low. I sent out for a fresh supply, and mixed the draught; the ebullition followed, and the first change of colour, not the second; I drank it and it was without efficiency. You will learn from Poole how I have had London ransacked; it was in vain; and I am now persuaded that my first supply was impure, and that it was that unknown impurity which lent efficacy to the draught.

      • Deiseach says:

        you can buy a new bottle much more easily than you can round up another thirty undergrads

        Well, that’s your problem right there. The same way that specific lines of lab rats and mice have been bred to obtain consistent experimental results, somebody should look into breeding lines of undergrad psychology test subjects. More consistency, plus making it like Real Science running tests on rats! 🙂

  41. Doctor Mist says:

    Well, probably everybody has seen this, but just in case. Today’s xkcd.

  42. Dan Simon says:

    Your last paragraph, Scott, hints obliquely at what I think is the real core question here: why is there so much research on cases where priming works, and so little research on actually understanding priming–exploring possible mechanisms at work, for instance?

    My hypothesis: priming research was never intended to give insight into human psychology at all. Rather, it was developed as a magnificently productive source of relatively inexpensive, easy-to-perform experiments that could, with a minimal sprinkling of creativity, produce “surprising” results suitable for publication. And since the modern academic research career is focused entirely on generating publishable results rather than gaining scientific insight, priming was thus a psych researcher’s ideal research topic.

    The obvious follow-up question: what fraction of popular scientific research topics these days are popular for a similar reason?

    • HeelBearCub says:

      This is annoying. It basically says “psychology is replete with charlatans and liars. I would be happy to call anyone who does priming research a liar to their face. Actually, come to think of it, I’ll happily accuse anyone who is a professor a fraud until they provide evidence otherwise.”

      Am I putting words in your mouth? Or are you ascribing motivations to a whole cohort, absent evidence?

      • Douglas Knight says:

        Dan did not call them liars. Really, he didn’t. Nor will I call you a liar for this false accusation. He said that the experiments were intended to be cheap, not that they were intended to produce false positives. I suppose that some definitions of “charlatan” cover that. But the weak definition of people who fool themselves.

        But I did use the word “fraud.”

        Unlike Dan, I do not use the word “intend.” I make no claims about people’s thought processes. What is important is the selection pressure from the system. The system selects for people who produce a large number of published papers. It selects for people who get news coverage. It selects for people who are not held back from their experiments and their publication by their understanding of statistics. Publication selects for positive results.

        • HeelBearCub says:

          @Douglas Knight:
          “never intended to give insight into human psychology at all” and “a minimal sprinkling of creativity” both strongly imply an attempt to deceive. In addition, you don’t get psych research grants if you have no intention of providing insight into psychology. “I never said the word liar” is the kind of defense a 12 year old offers.

          I think there are problems with the publishing, grant and tenure process. When the model is now “publish or perish” and replication and negative results aren’t publishable and grant making bodies only fund things that seem the most sure to generate results, it will push the science in a certain direction. But that is really different than saying that all of academia is entirely unfocused on gaining scientific insight.

          There is a very, very important baby in that bath water. In fact, this idea that academics are bullshitters is, in no small part, at the heart of the current crisis. What attitudes and pressures do you think caused the current “publish or perish” paradigm?

          • Douglas Knight says:

            Grant committees aren’t mind-reading machines.

            Here is another phrasing: professors have been taught a procedure and have been promised that it will produce insights into psychology. They honestly pursue cargo cult science. They must have some interest in psychological insights, or they would be in another field. But there is a very strong selection pressure against people who give much thought to this procedure or what constitute psychological insight. They generally design research programs with the goal of being publishable. Or they inherit a research program from their advisor — another form of selection.

            I don’t uniformly condemn all academic experimental science. At the very least, some parts teach more complicated procedures with lots of checks and replication. But there are a lot of areas without a baby under the bathwater.

          • HeelBearCub says:

            OP didn’t say that there were some specific areas that had issues, he said “the modern academic research career is focused entirely on generating publishable results rather than gaining scientific insight”. That means the entirety of academic research, which is flat out BS. Why are you defending that?

          • Douglas Knight says:

            If I were defending that statement, you could look at my defense of it and see why. In fact, I do think it is quite a defensible statement, and completely compatible with everything I said. But I did not bother to defend it, because I don’t care what Dan thinks. It is much better to make my own statements, seeking to learn from Dan’s errors of obscurity. Not to mention your reaction to my previous interpretation.

            But, if you really care what Dan thinks, note that his last paragraph explicitly asks about diversity.

          • Dan Simon says:

            I wholeheartedly endorse Douglas Knight’s rephrasing of my point. My goal was not to impute deliberate fraud and deceit to the research community, the vast majority of whom conduct their actual investigations honestly and to the best of their ability. But in my experience, most researchers are also well aware that when it comes to *what* they investigate, they’re essentially participants in a massive game of “survivor”, and accept with varying degrees of regret that they must play along to avoid getting voted off the island.

            Many make the best compromises they can between their scientific idealism and their careerist realism, choosing areas of investigation that they can at least rationalize to themselves aren’t entirely useless charades (and bitterly reproaching, among like-minded friends, any colleague who is more compromised in this respect than they consider themselves to be). But those who are actually willing to sacrifice career goals to pursue what they believe is a far more important and productive research area for the betterment of the human condition are pretty few and far between.

      • 27chaos says:

        How dare someone doubt expert opinion!

      • Tom Scharf says:

        “absent evidence”?

        There is plenty of evidence to support exactly that conclusion. It is discussed in this post. The fact that you respond emotionally to the accusation and present no counter evidence is a foundation to the very problem itself, an appeal to self authority coupled with no apparent desire to hold the science to a higher standard.

    • Jiro says:

      Somebody mentioned this earlier, but priming is researched so much because priming is a social justice topic; priming is related to stereotype threat, which is used to explain minority underachievement. I suppose that is a subset of “never intended to give insight into human psychology” and of “getting publishable results”, though.

      • Scott Alexander says:

        I don’t think most priming research is explicitly related to this.

      • Sastan says:

        Actually, I think it is that priming is a very sensitive phenomenon, which means it can change in weird and interesting ways. More cynically, if you run enough priming experiments, you’ll get the result you want pretty soon, because anything and everything affects priming.
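The cynical version is simple arithmetic. As a rough sketch, assume there is no real effect at all, each experiment is independent, and each uses the usual 5% significance threshold. These assumptions are mine, not Sastan’s. The chance of at least one "hit" in n tries is then 1 - (1 - 0.05)^n:

```python
import math


def chance_of_a_hit(n_experiments, alpha=0.05):
    """Probability that at least one of n null experiments comes out significant."""
    return 1.0 - (1.0 - alpha) ** n_experiments


def experiments_needed(target=0.5, alpha=0.05):
    """Smallest number of null experiments giving at least `target` chance of a hit."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - alpha))


for n in (1, 5, 20):
    print(n, round(chance_of_a_hit(n), 3))
print(experiments_needed())
```

Twenty tries gives close to a two-in-three chance of a publishable fluke, and the odds pass even at fourteen.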

  43. LosLorenzo says:

    “Apples sometimes fall down, but equally often they fall up”

    I tried to replicate this, but failed. The apple just hovered.

  44. njnnja says:

    Isn’t the fundamental problem that humans have spent the last 50,000 years of societal evolution trying to figure out what makes us tick? So we understand pretty well the basic stuff about human motivation. So the only thing that modern psychology can add is counterintuitive results, which are likely to get published precisely because they are “adding to the sum of knowledge” while at the same time, most likely to be incorrect (given the huge prior against the result)

    If you want to understand human nature, it’s tough to beat cultural advances such as great literature, philosophy, and legal and ethical systems.

    As an aside, I think that the comparisons to things like physics are interesting because we tend to forget how advanced humans became in physics and engineering in ancient civilizations (pyramids, anyone?). So modern science gave us advances in physics in things like EM and thermodynamics but at the end of the day it is likely that human behavior is far more complex than Maxwell’s equations, and the entire science paradigm might not ever work well in understanding human behavior.

  45. alaska3636 says:

    von Mises explained the difficulty facing the social sciences many years ago, first in Human Action and then again in Theory and History. Whereas the components of physical science exhibit a regular relationship between constants and variables, the actions of humans exhibit no such mathematical constancy.

    Psychology has always been a politically motivated creature and so it is not unusual that its proponents would like to see its results treated on the same level as those of the physical sciences.

  46. emblem14 says:

    I think it’s amusing when people get all verklempt about “soft sciences” trying to be real science and failing. Psychology, sociology, economics, political “science” etc. The disturbing part is when otherwise smart people start taking a field of study which on overview cannot provide us meaningful knowledge with any degree of confidence or predictability, and imposing someone’s pet theories du jour as public policy.

    The timeless Tom Lehrer:


  47. Richard Kennaway says:

    Coincidentally or otherwise, an article in a similar vein recently appeared here. (HT Irina Margold.) “Science Isn’t Broken,” it is titled, then lists a catalogue of woes (failure to reproduce, p-hacking, post hoc hypothesizing, unconscious bias, deliberate fraud, vanity journals, corrupt refereeing, bad refereeing, and more). All that it then says to justify the proposition of the title is that science is difficult, all of those defects are actually signs of it getting better, and every result is a “temporary truth”.

    • Douglas Knight says:

      Scott linked to that in the previous link post.

      Most people have extremely false beliefs about how science actually works. A lot of things people complain about are very old and science seemed to work back then, anyhow. But new problems are probably a sign of things getting worse.

  48. Tom Scharf says:

    “You see a lot, Doctor. But are you strong enough to point that high-powered perception at yourself? What about it? Why don’t you – why don’t you look at yourself and write down what you see? Or maybe you’re afraid to.” – Clarice Starling – Silence of the Lambs.

    If tests were done honestly, then replication efforts would find STRONGER results 50% of the time, right? Anyone care to apply the “disparate impact” legal theory to this?

    There is a lot of fertile ground to plow in a self examination of how the social sciences reach their conclusions. Anyone who has been around academia or the sciences for decades knows how much an experimenter’s desire to prove a conclusion NOT very mysteriously affects results. Confirmation bias and data mining being two leading contenders. Professional courtesy means looking the other way and not calling colleagues out. Add in that once someone makes their strong viewpoint public and retracting it becomes professionally embarrassing, you have a toxic mixture that self policing simply will not solve.

    I look at social science conclusions the same way I scan The National Enquirer at the grocery store. The result might actually be true, but there is no easy way to know if it is, and the source simply cannot be trusted. Any result that touches on the Red/Blue culture wars is to be assumed invalid until proven true beyond a clear and convincing threshold in my opinion. The preponderance of the evidence threshold is not adequate here. Too much self imposed political correctness and a profound political tilt in the social sciences taints results.

    Social sciences, investigate yourself. We don’t trust you. That’s the real crisis.

    • Anonymous says:

      If tests were done honestly, then replication efforts would find STRONGER results 50% of the time, right?

      No. For example, suppose all experiments are conducted 100% honestly but there are actually no interesting effects to discover. Some experiments will still find interesting effects because of statistical flukes, but these effects will almost always fail to replicate.
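This scenario can be simulated directly. The sketch below assumes every study is run honestly on pure noise (30 draws from a standard normal, so the true effect is zero), that only results with |z| > 1.96 get written up, and that each written-up result gets one honest replication. The sample size, study count and names are arbitrary choices for illustration.

```python
import math
import random


def run_study(rng, n=30):
    """z statistic for the mean of n draws from N(0, 1); the true effect is zero."""
    mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    return mean * math.sqrt(n)


def simulate(studies=5000, seed=1):
    """Return (number published, share replicated, share replicated more strongly)."""
    rng = random.Random(seed)
    published = replicated = stronger = 0
    for _ in range(studies):
        z_orig = run_study(rng)
        if abs(z_orig) <= 1.96:
            continue  # not "interesting", so never written up
        published += 1
        z_rep = run_study(rng)
        same_sign = (z_rep > 0) == (z_orig > 0)
        if same_sign and abs(z_rep) > 1.96:
            replicated += 1
        if same_sign and abs(z_rep) > abs(z_orig):
            stronger += 1
    return published, replicated / published, stronger / published


published, replicated_share, stronger_share = simulate()
print(f"published flukes:                 {published}")
print(f"replications significant again:   {replicated_share:.1%}")
print(f"replications with a larger effect: {stronger_share:.1%}")
```

With no fraud anywhere in the pipeline, only a few percent of the published flukes replicate and even fewer come back stronger, nowhere near 50%. The filter on what gets published is enough to produce the asymmetry.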

  49. Steve Sailer says:

    I’d like to emphasize the distinction between short-term and long-term predictions by pointing out two different fields that use scientific methods but come up with very different types of results.

    At one end of the continuum are Physics and astronomy. They tend to be useful at making very long term predictions: we know to the minute when the sun will come up tomorrow and when it will come up in a million years. The predictions of physics tend to work over very large spatial ranges, as well. As our astronomical instruments improve, we’ll be able to make similarly long term sunrise forecasts for other planetary systems.

    At the other end of the continuum is the marketing research industry, which uses scientific methods to make short-term, localized predictions.

    For example, “Dear Jello Brand Pudding, Your new TV commercials nostalgically bringing back Bill Cosby to endorse your product again have tested very poorly in our test market experiment, with the test group who saw the new commercials going on to buy far less Jello Pudding over the subsequent six months than the control group that didn’t see Mr. Cosby endorsing your product. We recommend against rolling these new spots out in the U.S. However, they tested remarkably well in China, where there has been coverage of Mr. Cosby’s recent public relations travails.”

    I ran these kinds of huge laboratory-quality test markets over 30 years ago in places like Eau Claire, Wisconsin and Pittsfield, MA. (We didn’t have Chinese test markets, of course.) The scientific accuracy was amazing, even way back then.

    But while our marketing research test market laboratories were run on highly scientific lines, that didn’t make our results Science, at least not in the sense of discovering Permanent Laws of the Entire Universe. I vaguely recall that our company did a highly scientific test involving Bill Cosby’s pudding ads, and I believe Cosby’s ads tested well in the early 1980s. That doesn’t mean we discovered a permanent law of the universe: Have Bill Cosby Endorse Your Product.

    In fact, most people wouldn’t call marketing research a science, although it employs many people who studied sciences in college and more than a few who have graduate degrees in science, especially in psychology.

    Marketing Research doesn’t have a Replication Crisis. Clients don’t expect marketing research experiments from the 1990s to replicate with the same results in the 2010s.

    Where does psychology fall along this continuum between physics and marketing research?

    Most would agree it falls in the middle somewhere.

    My impression is that economic incentives push academic psychologists more toward interfacing closely with marketing research, which is corporate funded.

    Malcolm Gladwell discovered a goldmine in recounting to corporate audiences findings from social sciences. People in the marketing world like the prestige of Science and the assumption that Scientists are coming up with Permanent Laws of the Universe that will make their jobs easier because once they learn these secret laws, they won’t have to work so hard coming up with new stuff as customers get bored with old marketing campaigns.

    That kind of marketing money pushes psychologists toward experiments in how to manipulate behavior, making them more like marketing researchers. But everybody still expects psychological scientists to come up with Permanent Laws of the Universe even though marketing researchers seldom do.

    • Who wouldn't want to be Anonymous says:

      They tend to be useful at making very long term predictions: we know to the minute when the sun will come up tomorrow and when it will come up in a million years. [Emphasis added]

      I am not sure this is technically true. The n-body problem is really hard. Over the long term, perturbations between the planets are… unpredictable. If you add a zero or two, we don’t even know what order the planets are going to be in. A difference of a few meters in the starting position of Mercury in simulations, for example, is the difference between it crashing into the Sun, Venus, or Earth. And differences as small as 15 meters in the position of the Earth make it impossible to predict the season on Earth. If we can’t tell what season it is going to be in 100 million years, I have a hard time believing we know the exact minute of sunrise in one million years.

      • Eric says:

        Also how exactly would we test those predictions, wait a million years to see if you’re right?

      • According to Wikipedia:

        The planets’ orbits are chaotic over longer timescales, such that the whole Solar System possesses a Lyapunov time in the range of 2–230 million years.

        This suggests that it should be possible to predict when the sun will rise in a million years, but not in any longer timescale.
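A back-of-the-envelope way to read that range, treating the Lyapunov time as the time over which a small initial error grows by a factor of e. This is an idealization that ignores everything except the exponential growth, and the two horizons below are picked only because they come up in this thread:

```python
import math


def error_growth(elapsed_myr, lyapunov_time_myr):
    """Factor by which a small initial error grows, assuming pure exponential divergence."""
    return math.exp(elapsed_myr / lyapunov_time_myr)


for tau in (2.0, 230.0):            # ends of the quoted 2-230 million year range
    for horizon in (1.0, 100.0):    # million years ahead
        factor = error_growth(horizon, tau)
        print(f"Lyapunov time {tau:g} Myr, horizon {horizon:g} Myr: error x {factor:.3g}")
```

Over one million years the growth factor stays below 2 even at the short end of the range, while over 100 million years at that same end it exceeds 10^21, which is the difference between a workable forecast and none.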

        • Who wouldn't want to be Anonymous says:

          Okay, I’m not going to lie, I was being a little smarmy. Nevertheless, my point was that if we can’t even predict the location of the Earth on time scales only an order of magnitude or two larger (or three, for catastrophic collisions between the planets), predicting the rotation of the Earth in a million years is a fool’s errand. It relies on processes that are much more chaotic. Like climate change (and tectonic activity, and convection in the mantle, and… stuff we may not even know about).

          But if we’re going to play the Wikipedia game, try this one:

          But the principal effect is over the long term: over many centuries tidal friction inexorably slows Earth’s rate of rotation by about +2.3 ms/day/cy. However, there are other forces changing the rotation rate of the Earth. The most important one is believed to be a result of the melting of continental ice sheets at the end of the last glacial period. This removed their tremendous weight, allowing the land under them to begin to rebound upward in the polar regions, which has been continuing and will continue until isostatic equilibrium is reached. This “post-glacial rebound” brings mass closer to the rotation axis of the Earth, which makes the Earth spin faster (law of conservation of angular momentum): the rate derived from models is about −0.6 ms/day/cy. So the net acceleration (actually a deceleration) of the rotation of the Earth, or the change in the length of the mean solar day (LOD), is +1.7 ms/day/cy. This is indeed the average rate as observed over the past 27 centuries.

          We can’t get scientists to agree on what the glaciation is going to look like in 100 years, much less a million. More importantly, in order to make the prediction about sunrise, you would need to know the extent of glaciation for the entire duration between now and then.
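To put numbers on how much the rotation rate matters, here is a rough sketch. It assumes the day lengthens at a constant rate, which is exactly what the quoted passage says we cannot count on, and adds up how far a clock tied to the Earth’s rotation falls behind one ticking at today’s day length. The 36,525 days per century and the function names are my own choices.

```python
DAYS_PER_CENTURY = 36525


def accumulated_lag_seconds(rate_ms_per_day_per_cy, centuries):
    """Clock lag from a day length that grows linearly with time.

    After t centuries each day is rate * t milliseconds longer than today, so
    summing that excess over every day up to `centuries` gives
    0.5 * rate * centuries**2 * DAYS_PER_CENTURY milliseconds.
    """
    lag_ms = 0.5 * rate_ms_per_day_per_cy * centuries ** 2 * DAYS_PER_CENTURY
    return lag_ms / 1000.0


def rate_error_for_lag(lag_seconds, centuries):
    """Error in the rate (ms/day/cy) that by itself produces the given lag."""
    return lag_seconds * 1000.0 / (0.5 * centuries ** 2 * DAYS_PER_CENTURY)


# Hours of lag over the 27 centuries mentioned in the quote.
print(accumulated_lag_seconds(1.7, 27) / 3600)
# Seconds of lag over a million years (10,000 centuries) at the same rate.
print(accumulated_lag_seconds(1.7, 10_000))
# Rate error that would shift a sunrise a million years out by one minute.
print(rate_error_for_lag(60, 10_000))
```

Because the lag grows with the square of the elapsed time, hitting the right minute a million years out would require knowing the +1.7 figure to better than one part in ten million, for the whole interval.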

          • Peter says:

            And given that a part of the uncertainty over glaciation is to do with uncertainty over what people will do to combat climate change, and the uncertainty about that is due to uncertainty about politics and thus people persuading each other… astronomy[1] is influenced by social psychology.

            [1] As in, “astronomical phenomena” not “astronomy, the field of study.” Likewise for social psychology. Although social-psychological phenomena may be influenced by social psychology, the field of study.

    • AJD says:

      Although I understand the distinction you’re drawing between physics and market research, I think you’re ignoring or glossing over the fact that any individual piece of data can be evidence for any of a number of competing hypotheses, and that things like background priors and relative explanatory power are involved in choosing which hypotheses to entertain. Have A Beloved Celebrity Whom No One Believes Anything Bad About Endorse Your Product isn’t exactly a Permanent Law of the Universe either, but it’s a lot closer to being one than Have Bill Cosby Endorse Your Product; and it’s no less supported by the ’80s research you refer to.

    • Deiseach says:

      I have a hard time believing we know the exact minute of sunrise in one million years.

      So don’t bet your bottom dollar that tomorrow (in one million years’ time) there’ll be sun? 🙂

  50. Santoculto says:

    Many psychological studies can also come with the label “education”, “Freudian studies”, etc.
    Overall, political subjectivity, or cultural Marxism, which has taken possession of the “human sciences”, is playing a strong role.
    Many studies in the humanities are not based on theory and practice, but only on the analysis of certain possibilities. Almost all “human sciences” are overtaken by ideological and abstract pollution. This lack of contact with literal reality, not to mention the politically correct hysteria, may be having a great effect on the sharp drop in the credibility of the human sciences.

    Many studies have analyzed “student groups” and extrapolated the behavior found as “human behavior”. There are other factors in this same line of methodological flaw. The human being is a combination of biological variables of different natures; for each study in psychology, there would be a need to create reports, similar to population censuses, but with the addition of biological, psychological, physiological, etc. data.

  51. aviti says:

    Wow, this is so true. It reflects how more importance is attached to publications than to the conduct of the research itself. Now who is to blame? Of course the consumers of the research. No matter what, you have to publish something. Usually you will not want to publish something that shows what an idiot you are. The result is that even the scientific method has been hijacked by people of a shaman’s character, so that it is now difficult to differentiate the two. This also reminds me of the fiasco of the STAP cells discovery. It led to the shaming of very important professors; some committed suicide, others were forced into retirement, and others continue to suffer.

  52. FWC says:

    Here’s one idea that might help with some of the issues brought up in this post. The bottom line is to encourage researchers to reproduce their own work.

  53. Anonymous says:

    I find this paragraph – I’m sorry – absolutely idiotic:

    “When physicists discovered that subatomic particles didn’t obey Newton’s laws of motion, they didn’t cry out that Newton’s laws had “failed to replicate.” Instead, they realized that Newton’s laws were valid only in certain contexts, rather than being universal, and thus the science of quantum mechanics was born […]”

    By “replicating an experiment”, we of course want to replicate all of it as closely as possible. If the original study is flawless and a replication of it finds something different from the original study, then it can’t be a replication of the original study: something was clearly changed. There is absolutely no experiment in physics which sometimes yields results that follow Newton’s laws and sometimes results that follow quantum mechanics. It requires very different experiments to verify Newton’s second law and to verify that small particles under non-relativistic conditions follow the Schrödinger equation. If the message here is that in psychology things are so chaotic that they can depend on arbitrarily small differences between experiments (like the wallpaper color), then perhaps the results from these experiments are so weak that they are hardly worth studying in the first place.

    Ernest Rutherford famously said: “If your result needs a statistician then you should design a better experiment.” If you do a psychology experiment and your conclusion doesn’t outright jump out from the data, I would seriously reconsider how meaningful this result is.