SSC Journal Club: Expert Prediction Of Experiments


It’s been a good month for fretting over failures of expert opinion, so let’s look at DellaVigna & Pope, Predicting Experimental Results: Who Knows What?

The authors ran a pretty standard behavioral economics experiment where they asked people on Mechanical Turk to do a boring task while being graded on speed and accuracy. Then they offered one of fifteen different incentive schemes, like “we’ll pay you extra if you do well” or “your score will be publicly visible”.

But the point of the study wasn’t to determine which incentive scheme worked the best; it was to determine who could best predict which incentive scheme worked the best. The researchers surveyed a bunch of people – economics professors, psychology professors, PhD students, undergrads, business students, and random Internet users on Mechanical Turk – and asked them to predict the experimental results. Since this was a pretty standard sort of behavioral economics experiment, they were wondering whether people with expertise and knowledge in the field might be better than randos at figuring out which schemes would work.

They found that knowledgeable academics had some advantage over randos, but with enough caveats that it’s worth going over in more detail.

First, they found that prestigious academics did no better (and possibly slightly worse) than less prestigious academics. Full professors did no better than associate professors, assistant professors, or PhD students. People with many publications and citations did no better than people with fewer publications and citations.

Second, they found that field didn’t matter. Behavioral economists did as well as microeconomists did as well as experimental psychologists did as well as theoretical psychologists. To be fair, this experiment was kind of in the intersection of economics and psychology, so all of these fields had equal claim to it. I would have liked to see some geologists or political scientists involved, but they weren’t.

Third, the expert advantage was present in one measure of accuracy (absolute forecast error), but not in another (rank-order correlation). On this second measure, experts and randos did about equally well. In other words, experts were better at guessing the exact number for each condition, but not any better at guessing which conditions would do better or worse relative to one another.

Fourth, the expert advantage was pretty small. Professors got an average error of 169, PhD students of 171, undergrads of 187, MBA students of 198, and MTurk users of 271 (random guessing gave an error of about 416). So the difference between undergrads and experts, although statistically significant, was hardly overwhelming.

Fifth, even the slightest use of “wisdom of crowds” was enough to overwhelm the expert advantage. A group of five undergrads averaged together had average error 115, again compared to individual experts’ error of 169! Five undergrads averaged together (115) did about as well as five experts averaged together (114). Twenty undergrads averaged together (95) did about as well as twenty experts averaged together (99).
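The statistics here are unsurprising if forecast errors are roughly independent: the error of an average of k forecasts shrinks like 1/√k. A toy simulation (with made-up numbers, nothing to do with the study's actual data) shows the shape of the effect:

```python
import random

random.seed(0)

TRUE_VALUE = 2000   # hypothetical true score in one condition (made up)
NOISE_SD = 250      # made-up spread of individual forecast errors
N_TRIALS = 10_000

def forecast():
    # One forecaster: the truth plus independent Gaussian noise.
    return TRUE_VALUE + random.gauss(0, NOISE_SD)

def mean_abs_error(group_size):
    # Average a group's forecasts, then measure how far off the average is.
    total = 0.0
    for _ in range(N_TRIALS):
        avg = sum(forecast() for _ in range(group_size)) / group_size
        total += abs(avg - TRUE_VALUE)
    return total / N_TRIALS

for k in (1, 5, 20):
    print(f"group of {k:2d}: mean absolute error ~ {mean_abs_error(k):.0f}")
```

The caveat is that real forecasters share blind spots, so their errors are correlated and the gains from averaging flatten out sooner than 1/√k would suggest.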

Sixth, having even a little knowledge of individuals’ forecasting ability screened off expert status. The researchers gave forecasters some experimental data about the effects of a one-cent incentive and a ten-cent incentive, and asked them to predict the scores after a four-cent incentive – a simple, mechanical problem that just requires common sense. Randos who can do well on this problem do just as well as experts on the experiment as a whole. Likewise, randos who are noticed to do well on the first half of the experiment will do just as well as experts on the second half too. In other words, we’re back to finding “superforecasters”, people who are just consistently good at this kind of thing.

None of this seems to be too confounded by effort. The researchers are able to measure how much time people take on the task, whether they read the instructions carefully, etc. There is some advantage to not rushing through the task, but after that it doesn’t seem to matter much. They also try offering some of the Mechanical Turkers lots of money for getting the answers right. That doesn’t seem to help much either.

The researchers ask the experts to predict the results of this experiment. They (incorrectly) predict that prestigious academics with full professorships and lots of citations will do better than mere PhD students. They (incorrectly) predict that psychologists will do better than non-psychologists. They (correctly) predict that professors and PhD students will do better than undergrads and randos.


What do we make of this?

I would tentatively suggest it doesn’t look like experts’ expertise is helping them very much here. Part of this is that experts in three different fields did about equally well in predicting the experimental results. But this is only weak evidence; it could be that the necessary expertise is shared among those three fields, or that each field contains one helpful insight and someone who knew all three fields would do better than any of the single-field experts.

But more important, randos who are able to answer a very simple question, or who do well on other similar problems, do just as well as the experts. This suggests it’s possible to get expert-level performance just by being clever, without any particular expertise.

So is it just IQ? This is a tempting explanation. The US average IQ is 100. The undergrads in this experiment came from Berkeley, and Berkeley undergrads have an average SAT of 1375 = average IQ of 133 (this seems really high, but apparently matches estimates from The Bell Curve and the Brain Size blog; however, see Vaniver’s point here). That same Brain Size post proposes that the average professor has an IQ of 133, but I would expect psychology/economics professors to be higher, plus most of the people in this experiment were from really good schools. If we assume professors are 135-140, then this would neatly predict the differences seen from MTurkers to undergrads to professors.

But the MBA students really don’t fit into this model. The experiment gets them from the University of Chicago Booth School of Business, which is the top business school in the country and has an average GMAT score of 740. That corresponds to an IQ of almost 150, meaning this should be the highest-IQ sample in the study, yet the MBAs do worse than the undergrads. Unless I’m missing something, this is fatal to an IQ-based explanation.

I think that, as in Superforecasting, the best explanation is a separate “rationality” skill which is somewhat predicted by high IQ and scientific training, but not identical to either of them. Although some scientific fields can help you learn the basics of thinking clearly, it doesn’t matter what field you’re in or whether you’re in any field at all as long as you get there somehow.

I’m still confused by the MBA students, and expect to remain so. All MBA students were undergraduates once upon a time. Most of them probably took at least one economics class, which was where the researchers found and recruited their own undergraduates from. And most of them were probably top students from top institutions, given that they made it into the best business school in the US. So how come Berkeley undergraduates taking an econ class outperform people who used to be Berkeley undergraduates taking an econ class, but are now older and wiser and probably a little more selected? It might be that business school selects against the rationality skill, or it might be that business students learn some kind of anti-insight that systematically misleads them in these kinds of problems.

(note that the MBAs don’t put in less effort than the other groups; if anything, the reverse pattern is found).


Does this relate to interesting real-world issues like people’s trouble predicting this election?

One important caveat: this is all atheoretical. As far as I know, there’s no theory of psychology or economics that should let people predict how the incentive experiment would go. So it’s asking experts to use their intuition, supposedly primed by their expertise, to predict something they have no direct knowledge about. If the experiment were, say, physicists being asked to predict the speed of a falling object, or biologists being asked to predict how quickly a gene with a selective advantage would reach fixation, then we’d be in a very different position.

Another important caveat: predictive tasks are different than interpretative tasks. Ability to predict how an experiment will go without having any data differs from ability to crunch data in a complicated field and conclude that eg saturated fat causes/doesn’t cause heart attacks. I worry that a study like this might be used to discredit eg nutritional experts, and to argue that they might not be any better at nutrition than smart laymen. Whether or not this is true, the study doesn’t support it.

So one way of looking at it might be that this is a critique not of expertise, but of “punditry”. Engineers are still great at building bridges, doctors are still great at curing cancer, physicists are still great at knowing physics – but if you ask someone to predict something vaguely related to their field that they haven’t specifically developed and tested a theory to cope with, they won’t perform too far above bright undergrads. I think this is an important distinction.

But let’s also not get too complacent. The experts in this study clearly thought they would do better than PhD students. They thought that their professorships and studies and citations would help them. They were wrong. The distinction between punditry and expertise is pretty fuzzy. Had this study come out differently, I could have argued for placing nice clear lab experiments about incentive schemes in the “theory-based and amenable to expertise” category. You can spin a lot of things either direction.

I guess really the only conclusion you can draw from all of this is not to put any important decisions in the hands of people from top business schools.


154 Responses to SSC Journal Club: Expert Prediction Of Experiments

  1. RomeoStevens says:

    My understanding of the forecasting literature is that the “anti-knowledge” that is gained is specifically that experts deviate from linear models to make nonlinear, threshold-based adjustments that then do significantly worse than a linear model. The longer they have been an expert, the more likely they are to consider such deviations justified. Note that this is not claiming that expertise doesn’t exist. The Cambridge handbook of expertise posits that expert outperformance occurs when there is compressibility of domain-knowledge representations that are tuned by tight feedback loops. In other words, expertise is most a thing in fields with tighter feedback loops.

    • Scott Alexander says:

      But it looks like experts did better; it’s only business students who did worse than expected based on IQ/domain knowledge.

      • RomeoStevens says:

        If the undergrads used a linear model, they would do better than the experts. Averaging is an example, but all sorts of ad hoc models outperform.

  2. Elizabeth says:

    I giggled at the last line, then internally cried. I guess a good joke does that.

    Interesting juxtaposition of this study and the election forecasting. I also prefer how you summarized the findings of the article over how the original authors did.

    My impression from interacting with MBAs is that since they have to make actionable decisions in shorter time frames, they tend to think more quickly even when they don’t have that pressure on them, which occasionally leads to the pitfalls of fast thinking. Their training was essentially, “Do the best you can in as little time as possible”. You still have MBAs who incorporate data collection and analysis into their thinking, but I think the data-analysis phase before they feel they need to make a decision is shorter than for professors, who have more pressure to produce work of higher rigor over longer time frames.

    However, this does not explain the discrepancy between MBA students and undergrads. I think undergrads would probably also be conditioned to shorter-time-frame work. And I have no actual data to back any of this up; it’s just conjecture based on people I know. So I’m no better at this, and should probably go look for data…

    Edit: I did not find data, but I asked some of my MBA friends.

    Essentially, there are many different schools of thought in how an MBA person comes up with a decision. Stanford’s MBA students are more quantitatively oriented. Harvard’s make decisions based on metaphor and analogy. (I.e. This worked in this previous instance which is kinda similar to this new situation, so let’s try that.) So if they repeated this study with different schools of business thought, they’d very likely get different results. A University of Chicago Booth student may be able to explain better than I what the school of thought is there. From their website it seems to be more quantitative? But I can’t tell.

    • Douglas Knight says:

      That was also my thought: that MBAs would make such mistakes, or would rationally decide to blow off this task, unlike the GMAT. Which is also my explanation of why Harvard students do much worse than MIT students on the cognitive reflection test.

      But Scott addresses this:

      (note that the MBAs don’t put in less effort than the other groups; if anything, the reverse pattern is found).

      So I am quite surprised.

  3. Luke Perrin says:

    Is the conversion between SAT or GMAT and IQ linear? If not then maybe your calculation of the IQs is giving results which are too large because you’re taking the IQ corresponding to the average rather than taking the average of the IQs.

    • vaniver says:

      They aren’t linear. I’ve done more complicated transforms before to account for this effect, and typically there isn’t much of an effect. I also don’t think there’s a huge difference between thinking of median IQ and mean IQ, with median IQ maybe preferable because of the nonlinearities.
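A sketch of why the correction is usually small: if you model both score distributions as normal, percentile-matching one to the other is exactly affine, so converting a group's mean score and averaging the converted individual scores give the same answer. The nonlinearity only matters to the extent the real distributions deviate from normal (ceilings, selection effects, etc.). The parameters below are made up for illustration, not real test norms:

```python
from statistics import NormalDist

sat = NormalDist(1000, 140)   # made-up SAT distribution
iq = NormalDist(100, 15)      # conventional IQ distribution

def sat_to_iq(score):
    # Percentile-match: the IQ at the same percentile as the SAT score.
    return iq.inv_cdf(sat.cdf(score))

scores = [1200, 1375, 1550]   # a made-up high-scoring group
iq_of_mean = sat_to_iq(sum(scores) / len(scores))
mean_of_iqs = sum(map(sat_to_iq, scores)) / len(scores)
print(iq_of_mean, mean_of_iqs)   # identical: normal-to-normal is affine
```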

  4. johnswentworth says:

    One hypothesis that could save the IQ theory is that GMAT test-takers have a narrower right tail of IQ than the general population. In other words, really smart people mostly don’t bother with business school, so a high-percentile GMAT score doesn’t translate to a high-percentile IQ.

    However, I think applying the same idea to the “separate rationality skill” is even more promising: people who are very good at predicting things do not go to business school. Perhaps such people predict that business school is mostly useless, despite being a high-status endeavor (similar to the study, in which status is a very weak predictor of results). Given everything I’ve heard about business school, this seems very plausible.

    • StellaAthena says:

      Honestly, it surprises me that Booth students don’t have a high density of people who are good at prediction.
      Where would you expect to find such people? A lot of consultants have MBAs. Maybe mathematicians and computer scientists? Investors and traders? Cops?

  5. Jacob says:

    I got my MBA from a slightly less rigorous b-school than Booth, but I was pretty surprised by how much MBA training is anti-rationalist.

    Every case study or competition is about writing the bottom line first and rationalizing it, every communication workshop drills in that “confident is more important than correct”. Prediction isn’t mentioned at all, except by finance professors who tell you that picking stocks is impossible. In b-school, a model never “fails”, you just need to work harder in applying it. A lot of classes (and MBA jobs) are about accommodating people’s irrationality, not correcting it.

    B-schools teach a lot of useful skills to a lot of smart people, but accurate, calibrated prediction isn’t one of them.

    • Moon says:

      This reminds me of a class in college taught by a wonderful Industrial/Organizational psychologist. He said that he had observed that businesses generally love quick, confident decision makers far more than reflective, careful, or accurate decision makers.

      They choose the guy who says “Yes, I know what to do. We’ll just tie that assembly line back together with baling wire.” And then it immediately breaks again, but he has another quick fix that doesn’t work, but that he is totally confident in. And on and on. People love this type of CEO, he said. He didn’t try to change that, as he saw it as a highly ingrained pattern.

      Maybe it’s just a cave man type of solution finding, that feels right viscerally to people in business “tribes”, due to long established patterns in human history.

      Whether it’s this or not, there can be “cultures” of different academic departments that are wildly different from one another. Certainly the MBA culture is not really scientific in nature, as some other departments’ cultures may be. And business itself may be more competitive, cutthroat, cave man type stuff. Academia and science may be cooler and more rational and less macho cave man like. Science is more recent human history sort of stuff – an adaptation for solving problems a little more complex than those of the average cave man.

      So maybe IQ is the predictor – with an exception to this rule if a student is studying in a department with a subculture which leans in a counterproductive direction, one that interferes with competency in doing the task in the experiment.

    • Fossegrimen says:

      I don’t know anything about US MBAs, but this side of the pond, the people who go for MBAs are the “Rich > Right” crowd. I suspect there might be some self-selection going on.

    • The Element of Surprise says:

      It is as if challenges to business school graduates mostly consist of zero-sum games against other graduates, instead of challenges imposed by decision making under uncertainty.

    • alwhite says:

      This is my experience as well. MBAs teach that the person is an Important Person, and that means don’t question their decisions.

      I’ve had way too many conversations that ended in “MBA: Do what I say. Me: Physics says no.”

    • Rock Lobster says:

      My take on this is that a lot of business roles are actually sales or motivational rather than analytical, even if they’re not explicitly designated as such. Upper management’s most important role is selling the company’s prospects to investors, creditors, potential business partners, and so on. Even “analytical” business professionals like management consultants are often claimed to be merely a tool for upper management to cut through institutional inertia and implement obviously needed changes.

      I sometimes joke that CFOs have a really easy job. They sit around waiting for the CEO to call and tell them the company needs to raise some capital. The CFO then calls up the bank to see where they could borrow at, pulls up what the company’s bonds are yielding on Bloomberg, and does a little Excel voodoo to estimate an implied cost of equity. Then he picks the lowest-costing option and goes back to doing nothing for another six months.

      Obviously that’s just a joke, and one of the important tasks of a CFO is maintaining and communicating the company’s financial and balance sheet plan to stakeholders.

  6. vaniver says:

    The undergrads in this experiment came from Berkeley, and Berkeley undergrads have an average SAT of 1375 = average IQ of 133 (this seems really high, but apparently matches estimates from The Bell Curve and the Brain Size blog).

    Seems too high to me. If they got a 1375 in 2016, they’re 96th percentile which corresponds to an IQ of 126.

    • qt31415926 says:

      Isn’t it possible that the people who take the SAT have a higher IQ than the average population?

      • Douglas Knight says:

        Vaniver’s link accounts for this: 93rd percentile of test takers, 96th percentile of population.

    • vilgothuhn says:

      This always confuses me. The SAT is correlated with IQ, and that correlation is not 1 (it does not measure the exact same construct as an intelligence test, but intelligence strongly affects SAT scores). Shouldn’t you regress toward the mean when you translate SAT to IQ, then? Say the correlation were 0.25 or something; then it’d be absurd to say I’m “in the 96th percentile in IQ” just because I’m in the 96th percentile on the SAT.

      So here’s the thing I’m confused about. You should regress toward the mean based on correlation strength if you’re computing individual scores, but what do you do with group scores? Should you assume the noise just cancels out, and thus just translate percentiles?

      So if anyone statistically savvy knows this I’d be glad to learn.

      • omegaxx says:

        I got interested in this question and did some digging:

        Let X2 (undergrads’ SAT) = normal distribution with mean 1000 and std of 100.
        Let’s say correlation (rho) = 0.3.
        Let X1 (IQ of general population) = normal distribution with mean 100 and std 10.
        Then X1 | X2 (Berkeley undergrads’ IQ conditional on their SAT score of 1375) = normal distribution with mean 100 + (10/100)*0.3*(1375-1000) = 111.25 and variance (1-0.3^2)*10^2 = 91.
        Therefore Berkeley undergrads’ IQ likely has mean = 111.25 and std ≈ 9.5.

        • geoffb says:

          I think you want your parentheses to be like

          The actual rho between IQ and SAT is apparently about 0.82 (see Wiki on SAT).

          The SD for IQ is canonically 15 points.

          The mean SAT score is 1000 because the mean of each subscale is 500, but the standard deviation of 100 is for each SAT subscale, not the overall score. If we supposed the subscales were independent so that variances summed, the overall SAT std would be around sqrt (100^2 + 100^2) = ~140.

          Plugging that in gives a mean IQ of 100 + ((15/140) * 0.82 * (1375 – 1000)) = 133, which is what Scott said.

          This is not a response to the issue about regression to the mean in the parent post, of course, which is still interesting.
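The plug-in above is easy to check. A small sketch using the thread's figures (rho ≈ 0.82 and an SAT std of √(100² + 100²) ≈ 141 are the commenters' assumptions, not numbers from the study):

```python
from math import sqrt

iq_mean, iq_sd = 100, 15
sat_mean, sat_sd = 1000, sqrt(100**2 + 100**2)  # ~141, assuming independent subscales
rho = 0.82

def expected_iq(sat_score):
    # Bivariate-normal regression: E[IQ | SAT] = mu_IQ + rho * (sd_IQ / sd_SAT) * (SAT - mu_SAT)
    return iq_mean + rho * (iq_sd / sat_sd) * (sat_score - sat_mean)

def conditional_sd():
    # Spread remaining after conditioning on SAT: sd_IQ * sqrt(1 - rho^2)
    return iq_sd * sqrt(1 - rho**2)

print(round(expected_iq(1375)))    # → 133, matching the figure in the comment
print(round(conditional_sd(), 1))  # → 8.6
```

Note that the rho factor here is exactly the regression-toward-the-mean correction asked about upthread: the estimate is pulled toward 100 in proportion to how imperfect the correlation is.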

  7. I suspect that your SAT to IQ argument overestimates the IQ of Berkeley students. Berkeley almost certainly does use SAT and does not use IQ in deciding who to admit. So they are selecting for (among many other things) having done better on the SAT than your IQ would predict, whether because the student happened to be lucky on the SAT, was well coached, had intellectual abilities that applied more to the SAT than the IQ, or … .

    Similarly for other arguments with the same structure.

    • Scott Alexander says:

      Granted, but I think SAT is a small enough part of overall admissions that it shouldn’t decrease too far. Besides, the argument doesn’t really change if the average Berkeleyer is IQ 115.

      • Virbie says:

        > Granted, but I think SAT is a small enough part of overall admissions

        I graduated from Berkeley not too long ago: it’s worth noting that for Berkeley, this proportion is higher than you’d otherwise expect. There have been some legal decisions that have limited their ability to use discretion and racial discrimination to choose their student body, so their formula for admission is a lot more comprehensible and standardized than most colleges’ (and thus more dependent on SAT scores than your prior for college admissions probably implies).

    • TheBearsHaveArrived says:

      Well, you are allowed to take the test 3 times and report only your highest score, so that *greatly* minimizes luck if a student feels they performed worse than on their practice tests.

      It looks like there is an asymptote of capability when it comes to study time and test results. Most college-motivated students who do well in high school and care about studying should come close to that peak result. Namely, perhaps the SAT is not a great aptitude test among the general population… but it comes close to one for the motivated people that colleges care about in the first place.

      There are quite a few IQ test traits that are not measured that would probably be useful for colleges, and a lot is related to spatial abilities that matter in art, architecture and engineering. I am skeptical that the traits useful to engineering can even be tested for in this politically correct country, due to its inherent gender gap.

    • Not sure about new versions of the SAT, but the pre-1995 test is a pretty good proxy for IQ. Berkeley is a difficult school, from what I have heard; elite schools exclusively select for IQ, probably to increase prestige.

      Actually, the evidence shows the gains from coaching are very modest:

      The college counselors’ report concludes that, on average, prep courses yield only a modest benefit, “contrary to the claims made by many test-preparation providers.” It found that SAT coaching resulted in about 30 points in score improvement on the SAT, out of a possible 1600, and less than one point out of a possible 36 on the ACT, the other main college-entrance exam, says Derek Briggs, chairman of the research and methodology department at the University of Colorado in Boulder and author of the admissions counselors’ report.

      There is also an element of deception whereby some coaching companies use mock tests that are harder than the actual test, to inflate coaching performance metrics when the student takes the actual test, which is easier.

      To minimize the influence of luck, older versions of the SAT had a 1/4-point penalty for wrong answers.

      • TheBearsHaveArrived says:

        I posted on here that the current SAT has an asymptote of improvement on scores for students. That means that, after some studying, it’s a very, very good IQ test. Or if not a general IQ test, since some measures of intelligence are not tested correctly, at least a very good test of the innate traits it *can* measure. That is of course given the hours of study on the test.

        Ok. About that pre-1995 thing. People seem to complain about the analogy section being removed, but I think that’s a good thing. I remember one question on an analogy section of a major test where I was comparing the angles of the objects and their general geometrical structure, but the answer was actually about comparing the civilizations and usages of the objects. I felt it was total bullshit that I was penalized for that, and in general I prefer the types of questions where you read a long verbal scientific passage with clear and absolute answers, instead of reading some liberal-arts passage where the answers are not actually so clear. They both measure reading comprehension and short-term memory.

      • Matt M says:

        “It found that SAT coaching resulted in about 30 points in score improvement on the SAT, out of a possible 1600”

        This may be technically true, but I feel like it’s misleading. For the average student who is motivated enough to take the prep course, all scores, say, under 1000 probably simply aren’t in play at all. So the test isn’t really scored from 0 to 1600 in any meaningful sense. A 30-point improvement could still be meaningful.

        • Desertopa says:

          I’ve trained for (although I did not end up working at) a position in an SAT prep company, and while this is probably true of the average student receiving one-on-one coaching, most of the coaching the company I trained with did (and, I believe, most other companies do) is done in a group setting, and the standards for these students are often very low.

          The founder of the specific company where I trained was well aware of the research indicating minimal impact of coaching for the test, and made no claim of dramatic gains for recipients of the program. Rather, his perspective was that most of the gains were actually coming at the bottom, from students who lacked some basic test-taking competencies which could be improved with some relatively simple heuristics. Students who’re already competent test-takers, and who have the intelligence to recognize what kinds of strategies the setup of the SAT incentivizes, don’t benefit much.

          • thetitaniumdragon says:

            TBH, I’d imagine he’s correct; if you understand how the test works and how multiple-choice tests work, and know what is going to be on the test, taking the test prep course probably won’t help. It is the people who are clueless who are most likely to benefit – but also probably the least likely to take the test prep courses.

  8. The Nybbler says:

    How certain is the correlation between GMAT and IQ? Certainly when I was in college, the business students were emphatically not considered the brightest, and my impression of the business curriculum is that the classwork isn’t the hard part of it.

  9. TheBearsHaveArrived says:

    Minor critiques and comments——I think your IQ estimate is off.
    Gmat score percentiles:
    IQ percentiles:

    A 740 is at the 97th percentile, which corresponds to a 130 IQ, not a 150 IQ. There would have to be cross-correlations with the SAT, and then with general IQ tests, to see how much to bump up where the bell curve statistics start, or how different the scores are from a standard distribution.
    I am quite amenable to the idea that there is anti-knowledge gained in some types of classes in college. Any blank-slate fields are obvious examples.

    Much of psychology (and sadly psychiatry) is emperor’s-new-clothes stuff. Considering how there was a James Randi-style study into cognitive behavioral therapy that kind of debunked the field, I’m sure there was lots of anti-knowledge gained. Pretty much the bulk of psychoanalysis is anti-training.
    It seems like lots of business books massively overstate how great the insights are of some leaders in the field.
    I could see how someone trained in economics without application could improperly use his/her brain on the cost of goods without taking into account useless status stuff, like certain types of cars. The existence of name-brand clothes that look exactly the same as other goods, but cost 12x as much, is hard to explain.

    One issue with business schools is that they have a much higher spread of cognitive capabilities than other fields in the same colleges. I think it’s because other variables are weighted more.

    (by the way, in many undergrad business programs, the height of the men and the physical attractiveness of the women are better correlated with future financial success in that major than GPA is… I wish I remembered the source)

    • Anon. says:

      97th percentile of people who apply for (post-)graduate degrees.

      • TheBearsHaveArrived says:

        Yes. That’s why I mentioned that the test must be compared with other normalized tests to more closely find the capabilities behind whatever score there is.

        But I *doubt* it’s a 20-point bump. I doubt it’s a 10-point bump, too.

        • Anon. says:

          Average of people with a graduate degree is 125, and that includes all sorts of low-IQ fluffy subjects (which don’t ask for a GMAT).

          20 point bump for GMAT seems reasonable.

          • Desertopa says:

            The far edges of a bell curve drop off faster than the center, though. An IQ of 120 is approximately the top tenth; the top hundredth (a tenth of the tenth) starts at about 135, and the top thousandth at about 147.

            If people who apply for post graduate degrees have an average of about 120 (since not all applicants get accepted,) the 97th percentile among those would be about 145.
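The tail arithmetic can be checked against a normal IQ distribution (mean 100, SD 15); the cutoffs below are straight normal-distribution math, not data about GMAT takers:

```python
from statistics import NormalDist

iq = NormalDist(100, 15)

# IQ cutoffs for the top 10%, top 1%, and top 0.1% of a normal distribution
for top in (0.10, 0.01, 0.001):
    cutoff = iq.inv_cdf(1 - top)
    print(f"top {top:.1%}: IQ >= {cutoff:.0f}")   # ~119, 135, 146
```

So the 135 and ~147 figures above are about right for a pure normal; whether GMAT-takers' 97th percentile maps to the general population's 97th is the separate selection question debated in this thread.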

    • Moon says:

      “It seems like lots of business books massively overstate how great the insights are of some leaders in the field.”

      Yes. A lot of business training is about marketing. So experts who market themselves well are highly regarded. But they may be better at marketing themselves, than at doing whatever else they do.

      And marketing yourself, perhaps the chief skill of business– whether by being tall or pretty or whatever– may not be particularly related to ability to predict experiments.

    • “One issue with business schools is that they have a much higher spread of cognitive capabilities than other fields in the same colleges.”

      Which raises another complication. How were the students who participated in the experiment selected? Perhaps smart business school students were not as interested in volunteering as less smart ones.

    • vilgothuhn says:

      Ok, I'm very interested in this study that “debunked” the field of cognitive behavioral therapy. Would you kindly link it?

      • TheBearsHaveArrived says:

        Well, studies like this have been done before, and it was in a textbook of mine long ago that I don't have anymore.

        What it was, basically, was taking general liberal arts college students who did well, having them brush up on and skim some psychological literature, and then seeing how well their therapy did (vs. those who had supposedly been trained in this really special cognitive/psychoanalytical therapy for years, with clinical experience).

        More or less, the subjects reported the same improvement as the ones under the specially trained therapists did. That's how people like James Randi debunk fields like crystal therapy, where people supposedly need to be in tune with nature, or specially trained, to perform their work.

        And cynically, I think that's the case. If you look at CBT, it looks like a modified blend of a friend giving comfort and a coach's pep talk. Just a bunch of complex words have been added.

        In a way, that's not debunking the idea that some people can find it emotionally useful. But in an important way it is fundamentally debunking it, especially for some of the fantastic claims people make for it.

        I guess the most similar analogy I can give is if some organization claimed that their massage therapy or something was passed down and was super special, with deep insights, and only super duper highly trained people can help, for the small payment of 200 a session. Chiropractors are an example. A skeptic looks at it, selects people with some of the characteristics of the workers and location (male or female, college-educated or not, room simply nicely decorated with calming colors or not) and tells them the very basics of what to do and what not to do.

        It then turns out that as long as an authority figure is there, bell curve statistics take over, with 2 out of 10 people claiming it changed their lives regardless of anything.

        Say that 3 out of 10 people think there is nothing really to it, but don't say anything, for various reasons.

        There are lots of reasons why debunked fields can still exist in large quantities. What societal needs and wants can they fulfill, even if the primary claim doesn't have much to it?

        If you look at various governmental organizations around the world, six straight months of therapy bans you from various fields of work. If messed-up people believe something will help them, then they will volunteer themselves for it. And then whoever has access to who is messed up and who isn't can make various decisions off of that.

        • vilgothuhn says:

          I hope this is not rude, but I was rather hoping for a link. When there's no concrete data it's hard to discuss. My first thought is that meaningful non-inferiority studies need large statistical power in a way that I suspect the studies you talk about don't have. But that's just empty speculation; it's all I have right now, since I'm discussing a study I haven't read.

          My second reflection is: what did they treat? Specific phobia? I'd put my personal confidence in CBT (that is, exposure therapy) for specific phobias being effective around 99%. But the treatment is also easy to administer. Panic disorder and social anxiety are probably also treatable with CBT. Depression, not sure. Bipolar disorder, probably not. Trichotillomania, no idea?

          My third thought is: ok, how would you explain mediation studies showing that changes in the things that are theoretically important in the psychological models of social anxiety disorder (safety behaviors and self-focus) are correlated with symptom reduction?

          It seems that the claim that CBT has been debunked is overly general. Which part of CBT: the parts about operant conditioning being a thing, or the parts where I have “core beliefs” that lead to catastrophic interpretations? I think there are tons of criticisms against CBT, but I find it hard to believe that “the field” has been “debunked”.

          • TheBearsHaveArrived says:

            This is a major criticism I have of the field of psychology, and it's a way the field protects itself from criticism. And it's a way lots of fields protect themselves. It's what happens when fractal wrongness* is combined with some truths mixed in.

            By making CBT so giant and vague, and making all forms of self-improvement somehow go under it, the field shielded it from specific claims against it. It's a tactic used by cold readers.

            Just look at the first definition of it you can find:

            “Cognitive behavioral therapy (CBT) is a short-term, goal-oriented psychotherapy treatment that takes a hands-on, practical approach to problem-solving. Its goal is to change patterns of thinking or behavior that are behind people’s difficulties, and so change the way they feel.”

            “When compared to psychotropic medications, review studies have found CBT-alone to be as effective for treating less severe forms of depression and anxiety, posttraumatic stress disorder (PTSD), tics, substance abuse (with the exception of opioid use disorder), eating disorders, and borderline personality disorder, and it is often recommended in combination with medications for treating other conditions, such as severe obsessive compulsive disorder (OCD) and major depression, opioid addiction, bipolar, and psychotic disorders.[1]”

            Just like how Seroquel is now prescribed for… virtually everything (John Oliver did a good take on it), so is CBT. That should say something very, very bad about a field's fundamental honesty. And it also says something bad about how CBT is defined in the first place.

            You can see something similar with autism. What used to be 4 (or 10) sub-diagnoses all got wrapped up into one vague umbrella, and now there is a national crisis about autism that ends up close to fractally wrong, with some circles of truth thrown in.

            The linked article makes an excellent statement on how the field strangely protects its antidepressant treatments with biased depression tests in the first place. You can see how the Hamilton-D can lead to fractal wrongness in depression talk and treatment, and splits a good observation into a million qualifications.

          • Moon says:

            TheBears, there are numerous issues here, from how Medicare and Medicaid are dispensed and how that works or doesn’t, to how payment for doctors and mental health professionals works– in that they have to make a diagnosis on the list of ICD-9 diagnoses in order to get paid.

            Another issue: If you don’t want mental health professionals to use CBT as the treatment for “less severe forms of depression and anxiety, posttraumatic stress disorder (PTSD), tics, substance abuse (with the exception of opioid use disorder), eating disorders, and borderline personality disorder, and it is often recommended in combination with medications for treating other conditions, such as severe obsessive compulsive disorder (OCD) and major depression, opioid addiction, bipolar, and psychotic disorders.[1]”– conditions for which CBT is said to work as well as medications, then..

            Okay, so let’s not do that. So what better treatment can you suggest that mental health professionals should use instead? You’re a psychiatrist or a psychologist or social worker. Someone walks into your office, and you find that they meet the diagnostic criteria for one of these diagnoses. What are you going to do instead of CBT?

            Certainly over time we might come up with better treatments. Scientists might do research to find those. If they can, they should.

            But such are the issues in fields which are practical– not just scientific. Something needs to be done, or at least tried, in those cases where diagnoses do apply, during the time while we are waiting for researchers to come up with the answer to what the most effective treatment is.

            And, of course, if the Medicaid system needs to be fixed, then we ought to fix it.

            I also question this part of the article you linked to above.

            “But, surprisingly, we have a lot of poor people in the country. The government has found a way to transfer to them just enough money and services to keep them from rioting, without calling it a transfer, without calling it socialism.”

            Earlier in the article, the writer mentioned that the Medicaid recipient may go to Eaton, or may be the son of a restaurant owner who didn’t want to pay for insurance. Or may be your 25 year old unemployed actor son, who lives rent-free in your house and spends your money on chest waxing and self-tanning cream.

            None of these people sound as if they are poor, or as if they would likely be rioting if they were not able to receive Medicaid.

            So was that explanation at the end of the article just a standard Libertarian explanation for every bad thing– that it’s socialism and that it benefits the poor? Because what the author described in the rest of the article sounds like something that benefits those members of the middle class who know how to work the system.

          • TheBearsHaveArrived says:

            First off, the fractal wrongness part.
            “conditions for which CBT is said to work as well as medications, then..”

            Some combination of the inherent flaws of the Hamilton-D (and related tests), studies not counting drop-out rates (as AA does), and already-known corruption in the industry makes any comparison claiming it works as well as medications suspect. That “as well as medications” comparison is hard to do in an industry like this. The fact that we can't see at least half the studies done on multiple drugs (and neither can your doctor) without the resources to file a Freedom of Information request (which they probably won't answer unless you have the kind of pull a Harvard professor has) makes pinning down anything horrible nearly impossible, and on purpose.

            Also, part of my critique of CBT is how vague it became (and, I suppose, how it started). It seems part of an industry-wide trend to approve therapies and meds for… as much as possible, for financial gain, and to try to hide all the faults with statistical malfeasance and a certain type of verbal complexity added to mask the flaws. Depending on which professor or practitioner defines it today, it's a specific type of therapy only useful in category A12 on population subtype 13 (which may actually be true!… given usage of CBT subcategory 123C), or it might as well be “The Secret,” where you just need to change your beliefs and things become true.

            “Something needs to be done, or at least tried, in those cases where diagnoses do apply, during the time while we are waiting for researchers to come up with the answer to what the most effective treatment is.”

            No. Merely because there is a diagnosis and an official treatment plan doesn't mean the diagnosis and treatment don't have a million faults that end up doing more harm, or are a waste of time and money. The case of SSRIs and teenagers is an apt example. They ended up being banned in multiple European countries due to distrust of pharma statistics and likely worsening of long-term outcomes.

            The gold-standard Ham-D is utterly awful, full of flaws and clear biases (the guilt measures for 3 and 4 read like schizophrenia; sedation and eating more food help the score; it's rigged from the start to help neuroleptics pass official depression tests; it has only one pure depression/sadness question, and even that is rigged toward those with outward displays of it; the fact that paranoid symptoms sit on the gold-standard depression test should be a dead giveaway).

            *The fact that I can knock 12 points off a Hamilton Depression scale with an Ambien and BID Krispy Kreme should serve as a warning about the validity and generalizability of the term “antidepressant.”*

            Due to that, the test clearly allows people to get worse in many ways while officially decreasing depression scores. Explains the teenager result.

            As for his commentary on Medicaid in some cases, the writer never said it was a bad thing. It's just the way it is. When there aren't enough jobs, and it's all being automated away (McDonald's just got rid of tons of their front-service jobs), and the population rejects welfare due to some supposed American principle, where else is there to go but SSI? Prison, the streets?

          • Moon says:

            >The fact that I can knock 12 points off a Hamilton Depression scale with an Ambien and BID Krispy Kream should serve as a warning about the validity and generalizability of the term “antidepressant.”

            Not necessarily. Ambien is for sleep. Insomnia is common among depressed people and it may possibly even cause the depression.

            Also, “depression frequently co-occurs with other psychiatric problems. The 1990–92 National Comorbidity Survey (US) reports that half of those with major depression also have lifetime anxiety and its associated disorders such as generalized anxiety disorder.”


            So it is not as easy as one might guess, to separate one mental disorder from another.

            And even “pure” depression isn’t just sadness. There are numerous other symptoms. Here is another quote from the wikipedia article linked above:

            “Depressed people may be preoccupied with, or ruminate over, thoughts and feelings of worthlessness, inappropriate guilt or regret, helplessness, hopelessness, and self-hatred.[21] In severe cases, depressed people may have symptoms of psychosis. These symptoms include delusions or, less commonly, hallucinations, usually unpleasant.[22] Other symptoms of depression include poor concentration and memory (especially in those with melancholic or psychotic features),[23] withdrawal from social situations and activities, reduced sex drive, irritability,[24] and thoughts of death or suicide. Insomnia is common among the depressed. ”

            This is not to say that there could not be improvement in diagnosis and treatment of depression. Clearly there can be, and there needs to be. It’s just that the diagnosis and treatment of depression are more complex than they might initially appear to be.

          • vilgothuhn says:


            “By making CBT so giant and vague, and making all forms of self-improvement somehow go under there, then it just shielded itself from specific claims against it. Its a tactic used by cold-readers.”

            You don't just throw CBT at problems. You use specific treatment manuals that have been tested in RCTs. Thus the comparison to Seroquel doesn't hold: it's not actually the same thing being prescribed. I agree that CBT as a term is vague, but that's also my criticism of your criticism. Saying CBT has been debunked seems too general. What specifically are you arguing has been debunked?
            (Also, not to be rude, but what is the n of the studies in question? Because I've gathered that a lot of dodo-bird-verdict stuff may actually be about underpowered comparisons.)

            The comparison to cold reading seems unfair to me; psychiatry/clinical psychology is theoretically plausible and practiced by people who actually want to help. Cold reading is practiced by quacks. Having vague/broad/bad definitions of terms =/= being a quack. Which I feel is the association you're trying to establish.

        • Moon says:

          >If you look at various governmental organizations around the world, six straight months of therapy bans you from various fields of work.

          I had not heard that. Unfortunate, if so.

          >If messed up people believe something will help them, then they will volunteer themselves for it. And then whoever has access to who is messed up and who isn’t can make various decisions off of that.

          Going to therapy doesn't necessarily mean you are more messed up than people who do not go to therapy. Many people do not have the flexibility and stability to be good candidates for therapy, and they are incapable of making the changes that people go to therapy to make. And many people who go to therapy do so for personal growth purposes, and may be much more functional than the typical person who has never been to a psychotherapist.

  10. Moon says:

    And then there's the possibility that unless the prediction is in a subarea of your field that you are highly familiar with, there is no reason why you should be any better than anyone else at predicting the results of an experiment.

    In fact, is not the whole point of science to do experiments in order to gain knowledge, rather than to believe you can train yourself to predict the results of an experiment that hasn’t been completed yet?

    • Winter Shaker says:

      This seems relevant. More generally, ‘making successful predictions’ is a big part of science, even if ‘figuring out tests that would disconfirm them if false’ is the part that often gets emphasised, because there are so many more possible wrong hypotheses than right ones that you pay a huge price in terms of human effort spent if you can’t narrow down the range of options and end up testing everything you can think of.

      Sure, believing you can train yourself to predict the results of experiments if you can’t is a serious failure mode, but seeking to find out to what degree people can train themselves to be better predictors is still a prima facie worthwhile project.

  11. Moon says:

    As far as election prediction is concerned, it’s similar to other types of prediction for money. In a capitalist society, people sell what sells. Therefore, they sell what people want, regardless of whether it exists or not. So “expert systems” to predict the stock market or to show you how to make a billion in real estate are sold, because people want such things, not because they really exist.

    Kind of like the old saying, “There's a sucker born every minute – and two out to get him.”

    People don't like uncertainty. So they want to buy certainty. So other people sell them certainty and predictions. The predictors may try their best. But that doesn't mean that the event can actually be predicted. It just means that “the market will bear it,” or will pay for it. The “free market” pays for a lot of stuff that doesn't actually exist.

    Perhaps the purest form of this is the selling of the videos, books, workshops etc. on The Secret. It’s essentially selling people on the belief that wishes are horses and that they can be riding on them tomorrow– that thinking/wishing makes it so, if you only believe it enough. People who have not been exposed to this would be astounded to find out how many hundreds of millions (at least) of dollars have been made off of this scam. After a few years of it never working, then new courses started coming out instructing people in the missing clues to how to make The Secret work– that if you only had this new bit of info, the method would start working perfectly.

    I don't know if there are any physicists here. But physics has become a great marketing tool in recent years. Practically everyone selling some kind of medical, personal-growth, or business-method snake oil says “This works just like quantum physics.” Of course, many still use the old metaphor “It works just like software on your computer, only it's the software of your mind that works this way.”

    • Moon says:

      A capitalist society will bear a huge amount of waste and dysfunction, and huge amounts of deception– e.g. false advertising lies going on successfully for generations on end. And tons of things being sold that do not actually exist.

      Homeopathy is Scott’s favorite example, but there are many others. Sales of fake news by up and coming soon-to-be-wealthy teenagers in Macedonia is another. Theoretically the free market is supposed to correct all of this waste and dysfunction and deception. What happened?

      • CatCube says:

        No, the free market does not correct for all the waste, dysfunction, and deception, nor do most proponents think that it will. It just still does better than non-free market systems in spite of it.

        • Moon says:


          • The Nybbler says:


          • For some evidence, compare Venezuela to Chile.

          • Aapje says:


            Which pure free market systems are you thinking of?

            All the capitalist countries that I know are not purely free market systems.

            For some evidence, compare Venezuela to Chile.

            Chile does not have a pure free market.

            I’ll gladly grant you that a moderate left-wing government like in Chile does a better job than an authoritarian, corrupt and anti-capitalist government like in Venezuela.

            However, this does not prove the claim.

          • keranih says:

            Venezuela wasn't a pure communist society either, and to hear the apologists tell it, we've never actually tried a pure society (of any stripe).

            I think we’ll need to be accepting of [X] degree of failure, and look for the systems that reduce X to as low as possible, whilst keeping the other downsides minimized.

          • Chile was and is much more free market than Venezuela.

            West Germany was much more free market than East Germany.

            Taiwan and Hong Kong were much more free market than Maoist China, and current China is much more free market than Maoist China was.

            South Korea is much more free market than North Korea.

            We don’t have any pure free market societies or pure non-free market societies to compare. But there is a pretty striking pattern if you compare the more free market country of an otherwise moderately similar pair with the less free market country.

          • Moon says:

            That's correct if there are only 2 possibilities: free market vs. Communist. But that's not a very realistic comparison for Americans, almost none of whom want a Communist economic system.

            If you look at regulated capitalist markets vs. inadequately regulated capitalist markets, regulation is capable of cutting down on fraud and deception a lot, if the government steers regulation in that direction.

          • Murphy says:


            While I personally agree with the thesis that free markets work better I don’t think this method of proving it is epistemologically safe.

            From Scott's review of Red Plenty I remember this section:

            “One of the book's most frequently-hammered-in points was that there was a brief moment, back during the 1950s, when everything seemed to be going right for Russia. Its year-on-year GDP growth (as estimated by impartial outside observers) was somewhere between 7 to 10%. Starvation was going down. Luxuries were going up. Kantorovich was fixing entire industries with his linear programming methods.”

            If we were having the same discussion at the right point in history, someone arguing against free markets could point to Russia, could point to Sputnik, etc., and compare it to some capitalist failed states from around that time.

          • Aapje says:

            The 1950s were a time of rebuilding in Europe. When industry is in ruins, you can get very good results by rationing the consumers and diverting most resources to rebuilding the supply side. There were plenty of entrepreneurs that were successful before the war and it’s pretty safe to just let them build what they want (with generous loans), rather than depend on the ‘invisible hand of the market,’ which is pretty broken in such a situation.

            Note that even in the UK, rationing happened until 1954.

            Russia started failing when this period was over and they stuck with central planning, in a situation where innovation, productivity growth and such became important.

            From my perspective, economic discussions are often limited to sweeping statements, like ‘free market good,’ ‘capitalism causes poverty’ or ‘austerity is good/bad,’ which totally ignore the true complexity of the situation (like interactions with the environment, or the fact that only non-pure systems have been shown to work).

            The worst part is that central bankers/IMF/etc seem to operate with similar basic models and keep causing disasters and then…do the same thing again.

          • “Its year-on-year GDP growth (as estimated by impartial outside observers) was somewhere between 7 to 10%.”

            Is that consistent with what we now know about growth during that period? Were the “impartial outside observers” basing their views on the official Soviet statistics?

            My understanding is that Warren Nutter produced much lower estimates of Soviet growth than other economists studying the Soviet Union, using indirect measures instead of Soviet statistics, and that it turned out that even his estimates were high. Whether that’s true for the period described here I don’t know.

            Along similar lines, Samuelson’s textbook for many years claimed the Soviet Union was growing much faster than the U.S.–but edition after edition showed about the same ratio of GNP. The point at which Samuelson predicted that Soviet income would pass the U.S. figure is now well in the past.

          • Murphy says:


            Given that Russia managed to go from a screwed-up agrarian economy far, far behind the US to beating the US into space and being a superpower for decades, it seems reasonable to assume that they had at least some periods of very significant growth just to come close to catching up, and growth rates of 10% or more are not that unusual for economies playing catch-up under even vaguely competent leadership.

            Of course that growth stalled, probably because the lack of good internal markets borked things eventually. But my point isn't that: my point is that you can't just point to a bunch of cherry-picked countries and say “look, North Korea bad, South Korea good,” because that strategy can be used almost as easily to “prove” false things as to prove true things.

          • A government run system can focus resources on what matters to the government. For the USSR that mostly meant military, plus projects that gave them prestige, such as space, chess, and sports. But while it was going into space, the bulk of the population was remaining in third world conditions–dentistry without novocaine, a family living in a single room, and the like. The book The Russians by Hedrick Smith, who was the NYT bureau chief in Moscow, gives a pretty striking picture.

            I don’t think my examples are cherry picked. I gave the three cases where you had a common country/culture that had divided along communist/capitalist lines–Germany, China and Korea. The differences in outcomes were not small.

            Also one case of two Spanish American countries, one much less capitalist than the average (Venezuela), one somewhat more (Chile). And one of a country, China, divided in time not space. When Mao was in power, commenters on the left mostly spoke approvingly of his economic policies–when he died The Economist, not a left wing source, credited him with ending famine in China. By 2010, per capita income had gone up about twenty fold since his death–not a small change.

          • rlms says:

            Isn't living in a single room (rather than on the streets), and any kind of dentistry at all, the kind of thing that sweatshop workers are supposed to be grateful the glories of global capitalism have brought them? Or is that different?

          • nimim.k.m. says:


            Yet, as already mentioned, today mostly everyone (important) agrees that centralized authoritarian socialism isn't an optimal form of society (except perhaps in a total war); North Korea or East Germany (or any client state in the Soviet empire) aren't forms of government and economy that are considered viable alternatives. (My hypothesis is that the main problem with that form of government is the tendency to converge to rule by authoritarian leaders surrounded by yes-men, and other similar effects that are poison to any seeds of the fair and just and evidence-based policy also known as good governance.)

            Instead of Venezuela, people are arguing for Sweden, or the traditional practical policies enacted in Western European countries (including West Germany) in general, against the “theoretical ideal” free market. (Or, to restrict the argument to a more modest scenario… in Moon's original examples, maybe one could argue for a policy like regulated advertisement of health products.) The extreme failure modes of an extreme free market in some domains are different enough from the waste generated by a regulated economy that one can't feasibly rank them on a one-dimensional line from “good” to “bad”.

          • Murphy says:


            A “pure” free market shares a lot of properties with some variants of cellular automata (with some nodes having distant connections, or broadcast abilities of varying power).

            In a traditional command economy, meanwhile, you've got some central agent or agents trying to sort out all the local problems, and typically failing versus the local actors, since you don't have to be very good to solve a problem better than the brainpower of even the world's greatest genius applied to your problems for a few seconds, with less info than you have.

            I do, however, have a feeling that there are almost certainly other models which would beat both, since cellular automata are rarely even close to an optimal solution to most problems.

            Various charities and companies have also shown that central planning can be combined with local markets using token systems to get quite good results.

          • IrishDude says:

            @nimim.k.m Comparing median income, adjusted for purchasing power, would make Sweden poorer than all but a few U.S. states:

            My understanding is Sweden, as with other Nordic countries, has relatively free markets, but has high taxation and extensive wealth redistribution.

            I do like that Sweden has partially privatized its social security.

          • Aapje says:

            That comparison ignores that Americans work considerably more hours (1,790 vs. 1,612 hours per year, according to the OECD in 2015). It's clear that low-welfare societies force their people to work many hours, which boosts the average income but also causes severe problems (health problems, bad parenting, etc.). If you work fewer hours, you also need less money, as you have more time to do things yourself that you would otherwise have to outsource. A flaw of GDP/income comparisons is that nothing is counted when people do things themselves rather than buying from others.

            I think that a good argument can be made that average income doesn’t directly map on how well a society functions and that a (somewhat) lower average income can result in a far better functioning society. For example, Sweden scores better on most performance indicators than the US.

            If you gave me the choice to reincarnate as a random American or Swede, I will pick the latter in a heartbeat.

          • IrishDude says:

            Sure GDP doesn’t capture everything about what the ‘good life’ is, but it does provide a good short-hand comparison for standard of living; what things can people buy with their income. The median American can buy more stuff than the median Swede.

            As to ‘low welfare societies force their people to work many hours’, I don’t know what definition of force you’re using there, and who’s doing the forcing. Certainly, Americans could work less and have more leisure time and less stuff, but given their choices, they seem to prefer working 10% more than Swedes. Still, the number of hours worked by Americans has been on the decline for several decades.

            Personally, I’m working harder and longer now so that I can enjoy more and better leisure later. I don’t think there’s a right answer for work/leisure balance as it comes down to varied preferences.

            I think I could live a good life in Sweden or the U.S., but for a variety of reasons I prefer the U.S. (note I say this only based on what I’ve read about Sweden, and not about living there).

          • IrishDude says:

            I meant to say median income instead of GDP in my above post.

          • “Instead of Venezuela, people are arguing for Sweden”

            The Scandinavian countries are free market welfare states. The economic system is at least as close to laissez-faire as in the U.S., if anything closer–although of course less close than I would want.

            Sweden in particular, as discussed here recently, went from being relatively poor among European states in the 19th century to being one of the richest during a period when it was considerably more free market than the European average. Since it developed an extensive welfare system, its relative standing has gone down, although it’s still doing pretty well.

          • “If you gave me the choice to reincarnate as a random American or Swede, I will pick the latter in a heartbeat.”

            You are correct, of course, that a statistic such as per capita income is only an approximate measure of how well off people are, for the reasons you point out and others. We can’t do experiments on reincarnation, but we can observe migration flows as some evidence of which societies people prefer, though that too is imperfect for a variety of reasons.

            I believe Hayek commented somewhere, with regard to his decision to leave Austria, that for him as an established adult the U.K. was an attractive destination, but that if he had been in the situation of sending his children somewhere without him the U.S. would have been more attractive. His point, as I recall it, was that he thought the U.S. was a more fluid system, in which someone coming in with essentially no resources was more able to prosper.

          • Aapje says:


            As to ‘low welfare societies force their people to work many hours’, I don’t know what definition of force you’re using there, and who’s doing the forcing.

            One example is that PRWORA requires that welfare recipients work. When the alternative to accepting work is dropping below a ‘living wage,’ I would argue that it can be called a lack of realistic choice (aka ‘being forced’).

            Certainly, Americans could work less and have more leisure time and less stuff, but given their choices, they seem to prefer working 10% more than Swedes.

            You cannot equate choices with preferences, when people are pushed into certain choices. Unless you want to argue that poor people prefer to live in crime-ridden neighborhoods, just prefer the badly paid and unpleasant jobs that they are qualified for over the much more pleasant jobs that they are unqualified for, etc.

            It’s a rather basic truth that people choose between the options they have, not between the options they like to have. Equating choice with preference means that you can’t believe that capitalism works, IMO (and by that, I don’t mean not working well, but not working at all).


            We can’t do experiments on reincarnation, but we can observe migration flows as some evidence of which societies people prefer, although again imperfect for a variety of reasons.

            Migrants are clearly atypical people. For one, their willingness to migrate selects for risk-takers, hard workers, healthy people, etc. Those people obviously tend to prefer policies that benefit people like them.

            Libertarians seem to have similar characteristics and similarly selfish preferences for policies that benefit people like them. IMHO, one of the main libertarian failure modes is to take that preference for things that they think would make them happy and pretend that this would make the average person happier.

            His point, as I recall it, was that he thought the U.S. was a more fluid system, in which someone coming in with essentially no resources was more able to prosper.

            He migrated in 1931 (!) when Austria was still a very rigid class based society. This was before the welfare states were built up in Europe to counter the communist threat, before the Thatcher reforms in the UK, etc. As such, his statements are completely outdated.

            Social mobility is very poor in the US compared to Sweden, based on actual contemporary statistics, rather than 1931 anecdotes.

          • IrishDude says:


            One example is that PRWORA requires that welfare recipients work. When the alternative to accepting work is dropping below a ‘living wage,’ I would argue that it can be called a lack of realistic choice (aka ‘being forced’).

            I don’t think telling someone that you’ll help them out, but conditionally, means that you’re forcing them into accepting the condition. It is a realistic choice to not accept welfare and the work condition that comes with it. Unpleasant choices are still choices. I make unpleasant choices all the time.

            Anyway, I don’t know how much the working welfare population influences the America-vs.-Sweden annual working hours statistic, as I’d assume it’s mostly driven by our middle class working more than theirs. I could be wrong though. Perhaps none of the Swedish poor work because their welfare benefits are so generous, and that would likely have some impact on the average annual hours worked statistic. If that’s the case, I feel bad for the Swedes who are subsidizing poor people’s leisure time.

            You cannot equate choices with preferences, when people are pushed into certain choices. Unless you want to argue that poor people prefer to live in crime-ridden neighborhoods, just prefer the badly paid and unpleasant jobs that they are qualified for over the much more pleasant jobs that they are unqualified for, etc.

            You can have preferences within constraints, though the poorer you are the more constrained you are. I’d prefer a Maserati, if I had the money, but given my income I prefer a Honda among my more limited options.

            I don’t think it’s much of a stretch to say that people in the U.S. could on average choose to work 10% less and have a standard of living 10% lower. Live in less expensive housing, eat out less, buy store-brand products, take fewer vacations, buy one less TV set, buy a cheaper car, etc. Their revealed preference seems to show they want more stuff rather than more leisure, compared to the Swedes. I can’t say their preferences are wrong.

          • IrishDude says:

            Re: the social mobility link you cite, and the IGE statistic they use, there is good discussion here:

            “As noted by Julia Isaacs, currently a senior fellow at the Urban Institute, because most of the cross-country studies use this father–son IGE measure to illustrate relative mobility, they “ignore the question of cross-country differences in absolute mobility, that is, the likelihood that individuals in a given country will have higher standards of living than their parents due to national rates of economic growth.”[16] Unfortunately, comparable longitudinal data needed to compute absolute mobility are not widely available for most countries.

            Thus, putting aside the inherent issues involved in drawing conclusions from incomplete data, when looked at through one mobility definition lens, the available data suggest that America may not be as “relatively mobile” as some European nations are, but looked at through another lens, the United States boasts a great degree of absolute economic mobility and may indeed be the land of opportunity. However, it is still very difficult to draw even these conclusions definitively without more and better data on intergenerational and intragenerational mobility.

            In any case, when examining mobility between income quintiles over time in different countries, it is important to consider two questions:

            Are we mostly concerned with how much better off children are when compared with their parents?

            Or are we mostly concerned with how much better off children are relative to the progress made by the children of other households?

            To frame this issue in a larger context of opportunity and mobility, it is thus always important to distinguish between “upward absolute mobility” and “relative mobility.” The key distinction is that upward absolute mobility is possible with no downward counterpart, whereas relative mobility is never upward or downward unless examining a strict subset of the population and requiring that someone else be considered as going relatively “up” or “down” even if their circumstances stay the same.

            That bears repeating. It means that an entire country cannot experience upward relative mobility. This is critical because American policymakers over the decades have generally sought to create the economic conditions in which upward mobility is available to everyone rather than focusing on relative positions. Policymakers in some other countries, by contrast, focus more on relative positions than they do on general income growth.”

          • Aapje says:


            It is a realistic choice to not accept welfare and the work condition that comes with it.

            That is not a very realistic choice for many on the bottom, unless they choose a life of crime or such (which has its own major downsides). People do need to eat, have a roof over their head, etc.

            Perhaps none of the Swedish poor work because their welfare benefits are so generous

            EU welfare states do tend to place demands on welfare recipients, but usually less stringent than in the US, where I believe that the policy is driven too much by ressentiment, rather than policy that is chosen because it actually works.

            Anyway, I don’t know how much the working welfare population influences the American vs. Sweden annual working hours statistic, as I’d assume it’s mostly driven by our middle class working more than theirs.

            I agree, there is surely a more general tendency to work more hours. I would argue that Americans probably have high-expenditure cultural norms that actually provide fairly little happiness, like wanting to live in a big house.

            Their revealed preference seems to show they want more stuff rather than more leisure, compared to the Swedes. I can’t say their preferences are wrong.

            I would argue that most people seek happiness, but that they are often told by society what ought to make them happy. So their choices are heavily influenced by social norms, rather than a correct assessment.

            International comparisons do tend to show lower happiness for Americans than for Swedes.

            It means that an entire country cannot experience upward relative mobility.

            You are avoiding addressing my argument by focusing on an entirely different question. A meritocracy will have substantial upward and downward movement, as outcomes are greatly determined by the skills that people have. A class-based society has the opposite.

            It seems clear to me that a society where some people have very little access to higher positions if they were born poor, will result in major unhappiness and destabilizing forces (like votes for a strong leader that promises to fight the elite).

    • eyeballfrog says:

      Physicist here. I’ve never heard people sell things as “just like quantum physics”, but I assure you that unless they’re talking about quantum physics they’re wrong.

    • thetitaniumdragon says:

      FiveThirtyEight put it at about 2:1 for Clinton:Trump. That Trump won was not, in that analysis, terribly surprising, and he only won by the barest of margins, in part due to voter suppression efforts. And indeed, he lost the popular vote by a wide margin. Only about 100,000 votes in a handful of states determined the election.

      The people who put it at 90%+ for Clinton were simply not looking at the data and being honest about it.

  12. Dr Dealgood says:

    If an average psychology professor has a >130 IQ I will eat his tweed jacket.

    Looking at the brain size blog, it looks even worse. He took a rhetorical flourish (“Fewer than one in a thousand individuals in our society has the privilege, the freedom, to pursue their own ideas and creations.”) and treated it as though it were a p-value of 0.001. Then he plugged that in with his correlation coefficient and spat out a totally implausible but certainly flattering 133 IQ.

    You can’t do that. What if Dr Hsu had said “fewer than one in a hundred” or “fewer than one in a million” instead? The IQ of an entire profession shouldn’t fluctuate between 125 and 146 based on authorial whim!

    Look, this is why you need to sanity-check results. I probably wouldn’t have noticed this florid-prose-based psychometric analysis if it hadn’t spit out a totally unbelievable answer. It’s very important not to just trust that people online know what they’re doing.

    • Scott Alexander says:

      Yeah, okay, that was legitimately really bad and I should have caught it.

      But I disagree that psych professors are < 130. Remember, the average Berkeley undergrad is somewhere between 126 and 133. Psych professors probably come from good schools, and they're probably smarter than the average undergrad. 130 seems if anything an underestimate.

      • Silverlock says:

        “That same Brain Size post proposes that the average professor has an IQ of 133, but I would expect psychology/economics professors to be higher, . . .”


      • Moon says:

        Many psychology graduate programs accept a far smaller percentage of their applicants than med schools do. It’s not easy to get into a psychology graduate program.

    • An IQ of 130 is not that uncommon. It would not surprise me if the majority of social science profs have IQs in the 120-140 range.

      • alwhite says:

        In my understanding, 130 is about 1 in 50 people. Is this what you mean by not uncommon?

        • I guess it depends on how you define rarity.

        • StellaAthena says:

          It can’t be /that/ uncommon… Every room I am in seems to have one! /s

          In my experience, higher percentile people (talking pretty generally, not just about IQ) tend to underrate the rarity of their existence rather consistently.

          • baconbacon says:

            Since we are using SAT scores as a proxy: I got a 1350 and my wife got a 1500. I spent two years somewhat recently working in a couple of bakeries, and my wife works for a tech company. I don’t think that any individual working at the first bakery was within a single standard deviation of my SAT score, and possibly not within two (including the owners). At the second bakery there was one person I would have estimated within my range, but that is difficult to judge because she was Korean and the language barrier confuses things. The owner of the second was French and probably quite bright, but he also thought managing meant yelling, which often made him appear not bright (and also made people want to put one over on him, which leads to feelings of superiority for many).

            My wife, on the other hand, went to a high-end private school for high school, where she was probably only a little above average, and now works in a field with plenty of above-average people. She thinks she is far less rare than I do (to the extent that she probably puts her own uniqueness lower than my actual estimate), probably because all her life she has been a little above average, whereas I have many experiences with people who didn’t or almost didn’t graduate high school.

          • “In my experience, higher percentile people (talking pretty generally, not just about IQ) tend to underrate the rarity of their existence rather consistently.”

            It’s the same reason that people overestimate population density. You base your estimate on what you see, and you tend to be in places where there are people.

          • thetitaniumdragon says:

            The thing is, 1 in 50 is not very rare at all. Sure, if you’re taking a random population sample, that’s uncommon, but if you’re dealing with society, if you’re 1 in 50, there are 6.4 million of you.

            That said, I suspect that the distribution of professors’ IQs is slanted; you probably see a sort of hump below 130 with a long tail above it.
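As a quick check on the arithmetic in this subthread: under the conventional IQ scaling (mean 100, SD 15), an IQ of 130 is exactly two standard deviations up, which is a bit rarer than 1 in 50. A small standard-library sketch (the ~320 million US population figure is my assumption; "1 in 50" roughly corresponds to SD-16 norms or rounding):

```python
from statistics import NormalDist

# IQ is conventionally normed to mean 100, SD 15 (some older tests used SD 16),
# so an IQ of 130 sits exactly two standard deviations above the mean.
iq = NormalDist(mu=100, sigma=15)

tail = 1 - iq.cdf(130)  # fraction of the population at or above 130
print(f"P(IQ >= 130) = {tail:.4f}")       # ~0.0228
print(f"roughly 1 in {round(1 / tail)}")  # ~1 in 44

# Scaled to an assumed US population of ~320 million:
print(f"~{320e6 * tail / 1e6:.1f} million people")  # ~7.3 million
```

So "1 in 50" slightly understates the rarity under SD-15 norms, but either way the absolute number of such people is in the millions, which is thetitaniumdragon's point.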

  13. pdbarnlsey says:

    My field is behavioural-economics-adjacent, and based on that experience I’d have thought this task was amenable to (sufficiently specialised) expertise.

    There’s a big, complicated literature on the effects of incentives (as well as a few recent high profile failures to replicate) and at least a couple of ideological camps built around the “incentives matter” and “psychology matters” points of view.

    If you know that literature (or spent 10 to 60 minutes going through it starting from a decent knowledge base) you ought to be able to come up with a fairly stable ranking of these sorts of incentive structures. But I’m a PhD in a related field, have read a bit on the area and did a series of job interviews relating fairly closely to this about 9 months ago, and I would be relying on instinct and judgement rather than expertise if I performed this exercise cold.

    So I’d say either this is a task which requires very specific expertise, or the whole area is much more context-dependent than anyone realises, and the literature from, say, share trading fails to generalise to Mechanical Turk tasks because incentives are super contextual.

    • Scott Alexander says:

      I’m curious: did you look at the incentive structures (which are mentioned in the paper) and see how well you did?

      • pdbarnlsey says:

        Not yet – I thought my comment would be of more value if I wasn’t carrying any baggage.

        I’ll have a look now, but my meta-prediction is that I know where I’d look for the answers (I may even be wrong on this) but not what the answers are, because the literature resists a catchy summary.

        • pdbarnlsey says:

          ok, I made an attempt at ranking them, rather than calculating actual numbers of clicks, since I didn’t have a good mental picture of the length of the experiment and how many clicks might be possible.

          I did just ok. I greatly overrated the appeal of charity across small amounts – participants appear to prefer 1 cent for themselves to 10 cents to charity, and come close to preferring 0.1 cents for themselves. This may be a blind spot for me, or may be an artefact of the mechanical turk workforce.

          I correctly ranked the conditions within each category, which was trivial for some and nontrivial for others, except for slotting in the reminder of the task’s significance one level too high.

          Slotting the different groups together was harder, though I was broadly successful.

          I didn’t expect the low risk aversion condition to be outweighed by a four week delay in payment, and thought the high risk aversion condition would have more of an impact than it did. These are the closest to my area of expertise, so this is embarrassing and perhaps significant.

  14. Mark V Anderson says:

    One study!! It’s fun to talk about reasons for results, and to make fun of MBA students, but this really means nothing. Find 10-20 more studies with the same results and we’ll have something worth discussing.

    • TheBearsHaveArrived says:

      I dislike that statement.

      One properly done and well controlled study with thought and effort put into it beats 1000 flawed ones, or for that matter an infinity of bad ones.

      The problem with analyzing this study is the same as with so many academic studies.

      Lots of needlessly complex mathematics of the type Nassim Taleb mocks. It’s far longer than it should be: probably 8 pages of text instead of 30, with the most important explanatory images more logically arranged. Far more citations than it actually used or was inspired by (no, they didn’t read all 35 in depth, along with checking some of the citations of the ones they linked).

      Or, more or less, all the faux-complexity obfuscation that academic studies usually have, which earns them the ire of industry that *has* to accomplish something on a schedule besides another long paper.

      • Mark V Anderson says:

        Okay, to be more precise: Find 10-20 properly done and well controlled studies and then we’ll talk.

        I don’t believe one study should definitively determine how we think about things. It may be suggestive, but that’s all. There are always issues of one off mistakes that are impossible to determine by reading a study. Heck, just the issue of possible dishonesty in the study is enough to doubt one study. Every study must be replicated before it is taken seriously. And social science, with no end to confounding variables, needs many replications.

  15. Eponymous says:

    This might just be hindsight, but I wouldn’t have expected experts to be much better at guessing this than smart undergrads, with the exception of psychologists and behavioral economists who specifically study incentives (in which case, I *would* have expected them to do significantly better, because I expect there is literature on what kinds of incentive schemes work better than others. And in fact, I’m quite surprised they didn’t find this, leading me to suspect they either didn’t include the right category of experts, or they didn’t check how they did. Or perhaps the existing state of the art answer is “we don’t really know what incentive schemes work” and the results of the experiment are mostly noise.)

    Why wouldn’t I think these experts would do better than smart undergrads? Mostly because there isn’t a well-established theory or body of generally-known empirical results on this topic for them to draw on. So I don’t see a mechanism for expertise to work through in this case.

    (Note: I am basing this on my own knowledge base, which is a reasonable proxy for what such experts know since I am one such expert. I am familiar with a *few* results about effectiveness of different incentive schemes, but I wouldn’t give it enough weight to significantly change my estimates in such an exercise. I expect I would do reasonably well on this exercise, but not due to my expertise — mainly due to my LW/rationality training.)

  16. Eponymous says:

    Two other minor comments:

    If there is little literature on this subject, an important component to answering these questions is introspection. Then the extent to which the person resembles the typical person might influence their responses. Of course, they could take this into account in their answer, but that is probably difficult to do (you can widen your error bars, but you may not know which way to shift the point estimate).

    Also, typically psychological/behavioral economics experiments are done on undergraduates, often economics students. These students might respond to incentives differently than people recruited from another source. For instance, economics undergraduates may respond more to monetary rewards than the general public.

  17. NatashaRostova says:

    >I’m still confused by the MBA students, and expect to remain so. All MBA students were undergraduates once upon a time.

    I can’t outright answer this, but I suspect I have more experience working with MBAs than you:
    For a while I was the only data scientist on a team of top b-school MBAs (at a top-5 tech company). Our area of business development was one where there were so many new initiatives there was no proper empirical way to identify the right choice. It was all, for lack of a better term, ‘business judgement.’ I viewed this as the set of data we were working with was highly highly unstructured and high-dimensional, such that only the human brain could model it, with some light help from Excel.

    In this world, the MBA does lots of work based on what ‘feels’ right, absorbing high dimensional unstructured data, and being smart enough to not make obvious errors. While that’s not really my style (I like coding/econ/forecasting), there is no denying they are better at it than I am — despite knowing nothing really of the scientific method, coding, or forecasting.

    So the main question would then be: why would they still be worse than undergrads? I’m not sure I can give a satisfactory answer, but I can say MBA students tend to be part of a group that is taught, and benefits from, always thinking they know the true answer. IMO that’s still too weak an explanation for why they did worse than undergrads.

    To get to that question, subsetting MBAs on area of study is crucial. MBA with marketing specialty shouldn’t be compared to a U Chicago MBA Finance student.

    • Moon says:

      “I can say MBA students tend to be part of a group that is taught, and benefits, from always thinking they know the true answer. IMO that’s still too weak of an explanation to explain why they did worse than undergrads.”

      Of course that would explain why they did worse than undergrads. If you always think you know the true answer quickly, then you may well never find the true answer, because you won’t admit that you don’t already have it. The Dunning-Kruger effect is greatly intensified among people who can’t admit that they don’t know everything to begin with. You put no effort whatsoever into learning or discovering what you already “know.”

      It’s like the supposed Mark Twain quote (though Twain never actually said it): “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”

      Or this quote, which really is from Leo Tolstoy:

      “The most difficult subjects can be explained to the most slow-witted man if he has not formed any idea of them already; but the simplest thing cannot be made clear to the most intelligent man if he is firmly persuaded that he knows already, without a shadow of a doubt, what is laid before him.”

      This matter is one of the greatest pitfalls of civilization. Some of the biggest blocks to progress have often been what we “knew” that turned out not to be so.

  18. howdyhowdy says:

    The task employed in this study seems especially likely to be well predicted by the “wisdom of crowds”, because it may be predicted as an average of idiosyncratic judgments.

    Suppose that:
    (A1) each predicting individual knows how well each incentive scheme would motivate them personally;
    (A2) when considering which incentives would work best, the predicting individuals (e.g. undergraduates) rank the incentive schemes according to the order that would most strongly motivate them personally;
    (A3) the “crowd” of predictors has similar psychological characteristics to the “crowd” from which data was gathered in the incentive task.

    If (A1), (A2), and (A3) are true, then adding predictor individuals to a “wisdom of crowds” predictive model will increase the accuracy of the model; this occurs because the psychological characteristics of the crowd of predictors will better and better approximate the psychology of the individuals who performed in the incentives task.
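A toy simulation makes this concrete. Under (A1)-(A3), each predictor reports the true population-average appeal of each scheme plus idiosyncratic noise, so averaging more predictors washes the noise out and the crowd's ranking converges on the true one. The appeal values and noise level below are invented purely for illustration:

```python
import random

random.seed(0)

# Hypothetical population-average appeal of five incentive schemes (invented).
TRUE_APPEAL = [0.9, 0.7, 0.5, 0.3, 0.1]
N = len(TRUE_APPEAL)

def one_predictor():
    """(A1)+(A2): a predictor reports how each scheme would motivate *them*:
    the true population appeal plus personal noise."""
    return [a + random.gauss(0, 0.5) for a in TRUE_APPEAL]

def crowd_estimate(n):
    """Wisdom of crowds: average the personal reports of n predictors."""
    totals = [0.0] * N
    for _ in range(n):
        for i, v in enumerate(one_predictor()):
            totals[i] += v
    return [t / n for t in totals]

def rank_error(estimate):
    """Number of positions where the crowd's ranking disagrees with the truth."""
    true_order = sorted(range(N), key=lambda i: -TRUE_APPEAL[i])
    est_order = sorted(range(N), key=lambda i: -estimate[i])
    return sum(t != e for t, e in zip(true_order, est_order))

for n in (1, 10, 100):
    errs = [rank_error(crowd_estimate(n)) for _ in range(200)]
    print(f"crowd of {n:3d}: mean ranking error = {sum(errs) / len(errs):.2f}")
```

The mean ranking error falls toward zero as the crowd grows, and it is assumption (A3) doing the work: averaging only helps because the predictors' noise is centered on the task population's true values.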

  19. tcheasdfjkl says:

    When I saw the title, I thought you were going to organize a thing where you find pre-registered experiments, post about them, ask people to predict the results, and then later post the results and see how people did. Can you do that? That sounds fun.

  20. Peter Gerdes says:

    I think we should be cautious in generalizing from this example. In particular, we should be aware that the level of difficulty of the question (and what expertise that requires) plays a huge role in this process.

    Imagine we had instead asked the respondents to estimate the number of gumballs in a jar. In such a situation we wouldn’t expect expert knowledge to matter at all. However, we might plausibly expect the guesses of anyone making a reasonable effort to have a mean equal to the true value (or nearly so), making averaging guesses very effective.

    From the description it seems quite plausible that what people are being asked to estimate here is something similar. Perhaps there is one simple fact (or natural thought) that the experts are all familiar with, e.g. that incentives tend to be reasonably good at improving people’s behavior, which accounts for their slightly better performance.

    It could still very well be that on tasks where expertise can lend more insight experts would do much better. However, it does remind us that there are plenty of tasks where this isn’t the case.

  21. Another important caveat: predictive tasks are different from interpretative tasks.

    That is indeed true, and interpretation is important too. A lot of people get hung up on prediction, with blanket statements such as “economics is a failure because economists didn’t predict the 2008 crisis”, ignoring or not realizing that part of economics is also prescriptive and descriptive.

  22. sohois says:

    Unlike the SAT, you can take the GMAT as many times as you want to get a better score, which I would expect to have some impact in boosting the supposed IQs of the MBA students.

  23. tgb says:

    Try this test on yourself! Here are the possible incentives. Everyone receives $1 to participate, and then one randomized extra incentive. The incentives are clumped into categories of similar possibilities (as done by the researchers). Participants score 1 point every time they press the key ‘a’ followed by the key ‘b’ on their keyboard (e.g. pressing aaabbbab scores 2 points) over the duration of 10 minutes.

    Piece rate:
    1) Your score will not affect your payment in any way.
    2) As a bonus, you will be paid an extra 1 cent for every 100 points that you score.
    3) As a bonus you will be paid an extra 10 cents for every 100 points that you score.
    4) As a bonus, you will be paid an extra 4 cents for every 100 points that you score.
    Pay Enough or Don’t Pay:
    5) As a bonus, you will be paid an extra 1 cent for every 1000 points that you score.
    Social Preferences Charity:
    6) As a bonus, the Red Cross charitable fund will be given 1 cent for every 100 points that you score.
    7) As a bonus, the Red Cross charitable fund will be given 10 cents for every 100 points that you score.
    Social Preferences Gift Exchange:
    8) In appreciation to you for performing this task you will be paid a bonus of 40 cents. Your score will not affect your payment in any way.
    9) As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account two weeks from today.
    10) As a bonus, you will be paid an extra 1 cent for every 100 points that you score. This bonus will be paid to your account four weeks from today.
    Gains Versus Losses:
    11) As a bonus, you will be paid an extra 40 cents if you score at least 2,000 points.
    12) As a bonus, you will be paid an extra 40 cents. However, you will lose this bonus (it will not be placed in your account) unless you score at least 2,000 points.
    13) As a bonus, you will be paid an extra 80 cents if you score at least 2,000 points.
    Risk Aversion and Probability Weighting:
    14) As a bonus, you will have a 1% chance of being paid an extra $1 for every 100 points that you score. One out of every 100 participants who perform this task will be randomly chosen to be paid this reward.
    15) As a bonus, you will have a 50% chance of being paid an extra 2 cents for every 100 points that you score. One out of every two participants who perform this task will be randomly chosen to be paid this reward.
    Social Comparisons:
    16) Your score will not affect your payment in any way. In a previous version of this task, many participants were able to score more than 2,000 points.
    17) Your score will not affect your payment in any way. After you play, we will show you how well you did relative to other participants who have previously done this task.
    Task Significance:
    18) Your score will not affect your payment in any way. We are interested in how fast people choose to press digits and we would like you to do your very best. So please try as hard as you can.

    [Note: I can’t tell whether the score was displayed to the users during the task. But experts taking the test would have known this, since they were given an opportunity to try the task themselves. Seems very relevant for several of these.]

    Your goal is to guess the Mean Effort (i.e. average score) of users shown each of these prompts. Additionally, you are told the Mean Effort for the first three tasks. For (1) mean effort is 1521, for (2) it’s 2029, and for (3) it’s 2175. [Recall: these are no incentive, 1 cent per 100 points, and 10 cents per 100 points respectively.]
    The results are available in this pastebin, but you can also find them on page 52 of the article in more readable chart form.

    What’s your mean absolute error? (i.e. take the absolute value of the difference of your guess and the actual mean effort and then take the average of those.)
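
    The scoring rule described above is simple enough to sketch in code; the guesses below are purely hypothetical illustrations, scored against the three known mean efforts given above, not data from the study:

```python
def mean_absolute_error(guesses, actuals):
    """Average of |guess - actual| over all schemes predicted."""
    return sum(abs(g - a) for g, a in zip(guesses, actuals)) / len(guesses)

# Hypothetical guesses scored against the known mean efforts for
# schemes (1)-(3): 1521, 2029, 2175.
print(mean_absolute_error([1600, 2000, 2100], [1521, 2029, 2175]))  # 61.0
```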

    Academic experts had average mean absolute error of 169, PhD students had 171, Undergraduates had 187, MBA students 198, and Mechanical Turk workers had 271 (lower is better, obviously).

    Unfortunately, I viewed several of these numbers while writing this, so my predictions are incomplete. Ignoring those two, my mean absolute error is 121. I find this surprising, since I totally flubbed several of them and this would put me in the top ~20% of academic experts (which I’m not). I’m wondering if I’m missing something about how they report scores. Though one of the ones that I skipped I probably would have done poorly on, so my score is a little better than it should be. On the other hand, I still did worse than the wisdom-of-the-crowds meta-forecasters when averaging from any group except the MTurk workers.

    Edited to add: I think it would be a great standard to have in the rationality diaspora community to always try to make these kinds of studies something you can perform on yourself when possible. I hate it when the answers get spoiled before I can get a chance to try to think what they should be!

    • mortehu says:

      I made a web app so that you don’t have to manually score yourself. If you know JavaScript, please audit the code, but note that the answers are clearly visible in the code.

      • quanta413 says:

        I tried to make what I thought was an obvious partial ordering of incentive schemes, thought about things a bit, stirred in my intuition of pop psychology, and after some sweating put in my guesses. I got a mean absolute error of a little over 130 and was feeling pretty clever. And then I wondered: what would I do if I had no idea what the problem domain even was, or what the different incentive schemes were? What if I were just asked how I thought incentive schemes 1, 2, 3, etc. did, with no other information given?

        So I checked the mean error if you just take the mean of the first 3 known values as your guess for every single box, and got a mean absolute error of 129. Well, fuck.

        This just makes me think even more that the principle of indifference is really underrated. It handily crushes the average of any human group’s score, and it’s trivial to figure out how to apply it.
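
        The indifference baseline described here can be sketched as follows; since the paper’s full table of mean efforts isn’t reproduced in this thread, only the flat guess itself is computed:

```python
# Principle-of-indifference baseline: use the mean of the three known
# mean efforts (schemes 1-3) as the guess for every other scheme.
known = [1521, 2029, 2175]
baseline_guess = sum(known) / len(known)
print(round(baseline_guess, 1))  # 1908.3

# Scoring this single flat guess against the published mean efforts
# (see the paper's table) is what reportedly yields an MAE of about 129.
```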

        • tgb says:

          Great point! Also: kinda depressing.

        • tmk says:

          I don’t really follow. What do you mean by “take the mean of the first 3 known values as your guess”? How do you get 3 known values? Values of what?

          “how do you think incentive scheme 1, 2, 3, etc. did” implies you know what each scheme is, but that contradicts “no idea what the different incentive schemes were”.

          • rlms says:

            You are given the scores for three incentive schemes. Guessing that all the others will be roughly the mean of the three known scores (even when that obviously won’t be true) gives you a low error value (and I guess applying some basic knowledge, such as which incentives should score higher than which others, would improve that even more).

          • quanta413 says:

            Yeah, rlms said it better. I think you could get better scores by applying basic knowledge, but the more knowledge you attempt to apply the more wiggle room there is to screw up. Sometimes people overweight their priors is the way I’d put it.

      • tgb says:

        Awesome, thanks! (Scott: link to this so more people will see it!)

  24. cassander says:

    I attended both a top-10 liberal arts college (LAC) and, for a brief period, Berkeley. If the average IQ of undergrads is 133, I’ll eat my hat. Unless I massively misunderstand how clever an IQ of 133 is, that’s just not possible. If nothing else, the students at the LAC were consistently better, at least verbally. I’ll grant Berkeley probably kicked our ass in STEM; my LAC actually had a pretty good STEM program, but it was small, and I didn’t spend enough time around either program to judge.

    • StellaAthena says:

      Do you think this is too low or too high? It seems spot on to me (UChicago grad)

    • razorsedge says:

      I was tested at about 125 in the 6th grade. I am at a school that is about 10 spots behind Berkeley in the academic rankings. I am about average intelligence at my school. Not a genius, not a moron. So I would not be surprised if the average Berkeley grad was 133.

  25. dndnrsn says:

    Fifth, even the slightest use of “wisdom of crowds” was enough to overwhelm the expert advantage. A group of five undergrads averaged together had average error 115, again compared to individual experts’ error of 169! Five undergrads averaged together (115) did about as well as five experts averaged together (114). Twenty undergrads averaged together (95) did about as well as twenty experts averaged together (99).

    Did you transpose the third and fourth numbers here?

  26. Nadja says:

    Great article as usual. Tiny nitpick on your Trump joke. (Hope it’s allowed here, please delete if it isn’t.) Trump was an undergrad at Wharton: he never got an MBA, so the conclusions about MBA students shouldn’t apply to him. =) Also, a running joke among undergrads at UPenn (including Wharton) is that the undergrads there are much brighter than the MBAs, because the MBAs are the ones who need to come back for a degree having not been successful enough in their careers.

    • Garrett says:

      My primary career is in software development at big tech companies. They frequently have separate management and technology tracks. For people who want to stay tech-focused, there’s a lot of career opportunities that require very little specialized business knowledge past probably 1st year business school. When you’re high-up in the tech track, if you need some business analysis done, you can find one of the low-level business people and have them do the specialized stuff for you.

      But sometimes people on the tech side develop an interest in management itself and want to work their way up the corporate ladder. And in that case you need to quickly make up your lack of business background. In that case you’re dealing with a lot of very bright people pursuing a career change. Which I suppose suggests that the distribution of MBA students might be bi-modal. More stuff it would be interesting to know the answer to.

  27. Moon says:

    Here’s a good contest for people here who are interested in scientific experiments, their interpretations and their results, and in academic theories:

    The Official Thomas Friedman ‘Make a Meaningless Graph’ Contest: Friedman graphed history in his new book, ‘Thank You For Being Late’ – Outdo him and win a free T-shirt.

    A hilarious article, IMO.

    • Deiseach says:

      Well, now I know where the meaningless graphs in that AI persuasion essay came from/were inspired by: two axes with labels but no units, at least one of the labels referring to an unquantifiable abstract measure (“time” can at least be broken down into years/decades/centuries, but what SI units do we measure “progress” or “adaptability” in?) and an exponential curve mapped against these, extending out as far as the creator’s taste, fancy, or pain in their drawing hand will permit.

      I had no idea there was an onlie begetter of such. You learn something new every day! 🙂

  28. StellaAthena says:

    I have contextual reasons to doubt your IQ estimates for Booth students: Booth is widely considered “not that hard” by UChicago undergraduates.

    I graduated this past year (Mathematics and Philosophy, with a minor in CS), and there was a strong push against going to B-school unless you specifically wanted to impress the kind of person who would be impressed with a B-school degree. University of Chicago economics undergraduates are not allowed to count B-school classes towards the Econ major or minor, as they are considered easier than upper-level undergraduate courses. There were some kids who took B-school classes, but they universally reported them as being easy. A number of Econ students I knew did this to pad their GPA for (Econ) grad school, since most admissions departments would be impressed, not realizing they were easy. These students also reported frequently being at the top of their class. Math/Econ double majors (of which there were a lot) thought the math courses were harder than the Econ courses, and I took the Honors Intermediate Micro class with no background (besides being good at math) and didn’t find it hard.

    Yes, “econ and math majors at University of Chicago think it’s not that hard” isn’t particularly damning, but this makes me think that the median b-school IQ isn’t significantly higher than the median IQ of the undergrad population, which I would estimate to be around +2σ (which agrees with the Berkeley IQ, a corroboration I would have predicted). On a number of metrics (GPA, course placement, and honors) I would place in the top 10% of UChicago students (which also accords with my experience) and my IQ scores have been {145,142,143} (I’ve taken the test at least two other times and don’t know the scores). Two of the reported scores were Raven’s Matrices, and I don’t remember what the others were. I only knew one B-school student personally, but I would be shocked if their average IQ was a full standard deviation above the average UChicago student.

    • TheBearsHaveArrived says:

      Well. Grade inflation can be horrific in some business schools. Doesn’t half the class graduate with honors in some of them?

      But no. It’s not as hard as econ, which isn’t as hard as real analysis done right.

      • StellaAthena says:

        Good point. I’ll ask around about whether the people I know who took B-school classes thought that the B-school kids also found the courses easy. That hadn’t occurred to me.

        • Douglas Knight says:

          Trade school is for learning valuable skills, not testing IQ.

          • Wency says:

            To be clear (as an elite MBA grad), b-school isn’t really for learning valuable skills either, unless you’re a career changer. It’s mostly an expensive networking and job-searching club, with a credential to boot. You’ll learn the skills on the job. Or you already knew them coming into b-school (basically my case) and you just need some time and a bit more weight to your resume to focus on getting the job you want.

            Being a few years out, b-school students are often rusty on academic skills relative to undergrads at a similar IQ level. There’s also a direct cause-effect in how they approach academics — will I get closer to the job I want by doing well in this class or by blowing it off to attend a social with some recruiters? They just left the job market and their sights are on their exit back into the job market from day 1.

            A lot of the recruiting happens before you have many grades, so grades barely matter, especially after the first semester. Some jobs might require you to maintain a ludicrously easy GPA, others have no such requirement.

            Undergrads, by contrast, are coming off of 12 years of training about the importance of grades, with often only a vague conception about the job market. So they have more of an instinct to pursue grades for grades’ sake.

      • StellaAthena says:

        So, I spoke to some of the aforementioned students, and they said that they thought the Booth students also found the classes easy, though not as easy as the undergrads did. Apparently the undergrads were frequently at the top of the class, but everyone did well due to grade inflation (grades below B+ were uncommon) and the B-school students rarely seemed to care or put in effort. They uniformly expressed surprise at the suggestion that the B-school students were significantly smarter than the average UChicago student.

        Definitely weakens my case, but I think on balance it’s weak evidence against the idea that B-School students are more than one standard deviation smarter than undergrads.

        Also, a numerical objection: there are fewer than 150k such people in the US. There are 32,000 such people between the ages of 18 and 34 according to the census, which would be under 2,000 such people per year in that age group. I don’t think I live in a world where >10% of the brightest people in the country go to B-school. What are other people’s intuitions on this?
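
        For what it’s worth, the tail rarity behind head-counts like these is easy to check; this is only a sketch assuming a textbook N(100, 15) IQ distribution and a +3σ threshold (IQ 145), and the resulting head-count depends entirely on which threshold and population figure you plug in:

```python
import math

def fraction_above(iq, mean=100.0, sd=15.0):
    """Upper-tail probability of a normal IQ distribution."""
    z = (iq - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))

p = fraction_above(145)  # +3 sigma
print(round(p, 5))       # ~0.00135, i.e. roughly 1 in 740 people
```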

        • baconbacon says:

          Are you sure that you don’t have a major selection bias problem? In my experience the very smart often are the ones that understand and then exploit the rules of the game. The people who figured out how to get the same value degree for less effort could well be selected for the top undergraduate students in terms of IQ.

          • StellaAthena says:

            No, I don’t think that at all. I think that those students went to other universities. About half of UChicago graduates get general honors, which requires a 3.3 GPA, while at Harvard the median grade is an A-. It’s well known at the University, and openly discussed, that it’s harder to get into grad schools coming from UChicago than from peer institutions.

            But honestly, if I had wanted to get the same results with less effort, I would have gone to a top-25 university for free and gotten a 3.7, rather than paying 200k to go to UChicago and getting a 3.2. If that’s your goal, there’s no reason to go to a top-ranked university.

          • My younger son, who went to U of C, reports that someone there had a shirt which said “If I wanted to get A’s, I would have gone to Harvard.”

          • StellaAthena says:

            One of the dorms sells them as a fundraiser. And, accurate or not, that perception strongly exists at the school.

  29. TomA says:

    Fundamentally, this study is an exercise in evaluating the power of human reasoning and its culmination in the development of rationality. Most of what we have to start with is innate and evolved over many millennia. In recent times, education and the current environment of diverse problem-solving opportunities have likely enhanced our intuitive capabilities with heuristic-based improvement. I doubt that the magnifying glass of statistical analysis is revealing anything other than normal population-variance noise. But even if it did show some slight bias, what would that mean? AIs already make many types of predictions with better accuracy than humans.

  30. Steve Sailer says:

    It seems to me that predicting which incentives work best on Mechanical Turk is precisely the kind of thing for which hands-on experience with online incentives on Mechanical Turk matters most.

    After I got my MBA from UCLA in 1982, I went to work for a company that recruited shopping panelists to identify themselves at the checkout counter in supermarkets in Eau Claire, WI and Pittsfield, MA. The incentives that were used were ones that had worked at other panel companies. I don’t recall assuming that my MBA entitled me to tell the experienced professionals in this curious little field that they were doing it wrong. Instead, I can vaguely recall thinking, “Oh, that’s interesting, I probably wouldn’t have thought of that, but I can see now how that would work.” In the business world, there’s a lot of arcane knowledge of different specialities.

    For example, it turns out that to recruit small town people to have their purchases scanned, it helps to tell them that this information is valuable to companies to help them figure out better products to sell. Coming from MBA school, I probably wouldn’t have guessed that pro-social appeals would be highly successful, but apparently it was.

    Another tradition in the panel business that I wouldn’t have figured out a priori is that all communications to panelists were stated to be from a lady with a Betty Crocker-type name and image. The company had commissioned a drawing of their made-up figurehead depicting her as a white lady of about 35-40 with a professional / maternal mien.

    In summary, there are a lot of tricks of the trade of inducing cooperation in research and they’re not necessarily obvious to MBAs. On the other hand, one of the things you do learn in MBA school is that you, personally, don’t know everything, which is why you hire people who have various specialties.

    • Aapje says:

      Coming from MBA school, I probably wouldn’t have guessed that pro-social appeals would be highly successful, but apparently it was.

      Small town people probably often feel ignored and may also experience limited shopping opportunities. So they may be eager to influence supermarkets to cater to them better for these reasons.

      • Steve Sailer says:


        But in defense of MBA students, they are pretty good at saying, “I assumed that X would be true off the top of my head, but you are telling me that everybody in the business knows you can make more money by assuming Y is true? Okay, I’ll start assuming Y is true.”

  31. meltedcheesefondue says:

    I hope this is not a rationalisation on my part, but, having skimmed the options presented, this does not feel surprising.

    Key quote from [Dawes, Robyn M. “The robust beauty of improper linear models in decision making.” American psychologist 34.7 (1979): 571]:

    “Because people—especially the experts in a field—are much better at selecting and coding information than they are at integrating it.

    But people are important. The statistical model may integrate the information in an optimal manner, but it is always the individual (judge, clinician, subjects) who chooses variables. Moreover, it is the human judge who knows the directional relationship between the predictor variables and the criterion of interest, or who can code the variables in such a way that they have clear directional relationships. And it is in precisely the situation where the predictor variables are good and where they have a conditionally monotone relationship with the criterion that proper linear models work well.”

    Basically experts are good at finding variables that affect the outcome, and figuring out the direction of the impact. Once the variables have been established, though, they are pretty bad at weighing their relative importance. The task here (ranking the impact of different pre-selected changes) is essentially that of weighting the importance of the variables.

    Though, as usual, they are massively overconfident in their weighting abilities.
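
    The “improper linear model” Dawes describes can be sketched as a unit-weighted sum: the human expert only chooses the predictors and their signs, and the model standardizes and adds them with equal weights. The function names and toy data below are illustrative, not from the paper:

```python
def zscores(xs):
    """Standardize values to mean 0, sd 1 (population sd)."""
    m = sum(xs) / len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - m) / sd for x in xs]

def unit_weighted_score(rows, signs):
    """Dawes-style improper model: equal-weight sum of signed z-scores.

    rows  -- one list of predictor values per candidate
    signs -- +1 or -1 per predictor, chosen by the human judge
    """
    cols = [zscores(col) for col in zip(*rows)]
    return [sum(s * cols[j][i] for j, s in enumerate(signs))
            for i in range(len(rows))]

# Three candidates, two predictors the judge believes both point "up":
scores = unit_weighted_score([[1, 10], [2, 20], [3, 30]], [1, 1])
```

    The point of the model is exactly the one in the quote: the judge contributes the variable selection and the directional signs, while the integration step is deliberately left to a mechanical equal-weight sum.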

  32. tmk says:

    How statistically significant were the non-differences? If the data is mostly noise, it should not surprise you that they found no difference between groups A and B, even if group A is more prestigious or whatever.

    Second, the different incentives were tested on people on Mechanical Turk. I would guess they respond to incentives very differently than other people in other situations.

    • tgb says:

      Note that the experts+students knew that it was Mechanical Turkers being incentivized. If anything, the experts should benefit from this the most: MTurk is a pretty standard place to do these kinds of experiments these days so if expert knowledge applies anywhere, it ought to apply here.

      • Matt M says:

        Might also be a point that mechanical turk is a relatively new technology and platform that undergrads may be more familiar with than MBAs as a simple virtue of age. 20 year olds may have a better understanding of current tech than 30 year olds do.

  33. Observation: if the goal was to predict which motivational scheme best motivated other people, guesses would probably be weighted by one’s own motivational tendencies.

    If people differ in inclination due to personality type, and people guessing others’ motivations use themselves as guidance, then simply averaging several people’s guesses smooths the distribution into something approximating how the study worked out.

    It’s like asking a bunch of chefs what foods the typical person will prefer, and in what order. I suppose experienced chefs know more than the average person, but how objective they can be, and how much they can really know in the first place, limits how much their guesses matter.

    I wonder how many sociological studies praising the wisdom of crowds suffer from that same mistake.

  34. Steve Sailer says:

    As a more general question, I’m not at all sure that the Popperian idea that making accurate predictions is the acid test really holds. Much of human affairs isn’t like understanding astronomy, in that the answers to questions can affect what happens next. Whatever model of the solar system you have, for example, doesn’t actually affect the solar system. On the other hand, presidential polling methodology changes all the time in response to what happened in the last election.

    For example, over the last few months I was repeatedly asked to give my opinion on the question of whether the polls showing Hillary in the lead over Trump were accurate or whether they had methodological flaws. I consistently declined because I didn’t see much point in investing a lot of effort in learning an arcane field that would be instantly obsolescent on November 9th because pollsters would find out what they did wrong and try different techniques in 2020.

    On the other hand, I feel pretty good about having identified the 2016 Republican candidate’s road to victory through Wisconsin, Michigan, and Pennsylvania in late November 2000, when I wrote on 11/28/2000:

    “So where could Bush have picked up an additional 3 percent of the white vote? The most obvious source: white union families. …

    “Since union efforts cost Bush Michigan, Pennsylvania, and Wisconsin (at a minimum), you’d think that the GOP would be hot to win back the Reagan Democrats.

    “Don’t count on it, though. It’s just so much more fashionable to continue to chase futilely after Hispanics.

    “In summary: the GOP could win more elections by raising its fraction of the white vote minimally than by somehow grabbing vastly higher fractions of the minority vote.”

    Considering that nobody else during the Karl Rove years was saying this, that’s a pretty darn good prediction. On the other hand, there are not a lot of financial rewards in being right 16 years ahead of time. As Keynes said, markets can stay irrational longer than you can stay solvent.

  35. Saladlikeforks says:

    but I would expect psychology/economics professors to be higher

    I haven’t the slightest clue where you pulled this from, my experience as a neuroscience undergrad led me to the exact opposite conclusion. The number of psychology academics who bought into whatever the latest socio-political zeitgeist determined about people was staggering, besides the biblical import laid upon the DSM. I cannot speak to economic professors but then one would imagine most genius economists to be out making money through their expertise and ability, no?

    Another important caveat: predictive tasks are different than interpretative tasks

    This, I think, is the crux of the solid argument against IQ as far as predictive ability goes. IQ is a measure of recognizing and continuing an established pattern; it’s not about “predicting” the next step, it’s about solving it. An ability to guess well and an ability to reason well are probably linked somewhat, but less so than IQ-philes seem to believe.

    Great write-up on the experiment though, definitely something to think about.

    • Steve Sailer says:

      Right, I think we need to dig deeper into understanding and distinguishing among different types of prediction-making. The canonical examples in philosophy of science of predicting astronomical phenomena are quite different, on several dimensions, from the more profitable business of predicting human markets.

  36. Paul Conroy says:


    Interesting post.

    First off, I think your presumed IQs of Liberal Arts professors are way off – as in, way too high.
    IIRC, from 15 years ago, the average IQs of PhDs in various disciplines are as follows (sd=15):
    1. Liberal Arts (including Psychology, Economics) – 115 (+1 sd)
    2. Medicine – 130 (+2 sd)
    3. Math, Physics, Computer Science, Philosophy – 145 (+3 sd)

    I think I have some insight into what kind of ability might make one a better predictor – apart from IQ – as I correctly predicted Trump to be the nominee and president over 1.5 years ago. I am actually a very good predictor across a whole plethora of disciplines, from genetics, stock market, technology, futurism, and so on. I did take part in Phil Tetlock’s Superforecasting experiment, and all my predictions were correct.

    I have what’s considered a high IQ (157, sd 15) on a general IQ test, but more than that I got an almost perfect score on a Raven’s Progressive Matrices test (1 wrong out of about 200), which is heavily weighted towards Visual-Spatial Ability.

    I posited about 3 years ago that, of the 3 main IQ abilities (Verbal, Math, Visual-Spatial), Verbal and Math may be one pole – as both are really symbol manipulation – and Visual-Spatial may be a separate pole of intelligence, with less overlap with the other two.
    I further posit that the ability to predict is more related to Visual-Spatial ability than to Verbal or Math ability.

    If this were true, then how would we test it? Specifically:
    Q1. Is there any group or groups of people who consistently score high on Verbal+Math but low or average on Visual-Spatial?
    Q2. What group or groups score highest on Visual-Spatial?

    A1. Ashkenazi Jews.
    A2. North East Asians and to a lesser extent Northern Europeans, and additionally those who are high functioning on the Autistic Spectrum.

    So, I leave it to readers to ponder whether the sample of people from top business schools may be slightly biased in favor of Ashkenazi Jews – if my two suppositions are true, then there’s your answer!