Slate Star Codex

Watch New Health Picks

So far most of Trump’s appointments have been ordinary conservative hardliners or ethically-compromised rich people. But there’s a chance that some of his health care picks could be really interesting.

I’m not talking about Health and Human Services nominee Tom Price. As far as I can tell he’s an ordinary conservative hardliner (not to mention an orthopaedic surgeon), and pretty par for the course.

But I’ve seen three names mentioned as top candidates for FDA commissioner – Scott Gottlieb, Jim O’Neill, and Balaji Srinivasan. I don’t know much about Gottlieb, besides that he writes for Forbes about how Obamacare is bad. But either of the latter two would be a shocking break with tradition and potentially among the highest-value political experiments of all time.

Jim O’Neill is a director at Mithril Capital and a former deputy deputy (sic) HHS secretary. He’s a (former?) board member of the Seasteading Institute, which hopes to create a libertarian utopia on a floating platform in international waters, and which recently signed a preliminary agreement with French Polynesia to begin pre-construction planning. He’s also a director of SENS, Aubrey De Grey’s collaboration to fight aging, and has proposed increasing the organ supply by paying donors. Also, I see him commenting on Eliezer’s Facebook feed sometimes, and he seems to be Facebook friends with Eliezer, Julia Galef, and, uh, me. Maybe he reads this blog? Hi, Jim!

Balaji Srinivasan describes himself as “a computer scientist, investor, entrepreneur, and academic”, and previously founded a very successful genetic testing startup; now he holds various Bitcoin-related positions. He’s famous (infamous?) for a piece called Silicon Valley’s Ultimate Exit, where he promotes “exit” over “voice”; he suggests Silicon Valley find ways to create an alternate society that escapes the dysfunction of the federal government and the rest of the country – both figuratively via new institutions like Bitcoin and eventually literally through possibilities like seasteading. He called the FDA “the man to beat – or escape, via exit – whether you want a new drug or to get to transhumanism”. And, uh, he also follows me on social media, which is definitely not a characteristic I expected multiple candidates for FDA director to have in common.

I worry that my more liberal friends won’t be sufficiently impressed. They’re thinking “Oh, more libertarians, like all those small-government people from the Bush and Reagan administrations”. My own view is that “libertarian” gets used to pick out at least two different clusters of people. One is rich crony capitalists who want a convenient excuse to cut their own taxes and roll back workers’ rights, but who fight tooth and nail against any decreased subsidies or increased competition that might threaten their own comfortable position. The other is people who are actually interested in using the power of competition to kindle innovation, improve access, bring down entrenched interests, and ultimately help regular people. The first cluster of libertarians has been around forever. O’Neill and Srinivasan seem to be part of the second. A principled, intelligent cluster-two libertarian getting the top job at one of the country’s worst bureaucracies would be practically unprecedented.

(and lest I try to weasel out of this one later, let me state for the record that if O’Neill or Srinivasan get chosen and are able to implement their preferred policies, but the US pharmaceutical industry doesn’t improve dramatically, I should accept it as a defeat for one or another hypothesis of mine. Either free markets don’t work in medicine, or I am so terrible at identifying principled intelligent libertarians that I might as well give up.)

Some important policies that an FDA commissioner like O’Neill or Srinivasan might be able to implement with high benefit and little cost:

1. Medical reciprocity with Europe and other First World countries (The Atlantic, Health Affairs, Marginal Revolution). Right now, Europe has its own licensing agency, about as strict as the FDA, approving medications in Europe. Any pharma company that wants their medication approved in both the US and Europe has to spend a billion or so dollars getting it approved by the FDA, and then another billion or so dollars getting it approved by the Europeans. A lot of pharma companies don’t want to bother, with the end result that Europe has many good medications that America doesn’t, and vice versa. Just in my own field, amisulpride, one of the antipsychotics with the best safety/efficacy balance, has been used successfully in Europe for twenty years and is totally unavailable here despite a real need for better antipsychotic drugs. If the FDA agreed to approve any medication already approved by Europe (or to give it a very expedited review process), we could get an immediate windfall of dozens of drugs with unimpeachable records for almost no cost. Instead, in the real world, we’re cracking down on imported Canadian pharmaceuticals because the Canadians don’t have exactly our same FDA, which means that for all we know they might be adding thalidomide to every pill or something. This is exactly the sort of silly anti-competitive cronyist practice that a principled intelligent libertarian could do away with.

2. Burdensome approval process for generic medications (SSC, more SSC). How come Martin Shkreli can hike the price of an off-patent toxoplasma drug 5000%, and everyone just has to take it lying down even though the drug itself is so easy to produce that high school chemistry classes make it just to show they can? The reason is that every new company that makes a drug, even a widely-used generic drug that’s already been proven safe, has to go through a separate approval process that costs millions of dollars and takes two to three years – and which other companies in the market constantly try to sabotage through legal action. Shkreli can get away with his price hike because he knows that by the time the FDA gives anyone permission to compete with him, he’ll have made his fortune and moved on to his next nefarious scheme. If the FDA allowed reputable pharmaceutical companies in good standing to produce whatever generic drugs they wanted, the same as every other company is allowed to make whatever products they want, scandals like Daraprim and the EpiPen would be a thing of the past, and the price of many medications could decrease by an order of magnitude.

3. Stop having that thing allowing companies to “steal” popular and effective drugs that have been in the public domain for years, claim them as their private property, shut down all competitors, and jack up the price 10x just by bringing them up to date with modern FDA bureaucracy.

4. Stop having that thing where drug companies can legally bribe other companies not to compete with them. I like this one because it sounds anti-libertarian (we’re imposing a new regulation on what companies can do!) but I think it’s exactly the sort of thing that the crony capitalists would never touch but which principled intelligent libertarians like O’Neill and Srinivasan might be open to, in order to bring more actors into the marketplace.

5. Stop thwarting consumer diagnostic products and genetic tests (SSC, more SSC). Srinivasan comes from the genetic testing world himself, so he’s likely to be extra sympathetic to this.

I notice that Jim O’Neill had (in 2014) a much more radical proposal than any of these: that the FDA should approve drugs based on safety but not efficacy; that is, drug companies have to prove that their drug isn’t dangerous, but they don’t have to do the long-term super-expensive studies proving that it works. This isn’t quite as crazy as it sounds – it just means we’d need to use academic studies and good judgment to figure out what works. We do this already in many cases – with drugs that were grandfathered in before the FDA existed (eg penicillin), or with drugs that the FDA approved for one indication but which we use off-label for another (eg Prozac for anxiety). The best-case scenario is that a safety-but-not-efficacy regime would replicate that kind of careful skepticism across the board. The worst-case scenario is that we end up with a lot of ineffective drugs being used for a decade or so until science can catch up and prove them ineffective – something which is arguably already happening. Honestly this kind of policy is probably too revolutionary even for me – but in a world full of stupid regressive fear-driven bad ideas, it’s a bold revolutionary high-variance bad idea, and I respect that (see part 3 here for another problem with this idea).

The pharmaceutical industry stock index hasn’t moved much since some of these names started being floated. I’m not sure what to make of this; I wouldn’t have been able to predict which direction it would go, but I would have expected some direction.

One more really interesting potential appointment. Nature.com: Surprising Contenders Emerge For Trump’s NIH Chief. Most of the contenders aren’t that surprising: Collins is the current NIH chief, and Harris is a Republican congressman with an MD and a strong interest in health policy. One interesting idea of Harris’ is a pledge to lower the age at which researchers get their first grant, which would address a widely shared concern that we’re losing out on creative ideas because people have to spend decades learning to conform and playing academic politics before anyone pays attention to them. I guess this could be good (though see here, apparently the current NIH director supports this as well).

But one name stands out: Stanford statistician John Ioannidis, famous for promoting high quality studies and raising the standard of medical research. I don’t know what his administrative credentials are or what talents you need to run the NIH, but along with the Cochrane people he sets the gold standard for trustworthy bioscience, and having him in a high scientific position would do more to raise my confidence in the standard of US medical research than almost anything else I can think of.

I would say these picks raise my previously abysmal opinion of Trump, except that they all show the obvious hand of Peter Thiel. And I’m not sure it’s possible to raise my opinion of Thiel at this point without me doing something awkward like starting a cult.

Another Followup To “Economists On Education”

Last month I argued that a news article misrepresented the feelings of economists on school vouchers. I got a bit of pushback from people who thought I had just misread it and that it wasn’t deceptive at all (1, 2, 3), so I wrote an addendum post sticking to my guns.

One of my New Years resolutions is to be more empirical, so I thought this would be a good opportunity for an experiment. I offered to bet at 10:1 odds in the other person’s favor that most people shown the article would believe economists were against school vouchers. Noah Smith originally took me up on the bet, but later stopped cooperating with me about it and wouldn’t give a straight answer to the question of whether he was retracting his offer. Vikram Sasi agreed to continue the bet; I’m not sure if he disagreed with my assessment, thought the odds were too good to refuse, or just wanted the experiment to go ahead.

I proposed that we show some random people the article, then ask them a few easy comprehension questions to make sure they were putting in a decent effort. Then we would ask the following:

According to this article:
a) Most economists support privatizing education
b) Most economists oppose privatizing education
c) Unsure / the article doesn’t say

The true answer was (c) – the article doesn’t give enough information to tell you what economists think (which turns out to be mostly unsure, with more supporting privatization than opposing it). But I thought the article heavily (and falsely) implied (b), and expected most people to give that answer.

Before Vikram and I could agree on exactly how we were going to do this, we got scooped by two other people who did the experiment without asking us.

Commenter Sripada got 40 people to take the survey on MTurk, of whom 35 answered the easy questions right and qualified for inclusion. Of those 35, 32 of them (91%) answered (b), misinterpreting the article to claim a consensus against privatizing education. Only one of them (3%) gave the correct answer (c), accurately noting that the article doesn’t tell us anything about an economic consensus for or against.

Hanne Watkins, a real research psychologist, also did a 50-person MTurk survey. I don’t have enough access to her data to limit the sample to people who answered the easy questions right, but among all participants 78% believed that most economists opposed privatization, compared to only 10% who correctly noted that the article didn’t say.
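As a quick sanity check (mine, not part of the bet’s agreed terms), the Sripada result clears the “more than half” bar by an enormous margin. A crude exact binomial test against a 50% null, using nothing but the standard library, makes the point:

```python
from math import comb

def binom_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Sripada's MTurk survey: 32 of 35 qualifying respondents chose (b).
print(f"Share answering (b): {32/35:.0%}")                                   # 91%
print(f"P(32+ of 35 if the true rate were 50%): {binom_tail(32, 35):.1e}")   # ~2e-07
```

The bet only required “more than half”, so nothing this formal was needed – it just shows the result wasn’t anywhere near the line.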

Vikram graciously accepted these studies as settling the bet, and since more than half of people misinterpreted the article in both, he paid me my $10.

I’m really happy about this, not just because I won ten dollars but because I feel like it was a rare example of getting to decisively settle a disagreement. Instead of arguing about whether or not the article was misleading, we figured out a way to test it, did the test, and now we know.

I should add that some people might think the article was non-misleading in a different sense than the one I tested – a sense that can’t be settled empirically (Noah seems to be in this group) – and other people might think the particular set of questions the survey asked was unfair or biased. This is why I wish I’d gotten to bet somebody who I knew really disagreed with me about this, so at least that disagreement could have been settled for sure. As a consolation prize, I will take my ten dollars and my increased certainty that the article didn’t accurately convey economists’ thoughts.

Thanks to Vikram, Sripada, and Hanne for their help with this.

OT67: Comment Core

This is the bi-weekly visible open thread. There are hidden threads every few days here. Post about anything you want, ask random questions, whatever. Also:

1. Lots of people scored the predictions they posted for 2016; the ones I can find are JonGunnarsson, Anatoly Karlin, Anders Sandberg, and E. Harding – sorry if I missed anybody. And the Eukaryote Writes Blog offers Tips For Throwing A New Years’ Prediction Party.

2. Ozy from Thing of Things now has a Patreon. And Quillette, a magazine on sociobiology, academic freedom, and politically incorrect science (which I’ve linked to here a few times) has a Patreon too. See Jerry Coyne endorsing their fundraising drive here. Remember, if you don’t like someone who’s asking for money, don’t donate; there’s no need to be a jerk about it in the comments.

3. Speaking of Patreon, I’m going to try posting some shorter things here sometimes. If a post is unusually short, I won’t charge patrons until I’ve accumulated enough unusually short posts that they add up to one normal-length post (probably two or three).

4. Thanks to everyone who came out to UCI to attend the Irvine meetup last week. Hopefully you all had as good a time as I did.

5. Since I just got done posting an Unsong chapter and I’m still in the relevant frame of mind, comment of the week is Jaskologist’s theory of magic.


Should Buzzfeed Publish Claims Which Are Explosive If True But Not Yet Proven?

Buzzfeed, January 14: A Mindset Revolution Sweeping Britain’s Classrooms May Be Built On Shaky Science.

Somebody needed to write this article. It’s written very well. I’ve talked to the writer, Tom Chivers, and he was very careful and seems like a great person. The article even quotes me, although I think if I had gotten to choose a quote of mine for thousands of people to see, it wouldn’t have been the one speculating about Carol Dweck making a pact with the Devil.

But I’m not entirely on board with it.

Growth mindset has been really hyped and Carol Dweck has said it can do implausibly exciting things, okay. A lot of smart people are very suspicious of growth mindset and think there has to be some trick, sure. There’s a high prior that something is up, definitely.

But one thing that needs to be at the core of any article like this is that, if there’s a trick, we haven’t found it.

I tried to be really clear about this in my own (mostly pessimistic) article on the subject:

It is right smack in the middle of a bunch of fields that have all started seeming a little dubious recently. Most of the growth mindset experiments have used priming to get people in an effort-focused or an ability-focused state of mind, but recent priming experiments have famously failed to replicate and cast doubt on the entire field. And growth mindset has an obvious relationship to stereotype threat, which has also started seeming very shaky recently. So I have every reason to be both suspicious of and negatively disposed toward growth mindset.

Which makes it appalling that the studies are so damn good.

This is the context of my speculation that Carol Dweck has made a pact with the Devil. I haven’t accused (for example) the stereotype threat people of making a pact with the Devil. They did some crappy studies and exaggerated the results. That doesn’t require any diabolic help. Any social scientist can do that, and most of them do. What’s interesting about the growth mindset research is that it looks just like the sort of thing that should fall apart with a tiny gust of wind, but it actually hangs together pretty well.

BuzzFeed doesn’t really challenge that. The article spends most of its time snarking about how overhyped growth mindset is – and no objections there, given that its advocates claim that it can eg help defuse the Israel-Palestine conflict and bring peace to the Middle East. It spends a bit more time talking about how many people are doubtful – no objections there either, I’m doubtful too.

But in terms of the evidence against it, it’s kind of thin. I only see three real points:

First, it uses a technique called GRIM (granularity-related inconsistency of means). I like its explanation so I’m just going to quote it verbatim:

It works like this: Imagine you have three children, and want to find how many siblings they have, on average. Finding an average, or mean, will always involve adding up the total number of siblings and dividing by the number of children – three. So the answer will always either be a whole number, or will end in .33 (a third) or .67 (two thirds). If there was a study that looked at three children and found they had, on average, 1.25 siblings, it would be wrong – because you can’t get that answer from the mean of three whole numbers.

But Dweck says that she “took ambiguous answers as half scores” – maybe if the child was halfway between growth mindset and fixed mindset it was counted as a 0.5. It’s bad practice to do this kind of thing without mentioning it. But everyone engages in some bad practice sometimes. And I don’t see anybody claiming it affected the results, which were very strong and not likely to stand or fall based on these sorts of things. Nobody is claiming fraud, and Dweck released her original data, which looks pretty much like she was generally honest but had some sloppy reporting practices. Neither the statistician involved nor BuzzFeed claims this affects Dweck’s work very much.
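The GRIM check itself is simple enough to sketch in a few lines. Here’s a generic illustration of the idea in Python – the function, its parameters, and the half-point option are just my illustration of the logic described above, not the tool the statisticians actually ran:

```python
from math import floor, ceil

def grim_consistent(reported_mean, n, decimals=2, step=1.0):
    """GRIM-style check: can a mean reported to `decimals` places arise from
    averaging n values that each sit on a grid of spacing `step`?
    Use step=1.0 for whole-number scores, step=0.5 if ambiguous answers
    are counted as half points."""
    units = reported_mean * n / step           # implied total, in grid units
    for total in (floor(units), ceil(units)):  # the two nearest achievable totals
        if round(total * step / n, decimals) == round(reported_mean, decimals):
            return True
    return False

# The article's example: three children can't average 1.25 siblings...
print(grim_consistent(1.25, n=3))              # False
# ...but 1.33 is fine (4 siblings / 3 children, rounded).
print(grim_consistent(1.33, n=3))              # True
# Half scores make finer-grained means possible, eg 0.17 (= 0.5/3, rounded):
print(grim_consistent(0.17, n=3, step=0.5))    # True
print(grim_consistent(0.17, n=3, step=1.0))    # False
```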

Second, it mentions Stuart Ritchie’s criticism of a couple of recent Dweck papers which show “marginally significant” results. These results are so weak that they’re probably coincidence, but the paper hypes them up. There are a couple of studies like this, but they’re all in very tangential areas of mindsetology, like how children inherit their parents’ mindsets. The original studies, again, show very strong results that don’t need this kind of pleading. For example, the one I cited in my original post got seven different results at the p < 0.001 level. And there are a lot of studies like this.

Third, it mentions a psychologist, Timothy Bates, who has tried to replicate Dweck’s experiments (at least) twice, and failed. This is the strongest evidence the article presents. But I don’t think any of Bates’ failed replications have been published – or at least I couldn’t find them. Yet hundreds of studies that successfully demonstrate growth mindset have been published. Just as a million studies of a fake phenomenon will produce a few positive results, so a million replications of a real phenomenon will produce a few negative results. We have to look at the entire field and see the balance of negative and positive results. The last time I tried to do this, the only thing I could find was this meta-analysis of 113 studies which found a positive effect for growth mindset and relatively little publication bias in the field.

My intuition tells me not to believe this meta-analysis. But I think it’s really important to emphasize that I’m going off intuition. There’s no shame in defying the data when you think that’s justified, but you had better be really aware that’s what you’re doing.

I guess my concern is this: the Buzzfeed article sounds really convincing. But I could write an equally convincing article, with exactly the same structure, refuting eg global warming science. I would start by talking about how global warming is really hyped in the media (true!) and how people are making various ridiculous claims about it (true!), then interview a few scientists who doubt it (98% of climatologists believing it means 2% don’t), and cite two or three studies that fail to find it (98% of studies supporting it means 2% don’t). Then I would point out slight statistical irregularities in some of the key global warming papers, because every paper has slight statistical irregularities. Then I would talk about the replication crisis a lot.

I could do this with pretty much any theory I wanted. Any technique strong enough to disprove anything disproves nothing.

(and this is especially important in light of recent really strange negative results that eg fail to find a sunk cost effect, something I would hate to enshrine as “well, guess this has been debunked, no such thing as sunk cost now”)

Again, this isn’t to say I believe in growth mindset. I recently talked to a totally different professor who said he’d tried and failed to replicate some of the original growth mindset work (again, not yet published). But we should do this the right way and not let our intuitions leap ahead of the facts.

I worry that one day there’s going to be some weird effect that actually is a bizarre miracle. Studies will confirm it again and again. And if we’re not careful, we’ll just say “Yeah, but replication crisis, also I heard a rumor that somebody failed to confirm it,” and then forget about it. And then we’ll miss our chance to bring peace to the Middle East just by doing a simple experimental manipulation on the Prime Minister of Israel.

I think it’s good that people are starting to question growth mindset. But at this point questioning it isn’t enough. In my essay I tried to find problems that might have caused spurious effects in Dweck’s studies, and patterns inconsistent with growth mindset being powerful. I think we need to do more of that, plus look for specific statistical and experimental flaws in the papers supporting growth mindset, plus start collecting real published papers that fail to replicate growth mindset. Instead of talking about how sketchy it is, we need to actually disprove it.

We owe it to ourselves, to Carol Dweck, and to her infernal masters.


Why Do Test Scores Plateau?

I just got my exam results, so let’s talk medical residency standardized test statistics. In particular, let’s talk about average results by year – that is, compare doctors in their first year of training, their second year of training, etc.

I found three datasets. One is for internal medicine residents over their three-year education. Another is for psychiatry residents over their four-year education. The last is for surgery residents over their five-year education. All of them are standardized to a mean of 500 and standard deviation of 100 for all years lumped together. Here’s how they look:

INTERNAL MEDICINE (numbers eyeballed from graph)
Y1: 425
Y2: 500
Y3: 550

PSYCHIATRY
Y1: 412
Y2: 485
Y3: 534
Y4: 547

SURGERY
Y1: 399
Y2: 493
Y3: 543
Y4: 565
Y5: 570

Year of education starts out as an important factor relative to individual differences, but quickly becomes irrelevant. There’s only a 17% chance that a randomly chosen first-year surgeon will know more about surgery than an average second-year. But there’s a 31% chance a second-year will know more than a third-year, and a 48% chance that a fourth-year will know more than a fifth-year. Compare fourth-year and fifth-year surgeons, and it’s pretty close to 50-50 which of them will know more surgery.

(also, four percent of final-year surgeons about to graduate their training know less than a first-year trainee who just walked through the door. Enjoy thinking about that next time you get an operation)
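Those percentages fall straight out of the normal model. Here’s a quick back-of-the-envelope check (my own, assuming scores within each year are roughly normal with SD 100 and comparing a random resident from one year against the other year’s average):

```python
from statistics import NormalDist

SD = 100  # the exams are standardized to SD 100 across all years combined
surgery_means = {1: 399, 2: 493, 3: 543, 4: 565, 5: 570}

def p_beats_average(own_year, other_year, means=surgery_means, sd=SD):
    """P(a random resident from own_year scores above other_year's average)."""
    return 1 - NormalDist(means[own_year], sd).cdf(means[other_year])

print(f"Y1 beats the average Y2: {p_beats_average(1, 2):.0%}")      # ~17%
print(f"Y2 beats the average Y3: {p_beats_average(2, 3):.0%}")      # ~31%
print(f"Y4 beats the average Y5: {p_beats_average(4, 5):.0%}")      # ~48%
print(f"Y5 below the average Y1: {1 - p_beats_average(5, 1):.0%}")  # ~4%
```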

It looks like people learn the most in their first year, and less every following year. Checking averages of all the programs together supports this:

Y1 – Y2: + 81 points
Y2 – Y3: + 50 points
Y3 – Y4: + 18 points
Y4 – Y5: + 5 points

The standardized nature of the scoring hides how minimal these gains are. The surgery exam report tells me the raw percent correct, which goes like this:

Y1: 62%
Y2: 70%
Y3: 75%
Y4: 76%
Y5: 77%

So these numbers eventually plateau. I don’t think any residency program has a sixth year, but if it did people probably wouldn’t learn very much in it. Why not?

Might it be a simple ceiling effect – ie there’s only so much medicine to learn, and once you learn it, you’re done? No. We see above that Y5 surgeons are only getting 77% of questions right, well below the test ceiling. My hard-copy score report gives similar numbers for psychiatry. Also, individuals can do much better than yearly averages. Some psychiatrists I know consistently score in the high 600s / low 700s every year. Why can’t more years of education bring final-year residents closer to these high performers?

Might it be that final-year residents stop caring – the medical equivalent of senioritis? No. Score change per year seems similar across residencies regardless of how long the residencies are. For example, internal medicine residents gain 50 points in Year 3, about the same as psychiatrists and surgeons, even though internists finish that year, psychiatrists have one year left to go, and surgeons have two.

Might it be that programs stop teaching residents after three years or so, and they just focus on treating patients and not learning? That hasn’t been my experience. In my own residency program, residents in every year attend the same number of hours of lectures per week and get assigned the same number of papers and presentations. I think this is pretty typical.

Might it be that residents only learn by seeing patients, and there’s only a certain number of kinds of patients that you see regularly in an average hospital, so once you learn the kinds of cases you see, you’re done? And then more book learning doesn’t help at all? I think this is getting close, but it can’t be the right answer. If you look at that surgery table again, you see relatively similar trajectories for the sorts of things you learn by seeing patients (“Patient Care”, “Clinical Management”) and the sorts of things you learn from books and lectures (“Medical Knowledge, Applied Science”).

I don’t have a good answer for what’s going on. My gut feeling is that knowledge involves trees of complex facts branching off from personal experience and things that are constantly reinforced. Depending on an individual’s intelligence and interest in the topic, those trees can reach different depths before collapsing on themselves.

Spaced repetition programs like Anki talk about the forgetting curve, a model where memorized facts naturally decay after a certain amount of time until you remind yourself of them, after which they start decaying again (more slowly), and so on until by your nth repetition you’ll remember it for years or even the rest of your life.

Most people don’t use spaced repetition software, at least not consistently. For them, they’ll remember only those facts that get reinforced naturally. This is certainly true of doctors. By a weird turn of fate I earned a degree in obstetrics in 2012; five years later my knowledge of the subject has dwindled to a vague feeling that this comic isn’t completely accurate. On the other hand, I remember lots of facts about psychiatry and am optimistic about taking a board exam on the subject in September.

But most residents in a given specialty work about the same number of hours, see about the same sorts of patients – and yet still get very different scores on their exams. My guess is that individual differences in intelligence and interest affect things in two ways. First, some people probably have better memories than others, and can learn something if they go six months between reminders, whereas other people might forget it unless they get reminders every other month. Second, some people might be more intellectually curious than others, and so read a lot of journal articles that keep reminding them of things – whereas other people only think about them when it’s vital to the care of a patient they have right in front of them.

This still doesn’t feel right to me; I remember some things I’m not sure I ever get reminded about. Probably the degree to which you find something interesting matters a lot. And maybe there’s also a network effect, where when you think about any antidepressant (for example), it slightly reinforces and acts as a reminder about all your antidepressant-related knowledge, so that the degree to which everything you know is well-integrated and acts as a coherent whole matters a lot too.

Eventually you get to an equilibrium, where the amount of new knowledge you’re learning each day is the same as the amount of old knowledge you’re forgetting – and that’s your exam score. And maybe in medicine, given the amount of patient care and studying the average resident does each day, that takes three years or so.
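To make the equilibrium idea concrete, here’s a toy model – entirely made up, with arbitrary numbers – where you pick up a fixed number of new facts each day and forget a fixed fraction of what you know each day. The stock of retained facts climbs fast in the first year and then flattens out near (new facts per day) / (daily forgetting rate), no matter how many extra years you add:

```python
def knowledge_curve(new_per_day=20, p_forget=0.002, years=6):
    """Toy model: learn `new_per_day` facts daily, forget a fraction
    `p_forget` of everything you currently know each day."""
    known, end_of_year = 0.0, []
    for day in range(years * 365):
        known = (known + new_per_day) * (1 - p_forget)
        if (day + 1) % 365 == 0:
            end_of_year.append(round(known))
    return end_of_year

# Equilibrium is roughly new_per_day / p_forget = 10,000 facts.
print(knowledge_curve())
# -> roughly [5170, 7670, 8870, 9440, 9720, 9860]: a big first-year gain, then a plateau
```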

Why does this matter? A while ago I looked at standardized test scores for schoolchildren by year, eg 1st-graders and 2nd-graders taking the same standardized test. The second-graders did noticeably better than the first graders; obviously 12th graders would do better still. But this unfairly combines the effects of extra education with the effects of an extra year of development. A twelfth-grader’s brain is more mature than a first-grader’s. Louis Benezet experimented with teaching children no math until seventh grade, after which it took only a few months’ instruction to get them to perform at a seventh grade level. It would sure be awkward if that was how everything worked.

Medical residency exams avoid this problem by testing doctors with (one hopes) fully mature brains. They find diminishing returns after only a few years. How much relevance does this have to ordinary education? I’m not sure.

Heuristics Work Until They Don’t

I.

I got to talk to some AI researchers last week, and they emphasized how surprised everyone had been by recent progress in the field. They all agreed on why they were surprised: the “AI winters”, two (or so) past episodes when AI hype got out of control and led to an embarrassing failure to meet expectations. Eventually everyone learned the heuristic “AI progress will never be as fast as people expect”. Then AI progress went faster than expected, and everyone using the old heuristic was caught flat-footed, denying the evidence of their own eyes.

Is this surprising? It’s hard (and possibly meaningless) to segment the history of AI into distinct “eras”, but let’s try it just for fun: suppose that there were two past eras, both of which went worse than expected. If there are equal chances of an era meeting, exceeding, or missing expectations, then there’s a 22% chance that we either get two consecutive booms or two consecutive busts by pure coincidence. If we form a heuristic around this (“it’s always boom” or “it’s always bust”), then we’re interpreting noise and the future is likely to surprise us.

A quick and dirty Bayesian calculation: imagine three models. In Model A, researchers are biased towards optimism: 80% of the time, they will predict greater success than they actually attain, 10% of the time they will get it exactly right, and 10% of the time they will undershoot. In Model B, researchers are biased towards pessimism to the same degree. In Model C, researchers are unbiased and will overshoot, undershoot, and hit expectations with equal probability. Suppose we start with a 50% prior on Model C, and equal 25% probabilities for A and B. After observing one era of inflated expectations, we should have 52% chance A, 6% chance B, and 42% chance C. After observing two such eras, we should think 74% A, 1% B, and 25% C. Adding up the chances of all of the models, there’s a 67% chance that the next era will also be one of inflated expectations, but there’s a 33% chance it won’t be.
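Here’s a minimal sketch of that same update, using exactly the invented numbers above, for anyone who wants to check or tweak it:

```python
# Each model's probabilities for (era misses expectations, meets them, exceeds them).
models = {
    "A: biased optimistic":  (0.8, 0.1, 0.1),
    "B: biased pessimistic": (0.1, 0.1, 0.8),
    "C: unbiased":           (1/3, 1/3, 1/3),
}
priors = {"A: biased optimistic": 0.25, "B: biased pessimistic": 0.25, "C: unbiased": 0.5}

def update_on_disappointing_era(beliefs):
    """Bayesian update after observing one era of inflated expectations."""
    unnormalized = {m: beliefs[m] * models[m][0] for m in models}
    total = sum(unnormalized.values())
    return {m: p / total for m, p in unnormalized.items()}

posterior = update_on_disappointing_era(update_on_disappointing_era(priors))
for m, p in posterior.items():
    print(f"{m}: {p:.0%}")          # A ~73%, B ~1%, C ~25%

p_next = sum(posterior[m] * models[m][0] for m in models)
print(f"P(next era also disappoints): {p_next:.0%}")   # ~67%
```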

This is all completely made up, plus my math is probably wrong. My point is that these kinds of “heuristics” gleaned from n = 2 data points are a lot less interesting than you would think. Getting fooled twice in the same way probably feels pretty convincing, and I can’t blame the people involved for wanting to take a hard line against ever falling for it again. But their confidence that they’re right should be pretty low.

II.

Thinking about this reminded me of an article from The Week, November 2012:

Romney genuinely believed that he would become the nation’s 45th president, and was “shellshocked” by his landslide loss. “I don’t think there was one person who saw this coming,” one senior adviser told Jan Crawford at CBS News. Why was Team Romney so certain of victory? They simply did not believe that younger voters and minorities would turn out the way they did in 2008. “As a result,” says Crawford, “they believed that the public/media polls were skewed” in Obama’s favor, and rejiggered them to show Romney with “turnout levels more favorable to Romney.” In essence, Romney “unskewed” the polls, mirroring widely mocked moves by conservatives to show their candidate with a lead, epitomized by the now-infamous website UnskewedPolls.com. Romney’s defenders say he had plausible reasons to believe Obama’s turnout would be lower; less charitable commentators say Romney and his aides were stuck in a conservative media echo chamber at odds with reality.

Mitt Romney lost in exactly the way all the polls had predicted he would lose, but he wasn’t expecting it because he had cheerfully constructed a story of decreased minority turnout which no real poll supported. This story became a kind of King Canute style warning of the folly of Man – just accept the fricking polls, don’t come up with some private narrative about how decreased turnout will show up on a white horse and save you at the last second.

But we all know what happened in 2016. In retrospect, the fact that decreased minority turnout didn’t happen in one election, with the most popular-among-minorities candidate of all time, shouldn’t have been enough to form a strong heuristic that it would never happen at all.

This is even worse than the story above, because it’s n = 1. I wonder if part of it is the degree to which Romney’s loss formed a useful moral parable – the story of the arrogant fool who said that all the evidence against him was wrong, but got his comeuppance. Well, this last election taught us that arrogant fools don’t get their comeuppance as consistently as we would like.

III.

Speaking of the 2016 election, I feel the same way about this explanation of Hillary’s loss. It spins a narrative where the Hillary campaign management put all of their trust in flashy Big Data and ignored the grizzled campaign specialists who had boots on the ground, as if this was a moral lesson we should all take to heart.

But Moneyball makes the opposite argument. There, managers boldly decided to trust in statistics instead of just listening to the “intuitions” and “conventional wisdom” of professed experts, and they trounced the grizzled people with their ground-boots.

Anyone who learned the obvious lesson from Moneyball (“Hard math can defeat fallible human intuitions”) would fail at the 2016 campaign, and anyone who learned the obvious lesson from the 2016 campaign (“Real experience and domain knowledge beat overeducated Big Data hotshots every time”) would fail at the 2003 baseball season.

The solution is: stop treating life as a series of moral parables. Once you get that, it all just becomes evidence – and then you wonder whether a single data point about Presidential campaigns necessarily generalizes to baseball or whatever.

IV.

If I’ve successfully convinced you that you shouldn’t form strong heuristics just by looking at a few salient examples where they seem to hold true, then shame on you.


Predictions For 2017

At the beginning of every year, I make predictions. At the end of every year, I score them. So here are a hundred more for 2017.

WORLD EVENTS
1. US will not get involved in any new major war with death toll of > 100 US soldiers: 60%
2. North Korea’s government will survive the year without large civil war/revolt: 95%
3. No terrorist attack in the USA will kill > 100 people: 90%
4. …in any First World country: 80%
5. Assad will remain President of Syria: 80%
6. Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90%
7. No major intifada in Israel this year (ie > 250 Israeli deaths, but not in Cast Lead style war): 80%
8. No interesting progress with Gaza or peace negotiations in general this year: 90%
9. No Cast Lead style bombing/invasion of Gaza this year: 90%
10. Situation in Israel looks more worse than better: 70%
11. Syria’s civil war will not end this year: 60%
12. ISIS will control less territory than it does right now: 90%
13. ISIS will not continue to exist as a state entity in Iraq/Syria: 50%
14. No major civil war in Middle Eastern country not currently experiencing a major civil war: 90%
15. Libya to remain a mess: 80%
16. Ukraine will neither break into all-out war nor get neatly resolved: 80%
17. No major revolt (greater than or equal to Tiananmen Square) against Chinese Communist Party: 95%
18. No major war in Asia (with >100 Chinese, Japanese, South Korean, and American deaths combined) over tiny stupid islands: 99%
19. No exchange of fire over tiny stupid islands: 90%
20. No announcement of genetically engineered human baby or credible plan for such: 90%
21. EMDrive is launched into space and testing is successfully begun: 70%
22. A significant number of skeptics will not become convinced EMDrive works: 80%
23. A significant number of believers will not become convinced EMDrive doesn’t work: 60%
24. No major earthquake (>100 deaths) in US: 99%
25. No major earthquake (>10000 deaths) in the world: 60%
26. Keith Ellison chosen as new DNC chair: 70%

EUROPE
27. No country currently in Euro or EU announces new plan to leave: 80%
28. France does not declare plan to leave EU: 95%
29. Germany does not declare plan to leave EU: 99%
30. No agreement reached on “two-speed EU”: 80%
31. The UK triggers Article 50: 90%
32. Marine Le Pen is not elected President of France: 60%
33. Angela Merkel is re-elected Chancellor of Germany: 60%
34. Theresa May remains PM of Britain: 80%
35. Fewer refugees admitted 2017 than 2016: 95%

ECONOMICS
36. Bitcoin will end the year higher than $1000: 60%
37. Oil will end the year higher than $50 a barrel: 60%
38. …but lower than $60 a barrel: 60%
39. Dow Jones will not fall > 10% this year: 50%
40. Shanghai index will not fall > 10% this year: 50%

TRUMP ADMINISTRATION
41. Donald Trump remains President at the end of 2017: 90%
42. No serious impeachment proceedings are active against Trump: 80%
43. Construction on Mexican border wall (beyond existing barriers) begins: 80%
44. Trump administration does not initiate extra prosecution of Hillary Clinton: 90%
45. US GDP growth lower than in 2016: 60%
46. US unemployment to be higher at end of year than beginning: 60%
47. US does not withdraw from large trade org like WTO or NAFTA: 90%
48. US does not publicly and explicitly disavow One China policy: 95%
49. No race riot killing > 5 people: 95%
50. US lifts at least half of existing sanctions on Russia: 70%
51. Donald Trump’s approval rating at the end of 2017 is lower than fifty percent: 80%
52. …lower than forty percent: 60%

COMMUNITIES
53. SSC will remain active: 95%
54. SSC will get fewer hits than in 2016: 60%
55. At least one SSC post > 100,000 hits: 70%
56. I will complete an LW/SSC survey: 80%
57. I will finish a long FAQ this year: 60%
58. Shireroth will remain active: 70%
59. No co-bloggers (with more than 5 posts) on SSC by the end of this year: 80%
60. Less Wrong renaissance attempt will seem less (rather than more) successful by end of this year: 90%
61. > 15,000 Twitter followers by end of this year: 80%
62. I won’t stop using Twitter, Tumblr, or Facebook: 90%
63. I will attend the Bay Area Solstice next year: 90%
64. …some other Solstice: 60%
65. …not the New York Solstice: 60%

WORK
66. I will take the job I am currently expecting to take: 90%
67. …at the time I am expecting to take it, without any delays: 80%
68. I will like the job and plan to continue doing it for a while: 70%
69. I will pass my Boards: 90%
70. I will be involved in at least one published/accepted-to-publish research paper by the end of 2017: 50%
71. I will present a research paper at the regional conference: 80%
72. I will attend the APA national meeting in San Diego: 90%
73. None of my outpatients to be hospitalized for psychiatric reasons during the first half of 2017: 50%
74. None of my outpatients to be involuntarily committed to psych hospital by me during the first half of 2017: 70%
75. None of my outpatients to attempt suicide during the first half of 2017: 90%
76. I will not have scored 95th percentile or above when I get this year’s PRITE scores back: 60%

PERSONAL
77. Amazon will not harass me to get the $40,000 they gave me back: 80%
78. …or at least will not be successful: 90%
79. I will drive cross-country in 2017: 70%
80. I will travel outside the US in 2017: 70%
81. …to Europe: 50%
82. I will not officially break up with any of my current girlfriends: 60%
83. K will spend at least three months total in Michigan this year: 70%
84. I will get at least one new girlfriend: 70%
85. I will not get engaged: 90%
86. I will visit the Bay in May 2017: 60%
87. I will have moved to the Bay Area: 99%
88. I won’t live in Godric’s Hollow for at least two weeks continuous: 70%
89. I won’t live in Volterra for at least two weeks continuous: 70%
90. I won’t live in the Bailey for at least two weeks continuous: 95%
91. I won’t live in some other rationalist group home for at least two weeks continuous: 90%
92. I will be living in a house (incl group house) and not apartment building at the end of 2017: 60%
93. I will still not have gotten my elective surgery: 90%
94. I will not have been hospitalized (excluding ER) for any other reason: 95%
95. I will make my savings target at the end of 2017: 60%
96. I will not be taking any nootropic (except ZMA) daily or near-daily during any 2-month period this year: 90%
97. I won’t publicly and drastically change highest-level political/religious/philosophical positions (eg become a Muslim or Republican): 90%
98. I will not get drunk this year: 80%
99. I get at least one article published on a major site like Huffington Post or Vox or New Statesman or something: 50%
100. I attend at least one wedding this year: 50%
101. Still driving my current car at the end of 2017: 90%
102. Car is not stuck in shop for repairs for >1 day during 2017: 60%
103. I will use Lyft at least once in 2017: 60%
104. I weigh > 185 pounds at the end of 2017: 60%
105. I weigh < 195 pounds at the end of 2017: 70%


Trump And The Batman Effect

Today on Trump Twitter:

Here’s my concern.

When US companies do something that sounds good in the next few years, whether it’s hiring new people, or deciding to stay in the United States, or reporting high profits, some of them are going to credit President Trump.

First, because it’s going to get them good press. “Ford decides not to build plant in Mexico” is tenth-page news. “Ford decides not to build plant in Mexico because of President Trump” is front-page news.

But second, because it’s going to make the President like them. I don’t know whether Trump is secretly sending people to whatever conferences all of these people go to, saying “if you decide to do something good, give me credit, and I’ll do you a favor later”. I assume he isn’t. This is the sort of thing that coordinates itself, without any inconvenient documents that can get posted to WikiLeaks later. If you’re the CEO of Ford, and you notice you’re doing something that would make Trump look really good if you attributed it to him, why not attribute it to him for free, then remind him how much he likes you next time you need a tax cut or a subsidy or something? Trump has put a lot of effort into crafting his image as a person who repays favors (think appointing many of his earliest supporters to Cabinet positions) – you think businesspeople aren’t going to notice that kind of thing?

But also:

0.1% of the time a US company does something that looks bad, like close a plant or move jobs overseas, Trump is going to launch a media crusade against them. The Presidency has a big pulpit and he’s going to get a lot of people angry. Then Trump will offer them some kind of deal, and the company will back down. Not because they’ve learned the error of their ways. Not even because the deal was so good. But because making the President (and the public) happy is much more important to them than moving jobs to Mexico or whatever they were doing before.

Mother Jones mentions in passing that Carrier air conditioning, Trump’s biggest job “success” so far, is owned by a giant defense contractor who gets probably like 1% of their profits from air conditioning. Presumably the company would be happy to never sell another air conditioner again if it meant that the government chooses their fighter jets over the competing brand. Knowing Trump’s style of corruption, they have every reason to believe this will happen after they handed him a big PR victory.

This plan isn’t going to scale. Even Trump can only create so many media circuses. 999 companies will successfully move to Mexico in the amount of time it takes Trump to convince one company not to. But almost tautologically, the only ones we’ll ever hear about are the ones that become media circuses, and so it will look like Trump keeps winning.

So based on these two strategies, we are in for four years of sham Trump victories which look really convincing on a first glance. Every couple of weeks, until it gets boring, another company is going to say Trump convinced them to keep jobs in the United States. The total number of jobs saved this way will never be more than a tiny fraction of the jobs that could be saved by (eg) good economic policy, but nobody knows anything about economic policy and Trump will make sure everybody hears about Ford keeping jobs in the US. Every one of these victories will actively make the world worse, in the sense that these big companies will get taxpayer subsidies or favors they can call in later to distort government priorities, but nobody’s going to notice these either.

I think it’s important that we be prepared for this and send a clear message, before this gets any worse, that these aren’t to be taken seriously.

I also think it’s important to be prepared for the fact that this clear message won’t work. Imagine you’re a factory worker in Indiana, and every week you hear on the news that Trump convinced another factory to stay in the US. And also, you read an editorial by Paul Krugman or someone saying that this is all a trick. What do you end out believing?

And saving jobs isn’t the only way he can do this. Trump’s talent is PR, having his finger on the pulse of the media. He can spot things like that guy who raised the price of the toxoplasma drug 1000%, and then he can go in, make some corrupt deal, and get him to back down. He can spot all of those culture war things where the entire country is going to spend a month focused on the same small-town bakery, and by throwing around the entire might of the federal government he can probably make everyone back off and pose together for a nice group photo. If he can get all of these things right (and it will play exactly to his talents), then a majority of people won’t care what policies his administration passes. I think this is a big part of his plan.

There’s an old joke about Batman. Suppose you’re a hypercompetent billionaire in a decaying city, and you want to do something about the crime problem. What’s your best option? Maybe you could donate money to law enforcement, or after-school programs for at-risk teens, or urban renewal. Or you could urge your company full of engineering geniuses to invent new police tactics and better security systems. Or you could use your influence as a beloved celebrity to petition the government to pass laws which improve efficiency of the justice system.

Bruce Wayne decided to dress up in a bat costume and personally punch criminals. And we love him for it.

I worry that Trump’s plan for his administration is to dress up in a President costume and personally punch people we don’t like, while leaving policy to rot. And I worry it’s going to work.

[prediction: highly-publicized stories about Trump successfully keeping businesses in the US on a case-by-case basis, which never add up to a significant number of jobs saved, will keep coming, and be a central point of how his administration relates to the public over the next year: 50%]


OT66: Thread Lang Syne (+ Irvine Meetup)

This is the bi-weekly visible open thread. There are hidden threads every few days here. Post about anything you want, ask random questions, whatever. Also:

1. Happy new year! And thanks to everyone who read, contributed to, and supported this blog during 2016.

2. I’ll be in Irvine, California early next week. If anyone there wants to meet, I can make it to the Peet’s Coffee at 4213 Campus Dr at 7:00 PM on January 4th. Don’t expect many posts during that time.

3. I’ve updated the Mistakes page with a few new mistakes and metamistakes.

4. I know that raikoth.net is down. I think I am going to let it disappear quietly. It was poorly organized and too linked to my real name. I’ll repost the non-libertarian FAQ and other interesting stuff here sometime soon. Thanks to everyone who pointed this out to me.


2016 Predictions: Calibration Results

At the beginning of every year, I make predictions. At the end of every year, I score them. Here are 2014 and 2015.

And here are the predictions I made for 2016. Strikethrough’d are false. Intact are true. Italicized are getting thrown out because I can’t decide if they’re true or not.

WORLD EVENTS
1. US will not get involved in any new major war with death toll of > 100 US soldiers: 60%
2. North Korea’s government will survive the year without large civil war/revolt: 95%
3. Greece will not announce it’s leaving the Euro: 95%
4. No terrorist attack in the USA will kill > 100 people: 90%
5. …in any First World country: 80%
6. Assad will remain President of Syria: 60%
7. Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90%
8. No major intifada in Israel this year (ie > 250 Israeli deaths, but not in Cast Lead style war): 80%
9. No interesting progress with Gaza or peace negotiations in general this year: 90%
10. No Cast Lead style bombing/invasion of Gaza this year: 90%
11. Situation in Israel looks more worse than better: 70%
12. Syria’s civil war will not end this year: 70%
13. ISIS will control less territory than it does right now: 90%
14. ISIS will not continue to exist as a state entity: 60%
15. No major civil war in Middle Eastern country not currently experiencing a major civil war: 90%
16. Libya to remain a mess: 80%
17. Ukraine will neither break into all-out war nor get neatly resolved: 80%
18. No country currently in Euro or EU announces plan to leave: 90%
19. No agreement reached on “two-speed EU”: 80%
20. Hillary Clinton will win the Democratic nomination: 95%
21. Donald Trump will win the Republican nomination: 60%
22. Conditional on Trump winning the Republican nomination, he impresses everyone how quickly he pivots towards wider acceptability: 70%
23. Conditional on Trump winning the Republican nomination, he’ll lose the general election: 80%
24. Conditional on Trump winning the Republican nomination, he’ll lose the general election worse than either McCain or Romney: 70%
25. Marco Rubio will not win the Republican nomination: 60%
26. Bloomberg will not run for President: 80%
27. Hillary Clinton will win the Presidency: 60%
28. Republicans will keep the House: 95%
29. Republicans will keep the Senate: 70%
30. Bitcoin will end the year higher than $500: 80%
31. Oil will end the year lower than $40 a barrel: 60%
32. Dow Jones will not fall > 10% this year: 70%
33. Shanghai index will not fall > 10% this year: 60%
34. No major revolt (greater than or equal to Tiananmen Square) against Chinese Communist Party: 95%
35. No major war in Asia (with >100 Chinese, Japanese, South Korean, and American deaths combined) over tiny stupid islands: 99%
36. No exchange of fire over tiny stupid islands: 90%
37. US GDP growth lower than in 2015: 60%
38. US unemployment to be lower at end of year than beginning: 50%
39. No announcement of genetically engineered human baby or credible plan for such: 90%
40. No major change in how the media treats social justice issues from 2015: 70%
41. European far right makes modest but not spectacular gains: 80%
42. Mainstream European position at year’s end is taking migrants was bad idea: 60%
43. Occupation of Oregon ranger station ends: 99%
44. So-called “Ferguson effect” continues and becomes harder to deny: 70%
45. SpaceX successfully launches a reused rocket: 50%
46. Nobody important changes their mind much about the EMDrive based on any information found in 2016: 80%
47. California’s drought not officially declared over: 50%
48. No major earthquake (>100 deaths) in US: 99%
49. No major earthquake (>10000 deaths) in the world: 60%

PERSONAL/COMMUNITY
1. SSC will remain active: 95%
2. SSC will get fewer hits than in 2015: 60%
3. At least one SSC post > 100,000 hits: 50%
4. UNSONG will get fewer hits than SSC in 2016: 90%
5. > 10 new permabans from SSC this year: 70%
5. UNSONG will get > 1,000,000 hits: 50%
6. UNSONG will not miss any updates: 50%
7. UNSONG will have higher Google Trends volume than HPMOR at the end of this year: 60%
8. UNSONG Reddit will not have higher average user activity than HPMOR Reddit at the end of this year: 60%
9. Shireroth will remain active: 70%
10. I will be involved in at least one published/accepted-to-publish research paper by the end of 2016: 50%
11. I won’t stop using Twitter, Tumblr, or Facebook: 95%
12. > 10,000 Twitter followers by end of this year: 50%
13. I will not break up with any of my current girlfriends: 70%
14. I will not get any new girlfriends: 50%
15. I will attend at least one Solstice next year: 90%
16. …at least two Solstices: 70%
17. I will finish a long blog post review of stereotype threat this year: 60%
18. Conditional on finishing it, it won’t significantly change my position: 90%
19. I will finish a long FAQ this year: 60%
20. I will not have a post-residency job all lined up by the end of this year: 80%
21. I will have finished all the relevant parts of my California medical license application by the end of this year: 70%
22. I will no longer be living in my current house at the end of this year: 70%
23. I will still be at my current job: 95%
24. I will still not have gotten my elective surgery: 80%
25. I will not have been hospitalized (excluding ER) for any other reason: 95%
26. I will not have taken any international vacations with my family: 70%
27. I will not be taking any nootropic daily or near-daily during any 2-month period this year: 90%
28. I will complete an LW/SSC survey: 80%
29. I will complete a new nootropics survey: 80%
30. I will score 95th percentile or above in next year’s PRITE: 50%
31. I will not be Chief Resident next year: 60%
32. I will not have any inpatient rotations: 50%
33. I will continue doing outpatient at the current clinic: 90%
34. I will not have major car problems: 60%
35. I won’t publicly and drastically change highest-level political/religious/philosophical positions (eg become a Muslim or Republican): 90%
36. I will not vote in the 2016 primary: 70%
37. I will vote in the 2016 general election: 60%
38. Conditional on me voting and Hillary being on the ballot, I will vote for Hillary: 90%
39. I will not significantly change my mind about psychodynamic or cognitive-behavioral therapy: 80%
40. I will not attend the APA meeting this year: 80%
41. I will not do any illegal drugs (besides gray-area nootropics) this year: 90%
42. I will not get drunk this year: 80%
43. Less Wrong will neither have shut down entirely nor undergone any successful renaissance/pivot by the end of this year: 60%
44. No co-bloggers (with more than 5 posts) on SSC by the end of this year: 80%
45. I get at least one article published on a major site like Huffington Post or Vox or New Statesman or something: 50%
46. I still plan to move to California when I’m done with residency: 90%
47. I don’t manage to make it to my friend’s wedding in Ireland: 60%
48. I don’t attend any weddings this year: 50%
49. I decide to buy the car I am currently leasing: 60%
50. Except for the money I spend buying the car, I make my savings goal before July 2016: 90%

Of 50% predictions, I got 8 right and 5 wrong, for a score of 62%
Of 60% predictions, I got 12 right and 9 wrong, for a score of 57%
Of 70% predictions, I got 13 right and 3 wrong, for a score of 81%
Of 80% predictions, I got 13 right and 3 wrong, for a score of 81%
Of 90% predictions, I got 16 right and 1 wrong, for a score of 94%
For 95% predictions, I got 9 right and 0 wrong, for a score of 100%
For 99% predictions, I got 3 right and 0 wrong, for a score of 100%
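(For the record, the bucket percentages are just right answers divided by total predictions at each confidence level. Here’s a minimal sketch of the tally; the Brier score at the end is an extra single-number summary, not part of the usual bucket breakdown.)

```python
# (right, wrong) counts per stated confidence level, from the tallies above.
buckets = {
    0.50: (8, 5), 0.60: (12, 9), 0.70: (13, 3), 0.80: (13, 3),
    0.90: (16, 1), 0.95: (9, 0), 0.99: (3, 0),
}

for conf, (right, wrong) in buckets.items():
    print(f"stated {conf:.0%}: actual {right / (right + wrong):.0%} over {right + wrong} predictions")

# Brier score: each right prediction contributes (1 - conf)^2, each wrong one conf^2.
n = sum(r + w for r, w in buckets.values())
brier = sum(r * (1 - c) ** 2 + w * c ** 2 for c, (r, w) in buckets.items()) / n
print(f"Brier score: {brier:.3f}  (0 is perfect, 0.25 is coin-flipping)")
```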

This is the graph of my accuracy for this year:

Red is hypothetical perfect calibration, blue is my calibration. I am too lazy and bad at graphs to put in the 95% point, but it doesn’t change the picture very much (especially because it’s impossible to get a very accurate 95% with 9 questions).

The 50% number is pretty meaningless, as many people have noted, so my main deviation was some underconfidence at 70%. This was probably meaningless in the context of this year’s numbers alone, but looking back at 2014 and 2015, I see a pretty similar picture. I am probably generally a little bit underconfident in medium probabilities (I have also gotten lazier about making graphs).

Overall I rate this year’s predictions a success. Predictions for 2017 coming soon.
