Slate Star Codex

Dispatches from Weird Platonic Spherical Cow Perfect Rationality Outside View World

Before You Get Too Excited About That GitHub Study…

Another day, another study purporting to find that Tech Is Sexist. Since it’s showing up here, you probably already guessed how this is going to end. Most of this analysis is not original to me – Hacker News had figured a lot of it out before I even woke up this morning – but I think it’ll at least be helpful to collect all the information in one easily linkable place.

The study is Gender Bias In Open Source: Pull Request Acceptance Of Women Vs. Men. It’s a pretty neat idea: “pull requests” are discrete units of contribution to an open source project which are either accepted or rejected by the community, so just check which ones are submitted by men vs. women and what the acceptance rate is. This is a little harder than it sounds – people on GitHub use nicks that don’t always give gender cues – but the researchers wrote a program to automatically link contributor emails to Google Plus pages so they could figure out users’ genders.

Of course, this doesn’t rule out that one gender is genuinely doing something differently than another, so they had another neat trick: they wrote another program that automatically scored accounts on obvious gender cues: for example, somebody whose nickname was JaneSmith01, or somebody who had a photo of themselves on their profile. By comparing obviously gendered participants with non-obviously gendered participants whom the researchers had nevertheless been able to find the gender of, they should be able to tell whether there’s gender bias in request acceptances.

Because GitHub is big and their study is automated, they manage to get a really nice sample size – about 2.5 million pull requests by men and 150,000 by women.

They find that women get more (!) requests accepted than men for all of the top ten programming languages. They check some possible confounders – whether women make smaller changes (easier to get accepted) or whether their changes are more likely to serve an immediate project need (again, easier to get accepted) and in fact find the opposite – women’s changes are larger and less likely to serve project needs. That makes their better performance extra impressive.

So the big question is whether this changes based on obviousness of gender. The paper doesn’t give a lot of the analyses I want to see, and doesn’t make its data public, so we’ll have to go with the limited information they provide. They do not provide an analysis of the population as a whole (!) but they do give us a subgroup analysis by “insider status”, ie whether the person has contributed to that project before.

Among insiders, women do the same as men when gender is hidden, but better than men when gender is revealed. In other words, if you know somebody’s a woman, you’re more likely to approve her request. We can’t quantify exactly how much this is, because the paper doesn’t provide numbers, just graphs. Eyeballing the graph, it looks like being a woman gives you about a 1% advantage. The study and the media coverage ignore this result, even though it’s half the study, and as far as I can tell the more statistically significant half.

Among outsiders, women do the same as/better than men when gender is hidden, and the same as/worse than men when gender is revealed. I can’t be more specific than this because the study doesn’t give numbers and I’m trying to eyeball confidence intervals on graphs. The study itself says that women do worse than men when gender is revealed, so since the researchers presumably have access to the real numbers, that might mean the confidence intervals don’t overlap. From eyeballing the graph, it looks like the difference is 1% – ie, men get their requests approved 64% of the time, and women 63% of the time. Once again, it’s hard to tell by graph-eyeballing whether these two numbers are within each other’s confidence intervals.
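To make concrete why graph-eyeballing can’t settle this, here is a minimal sketch of a two-proportion z-test on the eyeballed 64% vs. 63% rates. The paper does not report the subgroup sample sizes, so the group sizes below are purely hypothetical, and the function name is mine. The point is only that the same one-point gap can be nowhere near significant or overwhelmingly significant depending on a number we don’t have.

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Return (z, two-sided p-value) for H0: both groups share one true rate."""
    # Pooled acceptance rate under the null hypothesis.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Rates are eyeballed from the paper's graph; group sizes are HYPOTHETICAL.
for n_per_group in (1_000, 10_000, 100_000):
    z, p = two_proportion_z_test(0.64, n_per_group, 0.63, n_per_group)
    print(f"n = {n_per_group:>7} per group: z = {z:.2f}, p = {p:.4g}")
```

A real analysis would plug in the actual counts of outsider pull requests per gender, which is one more reason to want the raw numbers rather than graphs.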

The paper concludes that “for insiders…we see little evidence of bias…for outsiders, we see evidence of gender bias: women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable. There is a similar drop for men, but the effect is not as strong.”

In other words, they conclude there is gender bias because obvious-women do worse than gender-anonymized-women. They admit that obvious-men also do worse than gender-anonymized men, but they ignore this effect because it’s smaller. They do not report doing a test of statistical significance on whether it is really smaller or not.

So:

1. Among insiders, women get more requests accepted than men.

2. Among insiders, people are biased towards women, that is, revealing genders gives women an advantage over men above and beyond the case where genders are hidden.

3. Among outsiders, women still get more requests accepted than men.

4. Among outsiders, revealing genders appears to show a bias against women. It’s not clear if this is statistically significant.

5. When all genders are revealed among outsiders, men appear to have their requests accepted at a rate of 64%, and women of 63%. The study does not provide enough information to determine whether this is statistically significant.

6. The study describes its main finding as being that women have fewer requests approved when their gender is known. It hides on page 16 that men also have fewer requests approved when their gender is known. It describes the effect for women as larger, but does not report the size of the male effect, nor whether the difference is statistically significant. From eyeballing it, the male effect is about 2/3 the size of the female effect, and as for whether the difference is significant: maybe?

7. The study has no hypothesis for why both sexes have fewer requests approved when their gender is known, without which it seems kind of hard to speculate about the significance of the phenomenon for one gender in particular. For example, suppose that the reason revealing gender decreases acceptance rates is because corporate contributors tend to use their (gendered) real names and non-corporate contributors tend to use handles like 133T_HAXX0R. And suppose that the best people of all genders go to work at corporations, but a bigger percent of women go there than men. Then being non-gendered would be a higher sign of quality in a man than in a woman. This is obviously a silly just-so story, but my point is that without knowing why all genders show a decline after unblinding, it’s premature to speculate about why their declines are of different magnitudes – and it doesn’t take much to get a difference of 1%.

8. There’s no study-wide analysis, and no description of how many different subgroup analyses the study tried before settling on Insiders vs. Outsiders (nor how many different definitions of Insider vs. Outsider they tried). Remember, for every subgroup you try, you need to do a Bonferroni correction. This study does not do any Bonferroni corrections; given its already ambiguous confidence intervals, a proper correction would almost certainly destroy the finding.

9. We still have that result from before that women’s changes are larger and less likely to serve project needs, both of which make them less likely to be accepted. No attempt was made to control for this.
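For anyone who hasn’t seen a Bonferroni correction (point 8 above), here is a minimal sketch. The p-values are hypothetical placeholders, since the paper reports none, and `bonferroni` is just an illustrative helper name. The idea is that if you ran m subgroup comparisons, each one has to clear alpha / m rather than alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, still significant?) for each comparison."""
    m = len(p_values)
    threshold = alpha / m  # each test must clear this stricter bar
    return [(p, min(p * m, 1.0), p <= threshold) for p in p_values]

# HYPOTHETICAL p-values for four subgroup comparisons; not from the paper.
hypothetical_p_values = [0.03, 0.20, 0.04, 0.60]

for raw, adjusted, significant in bonferroni(hypothetical_p_values):
    print(f"raw p = {raw:.2f}, adjusted p = {adjusted:.2f}, significant: {significant}")
```

With these made-up inputs, two results that look significant on their own (0.03 and 0.04) stop being significant once you account for having tried four comparisons. That is the sense in which a borderline finding can evaporate.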

“Science” “journalism”, care to give a completely proportionate and reasonable response to this study?

Here’s Business Insider: Sexism Is Rampant Among Programmers On GitHub, Research Finds. “A new research report shows just how ridiculously tough it can be to be a woman programmer, especially in the very male-dominated world of open-source software….it also shows that women face a giant hurdle of “gender bias” when others assess their work. This research also helps explain the bigger problem: why so many women who do enter tech don’t stick around in it, and often move on to other industries within 10 years. Why bang your head against the wall for longer than a decade?” [EDIT: the title has since been changed]

Here’s Tech Times: Women Code Better Than Men But Only If They Hide Their Gender: “Interestingly enough, among users who were not well known in the coding community, coding suggestions from those whose profiles clearly stated that the users were women had a far lower acceptance rate than suggestions from those who did not make their gender known. What this means is that there is a bias against women in the coding world.” (Note the proportionate and reasonable use of the term “far lower acceptance rate” to refer to a female vs. male acceptance rate of, in the worst case, 63% vs. 64%.)

Here’s Vice.com: Women Are Better At Coding Than Men: “If feminism has taught us anything, it’s that almost all men are sexist. As this GitHub data shows, whether or not bros think that they view women as equals, women’s work is not being judged impartially. On the web, a vile male hive mind is running an assault mission against women in tech.”

This is normally the part at which I would ask how this study got through peer review, but luckily this time there is a very simple answer: it didn’t. If you read the study, you may notice the giant red “NOT PEER-REVIEWED” sign on the top of every page. The paper was uploaded to a pre-peer-review site asking for comments. The authors appear to be undergraduate students.

I don’t blame the authors for doing a neat study and uploading it to a website. I do blame the entire world media up to and including the BBC for swallowing it uncritically. Note that two of the three news sources above failed to report that it is not peer-reviewed.

Oh, one more thing. A commenter on the paper’s pre-print asked for a breakdown by approver gender, and the authors mentioned that “Our analysis (not in this paper — we’ve cut a lot out to keep it crisp) shows that women are harder on other women than they are on men. Men are harder on other men than they are on women.”

Depending on what this means – since it was cut out of the paper to “keep it crisp”, we can’t be sure – it sounds like the effect is mainly from women rejecting other women’s contributions, and men being pretty accepting of them. Given the way the media predictably spun this paper, it is hard for me to conceive of a level of crispness which justifies not providing this information.

So, let’s review. A non-peer-reviewed paper shows that women get more requests accepted than men. In one subgroup, unblinding gender gives women a bigger advantage; in another subgroup, unblinding gender gives men a bigger advantage. When gender is unblinded, both men and women do worse; it’s unclear if there are statistically significant differences in this regard. Only one of the study’s subgroups showed lower acceptance for women than men, and the size of the difference was 63% vs. 64%, which may or may not be statistically significant. This may or may not be related to the fact, demonstrated in the study, that women propose bigger and less useful changes on average; no attempt was made to control for this. This tiny amount of discrimination against women seems to be mostly from other women, not from men.

The media uses this to conclude that “a vile male hive mind is running an assault mission against women in tech.”

Every time I say I’m nervous about the institutionalized social justice movement, people tell me that I’m crazy, that I’m just sexist and privileged, and that feminism is merely the belief that women are people so any discomfort with it is totally beyond the pale. I would nevertheless like to re-emphasize my concerns at this point.

[EDIT: I don’t have much of a quarrel with the authors, who seem to have done an interesting study and are doing the correct thing by submitting it for peer review. I have a big quarrel with “science” “journalists” for the way they reported it. If any of the authors read this and want my peer review suggestions, I would recommend:

1. Report gender-unblinding results for the entire population before you get into the insiders-vs.-outsiders dichotomy.
2. Give all numbers represented on graphs as actual numbers too.
3. Declare how many different subgroup groupings you tried, and do appropriate Bonferroni corrections.
4. Report the magnitude of the male drop vs. the female drop after gender-unblinding, test if they’re different, and report the test results.
5. Add the part about men being harder on men and vice versa, give numbers, and do significance tests.
6. Try to find an explanation for why both groups’ rates dropped with gender-unblinding. If you can’t, at least say so in the Discussion and propose some possibilities.
7. Fix the way you present “Women’s acceptance rates are 71.8% when they use gender neutral profiles, but drop to 62.5% when their gender is identifiable”, at the very least by adding the comparable numbers about the similar drop for men in the same sentence. Otherwise this will be the heading for every single news article about the study and nobody will acknowledge that the drop for men exists at all. This will happen anyway no matter what you do, but at least it won’t be your fault.
8. If possible, control for your finding that women’s changes are larger and less-needed and see how that affects results. If this sounds complicated, I bet you could find people here who are willing to help you.
9. Please release an anonymized version of the data; it should be okay if you delete all identifiable information.]
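Suggestion 4 is easy to sketch. The women’s rates below (71.8% and 62.5%) are the ones the paper quotes. The men’s rates are my own eyeballed guesses (64% revealed, and a drop about 2/3 the size of the women’s), and every sample size is hypothetical, so treat this as an illustration of the test rather than a result. It assumes the four groups are independent samples.

```python
import math

def drop_difference_test(hidden_a, n_hidden_a, shown_a, n_shown_a,
                         hidden_b, n_hidden_b, shown_b, n_shown_b):
    """z-test for whether group A's drop differs from group B's drop."""
    drop_a = hidden_a - shown_a
    drop_b = hidden_b - shown_b

    def variance(p, n):
        # Sampling variance of an observed proportion p from n requests.
        return p * (1 - p) / n

    # Variances add because the four samples are assumed independent.
    se = math.sqrt(variance(hidden_a, n_hidden_a) + variance(shown_a, n_shown_a)
                   + variance(hidden_b, n_hidden_b) + variance(shown_b, n_shown_b))
    z = (drop_a - drop_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return drop_a, drop_b, z, p_value

# Women: rates quoted in the paper. Men: EYEBALLED guesses. Sizes: HYPOTHETICAL.
for n in (2_000, 200_000):
    drop_w, drop_m, z, p = drop_difference_test(0.718, n, 0.625, n,
                                                0.702, n, 0.640, n)
    print(f"n = {n:>7} per cell: female drop = {drop_w:.3f}, "
          f"male drop = {drop_m:.3f}, z = {z:.2f}, p = {p:.3g}")
```

Whether the two drops differ significantly depends entirely on the cell sizes, which is why the paper needs to report the test itself.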

Testimonials for SSC

[Content note: various slurs and insults]

I.

Last post I thanked some of the people who have contributed to this blog. But I forgot some of the most important contributors: the many readers who give valuable feedback on everything I write.

So here’s a short sample of some of the feedback I’ve gotten over the past three years. I’m avoiding names and links to avoid pile-ons, but you can probably find most of these if you Google them.

II.

“It’s like someone tried to make fivethirtyeight as uninteresting as possible.”

“Slate Star Codex: 20,000 words on ‘feminism is bad’ and ‘Tom Swifties are the funniest shit I’ve ever seen'”

“Mark Atwood spent a month on the SSC registry of bans. It is my belief that Scott uses his toolbox of psychiatric techniques to manage the range of comment allowable on his blog. This may be justified in order to minimize flaming and trolling, but it is also a stifling form of censorship.”

“Go read the comment section on Slate Star Codex for a week and report back if you think LessWrongism is acceptable. It’s a place for broken people to be shielded from ever hearing that they’re broken and for developing better and better rationalizations for why they’re not broken and shouldn’t do the work needed to fix themselves. And SSC is miles better than lesswrong itself which is a weird cult centered around gnome in charge Eliezer Yudkowski. Just read the post where Scott let ozymandius post about all the ways that Roissy is wrong. The information there permanently disqualifies anyone associated with it from from having anything to do with building a functioning society.”

“Scott Alexander’s blog used to be good, but now he has been terrorized out of politics. Therefore boring. The problem was he purged all frequent commentors to the right of him out of the comments, which means that he had only enemies in his comments. And, being the rightmost, was persecuted. He has stopped posting on politics, I assume as a result of this persecution.”

“it’s by a stuttering aspie with expertise in nothing at all”

“a mentally-ill beta male who literally admitted that he wished he could become an asexual.”

“He is a fairly smart guy who makes well reasoned arguments. (He is also a literal cuckold.)”

“never forget for one fucking second that its author (who is ‘asexual’) and his most avid readers engage in ‘cuddle puddles’ irl, often bringing stuffed animals, and that he recommends this because it ‘increases credence’ in the other cuddlers’ statements.”

“his arguments only seem well-reasoned to people with NO knowledge of the subject matter (like him), and thus his main effect (just like that of his mentor, Eliezer Yudkowsky) is to keep smart people from learning things”

“oh what an expert in psychiatry! he’s a fucking med student in IRELAND. not to mention he uses yudkowsky’s lingo and called Less Wrong “revelatory” or something like that. you are dealing with a 110 IQ reddit type in SSC.”

“here is a series of a few posts (1, 2, 3) about how he is basically a conspiracy theorist, and in these posts he gets completely owned by the guy who cucked him with his tranny ex.”

“Merciful $DEITY. If I had any inclination to participate [on SSC], that [Guns and States] comment thread would have turned me completely off of it. How much more SJW-feminist-entitled can you get?…Extreme? No. Hard-line SJW enough that I’ve got better things to do than try to engage them? Yes. SJW-feminist-entitled? Yes.”

“SSC skews toward highly intelligent discourse, but Scott is very protective of his liberal homies. He will ban you if you stray too far from PC rigor.”

“I’d always got a whiff of fedora from this guy, so I feel gratified in my judgment at seeing him come out as one.”

“Faggot blocked me for calling out some recent bit of his retarded bullshit. Fuck ‘im.”

“Reading Slate Star Codex, I feel like I’m finally strarting to understand how postmodernism happened. First, there’s the whole thing of looking at your friends and a few books you happened to have read recently, and jumping to grand conclusions about all of society throughout all of history. Second, there’s the thing of him writing lengthy posts elaborating at great length on something that might be either boring and obviously true or bold and innovative but also completely wrong.”

“Aargh!! I read that entire Scott Alexander piece…well 65% of it…in earnest with the expectation that there was going to be a POINT to it. Some sort of payoff for my investment of time and attention. But there was nothing. It was just a bunch of bloviation with no purpose.”

“Scott Alexander is the story of a functioning pattern-recognition module trapped in a progressive brain. It would make a great story of its truth-seeking brain blob could eventually break free and rewire his brain to be a born-again reactionary. Not gonna happen though. The prog morality police has a hard, thick grasp on his brain, and all his friends and pseudosexual partners are the leftiest hacks this side of Lenin; so it’s an endless futile battle to square the circle. No wonder he went into psychiatry.”

“I retain great hopes for Scott, he’ll come around. When he does he’ll bring a high level of rigor with him. He is a caterpillar and will become a beautiful reactionary butterfly someday.”

“slatestarcodex is a great example of the difference between ‘knowing how to type’ and ‘knowing how to write'”

“You aren’t reading it right. Scott’s ability to completely identify the problem but still, quite sincerely, ritually abase himself to it at the same time, makes him worthy of connoisseurship. It takes a once in a generation talent to write long sincere *thoughtful* screeds pointing out that baby sacrifice is lowering the birth rate and causing family trauma, though of course he fully understands and endorses that Lord Moloch must be sated with the only food acceptable unto him.”

“Scott Alexander reminds me of some too-nice beta (nominally played by Joseph Gordon Leavitt or a JGL-alike) from some rom-com who’s trying to find his way and get the girl, while painfully oblivious to the fact that he just needs to stop being a too-nice beta and rip out somebody’s jugular. Ostracize someone for their beliefs? Me? Never. Golly gee.”

“seems like these guys are incapable of being dismissive of anything and have to objectively analyze everything.”

“Slate Star Codex is 140 IQ discussion about 105 IQ issues”

“He is sharp and makes good points but is way too fucking verbose. I dont need parts I II III and IV just fucking write concisely and stop vomiting words on your wordpress blog ”

“it’s basically a fish trap for aspies. people who can’t grasp nuance or understand basic human behavior, but are nonetheless obsessed with details and complex systems will inevitably gravitate toward this kind of horseshit. ultimately it’s a bunch of STEM-inclined dudes on the autism spectrum sitting around attempting to unpack societal problems like it was all a game of fucking sim city.”

“I would add that something like Slate Star Codex is also a clinic in the aspie tendency to miss the forest for the trees, except in this case it’s more like closely examining the bark on the trees for no goddamn reason whatsoever.”

“doesn’t this guy have a dayjob as like a doctor or something? why the fuck does he spend hours each day on a blog?” [to which another person on the same forum responded “why the fuck do you spend hours each day posting here?”]

“a blog populated by 99th percentile aspergers/IQ “rationalist” millennials who converse in an abnormally abstract style, and whose concrete cultural experience is drawn mainly from a bunch of weird nerd shit.”

“Its weird brand of reductionism and bizarre, arbitrary specificity plays to the types of spergy assholes and dumb know-it-all teenagers who don’t care about that anyway, or at least that’s how it seems to me. I mean, the ideas themselves seem like they’d be as much of a turn-off to regular people as their proponents’ personalities are, even if in a different way.”

“Oh, hey, the King of the Race Realist Misogynist Libertarian Nerds has Clever Things to say about vaccination.”

“yet another confirmation that: psychiatrists are crazier than their patients. polyamorous, diarrhea of the mouth/pen, math challenged, … i had no idea what an utter piece of shit you were.”

“He keeps his head down for fear of insulting permanently insulted people. He tries hard to be polite to people who hate him and consider him but a dog, unless they need him – and until they need him no longer. It is a waste of intellect, and debasement of character.”

“What makes me sad about Scott is just how close he is. I won’t give up hope on him yet. If only there was some way to secretly inject this guy with testosterone.”

“I wonder if he’s had bloodwork done to check his T count. I have to assume that if someone is an “asexual heteroromantic” as he puts it, that he’s interested in women from some abstract standpoint, and just needs some additional hormones to be thoroughly normal.”

“I literally want to see you kill yourself. I’m serious. You, and everyone else like you, are fucking disgusting wastes of space that are causing the decay of decency in the human race. I’m not going to argue with you, or say that it’s just my opinion or that it’s even up for debate.”

“is it some sort of special ‘Talk Like a Vulcan Day’ over there? Or are they always like that?”

“ssc spends a significant amount of time talking about stuff like how tables and chairs can be genders. he keeps a pretty unhinged tumblr”

“He’s definitely a beta orbiting cuckold.”

“that article seemed like a return ticket to obviousville with eight-hour layovers everywhere”

“That blog is very boring, and I didn’t manage to read long enough to find out what it was about. I hit Page Down a couple of times, and it seemed like it was on an entirely new topic each time.”

“I am thankful that I have never had any desire to seek psychiatric help. I have always had the impression, rightly or wrongly, that folks who pursue psychiatry as a career may themselves be the ones most in need of such therapy. Go to the mountains and look. Get up early and see the sunrise. Stop anywhere and take a minute to look at the beauty of nature all around you. We are a small piece in the universe, but still a part. The plan is good. You are fine. You will succeed if you try hard enough. Everything you need spiritually is inside you and has always been there. Stop complaining.”

“Also that ‘heteronormative asexual’ Scott Alexander. What a bizarre kike. He recently wrote that he’s incapable of not writing. LOL so kikish.”

“I thought it was a blog about science methodology until that post with the talking cactus.”

OT42: Thread Anniversary

This is the third anniversary of this blog. I want to take this opportunity to thank people who I otherwise haven’t gotten around to thanking:

1. Website/app design company Trike Apps, especially Matt Fallshaw and Catherine Truscott, for hosting the site. They’ve done an amazing job and I strongly recommend them for anyone else launching a website.

2. Michael Keenan, for handling CSS and the layout – anything good about it is to his credit; any problems with it are because of all the finicky demands I place on him. Mason Hartman also gave helpful advice. Michael and Trike also did the technical setup for Unsong, so if you like that there’s another reason to thank them.

3. Bakkot, Alice Monday, and Rory O’Kane for doing a lot of the other technical work, including the green line around new comments and the comment reporting function.

4. Our sponsors, currently Beeminder, MealSquares, and Apptimize.

5. All of the really interesting people who have read and engaged with this blog. I especially want to thank all of the famous, Internet famous, and Internet journalism famous people who have praised me and linked to me, mainly because I have been terrible at reciprocating or even letting you know how much I appreciate your support, or even doing the basic things like not yelling at you and saying you are the Devil when I disagree with you about something. I am suitably humbled to have such important people reading me, I do appreciate your support, and I don’t really believe you’re the Devil, unless you work for Gawker, in which case I’m agnostic on the issue.

6. Everyone who’s helped build the community. Special thanks to Bakkot (again), Vulture, coderman9, heterodox_jedi, PM_ME_UR_OBSIDIAN, and tailcalled for their work on the subreddit, to drethelin for the IRC channel, and to various people who arranged various meetups.

7. No doubt many more people whom I have forgotten about and I’m sorry.

List Of Passages I Highlighted In My Copy Of “Superforecasting”

In year 1, [the Good Judgment Project] beat the official control group by 60%. In year 2, we beat the control group by 78%. GJP also beat its university-affiliated competitors, including the University of Michigan and MIT, by hefty margins, from 30% to 70%, and even outperformed professional intelligence analysts with access to classified data. After two years, GJP was doing so much better than its academic competitors that IARPA dropped the other teams.

I keep wondering what these other teams were doing. Good Judgment Project sounds like it was doing the simplest, most obvious possible tactic – asking people to predict things and seeing what happened. David Manheim says the other groups tried “more straightforward wisdom of crowds” methods, so maybe GJP’s secret sauce was concentrating on the best people instead of on everyone? Still seems like it should have taken fewer than five universities and a branch of government to think of that.

One result that particularly surprised me was the effect of a tutorial covering some basic concepts that we’ll explore in this book and are summarized in the Ten Commandments appendix. It took only about sixty minutes to read and improved accuracy by roughly 10% through the entire tournament year. Yes, 10% may sound modest, but it was achieved at so little cost.

These Ten Commandments are available online here.

For centuries, [aversion to measuring things and collecting evidence] hobbled progress in medicine. When physicians finally accepted that their experience and perceptions were not reliable means of determining whether a treatment works, they turned to scientific testing – and medicine finally started to make rapid advances.

I see what Tetlock is trying to say here, but as written it’s horribly wrong.

Evidence-based medicine could be fairly described as starting in the 1970s with Cochrane’s first book, and really took off in the 80s and 90s. But this is also the period when rapid medical advances started slowing down! In my own field of psychiatry, the greatest advances were the first antidepressants and antipsychotics in the 50s, the benzodiazepines in the 60s, and then a gradual trickle of slightly upgraded versions of these through the 70s and 80s. The last new drugs that could be called “revolutionary” by any stretch of the imagination were probably the first SSRIs in the early 80s. This is the conventional wisdom of the field and everybody admits this, but I would add the stronger claim that the older medications in many ways work better. I know less about the history of other subfields, but they seem broadly similar – the really amazing discoveries are all pre-EBM, and the new drugs are mostly nicer streamlined versions of the old ones.

There’s an obvious “low-hanging fruit” argument to be made here, but some people (I think Michael Vassar sometimes toys with this idea) go further and say that evidence-based medicine as currently practiced can actually retard progress. In the old days, people tried possible new medications in a very free-form and fluid way that let everyone test their pet ideas quickly and keep the ones that worked; nowadays any potential innovations need $100 million 10-year multi-center trials which will only get funded in certain very specific situations. And in the old days, a drug would only be kept if it showed obvious undeniable improvement in patients, whereas nowadays if a trial shows a p < 0.05, d = 0.10 advantage, that's enough to make it the new standard if it's got a good pharma company behind it. So the old method allowed massive-scale innovation combined with high standards for success; the new method only allows very limited innovation but keeps everything that can show the slightest positive effect whatsoever on an easily-rigged but very expensive test. I'm not sure I believe in the strong version of this argument (the low-hanging fruit angle is probably sufficient), but the idea that medicine only started advancing after the discovery of evidence-based medicine is just wrong. A better way of phrasing it might be that around that time we started getting fewer innovations, but we also became a lot more effective and intelligent at using the innovations we already had.

Consider Galen, the second-century physician to Rome’s emperors…Galen was untroubled by doubt. Each outcome confirmed he was right, no matter how equivocal the evidence might look to someone less wise than the master. “All who drink of this treatment recover in a short time, except those whom it does not help, who all die,” he wrote. “It is obvious, therefore, that it fails only in incurable cases.”

After hearing one too many “everyone thought Columbus would fall off the edge of the flat world”-style stories, I tend to be skeptical of “people in the past were hilariously stupid” anecdotes. I don’t know anything about Galen, but I wonder if this was really the whole story.

When hospitals created cardiac care units to treat patients recovering from heart attacks, Cochrane proposed a randomized trial to determine whether the new units delivered better results than the old treatment, which was to send the patient home for monitoring and bed rest. Physicians balked. It was obvious the cardiac care units were superior, they said, and denying patients the best care would be unethical. But Cochrane was not a man to back down…he got his trial: some patients, randomly selected, were sent to the cardiac care units while others were sent home for monitoring and bed rest. Partway through the trial, Cochrane met with a group of the cardiologists who had tried to stop his experiment. He told them that he had preliminary results. The difference in outcomes between the two treatments was not statistically significant, he emphasized, but it appeared that patients might do slightly better in the cardiac care units. “They were vociferous in their abuse: ‘Archie,’ they said, ‘we always thought you were unethical. You must stop the trial at once.’” But then Cochrane revealed he had played a little trick. He had reversed the results: home care had done slightly better than the cardiac units. “There was dead silence and I felt rather sick because they were, after all, my medical colleagues.”

This story is the key to everything. See also my political spectrum quiz and the graph that inspired it. Almost nobody has consistent meta-level principles. Almost nobody really has opinions like “this study’s methodology is good enough to believe” or “if one group has a survival advantage of size X, that necessitates stopping the study as unethical”. The cardiologists sculpted their meta-level principles around what best supported their object-level opinions – that more cardiology is better – and so generated the meta-level principles “Cochrane’s experiment is accurate” and “if one group has a slight survival advantage, that’s all we need to know before ordering the experiment stopped as unethical.” If Cochrane had (truthfully) told them that the cardiology group was doing worse, they would have generated the meta-level principles “Cochrane’s experiment is flawed” and “if one group has a slight survival advantage that means nothing and it’s just a coincidence”. In some sense this is correct from a Bayesian point of view – I interpret sonar scans of Loch Ness that find no monsters to be probably accurate, but if a sonar scan did find a monster I’d wonder if it was a hoax – but in less obvious situations it can be a disaster. Cochrane understood this and so fed them the wrong data and let them sell him the rope he needed to hang them. I know no better solution to this except (possibly) adversarial collaboration. Also, I suppose this is more proof (as if we needed it) that cardiologists are evil.

In the late 1940s, the Communist government of Yugoslavia broke from the Soviet Union, raising fears that the Soviets would invade. In March 1951 [US intelligence under Sherman Kent reported there was a “serious possibility” of a Soviet attack.] But a few days later, Kent was chatting with a senior State Department official who casually asked, “By the way, what did you people mean by the expression ‘serious possibility’? What kind of odds did you have in mind?” Kent said he was pessimistic. He felt that the odds were about 65 to 35 in favor of an attack. The official was startled. He and his colleagues had taken “serious possibility” to mean much lower odds.

Disturbed, Kent went back to his team. They had all agreed to use “serious possibility” in the [report], so Kent asked each person, in turn, what he thought it meant. One analyst said it meant odds of about 80%. Another thought it meant odds of 20% – exactly the opposite. Other answers were scattered between those extremes. Kent was floored. A phrase that looked informative was so vague as to be almost useless…

In 1961, when the CIA was planning to topple the Castro government by landing a small army of Cuban expatriates at the Bay of Pigs, President John F. Kennedy turned to the military for an unbiased assessment. The Joint Chiefs of Staff concluded that the plan had a “fair chance” of success. The man who wrote the words “fair chance” later said he had in mind odds of 3 to 1 against. But Kennedy was never told precisely what “fair chance” meant and, not unreasonably, he took it to be a much more positive assessment.

Nate Silver, Princeton’s Sam Wang, and other poll aggregators were hailed for correctly predicting all fifty state outcomes, but almost no one noted that a crude, across-the-board prediction of “no change” – if a state went Democratic or Republican in 2008, it will do the same in 2012 – would have scored forty-eight out of fifty, which suggests that the many excited exclamations of “he called all fifty states!” we heard at the time were a tad overwrought.

I didn’t realize this. I think this election I’m going to predict the state-by-state results just so that I can tell people I “predicted 48 of the 50 states” or something and sound really impressive.

The [Expert Political Judgment] data revealed an inverse correlation between fame and accuracy: the more famous an expert was, the less accurate he was. That’s not because editors, producers, and the public go looking for bad forecasters. They go looking for hedgehogs, who just happen to be bad forecasters. Animated by a Big Idea, hedgehogs tell tight, simple, clear stories that grab and hold audiences.

One day aliens are going to discover humanity and be absolutely shocked we made it past the wooden-club stage.

In 2008, the Office of the Director of National Intelligence – which sits atop the entire network of sixteen intelligence agencies – asked the National Research Council to form a committee. The task was to synthesize research on good judgment and help the IC put that research to good use. By Washington’s standards, it was a bold (or rash) thing to do. It’s not every day that a bureaucracy pays one of the world’s most respected scientific institutions to produce an objective report that might conclude that the bureaucracy was clueless.

This was a big theme of the book: the US intelligence community deserves celebration for daring to investigate its own competency at all. Interestingly, a lot of its investigations said it was doing things more right than we would think: Tetlock mentions that even independent-to-hostile investigators concluded that it had been correct in using the facts it had to believe Saddam had WMDs. The book didn’t explain exactly how this worked: possibly Saddam was trying to deceive everyone into thinking he had WMDs to prevent attacks, and did a good job? This was part of what got the intelligence community interested in probability: given that they had made a reasonable decision in saying there were WMDs, but it had been a big disaster for the United States, what could they have done differently? Their answer was “continue to make the reasonable decision, but learn to calibrate themselves well enough to admit there’s a big chance they’re wrong.”

[We finished by giving] the forecast a final tweak: “extremizing” it, meaning pushing it closer to 100% or zero. If the forecast is 70% you might bump it up to, say, 85%. If it’s 30%, you might reduce it to 15%…[it] is based on a pretty simple insight: when you combine the judgments of a large group of people to calculate the “wisdom of the crowd” you collect all of the relevant information that is dispersed among all those people. But none of those people has access to all that information…what would happen if every one of those people were given all the information? They would become more confident. If you then calculated the wisdom of the crowd, it too would be more extreme.

Something to remember if you’re doing wisdom-of-crowds with calibration estimates.
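For anyone who wants to try this, here is a minimal sketch of extremizing a crowd average. The quote doesn't give a formula, so the transform below, p^a / (p^a + (1-p)^a), and the exponent a = 2 are assumptions of mine, picked because they roughly reproduce the quoted example (70% goes to about 84%, 30% to about 16%). The function names are made up for illustration.

```python
def extremize(p: float, a: float = 2.0) -> float:
    """Push a probability away from 0.5.

    Uses p^a / (p^a + (1-p)^a). This transform and the default
    exponent are illustrative assumptions, not the book's method.
    With a = 1 the forecast is unchanged; larger a is more extreme.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    num = p ** a
    return num / (num + (1.0 - p) ** a)


def extremized_crowd_forecast(forecasts, a: float = 2.0) -> float:
    """Average the individual forecasts, then extremize the average."""
    mean = sum(forecasts) / len(forecasts)
    return extremize(mean, a)


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(f"{p:.2f} -> {extremize(p):.3f}")
```

The right exponent would have to be tuned against real track records; too large a value just manufactures overconfidence.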

The correlation between how well individuals do from one year to the next is about 0.65…Regular forecasters scored higher on intelligence and knowledge tests than about 70% of the population. Superforecasters did better, placing higher than about 80% of the population.

People interested in taking these kinds of tests are generally intelligent; superforecasters are somewhat more, but not vastly more, intelligent than that.

Researchers have found that merely asking people to assume their initial judgment is wrong, to seriously consider why that might be, and then make another judgment, produces a second estimate which, when combined with the first, improves accuracy almost as much as getting a second estimate from another person.

There’s a rationalist tradition – I think it started with Mike and Alicorn – that before you get married, you ask all your friends to imagine that the marriage failed and tell you why. I guess if you just asked people “Will our marriage fail?” everyone would say no, either out of optimism or social desirability bias. If you ask “Assume our marriage failed and tell us why”, you’ll actually hear people’s concerns. I think this is the same principle. On the other hand, I’ve never heard of anyone trying this and deciding not to get married after all, so maybe we’re just going through the motions.

[Superforecaster] Doug Lorch knows that when people read for pleasure they naturally gravitate to the like-minded. So he created a database containing hundreds of information sources – from the New York Times to obscure blogs – that are tagged by their ideological orientation, subject matter, and geographical origin, then wrote a program that selects what he should read next using criteria that maximize diversity.

Of all humans, only Doug Lorch is virtuous. Well, Doug Lorch and this guy from rationalist Tumblr who tried to get the program but was told it wasn’t really the sort of thing you could just copy and give someone.

[The CIA was advising Obama about whether Osama bin Laden was in Abbottabad, Pakistan; their estimates averaged around 70%]. “Okay, this is a probability thing,” the President said in response, according to Bowden’s account. Bowden editorializes: “Ever since the agency’s erroneous call a decade earlier [on Saddam’s weapons of mass destruction], the CIA had instituted an almost comically elaborate process for weighing certainty…it was like trying to contrive a mathematical formula for good judgment.” Bowden was clearly not impressed with the CIA’s use of numbers and probabilities. Neither was Barack Obama, according to Bowden. “What you ended up with, as the president was finding, and as he would later explain to me, was not more certainty but more confusion…in this situation, what you started to get was probabilities that disguised uncertainty, as opposed to actually providing you with useful information…”

After listening to the widely ranging opinions, Obama addressed the room. “This is fifty-fifty,” he said. That silenced everyone. “Look guys, this is a flip of the coin. I can’t base this decision on the notion that we have any greater certainty than that…”

The information Bowden provides is sketchy but it appears that the median estimate of the CIA officers – the “wisdom of the crowd” – was around 70%. And yet Obama declares the reality to be “fifty-fifty.” What does he mean by that?…Bowden’s account reminded me of an offhanded remark that Amos Tversky made some thirty years ago…In dealing with probabilities, he said, most people only have three settings: “gonna happen,” “not gonna happen,” and “maybe”.

Lest I make it look like Tetlock is being too unfair to Obama, he goes on to say that maybe he was speaking colloquially. But the way we speak colloquially says a lot about us, and there are many other examples of people saying this sort of thing and meaning it. This ties back into an old argument we had here on whether something like a Bayesian concept of probability was meaningful/useful. Some people said that it wasn’t, because everyone basically understands probability and Bayes doesn’t add much to that. I said it was, because people’s intuitive idea of probability is hopelessly confused and people don’t really think in probabilistic terms. I think we have no idea how confused most people’s idea of probability is, and perhaps even Obama, one of our more intellectual presidents, has some issues there.

Barbara Mellers has shown that granularity predicts accuracy: the average forecaster who sticks with the tens – 20%, 30%, 40% – is less accurate than the finer-grained forecaster who uses fives – 20%, 25%, 30% – and still less accurate than the even finer-grained forecaster who uses ones – 20%, 21%, 22%. As a further test, she rounded forecasts to make them less granular, so a forecast at the greatest granularity possible in the tournament, single percentage points, would be rounded to the nearest five, and then the nearest ten. This way, all of the forecasts were made one level less granular. She then recalculated Brier scores and discovered that superforecasters lost accuracy in response to even the smallest-scale rounding, to the nearest 0.05, whereas regular forecasters lost little even from rounding four times as large, to the nearest 0.2.

This was the part nobody in the comments to the last post believed, and I have trouble believing it too.
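If you want to check the rounding claim against your own predictions, the test is easy to rerun. The sketch below scores a set of forecasts at several granularities. The Brier score here is the common single-outcome form (mean squared error against 0/1 outcomes), which may differ by a constant factor from the tournament's scoring. The forecasts and outcomes are invented placeholders, not data from the book, so whatever it prints says nothing about whether Mellers is right.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between probabilities and 0/1 outcomes.

    Lower is better; 0.0 is a perfect score.
    """
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    total = sum((p - o) ** 2 for p, o in zip(forecasts, outcomes))
    return total / len(forecasts)


def round_to_step(p, step):
    """Round a probability to the nearest multiple of step, kept in [0, 1]."""
    return min(1.0, max(0.0, round(p / step) * step))


# Hypothetical forecasts and outcomes, for illustration only.
forecasts = [0.82, 0.07, 0.93, 0.38, 0.64]
outcomes = [1, 0, 1, 0, 1]

if __name__ == "__main__":
    for step in (0.01, 0.05, 0.10, 0.20):
        coarse = [round_to_step(p, step) for p in forecasts]
        print(f"step {step:.2f}: Brier {brier_score(coarse, outcomes):.4f}")
```

With only a handful of questions, rounding can help or hurt by chance; the effect described in the quote would only show up across a large number of forecasts.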

[There’s a famous Keynes quote: “When the facts change, I change my mind. What do you do, sir?”] It’s cited in countless books, including one written by me and another by my coauthor. Google it and you will find it’s all over the internet. Of all the many famous things Keynes says, it’s probably the most famous. But while researching this book, I tried to track it to its source and failed. Instead I found a post by a Wall Street Journal blogger, which said that no one has ever discovered its provenance and the two leading experts on Keynes think it is apocryphal. In light of these facts, and in the spirit of what Keynes apparently never said, I concluded that I was wrong.

The funny part is that if this fact is true, we’ve known it for fifty years, and people still haven’t changed their mind about whether he said it or not.

“Keynes is always ready to contradict not only his colleagues but also himself whenever circumstances make this seem appropriate,” reported a 1945 profile of the “consistently inconsistent” economist. “So far from feeling guilty about such reversals of position, he utilizes them as pretexts for rebukes to those he saw as less nimble-minded. Legend says that while conferring with Roosevelt at Quebec, Churchill sent Keynes a cable reading, ‘Am coming around to your point of view.’ His Lordship replied, ‘Sorry to hear it. Have started to change my mind.'”

I sympathize with this every time people email me to say how much they like the Non-Libertarian FAQ.

Police officers spend a lot of time figuring out who is telling the truth and who is lying, but research has found they aren’t nearly as good at it as they think they are and they tend not to get better with experience…predictably, psychologists who test police officers’ ability to spot lies in a controlled setting find a big gap between their confidence and their skill. And that gap grows as officers become more experienced and they assume, not unreasonably, that their experience has made them better lie detectors.

There’s some similar research on doctors and certain types of diagnostic tasks that don’t give quick feedback.

In 1988, when the Soviet Union was implementing major reforms that had people wondering about its future, I asked experts to estimate how likely it was that the Communist Party would lose its monopoly on power in the Soviet Union in the next five years. In 1991 the world watched in shock as the Soviet Union disintegrated. So in 1992-93 I returned to the experts, reminded them of the question in 1988, and asked them to recall their estimates. On average, the experts recalled a number 31 percentage points higher than the correct figure. So an expert who thought there was only a 10% chance might remember herself thinking there was a 40% or 50% chance. There was even a case in which an expert who pegged the probability at 20% recalled it as 70%.

As the old saying goes, hindsight is 20/70.

The results were clear-cut each year. Teams of ordinary forecasters beat the wisdom of the crowd by about 10%. Prediction markets beat ordinary teams by about 20%. And superteams beat prediction markets by 15% to 30%. I can already hear the protests from my colleagues in finance that the only reason the superteams beat the prediction markets was that our markets lacked liquidity…they may be right. It is a testable idea, and one worth testing.

The correct way to phrase this is “if there is ever a large and liquid prediction market, Philip Tetlock will gather his superforecasters, beat the market, become a zillionaire, and then the market will be equal to or better than the forecasters.”

Orders in the Wehrmacht were often short and simple – even when history hung in the balance. “Gentlemen, I demand that your divisions completely cross the German borders, completely cross the Belgian borders, and completely cross the River Meuse,” a senior officer told the commanders who would launch the great assault into Belgium and France on May 10, 1940. “I don’t care how you do it, that’s completely up to you.”

This is the opposite of the image most people have of Germany’s World War II military. The Wehrmacht served a Nazi regime that preached total obedience to the dictates of the Fuhrer, and everyone remembers the old newsreels of German soldiers marching in goose-stepping unison…but what is often forgotten is that the Nazis did not create the Wehrmacht. They inherited it. And it could not have been more different from the unthinking machine we imagine.

[…]

Shortly after WWI, Eisenhower, then a junior officer who had some experience with the new weapons called tanks, published an article in the US Army’s Infantry Journal making the modest argument that “the clumsy, awkward and snail-like progress of the old tanks must be forgotten, and in their place we must picture this speedy, reliable, and efficient engine of destruction.” Eisenhower was dressed down. “I was told my ideas were not only wrong but dangerous, and that henceforth I was to keep them to myself,” he recalled. “Particularly, I was not to publish anything incompatible with solid infantry doctrine. If I did, I would be hauled before a court martial.”

Tetlock includes a section on what makes good teams and organizations. He concludes that they’re effective when low-level members are given leeway both to pursue their own tasks as best they see fit, and to question and challenge their higher-ups. He contrasts the Wehrmacht, which was very good at this and overperformed its fundamentals in WWII, to the US Army, which was originally very bad at this and underperformed its fundamentals until it figured this out. Later in the chapter, he admits that his choice of examples might raise some eyebrows, but says that he did it on purpose to teach us to think critically and overcome cognitive dissonance between our moral preconceptions and our factual beliefs. I hope he has tenure.

Ultimately the Wehrmacht failed. In part, it was overwhelmed by its enemies’ superior resources. But it also made blunders – often because its commander-in-chief, Adolf Hitler, took direct control of operations in violation of Helmuth von Moltke’s principles, nowhere with more disastrous effect than during the invasion of Normandy. The Allies feared that after their troops landed, German tanks would drive them back to the beaches and into the sea, but Hitler had directed that the reserves could only move on his personal command. Hitler slept late. For hours after the Allies landed on the beaches, the dictator’s aides refused to wake him to ask if he wanted to order the tanks into battle.

Early to bed
And early to stir up
Makes a man healthy
And ruler of Europe

The humility required for good judgment is not self-doubt – the sense that you are untalented, unintelligent, or unworthy. It is intellectual humility. It is a recognition that reality is profoundly complex, that seeing things clearly is a constant struggle, when it can be done at all, and that human judgment must therefore be riddled with mistakes. This is true for fools and geniuses alike. So it’s quite possible to think highly of yourself and be intellectually humble. In fact, this combination can be wonderfully fruitful. Intellectual humility compels the careful reflection necessary for good judgment; confidence in one’s abilities inspires determined action.

Yes! This is a really good explanation of Eliezer Yudkowsky’s Say It Loud.

(and that sentence would also have worked without the apostrophe or anything after it).

I am…optimistic that smart, dedicated people can inoculate themselves to some degree against certain cognitive illusions. That may sound like a tempest in an academic teapot, but it has real-world implications. If I am right, organizations will have more to gain from recruiting and training talented people to resist their biases.

This is probably a good time to mention that CFAR is hiring.


Book Review: Superforecasting

Lots of people recommended Philip Tetlock’s Superforecasting as relevant to my interests. In retrospect, this should have been a warning sign; if I had been one of the titular superforecasters I might have predicted that it wouldn’t contain too much that’s new. Still, I appreciated its work putting some of the rationality/cognitive-science literature into an easily accessible and more authoritative-sounding form.

Philip Tetlock got famous by studying prediction. His first major experiment, the Expert Political Judgment experiment, is frequently cited as saying that top pundits’ predictions are no more accurate than a chimp throwing darts at a list of possibilities – although Tetlock takes great pains to confess to us that no chimps were actually involved, and this phrasing just sort of popped up as a flashier way of saying “random”. Although this was generally true, he was able to distinguish a small subset of people who were able to do a little better than chance. His investigation into the secrets of their very moderate success led to his famous “fox” versus “hedgehog” dichotomy, based on the fable that “the fox knows many things, the hedgehog knows one big thing”. Hedgehog pundits/experts are people who operate off a single big idea – for example, an economist who says that government intervention is always bad, predicts doom for any interventionist policy, and predicts great success for any noninterventionist one. Foxes are people who don’t have much of a narrative or ideology, but try to find the right perspective to approach each individual problem. Tetlock found that the hedgehogs did worse than the chimp and the foxes did a little better.

Cut to the late 2000s. The US intelligence community has just been seriously embarrassed by their disastrous declaration that there were weapons of mass destruction in Iraq. They set up an Intelligence Advanced Research Projects Agency to try crazy things and see if any of them worked. IARPA approached a bunch of scientists, handed them a list of important world events that might or might not happen, and told them to create some teams and systems for themselves and compete against each other to see who could predict them the best.

Tetlock was one of these scientists, and his entry into the competition was called the Good Judgment Project. The plan was simple: get a bunch of people to sign up and try to predict things, then find the ones who did the best. This worked pretty well. 2,800 people showed up, and a few of them turned out to be…

…okay, now we’re getting to a part I don’t understand. When I read Tetlock’s paper, all he says is that he took the top sixty forecasters, declared them superforecasters, and then studied them intensively. That’s fine; I’d love to know what puts someone in the top 2% of forecasters. But it’s important not to phrase this as “Philip Tetlock discovered that 2% of people are superforecasters”. This suggests a discontinuity, a natural division into two groups. But unless I’m missing something, there’s no evidence for this. Two percent of forecasters were in the top two percent. Then Tetlock named them “superforecasters”. We can discuss what skills help people make it this high, but we probably shouldn’t think of it as a specific phenomenon.

Anyway, the Good Judgment Project then put these superforecasters on teams with other superforecasters, averaged out their decisions, slightly increased the final confidence levels (to represent the fact that it was 60 separate people, all of whom were that confident), and presented that to IARPA as their final answer. Not only did they beat all the other groups in IARPA’s challenge in a landslide, but they actually did 30% better than professional CIA analysts working off classified information.

Having established that this is all pretty neat, Tetlock turns to figuring out how superforecasters are so successful.

First of all, is it just luck? After all, if a thousand chimps throw darts at a list of stocks, one of them will hit the next Google, after which we can declare it a “superchimp”. Is that what’s going on here? No. Superforecasters one year tended to remain superforecasters the next. The year-to-year correlation in who was most accurate was 0.65; about 70% of superforecasters in the first year remained superforecasters in the second. This is definitely a real thing.

Are superforecasters just really smart? Well, sort of. The superforecasters whom Tetlock profiles in his book include a Harvard physics PhD who speaks 6 languages, an assistant math professor at Cornell, a retired IBM programmer data wonk, et cetera. But the average superforecaster is only at the 80th percentile for IQ – just under 115. And there are a lot of people who are very smart but not very good at predicting. So while IQ definitely helps, it isn’t the whole story.

Are superforecasters just really well-informed about the world? Again, sort of. The correlation between well-informedness and accuracy was about the same as the correlation between IQ and accuracy. None of them are remarkable for spending every single moment behind a newspaper, and none of them had as much data available as the CIA analysts with access to top secret information. Even when they made decisions based on limited information, they still beat other forecasters. Once again, this definitely helps, but it’s not the whole story.

Are superforecasters just really good at math? Again, kind of. A lot of them are math PhDs or math professors. But they all tend to say that they don’t explicitly use numbers when doing their forecasting. And some of them don’t have any kind of formal math background at all. The correlation between math skills and accuracy was about the same as all the other correlations.

So what are they really good at? Tetlock concludes that the number one most important factor to being a superforecaster is really understanding logic and probability.

Part of it is just understanding the basics. Superforecasters are less likely to think in terms of things being 100% certain, and – let’s remember just how far left the bell curve stretches – less likely to assign anything they’re not sure about a 50-50 probability. They’re less likely to believe that things happen because they’re fated to happen, or that the good guys always win, or that things that happen will necessarily teach a moral lesson. They’re more likely to admit they might be wrong and correct themselves after an error is discovered. They’re more likely to debate with themselves, try to challenge their original perception, start asking “What could be wrong about this thing I believe?” rather than “How can I prove I’m right?”

But they’re also more comfortable actively using probabilities. Like my predictions, the Good Judgment Project made forecasters give their answers as numerical probability estimates – for example, 15% chance of a war between North and South Korea in the next ten years killing > 1000 people. Poor forecasters tend to make a gut decision based on feelings superficially related to the question, like “Well, North Korea is pretty crazy, so they’re pretty likely to declare war, let’s say 90%” or “War is pretty rare these days, how about 10%?”. Superforecasters tend to focus on the specific problem in front of them and break it down into pieces. For example, they might start with the Outside View – it’s been about 50 years since the Koreas last fought, so their war probability per decade shouldn’t be more than about 20% – and then adjust that based on Inside View information – “North Korea has a lot fewer foreign allies these days, so they’re less likely to start something than they once were – maybe 15%”.

Or they might break the problem down into pieces: “There would have to be some sort of international incident, and then that incident would have to erupt into total war, and then that war would have to kill > 1,000 people. There are about two international incidents between the Koreas every year, but almost none of them end in war; on the other hand, because of all the artillery aimed at Seoul, probably any war that did happen would have an almost 100% chance of killing > 1,000 people” … and so on. One result is that while poor forecasters tend to give their answers in broad strokes – maybe a 75% chance, or 90%, or so on – superforecasters are more fine-grained. They may say something like “82% chance” – and it’s not just pretentious, Tetlock found that when you rounded them off to the nearest 5 (or 10, or whatever) their accuracy actually decreased significantly. That 2% is actually doing good work.
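To make the two approaches concrete, here is a rough sketch of the Korea example worked both ways. The outside-view figure and the two-incidents-per-year rate come from the paragraphs above. The per-incident escalation chance (0.8%) and the 99% stand-in for “almost 100%” are numbers I made up for illustration, and treating incidents as independent is a simplifying assumption.

```python
def prob_at_least_one(per_trial: float, trials: int) -> float:
    """Chance that at least one of several independent trials succeeds."""
    return 1.0 - (1.0 - per_trial) ** trials


def outside_view_per_decade(events: int, decades: float) -> float:
    """Crude base rate: events observed per decade, capped at 1."""
    return min(1.0, events / decades)


def decomposed_estimate(
    incidents_per_year: int = 2,   # from the text
    years: int = 10,               # horizon of the question
    p_escalate: float = 0.008,     # hypothetical: chance one incident becomes war
    p_deadly: float = 0.99,        # hypothetical stand-in for "almost 100%"
) -> float:
    """P(incident escalates to war at least once) * P(war kills > 1,000)."""
    p_war = prob_at_least_one(p_escalate, incidents_per_year * years)
    return p_war * p_deadly


if __name__ == "__main__":
    # One war in roughly five decades gives the 20% ceiling from the text.
    print(f"outside view: {outside_view_per_decade(1, 5):.0%}")
    print(f"decomposed:   {decomposed_estimate():.1%}")
```

The value of writing it down is that each input can be argued about and updated separately, rather than the final number arriving as a single gut feeling.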

Most interesting, they seem to be partly immune to cognitive bias. The strongest predictor of forecasting ability (okay, fine, not by much, it was pretty much the same as IQ and well-informedness and all that – but it was a predictor) was the Cognitive Reflection Test, which includes three questions with answers that are simple, obvious, and wrong. The test seems to measure whether people take a second to step back from their System 1 judgments and analyze them critically. Superforecasters seem especially good at this.

Tetlock cooperated with Daniel Kahneman on an experiment to elicit scope insensitivity in forecasters. Remember, scope insensitivity is where you give a number-independent answer to a numerical question. For example, how much should an organization pay to save the lives of 100 endangered birds? Ask a hundred people, and maybe the average answer is “$10,000”. Ask a (different group of) a hundred people how much the same organization should pay to save the lives of 1000 endangered birds, and maybe the average answer will still be $10,000. So it seems you can get people to change their estimate of the value of bird life just by changing the number in the question. Poor forecasters do the same thing on their predictions. For example, a hundred poor forecasters might on average predict a 15% chance of war in Korea in the next five years, and a different group of a hundred poor forecasters might on average predict a 15% chance of war in Korea in the next fifteen years. They’re ignoring the question and just going off of a vague feeling of how likely another Korean war seems. Superforecasters, in contrast, showed much reduced scope insensitivity, and their probability of a war in five years was appropriately lower than of a war in fifteen.
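A quick way to see how far apart the five-year and fifteen-year answers ought to be is to convert between horizons. The sketch below assumes a constant risk per year with each year independent of the others, which is my simplification and not something the experiment specifies. Under that assumption, a 15% chance over five years corresponds to roughly a 39% chance over fifteen.

```python
def rescale_probability(p: float, from_years: float, to_years: float) -> float:
    """Convert P(event within from_years) into P(event within to_years).

    Assumes a constant, independent per-year hazard (an illustrative
    simplification): the chance of *no* event compounds over time.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    no_event = (1.0 - p) ** (to_years / from_years)
    return 1.0 - no_event


if __name__ == "__main__":
    five_year = 0.15
    fifteen_year = rescale_probability(five_year, from_years=5, to_years=15)
    print(f"5 years: {five_year:.0%}, 15 years: {fifteen_year:.1%}")
```

Run in the other direction, a forecaster who really believes 15% over fifteen years should be quoting something near 5% for five years.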

Maybe all this stuff about probability calibration, inside vs. outside view, willingness to change your mind, and fighting cognitive biases is starting to sound familiar? Yeah, this is pretty much the same stuff as in the Less Wrong Sequences and a lot of CFAR work. They’re both drawing from the same tradition of cognitive science and rationality studies.

So as I said before, Superforecasting is not necessarily too useful for people who are already familiar with the cognitive science/rationality tradition, but great for people who need a high-status and official-looking book to justify it. The next time some random person from a terrible forum says that everything we’re doing is stupid, I’m already looking forward to pulling out Tetlock quotes like:

The superforecasters are a numerate bunch: many know about Bayes’ theorem and could deploy it if they felt it was worth the trouble. But they rarely crunch the numbers so explicitly. What matters far more to the superforecasters than Bayes’ theorem is Bayes’ core insight of gradually getting closer to the truth by constantly updating in proportion to the weight of the evidence. That’s true of Tim Minto [the top superforecaster]. He knows Bayes’ theorem, but he didn’t use it even once to make his hundreds of updated forecasts. And yet Minto appreciates the Bayesian spirit. “I think it is likely that I have a better intuitive grasp of Bayes’ theorem than most people,” he said, “even though if you asked me to write it down from memory I’d probably fail.” Minto is a Bayesian who does not use Bayes’ theorem. That paradoxical description applies to most superforecasters.

And if you’re interested, it looks like there’s a current version of the Good Judgment Project going on here that you can sign up to and see if you’re a superforecaster or not.

EDIT: A lot of people have asked the same question: am I being too dismissive? Isn’t it really important to have this book as evidence that these techniques work? Yes. It is important that the Good Judgment Project exists. But you might not want to read a three-hundred-page book that explains lots of stuff like “Here’s what a cognitive bias is” just to hear that things work. If you already know what the techniques are, it might be quicker to read a study or a popular news article on GJP or something.

OT41: Having Your Mind Involuntarily Thread

This is the bi-weekly open thread. Post about anything you want, ask random questions, whatever. Also:

1. In the spirit of discussing class differences, here are the 36 best quotes from Davos 2016.

2. Vitalik Buterin expands on my fake side effects article by discovering eHealthMe’s support group for “people who have Death on Xolair”. Related: “We study 33,751 people who have side effects while taking Viagra from FDA and social media. Among them, 983 have Death.”

3. Comments of the week include: Sarah on the order of Siamese twin phrases, John Schilling on that star with the unexplained dimming, Sniffnoy on class (someone once asked if there was anything that couldn’t be related to a David Chapman post; if so, today is not the day we find it), Joyously on Trump’s class, Michael W on Indian perceptions of Hitler, and Ptoliporthos on why ‘research parasitism’ can be a real problem.

4. More meetups: London, maybe Sydney?


Staying Classy

Siderea writes an essay on class in America. You should read it. In case you don’t, here’s the summary:

1. People tend to confuse social class with economic class, eg how much money you make. But social class is a more complicated idea involving how respectable you seem, how educated you are, and what kind of family you come from. An assembly line supervisor might make the same amount of money as a schoolteacher, but the schoolteacher would probably seem more refined and be able to access better social circles.

2. Classes are cultures. People in a certain class have their own way of dressing, speaking, decorating, and behaving. They have distinctive ideas and values. This is why a lower-class person cannot simply claim to be upper-class and so gain all the benefits of upper-class-hood; it would be as hard as trying to pass for Japanese. Lower-class people can learn their way around upper-class culture, but it’s a difficult and lifelong project done most easily if you already have upper-class resources.

3. Talking about class is taboo because we like to believe we’re a classless society. We talk about income instead and pretend it’s class. Class breaks through in a couple of phrases like “rednecks” or “white trash” or “white collar” or “coastal elites”, but people use the phrases without usually having a broader idea that it’s class they’re talking about.

4. Class prejudice is complicated. It combines the practical superiority of being upper-class to being lower-class (because you have more money and opportunity) with the very dubious value judgment that upper-class culture is superior to lower-class culture, or that lower-class culture is just people trying to do upper-class culture but failing. But lower-class people like lower-class culture and generally do not want to adopt upper-class culture, except insofar as it’s necessary to advance. Analogies to race and assimilation are obvious.

5. People mostly understand their own class, and the class one step above or below them, but have only vague stereotypes of classes further than that. This limits social mobility; you can’t join what you can’t understand.

6. College is a finishing school for the upper classes. They send their children there to learn the proper upper class values and behaviors. Even if community college does a great job teaching whatever trades it teaches, it will not teach you how to be a part of the upper class, and this will seriously limit your opportunities.

7. Politically, the left pretends class doesn’t exist; the right talks about it, but only to yell at the underclass and say that their culture is wrong. Race is really complicated and will be left out of this analysis.

I notice Siderea is a psychotherapist, which doesn’t surprise me. We in mental health get a pretty good cross-sectional exposure of everybody and get to hear about their lives, and with enough data points the structure comes into sharper relief.

Just to give an example: suppose a lady comes in with really over-permed dyed curly hair wearing several rings, bracelets, and necklaces. Her name is Sherri and she calls you “darling”; she’s also carrying her lunch, which is KFC plus a Big Gulp. Without knowing anything else about her, you can peg her as working class. Maybe she won the lottery ten years ago and is now the richest person in your state. It doesn’t matter. She’s still working class.

Or suppose a thin 25-year-old man comes in wearing glasses, a small close-cropped beard, and a Led Zeppelin t-shirt. His name is Alex and he apologizes for being three minutes late. This guy is probably middle-to-upper-middle-class and college educated, maybe not a great college but still college-educated. And maybe he’s fallen on hard times and doesn’t have a dollar to his name. It still doesn’t matter. He’s still middle-to-upper-middle class.

And you start to learn you can predict things about these people, the concerns they’re going to have, the kind of things that happen to them. Who their friends are. How they relate to their friends: Sherri will expound upon the flaws of every single one of her ungrateful coworkers; Alex will reluctantly say he went through a tough breakup a year or two ago. What kind of drugs they abuse, if they abuse drugs (maybe Sherri has smoking and drinking problems; Alex has probably tried marijuana and LSD but is embarrassed to say so).

But this kind of innate stereotyping is different than a formal taxonomy. Siderea links to Michael Church’s attempt to explain what the classes actually are. This is another piece you should read, but again in case you don’t:

1. 10% of people are in an underclass consisting of “generationally poor” people who may never have held jobs and who come from similarly poor families.

2. 65% of people are in the labor class. They work jobs where labor is seen as a commodity, ie there’s not as much sense of career capital or reputation. They base virtue and success around Hard Work. Its lower levels are minimum wage McJobs, its middle levels are assembly line work, and its higher levels are things like pilots, plumbers, and small business owners. The stratospheric semi-divine level is “celebrities” like reality TV stars who become fabulously rich and famous while sticking to their labor class roots.

3. 23.5% of people are in the gentry class. They fetishize education and career capital. They engage in all sorts of signaling games around “fair trade” and “organic” and what museums they go to. At the lower level they’re schoolteachers and starving artists, at the mid level they’re “professions” like engineering and law, and at the highest level they’re professors and scientists and entrepreneurs. The stratospheric semi-divine level is “cultural influencers” like Jon Stewart or Steven Pinker who become famous and (maybe) rich while sticking to their gentry class roots.

4. 1.5% of people are in the elite class. Although you can be borderline-elite by getting a job in finance and making a few million, the real elite are born into money and don’t work unless they want to. Occasionally they’ll sit on a board or found a philanthropic association or something. They don’t believe in “professional achievement” because working is lower-class; they might compete in complicated status games around who throws the best parties or has the best horses or whatever.

5. The highest class (E1) are insane psychopaths who burn the global commons for shits and giggles. They tend to be drug lords, arms dealers, and morally insane billionaires. Most famous politicians and businesspeople are not of this class and most people in this class are not famous.

6. The three main classes (labor, gentry, and elite) are three different ‘infrastructures’. To be in labor you need skills, to be in gentry you need education, and to be in elite you need connections. There’s no strict hierarchy (eg not all gentry are above all labor), but you can picture them as offset ladders, with the lower gentry being at the same rung as the higher labor and so on.

7. The Elite control everything; the constant threat is that Gentry and Labor will unite against them, which might very well work. The Elite neutralize this threat by making Labor hate Gentry as “effeminate” or “pretentious”; they also convince Labor that the Gentry are probably secretly in cahoots with the underclass against Labor. Elites also convince Labor that Elites don’t exist and it’s Gentry all the way up, which means that “anti-1%” sentiment, which should properly get Labor and Gentry to cooperate against the Elites, instead makes Gentry hate the Elites but Labor hate Gentry. Politics boils down to Gentry being good people trying to improve things, and Elite conning Labor into hating Gentry to prevent things from being improved.

8. While all classes can have good and bad people (except E1, which is wholly bad), Elites have a generally negative influence on society, and Gentry are generally positive. After the World Wars, everybody got angry at the Elites for all the war and killing and stuff, which convinced them to lie low for a few decades and forced the Gentry to take over. This was why the country did so well during the 50s and 60s. Whether the country goes in a good or bad direction now depends on whether the Elites manage to take it back or not. One reason Silicon Valley works (used to work?) so unusually well was that it was mostly a native project of the Gentry that hadn’t yet been infiltrated by the Elites.

Reaction to Church on the subreddit was pretty negative, but I find it at least a good nucleus for further discussion. The Gentry/Labor distinction is glaringly obvious. The Labor/Underclass distinction also seems glaringly obvious to me, if only because Labor hates the underclass. The Gentry/Elite distinction doesn’t seem glaringly obvious to me, but maybe that’s just because I haven’t met enough elites. In particular, Church’s “E1” seems caricatured and out-of-place in his otherwise sober analysis. Then again, if those people existed I probably wouldn’t know anyway. Then again, the rest of Church’s blog suggests some paranoid tendencies, so maybe the E1 entry is just those coming out.

Siderea notes that Church’s analysis independently reached about the same conclusion as Paul Fussell’s famous guide. I’m not entirely sure how you’d judge this (everybody’s going to include lower, middle, and upper classes), but eyeballing Fussell it does look a lot like Church, so let’s grant this.

It also doesn’t sound too different from Marx. Elites sound like capitalists, Gentry like bourgeoisie, Labor like the proletariat, and the Underclass like the lumpenproletariat. Or maybe I’m making up patterns where they don’t exist; why should the class system of 21st century America be the same as that of 19th century industrial Europe?

There’s one more discussion of class I remember being influenced by, and that’s Unqualified Reservations’ Castes of the United States. Another one that you should read but that I’ll summarize in case you don’t:

1. Dalits are the underclass, made up of homeless people, chronically unemployed people, drug addicts, etc. They tend to have a lot of trouble with the law, go in and out of jail, never really hold down stable employment. Status is “street cred” that you get from being powerful, wealthy, and sexually successful, eg gang leaders.

2. Vaisyas are standard middle-class people who engage in productive employment. They tend to form nuclear families and try to go to church. Status is having a stable job, a stable family, and being well-liked in your church or social club.

3. Brahmins are very educated people who participate in the world of ideas. They range from doctors and lawyers to artists and professors. Access is conferred by top-tier university education. Status is from conspicuous engagement in progressive politics, eg being an activist, working for an NGO, “campaigning for justice”. They are “the ruling class”.

4. Optimates are very rich WASPs concerned with breeding and old money. Status comes from breeding and an antiquated idea of “nobility”. Optimates used to be “the ruling class”, but now they’re either extinct or endangered, having been pretty much absorbed into the Brahmins.

5. Mentioned elsewhere in the UR corpus: politics boils down to Vaisyas being basically decent people trying to lead normal productive lives, and Brahmins trying to create a vast tentacled monstrosity of useless bureaucrats and petty enforcers of ideological conformity to employ Brahmins in the “knowledge work” they feel entitled to and to protect their interests. Silicon Valley is (used to be?) unusually functional because it maintained some Vaisya values separate from the corrupting influence of the Brahmins.

Michael Church’s system (henceforth MC) and the Unqualified Reservations system (henceforth UR) are similar in some ways. MC’s Underclass matches Dalits, MC’s Labor matches Vaisyas, MC’s Gentry matches Brahmins, and MC’s Elite matches Optimates. This is a promising start. It’s a fourth independent pair of eyes that’s found the same thing as all the others. (Commenters bring up Joel Kotkin and Archdruid Report as similar convergent perspectives.)

But there are also some profound differences. UR says that the Elites are mostly gone, that everything’s ruled by the Gentry nowadays, and that the Gentry are allying with the criminal Underclass against Labor. MC mentions this same picture, but only as the false facade that the Elites are trying to get everyone else to believe in order to keep them divided.

You could reconcile some of the differences by supposing the two models have different cutoffs. Suppose we rank people from 0 (lowest underclass) to 100 (highest elite). Maybe MC draws the Labor/Gentry and Gentry/Elite borders at 40 and 70 respectively, and UR draws the Vaisya/Brahmin and Brahmin/Optimate borders at 60 and 90. If the world’s being run by 80s, MC could be right to say it’s run by Elites and not Gentry, and UR could be right in saying it’s run by Brahmins and not Optimates. If Silicon Valley is run by 55s but being ruined by 75s, MC could say it’s run by Gentry but ruined by elites, and UR could say it’s run by Vaisyas but ruined by Brahmins. But if there’s this much variability in class boundaries, what’s the point in even drawing them in the first place?
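To make the cutoff idea concrete, here is a minimal sketch of the same thought experiment. The borders (40/70 for MC, 60/90 for UR) and the example scores (80, 55, 75) come from the paragraph above; the function name `classify` and the dictionary layout are made up for illustration, and since neither model's Underclass border is given, the sketch just lumps everything below the first border into the lowest named class.

```python
def classify(score, borders, names):
    """Map a 0-100 score to one of three class names using two borders.

    Assumption: anything below the first border gets the lowest name,
    because the Underclass border is not specified in the text.
    """
    low, high = borders
    if score < low:
        return names[0]
    if score < high:
        return names[1]
    return names[2]


# Hypothetical cutoffs taken from the thought experiment above.
MC = {"borders": (40, 70), "names": ("Labor", "Gentry", "Elite")}
UR = {"borders": (60, 90), "names": ("Vaisya", "Brahmin", "Optimate")}

for score in (80, 55, 75):
    print(score, classify(score, **MC), classify(score, **UR))
```

The same person gets a different label depending on whose borders you use, which is the whole point: an 80 is an Elite to MC and a Brahmin to UR, and both can be "right" about who runs things.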

But I think the differences are real and political: MC comes from a liberal perspective, UR from a conservative one. MC wants to locate the source of the cancer in the (mostly plutocrat) Elites, cast the (mostly liberal) Gentry as wonderful people who can do no wrong, cast the (mostly conservative) Labor as deluded and paranoid, and cast the (liberal-aligned) Underclass in a sympathetic light. UR wants to locate the source of the cancer in the (mostly liberal) Brahmins, cast the (mostly conservative) Labor as decent salt-of-the-Earth types under threat from the elite, and cast the (liberal-aligned) Underclass in an unsympathetic light.

And the political angle evokes one more system worth adding here: my own discussion of the Blue Tribe vs. the Red Tribe in I Can Tolerate Anything But The Outgroup. I point out that the group sometimes referred to as “coastal liberals” or “SWPL” and so on are marked not only by Democratic Party beliefs, but by a host of cultural similarities including food, dress, music, hobbies, religion, values, art, etc. Likewise, the group sometimes referred to as “rednecks” or “fundies” and so on are marked not only by Republican Party beliefs, but by a similar set of cultural similarities. I call these the “Blue Tribe” and the “Red Tribe” as an attempt to distinguish them as cultures and not just as sets of political beliefs.

These tribes seem closely related to classes. “Blue Tribe” is similar to Gentry; “Red Tribe” is similar to Labor. I won’t say there’s a perfect 1:1 equivalence; for example, I know some union leaders who are very clearly in the Labor class but who wouldn’t be caught dead in the Red Tribe. But the resemblance is too close to miss.

Some final scattered thoughts:

1. All those studies that analyze whether some variable or other affects income? They’d all be much more interesting if they analyzed the effect on class instead. For example, there’s a surprisingly low correlation between your parents’ income and your own income, which sounds like it means there’s high social mobility. But I grew up in a Gentry class family; I became a doctor, my brother became a musician, and my cousin got a law degree but eventually decided to work very irregularly and mostly stay home raising her children. I make more money than my brother, and we both make more money than my cousin, but this is not a victory for social mobility and family non-determinism; it’s no coincidence none of us ended up as farmers or factory workers. We all ended up Gentry class, but I chose something closer to the maximize-income part of the Gentry class tradeoff space, my brother chose something closer to the maximize-creativity part, and my cousin chose to raise the next generation. Any studies that interpret our income difference as an outcome difference and try to analyze what factors gave me a leg up over my relatives (better schools? more breastfeeding as a child?) are stupid and will come up with random noise. We all got approximately the same level of success/opportunity, and those things just happen to be very poorly measured by money. If we could somehow collapse the entirety of tradeoffspace into a single variable, I bet it would have a far greater parent-child correlation than income does. This is part of why I don’t follow the people who take the modest effect of IQ on income as a sign that IQ doesn’t change your opportunities much; maybe everyone in my family has similar IQs but wildly different income levels, and there’s your merely modest IQ/income relationship right there. I think some studies (especially in Britain) have tried analyzing class and gotten some gains over analyzing income, but I don’t know much about this.

2. I think Siderea is right that the Right thinks in social class terms more naturally than the Left. To oversimplify, both sides use class warfare, but the Left’s class warfare is economic (“the plutocrat billionaires are ruining everything!”) and the Right’s class warfare is social (“the media and academic elites are ruining everything!”).

3. Closely related: Donald Trump appeals to a lot of people because despite his immense wealth he practically glows with signs of being Labor class. This isn’t surprising; his grandfather was a barber and his father clawed his way up to the top by getting his hands dirty. He himself went to a medium-tier college and is probably closer in spirit to the small-business owners of the upper Labor class than to the Stanford MBA-holding executives of the Elite. Trump loves and participates in professional wrestling and reality television; those definitely aren’t Gentry or Elite pastimes! When liberals shake their heads wondering why Joe Sixpack feels like Trump is a kindred soul even though Trump’s been a billionaire his whole life, they’re falling into the liberal habit of sorting people by wealth instead of by class. To Joe Sixpack, Trump is “local boy made good”.

4. The thesis of “I Can Tolerate Anything But The Outgroup” simplifies to “It is a Gentry-class tradition to sweep aside all prejudices except class prejudice, which must be held with the intensity of all the old prejudices combined.”

5. But the Grey Tribe of “I Can Tolerate Anything But The Outgroup” sits uneasily within this system. It doesn’t seem to be a class. But it also seems distinctly different from ordinary Gentry norms. And what about minorities? What about the differences between farmers vs. factory workers? If different classes are equivalent to different cultures, well, there are a lot of different cultures that don’t fit easily into the hierarchy. Maybe class is one factor among many that can create a different culture, but other factors can be stronger than class in some groups?

6. Siderea doesn’t want to get into how race interacts with class, and that seems wise. But a related digression: lots of people complain about social justice being classist, in that it’s hard for anybody who hasn’t either gone to college or at least spent a lot of time hanging around social justice people to keep track of which words, opinions, and causes are okay versus will render you radioactive. On the one hand, this is probably true. On the other, it’s probably true of everything, with social justice as an unexceptional example. Yes, the way you refer to trans people shows what class you’re from, but so does the way you order ice cream.

7. Siderea admits she is classist and not ashamed of this. I have a hard time understanding what she means, but I can try to explain my own classism: I think classes probably sort on important qualities and reinforce those qualities. For example, the Underclass and Labor class people I know are much more likely to have high-conflict styles of interaction: if they feel offended, they’ll yell at you and maybe even fight you. Gentry class people would be horrified at the thought; they might respond to the same offense by filing a complaint with Human Resources. I think there are two equally correct ways to interpret this. Number one, people with the maladaptive behavior of starting physical fights don’t make it very far in life and so end up in lower classes, and insofar as these behaviors are either genetic or learned within the family, their families stay in lower classes throughout the generations. Number two, the lower classes have a culture where you defend your honor by fighting people who offend you, and the upper classes have a culture where you defend your honor by submitting complaints, and although in a cosmic sense both of those styles are equally valid, and although indeed a thousand years ago the fighting might have been more adaptive, in today’s society the complaint-submitting is more adaptive and the lower classes are screwed unless they unlearn that behavior – which they probably won’t, because unlearning class is hard. But this means that classism is at least kind of justified – if you want to hire, for example, a schoolteacher, you might want to look for people who show all the signs of Gentry rather than Labor class to make sure they’re not going to get into physical fights in the classroom.

8. Cellular automaton theory of fashion likely relevant.

9. Siderea’s idea of college as finishing school for the upper classes is interesting, and her own experience is a window into something I never thought about before. But I’m not sure how typical she is; I think most colleges admit students who are already members of the classes their graduates end up in. I felt like I didn’t learn any class culture during my own college experience at all – which isn’t surprising since I was born the son of a doctor and ended up as a doctor myself. I think my story’s probably more typical than Siderea’s, though other people can prove me wrong if they’ve seen differently.
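Going back to thought 1: here is a toy simulation of the claim that a single "class" variable could have a much higher parent-child correlation than income does. Everything in it is an assumption for illustration (the 0.8 inheritance figure, the size of the career-choice noise, the names `simulate` and `pearson`); it is not fitted to any real data. The idea is just that if income is class plus a big dose of "which part of tradeoff space did you pick", then income correlations between generations get diluted even when class is passed down almost intact.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, inherit=0.8, choice_noise=1.5, seed=0):
    """Return (class correlation, income correlation) for parent-child pairs.

    Assumed model: a latent 'class' score with unit variance is inherited
    with correlation `inherit`; income is class plus independent noise
    standing in for career choices (doctor vs. musician vs. staying home).
    """
    rng = random.Random(seed)
    parent_class = [rng.gauss(0, 1) for _ in range(n)]
    leftover = (1 - inherit ** 2) ** 0.5  # keeps child class at unit variance
    child_class = [inherit * p + leftover * rng.gauss(0, 1) for p in parent_class]
    parent_income = [c + choice_noise * rng.gauss(0, 1) for c in parent_class]
    child_income = [c + choice_noise * rng.gauss(0, 1) for c in child_class]
    return pearson(parent_class, child_class), pearson(parent_income, child_income)


class_corr, income_corr = simulate()
print(f"class: {class_corr:.2f}  income: {income_corr:.2f}")
```

Under these made-up parameters the class correlation should land near 0.8, while the income correlation should land near 0.8 / (1 + 1.5²) ≈ 0.25, since the noise inflates the income variance without adding anything to the shared part. A study looking only at income would report modest heritability of outcomes in a world where class is in fact mostly inherited.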


Links 1/16: Shaolink

The McCollough Effect is an optical illusion where after you stare at one picture, another picture looks like it has some different colors. But unlike normal retinal-satiety optical illusions which last a couple of seconds, the McCollough Effect can last hours to months depending on how carefully you prime it. Try it now.

Reddit: What would the person who named walkie-talkies have named other things? Defibrillators are “heartie-starties”; cruise missiles are “zoomie-boomies”. (H/t Kaj)

Including ethnic studies classes in secondary school will increase school attendance by 21% and GPA by 1.4 grade points, is apparently by far the best educational intervention ever discovered, and can singlehandedly save the school system. Or possibly there’s a problem with the methodology. The paper’s paywalled, so who knows?

That star with the weird brightness fluctuations that some people thought might be an alien Dyson sphere has even weirder brightness fluctuations than we thought. And cometary clouds have been pretty much ruled out. [EDIT: or maybe not?]

IQ scientist and Intelligence: All That Matters author Stuart Ritchie reviews Garrett Jones’ Hive Mind for Intelligence. Brings up some of the same points as in my review; I feel vindicated! Also: Jones interviewed by AEI.

A lot of people have sent me this article where Carol Dweck says growth mindset is being misused. Certainly we shouldn’t misuse things, but I want to reiterate that my disagreement is prior to any misuse and I still am not sure that even Dweck-approved correctly-used growth mindset is as effective as generally believed.

Stumbling and Mumbling on capitalism vs. markets.

It looks like Greg Cochran no longer believes mutational load is a crucial determinant of IQ. As always, ability to change one’s mind is to be praised and celebrated as a rare but powerful talent. Also canalization.

New study finds that cannabis use does not affect IQ, apparently more authoritative than all the past studies that found it did. Why are cigarettes such an important confounder? Do they cause cognitive issues?

Mike Hearn says that Bitcoin is doomed. Bram Cohen says that Mike Hearn is a whiner who is ragequitting Bitcoin because nobody wanted to let him take it over. The Economist explains the controversy using the phrase “forking hell”. Other people roll up their sleeves and come up with a temporary but mutually agreeable solution to the supposedly Bitcoin-killing problem. We are all helpfully reminded that Bitcoin has been declared doomed 89 times so far, yet continues to exist.

In theory, an infinite number of monkeys could produce the complete works of Shakespeare. In practice, “[we] concluded that monkeys are not random generators”.

Conservatives always say, kind of as preemptive schadenfreude, that nobody would ever hire spoiled student protesters. But this article in the Financial Post is the first time I’ve seen an apparently apolitical, practical-minded discussion in a business context of how to avoid them; it suggests for example searching people’s social media for telltale signs [by employment lawyer; may be self-promotional]. I’m very split; on the one hand I believe in freedom of association and if somebody is clearly going to be trouble you shouldn’t force people to throw out that information and place themselves in a position to depend on that person anyway. On the other hand, I also think that part of meaningful freedom of opinion is that expressing your opinion won’t prevent you from getting a job twenty years down the road. I guess maybe look at the things people do as part of protests (eg if they burn something down or trash buildings) but don’t necessarily judge them for protesting something even if you don’t agree with their position? But I hope this reinforces what I’ve been saying about how getting good meta-level rules about not punishing people for speaking their mind is a common cause of all sides of political debates.

Sort of related: University of Missouri rumored to have declining application rate due to bad publicity from protests last fall.

Fossil words are antiquated words that only survive as part of an expression, like the “eke” in “eke out” or the “beck” in “beck and call”. Related: linguistic Siamese twins are two-word phrases that have to be in a specific order, like “hammer and sickle” or “salt and pepper”.

All-cause mortality over the course of a year rises with proximity to New Year’s Day, which is the deadliest day of the year. Nobody knows why, and it doesn’t seem to have to do with drunk driving, weather, or hospital schedules.

An easy way to fund some kind of important or charitable project you have going on: get a government grant. Related: the government grant process is a terrible confusopoly, which is mostly bad but can be good if you learn to navigate terrible confusopolies and don’t want too many competitors.

The Church Of The Flying Spaghetti Monster has been protesting people being allowed to wear hijabs in driver’s license photos by demanding the same “faith-based” right to wear colanders on their heads. Unfortunately one of them tried this in Russia and got his comeuppance: he is allowed to drive while wearing a colander on his head, and only while wearing a colander on his head.

Hitler is a rock star in South Asia. Not literally. Literally he’s a retired plumber in Argentina.

Over the past ten years, there’s been an almost 50% increase in deaths due to legal intervention (ie shot by police).

I’ve been saying this for a while, but I’m glad to have backup: pregnant women should supplement with iodine even in developed countries.

More neat methodologies: mental rotation is often used as a proxy for mathematical ability. Boys are usually better at mental rotation than girls, but it’s hard to tell whether this is biological or cultural. But girls who have a male twin get exposed to lots of testosterone in the uterus and probably have more male-like brains. So you might be able to distinguish biological from cultural effects by comparing mental rotation performance of girls who had male vs. female twins.

Related: straight men do better than gay men (and gay women better than straight women) on rotation tasks. Was Turing just a gigantic outlier, or what?

Related: why are gay men shorter than straight men?

While civilized countries debate how many new immigrants to let in, Britain is planning to deport all legal residents who have lived in the UK for more than five years unless they can meet an income threshold which is actually significantly higher than the average UK income. Is there anyone who thinks deporting upper-middle-class people who have been in Britain for decades and have houses and families there is vitally important to national security? Especially bad because it’s a new law, so these people planned their lives in Britain around people not doing this.

Women whose resume suggests that they’re lesbian get 30% fewer calls for job interviews.

Charter schools in Boston get better test scores than public schools in Boston. Some argue this is because they teach to the test more than public schools do. A new study tries an interesting methodology: see whether these schools have a greater advantage on the higher-stakes, more traditional, more easily-gamed tests that teach-to-the-test schools would be more likely to be teaching to.

Related: state takeovers help failing schools. Public schools in Louisiana outperform voucher schools. I don’t really care that much about takeovers or vouchers, but results like this drag me out of my skepticism and force me to admit there’s some effect of how well schools are run on test scores. Whether that matters for real life applications ten years later is a harder question.

Not as related as it sounds: doubling teacher salary had no effect on any educational parameter in Indonesia. But they just kept all the same teachers and paid them more for no reason, so this doesn’t prove that increasing teacher salaries in the way people usually mean (ie in order to attract better teachers) wouldn’t be a good idea. And a new study does show pay for performance has improved DC’s public schools.

Thank whatever God you believe in that you’re not a junior doctor in the United Kingdom (I was a medical student in Ireland, which was close enough to inspire me to flee across the Atlantic). Proving that it is always good for making things worse, the UK government accuses doctors of killing people by occasionally having days off, but the evidence isn’t enough to support their claim.

Unfortunately-named consequentialist Max Harms has written a sci-fi book about the Singularity, Crystal Society. Haven’t read it yet but people I trust including Brienne Yudkowsky and Kaj Sotala say it’s good. Also: island exploration computer game The Witness (by the author of Braid) is donating 10% of sales to Against Malaria Foundation.

Ultra-premium water is on the rise. I didn’t even know “water sommelier” was a real profession.

Lots of people are warning against the alt-right these days, but needless to say Xenosystems’ warning is a little different. “For the Alt-Right, generally speaking, fascism is basically a great idea; for NRx, fascism is a late-stage leftist aberration made peculiarly toxic by its comparative practicality. There’s no real room for a meeting of minds on this point. From the NRx perspective, the Alt-Right is to be appreciated for helping to clean us up. They’re most welcome to take whoever they can, especially if they shut the door on the way out.”

The Dictionary Of Ancient Magic Words And Spells is a pretty good resource for all of the interesting things our ancestors thought you could do with garbled Latin and a copious supply of newt eyes.

Why does Donald Trump play Phantom of the Opera at all his campaign rallies? Does he just really like Phantom of the Opera? Sort of related: Developing And Testing A Scale To Measure Need For Drama.

The Empirics Of Free Speech (warning: long). What does free speech actually do or not do, according to the evidence? Does it let corporations buy elections? Does it result in heavily biased media? Can people use it to incite violence? Do people actually call “FIRE” in crowded theaters just to laugh as everyone tramples each other? This post will tell you much more than you wanted to know about all of these questions.

I’ve seen this idea floating around in a few places before under the name “proxy democracy” – a government that’s a direct democracy, but you can delegate your vote to anyone you like, be it a professional senator or just your friend who knows more about politics than you do. Now Google is calling it liquid democracy and testing it for some forms of corporate decision-making.

Jerry Coyne (Why Evolution Is True) reports on a controlled experiment on Facebook – make two otherwise identical Facebook groups, one anti-Palestine and the other anti-Israel. Sure enough, the anti-Palestine one gets banned and the anti-Israel one is left up [though the experiment itself is done by a pro-Israel group I do not trust as much as I trust Coyne]. And Marc Randazza of Popehat says that he’s tried a similar experiment and found that social-justice-branded accounts on Twitter can harass as much as they want up to and including death threats without consequences, but conservative-branded accounts are cracked down upon for slight offenses [though he does not post proof of this experiment]. Overall not likely to convince the not-already-convinced, but matches the anecdotal evidence I hear. Although private companies have the right to monitor their own customers as they see fit, I think FIRE’s philosophy – hold organizations to their stated principles and rules, but criticize them when they fall short or enforce them selectively – is fair. A Facebook that said outright “We’ll ban you for criticizing Palestine, but criticize Israel as much as you want” would have every right to go through with its policies as written – but also might not have too many users.

Studies traditionally show that immigrants do not “steal” native jobs or harm the native economy in any way. The major study to contradict this wisdom was Borjas on the Mariel boatlift of refugees from Cuba, but more recent reanalyses of the data by other economists (or as we now call them, “research parasites”) have cast doubt on that conclusion and the entire field has become an impenetrable quagmire. RealClearPolicy has an excellent and unbiased summary of the debate and of how people are getting such different results.

Philosophers experience silly cognitive biases even on exactly the kinds of problems where they should be most philosophical.

Quantifying Gains In The War On Cancer: “We estimate that 3-year cancer-related mortality of cancer patients fell 16.7% from 1997 to 2007. Overall, advances in treatment reduced mortality rates by approximately 12.2% while advances in early detection reduced mortality rates by 4.5%.”

The New York Times has a hit piece perfectly nice article on the Center For Applied Rationality, a Less Wrong-affiliated self-help workshop group in Berkeley.

Who would Chinese people vote for in the US presidential election? Spoiler: Donald Trump, but only until somebody tells them what Donald Trump believes.

Related: @DPRK_News (parody North Korean Twitter account actually run by Popehat) covers the Democratic debate.

A pretty comprehensible explanation of what’s going on with Flint’s water. But I worry it might be too quick to exonerate politicians based on them not necessarily making bad water treatment decisions, when the thing people are really angry about is them covering it up / not reacting fast enough.

Razib Khan predicts ISIS’ ideology will become more popular.

Dr. David Ludwig debates Dr. Stephan Guyenet on the calorie hypothesis vs. the insulin hypothesis of obesity. They’re both really smart and excellent communicators and this is a great demonstration of the level that this kind of debate should be held at.

This is a really neat new study: The impact of having a father who went to Vietnam. Since whether or not a 1960s American man went to Vietnam was partially determined by the draft lottery, you should be able to factor out all the other reasons someone might or might not go to Vietnam and get what was basically a randomized experiment sending people into war zones. The research finds that the children of people with bad draft numbers (more likely to have gone to Vietnam) make about $200 – $500 less fifty years later than the children of similar people who were less likely to have gone to Vietnam. That doesn’t seem like much, but since only about 10% of the people in the bad-draft-number category actually ended up in Nam and this is the effect that must be driving the average difference, it might be that those children are making $2000 – $5000 less, which actually is a lot (note that this was during a period when very few Americans died in Vietnam, so not much selection effect; the paper also adjusted for all the other reasonable objections I can think of). This is really weird. It’s unclear exactly how the father’s military service hurts the kid, but good guesses would be something like PTSD making the father less effective as a parent, or the father’s military service preventing them from getting as good a job. But that would be a shared environmental effect, which shouldn’t happen, and nongenetic intergenerational transfer of human capital, which also shouldn’t happen! Very interesting.
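The jump from $200 – $500 to $2000 – $5000 is just division by the share of fathers who actually served. Here is a back-of-the-envelope sketch of that arithmetic in Python. It is not the paper's method; it assumes the entire gap is driven by the roughly 10% who went, and the function name is made up for illustration.

```python
def effect_on_served(average_gap, share_served):
    """Scale an average gap across the whole bad-draft-number group up to
    the implied gap for the subgroup whose fathers actually served.

    Assumes the entire gap comes from that subgroup (a simplifying
    assumption for illustration, not a claim taken from the paper).
    """
    if not 0 < share_served <= 1:
        raise ValueError("share_served must be in (0, 1]")
    return average_gap / share_served


# Figures from the paragraph above: $200 to $500 average gap, about 10% served.
low = effect_on_served(200, 0.10)
high = effect_on_served(500, 0.10)
print(f"Implied gap for children of fathers who served: ${low:,.0f} to ${high:,.0f}")
```

If the true share who served were higher than 10%, the implied gap would shrink proportionally, so the big number is only as good as that 10% figure.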

I’d never heard this before and it sounds fascinating: A Drug To Cure Fear. Apparently you can insta-cure a phobia by taking propranolol (a common drug that blocks some of the bodily effects of emotion) and then exposing yourself to the phobic trigger. Sounds plausible – you’re habituating yourself by “proving” to your brain that it doesn’t scare you – but the drug is so common I’d be surprised nobody noticed before. Anybody with a phobia and access to propranolol want to try this and tell me how it works?


Predictions For 2016

At the beginning of every year, I make predictions. At the end of every year, I score them. So here are a hundred more for 2016. Most of them are objectively decidable, but a few are subjective (eg “X goes well”) and are marked with asterisks.

WORLD EVENTS
1. US will not get involved in any new major war with death toll of > 100 US soldiers: 60%
2. North Korea’s government will survive the year without large civil war/revolt: 95%
3. Greece will not announce it’s leaving the Euro: 95%
4. No terrorist attack in the USA will kill > 100 people: 90%
5. …in any First World country: 80%
6. Assad will remain President of Syria: 60% [edit: called out as dumb, but I won’t cheat and change it]
7. Israel will not get in a large-scale war (ie >100 Israeli deaths) with any Arab state: 90%
8. No major intifada in Israel this year (ie > 250 Israeli deaths, but not in Cast Lead style war): 80%
9* No interesting progress with Gaza or peace negotiations in general this year: 90%
10. No Cast Lead style bombing/invasion of Gaza this year: 90%
11* Situation in Israel looks more worse than better: 70%
12. Syria’s civil war will not end this year: 70%
13. ISIS will control less territory than it does right now: 90%
14. ISIS will not continue to exist as a state entity [added: meant in Iraq/Syria]: 60% [edit: called out as dumb, but I won’t cheat and change it]
15. No major civil war in Middle Eastern country not currently experiencing a major civil war: 90%
16* Libya to remain a mess: 80%
17. Ukraine will neither break into all-out war nor get neatly resolved: 80%
18. No country currently in Euro or EU announces plan to leave: 90%
19. No agreement reached on “two-speed EU”: 80%
20. Hillary Clinton will win the Democratic nomination: 95%
21. Donald Trump will win the Republican nomination: 60%
22* Conditional on Trump winning the Republican nomination, he impresses everyone with how quickly he pivots towards wider acceptability: 70%
23. Conditional on Trump winning the Republican nomination, he’ll lose the general election: 80%
24. Conditional on Trump winning the Republican nomination, he’ll lose the general election worse than either McCain or Romney: 70%
25. Marco Rubio will not win the Republican nomination: 60% [edit: called out as dumb, but I won’t cheat and change it]
26. Bloomberg will not run for President: 80%
27. Hillary Clinton will win the Presidency: 60%
28. Republicans will keep the House: 95%
29. Republicans will keep the Senate: 70%
30. Bitcoin will end the year higher than $500: 80%
31. Oil will end the year lower than $40 a barrel: 60%
32. Dow Jones will not fall > 10% this year: 70%
33. Shanghai index will not fall > 10% this year: 60% [edit: called out as dumb, but I won’t cheat and change it]
34. No major revolt (greater than or equal to Tiananmen Square) against Chinese Communist Party: 95%
35. No major war in Asia (with >100 Chinese, Japanese, South Korean, and American deaths combined) over tiny stupid islands: 99%
36. No exchange of fire over tiny stupid islands: 90%
37. US GDP growth lower than in 2015: 60%
38. US unemployment to be lower at end of year than beginning: 50%
39. No announcement of genetically engineered human baby or credible plan for such: 90%
40* No major change in how the media treats social justice issues from 2015: 70%
41* European far right makes modest but not spectacular gains: 80%
42* Mainstream European position at year’s end is taking migrants was bad idea: 60%
43. [Duplicate removed]
44* So-called “Ferguson effect” continues and becomes harder to deny: 70%
45. SpaceX successfully launches a reused rocket: 50%
46* Nobody important changes their mind much about the EMDrive based on any information found in 2016: 80%
47. California’s drought not officially declared over: 50%
48. No major earthquake (>100 deaths) in US: 99%
49. No major earthquake (>10000 deaths) in the world: 60%
50. Occupation of Oregon ranger station ends: 99%

PERSONAL/COMMUNITY
1. SSC will remain active: 95%
2. SSC will get fewer hits than in 2015: 60%
3. At least one SSC post > 100,000 hits: 50%
4. UNSONG will get fewer hits than SSC in 2016: 90%
5. > 10 new permabans from SSC this year: 70%
5. UNSONG will get > 1,000,000 hits: 50%
6. UNSONG will not miss any updates: 50%
7. UNSONG will have higher Google Trends volume than HPMOR at the end of this year: 60%
8. UNSONG Reddit will not have higher average user activity than HPMOR Reddit at the end of this year: 60%
9. Shireroth will remain active: 70%
10. I will be involved in at least one published/accepted-to-publish research paper by the end of 2016: 50%
11. I won’t stop using Twitter, Tumblr, or Facebook: 95%
12. > 10,000 Twitter followers by end of this year: 50%
13. I will not break up with any of my current girlfriends: 70%
14. I will not get any new girlfriends: 50%
15. I will attend at least one Solstice next year: 90%
16. …at least two Solstices: 70%
17. I will finish a long blog post review of stereotype threat this year: 60%
18* Conditional on finishing it, it won’t significantly change my position: 90%
19. I will finish a long FAQ this year: 60%
20. I will not have a post-residency job all lined up by the end of this year: 80%
21. I will have finished all the relevant parts of my California medical license application by the end of this year: 70%
22. I will no longer be living in my current house at the end of this year: 70%
23. I will still be at my current job: 95%
24. I will still not have gotten my elective surgery: 80%
25. I will not have been hospitalized (excluding ER) for any other reason: 95%
26. I will not have taken any international vacations with my family: 70%
27. I will not be taking any nootropic daily or near-daily during any 2-month period this year: 90%
28. I will complete an LW/SSC survey: 80%
29. I will complete a new nootropics survey: 80%
30. I will score 95th percentile or above in next year’s PRITE: 50%
31. I will not be Chief Resident next year: 60%
32. I will not have any inpatient rotations: 50%
33. I will continue doing outpatient at the current clinic: 90%
34* I will not have major car problems: 60%
35* I won’t publicly and drastically change highest-level political/religious/philosophical positions (eg become a Muslim or Republican): 90%
36. I will not vote in the 2016 primary: 70%
37. I will vote in the 2016 general election: 60%
38. Conditional on me voting and Hillary being on the ballot, I will vote for Hillary: 90%
39* I will not significantly change my mind about psychodynamic or cognitive-behavioral therapy: 80%
40. I will not attend the APA meeting this year: 80%
41. I will not do any illegal drugs (besides gray-area nootropics) this year: 90%
42. I will not get drunk this year: 80%
43* Less Wrong will neither have shut down entirely nor undergone any successful renaissance/pivot by the end of this year: 60%
44. No co-bloggers (with more than 5 posts) on SSC by the end of this year: 80%
45. I get at least one article published on a major site like Huffington Post or Vox or New Statesman or something: 50%
46. I still plan to move to California when I’m done with residency: 90%
47. I don’t manage to make it to my friend’s wedding in Ireland: 60%
48. I don’t attend any weddings this year: 50%
49. I decide to buy the car I am currently leasing: 60%
50. Except for the money I spend buying the car, I make my savings goal before July 2016: 90%
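For anyone who wants to score a list like this at year's end, one common approach is a calibration check: group predictions by stated confidence and see what fraction in each group came true. The Python sketch below is only that, a sketch. The post does not spell out a scoring procedure, the function name is hypothetical, and the outcomes in the example are invented.

```python
from collections import defaultdict


def calibration(predictions):
    """Group (stated_probability, came_true) pairs by stated probability
    and return the observed hit rate for each confidence level."""
    buckets = defaultdict(list)
    for prob, came_true in predictions:
        buckets[prob].append(came_true)
    return {
        prob: sum(outcomes) / len(outcomes)
        for prob, outcomes in sorted(buckets.items())
    }


# Invented outcomes, purely to show the shape of the calculation.
example = [(0.6, True), (0.6, False), (0.6, True), (0.9, True), (0.9, True)]
for prob, hit_rate in calibration(example).items():
    print(f"Predicted {prob:.0%}: {hit_rate:.0%} came true")
```

A well-calibrated forecaster should see roughly 60% of the 60% predictions come true, 90% of the 90% ones, and so on, though with only a handful of predictions per bucket the numbers will be noisy.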

Other people doing yearly predictions with probability: Against Jebel al-Lawz, Anatoly Karlin, Old Lamps, Garrett Peterson. If you’re doing this and I missed it, let me know and I’ll add you in.


Side Effects May Include Anything

A couple of days ago a patient said he’d become depressed after starting Xolair, a new asthma drug I know nothing about.

On the one hand, lots of things that mess with the immune system can cause depression. On the other, patients are notorious for blaming drugs for any random thing that happens around the same time they started taking them. So I did what any highly-trained competent medical professional would: I typed “does xolair cause depression?” into Google.

The results seemed promising. The first site was called “Can Xolair cause depression?”. The second was “Is depression a side effect of Xolair?”. Also on the front page were “Could Xolair cause major depression?” and “Xolair depression side effects”. Clearly this is a well-researched topic that lots of people cared about, right?

Let’s look closer at one of those sites, EHealthMe.com. It says: “Major depression is found among people who take Xolair, especially for people who are female, 40-49 old, also take medication Singulair, and have Asthma. We study 11,502 people who have side effects while taking Xolair from FDA and social media. Among them, 14 have Major depression. Find out below who they are, when they have Major depression and more.” Then it offers a link: “Join a support group for people who take Xolair and have Major depression”.

First things first: if there were actually 11502 people taking Xolair, and only 14 of them had major depression, that would be a rate of 0.1%, compared to 6.9% in the general population. In other words, Xolair would be the most effective antidepressant on Earth. But of course nobody has ever done an n=11502 study on whether a random asthma medication causes depression, and EHealthMe is just scraping the FDA databases to see how many people reported depression as a side effect to the FDA. But only a tiny percent of people who get depression report it, and depression sometimes strikes at random times whether you’re taking Xolair or not. So this tells us nothing.
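The arithmetic is quick to check. A minimal Python sketch using only the numbers quoted above (the function name is just for illustration):

```python
def naive_rate(cases, total):
    """Fraction of reports that mention the side effect."""
    return cases / total


reports_total = 11_502      # people in EHealthMe's scraped side-effect reports
reports_depression = 14     # of those, reports mentioning major depression
baseline_rate = 0.069       # 6.9% general-population figure quoted above

rate = naive_rate(reports_depression, reports_total)
print(f"Naive rate among reports: {rate:.2%}")
print(f"General population:       {baseline_rate:.1%}")
print(f"Baseline is roughly {baseline_rate / rate:.0f} times the naive rate")
```

The comparison is only meaningful if the 11,502 were a representative sample of people taking the drug, which is exactly what a database of voluntary side effect reports is not.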

And yet a patient who worries that Xolair might be causing her depression will Google “can xolair cause depression?”, and she will end up on this site that says “major depression is found among people who take Xolair”, which is one of the worst examples of weasel words I’ve ever heard. Then she will read that there are entire support groups for depressed Xolair sufferers. She will find all sorts of scary-looking information like that Xolair-related depression has been increasing since 2008. And this is above and beyond just the implications of somebody bothering to write an entire report about the Xolair-depression connection!

In case you haven’t guessed the twist – no one’s ever investigated whether Xolair causes depression. EHealthMe’s business model is to make an automated program that runs through every single drug and every possible side effect, scrapes the FDA database for examples, then autopublishes an ad-filled web page titled “COULD $DRUG CAUSE $SIDE_EFFECT?”. It populates the page by spewing random FDA data all over it, concludes “$SIDE_EFFECT is found among people who take $DRUG”, and offers a link to a support group for $DRUG patients suffering from $SIDE_EFFECT. Needless to say, the support group is an automatically-generated forum with no posts in it.
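To see how little it takes, here is a toy Python sketch of that kind of page generator. It is a guess at the general shape, not EHealthMe's actual code. The drug names, counts, and function names are all invented, and the rule of publishing only when at least one report exists is an assumption.

```python
from itertools import product

# Toy stand-in for scraped report counts; every name and number is invented.
REPORT_COUNTS = {
    ("DrugA", "headache"): 3,
    ("DrugA", "insomnia"): 0,
    ("DrugB", "headache"): 12,
}


def render_page(drug, side_effect, count):
    """Fill the template. Note that nothing here checks causation."""
    title = f"COULD {drug.upper()} CAUSE {side_effect.upper()}?"
    body = f"{side_effect.capitalize()} is found among people who take {drug}."
    return f"{title}\n{body}\n({count} reports matched.)"


def generate_pages(drugs, side_effects, counts):
    """Loop over every drug and side effect pair and emit a page
    whenever at least one report mentions both (assumed rule)."""
    pages = {}
    for drug, effect in product(drugs, side_effects):
        count = counts.get((drug, effect), 0)
        if count > 0:
            pages[(drug, effect)] = render_page(drug, effect, count)
    return pages


pages = generate_pages(["DrugA", "DrugB"], ["headache", "insomnia"], REPORT_COUNTS)
for page in pages.values():
    print(page, end="\n\n")
```

With a few thousand drugs and a few thousand side effects, a loop like this yields millions of pages, each one phrased to sound like somebody looked into the question.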

And it’s not just EHealthMe. This is a whole market, with competitors elbowing their way past one another to the top of the Google search results. Somebody who doubts EHealthMe and seeks an online second opinion will probably just end up at PatientsVille, whose page is called “Xolair Depression Side Effects”, which contains the same FDA data, and which gets the Google description text “This opens a possibility that Xolair could cause Depression”. Or Treato, whose page claims to contain 56 reader comments on Xolair and depression, but which has actually just searched the Web for every single paragraph that contains “Xolair” and “depression” together and then posted garbled excerpts in its comment section. For example, one of their comments – and this is not at all clear from Treato’s garbled excerpt – is from a tennis forum, where a user with the handle Xolair talks about how his tennis serve is getting worse with age; another user replies “Xolair, I read this and get depressed, I just turned 49.” But if you don’t check whether it came from a tennis forum or not, 56 reports of a connection between a drug and a side effect sounds convincing!
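The tennis forum mix-up is what you would expect from naive keyword matching. Here is a minimal Python sketch, assuming (and this is only an assumption about how Treato works) that a paragraph counts as a "report" whenever the drug name and some form of the symptom word both appear in it. The helper name and the second paragraph are invented.

```python
def mentions_both(paragraph, drug, symptom_stem):
    """Case-insensitive check that both strings appear somewhere in the text.
    It knows nothing about who or what the words refer to."""
    text = paragraph.lower()
    return drug.lower() in text and symptom_stem.lower() in text


paragraphs = [
    # Quoted in the post above; "Xolair" here is a tennis forum username.
    "Xolair, I read this and get depressed, I just turned 49.",
    # Invented filler that should not match.
    "My serve keeps getting worse with age.",
]

hits = [p for p in paragraphs if mentions_both(p, "Xolair", "depress")]
print(len(hits), "match(es):", hits)
```

The username matches just as well as the drug name does, which is how a complaint about turning 49 gets counted as a side effect report.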

This is really scummy. Maybe it’s not the most devious of traps for you or me, but what about for your grandmother? What about for those people who send money to Nigerian princes? The law is usually pretty strict about who can and can’t provide medical information – so much so that it cracks down on 23andMe just for reading off the genome in a way that uneducated people might misinterpret. Yet somehow sites like EHealthMe are allowed to continue, because they just very strongly imply fake medical information instead of saying it outright.

Remember, only about 50% of people who are prescribed medication take it. Sometimes it’s personal choice or simple forgetfulness. But a lot of the time they stop because of side effects. I had a patient a few months ago who was really depressed. I started her on an antidepressant and she got much better. Then she stopped the medication cold turkey and got a lot worse again. I asked her why she’d stopped. She said her shoulder started hurting, she’d Googled whether antidepressants could cause shoulder pain, and read that they could. She couldn’t remember what site she was reading, but I bet it was EHealthMe or Treato or some of the others just like them.

One day, somebody’s going to Google “can penicillin cause cancer?”, read a report with a link to a support group for penicillin-induced-cancer survivors, stop taking antibiotics, and die. And when that happens, I hope it’s in America, so I can be sure their family will sue the company involved for more money than exists in the entire world.
