Slate Star Codex


Fear And Loathing At Effective Altruism Global 2017

San Francisco in the middle sixties was a very special time and place to be a part of. Maybe it meant something. Maybe not, in the long run – but no explanation, no mix of words or music or memories can touch that sense of knowing that you were there and alive in that corner of time and the world….There was a fantastic universal sense that whatever we were doing was right, that we were winning. And that, I think, was the handle—that sense of inevitable victory over the forces of Old and Evil.

— Hunter S. Thompson

Effective altruism is the movement devoted to finding the highest-impact ways to help other people and the world. Philosopher William MacAskill described it as “doing for the pursuit of good what the Scientific Revolution did for the pursuit of truth”. They have an annual global conference to touch base and discuss strategy. This year it was in the Palace of Fine Arts in San Francisco, and I got a chance to check it out.

The lake-fringed monumental neoclassical architecture represents ‘utilitarian distribution of limited resources’

The official conference theme was “Doing Good Together”. The official conference interaction style was “earnest”. The official conference effectiveness level was “very”. And it was impossible to walk away from some of the talks without being impressed.

Saturday afternoon there was a talk by some senior research analysts at GiveWell, which researches global development charities. They’ve evaluated dozens of organizations and moved $110 million to the most effective, mostly ones fighting malaria and parasitic infections. Next were other senior research analysts from the Open Philanthropy Project, who have done even more detailed effectiveness investigations and moved about $200 million.

The parade went on. More senior research analysts. More nine-digit sums of money. More organizations, all with names that kind of blended together. The Center for Effective Altruism. The Center For Effective Global Action. Raising For Effective Giving. Effecting Effective Effectiveness. Or maybe not, I think I was hallucinating pretty hard by the end.

I figured the speaker named “Cashdollar” was a hallucination, but she’s right there on the website

One of the breakout rooms had all-day career coaching sessions with 80,000 Hours (motto: “You have 80,000 hours in your career. Make the right career choices, and you can help solve the world’s most pressing problems”). A steady stream of confused altruistic college students went in, chatted with a group of coaches, and came out knowing that the latest analyses show that management consulting is a useful path to build charity-leading-relevant skills, but practicing law and donating the money to charity is probably less useful than previously believed. In their inevitable effectiveness self-report, they record having convinced 188 people to change their career plans as of April 2015.

(I had been avoiding the 80,000 Hours people out of embarrassment after their career analyses discovered that being a doctor was low-impact, but by bad luck I ended up sharing a ride home with one of them. I sheepishly introduced myself as a doctor, and he said “Oh, so am I!” I felt relieved until he added that he had stopped practicing medicine after he learned how low-impact it was, and gone to work for 80,000 Hours instead.)

The theater hosted a “fireside chat” with Bruce Friedrich, director of the pro-vegetarian Good Food Institute. I’d heard he was a former vice-president of PETA, so I went in with some stereotypes. They were wrong. Friedrich started by admitting that realistically most people are going to keep eating meat, and that yelling at them isn’t a very effective way to help animals. His tactic was to advance research into plant-based and vat-grown meat alternatives, which he predicted would taste identical to regular meat at a fraction of the cost, and which would put all existing factory farms out of business. Afterwards a bunch of us walked to a restaurant a few blocks down the street to taste an Impossible Burger, the vanguard of this brave new meatless future.

The people behind this ad are all PETA card-carrying vegetarians. And the future belongs to them, and they know it.

The whole conference was flawlessly managed, from laser-fast registration to polished-sounding speakers to friendly unobtrusive reminders to use the seventeen different apps that would keep track of your conference-related affairs for you. And of course the venue, which really was amazing.

The full-size model of the Apollo 11 lander represents ‘utilitarian distribution of limited resources’

But walk a little bit outside of the perfectly-scheduled talks, or linger in the common areas a little bit after the colorfully-arranged vegetarian lunches, and you run into the shadow side of all of this, the hidden underbelly of the movement.

William MacAskill wanted a “scientific revolution in doing good”. But the Scientific Revolution progressed from “I wonder why apples fall down” to “huh, every particle is in an infinite number of places simultaneously, and also cats can be dead and alive at the same time”. The effective altruists’ revolution started with “I wonder if some charities work better than others”. But even at this early stage, it’s gotten to some pretty weird places.

I got to talk to some people from Wild Animal Suffering Research. They start with the standard EA animal rights argument – if you think animals have moral relevance, you can save zillions of them for almost no cost. A campaign for cage-free eggs, minimal in the grand scheme of things, got most major corporations to change their policies and gave two hundred million chickens an improved quality of life. But WASR points out that even this isn’t the most neglected cause. There are up to a trillion reptiles, ten quintillion insects, and maybe a sextillion zooplankton. And as nasty as factory farms are, life in the state of nature is nasty, brutish, short, and prone to having parasitic wasps paralyze you so that their larvae can eat your organs from the inside out while you are still alive. WASR researches ways we can alleviate wild animal suffering, from euthanizing elderly elephants (probably not high-impact) to using more humane insecticides (recommended as an ‘interim solution’) to neutralizing predator species in order to relieve the suffering of prey (still has some thorny issues that need to be resolved).

Wild Animal Suffering Research was nowhere near the weirdest people at Effective Altruism Global.

I got to talk to people from the Qualia Research Institute, who point out that everyone else is missing something big: the hedonic treadmill. People have a certain baseline amount of happiness. Fix their problems, and they’ll be happy for a while, then go back to baseline. The only solution is to hack consciousness directly, to figure out what exactly happiness is – unpack what we’re looking for when we describe some mental states as having higher positive valence than others – and then add that on to every other mental state directly. This isn’t quite the dreaded wireheading, the widely-feared technology that will make everyone so doped up on techno-super-heroin (or direct electrical stimulation of the brain’s pleasure centers) that they never do anything else. It’s a rewiring of the brain that creates a “perpetual but varied bliss” that “reengineers the network of transition probabilities between emotions” while retaining the capability to do economically useful work. Partly this last criterion is to prevent society from collapsing, but the ultimate goal is:

…the possibility of a full-fledged qualia economy: when people have spare resources and are interested in new states of consciousness, anyone good at mining the state-space for precious gems will have an economic advantage. In principle the whole economy may eventually be entirely based on exploring the state-space of consciousness and trading information about the most valuable contents discovered doing so.

If you’re wondering whether these people’s research involves taking huge amounts of drugs – well, read their blog. My particular favorites are this essay on psychedelic cryptography, i.e. creating messages that only people on certain drugs can read, and this essay on hyperbolic geometry in DMT experiences.

The guy on the right works for MealSquares, a likely beneficiary of technology that hacks directly into people’s brains and adds artificial positive valence to unpleasant experiences.

The Qualia Research Institute was nowhere near the weirdest people at Effective Altruism Global.

I got to talk to some people from the Foundational Research Institute. They think that suffering is much more bad than happiness is good. And the universe is really really big. So if suffering made up an important part of the structure of the universe, this would be so tremendously outrageously unconscionably bad that we can’t even conceive of how bad it could be. So the most important cause might be to worry about whether fundamental physical particles are capable of suffering – and, if so, how to destroy physics. From their writeup:

Speculative scenarios to change the long-run future of physics may dominate any concrete work to affect the welfare of intelligent computations — at least within the fraction of our brain’s moral parliament that cares about fundamental physics. The main value (or disvalue) of intelligence would be to explore physics further and seek out tricks by which its long-term character could be transformed. For instance, if false-vacuum decay did look beneficial with respect to reducing suffering in physics, civilization could wait until its lifetime was almost over anyway (letting those who want to create lots of happy and meaningful intelligent beings run their eudaimonic computations) and then try to ignite a false-vacuum decay for the benefit of the remainder of the universe (assuming this wouldn’t impinge on distant aliens whose time wasn’t yet up). Triggering such a decay might require extremely high-energy collisions — presumably more than a million times those found in current particle accelerators — but it might be possible. On the other hand, such decay may happen on its own within billions of years, suggesting little benefit to starting early relative to the cosmic scales at stake. In any case, I’m not suggesting vacuum decay as the solution — just that there may be many opportunities like it waiting to be found, and that these possibilities may dwarf anything else that happens with intelligent life.


This talk was called ‘Christians In Effective Altruism’. It recommended reaching out to churches, because deep down the EA movement and people of faith share the same core charitable values and beliefs.

The thing is, Lovecraft was right. He wrote:

We live on a placid island of ignorance in the midst of black seas of infinity, and it was not meant that we should voyage far. The sciences, each straining in its own direction, have hitherto harmed us little; but some day the piecing together of dissociated knowledge will open up such terrifying vistas of reality, and of our frightful position therein, that we shall either go mad from the revelation or flee from the deadly light into the peace and safety of a new dark age.

Morality wasn’t supposed to be like this. Most of the effective altruists I met were nonrealist utilitarians. They don’t believe in some objective moral law imposed by an outside Power. They just think that we should pursue our own human-parochial moral values effectively. If there was ever a recipe for a safe and milquetoast ethical system, that should be it. And yet once you start thinking about what morality is – really thinking, the kind where you try to use mathematical models and formal logic – it opens up into these dark eldritch vistas of infinities and contradictions. The effective altruists started out wanting to do good. And they did, whole nine-digit sums’ worth of good, spreadsheets full of lives saved and diseases cured and disasters averted. But if you really want to understand what you’re doing – get past the point where you can catch falling apples, to the point where you have a complete theory of gravitation – you end up with something as remote from normal human tenderheartedness as the conference lunches were from normal human food.

Born too late to eat meat guilt-free, born too early to get the technology that hacks directly into my brain and adds artificial positive valence to unpleasant experiences.

But I worry I’m painting a misleading picture here. It isn’t that effective altruism is divided into two types of people: the boring effective suits, and the wacky explorers of bizarre ethical theories. I mean, there’s always going to be some division. But by and large these were the same people, or at least you couldn’t predict who was who. They would go up and give a talk about curing river blindness in Nigeria, and then you’d catch them later and learn that they were worried that maybe the most effective thing was preventing synthetic biology from taking over the ecosystem. Or you would hear someone give their screed, think “what a weirdo”, and then learn they were a Harvard professor who served on a bunch of Fortune 500 company boards.

The movement’s unofficial leader is William MacAskill. He’s a pretty typical overachiever – became an Oxford philosophy professor at age 28 (!), founded three successful non-profits, and goes around hobnobbing with rich people trying to get them to donate money (he himself has pledged to give away everything he earns above $36,000). I had always assumed he was just a random dignified suit-wearing person who was slightly exasperated at having to put up with the rest of the movement. But I got a chance to talk to him – just for a few minutes, before he had to run off and achieve something – and I was shocked at how much he knew about all the weirdest aspects of the community, and how protective he felt of them. And in his closing speech, he urged the attendees to “keep EA weird”, giving examples of times when seemingly bizarre ideas won out and became accepted by the mainstream.

His PowerPoint slide for this topic was this picture of Eliezer Yudkowsky. Seriously. I’m not joking about this part.

I don’t want to exaggerate this. Maybe the right analogy would be physics. A lot of physicists work on practical things like solar panels and rechargeable batteries. A tiny minority work on stranger things like wormholes and alternate universes. But it’s not like these are two different factions in physics that hate each other. And every so often a solar panel engineer might look into the math behind alternate universes, or a wormhole theorist might have opinions on battery design. They’re doing really different stuff, but it’s within the same tradition.

If it were just the senior research analysts at their spreadsheets, we could dismiss them as the usual Ivy League lizard people and move on. If it were just the fringes ranting about cyber-neuro-metaphilosophy, we could dismiss them as loonies and forget about it. And if it were just the two groups, separate and doing their own thing, we could end National Geographic-style, intoning in our best David Attenborough voice that “Effective Altruism truly is a land of contrasts”. But it’s more than that. Some animating spirit gives rise to the whole thing, some unifying aesthetic that can switch to either pole and back again on a whim. After a lot of thought, I have only one guess about what it might be.

I think the effective altruists are genuinely good people.

Over lunch, a friend told me about an EA philosopher who hadn’t been able to make it to the conference. This friend had met the philosopher once before, and as they were walking, the philosopher had stopped to pick up worms writhing on the sidewalk and put them back in the moist dirt.

And this story struck me, because I had taken a walk with one of the speakers earlier, and seen her do the same thing. She had been apologetic, said she knew it was a waste of her time and mine. She’d wondered if it was pathological, whether maybe she needed to be checked for obsessive compulsive disorder. But when I asked her whether she wanted to stop doing it, she’d thought about it a little, and then – finally – saved the worm.

And there was a story about the late great moral philosopher Derek Parfit, himself a member of the effective altruist movement. This is from Larissa MacFarquhar:

As for his various eccentricities, I don’t think they add anything to an understanding of his philosophy, but I find him very moving as a person. When I was interviewing him for the first time, for instance, we were in the middle of a conversation and suddenly he burst into tears. It was completely unexpected, because we were not talking about anything emotional or personal, as I would define those things. I was quite startled, and as he cried I sat there rewinding our conversation in my head, trying to figure out what had upset him. Later, I asked him about it. It turned out that what had made him cry was the idea of suffering. We had been talking about suffering in the abstract. I found that very striking.

Now, I don’t think any professional philosopher is going to make this mistake, but nonprofessionals might think that utilitarianism, for instance (Parfit is a utilitarian), or certain other philosophical ways of thinking about morality, are quite unemotional, quite calculating, quite cold; and so because I am writing mostly for nonphilosophers, it seemed like a good corrective to know that for someone like Parfit these issues are extremely emotional, even in the abstract.

The weird thing was that the same thing happened again with a philosophy graduate student whom I was interviewing some months later. Now you’re going to start thinking it’s me, but I was interviewing a philosophy graduate student who, like Parfit, had a very unemotional demeanor; we started talking about suffering in the abstract, and he burst into tears. I don’t quite know what to make of all this but I do think that insofar as one is interested in the relationship of ideas to people who think about them, and not just in the ideas themselves, those small events are moving and important.

I imagine some of those effective altruists, picking up worms, and I can see them here too. I can see them sitting down and crying at the idea of suffering, at allowing it to exist.

Larissa MacFarquhar says she doesn’t know what to make of this. I think I sort of do. I’m not much of an effective altruist – at least, I’ve managed to evade the 80,000 Hours coaches long enough to stay in medicine. But every so often, I can see the world as they have to. Where the very existence of suffering, any suffering at all, is an immense cosmic wrongness, an intolerable gash in the world, distressing and enraging. Where a single human lifetime seems frighteningly inadequate compared to the magnitude of the problem. Where all the normal interpersonal squabbles look trivial in the face of a colossal war against suffering itself, one that requires a soldier’s discipline and a general’s eye for strategy.

All of these Effecting Effective Effectiveness people don’t obsess over efficiency out of bloodlessness; they obsess because the struggle is so desperate, and the resources so few. Their efficiency is military efficiency. Their cooperation is military discipline. Their unity is the unity of people facing a common enemy. And they are winning. Very slowly, WWI trench-warfare-style. But they really are.

Sources and commentary here

And I write this partly because…well, it hasn’t been a great couple of weeks. The culture wars are reaching a fever pitch, protesters are getting run over by neo-Nazis, North Korea is threatening nuclear catastrophe. The world is a shitshow, nobody’s going to argue with that – and the people who are supposed to be leading us and telling us what to do are just about the shittiest of all.

And this is usually a pretty cynical blog. I’m cynical about academia and I’m cynical about medicine and goodness knows I’m cynical about politics. But Byron wrote:

I have not loved the world, nor the world me
But let us part fair foes; I do believe,
Though I have found them not, that there may be
Words which are things,—hopes which will not deceive,
And virtues which are merciful, nor weave
Snares for the failing: I would also deem
O’er others’ griefs that some sincerely grieve;
That two, or one, are almost what they seem,
That goodness is no name, and happiness no dream.

This seems like a good time to remember that there are some really good people. And who knows? Maybe they’ll win.

And one more story.

I got in a chat with one of the volunteers running the conference, and told him pretty much what I’ve said here: the effective altruists seemed like great people, and I felt kind of guilty for not doing more.

He responded with the official party line, the one I’ve so egregiously failed to push in this blog post. That effective altruism is a movement of ordinary people. That its yoke is mild and it accepts everyone. That not everyone has to be a vegan or a career researcher. That a commitment could be something more like just giving a couple of dollars to an effective-seeming charity, or taking the Giving What We Can pledge, or signing up for the online newsletter, or just going to a local effective altruism meetup group and contributing to discussions.

And I said yeah, but still, everyone here seems so committed to being a good person – and then here’s me, constantly looking over my shoulder to stay one step ahead of the 80,000 Hours coaching team, so I can stay in my low-impact career that I happen to like.

And he said – no, absolutely, stay in your career right now. In fact, his philosophy was that you should do exactly what you feel like all the time, and not worry about altruism at all, because eventually you’ll work through your own problems, and figure yourself out, and then you’ll just naturally become an effective altruist.

And I tried to convince him that no, people weren’t actually like that, practically nobody was like that, maybe he was like that but if so he might be the only person like that in the entire world. That there were billions of humans who just started selfish, and stayed selfish, and never declared total war against suffering itself at all.

And he didn’t believe me, and we argued about it for ten minutes, and then we had to stop because we were missing the “Developing Intuition For The Importance Of Causes” workshop.

Rationality means believing what is true, not what makes you feel good. But the world has been really shitty this week, so I am going to give myself a one-time exemption. I am going to believe that convention volunteer’s theory of humanity. Credo quia absurdum; certum est, quia impossibile. Everyone everywhere is just working through their problems. Once we figure ourselves out, we’ll all become bodhisattvas and/or senior research analysts.

OT82: Threado Quia Absurdum

This is the bi-weekly visible open thread. Post about anything you want, ask random questions, whatever. You can also talk at the SSC subreddit, the SSC Discord server. Also:

1. Comments of the week: CatCube on how organizations change over time, Douglas Knight’s update on self-driving car progress, Tibor on gun laws in the Czech Republic. And Brad explains why comments are closed on some posts here better than I could.

2. I’m off social media for the time being to avoid Discourse. If you need to contact me, try email – on a related note, sorry for being terrible about responding to emails.

3. I’ll be at the Effective Altruism Global conference today. Come say hi. If nothing else, I’ll be at the Rationalist Tumblr Meetup (at least briefly) and Katja Grace’s 5:50 talk on AI.

4. Does anyone have strong feelings about who would make a good SSC moderator? Does anyone actually read all the comments here well enough to moderate them?


Brief Cautionary Notes On Branded Combination Nootropics


Taking nootropics is an inherently questionable decision. The risk isn’t zero, and the benefits are usually subtle at best.

On the other hand, mountain-climbing also has risks, and is so devoid of benefit that the only excuse mountaineers can come up with is “because it’s there”. So whatever. If someone wants to do either – well, it’s a free country, and we all have to amuse ourselves somehow.

But even within this context, special caution is warranted for branded combination nootropics.

I wanted to make up a caricatured fake name for these sorts of things, so I could make fun of them without pointing at any company in particular. But all of the caricatured fake names I can think of turn out to be real products. MegaMind? Real. SuperBrain? Real. UltraBrain? Real. Mr. Power Brain? Real, for some reason.

Even the ones that don’t make sense are real. NeuroBrain? Real, even though one hopes that brains are always at least a little neuro. NeuroMind? Real, with its own Indiegogo campaign. The only thing I haven’t been able to find is a nootropic called BrainMind, but it’s only a matter of time.

These usually combine ten or twenty different chemicals with potential nootropic properties, then make outrageous claims about the results. For example, Neuroxium says on its ridiculous webpage that:

Neuroxium is a revolutionary brain supplement formulated to give you ultimate brain power. Known in Scientific Terms as a “NOOTROPIC” or “GENIUS PILL” Neuroxium improves mental functions such as cognition, memory, intelligence, motivation, attention, concentration and therefore happiness and success.

Your first warning sign should have been when they said “genius pill” was a scientific term (or as they call it, Scientific Term). If you needed more warning signs, this is word-for-word the same claim made by several other nootropics like Synagen IQ, Nootrox, and Cerebral X. So either they can’t even be bothered not to plagiarize their ads, or they change their name about once a week to stay ahead of the law.

I was eventually able to find a list of the ingredients in this stuff:

DMAE (dimethylethanolamine bitartrate), GABA (γ-aminobutyric acid), caffeine anhydrous, Bacopa monnieri leaf extract, NALT (N-acetyl-L-tyrosine), centrophenoxine HCl, Alpha-GPC (α-glycerophosphocholine), agmatine sulfate, Ginkgo biloba leaf extract, pine (Pinus pinaster) bark extract, phosphatidylserine, aniracetam, CDP-choline (citicoline), sarcosine (N-methylglycine), vincamine [lesser periwinkle (Vinca minor) aerial extract], L-theanine (γ-glutamylethylamide), NADH (nicotinamide adenine dinucleotide), TAU (triacetyluridine), Noopept, adrafinil, tianeptine, piperine [black pepper (Piper nigrum) fruit extract] 445 mg.

And the weird thing is, a lot of these are decent choices. Everyone knows caffeine is a good stimulant. Adrafinil is the over-the-counter version of modafinil, an FDA-approved medication for sleep disorders; many of my patients have been very happy with it. Bacopa monnieri has been found to improve memory in so many studies I can’t even keep track of all of them. Noopept is an approved medication in Russia. Tianeptine is an approved medication in France. All of these are chemicals with at least some evidence base behind them, which are potentially good for certain people. If some nootropics user were to say they wanted to try adrafinil, or bacopa, or noopept, or any of the other stuff on that list, I would classify them with the mountain climber – doing something risky but not necessarily stupid.

But taking Neuroxium/Synagen/CerebralX is exactly as bad an idea as you would expect from the advertising copy.

For one thing, they don’t list the doses of any of these things – but they have to be getting them terribly wrong. A standard dose of adrafinil is 600 mg. A standard dose of bacopa is 300 mg. A standard dose of Alpha-GPC choline is about 600 mg. So combining standard doses of just these three ingredients means you need a 1.5 g pill. This is probably too big to swallow. The only pills I know of that get that big are those gigantic fish oil pills made of pure fish oil that everybody hates because they’re uncomfortably big. But this is just what you’d need to have three of the 22 ingredients listed in CerebralX at full doses. The pill is already unswallowably large, and you’ve only gotten a seventh of the way through the ingredient list.
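The arithmetic above is easy to check for yourself. A minimal sketch (the three standard doses are the ones just quoted; everything else on the label would only push the total higher):

```python
# Back-of-the-envelope check on combined pill mass, using the standard
# doses quoted above (in mg). These are only three of CerebralX's ~22
# listed ingredients.
standard_doses_mg = {
    "adrafinil": 600,
    "bacopa monnieri": 300,
    "alpha-GPC": 600,
}

total_mg = sum(standard_doses_mg.values())
print(f"{total_mg} mg = {total_mg / 1000} g per pill")  # 1500 mg = 1.5 g
```

Even before adding the other nineteen ingredients, the pill is already past the size of those giant fish oil capsules.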

I conclude that they’re just putting minuscule, irrelevant doses into this so they can say they’ve got exciting-sounding chemicals.

For another thing, all of these substances have unique profiles which have to be respected on their own terms. For example, lots of studies say bacopa improves memory – but only after you’ve taken it consistently for several months. If you just go “WOOO, CEREBRALX!” and swallow a bunch of pills and hope that you’ll do better on your test tomorrow, all you’re going to get are the short-term effects of bacopa – which include lethargy and amotivation.

Most sources discussing Noopept recommend starting very low – maybe as low as 5 mg – and then gradually increasing to a standard dose of 10 – 40 mg depending on how it works for you. Some people will apparently need higher doses, and some find it works best for them as high as 100 mg. Needless to say, none of this is possible if you’re taking CerebralX. You’ll take whatever dose is in the product – which they don’t tell you, and which is probably so low as to be meaningless – and stay at the same level for however long you’re taking the entire monstrosity.

Tianeptine has a short half-life and is typically dosed three times a day, unlike most of the other things on the list which are dosed once per day. CerebralX says you should take their whole abomination once a day, which means you’re getting the wrong dosing schedule of tianeptine.

GABA, taken orally, doesn’t cross the blood-brain barrier and has no effect. The only way it could possibly make a difference – and even this is debatable – is if you join it to niacin to create the N-nicotinoyl-GABA molecule, which these people did not do. As a result, their GABA will be totally inert. This is probably for the best, because most of the things on their list are stimulants, and GABA is a depressant, so it would probably all just cancel out.

Piperine is a chemical usually used to inhibit normal drug-metabolizing enzymes and enhance the effect of other substances. This is very occasionally a good idea, when you know exactly what drug you’re trying to enhance and you’re not taking anything else concurrently. But I can’t figure out which drug they’re trying to enhance the activity of here, or even whether they’re trying to enhance the activity of anything at all, or if they just heard that piperine could enhance things and thought “Okay, it’s in”. And if I were giving someone a concoction of twenty-one different random psychoactive drugs, which I was dosing wrong and giving at the wrong schedule, the last thing I would want to do is inhibit the body’s normal drug metabolism. The entire reason God gave people drug-metabolizing enzymes is because He knew, in His wisdom, that some of them were going to be idiots who would take a concoction of twenty-one different random psychoactive drugs because a website said it was, in Scientific Terms, a “GENIUS PILL”. Turning them off is a terrible idea and the only saving grace is that the dose of everything in this monstrosity is probably too small for it to do anything anyway.

Taking any of the ingredients in CerebralX on its own is a potentially risky affair. But if you study up on it and make sure to take it correctly, then maybe it’s a calculated risk, like mountain climbing. Taking everything in CerebralX together is more like trying to mountain-climb in a t-shirt and sandals. You’re not taking a calculated risk as part of a potentially interesting hobby. You’re just being an idiot.


But that’s too easy. I have a larger point here, which is that these sorts of branded combos are bad ideas even when they’re made by smart, well-intentioned people who are doing everything right.

TruBrain is undeniably in a class above CerebralX. It has a team including neuroscience PhDs. It seems to be a real company that can keep the same name for more than a week. Instead of promising a GENIUS PILL, it makes comparatively modest claims that it can help you “perform at your peak” and “stay sharp all day long”.

Correspondingly, its nootropics combo makes a lot more pharmacological sense. For one thing, it’s a packet rather than a single pill – a concession to the impossibility of combining correct doses of many substances into a single capsule. For another, it mostly limits itself to things that a sane person could conceivably, in some universe, want to dose on the schedule it recommends. And it’s only got seven ingredients, none of which counteract any of the others or turn off important metabolic systems that God created to protect you from your own stupidity. This is probably about as well-designed a branded nootropics combo as it’s possible to make.

But I would still caution people away from it. Why?

Last year, I surveyed people’s reactions to various nootropics. I got 870 responses total, slightly fewer for each individual substance. Here are the response curves for two of the substances in TruBrain – piracetam and theanine:

These are on a 0–10 scale, where I directed respondents to:

Please rate your subjective experience on a scale of 0 to 10. 0 means a substance was totally useless, or had so many side effects you couldn’t continue taking it. 1–4 means subtle effects, maybe placebo but still useful. 5–9 means strong effects, definitely not placebo. 10 means life-changing.

Some substances known to be pretty inert averaged scores of around 4. Piracetam and theanine averaged around 5, so maybe a little better than that. But the most dramatic finding was the range. Almost 20% of people rated theanine a two or lower; almost 20% rated it a nine or higher. More than a third placed it in the “probably placebo” range, but 5% found its effects “life-changing”.
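To make the point concrete, here’s a toy illustration (with made-up numbers, not the actual survey data) of how two substances can share almost the same average rating while one is inert for everyone and the other is polarizing:

```python
# Toy illustration with fabricated ratings (NOT the actual survey data):
# two rating distributions can share the same rounded mean while telling
# very different stories about who a substance works for.
inert = [5] * 80 + [6] * 20                            # boring near-placebo
polarized = [1] * 18 + [4] * 35 + [7] * 29 + [9] * 18  # theanine-like spread

def summarize(ratings):
    mean = sum(ratings) / len(ratings)
    duds = sum(r <= 2 for r in ratings) / len(ratings)  # "did nothing"
    hits = sum(r >= 9 for r in ratings) / len(ratings)  # "life-changing"
    return round(mean, 1), round(duds, 2), round(hits, 2)

print(summarize(inert))      # (5.2, 0.0, 0.0)
print(summarize(polarized))  # (5.2, 0.18, 0.18)
```

The means match to one decimal place, but the tails tell the real story: for the polarized substance, nearly a fifth of users got nothing at all and nearly a fifth rated it life-changing.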

The effect of nootropics seems to vary widely among different people. This shouldn’t be surprising: so do the effects of real drugs. Gueorguieva and Mallinckrodt do an unusually thorough job modeling differences in response to the antidepressant duloxetine, and find a clear dichotomy between responders and nonresponders. This matches psychiatric lore – some medications work on some people, other medications work on others. I particularly remember one depressed patient who had no response at all to any SSRI, but whose depression shut off almost like a lightswitch once we tried bupropion. Other people fail bupropion treatment but do well on SSRIs. Probably this has something to do with underlying differences in their condition or metabolism that we just don’t know how to identify at this point (sample simplified toy model: what we call “depression” is actually two diseases with identical symptoms, one of which responds to SSRIs and one of which responds to bupropion).

I think this is why there are no multidrug combo packs in psychiatry. Your psychiatrist never treats your depression with a pill called “MegaMood”, boasting combination doses of Prozac, Wellbutrin, Remeron, and Desyrel. For one thing, either you’re giving an insufficient dose of each drug, or you’re giving full doses of four different drugs – neither is well-tested or advisable. For another, you’re getting four times the side effect risk. For a third, if one of the four drugs gives you a side effect, you’ve got to throw out the whole combo. For a fourth, if the combo happens to work, you don’t know whether it’s only one of the four drugs working and the others are just giving you side effects and making you worse. And if it sort of works, you don’t know which of the four drugs to increase, or else you just have to increase all four at once and hope for the best.

All these considerations are even stronger with nootropics. There shouldn’t be universally effective nootropics, for the same reason there’s no chemical you can pour on your computer to double its processing speed: evolution put a lot of work into making your brain as good as possible, and it would be silly if some random molecule could make it much better. Sure, there are exceptions – I think stimulants get a pass because evolution never expected people to have to pay attention to stimuli as boring as the modern world provides us with all the time – but in general the law holds. If you find a drug does significantly help you, it’s probably because your brain is broken in some particular idiosyncratic way (cf. mutational load), the same way you can double a computer’s processing speed with duct tape if one of the processors was broken.

If everyone’s brain is broken in a different way, then not only will no drug be universally effective, but drugs with positive effects for some people are likely to have negative effects for others. If (to oversimplify) your particular brain problem is not having enough serotonin, a serotonin agonist might help you. But by the same token, if you have too much serotonin, a serotonin agonist will make your life worse. Even if you have normal serotonin, maybe the serotonin agonist will push you out of the normal range and screw things up.

Most effective psychiatric drugs hurt some people. I mean, a lot of them hurt the people they’re supposed to be used for – even the psychotic people hate antipsychotics – but once you’ve brushed those aside, there are a lot of others that help a lot of people, but make other people feel worse. There are hordes of people who feel tired on stimulants, or sleepy on caffeine, or suicidal on antidepressants, or any other crazy thing. You rarely hear about these, because usually if someone’s taking a drug and it makes them feel worse, they stop. But psychiatrists hear about it all the time. “That antidepressant you gave me just made me feel awful!” Oh, well, try a different one. “That’s it? Try a different one? Aren’t you embarrassed that your so-called antidepressant made me more depressed?” You’re pretty new to this ‘psychopharmacology’ thing, aren’t you?

Thus the tactic used by every good psychiatrist: try a patient on a drug that you think might work, make them report back to you on whether it does. If so, keep it; if not, switch.

If you take a seven-drug combo pack, you lose this opportunity for experiment. Suppose that two of the drugs make you feel +1 unit better, two others have no effect, and three of the drugs make you feel -0.5 units worse, so in the end you feel +0.5 units better. Maybe that seems good to you so you keep taking it. Now you’re taking five more drugs than you need to, including three making you actively worse, and you’re missing the chance to be a full +2 units better by just taking the drugs that are helping and not hurting.

You’re also missing the opportunity to play with the doses or the schedules of things. Maybe if you doubled the dose of one of the drugs making you +1 better, you could be +2 better, but if you double the dose of the other, you start getting side effects and the drug only breaks even. If you experiment, you can figure this out and take twice the dose of the first and the starting dose of the second, for +3 better. Taking them all as part of a combo ruins this: if you try taking twice the dose of the combo, nothing happens.
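The arithmetic of the thought experiment above can be sketched out explicitly (the effect sizes are the hypothetical ones from the text, not measurements of any real product):

```python
# Hypothetical per-drug effects from the thought experiment above,
# in arbitrary "units better" (negative = net side effects).
combo = {"A": 1.0, "B": 1.0, "C": 0.0, "D": 0.0, "E": -0.5, "F": -0.5, "G": -0.5}

# Taking the whole packet: you feel slightly better, so you keep taking it.
net_combo = sum(combo.values())
assert net_combo == 0.5

# Testing one drug at a time lets you drop everything that isn't helping...
helpers = {name: effect for name, effect in combo.items() if effect > 0}
assert sum(helpers.values()) == 2.0

# ...and then titrate the helpers individually. Suppose doubling drug A
# scales its benefit to +2, while doubling drug B only adds side effects:
best = 2.0 + 1.0  # double dose of A, starting dose of B
assert best == 3.0
```

The combo locks you in at +0.5; one-at-a-time testing gets you to +2, and individual dose titration to +3.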

(And a special word of warning: if some stimulant product combines caffeine with something else, and you feel an effect, your first theory should be that the effect is 100% caffeine – unless the “something else” is amphetamine. There are like a million products which bill themselves as “organic energy cocktails” by combining caffeine with some rare herb from Burma. People drink these and say “Oh, this high feels so much more intense than just drinking caffeine”. Yeah, that’s because it’s much more caffeine. Seriously. Check the doses on those things. I will grudgingly make an exception for some chemicals that are supposed to decrease caffeine jitters, like theanine, which might have a real effect. But the stimulation is from caffeine. Go get an espresso instead.)


But don’t drugs interact? Instead of viewing these seven drugs as seven different variables, shouldn’t we view them as coming together in a seven-color beautiful rainbow of happiness, or whatever?

Once again, I can only appeal to psychiatry, which is still unsure whether there are any useful interactions between its various super-well-studied drugs which it’s been using for decades and prescribing to millions of people. Take the CO-MED study, which combined the popular SSRI escitalopram with the popular NDRI bupropion. Since depression seems to involve abnormalities in the three major monoamine systems, and escitalopram hits one of these and bupropion hits the other two, this seems like exactly the sort of synergistic interaction we should expect to work. It doesn’t. CO-MED found that the two antidepressants together didn’t treat depression any better than either one alone, let alone produce some synergy that made them more than the sum of their parts. They did, however, have about twice as many side effects.

Other smaller studies say the opposite, so I’m not saying never try escitalopram and bupropion together. I’m saying we don’t know. These are intensely-studied drugs, the whole power of the existing system has been focused on the question of whether they synergize or antisynergize or what, and we’re still not sure.

Also from psychiatry: we know a lot less about the mechanisms of action of drugs than we like to think. Ketamine has been intensively studied for depression for a decade or so, and we only just learned last year that it probably worked on a different receptor than we thought. SSRIs might be the most carefully studied drug class of all time, and we still don’t really know exactly what’s up with them – it can’t just be serotonin; they increase serotonin within a day of ingestion, but take a month to work.

So when people take these incredibly weird substances that have barely been studied at all, where we have only the faintest clue how they work, and then say from their armchair “And therefore, drug A will enhance the effects of drug B and C” – this is more than a little arrogant. Is it all made up? I can’t say “all” with surety. But it might be.

The best-known and most-discussed interaction in nootropics is piracetam-choline. Piracetam is thought to increase the brain’s use of acetylcholine, which is synthesized from choline, so it seems plausible that the two substances would go well together. Most sites on piracetam urge you to take them together. TruBrain, which predictably is on top of this kind of stuff, combines them in its combo pack.

But there’s never been a human study showing that this helps. Examine.com, another group which is usually on top of stuff, summarizes (emphasis carried over from original):

[Choline] may augment the relatively poor memory enhancing effects of Piracetam in otherwise healthy animals, but administration of choline alongside Piracetam is not a prerequisite to its efficacy and has not been tested in humans

I surveyed a bunch of choline users, using a little gimmick. Some of the forms of choline sold these days don’t cross the blood-brain barrier and shouldn’t have an effect, so they provide a sort of placebo control for more active forms of choline. In my survey, people who took piracetam with inactive forms of choline didn’t report any worse an experience than those who took the real thing.
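A minimal sketch of that comparison, with fabricated ratings standing in for the survey responses (the specific choline forms are my examples: forms like alpha-GPC are usually claimed to reach the brain, while plain choline bitartrate often isn’t):

```python
# Fabricated ratings standing in for the survey data: piracetam taken
# with a choline form usually claimed to reach the brain (e.g. alpha-GPC)
# vs. with a form that supposedly shouldn't (plain choline bitartrate).
# The inert form acts as a rough placebo control for the claimed synergy.
with_active_choline = [6, 5, 7, 4, 6, 5, 8, 5, 6, 5]
with_inert_choline = [5, 6, 6, 5, 7, 5, 6, 4, 7, 6]

def mean(xs):
    return sum(xs) / len(xs)

# If the piracetam-choline synergy were real, the active-choline group
# should score clearly higher; a difference near zero is what I found.
diff = mean(with_active_choline) - mean(with_inert_choline)
print(diff)  # 0.0 with these made-up numbers
```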

This is the most famous and best-discussed interaction in the entire field of nootropics, and it’s on super-shaky ground. So trust me, the CerebralX people don’t have good evidence about the interactions of all twenty-one of their ridiculous substances.

I have to admit, I’m not confident in this part. Maybe psychiatry is wrong. Sometimes I wonder what would happen if we just threw five different antidepressants with five different mechanisms of action at somebody at once. Realistically, maybe this would involve some supplements: l-methylfolate, SAMe, tryptophan, turmeric, and a traditional SSRI. One day I want to try this on someone I know well enough that they’ll let me test things on them, but not so well that I’d mind losing a friend when it all blows up in my face. Until then, keep in mind that anyone who says they bet a certain combination of things will produce a synergistic interaction is engaging in the wildest sort of speculation.


One more piece of evidence. The 2016 nootropics survey asked people to rate their experiences with 35 different individual substances, plus a branded combo pack (“AlphaBrain”) of pretty good reputation. The AlphaBrain performed worse than any of the individual substances, including substances that were part of AlphaBrain!

This is of course a very weak result – it wasn’t blinded, and maybe the survey responders have the same anti-branded-combo prejudice I do. But it at least suggests knowledgeable people in the nootropics community are really uncomfortable with this stuff.

90% of the people making branded combo nootropics are lying scum. A few, like TruBrain, seem like probably decent people trying to get it right – but are you confident you can tell them apart? And if you do manage to beat the odds and get something that’s not a complete pharmacological mess, aren’t you still just going to end up with an overpriced bundle of black boxes that won’t provide you with useful information, and which, empirically, everyone hates?

If you’re interested in nootropics, consider trying one substance at a time, very carefully, using something like Examine.com to learn how to take it and what the possible side effects are. If you can, do what people like Gwern do and try it blind, mixing real pills with placebo pills over the space of a few weeks, so you can make sure it’s a real effect. If you find something that does have a real effect on you, treat that knowledge as a hard-won victory. Then, if you want to go from there, tentatively add a second chemical and test that one in the same way. Do this, and you have some small sliver of a chance of doing more good than harm, at least in the short term.
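For the blinded part, here is a minimal sketch (my own toy protocol, not any particular person’s exact method) assuming you can get identical-looking placebo capsules and some way – a helper, a numbered pill organizer – to keep the key hidden from yourself until the end:

```python
import random
import statistics

# Toy blinded self-experiment: randomize which days get the real capsule,
# keep the key hidden, record a daily 0-10 rating, then compare at the end.

def make_schedule(days=28, seed=None):
    rng = random.Random(seed)
    schedule = ["real"] * (days // 2) + ["placebo"] * (days - days // 2)
    rng.shuffle(schedule)
    return schedule  # a helper or numbered pill organizer keeps this hidden

def analyze(schedule, ratings):
    real = [r for s, r in zip(schedule, ratings) if s == "real"]
    placebo = [r for s, r in zip(schedule, ratings) if s == "placebo"]
    return statistics.mean(real) - statistics.mean(placebo)

# Toy usage with fabricated ratings; a difference near zero means the
# "effect" you felt was probably placebo.
sched = make_schedule(days=6, seed=0)
print(analyze(sched, [6, 5, 7, 5, 6, 5]))
```

The point of the seeded shuffle is just reproducibility for the sketch; in practice what matters is that you can’t tell real days from placebo days until you unblind.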

But if you’re going to order a combination of twenty different things at homeopathic doses from somebody who thinks “GENIUS PILL” is a Scientific Term – well, I hope it works, because you need it.


The Lizard People Of Alpha Draconis 1 Decided To Build An Ansible


The lizard people of Alpha Draconis 1 decided to build an ansible.

The transmitter was a colossal tower of silksteel, doorless and windowless. Inside were millions of modular silksteel cubes, each filled with beetles, a different species in every cube. Big beetles, small beetles, red beetles, blue beetles, friendly beetles, venomous beetles. There hadn’t been a million beetle species on Alpha Draconis 1 before the ansible. The lizard people had genetically engineered them, carefully, lovingly, making each one just different enough from all the others. Atop each beetle colony was a heat lamp. When the heat lamp was on, the beetles crawled up to the top of the cage, sunning themselves, basking in the glorious glow. When it turned off, they huddled together for warmth, chittering out their anger in little infrasonic groans only they could hear.

The receiver stood on 11845 Nochtli, eighty-five light years from Alpha Draconis, toward the galactic rim. It was also made of beetles, a million beetle colonies of the same million species that made up the transmitter. In each beetle colony was a pheromone dispenser. When it was on, the beetles would multiply until the whole cage was covered in them. When it was off, they would gradually die out until only a few were left.

Atop each beetle cage was a mouse cage, filled with a mix of white and grey mice. The white mice had been genetically engineered to want all levers in the “up” position, a desire beyond even food or sex in its intensity. The grey mice had been engineered to want levers in the “down” position, with equal ferocity. The lizard people had uplifted both strains to full sapience. In each of a million cages, the grey and white mice would argue whether levers should be up or down – sometimes through philosophical debate, sometimes through outright wars of extermination.

There was one lever in each mouse cage. It controlled the pheromone dispenser in the beetle cage just below.

This was all the lizard people of Alpha Draconis 1 needed to construct their ansible.

They had mastered every field of science. Physics, mathematics, astronomy, cosmology. It had been for nothing. There was no way to communicate faster-than-light. Tachyons didn’t exist. Hyperspace didn’t exist. Wormholes didn’t exist. The light speed barrier was absolute – if you limited yourself to physics, mathematics, astronomy, and cosmology.

The lizard people of Alpha Draconis 1 weren’t going to use any of those things. They were going to build their ansible out of negative average preference utilitarianism.


Utilitarianism is a moral theory claiming that an action is moral if it makes the world a better place. But what do we mean by “a better place”?

Suppose you decide (as Jeremy Bentham did) that it means increasing the total amount of happiness in the universe as much as possible – the greatest good for the greatest number. Then you run into a so-called “repugnant conclusion”. The philosophers quantify happiness into “utils”, some arbitrary small unit of happiness. Suppose your current happiness level is 100 utils. And suppose you could sacrifice one util of happiness to create another person whose total happiness is two utils: they are only 1/50th as happy as you are. This person seems quite unhappy by our standards. But crucially, their total happiness is positive; they would (weakly) prefer living to dying. Maybe we can imagine this as a very poor person in a war-torn Third World country who is (for now) not actively suicidal.

It would seem morally correct to make this sacrifice. After all, you are losing one unit of happiness to create two units, increasing the total happiness in the universe. In fact, it would seem morally correct to keep making the sacrifice as many times as you get the opportunity. The end result is that you end up with a happiness of 1 util – barely above suicidality – and also there are 99 extra barely-not-suicidal people in war-torn Third World countries.
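A quick check of the arithmetic above, if you want to watch the trap spring in slow motion:

```python
# The sacrifice loop from the paragraph above: each step costs you 1 util
# and creates a new person at 2 utils.
population = [100]           # just you, at 100 utils

for _ in range(99):
    population[0] -= 1       # your sacrifice
    population.append(2)     # one more barely-happy person

assert population[0] == 1                         # you, barely above suicidality
assert len(population) == 100                     # you plus 99 new people
assert sum(population) == 199                     # total happiness went UP
assert sum(population) / len(population) == 1.99  # average happiness cratered
```

Total utilitarianism endorses every single step, since the total rises from 100 to 199, even though everyone involved ends up barely above suicidality.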

And the same moral principles that lead you to make the sacrifice bind everyone else alike. So the end result is everyone in the world ends up with the lowest possible positive amount of happiness, plus there are billions of extra near-suicidal people in war-torn Third World countries.

This seems abstract, but in some sense it might be the choice on offer if we have to decide whether to control population growth (thus preserving enough resources to give everyone a good standard of living), or continue explosive growth so that there are many more people but not enough resources for any of them to live comfortably.

The so-called “repugnant conclusion” led many philosophers away from “total utilitarianism” to “average utilitarianism”. Here the goal is still to make the world a better place, but it gets operationalized as “increase the average happiness level per person”. The repugnant conclusion clearly fails at this, so we avoid that particular trap.

But here we fall into another ambush: wouldn’t it be morally correct to kill unhappy people? This raises average happiness very effectively!

So we make another amendment. We’re not in the business of raising happiness, per se. We’re in the business of satisfying preferences. People strongly prefer not to die, so you can’t just kill them. Killing them actively lowers the average number of satisfied preferences.

Philosopher Roger Chao combines these and other refinements of the utilitarian method into a moral theory he calls negative average preference utilitarianism, which he considers the first system of ethics to avoid all the various traps and pitfalls. It says: an act is good if it decreases the average number of frustrated preferences per person.

This doesn’t imply we should create miserable people ad nauseam until the whole world is a Third World slum. It doesn’t imply that we should kill everyone who cracks a frown. It doesn’t imply we should murder people for their organs, or never have children again, or replace everybody with identical copies of themselves, or anything like that.

It just implies faster-than-light transmission of moral information.


The ansible worked like this:

Each colony of beetles represented a bit of information. In the transmitter on Alpha Draconis 1, the sender would turn the various colonies’ heat lamps on or off, increasing or decreasing the average utility of the beetles.

In the receiver on 11845 Nochtli, the beetles would be in a constant state of half-light: warmer than the Draconis beetles if their heat lamp was turned off, but colder than them if their heat lamp was turned on. So increasing the population of a certain beetle species on 11845 Nochtli would be morally good if the heat lamp for that species on Alpha Draconis were off, but morally evil otherwise.

The philosophers among the lizard people of Alpha Draconis 1 had realized that this was true regardless of intervening distance; morality was the only force that transcended the speed of light. The question was how to detect it. Yes, a change in the heat lamps on their homeworld would instantly change the moral valence of pulling a lever on a colony 85 light-years away, but how to detect the morality of an action?

The answer was: the arc of the moral universe is long, but it bends toward justice. Over time, as the great debates of history ebb and flow, evil may not be conquered completely, but it will lessen. Our own generation isn’t perfect, but we have left behind much of the slavery, bigotry, war and torture of the past; perhaps our descendants will be wiser still. And how could this be, if not for some benevolent general rule, some principle that tomorrow must be brighter than today, and the march of moral progress slow but inevitable?

Thus the white and grey mice. They would debate, they would argue, they would even fight – but in the end, moral progress would have its way. If raising the lever and causing an increase in the beetle population was the right thing to do, then the white mice would eventually triumph; if lowering the lever and causing the beetle population to fall was right, then the victory would eventually go to the grey. All of this would be recorded by a camera watching the mouse colony, and – lo – a bit of information would have been transmitted.

The ansible of the lizard people of Alpha Draconis 1 was a flop.

They spent a century working on it: ninety years on near-light-speed starships just transporting the materials, and a decade constructing the receiver according to meticulous plans. With great fanfare, the Lizard Emperor himself sent the first message from Alpha Draconis 1. And it was a total flop.

The arc of the moral universe is long, but it bends toward justice. But nobody had ever thought to ask how long, and why. When everyone alike ought to love the good, why does it take so many years of debate and strife for virtue to triumph over wickedness? Why do war and slavery and torture persist for century after century, so that only endless grinding of the wheels of progress can do them any damage at all?

After eighty-five years of civilizational debate, the grey and white mice in each cage finally overcame their differences and agreed on the right position to put the lever, just as the mundane lightspeed version of the message from Alpha Draconis reached 11845 Nochtli’s radio telescopes. And the lizard people of Alpha Draconis 1 realized that one can be more precise than simply defining the arc of moral progress as “long”. It’s exactly as long as it needs to be to prevent faster-than-light transmission of moral information.

Fundamental physical limits are a harsh master.

Links 8/17: On The Site Of The Angels

Benjamin Lay was a four-foot-tall Quaker abolitionist who, among other unusual forms of activism, kidnapped a slaveowner’s child to give them a taste of what slaves had to go through.

ProPublica: The Myth Of Drug Expiration Dates. Most drugs (strong exception for tetracyclines) are neither dangerous nor ineffective once expired. The idea of “drug expiration dates” is just bureaucratic boilerplate. It also costs health systems billions of dollars per year. And key quote: “ProPublica has been researching why the U.S. health care system is the most expensive in the world. One answer, broadly, is waste — some of it buried in practices that the medical establishment and the rest of us take for granted. We’ve documented how hospitals often discard pricey new supplies, how nursing homes trash valuable medications after patients pass away or move out, and how drug companies create expensive combinations of cheap drugs. Experts estimate such squandering eats up about $765 billion a year — as much as a quarter of all the country’s health care spending.”

xhxhxhxh: the research on what leads to intrastate conflict and rebellion. No effect for traditional worries like income inequality or ethnic polarization, etc. Mostly just bad economy and slow growth.

Vice: Everyone Hates Neoliberals, So We Talked To Some. What do self-described neoliberals identify as the core of their philosophy? Key quote from Samuel Hammond: “We are free market globalists, and evangelists of the amazing power of trade liberalization to create wealth, eliminate disease, lift hundreds of millions of people from poverty, and end the pre-conditions for war. At the same time, we are more pragmatic and consequentialist than our utopian and deontological libertarian counterparts… We believe free markets and commercial capitalism are the tools of social justice, rather than the enemy.”

Tengrism, the religion of Genghis Khan and other steppe nomads, is making a comeback in Central Asian republics looking for a suitably nationalist alternative to Islam.

Study: “Across four samples (including a nationally representative sample), we find that stronger obsessive-compulsive symptoms are associated with more right-wing ideological preferences, particularly for social issues.” This should probably be considered in context of Haidt’s work on the Purity foundation, and the Germ Theory Of Democracy.

How Class In China Became Politically Incorrect. Key quote: “Research by the University of Sydney’s David Goodman has found that around 84% of today’s elite are direct descendants of the elite from pre-1949. This suggests that six decades of Communism may not have a dramatic impact upon the elites”. Seen on Twitter with the commentary “Darwin beats Marx every time”.

From Rationalist Tumblr: those claims that medical error is the third-leading cause of death, kills 200,000 people every year, etc? Totally exaggerated. And most people interpret it as ‘number of stupid mistakes by doctors’ when it really means more like “the number of bad health outcomes that could be prevented with perfect god-like-omniscient understanding of all patients’ health situations”.

Andrew Gelman takes on James Heckman; read the comments for some good debate around Perry-Preschool-style interventions.

2016 election margin by district by population. Make sure to spin it around to get the full 3-D effect. This is the first graph I’ve seen that manages to combine two dimensions of space plus two extra variables in a really good instantly-readable way.

72 top researchers and statisticians (SSC readers might recognize Ioannidis, Wagenmakers, Nyhan, & Vazire) sign their names to a paper recommending the threshold for statistical significance be raised from p = 0.05 to p = 0.005 to decrease false positives and improve replicability. Some pushback from other statisticians involved in the replicability movement including Timothy Bates and (preemptively) Daniel Lakens. Both groups agree that it’s a hackish solution that ignores all the important subtleties around the question, but disagree on whether having something easy is at least better than nothing.

US Court Grants Journals Millions Of Dollars In Damages From Sci-Hub. It sure would be a shame if this caused a Streisand Effect where many more people became aware of the existence of Sci-Hub, a free and easy-to-use source for almost all otherwise-paywalled scientific papers, which by the way depends on reader donations to stay online.

Related study: Sci-Hub Provides Access To Nearly All Scholarly Literature. “As of March 2017, we find that Sci-Hub’s database contains 68.9% of all 81.6 million scholarly articles, which rises to 85.2% for those published in closed access journals….we estimate that over a six-month period in 2015–2016, Sci-Hub provided access for 99.3% of valid incoming requests. Hence, the scope of this resource suggests the subscription publishing model is becoming unsustainable.”

The Intercept: US Lawmakers Seek To Criminally Outlaw Support For Boycott Campaign Against Israel vs. Volokh Conspiracy: Israel Anti-Boycott Act Does Not Violate Free Speech. Some people on Rationalist Tumblr explained this to me: the bill says that Americans can’t join foreign anti-Israel boycotts, but doesn’t prevent them from starting their own, including ones that are exactly like the foreign ones and can’t be distinguished from them in any way. The bill’s proponents say that the only thing it does is prevent foreign countries from demanding American companies boycott Israel as a precondition to doing business there. I think the opposing argument is mostly that laws often get overapplied, and this one seems more overapplicable than most.

Machine Learning Applied To Initial Romantic Attraction: “Crucially [machine learning techniques] were unable to predict relationship variance using any combination of traits and preferences reported beforehand.” See also my previous post on this topic.

Study by Amir Sariaslan and others: after adjusting for unobserved familial risk factors, no link between poverty and crime.

Edge conversation on various things with Rory Sutherland. Starts with why art prices are so much more responsive to fame than architecture prices (a Picasso might cost a thousand times more than a lesser painter’s work; a Frank Lloyd Wright house costs 1-3% more than a house built by a nobody) and only gets better from there.

Hypermagical Ultraomnipotence: Why the tradeoffs constraining human cognition do not limit artificial superintelligences.

I was really excited about an upcoming depression treatment called NSI-189 that seemed to do everything right and had the potential to revolutionize the field. Well, it just failed its clinical trial.

First genetically-engineered human embryos in the US. Found it was possible to safely correct a defective gene without damaging the rest of the genome (and here’s the paper). The embryos were destroyed and not carried to term.

Freddie deBoer: Bernie Sanders Is A Socialist In Name Only. I really like this piece, and I was going to write it if nobody else did. Most of the policies being mooted by the supposedly socialist left today – Medicare-for-all, better social safety nets, et cetera – are well within the bounds of neoliberalism – ie private property and capitalist economies should exist, but the state should help poor people. “Socialism” should be reserved for systems that end private property and nationalize practically everything. I’m worried that people will use the success of neoliberal systems in eg Sweden to justify socialism, and then, socialism having been justified, promote actual-dictionary-definition socialism. To a first approximation, Sweden is an example of capitalists proving socialism isn’t necessary; Maoist China is an example of socialism actually happening.

Did you know: the first recorded evidence of Sanskrit comes from Syria, not India.

American Runners Are Getting Slower. Definitely see the r/slatestarcodex comment thread. A good example of ruling out a lot of possible confounding factors for a seemingly bizarre result – but I find the argument that the best athletes are moving into other sports more convincing than the article’s own nutritional theory.

Retailer apologizes after accidentally selling product saying “MY FAVORITE COLOR IS HITLER”.

Remember how everyone thought that, if we legalized euthanasia, it would be used as a tool to kill marginalized and oppressed people who couldn’t say no to it? Data after a year of California’s right-to-die law finds it’s disproportionately used by college-educated white men and concludes that Death Is A Social Privilege.

What jobs have the highest and lowest divorce rates (conditional on being married in the first place)? Key finding: everything math- and computer-related has much lower divorce rates than everything else.

Widespread Signatures Of Positive Selection In Common Risk Alleles Associated With Autism Spectrum Disorder. This is pretty complicated, but I think what it’s saying is that in general, having autism risk genes increases your intelligence up until the point when you actually have autism, at which point you become vulnerable to all of the normal autism-related cognitive deficits. But this is probably very heterogeneous across risk genes and other risk factors.

Israel working to shut down Al-Jazeera out of concerns about “encouraging terrorism”; pretty good example of how anything less than free-speech-absolutism can be circumvented by a sufficiently urgent-sounding plea. [EDIT: But see here]

Facebook shuts down an experimental language AI project, and the media goes crazy. Everyone on every side of the AI risk debate, from Eliezer Yudkowsky to Yann LeCun, wants to make it clear they think this is stupid and it has nothing to do with the position of any reasonable person.

An academic study into horseshoe theory? Authoritarianism and Affective Polarization: A New View on the Origins of Partisan Extremism finds that “strong Republicans and Democrats are psychologically similar, at least with respect to authoritarianism…these findings support a view of mass polarization as nonsubstantive and group-centric, not driven by competing ideological values or clashing psychological worldviews.” Okay, but you still need some explanation of how people choose which group to be in, right?

Single Dose Testosterone Administration Impairs Cognitive Reflection In Men. Note that “single dose testosterone” is very different from “having lots of testosterone chronically”, “being fetally exposed to testosterone”, “being genetically male”, and five million other things it would be easy to confuse this with.

The Hyderabad office of India’s Department of Fisheries.

British Medical Journal Global Health: new data available after the US invasion of Iraq conclusively determines that the claim that US sanctions starved thousands of Iraqi children was a lie deliberately spread by Saddam Hussein.

Congress passes “right to try” bill allowing terminally ill people to access not-yet-FDA-approved medications. Someone in the comments noted that there’s already a procedure for terminally ill individuals to appeal to the FDA to do this, and FDA approves 99% of such requests already. So not only is this mostly a symbolic victory, but one worries that the 1% of requests that aren’t approved might be pretty bad ideas. [EDIT: But see here]

j9461701 on the subreddit posts about the extreme male brain theory of autism, finding it mostly unconvincing. I mostly agree, though it’s important to remember that hormone differences can have varying and seemingly paradoxical effects depending on what level of the various metabolic processes they come in at.

In response to my question about why prediction markets aren’t used more, Daniel Reeves links me to a study of his offering a pretty simple response: yeah, they’re better than other things, but not much better, and they’re a lot more annoying to use.

Paper on empathy (via Rolf Degen): people born with a condition that makes them unable to feel pain feel like other people are just weaklings who exaggerate their problems. Classify under “metaphors for life”.


Contra Grant On Exaggerated Differences


An article by Adam Grant called Differences Between Men And Women Are Vastly Exaggerated is going viral, thanks in part to a share by Facebook exec Sheryl Sandberg. It’s a response to an email by a Google employee saying that he thought Google’s low female representation wasn’t a result of sexism, but a result of men and women having different interests long before either gender thinks about joining Google. Grant says that gender differences are small and irrelevant to the current issue. I disagree.

Grant writes:

It’s always precarious to make claims about how one half of the population differs from the other half—especially on something as complicated as technical skills and interests. But I think it’s a travesty when discussions about data devolve into name-calling and threats. As a social scientist, I prefer to look at the evidence.

The gold standard is a meta-analysis: a study of studies, correcting for biases in particular samples and measures. Here’s what meta-analyses tell us about gender differences:

When it comes to abilities, attitudes, and actions, sex differences are few and small.

Across 128 domains of the mind and behavior, “78% of gender differences are small or close to zero.” A recent addition to that list is leadership, where men feel more confident but women are rated as more competent.

There are only a handful of areas with large sex differences: men are physically stronger and more physically aggressive, masturbate more, and are more positive on casual sex. So you can make a case for having more men than women… if you’re fielding a sports team or collecting semen.

The meta-analysis Grant cites is Hyde’s, available here. I’ve looked into it before, and I don’t think it shows what he wants it to show.

Suppose I wanted to convince you that men and women had physically identical bodies. I run studies on things like number of arms, number of kidneys, size of the pancreas, caliber of the aorta, whether the brain is in the head or the chest, et cetera. 90% of these come back identical – in fact, the only ones that don’t are a few outliers like “breast size” or “number of penises”. I conclude that men and women are mostly physically similar. I can even make a statistic like “men and women are physically the same in 78% of traits”.

Then I go back to the person who says women have larger breasts and men are more likely to have penises, and I say “Ha, actually studies prove men and women are mostly physically identical! I sure showed you, you sexist!”

I worry that Hyde’s analysis plays the same trick. She does a wonderful job finding that men and women have minimal differences in eg “likelihood of smiling when not being observed”, “interpersonal leadership style”, et cetera. But if you ask the man on the street “Are men and women different?”, he’s likely to say something like “Yeah, men are more aggressive and women are more sensitive”. And in fact, Hyde found that men were indeed definitely more aggressive, and women indeed definitely more sensitive. But throw in a hundred other effects nobody cares about like “likelihood of smiling when not observed”, and you can report that “78% of gender differences are small or zero”.

Hyde found moderate or large gender differences in (and here I’m paraphrasing very scientific-sounding constructs into more understandable terms) aggressiveness, horniness, language abilities, mechanical abilities, visuospatial skills, tendermindness, assertiveness, comfort with body, various physical abilities, and computer skills.

Perhaps some people might think that finding moderate-to-large differences in mechanical abilities, computer skills, etc supports the idea that gender differences might play a role in gender balance in the tech industry. But because Hyde’s meta-analysis drowns all of this out with stuff about smiling-when-not-observed, Grant is able to make it sound like Hyde proves his point.

It’s actually worse than this, because Grant misreports the study findings in various ways [EDIT: Or possibly not, see here]. For example, he states that the sex differences in physical aggression and physical strength are “large”. The study very specifically says the opposite of this. Its three different numbers for physical aggression (from three different studies) are 0.4, 0.59, and 0.6, and it sets a cutoff for “large” effects at 0.66 or more.

On the other hand, Grant fails to report an effect that actually is large: mechanical reasoning ability (in the paper as Feingold 1998 DAT mechanical reasoning). There is a large gender difference on this, d = 0.76.
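For intuition about what these d values mean, here is a rough back-of-envelope sketch. Assuming equal-variance normal distributions (my assumption; the d values are the ones cited above), Cohen’s d converts into the “probability of superiority” – the chance that a randomly chosen member of the higher-scoring group outscores a randomly chosen member of the other:

```python
from math import erf, sqrt

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def superiority(d):
    """P(random draw from the higher group > random draw from the lower group).

    If X ~ N(d, 1) and Y ~ N(0, 1), then X - Y ~ N(d, 2),
    so P(X > Y) = Phi(d / sqrt(2)).
    """
    return norm_cdf(d / sqrt(2))

for d in (0.4, 0.6, 0.76, 1.18):
    print(f"d = {d}: probability of superiority ≈ {superiority(d):.2f}")
```

So d = 0.76 (mechanical reasoning) means roughly a 70% chance a random man outscores a random woman, and d = 1.18 pushes that to about 80% – large by the standards of psychology, while still leaving plenty of overlap between the groups.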

And although Hyde doesn’t look into it in her meta-analysis, other meta-analyses like this one find a large effect size (d = 1.18) for thing-oriented vs. people-oriented interest – the very claim at the center of the argument Grant is trying to rebut.

So Grant tries to argue against large thing-oriented vs. people-oriented differences by citing a meta-analysis that doesn’t look into them at all. He then misreports the findings of that meta-analysis, exaggerating effects that fit his thesis and failing to report the ones that don’t. Finally, he cites a “summary statistic” that averages the variation we’re looking for away by combining it with a bunch of noise, and claims the noise proves his point even though the variation is as big as ever.


Next, Grant claims that there are no sex differences in mathematical ability, and also that the sex differences in mathematical ability are culturally determined. I’m not really sure what he means [EDIT: He means sex differences that exist in other countries] but I agree with his first argument – at the levels we’re looking at, there’s no gender difference in math ability.

Grant says that these foreign differences in math ability exist but are due to stereotypes, and so are less noticeable in more progressive, gender-equitable nations:

Girls do as well as boys—or slightly better—in math in elementary, but boys have an edge by high school. Male advantages are more likely to exist in countries that lack gender equity in school enrollment, women in research jobs, and women in parliament—and that have stereotypes associating science with males.

Again, my research suggests no average gender difference in ability, so I can’t speak to whether these differences are caused by stereotypes or not. But I want to go back to the original question: why is there a gender difference in tech-industry-representation [in the US]? Is this also due to stereotypes and the effect of an insufficiently gender-equitable society? Do we find that “countries that lack gender equity in school enrollment” and “stereotypes associating science with males” have fewer women in tech?

No. Galpin investigated the percent of women in computer classes all around the world. Her number of 26% for the US is slightly higher than I usually hear, probably because it’s older (the percent women in computing has actually gone down over time!). The least sexist countries I can think of – Sweden, New Zealand, Canada, etc – all have somewhere around the same number (30%, 20%, and 24%, respectively). The most sexist countries do extremely well on this metric! The highest numbers on the chart are all from non-Western, non-First-World countries that do middling-to-poor on the Gender Development Index: Thailand with 55%, Guyana with 54%, Malaysia with 51%, Iran with 41%, Zimbabwe with 41%, and Mexico with 39%. Needless to say, Zimbabwe is not exactly famous for its deep commitment to gender equality.

Why is this? It’s a very common and well-replicated finding that the more progressive and gender-equal a country, the larger gender differences in personality of the sort Hyde found become. I agree this is a very strange finding, but it’s definitely true. See eg Journal of Personality and Social Psychology, Sex Differences In Big Five Personality Traits Across 55 Cultures:

Previous research suggested that sex differences in personality traits are larger in prosperous, healthy, and egalitarian cultures in which women have more opportunities equal with those of men. In this article, the authors report cross-cultural findings in which this unintuitive result was replicated across samples from 55 nations (n = 17,637).

In case you’re wondering, the countries with the highest gender differences in personality are France, Netherlands, and the Czech Republic. The countries with the lowest sex differences are Indonesia, Fiji, and the Congo.

I conclude that whatever gender-equality-stereotype-related differences Grant has found in the nonexistent math ability difference between men and women, they are more than swamped by the large opposite effects in gender differences in personality. This meshes with what I’ve been saying all along: at the level we’re talking about here, it’s not about ability, it’s about interest.


We know that interests are highly malleable. Female students become significantly more interested in science careers after having a teacher who discusses the problem of underrepresentation. And at Harvey Mudd College, computer science majors were around 10% women a decade ago. Today they’re 55%.

I highly recommend Freddie deBoer’s Why Selection Bias Is The Most Powerful Force In Education. If an educational program shows amazing results, and there’s any possible way it’s selection bias – then it’s selection bias.

I looked into Harvey Mudd’s STEM admission numbers, and, sure enough, they admit women at 2.5x the rate as men. So, yeah, it’s selection bias.

I don’t blame them. All they have to do is cultivate a reputation as a place to go if you’re a woman interested in computer science, attract lots of female CS applicants, then make sure to admit all the CS-interested female applicants they get. In exchange, they get constant glowing praise from every newspaper in the country (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, etc, etc, etc).

How would we know this was selection bias if we couldn’t just look at the numbers? The graph that Grant himself cites just above this statement shows that, over the same ten year period, percent women CS graduates has declined nationwide. This has corresponded with such a massive push to get more women in tech that…well, that a college which succeeds will get constant glowing praise from every newspaper in the country even when they admit they’re using selection bias. Do you think no one else has tried? Every college diversity office in the country is working overtime to try to get more women into tech, there are women in tech scholarships, women in tech conferences, women in tech prizes – and, over the period that’s happened, Grant’s own graph shows the percent of women in tech going down.

(I don’t understand why it’s going down as opposed to steady, but my guess is a combination of constant messaging that there are no women in tech making women think it isn’t for them, plus the effect from society getting more gender-equitable that we described in Part II – ie we’re now less like Zimbabwe, and so we can’t expect our gender ratios to be as good as theirs are).

Look. If I recruit only gingers, and I admit only gingers, I can get a 100% ginger CS program. That doesn’t mean I’ve proven that gingers are really more interested in CS than everyone else, and it was just discrimination holding them back. It means I’ve done what every single private school and college does anyway, all the time – finagle with admissions to make myself look good.

[EDIT: Some further discussion by Mudd students in the comments here]


Back to Grant:

4. There are sex differences in interests, but they’re not biologically determined.

The data on occupational interests do reveal strong male preferences for working with things and strong female preferences for working with people. But they also reveal that men and women are equally interested in working with data.

So why are there so many more male than female engineers? Because women have systematically been discouraged from working with computers. Look at trends in college majors: since the 1980s, the proportion of female majors has gone up in science and medicine and law, but down in computer science.

Before we discuss this, a quick step back.

In the year 1850, women were locked out of almost every major field, with a few exceptions like nursing and teaching. The average man of the day would have been equally confident that women were unfit for law, unfit for medicine, unfit for mathematics, unfit for linguistics, unfit for engineering, unfit for journalism, unfit for psychology, and unfit for biology. He would have had various sexist justifications – women shouldn’t be in law because it’s too competitive and high-pressure; women shouldn’t be in medicine because they’re fragile and will faint at the sight of blood; et cetera.

As the feminist movement gradually took hold, women conquered one of these fields after another. 51% of law students are now female. So are 49.8% of medical students, 45% of math majors, 60% of linguistics majors, 60% of journalism majors, 75% of psychology majors, and 60% of biology postdocs. Yet for some reason, engineering remains only about 20% female.

And everyone says “Aha! I bet it’s because of negative stereotypes!”

This makes no sense. There were negative stereotypes about everything! Somebody has to explain why the equal and greater negative stereotypes against women in law, medicine, etc were completely powerless, yet for some reason the negative stereotypes in engineering were the ones that took hold and prevented women from succeeding there.

And if your answer is just going to be that apparently the negative stereotypes in engineering were stronger than the negative stereotypes about everything else, why would that be? Put yourself in the shoes of our Victorian sexist, trying to maintain his male privilege. He thinks to himself “Well, I suppose I could tolerate women doctors saving my life. And if I had to, I would accept women going into law and determining who goes free and who goes to jail. I’m even sort of okay with women going into journalism and crafting the narratives that shape our world. But women building bridges? NO MERE FEMALE COULD EVER DO SUCH A THING!” Really? This is the best explanation the world can come up with? Doesn’t anyone have at least a little bit of curiosity about this?

(and I don’t think it’s just coincidence – ie I don’t think it’s just that a bunch of head engineers happened to be really sexist, and a bunch of head doctors happened to be really non-sexist. The same patterns apply through pretty much every First World country, and if it were just a matter of personalities you would expect them to differ from place to place.)

Whenever I ask this question, I get something like “engineering and computer science are two of the highest-paying, highest-status jobs, so of course men would try to keep women out of them, in order to maintain their supremacy”. But I notice that doctors and lawyers are also pretty high-paying, high-status jobs, and that nothing of the sort happened there. And that when people aren’t using engineering/programming’s high status to justify their beliefs about gender stereotypes in it, they’re ruthlessly making fun of engineers and programmers, whether it’s watching Big Bang Theory or reading Dilbert or just going on about “pocket protectors”.

Meanwhile, men make up only 10% of nurses, only 20% of new veterinarians, only 25% of new psychologists, about 25% of new paediatricians, about 26% of forensic scientists, about 28% of medical managers, and 42% of new biologists.

Note that many of these imbalances are even more lopsided than the imbalance favoring men in technology, and that many of these jobs earn much more than the average programmer. For example, the average computer programmer only makes about $80,000; the average veterinarian makes about $88,000, and the average pediatrician makes a whopping $170,000.

As long as you’re comparing some poor woman janitor to a male programmer making $80,000, you can talk about how it’s clearly sexism against women getting the good jobs. But once you take off the blinders and try to look at an even slightly bigger picture, you start wondering why veterinarians, who make even more money than that, are even more lopsidedly female than programmers are male. And then you start thinking that maybe you need some framework more sophisticated than the simple sexism theory in order to predict who’s doing all of these different jobs. And once you have that framework, maybe the sexism theory isn’t necessary any longer, and you can throw it out, and use the same theory to predict why women dominate veterinary medicine and psychology, why men dominate engineering and computer science, and why none of this has any relation at all to what fields some sexist in the 1850s wanted to keep women out of.

So let’s look deeper into what prevents women from entering these STEM fields.

Does it happen at the college level? About 20% of high school students taking AP Computer Science are women. The ratio of women graduating from college with computer science degrees is exactly what you would expect from the ratio of women who showed interest in it in high school (the numbers are even lower in Britain, where 8% of high school computer students are girls). So differences exist before the college level, and nothing that happens at the college level – no discriminatory professors, no sexist classmates – changes the numbers at all.

Does it happen at the high school level? There’s not a lot of obvious room for discrimination – AP classes are voluntary; students who want to go into them do, and students who don’t want to go into them don’t. There are no prerequisites except basic mathematical competency or other open-access courses. It seems like of the people who voluntarily choose to take AP classes that nobody can stop them from going into, 80% are men and 20% are women, which exactly matches the ratio of each gender that eventually get tech company jobs.

Rather than go through every step individually, I’ll skip to the punch and point out that the same pattern repeats in middle school, elementary school, and about as young as anybody has ever bothered checking. So something produces these differences very early on. What might that be?

Might young women be avoiding computers because they’ve absorbed stereotypes telling them that they’re not smart enough, or that they’re “only for boys”? No. As per Shashaani 1997, “[undergraduate] females strongly agreed with the statement ‘females have as much ability as males when learning to use computers’, and strongly disagreed with the statement ‘studying about computers is more important for men than for women’”. On a scale of 1-5, where 5 represents complete certainty in gender equality in computer skills and 1 complete certainty in inequality, the average woman chooses 4.2; the average man 4.03. This seems to have been true since the very beginning of the age of personal computers: Smith 1986 finds that “there were no significant differences between males and females in their attitudes of efficacy or sense of confidence in ability to use the computer, contrary to expectation…females [showed] stronger beliefs in equity of ability and competencies in use of the computer.” This is a very consistent result and you can find other studies corroborating it in the bibliographies of both papers.

Might girls be worried not by stereotypes about computers themselves, but by stereotypes that girls are bad at math and so can’t succeed in the math-heavy world of computer science? No. About 45% of college math majors are women, compared to (again) only 20% of computer science majors. Undergraduate mathematics itself more-or-less shows gender parity. This can’t be an explanation for the computer results.

Might sexist parents be buying computers for their sons but not their daughters, giving boys a leg up in learning computer skills? In the 80s and 90s, everybody was certain that this was the cause of the gap. Newspapers would tell lurid (and entirely hypothetical) stories of girls sitting down to use a computer when suddenly a boy would show up, push her away, and demand it all for himself. But move forward a few decades and now young girls are more likely to own computers than young boys – with little change in the high school computer interest numbers. So that isn’t it either.

So if it happens before middle school, and it’s not stereotypes, what might it be?

One subgroup of women does not display these gender differences at any age. These are women with congenital adrenal hyperplasia, a condition that gives them a more typically-male hormone balance. For a good review, see Gendered Occupational Interests: Prenatal Androgen Effects on Psychological Orientation to Things Versus People. They find that:

Consistent with hormone effects on interests, females with CAH are considerably more interested than are females without CAH in male-typed toys, leisure activities, and occupations, from childhood through adulthood (reviewed in Blakemore et al., 2009; Cohen-Bendahan et al., 2005); adult females with CAH also engage more in male-typed occupations than do females without CAH (Frisén et al., 2009). Male-typed interests of females with CAH are associated with degree of androgen exposure, which can be inferred from genotype or disease characteristics (Berenbaum et al., 2000; Meyer-Bahlburg et al., 2006; Nordenström et al., 2002). Interests of males with CAH are similar to those of males without CAH because both are exposed to high (sex-typical) prenatal androgens and are reared as boys.

Females with CAH do not provide a perfect test of androgen effects on gendered characteristics because they differ from females without CAH in other ways; most notably they have masculinized genitalia that might affect their socialization. But, there is no evidence that parents treat girls with CAH in a more masculine or less feminine way than they treat girls without CAH (Nordenström et al., 2002; Pasterski et al., 2005). Further, some findings from females with CAH have been confirmed in typical individuals whose postnatal behavior has been associated with prenatal hormone levels measured in amniotic fluid. Amniotic testosterone levels were found to correlate positively with parent-reported male-typed play in girls and boys at ages 6 to 10 years (Auyeung et al., 2009).

The psychological mechanism through which androgen affects interests has not been well-investigated, but there is some consensus that sex differences in interests reflect an orientation toward people versus things (Diekman et al., 2010) or similar constructs, such as organic versus inorganic objects (Benbow et al., 2000). The Things-People distinction is, in fact, the major conceptual dimension underlying the measurement of the most widely-used model of occupational interests (Holland, 1973; Prediger, 1982); it has also been used to represent leisure interests (Kerby and Ragan, 2002) and personality (Lippa, 1998).

In their own study, they compare 125 such women and find a Things-People effect size of -0.75 – that is, the difference between CAH women and unaffected women is more than half the difference between men and unaffected women. They write:

The results support the hypothesis that sex differences in occupational interests are due, in part, to prenatal androgen influences on differential orientation to objects versus people. Compared to unaffected females, females with CAH reported more interest in occupations related to Things versus People, and relative positioning on this interest dimension was substantially related to amount of prenatal androgen exposure.

What is this “object vs. people” distinction?

It’s pretty relevant. Meta-analyses have shown a very large (d = 1.18) difference in healthy men and women (ie without CAH) in this domain. It’s traditionally summarized as “men are more interested in things and women are more interested in people”. I would flesh out “things” to include both physical objects like machines as well as complex abstract systems; I’d also add in another finding from those same studies that men are more risk-taking and like danger. And I would flesh out “people” to include communities, talking, helping, children, and animals.

So this theory predicts that men will be more likely to choose jobs with objects, machines, systems, and danger; women will be more likely to choose jobs with people, talking, helping, children, and animals.

Somebody armed with this theory could pretty well predict that women would be interested in going into medicine and law, since both of them involve people, talking, and helping. They would predict that women would dominate veterinary medicine (animals, helping), psychology (people, talking, helping, sometimes children), and education (people, children, helping). Of all the hard sciences, they might expect women to prefer biology (animals). And they might expect men to do best in engineering (objects, machines, abstract systems, sometimes danger) and computer science (machines, abstract systems).

I mentioned that about 50% of medical students were female, but this masks a lot of variation. There are wide differences in doctor gender by medical specialty. For example:

A privilege-based theory fails – there’s not much of a tendency for women to be restricted to less prestigious and lower-paying fields – Ob/Gyn (mostly female) is extremely lucrative, and internal medicine (mostly male) is pretty low-paying for a medical job.

But the people/thing theory above does extremely well! Pediatrics is babies/children, Psychiatry is people/talking (and of course women are disproportionately child psychiatrists), OB/GYN is babies (though admittedly this probably owes a lot to patients being more comfortable with female gynecologists) and family medicine is people/talking/babies/children.

Meanwhile, Radiology is machines and no patient contact, Anaesthesiology is also machines and no patient contact, Emergency Medicine is danger, and Surgery is machines, danger, and no patient contact.

Here’s another fun thing you can do with this theory: understand why women are so well represented in college math classes. Women are around 20% of CS majors, physics majors, engineering majors, etc – but almost half of math majors! This should be shocking. Aren’t we constantly told that women are bombarded with stereotypes about math being for men? Isn’t the archetypal example of children learning gender roles that Barbie doll that said “Math is hard, let’s go shopping”? And yet women’s representation in undergraduate math classes is really quite good.

I was totally confused by this for a while until a commenter directed me to the data on what people actually do with math degrees. The answer is mostly: they become math teachers. They work in elementary schools and high schools, with people.

Then all those future math teachers leave for the schools after undergrad, and so math grad school ends up with pretty much the same male-tilted gender balance as CS, physics, and engineering grad school.

This seems to me like the clearest proof that women being underrepresented in CS/physics/etc is just about different interests. It’s not that they can’t do the work – all those future math teachers do just as well in their math majors as everyone else. It’s not that stereotypes of what girls can and can’t do are making them afraid to try – whatever stereotypes there are about women and math haven’t dulled future math teachers’ willingness to complete difficult math courses one bit. And it’s not even about colleges being discriminatory and hostile (or at least however discriminatory and hostile they are it doesn’t drive away those future math teachers). It’s just that women are more interested in some jobs, and men are more interested in others. Figure out a way to make math people-oriented, and women flock to it. If there were as many elementary school computer science teachers as there are math teachers, gender balance there would equalize without any other effort.

I’m not familiar with any gender breakdown of legal specialties, but I will bet you that family law, child-related law, and various prosocial helping-communities law are disproportionately female, and patent law, technology law, and law working with scary dangerous criminals are disproportionately male. And so on for most other fields.

This theory gives everyone what they want. It explains the data about women in tech. It explains the time course around women in tech. It explains other jobs like veterinary medicine where women dominate. It explains which medical subspecialties women will be dominant or underrepresented in. It doesn’t claim that women are “worse than men” or “biologically inferior” at anything. It doesn’t say that no woman will ever be interested in things, or no man ever interested in people. It doesn’t even say that women in tech don’t face a lot of extra harassment (any domain with more men than women will see more potential perpetrators concentrating their harassment on fewer potential victims, which will result in each woman being more harassed).

It just says that sometimes, in a population-based way that doesn’t necessarily apply to any given woman or any given man, women and men will have some different interests. Which should be pretty obvious to anyone who’s spent more than a few minutes with men or women.


Why am I writing this?

Grant’s piece was in response to a person at Google sending out a memo claiming some of this stuff. Here is a pretty typical response that a Googler sent to that memo – I’ve blocked the name so this person doesn’t get harassed over it, but if you doubt this is real I can direct you to the original:

A lot of people without connections to the tech industry don’t realize how bad it’s gotten. This is how bad. It would be pointless trying to do anything about this person in particular. This is the climate.

Silicon Valley was supposed to be better than this. It was supposed to be the life of the mind, where people who were interested in the mysteries of computation and cognition could get together and make the world better for everybody. Now it’s degenerated into this giant hatefest of everybody writing long screeds calling everyone else Nazis and demanding violence against them. Where if someone disagrees with the consensus, it’s just taken as a matter of course that we need to hunt them down, deny them the cloak of anonymity, fire them, and blacklist them so they can never get a job again. Where the idea that we shouldn’t be a surveillance society where we carefully watch our coworkers for signs of sexism so we can report them to the authorities is exactly the sort of thing you get reported to the authorities if people see you saying.

On the Twitter debate on this, someone mentioned that people felt afraid to share their thoughts anymore. An official, blue-checkmarked Woman In Tech activist responded with (note the 500+ likes):

This is the world we’ve built. Where making people live in fear is a feature, not a bug.

And: it can get worse. If you only read one link, let it be this one about the young adult publishing industry. A sample quote:

One author and former diversity advocate described why she no longer takes part: “I have never seen social interaction this fucked up,” she wrote in an email. “And I’ve been in prison.”

Many members of YA Book Twitter have become culture cops, monitoring their peers across multiple platforms for violations. The result is a jumble of dogpiling and dragging, subtweeting and screenshotting, vote-brigading and flagging wars, with accusations of white supremacy on one side and charges of thought-policing moral authoritarianism on the other. Representatives of both factions say they’ve received threats or had to shut down their accounts owing to harassment, and all expressed fear of being targeted by influential community members — even when they were ostensibly on the same side. “If anyone found out I was talking to you,” Mimi told me, “I would be blackballed.”

Dramatic as that sounds, it’s worth noting that my attempts to report this piece were met with intense pushback. Sinyard politely declined my request for an interview in what seemed like a routine exchange, but then announced on Twitter that our interaction had “scared” her, leading to backlash from community members who insisted that the as-yet-unwritten story would endanger her life. Rumors quickly spread that I had threatened or harassed Sinyard; several influential authors instructed their followers not to speak to me; and one librarian and member of the Newbery Award committee tweeted at Vulture nearly a dozen times accusing them of enabling “a washed-up YA author” engaged in “a personalized crusade” against the entire publishing community (disclosure: while freelance culture writing makes up the bulk of my work, I published a pair of young adult novels in 2012 and 2014.) With one exception, all my sources insisted on anonymity, citing fear of professional damage and abuse.

None of this comes as a surprise to the folks concerned by the current state of the discourse, who describe being harassed for dissenting from or even questioning the community’s dynamics. One prominent children’s-book agent told me, “None of us are willing to comment publicly for fear of being targeted and labeled racist or bigoted. But if children’s-book publishing is no longer allowed to feature an unlikable character, who grows as a person over the course of the story, then we’re going to have a pretty boring business.”

Another agent, via email, said that while being tarred as problematic may not kill an author’s career — “It’s likely made the rounds as gossip, but I don’t know it’s impacting acquisitions or agents offering representation” — the potential for reputational damage is real: “No one wants to be called a racist, or sexist, or homophobic. That stink doesn’t wash off.”

Authors seem acutely aware of that fact, and are tailoring their online presence — and in some cases, their writing itself — accordingly. One New York Times best-selling author told me, “I’m afraid. I’m afraid for my career. I’m afraid for offending people that I have no intention of offending. I just feel unsafe, to say much on Twitter. So I don’t.” She also scrapped a work in progress that featured a POC character, citing a sense shared by many publishing insiders that to write outside one’s own identity as a white author simply isn’t worth the inevitable backlash. “I was told, do not write that,” she said. “I was told, ‘Spare yourself.’

Another author recalled being instructed by her publisher to stay silent when her work was targeted, an experience that she says resulted in professional ostracization. “I never once responded or tried to defend my book,” she wrote in a Twitter DM. Her publisher “did feel I was being abused, but felt we couldn’t do anything about it.”

Parts of tech are already this bad. For the rest of you: it’s what you have to look forward to.

It doesn’t have to be this way. Nobody has any real policy disagreements. Everyone can just agree that men and women are equal, that they both have the same rights, that nobody should face harassment or discrimination. We can relax the Permanent State Of Emergency around too few women in tech, and admit that women have the right to go into whatever field they want, and that if they want to go off and be 80% of veterinarians and 74% of forensic scientists, those careers seem good too. We can appreciate the contributions of existing women in tech, make sure the door is open for any new ones who want to join, and start treating each other as human beings again. Your co-worker could just be your co-worker, not a potential Nazi to be assaulted or a potential Stalinist who’s going to rat on you. Your project manager could just be your project manager, not the person tasked with monitoring you for signs of thoughtcrime. Your female co-worker could just be your female co-worker, not a Badass Grrl Coder Who Overcomes Adversity. Your male co-worker could just be your male co-worker, not a Tool Of The Patriarchy Who Denies His Complicity In Oppression. I promise there are industries like this. Medicine is like this! Loads of things are like this! Lots of tech companies are even still like this! This could be you.

Adam Grant seems like a good person. He is superficially doing everything right. He’s not demanding people feel afraid, or saying that everyone who disagrees with him is a fascist. He’s just trying to argue the science.

But I think he’s very wrong about the science. I think Hyde’s article is a gimmick which buries very real differences under a heap of meaningless similarities. I think that it’s inappropriate to cite it to respond to claims of specific differences that it didn’t investigate. I think that claims of a gender-equitable-society-effect in a different domain are inappropriate given the clear opposite effect in the domain being talked about. I think it’s wrong to privilege likely-selection-biased evidence from a single college over all the evidence from the country as a whole. I think it’s wrong to suppose unique stereotypes in tech and engineering domains with no theory of how they got there, when there are non-stereotype-based theories that better explain the evidence. And I think it’s wrong to ignore all the studies about congenital adrenal hyperplasia.

And I think that, in being wrong about the science, he’s (probably unintentionally) giving aid and comfort to the people who have admitted that turning tech into a climate of fear and threats of violence is the end goal.

Grant is one of the few people doing the virtuous thing and trying to debate this without calling for other people’s deaths. I’m trying to do the virtuous thing and respond to him. But I worry that lots of people on Grant’s side aren’t as virtuous as he is, and I don’t know how to protect anybody from that except by begging people to please look at the science and try to get it right.

[EDIT: Prof. Grant responded to me via email; with his permission I’ve posted his response as a comment below (ie here). My thoughts on his response are in comments below that.]

Why Not More Excitement About Prediction Aggregation?

There’s a new ad on the sidebar for Metaculus, a crowd-sourced prediction engine that tries to get well-calibrated forecasts on mostly scientific topics. If you’re wondering how likely it is that a genetically-engineered baby will be born in the next few years, that SpaceX will reach Mars by 2030, or that Sci-Hub will get shut down, take a look.

(there are also the usual things about politics, plus some deliberately wacky ones like whether Elon Musk will build more kilometers of tunnel than Trump does border wall)

They’re doing good work – online for-cash prediction markets are heavily restricted by the government and usually limited to big-ticket items like who’s going to win elections. Metaculus is run by a team affiliated with the Foundational Questions Institute, and as their name implies they’re really interested in using the power of prediction aggregators to address some of the important questions for the future of humanity – like AI, genetic engineering, and pandemics.

Which makes me wonder: what’s everyone else’s excuse?

Back when it looked like prediction markets were going to change everything (was it really as recent as two months ago?), various explanations were floated for why they hadn’t caught on yet. The government was regulating betting too much. The public was creeped out by the idea of profiting off of terrorist attacks. Or maybe people were just too dumb to understand the arguments in favor.

Now there are these crowd-sourced aggregator things. They’re not regulated by the government. Nobody’s profiting off of anything. And you don’t have to have faith that they’ll work – Philip Tetlock’s superforecasting experiments proved it, and Metaculus tracks its accuracy over time. I know the intelligence services are working with the Good Judgment Project, but I’m still surprised it hasn’t gone further.

Robin Hanson is the acknowledged expert on this, thanks to his long career of trying to get institutions to adopt prediction markets and generally failing. He attributes the whole problem to signaling and prestige, which I guess makes as much sense as anything. Tyler Cowen says something similar here. But I’m still surprised there aren’t consultant superforecaster think tanks hanging up their shingles. Forget prestige – aren’t there investors who would pay for their wisdom? And why can’t I hire Philip Tetlock to tell me whether my relationship is going to last?

I asked Prof. Aguirre of Metaculus, and he said (slightly edited for flow):

I don’t think Tetlock has any “secret sauce”, though I think he did a good job. The Metaculus track record is pretty good and will continue to improve. There’s definitely real predictive power. Our main challenge is that all the personnel involved are very time-limited and we’re also operating on a shoestring, probably 1/50th of what Tetlock spent of IARPA’s money.

If you are an individual company or investor, you don’t really get that much from “crowdsourcing” because you don’t really have a crowd (unless you’re a big business and can force your employees to make the predictions); so I’d guess most companies probably just fall back on asking some group of people to get together and make projections etc. My personal view is that the power really comes when you get a *lot* of people, questions, and data; you can then start to leverage that to improve the calibration (by recalibrating based on past accuracy), identify the really good predictors, and build up a large enough corpus of results that the predicted probabilities become grounded in an actual ensemble — the ensemble of questions on the platform.

Relatedly, the ability to make use of probabilistic predictions is, sadly, confined to a rather small fraction of people. I think typical decision-makers want to know what *will* happen, and 60-40 or 75-25 or even 80-20 is not the certainty they want. In the face of this uncertainty, I think people mentally substitute a different question to which they feel that they have some sort of good instinct, or privileged understanding. I think there’s also an element where they just don’t really *believe* the numbers because they are sometimes “wrong.” This sometimes frustrates me, but is not wholly irrational, I would say, because in the absence of a real grounding of the probabilities, if you have some analyst come and tell you that there’s a 70-30 chance, what exactly do you do with that? How would you possibly trust that those are the right numbers if the question is something “squishy” without a solid mathematical model?

I wonder if there’s data about how accuracy changes with number of predictors and predictor quality. There are so many smart people around who are interested in probability and willing to answer a few questions that this seems like a really stupid bottleneck. I’m happy I can do my part pushing Metaculus, but someone seriously needs to find a way to make this scale.


Where The Falling Einstein Meets The Rising Mouse

Eliezer Yudkowsky argues that forecasters err in expecting artificial intelligence progress to look like this:

…when in fact it will probably look like this:

That is, we naturally think there’s a pretty big intellectual difference between mice and chimps, and a pretty big intellectual difference between normal people and Einstein, and implicitly treat these as about equal in degree. But in any objective terms we choose – amount of evolutionary work it took to generate the difference, number of neurons, measurable difference in brain structure, performance on various tasks, etc – the gap between mice and chimps is immense, and the difference between an average Joe and Einstein trivial in comparison. So we should be wary of timelines where AI reaches mouse level in 2020, chimp level in 2030, Joe-level in 2040, and Einstein level in 2050. If AI reaches the mouse level in 2020 and chimp level in 2030, for all we know it could reach Joe level on January 1st, 2040 and Einstein level on January 2nd of the same year. This would be pretty disorienting and (if the AI is poorly aligned) dangerous.

I found this argument really convincing when I first heard it, and I thought the data backed it up. For example, in my Superintelligence FAQ, I wrote:

In 1997, the best computer Go program in the world, Handtalk, won NT$250,000 for performing a previously impossible feat – beating an 11 year old child (with an 11-stone handicap penalizing the child and favoring the computer!) As late as September 2015, no computer had ever beaten any professional Go player in a fair game. Then in March 2016, a Go program beat 18-time world champion Lee Sedol 4-1 in a five game match. Go programs had gone from “dumber than heavily-handicapped children” to “smarter than any human in the world” in twenty years, and “from never won a professional game” to “overwhelming world champion” in six months.

But Katja Grace takes a broader perspective and finds the opposite. For example, she finds that chess programs improved gradually from “beating the worst human players” to “beating the best human players” over fifty years or so, ie the entire amount of time computers have existed:

AlphaGo represented a pretty big leap in Go ability, but before that, Go engines improved pretty gradually too (see the original AI Impacts post for discussion of the Go ranking system on the vertical axis):

There’s a lot more on Katja’s page, overall very convincing. In field after field, computers have taken decades to go from the mediocre-human level to the genius-human level. So how can one reconcile the common-sense force of Eliezer’s argument with the empirical force of Katja’s contrary data?

Theory 1: Mutational Load

Katja has her own theory:

The brains of humans are nearly identical, by comparison to the brains of other animals or to other possible brains that could exist. This might suggest that the engineering effort required to move across the human range of intelligences is quite small, compared to the engineering effort required to move from very sub-human to human-level intelligence…However, we should not be surprised to find meaningful variation in the cognitive performance regardless of the difficulty of improving the human brain. This makes it difficult to infer much from the observed variations.

Why should we not be surprised? De novo deleterious mutations are introduced into the genome with each generation, and the prevalence of such mutations is determined by the balance of mutation rates and negative selection. If de novo mutations significantly impact cognitive performance, then there must necessarily be significant selection for higher intelligence–and hence behaviorally relevant differences in intelligence. This balance is determined entirely by the mutation rate, the strength of selection for intelligence, and the negative impact of the average mutation.

You can often make a machine worse by breaking a random piece, but this does not mean that the machine was easy to design or that you can make the machine better by adding a random piece. Similarly, levels of variation of cognitive performance in humans may tell us very little about the difficulty of making a human-level intelligence smarter.

I’m usually a fan of using mutational load to explain stuff. But here I worry there’s too much left unexplained. Sure, the explanation for variation in human intelligence is whatever it is. And there’s however much mutational load there is. But that doesn’t address the fundamental disparity: isn’t the difference between a mouse and Joe Average still immeasurably greater than the difference between Joe Average and Einstein?

Theory 2: Purpose-Built Hardware

Mice can’t play chess (citation needed). So talking about “playing chess at the mouse level” might require more philosophical groundwork than we’ve been giving it so far.

Might the worst human chess players play chess pretty close to as badly as is even possible? I’ve certainly seen people who don’t even seem to be looking one move ahead very well, which is sort of like an upper bound for chess badness. Even though the human brain is the most complex object in the known universe, noble in reason, infinite in faculties, like an angel in apprehension, etc, etc, it seems like maybe not 100% of that capacity is being used in a guy who gets fool’s-mated on his second move.

We can compare to human prowess at mental arithmetic. We know that, below the hood, the brain is solving really complicated differential equations in milliseconds every time it catches a ball. Above the hood, most people can’t multiply two two-digit numbers in their head. Likewise, in principle the brain has 2.5 petabytes worth of memory storage; in practice I can’t always remember my sixteen-digit credit card number.

Imagine a kid who has an amazing $5000 gaming computer, but her parents have locked it so she can only play Minecraft. She needs a calculator for her homework, but she can’t access the one on her computer, so she builds one out of Minecraft blocks. The gaming computer can have as many gigahertz as you want; she’s still only going to be able to do calculations at a couple of measly operations per second. Maybe our brains are so purpose-built for swinging through trees or whatever that it takes an equivalent amount of emulation to get them to play chess competently.

In that case, mice just wouldn’t have the emulated more-general-purpose computer. People who are bad at chess would be able to emulate a chess-playing computer very laboriously and inefficiently. And people who are good at chess would be able to bring some significant fraction of their full most-complex-object-in-the-known-universe powers to bear. There are some anecdotal reports from chessmasters that suggest something like this – descriptions of just “seeing” patterns on the chessboard as complex objects, in the same way that the dots on a pointillist painting naturally resolve into a tree or a French lady or whatever.

This would also make sense in the context of calculation prodigies – those kids who can multiply ten digit numbers in their heads really easily. Everybody has to have the capacity to do this. But some people are better at accessing that capacity than others.

But it doesn’t make sense in the context of self-driving cars! If there was ever a task that used our purpose-built, no-emulation-needed native architecture, it would be driving: recognizing objects in a field and coordinating movements to and away from them. But my impression of self-driving car progress is that it’s been stalled for a while at a level better than the worst human drivers, but worse than the best human drivers. It’ll have preventable accidents every so often – not as many as a drunk person or an untrained kid would, but more than we would expect of a competent adult. This suggests a really wide range of human ability even in native-architecture-suited tasks.

Theory 3: Widely Varying Sub-Abilities

I think self-driving cars are already much better than humans at certain tasks – estimating distances, having split-second reflexes, not getting lost. But they’re also much worse than humans at others – I think adapting to weird conditions, like ice on the road or animals running out onto the street. So maybe it’s not that computers spend much time in a general “human-level range”, so much as being superhuman on some tasks, and subhuman on other tasks, and generally averaging out to somewhere inside natural human variation.

In the same way, long after Deep Blue beat Kasparov there were parts of chess that humans could do better than computers, “anti-computer” strategies that humans could play to maximize their advantage, and human + computer “cyborg” teams that could do better than either kind of player alone.

This sort of thing is no doubt true. But I still find it surprising that the average of “way superhuman on some things” and “way subhuman on other things” averages within the range of human variability so often. This seems as surprising as ever.

Theory 1.1: Humans Are Light-Years Beyond Every Other Animal, So Even A Tiny Range Of Human Variation Is Relatively Large

Or maybe the first graph representing the naive perspective is right, Eliezer’s graph representing a more reasonable perspective is wrong, and the range of human variability is immense. Maybe the difference between Einstein and Joe Average is the same as (or bigger than!) the difference between Joe Average and a mouse.

That is, imagine a Zoological IQ in which mice score 10, chimps score 20, and Einstein scores 200. Now we can apply Katja’s insight: that humans can have very wide variation in their abilities thanks to mutational load. But because Einstein is so far beyond lower animals, there’s a wide range for humans to be worse than Einstein in which they’re still better than chimps. Maybe Joe Average scores 100, and the village idiot scores 50. This preserves our intuition that even the village idiot is vastly smarter than a chimp, let alone a mouse. But it also means that most of the computational progress will occur within the human range. If it takes you five years from starting your project to being as smart as a chimp, then even granting linear progress it could still take you fifty more before you’re smarter than Einstein.
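The arithmetic behind that last sentence can be made explicit. All numbers here are the made-up Zoological IQ scores from the thought experiment above, not real measurements:

```python
# Hypothetical "Zoological IQ" scores from the thought experiment above.
mouse, chimp, einstein = 10, 20, 200

# Suppose an AI project takes five years to climb from zero to chimp level.
years_to_chimp = 5
points_per_year = chimp / years_to_chimp  # 4 points per year, assuming linear progress

# At that rate, how many more years until it passes Einstein?
extra_years = (einstein - chimp) / points_per_year
print(extra_years)  # 45.0 -- roughly the "fifty more" years in the text
```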

This seems to explain all the data very well. It’s just shocking that humans are so far beyond any other animal, and their internal variation so important.

Maybe the closest real thing we have to zoological IQ is encephalization quotient, a measure that relates brain size to body size in various complicated ways that sometimes predict how smart the animal is. We find that mice have an EQ of 0.5, chimps of 2.5, and humans of 7.5.

I don’t know whether to think about this in relative terms (chimps are a factor of five smarter than mice, but humans only a factor of three greater than chimps, so the mouse-chimp difference is bigger than the chimp-human difference) or in absolute terms (chimps are 2 units bigger than mice, but humans are five units bigger than chimps, so the chimp-human difference is bigger than the mouse-chimp difference).
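Both readings can be spelled out with the EQ figures quoted above (0.5, 2.5, 7.5); the numbers are the article’s, the framing is just arithmetic:

```python
# Encephalization quotients (EQ) quoted above.
mouse_eq, chimp_eq, human_eq = 0.5, 2.5, 7.5

# Relative reading: ratios between adjacent levels.
chimp_over_mouse = chimp_eq / mouse_eq  # 5.0x -> mouse-chimp gap looks bigger
human_over_chimp = human_eq / chimp_eq  # 3.0x

# Absolute reading: differences between adjacent levels.
mouse_chimp_gap = chimp_eq - mouse_eq   # 2.0 units
chimp_human_gap = human_eq - chimp_eq   # 5.0 units -> chimp-human gap looks bigger
```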

Brain size variation within humans is surprisingly large. Just within a sample of 46 adult European-American men, it ranged from 1050 to 1500 cm^3; there are further differences by race and gender. The difference from the largest to smallest brain is about the same as the difference between the smallest brain and a chimp (275 – 500 cm^3); since chimps weigh a bit less than humans, we should probably give them some bonus points. Overall, using brain size as some kind of very weak Fermi calculation proxy measure for intelligence (see here), it looks like maybe the difference between Einstein and the village idiot equals the difference between the idiot and the chimp?
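The “about the same” comparison, spelled out with the volumes quoted (using the top of the quoted chimp range; these are the figures from the text, not new data):

```python
# Brain volumes in cm^3, as quoted above.
largest_human, smallest_human = 1500, 1050
chimp_top = 500  # top of the quoted chimp range (275-500 cm^3)

human_range = largest_human - smallest_human          # 450 cm^3 within one human sample
smallest_human_vs_chimp = smallest_human - chimp_top  # 550 cm^3
# 450 vs 550: the two spans are indeed roughly "about the same" size.
```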

But most mutations that decrease brain function will do so in ways other than decreasing brain size; they will just make brains less efficient per unit mass. So probably looking at variation in brain size underestimates the amount of variation in intelligence. Is it underestimating it enough that the Einstein – Joe difference ends up equivalent to the Joe – mouse difference? I don’t know. But so far I don’t have anything to say it isn’t, except a feeling along the lines of “that can’t possibly be true, can it?”

But why not? Look at all the animals in the world, and the majority of the variation in size is within the group “whales”. The absolute size difference between a bacterium and an elephant is less than the size difference between Balaenoptera musculus brevicauda and Balaenoptera musculus musculus – ie the Indian Ocean Blue Whale and the Atlantic Ocean Blue Whale. Once evolution finds a niche where greater size is possible and desirable, and figures out how to make animals scalable, it can go from the horse-like ancestor of whales to actual whales in a couple million years. Maybe what whales are to size, humans are to brainpower.

Stephen Hsu calculates that a certain kind of genetic engineering, carried to its logical conclusion, could create humans “a hundred standard deviations above average” in intelligence, ie IQ 1000 or so. This sounds absurd on the face of it, like a nutritional supplement so good at helping you grow big and strong that you ended up five light years tall, with a grip strength that could crush whole star systems. But if we assume he’s just straightforwardly right, and that Nature did something of about this level to chimps – then there might be enough space for the intra-human variation to be as big as the mouse-chimp-Joe variation.

How does this relate to our original concern – how fast we expect AI to progress?

The good news is that linear progress in AI would take a long time to cross the newly-vast expanse of the human level in domains like “common sense”, “scientific planning”, and “political acumen”, the same way it took a long time to cross it in chess.

The bad news is that if evolution was able to make humans so many orders of magnitude more intelligent in so short a time, then intelligence really is easy to scale up. Once you’ve got a certain level of general intelligence, you can just crank it up arbitrarily far by adding more inputs. Consider by analogy the hydrogen bomb – it’s very hard to invent, but once you’ve invented it you can make a much bigger hydrogen bomb just by adding more hydrogen.

This doesn’t match existing AI progress, where it takes a lot of work to make a better chess engine or self-driving car. Maybe it will match future AI progress, after some critical mass is reached. Or maybe it’s totally on the wrong track. I’m just having trouble thinking of any other explanation for why the human level could be so big.


Is It Possible To Have Coherent Principles Around Free Speech Norms?


One factor that must underlie people’s distrust of non-governmental free speech norms is that they’re so underspecified. The First Amendment is a comparatively simple, bright-line concept – the police can’t arrest you for saying the President sucks. Sure, we need a zillion Supreme Court cases to iron out the details, but it makes sense in principle. By contrast, social norms about free speech risk collapsing into the incoherent Doctrine Of The Preferred First Speaker, where it’s okay for me to say that the President sucks, but not okay for you to say that I suck for saying that. This is dumb, and I don’t know if free speech supporters have articulated a meaningful alternative. I want to sketch out some possibilities for what that sort of alternative would look like.

The philosophical question here is separating out the acts of having an opinion, signaling a propensity, and committing a speech act.

Having an opinion is the sort of thing free speech norms ought to exist to protect. The opinion ought to enter the marketplace of ideas, compete with other opinions on its merit, and either win or lose based on people’s considered rational judgment.

But this can’t be separated from signaling a propensity for action. Suppose Alice has the opinion “hand hygiene doesn’t matter”. The truly virtuous action is to show her (and concerned third parties) studies that prove that dangerous infections are transmissible by unwashed hands. But while you’re doing that, it’s fair to not want to eat at her restaurant. And it’s pro-social to tell other people not to eat at her restaurant either, and not to hire her as a nurse – and if she’s already a nurse, maybe to get her fired. Even though reasonable free speech norms demand that we fight bad ideas through counterargument rather than social punishment, in this case they should permit a campaign to get Alice fired.

One solution here might be to give people the burden of demonstrating that their controversial opinions won’t lead to dangerous actions. For example, if Alice is a nurse, she might say “I don’t believe hand hygiene matters, and I’m going to try to convince the hospital administration to remove their rule mandating handwashing – but until I succeed, I’ll follow the rules and wash my hands just like everyone else.” If I trusted Alice, this would allay my concerns, and I would go back to wanting to debate with her instead of wanting her fired. See also Be Nice, At Least Until You Can Coordinate Meanness.

Some signaling of propensities can’t be so easily fixed. If Carol thinks that “Hitler should have finished the job”, I feel like this tells me a lot about Carol besides just her moral ranking of various World-War-II-related alternate histories. If she was a schoolteacher, then even if she promised not to kill any Jews in class, or even to spread any anti-Semitic propaganda in class, it would be hard for me not to wonder what else was wrong with her, and whether she could really disguise every single repugnant aspect of her personality. On the other hand, if we try to get the school board to fire her, we’re implicitly endorsing the principle “Get someone fired if you know of a belief of theirs that suggests they’re an otherwise repugnant person” – and isn’t this the same principle that led people to campaign against atheist schoolteachers, pro-gay schoolteachers, communist schoolteachers, etc? See also Not Just A Mere Political Issue. I think I bite the bullet on this one and say that if the schoolteacher credibly promises not to be repugnant in any way in front of the kids, you let her keep teaching until she slips up.

And both having opinions and signaling propensities are hard to separate from committing speech acts. The archetypal example is telling an assassin “I’ll give you $10,000 if you kill Bob” – a form of speech, but tantamount to murder. Repeated harassment – the kind where you scream insults at someone every time they leave the house – falls in the same category: the active ingredient isn’t the information being conveyed by what insults you choose, it’s that they face being screamed at and made to live in fear. And yeah, the archetypal example of this is starting a campaign to email someone’s embarrassing secrets to their boss to get them fired.

We can’t just ban speech acts. Everything is a speech act. Me saying “Donald Trump is wrong on immigration” lowers Donald Trump’s status – that’s a speech act. Me saying “You’re wrong about free speech” might trigger you and make you feel awful until you write a 10,000 word essay responding to me – that’s a speech act too. Telling an offensive joke is definitely a speech act, but do we want to ban all jokes that anyone anywhere might be offended by? Let’s face it; a lot of speech is criticism, sometimes really harsh criticism, and the line between “criticism”, “insult”, and “harassment” is vague and debatable (see eg all of Twitter). Everyone has a different set of speech acts they consider beyond the pale, with no real way of sorting it out. So what speech acts do we permit as unavoidable parts of the process of social interaction, which ones do we punish, and how do we punish them?


A sample problem: a while ago, I read an article which took a sensitive social problem, approached it with inexcusably terrible statistics that mis-estimated its incidence by seven orders of magnitude, and then used it to mock the people who suffered from it and tell them they were stupid. I complained about this, and the author was really confrontational to me and said things like I “needed to see a psychiatrist”. I ended up writing a couple of really angry blog posts, which not only corrected the statistics but also prominently named the author, accused him of being a bad person, and recommended that nobody ever trust him or take him seriously again.

One view: although the author was wrong, we’re all wrong sometimes. I’ve been wrong before, probably in ways that other people considered inexcusable, and I would rather be politely corrected than excoriated in public, dragged through the mud, and accused of being a defective human being. With my article, I contributed to a world where we don’t just debate each other’s points, but launch personal attacks against people in the hopes that they are so ashamed and miserable that they never participate in the discussion again. I have committed crimes against Reason, and I should humbly apologize and try to do better next time.

A second view: the author was either deliberately deceitful or criminally stupid; either way he really was inexcusably bad. If I just quietly correct his statistical error, only a fraction of his readership will see my correction, and meanwhile he’ll go on to do it a second time, a third time, and so on forever. Although there are many good people who should be approached as equals in the marketplace of ideas, there are also defectors against that marketplace who deserve to be ruthlessly crushed, and I was doing a public service by crushing one of them.

And a third view: by being needlessly cruel in his article, the author had already forfeited the protection of “the marketplace of ideas”. Just as if someone tries to shoot you, you can shoot back without worrying so much about the moral principle of nonviolence, so it’s always proper to fight fire with fire. Although I wouldn’t be justified in smacking down someone who had merely failed egregiously, someone who fails egregiously while breaking good discussion norms is another matter.

The second and third views get kind of scary when universalized. The second amounts to “if you decide someone’s a really bad person, feel free to crush them.” No doubt some evangelicals honestly think that gay rights crusaders are bad people; does this justify personal attacks against them?

The third seems to demand a more specific trigger (violation of a norm), but since nobody agrees where the norms are, it’s more likely to just lead to cascades where everyone ends up at different levels of the punishing/meta-punishing/meta-meta-punishing ladder and everyone thinks everyone else started it.

(an example: Alice writes a blog post excoriating Bob’s opinion on tax reform, calling him a “total idiot” who “should be laughed out of the room”. Bob feels so offended that he tries to turn everyone against Alice, pointing out every bad thing she’s ever done to anyone who will listen. Carol considers this a “sexist harassment campaign” and sends a dossier of all of Bob’s messages to his boss, trying to get him fired. Dan decides this proves Carol is anti-free speech, and tells the listeners of his radio show to “give Carol a piece of their mind”, leading to her getting hundreds of harassing and threatening email messages. Eric snitches on Dan to the police. How many of these people are in the wrong?)

But I can’t fully bite the bullet and accept the first view either; some people are so odious that an alarm needs to be spread. I’m not proud of my behavior in the specific situation mentioned, but I won’t completely give up the right to do something similar if the information arises. I’m going to try as hard as I can to err on the side of not doing that (I stick by my decision not to name the Reason columnist involved in the sandwich incident, although I guess everyone already knows) but sometimes the line will need to be crossed.


I think the most important consideration is that it be crossed in a way that doesn’t create a giant negative-sum war-of-all-against-all – one where Democrats try to get Republicans fired for the crime of supporting Republicanism, Republicans try to get Democrats fired for the crime of supporting Democratism, and the end result is a lot of people getting fired while the overall Republican/Democrat balance stays unchanged.

That suggests a heuristic very much like Be Nice, At Least Until You Can Coordinate Meanness again: don’t try to destroy people in order to enforce social norms that only exist in your head. If people violate a real social norm, one that the majority of the community agrees upon, and that they should have known about – that’s one thing. If you idiosyncratically believe something is wrong, or you’re part of a subculture that believes something is wrong even though there are opposite subcultures that don’t agree – then trying to enforce your idiosyncratic rules by punishing anyone who breaks them is a bad idea.

And one corollary of this is that it shouldn’t be arbitrary. Ten million people tell sexist jokes every day. If you pick one of them, apply maximum punishment to him, and let the other 9.99 million off scot-free, he’s going to think it’s unfair – and he’ll be right. This is directly linked to the fact that there isn’t actually that much of a social norm against telling sexist jokes. My guess is that almost everyone who posts child pornography on Twitter gets in trouble for it, and that’s because there really is a strong anti-child pornography norm.

(this is also how I feel about the war on drugs. One in a thousand marijuana users gets arrested, partly because there isn’t enough political will to punish all marijuana users, partly because nobody really thinks marijuana use is that wrong. But this ends up unfair to the arrested marijuana user, not just because he’s in jail for the same thing a thousand other people did without consequence, but because he probably wouldn’t have done it if he’d really expected to be punished, and society was giving him every reason to think he wouldn’t be.)

This set of norms is self-correcting: if someone does something you don’t like, but there’s not a social norm against it, then your next step should be to create a social norm against it. If you can convince 51% of the community that it’s wrong, then the community can unite against it and you can punish it next time. If you can’t convince 51% of the community that it’s wrong, then you should try harder, not play vigilante and try to enforce your unpopular rules yourself.

If you absolutely can’t tolerate something, but you also can’t manage to convince your community that it’s wrong and should be punished, you should work on finding methods that isolate you from the problem, including building a better community somewhere else. I think some of this collapses into a kind of Archipelago solution. Whatever the global norms may be, there ought to be communities catering to people who want more restrictions than normal, and other communities catering to people who want fewer. These communities should have really explicit rules, so that everybody knows what they’re getting into. People should be free to self-select into and out of those communities, and those self-selections should be honored. Safe spaces, 4chan, and this blog are three very different kinds of intentional communities with unusual but locally-well-defined free speech norms, they’re all good for the people who use them, and as long as they keep to themselves I don’t think outsiders have any right to criticize their existence.


I don’t know if this position is coherent. My guess is there’s a lot of places it doesn’t match my intuition, and a lot of other places where it’s so fuzzy it could justify or condemn anything at all.

But I think trying to hammer out something like this is important. Free speech norms aren’t about free speech. They quickly bleed over into these really fundamental questions, like – what is a culture? What is it we’re trying to do when we get together and have a society? Are we allowed to want different things from a culture? If so, how do we balance everyone else’s demands? Do we just live in some kind of postmodern globalized atomized culture, or are cultures these things inexorably linked to specific value systems that we’ve got to keep moored to those systems at all costs? How much are we allowed to use shaming to punish people who don’t conform to our culture? How angry are we allowed to be when other people use shaming to punish people we like who don’t conform to theirs?

Trying to get a model of these things that doesn’t immediately contradict itself on everything is potentially a good first step to trying to get a model of these things that’s right and/or liveable.

Gender Imbalances Are Mostly Not Due To Offensive Attitudes


One idea that kept coming up in the comments to the thread on signal-boosting-as-doxxing: it’s permissible to take emergency action against offensive male libertarians, because we really need to improve the gender balance in libertarianism.

The implied assumption is that women flock to movements full of respectful feminists, and shun the ones without them. If women aren’t in libertarianism, that’s because it’s tolerating too many sexists.

Since the assumption is implicit, nobody has defended it, and maybe nobody really believes it in its explicit form. But it underlies enough of this discourse that it’s still worth challenging.

Donald Trump is not a poster child for respectful inclusiveness. He is on record saying that he likes to “grab women by the pussy” in what sure sounds like a nonconsensual manner. His sophistication on gender issues is generally somewhere around the level of australopithecine, distantly aspiring towards Neanderthaldom as some shining mountaintop goal.

But Trump voters were more gender-balanced than libertarians: about 47% female, 53% male. Among Trump’s key demographic of white people, he actually won the female vote, beating Clinton among white women by 53% to 43%.

The Catholic Church also isn’t a poster child for feminism. One Catholic website, which has such an authoritative-sounding domain name that I assume it’s written personally by the Pope, says that:

Women are more natural caregivers for children, and men more naturally work outside the home. Yet women can and do work outside the home and men do act as caregivers for children (changing diapers, feeding babies their bottles, burping them, walking with them when they are crying at night – men do all these things, just as women do). Their roles tend to be focused in one area (caregiving for women and working outside the home for men), but one can fill in for the other whenever needed.

Add in their consistent opposition to abortion, birth control, sexual liberation, et cetera, and it should be at least a little surprising that women outnumber men among churchgoing Catholics by about 20%.

Reddit as a whole is about 30% female. But the r/libertarian and r/neoliberal subreddits are 5.5% and 4.5% female respectively. Is this because they’re hostile to women? Seems unlikely. The r/mensrights and r/KotakuInAction (ie Gamergate) subreddits are frequently considered especially offensive, and both of them hover around 10% female. Believe that offensiveness is the sole determinant of gender balance, and you’re forced to conclude that adopting Gamergate’s gender-related norms would double libertarianism’s female population.

See enough of this stuff, and you come to the conclusion that the percent of women in a movement isn’t just a function of how carefully it purges all of its insufficiently pro-feminist members.


So what is it a function of?

Richard Lippa’s Gender Differences In Personality And Interests is a pretty good source for this sort of thing. It notes that one of the largest gender differences recorded – larger even than the things we tend to think of as hard-and-fast obvious gender differences like physical aggressiveness or attitudes toward casual sex – is what Lippa calls “interest in things vs. people”. He writes:

For the people–things dimension of interests, the results in Table 1 are clear, strong, and unambiguous. Men tend to be much more thing-oriented and much less people-oriented than women (mean d = 1.18, a ‘very large’ difference, according to Hyde’s (2005) verbal designations).

He notes that a d of 1.18 is “very large”, but I worry that the less statistically savvy won’t appreciate quite how large it is. All I can say is that I spent several years believing that the d statistic was a scale from 0 to 1, because I’d never seen a d go outside that range before. Daniel Lakens wrote a great piece about a study that found a d = 1.96, where he argued we should dismiss it almost out of hand, because non-tautological effects are almost never that large and so clearly somebody made a mistake in the study (spoiler: they did). Lippa’s finding isn’t quite at that level, but it’s getting up there.
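For the statistically curious, here’s one quick way to get an intuition for what d = 1.18 means. A common translation of Cohen’s d is the “probability of superiority”: under the textbook assumption of two equal-variance normal distributions separated by d standard deviations, the chance that a random draw from the higher group exceeds a random draw from the lower group is Φ(d/√2). This sketch just applies that formula (the numbers below are toy inputs, not Lippa’s raw data):

```python
from math import erf, sqrt

def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def prob_superiority(d: float) -> float:
    """P(random draw from higher group > random draw from lower group),
    assuming two equal-variance normal distributions separated by d SDs."""
    return normal_cdf(d / sqrt(2))

# Conventional "small" and "medium" effects, versus Lippa's d = 1.18
for d in [0.2, 0.5, 1.18]:
    print(f"d = {d}: probability of superiority = {prob_superiority(d):.2f}")
```

For d = 1.18 this works out to roughly 0.80 – a randomly chosen man is more thing-oriented than a randomly chosen woman about four times out of five, compared to the coin-flip 0.50 you’d get at d = 0.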

This strikes me as sort of similar to the systematizing-empathizing distinction, where men and women also score pretty differently (d = 0.5). I admit I don’t have great proof that these are related concepts, but it seems intuitive to me that systems are a sort of “thing”, and that the kind of people who are interested in analyzing formal systems all the time usually aren’t super people-oriented.

(am I using “kind of people” here as a dog-whistle for autism? Give me more credit than that; I’m actually trying to dog-whistle “NMDA receptor hypofunction.”)

There seems to be a similar gender difference in tendency to use utilitarianism – see eg Friesdorf et al’s meta-analysis. Without me being able to explain exactly what I mean, I hope you share my intuition that utilitarianism is an unusually thing-oriented and systematizing moral system, and that this gestures at the same large but hard-to-explain difference as the other two results.

These are probably upstream of more noticeable differences in political attitudes. A less-empathizing/more-systematizing personality would make people more interested in the politics of effective institutions than in the politics of provisioning social welfare. And a less utilitarian and more traditional-morality-focused personality would make people more interested in using government power to enforce social norms. Eagly et al look at gender differences in political attitudes and find exactly this:

The most commonly noted difference is that women are more likely than men to endorse policies that support the provision of social services for deserving and disadvantaged groups (Goertzel, 1983; Schlesinger & Heldman, 2001; Shapiro & Mahajan, 1986), including housing, child care, educational opportunity, and financial support in the form of welfare…Women also advocate more restriction of many behaviors that are traditionally considered immoral (e.g., casual sex; Oliver & Hyde, 1993; consumption of pornography, Seltzer et al., 1997)…Women can thus be regarded as more liberal than men in social compassion and more conservative in traditional morality.

“More liberal in social compassion and more conservative in traditional morality” sounds like a pretty good description of “not libertarian”.

I’m not sure there’s a lot of mystery left to explain here. You can eliminate every single shred of sexism in the libertarian movement, make it so feminist that it makes Ursula K. Le Guin books look like Margaret Atwood books – and it will still never get anywhere near the gender balance of those weird evangelical sects who talk about how women have to be subservient because Eve was made from Adam’s rib.


A strong counterargument:

What if all this stuff about sexism driving away women is all a big hoax? And so after we make women feel safer, stamp out prejudice, enforce common decency, and encourage everyone to treat each other with compassion – darn it, we created a better world for nothing! If the goal is “eliminate malignant sexism” – and surely it should be – why be so upset about one argument for eliminating malignant sexism which might not be entirely accurate?

First, because I’m a heartless thing-oriented systematizer, and I despise bad arguments on principle, and I don’t care if you people-oriented empathizers think they serve a prosocial community-building function.

But second, because this gives fuzzy-empathizing-humanities types a giant hammer with which to beat all sciency-systematizing-utilitarian types forever.

I’m not the sort of New Atheist who believes in some kind of apocalyptic battle where the defenders of rationality and civilization face off against some fantastic coalition of postmodernism / hippyism / gender studies / crystal-healing / evangelicalism / cultural Marxism / Islam / Donald Trump. But STEM ideals conflicting with humanities-focused ideals isn’t a total fiction. There really are some differences in values between the average Silicon Valley programmer and the average Oberlin literature professor. A humanities/empathizing/intuitive vs. sciency/systematizing/utilitarian distinction isn’t a perfectly natural category, nor the only axis on which people differ. But I’m hoping you share my intuition that it’s at least a vague cluster, and at least one axis of difference.

And however different postmodernists, evangelicals, Muslims, crystal-healers, and Trump supporters might be, there actually is one thing they have in common: all these groups have great gender balance. You’ll never find a Wiccan circle or a gender studies class that accidentally ended up as 100% male.

And computer scientists, mathematicians, economists, utilitarians, libertarians, movement atheists, skeptics, transhumanists, cryptocurrency enthusiasts, et cetera – are an equally sundry non-coalition. But they also have something in common: a serious skew towards men.

And if you accept the implicit assumption that good opinions = gender balance and sexism = gender imbalance, then forever and always the crystal healers and Trump supporters will have a clear badge of being good people and responsible citizens, and the utilitarians and economists will be, on a collective level, sexist jerks. And it sure seems like this is a point in favor of the crystal healers and Trump supporters if you’re trying to figure out who to trust.

A community made up of sexist jerks has a moral obligation to stop being sexist and jerkish right away, both because it’s the right thing to do, and because it’s tactically advantageous to be able to recruit women to the cause. If sexist jerkishness can be measured by gender balance, the appropriate response is “keep dialing up the level of cracking-down-on-sexism until gender balance approaches parity.” But if this is your philosophy, and gender balance doesn’t respond at all to these crackdowns, then level-of-cracking-down quickly rises to infinity.

So we end up with sciency/systematizing/utilitarian fields and movements turning into circular firing chambers. If “it’s permissible to take emergency action against offensive male libertarians, because we really need to improve the gender balance in libertarianism”, then the state of emergency quickly becomes permanent, as more and more extreme measures fail to effect any improvement. Yes, there will be people trying to get you fired over making sexist jokes on Twitter. But don’t worry – there will also be people trying to get you fired if you’re too interested in BDSM, or if you find the word dongle amusing. And if the people trying to get the other people fired overstep their social status just a tiny bit, then the backlash will get them fired (“circular firing chamber” wasn’t intended as a pun, but maybe it should have been).

At the same time, everyone on the humanities/empathizing/intuitive side of the divide gets a knockdown argument against any sciency/systematizing/utilitarian challenge: “Oh, those sexists? Come back when you’ve taken off your fedora, dudebros.” And this sort of thing works. And since so many of us implicitly accept the poor-gender-balance = sexist jerks narrative, we can’t even fight back. The best we can do is “Yes, our side is full of repellent people with terrible values, and your side is full of lovely tolerant accepting people who successfully build inclusive communities for everyone, but, uh, we make some good points anyway. Wait, come back! Why aren’t you reading my double-blind randomized study showing that I’m right?”

I think a better response is to point out that there’s never been any study or informal survey suggesting that sciency/systematizing/utilitarian people are any more sexist than anyone else (if you find one, tell me). No person or community is perfect, but to claim some special evil for libertarians, economists, utilitarians etc is a weird hypothesis I’ve never seen anyone try to explain, let alone prove. When 20% of high school kids taking the AP Computer Science test are women, and 20% of college students majoring in CS are women, and then 20% of people in the tech industry are women, maybe “demographics are a thing” is a better hypothesis than “the tech industry is uniquely full of gross sexist nerds”.

This isn’t to deny the experiences of women who feel more frequently harassed in these communities. But it does seem more likely to be a result of gender balance than a cause of it. That is, if we dectuple the male-to-female ratio while holding the sexual-harasser-ness of each man constant, then each woman faces ten times more sexual harassment.
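The arithmetic behind that claim is simple enough to sketch as a toy model (all numbers below are made up for illustration): hold the fraction of harassers among men fixed and vary only the gender ratio, and the harassment rate per woman scales directly with the male-to-female ratio.

```python
def harassment_per_woman(n_men: int, n_women: int,
                         harasser_rate: float,
                         incidents_per_harasser: float = 1.0) -> float:
    """Toy model: each harasser produces a fixed number of incidents,
    spread evenly across the women present, so the per-woman rate
    scales with the male-to-female ratio, not with how sexist each man is."""
    return harasser_rate * incidents_per_harasser * n_men / n_women

# Same harasser rate (2% of men) in two communities of 1000 people:
balanced = harassment_per_woman(n_men=500,  n_women=500, harasser_rate=0.02)
skewed   = harassment_per_woman(n_men=1000, n_women=100, harasser_rate=0.02)

print(skewed / balanced)  # prints 10.0 – dectuple the ratio, dectuple the rate
```

The point of the sketch is just that the per-woman rate changes by a factor of ten here even though the men in both communities are, by assumption, exactly equally likely to harass.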

This might mean that there’s a duty for sciency/systematizing/utilitarian people to work ten times harder to fight harassment than anyone else. I think that might be happening. When I first entered medicine, I was shocked at the lackadaisical attitude people there took to casually sexist speech, compared to an environment in Silicon Valley which I’ve heard people describe as “Stalinist”. But medicine is a people-oriented/empathizing profession; 90% of nurses are women; so are 57% of psychiatry residents. The potential-harasser:potential-victim ratio keeps people safer without the same permanent state of emergency.

So yes, let’s try to build a better world. But let’s do it because it’s a good thing to do, not because we expect it to single-handedly normalize gender ratios.