Slate Star Codex

THE JOYFUL REDUCTION OF UNCERTAINTY

Del Giudice On The Self-Starvation Cycle

[Content note: eating disorders]

Anorexia has a cultural component. I’m usually reluctant to assume anything is cultural – every mediocre social scientist’s first instinct is always to come up with a cultural explanation which is simple, seductive, flattering to all our existing prejudices, and wrong. But after seeing enough ballerinas and cheerleaders who became anorexic after pressure to lose weight for the big competition, even I have to throw up my hands and admit anorexia has a cultural component.

But nobody ever tells you the sequel. That ballerina who’s losing weight for the big competition at age 16? At age 26, she’s long since quit ballet, worried it would exacerbate her anorexia. She’s been in therapy for ten years; for eight of them she’s admitted she has a problem, that her anorexia is destroying her life. Her romantic partners – the ones she was trying to get thin to impress – have long since left her because she looks skeletal and weird. She understands this and would do anything to cure her anorexia and be a normal weight again. But she finds she isn’t hungry. She hasn’t eaten in two days and she isn’t hungry. In fact, the thought of food sickens her. She goes to increasingly expert therapists and dieticians, asking them to help her eat more. They recommend all the usual indulgences: ice cream, french fries, cookies. She tries all of them and finds them inexplicably disgusting. Sometimes with a prodigious effort of will she will manage to finish one cookie, and congratulate herself, but the next day she finds the task of eating dessert as daunting as ever. Finally, after many years of hard work, she is scraping the bottom end of normal weight by keeping to a diet so regimented it would make a Prussian general blush.

And nobody ever tells you about all the people who weren’t ballerinas. The young man who stops eating because it gives him a thrill of virtue and superiority to be able to demonstrate such willpower. The young woman who stops eating in order to show her family how much their neglect hurts her. If they pursue their lack of appetite far enough, they end up the same way as the ballerina – admitting they have a problem, admitting they need to eat more, hiring all sorts of doctors and dieticians to find them a way to eat more, but discovering themselves incapable of doing so.

And this is why I can’t subscribe to a purely cultural narrative of anorexia. How does “ballerinas are told they should be thin in order to be pretty” explain so many former ballerinas who want to gain weight but can’t? And how does it explain the weird, almost neurological stuff like how anorexic people will mis-estimate their ability to fit through doors?

All of this makes much more sense in a biological context; it’s as if the same system that is broken in obese people who cannot lose weight no matter how hard they try, is broken in anorexics who cannot gain weight no matter how hard they try. There are plenty of biological models for what this might mean. But then the question becomes: how do we reconcile the obviously cultural part where it disproportionately happens to ballerinas, to the probably biological part where the hypothalamus changes its weight set point?

I’m grateful to Professor del Giudice and Evolutionary Psychopathology for presenting the only reasonable discussion of this I have heard, which I quote here basically in its entirety:

The self-starvation cycle arises in predisposed individuals following an initial phase of food restriction and weight loss. Food restriction may be initially prompted by a variety of motives, from weight concerns and a desire for thinness to health-related or religious ideas (eg spiritual purity, ascetic self-denial). In fact, the cycle may even be started by involuntary weight loss due to physical illness. While fasting and exercise are initially aversive, they gradually become rewarding – even addictive – as the starvation response kicks in. At the same time, restricting behaviors that used to be deliberate become increasingly automatic, habitual, and difficult to interrupt (Dwyer et al, 2001; Guarda et al, 2015; Lock & Kirz, 2013; McGuire & Troisi 1998). The self-starvation cycle plays a crucial role in the onset of anorexia.

Increased physical activity is a key component of the starvation response in many animal species; in general, its function is to prompt exploration and extend the foraging range when food is scarce. This response is so ingrained that animals subjected to food restriction in conditions that allow physical activity often starve themselves to death through strenuous exercise (Fessler, 2002; Guarda et al, 2015; Scheurink et al, 2010). In humans, pride is a powerful additional reward of self-starvation – achieving extraordinary levels of thinness and self-control makes many anorexic patients feel special and superior (Allan & Goss, 2012). The starvation response also brings about some psychological changes that further contribute to reinforce the cycle. In particular, starvation dramatically interferes with executive flexibility/shifting, and patterns of behavior become increasingly rigid and inflexible. The balance between local and global processing is also shifted toward local details. This may contribute to common body image distortions in anorexia, as when patients focus obsessively on a specific body part (eg the neck or hips) but perceive themselves as globally overweight (Pender et al, 2014; Westwood et al, 2016).

The self-starvation cycle has been documented across time and cultures, including non-Western ones. In modern Western societies, concerns with fat and thinness are the main reason for weight loss and probably explain the moderate rise of AN incidence around the second half of the 20th century. However, cases of self-starvation with spiritual and religious motivations have been common in Europe at least since the Middle Ages (and include several Catholic saints, most famously St. Catherine of Siena). In some Asian cultures, digestive discomfort is often cited as the initial reason for restricting food intake, but the resulting syndrome has essentially the same symptoms as anorexia in Western countries (Bell, 1984; Brumberg, 1989; Culbert et al, 2015; Keel & Klump, 2003). The DSM-5 criteria for anorexia include fear of gaining weight as a diagnostic requirement; for this reason, most historical and non-Western cases would not be diagnosed as AN within the current system. However, the present emphasis on thinness is likely a contingent sociohistorical fact and does not seem to represent a necessary feature of the disorder. (Keel & Klump, 2003)

My anorexic patients sometimes complain of being forced into this mold. They’ll try to go to therapy for their inability to eat a reasonable amount of food, and their therapist will want to spend the whole time talking about their body image issues. When they complain they don’t really have body image issues, they’ll get accused of repressing it. Eventually they’ll just say “Yeah, whatever, I secretly wanted to be a ballerina” in order to make the therapist shut up and get to the part where maybe treatment happens.

The clear weak part of this theory is the explanation of the “self-starvation cycle”. Aside from a point about animals sometimes having increased activity to go explore for food, it all seems kind of tenuous.

And how come most people who starve never get anorexia? How come sailors who ran out of food halfway across the Pacific, barely made it to some tropical island, and gorged themselves on coconuts didn’t end up anorexic? Donner Party members? Concentration camp survivors? Is there something special about voluntary starvation? Some kind of messed-up learning process?

I am interpreting the point to be something along the lines of “Suppose for some people with some unknown pre-existing vulnerability, starving themselves voluntarily now flips some biological switch which makes them starve themselves involuntarily later”.

Framed like this, it sounds more like a description of anorexia than a theory about it (though see here for an attempt to flesh this out). But it’s a description which captures part of the disease that a lot of other models don’t, and which brings some things into clearer relief, and I am grateful to have it.


Book Review: Evolutionary Psychopathology

I.

Evolutionary psychology is famous for having lots of stories that make sense but are hard to test. Psychiatry is famous for having mountains of experimental data but no idea what’s going on. Maybe if you added them together, they might make one healthy scientific field? Enter Evolutionary Psychopathology: A Unified Approach by psychology professor Marco del Giudice. It starts by presenting the theory of “life history strategies”. Then it uses the theory – along with a toolbox of evolutionary and genetic ideas – to shed new light on psychiatric conditions.

Some organisms have lots of low-effort offspring. Others have a few high-effort offspring. This was the basis of the old r/k selection theory. Although the details of that theory have come under challenge, the basic insight remains. A fish will lay 10,000 eggs, then go off and do something else. 9,990 will get eaten by sharks, but that still leaves enough for there to be plenty of fish in the sea. But an elephant will spend two years pregnant, three years nursing, and ten years doing at least some level of parenting, all to produce a single big, well-socialized, and high-prospect-of-life-success calf. These are two different ways of doing reproduction. In keeping with the usual evolutionary practice, del Giudice calls the fish strategy “fast” and the elephant strategy “slow”.

To oversimplify: fast strategies (think “live fast, die young”) are well-adapted for unpredictable dangerous environments. Each organism has a pretty good chance of randomly dying in some unavoidable way before adulthood; the species survives by sheer numbers. Fast organisms should grow up as quickly as possible in order to maximize the chance of reaching reproductive age before they unpredictably die. They should mate with anybody around, to maximize the chance of mating before they unpredictably die. They should ignore their offspring, since they expect most offspring to unpredictably die, and since they have too many to take care of anyway. They should be willing to take risks, since the downside (death without reproducing) is already their default expectation, and the upside (becoming one of the few individuals to give birth to the 10,000 offspring of the next generation) is high.

Slow strategies are well-adapted for safer environments, or predictable complex environments whose intricacies can be mastered with enough time and effort. Slow strategy animals may take a long time to grow up, since they need to achieve mastery before leaving their parents. They might be very picky maters, since they have all the time in the world to choose, will only have a few children each, and need to make sure each of those children has the best genes possible. They should work hard to raise their offspring, since each individual child represents a substantial part of the prospects of their genetic line. They should avoid risks, since the downside (death without reproducing) would be catastrophically worse than default, and the upside (giving birth to a few offspring of the next generation) is what they should expect anyway.
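Since the fast/slow contrast is at bottom an expected-value calculation, it may help to see it run. Here is a minimal toy model (mine, not the book’s; every number is invented) in which the only structural difference between strategies is how long you spend maturing before you reproduce, while unavoidable mortality accrues with every year you wait:

```python
# Toy life-history trade-off (invented numbers, not from the book).
# A "fast" strategist matures in 1 year and sprays out many low-investment
# offspring; a "slow" strategist spends 10 years maturing, then produces a
# few well-cared-for offspring. Unavoidable random mortality accrues each
# year, so long maturation is a bet on still being alive when it pays off.

def lifetime_success(maturation_years, n_offspring, offspring_survival,
                     annual_mortality):
    survive_to_maturity = (1 - annual_mortality) ** maturation_years
    return survive_to_maturity * n_offspring * offspring_survival

for m in (0.02, 0.10, 0.30):
    fast = lifetime_success(1, n_offspring=100, offspring_survival=0.02,
                            annual_mortality=m)
    slow = lifetime_success(10, n_offspring=3, offspring_survival=0.90,
                            annual_mortality=m)
    winner = "fast" if fast > slow else "slow"
    print(f"annual mortality {m:.0%}: fast={fast:.2f}, slow={slow:.2f} "
          f"-> {winner} wins")
```

Under low background mortality the slow strategist’s patience pays off; raise the mortality rate and the fast strategist wins, which is the basic prediction of the theory.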

Del Giudice asks: what if life history strategies differ not just across species, but across individuals of the same species? What if this theory applied within the human population?

In line with animal research on pace-of-life syndromes, human research has shown that impulsivity, risk-taking, and sensation seeking are systematically associated with fast life history traits such as early intercourse, early childbearing in females, unrestricted sociosexuality, larger numbers of sexual partners, reduced long-term mating orientation, and increased mortality. Future discounting and heightened mating competition reduce the benefits of reciprocal long-term relationships; in motivational terms, affiliation and reciprocity are downregulated, whereas status seeking and aggression are upregulated. The resulting behavioral pattern is marked by exploitative and socially antagonistic tendencies; these tendencies may be expressed in different forms in males and females, for example through physical versus relational aggression (Belsky et al 1991; Borowsky et al 2009; Brezina et al 2009; Chen & Vazsonyi 2011; Copping et al 2013a, 2013b, 2014a; Curry et al 2008; Dunkel & Decker 2010 […]

And:

Disgust sensitivity is another dimension of individual differences with links to the fast-slow continuum. To begin, high disgust sensitivity is broadly associated with measures of risk aversion. Moral and sexual disgust correlate with higher agreeableness, conscientiousness, and honesty-humility; and sexual disgust specifically predicts restricted sociosexuality (Al-Shawaf et al 2015; Sparks et al 2018; Tybur et al 2009, 2015; Tybur & de Vries 2013). These findings suggest that the disgust system is implicated in the regulation of life-history-related behaviors. In particular, sexual and moral disgust show the most consistent pattern of correlations with other indicators of slow strategies.

Romantic attachment styles have wide-ranging influences on sexuality, mating, and couple stability, but their relations with life history strategies are somewhat complex. Secure attachment styles are consistently associated with slow life history traits (eg Chisholm 1999b; Chisholm et al 2005; Del Giudice 2009a). Avoidance predicts unrestricted sociosexuality, reduced long-term orientation, and low commitment to partners (Brennan & Shaver 1995; Jackson & Kirkpatrick 2007; Templehof & Allen 2008). Given the central role of pair bonding in long-term parental investment, avoidant attachment – which, on average, is higher in men – can be generally interpreted as a mediator of reduced parenting effort. However, some inconsistent findings indicate that avoidance may capture multiple functional mechanisms. High levels of romantic avoidance are found both in people with very early sexual debut and in those who delay intercourse (Gentzler & Kerns, 2004); this suggests that, at least for some people, avoidant attachment may actually reflect a partial downregulation of the mating system, consistent with slower life history strategies.

And:

At a higher level of abstraction, the behavioral correlates of life history strategies can be framed within the five-factor model of personality. Among the Big Five, agreeableness and conscientiousness show the most consistent pattern of associations with slow traits such as restricted sociosexuality, long-term mating orientation, couple stability, secure attachment to parents in infancy and romantic partners in adulthood, reduced sex drive, low impulsivity, and risk aversion across domains (eg Baams et al 2004; Banai & Pavela 2015; Bourdage et al 2007; DeYoung 2001; Holtzman & Strube 2013; Jonason et al 2013 […] Some researchers working in a life history perspective have argued that the general factor of personality should be regarded as the core personality correlate of slow strategies.

Del Giudice suggests that these traits, and predisposition to fast vs. slow life history in general, are caused by a gene × environment interaction. The genetic predisposition is straightforward enough. The environmental aspect is more interesting.

There has been some research on the thrifty phenotype hypothesis: if you’re undernourished while in the womb, you’ll be at higher risk of obesity later on. Some mumble “epigenetic” mumble “mechanism” looks around, says “We seem to be in a low-food environment, better design the brain and body to gorge on food when it’s available and store lots of it as fat”, then somehow modulates the relevant genes to make it happen.

Del Giudice seems to imply that a similar epigenetic mechanism “looks around” at the world during the first few years of life to try to figure out if you’re living in the sort of unpredictable dangerous environment that needs a fast strategy, or the sort of safe, masterable environment that needs a slow strategy. Depending on your genetic predisposition and the observable features of the environment, this mechanism “makes a decision” to “lock” you into a faster or slower strategy, setting your personality traits more toward one side or the other.
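To make the shape of this claim concrete, here is a deliberately crude sketch (my own hypothetical formalization; the function, thresholds, and numbers are all invented, and nothing this simple appears in the book): a genetically set sensitivity threshold is compared against early-childhood danger cues, and the output is then treated as fixed.

```python
# Hypothetical sketch of "look around early, then lock in" (my toy
# formalization; nothing here is from the book).

def calibrate_strategy(genetic_threshold, danger_cues):
    """genetic_threshold: 0..1, how much observed danger it takes to tip
    this genotype into a fast strategy. danger_cues: 0..1 harshness
    signals sampled over the first few years of life."""
    early_signal = sum(danger_cues) / len(danger_cues)
    # The "decision" happens once, during the sensitive period, and the
    # resulting strategy is treated as locked in thereafter.
    return "fast" if early_signal > genetic_threshold else "slow"

# Identical genes, different childhoods -> different locked-in strategies:
print(calibrate_strategy(0.5, [0.9, 0.8, 0.7]))   # harsh childhood: fast
print(calibrate_strategy(0.5, [0.1, 0.2, 0.1]))   # safe childhood: slow
# Different genes, identical childhood -> the gene x environment part:
print(calibrate_strategy(0.2, [0.3, 0.4, 0.3]))   # sensitive genotype: fast
print(calibrate_strategy(0.8, [0.3, 0.4, 0.3]))   # buffered genotype: slow
```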

He further subdivides fast vs. slow life history into four different “life history strategies”.

The antagonistic/exploitative strategy is a fast strategy that focuses on getting ahead by defecting against other people. Because it expects a short and noisy life without the kind of predictable iterated games that build reciprocity, it throws all this away and focuses on getting ahead quick. A person who has been optimized for an antagonistic/exploitative strategy will be charming, good at some superficial social tasks, and have no sense of ethics – ie the perfect con man. Antagonistic/exploitative people will have opportunities to reproduce through outright rape, through promising partners commitment and then not providing it, through status in criminal communities, or through things in the general category of hiring prostitutes when both parties are too drunk to use birth control. These people do not have to be criminals; they can also be the most cutthroat businessmen, lawyers, and politicians. Jumping ahead to the psychiatry connection, the extreme version of this strategy is probably antisocial personality disorder.

The creative/seductive strategy is a fast strategy that focuses on getting ahead through sexual selection, ie optimizing for being really sexy. Because it expects a short and noisy life, it focuses on raw sex appeal (which peaks in the late teens and early twenties) as opposed to things like social status or ability to care for children (which peak later in maturity). A person who has been optimized for a creative/seductive strategy will be attractive, artistic, flirtatious, and popular – eg the typical rock star or starlet. They will also have traits that support these skills, which for complicated reasons include being very emotional. Creative/seductive people will have opportunities to reproduce through making other people infatuated with them; if they are lucky, they can seduce a high-status high-resource person who can help take care of the children. The most extreme version of this strategy is probably borderline personality disorder.

The prosocial/caregiving strategy is a slow strategy that focuses on being a responsible pillar of the community who everybody likes. Because it expects a slow and stable life, it focuses on lasting relationships and cultivating a good reputation that will serve it well in iterated games. A person who has been optimized for a prosocial/caregiving strategy will be dependable, friendly, honest, and conformist – eg the typical model citizen. Prosocial/caregiving people will have opportunities to reproduce by marrying their high school sweetheart, living in a suburban tract house, and having 2.4 children who go to state college. The most extreme version of this strategy is probably being a normie.

The skilled/provisioning strategy is a slow strategy that focuses on being good at specific useful tasks. Because it expects a slow and stable life, it focuses on gaining abilities that may take years to bear any fruit. A person who is optimized for a skilled/provisioning strategy will be intelligent, good at learning, and a little bit obsessive. They may not invest as much in friendliness or seductiveness; once they succeed at their chosen path, they will get social status through being indispensable for the continued functioning of the community, and they will have opportunities to reproduce because of their high status and obvious ability to provide for the children. The most extreme version of this strategy is probably high-functioning autism.

This division into life strategies is a seductive idea. I mean, literally, it’s a seductive idea, ie in terms of memetic evolution, we may worry it is optimized for a seductive/creative strategy for reproduction, rather than the boring autistic “is actually true” strategy. The following is not a figure from Del Giudice’s book, but maybe it should be:

There’s a lot of debate these days about how we should treat research that fits our existing beliefs too closely. I remember Richard Dawkins (or maybe some other atheist) once argued we should be suspicious of religion because it was too normal. When you really look at the world, you get all kinds of crazy stuff like quantum physics and time dilation, but when you just pretend to look at the world, you get things like a loving Father, good vs. evil, and ritual purification – very human things, things a schoolchild could understand. Atheists and believers have since had many debates over whether religion is too ordinary or sufficiently strange, but I haven’t heard either side deny the fundamental insight that science should do something other than flatter our existing categories for making sense of the world.

On the other hand, the first thermometer no doubt recorded that it was colder in winter than in summer. And if someone had criticized physicists, saying “You claim to have a new ‘objective’ way of looking at temperature, but really all you’re doing is justifying your old prejudices that the year is divided into nice clear human-observable parts, and summer is hot and winter is cold” – then that person would be a moron.

This kind of thing keeps coming up, from Klein vs. Harris on the science of race to Jussim on stereotype accuracy. I certainly can’t resolve it here, so I want to just acknowledge the difficulty and move on. If it helps, I don’t think Del Giudice wants to argue these are objectively the only four possible life strategies and that they are perfect Platonic categories, just that these are a good way to think of some of the different ways that organisms (including humans) can pursue their goal of reproduction.

II.

Psychiatry is hard to analyze from an evolutionary perspective. From an evolutionary perspective, it shouldn’t even exist. Most psychiatric disorders are at least somewhat genetic, and most psychiatric disorders decrease reproductive fitness. Biologists have equations that can calculate how likely it is that maladaptive genes can stay in the population for certain amounts of time, and these equations say, all else being equal, that psychiatric disorders should not be possible. Apparently all else isn’t equal, but people have had a lot of trouble figuring out exactly what that means. A good example of this kind of thing is Greg Cochran’s theory that homosexuality must be caused by some kind of infection; he does not see another way it could remain a human behavior without being selected into oblivion.
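The equations Scott is gesturing at here are the standard mutation-selection balance results from population genetics; a quick sketch (textbook formulas, not anything from the book) of why the arithmetic fails for common, costly, heritable disorders:

```python
# Mutation-selection balance (standard population genetics, not from the
# book): how common can an allele stay if mutation keeps creating it but
# selection keeps removing it? mu = per-generation mutation rate toward
# the allele, s = fitness cost, h = dominance coefficient.
from math import sqrt

def equilibrium_allele_freq(mu, s, h):
    if h > 0:                    # costly even in heterozygotes
        return mu / (h * s)
    return sqrt(mu / s)          # fully recessive: hidden from selection

mu = 1e-5    # a generous per-locus mutation rate
s = 0.10     # a 10% cut to reproductive success

print(f"partially dominant (h=0.5): {equilibrium_allele_freq(mu, s, 0.5):.5f}")
print(f"fully recessive    (h=0.0): {equilibrium_allele_freq(mu, s, 0.0):.5f}")
# Allele frequencies of roughly 0.0002 and 0.01 imply disorder prevalences
# far below the ~1% or more seen for heritable psychiatric conditions;
# hence "all else being equal, psychiatric disorders should not be possible".
```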

Del Giudice does the best he can within this framework. He tries to sort psychiatric conditions into a few categories based on possible evolutionary mechanisms.

First, there are conditions that are plausibly good evolutionary strategies, and people just don’t like them. For example, nymphomania is unfortunate from a personal and societal perspective, but one can imagine the evolutionary logic checks out.

Second, there are conditions which might be adaptive in some situations, but don’t work now. For example, antisocial traits might be well-suited to environments with minimal law enforcement and poor reputational mechanisms for keeping people in check; now they will just land you in jail.

Third, there are conditions which are extreme levels of traits which it’s good to have a little of. For example, a little anxiety is certainly useful to prevent people from poking lions with sticks just to see what will happen. Imagine (as a really silly toy model) that two genes A and B determine anxiety, and the optimal anxiety level is 10. Alice has gene A = 8 and gene B = 2. Bob has gene A = 2 and gene B = 8. Both of them are happy well-adjusted individuals who engage in the locally optimal level of lion-poking. But if they reproduce, their child may inherit gene A = 8 and gene B = 8 for a total of 16, much more anxious than is optimal. This child might get diagnosed with an anxiety disorder, but it’s just a natural consequence of having genes for various levels of anxiety floating around in the population.
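Since the paragraph above is already a worked example, it costs nothing to run it. This just enumerates the four equally likely children (a haploid cartoon of inheritance, for illustration only):

```python
# The anxiety toy model from the paragraph above, enumerated (haploid
# cartoon: each child gets each gene from one parent at random).
import itertools

alice = {"A": 8, "B": 2}    # total 10: optimal
bob   = {"A": 2, "B": 8}    # total 10: optimal

for a_from, b_from in itertools.product(("alice", "bob"), repeat=2):
    child_a = (alice if a_from == "alice" else bob)["A"]
    child_b = (alice if b_from == "alice" else bob)["B"]
    total = child_a + child_b
    note = "optimal" if total == 10 else "far from optimal"
    print(f"A from {a_from:<5}, B from {b_from:<5}: anxiety {total:>2} ({note})")
# Two of the four equally likely children land at 16 (anxiety disorder) or
# 4 (reckless lion-poker), even though every allele involved is fine.
```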

Fourth, there are conditions which are the failure modes of traits which it’s good to have a little of. For example, psychiatrists have long categorized certain common traits into “schizotypy”, a cluster of characteristics more common in the relatives of schizophrenics and in people at risk of developing schizophrenia themselves. These traits are not psychotic in and of themselves and do not decrease fitness, nor is schizophrenia necessarily just the far end of this distribution. But schizotypal traits are one necessary ingredient of getting schizophrenia; schizophrenia is some kind of failure mode only possible with enough schizotypy. If schizotypal traits do some other good thing, they can stick around in the population, and this will look a lot like “schizophrenia is genetic”.

How can we determine which of these categories any given psychiatric disorder falls into?

One way is through what is called taxometrics – the study of to what degree mental disorders are just the far end of a normal distribution of traits. Some disorders are clearly this way; for example, if you quantify and graph everybody’s anxiety levels, they will form a bell curve, and the people diagnosed with anxiety disorders will just be the ones on the far right tail. Are any disorders not this way? This is a hard question, though schizophrenia is a promising candidate.
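Here is a cartoon of what taxometrics is looking for (my illustration; real taxometric procedures like Meehl’s MAXCOV are more sophisticated than a tail count): in a purely dimensional world the diagnosed are just the right tail of one bell curve, while in a “taxonic” world a small separate cluster hides in that tail, and the two worlds leave different statistical fingerprints:

```python
# Dimensional vs. taxonic cartoon (my illustration, not Meehl's actual
# procedures). Both populations look similar at a casual glance; the
# tails behave differently.
import random
random.seed(0)

N = 100_000
# Dimensional world: one normal distribution, "disorder" = right tail.
dimensional = [random.gauss(0, 1) for _ in range(N)]
# Taxonic world: 99% ordinary, plus a hidden 1% cluster shifted upward.
taxonic = [random.gauss(4, 1) if random.random() < 0.01 else random.gauss(0, 1)
           for _ in range(N)]

def tail_fraction(xs, cutoff):
    return sum(x > cutoff for x in xs) / len(xs)

for cutoff in (2, 3, 4, 5):
    print(f"cutoff {cutoff}: dimensional {tail_fraction(dimensional, cutoff):.4f}"
          f"  taxonic {tail_fraction(taxonic, cutoff):.4f}")
# The dimensional tail keeps thinning smoothly; the taxonic tail stalls
# near the hidden cluster's 1% base rate instead of vanishing.
```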

Another way is through measuring the correlation of disorders with mutational load. Some people end up with more mutations (and so a generically less functional genome) than others. The most common cause of this is being the child of an older father, since that gives mutations more time to accumulate in sperm cells. Other people seem to have higher mutational load for other, unclear reasons, which can be measured through facial asymmetry and the presence of minor physical abnormalities (like weirdly-shaped ears). If a particular psychiatric disorder is more common in people with increased mutational load, that suggests it isn’t just a functional adaptation but some kind of failure mode of something or other. Schizophrenia and low-functioning autism are both linked to higher mutational load.

Another way is by trying to figure out what aspect of evolutionary strategy matches the occurrence of the disorder. Developmental psychologists talk about various life stages, each of which brings new challenges. For example, adrenarche (age 6-8) marks “the transition from early to middle childhood”, when “behavioral plasticity and heightened social learning go hand in hand with the expression of new genetic influences on psychological traits such as aggression, prosocial behavior, and cognitive skills” and children receive social feedback “about their attractiveness and competitive ability”. More obviously, puberty marks the expression of still other genetic influences and the time at which young people start seriously thinking about sex. So if various evolutionary adaptations to deal with mating suddenly become active around puberty, and some mental disorder always starts at puberty, that provides some evidence that the mental disorder might be related to an evolutionary adaptation for dealing with mating. Or, since a staple of evo psych is that men and women pursue different reproductive strategies, if some psychiatric disease is twice as common in women (eg depression) or five times as common in men (eg autism), then that suggests it’s correlated with some strategy or trait that one sex uses more than the other.

This is where Del Giudice ties in the life history framework. If some psychiatric disease is more common in people who otherwise seem to be pursuing some life strategy, then maybe it’s related to that strategy. Either it’s another name for that strategy, or it’s another name for an extreme version of that strategy, or it’s a failure mode of that strategy, or it’s associated with some trait or adaptation which that strategy uses more than others do. By determining the association of disorders with certain life strategies, we can figure out what adaptive trait they’re piggybacking on, and from there we can reverse engineer them and try to figure out what went wrong.

This is a much more well-thought-out and orderly way of thinking about psychiatric disease than anything I’ve ever seen anyone else try. How does it work?

Unclear. Psychiatric disorders really resist being put into this framework. For example, some psychiatric disorders have a u-shaped curve regarding childhood quality – they are more common both in people with unusually deprived childhoods and people with unusually good childhoods. Many anorexics are remarkably high-functioning, so much so that even the average clinical psychiatrist takes note, but others are kind of a mess. Autism is classically associated with very low IQ and with bodily asymmetries that indicate high mutational load, but a lot of autistics have higher-than-normal IQ and minimal bodily asymmetry. Schizophrenia often starts in a very specific window between ages 18 and 25, which sounds promising for a developmental link, but a few cases will start at age 5, or age 50, or pretty much whenever. Everything is like this. What is a rational, order-loving evolutionary psychologist supposed to do?

Del Giudice bites the bullet and says that most of our diagnostic categories conflate different conditions. The unusually high-functioning anorexics have a different disease than the unusually low-functioning ones. Low IQ autism with bodily asymmetries has a different evolutionary explanation than high IQ autism without. In some cases, he is able to marshal a lot of evidence for distinct clinical entities. For example, most cases of OCD start in adulthood, but one-third begin in early childhood instead. These early OCD cases are much more likely to be male, more likely to have high conscientiousness, more likely to co-occur with autistic traits, and have a different set of obsessions focusing on symmetry, order, and religion (my own OCD started in very early childhood and I feel called out by this description). Del Giudice says these are two different conditions, one of which is associated with pathogen defense and one of which is associated with a slow life strategy.

Deep down, psychiatrists know that we have not really subdivided the space of mental disorders very well. Every year a new study comes out purporting to have discovered the three types of depression, or the four types of depression, or the five types of depression, or some other number of types of depression that some scientist thinks she has discovered. Often these are given explanatory power, like “number three is the one that doesn’t respond to SSRIs”, or “1 and 2 are biological; 3, 4, and 5 are caused by life events”. All of these seem equally plausible, so given that they all say different things I tend to ignore all of them. So when del Giudice puts depression under his spotlight and finds it subdivides into many different subdisorders, this is entirely fair. Maybe we should be concerned if he didn’t find that.

But part of me is still concerned. If evo psych correctly predicted the characteristics of the psychiatric disorders we observe, then we would count that as theoretical confirmation. Instead, it only works after you replace the psychiatric disorders we observe with another, more subtle set right on the threshold of observation. The more you’re allowed to diverge from our usual understanding, the more chance you have to fudge your results; the more different disorders you can divide things into, the more options you have for overfitting. Del Giudice’s new schema may well be accurate; it just makes it hard to check his accuracy.

On the other hand, reality has a surprising amount of detail. Every previous attempt to make sense of psychopathology has failed. But psychopathology has to make sense. So it must make sense in some complicated way. If you see what looks like a totally random squiggle on a piece of paper, then probably the equation that describes it really is going to have a lot of variables, and you shouldn’t criticize a many-variable equation as “overfitting”. There is a part of me that thinks this book is a beautiful example of what solving a complicated field would look like. You take all the complications, and you explain them by layering a bunch of different simple and reasonable things on top of one another. The psychiatry parts of Evolutionary Psychopathology: A Unified Approach do this. I don’t know if it’s all just epicycles, but it’s a heck of a good try.
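The squiggle point is easy to make concrete (my toy illustration, with made-up data): extra parameters are only “overfitting” relative to how complicated the underlying truth actually is. Fit polynomials of increasing degree to a line and to a genuine squiggle, judging each fit on held-out points:

```python
# Overfitting is relative to the truth's complexity (toy illustration).
import numpy as np
rng = np.random.default_rng(0)

x_train = rng.uniform(-1, 1, 60)
x_test = rng.uniform(-1, 1, 60)

def heldout_error(truth, degree):
    """Fit a degree-d polynomial to noisy training data from `truth`,
    then measure RMS error on held-out points."""
    y_train = truth(x_train) + rng.normal(0, 0.1, x_train.size)
    coeffs = np.polyfit(x_train, y_train, degree)
    return np.sqrt(np.mean((np.polyval(coeffs, x_test) - truth(x_test)) ** 2))

line = lambda x: x                     # a simple world
squiggle = lambda x: np.sin(8 * x)     # a genuinely complicated one

for degree in (1, 3, 11):
    print(f"degree {degree:2}: line error {heldout_error(line, degree):.3f}, "
          f"squiggle error {heldout_error(squiggle, degree):.3f}")
# In the line's world, eleven parameters just chase noise; in the
# squiggle's world, eleven parameters are what it takes to fit at all.
```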

I would encourage anyone with an interest in mental health and a tolerance for dense journal-style writing to read the psychiatry parts of this book. Whether or not the hypotheses are right, in the process of defending them it calls in such a wide array of evidence, from so many weird studies that nobody else would have any reason to think about, that it serves as a fantastic survey of the field from an unusual perspective. If you’ve ever wanted to know how many depressed people are reproducing (surprisingly many! about 90 – 100% as many as non-depressed people!) or what the IQ of ADHD people is (0.6 standard deviations below average; the people most of you see are probably from a high-functioning subtype) or how schizophrenia varies with latitude (triples as you move from the equator to the poles, but after adjusting for this darker-skinned people seem to have more, suggesting a possible connection with Vitamin D), this is the book for you.

III.

I want to discuss some political and social implications of this work. These are my speculations only; del Giudice is not to blame.

We believe that an abusive or deprived childhood can negatively affect people’s life chances. So far, we’ve cashed this out entirely in terms of brain damage. Children’s developing brains “can’t deal with the trauma” and so become “broken” in ways that make them a less functional adult. Life history theory offers a different explanation. Nothing is “broken”. Deprived children have just looked around, seen what the world is like, and rewired themselves accordingly on some deep epigenetic level.

I was reading this at the same time as the studies on preschool, and I couldn’t help noticing how well they fit together. The preschool studies were surprising because we expected them to improve children’s intelligence. Instead, they improved everything else. Why? This would make sense if the safe environment of preschool wasn’t “fixing” their “broken” brains, but pushing them to follow a slower life strategy. Stay in school. Don’t commit crimes. Don’t have kids while you’re still a teenager. This is exactly what we expect a push towards slow life strategies to do.

Life strategies even predict the “fade-out/fade-in” nature of the effects; the theory specifies that although aspects of life strategy may be set early on, they only “activate” at the appropriate developmental period. From page 93: “The social feedback that children receive in this phase [middle childhood]…may feed into the regulation of puberty timing and shape behavioral strategies in adolescence.”

Society has done a lot to try to help disadvantaged children. A lot of research has been gloomy about the downstream effects: none of the interventions raised anybody’s IQ, there are still lots of poor people around, and income inequality continues to increase. But maybe we’re just looking in the wrong place.

On a related note: a lot of intelligent, responsible, basically decent young men complain of romantic failure. Although the media has tried hard to make this look like some kind of horrifying desire to rape everybody because they believe they are entitled to whatever and whoever they want, the basic complaint is more prosaic: “I try to be a nice guy who contributes to society and respects others; how come I’m a miserable 25-year-old virgin, whereas every bully and jerk and frat bro I know is able to get a semi-infinite supply of sex partners whom they seduce, abuse, and dump?” This complaint isn’t imaginary; studies have shown that criminals are more likely to have lost their virginity earlier, that boys with more aggressive and dishonest behaviors have an earlier age of first sexual intercourse, and that women find men with dark triad traits more attractive. I used to work in a psychiatric hospital that served primarily adolescents with a history of violence or legal issues; most of them had had multiple sexual encounters by age fifteen; only half of MIT students in their late teens and early 20s have had sex at all.

Del Giudice’s work offers a framework by which to understand these statistics. Most MIT students are probably pursuing slow life strategies; most violent adolescents in psych hospitals are probably pursuing fast ones. Fast strategies activate a suite of traits designed for having sex earlier; slow life strategies activate a suite of traits designed for preventing early sex. There’s a certain logical leap here where you have to explain how, if an individual is trying very hard to have teenage sex, his mumble epigenetic mumble mechanism can somehow prevent this. But millions of very vocal people’s lived experiences argue that it can. The good news for these people is that they are adapted for a life strategy which in the past has consistently resulted in reproduction at some point. Maybe when they graduate with a prestigious MIT degree, they will get enough money and status to attract a high-quality slow-strategy mate, who can bear high-quality slow-strategy kids who produce many surviving grandchildren. I don’t know. This hasn’t happened to me yet. Maybe I should have gone to MIT.

Finally, the people who like to say that various things “serve as a justification for oppression” are going to have a field day with this one. Although del Giudice is too scientific to assign any moral weight to his life history strategies, it’s not that hard to import it.


Life strategies run the risk of reifying some of our negative judgments. If criminals are pursuing a hard-coded antagonistic-exploitative strategy, that doesn’t look good for rehabilitation. Likewise, if some people are pursuing creative-seductive strategies, that provides new force to the warning to avoid promiscuous floozies and stick to your own social class. In the extreme version of this, you could imagine a populism that claims to be fighting for the decent middle-class slow-strategy segment of the population against an antagonistic/exploitative underclass. The creative/seductive people are on thin ice – maybe they should start producing art that looks like something.

(it doesn’t help that this theory is distantly related to an earlier theory proposed by Canadian psychologist John Rushton, who added that black people are racially predisposed to fast strategies and Asians to slow strategies, with white people somewhere in the middle. Del Giudice mentions Rushton just enough that nobody can accuse him of deliberately covering up his existence, then hastily moves on.)

But aside from the psychological compellingness, this doesn’t make a lot of sense. We already know that antagonistic and exploitative people exist in the world. All that life history theory does is exactly what progressives want to do: provide an explanation that links these qualities to childhood deprivation, or to dangerous environments where they may be the only rational choice. Sure, you would have to handwave away the genetic aspect, but you’re going to have to be handwaving away some genetics to make this kind of thing work no matter what, and life history theory makes this easier rather than harder. It also provides some testable hypotheses about what aspects of childhood deprivation we might want to target, and what kind of effects we might expect such interventions to have.

Apart from all this, I find life history strategy theory sort of reassuring. Until now, atheists have been denied the comfort of knowing God has a plan for them. Sure, they could know that evolution had a plan for them, but that plan was just “watch dispassionately to see whether they live or die, then adjust gene frequencies in the next generation accordingly”. In life history strategy theory, evolution – or at least your mumble epigenetic mumble mechanism – actually has a plan for you. Now we can be evangelical atheists who have a personal relationship with evolution. It’s pretty neat.

And I come at this from the perspective of someone who has failed at many things despite trying very hard, and also succeeded at others without even trying. This has been a pretty formative experience for me, and it’s seductive to be able to think of all of it as part of a plan. Literally seductive, in the sense of memetic evolution. Like that Hogwarts chart.

Read this book at your own risk; its theories will start creeping into everything you think.

OT116: Opensées Thread

1. I screwed up the WordPress that runs this blog pretty badly. The main effect on your side is that the mailing list disappeared and so no one’s getting email notifications. Don’t bother signing up again as I’m trying to find a way to restore the old list, after which any new ones will be deleted [EDIT: I think this is fixed now].

2. There are rationalist winter solstice celebrations this year in NYC, Boston, Oakland, and Seattle (possibly also elsewhere) at various times through December; see this link for more details. Warning: can be kind of weird.

3. Comment of the week is theredsheep on the new Eastern Orthodox schism, and on the ecclesiastical link between the Patriarch of Constantinople and the USA.


Book Review: The Mind Illuminated

I.

The Mind Illuminated is a guide to Buddhist meditation by Culadasa, aka John Yates, a Buddhist meditation teacher who is also a neuroscience PhD. At this point I would be more impressed to meet a Buddhist meditation teacher who wasn’t a neuroscience PhD. If I ever teach Buddhist meditation, this is going to be my hook. “Come learn advanced meditation techniques with Scott Alexander, whose lack of a neuroscience PhD gives him a unique perspective that combines ancient wisdom with a lack of modern brain science.” I think the world is ready for someone to step into this role. But Culadasa is not that person, and The Mind Illuminated is not that book.

I am trying not to read too many books on spiritual practices until I’m ready to practice some spirituality. I made an exception for TMI because lots of people recommended it to me for its description of how the brain works. This seems like the sort of thing that Buddhist meditation teachers who are also neuroscientists could have insight on, so I decided to check it out.

Tradition divides meditation into two parts: concentration meditation, where you sharpen and control your focus, versus insight meditation, where you investigate the nature of perception and reality. TMI follows a long tradition of focusing on concentration meditation, with the assumption that insight meditation will become safer and easier once you’ve mastered concentration, and maybe partly take care of itself. Its course divides concentration meditation into ten stages. Early stages contain basic tasks like setting up a practice, focusing on the breath, and overcoming distractibility. Later stages are more interesting; the ninth stage is learning how to calm the intensity of your meditative joy; apparently without special techniques “overly intense joy” becomes a big problem.

I usually hate meditation manuals, because they sound like word salad. “One attains joy by combining pleasure with happiness. Pleasure is a state of bliss which occurs when one concentrates focus on the understanding of awareness. Happiness is a state of joy that occurs when one focuses concentration on the awareness of understanding. By focusing awareness on bliss, you can increase the pleasure of understanding, which in turn causes concentration to be pleasant and joy to be blissful, and helps you concentrate on understanding your awareness of happiness about the bliss of focus.” At some point you start thinking “Wait, were all the nouns in that paragraph synonyms for each other?”

Culadasa avoids this better than most people. Whenever he introduces a term, he puts it in bolded italicized letters, and includes it in a glossary at the back. He tries to stick to multiple-word phrases that help clarify the concept, like “bliss of physical pliancy” or “meditative joy”, instead of just calling one thing “joy” and the other thing “bliss” and hoping you remember which is which. He includes a section on what he means by distinguishing “awareness” from “attention”, and admits that some of these are tough choices that do not necessarily cooperate with the spirit of the English language. And his division of the material into stages helps ensure you’re not reading a term until you’re somewhere around the point of personally experiencing the quality being discussed.

This is characteristic of the level of care taken in this book, which despite its unfortunate acronym does a good job of presenting just the right amount of information. For example, when people say “meditate on the breath”, I can only do this for a little while until I notice that the breath doesn’t really exist as a specific object you can concentrate on. Really there are just a bunch of disconnected sensations changing at every moment. What do you concentrate on? I had previously dismissed this as one of several reasons why obsessive-compulsive people shouldn’t do meditation, but TMI describes exactly this issue, says that it is normal and correct to worry about it, and prescribes solutions: concentrate on the disconnected sensations of the breath in whatever way feels easiest for the first few stages, and once you’ve increased awareness to the point where you can notice each subpart of the breath individually, do that.

II.

TMI also solves a whole slew of my obsessive questions and concerns with its “attention vs. awareness” dichotomy.

I had always been confused by instructions like “concentrate on the breath until you feel joy, then notice the joy”. Usually what would happen was: I would concentrate on the breath, ask myself “am I feeling joy yet?”, spend some time trying to figure this out, realize my attention had deviated from the breath, put my attention back on the breath, then feel bad because I wasn’t checking to see if I was feeling joy or not. How could I both have 100% of my attention on the breath, but also be checking my joy? If I came up with the policy “check once per minute for joy, then go back to the breath”, how would I avoid checking arbitrarily often whether it felt like a minute had gone by? This was another issue I just dismissed as “maybe meditation is not for obsessive-compulsive people”.

But TMI distinguishes between “attention” (sometimes “focused attention”) as the one thing in the foreground of your brain, and awareness (sometimes “peripheral awareness”) as the potentially many things in the background of your brain. Think of it working the same way as central vs. peripheral vision. When given instructions like “concentrate on the breath until you feel joy, then notice the joy”, you should be focusing your entire attention on the breath, but potentially noticing joy in your peripheral awareness. These instructions are no more contradictory than “look at this dot on the wall straight ahead, but notice if a dog runs past”.

The book urges meditators to avoid a state of hyperfocus in which they are so intent on the breath that they would not notice the house falling down around them. It says this is a trap that will not build the proper habits of mind to continue to higher stages and do insight meditation later. It recommends instead a form of practice in which meditators, while keeping their attention on the breath, are constantly monitoring both for external events like barking dogs or the house falling, and for internal events like feeling hungry or having thoughts. This last one sort of makes me want to scream: how can I monitor whether or not I am having thoughts without thinking about it, in which case the answer is always ‘yes’? But this is exactly the kind of paradox that the attention/awareness dichotomy is supposed to overcome. You can keep attention on the breath, notice a thought arising in the periphery of your awareness, and gently note it and push it away, all without shifting attention.

Culadasa is very excited about this:

One great example of [a new perspective] is the distinction I make in this book between attention and awareness. Despite hundreds of thousands of meditators practicing over millennia, it has never before been clearly conceptualized that the ordinary mind has two distinct ways of “knowing”, even though these different ways of knowing have so much to do with achieving the goals of meditation. However, cognitive psychology and neuroscience have recently shown that there are two distinctly different kinds of knowing that involve completely different parts of the brain. This is a finding that deeply informs new ways of practicing meditation and interpreting our meditation experiences, from beginner to adept. This is only one example, but the point should be obvious: meditation can guide and inform neuroscience, and neuroscience can do the same for meditation.

I would usually be pretty reluctant to propose that hundreds of thousands of meditators practicing over millennia had all just missed something really important. And I have to admit that in the two or three test meditations I have done since reading this, I have had as much trouble as ever with these issues, and don’t notice an attention/awareness distinction that becomes obvious now that I have the terms I need to understand it. But realistically maybe something like this has to be true for most discussion about meditation to make sense at all.

III.

TMI gives its model of how the mind works in six interludes distributed among the chapters on meditation advice.

It begins with a startling claim that mental time is granular, and only one item can be in consciousness per granule-moment. The seven main types of items that can occupy a moment of consciousness are sight, sound, smell, taste, touch, thought, and a “binding moment” that combines aspects of the previous six. Each moment of consciousness is completely static. The only reason things seem to move or thoughts seem to flow is because the moments of consciousness are moving from moment to moment faster than you can detect, like a movie which flips from still frame to still frame so quickly that it gets perceived as continuous action. Culadasa also compares it to a “string of beads”, with each bead being a particular kind of moment (sight, sound, etc).

There are never two things in consciousness at the same time. If you think there are, that’s either because your consciousness is switching back and forth from thing to thing so quickly that you can’t follow it, or because your consciousness is perceiving a “binding moment” that presents a single aspect including both of those things. For example, if you see a cat, and you hear a meow, you might experience a “binding moment” in which you think you hear the cat meowing, although really what has happened is SIGHT:CAT — SOUND:MEOW — BINDING:(CAT, MEOW).
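For concreteness, here is the cat/meow example as a literal data structure (my encoding; the book contains no code, and only the seven-types taxonomy and the example are taken from it):

```python
# "String of beads" model of consciousness, taken literally (my toy
# encoding, not the book's). Each moment holds exactly one content.
from dataclasses import dataclass

@dataclass
class Moment:
    kind: str        # sight, sound, smell, taste, touch, thought, binding
    content: object  # one item only; never two per moment

stream = [
    Moment("sight", "cat"),
    Moment("sound", "meow"),
    # A binding moment is still a single moment; its one content is a
    # combination built from the preceding moments.
    Moment("binding", ("cat", "meow")),
]

for t, m in enumerate(stream):
    print(f"t={t}: {m.kind.upper()}:{m.content}")
# Run fast enough, the discrete beads are experienced as the continuous
# "I hear the cat meowing."
```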

This sounds to me like it completely reverses the point made in the attention/awareness dichotomy, where you can be attentive to one thing but aware of many others at the same time. After all, if consciousness can only contain one thing at a time, what room is there for peripheral awareness? Culadasa states that each individual moment is either a moment of attention, or a moment of awareness. Moments of awareness can contain many things:

For example, say you’re sitting on a cabin deck in the mountains, gazing out at the view. Each moment of visual awareness will include a variety of objects—mountains, trees, birds, and sky—all at the same time. Auditory moments of awareness will include all the various sounds that make up the audible background—birdsong, wind in the trees, a babbling brook, and so forth—again, all at the same time. On the other hand, moments of visual attention might be restricted just to the bird you’re watching on a nearby branch. Auditory attention might include only the sounds the birds are making. Even when your attention is divided among several things at once—perhaps you’re knitting or whittling a piece of wood while you sit—moments of attention are still limited to a small number of objects. Finally, binding moments of attention and binding moments of awareness take the content from the preceding sensory moments and combine them into a whole: “Sitting on the deck, looking out at the mountain, while carving a piece of wood.”

Now, let’s consider the second difference: the degree of mental processing in moments of awareness versus moments of attention. Individual moments of awareness provide information about a lot of things at once, but the information has only been minimally processed. The result is our familiar experience of peripheral awareness of many things in the background. However, these moments of awareness do include some simple interpretations of sense data. You may be aware that the sounds you hear are from “traffic,” or that the things in the background of your visual field are “trees.” These simple concepts help evaluate and categorize all that information, contributing to our understanding of the present context. Although these preliminary interpretations don’t usually lead to any kind of action, some part of this information is frequently referred to attention for more analysis. Other times—say, when the sound of traffic suddenly includes screeching tires—the information in peripheral awareness can trigger an automatic action, thought, or emotion, any of which can then become an object of attention.

This still seems strained, but I grudgingly admit it kind of works.

TMI builds on this idea to create the “mind-system model”, its explanation for what consciousness is and why we have it. In this model, there are many “subminds”. The book is a little vague on how many there are or what level of complexity we’re supposed to be imagining here, and whether they represent only the few most salient divisions (eg “the visual system”) or are more numerous and abstract (eg “the part of your brain that likes to play computer games”), but I get the impression it’s closer to the latter. These subminds usually do their own thing, but sometimes have conflicting agendas.

Consciousness is a neutral ground shared by all subminds:

Here’s the picture presented so far: every sub-mind belongs either to the unconscious sensory or unconscious discriminating mind. Each sub-mind performs its own specialized task independently of others, and all at the same time. Each can project content into consciousness, as well as initiate actions. Obviously, there’s enormous potential for conflict and inefficiency, if not total chaos. This is where consciousness fits into the picture: the conscious mind provides an “interface” that allows these unconscious sub-minds to communicate with each other and work together cooperatively. With all these unconscious sub-minds working independently and at the same time, the potential for conflict is enormous. The conscious mind is what allows them to work together cooperatively.

The conscious mind acts as a universal recipient of information. It can receive information from each and every separate, unconscious sub-mind. In fact, all conscious experience is simply an ongoing stream of moments of consciousness whose content has been projected into the conscious mind by unconscious sub-minds. Then, when information enters consciousness, it becomes immediately available to all the other sub-minds. Therefore, the conscious mind also serves as a universal source of information. Because the conscious mind is both a universal recipient and a universal source of information, all the unconscious sub-minds can interact with each other through the conscious mind.

As a helpful image, picture the whole mind-system as a kind of corporation. It is made up of different departments and their employees, each with distinct roles and responsibilities. These are the unconscious sub-minds. At the top of the corporate structure is the “boardroom,” or conscious mind. The diligent employees working in their separate departments produce reports, which get sent to the boardroom to be discussed further and perhaps acted on. In other words, the unconscious sub-minds send information up into the conscious mind. The conscious mind is simply a passive “space” where all the other minds can meet. In this “boardroom of the mind” metaphor, the conscious mind is where important activities of the mind-system get brought up, discussed, and decided on. One, and only one, sub-mind can present its information at a time, and that’s what creates single moments of consciousness. The object of consciousness during that moment becomes part of the current agenda, and is made simultaneously available to all the other sub-minds for further processing. In subsequent moments, they project the results of their further processing into consciousness, creating a discussion that leads to conclusions and decisions.

If this sounds familiar, it’s because as far as I can tell it’s a rebranding of Bernard Baars’ global workspace theory of consciousness. I like global workspace theory and have always considered it the most plausible solution to the easy problems of consciousness. I’m a little bit concerned that Culadasa never mentions global workspace theory in the book, and that I’ve never heard of any connection between global workspace theory and Buddhism before. Not really sure what to make of this.
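
For what it’s worth, the workspace idea is easy to caricature in code. Here’s a minimal toy sketch (mine, not Culadasa’s or Baars’; every name and number in it is made up) of sub-minds bidding to project content into a shared workspace, with one winner per moment broadcast back to every tuned-in sub-mind:

import random

class SubMind:
    def __init__(self, name, tuned_in=True):
        self.name = name
        self.tuned_in = tuned_in
        self.inbox = []  # broadcasts received back from consciousness

    def bid(self):
        # Each sub-mind offers some content with a salience score.
        return (random.random(), "report from " + self.name)

def moment_of_consciousness(subminds):
    # Only one sub-mind's content occupies any single moment.
    salience, content = max(sm.bid() for sm in subminds)
    for sm in subminds:
        if sm.tuned_in:  # on this model, meditation raises tuned_in
            sm.inbox.append(content)
    return content

minds = [SubMind("vision"), SubMind("hearing"), SubMind("planning", tuned_in=False)]
for _ in range(3):
    print(moment_of_consciousness(minds))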

TMI continues:

Just because information projected into consciousness becomes available to every sub-mind of the mind-system, that doesn’t mean they all receive it. It’s like a radio show: the show is being broadcast, but not everyone is tuning in to listen.

Meditation increases the degree to which individual subminds are tuned in to consciousness. Since the book will later say this is all a metaphor, I think a better way of framing this might be “increase the bandwidth of the connections between the individual subminds”. When someone says meditators are “more conscious” or have “higher awareness” than non-meditators, they mean that more sub-minds are tuned in to consciousness more closely at any given time.

This accomplishes what Culadasa calls “unification of mind”; with more bandwidth, the subminds are able to resolve their conflicting priorities and act more like a single unit. This can start out sort of ugly; there can be good reasons why some heavily repressed and traumatized subminds aren’t usually invited to the table, and the creation of new links between them and the global workspace feels from the inside like scary unconscious material welling up into the psyche. But this is part of the “negotiation” it takes for these subminds to unify; with enough meditation, the system will assimilate their insights and they will join the Borg like everyone else.

This isn’t enlightenment. Enlightenment is something else. TMI calls it a “cessation event”:

A cessation event is where unconscious sub-minds remain tuned in and receptive to the contents of consciousness, while at the same time, none of them project any content into consciousness. Then, consciousness ceases — completely. During that period, at the level of consciousness there is a complete cessation of mental fabrications of any kind — of the illusory, mind-generated world that otherwise dominates every conscious moment. This, of course, also entails a complete cessation of craving, intention, and suffering. The only information that tuned-in sub-minds receive during this event is the fact of a total absence.

What makes this the most powerful of all Insight experiences is what happens in the last few moments of consciousness leading up to the cessation. First, an object arises in consciousness that would normally produce craving. It can be almost anything. However, what happens next is quite unusual: the mind doesn’t respond with the habitual craving and clinging. Rather, it fully understands the object from the perspective of Insight: as a mental construct, completely “empty” of any real substance, impermanent, and a cause of suffering. This profound realization leads to the next and final moment of complete equanimity, in which the shared intention of all the unified sub-minds is to not respond. Because nothing is projected into consciousness, the cessation event arises. With cessation, the tuned-in sub-minds simultaneously realize that everything appearing in consciousness is simply the product of their own activity. In other words, they realize that the input they’re accustomed to receiving is simply a result of their own fabricating activities.

I usually hate theories that explain the brain based on subminds. They seem too easy, in the way anthropomorphizing is always too easy. Want to run marathons, but spend your time drinking beer instead? Just model yourself as having a marathon-running submind and a beer-drinking submind, and they’re fighting, and the beer-drinking submind is winning. Do this enough times and you’ll never figure out anything about hyperbolic discounting or reinforcement learning or any of the very important principles that govern what the brain actually does and which do not look like little people fighting inside of you. Your solutions will always look like some weird form of therapy based on starting a dialogue with the beer-drinking submind and convincing it that beer isn’t so good after all, which never works, and you’ll never get around to taking Adderall, which for some reason will cause all the little men inside your head to change their opinions to whatever you wanted in the first place.

For whatever reason, TMI’s mind-system model doesn’t bother me as much. Maybe it’s because he’s not trying to invent yet another new age psychotherapy to help fight procrastination. Maybe because it’s in the context of global workspace theory, which I already like. Or maybe it’s because the idea of modules and processes without enough bandwidth to connect to the global workspace sounds less anthropomorphic than little people who make you drink beer because they like beer.

IV.

This is a very optimistic book.

Buddhism started out with Theravada teachers saying it would take millions of lifetimes to reach enlightenment. Then the Mahayana and Vajrayana schools started saying maybe you could reach enlightenment in one lifetime, if you did everything right and worked very hard. Recently I’ve been reading works by modern teachers like Daniel Ingram and Vinay Gupta, who compare the amount of work involved in enlightenment to the amount of work in an MD or PhD – maybe five years? But Culadasa states that “for householders who practice properly, it’s possible to master the Ten Stages within a few months or years”, adding in a footnote:

The Dalai Lama has said “If one knows the nature, order, and distinctions of the levels explained above without error and cultivates calm abiding, one can easily generate faultless meditative stabilization in about a year.” When I first began teaching, I also believed that with diligent practice most people should be able to master all Ten Stages in less than a year. I have since learned that this is not realistic in terms of most people, and making such a flat pronouncement can be discouraging for those who have been practicing much longer without attaining that mastery.

So fine, only cool people can get mastery in less than a year. Still, this is a dramatic promise. But then why are there so many cultures where monks study their entire lives in monasteries? Monks have big advantages over the sort of “householder” meditators Culadasa is talking about – they can meditate every waking hour, and they have access to the best teachers. Surely they should all get enlightened within a few months? I have read some work on the idea of “multiple paths” and “endless dharma gates” which suggests that what is ordinarily called enlightenment is just the first and most obvious step on an endless process of personal exploration. But when I read about historical Buddhist culture, it still seems like a majority of monks at any given time are unenlightened, including those who have been at the monastery many years.

Maybe all of this Western rationality and efficiency really is that great, and by cutting out the chaff modern people can get enlightenment much faster than the ancients could? Is this true in any other field? I get the impression that modern schoolchildren still master subjects like geometry or Latin at about the same age that the medievals would, though I could be wrong about this. Maybe Culadasa was right when he claimed his book includes important distinctions that hundreds of thousands of meditators working for thousands of years have missed. Maybe the past was just stupid and anybody moderately competent can make order-of-magnitude improvements. I don’t know. It seems like a pretty big claim, though.

(Or maybe this is overcomplicating things. It’s not necessarily contradictory to say that a talented person, practicing an hour a day, could go from “zero math” to “able to solve calculus problems” in a year, but also that the average student has been studying math for ten years and can’t solve calculus problems.)

TMI also feels optimistic in comparison to another meditation book I reviewed, Mastering the Core Teachings of the Buddha. Its author, Daniel Ingram, counts himself as part of the same “pragmatic dharma” movement as Culadasa, and the two of them have occasionally cooperated on various things and taught together. But Ingram stresses that meditation and enlightenment do not provide many of the worldly gains their advocates promise, and in many cases can make things worse. He warns of what he called “the Dark Night”, a tendency for people midway along the path of meditation to shatter their psyches and fall into states of profound depression and agitation.

Culadasa has a rosier view of both points. He believes that the “unification of mind” produced by meditation will have its common-sense result of reducing internal conflict and improving “willpower”; it will also “overcome all harmful emotions and behavior”, leaving you with few things to worry about except the looming specter of excessive joy.

As for the Dark Night, he doesn’t like the term, and only gives it one sentence in the main text of the book plus two pages in an appendix. The two pages reassure us that enough practice in concentration meditation serves as a prophylactic:

One of the great advantages of samatha [concentration meditation] is that it makes it easier to confront the Insights into impermanence, emptiness, the pervasive nature of suffering, and the insubstantiality of the Self that produce Awakening. Without samatha, these challenging Insights have the potential to send a practitioner spiraling into a “dark night of the soul”.

Since the whole book is about samatha meditation, and treats everything else as something that happens naturally while you’re doing samatha, this makes it sound pretty minimal; just do what you would be doing anyway and you’ll be fine. This is a big difference from Ingram, who thinks that explaining the risk of the Dark Night and how to get through it is one of the most important jobs of a meditation teacher. Culadasa endorses this difference:

Have I seen in my students anything remotely resembling a “dark night” as defined above? Absolutely not. Nor can I recall ever having seen the sorts of extreme experiences of the dukkha nanas that are appearing so frequently in these online discussions.

There seems to be something of a consensus in the relevant community that Culadasa’s type of practice, which is called “wet” (ie includes concentration and jhanas), may be less likely to produce these kinds of problems than the so-called “dry insight” that Ingram discusses, and that if you’re doing everything right maybe you shouldn’t worry about it. Shinzen Young is another meditation teacher who moves in the same circles as Ingram and Culadasa. I found his perspective on this the most informative:

Historically it is not a term from the Buddhist meditative tradition but rather from the Roman Catholic meditative tradition. (Of course, there’s nothing wrong with using Christian terms for Buddhist experiences but…). One must clearly define what one means by a “Dark Night” within the context of Buddhist experience.

It is certainly the case that almost everyone who gets anywhere with meditation will pass through periods of negative emotion, confusion, disorientation, and heightened sensitivity to internal and external arisings. It is also not uncommon that at some point, within some domain of experience, for some duration of time, things may get worse before they get better. The same thing can happen in psychotherapy and other growth modalities. For the great majority of people, the nature, intensity, and duration of these kinds of challenges is quite manageable. I would not refer to these types of experiences as “Dark Night.”

I would reserve the term for a somewhat rarer phenomenon. This phenomenon, within the Buddhist tradition, is sometimes referred to as “falling into the Pit of the Void.” It entails an authentic and irreversible insight into Emptiness and No Self. What makes it problematic is that the person interprets it as a bad trip. Instead of being empowering and fulfilling, the way Buddhist literature claims it will be, it turns into the opposite. In a sense, it’s Enlightenment’s Evil Twin. This is serious but still manageable through intensive, perhaps daily, guidance under a competent teacher. In some cases it takes months or even years to fully metabolize, but in my experience the results are almost always highly positive. For details, see The Five Ways manual pages 97-98.

This whole Dark Night discussion reminds me of a certain Zen Koan. Although the storyline of this koan is obviously contrived, it does contain a deep message. Here’s how the koan goes: A monk is walking on a precipitous path and slips but is able to grab onto a branch by his teeth. A person standing below, recognizing the monk as an enlightened master, asks him to describe Enlightenment. What should the monk do? As a teacher, he’s duty bound to speak, but as soon as he speaks, the consequences will be dire. It sounds like a lose/lose situation. If you were the monk, what would you do? That’s the koan.

If we don’t describe the possibility of Dark Night, then we leave people without a context should it occur. On the other hand, if we do discuss it, people get scared and assume it’s going to happen to them, even if we point out (as I just did), that it’s relatively infrequent. So the take-home message is:

1. Don’t worry, it’s probably not going to happen to you.
2. Even if it does, that’s not necessarily a problem.

It may require input from a teacher and time but once it’s integrated, you’ll be a very, very happy camper.

I think it would be a good thing if people lighten up around this issue. This may help (see attached cartoon).

From this I gather that Culadasa is closer to the mainstream on this issue (also, that enlightenment does not help the mind overcome a propensity to dad jokes).

There’s a lot of drama over this issue, and if you want you can find a bunch of really enlightened and compassionate pot-shots that the different teachers are taking at each other over their respective positions. The only insight I can add to this comes from my medical experience, where I notice a very similar phenomenon in how many side effects people attribute to certain drugs. For example, although some people will say SSRI discontinuation syndrome is toxic and scary and omnipresent and a good reason never to use SSRIs at all, my experience in five years of taking dozens of people on and off various SSRIs is that I’ve never seen it happen beyond an occasional mild headache if the drugs are tapered properly. I know there are studies that disagree with my experience, but that is definitely my experience. Part of it is probably a difference in what kind of expectations you create (in yourself or your patients/students). Another part is probably a difference in what your patients/students communicate to you. A third part is probably actual differences in the way you prescribe or teach. All of these combined can be pretty powerful.

But the biggest difference I notice is that a “serious” side effect is one that you (or one of your patients) have had, and a “minor” side effect is one that you haven’t. If a certain drug works great for 95% of people, but causes a month of constant vomiting for 5%, then a doctor who’s used it a few times and always gotten the great results will think of it as great (plus a rare side effect that doesn’t cause lasting damage), and a patient who has been vomiting constantly for a month will think of it as an evil poison which should never have been made legal (even though most people get lucky and don’t have any problems).

Shinzen says that meditation can definitely cause something terrible called “falling into the Pit of the Void”, but that it usually doesn’t happen, and that with daily guidance you will get better after a few months or years, and so basically it’s not a big problem. My guess is that the person who has been trapped in some kind of weird bad trip for several months thinks of it as a very big problem, and wants everybody to warn about it all the time. All of this closely matches the way I’ve seen doctors and patients talk about medication side effects. I’m not sure there’s a difference here except the hard-to-navigate first-person difference of “did it happen to me?”

But overall Culadasa’s optimism seems justified here. Maybe it’s the only approach to this topic that seems justified. Imagine if there were something you could do an hour a day for a year or two, which would win you more willpower and a release from all suffering, with fewer side effects than the average SSRI. Why aren’t we all doing it?

For more information, you can also check out Culadasa’s website and the The Mind Illuminated subreddit.

Is Science Slowing Down?

[This post was up a few weeks ago before getting taken down for complicated reasons. They have been sorted out and I’m trying again.]

Is scientific progress slowing down? I recently got a chance to attend a conference on this topic, centered around a paper by Bloom, Jones, Reenen & Webb (2018).

BJRW identify areas where technological progress is easy to measure – for example, the number of transistors on a chip. They measure the rate of progress over the past century or so, and the number of researchers in the field over the same period. For example, here’s the transistor data:

This is the standard presentation of Moore’s Law – the number of transistors you can fit on a chip doubles about every two years (eg grows by 35% per year). This is usually presented as an amazing example of modern science getting things right, and no wonder – it means you can go from a few thousand transistors per chip in 1971 to many billions today, with the corresponding increase in computing power.
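
(To check the arithmetic: a steady growth rate g doubles output every ln 2 / ln(1 + g) years, so 35% per year works out to a doubling every 2.3 years or so, which is where the “about every two years” comes from.)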

But BJRW have a pessimistic take. There are eighteen times more people involved in transistor-related research today than in 1971. So if in 1971 it took 1000 scientists to increase transistor density 35% per year, today it takes 18,000 scientists to do the same task. So apparently the average transistor scientist is eighteen times less productive today than fifty years ago. That should be surprising and scary.

But isn’t it unfair to compare percent increase in transistors with absolute increase in transistor scientists? That is, a graph comparing absolute number of transistors per chip vs. absolute number of transistor scientists would show two similar exponential trends. Or a graph comparing percent change in transistors per year vs. percent change in number of transistor scientists per year would show two similar linear trends. Either way, there would be no problem and productivity would appear constant since 1971. Isn’t that a better way to do things?

A lot of people asked paper author Michael Webb this at the conference, and his answer was no. He thinks that intuitively, each “discovery” should decrease transistor size by a certain amount. For example, if you discover a new material that allows transistors to be 5% smaller along one dimension, then you can fit 5% more transistors on your chip whether there were a hundred there before or a million. Since the relevant factor is discoveries per researcher, and each discovery is represented as a percent change in transistor size, it makes sense to compare percent change in transistor size with absolute number of researchers.
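
To make that intuition concrete, here is a minimal simulation with invented numbers (nothing below comes from the paper; the parameters are tuned only to land near 35% per year):

def density_after(years, researchers, discoveries_per_researcher=0.6,
                  shrink_per_discovery=0.0005):
    density = 1.0
    for _ in range(years):
        discoveries = researchers * discoveries_per_researcher
        # Each discovery multiplies density by a fixed factor, so a constant
        # stream of discoveries yields constant *percent* growth.
        density *= (1 + shrink_per_discovery) ** discoveries
    return density

print(density_after(1, researchers=1000))  # ~1.35, i.e. 35% annual growth
print(density_after(2, researchers=1000))  # ~1.82, roughly a doubling

On this model, a constant number of researchers with constant discoveries-per-head produces a constant growth rate, so the only way an eighteen-fold increase in researchers can leave the growth rate unchanged is if discoveries per researcher fell by about the same factor.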

Anyway, most other measurable fields show the same pattern of constant progress in the face of exponentially increasing number of researchers. Here’s BJRW’s data on crop yield:

The solid and dashed lines are two different measures of crop-related research. Even though the crop-related research increases by a factor of 6-24x (depending on how it’s measured), crop yields grow at a relatively constant 1% rate for soybeans, and an apparently declining 3%-ish rate for corn.

BJRW go on to prove the same is true for whatever other scientific fields they care to measure. Measuring scientific progress is inherently difficult, but their finding of constant or log-constant progress in most areas accords with Nintil’s overview of the same topic, which gives us graphs like

…and dozens more like it. And even when we use data that are easy to measure and hard to fake, like number of chemical elements discovered, we get the same linearity:

Meanwhile, the increase in researchers is obvious. Not only is the population increasing (by a factor of about 2.5x in the US since 1930), but the percent of people with college degrees has quintupled over the same period. The exact numbers differ from field to field, but orders of magnitude increases are the norm. For example, the number of people publishing astronomy papers seems to have dectupled over the past fifty years or so.

BJRW put all of this together into total number of researchers vs. total factor productivity of the economy, and find…

…about the same as with transistors, soybeans, and everything else. So if you take their methodology seriously, over the past ninety years, each researcher has become about 25x less productive in making discoveries that translate into economic growth.

Participants at the conference had some explanations for this, of which the ones I remember best are:

1. Only the best researchers in a field actually make progress, and the best researchers are already in a field, and probably couldn’t be kept out of the field with barbed wire and attack dogs. If you expand a field, you will get a bunch of merely competent careerists who treat it as a 9-to-5 job. A field of 5 truly inspired geniuses and 5 competent careerists will make X progress. A field of 5 truly inspired geniuses and 500,000 competent careerists will make the same X progress. Adding further competent careerists is useless for doing anything except making graphs look more exponential, and we should stop doing it. See also Price’s Law Of Scientific Contributions.

2. Certain features of the modern academic system, like underpaid PhDs, interminably long postdocs, endless grant-writing drudgery, and clueless funders have lowered productivity. The 1930s academic system was indeed 25x more effective at getting researchers to actually do good research.

3. All the low-hanging fruit has already been picked. For example, element 117 was discovered by an international collaboration who got an unstable isotope of berkelium from the single accelerator in Tennessee capable of synthesizing it, shipped it to a nuclear reactor in Russia where it was attached to a titanium film, brought it to a particle accelerator in a different Russian city where it was bombarded with a custom-made exotic isotope of calcium, sent the resulting data to a global team of theorists, and eventually found a signature indicating that element 117 had existed for a few milliseconds. Meanwhile, the first modern element discovery, that of phosphorus in the 1670s, came from a guy looking at his own piss. We should not be surprised that discovering element 117 needed more people than discovering phosphorus.

Needless to say, my sympathies lean towards explanation number 3. But I worry even this isn’t dismissive enough. My real objection is that constant progress in science in response to exponential increases in inputs ought to be our null hypothesis, and that it’s almost inconceivable that it could ever be otherwise.

Consider a case in which we extend these graphs back to the beginning of a field. For example, psychology started with Wilhelm Wundt and a few of his friends playing around with stimulus perception. Let’s say there were ten of them working for one generation, and they discovered ten revolutionary insights worthy of their own page in Intro Psychology textbooks. Okay. But now there are about a hundred thousand experimental psychologists. Should we expect them to discover a hundred thousand revolutionary insights per generation?

Or: the economic growth rate in 1930 was 2% or so. If it scaled with number of researchers, it ought to be about 50% per year today with our 25x increase in researcher number. That kind of growth would mean that the average person who made $30,000 a year in 2000 should make $50 million a year in 2018.
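
(Checking the compounding: $30,000 × 1.5^18 ≈ $44 million, which is where the roughly $50 million figure comes from.)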

Or: in 1930, life expectancy at 65 was increasing by about two years per decade. But if that scaled with number of biomedicine researchers, that should have increased to ten years per decade by about 1955, which would mean everyone would have become immortal starting sometime during the Baby Boom, and we would currently be ruled by a deathless God-Emperor Eisenhower.

Or: the ancient Greek world had about 1% the population of the current Western world, so if the average Greek was only 10% as likely to be a scientist as the average modern, there were only 1/1000th as many Greek scientists as modern ones. But the Greeks made such great discoveries as the size of the Earth, the distance of the Earth to the sun, the prediction of eclipses, the heliocentric theory, Euclid’s geometry, the nervous system, the cardiovascular system, etc, and brought technology up from the Bronze Age to the Antikythera mechanism. Even adjusting for the long time scale to which “ancient Greece” refers, are we sure that we’re producing 1000x as many great discoveries as they did? If we extended BJRW’s graph all the way back to Ancient Greece, adjusting for the change in researchers as civilizations rise and fall, wouldn’t it keep the same shape as it does for this century? Isn’t the real question not “Why isn’t Dwight Eisenhower immortal god-emperor of Earth?” but “Why isn’t Marcus Aurelius immortal god-emperor of Earth?”

Or: what about human excellence in other fields? Shakespearean England had 1% of the population of the modern Anglosphere, and presumably even fewer than 1% of the artists. Yet it gave us Shakespeare. Are there a hundred Shakespeare-equivalents around today? This is a harder problem than it seems – Shakespeare has become so venerable with historical hindsight that maybe nobody would acknowledge a Shakespeare-level master today even if they existed – but still, a hundred Shakespeares? If we look at some measure of great works of art per era, we find past eras giving us far more than we would predict from their population relative to our own. This is very hard to judge, and I would hate to be the guy who has to decide whether Harry Potter is better or worse than the Aeneid. But still? A hundred Shakespeares?

Or: what about sports? Here’s marathon records for the past hundred years or so:

In 1900, there were only two local marathons (eg the Boston Marathon) in the world. Today there are over 800. Also, the world population has increased by a factor of five (more than that in the East African countries that give us literally 100% of top male marathoners). Despite that, progress in marathon records has been steady or declining. Most other Olympics sports show the same pattern.

All of these lines of evidence lead me to the same conclusion: constant growth in response to exponentially increasing inputs is the null hypothesis. If it weren’t, we should expect 50% year-on-year GDP growth, easily discovered immortality, and the like. Nobody expected that before reading BJRW, so we shouldn’t be surprised when BJRW provide a data-driven model showing it isn’t happening. I realize this in itself isn’t an explanation; it doesn’t tell us why researchers can’t maintain a constant level of output as measured in discoveries. It sounds a little like “God wouldn’t design the universe that way”, which is a kind of suspicious line of argument, especially for atheists. But it at least shifts us from a lens where we view the problem as “What three tweaks should we make to the graduate education system to fix this problem right now?” to one where we view it as “Why isn’t Marcus Aurelius immortal?”

And through such a lens, only the “low-hanging fruits” explanation makes sense. Explanation 1 – that progress depends only on a few geniuses – isn’t enough. After all, the Greece-today difference is partly based on population growth, and population growth should have produced proportionately more geniuses. Explanation 2 – that PhD programs have gotten worse – isn’t enough. There would have to be a worldwide monotonic decline in every field (including sports and art) from Athens to the present day. Only Explanation 3 holds water.

I brought this up at the conference, and somebody reasonably objected – doesn’t that mean science will stagnate soon? After all, we can’t keep feeding it an exponentially increasing number of researchers forever. If nothing else stops us, then at some point, 100% (or the highest plausible amount) of the human population will be researchers, we can only increase as fast as population growth, and then the scientific enterprise collapses.

I answered that the Gods Of Straight Lines are more powerful than the Gods Of The Copybook Headings, so if you try to use common sense on this problem you will fail.

Imagine being a futurist in 1970 presented with Moore’s Law. You scoff: “If this were to continue only 20 more years, it would mean a million transistors on a single chip! You would be able to fit an entire supercomputer in a shoebox!” But common sense was wrong and the trendline was right.

“If this were to continue only 40 more years, it would mean ten billion transistors per chip! You would need more transistors on a single chip than there are humans in the world! You could have computers more powerful than any today, that are too small to even see with the naked eye! You would have transistors with like a double-digit number of atoms!” But common sense was wrong and the trendline was right.

Or imagine being a futurist in ancient Greece presented with world GDP doubling time. Take the trend seriously, and in two thousand years, the future would be fifty thousand times richer. Every man would live better than the Shah of Persia! There would have to be so many people in the world you would need to tile entire countries with cityscape, or build structures higher than the hills just to house all of them. Just to sustain itself, the world would need transportation networks orders of magnitude faster than the fastest horse. But common sense was wrong and the trendline was right.

I’m not saying that no trendline has ever changed. Moore’s Law seems to be legitimately slowing down these days. The Dark Ages shifted every macrohistorical indicator for the worse, and the Industrial Revolution shifted every macrohistorical indicator for the better. Any of these sorts of things could happen again, easily. I’m just saying that “Oh, that exponential trend can’t possibly continue” has a really bad track record. I do not understand the Gods Of Straight Lines, and honestly they creep me out. But I would not want to bet against them.

Grace et al’s survey of AI researchers shows they predict that AIs will start being able to do science in about thirty years, and will exceed the productivity of human researchers in every field shortly afterwards. Suddenly “there aren’t enough humans in the entire world to do the amount of research necessary to continue this trend line” stops sounding so compelling.

At the end of the conference, the moderator asked how many people thought that it was possible for a concerted effort by ourselves and our institutions to “fix” the “problem” indicated by BJRW’s trends. Almost the entire room raised their hands. Everyone there was smarter and more prestigious than I was (also richer, and in many cases way more attractive), but with all due respect I worry they are insane. This is kind of how I imagine their worldview looking:

I realize I’m being fatalistic here. Doesn’t my position imply that the scientists at Intel should give up and let the Gods Of Straight Lines do the work? Or at least that the head of the National Academy of Sciences should do something like that? That Francis Bacon was wasting his time by inventing the scientific method, and Fred Terman was wasting his time by organizing Silicon Valley? Or perhaps that the Gods Of Straight Lines were acting through Bacon and Terman, and they had no choice in their actions? How do we know that the Gods aren’t acting through our conference? Or that our studying these things isn’t the only thing that keeps the straight lines going?

I don’t know. I can think of some interesting models – one made up of a thousand random coin flips a year has some nice qualities – but I don’t know.

I do know you should be careful what you wish for. If you “solved” this “problem” in classical Athens, Attila the Hun would have had nukes. Remember Yudkowsky’s Law of Mad Science: “Every eighteen months, the minimum IQ necessary to destroy the world drops by one point.” Do you really want to make that number ten points? A hundred? I am kind of okay with the function mapping number of researchers to output that we have right now, thank you very much.

The conference was organized by Patrick Collison and Michael Nielsen; they have written up some of their thoughts here.

Impending Survey Discussion Thread

Sorry, decreased blogging this week because of Thanksgiving.

But I am going to post a new SSC survey in a few weeks. Feel free to use this thread to tell me what you want – from questions you want to see, to methodology issues that bothered you on past surveys, to whatever.

Keep in mind I will probably have to ignore the overwhelming majority of suggestions here.


Links 11/18: MayflowURL

In 532, the Byzantines and Persians signed what they called The Perpetual Peace, so named because it was expected to last forever. It lasted eight years. After the ensuing war, the Byzantines and Persians, now less optimistic, named their new treaty The Fifty Year Peace. It lasted ten years.

Patrick Collison and Michael Nielsen on diminishing returns from science. Some of you have already seen my thoughts on this, but I’ll post them here in a week or two.

Wikipedia has a page on Armenia/Azerbaijan relations in the Eurovision Song Contest. Highlights include the time Azerbaijan’s secret police rounded up everyone who voted for Armenia, the time Armenia claimed Azerbaijan cut off the broadcast to prevent people from seeing Armenia winning, and accusations from Azerbaijani officials that vapid Armenian love song “Don’t Deny” was dog-whistling a point about the Armenian Genocide.

Everything You Know About State Education Rankings Is Wrong. Most rating systems rank state education success based on a combined measure which includes amount of money spent as a positive outcome, making it tautological to “prove” that more funding improves state performance. See also economists Stan Liebowitz and Matthew Kelly’s corrected ranking table, which also adjusts for some confounders.

The most significant Christian schism of the past five hundred years happened last month, when the Russian Orthodox Church severed ties with the Eastern Orthodox Patriarch of Constantinople due to an argument about Ukraine.

California may allow marijuana, and it may allow alcohol, but at least it’s taking a strong stance against cocktails that include CBD, for some reason.

Recent news in scientific publishing: two statisticians launch RESEARCHERS.ONE (site, Andrew Gelman blog post), a “souped-up Arxiv with pre- and post-publication review”. And Elsevier files a lawsuit forcing a Swedish ISP to ban Sci-Hub; the ISP complies but also bans Elsevier. Also: preregistration works.

Related?: A Chinese barbecue restaurant named itself The Lancet after a top medical journal, and is offering discounts for researchers based on the impact factor of the journals they’ve published in. (h/t Julia Galef)

Experimental archaeology is the practice of doing things we think ancient people might have done to learn more about the details. For example, the Trireme Trust built and rowed a functional Greek trireme to learn more about how triremes worked.

Researchers crack the brain’s code for storing faces (paper, news article), describing it as “a high-dimensional analogy of the familiar RGB code for colors, allowing realistic faces to be accurately decoded with…a small number of cells”.

The Alpine-Himalayan orogenic belt connects the Pyrenees, Alps, Carpathians, Caucasus, Zagros, Tian Shan, and Himalayan ranges.

In what might be the most impressive temper tantrum of all time, the Saudis, angry about Qatar’s support for regional enemy Iran, are planning to dig a giant canal to turn Qatar into an island.

Did you know there are still object-level arguments about libertarianism sometimes? It’s true! See Bryan Caplan’s delightfully named Optimality Vs. Fire. Another interesting Caplan: The Triumph Of Ayn Rand’s Worst Idea.

I am always a sucker for the “X as dril tweets” genre, so here is philosophers as dril tweets. EG:


If you want to see all of (someone’s idiosyncratic and dubious selection of what counts as) the rationality-related subreddits in one place, there’s now a Rationality Reddit Feed. Also, gwern has a subreddit now.

Sarah Kliff at Vox is trying to bring transparency to ER prices with a database of what each hospital’s fees are (though it doesn’t look like it’s the kind of transparency where you’re allowed to see the database, apparently for medical privacy law reasons). If you have a recent ER bill, you can submit here, or you can see some of Vox’s reporting on the issue here.

Related: if you missed your previous opportunity to write about effective altruism for Vox, they’re hiring another effective altruism writer/reporter. You can see some of the excellent work by their current EA reporter here.

Science disproves your intrusive thoughts: Most Initial Conversations Go Better Than People Think.

Scandal at meta-analysis producer the Cochrane Collaboration as board members resign en masse. The story seems to go like this: The Collaboration did a meta-analysis showing that HPV vaccines are safe and effective. Cochrane board member Peter Gøtzsche (previously featured here as author of my favorite study on the placebo effect) wrote a savage takedown in the British Medical Journal saying the HPV review did not meet Cochrane standards and should not have been published. The Collaboration’s Board was apparently angry that he took this dispute public and a bare majority voted to expel him. Then the other half of the board stepped down in protest. So much for the one organization we were previously able to trust 🙁

And another academic scandal: Eiko Fried and James Coyne are two of my favorite psychologists and crusaders for high standards in psychology. They’ve recently been having a bad time. As far as I can understand it, Coyne is (by his own admission) well known for being extremely blunt and not afraid of personal attacks on people he thinks deserve it. Fried wrote an article about how a climate of personal attacks and nastiness in the psychology community have gone too far, and most of his examples were of Coyne. Coyne wrote some things accusing Fried of tone policing, but also sued Fried for “cyberbullying” and spread rumors that he was “aligned with racism”. Now Fried has 100% won the lawsuit, the rumors against him have been debunked, various people have come out saying they were harassed by Coyne (and apparently there was also a case of “assault and battery”!) and various institutions Coyne is affiliated with have unaffiliated with him (or said they were never as affiliated as he claimed). I’m really disappointed in this, but it’s helped crystallize some things for me. First, that although cyberbullying is a big problem, mindlessly cracking down on it is dangerous for exactly the reasons shown here – a cyberbully trying to silence their victim by suing them for cyberbullying (and the “aligned with racism” slur is a parallel warning on the dangers of moral panics). And second, that complaints about “tone policing” can often be a smokescreen for just genuinely being a bad actor.

“Superpermutations” are strings that contain every possible permutation of some number of items as substrings. The field recently received a jolt when a proof of the lower bound of an important theorem was discovered to have been posted by an anonymous user on a 4chan thread about how many different ways you could watch anime episodes. Now in an equally weird twist of fate, the upper bound of the same theorem has been proven by sci-fi writer Greg Egan, author of Permutation City.

Let’s Fund (description, site) is a crowdfunding site for effective altruism that helps people discover or coordinate campaigns.

This article is called YouTubers Will Enter Politics And The Ones Who Do Are Probably Going To Win, but it focuses on Kim Kataguiri (age 22, the youngest person ever elected to Brazil’s Congress) and other right-wing YouTubers who won positions in the recent Brazilian elections.

The world’s new tallest statue is India’s Statue Of Unity, a 600-foot high (and impressively realistic) depiction of independence hero Sardar Patel.

In 1861, a Tokugawa-era author published the first Japanese book ever on the newly-contacted land of America, called Osanaetoki Bankokubanashi. Although beautifully illustrated, the content was a bit fanciful…

…and by “a bit fanciful”, I mean that this is a depiction of John Adams asking a mountain fairy to help avenge the death of his mother, who was eaten by a giant snake. I assumed the book had to be fake, but Kyoto University seems to endorse it as real. You can find more of Kapur’s commentary here and the rest of the book here.

Karl Friston, previously the subject of a bemused SSC post, is now the subject of an only-somewhat-bemused Wired story. The way this story presents the free energy principle makes it much more of an obvious match for control theory, so much that I’m wondering if I’m misunderstanding it. Related: some computational neuroscience principles used to make a curiosity-driven AI.

Mathematical proofs small enough to fit on Twitter: every odd integer is the difference of two squares.
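
(The tweet-sized proof, for the record: every odd integer is 2k + 1 for some integer k, and (k + 1)^2 - k^2 = 2k + 1.)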

From the subreddit: the most successful fraudster of all time may have been Jho Low, a financier who offered to manage Malaysia’s $4 billion sovereign wealth fund, took the $4 billion, and walked away with it.

Ever wonder why charities (and other organizations) that say they have enough funding but complain they can’t find enough good employees don’t just raise the salaries they’re offering until they can? Here’s an 80,000 Hours survey on the topic. The main insight is that if a group has 20 employees and can’t find a 21st, then if they want to raise the open position’s salary by X in order to attract more people, they need to raise all their existing employees’ salaries by X or those employees will reasonably complain they’re getting paid less for the same work. So the cost of raising the salary they’re offering for an empty position is less like X and more like 21X.
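
In toy numbers (mine, not the survey’s), the asymmetry looks like this:

employees = 20
raise_needed = 10_000  # extra salary it takes to attract the 21st hire
naive_cost = raise_needed                   # if only the new hire got it
true_cost = raise_needed * (employees + 1)  # everyone must get the raise
print(naive_cost, true_cost)  # 10000 vs 210000: X vs 21X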

Sorry, non-Californians, more on the CA ballot propositions – here’s a table of how the state voted on each vs. how SFers voted vs. how LAers voted vs. what the relevant newspapers endorsed. It looks like everyone is pretty much in alignment except the San Francisco Chronicle, which hates everything.

Many Indo-European languages use euphemisms for “bear”, sometimes several layers of euphemism, because of a fear that speaking the bear’s true name might summon it. The English word “bear” is a euphemism originally meaning “brown one”. Inside the quest to reconstruct the bear’s True Name. NB: do not read this article aloud or you might get eaten by bears.


OT115: Oberon Thread

This is the bi-weekly visible open thread (there are also hidden open threads twice a week you can reach through the Open Thread tab on the top of the page). Post about anything you want, but please try to avoid hot-button political and social topics. You can also talk at the SSC subreddit or the SSC Discord server – and also check out the SSC Podcast. Also:

1. Comment of the week is John Schilling, explaining the good-cop bad-cop relationship between state courts and state legislators.

2. From now on, I will be deleting comments that say “first!”, whether or not they also say other things. Come on, people.


The Economic Perspective On Moral Standards

[Content warning: scrupulosity. Some recent edits, see Mistakes page for details.]

I.

There are some pretty morally unacceptable things going on in a pretty structural way in society. Sometimes I hear some activists take this to an extreme: no currently living person is morally acceptable. People who aren’t reorienting their entire lives around acknowledging and combating the evils of the world aren’t even on the scale. And people who are may be (in the words of one of my friends who is close to that community) “only making comfortable sacrifices that let them think of themselves as a good person within their existing comfortable moral paradigm, instead of confronting the raw terrible truth.” IE “If you think you’re one of the good ones, you’re wrong”.

I have heard this sentiment raised by animal rights activists. The average meat-eater isn’t even on the scale. The average vegetarian still eats milk and cheese, and so is barely even trying. Even most vegans probably use some medical product with gelatin, or something tested on lab rats, or are just benefitting from animal suffering in some indirect way.

And I have heard it raised by environmentalists. The average SUV driver isn’t even on the scale. The average conscientious liberal might think they’re better because they bike to work and recycle, but they still barely think about how they’re using electricity generated by coal plants and eating food grown with toxic pesticides. Everyone could be doing more.

And I have heard it raised by labor activists. Most of us use stuff made in sweatshops. Even if you avoid sweatshops, you probably use stuff made at less than a living wage. Even if you avoid that, are you doing everything you can to help and support workers who earn less than you do?

Even if you aren’t an animal rights activist, environmentalist, or labor advocate, do you believe in anything? Are you a Christian, a social justice advocate, or a rationalist? Do you know anyone who really satisfies you as being sinless, non-racist, and/or rational? Then perhaps you too believe nobody is good.

We shouldn’t immediately dismiss the idea that nobody is good. By our standards, there were many times and places where this was true. I am not aware of any ancient Egyptians who were against slavery. By Roman times, a handful of people thought it might be a bad idea, but nobody lifted a finger to stop it. I doubt you could find any Roman at the intersection of currently acceptable positions on slavery, torture, women’s rights, and sexuality. Maybe a few followers of Epicurus – but is there much difference between 0.01% of people being good, and nobody being good?

But let’s back up here philosophically. There’s a clear definition of “perfectly good” – someone who has never deviated from morally optimal behavior in any way. “Nobody is perfectly good” is, I think, an uncontroversial statement. If “nobody is good” is controversial, it’s because we expect “good” to be a lower bar than “perfectly good”, representing a sort of minimum standard of okayness. It might be possible that nobody meets even a minimum standard of okayness – the Roman example still seems relevant – but we should probably back up further and figure out how we’re setting okayness standards.

My subjective impression of what we mean by “good” in the sense of “a decent person” or “minimally okay” has internal and external components. Internally, it means a person who is allowed to feel good about themselves instead of feeling guilty. Externally, it means a person who deserves praise rather than punishment.

Some people would deny one side or the other of this dichotomy. For example, some people believe nobody should ever feel guilty. Or, on the other hand, there’s the “do you want a fucking cookie?” attitude activists sometimes take toward people who expect praise for their acts of support. This seems to rest on an assumption that even socially rare levels of virtue fall within the realm of “your basic minimal duty as a human being” and so should not get extra praise.

But I find the “good person”/”not a good person” dichotomy helpful. I’m not claiming it objectively exists. I can’t prove anything about ethics objectively exists. And even if there were objective ethical truths about what was right or wrong, that wouldn’t imply that there was an objective ethical truth about how much of the right stuff you have to do before you can go around calling yourself “good”. In the axiology/morality/law trichotomy, I think of “how much do I have to do in order to be a good person” as within the domain of morality. That means it’s a social engineering question, not a philosophical one. The social engineering perspective assumes that “good person” status is an incentive that can be used to make people behave better, and asks how high vs. low the bar should be set to maximize its effectiveness.

Consider the way companies set targets for their employees. At good companies, goals are ambitious but achievable. If the CEO of a small vacuum company tells her top salesman to sell a billion vacuums a year, this doesn’t motivate the salesman to try extra hard. It’s just the equivalent of not setting a goal at all, since he’ll fail at the goal no matter what. If the CEO says “Sell the most vacuums you can, and however many you sell, I will yell at you for not selling more”, this also probably isn’t going to win any leadership awards. A good CEO might ask a salesman to sell 10% more vacuums than he did last year, and offer a big bonus if he can accomplish it. Or she might say that the top 20% of salesmen will get promotions, or that the bottom 20% of salesmen will be fired, or something like that. The point is that the goal should effectively carve out two categories, “good salesman” and “bad salesman”, such that it’s plausible for any given salesman to end up in either, then offer an incentive that makes him want to fall in the first rather than the second.

I think of society setting the targets for “good person” a lot like a CEO setting the targets for “good vacuum salesman”. If they’re attainable and linked to incentives – like praise, honor, and the right to feel proud of yourself – then they’ll make people put in an extra effort so they can end up in the “good person” category. If they’re totally unattainable and nobody can ever be a good person no matter how hard they try, then nobody will bother trying. This doesn’t mean nobody will be good – some people are naturally good without hope for reward, just like some people will slave away for the vacuum company even when they’re underpaid and underappreciated. It just means you’ll lose the extra effort you would get from having a good incentive structure.

So what is the right level at which to set the bar for “good person”? An economist might think of this question as a price-setting problem: society is selling the product “moral respectability” and trying to decide how many units of effort to demand from potential buyers in order to maximize revenue. Set the price too low, and you lose out on money that people would have been willing to pay. Set the price too high, and you won’t get any customers. Solve for the situation where you have a monopoly on the good and the marginal cost of production is zero, and this is how you set the “good person” bar.
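
Just to push the metaphor a little further, here is what that price-setting problem looks like with an invented demand curve (every number here is made up; the only point is the shape of the tradeoff):

import math

def buyers(effort):
    # Made-up demand curve: fraction of people willing to "pay" a given
    # number of units of moral effort for good-person status.
    return math.exp(-effort / 10)

def revenue(effort):
    return effort * buyers(effort)  # total moral effort society extracts

best = max(range(1, 101), key=revenue)
print(best, revenue(best))  # the optimum is a moderate bar, not a saintly one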

I don’t have the slightest idea how you would actually go about doing that, and it’s just a metaphor anyway, so let me give some personal stories and related considerations.

II.

When I was younger, I determined that I had an ethical obligation to donate more money to charity, and that I was a bad person for giving as little as I did. But I also knew that if I donated more, I would be a bad person for not donating even more than that. Given that there was no solution to my infinite moral obligation, I just donated the same small amount.

Then I met a committed group of people who had all agreed to donate 10%. They all agreed that if you donated that amount you were doing good work and should feel proud of yourself. And if you donated less than that, then they would question your choice and encourage you to donate more. I immediately pledged to donate 10%, which was much more than I had been doing until then.

Selling the “you can feel good about the amount you’re donating to charity” product for 10% produces higher profits for the charity industry than selling it for 100%, at least if many people are like me.

III.

I can see an argument for an even looser standard: you should aim to be above average.

This is a very low bar. I think you might beat the average person on animal rights activism just by not stomping on anthills. The yoke here is really mild.

But if you believe in something like universalizability or the categorical imperative, “act in such a way that you are morally better than average” is a really interesting maxim! If everyone is just trying to be in the moral upper half of the population, the population average morality goes up. And up. And up. There’s no equilibrium other than universal sainthood.
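
You can watch this ratchet in a few lines of code (my own toy dynamics, nothing rigorous):

import statistics

pop = [0.1 * i for i in range(10)]  # made-up morality scores
for generation in range(5):
    med = statistics.median(pop)
    pop = [max(x, med + 0.01) for x in pop]  # laggards leapfrog the median
    print(round(min(pop), 2), round(max(pop), 2))
# the minimum ratchets upward every generation, and once everyone is equal
# the whole population keeps inching up forever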

This sounds silly, but I think it might have been going on over the past few hundred years in areas like racism and sexism. The anti-racism crusaders of yesteryear were, by our own standards, horrendously racist. But they were the good guys, fighting people even more racist than they were, and they won. Iterate that process over ten or so generations, and you reach the point where you’ve got to run your Halloween costume past your Chief Diversity Officer.

Another good thing about the number 50% is that it means there will always be as many good people as bad people. This can prove helpful if, for example, the bad people don’t like being called bad people, and want to take over the whole process of moral progress so they can declare themselves the good guys.

I’m not saying the bar should be set exactly at average. For one thing, this would mean that we could never have a situation where everyone was good enough. I think the American people are basically okay on the issue of fratricide, and I don’t want to accidentally imply that the least anti-fratricide 50% of people should feel bad about themselves. For another thing, it might be possible to move the generational process of moral progress faster if the bar is set at a higher level.

But I think these considerations at least suggest that the most effective place to set the bar might be lower than we would naively expect. I think this is how I treat people in real life. I have trouble condemning anyone who is doing more than the median. And although I don’t usually go around praising people, anybody who is doing more than the average for their time and place seems metaphorically praiseworthy to me.

IV.

A friend brings up an objection: even if low standards extract the most units of moral effort from the population, that might not be what’s important. In some cases, it’s more important that a few very important people put forth an extraordinary effort than that everyone does okay. For example, whether a billionaire donates some high percent of their money matters more than if you or I do. And whether a brilliant scientist devotes their career to fighting disease or existential risk matters much more than the rest of us.

I’m still trying to think about this, but naively it seems that we can treat the set of all exceptional people the same as the set of all people in general, and standards which extract the most moral value from one will also apply to the other. This especially makes sense if the standards are normed to effort rather than absolute results – for example, “everyone should donate 10%” extends to billionaires better than “everyone should donate $100”; diminishing marginal utility issues do argue that billionaires should donate more than that, but once you find the right unit the same argument should work.

One exception might be if we would otherwise hold exceptional people to higher standards. For example, if everybody puts pressure on a billionaire to donate most of their fortune, or on a brilliant scientist to work very hard to fight disease, then having a universal standard that it’s okay to do a little more than average might make this pressure less effective. Maybe trying to hold the average person to high standards usually backfires, but everyone would be able to successfully gang up on exceptional people to enforce high standards for them. If this were true, then a low bar might hold in Mediocristan, and a higher bar in Extremistan. Things like “not eating animal products” or “not driving an SUV” are in Mediocristan; things like curing cancer or donating wealth might be in Extremistan.

Another exception would be if it’s more important to extract many units of moral worth from the same people, rather than distributed across the population. For example, if cancer research (or AI research) turns out to be the most important thing, then even in the counterfactual world where everyone is equally intelligent, it’s important that somebody be the person to spend five years laboriously learning the basics so that they can make progress, and then that this person spend as much effort as possible doing the research and exploiting their training. A random person donating one hour of their time here or there is almost useless in comparison. In this case, we might want to have the highest standards possible, since a world in which most people give up and do nothing (but a few people accept the challenge and do a lot) is better than the alternative where everybody does a tiny bit. Again, this seems more applicable to issues like curing cancer or aligning AI than ones like not driving an SUV or using products made with unfair labor practices.

V.

I tried being vegetarian for a long time. Given that I can’t eat any vegetables (something about the combination of the bitter taste and the texture; they almost universally make me gag), this was really hard and I kept giving up and going back to subsisting mostly off of meat. The cost of “you can feel good about the amount you’re doing to not contribute to animal suffering” was apparently higher than I could afford.

For the past year, I’ve been following a more lax rule: I can’t eat any animal besides fish at home, but I can have meat (other than chicken) at restaurants. I’ve mostly been able to keep that rule, and now I’m eating a lot less meat than I did before.

This is a pathetic rule compared to even real pescetarians, let alone real vegetarians, let alone vegans. But I can tell this is right on the border of what I’m capable of doing; every time I go to the supermarket, I have an intense debate with the yetzer hara about whether I should buy meat to eat at home that week, and on rare occasions I give in to temptation. But in general I hold back. And part of what holds me back is that I let myself feel like I am being good and helping save animals and have the right to feel proud when I keep my rule, and I beat myself up and feel bad and blameworthy when I break it.

I am sure any serious animal rights activist still thinks I am scum. Possibly there is an objective morality, and it agrees I am scum. But if I am right that this is the strictest rule I can keep, then I’m not sure who it benefits to remind me that I am scum. Deny me the right to feel okay when I do my half-assed attempt at virtue, and I will just make no attempt at virtue, and this will be worse for me and worse for animals.

Sticking to the economics metaphor, this is price discrimination. Companies try to figure out tricks to determine how rich you are, and then charge rich people more and poor people less: the goal is everyone buying the product for the highest price that they, personally, are willing to pay. Society should sell me “feel good about the amount you’re doing for animals” status for the highest price that leaves me preferring buying the product to not doing so. If someone else is much richer in willpower, or just likes vegetables more, society should charge them more.
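
A toy calculation makes the economics concrete (my own made-up numbers, not anything from the essay): give each person a maximum standard they can sustain, then compare the total virtue extracted by one universal bar against a personalized one.

```python
# Toy price-discrimination arithmetic (made-up numbers; mine, not the post's).
# Each capacity is the strictest standard a person can actually keep, as a
# fraction of "full virtue". Under one universal standard, anyone priced out
# gives up entirely; under personalized standards, everyone pays exactly
# what they can bear.
capacities = [0.05, 0.10, 0.10, 0.25, 0.50, 0.90]

def total_effort(standard):
    """Effort collected when everyone either meets the standard or gives up."""
    return sum(standard for c in capacities if c >= standard)

best_uniform = max(total_effort(s) for s in capacities)
personalized = sum(capacities)
print(best_uniform, personalized)  # 1.0 vs 1.9 -- personalized wins
```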

Companies rarely try price discrimination except in the sneakiest and most covert ways, because it’s hard to do well and it makes everybody angry. Society does whatever it does – empirically, not care about animals unless someone is torturing a puppy or something. But we-as-members-of-society have the ability to practice price discrimination on ourselves-as-individuals, and sell our own right to feel okay about ourselves for whatever amount we as experts in our own preferences believe we can bear.

I don’t know the answer to the question of where “we” “as” “a” “society” “should” “set” “moral” “standards”. But if you’re interested in the question, and you have a good sense for what you are and aren’t capable of, maybe practicing price discrimination on yourself is the way to go.

Preschool: Much More Than You Wanted To Know

I.

A lot of people pushed back against my post on preschool, so it looks like we need to discuss this in more depth.

A quick refresher: good randomized controlled trials have shown that preschools do not improve test scores in a lasting way. Sometimes test scores go up a little bit, but these effects disappear after a year or two of regular schooling. However, early RCTs of intensive “wrap-around” preschools like the Perry Preschool Program and the Abecedarians found that graduates of those programs went on to have markedly better adult outcomes, including higher school graduation rates, more college attendance, less crime, and better jobs. But these studies were done in the 60s, before people invented being responsible, and had kind of haphazard randomization and followup. They also had small sample sizes, and came from programs that were more intense than any of the scaled-up versions that replaced them. Modern scaled-up preschools like Head Start would love to be able to claim their mantle and boast similar results. But the only good RCT of Head Start, the HSIS study, is still in its first few years. It’s confirmed that Head Start test score gains fade out. But it hasn’t been long enough to study whether there are later effects on life outcomes. We can expect those results in ten years or so. For now, all we have is speculation based on a few quasi-experiments.

Deming 2009 is my favorite of these. He looks at the National Longitudinal Survey of Youth, a big nationwide survey that gets used for a lot of social science research, and picks out children who went to Head Start. These children are mostly disadvantaged because Head Start is aimed at the poor, so it would be unfair to compare them to the average child. He’s also too smart to just “control for income”, because he knows that’s not good enough. Instead, he finds children who went to Head Start but who have siblings who didn’t, and uses the sibling as a matched control for the Head Starter.
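
A minimal sketch of what this sibling-comparison design looks like in practice (the data and variable names below are mine, not Deming's): difference each Head Starter against their own sibling, so anything shared within the family cancels out.

```python
from scipy.stats import ttest_rel

# Sketch of a sibling matched-pairs comparison (toy numbers, not Deming's
# data). Each position is one family: an outcome for the child who attended
# Head Start and for the sibling who did not. Differencing within families
# cancels out everything the siblings share -- income, neighborhood,
# parenting.
head_start = [0.61, 0.55, 0.70, 0.48, 0.66, 0.59, 0.72, 0.50]
sibling    = [0.54, 0.52, 0.61, 0.47, 0.58, 0.55, 0.65, 0.49]

diffs = [h - s for h, s in zip(head_start, sibling)]
t_stat, p_value = ttest_rel(head_start, sibling)
print(f"mean within-family difference = {sum(diffs) / len(diffs):.3f}")
print(f"paired t = {t_stat:.2f}, p = {p_value:.4f}")
```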

This ensures the controls will come from the same socioeconomic stratum, but he acknowledges it raises problems of its own. Why would a parent send one child to Head Start but not another? It might be that one child is very stupid and so the parents think they need the extra help preschool can provide; if this were true, it would mean Head Starters are systematically dumber than controls, and would underestimate the effect of Head Start. Or it might be that one child is very smart and so the parents want to give them education so they can develop their full potential; if this were true, it would mean Head Starters are systematically smarter than controls, and would inflate the effect of Head Start. Or it might be that parents love one of their children more and put more effort into supporting them; if this meant these children got other advantages, it would again inflate the effect of Head Start. Or it might mean that parents send the child they love more to a fancy private preschool, and the child they love less gets stuck in Head Start, ie the government program for the disadvantaged. Or it might be that parents start out poor, send their child to Head Start, and then get richer and send their next child to a fancy private preschool, while that child also benefits from their new wealth in other ways. There are a lot of possible problems here.

Deming tries very hard to prove none of these are true. He compares Head Starters and their control siblings on thirty different pre-study variables, including family income during their preschool years, standardized test scores, various measures of health, number of hours mother works during their preschool years, breastfedness, etc. Of these thirty variables, he finds a significant difference on only one – roughly the number you would expect by chance alone when testing thirty variables at the p < 0.05 level. That one is birth weight: Head Starters were less likely to have very low birth weight than their control siblings. This is a moderately big deal, since birth weight is a strong predictor of general child health and later life success. But:

Given the emerging literature on the connection between birth weight and later outcomes, this is a serious threat to the validity of the [study]. There are a few reasons to believe that the birth weight differences are not a serious source of bias, however. First, it appears that the difference is caused by a disproportionate number of low-birth-weight children, rather than by a uniform rightward shift in the distribution of birth weight for Head Start children. For example, there are no significant differences in birth weight once low-birth-weight children (who represent less than 10 percent of the sample) are excluded.

Second, there is an important interaction between birth order and birth weight in this sample. Most of the difference in mean birth weight comes from children who are born third, fourth, or later. Later-birth-order children who subsequently enroll in Head Start are much less likely to be low birth weight than their older siblings who did not enroll in preschool. When I restrict the analysis to sibling pairs only, birth weight differences are much smaller and no longer significant, and the main results are unaffected. Finally, I estimate all the models in Section V with low-birth-weight children excluded, and, again, the main results are unchanged.

Still, to get a sense of the magnitude of any possible positive bias, I back out a correction using the long-run effect of birth weight on outcomes estimated by Black, Devereux, and Salvanes (2007). Specifically, they find that 10 percent higher birth weight leads to an increase in the probability of high school graduation of 0.9 percentage points for twins and 0.4 percentage points for siblings. If that reduced form relationship holds here, a simple correction suggests that the effect of Head Start on high school graduation (and by extension, other outcomes) could be biased upward by between 0.2 and 0.4 percentage points, or about 2–5 percent of the total effect.
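
For concreteness, here is a back-of-envelope version of that correction (my arithmetic; the 5% birth-weight gap is an assumed round number, since the passage doesn't quote the exact gap), which lands near the 0.2–0.4 point, 2–5% range Deming reports:

```python
# Rough reconstruction of the correction quoted above (my arithmetic; the 5%
# birth-weight gap is an assumed round number, not from the paper).
# Black, Devereux & Salvanes: +10% birth weight gives +0.4 pp (siblings)
# to +0.9 pp (twins) on high school graduation.
effect_per_10pct = (0.4, 0.9)    # percentage points
assumed_gap_pct = 5.0            # hypothetical birth-weight advantage, %
head_start_effect_pp = 8.0       # Deming's graduation effect, roughly

for e in effect_per_10pct:
    bias = e * (assumed_gap_pct / 10)
    share = 100 * bias / head_start_effect_pp
    print(f"implied bias = {bias:.2f} pp, about {share:.0f}% of the effect")
```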

Having set up his experimental and control group, Deming does the study and determines how well the Head Starters do compared to their controls. The test scores show some confusing patterns that differ by subgroup. Black children (the majority of this sample; Head Start is aimed at disadvantaged people in general and sometimes at blacks in particular) show the classic pattern of slightly higher test scores in kindergarten and first grade, fading out after a few years. White children never see any test score increases at all. Some subgroups, including boys and children of high-IQ mothers, see test score increases that don’t seem to fade out. But these differences in significance are not themselves significant and it might just be chance. Plausibly the results for blacks, who are the majority of the sample, are the real results, and everything else is noise added on. This is what non-subgroup analysis of the whole sample shows, and it’s how the study seems to treat it.

The nontest results are more impressive. Head Starters are about 8% more likely to graduate high school than controls. This pattern is significant for blacks, boys, and children of low-IQ mothers, but not for whites, girls, and children of high-IQ mothers. Since the former three categories are the sorts of people at high risk of dropping out of high school, this is probably just a ceiling effect – the low-risk groups already graduate at rates too high to improve on much. Head Starters are also less likely to be diagnosed with a learning disability (remember, learning disability diagnosis is terrible and tends to just randomly hit underperforming students), and marginally less likely to repeat grades. The subgroup results tend to show higher significance levels for groups at risk of having bad outcomes, and lower significance levels for the rest, just as you would predict. There is no effect on crime. For some reason he does not analyze income, even though his dataset should support that.

He combines all of this into an artificial index of “young adult outcomes” and finds that Head Start adds 0.23 SD. You may notice this is less than the 0.3 SD effect size of antidepressants that everyone wants to dismiss as meaningless, but in the social sciences apparently this is pretty good. Deming optimistically sums this up as “closing one-third of the gap between children with median and bottom-quartile family income”, as “75% of the black-white gap”, and as “80% of the benefits of [Perry Preschool] at 60% of the cost”.
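
For what a figure like “0.23 SD on a summary index” typically means mechanically, here is a generic sketch (toy data; I'm not claiming these are Deming's exact steps): z-score each outcome against the control group, average the z-scores per person, and express the treatment-control gap in control-group standard deviations.

```python
from statistics import mean, stdev

# Generic sketch of how a summary index like this is usually built (toy
# data; not Deming's recipe or numbers). Standardize each outcome against
# the control group, average per person, then express the gap between
# groups in control-group standard deviations.
def z(values, reference):
    mu, sd = mean(reference), stdev(reference)
    return [(v - mu) / sd for v in values]

ctrl_grad,  treat_grad  = [0, 1, 1, 0, 1], [1, 1, 1, 0, 1]   # graduated?
ctrl_no_ld, treat_no_ld = [1, 0, 1, 1, 0], [1, 1, 1, 0, 1]   # no LD diagnosis?

treat_idx = [mean(p) for p in zip(z(treat_grad, ctrl_grad),
                                  z(treat_no_ld, ctrl_no_ld))]
ctrl_idx  = [mean(p) for p in zip(z(ctrl_grad, ctrl_grad),
                                  z(ctrl_no_ld, ctrl_no_ld))]
effect = (mean(treat_idx) - mean(ctrl_idx)) / stdev(ctrl_idx)
print(f"index effect = {effect:.2f} SD")   # Deming's actual figure is 0.23 SD
```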

Finally, he does some robustness checks to make sure this is not too dependent on any particular factor of his analysis. I won’t go into these in detail, but you can find them on page 127 of the manuscript, and it’s encouraging that he tries this, given that I’m used to reading papers by social psychologists who treat robustness checks the way vampires treat garlic.

Deming’s paper is very similar to Garces, Thomas & Currie (2002), which applies the same methodology to a different dataset. GTC is earlier and more famous and probably the paper you’ll hear about if you read other discussions of this topic; I’m focusing on Deming because I think his analyses are more careful and he explains what he’s doing a lot better. Reading between the lines, GTC do not find any significant effects for the sample as a whole. In subgroup analyses, they find Head Start makes whites more likely to graduate high school and attend college, and blacks less likely to be involved in crime. One can almost sort of attribute this to floor effects; blacks are many times more likely to have contact with the criminal justice system, and there are more blacks than whites in the sample, so maybe it makes sense that this is only significant for them. On the other hand, when I look at the results, there was an almost equally strong effect in the opposite direction for whites (ie Head Start whites committed more crimes, to roughly the same degree Head Start blacks committed fewer) – but there were fewer whites so it didn’t quite reach significance. And the high school results don’t make a lot of sense however you parse them. GTC use the words “statistically significant” a few times, so you know they’re thinking about it. But they don’t ever give significance levels for individual results and one gets the feeling they’re not very impressive. Their pattern of results isn’t really that similar to Deming’s either – remember, Deming found that all races were more likely to graduate high school, and no race had less crime. GTC also don’t do nearly as much work to show that there aren’t differences between siblings. Deming is billed as confirming or replicating GTC, but this only seems true in the sense that both of them say nice things about Head Start. Their patterns of results are pretty different, and GTC’s are kind of implausible.

And for that matter, ten years earlier two of these authors, Currie and Thomas, did a similar study. They also use the National Longitudinal Survey of Youth, meaning I’m not really clear how their analysis differs from Deming’s (maybe it’s much earlier and so there’s less data?) They first use an “adjust for confounders” model and it doesn’t work very well. Then they try a comparing-siblings model and find that Head Starters are generally older than their no-preschool siblings, and also generally born to poorer mothers (these are probably just the same result; mothers get less poor as they get older). They also tend to do better on a standardized test, though the study is very unclear about when they’re giving this test so I can’t tell if they’re saying that group assignment is nonrandom or that the intervention increased test scores. They find Head Start does not increase income, maybe inconsistently increases test scores among whites but not blacks, decreases grade repetition for whites but not blacks, and improves health among blacks but not whites. They also look into Head Start’s effect on mothers, since part of the wrap-around program involves parent training. All they find is mild effects on white IQ scores, plus “a positive and implausibly large effect of Head Start on the probability that a white mother was a teen at the first birth” which they say is probably sampling error. Like the later study, this study does not give p-values and I am too lazy to calculate them from the things they do give, but it doesn’t seem like they’re likely to be very good.

Finally, Deming’s work was also replicated and extended by a team from the Brookings Institution. I think what they’re doing is taking the National Longitudinal Survey of Youth – the same dataset Deming and one of the GTC papers used – and updating it after a few more years of data. Like Deming, they find that “a wide variety” of confounders do not differ between Head Starters and their unpreschooled siblings. Because they’re with the Brookings Institution, their results are presented in a much prettier way than anyone else’s:

The Brookings replication (marked THP here) finds effect sizes somewhat larger than GTC’s, but somewhat smaller than Perry Preschool’s. It looks like they find a positive and significant effect on high school graduation for Hispanics, but not blacks or whites, which is a different weird racial pattern than all the previous weird racial patterns. Since their sample was disproportionately black and Hispanic, and the blacks almost reached significance, the whole sample is significant. They find increases of about 6% on high school graduation rates, compared to Deming’s claimed 8%, but on this chart it’s hard to see how Deming said his 8% was 80% as good as Perry Preschool. There are broadly similar effects on some other things like college attendance, self-esteem, and “positive parenting”. They conclude:

These results are very similar to those by Deming (2009), who calculated high school graduation rates on the more limited cohorts that were available when he conducted his work.

These four studies – Deming, GTC, CT, and Brookings – all try to do basically the same thing, though with different datasets. Their results all sound the same at the broad level – “improved outcomes like high school graduation for some racial groups” – but on the more detailed level they can’t really agree which outcomes improve and which racial groups they improve for. I’m not sure how embarrassing this should be for them. All of their results seem to be kind of on the border of significance, and occasionally going below that border and occasionally above it, which helps explain the contradictions while also being kind of embarrassing in and of itself (Deming’s paper is the exception, with several results significant at the 0.01 level). Most of them do find things generally going the right direction and generally sane-looking findings. Overall I feel like Deming looks pretty good, the Brookings replication is too underspecified for me to have strong opinions on, and the various GTC papers neither add nor subtract much from this.

II.

I’m treating Ludwig and Miller separately because it’s a different – and more interesting – design.

In 1965, the government started an initiative to create Head Start programs in the 300 poorest counties in the US. There was no similar attempt to help counties #301 and above, so there’s a natural discontinuity at county #300. This is the classic sort of case where you can do a regression discontinuity study, so Ludwig and Miller decided to look into it and see if there was some big jump in child outcomes as you moved from the 301st-poorest county to the 300th.
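
A sketch of how such an estimate works (my own toy simulation, not their data): fit a line to outcomes on each side of the county-#300 cutoff and measure the gap between the two fits at the cutoff itself.

```python
import random

# Toy regression discontinuity (my simulation, not Ludwig & Miller's data).
# Counties ranked 1-600 by poverty; ranks 1-300 get Head Start. Outcomes
# trend smoothly with rank, so a jump exactly at the cutoff gets credited
# to the program: fit a line on each side, compare the fits at the cutoff.
random.seed(1)
CUTOFF, TRUE_JUMP = 300, 5.0   # assumed 5-point program effect

def outcome(rank):
    trend = 40 + 0.02 * rank                        # less-poor counties do better
    program = TRUE_JUMP if rank <= CUTOFF else 0.0  # Head Start counties only
    return trend + program + random.gauss(0, 1)

def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

left  = list(range(CUTOFF - 99, CUTOFF + 1))    # counties 201-300
right = list(range(CUTOFF + 1, CUTOFF + 101))   # counties 301-400
sl, il = fit_line(left,  [outcome(r) for r in left])
sr, ir = fit_line(right, [outcome(r) for r in right])
jump = (sl * CUTOFF + il) - (sr * CUTOFF + ir)
print(f"estimated discontinuity = {jump:.1f} points (true = {TRUE_JUMP})")
```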

They started by looking into health outcomes, and found a dramatic jump. Head Start appears to improve outcomes for certain easily-preventable childhood diseases by 33–50%. For example, kids from counties with Head Start programs had much less anemia. Part of the Head Start program is screening for anemia and supplementing children with iron, which treats many anemias. So this is very unsurprising. Remember that the three hundred poorest counties in 1965 were basically all majority-black counties in the Deep South and much worse along every axis than you would probably expect – we are talking near-Third-World levels of poverty here. If you deploy health screening and intervention into near-Third-World levels of poverty, then the rates of easily preventable diseases should go down. Ludwig and Miller find they do. This is encouraging, but not really surprising, and maybe not super-relevant to the rest of what we’re talking about here.

But they also find a “positive discontinuity” in high school completion of about 5%. Kids in the 300th-and-below-poorest counties were about 5% more likely than kids in the 301st-and-above-poorest to finish high school. This corresponds to an average of staying in school six months longer. This discontinuity did not exist before Head Start was set up, and it does not exist among children who were the wrong age to participate in Head Start at the time it was set up. It comes into existence just when Head Start is set up, among the children who were in Head Start. This is a pretty great finding.

Unfortunately, it looks like this. The authors freely admit this is just at the limit of what they can detect at p < 0.05 in their data. They double check with another data source, which shows the same trend but is only significant at p < 0.1. “Our evidence for positive Head Start impacts on educational attainment is more suggestive, and limited by the fact that neither of the data sources available to us is quite ideal.” This study has the strongest design, and it does find an effect, but the effect is basically squinting at a graph and saying “it kind of looks like that line might be a little higher than the other one”. They do some statistics, but they are all the statistical equivalent of squinting at the graph and saying “it kind of looks like that line might be a little higher than the other one”, and about as convincing. For a more complete critical look, see this post from the subreddit.

There is one other slightly similar regression discontinuity study, Carneiro and Ginja, which regresses a sample of people on Head Start availability and tries to prove that people who went to Head Start because they were just within the availability cutoff do better than people who missed out on Head Start because they were just outside it. This sounds clever and should be pretty credible. They find a bunch of interesting effects like that Head Starters are less likely to be obese, and less likely to be depressed. They find that non-blacks (but not blacks) are less likely to be involved in crime (which, remember, is the opposite finding from the last paper about Head Start, crime, and race). But they don’t find any effect on likelihood to graduate high school or attend college. Also, they bury this result and everyone cites this paper as “Look, they’ve replicated that Head Start works!”

III.

A few scattered other studies to put these in context:

In 1980, Chicago created “Child Parent Centers”, a preschool program aimed at the disadvantaged much like all of these others we’ve been talking about. They did a study, which for some reason published its results in a medical journal, and which doesn’t really seem to be trying in the same way as the others. For example, it really doesn’t say much about the control group except that it was “matched”. Taking advantage of their unusually large sample size and excellent follow-up, they find that their program made children stay in school the same six months longer as many of the other studies find, had a strong effect on college completion (8% vs. 14% of kids), showed dose-dependent effects, and “was robust”. They are bad enough at showing their work that I am forced to trust them and the Journal of the American Medical Association, a prestigious journal that I can only hope would not have published random crap.

Havnes and Mogstad analyze a free universal child-care program in Norway, which was rolled out in different places at different times. They find that “exposure to child care raised the chances of completing high school and attending college, in orders of magnitude similar to the black-white race gaps in the US”. I am getting just cynical enough to predict that if Norway had black people, they would have a completely different pattern of benefits and losses from this program, but the Norwegians were able to avoid a subgroup analysis by being a nearly-monoethnic country. This is in contrast to Quebec, where a similar childcare program seems to have caused worse long-term outcomes. Going deeper into these results supports (though weakly and informally) a model where, when daycare is higher-quality than parental care, child outcomes improve; when daycare is lower-quality than parental care, child outcomes decline. So a reform that creates very good daycare, and mostly attracts children whose parents would not be able to care for them very well, will be helpful. Reforms that create low-quality daycare and draw from households that are already doing well will be harmful. See the discussion here.

Then there’s Chetty’s work on kindergarten, which I talk about here. He finds good kindergarten teachers do not consistently affect test scores, but do consistently affect adult earnings – the same fade-out-then-reappear pattern at issue in the preschool studies. This study is randomized and strong. Its applicability to the current discussion is questionable, since kindergarten is not preschool, having a good teacher is not the same as going to preschool at all, and the studies we’re looking at mostly haven’t found results about adult earnings. At best this suggests that schooling can have surprisingly large effects on later life outcomes that fade out and then reappear.

And finally, there’s a meta-analysis of 22 studies of early childhood education showing an effect size of 0.24 SD in favor of graduating high school, p < 0.001. Maybe I should have started with that one. Maybe it’s crazy of me to save this for the end. Maybe this should count for about five times as much as everything I’ve mentioned so far. I’m putting it down here both to inflict upon you the annoyance I felt when discovering this towards the end of researching this topic, and so that you have a good idea of what kind of studies are going into this meta-analysis.

IV.

What do we make of this?

I am concerned that all of the studies in Parts I and II have been summed up as “Head Start works!”, and therefore as replicating each other, since the previous study found “Head Start works!” and so did the newest one. In fact, they all find Head Start having small effects for some specific subgroup on some specific outcome, and it’s usually a different subgroup and outcome for each. So although GTC and Deming are usually considered replications of each other, they actually contradict each other’s results. One of GTC’s two big findings is that Head Start decreases crime among black children. But Deming finds that Head Start had no effect on crime among black children. The only thing the two of them agree on is that Head Start seems to improve high school graduation among whites. But Carneiro and Ginja, which is generally thought of as replicating the earlier two, finds Head Start has no effect on high school graduation among whites.

There’s an innocent explanation here, which is that everyone was very close to the significance threshold, so these are just picking up noise. This might make more sense graphically:

It’s easy to see here that both studies found basically the same thing, minus a little noise, but that Study 1 has to report its results as “significant for blacks but not whites” and Study 2 has to report the opposite. Is this what’s going on?
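
A quick simulation shows how plausible this innocent explanation is (my own toy numbers): when the true effect sits right at the significance threshold, two studies of the same underlying reality will report contradictory subgroup patterns most of the time.

```python
import random

# Toy simulation of the near-threshold explanation (my own numbers). Two
# studies measure the same true effect in two subgroups; the true effect
# sits just under the z = 1.96 significance bar, so which subgroups come
# out "significant" is close to a coin flip in each study.
random.seed(2)
TRUE_Z, CRITICAL, TRIALS = 1.8, 1.96, 10_000

disagreements = 0
for _ in range(TRIALS):
    study1 = tuple(random.gauss(TRUE_Z, 1) > CRITICAL for _ in ("black", "white"))
    study2 = tuple(random.gauss(TRUE_Z, 1) > CRITICAL for _ in ("black", "white"))
    disagreements += study1 != study2
print(f"studies disagree on the significance pattern "
      f"{100 * disagreements / TRIALS:.0f}% of the time")   # roughly 75%
```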

I made a table. I am really really not confident in this table. On one level, I am fundamentally not confident that what I am doing is even possible, and that the numbers in these studies are comparable to one another or mean what it looks like they mean. On a second level, I’m not sure I recorded this information correctly or put the right numbers in the right places. Still, here is the table; red means the result is significant:

This confirms my suspicions. Every study found something different, and it isn’t even close. For example, Carneiro & Ginja finds a strong effect of lowering white crime, but GTC finds that Head Start nonsignificantly increases white crime rates. Meanwhile, GTC find a strong and significant effect lowering black crime, but Carneiro and Ginja find an effect of basically zero.

The strongest case for the studies being in accord is for black high school graduation rates. Both Deming and Ludwig+Miller find an effect. Carneiro and Ginja don’t find an effect, but their effect size is similar to those of the other studies, and they might just have more stringent criteria since they are adjusting for multiple comparisons and testing many things. But they should have the more stringent criteria, and by trying to special-plead against this, I am just reversing the absolutely correct thing they did because I want to force positive results in the exact way that good statistical practice is trying to prevent me from doing. So maybe I shouldn’t do that.

Here is the strongest case for accepting this body of research anyway. It doesn’t quite look like publication bias. For one thing, Ludwig and Miller have a paper where they say there’s probably no publication bias here because literally every dataset that can be used to test Head Start has been. For another, although I didn’t focus on gender or IQ on the chart above, most of the studies do find that it helps males and low-IQ people more with the sorts of problems men and low-IQ people usually face, which suggests it passes sanity checks. Most important, in a study whose results are entirely spurious, there should be an equal number of beneficial and harmful findings (ie they should find Head Start makes some subgroups worse on some outcomes). Since each of these studies investigates many things and usually finds many different significant results, it should be hard to publication-bias all harmful findings out of existence. This sort of accords with the positive meta-analysis. Studies either show small positive results or are not significant, and when you combine all of them into a meta-analysis, they become highly significant, look good, and make sense. And this would fit very well with the Norwegian study showing strong positive effects of childcare later in life. And Chetty’s study showing fade-out of kindergarten teachers followed by strong positive effects later in life. And of course the Perry Preschool and Abecedarian studies showing fade-out of test scores followed by strong positive effects later in life. I even recently learned of a truly marvelous developmental explanation for why this might happen, which unfortunately this margin is too small to contain – expect a book review in the coming weeks.
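
That “equal numbers of beneficial and harmful findings” point lends itself to a quick sign-test sanity check (with hypothetical tallies of my own; I have not actually counted the findings):

```python
from math import comb

# Sign-test sketch of the "equal beneficial and harmful findings" argument
# (hypothetical tallies; I have not actually counted the findings). If every
# significant result were pure noise, "beneficial" vs "harmful" would be a
# fair coin, so the beneficial count is Binomial(n, 0.5).
n_findings, n_beneficial = 12, 12   # made-up: 12 significant results, all beneficial
p = sum(comb(n_findings, k)
        for k in range(n_beneficial, n_findings + 1)) / 2 ** n_findings
print(f"P(>= {n_beneficial} of {n_findings} beneficial under pure noise) = {p:.5f}")
```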

The case against this research is that maybe the researchers cheated to have there be no harmful findings. Maybe the meta-analysis just shows that when a lot of researchers cheat a little, taking care to only commit minor undetectable sins, that adds up to a strong overall effect. This is harsh, but I was recently referred to this chart (h/t Mother Jones, which calls it “the chart of the decade” and “one of the greatest charts ever produced”):

This is the outcome of drug trials before and after the medical establishment started requiring preregistration (the vertical line) – in other words, before and after they made it harder to cheat. Before the vertical line, 60% of trials showed the drug in question was beneficial. After the vertical line, only 10% did. In other words, making it harder to cheat cut the number of positive trials by a factor of six. It is not at all hard to cheat in early childhood education research; all the research in this post so far comes from the left side of the vertical line. We should be skeptical of all but the most ironclad research that comes from the left – and this is not the most ironclad research.

The Virtues of Rationality say:

One who wishes to believe says, “Does the evidence permit me to believe?” One who wishes to disbelieve asks, “Does the evidence force me to believe?” Beware lest you place huge burdens of proof only on propositions you dislike, and then defend yourself by saying: “But it is good to be skeptical.” If you attend only to favorable evidence, picking and choosing from your gathered data, then the more data you gather, the less you know. If you are selective about which arguments you inspect for flaws, or how hard you inspect for flaws, then every flaw you learn how to detect makes you that much stupider.

This is one of the many problems where the evidence permits me to disbelieve, but does not force me to do so. At this point I have only intuition and vague heuristics. My intuition tells me that in twenty years, when all the results are in, I expect early childhood programs to continue having small positive effects. My vague heuristics say the opposite, that I can’t trust research this irregular. So I don’t know.

I think I was right to register that my previous belief that preschool definitely didn’t work was outdated and under challenge. I think I was probably premature to say I was wrong about preschool not working; I should have said I might be wrong. If I had to bet on it, I would say 60% odds preschool helps in ways kind of like the ones these studies suggest, 40% odds it’s useless.

I hope that further followup of the HSIS, an unusually good randomized controlled trial of Head Start, will shed more light on this after its participants reach high school age sometime in the 2020s.