Is Science Slowing Down?

[This post was up a few weeks ago before getting taken down for complicated reasons. They have been sorted out and I’m trying again.]

Is scientific progress slowing down? I recently got a chance to attend a conference on this topic, centered around a paper by Bloom, Jones, Van Reenen & Webb (2018).

BJRW identify areas where technological progress is easy to measure – for example, the number of transistors on a chip. They measure the rate of progress over the past century or so, and the number of researchers in the field over the same period. For example, here’s the transistor data:

This is the standard presentation of Moore’s Law – the number of transistors you can fit on a chip doubles about every two years (BJRW measure this as 35% growth per year). This is usually presented as an amazing example of modern science getting things right, and no wonder – it means you can go from a few thousand transistors per chip in 1971 to billions today, with the corresponding increase in computing power.

But BJRW have a pessimistic take. There are eighteen times more people involved in transistor-related research today than in 1971. So if in 1971 it took 1000 scientists to increase transistor density 35% per year, today it takes 18,000 scientists to do the same task. So apparently the average transistor scientist is eighteen times less productive today than fifty years ago. That should be surprising and scary.
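
If you want the bookkeeping spelled out, here it is as a minimal sketch – the 1,000-researcher baseline is just this post’s illustrative number, not an actual headcount:

```python
# BJRW-style research productivity: the growth rate of the output divided by
# the number of researchers producing it.
growth_rate = 0.35                        # Moore's Law: ~35% more transistors per year

researchers_1971 = 1_000                  # illustrative baseline from the post
researchers_2018 = 18 * researchers_1971  # the measured ~18x increase

productivity_1971 = growth_rate / researchers_1971
productivity_2018 = growth_rate / researchers_2018
print(productivity_1971 / productivity_2018)  # 18.0 - an eighteenfold decline

# For scale: constant percent growth still compounds enormously.
print(2_300 * 1.35 ** (2018 - 1971))      # Intel 4004's 2,300 transistors -> ~3 billion
```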

But isn’t it unfair to compare percent increase in transistors with absolute increase in transistor scientists? That is, a graph comparing absolute number of transistors per chip vs. absolute number of transistor scientists would show two similar exponential trends. Or a graph comparing percent change in transistors per year vs. percent change in number of transistor scientists per year would show two similar linear trends. Either way, there would be no problem and productivity would appear constant since 1971. Isn’t that a better way to do things?

A lot of people asked paper author Michael Webb this at the conference, and his answer was no. He thinks that intuitively, each “discovery” should decrease transistor size by a certain amount. For example, if you discover a new material that allows transistors to be 5% smaller along one dimension, then you can fit 5% more transistors on your chip whether there were a hundred there before or a million. Since the relevant factor is discoveries per researcher, and each discovery is represented as a percent change in transistor size, it makes sense to compare percent change in transistor size with absolute number of researchers.
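
Here’s Webb’s intuition as a toy model (my construction, not anything from the paper): each discovery multiplies transistor density by a fixed factor, so the percent growth rate tracks discoveries per year, not the current transistor count.

```python
# Each discovery lets you fit ~5% more transistors on a chip, no matter how
# many are already there.
def density_after(years, researchers, discoveries_per_researcher,
                  gain_per_discovery=0.05, start=2_300):
    density = start
    for _ in range(years):
        discoveries_this_year = researchers * discoveries_per_researcher
        density *= (1 + gain_per_discovery) ** discoveries_this_year
    return density

# Percent growth per year depends only on discoveries per year. So if percent
# growth stayed flat while headcount rose 18x, discoveries per researcher must
# have fallen ~18x - which is why BJRW compare percent change in transistors
# against the absolute number of researchers.
print(density_after(10, researchers=1_000, discoveries_per_researcher=0.006))
```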

Anyway, most other measurable fields show the same pattern of constant progress in the face of exponentially increasing number of researchers. Here’s BJRW’s data on crop yield:

The solid and dashed lines are two different measures of crop-related research. Even though crop-related research increases by a factor of 6 to 24 (depending on how it’s measured), crop yields grow at a relatively constant 1% rate for soybeans, and at an apparently declining rate of around 3% for corn.

BJRW go on to show the same is true for whatever other scientific fields they care to measure. Measuring scientific progress is inherently difficult, but their finding of constant or near-constant progress in most areas accords with Nintil’s overview of the same topic, which gives us graphs like

…and dozens more like it. And even when we use data that are easy to measure and hard to fake, like number of chemical elements discovered, we get the same linearity:

Meanwhile, the increase in researchers is obvious. Not only is the population increasing (by a factor of about 2.5x in the US since 1930), but the percent of people with college degrees has quintupled over the same period. The exact numbers differ from field to field, but orders of magnitude increases are the norm. For example, the number of people publishing astronomy papers seems to have dectupled over the past fifty years or so.

BJRW put all of this together into total number of researchers vs. total factor productivity of the economy, and find…

…about the same as with transistors, soybeans, and everything else. So if you take their methodology seriously, over the past ninety years, each researcher has become about 25x less productive in making discoveries that translate into economic growth.

Participants at the conference had some explanations for this, of which the ones I remember best are:

1. Only the best researchers in a field actually make progress, and the best researchers are already in a field, and probably couldn’t be kept out of the field with barbed wire and attack dogs. If you expand a field, you will get a bunch of merely competent careerists who treat it as a 9-to-5 job. A field of 5 truly inspired geniuses and 5 competent careerists will make X progress. A field of 5 truly inspired geniuses and 500,000 competent careerists will make the same X progress. Adding further competent careerists is useless for doing anything except making graphs look more exponential, and we should stop doing it. See also Price’s Law Of Scientific Contributions.

2. Certain features of the modern academic system, like underpaid PhDs, interminably long postdocs, endless grant-writing drudgery, and clueless funders have lowered productivity. The 1930s academic system was indeed 25x more effective at getting researchers to actually do good research.

3. All the low-hanging fruit has already been picked. For example, element 117 was discovered by an international collaboration who got an unstable isotope of berkelium from the one reactor in Tennessee capable of synthesizing it, shipped it to a nuclear facility in Russia where it was attached to a titanium film, brought it to a particle accelerator in a different Russian city where it was bombarded with a custom-made exotic isotope of calcium, sent the resulting data to a global team of theorists, and eventually found a signature indicating that element 117 had existed for a few milliseconds. Meanwhile, the first modern element discovery, that of phosphorus in 1669, came from a guy looking at his own piss. We should not be surprised that discovering element 117 needed more people than discovering phosphorus.

Needless to say, my sympathies lean towards explanation number 3. But I worry even this isn’t dismissive enough. My real objection is that constant progress in science in response to exponential increases in inputs ought to be our null hypothesis, and that it’s almost inconceivable that it could ever be otherwise.

Consider a case in which we extend these graphs back to the beginning of a field. For example, psychology started with Wilhelm Wundt and a few of his friends playing around with stimulus perception. Let’s say there were ten of them working for one generation, and they discovered ten revolutionary insights worthy of their own page in Intro Psychology textbooks. Okay. But now there are about a hundred thousand experimental psychologists. Should we expect them to discover a hundred thousand revolutionary insights per generation?

Or: the economic growth rate in 1930 was 2% or so. If it scaled with number of researchers, it ought to be about 50% per year today with our 25x increase in researcher number. That kind of growth would mean that the average person who made $30,000 a year in 2000 should make $50 million a year in 2018.
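
The compounding behind that claim, as a two-line check using the post’s round numbers:

```python
print(30_000 * 1.50 ** (2018 - 2000))  # ~44,300,000: eighteen years at 50%/year, i.e. the ~$50 million figure
print(30_000 * 1.02 ** (2018 - 2000))  # ~42,800: the same eighteen years at the actual ~2% rate
```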

Or: in 1930, life expectancy at 65 was increasing by about two years per decade. But if that scaled with number of biomedicine researchers, that should have increased to ten years per decade by about 1955, which would mean everyone would have become immortal starting sometime during the Baby Boom, and we would currently be ruled by a deathless God-Emperor Eisenhower.

Or: the ancient Greek world had about 1% the population of the current Western world, so if the average Greek was only 10% as likely to be a scientist as the average modern, there were only 1/1000th as many Greek scientists as modern ones. But the Greeks made such great discoveries as the size of the Earth, the distance of the Earth to the sun, the prediction of eclipses, the heliocentric theory, Euclid’s geometry, the nervous system, the cardiovascular system, etc, and brought technology up from the Bronze Age to the Antikythera mechanism. Even adjusting for the long time scale to which “ancient Greece” refers, are we sure that we’re producing 1000x as many great discoveries as they did? If we extended BJRW’s graph all the way back to Ancient Greece, adjusting for the change in researchers as civilizations rise and fall, wouldn’t it keep the same shape as it does for this century? Isn’t the real question not “Why isn’t Dwight Eisenhower immortal god-emperor of Earth?” but “Why isn’t Marcus Aurelius immortal god-emperor of Earth?”

Or: what about human excellence in other fields? Shakespearean England had 1% of the population of the modern Anglosphere, and presumably even fewer than 1% of the artists. Yet it gave us Shakespeare. Are there a hundred Shakespeare-equivalents around today? This is a harder problem than it seems – Shakespeare has become so venerable with historical hindsight that maybe nobody would acknowledge a Shakespeare-level master today even if they existed – but still, a hundred Shakespeares? If we look at some measure of great works of art per era, we find past eras giving us far more than we would predict from their population relative to our own. This is very hard to judge, and I would hate to be the guy who has to decide whether Harry Potter is better or worse than the Aeneid. But still? A hundred Shakespeares?

Or: what about sports? Here are marathon records for the past hundred years or so:

In 1900, there were only two local marathons (eg the Boston Marathon) in the world. Today there are over 800. Also, the world population has increased by a factor of five (more than that in the East African countries that give us literally 100% of top male marathoners). Despite that, progress in marathon records has been steady or declining. Most other Olympic sports show the same pattern.

All of these lines of evidence lead me to the same conclusion: constant growth in response to exponentially increasing inputs is the null hypothesis. If it weren’t, we should be expecting 50% year-on-year GDP growth, easily discovered immortality, and the like. Nobody expected that before reading BJRW, so we shouldn’t be surprised when BJRW provide a data-driven model showing it isn’t happening. I realize this in itself isn’t an explanation; it doesn’t tell us why researchers can’t maintain a constant level of output as measured in discoveries. It sounds a little like “God wouldn’t design the universe that way”, which is a kind of suspicious line of argument, especially for atheists. But it at least shifts us from a lens where we view the problem as “What three tweaks should we make to the graduate education system to fix this problem right now?” to one where we view it as “Why isn’t Marcus Aurelius immortal?”

And through such a lens, only the “low-hanging fruits” explanation makes sense. Explanation 1 – that progress depends only on a few geniuses – isn’t enough. After all, the Greece-today difference is partly based on population growth, and population growth should have produced proportionately more geniuses. Explanation 2 – that PhD programs have gotten worse – isn’t enough. There would have to be a worldwide monotonic decline in every field (including sports and art) from Athens to the present day. Only Explanation 3 holds water.

I brought this up at the conference, and somebody reasonably objected – doesn’t that mean science will stagnate soon? After all, we can’t keep feeding it an exponentially increasing number of researchers forever. If nothing else stops us, then at some point, 100% (or the highest plausible amount) of the human population will be researchers, we can only increase as fast as population growth, and then the scientific enterprise collapses.

I answered that the Gods Of Straight Lines are more powerful than the Gods Of The Copybook Headings, so if you try to use common sense on this problem you will fail.

Imagine being a futurist in 1970 presented with Moore’s Law. You scoff: “If this were to continue only 20 more years, it would mean a million transistors on a single chip! You would be able to fit an entire supercomputer in a shoebox!” But common sense was wrong and the trendline was right.

“If this were to continue only 40 more years, it would mean ten billion transistors per chip! You would need more transistors on a single chip than there are humans in the world! You could have computers more powerful than any today, that are too small to even see with the naked eye! You would have transistors with like a double-digit number of atoms!” But common sense was wrong and the trendline was right.

Or imagine being a futurist in ancient Greece presented with world GDP doubling time. Take the trend seriously, and in two thousand years, the future would be fifty thousand times richer. Every man would live better than the Shah of Persia! There would have to be so many people in the world you would need to tile entire countries with cityscape, or build structures higher than the hills just to house all of them. Just to sustain itself, the world would need transportation networks orders of magnitude faster than the fastest horse. But common sense was wrong and the trendline was right.

I’m not saying that no trendline has ever changed. Moore’s Law seems to be legitimately slowing down these days. The Dark Ages shifted every macrohistorical indicator for the worse, and the Industrial Revolution shifted every macrohistorical indicator for the better. Any of these sorts of things could happen again, easily. I’m just saying that “Oh, that exponential trend can’t possibly continue” has a really bad track record. I do not understand the Gods Of Straight Lines, and honestly they creep me out. But I would not want to bet against them.

Grace et al’s survey of AI researchers shows they predict that AIs will start being able to do science in about thirty years, and will exceed the productivity of human researchers in every field shortly afterwards. Suddenly “there aren’t enough humans in the entire world to do the amount of research necessary to continue this trend line” stops sounding so compelling.

At the end of the conference, the moderator asked how many people thought that it was possible for a concerted effort by ourselves and our institutions to “fix” the “problem” indicated by BJRW’s trends. Almost the entire room raised their hands. Everyone there was smarter and more prestigious than I was (also richer, and in many cases way more attractive), but with all due respect I worry they are insane. This is kind of how I imagine their worldview looking:

I realize I’m being fatalistic here. Doesn’t my position imply that the scientists at Intel should give up and let the Gods Of Straight Lines do the work? Or at least that the head of the National Academy of Sciences should do something like that? That Francis Bacon was wasting his time by inventing the scientific method, and Fred Terman was wasting his time by organizing Silicon Valley? Or perhaps that the Gods Of Straight Lines were acting through Bacon and Terman, and they had no choice in their actions? How do we know that the Gods aren’t acting through our conference? Or that our studying these things isn’t the only thing that keeps the straight lines going?

I don’t know. I can think of some interesting models – one made up of a thousand random coin flips a year has some nice qualities – but I don’t know.
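
To sketch what I mean (this is my construction, and the distribution is an arbitrary assumption): treat each coin flip as a random draw of “discovery quality”, and count only record-breaking draws as progress. With exponential tails on the draws, the record climbs steadily only if the number of draws per year grows exponentially.

```python
import random

random.seed(0)

def record_trajectory(draws_per_year, years):
    """Running maximum of exponentially distributed 'discovery quality' draws.
    The max of N such draws grows like log(N), so exponentially growing
    inputs give roughly linear progress, while constant inputs stall."""
    best, path = 0.0, []
    for year in range(years):
        for _ in range(draws_per_year(year)):
            best = max(best, random.expovariate(1.0))
        path.append(best)
    return path

flat    = record_trajectory(lambda t: 1000, 80)
growing = record_trajectory(lambda t: int(1000 * 1.05 ** t), 80)
# Progress made in the second half of the run: flat stalls, growing keeps pace.
print(flat[-1] - flat[40], growing[-1] - growing[40])
```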

I do know you should be careful what you wish for. If you “solved” this “problem” in classical Athens, Attila the Hun would have had nukes. Remember Yudkowsky’s Law of Mad Science: “Every eighteen months, the minimum IQ necessary to destroy the world drops by one point.” Do you really want to make that number ten points? A hundred? I am kind of okay with the function mapping number of researchers to output that we have right now, thank you very much.

The conference was organized by Patrick Collison and Michael Nielsen; they have written up some of their thoughts here.

Impending Survey Discussion Thread

Sorry, decreased blogging this week because of Thanksgiving.

But I am going to post a new SSC survey in a few weeks. Feel free to use this thread to tell me what you want – from questions you want to see, to methodology issues that bothered you on past surveys, to whatever.

Keep in mind I will probably have to ignore the overwhelming majority of suggestions here.


Links 11/18: MayflowURL

In 532, the Byzantines and Persians signed what they called The Perpetual Peace, so named because it was expected to last forever. It lasted eight years. After the ensuing war, the Byzantines and Persians, now less optimistic, named their new treaty The Fifty Year Peace. It lasted ten years.

Patrick Collison and Michael Nielsen on diminishing returns from science. Some of you have already seen my thoughts on this, but I’ll post them here in a week or two.

Wikipedia has a page on Armenia/Azerbaijan relations in the Eurovision Song Contest. Highlights include the time Azerbaijan’s secret police rounded up everyone who voted for Armenia, the time Armenia claimed Azerbaijan cut off the broadcast to prevent people from seeing Armenia winning, and accusations from Azerbaijani officials that vapid Armenian love song “Don’t Deny” was dog-whistling a point about the Armenian Genocide.

Everything You Know About State Education Rankings Is Wrong. Most rating systems rank state education success based on a combined measure which includes amount of money spent as a positive outcome, making it tautological to “prove” that more funding improves state performance. See also economists Stan Liebowitz and Matthew Kelly’s corrected ranking table, which also adjusts for some confounders.

The most significant Christian schism of the past five hundred years happened last month, when the Russian Orthodox Church severed ties with the Eastern Orthodox Patriarch of Constantinople due to an argument about Ukraine.

California may allow marijuana, and it may allow alcohol, but at least it’s taking a strong stance against cocktails that include CBD, for some reason.

Recent news in scientific publishing: two statisticians launch RESEARCHERS.ONE (site, Andrew Gelman blog post), a “souped-up Arxiv with pre- and post-publication review”. And Elsevier files a lawsuit forcing a Swedish ISP to ban Sci-Hub; the ISP complies but also bans Elsevier. Also: preregistration works.

Related?: A Chinese barbecue restaurant named itself The Lancet after a top medical journal, and is offering discounts for researchers based on the impact factor of the journals they’ve published in. (h/t Julia Galef)

Experimental archaeology is the practice of doing things we think ancient people might have done to learn more about the details. For example, the Trireme Trust built and rowed a functional Greek trireme to learn more about how triremes worked.

Researchers crack the brain’s code for storing faces (paper, news article), describing it as “a high-dimensional analogy of the familiar RGB code for colors, allowing realistic faces to be accurately decoded with…a small number of cells”.

The Alpine-Himalayan orogenic belt connects the Pyrenees, Alps, Carpathians, Caucasus, Zagros, Tian Shan, and Himalayan ranges.

In what might be the most impressive temper tantrum of all time, the Saudis, angry about Qatar’s support for regional enemy Iran, are planning to dig a giant canal to turn Qatar into an island.

Did you know there are still object-level arguments about libertarianism sometimes? It’s true! See Bryan Caplan’s delightfully named Optimality Vs. Fire. Another interesting Caplan: The Triumph Of Ayn Rand’s Worst Idea.

I am always a sucker for the “X as dril tweets” genre, so here is philosophers as dril tweets. EG:


If you want to see all of (someone’s idiosyncratic and dubious selection of what counts as) the rationality-related subreddits in one place, there’s now a Rationality Reddit Feed. Also, gwern has a subreddit now.

Sarah Kliff at Vox is trying to bring transparency to ER prices with a database of what each hospital’s fees are (though it doesn’t look like it’s the kind of transparency where you’re allowed to see the database, apparently for medical privacy law reasons). If you have a recent ER bill, you can submit here, or you can see some of Vox’s reporting on the issue here.

Related: if you missed your previous opportunity to write about effective altruism for Vox, they’re hiring another effective altruism writer/reporter. You can see some of the excellent work by their current EA reporter here.

Science disproves your intrusive thoughts: Most Initial Conversations Go Better Than People Think.

Scandal at meta-analysis producer the Cochrane Collaboration as board members resign en masse. The story seems to go like this: The Collaboration did a meta-analysis showing that HPV vaccines are safe and effective. Cochrane board member Peter Gøtzsche (previously featured here as author of my favorite study on the placebo effect) wrote a savage takedown in the British Medical Journal saying the HPV review did not meet Cochrane standards and should not have been published. The Collaboration’s Board was apparently angry that he took this dispute public and a bare majority voted to expel him. Then the other half of the board stepped down in protest. So much for the one organization we were previously able to trust 🙁

And another academic scandal: Eiko Fried and James Coyne are two of my favorite psychologists and crusaders for high standards in psychology. They’ve recently been having a bad time. As far as I can understand it, Coyne is (by his own admission) well known for being extremely blunt and not afraid of personal attacks on people he thinks deserve it. Fried wrote an article about how a climate of personal attacks and nastiness in the psychology community have gone too far, and most of his examples were of Coyne. Coyne wrote some things accusing Fried of tone policing, but also sued Fried for “cyberbullying” and spread rumors that he was “aligned with racism”. Now Fried has 100% won the lawsuit, the rumors against him have been debunked, various people have come out saying they were harassed by Coyne (and apparently there was also a case of “assault and battery”!) and various institutions Coyne is affiliated with have unaffiliated with him (or said they were never as affiliated as he claimed). I’m really disappointed in this, but it’s helped crystallize some things for me. First, that although cyberbullying is a big problem, mindlessly cracking down on it is dangerous for exactly the reasons shown here – a cyberbully trying to silence their victim by suing them for cyberbullying (and the “aligned with racism” slur is a parallel warning on the dangers of moral panics). And second, that complaints about “tone policing” can often be a smokescreen for just genuinely being a bad actor.

“Superpermutations” are strings that contain every possible permutation of some number of symbols as substrings. The field recently received a jolt when a proof of the lower bound on the length of the shortest superpermutation was discovered to have been posted by an anonymous user on a 4chan thread about how many different ways you could watch anime episodes. Now in an equally weird twist of fate, the corresponding upper bound has been proven by sci-fi writer Greg Egan, author of Permutation City.
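
For the record, the two bounds on the length L(n) of the shortest superpermutation of n symbols, as I understand the results (the 4chan post supplies the lower bound, Egan’s construction the upper):

```latex
n! + (n-1)! + (n-2)! + n - 3 \;\le\; L(n) \;\le\; n! + (n-1)! + (n-2)! + (n-3)! + n - 3
```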

Let’s Fund (description, site) is a crowdfunding site for effective altruism that helps people discover or coordinate campaigns.

This article is called YouTubers Will Enter Politics And The Ones Who Do Are Probably Going To Win, but it focuses on Kim Kataguiri (age 22, the youngest person ever elected to Brazil’s Congress) and other right-wing YouTubers who won positions in the recent Brazilian elections.

The world’s new tallest statue is India’s Statue Of Unity, a 600-foot-high (and impressively realistic) depiction of independence hero Sardar Patel.

In 1861, a Tokugawa-era author published the first Japanese book ever on the newly-contacted land of America, called Osanaetoki Bankokubanashi. Although beautifully illustrated, the content was a bit fanciful…

…and by “a bit fanciful”, I mean that this is a depiction of John Adams asking a mountain fairy to help avenge the death of his mother, who was eaten by a giant snake. I assumed the book had to be fake, but Kyoto University seems to endorse it as real. You can find more of Nick Kapur’s commentary here and the rest of the book here.

Karl Friston, previously the subject of a bemused SSC post, is now the subject of an only-somewhat-bemused Wired story. The way this story presents the free energy principle makes it much more of an obvious match for control theory, so much that I’m wondering if I’m misunderstanding it. Related: some computational neuroscience principles used to make a curiosity-driven AI.

Mathematical proofs small enough to fit on Twitter: every odd integer is the difference of two squares.
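
It really does fit: every odd number 2k + 1 is the gap between consecutive squares,

```latex
(k+1)^2 - k^2 = 2k + 1, \qquad k = 0, 1, 2, \ldots
```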

From the subreddit: the most successful fraudster of all time may have been Jho Low, a financier who offered to manage Malaysia’s $4 billion sovereign wealth fund, took the $4 billion, and walked away with it.

Ever wonder why charities (and other organizations) that say they have enough funding but complain they can’t find enough good employees don’t just raise the salaries they’re offering until they can? Here’s an 80,000 Hours survey on the topic. The main insight is that if a group has 20 employees and can’t find a 21st, then if they want to raise the open position’s salary by X in order to attract more people, they need to raise all their existing employees’ salaries by X or those employees will reasonably complain they’re getting paid less for the same work. So the cost of raising the salary they’re offering for an empty position is less like X and more like 21X.

Sorry, non-Californians, more on the CA ballot propositions – here’s a table of how the state voted on each vs. how SFers voted vs. how LAers voted vs. what the relevant newspapers endorsed. It looks like everyone is pretty much in alignment except the San Francisco Chronicle, which hates everything.

Many Indo-European languages use euphemisms for “bear”, sometimes several layers of euphemism, because of a fear that speaking the bear’s true name might summon it. The English word “bear” is a euphemism originally meaning “brown one”. Inside the quest to reconstruct the bear’s True Name. NB: do not read this article aloud or you might get eaten by bears.


OT115: Oberon Thread

This is the bi-weekly visible open thread (there are also hidden open threads twice a week you can reach through the Open Thread tab on the top of the page). Post about anything you want, but please try to avoid hot-button political and social topics. You can also talk at the SSC subreddit or the SSC Discord server – and also check out the SSC Podcast. Also:

1. Comment of the week is John Schilling, explaining the good-cop bad-cop relationship between state courts and state legislators.

2. From now on, I will be deleting comments that say “first!”, whether or not they also say other things. Come on, people.


The Economic Perspective On Moral Standards

[Content warning: scrupulosity. Some recent edits, see Mistakes page for details.]

I.

There are some pretty morally unacceptable things going on in a pretty structural way in society. Sometimes I hear some activists take this to an extreme: no currently living person is morally acceptable. People who aren’t reorienting their entire lives around acknowledging and combating the evils of the world aren’t even on the scale. And people who are may be (in the words of one of my friends who is close to that community) “only making comfortable sacrifices that let them think of themselves as a good person within their existing comfortable moral paradigm, instead of confronting the raw terrible truth.” IE “If you think you’re one of the good ones, you’re wrong”.

I have heard this sentiment raised by animal rights activists. The average meat-eater isn’t even on the scale. The average vegetarian still eats milk and cheese, and so is barely even trying. Even most vegans probably use some medical product with gelatin, or something tested on lab rats, or are just benefitting from animal suffering in some indirect way.

And I have heard it raised by environmentalists. The average SUV driver isn’t even on the scale. The average conscientious liberal might think they’re better because they bike to work and recycle, but they still barely think about how they’re using electricity generated by coal plants and eating food grown with toxic pesticides. Everyone could be doing more.

And I have heard it raised by labor activists. Most of us use stuff made in sweatshops. Even if you avoid sweatshops, you probably use stuff made at less than a living wage. Even if you avoid that, are you doing everything you can to help and support workers who earn less than you do?

Even if you aren’t an animal rights activist, environmentalist, or labor advocate, do you believe in anything? Are you a Christian, a social justice advocate, or a rationalist? Do you know anyone who really satisfies you as being sinless, non-racist, and/or rational? Then perhaps you too believe nobody is good.

We shouldn’t immediately dismiss the idea that nobody is good. By our standards, there were many times and places where this was true. I am not aware of any ancient Egyptians who were against slavery. By Roman times, a handful of people thought it might be a bad idea, but nobody lifted a finger to stop it. I doubt you could find any Roman at the intersection of currently acceptable positions on slavery, torture, women’s rights, and sexuality. Maybe a few followers of Epicurus – but is there much difference between 0.01% of people being good, and nobody being good?

But let’s back up here philosophically. There’s a clear definition of “perfectly good” – someone who has never deviated from morally optimal behavior in any way. “Nobody is perfectly good” is, I think, an uncontroversial statement. If “nobody is good” is controversial, it’s because we expect “good” to be a lower bar than “perfectly good”, representing a sort of minimum standard of okayness. It might be possible that nobody meets even a minimum standard of okayness – the Roman example still seems relevant – but we should probably back up further and figure out how we’re setting okayness standards.

My subjective impression of what we mean by “good” in the sense of “a decent person” or “minimally okay” has internal and external components. Internally, it means a person who is allowed to feel good about themselves instead of feeling guilty. Externally, it means a person who deserves praise rather than punishment.

Some people would deny one side or the other of this dichotomy. For example, some people believe nobody should ever feel guilty. Or, on the other hand, the “do you want a fucking cookie?” attitude activists sometimes take toward people who expect praise for their acts of support. This seems to rest on an assumption that even socially rare levels of virtue fall within the realm of “your basic minimal duty as a human being” and so should not get extra praise.

But I find the “good person”/”not a good person” dichotomy helpful. I’m not claiming it objectively exists. I can’t prove anything about ethics objectively exists. And even if there were objective ethical truths about what was right or wrong, that wouldn’t imply that there was an objective ethical truth about how much of the right stuff you have to do before you can go around calling yourself “good”. In the axiology/morality/law trichotomy, I think of “how much do I have to do in order to be a good person” as within the domain of morality. That means it’s a social engineering question, not a philosophical one. The social engineering perspective assumes that “good person” status is an incentive that can be used to make people behave better, and asks how high vs. low the bar should be set to maximize its effectiveness.

Consider the way companies set targets for their employees. At good companies, goals are ambitious but achievable. If the CEO of a small vacuum company tells her top salesman to sell a billion vacuums a year, this doesn’t motivate the salesman to try extra hard. It’s just the equivalent of not setting a goal at all, since he’ll fail at the goal no matter what. If the CEO says “Sell the most vacuums you can, and however many you sell, I will yell at you for not selling more”, this also probably isn’t going to win any leadership awards. A good CEO might ask a salesman to sell 10% more vacuums than he did last year, and offer a big bonus if he can accomplish it. Or she might say that the top 20% of salesmen will get promotions, or that the bottom 20% of salesmen will be fired, or something like that. The point is that the goal should effectively carve out two categories, “good salesman” and “bad salesman”, such that it’s plausible for any given salesman to end up in either, then offer an incentive that makes him want to fall in the first rather than the second.

I think of society setting the targets for “good person” a lot like a CEO setting the targets for “good vacuum salesman”. If they’re attainable and linked to incentives – like praise, honor, and the right to feel proud of yourself – then they’ll make people put in an extra effort so they can end up in the “good person” category. If they’re totally unattainable and nobody can ever be a good person no matter how hard they try, then nobody will bother trying. This doesn’t mean nobody will be good – some people are naturally good without hope for reward, just like some people will slave away for the vacuum company even when they’re underpaid and underappreciated. It just means you’ll lose the extra effort you would get from having a good incentive structure.

So what is the right level at which to set the bar for “good person”? An economist might think of this question as a price-setting problem: society is selling the product “moral respectability” and trying to decide how many units of effort to demand from potential buyers in order to maximize revenue. Set the price too low, and you lose out on money that people would have been willing to pay. Set the price too high, and you won’t get any customers. Solve for the situation where you have a monopoly on the good and the marginal cost of production is zero, and this is how you set the “good person” bar.
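
Here’s that metaphor as a toy computation (everything about it, especially the willingness-to-pay distribution, is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Each person's maximum willingness to pay (in units of moral effort) for the
# right to consider themselves a good person - an arbitrary assumption.
willingness = rng.lognormal(mean=2.0, sigma=1.0, size=100_000)

def effort_extracted(bar):
    # Zero-marginal-cost monopoly: revenue = price x number of buyers.
    return bar * (willingness >= bar).sum()

bars = np.linspace(0.5, 40, 400)
best_bar = max(bars, key=effort_extracted)
print(best_bar)  # an interior optimum: too high and nobody buys,
                 # too low and willing effort goes uncollected
```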

I don’t have the slightest idea how you would actually go about doing that, and it’s just a metaphor anyway, so let me give some personal stories and related considerations.

II.

When I was younger, I determined that I had an ethical obligation to donate more money to charity, and that I was a bad person for giving as little as I did. But I also knew that if I donated more, I would be a bad person for not donating even more than that. Given that there was no solution to my infinite moral obligation, I just donated the same small amount.

Then I met a committed group of people who had all agreed to donate 10%. They all agreed that if you donated that amount you were doing good work and should feel proud of yourself. And if you donated less than that, then they would question your choice and encourage you to donate more. I immediately pledged to donate 10%, which was much more than I had been doing until then.

Selling the “you can feel good about the amount you’re donating to charity” product for 10% produces higher profits for the charity industry than selling it for 100%, at least if many people are like me.

III.

I can see an argument for an even looser standard: you should aim to be above average.

This is a very low bar. I think you might beat the average person on animal rights activism just by not stomping on anthills. The yoke here is really mild.

But if you believe in something like universalizability or the categorical imperative, “act in such a way that you are morally better than average” is a really interesting maxim! If everyone is just trying to be in the moral upper half of the population, the population average morality goes up. And up. And up. There’s no equilibrium other than universal sainthood.
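
A quick simulation of that ratchet (my construction, with arbitrary parameters): everyone below the current median pulls themselves just above it each generation, and the median climbs without bound.

```python
import random, statistics

random.seed(0)
virtue = [random.gauss(0, 1) for _ in range(1001)]

for generation in range(10):
    m = statistics.median(virtue)
    # Everyone strives to be (slightly) better than the current average.
    virtue = [v if v > m else m + 0.1 for v in virtue]
    print(generation, round(statistics.median(virtue), 2))
# The only fixed point is everyone tied at the top: universal sainthood.
```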

This sounds silly, but I think it might have been going on over the past few hundred years in areas like racism and sexism. The anti-racism crusaders of yesteryear were, by our own standards, horrendously racist. But they were the good guys, fighting people even more racist than they were, and they won. Iterate that process over ten or so generations, and you reach the point where you’ve got to run your Halloween costume past your Chief Diversity Officer.

Another good thing about the number 50% is that it means there will always be as many good people as bad people. This can prove helpful if, for example, the bad people don’t like being called bad people, and want to take over the whole process of moral progress so they can declare themselves the good guys.

I’m not saying the bar should be set exactly at average. For one thing, this would mean that we could never have a situation where everyone was good enough. I think the American people are basically okay on the issue of fratricide, and I don’t want to accidentally imply that the least anti-fratricide 50% of people should feel bad about themselves. For another thing, it might be possible to move the generational process of moral progress faster if the bar is set at a higher level.

But I think these considerations at least suggest that the most effective place to set the bar might be lower than we would naively expect. I think this is how I treat people in real life. I have trouble condemning anyone who is doing more than the median. And although I don’t usually go around praising people, anybody who is doing more than the average for their time and place seems metaphorically praiseworthy to me.

IV.

A friend brings up an objection: even if low standards extract the most units of moral effort from the population, that might not be what’s important. In some cases, it’s more important that a few very important people put forth an extraordinary effort than that everyone does okay. For example, whether a billionaire donates some high percent of their money matters more than whether you or I do. And whether a brilliant scientist devotes their career to fighting disease or existential risk matters much more than what the rest of us do.

I’m still trying to think about this, but naively it seems that we can treat the set of all exceptional people the same as the set of all people in general, and standards which extract the most moral value from one will also apply to the other. This especially makes sense if the standards are normed to effort rather than absolute results – for example, “everyone should donate 10%” extends to billionaires better than “everyone should donate $100”; diminishing marginal utility issues do argue that billionaires should donate more than that, but once you find the right unit the same argument should work.

One exception might be if we would otherwise hold exceptional people to higher standards. For example, if everybody tries to pressure a billionaire to donate most of their fortune, or on a brilliant scientist to work very hard to fight disease, then having a universal standard that it’s okay to do a little more than average might make this pressure less effective. Maybe trying to hold the average person to high standards usually backfires, but everyone would be able to successfully gang up on exceptional people to enforce high standards for them. If this were true, then a low bar might hold in Mediocristan, and a higher bar in Extremistan. Things like “not eating animal products” or “not driving an SUV” are in Mediocristan; things like curing cancer or donating wealth might be in Extremistan.

Another exception would be if it’s more important to extract many units of moral worth from the same people, rather than distributed across the population. For example, if cancer research (or AI research) turns out to be the most important thing, then even in the counterfactual world where everyone is equally intelligent, it’s important that somebody be the person to spend five years laboriously learning the basics so that they can make progress, and then that this person spend as much effort as possible doing the research and exploiting their training. A random person donating one hour of their time here or there is almost useless in comparison. In this case, we might want to have the highest standards possible, since a world in which most people give up and do nothing (but a few people accept the challenge and do a lot) is better than the alternative where everybody does a tiny bit. Again, this seems more applicable to issues like curing cancer or aligning AI than ones like not driving an SUV or using products made with unfair labor practices.

V.

I tried being vegetarian for a long time. Given that I can’t eat any vegetables (something about the combination of the bitter taste and the texture; they almost universally make me gag), this was really hard and I kept giving up and going back to subsisting mostly off of meat. The cost of “you can feel good about the amount you’re doing to not contribute to animal suffering” was apparently higher than I could afford.

For the past year, I’ve been following a more lax rule: I can’t eat any animal besides fish at home, but I can have meat (other than chicken) at restaurants. I’ve mostly been able to keep that rule, and now I’m eating a lot less meat than I did before.

This is a pathetic rule compared to even real pescetarians, let alone real vegetarians, let alone vegans. But I can tell this is right on the border of what I’m capable of doing; every time I go to the supermarket, I have an intense debate with the yetzer hara about whether I should buy meat to eat at home that week, and on rare occasions I give in to temptation. But in general I hold back. And part of what holds me back is that I let myself feel like I am being good and helping save animals and have the right to feel proud when I keep my rule, and I beat myself up and feel bad and blameworthy when I break it.

I am sure any serious animal rights activist still thinks I am scum. Possibly there is an objective morality, and it agrees I am scum. But if I am right that this is the strictest rule I can keep, then I’m not sure who it benefits to remind me that I am scum. Deny me the right to feel okay when I do my half-assed attempt at virtue, and I will just make no attempt at virtue, and this will be worse for me and worse for animals.

Sticking to the economics metaphor, this is price discrimination. Companies try to figure out tricks to determine how rich you are, and then charge rich people more and poor people less: the goal is everyone buying the product for the highest price that they, personally, are willing to pay. Society should sell me “feel good about the amount you’re doing for animals” status for the highest price that leaves me preferring buying the product to not doing so. If someone else is much richer in willpower, or just likes vegetables more, society should charge them more.

Companies rarely try price discrimination except in the sneakiest and most covert ways, because it’s hard to do well and it makes everybody angry. Society does whatever it does – empirically, not care about animals unless someone is torturing a puppy or something. But we-as-members-of-society have the ability to practice price discrimination on ourselves-as-individuals, and sell our own right to feel okay about ourselves for whatever amount we as experts in our own preferences believe we can bear.

I don’t know the answer to the question of where “we” “as” “a” “society” “should” “set” “moral” “standards”. But if you’re interested in the question, and you have a good sense for what you are and aren’t capable of, maybe practicing price discrimination on yourself is the way to go.

Preschool: Much More Than You Wanted To Know

I.

A lot of people pushed back against my post on preschool, so it looks like we need to discuss this in more depth.

A quick refresher: good randomized controlled trials have shown that preschools do not improve test scores in a lasting way. Sometimes test scores go up a little bit, but these effects disappear after a year or two of regular schooling. However, early RCTs of intensive “wrap-around” preschools like the Perry Preschool Program and the Abecedarian Project found that graduates of those programs went on to have markedly better adult outcomes, including higher school graduation rates, more college attendance, less crime, and better jobs. But these studies were done in the 60s, before people invented being responsible, and had kind of haphazard randomization and followup. They also had small sample sizes, and came from programs that were more intense than any of the scaled-up versions that replaced them. Modern scaled-up preschools like Head Start would love to be able to claim their mantle and boast similar results. But the only good RCT of Head Start, the HSIS study, is still relatively young. It’s confirmed that Head Start test score gains fade out. But it hasn’t been long enough to study whether there are later effects on life outcomes. We can expect those results in ten years or so. For now, all we have is speculation based on a few quasi-experiments.

Deming 2009 is my favorite of these. He looks at the National Longitudinal Survey of Youth, a big nationwide survey that gets used for a lot of social science research, and picks out children who went to Head Start. These children are mostly disadvantaged because Head Start is aimed at the poor, so it would be unfair to compare them to the average child. He’s also too smart to just “control for income”, because he knows that’s not good enough. Instead, he finds children who went to Head Start but who have siblings who didn’t, and uses the sibling as a matched control for the Head Starter.
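
In miniature, the estimator looks something like this (the rows are hypothetical; this is a sketch of the design, not Deming’s actual code):

```python
import statistics

# (family_id, went_to_head_start, graduated_high_school) - made-up rows.
kids = [
    (1, True, 1), (1, False, 0),
    (2, True, 1), (2, False, 1),
    (3, True, 0), (3, False, 0),
]

# Difference each Head Starter against their own sibling, so anything the
# siblings share - income, neighborhood, parenting - cancels out.
by_family = {}
for family, head_start, graduated in kids:
    by_family.setdefault(family, {})[head_start] = graduated

diffs = [sibs[True] - sibs[False] for sibs in by_family.values()
         if True in sibs and False in sibs]
print(statistics.mean(diffs))  # within-family estimate of the Head Start effect
```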

This ensures the controls will come from the same socioeconomic stratum, but he acknowledges it raises problems of its own. Why would a parent send one child to Head Start but not another? It might be that one child is very stupid and so the parents think they need the extra help preschool can provide; if this were true, it would mean Head Starters are systematically dumber than controls, and would underestimate the effect of Head Start. Or it might be that one child is very smart and so the parents want to give them more education so they can develop their full potential; if this were true, it would mean Head Starters are systematically smarter than controls, and would inflate the effect of Head Start. Or it might be that parents love one of their children more and put more effort into supporting them; if this meant these children got other advantages, it would again inflate the effect of Head Start. Or it might mean that parents send the child they love more to a fancy private preschool, and the child they love less gets stuck in Head Start, ie the government program for the disadvantaged. Or it might be that parents start out poor, send their child to Head Start, and then get richer and send their next child to a fancy private preschool, while that child also benefits from their new wealth in other ways. There are a lot of possible problems here.

Deming tries very hard to prove none of these are true. He compares Head Starters and their control siblings on thirty different pre-study variables, including family income during their preschool years, standardized test scores, various measures of health, number of hours mother works during their preschool years, breastfedness, etc. Of these thirty variables, he finds a significant difference on only one: birth weight. Head Starters were less likely to have very low birth weight than their control siblings. This is a moderately big deal, since birth weight is a strong predictor of general child health and later life success. But:

Given the emerging literature on the connection between birth weight and later outcomes, this is a serious threat to the validity of the [study]. There are a few reasons to believe that the birth weight differences are not a serious source of bias, however. First, it appears that the difference is caused by a disproportionate number of low-birth-weight children, rather than by a uniform rightward shift in the distribution of birth weight for Head Start children. For example, there are no significant differences in birth weight once low-birth-weight children (who represent less than 10 percent of the sample) are excluded.

Second, there is an important interaction between birth order and birth weight in this sample. Most of the difference in mean birth weight comes from children who are born third, fourth, or later. Later-birth-order children who subsequently enroll in Head Start are much less likely to be low birth weight than their older siblings who did not enroll in preschool. When I restrict the analysis to sibling pairs only, birth weight differences are much smaller and no longer significant, and the main results are unaffected. Finally, I estimate all the models in Section V with low-birth-weight children excluded, and, again, the main results are unchanged.

Still, to get a sense of the magnitude of any possible positive bias, I back out a correction using the long-run effect of birth weight on outcomes estimated by Black, Devereux, and Salvanes (2007). Specifically, they find that 10 percent higher birth weight leads to an increase in the probability of high school graduation of 0.9 percentage points for twins and 0.4 percentage points for siblings. If that reduced form relationship holds here, a simple correction suggests that the effect of Head Start on high school graduation (and by extension, other outcomes) could be biased upward by between 0.2 and 0.4 percentage points, or about 2–5 percent of the total effect.

Having set up his experimental and control group, Deming does the study and determines how well the Head Starters do compared to their controls. The test scores show some confusing patterns that differ by subgroup. Black children (the majority of this sample; Head Start is aimed at disadvantaged people in general and sometimes at blacks in particular) show the classic pattern of slightly higher test scores in kindergarten and first grade, fading out after a few years. White children never see any test score increases at all. Some subgroups, including boys and children of high-IQ mothers, see test score increases that don’t seem to fade out. But these differences in significance are not themselves significant and it might just be chance. Plausibly the results for blacks, who are the majority of the sample, are the real results, and everything else is noise added on. This is what non-subgroup analysis of the whole sample shows, and it’s how the study seems to treat it.

The nontest results are more impressive. Head Starters are about 8% more likely to graduate high school than controls. This pattern is significant for blacks, boys, and children of low-IQ mothers, but not for whites, girls, and children of high-IQ mothers. Since the former three categories are the sorts of people at high risk of dropping out of high school, this is probably just floor effects. Head Starters are also less likely to be diagnosed with a learning disability (remember, learning disability diagnosis is terrible and tends to just randomly hit underperforming students), and marginally less likely to repeat grades. The subgroup results tend to show higher significance levels for groups at risk of having bad outcomes, and lower significance levels for the rest, just as you would predict. There is no effect on crime. For some reason he does not analyze income, even though his dataset should be able to do that.

He combines all of this into an artificial index of “young adult outcomes” and finds that Head Start adds 0.23 SD. You may notice this is less than the 0.3 SD effect size of antidepressants that everyone wants to dismiss as meaningless, but in the social sciences apparently this is pretty good. Deming optimistically sums this up as “closing one-third of the gap between children with median and bottom-quartile family income”, as “75% of the black-white gap”, and as “80% of the benefits of [Perry Preschool] at 60% of the cost”.

Finally, he does some robustness checks to make sure this is not too dependent on any particular factor of his analysis. I won’t go into these in detail, but you can find them on page 127 of the manuscript, and it’s encouraging that he tries this, given that I’m used to reading papers by social psychologists who treat robustness checks the way vampires treat garlic.

Deming’s paper is very similar to Garces, Thomas & Currie (2002), which applies the same methodology to a different dataset. GTC is earlier and more famous and probably the paper you’ll hear about if you read other discussions of this topic; I’m focusing on Deming because I think his analyses are more careful and he explains what he’s doing a lot better. Reading between the lines, GTC do not find any significant effects for the sample as a whole. In subgroup analyses, they find Head Start makes whites more likely to graduate high school and attend college, and blacks less likely to be involved in crime. One can almost sort of attribute this to floor effects; blacks are many times more likely to have contact with the criminal justice system, and there are more blacks than whites in the sample, so maybe it makes sense that this is only significant for them. On the other hand, when I look at the results, there was almost as strong a positive effect for whites (ie Head Start whites committed more crimes, to the same degree Head Start blacks committed fewer crimes) – but there were fewer whites so it didn’t quite reach significance. And the high school results don’t make a lot of sense however you parse them. GTC use the words “statistically significant” a few times, so you know they’re thinking about it. But they don’t ever give significance levels for individual results and one gets the feeling they’re not very impressive. Their pattern of results isn’t really that similar to Deming’s either – remember, Deming found that all races were more likely to graduate high school, and no race had less crime. GTC also don’t do nearly as much work to show that there aren’t differences between siblings. Deming is billed as confirming or replicating GTC, but this only seems true in the sense that both of them say nice things about Head Start. Their patterns of results are pretty different, and GTC’s are kind of implausible.

And for that matter, ten years earlier two of these authors, Currie and Thomas, did a similar study. They also use the National Longitudinal Survey of Youth, meaning I’m not really clear how their analysis differs from Deming’s (maybe it’s much earlier and so there’s less data?) They first use an “adjust for confounders” model and it doesn’t work very well. Then they try a comparing-siblings model and find that Head Starters are generally older than their no-preschool siblings, and also generally born to poorer mothers (these are probably just the same result; mothers get less poor as they get older). They also tend to do better on a standardized test, though the study is very unclear about when they’re giving this test so I can’t tell if they’re saying that group assignment is nonrandom or that the intervention increased test scores. They find Head Start does not increase income, maybe inconsistently increases test scores among whites but not blacks, decreases grade repetition for whites but not blacks, and improves health among blacks but not whites. They also look into Head Start’s effect on mothers, since part of the wrap-around program involves parent training. All they find is mild effects on white IQ scores, plus “a positive and implausibly large effect of Head Start on the probability that a white mother was a teen at the first birth” which they say is probably sampling error. Like the later study, this study does not give p-values and I am too lazy to calculate them from the things they do give, but it doesn’t seem like they’re likely to be very good.

Finally, Deming’s work was also replicated and extended by a team from the Brookings Institution. I think what they’re doing is taking the National Longitudinal Survey of Youth – the same dataset Deming and one of the GTC papers used – and updating it after a few more years of data. Like Deming, they find that “a wide variety” of confounders do not differ between Head Starters and their unpreschooled siblings. Because they’re with the Brookings Institution, their results are presented in a much prettier way than anyone else’s:

The Brookings replication (marked THP here) finds effect sizes somewhat larger than GTC’s, but somewhat smaller than Perry Preschool’s. It looks like they find a positive and significant effect on high school graduation for Hispanics, but not blacks or whites, which is a different weird racial pattern than all the previous weird racial patterns. Since their sample was disproportionately black and Hispanic, and the blacks almost reached significance, the whole sample is significant. They find increases of about 6% in high school graduation rates, compared to Deming’s claimed 8%, but on this chart it’s hard to see how Deming said his 8% was 80% as good as Perry Preschool. There are broadly similar effects on some other things like college attendance, self-esteem, and “positive parenting”. They conclude:

These results are very similar to those by Deming (2009), who calculated high school graduation rates on the more limited cohorts that were available when he conducted his work.

These four studies – Deming, GTC, CT, and Brookings – all try to do basically the same thing, though with different datasets. Their results all sound the same at the broad level – “improved outcomes like high school graduation for some racial groups” – but at the more detailed level they can’t really agree on which outcomes improve and which racial groups they improve for. I’m not sure how embarrassing this should be for them. All of their results seem to hover at the border of significance, occasionally dipping below it and occasionally rising above it, which helps explain the contradictions while also being kind of embarrassing in and of itself (Deming’s paper is the exception, with several results significant at the 0.01 level). Most of them do find things generally going in the right direction, and generally sane-looking findings. Overall I feel like Deming looks pretty good, the Brookings replication is too underspecified for me to have strong opinions on, and the various GTC papers neither add nor subtract much from this.

II.

I’m treating Ludwig and Miller separately because it’s a different – and more interesting – design.

In 1965, the government started an initiative to create Head Start programs in the 300 poorest counties in the US. There was no similar attempt to help counties #301 and above, so there’s a natural discontinuity at county #300. This is the classic sort of case where you can do a regression discontinuity experiment, so Ludwig and Miller decided to look into it and see if there was some big jump in child outcomes as you moved from the 301st-poorest-county to the 300th.
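
For anyone unfamiliar with the method, here’s a minimal sketch of the regression discontinuity idea on simulated data – the jump size, noise level, and bandwidth here are all invented for illustration, not taken from Ludwig and Miller:

```python
# Toy regression discontinuity: counties ranked by poverty, with Head Start
# funding going to ranks 1-300. All numbers are invented for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
rank = np.arange(1, 601)                # poverty rank, 1 = poorest
treated = (rank <= 300).astype(float)   # funded counties
# Outcome trends smoothly with rank, plus a hypothetical 5-point jump at the cutoff
grad_rate = 50 + 0.02 * rank + 5 * treated + rng.normal(0, 3, size=600)

# Local linear regression near the cutoff, with separate slopes on each side
dist = rank - 300
X = sm.add_constant(np.column_stack([treated, dist, treated * dist]))
near = np.abs(dist) <= 100              # bandwidth of 100 ranks
fit = sm.OLS(grad_rate[near], X[near]).fit()
print(fit.params[1], fit.pvalues[1])    # estimated jump at the cutoff (~5) and its p-value
```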

They started by looking into health outcomes, and found a dramatic jump. Head Start appears to improve outcomes for certain easily-preventable childhood diseases by 33-50%. For example, kids from counties with Head Start programs had much less anemia. Part of the Head Start program is screening for anemia and supplementing children with iron, which treats many anemias. So this is very unsurprising. Remember that the three hundred poorest counties in 1965 were basically all majority-black counties in the Deep South and much worse along every axis than you would probably expect – we are talking near-Third-World levels of poverty here. If you deploy health screening and intervention into near-Third-World levels of poverty, then the rates of easily preventable diseases should go down. Ludwig and Miller find they do. This is encouraging, but not really surprising, and maybe not super-relevant to the rest of what we’re talking about here.

But they also find a “positive discontinuity” in high school completion of about 5%. Kids in the 300th-and-below-poorest counties were about 5% more likely than kids in the 301st-and-above-poorest to finish high school. This corresponds to an average of staying in school six months longer. This discontinuity did not exist before Head Start was set up, and it does not exist among children who were the wrong age to participate in Head Start at the time it was set up. It comes into existence just when Head Start is set up, among the children who were in Head Start. This is a pretty great finding.

Unfortunately, it looks like this. The authors freely admit this is just at the limit of what they can detect at p < 0.05 in their data. They double-check with another data source, which shows the same trend but is only significant at p < 0.1. “Our evidence for positive Head Start impacts on educational attainment is more suggestive, and limited by the fact that neither of the data sources available to us is quite ideal.” This study has the strongest design, and it does find an effect, but the effect is basically squinting at a graph and saying “it kind of looks like that line might be a little higher than the other one”. They do some statistics, but they are all the statistical equivalent of squinting at the graph and saying “it kind of looks like that line might be a little higher than the other one”, and about as convincing. For a more complete critical look, see this post from the subreddit.

There is one other slightly similar regression discontinuity study, Carneiro and Ginja, which regresses a sample of people on Head Start availability and tries to prove that people who went to Head Start because they were just within the availability cutoff do better than people who missed out on Head Start because they were just outside it. This sounds clever and should be pretty credible. They find a bunch of interesting effects, like that Head Starters are less likely to be obese and less likely to be depressed. They find that non-blacks (but not blacks) are less likely to be involved in crime (which, remember, is the opposite of what the last paper about Head Start, crime, and race found). But they don’t find any effect on likelihood of graduating high school or attending college. Also, they bury this result, and everyone cites this paper as “Look, they’ve replicated that Head Start works!”

III.

A few scattered other studies to put these in context:

In 1980, Chicago created “Child Parent Centers”, a preschool program aimed at the disadvantaged, much like all of these others we’ve been talking about. They did a study, which for some reason published its results in a medical journal, and which doesn’t really seem to be trying in the same way as the others. For example, it really doesn’t say much about the control group except that it was “matched”. Taking advantage of their unusually large sample size and excellent follow-up, they find that their program made children stay in school the same six months longer as many of the other studies find, had a strong effect on college completion (raising it from 8% to 14% of kids), showed dose-dependent effects, and “was robust”. They are bad enough at showing their work that I am forced to trust them and the Journal of the American Medical Association, a prestigious journal that I can only hope would not have published random crap.

Havnes and Mogstad analyze a free universal child-care program in Norway, which was rolled out in different places at different times. They find that “exposure to child care raised the chances of completing high school and attending college, in orders of magnitude similar to the black-white race gaps in the US”. I am getting just cynical enough to predict that if Norway had black people, they would have a completely different pattern of benefits and losses from this program, but the Norwegians were able to avoid a subgroup analysis by being a nearly-monoethnic country. This is in contrast to Quebec, where a similar childcare program seems to have caused worse long-term outcomes. Going deeper into these results supports (though weakly and informally) a model where, when daycare is higher-quality than parental care, child outcomes improve; when daycare is lower-quality than parental care, child outcomes decline. So a reform that creates very good daycare, and mostly attracts children whose parents would not be able to care for them very well, will be helpful. Reforms that create low-quality daycare and draw from households that are already doing well will be harmful. See the discussion here.

Then there’s Chetty’s work on kindergarten, which I talk about here. He finds good kindergarten teachers do not consistently affect test scores, but do consistently affect adult earnings – the same sort of fade-out-then-return pattern that comes up in the preschool debate. This study is randomized and strong. Its applicability to the current discussion is questionable, since kindergarten is not preschool, having a good teacher is not the same thing as going to preschool at all, and the studies we’re looking at mostly haven’t found results about adult earnings. At best this suggests that schooling can have surprisingly large, fading-out-then-in-again effects on later life outcomes.

And finally, there’s a meta-analysis of 22 studies of early childhood education showing an effect size of 0.24 SD in favor of graduating high school, p < 0.001. Maybe I should have started with that one. Maybe it’s crazy of me to save this for the end. Maybe this should count for about five times as much as everything I’ve mentioned so far. I’m putting it down here both to inflict upon you the annoyance I felt when discovering this towards the end of researching this topic, and so that you have a good idea of what kind of studies are going into this meta-analysis.

IV.

What do we make of this?

I am concerned that all of the studies in Parts I and II have been summed up as “Head Start works!”, and therefore as replicating each other, since the previous study found “Head Start works!” and so did the newest one. In fact, they all find Head Start having small effects for some specific subgroup on some specific outcome, and it’s usually a different subgroup and outcome for each. So although GTC and Deming are usually considered replications of each other, they actually disprove each other’s results. One of GTC’s two big findings is that Head Start decreases crime among black children. But Deming finds that Head Start had no effect on crime among black children. The only thing the two of them agree on is that Head Start seems to improve high school graduation among whites. But Carneiro and Ginja, which is generally thought of as replicating the earlier two, finds Head Start has no effect on high school graduation among whites.

There’s an innocent explanation here, which is that everyone was very close to the significance threshold, so these are just picking up noise. This might make more sense graphically:

It’s easy to see here that both studies found basically the same thing, minus a little noise, but that Study 1 has to report its results as “significant for blacks but not whites” and Study 2 has to report the opposite. Is this what’s going on?
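
Here’s a toy simulation of that innocent explanation – the effect size, sample sizes, and subgroups are all made up, chosen only to show how low power scatters “significance” around:

```python
# Two simulated studies of the same true effect: with low power, which
# subgroup crosses p < 0.05 is close to a coin flip. All numbers invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_effect = 0.15      # hypothetical benefit, identical for both subgroups
n = 200                 # per arm, per subgroup, per study

for study in (1, 2):
    for group in ("blacks", "whites"):
        treat = rng.normal(true_effect, 1, n)
        control = rng.normal(0, 1, n)
        _, p = stats.ttest_ind(treat, control)
        print(f"study {study}, {group}: p = {p:.3f}")
# At d = 0.15 with n = 200 per arm, power is only about 32%, so each study
# reports a different pattern of "significant" subgroups by chance alone.
```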

I made a table. I am really really not confident in this table. On one level, I am fundamentally not confident that what I am doing is even possible, and that the numbers in these studies are comparable to one another or mean what it looks like they mean. On a second level, I’m not sure I recorded this information correctly or put the right numbers in the right places. Still, here is the table; red means the result is significant:

This confirms my suspicions. Every study found something different, and it isn’t even close. For example, Carneiro and Ginja find a strong effect of lowering white crime, but GTC find that Head Start nonsignificantly increases white crime rates. Meanwhile, GTC find a strong and significant effect lowering black crime, but Carneiro and Ginja find an effect of basically zero.

The strongest case for the studies being in accord is for black high school graduation rates. Both Deming and Ludwig & Miller find an effect. Carneiro and Ginja don’t find an effect, but their effect size is similar to those of the other studies, and they might just have more stringent criteria, since they are adjusting for multiple comparisons and testing many things. But they should have the more stringent criteria, and by trying to special-plead against this, I am just reversing the absolutely correct thing they did because I want to force positive results, in the exact way that good statistical practice is trying to prevent me from doing. So maybe I shouldn’t do that.
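
To make the stringency point concrete, here’s what a Bonferroni-style adjustment does to a result that looks fine on its own; the p-value and test count below are hypothetical, not taken from Carneiro and Ginja:

```python
# A raw p-value that clears 0.05 can fail once you correct for testing
# many outcome/subgroup combinations. Both numbers below are made up.
raw_p = 0.03        # "significant" by the usual standard
n_tests = 20        # suppose the paper runs 20 comparisons
bonferroni_threshold = 0.05 / n_tests
print(bonferroni_threshold)            # 0.0025
print(raw_p < bonferroni_threshold)    # False: not significant after adjustment
```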

Here is the strongest case for accepting this body of research anyway. It doesn’t quite look like publication bias. For one thing, Ludwig and Miller have a paper where they say there’s probably no publication bias here, because literally every dataset that can be used to test Head Start has been. For another, although I didn’t focus on gender or IQ on the chart above, most of the studies do find that it helps males and low-IQ people more with the sorts of problems men and low-IQ people usually face, which suggests it passes sanity checks. Most important, in a study whose results are entirely spurious, there should be an equal number of beneficial and harmful findings (ie they should find Head Start makes some subgroups worse on some outcomes). Since each of these studies investigates many things and usually finds many different significant results, it should be hard to publication-bias all harmful findings out of existence. This sort of accords with the positive meta-analysis. Studies either show small positive results or are not significant, and when you combine all of them into a meta-analysis, they become highly significant, look good, and make sense. And this would fit very well with the Norwegian study showing strong positive effects of childcare later in life. And Chetty’s study showing fade-out of kindergarten teachers followed by strong positive effects later in life. And of course the Perry Preschool and Abecedarian studies showing fade-out of test scores followed by strong positive effects later in life. I even recently learned of a truly marvelous developmental explanation for why this might happen, which unfortunately this margin is too small to contain – expect a book review in the coming weeks.
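
This is the mechanical sense in which a meta-analysis can be highly significant even though none of its inputs are. A minimal inverse-variance pooling sketch, with effect sizes and standard errors invented for illustration (these are not the actual 22 studies):

```python
# Fixed-effect (inverse-variance) pooling: five small studies, none
# individually significant, combine into a clearly significant estimate.
import numpy as np
from scipy import stats

d  = np.array([0.25, 0.20, 0.30, 0.22, 0.18])   # hypothetical effect sizes
se = np.array([0.15, 0.14, 0.16, 0.13, 0.15])   # hypothetical standard errors

print(2 * stats.norm.sf(d / se))     # per-study two-sided p: all above 0.05

w = 1 / se**2                        # inverse-variance weights
d_pooled = np.sum(w * d) / np.sum(w)
se_pooled = np.sqrt(1 / np.sum(w))
print(d_pooled, 2 * stats.norm.sf(d_pooled / se_pooled))   # ~0.23, p < 0.001
```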

The case against this research is that maybe the researchers cheated to have there be no harmful findings. Maybe the meta-analysis just shows that when a lot of researchers cheat a little, taking care to only commit minor undetectable sins, that adds up to a strong overall effect. This is harsh, but I was recently referred to this chart (h/t Mother Jones, which calls it “the chart of the decade” and “one of the greatest charts ever produced”):

This is the outcome of drug trials before and after the medical establishment started requiring preregistration (the vertical line) – in other words, before they made it harder to cheat. Before the vertical line, 60% of trials showed the drug in question was beneficial. After the vertical line, only 10% did. In other words, making it harder to cheat cuts the number of positive trials by a factor of six. It is not at all hard to cheat in the research of early childhood education; all the research in this post so far comes from the left side of the vertical line. We should be skeptical of all but the most ironclad research that comes from the left – and this is not the most ironclad research.

The Virtues of Rationality say:

One who wishes to believe says, “Does the evidence permit me to believe?” One who wishes to disbelieve asks, “Does the evidence force me to believe?” Beware lest you place huge burdens of proof only on propositions you dislike, and then defend yourself by saying: “But it is good to be skeptical.” If you attend only to favorable evidence, picking and choosing from your gathered data, then the more data you gather, the less you know. If you are selective about which arguments you inspect for flaws, or how hard you inspect for flaws, then every flaw you learn how to detect makes you that much stupider.

This is one of the many problems where the evidence permits me to disbelieve, but does not force me to do so. At this point I have only intuition and vague heuristics. My intuition tells me that in twenty years, when all the results are in, I expect early childhood programs to continue having small positive effects. My vague heuristics say the opposite, that I can’t trust research this irregular. So I don’t know.

I think I was right to register that my previous belief that preschool definitely didn’t work was outdated and under challenge. I think I was probably premature to say I was wrong about preschool not working; I should have said I might be wrong. If I had to bet on it, I would say 60% odds preschool helps in ways kind of like the ones these studies suggest, 40% odds it’s useless.

I hope that further follow-up of the HSIS, an unusually good randomized controlled trial of Head Start, will shed more light on this after its participants reach high school age sometime in the 2020s.

Ketamine: An Update

In 2016, I wrote Ketamine Research In A New Light, which discussed the emerging consensus that, contra existing theory, ketamine’s rapid-acting antidepressant effects had nothing to do with NMDA at all. I discussed some experiments which suggested they might actually be due to a related receptor, AMPA.

The latest development is Attenuation of Antidepressant Effects of Ketamine by Opioid Receptor Antagonism, which finds that the opioid-blocker naltrexone prevents ketamine’s antidepressant effects. Naltrexone does not prevent dissociation or any of the other weird hallucinatory effects of ketamine, which are probably genuinely NMDA-related. This suggests it’s just a coincidence that NMDA antagonism and some secondary antidepressant effect exist in the same drug. If you can prevent an effect from working by blocking the opiate system, a natural assumption is that the effect works on the opiate system, and the authors suggest this is probably true.

(unexpected national news tie-in: Kavanaugh accuser Christine Blasey Ford is one of the authors of this paper)

In retrospect, there were warnings. The other study to have found an exciting rapid-acting antidepressant effect for an ordinary drug was Ultra-Low-Dose Buprenorphine As A Time-Limited Treatment For Severe Suicidal Ideation. It finds that buprenorphine (the active ingredient in Suboxone), an opiate painkiller also used in treating addictions to other opiates, can quickly relieve the distress of acutely suicidal patients.

This didn’t make as big a splash as the ketamine results, for two reasons. First, everyone knows opiates feel good, and so maybe this got interpreted as just a natural extension of that truth (the Scientific American article on the discovery focused on an analogy where “mental pain” was the same as “physical pain” and so could be treated with painkillers). Second, we’re currently fighting a War On Opiates, and discovering new reasons to prescribe them seems kind of like giving aid and comfort to the enemy.

Ketamine is interesting because nobody can just reduce its mode of action to “opiates feel good”. Although it was long known to have some weak opiate effects, it doesn’t feel good; all the dissociation and hallucinations and stuff make sure of that. Whatever is going on is probably something more complicated.

The psychiatric establishment’s response, as published in the prestigious American Journal of Psychiatry, is basically “well, f@#k”. Here we were, excited about NMDA (or AMPA) giving us a whole new insight into the mechanisms of depression and the opportunity for a whole new class of treatment – and instead it looks like maybe it’s just pointing to The Forbidden Drugs That Nobody Is Supposed To Prescribe. The article concludes that ketamine should not be abandoned, but ketamine clinics under anaesthesiologists should be discouraged in favor of care monitored by psychiatrists. I will try not to be so cynical as to view this as the establishment seizing the opportunity for a power grab.

What happens now? A lot of this depends on addiction. One way we could go would be to say that although ketamine might have some opiate effects, it’s not addictive to the same degree as morphine, and it doesn’t seem to turn users into drug fiends, so we should stop worrying and press forward. We could even focus research on finding other opiates in a sweet spot where they’re still strong enough to fight depression but not strong enough to get people addicted. Maybe very-low-dose buprenorphine is already in this sweet spot, I don’t know.

But all of this is going to be shaped by history. Remember that heroin was originally invented (and pushed) as a less-addictive, safer opiate that would solve the opiate crisis. Medicine has a really bad habit of seizing on hopes that we have found a less addictive version of an addictive thing, and only admitting error once half the country is addicted to it. And there are all sorts of weird edge cases – does ketamine cross-sensitize people to other opiates? Does it increase some sort of domain-general addiction-having-center in the brain? I know substance abuse doctors who believe all of this stuff.

Also, should we start thinking opiates have some sort of deep connection to depression? “Depression is related to the stuff that has the strongest effect on human happiness of any molecule class known” seems…actually pretty plausible now that I think about it. I don’t know how much work has been done on this before. I hope to see more.

SSRIs: An Update

Four years ago I examined the claim that SSRIs are little better than placebo. Since then, some of my thinking on this question has changed.

First, we got Cipriani et al’s meta-analysis of antidepressants. It avoids some of the pitfalls of Kirsch and comes to about the same conclusion. This knocks down a few of the lines of argument in part 4 of that post about how the effect size might look more like 0.5 than 0.3. The effect size is probably about 0.3.

Second, I’ve seen enough to realize that the anomalously low effect size of SSRIs in studies should be viewed not as an SSRI-specific phenomenon, but as part of a general trend towards much lower-than-expected effect sizes for every psychiatric medication (every medication full stop?). I wrote about this in my post on melatonin:

The consensus stresses that melatonin is a very weak hypnotic. The Buscemi meta-analysis cites this as their reason for declaring negative results despite a statistically significant effect – the supplement only made people get to sleep about ten minutes faster. “Ten minutes” sounds pretty pathetic, but we need to think of this in context. Even the strongest sleep medications, like Ambien, only show up in studies as getting you to sleep ten or twenty minutes faster; this NYT article says that “viewed as a group, [newer sleeping pills like Ambien, Lunesta, and Sonata] reduced the average time to go to sleep 12.8 minutes compared with fake pills, and increased total sleep time 11.4 minutes.” I don’t know of any statistically-principled comparison between melatonin and Ambien, but the difference is hardly (pun not intended) day and night. Rather than say “melatonin is crap”, I would argue that all sleeping pills have measurable effects that vastly underperform their subjective effects.

Or take benzodiazepines, a class of anxiety drugs including things like Xanax, Ativan, and Klonopin. Everyone knows these are effective (at least at first, before patients develop tolerance or become addicted). The studies find them to have about equal efficacy to SSRIs. You could almost convince me that SSRIs don’t have a detectable effect in the real world; you will never convince me that benzos don’t. Even morphine for pain gets an effect size of 0.4, little better than SSRIs’ 0.3 and not enough to meet anyone’s criteria for “clinically significant”. Leucht 2012 provides similarly grim statistics for everything else.
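
For a sense of what these numbers mean, here’s a standard conversion from Cohen’s d to the probability that a randomly chosen treated patient does better than a randomly chosen control, assuming roughly normal outcomes in both groups:

```python
# Converting effect size d to the "probability of superiority":
# P(random treated patient beats random control) = Phi(d / sqrt(2)).
from scipy.stats import norm

for d in (0.3, 0.4, 1.0):
    print(f"d = {d}: P(treated beats control) = {norm.cdf(d / 2**0.5):.2f}")
# d = 0.3 -> 0.58, d = 0.4 -> 0.61, d = 1.0 -> 0.76. At d = 0.3 a treated
# patient beats a control only 58% of the time, barely better than a coin
# flip, which is part of why these effects are so hard to see by eye.
```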

I don’t know whether this means that we should conclude “nothing works” or “we need to reconsider how we think about effect sizes”.

All this leads to the third thing I’ve been thinking about. Given that the effect size really is about 0.3, how do we square the scientific evidence (that SSRIs “work” but do so little that no normal person could possibly detect them) with the clinical evidence (that psychiatrists and patients find SSRIs sometimes save lives and often make depression substantially better)?

The traditional way to do this is to say that psychiatrists and patients are wrong. Given all the possible biases involved, they misattribute placebo effects to the drugs, or credit some cases that would have remitted anyway to the beneficial effect of SSRIs, or disproportionately remember the times the drugs work over the times they don’t. While “people are biased” is always an option, this doesn’t fit the magnitude of the clinical evidence that I (and most other psychiatrists) observe. There are patients who will regularly get better on an antidepressant, get worse when they stop it, get better when they go back on it, get worse when they stop it again, et cetera. This raises some questions of its own, like why patients keep stopping antidepressants that they clearly need in order to function, but makes bias less likely. Overall the clinical evidence that these drugs work is so strong that I will grasp at pretty much any straw in order to save my sanity and confirm that this is actually a real effect.

Every clinician knows that different people respond to antidepressants differently or not at all. Some patients will have an obvious and dramatic response to the first antidepressant they try. Other patients will have no response to the first antidepressant, but after trying five different things you’ll find one that works really well. Still other patients will apparently never respond to anything.

Overall, only about 30-50% of the time that I start a patient on a particular antidepressant do we end up deciding this is definitely the right medication for them and they should definitely stay on it. This fits national and global statistics. According to a Korean study, the median amount of time a patient stays on their antidepressant prescription is three months. A Japanese study finds only 44% of patients continued their antidepressants for the recommended six months; an American study finds 31%.

Suppose that one-third of patients have some gene that makes them respond to Prozac with an effect size of 1.0 (very large and impressive), and nobody else responds. In a randomized controlled trial of Prozac, the average effect size will show up as 0.33 (one-third of patients get effect size of 1, two-thirds get effect size of 0). This matches the studies. In the clinic, one-third of patients will be obvious Prozac responders, and their psychiatrist will keep them on Prozac and be very impressed with it as an antidepressant and sing the praises of SSRIs. Two-thirds of patients will get no benefit, and their doctors will write them off as non-responders and try something else. Maybe the something else will work, and then the doctors will sing the praises of that SSRI, or maybe they’ll just say it’s “treatment-resistant depression” and so doesn’t count.
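
A quick simulation of that hypothetical – the one-third responder fraction and the d = 1.0 response are of course made up:

```python
# If a third of patients respond at d = 1.0 and the rest not at all, the
# trial-level average effect size comes out around a third of that.
import numpy as np

rng = np.random.default_rng(0)
n = 30_000
responder = rng.random(n) < 1/3          # hypothetical "responder gene"
treated = rng.normal(0, 1, n) + np.where(responder, 1.0, 0.0)
control = rng.normal(0, 1, n)

pooled_sd = np.sqrt((treated.var() + control.var()) / 2)
print((treated.mean() - control.mean()) / pooled_sd)
# ~0.32: slightly under 1/3, because the responder subgroup also widens
# the treated group's spread, but it matches the studies.
```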

In other words, doctors’ observation “SSRIs work very well” is an existence statement – “there are some patients for whom SSRIs work very well” – and not a universal observation – “SSRIs will always work well for all patients”. Nobody has ever claimed the latter, so it’s not surprising that it doesn’t match the studies.

I linked Gueorguieva and Krystal on the original post; they are saying some kind of much more statistically sophisticated version of this. But I can’t find any other literature on this possibility, which is surprising, because if it were true it should be pretty obvious, and if it were false it should still be worth somebody’s time to debunk.

If this were true, it would strengthen the case for the throughput-based model I talk about in Recommendations vs. Guidelines and Anxiety Sampler Kits. Instead of worrying only about a medicine’s effect size and side effects, we should worry about whether it is a cheap experiment or an expensive experiment. Imagine a drug that instantly cures 5% of people’s depression, but causes terrible nausea in the other 95%. The traditional model would reject this drug, since its effect size in studies is low and it has severe side effects. On the throughput model, give this drug to everybody, 5% of people will be instantly cured, 95% of people will suffer nausea for a day before realizing it doesn’t work for them, and then the 5% will keep taking it and the other 95% can do something else. This is obviously a huge exaggeration, but I think the principle holds. If there’s enough variability, the benefit-to-side-effect ratio of SSRIs is interesting only insofar as it tells us where in our guideline to put them. After that, what matters is the benefit-to-side-effect ratio for each individual patient.
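
A back-of-envelope version of the throughput logic, with all the utility numbers made up: what matters is not the average effect but how cheap it is to find out whether a given patient is in the 5%.

```python
# Throughput model: everyone tries the drug; non-responders pay the cost
# of one failed trial and move on. All numbers are arbitrary utility units.
p_cure = 0.05
value_of_cure = 1.0

def try_then_keep(cost_of_trial):
    """Expected value per patient of 'try it, keep it only if it works'."""
    return p_cure * value_of_cure - (1 - p_cure) * cost_of_trial

print(try_then_keep(0.01))   # cheap experiment (a day of nausea): +0.04, try it
print(try_then_keep(0.50))   # expensive experiment: -0.425, don't bother
# The drug's average effect size is identical in both rows; only the price
# of the experiment changes the decision.
```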

I don’t hear this talked about much and I don’t know if this is consistent with the studies that have been done.

Fourth, even though SSRIs are branded “antidepressants”, they have an equal right to be called anti-anxiety medications. There’s some evidence that they may work better for this indication than for depression, although it’s hard to tell. I think Irving Kirsch himself makes this claim: he analyzed the efficacy of SSRIs for everything and found a “relatively large effect size” of 0.7 for anxiety (though the study was limited to children). Depression and anxiety are highly comorbid, and half of people with a depressive disorder also have an anxiety disorder; there are reasons to think that at some deep level they may be aspects of the same condition. If SSRIs effectively treated anxiety, this might make depressed people feel better in a way that doesn’t necessarily show up on formal depression tests, but which they would express to their psychiatrist as “I feel better”. Or, psychiatrists might have a vague positive glow around SSRIs if they successfully treat their anxiety patients (who may be the same people as their depression patients) and not be very good at separating that positive glow into “depression efficacy” and “anxiety efficacy”. Then they might believe they’ve had good experiences with using SSRIs for depression.

I don’t know if this is true and some other studies find that results for anxiety are almost as abysmal as for depression.

Marijuana: An Update

[Originally to be titled “Marijuana: I Was Wrong”, but looking back I was suitably careful about everything, and my reward is not having to say that.]

Five years ago, I reviewed the potential costs and benefits of marijuana legalization and concluded that there wasn’t enough evidence for a firm conclusion. I found that using some made-up math, the effects looked slightly positive, but this was very sensitive to small changes in how made-up the math was.

The only really interesting conclusion was that most of the objective costs or benefits of legalization came from road traffic accidents. Either stoned driving would increase such accidents, killing thousands. Or people using marijuana instead of alcohol would decrease those accidents, saving thousands. I concluded:

We should probably stop [emphasizing direct] health effects of marijuana and imprisonment for marijuana-related offenses, and concentrate all of our research and political energy on how marijuana affects driving.

Using the best evidence available at the time, I predicted that marijuana legalization would probably decrease road traffic accidents. Now several states have legalized marijuana, data are in, and we have some preliminary evidence on how marijuana affects driving. And I was wrong.

A study by the Highway Loss Data Institute in June of last year finds that states that legalized marijuana saw insurance claims for auto accidents increase about 3% over the general national trend for the time. An updated study by the same group finds 6% according to insurance claims, and 5.2% according to police reports.

These are usually contrasted with a 2017 study that finds legalization states did not have significantly increased rates of car accident fatality. What is going on?

This study finds a (non-significant) increase of 2.7%. This very nicely matches the non-fatal collision study covering the same period, which finds a 3% increase in total collisions. But because the sample size of fatal collisions is much smaller than that of total collisions, the fatality result fails to reach significance. Probably the reason these two results are lower than the 5.2% – 6% result is that the 5.2% – 6% result is newer, and marijuana sales have been increasing every year after legalization.
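
Here’s the sample-size point in miniature, with hypothetical crash counts rather than the real data: the same 3% increase that is overwhelming for total collisions is statistical noise for the much rarer fatal ones.

```python
# Same 3% increase, very different evidence: rare events make noisy counts.
# The baseline counts are made up purely to illustrate the power problem.
from scipy import stats

def p_two_sided(expected, observed):
    """Two-sided Poisson p-value for seeing 'observed' when 'expected' is true."""
    return min(1.0, 2 * stats.poisson.sf(observed - 1, expected))

print(p_two_sided(1_000_000, 1_030_000))   # total collisions: p ~ 0
print(p_two_sided(1_000, 1_030))           # fatal collisions: p ~ 0.35, noise
```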

If this interpretation is true, we should expect that a mature legal marijuana industry causes about a 5% increase in car crashes and fatalities. Score one point for “obvious things” in its fight with “clever attempts to draw counterintuitive conclusions because of substitution effects”.

In the current set of nine states with legalization, the 5% increase would amount to an extra 300 deaths per year. If the country as a whole legalized, that would make about 1800 extra deaths per year. Using my totally made-up math model from the previous post, this is enough to shift the net effect of marijuana legalization from positive to slightly negative. This is especially true if the alternative to legalization is decriminalization, which has many of the benefits of legalization but fewer costs.
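
The arithmetic behind those numbers, with the baselines I’m assuming spelled out – roughly 37,000 US road deaths a year, with the nine legalization states accounting for about a sixth of them; both are ballpark assumptions, not figures from the studies:

```python
# Back-of-envelope: a 5% increase in crash deaths, applied to assumed baselines.
us_road_deaths = 37_000          # rough annual US total (assumption)
nine_state_share = 1 / 6         # rough share in the nine legal states (assumption)

extra_nine_states = 0.05 * us_road_deaths * nine_state_share
extra_national = 0.05 * us_road_deaths
print(round(extra_nine_states), round(extra_national))
# about 300 and about 1850, matching the rough estimates above
```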

But again, given how weak the math here is and how dependent it is on a lot of assumptions, this probably shouldn’t be taken too seriously. Tomorrow we could find out that I interpreted the fatality study wrong and marijuana really does cause uniquely non-fatal car accidents. Or that we should be ignoring all of this and paying attention to the effects on chronic pain. Or that marijuana causes cancer. Wait, no, that one was last week. Screw it.

People pointed out on the original thread that all this quantification of the objective harms and benefits of marijuana left out something important: a lot of people like it. Fair. This is hard to think about, but here are some things that help guide my intuitions:

1. Marijuana still is definitely not as bad as alcohol or smoking, which aren’t banned.

2. Marijuana still is probably worse than SSRIs, which are banned without a prescription (though it’s hard to go to jail for having them; consider them “decriminalized”). Don’t tell me this is a fake comparison; they’re both psychoactive drugs that purport to make you calmer and happier.

3. About two thirds of drunk driving deaths are the drunk driver themselves, and stoned driving is probably the same way. We might choose to focus only on the one-third of fatalities that happen to bystanders if we believe people should be allowed to make bad choices that only hurt themselves.

4. Everyone expects the marijuana market to keep expanding in the states where it already exists, so these numbers may increase.

5. Marijuana taxes, spent intelligently, could easily save more lives than these accidents cost.

6. Marijuana taxes won’t be spent intelligently.

7. If marijuana really does increase cancer risk by a few percent, that could easily outweigh everything else and make it a giant public health disaster.

8. But bacon also increases cancer risk by a few percent, is already a giant public health disaster, and we don’t worry about it that much.

9. If the above calculations are true, preventing national legalization of marijuana would save half as many lives as successfully implementing Australia-style gun control in the US.

There wasn’t meant to be a conclusion to all of these: they help guide my intuition, but in so many different directions that I still don’t have a real position.

Preschool: I Was Wrong

Kelsey Piper has written an article for Vox: Early Childhood Education Yields Big Benefits – Just Not The Ones You Think.

I had previously followed various studies that showed that preschool does not increase academic skill, academic achievement, or IQ, and concluded that it was useless. In fact, this had become a rallying point of the movement for evidence-based social interventions; the continuing popular support for preschool proved that people were morons who didn’t care about science. I don’t think I ever said this aloud, but I believed it in my heart.

I talked to Kelsey about some of the research for her article, and independently came to the same conclusion: despite the earlier studies of achievement being accurate, preschools (including the much-maligned Head Start) do seem to help children in subtler ways that only show up years later. Children who have been to preschool seem to stay in school longer, get better jobs, commit less crime, and require less welfare. The thing most of the early studies were looking for – academic ability – is one of the only things it doesn’t affect.

This suggests that preschool is beneficial not because of the curriculum or because of “teaching young brains how to learn” or anything like that, but for purely social reasons. Kelsey reviews some evidence that it might improve child health, but this doesn’t seem to be the biggest part of the effect. Instead, she thinks that it frees low-income parents from childcare duties, lets them get better jobs (or in the case of mothers, sometimes lets them get a job at all), and improves parents’ human capital, with all the relevant follow-on effects. More speculatively, if the home environment is unusually bad, it gives the child a little while outside the home environment, and socializes them into a “normal” way of life. I’ll discuss a slightly more fleshed-out model of this in an upcoming post.

My only caveat in agreeing with this perspective is that Chetty finds the same effect (no academic gains, but large life-outcome gains years later) from children having good rather than bad elementary school teachers. This doesn’t make sense in the context of freeing up parents’ time to get better jobs, or of getting children out of a bad home environment. It might make sense in terms of socializing them, though I would hate to have to sketch out a model of how that works. But since the teacher data and the Head Start data agree, that gives me more reason to think both are right.

I can’t remember ever making a post about how Head Start was useless, but I definitely thought that, and to learn otherwise is a big update for me. I’ve written before about how when you make an update of that scale, it’s important to publicly admit error before going on to justify yourself or say why you should be excused as basically right in principle or whatever, so let me say it: I was wrong about Head Start.

That having been said, on to the self-justifications and excuses!

1) Head Start seems to work for reasons unrelated to the ones that made people want to do it. Those people were still wrong, and this is still a good example of policy effects being difficult to predict. It seems to have succeeded by coincidence, not because “early childhood education” is a good idea.

2) This probably strengthens rather than weakens the Caplanian case against education, since the studies find that the educational parts of preschool are not useful, and better teachers and curricula do not affect the benefits.

3) This strengthens rather than weakens the case that academic achievement is related primarily to IQ, and that IQ is primarily genetic and difficult to change. An intervention targeted at academic achievement and IQ manages to change everything else except those variables, which remain stubbornly the same. Studies consistently find that IQ is only responsible for about 25% of life outcomes, suggesting that education works on the other 75%.

But on a broader scale, this does lower my confidence in biodeterminism. Preschool is a shared environmental effect; your parents have a big effect on whether or not you go to preschool. Why doesn’t this shared environmental effect show up in studies, which generally find that no shared environmental effect matters?

This is the same problem raised by Ozy’s post on lead. We know lead is important. We know it can damage your life outcomes. But we also know lead is related to the shared environment. And we also know studies keep finding the shared environment doesn’t matter. Some studies find the shared environment matters a little, when you make extra-double sure to have very high income inequality in your sample. But other studies find that it doesn’t, and almost all of them find that it doesn’t matter much at the still-high levels of income inequality you get by recruiting a convenience sample. How can this be? We have two really excellent and well-replicated scientific literatures, each proving opposite things. What now?

All I can think of is that maybe shared environment can matter, but is so small in the grand scheme of things that it’s below the threshold where zoomed-out studies of everything can detect it. That would help reconcile the two literature bases. But it doesn’t seem right. The lead effects are huge. The preschool effects, while moderate, suggest that something as minor as “whatever social advantage your family gets from your mother not having to take care of you for part of the day from ages 3 – 5” can have lasting and detectable effects. Surely then we would expect much larger effects from whether your mother is independently wealthy and can do whatever she wants, or whether your family otherwise has the ability to accrue social advantage.

It might also be an effect of what we’re measuring. Although there’s conventional wisdom that shared environment shows little effect in twin studies, there are occasional outliers. For example, studies of crime often find shared environment factors around 15-20%, especially in younger or poorer samples. And some of the studies that found effects from preschool measured crime. These are some inconsistent findings, and 15-20% from everything doesn’t seem consistent with measurable effects from preschool alone, but I’m kind of desperate here.

I guess I will just increase my belief in the studies that suggest shared environment matters a bit more when you limit yourself to non-cognitive factors and include the really poor, and hope that future work confirms this result.

I’ll also increase my political support for programs like these. I think these findings make universal childcare (almost) a no-brainer. They make universal pre-K much more appealing, with the strongest arguments against being inefficiency, eg that universal childcare or basic income are a more effective way of doing the same thing. But given the political realities that make universal pre-K more likely to happen than childcare or basic income, I am now happy to support it.
