Slate Star Codex

In a mad world, all blogging is psychiatry blogging

California, Water You Doing?

[Epistemic status: Low confidence. I have found numbers and stared at them until they made sense to me, but I have no education in this area. Tell me if I’m wrong.]

I.

There has recently been a lot of dumb fighting over who uses how much water in California, so I thought I would see if it made more sense as an infographic sort of thing:

Sources include Understanding Water Use In California, Inputs To Farm Production, California Water Usage In Crops, Urban Water Use Efficiency, Water Use In California, and Water: Who Uses How Much. There are some contradictions, probably caused by using sources from different years, and although I’m pretty confident this is right on an order of magnitude scale I’m not sure about a percentage point here or there. But that having been said:

On a state-sized level, people measure water in acre-feet, where an acre-foot is the amount of water needed to cover an area of one acre to a depth of one foot. California receives a total of 80 million acre-feet of water per year. Of those, 23 million are stuck in wild rivers (the hydrological phenomenon, not the theme park). These aren’t dammed and don’t have aqueducts to them so they can’t be used for other things. There has been a lot of misdirection over this recently, since having pristine wild rivers that fish swim in seems like an environmental cause, and so you can say that “environmentalists have locked up 23 million acre-feet of California water”. This is not a complete lie; if not for environmentalism, maybe some of these rivers would have been dammed up and added to the water system. But in practice you can’t dam every single river and most of these are way off in the middle of nowhere far away from the water-needing population. People’s ulterior motives shape whether or not they add these to the pot; I’ve put them in a different color blue to mark this.

Aside from that, another 14 million acre-feet are potentially usable, but deliberately diverted to environmental or recreational causes. These include 7.2 million for “recreational rivers”, apparently ones that people like to boat down, 1.6 million to preserve wetlands, and 5.6 million to preserve the Sacramento River Delta. According to environmentalists, this Sacramento River Delta water is non-negotiable, because if we stopped sending fresh water there the entire Sacramento River delta would turn salty and it would lead to some kind of catastrophe that would threaten our ability to get fresh water into the system at all.

34 million acre-feet of water are diverted to agriculture. The most water-expensive crop is alfalfa, which requires 5.3 million acre-feet a year. If you’re asking “Who the heck eats 5.3 million acre-feet of alfalfa?” the answer is “cows”. A bunch of other crops use about 2 million acre-feet each.

All urban water consumption totals 9 million acre-feet. Of those, 2.4 million are for commercial and industrial institutions, 3.8 million are for lawns, and 2.8 million are personal water use by average citizens in their houses. In case you’re wondering about this latter group, by my calculations all water faucets use 0.5 million, all toilets use 0.9 million, all showers use 0.5 million, leaks lose 0.3 million, and the remaining 0.6 million covers everything else – washing machines, dishwashers, et cetera.

Since numbers like these are hard to think about, it might be interesting to put them in a more intuitive form. The median California family earns $70,000 a year – let’s take a family just a little better-off than that who are making $80,000 so we can map it on nicely to California’s yearly water income of 80 million acre-feet.

The unusable 23 million acre-feet which go into wild rivers and never make it into the pot correspond to the taxes the California family has to pay: money that is gone before they ever see it. So our family is left with $57,000 in post-tax income.

In this analogy, California is spending $14,000 on environment and recreation, $34,000 on agriculture, and $9,000 on all urban areas. All household uses – toilets, showers, faucets, etc – only add up to about $2,800 of their budget.

There is currently a water shortfall of about 6 million acre-feet per year, which is being sustained by exploiting non-renewable groundwater and other sources. This is the equivalent of our slightly-richer-than-average family having to borrow $6,000 from the bank each year to get by.
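(If you want to check the mapping yourself, here is a minimal sketch in Python using only the round numbers above; the conversion is just 1 million acre-feet of water to $1,000 of income.)

```python
# A sketch of the budget analogy, using this post's round numbers.
# 80 million acre-feet (MAF) per year maps onto $80,000 of income,
# so 1 MAF corresponds to $1,000.
DOLLARS_PER_MAF = 80_000 / 80

water_maf = {
    "wild rivers (the 'taxes')": 23,
    "other environmental/recreational": 14,
    "agriculture": 34,
    "urban (all uses)": 9,
    "household uses only": 2.8,
    "yearly shortfall (the 'loan')": 6,
}

for use, maf in water_maf.items():
    print(f"{use}: {maf} MAF -> ${maf * DOLLARS_PER_MAF:,.0f}")
# wild rivers come out to $23,000 (leaving $57,000 "post-tax"),
# agriculture $34,000, urban $9,000, household $2,800, and a
# $6,000 yearly loan, matching the figures in the text.
```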

II.

Armed with this information, let’s see what we can make of some recent big news stories.

Apparently we are supposed to be worried about fracking depleting water in California. ThinkProgress reports that Despite Historic Drought, California Used 70 Million Gallons Of Water For Fracking Last Year. Similar concerns are raised by RT, Huffington Post, and even The New York Times. But 70 million gallons equals 214 acre-feet. Remember, alfalfa production uses 5.3 million acre feet. In our family-of-four analogy above, all the fracking in California costs them about a quarter. Worrying over fracking is like seeing an upper middle class family who are $6,000 in debt, and freaking out because one of their kids bought a gumball from a machine.
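(To check the unit conversion: an acre-foot is 43,560 cubic feet, and a cubic foot holds about 7.48 gallons, so 70 million gallons comes out to roughly 215 acre-feet; the 214 above is the same number with slightly different rounding. A quick sketch:)

```python
# 1 acre-foot = 43,560 cubic feet; 1 cubic foot ~ 7.48052 US gallons.
GALLONS_PER_ACRE_FOOT = 43_560 * 7.48052  # ~325,851 gallons

fracking_af = 70_000_000 / GALLONS_PER_ACRE_FOOT
print(f"fracking: {fracking_af:.0f} acre-feet")                 # ~215
print(f"alfalfa uses {5_300_000 / fracking_af:,.0f}x as much")  # ~24,700x
```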

Apparently we are also supposed to be worried about Nestle bottling water in California. ABC News writes an article called Nestle Needs To Stop Bottling Water In Drought-Stricken California, Advocacy Group Says, about a group called the “Courage Campaign” who have gotten 135,000 signatures on a petition saying that Nestle needs to stop “bottling the scarce resource straight from the heart of California’s drought and selling it for profit.” Salon goes even further – their article is called Nestle’s Despicable Water Crisis Profiteering: How It’s Making A Killing While California Is Dying Of Thirst, and as always with this sort of thing Jezebel also has to get in on the action. But Nestle’s plant uses only 150 acre-feet, about one forty-thousandth the amount used to grow alfalfa, and the equivalent of about a dime to our family of four.

The Wall Street Journal says that farms are a scapegoat for the water crisis, because in fact the real culprits are environmentalists. They say that “A common claim is that agriculture consumes about 80% of ‘developed’ water supply, yet this excludes the half swiped off the top for environmental purposes.” But environmentalism only swipes half if you count among that half all of the wild rivers in the state – that is, every drop of water not collected, put in an aqueduct, and used to irrigate something is a “concession” to environmentalists. A more realistic figure for environmental causes is the 14 million acre-feet marked “Other Environmental” on the map above, and even that includes concessions to recreational boaters and to whatever catastrophe is supposed to happen if we can’t keep the Sacramento Delta working properly. It’s hard to calculate exactly how much of California’s water goes to environmental causes, but half is definitely an exaggeration.

Wired is concerned that the federal government is ordering California to spend 12,000 acre-feet of water to save six fish (h/t Alyssa Vance). Apparently these are endangered fish in some river who need to get out to the Pacific to breed, and the best way to help them do that is to fill up the river with 12,000 acre feet of water. That’s about $12 on our family’s budget, which works out to $2 per fish. I was going to say that I could totally see a family spending $2 on a fish, especially if it was one of those cool glow-in-the-dark fish I used to have when I was a kid, but then I remembered this was a metaphor and the family is actually the entire state budget of California but the six fish are still literally just six fish. Okay, yes, that seems a little much.

III.

Finally, Marginal Revolution and even some among the mysterious and endangered population of non-blog-having economists are talking about how really the system of price controls and subsidies in the water market is ridiculous and if we had a free market on water all of our problems would be solved. It looks to me like that’s probably right.

Consider: When I used to live in California, even before this recent drought I was being told to take fewer showers, to install low-flush toilets that were inconvenient and didn’t really work all that well, to limit my use of the washing machine and dishwasher, et cetera. It was actually pretty inconvenient. I assume all forty million residents of California were getting the same message, and that a lot of them would have liked to be able to pay for the right to take nice long relaxing showers.

But if all the savings from water rationing amounted to 20% of our residential water use, then that equals about 0.5 MAF (million acre-feet), which is about 10% of the water used to irrigate alfalfa. The California alfalfa industry makes a total of $860 million worth of alfalfa hay per year. So if you calculate it out, a California resident who wants to spend her fair share of money to solve the water crisis without worrying about cutting back could do it by paying the alfalfa industry $2 to not grow $2 worth of alfalfa, thus saving as much water as if she very carefully rationed her own use.
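(Here is the arithmetic behind that $2 figure as I reconstruct it; a back-of-the-envelope sketch using the $860 million and 40-million-resident figures from this post:)

```python
alfalfa_revenue = 860e6    # $/year, the whole California alfalfa industry
alfalfa_water = 5.3e6      # acre-feet/year
rationing_savings = 0.5e6  # acre-feet/year saved by ~20% residential cuts
residents = 40e6           # Californians

# Pay the industry to forgo just the fraction of its revenue that frees
# up the same amount of water the rationing would have saved:
buyout = alfalfa_revenue * (rationing_savings / alfalfa_water)
print(f"${buyout / 1e6:.0f} million total, ${buyout / residents:.2f} each")
# -> $81 million total, $2.03 each
```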

If you were to offer California residents the opportunity to not have to go through the whole gigantic water-rationing rigamarole for $2 a head, I think even the poorest people in the state would be pretty excited about that. My mother just bought and installed a new water-saving toilet – which took quite a bit of her time and money – and furthermore, the government is going to give her a $125 rebate for doing so. Cutting water on the individual level is hard and expensive. But if instead of trying to save water ourselves, we just paid the alfalfa industry not to grow alfalfa, all the citizens of California could do their share for $2. If they also wanted to have a huge lush water-guzzling lawn, their payment to the alfalfa industry would skyrocket all the way to $5 per year.

In fact, though I am not at all sure here and I’ll want a real economist to double-check this, it seems to me if we wanted to buy out all alfalfa growers by paying them their usual yearly income to just sit around and not grow any alfalfa, that would cost $860 million per year and free up 5.3 million acre-feet, ie pretty much our entire shortfall of 6 million acre-feet, thus solving the drought. Sure, 860 million dollars sounds like a lot of money, but note that right now California newspapers have headlines like Billions In Water Spending Not Enough, Officials Say. Well, maybe that’s because you’re spending it on giving people $125 rebates for water-saving toilets, instead of buying out the alfalfa industry. I realize that paying people subsidies to misuse water to grow unprofitable crops, and then offering them countersubsidies to not take your first set of subsidies, is to say the least a very creative way to spend government money – but the point is it is better than what we’re doing now.


Links 5/15: Tall And Linky

If The Machines Are Taking Our Jobs, They Are Hiding It From The Bureau Of Labor Statistics. An argument that the ‘rise of the robots’ can’t be behind stagnant employment numbers, because increasing the amount of work done by robots would make productivity-per-human go up, and it isn’t.

I was able to solve the Cheryl’s birthday Singapore logic puzzle after a few minutes, but I got stuck on the transfinite version.

The Kosher Light Switch claims that after you flip it, the light will come on, but that your flipping it doesn’t cause the light to come on, thus making it compliant with complicated Jewish ritual laws. Needless to say, this seems to depend on an interpretation of causation which is not entirely…what’s the word…kosher.

I said a while ago I thought that “affirmative consent” laws wouldn’t matter one way or the other, since situations where people pressed cases based on them were unlikely to come up. I seem to have been wrong – in a recent case at Brandeis, a man was found in violation of affirmative consent laws because, during the course of a two-year romantic relationship, he occasionally kissed his partner goodbye in the morning without asking permission first. I’d like to blame this one on Feminism Gone Too Far, but since both parties were gay men we guys have nobody to blame but ourselves here.

The worst method of transliterating the Qiang language gives us such lovely words as “eazheabeageyegeaiju”, “gganpaeidubugeisdu”, and “chegvchagvchegvchagvlahva”. Anyone want to play a game of Terrible Qiang Transliteration Scrabble?

Be Careful Studying Discrimination Using Names. I talked about this briefly when comparing the two recent Women In STEM studies – calling one candidate “John” and the other “Jennifer” introduced a whole host of possible confounds beyond just gender. The article points out that articles which try to prove white-black discrimination by comparing “John” to “Jamal” have the same problem – Jamal isn’t just a black name, it’s a poor black name, and a fairer comparison would be a poor white name like Billy Bob. Features a pretty good reply by Women In STEM paper author Corinne Moss-Racusin, and a less good reply by the guy who wrote the John-Jamal paper.

Dodging Abilify is about the contortions some mental health patients have to go through to prevent their doctors from inappropriately prescribing the latest Exciting-New-Marketing-Campaign-Drug, Abilify, to them. The writer may or may not be pleased to know that when Abilify goes generic in the near future, all of a sudden all of these prescriptions will stop and people will start pushing brexpiprazole instead.

South Dakota’s new ad campaign (h/t Heidi): look, lots of people want to go to Mars, but South Dakota is less inhospitable than Mars, so come to South Dakota instead. Key slogan: “If you’re someone that’s really introverted, it might not be that bad.”

The politics behind the recent campaign against Dr. Oz, and why it might have played right into his hands.

Student Course Evaluations Get An F. Professors whom students rate worst are precisely those professors whose students get the best grades in future courses, suggesting these evaluations are negatively correlated with teaching quality. Very relevant to our recent discussion on psych drugs, hopefully not relevant to past discussions on democracy!

Marijuana probably exacerbates psychosis because of its main chemical constituent, THC. But a different marijuana chemical, cannabidiol, might actually be a potent antipsychotic. And more evidence for same.

Dutch people swear using diseases. I bet doctors must win all verbal duels in the Netherlands.

An intervention meant to raise kindergarteners’ tolerance of disabled people by teaching them a curriculum about how great it was to have disabled friends actually lowered their tolerance of the disabled compared to a control curriculum where they learned science stuff. Researchers theorized that the science stuff made them work together in groups with other children (including disabled ones) for a practical goal rather than rubbing their noses in the difference.

A new study finds homeopathy and Prozac both outperform placebo by the same amount in treating postmenopausal depression. Ars Technica thinks it knows why the study reached such a counterintuitive result, but check the comments for why their deconstruction seems a bit premature. Overall I think both those defending the integrity of the trial and those attacking it have some good points, but the problem is that if this experiment had shown anything other than homeopathy working, it would never have gotten this level of scrutiny and any flaws it might or might not have would just have been allowed to pass.

This is Stephen King-level creepy: Thoughts Can Fuel Some Deadly Brain Cancers.

Nostalgebraist, a very interesting guy who hangs around rationalist Tumblr, is writing fiction I’ve been enjoying a lot. His completed work, Floornight, asks – what happens if we discover the soul is real, but operates more like a quantum object than a classical object, and also some people go to study it in a giant dome in the middle of the sea surrounded by alien ghosts which is part of a plot by parallel universes to fight a war based on differing interpretations of measure? His current work-in-progress, The Northern Caves, is even better.

Somebody actually does the full scientific study and determines that atheists are no more angry than the general population. I predicted this result here two years ago.

Kazakh leader apologizes for winning election with 97.7% of the vote, saying “it would have looked undemocratic to intervene to make the victory more modest”.

Polygamists are four times more likely to get heart disease than monogamists after everything else is controlled for, which to me probably means they think they controlled for everything else but they didn’t.

First results from psychology’s largest reproducibility test: by strict criteria, only 39% of published studies replicate; by looser criteria, 63% do.

Speaking of which, you remember that study on how reading problems in a hard-to-read font makes you think about them more rationally? Totally failed to replicate multiple times, now abandoned.

RPG doormat.

A new paper finds that telling people that everyone stereotypes just makes them stereotype more.

A new paper finds black mayors (relative to white mayors) improve position of blacks (relative to whites) in cities where they are elected.

Genetic influence on political beliefs. Everything is some typical combination of heredity and nonshared environment except which party you belong to, which is mostly shared environment. In other words, you come up with your opinions on your own, then ignore them and vote for whoever your parents voted for.

John Boehner was wrong when he said we as a nation spend more money on antacids than we do on politics, but he was surprisingly close – within a factor of three or so.

A Redditor lists facts and fictions about the new spaceship drives that claim to use weird physics. Apparently if they work they will Change Everything Forever, including land transportation. But smart people are very skeptical.

Razib Khan finds that, contrary to the stereotypes, more intelligent and more liberal people are more likely to believe in free speech.

Drinking too much caffeine during pregnancy may double your baby’s risk of childhood obesity.

Killing Hitler With Praise And Fire is a Choose Your Own Adventure book about a time traveler trying to assassinate the Fuhrer without messing history up too atrociously.


Growth Mindset 4: Growth Of Office

Previously In Series: No Clarity Around Growth Mindset…Yet // I Will Never Have The Ability To Clearly Explain My Beliefs About Growth Mindset // Growth Mindset 3: A Pox On Growth Your Houses

Last month I criticized a recent paper, Paunesku et al’s Mindset Interventions Are A Scalable Treatment For Academic Underachievement, saying that it spun a generally pessimistic set of findings about growth mindset into a generally optimistic headline.

Earlier today, lead author Dr. Paunesku was kind enough to write a very thorough reply, which I reproduce below:

I.

Hi Scott,

Thanks for your provocative blog post about my work (I’m the first author of the paper you wrote about). I’d like to take a few moments to respond to your critiques, but first I’d like to frame my response and tell you a little bit about my own motivation and that of the team I am a member of (PERTS).

Good criticism is what makes science work. We are critical of our own work, but we are happy to have help. Often critics are not thoughtful or specific. So I very much appreciate the intent of your blog (to be thoughtful and specific).

What is our motivation? We are trying to improve our education system so that all students can thrive. If growth mindset is effective, we want it in every classroom possible. If it is ineffective, we want to know about it so we don’t waste people’s time. If it is effective for some students in some classrooms, we want to know where and for whom so that we can help those students.

What is our history and where are we now? PERTS approached social psychological interventions with a fair amount of skepticism at first. In many ways, they seemed too good to be true. But, we thought, “if this is true, we should do everything we can to spread it”. Our work over the last 5 years has been devoted to trying to see if the results that emerged from initial, small experiments (like Aronson et al., 2002 and Blackwell et al., 2007) would continue to be effective when scaled. The paper you are critiquing is a step in that process — not the end of the process. We are continuing research to see where, for whom, and at what scale social psychological approaches to improving education outcomes can be effective.

How do I intend to respond to your criticisms? In some cases, your facts or interpretations are simply incorrect, and I will try to explain why. I also invite you to contact me for follow up. In other cases, we simply have different opinions about what’s important, and we’ll have to agree to disagree. Regardless, I appreciate your willingness to be bold and specific in your criticism. I think that’s brave, and I think such bravery makes science stronger.

First, what is growth mindset?

This quote is from one of your other blog posts (not your critique of my paper):

If you’re not familiar with it, growth mindset is the belief that people who believe ability doesn’t matter and only effort determines success are more resilient, skillful, hard-working, perseverant in the face of failure, and better-in-a-bunch-of-other-ways than people who emphasize the importance of ability. Therefore, we can make everyone better off by telling them ability doesn’t matter and only hard work does.

If you think that’s what growth mindset is, I can certainly see why you’d find it irritating — and even destructive. I’d like to assure you that the people doing growth mindset research do not ascribe to the interpretation of growth mindset you described. Nor is that interpretation of growth mindset something we aim to communicate through our interventions. So what is growth mindset?

Growth mindset is not the belief that “ability doesn’t matter and only effort determines success.” Growth mindset is the belief that individuals can improve their abilities — usually through effort and by learning more effective strategies. For example, imagine a third grader struggling to learn long division for the first time. Should he interpret his struggle as a sign that he’s bad at math — as a sign that he should give up on math for good? Or would it be more adaptive if he realized that he could probably get a lot better at math if he sought out help from his peers or teachers? The student who thinks he should give up would probably do pretty badly while the student who thinks that he can improve his abilities — and tries to do so by learning new study strategies and practicing them — would do comparatively better.

That’s the core of growth mindset. It’s nothing crazy like thinking ability doesn’t matter. It’s keeping in mind that you can improve and that — to do so — you need to work hard and seek out and practice new, effective strategies.

As someone who has worked closely with Carol Dweck and with her students and colleagues for seven years now, I can personally attest that I have never heard anyone in that extended group of people express the belief that ability does not matter or that only hard work matters. In fact, a growth mindset wouldn’t make any sense if ability didn’t matter because a growth mindset is all about improving ability.

One of the active goals of the group I co-founded (PERTS) is to try to dispel misinterpretations of growth mindset because they can be harmful. I take it as a failure of our group that someone like you — someone who clearly cares about research and about scientific integrity — could walk away from our work with that interpretation of growth mindset. I hope that PERTS, and other groups promoting growth mindset, can get better and better at refining the way we talk about growth mindset so that people can walk away from our work understanding it more clearly. For that perspective, I hope you can continue to engage with us to improve that message so that people don’t continue to misinterpret it.

Anyway, here are my responses to specific points you made in your blog about my paper:

Was the control group a mindset intervention?

You wrote:

“A quarter of the students took a placebo course that just presented some science about how different parts of the brain do different stuff. This was also classified as a “mindset intervention”, though it seems pretty different.”

What makes you think it was classified as a mindset intervention? We called that the control group, and no one on our team ever thought of that as a mindset intervention.

The Elderly Hispanic Woman Effect

You wrote:

Subgroup analysis can be useful to find more specific patterns in the data, but if it’s done post hoc it can lead to what I previously called the Elderly Hispanic Woman Effect…

First, I just want to note that I love calling this the “elderly Hispanic woman effect.” It really brings out the intrinsic ridiculousness of the subgroup analyses researchers sometimes go through in search of an effect with a p<.05. It is indeed unlikely that “elderly Hispanic women” would be a meaningful subgroup for analyzing the effects of a medicine (although it might be a fun thought exercise to try to think of examples of a medicine whose effects would be likely to be moderated by being an elderly Hispanic woman).

In bringing up the elderly Hispanic woman effect, you’re suggesting that we didn’t have an a priori reason to think that underperforming students would benefit from these mindset interventions and that we just looked through a bunch of moderators until we found one with p<.05. Well that’s not what we did, and I hope I can convince you that our choice of moderator was perfectly reasonable given prior research and theory.

There’s a lot of research (and common sense too) to suggest that mindset – and motivation in general – matters much more when something is hard than when it is easy. Underachieving students presumably find school more difficult, so it makes sense that we’d want to focus on them. I don’t think our choice of subgroup is a controversial or surprising prediction. I think anyone who knows mindset research well would predict stronger effects for students who are struggling. In other words, this is obviously not a case of the elderly Hispanic woman effect because it is totally consistent with prior theory and predictions.

What ultimately matters more than any rhetorical argument, however, is whether the effect is robust – whether it replicates. On that front, I hope you’ll be pleased to learn that we just ran a successful replication of this study (in fall 2014) in which we again found that growth mindset improves achievement specifically among at-risk high school students (currently under review). We’re also planning yet another large scale replication study this fall with a nationally representative sample of schools so that we can be more confident that the interventions are effective in various types of contexts before giving them away for free to any school that wants them.

Is the sense of purpose intervention just a bunch of platitudes?

You wrote:

Still another quarter took a course about “sense of purpose” which talked about how schoolwork was meaningful and would help them accomplish lots of goals and they should be happy to do it.

[Later you say that those “children were told platitudes about how doing well in school will “make their families proud” and “make a positive impact”.]

I wouldn’t say those are platitudes. I think you’re under-appreciating the importance of finding meaning in one’s work. It’s a pretty basic observation about human nature that people are more likely to try hard when it seems like there’s a good reason to try hard. I also think it’s a pretty basic observation about our education system that many students don’t have good reasons for trying hard in school — reasons that resonate with them emotionally and help them find the motivation to do their best in the classroom. In our purpose intervention, we don’t just tell students what to think. We try to scaffold them to think of their own reasons for working hard in school, with a focus on reasons that are more likely to have emotional resonance for students. This type of self-persuasion technique has been used for decades in attitudes research.

We’ve written in more depth about these ideas and explored them through a series of studies. I’d encourage you to read this article if you’re interested.

Our paper title and abstract are misleading

You wrote:

Among ordinary students, the effect on the growth mindset group was completely indistinguishable from zero, and in fact they did nonsignificantly worse than the control group. This was the most basic test they performed, and it should have been the headline of the study. The study should have been titled “Growth Mindset Intervention Totally Fails To Affect GPA In Any Way”.

I think the title you suggest would have been misleading. How?

First, we did find evidence that mindset interventions help underachieving students — and those students are very important from a policy standpoint. As we describe in the paper, those students are more likely to drop out, to end up underemployed, or to end up in prison. So if something can help those students at scale and at a low cost, it’s important for people to know that. That’s why the word “underachievement” is in the title of the paper — because we’re accurately claiming that these interventions can help the important (and large) group of students who are underachieving.

Second, the interventions influenced the way all students think about school in ways that are associated with achievement. Although the higher performing students didn’t show any effects on grades in the semester following the study, their mindsets did change. And, as per the arguments I presented above about the link between mindset and difficulty, it’s quite feasible that those higher-performing students will benefit from this change in mindset down the line. For example, they may choose to take harder classes (e.g., Romero et al., 2014) or they may be more persistent and successful in future classes that are very challenging for them.

A misinterpretation of the y-axis in this graph.

You wrote:

Growth mindset still doesn’t differ from zero [among at-risk students].

This just seems to be a simple misreading of the graph. Either you missed the y-axis of the graph that you reproduced on your blog or you don’t know what a residual standardized score is. Either way, I’ll explain because this is pretty esoteric stuff.

The zero point of the y-axis on that graph is, by definition, the grand mean of the 4 conditions. In other words, the treatment conditions are all hovering around zero because zero is the average, and the average is made up mostly of treatment group students. If we had only had 2 conditions (each with 50% of the students), the y-axis “zero” would have been exactly halfway in between them. So the lack of difference from zero does not mean that the treatment was not different from control. The relevant comparison is between the error bars in the control condition and in the treatment conditions.

You might ask, “why are you showing such a graph?” We’re doing so to focus on the treatment contrast at the heart of our paper — the contrast between the control and treatment groups. The residual standardized graph makes it easy to see the size of that treatment contrast.

We’re combining intervention conditions

You wrote:

Did you catch that phrase “intervention conditions”? The authors of the study write: “Because our primary research question concerned the efficacy of academic mindset interventions in general when delivered via online modules, we then collapsed the intervention conditions into a single intervention dummy code (0 = control, 1 = intervention).”

[This line of argument goes on for a long time to suggest that we’re unethical and that there’s actually no evidence for the effects of growth mindset on achievement.]

We collapsed the intervention conditions together for this analysis because we were interested in the overall effect of these interventions on achievement. We wanted to see if it is possible to use scalable, social-psychological approaches to improve the achievement of underperforming students. I’m not sure why you think that’s not a valid hypothesis to test, but we certainly think it is. Maybe this is just a matter of opinion about what’s a meaningful hypothesis to test, but I assure you that this hypothesis (contrast all treatments to control) is consistent with the goal of our group to develop treatments that make an impact on student achievement. As I described before, we have a whole center devoted to trying to improve academic achievement with these types of techniques (see perts.net); so it’s pretty natural that we’d want to see whether our social-psychological interventions improve outcomes for the students who need them most (at-risk students).

You’re correct that the growth mindset intervention did not have a statistically significant impact on course passing rates by itself (at a p<.05 level). However, the effect was in the expected direction with p=0.13 (or a 1-tailed p=.07 – I hope you’ll grant that a 1-tailed test is appropriate here given that we obviously predicted the treatment would improve rather than reduce performance). So the lack of a p<.05 should not be interpreted – as you seem to interpret it – as some sort of positive evidence that growth mindset “actually didn’t work.” Anyway, I would say it warrants further research to replicate this effect (work we are currently engaging in).

To summarize, we did not find direct evidence that the growth mindset intervention increased course passing rates on its own at a p<.05 level. We did find that growth mindset increased course passing rates at a trend level – and found a significant effect on GPA. More importantly for me (though perhaps less relevant to your interest specifically in growth mindset), we did provide evidence that social-psychological interventions, like growth mindset and sense of purpose, can improve academic outcomes for at-risk students. We’re excited to be replicating this work now and giving it away in the hopes of improving outcomes for students around the world.

Summary

I hope I addressed your concerns about this paper, and I welcome further discussion with you. I’d really appreciate it if you’d revise your blog post in whatever way you think is appropriate in light of my response. I’d hate for people to get the wrong impression of our work, and you don’t strike me as someone who would want to mislead people about scientific findings either.

Finally, you’re welcome to post my response. I may post it to my own web page because I’m sure many other people have similar questions about my work. Just let me know how you’d like to proceed with this dialog.

Thanks for reading,

Dave

II.

First of all, the obvious: this is extremely kind and extremely well-argued and a lot of it is correct and makes me feel awful for being so snarky on my last post.

Things in particular which I want to endorse as absolutely right about the critique:

I wrote “A quarter of the students took a placebo course that just presented some science about how different parts of the brain do different stuff. This was also classified as a “mindset intervention”, though it seems pretty different.” Dr. Paunesku says this is wrong. He’s right. It was an editing error on my part. I meant to add the last sentence to the part on the “sense of purpose” intervention, which was classified as a mindset intervention and which I do think seems pretty different. The placebo intervention was never classified as a mindset intervention and I completely screwed up by inserting that piece of text there rather than two sentences down where I meant it to be. It has since been corrected and I apologize for the error.

If another successful replication found that growth mindset continues to only help the lowest-performing students, I withdraw the complaint that this is sketchy subgroup mining, though I think that in general worrying about this is the correct thing to do.

I did misunderstand the residual standardized graph. I suggested that the control group must have severely declined, and got confused about why. In fact, the graph was not about difference between pre-study scores and post-study scores, but difference between group scores and the average score for all four groups. So when the control group is strongly negative, that means it was much worse than the average of all groups. When growth mindset is not-different-from-zero, it means growth mindset was not different from the average of all four groups, which consists of three treatment groups and one control group. So my interpretation – that growth mindset failed to change children’s grades – is not supported by the data.

(In my defense, I can only plead that of the two hundred fifty comments I received, many by professional psychologists and statisticians, only one person picked up on this point (admittedly, after being primed by my own misinterpretation). And the sort of data I expected to be seeing – difference between students’ pre-intervention and post-intervention scores – does not seem to be available. Nevertheless, this was a huge and unforgivable screw-up, and I apologize.)
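(To make the grand-mean point concrete, here is a toy illustration with made-up numbers, not the paper’s data: with three treatment groups and one control, the grand mean sits near the treatments, so the treatments “hover around zero” even while clearly beating control.)

```python
# Hypothetical group means, invented for illustration; NOT the study's data.
means = {"control": 2.0, "growth mindset": 3.0,
         "sense of purpose": 3.0, "combined": 3.0}

grand_mean = sum(means.values()) / len(means)  # 2.75

for group, m in means.items():
    print(f"{group}: {m - grand_mean:+.2f} vs grand mean")
# control: -0.75; every treatment: +0.25. Each treatment beats control
# by a full point, yet sits only 0.25 above "zero" on the residual axis.
```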

III.

But there are also a few places where I will stick to my guns.

I don’t think my interpretation of growth mindset was that far off the mark. I explain this a little further in this post on differing possible definitions of growth mindset, and I will continue to cite this strongly worded paper by Dweck as defense of my views. It’s not just an obvious and innocuous belief that you should always believe you can improve; it’s a belief about very counterintuitive effects of believing that success depends on ability versus effort. It is possible that all sophisticated researchers in the field have a very sophisticated and unobjectionable definition of growth mindset, but that’s not the way it’s presented to the public, even in articles by those same researchers.

Although I’m sure that to researchers in the field statements like “Doing well at school will help me achieve my goal” don’t sound like platitudes, they still sound like platitudes to me, and that matters in the context of discussions about growth mindset. Some people have billed growth mindset as a very exciting window into what makes learning tick, as a reason to divide everyone into groups based on their mindset, as the Secret To Success, and so on. Learning that a drop-dead simple intervention – telling students to care about school more – actually does as well or better than growth mindset seems to me like a damning result. I realize it would be kind of insulting to call sense-of-purpose an “active placebo” in the medical sense, but that’s kind of how I can’t help thinking of it.

I’m certainly not suggesting the authors of the papers are unethical for combining growth mindset intervention with sense of purpose intervention. But I think the technique is dangerous, and this is an example. They got a result that was significant at p = 0.13. Dr. Paunesku suggests in his email to me that this should be one-tailed (which makes it p = 0.07) and that this obviously trends towards significance. This is a reasonable argument. But this wasn’t the reasonable argument made in the paper. Instead, they make it look like it achieved classical p < 0.05 significance, or at least make it very hard to notice that it didn’t. Even if in this case it was – I can’t even say white lie, maybe a white spin – I find the technique very worrying.

Suppose I want to prove homeopathy cures cancer. I make a trial with one placebo condition and two intervention conditions – chemotherapy and homeopathy. I find that the chemotherapy condition very significantly outperforms placebo, but the homeopathy condition doesn’t. So I combine the two interventions into a single bin and say “Therapeutic interventions such as chemotherapy or homeopathy significantly outperform placebo.” Then someone else cites it as “As per a study, homeopathy outperforms placebo.” This would obviously be bad.

I am just not convinced that growth mindset and sense of purpose are similar enough that you can group them together effectively. This is what I was trying to get at in my bungled sentence about how they’re both “mindset” interventions but seem pretty different. Yes, they’re both things you tell children in forty-five minute sessions that seem related to how they think about school achievement. But that’s a really broad category.
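(For readers keeping score on the p-value mechanics: when the observed effect lands in the predicted direction, the one-tailed p is simply half the two-tailed p, which is all that turns 0.13 into 0.07. A one-line sketch:)

```python
p_two_tailed = 0.13
# Halving is valid only because the effect went in the predicted direction;
# had it gone the other way, the one-tailed p would be 1 - 0.13/2 instead.
p_one_tailed = p_two_tailed / 2
print(p_one_tailed)  # 0.065, reported as ~0.07
```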

But doesn’t it mean something that growth-mindset was obviously trending toward significance?

First of all, I would have had no problem with saying “trending toward significance” and letting readers draw their own conclusions.

Second of all, I’m not totally sure I buy the justification for a one-tailed test here; after all, it seems like we should use a one-tailed test for homeopathy as well, since as astounding as it would be if homeopathy helped, it would be even more astounding if homeopathy somehow made cancer worse. Further, educational interventions often have the opposite of their desired effect – see eg this campaign to increase tolerance of the disabled which made students like disabled people less than a control intervention. In fact, there’s no need to look further than this very study, which found (counterintuitively) that among students already exposed to sense-of-purpose interventions, adding on an extra growth-mindset intervention seemed to make them do (nonsignificantly) worse. I am not a statistician, but my understanding is you ought to have a super good reason to use a one-tailed test, beyond just “Intuitively my hypothesis is way more likely than the exact opposite of my hypothesis”.

Third of all, if we accept p < 0.13 as “trending towards significance”, we have basically tripled the range of acceptable study results, even though everyone agrees our current range of acceptable study results is already way too big and some high percent of all medical studies are wrong and only 39% of psych studies replicate and so on.

(I agree that all of this could be solved by something better than p-values, but p-values are what we’ve got)

I realize I’m being a jerk by insisting on the arbitrary 0.05 criterion, but in my defense, the time when only 39% of studies using a criterion replicate is a bad time to loosen that criterion.

IV.

Here’s what I still believe and what I’ve changed my mind on based on Dr. Paunesku’s response.

1. I totally bungled my sentence on the placebo group being a mindset intervention by mistake. I ashamedly apologize, and have corrected the original post.

2. I totally bungled reading the residual standard score graph. I ashamedly apologize, and have corrected the original post, and put a link in bold text to this post on the top.

3. I don’t know whether the thing I thought the graph showed (no significant preintervention vs. postintervention GPA improvement for growth mindset, or no difference in change from controls) is true. It may be hidden in the supplement somewhere, which I will check later. Possible apology pending further investigation.

4. Growth mindset still had no effect (in fact nonsignificantly negative) for students at large (as opposed to underachievers). I regret nothing.

5. Growth mindset still failed to reach traditional significance criteria for changing pass rates. I regret nothing.

The Future Is Filters

Related to: The Toxoplasma of Rage

I.

Tumblr Savior is a neat program that blocks Tumblr posts containing specific words or phrases. For example, if you don’t want to hear all of the excellent reasons going around Tumblr why you should kill all men, you just block “kill all men” and they never show up. Add a few extra terms like “white dudes” (nothing good ever came of an article including the phrase “white dudes”), “trans”, “cis”, and “pictures of my vagina”, and you can make Tumblr almost usable.

(My own Tumblr Savior list is an interesting record both of my psyche and of mid-2010s current events. Sometimes I imagine a future cyber-archaeologist stumbling across it and asking “But, but…why would he ban the word ‘puppies’?” Poor, poor innocent future archaeologist.)

I recently learned about Twitter blockbots. These are lists maintained by some trustworthy people, such that subscribing to the blockbot automatically blocks everyone on the list. The original was made by some people in the social justice community to help block people they figured other members of the social justice community wouldn’t want to have to deal with. Although some people seem to be added on by hand, the bot also makes educated guesses about who to block by blacklisting accounts that follow the feeds of too many anti-social-justice leaders.

There are rumors of a similar anti-SJ block list of people who engage in online mobbing and harassment in the name of social justice, but I can’t find it online right now and I think it might have been taken down.

An article I read recently (but which I can’t find right now to link to) proposes a higher-tech solution for Facebook’s harassment problems. They want Facebook to train machine-learning programs to detect posts that most people would consider trollish. So far, so boring. The interesting part comes afterwards – instead of auto-blocking those posts, Facebook would assign them a certain number of Troll Points. Users could then set an option for how their Facebook feed should react to Troll Points – for example, by blocking every post with more than a certain amount. That way, people who were concerned about free speech and who enjoy participating in “heated discussion” would be able to do so, while people who wanted a safer and more pleasant browsing experience could have a very low cutoff for taking action.

But the really interesting part got dismissed after a sentence. What if instead of combining everything into Troll Points, Facebook assigned the points in different domains? Foul Language, Blasphemy, Racial Slurs, Threats, Harassment, Dirty Argument Tactics, et cetera. And then I could set that I don’t care about Foul Language or Blasphemy, but I really don’t want to see any Threats or Racial Slurs.
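(As a sketch of what per-domain filtering could look like; the category names and thresholds here are invented, and nothing in this snippet is a real Facebook feature:)

```python
# Hypothetical per-domain "Troll Points" filter; all names and numbers invented.
my_limits = {
    "foul_language": None,  # None = I don't care about this domain
    "blasphemy": None,
    "racial_slurs": 0.1,    # near-zero tolerance
    "threats": 0.1,
    "harassment": 0.3,
}

def should_hide(post_scores):
    """Hide a post if any domain I care about exceeds my limit for it."""
    return any(
        limit is not None and post_scores.get(domain, 0.0) > limit
        for domain, limit in my_limits.items()
    )

print(should_hide({"foul_language": 0.9}))              # False: don't care
print(should_hide({"threats": 0.5, "blasphemy": 1.0}))  # True: threats too high
```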

(obviously the correct anarcho-capitalist solution is to have third-party companies making these algorithms and selling them to individual Facebook users, but in a world where Facebook is trying to become more and more closed to third-party apps, that’s probably not going to happen)

So, take all this filtering technology – Tumblr Savior, Twitter blockbots, and hypothetical Facebook Troll Points – combine it all together, project it about ten years into the future with slightly better machine learning, and you have an Internet where nobody has to see, even for an instant, anything they don’t want to. What are the implications?

II.

The most obvious possibility is that everyone will be better off because we can avoid trolls. In this nice black-and-white worldview, there are good people, and there are trolls, and eliminating the trolls is a simple straightforward decision that makes the good people better off. This is how The Daily Beast thinks of it (How Block Bot Could Save The Internet), and as anyone who’s been trolled or harassed online knows, there’s a lot of truth to this view.

The second most obvious possibility is that we will become a civilization of wusses safely protected from ever having to hear an opinion we disagree with, or ever having our prejudices challenged. This is how Reason thinks of it (Block Bots Automate Epistemic Closure On Twitter). Surely there’s some truth here too. How hard would it be to create a filter that blocks all conservative/liberal opinions? Just guess based on whether a text links to foxnews.com or dailykos.com, or add in linguistic cues (“death tax”, “job creators”, etc). Once such a filter existed, how many people do you think would use it proudly, bragging about how they’re no longer “wasting their time listening patiently to bigots” or whatever?

But I don’t think the scenario is quite that apocalyptic. If you’re getting all of your exposure to opinions you disagree with from them being shouted in your face by people you can’t avoid, you probably are not going to lose much by not having that happen. The people who are actually interested in holding discussions can still do that. When I was young and therefore stupid I used to hang out at politics forums specifically for this purpose.

The third possibility is that there would be a remarkable shift of discourse in favor of the powerful and against the powerless.

Terrorism has always been a useful weapon of the powerless. The powerful get laws passed through Congress or whatever, but the powerless don’t have that opportunity. They need to get people to pay attention, and blowing those people up has always been an effective tool in that repertoire. We see this most obviously in places like Palestine and the Basque Country. Likewise, as many people have pointed out, the recent riots in Baltimore can be thought of as a group of powerless people trying to make their anger heard in one of the only ways available to them. It would be politically un-savvy to call this “terrorism”, but as acts of destruction intended to promote a political struggle, they probably fit into the same cluster.

But the next step down from terrorism is annoyism. Terrorism is meant to convince by terrorizing those who ignore your cause; annoyism is meant to convince by annoying people who ignore your cause. Think of a bunch of protesters shouting on a major road, or throwing red paint over people wearing fur, or passive-aggressive Tumblr posts starting “dear white dudes”, or, in probably the purest example of the idea, the Black Brunch protests, where a bunch of black people burst into predominantly white restaurants and shout at patrons about how they’re probably complicit with racism. Even if there’s no implicit threat of force, the point is it’s unpleasant and people can’t ignore it even if they want to.

And so the traditional revolutionary chant goes: “No justice, no peace.” But the thing about filters is that they offer the opportunity for peace regardless of whether or not there is justice. At least they do online, which is where people in the future are going to be spending a lot more of their time.

Imagine you are a rich person who doesn’t want to have to listen to people talking about how rich people need to be socially responsible all the time. It makes you feel guilty, and they are saying mean things like that you don’t deserve all of the money you have, and shouting about social parasites and so on.

So you tell your automated filter to just never let you see any message like that again.

There is an oft-discussed division between politically right or neutral loud angry people (“trolls”) and loud angry people on the political left (“you are not allowed to dictate the terms on which victims of oppression express their righteous anger”). Machine learning programs will not accept that division, and the latter can be magicked out of visibility just as easily as the former.

Imagine being able to put an entire movement on mute. While I can’t deny the appeal, I’m not sure we – and especially not the social justice community, which is currently laughing at the complaints of people who object to their blockbot – have entirely thought this one through.

III.

The part I find most interesting about all of these possibilities is that they force us to bring previously unconscious social decisions into consciousness.

I think most people, if asked “Is it important to listen to arguments by people who disagree with you?” would answer in the affirmative. I also think most people don’t really do this. Maybe having to set a filter would make people explicitly choose to allow some contrary arguments in. Having done that, people could no longer complain about seeing them – they would feel more of an obligation to read and think about them. And of course, anyone looking for anything more than outrage-bait would choose to preferentially let in high-quality, non-insulting examples of disagreeing views, and so get inspired to think clearly instead of just starting one more rage spiral.

And I think most people, if asked “Is it important to listen to the concerns of the less powerful?” would also be pretty strongly in favor – with the caveat that people can recognize annoyism when it’s being used against them and aren’t especially tolerant of it. The ability to completely block out annoyism, combined with people being forced to explicitly choose to listen to alternative opinions, might make groups that currently favor annoyism change tactics to something more pleasant – though possibly less effective.

I think the result would be several carefully separated groups with their own social and epistemic norms, all of which coexist peacefully and in relative isolation from one another – groups which I would hope then develop their own norms about helping powerless members. This would be an interesting step towards what I describe in my Archipelago article as “a world where everyone is a member of more or less the community they deserve.”

Prescriptions, Paradoxes, and Perversities

[WARNING: I am not a pharmacologist. I am not a researcher. I am not a statistician. This is not medical advice. This is really weird and you should not take it too seriously until it has been confirmed]

I.

I’ve been playing around with data from Internet databases that aggregate patient reviews of medications.

Are these any good? I looked at four of the largest such databases – Drugs.com, WebMD, AskAPatient, and DrugLib – as well as psychiatry-specific site CrazyMeds – and took their data on twenty-three major antidepressants. Then I correlated them with one another to see if the five sites mostly agreed.

Correlations between Drugs.com, AskAPatient, and WebMD were generally large and positive (around 0.7). Correlations between CrazyMeds and DrugLib were generally small or negative. In retrospect this makes sense, because these two sites didn’t allow separation of ratings by condition, so for example Seroquel-for-depression was being mixed with Seroquel-for-schizophrenia.

So I threw out the two offending sites and kept Drugs.com, AskAPatient, and WebMD. I normalized all the data, then took the weighted average of all three sites. From this huge sample (the least-reviewed drug had 35 ratings, the most-reviewed drug 4,797) I obtained a unified opinion of patients’ favorite and least favorite antidepressants.
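(The aggregation step, roughly as described, might look like the pandas sketch below. The rows are made-up stand-ins, the real table covered twenty-three drugs, and the review-count weighting is my guess at the details:)

```python
import pandas as pd

# Made-up stand-in rows; the real data had 23 antidepressants and far more
# reviews. The sites use different rating scales, hence the normalization.
sites = ["drugs_com", "askapatient", "webmd"]
df = pd.DataFrame(
    {"drugs_com":     [8.1, 7.4, 5.9],   # 1-10 scale
     "askapatient":   [3.9, 3.4, 2.8],   # 1-5 scale
     "webmd":         [4.1, 3.6, 3.0],   # 1-5 scale
     "n_drugs_com":   [300, 200, 150],   # review counts, used as weights
     "n_askapatient": [450, 250, 100],
     "n_webmd":       [900, 600, 400]},
    index=["nardil", "serzone", "viibryd"],
)

# Normalize each site's column to z-scores so the scales are comparable,
# then take a review-count-weighted average across the three sites.
z = (df[sites] - df[sites].mean()) / df[sites].std()
w = df[["n_" + s for s in sites]].to_numpy()
df["unified"] = (z.to_numpy() * w).sum(axis=1) / w.sum(axis=1)
print(df["unified"].sort_values(ascending=False))
```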

This doesn’t surprise me at all. Everyone secretly knows Nardil and Parnate (the two commonly-used drugs in the MAOI class) are excellent antidepressants. Oh, nobody will prescribe them, because of the dynamic discussed here, but in their hearts they know it’s true.

Likewise, I feel pretty good to see that Serzone, which I recently defended, is number five. I’ve had terrible luck with Viibryd, and it just seems to make people taking it more annoying, which is not a listed side effect but which I swear has happened.

The table also matches the evidence from chemistry – drugs with similar molecular structure get similar ratings, as do drugs with similar function. This is, I think, a good list.

Which is too bad, because it makes the next part that much more terrifying.

II.

There is a sixth major Internet database of drug ratings. It is called RateRx, and it differs from the other five in an important way: it solicits ratings from doctors, not patients. It’s a great idea – if you trust your doctor to tell you which drug is best, why not take advantage of wisdom-of-crowds and trust all the doctors?

The RateRx logo. Spoiler: this is going to seem really ironic in about thirty seconds.

RateRx has a modest but respectable sample size – the drugs on my list got between 32 and 70 doctor reviews. There’s only one problem.

You remember patient reviews on the big three sites correlated about +0.7 with each other, right? So patients pretty much agree on which drugs are good and which are bad?

Doctor reviews on RateRx correlated at -0.21 with patient reviews. The negative relationship is nonsignificant, but that just means that at best, doctor reviews are totally uncorrelated with patient consensus.

This has an obvious but very disturbing corollary. I couldn’t get good numbers on how many times each of the antidepressants on my list was prescribed, because the information I’ve seen only gives prescription numbers for a few top-selling drugs, plus we’ve got the same problem of not being able to distinguish depression prescriptions from anxiety prescriptions from psychosis prescriptions. But total number of online reviews makes a pretty good proxy. After all, the more patients are using a drug, the more are likely to review it.

Quick sanity check: the most reviewed drug on my list was Cymbalta. Cymbalta was also the best selling antidepressant of 2014. Although my list doesn’t exactly track the best-sellers, that seems to be a function of how long a drug has been out – a best-seller that came out last year might have only 1/10th the number of reviews as a best-seller that came out ten years ago. So number of reviews seems to be a decent correlate for amount a drug is used.

In that case, amount a drug is used correlates highly (+0.67, p = 0.005) with doctors’ opinion of the drug, which makes perfect sense since doctors are the ones prescribing it. But amount the drug gets used correlates negatively with patient rating of the drug (-0.34, p = ns), which of course is to be expected given the negative correlation between doctor opinion and patient opinion.

So the more patients like a drug, the less likely it is to be prescribed [2].
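
For anyone who wants to check the arithmetic, the correlations themselves are one-liners with scipy. The lists below are toy stand-ins, one entry per drug in a fixed order; the real per-drug numbers are in the spreadsheet linked in section VII:

```python
from scipy.stats import pearsonr

# Toy stand-in values, one entry per drug, same order in every list.
patient_score = [1.2, 0.8, 0.5, -0.3, -0.9, -1.1]   # pooled patient opinion
doctor_score  = [3.0, 3.1, 3.4, 3.8, 4.0, 3.9]      # RateRx-style doctor means
n_reviews     = [310, 150, 90, 2200, 3100, 4797]    # proxy for amount prescribed

for label, series in [("doctor rating", doctor_score), ("usage proxy", n_reviews)]:
    r, p = pearsonr(patient_score, series)
    print(f"patient rating vs {label}: r = {r:+.2f}, p = {p:.3f}")
```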

III.

There’s one more act in this horror show.

Anyone familiar with these medications reading the table above has probably already noticed this one, but I figured I might as well make it official.

I correlated the average rating of each drug with the year it came on the market. The correlation was -0.71 (p < .001). That is, the newer a drug was, the less patients liked it [3].

This pattern absolutely jumps out of the data. First- and second-place winners Nardil and Parnate came out in 1960 and 1961, respectively; I can’t find the exact year third-place winner Anafranil came out, but the first reference to its trade name I can find in the literature is from 1967, so I used that. In contrast, last-place winner Viibryd came out in 2011, second-to-last place winner Abilify got its depression indication in 2007, and third-to-last place winner Brintellix is as recent as 2013.

This result is robust to various different methods of analysis, including declaring MAOIs to be an unfair advantage for Team Old and removing all of them, changing which minor tricyclics I do and don’t include in the data, and altering whether Deprenyl, a drug that technically came out in 1970 but received a gritty reboot under the name Emsam in 2006, is counted as older or newer.
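
For concreteness, here is a sketch of what one of those robustness checks looks like in code – recompute the year-versus-rating correlation with a whole class excluded. The drugs, classes, years, and scores below are illustrative placeholders, not the real dataset:

```python
from scipy.stats import pearsonr

drugs = [
    # (name, class, year on market, pooled patient score) -- toy values
    ("nardil",    "maoi", 1960,  1.5),
    ("parnate",   "maoi", 1961,  1.4),
    ("anafranil", "tca",  1967,  1.2),
    ("prozac",    "ssri", 1987,  0.1),
    ("abilify",   "aap",  2007, -1.0),
    ("viibryd",   "ssri", 2011, -1.3),
]

def year_corr(rows):
    years  = [year for _, _, year, _ in rows]
    scores = [score for _, _, _, score in rows]
    return pearsonr(years, scores)[0]

print("all drugs:     ", round(year_corr(drugs), 2))
print("without MAOIs: ", round(year_corr([r for r in drugs if r[1] != "maoi"]), 2))
```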

So if you want to know what medication will make you happiest, at least according to this analysis your best bet isn’t to ask your doctor, check what’s most popular, or even check any individual online rating database. It’s to look at the approval date on the label and choose the one that came out first.

IV.

What the hell is going on with these data?

I would like to dismiss this as confounded, but I have to admit that any reasonable person would expect the confounders to go the opposite way.

That is: older, less popular drugs are usually brought out only when newer, more popular drugs have failed. MAOIs, the clear winner of this analysis, are very clearly reserved in the guidelines for “treatment-resistant depression”, i.e. depression you’ve already thrown everything you’ve got at. But these are precisely the depressions that are hardest to treat.

Imagine you are testing the fighting ability of three people via ten boxing matches. You ask Alice to fight a Chihuahua, Bob to fight a Doberman, and Carol to fight Cthulhu. You would expect this test to be biased in favor of Alice and against Carol. But MAOIs and all these other older rarer drugs are practically never brought out except against Cthulhu. Yet they still have the best win-loss record.

Here are the only things I can think of that might be confounding these results.

Perhaps because these drugs are so rare and unpopular, psychiatrists only use them when they have really really good reason. That is, the most popular drug of the year they pretty much cluster-bomb everybody with. But every so often, they see some patient who seems absolutely 100% perfect for clomipramine, a patient who practically screams “clomipramine!” at them, and then they give this patient clomipramine, and she does really well on it.

(but psychiatrists aren’t actually that good at personalizing antidepressant treatments. The only thing even sort of like that is that MAOIs are extra-good for a subtype called atypical depression. But that’s like a third of the depressed population, which doesn’t leave much room for this super-precise-targeting hypothesis.)

Or perhaps once drugs have been on the market longer, patients figure out what they like. Brintellix is so new that the Brintellix patients are the ones whose doctors said “Hey, let’s try you on Brintellix” and they said “Whatever”. MAOIs have been on the market so long that presumably MAOI patients are ones who tried a dozen antidepressants before and stayed on MAOIs because they were the only ones that worked.

(but Prozac has been on the market 25 years now. This should only apply to a couple of very new drugs, not the whole list.)

Or perhaps the older drugs have so many side effects that no one would stay on them unless they’re absolutely perfect, whereas people are happy to stay on the newer drugs even if they’re not doing much because whatever, it’s not like they’re causing any trouble.

(but Seroquel and Abilify, two very new drugs, have awful side effects, yet are down at the bottom along with all the other new drugs)

Or perhaps patients on very rare weird drugs get a special placebo effect, because they feel that their psychiatrist cares enough about them to personalize treatment. Perhaps they identify with the drug – “I am special, I’m one of the only people in the world who’s on nefazodone!” and they become attached to it and want to preach its greatness to the world.

(but drugs that are rare because they are new don’t get that benefit, even though I would expect people to get just as excited about being given the latest, flashiest thing. Only drugs that are rare because they are old seem to benefit.)

Or perhaps psychiatrists tend to prescribe the drugs they “imprinted on” in medical school and residency, so older psychiatrists prescribe older drugs and the newest psychiatrists prescribe the newest drugs. But older psychiatrists are probably much more experienced and better at what they do, which could affect patients in other ways – the placebo effect of being with a doctor who radiates competence, or maybe the more experienced psychiatrists are really good at psychotherapy, and that makes the patient better, and they attribute it to the drug.

(but read on…)

V.

Or perhaps we should take this data at face value and assume our antidepressants have been getting worse and worse over the past fifty years.

This is not entirely as outlandish as it sounds. The history of the past fifty years has been a history of moving from drugs with more side effects to drugs with fewer side effects, with what I consider somewhat less than due diligence in making sure the drugs were quite as effective in the applicable population. This is a very complicated and controversial statement which I will be happy to defend in the comments if someone asks.

The big problem is: drugs go off-patent after twenty years. Drug companies want to push new, on-patent medications, and most research is funded by drug companies. So lots and lots of research is aimed at proving that newer medications invented in the past twenty years (which make drug companies money) are better than older medications (which don’t).

I’ll give one example. There is only a single study in the entire literature directly comparing the MAOIs – the very old antidepressants that did best on the patient ratings – to SSRIs, the antidepressants of the modern day [4]. This study found that phenelzine, a typical MAOI, was no better than Prozac, a typical SSRI. Since Prozac had fewer side effects, that made the choice in favor of Prozac easy.

Did you know you can look up the authors of scientific studies on LinkedIn and sometimes get very relevant information? For example, the lead author of this study has a resume that clearly lists him as working for Eli Lilly at the time the study was conducted (spoiler: Eli Lilly is the company that makes Prozac). The second author’s LinkedIn profile shows he is also an operations manager for Eli Lilly. Googling the fifth author’s name links to a news article about Eli Lilly making a $750,000 donation to his clinic. Also there’s a little blurb at the bottom of the paper saying “Supported by a research grant by Eli Lilly and company”, then thanking several Eli Lilly executives by name for their assistance.

This is the sort of study which I kind of wish had gotten replicated before we decided to throw away an entire generation of antidepressants based on the result.

But who will come to phenelzine’s defense? Not Parke-Davis, the company that made it: their patent expired sometime in the seventies, and then they were bought out by Pfizer [5]. And not Pfizer – without a patent they can’t make any money off Nardil, and besides, Nardil is competing with their own on-patent SSRI drug Zoloft, so Pfizer has as much incentive as everyone else to push the “SSRIs are best, better than all the rest” line.

Every twenty years, pharmaceutical companies have an incentive to suddenly declare that all their old antidepressants were awful and you should never use them, but whatever new antidepressant they managed to dredge up is super awesome and you should use it all the time. This does seem like the sort of situation that might lead to older medications being better than newer ones. A couple of people have been pushing this line for years – I was introduced to it by Dr. Ken Gillman from Psychotropical Research, whose recommendation of MAOIs and Anafranil as most effective matches the patient data very well, and whose essay Why Most New Antidepressants Are Ineffective is worth a read.

I’m not sure I go as far as he does – even if new antidepressants aren’t worse outright, they might still trade less efficacy for better safety. Even if they handled the tradeoff well, it would look like a net loss on patient rating data. After all, assume Drug A is 10% more effective than Drug B, but also kills 1% of its users per year, while Drug B kills nobody. Here there’s a good case that Drug B is much better and a true advance. But Drug A’s ratings would look better, since dead men tell no tales and don’t get to put their objections into online drug rating sites. Even if victims’ families did give the drug the lowest possible rating, 1% of people giving a very low rating might still not counteract 99% of people giving it a higher rating.
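
Here is a toy model of that “dead men tell no tales” problem. All of the numbers – the 10-point scale, the cure rates, the assumption that every bereaved family posts the minimum rating – are made up for illustration; the point is just that a 1% fatality rate barely moves an average:

```python
def mean_rating(cure_rate, death_rate):
    """Average online rating if survivors rate 10 when cured and 5 when not,
    and every death produces one minimum (1-star) rating from the family."""
    survivors = 1 - death_rate
    total  = survivors * (cure_rate * 10 + (1 - cure_rate) * 5) + death_rate * 1
    weight = survivors + death_rate
    return total / weight

print(mean_rating(cure_rate=0.60, death_rate=0.01))  # deadlier Drug A: ~7.9
print(mean_rating(cure_rate=0.50, death_rate=0.00))  # safer Drug B:     7.5
```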

And once again, I’m not sure the tradeoff is handled very well at all [6].

VI.

In order to distinguish between all these hypotheses, I decided to get a lot more data.

I grabbed all the popular antipsychotics, antihypertensives, antidiabetics, and anticonvulsants from the three databases, for a total of 55,498 ratings of 74 different drugs. I ran the same analysis on the whole set.

The three databases still correlate with each other at respectable levels of +0.46, +0.54, and +0.53. All of these correlations are highly significant, p < 0.01. The negative correlation between patient rating and doctor rating remains and is now a highly significant -0.344, p < 0.01. This is robust even if antidepressants are removed from the analysis, and is notable in both psychiatric and nonpsychiatric drugs.

The correlation between patient rating and year of release is a no-longer-significant -0.191. This is heterogeneous; antidepressants and antipsychotics show a strong bias in favor of older medications, and antidiabetics, antihypertensives, and anticonvulsants show a slight nonsignificant bias in favor of newer medications. So it would seem like the older-is-better effect is purely psychiatric.
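
The heterogeneity check is the same correlation computed separately for each class. A pandas sketch, with toy rows standing in for the real spreadsheet:

```python
import pandas as pd

# Class codes follow the spreadsheet in section VII:
# 1 = antidepressant, 2 = antidiabetic, 3 = antipsychotic,
# 4 = antihypertensive, 5 = anticonvulsant. Rows are toy values.
df = pd.DataFrame({
    "cls":   [1, 1, 1, 3, 3, 3, 2, 2, 4, 4, 5, 5],
    "year":  [1960, 1987, 2011, 1958, 1993, 2002, 1995, 2006, 1980, 1999, 1974, 2008],
    "score": [1.5, 0.1, -1.3, 1.1, -0.2, -0.8, -0.2, 0.3, -0.1, 0.2, 0.1, 0.4],
})

for cls, group in df.groupby("cls"):
    print(cls, round(group["year"].corr(group["score"]), 2))
```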

I conclude that for some reason, there really is a highly significant effect across all classes of drugs that makes doctors love the drugs patients hate, and vice versa.

I also conclude that older psychiatric drugs seem to be liked much better by patients, and that this is not some kind of simple artifact or bias, since if such an artifact or bias existed we would expect it to repeat in other kinds of drugs, which it doesn’t.

VII.

Please feel free to check my results. Here is a spreadsheet (.xls) containing all of the data I used for this analysis. Drugs are marked by class: 1 is antidepressants, 2 is antidiabetics, 3 is antipsychotics, 4 is antihypertensives, and 5 is anticonvulsants. You should be able to navigate the rest of it pretty easily.

One analysis that needs doing is to separate out drug effectiveness versus side effects. The numbers I used were combined satisfaction ratings, but a few databases – most notably WebMD – give you both separately. Looking more closely at those numbers might help confirm or disconfirm some of the theories above.

If anyone with the necessary credentials is interested in doing the hard work to publish this as a scientific paper, drop me an email and we can talk.

Footnotes

1. Technically, MAOI superiority has only been proven for atypical depression, the type of depression where you can still have changing moods but you are unhappy on net. But I’d speculate that right now most patients diagnosed with depression have atypical depression, far more than the studies would indicate, simply because we’re diagnosing less and less severe cases these days, and less severe cases seem more atypical.

2. First-place winner Nardil has only 16% as many reviews as last-place winner Viibryd, even though Nardil has been on the market fifty years and Viibryd for four. Despite its observed superiority, Nardil may very possibly be prescribed less than 1% as often as Viibryd.

3. Pretty much the same thing is true if, instead of looking at the year they came out, you just rank them in order from earliest to latest.

4. On the other hand, what we do have is a lot of studies comparing MAOIs to imipramine, and a lot of other studies comparing modern antidepressants to imipramine. For atypical depression and dysthymia, MAOIs beat imipramine handily, but the modern antidepressants are about equal to imipramine. This strongly implies the MAOIs beat the modern antidepressants in these categories.

5. Interesting Parke-Davis facts: Parke-Davis got rich by being the people to market cocaine back in the old days when people treated it as a pharmaceutical, which must have been kind of like a license to print money. They also worked on hallucinogens with no less a figure than Aleister Crowley, who got a nice tour of their facilities in Detroit.

6. Consider: Seminars In General Psychiatry estimates that MAOIs kill one person per 100,000 patient years. A third of all depressions are atypical. MAOIs are 25 percentage points more likely to treat atypical depression than other antidepressants. So for every 100,000 patients you give a MAOI instead of a normal antidepressant, you kill one and cure 8,250 who wouldn’t otherwise be cured. The QALY database says that a year of moderate depression is worth about 0.6 QALYs. So for every 100,000 patients you give MAOIs, you’re losing about 30 QALYs and gaining about 3,300.
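
(The arithmetic in footnote 6 checks out under its stated assumptions; the 30-years-lost-per-death figure is my reading of the implied remaining-life estimate:)

```python
deaths_per_100k  = 1        # extra MAOI deaths per 100,000 patient-years
extra_cures      = 8_250    # extra atypical depressions cured per 100,000
depressed_weight = 0.6      # QALY value of a year of moderate depression
years_per_death  = 30       # assumed remaining life lost per death

qalys_lost   = deaths_per_100k * years_per_death     # about 30
qalys_gained = extra_cures * (1 - depressed_weight)  # about 3,300
print(qalys_lost, qalys_gained)
```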

OT19: Don’t Thread On Me

This is the semimonthly open thread. Post about anything you want, ask random questions, whatever. Also:

1. Comments of the week are Scott McGreal actually reading the supplement of that growth mindset study, and gwern responding to the cactus-person story in the most gwernish way possible.

2. Worthy members of the in-group who need financial help: CyborgButterflies (donate here) and as always the guy who runs CrazyMeds (donate by clicking the yellow DONATE button on the right side here)

3. I offer you a statistical mystery a little closer to home than the ones we usually investigate around here: how come my blog readership has collapsed? The week-by-week chart looks like this:

Notice that the week of February 23rd it falls and has never recovered. In fact, I can pinpoint the specific day:

Between February 20th and February 21st, I lost about a third of my blog readership, and they haven’t come back.

Now, I did go on vacation starting February 20 and make fewer posts than normal during that time, but usually when I don’t post for a while I get a very gradual drop-off, whereas here, the day after a relatively popular post, everyone departs all of a sudden. And I’ve been back from vacation for a month and a half without anything getting better.

I would assume maybe WordPress changed its method of calculating statistics around that time, but I can’t find any evidence of this on the WordPress webpage. That suggests it might be a real thing. Did any of you leave around February 20th for some reason and not check the blog again until today? Did anything happen February 20th that tempted you to leave and you only barely hung on? I get self-esteem and occasionally money from blog hits, so this is kind of bothering me.

4. I want to clarify that when I discuss growth mindset, the strongest conclusion I can come to is that it’s not on as firm ground as some people seem to think. I do not endorse claims that I have “debunked” growth mindset or that it is “stupid”. There are still lots of excellent studies in favor, they just have to be interpreted in the context of other things.


Nefarious Nefazodone And Flashy Rare Side Effects

[Epistemic status: I am still in training. I am not an expert on drugs. This is poorly-informed speculation about drugs and it should not be taken seriously without further research. Nothing in this post is medical advice.]

I.

Which is worse – ruining ten million people’s sex lives for one year, or making one hundred people’s livers explode?

I admit I sometimes use this blog to speculate about silly moral dilemmas for no reason, but that’s not what’s happening here. This is a real question that I deal with on a daily basis.

SSRIs, the class which includes most currently used antidepressants, are very safe in the traditional sense of “unlikely to kill you”. Suicidal people take massive overdoses of SSRIs all the time, and usually end up with little more than a stomachache for their troubles. On the other hand, there’s increasing awareness of very common side effects which, while not disabling, can be pretty unpleasant. About 50% of users report decreased sexual abilities, sometimes to the point of total loss of libido or anorgasmia. And something like 25% of users experience “emotional blunting” and the loss of ability to feel feelings normally.

Nefazodone (brand name Serzone®, which would also be a good brand name for a BDSM nightclub) is an equally good (and maybe better) antidepressant that does not have these side effects. On the other hand, every year, one in every 300,000 people using nefazodone will go into “fulminant hepatic failure”, which means their liver suddenly and spectacularly stops working and they need a liver transplant or else they die.

There are a lot of drug rating sites, but the biggest is Drugs.com. 467 Drugs.com users have given Celexa, a very typical SSRI, an average rating of 7.8/10. 14 users have given nefazodone an average rating of 9.1/10.

CrazyMeds might not be as dignified as Drugs.com, but they have a big and well-educated user base and they’re psych-specific. Their numbers are 3.3/5 (n = 253) for Celexa and 4.1/5 (n = 47) for nefazodone.

So both sites’ users seem to agree that nefazodone is notably better than Celexa, in terms of a combined measure of effectiveness and side effects.

But nefazodone is practically never used. It’s actually illegal in most countries. In the United States, parent company Bristol-Myers Squibb (which differs from normal Bristol-Myers in that it was born without innate magical ability) withdrew it from the market, and the only way to get it nowadays is from an Israeli company that grabbed the molecule after it went off-patent. In several years working in psychiatry, I have never seen a patient on nefazodone, although I’m sure they exist somewhere. I would estimate its prescription numbers are about 1% of Celexa’s, if that.

The problem is the hepatic side effects. Nobody wants to have their liver explode.

But. There are something like thirty million people in the US on antidepressants. If we put them all on nefazodone, that’s about a hundred cooked livers per year. If we put them all on SSRIs, at least ten million of them will get sexual side effects, plus some emotional blunting.

My life vastly improved when I learned there was a searchable database of QALYs for different conditions. It doesn’t have SSRI-induced sexual dysfunction, but it does have sexual dysfunction due to prostate cancer treatment, and I assume that sexual dysfunction is about equally bad regardless of what causes it. Their sexual dysfunction has some QALY weights averaging about 0.85. Hm.

Assume everyone with fulminant liver failure dies. That’s not true; some get liver transplants, maybe some even get a miracle and recover. But assume everyone dies – and further, they die at age 30, cutting their lives short by fifty years.

In that case, putting all depressed people on nefazodone for a year costs 5,000 QALYs, but putting all depressed people on SSRIs for a year costs 1,500,000 QALYs. The liver failures may be flashier, but the 3^^^3 dust specks worth of poor sex lives add up to more disutility in the end.
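
Spelled out, with the assumptions above made explicit:

```python
users           = 30_000_000   # Americans on antidepressants
liver_rate      = 1 / 300_000  # fatal hepatic failures per patient-year
years_lost      = 50           # assumed: every victim dies at 30, losing 50 years
sexual_se_users = 10_000_000   # SSRI users with sexual side effects
se_qaly_weight  = 0.85         # QALY weight for a year of sexual dysfunction

nefazodone_cost = users * liver_rate * years_lost         # 5,000 QALYs per year
ssri_cost       = sexual_se_users * (1 - se_qaly_weight)  # 1,500,000 QALYs per year
print(f"{nefazodone_cost:,.0f} vs {ssri_cost:,.0f}")
```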

I don’t want to overemphasize this particular calculation for a couple of reasons. First, SSRIs and nefazodone both have other side effects besides the major ones I’ve focused on here. Second, I don’t know if the level of SSRI-induced sexual dysfunction is as bad as the prostate-surgery-induced sexual dysfunction on the database. Third, there are a whole bunch of antidepressants that are neither SSRIs nor nefazodone and which might be safer than either.

But I do want to emphasize this pattern, because it recurs again and again.

II.

In that spirit, which would you rather have – something like a million people addicted to amphetamines, or something like ten people having their skin eat itself from the inside?

I can’t get good numbers on how many adults abuse Adderall, but a quick glance at the roster for my hospital’s rehab unit suggests “a lot”. Huffington Post calls it the most abused prescription drug in America, which sounds about right to me. Honestly there are worse things to be addicted to than Adderall, but it’s not completely without side effects. The obvious ones are anxiety, irritability, occasionally frank psychosis, and sometimes heart problems – but a lot of the doctors I work with go beyond what the research can really prove and suggest it can produce lasting negative personality change and predispose people to other forms of addictive and impulsive behavior.

If you’ve got to give adults a stimulant, I would much prefer modafinil. It’s not addictive, it lacks most of Adderall’s side effects, and it works pretty well. I’ve known many people on modafinil and they give it pretty universally positive reviews.

On the other hand, modafinil may or may not cause a skin reaction called Stevens-Johnson Syndrome/Toxic Epidermal Necrolysis, which like most things with both “toxic” and “necro” in the name is really really bad. The original data suggesting a connection came from kids, who get all sorts of weird drug effects that adults don’t, but since then some people have claimed to have found a connection with adults. Some people get SJS anyway just by bad luck, or because they’re taking other drugs, so it’s really hard to attribute cases specifically to modafinil.

Gwern’s Modafinil FAQ mentions an FDA publication which argues that the background rate of SJS/TEN is 1-2 per million people per year, but the modafinil rate is about 6 per million people per year. However, there are only three known cases of a person above age 18 on modafinil getting SJS/TEN, and this might not be different from background rates after all. Overall the evidence that modafinil increases the rate of SJS/TEN in adults at all is pretty thin, and if it does, it’s as rare as hen’s teeth (in fact, very close to the same rate as liver failure from nefazodone).

(also: consider that like half of Silicon Valley is on modafinil, yet San Francisco Bay is not yet running red with blood.)

(also: ibuprofen is linked to SJS/TEN, with about the same odds ratio as modafinil, but nobody cares, and they are correct not to care.)

I said I’ve never seen a doctor prescribe nefazodone in real life; I can’t say that about modafinil. I have seen one doctor prescribe modafinil. It happened like this: a doctor I was working with was very upset, because she had an elderly patient with very low energy for some reason, I can’t remember, maybe a stroke, and wanted to give him Adderall, but he had a heart arrhythmia and Adderall probably wouldn’t be safe for him.

I asked “What about modafinil?”

She said, “Modafinil? Really? But doesn’t that sometimes cause Stevens-Johnson Syndrome?”

And then I glared at her until she gave in and prescribed it.

But this is very, very typical. Doctors who give out Adderall like candy have no associations with modafinil except “that thing that sometimes causes Stevens-Johnson Syndrome” and are afraid to give it to people.

III.

Nefazodone and modafinil are far from the only examples of this pattern. MAOIs are like this too. So is clozapine. If I knew more about things other than psychiatry, I bet I could think of examples from other fields of medicine.

And partially this is natural and understandable. Doctors swear an oath to “first do no harm”, and toxic epidermal necrolysis is pretty much the epitome of harm. Thought experiments like torture vs dust specks suggest that most people’s moral intuitions say that no amount of aggregated lesser harms like sexual side effects and amphetamine addictions can equal the importance of avoiding even a tiny chance of some great harm like liver failure or SJS/TEN. Maybe your doctor, if you asked her directly, would endorse a principled stance of “I am happy to give any number of people anxiety and irritability in order to avoid even the smallest chance of one case of toxic epidermal necrolysis.”

And yet.

The same doctors who would never dare give nefazodone consider Seroquel a perfectly acceptable second-line treatment for depression. Along with other atypical antipsychotics, Seroquel raises the risk of sudden cardiac death by about 50%. The normal risk of sudden cardiac death in young people is about 10 in 100,000 per year, so if my calculations are right, low-dose Seroquel causes an extra cardiac death once per every 20,000 patient-years. That’s fifteen times as often as nefazodone causes an extra liver death.
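
The comparison, written out with the inputs as stated:

```python
base_rate       = 10 / 100_000  # sudden cardiac deaths per patient-year, young people
relative_risk   = 1.5           # atypical antipsychotics: +50%
seroquel_extra  = base_rate * (relative_risk - 1)  # extra deaths per patient-year
nefazodone_rate = 1 / 300_000   # fatal liver failures per patient-year

print(f"one extra death per {1 / seroquel_extra:,.0f} patient-years")  # 20,000
print(f"{seroquel_extra / nefazodone_rate:.0f}x the nefazodone rate")  # 15x
```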

Yet nefazodone was taken off of the market by its creators and consigned to the dustbin of pharmacological history, and Seroquel is the sixth-best-selling drug in the United States, commonly given for depression, simple anxiety, and sometimes even to help people sleep.

Why the disconnect? Here’s a theory: sudden cardiac death happens all the time; sometimes God just has it in for you and your heart stops working and you die. Antipsychotics can increase the chances of that happening, but it’s a purely statistical increase, such that we can detect it aggregated over large groups but never be sure that it played a role in any particular case. The average person who dies of Seroquel never knows they died of Seroquel, but the average person who dies from nefazodone is easily identified as a nefazodone-related death. So nefazodone gets these big stories in the media about this young person who died by taking this exotic psychiatric drug, and it becomes a big deal and scares the heck out of everybody. When someone dies of Seroquel, it’s just an “oh, so sad, I guess his time has come.”

But the end result is this. When treatment with an SSRI fails, nefazodone and Seroquel naively seem to be equally good alternatives. Except nefazodone has a death rate of 1/300,000 patient-years, and Seroquel 1/20,000 patient-years. And yet everyone stays the hell away from the nefazodone because it’s known to be unsafe, and chooses the Seroquel.

I conclude either doctors are terrible at thinking about risk, or else maybe a little too good at thinking about risk.

I bring up the latter option because there’s a principal-agent problem going on here. Doctors want to do what’s best for their patients. But they also want to do what’s best for themselves, which means not getting sued. No one has ever sued their doctor because they got a sexual side effect from SSRIs, but if somebody dies because they’re the lucky 1/300,000 who gets liver failure from nefazodone, you can bet their family’s going to sue. Suddenly it’s not a matter of comparing QALYs, it’s a matter of comparing zero percent chance of lawsuit with non-zero percent chance of lawsuit.

(Fermi calculation: if a doctor has 100 patients at a time on antidepressants, and works for 30 years, then if she uses Serzone as her go-to antidepressant, she’s risking a 1% chance of getting the liver failure side effect once in her career. That’s small, but since a single bad lawsuit can bankrupt a doctor, it’s worth taking seriously.)
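
Spelling that estimate out: with a small per-patient-year probability p, 1 - (1 - p)^n is approximately n × p, which is where the “about 1%” comes from:

```python
patients_at_once = 100
career_years     = 30
patient_years    = patients_at_once * career_years  # 3,000
p_per_year       = 1 / 300_000

p_career = 1 - (1 - p_per_year) ** patient_years
print(f"{p_career:.2%}")  # ~1%
```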

And that would be a tough lawsuit to fight. “Yes, Your Honor, I knew when I prescribed this drug that it sometimes makes people’s livers explode, but the alternative often gives people a bad sex life, and according to the theory of utilitarianism as propounded by 18th century philosopher Jeremy Bentham – ” … “Bailiff, club this man”.

And the same facet of nefazodone that makes it exciting for the media makes it exciting for lawsuits. When someone dies of nefazodone toxicity, everyone knows. When someone dies of Seroquel, “oh, so sad, I guess his time has come”.

That makes Seroquel a lot safer than nefazodone. Safer for the doctor, I mean. The important kind of safer.

This is why, as I mentioned before, I hate lawsuits as a de facto regulatory mechanism. Our de jure regulatory mechanism, the FDA, is pretty terrible, but to its credit it hasn’t banned nefazodone. One time it banned clozapine because of a flashy rare side effect, but everyone yelled at them and they apologized and changed their mind. With lawsuits there’s nobody to yell at, so we just end up with people very quietly adjusting their decisions in the shadows and nobody else being any the wiser.

I don’t want to overemphasize this, because I think it’s only one small part of the problem. After all, a lot of countries withdrew nefazodone entirely and didn’t even give lawsuits a chance to enter the picture.

But whatever the cause, the end result is that drugs with rare but spectacular side effects get consistently underprescribed relative to drugs with common but merely annoying side effects, or drugs that have more side effects but manage to hide them better.

Growth Mindset 3: A Pox On Growth Your Houses

[EDIT: The author of this paper has responded; I list his response here.]

Jacques Derrida proposed a form of philosophical literary criticism called deconstruction. I’ll be the first to admit I don’t really understand it, but it seems to have something to do with assuming all texts secretly contradict their stated premise and apparent narrative, then hunting down and exposing the plastered-over areas where the author tries to hide this.

I have no idea whether this works for literature or not, but it’s a useful way to read scientific papers.

Consider a popular field – or, at least, a field where a certain position is popular. For example, we’ve been talking a lot about growth mindset recently. There seem to be a lot of researchers working to prove growth mindset and not a lot working to disprove it. Journals are pretty interested in studies showing growth mindset interventions work, and maybe not so interested in studies showing they don’t. I’ll admit that my strong suspicions of publication bias don’t seem to be borne out by the facts here – see this meta-analysis – but I bet its more sinister cousin “all experimenters believe the same thing and have the same experimenter effects” bias is alive and well.

In a field like that, you’re not going to get the contrarian studies you want, but one way to find the other side of the issue is to look a little more closely at the studies that do get published, the ones that say they’re in support of the thesis, and see if you can find anything incriminating.

Here’s a perfect example: Mindset Interventions Are A Scalable Treatment For Academic Underachievement, by a team of six researchers including Carol Dweck.

The abstract reads:

The efficacy of academic-mind-set interventions has been demonstrated by small-scale, proof-of-concept interventions, generally delivered in person in one school at a time. Whether this approach could be a practical way to raise school achievement on a large scale remains unknown. We therefore delivered brief growth-mind-set and sense-of-purpose interventions through online modules to 1,594 students in 13 geographically diverse high schools. Both interventions were intended to help students persist when they experienced academic difficulty; thus, both were predicted to be most beneficial for poorly performing students. This was the case. Among students at risk of dropping out of high school (one third of the sample), each intervention raised students’ semester grade point averages in core academic courses and increased the rate at which students performed satisfactorily in core courses by 6.4 percentage points. We discuss implications for the pipeline from theory to practice and for education reform.

This sounds really, really impressive! It’s hard to imagine any stronger evidence in growth mindset’s favor.

And then you make the mistake of reading the actual paper.

The paper asked 1,594 students from a bunch of different high schools to take a 45-minute online course.

A quarter of the students took a placebo course that just presented some science about how different parts of the brain do different stuff.

Another quarter took a course that was supposed to teach growth mindset.

Still another quarter took a course about “sense of purpose” which talked about how schoolwork was meaningful and would help them accomplish lots of goals and they should be happy to do it. This was also classified as a “mindset intervention”, though it seems pretty different.

And the final quarter took both the growth mindset course and the “sense of purpose” course.

Then they let all students continue taking their classes for the rest of the semester and saw what happened, which was this:

[EDIT: I totally bungled these graphs! See discussion of exactly how on the author’s reply above, without which the information below will be misleading at best]

Among ordinary students, the effect on the growth mindset group was completely indistinguishable from zero, and in fact they did nonsignificantly worse than the control group. This was the most basic test they performed, and it should have been the headline of the study. The study should have been titled “Growth Mindset Intervention Totally Fails To Affect GPA In Any Way”.

Instead they went to subgroup analysis. Subgroup analysis can be useful to find more specific patterns in the data, but if it’s done post hoc it can lead to what I previously called the Elderly Hispanic Woman Effect, after medical papers that can’t find their drug has any effect on people at large, so they keep checking different subgroups – young white men…nothing. Old black men…nothing. Middle-aged Asian transgender people…nothing. Newborn Australian aboriginal butch lesbians…nothing. Elderly Hispanic women…p = 0.049…aha! And the study gets billed as “Scientists Find Exciting New Drug That Treats Diabetes In Elderly Hispanic Women.”
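
The underlying statistics are easy to see in miniature: if each subgroup test has a 5% false-positive rate under the null, the chance that at least one subgroup comes up “significant” grows quickly with the number of subgroups tried. (This assumes independent tests, which real subgroups aren’t, so treat it as directional.)

```python
alpha = 0.05
for k in (1, 5, 10, 20):
    p_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:2d} subgroups tested: {p_false_positive:.0%} chance of a spurious hit")
```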

As per the abstract, the researchers decided to focus on an “at risk” subgroup because they had principled reasons to believe mindset interventions would work better on them. In their subgroup of 519 students who had a GPA of 2.0 or less last semester, or who failed one or more academic courses last semester:

Growth mindset still doesn’t differ from zero. And growth mindset does nonsignificantly worse than their “sense of purpose” intervention where they tell children to love school. In fact, the students who take both “sense of purpose” and growth mindset actually do (nonsignificantly) worse than sense-of-purpose alone!

But the control group mysteriously started doing much worse in all their classes right after the study started, so growth mindset is significantly better than the control group. Hooray!

Why would the control group’s GPA suddenly decline? The simplest answer would be that by coincidence the class got harder right after the study started, and only the intervention kids were resilient enough to deal with it – but that can’t be right, because this was done at eleven different schools, and they wouldn’t have all had their coursework get harder at the same time.

Another possibility is that sufficiently low-functioning kids are always declining – that is, as time goes on they get more and more behind in their coursework, so their grades at time t+1 are always less than at time t, and maybe growth mindset has arrested this decline. This is plausible and I’d be interested in seeing if other studies have found this.

Perhaps aware that this is not very convincing, the authors go on to do another analysis, this one of percent of students passing their classes.

This is the same group of at-risk students as the last one. It’s graphing what percent of these students pass versus fail their courses. The graph on the left shows that a significantly higher number of students in the intervention conditions pass their courses than in the control condition.

This is better, but one part still concerns me.

Did you catch that phrase “intervention conditions”? The authors of the study write: “Because our primary research question concerned the efficacy of academic mindset interventions in general when delivered via online modules, we then collapsed the intervention conditions into a single intervention dummy code (0 = control, 1 = intervention).”

We don’t know whether growth mindset did anything for even these students in this little subgroup, because it was collapsed together with the (more effective) “sense of purpose” intervention before any of these tests were done. I don’t know if this is just for convenience, or if it is to obfuscate that it didn’t work on its own.

[EDIT: Scott McGreal looks further and finds in the supplementary material that growth mindset alone did NOT significantly improve pass rates!]

The abstract of this study tells you none of this. It just says: “Mindset Interventions Are A Scalable Treatment For Academic Underachievement…Among students at risk of dropping out of high school (one third of the sample), each intervention raised students’ semester grade point averages in core academic courses and increased the rate at which students performed satisfactorily in core courses by 6.4 percentage points.” From the abstract, this study is a triumph.

But my own summary of these results, as relevant to growth mindset, is as follows:

For students with above a 2.0 GPA, a growth mindset intervention did nothing.

For students with below a 2.0 GPA, the growth mindset interventions may not have improved GPA, but may have prevented GPA from falling, which for some reason it was otherwise going to do.

Even in those students, it didn’t do any better than a “sense-of-purpose” intervention where children were told platitudes about how doing well in school will “make their families proud” and “make a positive impact”.

In no group of students did it significantly increase chance of passing any classes.

Haishan writes:

“If ye read only the headlines, what reward have ye? Do not even the policymakers the same? And if ye take the abstract at its face, what do ye more than others? Do not even the science journalists so?”

Titles, abstracts, and media presentations are where authors can decide how to report a bunch of different, often contradictory results in a way that makes it look like they have completely proven their point. A careful look at the study may find that their emphasis is misplaced, and give you more than enough ammunition against a theory even where the stated results are glowingly positive.

The only reason we were told these results is that they were in the same place as a “sense of purpose mindset” intervention that looked a little better, so it was possible to publish the study and claim it as a victory for mindsets in general. How many studies that show similar results for growth mindset lack a similar way of spinning the data, and so never get seen at all?

Universal Love, Said The Cactus Person

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Right,” I said. “I’m absolutely in favor of both those things. But before we go any further, could you tell me the two prime factors of 1,522,605,027,922,533,360,535,618,378,132,637,429,718,068,114,961,380,688,657,908,494,580,122,963,258,952,897,654,000,350,692,006,139?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

The sea was made of strontium; the beach was made of rye. Above my head, a watery sun shone in an oily sky. A thousand stars of sertraline whirled round quetiapine moons, and the sand sizzled sharp like cooking oil that hissed and sang and threatened to boil the octahedral dunes.

“Okay,” I said. “Fine. Let me tell you where I’m coming from. I was reading Scott McGreal’s blog, which has some good articles about so-called DMT entities, and mentions how they seem so real that users of the drug insist they’ve made contact with actual superhuman beings and not just psychedelic hallucinations. You know, the usual Terence McKenna stuff. But in one of them he mentions a paper by Marko Rodriguez called A Methodology For Studying Various Interpretations of the N,N-dimethyltryptamine-Induced Alternate Reality, which suggested among other things that you could prove DMT entities were real by taking the drug and then asking the entities you meet to factor large numbers which you were sure you couldn’t factor yourself. So to that end, could you do me a big favor and tell me the factors of 1,522,605,027,922,533,360,535,618,378,132,637,429,718,068,114,961,380,688,657,908,494,580,122,963,258,952,897,654,000,350,692,006,139?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

The sea turned hot and geysers shot up from the floor below. First one of wine, then one of brine, then one more yet of turpentine, and we three stared at the show.

“I was afraid you might say that. Is there anyone more, uh, verbal here whom I could talk to?”

“Universal love,” said the cactus person.

At the sound of that, the big green bat started rotating in place. On its other side was a bigger greener bat, with an ancient, wrinkled face.

“Not splitting numbers / but joining Mind,” it said.
“Not facts or factors or factories / but contact with the abstract attractor that brings you back to me
Not to seek / but to find”

“I don’t follow,” I said.

“Not to follow / but to jump forth into the deep
Not to grind or to bind or to seek only to find / but to accept
Not to be kept / but to wake from sleep”

The bat continued to rotate, until the first side I had seen swung back into view.

“Okay,” I said. “I’m going to hazard a guess as to what you’re talking about, and you tell me if I’m right. You’re saying that, like, all my Western logocentric stuff about factoring numbers in order to find out the objective truth about this realm is missing the point, and I should be trying to do some kind of spiritual thing involving radical acceptance and enlightenment and such. Is that kind of on the mark?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Frick,” I said. “Well, okay, let me continue.” The bat was still rotating, and I kind of hoped that when the side with the creepy wrinkled face came into view it might give me some better conversation. “I’m all about the spiritual stuff. I wouldn’t be here if I weren’t deeply interested in the spiritual stuff. This isn’t about money or fame or anything. I want to advance psychedelic research. If you can factor that number, then it will convince people back in the real – back in my world that this place is for real and important. Then lots of people will take DMT and flock here and listen to what you guys have to say about enlightenment and universal love, and make more sense of it than I can alone, and in the end we’ll have more universal love, and…what was the other thing?”

“Transcendent joy,” said the big green bat.

“Right,” I said. “We’ll have more transcendent joy if you help me out and factor the number than if you just sit there being spiritual and enigmatic.”

“Lovers do not love to increase the amount of love in the world / But for the mind that thrills
And the face of the beloved, which the whole heart fills / the heart and the art never apart, ever unfurled
And John Stuart is one of / the dark satanic mills”

“I take it you’re not consequentialists,” I said. “You know that’s really weird, right? Like, not just ‘great big green bat with two faces and sapient cactus-man’ weird, but like really weird. You talk about wanting this spiritual enlightenment stuff, but you’re not going to take actions that are going to increase the amount of spiritual enlightenment? You’ve got to understand, this is like a bigger gulf for me than normal human versus ineffable DMT entity. You can have crazy goals, I expect you to have crazy goals, but what you’re saying now is that you don’t pursue any goals at all, you can’t be modeled as having desires. Why would you do that?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Now you see here,” I said. “Everyone in this conversation is in favor of universal love and transcendent joy. But I’ve seen the way this works. Some college student gets his hands on some DMT, visits here, you guys tell him about universal love and transcendent joy, he wakes up, says that his life has been changed, suddenly he truly understands what really matters. But it never lasts. The next day he’s got to get up and go to work and so on, and the universal love lasts about five minutes until his boss starts yelling at him for writing his report in the wrong font, and before you know it twenty years later he’s some slimy lawyer who’s joking at a slimy lawyer party about the one time when he was in college and took some DMT and spent a whole week raving about transcendent joy, and all the other slimy lawyers laugh, and he laughs with them, and so much for whatever spiritual awakening you and your colleagues in LSD and peyote are trying to kindle in humanity. And if I accept your message of universal love and transcendent joy right now, that’s exactly what’s going to happen to me, and meanwhile human civilization is going to keep being stuck in greed and ignorance and misery. So how about you shut up about universal love and you factor my number for me so we can start figuring out a battle plan for giving humanity a real spiritual revolution?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

A meteorite of pure delight struck the sea without a sound. The force of the blast went rattling past the bat and the beach, disturbing each, then made its way to a nearby bay of upside-down trees with their roots in the breeze and their branches underground.

“I demand a better answer than that,” I demanded.

The other side of the bat spun into view.

“Chaos never comes from the Ministry of Chaos / nor void from the Ministry of Void
Time will decay us but time can be left blank / destroyed
With each Planck moment ever fit / to be eternally enjoyed”

“You’re making this basic mistake,” I told the big green bat. “I honestly believe that there’s a perspective from which Time doesn’t matter, where a single moment of recognition is equivalent to eternal recognition. The problem is, if you only have that perspective for a moment, then all the rest of the time, you’re sufficiently stuck in Time to honestly believe you’re stuck in Time. It’s like that song about the hole in the bucket – if the hole in the bucket were fixed, you would have the materials needed to fix the hole in the bucket. But since it isn’t, you don’t. Likewise, if I understood the illusoriness…illusionality…whatever, of time, then I wouldn’t care that I only understood it for a single instant. But since I don’t, I don’t. Without a solution to the time-limitedness of enlightenment that works from within the temporal perspective, how can you consider it solved at all?”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

The watery sun began to run and it fell on the ground as rain. It became a dew that soaked us through, and as the cold seemed to worsen the cactus person hugged himself to stay warm but his spines pierced his form and he howled in a fit of pain.

“You know,” I said, “sometimes I think the kvithion sumurhe had the right of it. The world is an interference pattern between colliding waves of Truth and Beauty, and either one of them pure from the source and undiluted by the other will be fatal. I think you guys and some of the other psychedelics might be pure Beauty, or at least much closer to the source than people were meant to go. I think you can’t even understand reason, I think you’re constitutionally opposed to reason, and that the only way we’re ever going to get something that combines your wisdom and love and joy with reason is after we immanentize the eschaton and launch civilization into some perfected postmessianic era where the purpose of the world is fully complete. And that as much as I hate to say it, there’s no short-circuiting the process.”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“I’m dissing you, you know. I’m saying you guys are so intoxicated on spiritual wisdom that you couldn’t think straight if your life depended on it; that your random interventions in our world and our minds look like the purposeless acts of a drunken madman because that’s basically more or less what they are. I’m saying if you had like five IQ points between the two of you, you could tap into your cosmic consciousness or whatever to factor a number that would do more for your cause than all your centuries of enigmatic dreams and unasked-for revelations combined, and you ARE TOO DUMB TO DO IT EVEN WHEN I BASICALLY HOLD YOUR HAND THE WHOLE WAY. Your spine. Your wing. Whatever.”

“Universal love,” said the cactus person.

“Transcendent joy,” said the big green bat.

“Fuck you,” said I.

I saw the big green bat bat a green big eye. Suddenly I knew I had gone too far. The big green bat started to turn around what was neither its x, y, or z axis, slowly rotating to reveal what was undoubtedly the biggest, greenest bat that I had ever seen, a bat bigger and greener than which it was impossible to conceive. And the bat said to me:

“Sir. Imagine you are in the driver’s seat of a car. You have been sitting there so long that you have forgotten that it is the seat of a car, forgotten how to get out of the seat, forgotten the existence of your own legs, indeed forgotten that you are a being at all separate from the car. You control the car with skill and precision, driving it wherever you wish to go, manipulating the headlights and the windshield wipers and the stereo and the air conditioning, and you pronounce yourself a great master. But there are paths you cannot travel, because there are no roads to them, and you long to run through the forest, or swim in the river, or climb the high mountains. A line of prophets who have come before you tell you that the secret to these forbidden mysteries is an ancient and terrible skill called GETTING OUT OF THE CAR, and you resolve to learn this skill. You try every button on the dashboard, but none of them is the button for GETTING OUT OF THE CAR. You drive all of the highways and byways of the earth, but you cannot reach GETTING OUT OF THE CAR, for it is not a place on a highway. The prophets tell you GETTING OUT OF THE CAR is something fundamentally different than anything you have done thus far, but to you this means ever sillier extremities: driving backwards, driving with the headlights on in the glare of noon, driving into ditches on purpose, but none of these reveal the secret of GETTING OUT OF THE CAR. The prophets tell you it is easy; indeed, it is the easiest thing you have ever done. You have traveled the Pan-American Highway from the boreal pole to the Darien Gap, you have crossed Route 66 in the dead heat of summer, you have outrun cop cars at 160 mph and survived, and GETTING OUT OF THE CAR is easier than any of them, the easiest thing you can imagine, closer to you than the veins in your head, but still the secret is obscure to you.”

A herd of bison came in to listen, and voles and squirrels and ermine and great tusked deer gathered round to hear as the bat continued his sermon.

“And finally you drive to the top of the highest peak and you find a sage, and you ask him what series of buttons on the dashboard you have to press to get out of the car. And he tells you that it’s not about pressing buttons on the dashboard and you just need to GET OUT OF THE CAR. And you say okay, fine, but what series of buttons will lead to you getting out of the car, and he says no, really, you need to stop thinking about dashboard buttons and GET OUT OF THE CAR. And you tell him maybe if the sage helps you change your oil or rotates your tires or something then it will improve your driving to the point where getting out of the car will be a cinch after that, and he tells you it has nothing to do with how rotated your tires are and you just need to GET OUT OF THE CAR, and so you call him a moron and drive away.”

“Universal love,” said the cactus person.

“So that metaphor is totally unfair,” I said, “and a better metaphor would be if every time someone got out of the car, five minutes later they found themselves back in the car, and I ask the sage for driving directions to a laboratory where they are studying that problem, and…”

“You only believe that because it’s written on the windshield,” said the big green bat. “And you think the windshield is identical to reality because you won’t GET OUT OF THE CAR.”

“Fine,” I said. “Then I can’t get out of the car. I want to get out of the car. But I need help. And the first step to getting help is for you to factor my number. You seem like a reasonable person. Bat. Freaky DMT entity. Whatever. Please. I promise you, this is the right thing to do. Just factor the number.”

“And I promise you,” said the big green bat. “You don’t need to factor the number. You just need to GET OUT OF THE CAR.”

“I can’t get out of the car until you factor the number.”

“I won’t factor the number until you get out of the car.”

“Please, I’m begging you, factor the number!”

“Yes, well, I’m begging you, please get out of the car!”

“FOR THE LOVE OF GOD JUST FACTOR THE FUCKING NUMBER!”

“FOR THE LOVE OF GOD JUST GET OUT OF THE FUCKING CAR!”

“FACTOR THE FUCKING NUMBER!”

“GET OUT OF THE FUCKING CAR!”

“Universal love,” said the cactus person.

Then tree and beast all fled due east and the moon and stars shot south. And the bat rose up and the sea was a cup and the earth was a screen green as clozapine and the sky a voracious mouth. And the mouth opened wide and the earth was skied and the sea fell in with an awful din and the trees were moons and the sand in the dunes was a blazing comet and…

I vomited, hard, all over my bed. It happens every time I take DMT, sooner or later; I’ve got a weak stomach and I’m not sure the stuff I get is totally pure. I crawled just far enough out of bed to flip a light switch on, then collapsed back onto the soiled covers. The clock on the wall read 11:55, meaning I’d been out about an hour and a half. I briefly considered taking some more ayahuasca and heading right back there, but the chances of getting anything more out of the big green bat, let alone the cactus person, seemed small enough to fit in a thimble. I drifted off into a fitful sleep.

Behind the veil, across the infinite abyss, beyond the ice, beyond daath, the dew rose from the soaked ground and coalesced into a great drop, which floated up into an oily sky and became a watery sun. The cactus person was counting on his spines.

“Hey,” the cactus person finally said, “just out of curiosity, was the answer 37,975,227,936,943,673,922,808,872,755,445,627,854,565,536,638,199 times 40,094,690,950,920,881,030,683,735,292,761,468,389,214,899,724,061?”

“Yeah,” said the big green bat. “That’s what I got too.”
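
(A postscript for the incurably left-brained: the bat’s answer is checkable. Those two fifty-digit numbers happen to be the published prime factors of the RSA-100 challenge number, and multiplying them back together takes a computer essentially no time. Here’s a minimal Python sketch – mine, not the story’s – that runs a standard Miller-Rabin primality test on each factor and prints their product, which should match the hundred-digit number the narrator posed earlier.)

    import random

    def is_probable_prime(n, rounds=20):
        # Standard Miller-Rabin probabilistic primality test.
        if n < 2:
            return False
        for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
            if n % p == 0:
                return n == p
        # Write n - 1 as d * 2**r with d odd.
        d, r = n - 1, 0
        while d % 2 == 0:
            d //= 2
            r += 1
        for _ in range(rounds):
            a = random.randrange(2, n - 1)
            x = pow(a, d, n)
            if x in (1, n - 1):
                continue
            for _ in range(r - 1):
                x = pow(x, 2, n)
                if x == n - 1:
                    break
            else:
                return False  # this a witnesses that n is composite
        return True

    # The two fifty-digit factors the cactus person reads off:
    p = 37975227936943673922808872755445627854565536638199
    q = 40094690950920881030683735292761468389214899724061

    assert is_probable_prime(p) and is_probable_prime(q)
    print(p * q)  # should reproduce the hundred-digit number from the story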

Links 4/15: Link And You’re Dead

Perytons are mysterious bursts detected by radio telescopes. Some kind of novel astronomical object? Maybe not – a recent investigation suggested something more banal – microwave ovens in the astronomers’ break room.

Greg Cochran on creepy cell line infections. “There are diseases that look as if they might be infectious where no causative organism has ever been found – diseases like sarcoidosis. They might be caused by some disease that started out as your second cousin Frank.”

Yale on climate change polling. More people believe in global warming themselves than believe there is a scientific consensus around it? That’s the opposite of what I would have expected. More people want to regulate CO2 than believe global warming exists? Polling is weird.

A lot of the scrutiny around Ferguson focused on its corrupt police force as an example of white officials fleecing black citizens, and how this might be solved by mobilizing black voters to take control of the government. The Daily Beast has an interesting article on the town next to Ferguson – where black officials fleece black citizens about the same amount.

CVS will allow people to get naloxone without prescriptions in order to fight deaths from opiate overdose (which naloxone treats). The two interesting things I took from this: first, it’s surprisingly legal to give prescription drugs away without prescriptions if you can get a couple of trade groups to agree to it. Second, maybe this will mean alcoholics can try the Sinclair Method on their own.

Lots of interesting graphs on my Twitter feed this month. Here’s one on how fertility isn’t declining and one on how IQ affects the likelihood of escaping poverty (source).

An Italian surgeon is prepared to attempt the world’s first head transplant.

A multinational team says their machine learning program can now predict IQ from MRI images accurately enough that their estimates correlate at 0.71 with the real thing. I asked Twitter what they thought; apparently it’s real prediction rather than “my machine learning algorithm correctly predicted the same data we fed it”, but it might be confounded by the sample of different-aged children; the program might just be reading off whose brain looks older and predicting that older children perform better on IQ tests.

How did surveyors in 1919, long before the computer was invented, calculate the geographical center of the United States?

No Irish Need Apply: A Myth Of Victimization. A historian argues that there are no actual records of 19th century American businesses or advertisements using this phrase, and that it was later made up to promote Irish-American solidarity. When asked for comment, experts look shifty and say they “know nothing”.

More strong claims for probiotics: a four-week treatment with a multispecies supplement decreases reactivity to sad mood, considered a risk factor for depression.

Vox writes about Raj Chetty’s theories of location-dependent social mobility, and now it seems that Hillary Clinton is a huge fan. But Steve Sailer points out exactly the same giant gaping radioactive flaw that I noticed – he is basically just noticing that there is less social mobility between races than within them, and that therefore, places with high black populations appear to have less social mobility. Please tell me I’m misunderstanding something and he didn’t actually miss this.

A while back we discussed gender differences in ethical theories. A recent big meta-analysis finds that women are moderately more deontological than men, and men slightly more utilitarian than women. Whatever.

It’s morally wrong to blame a victim’s actions for their own victimization. We should be blaming those victims’ genes. Or something. Not really sure what to do with this one.

Very closely related: a while back I argued that the apparent connection between childhood bullying and psychiatric disorders was way too strong to be real and likely to represent some kind of common confound. Sure enough, when somebody twin-studied it they found that at least in the case of paranoia 93% of the association is likely to represent a common genetic risk factor.

Ready For Hillary? Take Our Quiz And Find Out! Question four: “Her slogan is (a) Ready for Hillary, (b) Resigned to Hillary, (c) Preparing for Chelsea, or (d) What Difference, At This Point, Does It Make?”

19th century polymath Francis Galton was among the first to study the efficacy of prayer, noting among other things that despite all the people praying “God save the King” royals tended to die earlier than other upper-class individuals.

Chris Blattman conducted a study in Liberia which found that at-risk poor young men given cognitive behavioral therapy were involved in 20-50% less crime, drug use, and violence than a control group, with effects lasting at least a year. This sincerely surprises me. I would pay money to see what James Coyne thinks of this.

New work with odd jellyfish-like creatures called ctenophores raises the surprising question: did neurons evolve twice?

At least three towns have exclamation points in their names: Hamilton!, Ohio; Westward Ho!, Devon; and Saint-Louis-du-Ha! Ha!, Quebec.

In order to prove some kind of point, Ecuador very carefully disguises a portion of its territory as Costa Rica, tells some of its citizens they are going on a trip to Costa Rica, then keeps them in Ecuador. Now it’s an international incident with the Costa Rican government getting involved.

Individual Differences In Executive Function Are Almost Entirely Genetic In Origin. And when they say “almost entirely”, they mean “about 99%”. This doesn’t make sense to me – why should this be the only 99% genetic thing in a world full of cognitive skills that are about 50% genetic? Really looking forward to a replication attempt.

Has Obamacare Turned Voters Against Sharing The Wealth? Maybe not Obamacare specifically, but the magnitude of increasing opposition to redistribution is surprising and disturbing. Also a confusing sign of how poorly trends in media coverage mirror trends in people’s attitudes.

FBI Admits It Fudged Forensic Hair Matches In Nearly All Criminal Trials For Decades. “Oops” doesn’t seem to cut it.

If Douglas Hofstadter wrote erotica (h/t Multiheaded)
