Four years ago I examined the claim that SSRIs are little better than placebo. Since then, some of my thinking on this question has changed.
First, we got Cipriani et al’s meta-analysis of antidepressants. It avoids some of the pitfalls of Kirsch and comes to about the same conclusion. This knocks down a few of the lines of argument in part 4 of the original post about how the effect size might be closer to 0.5 than 0.3. The effect size is probably about 0.3.
Second, I’ve seen enough to realize that the anomalously low effect size of SSRIs in studies should be viewed not as an SSRI-specific phenomenon, but as part of a general trend towards much lower-than-expected effect sizes for every psychiatric medication (every medication, full stop?). I wrote about this in my post on melatonin:
The consensus stresses that melatonin is a very weak hypnotic. The Buscemi meta-analysis cites this as their reason for declaring negative results despite a statistically significant effect – the supplement only made people get to sleep about ten minutes faster. “Ten minutes” sounds pretty pathetic, but we need to think of this in context. Even the strongest sleep medications, like Ambien, only show up in studies as getting you to sleep ten or twenty minutes faster; this NYT article says that “viewed as a group, [newer sleeping pills like Ambien, Lunesta, and Sonata] reduced the average time to go to sleep 12.8 minutes compared with fake pills, and increased total sleep time 11.4 minutes.” I don’t know of any statistically-principled comparison between melatonin and Ambien, but the difference is hardly (pun not intended) day and night. Rather than say “melatonin is crap”, I would argue that all sleeping pills have measurable effects that vastly underperform their subjective effects.
Or take benzodiazepines, a class of anxiety drugs including things like Xanax, Ativan, and Klonopin. Everyone knows these are effective (at least at first, before patients develop tolerance or become addicted). The studies find them to have about the same efficacy as SSRIs. You could almost convince me that SSRIs don’t have a detectable effect in the real world; you will never convince me that benzos don’t. Even morphine for pain gets an effect size of 0.4, little better than SSRIs’ 0.3 and not enough to meet anyone’s criteria for “clinically significant”. Leucht 2012 provides similarly grim statistics for everything else.
I don’t know whether this means that we should conclude “nothing works” or “we need to reconsider how we think about effect sizes”.
All this leads to the third thing I’ve been thinking about. Given that the effect size really is about 0.3, how do we square the scientific evidence (that SSRIs “work”, but so weakly that no normal person could possibly detect the effect) with the clinical evidence (that psychiatrists and patients find SSRIs sometimes save lives and often make depression substantially better)?
The traditional way to do this is to say that psychiatrists and patients are wrong. Given all the possible biases involved, they misattribute placebo effects to the drugs, or credit some cases that would have remitted anyway to the beneficial effect of SSRIs, or disproportionately remember the times the drugs work over the times they don’t. While “people are biased” is always an option, this doesn’t fit the magnitude of the clinical evidence that I (and most other psychiatrists) observe. There are patients who will regularly get better on an antidepressant, get worse when they stop it, get better when they go back on it, get worse when they stop it again, et cetera. This raises some questions of its own, like why patients keep stopping antidepressants that they clearly need in order to function, but makes bias less likely. Overall the clinical evidence that these drugs work is so strong that I will grasp at pretty much any straw in order to save my sanity and confirm that this is actually a real effect.
Every clinician knows that different people respond to antidepressants differently or not at all. Some patients will have an obvious and dramatic response to the first antidepressant they try. Other patients will have no response to the first antidepressant, but after trying five different things you’ll find one that works really well. Still other patients will apparently never respond to anything.
Overall, when I start a patient on a particular antidepressant, only about 30% to 50% of the time do we end up deciding that this is definitely the right medication for them and they should definitely stay on it. This fits national and global statistics. According to a Korean study, the median amount of time a patient stays on their antidepressant prescription is three months. A Japanese study finds only 44% of patients continued their antidepressants for the recommended six months; an American study finds 31%.
Suppose that one-third of patients have some gene that makes them respond to Prozac with an effect size of 1.0 (very large and impressive), and nobody else responds. In a randomized controlled trial of Prozac, the average effect size will show up as 0.33 (one-third of patients get effect size of 1, two-thirds get effect size of 0). This matches the studies. In the clinic, one-third of patients will be obvious Prozac responders, and their psychiatrist will keep them on Prozac and be very impressed with it as an antidepressant and sing the praises of SSRIs. Two-thirds of patients will get no benefit, and their doctors will write them off as non-responders and try something else. Maybe the something else will work, and then the doctors will sing the praises of that SSRI, or maybe they’ll just say it’s “treatment-resistant depression” and so doesn’t count.
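To make the arithmetic of this thought experiment concrete, here is a minimal Python sketch. It rests on assumptions of my own, not anything the studies establish: symptom scores are normally distributed with unit variance, the two trial arms are the same size, and responders simply improve by the stated number of standard deviations while everyone else is unchanged. `mixture_effect_size` is a hypothetical helper written for illustration.

```python
import math


def mixture_effect_size(responder_fraction: float, responder_effect: float) -> tuple[float, float]:
    """Hypothetical helper: the effect size a trial would report if only some patients respond.

    Illustrative assumptions: placebo-arm outcomes ~ Normal(0, 1); in the drug arm a
    fraction `responder_fraction` of patients improve by `responder_effect` standard
    deviations and everyone else is unchanged; both arms are the same size.
    Returns (raw mean difference in placebo-SD units, Cohen's d using the pooled SD).
    """
    p, delta = responder_fraction, responder_effect
    # The trial only sees the average improvement across everyone in the drug arm.
    mean_diff = p * delta
    # Mixing responders with non-responders widens the drug arm's spread:
    # Var(Z + B * delta) = 1 + p * (1 - p) * delta^2 for Z ~ N(0, 1), B ~ Bernoulli(p).
    drug_var = 1.0 + p * (1 - p) * delta ** 2
    pooled_sd = math.sqrt((1.0 + drug_var) / 2)
    return mean_diff, mean_diff / pooled_sd


if __name__ == "__main__":
    # One-third of patients respond with an effect size of 1.0; nobody else responds.
    raw, d = mixture_effect_size(1 / 3, 1.0)
    print(f"raw mean difference: {raw:.2f}, pooled-SD Cohen's d: {d:.2f}")
```

This prints a raw difference of 0.33 and a pooled-SD d of 0.32. The extra spread created by mixing responders and non-responders pulls the standardized figure slightly below one-third, but under these assumptions it still lands right around the 0.3 the studies report.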
In other words, doctors’ observation that “SSRIs work very well” is an existence statement, “there are some patients for whom SSRIs work very well”, and not a universal statement, “SSRIs will always work well for all patients”. Nobody has ever claimed the latter, so it’s not surprising that it doesn’t match the studies.
I linked Gueorguieva and Krystal on the original post; they make a much more statistically sophisticated version of this argument. But I can’t find any other literature on this possibility, which is surprising, because if it were true it should be pretty obvious, and if it were false it should still be worth somebody’s time to debunk.
If this were true, it would strengthen the case for the throughput-based model I talk about in Recommendations vs. Guidelines and Anxiety Sampler Kits. Instead of worrying only about a medicine’s effect size and side effects, we should worry about whether it is a cheap experiment or an expensive experiment. Imagine a drug that instantly cures 5% of people’s depression, but causes terrible nausea in the other 95%. The traditional model would reject this drug, since its effect size in studies is low and it has severe side effects. On the throughput model, we give this drug to everybody: 5% of people will be instantly cured, 95% will suffer nausea for a day before realizing it doesn’t work for them, and then the 5% will keep taking it while the other 95% try something else. This is obviously a huge exaggeration, but I think the principle holds. If there’s enough variability, the benefit-to-side-effect ratio of SSRIs is interesting only insofar as it tells us where in our guideline to put them. After that, what matters is the benefit-to-side-effect ratio for each individual patient.
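One way to see the throughput argument is as an expected-cost calculation. The Python sketch below is a hypothetical model of my own, not the one from the guideline posts: a patient tries drugs in a fixed order and stops at the first one that works; each drug has an assumed per-patient response rate and an assumed number of days of side effects a non-responder endures before giving up; and response to one drug is assumed to be independent of response to the others, which is surely false in real life. `Trial` and `expected_search` are names made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    name: str             # hypothetical drug label
    response_rate: float  # assumed chance this drug works for a given patient
    trial_days: float     # assumed days of side effects a non-responder endures before stopping


def expected_search(trials: list[Trial]) -> tuple[float, float]:
    """Try drugs in order and stop at the first one that works.

    Assumes responses are independent across drugs (a deliberate simplification).
    Returns (probability that some drug in the list works,
             expected days per patient spent on failed trials).
    """
    p_still_searching = 1.0
    wasted_days = 0.0
    for t in trials:
        # Only patients still searching try this drug; its non-responders pay its cost.
        wasted_days += p_still_searching * (1 - t.response_rate) * t.trial_days
        p_still_searching *= 1 - t.response_rate
    return 1 - p_still_searching, wasted_days


if __name__ == "__main__":
    nausea_drug = Trial("cures 5%, nausea for everyone else", 0.05, 1)
    success, wasted = expected_search([nausea_drug])
    print(f"success: {success:.2f}, expected wasted days: {wasted:.2f}")
```

With the deliberately exaggerated numbers above, this prints a 5% success rate for under one expected day of nausea per patient, which is the sense in which a cheap experiment can be worth running even when its average effect looks tiny. Trying it before a slower hypothetical drug (say a 0.3 response rate over a 42-day trial) also leaves the overall success rate unchanged while slightly lowering the expected days wasted compared with the reverse order.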
I don’t hear this talked about much and I don’t know if this is consistent with the studies that have been done.
Fourth, even though SSRIs are branded “antidepressants”, they have an equal right to be called anti-anxiety medications. There’s some evidence that they may work better for this indication than for depression, although it’s hard to tell. I think Irving Kirsch himself makes this claim: he analyzed the efficacy of SSRIs for everything and found a “relatively large effect size” of 0.7 for anxiety (though the study was limited to children). Depression and anxiety are highly comorbid: half of people with a depressive disorder also have an anxiety disorder, and there are reasons to think that at some deep level they may be aspects of the same condition. If SSRIs effectively treated anxiety, this might make depressed people feel better in a way that doesn’t necessarily show up on formal depression tests, but which they would express to their psychiatrist as “I feel better”. Or psychiatrists might have a vague positive glow around SSRIs if they successfully treat their anxiety patients (who may be the same people as their depression patients), and not be very good at separating that glow into “depression efficacy” and “anxiety efficacy”. Then they might believe they’ve had good experiences using SSRIs for depression.
I don’t know if this is true, and some other studies find that results for anxiety are almost as abysmal as for depression.