Last month I criticized a recent paper, Paunesku et al.’s Mindset Interventions Are A Scalable Treatment For Academic Underachievement, saying that it spun a generally pessimistic set of findings about growth mindset into a generally optimistic headline.
Earlier today, lead author Dr. Paunesku was kind enough to write a very thorough reply, which I reproduce below:
Thanks for your provocative blog post about my work (I’m the first author of the paper you wrote about). I’d like to take a few moments to respond to your critiques, but first I’d like to frame my response and tell you a little bit about my own motivation and that of the team I am a member of (PERTS).
Good criticism is what makes science work. We are critical of our own work, but we are happy to have help. Often critics are not thoughtful or specific. So I very much appreciate the intent of your blog (to be thoughtful and specific).
What is our motivation? We are trying to improve our education system so that all students can thrive. If growth mindset is effective, we want it in every classroom possible. If it is ineffective, we want to know about it so we don’t waste people’s time. If it is effective for some students in some classrooms, we want to know where and for whom so that we can help those students.
What is our history and where are we now? PERTS approached social psychological interventions with a fair amount of skepticism at first. In many ways, they seemed too good to be true. But, we thought, “if this is true, we should do everything we can to spread it”. Our work over the last 5 years has been devoted to trying to see if the results that emerged from initial, small experiments (like Aronson et al., 2002 and Blackwell et al., 2007) would continue to be effective when scaled. The paper you are critiquing is a step in that process — not the end of the process. We are continuing research to see where, for whom, and at what scale social psychological approaches to improving education outcomes can be effective.
How do I intend to respond to your criticisms? In some cases, your facts or interpretations are simply incorrect, and I will try to explain why. I also invite you to contact me for follow up. In other cases, we simply have different opinions about what’s important, and we’ll have to agree to disagree. Regardless, I appreciate your willingness to be bold and specific in your criticism. I think that’s brave, and I think such bravery makes science stronger.
First, what is growth mindset?
This quote is from one of your other blog posts (not your critique of my paper), from your post:
If you’re not familiar with it, growth mindset is the belief that people who believe ability doesn’t matter and only effort determines success are more resilient, skillful, hard-working, perseverant in the face of failure, and better-in-a-bunch-of-other-ways than people who emphasize the importance of ability. Therefore, we can make everyone better off by telling them ability doesn’t matter and only hard work does.
If you think that’s what growth mindset is, I can certainly see why you’d find it irritating, and even destructive. I’d like to assure you that the people doing growth mindset research do not subscribe to the interpretation of growth mindset you described. Nor is that interpretation of growth mindset something we aim to communicate through our interventions. So what is growth mindset?
Growth mindset is not the belief that “ability doesn’t matter and only effort determines success.” Growth mindset is the belief that individuals can improve their abilities — usually through effort and by learning more effective strategies. For example, imagine a third grader struggling to learn long division for the first time. Should he interpret his struggle as a sign that he’s bad at math — as a sign that he should give up on math for good? Or would it be more adaptive if he realized that he could probably get a lot better at math if he sought out help from his peers or teachers? The student who thinks he should give up would probably do pretty badly while the student who thinks that he can improve his abilities — and tries to do so by learning new study strategies and practicing them — would do comparatively better.
That’s the core of growth mindset. It’s nothing crazy like thinking ability doesn’t matter. It’s keeping in mind that you can improve and that — to do so — you need to work hard and seek out and practice new, effective strategies.
As someone who has worked closely with Carol Dweck and with her students and colleagues for seven years now, I can personally attest that I have never heard anyone in that extended group of people express the belief that ability does not matter or that only hard work matters. In fact, a growth mindset wouldn’t make any sense if ability didn’t matter because a growth mindset is all about improving ability.
One of the active goals of the group I co-founded (PERTS) is to try to dispel misinterpretations of growth mindset because they can be harmful. I take it as a failure of our group that someone like you, someone who clearly cares about research and about scientific integrity, could walk away from our work with that interpretation of growth mindset. I hope that PERTS, and other groups promoting growth mindset, can get better and better at refining the way we talk about growth mindset so that people can walk away from our work understanding it more clearly. From that perspective, I hope you can continue to engage with us to improve that message so that people don’t continue to misinterpret it.
Anyway, here are my responses to specific points you made in your blog about my paper:
Was the control group a mindset intervention?
“A quarter of the students took a placebo course that just presented some science about how different parts of the brain do different stuff. This was also classified as a “mindset intervention”, though it seems pretty different.”
What makes you think it was classified as a mindset intervention? We called that the control group, and no one on our team ever thought of that as a mindset intervention.
The Elderly Hispanic Woman Effect
Subgroup analysis can be useful to find more specific patterns in the data, but if it’s done post hoc it can lead to what I previously called the Elderly Hispanic Woman Effect…
First, I just want to note that I love calling this the “elderly Hispanic woman effect.” It really brings out the intrinsic ridiculousness of the subgroup analyses researchers sometimes go through in search of an effect with a p<.05. It is indeed unlikely that "elderly Hispanic women" would be a meaningful subgroup for analyzing the effects of a medicine (although it might be a fun thought exercise to try to think of examples of a medicine whose effects would be likely to be moderated by being an elderly Hispanic woman).
In bringing up the elderly Hispanic woman effect, you're suggesting that we didn't have an a priori reason to think that underperforming students would benefit from these mindset interventions and that we just looked through a bunch of moderators until we found one with p<.05. Well that's not what we did, and I hope I can convince you that our choice of moderator was perfectly reasonable given prior research and theory.
There's a lot of research (and common sense too) to suggest that mindset -- and motivation in general -- matters much more when something is hard than when it is easy. Underachieving students presumably find school more difficult, so it makes sense that we'd want to focus on them. I don't think our choice of subgroup is a controversial or surprising prediction. I think anyone who knows mindset research well would predict stronger effects for students who are struggling. In other words, this is obviously not a case of the elderly Hispanic woman effect because it is totally consistent with prior theory and predictions.
What ultimately matters more than any rhetorical argument, however, is whether the effect is robust -- whether it replicates. On that front, I hope you'll be pleased to learn that we just ran a successful replication of this study (in fall 2014) in which we again found that growth mindset improves achievement specifically among at-risk high school students (currently under review).
We're also planning yet another large scale replication study this fall with a nationally representative sample of schools so that we can be more confident that the interventions are effective in various types of contexts before giving them away for free to any school that wants them.
Is the sense of purpose intervention just a bunch of platitudes?
Still another quarter took a course about “sense of purpose” which talked about how schoolwork was meaningful and would help them accomplish lots of goals and they should be happy to do it.
[Later you say that those “children were told platitudes about how doing well in school will “make their families proud” and “make a positive impact”.]
I wouldn’t say those are platitudes. I think you’re under-appreciating the importance of finding meaning in one’s work. It’s a pretty basic observation about human nature that people are more likely to try hard when it seems like there’s a good reason to try hard. I also think it’s a pretty basic observation about our education system that many students don’t have good reasons for trying hard in school — reasons that resonate with them emotionally and help them find the motivation to do their best in the classroom. In our purpose intervention, we don’t just tell students what to think. We try to scaffold them to think of their own reasons for working hard in school, with a focus on reasons that are more likely to have emotional resonance for students. This type of self-persuasion technique has been used for decades in attitudes research.
We’ve written in more depth about these ideas and explored them through a series of studies. I’d encourage you to read this article if you’re interested.
Our paper title and abstract are misleading
Among ordinary students, the effect on the growth mindset group was completely indistinguishable from zero, and in fact they did nonsignificantly worse than the control group. This was the most basic test they performed, and it should have been the headline of the study. The study should have been titled “Growth Mindset Intervention Totally Fails To Affect GPA In Any Way”.
I think the title you suggest would have been misleading. How?
First, we did find evidence that mindset interventions help underachieving students — and those students are very important from a policy standpoint. As we describe in the paper, those students are more likely to drop out, to end up underemployed, or to end up in prison. So if something can help those students at scale and at a low cost, it’s important for people to know that. That’s why the word “underachievement” is in the title of the paper — because we’re accurately claiming that these interventions can help the important (and large) group of students who are underachieving.
Second, the interventions influenced the way all students think about school in ways that are associated with achievement. Although the higher performing students didn’t show any effects on grades in the semester following the study, their mindsets did change. And, as per the arguments I presented above about the link between mindset and difficulty, it’s quite feasible that those higher-performing students will benefit from this change in mindset down the line. For example, they may choose to take harder classes (e.g., Romero et al., 2014) or they may be more persistent and successful in future classes that are very challenging for them.
A misinterpretation of the y-axis in this graph.
Growth mindset still doesn’t differ from zero [among at-risk students].
This just seems to be a simple misreading of the graph. Either you missed the y-axis of the graph that you reproduced on your blog or you don’t know what a residual standardized score is. Either way, I’ll explain because this is pretty esoteric stuff.
The zero point of the y-axis on that graph is, by definition, the grand mean of the 4 conditions. In other words, the treatment conditions are all hovering around zero because zero is the average, and the average is made up mostly of treatment group students. If we had only had 2 conditions (each with 50% of the students), the y-axis “zero” would have been exactly halfway in between them. So the lack of difference from zero does not mean that the treatment was not different from control. The relevant comparison is between the error bars in the control condition and in the treatment conditions.
You might ask, “why are you showing such a graph?” We’re doing so to focus on the treatment contrast at the heart of our paper — the contrast between the control and treatment groups. The residual standardized graph makes it easy to see the size of that treatment contrast.
We’re combining intervention conditions
Did you catch that phrase “intervention conditions”? The authors of the study write: “Because our primary research question concerned the efficacy of academic mindset interventions in general when delivered via online modules, we then collapsed the intervention conditions into a single intervention dummy code (0 = control, 1 = intervention).”
[This line of argument goes on for a long time to suggest that we’re unethical and that there’s actually no evidence for the effects of growth mindset on achievement.]
We collapsed the intervention conditions together for this analysis because we were interested in the overall effect of these interventions on achievement. We wanted to see if it is possible to use scalable, social-psychological approaches to improve the achievement of underperforming students. I’m not sure why you think that’s not a valid hypothesis to test, but we certainly think it is. Maybe this is just a matter of opinion about what’s a meaningful hypothesis to test, but I assure you that this hypothesis (contrast all treatments to control) is consistent with the goal of our group to develop treatments that make an impact on student achievement. As I described before, we have a whole center devoted to trying to improve academic achievement with these types of techniques (see perts.net); so it’s pretty natural that we’d want to see whether our social-psychological interventions improve outcomes for the students who need them most (at-risk students).
You’re correct that the growth mindset intervention did not have a statistically significant impact on course passing rates by itself (at a p<.05 level). However, the effect was in the expected direction with p=0.13 (or a 1-tailed p=.07 -- I hope you'll grant that a 1-tailed test is appropriate here given that we obviously predicted the treatment would improve rather than reduce performance). So the lack of a p<.05 should not be interpreted -- as you seem to interpret it -- as some sort of positive evidence that growth mindset "actually didn't work." Anyway, I would say it warrants further research to replicate this effect (work we are currently engaging in).
To summarize, we did not find direct evidence that the growth mindset intervention increased course passing rates on its own at a p<.05 level. We did find that growth mindset increased course passing rates at a trend level -- and found a significant effect on GPA. More importantly for me (though perhaps less relevant to your interest specifically in growth mindset), we did provide evidence that social-psychological interventions, like growth mindset and sense of purpose, can improve academic outcomes for at-risk students. We're excited to be replicating this work now and giving it away in the hopes of improving outcomes for students around the world.
Summary
I hope I addressed your concerns about this paper, and I welcome further discussion with you. I’d really appreciate it if you’d revise your blog post in whatever way you think is appropriate in light of my response. I’d hate for people to get the wrong impression of our work, and you don’t strike me as someone who would want to mislead people about scientific findings either.
Finally, you’re welcome to post my response. I may post it to my own web page because I’m sure many other people have similar questions about my work. Just let me know how you’d like to proceed with this dialog.
Thanks for reading,
First of all, the obvious: this is extremely kind and extremely well-argued and a lot of it is correct and makes me feel awful for being so snarky on my last post.
Things in particular which I want to endorse as absolutely right about the critique:
I wrote “A quarter of the students took a placebo course that just presented some science about how different parts of the brain do different stuff. This was also classified as a “mindset intervention”, though it seems pretty different.” Dr. Paunesku says this is wrong. He’s right. It was an editing error on my part. I meant to add the last sentence to the part on the “sense of purpose” intervention, which was classified as a mindset intervention and which I do think seems pretty different. The placebo intervention was never classified as a mindset intervention and I completely screwed up by inserting that piece of text there rather than two sentences down where I meant it to be. It has since been corrected and I apologize for the error.
If another successful replication found that growth mindset continues to only help the lowest-performing students, I withdraw the complaint that this is sketchy subgroup mining, though I think that in general worrying about this is the correct thing to do.
I did misunderstand the residual standardized graph. I suggested that the control group must have severely declined, and got confused about why. In fact, the graph was not about difference between pre-study scores and post-study scores, but difference between group scores and the average score for all four groups. So when the control group is strongly negative, that means it was much worse than the average of all groups. When growth mindset is not-different-from-zero, it means growth mindset was not different from the average of all four groups, which consists of three treatment groups and one control group. So my interpretation – that growth mindset failed to change children’s grades – is not supported by the data.
(In my defense, I can only plead that in the two hundred fifty comments I received, many by professional psychologists and statisticians, only one person picked up on this point (admittedly, after being primed by my own misinterpretation). And the sort of data I expected to be seeing – difference between students’ pre-intervention and post-intervention scores – does not seem to be available. Nevertheless, this was a huge and unforgivable screw-up, and I apologize.)
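To make the grand-mean point concrete, here is a minimal sketch in Python. The group means are invented for illustration and are not taken from the paper, and the sketch only does the centering step (it assumes equal group sizes and skips the residualizing and standardizing the real graph involves). It shows how a treatment group can sit almost on "zero" while still being clearly above control.

```python
from statistics import mean

# Hypothetical group means in GPA points. These are NOT the paper's numbers.
group_means = {
    "control": 2.00,
    "growth_mindset": 2.20,
    "sense_of_purpose": 2.20,
    "combined": 2.20,
}


def center_on_grand_mean(means):
    """Subtract the grand mean from each group mean (assumes equal group sizes)."""
    grand = mean(list(means.values()))
    return {name: value - grand for name, value in means.items()}


centered = center_on_grand_mean(group_means)

# The comparison that actually matters: treatment versus control.
contrast = group_means["growth_mindset"] - group_means["control"]

for name, value in centered.items():
    print(f"{name:>16}: {value:+.2f}")
print(f"growth mindset minus control: {contrast:+.2f}")
```

With three treatment groups and one control, the grand mean is dragged toward the treatment groups, so under these made-up numbers each treatment lands a small distance above zero while the control lands three times as far below it.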
But there are also a few places where I will stick to my guns.
I don’t think my interpretation of growth mindset was that far off the mark. I explain this a little further in this post on differing possible definitions of growth mindset, and I will continue to cite this strongly worded paper by Dweck as defense of my views. It’s not just an obvious and innocuous belief about always believing you should be able to improve, it’s a belief about very counterintuitive effects of believing that success depends on ability versus effort. It is possible that all sophisticated researchers in the field have a very sophisticated and unobjectionable definition of growth mindset, but that’s not the way it’s presented to the public, even in articles by those same researchers.
Although I’m sure that to researchers in the field statements like “Doing well at school will help me achieve my goal” don’t sound like platitudes, it seems important to me in the context of discussions about growth mindset. Some people have billed growth mindset as a very exciting window into what makes learning tick, and how we should divide everyone into groups based on their mindset, and how it’s the Secret To Success, and so on. Learning that a drop-dead simple intervention – telling students to care about school more – actually does as well or better than growth mindset seems to me like a damning result. I realize it would be kind of insulting to call sense-of-purpose an “active placebo” in the medical sense, but that’s kind of how I can’t help thinking of it.
I’m certainly not suggesting the authors of the papers are unethical for combining growth mindset intervention with sense of purpose intervention. But I think the technique is dangerous, and this is an example. They got a result that was significant at p = 0.13. Dr. Paunesku suggests in his email to me that this should be one-tailed (which makes it p = 0.07) and that this obviously trends towards significance. This is a reasonable argument. But this wasn’t the reasonable argument made in the paper. Instead, they make it look like it achieved classical p < 0.05 significance, or at least make it very hard to notice that it didn't. Even if in this case it was - I can't even say white lie, maybe a white spin - I find the technique very worrying. Suppose I want to prove homeopathy cures cancer. I make a trial with one placebo condition and two intervention conditions - chemotherapy and homeopathy. I find that the chemotherapy condition very significantly outperforms placebo, but the homeopathy condition doesn't. So I combine the two interventions into a single bin and say "Therapeutic interventions such as chemotherapy or homeopathy significantly outperform placebo." Then someone else cites it as "As per a study, homeopathy outperforms placebo." This would obviously be bad. I am just not convinced that growth mindset and sense of purpose are similar enough that you can group them together effectively. This is what I was trying to get at in my bungled sentence about how they're both "mindset" interventions but seem pretty different. Yes, they're both things you tell children in forty-five minute sessions that seem related to how they think about school achievement. But that's a really broad category.
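The homeopathy worry can be sketched numerically. The counts below are invented for the thought experiment (they are not data from any trial or from the paper), and the pooled two-proportion z-test is just one simple choice of test, not the analysis the authors ran. The point is only that a strong arm can carry a null arm over the p < 0.05 line once the two are binned together.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p(successes_a, n_a, successes_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Invented counts: patients improving out of 100 in each arm.
placebo, chemo, homeopathy, n = 50, 75, 52, 100

p_chemo = two_proportion_p(chemo, n, placebo, n)
p_homeopathy = two_proportion_p(homeopathy, n, placebo, n)
# Collapse both interventions into a single "treatment" bin of 200 patients.
p_pooled = two_proportion_p(chemo + homeopathy, 2 * n, placebo, n)

print(f"chemotherapy vs placebo: p = {p_chemo:.4f}")
print(f"homeopathy vs placebo:   p = {p_homeopathy:.4f}")
print(f"pooled vs placebo:       p = {p_pooled:.4f}")
```

Under these made-up counts the homeopathy arm alone is nowhere near significance, yet the pooled comparison clears p < 0.05, which is exactly the pattern that makes the combined headline easy to misread.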
But doesn’t it mean something that growth-mindset was obviously trending toward significance?
First of all, I would have had no problem with saying “trending toward significance” and letting readers draw their own conclusions.
Second of all, I’m not totally sure I buy the justification for a one-tailed test here; after all, it seems like we should use a one-tailed test for homeopathy as well, since as astounding as it would be if homeopathy helped, it would be even more astounding if homeopathy somehow made cancer worse. Further, educational interventions often have the opposite of their desired effect – see eg this campaign to increase tolerance of the disabled which made students like disabled people less than a control intervention. In fact, there’s no need to look further than this very study, which found (counterintuitively) that among students already exposed to sense-of-purpose interventions, adding on an extra growth-mindset intervention seemed to make them do (nonsignificantly) worse. I am not a statistician, but my understanding is you ought to have a super good reason to use a one-tailed test, beyond just “Intuitively my hypothesis is way more likely than the exact opposite of my hypothesis”.
Third of all, if we accept p < 0.13 as "trending towards significance", we have basically tripled the range of acceptable study results, even though everyone agrees our current range of acceptable study results is already way too big and some high percent of all medical studies are wrong and only 39% of psych studies replicate and so on.
(I agree that all of this could be solved by something better than p-values, but p-values are what we’ve got)
I realize I’m being a jerk by insisting on the arbitrary 0.05 criterion, but in my defense, the time when only 39% of studies using a criterion replicate is a bad time to loosen that criterion.
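The arithmetic behind the one-tailed and "basically tripled" points is small enough to write out. This sketch assumes a symmetric test statistic, so that a one-tailed p is half the two-tailed p when the effect is in the predicted direction, and it assumes p-values are uniformly distributed when the null is true, so that the false positive rate equals the threshold. The helper name one_tailed_p is mine, not something from the paper.

```python
def one_tailed_p(two_tailed_p, in_predicted_direction=True):
    """Convert a two-tailed p to a one-tailed p (symmetric test statistic assumed)."""
    half = two_tailed_p / 2
    return half if in_predicted_direction else 1 - half


reported_two_tailed = 0.13   # pass-rate result quoted in the email
conventional_alpha = 0.05

one_tailed = one_tailed_p(reported_two_tailed)

# If the null is true, the chance of a false positive equals the threshold,
# so the ratio of thresholds is the factor by which false positives grow.
widening = reported_two_tailed / conventional_alpha

print(f"one-tailed p: {one_tailed:.3f}")
print(f"threshold widened by a factor of {widening:.1f}")
```

The one-tailed value comes out at 0.065, which the email reports as .07, and moving the cutoff from 0.05 to 0.13 multiplies the null false positive rate by 2.6, which is the sense in which the range is "basically tripled".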
Here’s what I still believe and what I’ve changed my mind on based on Dr. Paunesku’s response.
1. I totally bungled my sentence on the placebo group being a mindset intervention by mistake. I ashamedly apologize, and have corrected the original post.
2. I totally bungled reading the residual standardized score graph. I ashamedly apologize, and have corrected the original post, and put a link in bold text to this post on the top.
3. I don’t know whether the thing I thought the graph showed (no significant preintervention vs. postintervention GPA improvement for growth mindset, or no difference in change from controls) is true. It may be hidden in the supplement somewhere, which I will check later. Possible apology pending further investigation.
4. Growth mindset still had no effect (in fact nonsignificantly negative) for students at large (as opposed to underachievers). I regret nothing.
5. Growth mindset still failed to reach traditional significance criteria for changing pass rates. I regret nothing.