Last week, the replication crisis in psychology was pushed back into the news when Susan Fiske, a former president of the Association for Psychological Science, wrote a column in which she criticized “online vigilantes” on blogs, Twitter, and Facebook who have taken prominent work in social psychology to task. Fiske likened these “destructo-critics” to “methodological terrorists.” This broadside was controversial, with me and others responding that, when it comes to pointing out errors in published work, social media has been necessary: there has simply been no reasonable alternative.
A couple of days later, the other shoe dropped, as Dana Carney, one of the authors of a famous study proposing that power stances could translate into powerful feelings, disavowed that work, which she had done in collaboration with Andy Yap and Amy Cuddy, a former student of Fiske. (I wrote about the power-pose study, which was conducted in 2010 but failed to replicate in 2014, earlier this year for Slate.) In a statement posted on her website, Carney wrote, “The evidence against the existence of power poses is undeniable. … I do not think the effect is real.”
This note from Carney was wonderful news. I think it’s a great step forward when people are willing to reconsider their published research in light of methodological and empirical criticism.
The criticisms are valid. Researchers study small effects with noisy measurements and then look through their data for statistically significant comparisons. This approach can be expected to produce unreplicable claims. Worse, it can create research communities in which unreplicable results seem to reinforce each other: study a small effect with noisy measurements, and any statistically significant estimate will necessarily overstate the underlying effect, often by a large factor. In follow-up studies, researchers then expect to see comparably huge effects, hence they anticipate “high power” (in statistics jargon) and high rates of success. Coming into their studies with this expectation, they can feel justified in jiggling their data until they get the findings they want. The resulting claims get published in journals, the findings are believed, and the cycle continues.
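The mechanism described above can be illustrated with a short simulation. This is a hedged sketch: the effect size, noise level, and significance cutoff are hypothetical numbers chosen for illustration, not taken from any of the studies discussed.

```python
import numpy as np

# Illustrative simulation of the selection effect described above:
# a small true effect measured with lots of noise. Among the estimates
# that happen to reach p < 0.05, the average magnitude vastly overstates
# the true effect. All numbers here are hypothetical.
rng = np.random.default_rng(0)

true_effect = 0.1        # small underlying effect
se = 1.0                 # noisy measurement: standard error of each estimate
n_studies = 100_000      # many hypothetical studies of the same effect

estimates = rng.normal(true_effect, se, n_studies)
z = estimates / se
significant = np.abs(z) > 1.96          # the usual p < 0.05 cutoff

exaggeration = np.abs(estimates[significant]).mean() / true_effect
print(f"{significant.mean():.1%} of studies reach 'significance'")
print(f"mean |significant estimate| is {exaggeration:.0f}x the true effect")
```

With numbers like these, only about 5 percent of studies clear the significance bar, and those that do overestimate the true effect by roughly a factor of 20 — exactly the kind of inflated result a follow-up study would then be designed around.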
But this is a problem in lots of scientific fields. Why does psychology continue to dominate the news when it comes to discussion of the replication crisis?
Why not economics, which is more controversial and gets more space in the news media? Or medicine, which has higher stakes and a regular flow of well-publicized scandals?
Here are some relevant factors that I see, within the field of psychology:
1. Sophistication: Psychology’s discourse on validity, reliability, and latent constructs is much more sophisticated than the usual treatment of measurement in statistics, economics, biology, etc. One reason for this is that psychology is an inherently difficult field, studying constructs such as personality, intelligence, and motivation, which are undeniably important but which by their nature are “latent constructs” that cannot be measured directly. Psychologist Paul Meehl raised serious questions about research methods as early as the 1960s, at a time when other fields were still offering naïve happy talk about how all problems would be solved with randomized experiments.
2. Overconfidence deriving from research designs: When we talk about the replication crisis in psychology, we’re mostly talking about lab experiments and surveys. Either way, you get clean identification of comparisons, hence there’s an assumption that simple textbook methods can’t go wrong. We’ve seen similar problems in economics (for example, a paper on air pollution in China that was based on a naïve trust in regression discontinuity analysis, not recognizing that, when you come down to it, what they had was an observational study), but lab experiments and surveys in psychology are typically so clean that researchers sometimes can’t seem to imagine that there could be any problems with their methods. And then researchers let their overconfidence about “statistical significance” leak into their analyses of observational data such as in the notorious “himmicanes and hurricanes” paper.
3. Openness. This one hurts: Psychology’s bad press is in part a consequence of its open culture, which manifests in various ways. To start with, psychology is institutionally open. Sure, there are some bad actors who refuse to share their data or who try to suppress dissent. Overall, though, psychology offers many channels of communication, even including the involvement of outsiders such as myself. One can compare to economics, which is notoriously resistant to ideas coming from other fields.
And, compared to medicine, psychology is much less restricted by financial and legal considerations. Biology and medicine are big business, and there are huge financial incentives for suppressing negative results, silencing critics, and flat-out cheating. In psychology, it’s relatively easy to get your hands on the data or at least to find mistakes in published work.
4. Involvement of prominent academics. Research controversies in other fields typically seem to involve fringe elements of their professions, and when discussing science publication failures, you might just say that Andrew Wakefield had an ax to grind and the editor of the Lancet is a sucker for political controversy, or that Richard Tol has an impressive talent for getting bad work published in good journals. In the rare cases when a big shot is involved (for example, Carmen Reinhart and Kenneth Rogoff), it is indeed big news. But, in psychology, the replication crisis has engulfed Fiske, Roy Baumeister, John Bargh, Carol Dweck … these are leaders in their field. So there’s a legitimate feeling that the replication crisis strikes at the heart of psychology, or at least social psychology; it’s hard to dismiss it as a series of isolated incidents.
5. Everyone loves psychology: It’s often of general interest (hence all the press coverage, TED Talks, and so on) and accessible, both in its subject matter and its methods. Biomedicine is all about development and DNA and all sorts of technical matters; to understand empirical economics you need to know about regression models; but the ideas and methods of psychology are right out in the open for all to see. At the same time, most of psychology is not politically controversial. If an economist makes a dramatic claim, journalists can call up experts on the left and the right and present a nuanced view. At least until recently, reporting about psychology followed the “scientist as bold discoverer” template, from Gladwell on down.
What do you get when you put it together?
The strengths and weaknesses of the field of research psychology seem to have combined to (a) encourage the publication and dissemination of lots of low-quality, unreplicable research, while (b) creating the conditions for this problem to be recognized, exposed, and discussed openly.
It makes sense for psychology researchers to be embarrassed that those papers on power pose, ESP, himmicanes, etc. were published in their top journals and promoted by leaders in their field. Just to be clear: I’m not saying there’s anything embarrassing or illegitimate about studying and publishing papers on power pose, ESP, or himmicanes. Speculation and data exploration are fine with me; indeed, they’re a necessary part of science. My problem with those papers is that they presented speculation as mature theory, that they presented data exploration as confirmatory evidence, and that they were not part of research programs that could accommodate criticism. That’s bad news for psychology, as it would be for any other field.
But psychologists can express legitimate pride in the methodological sophistication that has given them avenues to understand the replication crisis, in the openness that has allowed prominent work to be criticized, and in the collaborative culture that has facilitated replication projects. Let’s not let the breakthrough-of-the-week hype and the TED Talk–ing hawkers and the “replication rate is statistically indistinguishable from 100 percent” blowhards distract us from all the good work that has showed us how to think more seriously about statistical evidence and scientific replication.