Remember that study that found that most psychology studies were wrong? Yeah, that study was wrong. That’s the conclusion of four researchers who recently interrogated the methods of that study, which itself interrogated the methods of 100 psychology studies to find that very few could be replicated. (Whoa.) Their damning commentary will be published Friday in the journal Science. (The scientific body that publishes the journal sent Slate an early copy.)
In case you missed the hullabaloo: A key feature of the scientific method is that scientific results should be reproducible—that is, if you run an experiment again, you should get the same results. If you don’t, you’ve got a problem. And a problem is exactly what 270 scientists found last August, when they decided to try to reproduce 100 peer-reviewed journal studies in the field of social psychology. Only about 39 percent of the replication attempts, they found, produced results similar to the originals.
That meta-analysis, published in Science by a group called the Open Science Collaboration, led to mass hand-wringing over the “replicability crisis” in psychology. (It wasn’t the first time that the field has faced such criticism, as Michelle N. Meyer and Christopher Chabris have reported in Slate, but this particular study was a doozy.)
Now this new commentary, from Harvard’s Daniel Gilbert, Gary King, and Stephen Pettigrew and the University of Virginia’s Timothy Wilson, finds that the OSC study was bogus—for a dazzling array of reasons. I know you’re busy, so let’s examine just two.
The first—which is what tipped researchers off to the study being not-quite-right in the first place—was statistical. The whole scandal, after all, was over the fact that so few of the original 100 studies turned out to be reproducible. But when King, a social scientist and statistician, saw the study, he didn’t think the number looked that low. Yeah, I know, 39 percent sounds really low—but it’s about what social scientists should expect, given that errors could occur either in the original studies or in the replications, says King.
His colleagues agreed, telling him, according to King, “This study is completely unfair—and even irresponsible.”
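To see how a replication rate well below 100 percent can arise from sampling error alone, here is a toy calculation. This is my own illustrative sketch, not King’s actual analysis, and the effect size and sample size are hypothetical: it assumes every effect in a literature is real but modest (a standardized effect of 0.3) and that each replication runs 50 subjects per group.

```python
# Toy illustration (not King's actual analysis): even when every effect in a
# literature is real, independent sampling error in each study can hold the
# expected replication rate far below 100%. Numbers are hypothetical.
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def replication_power(effect, n_per_group, z_crit=1.96):
    """Approximate chance that a two-sample z-test on a true standardized
    effect comes out statistically significant (two-sided alpha = .05)."""
    z = effect * math.sqrt(n_per_group / 2.0)  # expected test statistic
    return norm_cdf(z - z_crit)

# Hypothetical literature: every effect is real (d = 0.3), each replication
# uses 50 subjects per group -- yet only about a third would "replicate."
rate = replication_power(effect=0.3, n_per_group=50)
print(f"Expected replication rate: {rate:.0%}")
```

Under these assumptions the expected rate lands in the low 30s, percentage-wise—in the same neighborhood as the OSC’s much-lamented figure, with no fraud or sloppiness required.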
Upon investigating the study further, the researchers identified a second and more crucial problem. Basically, the OSC researchers did a terrible job replicating those 100 studies in the first place. As King put it: “You’d think that a test about replications would actually reproduce the original studies.” But no! Some of the methods used for the reproduced studies were utterly confounding—for instance, OSC researchers tried to reproduce an American study that dealt with Stanford University students’ attitudes toward affirmative action policies by using Dutch students at the University of Amsterdam. Others simply didn’t use enough subjects to be reliable.
The new analysis “completely repudiates” the idea that the OSC study provides evidence for a crisis in psychology, says King. Of course, that doesn’t mean we shouldn’t be concerned with reproducibility in science. “We should be obsessed with these questions,” says King. “They are incredibly important. But it isn’t true that all social psychologists are making stuff up.”
After all, King points out, the OSC researchers used admirable, transparent methods to come to their own—ultimately wrong—conclusions. Specifically, those authors made all their data easily accessible and clearly explained their methods—making it all the easier for King and his co-authors to tear it apart. The OSC researchers also read early drafts of the new commentary, helpfully adding notes and clarifications where needed. “Without that, we wouldn’t have been able to write our article,” says King. Now that’s collaboration!
“We look forward to the next article that tries to conclude that we’re wrong,” he adds.
Update, March 4, 2016, 8:00 a.m.: We reached University of Virginia psychologist Brian Nosek, an author of the original study and executive director of the Center for Open Science. Nosek and his co-authors have issued a rebuttal to the Gilbert et al. commentary, which appears alongside it in Science. Nosek was also a reviewer of the Gilbert commentary.
The two papers agree on one thing, says Nosek: that the original study found a roughly 40 percent reproducibility rate. But they differ on what to make of that rate. The original authors sought to introduce that number as a starting point, rather than to characterize it as high or low. “The whole goal of this is to stimulate some real learning about reproducibility, because so far it’s all been speculation,” says Nosek. “This is the first time we’ve had some real data.”
By contrast, the Gilbert paper took that data and then “jumped to a conclusion, based on selective exploratory evidence,” in Nosek’s words. The Gilbert paper attributed the reproducibility rate—low, in its authors’ characterization—in part to the replication studies being poor reproductions of the originals. “They’ve generated one hypothesis,” says Nosek. “It is an optimistic assessment.”
For example, the Gilbert commentary finds that replication studies were four times as likely to generate similar results in cases where the original researchers endorsed the replication designs. Its authors’ conclusion: Many of the unendorsed replications were faulty. That isn’t necessarily true, says Nosek. “The other reason is that researchers who don’t really believe that their effect is robust may be less likely to endorse designs because they don’t have as much faith in their conclusions,” he says.
In addition, the response by Nosek and his co-authors points out:
“There is no such thing as exact replication. All replications differ in innumerable ways from original studies. They are conducted in different facilities, in different weather, with different experimenters, with different computers and displays, in different languages, at different points in history, and so on.”
Nosek adds that he is “very pleased that both the comment and response have been published.”
Update, March 4, 2016: The original image in this post has been removed because it was unrelated to the post.