Amid the nightmare of the past three weeks, with black men being shot by cops, and then black men shooting cops, the New York Times ran a surprising front-page headline: “Analysis Finds No Racial Bias in Lethal Force.” Nestled between articles on killings in Baton Rouge, Louisiana; Dallas; and Staten Island, New York, the story described research by Harvard economist Roland Fryer, whose team scoured reams of data on shootings by police in Houston and concluded—among other things—that cops may be less likely to shoot at black suspects in any given situation. “It is the most surprising result of my career,” Fryer told the Times.
Both Fryer’s study and the Times’ coverage of it were widely criticized. The newspaper never should have hyped the “nonsensical Harvard study,” tweeted Washington Post reporter Wesley Lowery, who won a Pulitzer Prize earlier this year for his reporting on killings by police. He added: “Since when are economists authorities on police shootings?” The venerable debunking website Snopes.com weighed in to discredit Fryer’s paper, even putting scare quotes around its description of the “Harvard study.” (Those words lent the work “a gravitas that was not yet present,” the site explained.) Complaints from readers who were “confused, angry or both” also reached the desk of the New York Times’ public editor. “The authors could have worked harder to anticipate confusion,” Liz Spayd wrote.
Critics of Fryer’s conclusions raised a number of concerns about his methodology. The study’s headline finding was, in part, based on officers’ own accounts of their encounters with suspected criminals, but police reports might not always tell the truth. The study’s fine-grained data came from Houston, but Houston may be different from other cities in the U.S. The study’s findings were dramatic, but the error bars in some of its data tables were quite large. The study described a lack of bias in lethal violence by police, but it’s possible that systemic racial bias in policing—the fact that there’s a lower bar for stopping blacks—distorted the results.
These are all fair points to raise, and they bear directly on how the research should be interpreted. (Fryer has responded to some of the critiques of his work.) I won’t weigh in on this aspect of the debate except to second what Rosa Li wrote recently in Slate: While Fryer’s data set was quite impressive, it certainly has its drawbacks, and no single study of the issue will ever yield a comprehensive answer.
I do want to address another form of recrimination, one that had less to do with how to read Fryer’s work than with whether it should be read at all. Snopes declared his research nothing more than “an unvetted working paper” and “a work in progress” that had been falsely described by the Times as if it were a completed study. The Times’ public editor wrote that the newspaper’s “story never says that the research had yet to receive the kind of rigorous assessment that a peer review brings.” According to this argument, because Fryer’s study has some flaws, and those flaws may be so subtle that only experts can expose them, it wasn’t right to share the study data with the public. In other words, this was a case of bad science and bad science journalism.
I understand the impulse here, to find a rule of thumb for when and whether it’s OK to report on scientific research. No one wants to get caught out writing up a bullshit study. When that study bears on a social issue as raw and vital as bias in policing, we want our filters even finer, so we don’t pollute the discourse with results unless we know they’re legit. But the simple test that’s been proposed in Fryer’s case, one his paper would have failed—I mean, was this study even peer-reviewed?—makes no sense at all.
That may sound perverse, given that peer review—the (usually anonymous) evaluation of a research study by several people in the field—plays the role of gatekeeper for science publishing. It’s how journals make the choice to put any given paper into print, or to send it back for revisions, or to reject it altogether. So why not apply the same form of scientific quality control to journalism? If a paper isn’t good enough to get into the academic literature, then surely it has no place on A1 of the New York Times.
In practice, though, “peer review” refers to a bewildering array of methods and procedures. At least 1 million peer-reviewed articles are published every year, in at least 25,000 journals. At the narrow, top tier of this ecosystem, where prestigious journals filter out all but the best and most important papers, peer review screens for breakthrough work with airtight methodology and well-founded conclusions. At the bulbous bottom, peer review is less discerning. It’s also amenable to all sorts of chicanery—like rings of scientists who rubber-stamp each other’s work or researchers who invent reviews.
The use of peer review varies from one publication to another and between different fields of research. In physics, scientists tend to post unpublished manuscripts to a common online archive, where anyone can read them and respond. Some of these go on to formal peer review and publication, but others don’t. In economics—Roland Fryer’s field—“working papers” often gestate in an early stage for quite a while before making their way into academic journals. And in other areas of research, such as legal scholarship, tradition holds that no one bothers with peer review at all.
In light of this diversity of practice and propensity for fraud, little information can be gleaned from the simple claim that any given study has or hasn’t been peer-reviewed. Take the case of Fryer’s working paper, the “unvetted” research that the Times’ public editor claimed had yet to receive a “rigorous assessment.” Fryer first presented his data, and collected feedback from his peers, in a seminar last summer at the National Bureau of Economic Research. He presented the work again to colleagues at Harvard, Brown, University College London, and the London School of Economics. Aware of the paper’s incendiary nature and importance, he sent the paper out to 50 colleagues asking for feedback. He also hired a second research team to recode all the data that he used for his analysis, just to make sure the results would replicate.
Fryer’s paper was not “peer-reviewed.” It was reviewed, though, very thoroughly, and by a large number of his peers.
Some critics of the Times’ journalism compared the lavish coverage of Fryer’s unpublished study to that of a paper on police shootings by a researcher named Cody Ross. The latter paper found that, at the population level, the chance of an unarmed black person being killed by police is much higher than the chance an unarmed white person will be killed by a cop. Fryer’s and Ross’ findings are not really at odds, since they’re asking somewhat different questions. Fryer, like Ross, found that blacks are more likely to be shot than whites. Fryer, though, looked at details of individual encounters, to control for how threatening each of those encounters was. When he controlled for those details, the bias went away, or even reversed.
Ross’ paper, unlike Fryer’s, went through peer review. But the journal where it appeared, PLOS One, happens to take an unorthodox approach to sorting through submissions, whereby its reviewers don’t assess papers for their contributions to the field, but rather only on the question of whether their methodologies seem up to snuff. PLOS One has also gotten caught up in several peer review–related scandals in the past few years. (In one, a paper was rejected on account of its author being female.) I don’t mean to say there’s anything wrong with Ross’ paper, just that its trip through peer review does not signal, on its own, that the work is worthy of attention.
Even under the best circumstances, peer review suffers from a fundamental limitation: Any given study is likely to be assessed by researchers who are sympathetic to its theories and assumptions. As statistician and social-science gadfly Andrew Gelman put it, “the problem with peer review is with the peers.” Fryer may have sent his draft around to 50 colleagues, but all of those colleagues could have been prone to making the same assumptions he did. That’s why the back-and-forth of the past two weeks has been so useful: It’s bringing welcome scrutiny to an important and meticulously constructed data set.
If peer review can’t provide a useful signal of what’s good and bad in science, then what other filters might a journalist deploy to take its place? One easy, often-used heuristic is to use a measure of credentials. Is the author of this paper on the faculty of a prestigious university? Is he or she respected in the field?
According to this test, the embattled working paper gets a perfect score. Fryer is the youngest black professor ever to receive tenure at Harvard University; he’s the recipient of both a MacArthur “genius grant” and a John Bates Clark Medal, the latter awarded to the best American economist younger than 40. If these institutional achievements count for anything, then surely his research merits airing in the Times. Ross’ background, on the other hand, offers no such grounds for confidence. He’s a graduate student, not a tenured professor, and he’s at the University of California–Davis, not Harvard.
But I’ve been reporting on science long enough to know that reputation is itself a very noisy signal of research quality. Scientists at Harvard, scientists with MacArthur grants, scientists with Nobel Prizes, scientists with best-selling books—they’re all susceptible to putting out suspect work. Meanwhile, scientists who haven’t yet received their Ph.D.s, or who work at institutes you’ve never heard of, do all manner of impressive and important research.
What else, then? Reporters might steer clear of papers based on their methodologies, rather than their authors or where and how the work was published. One could choose to ignore any study that draws on observational data, for example, and cover only randomized, controlled experiments. But then we’d be left with nothing to say on any topic—including racial bias in policing—where randomized, controlled experiments would be impossible to conduct.
Reporters could ignore single studies, however appealing those might be for newsy ledes (“According to a new study released on Tuesday …”) and focus instead on analyses of entire bodies of research. But then we might still run into the “garbage in, garbage out” problem: A whole bunch of lousy studies isn’t much better, in aggregate, than a single lousy one.
Since no single rule of thumb can separate the wheat and chaff in science, journalists must do their best to use them all in tandem, adding them together, weighing all of the above to varying degrees, and then deciding by gut instinct whether some composite threshold of respectability has been crossed. This fuzzy calculation has a way of drifting in the wind, however, blown off course by reader interest, news pegs, and the backdraft of a social-justice conflagration.
It would be nice if we could keep science coverage grounded, with every suspect piece of research barred from the media. But given that our filters will be imprecise, it’s probably best to err on the side of reporting more instead of less. If someone tries to tell you otherwise, and shouts down a study on the basis of a single, superficial trait—this paper isn’t peer-reviewed, that guy didn’t go to Harvard—it may be time to find another peer, and talk to someone else.