The Research Is Only As Good As the Data

A new study found no evidence of racial bias in police shootings. But last year, a study came to the opposite conclusion. Why?

A protester is arrested after clashes with police during a rally outside Republican presidential candidate Donald Trump’s event in San Diego, California, on May 27.

Mark Ralston/Getty Images

The recent high-profile police shooting deaths of Alton Sterling, Philando Castile, and too many other black men and women have raised contentious questions about the extent to which law enforcement officers are affected by racial biases. A newly published working paper by Harvard University professor Roland G. Fryer Jr. received a lot of attention and pushback this week for finding no evidence of racial bias in police shootings. This is surprising, Fryer readily admitted, and anyone exposed to the recent graphic videos of police behaving violently toward black men and women might agree. Another reason it is surprising? A previously published paper by University of California, Davis, anthropologist Cody Ross found that black Americans are significantly more likely than white Americans to be shot by police.

Why did these two papers reach such different conclusions? Because they drew on different data sources and consequently relied on different statistical methods. Right now, there is no comprehensive official federal database documenting shootings by U.S. law enforcement officers.* Instead, researchers must either read through thousands of 50-plus-page police reports from a few cooperative cities, as Fryer’s team quite impressively did, or use alternative databases compiled by nongovernment groups, as Ross did. With incomplete and imperfect datasets, researchers are limited in the analyses they can perform.

Fryer’s study compared all 500 police shootings that occurred in Houston from 2000 to 2015 with a random sample of Houston cases from the same period in which lethal force could have been justified but was not used.* With these data, he was able to use logistic regression, a statistical method for modeling a yes-or-no outcome, to estimate how much a suspect’s race affected whether he or she would be shot by police during a heated encounter. Fryer found that black Houstonites were actually almost 25 percent less likely than white Houstonites to be shot by police in such encounters. Fryer was explicit that his data were specific to Houston and that more data are needed to understand whether police shootings are racially biased in other parts of the country.
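Logistic regression works on the log-odds scale. A figure like "almost 25 percent less likely" is therefore usually read off an odds ratio, which is obtained by exponentiating the coefficient on the race variable. The Python sketch below shows only that conversion. It assumes a made-up coefficient of -0.28, chosen for illustration and not taken from Fryer’s paper. The helper names are hypothetical.

```python
import math


def odds_ratio_from_coefficient(beta: float) -> float:
    """Convert a logistic-regression coefficient (log-odds scale) to an odds ratio."""
    return math.exp(beta)


def percent_change_in_odds(beta: float) -> float:
    """Percent change in the odds of the outcome implied by the coefficient."""
    return (odds_ratio_from_coefficient(beta) - 1.0) * 100.0


# Hypothetical coefficient on a "suspect is black" indicator, for illustration
# only; this is NOT a value reported by Fryer.
beta_black = -0.28

print(f"odds ratio: {odds_ratio_from_coefficient(beta_black):.3f}")
print(f"change in odds: {percent_change_in_odds(beta_black):.1f}%")
```

With this invented coefficient, the odds ratio comes out to about 0.756, or roughly 24 percent lower odds. Lower odds translate into a similar drop in probability only when the outcome (here, a shooting) is rare among the encounters studied.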

Ross, a Ph.D. student, probably did not have an army of research assistants at his disposal, which may explain why he used Deadspin’s crowd-sourced U.S. Police-Shooting Database. This database has the lofty goal of tracking every police shooting in the U.S. by calling on everyday people to Google and log police shootings for every calendar day from 2011 to 2014. The idea is that this might avoid the inadvertent bias that can creep into how police reports are written. But it also introduces new biases of its own, such as which shootings the media choose to report and how Google’s search algorithm ranks results. Importantly, at the time of Ross’ analyses, only about half of the days had been searched, and of the nearly 2,000 records, just over 700 contained enough location and race data to be usable. Thus, Fryer’s Houston police-shooting dataset had painstakingly complete coverage, though only for a single city. Ross’ dataset was less thorough and less reliable, but it drew on reports from across the U.S.

Because the Deadspin dataset recorded only media-reported shootings, and not all police encounters in which shootings could have happened, Ross used Bayesian statistics to ask a different question from Fryer’s: “For people shot by the police, what is the relative likelihood that they are black versus white and armed versus unarmed?” According to Ross’ analyses, people are three times more likely to be black, unarmed, and shot by police than they are to be white, unarmed, and shot by police. (This type of analysis yields risk ratios, which compare probabilities with one another rather than estimating their absolute values.) For those who questioned how Philando Castile’s gun-carrying affected his risk of being shot, Ross finds that black Americans shot by police are 2.8 times more likely to be armed than unarmed. (White Americans shot by police are 3.3 times more likely to be armed than unarmed.)
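A risk ratio divides one group’s rate by another’s. The Python sketch below shows that arithmetic under a simple assumption: each group’s count of unarmed people shot is divided by that group’s population. The counts and group labels are invented purely for illustration and are not Deadspin or census figures. Ross’ actual estimates come from a multilevel Bayesian model fit at the county level, which this sketch does not attempt.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical counts for one group (illustrative numbers only)."""

    name: str
    shot_unarmed: int  # people in the group who were unarmed and shot by police
    population: int    # size of the group in the same area

    def rate(self) -> float:
        """Share of the group's population that was unarmed and shot."""
        return self.shot_unarmed / self.population


def risk_ratio(a: Group, b: Group) -> float:
    """How many times larger group a's rate is than group b's rate."""
    return a.rate() / b.rate()


# Invented counts: group A has fewer shootings in absolute terms but a much
# smaller population, so its per-capita rate is higher.
group_a = Group("group A", shot_unarmed=30, population=1_000_000)
group_b = Group("group B", shot_unarmed=50, population=5_000_000)

print(f"risk ratio, A vs. B: {risk_ratio(group_a, group_b):.1f}")
```

Here group A accounts for fewer shootings than group B (30 versus 50), yet its per-capita risk is three times higher. A ratio of this kind says nothing about how many encounters each group had with police, which is the comparison Fryer’s design added.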

It makes sense that Fryer’s study received more attention: its data were more comprehensive, and his team uniquely investigated a comparison group of nonshooting police encounters. But even though Ross’ study used a less reliable dataset, it may still offer useful findings. For example, he was able to identify counties in which racial biases may be especially strong. (He specifically calls out Miami-Dade in Florida, Los Angeles, and New Orleans.) Applying Fryer’s method of thoroughly combing police reports to the counties Ross’ paper flagged as especially biased would be a smart way to proceed.

The most important takeaway here is that no single study is a definitive reflection of the truth. Each is an assessment of the data available to its researcher. The researchers know this. Fryer notes that his paper just “takes first steps into the treacherous terrain of understanding the extent of racial differences in police use of force,” and Ross writes that the Deadspin database is incomplete and needs thorough verification. Both authors agree on the need for more readily available and complete data on U.S. police shootings so that more research can be conducted.

Because record-keeping on police violence is scattered, researchers can draw only limited conclusions, and those conclusions come with many caveats. A comprehensive central database that tracks all police shootings would allow researchers to draw more accurate conclusions. Until then, we have to remember that a conclusion is only as strong as the data it draws from, and our data on this issue are weak.

*Correction, July 18, 2016: An earlier version of this story misstated how many police shootings occurred in Houston from 2000 to 2015. It was 500. This story has also been updated to clarify that while there is a federal database documenting shootings by U.S. law enforcement, it is not thought to be comprehensive.