This is an entry in Viral Studies, a Slate series in which we break down recent viral articles and—most importantly—their caveats.
The recent mass gathering in South Dakota for the annual Sturgis Motorcycle Rally seemed like the perfect recipe for what epidemiologists call a “superspreading” event. Beginning Aug. 7, an estimated 460,000 attendees from all over the country descended on the small town of Sturgis for a 10-day rally packed with indoor and outdoor events such as concerts and drag racing.
Now a new working paper by economist Dhaval Dave and colleagues is making headlines with its estimate that the Sturgis rally led to a shocking 266,796 new cases in the U.S. over a four-week period, which would account for a staggering 19 percent of newly confirmed cases in the U.S. in that time. The authors put the economic cost of these cases at $12.2 billion, based on previous estimates of the statistical cost of treating a COVID-19 patient.
Not surprisingly, the internet lit up with “we told you so!” headlines and social media shaming and blaming. The huge figures immediately hit the “confirmation bias” button in many people’s brains. But hold up. There are lots of reasons to be skeptical of these findings, and the 266,796 figure itself should set off serious alarm bells about its believability.
Modeling infection transmission dynamics is hard, as the less-than-stellar performance of many predictive COVID-19 models thus far has shown. (Remember back in April, when the IHME model from the University of Washington predicted zero U.S. deaths in July?) Pandemic spread is difficult both to predict and to explain after the fact, much like the direction and intensity of the current wildfires in the West. While some underlying factors do predict spread, there is a high degree of randomness, and small disturbances (like gusts of wind) can cause huge variation across time and space. Many outcomes that social scientists typically study, like income, are more stable and less susceptible to these “butterfly effects” that threaten the validity of certain research designs.
The Sturgis study essentially tries to re-create a randomized experiment by comparing COVID-19 trends in counties that rallygoers traveled from with trends in counties that apparently don’t have as many motorcycle enthusiasts. The authors estimate where visitors to Sturgis came from using the “home” locations of nonresident cellphone pings during the rally. They then use a “difference-in-differences” approach, asking whether case trends changed more in a county that sent many people to Sturgis than in a county that sent none. They looked at how cumulative case numbers changed between June 6 and Sept. 2.
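The core difference-in-differences arithmetic can be sketched in a few lines. This is a minimal illustration, not the paper’s specification (its regression is more elaborate), and the county case counts below are hypothetical numbers chosen only to show the calculation.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Extra change in the treated county beyond the change in the comparison county."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change

# Hypothetical cumulative cases per 100,000 residents (NOT figures from the paper).
effect = diff_in_diff(treated_before=400, treated_after=900,
                      control_before=300, control_after=650)
print(effect)  # 150
```

The comparison county’s change (350) stands in for what would have happened to the high-inflow county without the rally, so the whole method rests on that stand-in being a good one.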
While this approach may sound sensible, it relies on strong assumptions that rarely hold in the real world. For one thing, counties full of bike rally fans differ in many ways from counties with none, and therein lies the challenge of creating a good “counterfactual” for the implied experiment: How do you compare trends in counties that differ on many geographic, social, and economic dimensions? The “parallel trends” assumption requires that, absent the rally, every county would have stayed on a similar trajectory, so that the only relevant difference was the number of attendees each sent to Sturgis. When this assumption is violated, the resulting estimates are not just off by a little; they can be completely wrong. This type of modeling is risky, and the burden of proof for the assumptions is very high.
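A toy example shows how a violated parallel trends assumption produces an “effect” out of nothing. The sketch assumes two hypothetical counties with identical starting counts and no rally effect at all; the only difference is that the high-inflow county was already growing faster. The growth rates and 30-day window are illustrative assumptions.

```python
import math

def cumulative_cases(start, daily_growth, days):
    """Cases after `days` of steady exponential growth at `daily_growth` per day."""
    return start * math.exp(daily_growth * days)

DAYS = 30  # hypothetical post-rally window
# Hypothetical counties with the same starting count and NO rally effect;
# the "high-inflow" county simply happens to be on a steeper trajectory already.
high_before, low_before = 1_000, 1_000
high_after = cumulative_cases(high_before, 0.03, DAYS)
low_after = cumulative_cases(low_before, 0.01, DAYS)

# Naive difference-in-differences credits the whole gap in growth to the rally.
spurious_effect = (high_after - high_before) - (low_after - low_before)
print(f"spurious 'rally effect': {spurious_effect:,.0f} cases")  # about 1,110
```

Nothing happened at the rally in this example, yet the comparison attributes roughly 1,110 extra cases to it, more than the low-inflow county’s entire increase.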
More critically, the paper assumes that the “noise” in COVID-19 cases from different counties averages out over time, so comparing the trends is valid. We all probably know by now that epidemic curves are not so predictable and depend heavily on the luck of floating wildfire embers, so to speak. This approach may work for changes in the uptake of state benefits or other outcomes traditionally analyzed with difference-in-differences designs, but not for outcomes that are serially correlated, like wildfires or epidemics, where a random shock today carries forward into tomorrow’s count. Thus, even if the parallel trends assumption held, the idea that differences in COVID-19 cases across counties are fully attributable to the rally is a strong ask.
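A small placebo simulation illustrates why serial correlation matters. It compares pairs of counties that, by construction, have no rally effect, with daily (log) case noise that is either independent from day to day or a random walk in which each shock persists. This is a deliberately simplified assumption, not a model of COVID-19 or of the paper’s data; the 30-day windows and noise scale are arbitrary.

```python
import random
import statistics

def county_series(rng, days, sigma, persistent):
    """Daily log-case noise for one county with no rally effect at all.

    persistent=False: independent shocks each day (noise that averages out).
    persistent=True: a random walk, so each shock carries into every later day.
    """
    level, series = 0.0, []
    for _ in range(days):
        shock = rng.gauss(0.0, sigma)
        level = level + shock if persistent else shock
        series.append(level)
    return series

def placebo_did(rng, persistent, days_pre=30, days_post=30, sigma=1.0):
    """Difference-in-differences between two untreated counties (true effect = 0)."""
    def change(series):
        return statistics.fmean(series[days_pre:]) - statistics.fmean(series[:days_pre])
    total = days_pre + days_post
    treated = county_series(rng, total, sigma, persistent)
    control = county_series(rng, total, sigma, persistent)
    return change(treated) - change(control)

rng = random.Random(2020)
iid_sd = statistics.stdev([placebo_did(rng, persistent=False) for _ in range(2000)])
rw_sd = statistics.stdev([placebo_did(rng, persistent=True) for _ in range(2000)])
print(f"spread of placebo estimates: independent noise {iid_sd:.2f}, "
      f"persistent noise {rw_sd:.2f}")
```

Under these assumptions, the placebo estimates scatter many times more widely when the noise persists, so a sizable “effect” can appear between two counties where nothing happened.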
Having estimated such a large number of additional infections due to the Sturgis rally from aggregate data, the authors should have asked whether such high levels of transmission were epidemiologically feasible over the short time frame. But as computational social scientist Rex Douglass details in this Twitter thread, the paper doesn’t provide a model of infectious disease transmission, a pretty major oversight. Basically, the authors don’t outline what transmission on this scale would have to look like to reach 266,796 infections: for example, X percent of attendees arriving infected across the 10 days, Y percent of them transmitting the virus to Z new people each, and so on. Given the staggered arrivals (traffic flow data show that about 50,000 people showed up per day) and the incubation period (roughly five days), it seems likely that only one or two new “generations” of infections could have occurred during the rally itself. Even under the bleak assumption that 1 percent of attendees arrived already infectious (spread over the 10 days) yet well enough to ride motorcycles to South Dakota, and that all of them were “superspreaders” who each passed their infection to another 10 people, back-of-the-envelope math makes it hard to get anywhere near this number of infections at the rally.
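The back-of-the-envelope math can be written out using only the figures in the paragraph above: 460,000 attendees, 1 percent arriving infectious, and 10 secondary infections each. The sketch counts only that first generation at the rally and ignores later generations and home-county spread, so it covers one step of the chain, not a full transmission model.

```python
# Inputs taken from the article's "bleak" scenario.
ATTENDEES = 460_000
PERCENT_ARRIVING_INFECTIOUS = 1   # 1 percent, spread over the rally
SECONDARY_CASES_EACH = 10         # every infectious arrival is a superspreader
RALLY_DAYS = 10
PAPER_ESTIMATE = 266_796

infectious_arrivals = ATTENDEES * PERCENT_ARRIVING_INFECTIOUS // 100  # 4,600
arrivals_per_day = infectious_arrivals // RALLY_DAYS                  # 460
first_generation = infectious_arrivals * SECONDARY_CASES_EACH         # 46,000

# How far one generation falls short, and what each arrival would have to account for.
share_of_estimate = first_generation / PAPER_ESTIMATE
needed_per_arrival = PAPER_ESTIMATE / infectious_arrivals

print(f"{first_generation:,} first-generation infections "
      f"({share_of_estimate:.1%} of {PAPER_ESTIMATE:,})")
print(f"each infectious arrival would need ~{needed_per_arrival:.0f} downstream cases")
```

Even with every infectious arrival acting as a superspreader, the first generation reaches only about 17 percent of the paper’s figure; matching 266,796 would require each of the 4,600 infectious arrivals to account for roughly 58 cases within about four weeks.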
Of course, the study measures new cases in home counties, so perhaps that is where transmission really exploded. Recall that this was a motorcycle rally, so many attendees almost certainly didn’t fly home as soon as possible. Large numbers of people came from California, Nevada, and Florida, so we can assume the return trip took at least a few days even for those heading straight home. The lure of the open road in August after months of worldwide lockdown may even have induced many riders to take a meandering path home. In short, it is a stretch to believe that so many infected riders could have gotten home in time to infect others, have those people incubate the virus and get tested, and have these infections show up in county statistics by Sept. 2, just over two weeks after the rally ended. In principle, the authors could have used the cellphone ping data to account for variation in return times and routes, but they don’t mention doing so in the paper.
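The timeline can be checked with simple date arithmetic. The rally dates, the Sept. 2 cutoff, and the roughly five-day incubation period come from the article; the three-day ride home and the three-day lag from symptoms to county statistics are hypothetical values chosen for illustration, and the sketch assumes a rider who stayed until the last day and rode straight home.

```python
from datetime import date, timedelta

RALLY_START = date(2020, 8, 7)   # from the article
RALLY_DAYS = 10                  # from the article
DATA_CUTOFF = date(2020, 9, 2)   # end of the paper's window
INCUBATION_DAYS = 5              # the article's rough figure
RIDE_HOME_DAYS = 3               # hypothetical: "at least a few days"
TEST_AND_REPORT_DAYS = 3         # hypothetical lag from symptoms to county statistics

rally_end = RALLY_START + timedelta(days=RALLY_DAYS - 1)
home_by = rally_end + timedelta(days=RIDE_HOME_DAYS)
# A home-county contact infected after this date would likely miss the cutoff.
last_countable_infection = DATA_CUTOFF - timedelta(days=INCUBATION_DAYS + TEST_AND_REPORT_DAYS)
window_days = (last_countable_infection - home_by).days

print(f"rally ends {rally_end}, rider home {home_by}, "
      f"last countable home infection {last_countable_infection}: {window_days}-day window")
```

Riders who left early would have more time and those who meandered would have less; under these assumptions, the window for home-county spread that could reach the data is measured in days, not weeks.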
Since attendees hardly had time to attend the rally, get infected, and then ride home and infect others, the fact that rates in large sending counties are higher than those in non-sending counties strongly suggests that these differences in trends were in the works anyway due to local transmission dynamics, and were not a direct result of the rally. As Ashish Jha, a physician and the dean of Brown University’s School of Public Health, pointed out on Twitter, the raw data show no spikes in the counties the authors say rally attendees came from, deepening the mystery of where the 266,796 cases could have occurred.
If thinking through the required transmission dynamics doesn’t set off your alarm bells, consider this: The paper’s results show that the significant increase in transmission was only evident after Aug. 26. That makes sense, since it is consistent with a lag between infections that began at the rally and their appearance in case counts. Nonetheless, the authors state that their estimate of the total number of cases, 266,796, represents “19 percent of the 1.4 million cases of COVID-19 in the United States between August 2nd 2020 and September 2nd.” In reality, these extra cases must have occurred in the second half of that month, meaning the estimate would account for a staggering 45 percent of U.S. cases over those two weeks. This simply doesn’t seem plausible.
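The jump from 19 percent to 45 percent is a denominator effect: the same excess cases, counted against a shorter window with fewer total cases, make up a larger share. The sketch below reproduces the 19 percent figure and backs out the two-week U.S. case total that a 45 percent share implies; that total is derived from the article’s percentage, not taken from a published count.

```python
PAPER_ESTIMATE = 266_796
CASES_AUG2_TO_SEP2 = 1_400_000   # U.S. total quoted in the paper
ARTICLE_TWO_WEEK_SHARE = 0.45    # the article's figure for the final two weeks

month_share = PAPER_ESTIMATE / CASES_AUG2_TO_SEP2
# Denominator implied by a 45 percent share (backed out, not sourced).
implied_two_week_cases = round(PAPER_ESTIMATE / ARTICLE_TWO_WEEK_SHARE)

print(f"share of the full month: {month_share:.1%}")
print(f"implied U.S. cases in the last two weeks: {implied_two_week_cases:,}")
```

The implied figure of roughly 593,000 cases can be compared against published daily national counts for late August 2020 as a sanity check.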
The 266,796 number also overstates the precision of the paper’s estimates, even if the model is taken at face value. The confidence intervals for the “high inflow” counties seem to include zero, meaning the authors can’t say with statistical confidence that the rally caused any difference in infections across counties. No standard errors (measures of the variability around an estimate) are provided for the main regression results, and many of the key results are not statistically significant at conventional levels. So even if one accepts the design and assumptions, the results are very “noisy” and come with caveats that don’t justify broadcasting the highly specific 266,796 figure with confidence, though I imagine that “somewhere between zero and 450,000 infections” would not have been as headline-grabbing.
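What “a confidence interval that includes zero” means can be shown with a normal-approximation calculation. The coefficient and standard error below are hypothetical values, not numbers from the paper; the sketch computes a 95 percent interval and the matching two-sided p-value.

```python
import math

def normal_ci_and_p(estimate, std_error, z=1.96):
    """95% confidence interval and two-sided p-value under a normal approximation."""
    low, high = estimate - z * std_error, estimate + z * std_error
    # Two-sided p-value: 2 * (1 - Phi(|estimate / std_error|)) = erfc(|z_stat| / sqrt(2)).
    p_value = math.erfc(abs(estimate / std_error) / math.sqrt(2))
    return low, high, p_value

# Hypothetical coefficient and standard error, NOT values from the paper.
low, high, p = normal_ci_and_p(estimate=2.0, std_error=1.2)
print(f"95% CI: ({low:.2f}, {high:.2f}), p = {p:.3f}")
```

Here the interval straddles zero and the p-value is above 0.05, so a study reporting this coefficient could not rule out no effect at the usual 5 percent level, even though its point estimate is positive.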
None of this means the rally was harmless. Common sense tells us that such a large event with close contact was risky and did increase transmission. The paper’s estimate for Meade County, South Dakota, the site of the rally, is a more plausible increase of between 177 and 195 cases, consistent with raw data.* Given the huge inflow to this specific location, along with increased testing for the event, a bump was not surprising. Contact tracing reports have identified cases and deaths linked to the event, but they number in the hundreds.
More broadly, while it’s important for us to understand the factors driving COVID-19 transmission, the methodological challenges of identifying these effects from aggregate data are difficult to overcome. Improved contact tracing and surveys at the individual level are the best way to gain insights into transmission dynamics. (At Dear Pandemic, a COVID-19 science communication effort I run with colleagues, we unfortunately spend much of our time explaining and correcting such misleading statistics.) The authors of this study have used the same study design to estimate the effects of other mass gatherings, including the BLM protests and Trump’s June rally in Tulsa, Oklahoma. Each paper has given some part of the political spectrum something it might want to hear but has done very little to illuminate the actual risks of COVID-19 transmission at these events. Exaggerated headlines and cherry-picking of results for “I told you so” media moments can dangerously undermine the long-term integrity of the science, something we can ill afford right now.
Correction, Sept. 11, 2020: This article originally and erroneously questioned the paper’s estimates for Meade County based on a data error. The Meade County estimates are in fact in a range consistent with raw county data.