## Looking at how COVID-19 cases cluster can help us understand how the disease is transmitted

A number of articles in the past week or so have focused on the possibility of COVID-19 “superspreaders.” This is the idea that a very large share of COVID-19 infections are caused by a small number of people. For example, one article from the Telegraph suggests that perhaps 80 percent of infections are caused by 10 percent of people. Is this true? And if so, you may be wondering: How can I avoid that 10 percent of people?

To think about this, we need a little bit of epidemiology. If you’ve followed anything about COVID-19 carefully you’ve probably heard about “R0,” the basic reproduction number of the virus, which measures the average number of people infected by one infected person when everyone else is still susceptible. If R0 is larger than 1 (i.e., if, on average, an infected person spreads the virus to more than one other person), the number of infected people continues to grow.

There is also a second, less discussed epidemiological parameter: k, the “dispersion factor.” This k is a measure of the concentration of infection, of its clustering. It’s a way to get at the question: Do we see a lot of dense clusters of infection, or are the patterns more dispersed? This parameter doesn’t have a direct interpretation like R0, but what we can say is that a lower value of k implies a greater clustering of infections. Greater clustering implies that a smaller number of people are responsible for more infections.
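To make k a bit more concrete, here is a sketch under one common modeling assumption that this article does not itself spell out: that the number of people each infected person goes on to infect, written here as Z (a symbol introduced only for illustration), follows a negative binomial distribution with mean R0 and dispersion k. Under that assumption, the spread around the average is:

$$
\mathrm{Var}(Z) = R_0\left(1 + \frac{R_0}{k}\right)
$$

As k gets smaller, the variance grows for the same R0: most people infect few or no others, while a few infect many. As k gets very large, the variance approaches R0 and infections are spread more evenly across people.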

If you want to get into details, Science had a nice general write-up on this topic, and here is a (denser) preprint about estimating this k value in COVID-19. (There has been a lot of discussion of the value of preprints, which are early-release, non-peer-reviewed studies, in COVID-19 research. On the one hand, they are fast. On the other hand, lack of peer review may limit our confidence in results. It’s a balance.) The preliminary data suggest that COVID-19 could have a very low k; perhaps something like 0.1. If that’s the case, then a small number of infected individuals could be responsible for a large share of all infections.
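To see how a low k translates into "a few people cause most infections," here is a minimal simulation sketch. It assumes, as is common in this literature but not stated in this article, that secondary infections per case follow a negative binomial distribution with mean R0 and dispersion k. The function name `top_spreader_share` and the value R0 = 2.5 are illustrative assumptions, not estimates taken from the article; only k = 0.1 comes from the text above.

```python
import numpy as np


def top_spreader_share(r0, k, top_fraction=0.10, n_cases=200_000, seed=0):
    """Share of all secondary infections caused by the most infectious
    `top_fraction` of cases, under an ASSUMED negative binomial model
    with mean r0 and dispersion k (hypothetical helper for illustration)."""
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the negative binomial by (n, p); choosing n = k and
    # p = k / (k + r0) gives a distribution with mean r0 and dispersion k.
    p = k / (k + r0)
    secondary = rng.negative_binomial(k, p, size=n_cases)
    # Sort cases from most to fewest secondary infections.
    secondary = np.sort(secondary)[::-1]
    n_top = int(top_fraction * n_cases)
    return secondary[:n_top].sum() / secondary.sum()


if __name__ == "__main__":
    # R0 = 2.5 is an illustrative assumption; k = 0.1 is the rough value
    # mentioned in the text; k = 10 is a contrasting, low-clustering case.
    for k in (0.1, 10.0):
        share = top_spreader_share(r0=2.5, k=k)
        print(f"k = {k}: top 10% of cases cause {share:.0%} of infections")
```

With the same average number of infections per person, the low-k run puts a much larger share of infections on the top 10 percent of cases than the high-k run does, which is the clustering pattern described above.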

This might lead to the idea that certain individuals, for some reason, are better at transmitting the virus than others and become “superspreaders.” However, that doesn’t necessarily follow from the above. It may be more accurate to talk about superspreader events than individuals. That is, it seems likely that clustering of infections is due, at least in part, to the circumstances of exposure. It is clear in the case of COVID-19 that some environments are more conducive to spreading the virus than others.

Nursing homes are one environment where we’ve seen widespread infection, but that is likely to reflect the susceptibility of elderly populations rather than the actual conditions of these facilities. On the other hand, meatpacking plants are a viral hot spot despite a relatively low-risk population; this may be due to close work environments and low temperatures. There has also been some discussion of the possible role of singing (there was a choir outbreak in Washington), shouting, and particular kinds of exercise. There are some South Korean clusters linked to Zumba. And many European cases were linked to a particular ski resort, perhaps due to a bar there that combined singing and beer pong.

It is also true that an individual’s ability to spread the virus varies, as people shed different amounts of virus and at different times. The clusters associated with choirs, for example, may be an unfortunate combination of a high-spreading environment with an individual who was shedding a lot of virus, either in general or in that moment. We saw a lot of this with (the original) SARS: A few individuals were responsible for a huge amount of viral spread. There is an interesting example of an individual infected with SARS who had very high viral load in their urine and feces and ended up causing a large cluster of cases in their apartment building due to sewage-based aerosol backflow.