Most weeks, CNN Health or the New York Times Science section (or Slate!) reports on another study about health. In the past couple of weeks there was one about how four cups of coffee a day kills you and one on which brands of adult beverages are most likely to result in a trip to the emergency room (answer: malt liquor).
A question that comes up again and again in reading these, and came up all the time when I was writing my book on pregnancy, is how to know which studies warrant our attention.
The gold standard in medicine, and in other fields, is the randomized controlled trial. In a study like this, participants are divided into two (or more) groups randomly, and each is told to do something different. In a drug study, one group takes a drug and the other does not. Because the groups are randomly selected, on average they are similar before the study. So if the researchers see differences after the study, they can be confident the differences are due to the treatment.
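To see why random assignment makes groups look alike, here is a minimal simulation sketch. Everything in it is hypothetical: the function name `smoker_share_by_group`, the 20 percent smoking rate, and the sample size are assumptions chosen for illustration, not figures from any study.

```python
import random

def smoker_share_by_group(n=10_000, smoker_rate=0.2, seed=0):
    """Randomly split n hypothetical participants and compare smoker shares."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # The trait (smoking) exists before the study starts.
        is_smoker = rng.random() < smoker_rate
        # A coin flip, not the participant, decides the group.
        group = treated if rng.random() < 0.5 else control
        group.append(is_smoker)
    return sum(treated) / len(treated), sum(control) / len(control)

treated_share, control_share = smoker_share_by_group()
print(f"smokers in treatment group: {treated_share:.3f}")
print(f"smokers in control group:   {control_share:.3f}")
```

Because the coin flip ignores who smokes, the two shares should land close to each other, and the same logic applies to traits the researchers never measured.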
Even in a randomized study, there are limitations. No research study is run on the entire population of the world. What we learn from these studies is the impact of the treatment on average, not on every individual.
So if you are approaching a health decision based on data from a randomized study, that’s great. But the majority of the time, especially in public health, where I looked for data on pregnancy, the studies aren’t randomized. In 2012, the American Journal of Public Health published 128 papers; only 14 of them were randomized.
What we get instead are observational studies, which also compare individuals who engage in different behaviors. The difference, though, is that these individuals choose to engage in these behaviors on their own. If differences among people influence their choice of behavior, it may be those differences, not the behavior itself, that change their outcomes.
Let’s consider a particular example, drawn from pregnancy (since this is, after all, a pregnancy blog). When I came to look at the issue of caffeine, I found that, at least in evaluating the relationship with miscarriage, there were no randomized studies. So I turned to observational studies.
One that I summarize in my book is from the American Journal of Obstetrics and Gynecology, published in 2008. The paper compares women who drink no caffeine to those who drink two cups a day or more and analyzes the risk of miscarriage (the researchers also included an intermediate group, but for now we’ll think about the simple comparison). Table 1 of this paper summarizes the characteristics of women in these different caffeine groups. Here’s a subset:
Clearly, there are big differences across these groups other than how much caffeine they drink. The coffee drinkers are older, more likely to be white, poorer on average, and more likely to smoke. Smoking and age, in particular, are linked to miscarriage. So is it the coffee? Or is it the smoking?
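The coffee-or-smoking question can be sketched with a toy simulation. All of the numbers below are invented for illustration and none come from the paper: the simulated smokers are assumed to be more likely to drink coffee and more likely to miscarry, while coffee itself is assumed to have no effect at all.

```python
import random

def simulate_women(n=200_000, seed=1):
    """Generate hypothetical (smoker, coffee, miscarriage) records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = rng.random() < 0.20
        # Assumption: smokers are more likely to drink coffee.
        coffee = rng.random() < (0.70 if smoker else 0.30)
        # Assumption: risk depends on smoking only, never on coffee.
        miscarriage = rng.random() < (0.25 if smoker else 0.10)
        rows.append((smoker, coffee, miscarriage))
    return rows

def risk_gap(rows):
    """Miscarriage rate among coffee drinkers minus the rate among non-drinkers."""
    drinkers = [m for _, c, m in rows if c]
    abstainers = [m for _, c, m in rows if not c]
    return sum(drinkers) / len(drinkers) - sum(abstainers) / len(abstainers)

rows = simulate_women()
raw_gap = risk_gap(rows)
smoker_gap = risk_gap([r for r in rows if r[0]])
nonsmoker_gap = risk_gap([r for r in rows if not r[0]])

print(f"all women:        {raw_gap:+.3f}")
print(f"smokers only:     {smoker_gap:+.3f}")
print(f"non-smokers only: {nonsmoker_gap:+.3f}")
```

In this made-up world the coffee drinkers look worse off in the raw comparison, but the gap should mostly vanish once smokers are compared with smokers and non-smokers with non-smokers. A real study cannot know in advance that it has found every such difference.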
The crucial element of evaluating these studies is to figure out how big these differences really are and how much they matter. It’s common in studies like this to show the impacts of the treatment (in this case, caffeine) after adjusting for these controls. This adjustment is important but also incomplete. In the above study, for instance, the authors controlled for whether the household income was more or less than $50,000, but that’s only a crude measure. It is hard to know whether better controls, such as more detail about income, would make more of a difference.
What does this mean, in general, when we’re faced with these studies? How can we evaluate them? A good place to start is to always be a little skeptical. But a bit more concretely, here are two tips:
1. Look at how different the groups are on factors such as age, education, and so on. All else equal, the more similar the groups look, the better.
2. Look at how complete the data is. A lot of media reports say a study is “adjusted for socio-demographics.” How effective this is depends on how comprehensive the variables on demographics are. A study using data that details exactly how much education someone has and exactly what their income is will be able to adjust more completely for this issue than one where all the researchers observe is whether someone completed high school or not.
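The second tip can be illustrated with another sketch under invented assumptions. Here income is a hypothetical index between 0 and 1, lower-income women are assumed to be more likely both to drink coffee and to have a bad outcome, and coffee is assumed to do nothing. The helper `adjusted_gap` is a simple stratified comparison that I am using as a stand-in for statistical adjustment; it is not the method of any particular paper.

```python
import random

def simulate(n=400_000, seed=2):
    """Generate hypothetical (income, coffee, bad_outcome) records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.random()  # hypothetical income index in [0, 1)
        # Assumption: coffee drinking falls as income rises.
        coffee = rng.random() < 0.8 - 0.6 * income
        # Assumption: risk falls as income rises; coffee plays no role.
        bad_outcome = rng.random() < 0.30 - 0.25 * income
        rows.append((income, coffee, bad_outcome))
    return rows

def adjusted_gap(rows, n_bands):
    """Average the coffee vs. no-coffee outcome gap within equal-width
    income bands, weighting each band by its share of the sample."""
    bands = [([], []) for _ in range(n_bands)]
    for income, coffee, bad in rows:
        band = min(int(income * n_bands), n_bands - 1)
        bands[band][0 if coffee else 1].append(bad)
    total = 0.0
    for drinkers, abstainers in bands:
        if drinkers and abstainers:  # skip bands missing either group
            gap = sum(drinkers) / len(drinkers) - sum(abstainers) / len(abstainers)
            total += gap * (len(drinkers) + len(abstainers)) / len(rows)
    return total

rows = simulate()
unadjusted = adjusted_gap(rows, 1)   # one band: no adjustment at all
crude = adjusted_gap(rows, 2)        # above or below the midpoint
detailed = adjusted_gap(rows, 10)    # ten income bands

print(f"unadjusted gap:      {unadjusted:+.4f}")
print(f"two income bands:    {crude:+.4f}")
print(f"ten income bands:    {detailed:+.4f}")
```

The two-band version plays the role of an above-or-below cutoff: it should shrink the spurious coffee gap without removing it, while the finer bands should bring it much closer to zero. How much of a real study's result is this kind of leftover difference is exactly what is hard to know from the outside.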
If you use these criteria, you’ll quickly realize that some studies are much, much better than others. And while no single study is going to be enough to close the book on any issue, thinking critically about these possible problems may sometimes lead you to decide a study doesn’t have much information at all. Not to mention allow you to drink coffee.