Medical Examiner

Sunny Side Up

Screw-ups over unpublished data or no, antidepressants still work.


The New England Journal of Medicine has just published what at first glance looks like an extremely disturbing study concerning drugs and the information we get about them. Erick Turner, of the Oregon Health & Science University, and others examined data on antidepressants submitted to the Food and Drug Administration over a 15-year period. They found that only studies yielding positive results were likely to appear in the medical literature. Thirty-seven of 38 favorable trials had been published. By contrast, of 36 studies with negative or ambiguous findings, 22 were never published, and 11 were published with a deceptively favorable spin. As a result, 94 percent of the published studies supported the efficacy of antidepressants, even though only about half the total trials had found that the medications help.
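For readers who want to check the arithmetic, here is a minimal sketch in Python that uses only the counts quoted above. One figure is inferred rather than stated: if 22 of the 36 negative trials went unpublished and 11 were published with a favorable spin, the remaining three were presumably published as negative. The variable names are mine, chosen for illustration.

```python
# Counts as quoted in the paragraph above.
favorable_total = 38
favorable_published = 37
negative_total = 36
negative_unpublished = 22
negative_published_as_positive = 11

# Inferred, not stated in the text: the remaining negative trials
# are assumed to have been published with their negative findings intact.
negative_published_as_negative = (
    negative_total - negative_unpublished - negative_published_as_positive
)

published_total = (
    favorable_published
    + negative_published_as_positive
    + negative_published_as_negative
)
published_looking_positive = favorable_published + negative_published_as_positive

# Share of trials that look positive in print vs. across all trials.
share_positive_in_print = published_looking_positive / published_total
share_positive_overall = favorable_total / (favorable_total + negative_total)

print(f"Published trials: {published_total}")
print(f"Apparently positive in print: {share_positive_in_print:.0%}")
print(f"Positive among all trials: {share_positive_overall:.0%}")
```

Under that one assumption, 48 of 51 published trials look positive (about 94 percent), while 38 of all 74 trials were positive (about half), which matches the figures reported.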

It gets worse. The Turner group computed “effect sizes,” a measure that’s meant to make diverse studies comparable by statistically adjusting for the obduracy of the problem each intervention addresses. Regarding antidepressants in trials submitted to the FDA, the published literature had pumped up the effect size by more than 30 percent. And so at second glance, Turner’s results are horrifying as well. I want to show why at third glance, they might not be quite so worrisome—though even this third perspective does not let drug companies off the hook.

First, a disclaimer. Despite what the press coverage—including the beginning of this article and the accompanying photo in the New York Times—suggested, the Journal study is mostly not about Prozac. I know that my name is sometimes associated with that medication, so I repeat: Prozac is not at issue. The FDA outcome data on Prozac have been in print for at least 17 years. Looking at studies about Prozac involving over 1,100 patients, the Journal researchers found only one trial from 1986, on 42 subjects (18 of them on Prozac), in which inconclusive results were spun as positive. Eli Lilly, the company that introduced Prozac, is implicated in the questionable publication practices, but via its newer antidepressant, Cymbalta, which looked half as good in the unpublished as the published trials. For GlaxoSmithKline’s Wellbutrin SR (slow release), unfavorable studies on 627 patients remained out of view, while a single favorable study on 230 patients found its way into print. The published data made the slow-release Wellbutrin look slightly more effective than Prozac; the overall data suggest only minimal efficacy.

Selective publication, if it is done to suppress information, is a serious problem. It misleads doctors and patients about the risks and benefits of drugs; and as Journal Editor Dr. Jeffrey Drazen points out in the Times, holding back data disrespects the efforts of research subjects who sign on in the belief that they are providing information to the profession and the public. I don’t doubt that drug companies try to withhold unfavorable data. You don’t get a pattern this blatant—only one-third of negative studies finding their way into print—on a random basis. But the pharmaceutical houses might have an odd sort of mitigating detail in their favor: Their research is notoriously shoddy.

I’ve made this point previously. That’s because almost none of the data Turner analyzes are new. If you’ve read articles about the placebo debate over the past 10 years—how much of antidepressants’ apparent effect is due to patients’ expectations of improvement?—you’ve been reading about these same FDA studies. They were analyzed in the late 1990s and again in the early years of this decade, to great fanfare. In the aggregate data, antidepressants look unimpressive. But it’s possible that the problems are with the particular studies, not the drugs.

In the rush to bring patented compounds to market, pharmaceutical houses sometimes enroll research subjects who barely meet criteria for the condition under study (in this case, depression). In some early trials, researchers may purposely use low doses; the idea is to squeak by the FDA’s minimum efficacy requirements without raising concerns about side effects. Because the subjects do not have the relevant disease, and because normal people’s moods wax and wane, these sloppy studies have high placebo response rates. The subjects simply look better over time. And because people without depression (or depressed people on too little drug) may not respond to the medication being tested, true effects are muted. Instead, the study shows an elevated placebo response rate. And then the research tends not to get published, because it’s simply not credible. Or the consequence is worse yet. Every researcher in the field can name a promising substance that was lost for patient use as a result of poor study design or overeager recruitment of subjects, resulting in astronomical placebo response rates.

Sponsoring flawed research does not make drug companies look noble. But the resulting publishing bias—only the better-conducted studies, which are also the more favorable ones, find their way into print—may not be a matter of conspiracy. As the authors of the Journal article put it, “We cannot determine whether the bias observed resulted from a failure to submit manuscripts on the part of authors and sponsors, from decisions by journal editors and reviewers not to publish, or both.” It may even be that the published studies reflect the drugs’ efficacy more accurately than do the overall results that Turner and his colleagues calculated.

Nor is it at all clear that the standards the FDA used for drug approval yielded harmful results for public health purposes. One requirement was that a drug must demonstrate efficacy in two trials; inconclusive findings in other trials might then be set aside. This standard may have been reasonable. Given how much “noise” enters into even careful research—problems in diagnosis, problems in outcome measurements—it is hard to identify effective medications. Only about one in 10 drugs that enter late-stage trials comes to market; excluding effective drugs may be a price we pay for setting the bar as high as we do. The Turner reanalysis supports this “two is enough” policy. When you include the negative data, each of the 12 antidepressants under study still demonstrates efficacy. That’s not to say that full study results shouldn’t be made public—only that there’s no evidence that the old procedures led the FDA astray.

As for the drugs, there is no great mystery about the efficacy of antidepressants. We have access to the results of large-scale trials whose protocols were published in advance and whose data have been analyzed openly at every stage. Study after study shows a response rate on the order of 50 percent to 60 percent, where the response to a placebo pill is 35 percent to 40 percent. In general, most of the positive change occurs in the sicker patients. The more stringent the study, the more robust the outcome. In research on hard-to-treat depression, like depression in conjunction with strokes or heart disease, antidepressants prove useful.

Another word about the measure Turner utilizes: Effect size was developed to assess interventions in education and psychotherapy. Studies in those fields cover widely different outcomes, using a variety of tests. In order to integrate and compare unlike measures, statisticians wanted a formula that would put results on a level playing field, by taking into account the intractability of a target for change. The mathematical correction factor is indirect, but the idea is that you ought to get more credit for changing stable phenomena and less for phenomena that fluctuate naturally.
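To make the idea concrete, here is a rough sketch of one common effect-size formula, the standardized mean difference known as Cohen's d. That choice is an assumption for illustration; nothing above says which variant the Journal authors used. The function name and every number below are hypothetical, not trial data.

```python
import math


def cohens_d(mean_treated, mean_control, sd_treated, sd_control,
             n_treated, n_control):
    """Standardized mean difference: the raw gap divided by the pooled spread."""
    pooled_variance = (
        (n_treated - 1) * sd_treated ** 2 + (n_control - 1) * sd_control ** 2
    ) / (n_treated + n_control - 2)
    return (mean_treated - mean_control) / math.sqrt(pooled_variance)


# Hypothetical example: the same 3-point raw improvement over control,
# measured once on a stable outcome and once on one that fluctuates more.
stable_outcome = cohens_d(13.0, 10.0, 6.0, 6.0, 100, 100)
noisy_outcome = cohens_d(13.0, 10.0, 12.0, 12.0, 100, 100)

print(f"Stable outcome: d = {stable_outcome:.2f}")
print(f"Noisy outcome:  d = {noisy_outcome:.2f}")
```

The same raw change earns twice the effect size when the outcome varies half as much, which is the sense in which the formula gives more credit for moving stable phenomena.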

It turns out that most psychotherapies have large effect sizes. A brief course of psychotherapy has more influence on mental health than adding nine months of reading instruction, halving class size, or introducing computers to a classroom does for academic success. In most head-to-head trials, antidepressants are at least as effective as psychotherapy. And in recent large studies where psychotherapy failed or showed minimal results, antidepressants succeeded. So, the power of antidepressants, properly tested, is likely to be many times what the aggregate data in the Turner analysis suggest.

To turn the matter on its head, the low effect sizes Turner reports for the FDA studies are for the very same drugs that perform well in research that vigorously tracks outcomes. If that’s the case, what do greater effect sizes signify? In the Turner analysis, Celexa demonstrated a low effect size: 0.24; but Celexa has just been shown, in a quite rigorous study, to bring benefit to about half of patients with complicated, chronic depression that had not responded to prior treatment. Effexor weighed in at 0.40. If the ratios Turner discusses are to be taken seriously—if Effexor is understood to be more than half again as effective as Celexa—then certain antidepressants may be very effective indeed. Some psychiatrists have always argued that Effexor, which directly influences a broader range of brain pathways than Prozac or Celexa, ought to be a better antidepressant—along the lines of older drugs, like Elavil, that may also have helped a higher percentage of patients, but with harsher side effects.
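The Effexor-to-Celexa comparison is simple division, sketched here in Python with the two effect sizes quoted above. Treating the ratio of effect sizes as a ratio of real-world effectiveness is the conditional the paragraph itself poses, not something the calculation establishes.

```python
# Effect sizes as quoted above from the Turner analysis.
celexa_effect_size = 0.24
effexor_effect_size = 0.40

ratio = effexor_effect_size / celexa_effect_size

print(f"Effexor / Celexa effect-size ratio: {ratio:.2f}")
print(f"Effexor's effect size exceeds Celexa's by {ratio - 1:.0%}")
```

The ratio comes to roughly 1.67, so Effexor's figure is about two-thirds larger than Celexa's.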

Then again, flawed studies are flawed studies. Perhaps the best thing to say about this new data analysis is that it bears no news at all about antidepressants. They are just as good or as bad as we imagined them to be. The article’s contribution is to show that the publication process obscures negative studies about drugs; it adds nothing original about the drugs themselves.

And, of course, academic publishers have moved to clean up their acts. In 2004, the International Committee of Medical Journal Editors, a group that includes representatives of the New England Journal of Medicine, JAMA, the Lancet, and others, vowed not to publish outcome studies unless the trials had been publicly registered before the enrollment of the first patient. In 2007, the FDA moved to require more open registration of drug trials as well. The result should be much-desired transparency, from early in the course of a drug’s evaluation. But that transparency carries a risk—it might highlight poor studies that lead us to abandon promising medications. Here’s hoping that more open reporting will create pressures for better quality research.