Last month, the New England Journal of Medicine reported that performing annual CT scans on smokers and others at risk for lung cancer could prevent “some 80 percent of deaths from lung cancer.” Yet the American Cancer Society and National Cancer Institute still haven’t endorsed this strategy. Frustrated, Dr. Claudia Henschke, the study’s lead author, recently told the New York Times, “I don’t get what the resistance is.” Why should such promising results prompt more study and not immediate action?
Most people take for granted that an ounce of prevention is worth a pound of cure. Today, a routine trip to the doctor is essentially a visit for numerous screening tests. The idea is that problems caught early can be treated and cured early. High blood pressure may indicate a risk for heart attacks in the future; mammograms may catch early breast cancers before they metastasize; blood routinely taken from newborns may indicate metabolic problems before brain damage results.
But the truth is that screening tests are just like any other drug or medical procedure, with potentially deadly risks that must be balanced against the potential benefits. The same people who would agonize over the decision to take estrogen-replacement therapy, for example, don’t think twice before getting a mammogram. However, as the data indicate, they sometimes should. Screening tests can cause harm in two major ways: false-positive diagnoses and unnecessary treatment of benign conditions. Unfortunately, these problems can be masked because a little-known but crucial error pervades almost every major study involving screening for deadly diseases, especially cancers, and makes the tests appear better than they really are.
First, due to statistical reality, even highly accurate screening tests have many false positives. Take a random airport test for cocaine that correctly identifies 99 percent of cocaine smugglers and correctly excludes 99 percent of nonsmugglers, and assume about 100 smugglers enter an airport of 100,000 passengers. Among smugglers, 99 would have a positive test, and one would be negative. But among law-abiding travelers, 999 would have false-positive tests. Thus, only 99 out of 1,098 people who test positive, or less than 10 percent, are real smugglers. So, a lot of innocent people endure fruitless internal body-cavity searches. If all you care about is catching smugglers, the results are great, since only one escapes. But if you focus on the harm to bystanders, the screening procedure seems pretty draconian.
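The airport arithmetic can be checked in a few lines. The sketch below is only an illustration of the example's own numbers (100,000 passengers, 100 smugglers, a test that is 99 percent accurate in both directions); the function name `screening_outcomes` and its parameters are hypothetical, chosen for this illustration rather than taken from any study.

```python
def screening_outcomes(population, true_cases, sensitivity, specificity):
    """Return (true positives, false positives, share of positives that are real).

    sensitivity: fraction of real cases the test correctly flags
    specificity: fraction of non-cases the test correctly clears
    """
    non_cases = population - true_cases
    true_positives = true_cases * sensitivity
    false_positives = non_cases * (1 - specificity)
    share_real = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_real


# Numbers from the airport example: 100 smugglers among 100,000 passengers,
# with a test that is right 99 percent of the time for both groups.
tp, fp, share_real = screening_outcomes(100_000, 100, 0.99, 0.99)
print(f"smugglers flagged: {round(tp)}")
print(f"innocent travelers flagged: {round(fp)}")
print(f"share of positive tests that are real: {share_real:.1%}")
```

Changing only the `true_cases` argument shows how much the result depends on how rare the condition is: the rarer the thing being screened for, the larger the share of positive results that are false alarms, even when the test itself is unchanged.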
In principle, a mammogram works the same way. Almost 10 percent of annual mammograms are considered abnormal. According to a 1998 study from the New England Journal of Medicine, almost one in five women who do not have breast cancer will have a biopsy after about a decade of mammograms, and almost one-third will have some form of additional testing stemming from a false-positive breast-cancer screen.
The second danger of screening tests is overdiagnosis and overtreatment. For example, the most common pediatric solid tumor worldwide is neuroblastoma, which develops in roughly one in 7,000 children. Most parents discover their child has it only when they notice a large mass in the abdomen. But decades ago, researchers realized that even tiny neuroblastomas often secrete hormones similar to adrenaline into a child’s urine. They hypothesized that screening all healthy infants’ urine for the hormones could help find tumors that were still very small and possibly curable. The Japanese Ministry of Health was the first to try this; beginning in 1984, it tested the urine of Japanese infants and soon found hundreds of small neuroblastomas—which were promptly treated by surgery and chemotherapy.
Under great pressure to follow suit, Canadian and American officials decided instead to perform a controlled study of almost half a million infants and published their findings in 2002. Infants from Quebec were screened. Infants from other areas weren’t. When the groups were compared, the results were shocking. Twice as many screened infants were diagnosed with neuroblastoma, compared to other infants. But despite aggressive treatment, the overall death rate from the cancer was the same. Screening didn’t save lives. All it did was identify infants with harmless neuroblastomas that would have melted away without treatment—and subjected them to surgery and chemotherapy. One example of the human toll: During the study, a Canadian child suffered brain damage from surgery for a neuroblastoma that might have disappeared on its own, and fell into a persistent vegetative state.
In short, it would have been better for the infants as a group if their tumors had not been detected by urine screening. In 2005, economists from McGill University in Montreal calculated that Canada and the United States avoided almost $600 million in costs and the unnecessary treatment of almost 10,000 infants by not plunging into the screening program as Japan had.
As the Japanese Ministry of Health learned, the only way to assess the value of screening is by clinical trial, where a screened group gets compared to an unscreened group to see who ends up healthier. (Henschke’s recent study of CT scanning, alas, had no unscreened group.) Without careful trials, unproven screening mushrooms. And it’s impossible to uproot once established. Today, for example, most healthy laboring women are screened with fetal heart monitors for early brain asphyxia in fetuses, which might be relieved by cesarean section. Yet despite a five-fold increase in C-sections since the screening became routine, cerebral-palsy rates in babies remain unchanged. Without adequate data, prostate-specific antigen tests for prostate cancer have proliferated wildly among middle-aged men; their impact on health is anyone’s guess. The American Diabetes Association recommends screening all patients over 45 years for diabetes. Yet there is no evidence that this improves health.
Which brings us to a fundamental problem with the screening studies themselves. As of 2002, only 16 randomized clinical trials of adult cancer screening had ever been done, and all of them concerned either chest X-ray screening for lung cancer, mammography for breast cancer, or fecal blood testing for colon cancer. (Conspicuously absent were trials for Pap smears, PSA screening, colonoscopy, rectal exams, and testicular self-exams.)
In a revealing 2002 paper in the Journal of the National Cancer Institute, William Black and colleagues from Dartmouth-Hitchcock Medical Center explain how those randomized studies—which form the backbone of some screening guidelines—actually emphasized the wrong outcome. Routine chest X-rays, for example, are supposed to reduce the death rate from lung cancer, and that’s what the studies typically measure and report. Studies also routinely show that mammograms reduce breast-cancer deaths. But that’s not really what people care about. What they want is an overall lower death rate. What good, after all, is a test that may lower the risk of lung-cancer death but increase the overall risk of death from side effects, such as pointless operations (as in neuroblastoma)?
Unfortunately, according to the Dartmouth analysis, none of the studies demonstrated any measurable reduction in overall mortality from cancer screening. Most worrisome, in half the studies, the overall mortality rates tended to be worse in screened groups than in unscreened groups, erasing any benefit of screening.
Without better studies on which to base national screening policies, efforts to prevent disease may do more harm than good. It’s hard to hold off on strategies as seductive as CT scanning to detect early lung cancers and study them further. But if we don’t, to paraphrase Donald Rumsfeld, we must be prepared to accept the consequences of going to war with the data we have, instead of the data we really need.