IBM’s Watson Supercomputer Was Supposed to Revolutionize Oncology. Things Aren’t Going Great.

IBM’s promises don’t always line up with reality.


In an IBM commercial from 2016, an adorable, gap-toothed girl named Annabelle sits down on a couch to chat with Watson, the company’s supercomputer. Reminding her that her birthday is coming up, it asks whether she’ll be having a cake. She cheerily responds that she will, even though she was too sick the year before. “The data your doctor shared shows you’re healthy,” it tells her in its clipped voice, adding that it helps physicians identify cancer treatments. “Watson, I like you,” Annabelle chirps as the segment concludes.

Chatty and personable, the Watson of this commercial—as in many of IBM’s more celebrity-focused spots—seems designed to assuage public fears about malevolent artificial intelligence: This is a HAL 9000 who would happily open the pod bay doors for you, so long as you asked nicely. The computer’s name—a reference to former IBM chairman Thomas J. Watson—also evokes that of Sherlock Holmes’ bumbling colleague, arguably suggesting that A.I. is there to help advance our stories, not to replace the human mind.*

That’s a worthy ideal, and one that squares with the hopes of artificial intelligence researchers who believe advanced algorithms will facilitate human labor instead of supplanting it. It is, notably, the goal of IBM’s Watson for Oncology, the ambitious project referenced in that commercial, which promises to let doctors “Spend less time searching literature and [electronic medical records], and more time caring for patients.” Despite the occasional enthusiastic article proposing that the supercomputer will become the “best doctor in the world,” its goal has always been to ease the work of clinicians, not to supplant them.

The trouble is, Watson may not even be good at that relatively modest task.

In a lengthy and heavily reported article from Stat, Casey Ross and Ike Swetlitz write that the widely touted oncology system doesn’t appear to be as useful as IBM’s marketing suggests. As they put it, their reporting “suggest[s] that IBM, in its rush to bolster flagging revenue, unleashed a product without fully assessing the challenges of deploying it in hospitals globally.”

As Jennings Brown has written in Gizmodo, Watson has long been more a triumph of marketing than anything else, but Stat offers a deep dive into the supercomputer’s practical limitations. Among other things, Watson for Oncology fails to live up to the most sweeping promise of A.I.-assisted medical care: that it would help generate novel treatment regimens from individual patient data. To the contrary, Ross and Swetlitz explain, Watson’s recommendations rely “exclusively on training by human overseers, who laboriously feed Watson information about how patients with specific characteristics should be treated.”

That training comes entirely from the work of physicians at Memorial Sloan Kettering Cancer Center in New York. After receiving a given patient’s medical records, Watson reportedly crawls through the database those doctors have created and provides a list of recommended treatment plans. It organizes these suggestions according to their probable efficacy, though the design of the system is such that it cannot explain why it weighs one over another.

It’s here that the true problems begin to reveal themselves. According to one of Ross and Swetlitz’s sources, for example, Watson sometimes suggests a chemotherapy drug intended for patients whose cancer has spread to the lymph nodes, even when it has been told that a given patient’s cancer has not spread there.

In other cases, its recommendations show a clear geographic bias: Because its proposals derive from a single hospital, they fail to take the treatment protocols of other countries into account. Memorial Sloan Kettering’s patients are, Ross and Swetlitz write, “generally affluent,” and other hospitals may not have access to the same resources. Accordingly, Watson threatens to devolve into another example of algorithmic bias, normatively imposing judgments that fit a specific population, whether or not they are optimal in other parts of the world.

Such considerations come to a head, Ross and Swetlitz suggest, in IBM’s failure to empirically demonstrate that Watson improves medical care in any meaningful way. While some reporting does indicate that the computer can save clinical research time in certain contexts, no studies on Watson for Oncology have yet appeared in peer-reviewed publications. Where research has been conducted, the results are not always positive: Unpublished work from Denmark, for example, indicates only 33 percent agreement between the computer’s proposals and those of medical professionals. And even where the alignment is higher, Ross and Swetlitz write, “[S]howing that Watson agrees with the doctors proves only that it is competent in applying existing methods of care, not that it can improve them.”

As CNBC reports, a “recent Gallup survey [found] that about one in eight workers, or 13 percent of Americans … believe it’s likely they will lose their jobs due to new technology, automation, robots or AI in the next five years.” Given that A.I. is likely to leave millions unemployed in the years ahead, it’s tempting to suggest that 87 percent of Americans aren’t worried enough. As Stat’s investigation shows, however, oncologists are probably safe—at least for now.

*Correction, Sept. 6, 2017: This post originally misidentified the origin of the name of the IBM supercomputer Watson.