We Have to Be Smart About Artificial Intelligence in Medicine

When the technology is complicated, opaque, changing, and absolutely vital to the health of a patient, how do we make sure it works as promised?

A nurse teaches a patient with Type 1 diabetes how to use an insulin pump managed with an electronic control unit. Devices like this and the artificial pancreas (not pictured) represent a huge step forward for the treatment of diabetes.
BSIP/UIG via Getty Images

For millions of people suffering from diabetes, new technology enabled by artificial intelligence promises to make management much easier. Medtronic’s Guardian Connect system promises to alert users 10 to 60 minutes before they hit high or low blood sugar level thresholds, thanks to IBM Watson, “the same supercomputer technology that can predict global weather patterns.” Startup Beta Bionics goes even further: In May, it received Food and Drug Administration approval to start clinical trials on what it calls a “bionic pancreas system” powered by artificial intelligence, capable of “automatically and autonomously managing blood sugar levels 24/7.”

An artificial pancreas powered by artificial intelligence represents a huge step forward for the treatment of diabetes, but getting it right will be hard. Artificial intelligence (a broad term that covers techniques such as machine learning and deep learning) promises to automatically learn from patterns in medical data to help us do everything from managing diabetes to finding tumors in an MRI to predicting how long patients will live. But the artificial intelligence techniques involved are typically opaque: We often don’t know how the algorithm makes the eventual decision. The algorithms may also change as they learn from new data; indeed, that’s a big part of the promise. But when the technology is complicated, opaque, changing, and absolutely vital to the health of a patient, how do we make sure it works as promised?

Diabetes devices provide an example in microcosm of broader issues with artificial intelligence in medicine. The potential here is enormous in terms of improved patient health, reduced costs, and increased access to high-quality care. Imagine if every primary care physician could diagnose certain eye problems as well as an ophthalmologist, or if pictures of skin lesions could be automatically evaluated for signs of cancer. These technologies are coming—the eye example is already FDA-approved. Soon, A.I. will make predictions, recommendations, and even decisions about patient care. But ensuring that medical A.I. consistently helps patients will demand careful study and continuing oversight.

To understand the challenge in diabetes, it helps to know the technological baseline. The traditional way to administer insulin to patients with Type 1 diabetes (where the pancreas doesn’t make insulin at all) or Type 2 diabetes (where the body becomes resistant to insulin) that can’t be managed with oral medication is through finger-sticks and injections. Patients check their blood sugar by pricking the tip of a finger and using a small test strip to measure the level of glucose in the blood. Patients also inject insulin manually, using a syringe, based on blood sugar readings and knowing when they are going to need extra insulin (typically around meals).

Existing technology has already made two of those steps automatable. First, a continuous glucose monitor can (as the name implies) continuously measure the amount of glucose in the patient’s blood through a small sensor inserted under the skin. Second, an insulin pump can continuously provide insulin to the patient through a catheter that sits just below the skin (with extra insulin when needed). But the patient, informed by doctors, is still in the middle, interpreting glucose readings and deciding when insulin is needed and how much. When you combine the two devices (a monitor and a pump), you get a system called an artificial pancreas. Typically, there is a controller, which can be embedded in one of the devices or separate, in a connected smartphone. If the patient is doing the controlling, it’s called an “open loop” system.

One step further is a “closed loop” artificial pancreas, where software handles the whole process: receiving and interpreting signals from the monitor, deciding when and how much insulin is needed, and directing the insulin pump to provide the right amount. The first closed-loop system was approved in late 2016. The system should take as much of the burden off the mind of the patient as possible (though, of course, that has limits). Running a closed-loop artificial pancreas is challenging. The way people respond to changing levels of carbohydrates is complicated, as is their response to insulin; it’s hard to model accurately. Making it even more complicated, each individual’s body reacts a little differently.
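To make the closed loop concrete, here is a deliberately simplified sketch of one control cycle: read glucose, decide on a dose, hand it to the pump. It is not how any approved device works. The proportional rule, the names (`ControllerConfig`, `decide_dose`, `run_closed_loop`), and every number are assumptions chosen only for illustration, and none of them is clinical guidance.

```python
from dataclasses import dataclass


@dataclass
class ControllerConfig:
    # All values below are hypothetical and for illustration only.
    target_mg_dl: float = 110.0          # glucose level the loop steers toward
    gain_units_per_mg_dl: float = 0.01   # insulin units per mg/dL above target
    max_dose_units: float = 2.0          # safety cap on any single cycle
    suspend_below_mg_dl: float = 80.0    # deliver nothing at or below this level


def decide_dose(glucose_mg_dl: float, cfg: ControllerConfig) -> float:
    """Return the insulin dose (in units) for one control cycle."""
    # Safety rule first: never dose when glucose is already low.
    if glucose_mg_dl <= cfg.suspend_below_mg_dl:
        return 0.0
    # Toy proportional rule: dose grows with the distance above target.
    error = glucose_mg_dl - cfg.target_mg_dl
    dose = max(0.0, error * cfg.gain_units_per_mg_dl)
    return min(dose, cfg.max_dose_units)


def run_closed_loop(readings, cfg):
    """Map a sequence of monitor readings to the doses sent to the pump."""
    return [decide_dose(reading, cfg) for reading in readings]
```

Even this toy version shows why the real problem is hard: a single fixed gain assumes every body responds to insulin the same way, at every time of day, which is exactly what the paragraph above says is not true.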

Here’s where artificial intelligence comes into play. Rather than trying explicitly to figure out the exact model for how bodies react to insulin and to carbohydrates, machine learning methods, given a lot of data, can find patterns and make predictions. And existing continuous glucose monitors (and insulin pumps) are excellent at generating a lot of data. The idea is to train artificial intelligence algorithms on vast amounts of data from diabetic patients, and to use the resulting trained algorithms to run a closed-loop artificial pancreas. Even more exciting, because the system will keep measuring blood glucose, it can learn from the new data and each patient’s artificial pancreas can customize itself over time as it acquires new data from that patient’s particular reactions.
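The idea of learning from many patients and then customizing to one can be sketched in a few lines. This is a toy under loud assumptions: a two-weight linear model stands in for whatever a real system uses, the data are synthetic and noiseless, and the names (`predict`, `sgd_update`, `train`, `make_samples`) and the “true” response weights are invented for illustration.

```python
def predict(weights, features):
    """Predicted glucose change (mg/dL) as a weighted sum of the features."""
    return sum(w * x for w, x in zip(weights, features))


def sgd_update(weights, features, observed, learning_rate):
    """One online learning step: nudge the weights to shrink this sample's error."""
    error = predict(weights, features) - observed
    return [w - learning_rate * error * x for w, x in zip(weights, features)]


def train(weights, samples, learning_rate=0.02, epochs=500):
    """Repeatedly apply sgd_update over a list of (features, observed) samples."""
    for _ in range(epochs):
        for features, observed in samples:
            weights = sgd_update(weights, features, observed, learning_rate)
    return weights


# Synthetic, made-up inputs: (carbohydrates in tens of grams, insulin in units).
INPUTS = [(1, 0), (0, 1), (2, 1), (3, 2), (1, 2), (4, 1)]


def make_samples(true_weights):
    """Generate noiseless synthetic samples from a hypothetical 'true' response."""
    return [(x, predict(true_weights, x)) for x in INPUTS]


# Hypothetical responses: an "average" patient and one individual who differs.
population_samples = make_samples([30.0, -40.0])
patient_samples = make_samples([35.0, -30.0])

# Step 1: learn a general model from pooled data.
population_model = train([0.0, 0.0], population_samples)
# Step 2: keep learning from one patient's data, starting from the general model.
personal_model = train(population_model, patient_samples)
```

In this sketch the personalized model drifts away from the population model toward the individual’s own response. That same drift is what makes oversight hard: the algorithm running on a patient after a few months is no longer the one that was originally evaluated.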

Here’s the tough question: How will we know how well the system works? Diabetes software doesn’t exactly have the best track record when it comes to accuracy. A 2015 study found that among smartphone apps for calculating insulin doses, two-thirds of the apps risked giving incorrect results, often substantially so. So far, the most developed diabetes A.I. is incorporated into devices, not stand-alone, and we might hope that device-makers have more expertise than potentially fly-by-night app developers. But artificial intelligence is much more complicated, and typically, the trained algorithm doesn’t say (indeed, often it can’t say) how it makes its recommendations. And companies like to keep their algorithms proprietary for a competitive advantage, which makes it hard to know how they work and what flaws might have gone unnoticed in the development process.

Clinical trials help by having a group of patients try the algorithm for a carefully monitored period of time to see how they do. But what about once the systems are actually in use by individual patients, where each algorithm is learning from that patient’s data—how can we make sure that errors aren’t creeping into the system? Making sure it works right is especially important when the algorithm is actually making medical decisions all the time, determining insulin levels on its own. Those insulin levels themselves provide some immediate feedback but require that someone be watching. At least for the foreseeable future, these systems will likely need careful monitoring by patients, doctors, and the FDA, which has jurisdiction over medical devices.
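One modest form such monitoring could take is a running check of how far the algorithm’s predictions stray from what the glucose monitor later reports. The sketch below is an assumption about what a check like that might look like, not a description of any regulator’s or manufacturer’s practice; the `DriftMonitor` class, the window size, and the error threshold are all hypothetical.

```python
from collections import deque


class DriftMonitor:
    """Flag for human review when recent prediction errors grow too large."""

    def __init__(self, window=12, max_mean_abs_error=20.0):
        # Hypothetical defaults: the last 12 readings, 20 mg/dL average error.
        self.errors = deque(maxlen=window)
        self.max_mean_abs_error = max_mean_abs_error

    def record(self, predicted, observed):
        """Store one prediction error; return True if review is needed."""
        self.errors.append(abs(predicted - observed))
        # Wait for a full window so one odd reading cannot trigger an alert.
        window_full = len(self.errors) == self.errors.maxlen
        mean_error = sum(self.errors) / len(self.errors)
        return window_full and mean_error > self.max_mean_abs_error
```

A check like this only catches errors that show up in the signal being watched, and someone still has to respond when it fires.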

These issues aren’t unique to diabetes care: other A.I. algorithms will also be complicated, opaque, and maybe kept secret by their developers. The potential for problems multiplies when an algorithm is learning from data from an entire hospital, or hospital system, or the collected data from an entire state or nation, not just a single patient. In these bigger contexts, we’d still like algorithms to learn from the wealth of new health data, and to keep learning as they affect the medical decisions of real patients going forward. When the issues get more complicated, it also gets harder to rely on one set of signals (such as a single patient’s blood sugar and A1C levels) to check whether the algorithms are doing their job. We don’t know what problems the algorithms could have caught, which patients could have been better treated, or which patients got worse but didn’t need to.

The FDA is working on this problem. The head of the agency has expressed his enthusiasm for bringing A.I. safely into medical practice, and the agency has a new Digital Health Innovation Action Plan to try to tackle some of these issues. But they’re not easy, and one thing making it harder is a general desire to keep the algorithmic sauce secret. The example of IBM Watson for Oncology has given the field a bit of a recent black eye—it turns out that the company knew the algorithm gave poor recommendations for cancer treatment but kept that secret for more than a year. If artificial intelligence is going to make it safely and effectively into medical practice, it’s going to require lots of players to keep a sophisticated eye on how it gets implemented and how well it does. For patients with diabetes, and for the rest of us, it’s worth getting this right.