The Kidney Transplant Algorithm’s Surprising Lessons for Ethical A.I.

A more democratic approach to A.I. is messy, but it can work.

In May 2021, I got a call I never expected. I was working on a book about A.I. ethics, focused on the algorithm that gives out kidneys to transplant patients in the United States. Darren Stewart, a data scientist from UNOS (the nonprofit that runs the kidney allocation process), was calling to get my take: How many decimal places should they include when calculating each patient’s allocation score? The score is an incredibly important number, given that it determines which patient will get the first chance at each donated organ. But still: decimal places? Surely, I thought, this was a technical detail, a question for the experts.

I quickly discovered that I was wrong. The issue was ethical rather than technical, and fascinating: The computers at UNOS could calculate each patient’s allocation score out to 16 decimal places. And in some cases, for patients in very similar situations, the system would need to carry things out to the umpteenth decimal place in order to find a difference between two transplant patients and offer an available organ to one rather than the other.

Given how high the stakes are, why not use all the data you can and be as precise as possible? To Stewart and his colleagues, that approach felt wrong. Past a certain point, these extra digits of precision get so small that they stop describing real medical differences between patients. A tiny difference in the 14th decimal place between Alice’s kidney score and Bob’s doesn’t actually mean that Alice would do better with a transplant. At that point, the score difference may just be a technical pretext for giving the organ to one candidate instead of the other, a reason that may outwardly appear neutral and objective, but is actually arbitrary.

“We want to make our decisions as much as possible based on clinical criteria and not flipping a coin,” Stewart told me. But how much was it possible, or fair, to use clinical criteria? If two patients are, as far as we can tell, clinically equivalent, then it may be wrong not to flip a coin between them.
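To make the decimal-place question concrete, here is a minimal Python sketch. It is not UNOS’s code, and the account above does not say how UNOS settled the question; the candidate names, the scores, and the four-decimal cutoff are all hypothetical. The sketch rounds each score to a chosen number of decimal places, treats candidates whose rounded scores match as clinically equivalent, and breaks the tie by lottery rather than by the leftover digits.

```python
import random


def pick_recipient(scores, decimals, rng):
    """Offer the organ to the top-scoring candidate, breaking ties by lottery.

    scores   -- dict of candidate name -> allocation score (hypothetical values)
    decimals -- decimal places treated as clinically meaningful (an assumption)
    rng      -- random.Random instance, seeded so the lottery is reproducible
    """
    # Discard digits beyond the chosen precision; past that point they no
    # longer describe real medical differences between patients.
    rounded = {name: round(score, decimals) for name, score in scores.items()}
    best = max(rounded.values())
    # Every candidate whose rounded score matches the best one is treated as tied.
    tied = sorted(name for name, score in rounded.items() if score == best)
    # Flip a fair (seeded) coin among the tied candidates.
    return rng.choice(tied), tied


# Hypothetical scores: Alice and Bob differ only in the 14th decimal place.
scores = {"Alice": 0.73120000000001, "Bob": 0.73120000000000, "Carol": 0.6981}

# At full precision, that tiny difference decides the outcome.
print(max(scores, key=scores.get))  # Alice

# At four decimal places, Alice and Bob are tied and a lottery decides.
winner, tied = pick_recipient(scores, decimals=4, rng=random.Random(2021))
print(tied)  # ['Alice', 'Bob']
print(winner in tied)  # True
```

Where to set `decimals` is exactly the moral question Stewart raised; the code does not answer it, it only makes the choice visible.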

Stewart’s question opened the door to deeper ones: Which decisions really belong to the technical experts? And where instead are we stretching the role of the expert too far, and latching on to technical details as an excuse not to face a hard moral question?

It’s a quandary that comes up wherever high-stakes software shapes human lives, which is to say in more and more places. When software reads online resumes, someone has to decide which factors are relevant to the job and fair to consider. In courtrooms, algorithms tag some defendants as “high risk,” which means someone has to draw a line and decide how much (and what kind of) risk it takes to be tagged as dangerous. Assigning students to public schools? There again, the ideal of a good student needs to be quantified. The most comfortable move for a citizen, public official, or executive in situations like these is often to abdicate the hard choices to somebody else, hoping the experts will figure it out. Such choices can be pushed down the line until a data scientist or other expert is left with no alternative but to make what is inescapably a moral decision, often on the basis of some technical, neutral-seeming detail.

But in the world of organ transplants, surgeons and data scientists have an unusual habit of being brutally honest about the human lives behind their work—of inviting others into the impossible choices their field confronts. For better and worse, the organ transplant system is itself a real-life laboratory of more inclusive, accountable techniques for building and using A.I.—approaches that are now being proposed in U.S. and EU legislation that could cover courtrooms, hiring, housing, and many other sensitive domains.

And that’s what made Stewart’s phone call so exciting. Rather than just deciding, the data scientists at UNOS were in essence raising their hands to say that a seemingly technical question was actually a moral one, and that they were not the people to resolve it.

Where did this culture of moral humility—one that’s now shaping the design of a high-stakes A.I. system—come from?

Collaborative decision-making about hard ethical choices in kidney medicine began before the digital revolution. It began before there were many kidney transplants. It started, in fact, with Teflon, and the work of a pioneering doctor, a man his friends called “Scrib.”

In the late 1950s, when Belding Scribner began his work, kidney failure was a fatal illness, but one that scientists were tantalizingly close to cracking open. Healthy people have two kidneys, and surgeons had figured out how to transplant a kidney and get it working again in the recipient’s body. But unless that organ came from an identical twin, the recipient’s immune system would likely “reject” the organ within a few weeks, attacking it as foreign. Without a transplant, on the other hand, doctors knew how to use an external machine to filter toxins out of the patient’s blood, acting in place of her kidneys—the process known as dialysis. But each dialysis session required a large-bore needle to be inserted somewhere new on the patient’s body. After about a month, doctors would run out of places to put the needle, and the patient would begin a fatal decline.

Scrib realized that Teflon tubing, made from the nonstick material best known from pots and pans, could be left in a patient’s arm indefinitely, providing a reliable way to reach the patient’s bloodstream again and again. This meant a kidney patient could receive dialysis over and over, potentially for years on end, and kidney failure could become a chronic condition.

But there was a problem: Scrib had just a handful of dialysis machines, so he could keep alive only a few patients at any one time. His smooth plastic tubing had created a sticky moral dilemma: Whom to save? Scribner and his team were inundated with pleas from dying patients and their doctors.

Faced with this quandary, Scribner and his colleagues chose to do something extraordinary: They shared their moral burden with the Seattle community they served. Rather than pretending that their technical expertise gave them special moral standing, they chose to be morally modest, and to widen the circle. The doctors still decided who was medically eligible for dialysis. But then, they established a second committee, a group of seven laypeople chosen by the local medical society, who would make the non-medical decision of how to allocate the few available slots among the many eligible patients. The committee members were given some basic education about kidney medicine, but weren’t told how to make their moral choices.

“They Decide Who Lives, Who Dies” was the headline of a 1962 Life magazine article about this new group. Its members, who were anonymous, were photographed in shadow. A clerical collar can be seen on one. The lone woman of the group, a homemaker, clasps a pair of reading glasses in her folded hands. The article reported that the committee’s approach was based on “acceptance of the principle that all segments of society, not just the medical fraternity, should share the burden of choice as to which patients to treat and which ones to let die.”

The Life story described some biases that played out on the committee (they favored male breadwinners who had children to support), and it triggered widespread revulsion. A pair of scholars wrote that the committee was judging people “in accordance with its own middle-class suburban value system: scouts, Sunday school, Red Cross. This rules out creative nonconformists … the Pacific Northwest is no place for a Henry David Thoreau with bad kidneys.” The original Life story never mentioned race, but later reporting suggested the committee had been biased in favor of white applicants. The committee ran for only a few years. Other dialysis facilities used different rationing strategies, including first-come, first-served, and in 1972 Congress passed an extraordinary law to provide dialysis at public expense through Medicare to all patients who needed it. That proved to be a humane, if extremely costly, escape route from the rationing problem that Scribner once faced.

For all its faults, I think the Seattle committee also gave us much to admire. It was profoundly, even uncomfortably, honest about the hard choices at the center of kidney medicine. It refused to pretend that such choices were, or ever could be, entirely technical. And it tried, albeit clumsily, to democratize the values inside a complex, high-tech system. The Seattle physicians and their lay colleagues were rationing a scarce supply of dialysis treatments. But even after Congress provided dialysis for everyone, the shortage of transplantable kidneys was destined to spark similar questions, ones we still face today. And Scrib’s experiment with sharing the moral microphone (along with other stories I tell in the book) helped shape the system we have now.

Right now, about 100,000 people in the U.S. are waiting for a kidney transplant. Each time a kidney becomes available, an algorithm decides which of the many waiting patients should be offered that organ, based on a complex blend of medical data, logistics, and moral judgment. Compared with many other high-stakes A.I. systems, this one is governed in a relatively inclusive, accountable way: The hard moral tradeoffs are made with public input, collected through public meetings and open comments. The allocation rules are transparent, and explained in plain English. There are forecasts of how proposed changes would impact the system. And, every year, there are audits, prepared by a third-party monitor, that show how the rules are working out. These practices, and the culture that surrounds them, are a valuable learning opportunity about the virtues—and limits—of inclusive A.I. design.

The kidney allocation algorithm that operates today—the software matching up organs with patients even as you read these words—still bears the mark of Scribner’s moral modesty, among many other influences. The transplant system still has profound problems: As in many other parts of American medicine, structural inequalities that correlate with race, wealth, and social advantage often shape who lives and who dies. At the same time, the system’s governance is uncommonly democratic for high-stakes A.I., and through the 10-year overhaul described in my book, the system became incrementally fairer and more efficient, saving lives and moving toward a more level playing field across racial and other differences. The algorithm was developed with extensive public consultation and input. Forecasts of the algorithm’s possible impact played a pivotal role in helping people understand the stakes and ultimately rewrite the system’s moral rules. The new algorithm’s logic, and the factors that determine each patient’s fate within it, are transparent—not only publicly disclosed, but explained in simple terms. And the system’s performance is subject to annual audits, by an organization that publishes detailed reports.

The current algorithm is totally different from what it would have been if experts alone had designed it. The first plan for the new system, unveiled at a public meeting in Dallas in 2007, was to maximize the number of life-years saved by the available organs. That’s an idea that makes a lot of sense to doctors. But it would likely also have meant more organs for younger, wealthier, and whiter patients, worsening racial disparities and hurting patients who’d waited the longest. The designers of this first plan were honest about its faults. They even showed a chart that vividly depicted how the plan would shift transplants away from older patients, toward younger ones.

I talked to Clive Grawe, a kidney patient from Los Angeles who traveled to Dallas to oppose that life-years plan. Grawe, then 55, was the first person without a Ph.D. to address the meeting; he had a rare disease that causes the kidneys to break down over time. He argued that because the plan would use age as a factor, older patients like him would be unfairly slighted: Even an older patient who was healthier than a younger one, took better care of himself, and stood to gain more years of life would still lose out. Other attendees argued that the whole idea of maximizing life-years saved, which would involve favoring wealthier candidates who have better overall health and access to care, was unfair even if done perfectly. In the end, the Dallas plan was scrapped, and today’s algorithm strikes more of a balance, offering the healthiest kidneys to patients who are likely to live the longest, while still giving the rest of the waiting list a reasonable chance to receive other kidneys. Compared with what came before, this does increase the total amount of life saved, but it also improves racial equity and keeps the door open for older patients to get transplants.
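The balance described above can be pictured as a two-tier ordering. The Python sketch below is a deliberately simplified illustration, not the real allocation policy, which weighs many more medical and logistical factors. The 0-to-100 quality and survival scales, the 80-point cutoff, and the use of waiting time as the only other factor are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expected_survival_pct: float  # hypothetical 0-100 scale; higher = longer expected survival
    waiting_years: float


def offer_order(kidney_quality_pct, candidates, top_cutoff=80.0):
    """Return candidate names in the order a kidney would be offered.

    kidney_quality_pct -- hypothetical 0-100 scale; higher = healthier kidney
    top_cutoff         -- hypothetical threshold defining the "top" tier
    """
    top_kidney = kidney_quality_pct >= top_cutoff

    def sort_key(c):
        # Top-tier kidneys go first to candidates expected to live longest...
        tier = 0 if top_kidney and c.expected_survival_pct >= top_cutoff else 1
        # ...and within each tier, longer waits come first, so the rest of
        # the list keeps a reasonable chance at the other kidneys.
        return (tier, -c.waiting_years, c.name)

    return [c.name for c in sorted(candidates, key=sort_key)]


waiting_list = [
    Candidate("Ann", expected_survival_pct=90, waiting_years=1),
    Candidate("Ben", expected_survival_pct=40, waiting_years=6),
    Candidate("Cal", expected_survival_pct=85, waiting_years=3),
]

print(offer_order(95, waiting_list))  # ['Cal', 'Ann', 'Ben']
print(offer_order(50, waiting_list))  # ['Ben', 'Cal', 'Ann']
```

In this toy model Ben, the longest-waiting candidate, is passed over for the top-tier kidney but heads the line for the other one, which is the kind of compromise a pure life-years rule would not make.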

The process was messy and slow. There’s no perfect way of rationing scarce organs, and the debate wound on for 10 years, until people gradually aligned around a tolerable compromise. Even now, unfair hurdles still stop many patients from joining the transplant list in the first place: Low-income patients on dialysis are often confused about how to join the list, and hospitals face pressure to screen out candidates who they fear might lack transportation to appointments, sufficient family support, or other advantages. Some problems that went ignored for decades are only now being addressed. Earlier this summer, for instance, the system finally removed an explicit race factor that had long made it harder for Black patients to get kidneys. And even if you like the allocation rules, there are other challenges: The interface is cumbersome. Data entry is error-prone. The allocation logic is public and its performance is audited, but UNOS claims that the underlying source code is a trade secret. The people in transplant medicine are extremely skilled and hard-working, but they and their organizations, much like people anywhere, can also be territorial and obstinate.

Voices in the Code: A Story about People, Their Values, and the Algorithm They Made

By David G. Robinson