Artificial intelligence judging has become a reality. Last month, a Colombian judge used ChatGPT to generate part of his judicial opinion. Colombia is not alone. Estonia has piloted a robot judge, and the United States and Canada increasingly use A.I. tools in law.
These recent events have sparked a debate about “unethical” uses of A.I. in the judiciary. As the technological hurdles to A.I. judging recede, the remaining barriers are ones of law and ethics.
Would it be fair to citizens for an A.I. judge—an algorithmic decision-maker—to resolve disputes? This is a complex legal and ethical question, but one useful source of evidence is the views of citizens themselves. We conducted experiments on a representative sample of 6,000 U.S. adults to examine this question. And the results are surprising: Citizens don’t always see A.I. in the courtroom as unfair.
This result—human judges are not always seen as fairer than A.I. judges—defies conventional wisdom. Commentators have long seen the administration of justice as a distinctively human enterprise. The task of judging calls not only for knowledge and accuracy but also for respect for the dignity of the parties involved. If A.I. were incapable of conveying such an attitude, then human judges would have an inimitable procedural justice advantage over machines.
At first blush, our results support the intuition that human judges are fairer. Ordinary citizens generally evaluate A.I. judges as less fair than human judges. In our first study, participants evaluated one of three scenarios: a contract dispute, a bail determination, or a criminal sentencing. Averaging across all scenarios of the first study, human judges received a procedural fairness score of approximately 4.4 on a 7-point scale. A.I. judges scored slightly below 4. We call this perceived difference the “human-A.I. fairness gap.” All else equal, people evaluate legal proceedings before a human judge as fairer than legal proceedings before an A.I. judge. The human-A.I. fairness gap persists across diverse legal areas and issues.
However, we also discover that this human-A.I. gap can be partially offset by increasing the A.I. judge’s interpretability and ability to provide a hearing. A hearing affords a party the opportunity to speak and be heard. A decision is interpretable if it can be presented in a logical form and if it is possible to grasp how changes in inputs affect outcomes. Both a hearing and an interpretable decision enhance ordinary judgments of fairness, whether the decision-maker is a human or an A.I. Strikingly, a human-led proceeding that does not offer a hearing and renders uninterpretable decisions is not seen as being fairer than an A.I.-led proceeding that offers a hearing and renders interpretable decisions.
This is surprising, since one might have believed a hearing in front of a machine to be hollow and meaningless. For ordinary citizens to feel they have been listened to, it might seem that the decision-maker must possess the uniquely human capacity for empathy. Yet we find that a machine described as being able to recognize speech and facial expressions, and trained to detect emotions, can enhance people’s perceptions of procedural justice.
Similarly, much of the legal-ethical discourse over A.I. has revolved around interpretability of algorithms. Often, the debate implicitly assumes that comparable decisions by humans are interpretable. However, commentators have noted that humans are quintessential black boxes. Human decision-making is not always transparent to the decision-maker, never mind other humans. And we find that people do care about the interpretability of both human and A.I. decision-making.
How do we get from these findings to the conclusion that the human-A.I. fairness gap might one day be offset? Well, even today, full hearings in front of human judges are not always feasible because of resource constraints. For example, an asylum hearing will often last only several minutes. The same is true for bail hearings. Similarly, human judicial decisions are not perfectly interpretable. Human legal opinions vary in their readability, and A.I. tools can already produce highly readable text. It is not clear that A.I. tools can currently produce more interpretable judicial opinions than humans, but their ability to pass as legal reasoners is impressive. For example, ChatGPT recently passed four Minnesota Law School exams.
Finally, our studies suggest that the human-A.I. fairness gap is mainly driven by the belief that human judges are still more accurate than machines. However, there are and increasingly will be domains where machines will be demonstrably more accurate than humans, such as tumor classification. And experts predict that A.I. will exceed human performance in other fields over the next century.
There are many other factors that may influence citizens’ evaluations of human and A.I. judges. Both humans and A.I. have their advantages. On the question of accuracy, one consideration is whether the administration of justice is reliable or random. Human asylum adjudication has been described as akin to “roulette.” The grant or refusal of asylum depends very much on who among the human judges hears the case. Insofar as predictability matters for perceptions of judicial fairness, variability between human judges may count against them as adjudicators for some kinds of cases.
Even without considering such additional factors, simply adding a hearing and increasing the interpretability of A.I.-rendered decisions reduces the fairness gap. As such, some human judicial decisions today may be seen as less fair than advanced A.I. ones. And future developments in A.I. judging along the dimensions we have identified could even result in A.I. proceedings being accepted as generally fairer than human proceedings.
Of course, people’s ordinary intuitions about the fairness of A.I. judging do not fully resolve the underlying ethical and legal concerns. People can be mistaken about fairness or manipulated into believing a procedure is fair when it is not. But the opinions of those subject to the law should be taken into account when designing adjudicative institutions. And in some circumstances, people see having their day in human court as no fairer than having their day in robot court.
An expanded version of this work appears in the Harvard Journal of Law & Technology.