Do Algorithms Make Sentencing Fairer?
S1: Judge McNamara? Yes. Hi. Hi, it’s Lizzie O’Leary. How are you?
S2: Just fine. Nice to meet you this way.
S1: A few weeks ago, I called up a judge in Wisconsin named Nicholas McNamara. He serves on the circuit court in Dane County, which includes the city of Madison, the state capital. Before McNamara was a judge, he was a lawyer. He represented plaintiffs in negligence cases, civil cases. But when he became a judge in 2009, that changed: his caseload became almost exclusively criminal.
S3: Did you have a picture in your mind of what, say, evaluating an offender or sentencing would be like before you actually took office?
S2: No, I had no idea. No.
S4: Eventually, of course, he figured it out. As a starting point, in Wisconsin, there are three primary sentencing factors for a judge to consider: the gravity of the offense, the character of the offender and the need to protect the public. To assess these factors, Wisconsin judges can order what’s called a pre-sentence report.
S5: I want to get background information on the defendant, and sometimes from victims, when there are victims. And that’s what the pre-sentence report does: it interviews family members of the defendant and the victim, and covers the defendant’s education history and traumatic life experiences, other challenges, disabilities, mental health, addiction issues, criminal history.
S1: With this information in hand, a decision is eventually made and a sentence is handed down. But the process doesn’t end there.
S6: We’re required to explain our sentencing decisions, and all of our guiding cases and instruction and education for sentencing confirm that a judge’s discretion that’s unguided and unchecked is not due process; the rationale for sentencing decisions must be made knowable and subject to review. It primarily means we have to explain it. But in terms of how we actually reach numbers or other decisions, whether to incarcerate or release in the community, that’s not easy to explain.
S1: This question of how judges come to these decisions, three years or five, whether there’s an opportunity for parole: it’s hard. It’s been known for a long time that the sentencing process is imperfect. Like the rest of us, judges have biases, sometimes conscious, sometimes not. There was a famous study back in 2011 that showed Israeli judges presiding over parole hearings were more lenient at the beginning of the day and after lunch. So back in the first few years of McNamara’s time as a judge, the state made a change.
S6: The Department of Corrections began giving us, at the end of all of the pre-sentence reports, the COMPAS evaluation.
S4: COMPAS is an acronym. It stands for Correctional Offender Management Profiling for Alternative Sanctions. It’s used in a bunch of different ways, but a simple explanation is this: COMPAS is an algorithm that’s used to predict recidivism, or how likely it is that a person will reoffend. One part of COMPAS, the risk assessment, adds up risk factors, which might include things like age, criminal history and employment history, to create a picture of the defendant. Those factors then produce a score that judges can use in their sentencing decisions.
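To make “adds up risk factors to create a score” concrete, here is a minimal Python sketch of a point-based risk score. Because COMPAS’s items and weights are proprietary, every factor, point value and cut point below (including the names risk_points and risk_band) is a hypothetical placeholder, not the real instrument. The sketch only shows the general shape of an additive score, and how youth and employment status alone can move a defendant into a higher band.

```python
from dataclasses import dataclass


@dataclass
class Defendant:
    age: int
    prior_arrests: int
    employed: bool


def risk_points(d: Defendant) -> int:
    """Sum hypothetical points per risk factor (NOT COMPAS's real weights)."""
    points = 0
    if d.age < 25:          # hypothetical: youth counted as a risk factor
        points += 3
    elif d.age < 35:
        points += 1
    points += min(d.prior_arrests, 5)  # criminal history, capped at 5 points
    if not d.employed:      # hypothetical socioeconomic factor
        points += 2
    return points


def risk_band(points: int) -> str:
    """Map a raw point total to a band using hypothetical cut points."""
    if points <= 2:
        return "low"
    if points <= 5:
        return "medium"
    return "high"


young = Defendant(age=20, prior_arrests=1, employed=False)
older = Defendant(age=45, prior_arrests=1, employed=True)
for d in (young, older):
    print(d.age, risk_points(d), risk_band(risk_points(d)))
# prints:
# 20 6 high
# 45 1 low
```

The two hypothetical defendants have identical criminal histories; the entire gap between “high” and “low” comes from age and employment, which is the kind of effect the guests discuss later in the episode.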
S7: Algorithms like COMPAS are increasingly popular. They’re used in 35 states, but they don’t always work like they’re supposed to. Today on the show: algorithms and criminal justice, and how one program intended to make sentencing fairer didn’t. I’m Lizzie O’Leary, and this is What Next: TBD, a show about technology, power and how the future will be determined. Stay with us.
S1: Judge McNamara remembers the first time COMPAS was introduced to him. The company that makes the algorithm, then called Northpointe, presented the tool at a big meeting.
S6: There were a lot of lawyers and judges there and it actually kind of felt like a sales pitch.
S2: And that’s what it was. Yeah.
S3: What was your reaction when you were presented with COMPAS?
S6: My first reaction was very negative, about the fact that COMPAS and Northpointe, the business that owned it and profited from it, were very, very clear about the proprietary nature of their algorithms, their various formulas and the data itself. And then also, the more we learned about it, including at that original presentation, the less useful it seemed.
S1: One of McNamara’s first issues with the software was its purpose.
S8: It was not created to be used by judges at sentencing. It was never designed for that. It was never intended for that by the people who created it. It was created primarily to be used by corrections officials and officers supervising people, either in an institution, in terms of their risk, or for placement in the community.
S1: His second problem was how the program defined risk.
S8: COMPAS only evaluates the risk of the defendant being rearrested within two years of the assessment.
S1: But judges didn’t always keep that timeframe in mind when making decisions.
S9: I know of people that were using it, and still do consider it, who forget that a high-risk person is simply a person who has some degree of risk of rearrest within two years.
S1: So for Judge McNamara, sitting in this presentation, the list of problems with the COMPAS algorithm was growing longer and longer. But perhaps the biggest flaw in his mind was transparency. The algorithm was proprietary, so judges couldn’t evaluate its inner workings.
S6: The absolute refusal to even contemplate taking out certain questions, or telling us how we could weight answers to certain questions or rescore: that was the first big objection. It’s just like, wow, this is a black box, and you’re asking the government to start using this and you won’t even tell us how it works.
S1: The question of how COMPAS and other algorithms like it work is something journalists and researchers have tried to answer, but it’s hard. After all, how can you evaluate whether an algorithm makes sentencing fairer if you can’t get all the data? But there’s one new study, in one state, where we can see inside an algorithm: Virginia. It’s not COMPAS, but it’s similar. Why did you choose to look at Virginia, given that these things are used in lots of places around the country?
S10: So Virginia was one of the first states to adopt risk assessments in sentencing.
S1: This is Jennifer Doleac. She’s an economist at Texas A&M, and she’s been studying Virginia’s sentencing algorithm for nonviolent offenders.
S11: Virginia had just implemented a truth-in-sentencing reform in the mid-’90s. This is a reform that pushes the system toward incarcerating people for their full sentence length, so it basically reduces the possibility of parole. And people looked at that and said our incarceration rates are going to skyrocket as a result. So we need some sort of release valve, right? We need to figure out who to let out so that we have room in our prisons for all these people who are now going to be incarcerated due to truth in sentencing. And so the nonviolent risk assessment was implemented with the explicit goal of identifying the 25 percent lowest-risk nonviolent offenders and recommending them for diversion from incarceration.
S1: The other reason why Doleac wanted to study Virginia’s algorithm is that, unlike COMPAS, it’s transparent.
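The stated goal, recommending the lowest-risk 25 percent of nonviolent offenders for diversion, amounts to a percentile cutoff on a risk score. Below is a minimal Python sketch of that selection rule under assumed inputs: the function name recommend_diversion, the offender IDs and the scores are all made up, and this is not Virginia’s actual worksheet or procedure.

```python
import math


def recommend_diversion(scores: dict[str, int], share: float = 0.25) -> set[str]:
    """Return IDs of the lowest-risk `share` of offenders.

    `scores` maps a hypothetical offender ID to a risk score, where a
    higher score means higher predicted risk. Ties are broken by ID so
    the result is deterministic.
    """
    # Rank from lowest to highest risk.
    ranked = sorted(scores, key=lambda oid: (scores[oid], oid))
    # Number of offenders that fit inside the target share (rounded down).
    n_divert = math.floor(len(ranked) * share)
    return set(ranked[:n_divert])


# Hypothetical pool of eight nonviolent offenders and their scores.
scores = {"A": 2, "B": 7, "C": 4, "D": 1, "E": 9, "F": 3, "G": 5, "H": 8}
print(sorted(recommend_diversion(scores)))  # ['A', 'D']
```

A real instrument more plausibly applies a fixed point threshold, calibrated in advance so that roughly a quarter of offenders fall below it, because a judge sees one defendant at a time rather than a ranked cohort; that calibration approach is likewise an assumption here.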
S12: The information that goes into the risk assessment is public. It’s stuff like the person’s age, their gender, their criminal history. During the period we’re studying,
S11: it also included some socioeconomic factors, like whether they were employed and whether they were married.
S1: Why did you want to study them?
S13: So when I went into this research, I was actually extremely optimistic about the potential of risk assessments to reduce racial disparities in criminal sentencing.
S12: So we know that, in general, humans are not that good at making predictions. In particular, we tend to get distracted by a lot of irrelevant information, like whether we’re hungry or cold or tired, or whether our football team lost that weekend. There’s a whole bunch of research showing that those types of factors tend to negatively affect the defendant who’s before the judge, and particularly tend to negatively impact black defendants. In addition, we know that humans are also racially biased. And so it seems like there’s a lot of room for a tool like this to improve decision-making and reduce racial disparities, if we are able to crunch the numbers in a standardized way across all defendants and then present that information to the judge in a way that can inform their decision and maybe cause them to pause a bit. If they think someone standing in front of them is high risk and the computer says they’re low risk, maybe they will think about that and change their decision.
S1: This is an experience that Judge McNamara mentioned having when we spoke to him, taking a moment to examine his own biases when the risk assessment and his personal assessment were at odds. For Doleac, this interaction is essential.
S13: As an economist, the way I frame this problem is not so much whether the algorithm is racist or biased relative to some objective truth. It’s whether it’s more biased than the judge would be. Relative to the status quo, does this move us in a better direction? It doesn’t have to completely solve the problem, but it could move us in a better direction. And so we were really interested going into this project. All of the existing research has really considered these algorithms basically in a vacuum, thinking of them as replacing the judge’s decision.
S10: But in practice, they don’t ever replace the judge’s decision. They’re just informing the judge. And so the real question is how the machine and the human interact.
S4: Doleac and her co-author, Megan Stevenson, released their findings in a working paper late last year. Their questions about that interaction between judges and algorithms yielded some surprising results.
S3: One thing stood out to me when I read your paper. There’s a sentence in here that says, “We find that racial disparities increased in the courts that appear to use risk assessments most.”
S1: Why do you think that is?
S10: So we don’t find any change in racial disparities when we look statewide. It is really when we just look at the jurisdictions where the judges did seem to change their behavior the most when this policy went into effect that we see a slight increase in racial disparities, so black defendants do worse relative to white defendants. I think there are two reasons for this. One is that the risk assessments themselves are worse on average for black defendants relative to white defendants. That tends to be due to things like the criminal history information, as well as socioeconomic information, so whether they’re employed or whether they’re married.
S3: Those tend to be correlated with race, even though we should say that race is not explicitly mentioned in these assessments.
S10: Exactly. It’s unconstitutional to include race in these algorithms, which is a whole other conversation. There are lots of smart people who are now arguing that maybe we should rethink that, because there’s actually a good reason to control for race and then subtract it out, and that could solve some of these problems. But as it currently stands, you’re not allowed to include race as a variable. You can, however, include a whole bunch of other stuff that’s correlated with race. Even criminal history alone tends to be correlated with race, for a variety of reasons, including the biases in the criminal justice system. So part of it is the risk assessments themselves: the scores do seem to be a bit biased against black defendants. But the other problem is that judges implement, or pay attention to, the risk assessments in a racially biased way. So if judges see two defendants, one black and one white, with the same risk score, they’re more likely to divert the white defendant from incarceration than the black defendant. The element of human bias that we were hoping these risk assessments would remove is still there; it now just affects how they’re implemented.
S1: Their second major finding centers on youth. As these algorithms became more and more common around the country, there was this popular idea that judges were making a lot of mistakes in their sentencing decisions, letting high-risk people slip through the cracks of the system. But Doleac says the decision to let these high-risk offenders off more lightly wasn’t a mistake. It was a choice.
S10: And what we’re able to show in our paper is that those high-risk people are mostly young people. And it turns out that there is a really long-standing tradition in the criminal justice system to consider youth a mitigating factor in sentencing. We generally think that if you’re young, you’re just less culpable for your crime. Your brain hasn’t fully developed yet. We know you’re probably going to age out of whatever it is that you’re doing. And so we tend to err on the side of giving young people a second chance.
S1: In Virginia, the presence of a risk assessment increased sentences for young people, because the algorithm views youth as a risk factor. Young defendants got a little lucky, though, because judges didn’t always do what the algorithm said.
S10: We find that the risk assessments in Virginia did increase sentences pretty substantially for young people, but nowhere near as much as they would have if the judges had actually followed the recommendations all the time.
S13: And so this really calls into question the previous studies that suggested judges were making mistakes all the time. They might not have been making mistakes. They might have just had a competing objective that the risk assessment wasn’t taking into account.
S1: One of the most important things to consider, Doleac says, isn’t necessarily exactly how the algorithms work, but rather how we work with them, and how our faith in technology can mislead us.
S14: One of the reasons that people are really concerned about risk assessments is that even if they don’t do any worse than the judge does, even if they just present the same information that the judge has, it’s now presented with sort of a veneer of science. We might generally know to second-guess a human being’s decision, because we make mistakes all the time. We know that humans are biased. But if the computer says that this person’s high risk, well, then that must be accurate. And I think we just need to make sure that we’re comfortable enough with all these tools to recognize that they’re not magic. They’re a function of what we design them to do. And in this case, we’re predicting risk, and that’s not necessarily the only thing that we care about in these sentencing decisions. That seems to be what’s causing a lot of the trouble.
S1: You went into this project pretty optimistic. How do you feel now?
S15: I still like to think of myself as an optimistic person. At my core, I still think that there’s great potential in these kinds of tools, but I’m now much less optimistic about how quickly we will get to a scenario where we’re able to use them for good.
S1: And as for how Judge McNamara feels?
S6: These tools are just too premature. It’s too early. The science hasn’t caught up with our dreams or hopes of how these would work. And people like Professors Doleac and Stevenson are showing us where these limits are. Hopefully the tools will get better, and hopefully we haven’t completely given up on them.
S16: But there is no certainty. My favorite quote for my whole job is Voltaire’s: “Doubt is not a pleasant condition, but certainty is absurd.” I accept the doubt and uncertainty of what I do. It’s unpleasant, but it’s more honest. And risk assessments don’t change that. They might take away some doubt, but only if understood, right?
S17: Nicholas McNamara is a judge on the Circuit Court of Dane County, Wisconsin. Jennifer Doleac is an associate professor of economics at Texas A&M and the director of the Justice Tech Lab. OK, that’s the show. What Next: TBD is produced by Ethan Brooks and hosted by me, Lizzie O’Leary. It’s part of the larger What Next family. Mary and her team will be back later today to update you on all things impeachment. TBD is also part of Future Tense, a partnership of Slate, Arizona State University and New America. Thanks for listening. Talk to you next week.