David Heinemeier Hansson, a well-known software engineer, posted a viral Twitter thread last week denouncing the Apple Card as sexist after its algorithm determined that he deserved a credit limit 20 times higher than his wife’s. In a blog post, his wife, Jamie Heinemeier Hansson, explained that there was no apparent reason for the discrepancy, writing, “I have had credit in the U.S. far longer than David. I have never had a single late payment. I do not have any debts. David and I share all financial accounts, and my very good credit score is higher than David’s.” Apple co-founder Steve Wozniak later revealed that the algorithm had given him a credit line 10 times higher than his wife’s, also for no apparent reason.
The thread sparked an uproar on Twitter over the weekend, and the New York State Department of Financial Services announced on Saturday that it was launching an investigation into the credit card program, which Apple operates jointly with Goldman Sachs. The department declared, “Financial services companies are responsible for ensuring the algorithms they use do not even unintentionally discriminate against protected groups.” Goldman Sachs maintains that it has “not and never will make decisions based on factors like gender.”
To understand how an algorithm could be systematically giving female Apple Card customers lower credit lines than men—and to discuss how the opacity of such algorithms often allows discrimination to persist—I spoke to Cathy O’Neil, a mathematician and the author of Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Our interview has been condensed and edited for clarity.
Aaron Mak: What was your initial reaction to the news of this incident?
Cathy O’Neil: Not even surprised. Of course this is going to keep happening as companies that are investing in this kind of [algorithmic] technology keep pretending it’s not a problem. They look at the upside—which is faster, scalable, quick decision-making—and they ignore the downside, which is that they’re taking on a lot of risk.
So at first glance, this does seem like a pretty convincing case of algorithm-fueled gender discrimination?
We don’t have enough information to know what was really going on there. The truth is they have all sorts of data about us that we don’t even know about, and our profiles, even if they’re not accurate, are available for corporations like Apple to purchase, even though we individuals can’t purchase our own. So there’s all sorts of things that could have happened in that particular case that might not have anything to do with gender. But on the other hand, we don’t know. It could have something to do with gender. The larger point is that it’s unaccountable and opaque, and Apple doesn’t really care. The most important point being that we should demand that they do better than that. We should demand accountability on the part of anybody who’s using an algorithm like that.
What kind of information would we want to see in order to determine whether there was gender discrimination?
There’s two broad types of evidence that we’d want. One of them is statistical evidence that indicates they have a definition of fairness in various directions, like for every protected class they’re certain to not be discriminatory. They would have to define what it means to be fair with respect to gender, or fair with respect to race, or fair with respect to veteran status, or whatever the protected class is. They’d have to define fairness and they’d have to show us evidence that they’ve tested the algorithm for fairness. That’s a statistical question; that’s something that an individual probably would never be able to get at, which is why I’m not ready to say this is gender discrimination, because we just have two data points. We cannot categorize this problem right now with the information we have.
That’s one of the reasons that the only people who can do it must do it, and those are the people who use and deploy the algorithm. They’re the ones who are capable of doing this. Technically speaking it’s a legal requirement, but the regulators in charge of this kind of thing just simply do not know how to force the companies to do it. So that’s got to change.
How could the algorithm have produced a disparity like this? What factors may have been gendered?
There are so many different ways that this could have happened that it would be irresponsible for me to say that I know, so I won’t. But I could speculate that in my household, I do way more spending than my husband. If my husband and I were to apply and I got a much higher credit rating, that wouldn’t surprise me. It wouldn’t necessarily be gendered, although there is a gender pattern of who does the shopping. So in that sense there is gender involved, but I’m just saying that it might be an individual behavior that is tipping the scale here. The example from Twitter is that they’re married and they’ve been living in the same household for a long time and the wife has a higher credit score, but we don’t really know what their behaviors are relative to each other, and the point is that a big company like Apple absolutely can buy demographic profiling and behavioral profiling. I don’t know, and the point is that we should; we should at least know to trust it in a larger, statistical sense.
You were talking about the idea of fairness earlier. How do people usually try to think about fairness with an algorithm like this?
We have to develop a new way of talking about it, because the old way of talking about it doesn’t work anymore. The old way of talking about it was, “Let’s not use race or gender explicitly.” That kind of worked back when we didn’t collect a lot of data about each other and about ourselves, and it wasn’t just available everywhere. But nowadays we can infer race and gender based on all sorts of different indicators. There are proxies for those kinds of classes just up the wazoo for companies that are interested in inferring that kind of thing. Statistically they’re a lot better compared to a random guess. It’s no longer sufficient to say, “Let’s not explicitly use race and gender.”
There’s no way that an algorithm is colorblind or gender-blind. We have to think through more carefully what it means for an algorithm to be fair. And that might mean we actually do explicitly infer race and gender and compare the results by category. Are you offering lower APR and higher thresholds of credit to white men compared to black women? This is an opportunity for Apple, which prides itself on being such an edgy privacy company, to also be an edgy algorithmic fairness company.
Companies often argue that being more transparent about their algorithms would threaten their intellectual property. How do you address that?
Trusting an algorithm is different from knowing the source code. And I think they can establish trust at a statistical level. They could say, “We compared black women with this FICO score and what we offered them to white men with the same FICO score. And we found this is the difference.” They’d have to define what the test is and show us the results. But those kinds of aggregate statistical tests, if they told us the answers to them, would not give away the source code. There’s no IP issue there.
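As a rough illustration of the kind of aggregate test O’Neil describes, the sketch below groups applicants into credit-score bands and compares the average credit limit offered to two groups within each band. It is only a sketch under stated assumptions: the records, the group labels "A" and "B", the 50-point band width, and the function names are all hypothetical, and nothing here reflects how Apple or Goldman Sachs actually evaluates applicants.

```python
from collections import defaultdict
from statistics import mean

BAND_WIDTH = 50  # assumed width of a credit-score band, chosen for illustration


def average_limit_by_band(records, band_width=BAND_WIDTH):
    """Average credit limit for each (score band, group) pair."""
    limits = defaultdict(list)
    for r in records:
        # Round the score down to the start of its band, e.g. 765 -> 750.
        band = (r["score"] // band_width) * band_width
        limits[(band, r["group"])].append(r["limit"])
    return {key: mean(values) for key, values in limits.items()}


def band_ratios(averages, group_a, group_b):
    """Ratio of group_a's average limit to group_b's within each band."""
    ratios = {}
    for band in sorted({band for band, _ in averages}):
        a = averages.get((band, group_a))
        b = averages.get((band, group_b))
        # Only compare bands where both groups have at least one applicant.
        if a is not None and b:
            ratios[band] = a / b
    return ratios


# Made-up records for illustration only; not real Apple Card data.
records = [
    {"group": "A", "score": 760, "limit": 20000},
    {"group": "A", "score": 770, "limit": 22000},
    {"group": "B", "score": 765, "limit": 10000},
    {"group": "B", "score": 780, "limit": 11000},
    {"group": "A", "score": 690, "limit": 8000},
    {"group": "B", "score": 695, "limit": 8000},
]

averages = average_limit_by_band(records)
ratios = band_ratios(averages, "A", "B")
for band, ratio in ratios.items():
    print(f"scores {band}-{band + BAND_WIDTH - 1}: group A limit is {ratio:.2f}x group B")
```

With these made-up numbers, the two groups receive the same average limit in the 650 band, while group A receives twice group B's average in the 750 band. A published table of ratios like this reveals nothing about the model's source code, which is the point of the answer above. A real audit would also need far more data and a significance test, since a handful of records proves nothing.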
Would this alleged algorithmic sexism be the sort of thing that Apple and Goldman should have been able to catch before using it on customers?
It’s actually incredible to me how often this does not get caught. It’s because we have no standards in data science. It’s not really even a field yet. I don’t think it should even be considered a science, because science has hypothesis testing and it has a well-defined concept of what it means to have evidence. We still haven’t even maintained what the most basic questions should be. This is the same kind of thing that we saw with the facial recognition software that Amazon and IBM put out to the public, bragging about how great it was. They hadn’t bothered to test to see whether it worked as well on black men as on white men. That’s one of the reasons I keep going back to those categories; you’d think by now with all of these PR debacles that these companies would say, “Let’s do some basic tests and make sure this isn’t crazy.” We need to get to a place where the embarrassment of a PR problem is so bad, or the risk of regulatory oversight is so strong, or the risk of a class-action lawsuit is so real that they start doing this a priori before deploying the algorithms.
Why have regulators not been able to catch and curb this sort of algorithmic bias effectively?
When I have the privilege of talking to lawmakers, and this is a bipartisan issue for the most part, what they respond to most viscerally is stories. Stories of a person being unfairly denied money, which led to them losing medical care, their job, and their house—real-life blood-and-gore stories. And the problem of course is that it’s really hard to know exactly what went wrong with this opaque algorithm. Most algorithmic harm flies entirely under the radar. It happens in the context of people trying to get a job, but they never get interviewed because they’re filtered out by algorithmic job hiring. So how would they know the reason they didn’t get the interview was because an algorithm unfairly labeled them as lazy or whatever it was? The problem is if you don’t know you’re a victim of algorithmic harm, then you can’t tell the story. It’s this invisible system of harm.
On the topic of this invisibility, it seems noteworthy that the person who spotted this alleged gender discrepancy in credit lines was a highly skilled software developer. Is there anything that the average consumer can look out for to detect algorithmic bias, or does it really take a deep knowledge of the technology to spot it?
Right, this is a guy who is highly skilled and powerful, and he knew that he had a right to know, and he knew that being denied that right to know was an offense. Most people are told, “It’s math and you wouldn’t understand it,” and they stop asking questions. So it almost requires a person that is as successful as this guy to say, “Yeah, that’s fucked up.” You have to be immune to that kind of math-shaming.
And it seems like part of the reason why this incident was noticed and got a lot of attention was because this algorithm is particularly consumer-facing. Do you think this would’ve gotten as much attention if the algorithm was being applied to another area, like the prison system?
That’s exactly my problem. This is one of the most used consumer-facing algorithms of all, and even this is very difficult to understand. But think about the algorithms that we don’t even know are being applied. The example I keep coming back to is that when you’re applying for a job, your application goes through a silent filter that you don’t even know exists. These algorithms are everywhere. Everywhere.