On Tuesday, New York City’s Board of Elections released an incorrect tally of votes in the Democratic primary for mayor that appeared to show a near-dead heat during an automatic runoff—before the count was revealed as a fiasco. The results seemed to show that the lead that front-runner Eric Adams had over Kathryn Garcia and Maya Wiley had significantly narrowed. Adams’ campaign, however, pointed out a discrepancy in the tally, which led the board to withdraw the report. The board later disclosed that it had accidentally included 135,000 test ballots with the group of actual ballots, leading to the error. The snafu comes after the city’s first time using ranked choice voting, in which voters are able to order the candidates from first to last according to preference, and votes are reassigned until a candidate attains a majority. The board is expected to release a corrected intermediary vote count later on Wednesday, though the final results likely won’t come for a few weeks.
To get a better sense of what went wrong, I spoke to Cornell University computer science professor Andrew Myers, who has spent the last 18 years running an open-source online ranked choice voting system called CIVS that organizations can use to run their own elections. Our conversation has been condensed and edited for clarity.
Slate: Given your experience with ranked choice voting, what was your sense of the voting infrastructure and systems that New York City had constructed for the race? Did it seem promising?
Andrew Myers: I don’t know the details about the software they set up, but any time you’re fielding a large system that has software and humans involved and lots of complex processes, it’s hard to get it right the first time. They accidentally counted test ballots, which seems like a mistake they should have ironed out earlier in their process and noticed before they actually released results.
Did this seem like a careless human error, or could it be a more complicated technical bug?
It’s hard to speculate, but what it suggests to me is that they hadn’t done a really thorough dry run of the system before they put it online. Even if they had done that, I might expect some kinds of problems, because if you test the system with 10,000 voters and then you scale it up, suddenly there are changes and new kinds of bugs that show up, but this seems like the kind of bug you could’ve caught with a 10,000-vote test.
It seems to me that they should have noticed that the results didn’t look right. That I think is a human error, but it shouldn’t have been a human error that would’ve even had a chance to occur. When you’re running something like this, it should really be set up to be push-button, and all the processes are really well understood, and everybody knows what they’re supposed to do at each step. To include fake data accidentally suggests they didn’t have their processes rock-solid.
What is the usual procedure for test ballots?
Normally, you would expect that they would have some set of input files that specified the ballots that were supposed to be counted, perhaps separate input files for each geographic region, and maybe with mail-in votes handled separately. I would guess that among those input files, they included the file that they had been using for testing the system.
So if they had done more solid test runs, they would’ve been able to better spot test ballots that weren’t supposed to be there during the actual tabulation?
They were obviously testing the system with their test ballots, but what you would hope is that they would then conduct a mock election where they actually had all of the ballots collected and then tabulate those, and run through the whole process all the way to deciding what the final outcome of the election was. Ideally all of that would be almost entirely preprogrammed, so there wouldn’t be humans doing any of the intermediate steps. People would put inputs in at the beginning of the process, go through the whole thing, and give the final results and produce appropriate diagnostic reports along the way. This suggests to me that their processes were a little bit too manual, and that they weren’t doing enough cross-checking to see that the automatic process was doing what they thought.
The error made it seem like the difference in vote counts between front-runner Eric Adams and two of his competitors was much smaller than it actually was at that point in the tabulation process. Could this have been a result of the test ballots taking earlier polling into account, which suggested that the race was fairly close between the two?
That seems like a reasonable speculation. I could imagine that when they generated the test ballots synthetically, they used the current polling numbers at the time to generate those ballots, and then they would tend to reinforce certain directions that might not turn out to reflect the actual electorate.
Are ranked choice voting systems significantly more complicated or prone to error?
Definitely. The way that the ranked choice voting system that New York used works is it runs a whole series of simulated runoff elections using the ballots. That would tend to make it harder to spot problems. The conventional election method that we’re all familiar with, plurality voting, has one really nice property: it’s what we call a summable voting system. You can separately tabulate the outcome of the election on a per-district basis, and then just add those together to get the answer. Whereas with ranked choice voting, all of the ballots have to go to one centralized place where, if mistakes are made, they affect everything. In plurality voting, each of the districts can produce their own results. A district might make mistakes, but you can kind of analyze each district on its own and see if they got it right or not. Here [with ranked choice voting] there is one big algorithm that uses all of the ballots in a way that is a little unpredictable.
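To make the contrast Myers draws concrete, here is a minimal Python sketch. It is an illustration only, not the Board of Elections' software: the candidates `A`, `B` and `C`, the two districts, and the functions `plurality_by_district` and `instant_runoff` are all hypothetical, and the runoff loop is a simplified textbook version that ignores the details of New York's actual rules (its tie-break, for instance, is arbitrary).

```python
from collections import Counter

def plurality_by_district(districts):
    """Plurality is summable: tally each district alone, then add the tallies."""
    total = Counter()
    for ballots in districts:
        total += Counter(ballot[0] for ballot in ballots if ballot)
    return total

def instant_runoff(ballots):
    """Simplified ranked choice count: drop the last-place candidate and
    recount until someone holds a majority of the ballots still in play."""
    eliminated = set()
    while True:
        tally = Counter()
        for ballot in ballots:
            # Each ballot counts for its highest-ranked surviving candidate.
            for choice in ballot:
                if choice not in eliminated:
                    tally[choice] += 1
                    break
        if not tally:
            return None  # no usable ballots
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > sum(tally.values()) or len(tally) == 1:
            return leader
        # Arbitrary tie-break (fewest votes, then name); real rules differ.
        eliminated.add(min(tally, key=lambda c: (tally[c], c)))

# Hypothetical ballots, each listing candidates from first to last choice.
district_1 = 4 * [["A", "B", "C"]] + 3 * [["B", "C", "A"]] + 2 * [["C", "B", "A"]]
district_2 = 3 * [["A", "C", "B"]] + 2 * [["B", "C", "A"]] + 4 * [["C", "B", "A"]]

print(plurality_by_district([district_1, district_2]))  # first-choice totals
print(instant_runoff(district_1), instant_runoff(district_2))  # per-district runoffs
print(instant_runoff(district_1 + district_2))  # needs every ballot at once
```

In this made-up example the first-choice tallies simply add up across districts, but the runoff winners do not: district 1 on its own picks B, district 2 picks C, and only a count over the pooled ballots gives the citywide result (C, even though A leads on first choices). That is the centralization Myers describes, and it is also why a stray file of ballots in the pooled input can shift every round.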
What can the Board of Elections do to make sure this sort of error doesn’t happen again?
I would say probably more automation and more verification, more cross-checking and making sure that at each stage, the data that’s available is correct. At any stage of the computation, you should be checking that it’s doing the right thing, but at the same time you shouldn’t have humans moving data around manually.
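One way to picture the kind of automated cross-check Myers is describing is a gate that runs before any tabulation starts. The sketch below is an assumption-laden illustration, not a description of how the board's system works: the `Batch` record, its fields, and `check_batches` are hypothetical, and it assumes each input file is labeled as test or real and carries an independently recorded ballot count to compare against.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Batch:
    name: str      # hypothetical identifier for one input file
    is_test: bool  # assumed label: True for files created to test the system
    ballots: int   # ballots actually found in the file
    expected: int  # count recorded independently (assumed to be available)

def check_batches(batches):
    """Return a list of problems; an empty list means tabulation may proceed."""
    problems = []
    for batch in batches:
        if batch.is_test:
            problems.append(f"{batch.name}: test batch in production input")
        if batch.ballots != batch.expected:
            problems.append(
                f"{batch.name}: {batch.ballots} ballots, expected {batch.expected}"
            )
    return problems
```

The point of a check like this is that it runs automatically at a fixed stage and stops the process on any mismatch, rather than relying on someone noticing that the totals look off after results are published.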
So in the optimal scenario, it’d be pretty much automated throughout, but you would have more people involved checking to see how it’s going at every stage.
Yeah. We know how to do that with the existing plurality voting system. The processes for doing that effectively have evolved over hundreds of years. We just don’t have as much experience with this kind of stuff.