When the Russian warship Moskva sank in the Black Sea south of Ukraine, some 500 crew members were reportedly on board. The Russian state held a big ceremony for the surviving sailors and officers who were on the ship. But, considering Russia’s history of being not exactly truthful when it comes to events like this, many people wondered whether these were actual sailors from Moskva. That’s where Aric Toler comes in.
Toler is director of research and training for Bellingcat, the group that specializes in open-source and social media investigations. He used facial recognition software to identify the men in the video through images in Russian social media, and found that most of the men were indeed sailors from Sevastopol, the town the ship was operating out of. Toler is the first to admit that on its own, this morsel of information isn’t going to change the course of the war, but it is a tiny, momentary clearing in the fog of an often obscured picture.
On Sunday’s episode of What Next: TBD, I spoke with Toler about how facial recognition technology is being used in unexpected—and sometimes alarming—ways in Ukraine. Our conversation has been edited and condensed for clarity.
Lizzie O’Leary: One of the main facial recognition tools you use in your work—the one you were using to investigate that video of the Russian sailors—is a program called Findclone. It’s cheap, about $5 a month, and it’s easy to sign up for. How does Findclone work?
Aric Toler: At some point, in 2018 or so, the guy or guys who run it scraped basically all of VK, which is the Russian Facebook. It has hundreds and hundreds of millions of users. It’s extremely popular. They scraped every single photo from the site. They then took these billions of photos and they ran a machine learning algorithm to do really good facial recognition on it. They were able to link every photo back to its original post and profile. So every single photo on VK was put through the wringer of machine learning facial recognition and put on the site, and you can put a face on there and search it. The benefit from this is not just that you search a face and you see the person’s profile, which happens maybe 50 percent of the time. What happens more often is, we do a lot of work with Russian spies and security service officers, people who don’t usually have accounts. But their wives do. And their old college buddies do, and their brothers do, and their moms do, and their kids do. You’ll find them in the background of photos at a birthday party they had, you can see their face behind a cake. Then you look at the name and identity of the person uploading the photo and you can figure out that’s them because they have the same last name or they live in the same town.
What’s it like to be using these kinds of tools? Is it exciting? Scary?
The first time you use it your mind is blown, because you’re like, how is this possible? How could this be so easy? You just plug it in and there you go. There’s the profile. There’s their family. We have some spy that we’ve been researching for months and months, and we have a passport photo and all of a sudden the whole world is opened up. It’s also very creepy, because you’re doing the searches and you’re pulling up all these innocent bystanders. You’re finding their family, people who have no idea or who have limited understanding about who these people are that we’re investigating. You feel like you’re getting a view into their life that you probably shouldn’t be able to. But because of this tech, it’s possible.
But it’s not just the tech. It’s also possible because of the specific environment you’re operating in. The Russian internet, especially when it comes to data and privacy, is worlds away from the American one.
There are constant, constant, constant data leaks coming out of Russia. Massive leaks. The entire government registry of cars registered in Moscow for three years was leaked, so if you own a car in Moscow, then your passport number, your date of birth, your address, your phone number is public and online. Russia is kind of a wild, wild East of data privacy and data legislation because there basically is none. About a year ago, for about 50 bucks or so, you could buy someone’s cellphone records. There are a lot of stories about wives who think their husbands are cheating on them and they buy their husband’s cellphone data and they see that they’re calling some woman at 2 a.m. and they know that they’re cheating on them. This doesn’t exist in the U.S. at all.
In the U.S., a lot of the conversations, particularly around facial recognition tech, have been about domestic policing and data gathering. How would you describe the role that that same tech has been playing in war over the past few years?
Russia’s kind of its own thing here. There’s lots of facial recognition services out there in Russia, and it’s been used in the war for years now. Basically, since these services started popping up around 2016, 2017, people have been using this to identify soldiers. Especially back in the earlier stage of the war, when Russia denied ever being involved in the war at all, people would run these on fighters and soldiers to prove that they’re Russian mercenaries or soldiers or whatever. It’s not quite apples to apples when you compare this with how it is used by police and government forces in the U.S. and probably the U.K. and Europe.
I hate to use the phrase “perfect storm,” but in some ways it seems applicable because you’re talking about a place with a very different data culture and the bottom-up abilities to use things like Findclone. There are just so many different permutations of it.
Yeah, it’s like four or five things all hitting it at once. And one more thing you can add to that is just petty corruption. This is a big reason why so much of this data is out there: people leak this information just because their salaries are not high enough, so on the side they go to databases and sell data. A lot of people ask, “Why don’t you do this for the CIA? Why don’t you do this for Mossad or MI6 or whatever?” Well, you know, it’d be great if we could, but these circumstances don’t exist basically anywhere else on earth, this perfect storm of data.
Recently the circumstances got a bit more complicated. The Ukrainian government started working with Clearview AI, the controversial American facial recognition company. Clearview is infamous for working with law enforcement and for scraping people’s photos from the internet without their consent. The company says it offered its services to five Ukrainian government agencies free of charge. According to multiple news reports, that’s led to Ukrainian soldiers scanning the faces of dead Russians to find their identities, then contacting their families.
They published videos about this, showing the chats that they have with the mothers of the dead soldiers. This is kind of like schadenfreude—it’s supposed to be like, look at them suffering, look at the family, look at the mother being shocked and horrified about her son being dead. That’s kind of the most twisted and worst version of this, but I think there are some cases that could be called benign good cases.
I think there is a narrative in the press about the sort of “good use case” of facial recognition in this war, maybe on the side of Ukraine. As someone who deals in the gray areas, how do you react to seeing all those different pieces of information come out?
It’s really hard to unpack. The more nefarious and dubious stuff is Ukrainian soldiers contacting the families of dead soldiers in order to harass them and mock them. The more benevolent or good use case is Ukrainian-Russian independent journalists trying to contact the family to talk about, “Have you received compensation for the death? Have you been told about this? Have you been told to hush up by the government?”
Do you think that we could ever have average people using facial recognition in the U.S. the way they are in Russia?
All it takes is one big dump of photos from Instagram or Facebook, and then it could be applied. It’ll never be to the degree of Russia and Ukraine, just because the base of the country is just so corrupt and rotten, in both countries, with data leaks and corruption and all that stuff. But we have our own American version of this, we have lots of access to commercial apps and providers that could be leaked at some point.
There’s a million examples of people buying commercial data. This is legal. It shouldn’t be, but it’s legal. You can buy people’s geolocation data from different apps that track it. Most famously there was an example of someone who was able to buy the geolocation data that came from Grindr. And they used that to out a Catholic priest. If we ever do get to that singularity of data being available to everyone, it’ll probably be through some kind of leak or selling data through commercial apps related to your face. Maybe there’s a big data dump from Facebook or LinkedIn. If this happened for Facebook, I think the world would burn down with how people would use it.
There’s been work by Timnit Gebru, Joy Buolamwini, and others that shows that facial recognition works less well on darker skin tones. That’s something that I think about a lot if we’re thinking about an American application. How does race fit into this picture?
That’s definitely an issue because these things are tested most often on white people. This is something I run into a lot when I run Findclone on people who are not ethnically Russian or white. Russia’s a very diverse country with hundreds of races and ethnicities and religions. Sometimes I run facial recognition for people who are from the Far East, like Buryats, an ethnic group that’s near Mongolia, and the results are much worse than if I were to run on an ethnically Russian person. If I run a facial recognition of a Black person on Findclone, it’ll bring up Barack Obama and NBA players. That’s mostly because there’s not a lot of Black people in Russia.
That’s what the AI was trained on.
Exactly. So the data is just not as good.
I’m struck by the tension between two things you said. Number one is that “It’s going to happen. These tools are going to be more widespread,” but also, “If it happened with Facebook, the world would burn down.”
It already exists behind closed doors. Google, Facebook, Amazon, Microsoft, all these places already have extremely powerful facial recognition that works. It’s kind of like a dam. What could burst the dam is if this stuff is made commercially available. All it takes is one person on GitHub, plus a data leak. And once that happens it’s fair game.