It was 7 a.m. when Benjamin Perkin’s parents got an alarming call. The person on the other end of the line identified themselves as a lawyer and said Perkin had killed a U.S. diplomat in a car accident, was in jail, and needed money for legal fees. The lawyer then put Perkin on the phone. “Hey Mom and Dad, I love you. I appreciate you. I need this money,” he said. A few hours later, the lawyer called back, requesting $15,000 that same day to get Perkin out of jail. “Yes, I need the money,” Perkin assured his parents.
Perkin’s parents were terrified. They had a brief moment of wondering if this could possibly be a scam, but they remembered that the voice on the phone sounded just like their son’s. They quickly got the money from a bank and sent it through a Bitcoin terminal, as instructed. A few hours later, the real Perkin made his regular phone call to his parents, just checking in. That’s when they realized they’d been scammed. It wasn’t Perkin on the phone that morning—but it did sound like him, thanks to A.I.
On Sunday’s episode of What Next: TBD, I spoke with Pranshu Verma, a reporter for the Washington Post, about how generative A.I. has made it frighteningly easy to replicate someone’s voice, creating a powerful new tool for scammers. Our conversation has been edited and condensed for clarity.
Lizzie O’Leary: How does this technology work?
Pranshu Verma: Essentially, you take 30 seconds, a minute, a few minutes of a person’s voice. You can find that on Facebook, Instagram, YouTube … You upload it to a website. And then that A.I. software analyzes everything that makes your voice unique—how old you are, your accent, your gender, your regional differences, every little part of your voice. Then, it goes into a vast database of voices that it has analyzed. After that, if you’re going to say, for example, the word “hello,” it knows how you would say hello, how every little phoneme might sound. Using these online tools, you can type in “Hi, I’m in danger,” and using a clone voice, you can make it say that.
How common are these types of A.I. scams?
It’s so new, and it’s really hard to say how common it is. After I published my story, I got several emails from people saying that they’ve filed police reports because they’ve been victims of similar scams. But the thing is, we know so little about these scams. One, because it’s so new, and two, by the time the crime has been committed, you don’t have much information for the police to go off of. Three, if you’re submitting a complaint to the FTC, you’re explaining to them what you think happened. People don’t really have a good idea yet of whether they’re victims of an A.I. voice-cloning scam. For example, one of the grandmothers I talked to in the beginning didn’t know it was an A.I. voice-cloning scam. She thought maybe these scammers call anybody, and when they find somebody to pick up the phone, they can somehow analyze who that person is, and then there’s a way to pretend that they’re somebody else. It’s really hard for people to know what’s happened to them.
Is there any way to know who’s behind these things? Often scams are part of organized criminal syndicates. Do we know that in this case?
We don’t know this yet. According to experts that we’ve talked to, it probably is similar to what we’ve seen with other imposter scams, which are run by large criminal syndicates that can operate domestically or internationally. They can spoof phone numbers, and they have whole elaborate setups to take advantage of, often, the elderly.
You found most of the scam victims in your story on TikTok and YouTube, in part because that was the only avenue they had to tell their stories and warn other people.
Here’s the thing: If you lose money, you go to the police, and then the police want to know how to investigate it. We talked to somebody who had two decades of experience in consumer fraud, and they said, even if you lose $20,000, yes, it’s devastating for that grandmother or that father or mother who loses that money, but for a police organization, if they’re small, they might not have a dedicated team to track down consumer fraud. If they’re big, they have other priorities, too. Do they put three detectives on the case with no information, potentially a spoofed phone number, no idea where that phone number’s from, no other leads? They have to weigh the resources, and so that’s why you have people sometimes falling between the cracks.
One of the things that I think is so fascinating and scary about this is how easy the technology is to use. I used a company called ElevenLabs, took a 30-second clip of you, and was able to make a cloned version of your voice. (We reached out to ElevenLabs for comment, but we didn’t hear back by recording time.) What is the positive use case that companies put out there for their product?
We’re still figuring out what the positive use cases are. If you watch Top Gun: Maverick, Val Kilmer, for example, wasn’t able to speak for parts of that, but they recreated his voice. They had a lot of Val Kilmer’s voice to go off of from his previous movies, and now with an actor that can’t speak anymore, we’re able to recreate his voice.
But the higher-profile use of this technology, at least right now, is to make deep fakes.
Things like Joe Biden, Donald Trump, and Barack Obama debating the best fast-food chains, or Joe Rogan selling libido-boosting coffee on TikTok. It’s not hard to imagine internet trolls using this tech to cause real havoc or more sophisticated groups using it for disinformation. Has that happened yet?
ElevenLabs, for example, has gotten a lot of heat in the past month or so because somebody recreated Emma Watson’s voice saying passages from Adolf Hitler’s Mein Kampf. She never said any of that. Now imagine that taken to the nth degree: An election happens, a politician says something that they would never say, and then that spreads over WhatsApp or whatever. And that’s possible, because the guardrails still aren’t good enough yet. ElevenLabs has put in some guardrails. They say, “If you have a free account with the company, you can’t recreate voices like this,” but the next level up of an account is just $5, so that’s not too prohibitive. People are trying to put watermarks in these types of audio saying, “Oh, it’s A.I.-generated.” But by the time you have a convincing-sounding thing spreading on the internet or on WhatsApp channels, the damage is often already done. The industry is having to grapple with whether this should be so easy to do and how to actually protect against it.
Is there any regulation of this?
Unfortunately, not that I’m aware of. Our politicians are still trying to figure out how to regulate social media companies like Facebook, Twitter, and Instagram. I have, personally, no idea when they’re going to get around to actually regulating generative artificial intelligence. There has been some chatter at the Supreme Court hinting that generative artificial intelligence companies should be responsible for the content that they help create. But again, this is still so early. We don’t see much by way of federal regulation yet, and right now it’s on the individual companies to do that.
A.I. is so hot right now. There has to be some awareness on Capitol Hill or in the agencies that this is a force that they are going to have to reckon with soon. Do you see that seeping into general consciousness?
Yeah, I am seeing that seep into agencies like the FTC, the FBI. They are aware, and the hard part is that they need people to be vigilant. People are their best advocates, so they have offered very specific ways of protecting yourself. They all say, “If you ever get a call from a loved one asking for money, pause that call right away and then call that loved one yourself. Call that number that you know, and confirm that it’s actually them.” If somebody’s asking you to give money in the form of a gift card or at some weird location—to go to a Bitcoin terminal or go meet in a car someplace—be incredibly suspicious. Trust your gut here.
Informally, we’ve actually heard from a lot of families who, in order to protect their elders, have created somewhat of a safe word. So the son or the daughter might say, “Hey, Mom, Dad, if I’m actually at risk, I’m going to say the word ‘pineapple,’ and that’s how you’ll know that it’s actually me asking for money.” People are trying to find creative ways to actually get through this, which is very interesting and engaging, but it’s also very sad that we have to create safe words right now to ensure that the conversations we have are trustworthy.
What do the victims of these scams want people to know?
Be vigilant. When it comes to money, don’t trust that the person over the phone is who they say they are, unless you’ve verified it.
And I will say in both cases I reported on, people felt very embarrassed. There was a sense of shame and embarrassment because they felt like they had been duped by something they shouldn’t have been duped by. So they wanted people to know: Don’t feel personally shamed about this. This is really new stuff. Be vigilant, but understand that this is a whole new world we’re going into, and it comes with a lot of risks.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.