When Apple quietly launched a catalog of A.I.-narrated audiobooks early in January, it was surprising news, and it wasn’t. Robot narrators are not new: Alexa provides text-to-speech for Kindle content and Google offers a suite of artificial voices of various genders and accents for those wishing to publish “auto-narrated” audiobooks.
The difference is that Apple’s four voices—“Madison” and “Jackson” suggested for fiction, “Helena” and “Mitchell” for nonfiction—sound much more natural than the digitally generated voices available elsewhere, leading to fears that they could replace human narrators altogether. A few of Apple’s voices are even noticeably similar to the voices of well-known members of the community of human audiobook narrators. “There’s a little tension there,” Edoardo Ballerini told me. “There has been a sense that narrators should stay away from this, that they shouldn’t participate in the hastening of their colleagues’ demise.”
Ballerini, profiled in the New York Times as “the voice of God,” is among the coterie of star narrators whose performances have become a selling point in themselves. (Knowing that I’ll get to hear the text read in Ballerini’s soulful voice has certainly prompted me to buy an audiobook when I was otherwise on the fence.) Ballerini said he hasn’t been approached yet with an offer to provide the velvety building blocks for an A.I. version of his own voice, but “I know other people who have, and some have refused. Others, it sounds like, did not.”
For Emily Woo Zeller—narrator of Marie Kondo’s bestselling The Life-Changing Magic of Tidying Up and winner of AudioFile magazine’s 2020 Golden Voice award—the issue is more existential. By providing recordings that help artificial intelligence learn to speak more naturally, she noted, narrators are participating in “another level of giving the voice away.”
Because Apple’s A.I. narration is shrouded in secrecy and (presumably) NDAs, there’s no confirmed account of how the narrators behind the voices for Madison, et al., were compensated. But Zeller pointed to the example of Susan Bennett, who unwittingly provided the voice for the original Siri, Apple’s digital personal assistant. Because the recordings that became the basis for Siri were commissioned by another company for another purpose, Bennett, who received a one-time payment, didn’t even know that she’d become the voice of a million iPhones until a friend alerted her to the similarity when Siri was introduced six years later. (Apple has never confirmed whose voice was the basis for Siri, but an audio-forensics expert consulted by CNN expressed “100 percent” certainty that it’s Bennett.)
In the absence of solid intel on Apple’s contracts with the actors it used, members of the professional narrator community are concerned they’ll be the next to be Siri-ized. They worry, as Zeller puts it, that “we get paid one sum and the producer or publisher owns that work and everything related to it forever and ever,” effectively taking possession of the narrator’s distinctive voice.
But what about the audiobooks themselves? Is the A.I. good enough to render a human narrator superfluous? It’s true that listening to the samples provided on the page Apple uses to promote the service to authors and publishers can be disconcerting. Like other A.I.-generated content currently circulating online, they seem plausibly human. But after listening to selections from more than 25 of the A.I.-narrated audiobooks recently released in the Apple Books store (search “AI narration” in the Books app), I’m convinced that the technology still has a long way to go.
Part of the problem is that the types of titles that seem most likely to receive A.I. narration—older or self-published books unlikely to sell enough copies to make compensating a human narrator affordable—tend to be fiction, and the A.I. narrators are simply terrible at fiction. The majority of these audiobooks are romances and thrillers. It’s hard to imagine romance fans thrilling to dialogue from one of the genre’s sexy alpha heroes when it’s recited in the earnest female voice of Madison, which seems by far to be the most popular of Apple’s four options. Likewise, I listened to the in medias res opening scene of a thriller in which the narrator and his lover (some kind of scientist, perhaps) are setting off a gigantic rocket on a hill overlooking London. “ ‘Don’t let go of me!’ she shouted,” recited Jackson with zombie-like placidity.
Another thing the A.I. narrators fail at is humor. “We didn’t just move to Huxbury, we moved to the outskirts of Huxbury,” complains the 11-year-old narrator of the middle-grade novel From Ant to Eagle by Alex Little. He’s appalled that his family has relocated him beyond the sticks, but the line registers as nonsensical when read with the flat intonation of the A.I. narrator. Accents are another stumbling block. Listening to Madison narrating The Lady’s Deception, a Regency-era gothic by Susanna Craig, poses the question: Can a runaway English bride find love with a haunted Irish rebel if they both sound like the exact same 21st-century American woman?
Professional narrators like Ballerini and Zeller create distinct voices for the dialogue of each character in a novel. And even if a sophisticated A.I. someday emerges that can alter its voice depending on who’s speaking, as Zeller pointed out, “Context is everything, and we may make a different choice about the way that a sentence is delivered because of who is saying it to whom at any given time in the story.” The whole point of fiction, particularly genre fiction, is to deliver an emotional experience, and a narrator definitionally incapable of having an emotional experience seems unlikely to be able to make artful decisions about how to read a dramatic scene. “You need that human factor in storytelling,” Ballerini said.
However, nonfiction is another story. There are a handful of nonfiction books (many of them Canadian, for some reason) in Apple’s A.I.-narrated stable. Like fiction, narrative nonfiction that relies on scenes with dialogue, such as a biography of the founder of the National Film Board of Canada, runs into the problem of A.I. narration flattening the drama. Similarly, even when the anecdotes in 101 Fascinating Hockey Facts are meant to be amusing, the A.I. narrator recites them with a solemnity better suited to The 9/11 Commission Report.
But books like When Your Baby Won’t Stop Crying by Tonja Krautter are ideal candidates for A.I. narration. Their audience is limited enough that an audiobook with a human narrator might not be feasible, and their simple goal of delivering information wouldn’t necessarily require a producer and editor. (If anything, the eerie calm of the A.I. narration seems like just the ticket for a parent run ragged by a colicky infant.) “Look, I’m not a fan of A.I. voices,” Ballerini said. “But there is a reasonable argument that it can serve a purpose, with backlist titles and nonfiction that nobody was going to put into audio anyway. Here is a tool that can make it accessible for people.” Not all sleep-deprived parents—not all readers, period—are able to read from page or screen, and an affordable method that renders more books accessible to them would be valuable.
Self-help, inspirational, and business books lose little in the immaterial hands of an A.I. narrator. “There are listeners who will listen at two times the speed,” Zeller pointed out, “so they’re not listening for the human content anyway.” Existing services already provide abridged versions of titles like Atomic Habits and The 7 Habits of Highly Effective People so that busy people can decant the books’ contents into their own brains as efficiently as possible. Those readers are not going to miss the richness and nuance of a real voice.
The future of A.I. narration could get weird. Ballerini foresees a time when celebrities will license A.I. based on existing recordings of their voices, and “you could have Tom Hanks read your book if you’re willing to pay for it.” This, however, could also create further headaches for those charged with managing celebrities’ public profiles. “You might hire Meryl Streep’s voice to read your erotica novel, and they might not like that,” Ballerini further speculated. “But are Meryl Streep’s attorneys really gonna catch up with it if it’s some small title that sells 20 copies? It’s kind of the Wild West right now.”
Both Ballerini and Zeller predict a tiered market in which high-profile or well-funded titles—a Stephen King novel, or the memoir of a billionaire—get audiobooks with human narrators, while more marginal books, and eventually midlist titles, will increasingly resort to A.I. This could easily become yet another signifier of prestige in the publishing industry. It will also likely undercut a segment of audiobook production in which the authors of smaller titles pair up with beginning narrators to form a starter market for performers still learning the narrator’s art. “There’s not going to be that stepladder anymore,” Zeller said.
This scenario also raises a question for the human narrators whose (as yet officially unidentified) voices have served as the basis for Apple’s A.I. narration. If digital approximations of their voices become the familiar sound of low-budget audiobooks, will that in turn make their own voices sound “cheap” and lower their value as human narrators? All anyone can do now is speculate, but Zeller wants the developers of this A.I. to realize and respect that at the heart of what they’re creating is something stubbornly human: real people’s voices, with all the complexity and feeling they contain. “You’re not scaling the technology,” she said. “The technology is being used to scale us.”