Apple’s AI audiobooks have human narrators who are afraid of losing their jobs. They should be.
When Apple quietly launched a catalog of AI-narrated audiobooks in early January, it was surprising news, and it wasn’t. Robot narrators are not new: Alexa provides text-to-speech for Kindle content, and Google offers a variety of artificial voices of different genders and accents for those who want to publish “auto-narrated” audiobooks.
The difference is that Apple’s four voices – “Madison” and “Jackson,” suggested for fiction, and “Helena” and “Mitchell,” for nonfiction – sound much more natural than the digitally generated voices available elsewhere, leading to fears that they could replace human narrators entirely. A few of Apple’s voices even bear a noticeable resemblance to the voices of well-known members of the human audiobook narrator community. “There’s a bit of tension there,” Edoardo Ballerini told me. “There has been a sense that narrators should stay out of this, that they should not participate in hastening the death of their colleagues.”
Ballerini, profiled in the New York Times as “the voice of God,” is among the star narrators whose performances have become a selling point in themselves. (Knowing that I’ll get to hear the words read in Ballerini’s soulful voice has definitely made me buy an audiobook when I was otherwise on the fence.) Ballerini said he has yet to be approached with an offer to make available the velvety building blocks of an AI version of his own voice, but “I know other people who have, and some have refused. Others, it sounds like, didn’t.”
For Emily Woo Zeller – narrator of Marie Kondo’s bestseller The Life-Changing Magic of Tidying Up and winner of AudioFile magazine’s 2020 Golden Voice award – the question is more existential. By providing recordings that help artificial intelligence learn to speak more naturally, she noted, narrators engage in “another level of giving away the voice.”
Because Apple’s AI narration is shrouded in secrecy and (presumably) NDAs, there is no confirmed account of how the narrators behind the voices of Madison et al. were compensated. But Zeller pointed to the example of Susan Bennett, who unwittingly provided the voice for the original Siri, Apple’s digital personal assistant. Because the recordings that became the basis for Siri were commissioned by another company for a different purpose, Bennett, who received a one-time payment, didn’t even know she had become the voice of a million iPhones until a friend alerted her to the similarity when Siri was introduced six years later. (Apple has never confirmed whose voice was the basis for Siri, but an audio forensics expert consulted by CNN expressed “100 percent” certainty that it is Bennett.)
In the absence of solid information about Apple’s contracts with the actors it used, members of the professional storytelling community are concerned that they will be the next to be Siri-ized. They worry, as Zeller puts it, that “we get paid one sum, and the producer or publisher owns the work and everything related to it forever and ever,” effectively taking possession of the narrator’s distinctive voice.
But what about the audiobooks themselves? Is the AI good enough to make a human narrator redundant? It’s true that listening to the samples on the page Apple uses to promote the service to authors and publishers can be unsettling. Like other AI-generated content currently circulating online, they seem plausibly human. But after listening to selections from more than 25 of the AI-narrated audiobooks recently released in the Apple Books store (search “AI narration” in the Books app), I’m convinced that the technology still has a long way to go.
Part of the problem is that the types of titles that seem most likely to receive AI narration – older or self-published books that probably don’t sell enough copies to make paying a human narrator affordable – tend to be fiction, and the AI narrators are simply terrible at fiction. The majority of these audiobooks are romances and thrillers. It’s hard to imagine romance fans getting excited about dialogue from one of the genre’s sexy alpha heroes when it’s recited in the earnest female voice of “Madison,” which appears to be by far the most popular of Apple’s four options. I also listened to the in medias res opening scene of a thriller in which the narrator and his lover (perhaps some sort of scientist) launch a giant rocket on a hill overlooking London. “ ‘Don’t let me go!’ she shouted,” Jackson recited with zombie-like casualness.
Another thing the AI narrators fail at is humor. “We didn’t just move to Huxbury, we moved to the outskirts of Huxbury,” complains the 11-year-old narrator of the middle-grade novel From Ant to Eagle by Alex Little. He is appalled that his family has moved him beyond the sticks, but the line registers as nonsensical when read with the flat intonation of the AI narrator. Accents are another stumbling block. Listening to Madison narrate The Lady’s Deception, a Regency-era gothic by Susanna Craig, raises the question: Can a runaway English bride find love with a haunted Irish rebel if they both sound exactly like the same 21st-century American woman?
Professional narrators such as Ballerini and Zeller create distinct voices for the dialogue of each character in a novel. And even if a sophisticated artificial intelligence one day emerges that can change its voice depending on who is speaking, as Zeller pointed out, “Context is everything, and we can make a different choice about the way a sentence is delivered because of who is saying it to whom at any given moment in the story.” The whole point of fiction, especially genre fiction, is to deliver an emotional experience, and a narrator who is by definition incapable of having an emotional experience seems unlikely to be able to make artful decisions about how to read a dramatic scene. “You need the human factor in storytelling,” Ballerini said.
Nonfiction, however, is a different story. There are a handful of nonfiction books (many of them Canadian for some reason) in Apple’s AI-narrated stable. Like fiction, narrative nonfiction based on scenes with dialogue, such as a biography of the founder of the National Film Board of Canada, runs into the problem of AI narration flattening the drama. Similarly, even when the anecdotes in 101 Fascinating Hockey Facts are bad enough to be funny, the AI narrator recites them with a solemnity better suited to The 9/11 Commission Report.
But books like When Your Baby Won’t Stop Crying by Tonja Krautter are ideal candidates for AI narration. Their audience is limited enough that an audiobook with a human narrator might not be feasible, and their simple goal of providing information wouldn’t necessarily require a producer and editor. (If anything, the eerie calm of the AI narration seems like just the ticket for a parent run ragged by a colicky infant.) “Look, I’m not a fan of AI voices,” Ballerini said. “But there’s a fair argument that it can serve a purpose, with backlist titles and nonfiction that nobody would put into audio anyway. Here’s a tool that can make that accessible to people.” Not all sleep-deprived parents – not all readers, period – are able to read from page or screen, and an affordable method that makes more books available to them would be valuable.
Self-help, inspirational and business books lose very little in the immaterial hands of an AI narrator. “There are listeners who will listen at twice the speed,” Zeller pointed out, “so they’re not listening for the human content anyway.” Existing services already provide shortened versions of titles such as Atomic Habits and The 7 Habits of Highly Effective People so busy people can decant the books’ contents into their own brains as efficiently as possible. These readers are not going to miss the richness and nuance of a real voice.
The future of AI storytelling could get weird. Ballerini foresees a time when celebrities will license AI based on existing recordings of their voices, and “you could have Tom Hanks read your book if you’re willing to pay for it.” However, this can also create additional headaches for those tasked with managing celebrities’ public profiles. “You might hire Meryl Streep’s voice to read your erotic novel, and they might not like that,” Ballerini speculated further. “But are Meryl Streep’s lawyers really going to catch up if it’s a small title that sells 20 copies? It’s kind of the Wild West right now.”
Both Ballerini and Zeller foresee a tiered market in which high-profile or well-funded titles – a Stephen King novel or a billionaire’s memoir – get audiobooks with human narrators, while more marginal books, and ultimately mid-list titles, will increasingly resort to AI. This could easily become another marker of prestige in the publishing industry. It is also likely to undermine a segment of audiobook production where the authors of smaller titles team up with beginning narrators to form a seed market for artists still learning the narrator’s craft. “There’s not going to be that stepladder anymore,” Zeller said.
This scenario also raises a question for the human narrators whose (as yet officially unidentified) voices have served as the basis for Apple’s AI narration. If digital approximations of their voices become the familiar sound of low-budget audiobooks, will that make their own voices sound “cheap” and lower their value as human narrators? All anyone can do now is speculate, but Zeller wants the developers of this AI to realize and respect that at the core of what they’re creating is something stubbornly human: the voices of real people, with all the complexity and feeling they contain. “You don’t scale the technology,” she said, “the technology is used to scale us.”