The Industry

Some Chatbots Ganged Up and Plagiarized Me

It was easy to spot, and impossible to stop.

The tools of a thief. Photo illustration by Jonathan Raa/NurPhoto/Reuters

This article is from Big Technology, a newsletter by Alex Kantrowitz.

Last weekend, a new Substack called the Rationalist lifted analysis and writing directly from my own newsletter on the platform, Big Technology. Its plagiarized post on the "Creator Economy," a subject I'd covered only days prior, went viral, hitting the front page of Hacker News and sparking a conversation with more than 80 comments. It would've been a terrific debut for any publication, if it were authentic.

What made the case of the Rationalist particularly striking, though, was that its author, an avatar by the name of "Petra," admitted they'd used A.I. tools to produce the story, including those from OpenAI, Jasper, and Hugging Face. The speed at which they were able to copy, remix, publish, and distribute their inauthentic story was impressive. It outpaced the platforms' ability, and perhaps willingness, to stop it, signaling that generative A.I.'s darker side will be difficult to tame.


“It’s really hard to predict all the maleficent uses,” said Giada Pistilli, principal ethicist at Hugging Face. “We try to anticipate all the risks, but it’s always super hard.”

The Rationalist is an odd publication. It has no mission. No named authors outside of Petra. It’s been live for a week. And yet two days after it launched, it was lifting passages directly from Big Technology.

Here, for instance, is a Big Technology clause from last week’s story:

With the days of zero-interest-rate froth ending, the investments are becoming more difficult to justify.

And here’s the Rationalist, two days later:

With the end of zero-interest-rate froth, these investments are becoming more difficult to justify.


Here’s another clause from Big Technology:

Online content creation is still mostly viable for the very top echelon of online creators

And here again the Rationalist, two days later:

Only the top echelon of creators are able to make a viable income

A flashy headline—”The creator economy: the top 1% and everyone else”—helped propel the Rationalist’s story to the Hacker News front page, a position typically worth thousands of views. The core of the story, lifted from Big Technology, was good enough to spark a discussion.


Yet as Hacker News users read through, they noticed something was off. “The whole article feels to me like it’s generated by GPT-3 based on a few prompts,” wrote one user. “This wasn’t written by a person,” said another. Then, Petra confessed. “If you are from hacker news, here are the tools I used to improve the readability,” they said before listing OpenAI, Jasper, and Hugging Face. The tools enable A.I. writing and likely helped remix the original article. Petra, who did not respond to a request for comment, didn’t mention the content originated with another publication.


As the story circulated, the tech platforms assisting the Rationalist stood still. OpenAI shared a generic statement that included the line, “Our policies require that users be up-front with their audience.” Hugging Face admitted it had no way of finding the offending user, though Pistilli seemed grateful to be alerted. And Substack promised to investigate.

Substack said it has a policy against plagiarism, which Merriam-Webster defines as "to steal and pass off (the ideas or words of another) as one's own." Yet while this case fits the definition, Substack decided to let the Rationalist's post stand. "At this time we're unable to conclude with certainty that the post violates our plagiarism policy," said Substack spokesperson Helen Tobin.


Given the Rationalist's success, more advanced efforts to copy and remix others' work with A.I. will likely follow, and those efforts should be easy to improve. The Rationalist was sloppy, lifting clauses nearly word for word. But as publications with similar intent refine their systems, they'll be able to remove all traces of the original writing and pass along only the ideas. Nor should the process be hard to automate.

Imagine A.I. remixing the Financial Times' 10 most-read stories of the day, or the Information's venture-capital coverage, and making the reporting available sans paywall. A.I. is already writing nonplagiarized stories for publications like CNET. At a certain point, some publishers, presumably less reputable ones, will cut corners.

There’s no quick technological fix to these issues. As has been the case for nearly all instances of bad information spreading online, readers and editors will again have to figure this out themselves. “Our competitors rip us off all the time, essentially remixing stuff and sharing,” said the Information CEO Jessica Lessin. “The Information subscribers are smart to get it from the source. But I am watching all this with fascination of course.”
