The language in Nature was pretty mild as far as freakouts go. ChatGPT and other similar A.I. tools, the editors wrote, threaten “the transparency and trustworthiness that the process of generating knowledge relies on … ultimately, research must have transparency in methods, and integrity and truth from authors.” The editor of Nature’s chief rival, Science, similarly blew his stack in a most genteel manner: “An AI program cannot be an author. A violation of these policies will constitute scientific misconduct no different from altered images or plagiarism of existing works,” he wrote.
These might seem like gentle warnings, but to academics who submit research papers to peer-reviewed journals like Science and Nature, the specter of being charged with research misconduct—potentially a career-wrecking accusation—for using A.I. is about as subtle as an air-raid siren. It’s a bright red neon sign saying, in all caps, “Don’t go there.”
All this pearl-clutching isn’t the result of a hypothetical problem. Just a few days earlier, Nature’s journalists had covered how ChatGPT-written abstracts were science-y enough to fool fellow scientists, and, worse, that A.I.-co-authored articles were already working their way into the peer-reviewed literature. Just as university teachers have already started finding ChatGPT-written essays in the wild and journalists have been discovering A.I.-written news articles, scientific journals suddenly realized that the abstract threat of machine-written “research” articles was already becoming very, very concrete. It seems like we’re just weeks away from a tsunami of fake papers hitting the peer-reviewed literature, swamping editors and drowning the scientific process in an ocean of garbage.
The journals are absolutely right to worry; ChatGPT and, presumably, its A.I. successors yet to come represent a potential existential threat to the peer review process—a fundamental mechanism that governs how modern science is done. But the nature of that challenge isn’t fundamentally about the recent, rapid improvement in A.I. mimicry as much as it is about a much slower, more insidious disease at the heart of our scientific process—the same problem that makes A.I. such a threat to university teaching and to journalism.
ChatGPT isn’t the first research-paper-writing machine to drive journal editors to distraction. For nearly two decades, computer science journals have been plagued with fake papers created by a computer program written by MIT grad students. To use this program, named SCIgen, all you have to do is enter one or more names and, voilà, the program automatically spits out a computer science research paper worthy of submission to a peer-reviewed journal or conference. Worthy, that is, if none of the peer reviewers bothered to actually read the paper. SCIgen-written articles were such transparent nonsense that anyone with the slightest expertise in computer science should have spotted the hoax before finishing the first paragraph. Yet not only were SCIgen papers regularly getting past the peer review process and into the pages of scientific journals, it was happening so regularly that, in the mid-2010s, journals deployed an automated detector to try to stem the tide. Nowadays, unretracted SCIgen papers are harder to find, but you can still spot them in bottom-feeder journals every so often.
Back then, the real story wasn’t about how sophisticated SCIgen was; after all, it was a simple script that could only create a really poor facsimile of a research article, one that shouldn’t fool anyone who belongs within 10 feet of a computer science journal. As I wrote in Slate eight years ago, “The likelihood of a real computer scientist being fooled by a SCIgen paper is roughly the same as that of a classicist mistaking pig Latin for an undiscovered Cato oration.” A SCIgen paper shouldn’t pass even the most cursory “peer review” by the most clueless “peer” in the field. Yet here, too, the journal editors were freaking out to the point of deploying automated defenses against SCIgen attacks.
This says way more about the peer review process than it does about the state of computer mimicry. In theory, the idea of the peer review process is that a scientific claim needs to be vetted by two or three or more scientists of similar stature who prod the research and look for errors, logical holes, and other problems. (According to scientific legend, reviewer No. 2 is always hard to please.) Peer review is a quality-control measure that’s supposed to ensure that what gets published meets a certain minimum standard—not that the research is right, but that it’s not obviously wrong, and that it has some value in advancing knowledge in the field. And that’s why peer-reviewed publications are so much more valued in science than unvetted, “preprint” articles: By putting its imprimatur on an article, a peer-reviewed journal is telling the scientific world that that minimum standard is met. And for the most prestigious journals in the scientific world, Science and Nature and the New England Journal of Medicine and the like, those minimum standards are supposed to be quite high.
But vetting a scientific document takes a lot of thought and work, and the scientists who do it aren’t generally paid by the journals they’re doing all this labor for. It shouldn’t come as a surprise that they—or the graduate students they dragoon into doing the work for them—don’t always do the best job of review. And as the number of publications and paper submissions increases, the pool of scientists willing to do serious peer reviews is being spread thinner and thinner. It’s a scaling problem: The process of vetting information is what gives the peer-reviewed journal its value—but because doing that process well is so labor-intensive, it’s a bottleneck that can’t expand the way the journals themselves have. The result is an almost inevitable decline in the quality of peer review. However, most journals are more than willing to take the hit, as the quality of peer review is mostly invisible. For most, the only sign that things are amiss is when obvious garbage starts getting past the gatekeepers—the gatekeepers that are the whole point of the journal in the first place.
What ChatGPT is doing, like SCIgen before it, is generating BS that should (at least at this point) fail a serious peer review. Sure, ChatGPT and its A.I. siblings are orders of magnitude more sophisticated than SCIgen, but if you cut through the verbiage, much of what these programs produce is wrong, even nonsensical. They make up facts and generate fake sources to back up their lies. A proper peer review, which checks references and vets facts, should catch the problem.
Take, for example, one of the peer-reviewed papers Nature described—“Can GPT-3 write an academic paper on itself, with minimal human input?” Posted on a preprint server in July, the article was reportedly rejected by one journal and then accepted to another with revisions. While we don’t yet know the nature of those revisions, the paper that the authors have made public is full of errors that a competent peer review should catch. The references are a mess: Two seem to be referencing the same online article in different ways—an article that doesn’t appear to exist. And if it did exist, it couldn’t contain the claimed information about the GPT-3 bot, as the reference dates from four years before GPT-3 premiered. Similarly, another reference seems made up out of whole cloth. Two of the referenced works do exist, but a cursory check shows that the contexts of those references are entirely bogus: They don’t say what the peer-reviewed paper says they say. (The researchers submitting the paper did, happily, notice that the references were nonsense.) More important: The A.I.-generated text was anodyne and information-free; it added nothing to anyone’s understanding of anything. The bot is constitutionally unable to advance knowledge in the field because it’s fundamentally a recycling engine, regurgitating what it’s been trained on in a controlled manner. That’s not the way to advance knowledge. By definition, it doesn’t belong in a peer-reviewed journal.
Peer-reviewed journals are worried because the ease of producing good-looking but vacuous articles risks exposing how thin the peer review barrier separating real science from nonsense truly is. At heart, it’s the same reason university professors and journalists are worried, too.
Peer review is essentially an informational quality control system. When professors grade essays and tests, they’re performing a similar function: They’re trying to sort good from bad to ensure that only students who meet a minimum set of standards wind up getting the stamp of approval from the university. Journalists at serious news outlets are also in the business of information discrimination; in theory, at least, they hoover up facts and opinions from a variety of sources and synthesize a story out of them, and only after multiple layers of fact-checking and editorial polishing does that story get presented to the public. When professors aren’t able to distinguish between a student and an A.I., or editors put spit-polish on a GPT-generated turd rather than discarding it altogether, it shows that the discrimination process is badly broken.
The choice is to tackle the problem head-on—consciously improving standards despite the pressure to keep getting bigger and faster and cheaper—or to let these discrimination processes decay. Luckily, A.I. provides tools to help us with the latter; for every A.I.-written paper or essay, A.I. can help overworked scientists and professors generate a peer review or a grade.
In other words, if the A.I. apocalypse is here, it’s not so much a Terminator as it is an Idiocracy.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.