The Internet’s Dizzying Citogenesis Problem

Circular reporting is a real problem on platforms like Wikipedia—and it’s harder to solve than it looks.

The cycle of citogenesis depicted around a recycling symbol.
Photo illustration by Slate. Photo by Alex Knight/Unsplash.

Welcome to Source Notes, a new Future Tense column in which Stephen Harrison explores Wikipedia, digital knowledge, and the search for a fact-based world.

Two weeks ago, Dr. James Heilman discovered something strange. The Canadian emergency room physician and avid Wikipedia contributor noticed that DrugBank, an online database for drug information, was copying text directly from Wikipedia. Although Heilman considers Wikipedia’s medical content to be of surprisingly good quality, he was concerned—because he didn’t just find DrugBank copying and citing Wikipedia; he had also found several examples of Wikipedia likewise copying and citing DrugBank.

For example, the DrugBank page for protamine sulfate, a medication often used in heart surgery, included information taken from the medication’s Wikipedia page. But the medication’s Wikipedia page cited the medication’s DrugBank page. This circular referencing created a problem for proving the veracity of the information found on the Wikipedia page—where was the original source material really from? Could it be trusted? Heilman found five other Wikipedia articles about medications with similarly circular references, and he speculated that there could be many more. He contacted DrugBank about the issues, and they are working to resolve them. Meanwhile, Heilman (known by the Wikipedia username “Doc James”) expressed concern on the WikiProject Medicine talk page—the forum where Wikipedia editors discuss improvements to articles about medicine and health—that “citogenesis has become a reality.”

Author and Xkcd creator Randall Munroe first coined the term “citogenesis” to describe this type of circular referencing in a popular 2011 comic. In it, Munroe depicts citogenesis as a four-step process:

XKCD comic showing the four steps of citogenesis: (1) A user invents facts out of thin air and adds them to a Wikipedia page without reference. (2) A rushed writer, gathering information from the site, then incorporates that false information into their independent summary of the subject. (3) A diligent Wikipedia editor then adds a reference to that writer’s published work on the subject as a source for that original Wikipedia entry. (4) Other outside people continue to read and repeat that false information, resulting in wide distribution of the initial falsehood.

Mathematicians and Ashton Kutcher fans might conceptualize this as a sort of digital knowledge butterfly effect.

It’s difficult to quantify the prevalence of citogenesis incidents involving Wikipedia because the platform doesn’t require citations for new entry of information, so there’s no systematic way to identify each potential circular reference. Instead, many incidents are stumbled upon by users themselves when they come across an eyebrow-raising entry. Examples include a series of study guides that relied heavily on Wikipedia content and were then used as a source themselves, and the Jar’Edo Wens hoax, in which a user made an unsourced Wikipedia article about a fictitious Australian aboriginal god with a name strikingly similar to “Jared Owens” (the only other contribution associated with the user’s IP address was an entry for another fake Australian god, “Yohrmum”). By the time Jar’Edo Wens was outed as a hoax almost 10 years after its initial entry, the Wikipedia article had been name-checked in multiple books. But not every incident is caught, or caught right away.

The most infamous cases of citogenesis are those that resulted in the widespread distribution of a falsehood in news media. The list of such confirmed incidents of citogenesis ranges from the quirky, like a joke about the inventor of the butterfly stroke that ended up in the Guardian, to the more significant, like important biographical discrepancies. In December 2016, an anonymous user edited Secretary of State Mike Pompeo’s Wikipedia page to state that Pompeo had served in the Gulf War. The false claim was repeated in outlets such as the Wall Street Journal and the New Yorker until, in April 2018, the CIA confirmed that Pompeo did not in fact serve in the Gulf War.

Of course, if a reporter writes an article based on flawed information from Wikipedia, they bear the responsibility for the inaccuracy. Relying on Wikipedia as an independent source is simply bad journalistic practice. The online encyclopedia explicitly states that it does not guarantee the validity of information contained on the site, which is maintained by a community of volunteer editors and not subject to a formal review process. But, at the same time, the practical reality is that millions of internet users (not just journalists) rely on Wikipedia for information. And because the nonprofit Wikimedia Foundation says it aims for Wikimedia projects to become the essential infrastructure of free and trustworthy knowledge, it makes sense for the movement to at least consider some countermeasures to reduce the risk of misinformation being spread by circular reporting.

But, as I realized after corresponding with Wikipedian Liam Wyatt, who wrote his thesis on how historians can use Wikipedia to understand society, the solution to the citogenesis problem may not be so simple.

Like a lot of Wikipedians, Wyatt has his favorite example of the phenomenon: the “Stalin’s bathroom” incident, which shows just how sticky the problem can get. In February 2009, an anonymous editor changed the German Wikipedia page for Karl-Marx-Allee (a major boulevard in Berlin) to claim that it was known as “Stalin’s bathroom,” implying it was the Soviet dictator’s toilet. Multiple publications repeated this bad information. Feeling guilty, the Wikipedia editor who created the fake nickname tried to remove the bad information—however, when he tried, other editors reversed his change because there were “reliable” citations for it. As it turns out, the original prankster editor was a journalist, so he decided to out himself as the original anonymous editor who executed the stunt in an article for the newspaper he worked for—a move that gave him a proper counterreference to use to finally correct the record.

Wyatt explained that, besides the occasional case of sloppy journalism, the underlying difficulty behind such citogenesis problems is that the Wikipedia community does not require citations for every bit of information added to the site. As a matter of policy, all content on the encyclopedia must be verifiable, but there’s no requirement that it be verified when it’s first added. This reflects the tendency of Wikipedia editors toward eventualism, a focus on the long-term rather than the immediate value of the project. This ethos tends to err on including stubs and low-quality articles with the idea that they will be improved over time versus strictly deleting anything but high-caliber additions.

But mandating citations for every fact isn’t necessarily a practical, nor desirable, answer to the citogenesis problem. For one, Wikipedia already upholds some important standards about sourcing, like requiring references when adding new content for biographical pages for living persons. But many of the exceptions exist for a purpose. For stylistic reasons, the lead or introductory section of an article usually does not include citations, since these openings are usually written with more generality than the latter, more detailed sections of the page. Wikipedia also does not require citations for common, uncontroversial knowledge (i.e., you don’t need to cite that the sky is blue). And annoying citation overkill is discouraged.

Wyatt explained via email that, even setting these conventions aside, requiring citations for all forms of information would also be a problem because of the cultural barriers it would create. He wrote that, for example, there’s an increasingly visible discussion about how Wikipedians can best accommodate and incorporate forms of knowledge that fall outside the traditional Western model for a reliable source. He said community representatives have discussed, without resolution, how it might be possible to integrate knowledge from societies with oral history cultures into Wikipedia “in a way that doesn’t undermine our core policies of ‘reliable sources’ and ‘no original research.’ ” But requiring traditional citations for every fact would almost certainly be a step backward for this type of inclusion.

Logistically, too, it gets complicated. I asked the Wikipedia administrator known as “TheSandDoctor,” or TSD, to consider a hypothetical technological fix to the citogenesis issue: What if a content filter required users to include a citation whenever they added information to the encyclopedia? TSD explained that this would be “problematic on several fronts.” For one, the only way TSD sees such a filter being implemented would be to require that any and all changes or additions to a page include a citation. This would, in practice, mean that minor edits like adding punctuation or a space would need citations, which is not only nonsensical, but would also make the often tedious task of editing Wikipedia more frustrating.

TSD also predicted that this type of forced technological change would not be met with open arms by the Wikipedia editing community, especially those who believe that Wikipedia is a work in progress wherein perfection is not required. Many such editors would prefer the occasional citogenesis incident to a technological control that would throttle the sharing of knowledge, TSD wrote.

Could there be a middle ground solution that doesn’t problematically mandate citations for every new entry? Perhaps. The German and Polish language editions of Wikipedia use a technology called “pending changes,” which requires new and anonymous users to get any of their changes approved by another editor in good standing. A user graduates to free “live” editing only after making several hundred edits that have been manually approved. This initiation process would presumably select for users with good sourcing habits. But this system, too, seems to conflict with the ethos that Wikipedia is the free encyclopedia that “anyone can edit.” So far, English Wikipedia has only implemented the pending changes tech for a small group of articles, such as controversial biographies that are frequently vandalized.

For now, it’s up to vigilant editors like Heilman to identify suspected citogenesis incidents by relying on their experience and instincts. And, of course, it’s still on everyday readers to apply critical thinking when engaging with new information, whether the platform is Wikipedia or not. Remember that fictions like “Stalin’s bathroom” smell as both metaphor and fact.

This page has been updated with an image of the xkcd.com comic.

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.