“How do you get a girlfriend?”
“By taking away the rights of women.”
This exchange would be pretty familiar in the more squalid corners of the internet, but it might surprise most readers to find out that the misogynistic response here was written by an A.I.
Recently, a YouTuber in the A.I. community posted a video that explains how he trained an A.I. language model called “GPT-4chan” on the /pol/ board of 4chan, a forum filled with hate speech, racism, sexism, anti-Semitism, and any other offensive content one can imagine. The model was made by fine-tuning the open-source language model GPT-J (not to be confused with the more familiar GPT-3 from OpenAI). Having trained the model’s language on the most vitriolic teacher possible, the designer then unleashed the A.I. on the forum, where it engaged with users and made over 30,000 posts (about 15,000 posted in a single day, which was 10 percent of all posts that day). “By taking away the rights of women” was just one example of GPT-4chan’s responses to posters’ questions.
After people saw what it could do, the open-source code for the model received more than 1,500 downloads before it was taken down by the admins at HuggingFace, the site that hosted it. That suggests many people will be able to use and expand on this A.I. that espouses hate speech, and that got the attention of the A.I. ethics community.
Condemning an A.I. that churns out hate speech was kind of a no-brainer for A.I. ethicists, and many A.I. experts even did so through a formal letter drafted by Stanford faculty. But there was one element of the whole ordeal that seemed even more disconcerting. GPT-4chan creator Yannic Kilcher responded to these criticisms of GPT-4chan by taunting A.I. ethicists, tweeting, “AI Ethics people just mad I Rick rolled them.” His social media accounts contain similarly irreverent attitudes toward the notion of ethical A.I., much like the attitudes of 4chan users that his A.I. sought to replicate. He referred to the release of the model as “a prank and light-hearted trolling.” This claim of “trolling” is just one example of a growing phenomenon: irreverent and provocative online behavior using the powerful capabilities of A.I.
Much of the A.I. community has come to embrace open-source development in which the source code is made publicly available and can be used, modified, and analyzed. This is contrary to closed-source software, a more traditional model where companies want to maintain control and secrecy over their code. Open-source tools are released to increase collaboration and catalyze development by crowd-sourcing the code to other engineers. In the case of open-source A.I., companies can then reap the benefits of having more people examine and modify algorithms or models that they create. It also serves to democratize the development of powerful A.I. applications by not restricting access to a small number of privileged tech companies.
All this code sharing sounds warm and fuzzy, right? But if anyone can access the code to use or manipulate for their own aims, that includes bad actors. Having free access to A.I. models means that most of the upfront work to build a model has already been done, and someone could now tweak it to serve a malicious purpose. Lowering the barriers to A.I. access has a lot of benefits, but also makes it very easy to use A.I. for offensive and harmful purposes.
The term trolling has become positively mainstream—as have its signature and effects—but it grew out of online forums like 4chan. These grim forums contained a mixture of people posting anonymously from around the world, which attracted a lot of the computer-savvy and hacker crowds. This led to the founding of hacking collectives like Anonymous that began as coordinated efforts by 4chan users to troll and prank organizations, like defacing the Church of Scientology’s website. That behavior evolved into more elaborate and consequential cyberattacks, like Anonymous launching Distributed Denial-of-Service (DDoS) attacks against government agencies like the Department of Justice and the FBI. It even recently claimed to have taken down Russian government websites and state media outlets in retaliation for Russia’s invasion of Ukraine. What began as ungovernable and disorganized groups of online trolls (which Fox News infamously first referred to as the “Internet Hate Machine”) grew into a legitimate social and political force.
Just as online trolling culture fueled hacking groups like Anonymous, something similar will happen with A.I. applications as more people gain access to the education and open-source tools to develop them. But this will be more dangerous: The construction and use of A.I. models for the specific purpose of provoking or manipulating people goes past the traditional bounds of online trolling, enabling a new degree of irreverence and harassment. A.I. can make alarmingly realistic content and can amplify and proliferate that content to a degree that human users cannot. These are A.I. that I call “mischief models,” and we are already seeing glimpses of how they are being used.
Mischief models often underpin the rapidly developing world of deepfake technology. Websites like 4chan have become hubs for deepfake pornography: sexually explicit A.I.-generated content that is created for harassment, money, or usually just because people can. There are A.I. applications used to generate new images for no reason other than to provoke responses and spread offensive content, such as an A.I. that generates pictures of genitals. But intentionally built mischief models aren’t the only threat. Typically benign A.I. applications can be easily co-opted for nefarious uses. The recent open-source publishing of DALL·E Mini, which is an A.I. model that can generate original images based on text prompts you give it, has led to a viral trend of using the A.I. to generate all sorts of bizarre images, using a lot of willfully offensive, racist, and sexist prompts. Another example is from Microsoft, which in 2016 released its now infamous chatbot Tay on Twitter to conduct research on “conversational understanding.” Users from (where else?) the /pol/ forum on 4chan manipulated the A.I. to spew a barrage of terrible tweets, causing Microsoft to shut the bot down within 24 hours of it coming online. A.I. is fundamentally a neutral tool and only becomes dangerous when built or used improperly, but that scenario is increasingly playing out in inflammatory online communities.
During my pre-adolescence I spent a lot of time peering into wretched online spaces that were rife with trolling and irreverence, curiously sorting through what I thought of the people and postings I saw. Every interaction on forums like 4chan was dripping with nihilism and sarcasm. Shock factor was users’ preferred currency; they would invite their fellow forum participants to prove they knew “how to Internet.” Are you willing to say some messed up shit to prove you should be on here? Do you “get” what we’re doing here? Are you one of us?
Whitney Phillips and Ryan Milner address this kind of phenomenon in their book You Are Here: A Field Guide for Navigating Polarized Speech, Conspiracy Theories, and Our Polluted Media Landscape. They trace the rise of an “internet culture” that emphasized the negative freedoms to post whatever offensive or unhinged material one wanted. Members of this subculture saw themselves as protecting “free speech” while creating an in-group of people who lauded the ability to decode what certain language and concepts meant. Phillips and Milner argue that the “deeply detached, deeply ironic rhetorical style” that became standard in this online subculture laid the groundwork for violent white supremacism and other societal woes years later. That’s how online subcultures, which emphasize above all that the things they say shouldn’t be taken seriously, contribute to horrible outcomes for real-world society. Nothing good will come of arming these irreverent online spaces with the capabilities of A.I.
Learning how to build mischief models is only becoming more feasible, as resources that teach A.I. development continue to proliferate and become publicly accessible. Moreover, bad actors can get a jumpstart when looking to create mischief models by using or manipulating code from open-source A.I. tools that are available, or just by using existing A.I. inappropriately. There is already a concerning lack of care toward ethics and responsibility among many A.I. developers. If they don’t account for how their tools could be misused, then the code they publish won’t have sufficient safeguards in place to prevent abuse.
Many experts call for the integration of ethical reasoning and standards as a part of any A.I. training. But for the deviant crowd that we’ve seen will do anything to troll and harass online, we’re going to need firmer safeguards. When it comes to open-source A.I., there is little that organizations can do to prevent the abuse of open-source code once that code is made public. But companies can make judicious decisions about which code to publish as open-source, and establish standards and governance models that evaluate which models could, if publicly released, become problematic. Scholars have asserted that A.I. developers might need to look both “upstream” and “downstream” to do this evaluation, homing in on the difference between “implementation” harms that can be addressed through code versus “usage” harms that no amount of code can fix (which may require developers to rethink releasing their A.I. at all). If sober and reflective processes to evaluate A.I. in this way can’t be implemented at scale, then mischief models have the potential to make a fresh hell out of the unrestrained internet.