Future Tense

Twitter Has Set Itself Up for an Enormous New Content Moderation Problem

A Twitter bird slowly disappearing in a circle.
Animation by Slate

First, there was Snapchat. Then, there were Instagram Stories. By 2017, Facebook, Instagram’s parent company, added “Stories” as well, prompting memes about everyone rolling out “stories,” including bananas and fidget spinners.

And this week, Twitter rolled out fleets. Fleets are virtually identical to Instagram Stories: You can post text or upload a photo or video, and then obsessively check who’s viewed your inane story. The most appreciable difference is that fleets’ selection of text colors is slightly more pastelly than Instagram Stories’.

The feature is not only unoriginal, but also strangely late to the game; even LinkedIn beat Twitter to the punch by adding its stories feature a couple of months back. It’s not clear what the platform gains from adding fleets, but the feature does complicate one area in which Twitter, like its competitors, is struggling: moderation.

The platform has long been criticized for shortcomings in its moderation policies, which can leave users vulnerable to harassment and let misinformation spread unchecked. In 2020, it has stepped up its moderation, cracking down in particular on coronavirus and voting-related mis- and disinformation. Most notably, as President Trump continually tweets falsehoods about the election, the platform has added labels to his tweets noting that those claims are disputed. Facebook, too, added such labels, but according to a data scientist interviewed by BuzzFeed News, they only decrease post sharing by around 8 percent, which is unlikely to change posts’ reach.

Instagram Stories demonstrates that ephemeral, disappearing posts are much more difficult to moderate than static content. First, there’s the time limit: With hundreds of millions of Instagram Stories posted a day, it’s hard to imagine a way to keep tabs on all of them before they vanish. And as a 2018 Vice piece pointed out, it can be difficult to determine whether an individual story post violates terms of service. Stories are often posted in a sequence, so an individual post may lack broader context. For instance, I often see stories that are meant to be scrolled through as a whole, where each individual post includes just one word: the first might just say “HAPPY,” the next “BIRTHDAY,” and the last “FIDO.” One could just as easily post a similar sequence with misinformation, like “5G causes coronavirus.” No individual post from that sequence might trigger a review, but as a whole, it would still be propagating a dangerous message.

Instagram and Facebook appear to use a combination of machine detection (recognizing hate symbols, for instance) and human moderators to determine whether posts violate policies, which includes looking at whole stories for full context. In my experience using Instagram Stories, it appears there’s automatic detection for keywords like “coronavirus,” “COVID-19,” or “pandemic”: Whenever I’ve posted screenshots of news stories with those words, Instagram has notified me that it’s added a link to a “health source” below the post. (The link goes to the CDC’s page on the coronavirus, which is certainly better than nothing, but is also unlikely to provide much information about the more complex issues around COVID-19.)

But the ins and outs of how fleets are moderated are not yet clear, and Twitter has long been tight-lipped about the specifics of its moderation practices. A Twitter spokesperson told me that fleets, like tweets, must follow all Twitter rules, and may also contain warnings and labels. That’s all well and good, but people are already trying to poke holes in fleets’ moderation standards; Marc-André Argentino, a data scientist studying extremist groups at Concordia University, posted a thread on Twitter saying he’d been able to fleet “banned URLs, videos and disinformation about the election results.” In a since-deleted thread, Argentino said his account was locked after other users reported him but that his tweets were still visible while he was banned, and that with “sockpuppet” accounts he runs, he was able to post videos with Nazi symbolism, graphic gore, and suicide threats, among other reprehensible things. From that, Argentino concluded that Twitter has no automated detection to catch banned imagery or URLs. I asked Twitter about Argentino’s allegations, and a representative reiterated that fleets must follow Twitter rules and said that the platform took action on Argentino’s account on that basis.

While Argentino’s posts may have revealed some shortcomings in Twitter’s fleet moderation, Joan Donovan, a disinformation and media researcher who directs the Harvard Kennedy School’s Shorenstein Center, voiced concern about the ethics of Argentino’s tactics. “There are many ways to interrogate platforms’ features without harming people,” she told me in an email. (I reached out to Argentino for further comment, but did not hear back.)

Other users reported that fleets appear to introduce a loophole in sending direct messages to users. Normally, you can decide whether you want to have open DMs, meaning anyone can send you a message, or closed DMs, so only people whom you follow can. But by replying to a fleet, it appears you may be able to message a user with closed DMs. To test this, I made sure my Twitter account was set to accept messages only from people I follow. I then unfollowed my sister-in-law. She wasn’t able to message me initially, but she was able to by simply replying to my fleet. (To really simulate the potential creepiness, she messaged me: “send me a pic of your feet.”) She also tried to message me from a couple of burner accounts, but was unable to. It seems this only works if two users have a previous DM history, which hopefully screens out some potential for abuse, but does not completely eliminate it; you can certainly receive unpleasant or harassing DMs from people you’ve messaged with before.

All in all, it seems Twitter will have no shortage of work to do to manage the impact of fleets. “It’s difficult to know why Twitter would roll out a new feature in the middle of a pandemic when they are already struggling to keep up with moderating the volume of content,” says Donovan.

Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.
