One of the latest chapters in the Facebook Files shows that despite large investments and improving technology, Facebook’s A.I. consistently fails to detect harmful content. Perhaps most strikingly, Facebook employees estimated that a whopping 99.4 percent of the content that violates the company’s policies against violence and incitement remains on the platform. The reason is that there is far too much to monitor by hand, so A.I. detection is essential—even though it struggles with this difficult task. In the words of a senior Facebook engineer: “We do not and possibly never will have a model that captures even a majority of integrity harms.”
This content moderation problem is not unique to Facebook; it plagues all the large social media platforms. However, at least with misinformation, the recent focus on content moderation is distracting us from something important: In addition to detecting misinformation on social media, A.I. can be a tool for defunding misinformation so it doesn’t spread on social media in the first place. But it’s not being used for this second purpose nearly as effectively as it could be.
The reason that a lot of dangerous misinformation exists is that it is, unfortunately, quite lucrative: Fake news brings real clicks, and with that comes real dollars in the form of ad revenue. If this ad revenue dried up, so would a lot of misinformation. We need to make it harder for publishers of misinformation to host online ads.
Google is the largest digital ad company in the world. Like Facebook and Twitter, Google displays ads on its own sites—but Google also serves as an intermediary connecting advertisers with independent sites that want to host ads. To do this, Google runs online auctions to algorithmically distribute ads to over 2 million non-Google sites in what is known as the Google Display Network. These GDN sites receive payment from advertisers for hosting ads, based on the number of views the ads receive, and they share a percentage of this payment with Google.
The nonprofit, nonpartisan U.K. organization Global Disinformation Index estimated that in 2019, disinformation sites brought in nearly $250 million in ad revenue, of which Google was responsible for nearly 40 percent. A Google spokesperson responded to a related GDI report by saying it is “fundamentally flawed in that it does not define what should be considered disinformation.” That methodological criticism did not dissuade the leaders of over a dozen of the top philanthropic organizations in the U.S. from sending Google CEO Sundar Pichai a letter raising concern over the issues brought to light in the GDI report. The annual ad revenue for misinformation (a larger category than disinformation) was recently estimated by NewsGuard Technologies, a tech company that evaluates news sources, to be more than $2.5 billion.
NewsGuard identified more than 150 sites that published falsehoods and conspiracy theories about the 2020 presidential election between Election Day and Inauguration Day. It found that 80 percent of these sites, including One America News Network and Gateway Pundit, received ad placements from Google. NewsGuard also uncovered several examples of well-respected medical organizations inadvertently placing ads on, and thereby providing funding to, sites that published harmful medical misinformation.
Google has policies disallowing certain kinds of content in the GDN. This includes some categories of election and health misinformation, and on Nov. 8 it added climate change denial misinformation to the list. Google first strips away the ad hosting privilege of individual pages that violate its policies; it only resorts to sitewide demonetization in cases of persistent, egregious violation.
The sheer scale of ad distribution necessitates a semi-automated approach to detecting policy violations, so A.I. is used to expedite and scale up the efforts of human moderators. In 2020, Google demonetized 1 billion pages for policy violations. This is an astronomical number, but, as noted above, one particular revelation from the Facebook Files is that however much bad content A.I. catches, it misses far more. For instance, Facebook estimates that it only takes down 3 to 5 percent of the prohibited hate speech on the platform, because that’s all the automated A.I. detection system is confident enough to remove without human verification.
A.I. systems cannot determine with certainty whether a piece of content violates a policy. Instead, the A.I. estimates the probability of a policy violation, and if this score is above a certain threshold, then action is taken. To keep the number of false positives low, the threshold for action is usually set extremely high. This is why investigations such as GDI’s and NewsGuard’s find so much misinformation sailing through Google’s detection system despite blatantly violating Google’s ad host policies: If the threshold is set to, say, 90 percent, then content with a score of 89 percent is deemed safe.
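The all-or-nothing thresholding described above can be sketched in a few lines of Python. This is only an illustration, not any platform's actual system: the function name `is_actioned`, the 0.90 cutoff, and the sample scores are assumptions chosen to mirror the 90 percent example.

```python
# Hypothetical sketch of all-or-nothing enforcement; names and numbers are illustrative.
ACTION_THRESHOLD = 0.90  # assumed cutoff, mirroring the 90 percent example in the text


def is_actioned(violation_probability: float, threshold: float = ACTION_THRESHOLD) -> bool:
    """Return True only when the estimated probability is above the threshold."""
    return violation_probability > threshold


# Made-up classifier scores for three pages.
scores = {"page_a": 0.97, "page_b": 0.89, "page_c": 0.12}
decisions = {page: is_actioned(p) for page, p in scores.items()}
print(decisions)
```

In this sketch a page scored at 0.89 is treated exactly like a page scored at 0.12: neither is touched.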
There’s a painfully obvious way to do much better: These scores should also be used as penalties in the algorithmic auctions Google runs to place ads on GDN sites.
Even much-maligned Facebook knows that this is how A.I. should be used. Despite only removing 3 to 5 percent of hate speech, Facebook has significantly reduced the amount of hate speech users encounter (what Facebook calls the “prevalence” of hate speech). That’s because it uses policy violation probabilities as penalties in the news feed algorithm—so that the more likely a post is to be hate speech, the less likely you are to see it, even if the A.I. isn’t confident enough to remove the post entirely. Facebook still has a long way to go, but at least with this downranking approach Facebook is using its A.I. effectively to help users see less bad stuff on the platform.
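A minimal sketch of the penalty idea, assuming a simple multiplicative discount: each candidate's base score (an engagement score in a feed, or a bid-like value in an ad auction) is scaled by one minus its estimated violation probability. The `Candidate` class, the discount formula, and the numbers are hypothetical; neither Facebook nor Google has published the formula it uses.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    base_score: float             # hypothetical relevance or bid value
    violation_probability: float  # classifier's estimate, between 0 and 1


def penalized_score(c: Candidate) -> float:
    """Discount the base score in proportion to the estimated violation risk."""
    return c.base_score * (1.0 - c.violation_probability)


def rank(candidates: list[Candidate]) -> list[str]:
    """Order candidates by penalized score, highest first."""
    ordered = sorted(candidates, key=penalized_score, reverse=True)
    return [c.name for c in ordered]


# Made-up candidates; the scores are for illustration only.
candidates = [
    Candidate("risky_site", base_score=10.0, violation_probability=0.89),
    Candidate("reliable_site", base_score=6.0, violation_probability=0.02),
    Candidate("borderline_site", base_score=8.0, violation_probability=0.50),
]
ranking = rank(candidates)
print(ranking)
```

Here the candidate scored at 0.89 would survive a 90 percent removal threshold, yet it still falls to the bottom of the ranking despite having the highest base score. That is the behavior a downranking approach is meant to produce.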
Google’s task is much easier than Facebook’s, because Facebook must evaluate each individual piece of content whereas Google can use a site’s track record to help assign penalties. In fact, Google already does this: It frequently touts its efforts to “elevate quality journalism” in search rankings by demoting sites with a track record of publishing misinformation. This has been a successful endeavor to some extent, which is partly why today we mostly hear about Facebook’s failures, even though both Facebook and Google were in the public crosshairs back in 2016 for their role in propagating misinformation. For example, Google thankfully did not repeat in 2020 its embarrassing 2016 debacle of ranking a WordPress blog falsely claiming Trump won the popular vote as the top result for the search phrase “final election results.”
There is no reason Google cannot apply the same kind of demotion in its ad auction/distribution algorithm that it already applies in its search ranking algorithm for sites that publish misinformation. Google relies on a porous all-or-nothing approach to detecting misinformation in the GDN while sitting idly on a proven method that could help enormously.
This is the peak of hypocrisy: In the private shadows of ad distribution, Google knowingly funnels money to the very same misinformation sites that it proudly fights in its public-facing search system.
When asked for comment, a Google spokesperson wrote: “We have strict publisher policies that explicitly prohibit a wide range of unreliable claims and misinformation. When we find content that violates those policies we block or remove ads from serving. Our enforcement varies—it can be specific to an individual page or extend to an entire site, depending on the nature and pervasiveness of the violations.” These strict policies are hamstrung by Google’s needlessly anemic all-or-nothing approach to enforcement.
We should continue the push for Facebook to fix its engagement-driven algorithms that are damaging our democracy, as well as the calls for more transparency so we can better understand these algorithms and hold Facebook accountable for their harms. But don’t let anti-Zuckerberg hysteria distract us from the role Google has quietly been playing in financially supporting so much of the harmful content that ends up on Facebook.
To give Google some credit, it did eventually discontinue an earlier option for sites in the GDN to remain anonymous to advertisers (a popular choice for misinformation sites). And it recently added an option for advertisers to import dynamically updated host site exclusion lists curated by third parties, which can help advertisers avoid misinformation sites if they actively endeavor to do so.
While Google is the largest ad distributor and so bears much of the responsibility here, it is not the only ad distributor funding misinformation. We need Congress to regulate the dangerous and dysfunctional digital ad market: require all online ad distributors to reveal their efforts to prohibit and downrank misinformation sites—so that there is at least some public accountability—and mandate measures to make it easier for advertisers to make healthier choices of where to place their ads, no matter which ad distributor they use.
Thanks to Frances Haugen and the whistleblowers who preceded her, we know how much harmful content is on Facebook and how little the company has done to stop it. What we haven’t focused enough on is who profits from producing this harmful content in the first place and how we can stop it before it spreads and becomes both a technological and a free speech nightmare.