You Can’t Clean Up a Data Spill

And it makes the Cambridge Analytica scandal even more maddening.

1s and 0s spilling into a pile against a green background.
Photo illustration by Slate

There’s no such thing as a cleanup site for data spills. That’s because when data leaks, it can be duplicated far faster than anyone can mop it up. So when Mark Zuckerberg sat down with Wired recently as part of his media tour to cool down the Cambridge Analytica scandal, saying that Facebook would undertake tremendous efforts to “fully confirm that no Facebook community data is out there,” I could only assume he was speaking figuratively, hyperbolically, or disingenuously.

Zuckerberg’s company is in trouble because frighteningly porous data-sharing policies allowed information on 50 million Facebook users to inappropriately end up in the custody of a controversial political-data firm, Cambridge Analytica, which did digital voter targeting for both Ted Cruz’s and Donald Trump’s campaigns during the 2016 election cycle. Cambridge Analytica swore when it was caught a few years ago that it deleted the data, and it swears now it didn’t have it by the time it was hired by the Trump campaign, even though reporting in the Guardian and New York Times says otherwise. Pending an investigation, we don’t know whether it’s true that Cambridge Analytica deleted the data when it said it did, or at all. But we can be fairly sure that this valuable information is still out there, no matter who deleted it, somewhere beyond the hedges of Facebook’s walled garden.

Much of the outrage at Facebook in recent weeks has focused on our discomfort with the company’s data-privacy practices, though it also has a lot to do with an ambient anger toward the company that has simmered since the aftermath of the 2016 election, during which Facebook inadvertently abetted malicious fake news and a Russian disinformation campaign. But less attention has gone toward the practical actions that have to happen moving forward, and what users and the company can do about the data that’s out there. Contemplating how to fix the problem transforms the issue of Facebook’s privacy blunders into an almost impossible debacle because, unfortunately, there’s just no good answer.

As we all (should!) know, when you use Facebook, it collects data on you. Facebook stores which pages or pictures you spend the most time looking at; what you search for, click on, and like; which comments you expand; which events you check out; what you shop for on other websites across the internet; whom you recently called or texted; and which photos you appear in, even if you’ve never seen the picture before. If one of your Facebook friends happened to download the app made on behalf of Cambridge Analytica in 2014 and decided to take the personality quiz, that person’s profile data, along with the profile data of that person’s friends, including yours, likely ended up in a cache owned by a company whose job was to install a conservative in the White House. After the data was collected, it could have been sold or passed on by Cambridge Analytica or the psychology professor who made the app, Aleksandr Kogan, and then sold or passed on again. This could have happened before or after Cambridge Analytica started working for Trump, and before or after Facebook made the firm swear it deleted the data. All of which is why, when Zuckerberg told CNN that Cambridge Analytica “legally certified” to Facebook that the data had been destroyed in 2015 and suggested he thought at the time that would’ve been enough, it sounded pretty naïve.

Facebook probably has no idea where the data is. And even though Zuckerberg said that his company plans to do a full investigation into third-party app developers who acted “suspiciously,” and even though Facebook will ostensibly threaten to ban any of them from using Facebook again if they don’t comply, there’s only a slim chance that these investigations will be anything approaching comprehensive.

“Tracking down and searching where that data has gone will be incredibly difficult,” says Sarah Aoun, a digital security specialist and open web fellow at the Mozilla Foundation. “I’m not even sure it would be realistic.” Maybe it would be easier if the data were “watermarked,” meaning there was some tag on the data to indicate it was the Cambridge Analytica–obtained Facebook data. But Facebook didn’t do that, as Zuckerberg explained to Wired, and even if it had, Aoun says that “any identifiable trace relating it back to Facebook can be altered and then changed and could exist in 10 different shapes and forms online or in the hands of anyone.” One person might just have people’s emails and location information. Someone else might have a portion of the list that’s just single men younger than 25 who are interested in owning a car. Another ad targeter might have everything.

Beyond the investigation, Zuckerberg said in a Facebook post that the company plans to further tighten how developers can access user data, so if, say, you downloaded an app three years ago and haven’t touched it in more than three months, developers will no longer have access to your data. Facebook is also working on a tool that forces marketers who upload people’s email addresses to Facebook for ad targeting to certify that the user data they have was rightfully obtained.

But even with such steps, which will better secure Facebook’s systems in the future, it’s safe to bet that hackers and data-hungry web crawlers will find a way to circumvent them. Just look at how difficult it’s been for the entertainment and software industries to build digital rights management systems (that is, the digital locks on, say, an Amazon Kindle or a copy of Microsoft Word that are supposed to prevent users from making copies). Anyone who has ever torrented an expensive piece of software or bought one that was copied to a CD-R knows that these tools don’t keep pirates out. “The past is littered with digital rights management systems that have been broken,” says Adam Doupé, the associate director at the Center for Cybersecurity and Digital Forensics at Arizona State University. “There really isn’t a good tech solution.”

Even erasing data only goes so far. “Even when you think about deleting files on a hard disk, by default we think it’s gone, but we know from digital forensics that a copy of it usually still exists. It just hasn’t been overwritten yet,” Doupé says, stressing that once data makes it into the wild, there’s no telling where it’s ended up. Sandy Parakilas, a whistleblower who worked on Facebook’s app security team in 2011, told the Guardian recently that he “always assumed there was something of a black market” for Facebook data harvested from app developers who took advantage of the company’s porous policies.

This is an example of why, as blogger Maciej Ceglowski has put it, data is like toxic waste: “Just when you think you’ve buried it forever, it comes leaching out somewhere unexpected.” But data doesn’t necessarily lose its toxicity over time. There’s no half-life: If there’s Facebook information out there that shows your political activities or sexual orientation, that information could be used against you in the future. Yes, that might include blackmail, but it also could include less malevolent but still worrisome practices like ad targeting that reinforces stereotypes or markets harmful products. Doupé says that data from the Equifax and Yahoo breaches, which hit hundreds of millions of people, is probably for sale in online marketplaces where people who specialize in things like making fake credit cards and buying things with fake bank accounts trade people’s personal information.

Considering all of this, it’s hard not to wonder whether Facebook’s permissive data-sharing policies weren’t merely a result of a lack of foresight but rather reflected an ethos at the company. As longtime Facebook executive Andrew Bosworth put it in a memo recently obtained by BuzzFeed, growth is more important to Facebook than protecting the health of the people who use it. Connecting people may cost “a life by exposing someone to bullies. Maybe someone dies in a terrorist attack coordinated on our tools,” he said in the memo, which he has since deleted from Facebook’s internal system, saying it did not reflect his attitudes but was instead meant to spur debate. Whatever he meant, his words had the ring of explaining away the negative externalities of Facebook’s products. “That’s why all the work we do in growth is justified,” Bosworth wrote.

But is it justified? When Parakilas spoke to the Guardian, he said that “one of the main ways to get developers interested in building apps was through offering them access to this data.” Meanwhile, getting users hooked on addictive and time-wasting apps, like Words With Friends or FarmVille, was an important way that Facebook grew. It worked. Now Facebook is one of the most valuable companies in the world. It’s hard to imagine an ethical equation wherein one company’s drive to become even more profitable justifies letting random developers cart off its users’ data without much in place to protect those users from misuse.

The result is an unfathomable mess made even more so because of the nature of digital data: We can’t see it, and we never discuss what happens after it leaks. As routine as it has become to read news of massive data breaches, we never see headlines about cleaning them up. A company like Equifax or Yahoo may strengthen its security (while deploying its lobbyists to stymie new regulations), but once the data is out there, there’s no telling who has a copy. “We can’t fix what already happened,” says Doupé.

Facebook’s work to strengthen its security should continue, of course, as should its investigation into where user data might have landed. If Zuckerberg shares the fruits of these inquiries, it could help everyone learn how to be safer online. But Facebook also shouldn’t be trusted to do the right thing for long. The company knew in 2015, during Ted Cruz’s campaign, that tens of millions of users’ data had been wrongfully handed over to Cambridge Analytica, but Facebook didn’t notify users about what happened. The company also knew how bad Russian interference on Facebook was in the run-up to the election as early as November 2016, according to the New York Times, but didn’t come forward about it until September 2017, all the while refusing to answer questions from reporters. It wasn’t until last November that Facebook stopped letting advertisers target people based on hateful interest categories, like “Jew haters” and “threesome rape,” and even that was only after reporters from ProPublica wrote about it. Facebook is currently under investigation by the Federal Trade Commission for potentially breaking federal rules about protecting user privacy that it agreed to follow in 2011.

So if we can’t trust Facebook to take care of its users, we need to try to take better care of ourselves. There are defensive tools people should consider downloading that give us more control over who collects data on us and where that data ends up. Mozilla’s Facebook Container Extension makes it harder for Facebook to track your activity across the web, and Privacy Badger, a browser extension made by the Electronic Frontier Foundation (where, full disclosure, I used to work), stops online advertising companies from tracking your activities as you visit different websites. It’s also important to demand that lawmakers work to enact comprehensive data privacy laws, like a strong federal rule requiring companies to notify users in case there is a breach, or limits on what kinds of data websites are allowed to collect in the first place. The most recent federal online privacy law was passed more than 30 years ago, and it has to do with email. It is ill-equipped to address the vastly more complicated internet we inhabit today.

One silver lining: European data protections go into effect in May, which will require internet companies to ensure that users are able to consent to the collection of data about them and that companies are clear about how that data is used. Not following the rules could cost a company 4 percent of its annual global revenue, which for Facebook means billions of dollars. If Facebook is able to follow the law in Europe, it should have the scaffolding in place to apply the same protections to U.S. users, too. But that’s going to require pressuring our representatives to make new data privacy laws, and before we do that, we need to care enough to ask them. We should care because the problem of data spills is so frustratingly unsolvable and unavoidable. The best thing we can do is pass laws that try to make sure that when spills do happen, they affect as few people and do as little damage as possible.
