The age of big data is frightening to a lot of people, in part because of the implicit promise that algorithms, sufficiently supplied with data, are better at inference than we are. Superhuman powers are scary: Beings that can change their shape are scary, beings that rise from the dead are scary, and beings that can make inferences that we cannot are scary. It was scary when a statistical model deployed by the guest marketing analytics team at Target correctly inferred from purchasing data that one of its customers—sorry, guests—a teenage girl in Minnesota, was pregnant, using an arcane formula involving elevated rates of buying unscented lotion, mineral supplements, and cotton balls. Target started sending her coupons for baby gear, much to the consternation of her father, who, with his puny human inferential power, was still in the dark. Spooky to contemplate, living in a world where Google and Facebook and your phone, and, geez, even Target, know more about you than your parents do.
But we ought to spend less time worrying about eerily super-powered algorithms, and more time worrying about crappy ones.
For one thing, crappy might be as good as it gets. Yes, the algorithms that drive the businesses of Silicon Valley get more sophisticated every year, and the data fed to them more voluminous and nutritious. There’s a vision of the future in which Google knows you—where by aggregating millions of micro-observations (“How long did he hesitate before clicking on this … how long did his Google Glass linger on that …”) the central storehouse can predict your preferences, your desires, and your actions, especially vis-à-vis what products you might want, or might be persuaded to want.
It might be that way! But it also might not. There are lots of mathematical problems where supplying more data improves the accuracy of the result in a fairly predictable way. If you want to predict the course of an asteroid, you need to measure its velocity and its position, as well as the gravitational effects of the objects in its astronomical neighborhood. The more measurements you can make of the asteroid and the more precise those measurements are, the better you’re going to do at pinning down its track.
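The asteroid-style claim—more and better measurements pin the answer down in a predictable way—can be made concrete with a toy sketch. This is just averaging noisy measurements, not real orbital mechanics, and the velocity and noise figures below are invented:

```python
# Why more data helps for asteroid-type problems: independent measurement
# errors average out, so the estimate improves predictably (roughly like
# 1/sqrt(n)) as the number of measurements grows.

import random

random.seed(0)            # fixed seed so the sketch is reproducible
TRUE_VELOCITY = 17.0      # hypothetical "true" velocity, km/s
NOISE = 0.5               # hypothetical measurement error, km/s

def estimate(n_measurements):
    """Average n noisy measurements of the velocity."""
    samples = [TRUE_VELOCITY + random.gauss(0, NOISE)
               for _ in range(n_measurements)]
    return sum(samples) / len(samples)

# With ten measurements the estimate is decent; with ten thousand it is
# far tighter -- and, crucially, the improvement is predictable in advance.
few = abs(estimate(10) - TRUE_VELOCITY)
many = abs(estimate(10000) - TRUE_VELOCITY)
```

Nothing like this predictable payoff is guaranteed for chaotic systems, which is the contrast the next paragraphs draw.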
But some problems are more like predicting the weather. That’s another situation where having plenty of fine-grained data, and the computational power to plow through it quickly, can really help. In 1950, it took the early computer ENIAC 24 hours to simulate 24 hours of weather, and that was an astounding feat of space-age computation. In 2008, the computation was reproduced on a Nokia 6300 mobile phone in less than a second. Forecasts aren’t just faster now; they’re longer-range and more accurate, too. In 2010, a typical five-day forecast was as accurate as a three-day forecast had been in 1986.
It’s tempting to imagine that predictions will just get better and better as our ability to gather data gets more and more powerful. Won’t we eventually have the whole atmosphere simulated to a high precision in a server farm somewhere under The Weather Channel’s headquarters? Then, if you wanted to know next month’s weather, you could just let the simulation run a little bit ahead.
It’s not going to be that way. Energy in the atmosphere burbles up very quickly from the tiniest scales to the most global, with the effect that even a minuscule change at one place and time can lead to a vastly different outcome only a few days down the road. Weather is, in the technical sense of the word, chaotic. In fact, it was in the numerical study of weather that Edward Lorenz discovered the mathematical notion of chaos in the first place. He wrote, “One meteorologist remarked that if the theory were correct, one flap of a sea gull’s wing would be enough to alter the course of the weather forever. The controversy has not yet been settled, but the most recent evidence seems to favor the sea gulls.”
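Lorenz's point fits in a few lines of code. The logistic map below is a standard toy chaotic system, not a weather model, but it exhibits the same sensitive dependence on initial conditions he described:

```python
# Sensitive dependence on initial conditions, illustrated with the
# logistic map x -> r*x*(1 - x) at r = 4, a standard chaotic example.

def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map from x0 and return the full trajectory."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_trajectory(0.2)
b = logistic_trajectory(0.2000001)  # perturbed in the seventh decimal place

# Early on, the two trajectories are indistinguishable; after a few
# dozen steps, the tiny perturbation has grown until they bear no
# resemblance to each other -- the seagull's wing flap, in miniature.
early_gap = abs(a[5] - b[5])
late_gap = max(abs(x - y) for x, y in zip(a[30:], b[30:]))
```

The perturbation grows roughly exponentially with each step, which is exactly why a hard horizon on prediction exists no matter how precisely you measure the starting state.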
There’s a hard limit to how far in advance we can predict the weather, no matter how much data we collect. Lorenz thought it was about two weeks, and so far the concentrated efforts of the world’s meteorologists have given us no cause to doubt that boundary.
Is human behavior more like an asteroid or more like the weather? It surely depends on what aspect of human behavior you’re talking about. In at least one respect, human behavior ought to be even harder to predict than the weather. We have a very good mathematical model for weather, which allows us at least to get better at short-range predictions when given access to more data, even if the inherent chaos of the system inevitably wins out. For human action we have no such model, and may never have one. That makes the prediction problem massively harder.
In 2006, Netflix launched a $1 million competition to see if anyone could write an algorithm that beat Netflix's own at recommending movies to customers. The finish line didn't seem very far from the start: The winner would be the first program to do 10 percent better at recommending movies than Netflix did.
Contestants were given a huge file of anonymized ratings—about a hundred million ratings in all, covering 17,770 movies and almost half a million Netflix users. The challenge was to predict how users would rate movies they hadn’t seen. There’s data—lots of data. And it’s directly relevant to the behavior you’re trying to predict. And yet this problem is really, really hard. It ended up taking three years before anyone crossed the 10 percent improvement barrier, and it was only done when several teams banded together and hybridized their almost-good-enough algorithms into something just strong enough to collapse across the finish line. Netflix never even used the winning algorithm in its business; by the time the contest was over, Netflix was already transitioning from sending DVDs in the mail to streaming movies online, which makes dud recommendations less of a big deal. And if you’ve ever used Netflix (or Amazon, or Facebook, or any other site that aims to recommend products to you based on the data it’s gathered about you), you know that the recommendations remain pretty comically bad. They might get a lot better as even more streams of data get integrated into your profile. But they certainly might not.
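For a sense of what contestants were actually computing, here is a minimal baseline of the sort Netflix Prize entrants started from: predict each rating as the global average plus a per-user bias and a per-movie bias, scored by root-mean-squared error, the contest's metric. The ratings below are made up, and this sketch is nothing like the winning ensemble:

```python
# Baseline recommender sketch: rating ~ global mean + user bias + movie bias.
# Toy data only -- the real Netflix Prize training set held ~100M ratings.

from collections import defaultdict
from math import sqrt

# (user, movie, rating) triples -- hypothetical
train = [
    ("ann", "alien", 5), ("ann", "amelie", 4),
    ("bob", "alien", 2), ("bob", "brazil", 3),
    ("cat", "amelie", 5), ("cat", "brazil", 4),
]

mean = sum(r for _, _, r in train) / len(train)

# Per-user and per-movie biases: average residual from the global mean.
user_resid, movie_resid = defaultdict(list), defaultdict(list)
for user, movie, rating in train:
    user_resid[user].append(rating - mean)
    movie_resid[movie].append(rating - mean)

user_bias = {u: sum(v) / len(v) for u, v in user_resid.items()}
movie_bias = {m: sum(v) / len(v) for m, v in movie_resid.items()}

def predict(user, movie):
    """Baseline prediction; unknown users or movies fall back to the mean."""
    return mean + user_bias.get(user, 0.0) + movie_bias.get(movie, 0.0)

# Entries were scored by root-mean-squared error (RMSE); the winner had
# to beat Netflix's own RMSE by 10 percent.
rmse = sqrt(sum((predict(u, m) - r) ** 2 for u, m, r in train) / len(train))
```

Squeezing out the last fractions of a percent beyond baselines like this is what took three years and a hybrid of many teams' models.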
Which, from the point of view of the companies doing the gathering, is not so bad. It would be great for Target if they knew with absolute certainty whether or not you were pregnant, just from following the tracks of your loyalty card. They don’t. But it would also be great if they could be 10 percent more accurate in their guesses than they are now. Same for Google. They don’t have to know exactly what product you want; they just have to have a better idea than competing ad channels do. You don’t need to outrun the bear!
Predicting your behavior 10 percent more accurately isn’t actually all that spooky for you, but it can mean a lot of money for them. I asked Jim Bennett, the vice president for recommendations at Netflix at the time of the competition, why they’d offered such a big prize. He told me I should have been asking why the prize was so small. A 10 percent improvement in their recommendations, small as that seems, would recoup the $1 million in less time than it takes to make another Fast and Furious movie.
It’s no big deal if Netflix suggests the wrong movie to you. But in other domains, a bad guess is more dangerous. Think about algorithms that try to identify people with an elevated chance of being involved in terrorism, or people who are more likely than most to owe the government money. Or the secret systems the rating agencies use to assess the riskiness of financial assets.
Here, the mistakes have real consequences. It’s creepy and bad when Target intuits that you’re pregnant. But it’s even creepier and worse if you’re not pregnant—or a terrorist, or a deadbeat dad—and an algorithm, doing its business in a closed and opaque box, decides that you are.