The Leaky Nature of Online Privacy

Network analysis can uncover your personal details even if you choose to hide them.

This article arises from Future Tense, a collaboration among Arizona State University, the New America Foundation, and Slate.

Online privacy is trickier than you think

In July, LinkedIn co-founder Reid Hoffman wore a strange device on his wrist as he gave a talk at the MIT Media Lab. The sensor, developed by Boston startup Neumitra, blinked to display Hoffman’s overall stress level as he spoke.* Behind him, the live Twitter feed for #MediaLabTalk scrolled by on two giant screens. One tweet noted, “You can see Reid Hoffman’s stress monitor blink faster as he struggles to recall details of an article he just read.” Hoffman’s stress level also rose each time his high-tech accessory was mentioned, as when Media Lab Director Joi Ito joked that the Media Lab planned to enhance all future talks by hooking presenters up to lie detectors. Though the device is not actually a lie detector (and even full polygraph tests are not great at spotting lies), it may have helped viewers distinguish the points on which Hoffman was confident from those he was less sure about. All this dovetailed nicely with one of the themes of his talk: With increased public scrutiny, information becomes more trustworthy.

Volunteering one’s own physiological data to demonstrate openness is not so far-fetched at a time when technological change is causing social barriers to crumble. The day when we all sport stress monitors is probably not imminent. But we may already be volunteering that information unintentionally. Recent research by Ming-Zher Poh, Daniel J. McDuff, and Rosalind Picard of the MIT Media Lab’s Affective Computing Group has found that pulse information can be gleaned from a basic webcam. It turns out that the blood pumped with each heartbeat changes the skin’s color slightly, in a way that can be recovered from a video signal using a technique called independent component analysis. Though the technology has not yet proven itself outside the lab, where controlled lighting conditions can make such analysis easier, it is not hard to imagine recovering pulse from recordings of ordinary teleconferences. Pulse therefore joins the long list of information that we are leaking over the Internet all the time without really knowing it.
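For the technically curious, here is a rough sketch of that kind of pipeline in Python with NumPy. It assumes the hard parts are already done: a face has been located in every frame and its red, green, and blue pixel values averaged. The sketch separates the three color traces with a small FastICA routine, keeps the component whose spectrum is most sharply peaked, and reports that peak as beats per minute. The function names, the assumed 0.75 to 4 Hz pulse band, the component-selection rule, and the synthetic test signal are all illustrative assumptions, not the Media Lab’s code.

```python
import numpy as np


def fast_ica(x, n_iter=200, seed=0):
    """Symmetric FastICA with a tanh nonlinearity.

    x has shape (n_signals, n_samples); returns the estimated independent
    sources in the same shape (their sign and order are arbitrary).
    """
    x = x - x.mean(axis=1, keepdims=True)
    # Whiten: rotate and rescale so the channels are uncorrelated, unit variance.
    d, e = np.linalg.eigh(np.cov(x))
    z = e @ np.diag(1.0 / np.sqrt(d)) @ e.T @ x
    n, m = z.shape
    w = np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        # Fixed-point update, one row per source: E[z g(w^T z)] - E[g'(w^T z)] w.
        w = (g @ z.T) / m - np.diag((1.0 - g ** 2).mean(axis=1)) @ w
        # Symmetric decorrelation keeps the rows orthonormal: W <- U V^T.
        u, _, vt = np.linalg.svd(w)
        w = u @ vt
    return w @ z


def estimate_bpm(rgb_means, fps, band_hz=(0.75, 4.0)):
    """Estimate heart rate from per-frame mean R, G, B values of a face region.

    rgb_means has shape (n_frames, 3). The default band (45 to 240 beats per
    minute) is an assumed plausible range for a human pulse.
    """
    x = np.asarray(rgb_means, dtype=float).T
    x = (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)
    sources = fast_ica(x)
    freqs = np.fft.rfftfreq(x.shape[1], d=1.0 / fps)
    band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    best_freq, best_score = 0.0, -1.0
    for s in sources:
        power = np.abs(np.fft.rfft(s)) ** 2
        # Score each component by the share of its power in its tallest in-band peak.
        score = power[band].max() / power.sum()
        if score > best_score:
            best_score = score
            best_freq = freqs[band][np.argmax(power[band])]
    return float(60.0 * best_freq)


# Synthetic test: 30 seconds of "video" at 30 frames per second.
fps = 30.0
t = np.arange(900) / fps
rng = np.random.default_rng(1)
pulse = np.sin(2 * np.pi * 1.2 * t)      # 1.2 Hz, i.e. 72 beats per minute
lighting = np.sin(2 * np.pi * 0.2 * t)   # slow change in room lighting
noise = rng.standard_normal(t.size)      # camera sensor noise
mixing = np.array([[0.4, 1.0, 0.6],      # red
                   [1.0, 0.9, 0.3],      # green carries the most pulse here
                   [0.3, 0.7, 0.9]])     # blue
rgb = (mixing @ np.vstack([pulse, lighting, noise])).T + 100.0
print(round(estimate_bpm(rgb, fps)))  # 72
```

The synthetic mixture is far cleaner than a real webcam feed, which is exactly the gap between lab results and ordinary teleconferences that the paragraph above describes.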

It’s clear at this point that anybody can take a photo of you and post it on the Internet, and once it’s there, it is nearly impossible to remove all copies. But increasingly, pattern recognition software has made it possible to learn about someone not from what he has shared about himself but from what his friends have made public. For example, researchers have trained a program to identify gay men with roughly 80 percent accuracy using the self-reported sexual orientation of their Facebook friends. Alan Mislove of Northeastern University has shown that if only 20 percent of college students fill out their profile information, software can deduce facts such as major, year, and dorm about the nonresponders who simply friended others. The software draws on statistics gleaned from large data sets: how often friends share a characteristic because they belong to the same community, and how often they share it simply by chance. It then combines several such probabilities into a statistically motivated guess about whether a person belongs to a particular community. So it’s not really possible to participate in social networks without revealing anything about yourself; you reveal your interests by association.
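The step of combining several probabilities into one guess can be illustrated with a naive Bayes calculation, which treats each friend as independent evidence. This is a hypothetical Python sketch, not the method from Mislove’s study; the function name and every probability in the example are made up for illustration.

```python
import math


def community_posterior(prior, friend_flags, p_if_member, p_if_not):
    """Naive Bayes guess that a user belongs to a community (e.g. a dorm).

    prior        -- fraction of all users who are in the community
    friend_flags -- one entry per friend with a public profile: True if that
                    friend lists the community
    p_if_member  -- chance a friend lists it, given the user is a member
    p_if_not     -- chance a friend lists it anyway, by coincidence
    Friends are treated as independent evidence, a simplifying assumption.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for listed in friend_flags:
        if listed:
            log_odds += math.log(p_if_member / p_if_not)
        else:
            log_odds += math.log((1.0 - p_if_member) / (1.0 - p_if_not))
    # Convert the accumulated log-odds back into a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: 10% of students live in the dorm; a resident's friend
# lists it 60% of the time, a non-resident's friend only 5% of the time.
p = community_posterior(0.10, [True, True, False, True, False], 0.60, 0.05)
print(round(p, 2))  # 0.97
```

Even with a low prior, three friends who list the dorm push the guess close to certainty, which is why a user who never fills in a profile can still be placed.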

Received wisdom holds that the Internet is the place where nobody knows you’re a dog. But this has always been something of an illusion. In response to concerns about privacy, Facebook has rolled out new settings meant to secure your details. But ZIP code, sex, and birth date together are enough to uniquely identify about 87 percent of Americans, as Latanya Sweeney of Harvard’s Data Privacy Lab has noted. Unless you are savvy enough to use a proxy, your location can be deduced from your IP address; your birth date is often volunteered on social-networking sites or supplied as proof that you are old enough to spare a website headaches; and your gender can be revealed by a cue as subtle as word choice. Even if profile information doesn’t yield an exact match, a writing sample can generally pick you out of a group of people, provided there are enough known samples from each person to compare against. The tiny variations in word choice that distinguish one writer from another may escape a human reader but can be picked out statistically by a computer.
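One common way to do this kind of authorship attribution is to compare how often writers use small function words such as “the,” “of,” and “just.” The Python sketch below assumes a tiny hypothetical word list and made-up text samples: it profiles each writer by those word frequencies and attributes an anonymous text to the candidate with the most similar profile, measured by cosine similarity. Real systems use far more features and far more text.

```python
import math
import re
from collections import Counter

# A deliberately tiny, hypothetical list of function words; real systems
# use hundreds of features drawn from much longer texts.
FUNCTION_WORDS = ["the", "a", "of", "and", "to", "in", "that", "it",
                  "is", "was", "but", "so", "very", "really", "just", "quite"]


def profile(text):
    """Relative frequency of each function word in a text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two frequency vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown_text, known_samples):
    """Name of the candidate whose known writing is most similar in style."""
    target = profile(unknown_text)
    return max(known_samples,
               key=lambda name: cosine(target, profile(known_samples[name])))


samples = {
    "alice": "It was really just so very nice, and it was really quite fun. "
             "It was just really good.",
    "bob": "The result of the analysis is that the rate of change in the data "
           "is a function of the input.",
}
print(attribute("It was just really fun, and it was so very quiet.", samples))  # alice
```

Function words are a popular choice because writers use them habitually and almost independently of topic, so they reflect style rather than subject matter.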

Rather than be distressed about the increasing impossibility of privacy, perhaps we should instead consider the benefits of being more open. Hoffman mentioned that on LinkedIn, profile information becomes more trustworthy than a traditional paper résumé once a person’s profile has at least 10 links, because people are less willing to stretch the truth in front of their friends and colleagues. This is the general trade-off we face: In exchange for giving up some of our privacy, we acquire more reliable information about one another.

The idea of bringing this kind of subtle information exchange to video analysis, though, takes it to a whole new level, which is why I am so intrigued by the Affective Computing Group’s optical pulse research. Would people agree to communicate via a videoconferencing system that displayed each participant’s pulse, to convey a more honest signal of how interested or excited the other person was? Might dating sites offer customers software to analyze a prospective date’s video chat after the fact for signs of excitement, without the date’s knowledge? The current attitude toward Google stalking appears to tilt toward the idea that if information is impossible to hide, it’s fair game. The ramifications of that principle are far-reaching. Nobody knows exactly how much they reveal about themselves every time they use a social networking site or save a YouTube video for posterity, because these are open research questions. But they surely reveal more than they intend.

The cultural changes wrought by the Internet are not over, because as a society we do not yet fully understand what information is on the Internet. The power of statistics and pattern recognition means that we’ve put much more information on the Web than we intended. As the Internet becomes more connected to the real world through video and other sensors, the chances for information leakage increase. One consequence is that some members of the younger generation have elected to give up on privacy entirely. But inferred information is subject to the same rules as any other information on the Internet: It can be incorrect, misleading, out-of-date, misinterpreted, or intentionally faked. Statistical association, when you think about it, is an awfully superficial way to judge a person, and pulse is an exceedingly ambiguous social signal. Like Internet users themselves, pattern recognition software can fall prey to an illusion of intimacy while still understanding little of what makes a person tick.

Correction, September 6, 2011: This article originally misstated the developer of the QSensor. It was created by the startup company Neumitra.