Future Tense

Web of Assumptions

Americans don’t like it when companies infer their personal information—but marketers keep doing it anyway.

Photo illustration by Natalie Matthews-Ramo. Image by Thinkstock.

Last year, Facebook presented users with different ads for the film Straight Outta Compton based on their “ethnic affinity.” The targeted users had not checked a box identifying themselves as black. Instead, Facebook’s algorithms had guessed their “ethnic affinity” based on posts they had clicked on and liked in the past.

It’s happened before. In what has become a go-to cautionary tale, Target infamously used inference to send a teenager advertisements for baby items before she had even told her family she was pregnant; the company’s algorithms had predicted her pregnancy based on her purchase data and demographic profile. In another case, ProPublica busted Princeton Review for charging different prices for its SAT prep courses based on ZIP code, which meant that Asian families were nearly twice as likely to be quoted a higher price. Home Depot, Staples, and other companies similarly have been caught varying prices by location. While likely unintentional in these cases, the outcomes highlight that location information can serve as an effective proxy for race or income.

If a company asks you online to provide detailed information about yourself, you might be reluctant to share it. So digital marketers have found a workaround: They vacuum up troves of data about internet users to improve their use of inference, the practice of using available information—like behavioral, mobile, and publicly available data—to make informed guesses about consumers. By deducing your race, income, gender, interests, and more, companies can personalize search results, ads, pricing, and other content.
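To make the mechanics concrete, here is a minimal, purely illustrative sketch of how such an inference might work, in the style of a naive Bayes classifier. Every signal name, label, and probability below is invented for illustration; real ad platforms use far richer data and proprietary models.

```python
# Toy naive-Bayes-style inference: guess a coarse "affinity" label from
# behavioral signals (e.g., pages a user has liked or clicked).
# All signals, labels, and probabilities are hypothetical.

# P(signal | label): made-up likelihoods of each signal given each label
LIKELIHOODS = {
    "urban_radio_page":        {"A": 0.60, "B": 0.20},
    "suburban_gardening_page": {"A": 0.15, "B": 0.55},
    "national_news_page":      {"A": 0.50, "B": 0.50},
}
PRIORS = {"A": 0.5, "B": 0.5}  # assume equal base rates for simplicity

def infer_label(signals):
    """Return a normalized posterior over labels, given observed signals."""
    scores = {}
    for label, prior in PRIORS.items():
        score = prior
        for s in signals:
            # Unknown signals are ignored (likelihood 1.0 for all labels)
            score *= LIKELIHOODS.get(s, {}).get(label, 1.0)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

# A user who never stated any affinity still gets one inferred:
posterior = infer_label(["urban_radio_page", "national_news_page"])
```

The point of the sketch is that no explicit disclosure is needed: the model combines weak behavioral signals into a confident guess, which is then available for ad targeting.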

Marketers love personalized ads because they perform well—about three times better than standard ads, according to research by web marketing firm Jivox. However, the use of inference has generated concerns about discrimination. Following the Straight Outta Compton uproar, ProPublica reported that it was able to place a housing-related Facebook ad that excluded those of black, Asian, and Hispanic ethnic affinity. Facebook claimed to crack down after this feature was found to be in violation of the Fair Housing Act, but just weeks ago, ProPublica confirmed the company has continued to allow discriminatory ads to go through. For example, while posing as a housing rental agency, ProPublica purchased a Facebook ad with settings to exclude potential renters “interested in Islam, Sunni Islam and Shia Islam.” The ad was approved in 22 minutes. Of course, Facebook users do not explicitly report their race or ethnic affinity, and while there is a profile field for religion, a lot of users don’t give this information. Facebook’s algorithms fill in the blanks through probabilistic models—or inference.

Yet many Americans aren’t aware that social networks and advertisers are offering them personalized content based on inference algorithms. When they learn about it, they often dislike the idea. In 2016, my colleagues Emily Paul, Pavel Venegas, and I conducted a research study in partnership with the Center for Democracy and Technology, focused on understanding attitudes about inference and personalization. (The research was funded by the UC Berkeley Center for Long-Term Cybersecurity and the Center for Technology, Society, and Policy.) Out of 748 U.S. internet users surveyed, 58 percent had never or rarely thought about ads targeting them based on their inferred race, and 65 percent found the idea to be unacceptable or somewhat unacceptable.

Respondents particularly disliked the use of race for personalizing the prices of products, with only 8 percent saying that using their inferred race would be somewhat acceptable or acceptable. Across personalization contexts, many participants connected the use of race to broader discrimination and societal implications: “Didn’t we determine racial profiling was inappropriate?” one survey participant asked. “Why is it okay for a corporation to behave in this manner?”

Most web users surveyed don’t like marketing based on their household income level, either, with 67 percent finding this type of targeted ad to be somewhat unacceptable or unacceptable. Some respondents had emotional responses to the idea that their income might influence the marketing they see, using words like “rude,” “scary,” and “creepy.” Others described missed opportunities: “One should not be limited to being offered only what is appropriate for their income range. … A person may be willing to spend [more] on something they really want.”

Attitudes about personalization depend on the context in which it takes place, whether an inference is accurate, and how people feel about the sensitivity of the type of data used. For example, participants in our study did not see their gender as a sensitive type of personal information in general, but they viewed the use of inferred gender in pricing as mostly unacceptable. In contrast, in advertising, respondents found that the use of inferred city or town of residence was mostly acceptable. Here some respondents wrestled with how the information was obtained, calling it “unsettling” or “uncomfortable,” but the use of nonspecific location data generally mirrored real-world expectations: “It does not identify me because it just associates me with 100,000 people that live near me, and might give me important local information.”

In the offline world, we have agency over the disclosure of our personal information, or it’s at least bounded by those who are in physical proximity to us. A supermarket cashier might reasonably assume that you live in the area, but it would be shocking to discover that they know your home and work addresses, which competing stores you shop at, the route of your commute, or other information that real-time location tracking offers. It’s not relevant and raises safety concerns. Likewise, the store would surely receive complaints if the cashier asked each customer for his or her household income.

Our study suggests that the industry needs new standards to better inform and empower people to protect their personal information. This requires giving users control not only over what data are collected about them but also over what is inferred about them, how those inferences are used, and by whom. It’s currently rare for companies to give users a window into this inference-based personalization online, but Google offers a step in the right direction, giving users the ability to view (and, if they’re incorrect, modify) age, gender, and interest inferences, and the option to turn off ad personalization completely.

If industry can’t find a path to respecting users on its own, users deserve a regulatory hook that can help it find the way. Companies and policymakers should adopt a user-centered approach that takes into account social norms of disclosure in the offline world. For guidance on how to treat users, the tech industry can take advantage of existing frameworks such as the Federal Trade Commission’s Fair Information Practice Principles, which advise that users be informed and active participants in decisions about their personal information, including being told how it will be used at the time it is collected. Even if user information is not considered sensitive or personally identifiable on its own, if it will be used to infer personally identifiable information, that should be shared with the consumer.

Industry would have us believe that personalization is good for everyone, seamlessly connecting buyers and sellers, regardless of what drives those connections. We can expect its use to continue: A 2016 survey of marketers by Forrester Research found that 74 percent plan to invest in advancing analytics capabilities, and we increasingly see possible applications of artificial intelligence to this area. However, industry should proceed thoughtfully and resist the temptation to overstep, particularly as people put more of their lives online. For example, recent research demonstrated that algorithms can be used to scan Instagram photos for markers of mental health. While it may be tempting to use such information without our knowledge for marketing or pricing (insurance or otherwise), our research suggests that’s not the kind of personalization Americans want.

This article is part of Future Tense, a collaboration among Arizona State University, New America, and Slate. Future Tense explores the ways emerging technologies affect society, policy, and culture. To read more, follow us on Twitter and sign up for our weekly newsletter.