Future Tense

Knowing Is Half the Battle: Combating Big Data’s Dark Side Through Data Literacy

The White House Office of Science and Technology Policy is nearing the end of its 90-day review of big data and privacy. Soon, industry leaders, privacy advocates, engineers, and developers expect to learn regulators' key questions and priorities for balancing innovation in predictive analytics with protection against harm and discrimination.

But one topic has received comparatively little attention: the role of federal policies to support data literacy.

As our regime of predictive analytics expands, we need a better system of accountability—and that requires public knowledge about data collection and analysis. What information is collected about us? How is the information analyzed and how are interpretations applied? Asking—and understanding answers to—these questions will let citizens and consumers determine which practices in predictive analytics are good or right for society.

All of us stand to benefit from learning how big data works. We experience surveillance by default, with corporate and government actors collecting and analyzing massive numbers of data points about us every day.

The processes might be invisible, but their consequences are very real for some groups.

The poor, immigrants, communities of color, and other historically marginalized groups disproportionately feel the effects of a high-tech surveillance state. Their persistent struggles with imbalanced and unfair information flows provide us the best clues for what an unchecked system of data collection, analysis, and application could bring about en masse. For example, as the ACLU has documented, the FBI has engaged in “collecting racial and ethnic information and ‘mapping’ American communities around the country based on crude stereotypes about which groups commit different types of crimes.” Intrusions can seem more benign, too. On Future Tense last year, Christine Rosen highlighted the rise of social-engineering surveillance: In Providence, R.I., a program aimed at improving early childhood language development in poor families planned to outfit kids with “smart” clothing to record daily conversations. “The lack of concern about how state surveillance of private citizens—even in the interest of ‘improving’ those citizens—is increasing with little public debate about the challenges such interventions pose to freedom and autonomy,” Rosen wrote.

Meanwhile, the underserved have less opportunity to take part in “good surveillance” projects. As late adopters of new technologies, poor people find themselves excluded from certain kinds of data flows. Poorer neighborhoods might not catch wind of civic apps like SeeClickFix, which lets people report potholes and other nonemergencies to the local government, or they might not take to particular social media platforms to report community problems, thus leaving them off the radar of local government officials responding to crowd-sourced needs.

Most programs encouraging technology adoption in underserved communities in the United States do not have the means to teach privacy or data literacy. I know this not only from first-hand research, but also from scanning resources like digitalliteracy.gov or those produced by corporate-run programs.

But many programs to bridge the digital divide offer innovative models for how to engage with underserved communities—and with the right resources, they can grow to address privacy and big data issues. Projects supported by the Broadband Technology Opportunities Program, a Recovery Act program, serve as a case in point. BTOP-funded broadband adoption training efforts and public computer centers have met with success when embedded within trusted social infrastructure. Places like Philadelphia FIGHT, which serves HIV/AIDS populations, and People’s Emergency Center, which supports homeless women and children, introduced computers, tablets, and the Internet in the larger context of clients improving their health or finding safe housing. Doing so made the context of learning familiar, safe, and relevant.

Data literacy can’t be taught by parachute or evangelism. The best thing would be to give resources to schools, libraries, and other community anchor institutions to teach this new material. And that means treating data literacy as a federal policy issue.

It’s nice that scores of researchers, technologists, privacy advocates, and policymakers have been convening around the country to talk about the potential uses and abuses of big data. But we need to include the rest of America, too. Let’s hope the White House report, which should be released soon, will be the start, not the end, of a national conversation about privacy and data literacy.