If you thought IBM using “quietly scraped” Flickr images to train facial recognition systems was bad, it gets worse. Our research, which will be reviewed for publication this summer, indicates that the U.S. government, researchers, and corporations have used images of immigrants, abused children, and dead people to test their facial recognition systems, all without consent. The very group the U.S. government has tasked with developing best practices and standards for the artificial intelligence industry, which includes facial recognition software and tools, is perhaps the worst offender when it comes to using images sourced without the knowledge of the people in the photographs.*
The National Institute of Standards and Technology, a part of the U.S. Department of Commerce, maintains the Face Recognition Vendor Test (FRVT) program, the gold standard test for facial recognition technology. This program helps software companies, researchers, and designers evaluate the accuracy of their facial recognition programs by running their software through a series of challenges against large groups of images (data sets) that contain faces from various angles and in various lighting conditions. NIST has multiple data sets, each with a name identifying its provenance, so that tests can include people of various ages and in different settings. Scoring well on the tests by providing the fastest and most accurate facial recognition is a massive boost for any company, with both private industry and government customers looking at the tests to determine which systems they should purchase. In some cases, cash prizes as large as $25,000 are awarded. With or without a monetary reward, a high score on the NIST tests essentially functions as the tech equivalent of a Good Housekeeping seal or an “A+” Better Business Bureau rating. Companies often tout their test scores in press releases. In recognition of the organization’s existing market approval role, a recent executive order tasked NIST with developing a plan for new standards for the artificial intelligence industry, which includes facial recognition technology among other machine learning and algorithmic tools.*
Through a mix of publicly released documents and materials obtained through the Freedom of Information Act, we’ve found that the Face Recognition Vendor Test program depends on images of children who have been exploited for child pornography; U.S. visa applicants, especially those from Mexico; and people who have been arrested and are now deceased. Additional images are drawn from a Department of Homeland Security scenario in which DHS staff simulated regular traveler-image capture for the purposes of testing. Finally, individuals booked on suspicion of criminal activity are the subject of the majority of the images used in the program.*
When a company, university group, or developer wants to test a facial recognition algorithm, it sends that software to NIST, which then uses the full set of photograph collections to determine how well the program performs in terms of accuracy, speed, storage and memory consumption, and resilience. An “input pattern,” or single photo, is selected, and then the software is run against one or all of the databases held by NIST. For instance, one test, known as the false non-match rate, measures the probability of the software failing to correctly identify a matching face in the database. Results are then posted on an agency leaderboard, where developers can see how they’ve performed relative to other developers. In some respects, this is like more familiar product testing, except that none of the people involved in the testing know about, let alone have consented to, the testing.
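The false non-match rate mentioned above can be illustrated with a toy calculation. This is a simplified sketch with made-up similarity scores, not NIST’s actual evaluation pipeline, which is far more involved:

```python
# Toy illustration of the false non-match rate (FNMR).
# Hypothetical data: each trial compares a probe photo against a gallery
# entry for the SAME person, so every trial should match. A trial whose
# similarity score falls below the decision threshold is a false non-match.

def false_non_match_rate(scores, threshold):
    """Fraction of genuine (same-person) comparisons scored below threshold."""
    misses = sum(1 for s in scores if s < threshold)
    return misses / len(scores)

# Similarity scores from eight hypothetical same-person comparisons (0-1 scale).
genuine_scores = [0.91, 0.87, 0.42, 0.95, 0.88, 0.39, 0.90, 0.86]

print(false_non_match_rate(genuine_scores, threshold=0.5))  # → 0.25
```

A lower false non-match rate at a given threshold indicates a more accurate system, which is why developers compete on this metric on the agency leaderboard.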
Altogether, NIST data sets contain millions of pictures of people. Any one of us might end up as testing material for the facial recognition industry, perhaps captured in moments of extraordinary vulnerability and then further exploited by the very government sectors tasked with protecting the public. Not only this, but NIST actively releases some of those data sets for public consumption, allowing any private citizen or corporation to download, store, and use them to build facial recognition systems, with the photographic subjects none the wiser. (The child exploitation images are not released.) There is no way of telling how many commercial systems use this data, but multiple academic projects certainly do.
When we reached out to NIST for comment, we received the following from Jennifer Huergo, director of media relations:
The data used in the FRVT program is collected by other government agencies per their respective missions. In one case, at the Department of Homeland Security (DHS), NIST’s testing program was used to evaluate facial recognition algorithm capabilities for potential use in DHS child exploitation investigations. The facial data used for this work is kept at DHS and none of that data has ever been transferred to NIST. NIST has used datasets from other agencies in accordance with Human Subject Protection review and applicable regulations.
In a follow-up conversation after this article was published, Huergo asked us to clarify that her statement of “one case” does not refer to one use of the child exploitation data set but rather to the “one case” of that “operational data set” having been created. Huergo acknowledged that the child exploitation data set is in fact used for the facial recognition vendor test program, which to date includes 100 corporations, universities, and other developers. While Huergo pointed out that the exploitation database remains housed with DHS, she also confirmed that a global set of developers, many of whom are working on commercial technology—not on tools designed to help solve open cases of exploitation—are in fact testing their commercial applications against images of exploited children. We asked Huergo whether the families of the children in the images had consented to their use for these purposes or even been notified of their use, and she replied that she has “no information on that.” She did point out that the images are de-identified to the degree possible (no names or other geolocation information attached beyond what might be in the pictures) and that no NIST staff view the images.
We also asked Huergo why consent had not been gathered for individuals represented in the mug-shot and visa-image databases. She responded that the “government uses data that it has on different people in ways that are consistent with regulations.”
While we understand that NIST trusts its counterparts to follow their “respective missions,” U.S. government agencies have a long history of targeting and bias. Additionally, pointing to human subject review and regulations is a deflection from conversations about data ethics in a way that is both familiar and deeply troubling. [Update, March 22, 2019: We now have confirmation of the nonconsensual, and possibly even unnotified, use of a range of images in the possession of the government, including mug shots that are obtained before any charges are brought or a crime has been proved committed, visa application images, and child exploitation images.]
The use of these image databases is not a recent development. The Child Exploitation Image Analytics program, which supplies a data set for testing by facial recognition technology developers, has been running since at least 2016 with images of “children who range in age from infant through adolescent,” the majority of which “feature coercion, abuse, and sexual activity,” according to the program’s own developer documentation. These images are considered particularly challenging for the software because of their greater variability in position, context, and more. The Multiple Encounter Dataset, in use since 2010, contains mug shots, notably taken before anyone has been convicted of a crime, as well as images of deceased persons, supplied by the FBI. It reproduces racial disparities that are well-known in the U.S. legal system. According to our calculations, while black Americans make up 12.6 percent of the U.S. population, they make up 47.5 percent of the photographs in the data set.
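The scale of that disparity can be expressed as a simple overrepresentation ratio, using the two figures cited above:

```python
# Overrepresentation of black Americans in the Multiple Encounter Dataset,
# computed from the two percentages cited in the text.
share_of_population = 0.126  # share of the U.S. population
share_of_dataset = 0.475     # share of photographs in the data set

ratio = share_of_dataset / share_of_population
print(round(ratio, 1))  # → 3.8
```

In other words, black Americans appear in the data set at nearly four times their share of the population.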
This sort of bias in source data sets creates problems when software needs to be “trained” on its task (in this case, to recognize faces) or “tested” on its performance (the way NIST tests facial recognition software). These data set biases can skew test results, and so misrepresent what constitutes an effective facial recognition system. In light of recent incidents of racial and other bias in facial recognition tools, academics, industry leaders, and politicians alike have called for greater regulation of how these systems are trained and tested.
But calls for greater diversity in data sets come at a cost: cooperation with organizations like NIST and the enrollment of still more nonconsensual faces into data sets. As scholar Zoé Samudzi, for example, notes, “It is not social progress to make black people equally visible to software that will inevitably be further weaponized against us.” Rather than focusing on greater diversity, we should be focusing on more regulation at every step in the process. This regulation cannot come from “standards bodies” unfit for the purpose.
Instead, policies should be written by ethicists, immigration activists, child advocacy groups, and others who understand the risks that marginalized populations in the U.S. face, and are answerable to those populations. How do we understand privacy and consent in a time when mere contact with law enforcement and national security entities is enough to enroll your face in someone’s testing? How will the black community be affected by its overrepresentation in these data sets? What rights to privacy do we have when we’re boarding a plane or requesting a travel visa? What are the ethics of a system that uses child pornography as the best test of a technology? We need an external, independent regulatory body to protect citizens from the risk that conversations about ethics in facial recognition technology may become a smoke screen for maintaining government and corporate interests. Without informed, responsible, and accountable regulators, the standards that NIST creates will continue to take advantage of the most vulnerable among us.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.
Correction, March 22, 2019: This article originally misstated that NIST has a “regulatory” role on facial recognition under a newly released executive order. NIST does not create regulations, but is a testing body, and under the new executive order has been given 180 days to develop a plan for standards. The executive order also refers more broadly to artificial intelligence, not specifically to facial recognition. The article also misstated that boarding photos from airports had been used as a data set for testing facial recognition algorithms. Those images came from a Department of Homeland Security scenario and involved volunteers.
Update, March 22, 2019: This article was updated to include new information provided by an NIST spokesperson about the use of child exploitation images.