There’s a lot you might guess about a person based on their voice: their gender, their age, perhaps even their race. That’s your brain making an educated guess about the identity of a speaker based on what you’ve experienced, but sometimes, those guesses are wrong. (People I talk to on the phone who don’t know my name often assume I’m white because I speak English without an accent. They frequently express surprise to learn I’m Asian.) In a recent paper, a group of MIT researchers set out to investigate what a computer can guess about a person’s appearance from their voice.
To do that, the researchers trained their model using a dataset called AVSpeech, a selection of YouTube videos originally compiled by Google researchers for a different project. The model was fed face and voice data from hundreds of thousands of YouTube examples. Then, researchers fed the voices to the model and asked it to create a face it thought matched the voice. In the end, the model was decent at predicting what a person looked like but struggled with people of certain identities. For instance, while the model renders an Asian American man speaking Chinese as an Asian man, it draws up a white man when that same person is speaking English instead. It also appeared to have issues with age and with voice pitch: it assumed people with high-pitched voices were women and those with low-pitched voices were men. In short, it appears that the model learned some basic stereotypes about a person’s face and voice.
Unbeknownst to him, Nick Sullivan, head of cryptography at Cloudflare, contributed to this model’s “education.” He said a friend sent him the paper, and he was “quite surprised and confused” to see his face among the “successful” renderings. “I saw a photo of me, a computer construction of my face, and a computer-generated image that didn’t resemble me but had a similar nose and jaw dimensions,” says Sullivan. (In my opinion, he’s being quite generous about that computer-generated image; it’s unrecognizable as him.)
Part of his confusion was that he hadn’t signed any waivers to be a part of a machine learning study, but he had signed waivers for appearance on YouTube videos, so he figured maybe one of those videos found its way into the dataset the researchers used. However, after some digging, he discovered the video used in the dataset and didn’t recall signing any kind of waiver for that one.
Whether Sullivan signed a waiver likely doesn’t matter, though. Most research using data from human participants does require scientists to obtain informed consent (most often in the form of waivers). But YouTube videos are considered publicly available information and not classified as “human subjects research”—even if researchers are studying the intricacies of your face and voice. And while YouTube users own the copyright to their own videos, researchers using clips could make the argument that their work qualifies as “fair use” of copyrighted materials, since the end result is “transformative” of the original work. (In the case of the Speech2Face data, the model quite literally transforms your voice and face data into something else entirely.) Casey Fiesler, assistant professor of information science at the University of Colorado Boulder, says she’s never seen a copyright holder challenge researchers who used their internet posts as data. “There probably aren’t legal issues with it,” she says.
But just because something is legal doesn’t mean it’s ethical. That doesn’t mean it’s necessarily unethical, either, but it’s worth asking questions about how and why researchers use social media posts, and whether those uses could be harmful. I was once a researcher who had to obtain human-subjects approval from a university institutional review board, and I know it can be a painstaking application process with long wait times. Collecting data from individuals takes a long time, too. If you could just sub in YouTube videos in place of collecting your own data, that saves time, money, and effort. But that could be at the expense of the people whose data you’re scraping.
But, you might say, if people don’t want to be studied online, then they shouldn’t post anything. Yet most people don’t fully understand what “publicly available” really means or what its ramifications are. “You might know intellectually that technically anyone can see a tweet, but you still conceptualize your audience as being your 200 Twitter followers,” says Fiesler. In her research, she’s found that the majority of people she’s polled have no clue that researchers study public tweets.
Some may disagree that it’s researchers’ responsibility to work around social media users’ ignorance, but Fiesler and others are calling for their colleagues to be more mindful about any work that uses publicly available data. For instance, Ashley Patterson, an assistant professor of language and literacy at Penn State University, ultimately decided to use YouTube videos in her dissertation work on biracial individuals’ educational experiences. That’s a decision she arrived at after carefully considering her options each step of the way. “I had to set my own levels of ethical standards and hold myself to it, because I knew no one else would,” she says. One of Patterson’s first steps was to ask herself what YouTube videos would add to her work, and whether there were any other ways to collect her data. “It’s not a matter of whether it makes my life easier, or whether it’s ‘just data out there’ that would otherwise go to waste. The nature of my question and the response I was looking for made this an appropriate piece [of my work],” she says.
Researchers may also want to consider qualitative, hard-to-quantify contextual cues when weighing ethical decisions. What kind of data is being used? Fiesler points out that tweets about, say, a TV show are way less personal than ones about a sensitive medical condition. Anonymized written materials, like Facebook posts, could be less invasive than using someone’s face and voice from a YouTube video. And the potential consequences of the research project are worth considering, too. For instance, Fiesler and other critics have pointed out that researchers who used YouTube videos of people documenting their experience undergoing hormone replacement therapy to train an A.I. to identify trans people could be putting their unwitting participants in danger. It’s not obvious how the results of Speech2Face will be used, and when asked for comment, the paper’s researchers said they’d prefer to quote from their paper, which pointed to a helpful purpose: providing a “representative face” based on the speaker’s voice on a phone call. But one can also imagine dangerous applications, like doxing anonymous YouTubers.
One way to get ahead of this, perhaps, is to take steps to explicitly inform participants their data is being used. Fiesler says that when her team asked people how they’d feel after learning their tweets had been used for research, “not everyone was necessarily super upset, but most people were surprised.” They also seemed curious; 85 percent of participants said that if their tweet were included in research, they’d want to read the resulting paper. “In human-subjects research, the ethical standard is informed consent, but inform and consent can be pulled apart; you could potentially inform people without getting their consent,” Fiesler suggests.
Sullivan says it would have been nice to have been notified that his voice and face were in a research database, but he also acknowledges that given the size of the corpus, it would’ve been a difficult task. And in the case of Speech2Face, researchers were using a dataset originally collected for a different project. Even if the original researchers had notified participants that their videos were being used, would the Speech2Face researchers then also have a responsibility to renotify those participants with details about their work? In any case, it seems like researchers could at least notify people whose personal details are published in a paper. “Since my image and voice were singled out as an example in the Speech2Face paper, rather than just used as a data point in a statistical study, it would have been polite to reach out to inform me or ask for my permission,” says Sullivan.
But even informing YouTubers might not be the best decision in all cases. Patterson, for instance, considered doing so but decided against it for two reasons. First, some of the YouTubers were under 18, which meant that reaching out to them would have required her to first contact their parents. Based on the videos’ candid content about their families and school experiences, Patterson said, it seemed like the YouTubers’ imagined audiences were definitely not parents. “It seemed like a violation of the way they envisioned this platform,” she says, but she also acknowledges that researchers’ eyes could similarly be seen as a violation. Additionally, Patterson said that the IRB officials she talked with said they had no precedent for contacting creators of publicly available content like YouTube videos and that ironing that all out would have taken months. In Patterson’s case, it just didn’t seem practical.
In the end, there’s no one-size-fits-all way for researchers to determine whether using publicly available data is appropriate, but there is certainly room for more discussion. “It would be nice to see more reflection from researchers about why this is OK,” says Fiesler, suggesting researchers’ published papers could discuss the ethical considerations they made. (The Speech2Face paper did include an “ethics” section, but it did not include this type of discussion, and when asked for comment, the researchers pointed me back to this section.) Patterson agrees: “I think there are going to be more conversations for sure, and in the not too distant future, you might not even be able to do this kind of work.”
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.