In a recent article in the scientific journal PLOS ONE, researchers at Duke University concluded that a speech characteristic known as “vocal fry” may be harmful to people’s career prospects, with women being slightly more at risk than men. Which sounds alarming. After all, anything that is found to influence our potential for employment is sure to gain attention in such a competitive job market.
Predictably, the article was covered widely in various media, including in the Atlantic by Olga Khazan, who implied that women need to police how they speak for fear of being perceived as untrustworthy by a potential employer. On closer inspection, though, it turns out that self-policing may not be necessary at all, at least with respect to vocal fry. The original study contained a number of serious flaws, which, when considered, prevent us from drawing any conclusions at all about which specific acoustic characteristics sounded “untrustworthy” to the listeners who participated.
The design of the study was relatively straightforward. A group of 800 people, via an online survey service called Qualtrics, listened to either all male or all female speakers produce the sentence: “Thank you for considering me for this opportunity.” Some of these sentences were produced with vocal fry, which, in contrast to normal voice, involves some irregularity in the vibration of the vocal cords and lower pitch (see the image below). To a listener, the “vocal fry” regions sound something like a stick being dragged along a fence, where one can hear individual vibrations, or pulses, of the vocal folds.
The participants in this experiment were asked to listen to each speaker’s pair of utterances—with and without vocal fry—and to indicate which of the pair “was perceived to be more educated, competent, trustworthy, attractive, and which speaker they would hire.” The expectation was that listeners might have different attitudes towards those speakers with vocal fry than those without it—and this is indeed what they found.
The big problem, however, is how the authors produced the voices with vocal fry.
When linguists, phoneticians, or speech scientists want to study whether an acoustic characteristic in someone’s voice influences how listeners perceive them, they often will record a person and then modify those aspects of the person’s voice which they wish to test. This process allows one to carefully control the acoustic dimensions in the signal and requires some knowledge of speech acoustics and digital signal processing. But certain aspects of one’s voice are harder to modify than others. As it happens, vocal fry is one of those hard-to-modify characteristics.
Fortunately, though, there’s a solution. Just as one might buy two types of apples to compare their flavors, we can look for speakers who just happen to produce a lot of vocal fry in their speech and compare them to those who do not. And it’s possible that if we were to play the speech of these two groups to 800 people, we’d find that listeners view one differently than the other. This is, in fact, what Ikuko Patricia Yuasa did in her study of vocal fry, or creaky voice, back in 2010. The authors of the PLOS ONE study, on the other hand, did no such thing. Rather, they recorded speakers producing normal utterances and then trained them to produce an utterance with vocal fry. In other words, the speakers were attempting to imitate a voice with vocal fry.
There are several reasons why this is problematic, but the first is perhaps the most obvious: most people are not very good at imitating someone else’s speech. If you ask the average person to “talk like a Texan,” they might (or might could) try to imitate something that they believe to be an important characteristic of Texas speech. Yet, to most listeners, especially those from Texas, they would sound like a caricature of an actual Texan. The same thing goes for people imitating an upper-class British accent, or Arnold Schwarzenegger, or Sarah Palin.
And that’s the rub. The speakers in this study consciously insert creak into their voice, but not in a way that it occurs in natural speech. For example, previous studies of vocal fry have found that it’s fairly restricted. It tends to arise in specific locations in words and sentences where we might otherwise expect low pitch. But that’s not the case here. Instead, the speakers produce a flat, more-or-less robotic voice when simulating vocal fry.
That’s not the only way in which the imitated speech sounds unnatural, however. With one exception (speaker 5), each of the imitated sentences produced by female speakers is also longer than the corresponding non-imitated sentence for that speaker, as shown in the chart below:
These differences do not appear to be restricted to particular words either. As seen in the figure below, almost all words were longer in the imitated speech than in the natural speech. The longer duration here, in comparison with the shorter natural sentences, may have the quality of sounding stilted to the listener.
A related problem in the study is the authors’ acoustic analysis of the speech signal. The calculation of pitch requires determining how well successive vocal fold vibrations correlate with one another. When the vocal folds are vibrating normally, such a calculation is possible, but when vocal fold vibration is too irregular, as in vocal fry, it’s impossible to calculate pitch accurately. We can try, but we may get erroneous values. The researchers here, however, neither controlled for this problem nor even mentioned how pitch was calculated during stretches of vocal fry. In fact, the words “Thank you,” which contained no vocal fry in any of the utterances, had universally lower pitch in the vocal fry sentences than in the normal sentences. This suggests that the speakers may have simply been lowering pitch across the entire imitated sentence, rather than simply adding vocal fry.
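To make the pitch problem concrete, here is a minimal sketch of a correlation-based pitch estimate. It is an illustration only, not the method used in the study (which, as noted, was not reported). The 8,000 Hz sample rate, the search range, and both pulse trains are invented for the example, with single-sample pulses as a crude stand-in for vocal fold vibrations.

```python
import math

def pulse_train(periods):
    """Build a 0/1 signal with a pulse at sample 0 and one after each period (in samples)."""
    positions = [0]
    for period in periods:
        positions.append(positions[-1] + period)
    signal = [0.0] * (positions[-1] + 1)
    for pos in positions:
        signal[pos] = 1.0
    return signal

def normalized_autocorr(x, lag):
    """Correlation between the signal and a copy of itself shifted by `lag` samples."""
    head, tail = x[:len(x) - lag], x[lag:]
    num = sum(u * v for u, v in zip(head, tail))
    den = math.sqrt(sum(u * u for u in head) * sum(v * v for v in tail))
    return num / den if den > 0 else 0.0

def estimate_pitch(x, sample_rate, fmin=125.0, fmax=400.0):
    """Return (pitch in Hz, correlation strength) for the best lag in the search range."""
    lags = range(int(sample_rate / fmax), int(sample_rate / fmin) + 1)
    best = max(lags, key=lambda lag: normalized_autocorr(x, lag))
    return sample_rate / best, normalized_autocorr(x, best)

SAMPLE_RATE = 8000  # assumed value, chosen only for this example

# Hypothetical data: evenly spaced pulses versus unevenly spaced ones.
regular = pulse_train([40] * 14)
irregular = pulse_train([33, 52, 41, 60, 37, 55, 45, 63, 35, 49, 58, 39])

regular_pitch, regular_strength = estimate_pitch(regular, SAMPLE_RATE)
irregular_pitch, irregular_strength = estimate_pitch(irregular, SAMPLE_RATE)

print("regular:  ", regular_pitch, "Hz, strength", round(regular_strength, 3))
print("irregular:", irregular_pitch, "Hz, strength", round(irregular_strength, 3))
```

For the evenly spaced pulses, the estimate lands on the true rate with a correlation near 1. For the uneven pulses, the function still returns a number, but the correlation behind it is very weak, which is exactly the situation in which a reported pitch value should not be taken at face value.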
Finally, no quantitative acoustic analysis of actual vocal fry (using technical measurements like jitter or shimmer) was ever included in the authors’ study. Yes, you heard that right: in a study relating vocal fry to listener attitudes and hireability, there was no quantitative estimate of whether and how the imitated speech differed from naturally occurring vocal fry, the supposed test variable. (In passing, we should note that the simulated vocal fry yielded a relatively negative impression for both male and female speakers, with the female speakers coming out a little bit worse, so the fact that headlines targeted women exclusively is itself misleading.)
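For readers unfamiliar with these measurements, the sketch below shows one common way they are defined: "local" jitter is the average change between consecutive vibration periods, divided by the average period, and local shimmer is the same ratio computed on pulse amplitudes. The function name and all the numbers are hypothetical, made up to illustrate the arithmetic; none of them come from the study's recordings.

```python
def local_perturbation(values):
    """Mean absolute difference between neighbors, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical vibration periods in milliseconds (not measured data).
periods_normal_ms = [5.0, 5.0, 5.0, 5.0, 5.0]
periods_fry_ms = [12.0, 20.0, 14.0, 22.0, 12.0]

# Hypothetical peak amplitudes of the same pulses (arbitrary units).
amps_normal = [1.0, 1.0, 1.0, 1.0, 1.0]
amps_fry = [0.5, 1.0, 0.5, 1.0, 0.5]

jitter_normal = local_perturbation(periods_normal_ms)
jitter_fry = local_perturbation(periods_fry_ms)
shimmer_normal = local_perturbation(amps_normal)
shimmer_fry = local_perturbation(amps_fry)

print("jitter: ", jitter_normal, "vs", jitter_fry)
print("shimmer:", shimmer_normal, "vs", shimmer_fry)
```

Numbers of this kind, computed on both the imitated utterances and on naturally occurring vocal fry, would be one way to check whether the two actually resemble each other.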
To sum up, the speakers in the study appear to have simply lowered their overall pitch level while imitating vocal fry rather than including more vocal fry in a natural fashion. The increased effort involved in the imitation also seems to have made their utterances longer.
These two acoustic differences, among others, would seem to contribute to the speakers sounding unnatural when imitating vocal fry. So, when listeners judge speakers with vocal fry as sounding “untrustworthy,” there’s a good chance that they’re simply making that assessment based on the speaker not sounding like herself. The better lesson that we might take away from this study, then, is that your job prospects are harmed if you try to talk (or act) like someone you’re not.
A version of this post appeared on Language Log.