This question originally appeared on Quora.
Answer by Marc Ettlinger, Ph.D., Linguistics, UC-Berkeley: The articulators ventriloquists have to constrain are the lips and jaw. Everything else can’t be seen (except the larynx moving up and down, which most audiences won’t notice; next time you watch one, especially a man with a prominent Adam’s apple, keep an eye on the neck). Ventriloquists set the jaw at a fixed point that is closed enough that people can’t see the tongue moving inside, but open enough to produce many of the sounds. And the lips are held open enough to make vowels, coronal and velar consonants, plus a w-ish sound.
What’s left are the labial consonants: b, p, m, v, f, and w.
With the only constraint being that the lips can’t move from this near-fixed position (though watch carefully and many will cheat on the occasional b), the trouble is really only with these sounds. Obviously, the first trick is to avoid words containing them. When that’s impossible, ventriloquists substitute one sound for another: b with d, p with t, m with n, v with w, and f with th. You can catch these substitutions if you pay careful attention. In this YouTube video, look at the way this performer says Simon (jump to 3:17).
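The substitution strategy can be sketched as a simple mapping. This is an illustrative simplification: the mapping operates on spelling here, while real substitutions operate on sounds, and the function name is made up for the example.

```python
# The labial-substitution strategy: each labial consonant is swapped
# for a similar-sounding, lips-free stand-in.
LABIAL_SUBSTITUTIONS = {
    "b": "d",   # voiced labial stop -> voiced alveolar stop
    "p": "t",   # voiceless labial stop -> voiceless alveolar stop
    "m": "n",   # labial nasal -> alveolar nasal
    "v": "w",   # labial fricative -> approximant
    "f": "th",  # voiceless labial fricative -> dental fricative
}

def ventriloquize(word: str) -> str:
    """Replace labial letters with the substitutes a ventriloquist might use."""
    return "".join(LABIAL_SUBSTITUTIONS.get(ch, ch) for ch in word.lower())

print(ventriloquize("fresh"))  # -> "thresh"
print(ventriloquize("Simon"))  # -> "sinon"
```

The two printed results match the substitutions noted in the videos above: fresh comes out as thresh, and the m in Simon surfaces as n.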
Or, in this video, notice how this woman says fresh as thresh (at 0:20). (Compare, for example, the common substitution of f for th in certain English dialects, e.g., I’m free years old.)
There are two reasons why we don’t really notice.
First, acoustically, n and m aren’t that distinct.
Same with b and d.
Second, and more importantly, our perception of speech is driven as much by top-down as by bottom-up processing. That is, what you expect to hear makes you hear things that might not be there. It’s why all that stuff with devil’s messages on back-masked records worked: there really was nothing there, but if someone tells you what you’re going to hear, you somehow make it out.
A famous example is the Ganong effect, where people hear an ambiguous sound, somewhere between d and t, in a context that makes it either a word or a non-word (e.g., dash and task, where tash and dask aren’t words).
For the exact same sound, people report hearing whichever sound makes a word.
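The lexical bias at the heart of the Ganong effect can be illustrated with a toy model: when the acoustic evidence is equally consistent with two sounds, the listener settles on the reading that forms a real word. The tiny lexicon and function name here are made up for the example.

```python
# Toy model of the Ganong effect: an ambiguous sound between "d" and "t"
# is resolved toward whichever interpretation yields a real word.
LEXICON = {"dash", "task"}

def resolve_ambiguous(frame: str, candidates=("d", "t")) -> str:
    """Fill the ambiguous slot ('_') with the candidate that yields a word."""
    for c in candidates:
        word = frame.replace("_", c)
        if word in LEXICON:
            return word
    # No lexical bias available: fall back to the first candidate.
    return frame.replace("_", candidates[0])

print(resolve_ambiguous("_ash"))  # -> "dash" (tash isn't a word)
print(resolve_ambiguous("_ask"))  # -> "task" (dask isn't a word)
```

The same acoustic input (the "_" slot) comes out as d in one frame and t in the other, purely because of what the lexicon makes available, which is exactly the top-down pressure the ventriloquist exploits.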
The same thing is happening with the ventriloquist: They take advantage of top-down processing to make you hear the sounds they’re not making.
One last thing on the linguistics associated with ventriloquism:
When a ventriloquist fixes their jaw, the tongue compensates for the lack of jaw movement, so the sound is made in a different way than usual.
Compare, for example, your jaw position for a regular ee and ah. Now fix your jaw in the ee position and try to make an ah.
It sounds a little bit off from a normal ah. That’s because you’re not in the canonical, optimal position. The reason we have the vowels we do (every language has ee, most have u, and many have ah) is that the mouth is an optimally designed resonator for certain vowels. Making the ah differently no longer takes advantage of this quantal vowel space (Vowel Theories). That is, while tongue position is continuous, clearly articulated vowels are not.
What is linguistically interesting about this tongue compensation for the fixed jaw and lips is that you were likely able to make this non-canonical ah even though you had never tried it before.
That’s because of a feedback loop between your ear and your mouth, which has been tested in some cool ways. Researchers like Prof. Frank Guenther, John Houde and Sazzad Nasir (Sensorimotor Learning Laboratory) have played around with acoustic and sensorimotor feedback and its effects on speech, using contraptions that perturb what a speaker hears or feels in real time.