Lip-reading aids the hearing and speech impaired, but it’s a specialized skill and has limits. For example, it only works if you can see a speaker’s mouth. And it’s not a prevalent skill in the general population, so someone with a speech impediment may have trouble being understood. If computers could do it accurately, more services could incorporate lip-reading functions like real-time speech-to-text translation into their accessibility features. Now researchers at the University of East Anglia are using machine learning to develop a lip-reading process with better accuracy than past attempts. And the ultimate goal is “a fool-proof recognition model for lip-reading.”
Beyond accessibility, the technology has other potential applications as well. One is helping law enforcement work with videos that have poor audio quality or no audio at all (usually the case with security footage). Automated lip-reading would also be useful for video recorded in noisy environments. The researchers even pointed out that lip-reading is already used in sports like soccer, where teams try to gain an advantage by determining from afar what opponents are saying.
Computer science researchers Helen Bear and Richard Harvey developed a training method for their lip-reading program that classifies sounds and mouth shapes more successfully than previous algorithms. Let’s be clear: The new technique only identifies the correct word between 10 and 20 percent of the time. But the translations are often understandable at the scale of a sentence or paragraph. And Bear emphasized to TechCrunch that the accuracy the system achieves is well above random guessing. “The idea behind a machine that can lip read is that the machine itself has got no emotions, it doesn’t mind if it gets it right or wrong — it’s just trying to learn,” she said.
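To give a rough sense of what “classifying mouth shapes” means computationally, here is a minimal sketch. This is purely illustrative and not Bear and Harvey’s actual method: it uses a simple nearest-centroid rule, and the feature values, viseme labels, and function names are all invented for the example.

```python
# Illustrative sketch only: mapping mouth-shape feature vectors
# (e.g. lip width, lip opening) to viseme classes with a
# nearest-centroid rule. All data and labels are invented.

import math
from collections import defaultdict

def train_centroids(samples):
    """samples: list of (feature_vector, viseme_label) pairs."""
    sums = defaultdict(list)
    counts = defaultdict(int)
    for vec, label in samples:
        if not sums[label]:
            sums[label] = [0.0] * len(vec)
        for i, v in enumerate(vec):
            sums[label][i] += v
        counts[label] += 1
    # Average each class's feature vectors into one centroid.
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}

def classify(centroids, vec):
    """Return the label whose centroid is closest to vec."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(centroids, key=lambda label: dist(centroids[label], vec))

# Toy training data: (lip width, lip opening), values invented.
training = [
    ([0.9, 0.2], "spread"),   # e.g. shapes for vowels like "ee"
    ([0.8, 0.3], "spread"),
    ([0.3, 0.8], "open"),     # e.g. shapes for vowels like "ah"
    ([0.4, 0.7], "open"),
    ([0.2, 0.1], "closed"),   # e.g. shapes for "p", "b", "m"
    ([0.1, 0.2], "closed"),
]
centroids = train_centroids(training)
print(classify(centroids, [0.85, 0.25]))  # → spread
```

A real system would extract features from video frames and map sequences of visemes to phonemes and words, which is where most of the difficulty (and the 10-to-20-percent word accuracy) comes in; many mouth shapes are shared by several different sounds.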