If you’ve managed to get over your own NSA-induced, Snowdenian fear of typing, here’s another important privacy question: Do you trust that bag of potato chips you’re holding? The word out of MIT is that you probably shouldn’t. Nearby potted plants should also be treated with suspicion. What makes these everyday items a threat to your (conversational) personal data? On Monday, MIT announced that its researchers, along with colleagues at Microsoft and Adobe, have developed an algorithm that can reconstruct sound simply by analyzing video of the vibrations of objects around you.
In one set of experiments, [researchers] were able to recover intelligible speech from the vibrations of a potato-chip bag photographed from 15 feet away through soundproof glass. In other experiments, they extracted useful audio signals from videos of aluminum foil, the surface of a glass of water, and even the leaves of a potted plant… “When sound hits an object, it causes the object to vibrate,” says Abe Davis, a graduate student in electrical engineering and computer science at MIT and first author on the new paper. “The motion of this vibration creates a very subtle visual signal that’s usually invisible to the naked eye. People didn’t realize that this information was there.”
Reconstructing audio from video requires that the frequency of the video samples — the number of frames of video captured per second — be higher than the frequency of the audio signal (to capture a vibration faithfully, the sampling rate must be at least twice its frequency, a limit known as the Nyquist rate). In some of their experiments, the researchers used a high-speed camera that captured 2,000 to 6,000 frames per second. That’s much faster than the 60 frames per second possible with some smartphones, but well below the frame rates of the best commercial high-speed cameras, which can top 100,000 frames per second.
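The sampling constraint above is easy to work out for the frame rates mentioned. A minimal sketch (the function name here is illustrative, not from the researchers' code): the highest vibration frequency a camera can faithfully capture is half its frame rate.

```python
# Sketch of the sampling constraint described above: a camera recording
# F frames per second can faithfully capture vibrations only up to F/2 Hz
# (the Nyquist limit). Function name is illustrative, not from the paper.

def max_recoverable_hz(frames_per_second: float) -> float:
    """Highest vibration frequency recoverable at a given frame rate."""
    return frames_per_second / 2.0

# Frame rates mentioned in the article:
for fps in (60, 2_000, 6_000, 100_000):
    print(f"{fps:>7} fps -> vibrations up to {max_recoverable_hz(fps):>8.0f} Hz")
```

At 2,000 to 6,000 frames per second, that works out to an upper bound of roughly 1,000 to 3,000 Hz, which covers much of the frequency range where human speech carries its energy.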
The researchers plan to present their work at a computer graphics conference, but if you’d like a quick explainer, they have a handy video that breaks down their findings.