Is That a Giraffe or a Cockroach?

It is alarmingly easy to trick image recognition systems.

Photo illustration of a giraffe ... or a cockroach.
Photo illustration by Slate. Photos by Getty Images Plus.

Adapted from You Look Like a Thing and I Love You: How Artificial Intelligence Works and Why It’s Making the World a Weirder Place, by Janelle Shane. Out now from Voracious Books.

Suppose you’re running security at a cockroach farm. You’ve got advanced image recognition technology on all the cameras, ready to sound the alarm at the slightest sign of trouble. The day goes uneventfully until, reviewing the logs at the end of your shift, you notice that although the system has recorded zero instances of cockroaches escaping into the staff-only areas, it has recorded seven instances of giraffes. Thinking this a bit odd, perhaps, but not yet alarming, you decide to review the camera footage. You are just beginning to play the first “giraffe” time stamp when you hear the skittering of millions of tiny feet.

What happened?

Your image recognition algorithm was fooled by an adversarial attack. With special knowledge of your algorithm’s design or training data, or even via trial and error, the cockroaches were able to design tiny note cards that would fool the A.I. into thinking it was seeing giraffes instead of cockroaches. The tiny note cards wouldn’t have looked remotely like giraffes to people—they’d be just a bunch of rainbow-colored static. And the cockroaches didn’t even have to hide behind the cards—all they had to do was keep showing the cards to the camera as they walked brazenly down the corridor.

Weirdly hackable machine learning algorithms underlie an increasing number of applications in our world, from image recognition to malware detection to making health care recommendations. They’re useful in part because they can use trial and error to solve tough problems that require finding subtle trends in large amounts of data, such as detecting whether a particular transaction is fraudulent or whether a tissue sample is cancerous. But they’re also susceptible to a strange, cyberpunk form of hacking that leaves them fragile in a way that a human employee wouldn’t be.

Besides the part about sentient cockroaches, this isn’t complete fantasy. These sorts of adversarial attacks are a weird feature of machine learning–based image recognition algorithms. Researchers have demonstrated that they could show an image recognition algorithm a picture of a lifeboat (which it identifies as a lifeboat with 89.2 percent confidence), then add a tiny patch of specially designed noise way over in one corner of the image. A human looking at the picture could tell that this is obviously a picture of a lifeboat with a small patch of rainbow static over in one corner. The A.I., however, identifies the lifeboat as a Scottish terrier with 99.8 percent confidence. The researchers managed to convince the A.I. that a submarine was in fact a bonnet and that a daisy, a brown bear, and a minivan were all tree frogs. The A.I. didn’t even know that it had been fooled by that specific patch of noise. When asked to change a few pixels that would make the bonnet look like a submarine again, the algorithm changed pixels sprinkled throughout the image rather than targeting the guilty noise patch.
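To make “specially designed noise” concrete, here is a minimal Python sketch. It assumes a toy linear classifier on a 4×4 grayscale image rather than a real neural network, and the weights, the image, and every name in it (`WEIGHTS`, `PATCH`, `margin`, `add_patch`) are invented for illustration. Because the toy model is linear and the attacker can see its weights, the most damaging patch can be chosen one pixel at a time. Attacks on real networks have to search for the patch with gradient-based optimization instead.

```python
# Toy white-box adversarial patch against a made-up linear classifier.
# A positive margin means "lifeboat"; a negative margin means "Scottish terrier".
WEIGHTS = [
    -0.9,  0.4, 0.1, 0.1,
    -0.8, -0.7, 0.1, 0.1,
     0.1,  0.4, 0.4, 0.1,
     0.1,  0.4, 0.4, 0.1,
]
PATCH = [0, 1, 4, 5]  # the 2x2 block of pixels in the top-left corner


def margin(image):
    """Weighted sum of pixel brightnesses (0.0 = black, 1.0 = white)."""
    return sum(w * x for w, x in zip(WEIGHTS, image))


def label(image):
    return "lifeboat" if margin(image) > 0 else "Scottish terrier"


def add_patch(image, patch=PATCH):
    """Overwrite only the patch pixels, picking for each one the brightness
    that lowers the margin most. For a linear model each pixel can be chosen
    on its own: white where its weight is negative, black otherwise."""
    patched = list(image)
    for i in patch:
        patched[i] = 1.0 if WEIGHTS[i] < 0 else 0.0
    return patched


# A dim background (0.2) with a bright "boat" in the lower middle.
photo = [0.2] * 16
for i in (9, 10, 13, 14):
    photo[i] = 1.0

print(label(photo), round(margin(photo), 2))                        # → lifeboat 1.36
print(label(add_patch(photo)), round(margin(add_patch(photo)), 2))  # → Scottish terrier -0.64
```

The four patched pixels sit in a corner, away from the “boat,” yet they flip the label because this invented model happens to weight those pixels heavily. That is a toy version of the lifeboat-to-terrier result: the decision rests on pixels a person would ignore.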

That tiny adversarial patch of static is the difference between a functioning algorithm and a mass cockroach breakout.

It’s easiest to design an adversarial attack when you have access to the inner workings of the algorithm. But it turns out that you can fool a stranger’s algorithm, too. Researchers at LabSix have found that they can design adversarial attacks that work on neural networks (a kind of machine learning algorithm loosely based on the brain), even when they don’t have access to the networks’ inner connections. Using a trial-and-error method, they could fool neural nets when they could see only the nets’ final decisions, and even when they were allowed only a limited number of tries (100,000, in this case). Just by manipulating the images they showed Google’s image recognition tool, they managed to fool it into thinking a photo of skiers was a photo of a dog instead.

Here’s how: Starting with a photo of a dog, they replaced some of its pixels one by one with pixels from a photo of skiers, making sure to pick only pixels that didn’t seem to have an effect on how much the A.I. thought the photo looked like a dog. If you played this game with a human, past a certain point the human would start to see the skiers overlaid on the picture of the dog. Eventually, when most of the pixels were changed, the human would see only skiers and no dog. The A.I., however, still thought the picture was a dog, even after so many pixels were replaced that humans would see an obvious photo of skiers. The A.I. seemed to base its decisions on a few crucial pixels, their roles invisible to humans.
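The pixel-swapping procedure can be sketched as a greedy loop that only ever sees a score. This is a simplified illustration, not LabSix’s actual algorithm: `toy_dog_score` is a hypothetical stand-in for a remote image recognition service, the assumption that it looks at just three “crucial” pixels is invented to mirror the observation above, and the query budget plays the role of the limited number of tries.

```python
class QueryBudgetExceeded(Exception):
    pass


class BlackBox:
    """Wraps a scoring function: the attacker sees only its outputs, and only `budget` times."""

    def __init__(self, score_fn, budget):
        self.score_fn = score_fn
        self.budget = budget
        self.queries = 0

    def __call__(self, image):
        if self.queries >= self.budget:
            raise QueryBudgetExceeded
        self.queries += 1
        return self.score_fn(image)


def swap_in_pixels(oracle, start, donor, tolerance=0.0):
    """Copy pixels from `donor` into `start` one at a time, keeping a swap
    only if the model's score does not drop by more than `tolerance`."""
    current = list(start)
    baseline = oracle(current)  # how dog-like the untouched photo looks
    for i in range(len(current)):
        if current[i] == donor[i]:
            continue
        old = current[i]
        current[i] = donor[i]
        try:
            score = oracle(current)
        except QueryBudgetExceeded:
            current[i] = old  # out of tries: undo the untested swap and stop
            break
        if score < baseline - tolerance:
            current[i] = old  # this pixel mattered to the model, so keep the dog's
    return current


# Hypothetical stand-in for the remote model: it only looks at three pixels.
CRUCIAL = (3, 17, 42)
DOG = [0.8] * 100
SKIERS = [0.1] * 100


def toy_dog_score(image):
    return 1.0 - sum(abs(image[i] - DOG[i]) for i in CRUCIAL) / len(CRUCIAL)


oracle = BlackBox(toy_dog_score, budget=100_000)
disguised = swap_in_pixels(oracle, DOG, SKIERS)
from_skiers = sum(a == b for a, b in zip(disguised, SKIERS))
print(from_skiers, toy_dog_score(disguised), oracle.queries)  # → 97 1.0 101
```

A person looking at `disguised` would see skiers (97 of its 100 pixels came from that photo), but the only thing the attacker can observe, the dog score, never moved.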

So could you protect your algorithm against adversarial attacks if you didn’t let anyone play with it or see its code? It turns out that it might still be susceptible if the attacker knows what data set it has been trained on. This potential vulnerability shows up in real-world applications like medical imaging and fingerprint scanning.

The problem is that there are just a few image data sets in the world that are both free to use and large enough to be useful for training image recognition algorithms, and many companies and research groups use them. These data sets have their problems. ImageNet, for one, has 126 breeds of dogs but no horses or giraffes, and the humans in it mostly have light skin. But they’re convenient because they’re free. Adversarial attacks designed for one A.I. will likely also work on others that learned from the same data set of images. The training data seems to be the important thing, not the details of the way the A.I. was designed. This means that even if you kept your A.I.’s code secret, hackers may still be able to design adversarial attacks that fool your A.I. if you don’t go to the time and expense of creating your own proprietary data set.
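The idea that attacks transfer between models trained on the same data can be illustrated with a two-dimensional toy, sketched in Python under heavy simplifying assumptions: the “photos” are points, label 1 means “dog,” and the three models (a logistic regression and two nearest-centroid classifiers) are stand-ins chosen for brevity, not the architectures used in the research. The attacker computes a fast-gradient step using only model A’s weights, then checks whether it also fools model B (a different design, same data) and model C (the same design as B, different data).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(points, labels, lr=0.1, epochs=500):
    """Model A: logistic regression fit by batch gradient descent."""
    wx = wy = b = 0.0
    n = len(points)
    for _ in range(epochs):
        gx = gy = gb = 0.0
        for (x, y), t in zip(points, labels):
            err = sigmoid(wx * x + wy * y + b) - t
            gx += err * x
            gy += err * y
            gb += err
        wx -= lr * gx / n
        wy -= lr * gy / n
        b -= lr * gb / n
    return wx, wy, b


def predict_logistic(model, point):
    wx, wy, b = model
    return 1 if wx * point[0] + wy * point[1] + b > 0 else 0


def train_centroids(points, labels):
    """Models B and C: remember the average point of each class."""
    centroids = {}
    for c in (0, 1):
        members = [p for p, t in zip(points, labels) if t == c]
        centroids[c] = (sum(p[0] for p in members) / len(members),
                        sum(p[1] for p in members) / len(members))
    return centroids


def predict_centroid(centroids, point):
    return min(centroids, key=lambda c: math.dist(point, centroids[c]))


def fast_gradient_step(model, point, eps):
    """White-box attack on model A: step `eps` against the gradient of its logit."""
    wx, wy, _ = model
    norm = math.hypot(wx, wy)
    return (point[0] - eps * wx / norm, point[1] - eps * wy / norm)


DOGS = [(2, 0), (3, 1), (2, 1), (3, 0)]
SHARED = DOGS + [(-2, 0), (-3, 1), (-2, 1), (-3, 0)]  # one data set...
OTHER = DOGS + [(-8, 0), (-9, 1), (-8, 1), (-9, 0)]   # ...and a different one
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

model_a = train_logistic(SHARED, LABELS)
model_b = train_centroids(SHARED, LABELS)
model_c = train_centroids(OTHER, LABELS)

adversarial = fast_gradient_step(model_a, (2, 0), eps=3.0)
print(predict_logistic(model_a, adversarial))  # → 0: model A is fooled
print(predict_centroid(model_b, adversarial))  # → 0: same data, different design, also fooled
print(predict_centroid(model_c, adversarial))  # → 1: different data, still "dog"
```

Models A and B were built differently, but they learned nearly the same boundary from the same points, so a step that crosses one crosses the other. Model C learned its boundary from different points, and the same step falls short.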

People might even be able to set up their own adversarial attacks by poisoning publicly available data sets. There are public data sets, for example, to which people can contribute samples of malware to train anti-malware A.I. But a paper published in 2018 showed that if a hacker submits enough samples to one of these malware data sets (enough to corrupt just 3 percent of the data set), then the hacker would be able to design adversarial attacks that foil A.I.s trained on it.
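A minimal sketch of how a few contributed samples can bend what a model learns, assuming a deliberately crude detector: each program is reduced to one hypothetical “suspiciousness” number, and a program is labeled malware if that number is closer to the average malware sample than to the average benign one. This is not the method from the 2018 paper; it only shows how poisoning 3 percent of a data set can move a decision boundary far enough for an attacker’s own sample to slip past.

```python
def mean(values):
    return sum(values) / len(values)


def train(benign, malware):
    """Nearest-centroid detector: remember the average score of each class."""
    return mean(benign), mean(malware)


def classify(model, x):
    benign_center, malware_center = model
    return "malware" if abs(x - malware_center) < abs(x - benign_center) else "benign"


# Hypothetical public data set: one "suspiciousness" score per program.
benign = [i % 5 for i in range(485)]        # scores 0-4, average 2
malware = [10 + i % 5 for i in range(485)]  # scores 10-14, average 12

# The attacker contributes 30 extreme "malware" samples: 3% of the final 1,000.
poison = [100.0] * 30

clean_model = train(benign, malware)
poisoned_model = train(benign, malware + poison)

attack_sample = 8.0  # hypothetical: the attacker's real malware
print(classify(clean_model, attack_sample))     # → malware
print(classify(poisoned_model, attack_sample))  # → benign
print(classify(poisoned_model, 12.0))           # → malware (ordinary malware is still caught)
```

None of the poisoned samples is mislabeled in this toy: they really are (toy) malware and are themselves still flagged. What changes is where the detector draws the line.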

It’s not entirely clear why an algorithm’s success hinges so much more on its training data than its design. And it’s a bit worrying, since it means that the algorithms may in fact be recognizing weird quirks of their data sets rather than learning to recognize objects in all kinds of situations and lighting conditions. This data set memorization (in other words, overfitting) might still be a far more widespread problem in image recognition algorithms than we’d like to believe.

But it also means that algorithms in the same family (algorithms that learned from the same training data) understand each other strangely well. When I asked a text-to-image algorithm called AttnGAN to generate a photo of “a girl eating a large slice of cake,” it generated something barely recognizable. Blobs of cake floated around a fleshy hair-topped lump studded with far too many orifices. The cake texture was admittedly well done. But a human would not have known what the algorithm was trying to draw.

But do you know who can tell what AttnGAN was trying to draw? Other image recognition algorithms that were trained on the COCO data set. Visual Chatbot gets it almost exactly right, reporting “a little girl is eating a piece of cake.”

The image recognition algorithms that were trained on other data sets, however, are mystified. “Candle?” guesses one of them. “King crab?” “Pretzel?” “Conch?”

The artist Tom White has used this effect to create a new kind of abstract art. He gives one A.I. a palette of abstract blobs and color washes and tells it to draw something (a jack-o’-lantern, for example) that another A.I. can identify. The resulting drawings look only vaguely like the things they’re supposed to be: a “measuring cup” is a squat green blob covered in horizontal scribbles, and a “cello” looks more like a human heart than a musical instrument. But to ImageNet-trained algorithms, the pictures are uncannily accurate. In a way, this artwork is a form of adversarial attack.

Of course, as in our earlier cockroach scenario, adversarial attacks are often bad news. In 2018 a team from Harvard Medical School and MIT warned that adversarial attacks in medicine could be particularly insidious, as well as profitable. Today, people are developing image recognition algorithms to automatically screen X-rays, tissue samples, and other medical images for signs of disease. The idea is to save time by doing high-throughput screening so humans don’t have to look at every image. Plus, the results could be consistent from hospital to hospital, everywhere the software is implemented, so they could be used to decide which patients qualify for certain treatments or to compare various drugs to one another.

That’s where the motivation for hacking comes in. In the United States, insurance fraud is already lucrative, and some health-care providers are adding unnecessary tests and procedures to increase revenue. An adversarial attack would be a handy, hard-to-detect way to move some patients from category A to category B. There’s also the temptation to tweak the results of clinical trials so a profitable new drug gets approved. And since a lot of medical image recognition algorithms are generic ImageNet-trained algorithms that have had a little extra training time on a specialized medical data set, they’re relatively easy to hack. This doesn’t mean it’s hopeless to use machine learning in medicine; it just means that we may always need a human expert spot-checking the algorithm’s work.

Another application that may be particularly vulnerable to adversarial attack is fingerprint reading. A team recently showed that it could use adversarial attacks to design what it called a masterprint: a single fingerprint that could pass for 77 percent of the prints in a low-security fingerprint reader. The team was also able to fool higher-security readers, or commercial fingerprint readers trained on different data sets, a significant portion of the time. The masterprints even looked like regular fingerprints (unlike other spoofed images, which contain static or other distortions), which made the spoofing harder to spot.
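Why a lenient reader is so easy to fool can be sketched with a toy model of partial prints. The assumptions are loud: each enrolled print is reduced to a small set of hypothetical integer “feature IDs,” the reader accepts a finger if it shares at least `threshold` features with an enrolled print, and the masterprint is built by simply keeping the most common features. This is not the team’s method; it only shows how features that many partial prints share let one print match many people.

```python
from collections import Counter


def matches(candidate, template, threshold):
    """Toy matcher: accept if the two prints share at least `threshold` features."""
    return len(candidate & template) >= threshold


def build_masterprint(templates, size):
    """Greedy guess: keep the `size` features that appear in the most enrolled prints."""
    counts = Counter(f for t in templates for f in t)
    ranked = sorted(counts, key=lambda f: (-counts[f], f))  # most common first, ties by ID
    return set(ranked[:size])


def match_rate(candidate, templates, threshold):
    return sum(matches(candidate, t, threshold) for t in templates) / len(templates)


# Hypothetical enrolled partial prints; many share a few common features.
ENROLLED = [
    {0, 1, 2}, {0, 1, 3}, {0, 2, 4},
    {1, 2, 5}, {0, 1, 6}, {7, 8, 9},
]

master = build_masterprint(ENROLLED, size=3)
print(master)                           # → {0, 1, 2}
print(match_rate(master, ENROLLED, 2))  # lenient reader: 5 of 6 prints match
print(match_rate(master, ENROLLED, 3))  # stricter reader: 1 of 6
```

Raising the threshold from two shared features to three drops the toy masterprint from 5 of 6 enrolled prints to 1 of 6, a miniature version of the gap between low-security and higher-security readers.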

Voice-to-text algorithms can also be hacked. Make an audio clip of a voice saying “Seal the doors before the cockroaches get in,” and you can overlay noise that a human will hear as subtle static but that a voice recognition algorithm will hear as “Please enjoy a delicious sandwich.” It’s possible to hide messages in music or even in silence.

Résumé screening services might also be susceptible to adversarial attack, not by hackers with algorithms of their own but by people trying to alter their résumés in subtle ways to make it past the A.I. The Guardian reports: “One HR employee for a major technology company recommends slipping the words ‘Oxford’ or ‘Cambridge’ into a CV in invisible white text, to pass the automated screening.”
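The invisible-text trick works because a screener that reads a document’s extracted text does not see what a human sees. A minimal sketch, assuming a hypothetical keyword-counting screener and a résumé represented as runs of text, each with a color; real screening products differ and may not work this way.

```python
BACKGROUND = "white"
KEYWORDS = {"oxford", "cambridge", "python"}  # hypothetical screening keywords


def visible_text(runs):
    """What a human reader sees: text whose color differs from the page."""
    return " ".join(text for text, color in runs if color != BACKGROUND)


def extracted_text(runs):
    """What a naive screener reads: every run, whatever its color."""
    return " ".join(text for text, _ in runs)


def passes_screen(runs, threshold=2):
    words = {w.strip(".,;:").lower() for w in extracted_text(runs).split()}
    return len(words & KEYWORDS) >= threshold


resume = [("Jane Doe", "black"), ("Experience: data analysis in Python.", "black")]
padded = resume + [("Oxford Cambridge", "white")]

print(passes_screen(resume), passes_screen(padded))  # → False True
print(visible_text(resume) == visible_text(padded))  # → True
```

To a person the two résumés are identical; only the screener, which never asks what color the words are, sees a difference.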

It’s not like machine learning algorithms are the only technology that’s vulnerable to adversarial attacks. Even humans are susceptible to the Wile E. Coyote style of adversarial attack: putting up a fake stop sign, for example, or drawing a fake tunnel on a solid rock wall. It’s just that machine learning algorithms can be fooled by adversarial attacks that humans would never even register. And as A.I. technology becomes more widespread, we may be in for an arms race between A.I. security and increasingly sophisticated and difficult-to-detect hacks.


Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.