An A.I. programmer responds to Annalee Newitz’s “When Robot and Crow Saved East St. Louis.”
In 2018 the A.I. robot CIMON was sent to the International Space Station—and that’s when the awkwardness began. A floating sphere with a digital face displaying a few simple expressions, CIMON was supposed to help astronauts through many-step procedures by displaying information and answering questions. When astronaut Alexander Gerst tested it, he found CIMON’s maneuverability impressive but its social awareness perhaps less so. It had been programmed to know Gerst’s favorite song, but had to be ordered multiple times to stop playing it. “Let’s sing along with those favorite hits,” it interrupted, as Gerst tried to get it to record video. Moments later it seemed to take exception to Gerst’s mild comments on its flying ability. “Don’t be so mean, please,” it told him. “Don’t you like it here with me?” Soon it seemed CIMON’s mood detection system had a “hangry” category and had confusedly placed Gerst in it. “Oh, dear, I feel you. I can already hear your stomach roaring. Should we take a look for when it is time for food?” CIMON was soon stowed away.
In Annalee Newitz’s story “When Robot and Crow Saved East St. Louis,” there’s another little round bot called, simply, Robot. Like CIMON, its job is to interact with people: Robot’s purpose is to monitor local humans for signs of disease to spot epidemics before they spread. Unlike CIMON, it’s so charming that I just want to pat its little robot head. “Hello,” it says. “I am your friendly neighborhood flu fighter! Please cough into this tissue and hold it up to the scanner please!” It gives plump-cheeked smiles and waves its tiny gripper arm.
Like CIMON, Robot one day reaches the limits of its programming. Robot travels beyond its initial route among gated communities and smart mansions to the less-gentrified neighborhoods across the river. There it encounters humans who speak languages and dialects it hasn’t been trained on, and humans living in buildings and tents that aren’t registered in any of its databases. It’s completely unprepared to work in these neighborhoods.
Robot should have failed. Algorithmic bias has been the downfall of many a modern-day A.I., from face-recognition software that performs poorly on dark-skinned people, to voice-recognition software that can’t understand some voices, to software that perpetuates or even amplifies biases in the human behaviors it copies.
But then, something magical happens. Robot recognizes that something is wrong and phones its programmer to ask for advice. And when she tells Robot that the solution is to learn new human dialects and habits, it understands what she wants it to do, downloads the code she sends it, and starts gathering data that will help it fix the shortcomings in its training data.
To an A.I. programmer, this is truly a fairy-tale moment. From a real A.I. today, the programmer might get an error code—if they’re lucky. More likely, a robot in this situation would barge on with bad data, working badly and never knowing it, while accusing its users of “being mean.” Unable to navigate unfamiliar building types, a real-life Robot probably would have ended up lost in a closet like a flying Roomba, sticking to its pre-programmed conversational routines, cheerfully asking a Spock T-shirt if it would please cough into a handkerchief.
The problem is that today’s A.I.s—and those of the foreseeable future—are very limited in scope. Called artificial narrow intelligence, or ANI, they can perform well on simple tasks, but don’t understand the world around them or what it is they’re really doing. The A.I.s in our science fiction are almost all artificial general intelligence, or AGI, which can perceive the world at or above the level of a human. Robot at first seems like it might be an ANI—it says what it has been programmed to say, and it has a “preference” for its initial route simply because of its internal coordinate system. But all that changes when Robot phones home. An A.I. programmer sees Robot notice the flaws in its training data, understand abstract concepts, and pick up languages on the fly, and she knows from this point forward that Robot is an AGI and is therefore as magical as a talking mirror from a fairy tale.
The story already has the trappings of a fairy tale, with its talking animals (crows make a delightful appearance), and its title even sounds like a traditional origin tale along the lines of “Why Giraffe has a long neck” or “How Rhinoceros got his skin.”
Like any good fairy tale, there’s just enough reality in this story to keep us grounded. Even ANIs can pick up things like languages and social habits by observation—that’s one of the main attractions of machine learning, the type of algorithm that allows an A.I. to learn by example rather than being explicitly programmed.
For example, a word-vector algorithm will crawl huge corpora of text such as Amazon product reviews or news articles, and learn from context which words are similar. Map these words into a 3D space and you can sail through their meanings as if through a galaxy, passing clumps of city names, or sports terms, or navigating smoothly from words about politics to words about crime. Using these relationships, word-vector algorithms know that “looooooove,” “loveeeeee,” and “lovelovelovelovelove” are similar in meaning, but that “llove” is more distantly related.
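To make "learning from context" concrete, here is a minimal sketch in Python. It is not how any production word-vector system works: it just counts which words appear near each other in a tiny corpus I made up for illustration, then compares those counts with cosine similarity. The corpus, the function names, and the window size are all assumptions of mine.

```python
import math
from collections import Counter, defaultdict

# Toy corpus invented for illustration; real systems crawl millions of sentences.
CORPUS = [
    "i love this blender",
    "i looove this blender",
    "i love this toaster",
    "i looove this toaster",
    "the senate passed the bill",
    "the senate debated the bill",
]


def context_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vectors = context_vectors(CORPUS)
love_vs_looove = cosine(vectors["love"], vectors["looove"])
love_vs_senate = cosine(vectors["love"], vectors["senate"])
print(f"love vs. looove: {love_vs_looove:.2f}")
print(f"love vs. senate: {love_vs_senate:.2f}")
```

The sketch never sees a definition of "love." It scores "love" and "looove" as near-identical only because they keep the same company, and it scores "love" and "senate" as unrelated because they never share a neighbor.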
Other language algorithms can also learn by example. Given enough examples of how humans have translated certain phrases in certain contexts, translation algorithms like those behind Google Translate can make a paragraph more or less readable in another language. Similar methods have also been used to train algorithms to identify spam emails, sort potential metal band names from would-be My Little Pony names, or determine whether a review is praising a product or trashing it. Algorithms can even learn to generate new text of their own, mimicking product reviews, recipes, Harry Potter fan fiction, and more.
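As a rough sketch of what learning by example looks like for the review-sorting case, here is a tiny naive Bayes classifier in Python. The four training reviews, the labels, and the function names are invented for illustration, and a real system would train on vastly more data and likely use a different method altogether.

```python
import math
from collections import Counter

# Hypothetical labeled examples; a real training set would have thousands or millions.
TRAINING = [
    ("love this blender works great", "praise"),
    ("great toaster would buy again", "praise"),
    ("broke after one day terrible", "trash"),
    ("terrible waste of money", "trash"),
]


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Pick the label whose training reviews make these words most likely."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing so an unseen word does not zero out the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("great blender", *model))
print(classify("terrible toaster broke", *model))
```

Nobody told this program that "terrible" is a bad word. It only knows that "terrible" showed up in the reviews a human had labeled as trashing the product, which is also why it has nothing sensible to say about words it has never seen.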
But as anyone who’s interacted with Siri knows, today’s A.I.s don’t understand much about language. They show their limited ANI nature as soon as they get out of their comfort zones—try asking Siri to reword a sentence, or to name three things larger than a lobster. Given enough training examples, a Robot using real-life machine learning might be able to understand verbal directions to a street address. But it probably would be out of its depth if the human told it to fly in through the unfinished walls of the upper floor—unless flying through unfinished walls had figured prominently in that initial training data. And even then, if flying through unfinished walls is tricky enough, an ANI may not be able to learn the skill reliably. When Robot navigates highly variable building designs from mansions to construction sites to tents, it’s showing off its magical AGI abilities.
Robot also learned its new languages from brief conversations, rather than by analyzing huge text data sets, another sign that its abilities are far beyond the ordinary. Word-vector and translation algorithms have the jobs that are closest to what Robot needs for language, but they usually need huge amounts of data: all of Wikipedia, for example. And for some languages, including many with millions of speakers, there aren’t many examples of translated texts.
When the training data sets get small, things get weird. For example, when Reddit users and others began using Google Translate to translate nonsense strings of English words into some languages, like Maori or Somali, they found the A.I. would emit strange religious-sounding prophecies. One explanation for the outputs? With these languages, religious texts like the Bible might figure prominently in the small body of translated writings the algorithm trained on. When uncertain about the translation, the A.I. might have resorted to returning the few phrases it knew. In its early days of learning its new languages, Robot might have sounded very odd.
Robot’s ability to pick up nuance was also impressive. There are DARPA crisis teams working to build tools that can scrape together basic translation from what’s available on social media, but these aren’t expected to be nuanced. They need to be just good enough to support emergency assistance, not good enough to make endearing puns. In this story, Robot has even less data to go on, just a few overheard conversations. It succeeds because, well, magical talking Robot.
Another difficulty Robot overcame was learning from conversation rather than by reading text. To understand someone pointing to an object and saying “this is a sheep,” you have to first be able to reliably recognize a sheep. Even the best image-recognition algorithms are tripped up by images that seem obvious to humans. They do a lot of guessing based on things they’ve seen during training, and they don’t do well when things don’t match their training data. A family of bears in a field gets labeled as a herd of cows. A sheep in a kitchen gets labeled as a dog. Goats in a tree get labeled as giraffes or birds. And even if you understand what’s going on, there are ambiguities. If a person says a word and then turns and takes a few steps, are they demonstrating “walk” or “left” or “leave”? Even humans find this kind of language acquisition difficult. The ANIs of today (and of the near future) don’t understand the world well enough for this.
Robot is a cross between a magic talking mirror and a fairy-tale fool: inexperienced, but pure of heart and improbably successful. In this fairy tale, Robot works for the CDC, but Robot-level technology would be a real game-changer for humanitarian crisis responders, the crew of the International Space Station, food delivery startups, and more. It would be wonderful to have a robot that could understand instructions, internalize a goal like “help sick people,” and ask for clarification when it’s having trouble. That’s not to mention Robot’s seriously cool ability to decipher the language of crows—I know a few field researchers who’d love to have a Robot on their team.
Having a well-meaning mind behind our algorithms would save us from a lot of the harm that we’re unwittingly inflicting—algorithms that copy our biases, that recommend extreme videos and articles, or that censor non-white, non-binary, non-heterosexual, and/or disabled voices. But what we have running our algorithms in real life is less like the AGI of Robot, and a lot more like the ANI of a Roomba. For now, we can’t rely on our friendly neighborhood talking Robots to rise beyond their biased training data and politely ping us when they’ve discovered a problem. (As CIMON demonstrates, we can’t even rely on them to be friendly!)
Instead, we’ve got to be careful not to expect our algorithms to behave like fun fairy-tale Robots.
Strive for Robot, plan for CIMON.