Since the very beginning of the computer revolution, researchers have dreamed of creating computers that would rival the human brain. Our brains are information machines that use inputs to generate outputs, and so are computers. How hard could it be to build computers that work as well as our brains?
In 1954 a Georgetown-IBM team predicted that language translation programs would be perfected in three to five years. In 1965 Herbert Simon said that “machines will be capable, within twenty years, of doing any work a man can do.” In 1970 Marvin Minsky told Life magazine, “In from three to eight years we will have a machine with the general intelligence of an average human being.” Billions of dollars have been poured into efforts to build computers with artificial intelligence that equals or surpasses human intelligence. Researchers didn’t know it at first, but this was a moonshot—a wildly ambitious effort that had little chance of a quick payoff.
So far, it has failed. We still know very little about how the human brain works, but we have learned that building computers that rival human brains is not just a question of computational power and clever code.
A.I. research was launched at a summer conference at Dartmouth in 1956 with the moonshot vision that “every aspect of learning or any other feature of intelligence can be so precisely described that a machine can be made to simulate it.” Seventeen years later, the 1973 Lighthill report commissioned by the U.K. Science Research Council concluded that “in no part of the field have the discoveries made so far produced the major impact that was then promised.” Funding dried up and an A.I. winter began. There was a resurgence of A.I. research in the 1980s, fueled by advances in computer memory and processing speed and the development of expert systems, followed by a second A.I. winter as the limitations of expert systems became apparent. Another resurgence began in the 1990s and continues to this day.
Widely publicized computer victories over world champions in backgammon, checkers, chess, Go, and Jeopardy! have fueled the idea that the initial hopes for A.I. are on the verge of being realized. But just as in the first decades of moonshot hope, ambitious predictions and moving goalposts continue to be the norm.
In 2014, Ray Kurzweil predicted that by 2029 computers would have human-level intelligence, with all of the intellectual and emotional capabilities of humans, including “the ability to tell a joke, to be funny, to be romantic, to be loving, to be sexy.” As we move closer to 2029, Kurzweil talks more about 2045.
In a 2009 TED talk, Israeli neuroscientist Henry Markram said that within a decade his research group would reverse-engineer the human brain by using a supercomputer to simulate the brain’s 86 billion neurons and 100 trillion synapses.
These failed goals cost money. After being promised $1.3 billion in funding from the European Union, Markram’s Human Brain Project crashed in 2015. In 2016, the consulting firm PwC predicted that global GDP would be 14 percent, or $15.7 trillion, higher in 2030 because of A.I. products and services. It wasn’t alone: McKinsey, Accenture, and Forrester made similar forecasts for 2030, and Forrester predicted in 2016 that the A.I. market would reach $1.2 trillion by 2020. Four years later, in 2020, Forrester reported that the A.I. market was only $17 billion. It now projects the market to reach $37 billion by 2025. Oops!
The $15 trillion predictions made in 2016 assumed the success of A.I. moonshots such as Watson for health care, DeepMind and Nest for energy use, Level 5 self-driving vehicles on public roads, and humanlike robots and humanlike text generation. When moonshots like these work, they can be revolutionary; when they turn out to be pie in the sky, the failures are costly.
We have learned the hard way that winning a game of Go or Jeopardy! is a lot easier than processing words and images, providing effective health care, and building self-driving cars. Computers are like New Zealander Nigel Richards, who memorized the 386,000 words in the French Scrabble dictionary and won the French-language Scrabble World Championship twice, even though he doesn’t know the meaning of the French words he spells. In the same way, computer algorithms fit mathematical equations to data that they do not understand and consequently cannot employ any of the critical thinking skills that humans have.
If a computer algorithm found a correlation between Donald Trump tweeting the word “with” and the price of tea in China four days later, it would have no way of assessing whether that correlation was meaningful or meaningless. A state-of-the-art image recognition program was 99 percent certain that a series of horizontal black and yellow lines was a school bus, evidently focusing on the color of the pixels and completely ignoring the fact that buses have wheels, doors, and a windshield.
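The point can be made concrete with a minimal sketch in Python. The data here are hypothetical, generated at random, and stand in for tweet counts and tea prices only for illustration: two series that have nothing to do with each other will often show a sizable correlation by pure chance, and nothing in the calculation tells the algorithm whether the relationship means anything.

```python
import numpy as np

# Two independent random walks: there is no causal connection between them.
# (Hypothetical stand-ins for "how often a word was tweeted" and "the price of tea.")
rng = np.random.default_rng(seed=0)
word_counts = np.cumsum(rng.standard_normal(200))
tea_prices = np.cumsum(rng.standard_normal(200))

# The algorithm dutifully computes a correlation coefficient.
r = np.corrcoef(word_counts, tea_prices)[0, 1]
print(f"correlation = {r:.2f}")

# Nothing in that arithmetic distinguishes a meaningful relationship from a
# coincidence. Judging that requires knowing what the numbers represent,
# which is exactly what the algorithm lacks.
```

A person rejects the tea-price correlation instantly because they know what tweets and tea prices are; the algorithm sees only two columns of numbers.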
The health care moonshot has also disappointed. Swayed by IBM’s boasts about Watson, McKinsey predicted a 30–50 percent productivity improvement for nurses, a 5–9 percent reduction in health care costs, and health care savings in developed countries equal to as much as 2 percent of GDP. The Wall Street Journal published a cautionary article in 2017, and soon others were questioning the hype. A 2019 article in IEEE Spectrum concluded that Watson had “overpromised and underdelivered.” Soon afterward, IBM pulled Watson from drug discovery, and media enthusiasm waned as bad news about A.I. health care accumulated. For example, in a 2020 Mayo Clinic and Harvard survey, clinical staff who had been using an A.I.-based clinical decision support system to improve glycemic control in patients with diabetes gave the program a median score of 11 on a scale of 0 to 100, and only 14 percent said that they would recommend the system to other clinics.
Following Watson’s failure, the media moved on to Google health care articles in Nature and other journals that reported black-box results and left out the tweaks needed to make the models work well. After Google published its protein folding paper, an expert in structural biology said, “Until DeepMind shares their code, nobody in the field cares and it’s just them patting themselves on the back.” He also said that the idea that protein folding had been solved was “laughable.” An international group of scientists described a Google paper on breast cancer as another “very high-profile journal publishing a very exciting study that has nothing to do with science. … It’s more an advertisement for cool technology. We can’t really do anything with it.” Such caution is well deserved in light of the flop of Google’s highly touted Flu Trends algorithm. After claiming to be 97.5 percent accurate in predicting flu outbreaks, Google Flu Trends overestimated the number of flu cases in 100 of the next 108 weeks, by an average of nearly 100 percent, before being quietly retired.
The self-driving vehicle moonshot is in a similar state. By late 2018, it was becoming clear that self-driving cars were much harder to build than originally thought; one Wall Street Journal article was titled “Driverless Hype Collides With Merciless Reality.” In 2020, startups such as Zoox, Ike, Kodiak Robotics, Lyft, Uber, and Velodyne went through layoffs, bankruptcies, revaluations, and liquidations at deflated prices. Uber sold its autonomous driving unit in late 2020 after years of claiming that self-driving vehicles were its key to future profitability. An MIT task force announced in mid-2020 that fully driverless systems would take at least a decade to deploy over large areas.
Overall, A.I. moonshots are proving to be an expensive collection of failures. An October 2020 Wired article titled “Companies Are Rushing to Use AI—but Few See a Payoff” reported that only 11 percent of firms that have deployed A.I. are reaping a “sizable” return on their investments. One reason is that costs often turn out to be higher—much higher—than originally assumed. According to a fall 2020 MIT Sloan Management Review article, “A good rule of thumb is that you should estimate that for every $1 you spend developing an algorithm, you must spend $100 to deploy and support it.”
The 2020 edition of the “State of AI Report,” published by A.I. investors Nathan Benaich and Ian Hogarth, concluded that “we’re rapidly approaching outrageous computational, economic, and environmental costs to gain incrementally smaller improvements in model performance.” For example, “Without major new research breakthroughs, dropping the [image recognition] error rate from 11.5% to 1% would require over one hundred billion billion dollars!”
The fact is that most moonshots fail: nuclear fusion, synthetic fuels, supersonic flight, maglev, and blockchain for everything. Successful technologies, in contrast, generally begin in small and often overlooked applications and then expand to bigger and more important ones. Transistors were first used in hearing aids and radios before becoming ubiquitous in military equipment, computers, and phones. Computers began with accounting applications and later expanded to every function of a company. LEDs were first used in calculators and automobile dashboards, long before being used for lighting. The internet began as a tool for professors before becoming the most widely used technology since electricity. Solar cells were used in satellites and remote locations long before they were used to generate electricity for urban homes and businesses. In almost every case, technologies begin in a niche and then incrementally expand to other applications over decades through exponential improvements in price and performance.
Some companies successfully focus their A.I. efforts on solutions to small problems with achievable benefits. For instance, DHL uses A.I.-controlled robots to find packages, move them around warehouses, and load them onto planes. And Microsoft recently acquired Nuance, a company best known for a deep learning voice transcription service that is very popular in the health care sector.
Many similar examples can be found in robotic process automation—software robots that emulate humans interacting with digital systems. It can be used for accounting, manufacturing, financial, and engineering transactions, and it is the fastest-growing segment of the A.I. market.
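To give a flavor of what these software robots do, here is a minimal sketch in Python; the file names, column names, and accounting rules are hypothetical, invented for the example. It copies rows from one system’s exported spreadsheet of invoices into the journal-entry format another system imports, the kind of repetitive clerical hand-off that robotic process automation targets. (Commercial tools typically drive the second application’s user interface directly rather than writing a file; the sketch only captures the rule-following character of the work.)

```python
import csv

# Hypothetical example: copy invoice rows from one system's CSV export
# into the journal-entry format another system imports. A clerk would
# otherwise retype these fields by hand; a software robot performs the
# same steps programmatically.
def transfer_invoices(export_path: str, import_path: str) -> int:
    rows_written = 0
    with open(export_path, newline="") as src, \
         open(import_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["date", "account", "description", "amount"])
        for row in reader:
            # Skip rows a human would flag for review instead of entering.
            if not row["amount"] or float(row["amount"]) <= 0:
                continue
            writer.writerow([
                row["invoice_date"],
                "accounts_receivable",
                f"Invoice {row['invoice_id']} from {row['customer']}",
                f"{float(row['amount']):.2f}",
            ])
            rows_written += 1
    return rows_written

if __name__ == "__main__":
    print(transfer_invoices("invoices_export.csv", "journal_import.csv"))
```

The appeal of automating a task like this is precisely that it is small, well defined, and achievable, the opposite of a moonshot.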
The same incremental approach can be used for health care, self-driving vehicles, and more. Mutually beneficial diffusion and progress can come from collaboration among large research hospitals within and across countries as researchers learn from one another and generalize from one case to another. The holy grail of a robotaxi that can operate without a driver in every geographic location no matter the weather remains elusive, but self-driving vehicles are used successfully in constrained environments like mining camps, large factories, industrial parks, theme parks, golf clubs, and university campuses. It is surely better to perfect small solutions before moving on to crowded public roads with a plethora of unforeseen hazards.
One of the reasons A.I. overpromised and underdelivered is that we didn’t anticipate that building a computer that surpasses the human brain is the moonshot of all moonshots. Computers may someday rival human intelligence. In the meantime, we should recognize the limitations of A.I. and take advantage of the real strengths of computers. The failure of A.I. moonshots is not a reason to give up on A.I., but it is a reason to be realistic about what A.I. can do for us.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.