Nearly 1,300 people spent this past weekend racing to fill little boxes inside larger boxes, ever mindful of spelling, trivia, wordplay, and a ticking clock. They were competitors—newcomers, ardent hobbyists, and elite speed solvers—in the American Crossword Puzzle Tournament, the pastime’s most prestigious competition. And most of them got creamed by some software.
The annual event, normally set in a packed hotel ballroom with solvers separated by yellow dividers, was virtual this year, pencils swapped for keyboards. After millions of little boxes had been filled, a computer program topped the leaderboard for the first time.
Dr. Fill is the algorithmic creation of Matt Ginsberg, an Oxford-trained astrophysicist—and computer scientist, stunt pilot, bridge player, novelist, and magician—who lives in Oregon. When he began the project a decade ago, his motivation was simple: “I sucked at crosswords, and it just pissed me off.” Ginsberg hoped one day to walk into the tournament hall, wave his laptop above his head, and show the humans who’s boss. Now, if only virtually, he has.
Held since 1978, and founded by the longtime New York Times puzzle editor Will Shortz, the ACPT is a gauntlet of puzzles in a range of sizes and difficulties, up to and including fiendish, created by the best constructors in the business and solved over two grueling days. Those are followed by a single championship puzzle for the top three finishers, who, in normal times, complete it on enormous dry-erase grids while wearing big noise-blocking headphones. Throughout, the solvers are ranked using a formula that balances accuracy and speed. The top human contestants are consistently perfect and can solve the equivalent of a really tough Saturday Times puzzle in, say, three minutes.
Dr. Fill wasn’t perfect; it finished the tournament with three errors. But its blazing speed—most puzzles in well under a minute—helped it outscore all comers, edging the top human by a razor-thin 15 points. (This reporter finished 172nd, with one error, some 1,500 points behind the leaders.)
As long as their field has existed, computer scientists have been harnessing games, which represent small slices of the real world that so interests artificial intelligence research. Checkers, backgammon, chess, Go, poker, and other games have witnessed the machines’ invasions, falling one by one to dominant A.I.s. Now crosswords have joined them.
[Video: Dr. Fill solving Saturday's six puzzles in real time.]
To get this good, Dr. Fill has ingested mountains of data, including the entire contents of Wikipedia and a giant database of crossword clues and answers scraped from the web. (Dr. Fill is disconnected from the internet during tournaments.) The program debuted at the ACPT in 2012, where it finished a modest 141st—“I Beat Dr. Fill” buttons were distributed to everyone who finished above it. The program steadily improved but, until this year, had peaked at 11th place, in 2017.
But two things happened to Dr. Fill this year. First, it ran not on its usual laptop but on a custom-built desktop with a 64-core processor and two GPUs—a heavy box that Ginsberg wouldn’t ordinarily lug across the country. Second, Ginsberg recently received a serendipitous, last-minute email.
Dan Klein, the head of the Berkeley Natural Language Processing Group, and his students had, like so many people stuck at home, taken a pandemic interest in crosswords. His group set to work on the Berkeley Crossword Solver, which took the form of a question-answering system, something like Siri or Alexa. They reached out to Ginsberg to share their work.
Ginsberg’s Dr. Fill was of an earlier era, an example of what is sometimes called “good old-fashioned A.I.”—reliant on human-understandable logic and search—like how Deep Blue searched and ranked millions of chess positions per second in the 1990s. The Berkeley system, on the other hand, was newfangled, an example of a neural network—one of the less understandable, black-box machine-learning systems so prevalent today—like DeepMind’s AlphaGo system that conquered the ancient Chinese game of Go.
The original Dr. Fill is very good at searching lightning fast through countless possible placements of words in a grid. It assigns each possibility a probability of being correct and weighs those probabilities across the puzzle, settling on what it sees as the most promising solution. The Berkeley Crossword Solver, meanwhile, is very good at understanding clues—its neural network was trained on 6 million clues and answers. Its network learned what Klein calls the “generalizations and abstractions” that allow a human, or a machine, to understand language.
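Dr. Fill’s actual machinery is vastly faster and more sophisticated, but the core idea of weighing per-answer probabilities across a whole grid can be sketched in a few lines. Everything below (the tiny grid, the candidate answers, and their probabilities) is invented purely for illustration:

```python
from itertools import product

# Hypothetical toy grid: one across slot crossed by two down slots.
# Slot -> candidate answers, each with a model-assigned probability
# of being correct. All values here are made up for the sketch.
CANDIDATES = {
    "1A": {"SAFARI": 0.6, "VOYAGE": 0.4},   # "Trip to watch the big game?"
    "1D": {"SIP": 0.7, "VAT": 0.3},
    "2D": {"AREA": 0.5, "ODES": 0.5},
}

# Crossing constraints: (slot_a, index_a) shares a cell with (slot_b, index_b).
CROSSINGS = [
    (("1A", 0), ("1D", 0)),  # first letter of 1A = first letter of 1D
    (("1A", 1), ("2D", 0)),  # second letter of 1A = first letter of 2D
]

def consistent(fill):
    """Check that every shared cell holds the same letter in both slots."""
    for (sa, ia), (sb, ib) in CROSSINGS:
        if fill[sa][ia] != fill[sb][ib]:
            return False
    return True

def best_fill(candidates):
    """Search every combination of candidates, score each consistent fill
    by the product of its answers' probabilities, and keep the best."""
    slots = list(candidates)
    best, best_score = None, 0.0
    for words in product(*(candidates[s] for s in slots)):
        fill = dict(zip(slots, words))
        if not consistent(fill):
            continue
        score = 1.0
        for s in slots:
            score *= candidates[s][fill[s]]
        if score > best_score:
            best, best_score = fill, score
    return best, best_score
```

On this toy grid the search settles on SAFARI, SIP, and AREA, whose letters agree at the crossings and whose joint probability beats every alternative. The real program faces the same trade-off at enormous scale: a confident answer may still be rejected if it cannot be reconciled with its crossings.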
“Our knowledge of language is based on a massive amount of talking, and hearing people talk, and things we’ve read,” Klein said. “All of this language we use and that we’ve been exposed to, it leaves a trace. And these [A.I.] systems are really no different.” The Berkeley program is like someone who has been raised not by wolves but by sentient crossword puzzles. “It learns by playing crosswords, and this system has played a lot of crosswords,” Klein said, laughing. “People have played fewer crosswords, but have lived in the world.”
An alliance came naturally. Ginsberg and the Berkeley crew started working together just two weeks before the tournament, plugging the latter system into the former, and the centaur program finally ran with just days to go. The result, hastily constructed though it was, was a marvel, its pieces working hand in glove to solve crosswords. Ginsberg’s system handled the grid and the colder, mathematical side of things, searching and placing answers, while the Berkeley team’s system unriddled the hazier, “human” side of the language of the clues, crosswords’ music.
This is how, during the tournament, Dr. Fill figured out that “Trip to watch the big game?” was “safari” and “Pasta dish at the center of a murder mystery?” was “poison penne,” and placed them in an eyeblink. But language remains an occasional hiccup. The program did not realize, in one phonetically themed puzzle that “broke the sound barrier,” that the answers to “Crazed” and “Deduces” were, oddly, “mannequin” and “firs.” These were meant to be read, more sensibly, as “manic” and “infers.”
Ginsberg sees the new Dr. Fill as a marriage between two unlikely and often battling partners: good old-fashioned A.I. and modern machine learning. “Those two groups have historically not played well together,” he said. “They don’t like each other. Everybody has this huge bias that they’re going to use one approach and not the other, and it’s been bad.” But playing nice has its benefits. “As a scientist, I am incredibly excited to see these two communities finally working together to solve problems that were too hard for them individually.”
Dr. Fill’s computer victory is ultimately a human victory. “Like we’ve seen with other games and puzzles, beating humans at crossword puzzles is a combination of technology and ingenuity,” said Jonathan Schaeffer, the computer scientist who solved checkers. “Like we’ve seen for other domains, there is no easy answer and there is no substitute for hard work and patience.”
How have crossword-solving humans reacted? In the past, a perennial wave of cheers has filled the ballroom whenever it’s been announced that Dr. Fill has made a mistake. Many solvers have become exasperated with Ginsberg’s annual speech at the event, updating the humans on his program’s incremental assault on their leaderboard. This year, a chorus of digital boos—some joking and some not—filled the chatroom after the announcement that the program had finished first.
“On a personal level, I find the Dr. Fill project annoying,” said Amy Reynaldo, co-editor of Crosswords With Friends. “Nobody wants a machine to beat them at something they are quite good at!” However, she added, “I somewhat grudgingly recognize the sci-tech importance of Matt Ginsberg’s project.”
And as with their compatriots in the worlds of chess or poker or Go, the human solvers will keep right on solving.
“I don’t think about it at all,” said Stella Zawistowski, an elite speed solver from Brooklyn (who finished 10th in the ACPT). “It’s not an affront to what I and other top solvers do, and I think we all knew it would happen someday if Matt kept at it long enough.”
While Dr. Fill was not eligible for the $3,000 first prize, it did attempt the championship puzzle. The human champion, Tyler Hinman, plowed through it in an ungodly fast three minutes. Dr. Fill solved it (perfectly) in 49 seconds.
Shortz, the Times crossword editor and ACPT impresario, hopes Dr. Fill will keep attending—and he suspects this year’s puzzles may have been especially “up Dr. Fill’s alley.” Ginsberg has no plans to stop—and he suspects the puzzles will be made even more bedeviling next year to thwart the good doctor. But this may have a welcome side effect.
“The things that you will have to do to make this hard for automated solvers,” Klein said, “are the same things you would have to do to make truly new and creative crossword puzzles that feel novel, fresh, and exciting.”
Shortz doesn’t fear the machine. “Crossword people are intelligent,” he said. “They have lively minds. They’re interested in the latest development of computers. And, of course, it’s not just a computer. It’s what Matt has done, his ingenuity, that we’re really admiring.”
There’s always a human behind the machine. “Yeah,” Shortz said. “At least so far.”