A Computer Reads the State of the Union

What natural-language analysis tells us about the president.

For the overlapping set of people who both watch the State of the Union and care about linguistic analysis—still with us?—tallying up words from each year’s address is its own kind of Washington sabermetrics. (Meanwhile, everyone who vowed to take a shot every time Bush said “taxes” Monday night is having a bad morning.) The Times reliably produces its bubble charts of the most common thematic words each year, which provide an excellent CliffsNotes for the big priorities.

By natural-language-processing standards, simply counting the frequency of words is unthinkably primitive. As computers get better at understanding the structure of sentences—think about that corrugated green line under a sentence in Microsoft Word reminding you to avoid the passive voice—they can do a much more thorough job reading political speeches and offering their insights.
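For readers who want to see just how primitive, here is a minimal sketch of the word-tallying approach in Python. The function name `top_words`, the tiny stopword list, and the sample sentence are all made up for illustration; none of them comes from the Times' charts or from any real address.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch,
# not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "we",
             "our", "is", "will", "for", "on"}

def top_words(text, n=5):
    """Return the n most frequent non-stopword tokens in text."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Invented sample sentence, not a quotation from any speech.
sample = "We will cut taxes. Taxes on families and taxes on business will fall."
print(top_words(sample, 1))
```

A tally like this knows that a word came up often, but nothing about what it was connected to, which is the gap the more structural methods try to fill.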

To prove it, natural-language expert Kevin Dooley, a professor of supply-chain management at Arizona State University, ran each of Bush’s State of the Union addresses (counting the first one, which was technically an address to a joint session of Congress) through his algorithms for Slate. Crawdad Technologies, the company Dooley co-founded with fellow ASU professor Steve Corman, uses original text-analysis software to identify the most influential words in a text, and it is currently used to measure “buzz” in the political blogosphere at Wonkosphere.com.

“Influential words,” Dooley explains, “are those that create meaning in the text by connecting ideas together, and they are used strategically by a speaker in order to structure a message in a particular way.” To calculate influence, Crawdad uses its patent-pending “Centering Resonance Analysis” to rearrange a text into a network, where it becomes much easier to determine which words act as vital connectors to others. (This is similar to the concept of “centrality” used in Slate’s steroids social network.)
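The general idea of scoring words by how well they connect others can be sketched in a few lines of Python. This is not Crawdad's software, and the article does not spell out how its method works internally. The sketch assumes two simplifications: words are linked when they sit next to each other in a sentence, and "influence" is measured with betweenness centrality (computed here with Brandes' algorithm), one common way to find connectors in a network. The function names and the toy word lists are invented for illustration.

```python
from collections import defaultdict, deque

def build_network(sentences):
    """Link each pair of adjacent words within a sentence (undirected)."""
    graph = defaultdict(set)
    for words in sentences:
        for a, b in zip(words, words[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def betweenness(graph):
    """Betweenness centrality (Brandes) for an unweighted, undirected graph."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []                          # nodes in order of discovery
        preds = {v: [] for v in graph}      # predecessors on shortest paths
        sigma = dict.fromkeys(graph, 0)     # number of shortest paths from s
        dist = dict.fromkeys(graph, -1)     # -1 means not yet reached
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                        # accumulate from farthest nodes back
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Toy input: already-tokenized "sentences" made up for this example.
sentences = [["tax", "budget", "school"],
             ["budget", "program"],
             ["budget", "weapon"]]
scores = betweenness(build_network(sentences))
print(max(scores, key=scores.get))
```

In this toy network every path between two other words runs through "budget," so it scores highest even though no word appears dramatically more often than the rest. A real system would also need to decide which words count (nouns only, say) and how to merge variants like America and American.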

Based on this method, Dooley identified four distinct voices over the course of Bush’s addresses. The first, “Domestic Bush,” appeared only once, in the president’s 2001 message to a joint session of Congress shortly after taking office. Words like budget, program, school, and tax were at the center of his message, the Crawdad technology reports.

The following two years, after Sept. 11, can be categorized as “Security Bush.” Weapon, threat, Afghanistan, and Hussein are vital. By 2004, the year Bush was up for re-election and the war in Iraq was under way, he shifted to “Visionary Bush.” Great shows up as a highly influential word each of the next four years, while man, life, freedom, and human are also popular.

Monday night, a fourth and final Bush emerged, Crawdad tells us: “Legacy Bush.” New and year topped the list, along with leader, Congress, and agreement. Iraq(i) was also influential.

“As is common in most State of the Unions, Bush framed his thoughts in a nationalistic manner, using the words America(n), nation, good, people, and world very significantly,” Dooley tells Slate. “We note that Bush believes his legacy is still very much tied to Iraq and the Iraqi people.

“Bush introduced some new influential words into the 2008 SOTU, two of which stand out and emphasize the legacy frame,” he continues. “First, Bush sees his legacy as one built around agreements—for an economic stimulus package, for trade, for intelligence, for energy, and for Middle East peace.  Second, Bush frames his legacy in terms of being a leader, because it is leaders who make the difference—in Congress, in the Middle East, in Iraq, and in Iran. Thus while Bush may have laid out specific programs and initiatives, it is his leadership on terrorism and Middle East and trade which he wants to be remembered by.”

So says the machine.