How future historians will use the Twitter archives.


Among the many criticisms of Twitter, the most common by far is that no one cares what you ate for breakfast.

In fact, quite a few people care. “I actually think it’s very useful,” says Paul Freedman, a professor at Yale University who studies the history of food. For him, a 140-character ode to your KFC Double Down—along with the worshipful photo you took before devouring it—could be a priceless historical document. “Historians are interested in ordinary life,” Freedman says. “And Twitter is an incredible resource for ordinary life.”

Hence the decision by the Library of Congress last week to store the complete archives of Twitter. Starting six months from now, every last tweet—currently produced at a rate of 50 million a day—will be saved on an LoC hard drive and will presumably be accessible to historians for … well, forever.

Digital archiving isn’t anything new. A nonprofit digital library called the Internet Archive started collecting snapshots of the World Wide Web in 1996. University libraries regularly scan their research collections to make them accessible on the Web. Google Books is currently scanning the books of at least 20 major research libraries.

But the decision to archive Twitter takes digital preservation to a new level of detail. In the past, all archives, even digital ones, had to be selective. The Internet Archive doesn’t preserve every last byte of the Web—only the seemingly important parts. The Twitter archive, by contrast, will be mind-numbingly complete. Everything from reactions to the uprising in Iran to Robert Gibbs’ first tweet to your roommate’s two-sentence analysis of Hot Tub Time Machine will be saved for posterity. Which is, from a historian’s perspective, historic. Now that we’ve started logging all the stray thoughts hurled into cyberspace, the prospect of recording every last word ever published—to paraphrase archivist Brewster Kahle, we’re “one-upping the Greeks”—doesn’t seem especially crazy.

The question is, does the preservation of digital content, from tweets to Facebook updates to blog comments, make the job of historians easier or harder?

The answer is: both. On the one hand, there’s more useful information for historians to sift. On the other, there’s more useless information. And without the benefit of hindsight, it’s impossible to tell which is which. It’s like what John Wanamaker supposedly said about advertising: He knew half of it was wasted, he just didn’t know which half.

The trick will be organization. Hashtags—the # symbols people use to create discussion threads, such as #ashtag for the Iceland volcano cloud and #snowpocalypse for the February snowstorm that swept Washington, D.C.—are a start. But many tweeters don’t bother to tag their posts. Historians will probably be able to search by keyword. But that can lead them astray, too. How do you know if someone is complaining about the windows in their house or the Windows on their computer?

Data-mining has become sophisticated enough to make these distinctions based on context. Sometimes that means looking at other keywords surrounding a keyword. (If the word “laptop” appears near “Windows,” for example, the author is probably talking about software.) It could also mean looking at metadata—when the tweet was sent, where it was sent from, whom the person is following and vice versa. Twitter has no plans to share public metadata with the LoC, but a spokesman says it would be “open to discussing this with them.”
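The keyword-in-context idea can be sketched in a few lines of Python. This is only an illustration, not a description of how the LoC or Twitter actually processes tweets: the cue-word lists, the function name classify_windows, and the five-word radius are all assumptions made up for the example, and a real system would learn its cues from data rather than hard-code them.

```python
import re

# Hypothetical cue words chosen for illustration only.
SOFTWARE_CUES = {"laptop", "computer", "pc", "install", "update", "crash", "microsoft"}
HOUSE_CUES = {"house", "glass", "curtains", "draft", "pane", "kitchen", "bedroom"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def classify_windows(tweet, radius=5):
    """Guess which sense of "windows" a tweet uses from the words around it.

    Counts cue words within `radius` tokens on either side of each
    occurrence of "windows" and returns the sense with more cues.
    """
    tokens = tokenize(tweet)
    positions = [i for i, tok in enumerate(tokens) if tok == "windows"]
    if not positions:
        return "no mention"

    software = 0
    house = 0
    for pos in positions:
        # Words just before and just after the keyword, excluding the keyword itself.
        nearby = tokens[max(0, pos - radius):pos] + tokens[pos + 1:pos + 1 + radius]
        software += sum(tok in SOFTWARE_CUES for tok in nearby)
        house += sum(tok in HOUSE_CUES for tok in nearby)

    if software > house:
        return "software"
    if house > software:
        return "house"
    return "unclear"


print(classify_windows("My laptop keeps crashing since the Windows update"))
print(classify_windows("Need new curtains for the windows in my house"))
```

A tweet with no surrounding cues, such as "I hate windows," comes back as "unclear," which is the case where metadata would have to do the work.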

Whether historians can make sense of this data depends on the tools they have to sort through it. “This is what historians have always done: they create order out of chaos,” says Martha Anderson, the director of the LoC’s National Digital Information Infrastructure and Preservation Program. “It’s kind of like saying, ‘Are newspapers useful for historians?’” says Elaine Tyler May, a history professor at the University of Minnesota and president of the Organization of American Historians. “We know that they are, but you have to know what you’re looking for.”

Save it all, says history professor Dan Cohen. You never know what people will do with it. Cohen heads up the Center for History and New Media at George Mason University. After 9/11, the center created an archive that included tens of thousands of personal stories from that day. When researchers later looked at the server logs to see who had visited, they found some visitors were linguists studying teen slang. Someone else browsed the archive while researching cell phone usage, since many of the stories involved cell phones. “That’s the power of a large scale open archive,” Cohen says.

Movements in historiography are usually influenced by politics. In the 1960s and ’70s, for example, bottom-up social history rather than the “great men” approach to scholarship became popular in academia. But technology can also breed new kinds of history. Go back to the food example. Culinary historians who study the 1950s have limited source materials. “You can read cookbooks, you can read restaurant reviews, you can read write-ups in the New Yorker,” says Nicolaas Mink, a lecturer on food history at the University of Wisconsin-Stevens Point. “But you don’t get a sense of how that food is actually being prepared or being received by ultimate consumer.” Twitter is a font of amateur food criticism, exactly what a historian interested in broad social attitudes toward food would be looking for.

Or take the history of adolescents. The source material for studying how and what kids think has always been limited to school papers, letters from parents and teachers, the occasional diary. Again, mostly “top-down” history. There’s little real-world data about how kids interact with each other. Blogs, tweets, and Facebook updates offer glimpses into the lives of children on a scale that no randomized study could re-create.

Twitter as historical document would also allow scholars to trace phenomena in real time. Historians have long tried to reconstruct how word of AIDS spread in New York in the early 1980s. But it’s hard to document who knew what when. “A lot of those conversations are lost except for people like me who might remember the time 30 years ago,” says David Mindich, a journalism professor at Saint Michael’s College. Twitter preserves those cultural moments. Google has already created a program called Replay that maps Twitter topics over time.

There are limits to what Twitter can tell us. Part of the problem is selection bias. Tweets are designed for public consumption. As Mindich put it: “If you’re looking at what someone ate for breakfast, you’re also looking at what someone wants everyone to think they ate for breakfast.” Or consider Facebook photos. If a sociologist conducted a study of college student behavior based on pictures tagged on Facebook, he would assume 98 percent of a student’s time was spent clutching red Solo cups. Data collected from Twitter could also suffer from unreliability. “It will be the job of historians to verify,” says Elaine Tyler May.

Of course, selection bias is always an issue. The letters preserved in someone’s archive are the letters lovingly set aside for preservation. Perhaps the slapdash quality of tweets makes them more reliable as historical documents. No one broadcasts the mouth feel of the KFC Double Down with a mind to posterity—at least they didn’t until the LoC announced its plan. One could argue that a lot less calculation goes into a tweet than into, say, the collected letters of William Jefferson Clinton.

That said, Twitter historiography will be relatively cut-and-dried, since it’s all public. The real challenge will be figuring out what happens to private digital missives, like the contents of someone’s Gmail account. Google’s current policy is to transfer a dead person’s account to relatives if they provide the right paperwork. But what happens 100, 200, 500 years from now, when the public interest in opening someone’s e-mail account outweighs the private interest in keeping it secret? “It’s a matter of time,” says Martha Anderson of the LoC. “But I feel like if anyone looked at my Gmail account, they’d think this is a boring woman who just shops all the time.” Which, for future historians studying consumption patterns in early 21st-century America, could be downright fascinating.

Correction, April 21, 2010: This article originally identified May as a professor at the University of Wisconsin.
