How Our Own Genetic Code Could Make the Internet Last for Millennia

Never fear, this OK Go video will live on.

Researchers at Microsoft, with the help of a team at the University of Washington, have managed to store 200 megabytes of information inside a strand of synthetic DNA. According to a press release, the molecule was more than 1.5 billion nucleotides long and contained, among many other things, the music video for OK Go’s song “This Too Shall Pass.” While this project wasn’t the first to look into the storage capabilities of artificial DNA, the new achievement—a record in the field—is an early step on the way to larger goals: making internet storage cheaper, physically smaller, and much more permanent.

The technology that currently houses the internet has a few problems, the first being that it’s temporary. All information on the internet is located somewhere, in a physical storage device. These devices store that information using either an arrangement of electrons or magnetism, but because it’s hard to get a perfect electrical/magnetic insulation, some information may become demagnetized, and the electrons will inevitably shift around or escape, leading to data loss. What that means is that unless it is transferred periodically to a new piece of hardware, all data on the internet will eventually disappear.

Since this effect operates on the scale of hundreds or even thousands of years, depending on the exact equipment, you’re unlikely to encounter any problems in your day-to-day life, but at a societal level, the problems are more severe. Movie studios and record companies are wrestling with the question of how to format their archives, since movies and music made today are often “born digital” and must be converted back to analog if the creators want cheaper and longer-lasting storage. But in these industries, the convenience of leaving things digital often wins out. (Not that analog is an entirely safe medium; a 2007 report from the Academy of Motion Picture Arts and Sciences estimated that at that time, only half of the movies made before 1950 still existed.) In the world of science, the situation is even worse; scientific research and experimental data are stored almost entirely online, on one of these ever-degrading servers, where continual transfer is necessary. To address this, the American Geophysical Union passed a policy in 1993 encouraging its members to archive their data in centers that promise to periodically transfer information to new hardware, to avoid data loss and preserve data sets for researchers in the future. Unless we find a way to save important information in a more permanent way, we may eventually be faced with a modern Library of Alexandria.

DNA-based storage offers a solution to the problem of impermanence. When the remains of woolly mammoths were discovered in recent years, researchers were able to sequence the animals’ genome, despite most specimens having died more than 10,000 years ago, thanks to the relative stability of the DNA molecule.

Our efficiency in converting binary data (ones and zeros) into physical DNA sequences (the A, C, G, and T that make up the four possible nucleotides of genetic code) is improving rapidly. Last year, a team in Germany was able to encode 83 kilobytes of data into a DNA strand, making last week’s announcement of 200 megabytes a more than 2,000-fold increase in less than a year. If this trend continues, our cultural and intellectual breakthroughs could eventually be stored in what Microsoft envisions as a “vast digital attic,” where the most critical insights of our species are encoded into DNA for long-term storage.
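To make the idea of turning ones and zeros into nucleotides concrete, here is a minimal sketch in Python. The two-bits-per-nucleotide mapping and the function names are assumptions chosen for illustration; the article does not describe the scheme the Microsoft and University of Washington team used, and real systems would also need things like error correction that this sketch leaves out.

```python
# Hypothetical mapping: each pair of bits becomes one nucleotide.
# This is an illustrative assumption, not the researchers' actual scheme.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, four nucleotides per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): read two bits per nucleotide and regroup into bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"OK Go"
strand = encode(message)
print(strand)                      # CATTCAGTAGAACACTCGTT
assert decode(strand) == message   # the round trip recovers the original bytes
```

Under this simple mapping, every byte takes exactly four nucleotides, so the length of the strand grows in direct proportion to the size of the data.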

But constant decay isn’t the only obstacle to archiving the internet. We also have to grapple with its size, which is expected to reach 44 trillion gigabytes by 2020. (That’s 44 zettabytes for you prefix sticklers.) This level of exponential growth poses a few logistical issues, many relating to the sheer cost of maintaining server farms that can span millions of square feet and require millions of dollars’ worth of electrical power each year.
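For the prefix sticklers, the conversion can be checked in a few lines of Python. This sketch assumes decimal (SI) prefixes and the short-scale meaning of "trillion" (10^12); the variable names are illustrative.

```python
# Decimal (SI) prefixes, assumed here for storage capacities.
GIGA = 10**9
ZETTA = 10**21

projected_gigabytes = 44 * 10**12            # "44 trillion gigabytes" from the text
projected_bytes = projected_gigabytes * GIGA
projected_zettabytes = projected_bytes // ZETTA

print(projected_zettabytes)  # 44
```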

DNA, however, is very dense. According to computer scientist Luis Ceze, the lead researcher for the University of Washington on this project, if all the information available online today were translated into DNA storage, it could fit into a shoebox. You could then pop that shoebox in a refrigerator set to just below freezing, and expect to have readable information for hundreds of thousands of years. For archival purposes, this would save an almost unbelievable amount of money in maintenance. (That 2007 report from the Academy of Motion Picture Arts and Sciences found that it’s about 11 times more expensive to archive movies digitally instead of in an analog form.)

DNA is also appealing as a storage alternative because it is highly unlikely to become obsolete. Archivists have always struggled with the problem of obsolescence: data are stored on machines or media devices that rapidly become outdated, and must be quickly transferred to more current technology before the ability to read the older format is lost forever. However, given the ubiquity of DNA-based lifeforms on this planet, it’s unlikely that we will ever abandon DNA as a technology.

Eventually, it is possible that the technology may improve to the point of having DNA-based computers, although there are significant obstacles. More immediately, DNA-based storage may soon be preserving all those embarrassing photos of your childhood forever.