Future Tense

Excel Created Major Typos in 20 Percent of Scientific Papers on Genes


Excel is partially responsible for errors in 20 percent of scientific papers dealing with genes, according to a new study.

In an effort to “raise awareness of the problem,” three scientists published findings that suggest one-fifth of all scientific papers about genes contain detrimental typos due to an Excel default setting that converts gene names to dates or numbers.

One mistaken conversion, for example, turns the gene symbol SEPT2, short for Septin 2, into “2-Sep.” Likewise, MARCH1 (short for Membrane-Associated Ring Finger (C3HC4) 1, E3 Ubiquitin Protein Ligase) is rendered as “1-Mar.” The scientists wrote: “Furthermore, RIKEN identifiers were described to be automatically converted to floating point numbers (i.e. from accession ‘2310009E13’ to ‘2.31E+13’).”


Researchers Mark Ziemann, Yotam Eren, and Assam El-Osta explained that the “inadvertent gene symbol conversion is problematic because these supplementary files are an important resource in the genomics community that are frequently reused.” Or, as Softpedia puts it:


[I]f researchers would enter this data one Excel cell at a time, they would surely notice. But they don’t, mainly because most of this data is copy-pasted from tables or other sources inside Excel files, hundreds or thousands of values at a time.

The conversion takes place without the researcher noticing and culminates in research papers with errors in their supplementary files, sometimes contributing to unverifiable data or errors in subsequent calculations.

The gene name conversion issue with Excel and similar software programs isn’t new. The scientists acknowledge “this problem and workarounds were first highlighted over a decade ago,” but, they say, the errors persist. And while the mistakes could be caught by more thorough reviews on the part of those publishing the scientific papers, there is no readily available fix for the software. As the new study says, “there is no way to permanently deactivate automatic conversion to dates in MS Excel and other spreadsheet software such as LibreOffice Calc or Apache OpenOffice Calc.” However, the authors note that Google Sheets did not make the same gene name conversion errors.
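To make the idea of a "more thorough review" concrete, here is a minimal Python sketch of the kind of check a reviewer or curator might run over a column of gene symbols taken from a supplementary file. It is not the method used in the study. The function name `find_suspect_symbols` and both patterns are assumptions made up for illustration, built only from the examples quoted above (“2-Sep,” “1-Mar,” and “2.31E+13”).

```python
import re

# Both patterns are illustrative assumptions based on the examples in the
# article, not the screening rules used by the study's authors.

# Date-like values such as "2-Sep" or "1-Mar": day, hyphen, month abbreviation.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)

# Scientific-notation values such as "2.31E+13". The decimal point and the
# plus sign are required so an intact identifier like "2310009E13" is not flagged.
FLOAT_LIKE = re.compile(r"^\d\.\d+E\+\d+$", re.IGNORECASE)


def find_suspect_symbols(symbols):
    """Return (index, value) pairs that look like spreadsheet-converted gene names."""
    suspects = []
    for index, value in enumerate(symbols):
        text = value.strip()
        if DATE_LIKE.match(text) or FLOAT_LIKE.match(text):
            suspects.append((index, text))
    return suspects


if __name__ == "__main__":
    # Hypothetical column mixing intact symbols with converted ones.
    column = ["TP53", "2-Sep", "BRCA1", "1-Mar", "2.31E+13", "SEPT2", "2310009E13"]
    for index, value in find_suspect_symbols(column):
        print(f"row {index}: {value!r} looks like a converted gene name")
```

A check like this only flags values that already look damaged. It cannot recover the original symbol, since “2-Sep” on its own does not say which gene name was typed in.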

Ultimately, however, the study puts the onus for accuracy on those in charge of publishing scientific papers, not on the software. The authors wrote: “Inadvertent gene name conversion errors persist in the scientific literature, but these should be easy to avoid if researchers, reviewers, editorial staff and database curators remain vigilant.”