Data-driven research is revealing everything from patterns in gun violence to epicenters of syphilis during the Civil War. Now advocacy group Autism Speaks is using Google Cloud Platform to create an enormous database of sequenced genomes from people with autism and their family members. The goal is to facilitate new discoveries about autism by making the data openly accessible.
As part of its Ten Thousand Genomes Program (AUT10K), Autism Speaks has already collected and sequenced 1,000 genomes. At about 100 gigabytes each, those genomes alone add up to roughly 100 terabytes, so the database quickly became unwieldy, and it was hard even to get the data to researchers for analysis. Collaborating with Google gives Autism Speaks access to storage, plus the analysis tools already available through Google Cloud Platform.
Autism Speaks’ chief science officer Rob Ring says that until now, “We’ve literally had to package up hard drives and ship them from the site where they’re sequenced.” Cloud storage is far more flexible, he says, and will let “folks … bring the questions to the data.”
Ring says that the data will have an open API through which anyone can access the genomes. “Not everybody has the capability to store that kind of data or has access to computer resources needed to analyze it,” he said. “By being able to provide open access, anyone can do it rather than just those privileged few who have those kinds of resources.”
The collaboration with Autism Speaks is the first big announcement for Google’s genomics platform, which builds on existing Google Cloud Platform tools for analyzing big data sets. Google has no special access to Autism Speaks’ data beyond the open interface that will be available to everyone, but providing cloud and big data services is a large contribution in itself.
David Glazer, a member of the Google genomics team, said that the idea of a collaboration with Autism Speaks started about a year ago, when it became clear that the group’s genomic data would fit well with the types of projects Google Cloud Platform is looking to assist with. “Biology has become a data-limited science, meaning the hard part is no longer gathering enough data—the hard part is storing, analyzing, interpreting, and collaborating around the data,” he said. “We are adding genomic specific capability to the Google Cloud Platform to help people who are doing biology do it better, faster, and more cost-effectively than they could have done without Google Cloud Platform.”
As Autism Speaks begins next-generation research with its AUT10K database, the group’s controversial history in the debate over vaccines and autism isn’t part of the conversation. In the past, though, Autism Speaks’ top management struggled to reach a consensus about whether preservatives in vaccines can cause autism. Alison Singer, a senior executive at Autism Speaks from 2005 to 2009, quit because she felt that the organization’s internal conversation was still fixated on something that had been scientifically disproved.
In 2012, when Donald Trump briefly reignited the debate, the Autism Speaks website still said that a link between autism and vaccines “remains possible.” That position doesn’t appear to have changed since. The site today notes that “studies have not found a link between vaccines and autism,” but then adds, “It remains possible that, in rare cases, immunization may trigger the onset of autism symptoms in a child with an underlying medical or genetic condition.”
But building a genomic database seemingly has nothing to do with the controversy, and both Autism Speaks and Google want it to stay that way. “Right now the evidence doesn’t point to a connection between vaccines and autism. So from my point of view, that scientific debate is really well addressed,” Ring said. “Things like that never come into this discussion [about AUT10K] at all.” Glazer agrees. When asked if politicized controversies factor into conversations with clients like Autism Speaks, he said, “It doesn’t. We have a technical conversation about their technical data.”
And the data will be massive: at about 100 gigabytes per genome, the full 10,000-genome set will reach roughly a petabyte. Though it will take time to reach the 10,000-genome goal, open access to the database on Google Cloud Platform will allow people around the world to start doing genomic analysis in the meantime. If you want to take a deep dive into some very detailed autism data, this is the place to go.