Too Much of a Good Thing

What to do with all of the data generated by medical devices?

Illustration by Mark Alan Stamaty

On Thursday, March 26, Future Tense—a partnership of Slate, New America, and Arizona State University—will hold an event on medical device security and privacy at the New America office in Washington, D.C. For more information and to RSVP, visit the New America website.

How many lives could be saved if smartphones gave medical workers in remote Kenyan villages access to a suite of accurate, mobile diagnostic tools? Colorimetrix is tackling that question. It’s an application in development that can help smartphones diagnose diabetes, kidney disease, and urinary tract infections, and the hope is that it will eventually detect HIV, malaria, and tuberculosis. Worldwide, academics and businesses are creating apps that they hope will provide in-the-field diagnoses and treatment recommendations where access to reliable, portable, and cheap medical technology is scarce.

But to fulfill those promises, mobile health care technologies, known as mHealth, have to overcome some major challenges related to data collection and usage. Right now treating patients using mHealth isn’t important or even very realistic—what matters is the data collected from these patients. But researchers don’t know exactly what types of information they need. And consequently, so much data is collected that the gatherers don’t know how to use it all.

Advancements in mobile biological sensors are providing opportunities to gather unprecedented amounts of clinical-quality data. With the correct algorithms, that information can revolutionize researchers’ understanding of chronic illnesses and answer questions that have puzzled healers for generations. Data could reveal the driving factors behind leading causes of death, like cancer, heart disease, and stroke. Someday, users could be alerted about an imminent heart attack hours, days, or even months before it happens.

But to create that predictive future, scientists and medical researchers need copious amounts of granular data. And Euan Thomson, the CEO of AliveCor, a mobile electrocardiogram manufacturer and heart disease research company, says detailed data is not what researchers currently have.

Chronic diseases react differently within each person. That makes discovering the root cause of these diseases challenging. For instance, although the symptoms might be similar, viral strains within people are going to send slightly different signals. Doctor visits and traditional clinical trials, meanwhile, can offer only snapshots of patients’ health at that particular moment (or month). This approach inhibits collecting data on the environmental factors that affect diseases in somebody’s day-to-day life, and makes discovering why an illness is there and how it arrived much harder, says Alexander Hoffmann, the director of the Institute for Quantitative and Computational Biosciences at UCLA. The method also leads to subjects being placed in large subsets in which individual distinctions get lost in the background. But until technology catches up, that’s among the most efficient methods for reliable data collection.

That’s where mobile health care comes in—by its very nature, it adds in missing data points. Mobile health care offers longitudinal data, which eliminates snapshot irregularities by showing what signals might correlate with health or disease over a long period of time. And if we figure out what the outside disease drivers are in a chronic illness—the genotype, the phenotype, and the environment—then we can understand how the body works, and we can help all of society manage and avoid risk factors for these diseases.

For example, Thomson’s company focuses on atrial fibrillation, a chronic heart disease affecting about 3 million Americans. The average cardiologist treating this disease might review electrocardiograms, or ECGs, at the rate of four or five per day for his entire career, Thomson says. And he might not be able to relate the fine detail of one to the fine detail of the next. But a computer algorithm, such as the artificial scientist Eve, can sift through more than 10,000 data points a day, spotting minuscule changes and connecting them. That’s because inside a large data set are statistical clusters. Within these correlating clusters you can figure out which (if any) data points have any clinical meaning. The more data you have, the better chance you have to find granular detail around the cluster.
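To make the idea of statistical clusters concrete, here is a minimal sketch of how an algorithm might group heartbeat measurements. It is not AliveCor's method, which the article does not describe. The function name `kmeans_1d`, the choice of k-means as the technique, and the made-up intervals between beats are all assumptions for illustration.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Tiny one-dimensional k-means. Returns (centers, labels)."""
    if k < 2 or len(values) < k:
        raise ValueError("need k >= 2 and at least k values")
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [float(ordered[i * (len(ordered) - 1) // (k - 1)]) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign every value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of the values assigned to it.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels


# Made-up intervals between heartbeats, in milliseconds (illustrative only).
rr_intervals_ms = [800, 810, 790, 805, 1200, 1190, 795, 1210]
centers, labels = kmeans_1d(rr_intervals_ms, k=2)
print(centers, labels)
```

With so few points, the two groups are obvious to the eye. The appeal of an algorithm is that the same grouping step can run over far more recordings than a person could compare by hand. Whether a cluster means anything clinically is a separate question.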

Thomson says data cluster analytics give insights about a person’s ECGs and lifestyle data over a long period of time. Algorithms then identify key clusters of heartbeats, which can eventually show how factors like lifestyle, habits, and medications can change his or her heart. AliveCor has collected more than 1.5 million ECGs and has hundreds of thousands that are categorized as medically abnormal. Thomson says in the next 10 years he expects mHealth data to show atrial fibrillation is no longer one singular disease, which opens up new treatment opportunities.

But Thomson doesn’t see this change occurring before the next decade. Several kinks will prevent the mHealth hype train from leaving the station anytime soon. Reade Harpham, the director of human-centric design at Battelle Memorial Institute, says one of the big problems is an important and somewhat forgotten question: We have all this data. Now what do we do with it? The simple answer is that researchers are still trying to figure that out.

A couple of years ago, developers and researchers might get 200 data points a day about parameters such as somebody’s diet, exercise, glucose levels, heartbeat, and how he or she reacts to certain treatments. Now some pharmaceutical companies are staring down 18 million data points a day about genetics, reactions to drugs, and environmental stimuli, Harpham says.

That’s a lot of information, but for some organizations it’s still not enough to satiate the needs of their mHealth algorithms. Until those needs are met, how useful a certain data set is remains a mystery.

Businesses, researchers, and academics in the mHealth industry are racing to get the largest data set possible so they can become valuable, says Nigam Shah, an assistant professor of biomedical informatics at Stanford University. It’s relatively easy and cheap to collect the data from users, but developing algorithms to figure out how that information correlates and who can benefit from it is expensive and time-consuming. To complicate matters more, Hoffmann says two analysis methods are widely used to create these algorithms. And the two methods’ results can be drastically different.

One is purely qualitative. The algorithm looks at whether a type of mobile health signal correlates with the user’s health, or a known disease or medication. That’s the most basic approach available. It’s a completely unbiased method, but the algorithm requires a much larger sample size to glean something useful from the data.
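A rough sketch of that first approach: compute how strongly a mobile health signal tracks a health measure, with no prior assumptions about why. The Pearson coefficient used here is one common choice and an assumption on our part, since the article does not name a specific statistic. The variable names and readings are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up readings for one user (illustrative only).
resting_heart_rate = [62, 64, 70, 75, 80, 85]
symptom_score = [1, 1, 2, 3, 4, 5]

r = pearson(resting_heart_rate, symptom_score)
print(f"correlation: {r:.3f}")
```

A value near 1 or -1 says the two series move together; it does not say why. With only six readings, a strong correlation can easily arise by chance, which is one way to see why this approach needs a much larger sample.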

The slightly more sophisticated and more powerful approach is to start from an understanding of the causal relationship between a disease and its symptoms, Hoffmann says. For example, we know that eating a lot of sugar affects insulin balance and metabolism. For diabetics, this is dangerous. That a priori knowledge allows researchers to pinpoint the pathways that connect these things, which becomes a quantitative relationship. An algorithm based on these biochemical pathways allows researchers to analyze data sets with the aim of proving or disproving a specific hypothesis. It gives predictive power. A problem with this approach is that it depends on the reliability of the knowledge that the algorithm is based on. If that information is incorrect, the predictions are wrong too.

The ideal scenario combines the two methods. First, use a qualitative algorithm to discover causal relationships among diseases and symptoms. Then, use that data in a quantitative algorithm to produce accurate predictions and inferences. However, the problems of big data collection in health care aren’t limited to the algorithms. The doctor’s office also plays a big role.

Because these algorithms are still years off, the average doctor doesn’t have the ability to make mHealth data mean anything. Even if doctors could, the law isn’t clear on whether they’re allowed to use that information to treat patients. Today, if you use an ECG monitor like AliveCor’s and give its data to your doctor, he or she will have to figure out whether that should be part of your medical record. If it is, is it covered by HIPAA? Is your doctor required to archive it for seven years? Is it identifying information, or not identifying information? Is it protected health information or not PHI? That’s a policy decision. And the law is always lagging behind technology.

Mix the policy challenges with the technological challenges, and right now the entire process is basically chaos.

Mobile health care is in a tricky but crucial spot. Device developers and industry researchers are caught in a cycle of needing users to give more data to improve medicine without yet knowing whether that data is even useful or reliable. Developers and researchers are in dire need of better algorithms to make millions of data points tell a story. But with more data and time, these issues will solve themselves. If early adopters stay willing to offer their personal data, despite the risks, the future of medicine will be marvelous.