Future Tense

Using Tweets to Predict Which States Are Overweight

Regional preferences for meal hashtags.

Graphic courtesy of Daniel Fried

It’s a familiar joke that people are obsessed with talking about food and sharing food photos on social media. But researchers at the University of Arizona decided to use public Twitter data to study the language of food, and they found that trends in those tweets were predictive of real-world conditions in different regions.

The group based its analysis, published last month on arXiv, on more than 3 million food-related tweets. Using natural language processing algorithms, the researchers were able to predict things like the percentage of a state’s population that is overweight and the rate of diagnosed diabetes.

Mihai Surdeanu, a natural language analysis researcher who worked on the study, said:

“Can you actually predict the risk of diabetes and overweight distributions looking at the entire subset of Twitter that focuses on food? It turns out you kind of can. We’re not solving the problem yet, but we do much, much better than the baseline. So there is definitely predictive power behind the language of food.”

Surdeanu and the other researchers broke the data down at the city, state, and regional levels to see what food words could predict, and how accurately. Predictions of measures like a population’s diabetes rate were most accurate at the state level, but the group could also predict where blocks of tweets were coming from among the 15 most populous cities in the country.
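For readers curious what "predicting from food words" can look like in practice, here is a minimal sketch. It is not the Arizona group's method, which the article does not describe in detail. It assumes a simple bag-of-words approach in which each labeled group of tweets is reduced to word frequencies and a new block of tweets is assigned to the label whose typical vocabulary it overlaps most. The tweets, the labels, and the function names are all invented for illustration.

```python
from collections import Counter

# Invented toy data: (tweet text, label). Label 1 stands in for a
# hypothetical "above-median overweight rate" region, label 0 for below.
TOY_EXAMPLES = [
    ("donuts fries soda burger", 1),
    ("burger fries bacon soda", 1),
    ("kale salad quinoa sushi", 0),
    ("sushi salad tofu wine", 0),
]


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each word."""
    return Counter(text.lower().split())


def train_centroids(examples):
    """Build one relative-frequency word profile per label."""
    totals = {}
    for text, label in examples:
        totals.setdefault(label, Counter()).update(bag_of_words(text))
    centroids = {}
    for label, counts in totals.items():
        n = sum(counts.values())
        centroids[label] = {word: c / n for word, c in counts.items()}
    return centroids


def predict(centroids, text):
    """Return the label whose word profile overlaps the text the most."""
    words = bag_of_words(text)

    def score(label):
        profile = centroids[label]
        return sum(profile.get(w, 0.0) * c for w, c in words.items())

    return max(centroids, key=score)


CENTROIDS = train_centroids(TOY_EXAMPLES)
```

With this toy data, `predict(CENTROIDS, "fries and soda")` lands on label 1 and `predict(CENTROIDS, "salad with sushi")` on label 0. A real system would need far more data, held-out evaluation, and a comparison against a baseline, as Surdeanu notes.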

Applying natural language processing to food words seems to be a trend. Surdeanu’s former boss at Stanford, Dan Jurafsky, wrote for Slate last month about his research on the linguistic characteristics of food words.

The most distinctive food word per state. You can tell it’s not the most common word per state because how much could people in Maine possibly talk about durian? For New York, keep in mind that “Prune” is the name of a restaurant. Nothing against dried plums.

Graphic courtesy of Daniel Fried
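The difference between a "most distinctive" word and a "most common" word can be made concrete with a small sketch. The article does not say how the map above was computed, so this assumes one plausible measure: how much more often a word appears in a region than it does across all regions combined. The counts and region names below are invented.

```python
from collections import Counter


def most_distinctive(counts_by_region):
    """For each region, pick the word whose share of that region's food
    words is largest relative to its share across all regions combined."""
    overall = Counter()
    for counts in counts_by_region.values():
        overall.update(counts)
    total = sum(overall.values())

    result = {}
    for region, counts in counts_by_region.items():
        n = sum(counts.values())

        def ratio(word):
            # Regional share divided by overall share.
            return (counts[word] / n) / (overall[word] / total)

        result[region] = max(counts, key=ratio)
    return result


# Invented toy counts: "pizza" is the most common word in both regions,
# but it is not what sets either region apart.
TOY_COUNTS = {
    "region_a": {"pizza": 50, "durian": 2},
    "region_b": {"pizza": 60, "tacos": 10},
}
DISTINCTIVE = most_distinctive(TOY_COUNTS)
```

Here "pizza" dominates both regions, yet the rarer words "durian" and "tacos" come out as distinctive because they appear in only one region each. That is the same effect that could put durian on the map for Maine.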

The results of the Arizona study are promising, so the group plans to move forward with refining its models and testing them on other data sets.

“There’s the detection phase and there’s the intervention phase,” Surdeanu said. “If you find out that, say, this neighborhood or this city is at higher risk of diabetes than the rest of the state, can you do something in terms of public health to improve that? That’s the goal of this work. And of course everybody likes maps!”

Heatmaps showing the geographic distribution of tweets about donuts.

Graphic courtesy of Daniel Fried

And pasta.

Graphic courtesy of Daniel Fried

And sushi.

Graphic courtesy of Daniel Fried

And wine!

Graphic courtesy of Daniel Fried