A few years ago, a quirky bit of science made a big splash, perhaps thanks to its practical potential: Your date would likely find you more attractive if you turned up wearing red. That was the suggestion behind one 2008 study, which found that male undergraduates in the United States consistently rated photographs of women as substantially more attractive when the color red appeared in them (either in the women’s clothes or in the background). And a 2010 study showed the same for women rating men.
The results, presented in papers co-authored by Andrew Elliot, a professor of psychology at the University of Rochester in New York, were eaten up by the media, receiving widespread coverage over the years (in Slate too). The stories bought into the explanations offered up in the papers—that the effect makes sense because we associate red with passion, something that might have evolutionary roots, since people flush when they’re angry. (Don’t people look their best when they’re upset?) They often included suggestions to wear the color in specific situations, such as in your dating app profile photo. For years, the literature continued to grow with more studies supporting the red-romance effect.
What’s been less reported is that the red-romance effect has been under scrutiny from its early days. Now, after several follow-up studies, it seems likely that it does not hold up at all. Evidence for this comes from replication studies—scientific efforts that repeat an experiment to see whether a previously reported effect holds. One such replication was published last week in Social Psychology by Robert Calin-Jageman and Gabrielle Lehmann of Dominican University in River Forest, Illinois. It repeated the strongest experiment from Elliot’s 2008 paper as closely as possible, having university students and online participants rate the same photos. It produced little to no effect for either gender.
So why did science think the effect was real? The issues at hand seem to be the same ones surfacing again and again in the replication crisis: too much weight given to small samples, a tendency to publish positive results but not negative ones, and perhaps unconscious bias on the part of the researchers themselves.
Small numbers are one problem—many of the initial studies had inadequate sample sizes, according to Calin-Jageman. His replication used more participants than the original papers—just over 600 (360 women, 242 men), compared with 150 and 168 in Elliot’s 2008 and 2010 papers, respectively. Even so, he was quick to note that the larger number of women made him more confident in the null result for women rating men than in the one for men rating women. But a 2016 study that ran three replications of men rating women with more than 800 participants also failed to find support for the red effect.
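To get a feel for why small samples are such a problem, here is a rough simulation of statistical power—the chance a study detects a real effect of a given size. The effect size and sample counts below are illustrative assumptions (a small effect, roughly the group sizes involved), not the studies’ actual data:

```python
import random
import statistics

def simulate_power(n_per_group, effect_d, n_sims=2000, crit_z=1.96, seed=1):
    """Estimate power of a two-group comparison by simulation.

    effect_d is the true mean difference in standard-deviation units
    (Cohen's d); power is the fraction of simulated studies whose
    pooled-variance test statistic exceeds the usual 5% cutoff.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_d, 1.0) for _ in range(n_per_group)]
        # pooled-variance statistic, large-sample normal approximation
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        se = (2 * pooled_var / n_per_group) ** 0.5
        if abs((statistics.mean(b) - statistics.mean(a)) / se) > crit_z:
            hits += 1
    return hits / n_sims

# assumed small true effect (d = 0.2); group sizes are illustrative
power_small = simulate_power(75, 0.2)   # ~150 total, like the originals
power_large = simulate_power(300, 0.2)  # a replication-scale sample
```

With a small true effect, the original-scale sample detects it only a minority of the time, while the larger sample fares much better—which is why a positive result from a small study is weak evidence on its own.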
Calin-Jageman and his student have since gone further, accumulating the data as part of an as-yet-unpublished meta-analysis, which they have shared with Elliot. Pooled together, the data from nearly 4,000 participants show no effect for women rating men and only a weak effect for men rating women. Calin-Jageman said the latter result is a mixture of large and null effects, which led him to conclude that “either the big effects are a fluke, or there is something specific needed to be done when studying red that not all labs are doing.”
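The basic arithmetic of a meta-analysis like this is simple: each study’s effect estimate is weighted by its precision, so large null studies can swamp a small positive one. A minimal sketch of standard inverse-variance, fixed-effect pooling—with hypothetical effect sizes and standard errors, not the actual red-romance data—looks like this:

```python
def pool_fixed_effect(effects, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis.

    Each study is weighted by 1 / SE^2, so precise (large) studies
    dominate the pooled estimate.
    """
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = (1 / sum(weights)) ** 0.5
    return pooled, pooled_se

# hypothetical pattern: one small, large-effect early study followed by
# several bigger near-null replications (values are made up)
effects = [0.80, 0.05, -0.02, 0.04]   # Cohen's d per study
ses     = [0.35, 0.10,  0.08, 0.09]   # standard error per study

d, se = pool_fixed_effect(effects, ses)
```

Under these made-up numbers, the pooled estimate lands near zero with a confidence interval that includes zero: the big early effect is simply outweighed by the more precise nulls.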
In collecting all of this data, Calin-Jageman uncovered another problem with the research: More than half of all experiments involving women rating men had never been published at all, and for men rating women, around 30 percent of the data was likewise missing from the record. “What is published is just the tip of the iceberg,” he said. “It’s not the whole story.”
This publication bias also feeds our likely mistaken belief that the red-romance connection is strong. The majority of the published data are positive findings showing the effect at work, while most of the unpublished findings show no effect. Scholarly journals often reject papers with negative findings, and in the case of red-romance, the studies that found an effect got published while the ones that found none were shelved. No wonder we think red has an influence—science says so!
Greg Francis of Purdue University in West Lafayette, Indiana, was one of the first scientists to question Elliot’s 2010 study, arguing that its results seemed too good to be true. His paper noted that the studies were underpowered and that the published record likely omitted null experiments that were never submitted, or reflected (perhaps subconsciously) biased methods. For example, none of the original experiments were pre-registered—meaning the researchers did not post their protocols online before running them. Pre-registration, which some argue doesn’t happen enough, also makes it harder to bury results, since the planned study is on record whatever its outcome. (The researchers behind the new replication pre-registered their study.)
Ultimately, the saga reinforces some of the biggest lessons of the replication crisis. “It seems clear that we’re burying negative evidence,” Calin-Jageman told me. “We should all work to stop doing that.” He thinks researchers should put more effort into replicating findings and resubmitting rejected papers. Another bright spot: The Center for Open Science in Charlottesville, Virginia, has launched a “Preregistration Challenge,” offering academics $1,000 as an incentive to pre-register their studies.
In an interview with Slate, Elliot admitted that sample sizes in his earlier works were “too small relative to contemporary standards.” He added, “I have an inclination to think that red does influence attraction, but it is important for me to be open to the possibility that it does not.”