For several months, Duke University economist Campbell Harvey has been busy delivering some unpleasant news to the world of finance. Most of the existing research about why some stocks outperform others, he believes, “is likely false.” As in: likely wrong and likely worthless to an investor. That includes everything from studies published in academic journals for the past 50 years to the techniques mutual fund managers use to pick their portfolios.
The issue, as Harvey and his co-authors Yan Liu and Heqing Zhu argue in a recent working paper reported on last month by Vox, is that economists and quants haven’t been using a strong enough statistical standard to distinguish meaningful discoveries about the factors that drive stock prices from patterns that emerge purely by chance. “Everybody’s in the same boat, including me,” Harvey says. “My previous research suffers from the same problem.”
Harvey refers to that problem as “multiple testing.” But you can think of it as the danger that emerges when math whizzes try to explain the world by throwing dozens of theories at the wall, just to see what sticks. Let’s say a mutual fund analyst wanted to know why a group of stocks all rose over 10 years. They might start testing different variables the companies have in common to see whether any correlate with how their shares performed. Typically, researchers are content that they’ve found a statistically significant relationship if it survives testing at the 95 percent confidence level. That means that if no real relationship existed, there would be only a 5 percent chance of seeing a result that strong by luck alone.
But most researchers don’t test just one or two variables. They try dozens, hundreds, or even thousands as they search for a novel discovery that will make for an intriguing paper or yield a slight investment edge. And because each test still carries a 5 percent chance of producing a random result, the odds that they’ll land upon a completely meaningless correlation increase with each additional regression they run. “If I try 20 things,” Harvey says, “then there’s a 95 percent chance that I’ve got a false positive. Just by chance, something is going to correlate.”
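The arithmetic behind that warning can be sketched in a few lines. Assuming every test is independent, uses a 5 percent threshold, and none of the variables has any real effect, the chance of at least one fluke among n tests is 1 - 0.95^n. For 20 tests that comes to roughly 64 percent rather than 95, and on average exactly one of the 20 results will look significant, which is Harvey's broader point: something is very likely to correlate. The function names below are illustrative, not taken from Harvey's paper.

```python
def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across num_tests
    independent tests run at significance level alpha, assuming none
    of the tested variables has any real effect."""
    return 1.0 - (1.0 - alpha) ** num_tests


def expected_false_positives(num_tests: int, alpha: float = 0.05) -> float:
    """Average number of spurious 'significant' results under the same assumptions."""
    return num_tests * alpha


for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: P(at least one fluke) = "
          f"{family_wise_error_rate(n):.1%}, "
          f"expected flukes = {expected_false_positives(n):.2f}")
```

Real financial variables are often correlated with one another, so the exact figures would differ in practice, but the risk still climbs quickly as the number of tests grows.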
The multiple-testing problem isn’t exclusive to economics or finance. High-energy physicists deal with it when trying to tease out the relationships between untold colliding particles. Biomedical researchers face it when they test millions of gene combinations to figure out which might be related to diseases like cancer. In 2005, the medical science field felt the impact of a paper titled “Why Most Published Research Findings Are False,” which argued multiple testing was leading to incorrect findings.
According to Harvey, financial economists like himself simply haven’t grappled with this issue as it applies to their area of expertise. He says he noticed something was wrong during his six years as editor of the Journal of Finance, one of the top titles in its field. Economists, he said, kept submitting papers for publication that tried to account for the exact same stock price behavior using different explanations that claimed to be mathematically significant. Something was amiss, he thought.
In his recent paper, Harvey examines 313 studies dating back to the 1960s that claim to identify factors behind stock returns to find out if their results hold up once you mathematically correct for the multiple-testing problem. More than half of their findings, he concludes, are “likely false.” Some of the erroneous results, Harvey told me, are the product of modern computing, which allows researchers to easily test thousands of variables at a time in search of a correlation. But economists still managed to make similar mistakes five decades ago, without the same processing power.
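To make “correcting for multiple testing” concrete, here is a sketch of one of the simplest textbook adjustments, the Bonferroni correction, which divides the significance threshold by the number of tests run. It is offered only as an illustration and is not necessarily the method Harvey and his co-authors apply; the ten p-values are hypothetical.

```python
def bonferroni_significant(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which results survive a Bonferroni correction.

    Each p-value is compared against alpha / m, where m is the number of
    tests run, so the chance of any false positive across the whole
    family of tests stays at or below alpha."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values from testing ten candidate stock-return factors.
p_values = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.35, 0.5, 0.7, 0.9]

naive = [p <= 0.05 for p in p_values]
corrected = bonferroni_significant(p_values)
print("significant at 5%, uncorrected:", sum(naive))
print("significant after Bonferroni:  ", sum(corrected))
```

Five of the hypothetical factors clear the usual 5 percent bar, but only two survive once the bar is raised to 0.5 percent to account for all ten tests. Bonferroni is deliberately conservative; less strict corrections exist that trade some protection against flukes for a better chance of keeping real discoveries.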
What’s most worrisome for anybody with a 401(k) is that the problem isn’t merely academic. Investment firms treat their models as trade secrets, and some, like hedge funds, use an extremely high statistical bar for their internal research. But Harvey says he suspects that most research produced by the finance industry is just as flawed as the work coming out of universities. First, many firms use academic studies as the starting point for their own work. Second, he sees similar mistakes made in research published in finance trade publications and client notes.
Finally, Harvey says, “The industry record speaks for itself.” Most mutual funds that are actively managed—meaning an actual human picks the stock portfolio—fail to beat their benchmark over three or five years. In other words, you’re probably better off investing in an index fund that tracks the S&P 500 than one that tries to outperform it. Bad math would seem like one obvious explanation for their poor performance.
Harvey’s work offers another lesson to would-be investors: Even if a fund does consistently beat the market, it may just be a coincidence. With so many investment funds out there, some will outperform by a margin that looks statistically significant purely by chance.
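A small simulation illustrates the point. It assumes 1,000 hypothetical funds with zero skill, each earning yearly returns over the market drawn from a normal distribution with mean 0 and an assumed 5 percent standard deviation over 10 years, and counts how many would still pass a standard one-sided t-test for beating the market at the 5 percent level. All numbers here are illustrative assumptions, not data from Harvey's research.

```python
import random
import statistics

NUM_YEARS = 10
# One-sided 5% critical value of Student's t with NUM_YEARS - 1 = 9 degrees
# of freedom; it must be changed if NUM_YEARS changes.
CRITICAL_T = 1.833


def count_lucky_funds(num_funds: int = 1000, seed: int = 42) -> int:
    """Count no-skill funds whose track record looks significantly
    better than the market purely by chance."""
    rng = random.Random(seed)
    lucky = 0
    for _ in range(num_funds):
        # Yearly excess returns over the market: mean 0 means no real skill.
        excess = [rng.gauss(0.0, 0.05) for _ in range(NUM_YEARS)]
        mean = statistics.mean(excess)
        stderr = statistics.stdev(excess) / NUM_YEARS ** 0.5
        if mean / stderr > CRITICAL_T:
            lucky += 1
    return lucky


print(count_lucky_funds(), "of 1000 no-skill funds look like market beaters")
```

On average about 5 percent of the funds, roughly 50 of 1,000, will clear the bar even though none has any skill, which is why a standout record across a large universe of funds is weak evidence on its own.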
Ultimately, Harvey’s paper argues that financial economists need to adopt a more stringent statistical threshold to separate genuine discoveries from flukes. He also offers his own test, which he hopes will help investors separate the good money managers from the lucky ones. Since finishing his research, he has been presenting the findings before industry and academic groups, and says the response from both groups has mostly been positive. Pension funds, he said, are especially interested, and some institutional investors have begun asking management firms whether their products meet his new standard.
Given that sciences like medicine have been aware of these problems for years, why did it take so long for finance to come along? “Finance is a fairly young field,” Harvey said. “We have been very focused on discovery, and we’ve made some great discoveries in our field. It’s just a sign of maturity in our field that we’re starting to think about how we do our research.”