The case of the New England Patriots’ underinflated footballs has touched off a hypertechnical forensic fracas like nothing since the kerning in George W. Bush’s National Guard documents. What’s the precision level of the pressure gauge the referees use? What effect does temperature fluctuation have on the level of inflation? What about vigorous rubbing of the balls? (The jokes write themselves.)
But I’m a math guy, so I’m going to concentrate on the math question, touched off by Warren Sharp last week, first on his blog and then here on Slate. Sharp wrote that, one season after a 2006 rule change allowing each team to supply its own balls, the Patriots became superhumanly stingy with fumbles. They allowed only one fumble every 74 plays between 2010 and 2014, far outside the range delineated by the rest of the teams in the league that play their home games in outdoor stadiums. The Patriots were what statisticians call an outlier: a data point so far outside the expected range that it signals something other than normal variation is in play.
The number-loving sports pundits of the Internet swarmed, pointing out some real issues with Sharp’s analysis of the degree to which the Pats were an outlier. FiveThirtyEight did an omnibus post on the “Statistics 101 problems” with Sharp’s piece. Deadspin called it mostly junk. And Chicago data scientist (and Pats fan) Drew Fustin said the Patriots weren’t an outlier at all.
How can that be? Sharp, Fustin pointed out, excluded the teams that play in domes from his analysis. That makes sense, in a way; the climate-controlled game is a different physical environment than what we Packers fans call “actual football.” But it makes just as much sense to include all the teams and throw out all the games played in domes. When you do that, as Fustin did, the chart looks different. The Patriots, with one fumble every 74 plays between 2010 and 2014 in outdoor games, weren’t even the best team in the league; that was the Atlanta Falcons, who in their outdoor games fumbled only once every 83 plays. And the Saints, also a dome team, weren’t much worse, fumbling once every 67 plays. Measured this way, the Patriots still look like an excellent ball-control team, but they no longer break the curve.
Sharp wasted no time in firing back. Yes, he said, the Falcons had a lower fumble rate than the Patriots, but more than half of their games are played in domes every year, so they have many fewer outdoor games. It’s a small sample size! Their sterling record of 23 fumbles in 1,900 plays might just be a lucky fluke. If they had 26 fumbles instead of 23, a pretty small difference, then the Falcons would score no better than New England. The Patriots’ mark, maintained over 5,222 plays, seems a lot more reliable.
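Sharp's arithmetic here is easy to check. A minimal sketch in Python, using only the figures quoted above (the function and variable names are mine, purely for illustration):

```python
def plays_per_fumble(plays, fumbles):
    """Average number of plays between fumbles."""
    return plays / fumbles

falcons_outdoor_plays = 1900   # Falcons' outdoor plays, as quoted above
patriots_rate = 74             # Patriots: one fumble every 74 plays

actual = plays_per_fumble(falcons_outdoor_plays, 23)        # what happened
hypothetical = plays_per_fumble(falcons_outdoor_plays, 26)  # three more fumbles

print(f"23 fumbles: one every {actual:.1f} plays")
print(f"26 fumbles: one every {hypothetical:.1f} plays")
print(f"Patriots:   one every {patriots_rate} plays")
```

Three extra fumbles over five seasons of outdoor games is all it takes to move the Falcons from ahead of the Patriots to just behind them.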
Sharp has a point. For example, in any ranking of groups by per capita statistics, smaller groups will tend to dominate the top and bottom of the list. I was reminded of this last week by a Vox article ranking states by the proportion of Facebook users residing there who “liked” the site I F*king Love Science (IFLS). If you plot proportion of IFLS fans against total Facebook population in a given state, you get something like this:
Notice something? The extreme values of places with high and low percentages of IFLS Facebook followers are concentrated among smaller states. As you move rightward on the scatterplot to the larger states, the points cluster towards the center. A big state is almost inevitably going to have a heterogeneous population, which all gets averaged into its overall score, pulling it towards the national mean. A small state, less so.
Similarly, you might expect the top tiers (and the bottom tiers) of the fumbles-per-play–in-outdoor-games rankings to consist of teams with fewer games outdoors—that is, the dome teams—even if those teams aren’t actually any better (or worse) at preventing fumbles.
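To put a rough number on how much more a small sample wobbles, here is a sketch that assumes every play is an independent coin flip with the same fumble probability (a simplification, and my assumption rather than anything in Sharp's or Fustin's analysis). Under that model, the standard deviation of an observed fumble rate shrinks like one over the square root of the number of plays. The function name is hypothetical.

```python
import math

def rate_std_error(true_rate, plays):
    """Standard deviation of an observed per-play fumble rate,
    assuming independent plays that each end in a fumble with
    probability true_rate (a simple binomial model)."""
    return math.sqrt(true_rate * (1 - true_rate) / plays)

true_rate = 1 / 74  # suppose both teams truly fumble once every 74 plays

se_falcons = rate_std_error(true_rate, 1900)   # Falcons' outdoor plays
se_patriots = rate_std_error(true_rate, 5222)  # Patriots' plays

for label, se in [("Falcons outdoors", se_falcons), ("Patriots", se_patriots)]:
    # Rough two-standard-deviation band, expressed in plays per fumble.
    fewest = 1 / (true_rate + 2 * se)
    most = 1 / (true_rate - 2 * se)
    print(f"{label}: roughly one fumble every {fewest:.0f} to {most:.0f} plays")
```

Two teams with identical true skill would still post noticeably different numbers, and the team with fewer plays would be the one more likely to land at either extreme.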
So does that mean the Falcons aren’t really any better than the Patriots at holding onto the ball? Well, it’s not quite that simple. Yes, the Falcons have a small sample size. But we don’t have to content ourselves with grunting “small sample size bad” like an animal. Let us rise to our hindlimbs and compute!
Suppose the Falcons were no less fumbly than the Patriots. That is, suppose their true propensity was to fumble once in every 74 plays. In 1,900 plays, you’d expect them to cough up 25 or 26 fumbles. What’s the chance they’d suffer as few as 23? Actually, it’s not terribly unlikely; it happens about 34 percent of the time. (In math lingo: if X is a random variable taking the value 0 with probability 73/74 and the value 1 with probability 1/74, the sum of 1,900 independent copies of X is less than or equal to 23 about 34.2 percent of the time. Of course you can complain here that the variables are surely not really independent, given that field and weather conditions in a single game can increase or decrease fumble probability for that game, as can the quality of the teams.)
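If you'd like to check that figure yourself, the calculation is a binomial tail sum. A minimal sketch, under the same independence assumption flagged above (the function name is mine, not from any library):

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) when X counts successes in n independent trials,
    each succeeding with probability p (the binomial distribution)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

plays = 1900        # Falcons' outdoor plays
fumble_prob = 1 / 74  # hypothesis: same true rate as the Patriots

expected_fumbles = plays * fumble_prob
p_tail = prob_at_most(23, plays, fumble_prob)

print(f"Expected fumbles: {expected_fumbles:.1f}")
print(f"Chance of 23 or fewer: {p_tail:.1%}")
```

The sum runs over every outcome from 0 to 23 fumbles, so it answers "as few as 23" rather than "exactly 23."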
So we really can’t be confident the Falcons were better than the Patriots at preventing fumbles. But Sharp’s claim isn’t just that the Patriots were the best; it’s that they towered far above the rest of the league. Could that be true?
The next-best team on Sharp’s chart of outdoor teams was the Baltimore Ravens, who fumbled once every 55 plays from 2010 to 2014. If that were the Falcons’ true fumble rate, they’d be expected to fumble 34 or 35 times in their 1,900 outdoor plays. Their chance of 23 or fewer fumbles would then be only about 2 percent. Even the small sample is enough to provide substantial evidence against that hypothesis. I think we can be pretty convinced that the Falcons were either a little better than the Patriots at preventing fumbles, about the same, or a little worse, but not much worse.
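The same tail calculation can be run against each of the fumble rates mentioned in this piece, to see which ones the Falcons' 23 fumbles are compatible with. Again a sketch, assuming independent plays; the rates are the ones quoted above and the names are illustrative.

```python
from math import comb

def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

plays, fumbles = 1900, 23  # Falcons' outdoor record

# Hypothetical "true" rates, in plays per fumble:
# Ravens (55), Saints (67), Patriots (74), Falcons' own observed rate (83).
tail_by_rate = {}
for plays_per_fumble in (55, 67, 74, 83):
    tail_by_rate[plays_per_fumble] = prob_at_most(
        fumbles, plays, 1 / plays_per_fumble
    )
    print(f"True rate 1 in {plays_per_fumble}: "
          f"chance of {fumbles} or fewer = {tail_by_rate[plays_per_fumble]:.1%}")
```

The probabilities climb steadily as the assumed rate improves, which is the quantitative version of "a little better, about the same, or a little worse, but not much worse."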
None of this settles the question of whether the Patriots are big fat cheaters. New England may not be an all-time outlier in the history of fumbles. But no one disputes that they went from average to very, very good, and it happened suddenly, and it happened one season after the NFL allowed each team to provide its own game balls and the same season they were caught violating the rules in another controversy that had opposing fans alleging long-running wrongdoing. This might have happened because the Patriots acquired more sure-handed players in 2007 and moved to a spread offense, as Fustin suggests. Or it might have happened because the Patriots have had squishy balls for years, as everyone outside of Maine, Vermont, New Hampshire, Rhode Island, and Massachusetts suggests. The fumble stats alone are consistent with both theories.
So how to decide? You can’t do it with one stat alone. People don’t think Barry Bonds used steroids because he hit 73 home runs; they think he used steroids because he hit 73 home runs and people testified that he used steroids. Babe Ruth hit 54 home runs in 1920, when the second-best total was George Sisler’s 19, but we don’t think Babe Ruth used steroids, because steroids didn’t exist. You judge a theory based on all the evidence you have for it: past Patriots’ transgressions, the pressure gap between the home and visitor game balls in last week’s AFC championship, your personal feelings about Bill Belichick’s moral foundations, and so on. The Patriots’ sudden improvement in preventing fumbles doesn’t close the case against them, but it’s one more piece of evidence.
Correction, Jan. 30, 2015: Due to a production error, the caption on this photo originally misidentified Danny Amendola as Rob Gronkowski.