A week before the election, Slate published a consumer’s guide in which we disclosed each pollster’s methods and how they might affect that survey’s numbers relative to the election returns. Now the returns are in, for pollsters as well as the public. Which polls nailed the results, which blew it, and why? Our review suggests three factors were crucial.
1. Party identification. Democrats have consistently outnumbered Republicans in surveys since FDR. Several of the pollsters we examined in October assumed that the turnout in 2004 would be close to the average of the last three presidential elections, which was more Democratic than Republican by several percentage points.
The pollsters who ran the Battleground survey disagreed. They assumed that the electorate would be 42.3 percent Democratic and 42.3 percent Republican. That split, negotiated between the Republican and Democratic companies that conducted the poll, looked to us like an unscientific political compromise. The Pew survey looked even crazier for projecting that Republicans would outnumber Democrats, 37 percent to 35 percent.
But guess what? On Election Day, exit polls showed Republicans matching Democrats 37 percent to 37 percent. Pollsters who assumed that historical patterns would temper the Republican intensity in this year’s surveys got it wrong. Those who bet on the data instead of the historical patterns got it right.
2. Undecided voters. Historically, last-minute undecideds have broken decisively for the presidential challenger. Based on this pattern, Gallup allocated 90 percent of its undecideds to Kerry, lifting him into a tie with Bush at 49 percent. TIPP made a similar bet on the 4.4 percent of voters in its final survey who said they were still “not sure” whom to vote for. TIPP allocated 61 percent of this group to Kerry and only 34 percent to Bush.
Slate’s Election Scorecard page went further. Alongside our projection of each state based on its final polls, which yielded a 269-269 Electoral College tie (we got Florida and Wisconsin wrong), we issued a separate “vote-share” projection that allocated undecideds as follows: 1) enough to third-party candidates to match the showing by those candidates in the same state in 2000; and 2) the remainder to Kerry. This model yielded a Kerry victory with somewhere between 276 and 291 electoral votes.
Oops! According to exit polls, Bush got 46 percent of those who made up their minds in the last week of the campaign and 44 percent of those who made up their minds in the final three days. TIPP got it wrong, Gallup got it very wrong, and Slate’s vote-share formula got it very, very wrong. Who got it right? Pew again. In its final report, Pew predicted that undecideds “may break only slightly in Kerry’s favor.” With 6 percent of voters undecided in the week before the election, Pew added 3 percent to Bush’s total and 3 percent to Kerry’s.
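For readers who want to check the arithmetic, here is a minimal sketch in Python of what these allocation rules do to the topline numbers. The function name `points_from_undecideds` is our own illustrative helper, not anything the pollsters published, and Pew's 50-50 share is inferred from its 3-and-3 split of a 6 percent undecided pool.

```python
def points_from_undecideds(undecided_pct, share):
    """Percentage points a candidate gains when given `share` (a fraction)
    of an undecided pool that is `undecided_pct` percent of respondents."""
    return undecided_pct * share

# Pew: 6 percent undecided, split evenly (inferred from the 3-and-3 allocation).
pew_bush = points_from_undecideds(6.0, 0.5)
pew_kerry = points_from_undecideds(6.0, 0.5)

# TIPP: 4.4 percent "not sure," 61 percent to Kerry and 34 percent to Bush.
tipp_kerry = points_from_undecideds(4.4, 0.61)
tipp_bush = points_from_undecideds(4.4, 0.34)

print(f"Pew:  Bush +{pew_bush:.1f}, Kerry +{pew_kerry:.1f}")
print(f"TIPP: Bush +{tipp_bush:.1f}, Kerry +{tipp_kerry:.1f}, "
      f"net Kerry gain {tipp_kerry - tipp_bush:.1f}")
```

Under these assumptions, TIPP's rule moves its margin a little over a point toward Kerry, while Pew's leaves the margin where it was.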
3. Automation. Before the election, we publicly doubted and privately derided Rasmussen and SurveyUSA, which used recorded voices to read their poll questions. We rolled our eyes when they touted the virtues of uniformity and when they complained that live interviewers “may not know how to read or speak the English language,” could “chew gum,” or might “just make up the answers to questions.” It sounded to us like a rationalization for cutting costs.
Look who’s laughing now. Rasmussen and SurveyUSA beat most of their human competitors in the battleground states, often by large margins.
Let’s compare the automated surveys to the three biggest pollsters who used live interviewers in multiple battleground states. We’ll grade each pollster on two measures: 1) how far its final numbers for Bush and Kerry varied from the official returns, and 2) how far the gap between its final numbers for Bush and Kerry varied from the gap shown in the official returns. For example, suppose a pollster had Bush winning a state 48 to 46 percent, but Bush actually won the state 50 to 47. By the first measure—let’s call it the sum—the poll missed Bush’s number by 2 and Kerry’s by 1, for a total error of 3. By the second measure—let’s call it the spread—the poll’s 2-point lead for Bush missed the actual 3-point lead for Bush by a total error of 1.
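The two measures can be sketched in a few lines of Python. This is only an illustration of the definitions above, using the hypothetical 48-46 poll and 50-47 result from our example; the function names are ours.

```python
def sum_error(poll, actual):
    """Sum method: the miss on Bush's number plus the miss on Kerry's.
    `poll` and `actual` are (bush, kerry) pairs in percentage points."""
    return abs(poll[0] - actual[0]) + abs(poll[1] - actual[1])

def spread_error(poll, actual):
    """Spread method: how far the poll's Bush-minus-Kerry margin
    landed from the actual Bush-minus-Kerry margin."""
    return abs((poll[0] - poll[1]) - (actual[0] - actual[1]))

poll = (48, 46)    # hypothetical poll: Bush 48, Kerry 46
actual = (50, 47)  # hypothetical result: Bush 50, Kerry 47

print(sum_error(poll, actual))     # 3
print(spread_error(poll, actual))  # 1
```

Because the margin is signed (Bush minus Kerry), a poll that calls the wrong winner is charged for the full distance between its margin and the real one.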
Start with the sum method. Rasmussen and Gallup overlapped in four battleground states: the big three (Florida, Ohio, and Pennsylvania) plus Minnesota. In all four, Rasmussen beat Gallup. Rasmussen’s average error in these states was 3.3 points compared to Gallup’s 6.2. SurveyUSA overlapped with Gallup in the big three states plus Iowa. Again, the automated pollster whipped Gallup. SurveyUSA’s average error was 3.5 points. Gallup’s was 6.4.
Mason-Dixon fared better, but not by much. It conducted surveys in five states that Rasmussen also polled: the big three plus Michigan and Minnesota. Mason-Dixon’s average error in these states was 5.5 points. Rasmussen’s was 3.2. Mason-Dixon overlapped SurveyUSA in 10 states: the big three, Arkansas, Colorado, Iowa, Michigan, Missouri, Nevada, and Oregon. Mason-Dixon was off in these states by an average of 5.6 points. SurveyUSA was off by 3.3.
Zogby came closer but still couldn’t beat the robo-pollsters. Rasmussen went head-to-head with Zogby in the big three, Michigan, and Minnesota. Zogby erred in these states by an average of 4.3 points. Rasmussen erred by just 3.2. SurveyUSA squared off against Zogby in the big three, Colorado, Iowa, Michigan, and Nevada. Zogby was off in these states by an average of 4.5 points. SurveyUSA was off by just 3.4.
Human pollsters argue that the sum method favors automated polls, because when respondents are asked to choose a candidate, they’re more likely to punch “1” or “2” on their phones than to punch “3” for other or undecided. This drives down the number of other/undecided responses, lifting both major candidates closer to their final numbers. If one poll has Kerry winning a state 46-45 with 9 percent undecided, and Kerry actually wins 50-49, the sum method punishes that pollster for every other/undecided respondent (calculating an 8-point error) and fails to reward the pollster for nailing the spread. Instead, the sum method rewards a second pollster who recorded fewer other/undecided responses and called the state for Bush, 51-48. The second pollster outscores the first by the sum method (missing Bush’s number by 2 and Kerry’s by 2), despite blowing the spread by 4 points (calling a 3-point win for Bush when Kerry actually won by a point).
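To see the objection in numbers, here is a small Python sketch that scores the two hypothetical pollsters from this example both ways. The labels and helper functions are ours, written only for illustration.

```python
actual = {"bush": 49, "kerry": 50}  # Kerry wins 50-49

polls = {
    "first pollster": {"bush": 45, "kerry": 46},   # Kerry 46-45, 9 percent undecided
    "second pollster": {"bush": 51, "kerry": 48},  # Bush 51-48
}

def sum_error(poll, result):
    # Miss on each candidate's number, added together.
    return abs(poll["bush"] - result["bush"]) + abs(poll["kerry"] - result["kerry"])

def spread_error(poll, result):
    # Distance between the polled and actual Bush-minus-Kerry margins.
    return abs((poll["bush"] - poll["kerry"]) - (result["bush"] - result["kerry"]))

for name, poll in polls.items():
    print(name, "sum:", sum_error(poll, actual), "spread:", spread_error(poll, actual))

best_by_sum = min(polls, key=lambda n: sum_error(polls[n], actual))
best_by_spread = min(polls, key=lambda n: spread_error(polls[n], actual))
print("best by sum:", best_by_sum)
print("best by spread:", best_by_spread)
```

The first pollster scores 8 on the sum and 0 on the spread; the second scores 4 and 4. Which one "wins" depends entirely on the yardstick.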
What happens to the pollster comparisons if we switch to the spread method? Both of the automated pollsters still beat Gallup. Head to head, SurveyUSA missed the spreads by an average of 2.3 points; Gallup missed by an average of 5.4. Rasmussen cleaned Gallup’s clock, missing the spreads by an average of 1.6 points compared to Gallup’s 6.2. Rasmussen also whipped Zogby, erring by 1.0 points compared to Zogby’s 3.2. But the contest between SurveyUSA and Zogby was tighter: The human pollster was off by an average of 3.6 points, compared to the robo-pollster’s 2.5.
Throw in Mason-Dixon, and the comparison gets even tighter. In the five states where Rasmussen overlapped with Mason-Dixon, the two pollsters essentially tied. If you compare election returns (measured to a tenth of a percent) to the most precise published poll results (measured in whole numbers), each pollster missed by exactly the same average: 1.42 points.
Mason-Dixon says it would be more scientific to compare whole-integer poll results to whole-integer (rounded) election returns. This method would lower Mason-Dixon’s average error. We understand that error rates averaged to a tenth of a percent are tenuous when the poll numbers from which they’re computed are whole integers. But we can’t agree that rounding off election returns improves the situation. Alternatively, Mason-Dixon argues that if we’re using election returns calculated to a tenth of a percent, the best scientific comparison would be to poll results measured to a tenth of a percent, which again would lower Mason-Dixon’s average error. We agree that this would be more scientific. But Rasmussen didn’t release its results to a tenth of a percent, so we can’t compare the two pollsters at that level of precision. Anyway, the performances are so close, and the variation in averages depending on decimal place is so tiny when compared to the much bigger margin of error on each poll, that it’s impossible to call the race between Rasmussen and Mason-Dixon one way or the other. It’s a tie.
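How much can the rounding question matter? A rough Python sketch, using Ohio's certified returns (Bush 50.9, Kerry 48.8) and a made-up whole-number poll that is purely hypothetical, gives a sense of scale. The helper name is ours.

```python
def spread_error(poll_bush, poll_kerry, actual_bush, actual_kerry):
    """Distance between the polled and actual Bush-minus-Kerry margins."""
    return abs((poll_bush - poll_kerry) - (actual_bush - actual_kerry))

# Ohio's certified returns, to a tenth of a percent.
ohio_bush, ohio_kerry = 50.9, 48.8

# Hypothetical poll reported in whole numbers (not any real pollster's result).
poll_bush, poll_kerry = 50, 48

# Compare the poll to the returns as certified, then to the returns rounded.
error_vs_tenths = spread_error(poll_bush, poll_kerry, ohio_bush, ohio_kerry)
error_vs_rounded = spread_error(poll_bush, poll_kerry,
                                round(ohio_bush), round(ohio_kerry))

print(round(error_vs_tenths, 1))  # 0.1
print(error_vs_rounded)           # 0
```

In this made-up case, rounding the returns shaves a tenth of a point off the error. Differences of that size are what separate the two ways of scoring, which is why we can't call the race on them.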
The match-up between Mason-Dixon and SurveyUSA is a different story. In the 10 states where they went head to head, the human pollster prevailed. Mason-Dixon erred by an average of 1.8 points, beating SurveyUSA’s 2.6. For this lonely victory over the machines, Mason-Dixon deserves the polling industry’s Garry Kasparov award.
How did the robots largely beat the humans? For starters, they aren’t robots. They’re recordings of human voices. Pollsters who use this technology argue that the uniformity achieved by automation—every respondent hears the questions read exactly the same way—outweighs any distortions caused by people hanging up or lying to the recordings. They also argue that the interviewers who read questions and record answers in “human” polls are all too human. A human poll may bear the name of a major newspaper or television network, but the interviews are usually “outsourced” to a company you’ve never heard of and conducted by whoever is willing to make the phone calls—which sound a lot like telemarketing—for modest wages.
We won’t settle the relative merits of the two approaches in this article or this election. But when the two major automated pollsters finish either second and first or, depending on how you count it, third and tied for first in round-robin match-ups with the three major human pollsters, it’s time to broaden the experiment in automated polling and compare results to see what’s working and why. Clearly, the automated pollsters are onto something, and the human pollsters who have fallen behind will have to figure out how to beat it, or join it.
Correction, Dec. 11, 2004: This article originally said that the measure by which Rasmussen and SurveyUSA beat all three human pollsters was the spread method. This was incorrect. The error calculations supplied were for the sum method. We recalculated the average error for each pollster using the spread method and determined that Mason-Dixon beat SurveyUSA. We apologize to Mason-Dixon and to indignant humans everywhere.
Correction, Dec. 20, 2004: On Dec. 11, after we had calculated and published pollsters’ error averages using the spread method, Ohio certified a revised vote count that lowered Bush’s vote share in that state from 51.0 to 50.9 and raised Kerry’s vote share from 48.5 to 48.8. Accordingly, we have recalculated all the numbers using both methods. The recalculation eliminated Rasmussen’s advantage over Mason-Dixon using the spread method, producing a tie.