Frame Game

Pollish Sausage 

No matter who you think is going to win the presidential election, you can find a poll to back up your opinion. If you’re betting on George W. Bush, you can point to the Battleground 2000 survey, which consistently shows Bush ahead. If you’re betting on Al Gore, you can point to the New York Times/CBS poll, which usually indicates a small lead for Gore. If you think the debates helped Bush a lot, you can point to the CNN/USA Today/Gallup poll, which found a big Bush surge after each encounter. If you think the debates didn’t help Bush much, you can point to the Reuters/MSNBC/Zogby survey, which has rarely shifted more than two points a day.

Why do the polls confirm so many theories? Because theories are built into the polls. Each polling outfit has its own objectives and biases. In the case of media surveys, these objectives and biases aren’t about ideology; they’re about news-making and social science. Some tracking pollsters want to find big day-to-day changes, others want stability. Some want to narrow the population they study, others want to broaden it. Some fear passive bias, others fear active bias. Each pollster designs his survey to suit his preferences, and each gets the results he’s looking for. Like the rest of us, pollsters have theories about who will vote and how. Polls don’t confirm these theories. They incorporate them.

This year’s big controversy is the CNN/USA Today/Gallup tracking poll. Other pollsters are dismayed at Gallup’s radical swings. In the two days after the first debate, Gallup’s three-day sample went from an 11-percentage-point Gore lead to a seven-point Bush lead. Last weekend, Bush had a nine-point lead in the Gallup sample; two days later, Gore had grabbed the lead. Contrast this with the Zogby survey, which moved only four points and two points during those periods, respectively. Why the difference? Because Gallup and Zogby are looking for different things. Gallup is trying to capture daily fluctuations, while Zogby is trying to filter them out.

On its Web site, Gallup makes clear that its poll seeks to maximize daily change: “Our objective is to pick up movements up and down in reaction to the day-to-day events of the campaign.” Gallup postulates that one in five voters is highly malleable: “A sizeable portion of the voting population, upwards of 20%, is uncommitted and on any given day as likely to come down in favor of one candidate as the other.” Gallup doesn’t mind that big shifts in the partisan makeup of each day’s sample—one day lots of Republicans, the next day lots of Democrats—push its numbers back and forth. Gallup’s editor in chief, Frank Newport, says these partisan shifts reflect “differential intensity” between the parties. One day, Republicans feel likely to vote; the next, Democrats feel likely to vote. Accordingly, the pool of “likely voters” shifts from Bush to Gore.

Other pollsters regard that kind of change as a distraction. They want to hold some factors constant—including party affiliation—so they can focus on variations in other factors. “We’re trying to measure movement within groups,” says Ed Goeas, the Republican pollster who oversees the Battleground 2000 survey. “If I see that white women have moved 10 points, I want to see whether that was real movement”—as opposed to an excess of Republican women in the first sample and an excess of Democratic women in the second. Similarly, Washington Post survey director Rich Morin writes that Gallup “may not be tracking real changes in the electorate, but merely changes in relative interest or enthusiasm of Republicans and Democrats.”

Notice the clash of premises. Morin and Goeas use a hard model of voting behavior. They assume that any changes in the horse-race numbers (i.e., the percentage of respondents who plan to vote for Bush or Gore) caused by changes in the partisan makeup of the likely voter pool aren’t “real.” These pollsters treat the distribution of Democratic and Republican voters in presidential election turnout as a constant. When they see poll results in which that distribution shifts back and forth like a variable, they dismiss the data and fault the poll’s methods. You could argue that their hard model, with its fixed dichotomy of constants and variables, is too rigid. But you could argue just as easily that Gallup’s soft model, which treats everything as a variable—to the point of positing that uncommitted voters are “on any given day as likely to come down in favor of one candidate as the other”—is too mushy and chaotic. Which model is better? The answer to that question isn’t scientific. It’s philosophical.
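A toy calculation makes the clash concrete. What follows is a hypothetical illustration, not Gallup's or the Battleground survey's actual procedure. Every number in it is invented, and it assumes that within-party preferences are identical on two nights while only the partisan mix of the likely-voter pool changes. The "soft model" reports the pool as it comes in. The "hard model" reweights each night to one fixed party distribution (FIXED_MIX, an assumed value).

```python
# Hypothetical illustration only: every share below is invented, and neither
# calculation is any pollster's actual method.

# Share of each party's likely voters backing Bush. In a real survey this would
# be measured fresh each night; here it is held fixed on purpose so that only
# the partisan mix of the pool changes between nights.
BUSH_SUPPORT = {"Rep": 0.90, "Dem": 0.10, "Ind": 0.50}

# Partisan makeup of the likely-voter pool on two nights (each sums to 1).
NIGHT_1_MIX = {"Rep": 0.40, "Dem": 0.32, "Ind": 0.28}  # Republicans more enthusiastic
NIGHT_2_MIX = {"Rep": 0.32, "Dem": 0.40, "Ind": 0.28}  # Democrats more enthusiastic

# Assumed fixed party distribution that a "hard model" pollster weights every night to.
FIXED_MIX = {"Rep": 0.35, "Dem": 0.37, "Ind": 0.28}


def bush_share(party_mix, support):
    """Bush's overall share: each party's share of the pool times its Bush support."""
    return sum(share * support[party] for party, share in party_mix.items())


for label, night_mix in [("Night 1", NIGHT_1_MIX), ("Night 2", NIGHT_2_MIX)]:
    soft = bush_share(night_mix, BUSH_SUPPORT)  # pool taken as it arrived
    hard = bush_share(FIXED_MIX, BUSH_SUPPORT)  # pool reweighted to the fixed mix
    print(f"{label}: soft model Bush {soft:.1%}, hard model Bush {hard:.1%}")
```

With these invented numbers, the soft-model figure swings from 53.2 percent to 46.8 percent even though no Republican, Democrat, or independent changed sides. The hard-model figure stays at 49.2 percent on both nights. Whether that swing is "real" is exactly what Gallup and its critics disagree about.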

It’s also practical. CNN and USA Today are in the news business. They’re paying Gallup for new numbers every day. If Gallup’s numbers don’t change, where’s the news? So Gallup has an incentive to keep its filter loose, allowing the winds of shifting partisan intensity to blow its numbers back and forth. Goeas, on the other hand, is a professional campaign pollster—as is his Democratic partner in the survey, Celinda Lake. They’ve designed their poll to get the kind of information a candidate, as opposed to a news organization, would want. Campaigns divide the electorate into demographic groups—union households, white women, Midwestern Catholics—and target their ads and messages to those groups. A campaign manager needs to hold the distribution of these groups constant from day to day so she can track movement within each group. Which poll is correct? That depends on what you need the numbers for.

Here’s another philosophical question: How many days do you need to poll in order to understand public opinion? Gallup is sampling 400 people every night. Since CNN and USA Today want the numbers to keep changing, they report a rolling average based on only the last three samples. If Gore was doing well three nights ago, but Bush is doing well tonight, the pro-Gore sample drops out of the three-night mix, the pro-Bush sample goes in, and Bush gets a big bump. On its Web site, however, Gallup reports a rolling average based on the last six samples. The pro-Gore sample stays in the mix, diluting Bush’s bump—and conversely, tonight’s pro-Bush sample stays in the mix five days from now, diluting Gore’s next bump. The result is a less exciting series of smaller shifts. The three-day average tells you how 1,200 people feel right now. The six-day average tells you how 2,400 people feel over the course of a week. Which number should you pay attention to? That depends on whether you want the latest news or the big picture.
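The windowing arithmetic is easy to reproduce. This is a minimal sketch with invented nightly Bush-minus-Gore margins. It assumes every night's sample is the same size (400, the figure given above for Gallup), so the margin of the pooled sample equals the average of the nightly margins. The function name `rolling_average` is hypothetical.

```python
from collections import deque

NIGHTLY_N = 400  # respondents per night, the Gallup figure cited in the article

# Invented nightly Bush-minus-Gore margins in points: three pro-Gore nights,
# three pro-Bush nights, then three pro-Gore nights again.
NIGHTLY_MARGINS = [-6, -6, -6, 6, 6, 6, -6, -6, -6]


def rolling_average(values, window):
    """Trailing mean of the last `window` values; None until the window is full.

    Assumes equal nightly sample sizes, so this equals the pooled-sample margin.
    """
    recent = deque(maxlen=window)  # the oldest night drops out automatically
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / window if len(recent) == window else None)
    return averages


three_night = rolling_average(NIGHTLY_MARGINS, 3)
six_night = rolling_average(NIGHTLY_MARGINS, 6)

print(f"3-night average ({3 * NIGHTLY_N} respondents):", three_night)
# 3-night average (1200 respondents): [None, None, -6.0, -2.0, 2.0, 6.0, 2.0, -2.0, -6.0]
print(f"6-night average ({6 * NIGHTLY_N} respondents):", six_night)
# 6-night average (2400 respondents): [None, None, None, None, None, 0.0, 0.0, 0.0, 0.0]
```

In this contrived series the three-night figure swings 12 points and back, while the six-night figure never moves: each run of pro-Bush nights stays in the wider window long enough to offset the pro-Gore nights on either side. Widening or narrowing the window changes the reported number without a single new interview.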

The argument for the big picture is that it’s a better predictor. Presidential preference “is not a firmly held attitude,” says Gallup’s Web site. “[T]here is no need for Americans to develop a firmly held view on their vote until Nov. 7.” Yet Gallup says its poll is designed to clarify who would win the election if it were held today. Its surge toward Bush after the first debate, for example, suggests that “if the election were indeed held during the days after the debate, Bush would have won, in large part because his voters would be more likely to turn out to vote.” But if presidential preferences don’t become “firmly held” until Election Day, then it makes no sense to infer from today’s numbers that Bush would win “if the election were held today.” The election isn’t being held today—and if it were, voters would have to resolve their fluctuating feelings into firmly held views that might not lead to the same conclusion.

Every pollster dreads statistical bias. But there are two kinds of statistical bias: passive and active. Passive bias is what happens when you don’t balance your sample. If you live in a white neighborhood and poll your neighbors, you don’t get enough black respondents. You have to take steps to make sure you either 1) sample the proper percentage of blacks up front; or 2) “weight” the number of blacks in your sample to reach the proper percentage. For example, if you polled half as many blacks as you should have, you double the weight of each black respondent’s answers, as though you had polled the correct number.
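The reweighting step can be sketched in a few lines. This minimal illustration uses simple cell weighting, in which each group's weight is its target share divided by its share of the sample. The sample counts, target shares, and Gore-support figures are all invented, and `cell_weights` and `gore_share` are hypothetical helpers, not any pollster's code.

```python
# Invented sample: 1,000 respondents, of whom 60 (6%) are black.
SAMPLE_COUNTS = {"black": 60, "other": 940}
# Assumed target shares: the sample "should" have been 12% black.
TARGET_SHARES = {"black": 0.12, "other": 0.88}
# Invented share of each group backing Gore.
GORE_SUPPORT = {"black": 0.90, "other": 0.45}


def cell_weights(counts, targets):
    """Weight per respondent in each group: target share / sample share."""
    total = sum(counts.values())
    return {group: targets[group] / (counts[group] / total) for group in counts}


def gore_share(counts, support, weights=None):
    """Gore's share of the sample, optionally weighted."""
    if weights is None:
        weights = {group: 1.0 for group in counts}  # unweighted: everyone counts once
    weighted_n = sum(counts[g] * weights[g] for g in counts)
    return sum(counts[g] * weights[g] * support[g] for g in counts) / weighted_n


weights = cell_weights(SAMPLE_COUNTS, TARGET_SHARES)
print({g: round(w, 3) for g, w in weights.items()})
print(f"unweighted Gore share: {gore_share(SAMPLE_COUNTS, GORE_SUPPORT):.1%}")
print(f"weighted Gore share:   {gore_share(SAMPLE_COUNTS, GORE_SUPPORT, weights):.1%}")
```

Because 6 percent is half of the assumed 12 percent target, each black respondent gets a weight of 2.0, which is the doubling described above. Everyone else is weighted down slightly (to about 0.936), so the weighted sample still totals 1,000. With these invented figures, Gore's share moves from 47.7 to 50.4 percent. The result is only as good as the target: set it wrong and the same arithmetic introduces the active bias the next paragraph describes.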

How do you determine the proper percentage? The least intrusive way is to adjust each demographic group—women, Hispanics, senior citizens—to census data. But what if you’re polling likely voters? Shouldn’t you adjust the percentage of black respondents in your sample to the percentage of blacks among voters who actually turn out on Election Day? And how do you figure that percentage? Do you look just at exit polls from the last election or at precinct-by-precinct turnout figures? How many past elections should you look at? How should you update those old figures to take account of possible changes in this year’s black turnout? And what if you overestimate black turnout and assign too much weight to black respondents in your poll? In that case, you’ve replaced passive bias with active bias.

Gallup and the New York Times/CBS poll use minimal weighting, based on the census. Goeas and Zogby, however, adjust their filters and weights to match the turnouts they expect among various demographic groups, based on past turnout, current voter registration, and other factors. Polls whose weights and filters are calibrated to reflect turnout, as opposed to just the census, tend to favor senior citizens, well-educated people, whites, men, and nonunion households. The weights alone can radically change the final numbers. According to the Post, on one recent night Zogby’s weighting process shifted the results from a four-point Bush lead to a four-point Gore lead.
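To see how the choice of targets alone can flip a race, here is a hypothetical sketch that applies two different sets of target shares to one raw sample. The grouping (union vs. nonunion households), the "census-style" and "turnout-based" targets, and the support figures are all invented for illustration. They are not Zogby's or anyone else's numbers, and `bush_margin` is a hypothetical helper.

```python
# One invented raw sample: respondents and Bush's share in each group.
RAW_COUNTS = {"nonunion": 780, "union": 220}
BUSH_SUPPORT = {"nonunion": 0.54, "union": 0.34}

# Two assumed sets of targets for the same groups (each sums to 1).
CENSUS_STYLE = {"nonunion": 0.76, "union": 0.24}
TURNOUT_BASED = {"nonunion": 0.82, "union": 0.18}  # tilts toward nonunion households


def bush_margin(counts, support, targets):
    """Bush-minus-Gore margin in points after weighting each group to its target.

    Assumes a two-way race, so Gore's share is 1 minus Bush's share.
    """
    total = sum(counts.values())
    weights = {g: targets[g] / (counts[g] / total) for g in counts}
    weighted_n = sum(counts[g] * weights[g] for g in counts)
    bush = sum(counts[g] * weights[g] * support[g] for g in counts) / weighted_n
    return 100 * (2 * bush - 1)


for label, targets in [("census-style", CENSUS_STYLE), ("turnout-based", TURNOUT_BASED)]:
    print(f"{label}: Bush {bush_margin(RAW_COUNTS, BUSH_SUPPORT, targets):+.1f}")
```

With these invented numbers, the identical interviews show Gore ahead by 1.6 points under one set of targets and Bush ahead by 0.8 under the other. In this single-variable case the raw counts drop out entirely, because each group's weighted share simply becomes its target. That is why the target, not the interviewing, decides the outcome.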

To see how filters can affect survey results, look at the disclaimer on the Post’s own poll: “The Post and ABC News collect data jointly but use somewhat different models to identify likely voters. This can produce slightly different estimates of candidate support.” Sure enough, over the past week, ABC and the Post have reported different results from the same tracking poll. Here is a perfect controlled study: The raw data are the same, but the pollsters differ, and therefore, so do the reported results. 
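A likely-voter filter can be sketched the same way. In this hypothetical example, every respondent rates his or her own likelihood of voting on a 0-to-10 scale, and two screens (a strict cutoff of 9 and a looser cutoff of 7) are applied to identical raw data. The scale, the cutoffs, and the counts are invented. Real likely-voter models typically combine several questions, and this is not the ABC or Post model.

```python
# Invented raw data: self-rated likelihood of voting (0-10) -> (Bush, Gore) respondents.
RAW = {10: (120, 100), 9: (60, 50), 8: (40, 55), 7: (30, 45), 6: (20, 35)}


def margin_under_screen(raw, cutoff):
    """Bush-minus-Gore margin (points) among respondents at or above `cutoff`."""
    bush = sum(b for score, (b, g) in raw.items() if score >= cutoff)
    gore = sum(g for score, (b, g) in raw.items() if score >= cutoff)
    return 100 * (bush - gore) / (bush + gore)


for cutoff in (9, 7):
    print(f"screen at {cutoff}+: Bush {margin_under_screen(RAW, cutoff):+.1f}")
```

Here the strict screen, which keeps only the most eager respondents, shows Bush up about nine points, while the looser screen shows a dead heat. The interviews are identical; only the pollster's judgment about who will turn out has changed.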

The problem isn’t ideological bias. Weighting can just as easily shift the numbers the other way. The problem is that weights and filters aren’t part of the interviewing process. They precede and follow it. Whether you’re filtered into or out of the poll and how heavily your answers are weighted depend largely on the pollster’s theory of this year’s turnout—and that theory isn’t reported alongside the numbers in tomorrow’s newspaper. “Every time you add a weight, you run the risk of skewing your internal data. You’re adding one more unknown,” observes Goeas. So which poll should you trust—the one that minimizes weights and filters or the one that maximizes them? That depends on which kind of bias worries you more.

The big debate about weighting this year concerns party affiliation. Republicans are indicating they’re more likely to vote this year than in past years. Should pollsters believe them or stick with the old turnout projections, which favor Democrats? Usually, weighting by party protects the GOP. On his Web site, for example, Zogby argues that his polls are more accurate because “we apply weighting for party identification to ensure that there is no built-in Democratic bias in our sampling.” But New York Times survey editor Mike Kagay agrees with Gallup poll editor Frank Newport that party affiliation, unlike race or gender, is too vague and changeable to measure or track reliably. So in addition to the difference among pollsters over which kind of bias to guard against—active or passive—there’s a philosophical disagreement over whether party affiliation is more like a trait or like an opinion. Good luck resolving that one.

There are plenty of other backstage quarrels among the pollsters. Zogby dismays some colleagues by polling during the day. Goeas dismays others by not polling on Fridays or Saturdays, which are the hardest days to reach married voters with children. Whose methods produce the most accurate results? Even the election won’t settle that question. Every pollster has fudge factors he can apply to massage his numbers at the last moment. He can raise or lower his projected turnout. He can adjust his weighting coefficients, as several pollsters have already done during this campaign. Twelve years ago, when I worked at the Hotline, our final three-day rolling average missed the election result—so we left an extra night’s sample in the mix and bragged about nailing the result with our four-day rolling average. Being a clever pollster means never having to say you’re sorry.