Did U.S. Election Polls Fail? Should Publishers Care?
Many media stories imply that the polling profession suffered a catastrophic failure in the recent presidential election because the national polling consensus was that Clinton would win by a comfortable margin of 3 to 4 points. Now that the absentee ballots have been counted, that forecast turns out to be fairly close: Clinton won the national popular vote by about 2 points. So the national polls were respectably within the “confidence interval” of normal statistics.
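To make the “confidence interval” point concrete, here is a minimal sketch of the standard margin-of-error calculation for a simple random sample. The sample size (1,000) and the 48% vote share are hypothetical figures chosen for illustration, not drawn from any specific poll.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of a 95% confidence interval for a simple random sample.

    p: observed proportion (e.g., 0.48 for 48% support)
    n: sample size
    z: critical value (1.96 corresponds to 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical national poll: 1,000 respondents, 48% supporting Clinton.
moe = margin_of_error(0.48, 1000)
print(f"95% margin of error: +/- {moe * 100:.1f} points")  # about +/- 3.1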
However, the state-level polls, which determine the composition of the Electoral College, were another matter altogether. These polls, which were much more likely to be done on the cheap using opt-in internet panels rather than more expensive random samples, were far less accurate. According to post-election analyses by Nate Silver and others, most state-level polls weighted their samples to reflect the age and sex of their populations, but not education levels. Since education was such a critical determinant of voters’ leanings in this election, that omission probably contributed to the errors in state-level polls.
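To illustrate what weighting entails, here is a minimal sketch of post-stratification: each respondent counts as the population share of their group divided by that group’s share of the sample, so under-represented groups count more. The education categories, population shares, and toy sample below are all hypothetical.

```python
# Minimal sketch of post-stratification weighting by education.
# Population shares and sample are hypothetical, for illustration only.

# Share of each education group in the target population (e.g., from the Census).
population_shares = {"no_college_degree": 0.60, "college_degree": 0.40}

# A toy sample: each respondent has an education level and a candidate preference.
sample = [
    {"education": "college_degree", "candidate": "Clinton"},
    {"education": "college_degree", "candidate": "Clinton"},
    {"education": "college_degree", "candidate": "Trump"},
    {"education": "no_college_degree", "candidate": "Trump"},
    {"education": "no_college_degree", "candidate": "Clinton"},
]

# Sample share of each education group.
n = len(sample)
sample_shares = {
    group: sum(r["education"] == group for r in sample) / n
    for group in population_shares
}

# Weight = population share / sample share.
weights = {g: population_shares[g] / sample_shares[g] for g in population_shares}

# Weighted support for each candidate.
support = {}
for r in sample:
    w = weights[r["education"]]
    support[r["candidate"]] = support.get(r["candidate"], 0.0) + w
total = sum(support.values())
for candidate, w in sorted(support.items()):
    print(f"{candidate}: {w / total:.1%}")
```

In this toy sample, college graduates are over-represented, so the unweighted result (60% Clinton) overstates her support relative to the weighted result (about 57%). A poll that never weights by education bakes that distortion into its headline numbers.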
Education may, in fact, be emerging as such an important marker of cultural affinities that publishers may want to insist that it be part of the weighting scheme for any research they do or buy. Publishers should also keep in mind the old “you get what you pay for” maxim when choosing cheap internet access panels over more scrupulous (and expensive) sampling methods; in many cases it may not matter, but when the stakes are high, good samples are money well spent.
Some of the post-election chatter about the polls is self-serving or ill-informed. Political journalists like Michael Wolff and Peggy Noonan pronounced polling dead, to be replaced (of course) by political journalism. Vendors of natural language processing and text analysis software crowed (without evidence) that their methods would have done better. Vendors of “big data” analytic solutions have made similar claims, though I know of no “big data” system that published an election forecast, accurate or otherwise, prior to the election. So let’s take these protestations of methodological superiority with a grain of salt for now, especially since the really professional post-mortems are still to come. Based on prior research on sources of survey error, here are a few educated hypotheses about what the 2016 election polling post-mortems will ultimately conclude:
- Pre-election poll samples failed to reflect education levels accurately. Nate Silver’s analysis shows that the higher a state’s percentage of whites without a college degree, the more the polls underestimated support for Trump. Weighting by education (as sketched above) could have helped, but it was unevenly applied.
- Some Trump supporters refused to speak to pollsters. This falls under what survey researchers call non-response bias, and it probably was a factor, especially since pollsters noted that as the campaign went on it was getting harder to get people to take their calls. This is one area where research shows it is better to collect data via computer or paper, since the anonymity makes people more comfortable being truthful. If respondents and non-respondents differ on key measures, the survey taker has to detect that and adjust accordingly to get an accurate forecast.
- The likely voter models were off. This is apt to be one of the biggest smoking guns, but also the one whose effect is hardest to quantify. Not all polls provide transparent details about the algorithm they use to transform raw survey results into election forecasts (a sketch of what such a model might look like follows this list). That “black box” problem should be wearyingly familiar to publishers whose businesses are often in the thrall of opaque algorithms.
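As an illustration of what such a black box might contain, here is a minimal sketch of a likely-voter adjustment: raw preferences are re-weighted by each respondent’s estimated probability of actually voting. The turnout model below, which blends a 0–10 self-rating with past voting behavior, is a hypothetical stand-in, not any pollster’s actual method.

```python
# Minimal sketch of a likely-voter adjustment; all figures are hypothetical.
# Each respondent reports a preference, a 0-10 self-rated likelihood of voting,
# and whether they voted in the last election.

respondents = [
    {"candidate": "Clinton", "self_rating": 9,  "voted_last_time": True},
    {"candidate": "Clinton", "self_rating": 5,  "voted_last_time": False},
    {"candidate": "Trump",   "self_rating": 10, "voted_last_time": True},
    {"candidate": "Trump",   "self_rating": 8,  "voted_last_time": False},
    {"candidate": "Clinton", "self_rating": 3,  "voted_last_time": False},
]

def turnout_probability(r):
    """Crude turnout model: blend self-rating with past voting behavior."""
    base = r["self_rating"] / 10                    # self-reported likelihood
    history = 1.0 if r["voted_last_time"] else 0.5  # past voters are more reliable
    return base * history

# Forecast = preferences weighted by estimated turnout probability.
totals = {}
for r in respondents:
    p = turnout_probability(r)
    totals[r["candidate"]] = totals.get(r["candidate"], 0.0) + p

electorate = sum(totals.values())
for candidate, w in sorted(totals.items()):
    print(f"{candidate}: {w / electorate:.1%}")
```

In this toy example the raw sample favors Clinton 60/40, but the turnout adjustment flips the forecast to Trump by about 4 points. Two pollsters starting from identical raw data but different turnout assumptions can publish noticeably different forecasts, which is exactly why the lack of transparency matters.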
When asked his opinion of the French Revolution, Zhou Enlai famously said, “It is too early to tell.” We shouldn’t have to wait that long for an empirically informed view of what went wrong and what went right with 2016’s election polling, but it will still take a while. In the meantime, publishers should not jump to premature conclusions. The ultimate answers are unlikely to support claims that we should abandon all surveys, shift all of our resources to ethnographic observation, or rely exclusively on Big Data, social media, or any other magic bullet. Reality tends to be more complicated, and truth tends to be found in the judicious application of multiple methods.