August 2020 - The Numbers don't Lie

This is the first of the roughly weekly posts I’m planning about the 2020 election.

As election day approaches, polls are going to become more prominent, and it’s important that we interpret them carefully. I suggest you stick to polls from poll aggregators (like FiveThirtyEight or RealClearPolitics) or those tied to prominent news organizations. Polls brought up by polling experts (myself included) are typically going to be good sources. But you can encounter polls out in the wild that are complete garbage, and you should be skeptical of a poll from a website you have never heard of. I’m going to briefly talk about three things you should always consider when you analyze polls:

Margin of error
Polls are not predictions
Outliers happen

Margin of Error

The margin of error is probably one of the most misunderstood polling concepts. It comes from a statistical formula that captures the natural randomness of estimating a proportion for an entire population from a small sample. The margin of error is meant to be added to and subtracted from a single candidate’s support. In US elections, we normally care about the difference between the Democratic and Republican candidates (sometimes called the margin and written as Trump +x or Biden +y), and to examine that we must double the margin of error. The reason for doubling is that in a poll with a margin of error of three points, Biden could be underestimated by three points and Trump could be overestimated by three points, which adds up to a six-point swing in the gap. The margin of error doesn’t cover the rare scenario where a respondent lies or makes a mistake. The calculation also assumes that the people who respond to a poll are not much different from the population we are aiming to poll, and that we can reach every member of that population, which isn’t exactly the case. As a result, the margin of error underestimates the true polling error. It’s hard to quantify before an election how much the margin of error underestimates the real error, but in my own analysis of polling error it is typically less than one percentage point (details will come later). If the difference between two candidates is less than double the margin of error, the poll does not provide enough information about who is winning, and this signals the race is too close to call from that poll alone.
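To make the doubling rule concrete, here is a minimal Python sketch. It assumes the textbook 95% margin-of-error formula for a simple random sample, which is a common choice but not necessarily the one every pollster uses. The function names and the sample size of 1,000 are hypothetical and chosen only for illustration.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error for ONE candidate's share, in percentage points.

    Assumption: simple random sample and a 95% confidence level (z = 1.96).
    proportion=0.5 gives the widest (most conservative) margin.
    """
    return 100 * z * math.sqrt(proportion * (1 - proportion) / sample_size)


def lead_is_inconclusive(share_a, share_b, moe):
    """Rule of thumb from the text: a lead smaller than double the
    margin of error is too close to call from this poll alone."""
    return abs(share_a - share_b) < 2 * moe


moe = margin_of_error(1000)                 # hypothetical poll of 1,000 people
print(round(moe, 1))                        # about 3.1 points
print(lead_is_inconclusive(48, 45, moe))    # 3-point lead vs. ~6.2 -> True
```

With a roughly three-point margin of error, a candidate needs a lead of more than about six points before this rule of thumb treats the poll as showing a clear leader.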

Polls are not Predictions

Polls are not designed to predict elections. They are designed to estimate the percentage of voters who support each candidate, and the percentage who are undecided, at the time the poll is conducted, not necessarily on election day. The margin of error estimates the error between the poll and the true support for the candidates while the poll is being conducted. People occasionally change their minds, and it’s hard to predict which direction undecided voters will go. A good guideline from research (taken from this book and replicated in my own analysis) is that polls are not predictive of the election day result until Labor Day weekend. The predictiveness of polls improves over time. For example, my model will start collecting data on September 6th and will hopefully be fit by September 22nd, which is about 45 days before the election.

Outliers happen

Occasionally we will see a poll that is different from other polls. Strange poll results do happen, and you can’t say there has been a change in the race until multiple polls from different pollsters have similar findings. You should look at multiple polls when you check the state of the race. I like to look at two poll aggregators: FiveThirtyEight and RealClearPolitics. FiveThirtyEight is where I am planning to get the data for my model. The tricky part about comparing two polls is that you have to add their margins of error together to compare a single candidate’s support, and double that number if you want to compare the difference between two candidates. Suppose Poll A has Trump at 45 and Biden at 48 with a 3-point margin of error, and Poll B has Trump at 48 and Biden at 42 with a 4-point margin of error. The difference between Trump and Biden in Poll A is -3 with a margin of error of 3*2=6, and in Poll B it is +6 with a margin of error of 4*2=8. The margin of error for comparing these polls is 6+8=14, which means that if the difference between the polls is less than 14, we would say that the polls aren’t showing statistically different results and the difference we observe could be explained by random sampling error. Here the two polls are 9 points apart (from -3 to +6), which is less than 14, so Poll B is not a true outlier. A lot of times, outliers aren’t really outliers after you adjust the margin of error to fit your comparison.
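The Poll A and Poll B comparison can be sketched in a few lines of Python. This follows the simple add-and-double rule described above rather than a formal statistical test, and the function names are hypothetical.

```python
def trump_margin(trump, biden):
    """Trump minus Biden, so a negative value means Biden leads."""
    return trump - biden


def compare_polls(poll_a, poll_b):
    """Each poll is a (trump, biden, margin_of_error) tuple.

    Returns (gap between the two margins, comparison threshold,
    whether the polls differ by more than sampling error would explain).
    """
    trump_a, biden_a, moe_a = poll_a
    trump_b, biden_b, moe_b = poll_b

    # Double each margin of error because we are looking at a
    # difference between two candidates, then add the two together.
    threshold = 2 * moe_a + 2 * moe_b

    gap = abs(trump_margin(trump_a, biden_a) - trump_margin(trump_b, biden_b))
    return gap, threshold, gap >= threshold


poll_a = (45, 48, 3)   # Trump 45, Biden 48, margin of error 3
poll_b = (48, 42, 4)   # Trump 48, Biden 42, margin of error 4

gap, threshold, differ = compare_polls(poll_a, poll_b)
print(gap, threshold, differ)   # 9 14 False
```

Because the 9-point gap is below the 14-point threshold, the two polls are consistent with each other under this rule, even though one shows Biden ahead and the other shows Trump ahead.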

Now you have learned some basic tools to critically analyze polls on your own to follow the races that matter to you. There are many more things you can do with polling, and the analysis I do is far more complicated than this. I’ll continue to write about polls and the election if you want to learn more.

How to Interpret Election Polls