What I Learned From 2016

Recently I have been working on a chapter of my dissertation about polling accuracy and modeling polling error. I’ve done a big analysis of state-level Presidential polls from 2008-2016, which I will talk about in a later post. Looking back at 2016 has been a chance to really dive into the data I collected over the years and reflect on what happened.

In 2016, I was a college sophomore taking my first mathematically based statistics class. Now I’m a third-year Statistics PhD student with most of my classes completed. I’ve grown a lot as a statistician over these past four years, and I think I still have a little more growing to do. Here are three things I’ve learned:

  1. Don’t Forget that the General Public Aren’t Experts
  2. Embrace Uncertainty
  3. Reflect on the Mistakes of the Past to Prevent Them in the Future

Don’t Forget that the General Public Aren’t Experts

I’ve always viewed election modeling as a pathway to public engagement and education about statistics. Still, you have to remember that not everyone knows what margin of error means or how to interpret a probability. Formal statistical training emphasizes the limitations of statistical modeling and the fact that it always carries uncertainty. But people often equate statistics with mathematics, expecting it to produce the same answer every time and never be wrong. If you are going to put a model out to the public, you need to explain it in a way that a non-expert can understand while still providing the information experts need to evaluate it.
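
To make that concrete, here is a minimal sketch of what the margin of error actually measures, using made-up poll numbers (800 respondents, a candidate at 48%). Keep in mind it only reflects random sampling error, not problems like nonresponse bias:

    import math

    def margin_of_error(p_hat, n, z=1.96):
        # 95% margin of error for a proportion from a simple random sample
        return z * math.sqrt(p_hat * (1 - p_hat) / n)

    # Made-up example: a poll of 800 respondents with a candidate at 48%
    moe = margin_of_error(0.48, 800)
    print(f"48% +/- {100 * moe:.1f} points")  # about +/- 3.5 points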

Embrace Uncertainty

Polling and polling averages are limited in their precision. If an election is within a point, polls will not be able to reliably call the winner. Models add precision, but there are many cases where they, too, are uncertain. Polling data can be limited, and elections happen once. Sometimes we can’t know who will win. It’s also hard to evaluate probability predictions because there is only one 2020 presidential election. It is important to be open about the uncertainty in your model and the potential for error.
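
As a rough illustration of why a one-point race is effectively a toss-up, suppose a candidate leads the polling average by one point and that polling-average errors are roughly normal with a standard deviation of three points. Both numbers are assumptions for the example, not estimates from my analysis:

    from scipy import stats

    # Assumed numbers for illustration: a 1-point lead in the polling
    # average, and a historical error SD of 3 points for that average
    lead_points = 1.0
    error_sd = 3.0

    # If the true margin is normal around the polling average, the
    # probability the leader actually wins is P(margin > 0):
    p_win = stats.norm.sf(0, loc=lead_points, scale=error_sd)
    print(f"P(leader wins) = {p_win:.0%}")  # about 63%, far from certain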

Reflect on the Mistakes of the Past to Prevent Them in the Future

The polling and election modeling world has spent a lot of time reflecting on the 2016 Presidential election. The errors in 2016 were a mixture of insufficient data, some bad methodological choices, and bad luck. Some of the methodological mistakes are easy to fix, but some of the important ones are difficult. A big issue in 2016 was failing to weight the data by education, because non-college-educated individuals are less likely to respond to polls. But it’s not clear how to weight the data well when the composition of the electorate is constantly changing and turnout is hard to predict. Still, pollsters and modelers know what the challenges are and how polls have performed in the past, which can help us assess the likely accuracy of polls. We can learn from 2016 so that our models and polls can be the best they can be.
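
For intuition, here is a toy sketch of the simplest form of education weighting (cell weighting), with invented shares. The real difficulty is that the population targets themselves, the education mix of the people who will actually vote, are unknown:

    # Toy cell weighting by education. Both sets of shares are invented;
    # in practice the targets for the electorate have to be estimated.
    population_share = {"college": 0.40, "no_college": 0.60}
    sample_share = {"college": 0.55, "no_college": 0.45}

    # Each respondent in group g gets weight population share / sample share
    weights = {g: population_share[g] / sample_share[g] for g in population_share}
    print(weights)  # college ~0.73 (downweighted), no_college ~1.33 (upweighted)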

I hope that I’m not writing a piece like this in 2024. I hope that the public will listen to the experts on what typical polling accuracy is and not blindly assume that the margin of error covers everything. I hope that the polls and models do well and that polling will be something the public values and trusts. But for now, all we can really do is wait and see.

What My Undergraduate Research Experience in Statistics Was Like

I am entering my third and final year of my undergraduate degree. I have been doing research since almost day one, and I wanted to share what my experience was like. As a statistician, I feel I have to mention that this is a sample size of one and may not reflect all undergraduate research experiences.

First, I want to give a little background. The summer before my senior year of high school, I was chosen to participate in an NSF (National Science Foundation) funded REU (Research Experience for Undergraduates) at Texas Tech. There I was exposed to what research is like: we had a series of workshops, each led by a different researcher, over a two-week period. I loved the Texas Tech math department and decided to attend Texas Tech for my undergraduate degree. I met my current research advisor, Dr. Ellingson, at the REU.

Right after classes started during my freshman year, I emailed Dr. Ellingson to see if I could do research with him. I started work on image analysis (Dr. Ellingson’s specialty). I was also following the GOP nomination because it interested me. I had an idea to predict the nomination using Bayesian statistics, similar to how FiveThirtyEight predicts elections. I had talked with Dr. Ellingson before about statistics in political science and the need for a statistically sound, open-source academic model. He agreed to guide me through the process of building a model to predict the GOP nomination process.

At the time of the GOP nomination my math background was pretty limited, so I decided to just use Bayes’ theorem with a normal distribution for the likelihood. I did all the calculations in Excel, using poll data downloaded as CSV files from Huffington Post Pollster. I used previous voting results from similar states as the prior in my model. More info about my model can be found here. What I found most challenging was making the many decisions about how I was going to predict the election. I also struggled with decisions about delegate assignments, which often involved breaking the results down by congressional district even when the poll data was statewide. After the first Super Tuesday (March 1st) I began to realize how difficult it is to find a good prior state and to reassign support from candidates who dropped out of the race. The nomination process taught me that failure is inevitable in research, especially in statistics, where everything is at least slightly uncertain.
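
This post doesn’t spell out my exact formulas, but with a normal prior (from a similar state’s results) and a normal likelihood (from the polls), Bayes’ theorem gives a simple closed-form update. Here is a rough sketch with invented numbers, not my actual spreadsheet calculations:

    def normal_update(prior_mean, prior_sd, poll_mean, poll_sd):
        # Normal prior + normal likelihood -> normal posterior with a
        # precision-weighted mean (a conjugate Bayes' theorem update)
        prior_prec = 1.0 / prior_sd**2
        poll_prec = 1.0 / poll_sd**2
        post_var = 1.0 / (prior_prec + poll_prec)
        post_mean = post_var * (prior_prec * prior_mean + poll_prec * poll_mean)
        return post_mean, post_var**0.5

    # Invented numbers: prior from a similar state where the candidate
    # got 35%, polls in the current state averaging 40% (SE of 3 points)
    mean, sd = normal_update(35.0, 5.0, 40.0, 3.0)
    print(f"posterior support: {mean:.1f}% +/- {sd:.1f}")  # ~38.7% +/- 2.6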

In the summer of 2016, I started gearing up for the general election. I decided to use SciPy (a Python package for science and statistics) to make my predictions. Making the programs was incredibly difficult; I had over a dozen variations to match different combinations of poll data. I had the programs up and running by early October, but I discovered a couple of bugs that invalidated my early test predictions. The original plan was to run the model on the swing states two or three times before the real election, but in the middle of October I found a bug in one of my programs and then had to fix it in every program. I finally did some manual calculations to confirm the programs worked. It was difficult to admit that my early predictions were totally off, but I am glad I found the bug before the election. Research isn’t like a homework assignment with answers in a solution manual: you don’t know exactly what is going to happen, and it is easy to make mistakes.
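
I won’t reproduce my actual programs here, but the sketch below shows the flavor of the SciPy calculations involved: pooling a few hypothetical polls by inverse-variance weighting (a standard choice, not necessarily what my model did) and converting the pooled margin into a win probability. A calculation this small can also be checked by hand, which is how I confirmed the programs worked:

    import numpy as np
    from scipy import stats

    def pool_polls(margins, std_errors):
        # Inverse-variance weighted average of poll margins
        margins = np.asarray(margins, dtype=float)
        weights = 1.0 / np.asarray(std_errors, dtype=float) ** 2
        mean = np.sum(weights * margins) / np.sum(weights)
        se = np.sqrt(1.0 / np.sum(weights))
        return mean, se

    # Three hypothetical polls: margins of +2, -1, and +3 points
    mean, se = pool_polls([2.0, -1.0, 3.0], [3.0, 4.0, 3.5])
    p_win = stats.norm.sf(0, loc=mean, scale=se)  # P(true margin > 0)
    print(f"pooled margin {mean:+.1f} +/- {se:.1f}, P(win) = {p_win:.0%}")
    # Small enough to verify against the weighted-average formula by hand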

I ended up writing a paper on my 2016 general election model. Writing a paper on your own research is very different from writing a paper on other people’s research. My paper was 14 pages (over 6,500 words) long, and only about one or two pages covered other people’s research on the topic. It took a very long time to write, and I went through 17 drafts. I hated writing the paper at first, but when I finished it felt amazing. It was definitely worth the effort.

Undergraduate research is difficult, but I loved the entire process. I got to work with real data to solve a real problem. I learned how to read a research paper, and eventually I got to write my own. I got to give presentations to general audiences as well as to mathematicians and statisticians. I got to use my research to inform others about statistics. If you are thinking about doing undergraduate research, you definitely should.

Why I Am Against the Recount

I don’t support Jill Stein’s call for a recount of the election results in Michigan, Pennsylvania, and Wisconsin. There are multiple reasons why I think Jill Stein is going about this the wrong way.

1. Jill Stein has no chance of winning the three states in question.

Jill Stein will not be the next president of the United States. She got less than 2% in all of these states. If Hillary Clinton wanted to pursue a recount in Michigan, which Trump won by probably under 1% (right now Trump has a lead of just 11,612 votes), I could understand that decision, especially if Michigan was all Clinton needed to be president. I know that the probability of 11,612 votes being incorrectly counted is low, but if the presidency were decided by less than 0.01% of the votes cast, I think the results should be verified. But the Clinton campaign probably considered a recount and decided it would not change the outcome, so it wasn’t worth the money, time, and controversy. A candidate who got at most 1.1% in the states in question shouldn’t throw a fit; I don’t think it’s her place to call for a recount. I would have had the same opinion if Gary Johnson had tried a similar approach in light of a Clinton presidency.

2. If the election was hacked, a recount would probably not catch it.

Let’s say the electronic votes were tampered with. I don’t believe this happened at all, but let’s entertain the idea for a second. If the machines were hacked, they would probably have changed the recorded vote, which is all that is examined in a simple recount; a recount is just recounting the votes. An audit might have caught such a hack, if it had taken place. But audits are expensive, and a far more sensible explanation for a Trump win in Michigan, Pennsylvania, and Wisconsin is that turnout was down in urban areas where Obama got a lot of support. I am not denying that foreign interests tried to influence the election, like the creation of fake news by teenagers in Macedonia (here is an article about that: http://www.cbsnews.com/news/fake-news-macedonia-teen-shows-how-its-done/). The lower turnout compared to 2008 probably hurt Clinton. At the end of the day, it appears that more Republicans turned out to vote than Democrats. Nate Silver (a Democrat) wrote about the possible hacking claims here: http://fivethirtyeight.com/features/demographics-not-hacking-explain-the-election-results/.

3. This process isn’t helping heal the division in this country.

The recount isn’t going to change anything. Trump won. Trump may not have been the candidate you would have picked; personally, I would have preferred any other Republican candidate from the nomination process. This recount is just making things worse. Jill Stein should want our country to come together and accept the results. Even her own VP pick doesn’t approve of this process. Instead of escalating the situation, Jill Stein should stop fighting.

The bottom line is that Donald Trump will be the next president of the United States. All this talk about a rigged election was not backed by any evidence. Trump shouldn’t have called the election rigged, and Stein shouldn’t do the same thing. I would have supported a recount if we had a 2000 situation, with the election decided by 537 votes, regardless of the winner. But given the situation, a recount is unnecessary and wasteful. I know that you may not trust statisticians right now because we were wrong about the election, but still listen to the multiple voices speaking out against the recount. It’s time for our country to unite and accept what happened on November 8, 2016.