2018 Midterms

It’s (finally) election season again, which means my Friday politics blog posts are starting back up from now until at least the end of August.  I am starting a Statistics Ph.D. program at Texas A&M in the fall, so I might not have time to blog regularly for the end of election season, but I will try.  I am so excited to follow the 2018 midterm elections!

This summer I plan to immerse myself in the midterm elections as I did for the 2016 Presidential race.  The goal right now is to use the model I already built to predict the 33 Senate races, and possibly a few House races if I have time.  This election gives me the opportunity to try building a few new, different kinds of priors for my Iterative Gaussian model.  I will continue to break the races into groups and use a method similar to what I have used to predict Presidential elections, but I am planning to bring in poll data from the Generic House Ballot with some adjustments.

The Senate races will bring new challenges because each race has a different set of candidates, which complicates how I choose the races that serve as the prior.  What will probably happen is that I will break the races down into categories based on how competitive each race is and which direction I think it is leaning.  Hopefully, polling for every Senate race will start to appear after the primaries.  If some races don’t have any polls, I might have to get creative.

Right now, I think Republicans will keep the Senate majority.  There are only 8 Republican Senate seats up for reelection.  Of those, only Nevada was not won by Trump in the last election.  Additionally, there are vulnerable Democratic senators in West Virginia, Missouri, Montana, and Indiana, all states Trump won by a large margin.  Given the polarization of the electorate, I don’t think enough Trump voters (at least those happy with his performance) can be convinced to vote for a Democrat.  I also believe Ted Cruz will be reelected to his seat in Texas.  O’Rourke might have more money and is gaining popularity, but it will be an uphill battle for him to convince enough moderate Republicans and Republican-leaning independents to vote for him, even though he will probably vote the party line if elected.

My New Project: Revised Models to Predict American Presidential Elections Preregistration

My current project is a series of new models to predict American Presidential elections, similar to the original model but with some changes.  The new models combine 3 different methods of reassigning undecided voters, 2 different conjugate priors, and 3 different ways of making the calculation with the Gaussian conjugate prior.  The models deal with hypothetical election results that include only the two major parties’ candidates.  In total there are 12 models.  This is a pre-registration post with my methodology, some thoughts on what I think will happen, and what I am looking for in the results.

One of the key features of this project is that, while it still takes a similar approach of using poll data from other states as the prior, it expands the prior to a pooled collection of all the polls within a category.  I believe this new method will help address some of the issues I faced when choosing a single source of polls as the prior, and it may help in swing states, where the prior will use polls from other swing states.

One of the goals of this project is to have better definitions of swing states and prior regions.  The original model had definitions that were admittedly somewhat ad hoc.  In this new project, I define a swing state as a state that has been won by both a Democratic candidate and a Republican candidate in the past four elections.  Overall, I like this definition because it is easy to use, but I wish it could capture future swing states like Indiana in 2008, and Michigan, Pennsylvania, and Wisconsin in 2016.

Since I don’t have the same time constraints I had with the 2016 model, I have been able to put more thought into how prior regions should be defined.  This time I am going to stick closer to the US Census regions (found here) and divide the West and Midwest Census regions into a red-state and a blue-state subgroup.  I am going to split the Southern and Northeastern regions into two subgroups each sharing the same partisan alignment.  I am going to move Delaware, Maryland, and Washington DC into the Middle Atlantic subregion of the Northeastern region, since on their own they are too small a group and Washington DC and Delaware usually have only a handful of polls.  I think these states would benefit from being joined with the Middle Atlantic region, and doing so will help even out the between-state demographic variation.  Since the Census regions are based more on geography than on culture and politics, according to the history of the Census regions found here, I feel comfortable doing this.

I am also changing my mind from the previous model on the placement of Missouri.  The fact that the race was so close in Missouri in 2008 indicates to me that its political culture may be more like the Midwest than the South.  To me, a key feature of the Midwest (and the smaller Western states) is that state partisanship is weaker than in other states, and swings are more common than in the Northeastern region or the South.  So I am going to keep Missouri in the Midwest region, where it sits in the US Census regions.

I am splitting the Northeast into the Middle Atlantic and New England subgroups.  I am splitting the South into two regions: one containing the West South Central division plus Tennessee and Kentucky, and another containing the South Atlantic division plus Mississippi and Alabama.  Dividing the South was a difficult decision, but I looked at the Electorate Profiles and decided that this split best preserves demographic similarity among key groups (Whites, Hispanics, African Americans, college-educated individuals, high-income earners, and the percentage in poverty) within the Southern regions.  Deciding where the Southern blue states should go was hard because they were too small a group to stand alone, and while the Middle Atlantic region wasn’t a great fit, it was the best fit.
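To make the swing-state definition concrete, here is a minimal sketch in Python of how that check could be written.  The function name is my own and the past-winner data is only an illustrative subset, not my actual code or data.

```python
def is_swing_state(winners_by_year):
    """winners_by_year: party ('D' or 'R') that won the state in each of the last four elections."""
    return 'D' in winners_by_year and 'R' in winners_by_year

# Illustrative subset of states with their 2004, 2008, 2012, 2016 winners.
past_winners = {
    "Ohio":     ['R', 'D', 'D', 'R'],
    "Texas":    ['R', 'R', 'R', 'R'],
    "New York": ['D', 'D', 'D', 'D'],
}

swing_states = [state for state, winners in past_winners.items() if is_swing_state(winners)]
print(swing_states)  # ['Ohio']
```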

The models use three different methods to reassign undecided and minor-party voters.  The first method reassigns the voters based on the past election results.  The second method splits the undecided voters equally between the two candidates.  The third method reassigns the undecided voters in proportion to each candidate’s current support.  For example, consider a poll of one hundred people with 50 supporters of the Democrat, 40 supporters of the Republican, and 10 undecided voters, in a state that voted 60% for the Democrat and 40% for the Republican in the last election.  Under the first method, 4 of the undecided voters would be reassigned to the Republican candidate and the other 6 to the Democratic candidate, making the poll results 56 for the Democrat and 44 for the Republican.  The second method would reassign 5 voters to the Democrat and 5 voters to the Republican, making the adjusted poll results 55 and 45.  Under the third method, the Democratic candidate has 55.556% of the two-party support and the Republican has 44.444%; this translates to fractions of a person, so the figures of 5.556 and 4.444 undecided voters are rounded to 6 and 4 respectively.  I realize I could drop the undecided voters from the polls, as done in this paper by Lock & Gelman, but I am using poll data to predict the election result rather than taking a time-series approach.  I haven’t found anyone using past election results to reassign voters.  FiveThirtyEight splits the undecideds evenly between the two candidates, which is why I included that method.  This paper by Christensen & Florence discusses the proportional reassignment of undecided voters.  The Christensen & Florence paper describes an undergraduate project on predicting elections and has been a heavy inspiration for my research.
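To make the three reassignment methods concrete, here is a minimal Python sketch that reproduces the worked example above.  The function names are my own illustrations, not taken from my actual code.

```python
# Worked example from the text: 50 D, 40 R, 10 undecided; the state voted 60% D / 40% R last time.

def reassign_past_results(dem, rep, undecided, past_dem_share):
    """Method 1: split undecideds according to the previous election result."""
    dem_add = round(undecided * past_dem_share)
    return dem + dem_add, rep + (undecided - dem_add)

def reassign_evenly(dem, rep, undecided):
    """Method 2: split undecideds equally between the two candidates."""
    dem_add = undecided / 2
    return dem + dem_add, rep + (undecided - dem_add)

def reassign_proportionally(dem, rep, undecided):
    """Method 3: split undecideds in proportion to current two-party support."""
    dem_share = dem / (dem + rep)
    dem_add = round(undecided * dem_share)
    return dem + dem_add, rep + (undecided - dem_add)

print(reassign_past_results(50, 40, 10, 0.60))   # (56, 44)
print(reassign_evenly(50, 40, 10))               # (55.0, 45.0)
print(reassign_proportionally(50, 40, 10))       # (56, 44)
```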

Conjugate Prior and Calculation Methods

These models use either the binomial (beta) or Gaussian conjugate prior.  The goal of the models is to predict the proportion of votes for the Democratic candidate among the two major-party candidates.  The data is binomial with a Bernoulli likelihood, but the extent of the independence of respondents concerns me.  I think individuals show up multiple times across the polls, meaning that the observations are not independent.  If the data were truly i.i.d., I would be comfortable using the beta conjugate prior, but since that is likely not the case, I am afraid it causes an underestimation of the variance.  I am curious what effect using the normal approximation to the binomial distribution has in the context of predicting elections from polls.  I also want to see what effect the different methods of reassigning voters and the new prior have on the original calculation method from the previous study, in which I used the standard deviation and count of polls inside the Gaussian conjugate prior.  There are 4 different model types: a model with the beta conjugate prior; a Gaussian model that uses the normal approximation to the binomial distribution and updates after every poll; a Gaussian model that averages the polls, finds the standard deviation of the poll data, and uses that information to make the calculation; and a Gaussian model that combines the polls into one giant poll and uses the normal approximation to the binomial distribution.  If I had to choose the better assumption, I would go with polls being independent over people being independent, but I plan on eventually exploring ways to relax the independence assumption.
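As an illustration of the poll-by-poll Gaussian calculation method, here is a minimal sketch of a conjugate normal update that uses the normal approximation to the binomial for each poll’s variance.  The prior and poll numbers are purely illustrative and this is not a claim about how my actual code is structured.

```python
def gaussian_update(prior_mean, prior_var, poll_share, poll_n):
    """One conjugate normal update treating the poll as a known-variance normal observation."""
    # Normal approximation to the binomial: Var(p_hat) = p_hat * (1 - p_hat) / n
    poll_var = poll_share * (1 - poll_share) / poll_n
    post_var = 1 / (1 / prior_var + 1 / poll_var)
    post_mean = post_var * (prior_mean / prior_var + poll_share / poll_var)
    return post_mean, post_var

# Prior from pooled regional polls (illustrative values), updated by two illustrative polls.
mean, var = 0.52, 0.02 ** 2
for share, n in [(0.55, 600), (0.53, 800)]:
    mean, var = gaussian_update(mean, var, share, n)
print(round(mean, 4), round(var ** 0.5, 4))
```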

Choosing the “Best” Model

I don’t think I am going to take all twelve models and turn them into multilevel models or run simulations.  Based on the data I have, every model is run 153 times (the 50 states plus DC, for each of the 2008, 2012, and 2016 elections).  The pooled models would likely not translate well into a time-series model.  The main question I am asking is: do these changes make the model more accurate, or at least as accurate as the original model?  I also want to know whether the method used to reassign undecided voters matters.  I don’t think it will, since the proportion of undecided voters is small and the difference between the polls and the past vote is usually small.  I don’t like the idea of splitting the undecided vote evenly between the two candidates because I think it doesn’t work as well in highly partisan states; I don’t believe undecided voters in West Virginia or Massachusetts are going to turn out and vote equally for the two major candidates.  What I am hoping to get out of this is a rough idea of whether any of these changes have a practical effect on accuracy.  And if there is no difference, I will probably opt for proportionally reassigning voters and iteratively updating the model.
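Concretely, the comparison I have in mind looks something like the sketch below: score each model by how often its predicted Democratic two-party share picks the correct winner across the 153 state-year runs.  The data structures here are hypothetical placeholders, not my actual code.

```python
def winner_accuracy(predictions, actual):
    """Fraction of (state, year) runs where the predicted winner matches the actual winner."""
    correct = sum(
        (predictions[key] > 0.5) == (actual[key] > 0.5)  # True when both sides agree on the winner
        for key in actual
    )
    return correct / len(actual)

# Hypothetical usage: results[model_name] maps (state, year) -> predicted Democratic share,
# and actual_results maps (state, year) -> actual Democratic two-party share.
# for name, preds in results.items():
#     print(name, round(winner_accuracy(preds, actual_results), 5))
```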

Looking Forward to Further Research

This project is an intermediate step in the process of testing the use of poll data from other areas as part of the prior in a Bayesian model to predict American national elections.  Since there are a lot of key changes in this new set of models, I want to gather more data on the accuracy of my idea of exclusively poll-based models.  What I hope to do later is turn this into a time-series multilevel model, with and without the inclusion of a fundamental model.  I don’t have anything against fundamental modeling, but an exclusively poll-based model requires less data collection.  I want to see how viable this method is, because if it can match the performance of fundamental models then it may be a better strategy.  In the future I also want to make my own fundamental model that treats swing states differently from partisan states.  I intend to look at state-level and regional-level effects on voting behavior.  The big assumption of this method is that state-level effects within a region are small, and that pooling the polls across a region mitigates them, so that the pooled polls are a good preliminary estimate of voting behavior.

 

My Comments on the Special Elections in 2017

I thought I would provide my perspective on the special election that occurred last week in Kansas and the upcoming special House election in Georgia.

For full disclosure, I am a Republican who is against some of the President’s policies on immigration and health care.

I do not think Trump’s performance will have a major effect on the voting behavior of people with strong party ties.  Republicans vote Republican most of the time, and Democrats vote Democrat most of the time.  Independents and moderates are more of a wild card.  Independents may not vote the same way they did in 2016.

The districts in question are in no way representative of the whole country.  Any result from these elections cannot be applied to the whole country or used to “predict” the entire midterm election outcome.  You could maybe use the results for certain similar districts, but certainly not for the entire country.  For statistical analysis to work properly, the samples need to be reasonably representative.

Special elections are all about who turns out.  In the Kansas election, Democrats spent a lot of money and attention on the race since there are only a few races this year.  The money and the lack of an incumbent are probably why the race was closer than the 2016 race.  The 2017 Kansas race had about half as many votes as the 2016 race, and a change that large can affect the outcome.  In Georgia, I expect a race that is closer than usual for that district, but still a Republican win.  I doubt that a Democrat will win a majority of the votes in the primary.

These special elections need to be interpreted in context.  They are two races in House districts that haven’t been competitive in years.  We should not even try to extrapolate from these races to the entire country.  Favorability polls are a much better indicator of political sentiment.  However, I think that favorability polls, like the general election polls, could be underestimating Trump’s support.  It has been difficult to get Republicans to respond to polls, and this may affect their accuracy.  After the 2018 midterms, there will be a clearer picture of support for the Republican party.  Until then we can only guess.

 

 

A Non-Technical Overview of My Research

Recently I have been writing up a draft of a research article on my general election model to submit for academic publication.  But that paper is technical and requires some exposure to statistical research to understand.  Here I want to explain my research without going into all the technical details.

Introduction

The President of the United States is elected every four years.  The Electoral College decides the winner through the votes of electors chosen by their home states.  Usually the electors are chosen based on the winner of their state, and they vote for the winner of that state.  Nate Silver correctly predicted the winner of the 2008 election with Bayesian statistics, getting 49 out of 50 states correct.  Silver certainly wasn’t the first person to predict the election, but he received a lot of attention for his model.  Silver runs FiveThirtyEight, which covers statistics and current events.  Bayesian statistics is a branch of statistics that uses information you already know (called a prior) and adjusts the model as more information comes in.  My model, like Nate Silver’s, uses Bayesian statistics.  We do not know the details of Silver’s model beyond the fact that it uses Bayesian statistics.  To the best of my knowledge, my method is the first publicly available model that uses poll data from other states as the prior.  I made a prediction for 2016, where I correctly predicted the winner in all but 6 states.  Then I applied the model to 2008 and 2012, where my predictions of the state winners matched the predictions of FiveThirtyEight.

Methodology

I took poll data from Pollster, which provided CSV files for the 2016 and 2012 elections.  For 2008 I had to create the CSVs by hand.  I wrote a series of computer programs in Python (a common programming language) to analyze the data.  My model used the normal distribution.  My approach divided the 50 states into 5 regional categories: swing states, southern red states, midwestern red states, northern blue states, and western blue states.  The poll data sources used as the priors were national polls, Texas, Nebraska, New York, and California, respectively.  This approach is, to my knowledge, unique, but since multiple models are proprietary it is unknown whether it has been used before.  I only used polls if they were added to Pollster before the Saturday before Election Day; for the 2016 election analysis this meant November 5th.  I posted my predictions on November 5th.
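For a rough idea of what this data-preparation step looks like, here is a sketch of reading one poll CSV and applying the cutoff date.  The column names (“end_date”, “Clinton”, “Trump”) are assumptions for illustration and may not match the actual Pollster files; this is not my actual code.

```python
import csv
from datetime import date

# Cutoff: the Saturday before the election (November 5th for 2016).
CUTOFF = date(2016, 11, 5)

def load_polls(path):
    """Read a poll CSV and keep only polls on or before the cutoff date."""
    polls = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            end = date.fromisoformat(row["end_date"])  # assumed ISO-formatted date column
            if end <= CUTOFF:
                polls.append({"dem": float(row["Clinton"]),
                              "rep": float(row["Trump"]),
                              "end": end})
    return polls
```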

I outline more of my method here.

Results and Discussion

My model worked pretty well compared to other models.  Below is a table comparing my model with other models on their success rate at predicting the winning candidate in all 50 states plus Washington D.C.

| Race | Real Clear Politics | Princeton Election Consortium | FiveThirtyEight (Polls Plus) | PredictWise (Fundamental) | Sabato’s Crystal Ball | My Model |
|---|---|---|---|---|---|---|
| 2008 Winner Accuracy | 0.96078 | 0.98039 | 0.98039 | N/A | 1 | 0.98039 |
| 2012 Winner Accuracy | 0.98039 | 0.98039 | 1 | 0.98039 | 0.96078 | 1 |
| 2016 Winner Accuracy | 0.92157 | 0.90196 | 0.90196 | 0.90196 | 0.90196 | 0.88235 |
| Average Accuracy | 0.95425 | 0.95425 | 0.96078 | 0.94118 | 0.95425 | 0.95425 |

As you can see, all the models do a similar job of picking the winner in each state, which is what determines the Electoral College outcome.  There are other ways to compare accuracy, but I don’t want to discuss them here since it gets a little technical.  No model was right for every state in every election.  It would probably be impossible to create a model that consistently predicts the winner in every state, because of the variability of political opinions.  Election prediction is not an exact science.  But there is the potential to apply polling analysis to estimate public opinion on certain issues and politicians.  Right now the errors in polls are too large to determine public opinion on close issues, but further research could find ways to reduce error in polling analysis.

Only You can Prevent Bad Political Polls

My research relies heavily on polls, so I understand why it is important to take polls.  If I see a poll and determine it’s well written, I take it.  But I think this attitude is rare because people don’t know the importance of polls, so I want to explain why I think polls are important.  Pre-election polls are commonly used to predict elections, and favorability polls are often used to judge a politician’s popularity.  Polls are an important part of American politics.

I get that polls are annoying.  I know they take time and you are probably busy (like me).  But taking even one political poll a year can help improve the accuracy of polls.  You don’t have to answer every poll, but increased participation improves accuracy.  Now, there are a lot of bad polls, and it’s difficult to tell whether a phone poll is good based on the phone number.  Some “polls” are really marketing calls, so I understand if you are hesitant to do phone polls.  But internet polling provides a good alternative, and I think the future of polling is quality internet polls.  When you take a good internet poll, you know more about its quality than you do from a phone call.  However, internet polls from scientific polling agencies require a large base of people to create accurate samples.  You can randomly call 1,000 phones, but you can’t really send 1,000 random internet users a poll.  To combat this problem, polling agencies maintain databases of users and send surveys to certain users to create a good sample.  Joining a survey panel with political polls is a way to get your voice heard.

My view on participating in political polls is that you can’t complain if you don’t participate.  Polls need a diverse sample to be accurate.  If you feel your political stance is not heard in the polls, then you should take more polls, not fewer.  We need all kinds of people to take polls for the polls to be good.  Not everyone has internet access, but enough voters do to create a good sample.  What you can do is join a poll panel.  My two recommendations are https://today.yougov.com/ and https://www.i-say.com/.  They also do non-political polls and market research, which are also important (I might do a post on this later).  I recommend them because they are user-friendly and statistically sound.  I am not receiving anything for recommending these agencies; I just think they are good.

If you want polls to be more accurate, the best (and easiest) thing to do is participate in polls.  As a statistician, I value good data.  But for data to be good it needs a representative sample.  Regardless of your politics, you should participate in political polls.

 

A look at Alternatives to the Current Electoral College Process

First, I want to be clear that there is no universally fair way to elect a president.  All methods have pros and cons, and you can have your own opinion about which way is best.

Current System

Right now, with the exception of Nebraska and Maine, each state’s electoral votes go to whichever candidate has the most support in that state.  The winner usually has a majority of the votes, but sometimes no single candidate wins a majority.  This method also helps smaller states, as they have a lower ratio of voters to electors than larger states.

Pros

This method makes it easy to determine the winner on election night.  You don’t necessarily need all the votes to come in if you have enough information to predict the winner.

Cons

Most states have a clear winning party, so most of the attention goes to swing states, which do not have a regular winner.

Popular Vote

The popular vote method is based on the winner of the popular vote.  Whoever gets the most votes wins.  This method can be implemented if enough states change their laws to award their electors to the popular vote winner.

Pros

Every vote counts the same.  Larger states would have more power than under the current system.

Cons

Smaller states lose some electoral power compared to the current system.

Congressional District System

This system awards 2 electors to the state winner and 1 elector to the winner of every congressional district.  This is the method Maine and Nebraska use.
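To spell out the mechanics, here is a tiny sketch of how electors would be allocated under this method, using a made-up four-district state rather than real results.

```python
def allocate_electors(statewide_winner, district_winners):
    """Return electors per party: 2 for the statewide winner, 1 per district won."""
    electors = {}
    electors[statewide_winner] = electors.get(statewide_winner, 0) + 2
    for winner in district_winners:
        electors[winner] = electors.get(winner, 0) + 1
    return electors

# Hypothetical state with 4 congressional districts:
print(allocate_electors("R", ["R", "R", "D", "R"]))  # {'R': 5, 'D': 1}
```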

Disclaimer:  This is my personally preferred system.

Pros

It’s a compromise between the current system and the popular vote system.  The electoral college would probably mimic the congressional makeup.

Cons

Like the current system, it could elect a president who didn’t win the popular vote.

 

All of these systems have pros and cons.  There isn’t necessarily a “best” way to pick the president.

Here is a FiveThirtyEight article about different methods of deciding the electoral college.

 

If I were a Senator

Donald Trump is officially the 45th President of the United States.  Next, the cabinet nominees will be voted on by the Senate for confirmation.  Republicans have a majority, but it would take only three Republican senators to prevent the confirmation of a nominee.  Technically a Democrat might vote for a nominee, but considering how many Democrats aren’t participating in the inauguration, probably no Democrats will vote yes on the more controversial nominees.  The question is: if you were a senator who doesn’t like Trump or a certain nominee, should you vote for them anyway to protect your position in the Senate?

This is a complicated decision given what we know about Trump’s low favorability rating.  A YouGov/Economist poll asks how voters view Trump and his cabinet picks.  Some picks aren’t as controversial, like Mattis, who has the highest favorability among non-Trump voters in the poll.  But the Secretary of State nominee Rex Tillerson and Attorney General nominee Jeff Sessions have the lowest favorability among non-Trump voters.  You have to consider that the majority of voters didn’t vote for Trump in the election, and the majority of voters have neutral or negative opinions of him in most polls.  This decision is difficult if you don’t like the nominees but, as a Republican senator, feel obligated to support your party.

Here is what I would do if I were a Republican senator.  I don’t think most of the cabinet picks are qualified or good candidates for their positions.  I know that independent, Democratic, and a portion of Republican voters don’t like some of the cabinet picks.  Not voting for a nominee would hurt me: it would probably anger my colleagues and lower my favorability with my constituents.  Not voting for my party’s nominee would probably make national news and may not be beneficial for me.  However, if I vote against a nominee and it turns out they don’t get confirmed, it probably wouldn’t hurt me that much, and if I run for reelection while Trump and his cabinet are unpopular, my dissent could help.  If I don’t vote for a cabinet nominee but they still get confirmed, I would have risked my position for nothing.  This scenario is complicated and is an example of a prisoner’s dilemma game (more info here).  The idea in this case is that voting against a nominee is only worth it if it blocks the confirmation of that nominee and it turns out that Trump is unpopular at the time of my reelection.  But the payoff is higher if I vote for the nominee regardless of the actions of the 51 other Republican senators.  I also know that the rejection of one nominee doesn’t mean that the next nominee is going to be any better.  Knowing these things, the only nominees I would consider voting against are Sessions and Tillerson, because those are high-power positions and they are unpopular enough to increase the chance that my vote would prevent their confirmation.  So it wouldn’t surprise me if almost all of the cabinet nominees get confirmed.

Why I am against the Recount

I don’t support Jill Stein’s call for a recount of the election results in Michigan, Pennsylvania, and Wisconsin.  There are multiple reasons why I think Jill Stein is going about this the wrong way.

1. Jill Stein has no chance of winning the three states in question.

Jill Stein will not be the next president of the United States.  She got less than 2% of the vote in all of these states.  If Hillary Clinton wanted to pursue a recount in Michigan, which Trump won by probably under 1% (right now Trump has a lead of just 11,612 votes), I could understand that decision, especially if Michigan were all Clinton needed to be president.  I know that the probability of 11,612 votes being incorrectly counted is low, but if the presidency were decided by less than 0.0001 of the votes cast, I think the results should be verified.  But I think the Clinton campaign probably considered a recount and decided it would not change the outcome, so it wasn’t worth the money, time, and controversy.  A candidate who got at most 1.1% in the states in question shouldn’t throw a fit; I don’t think it’s her place to call for a recount.  I would have had the same opinion if Gary Johnson had tried a similar approach in light of a Clinton presidency.

2.  If the election was hacked, a recount would probably not catch it.

Let’s say the electronic votes were tampered with.  I don’t believe this happened at all, but let’s entertain the idea for a second.  If the machines were hacked, they would have probably changed the record of the vote, which is all that is analyzed in a simple recount; a recount just recounts the recorded votes.  An audit might have caught it, if this hack had taken place.  But audits are expensive, and a far more sensible explanation for a Trump win in Michigan, Pennsylvania, and Wisconsin is that turnout was down in urban areas where Obama got a lot of support.  I am not denying that foreign interests tried to influence the election, like the creation of fake news by teenagers in Macedonia (here is an article about that: http://www.cbsnews.com/news/fake-news-macedonia-teen-shows-how-its-done/).  The lower turnout compared to 2008 probably hurt Clinton.  At the end of the day, it appears that more Republicans turned out to vote than Democrats.  Nate Silver (a Democrat) wrote about the possible hacking claims here: http://fivethirtyeight.com/features/demographics-not-hacking-explain-the-election-results/.

3.  This process isn’t helping the division in this country.

The recount isn’t going to change anything.  Trump won.  Trump may not have been the candidate you would have picked; personally, I would have preferred any other Republican candidate from the nomination process.  This recount is just making things worse.  Jill Stein should want our country to come together and accept the results.  Her own VP pick doesn’t approve of this process.  Instead of escalating the situation, Jill Stein should stop fighting.

The bottom line is that Donald Trump will be the next president of the United States.  All the talk about a rigged election was not backed by any evidence.  Trump shouldn’t have called the election rigged, and Stein shouldn’t do the same thing.  I would have supported a recount if we had a 2000-style situation with the election decided by 121 votes, regardless of the winner.  But given the situation, a recount is unnecessary and wasteful.  I know you may not trust statisticians right now because we were wrong about the election, but still listen to the multiple voices speaking out against the recount.  It’s time for our country to unite and accept what happened on November 8, 2016.