Update 09/23/17: I am switching to two proportion Z tests. I am setting the population proportion to .5 to prevent an underestimation of variance.
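As a rough illustration of what this update means, here is a minimal sketch of a two proportion z-test in which the proportion used in the standard error can be fixed at 0.5. The function name, the `p0` parameter, and the counts are my own placeholders rather than real exit poll data, and the sketch reports a two-sided p-value for simplicity. Since p(1 - p) is largest at p = 0.5, fixing it there gives the largest possible standard error, which is how I read "prevent an underestimation of variance."

```python
import math

def two_proportion_z(x1, n1, x2, n2, p0=None):
    """Two-sided two proportion z-test.

    p0=None  -> standard error uses the pooled sample proportion.
    p0=0.5   -> standard error uses 0.5, the value that maximizes p*(1-p).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    p = (x1 + x2) / (n1 + n2) if p0 is None else p0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts, for illustration only.
z_pooled, p_pooled = two_proportion_z(300, 1000, 250, 1000)
z_half, p_half = two_proportion_z(300, 1000, 250, 1000, p0=0.5)
print(z_pooled, p_pooled)
print(z_half, p_half)
```

With p0=0.5 the z statistic can never be larger in magnitude than the pooled version, so the test is more conservative.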
This is a bit of a technical post; I will have a better explanation later.
Post election, I have been working on a paper and thinking about what to do next. I am really interested in breaking down voter behavior in the swing states. I have collected exit poll data from the 11 swing states. I want to test if voter behavior across the swing states was consistent with the national vote or the swing state average.
For phase 1 of this experiment, I will run a Chi-Square Test of Homogeneity comparing each swing state with the average of the other swing states and with the national vote. I will look at each category four different ways: Trump vs. not Trump, Clinton vs. not Clinton, Other vs. Clinton and Trump, and overall. This will probably be around 1500 tests. I will have an initial alpha level of 0.05. I will then run two proportion z-tests on the comparisons where the p-value was less than 0.05. I will run each z-test in the direction that matches the data.
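To make the phase 1 procedure concrete, here is a minimal sketch of one such comparison for a 2x2 table (one state vs. a comparison group, Trump vs. not Trump). The function names and counts are hypothetical, not taken from the exit poll data, and the z-test here uses the ordinary pooled standard error with no continuity correction.

```python
import math

ALPHA = 0.05

def chi_square_2x2(a, b, c, d):
    """Chi-square test of homogeneity for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are the two outcomes.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square tail equals a two-sided normal tail.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

def one_sided_z(x1, n1, x2, n2):
    """Two proportion z-test, one-sided in the direction the data point."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # One-sided tail in the observed direction: 1 - Phi(|z|).
    p_value = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 480 of 1000 for Trump in one state,
# 2300 of 5000 for Trump in the comparison group.
stat, p_chi = chi_square_2x2(480, 520, 2300, 2700)
print("chi-square:", stat, p_chi)
if p_chi < ALPHA:
    z, p_z = one_sided_z(480, 1000, 2300, 5000)
    print("follow-up z-test:", z, p_z)
```

One thing to keep in mind: for a 2x2 table the uncorrected chi-square statistic is exactly the square of the pooled z statistic, so the follow-up one-sided p-value is half the chi-square p-value.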
For phase 2, I will collect data from 2008 and 2012 in states that have a statistically significant portion of significant tests. Then I will compare voting behavior with Chi-Square Test of Homogeneity on: 2008 vs 2012, 2008 vs 2016, and 2012 vs 2016. Then significant results will be tested using a two proportion z-test.
I am going with the Chi-Square test first for two reasons. The Chi-Square test is not subject to errors in the direction of an effect, and the Chi-Square test is less sensitive than a two proportion z-test. I have to be very careful in my interpretation of the results since an analysis this large means that there is a big potential for false positives and false negatives. This analysis will probably take me most of next year. I’ll give an update on my progress in December.
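To put a rough number on the false positive concern, here is a small sketch of the arithmetic for about 1500 tests at an alpha of 0.05. It assumes, for simplicity, that every null hypothesis is true and that the tests are independent, which will not hold exactly for overlapping exit poll categories; the function names are my own.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of false positives if every null hypothesis is true."""
    return num_tests * alpha

def chance_of_any_false_positive(num_tests, alpha):
    """Probability of at least one false positive, assuming independent tests."""
    return 1 - (1 - alpha) ** num_tests

print(expected_false_positives(1500, 0.05))
print(chance_of_any_false_positive(1500, 0.05))
```

Under those assumptions, 1500 tests at 0.05 would produce about 75 significant results by chance alone, which is why a single significant test means little on its own.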
I thought I would provide my perspective on the special election that occurred last week in Kansas and the upcoming special House election in Georgia.
For full disclosure, I am a Republican who is against some of the President’s policies on immigration and health care.
I do not think Trump’s performance will have a major effect on the voting behavior of people with strong party ties. Republicans vote Republican most of the time, and Democrats vote Democrat most of the time. Independents and moderates are more of a wild card. Independents may not vote the same as they did in 2016.
The districts in question are in no way representative of the whole country. Results from these elections cannot be applied to the whole country or used to “predict” the entire midterm election outcome. You could maybe use the results for certain districts, but certainly not the entire country. For statistical analysis to work properly, the samples need to be reasonably representative.
Special elections are all about who turns out. In the Kansas election, Democrats spent a lot of money and attention on the race since there are only a few races this year. The money and the lack of an incumbent are probably why the race was closer than the 2016 race. The 2017 Kansas race had about half as many votes as the 2016 race; a change this big can affect the outcome. In Georgia, I expect a race that is closer than usual for that district, but still with a Republican win. I doubt that a Democrat will win a majority of the votes in the primary.
These special elections need to be interpreted in context. They are two races in House districts that haven’t been competitive in years. We should not even try to extrapolate to the entire country from these races. Favorability polls are a much better indicator of political sentiment. However, I think that the favorability polls, like the general election polls, could be underestimating Trump’s support. It has been difficult to get Republicans to respond to the polls, and this may affect the accuracy of polls. After the midterms in 2018, there will be a clearer picture of support for the Republican party. Until then we can only guess.
Statsland is a magical world that exists only in (certain) Statistics textbooks. In Statsland, statistics is easy. We can invoke the Central Limit Theorem and use the normal distribution when n is larger than 30. In Statsland we either know or can easily determine the correct distribution. In Statsland 95% confidence intervals have a 95% chance of containing the real value. But we don’t live in Statsland.
The point of doing statistics is that it would be too difficult (or impossible) to find the true value of a population. You aren’t likely to find the exact value, but you can be pretty close. In a statistics textbook problem, you probably have enough information to do a good job of estimating the desired value. But in applied statistics you may not have as much information. If you know the mean and standard deviation of a population you do not need to do much (if any) statistics. Any time you have to estimate or substitute information, your model will not perform as well as a theoretically perfect model.
Statistics never was and never will be an exact science. In most cases, your model will be wrong. There are no perfect answers. Your confidence intervals will rarely perform as they theoretically should. The requisite sample size to invoke the Central Limit Theorem is not clear cut. Your approach should vary with the individual problem. There is no universal formula to examine data. Applied Statistics should be flexible instead of rigid. The world is not a statistics textbook problem, and should never be treated as such.
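One way to see the gap between Statsland and practice is to simulate it. The sketch below draws many samples of size 30 from a skewed population, builds the textbook normal-approximation 95% interval for the mean each time, and counts how often the interval contains the true mean. The exponential population, the sample size, and the function name are my own choices for illustration; other populations will behave differently.

```python
import math
import random

def ci_coverage(sample_size, trials=5000, seed=0):
    """Fraction of nominal 95% normal-approximation intervals that
    contain the true mean, for samples from an Exponential(1) population."""
    rng = random.Random(seed)
    true_mean = 1.0  # mean of an exponential distribution with rate 1
    hits = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        mean = sum(sample) / sample_size
        variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
        se = math.sqrt(variance / sample_size)
        if mean - 1.96 * se <= true_mean <= mean + 1.96 * se:
            hits += 1
    return hits / trials

print("n = 30: ", ci_coverage(30))
print("n = 300:", ci_coverage(300))
```

For a skewed population like this one, the n = 30 coverage tends to come out noticeably below the nominal 95%, and it moves closer as the sample size grows.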