December 2020 - The Numbers don't Lie

I’ve been following all the lawsuits trying to overturn the election results in states that Biden won. I’m far from a legal expert so I don’t pay much attention to the legalese but I do like to examine the “evidence” of “irregularities” in the election. I’m highly disappointed in what qualifies as an expert or evidence in most of these lawsuits. It is shocking to see people who claim to be experts make statements that occasionally could be refuted by an undergraduate who took a stats class and a political science class.

If you are going to suggest that our democracy was violated and that there was a massive conspiracy to rig the election, you need to have solid evidence. That is a claim with enormous consequences for the health of our democracy. It’s a serious allegation that needs serious evidence. You need to provide an analysis created by someone who has genuine expertise in both political science and statistics. To do a high-quality analysis you probably have to be on the level of a statistics Ph.D. student in the last part of their studies. You need methods commonly used in the literature for similar problems. And you need to approach the analysis knowing that statistical tests aren’t well equipped to detect fraud or to rule out more innocent explanations of the irregularities.

But as I’ve examined the “evidence” of voter fraud I’ve been highly disappointed in the rigor of the arguments. Voter fraud is rare, and the level of voter fraud necessary to change the presidential outcome has to be thousands of times higher than normal levels. At best the conclusion of these analyses is that mail-in votes counted later did not come from the same distribution as the in-person votes, or that the vote changed from 2016. But everyone should have known that, because the method of voting was highly partisan and mail-in votes take longer to process. And every election year there are shifts from previous results. There isn’t evidence of massive fraud, but people including President Trump are acting like there is. These are highly damaging allegations that are not based on facts.

Now I will break down what’s wrong with the analysis in six lawsuits. These lawsuits were filed by Republicans. I’ve addressed all of this on Twitter, but I wanted to summarize and show that there are many cases with many mistakes.

The Texas Expert who would have failed Stats 101

The recent Texas lawsuit (expert starts on pg. 22) says Michigan, Wisconsin, Georgia, and Pennsylvania had unfair elections and the electors chosen shouldn’t count. The expert has a Ph.D. in economics. Some economists are great at political science and statistics, but this expert doesn’t seem to be good at either. He claimed that the odds of Biden outperforming Clinton were 1 in a quadrillion. But he used a test that doesn’t apply here. It assumes that you have a sample, ideally from less than 10% of the population, that the votes are counted in random order, and that all the ballots have the same probability of being for Biden. Additionally, he used the wrong formula. This test is something I taught undergraduate students in the most basic stats class at Texas A&M. I had final exam questions testing the ability to perform this procedure, and this work would deserve a failing grade. I am struggling to express how horrible this analysis is. The analysis doesn’t reach a clear conclusion other than that these results are unexpected, yet the lawsuit used the expert to argue that it is practically impossible for Trump to have lost. The analysis couldn’t support that conclusion even if the test were appropriate and done correctly. This test can only detect differences between populations. It can’t explain why the difference occurred or whether the difference was fraud.
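To see why a tiny p-value says so little here, consider a rough Python sketch. It assumes the test in question is the kind of pooled two-proportion z-test taught in intro stats (the lawsuit's exact calculation is not reproduced here), and the vote counts are made up for illustration. The function names are hypothetical.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Pooled z statistic for the difference between two sample proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_err

def two_sided_p_value(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Made-up numbers, NOT real election results:
# 50% support in one group of 2,000,000 ballots vs 49% in another.
z_stat = two_proportion_z(1_000_000, 2_000_000, 980_000, 2_000_000)
p_value = two_sided_p_value(z_stat)
print(f"z = {z_stat:.1f}, p = {p_value:.3g}")
```

With millions of ballots, a one-point shift already produces a p-value far smaller than 1 in a quadrillion. That only says the two groups differ. It says nothing about why, and an ordinary change in voter preferences fits the result just as well as anything else.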

The Michigan analysis that Used Minnesota Data

In another lawsuit, an expert claimed that there were more votes than people in certain precincts. This would obviously be a red flag and, if true, would likely show fraud. But these calculations combined voting results from Michigan with population data from Minnesota. So the analysis was completely wrong.

The Arizona Lawsuit that said audits by Precinct would be better than Audits by voting center

Another lawsuit in AZ claimed that the audit needed to be done by precinct instead of by voting center to be accurate. But grouping the sample one way or the other doesn’t have a large effect on accuracy, and sorting the ballots from voting centers back into precincts would be highly time-consuming. There wasn’t a need to do an audit by precinct from a statistical standpoint.

The Pennsylvania Lawsuit that wanted to assume illegal votes had the same distribution as legal ones

A lawsuit in PA wanted to take a sample of the mail-in ballot envelopes, examine them to see if they were fraudulent, and then assume Biden won the same percentage of illegal votes as legal ones. This is a bad idea because how can you know what the distribution of illegal votes is? Additionally, any analysis of the ballots would likely label some legal ballots as illegal. It’s hard to identify illegal votes unless you have a double vote, someone stealing a ballot, or lying about their eligibility. A signature mismatch isn’t always going to be fraud. You must consider that your analysis would have sampling error if not every ballot was examined. So you need more complex analysis and a highly designed sample. So a fishing expedition to find enough invalid ballots to change the winner is a bad idea.
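The sampling-error point can be sketched in a few lines of Python. This is a minimal illustration using the normal-approximation confidence interval for a proportion. The sample size, the number of flagged envelopes, and the statewide total are all invented, and `proportion_interval` is a hypothetical helper.

```python
import math

def proportion_interval(flagged, sample_size, z=1.96):
    """Approximate 95% confidence interval for a sample proportion."""
    p_hat = flagged / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

# Invented numbers: 20 of 1,000 sampled envelopes get flagged.
low, high = proportion_interval(20, 1_000)

# Invented statewide total used only to show how the uncertainty scales up.
total_ballots = 2_000_000
print(f"flag rate between {low:.2%} and {high:.2%}")
print(f"projected flagged ballots: {low * total_ballots:,.0f} to {high * total_ballots:,.0f}")
```

Even before asking whether a flagged envelope is actually fraudulent, the projected count spans tens of thousands of ballots. Any false flags in the review would widen that range further, and none of it tells you who those ballots were cast for.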

The Nevada Lawsuit that said if ballots are rejected by the machine at high rates the machine must be missing illegal votes

A lawsuit in Nevada mentioned that the signature verification machine rejected a lot of ballots. Those ballots were reviewed by humans and accepted most of the time. The lawsuit claimed that since the machine rejected a lot of ballots it must have accepted multiple fraudulent ballots. But the machine rejecting a legitimate ballot is a different error from accepting a fraudulent one. You can’t immediately conclude that the machine accepted fraudulent ballots. False negatives and false positives usually trade off against each other: reducing one tends to increase the other. So if this machine was rejecting ballots at a high rate, it was being strict, and the ballots it accepted were likely signature matches.
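The trade-off between the two kinds of error can be shown with a toy model in Python. It assumes, purely for illustration, that the machine gives each signature a match score and accepts anything above a threshold, with genuine and forged signatures following two made-up normal distributions. Nothing here describes how the actual Nevada machine works.

```python
from statistics import NormalDist

# Hypothetical score distributions (assumptions, not measured values).
GENUINE = NormalDist(mu=0.70, sigma=0.10)
FORGED = NormalDist(mu=0.40, sigma=0.10)

def error_rates(threshold):
    """Return (share of genuine rejected, share of forged accepted)."""
    false_reject = GENUINE.cdf(threshold)
    false_accept = 1.0 - FORGED.cdf(threshold)
    return false_reject, false_accept

thresholds = [0.45, 0.55, 0.65, 0.75]
rates = [error_rates(t) for t in thresholds]
for t, (false_reject, false_accept) in zip(thresholds, rates):
    print(f"threshold {t:.2f}: rejects {false_reject:.1%} of genuine, "
          f"accepts {false_accept:.1%} of forged")
```

As the threshold rises, more genuine signatures are kicked out for human review while fewer forged ones get through. A high rejection rate is what a strict setting looks like, so by itself it points away from the machine waving through bad signatures.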

The Georgia Lawsuit that said you had more voters than eligible people

A lawsuit in GA claimed there were more registered voters than eligible voters and that there were many people who voted only for Biden. But the analysis that found “extra” voters used census data without accounting for the relatively large margin of error on county-level estimates of eligible voters. Basically, in a state like GA that has automatic registration, and therefore high registration, the census data is often imprecise enough to show over 100% registration. The census data is also pooled over several years and is known to struggle in areas that are rapidly growing in population. Also, turnout in these counties was much lower than the estimates of eligible voters. There might be some extra people on the rolls, but they aren’t voting, and the problem is likely exaggerated in this analysis. Also, someone else did really bad math that assumed Biden outperforming the Democratic Senate candidate meant people voted only for President and no other candidates. But in reality this was just split-ticket voting, which is common, especially among Republicans in the Trump era.
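A quick Python sketch shows how a margin of error on the eligible-voter estimate can produce an apparent registration rate above 100%. The county figures below are invented, and `registration_rate_range` is a hypothetical helper rather than anything from the lawsuit.

```python
def registration_rate_range(registered, eligible_estimate, margin_of_error):
    """Registration rate at the low end, point estimate, and high end.

    The low rate uses the largest plausible eligible population and the
    high rate uses the smallest, given the stated margin of error.
    """
    low_eligible = eligible_estimate - margin_of_error
    high_eligible = eligible_estimate + margin_of_error
    point = registered / eligible_estimate
    return registered / high_eligible, point, registered / low_eligible

# Invented county: 10,300 registered, eligible estimate 10,000 +/- 800.
low_rate, point_rate, high_rate = registration_rate_range(10_300, 10_000, 800)
print(f"registration rate: {point_rate:.1%} (plausible range {low_rate:.1%} to {high_rate:.1%})")
```

The point estimate sits above 100%, but the plausible range dips below it. In a small county with a noisy estimate and high registration, "more registered than eligible" can come from the imprecision of the estimate alone.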

So in summary, lots of the statistical claims that there is evidence of voter fraud are statistically unsound. I fully believe that the results are correct, and that no massive fraud occurred.


Why the Claims of Election Fraud aren’t based on Statistical Evidence