Election Accuracy

Measuring Election Accuracy

One of the most important criteria we use to evaluate voting methods is "Accuracy", but how do we determine if a voting method is accurate? Does it elect the candidates who should win? Is it fair and representative? 

There are a number of tools that voting scientists use to answer these questions, and like all good science lovers we advocate taking a close look from multiple perspectives. For evaluating the accuracy of an individual election, one approach is to find the "Condorcet Winner": the candidate who would beat all others in a head-to-head race. For comparing the accuracy of voting methods across many elections, most voting scientists turn to election simulations. "Bayesian Regret" is one such model, and the Yee simulations are another. Modeling from John Huang is a newer addition to the body of evidence, and all of these models broadly agree in their relative conclusions when comparing voting methods. One of the most sophisticated and realistic is "Voter Satisfaction Efficiency," from Dr. Jameson Quinn, a Harvard PhD in Statistics. Quinn was Vice-Chair of the Center for Election Science when this study came out, and he has since joined the Equal Vote Coalition Board of Directors. These simulations can be an invaluable addition to the data we can collect from real-world elections.

Real world, empirical election data is also a critical source of information. Unfortunately, the basic choose-one-only style ballot doesn't give us much data to go on. These ballots are not expressive enough to collect the voters' full opinions, and we have no way of knowing if the votes cast were honest or dishonest. We also have no way of knowing if factors like vote splitting and the Spoiler Effect distorted the election outcome. Simply determining the candidate who got the most votes only goes so far. 

For assessing voting methods that use less expressive ballots, pre-election polling and exit polls can be a valuable supplement to election results and ballot data. Ratings are often used in this kind of polling because a rating captures the kind of detailed preference data that less expressive ballots themselves don't collect. Taken together, the data, polling, and other kinds of observation and trends allow us to draw firm conclusions.

For example, failed elections due to vote splitting and the Spoiler Effect can be glaringly obvious. The 2000 presidential election between George W. Bush (Republican), Al Gore (Democrat), and Ralph Nader (Green Party) is a classic example, even if we ignore the Electoral College. In that election, a majority of voters were from the Left end of the political spectrum. Based on polling we can safely conclude that many Green Party voters would have preferred Gore, and with a more expressive voting system the election would have elected Gore. In 1996 the same scenario happened in reverse: Bob Dole (Republican) was likely the candidate preferred overall, but he lost the election to Bill Clinton after voters on the Right were split between Dole and Ross Perot.

Among voting scientists there is full consensus that our choose-one-only voting method is wildly inaccurate with more than two candidates in the race. 

“The fact is that FPTP, the voting method we use in most of the English-speaking world, is absolutely horrible, and there is reason to believe that reforming it would substantially (though not of course completely) alleviate much political dysfunction and suffering.” -Jameson Quinn in “A Voting Theory Primer for Rationalists”

Real-world data is particularly insightful when we are looking at election results from voting systems that do use more expressive ballots. For example, Instant Runoff Voting uses an expressive ranked ballot, but it also uses a multi-round, tournament-style elimination process which doesn't count all the rankings. When we go back and look at the ballot data again, sometimes we find elections where the candidate who won wasn't actually preferred by the voters... according to the ballots cast.
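As a rough illustration (with made-up ballots, not real election data), here is a minimal IRV tally in Python. In this hypothetical electorate the compromise candidate B is preferred head-to-head over both A and C, yet B has the fewest first-choice votes and is eliminated in the first round, so those rankings are never counted:

```python
from collections import Counter

def irv_winner(ballots):
    """Instant Runoff: repeatedly eliminate the candidate with the
    fewest first-choice votes until someone has a majority."""
    remaining = {c for ballot in ballots for c in ballot}
    while True:
        # Count each ballot for its highest-ranked remaining candidate.
        tally = Counter(next(c for c in ballot if c in remaining)
                        for ballot in ballots)
        total = sum(tally.values())
        leader, votes = tally.most_common(1)[0]
        if votes * 2 > total or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest first-choice votes.
        remaining.remove(min(remaining, key=lambda c: tally.get(c, 0)))

# Hypothetical electorate: B is every faction's acceptable compromise,
# but B has the fewest first-choice votes (A: 37, B: 31, C: 32).
ballots = ([["A", "B", "C"]] * 37 +   # A-first voters
           [["B", "A", "C"]] * 15 +   # B-first, leaning A
           [["B", "C", "A"]] * 16 +   # B-first, leaning C
           [["C", "B", "A"]] * 32)    # C-first voters

print(irv_winner(ballots))  # prints A, though B beats both A and C head-to-head
```

Here B is ranked above A on 63 of 100 ballots and above C on 68 of 100, but B is eliminated first and A wins the final round 52 to 48.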


Voter Satisfaction Efficiency


One of the most cutting-edge tools for measuring election accuracy is VSE, or Voter Satisfaction Efficiency. VSE analyzes voting methods using thousands of simulated elections across a wide variety of scenarios. Factors like strategic voters, voter blocs that cluster on issues, the number of candidates, the degree of polarization, and more are considered to help us determine when and how often an election system elects the best candidate. In VSE the candidate who should win is defined as the "candidate that would make as many voters as possible, as satisfied as possible with the election outcome."
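As a toy sketch of the idea (not Quinn's actual model, which uses far more realistic voter clustering, strategy, and utility distributions), a VSE-style score can be estimated by simulating many elections with random voter utilities and comparing the winner's total satisfaction against the best possible candidate and a randomly picked one:

```python
import random

def simulate_vse(method, n_voters=100, n_cands=5, trials=1000, seed=1):
    """Toy VSE: (E[chosen] - E[random]) / (E[best] - E[random]),
    where 'satisfaction' is a voter's utility for the winner.
    A perfect method scores 1.0; picking at random scores 0.0."""
    rng = random.Random(seed)
    chosen_u = best_u = random_u = 0.0
    for _ in range(trials):
        # utilities[v][c]: voter v's satisfaction with candidate c
        utilities = [[rng.random() for _ in range(n_cands)]
                     for _ in range(n_voters)]
        totals = [sum(u[c] for u in utilities) for c in range(n_cands)]
        chosen_u += totals[method(utilities)]
        best_u += max(totals)
        random_u += totals[rng.randrange(n_cands)]
    return (chosen_u - random_u) / (best_u - random_u)

def plurality(utilities):
    """Honest choose-one voting: each voter picks their favorite."""
    votes = [0] * len(utilities[0])
    for u in utilities:
        votes[u.index(max(u))] += 1
    return votes.index(max(votes))

def score(utilities):
    """Honest score voting: highest total satisfaction wins."""
    n_cands = len(utilities[0])
    return max(range(n_cands), key=lambda c: sum(u[c] for u in utilities))

print(f"plurality VSE ~ {simulate_vse(plurality):.2f}")
print(f"score VSE ~ {simulate_vse(score):.2f}")
```

In this oversimplified setup, honest score voting maximizes total utility by construction, so it scores a perfect 1.0; the real VSE model's lower numbers come from strategic behavior, ballot normalization, and realistic voter geography.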

Voter Satisfaction Efficiency makes a strong case for STAR Voting. In VSE, STAR topped the charts, coming in as more accurate than all other voting systems that are being seriously advocated for, many of them by large margins. The only voting system that came close was a Condorcet method called Ranked Pairs, which had previously set the bar for accuracy but which is too complex to be viable in real-world elections.

Here are some of the findings that we can extrapolate from the VSE graphs:

  • STAR is among the very best of the best. When voters are honest, STAR delivers its best results, with a VSE of over 98%.

  • Under less than ideal circumstances, such as elections where a large portion of voters are strategic, STAR was still highly accurate, with a VSE of over 90%. Under its worst-case scenario, STAR Voting was basically as accurate as the best-case scenario for IRV (commonly referred to as Ranked Choice Voting), and it was much better than Plurality Voting (our current system) under any circumstances.

  • Compared to many other systems, STAR showed high resilience against strategic voters. This means that voter tactics have a smaller impact on overall election accuracy, so even if many people try to game the system, the election will still come out in good shape. The exception was 3-2-1 Voting, which was slightly less accurate than STAR in a best-case scenario but slightly more resilient to strategic voting. (3-2-1 is another newly proposed system in the rated runoff family.)

  • VSE strategy simulations showed that STAR doesn't incentivize strategic voting for voters overall. Strategic and dishonest voting is just as likely to backfire as it is to help the individual voter. In contrast, strategic voting under Instant Runoff Voting was found to be incentivized roughly twice as often as under STAR Voting. 


You can learn more about Voter Satisfaction Efficiency here.

The Ka-Ping Yee Simulations

In 2006, Ka-Ping Yee introduced a way to examine single-winner election methods via computer graphics (see: http://zesty.ca/voting/sim/). Each colored circle represents a candidate in a 3 or 4 candidate election in a two-dimensional political space, and each white dot represents an individual voter. This kind of visualization is useful because you can see exactly how ideologically close or far each voter is from each candidate. The color of the background represents which candidate would win under each method if a randomized electorate, centered at that point, were to vote. This model is a simplification of our complex political spectrum, but it does a good job of illustrating common phenomena that affect election outcomes.
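A single cell of a Yee diagram can be sketched in a few lines. This toy version (candidate positions, electorate size, and spread are made up for illustration) samples a Gaussian electorate centered on a point in the 2-D issue space and runs an honest plurality vote, where each voter supports the ideologically nearest candidate:

```python
import random

def yee_cell_winner(center, candidates, n_voters=500, sigma=0.5, seed=0):
    """One cell of a Yee diagram: sample a Gaussian electorate centered
    at `center`; each voter votes (honest plurality) for the nearest
    candidate in 2-D space. Returns the winning candidate's index."""
    rng = random.Random(seed)
    votes = [0] * len(candidates)
    for _ in range(n_voters):
        vx = rng.gauss(center[0], sigma)
        vy = rng.gauss(center[1], sigma)
        # Squared Euclidean distance picks the ideologically closest candidate.
        nearest = min(range(len(candidates)),
                      key=lambda i: (candidates[i][0] - vx) ** 2 +
                                    (candidates[i][1] - vy) ** 2)
        votes[nearest] += 1
    return votes.index(max(votes))

# Three hypothetical candidates spread along one axis of political space.
cands = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]
print(yee_cell_winner((-1.0, 0.0), cands))  # electorate centered left: prints 0
print(yee_cell_winner((1.0, 0.0), cands))   # electorate centered right: prints 2
```

A full Yee diagram repeats this for every pixel of a grid and colors each pixel by the winner, which is what makes pathologies like IRV's center squeeze visually obvious.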

Yee's diagrams show some serious pathologies with the Plurality and Instant Runoff methods, but it is unclear from his descriptions whether these were frequent occurrences or whether they were chosen specifically to highlight these flaws. This video follows on Yee's work by animating the positions of the candidates in the two-dimensional political space, and adds Score Voting, STAR Voting (aka Score Runoff Voting), and a one-voter "ideal winner" model. Where Plurality and IRV tend to squeeze out candidates in the center, Score tends to give an advantage to candidates positioned in between other candidates. Of the systems visualized, STAR Voting consistently performs closest to the ideal model. We also recommend checking out the work of Nicky Case, who has a great article where you can directly interact with voting systems and scenarios to see what would happen.


Condorcet Winner as a Measure of Accuracy

The Condorcet winner is the candidate who was preferred over all others head-to-head. A ranked ballot, or any other ballot that shows voters' preferences, is all that is needed to find the Condorcet winner if one exists. When there is a Condorcet winner, there's a good case to be made that that candidate should have won. Unfortunately, the Condorcet winner is a limited measure because there isn't always a candidate who was preferred over all others. Sometimes preferences are cyclical. (A>B, B>C, C>A.)
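Finding the Condorcet winner from ranked ballots is a matter of tallying every head-to-head matchup. This sketch (with made-up three-voter ballots) returns the winner when one exists and None when preferences are cyclical:

```python
from itertools import combinations

def condorcet_winner(ballots):
    """Return the candidate who beats every other head-to-head, or
    None when preferences are cyclical and no such candidate exists.
    Each ballot is a list of candidates, most-preferred first."""
    candidates = set(ballots[0])
    wins = {c: set() for c in candidates}
    for a, b in combinations(candidates, 2):
        # A earlier position on the ballot means a higher preference.
        a_over_b = sum(ballot.index(a) < ballot.index(b) for ballot in ballots)
        if a_over_b * 2 > len(ballots):
            wins[a].add(b)
        elif a_over_b * 2 < len(ballots):
            wins[b].add(a)
    for c in candidates:
        if len(wins[c]) == len(candidates) - 1:
            return c  # c beats every other candidate head-to-head
    return None  # cyclical preferences: no candidate beats all others

print(condorcet_winner([["A", "B", "C"],
                        ["A", "C", "B"],
                        ["B", "A", "C"]]))  # prints A
print(condorcet_winner([["A", "B", "C"],
                        ["B", "C", "A"],
                        ["C", "A", "B"]]))  # prints None (A>B, B>C, C>A)
```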

Another problem is that there are situations where some people may argue that the Condorcet winner didn't actually have the most support. Advocates of Instant Runoff Voting (IRV; commonly called Ranked Choice Voting) often make this argument to defend the results of the 2009 Burlington, Vermont IRV election, which didn't elect the candidate who was preferred over all others. To make that argument convincingly, we would need to know more than just voters' preference orders; we would need to know how much each voter liked each candidate.

In ranked ballot systems there's no way to know if a voter actually liked their second choice. Second choice could mean full support if a voter really loves more than one candidate. Conversely a voter's second choice may be a candidate whom they strongly dislike, but who is better than their worst case scenario.

In the 2009 Burlington mayoral race there were three viable candidates, a Democrat, a Republican, and a Progressive, and all three had significant support. The Democrat was preferred over all others (the Condorcet winner) but came in third place when voters' first-choice votes were counted. The Progressive won.

Many Republican voters ranked the Democrat as their second choice to show that they preferred the Democrat to the Progressive candidate. If those voters actually would have been significantly more satisfied with the Democrat winning, then the Condorcet winner did deserve to win after all. On the other hand, if Republicans would have been almost equally dissatisfied with either the Democrat or the Progressive, then the Progressive was probably the candidate with the most support after all. To learn more about Burlington, read more from Equal Vote here, more from The Center for Election Science here, and more from the Center for Range Voting here.

The point is that in these kinds of close three-way ties it's critical to have enough ballot data to determine whether the candidate who won had the most support or not. For that we need a ballot that allows voters to show their degree of support, as well as to vote no preference if desired. In Burlington the ballots clearly showed that the Democrat was preferred over all others. The Democrat was the Condorcet winner, and so he clearly deserved to win according to the ballots cast. Instant Runoff Voting was repealed the following election.

These kinds of inaccurate results could unfairly affect any kind of candidate. In a different election IRV could just as easily elect a Democrat where a Progressive was preferred by voters, or a Republican where a Democrat was preferred, or any other combination.

STAR Voting usually elects the Condorcet winner, and in the cases where it doesn't, the degree-of-support data should provide a convincing case for why another candidate actually had more support and better represented the people.
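As a sketch of the mechanics (the ballots are hypothetical, and real STAR rules include more detailed tie-breaking procedures), a STAR tally sums the 0-5 scores and then runs an automatic runoff between the two highest scorers, decided by which finalist each voter scored higher:

```python
def star_winner(ballots):
    """STAR (Score Then Automatic Runoff): total the 0-5 scores, then
    the two highest-scoring candidates advance to a runoff in which
    each ballot counts for the finalist it scored higher. A runoff tie
    goes to the higher-scoring finalist (a simplification of STAR's
    official tie-breaking rules). Ballots map candidate -> score."""
    candidates = list(ballots[0])
    # Scoring round: sum every ballot's scores.
    totals = {c: sum(b[c] for b in ballots) for c in candidates}
    finalist1, finalist2 = sorted(candidates, key=totals.get, reverse=True)[:2]
    # Runoff round: each ballot counts for the finalist it scored higher;
    # equal scores count as no preference between the two.
    f1 = sum(b[finalist1] > b[finalist2] for b in ballots)
    f2 = sum(b[finalist2] > b[finalist1] for b in ballots)
    return finalist1 if f1 >= f2 else finalist2

# Hypothetical ballots: C has the highest score total (390 vs A's 275),
# but a majority scored A above C, so A wins the automatic runoff 55-45.
star_ballots = ([{"A": 5, "B": 0, "C": 3}] * 55 +
                [{"A": 0, "B": 5, "C": 5}] * 45)
print(star_winner(star_ballots))  # prints A
```

The runoff step is what supplies the majoritarian check described above: the scores show degree of support, and the runoff confirms which finalist more voters actually preferred.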