Greg D. Adams, Asst. Professor
Prof. Chris Fastnow, Director
Center for Women in Politics in Pennsylvania
Pittsburgh, PA 15213
According to several news accounts, many voters in Palm Beach, Florida, have claimed that they were confused by the ballot structure and may have inadvertently voted for Buchanan when in fact they intended to vote for Gore. In the hours after the election, a discussion ensued among several academic friends and colleagues about whether such ballot confusion could be statistically detected, since (apparently) Palm Beach county alone had the unusual ballot structure in Florida.
Chris Fastnow, a political scientist and director of the Center for Women in Politics in Pennsylvania at Chatham College (who is also my wife), found the Florida county-level returns for the election on the internet at the CBS News website and passed them on to me (we've now updated the data with those from the state of Florida). We reasoned that if enough voters in Palm Beach county were confused and mistakenly voted for Buchanan, it should be statistically detectable by examining the vote for Buchanan relative to the votes for Bush for all of the counties in Florida.
A first cut at the data, simple scatterplots and linear regressions, suggested that something unique happened in Palm Beach county. By simple visual inspection and several different model specifications, it appears that instead of the 3407 votes Buchanan received in Palm Beach county, he probably would have received under 1000 votes, if the other counties in Florida are any guide.
Within seconds of my seeing the initial results, a colleague of mine wandered into my office, saw the results, and urged me to make the results "public" as soon as possible. I hastily drafted an email to my dept., pointing them to a graph of Buchanan's vs. Bush's votes, which I had put up on my internet server. The email spread across the university and onward, which prompted a string of phone calls, emails, and so forth. Ever since, I have been overwhelmed with more messages than I can possibly handle, and I have had little opportunity to do anything else but sift out emails pertinent to my day-to-day duties and answer the phone. Admittedly, these are rudimentary analyses, mostly done early in the afternoon after the election, when it appeared a winner of the election could be declared within hours. Since then, many scholars have done much more in-depth analyses. Interested readers are encouraged to turn to these papers for a better sense of the methodological issues.
In order to get an estimate of the number of votes that Buchanan would have garnered in Palm Beach county, I ran a number of regressions, but two of them drew the most attention: one predicting Buchanan's vote share based on Bush's votes in the other Florida counties, and one predicting Buchanan's votes based on the total votes cast.
There are theoretical reasons to think that the number of Buchanan's votes should correlate with Bush's. First, for any candidate, a large county with many people will generally provide the candidate more votes than a county with fewer people, all else being equal. Second, holding size of the county constant, a more conservative county should favor both Buchanan and Bush in a proportionate way. It thus seemed reasonable to us to expect a systematic relationship between the two candidates' votes.
There is also reason to look at Buchanan's votes as compared to the total votes cast in each county. If each county were ideologically similar to one another, the percentage of votes that Buchanan got would be roughly the same in each county, which would produce a straight line when Buchanan's votes were plotted against the total votes. Of course, some counties are more conservative/liberal than others, in which case the total votes cast would probably not predict Buchanan's votes as well as a prediction using votes cast for Bush. Moreover, if the population of the county were correlated with the ideology of the county, one might expect the relationship to curve. Bigger counties would give Buchanan more votes, but if bigger counties tend to be more liberal, bigger counties would tend to give proportionately fewer votes.
The regression results are posted with the graphs below for several different comparisons: (Buchanan's votes vs. Bush's votes), (Buchanan vs. Total votes), (Buchanan 2000 vs. Buchanan 1996 primaries), (Buchanan vs. Registered Reform Party Members), and (Socialist Party votes vs. Green Party votes). The last graph demonstrates that the ballot mistakes in Palm Beach hold for other candidates on the right-hand side of the ballot as well.
If you don't know what regression is, we basically fit the "best" straight line to the scatterplots below (excluding Palm Beach county). Most of the vote shares for Buchanan fall pretty close to this line. If Palm Beach county were like the other counties, according to estimates (using Bush's votes) Buchanan would have gotten around 600 votes in that county instead of the 3407 votes he actually got. If we used total votes to predict Buchanan's vote, we would have predicted Buchanan to get somewhere around 737 votes. The exact results, with confidence intervals (kind of like with polls, when they say +/- 3%) are given with the regression results (click on the picture-links below).
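For readers who want to see the mechanics, here is a minimal sketch of the "fit a line to the other counties, then see where the left-out county falls" idea. The vote counts are toy numbers invented for illustration (they are not the Florida returns), and the function names `fit_line` and `excess_over_prediction` are our own labels, not part of any package.

```python
# Toy (bush_votes, buchanan_votes) pairs. These are made-up numbers for
# illustration only, NOT the actual Florida returns.
other_counties = [(100, 5), (200, 9), (300, 16), (400, 19), (500, 26)]
held_out = (600, 90)  # hypothetical county left out of the fit


def fit_line(points):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def excess_over_prediction(points, county):
    """Fit on `points` only, then compare `county` with the fitted line."""
    intercept, slope = fit_line(points)
    x, y = county
    predicted = intercept + slope * x
    return predicted, y - predicted


predicted, excess = excess_over_prediction(other_counties, held_out)
print(f"predicted {predicted:.1f}, observed {held_out[1]}, excess {excess:.1f}")
```

The left-out county plays no part in estimating the line, so an unusual value there cannot pull the line toward itself. Running the same steps on the posted data sets should give the kind of predictions described above, though the sketch itself proves nothing about Florida.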
Problems with the simple linear model
A model such as this is readily interpretable, but there are valid statistical concerns regarding the approach (namely, "heteroskedasticity," which affects the confidence intervals around the estimates). A county with only a few thousand people would be very unlikely to generate a Buchanan vote that's off by 2000 votes, but a county with several million people would be more likely to. The simple model described above assumes such an error is equally likely, regardless of the size of the county. Similarly, although it is easy to interpret the graphs of simple raw votes plotted against each other, they are arguably misleading because they can dramatize outliers for large counties (such as Palm Beach).
There are ways to correct for this, but each has some additional problems. One way is to compute the log of each of the variables before running the analyses (or even before graphing the results). This is a much "fairer" way to present the results, but it is hard to interpret the scale after the log transformations, and it is even harder to compute the confidence intervals in terms of votes (i.e., to compute how likely or unlikely 3400 votes in Palm Beach would be). As a technical issue, it also makes certain assumptions about the nature of the "errors" in the model, although the same could be said for most ways of analyzing the results. Log-corrected models can be found from some of the authors linked at the top of this page, and I have posted my co-author's and my results for the "log-adjusted" model as well.
Log transformations to correct for population size do not generally affect the predicted Buchanan vote for Palm Beach county. Regardless of the transformation, if one uses Bush's votes to predict Buchanan's, the "best" guess for Palm Beach is still around 600 votes. However, the transformations do affect how probable 3400 Buchanan votes would be in a county where Bush got 153,000 votes (all else being equal). If you take the logs of the votes, for instance, it's still quite unlikely that Buchanan could get anywhere approximating 3400 votes, but it's within the "realm of possibility" that he could get 1200 or 1500 or even 1800+ votes (or conversely, some number smaller than 600, but fewer than 600 is largely irrelevant for the argument, since the onus is on showing the votes to be irregularly high). Whereas the simple linear model may make the Buchanan vote in Palm Beach county appear to be atypical on the order of winning the Powerball jackpot, the "population-corrected" model suggests the Buchanan vote would be a little more likely than 1-in-10,000.
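A rough sketch of a log-log version of the analysis follows, again on invented toy numbers rather than the Florida returns. It fits log(Buchanan) against log(Bush) on the other counties, converts the prediction back to votes, and reports how many residual standard errors the left-out county sits above the line. The name `log_log_outlier` is our own, and the score ignores the uncertainty in the fitted coefficients, so treat it as a crude gauge rather than a proper prediction interval.

```python
import math

# Toy (bush_votes, buchanan_votes) pairs, invented for illustration only.
other_counties = [(100, 5), (200, 9), (300, 16), (400, 19), (500, 26)]
held_out = (600, 90)  # hypothetical county left out of the fit


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def log_log_outlier(points, county):
    """Fit log(y) = a + b * log(x) on `points`, then score `county`."""
    log_x = [math.log(x) for x, _ in points]
    log_y = [math.log(y) for _, y in points]
    a, b = fit_line(log_x, log_y)
    residuals = [v - (a + b * u) for u, v in zip(log_x, log_y)]
    # Residual standard error on the log scale, n - 2 degrees of freedom.
    s = math.sqrt(sum(r * r for r in residuals) / (len(points) - 2))
    x0, y0 = county
    predicted_log = a + b * math.log(x0)
    # exp() of a log-scale prediction is a typical (median-like) value in votes.
    return math.exp(predicted_log), (math.log(y0) - predicted_log) / s


predicted_votes, score = log_log_outlier(other_counties, held_out)
print(f"predicted about {predicted_votes:.0f} votes, score {score:.1f}")
```

Because the errors are measured on the log scale, a miss of a given size in raw votes counts for less in a big county than in a small one, which is why an outlier tends to look less extreme here than in the raw-vote fit.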
Another way to look at the results is to compute the percentage of votes for each candidate within the counties. This has the potential to be even more misleading than using raw votes though, because it treats small counties equal to large counties (i.e., it throws out a piece of information, namely the population of the counties). For instance, if Pat Buchanan buys ten people beers in a bar in Liberty county, where fewer than 2500 people voted, Buchanan's percentage can jump dramatically. If Buchanan buys beers for ten people in Dade county, it would probably not be statistically detectable. By computing percentage of votes, this difference is obscured.
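The arithmetic behind the beer example can be sketched in a few lines. The two county sizes are hypothetical round numbers chosen for illustration (a small county with 2,400 voters and a large one with 600,000), and `share_shift` is our own helper name. It assumes ten voters switch to the candidate while turnout stays fixed.

```python
def share_shift(switched_votes, total_votes):
    """Percentage-point change in a candidate's share when `switched_votes`
    voters move to that candidate and total turnout stays the same."""
    return 100 * switched_votes / total_votes


small_total = 2_400    # hypothetical small county
large_total = 600_000  # hypothetical large county

small = share_shift(10, small_total)
large = share_shift(10, large_total)
print(f"small county: {small:.3f} points, large county: {large:.4f} points")
```

The same ten votes move the percentage 250 times as far in the small county, which is why small counties are noisy when every county's percentage is treated as equally informative.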
Finally, one could take the raw votes (or even the percentage of votes) and weight them by some function of each county's population. Some of the authors listed above also do this.
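One possible version of the weighting idea is sketched below. It regresses a candidate's votes on total votes and weights each county by 1 / (total votes), which amounts to assuming the error variance grows in proportion to county size. That particular weight is our assumption for illustration, not necessarily what any of the other authors used, and the data are toy numbers rather than the Florida returns.

```python
# Toy (total_votes, buchanan_votes) pairs, invented for illustration only.
counties = [(1_000, 6), (5_000, 22), (20_000, 95), (80_000, 310), (200_000, 900)]


def fit_weighted_line(points, weights):
    """Weighted least squares fit of y = intercept + slope * x.

    Minimizes sum(w * (y - intercept - slope * x) ** 2) over the points.
    """
    w_sum = sum(weights)
    mean_x = sum(w * x for w, (x, _) in zip(weights, points)) / w_sum
    mean_y = sum(w * y for w, (_, y) in zip(weights, points)) / w_sum
    sxx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, points))
    sxy = sum(
        w * (x - mean_x) * (y - mean_y) for w, (x, y) in zip(weights, points)
    )
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed weighting: counties with more voters get proportionally less weight
# per squared vote of error, since their errors are expected to be larger.
weights = [1 / total for total, _ in counties]
intercept, slope = fit_weighted_line(counties, weights)
print(f"intercept {intercept:.2f}, slope {slope:.5f}")
```

Other weight functions (for example, 1 / total squared) encode different beliefs about how the error grows with county size, so it is worth trying more than one before drawing conclusions.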
If one holds to the statistical assumptions of most of these models, and if Buchanan's unusual performance can be attributed to voters who intended to vote for Gore (an assumption that some have contested), then it can be claimed with a fairly high degree of statistical confidence that the mistakes cost Gore a significant share of votes. If Bush wins Florida by a small amount, a strong claim can thus be made that the confusion over the unique ballot structure in Palm Beach cost Gore the presidency.
Although the results support the contention that the ballot structure in Palm Beach cost Gore a significant share of votes, they do not prove it. Such a proof is impossible, because it requires eliminating all possible rival explanations, no matter how plausible or implausible they may be. We cannot rule out alternative factors, from any personal ties Buchanan has in Palm Beach, to equally confused Bush voters, to alien conspiracies. We can, however, comment on the statistical likelihood of such events, given various assumptions and our empirical observations.
Other Models
As noted above, there are lots of ways to estimate Buchanan's vote shares, and I've posted several data sets at the bottom of this page so that anyone can analyze the data the way they believe most appropriate. I am not averse to criticism, but I'm also not going to engage in or even referee debates concerning my analyses or anyone else's (I have enough email already, thank you). Interested readers are encouraged to compare the various models from the authors that I've linked and decide for themselves.
2000 Florida presidential election, county-level
2000 Florida primary elections, county-level
2000 Florida party registration, county-level
1996 Florida presidential election, county-level
1996 Florida Republican primary, county-level