There is No Satisfactory Form of Utilitarianism

In 2000, a paper by Gustaf Arrhenius called An Impossibility Theorem for Welfarist Axiologies showed that all forms of utilitarianism lead to at least one of three highly undesirable implications. The result is extremely dire for those of us who might have hoped that utilitarianism could give good answers to ethical questions about birth, death and human populations.

A “population axiology” is a way of combining the welfare of many people into a single measure of the welfare of everyone. For instance, adding a numerical measure of everyone’s happiness together is one axiology; averaging their wellbeings is another; summing the monetary wealth of the 1,000,000 poorest people is a third.

Note that two things vary about these population axiologies: what kind of welfare is being measured (“happiness”, “wellbeing”, “monetary wealth”), and how these quantities are combined across the population (“total”, “average”, “sum for the 1,000,000 worst-off people”). Utilitarian ethical theories basically require a population axiology, though there are many non-utilitarian axiologies too.

Arrhenius’ proof shows that any axiology which satisfies three basic sanity conditions[1] necessarily leads to at least one of three distressing conclusions. Each of these conclusions is deeply contrary to our ethical intuitions. Let us meet the three prongs of the trilemma:

Option one: The Repugnant Conclusion

For any population of very happy people, there exists a much larger population with lives barely worth living that is better than the group of very happy people (according to the population axiology).

Option two: The Sadistic Conclusion

Suppose we start with a population of very happy people. For any proposed addition of a sufficiently large number of almost-as-happy people, there is a small number of horribly tortured people that is (according to the population axiology) a preferable addition.

Option three: The Very Anti-Egalitarian Conclusion

For any population of two or more people which has uniform happiness, there exists another population of the same size which has lower total and average happiness, and is less equal, but is better (according to the population axiology).
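
To see how easily familiar axiologies fall into this trap, here is a minimal sketch — with invented welfare numbers, illustrative rather than taken from Arrhenius’ paper — showing that total utilitarianism implies the Repugnant Conclusion, while average utilitarianism implies the Sadistic Conclusion:

```python
def total_welfare(pop):
    return sum(pop)

def average_welfare(pop):
    return sum(pop) / len(pop)

happy = [100.0] * 100                    # 100 very happy people

# Total utilitarianism -> Repugnant Conclusion: a vast population of
# lives barely worth living outscores the very happy population.
barely_worth_living = [0.01] * 1_000_100
assert total_welfare(barely_worth_living) > total_welfare(happy)

# Average utilitarianism -> Sadistic Conclusion: adding one horribly
# tortured person beats adding many almost-as-happy people, because
# the tortured person drags the average down less than they do.
with_many_almost_happy = happy + [90.0] * 1000
with_one_tortured = happy + [-50.0]
assert average_welfare(with_one_tortured) > average_welfare(with_many_almost_happy)
```

Escaping one conclusion by switching axiologies simply lands you on another; that is the force of the impossibility result.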

Utilitarianism needs to do a great deal of wriggling to escape these implications. For instance, some of the STM authors favour variants of utilitarianism that are limited to pre-defined populations: we can say what is best in a world containing Mary, Fred and Jane, but not whether the world is better if Abigail also lives in it. Such positions are coherent, but they are unable to address important ethical questions: Should we have children? Should we try to prevent overpopulation? When is it wrong to abort a fetus?

Topic for discussion: is Arrhenius’ impossibility theorem more serious for the project of using utilitarianism to answer questions of public policy than Arrow’s theorem is for the project of using voting systems to elect politicians?



[1] The basic sanity conditions are almost immune to reasonable disagreement. They are:

  • The “dominance condition”: if population A is the same size as population B, and every person in A is happier than their equivalent in B, then A is better than B;
  • The “addition principle”: if it is bad to add a group of people B to population A, where the people in B are worse off than those in A, then it is at least as bad to add a group of people C, where C is larger than B and those in C are worse off than those in B;
  • The “minimal non-extreme priority principle”: there exists a number n such that adding n extremely well-off people and a single person of slightly negative welfare to an original population A is better than not adding anyone to A. It is notable that the Difference Principle advocated by John Rawls does not in general satisfy this principle and is therefore not subject to the Arrhenius theorem.

Misattribution of Arousal

In 1974, Donald Dutton and Arthur Aron published an important paper on what the psychology literature calls misattribution of arousal. Their study involved having both a man and a woman stop men on a bridge and administer a questionnaire that included some basic questions and imagination exercises. After the questionnaire was filled out, the person administering it offered his or her phone number and an invitation to talk more about the study later. The researchers did this on two different bridges: one rather scary rope bridge, and one more solid bridge made of wood. What they found was that the men who talked to the woman on the scary bridge were somewhat more likely to come up with sexual stories and much more likely to call up the person administering the questionnaire.

Dutton and Aron’s paper went on to describe a series of other experiments getting at the same phenomenon, including one in which they told male research subjects in the presence of an attractive woman that they were going to be given either strong or weak electric shocks. They asked the men how attracted they were to the woman and, sure enough, the men who thought they were going to be given the strong shocks described themselves as more sexually attracted.

The basic finding was that when men became excited — either because they thought they were going to be shocked or because they were dangling over a canyon — they mistook their physiologically aroused state for one brought on by sexual attraction. The experiment’s subjects “misattributed” part of their arousal to sexual attraction.

In the recent bestseller Blink, Malcolm Gladwell argues that “thin-slicing” — our instantaneous decisions, often based on physiological reactions — is frequently as good as or even better than careful planning and deliberation. The concept has received a huge amount of attention recently due to the popularity of Gladwell’s book.

Dutton and Aron’s work, and a host of work that came both before and after, provides a useful set of counterexamples. While our basic reactions can be useful, our minds are not always very good at understanding why we feel what we feel. Feelings as basic as who we find attractive are influenced, often heavily, by our environment in ways we do not and perhaps cannot easily understand. Gladwell focuses on the way we make correct decisions quickly, but there’s another side to that coin. Actions influenced by misattribution of arousal and of other physiological states are much more common than we realize.

Are Whites More Supportive of the Death Penalty for Blacks?

In 2001 political scientists Mark Peffley and John Hurwitz ran a survey experiment, putting one of two question wordings to respondents. Among whites, the first wording found:

Do you favor or oppose the death penalty for persons convicted of murder?

Somewhat favor: 29%
Strongly favor: 36%

The second one found:

Some people say that the death penalty is unfair because most of the people who are executed are African-Americans. Do you favor or oppose the death penalty for persons convicted of murder?

Somewhat favor: 25%
Strongly favor: 52%

Total support jumped 12 points from 65% to 77%, while strong support jumped 16 points (36% to 52%).

Additional results: Another condition replaced the claim that most of those executed are African-Americans with a claim that many innocent people are executed, and this had no significant effect on white support. Among black respondents, the pattern reversed: 50% favored the death penalty under the neutral phrasing, 38% in the “mostly black” condition, and 34% in the “many innocents” condition.

Unit Bias

Last year, a group of psychology researchers at the University of Pennsylvania published a paper on what they call “unit bias.”

In their study, the researchers topped up a bowl of M&Ms in the hallway of an apartment building each day. Next to the bowl was a scoop and a sign that said, “Eat Your Fill: please use the spoon to serve yourself.” Some days, they provided a tablespoon-sized scoop. Other days, they provided a quarter-cup-sized scoop (quadruple the size). The result? An average of 1.67 times more M&Ms were taken on the days with the big scoop than on the days with the small one.

The researchers did similar experiments with large and small tootsie rolls and with halved and unbroken pretzels. The results were similar in each case: when the “unit” of food was larger, people took more. While the amount taken did not vary in proportion to size of the unit (i.e., people did not take 4 times as many M&Ms), the effect was clear. The researchers concluded that, “consumption norms promote both the tendency to complete eating a unit and the idea that a single unit is the proper portion.”
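
A quick way to see the “unit” at work in these numbers (my arithmetic, not the paper’s): the scoop quadrupled in size, but consumption rose by much less than that.

```python
scoop_size_ratio = 4.0     # quarter-cup scoop vs. tablespoon scoop
candy_taken_ratio = 1.67   # M&Ms taken on big-scoop days vs. small-scoop days

# If people took a fixed amount of candy regardless of scoop, this
# ratio would be 1.0; if they took a fixed number of scoops, it would
# be 4.0. The observed 1.67 sits in between: with the big scoop people
# took far fewer scoops, but each scoop was big enough that total
# consumption still rose substantially.
scoops_taken_ratio = candy_taken_ratio / scoop_size_ratio
print(f"big-scoop days used {scoops_taken_ratio:.2f}x as many scoops")
```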

Dan Lockton of the Architectures of Control blog has ruminated on the implications of this research, describing the way it plays into the larger portion sizes of more expensive items (versus two-for-one deals) at fast-food chains like McDonald’s.

Is the net effect of health care zero?

UPDATE: Nyman (2007) points out the RAND study discussed in this piece had a terrible flaw that undermines its argument. See the update below.

In the 1970s, the RAND Corporation picked out 7700 people in six cities and gave half of them free health care. Those lucky ones took advantage of it (spending 30-40% more on average) and they spent it on reasonable things (as judged by medical observers), but they didn’t seem to get any healthier. As the study put it:

For the average participant, as well as for subgroups differing in income and initial health status, no significant effects were detected on eight other measures of health status and health habits. Confidence intervals for these eight measures were sufficiently narrow to rule out all but a minimal influence, favorable or adverse, of free care for the average participant.

The only exceptions: an improvement in vision (not too surprising that free glasses will help people see better) and an improvement in blood pressure. But for the latter, critics point out that when you test this many metrics, you’re statistically likely to see an improvement in one of them by chance alone, even if none of the metrics actually improve.

The RAND study was by far the biggest study of this kind, but other studies find similar results. One analysis found that regions whose Medicare programs give out more money (when the underlying healthiness of the residents is held constant) see no increase in survival rates. A replication found the same results in VA hospitals. Cross-national comparisons find “the impact of public spending on health is … both numerically small and statistically insignificant”. Correlational studies find “Environmental variables are far more important than medical care.” And there are more where that came from.

There are two possible explanations. One is that, as Robin Hanson puts it, medicine is a scandal. “[T]he medical research literature must suffer from severe biases, such as fraud, funding bias, treatment selection bias, publication selection bias, leaky placebo effects, misapplied statistics, and so on. How else can we square the usual positive benefit found in medical publications with a net zero benefit?”

The other is that each individual treatment is effective on the particular grounds measured over the particular time period investigated, but that this only leaves people open to other health problems. Drug-coated stents, for example, are effective at opening blocked heart arteries, but they appear to also cause blood clots.

(Addendum by pde: there is an important third possibility, which is that the 30-40% of extra spending amongst the group that got free health cover simply didn’t result in any better treatment: they might have been charged more, given more hand-holding, and more treatments of unclear benefit, while the control group were getting equally good/bad access to the treatments that are extremely beneficial, such as early excision of melanoma or insulin for type 1 diabetes. It may be that social and cultural factors, rather than ability to pay, determine who gets access to good health care.)

UPDATE: Nyman (2007) points out that one reason this could be the case is that people in the subgroup that had to pay for their health care could voluntarily leave the study if they were sick, returning to their previous insurance regime where they may not have had to pay as much for treatment. And, indeed, he finds that 16 times as many people voluntarily left the pay subgroup as the free subgroup. This would seem to severely throw these findings into question.

As Richard Lewontin argued many years ago:

What is the evidence for the benefits of modern scientific medicine? Certainly we live a great deal longer than our ancestors. [... But a] very large fraction of the change [...] is a tremendous reduction in infant mortality. [... I]n 1860, the infant mortality rate in the U.S. was 13 percent–so the average life expectancy for the population as a whole was reduced considerably by this early death. The gravestones of people who died in the middle of the nineteenth century indicate a remarkable number of deaths at an old age. In fact, scientific medicine has done little to add years for people who have already reached their maturity. In the last 50 years, only four months have been added to the expected life span of a person who is already 60 years old.

[...] As far as we can tell, the decrease in death rates from the infectious killers of the nineteenth century is a consequence of the general improvement in nutrition and is related to an increase in the real wage. In countries like Brazil today, infant mortality rises and falls with decreases and increases in the minimum wage. (42ff)

Of course, universal health care is a lot more politically palatable than a universal living wage.

(Thanks to jmc for suggesting this piece and to Robin Hanson and his site Overcoming Bias for collecting so many resources on it. Hanson is starting a petition to request a larger version of the RAND study.)

Sexism, and other -isms of science

It is tempting to think that serious sexism died in the 1970s. These days, overt gender discrimination is unusual, and slightly risky for its perpetrators. But a classic study, conducted in Sweden in 1995, found that sexism (and perhaps less surprisingly, nepotism) had retained a preeminent role in the allocation of scientific jobs and the making-or-breaking of scientific careers.

Christine Wennerås and Agnes Wold analysed the application process for Swedish post-doctoral medical research fellowships that year (they had to make freedom of information requests to get their data). They observed a field of 114 applicants competing for 20 jobs. 46% of the applicants were female, but only 4 of them won positions.

Using multiple regressions, the authors estimated which characteristics of candidates led to high “scientific competence” ratings from the reviewers: what was the relative importance of educational background, publication and citation records,[1] the applicant’s gender, the presence of relationships to assessors and other factors?[2]
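
For readers unfamiliar with the method, here is a toy version of that kind of regression. The data below are entirely synthetic — the coefficients are invented, not the study’s — and serve only to show how a gender penalty can be estimated while holding productivity and connections constant:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 114  # same number of applicants as in the study; the data are invented

impact = rng.normal(10, 3, n)        # publication impact points
female = rng.integers(0, 2, n)       # 1 = female applicant
connected = rng.integers(0, 2, n)    # 1 = knows a committee member

# Build competence scores with a baked-in penalty for women (-0.2)
# and bonus for committee connections (+0.2), plus noise.
score = (2.0 + 0.15 * impact - 0.2 * female + 0.2 * connected
         + rng.normal(0, 0.1, n))

# Ordinary least squares recovers the effect of each characteristic
# while holding the others constant.
X = np.column_stack([np.ones(n), impact, female, connected])
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
for name, c in zip(["intercept", "impact", "female", "connected"], coef):
    print(f"{name}: {c:+.2f}")
```

The estimated “female” coefficient comes out near the −0.2 we baked in; Wennerås and Wold’s contribution was running exactly this kind of model on the real peer-review scores, where no penalty should have existed at all.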

The results are shocking. Being female was a major liability: a candidate would need to have 3 extra articles in Nature or Science (or 20 in decent specialist journals) just to counteract the disadvantages she faced for being a woman. There were two women in the pool so prolific that they won post-doc jobs this way, but for most good female scientists there was only one hope for getting a position: knowing someone on the review committee.

Wennerås and Wold measured these personal connections by observing whether a member of the committee recused himself from reviewing the application because he knew the applicant. The presence of such a relationship conferred an advantage of similar size to the advantage of being male.

Sweden has a reputation for some of the most progressive attitudes and policies on gender relations in the world. It is disturbing that despite this, the patriarchy (accidental or otherwise) was still firmly in place in 1995. The most hopeful explanatory hypothesis that Wennerås and Wold offered for their results was that 90% of the application reviewers were male. But it’s hard to say when that will change.


  1. Variables were included for total number of publications, total number of publications weighted by impact factor, first-author publications weighted by impact factor, total citations, and total citations to the candidate’s first-author publications. 

  2. The other factors were letters of recommendation, field of research, foreign nationality, and overseas experience. 

Can humans act utilitarian?

One of the most important schools of ethical thought is consequentialism, which holds that the best actions (or rules, or ways of making decisions) are simply the ones that lead to the best outcomes. When acts are bad (hitting someone with a stick, for instance) it is not because of the deed itself but because of the results that follow — pain, injury, lost friendships. Failing to intercede to prevent something bad from happening to someone else is almost as bad as taking the action yourself.[1]

The largest branch of consequentialism is utilitarianism. Utilitarians hold that the “best” outcomes are those which are the best for people collectively: “the greatest good for the greatest number”, as Bentham put it.

Utilitarianism calls for two things: altruism and calculation. It tells us, “if you know that the benefit that you would get from this hundred dollar note is less than the benefit that your impoverished friend Susan would get from a second-hand bicycle, you should buy her the bicycle.” And it tells us, “if you know your $100 could save a life in Darfur, you should send it to a humanitarian organisation there instead”. In fact, if lives can be saved for such small amounts, maybe we should be sending more than $100.

A recent study by Deborah Small, George Loewenstein and Paul Slovic demonstrates that, although human beings are capable of altruism, our altruism is in some sense psychologically incompatible with the kind of rational calculation we’d need to perform to be good act-utilitarians.

The experiment by Small et al. shows clearly that human beings[2] donate significantly more money to help the victims of catastrophes when two conditions hold: (a) the victim is an identifiable individual, rather than an undetermined individual or a large group in need; and (b) the donor is reasoning emotionally.[3]

When the experimental subjects were told about the human tendency to donate to identified individuals in need (rather than large groups in need), they stopped reasoning emotionally. That change halved donations to identified individuals, but did not affect the already-low donations to groups!

When the authors “primed” some experimental subjects with emotion-based tasks (“how does the word ‘baby’ make you feel?”), and others with mathematical tasks, they observed that the emotionally-primed subjects gave twice as much to identified individuals. Both groups gave similar, low amounts to groups in need.

There are some powerful logical arguments in favour of act-utilitarianism and similar ethical positions. But until we find a way to train, trick, or teach ourselves to live by them, these philosophies will remain incomplete.

Thanks to Toby Ord for suggesting this paper.


  1. From a consequentialist perspective, the main difference between sins of commission (hitting someone with a stick) and sins of omission (failing to stop a branch falling on someone) is that we can’t usually predict events precisely when we aren’t causing them, and we can’t be sure of our ability to prevent them. There are psychological differences too: we might lose a friend for the first action but not the second. 

  2. The results apply to human beings or, at least, to students sitting on their own in a cafeteria at a “University in Pennsylvania”. It would be worth repeating the experiment with other demographics, especially those with more experience of philanthropy. 

  3. Both (a) and (b) were already in the preceding literature; Small et al. show that altruism increases only when they both hold. 

Who truly governs America’s cities?

Who Governs? is a widely-hailed classic in the field of political science; it was the book that basically made the career of “the Dean of American political scientists”: Robert A. Dahl. In it, Dahl attempts to discover how government really works in America. To do this, he decides to study decision-making in a typical American city — namely, the one outside his office at Yale University: New Haven, Connecticut.

Who Governs? argues that New Haven worked according to Dahl’s theory of “pluralism”: elite political groups exist, but they aren’t very powerful. Instead, they balance each other out, leaving politicians (and thus their voters) firmly in control.

Fifteen years later, the political sociologist G. William Domhoff went over the issues covered in the book (including Dahl’s notes and sources, which Dahl was honest enough to share) only to find that Dahl had badly bungled the research. Upon closer review, Dahl’s own notes, plus a few new sources, revealed exactly the opposite story. This is the story of how Dahl got things so badly wrong.

Finding the elite

Dahl begins by claiming that there’s little overlap between the city’s social and economic elite, part of his argument that different groups of elites balance each other out. So he counts company presidents, individuals with significant property in the city, directors of multiple sizable city firms, and any director of a bank in the city. Then he takes this list and sees how many of them attended the New Haven Lawn Club debutante ball. He doesn’t find much overlap.

Domhoff points out that this is kind of an odd metric. For one thing, not all the elites go to the debutante ball, while many people from out of town do. So instead he says any member of one of New Haven’s three elite social clubs is a social elite, while anyone who’s a director of one of New Haven’s ten most interlocked firms is an economic elite. (Firms are interlocked when they share members of their board of directors.) He finds incredible overlap — of the entire corporate network, 55% are in a social club; of those on two boards, 80% are. So much for that.

Deciding urban renewal

But the bulk of Dahl’s study is his attempt to see who actually makes decisions on three important issues. He picks (arbitrarily) political nominations, public education, and urban renewal. For each, he interviews the major players to find out how the relevant decisions got made. Domhoff points out that political nominations are rather uninteresting, since they’re internal party disputes, and that elites don’t care about public education, since they all live in the suburbs or send their kids to private schools. Which leaves urban renewal.

New Haven went through a massive urban renewal shortly before Dahl’s study and Dahl claims it was orchestrated by the city’s mayor, who heroically fought resistance-to-change on all fronts, selflessly ensuring what was best for New Haven. (The urban renewal project in fact ended up completely destroying New Haven’s downtown, but that’s a separate story.) As Dahl quotes the mayor: “Redevelopment in New Haven began in February of ’55. We had to start from scratch and assemble a team and start to file all the papers and get the whole program launched.” But Dahl omits a key piece of context.

Urban renewal had in fact been in the works for years, at the insistence of the town’s Chamber of Commerce. When the new mayor took office, the Chamber of Commerce quickly organized a meeting with him at which “the entire program [of urban renewal] would be explained to him and he would be urged to get action started on the program” (as their own minutes described it). A representative met with Mr. Lee at one of the elite social clubs and reported back that “Mr. Lee said he was in entire agreement with [our] program for action.”

So why did Lee claim that he had to start from scratch? Turns out, the city was having trouble getting some of their filings approved, so they decided to try a new strategy and assemble a new team, which began by refiling all the relevant permits. But this was just a technical detail — the urban renewal plans themselves had long been in the works.

Normally in science, you refute someone’s results by conducting the same form of research yourself under different circumstances. But Domhoff went much further: he reexamined the very same research that Dahl conducted, even using Dahl’s own notes and transcripts. But the conclusions he came to were wildly different. It’s hard to think of a more stunning refutation. Not that political science was interested in hearing it. Dahl remains the field’s idol, while Domhoff is an obscure professor at UC Santa Cruz.

What’s the best way to fight drugs?

In 1994, the RAND Corporation, a major US military think tank, conducted a massive study (with funding from the Office of National Drug Control Policy, the US Army, and the Ford Foundation) to measure the effectiveness of various forms of preventing the use of illegal drugs, particularly cocaine.

They analyzed a variety of popular methods and calculated how much it would cost to use each method to reduce cocaine consumption in the US by 1%. Source-country control — military programs to destroy drug production in countries like Peru, Bolivia, and Colombia — is not just devastating to poor third-world citizens; it’s also the least effective option, costing $783 million for a 1% reduction. Interdiction — seizing the drugs at the border — is a much better deal, costing only $366 million. Domestic law enforcement — arresting drug dealers and such — is even better, at $246 million. But all of those are blown completely out of the water by the final option: funding treatment programs for drug addicts would reduce drug use by 1% at a cost of only $34 million.

In other words, for every dollar spent trying to stop drugs through source-country control, we could get the equivalent of more than twenty dollars of benefit by spending the same money on treatment. This isn’t a bunch of hippy liberals saying this. This is a government think tank, sponsored by the US Army.
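
The ratios follow directly from RAND’s cost figures; a quick sketch of the arithmetic (the labels are mine, the dollar figures are the ones quoted above):

```python
# Cost, in millions of dollars, for each method to cut US cocaine
# consumption by 1% (RAND, 1994).
costs_millions = {
    "source-country control": 783,
    "interdiction": 366,
    "domestic law enforcement": 246,
    "treatment": 34,
}

# How many dollars each method needs to match one dollar of treatment.
for method, cost in sorted(costs_millions.items(), key=lambda kv: -kv[1]):
    ratio = cost / costs_millions["treatment"]
    print(f"{method}: {ratio:.1f}x the cost of treatment")
```

Source-country control comes out at roughly 23 times the cost of treatment per unit of effect, which is where the “twenty dollars of benefit per dollar” comparison comes from.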

Evo psych error roundup

An influential group of biologists, psychologists, and other busybodies has for decades promoted the idea that the social sciences should be grounded in the ideas of evolution — that human behavior should be predicted from estimates of what evolution would do. The idea has been heavily promoted from the 1970s, when it was called sociobiology, to today, when it’s called evolutionary psychology (evo psych for short), but little in the way of compelling evidence has been produced. Today, we’ll focus on some less than compelling evidence.

Exhibit A: One common (and characteristically offensive) claim among evopsychers is that your mother’s mother will spend more time caring for you than your father’s mother because — naturally enough — your father’s mother isn’t evolutionarily certain that you have her DNA, since your mother could have been impregnated by any one of tons of guys. The data does indeed seem to bear this out, but sadly this is no win for the evopsychers, since there are some perfectly competent alternative explanations: kids are usually primarily raised by their mothers, and it’s not surprising that those mothers will look to their own mothers for help. (via Jeremy Freese)

Exhibit B: In 1995, Christenfeld and Hill argued that since fathers were so unsure if kids were really theirs, evolution would ensure that kids looked more like their fathers than their mothers, so that they wouldn’t be abandoned by deadbeat dads. And, sure enough, they had some students rate whether kids looked more like their father or mother and found that they looked more like their father. Robert French later redid the study, only to find that he couldn’t replicate the results. Oops. (via Mark-Jason Dominus)

Exhibit C: In 1993, Devendra Singh spent months poring over old copies of Playboy — for science, of course. He set about measuring the waist-to-hip ratios of Playboy models and Miss America winners, concluding that the ratio had remained relatively constant — approximately 0.70 — even as the models had gotten thinner over the years. He argued that men were evolutionarily wired to find this “hourglass shape” attractive. The result has been quoted in just about every evopsych textbook and news article since. Well, Jeremy Freese and Sheri Meland checked the numbers and found — once again — that none of it was true. There have actually been statistically significant changes in waist-to-hip ratios over time. (original article)