# Lewontin's Fallacy

Infobox: Multi-locus allele clusters

In a haploid population, consider a single locus (blue) with two alleles, + and -, that shows a different geographical distribution in Population I (70% +) and Population II (30% +). To assign an individual to one of these populations using this locus alone, we assign any + to Population I. Assuming the two populations are equally represented, the probability that a + allele comes from Population I is p = 0.7, and the probability of incorrectly assigning it to Population I is q = 1 - p = 0.3. This is a Bernoulli trial, because the answer to the question "is this the correct population?" is a simple yes or no; equivalently, the test is binomially distributed with a single trial.

When three loci per individual are considered, each with p = 0.7 for a + allele in Population I, the average number of + alleles per individual is kp = 2.1 in Population I (the number of trials, k = 3, times the per-locus probability, p = 0.7) and 0.9 (3 × 0.3) in Population II. This is sometimes referred to as the population trait value. Because alleles are discrete entities, an individual can only be assigned on the basis of a whole number of + alleles: any individual with two or three + alleles is assigned to Population I, and any individual with one or zero + alleles to Population II. The binomial distribution with three trials and a probability of 0.7 gives a probability of 0.189 that an individual from Population I has exactly one + allele and 0.027 that it has none, so the misclassification rate is 0.189 + 0.027 = 0.216, lower than the 0.3 obtained with a single locus. Each additional locus adds another independent trial to the binomial distribution, so the chance of misclassification falls as more loci are used.

Using modern computer software and the abundance of genetic data now available, it is possible to detect such correlations across hundreds or even thousands of alleles, which form clusters, and to assign individuals to given populations with very little chance of error. Genes tend to vary clinally, however, and intermediate populations are likely to reside in the geographical areas between the sampled populations (Population III, for example, may lie equidistant from Population I and Population II). Such a population may display characteristics of both Population I and Population II and have intermediate frequencies for many of the alleles used for classification, making it more prone to misclassification.
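The binomial arithmetic in the worked example above can be reproduced with a short Python sketch. It assumes the same toy model (independent loci, + allele frequency p in Population I and 1 - p in Population II) and a majority-count assignment rule; the function name `misclassification_rate` and the choice to count ties as half an error are illustrative assumptions, not part of the original example.

```python
from math import comb


def misclassification_rate(k: int, p: float = 0.7) -> float:
    """Chance that an individual from Population I is assigned to Population II.

    Toy model (an assumption for illustration): k independent loci, each
    carrying a + allele with probability p in Population I. The individual is
    assigned to Population I when more than half of its loci carry +.
    Ties, which can only occur for even k, are counted as half an error.
    """
    error = 0.0
    for plus in range(k + 1):
        # Binomial probability of observing exactly `plus` + alleles.
        prob = comb(k, plus) * p**plus * (1 - p) ** (k - plus)
        if 2 * plus < k:
            error += prob          # minority of + alleles: misassigned
        elif 2 * plus == k:
            error += 0.5 * prob    # tie: assumed to be broken at random
    return error


# Odd numbers of loci avoid ties, so the rule matches the text exactly.
for k in (1, 3, 11, 51, 101):
    print(f"{k:>3} loci: {misclassification_rate(k):.4f}")
```

For one and three loci the function returns 0.3 and 0.216, matching the figures above. The rate keeps falling as loci are added, but only because the sketch treats the loci as independent and the frequency difference as large.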

"Human Genetic Diversity: Lewontin's Fallacy" is a 2003 paper by A.W.F. Edwards that criticizes Richard Lewontin's 1972 conclusion[1] that race is an invalid taxonomic construct. Although academic texts generally avoid referring to Edwards' counterargument in the polemical terms of his original title, the critique nevertheless appears in a number of subsequent academic and popular science books that discuss Lewontin's thesis.[2][3]

## Lewontin's argument

Lewontin argued that because the overwhelming majority of human genetic variation (about 85%) is found between individuals within the same population, and about 6–10% between populations within the same continent, racial classification can account for only 5–10% of human variation and is therefore of virtually no genetic or taxonomic significance. This argument is widely cited as evidence that racial categories are biologically meaningless, and that behavioral differences between groups cannot have any genetic underpinnings.
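To make the idea of apportioning variation concrete, the sketch below partitions diversity at a single two-allele locus into within-population and between-population shares. It is only an illustration: Lewontin's own analysis used a Shannon information measure, whereas this sketch swaps in expected heterozygosity (an F_ST-style partition), and the allele frequencies 0.7 and 0.3, the equal population sizes, and the function name are all hypothetical.

```python
def between_population_share(freqs):
    """Share of expected heterozygosity lying between populations.

    Assumes one biallelic locus, equally sized populations, and `freqs`
    holding the frequency of one allele in each population. The locus must
    be variable overall, otherwise the total diversity is zero.
    """
    mean_freq = sum(freqs) / len(freqs)
    # Diversity if all populations were pooled into one.
    total = 2 * mean_freq * (1 - mean_freq)
    # Average diversity inside each population.
    within = sum(2 * f * (1 - f) for f in freqs) / len(freqs)
    return (total - within) / total


share = between_population_share([0.7, 0.3])
print(f"between populations: {share:.0%}, within populations: {1 - share:.0%}")
# prints "between populations: 16%, within populations: 84%"
```

Even with frequencies as different as 70% and 30%, most of the diversity at a single locus is found within each population, which is the pattern Lewontin's figures describe.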

## Edwards' critique

Edwards argued that while Lewontin's statements on variability are correct when the frequencies of individual loci are examined one at a time, it is nonetheless possible to classify individuals into different racial groups with an accuracy that approaches 100% as more loci are taken into account. This happens because differences at different loci are correlated across populations: the alleles that are more frequent in a population at one locus tend to occur together with the alleles that are more frequent in that population at other loci, a pattern that appears only when the populations are considered simultaneously.

In Edwards' words, "most of the information that distinguishes populations is hidden in the correlation structure of the data." These correlations can be extracted using commonly used ordination and cluster analysis techniques. As Edwards showed, even if the probability of misclassifying an individual based on a single locus is as high as 30% (as Lewontin reported in 1972), the misclassification probability based on 10 loci can drop to just a few percent.

Neven Sesardic has pointed out that, unbeknownst to Edwards, Jeffry B. Mitton already made the same argument about Lewontin's claim in two articles published in The American Naturalist in the late 1970s.[4]

## Genetic clusters and the fallacy

Studies of human genetic clustering have shown that people can be accurately classified into racial groups using correlations between alleles from multiple loci. For instance, a 2001 paper by Wilson et al. reported that an analysis of 39 microsatellite loci divided their sample of 354 individuals into four natural clusters, which broadly correspond to four geographical areas (Western Eurasia, Sub-Saharan Africa, China, and New Guinea).[5]

On the other hand, the results obtained by clustering analyses depend on several factors:

• The clusters produced are relative, not absolute. Each cluster is the product of comparisons between the sets of data gathered for the study, so the results are highly influenced by sampling strategy, and different sampling strategies will produce different clusters. (Edwards, 2003)
• The geographic distribution of the populations sampled. Because human genetic diversity is marked by isolation by distance, populations from geographically distant regions will form much more discrete clusters than those from geographically close regions.[6]
• The number of genes used. The more genes used in a study, the greater the resolution and therefore the greater the number of clusters that will be identified.[7]

Whether or not Lewontin's Fallacy is a fallacy depends on how one defines the concept of "difference" between two genomes.[8] Talking of two genomes being "more similar" or "less similar" implies the existence of a metric. The most naive metric, and the one used implicitly by Lewontin, is simply to count the number of differing SNPs. Edwards's criticism of Lewontin amounts to the statement that it is a "fallacy" to use this naive metric, because some SNPs may be, in a meaningful way, more significant than other SNPs.

If differences are considered to exist when individuals can be accurately classified using a single randomly chosen trait, then Lewontin's results imply that human races are not distinct in this sense. If, on the other hand, "real differences" are considered to exist when individuals can be accurately classified using a number of traits, then accurate and meaningful classification of human groups is possible. The ability to accurately classify groups using multiple loci is, of course, not simply a property of populations from different continents: the individuals of any two populations can be accurately classified in this manner if enough loci are used.

Conversely, in the paper "Genetic similarities within and between human populations", Witherspoon et al. (2007) show that even when individuals can be reliably assigned to specific population groups, it is still possible for two randomly chosen individuals from different populations or clusters to be more similar to each other than to a randomly chosen member of their own cluster. This is because multilocus clustering relies on population-level similarities rather than individual similarities, so that each individual is classified according to its similarity to the typical genotype of a given population.[9]

The paper claims that this masks a great deal of genetic similarity between individuals belonging to different clusters. In other words, two individuals from different clusters can be more similar to each other than to a member of their own cluster, while both remain more similar to the typical genotype of their own cluster than to the typical genotype of a different cluster. Witherspoon et al. argue that the question "How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?" is not adequately addressed by multilocus clustering analyses. When they tested differences between individual pairs of people, they found that even for just three population groups separated by large geographic ranges (European, African and East Asian), many thousands of loci are required before the answer becomes "never".[10]
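The pairwise question quoted above can be explored with a small simulation. The sketch below is a toy under stated assumptions, not a reproduction of the analysis by Witherspoon et al.: it uses haploid individuals, independent loci, a simple count of mismatched alleles as the distance, and hypothetical + allele frequencies of 0.7 and 0.3, a far larger gap than is typical between human populations. All function names are illustrative.

```python
import random


def sample_individual(rng, n_loci, plus_freq):
    """Draw a haploid genotype: True marks a + allele at that locus."""
    return [rng.random() < plus_freq for _ in range(n_loci)]


def mismatches(a, b):
    """Number of loci at which two genotypes carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def dissimilarity_fraction(n_loci, trials=2000, freq_one=0.7, freq_two=0.3, seed=0):
    """Estimate how often a within-population pair is more dissimilar than a
    between-population pair. Ties are not counted as 'more dissimilar'."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # Two individuals drawn from the same population.
        within = mismatches(sample_individual(rng, n_loci, freq_one),
                            sample_individual(rng, n_loci, freq_one))
        # Two individuals drawn from different populations.
        between = mismatches(sample_individual(rng, n_loci, freq_one),
                             sample_individual(rng, n_loci, freq_two))
        if within > between:
            count += 1
    return count / trials


for n_loci in (1, 10, 100):
    print(n_loci, dissimilarity_fraction(n_loci))
```

In this toy model the fraction shrinks quickly as loci are added because the assumed frequency gap is so large. With the much smaller frequency differences seen between real populations, the decline would be far slower, which is consistent with the paper's finding that many thousands of loci are needed.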

On the other hand, accurate classification of the global population must include more closely related and admixed populations, which will raise this frequency above zero. The authors therefore state: "In a similar vein, Romualdi et al. (2002) and Serre and Paabo (2004) have suggested that highly accurate classification of individuals from continuously sampled (and therefore closely related) populations may be impossible". Witherspoon et al. conclude: "The fact that, given enough genetic data, individuals can be correctly assigned to their populations of origin is compatible with the observation that most human genetic variation is found within populations, not between them. It is also compatible with our finding that, even when the most distinct populations are considered and hundreds of loci are used, individuals are frequently more similar to members of other populations than to members of their own population".[11]

## References

1. ^ Made in The apportionment of human diversity (1972) and again in the 1974 book The Genetic Basis of Evolutionary Change.[page needed]
2. ^ The Ancestor's Tale: A Pilgrimage to the Dawn of Evolution by Richard Dawkins and Yan Wong http://books.google.com/books?id=rR9XPnaqvCMC&pg=PA406
3. ^ Sohini Ramachandran, Hua Tang, Ryan N. Gutenkunst, and Carlos D. Bustamante, Genetics and Genomics of Human Population Structure, chapter 20 in M.R. Speicher et al. (eds.), Vogel and Motulsky’s Human Genetics: Problems and Approaches, 4th ed., Springer, 2010, ISBN 3540376534, p. 596
4. ^ Sesardic, Neven (2010). "Race: A Social Destruction of a Biological Concept". Biology & Philosophy 25: 143. doi:10.1007/s10539-009-9193-7.  Mitton's articles are the following:
• Mitton, Jeffry B. (April 1977). "Genetic Differentiation of Races of Man as Judged by Single-Locus and Multilocus Analyses". The American Naturalist 111 (978): 203–212. doi:10.1086/283155.
• Mitton, Jeffry B. (1978). "Measurement of Differentiation: Reply to Lewontin, Powell, and Taylor". The American Naturalist 112 (988): 1142–1144. doi:10.1086/283359.
5. ^ Wilson JF, Weale ME, Smith AC, et al. (November 2001). "Population genetic structure of variable drug response". Nature Genetics 29 (3): 265–9. doi:10.1038/ng761. PMID 11685208.
6. ^ Kittles RA, Weiss KM (2003). "Race, ancestry, and genes: implications for defining disease risk". Annual Review of Genomics and Human Genetics 4: 33–67. doi:10.1146/annurev.genom.4.070802.110356. PMID 14527296.
7. ^ Tang H, Quertermous T, Rodriguez B, et al. (February 2005). "Genetic structure, self-identified race/ethnicity, and confounding in case-control association studies". American Journal of Human Genetics 76 (2): 268–75. doi:10.1086/427888. PMID 15625622.
8. ^ Chakraborty, Ranajit (September 1982). "Allocation Versus Variation: The Issue of Genetic Differences Between Human Racial Groups". The American Naturalist 120 (3): 403–4. doi:10.1086/283998.
9. ^ Witherspoon DJ, Wooding S, Rogers AR, et al. (May 2007). "Genetic similarities within and between human populations". Genetics 176 (1): 351–9. doi:10.1534/genetics.106.067355. PMID 17339205.
10. ^ Witherspoon DJ, Wooding S, Rogers AR, et al. (May 2007). "Genetic similarities within and between human populations". Genetics 176 (1): 351–9. doi:10.1534/genetics.106.067355. PMID 17339205.
11. ^ Witherspoon DJ, Wooding S, Rogers AR, et al. (May 2007). "Genetic similarities within and between human populations". Genetics 176 (1): 351–9. doi:10.1534/genetics.106.067355. PMID 17339205.