
The genetic architecture of normal variation in human pigmentation: an evolutionary perspective and model


Skin pigmentation varies substantially across human populations in a manner largely coincident with ultraviolet radiation intensity. This observation suggests that natural selection in response to sunlight is a major force in accounting for pigmentation variability. We review recent progress in identifying the genes controlling this variation with a particular focus on the trait's evolutionary past and the potential role of testing for signatures of selection in aiding the discovery of functionally important genes. We have analyzed SNP data from the International HapMap project in 77 pigmentation candidate genes for such signatures. On the basis of these results and other similar work, we provide a tentative three-population model (West Africa, East Asia and North Europe) of the evolutionary–genetic architecture of human pigmentation. These results suggest a complex evolutionary history, with selection acting on different gene targets at different times and places in the human past. Some candidate genes may have been selected in the ancestral human population, others in the ‘out of Africa’ proto European-Asian population, whereas most appear to have selectively evolved solely in either Europeans or East Asians separately despite the pigmentation similarities between these two populations. Selection signatures can provide important clues to aid gene discovery. However, these should be viewed as complements to, rather than replacements for, functional studies including linkage and association analyses, which can directly refine our understanding of the trait.


It has long been noted that the vast majority of the genetic diversity found in the human species is distributed within geographic populations (1–3). Only 5–15% is observed between groups, reflecting our recent origin less than 200 000 years ago. In contrast and intriguingly, an estimated 88% of the total variation in skin color is found among geographic groups (4). The discordance is probably the consequence of intense selective pressure in the past on important attributes of the skin, the organ that most immediately and extensively interfaces with our environments.

The clear correlation between skin pigmentation and incident ultraviolet radiation (UVR) (5,6) suggests that sunlight is one of the most important environmental variables shaping normal pigmentation variation. However, the specifics of how UVR has affected skin pigmentation are subject to debate and may be complex. Some argue that darker skin is likely to be favored in regions of high UVR for its protection against sunburn and skin cancer (7,8). Others suggest that variation in pigmentation evolved to regulate the penetration of UVR, balancing the need to prevent folate photolysis but permit sufficient vitamin D photosynthesis (6).

An alternative hypothesis emphasizes the importance of sexual selection as opposed to natural selection in response to the environment (9). Darwin was an early proponent of this theory in which differences in pigmentation (hair, eye and skin color) are explained by reproductive variation driven through the perceived attractiveness or desirability of a particular appearance (10). More recently, a multi-staged evolutionary model has been proposed for vertebrate evolution (11). Under this model, natural selection is the predominant force in the earlier stages of a trait's evolution, whereas sexual selection plays a role later. At this later stage, divergence is guided within species clades by different forms of sensory communication among members. These include visual, auditory and/or other behavioral characteristics and cues. In our view, this model may help explain the evolution of skin color in humans, although further work is required to disentangle specifically which genes and types of selection have been important in shaping this trait.


Skin pigmentation is largely determined by the production and deposition of melanin, which is synthesized from the amino acid tyrosine. Melanin manufacture occurs within organelles called melanosomes in specialized melanocyte cells located on the basement membrane at the epidermal/dermal junction. The process is complex and multi-staged and necessarily involves several gene products (12). Indeed more than 100 genes have been implicated in the determination of pigmentation in the mouse and most of these have human homologs (13). These genes include transcription factors, membrane and structural proteins, enzymes and several kinds of receptors and their ligands. However, it is not likely that all genes involved in the process are contributing to normal human variation. Early heritability studies indicated a limited number of major genes influencing the observed variation (14,15). The most comprehensive study, carried out by Harrison and Owen (15), suggested that just 3–4 genes contributed to the phenotypic difference in pigmentation in British people of mixed European/West African ancestry. Later reanalysis of the Harrison and Owen (15) data indicated that potentially 10 genes or more could be underlying those differences in skin color (16).

More recently, several strategies have begun to identify the genes determining normal variation across human populations. Model organisms have played a crucial role in identifying some of these and establishing functional roles. For example, SLC24A5, a probable melanosomal cation exchanger, was identified from its ortholog in the zebrafish (mutations in this gene cause the golden zebrafish phenotype) (17). The role of SLC24A5 in determining human skin color was demonstrated using admixture mapping (18,19), a method developed to test for linkage between ancestry-informative markers and traits and diseases that differ between the parental populations that were brought together to form the admixed population (reviewed in 20 and 21). Variants in SLC24A5 play a central role in skin lightening in Europeans, explaining ∼30% of the difference in skin pigmentation between European and West African populations (17). Admixture mapping has also been used to demonstrate an important role for MATP in European/West African differences in skin color (22) and to establish minor effects of OCA2, TYR (23) and ASIP (24) across these same populations. Admixture mapping has also implicated a gene in the region of MYO5A and SLC24A5 in determining the differences in skin color between Native Americans and Europeans (25).

Several other genes have been associated with intra-continental variation. Variants in MC1R, one of the critical receptors in melanogenesis, are associated with red hair and pale, non-tanning skin in Europeans (26). MATP and OCA2 have been associated with eye color in Europeans (27,28). A joint effect of MC1R and OCA2 on skin pigmentation in Tibetans has also been demonstrated (29).


In addition to these traditional routes, it is becoming increasingly clear that detecting the signature of a selective event is an important signpost to functionally important genes (30,31). Furthermore, searching for regional, population-specific signatures of selection can and should be an important approach to discovering genes that explain phenotypic traits, like pigmentation, that differ among populations. For instance, a 150 kb region surrounding the SLC24A5 gene shows a large drop in heterozygosity in Europeans, consistent with the so-called ‘hitchhiking’ effect where a functional variant under strong selection drags linked polymorphism to higher frequencies at the expense of other haplotypes (17). This selective sweep argues that the move toward lighter skin in these populations did not result simply from a relaxation of natural selection, as has been suggested (9,32). In addition, the absence of any comparable signature of selection in two East Asian populations suggests that the phenotypic similarity between East Asians and Europeans (light skin relative to Africans) is an example of convergent evolution. Interestingly, Jablonski and Chaplin (6), on the basis of analyses of the geographic distributions of skin color and UVR intensities, have predicted that genetic convergence may prove common.

Given such clear potential insights, it is not surprising that several studies have reported using selection signatures to aid the discovery of functionally important genes. This approach is facilitated by the availability of public genome-wide SNP data resources from Perlegen Sciences (33) and, in particular, the International HapMap project. General genome screens of population-specific positive selection using these data have identified several pigmentation candidates as having been under such pressures (8,34). We summarize the positive findings from these studies in Table 1.

Table 1.

Pigmentation candidate genes showing positive signatures of selection

To add to these existing studies, we examined SNP data in 77 genes for which there is a priori evidence of a role in pigmentation, looking for signatures of selection. A suite of three simple selection statistics was calculated for each of the three HapMap populations [Yoruba from West Africa (YRI), Japanese and Han Chinese from East Asia (ESA), and European-Americans (CEU)] using an overlapping sliding window approach across the genome. Regions of 25 kb were successively examined across each chromosome, with the start position incremented by 5 kb from the previous window. This analysis is an extension of one previously used on a smaller set of candidate genes (22).
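The overlapping window scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sliding_windows` and `window_means`, the 0-based half-open coordinates, and the choice to return `None` for windows without SNPs are ours and are not taken from the original analysis.

```python
from bisect import bisect_left


def sliding_windows(chrom_length, window=25_000, step=5_000):
    """Yield half-open (start, end) windows whose starts advance by `step`.

    Coordinates are 0-based (an assumption made for this sketch).
    """
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, min(start + window, chrom_length)


def window_means(positions, values, chrom_length, window=25_000, step=5_000):
    """Average per-SNP values within each window.

    `positions` must be sorted ascending and aligned with `values`.
    Windows that contain no SNPs get None instead of a mean.
    """
    results = []
    for start, end in sliding_windows(chrom_length, window, step):
        lo = bisect_left(positions, start)  # first SNP at or after start
        hi = bisect_left(positions, end)    # first SNP at or after end
        in_window = values[lo:hi]
        mean = sum(in_window) / len(in_window) if in_window else None
        results.append((start, end, mean))
    return results
```

Because each window starts 5 kb after the previous one, a given SNP contributes to up to five consecutive 25 kb windows, so neighbouring window values are not independent.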

The selection statistics we used explore changes in allele frequency spectra that suggest non-neutral evolution. The first is inter-population divergence, which is expected to increase for loci that have undergone selection in one population. Pairwise genetic distances as FST values were calculated (35) between populations, and these values were used to calculate the locus-specific branch length (LSBL) for each SNP. The LSBL essentially apportions the divergence measured by overall FST into three population branches (CEU, YRI or ESA; 36) and in the case of three populations is equivalent to the population-specific FST (37). Values for each SNP were averaged to give a single LSBL for each window.
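As a rough illustration of how pairwise distances are apportioned into branches, the sketch below computes LSBL for a single biallelic SNP as (d_AB + d_AC - d_BC)/2 for focal population A. The pairwise distance used here is a simple heterozygosity-based FST, (H_T - H_S)/H_T, with equal population weights; that estimator is an assumption chosen for brevity and is not necessarily the one used in the analysis (35). The function names are hypothetical.

```python
def pairwise_fst(p1, p2):
    """Simple two-population FST for one biallelic SNP: (H_T - H_S) / H_T.

    Assumption: equal weighting of the two populations and no
    sample-size correction; a stand-in for the estimator in the text.
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return 0.0  # SNP is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def lsbl(freqs):
    """Locus-specific branch lengths for exactly three populations.

    `freqs` maps population label -> allele frequency at one SNP.
    Returns a dict mapping each population to its branch length.
    """
    pops = list(freqs)
    if len(pops) != 3:
        raise ValueError("LSBL as sketched here needs exactly three populations")
    dist = {}
    for i, a in enumerate(pops):
        for b in pops[i + 1:]:
            dist[frozenset((a, b))] = pairwise_fst(freqs[a], freqs[b])
    branches = {}
    for focal in pops:
        x, y = (p for p in pops if p != focal)
        branches[focal] = (
            dist[frozenset((focal, x))]
            + dist[frozenset((focal, y))]
            - dist[frozenset((x, y))]
        ) / 2
    return branches
```

For example, a variant fixed in CEU and absent from YRI and ESA places all of the divergence on the CEU branch: `lsbl({"CEU": 1.0, "YRI": 0.0, "ESA": 0.0})` gives a CEU branch length of 1.0 and zero for the other two.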

The other two statistics quantify reductions in diversity, sometimes associated with selective sweeps. Tajima's D, one of the most commonly used such measures, assesses the allele frequency spectrum within a region for evidence of selection (38). An excess of low frequency variants gives a negative Tajima's D value, possibly indicative of directional selection. The third statistic quantifies population-specific loss in genetic diversity as the natural log of the ratio of heterozygosity (lnRH) between the population in question and each of the other two populations. These values are highly negative when the population in question has a low heterozygosity for any region or window compared with another population. As with LSBL, lnRH values were calculated for each SNP individually and then averaged for each 25 kb window. Windows were mapped to genes of interest on the basis of the position of the central base pair of the window and the RefSeq coordinates of the largest gene transcript, including an additional 10 kb both 5′ and 3′ to potentially capture nearby regulatory regions.
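The two diversity statistics and the window-to-gene mapping can be sketched as follows. Tajima's D is computed here with the standard formula from per-site minor allele counts, which takes no account of SNP ascertainment. The `floor` guard against taking the log of zero heterozygosity, the function names, and the coordinate conventions are assumptions made for this illustration, not details reported for the analysis.

```python
import math


def tajimas_d(minor_counts, n):
    """Tajima's D from per-site allele counts in a sample of n chromosomes.

    Returns None when there are no segregating sites or n < 4
    (the variance term is zero for smaller samples).
    """
    seg = [k for k in minor_counts if 0 < k < n]
    s = len(seg)
    if s == 0 or n < 4:
        return None
    # Mean pairwise differences, summed over segregating sites
    pi = sum(2 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def heterozygosity(p):
    """Expected heterozygosity of a biallelic SNP with allele frequency p."""
    return 2 * p * (1 - p)


def ln_rh(p_focal, p_other, floor=1e-4):
    """ln of the heterozygosity ratio, focal population over comparison.

    `floor` is a hypothetical guard against log(0) for monomorphic SNPs.
    """
    h_focal = max(heterozygosity(p_focal), floor)
    h_other = max(heterozygosity(p_other), floor)
    return math.log(h_focal / h_other)


def window_maps_to_gene(win_start, win_end, gene_start, gene_end, flank=10_000):
    """True if the window's central base lies in the gene plus flanks."""
    centre = (win_start + win_end) // 2
    return gene_start - flank <= centre <= gene_end + flank
```

With a sample of 10 chromosomes, five singleton sites give a negative D (an excess of rare variants), whereas five sites at count 5 give a positive D. Similarly, `ln_rh(0.05, 0.5)` is negative because the focal population has the lower heterozygosity.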

An inherent limitation of population genetic inferences from this type of data is that the unusualness or significance of these statistics depends on the underlying demographic history of each population. For example, negative Tajima's D values can result from a severe reduction in population size (bottleneck). One way to gauge significance is to simulate population history and compare the observed value with the simulated range. However, the accuracy of this depends on the correct specification of the history, which is generally unknown. An alternative approach compares a particular observed result with those found in other regions of the genome: an empirical distribution. Demographic history will affect the entire genome, whereas natural selection will only impact a subset of loci and these are expected to lie at the tails of any empirical distribution. To gauge the significance of values observed in candidate pigmentation genes, we adopted the latter approach. In ∼500 000 windows spanning the entire genome, ∼3.45 million SNPs (typed in all three populations) were examined. These were used to assign an empirical P-value to each window simply given by the fraction of windows where a greater value was observed. Given differences in the rates of evolution between the X-chromosome and the autosomes, we treated these chromosomes separately.
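A minimal sketch of the empirical P-value follows: the fraction of genome-wide windows with a more extreme value than the one observed. The `tail` parameter is our own addition, on the assumption that the upper tail is of interest for LSBL and the lower tail for Tajima's D and lnRH, where strongly negative values are the signal. The function name is hypothetical.

```python
from bisect import bisect_left, bisect_right


def empirical_p(observed, genome_values, tail="upper"):
    """Fraction of genome-wide window values more extreme than `observed`.

    tail="upper": counts values strictly greater than `observed`.
    tail="lower": counts values strictly smaller than `observed`.
    """
    ranked = sorted(genome_values)
    if not ranked:
        raise ValueError("need at least one genome-wide value")
    if tail == "upper":
        more_extreme = len(ranked) - bisect_right(ranked, observed)
    elif tail == "lower":
        more_extreme = bisect_left(ranked, observed)
    else:
        raise ValueError("tail must be 'upper' or 'lower'")
    return more_extreme / len(ranked)
```

To mirror the separate treatment of the X-chromosome and the autosomes, `genome_values` would be drawn only from windows in the same chromosome class as the candidate window. When many candidate windows are scored, sorting the genome-wide values once outside the function would be more efficient.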

An idealized case of positive selection in one population is expected to result in high LSBL and significantly negative Tajima's D and lnRH (relative to the two other populations), and encouragingly, this is observed in Europeans for both SLC24A5 and MATP (22). Although many other genes with suggestive signatures of selection were also identified across studies, several others are unique to one study or indeed to one test of selection within the same study. One reason for this is the limitations of different tests of selection and how they are applied. The performance of many will vary depending on the age of the selection event, its strength and the nature of the pre-existing variation at the locus in question (31). For example, computer simulations have shown that selection signatures can vary depending on whether selection acts on a de novo mutation or a polymorphism already segregating in the population (39). The data sets analyzed may also be inadequate to the task in some respects. The HapMap project was primarily designed as a resource to aid gene discovery through association mapping and consequently has an ascertainment bias toward SNPs with higher levels of heterozygosity (40). Such a bias may affect statistics like Tajima's D, which were originally developed for data with full ascertainment of polymorphism (e.g. resequencing studies), but have been extended to SNP window data [Carlson et al. (41) and Kelley et al. (42) refer to this statistic as TDGen]. For these reasons, it is likely that many evolutionary approaches will miss at least some genes that have functionally significant roles in shaping human pigmentation. Conversely, some signatures of selection in candidate genes will be false positives or even real signatures of selection that result from other gene functions not linked to skin pigmentation. It is also possible that any signature in a candidate gene is actually associated with a different but closely linked gene and the ‘candidate gene’ is merely carried along by hitchhiking. Nonetheless, evolutionary analyses are a relatively easy means to screen candidate genes and quickly ‘nominate’ those that should be prioritized for further investigation using linkage analysis, gene association and other functional methods.


We have combined our results with those from other studies to form a simple and somewhat speculative model for some aspects of the evolution of pigmentation in humans. Our model is displayed graphically in Figure 1, which shows an average tree of the three HapMap populations, with general skin pigmentation trends indicated by shading on the branches. This human tree would join with a hominin ancestor who likely had light skin because of the protective shielding provided by complete body hair coverage (6). It seems plausible that coincident with the loss of fur in the lineage leading to early Homo sapiens, there was strong selection for skin darkening (Branch 1 in Fig. 1; 43,44). Selection-nominated pigmentation genes on this branch include MITF and EDN3, which show evidence of common negative Tajima's D across all three populations. Later, the African and the proto-European/East Asian populations diverged upon the ‘out of Africa’ expansion (Branch 2), followed by the splitting of East Asians and Europeans (Branches 3 and 4, respectively).

Figure 1.

Speculative framework model for the evolutionary–genetic architecture of human pigmentation in three populations. The tree shows the average relationships among three human populations, with generalized deduced skin pigmentation level of these populations indicated by shading on the branches which are labeled 1–5. Genes hypothesized to have been subject to positive selection listed by branch are summarized in Table 1 along with the source(s) of evidence supporting their placement.

Previous evidence (17,22) indicates that the genetic mechanisms resulting in the light skin of both Europeans and East Asians may be largely different and therefore did not evolve prior to their population divergence. However, some candidate genes show evidence of selection in both populations, which may be indicative of an important role pre-dating any divergence (Branch 2) or the independent evolution of the same function in both populations (Branches 3 and 4). To help distinguish between these possibilities, we supplemented our initial analysis by examining the presence of haplotype sharing (in 200 kb regions centered on several nominated genes), which helps us to distinguish whether the haplotypes present in populations showing signatures of selection are the same in the two populations (suggesting Branch 2 localizations) or different (suggesting Branch 3 and 4 localizations).

In addition to the European-specific history already demonstrated for SLC24A5 and MATP, our analyses (in combination with other studies) suggest that MYO5A, DTNBP1, TYRP1, EDA, OCA2 and KITLG may have undergone European-specific changes and may thus be functionally affecting skin lightening in Europeans (Branch 4). These evolutionary analyses also point to strong nominees underlying a parallel mechanism behind the same phenotype in East Asians (Branch 3). In particular, ADAM17 and ADAMTS20 show strong signatures of selection in East Asians comparable with those of SLC24A5 and MATP in Europeans. DCT, MC1R, LYST, EDA, OCA2 and ATRN also show relatively strong signatures of distinct East Asian selective events.

Two additional genes, ASIP and BNC2, show evidence for selection in both East Asians and Europeans, with haplotype sharing between the populations pointing to a selective event prior to their divergence (Branch 2). In a further sign of the potential evolutionary complexity of human pigmentation, two other genes, LYST and KITLG, appear to have selective events that may have started on Branch 2 and then continued to Branches 3 and 4, respectively. West African populations may also have experienced adaptive pressure, perhaps for skin darkening relative to the ancestral human population at the time of the out-of-Africa migration, but we find less evidence for this and have placed no genes on Branch 5.

It is important to note that this simplified phylogeny and model only encompass some of the variation in human pigmentation. For example, it remains unclear whether other similarities in human pigmentation across the world, such as those leading to dark skin in West Africa, South Asia and Melanesia, are the result of common evolutionary events and genetic mechanisms. Investigation of these questions has already begun using an evolutionary approach by examining the genetic distance (FST) of populations in pigmentation candidate genes relative to empirical distributions from large numbers of SNPs (22). Our simple model is also deficient in assuming broad population groups are uniform in terms of pigmentation mechanism and evolution. For example, related phenotypes such as hair/eye color and skin response traits (like tanning) vary substantially in Europe. MC1R variants, known to influence these traits, vary significantly in frequency across the continent. MC1R or another gene may explain a substantial difference in tanning ability between West (e.g. British Isles) and North (e.g. Scandinavia) Europe despite similar constitutive pigmentation (45).

Notwithstanding recent advances in the identification of genes, there is still much work required to elucidate the full genetic architecture of normal human hair, eye and skin pigmentation. For example, it will be interesting to discover the genetic basis of the substantial differences in melanogenic dose response (tanning capacity) noted between East Asians and European Americans (46). Answering these and other questions not only unravels an interesting physiological trait, but also provides a model or test system for gene discovery in other polygenic traits (like complex diseases) which have greater environmental sources of variation. There is a clear justification for adopting an evolutionary approach in exploring human pigmentation genetics. Such approaches have already revealed a trait with a highly dynamic and complex evolutionary past and pointed to the molecular mechanisms underlying phenotypic variability; the latter potential is most clearly demonstrated by the strong signature of selection in the functionally important SLC24A5 and MATP genes. Although this route has the potential to identify ‘selection-nominated candidate genes’, such signatures, no matter how strong, are still not sufficient evidence of a functional relationship between a gene locus or allelic variant and skin color. However, combining this approach with linkage analysis, genotype/phenotype association (appropriately controlled for stratification) or other direct functional measures should be fruitful in explaining this fascinating human trait.

  1. Mark D. Shriver (3,*)

  1. Smurfit Institute of Genetics, Trinity College Dublin, Dublin 2, Ireland
  2. IPATIMUP, Porto, Portugal
  3. Department of Anthropology, The Pennsylvania State University, PA, USA

  * To whom correspondence should be addressed at: Department of Anthropology, 409 Carpenter Building, University Park, PA 16802, USA. Tel: +1 8148631078; Fax: +1 8148631474; Email: mds17{at}

  • Received July 27, 2006.
  • Accepted August 1, 2006.


This work was supported in part by grants from the NIH/NHGRI (HG002154) to M.D.S., from FCT (SFRH/BPD/21887/2005) to S.B. and from the Health Research Board, Ireland (RP/2004/155) to B.M. We would also like to thank Heather Norton, Esteban Parra, Josh Akey, Dan Bradley, Jorge Rocha, and Greg Barsh for helpful and formative discussion.

Conflict of Interest statement. None declared.

