Copyright © 2001 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 69, Issue 6, 1348-1356, 1 December 2001

doi:10.1086/324511

Do the Four Clades of the mtDNA Haplogroup L2 Evolve at Different Rates?

Antonio Torroni,1,2 Chiara Rengo,2,4 Valentina Guida,2 Fulvio Cruciani,2 Daniele Sellitto,5 Alfredo Coppa,3 Fernando Luna Calderon,6 Barbara Simionati,7 Giorgio Valle,7 Martin Richards,8 Vincent Macaulay,9 and Rosaria Scozzari2

1 Dipartimento di Genetica e Microbiologia, Università di Pavia, Pavia, Italy
2 Dipartimenti di Genetica e Biologia Molecolare, Rome
3 Biologia Animale e dell’Uomo, Università “La Sapienza,” Rome
4 Istituto di Medicina Legale, Università Cattolica del Sacro Cuore, Rome
5 Centro di Genetica Evoluzionistica del Consiglio Nazionale delle Ricerche, Rome
6 Museo National de Historia Natural di Santo Domingo, Santo Domingo, Dominican Republic
7 Centro Ricerche Interdipartimentale Biotecnologie Innovative, Università di Padova, Padova, Italy
8 Department of Chemical and Biological Sciences, University of Huddersfield, Huddersfield, United Kingdom
9 Department of Statistics, University of Oxford, Oxford

Address for correspondence and reprints: Dr. Antonio Torroni, Dipartimento di Genetica e Microbiologia, Università di Pavia, Via Ferrata 1, 27100 Pavia, Italy

Forty-seven mtDNAs collected in the Dominican Republic and belonging to the African-specific haplogroup L2 were studied by high-resolution RFLP and control-region sequence analyses. Four sets of diagnostic markers that subdivide L2 into four clades (L2a–L2d) were identified, and a survey of published African data sets appears to indicate that these clades encompass all L2 mtDNAs and harbor very different geographic/ethnic distributions. One mtDNA from each of the four clades was completely sequenced by means of a new sequencing protocol that minimizes time and expense. The phylogeny of the L2 complete sequences showed that the two mtDNAs from L2b and L2d seem disproportionately derived, compared with those from L2a and L2c. This result is not consistent with a simple model of neutral evolution with a uniform molecular clock. The pattern of nonsynonymous versus synonymous substitutions hints at a role for selection in the evolution of human mtDNA. Regardless of whether selection is shaping the evolution of modern human mtDNAs, the population screening of L2 mtDNAs for the mutations identified by our complete sequence study should allow the identification of marker motifs of younger age with more restricted geographic distributions, thus providing new clues about African prehistory and the origin and relationships of African ethnic groups.

Introduction


Even though the term “haplogroup” was not coined until later (Torroni et al. 1993), it had already been known from one of the earliest studies of human mtDNA variation (Johnson et al. 1983) that the cluster of lineages now referred to as “haplogroup L2” (Chen et al. 1995) was a well-defined monophyletic haplotype group (type 2 and derivatives). Early RFLP studies employing five or six rare-cutter restriction enzymes showed that haplogroup L2 encompasses about one-third of sub-Saharan African mtDNAs (Johnson et al. 1983; Scozzari et al. 1988, 1994; Soodyall and Jenkins 1992, 1993; Graven et al. 1995). Despite its current high frequency and its high estimated coalescence time, which has been calculated as 59,000–78,000 years on the basis of RFLP data (Chen et al. 1995, 2000) and as ∼56,000 years on the basis of hypervariable segment I (HVS-I) data (Watson et al. 1997), haplogroup L2 was not involved in the process of human expansion out of Africa and remained restricted to that continent. Intriguingly, despite these interesting features, the structure and internal sequence variation of this haplogroup have not been analyzed in detail until now.

In the present study, a group of L2 mtDNAs from the Dominican Republic, a country in which the African population component is predominant and heterogeneous in origin, was first studied by high-resolution RFLP and control-region sequence analyses. Subsequently, one mtDNA from each of the four identified clades within L2 was completely sequenced, reaching the highest possible level of molecular resolution. Unexpectedly, we observed that two of the L2 clades are disproportionately derived compared with the other two.


Subjects and Methods


The population sample consisted of 127 unrelated male subjects from the Dominican Republic who were living in Santo Domingo (n=50) and San Juan de la Maguana (n=77). Appropriate informed consent was obtained from all participants, and genomic DNAs were extracted from blood through use of standard procedures.

To determine high-resolution RFLP haplotypes, the entire mtDNA was amplified using PCR in nine overlapping fragments, by the use of the primer pairs described by Torroni et al. (1997). Each of the nine PCR segments was then digested with 14 restriction endonucleases (AluI, AvaII, BamHI, DdeI, HaeII, HaeIII, HhaI, HincII, HinfI, HpaI, MspI, MboI, RsaI, and TaqI). In addition, all mtDNAs were screened for the presence/absence of the BstOI site at nucleotide position (np) 13704, the AccI sites at nps 14465 and 15254, the BfaI site at np 4914, the NlaIII sites at nps 4216 and 4577, the XbaI site at np 7440, the MseI sites at nps 14766 and 16297, and the MnlI site at np 10871. The polymorphism at np 12308 was also tested through use of a mismatched primer that generates a HinfI site when the A12308G mutation is present (Torroni et al. 1996). The mtDNA control region was sequenced between nps 16003 and 16474, as described elsewhere (Torroni et al. 1999), and included all of HVS-I (nps 16024–16383).

A new protocol was developed and optimized to obtain complete mtDNA sequences. The entire mtDNA was amplified in 11 overlapping PCR fragments, using a set of primers with matching annealing temperatures (see Results section). After PCR, the fragments were purified using the QIAquick purification kit (QIAGEN), and cycle sequencing was performed using BigDye Terminator chemistry with the enzyme TaqFS and a set of 32 nested primers specifically designed for this protocol. An ABI 3700 sequencer with 96 capillaries was employed for separation of the sequencing ladders. The sequencing was performed by the Centro Ricerche Interdipartimentale Biotecnologie Innovative (CRIBI) of the University of Padua (BMR–Servizio Sequenziamento di DNA Web site), where further technical details can be obtained. Complete sequences were aligned, assembled, and compared using the program Sequencher 3.0 (Gene Codes). Since the traces were of excellent quality and were unambiguous, it was only necessary to sequence one strand.

Phylogeny construction was performed by hand and was confirmed using Network 2.0e (Bandelt et al. 1995), for the reduced median network, and PAUP* (Swofford 2000), for the most parsimonious tree. The likelihood-ratio test of the molecular clock was performed using TREE-PUZZLE 5.0 (Strimmer and von Haeseler 1996).


Results


High-resolution RFLP analysis and control-region sequencing revealed that 47 of the 127 Dominican subjects (37%) harbored L2 haplotypes (Table 1) and that the remainder belonged to other known African (L1, L3b, L3d, L3e, L3*, and U6), American Indian (A, B, C, and D), and western Eurasian (J and U2) haplogroups (data not shown). As reported elsewhere, L2 mtDNAs are characterized by the RFLP motif +3592 HpaI, +10394 DdeI, −10871 MnlI, +16389 HinfI/−16390 AvaII, and by the HVS-I motif 16223-16278-16390 (Chen et al. 1995, 2000; Watson et al. 1997; Quintana-Murci et al. 1999; Alves-Silva et al. 2000; Pereira et al., in press). However, our survey shows that additional RFLP markers subdivide L2 into four clades that have been termed “L2a,” “L2b,” “L2c,” and “L2d” (Table 1). Clades L2a (+13803 HaeIII), L2b (+4157 AluI), and L2c (−322 HaeIII, −679 DdeI, and −13957 HaeIII) were previously identified by Chen et al. (2000), and L2d (−3693 MboI and a transition at np 16399) is described here for the first time. Diagnostic mutations in HVS-I further distinguish the four clades from each other in some cases (Table 1 and fig. 1). The clade L2d, although represented by only two subjects in our sample, is by far the most divergent clade within L2 (fig. 1).
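The clade assignment described above can be sketched as a simple lookup of diagnostic RFLP states. This is only an illustration, not the procedure used in the study: the function names and the "unclassified"/"ambiguous" labels are assumptions, and the sketch ignores possible reversions at diagnostic sites.

```python
# Diagnostic RFLP states per clade, taken from the text above.
# Enzyme codes follow Table 1: a = AluI, c = DdeI, e = HaeIII, j = MboI.
CLADE_MARKERS = {
    "L2a": {"+13803e"},
    "L2b": {"+4157a"},
    "L2c": {"-322e", "-679c", "-13957e"},
    "L2d": {"-3693j"},
}


def normalize(site: str) -> str:
    """Trim whitespace and brackets; map the typographic minus to ASCII '-'."""
    return site.strip().strip("[]").replace("\u2212", "-")


def classify_l2(sites):
    """Return the L2 clade whose diagnostic states are all present.

    Hypothetical helper: returns 'unclassified' when no clade matches and
    'ambiguous' when more than one does.
    """
    observed = {normalize(s) for s in sites}
    hits = [clade for clade, markers in CLADE_MARKERS.items()
            if markers <= observed]
    if len(hits) == 1:
        return hits[0]
    return "unclassified" if not hits else "ambiguous"


if __name__ == "__main__":
    print(classify_l2(["+13803e", "+16517e"]))
    print(classify_l2(["\u2212322e", "\u2212679c", "\u221213957e"]))
```

A haplotype carrying none of the four diagnostic sets falls through to "unclassified" in this sketch.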

 
Table 1 RFLP and HVS-I Variation of L2 mtDNAs
Haplotype (a) comprises the RFLP and HVS-I columns.
L2 Clade | RFLP (b) | HVS-I (c) | No. of Subjects
L2a | +13803e | 223-278-294-390-189-362 | 1
L2a | +13803e | 223-278-294-390-362 | 1
L2a | +13803e | 223-278-294-390-309 | 1
L2a | +13803e | 223-278-294-390-086-309 | 1
L2a | +13803e | 223-278-294-390-189-245-309 | 1
L2a | +13803e | 223-278-294-390-189-309 | 3
L2a | +13803e, [−10394c] | 223-278-294-390-309 | 1
L2a | +13803e, +16517e | 223-278-294-390-256-309 | 1
L2a | +13803e, +16517e | 223-278-294-390-189-309 | 1
L2a | +13803e, +12752a, +15749s, +16517e | 223-278-294-390-189-193-309 | 1
L2a | +13803e, −12629b/+12629j | 223-278-294-390 | 2
L2a | +13803e, −12629b/+12629j, +16517e | 223-278-294-390-309 | 1
L2a | +13803e, +14003p, +16239s, +16517e | 223-278-294-390-193-213-239-309 | 1
L2a | +13803e, −6296c, +16517e | 278-294-390-189-192-309 | 1
L2a | +13803e, −12406h | 223-278-294-390-093-189-193 | 1
L2a | +13803e, [−3592h], +16517e | 223-278-294-390-189-309 | 1
L2b | +4157a, +6610g, +14406c | 114A-129-213-223-278-390-354 | 1
L2b | +4157a, +6610g, +11313a | 114A-129-213-223-278-390 | 2
L2b | +4157a, +6610g | 114A-129-213-223-278-390 | 1
L2b | +4157a, +417k (d), −16310k | 114A-129-213-223-278-390-311-355-362-368 | 2
L2b | +4157a, +417k (d), −15883e | 114A-129-213-223-278-390-355-362-465 | 1
L2b | +4157a, +417k (d), −5261e, −15776a | 114A-213-223-278-390-255-284-355-362 | 1
L2b | +4157a, +5559a, −5742i | 114A-129-213-223-278-390-212 | 1
L2c | −322e, −679c, −13957e | 223-278-390-192-261 | 5
L2c | −322e, −679c, −13957e | 223-278-390-263 | 1
L2c | −322e, −679c, −13957e | 223-278-390-093-189-264 | 1
L2c | −322e, −679c, −13957e, −8858f | 223-278-390-214-274 | 1
L2c | −322e, −679c, −13957e, −8858f, +16517e | 223-278-390 | 1
L2c | −322e, −679c, −13957e, +6618e, −16297s | 223-278-390-264-298 | 4
L2c | −322e, −679c, −13957e, −16310k, +16517e | 223-278-390-181-311 | 1
L2c | −322e, −679c, −13957e, −13704p, −15996c/−16000g | 223-278-390-172 | 3
L2d | −3693j, −3534c/−3537a, −5584a, −6014l, +12946c/+12949n/+12950f, −13704p, +15494c, +16143s, +16239s, −16310k, +16517e, COII-tRNALys 6-bp insertion | 223-278-390-399-111A-145-184-213-234-239-258-292-295-311-355-400 | 1
L2d | −3693j, −9553e, −12629b/+12629j, −15776a, +16296c/−16297s, −16310k, +16398e, +16517e | 278-390-399-093-129-189-293-300-311-354 | 1
a States diagnostic of each of the L2 clades are underlined.
b All L2 mtDNAs harbor the RFLP motif +3592h, +10394c, −10871z, +16389g/−16390b, except for those in which square brackets indicate reverted RFLP sites. Sites are numbered from the first nucleotide of the recognition sequence. A “+” indicates the presence of a restriction site, a “−” the absence. The explicit indication of the presence/absence of a site implies the absence/presence in haplotypes not so designated. The restriction enzymes used in the analysis are designated by the following single-letter codes: a, AluI; b, AvaII; c, DdeI; e, HaeIII; f, HhaI; g, HinfI; h, HpaI; i, MspI; j, MboI; k, RsaI; l, TaqI; m, BamHI; n, HaeII; o, HincII; p, BstOI; q, NlaIII; r, BfaI; s, MseI; z, MnlI. A slash (/) separating states indicates the simultaneous presence or absence of restriction sites that can be correlated with a single-nucleotide substitution.
c Only those nucleotide positions (minus 16000) between 16003 and 16474 that differ from the Cambridge Reference Sequence (CRS) (Andrews et al. 1999) are shown. Mutations are transitions, unless the base change is specified explicitly.
d Incorrectly mapped as +762k by Chen et al. (1995).
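The notation explained in the footnotes to Table 1 is regular enough to parse mechanically. The following is a minimal sketch under that assumption; the function names are hypothetical, and entries that are not plain site codes (for example the COII-tRNALys 6-bp insertion) are deliberately not handled.

```python
import re

# Single-letter enzyme codes from footnote b of Table 1.
ENZYMES = {
    "a": "AluI", "b": "AvaII", "c": "DdeI", "e": "HaeIII", "f": "HhaI",
    "g": "HinfI", "h": "HpaI", "i": "MspI", "j": "MboI", "k": "RsaI",
    "l": "TaqI", "m": "BamHI", "n": "HaeII", "o": "HincII", "p": "BstOI",
    "q": "NlaIII", "r": "BfaI", "s": "MseI", "z": "MnlI",
}

SITE_RE = re.compile(r"^([+-])(\d+)([a-z])$")
HVS1_RE = re.compile(r"^(\d{3})([ACGT]?)$")


def parse_site(token):
    """Parse one RFLP state such as '+13803e' or '[-3592h]'."""
    cleaned = token.strip().strip("[]").replace("\u2212", "-")
    match = SITE_RE.match(cleaned)
    if match is None:
        raise ValueError(f"not an RFLP site code: {token!r}")
    sign, position, code = match.groups()
    return {"present": sign == "+", "position": int(position),
            "enzyme": ENZYMES[code]}


def parse_event(field):
    """Parse a slash-linked group of states that reflect one substitution."""
    return [parse_site(token) for token in field.split("/")]


def expand_hvs1(motif):
    """Expand '114A-129-223' into (np, base) pairs; base is None for transitions."""
    result = []
    for token in motif.split("-"):
        match = HVS1_RE.match(token)
        if match is None:
            raise ValueError(f"not an HVS-I position: {token!r}")
        offset, base = match.groups()
        result.append((16000 + int(offset), base or None))
    return result


if __name__ == "__main__":
    print(parse_event("\u221212629b/+12629j"))
    print(expand_hvs1("114A-129-213-223-278-390"))
```

Keeping slash-linked states together as one event mirrors the footnote's point that such sites can be traced to a single-nucleotide substitution.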
Figure 1
Unweighted reduced median network (Bandelt et al. 1995) of the 47 L2 samples from the Dominican Republic, showing the four clades L2a–L2d. The circles represent combined high-resolution RFLP and HVS-I haplotypes, with their areas proportional to the frequency. The smallest circles are singletons, whereas the largest have frequency 5. The black circles denote the four mtDNAs (one for each clade) that have been completely sequenced. RFLP mutations are indicated next to the branches, with the arrow pointing in the direction of a site gain. The label is the nucleotide at the beginning of the recognition sequence (in the numbering of the reference sequence of Anderson et al. [1981]); the letter suffix indicates the enzyme (see Table 1). HVS-I mutations (between 16003 and 16474) are shown on the branches; they are transitions unless the base change is explicitly indicated. Underlining indicates resolved recurrent mutations, and unresolved events are shown by reticulation. Implausible links are shown with a dotted line. The node marked with an asterisk (*) has the RFLP motif +3592h, +10394c, −10871z, +16389g/−16390b and the HVS-I motif 223-278-390. The lengths of branches in L2a and L2d are shown distorted, for convenience of display. The hypervariable RFLP 16517e was not considered, nor were indel events.

To better define the relationships between the four L2 clades, one mtDNA (denoted by a black circle in fig. 1) from each of the four clades was completely sequenced. For the present analysis, we developed an efficient sequencing strategy that minimizes time and expense. First, the mtDNA was PCR amplified into 11 fragments by means of primer pairs with almost identical melting temperatures (Table 2), so that the 11 PCR reactions could be performed simultaneously at the same annealing temperature (55°C) in the same thermocycler. Only 32 nested primers were then employed for the cycle sequencing procedure (Table 3).

 
Table 2 Oligonucleotides Used to Amplify the Entire Human mtDNA in 11 PCR Fragments
The oligonucleotide (a) columns are Name through Melting Temperature.
PCR ID Number | Fragment Length (bp) | Name | 5′ np | 3′ np | Sequence (5′→3′) | Melting Temperature (°C)
1 | 1,845 | 14897for | 14897 | 14918 | ctagccatgcactactcaccag | 59.96
1 | 1,845 | 155rev | 155 | 134 | aataggatgaggcaggaatcaa | 59.93
2 | 1,759 | 16488for | 16488 | 16509 | ctgtatccgacatctggttcct | 59.93
2 | 1,759 | 1677rev | 1677 | 1656 | gtttagctcagagcggtcaagt | 60.08
3 | 1,832 | 1404for | 1404 | 1425 | acttaagggtcgaaggtggatt | 60.23
3 | 1,832 | 3235rev | 3235 | 3214 | cttaacaaaccctgttcttggg | 59.90
4 | 1,784 | 2900for | 2900 | 2921 | caataacttgaccaacggaaca | 59.90
4 | 1,784 | 4683rev | 4683 | 4662 | ttagaaggattatggatgcggt | 59.83
5 | 1,771 | 4381for | 4381 | 4402 | acctatcacaccccatcctaaa | 59.59
5 | 1,771 | 6151rev | 6151 | 6130 | actagtcagttgccaaagcctc | 59.95
6 | 1,747 | 5871for | 5871 | 5892 | gcttcactcagccattttacct | 59.79
6 | 1,747 | 7617rev | 7617 | 7596 | tcttgtagacctacttgcgctg | 59.72
7 | 1,980 | 7239for | 7239 | 7260 | gcatacaccacatgaaacatcc | 60.13
7 | 1,980 | 9218rev | 9218 | 9197 | ttggtgggtcattatgtgttgt | 60.02
8 | 1,740 | 8910for | 8910 | 8931 | cttaccacaaggcacacctaca | 60.09
8 | 1,740 | 10649rev | 10649 | 10628 | aggcacaatattggctaagagg | 59.65
9 | 1,769 | 10457for | 10457 | 10478 | tcatatttaccaaatgcccctc | 60.04
9 | 1,769 | 12225rev | 12225 | 12204 | agttcttgtgagctttctcggt | 59.57
10 | 1,816 | 12014for | 12014 | 12035 | ctcacccaccacattaacaaca | 60.70
10 | 1,816 | 13829rev | 13829 | 13808 | agtcctaggaaagtgacagcga | 60.44
11 | 1,873 | 13477for | 13477 | 13498 | gcaggaatacctttcctcacag | 60.13
11 | 1,873 | 15349rev | 15349 | 15328 | gtgcaagaataggaggtggagt | 59.64
Note: The annealing temperature for all PCR reactions is 55°C.
a nps correspond to the CRS (Anderson et al. 1981). The length of each oligonucleotide was 22 nucleotides.
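As a quick consistency check on Table 2, the primer coordinates can be used to confirm that the 11 amplicons tile the circular genome with overlaps. The sketch below assumes the 16,569-bp length of the CRS and takes each amplicon to run from the 5′ np of its forward primer to the 5′ np of its reverse primer; the function names are illustrative only.

```python
MT_LENGTH = 16569  # length of the CRS in bp (assumed for the circular arithmetic)

# (PCR ID, 5' np of forward primer, 5' np of reverse primer), from Table 2.
FRAGMENTS = [
    (1, 14897, 155), (2, 16488, 1677), (3, 1404, 3235), (4, 2900, 4683),
    (5, 4381, 6151), (6, 5871, 7617), (7, 7239, 9218), (8, 8910, 10649),
    (9, 10457, 12225), (10, 12014, 13829), (11, 13477, 15349),
]


def overlap(end_prev, start_next, length=MT_LENGTH):
    """Signed overlap (bp) between one fragment's end and the next one's start.

    Positions wrap around the circular genome; a negative value means a gap.
    """
    d = (end_prev - start_next + 1) % length
    return d if d <= length // 2 else d - length


def consecutive_overlaps(fragments=FRAGMENTS):
    """Map (ID, next ID) to the overlap in bp, closing the circle at the end."""
    result = {}
    n = len(fragments)
    for i, (pcr_id, _start, end) in enumerate(fragments):
        next_id, next_start, _next_end = fragments[(i + 1) % n]
        result[(pcr_id, next_id)] = overlap(end, next_start)
    return result


if __name__ == "__main__":
    for pair, bp in consecutive_overlaps().items():
        print(pair, bp)
```

Under these assumptions every neighbouring pair overlaps, including the pair that spans the origin of the numbering.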
 
Table 3 Oligonucleotides Used for Sequencing the Entire Human mtDNA
The sequencing oligonucleotide (a) columns are Name through Melting Temperature.
Template PCR ID Number | Name | Length (nucleotides) | 5′ np | 3′ np | Sequence (5′→3′) | Melting Temperature (°C)
1 | 14948for | 20 | 14948 | 14967 | cacatcactcgagacgtaaa | 54.92
1 | 15564for | 20 | 15564 | 15583 | atttcctattcgcctacaca | 54.93
1 | 131rev | 20 | 131 | 112 | acagatactgcgacataggg | 55.28
2 | 16522for | 20 | 16522 | 16541 | taaagcctaaatagcccaca | 55.27
2 | 584for | 20 | 584 | 603 | tagcttacctcctcaaagca | 55.46
2 | 1060for | 20 | 1060 | 1079 | aagacccaaactgggattag | 55.74
3 | 1445for | 20 | 1445 | 1464 | gagtgcttagttgaacaggg | 55.02
3 | 2047for | 20 | 2047 | 2066 | tttaaatttgcccacagaac | 55.39
3 | 2509for | 20 | 2509 | 2528 | atcacctctagcatcaccag | 55.23
4 | 3085for | 20 | 3085 | 3104 | atccaggtcggtttctatct | 54.24
4 | 3598for | 20 | 3598 | 3617 | ctcaacctaggcctcctatt | 55.17
4 | 4010for | 20 | 4010 | 4029 | acaccctcaccactacaatc | 54.77
5 | 4410for | 20 | 4410 | 4429 | cagctaaataagctatcggg | 54.58
5 | 5014for | 20 | 5014 | 5033 | cctcaattacccacatagga | 55.02
5 | 5544for | 20 | 5544 | 5563 | tcaaagccctcagtaagttg | 55.63
6 | 6041for | 20 | 6041 | 6060 | ccttctaggtaacgaccaca | 55.33
6 | 6600for | 20 | 6600 | 6619 | cacctattctgatttttcgg | 54.91
7 | 7336for | 20 | 7336 | 7355 | cgaagcgaaaagtcctaata | 55.00
7 | 7937for | 21 | 7937 | 7957 | ttcaactcctacatacttccc | 53.49
7 | 8459for | 20 | 8459 | 8478 | aactaccacctacctccctc | 54.74
8 | 8975for | 18 | 8975 | 8992 | tcattcaaccaatagccc | 54.27
8 | 9589for | 20 | 9589 | 9608 | aagtcccactcctaaacaca | 54.68
8 | 10147for | 20 | 10147 | 10166 | acatagaaaaatccacccct | 55.09
9 | 10498for | 22 | 10498 | 10519 | tagcatttaccatctcacttct | 53.48
9 | 11081for | 20 | 11081 | 11100 | ataacattcacagccacaga | 54.03
9 | 11644for | 20 | 11644 | 11663 | cctcgtagtaacagccattc | 54.99
10 | 12114for | 19 | 12114 | 12132 | acatcattaccgggttttc | 54.81
10 | 12600for | 20 | 12600 | 12619 | attcatccctgtagcattgt | 54.56
10 | 13134for | 20 | 13134 | 13153 | agcagaaaatagcccactaa | 54.42
11 | 13568for | 20 | 13568 | 13587 | ttactctcatcgctacctcc | 55.02
11 | 14103for | 20 | 14103 | 14122 | ctctttcttcttcccactca | 54.61
11 | 14603for | 20 | 14603 | 14622 | gaaggcttagaagaaaaccc | 54.87
a nps correspond to the CRS (Anderson et al. 1981).

A phylogeny of the four L2 complete sequences is shown in figure 2. Consistent with L2d being the most divergent clade, the tree (rooted using a complete sequence from L1a as an outgroup) shows that L2d branched earliest within haplogroup L2. The next branching gave rise to L2a, leaving L2b and L2c as the most closely related clades.

Figure 2
Most-parsimonious reconstruction of the character evolution on a most-parsimonious tree of complete L2 sequences, rooted by use of a complete haplogroup L1a sequence. This tree includes the four L2 mtDNAs sequenced in the course of the present study (blackened circles) and the three L2 complete sequences (blackened squares) previously reported by Ingman et al. (2000). The L1a sequence used as an outgroup, as suggested by the phylogenies of Watson et al. (1997) and Ingman et al. (2000), was also obtained in the course of the present study and is from a Dominican subject. Mutations are shown on the branches; they are transitions, unless the base change is explicitly indicated. Deletions are indicated by a “d” preceding the deleted nucleotides. Insertions are indicated by a plus sign (+) preceding the number and type of inserted nucleotides. Underlining indicates recurrent mutations. “s” indicates synonymous mutations, whereas “ns” indicates nonsynonymous mutations. The asterisk (*) indicates the most recent common ancestor of the L2 mtDNAs in our sample. This differs from the revised CRS (Andrews et al. 1999) by mutations (transitions unless otherwise indicated) at the following positions: 73, 146, 150, 152, 182, 263, 315+C, 750, 769, 1018, 1438, 2416, 2706, 3594, 4104, 4769, 7028, 7256, 7521, 8206, 8701, 8860, 9221, 9540, 10115, 10398, 10873, 11719, 12705, 13590, 13650, 14766, 15301, 15326, 16223, 16278, 16311, 16390, and 16519.

Discussion


The first studies with high-resolution restriction mapping divided global mtDNA variation into a number of major ancient clades, called haplogroups (Wallace 1995; Torroni et al. 1996; Macaulay et al. 1999). In recent years, the dissection of these “old haplogroups” into smaller and younger monophyletic units, characterized by a more restricted geographic/ethnic distribution, has begun. For instance, haplogroups U and M are now subdivided into numerous clades (Kivisild et al. 1999; Macaulay et al. 1999; Richards et al. 2000), and even rather recent haplogroups, such as the European pre-V, have been dissected to identify spatial frequency patterns (Torroni et al. 2001). However, the intrahaplogroup clades identified so far in Eurasian haplogroups do not generally encompass all of the haplogroup members; that is, there is often a “leftover bag” of unclassified mtDNAs within each haplogroup. Our data in Table 1 suggest that this situation may not apply to the African haplogroup L2, since all L2 members from a country (the Dominican Republic) that has been populated by Africans of very different ethnic ancestry are classifiable into four well-defined clades. Indeed, a survey of our data and those published elsewhere (Chen et al. 1995, 2000; Mateu et al. 1997; Watson et al. 1997; Rando et al. 1998; Krings et al. 1999; Alves-Silva et al. 2000; Pereira et al., in press; A. Brehm, L. Pereira, H.-J. Bandelt, M. J. Prata, and A. Amorim, unpublished data) suggests that only 2 of 503 L2 mtDNAs do not fit into any of the four clades. These are 2 Biaka L2 mtDNAs, detected in a sample of 17 subjects, which harbored the RFLP motif +1899 HaeIII, −5261 HaeIII (Chen et al. 1995). Unfortunately, these two mtDNAs have apparently not been included among the 17 Biaka (4 belonging to L1a and 13 belonging to L1c) whose control-region sequences have been reported by Vigilant et al. (1991), even though both studies used the Biaka cell lines from L. Cavalli-Sforza’s laboratory as the DNA source. Thus, at the moment, it is not possible to determine whether the two L2 Biaka mtDNAs are members of L2a or L2b that have reverted at the diagnostic RFLP marker, or whether they form an additional very rare L2 clade.

The survey of available L2 HVS-I and RFLP data also suggests that the four L2 clades display different geographic/ethnic distributions. L2a, the most common clade (62% of the total L2), is the only one widespread all over Africa and appears to be subdivided into two major widespread subsets by the 16309 mutation. The derived form at 16309 appears to be more concentrated in western Africa, but distribution studies are hampered by likely reversions at this position. In contrast, L2b appears to be absent in eastern Africans (Watson et al. 1997; Krings et al. 1999) and in Biaka and Mbuti Pygmies (Vigilant et al. 1991; Chen et al. 1995), rare in southern Africans (2.9%) (Vigilant et al. 1991; Chen et al. 2000; Pereira et al., in press), but is common in some Senegalese populations (9.5%) (Chen et al. 1995; Rando et al. 1998). A similar distribution is shown by L2c, which is very common in Senegal (13.5%) (Chen et al. 1995; Rando et al. 1998) and Cabo Verde (16.7%) (A. Brehm, L. Pereira, H.-J. Bandelt, M. J. Prata, and A. Amorim, unpublished data) but is virtually absent in eastern and southern Africans (Watson et al. 1997; Krings et al. 1999; Pereira et al., in press), the Pygmies, and the !Kung (Vigilant et al. 1991; Chen et al. 1995, 2000). The fourth, newly defined clade, L2d, is rather rare. Including the mtDNAs of two subjects from the Dominican Republic, only 19 L2d mtDNAs can be identified in a total of 503 L2 subjects (3.8%): 7 in Equatorial Guinea, 2 in West Saharans, 3 in the Wolof, 1 in the Mandenka, 1 in Nigeria, 1 in the Lake Chad Kanuri, 1 in southern Sudan, and 1 in Brazil (Chen et al. 1995, 2000; Mateu et al. 1997; Watson et al. 1997; Rando et al. 1998; Krings et al. 1999; Alves-Silva et al. 2000; Pereira et al., in press; A. Brehm, L. Pereira, H.-J. Bandelt, M. J. Prata, and A. Amorim, unpublished data). Seven of these belong to the subset defined by the HVS-I motif 16111A-16145-16184-16239-16292-16355, and the other 12 harbor the distinguishing HVS-I motif 16129-16189-16300-16354. Overall, L2d appears to be mainly restricted to western Africa, like L2b and L2c.

It is worth mentioning that the less common clades L2b and L2d were not sampled in the study by Ingman et al. (2000). This is because their mtDNAs were not preselected on the basis of haplogroup affiliation, and a random sampling obviously tends to miss less-common haplogroups. To provide the widest and most-detailed coverage of the human mtDNA phylogeny, an alternative strategy (namely, selection of mtDNAs on the basis of some haplotype information, ideally both control-region and RFLP data) was pursued here, for one major haplogroup.

The phylogeny in figure 2 is striking in at least one regard: the two subjects from L2b and L2d seem disproportionately derived compared with those from L2a and L2c. This highlights a risk in using a small number of complete sequences to assess the divergence time of haplogroups. A small sample of sequences might capture only some of the variation; in this case, perhaps just that of the most common clades, L2a and L2c (see Ingman et al. 2000). In this case, a point estimate of the divergence of L2 would be an underestimate for two reasons: first, the sample would not coalesce on the likely most recent common ancestor of L2 (since it lacks L2d), and, second, the sample would lack the longer branches (in L2b and L2d). Indeed, the average number of mutations (outside of the control region) from the inferred most recent common ancestor of the L2a and L2c sequences in our sample is 14.8, whereas the same statistic evaluated for all seven L2 sequences is 19.4.

This pattern raises the question as to whether the variation at sites outside the control region (neglecting indels) is consistent with a neutral model with a uniform molecular clock. To test this, we evaluated the likelihood of the reconstructed character evolution shown on the tree in figure 2 under two models: one in which a uniform rate was enforced and another where each branch could evolve at its own rate. This calculation was made by coding the mutations inferred in the maximum-parsimony tree as binary characters and by use of a two-state model. Using the likelihood-ratio test, we could reject the uniform clock model at the 5% level (log likelihood L0=−11835.4, for uniform clock; L1=−11842.5, for variable rate model; test statistic 2[L0−L1]=14.4, a value that is exceeded in only 2.6% of cases under the null hypothesis, assuming that the test statistic is distributed as χ² with 6 df).
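The tail probability quoted for the likelihood-ratio test can be checked with a few lines of code. The sketch below is not the TREE-PUZZLE calculation; it only evaluates the upper tail of a χ² distribution, using the closed form that holds for an even number of degrees of freedom, with the reported test statistic (14.4) and 6 df as inputs. The function name is an assumption.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


if __name__ == "__main__":
    p = chi2_sf_even_df(14.4, 6)
    print(f"P(chi2_6 > 14.4) = {p:.4f}")
```

The result is roughly 0.025, in line with the 2.6% quoted above once rounding of the test statistic is allowed for.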

Our observation suggests that the mutation process has not been adequately modeled, and this could be for several reasons. First, we may have reconstructed the phylogeny imperfectly—that is, an unfortunate set of recurrent mutations could have distorted the tree topology and the reconstruction of character evolution. This seems unlikely: the L2 sequences are not highly divergent, and we have had to infer only a single recurrent mutation within the coding sequence. In addition, the tree is broadly consistent with the picture that emerges from the variation in the control region, as discussed above. Second, we may not have accounted fully for the stochastic variation in our very small sample of sequences. For instance, another example of L2d may emerge which falls on a shorter branch, more consistent with the variation in L2a and L2c; however, this might in itself be additional evidence of rate variation, since the branches within L2d would then be very different. Only more data can really settle this issue. Third, a succession of founder events and bottlenecks could perhaps generate rather extreme patterns, such as those observed in L2; however, only simulations could test this possibility. Fourth, there may be different selective pressures acting on different lineages. This latter effect might be apparent in the pattern of synonymous and nonsynonymous changes (“s” and “ns” in fig. 2) within protein-coding genes. There do appear to be differences in the proportions of these changes in different parts of L2. L2a appears impoverished in nonsynonymous changes, in comparison with the other parts of L2 and with L2bc in particular (one-tailed Fisher’s exact test for L2a versus the rest of L2: P=.031; this result should be treated with caution, since there is a potential issue concerning multiple comparisons).
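For readers who wish to repeat the comparison of synonymous and nonsynonymous changes, a one-tailed Fisher's exact test on a 2×2 table can be computed directly from the hypergeometric distribution. The sketch below is generic: the function name is an assumption, and the toy table in the usage lines is purely illustrative and is not the set of counts from figure 2.

```python
from math import comb


def fisher_lower_tail(a, b, c, d):
    """One-tailed Fisher's exact P value for the table [[a, b], [c, d]].

    Returns P(X <= a), where X is the top-left cell under the hypergeometric
    distribution with all margins fixed. Use it when the question is whether
    the first row is depleted in the first column's category.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    lowest = max(0, row1 + col1 - n)  # smallest value the cell can take
    total = comb(n, col1)
    favourable = sum(comb(row1, k) * comb(n - row1, col1 - k)
                     for k in range(lowest, a + 1))
    return favourable / total


if __name__ == "__main__":
    # Toy table (hypothetical numbers, for illustration only):
    # rows = lineage of interest vs. the rest,
    # columns = nonsynonymous vs. synonymous changes.
    print(fisher_lower_tail(0, 2, 2, 0))
```

Because several groupings of lineages could be tested in this way, the caution about multiple comparisons stated in the text applies to any such calculation.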

This hint of a role for selection in the evolution of human mtDNA follows previous work on its role in the divergence of the mtDNA of humans and chimpanzees (Nachman et al. 1996). It remains to be seen whether stronger evidence can be found in other parts of the human mtDNA phylogeny, in other geographical regions. If so, the challenge of disentangling the effects of the various evolutionary forces that have shaped human mtDNA will be renewed. In any case, it is likely that the screening of members of the L2 clades for the mutations identified by our complete sequence study will identify markers of younger age with more-restricted geographic and ethnic distributions. A detailed analysis of these subclades should provide new clues about African prehistory and the origin and relationships of African populations.


Acknowledgments

This research received support from Telethon-Italy grants E.0890 (to A.T.) and B.57 (to G.V.); Italian Consiglio Nazionale delle Ricerche grant 99.02620.CT04 (to A.T.); Fondo d’Ateneo per la Ricerca 2001 dell'Università di Pavia (to A.T.); Progetto Finalizzato C.N.R. “Beni Culturali” (Cultural Heritage, Italy) (to R.S. and A. C.); Grandi Progetti Ateneo Università di Roma “La Sapienza” (to R.S.); the Italian Ministry of the University, Progetti Ricerca Interesse Nazionale 1999 and 2001 (to A.T., R.S., and A. C.); the “Istituto Pasteur Fondazione Cenci Bolognetti,” Università di Roma “La Sapienza” (to R.S.), and a Research Career Development Fellowship from the Wellcome Trust (to V.M.).

References



Alves-Silva et al., 2000 Alves-Silva, J, Santos, MDS, Guimarães, PEM, Ferreira, ACS, Bandelt, H-J, Pena, SDJ, and Prado, VF (2000). The ancestry of Brazilian mtDNA lineages. Am J Hum Genet 67, 444–461. Abstract | Full Text | (2241 kb) | CrossRef | PubMed

Anderson et al., 1981 Anderson, S, Bankier, AT, Barrell, BG, de Bruijn, MHL, Coulson, AR, Drouin, J, Eperon, IC, Nierlich, DP, Roe, BA, Sanger, F, et al. (1981). Sequence and organisation of the human mitochondrial genome. Nature 290, 457–465. CrossRef | PubMed

Andrews et al., 1999 Andrews, RM, Kubacka, I, Chinnery, PF, Lightowlers, RN, Turnbull, DM, and Howell, N (1999). Reanalysis and revision of the Cambridge reference sequence for human mitochondrial DNA. Nat Genet 23, 147. CrossRef | PubMed

Bandelt et al., 1995 Bandelt, H-J, Forster, P, Sykes, BC, and Richards, MB (1995). Mitochondrial portraits of human populations using median networks. Genetics 141, 743–753. PubMed

Chen et al., 2000 Chen, Y-S, Olckers, A, Schurr, TG, Kogelnik, AM, Huoponen, K, and Wallace, DC (2000). mtDNA variation in the South African Kung and Khwe—and their genetic relationships to other African populations. Am J Hum Genet 66, 1362–1383. Abstract | Full Text | (1541 kb) | CrossRef | PubMed

Chen et al., 1995 Chen, Y-S, Torroni, A, Excoffier, L, Santachiara-Benerecetti, AS, and Wallace, DC (1995). Analysis of mtDNA variation in African populations reveals the most ancient of all human continent-specific haplogroups. Am J Hum Genet 57, 133–149. PubMed

Graven et al., 1995 Graven, L, Passarino, G, Semino, O, Boursot, P, Santachiara-Benerecetti, S, Langaney, A, and Excoffier, L (1995). Evolutionary correlation between control region sequence and restriction polymorphisms in the mitochondrial genome of a large Senegalese Mandenka sample. Mol Biol Evol 12, 334–345. PubMed

Ingman et al., 2000 Ingman, M, Kaessmann, H, Pääbo, S, and Gyllensten, U (2000). Mitochondrial genome variation and the origin of modern humans. Nature 408, 708–713. CrossRef | PubMed

Johnson et al., 1983 Johnson, MJ, Wallace, DC, Ferris, SD, Rattazzi, MC, and Cavalli-Sforza, LL (1983). Radiation of human mitochondrial DNA types analyzed by restriction endonuclease cleavage patterns. J Mol Evol 19, 255–271.

Kivisild et al., 1999 Kivisild, T, Bamshad, MJ, Kaldma, K, Metspalu, M, Metspalu, E, Reidla, M, Laos, S, Parik, J, Watkins, WS, Dixon, ME, et al. (1999). Deep common ancestry of Indian and western-Eurasian mitochondrial DNA lineages. Curr Biol 9, 1331–1334.

Krings et al., 1999 Krings, M, Salem, AE, Bauer, K, Geisert, H, Malek, AK, Chaix, L, Simon, C, Welsby, D, Di Rienzo, A, Utermann, G, et al. (1999). mtDNA analysis of Nile River Valley populations: a genetic corridor or a barrier to migration? Am J Hum Genet 64, 1166–1176.

Macaulay et al., 1999 Macaulay, V, Richards, M, Hickey, E, Vega, E, Cruciani, F, Guida, V, Scozzari, R, Bonné-Tamir, B, Sykes, B, and Torroni, A (1999). The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet 64, 232–249.

Mateu et al., 1997 Mateu, E, Comas, D, Calafell, F, Pérez-Lezaun, A, Abade, A, and Bertranpetit, J (1997). A tale of two islands: population history and mitochondrial DNA sequence variation of Bioko and São Tomé, Gulf of Guinea. Ann Hum Genet 61, 507–518.

Nachman et al., 1996 Nachman, MW, Brown, WM, Stoneking, M, and Aquadro, CF (1996). Nonneutral mitochondrial DNA variation in humans and chimpanzees. Genetics 142, 953–963.

Pereira et al., in press Pereira, L, Macaulay, V, Torroni, A, Scozzari, R, Prata, MJ, and Amorim, A (in press). Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet.

Quintana-Murci et al., 1999 Quintana-Murci, L, Semino, O, Bandelt, H-J, Passarino, G, McElreavey, K, and Santachiara-Benerecetti, AS (1999). Genetic evidence of an early exit of Homo sapiens sapiens from Africa through eastern Africa. Nat Genet 23, 437–441.

Rando et al., 1998 Rando, JC, Pinto, F, González, AM, Hernández, M, Larruga, JM, Cabrera, VM, and Bandelt, H-J (1998). Mitochondrial DNA analysis of northwest African populations reveals genetic exchanges with European, Near-Eastern, and sub-Saharan populations. Ann Hum Genet 62, 531–550.

Richards et al., 2000 Richards, M, Macaulay, V, Hickey, E, Vega, E, Sykes, B, Guida, V, Rengo, C, et al. (2000). Tracing European founder lineages in the Near Eastern mtDNA pool. Am J Hum Genet 67, 1251–1276.

Scozzari et al., 1994 Scozzari, R, Torroni, A, Semino, O, Cruciani, F, Spedini, G, and Santachiara Benerecetti, AS (1994). Genetic studies in Cameroon: Mitochondrial DNA polymorphisms in Bamileke. Hum Biol 66, 1–12.

Scozzari et al., 1988 Scozzari, R, Torroni, A, Semino, O, Sirugo, G, Brega, A, and Santachiara-Benerecetti, AS (1988). Genetic studies on the Senegal population. I. Mitochondrial DNA polymorphisms. Am J Hum Genet 43, 534–544.

Soodyall and Jenkins, 1992 Soodyall, H, and Jenkins, T (1992). Mitochondrial DNA polymorphisms in Khoisan populations from southern Africa. Ann Hum Genet 56, 315–324.

Soodyall and Jenkins, 1993 Soodyall, H, and Jenkins, T (1993). Mitochondrial DNA polymorphisms in Negroid populations from Namibia: new light on the origins of the Dama, Herero and Ambo. Ann Hum Biol 20, 477–485.

Strimmer and von Haeseler, 1996 Strimmer, K, and von Haeseler, A (1996). Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 13, 964–969.

Swofford, 2000 Swofford, DL (2000). PAUP*: phylogenetic analysis using parsimony (*and other methods). (Sunderland, Massachusetts: Sinauer Associates).

Torroni et al., 2001 Torroni, A, Bandelt, H-J, Macaulay, V, Richards, M, Cruciani, F, Rengo, C, Martinez-Cabrera, V, et al. (2001). A signal, from human mtDNA, of postglacial recolonization in Europe. Am J Hum Genet 69, 844–852.

Torroni et al., 1999 Torroni, A, Cruciani, F, Rengo, C, Sellitto, D, López-Bigas, N, Rabionet, R, Govea, N, López de Munain, A, Sarduy, M, Romero, L, et al. (1999). The A1555G mutation in the 12S rRNA gene of human mtDNA: recurrent origins and founder events in families affected by sensorineural deafness. Am J Hum Genet 65, 1349–1358.

Torroni et al., 1996 Torroni, A, Huoponen, K, Francalacci, P, Petrozzi, M, Morelli, L, Scozzari, R, Obinu, D, Savontaus, ML, and Wallace, DC (1996). Classification of European mtDNAs from an analysis of three European populations. Genetics 144, 1835–1850.

Torroni et al., 1997 Torroni, A, Petrozzi, M, D'Urbano, L, Sellitto, D, Zeviani, M, Carrara, F, Carducci, C, Leuzzi, V, Carelli, V, Barboni, P, et al. (1997). Haplotype and phylogenetic analyses suggest that one European-specific mtDNA background plays a role in the expression of Leber hereditary optic neuropathy by increasing the penetrance of the primary mutations 11778 and 14484. Am J Hum Genet 60, 1107–1121.

Torroni et al., 1993 Torroni, A, Schurr, TG, Cabell, MF, Brown, MD, Neel, JV, Larsen, M, Smith, DG, Vullo, CM, and Wallace, DC (1993). Asian affinities and continental radiation of the four founding Native American mtDNAs. Am J Hum Genet 53, 563–590.

Vigilant et al., 1991 Vigilant, L, Stoneking, M, Harpending, H, Hawkes, K, and Wilson, AC (1991). African populations and the evolution of human mitochondrial DNA. Science 253, 1503–1507.

Wallace, 1995 Wallace, DC (1995). Mitochondrial DNA variation in human evolution, degenerative disease, and aging. Am J Hum Genet 57, 201–223.

Watson et al., 1997 Watson, E, Forster, P, Richards, M, and Bandelt, H-J (1997). Mitochondrial footprints of human expansions in Africa. Am J Hum Genet 61, 691–704.

Publication Information


Received: August 22, 2001
Accepted: September 21, 2001

