Copyright © 2001 The American Society of Human Genetics. All rights reserved.
The American Journal of Human Genetics, Volume 69, Issue 6, 1348-1356, 1 December 2001
Antonio Torroni1, 2, Chiara Rengo2, 4, Valentina Guida2, Fulvio Cruciani2, Daniele Sellitto5, Alfredo Coppa3, Fernando Luna Calderon6, Barbara Simionati7, Giorgio Valle7, Martin Richards8, Vincent Macaulay9 and Rosaria Scozzari2
1 Dipartimento di Genetica e Microbiologia, Università di Pavia, Pavia, Italy
2 Dipartimenti di Genetica e Biologia Molecolare, Rome
3 Biologia Animale e dell’Uomo, Università “La Sapienza,” Rome
4 Istituto di Medicina Legale, Università Cattolica del Sacro Cuore, Rome
5 Centro di Genetica Evoluzionistica del Consiglio Nazionale delle Ricerche, Rome
6 Museo National de Historia Natural di Santo Domingo, Santo Domingo, Dominican Republic
7 Centro Ricerche Interdipartimentale Biotecnologie Innovative, Università di Padova, Padova, Italy
8 Department of Chemical and Biological Sciences, University of Huddersfield, Huddersfield, United Kingdom
9 Department of Statistics, University of Oxford, Oxford
Even though the term “haplogroup” was not coined until later (Torroni et al. 1993), it had already been known from one of the earliest studies of human mtDNA variation (Johnson et al. 1983) that the cluster of lineages now referred to as “haplogroup L2” (Chen et al. 1995) was a well-defined monophyletic haplotype group (type 2 and derivatives). Early RFLP studies employing five or six rare cutter restriction enzymes showed that haplogroup L2 encompasses about one-third of sub-Saharan African mtDNAs (Johnson et al. 1983; Scozzari et al. 1988, 1994; Soodyall and Jenkins 1992, 1993; Graven et al. 1995). Despite its current high frequency and its high estimated coalescence time, which has been calculated as 59,000–78,000 years on the basis of RFLP data (Chen et al. 1995, 2000) and as ∼56,000 years on the basis of hypervariable segment I (HVS-I) data (Watson et al. 1997), haplogroup L2 was not involved in the process of human expansion out of Africa and remained restricted to that continent. Intriguingly, despite these interesting features, the structure and internal sequence variation of this haplogroup have not been analyzed in detail until now.
In the present study, a group of L2 mtDNAs from the Dominican Republic, a country in which the African population component is predominant and heterogeneous in origin, was first studied by high-resolution RFLP and control-region sequence analyses. Subsequently, one mtDNA from each of the four identified clades within L2 was completely sequenced, reaching the highest possible level of molecular resolution. Unexpectedly, we observed that two of the L2 clades are disproportionately derived compared with the other two.
The population sample consisted of 127 unrelated male subjects from the Dominican Republic who were living in Santo Domingo (n=50) and San Juan de la Maguana (n=77). Appropriate informed consent was obtained from all participants, and genomic DNAs were extracted from blood through use of standard procedures.
To determine high-resolution RFLP haplotypes, the entire mtDNA was amplified using PCR in nine overlapping fragments, by the use of the primer pairs described by Torroni et al. (1997). Each of the nine PCR segments was then digested with 14 restriction endonucleases (AluI, AvaII, BamHI, DdeI, HaeII, HaeIII, HhaI, HincII, HinfI, HpaI, MspI, MboI, RsaI, and TaqI). In addition, all mtDNAs were screened for the presence/absence of the BstOI site at nucleotide position (np) 13704, the AccI sites at nps 14465 and 15254, the BfaI site at np 4914, the NlaIII sites at nps 4216 and 4577, the XbaI site at np 7440, the MseI sites at nps 14766 and 16297, and the MnlI site at np 10871. The polymorphism at np 12308 was also tested through use of a mismatched primer that generates a HinfI site when the A12308G mutation is present (Torroni et al. 1996). The mtDNA control region was sequenced between nps 16003 and 16474, as described elsewhere (Torroni et al. 1999), and included all of HVS-I (nps 16024–16383).
A new protocol has been developed and optimized to obtain complete mtDNA sequences. The entire mtDNA was amplified in 11 overlapping PCR fragments, using a set of primers with matching annealing temperatures (see Results section). After PCR, the fragments were purified using the QIAquick purification kit (QIAGEN), and Cycle Sequencing was performed by application of BigDye Terminator chemistry associated with the enzyme TaqFS, using a set of 32 nested primers specifically designed for this protocol. An ABI 3700 sequencer with 96 capillaries was employed for separation of the sequencing ladders. The sequencing was performed by the Centro Ricerche Interdipartimentale Biotecnologie Innovative (CRIBI) of the University of Padua (BMR–Servizio Sequenziamento di DNA Web site), where further technical details can be obtained. Complete sequences were aligned, assembled, and compared using the program Sequencher 3.0 (Gene Codes). Since the traces were of excellent quality and were unambiguous, it was only necessary to sequence one strand.
Phylogeny construction was performed by hand and was confirmed using Network 2.0e (Bandelt et al. 1995), for the reduced median network, and PAUP* (Swofford 2000), for the most parsimonious tree. The likelihood-ratio test of the molecular clock was performed using TREE-PUZZLE 5.0 (Strimmer and von Haeseler 1996).
High-resolution RFLP analysis and control-region sequencing revealed that 47 of the 127 Dominican subjects (37%) harbored L2 haplotypes (Table 1) and that the remainder belonged to other known African (L1, L3b, L3d, L3e, L3*, and U6), American Indian (A, B, C, and D), and western Eurasian (J and U2) haplogroups (data not shown). As reported elsewhere, L2 mtDNAs are characterized by the RFLP motif +3592 HpaI, +10394 DdeI, −10871 MnlI, +16389 HinfI/−16390 AvaII, and by the HVS-I motif 16223-16278-16390 (Chen et al. 1995, 2000; Watson et al. 1997; Quintana-Murci et al. 1999; Alves-Silva et al. 2000; Pereira et al., in press). However, our survey shows that additional RFLP markers subdivide L2 into four clades that have been termed “L2a,” “L2b,” “L2c,” and “L2d” (Table 1). Clades L2a (+13803 HaeIII), L2b (+4157 AluI), and L2c (−322 HaeIII, −679 DdeI, and −13957 HaeIII) were previously identified by Chen et al. (2000), and L2d (−3693 MboI and a transition at np 16399) is described here for the first time. Diagnostic mutations in HVS-I further distinguish the four clades from each other in some cases (Table 1 and fig. 1). The clade L2d, although represented by only two subjects in our sample, is by far the most divergent clade within L2 (fig. 1).
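The clade definitions above can be read as a simple lookup. The following is a minimal sketch, not the authors' procedure: it assumes each haplotype is given as a collection of site-state strings in the Table 1 shorthand (sign, position, single-letter enzyme code), written with an ASCII hyphen for site losses. The names `DIAGNOSTIC` and `assign_l2_clade` are hypothetical, and the HVS-I transition at np 16399 that also marks L2d is not checked here.

```python
# Diagnostic RFLP states quoted in the text. Enzyme codes follow Table 1:
# a = AluI, c = DdeI, e = HaeIII, j = MboI. An ASCII "-" stands for a site loss.
DIAGNOSTIC = {
    "L2a": {"+13803e"},
    "L2b": {"+4157a"},
    "L2c": {"-322e", "-679c", "-13957e"},
    "L2d": {"-3693j"},
}


def assign_l2_clade(site_states):
    """Return the single L2 clade whose diagnostic states are all present.

    Returns None when no clade, or more than one clade, matches.
    """
    observed = set(site_states)
    matches = [clade for clade, marks in DIAGNOSTIC.items() if marks <= observed]
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    print(assign_l2_clade(["+13803e", "+16517e"]))
    print(assign_l2_clade(["-322e", "-679c", "-13957e", "-8858f"]))
```

Returning None for ambiguous or unmatched haplotypes keeps such cases visible for manual inspection rather than forcing them into a clade.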
|L2 Clade||RFLP (b)||HVS-I (c)||No. of Subjects|
|L2a||+13803e, +12752a, +15749s, +16517e||223-278-294-390-189-193-309||1|
|L2a||+13803e, −12629b/+12629j, +16517e||223-278-294-390-309||1|
|L2a||+13803e, +14003p, +16239s, +16517e||223-278-294-390-193-213-239-309||1|
|L2a||+13803e, −6296c, +16517e||278-294-390-189-192-309||1|
|L2a||+13803e, [−3592h], +16517e||223-278-294-390-189-309||1|
|L2b||+4157a, +6610g, +14406c||114A-129-213-223-278-390-354||1|
|L2b||+4157a, +6610g, +11313a||114A-129-213-223-278-390||2|
|L2b||+4157a, +417kd, −16310k||114A-129-213-223-278-390-311-355-362-368||2|
|L2b||+4157a, +417kd, −15883e||114A-129-213-223-278-390-355-362-465||1|
|L2b||+4157a, +417kd, −5261e, −15776a||114A-213-223-278-390-255-284-355-362||1|
|L2b||+4157a, +5559a, −5742i||114A-129-213-223-278-390-212||1|
|L2c||−322e, −679c, −13957e||223-278-390-192-261||5|
|L2c||−322e, −679c, −13957e||223-278-390-263||1|
|L2c||−322e, −679c, −13957e||223-278-390-093-189-264||1|
|L2c||−322e, −679c, −13957e, −8858f||223-278-390-214-274||1|
|L2c||−322e, −679c, −13957e, −8858f, +16517e||223-278-390||1|
|L2c||−322e, −679c, −13957e, +6618e, −16297s||223-278-390-264-298||4|
|L2c||−322e, −679c, −13957e, −16310k, +16517e||223-278-390-181-311||1|
|L2c||−322e, −679c, −13957e, −13704p, −15996c/−16000g||223-278-390-172||3|
|L2d||−3693j, −3534c/−3537a, −5584a, −6014l, +12946c/+12949n/+12950f, −13704p, +15494c, +16143s, +16239s, −16310k, +16517e, COII-tRNALys 6-bp insertion||223-278-390-399-111A-145-184-213-234-239-258-292-295-311-355-400||1|
|L2d||−3693j, −9553e, −12629b/+12629j, −15776a, +16296c/−16297s, −16310k, +16398e, +16517e||278-390-399-093-129-189-293-300-311-354||1|
|a States diagnostic of each of the L2 clades are underlined.|
b All L2 mtDNAs harbor the RFLP motif +3592h, +10394c, −10871z, +16389g/−16390b, except for those in which square brackets indicate reverted RFLP sites. Sites are numbered from the first nucleotide of the recognition sequence. A “+” indicates the presence of a restriction site, a “−” the absence. The explicit indication of the presence/absence of a site implies the absence/presence in haplotypes not so designated. The restriction enzymes used in the analysis are designated by the following single-letter codes: a, AluI; b, AvaII; c, DdeI; e, HaeIII; f, HhaI; g, HinfI; h, HpaI; i, MspI; j, MboI; k, RsaI; l, TaqI; m, BamHI; n, HaeII; o, HincII; p, BstOI; q, NlaIII; r, BfaI; s, MseI; z, MnlI. A slash (/) separating states indicates the simultaneous presence or absence of restriction sites that can be correlated with a single-nucleotide substitution.
c Only those nucleotide positions (minus 16000) between 16003 and 16474 that differ from the Cambridge Reference Sequence (CRS) (Andrews et al. 1999) are shown. Mutations are transitions, unless the base change is specified explicitly.
d Incorrectly mapped as +762k by Chen et al. (1995).
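The HVS-I shorthand described in footnote c can be expanded mechanically. The sketch below assumes motifs are hyphen-separated three-digit tokens (position minus 16000), optionally followed by an explicit base for changes that are not transitions, as in "114A". The function name `parse_hvs1` is hypothetical.

```python
import re


def parse_hvs1(motif):
    """Expand Table 1 HVS-I shorthand into (nucleotide position, base) pairs.

    The base is None for a transition, or the explicit base letter when the
    table specifies the change (e.g. "114A" becomes (16114, "A")).
    """
    variants = []
    for token in motif.split("-"):
        match = re.fullmatch(r"(\d{3})([ACGT]?)", token)
        if match is None:
            raise ValueError(f"unrecognised HVS-I token: {token!r}")
        position = 16000 + int(match.group(1))
        variants.append((position, match.group(2) or None))
    return variants


if __name__ == "__main__":
    for np_, base in parse_hvs1("114A-129-213-223-278-390"):
        print(np_, base if base else "transition")
```

Working with full nucleotide positions makes it straightforward to compare a haplotype with the L2 motif 16223-16278-16390 or with published HVS-I data that use unabbreviated coordinates.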
To better define the relationships between the four L2 clades, one mtDNA (denoted by a black circle in fig. 1) from each of the four clades was completely sequenced. For the present analysis, we developed an efficient sequencing strategy that minimizes time and expense. First, the mtDNA was PCR amplified into 11 fragments by means of primer pairs with almost identical melting temperatures (Table 2), so that the 11 PCR reactions could be performed simultaneously at the same annealing temperature (55°C) in the same thermocycler. Only 32 nested primers were then employed for the cycle sequencing procedure (Table 3).
|PCR ID Number||Fragment Length (bp)||Name||5′ np||3′ np||Sequence (5′→3′)||Melting Temperature (°C)|
|Note.—The annealing temperature for all PCR reactions is 55°C;|
|a nps correspond to the CRS (Anderson et al. 1981). The length of each oligonucleotide was 22 nucleotides.|
|Template PCR ID Number||Name||Length (nucleotides)||5′ np||3′ np||Sequence (5′→3′)||Melting Temperature (°C)|
|a nps correspond to the CRS (Anderson et al. 1981).|
A phylogeny of the four L2 complete sequences is shown in figure 2. Consistent with L2d being the most divergent clade, the tree (rooted using a complete sequence from L1a as an outgroup) shows that L2d branched earliest within haplogroup L2. This first branching was followed by that giving rise to L2a, and L2b and L2c are the most closely related.
The first studies with high-resolution restriction mapping divided global mtDNA variation into a number of major ancient clades, called haplogroups (Wallace 1995; Torroni et al. 1996; Macaulay et al. 1999). In recent years, the dissection of these “old haplogroups” into smaller and younger monophyletic units, characterized by a more restricted geographic/ethnic distribution, has begun. For instance, haplogroups U and M are now subdivided into numerous clades (Kivisild et al. 1999; Macaulay et al. 1999; Richards et al. 2000), and even rather recent haplogroups, such as the European pre-V, have been dissected to identify spatial frequency patterns (Torroni et al. 2001). However, the intrahaplogroup clades identified so far in Eurasian haplogroups do not generally encompass all of the haplogroup members; that is, there is often a “leftover bag” of unclassified mtDNAs within each haplogroup. Our data in Table 1 suggest that this situation may not apply to the African haplogroup L2, since all L2 members from a country (the Dominican Republic) that has been populated by Africans of very different ethnic ancestry are classifiable into four well-defined clades. Indeed, a survey of our data and those published elsewhere (Chen et al. 1995, 2000; Mateu et al. 1997; Watson et al. 1997; Rando et al. 1998; Krings et al. 1999; Alves-Silva et al. 2000; Pereira et al., in press; A. Brehm, L. Pereira, H.-J. Bandelt, M. J. Prata, and A. Amorim, unpublished data) suggests that only 2 of 503 L2 mtDNAs do not fit into any of the four clades. These are 2 Biaka L2 mtDNAs, detected in a sample of 17 subjects, which harbored the RFLP motif +1899 HaeIII, −5261 HaeIII (Chen et al. 1995). Unfortunately, these two mtDNAs have apparently not been included among the 17 Biaka (4 belonging to L1a and 13 belonging to L1c) whose control-region sequences have been reported by Vigilant et al. (1991), even though both studies used the Biaka cell lines from L. Cavalli-Sforza’s laboratory as the DNA source. Thus, at the moment, it is not possible to determine whether the two L2 Biaka mtDNAs are members of L2a or L2b that have reverted at the diagnostic RFLP marker, or whether they form an additional very rare L2 clade.
The survey of available L2 HVS-I and RFLP data also suggests that the four L2 clades display different geographic/ethnic distributions. L2a, the most common clade (62% of the total L2), is the only one widespread all over Africa and appears to be subdivided into two major widespread subsets by the 16309 mutation. The derived form at 16309 appears to be more concentrated in western Africa, but distribution studies are hampered by likely reversions at this position. In contrast, L2b appears to be absent in eastern Africans (Watson et al. 1997; Krings et al. 1999) and in Biaka and Mbuti Pygmies (Vigilant et al. 1991; Chen et al. 1995), rare in southern Africans (2.9%) (Vigilant et al. 1991; Chen et al. 2000; Pereira et al., in press), but is common in some Senegalese populations (9.5%) (Chen et al. 1995; Rando et al. 1998). A similar distribution is shown by L2c, which is very common in Senegal (13.5%) (Chen et al. 1995; Rando et al. 1998) and Cabo Verde (16.7%) (A. Brehm, L. Pereira, H.-J. Bandelt, M. J. Prata, and A. Amorim, unpublished data) but is virtually absent in eastern and southern Africans (Watson et al. 1997; Krings et al. 1999; Pereira et al., in press), the Pygmies, and the !Kung (Vigilant et al. 1991; Chen et al. 1995, 2000). The fourth, newly defined clade, L2d, is rather rare. Including the mtDNAs of two subjects from the Dominican Republic, only 19 L2d mtDNAs can be identified in a total of 503 L2 subjects (3.8%): 7 in Equatorial Guinea, 2 in West Saharans, 3 in the Wolof, 1 in the Mandenka, 1 in Nigeria, 1 in the Lake Chad Kanuri, 1 in southern Sudan, and 1 in Brazil (Chen et al. 1995, 2000; Mateu et al. 1997; Watson et al. 1997; Rando et al. 1998; Krings et al. 1999; Alves-Silva et al. 2000; Pereira et al., in press; A. Brehm, L. Pereira, H.-J. Bandelt, M. J. Prata, and A. Amorim, unpublished data). Seven of these belong to the subset defined by the HVS-I motif 16111A-16145-16184-16239-16292-16355, and the other 12 harbor the distinguishing HVS-I motif 16129-16189-16300-16354. Overall, L2d appears to be mainly restricted to western Africa, like L2b and L2c.
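The L2d tally quoted above can be checked with a few lines of arithmetic. This sketch uses only the per-population counts and the total of 503 L2 mtDNAs given in the text; the variable names are illustrative.

```python
# L2d counts per population, as listed in the text.
L2D_COUNTS = {
    "Dominican Republic": 2,
    "Equatorial Guinea": 7,
    "West Saharans": 2,
    "Wolof": 3,
    "Mandenka": 1,
    "Nigeria": 1,
    "Lake Chad Kanuri": 1,
    "southern Sudan": 1,
    "Brazil": 1,
}
TOTAL_L2 = 503  # all L2 mtDNAs in the survey

total_l2d = sum(L2D_COUNTS.values())
share_percent = 100 * total_l2d / TOTAL_L2

if __name__ == "__main__":
    print(f"L2d: {total_l2d} of {TOTAL_L2} L2 mtDNAs ({share_percent:.1f}%)")
```

The sum reproduces the 19 L2d mtDNAs and the 3.8% share, and the two HVS-I subsets (7 and 12) account for the same 19.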
It is worth mentioning that the less common clades L2b and L2d were not sampled in the study by Ingman et al. (2000). This is because their mtDNAs were not preselected on the basis of haplogroup affiliation, and a random sampling obviously tends to miss less-common haplogroups. To provide the widest and most-detailed coverage of the human mtDNA phylogeny, an alternative strategy (namely, selection of mtDNAs on the basis of some haplotype information, ideally both control-region and RFLP data) was pursued here, for one major haplogroup.
The phylogeny in figure 2 is striking in at least one regard: the two subjects from L2b and L2d seem disproportionately derived compared with those from L2a and L2c. This highlights a risk in using a small number of complete sequences to assess the divergence time of haplogroups. A small sample of sequences might capture only some of the variation; in this case, perhaps just that of the most common clades, L2a and L2c (see Ingman et al. 2000). In this case, a point estimate of the divergence of L2 would be an underestimate for two reasons: first, the sample would not coalesce on the likely most recent common ancestor of L2 (since it lacks L2d), and, second, the sample would lack the longer branches (in L2b and L2d). Indeed, the average number of mutations (outside of the control region) from the inferred most recent common ancestor of the L2a and L2c sequences in our sample is 14.8, whereas the same statistic evaluated for all seven L2 sequences is 19.4.
This pattern raises the question as to whether the variation at sites outside the control region (neglecting indels) is consistent with a neutral model with a uniform molecular clock. To test this, we evaluated the likelihood of the reconstructed character evolution shown on the tree in figure 2 under two models: one in which a uniform rate was enforced and another where each branch could evolve at its own rate. This calculation was made by coding the mutations inferred in the maximum-parsimony tree as binary characters and by use of a two-state model. Using the likelihood-ratio test, we could reject the uniform clock model at the 5% level (log likelihood L0=−11835.4, for uniform clock; L1=−11842.5, for variable rate model; test statistic 2[L0−L1]=14.4, a value that is exceeded in only 2.6% of cases under the null hypothesis, assuming that the test statistic is distributed as χ2 with 6 df).
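The tail probability attached to the test statistic can be recomputed from the statistic and the degrees of freedom alone. The sketch below takes the reported statistic (14.4) and 6 df as inputs and uses the closed form of the χ2 survival function that holds for even degrees of freedom, Q(x; 2m) = exp(−x/2) Σ_{k=0}^{m−1} (x/2)^k / k!. The function name is hypothetical, and this is not a reimplementation of TREE-PUZZLE.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even df.

    Uses the finite-sum identity valid for even degrees of freedom only.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    terms = sum(half**k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


if __name__ == "__main__":
    p_value = chi2_sf_even_df(14.4, 6)
    print(f"P(chi2_6 > 14.4) = {p_value:.4f}")
```

With these inputs the tail probability comes out at about 0.025, in the same range as the figure quoted in the text and below the 5% threshold; any small difference may simply reflect rounding of the reported statistic.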
Our observation suggests that the mutation process has not been adequately modeled, and this could be for several reasons. First, we may have reconstructed the phylogeny imperfectly—that is, an unfortunate set of recurrent mutations could have distorted the tree topology and the reconstruction of character evolution. This seems unlikely: the L2 sequences are not highly divergent, and we have had to infer only a single recurrent mutation within the coding sequence. In addition, the tree is broadly consistent with the picture that emerges from the variation in the control region, as discussed above. Second, we may not have accounted fully for the stochastic variation in our very small sample of sequences. For instance, another example of L2d may emerge which falls on a shorter branch, more consistent with the variation in L2a and L2c; however, this might in itself be additional evidence of rate variation, since the branches within L2d would then be very different. Only more data can really settle this issue. Third, a succession of founder events and bottlenecks could perhaps generate rather extreme patterns, such as those observed in L2; however, only simulations could test this possibility. Fourth, there may be different selective pressures acting on different lineages. This latter effect might be apparent in the pattern of synonymous and nonsynonymous changes (“s” and “ns” in fig. 2) within protein-coding genes. There do appear to be differences in the proportions of these changes in different parts of L2. L2a appears impoverished in nonsynonymous changes, in comparison with the other parts of L2 and with L2bc in particular (one-tailed Fisher’s exact test for L2a versus the rest of L2: P=.031; this result should be treated with caution, since there is a potential issue concerning multiple comparisons).
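A one-tailed Fisher's exact test of the kind mentioned above sums hypergeometric probabilities over tables at least as extreme as the observed one. The sketch below is generic: the counts of synonymous and nonsynonymous changes are not given in this section, so the example table is a placeholder unrelated to figure 2, and the function name is hypothetical.

```python
from math import comb


def fisher_one_tailed_low(a, b, c, d):
    """P(X <= a) for the 2x2 table [[a, b], [c, d]] with fixed margins.

    Here a could be the nonsynonymous count in one clade, b its synonymous
    count, and c, d the same counts for the rest of the tree.
    """
    row1 = a + b          # total changes in the first group
    col1 = a + c          # total changes of the first type
    n = a + b + c + d
    lowest = max(0, row1 + col1 - n)  # smallest value a can take
    favourable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(lowest, a + 1)
    )
    return favourable / comb(n, row1)


if __name__ == "__main__":
    # Placeholder counts for illustration only, not the counts from figure 2.
    print(f"{fisher_one_tailed_low(1, 9, 11, 3):.6f}")
```

The lower tail is the relevant one when asking whether a group is impoverished in one class of change relative to the rest.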
This hint of a role for selection in the evolution of human mtDNA follows previous work on its role in the divergence of the mtDNA of humans and chimpanzees (Nachman et al. 1996). It remains to be seen whether stronger evidence can be found in other parts of the human mtDNA phylogeny, in other geographical regions. If so, the challenge of disentangling the effects of the various evolutionary forces that have shaped human mtDNA will be renewed. In any case, it is likely that the screening of members of the L2 clades for the mutations identified by our complete sequence study will identify markers of younger age with more-restricted geographic and ethnic distributions. A detailed analysis of these subclades should provide new clues about African prehistory and the origin and relationships of African populations.
This research received support from Telethon-Italy grants E.0890 (to A.T.) and B.57 (to G.V.); Italian Consiglio Nazionale delle Ricerche grant 99.02620.CT04 (to A.T.); Fondo d’Ateneo per la Ricerca 2001 dell'Università di Pavia (to A.T.); Progetto Finalizzato C.N.R. “Beni Culturali” (Cultural Heritage, Italy) (to R.S. and A. C.); Grandi Progetti Ateneo Università di Roma “La Sapienza” (to R.S.); the Italian Ministry of the University, Progetti Ricerca Interesse Nazionale 1999 and 2001 (to A.T., R.S., and A. C.); the “Istituto Pasteur Fondazione Cenci Bolognetti,” Università di Roma “La Sapienza” (to R.S.), and a Research Career Development Fellowship from the Wellcome Trust (to V.M.).
The URL for data in this article is as follows:
BMR–Servizio Sequenziamento di DNA, http://bmr.cribi.unipd.it/ (for technical details regarding mtDNA sequencing)
Chen et al. (2000). mtDNA variation in the South African Kung and Khwe—and their genetic relationships to other African populations. Am J Hum Genet 66, 1362–1383.
Graven et al. (1995). Evolutionary correlation between control region sequence and restriction polymorphisms in the mitochondrial genome of a large Senegalese Mandenka sample. Mol Biol Evol 12, 334–345.
Krings et al. (1999). mtDNA analysis of Nile River Valley populations: a genetic corridor or a barrier to migration? Am J Hum Genet 64, 1166–1176.
Macaulay et al. (1999). The emerging tree of West Eurasian mtDNAs: a synthesis of control-region sequences and RFLPs. Am J Hum Genet 64, 232–249.
Pereira L, Macaulay V, Torroni A, Scozzari R, Prata MJ, Amorim A (in press). Prehistoric and historic traces in the mtDNA of Mozambique: insights into the Bantu expansions and the slave trade. Ann Hum Genet.
Rando et al. (1998). Mitochondrial DNA analysis of northwest African populations reveals genetic exchanges with European, Near-Eastern, and sub-Saharan populations. Ann Hum Genet 62, 531–550.
Soodyall and Jenkins (1993). Mitochondrial DNA polymorphisms in Negroid populations from Namibia: new light on the origins of the Dama, Herero and Ambo. Ann Hum Biol 20, 477–485.
Torroni et al. (1999). The A1555G mutation in the 12S rRNA gene of human mtDNA: recurrent origins and founder events in families affected by sensorineural deafness. Am J Hum Genet 65, 1349–1358.
Torroni et al. (1997). Haplotype and phylogenetic analyses suggest that one European-specific mtDNA background plays a role in the expression of Leber hereditary optic neuropathy by increasing the penetrance of the primary mutations 11778 and 14484. Am J Hum Genet 60, 1107–1121.