The Longue Durée of Genetic Ancestry: Multiple Genetic Marker Systems and Celtic Origins on the Atlantic Facade of Europe
Abstract
Celtic languages are now spoken only on the Atlantic facade of Europe, mainly in Britain and Ireland, but were spoken more widely in western and central Europe until the collapse of the Roman Empire in the first millennium a.d. It has been common to couple archaeological evidence for the expansion of Iron Age elites in central Europe with the dispersal of these languages and of Celtic ethnicity and to posit a central European “homeland” for the Celtic peoples. More recently, however, archaeologists have questioned this “migrationist” view of Celtic ethnogenesis. The proposition of a central European ancestry should be testable by examining the distribution of genetic markers; however, although Y-chromosome patterns in Atlantic Europe show little evidence of central European influence, there has hitherto been insufficient data to confirm this by use of mitochondrial DNA (mtDNA). Here, we present both new mtDNA data from Ireland and a novel analysis of a greatly enlarged European mtDNA database. We show that mtDNA lineages, when analyzed in sufficiently large numbers, display patterns significantly similar to a large fraction of both Y-chromosome and autosomal variation. These multiple genetic marker systems indicate a shared ancestry throughout the Atlantic zone, from northern Iberia to western Scandinavia, that dates back to the end of the last Ice Age.
Family resemblances among the Celtic languages were first identified by Edward Lhuyd at the end of the 17th century (Renfrew 1987; Cunliffe 1997; James 1999). The surviving insular Celtic (as opposed to the extinct continental Celtic) languages are usually subdivided into two groups on the basis of a single consonantal shift. Welsh, Breton, and the extinct Cornish language are referred to as “Brythonic,” or “P-Celtic,” whereas Irish, Scottish Gaelic, and the extinct Manx language are thought to be more archaic and are referred to as “Goidelic,” or “Q-Celtic” (Trask 1996). The continental Gaulish language, formerly spoken in northern and central France, was more closely related to the insular Brythonic group but has no surviving descendants. Traces also survive of Celtiberian, a Celtic language spoken in antiquity in western and central Iberia alongside the non-Indo-European Basque language of northeastern Spain and Aquitaine, and of Lepontic, which was spoken in northern Italy from the 6th century b.c. Lhuyd named the languages “Celtic,” after one of the names applied to the people of northern France in Greek and Roman ethnography dating from about the 5th century b.c.
Conflating the historical ethnographic “Celts” and 17th-century “Celtic speakers,” Lhuyd followed up his discovery by proposing that the two forms of the Celtic language had been brought to Britain and Ireland by waves of invasion from the continent, and, in the years that followed, antiquarians and archaeologists suggested that these migrations had their “heartland” in the elite La Tène cultural package of the central European Iron Age. Despite the adoption of La Tène art styles, however, archaeological evidence for large-scale Iron Age migrations into the British Isles has been singularly lacking. In Ireland, for example, La Tène artifacts are relatively rare and are almost always of indigenous manufacture rather than of external origin (Raftery 1994), leading archaeologists and historians to question the accepted idea of Celtic migration to Ireland (Ó’Donnabháin 2000). More generally, Renfrew (1987), among others, proposed that the roots of insular Celtic identity lay within the region in which the Celtic languages were historically spoken, in the diffusion of Indo-European speakers into Britain and Ireland with the arrival of the Neolithic in ∼4000 b.c. Cunliffe (2001) appears to go further, describing the coalescence of the Celtic languages along the coastline of the Atlantic facade of Europe, from southern Iberia to the Shetland Islands, via maritime networks that reach back into the late Mesolithic period. The similarities in prehistoric monumental architecture and the spread of the early–Bronze Age “Beaker package,” to take two examples, attest to the likely sharing of beliefs and attitudes through social networks that extended from one end of the Atlantic zone to the other.
This view uncouples the necessary connection, established by Lhuyd, between the various aspects of what in the past 200 years has come to be thought of as a “Celtic package”: in particular, the peoples encountered and described as “Celts” by the classical authors, the producers of Iron Age La Tène art and their descendants, and speakers of Celtic languages. Modern Celtic speakers should, by this view, be thought of rather as “Atlantic Celts,” whose putative continental Iron Age ancestry is open to question (James 1999; Cunliffe 2001). At the same time, many archaeologists and a seeming majority of historians retain the traditional view with some vigor (Megaw and Megaw 1996, 1998).
Genetic evidence has recently lent some support to the suggestion of a shared ancestral heritage among the human populations of Atlantic Europe. Y-chromosome analysis has highlighted similarities between the Pyrenean populations of northern Spain and western population samples from the British Isles (Hill et al. 2000; Wilson et al. 2001). More specifically, a modal haplotype defined by SNP and STR markers (the “Atlantic modal haplotype” in haplogroup R1b [Y Chromosome Consortium 2002]) is present at an unusually high frequency in each population. This has been interpreted as a common Paleolithic genetic legacy that was relatively undisturbed at the edge of the European peninsula by subsequent dispersals from the east, such as those suggested to have taken place during the spread of the Neolithic (Wilson et al. 2001). Some classical marker systems also hint at Atlantic affinities: for example, alleles of the ABO and Rhesus blood groups display frequency peaks in Atlantic Europe (Cavalli-Sforza et al. 1994).
By contrast, studies of human mtDNA in Europe suggested a lack of structure within the continent, leading to debate about the extent to which inferences on demographic history could be justifiably drawn (Richards et al. 1996; Simoni et al. 2000a, 2000b; Torroni et al. 2000). However, with larger data sets, some patterns began to emerge. Wilson et al. (2001) suggested that both mtDNA and X-linked microsatellites (which spend two-thirds of their history in females) indicate an asymmetry between male and female histories of western areas of the British Isles. Principal-components analysis (PCA) of these markers indicated a greater similarity to central European populations than was seen for the Y chromosome, prompting the suggestion that at least one of the “cultural transitions” in the British Isles involved female immigration. More recently, PCA of mtDNA haplogroup frequencies has demonstrated the existence of trends in mtDNA across Europe similar to those found in other markers (Richards et al. 2002). However, since it considered only haplogroup frequencies, this latter analysis did not take advantage of the full range of European variation present at the individual haplotype level. It was also conducted at a coarse geographical level (since sample sizes were still relatively small) and therefore provided little direct information on finer-scale variation within the continent.
In this study, we assembled and analyzed published European and Near Eastern mtDNA hypervariable segment I (HVS-I) sequences from 8,533 individuals from 45 populations—twice as many as the largest previous analyses (Di Rienzo and Wilson 1991; Piercy et al. 1993; Pult et al. 1994; Bertranpetit et al. 1995; Sajantila et al. 1995, 1996; Calafell et al. 1996; Côrte-Real et al. 1996; Decorte 1996; Francalacci et al. 1996; Richards et al. 1996, 2000; Hofmann et al. 1997; Baasner et al. 1998; Lutz et al. 1998; Opdal et al. 1998; Parson et al. 1998; Rousselet and Mangin 1998; Salas et al. 1998; Kittles et al. 1999; Orekhov et al. 1999; Pfeiffer et al. 1999; Belledi et al. 2000; Dimo-Simonin et al. 2000; Helgason et al. 2000, 2001; Lahermo et al. 2000; Pereira et al. 2000; Cali et al. 2001; Di Benedetto et al. 2001; Kouvatsi et al. 2001; Larruga et al. 2001; Malyarchuk and Derenko 2001; Mogentale-Profizi et al. 2001; Tagliabracci et al. 2001; Malyarchuk et al. 2002; Passarino et al. 2002; Gonzalez et al. 2003; Dubut et al. 2004; V. M. Cabrera, personal communication). To these, we added 200 new mtDNA sequences from maternally unrelated subjects from Ireland, and we compared the mtDNA variation with that in Y-chromosome and autosomal markers to investigate the extent to which these different genetic marker systems suggest shared or differing demographic histories in the Atlantic zone of Europe.
All samples were collected with informed consent. We then isolated DNA from buccal cells and sequenced the mtDNA HVS-I in both forward and reverse directions. To allow comparison with previously published Irish data, only variation in positions 16030–16394 (with respect to the Cambridge reference sequence [CRS] [Anderson et al. 1981]) was considered. Four SNP markers were also typed by use of RFLP analysis: 12308, 4580, 7028, and 73, distinguishing haplogroups U, V, H, and pre-HV, respectively (Torroni et al. 1996; Macaulay et al. 1999).
We observed a total of 155 haplotypes among the 300 Irish individuals studied (including 100 from previous studies), with all but one sample falling into the main western Eurasian haplogroups: U, HV, JT, I, W, and X (Richards et al. 1998). Full results and additional supplementary information are available from the authors' Web site. A χ2 test of mtDNA haplogroup frequencies in samples from eastern (n=127) and western (n=128) Ireland showed no significant differences (the remaining 45 samples did not fall into the eastern or western region). In addition, the genetic distance (ΦST value) between the two regions, on the basis of HVS-I, is small and not significantly greater than zero. This contrasts with the Y-chromosome pattern in Ireland, where eastern and western complements have been shown to be substantially different. This difference between eastern and western Irish Y chromosomes has been attributed to the preferential settlement of subsequent migrants to the accessible east coast after initial colonization (Hill et al. 2000).
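The east-west comparison of haplogroup frequencies described above is a χ2 test of homogeneity on a contingency table with one row per region and one column per haplogroup. A minimal Python sketch of the statistic is given here for illustration. The function name and the toy counts are assumptions and are not the Irish data, and the P value would still have to be read from a χ2 distribution with the returned degrees of freedom.

```python
def chi_square_homogeneity(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows (e.g. regions), each a list of counts per
    column (e.g. haplogroups). Assumes every row and column total is > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under homogeneity of the rows.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical counts: two regions (rows) by three haplogroup classes (columns).
toy_counts = [[30, 50, 20], [35, 45, 20]]
stat, dof = chi_square_homogeneity(toy_counts)
print(f"chi2 = {stat:.3f} on {dof} degrees of freedom")
```

With many rare haplogroups, expected counts can become small, so pooling rare classes before testing may be advisable.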
Founder analysis (Richards et al. 2000) dated the entry of different mtDNA lineages into Europe by examining the levels of nucleotide diversity accumulated around haplotypes that have matches in the Near East. Roughly 20% of Europeans, principally those belonging to haplogroups J, T1, and U3, are proposed to descend from Neolithic settlers, with the remainder attributed to earlier Late Paleolithic/Mesolithic inhabitants. About 13% of Irish mtDNAs belong to putative Neolithic clusters, a value that is toward the lower end of the range found in Europe and similar to areas such as Scandinavia and the western Mediterranean (Iberia). This observation is consistent with the progressive dilution of the genetic impact of these migrants toward the north and west of Europe. Furthermore, there is an even distribution of putatively Neolithic haplogroups around the island, suggesting that females who arrived after the initial settlement were not restricted to east-facing regions. There are two potential explanations for this: either they were more mobile after arrival in the east or other regions of the island were in direct contact with the continental source populations. By contrast, however, Y-chromosome lineages of putative Near Eastern Neolithic origin (Semino et al. 2000) appear to be virtually absent from the west of Ireland (Hill et al. 2000).
We first examined broad affinities at the population level. Control-region sequences from 8,733 individuals were assembled from the current and previous studies and were grouped into 45 geographically defined population samples. Each of these was checked for quality, as recommended by Bandelt et al. (2002) (results available on the authors' Web site). Samples from some small and/or isolated populations were excluded from our analysis because of the possibility of unusually strong genetic drift that might confound any broader phylogeographical patterns. These included the Western Isles of Scotland, Orkney, and Skye (Helgason et al. 2001).
We estimated genetic distances between all populations as linearized ΦST statistics (Slatkin 1995) by use of ARLEQUIN, version 2.000 (Schneider et al. 2000). The ΦST values were based on pairwise sequence differences between positions 16090 and 16365 to allow maximum comparability between all populations.
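To make the distance measure concrete, the following Python sketch computes an AMOVA-style ΦST for a single pair of populations from raw pairwise sequence differences and then applies Slatkin's linearization, ΦST/(1 − ΦST). It is a simplified illustration under stated assumptions, not a reproduction of the ARLEQUIN calculation: it uses uncorrected difference counts (no gamma rate correction), and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def pairwise_differences(seq_a, seq_b):
    """Count the sites at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def phi_st(pop_a, pop_b):
    """AMOVA-style Phi_ST for two populations (lists of aligned sequences)."""
    n_a, n_b = len(pop_a), len(pop_b)
    n_total = n_a + n_b

    def summed_within(pop):
        return sum(pairwise_differences(x, y) for x, y in combinations(pop, 2))

    within_a, within_b = summed_within(pop_a), summed_within(pop_b)
    between = sum(pairwise_differences(x, y) for x in pop_a for y in pop_b)

    # Sums of squared deviations, taking the number of differences as the
    # squared distance between two haplotypes.
    ssd_within = within_a / n_a + within_b / n_b
    ssd_total = (within_a + within_b + between) / n_total
    ssd_among = ssd_total - ssd_within

    var_within = ssd_within / (n_total - 2)           # two populations
    n0 = n_total - (n_a ** 2 + n_b ** 2) / n_total    # average sample-size term
    var_among = (ssd_among - var_within) / n0

    total_var = var_among + var_within
    return var_among / total_var if total_var else 0.0


def linearize(phi):
    """Slatkin's linearization; undefined (infinite) when phi reaches 1."""
    return float("inf") if phi >= 1.0 else phi / (1.0 - phi)


# Hypothetical four-site haplotypes, two individuals per population.
toy_a = ["AAAA", "AAAT"]
toy_b = ["TTTT", "TTTA"]
phi = phi_st(toy_a, toy_b)
print(phi, linearize(phi))
```

Sampling noise can make the estimate slightly negative when two populations are very similar, and how such values are treated before further analysis is a separate choice that this sketch does not make.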
To account for mutation-rate heterogeneity in the mtDNA control region, site rates were modeled as independently and identically distributed (i.i.d.) gamma with α=0.26 (Meyer et al. 1999). The resulting matrix of interpopulation ΦST values was summarized in two dimensions by use of multidimensional scaling (MDS) implemented by the ALSCAL program included in the SPSS package, version 11.0. The results are shown in figure 1. A broadly east-west—or southeast-northwest—trend is evident in the first dimension, with the Jordanian and Basque population samples occupying the respective poles. This echoes the trend observed in a PCA of haplogroup frequencies of European regions (Richards et al. 2002). Spatial autocorrelation analysis confirms that dimension 1 values are consistent with a clinal pattern across Europe (not shown).
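The reduction of a distance matrix to two dimensions can be illustrated with classical (Torgerson) metric MDS. This is a different and simpler algorithm from ALSCAL and is substituted here only because it is compact; it will not reproduce the published coordinates. The sketch below uses only the Python standard library, and the function name, the power-iteration settings, and the toy distances are assumptions.

```python
import math


def classical_mds(dist, n_dims=2, n_iter=500):
    """Classical MDS of a symmetric matrix of non-negative distances.

    Returns one coordinate list of length `n_dims` per object. Eigenvectors
    are found by power iteration with deflation, which assumes that the
    leading eigenvalues of the centred matrix are positive.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n

    # Double-centred matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    coords = [[0.0] * n_dims for _ in range(n)]
    for d in range(n_dims):
        vec = [math.sin(i + d + 1.0) for i in range(n)]  # deterministic start
        for _ in range(n_iter):
            new = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in new))
            if norm == 0.0:
                break
            vec = [x / norm for x in new]

        # Rayleigh quotient gives the eigenvalue for the converged vector.
        b_vec = [sum(b[i][j] * vec[j] for j in range(n)) for i in range(n)]
        length_sq = sum(x * x for x in vec)
        eigval = sum(v * w for v, w in zip(vec, b_vec)) / length_sq
        scale = math.sqrt(max(eigval, 0.0) / length_sq)
        for i in range(n):
            coords[i][d] = scale * vec[i]

        # Deflate so that the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * vec[i] * vec[j] / length_sq
    return coords


# Hypothetical distances between three objects (a 3-4-5 triangle).
toy_dist = [[0.0, 3.0, 4.0],
            [3.0, 0.0, 5.0],
            [4.0, 5.0, 0.0]]
for point in classical_mds(toy_dist):
    print([round(c, 3) for c in point])
```

The orientation and sign of each axis are arbitrary, so only the relative positions of populations along a dimension are interpretable.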
We visualized the geographical variation of each dimension by interpolating observed values to produce a synthetic surface map of Europe by use of the Spatial Analyst extension of ArcView, version 3.2. We employed the inverse distance–weighted method, using the 12 nearest neighbors, to calculate interpolated map values. The resulting values were then divided into 12 equal classes, or contours. Again, the roughly southeast-to-northwest gradient of the first dimension is clear (fig. 2A). Atlantic European samples, including those from Ireland, Wales, Scotland, and Galicia, as well as Iceland and Norway, occupy positions on the edge of the European range, toward the Basque pole. The Iberian Peninsula is notable as an area of steep north-south gradient, with the north more similar to central and western Europe and the south more similar to Mediterranean Europe and the Near East. The second dimension does not appear to display any obvious geographical pattern but does distinguish Ireland, Scotland, Wales, Iceland, and Galicia from areas in Fenno-Scandinavia that have low values in dimension 1.
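The interpolation step can be sketched as inverse distance-weighted averaging over the nearest sampled populations. The Python example below is an illustration under stated assumptions rather than a description of the ArcView implementation: it works in planar coordinates, uses a distance exponent of 2 (the exponent is not given in the text), and the function name and sample values are hypothetical.

```python
import math


def idw_interpolate(x, y, samples, k=12, power=2.0):
    """Inverse distance-weighted estimate at (x, y).

    `samples` is a list of (x, y, value) tuples; only the `k` nearest
    samples contribute. `power` is an assumed exponent.
    """
    if not samples:
        raise ValueError("at least one sample is required")

    nearest = sorted(samples, key=lambda s: math.hypot(s[0] - x, s[1] - y))[:k]

    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in nearest:
        dist = math.hypot(sx - x, sy - y)
        if dist == 0.0:
            return value  # the query point coincides with a sample
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical dimension 1 scores at three planar locations.
toy_samples = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0), (10.0, 10.0, -5.0)]
print(idw_interpolate(1.0, 0.0, toy_samples, k=2))
```

Because each estimate is a weighted average, interpolated values never exceed the range of the contributing samples, which tends to smooth local extremes on the resulting map.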
To place these findings in the context of other genetic marker systems, we also reexamined published data from loci with different inheritance modes. We reanalyzed, in similar fashion, Y-chromosome data from 3,822 individuals divided into 42 populations (Rosser et al. 2000; Wilson et al. 2001), comprising haplotypes defined by nine binary polymorphisms (SRY-1532, SRY-8299, 92R7, TAT, M9, 12f2, YAP, sY81, and LLY22g) (Rosser et al. 2000).
Dimension 1 of Y-chromosome variation, displayed as a synthetic map (fig. 2B), shows a pattern that is broadly similar to that of mtDNA dimension 1, with a prominent gradient from the Near East to western Europe. A similarity between Atlantic coastal areas is again evident and indeed shows that Ireland and western Britain have a stronger affinity with the Basque region than is found with mtDNA data. Cornwall (where the last Cornish speaker died in 1891) shares this Atlantic affinity in Y-chromosome data but not in mtDNA data. Brittany, an area with a Celtic language and close links to Cornwall, also appears to be an exception in the general mtDNA landscape. We also reanalyzed autosomal variation, using a matrix of interpopulation FST values calculated from classical gene frequencies in 25 European populations (taken from the study by Cavalli-Sforza et al. [1994]). This did not include any Near Eastern populations, but, nonetheless, a similar trend is observed (fig. 2C). However, unlike the Y chromosome or mtDNA, this is clearer in the second dimension (see Torroni et al. 1998).
By use of Pearson’s correlation coefficient, we assessed the congruence of the major trend in mtDNA variation (dimension 1 [fig. 2A]) with, first, the two dimensions of Y-chromosome variation and, second, both dimensions of autosomal diversity. Since gene flow between neighboring populations leads to prior spatial autocorrelation (or the nonindependence of genetic variation across loci with respect to geography), we applied the Dutilleul (1993) correction, as implemented in the PASSAGE package (Rosenberg 2001). An effective or reduced sample size (and thus fewer degrees of freedom) is calculated to reflect spatial structure; this can then be applied in a t test to gauge the significance of the correlation coefficient. An appropriate number of distance classes for each comparison was set by use of Yule’s rule. mtDNA dimension 1 correlated strongly and significantly with Y-chromosome dimension 1 (r=0.714; P=.033 [corrected for two comparisons]) but not with the second dimension (r=0.314; P=.312). Conversely, dimension 2 in autosomal variation is correlated with the predominant mtDNA trend (r=0.528; P=.043), whereas dimension 1 is not (r=0.141; P=.86).
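The logic of the corrected significance test can be sketched in two steps: compute Pearson's r as usual, then evaluate it with a t statistic whose degrees of freedom come from the effective rather than the nominal sample size. In the Python sketch below the effective sample size is simply supplied as an input; estimating it from the spatial autocorrelation structure, which is the substance of the Dutilleul correction, is not implemented. The function names and toy values are assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length, at least 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n_effective):
    """Return (t, degrees of freedom) for testing r = 0.

    `n_effective` is the autocorrelation-adjusted sample size, taken here
    as given; it is at most the number of populations compared.
    """
    dof = n_effective - 2.0
    if dof <= 0:
        raise ValueError("effective sample size must exceed 2")
    if abs(r) >= 1.0:
        return math.copysign(float("inf"), r), dof
    return r * math.sqrt(dof / (1.0 - r * r)), dof


# Hypothetical dimension scores for three populations in two marker systems.
r = pearson_r([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
print(r, t_from_r(r, n_effective=5.0))
```

A smaller effective sample size lowers the degrees of freedom and the t statistic together, so a correlation that looks significant with the nominal number of populations can lose significance after correction.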
Focusing on the relationships between Ireland and its neighbors, we investigated the geographical provenance of matches to Irish mtDNA haplotypes. This was implemented by comparing each haplotype found in Ireland (positions 16093–16362) with a world database of mtDNA HVS-I sequences assembled from previous studies (Röhl et al. 2001). By use of the geographical information system “mtradius” (Forster et al. 2002), which uses information on the location and frequency of the closest matching haplotypes, we calculated a center of gravity (or center of distribution), with an SD in kilometers (km) as an indication of the dispersal range of the haplotypes. Higher SDs tend to occur with common ancestral haplotypes that have widespread distributions, which are phylogeographically rather uninformative. The less widely and more recently dispersed haplotypes were identified here as point estimates, with an SD of <500 km and an intermediate category of 500–1,000 km.
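The idea of a center of gravity with an associated SD can be illustrated as a frequency-weighted mean position of the matching haplotypes, with the SD taken as the root of the weighted mean squared great-circle distance from that center. The Python sketch below is one plausible reading and is not the mtradius algorithm, whose internals are not described here. The function names, the spherical Earth radius, the label for SDs above 1,000 km, and the toy coordinates are all assumptions; only the 500-km and 1,000-km thresholds come from the text.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def center_of_gravity(matches):
    """Return (lat, lon, sd_km) for a list of (lat, lon, weight) matches."""
    total = sum(w for _, _, w in matches)
    x = y = z = 0.0
    for lat, lon, w in matches:
        phi, lam = math.radians(lat), math.radians(lon)
        x += w * math.cos(phi) * math.cos(lam)
        y += w * math.cos(phi) * math.sin(lam)
        z += w * math.sin(phi)
    # Direction of the weighted mean vector gives the center.
    lat_c = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon_c = math.degrees(math.atan2(y, x))
    variance = sum(w * great_circle_km(lat_c, lon_c, lat, lon) ** 2
                   for lat, lon, w in matches) / total
    return lat_c, lon_c, math.sqrt(variance)


def dispersal_category(sd_km):
    """Classify a haplotype by the SD thresholds given in the text."""
    if sd_km < 500.0:
        return "point estimate"
    if sd_km <= 1000.0:
        return "intermediate"
    return "widespread"


# Hypothetical match locations (degrees) with match frequencies as weights.
toy_matches = [(53.3, -6.3, 3.0), (55.9, -4.3, 2.0), (57.5, -4.2, 1.0)]
lat, lon, sd = center_of_gravity(toy_matches)
print(round(lat, 2), round(lon, 2), round(sd, 1), dispersal_category(sd))
```

Under this reading, a haplotype found at high frequency in a few nearby populations yields a small SD and a well-localized center, whereas matches scattered across the continent inflate the SD and make the center uninformative.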
The results are displayed on a map of Europe in figure 3. The most frequent Irish haplotypes, represented by larger circle size, also have high SDs, indicating that they are widespread throughout Europe. Haplotypes with intermediate SDs are more common in western Europe, whereas haplotypes with low SDs are concentrated almost exclusively in Atlantic and (to a much lesser extent) Mediterranean Europe. The concentration of center-of-gravity estimates with low and intermediate SDs within or adjacent to the Atlantic zone (seen in fig. 2A) is notable. However, the most striking result is the very strong sharing of localized haplotypes with Britain, particularly Scotland. These are widely distributed throughout Ireland and are not concentrated in particular areas. A lesser degree of sharing is also apparent between Ireland and Pyrenean Spain. It is also noteworthy that particular mtDNAs that are characteristic of central Europe, such as J1a (Richards et al. 1998), are virtually absent from the Atlantic facade.
Previous studies of Y-chromosome variation demonstrated strong levels of differentiation within Europe (Rosser et al. 2000), and variation in autosomal loci often exhibits a similar structure (Cavalli-Sforza et al. 1994). However, a detailed portrait of mtDNA structure in Europe has hitherto remained elusive. Yet concordance between different marker systems is an important means of demonstrating that geographical patterns are the result of demographic history and not (for example) of selection. These results strongly suggest—for the first time, to our knowledge—that the demographic histories of Europe, in general, and Ireland, in particular, are similarly recorded in loci with different inheritance patterns. The use of a very large data set that was checked for quality, analyzed at the level of individual lineages, and subdivided into fine population units appears to have been a key factor in the identification of the hitherto-undetected mtDNA patterns seen here.
Previous studies indicated particular affinities within the Atlantic zone of Europe on the basis of the distribution of both the Y-chromosome haplogroup R1b (which reaches frequencies approaching 100% in some parts of western Europe) and the mtDNA haplogroup V (which, however, amounts to <5% of European mtDNAs) (Torroni et al. 1998, 2001; Hill et al. 2000; Semino et al. 2000; Wilson et al. 2001). During the last glaciation, human habitation is thought to have been largely restricted to refugial areas in southern Europe; one of the most important of these is likely to have been in southwestern France and the Iberian Peninsula (Dolukhanov 1993; Housley et al. 1997; Gamble et al. 2004). The recolonization of western Europe from an Iberian refugium after the retreat of the ice sheets ∼15,000 years ago could explain the common genetic legacy in the area. An alternative but not mutually exclusive model would place Atlantic fringe populations at the “Mesolithic” extreme of a Neolithic demic expansion into Europe from the Near East.
In any event, the preservation of this signal within the Atlantic arc suggests that this region was relatively undisturbed by subsequent migrations across the continent. The identification of likely dispersal points for some Irish haplotypes in northern Spain and western France is further evidence for links between Atlantic populations. Cunliffe (2001) has used Braudel’s term, the “longue durée,” to describe the long-term sedimentation of traditions on the Atlantic facade, which he suggests may stem from the late Mesolithic period, perhaps even predating the arrival of agriculture in the region. Our results support the view that the genetic legacy, at least, of the region may trace back this far and perhaps even to the earliest settlements following recolonization after the Last Glacial Maximum.
An alternative explanation might simply be restricted patterns of long-term gene flow within these two major ecogeographical zones in Europe, facilitated by the Atlantic and Mediterranean seaways. It is difficult to distinguish genetically between a common Paleolithic origin and more recent contacts. However, haplogroup R1b3f Y chromosomes, which have a recent origin in Iberia (Hurles et al. 1999), have not been found in Ireland (Hill et al. 2000), arguing against the migration of very large numbers of men by this route, at least, in the past 2,000–3,000 years. This would be consistent with the suggestion that most contacts over this period would have been small scale, rather in the manner of the Kula ring in the western Pacific (Cunliffe 2001). On the female side, the presence of putatively Neolithic mtDNA haplogroups in Ireland does indicate some gene flow from the continent after the initial peopling of the island (∼9,000 years before the present) following the postglacial reexpansion (see Wilson et al. 2001), although this could have been at any time in the past 6,000 years or so.
A degree of genetic heterogeneity in the British Isles is apparent, at least on the Y chromosome and much more tentatively on the mtDNA, with southeastern England tending to show a greater affinity to neighboring areas of continental Europe. Anglo-Saxon mass migration has been proposed as the explanation for this pattern in Y-chromosome variation (Weale et al. 2002; Capelli et al. 2003). Such explanations may seem feasible for the Y chromosome, given the high levels of drift that might be associated with disproportionately high numbers of offspring among conquering elite males. However, the weight of archaeological evidence is against population replacement associated with the Anglo-Saxon conquest (Esmonde-Cleary 1989), suggesting that alternative explanations should be considered. It may be that the genetic landscape of southeastern Britain has been shaped by older links with the continent, perhaps during the Neolithic period or even before the filling of the North Sea, when Britain was still connected to the continent via the Doggerland plain (Coles 1998).
The multiple mtDNA links between Ireland and Britain, particularly Scotland, are especially striking (see O’Donnell et al. 2002). Archaeological evidence supports contacts during prehistory, and early historical accounts describe the establishment of Irish colonies in Scotland from at least a.d. ∼500 (indeed, the name “Scotland” derives from the Latin word for “Ireland” at this time). Linguistically, modern Scottish Gaelic is a clear derivative of the Irish language. During the 16th and 17th centuries, the plantation of Ulster led to the arrival of substantial numbers of settlers moving in the opposite direction. However, the widespread distribution of these mtDNA haplotypes within Ireland suggests they may be largely the result of earlier contacts.
What seems clear is that neither the mtDNA pattern nor that of the Y-chromosome markers supports a substantially central European Iron Age origin for most Celtic speakers—or former Celtic speakers—of the Atlantic facade. The affinities of the areas where Celtic languages are spoken, or were formerly spoken, are generally with other regions in the Atlantic zone, from northern Spain to northern Britain. Although some level of Iron Age immigration into Britain and Ireland could probably never be ruled out by the use of modern genetic data, these results point toward a distinctive Atlantic genetic heritage with roots in the processes at the end of the last Ice Age.
Acknowledgments
We thank all the volunteers for providing DNA samples, and we are grateful for the financial support for this work provided by the National Millennium Committee through the Royal Irish Academy, as well as by Patrick Guinness and Joseph Donohoe through the Trinity Trust. We thank Vicente M. Cabrera, for access to unpublished Jordanian mtDNA sequences, and Agnar Helgason, José Larruga, Sabine Lutz, Isabelle Dupanloup, Alain Stévanovitch, Päivi Lahermo, and Adriano Tagliabracci, for providing sequences in convenient format. Thanks are also due to Vincent Macaulay, for his assistance with sequence quality checks; Abigail R. Freeman, for figure formatting; Ceiridwen J. Edwards and Muiris O’Sullivan, for helpful discussion and proofreading; and two anonymous reviewers, for their helpful comments.
Electronic-Database Information
The URL for data presented herein is as follows: