The phylogenealogy of R-L21: four and a half millennia of expansion and redistribution

Joe Flood*

* Dr Flood is a mathematician, economist and data analyst. He was a Principal Research Scientist at CSIRO and has been a Fellow at a number of universities including Macquarie University, University of Canberra, Flinders University, University of Glasgow, University of Uppsala and the Royal Melbourne Institute of Technology. He was a foundation Associate Director of the Australian Housing and Urban Research Institute. He has been administrator of the Cornwall Y-DNA Geographic Project and several surname projects at FTDNA since 2007. He would like to give credit to the many ‘citizen scientists’ who made this paper possible by constructing the detailed R1b haplotree over the past few years, especially Alex Williamson.

ABSTRACT: Phylogenealogy is the study of lines of descent of groups of men using the procedures of genetic genealogy, which include genetics, surname studies, history and social analysis. This paper uses spatial and temporal variation in the subclade distribution of the dominant Irish/British haplogroup R1b-L21 to describe population changes in Britain and Ireland over a period of 4500 years from the early Bronze Age until the present. The main focus is on the initial spread of L21-bearing populations from south-west Britain as part of the Beaker Atlantic culture, and on a major redistribution of the haplogroup that took place in Ireland and Scotland from about 100 BC. The distributional evidence for a British origin for L21 around 2500 BC is compelling. Most likely the mutation originated in the large Beaker colony in south-west Britain, where many old lineages still survive. From that spread point it was carried rapidly by sea into north-west France, Ireland, north-west Spain and the Middle Rhine, which today have a high incidence of L21, and into Northern England and Scotland.
Of about 45 known early Bronze branches or subclades of L21, almost all are found in Britain or in the English-speaking Diaspora. We are able to identify most of the larger subclades of L21 as ‘Atlantic’: spread throughout the Atlantic Beaker range with a distinct presence in Cornwall-Devon in the early Bronze. Continental R-L21 has origins in small random samples from the extensive English distribution. While many studies have tried to identify continental contributions to Isles populations, here we suggest that the reverse was much greater, at least in the early Bronze Age. The global distribution of L21 subclades is almost exactly Pareto, showing an entirely random expansion from an initial point of time; however, that point is much later than the early Bronze. Around 100 BC a second major R-L21 expansion from a severe bottleneck was initiated in Ireland and Scotland, when a dozen residual ‘deep’ sub-branches sprang to life and came to dominate L21. This is consistent with a collapse in the effective population of Ireland, followed by a rapid expansion. The limited evidence suggests that a severe weather event, famine and/or epidemic occurred around this time. The strongly patrilineal nature of insular Celtic society helped to keep male lines culturally intact, so that these emergent deep subclades can still be identified with Irish clans to some extent. Around 90 per cent of R-L21 individuals in Scandinavia have paternal-line relatives in Ireland and Scotland dating to Viking times. The distribution is random and involves small numbers and distinct lines, suggesting that some of these were taken to Scandinavia as prisoners and slaves. The Great Migration of millions of people from Ireland and Scotland to North America in relatively modern times was so substantial that no founder effects can be discerned, and the New World has acted as a growth matrix extending and preserving the pre-existing R-L21 distribution.
This paper introduces several ‘skyline’ methods to trace the development over time of the subclade distribution of L21. These show that the distribution in England has not changed a great deal since the Bronze Age, in stark contrast to the situation in Ireland and Scotland. England and the Continent now make a much smaller contribution to R-L21 than in the past, probably stemming from Roman and Germanic expansion that pushed L21-bearing populations westward.

1. The discovery of L21

The Y-chromosome is passed more or less unchanged from father to son except for a few small mutations that accrue from time to time due to copying error. These mutations, known as single nucleotide polymorphisms or SNPs, define a unique family tree of all men, an ordered tree of descent known as a haplotree. This tree is quite detailed and, like the rings of a tree, it contains information about the broad demographics of mankind, revealing a great deal about population settlement, expansion and development. This may be studied using the statistical methods of genetics, or by methods rather similar to genealogy, in which genetic, archaeological and historical evidence is assembled relating to the timing and place of each SNP mutation and the history and distribution of descendants carrying the SNP. We call this latter approach phylogenealogy.

Haplogroup R-L21, the group of all men that have the SNP mutation known as L21, is the most common patrilineage in the British Isles. It is a major branch of the general Y-haplogroup R1b that has dominated Western Europe since the early Bronze Age. Around 37 per cent of men in the British Isles as a whole are R-L21, and two-thirds of the Irish. The coastal Atlantic areas in France across from Britain and an area on the Middle Rhine also have significant incidences of L21, but otherwise the presence in continental Europe is low.
Because the British, particularly the Irish, have been such major contributors to the populations of the USA and Canada, R-L21 is also one of the commonest lineages in North America.1 It has sometimes been identified as a carrier of Celtic culture because of its high frequency in areas that once spoke Celtic languages, and details of the lineage have been eagerly researched by those claiming Celtic heritage.

The L21 SNP marker was located on the Y-chromosome in 2005.2 The major interest of genetic genealogy at the time was not so much in SNPs, as testing for these was then unavailable to the public, but in finding sets of rapidly mutating multivalued short tandem repeat (STR) marker values that might provide a DNA signature for the ancient families of Ireland, in the same way that many modern surnames have STR signatures or haplotypes. Various STR clusters were identified, mostly informally, and associated with particular clans or districts and with traditional clan leaders from the early Christian period (see Wright 2009, Wilson et al. 2001).

The O'Neills of Ireland were the best known and most important family in Irish history, descended from a long dynastic line that for centuries were Kings of Ulster and High Kings of Ireland. In 2006 a group of researchers from Trinity College Dublin reported a signature haplotype which was allegedly associated with some descendant families of O'Neill, and which they claimed was ‘a biological record of past hegemony and supports the veracity of semimythological early genealogies. The fact that about one in five males sampled in northwestern Ireland is likely a patrilineal descendant of a single early mediaeval ancestor is a powerful illustration of the potential link between prolificacy and power.’ (Moore et al. 2006).

This research was subsequently used to promote and badge DNA testing: ‘Are you descended from the legendary kings of Ireland?’
1 Over 63 million men are L21: 21.5 million in Europe and 42 million in the English-speaking Diaspora (estimated from the Origins Database and Hammer et al. 2005).
2 Along with several other key R1b-subclade markers in Figure 5 including U106 and U152.

These claims turned out to be over-enthusiastic in a number of respects,3 but the results did demonstrate that the descendants of a single man could eventually come to dominate a large population. Other examples from the distant past were soon found; however, the growth of single lineages to such an extent has not happened in historical times, and the circumstances in which it might occur remain uncertain.

After 2008 when SNP testing became commercially available, it became apparent that the various Irish clusters were actually associated with SNPs that defined deep subclades of L21 on the phylogenetic Y-tree. The O’Neill ‘Irish modal’ or ‘Irish Type 1’ cluster in particular was ultimately defined by M222 (Kennedy 2014), one of the original SNP markers named by Underhill (2003).

In the last few years, the advent of NextGen DNA sequencing and amplification technology, in which large sections of the human genome can be sequenced fairly cheaply and reliably, has diverted the attention of geneticists away from the Y-chromosome to whole-genome comparisons of current populations and ancient DNA. However, NextGen complete or partial sequencing of the Y-chromosome has also greatly extended the scope of Y-chromosome analysis. Very recently, affordable panels of SNPs have become available that have permitted the very large ‘untested rump’ of R1b samples in the public domain to be re-tested for subclade membership – and they have revealed an extraordinarily complex hierarchical structure within R-L21.
This paper largely deals with the changing incidence of R-L21 and its subclades, in the context of the four expansive phases during which it came to occupy its current spread:
• the founding of L21 and its major branches and settlement areas at the beginning of the Bronze Age around 2500 BC, followed by a long 2500-year interregnum with relatively little activity;
• the settlement of Scotland some time later;
• a major redistribution and subsequent expansion of L21 in Ireland and Scotland at the dawn of the Common Era 100 BC–700 AD;
• limited translation of L21 to Scandinavia as the result of Viking incursions; and
• the Great Migration to the English-speaking New World where most R-L21 is to be found today.

We develop phylogenealogical methods to investigate lifetime changes in distribution of the R-L21 haplogroup and the possible causes of these changes. In doing so we reach several substantial new conclusions about the populating of the Isles.

The Appendices contain most of the technical material and tables. Appendix A discusses the data, and Appendix B examines the constraints of using traditional population genetics variance methods for L21, because of its pronounced internal dynamics. Spreadsheet C has the larger tables used for the study.

2. Incidence of L21 and subclades

The incidence by country of origin of L21 is shown in Figure 1 (see spreadsheet Table C1, which calculates the distribution of Y-haplogroups in most European countries).

3 Critiqued in two chapters of Jaski (2013)

Figure 1. Incidence of R-L21 by origin
Source: European Origins database (Appendix 1, Table C1).4

A strong west-to-east decline is evident, and there is a heavy presence of 50 per cent or more in the traditional ‘Celtic’ locations, especially Ireland. France has about 16 per cent R-L21 while Iberia has 10 per cent. The incidence in the ‘Germanic’ countries is low, about 4 per cent. There is a residual presence throughout most of central and eastern Europe.
The distribution is heavily regionalised. Within England, as Figure 2 shows, London has the highest incidence of L21, presumably due to immigration from the ‘Celtic’ areas, followed by the North and Midlands, because of proximity to Scotland. The south coast has a relatively low incidence, with a pronounced west-to-east gradient, because of the mix of other subclades of R1b, especially U106 and DF27.

Figure 2. Incidence of L21, regions of England, France and Spain
Note *: Small samples.

4 The database shows a lower incidence of R1b than some other sources – which translates to a lower incidence of L21. See Appendix A.

Galicia in the north-west of Spain has an incidence similar to that of the south-west of England. Although the samples are small, it appears that Normandy, Brittany and Alsace in France have a high incidence similar to the Celtic fringe of the Isles.5 It is believed the Rhineland of Germany also has a high incidence similar to Alsace, though we have not been able to locate specific data for the area.6

L21 has about 45 subclades immediately below it or its major branch DF13.7 Some of the subclades also immediately branch rapidly, so that L21 has 75 known branches that survive from 2200 BC or earlier.8 So many of the early Bronze subclades of L21 have survived to the present that any sufficiently widespread data collection will contain many of these. For example, the 1000 Genomes Consortium (2010) collection from Cornwall and Kent contained 30 R-L21 men who fall into 15 different subclades including several rare lines.

The distribution of the sizes of the larger subclades of L21 is shown in Figure 3. It is extremely close to an idealised geometric distribution (Fox and Lasker 1982), and on average each subclade is about ¾ the size of the one above it. This shows that what we see is a Yule process attributable to pure chance (Yule 1925, Rossi 2015) or genetic drift, which does not require any special assumptions about human behaviour.
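The geometric rank-size pattern described above can be illustrated with a minimal sketch. It assumes an idealised set of subclade sizes in which each rank is a fixed fraction of the one above it, and recovers that fraction from the least-squares slope of log size against rank. The function names and the sample sizes are hypothetical and are not taken from the L21 database.

```python
import math


def geometric_sizes(largest, ratio, n):
    """Idealised subclade sizes: each rank is `ratio` times the one above it."""
    return [largest * ratio ** k for k in range(n)]


def fit_ratio(sizes):
    """Estimate the rank-to-rank size ratio.

    Fits a straight line to log(size) against rank by ordinary least
    squares and returns exp(slope), i.e. the average ratio between
    one subclade and the next larger one.
    """
    ranked = sorted(sizes, reverse=True)
    n = len(ranked)
    xs = range(n)
    ys = [math.log(s) for s in ranked]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return math.exp(sxy / sxx)


# Hypothetical sizes for 12 subclades, largest has 1000 men.
sizes = geometric_sizes(1000, 0.75, 12)
ratio = fit_ratio(sizes)
print(f"fitted ratio: {ratio:.2f}")
```

A fitted ratio close to 0.75 on real subclade counts would correspond to the ‘about ¾’ relationship reported for Figure 3; a straight line on the log rank-size plot is what the geometric model predicts.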
The distribution is self-similar as one continues downstream through the haplotree, since the number of descendants of any fairly large group of men, taken to enough generations, is always geometric in the limit.

Figure 3. Log rank size of sizes of subclades of L21, with trend
Note: There are about 15 smaller known subclades.
Source: L21 full database, Appendix A, N=6619.

5 Using the generic term ‘the Isles’ to refer to Britain, Ireland and associated islands.
6 The FTDNA mapping facility shows considerably more L21 on the German side of the Rhine, along the stretch from Koblenz to Stuttgart.
7 Also if below the early marker ZZ10_1. This includes 19 singletons with only one known representative.
8 Branches are identified from their presence in the database, and confirmed in the websites www.ytree.net or www.yfull.com. Timing largely follows the methodology used in www.yfull.com, where each SNP mutation found in a Big Y test occurs on average every 140 years or 5 generations approximately. See Appendix A.

The tight conformity to the theoretical distribution confirms that our sample accurately represents the global distribution of L21 and is not subject to very recent bottlenecking or founder effects to any degree, at least as far as the sizes of subclades are concerned (see Appendix A).

The subclades of L21 have a very different incidence in different countries, which gives a strongly local character to L21, as Table C2 and Figure 4 show. The larger subclades are spread fairly evenly in England, suggesting it is a central place or point of distribution where subclades either developed first or subsequently mixed. In Ireland and Scotland, by comparison, the top five subclades account for about 80 per cent of the total, and the dominant subclades are different in each country. As we shall see, much of the difference in Ireland and Scotland is due to large founder effects in SNP lineages dated to the early Common Era, 2500 years after the formation of L21.

Figure 4.
Distribution of larger L21 subclades: England, Ireland, Scotland and the Continent
Source: L21 database (Table C2a)

Although the distribution of subclades in individual European countries differs a good deal, the total distribution for mainland Europe looks rather like the English distribution – suggesting that European countries have been populated by small random samples from England. At least in the early years of L21, this appears to be true.

3. Early Bronze – the primary expansion of L21

3.1 The phases of expansion of L21.

There have been several periods when L21 expanded substantially, as shown by a very rapid increase in the number of sub-branches over a short period of time. In genetics, the effective population is estimated by the number of allele changes over a period of time, and rapid changes in the effective population in quite a number of studies are assumed to correspond to similar changes in the real population (see for example Batini et al. 2015). The genetic evidence for the expansion of L21 shows the following:
• In the first expansive period, from 2500 BC to about 2000 BC, L21 and its subclades were founded, split and expanded throughout the Atlantic Beaker range. Sometime later in the middle Bronze a second expansion occurred and the expansion extended to Scotland.
• In the second period, the early Common Era from 100 BC to 600 AD, another large population advance in Ireland and Scotland substantially reorganised the distribution of subclades. This was probably preceded by a very substantial fall in the population of Ireland, a near-extinction which gave a number of surviving very small subclades the chance to expand.
• In the Viking period, this new Irish-Scottish L21 was carried into Scandinavia and Northern Europe, probably by slaves taken in raids.
• Finally, in the colonial period from 1600 to 1900, L21 moved freely throughout the English-speaking Diaspora, preserving the ancient distributions and preventing the extinction of some ancient lines.

The rest of this section examines the phylogenealogical evidence for these assertions, including evidence from the L21 haplotree and from archaeological and historical sources.

3.2 Beginnings - The Atlantic culture

The L21 mutation occurred during an extraordinarily rapid expansion of the effective population of the male R1b haplogroup on the Atlantic seaboard. In only a few hundred years, ‘Western R1b’ formed over 300 Y-chromosome branches that survive to the present day and which define our current categories of Western R1b.9 Batini et al. (2015, Figure 1) show that the branching of R1b at this time was spectacular, equal to that of all other European haplogroups taken together. No other effective male population expansion of this rate, magnitude and extent is known until the modern era.

The companion paper Flood (2016) proposes that the original Western R1b men were a closely related group of mariners and traders who came to the Atlantic seaboard before 2700 BC.10 These invaders are often known as the ‘Bell Beaker Folk’ because of their distinctive drinking vessels. The Bell Beaker period marked a period of unprecedented cultural contact in Atlantic and Western Europe on a scale not seen previously nor seen again. With boats as their major form of transport and trade as a major means of sustaining communities, the Bell Beakers established their initial colonies near to tradeable resources, on the coast and up major rivers. They appear to have leapfrogged to specific areas, probably to exploit valuable metals like gold, tin and copper, very much in the manner of their descendants in the New World four millennia later. The Beakers formed maritime colonies in quick succession in Iberia, southern England, Ireland, the Rhone Valley, Brittany and the Middle Rhine.
These settlements grew together connected by the sea trade routes of the ‘Atlantic culture’ (Cunliffe 1994, 2001, 2010), with the south of England at the centre. As Bradley (2007: 26) puts it, ‘The islands’ distinct geography … allowed them to form links with regions of the European mainland that would not have been in regular contact with one another.’

9 This compares with about 40 haplogroup lineages that survive in Europe from before the last Glacial Maximum, and an estimated 400 lines that developed across Eurasia in almost 20 000 years from 25 000 BC to 6000 BC (Flood 2016), almost none of which are attributable to Western Europe.
10 Flood (2016) critiques an alternative theory that R1b arrived at the Atlantic seaboard by land.

Figure 5. Atlantic R1b, major branches (tree diagram; node labels: R1b-L151, P312, U106, L21, DF27, U152)

In Figure 5 we see the principal ‘Western R1b’ haplogroups R-DF27, R-L21, R-U106 and R-U152. Flood (2016) regards these as the genetic expressions of separate settlements - R-L21 in the south-west English mining and religious settlement, R-U106 around the North Sea, R-U152 on the Rhone, in Lombardy and the Cisalpine area, while R-DF27 represents the original Iberian settlement.

The largest Beaker settlement, apart perhaps from the gold-tin Tagus valley settlement in Portugal, appears to have been in south-west Britain in what looks very much like the world’s first minerals rush, seeking the world’s most valuable resources at the time, alluvial gold and tin. Standish et al. (2015) write, ‘Southwest Britain would have been an extremely important region during the Bronze Age, as local populations would have had the ability to control the supply of two of the key materials in use at this time.’

NextGen sequencing of the rapidly branching R1b genome has allowed for reasonably accurate dating of L21 to about 2500 BC. This date has been supported by the presence of Bell Beaker sites all over Britain and Ireland dating from before 2400 BC.
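As a rough illustration of the dating arithmetic behind estimates of this kind, the sketch below converts a count of SNPs downstream of L21 into an approximate calendar date. It assumes the average rate of one SNP per 140 years quoted in footnote 8 and a founding date of 2500 BC for L21. The constant and function names are hypothetical, and the calculation ignores both the statistical uncertainty in the mutation rate and the absence of a year zero.

```python
# Assumed average interval between Big Y SNPs (footnote 8).
YEARS_PER_SNP = 140

# Assumed founding date of L21; negative values denote years BC.
L21_FOUNDING_YEAR = -2500


def years_elapsed(snp_count, years_per_snp=YEARS_PER_SNP):
    """Approximate years represented by a run of SNPs."""
    return snp_count * years_per_snp


def approximate_year(snp_count,
                     founding_year=L21_FOUNDING_YEAR,
                     years_per_snp=YEARS_PER_SNP):
    """Approximate calendar year of a branch `snp_count` SNPs below L21."""
    return founding_year + years_elapsed(snp_count, years_per_snp)


def format_year(year):
    """Render a signed year as 'N BC' or 'N AD'."""
    return f"{-year} BC" if year < 0 else f"{year} AD"


print(years_elapsed(6))                     # 840
print(format_year(approximate_year(6)))     # 1660 BC
```

Under this assumption six SNPs correspond to 840 years, which is close to the ‘about 6 SNPs or 800 years’ quoted in the text for the branching of L513/DF1. Dates obtained this way are only as good as the assumed rate and should be read as approximate.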
The Beaker constructions in Cornwall are the most extensive in Britain with an abundance of round barrows and cairns, henges, stone circles and stone cist graves.11 The construction of Stonehenge II and III in Wiltshire, which required complex logistics and extensive manpower, was probably funded from the proceeds of the Cornwall-Devon mining bonanza. At Durrington Walls near Stonehenge, the largest village on the Atlantic seaboard was sited for a short time around 2500 BC, housing about 4000 people from all over Britain (Parker-Pearson et al. 2013).

Ireland is particularly important for L21. A Bell Beaker arsenical bronze smelting industry at Ross Island in the south-west of Ireland dates to 2400 BC, when the local sulpharsenide ores were smelted to produce most of the arsenical bronze axes used in Britain. Traded artefacts from the site have been found in the south of Britain, while large numbers of artefacts using Cornish gold have been found in Ireland.12

A long-suspected relationship between Bell Beaker peoples and R1b DNA has now been confirmed by the sequencing of the first ancient Bronze Age genome in the Isles (Cassidy et al. 2016). Remains at Rathlin Island off the north coast of Ireland have been dated to 2050 BC and are L21>DF21, the largest subclade of L21 prior to the Christian era. Rathlin was a production facility for axe heads of porcellanite, a dense form of recrystallised basalt, and several Bronze Age gold artefacts have been found there (Jope et al. 1952).

11 Pevsner (1989: 27); http://www.historic-cornwall.org.uk/flyingpast/living.html Accessed April 2016.
12 David Keys, Cornwall was scene of prehistoric goldrush, says new research. Daily Mail 5 June 2015; reporting on Standish et al. (2015).

After this early ‘rush’ of settlement Ireland seems to have been demically isolated from the main Bell Beaker culture.
Once the metals rushes were over, the Irish Beaker period was ‘characterized by the ancientness of Beaker intrusions, by isolation and by influences and surviving traditions of autochthons’ (Osmon 2011).

Another very early L21 colony was founded on the Middle Rhine, and it is believed a high incidence of L21 occurs in the area even today.13 The presence of unique early L21 subclade branches from the area suggests the settlement date is probably prior to 2300 BC. Cassidy et al. (2016) found a significant admixture closely related to Irish DNA in modern Germans (particularly visible as a Middle Rhine hotspot in their Figure 3). The presence of Bronze Age Wessex wheel-and-cross disks and Wessex-style pottery along the Middle Rhine,14 coupled with the L21 genetic connection, makes it likely that the colony was launched from south-west Britain.15

It appears that the Beaker expansion hit its carrying capacity quite quickly, because after about 2000 BC there are few new branches in the L21 haplotree until the Common Era. One exception is very significant branching in the L513/DF1 subclade about 6 SNPs or 800 years from the formation of L21. This might correspond to a mid-Bronze population expansion in Scotland, a late Scottish Bronze Age where arable land expanded at the expense of forests, perhaps because all suitable land in England had been cleared by this time and settlers turned to more marginal land in Scotland. This may have occurred as late as the Bronze Age Climatic Optimum around 1600 BC, when climate change made settlement further north more practicable.

3.3 L21 Subclades in the early Bronze Age

Our principal technique for classifying the original subclades of L21 is to examine the number of early Bronze branches and their geographic extent. The subclades fall into three broad classes, which we call Atlantic, local or residual, depending on their size and spread. Most of the major subclades are Atlantic.
These subclades branched widely and disseminated along the ‘Atlantic culture’ routes from south-western Britain to Iberia, Brittany, and the Rhine in the last half of the 3rd millennium BC. We define Atlantic subclades as satisfying three conditions:
a) having many early branches, spread widely throughout Britain and Ireland, suggesting a spread during the initial colonisation by R1b. The number of early branches16 typically determines the size of the subclade;
b) two separate early Bronze lines (branching pre-2000 BC) in Cornwall or Devon, signifying the likelihood of a very early ancestor located there;
c) early Bronze lines in two or more of France, Germany or Iberia. L21 spread to these places very early but never really ‘took off’ as far as we know; the continental lines of L21 we observe are very long and thin and their branches define them as early strays. However, the relatively low numbers tested in mainland Europe hamper analysis.

13 The Beakers normally settled near tradeable resources and why they bypassed the lower Rhine is not clear. Given their interest in alcohol and feasting (Sherratt 1987) it is tempting to think they might have been interested in the wild grapevines that grew in the area.
14 Flanagan (1998), Taylor (1980). Clarke (1970) attributes ‘Wessex-style beakers’ in south-west England to a ‘rich and powerful group of settlers from the Middle Rhineland who came mainly from the area around Mainz and Koblenz’. The genetic evidence strongly suggests the reverse is the case.
15 By an ironic quirk, some of the various Celtic and Germanic invaders of Britain over the millennia might be said to be ‘returning home’.
16 For the purpose of this exercise we count the number of ‘early branches’ that occur within 6 SNPs of the founder of the subclade, using www.ytree.net which has the most up-to-date, accurate and easily followed haplotree of L21. An ‘early branch’ should be standalone in origin, without other places obviously higher in the tree.
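The three conditions for an Atlantic subclade can be expressed as a simple test, sketched below under stated assumptions. Each early branch is represented by the set of places where its early Bronze lines are found. The record type, the function name and the cut-off for ‘many early branches’ are all hypothetical; the paper does not give a numerical threshold, and the sketch does not attempt to model how widely the branches are spread within Britain and Ireland.

```python
from dataclasses import dataclass

SOUTH_WEST = {"Cornwall", "Devon"}
CONTINENT = {"France", "Germany", "Iberia"}


@dataclass(frozen=True)
class EarlyBranch:
    """One early Bronze branch and the places where its lines are found."""
    name: str
    places: frozenset


def is_atlantic(branches, min_early_branches=8):
    """Apply the three conditions to a list of EarlyBranch records.

    min_early_branches is a hypothetical cut-off standing in for
    'many early branches'; it is an assumption, not a value from the paper.
    """
    # Condition a): many early branches (geographic spread not modelled).
    many_branches = len(branches) >= min_early_branches
    # Condition b): at least two separate early lines in Cornwall or Devon.
    south_west_lines = sum(1 for b in branches if b.places & SOUTH_WEST)
    # Condition c): early lines in two or more of France, Germany, Iberia.
    all_places = set().union(*(b.places for b in branches))
    continental = all_places & CONTINENT
    return many_branches and south_west_lines >= 2 and len(continental) >= 2


# Hypothetical subclade with four early branches.
example = [
    EarlyBranch("branch-1", frozenset({"Cornwall", "Ireland"})),
    EarlyBranch("branch-2", frozenset({"Devon"})),
    EarlyBranch("branch-3", frozenset({"France", "England"})),
    EarlyBranch("branch-4", frozenset({"Iberia"})),
]
print(is_atlantic(example, min_early_branches=4))  # True
```

Keeping each condition as a separate named value makes it easy to see which criterion a given subclade fails, which matters for the small subclades that meet only part of the definition.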
Condition b) locates these subclades of L21 in south-western England at some point. The presence of two widely separated17 members of the same subclade in Cornwall or Devon provides a good chance their common ancestor lived in the area before 2000 BC. We know that the Cornwall/Devon area was a major dissemination point for R-L21, and likely had the first large settlements in Britain, Beaker or otherwise. That was where tin and gold were found in most abundance, and where the largest number of Beaker sites are found. The significant presence of about 15 per cent of the brother Iberian haplogroup R-DF27 in Cornwall, the only place in Northern Europe with a significant presence, confirms the scope of early population exchange with Spain. It is known that Cornwall and Devon are genetically differentiated from the rest of Britain (Leslie et al. 2015) and very early differences in genome have persisted there. Of these Atlantic subclades: • DF21 was comfortably the largest subclade overall until the Common Era (see Table 1 and Section 5) and was particularly dominant in Ireland. Although it is regarded as ‘original Irish’ initially it was a classic Atlantic subclade, with 28 early branches including one each in Iberia (Torres), the Rhineland (Fuston) and Normandy (Montgomery), and it remains the largest subclade in England and second in Ireland (Table C2a). There are five members of DF21 in Cornwall and Devon, within two different early branches. • Z253 branched 27 times during the early Bronze era and is the most widespread subclade and the most common on the Continent and in England before the Common Era. It remains the second-largest subclade in England and third in Ireland (see Table C2a). Our sample has six widely separated men in Cornwall/Devon,18 and early branches are found across Britain and around the Atlantic horizon in France, Germany and Spain. 
Z253 also found its way to Ireland and Scotland very early, and half a dozen branches are Irish (some of these might be continental incursions).
• Z251 is also widespread with 15 early branches, mostly English but with representatives having ancestry in Scotland, Portugal, Germany and even Poland and Mexico. Like other English subclades it has fallen in importance. The branch S9294, dating to 2000-2300 BC, is probably originally Cornish as it has two widely separated representatives Millett and Watty.
• DF41 apparently did not branch for 9 SNPs (the late Bronze), after which it formed 15 branches across the Atlantic horizon from Germany to Iberia. Two of these branches are found in Cornwall/Devon. A subclade A40 participated in the post-Roman growth spurt in Scotland, while the Royal Stewart line, which is from Brittany, expanded rapidly in mediaeval times.
• DF49 began as a fairly sizable Atlantic subclade, branching 23 times in the early Bronze. It was widespread across the Isles with traces in France (Normandy and Poitou) and in Iberia. It is found in widely separated branches in Cornwall as Coad-Coode and Biddick and in Devon as Hicks and Woolcott. Three thousand years later during the Irish Golden Age this subclade found real prominence as M222 and became the largest fraction of L21.

17 Defined as being in two separate Bronze branches, or having a genetic distance of 18 or more measured over 67 STR markers.
18 Unfortunately these have not done sufficient testing to provide good evidence that Z253 was present in Bronze SW Britain. However the high STR variance of Z253 there is indicative.

• FGC11134. This subclade had eight early branches, in France and Portugal as well as the Isles. Most of its development occurred much later in Ireland, probably in concert with DF49.
• DF63 is regarded as the ‘earliest subclade’, because it branched six ways directly from L21 without the intervening DF13 mutation that is upstream of most other subclades.
It has a fairly large early continental component (Spain, Portugal, France, Netherlands and Germany), making it truly Atlantic, and is found in Cornwall, though not in particularly early branches (Hicks and Trengove). The ‘Lennox Cluster’ Z16506 participated in the post-Roman expansion in Scotland, probably as indigenous p-Celtic.
• FGC5494. This middle-sized subclade branched very early at least 14 ways and is found across Britain and Ireland, with small lines in Germany, France and Iberia. In the Isles it has continued down very long thin lines to a number of discrete surnames in the present: Kenyon, Collings, McLaren, Maynard, Lunney, Maxwell, Phillips. There are three early lines in Cornwall/Devon.
• S1051 is a poorly studied middle-sized subclade. It has early branches in Iberia and Denmark, two widely spaced branches in Cornwall (Priske, Medland), and is spread across the Isles. It was once the fifth largest subclade but now is eleventh. It was subject to an early RecLOH19 which gives it a fairly distinctive STR signature.

Table 1. Estimated pre-Roman incidence of major L21 subclades by Isles countries.

Subclades   England   Ireland   Scotland   N
DF21*       12.6%     27.5%     24.5%      147
Z253*       15.2%     16.1%     16.4%      104
L513*       7.9%      15.4%     11.3%      80
DF49*       12.0%     13.4%     8.2%       77
S1051       8.9%      2.6%      11.9%      44
Z251*       6.3%      3.9%      5.7%       33
FGC5494     8.9%      3.9%      1.9%       32
DF41        6.8%      3.0%      3.1%       27
Other       21.5%     14.1%     17.0%      111

Note: *) These subclades are enumerated net of the contribution of the deep subclades in Table 2, as a partial proxy for the early distribution. Alternative methods are explored in Section 5.
Source: From Table C2c, using men in the L21 database who have 67 markers tested.

These nine Atlantic subclades formed part of the same general population, and they lived and migrated within the Atlantic geographical extent. Along with L513, they are estimated to comprise about 80 per cent of the pre-Roman R-L21 population.
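The ‘about 80 per cent’ figure can be checked roughly against the sample counts in Table 1. The sketch below sums the N column for the named subclades and compares it with the total. It is only an approximate check: it pools the three countries by sample size rather than by population, and FGC11134 and DF63 are not listed separately in Table 1, so they fall under ‘Other’ here. The variable and function names are hypothetical.

```python
# Rows of Table 1: subclade -> (England %, Ireland %, Scotland %, N)
TABLE_1 = {
    "DF21":    (12.6, 27.5, 24.5, 147),
    "Z253":    (15.2, 16.1, 16.4, 104),
    "L513":    (7.9, 15.4, 11.3, 80),
    "DF49":    (12.0, 13.4, 8.2, 77),
    "S1051":   (8.9, 2.6, 11.9, 44),
    "Z251":    (6.3, 3.9, 5.7, 33),
    "FGC5494": (8.9, 3.9, 1.9, 32),
    "DF41":    (6.8, 3.0, 3.1, 27),
    "Other":   (21.5, 14.1, 17.0, 111),
}


def named_share(table):
    """Return (named N, total N, share) for all rows except 'Other'."""
    total = sum(row[3] for row in table.values())
    named = sum(row[3] for name, row in table.items() if name != "Other")
    return named, total, named / total


named, total, share = named_share(TABLE_1)
print(f"{named} of {total} men ({share:.1%})")  # 544 of 655 men (83.1%)
```

The named rows account for 544 of the 655 men in the table, a little over 80 per cent, which is in line with the estimate given in the text.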
A few small subclades of L21 are also probably Atlantic, such as S1026 and Z15600, found across the Isles and in France and Germany, CTS3386, which is mostly in Ireland but also Italy and Finland, and FGC35995/Y14240, which has only been found in France, Sweden and Mexico.
19 A RecLOH occurs when one arm of a palindrome ‘writes over’ or replaces the other, making both values of a two-valued marker the same. This may be a big jump, and may affect more than one marker if they are on the same palindrome. A similar RecLOH on YCA (jump from 19-23 to 19-19) occurs in half a dozen other subclades of L21, but not so early.

The second class of subclades has been largely confined to one location and is not seen around the Atlantic horizon. It has only one significant subclade and several small ones:
• L513 is out of synchronisation with other subclades, expanding at rather different times and places. It seems that it became embedded in northern populations of the Isles and its expansion relates to periods of warmer weather. It first branched after eight equivalent mutations, somewhere about 1600 BC in the Bronze Climatic Optimum (Minoan Warm Period). It was then very vigorous, producing 27 early branches, including several found only in Scotland, giving the general impression of expansion into new territory. These early branches are equally spread between Scotland and Ireland, with some presence in England. Somewhere around 100 BC, in the Roman Warm Period, the branch L193 that is today the largest began to expand. It is tempting to regard L513 as the eponymous early Bronze Scottish subclade, just as DF21 is Irish; however, it may also have been strongly represented among the pre-Goidelic peoples in Northern Ireland.
• MC14 is largely Scottish and branched at the same times as L513.
• CTS1751 has an uncertain distribution, and several old lines in Devon suggest an origin there; but it is also found in Yorkshire and Lancashire. Its deep subclade BY595 is mostly Irish.
Table 2. Residual subclades of L21 and locations of origin

Subclade               Location of origin
15049032 A-G           England, Scotland
A5846                  Cornwall, England, France, Italy, USA
A7900                  Wales, France, USA
BY2868/BY2899/A4556    England, Scotland, Wales, USA
BY4045                 USA
BY575                  England, Ireland, Wales, Finland
FGC13742               England, Wales, USA
FGC13780               Cornwall, USA
FGC21979/A9607         Ireland, Scotland, USA
FGC5924                Cornwall
L371                   Scotland, Wales, Ireland
S16264                 Devon, France, USA
Y14240                 Wales, France, Sweden, Mexico
Z16500                 England, Ireland, Scotland, France, Germany
Z17300                 England, Scotland, Wales
Z39589*                Romania (originally German from the Rhine)
DF13*                  England (4), Scotland (2), Ireland, France, Germany, Italy, Norway, USA (5)
ZZ10*                  England, Scotland, Canada

Sources: L21 database, www.ytree.net

The third class consists of about 16 small ‘residual’ subclades, for which we do not yet have sufficient samples to deduce location or timing (see Table 2 for a listing). S16264, A5846 and Z16500 meet part of the criteria for ‘Atlantic’. Some may be specific to certain locations, such as A7900 and L371 in Wales; FGC13780 is known only in Cornwall and the USA, and BY4045 only in the USA. Only one is not known in Britain.

There are also about 19 singletons currently designated DF13* or ZZ10*, with lines of descent within the Atlantic spread that have apparently survived only a few men from extinction for 4500 years.20

The distribution of L21 subclades across the British Isles for the first 2500 years seems to have been reasonably uniform compared with what came later, though Table 1 shows DF21 and L513 had a considerably higher incidence in Ireland and Scotland than in England, which had a much higher contribution from smaller subclades of L21 (see Section 5 for more detailed analysis).

The final group of subclades were late bloomers that came apparently ‘from nowhere’, expanding very rapidly in Ireland and Scotland from about 100 BC.
These had existed as small background lines at the tail of the Pareto distribution for thousands of years, until a major redistribution of R-L21 gave them the opportunity to increase their numbers very rapidly.

4. Dark Age to Diaspora

We resume our historical account in this section, with a description of what happened to L21 in the period 100 BC to 1800 AD, beginning with a very major redistribution.

4.1 The Dark Age - Golden Age redistribution and expansion in Ireland and Scotland

Something quite remarkable happened in Ireland and Scotland around the beginning of the Common Era. A very substantial makeover of the structure of R-L21 occurred, so large as to resemble a population recovery from some kind of disaster in which the Irish population was nearly wiped out. This was followed by very rapid growth. Two main phases appear to be involved in this expansion: one in the ‘Dark Age’ from 100 BC, then a consolidation and faster growth from 400 AD.

4.1.1 Dark Age collapse and emergence of deep subclades

First, somewhere around 100 BC, two residual subclades of L21 appeared from obscurity and began branching. They were L1335 ‘Scottish modal’ and Z255 ‘Irish Sea’, to use the STR cluster names under which they were first discovered. These were accompanied at the same time by nine equally obscure branches of major Atlantic subclades that had also been residual21 since the early Bronze Age, among them DF49>M222 ‘Irish type I’, DF21>Z3000 ‘Clan Colla’, FGC11134>CTS4466 ‘Irish Type II’, Z253>L225 ‘Irish Type III’, Z253>CTS9251 ‘Irish Type IV’ and L513>L193 ‘Little Scottish’ (see Table 3a).22

Table 3a shows the principal ‘deep subclades’ involved in the Irish and Scottish Dark Age expansion, in descending order of variance. Excel Table C2d shows their detailed count by country.
The subclades all coalesce about 15 SNPs from the present on average, or 2100 years old, a period associated with the late Iron Age and Celtic culture.23 The STR variances of the subclades shown in Table 3 are only about a half of those of the original Bronze Age subclades (see Table B1), which is to be expected as these deep subclades are less than half the age. However, even though all these subclades are about the same age as registered by SNPs, the variances differ depending on the internal structure of the lineages (see Appendix B).
20 As the sample expands it is expected that more will appear. In fact there are already a number of untested isolates within the database that are probably residual singletons.
21 ‘Residual’ here means having a long thin unbranching line with 15 or more equivalent SNPs (M222 has 45).
22 The timing on these breakouts is important but not critical. Although yfull.com dates these subclades to 100-300 AD, we calculate using the same method on the larger sample in www.ytree.net that the coalescent for each of these branches is an average 15 SNPs or 2100 years.
23 The mean age of the subclades is calculated as in Appendix A. www.yfull.com dates them all to about 200-400 AD using the same method, but their L21 NextGen sample is quite small with relatively few branches compared with the.

Table 3a. Deep subclades and clusters of the late Iron Age/early Common Era.
Branch      L21 Subclade  Variance  N*   Cluster name
Irish
CTS4466     FGC11134      0.159     186  Irish Type II
Z255        Z255          0.159     265  Irish Sea
M222        DF49          0.147     966  Irish Type I, Ui Neill
P314        DF21          0.140      53  P314 Project
L1336       DF21          0.135      42  Clare
CTS9881     Z253          0.135      35  Irish Type IV
Z3000       DF21          0.124     320  Clan Colla
Z16282      DF21          0.101      58  Carroll
Scottish
L193/S176   L513          0.204     160  McLean-Little
A71/S190    DF21          0.192     109  Little Scottish
L1065       L1335         0.142     429  Scottish Modal

Note: *) Taken from Table C2d. The lines/clusters selected are those of the right age that make up more than 0.5 per cent of the sample, all but two of which are very well known.

What makes these subclades ‘deep’ is that they straddle a long thin line of many equivalent SNPs, so that after the branch formation in the early Bronze there were no further known branches for typically 1500–2000 years, when they suddenly sprang to life. Because of this very long isolation during which the STRs mutated, the new founder of the deep branch might quite possibly have a very different STR signature from the founder of L21, and so therefore do all his descendants. This means that members of deep subclades can often be distinguished by STRs alone.

Figure 6. Deep branches, clusters and overlap (diagram showing L21 and deep Branches 1 to 4)

Figure 6 shows how these STR clusters form. The founders of deep branches 1 and 4 have haplotypes far from the modal, so their descendants do not have overlapping STRs. Deep Branches 2 and 3 however are near the modal and have some overlap in the STRs of their descendants, so they do not form a distinct cluster. Clearly, the branches have to be ‘deep’ (at least half way down the timescale) for this to work, otherwise their lines of descent will overlap.

Most of these Irish late-blooming subclades have been associated with particular clans and legendary leaders from the ‘Dark Age’ and in fact they do show some degree of correspondence with particular surnames traditionally associated with the clans.
However, the correspondences are less than perfect. Some of the larger ‘deep subclades’ are:
• M222 is the most extreme case considered. It has 45 equivalent SNPs in its lead-in and its haplotype is so distinctive it can be picked up with no error by only a few SNP markers (which is how it was discovered). It is common among surnames such as Gallagher, Boyle, Doherty and O’Donnell purportedly associated with the legendary Ui Neill, High Kings of Ireland. It is almost three times as large as the next deep subclade, and it is the flagship subclade of L21 (especially in Ulster, where Busby et al. (2012) and Myres et al. (2011) found local incidences of over 40 per cent). It also spread aggressively into Scotland, in moderate quantities to England, and in trace amounts to Northern Europe. As we shall see it is possible that it is foreign to Ireland.24
• Z3000, known as Clan Colla, is one of a number of deep lines of DF21, the largest subclade in Ireland before the Common Era expansion. The lineage is supposed to be descended from the Three Collas, warlike chieftains who conquered Ulster in the early part of the 4th century, one of whom became the first King of Airgialla in southern Ulster. The Maguires, MacMahons and other surnames are supposed to be descended from Airgialla, though this is only partly demonstrated by DNA (O’Hart 1892; Biggins 2016). This is the only late subclade to be found in Wales in significant numbers, perhaps associated with the Kingdom of Dyfed and the colony of the Deisi in Pembroke, or with the Irish colony on the Llyn Peninsula in the north-west.
• Z255 ‘Irish Sea’ is one of the residual subclades from the tail of the original L21 distribution that suddenly sprang to life in the Christian era. Surnames such as Byrne, Gleeson, Fitzpatrick and Beatty are involved. It has a good representation in Scotland and England, and also in Scandinavia.
• CTS4466 ‘Irish Type II’ is numerous in southern Ireland but rarely occurs elsewhere.
It includes surnames such as Collins, Donohue and Sullivan. Like M222, it sprang from a fairly small Atlantic subclade, in this case FGC11134, down a long line of equivalent SNPs.
• CTS9881 ‘Irish Continental’ shows the inaccuracies that may arise when only a small number of STRs are used to try to identify clusters. It was initially thought25 that the subclade included a number of Norman-English surnames and that it must have come from the Continent through the Pale. Most of these matches disappeared once SNPs were tested. There is one small English branch BY412 but it is recent, probably from Irish migration.

A similar but less extensive expansion began around the same time in Scotland, presumably under the influence of some of the same drivers.
• L193, a fairly deep branch of L513, expanded rapidly in Scotland in the Celtic period, along with the ‘Little Scottish’ branch A71 of DF21. It seems that these formed and expanded in the original p-Celtic Brythonic-speaking population of Scotland.
• L1335. Almost as spectacular as the M222 tsunami was the sudden rise of a new branch of the background subclade L1335, which had only branched once since its inception. In the span of six SNPs, from about 100 BC to 700 AD, it branched an extraordinary 38 times, the record for an L21 subclade. This founder extravaganza brought L1335>L1065 from a single man to the largest L21 subclade in Scotland, so that it is now known as ‘Scots modal’. There is no clue to its origin except for a small branch from about 1800 BC living on the remote Llyn Peninsula in north-west Wales. This may be misleading as the Llyn was the site of an Irish colony in the early Christian period.
24 http://www.ancestraljourneys.org/irishsurnames.shtml
25 https://sites.google.com/site/irishtype4/irish-type-4-sub-clade
4.1.2 ‘Golden Age’ expansion and overflow

About 500 years later, after the Romans departed from Britain, this expansion continued very vigorously in the early Christian period in Ireland, flowing through to Scotland and later to England and Wales. The expansion was still centred on the deep subclades but was more broadly based. This surge from about 300–700 AD laid down the subclade distribution we see today. About 80 per cent of present-day L21 from Ireland and Scotland derives from deep branches at this time, and 40 per cent of L21 elsewhere – England, Wales and the Continent (see Table C2c and C2d). Overall, about 70 per cent of L21 is from the period, so that L21 is much more a Common Era phenomenon than a Bronze Age one.

Following the departure of the Romans from Britain, a further expansive surge appears to have occurred in what became the early Christian ‘Golden Age’ of Ireland.26 This later expansion appears to be a largely random outcome of rapid growth within the redistributed Irish population. Some of the larger new ‘deep’ branches involved are shown in Table 3b. These subclades all have about the same STR variance as the Dark Age subclades, but their average SNP count is several less. They all occur in the big subclades DF21, Z253 or L513. As well, M222>DF104 and CTS4466>A541 were formed, which contain the bulk of the large deep subclades Irish Type I and II.

Table 3b. Subclades of the post-Roman Golden Age

Branch          L21 Subclade  Variance  N*   Mean SNP count a)  Cluster name
CTS3087         L513          0.156      26  12
S7898           Z253          0.150      22  12                 Corofin
Z16372          L513          0.146      29  13                 Shaw Nicholson
Z23532          L513          0.141      23  11
L226            Z253          0.132     187  13                 Irish Type III, Boru
L1402/A385      DF21          0.131      34  13                 Seven septs of Laois
L1336           DF21          0.101      42  13
A40 (Scotland)  DF41          0.0826     30  12                 1426 cluster

Note a) Big Y counts are 140 years per SNP, so 13 would have TMRCA about 180 AD.
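The arithmetic behind note a) can be sketched in a few lines. This is only an illustration of the stated rate of 140 years per Big Y SNP, not the procedure of Appendix A. The reference year of 2000 is an assumption, chosen because it reproduces both worked figures in the text (13 SNPs giving about 180 AD, and 15 SNPs giving 2100 years or about 100 BC); the function names are hypothetical.

```python
YEARS_PER_SNP = 140      # rate quoted in note a) to Table 3b
REFERENCE_YEAR = 2000    # assumption: "the present", inferred from 13 SNPs -> ~180 AD


def snp_age_years(mean_snp_count, years_per_snp=YEARS_PER_SNP):
    """Approximate age of a subclade from its mean SNP count."""
    return mean_snp_count * years_per_snp


def tmrca_year(mean_snp_count, reference_year=REFERENCE_YEAR):
    """Approximate calendar year of the common ancestor (negative means BC)."""
    return reference_year - snp_age_years(mean_snp_count)


def format_year(year):
    """Label a year as AD or BC. The missing year zero is ignored,
    which is immaterial at this level of precision."""
    return f"{year} AD" if year > 0 else f"{abs(year)} BC"


# Mean SNP counts appearing in the text and in Table 3b.
for count in (15, 13, 12, 11):
    print(count, snp_age_years(count), format_year(tmrca_year(count)))
```

The estimate is linear in the SNP count, so a difference of one or two SNPs in the mean shifts the date by 140 to 280 years, which is of the same order as the disagreement with the yfull.com dates mentioned in the footnotes.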
Some of the better known of the ‘Golden era’ subclades in Table 3b are:
• L226 ‘Irish Type III’ is the largest branch of the eclectic subclade Z253. It begins with a 25-SNP lead-in which probably is native to Ireland but might be foreign. It is found in Munster, especially in Tipperary, Clare and Limerick (Wright 2009). It is associated with the Dal gCais or Dalcassians, a tribe who designated themselves as descendants of a semi-legendary Dark Age king of Munster, Cormac Cas. The most famous member of this tribe was King Brian Boru who ended the line of Ui Neill High Kings in the early 11th century. Common surnames in the clan include O’Brien, MacNamara, Kennedy, Grady, McMahon, Hogan and McGrath (O’Hart 1892). The Kingdom of Dyfed in Pembroke is supposed to be Dalcassian but there is no sign of this lineage in Wales.
• ‘Cruithen’ L513. Three separate lineages of L513, which we normally associate with early Scottish DNA, are found in Northern Ireland dating from the Golden Age (Table 3b). Along with DF21, they are probably part of the original p-Celtic population in Ireland and Scotland. Although these subclades are predominantly Irish (see Table C2d) they have some Scottish presence, suggesting a continuous link. A people known as the Cruithen lived in Ulster at the dawn of recorded history, and the Irish used this same term for the Picts of Scotland. The Cruithen were ultimately overcome by the Northern Ui Neill in the 7th Century and many were driven to Scotland. Although some scholars have rejected the tie,27 the presence of expanding Golden Age L513 lineages in Ireland lends some support to the Pictish connection.
26 Again, www.yfull.com shows the TMRCA for some of these subclades as 650-800 AD, but from the SNP count, 200-450 AD is more likely.
In Scotland the main event of the period was the overflow of Irish excess population into the Western part of Scotland as the Goidelic-speaking ‘Scots’ who eventually overcame the confederation of original Brythonic-speaking tribes known as Celts (Broun 1999). Most of this ‘invasion’ is easily visible as M222. Given its peculiar origins and extraordinarily rapid growth from a single man, it is possible that L1335 is also Dark Age Irish and accompanied the Scots.

4.1.3 The stripping of the tail

Further evidence for a near-extinction event in Ireland is the stripping away of the tail of the Pareto subclade distribution there. In a population bottleneck, a small sample of men survive including only a few subclades from the long tail. These have no competition so they may rapidly become major subclades, further stripping away the tail. This is what appears to have happened in Ireland.

As we saw in Figure 4, middle-sized subclades are poorly represented in Ireland. This continues into the tail of residual subclades: Table 2 shows 15 of the residual subclades and singletons in England but only four in Ireland; Wales and even Cornwall-Devon with much smaller samples have more small subclades. A standard property of the Pareto distribution is that doubling the sample size will increase the number of subclades present by a fixed amount. The Irish sample is more than four times as large as the English but the number of residual subclades is much less, suggesting the tree has been severely pruned.

A possible alternative is that Ireland was originally settled by only a small number of L21 men so that the bottleneck occurred at the beginning. However, the Continent and Scotland have a reasonable share of small subclades from the Atlantic expansion and so should Ireland; as well, it should have developed its own mini-branches since we know it was settled very early in the Beaker expansion. The anomaly is a prime sign of a bottleneck event.

4.2 Collapse and recovery?
The post-Roman expansion within Ireland was preceded by a substantial collapse, recognised as a period of fortified warfare from 100 BC to 300 AD, which Charles-Edwards (2000) has called the ‘Irish Dark Age’. The genetic record suggests that this was more prolonged and far darker than anyone has previously considered.

The substantial reorganisation of the male DNA lines in Ireland over a few centuries, coupled with the pruning of the residual subclades, is consistent with a catastrophic decline in the effective male breeding population of Ireland, perhaps to only a few hundred breeding men, followed by a very rapid recovery and a growing population that increased well beyond the original level. These bottlenecks appear to be quite common in the ancient genome and are responsible for intermittent severe pruning of the phylogenetic tree – explaining why so very few lineages have come down to us from ancient populations that numbered in the millions. This is the first bottleneck event to be so clearly identified in time and place.
27 Ó Cróinín (1995), Jackson (1956).

That there should have been population growth in Ireland and Scotland at this time is not unexpected. From 250 BC to 400 AD the climate in Northern Europe entered what has been called the Roman Warm Period (Bianchi and McCave 1999). For ancient peoples, population carrying capacity is largely dependent on food availability, and the warmer climate was beneficial for crop production in colder areas such as Scotland and Northern Ireland. It is likely that the warmer climate also led continental Celts further north, and eventually the Romans.

We know that Ireland on the edge of the known world was very backward throughout the Bronze Age, and when Iron Age technology finally arrived, this must have provided a considerable boost to food production.
Ireland sat outside of the Roman world, but the slow diffusion of Roman advances in crops and farming, probably through the agency of Christianity, must have kept the momentum up for a considerable time.

It is not the fact of expansion, but the extraordinarily sudden appearance of 11 new rapidly expanding subclades that leads us to suspect a catastrophic event. Y-lineages normally maintain their relative proportions once they are of sufficient size (and in England, the relativities were largely maintained within R-L21, as we show in Section 5). In a large established population, any new haplogroup will not randomly reach a sufficient size in competition with other lineages to make much impact. The only ways for a lineage to break out from obscurity are: in the aftermath of a severe bottlenecking event, through founder settling of new territory, through differential population growth rates, or if the expansion is not random. These are the possibilities we now consider.

It should be stressed that as far as we know we are not talking about an external invasion. The L21 subclades that appeared in the Dark Age had ancestor lines in the Isles for thousands of years but in tiny quantities. The simultaneous expansion of a small number of randomly chosen subclades in Ireland (and to a lesser extent in Scotland), apparently without interference from existing large lines, appears very similar to what occurs when a small population settles new territory. The relative absence of tiny subclades from the Pareto tail in Ireland also suggests a major dieback.

The only real way for sudden marked founder effects to take place in an established population is for genetic diversity to be very substantially thinned before a major population expansion; then any man who by chance has many sons may make a large impact in the genetic pool as there is little competition.
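The contrast drawn above, between lineages holding their proportions in a large stable population and a few tail lineages breaking out after a crash, can be illustrated with a toy neutral resampling model. This is a sketch under stated assumptions, not the analysis used in this paper: it is a simple Wright-Fisher style step in which each man of the next generation draws his father at random, and the lineage names, population sizes and generation counts are all hypothetical.

```python
import random
from collections import Counter


def next_generation(counts, new_size, rng):
    """One neutral step: each of new_size sons picks a father at random,
    so a lineage is chosen in proportion to its current count."""
    lineages = list(counts)
    weights = [counts[name] for name in lineages]
    sons = rng.choices(lineages, weights=weights, k=new_size)
    return dict(Counter(sons))  # lineages with no sons drop out


def simulate(initial_counts, sizes, seed=0):
    """Run the population through the given sequence of generation sizes."""
    rng = random.Random(seed)
    counts = dict(initial_counts)
    for size in sizes:
        counts = next_generation(counts, size, rng)
    return counts


def top_shares(counts, k=3):
    """Largest k lineages with their share of the population."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return [(name, round(n / total, 3)) for name, n in ranked[:k]]


# Hypothetical starting point: two large lineages plus a tail of 40 tiny ones.
initial = {"major_A": 3000, "major_B": 2000}
initial.update({f"tail_{i:02d}": 25 for i in range(40)})

stable = [6000] * 30
# Crash to 30 men for five generations, then rapid recovery.
bottleneck = [6000] * 5 + [30] * 5 + [60, 120, 250, 500, 1000, 2000, 4000] + [6000] * 13

steady = simulate(initial, stable, seed=1)
crashed = simulate(initial, bottleneck, seed=1)
print("stable:    ", len(steady), "lineages;", top_shares(steady))
print("bottleneck:", len(crashed), "lineages;", top_shares(crashed))
```

Whatever the random seed, the bottleneck run cannot retain more lineages than the number of men at the low point, which is the ‘stripping of the tail’ in miniature. Which lineages survive and how large they become varies from run to run, so no particular outcome should be read into a single simulation.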
4.2.1 Plague and famine

The DNA redistribution suggests a catastrophe, probably a famine and/or plague that more than decimated the Irish population. Exactly what plague or other disaster might have done the thinning is unknown. Among calamities, plagues generally take the highest toll. We speculate that as Ireland opened up to the outside world, diseases were introduced that had been present in Britain or the Continent for some time. The locals had no resistance, which incidentally would give resistant newcomers a considerable advantage in breeding for a few generations.

If Ireland had been isolated for millennia there were plenty of diseases to which the Irish would have not acquired immunity, such as those that later devastated the New World (smallpox, influenza, typhoid, yellow fever and pertussis), which taken together have typically wiped out 95 per cent of newly exposed populations (Diamond 2005). The arrival of Iron Age people, Roman traders or slaves taken in raids would have been sufficient to break the isolation and spread contagion.

Events of this kind were not unfamiliar to the ancients. The Irish pseudohistory Lebor Gabala tells of an early post-Deluge settlement in Ireland, about 9 000 people led by one Partholon, who all died of plague in a single week. A similar fate befell the people of Nemed some decades later, with all but 30 subsequently being killed in a battle with the Fomorians, aboriginal inhabitants of Ireland. While these tales are not to be regarded as historical events, the possibility of plague devastating the whole island community was clearly familiar to the 11th-century authors of Lebor.

Famine, plague and war go hand in hand. Ireland has suffered regular severe famines, as has Northern Europe more generally. Myllyntaus (2009: 80) estimates that crop failures in Northern Europe occurred with much greater frequency than usual in certain periods.
For example, torrential floods and rains caused a famine from 1315–22 in the Isles and Northern Europe that was the greatest on record, causing widespread starvation, violent social conflicts, ruthless crimes, epidemic diseases and very high mortality (Jordan 1997). Whatever happened in Ireland during the Dark Age must have been considerably worse than this, a 2500-year event probably reducing the population by more than two orders of magnitude.

Ireland is exposed to weather events and other extreme conditions because of its proximity to the Atlantic. The warmer climate between 300 BC and about 100 AD produced frequent extreme weather events on the Atlantic seaboard. Strabo wrote that around the years 120 to 114 BC (exactly in our genetic timeframe) a storm surge from the North Sea covered large areas along the coasts of Denmark and northern Germany with water, permanently altering the coastline and forcing the Cimbrians, Teutones and Ambrones into the lands of the Romans. Similar events probably happened in Ireland during the same period, reducing the population, which could not so easily migrate.

Severe bottlenecking events associated with climate change or epidemic appear to have taken place with some regularity in prehistory, and periodic cullings such as this one are probably responsible for the very small number of ancient lines that have come down to us on the Y-haplotree.

The sweeping clean of populations by plague or disaster in ancient times was not however entirely without benefit. Subsequently, the land could be resettled more efficiently and new technology introduced, eventually producing a larger, healthier and more prosperous population.28 Ireland had been particularly backward as a remote island at the edge of the known world, and this resettlement appears to have been very advantageous, both culturally and economically.
It is not necessarily a fatal objection that this proposed catastrophe is not recorded by either Irish or Roman sources, except in cryptic terms. Irish history in the first millennium is at best unreliable, a ‘mixture of truth, lies, myth and legend’. The Irish were fond of compiling long genealogies that claimed descent for most of the common surnames from legendary heroes and kings. While the Irish genealogies do extend back into the Dark Age, they do not go as far back as 100 BC, apart from the tantalising clues of the Lebor. The Roman chronicles too were silent, as Ireland and Britain were outside of their sphere of influence at this time.

4.2.2 Invasion and war?

One thing we do have both archaeological and semi-historical evidence for is war. War eradicates the male population much more thoroughly than the female. Warfare then as now was bloody and brutal, but in these times it involved slaughter and selling off the survivors and their families.
28 Bell et al. (2007) suggest that while the Black Death of the 14th century decimated the English population, it cleared the land for the wool industry which later supported the Industrial Revolution and eventually England’s colonial empire.

Some of the newly expanding lines may have been foreign. O’Rahilly (1946) believed that the first Goidelic speakers in Ireland arrived from Aquitaine in south-western France about 100 AD in several groups including the Connachta under Tuathal Techtmar, who carved out a territory in Meath, the kingdom of the southern Ui Neill, fighting ‘a hundred battles’. The Irish Chronicles describe Tuathal as a legendary figure descended from a long Irish lineage. Aquitaine is an unlikely site for L21>DF49; an alternative O’Neill lineage is R-DF27>>Z37492, much more likely to have come from southern France.
The 17th century historian Geoffrey Keating compiled a tale that Tuathal was a leader of exiles in Scotland assisted to return by the Roman governor Agricola in the hope that raiding would stop under his rule. Part of this legend, again in the Lebor, refers to a severe famine that struck Ireland around 56 AD in punishment for the unseating of Tuathal’s father, the rightful High King. However, critics have discounted this legend as propaganda designed to legitimise a foreign invading force. The advance of the Ui Neill seems to have eliminated much of the indigenous opposition or driven them off to Scotland. Tuathal’s ‘hundred battles’ might have been a metaphor for Dark Age brutality and the elimination by war of large groups of people during a Goidelic-speaking Iron Age invasion. A people called the Fir Domnann were said to have landed in Ireland and settled in several locations, and it has been proposed that they were Dumnonians from Devon/Cornwall. Branches of DF49 and divergent early branches of M222 are present today in Cornwall and Devon. Yorke (1995: 18–19) has speculated that emigration from Dumnonia to Armorica in the 5th and 6th centuries that led to a Breton-speaking sister kingdom in Brittany was an opportunistic expansion rather than a response to Saxon harassment, so an earlier intrusion bringing the M222 forerunner to Ireland is quite possible. M222 settlers in Scotland were prominent in the Damnonii tribe of Argyll, and the naming might once again be more than coincidence, reflecting an actual tribal source. Earlier social historians were aware that during the Dark Age the population of Ireland had been severely reduced. Charles-Edwards suggested that the sale of prisoners to Roman Britain by local chiefs reduced the population. 
However, this is disputed and it seems more likely that extensive Irish slaving raids to Britain actually augmented the population as the pax Romana declined (St Patrick was taken by slavers from England to Ireland as a young boy).

The advent of Christianity in the early 5th Century accompanied a flowering of local culture, and at this time the growth in effective population accelerated. Religion pacified the warring people and gave hope and focus after centuries of fear and withdrawal.29 The monasteries were large organised farms, able to combine small holdings of land and engage in intensive farming and the cultivation of grains. Due to their regular interchange with Rome the monks may have brought newer crops and farming methods, and they were probably more prepared and had more time to experiment. Food production must have risen, permitting a rapid population advance above pre-crisis levels. It is certainly known that most of the forests of Ireland were cleared during the post-Roman period, a definite indication of a population surge.

In this period, vows of celibacy would also have removed men and women from the breeding population, maintaining some continuing pressure on genetic diversity, though not sufficient to produce the wholesale distributional changes of the Dark Age.

Whatever the circumstances, and whether or not foreign expeditionary forces took advantage of a population collapse, the Irish expansion was both visible and forceful, and soon spilled over into neighbouring countries, most of which were in disarray after the collapse of the Roman Empire. To a fair extent, L21 as we know it is not an early Bronze but a Common Era phenomenon, since that is when the distribution took its current form.
29 It is one of the basic principles of demographics, almost a biological imperative, that when people feel safe and secure they will have more children.
4.2.3 The influence of the Romans and Irish-Scottish interchange

The picture in Scotland is not as clear as in Ireland, although some of the circumstances are similar. Certainly the factors encouraging rapid population growth were the same – the Roman Warming and the late arrival of Iron Age technology. However, it is harder to explain why the growth was so uneven and initially restricted to the deep subclades L193, A71 and especially the upstart subclade L1335 (see Tables 3 and C3d).

The same disaster that almost wiped out the Irish population might have also affected the West and the Lowlands of Scotland. However, we think this is unlikely as the subclade tail has not been thoroughly stripped in Scotland, and it seems that a founder effect involving the settling of vacant lands is more likely.

The evidence for invasion and warfare is more substantial in Scotland than in Ireland. Iron Age invasions from the Continent were more likely to have taken place following the wholesale displacement of populations by extreme weather in Northern Europe. The wide distribution of circular broch towers in Northern Scotland and around the Forth, built from around 100 BC as apparent defensive structures, points to social disturbance or warfare.

The continuing harassment of Scotland by the Romans after 71 AD may have been sufficient to cause a founder effect in the Scottish Lowlands once they departed. Although they held Caledonia for only about 40 years, their repeated invasions cleared the buffer zone between Hadrian’s Wall and the Antonine Wall (see Figure 7). Tribes that opposed the Romans could be almost eliminated: Caesar claimed his conquest of Gaul killed a million people, mostly civilians. Severus invaded Scotland in 209 AD with 40 000 men, claiming to have committed genocidal depredations on the natives.
Whether or not this is accurate, the Roman military presence had been preventing large areas of Scotland’s fertile land between the Walls from being developed, acting as a population constraint. After the Romans left in AD 419, the vacuum between the Walls was filled by incoming populations from all directions.

The situation in Scotland is complicated by what has commonly been regarded as an invasion of the west coast of Scotland by Irish Goidelic speakers, who ultimately wrested power from the Pictish autochthonous majority and gave Scotland its rulers, its Gaelic language and its name. The new arrivals were known as Scots (a name previously applied by the Romans to Irish raiders).

If one regards the carrying capacity of a region as largely determined by food production, the population always tends to overshoot following the introduction of new crops and technology, to the point where it can only be sustained in the best seasons, or beyond. Then the excess population can only be relieved by famine and/or by exodus.30 As the Irish resettlement and cultural renaissance progressed, the excess population spilled over into Scotland and the Western Isles. The indigenous inhabitants had fiercely resisted Roman expansion for centuries and prevented the settlement of Romanised populations between the Walls, so a settlement in Scotland by less formidable Gaelic-speaking ‘Scots’ must have been partly consensual. This was facilitated by the adoption of Christianity, influenced by Iona and Ireland, into the ‘Brythonic enclosure’ of Strathclyde in the 6th century AD.

30 Exactly as occurred in Ireland in 1845-9, when the population of Ireland was 30 per cent higher than it is today.

Figure 7. Hadrian’s and Antonine Walls, Scotland and Northern England
Source: created by Norman Einstein 2003, Creative Commons.
The expansion of M222 in Scotland is the only clear example we have of what might be a fairly concerted move by a single lineage, a deliberate move by the M222 Ui Neill group into Scotland, establishing the Dalriada overkingdom of Argyll and Antrim and over time gradually eliminating the Pictish p-Celtic confederation from positions of influence. If the move had been random with regard to lineage, we would expect to see a more balanced move of Irish DNA into Scotland. The alternative interpretation to invasion is that M222 is autochthonous, and grew from its beginnings astride both Ireland and Scotland in the Dalriada area across areas vacated by the putative disaster of 100 BC. There it was later co-opted into Christian and Goidelic culture just as the Picts were. This is less satisfactory however as it does not explain the clear distinction between the two peoples or the apparent advance of M222 northwards and eastwards. The sudden rise of the mysterious subclade L1335 from residual status to a point where it exceeded native DF21 and L513 and even M222 in Scotland defies explanation. A further complication is the extensive presence of the other major R1b branch U106 on the east coast of Scotland, which was probably associated with intrusions from the Continent, also suffering from post-Roman unrest. A further complication is the arrival of Saxons from the Continent, who according to Gildas (Frazer 2009:43) were brought in to stop the ferocious Pictish and Scottish invaders boiling from the north and west, but who ultimately revolted and formed their own settlements. There is ample evidence of population exchange between lowland Scotland and the Continent over an extended period, visible within L21 and other haplotypes, but these matters are beyond the scope of this study. 
4.2.4 The elite hypothesis

Since genetic genealogists became aware of the presence of the extensive M222 lineage in Northern Ireland, a ‘hypothesis of elites’ has been advanced as responsible for its prevalence. This is crudely expressed as ‘chieftains with many wives and brave warriors with many sons’, or more precisely as the ability of elites to afford polygamy and to exclude other men from breeding, in line with an ancient prejudice that virility and manly virtue express themselves both in battle and in male-line procreation. A similar idea features in many myths and pseudohistories within patriarchal societies.

Specific lineages may certainly have a continuing impact within patriarchal cultures with endogamy, where specific bloodlines may be roughly preserved as cohesive entities and may be able to sustain an elite caste or status. There has probably been no society that meets these requirements more thoroughly than the Celtic Irish. The Irish derbfine system was one of the most agnatic on record, designed so that land and power would not pass out of the hands of a single male lineage. The derbfine was the set of patrilineal descendants of a common great grandfather. When one of the members died, property was passed to and often divided among the rest. Contracts could only be undertaken with the consent of the whole collective, and new chiefs or kings were always elected from within the derbfine of the last leader. This tended to keep the clans or septs geographically segmented. It also kept power fragmented: Ireland had about 150 petty ‘kings’.

It cannot be denied that particular families, clans or races have been able to gain and hold power and dictate the directions of a society. However, this does not automatically convert to a ‘selective breeding advantage’ in the way that Moore et al. (2006) proposed for Ireland or Thomas et al. (2006) for Anglo-Saxon England.
If it did, alleged ‘inferior races’ would have been outbred the world over, but they never have been.31 In fact the ‘elite hypothesis’ has little or no empirical support. Unlike herd animals, humans are poorly equipped for selective breeding. It is not an easy matter to expand a human male-line deliberately or even to sustain one: the history of Eurasian elites is littered with failed dynasties that could not produce heirs. There is nothing to suggest from genealogical studies that the number of male-line descendants of a single man correlates with socioeconomic status or any other variable— except perhaps the negative impact of having a dangerous and life-shortening occupation, such as being a miner, a seafarer or a ‘brave warrior’. The preferred alternative to the ‘elite hypothesis’ is random expansion from a bottleneck. The Yule process in the limit creates the Pareto outcome of Figure 3, where a few men randomly have large numbers of descendants and many men have few descendants. As the number of descendants is random, and elites are by definition small, it is much more likely that large lineages will descend from poor men than from kings – and this is what the recent genealogical record shows with few exceptions. The status theory does have a bearing on the structure of growth out of a bottleneck, in that any man who randomly has many sons during the chaos of an unregulated society may be able to secure resources and gain a temporary advantage for his family. 
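The random-expansion argument can be illustrated with a small simulation. The sketch below is not the computation behind Figure 3; it is a Yule-Simon style growth model chosen here purely for illustration, in which each new man either founds a new lineage with a small probability or is born into an existing lineage with probability proportional to that lineage's current size. The function name `simulate_lineages` and the parameters `n_births`, `p_new` and `seed` are assumptions introduced for this example, and the parameter values are arbitrary.

```python
import random
from collections import Counter


def simulate_lineages(n_births, p_new, seed=0):
    """Grow male lineages at random and return their sizes, largest first.

    Hypothetical illustration: no lineage has any built-in advantage.
    """
    rng = random.Random(seed)
    owner = [0]          # lineage id of every man so far; one founder to start
    n_lineages = 1
    for _ in range(n_births):
        if rng.random() < p_new:
            # A new lineage (for example a newly marked branch) is founded.
            owner.append(n_lineages)
            n_lineages += 1
        else:
            # The newborn is the son of a man picked uniformly at random, so
            # a lineage gains members in proportion to its current size.
            owner.append(rng.choice(owner))
    return sorted(Counter(owner).values(), reverse=True)


sizes = simulate_lineages(n_births=20000, p_new=0.05, seed=1)
median_size = sizes[len(sizes) // 2]
print("number of lineages:", len(sizes))
print("five largest lineages:", sizes[:5])
print("median lineage size:", median_size)
```

In runs of this kind a handful of lineages usually account for most men while the typical lineage stays very small, which is the heavy-tailed pattern the text attributes to chance rather than status.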
Woolf (2007: 21) states, ‘In a world in which masculine physical strength counted for much in both labour and coercion, a band of brothers may have brooked little resistance’.32 He cites, ‘It is preferable that a man’s lord should be his kinsman’, quoting MacFirbhisigh’s Law: ‘It is customary for great lords that when their families and kindreds multiply, their clients and followers are oppressed, injured and wasted.’

Social constructs like property ownership, religion and dominant language can be changed quickly by incoming elites. These may confer some temporary advantage in breeding for their followers, if for example women only marry men who speak the dominant language or follow the dominant religion. M222 men in Scotland were Christian and Goidelic-speaking, which probably enhanced their status.

31 This depends to some extent on whether children of mixed marriages are absorbed into the subject or the master group. However, as an example aboriginal peoples have been able to retain a fair degree of autochthonous Y-DNA (Hammer et al.

32 There are modern counterparts: in Albania in the lawless period after the collapse of Communism, families moved en masse onto public land and began to build houses. The author was told that the size of the land holding depended on the number of sons to defend it.

However any hereditary elite will have many followers with different lineages, and they will all join in and benefit from any ‘selective breeding advantage’, so in the end the random will triumph. It is possible the M222 founders had some sort of assistance from a status advantage in the early years; but more likely the penetration of M222 and the other deep subclades is entirely a random phenomenon, which is strongly supported by the natural Pareto distribution shown in Figure 3. In short, the trick to a ‘founder effect’ is to have many sons and grandsons randomly at a time when the effective male population is very low, just preceding a major population advance.
This is what the M222 founder must have done, as must his ancestor the L21 founder, thousands of years earlier. They may have been men of significance, but in all likelihood they were not.

So we contend that status does not create large lineages but the reverse. Relatively large lineages in periods of crisis form a core or seed for clans that are able to claim power in certain types of patriarchal societies. Following disasters that suddenly reduce the breeding population and eliminate social controls, families that by chance have large numbers of sons are able to take and hold empty territory and form local elites. They then place their own junior relatives on adjacent lands and build up a clan holding. Eventually they may employ new languages and religions to support their social status and keep the momentum going. This is what we see following the Irish Dark Age, when new clans suddenly sprang up all over Ireland. A dozen different lineages were affected, covering most of the country. The size of the lineages is a chance occurrence, but the fact that many are associated with particular hereditary clans at one time holding regional hegemony is not.

4.3 The slavery syphon

The opposite breeding strategy to being of high status is to be a slave. For the selfish gene that cares nothing for freedom and little for quality of life, slavery can be an effective strategy for settling new territory, just as captive domestic species have flourished in locations far from their origins. Slaves are protected by the captors who assist them to survive in a foreign land. They are encouraged to breed to increase their numbers for work and sale, and they till the land so they are usually well-fed. The men may have last choice of the women, but they do not have to engage in punitive wars and dynastic struggles that reduce their numbers, or go on long expeditions that keep them from their wives.
Consequently, the genes of slaves may survive and spread well in new lands.

From about 793 AD Viking raiders from Scandinavia33 began to assault the coastline of the Isles; perhaps the long Christian peace, in which disputes could be resolved through church and court rather than by brute force, had reduced the ability of the Britons to defend themselves. The Vikings occupied most of the Scottish Isles and the Isle of Man, initially as ‘pirate retreats’, and they established large port settlements at York, Dublin and along the south and east coast of Ireland. Much of the Hebridean archipelago became Norse-speaking.

Viking society was a slave society: ‘thralls’ worked the land, allowing the freemen to sail in search of plunder. One of the main reasons for the raiding, according to Woolf (2007), was the partible system of land inheritance, where all brothers inherited the land, so that in a growing population plots soon became so small they were unviable. Partible inheritance and slavery have often gone hand in hand, because slaves work the land but cannot inherit. The Vikings took vast numbers of slaves to run their agricultural holdings, mostly from now overpopulated Ireland and Scotland. In a single day it is reported they took 1000 slaves from Dublin. These slaves were brought to the Viking homelands, and their genetic inheritance is visible.

33 Originally from Norway, but by 850 AD from Denmark.

The Norse genetic contribution to the larger British population has proved elusive,34 but the reverse impact of slaves from Britain on Scandinavia’s population is easier to see. The L21 incidence in Scandinavia is only about 4 per cent, but still visible and significant. Of 48 samples of L21 from Scandinavia, none are unequivocally Atlantic, that is, in place since the early Bronze. About half belong to post-Roman growth branches like M222, CTS4466 or L1335. All but five show a fairly close relationship (57/67) with Irish or Scottish men.
Table 4 gives the breakdown of counts of Nordic L21 in the L21 database by origin and destination.

Table 4. Early and late L21 arrivals in Scandinavia

Destination    Early (Atlantic)    Late (Irish)    Late (Scottish)    Total
Denmark               2                  2                1              5
Norway                -                 21                6             27
Sweden                3                 10                3             16
Total                 5                 33               10             48

A reasonable conclusion is that about 90 per cent of Nordic L21 men are descended from slaves taken in raids. The samples however are small and this requires further examination.

Germanic societies on the Continent were organised on a similar basis to the Scandinavian. There is some similarity between the L21 distributions in Scotland, the Low Countries and Germany (see Table C3), which may be due to documented exchanges in the early Middle Ages or to earlier slaving raids from the Continent. A proper investigation of the complex history of the continental Insular Atlantic L21 enclaves would require better data and deeper investigation. It is from the Continent that the Saxon and Norman invasions of Britain were launched, by people who were in part descendants of the Atlantic civilisation, and it is not surprising that it has been difficult to isolate their genetic heritage in Britain.

4.4 The Diaspora

A thousand years after the post-Roman expansion, the British Empire began in 1607 with the permanent settlement of the colony of Virginia. This was soon followed by about 20 more colonies on the eastern seaboard of North America. By this time, Britain’s forests and other natural resources were essentially exhausted, and investment companies were formed with the intention of profiting from the virgin land in the New World. By 1670, about 120 000 British were in the New World, and by 1770 2.1 million.35 A considerable number of men in our L21 database are descended from settlers in 17th century Virginia and Carolina.

From the 1840s, much of the population of Ireland, Scotland and Cornwall proceeded abroad as economic refugees.
About 10 million Irish have emigrated, considerably more than the current population of Ireland, and today over 40 million North Americans claim Irish heritage. Similarly, following the Highland Clearances and the dissolution of the Clans around 1750, the Scots began to emigrate (see Beaty 2009 for a good account). About 50 million people identify as being of Scots or Scots-Irish heritage, even though the population of Scotland is only 5.3 million. One could say that if slavery is good for spreading genes, eviction and persecution may be better.

34 Except in the case of the Shetland Islands and Orkney (Wilson et al. 2001, McEvoy et al. 2006). The problem lies in finding lines that are demonstrably Norse.

35 https://web.viu.ca/davies/h320/population.colonies.htm

This large influx is an excellent sampler of Britain’s population from the 1600s to 1800s, and it is not surprising that the distribution of L21 subclades in North America is very much like that of the British Isles as a whole (see Appendix A). While founder effects in North America are not apparent at the broad level, they are readily apparent at the surname level (see Flood 2013 for a good example). As always the statistical rule of colonisation is: few men, founder effects and redistribution; many men, expansion of the original distribution.

5. The changing distribution of L21 – Skyline methods

In the last two sections we made an attempt to establish the pre-Roman distribution of L21 subclades simply by removing the major expansive subclades from the post-Roman era. This is a fairly limited methodology, as quite a number of other branches expanded late in a small way, and truncating only the large branches will underestimate their contribution. We have good reason to believe, for example, that England had a considerably larger proportion of L21 in early times.

Another way to analyse the data is through branch analysis methods that fall under the general heading of skyline analysis (Drummond et al 2005).
This can be done either with SNPs or STRs.

5.1 Skyline and SNPs

With SNPs the method is straightforward in theory. We presume the population of each subclade at any time is proportional to the number of branches in existence at that time, as a skyline plot does (see Batini et al. 2015).

Table 5. Distribution of L21 subclades now and around 50 AD using SNP and STR skylines

              Incidence % of R-L21 (SNP)a     Incidence % of R-L21 (STR)b
Subclade      Present      50 AD              k = 3       k = 25
DF49           15.0          9.8              22.7         11.4
DF21           15.9         14.4              19.7         15.1
Z253           11.9         13.4              11.4         14.1
L513            9.1         10.4              10.6          8.2
L1335           7.4          0.8               9.6          0.9
Z255            4.7          1.8               5.4          1.4
FGC11134        9.9          7.3               4.9          3.5
DF41            4.6          5.5               3.3          6.0
S1051           1.1          2.4               2.2          4.7
Z251            5.5          8.7               2.1          7.5
FGC5494         2.9          6.7               2.1          6.6
DF63            2.9          3.5               1.7          5.2
Other           8.8         16.1               4.2         15.3
N              1291          492              2383          348

Source: a) http://www.ytree.net/; b) L21 database, 111 marker.

The main points of interest in the first two numeric columns of Table 5 are the advance of DF49 (M222) and the two formerly residual subclades L1335 and Z255, taking share from the English subclades and the remaining residual subclades.36

5.2 Skeleton skylines and STRs

A similar skyline process may be followed with STRs by creating a ‘skeleton’, which is an extension of the process we go through to weed out close relatives. For a test depth k, within each subclade we include only one of any pair with genetic block distance (GD) < k, so that all genetic distances in the skeleton are k or greater. As k increases we go progressively back in time to see only the branches that existed at that time. This allows us to use the full L21 dataset with country of origin signifiers.37

The simplest skeleton algorithm has been used: all records with GD less than k from the first item in the list are removed, and the same step is then applied in turn to each record that remains. This process gives non-unique skeleton solutions, depending on the order of the list and favouring the first items; with larger k the results can be fairly different.
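The simple skeleton pass described above can be sketched as follows. This is a minimal illustration rather than the code used for the paper: it assumes that the genetic block distance is the sum of absolute differences in repeat counts across markers, that haplotypes are equal-length tuples of integers, and that the pass is run separately within each subclade. The names `genetic_distance`, `skeleton` and `order`, and the toy three-marker haplotypes, are invented for the example.

```python
def genetic_distance(a, b):
    """Block distance between two STR haplotypes (assumed: sum of absolute
    differences in repeat counts, marker by marker)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def skeleton(haplotypes, k, order=None):
    """Return indices of records kept at test depth k.

    The first record in the list is kept, every later record closer than k
    to it is dropped, and the step repeats on whatever remains. `order`
    optionally gives the sequence in which records are visited.
    """
    remaining = list(order) if order is not None else list(range(len(haplotypes)))
    kept = []
    while remaining:
        head = remaining.pop(0)
        kept.append(head)
        remaining = [
            i for i in remaining
            if genetic_distance(haplotypes[head], haplotypes[i]) >= k
        ]
    return kept


# Toy data: five made-up haplotypes with three markers each.
toy = [(12, 14, 11), (12, 14, 12), (13, 15, 11), (15, 17, 14), (12, 14, 11)]

print(skeleton(toy, k=2))                          # list order
print(skeleton(toy, k=2, order=[4, 3, 2, 1, 0]))   # reversed order
print(skeleton(toy, k=3))                          # deeper test depth
```

Every pair of kept records is at distance k or more, but which records represent a cluster depends on the visiting order, which is the non-uniqueness noted in the text.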
The process can be bootstrapped by using different orderings of the records and taking the average.38 The resulting skyline distributions approximate a time progression that is not expected to be accurate but will illuminate the basic trends.

36 By comparison with our other sources we see that the two largest deep subclades L1335 and DF49>M222 are undertested in NextGen, while the smaller subclades are relatively overtested, probably because the customers have been notified they have something unusual that needs testing.

37 In fact only the 2137 records with 111 STR markers tested are used, because the finer resolution of the larger set of markers reduces the time spread around k.

38 Two extreme solutions are given by putting the most isolated records at the top or at the bottom while progressively stepping outwards. The first solution marks out the extreme boundaries of each subclade, while the second marks central points within the structure. The results here are the average of these two, plus three random distributions.

[Figure 8: stacked percentage bars (0% to 100%) for k = 3, 10, 15, 20, 25, 30; series England, Ireland, Scotland, Wales, France, Other.]

Figure 8. Skeleton skyline distributions, 111 markers, country of origin
Note: Rough time equivalents: k = 3, 10, 15, 20, 25, 30 correspond to AD 1750, 1250, 900, 550, 200 and 150 BC.

Figure 8 shows the fraction of European L21 in different countries as the skyline progresses backward. At k=25, which approximates the pre-Dark Age distribution of L21, England and Ireland each have about 30 per cent of L21 and Ireland slightly less, while Scotland has 19 per cent and the Continent about 16 per cent (see Table C4 for details). The continental contribution falls steadily from k=25 onwards.
From k=20 or post-Roman times, the English share rapidly slides down to the present-day level of 12.5 per cent and the Continent to residual levels,39 while the Irish contribution doubles and the Scottish share increases. The total population probably grew at a fairly constant rate across the geographic range, so we are not witnessing differential rates of population growth, but most probably a pushing back westwards of high-L21 Insular Atlantic populations. In line with the historical record, the Continental decline might be attributed to movements of tribes in Gaul displaced by the Romans, followed by the expansion of Germanic peoples to the Atlantic and into eastern England, carrying high proportions of other branches of R1b and other Y-haplogroups. However while we indicate the possibility of further investigation of continental movements, our data are not adequate for this purpose.

39 Note the underrepresentation of the Continent; in Table A3 we recommend it should be multiplied by a factor of 5 in the present. However here we are not so interested in the absolute level but in the fact that it has fallen.

[Figure 9: four panels (England, Ireland, Scotland, Continent), each plotted against k = 3, 10, 15, 20, 25, 30; series Atlantic, Residual, Regional, Deep Irish, Deep Scottish.]

Figure 9. Skeleton skyline, types of L21 subclade

The skeleton skylines for England, Ireland and Scotland are shown in the Excel Table C4, and a summary for the different classes of subclade is given in Figure 9. Some observations are:
The skeleton skyline also shows the change of the distribution of the various subclades of L21 over time. The last two columns of Table 5 show the STR equivalents of the SNP NextGen distributions, and it is reassuring to see they are very close for the two methods (correlations 0.89 and 0.94, for the distributions at present and 50 AD), an extremely good fit for the earlier distributions, given that they have been obtained by different methods using different data. The skeleton estimates are probably more accurate, because of NextGen testing bias (see Appendix 1).

• The overall distribution at k=10, the beginning of the Middle Ages, is similar to the present day;
• England and the Continent taken as a whole have a similar subclade distribution, which stays fairly constant over the full skyline, except for recent creep of the deep subclades at the expense of smaller Atlantic subclades;
• Initially, the distributions in England, Ireland and Scotland are not dissimilar except that Ireland has considerably more DF21 and (apparently) very few residual subclades, while England has little L513 and twice as much of the mid-sized Atlantic subclades Z251 and FGC5494;
• In Ireland and Scotland the rise of the deep subclades at the expense of the Atlantic subclades is shown very clearly. Most of the change takes place between k=25 and k=15 (approximately 200-900 AD). The Irish deep subclades have a strong presence in Scotland, presumably due to overflow from Ireland.

Again, the results are consistent with severe shocks and foundation events in Ireland and Scotland about 2000 years ago, from which the L21 distribution is still recovering.

It is tempting to use the skeleton skyline to try to attribute an origin to various branches, as the process eventually throws up a single representative or several widely separated representatives of each branch; however the method tends to deliver the most common geographical presence in each branch, which may not be the place of origin.
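The agreement between the SNP and STR columns of Table 5 reported in this section can be checked directly from the published percentages. The sketch below assumes that the quoted correlations are ordinary Pearson coefficients taken over the 13 subclade rows including ‘Other’ (the paper does not state this), and the variable and function names are introduced only for this example. The values are transcribed from Table 5 in the row order DF49, DF21, Z253, L513, L1335, Z255, FGC11134, DF41, S1051, Z251, FGC5494, DF63, Other.

```python
from math import sqrt

# Percentages transcribed from Table 5 (N row excluded).
SNP_PRESENT = [15.0, 15.9, 11.9, 9.1, 7.4, 4.7, 9.9, 4.6, 1.1, 5.5, 2.9, 2.9, 8.8]
SNP_50AD = [9.8, 14.4, 13.4, 10.4, 0.8, 1.8, 7.3, 5.5, 2.4, 8.7, 6.7, 3.5, 16.1]
STR_K3 = [22.7, 19.7, 11.4, 10.6, 9.6, 5.4, 4.9, 3.3, 2.2, 2.1, 2.1, 1.7, 4.2]
STR_K25 = [11.4, 15.1, 14.1, 8.2, 0.9, 1.4, 3.5, 6.0, 4.7, 7.5, 6.6, 5.2, 15.3]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r_present = pearson(SNP_PRESENT, STR_K3)   # SNP 'Present' against STR k = 3
r_50ad = pearson(SNP_50AD, STR_K25)        # SNP '50 AD' against STR k = 25
print(f"present: r = {r_present:.2f}")
print(f"50 AD:   r = {r_50ad:.2f}")
```

Under these assumptions the two coefficients should come out close to the 0.89 and 0.94 quoted in the text; the check is only as good as the thirteen table rows allow.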
6. Conclusions

In the words of Wang et al (2014), ‘the Y-chromosome is a superb tool for inferring human evolution and recent demographic history from a paternal perspective.’ Our picture of the past prior to written history is incomplete and based on very limited evidence and many preconceptions. However just as genetic genealogy can assist in unravelling the recent past when records fail, phylogenealogy and the human Y-haplogenetic tree can reveal details of unsuspected population events that may have affected the later history of nations. Using the Y-haplotree, we have been able to reveal substantial changes and trends that have not been obvious using other methods of analysis.

We presume that the major haplogroup known as Western R1b was spread throughout Atlantic Europe by the people known as the Beaker Folk, who were seafarers seeking tradeable resources. The expansion of the Beaker people over a narrow period of a few hundred years, establishing widely separated colonies in key locations by boat, has meant that different subclades of Western R1b have become associated with particular settlements, and we associate L21 with south-west Britain.

Within the L21 lineage we can see evidence of an extremely rapid expansion of descendants of a single man, who peopled Ireland and a good part of Britain, northern France and the Middle Rhine within the space of a few hundred years, apparently meeting little opposition from the existing inhabitants. This occurred within the context of a wide ranging water-based network of trade and culture.
The evidence for a dispersal from Britain includes:

• the larger variance of L21 in England than anywhere else, and the presence of about 30 distinct subclade branches;
• south-west England as the major focus of Beaker activity in Northern Europe, at the centre of the L21 range, and having large easily accessible alluvial deposits of key metals;
• L21 on the Continent taking the form of small random samples from the English distribution;
• all but one of the major L21 subclades being ‘Atlantic’, that is, spread throughout the range of the Atlantic culture, with an early Bronze presence in south-west England defined by two or more widely separated examples;
• no support for any other origin for L21 or its subclades.

During the period of the Atlantic culture, L21 was a primary signifier of an Insular Atlantic people in the Isles and beyond. Beaker settlements on the adjacent Continent seem to have been small client arrangements, though the presence of L21 today in the key areas shows their genetic influence persisted. Subsequent invaders of Britain such as the Belgae, Saxons and Normans had a British admixture from the early Bronze Age, making their DNA rather difficult to distinguish from the English population.

Several major subclades, notably DF21 and L513, found their way into northern and Irish populations in greater proportions. There is some indication from the Y-haplotree that Scotland underwent a population expansion during the Bronze Age Climate Optimum from about 1600 BC, so differential rates of population growth involving founder effects may be responsible for this limited early differentiation of subclades.

It has long been suspected, because of the very few male lineages that have come down to us, that the human genome has been subject to intermittent pruning involving substantial decreases in genetic diversity, probably resulting from natural disasters, epidemics or extremes of warfare.
For the first time one of these has been pinpointed, in Ireland around 100 BC, also affecting Scotland but not England to any degree. At this time, more than half way through its history, a major reorganisation of L21 took place in Ireland and parts of Scotland. The Irish male effective population fell to levels so low as to be equivalent to a near-extinction and resettlement by survivors. This hitherto undocumented population collapse is probably due to an extreme (2500 year) weather event accompanied by famine, epidemic and opportunistic invasion and warfare. In support of this:

• a dozen L21 ‘deep subclades’ that had been residual for millennia suddenly appeared at the same time from nowhere and grew rapidly. One of these (M222) grew extremely rapidly over a few centuries to become the largest subclade of L21;
• the long tail of the Pareto distribution of L21 subclades was almost extinguished in Ireland;
• the isolation of Ireland and its exposure to the Atlantic has made it vulnerable to weather, famine and epidemics, as strongly hinted in the Irish pseudohistories;
• severe storm surge and famine were recorded by the Romans in Northern Europe from 120-114 BC, sufficient to displace a number of tribes and change the coastline;
• historians have made reference to a ‘Dark Age’ of fortified warfare in Ireland and a population decline prior to 400 AD.

This decline was followed by a period of rapid population growth and re-peopling which took the population above pre-disaster levels, during which Ireland underwent a major cultural renaissance and the current subclade structure of L21 was laid down. About 70 per cent of L21 belongs to ‘deep subclades’ from this period. The size of particular subclades has been established randomly; however the patrilineal and fragmented nature of Irish society caused these new deep subclades to be confined to particular areas and clans to a fair extent, a spatial legacy that is still visible today.
The very ‘long thin’ lead-ins to the new subclades meant that their founders had drifted far from the Atlantic Modal in many cases, so that the ‘clusters’ could be easily identified by STR signatures.

In Scotland much the same happened, with the emergence of several new large ‘deep subclades’, especially L1335. However here the situation was complicated by prolonged harassment by the Romans that kept the population ‘between the Walls’ at minimal levels. With the departure of the Romans, the gap was filled by Irish populations that had overextended their local carrying capacity, and by invasion from the Continent and England.

During the same period Insular Atlantic DNA seems to have been pushed back on the Continent and in England. This substantially lowered the contributions of England and the Continent to the total L21 population.

Around 90 per cent of L21 in Scandinavia dates from the Viking period and is probably attributable to prisoners and slaves brought back to Scandinavia. We have uncovered no evidence of any significant back-migration of L21 from mainland Europe to the Isles,40 though this must have occurred in small quantities.

Phylogenetic methods such as variance, PCI and admixture analysis are found to be unsuitable in their usual form for analysis of L21 because of the multi-staged dynamic subclade development and the in situ alterations the haplogroup has undergone.

The final legacy of L21 is that it was carried to the English-speaking Diaspora in such great numbers that no bottlenecking is evident; in fact the structure of L21 seems to have been better preserved abroad than in the Isles. These foreign descendants of an ancient Bronze Age lineage have come together collectively to test their own DNA and construct the detailed Y-haplotree that has made this analysis possible. Embedded in that haplotree is a rich imprint of events that happened before literacy or recorded history in the Isles, which this paper has begun to explore.
40 With the exception of the Royal Stewart line DF41>L726, who are known to have come from Brittany in the Middle Ages. The most likely back-migrants would be Z253 or Z251, which have a wide continental presence. 32 7. References Batini, C, Hallast, P, Zadik, D, et al. (2015). Large-scale recent expansion of European patrilineages shown by population resequencing. Nature Communications 6, Article 7152. Beaty, K G (2009). Finding the ‘Scot’ in the Scottish-American: an investigation of Scottish identity through mitochondrial DNA and Y-chromosome markers. MA Thesis, University of Kansas. kuscholarworks.ku.edu/bitstream/handle/1808/5976/Beaty_ku_0099M_10671_DATA_1.pdf, Accessed April 2016. Bell, A R, Brooks, C and Dryburgh, P R (2007). The English Wool Market, c.1230–1327 Cambridge: Cambridge University Press, Bianchi, G G and McCave, I N (1999). Holocene periodicity in North Atlantic climate and deep-ocean flow south of Iceland. Nature 397 (6719): 515–7. Biggins, P (2016). DNA of the Three Collas. www.peterspioneers.com/colla.htm#multiplesepts. Boattini, A, Lisa, A, Fiorani, O, et al. (2012). General method to unravel ancient population structures through surnames, final validation on Italian data. Hum. Bio. 2012, 84, 235–270. Bradley R (2007). The prehistory of Britain and Ireland. Cambridge University Press. Broun, D (1999). The Irish identity of the Kingdom of the Scots in the twelfth and thirteenth centuries. Boydell, Woodbridge. Busby, G B, Brisighelli, F, Sanchez-D P, et al. (2012) The peopling of Europe and the cautionary tale of Y-chromosome lineage R-M269. Proceedings Biological Sciences/The Royal Society 279: 884–892. Campbell, K D (2007). Geographic patterns of haplogroup R1b in the British Isles. J Genetic Genealogy 3 (1), 3. Cassidy, L M, Martiniano, R, Murphy E M, et al. (2016). Neolithic and Bronze Age migration to Ireland and establishment of the insular Atlantic genome PNAS 113(2) 368–373. Charles-Edwards, T M (2000). Early Christian Ireland. 
Cambridge University Press.
Clarke, D L (1970). Beaker pottery of Great Britain and Ireland. Cambridge University Press.
Cunliffe, B (1994). The Oxford illustrated history of prehistoric Europe. Oxford University Press.
Cunliffe, B (2010). Celtic from the west. Chapter 1: celticization from the west: the contribution of archaeology. Oxbow Books.
Cunliffe, B (2001). Facing the ocean: the Atlantic and its peoples, 8000 BC to AD 1500. Oxford University Press.
Diamond, J (2005). Guns, germs, and steel: the fates of human societies. W W Norton & Co.
Drummond, A J, Rambaut, A, Shapiro, B and Pybus, O C (2005). Bayesian coalescent inference of past population dynamics from molecular sequences. Mol Biol Evol 22(5): 1185–1192.
Flanagan, L (1998). Ancient Ireland: life before the Celts. Dublin: Gill & MacMillan.
Flood, J (2013). Unravelling the code: the Coads and Coodes of Cornwall and Devon. Deluge Publishing.
Flood, J (2016). The conquest of the Atlantic seaboard: Beaker Folk and Western R1b. In preparation.
Fox, G W and Lasker, G W (1982). The distribution of surname frequencies. International Statistical Review 51: 81–87.
Fraser, J E (2009). From Caledonia to Pictland: Scotland to 795. Edinburgh University Press.
Hammer, M F, Chamberlain, V F, Kearney, V F, et al. (2005). Population structure of Y-chromosome SNP haplogroups in the United States and forensic implications for constructing Y-chromosome STR databases. Forensic Science International 164: 45–55.
Jackson, K H (1953). Language and history of early Britain. Edinburgh University Publications.
Jaski, B (2013). Early Irish kingship and succession. Four Courts Press.
Jope, E M, Morey, J E and Sabine, P A (1952). Porcellanite axes from factories in north-east Ireland: Tievebulliagh and Rathlin. Ulster Journal of Archaeology 15: 31–60.
Jordan, W C (1997). The Great Famine: Northern Europe in the early fourteenth century. Princeton: Princeton University Press.
Kennedy, I (2014). The history of M222: a story in six parts.
http://www.kennedydna.com/HistoryOfM222.pdf, accessed April 2016.
Leslie, S, Winney, B, Hellenthal, G, et al. (2015). The fine scale genetic structure of the British population. Nature 519: 309–314.
McEvoy, B, Brady, C, Moore, K L T and Bradley, D G (2006). The scale and nature of Viking settlement in Ireland from Y-chromosome admixture analysis. European Journal of Human Genetics 14: 1288–1294.
Moore, L T, McEvoy, B, Cape, E, et al. (2006). A Y-chromosome signature of hegemony in Gaelic Ireland. Am J Hum Genet 78: 334–8.
Myres, N M, Rootsi, S, Lin, A A, et al. (2011). A major Y-chromosome haplogroup R1b Holocene era founder effect in Central and Western Europe. European Journal of Human Genetics 19: 95–101.
Myllyntaus, T (2009). Summer frost: a natural hazard with fatal consequences in pre-industrial Finland. Chapter 3 in Mauch, C and Pfister (eds). Natural disasters, cultural responses: case studies toward a global environmental history. Lexington Books.
Ó Cróinín, D (1995). Early medieval Ireland 400–1200. Longman.
O’Hart, J (1892). Irish pedigrees or the origin and stem of the Irish nation. James Duffy and Co.
O’Rahilly, T F (1946). Early Irish history and mythology. Dublin Institute for Advanced Studies.
Osmon, R (2011). The graves of the Golden Bear: ancient fortresses and monuments of the Ohio Valley. Grave Distractions Publications.
Parker-Pearson, M G, Pollard, J, Richards, C, et al. (2013). ‘Stonehenge’, pp 159–78 in Harding, A and Fokkens, H (eds). The Oxford handbook of the European Bronze Age. Oxford University Press.
Pevsner, N (1989). Cornwall. Yale University Press.
Pfister, U and Fertig, G (2010). The population history of Germany: research strategy and preliminary results. Max Planck Institute for Demographic Research, Working Paper WP 2010-035.
Roewer, L, Croucher, P J, Willuweit, S, Lu, T T, Kayser, M, Lessig, R, et al. (2005). Signature of recent historical events in the European Y-chromosomal STR haplotype distribution.
Human Genetics 116: 279–291.
Rossi, P (2015). Self-similarity in population dynamics: surname distributions and genealogical trees. Entropy 17: 425–37.
Sherratt, A G (1987). Cups that cheered: the introduction of alcohol to prehistoric Europe. In Waldren, W H and Kennard, R C, Bell Beakers of the Western Mediterranean: definition, interpretation, theory and new site data. The Oxford International Conference 1986. Oxford: British Archaeology Reports: 81–114.
Standish, C D, Dhuime, B, Hawkesworth, C J and Pike, A W G (2015). A non-local source of Irish chalcolithic and early Bronze Age gold. Proceedings of the Prehistoric Society 81: 149–177.
Taylor, J J (1980). Bronze Age goldwork of the British Isles. Cambridge University Press.
The 1000 Genomes Project Consortium (2010). A map of human genome variation from population-scale sequencing. Nature 467: 1061–1073.
Thomas, M G, Stumpf, M P H and Härke, H (2006). Evidence for an Apartheid-like social structure in early Anglo-Saxon England. Proc. Biol. Sci. 273: 2651–7.
Underhill, P A (2003). Inferring human history: clues from Y-chromosome haplotypes. Cold Spring Harbor Symposia on Quantitative Biology, Cold Spring Harbor Laboratory Press: LXVIII, 487–493.
Wang, C-C, Thomas, M, Gilbert, P, et al. (2014). Evaluating the Y-chromosomal timescale in human demographic and lineage dating. Investigative Genetics Dec 2014: 5–12.
Wilson, J F, Weiss, D A, Richards, M, et al. (2001). Genetic evidence for different male and female roles during cultural transitions in the British Isles. Proc. Natl Acad. Sci. USA 98: 5078–83.
Winney, B, Boumertit, A, Day, T, et al. (2012). People of the British Isles: preliminary analysis of genotypes and surnames in a UK-control population. Eur J Hum Genet 20(2): 203–210.
Woolf, A (2007). From Pictland to Alba, 789–1070. Edinburgh University Press.
Wright, D M (2009). A set of distinctive marker values defines a Y-STR signature for Gaelic Dalcassian families. Journal of Genetic Genealogy 5: 1–7.
Yorke, B (1995). Wessex in the early Middle Ages. Continuum International Publishing.
Yule, G U (1925). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis, F.R.S. Philosophical Transactions of the Royal Society B 213: 21–87.
Zhivotovsky, L A, Underhill, P A, Cinnioğlu, C, et al. (2004). The effective mutation rate at Y-chromosome short tandem repeats, with application to human population-divergence time. Am. J. Hum. Genet. 74: 50–61.

APPENDIX A. Data

This project has assembled as many L21 and other European Y-haplotype data records as possible. To obtain the fine resolution we need for examining the very detailed and ‘bushy’ branching of R-L21 in its early years and in Ireland and Scotland nearly 3000 years later, we need to know the proportions of the early Bronze L21 subclades deriving from each country, as well as the ‘deep subclades’ that expanded in the early Christian era. There are no publicly available academic studies that have produced sufficiently detailed data or that have tested British data sufficiently well.41 The FTDNA Y-project commercial database is by far the largest and most comprehensively tested for Y-chromosome STRs and SNPs. It can only be publicly accessed in a partial way, by taking records from various public projects and combining them.

Two datasets from FTDNA were compiled for the project in this way. A general European ‘origins’ dataset to calculate incidence of L21 and other haplogroups was obtained by combining all the European geographic projects with all the haplotype projects for Europe and eliminating duplicates. To the commercial core were added the databases of a few research studies where these contained L21 typing, which has helped to improve the measures of incidence in Spain and Italy. A total of 27264 records are in this European Origins database. The estimated incidences of all different Y-haplogroups by country are shown in spreadsheet Table C1, along with a list of sources.
We only use the L21 results in the paper, but as all haplogroups and major subgroups of R1b and P312 had to be enumerated in doing so, we present the full results. The R1b incidences shown are somewhat lower than other estimates, for example those of Campbell (2007) for the British Isles. This appears to be due to more men joining the smaller haplogroup projects than the R1b projects, because R1b is thought to be not ‘interesting’. The geographical projects are more balanced, but they are limited in size and usually not as well tested and assembled.

The second dataset is the L21 database, which is entirely from FTDNA projects with the addition of the 1000 Genomes set (as this has been analysed for subclades of L21). It includes STR markers, and is much more thoroughly cleaned than the European Origins dataset.

First, all descendants of the same man have been removed unless they are fairly distant (records with the same ancestor and a genetic distance of 65/67 or closer are removed), keeping only the record with the most testing. This prevents the database being crowded out by near-relatives. After this procedure, 6276 L21 records remained (Table C2a), including 5002 with 67 STR markers or more (Table C2b). To give an idea of the scale of this collection, Busby et al. (2012) claimed to have ‘the largest collection of R-M269 yet assembled’ with 2000 records and 10 STRs. Yet although the L21 sample here is nearly eight times as large and much more extensively tested, for some purposes (particularly on the Continent) our sample is still not large enough.

Second, the country of origin has been corrected where possible. Participants in FTDNA testing are asked to provide their earliest confirmed paternal-line ancestor and their ‘country of origin’. However, only a few geographical projects, notably CORNWALL and DEVON, formally vet these ‘distant ancestor’ assignments by checking the existence of the ancestor and the line of paternal descent.
The main problem is the descendants of early settlers in North America, particularly from the Virginia and Carolina colonies, who do not know their European ‘roots’ and often guess at their origin (Campbell 2007). They may either give England, Ireland or Scotland depending on their surname or some family tradition, or else state ‘United Kingdom’ or ‘Unknown’. These all have to be corrected to ‘United States’ or ‘Unknown’, which is a fairly laborious task. If an ancestor with a date and place of birth is given, the country of origin is set to that country in preference to what is stated (it quite frequently differs, such as ancestors from Northern Ireland who are stated to be from ‘Scotland’). If a name and date but no place is given, the place of origin of many of these ancestors may be found in online databases.42

41 The People of the British Isles collection from the Wellcome Trust (Winney et al. 2012) is believed to have good information and was the source of the 1000 Genomes data in Cornwall and Kent, but the database has not been made available for research.

After this correction about 60 per cent of the records provide a European ancestor/place of origin (see Table A1). However, this does not turn out to be as serious an impediment to analysis as one might expect, since the missing values are random and unsystematic. As Table A1 shows, the subclade distribution for both groups with no known European ancestor correlates strongly with the total, showing no bias. The ‘Isles Diaspora’ group, which consists largely of descendants of 17th century settlers, correlates well with England and Ireland, whereas the ‘Unknown Ancestor’ group correlates much better with Ireland and Scotland. This is reassuring.

Table A1. Pearson correlations of subclade incidence vectors for the British Isles and missing ancestor categories

                   England  Ireland  Scotland  English-speaking Diaspora  Unknown ancestor  Total
England              1
Ireland              0.494    1
Scotland             0.473    0.680     1
Isles Diaspora       0.809    0.871     0.717                      1
Unknown ancestor     0.454    0.872     0.921                      0.799             1
Total                0.701    0.930     0.840                      0.962             0.928      1

The absence of geocoding of ancestry is not the only problem; 670 records have no L21 subclade recorded (see spreadsheet Table C2a). Unfortunately the absence of subclade is systematic because men of different origins have engaged in different amounts of testing, and also because many of the large ‘deep subclades’ of Ireland and Scotland can be identified easily from STRs alone, without SNP testing. Only 6 per cent of Irish records and 9 per cent of Scottish records do not show a subclade, whereas about a quarter of records from England and Wales have no subclade and 30 per cent of those from the Continent. Thus the places that already have a paucity of data and which are important to the Atlantic hypothesis have the deficiency aggravated by minimal testing.

With 67 markers we are able to increase the size of subclades by searching for clusters (taking as a clustering threshold a genetic distance of 8 or less on 67 markers). This only works on deep subclades with a distinctive founder haplotype signature. This approach is even more accurate with 111 markers, as employed in Section 5.

Timing the Y-haplotree

Vitally important for this paper has been the accurate Y-haplotree for L21 and the timing of key SNPs, most notably L21 itself and its Dark Age subclades. Our original awareness of the implications of haplotree analysis came from scrutinising coalescence times of various SNPs on the www.yfull.com site, which happened to coincide with known archaeological and historical events.

42 Such as Geni, Ancestry, Worldconnect.
The methodology used in yfull is an operationalised version of the standard method of SNP dating, which depends on counting the average number of downstream mutations within a specific part of the Y-chromosome and then applying standard mutation rates (see Batini et al. 2015, for example). The average number of SNP mutations occurring on the 8–10 million base pairs in a typical Big Y test is one per 140 years or five generations.43 This method is inexact for calculating the TMRCA for two men (who can each have very different numbers of mutations occurring since their common ancestor). It becomes much more exact as the number of men with a particular mutation increases, because the variance of the mean changes inversely with the number of samples, becoming exact for large haplogroups with many men. The problem is the pruning of the tree, with most side branches disappearing, leading to long stretches with no branch having large numbers of ‘equivalent mutations’. Along such a stretch there might as well be only one descendant. Therefore accurate results may only be obtained for ‘bushy’ SNPs with many branches. Fortunately L21 and its Dark Age subclades are of this kind. The Rathlin sequencing of ancient R-DF21 genomes matched the prior estimates almost exactly, which is most reassuring.

The yfull L21 sample is quite limited, therefore we found it necessary to turn to the much larger BigTree sample44 to estimate the age of the Dark Age subclades in Table 3 with accuracy (on yfull these subclades all fell into the post-Roman Golden Age, which made little sense from a historical perspective). A recount was made of all the deep subclades branch by branch, using the SNP information provided in BigTree, and the coalescents were adjusted upwards accordingly.

Another problem is testing, if one wishes to use Big Y SNPs for skyline analysis as in Table 5.
Big Y may be taken as a separate test by FTDNA customers; it is expensive so is only undertaken by those with great interest. Some subclades have accordingly undertaken NextGen testing more frequently than others, probably due to advocacy by project administrators or to solve a specific problem. Also, people having ‘rare branches’ are encouraged to take Big Y, which may cause overrepresentation of the small subclades. Accordingly, the distribution of subclades of L21 from Big Y in Table 5 is significantly different from that shown in Table C2a, although the fit is much better in the past. However, these distributional issues are not of concern to the present project.

Weighting correction for spatial bias

It has not been found necessary to correct for spatial bias during the course of this paper, because the randomness of the distribution of subclades as shown in Figure 3 and Table A1 is sufficiently reassuring, at least within the Isles. However, there may be circumstances where this is necessary and the appropriate country weightings for L21 are shown in Table A2, for the benefit of other researchers. The Table says that, roughly speaking, Irish descendants need to have a weighting factor of 0.6 applied, Scotland 0.25, France 4 and Germany 8, when examining the significance of subclades etc. However, this does not take into account the extended and repeated emigration from Ireland and Scotland, whereby more than the current population of these countries actually emigrated over time, while Figure 3 inclines us to the conclusion that the sample is actually a fair representation of global L21, at least as far as the British Isles and English-speaking Diaspora are concerned.

43 https://www.yfull.com/faq/what-yfulls-age-estimation-methodology/ outlines the method and its sources.

44 The outputs of Big Y are largely unintelligible in isolation, even to well-informed laymen, so most either upload their results to yahoo chat groups, where they are analysed and incorporated in BigTree, or else they pay for analysis by yfull, which is better on other haplogroups. Both sites make their trees publicly available.

Table A2. Population weightings for L21 by country

Population   Population 1841 (million)   L21 fraction   L21 (million)   Sample size   Weight
England                           13.7          0.202            2.85           398     1.00
Scotland                           2.6          0.488            1.27           701     0.26
Wales                              1.8          0.495            0.89            81     1.58
Ireland                            8.1          0.65             5.27          1327     0.57
France*                            4.6          0.41             1.89            67     4
Germany*                           7.0          0.35             2.45            44     8

Note: *France here consists only of Normandy, Brittany, Picardy and Alsace-Lorraine. Germany contains the Rhineland; its L21 fraction is taken to be the same as Alsace. These regional populations are taken pro-rata from national 1840 figures, using the current regional population distribution.
Source: Online Historical Population Reports www.histpop.org, Pfister and Fertig (2010).

Disciplinary bias – strengths and weaknesses of the dataset

Population genetics is a very different discipline from genealogy. The first is concerned with alleles and their transmission and development, mostly using statistical programs to compare DNA within and between populations. Genealogy is based on historicity and the construction of the family tree using records and making educated assumptions using family tradition and social trends.

Different disciplines have surprisingly different norms and paradigms relating to data. The natural sciences such as genetics prefer primary data, although this is changing as large collections of genomes are established. Social sciences such as economics and geography largely use secondary published data, while genealogy uses administrative data.
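As an aside on the country weightings, the Weight column of Table A2 appears to follow from dividing each country's estimated L21 population (1841 population multiplied by the L21 fraction) by its sample size and scaling so that England equals 1. The Python sketch below is an illustration under that assumption, not the author's stated procedure, and the function and variable names are hypothetical. Computed this way the published weights are reproduced to rounding, although the England L21 population comes out at about 2.77 million rather than the 2.85 million printed in the table.

```python
# Figures copied from Table A2:
# (population in 1841 in millions, L21 fraction, sample size).
TABLE_A2 = {
    "England": (13.7, 0.202, 398),
    "Scotland": (2.6, 0.488, 701),
    "Wales": (1.8, 0.495, 81),
    "Ireland": (8.1, 0.65, 1327),
    "France": (4.6, 0.41, 67),
    "Germany": (7.0, 0.35, 44),
}


def country_weights(table, reference="England"):
    """Return L21 men represented per sampled record, scaled so reference = 1.

    Assumption: weight = (population * L21 fraction / sample size),
    divided by the same quantity for the reference country.
    """
    per_record = {
        country: population * fraction / sample
        for country, (population, fraction, sample) in table.items()
    }
    base = per_record[reference]
    return {country: value / base for country, value in per_record.items()}


weights = country_weights(TABLE_A2)
for country, weight in weights.items():
    print(f"{country:10s} {weight:5.2f}")
```

A weighted subclade count would then multiply each record by the weight of its country of origin before totals are taken.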
From an operational point of view, commercial DNA databases are similar to other ‘found datasets’ such as administrative data, which generally have to be cleaned and converted to purpose. The FTDNA dataset is a particularly interesting example of an evolving self-contributed online database of a kind that is becoming more and more common, and it is a worthy subject of research investigation in its own right. Unlike special-purpose datasets which are mostly single-use and often discarded, the evolving dataset may be applied to all kinds of problems, whenever the number of records and the ever-increasing level of testing are sufficient for the purpose. Like most evolving administrative or ‘found’ datasets, extra care and different kinds of data cleaning are required than with primary special-purpose samples. If necessary weightings may be applied to limit bias.

Despite the disciplinary preference for self-collection, many formal published phylogenetic Y-chromosome studies have used the FTDNA database or its more easily accessible Ysearch offshoot to fill out their data. What is often not appreciated or is glossed over is that many research collections do not actually use a formal population sampling strategy.45 They designate a few geographical areas or population groups and obtain a number of volunteers in each group, usually applying minor screens to improve randomness and local focus. Surprisingly, the manner in which these local samples are collected is rarely stated. The lack of attention to this point is startling, but it is presumed some form of informal submissions model is used, either advertising for participants, or initiating contact until a sufficient number of participants is reached. This is very similar to the way FTDNA projects obtain their members: either they are pro-actively approached or they respond to various forms of publicity. Because of the very limited level of testing in most formally published research databases, the only ones we have been able to find to supplement our databases are the 1000 Genomes Consortium (2010) and Boattini et al. (2012).

45 Those that do employ a selection strategy generally have much better data sets, a fine example being Boattini et al. (2012) in Italy who performed a prior surname analysis and selected their sample size on this basis.

The strength of the FTDNA ‘crowd-funded’ database is the very large number of records, the inclusion of the English-speaking Diaspora, its ‘living’ evolving nature and the breadth and depth of DNA testing, well beyond what most research studies have been able to afford. The weaknesses are the sporadic geotyping in most FTDNA projects (notable exceptions being CORNWALL and DEVON where applicants are formally vetted) and the strong spatial bias and moderate genotype bias with regard to numbers of participants and levels of testing, which leads to some concerns about representativeness.

We know from Table A2 that the FTDNA database is heavily loaded in favour of English-speaking countries, particularly the USA, and also (apparently) heavily in favour of men of Irish and Scottish descent. This can in theory be corrected by weighting; however, it turns out to not be a great concern for R-L21, the pre-Diaspora structure of which appears to be representative and actually appears to have been better preserved in the Diaspora than in Europe, if the number of residual subclades found there is any guide.

APPENDIX B. When STR variance fails

Before the widespread availability of whole-genome testing, one of the principal techniques of phylogenetics made use of the spatial variance of a small number of STRs as a proxy for the age of the coalescent (TMRCA) of a group of men. This is because of a standard theorem in genetics that the variance of an allele is an unbiased estimator of the number of generations since the common ancestor (Zhivotovsky et al. 2004). Inflated claims were made for the technique: Sun et al. (2009) stated ‘microsatellites are accurate molecular clocks for coalescent times of at least 2 million years’. Phylogeographers attempted to show that a cline in STR variance existed across Europe from East to West, as proof that agriculturalists settled from the Middle East in this way (Roewer et al. 2005, Rosser et al. 2000). However, Busby et al. (2012) refuted a paper by Balaresque et al. (2010) that attempted to use R1b to confirm this cline, and in the process showed that the results depended on the properties of the STR markers used, concluding ‘existing data and tools are insufficient to make credible estimates for the age of this haplogroup’.

Table B1. The major L21 subclades up to 100 BC, with variances

Subclade     67-marker variance   111-marker variance     N
DF49*                     0.342                 0.326   157
L513/DF1*                 0.340                 0.330   129
FGC11134*                 0.332                 0.282    20
DF41*                     0.329                 0.305    58
Z253*                     0.326                 0.293   215
FGC5494                   0.319                 0.305    69
S1051                     0.290                 0.318    74
DF21*                     0.271                 0.287   333
Z251                      0.269                 0.292    79
DF63                      0.263                 0.269    37
S1026                     0.252                 0.258    26

Note *: As in Table 1, deep subclades have been removed.

The variances of major subclades of early L21 are shown in Table B1. All these subclades are almost the same age as measured by SNPs, so the variances should be the same; however, even with the large deep subclades removed there is still a very substantial difference in variance between subclades (which is much worse if the deep subclades are included). Despite Busby’s reservations, STR variance does broadly correlate with average time to coalescence; however, it is not a robust or particularly accurate measure.
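For readers who wish to compute figures of the kind shown in Table B1, a minimal Python sketch follows. It assumes that ‘variance’ means the sample variance of repeat counts at each marker, averaged over markers, and that the conversion to time uses the stepwise-mutation approximation in which variance is roughly the mutation rate multiplied by the number of generations. The paper does not spell out its exact calculation, so both are assumptions; the haplotypes and the mutation rate are invented for illustration only.

```python
from statistics import mean, variance


def mean_str_variance(haplotypes):
    """Average, over markers, of the sample variance of repeat counts."""
    markers = list(zip(*haplotypes))  # transpose: one tuple of counts per marker
    return mean([variance(counts) for counts in markers])


def generations_from_variance(str_variance, mutation_rate):
    """Stepwise-mutation approximation: variance ~= mutation_rate * generations."""
    return str_variance / mutation_rate


# Hypothetical 4-marker haplotypes for five men (invented, not real data).
haplotypes = [
    [13, 24, 14, 11],
    [13, 25, 14, 11],
    [13, 24, 15, 11],
    [14, 24, 14, 11],
    [13, 24, 14, 12],
]
ASSUMED_RATE = 0.002  # illustrative mutations per marker per generation

str_variance = mean_str_variance(haplotypes)
generations = generations_from_variance(str_variance, ASSUMED_RATE)
print(str_variance, generations)
```

Because the estimate is a simple ratio, any inflation of the variance by subclade structure or bad data passes straight through to the inferred age.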
In practice there are three classes of errors that may arise when using variance of STRs for dating coalescence times or the length of time a particular haplogroup has been present in a locality:
• variance depends much more on the mix of subclades than on geographic factors;
• the presence of large recent lineages within a particular subclade will void normality and give false answers;
• particular STRs can show sudden leaps, and data errors or outliers can substantially modify variances.

The main reason that STR variance is a poor predictor of regional difference or of origin is shown in Table B2.

Table B2. L21 variance explained* by region and subclade

Source of variance   Contribution   % of variance explained
Region                   0.002957                         2
Subclade                 0.114356                        76
Region*Subclade          0.033216                        22
Var(Error)               0.289278

Note: *Minimum Norm Quadratic Unbiased Estimation

Of the non-random variance in STR markers, 76 per cent is explained by the mix of subclades, 22 per cent by the interaction between subclade and region (for instance, the presence of ‘short fat’ sub-branches in some regions), and only 2 per cent by country differences (for example, the length of time R-L21 as a whole has been in the country). So three-quarters of the explained variance is due to the presence of different subclades (particularly the deep subclades that have their own STR signature) and the remainder is due to combined effects (the different properties of subclades by region). Only a negligible proportion of variance is due to the region alone.

Table B3 shows the variance of L21 in various countries or regions for different numbers of markers (the results are fairly sensitive to what markers are used), in descending order for 67 markers. Since each country contains more than one subclade, the coalescent (TMRCA) is actually the same, at about 4500 years. The variance gives some idea of the diversity of L21 in each place (by coincidence, it also describes what we suspect was the order of settlement in Europe). The different variances do not imply different lengths of time; they represent the relative diversity of the L21 distribution: ‘long and thin’ or ‘short and fat’.

Table B3. STR variances of L21, various locations and regions, 67 and 111 markers

Country/region                67 markers   111 markers
England                           0.3439        0.3374
France                            0.3373        0.3167
Ireland                           0.3366        0.3202
Unknown                           0.3300        0.3206
English-speaking Diaspora         0.3295        0.3227
Germany                           0.3183        0.3397
Wales                             0.3183        0.3397
Scotland                          0.3177        0.3242
Hispanic                          0.3108        0.2953
Mediterranean                     0.2958        0.3002
Scandinavia + Low countries       0.2923        0.3237
All                               0.3352        0.3271

It is reassuring that the variance in the two categories of missing origin (Unknown and English-speaking Diaspora) is very close to that of the whole sample, showing first that the sample is representative and second that the Great Migration had no obvious founder effects. It is noteworthy that L21 has only been in North America for 400 years yet its variance is greater there than in Scotland, where it has been present for 4500 years.

Finally, variance is unfortunately a non-robust measure. It is calculated as squares of differences and bad data or outliers can have a substantial effect in a small sample. Measurement error, data transcription errors, or sudden leaps in the values of particular markers such as RecLOHs have to be monitored carefully. Changing the markers used can also produce different results, as Tables B1 and B3 show.

So, while STR variance tells us something useful, it is not the amount of time L21 has been present in a location. It gives a better indication of the age of a particular SNP, but even there it is modified by the internal structure and the proportion of recent expansion in the subclade.
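Two of the failure modes described in this appendix can be illustrated with a small Python sketch. The repeat counts are invented: the single large jump merely stands in for an event such as a RecLOH or a transcription error, and the two tight clusters stand in for two recently expanded subclades with different founder values at one marker.

```python
from statistics import variance

# Hypothetical repeat counts at one STR marker for ten men (not real data).
clean = [14, 14, 15, 14, 13, 14, 14, 15, 14, 14]
# The same sample with one value shifted by four repeats.
with_outlier = clean[:-1] + [18]

# Two hypothetical young subclades, each tightly grouped around its founder value.
subclade_a = [13, 13, 14, 13, 13]
subclade_b = [16, 16, 16, 17, 16]
pooled = subclade_a + subclade_b

print("clean sample:       ", variance(clean))
print("one outlier added:  ", variance(with_outlier))
print("subclade A alone:   ", variance(subclade_a))
print("subclade B alone:   ", variance(subclade_b))
print("A and B pooled:     ", variance(pooled))
```

In this toy case one aberrant value multiplies the variance several times over, and pooling two low-variance subclades yields a variance far larger than either has on its own, even though neither is old.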
The problematic nature of STR variance as a tool implies that many of the standard methods of phylogeography also do not work on closely related populations and are at best only mildly indicative of any relationship. In particular admixture analysis using Y-STRs, as was commonly employed throughout 1995–2010 in academic papers (e.g. McEvoy et al. 2006), does not work on L21. For instance, if one were to try to infer present-day proportions of Irish, Scottish or English populations on the Continent using the pre-Roman distributions in Table 5, the results would have no meaning because the populations in Ireland and Scotland had not yet differentiated to any significant degree. In the very long term there are likely to be several widely separated periods of interaction between neighbouring countries, and if the distributions have changed internally in the meantime, the overlays will be hard to interpret by a single measure.

The various correlations between modern distributions are shown in Excel Table C3, which indicates that most places on the Atlantic culture spread are more closely related to England as the original point of distribution than to each other. However, there are some correlations between Germany, the Low Countries and Scotland because of late interchange between those places, which deserves investigation using other haplogroups as well as L21 and better continental data.

Figure B1. Country scores on Principal Components 2 and 3 (see footnote 46)

46 Component 1 separates England from Ireland and Scotland and is otherwise uninformative.

A similar concern applies to Principal Components Analysis (PCA) on L21. Figure B1 graphs the country placements on the principal components of variance. The results are similar whether we use STRs or subclade vectors (the proportions in each subclade), but the latter give much better fits, as one would expect from Table B2. The first three components separate out England, Ireland and Scotland as expressing much of the variance in L21. All the other countries, even Wales, remain tightly clustered in the middle. This is because their subclade distributions were mostly established through the Atlantic culture long before the Isles differentiated. The same thing will happen using SNPs, so that one must be careful in interpreting any form of variance-based procedure for geographical differentiation of closely related populations undergoing significant internal change over time, including admixture analysis and PCA.
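To make the idea of scoring places on a principal component of their subclade vectors concrete, a minimal Python sketch follows. It is not the procedure used for Figure B1: it extracts only the first component, uses plain power iteration on the covariance matrix in place of a statistical package, and runs on three invented places with three invented subclade proportions. All names are hypothetical.

```python
import math


def first_component_scores(rows, iterations=200):
    """Scores of each row on the first principal component (power iteration)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    # Sample covariance matrix of the subclade proportions (p x p).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Start from an uneven unit vector: rows of proportions sum to 1, so a
    # constant start vector would lie in the null space of the covariance.
    vec = [float(j + 1) for j in range(p)]
    length = math.sqrt(sum(x * x for x in vec))
    vec = [x / length for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance at all: every score is zero
            break
        vec = [x / norm for x in nxt]
    return [sum(r[j] * vec[j] for j in range(p)) for r in centred]


# Hypothetical subclade proportions (columns) for three places (rows).
subclade_vectors = {
    "A": [0.6, 0.2, 0.2],
    "B": [0.2, 0.6, 0.2],
    "C": [0.4, 0.4, 0.2],
}
scores = dict(zip(subclade_vectors,
                  first_component_scores(list(subclade_vectors.values()))))
print(scores)
```

Places A and B, which differ most in their subclade mix, land at opposite ends of the component while C sits at the centre, which mirrors the way the less differentiated countries cluster in the middle of Figure B1. The sign of a component is arbitrary, so only the relative positions matter.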
Table C1. Distribution of Y-haplotypes, European countries and selected regions, per cent

            Ireland     Wales  Scotland   England    France     Spain  Portugal
R1b            75.9      65.8      67.9      57.2      57.4      56.6      46.4
R1a             2.5       1.2       4.8       3.7       2.0       2.3       2.0
I1              5.3      15.3      10.8      16.3      11.4       5.2       4.4
I2             10.4       4.9       9.6       9.6       5.7       4.9       5.7
EGJT            5.4      12.7       6.2      13.3      23.5      31.0      39.6
Other           0.5       0.0       0.7       0.0       0.0       0.0       2.0

R1b            75.9      65.8      67.9      57.2      57.4      56.6      46.4
Eastern         0.5       0.0       0.5       0.0       1.5       2.5       1.2
L151*           0.4       0.5       0.2       0.8       0.8       1.0       0.0
U106            5.4       6.6      10.0      20.0       6.6       2.1       3.7
P312           69.6      58.7      57.3      36.4      48.5      50.9      41.5

P312           69.6      58.7      57.3      36.4      48.5      50.9      41.5
L21            64.8      49.5      48.8      20.2      16.6      11.8      10.4
DF27            2.4       3.4       3.4       7.9      11.7      31.6      25.9
U152            1.7       3.4       3.2       6.4      18.9       7.3       5.2
*               0.6       2.4       1.9       2.0       1.3       0.3       0.0
N              3315       567      2302      3923       891      1428       455

Table C1 (continued)

                Norway      Denmark      Belgium  Netherlands      Germany       Sweden  Switzerland        Italy
R1b               21.8         32.0         58.8         45.1         32.2         22.1         37.8         34.9
R1a               29.8          6.8          4.1          2.5          9.4         17.7          1.8          3.4
I1                33.2         40.8         10.6         14.7         20.3         38.6         13.8          9.5
I2                 6.0          8.8         11.5         14.6          8.6          4.6          9.3          6.7
EGJT               4.5         11.2         14.5         19.5         29.6          6.3         37.3         45.5
Other              4.8          0.3          0.5          3.6          2.5         10.7          0.0          0.0

R1b               21.8         32.0         58.8         45.1         29.6         22.1         37.8         34.9
Eastern            0.7          0.0          0.9          1.6          1.6          0.6          2.4          7.7
L151*              0.1          1.5          0.9          1.8          0.8          0.3          0.5          0.1
U106               8.8         16.5         21.9         24.0         11.8         10.4          8.9          3.6
P312              12.2         13.9         35.1         17.7         15.4         10.7         25.9         23.5

P312              12.2         13.9         35.1         17.7         15.4         10.7         25.9         23.5
L21                6.4          5.3         10.7          3.6          3.5          3.5          1.2          1.5
DF27               1.2          4.0         13.0          5.4          3.0          2.3          4.9          5.4
U152               1.3          4.6          9.0          6.6          6.4          2.2         19.8         16.7
*                  3.3          0.0          2.4          2.1          2.5          2.7            0          0.0
N                  914          294          218          632         2645          784          442         1624

Table C1 (continued)

                  Greece       Balkans*        Romania       Bulgaria         Poland        Finland  Baltic States        Austria
R1b                 21.5           11.2           12.1           12.4           10.1            6.1           10.6           15.3
R1a                  9.0           23.1           12.6           13.5           40.1           12.1           15.8           15.9
I1                   2.3            9.0            7.1            4.6            9.3           34.4            4.3            9.5
I2                  14.8           24.2           11.0           20.8            6.6            1.8            2.5            9.5
EGJT                52.4           30.7           48.9           46.7           27.9            1.7           41.8           47.1
Other                0.0            1.8            8.2            1.9            6.1           43.9           25.0            2.6

R1b                 21.5           11.2           12.1           12.4           10.1            6.1           10.6           15.3
Eastern             10.8            5.8            3.8           10.1            2.4            0.1            2.4            2.1
L151*                0.0            0.5            0.0            0.5            0.6            0.0            0.0            0.0
U106                 0.5            2.4            1.5            0.0            3.1            3.3            3.4            4.2
P312                10.3            2.4            6.8            1.8            4.0            2.7            4.8            9.1

P312                10.3            2.4            6.8            1.8            4.0            2.7            4.8            9.1
L21                  2.9            0.8            2.3            0.0            0.6            0.0            1.4            0.0
DF27                 1.5            0.0            0.0            0.0            1.0            0.0            0.5            0.0
U152                 5.9            1.6            3.4            1.8            2.0            0.5            2.8            6.3
*                    0.0            0.0            0.0            0.0            0.4              0              0              0
N                    321            342            183            259           1470            719            564            189

Table C1 (continued)

            Hungary     Czech    Slovak
R1b            17.4      23.8       9.0
R1a            22.6      27.9      37.0
I1              8.7       7.4       5.2
I2             13.6      12.6      16.1
EGJT           31.7      24.5      28.9
Other           6.0       3.8       3.8

R1b            17.4      23.8       9.0
Eastern         2.8       3.5       1.6
L151*           0.3       0.6       0.0
U106            5.8       6.4       4.1
P312            8.6      13.4       3.3

P312            8.6      13.4       3.3
L21, DF27, U152, * (cell positions not recoverable): 1.0 2.0 1.4 1.3 6.2 8.7 1.3
N               619       420       211

Notes *: Balkans = former Yugoslavia; Other = Q, N, C, R2, other haplotypes; Eastern R1b = L151-

ENGLISH REGIONS

                Cornwall          Devon     South West    South coast             SE           Kent  West Midlands
R1b                   69           71.8           61.3           58.6           55.8           65.5           57.1
R1a                  1.3            3.7              6            1.4              6            4.2            5.3
I1                    11              8           15.1           18.6           14.4           11.8           19.3
I2                   9.7            7.1            5.2           14.3           13.4            7.6            6.5
EGJT                 8.4            9.4             12            7.1             10           10.9           10.9
Other                0.6              0            0.4                           0.4              0            0.9

R1b                   69           71.8           61.3           58.6           55.8           65.5           57.1
Eastern              0.8            1.6            1.4            0.0            0.0            0.0            0.4
L151*                0.0            2.6            1.4            0.0            0.0            1.1            1.2
U106                19.3           28.5           27.7           35.6           30.0           29.9           23.0
P312                49.0           39.1           30.8           23.0           25.8           34.4           32.5

P312                49.0           32.1           30.8           23.0           25.8           34.4           32.5
L21                 24.0           18.7           13.7           10.5            9.7           13.0           11.7
DF27                13.9            7.2            8.5            6.3            6.5            7.1           12.2
U152                 5.5            2.9            7.7            2.1            6.5            9.5            7.6
*                    5.5            3.3            0.9            4.2            3.2            4.8            1.0
N                    154            351            232             70            201             89            322

ENGLISH REGIONS (continued)

           East Midlands      Yorkshire         London     North East     North West
R1b                 53.7          58.15           74.6             63           65.7
R1a                  4.6              6            3.4              3            1.6
I1                  22.7           19.3            4.5             15           14.1
I2                   7.6              6            6.8             12              7
EGJT                11.4            7.8           10.7              7           11.2
Other                  0           2.75              0              0            0.4

R1b                 53.7          58.15           74.6             63           65.7
Eastern              0.0            2.0            1.5            0.0            1.3
L151*                1.1            2.0            0.0            0.0            0.6
U106                17.9           19.0           27.1           20.5           40.7
P312                34.7           35.1           46.0           42.5           23.1

P312                34.7           35.1           46.0           42.5           40.7
L21                 25.7           15.5           32.3           26.8           26.4
DF27                 0.0           10.6            4.2            6.3            6.4
U152                 7.7            7.4            7.6            7.9            7.1
*                    1.3            1.6            1.7            1.6            0.7
N                    132            218            177            100            242

ITALY, FRANCE, SPAIN

           North Italy  South Italy       Sicily     Normandy     Brittany       Alsace      Galicia       Basque    Catalonia
R1b               47.9         22.3         21.4         74.4         77.7         43.6         57.0         66.6         66.1
R1a                3.0          4.0          4.4          4.6          0.0          3.1          0.8          2.6          1.5
I1                 9.5          9.5          9.5          9.5          9.5          9.5          9.5          9.5          1.9
I2                 2.8          3.5         12.2          6.9          4.3          6.3          4.0          2.6          5.0
EGJT              36.7         60.6         52.5          4.6          8.5         28.1         26.2         18.1         25.5
Other              0.0          0.0          0.0          0.0          0.0          9.4          2.4          0.5          0.0

R1b               47.9         22.3         21.4         74.4         77.7         43.6         57.0         66.6         66.1
Eastern            6.3          8.1         10.5          2.3          0.0          0.0          1.6          0.0          1.5
L151*              0.0          0.5          0.0          2.3          0.0          0.0          0.4          1.0            0
U106               3.7          3.5          4.4          8.0          2.1          6.3          1.2          0.5          7.2
P312              38.0         10.2          6.5         61.7         75.6         37.4         53.8         65.1         57.4

P312              38.0         10.2          6.5         61.7         75.6         37.4         53.8         65.1         57.4
L21                1.7          1.1          1.0         41.8         50.4         37.4         14.2         11.5          8.2
DF27               8.3          2.6          1.9         10.9         12.6          0.0         31.6         53.6         37.6
U152              28.1          6.4          3.6          5.4         12.6          0.0          7.9          0.0         11.5
*                  0.0          0.0          0.0          3.6          0.0          0.0          0.0          0.0          0.0
N                  463          198          181           87           47           32          248          193         2303

Note: Standard administrative regions, except SW excludes Cornwall/Devon, SE excludes Kent, S Coast is Hampshire/Sussex/Wight
Notes: Untested R1b and P312 are distributed pro rata according to tested distribution

Sources: Geographical Projects: Anglo-Saxon, Benelux, British Isles, Cornwall, Devon, Ireland, Irish mapping, Munster Irish, Ulster Heritage, Isle of Man, Scottish, Scottish mapping, Scotland Flemish, Wales, Welsh Patronymics, French Heritage, Huguenot, Normandy, Alsace, Parisi Celts, Flanders, Belgium Walloon, Netherlands, Viking-Germanic, Scandinavia, Denmark, Danish Demes, Norway, Sweden, Swedish nobility, Finland, Finno-Ugric, Germany, German Language Area, Palatine, Alpine, French Swiss, Lithuania, Lituaniapropria, Latvia, Baltic Sea, Polish, Vistula River, Waldensian, Czech, Slovak, Hungarian Jászság, Hungarian Magyar, Hungarian Bukovina, Balkans, Bulgarian, Romania, Greece, Italy, North Italy, Campania, Malta, Spis Slovakia, Iberian, Portugal, Spain

Sources: Mixed Projects: I1 East/Central Europe, I1 Suomi, Iberian I1, R1b France, R1b Iberian, E Scotland

Sources: Haplogroup Projects: C, C-P39, C-M217, E-V13, E-M35, E-M81, E1a, E1a1, E1b1, E1b1a, E1b1a1, E-L674, F, G2a2a, G2b, G-L497, G-CTS342, G-L293, G-M406, G-M342, G-U1, G-Uncat, G-PF3359, G-PF3147, G-M377, H, I, I1, I1-Z140, I-L205, I1-L69, I1-Z58, I1a1b2, I1d, I1-L1301, I1-L1302, I-L161, I-M223, I-P109, I2*, I2a, I2a2b-L38, I2b-L415, I-L161, J, J1c3, J1c3d2, J-M172, J1-M267, J2, J2a, J-l214, J-M304, J-M241, J-L817, J-L1405-M67, J2a-PF5197, J2b-M102, J-YSC0000076, L, N, N1c1, N-L732, N-Z1936, N-VL29, N-P189, N-L666, Q, Q Nordic, R*, R1a*, R1a, R1a&subclades, R1a1ah, R2, R2 WTY, T, S14328, R1b, R1b-M343, R1b1*, R-M73, R1b-M269, R-P310, R-P312, R-DF19, R1b-L238, R-DF27, R-SRY267, R-FGC20747, R-U152, R-FGC22501, R-U106, R-P89, R-U198, R-Z18

Sources: L21 Projects: R-L21, RL21WTY, R-17-14-10, R-FGC11134, R-CTS4466, R-DF21, Little Scots, R-L513, R-L1335, R-FGC5494, R-S1026, R-S1051, R-Z251, R-FGC13899, R-Z253, R-Z255, R-CTS3386, R-DF41, R-DF49, R-M222, R-L226, R-DF63

Sources: Other: 1000 Genomes, Boattini et al. (2013). Excluded: Former USSR except Baltic States, Turkey, Jewish projects (overrepresented), surname projects

Table C2. L21 subclade counts by country/region

a) Full database, N=6577
Columns: DF49 DF21 L513 Z253 L1335
England: 37 56 34 42 3
Ireland: 660 420 209 263 60
Scotland: 166 124 122 43 188
Wales: 6 33 9 5 5
English-speaking diaspora: 283 197 106 100 30
France: 10 6 3 8 1
Iberia and Latin America: 1 12
Italy and Greece: 2 1 1 2
German speaking: 3 4 8 3 4
Scandinavia: 5 9 1 12 4
Low Countries: 2 2 3
East-Central Europe: 1
Unknown: 285 172 131 122 183
Total: 1458 1025 625 611 483

b) with 67 markers, edited*, N=5002
Columns: DF49 DF21 L513 Z253 L1335
England: 41 51 32 43 32
Ireland: 423 354 114 202 41
Northern Ireland: 77 21 29 12 16
Scotland: 138 99 95 38 181
Wales: 5 30 8 5 8
English-speaking diaspora: 232 180 72 95 41
France: 10 3 3 4
Iberia and Latin America: 1 7
Italy and Greece: 1 2
German speaking: 2 2 5 2 5
Scandinavia: 5 7 1 10 3
Low Countries: 2 2 1
East-Central Europe: 1
Unknown: 189 140 71 99 107
Total: 1124 890 432 517 437
Note *: With close relatives removed, country of origin edited, matches added to subclades.
The large fall-off in Irish and Scottish DF49 is due to the greater proportion of DF49>M222 not taking c) Atlantic subclades, deep subclades removed, small subclades and singletons shown DF21 Z253 L513 DF49 S1051 England 24 29 15 23 17 Ireland 84 49 47 41 8 Scotland 39 26 18 13 19 Wales 10 5 4 3 English-speaking diaspora 67 55 32 45 18 France 2 4 1 6 Iberia and Latin America 1 7 6 Italy and Greece 1 Germany 2 4 Low countries 2 2 Scandinavia 3 7 1 1 East-Central Europe 1 Unknown 42 38 21 24 17 Total 274 222 144 158 86 Note: All the subclades in (d) have been removed from Table (b) d) Deep subclades M222 Z3000 S190 Z16282 P314 England 18 11 8 4 Ireland 459 159 24 37 37 Scotland 125 24 33 2 Wales 2 16 English speaking diaspora 187 57 18 14 6 France 4 1 Germany 2 1 1 Scandinavia 4 4 Unknown 165 48 26 3 6 Total 966 320 109 58 53 Note: Colours represent the different upstream L21 subclades Z255FGC11134 DF41 DF63 Z251 S1051 FGC5494 CTS3386 21 9 20 3 15 12 15 97 146 36 15 12 13 14 11 45 10 35 17 10 14 4 6 1 6 3 4 1 1 68 35 40 50 23 31 22 2 2 2 4 1 3 1 1 9 3 1 4 1 1 1 3 1 2 6 2 9 2 1 2 2 2 1 1 1 1 4 91 62 54 18 17 20 13 19 337 268 205 116 97 96 76 40 Z255FGC11134 DF41 DF63 Z251 S1051 FGC5494 CTS3386 15 7 17 6 12 17 17 4 75 102 14 3 8 5 9 9 10 3 9 3 4 3 3 1 26 8 21 10 9 19 3 4 1 1 3 3 2 2 1 54 34 18 19 19 18 20 1 3 2 2 3 3 1 4 3 6 1 1 1 1 1 4 2 8 3 1 2 1 1 1 1 1 1 5 70 45 17 9 10 17 5 2 264 206 107 61 76 86 66 24 ed to subclades . of DF49>M222 not taking 67 marker testing (. 
(25% vs 14%), probaby because M222 is an old project and many d singletons shown Z251 FGC5494 DF41 DF63 CTS1751 CTS3386 S1026FGC11134 12 17 13 6 7 4 3 4 12 12 9 6 12 10 5 5 9 3 5 10 2 4 4 2 2 3 3 2 1 1 19 20 11 19 6 1 6 2 3 1 3 3 2 4 3 1 1 1 4 2 1 1 1 1 2 1 1 5 10 5 9 9 2 2 3 7 76 66 57 61 31 24 25 20 L1336 L1402 L1065 L193 FGC9793 Z16372 CTS3087 Z23532 1 3 32 11 3 2 2 26 9 57 27 34 17 14 4 1 181 59 4 6 1 7 1 3 2 4 9 8 41 28 4 3 2 3 1 1 5 1 3 4 4 11 107 26 9 3 6 6 42 34 429 160 54 29 26 23 CTS1751 S1026 Z16500 L371 MC14 S16264 Other Unknown 4 3 4 1 1 11 91 9 8 2 2 1 3 122 3 4 2 3 6 80 2 4 4 30 9 6 4 5 1 4 26 195 3 2 1 2 26 17 1 1 1 2 20 2 13 2 1 7 7 4 2 3 1 5 66 34 29 14 13 9 7 63 670 CTS1751 S1026 Z16500 L371 MC14 Other Unknown Total 7 3 4 14 80 402 11 5 1 1 87 1464 1 1 1 1 7 202 2 4 1 4 4 53 719 2 1 4 1 3 25 105 6 6 2 5 4 14 143 983 3 2 2 24 64 11 33 1 1 8 1 17 42 11 53 2 10 1 7 15 2 3 1 2 3 3 107 902 31 25 12 12 13 44 575 5002 an old project and many have lost interest and have not kept up-to-date with testing norms, Z16500 L1335 MC14 L371FGC13742FGC13780 A5846 S16264 4 2 2 1 2 1 1 1 1 4 6 1 4 1 1 2 4 5 2 1 2 2 1 1 3 2 1 12 6 13 12 3 4 3 5 L226 S856 CTS9881 Z255 CTS4466 A40 S7898 L745 5 5 3 15 3 3 1 1 117 24 13 86 100 10 11 4 3 6 2 26 8 11 1 5 1 1 19 5 11 54 32 4 5 3 3 1 1 1 9 3 2 43 9 6 70 39 4 2 4 187 50 35 265 186 32 22 18 Total 382 2103 882 114 1235 76 49 11 59 64 13 13 1275 6276 A7900FGC21979 Y14240 BY575 L21* DF13* Z39589* ZZ10* 1 1 3 1 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1 2 2 1 3 3 6 1 5 Total 131 1269 505 30 513 11 11 30 597 3097 Total 191 305 159 49 325 29 21 5 14 7 16 7 197 1325 TableC3. Correlations between countries and regions based on subclade distributions. 

                                        Northern                       English
Country/Region       England   Ireland   Ireland  Scotland     Wales  diaspora
England                    1
Ireland                0.584         1
Northern Ireland       0.596     0.914         1
Scotland               0.744     0.661     0.752         1
Wales                  0.761     0.433     0.436     0.588         1
English diaspora       0.758     0.933     0.942     0.729     0.603         1
France                 0.757     0.426     0.462     0.391     0.479     0.624
Germany                0.712     0.392     0.503     0.721      0.56     0.545
Scandinavia            0.703     0.409     0.338     0.452     0.367     0.513
Low Countries          0.621      0.17     0.255     0.706     0.748     0.327
Iberia/Hispanic        0.540     0.083     0.101     0.118     0.334     0.250
Mediterranean          0.638      0.32     0.336     0.587     0.455     0.401
East Europe            0.219    -0.023     0.001    -0.012     0.176     0.064

                                             Scand-        Low             Mediterr-       East
Country/Region         France    Germany     inavia  Countries     Iberia      anean     Europe
France                      1
Germany                 0.453          1
Scandinavia             0.586      0.451          1
Low Countries           0.265      0.705      0.164          1
Iberia/Hispanic         0.531      0.229      0.603      0.089          1
Mediterranean           0.424      0.519      0.427      0.561      0.159          1
East Europe             0.178      0.522      0.190      0.148      0.165       0.34          1

Note: The correlations of continental countries are mostly mediated through England rather than a direct connection (the exception being Eastern Europe that shows spread from Germany)

Table C4.
Skeleton skyline average subclade and country distributions

a) Major subclades
               k=3    k=10    k=15    k=20    k=25    k=39
DF49         22.7%   23.2%   19.4%   14.3%   11.4%   11.1%
DF21         19.7%   18.0%   17.9%   17.1%   15.1%   15.1%
Z253         11.4%   12.3%   13.7%   15.2%   14.1%   12.2%
L513         10.6%    9.7%   10.1%    9.0%    8.2%    6.6%
L1335         9.6%    8.0%    5.5%    2.1%    0.9%    1.3%
Z255          5.4%    6.2%    5.3%    2.9%    1.4%    0.9%
FGC11134      4.9%    5.3%    4.9%    4.1%    3.5%    3.8%
DF41          3.3%    2.8%    3.6%    4.9%    6.0%    5.1%
S1051         2.2%    2.2%    2.7%    4.1%    4.7%    4.5%
Z251          2.1%    2.7%    3.8%    5.8%    7.5%    7.4%
FGC5494       2.1%    2.4%    3.4%    5.1%    6.6%    7.2%
DF63          1.7%    1.6%    2.2%    3.7%    5.2%    4.7%
Other         4.2%    5.9%    7.4%   11.5%   15.3%   18.7%
Unknown       5.6%    6.7%    9.0%   11.9%    7.5%    3.4%

b) Major countries
               k=3    k=10    k=15    k=20    k=25    k=39
England      12.5%   14.5%   20.0%   27.0%   29.3%   29.6%
Ireland      53.6%   52.9%   47.0%   37.5%   29.7%   27.3%
Scotland     24.2%   21.7%   19.0%   17.0%   19.1%   19.2%
Wales         3.3%    3.3%    4.0%    5.1%    6.1%    5.4%
France        1.9%    2.3%    3.2%    4.6%    6.1%    7.6%
Other         4.9%    5.4%    6.8%    8.8%    9.5%    8.9%
N             2383    1567     752     559     348     155

c) England
               k=3    k=10    k=15    k=20    k=25    k=39
DF49         11.0%    8.9%    8.5%    8.3%    7.8%    8.8%
DF21         14.3%   14.4%   14.0%   12.1%   11.5%   11.6%
Z253         13.7%   16.0%   17.9%   18.4%   16.8%   16.0%
L513          7.9%    7.9%    7.8%    6.2%    5.3%    3.9%
L1335         8.5%    6.5%    2.2%    1.0%    0.0%    0.0%
Z255          5.1%    6.0%    3.4%    2.1%    0.6%    0.8%
FGC11134      1.7%    2.1%    1.8%    2.2%    3.0%    4.6%
DF41          5.1%    5.8%    6.3%    6.4%    6.3%    8.4%
S1051         6.1%    5.3%    5.6%    5.4%    5.1%    2.1%
Z251          5.7%    6.3%    6.8%    8.5%   10.6%    8.7%
FGC5494       7.6%    6.0%    7.7%    8.7%    8.7%   10.5%
DF63          1.7%    2.1%    2.7%    2.2%    3.0%    2.2%
Other        11.5%   12.8%   15.3%   18.6%   22.1%   22.5%

d) Ireland
               k=3    k=10    k=15    k=20    k=25    k=39
DF49         29.5%   29.7%   25.0%   20.2%   15.4%   11.0%
DF21         22.4%   20.9%   22.9%   22.7%   23.8%   29.9%
Z253         11.9%   13.1%   13.4%   13.7%   13.8%   13.3%
L513         11.3%   10.1%   11.3%   14.8%   15.5%   12.2%
L1335         3.6%    3.3%    2.7%    1.5%    0.8%    1.3%
Z255          5.7%    6.6%    6.5%    3.8%    2.5%    1.3%
FGC11134      8.3%    9.1%    8.7%    7.6%    5.3%    4.1%
DF41          2.0%    1.5%    1.5%    2.1%    3.2%    3.8%
S1051         0.4%    0.3%    0.5%    1.2%    1.8%    2.5%
Z251          1.4%    1.4%    2.1%    4.5%    5.8%    7.5%
FGC5494       1.2%    1.7%    2.0%    3.2%    5.1%    6.0%
DF63          0.5%    0.1%    0.1%    0.3%    0.3%    0.6%
Other         1.7%    2.2%    3.2%    4.5%    7.7%    5.9%

e) Scotland
               k=3    k=10    k=15    k=20    k=25    k=39
DF49         19.1%   20.9%   16.4%    9.4%    7.4%    7.4%
DF21         14.6%   13.5%   13.2%   16.5%   13.0%    9.1%
Z253          7.0%    8.1%   11.8%   16.2%   13.4%   11.2%
L513         15.4%   13.5%   15.8%   13.1%   13.2%   13.0%
L1335        24.8%   24.4%   18.7%    6.5%    2.3%    2.2%
Z255          1.6%    2.5%    3.3%    2.1%    0.9%    1.6%
FGC11134      0.6%    0.8%    1.5%    2.3%    1.6%    1.1%
DF41          4.7%    2.9%    3.4%    4.8%    7.2%    2.3%
S1051         4.1%    3.1%    2.9%    5.0%    6.9%    7.3%
Z251          1.5%    1.9%    2.3%    4.5%    5.8%    4.6%
FGC5494       0.8%    1.3%    1.5%    2.9%    4.5%    4.2%
DF63          2.2%    2.1%    2.1%    4.2%    6.0%   11.1%
Other         3.6%    5.0%    7.1%   12.5%   17.9%   22.1%

Source: L21 database, 111 marker subset. Averages are taken across five different orderings of the data, including the two extreme orderings maintaining maximum and minimum distances.
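
Table C3 reports correlations between regions based on their subclade distributions, but the text does not name the coefficient used. The sketch below assumes an ordinary Pearson correlation and applies it to the twelve named subclade shares at k=3 from Table C4 (c), (d) and (e). Since Table C3 was computed from a different dataset, the printed values are not expected to reproduce it; the function name `pearson` and the variable names are illustrative only.

```python
import math

# Shares (per cent) at k=3 for the twelve named subclades in Table C4 (c)-(e),
# in the order DF49, DF21, Z253, L513, L1335, Z255, FGC11134, DF41, S1051,
# Z251, FGC5494, DF63. The "Other" row is left out.
ENGLAND_K3 = [11.0, 14.3, 13.7, 7.9, 8.5, 5.1, 1.7, 5.1, 6.1, 5.7, 7.6, 1.7]
IRELAND_K3 = [29.5, 22.4, 11.9, 11.3, 3.6, 5.7, 8.3, 2.0, 0.4, 1.4, 1.2, 0.5]
SCOTLAND_K3 = [19.1, 14.6, 7.0, 15.4, 24.8, 1.6, 0.6, 4.7, 4.1, 1.5, 0.8, 2.2]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


pairs = [
    ("England", ENGLAND_K3, "Ireland", IRELAND_K3),
    ("England", ENGLAND_K3, "Scotland", SCOTLAND_K3),
    ("Ireland", IRELAND_K3, "Scotland", SCOTLAND_K3),
]
for name_a, a, name_b, b in pairs:
    print(f"{name_a}-{name_b}: r = {pearson(a, b):.3f}")
```

A correlation of this kind compares the shape of two distributions rather than their size, so a small sample drawn at random from a larger population will tend to correlate well with its source.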