Population Varieties within Y-Haplogroup I and their Extended Modal Haplotypes

 

Contents

For background the reading below is recommended first 

 

Figure 1 - Haplogroup I Tree

Figure 2 - Finding Correlated Markers

Figure 3 - Bifurcation of I1a with DYS462

Figure 4 - I1a Types and DYS464a,b,c,d

Figure 5 - European Geography of I1a

Here - Modal Haplotypes for Varieties Within Haplogroup I

 

Here - Estimating Age of Descendant Haplotype Populations

Figure 6 - Population Growth Inhomogeneities & Variance Change

 

---------------------  Ken Nordtvedt  ------------------------

 

Comments, questions, or corrections are encouraged to

 knordtvedt@bresnan.net

 

The individual male who founded Y-Haplogroup I some thousands of years ago had a unique haplotype of STR repeat values at whatever number of markers we measure today.  As the generations of the founder’s descendants came and went, independent mutations accumulated in the different marker repeat values and were themselves passed down, so that the diversity of values grew more at the fast mutating markers and less at the slow mutating markers.  In the absence of other factors, today’s descendant population of this founder will then show distributions of repeat values at different markers reflecting this mutational process.  A typical pattern for a marker might be 2, 9, 87, 11, 1 counts at 11, 12, 13, 14, 15 repeats, respectively, for a sample of 110 haplotypes taken from today’s descendant population.  In this case 13 repeats is said to be the “modal” repeat value for the marker and for the population being examined.  The collection of the modal repeat values for the whole set of measured markers composes the descendant population’s modal haplotype.  If the number of generations since the founder is not too great, the modal haplotype will be discernible, and it is the best bet if one wishes to infer the founder’s original haplotype.
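
The definition of a modal repeat value and a modal haplotype can be illustrated with a short Python sketch.  It is only an illustration: the function names and the dictionary layout (marker name mapped to repeat value) are assumptions made for this example, and the only data taken from the text is the 2, 9, 87, 11, 1 count pattern.

```python
from collections import Counter

def modal_value(repeats):
    """Return the most frequently observed repeat value in a list."""
    return Counter(repeats).most_common(1)[0][0]

def modal_haplotype(haplotypes):
    """Build the modal haplotype from a list of haplotypes.

    Each haplotype is a dict mapping a marker name to its repeat value;
    all haplotypes are assumed to cover the same markers.
    """
    markers = haplotypes[0].keys()
    return {m: modal_value([h[m] for h in haplotypes]) for m in markers}

# Count pattern from the text: 2, 9, 87, 11, 1 haplotypes at 11..15 repeats.
counts = {11: 2, 12: 9, 13: 87, 14: 11, 15: 1}
sample = [repeat for repeat, n in counts.items() for _ in range(n)]

print(len(sample))          # size of the sample
print(modal_value(sample))  # modal repeat value of the marker
```

Applying modal_haplotype to a list of full haplotypes simply repeats this per-marker count for every measured marker.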

 

I have two grandfathers of Y-Haplogroup I1a, one from Norway and the other with roots in lowland Scotland, whose most recent common ancestor is estimated to have lived several thousand years ago.  This led me to become interested in tracing the ancient history of I1a, and eventually of all the clades within Y-Haplogroup I.  My method is to search for all the founders within that haplogroup who, as the Y-Haplogroup I peoples spread out across Europe after the last glacial maximum, became the locational and temporal road markers for that history.  By leaving their clustering imprints in the distribution of the haplotypes found today, they give us a way to discover their existence in the distant past.

 

If we are lucky, founders of unique clusters of haplotypes centered about a modal haplotype may have an already discovered SNP mutation closely associated with them; but that may not be the case as SNP discoveries follow their own path in the laboratory, independent of the development of databases of STR-based haplotypes.  In the absence of defining SNPs, we can nevertheless still discover these founders and their descendant population varieties by discerning the population structure in the haplotype databases.  This works best when the haplotypes are extended --- consisting of a large number of markers.  The markers of a variety which establish different modal values from their parent population occur randomly among the markers (after taking account of their different mutational rates), so more markers means better chances to find a variety’s special identifying marker modal values.

 

Unlike the distribution of counts I showed previously, which could result from a few hundred generations of mutations, suppose a marker for the population of 110 haplotypes showed instead a count distribution of 5, 47, 52, 4, 2 at the same consecutive repeat values.  Statistically speaking, such a distribution is very unlikely to have arisen purely as a result of mutations from a founder’s unique repeat value.  It rather suggests that a later descendant founder, with the marker’s repeat value displaced by one from that of the father founder, became unusually prolific and developed a very robust descendant population of his own.  So a superposition of two populations is being seen.  If such odd count distributions are seen at a number of markers, the suspicion of there being two separate populations with different modal haplotypes is supported and can be checked.  The check is to count how the repeat values at the markers with unusual distributions occur together; this process is illustrated in Figure 2 with an actual case study, the discovery of the “Isles” I1c variety.  This is basically how I find varieties or sub-populations within parent populations, and how the different modal haplotypes for such varieties are established.  If the key markers with unique modals happen to occur in databases with geographical information attached to the haplotypes, then this sometimes supplies the “frosting on the cake”: the association of a unique geographical place of origin with the discovered variety.
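
A minimal sketch of such a correlated count is given below in Python.  The marker names M1 and M2, the toy sample, and both function names are hypothetical and chosen only for illustration; they are not the data behind Figure 2.  The idea is that if two sub-populations are superposed, the off-modal value at one marker should mostly travel together with the off-modal value at the other.

```python
from collections import Counter

def cross_tab(haplotypes, marker_a, marker_b):
    """Count how often each pair of repeat values occurs together."""
    return Counter((h[marker_a], h[marker_b]) for h in haplotypes)

def modal_by_group(haplotypes, split_marker, other_marker):
    """Split the sample by the value at split_marker and return the
    modal value of other_marker within each sub-sample."""
    groups = {}
    for h in haplotypes:
        groups.setdefault(h[split_marker], Counter())[h[other_marker]] += 1
    return {value: c.most_common(1)[0][0] for value, c in groups.items()}

# Hypothetical toy sample of 100 haplotypes measured at two markers.
haplotypes = (
    [{"M1": 13, "M2": 10}] * 50
    + [{"M1": 12, "M2": 11}] * 45
    + [{"M1": 13, "M2": 11}] * 3
    + [{"M1": 12, "M2": 10}] * 2
)

table = cross_tab(haplotypes, "M1", "M2")
for pair, n in sorted(table.items()):
    print(pair, n)

print(modal_by_group(haplotypes, "M1", "M2"))
```

In this toy sample nearly all haplotypes fall in just two of the four cells, which is the signature of two populations with different modal values at both markers; independent mutation from a single founder would spread the counts far more evenly in proportion to the single-marker totals.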

 

My extended modal haplotypes have generally been found by working with the Sorenson Molecular Genetics Foundation (SMGF) database of 43-marker haplotypes.  I have added the 4-copy marker DYS464a,b,c,d to my modal haplotypes by using the Ysearch database.  SMGF has not yet included this marker in their database, but I have found it a powerful marker for helping to distinguish varieties from each other, and hopefully SMGF will someday add it.  When possible I have used the YHRD database to learn more about the geographical associations of varieties, but YHRD includes so few markers that this is often impossible for varieties whose key defining markers are not in that small set.  Different journal papers with haplotype databases have been consulted on an opportunistic basis; special note should be made of Capelli’s regional survey of the British Isles and the 2004 paper on Y-Haplogroup I by Rootsi et al.

 

Color coding has been used to indicate the weak, moderate, and strong modal repeat values in my spreadsheet of these modal haplotypes.  Useful marker repeat values for identifying different varieties are also indicated.  You can immediately go to the spreadsheet of identified Y-Haplogroup I varieties and their modal haplotypes HERE or continue the text below with commentary on the varieties. 

 

The I-Haplogroup Tree.

 

The tree for I-Haplogroup sub-clade structure and defining SNP mutations is shown in Figure 1.  While incorporating the very latest findings, the tree is subject to change with the discovery of additional non-private SNPs within I-Haplogroup, or with SNP testing of unusual haplotypes within I-Haplogroup which might prove negative for some of the apparently redundant SNPs in the present tree while positive for others.  For instance, if haplotypes were found that were positive for P30 but negative for M253 and M307, then the I1a portion of the tree would have to be redrawn to include a branch emerging from between the three mentioned SNPs.  The colored branches indicate the most populous subclades: I1a1, I1b*, and I1c*.  Rootsi et al. have found a small fractional population of I1a4 haplotypes among the I1a population of Eastern Europe.  A good sized I1b2 population is found in Sardinia and parts of Iberia, with tiny amounts spread elsewhere in Europe.  And I1c1 with the derived M284 state has been confirmed from a British Isles haplotype as well as a laboratory specimen with stated Basque origins.  Rootsi et al. have reported a small population of I* haplotypes; however, they did not test for the P38 SNP, so it is presently not possible to know if their unaffiliated haplotypes are I* or I1*.  The dotted lines connecting P78 and P95 to M223 indicate that exactly where and in which temporal order these SNPs establish subclades within I1c is yet to be determined by future haplotype testing for these SNPs.  I have had extended 43-marker haplotypes measured for both P78+ and P95+ DNA samples, and they robustly express the motif of I1c, M223+ haplotypes.  The measurement of a 43-marker haplotype for an I1a4, M227+ sample reveals a motif which, except for a very unusual 10 repeats at DYS426 (an extremely slow mutating marker), appears as a very normal I1a1 haplotype.  So the I1a4 sample is presently being tested for P40, and it is possible that this clade was incorrectly declared a parallel clade and instead should be a subclade of I1a1.  All other indicated branches on the tree have as yet no reported haplotype populations, but this situation could change soon, especially for I*, I1*, and I1a*.

 

Description of Haplogroup I Varieties

 

Almost all I1a has the very unusual 8 repeats at DYS455, a very slow mutating marker.  And virtually no other European haplotypes outside of I1a have 8 repeats at DYS455.  This makes identifying and studying I1a haplotypes quite straightforward if one's extended haplotypes include this marker.  The motif YCAIIa,b = 19,21 is also close to universal over all of I1a; however, I1c shares this same modal pair of repeat values at this marker, so one would look first to DYS455 before YCAIIa,b in identifying I1a haplotypes.

 

I1a-AS (AngloSaxon) is the most populous form of I1a that is found.  It must be considered the major core haplotype variety of I1a; it acquired its nickname (AngloSaxon) because it reaches its highest percentages of population in areas of continental Europe where the Anglo-Saxons are said to have originated: the Netherlands, northwest Germany, and Denmark.  It is also found in good amount throughout modern Germany, but falls to about half the fraction of total population by the time you get to southern and eastern Germany.  Southern Sweden also has a good amount of this basic I1a variety.  A good amount of this I1a variety has been brought to the British Isles; the most plausible scenario is that the Anglo-Saxon invader/immigrants brought it.  Regional studies in the Isles such as that of Capelli show this I1a variety reaching its highest densities in those Isles locations where Anglo-Saxons and later immigrants of the Danelaw settled.

 

The core I1a-AS haplotype to look for in papers using only a small set of markers is 14,22,(13,14),10,11,13,(12,28),14 at DYS19,390,385a,b,391,392,393,389i,ii,388.  The entire I1a-AS modal haplotype as exhibited is remarkably stable at all the rest of the markers other than the few specifically discussed.

 

Large satellite or neighboring populations of I1a-AS with DYS19 = 15, or with DYS385a,b = 14,14 or 13,15 or 13,13, also exist in the same areas where the core haplotype with DYS385a,b = 13,14 is found.  The whole extended I1a-AS haplotype population is dominated by the modal DYS462 = 12; this marker, like DYS455, has about the slowest mutation rate of any marker found today, so it rarely mutates in the time since the founder.  But unlike DYS455, which has remained universally at 8 repeats throughout I1a, the repeat value at DYS462 shows a single major shift in dominant repeat value to 13 right in the middle of the I1a population's migration northward in Europe, as is discussed further below.  Whereas the core 14,22,(13,14) I1a-AS has about an equal split between the motifs 12,14,15,15 and 12,14,15,16 at DYS464a,b,c,d, its satellite populations are not so evenly split: 14,22,(13,13) and 15,22,(13,14) are predominantly 12,14,15,15 at DYS464a,b,c,d, while 14,22,(14,14) and 14,22,(13,15) are predominantly 12,14,15,16.

 

I1a-AS(1-5) are five small varieties found within the Anglo-Saxon I1a.  Because of the relatively small populations of these varieties, finding the special geographical features associated with each of them has proven difficult, although I believe some differences are there to be established and work continues on this.  The pedigrees in the SMGF database may permit determining the ratio of continental to Isles populations and the Danish versus German ancestral counts on the continent.  Because these varieties do not differ on any of the few markers used in the YHRD database, the superb regional divisions of that database cannot be exploited to gain geographical information.

 

 

I1a-N (Norse) is far and away the most populous form of I1a found in Sweden and Finland, and is a close second in Norway.  It is found in only tiny quantities in continental Europe south of the Baltic and North Seas, and takes second place to I1a-AS in Denmark.  With its shifts at DYS390 (22 to 23) and at DYS385a,b (13,14 to 14,14), the core Norse haplotype to look for is 14,23,(14,14),10,11,13,(12,28),14 at the classical markers listed previously.  But the strongly confirming shift for identifying a Norse I1a rather than an Anglo-Saxon I1a is at DYS462, where there should now be 13 repeats.  Also supporting the distinct Norse variety is the strong dominance of 12,14,15,16 at DYS464a,b,c,d (41 to 5) over the 12,14,15,15 motif at this 4-copy marker.
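
The comparison of a measured haplotype against the Anglo-Saxon and Norse core motifs can be sketched in Python as follows.  The repeat values are the ones quoted in the text for DYS19,390,385a,b,391,392,393,389i,ii,388 plus the DYS462 modal; the dictionary names, the function names, and the simple "fewest differing markers" rule are assumptions for illustration only, not a validated classification method.

```python
# Core motifs as quoted in the text, with DYS462 added as the confirming marker.
I1A_AS = {
    "DYS19": 14, "DYS390": 22, "DYS385a": 13, "DYS385b": 14,
    "DYS391": 10, "DYS392": 11, "DYS393": 13,
    "DYS389i": 12, "DYS389ii": 28, "DYS388": 14, "DYS462": 12,
}
# Norse differs from Anglo-Saxon at DYS390, DYS385a and DYS462.
I1A_N = dict(I1A_AS, DYS390=23, DYS385a=14, DYS462=13)

MODALS = {"I1a-AS": I1A_AS, "I1a-N": I1A_N}

def mismatches(haplotype, modal):
    """List the markers, tested in the haplotype, that differ from the modal."""
    return sorted(m for m in modal if m in haplotype and haplotype[m] != modal[m])

def closest_variety(haplotype, modals):
    """Return the name of the modal haplotype with the fewest differing markers."""
    return min(modals, key=lambda name: len(mismatches(haplotype, modals[name])))

# Hypothetical test haplotype: Norse core with one extra step at DYS391.
sample = dict(I1A_N, DYS391=11)
print(mismatches(sample, I1A_AS))
print(mismatches(sample, I1A_N))
print(closest_variety(sample, MODALS))
```

Counting differing markers ignores both the size of each difference and the differing mutation rates of the markers, so a slow marker such as DYS462 deserves more weight in practice than this sketch gives it.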

 

The shifts in modality at DYS462 and DYS464a,b,c,d are shown in Figure 3 across the landscape of I1a types found in the SMGF database.  Noting the DYS462 = 12 and 13 populations for each type gives a good indication of the sizes of the various I1a types.  A more detailed graphical presentation of the DYS464a,b,c,d counts for the various I1a types is shown in Figure 4.  The various outlying upticks in counts represent the presence of small population varieties within the indicated I1a type; some are already identified in the sheet of modal haplotypes while others are still being worked on.

 

The geographical shifts which accompany the different varieties of I1a are shown in Figure 5. The data was obtained from the previously linked YHRD regional database.

 

I1a-uN (ultra-Norse) reaches its peak density in Norway, where it is the most numerous form of I1a as seen in the YHRD database.  Its core haplotype motif differs from Norse I1a by a shift at DYS385b (14 to 15), then taking the form 14,23,(14,15) at DYS19,390,385a,b.  It is the third most populous form in Sweden and Denmark after the Norse I1a-N and Anglo-Saxon I1a-AS forms.  It also has 13 repeats at DYS462, placing it on the same side of the great DYS462 bifurcation of I1a as Norse I1a; and it has essentially no 12,14,15,15 at DYS464a,b,c,d as seen in the Ysearch database, being mainly 12,14,15,16.  I1a-uN is very close to totally absent south of the Baltic and North Seas.

 

I1a-uN itself splits into two discernible varieties, based on the DYS461 repeat value.  I1a-uN1 has the same modals 12,28 at DYS461,449 that almost all of I1a has.  I1a-uN2 has 11,29 repeats at DYS461,449, and according to the listed pedigrees in the SMGF database, this latter division of ultra-Norse I1a is even more strongly associated geographically with Norway over Sweden and Denmark.  I1a-uN2 also shows an interesting shift in its modality at DYS464, being 11,14,14,16.

 

I1a1a is a small sub-clade variety of I1a1 which Rootsi et al. found in about 10 percent of the I1a from Eastern Europe.  From the Sorenson database I confirmed its tendency to come from that part of Europe, but examples were also found in Germany and Denmark.  It looks very much like standard Anglo-Saxon I1a except for there being 10 repeats at DYS426 instead of the usual 11 for I1a.  This unusual marker value was found in the extended haplotype shown in blue, which was measured for an M227+ DNA sample obtained from Eastern Europe.  More I1a haplotypes with 10 at DYS426 need to be tested for M227 to see if this unusual marker value is generally present in this sub-clade.

 

I1b-Din (Dinaric) is the main component of I1b.  It obtained its name from a mountain range in the Balkans near where this haplogroup reaches its densest presence.  It has also spread out through much of Eastern Europe.  Because the main extended haplotype databases such as SMGF and Ysearch concentrate their sampling on Northwest Europe, their I1b populations are relatively small.

 

I1b-West (Western) is a variety of I1b found more in Western Europe, and particularly in a swath across Germany’s Baltic and North Sea coastal areas, and then into the British Isles.  The Western I1b variety is most notably distinguished by having 15 repeats at DYS388 instead of the usual 13 repeats of Dinaric I1b.  I1b-West is also usually 10 at DYS391, whereas I1b-Din is 11.  While these two varieties of I1b share the modal 21,21 at YCAIIa,b, they have differing modal values at a large number of markers and are not difficult to distinguish from each other.  I1b-West was discovered in 2004 by a genetic genealogy hobbyist from Finland who himself is I1b-Din with roots in Estonia.

 

I1b2 is very easy to spot if its very unusual YCAIIa,b = 11,21 is included in the examined haplotype.  This subclade of I1b represents a very large fraction of the males of Sardinia, an island in the Mediterranean Sea west of Italy, and it is a sizeable contributor to the population in regions of the Iberian peninsula, but only a small amount is found in more northerly Europe.  Its unusual modal 12,12 at DYS385a,b helps to identify it with short haplotypes, but with extended haplotypes all three varieties of I1b are readily distinguished from each other.

 

I1c-Cont (Continental) is the main variety of haplogroup I1c.  The area of its densest presence is Northwest Germany and the Netherlands, then up into Denmark, and even Southern Sweden and Norway.  A good amount is also found in the British Isles, perhaps brought there by the Germanic and Scandinavian invader/immigrants in the historic era.  I1c-Cont tends to have the high repeat values at DYS389i,ii; it is modal 23 at DYS390, 14 at DYS437, 10 at DYS445, and 21 at C4.  There still seems to be sufficient complexity in its repeat distributions at a number of markers, and at one time I had distinguished a “Northern Continental” I1c variety from a “Southern Continental” I1c variety based on a few markers.  I ended this distinction when I separated “Isles” I1c from this earlier “Southern Continental” I1c population and found the remaining population looking more like the “Northern” I1c modal haplotype.  But there is still work to do with this population, which will probably be objectively split again into multiple continental varieties.  Shown in blue are two extended haplotypes measured from DNA samples derived for two SNPs, P78+ and P95+, which may be found to divide I1c-Cont.  These examples of I1c came from males in Germany and the Netherlands, respectively.

 

I1c1-Isles is found almost exclusively in the British Isles, and heavily from Scotland at that.  In the SMGF database there were no haplotype pedigrees of this variety originating on the continent.

Recently, however, an Isles I1c haplotype was found to be M284+ and is therefore in the I1c1 subclade.  My original M284+ DNA sample came from a male with origins in the Basque population, and its extended haplotype is shown in blue.  Slightly changing the haplotype from the I1c1 modal form does bring forth some Brazilian and other Latin American matches in the SMGF database, suggesting Iberian origins.  I1c1 is a candidate haplogroup which may have arrived in the British Isles in pre-Roman times, and perhaps directly from more southwesterly Europe.  Investigation of this hypothesis is continuing.

 

I1c-Root is an unusual variety of I1c with modal 19,19 at YCAIIa,b.  One Iberian example from this variety has recently been found negative (ancestral) for M284.  It is found spread throughout Western Europe from Iberia and Italy up through southern Scandinavia.

 

I1 is a robust variety within haplogroup I which has not yet been placed in the tree, but some selected SNP tests should establish its final status.  The very unusual modal feature of this variety is the 10,12 motif at DYS455,454, two extremely slow mutating markers.  DYS454 has 11 repeats throughout the rest of Haplogroup I, while DYS455 was previously mentioned to have 8 repeats for I1a and 11 repeats for I1b and I1c.  There is no presently unassigned SNP within Haplogroup I to become the tag for Ix.  It has recently been found to be positive for the P38 SNP.  This variety is also found well-dispersed in continental Europe from Italy and Iberia, in France and Germany, and up through Denmark.

 

I1* is a very small variety which recently was found negative for P30, M223, and P37.2, and hence is not I1a, I1c, or I1b.  But earlier it was found to be derived for P38 (P38+).  Its 10,10 at DYS459a,b is one of its noticeable modal features.  19,19 or 19,21 at YCAIIa,b are other modal features, along with 12,29 at DYS389i,ii.
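
The SNP reasoning used here can be written out as a small Python sketch.  It covers only the SNP-to-clade associations stated in the text (P30 with I1a, P37.2 with I1b, M223 with I1c, P38 with I1) and does not reproduce the full tree of Figure 1; the function name, the input layout, and the returned labels are assumptions for illustration.

```python
def place_in_tree(snps):
    """Place a sample using only the SNP associations stated in the text.

    snps maps an SNP name to True (derived, "+") or False (ancestral, "-").
    An SNP missing from the dict is treated as untested.
    """
    if snps.get("P30"):
        return "I1a"
    if snps.get("P37.2"):
        return "I1b"
    if snps.get("M223"):
        return "I1c"
    if snps.get("P38"):
        # I1* requires explicit negative results for all three subclade SNPs.
        if all(snps.get(s) is False for s in ("P30", "P37.2", "M223")):
            return "I1*"
        return "I1 (subclade untested)"
    return "unresolved"

# The result pattern described for the I1* variety.
print(place_in_tree({"P38": True, "P30": False, "M223": False, "P37.2": False}))
```

The distinction between an ancestral result and an untested SNP matters: a P38+ sample with no further tests is left as "I1 (subclade untested)" rather than being called I1*.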

 

The database populations of these varieties will be added in a future upgrade.