Population Varieties within Y-Haplogroup I and
their Extended Modal Haplotypes
For background the reading below
is recommended first
Figure 1 - Haplogroup
Figure 2 - Finding
Figure 3 - Bifurcation of I1a with DYS462
Figure 4 - I1a Types
Figure 5 - European Geography of I1a
Here - Modal Haplotypes for Varieties Within
Here - Estimating Age of Descendant Haplotype
Figure 6 - Population Growth Inhomogeneities & Variance
--------------------- Ken Nordtvedt
Comments, questions, or corrections are encouraged to
The individual male who was the founder of Y-Haplogroup I some thousands of years
ago had a unique haplotype of STR repeat values at whatever number of markers we
measure today. As the generations of the founder’s descendants came and went,
independent mutations accumulated in the different marker repeat values and were
themselves passed down, producing more diversity of values at the fast mutating
markers and less at the slow mutating markers. In the absence of other factors,
today’s descendant population of this founder will then show distributions of
repeat values at the different markers reflecting this mutational process. A
typical pattern for a marker might be counts of 2, 9, 87, 11, 1 for 11, 12, 13,
14, 15 repeats, respectively, in a sample of 110 haplotypes taken from today’s
descendant population. In this case 13 repeats is said to be the “modal” repeat
value for the marker and for the population being examined. The collection of
the modal repeat values for the whole set of measured markers composes the
descendant population’s modal haplotype. If the number of generations since the
founder is not too great, the modal haplotype will be discernible, and it is the
best bet if one wishes to infer the founder’s original haplotype.
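The idea of a modal repeat value and a modal haplotype can be sketched in a few lines of Python. This is only an illustration: the function names and the dictionary layout (marker name mapped to repeat value) are assumptions made for the example, and ties are broken toward the smaller repeat value purely for determinism.

```python
from collections import Counter

def modal_value(repeats):
    """Return the most frequent repeat value; ties go to the smaller value."""
    counts = Counter(repeats)
    return min(counts, key=lambda r: (-counts[r], r))

def modal_haplotype(haplotypes):
    """Given a list of {marker: repeat} dicts, return the per-marker modal values."""
    markers = haplotypes[0].keys()
    return {m: modal_value([h[m] for h in haplotypes]) for m in markers}

# Counts from the text: 2, 9, 87, 11, 1 haplotypes at 11, 12, 13, 14, 15 repeats.
example_counts = {11: 2, 12: 9, 13: 87, 14: 11, 15: 1}
sample = [repeat for repeat, n in example_counts.items() for _ in range(n)]

print(len(sample), modal_value(sample))  # 110 13
```

Applying `modal_haplotype` to a list of extended haplotypes gives one modal value per marker, which is the candidate for the founder's haplotype described above.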
Having two grandfathers of Y-Haplogroup I1a, one from Norway and the other with
roots in lowland Scotland, but the two having an estimated most recent common
ancestor several thousand years ago, I became interested in tracing the ancient
history of I1a, and eventually all the clades within Y-Haplogroup I. My method
is to search for all the founders within that haplogroup who, as the
Y-Haplogroup I peoples spread out across Europe after the last glacial maximum,
became the locational and temporal road marks for that history. By leaving their
clustering imprints in the distribution of the haplotypes found today, they
leave us a way to discover their existence in the distant past.
If we are lucky, founders of unique clusters of haplotypes centered about a
modal haplotype may have an already discovered SNP mutation closely associated
with them; but that may not be the case, as SNP discoveries follow their own
path in the laboratory, independent of the development of databases of STR-based
haplotypes. In the absence of defining SNPs, we can nevertheless still discover
these founders and their descendant population varieties by discerning the
population structure in the haplotype databases. This works best when the
haplotypes are extended, that is, consisting of a large number of markers. The
markers at which a variety establishes modal values different from those of its
parent population occur randomly among the markers (after taking account of
their different mutation rates), so more markers means better chances to find a
variety’s special identifying marker values.
In contrast to the distribution of counts I showed previously, which could
result from a few hundred generations of mutations, suppose a marker for the
population of 110 haplotypes showed instead a count distribution of 5, 47, 52,
4, 2 at the same consecutive repeat values. Statistically speaking, such a
distribution is very unlikely to have arisen purely as a result of mutations
from a founder’s unique repeat value. It rather suggests that a later descendant
founder, with the marker’s repeat value displaced by one from that of the father
founder, became unusually prolific and developed a very robust descendant
population of his own. So a superposition of two populations is being seen. If
such odd count distributions are seen at a number of markers, the suspicion of
there being two separate populations with different modal haplotypes is
supported and can be checked. Correlated counts between the markers with unusual
distributions are performed; this process is illustrated with an actual case
study, the discovery of the “Isles” I1c variety. This is basically how I find
varieties or sub-populations within parent populations, and how the different
modal haplotypes for such varieties are established.
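A rough sketch of the two checks described above: flagging a marker whose second-most-common repeat value is nearly as common as the modal one, and cross-tabulating two such markers to see whether their unusual values travel together. The 0.5 threshold, the marker names `M1` and `M2`, and the toy haplotypes are assumptions for illustration only; they are not taken from any database.

```python
from collections import Counter

def runner_up_ratio(counts):
    """Return (modal value, runner-up value, runner-up count / modal count)."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    (modal, n_modal), (second, n_second) = ranked[0], ranked[1]
    return modal, second, n_second / n_modal

def cross_tab(haplotypes, marker_a, marker_b):
    """Count how often each pair of repeat values occurs together at two markers."""
    return Counter((h[marker_a], h[marker_b]) for h in haplotypes)

# The two count distributions quoted in the text, keyed by repeat value.
single_founder = {11: 2, 12: 9, 13: 87, 14: 11, 15: 1}
two_founders = {11: 5, 12: 47, 13: 52, 14: 4, 15: 2}

THRESHOLD = 0.5  # assumed cut-off, chosen only for this illustration
for label, counts in (("single", single_founder), ("mixed", two_founders)):
    modal, second, ratio = runner_up_ratio(counts)
    print(label, modal, second, round(ratio, 2), ratio > THRESHOLD)

# Hypothetical haplotypes at two hypothetical markers M1 and M2.
toy = [{"M1": 12, "M2": 10}] * 3 + [{"M1": 13, "M2": 11}] * 4 + [{"M1": 13, "M2": 10}]
print(cross_tab(toy, "M1", "M2"))
```

For the first distribution the runner-up is about 13 percent of the modal count, while for the second it is about 90 percent. In the cross-tabulation, large counts concentrated on two value pairs such as (12, 10) and (13, 11) are the kind of correlation that points to two separate populations.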
If the key markers with unique modals happen to occur in databases with
geographical information attached to the haplotypes, then this sometimes
supplies the “frosting on the cake”: the association of a unique geographical
place of origin with the discovered variety.
The extended modal haplotypes have generally been found by working with the
Sorenson Molecular Genetics Foundation (SMGF) database of 43 marker haplotypes. I have added the 4-copy marker DYS464a,b,c,d to my modal haplotypes by
database. SMGF has not yet included this
marker in their database, but I have found it a powerful marker for helping to
distinguish varieties from each other, and hopefully SMGF will someday add it.
When possible I have used the YHRD database to learn more about
the geographical associations with varieties, but YHRD includes so few markers
in its database that this is often impossible for varieties whose key defining
markers are not in that small set.
Different journal papers with haplotype databases have been consulted on
an opportunistic basis; special note should be made of Capelli’s regional
survey of the British Isles and
the 2004 paper on Y-Haplogroup I by Rootsi et al.
Color coding has been used to indicate the weak, moderate, and strong modal repeat
values in my spreadsheet of these modal haplotypes. Useful marker repeat values for identifying
different varieties are also indicated.
You can immediately go to the spreadsheet of identified Y-Haplogroup I
varieties and their modal haplotypes HERE or continue the text below with commentary on
The I-Haplogroup Tree.
The tree for I-Haplogroup sub-clade structure and defining SNP mutations is shown in Figure 1. While incorporating the very latest findings,
the tree is subject to change with the discovery of additional non-private SNPs
within I-Haplogroup, or with SNP testing of unusual haplotypes within
I-Haplogroup which might prove negative for some of the apparently redundant
SNPs in the present tree while positive for others. For instance, if haplotypes were found that were
positive for P30 but negative for M253 and M307, then the I1a portion of the
tree would have to be redrawn to include a branch emerging from between the
three mentioned SNPs. The colored
branches indicate the most populous subclades --- I1a1, I1b*, and I1c*. Rootsi et al have found a small fractional
population of I1a4 haplotypes among the I1a population of Eastern Europe.
A good-sized I1b2 population is found in Sardinia and parts of Iberia, with tiny amounts spread elsewhere.
And I1c1 with the derived M284 state has been confirmed from a British Isles haplotype as well as a laboratory
specimen with stated Basque origins.
Rootsi et al have reported a small population of I* haplotypes; however
they did not test for the P38 SNP, so it is presently not possible to know if
their unaffiliated haplotypes are I* or I1*.
The dotted lines connecting P78 and P95 to M223 indicate that exactly
where and in which temporal order these SNPs establish subclades within I1c is
yet to be determined by future haplotype testing for these SNPs. I have had
extended 43 marker haplotypes measured for both P78+ and P95+ DNA samples, and
they robustly express the motif of I1c, M223+ haplotypes. The measurement of a 43 marker haplotype for
an I1a4, M227+ sample reveals a motif which, except for a very unusual 10 repeats
at DYS426 (an extremely slow mutating
marker), appears as a very normal I1a1 haplotype. So the I1a4 sample is presently being tested
for P40, and it is possible that this clade was incorrectly declared a parallel
clade and instead should be a subclade of I1a1. All other indicated branches on the tree
as yet have no reported haplotype populations, but this situation could change
soon, especially for I*, I1*, and I1a*.
Description of Haplogroup I Varieties
All I1a has the very unusual 8 repeats at DYS455, a very slow mutating
marker. And virtually no other European
haplotypes outside of I1a have 8 repeats at DYS455.
This makes identifying and studying
I1a haplotypes quite straightforward if one's extended haplotypes
include this marker. The motif YCAIIa,b
= 19,21 is also close to universal over all of I1a; however, I1c shares this
same modal pair of repeat values at this marker, so one would look first to DYS455 before YCAIIa,b in identifying I1a.
I1a-AS (AngloSaxon) is the
most populous form of I1a that is found.
It must be considered the major core haplotype variety of I1a; it
acquired its nickname (AngloSaxon) because it reaches its highest percentages of
population in areas of continental Europe where the Anglo-Saxons are said to have originated
--- Netherlands, northwest Germany, Denmark.
It is also found in good amount throughout modern Germany, but falls to about half that
fraction of total population by the time you get to southern and eastern Germany.
also has a good amount of this basic I1a variety. A good amount of this I1a variety has been
brought to the British Isles; the most plausible scenario is that the Anglo-Saxon
invader/immigrants brought it. Regional
studies in the Isles such as that of Capelli show this I1a variety reaching its
highest densities in those Isles locations where Anglo-Saxons and later
immigrants of the Danelaw settled.
The core I1a-AS haplotype to look for in papers using only a small set of markers is
14,22,(13,14),10,11,13,(12,28),14 at DYS19,390,385a,b,391,392,393,389i,ii,388. The entire I1a-AS modal haplotype as
exhibited is remarkably stable at all the rest of the markers other than the
few specifically discussed.
Satellite or neighboring populations of I1a-AS with DYS19 = 15, or with DYS385a,b = 14,14 or 13,15 or 13,13,
also exist in the same areas where the core haplotype with DYS385a,b = 13,14 is found. The whole extended I1a-AS haplotype population is dominated by the
modal DYS462 = 12; this marker, like DYS455, has about the slowest mutation
rate of any found today, so it has rarely
mutated in the time since the founder.
But unlike DYS455, which has remained universally at 8 repeats throughout
I1a, the repeat value at DYS462 shows a single major shift in dominant repeat value to
13 right in the middle of the I1a population's migration northward in Europe,
as is discussed further below. Whereas
the core 14,22,(13,14) I1a-AS has about an equal split between the motifs
12,14,15,15 and 12,14,15,16 at DYS464a,b,c,d, its satellite populations are not so
evenly split: 14,22,(13,13) and 15,22,(13,14) are predominantly 12,14,15,15 at DYS464a,b,c,d, while 14,22,(14,14) and
14,22,(13,15) are predominantly 12,14,15,16.
There are five small
varieties found within the Anglo-Saxon I1a.
Because of the relatively small populations of these varieties, finding
the special geographical features associated with each of them has
proven difficult, although I believe some differences are there to be
established, and work continues on this. The pedigrees in the SMGF database may
permit determining the ratio of continental to Isles populations and the Danish
versus German ancestral counts on the continent. Because these varieties do not differ on any
of the few markers used in the YHRD database, the superb regional divisions of
that database cannot be exploited to gain geographical information.
I1a-N (Norse) is far and
away the most populous form of I1a found in Sweden and Finland, and is a close second in Norway.
It is found in only tiny quantities in continental Europe south of the Baltic and North Seas, and takes second place to I1a-AS
With its shifts at DYS390 --- 22 to 23 --- and at DYS385a,b --- 13,14 to 14,14 --- the
core Norse haplotype to look for is 14,23,(14,14),10,11,13,(12,28),14 at the
classical markers listed previously. But
the strongly confirming shift for identifying a Norse I1a rather than Anglo-Saxon
I1a is at DYS462, where there should now be 13 repeats. Also supporting the distinct Norse variety is
the strong dominance of 12,14,15,16 at DYS464a,b,c,d (41 to 5) over the
12,14,15,15 motif at this 4-copy marker.
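When only the ten classical markers are available, a short haplotype can be compared against the Anglo-Saxon and Norse core motifs quoted above. The sketch below uses a simple sum of absolute repeat differences as the comparison; that step-count measure and the function names are assumptions made for illustration, not the procedure used to establish the varieties.

```python
# Marker order follows the text: DYS19,390,385a,b,391,392,393,389i,ii,388.
CLASSICAL_MARKERS = ["DYS19", "DYS390", "DYS385a", "DYS385b", "DYS391",
                     "DYS392", "DYS393", "DYS389i", "DYS389ii", "DYS388"]

# Core motifs as quoted in the text.
CORES = {
    "I1a-AS": [14, 22, 13, 14, 10, 11, 13, 12, 28, 14],
    "I1a-N":  [14, 23, 14, 14, 10, 11, 13, 12, 28, 14],
}

def step_distance(haplotype, core):
    """Sum of absolute repeat differences over the classical markers."""
    if len(haplotype) != len(core):
        raise ValueError("haplotype and core must cover the same markers")
    return sum(abs(a - b) for a, b in zip(haplotype, core))

def nearest_core(haplotype):
    """Return (closest core name or None on a tie, distances to every core)."""
    distances = {name: step_distance(haplotype, core) for name, core in CORES.items()}
    best = min(distances.values())
    winners = [name for name, d in distances.items() if d == best]
    return (winners[0] if len(winners) == 1 else None), distances

print(nearest_core([14, 23, 14, 14, 10, 11, 13, 12, 28, 14]))
print(nearest_core([14, 22, 14, 14, 10, 11, 13, 12, 28, 14]))
```

The second call is a 14,22,(14,14) haplotype, which sits one step from each core and so returns no winner. That is the situation in which the DYS462 and DYS464a,b,c,d values discussed above are needed to decide.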
in modalities at DYS462 and DYS464a,b,c,d are shown in Figure 3 across the landscape of I1a types
found in the SMGF database. Noting the DYS462 = 12 and 13 populations for each
type gives a good indication of the sizes of the various I1a types. A more detailed graphical presentation of the
DYS464a,b,c,d counts for the various
I1a types is shown in Figure 4. The various
outlying upticks in counts represent the presence of small population varieties
within the indicated I1a type; some are already identified in the sheet of
modal haplotypes while others are still being worked on.
The geographical shifts which accompany the different varieties of I1a are shown in
Figure 5. The data was obtained from the previously linked YHRD regional database.
I1a-uN (ultra-Norse) reaches its peak density in Norway, where it is the most numerous form
of I1a as seen in the YHRD database. Its
core haplotype motif differs from Norse I1a by a shift at DYS385b --- 14 to 15 --- then taking
the form 14,23,(14,15) at DYS19,390,385a,b. It is
the third most populous in Sweden and Denmark after the Norse I1a-N and
Anglo-Saxon I1a-AS forms. It also has 13
repeats at DYS462, putting itself on the same side as Norse I1a of the great DYS462 bifurcation of I1a; and it has
essentially no 12,14,15,15 at DYS464a,b,c,d as seen in the Ysearch database, being mainly
12,14,15,16. I1a-uN is very close to
totally absent south of the Baltic and North Seas.
I1a-uN itself splits into two discernible varieties, based on the DYS461 repeat value. I1a-uN1 has the same modals 12,28 at DYS461,449 which almost all of I1a
has. I1a-uN2 has 11,29 repeats at
DYS461,449 and, according to the listed
pedigrees in the SMGF database, this latter division of ultra-Norse I1a is even
more strongly associated geographically with Norway over Sweden and
Denmark. I1a-uN2 also shows an
interesting shift in its modality at DYS464, being 11,14,14,16.
I1a1a is a small sub-clade variety of I1a1
which Rootsi et al found in about 10 percent of the I1a from Eastern Europe.
From the Sorenson database we confirmed its tendency to come from that
part of Europe, but examples were also found in Germany and Denmark.
It looks very much like standard Anglo-Saxon I1a except for there being
10 repeats at DYS426 instead of the usual 11 for I1a.
This unusual marker value was found in the extended haplotype
shown in blue which was measured for an M227+ DNA sample obtained from Eastern Europe.
More I1a haplotypes with
10 at DYS426 are needed to test for M227 to see if this unusual marker value is
generally present in this sub-clade.
I1b-Din (Dinaric) is the
main component of I1b. It obtained its
name from a mountain range in the Balkans near where this haplogroup reaches
its most dense presence. It has also
spread out through much of Eastern Europe. Because the
main extended haplotype databases such as SMGF and Ysearch are somewhat
concentrated in their sampling to Northwest Europe, their I1b populations are
I1b-West (Western) is a
variety of I1b found more in Western Europe, and particularly in a swath across
Germany’s Baltic and North Sea coastal areas, and then into the British
Isles. The Western I1b variety is most notably distinguished
by having 15 repeats at DYS388 instead of the usual 13 repeats of Dinaric I1b. I1b-West is also usually 10 at DYS391 instead of the
11 of I1b-Din. While these two varieties of I1b
share the modal 21,21 at YCAIIa,b, they have differing modal values at a large
number of markers and are not difficult to distinguish from each
other. I1b-West was discovered in 2004
by a genetic genealogy hobbyist from Finland who himself is I1b-Din with roots
I1b2 is very easy to spot if its very unusual
YCAIIa,b = 11,21 is included in the examined haplotype. This subclade of I1b represents a very large
fraction of the males of Sardinia, an island in the Mediterranean Sea west of Italy, and it is a sizeable contributor
to the population in regions of the Iberian peninsula, but only a small amount is found
in more northerly Europe. Its unusual modal 12,12 at DYS385a,b helps to identify it with
short haplotypes, but with extended haplotypes all three varieties of I1b are
readily distinguished from each other.
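The marker values that separate the three I1b varieties can be written down as a small rule table. The sketch below assumes the haplotype is already known to be I1b, and its dictionary keys and function names are hypothetical; it only encodes the modal values stated above and says nothing about haplotypes that have drifted from those modals.

```python
def i1b_variety(hap):
    """Assign an I1b variety from YCAIIa,b and DYS388, per the modals in the text."""
    if hap.get("YCAIIa,b") == (11, 21):
        return "I1b2"
    if hap.get("DYS388") == 15:
        return "I1b-West"
    if hap.get("DYS388") == 13:
        return "I1b-Din"
    return "undetermined"

# Secondary modal values mentioned in the text for each variety.
SUPPORTING = {
    "I1b-Din": {"DYS391": 11, "YCAIIa,b": (21, 21)},
    "I1b-West": {"DYS391": 10, "YCAIIa,b": (21, 21)},
    "I1b2": {"DYS385a,b": (12, 12)},
}

def supporting_markers(hap, variety):
    """Report, for each measured supporting marker, whether it matches the modal."""
    expected = SUPPORTING.get(variety, {})
    return {m: hap[m] == v for m, v in expected.items() if m in hap}

example = {"DYS388": 15, "DYS391": 10, "YCAIIa,b": (21, 21)}
variety = i1b_variety(example)
print(variety, supporting_markers(example, variety))
```

YCAIIa,b is checked first because its 11,21 value is described as the easiest I1b2 signature to spot; DYS388 then separates the Western and Dinaric varieties.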
I1c-Cont is the main variety of haplogroup I1c. The
area of its most dense presence is Northwest Germany and Netherlands, then up into Denmark, and even Southern Sweden and Norway.
A good amount is also found in the British Isles, perhaps brought there by the
Germanic and Scandinavian invader/immigrants in the historic era. I1c-Cont tends to have the high repeat values
at DYS389i,ii; it is modal 23 at DYS390, 14 at DYS437, 10 at DYS445, and 21 at C4. There still seems to be sufficient complexity in
its repeat distributions at a number of markers, and at one time I had
distinguished a “Northern Continental” I1c variety from a “Southern
Continental” I1c variety based on a few markers. I ended this distinction when I separated
“Isles” I1c from this earlier “Southern Continental” I1c population and found
the remaining population looking more like the “Northern” I1c modal
haplotype. But there is still work to do
with this population, which will probably be objectively split again into multiple
continental varieties. Shown in blue are
two extended haplotypes measured from
DNA samples derived for two SNPs, P78+ and P95+, which may be found to divide
I1c-Cont. These examples of I1c came
from males in Germany and Netherlands, respectively.
I1c1-Isles is found almost exclusively in the British Isles, and heavily from Scotland at that. In the SMGF database there were no haplotype
pedigrees of this variety originating on the continent.
However, an Isles I1c haplotype was found M284+ and is therefore of the I1c1
subclade. My original M284+ DNA sample
came from a male with origins in the Basque population, and its extended
haplotype is shown in blue. Slightly
changing the haplotype from the I1c1 modal form does bring forth some Brazilian
and other Latin American matches in the SMGF database, suggesting Iberian
origins. I1c1 is a candidate haplogroup
which may have arrived in the British Isles in pre-Roman times, and perhaps directly from more
Investigating this hypothesis is continuing.
I1c-Root is an unusual variety of I1c with
modal 19,19 at YCAIIa,b. One Iberian
example from this variety has recently been found negative (ancestral) for
M284. It is found spread throughout Western Europe from Iberia and Italy up through southern Scandinavia.
I1 is a robust variety within
haplogroup I which has not yet been placed in the tree, but some selected SNP
tests should establish its final status. The very unusual modal feature
of this variety is the 10,12 motif at DYS455,454 --- two extremely slow
mutating markers. DYS454 has 11 repeats throughout the
rest of Haplogroup I, while DYS455 was previously mentioned to have 8 repeats for
I1a and 11 repeats for I1b and I1c. There is no presently unassigned SNP
within Haplogroup I to become the tag for Ix. It has recently been found to be
positive for the P38 SNP. This variety is also found
well-dispersed in continental Europe from Italy and Iberia, in France and Germany, and up through Denmark.
I1* is a very small variety which recently
was found negative for P30, M223, and P37.2, and hence not I1a, I1c, or I1b. But earlier it was found derived, P38+.
Its 10,10 at DYS459a,b is one of its noticeable modal features. 19,19 or
19,21 at YCAIIa,b are other modal features, along with 12,29 at DYS389i,ii.
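The reasoning that places a sample in I1* (derived for P38 but ancestral for P30, M223, and P37.2) can be sketched as a simple lookup over SNP results. This is a simplified illustration that assumes the sample is already known to belong to Haplogroup I and uses only the SNP-to-clade associations stated in this text; the function name and the result encoding (True for derived, False for ancestral, absent for untested) are hypothetical.

```python
def assign_clade(snp):
    """Assign a clade label from SNP results.

    snp maps SNP name -> True (derived) or False (ancestral); untested SNPs are absent.
    Assumes the sample is already known to be in Haplogroup I.
    """
    if snp.get("P30"):
        return "I1a"
    if snp.get("P37.2"):
        return "I1b"
    if snp.get("M223"):
        return "I1c1" if snp.get("M284") else "I1c"
    if snp.get("P38"):
        # I1* requires all three subclade SNPs to have been tested and found ancestral.
        tested = all(name in snp for name in ("P30", "P37.2", "M223"))
        return "I1*" if tested else "I1 (subclade untested)"
    if snp.get("P38") is False:
        return "I*"
    return "I (P38 untested)"

print(assign_clade({"P30": False, "M223": False, "P37.2": False, "P38": True}))  # I1*
```

A sample that was never tested for P38 comes back as "I (P38 untested)", which mirrors the earlier remark that untested haplotypes cannot be called I* or I1* with confidence.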
database populations of these varieties will be added in a future upgrade.