- The Unz Review

Gene Expression Blog

Genomics

Razib Khan at the Center of Eurasia

Razib Khan • November 30, 2016

• 300 Words • 17 Comments

The Eurogenes blog is running a fundraiser. I chipped in mostly to support his continued blogging. I don’t agree with everything he posts, but the site is a good and valuable resource. “Genome blogging” hasn’t gotten as far as I thought it would back in 2010, mostly because the initial burst of enthusiasm wasn’t followed up by a consistent producer community (in addition to the commentariat). But Eurogenes soldiers on….

In any case, as part of the donation I got an analysis of my genotype. I wasn’t super interested in this because I know a fair amount about my genotype, and the analysis isn’t too informative for someone like me who is >10% East Eurasian. But I thought I would post the PCA because it’s interesting. It’s hard to see in the image above (click it for a larger version), but you notice that I am almost equally positioned between the antipodes of Eurasia (Western Europeans vs. East Asians). Most South Asians occupy a position in between, but definitely skewed toward West Eurasians (the green). But you notice I’m nearly the most “eastern” of the main cluster of South Asians. Depending on how you calculate it I’m between 10% and 20% East Eurasian. This “pulls” me in that direction more than almost all other South Asians. Like a colossus I look east, and I look west, and stand athwart Eurasia bridging the gap.

In any case, if you have $12 to spare, think about donating to Eurogenes. Logistics are at the link.

• Category: Science • Tags: Genomics

Putting the Semantics Before the Horse

Razib Khan • November 15, 2016

• 400 Words • 39 Comments

Listened to an interesting interview this morning with the author of a new book, The Latinos of Asia: How Filipino Americans Break the Rules of Race. There was a lot to agree with and disagree with, but it rang true in many ways for me because I have had a fair number of students with roots in the Philippines. An early portion of the interview illustrates an important dynamic. The author himself has parents from the Philippines, and his university was running a study on alcohol consumption among those of “Asian” ancestry. When he approached to be a participant, though, the researchers said that what they were looking for were people of Japanese, Korean and Chinese ancestry, because they had the right “population structure.”

Naturally this was somewhat offensive. The author pointed out that as a sociologist he believes race is a “social construct.” It is also the case that people from the Philippines occupy a somewhat liminal position between “Asian” and “Latino” identities. As a South Asian I can relate, as I am “Asian”, but not “typically” Asian.

From what I can gather the research group was rather artless in the way they communicated their necessary conditions for their project, but the researchers probably were correct in excluding the author. Alcohol flush reaction segregates in only a finite set of East Asians. With limited resources it is rational for them to exclude individuals from populations where the variants of interest are not present, or at very low frequencies.

The problem is that the author confuses the terminology, “Asian”, with reality. A common tendency in the “post-modern” style of thought is putting primacy in the power of language to shape our perception of reality. The fact is that people from the Philippines have very distinct genetic structure in relation to Northeast Asians. Whether they are categorized as Asian or Latino does not truly impact that fact, unless one is of Chinese background and from the Philippines. To me it is ironic that so many scholars invest so much power in language when language is only an imperfect mapping onto reality.

• Category: Science • Tags: Genomics, Science

Genomics for 6% of Newborns?

Razib Khan • November 2, 2016

• 200 Words • 3 Comments

Would You Want To Know The Secrets Hidden In Your Baby’s Genes? Turns out most people don’t. The article profiles the BabySeq Project, and the offer of whole exome sequencing (exomes are the parts of the genome which code for proteins).

In some ways, the results were discouraging:

One thing Green hadn’t anticipated is how hard it has been to convince new parents to do this screening in the first place. Early research showed the majority of parents were interested in the medical information. But 94 percent of parents Green and his team are approaching are saying no.

So that leaves 6 percent. That’s not a trivial number. There are about 4 million babies born per year, so that would be 240,000 newborns with their exome sequenced. And part of the balking is just fear. People will get over this. Technologies like this have an S-shaped adoption curve. It starts out low at first, and then expands to most of the population. Some portion will always opt-out, and that’s their right.
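As a rough check on the arithmetic above, and a sketch of what an S-shaped adoption curve looks like, here is a small Python example. The 4 million births and the 6 percent uptake come from the post; the logistic curve's ceiling, midpoint and rate are made-up placeholders for illustration, not estimates of anything.

```python
import math

BIRTHS_PER_YEAR = 4_000_000  # figure quoted in the post
UPTAKE_NOW = 0.06            # 6 percent of approached parents say yes


def sequenced_newborns(births: int, uptake: float) -> int:
    """Number of newborns sequenced if `uptake` applied to every birth."""
    return round(births * uptake)


def logistic_uptake(year: float, ceiling: float, midpoint: float, rate: float) -> float:
    """S-shaped (logistic) adoption curve.

    ceiling, midpoint and rate are hypothetical parameters: the long-run
    share of adopters, the year at which half the ceiling is reached, and
    how steep the middle of the curve is.
    """
    return ceiling / (1.0 + math.exp(-rate * (year - midpoint)))


print(sequenced_newborns(BIRTHS_PER_YEAR, UPTAKE_NOW))

# Hypothetical trajectory: slow start, rapid middle, plateau below 100%
# because some portion always opts out.
for year in (0, 5, 10, 15, 20):
    print(year, round(logistic_uptake(year, ceiling=0.9, midpoint=10, rate=0.5), 3))
```

The curve never reaches 100 percent because the ceiling is set below 1, which is one simple way to represent the people who will always opt out.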

Probably the bigger issue is that people need to not overreact. A lot of loss-of-function mutations turn out to be innocuous in many people.

• Category: Science • Tags: Genomics

No One Knows About the Third Human Admixture Into Melanesians

Razib Khan • October 28, 2016

• 300 Words • 6 Comments

There are several reports in the media about a third hominin group besides Denisovans and Neanderthals, and how they contributed to Melanesians. Science News has a sober summary of it all.

Several people have asked me on email and Twitter about this, and I told them to ignore it. The reason I say this is that I was in the room when the presentation was given, and it was clear people were having a hard time following what was going on. Afterward several pretty intelligent statistical human geneticists expressed great confusion.

A few things to take away. First, it was a presentation at a conference. One can’t expect very novel findings to be understood easily in a 15-30 minute talk. Wait for the preprint at least. Second, this was a presentation at a conference. A lot of presentations don’t pan out. If it’s a really surprising theoretical or interpretative finding, as opposed to sequencing a new species (an empirical result), generally I don’t pay close attention. A lot of the time the reason that no one else has stumbled onto the surprising results is that they are wrong, or trivial upon further inspection.

Finally, there are complexities of human history we don’t have a good grasp on. There may indeed be other hominins which contributed to the human gene pool to the point of detectable admixture. I think it is likely. But it is a different thing to have specified all the details.

Basically, I wish the press would set a higher bar for presenting on new results from conferences. It’s not even that it’s not been peer reviewed. Often the results are provisional, and they don’t end up turning into the paper that’s promised.

• Category: Science • Tags: Genomics

The Egyptians Live

Razib Khan • October 28, 2016

• 400 Words • 4 Comments

For various sociocultural reasons ancient Egyptians are a big deal. The pyramids of Giza are about as distant from the time of Augustus as Classical Rome is from us. When the pyramids were rising the world was mostly prehistory. Africa was dominated by hunter-gatherers, as was much of Southeast Asia. The genetic cluster which we recognize as Northern Europeans was only coming into focus, while South Asians as we understand them today may not totally have been a coherent group.

It was a very different time. Down to the present day one population can plausibly claim a connection to ancient Egypt, and that population is the Copts. Though now extinct, their language was a direct descendant of ancient Egyptian, which was not a Semitic tongue. As Christians in a nation which has been Muslim for over 1,000 years, with the period after 1000 A.D. likely majority Muslim, they likely have experienced less genetic perturbation than other groups in the area.

The paper The genetics of East African populations: a Nilo-Saharan component in the African genetic landscape has some Coptic, and other, samples. There are 175,000 markers on this chip. Merged with the HGDP populations you get about 30,000.

The PCA shows that Copts are mostly West Eurasian in ancestry. But they seem shifted to Northeast African Sub-Saharan groups. The Mozabite Berbers are shifted toward West Africans, but their West Eurasian ancestry looks to be more like that of the Copts than West Asians. Though not shifted toward West Asians, the Copts do seem to have affinities with various Sudanese groups.

This TreeMix graph was run on 30,000 markers, with the Sudan skewed sample and some HGDP populations. Not surprisingly, there are gene flow arrows from the Copts to the others. In particular, the two Nubian groups, who have long been resident right to the south of Egypt. But, there is also gene flow from a position between the Copts and Sub-Saharan Africans. Finally, observe that Northeast African Bantus, who have some Nilotic admixture from the Bantu, receive gene flow from a more “European” like population, while the Copts receive gene flow from near the Sardinians. All this points to a complex population history.

It seems likely that the Eurasian backmigration into Sub-Saharan Africa over the Holocene involved several distinct events. Some of them probably date to the period of the Pleistocene.

• Category: Science • Tags: Copts, Genomics

Why 23andMe Is No Longer Leading on Personal Genomics

Razib Khan • October 27, 2016

• 500 Words • 7 Comments

I really admire what 23andMe has done. To a great extent they are the “Uber” of DTC personal genomics. FamilyTree DNA really pioneered the sector in the early 2000s, while The Genographic Project scaled things up massively in the middle 2000s. But in the late 2000s 23andMe brought Silicon Valley “disruption” to the game, pushing into disease and traits in a way that both earlier efforts consciously avoided. We know how that ended.

But it wasn’t all in vain. 23andMe is today a healthy company, and its shoot-first-ask-questions-later actions in the first half of the teens really brought personal genomics into people’s lives.

So what’s going on with stories like this, 23andMe Has Abandoned The Genetic Testing Tech Its Competition Is Banking On:

For years, genetic-testing startup 23andMe was working to develop a cutting-edge technology that could dramatically expand what its customers might learn about their DNA. While the company’s core product, a $199 “spit kit,” can tell you about your health and ancestry based on small bits of your genetic code, tests based on the new technology — called next-generation sequencing — could provide much more comprehensive information, including your potential risks for many diseases.

But 23andMe has given up on the technology for now, BuzzFeed News has learned.

I think one way to understand what’s going on is that though the firm’s consumer face is still as a DTC personal genomics outfit, it is really banking on becoming a genetically savvy pharmaceutical corporation. Genomics is the future, but pharm is the present.

23andMe probably has ~1.5 million genotypes now. They’ll confirm more than 1 million. If they had more than 2 million I assume they would tell us they did. What are they doing with those genotypes? It was always understood by most that 23andMe was increasing its database to the point where they could generate associations that academics could not because of lack of statistical power. The problem now, with more than 1 million genotypes, is that they need phenotypes.

It is much more valuable for 23andMe to get rich data on one customer, than it is to gain one hundred more random genotypes. That’s probably why they’re not sweating that the $199 price point discourages people, especially when those people are getting less than they did in the past. That’s also why they are pulling out of the game in next generation sequencing. Sequencing is basically a commodity business now, and just not as good a return on investment as gearing toward the pharmaceutical market. Sequencing deeply has some benefits, but there is no way 23andMe would be able to subsidize the $1,000 cost of a good 30x genome to get enough of a sample size to return the investment.

None of this is a big secret. A friend of mine was talking about this in the broadest sketches at the 23andMe party at ASHG.

• Category: Science • Tags: 23andMe, Genomics

More Than Can be Imagined in Your Models

Razib Khan • October 8, 2016

• 900 Words • 7 Comments

One of the most incredible journeys that the human species has undergone is the Austronesian expansion of the past 4,000 years. These maritime peoples seem to have emerged from the island of Taiwan, and pushed forward south, west, and east, so that their expansion reached East Africa and the fringes of South America. There is now also some circumstantial evidence that Polynesian contact with the Americas predates the Columbian Exchange. Looking at the map above in hindsight it seems natural to imagine such contacts.

Though where the Austronesians went is incredible, their origins are somewhat more opaque, but rather tantalizing. That is because their original expansion was likely just before the horizon of history. In Guns, Germs, and Steel Jared Diamond alluded to the “express train” vs. “slow boat” models of the expansion. Basically, whether the Lapita peoples rapidly pushed out from Taiwan, or whether there was a long period of coexistence with Melanesians in Near Oceania. Over the past few years genetics seems to have supported the “slow boat” model.

Here is a paper from 2012, Population Genetic Structure and Origins of Native Hawaiians in the Multiethnic Cohort Study:

The “Express Train” and the “Slow Boat” models of Polynesian migration are expected to have uniquely distinct genetic signatures on present day genomes of Native Hawaiians. Under the “Express Train” model, the proportion of admixture in Native Hawaiians of Melanesian and Asian ancestry is expected to be near zero, whereas under the “Slow Boat” model, the proportion of admixture is expected to be substantially greater than zero. To test these two models, we conducted a supervised ADMIXTURE analysis using Papuan and Melanesians as one source population of Polynesians and Han Chinese, She, Cambodian, Japanese, Yakut, and Yi as surrogates for the second source population of Taiwanese aborigines[18],[19]. Importantly, we did not fix ancestry for the Melanesians or Asians and therefore allowed for admixture within either ancestral groups–thus, mitigating bias by earlier admixture processes and allowing for accurate clusters of ancestry membership. We set K = 2 and estimated in 40 100% Native Hawaiians an average of 32% and 68% of their genomes to be derived from Melanesian and Asian origins, respectively (Figure 4). This notable proportion of Melanesian admixture (32%) among Native Hawaiians, substantially greater than zero, lends support of the “Slow Boat” model of ancestral origins.

This is not an isolated study. Y chromosomes indicate substantial Melanesian admixture, while the mtDNA does not. One inference then was a “slow boat” model predicated on matrifocality. That is, expanding Polynesian groups were centered around matrilineal lineages, and absorbed Melanesian men into their communities. The above research was from a Hawaiian data set, but the results are consistent across Polynesia in relation to the proportion of Melanesian ancestry.

Case closed? Not so fast! Ancient DNA has now been brought to the question, and fundamentally changed our perceptions. Genomic insights into the peopling of the Southwest Pacific:

The appearance of people associated with the Lapita culture in the South Pacific around 3,000 years ago1 marked the beginning of the last major human dispersal to unpopulated lands. However, the relationship of these pioneers to the long-established Papuan people of the New Guinea region is unclear. Here we present genome-wide ancient DNA data from three individuals from Vanuatu (about 3,100–2,700 years before present) and one from Tonga (about 2,700–2,300 years before present), and analyse them with data from 778 present-day East Asians and Oceanians. Today, indigenous people of the South Pacific harbour a mixture of ancestry from Papuans and a population of East Asian origin that no longer exists in unmixed form, but is a match to the ancient individuals. Most analyses have interpreted the minimum of twenty-five per cent Papuan ancestry in the region today as evidence that the first humans to reach Remote Oceania, including Polynesia, were derived from population mixtures near New Guinea, before their further expansion into Remote Oceanian…our finding that the ancient individuals had little to no Papuan ancestry implies that later human population movements spread Papuan ancestry through the South Pacific after the first peopling of the islands.

These results strongly indicate that the original Lapita migration did not mix with Melanesians. And, the ancient samples share common ancestry with modern Polynesians, so that their heritage persists down to the present. Looking at the distribution of Melanesian ancestry they concluded this admixture occurred on the order of ~1,500 years before the present (their intervals were wide, but the ancient samples serve as a boundary). Additionally, in line with the Y and mtDNA the X chromosome indicated more of the ancient ancestry than the autosome. The authors conclude that “it is also possible that some of these patterns reflect a scenario in which the later movement of Papuan ancestry into Remote Oceania was largely mediated by males who then mixed with resident females.”

The take home message then is that we need to be more modest with our models. Without ancient DNA it seems likely that we would not have stumbled onto this result; the ancestry deconvolution methods which date admixture have wide confidence intervals when you go back far in time.

• Category: Science • Tags: Genetics, Genomics

How to Look at Population Structure

Razib Khan • October 3, 2016

• 800 Words • 7 Comments

A friend asked me about population structure, and methods to ferret it out and classify it. So here is a quick survey on the major methods I’m familiar with/utilize now and then. I’ll go roughly in chronological order.

First, you have trees. These are pretty popular for macroevolutionary relationships, but on the population genetic scale (intraspecific, microevolutionary) you’re mostly talking about representing distances between groups in a tree format. You saw this in The History and Geography of Human Genes, where genetic distances in the form of Fst values (the proportion of genetic variation that lies between two groups rather than within them) were used as distance inputs.
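To make the Fst idea concrete, here is a minimal Python sketch that computes a Wright-style Fst for two populations from their allele frequencies. It assumes biallelic markers, equal weighting of the two groups, and true population frequencies (no sample-size correction), so it is a simplification of the estimators used in practice. The function name and the toy frequencies are made up for illustration.

```python
import numpy as np


def fst_two_pops(p1, p2):
    """Fst = (H_total - H_within) / H_total, summed over loci.

    p1 and p2 are arrays of allele frequencies for the same biallelic
    markers in two populations. Both groups are weighted equally and no
    finite-sample correction is applied (an assumption of this sketch).
    At least one marker must be polymorphic overall, otherwise the
    denominator is zero.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    p_bar = (p1 + p2) / 2.0
    # Expected heterozygosity if the two groups were one random-mating pool.
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    # Average expected heterozygosity within each group.
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return float((h_total.sum() - h_within.sum()) / h_total.sum())


# Toy frequencies: identical groups, then strongly diverged groups.
print(fst_two_pops([0.2, 0.7], [0.2, 0.7]))
print(fst_two_pops([0.1], [0.9]))
```

A matrix of pairwise values like these is the kind of distance input a tree-building method would take.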

A problem with trees is that they don’t model gene flow, a major dynamic on a microevolutionary scale. Also, complex relationships can get elided in tree frameworks, and as you add more and more populations you often end up with an incomprehensible fan-like topology.

Then you have principal component analysis (PCA) and related methods (e.g., multidimensional scaling, which is very different in the sausage-making but generates a similar output). Like trees, this is a visualization of the variation, in this case on a two dimensional plot (please don’t bring up three dimensional PCA, there’s no such thing until holograms show up).

The problem with PCA is that different types of dynamics can lead to the same result. For example, someone who is an F1 of two distinct groups occupies the same position as a population which happens to occupy a genetic position between two groups. Additionally, by constraining the variation into two dimensions, one can mislead in terms of relationships. There are many dimensions, but operationally you focus on two at a time.

A paper of interest, Population Structure and Eigenanalysis.
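The F1 ambiguity described above can be illustrated with a toy simulation in Python. This is only a sketch under simple assumptions: unlinked biallelic markers, genotypes drawn as binomial counts, and a plain SVD on mean-centered genotypes rather than the normalization used in the eigenanalysis paper. All frequencies, sample sizes and names are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 5000

# Hypothetical allele frequencies for two groups and a cline population
# sitting exactly halfway between them.
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = np.clip(p_a + rng.normal(0.0, 0.2, n_snps), 0.05, 0.95)
p_mid = (p_a + p_b) / 2.0


def sample(p, n):
    """Draw n diploid genotypes (0, 1 or 2 copies) per marker."""
    return rng.binomial(2, p, size=(n, p.size)).astype(float)


group_a = sample(p_a, 20)
group_b = sample(p_b, 20)
# F1: one chromosome from a group A parent, one from a group B parent.
f1 = (rng.binomial(1, p_a) + rng.binomial(1, p_b)).astype(float)
# One person from the stabilized intermediate population.
cline = sample(p_mid, 1)[0]

# PCA via SVD on the two reference groups, then project the test individuals.
reference = np.vstack([group_a, group_b])
center = reference.mean(axis=0)
_, _, vt = np.linalg.svd(reference - center, full_matrices=False)
pc1 = vt[0]


def pc1_position(x):
    """Position on PC1, rescaled so group A averages 0 and group B averages 1."""
    score = (x - center) @ pc1
    lo = ((group_a - center) @ pc1).mean()
    hi = ((group_b - center) @ pc1).mean()
    return float((score - lo) / (hi - lo))


print("F1 individual:   ", round(pc1_position(f1), 2))
print("cline individual:", round(pc1_position(cline), 2))
```

Both individuals land near the midpoint of PC1 even though their histories are entirely different, which is why PCA alone cannot tell them apart.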

Next you have model-based clustering introduced in Jonathan Pritchard’s Inference of Population Structure Using Multilocus Genotype Data. There are many flavors of this, but they operate under the same framework. You have a model of population dynamics, and see how the genotype data can be explained by parameters of the model. Of particular interest is assignment to one of K populations, which can be combined to explain the variation in the data.

Unlike PCA these model-based methods are rather good at identifying people who are first generation mixes, as opposed to those from stabilized groups along a cline. But, they also produce artifacts, because they are quite sensitive to the input data, and lend themselves to cherry-picking.

Earlier I said that one problem with the tree methods is that they don’t model gene flow. Joe Pickrell’s TreeMix does so. Like the original tree methods, and unlike PCA or unsupervised model-based clustering, you specify a set of populations. Then you compare the populations in terms of their genetic distance, and fit them to a tree, but add migration parameters to that tree where the fit between the tree and the data is most tenuous.

All visualizations are deformations of reality. TreeMix attempts to mitigate this somewhat by introducing another representation, that of migration.

Next we have local ancestry methods. By local ancestry, basically we mean methods which can assign ancestry to particular regions of the genome. While tree methods measure differences across pooled populations, PCA and model-based methods compare genotypes between individuals (this is a simplification, but bear with me). Local ancestry methods, like RFMix, compare regions of the genome with each other.

Related to, but not exactly the same as, local ancestry methods are haplotype-based methods. In particular, I’m thinking of FineStructure and its related methods. These leverage variation across the genome in terms of haplotypes, rather than just looking at genotypes. They also tend to benefit from phasing, for obvious reasons. FineStructure and its relatives tend to need more marker density than model-based methods, which require more marker density than PCA, which requires more marker density than tree-based methods. These haplotype-based methods allow for correction of and accounting for forces such as genetic drift, which tend to skew results in other methods.

Finally, there is the AdmixTools framework, which is good for testing very explicit hypotheses. While many of the above methods, such as TreeMix and unsupervised model-based clustering, explore an almost open-ended space of structure possibilities, the methods in AdmixTools exist in large part to test narrowly delimited models. This goes to the fact that many of these methods are complementary, and you should use them together to arrive at a robust result. For example, if you are assigning populations for TreeMix, you should use PCA and model-based clustering to make sure that the populations are clear and distinct, and outliers are removed.
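As an example of what testing an explicit hypothesis can look like, here is a bare-bones Python sketch of an f3-style statistic, one of the tests associated with that family of tools. It is an assumption of this sketch that true population allele frequencies are available; the real software corrects for finite sample sizes and uses a block jackknife for standard errors, none of which is done here. The frequencies are invented.

```python
import numpy as np


def f3(target, source1, source2):
    """Mean over markers of (c - a) * (c - b).

    c, a and b are allele frequencies in the target and the two proposed
    source populations. A clearly negative value is the signature of the
    target being a mixture of populations related to the two sources.
    No sample-size correction or standard error is computed here.
    """
    c, a, b = (np.asarray(x, dtype=float) for x in (target, source1, source2))
    return float(np.mean((c - a) * (c - b)))


# Toy frequencies: the target sits exactly between the two sources.
source_a = [0.1, 0.9, 0.3]
source_b = [0.9, 0.1, 0.7]
mixed = [0.5, 0.5, 0.5]

print(f3(mixed, source_a, source_b))     # negative: consistent with admixture
print(f3(source_a, source_a, source_b))  # zero: target identical to a source
```

The point is the narrowness of the question: you name a target and two sources and get one number, rather than exploring an open-ended space of structures.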

There’s a lot I left out, but many of the other methods are just twists on the ones above.

• Category: Science • Tags: Genomics

The Sex Ratio Is in the X

Razib Khan • September 30, 2016

• 900 Words • 20 Comments

In The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World the archaeologist David Anthony outlines the thesis that migrations from the west Eurasian steppe during the Bronze Age reshaped the culture of Northern Europe. When Anthony published the book, which you should really read if you are interested in this topic, it was a somewhat heterodox position. Though his intellectual pedigree is of long standing, arguably going back centuries, and extending down to the present with J. P. Mallory’s In Search of the Indo-Europeans, in the past few decades diffusion of a different sort has been paramount. In particular, the thesis that Indo-Europeans arrived with the first agriculturalists was of late ascendant, with some support being received from phylogenetic modeling of language evolution.

Anthony’s thesis in a way was a halfway house between early modern migrationism from the Eurasian steppe and the newer theories. He proposes that the influence from the steppe via the Kurgan people was due to elite dominance and cultural emulation. An analogy here might be that of Hungary, where a Ugric speaking elite eventually imparted to the people a language, but very few distinct genes.

Eventually Anthony collaborated with some geneticists, and provided samples for DNA analysis. The results led to a resurrection of migrationism: Massive migration from the steppe was a source for Indo-European languages in Europe. From what I have heard Anthony’s reaction was one of some shock at the magnitude of the genetic change.

That’s the high level view. But what about the details? Over the past few years I’ve highlighted work that indicates that many Y chromosomal lineages are star-shaped. That is, they underwent recent demographic expansion. Recent as in on the order of ~5,000 years ago. But the Y chromosome is just one locus. I’ve always been curious about results from the X because the X also gives you good sex specific dynamics; it spends 2/3 of its time in females, and 1/3 of its time in males.

Amy Goldberg has done so, Familial migration of the Neolithic contrasts massive male migration during Bronze Age in Europe inferred from ancient X chromosomes:

Dramatic events in human prehistory, such as the spread of agriculture to Europe from Anatolia and the Late Neolithic/Bronze Age (LNBA) migration from the Pontic-Caspian steppe, can be investigated using patterns of genetic variation among the people that lived in those times. In particular, studies of differing female and male demographic histories on the basis of ancient genomes can provide information about complexities of social structures and cultural interactions in prehistoric populations. We use a mechanistic admixture model to compare the sex-specifically-inherited X chromosome to the autosomes in 20 early Neolithic and 16 LNBA human remains. Contrary to previous hypotheses suggested by the patrilocality of many agricultural populations, we find no evidence of sex-biased admixture during the migration that spread farming across Europe during the early Neolithic. For later migrations from the Pontic steppe during the LNBA, however, we estimate a dramatic male bias, with ~5-14 migrating males for every migrating female. We find evidence of ongoing, primarily male, migration from the steppe to central Europe over a period of multiple generations, with a level of sex bias that excludes a pulse migration during a single generation. The contrasting patterns of sex-specific migration during these two migrations suggest a view of differing cultural histories in which the Neolithic transition was driven by mass migration of both males and females in roughly equal numbers, perhaps whole families, whereas the later Bronze Age migration and cultural shift were instead driven by male migration, potentially connected to new technology and conquest.

The figure to the left shows the inferences made in regards to the quantitative contribution of farmer males and females, and steppe males and females, to Bronze Age European populations. In short, it looks like the population of Northern Europe derives from a fusion of males from the steppe, and native females, who themselves arose out of a group of peoples which synthesized the ancestry of European hunter-gatherers and West Asian farmers.

But one of the more interesting things about this preprint is that the admixture can’t be modeled by a single pulse event. It seems that there were repeated migrations out of the steppe over multiple generations. But, these men did not bring women, at least in large numbers. The preprint lays out the common sense reason: these were mobile groups, probably bands of men with weapons. If your game is predation on other humans, having a baggage train of women and children is not optimal.
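The intuition for why the X is informative can be sketched with the standard back-of-the-envelope approximation that follows from the 2/3 and 1/3 weighting. This is not the mechanistic model used in the preprint, only a simplified long-run approximation, and the input fractions below are hypothetical.

```python
def autosomal_fraction(female_frac: float, male_frac: float) -> float:
    """Expected autosomal ancestry from a source population.

    female_frac and male_frac are the shares of mothers and fathers in
    the mixing population who come from that source (hypothetical inputs).
    Autosomes are inherited equally through both sexes.
    """
    return (female_frac + male_frac) / 2.0


def x_fraction(female_frac: float, male_frac: float) -> float:
    """Approximate X-chromosome ancestry from the same source.

    Uses the 2/3 female, 1/3 male weighting of the X. This is a
    simplified approximation, not the preprint's admixture model.
    """
    return (2.0 * female_frac + male_frac) / 3.0


# Hypothetical male-biased scenario: 90% of fathers but only 10% of
# mothers derive from the steppe.
print(autosomal_fraction(0.1, 0.9))  # autosomes look half steppe
print(x_fraction(0.1, 0.9))          # the X looks much less steppe
```

When the two sexes contribute equally the X and the autosomes agree, so a gap between them is what points to sex-biased migration.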

There is a historical analog to what might have happened. Argentina is a nation where mitochondrial lineages show a lot of Amerindian heritage. But the whole genome far less. This is because of male biased migration from Europe. One generation of this would result in a mixed population, but many generations would slowly replace the whole genome.
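The contrast between one generation and many can be written as a simple recurrence. The sketch below assumes a deliberately crude model: every generation all mothers come from the resident population, and a fixed share of fathers are newly arrived migrants with no native ancestry. The migrant share is a made-up number, not an estimate for Argentina or the steppe.

```python
def native_ancestry_over_time(male_migrant_frac: float, generations: int):
    """Track native autosomal and mtDNA ancestry under male-only migration.

    Each generation, mothers are all residents and a fraction
    `male_migrant_frac` of fathers are migrants carrying no native
    ancestry (a hypothetical, constant rate). Returns a list of
    (autosomal_native, mtdna_native) tuples, one per generation.
    """
    autosomal = 1.0  # residents start fully native
    mtdna = 1.0      # mtDNA passes only through mothers, who are all residents
    history = []
    for _ in range(generations):
        from_mothers = 0.5 * autosomal
        from_fathers = 0.5 * (1.0 - male_migrant_frac) * autosomal
        autosomal = from_mothers + from_fathers
        history.append((autosomal, mtdna))
    return history


history = native_ancestry_over_time(male_migrant_frac=0.5, generations=10)
for generation, (auto, mt) in enumerate(history, start=1):
    print(generation, round(auto, 3), mt)
```

After one generation the population is still mostly native on the autosomes, but the native share shrinks by the same factor every generation while the mitochondrial lineages stay entirely native.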

We will never know in concrete terms what social-political organizations the Indo-Europeans set up once they conquered the plains of Northern Europe, because we don’t have writing. But it seems unlikely that we’re talking about only band or clan level scales of organization. Rather, it was likely that an ‘Indo-European commonwealth’ of some sort existed initially, predicated on domination and extraction of value from the natives. In such a fashion one can imagine Europe being a draw for enterprising males from the steppe. This could also explain likely ‘back migration’ over time, leading to ‘European’ ancestry among later steppe cultures.

• Category: Science • Tags: Ancient DNA, Genomics

The Sequenced Generation

Razib Khan • September 22, 2016

• 100 Words • 8 Comments

There was a time, five years ago or so, when we knew all the humans who had been sequenced. Or at least most of them. But now we’re coming into the period when the first sequenced animals of any given species are starting to die. Above is Cinnamon, the first sequenced cat, who is no longer with us. And some day the hour will come when Craig Venter, who was a major contributor to the first human genome, will no longer be with us.

Something to consider.

• Category: Science • Tags: Genomics

Vlogging Human Evolutionary Genomics

Razib Khan • September 22, 2016

• 100 Words • 1 Comment

The Estonian Biocentre has been one of the best resources in human population genomics, because their policy under Mait Metspalu seems to be to release the data once it’s published. Today I went and checked the site, and noticed a vlog accompanying their Nature paper, Genomic analyses inform on migration events during the peopling of Eurasia.

Well done.

• Category: Science • Tags: Genomics

Admixture Analysis Isn't Wrong, It Misleads

Razib Khan • September 19, 2016

• 600 Words • 2 Comments

The above results are from Ancestry. You can see here 4% Melanesian. This is common in South Asians. And it’s not an error in the method. Rather, it is a natural outcome of the methods used to generate admixture profiles.

Basically what’s going on is this:

1) You have data. In this case, the data are your own genotypes, as well as that of a set of individuals which represent world genetic variation, and are categorized into discrete populations.

2) You have a model or set of models. These models have different parameters.

3) You look at the data you have, and pick the parameters which best explain the data given the model.
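The three steps above can be sketched in Python for the simplest possible case: one individual, two reference populations, and a single parameter (the share of ancestry from the first reference). This is a toy under stated assumptions, namely unlinked markers, Hardy-Weinberg proportions, known reference allele frequencies, and a brute-force grid search. It is not the algorithm any particular company uses, and all values are simulated.

```python
import numpy as np


def log_likelihood(genotypes, q, p_ref1, p_ref2):
    """Log-likelihood of genotypes (0/1/2 copies) given ancestry share q.

    The individual's allele frequency at each marker is modeled as a
    q-weighted mix of the two reference frequencies, with genotypes in
    Hardy-Weinberg proportions. Constant terms are dropped.
    """
    p = q * p_ref1 + (1.0 - q) * p_ref2
    return float(np.sum(genotypes * np.log(p) + (2 - genotypes) * np.log(1.0 - p)))


def fit_admixture(genotypes, p_ref1, p_ref2, n_grid=101):
    """Step 3: pick the q on a grid that best explains the data."""
    grid = np.linspace(0.0, 1.0, n_grid)
    scores = [log_likelihood(genotypes, q, p_ref1, p_ref2) for q in grid]
    return float(grid[int(np.argmax(scores))])


# Step 1: data. Simulated reference frequencies and one simulated person
# whose true ancestry is 80% from the first reference.
rng = np.random.default_rng(1)
n_snps = 5000
p_ref1 = rng.uniform(0.05, 0.95, n_snps)
p_ref2 = rng.uniform(0.05, 0.95, n_snps)
true_q = 0.8
genotypes = rng.binomial(2, true_q * p_ref1 + (1.0 - true_q) * p_ref2)

# Steps 2 and 3: the model is the mixture above, the parameter is q.
estimate = fit_admixture(genotypes, p_ref1, p_ref2)
print("true share:", true_q, "estimated share:", estimate)
```

When the reference panels really do describe the person's ancestry, as here, a few thousand markers recover the proportion well.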

If you have 100,000 or more markers that’s more than enough genotype data for individuals. The models themselves are quite stylized (e.g., HWE random mating sets of populations), but close enough to reality to give good results in many cases. For example, Ashkenazi Jews are often assigned to be ~100% Ashkenazi Jewish through these methods.

Then again, Ashkenazi Jews are a good test case. This is a population which went through a bottleneck about 500 to 1,000 years ago, and has been reasonably endogamous most of this time. Additionally, it’s not extremely structured due to inbreeding in different clan lineages. Though cousin marriage and uncle-niece marriage has been practiced by Ashkenazi Jews, the runs of homozygosity you see in Jewish genomes is not such that indicates a highly inbred population, as is common in the Middle East or South Asia. Rather, there are lots of medium length segments identical by descent across individuals.

The Ashkenazi Jewish population is rather simple, and it is actually a rather clear and distinct population cluster. It stands to reason that when you create an Ashkenazi Jewish reference panel in your training data set it’s a pretty good match to the individuals you are testing.

The problems occur when you try to generate clusters and ancestry assignments for populations which are not so clear and distinct. Why do South Asians routinely come out as part Melanesian or Polynesian? This post was prompted by a Facebook thread where a South Asian customer of Ancestry was interested to see she had Polynesian ancestry. The reality is she almost certainly does not have Polynesian ancestry.

What’s going on is that the reference panel for South Asians used by many of the DTC genomics companies is not diverse enough to capture South Asian genetic diversity. There is an element of South Asian ancestry, “Ancestral South Indian” or ASI, which has deep shared ancestry with populations across Southern Eurasia and out toward Oceania. The admixture analysis method is searching through the reference panels for combinations of genotypes which can explain individual genetic variation. Since the South Asian training set is insufficient to explain all the South Asian variation the algorithms are filling in the balance of the variation with the closest available proxies to the “ghost clusters.”
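To make the "closest available proxy" effect concrete, here is a toy sketch in Python. It is not how Ancestry or ADMIXTURE actually work (those fit a likelihood model to genotypes); it simply finds the non-negative mixing proportions, summing to one, that best reproduce a customer's expected allele frequencies from a set of reference panels by least squares. Everything here is an assumption for illustration: the panel frequencies are random numbers, the "ghost" component is modeled as Melanesian frequencies plus a little noise to stand in for deep shared ancestry, and the 90/10 split is arbitrary.

```python
from itertools import product

import numpy as np


def fit_admixture(target, panels, steps=50):
    """Grid-search mixing proportions (each >= 0, summing to 1) that minimise
    the squared error between the target and the mixed panel frequencies."""
    names = list(panels)
    P = np.array([panels[name] for name in names])  # (n_panels, n_markers)
    gram = P @ P.T
    cross = P @ target
    best_q, best_err = None, np.inf
    # Enumerate every point on the simplex with resolution 1/steps.
    for head in product(range(steps + 1), repeat=len(names) - 1):
        if sum(head) > steps:
            continue
        q = np.array(head + (steps - sum(head),)) / steps
        # Squared error up to a constant that does not depend on q.
        err = q @ gram @ q - 2 * q @ cross
        if err < best_err:
            best_q, best_err = q, err
    return dict(zip(names, best_q))


rng = np.random.default_rng(0)
n_markers = 5000

# Hypothetical reference-panel allele frequencies (random, for illustration).
european = rng.uniform(0.1, 0.9, n_markers)
south_asian = rng.uniform(0.1, 0.9, n_markers)
melanesian = rng.uniform(0.1, 0.9, n_markers)

# Assumed "ghost" ancestry: close to the Melanesian panel, but not identical.
ghost = np.clip(melanesian + rng.normal(0, 0.03, n_markers), 0.01, 0.99)

# The customer is 90% "panel-like" South Asian and 10% ghost ancestry.
customer = 0.9 * south_asian + 0.1 * ghost

narrow_panels = {
    "European": european,
    "South Asian": south_asian,
    "Melanesian": melanesian,
}
narrow = fit_admixture(customer, narrow_panels)
full = fit_admixture(customer, {**narrow_panels, "Ghost": ghost})

print("without ghost panel:", narrow)
print("with ghost panel:   ", full)
```

With the ghost panel missing, the fit has nowhere to put that share of the variation except the Melanesian panel. Once a panel that actually represents the ghost component is added, the Melanesian share goes away.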

The method is constrained and conditioned on two things:

1) The data being put in, which is often insufficient.

2) The set of populations that it is forced to work with to generate the combinations in individuals (the parameter values in the model to explain the data), which is often insufficient or artificial.

What I mean by the last is that many of the genetic clusters are not taxonomically equivalent. “South Asian” ancestry is much more diverse and diffuse than “Melanesian” ancestry. This is why Melanesian ancestry can explain South Asian ancestry, but generally not the reverse.

• Category: Science • Tags: Genetics, Genomics

The Neandertal-Modern Cultural Synthesis

Razib Khan • September 18, 2016

• 500 Words • 2 Comments

A new paper in PNAS, Palaeoproteomic evidence identifies archaic hominins associated with the Châtelperronian at the Grotte du Renne, weighs in on the question of whether the makers of the Châtelperronian culture were Neandertals, with an answer in the affirmative in this case:

The displacement of Neandertals by anatomically modern humans (AMHs) 50,000–40,000 y ago in Europe has considerable biological and behavioral implications. The Châtelperronian at the Grotte du Renne (France) takes a central role in models explaining the transition, but the association of hominin fossils at this site with the Châtelperronian is debated. Here we identify additional hominin specimens at the site through proteomic zooarchaeology by mass spectrometry screening and obtain molecular (ancient DNA, ancient proteins) and chronometric data to demonstrate that these represent Neandertals that date to the Châtelperronian. The identification of an amino acid sequence specific to a clade within the genus Homo demonstrates the potential of palaeoproteomic analysis in the study of hominin taxonomy in the Late Pleistocene and warrants further exploration.

The details about stratigraphy are beyond me. But the protein and mtDNA evidence is pretty conclusive in my opinion that there are Neandertal individuals in this assemblage. Therefore, assuming their stratigraphy is correct, what you see in the Châtelperronian may be a cultural influence upon Neandertals by anatomically modern humans who were pushing into Europe at this time.

But cultural influence may not be the only dynamic at work. In The 10,000 Year Explosion: How Civilization Accelerated Human Evolution Greg Cochran hypothesized that Châtelperronian culture may have been a vector for Neandertal genes coming into modern human populations. And now we know that this isn’t always one directional. That is, just as modern humans absorbed genes from “archaic” populations, so archaic groups absorbed ancestry from modern populations (or at least humans closer to the main stem of modern humanity).

In The Third Chimpanzee Jared Diamond posited that the Châtelperronian Neandertals were analogous to native peoples in the New World such as the Cherokee, who adopted many aspects of European settler culture in their attempt to resist cultural absorption and marginalization. But one dynamic we need to remember about these tribes is that they also had a lot of European ancestry, in part because of the rapidly unbalanced population sizes. It seems entirely likely, as some have posited, that the last “Neandertal” populations were also substantially admixed. Therefore, it is not entirely surprising that they would also tend to exhibit cultural features more commonly found among modern humans.

My prediction is that when whole genomes of Châtelperronian Neandertals are available it is highly likely that they will often show evidence of modern human ancestry.

Note: Diamond’s The Third Chimpanzee is in my opinion a very underrated work. It is a bit dated today, but I still think it is quite worth reading.

• Category: Science • Tags: Evolution, Genomics, Human Evolution

The Genetic Structure of Denmark with SNP Data

Razib Khan • September 18, 2016

• 300 Words • 9 Comments

We live in an age when we have a lot of SNP data on a lot of populations. This allows for a very fine level of granularity in terms of analysis. To illustrate, Genetics recently published Nationwide Genomic Study in Denmark Reveals Remarkable Population Homogeneity, which analyzes hundreds of Danes with hundreds of thousands of SNPs. In The History and Geography of Human Genes, published twenty years ago, most of the analysis was grounded in pairwise comparisons between populations using hundreds of markers. Not only do we have much greater resources in terms of data, but we have various analytic frameworks which in concert allow for richer, more precise inferences. Today we can actually assign ancestry to regions of an individual’s genome!

The first author of the Genetics paper, Yorgos Athanasiadis, has put up a post where he does a walk-through of the various methods step-by-step. It’s useful for anyone who has an inclination to do something similar for another data set.

This portion jumped out at me:

The six Danish regions showed highest affinity with a cluster that we call BRI(tish), because it’s mostly made up by British samples, followed by the NOR(wegian) and SWE(dish) clusters. This is not to say that Danes are about 40% made up by British DNA, as some enthusiastic twitters have mentioned. The BRI cluster also includes German, Belgian and Dutch samples, meaning that it might as well be reflecting some other ethnic component; in lack of a better name, we called it BRI. Another interesting fact is that because of the presence of this cluster, haplotype sharing with other Scandinavians was about 40%….

I think the implications of this are something I’m going to have to chew on for a while. Some of these genetics results aren’t straightforward in terms of what they mean in a vacuum, though the historical inference is obvious.

Anyway, read the whole thing, On the genetic structure of Denmark.

• Category: Science • Tags: Genomics

What Little We Know About Megafauna

Razib Khan • September 16, 2016

• 300 Words • 7 Comments

Ewen Callaway reports from a conference in England, Elephant history rewritten by ancient genomes:

Modern elephants are classified into three species: the Asian elephant (Elephas maximus) and two African elephants — the forest-dwellers (Loxodonta cyclotis) and those that live in the savannah (Loxodonta africana). The division of the African elephants, originally considered a single species, was confirmed only in 2010.

Scientists had assumed from fossil evidence that an ancient predecessor called the straight-tusked elephant (Paleoloxodon antiquus), which lived in European forests until around 100,000 years ago, was a close relative of Asian elephants.

In fact, this ancient species is most closely related to African forest elephants, a genetic analysis now reveals. Even more surprising, living forest elephants in the Congo Basin are closer kin to the extinct species than they are to today’s African savannah-dwellers. And, together with newly announced genomes from ancient mammoths, the analysis also reveals that many different elephant and mammoth species interbred in the past.

…
Palkopoulou and her colleagues also revealed the genomes of other animals, including four woolly mammoths (Mammuthus primigenius) and, for the first time, the whole-genome sequences of a Columbian mammoth (Mammuthus columbi) from North America and two North American mastodons (Mammut americanum).

The researchers found evidence that many of the different elephant and mammoth species had interbred. Straight-tusked elephants mated with both Asian elephants and woolly mammoths. And African savannah and forest elephants, who are known to interbreed today — hybrids of the two species live in some parts of the Democratic Republic of Congo and elsewhere — also seem to have interbred in the distant past. Palkopoulou hopes to work out when these interbreeding episodes happened.

15x coverage. This is awesome. And incredible.

• Category: Science • Tags: Genomics

Living in the Age of Structure

Razib Khan • September 5, 2016

• 400 Words • 6 Comments

Jonathan Novembre and Benjamin Peter have posted a preprint of a review, Recent advances in the study of fine-scale population structure in humans, which readers will find useful. In particular, the citations are a gold-mine for anyone attempting to navigate this literature.

The figure above from their preprint illustrates the number of markers needed to differentiate populations in Europe. Recall that genetic variation within Europe, especially Northern Europe, is rather low. It’s pretty clear that if you sample 100 SNPs from the human genome you can’t differentiate much. At 1,000 SNPs structure begins to appear, and this is starting to be well resolved by 10,000 SNPs. By 100,000 SNPs you are pretty much going to hit diminishing returns for regional diversity on Europe level scales. The pattern differs by method. PCA for example does much better with 10,000 SNPs in Europe than the model-based clustering (e.g., ADMIXTURE) in my experience, but the two are comparable as you near 100,000 SNPs. Beyond 100,000 SNPs there is not that much increase in resolution for genome-wide methods that rely on genotypes at this level of genetic diversity.
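The qualitative pattern is easy to reproduce in a toy simulation. The sketch below is not the analysis from the preprint. It assumes two populations that have drifted apart under a Balding-Nichols model with a small Fst (0.005, an arbitrary value chosen to mimic weak within-continent structure), 50 sampled individuals from each, and unlinked markers. It then runs a plain PCA via SVD and reports how far apart the two groups sit on PC1, in units of the pooled within-group standard deviation.

```python
import numpy as np


def pc1_separation(n_snps, fst=0.005, n_per_pop=50, seed=0):
    """Simulate two weakly diverged populations and return the distance
    between their PC1 means, scaled by the pooled within-group SD."""
    rng = np.random.default_rng(seed)
    # Ancestral allele frequencies, kept away from 0 and 1.
    ancestral = rng.uniform(0.1, 0.9, n_snps)
    # Balding-Nichols: each population's frequency is Beta-distributed
    # around the ancestral frequency, with spread set by Fst.
    a = ancestral * (1 - fst) / fst
    b = (1 - ancestral) * (1 - fst) / fst
    freqs = rng.beta(a, b, size=(2, n_snps))
    # Diploid genotypes coded as 0/1/2 copies of the allele.
    geno = np.vstack(
        [rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs]
    ).astype(float)
    geno -= geno.mean(axis=0)  # centre each marker
    u, s, _ = np.linalg.svd(geno, full_matrices=False)
    pc1 = u[:, 0] * s[0]
    first, second = pc1[:n_per_pop], pc1[n_per_pop:]
    pooled_sd = np.sqrt((first.var(ddof=1) + second.var(ddof=1)) / 2)
    return abs(first.mean() - second.mean()) / pooled_sd


for n_snps in (100, 1_000, 10_000):
    print(n_snps, "SNPs -> PC1 separation:", round(pc1_separation(n_snps), 2))
```

The exact numbers depend on the assumed Fst, sample size, and seed, but the direction of the effect is what matters: with a fixed number of individuals, structure that is invisible with a small marker set becomes obvious as the marker count grows.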

Of course, if you want really fine-scale differences, between villages for example, more markers, and perhaps whole-genome sequencing that can pick up rare variants, are useful. In other words, there are cases one can imagine where more data than is normally available on SNP-chips is useful. But these are definite boundary conditions. Once you get to the point of distinguishing branches of extended families you really can’t collapse the genealogies any further.

Another instance where more marker density, or the power of high coverage whole-genome sequencing, might be useful is for local ancestry deconvolution. If you’re assigning ancestry to windows of the genome then your marker density is going to be a limiting factor, as you might be slicing the 100,000 SNPs into 1,000 subunits.

Finally, there’s the issue of the models being tested. Novembre and Peter allude to the fact that many of these models posit stylized discrete pulse admixtures. As it turns out in some cases ancient DNA seems to have confirmed that something like this went on. That is, long periods of local stability and panmixia, followed by genetic turnover and admixture. But they note that there isn’t a good simulation framework where demographic scenarios are allowed to generate in silico data for testing new models. In other words, biologists are currently having to rely on “natural experiments.”

• Category: Science • Tags: Genomics

Genotype Them All, and Let GWAS Sort It Out

Razib Khan • August 28, 2016

• 600 Words • 4 Comments

About thirteen years ago I expressed the opinion that an understanding of population structure will become a matter of intellectual curiosity once we have a better understanding of the genetic basis of characteristics. A friend, who was a statistical geneticist, told me that this was unlikely. We were unlikely to capture the ability to predict all outcomes well enough on even highly heritable complex traits to simply discard population structure information. Some of this is not due to genetics; different populations may expose themselves to different environmental conditions. For example, it would be useful to know which individuals in the CEU white European American data set are practicing Mormons, and which are not, because Mormonism tends to result in a lot of behavior modification.

But some of the concern about population structure has to do with the fact that genetic background matters, and we are unlikely to ever have total omniscience as to the nature of genetic interactions and dependencies. By this, I mean that if we have a strong causal signal which associates disease risk with a genetic variant, that risk is still conditional on dependencies of other genetic variations across the genome. Those variations are the outcome of demographic histories, which one can “control” for to some extent by accounting for population structure. In more plain language, a signal that predicts an outcome in Norwegians may not predict the same outcome in Nigerians. This may be due to different frequencies of other variants which are not directly causal, but interact with the causal signals, which vary between populations.

More recently I’ve been a bit sanguine. I don’t follow the literature closely, but papers like High Trans-ethnic Replicability of GWAS Results Implies Common Causal Variants, make me wonder if the genetic background concerns weren’t over-wrought.

A new preprint, Population genetic history and polygenic risk biases in 1000 Genomes populations, suggests we should be worried. Or, more precisely, we should be cognizant of the limitations genetic background imposes upon us for certain classes of variants and disease. In particular, rare variants are going to be less portable across populations because of the shallower time depth of their emergence, after populations have diverged. So, if you have a low frequency major effect causal variant in Europeans, there is a much lower likelihood that it is in other populations.

The histogram above illustrates an excellent case study from the preprint. The genetic architecture of height and its genomic basis has been best elucidated for Europeans. We know, for example, many of the loci which distinguish Northern and Southern Europeans, and we know that selection has resulted in divergence between the two populations over the past 5,000 years. But as you can see the predicted heights seem to simply follow genetic distance from Europeans. SAS = South Asians, while AMR = a mixed cohort of populations from the Americas. EAS and AFR are East Asians and Africans. In reality, Africans are nearly as tall as Europeans (taller or shorter depending upon the reference European population), and taller than East Asians. The predictions here are off because the causal variants inferred from the studies of European cohorts are portable in direct proportion to shared demographic history. South Asians share a relatively ancient demographic history with Europeans, while many mixed groups from the Americas have Europeans as one of their recent founding populations. But in both cases the causal variants were likely segregating in the ancestral populations before divergence, so there is no major difference in the consequence.

The preprint has a lot more than just a reanalysis of GWAS. Using local ancestry deconvolution methods they show how one can infer history from patterns of genetic variation (though as always, this should not be taken as gospel, as there are biases in the methods currently used). The major take home is simple: population structure is real, and, it has real consequences functionally.

• Category: Race/Ethnicity, Science • Tags: Genetics, Genomics

Our Magnificent Bastard Race

Razib Khan • August 21, 2016

• 600 Words • 109 Comments

In 2011 I was having dinner with an old friend who was an engineer at Intel. He also has a Ph.D. from MIT. Smart guy. But when I mentioned casually offhand that we were all a few percent Neanderthal (outside of Africa), he was surprised. I was a bit shocked, as I explained that this was a huge science story. The Neanderthal genome had been published the previous year. How could my friend not have known?

He was totally unembarrassed, and told me I overestimated how closely the public followed genetics and paleontology. I’m sure he was right. But it’s hard to remember sometimes.

We’ve gone further beyond where we were in 2010. We now have a really good grasp of a lot of population dynamics in Eurasia over the past 20,000 years. Probably the best place to start is with this preprint, The genetic structure of the world’s first farmers. But the general outlines were already evident a few years back in Toward a new history and geography of human genes informed by ancient DNA.

Most of the world’s population seems to descend from a mixing of a set of groups which 10,000 years ago were distinct. How distinct? We’re talking about Fst values on the order of 0.10, which means that ~10% of the genetic variation is partitioned between the two populations in a pairwise comparison. That’s about what you see between Europeans and Chinese today. Some of the Fst values were a bit higher, some lower, but the 0.10 seems about right.
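For readers who want to see what that number is measuring, here is a minimal sketch of a pairwise Fst computed from allele frequencies, using the textbook heterozygosity definition Fst = (H_T - H_S) / H_T with the two populations weighted equally and the ratio taken over sums across markers. Published estimates use estimators that correct for sample size, so treat this as the idea rather than the method behind any particular paper. The frequencies in the example are made up.

```python
import numpy as np


def pairwise_fst(p1, p2):
    """Fst = (H_T - H_S) / H_T from per-marker allele frequencies in two
    populations, equally weighted, as a ratio of sums across markers."""
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity if the two populations were pooled.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    return (h_t.sum() - h_s.sum()) / h_t.sum()


# Three hypothetical markers: frequencies differ at two of them.
example = pairwise_fst([0.2, 0.5, 0.7], [0.4, 0.5, 0.3])
print(round(example, 3))  # about 0.07
```

Identical frequencies give an Fst of 0, and populations fixed for different alleles give an Fst of 1. The made-up frequencies above land at about 0.07, in the neighborhood of the values discussed here.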

To make it easy for some of you, I’ve labeled and placed the approximate locations of ancestral groups to modern Northern Europeans ~10,000 years ago. What I’m trying to represent is a map which shows the modal regions of distribution of ancestors that Northern Europeans today had 10,000 years ago. So, for example, since ~15% of the ancestry of Northern Europeans is “Ancient North Eurasian” (ANE), a lot of ancestors of Northern Europeans alive today would be living somewhere in the broad expanse of Central Eurasia (now, because of various demographic events the number of ANE was probably lower than farmers, perhaps lower than the 15% contribution to the modern genomes).

A substantial proportion of the ancestry of Northern Europeans is “European hunter-gatherer,” dating to the Pleistocene. But here’s the kicker: most of that ancestry dates to after the LGM, to about ~15,000 years ago. The really deep Pleistocene ancestry in Europe is only found at very low levels now.

The final issue is that a lot of the phenotypes that we racially code are recent. This probably explains why groups like the Kalash and Nuristanis can look more like Europeans than South Asians, but they’re genetically more like South Asians.

What does any of this have to do with non-scientific things? I don’t really know. My interest in population structure is intellectual, not personal. But a certain type of person should probably stop talking about how white people have been in Europe for 40,000 years. First, the ancestors of modern Europeans 40,000 years ago were almost all residing outside of Europe, an assertion that holds until 15,000 years ago. And most would still be resident outside of Europe 8,000 years ago, depending on how you count/calculate.* And, perhaps more importantly, the typical phenotype of Northern Europeans probably really coalesced only ~5,000 years ago.

* Definitely true for Southern Europeans, but conditional for Northern Europeans, depending on where you draw Europe’s eastern boundary.

Addendum: I stole the title from John McWhorter’s book, Our Magnificent Bastard Tongue.

Also, this is not to say that:

1) population structure today is trivial in a phylogenetic sense; it isn’t.

2) population structure is functionally irrelevant; it isn’t.

• Category: Science • Tags: Genetics, Genomics

Genomics to the People

Razib Khan • August 19, 2016

• 4 Comments

Joe Pickrell and Yaniv Erlich did an AMA on Reddit yesterday. I recommend you check it out.

They promote their new project, seeq. It looks pretty slick, and I’m excited to be part of the batch of beta testers.

• Category: Science • Tags: Genomics

How Science Is Done

Razib Khan • August 10, 2016

• 200 Words • 3 Comments

A follow up on the Ancient Archaic Admixture Into the Andamanese story, No evidence for unknown archaic ancestry in South Asia:

Genomic studies have documented a contribution of archaic Neanderthals and Denisovans to non-Africans. Recently, Mondal et al. 2016 (Nature Genetics, doi:10.1038/ng.3621) published a major dataset–the largest whole genome sequencing study of diverse South Asians to date–including 60 mainland groups and 10 indigenous Andamanese. They reported analyses claiming that nearly all South Asians harbor ancestry from an unknown archaic human population that is neither Neanderthal nor Denisovan. However, the statistics cited in support of this conclusion do not replicate in other data sets, and in fact contradict the conclusion.

Last I heard they hadn’t released the bam files. Mistakes are made, that’s how science is done, and other people help in the process of correction. But, it is starting to get worrisome to me to see papers with bioinformatic errors being published in high impact journals.

• Category: Science • Tags: Genomics

About Razib Khan

"I have degrees in biology and biochemistry, a passion for genetics, history, and philosophy, and shrimp is my favorite food. If you want to know more, see the links at http://www.razib.com"