How do scientists use DNA differences found in fossils to determine when various human populations split apart from one another?
By and large, they aren’t.
First off, we have some founding assumptions. To start, you have to tentatively accept the hypotheses and some of the principles of evolution. Over time, genetic changes happen (at least somewhat) randomly. Some of those mutations have positive or negative effects on the ability to survive or reproduce, but a large number of mutations have no impact on survival at all. There are also survival events and migrations. You must accept three things about migration. People relocated from area to area, whether because food became scarce or because competition between humans became too fierce. Many such events over time moved particular populations around. And these migrations explain why humans are found across the face of the earth.
The evidence used to develop these theories predates the era of DNA testing. Scientists developed a series of simulations and models that fit the physical changes observed in the skeletal structures of ancient humans. Scientists disagree on the absolute validity of one model or another, and some prefer one model over the others. Even so, all of the models tell a similar, closely related story. Humans appear to have migrated from somewhere on the African continent, through the Middle East, onward through Asia into the Americas, into the European continent, and so on. In general, they populated the entire world through a series of migrations that seem to have started a couple hundred thousand years ago. The exact dates will probably never be known precisely, but let’s say (for the sake of argument) that the major movements into other continents occurred about 100,000 years ago.
Enter the era of DNA testing. DNA sequencing is an expensive and time-consuming effort, and initially it was performed only on small subsets of the human genome.
In order to understand what DNA is telling us, you have to understand the acronym TMRCA: time to most recent common ancestor. Let’s say you have a patch of DNA that mutates fairly frequently, such that a new mutation is observed every two or three generations. Look at that sequence in you and your brother. If you assume you both received it from only one of your parents, you wouldn’t expect any differences. But your parents each carry two different copies, and so did each of their parents. Over time, these copies mix and cross over, and new changes are introduced along the way. While this can be modeled and predicted, the early testing methods weren’t sophisticated enough to handle the complexity. Researchers instead tested the two regions of DNA that aren’t subject to this kind of mixing: mtDNA and (most of) the Y chromosome.
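To make the contrast concrete, here is a minimal Python sketch of one generation of inheritance. It uses made-up letter strings in place of DNA and assumes exactly one crossover per chromosome (real meiosis can produce zero or several). The hypothetical `recombine` function returns a mosaic of a parent’s two copies. The hypothetical `maternal_mtdna` function passes the mother’s copy along unchanged, which is why mtDNA is far easier to trace.

```python
import random

def recombine(copy_a: str, copy_b: str, rng: random.Random) -> str:
    """Build one chromosome for a child from a parent's two copies.

    Simplifying assumption: exactly one crossover at a uniformly random point.
    """
    if len(copy_a) != len(copy_b):
        raise ValueError("both copies must be the same length")
    point = rng.randint(0, len(copy_a))  # crossover position, 0..len inclusive
    return copy_a[:point] + copy_b[point:]

def maternal_mtdna(mother_mtdna: str) -> str:
    """mtDNA passes from mother to child as one unit, with no crossover."""
    return mother_mtdna

rng = random.Random(42)
from_grandfather = "AAAAAAAAAA"  # hypothetical segment inherited from one grandparent
from_grandmother = "bbbbbbbbbb"  # the same segment inherited from the other
child = recombine(from_grandfather, from_grandmother, rng)
print(child)  # a prefix of 'A's followed by 'b's
print(maternal_mtdna("hypothetical-mt-sequence"))  # unchanged
```

After a few generations of this mixing, every stretch of autosomal DNA has its own history. The mtDNA copy, by contrast, follows a single maternal line.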
There are many articles on the history of using mtDNA changes to determine common descent. Some trace our common ancestor in humans (see, e.g., Mitochondrial DNA and the mysteries of human evolution), and others cover other animals, plants, and bacteria. Once you go further back in time, beyond the common descent of apes and hominids, you start encountering slightly different models for how mtDNA is passed from mother to daughter. It also becomes much more difficult to devise acceptable models that explain common descent from the observed changes (see, e.g., Evolution and inheritance of animal mitochondrial DNA: rules and exceptions). Nonetheless, there’s a tremendous amount of “sameness” between the mitochondria of people around the world. That sameness is nearly impossible to explain without a common ancestor and almost exclusive mother-to-child inheritance. The difficulty lies in building a model that explains how all of the people alive today came to carry the different varieties observed. The current model is not conclusive, but it fits the collected data very well, and it continues to fit well as more and more sequences from modern people are collected.
The image to the left shows some of the assumptions in the model as well as the classes of observed sequences. The sequences that share the most with all other peoples are found in people who generally didn’t leave Africa. These fall into three groups of maternal sequences called L, shown in the map at the upper center-left. From these people, lineages with greater differences can be shown to have spread out in multiple waves of migration down the map. They then spread outward into more isolated groups to the left (into Australia) and to the right (into the Americas), explaining the common origin of the aboriginal populations in each of these regions.
Let’s briefly go back to TMRCA. If a mutation is found in your sister, you, and your mom, but not in your grandmother or anyone else on the planet, the TMRCA is about 25–30 years. If it’s also found in your grandmother, but no one else, the TMRCA is 50–60 years. Now take all the mutations ever found; in mtDNA, that’s roughly 2,000 out of roughly 16,000 possible locations. Then bucket the mutations by who shares them, and group people accordingly. Assume a mutation rate (say, roughly one mutation every “n” generations), and you get an initial estimate of TMRCA.
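The bucketing step can be sketched in a few lines of Python. The people and the mutation labels (`m1`, `m2`, `m3`) are invented for illustration. The idea is simply to invert “person → mutations” into “mutation → who carries it” and then group mutations by identical sets of carriers. Loosely, a smaller group of carriers points to a more recent shared ancestor.

```python
from collections import defaultdict

# Hypothetical toy data: each person's mtDNA differences from a reference
# sequence. The labels m1..m3 are placeholders, not real mutations.
people = {
    "you":      {"m1", "m2", "m3"},
    "sister":   {"m1", "m2", "m3"},
    "cousin":   {"m1", "m2"},
    "stranger": {"m1"},
}

def buckets(people):
    """Group mutations by the exact set of people who carry them."""
    carriers = defaultdict(set)
    for person, mutations in people.items():
        for m in mutations:
            carriers[m].add(person)
    groups = defaultdict(list)
    for m, who in carriers.items():
        groups[frozenset(who)].append(m)
    return {who: sorted(ms) for who, ms in groups.items()}

# Print from the widest group (likely the oldest mutation) to the narrowest.
for who, muts in sorted(buckets(people).items(), key=lambda kv: -len(kv[0])):
    print(sorted(who), muts)
```

With this toy data the output is three nested groups. `m1` is shared by all four people, `m2` by you, your sister, and your cousin, and `m3` only by you and your sister.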
You might never find anyone whose sequence matches the common sequence you predict. But by carefully grouping and bucketing, you might discover a hypothetical origin sequence such that all people alive today have, on average, no more than, say, 100 differences in total from that common ancestor. Let’s make a very naive model with a lot of assumptions. If each change occurs once every 7 generations, and there are 100 differences in total along each branch, then the common ancestor lived about 700 generations ago. At 25–30 years per generation, the most recent common ancestor would date to between 17,500 and 21,000 years ago. Obviously it’s not that simple; it’s not a metronome ticking off the years. Mutations happen randomly. Still, with a greater number of possible mutations and a more accurate range of estimates, you can slowly but surely build a better clock than the simple one I have just provided. The latest models place the common ancestor of the mtDNA observed in people at about 130,000–170,000 years before present.
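The naive clock above is just multiplication, and a short Python sketch makes the arithmetic explicit. The function name and the 25–30 years per generation are assumptions for illustration. The second half runs a hypothetical random simulation to show why a real clock is not a metronome: even with the same true age, the number of mutations that actually occur scatters around its average.

```python
import random

def naive_tmrca_years(differences, generations_per_mutation, years_per_generation):
    """Metronome-style clock: differences x generations per change x years per generation."""
    return differences * generations_per_mutation * years_per_generation

low = naive_tmrca_years(100, 7, 25)
high = naive_tmrca_years(100, 7, 30)
print(f"{low:,} to {high:,} years ago")  # 17,500 to 21,000 years ago

# A real clock is random, not a metronome. Hypothetical simulation: over the
# same 700 generations, let a change occur with probability 1/7 in each one.
rng = random.Random(0)
counts = [sum(rng.random() < 1 / 7 for _ in range(700)) for _ in range(5)]
print(counts)  # five different totals scattered around an average of 100
```

Better clocks reduce this scatter by using many more mutation sites and better-calibrated rates. They never remove the randomness entirely.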
But more importantly, the same method shows that certain groups of people have a much more recent TMRCA, pointing to the dates of certain “bottleneck” events. If, say, all European women are found to have a common ancestor 45,000 years ago, then we might be able to date all of them back to a common ancestor who passed out of Africa. Suppose instead that European women fall into two groups, one with a common ancestor 30,000 years ago and another with a common ancestor 10,000 years ago, but all share a common ancestor 45,000 years ago. Then we can say that Europe was likely populated in two waves: one that brought ancestors 30,000 years ago and one that brought ancestors 10,000 years ago. Both groups still shared a common ancestor 45,000 years ago. You see these various branches as different routes on the arrows on the map above. Now, imagine the details of the two groups:

- The 10,000-year group might have, say, 3–4 mutations always in common, plus another 2 or so that are only sporadically shared within the group.
- The 30,000-year group might have, say, 5–6 mutations entirely in common.
- Both groups might share another 1–2 mutations with each other and with no one else.

The group with more mutations entirely in common (but not shared with the other group) must have the older origin, and the total number of mutations is indicative of the overall age of the group. Through careful reconstruction and simulation, you begin to refine the classes, estimates, and mutations into a coherent picture.
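One published way to turn groupings like these into dates is the so-called rho statistic. You treat the mutations shared by every member of a group as the group’s founder, average the number of additional mutations each member carries, and multiply by a calibrated number of years per mutation. The Python sketch below uses invented haplotypes and a hypothetical calibration of 10,000 years per mutation. It illustrates the idea and is not necessarily the method used in the studies discussed here. Note that rho dates the diversification within a group from its founder, not the branch leading up to that founder.

```python
from statistics import mean

def rho_age(haplotypes, years_per_mutation):
    """Rho-style age estimate for one group of related sequences.

    founder: mutations carried by every member (a stand-in for the group's founder)
    rho:     average number of extra mutations per member beyond the founder
    age:     rho * years_per_mutation (the calibration is an assumption)
    """
    founder = set.intersection(*(set(h) for h in haplotypes))
    rho = mean(len(set(h) - founder) for h in haplotypes)
    return founder, rho, rho * years_per_mutation

YEARS_PER_MUTATION = 10_000  # hypothetical calibration, for illustration only

# Invented haplotypes: group A has more private variation than group B.
group_a = [
    {"a1", "a2", "a3", "a4", "a5", "p1", "p2", "p3"},
    {"a1", "a2", "a3", "a4", "a5", "p4", "p5"},
    {"a1", "a2", "a3", "a4", "a5", "p6", "p7", "p8", "p9"},
]
group_b = [
    {"b1", "b2", "b3", "q1"},
    {"b1", "b2", "b3"},
    {"b1", "b2", "b3", "q2"},
]

for name, group in [("A", group_a), ("B", group_b)]:
    founder, rho, age = rho_age(group, YEARS_PER_MUTATION)
    print(name, sorted(founder), round(rho, 2), round(age))
```

With these made-up numbers, group A (rho of 3) comes out older than group B (rho of about 0.67). The absolute ages mean nothing beyond the assumed calibration.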
Again, all of this evidence is collected not from fossils, but from modern humans. We don’t know and can’t guarantee that individual women descend from people who followed these paths of migrations, but based upon all of the evidence collected and careful review of the available data, we are fairly certain that this model is the best one among many alternate models that explains the data observed.
As scientists have become more comfortable and gained certainty with this model, the next step is to find evidence that either helps refine or clarify the model, perhaps finding evidence that breaks certain parts of the model such that a new and better model might be found. Here’s where the DNA of archaic fossils might be helpful.
The scientists behind the study described in the article Feeling old? DNA supports an early evolution of our species are attempting to fit fossils from South Africa into the picture. Some of these fossils are about 500 years old and others about 2,000 years old, and the scientists are working from their genome sequences. Any particular inherited mutation either arose recently or originated in one of our common ancestors. To the extent that people interbred, the mutations in autosomal DNA (non-mtDNA, non-Y-DNA) observed in full sequences can be compared using extensive computer models. This is the DNA that mixes between parents each time a child is conceived. Right now there are (likely more than) eight different models used to make sense of the massive amounts of data we’ve collected from both modern humans and archaic fossils. Still, the bulk of the evidence comes from humans alive in the last 20 years or so.
This is not a simple task. It involves assigning each of millions of candidate mutations a possible origin story. You find other people who carry the same mutation, and then build a statistically likely chain of evidence for when each of these people shared a common ancestor. With more and more people moving to remote areas of the world and interbreeding with people from different areas, this is nearly impossible with modern human DNA sequences alone. Archaic humans are less likely to have come from halfway around the world. By taking their sequences, we can find individual markers that might be dated to a particular common descent. For this to work, the archaic fossils must share mutations with currently living people. If a fossil carries mutations not found in anyone today, those mutations may not have survived in any modern population. But they might be found in more archaic fossils. That would help produce a tree of common descent between samples and, from the number of shared and unshared mutations, a reasonable estimate of when this family split off from the rest of the human family tree.
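As a rough illustration of how shared and unshared mutations translate into time, here is a Python sketch of a simple infinite-sites calculation for one ancient and one modern lineage. All numbers are made up for illustration: the sequence length, the mutation rate of 5 × 10⁻¹⁰ per site per year, the 2,000-year sample age, and the difference count. The model ignores recombination, population structure, and ancient-DNA damage. It also estimates when the two lineages coalesced, which is generally earlier than when the populations themselves split.

```python
def coalescence_time(differences, sites, mu_per_site_per_year, ancient_age_years):
    """Naive infinite-sites estimate of when an ancient and a modern lineage coalesced.

    Assumptions (illustrative only): every difference is one mutation, mutations
    accumulate at a constant rate, and there is no recombination or DNA damage.
    The modern branch is T years long; the ancient branch stopped accumulating
    mutations when that person died, so it is (T - ancient_age_years) long:
        differences = mu * sites * (2 * T - ancient_age_years)
    Solving for T gives the expression returned below.
    """
    mutations_per_year = mu_per_site_per_year * sites
    return (differences / mutations_per_year + ancient_age_years) / 2

# Made-up inputs: 1,000,000 compared sites, 299 differences, a rate of
# 5e-10 per site per year, and an ancient sample that lived 2,000 years ago.
t = coalescence_time(299, 1_000_000, 5e-10, 2_000)
print(round(t))  # 300000
```

The sample-age correction matters because an ancient genome stopped accumulating mutations when that person died. Ignoring it would make the ancient branch look too long.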
Why do we feel good about these models? There seem to be a lot of assumptions, after all. We feel good because the pictures emerging from these various models appear to be in rough agreement. They might disagree on some of the more arcane details, for example, the number of waves of people who moved into Europe and when. But by and large, the evidence from DNA does not contradict the older evidence from studying fossils without the help of DNA. Further, the evidence from archaic DNA in fossils is likewise coming into general agreement with the other two sets of evidence. It occasionally disagrees on minor details, but overall, the picture still looks very much the same.
Update: I’ve had an opportunity to review the paper in question more thoroughly (Southern African ancient genomes estimate modern human divergence to 350,000 to 260,000 years ago). I have the utmost respect and admiration for the work, but unravelling the origins of “modern humans” will always be a subject of debate, and the latest answer is not likely to be the “final answer.” This paper adds some very important details on the role of multiple populations admixing to form modern humans. We are also fairly certain that Neanderthals and other “archaic” humans outside the modern human lineage co-existed and likely interbred with “modern” humans in Europe after our departure from Africa (for example, see: Evidence mounts for interbreeding bonanza in ancient human species).
It is also important to note that the “multiple origin” hypothesis doesn’t in any way call the “common descent” hypothesis of evolution into question. To ask the kinds of questions posed by multiple-origin hypotheses, you first have to make strong assumptions about the existence of common ancestors. What is under debate is the size of the population into which those first humans were born. Also under debate is in what mix, and to what degree, other “archaic” or “extinct” human variants took part in the breeding pool and the ultimate survival of that ancestor. Theories that speak of a “mitochondrial Eve” are popular due to their harmony with theological origin stories, and they might seem to trace back to a “single” common ancestor. It should be noted, though, that mtDNA and Y-DNA represent an extremely small percentage of our DNA and undergo no crossover or other mixing. This creates a natural bias toward finding “a single” ancestor in any experiment that looks for one.
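The bias toward “a single” ancestor can be seen in a toy simulation. The Python sketch below assumes a Wright-Fisher-style population with a constant number of women per generation, each picking her mother at random from the previous generation. That model is an illustrative assumption, not a reconstruction of real human history. Traced backward, the maternal lines always merge into one woman, even though the population never shrank to one person.

```python
import random

def generations_to_single_mtdna_ancestor(n_women: int, seed: int = 0) -> int:
    """Trace every woman's maternal line back in a constant-size population.

    Toy Wright-Fisher-style model (an assumption for illustration): each
    generation has n_women women, and each woman's mother is drawn uniformly
    at random from the previous generation. We follow the set of distinct
    maternal ancestors backwards until only one remains.
    """
    rng = random.Random(seed)
    lineages = n_women  # distinct maternal lines still being traced
    generations = 0
    while lineages > 1:
        # Each surviving line picks a mother; lines that pick the same mother merge.
        mothers = {rng.randrange(n_women) for _ in range(lineages)}
        lineages = len(mothers)
        generations += 1
    return generations

for n in (50, 500):
    print(n, "women per generation ->",
          generations_to_single_mtdna_ancestor(n), "generations")
```

In every run, all present-day mtDNA traces back to one woman, even though the population size never changed. Finding a single mtDNA ancestor therefore says little by itself about how many people were alive at the time.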
Answer originally published on Quora: How do scientists use DNA differences found in fossils to determine when various human populations split apart from one another?
Matt Harbowy is a scientist, amateur genetic genealogist, activist, and data management expert. He is one of the founders of the non-profit Counter Culture Labs, working to bring fairness and egalitarian ideals to people interested in learning about science and biotechnology. He is also a top writer on the question-and-answer site Quora.