That evolution occurs is well resolved. Precisely how evolution occurs, in detail, is less so.
One question is whether present-day life arose from a single species or from more than one.
The question of whether life has a single origin arises particularly from an oft-cited quote of Charles Darwin: ‘all the organic beings which have ever lived on this earth have descended from some one primordial form.’3
Over the last few years* in particular there has been a lot of commentary suggesting that horizontal gene transfer (HGT) in bacterial species may indicate that present-day life arose not from a single ancestor, but perhaps from a (small) number of ancestors.
HGT is where genes pass ‘horizontally’ from one species to another, rather than ‘vertically’ to the offspring of the species with the gene. (The dotted blue lines in the trees below represent horizontal gene transfer.)
Abundant evidence exists for a common ancestor for ‘higher’ taxa such as vertebrates (species with a backbone). Quantitative evidence for all life, including bacteria and other microbes, is harder to present. (Qualitative arguments are easier to make, pointing to, for example, the chemistry common to all life.)
DNA, RNA and protein sequences can be compared to see how similar they are. It is often argued that, in most cases, the more similar two protein (or DNA or RNA) sequences are, the more likely they are to have arisen from a recent common ancestor. The idea is that the fewer the differences between them, the fewer the mutations that have occurred, and hence the shorter the evolutionary time since both were identical, i.e. from the same species.
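For readers who like to see this sort of thing in code, here is a minimal sketch of the simplest similarity measure, percent identity between two sequences that have already been aligned. The function name and the toy sequences are my own inventions for illustration; real comparisons use proper alignment software and scoring schemes.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences match.

    Assumes the sequences are already aligned to the same length.
    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped positions
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Made-up toy sequences (hypothetical, not real proteins).
toy_a = "MKTAYIAKQR"
toy_b = "MKTAHIAKQR"  # differs from toy_a at one position
toy_c = "MRSAHLAEQK"  # differs from toy_a at six positions

print(percent_identity(toy_a, toy_b))
print(percent_identity(toy_a, toy_c))
```

On the naive reading, the first pair would be judged the more closely related of the two because fewer positions differ.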
While this will be true in many or most cases, it is not a reliable assertion when taken on its own, as in practice there are alternative explanations for why two proteins (genes, RNAs) may be similar. Convergent evolution is one, where two unrelated proteins independently become more similar over time.
Rather than battle this and related issues, Theobald used model selection theory to test quantitatively which model best explains the data: models with a single ancestry or models with more than one. The approach works, to use his words, without ‘assuming that sequence similarity implies genetic kinship.’
To mathematicians a model means a description of how something might be. One model of evolution is that all life arose from a single chain of ancestry. (Imagine one tree of life arising from a single ancestor; tree ‘b’ on the left.) Another model is that there was more than one source of life. (Imagine several independent trees of life or a single tree with more than one original ancestor.)
A key problem is that if there are many free parameters, a complex model can ‘explain’ many datasets. Free parameters are values used in the model that you can vary. Naturally, the more values you can vary, the easier it will be to fit your model to data.
Model selection theory aims to determine the best model for the data, balancing how accurately each competing model fits the data against the number of free parameters used in fitting it. (I should emphasise here that while the diagrams of the trees shown above are elegantly simple, the underlying mathematical models used to generate them are quite complex.)
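To make the trade-off concrete, here is a small sketch using the Akaike information criterion (AIC), one widely used model selection score: AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the log-likelihood of the best fit. Lower is better. The model names and numbers below are invented purely for illustration and are not taken from the paper.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical models: (maximised log-likelihood, number of free parameters).
# The complex model fits slightly better but uses many more parameters.
models = {
    "simple": (-1050.0, 10),
    "complex": (-1045.0, 25),
}

scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: AIC = {score:.1f}")
print("preferred model:", best)
```

With these made-up numbers the complex model has the higher likelihood, yet the simple model gets the better (lower) score, because the small gain in fit does not justify fifteen extra parameters.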
Theobald’s work compares sequences from 23 proteins that are present in all forms of life, taken from 12 well-characterised species. Proteins are what many genes encode**: a chain of amino acids (the amino acid sequence of the protein) that folds to form a particular shape.
Each ‘family’ of proteins chosen in Theobald’s study has the same three-dimensional structure in different species but different amino acid sequences. The members of each family are thought to be orthologous: loosely, they are the equivalent protein in each species, usually carrying out the same molecular task. The divergence in their sequences reflects evolution. The question is which model best explains how this divergence arose: a single ancestry or more than one.
Using three of the more widely accepted methods for creating phylogenetic trees (ancestral trees of life) or networks, Theobald tested whether the three domains of life (eukaryotes, bacteria and archaea) were best explained by a single origin or by any of the possible combinations of more than one line of ancestry leading to present-day species.
His results strongly favour a single ancestor, even though close to half of the proteins used in the study are likely to have an ancestry that involves horizontal gene transfer.
Using a model with the proteins taken together as a group (i.e. not considering horizontal gene transfer), a single ancestor was very strongly favoured, even though the proteins with an HGT ancestry might have confounded the analysis.
When he tested models that allowed the proteins in each class of life to evolve independently of one another, and so allowed HGT events to be accounted for, a single common ancestor was most often supported more strongly than under the previous models.
Thus, this work does not argue against a ‘web’ with extensive horizontal transfer in the earlier stages of the evolution of life; what it strongly suggests is that such an initial web of life would have arisen from a single ancestry.
In similar fashion, when a model in which eukaryotes (loosely speaking, plants, fungi and animals) arose through a symbiosis of bacteria and archaea was tested, universal common ancestry (UCA) was again very strongly favoured over more than one ancestry.
As you might expect there are still caveats with this work.*** Single scientific papers essentially never resolve all the issues in a single leap. This work is a solid advance, moving the case forward using a quantitative approach; more will certainly follow.
* The concepts involved are not new to biologists, but the discussion in a wider context has ‘come around again’, as you might say. Ford Doolittle’s Scientific American article Uprooting the Tree of Life (Feb, 2000) lays out much of the earlier background, but this is not available free on-line. Most libraries should have a copy.
** Genes can also code for RNAs, but I don’t want to make this story more complex than it need be.
*** As this article already has quite enough going on in it and is already more than long enough, I’ll leave this for others to contribute in the comments.
The word ‘species’ in the second sentence (and perhaps elsewhere) should perhaps come in inverted commas. The word ‘population’ is probably more appropriate. There is some debate over whether the species concept applies fully to (some) microbes or to very early life.
New Zealanders will note that the accompanying commentary paper is by New Zealand scientists Professors David Penny (Massey) and Mike Steel (Canterbury).
As an aside, one thing that appeals to me is that this article is a single-author paper (in a leading journal to boot), something that seems rare these days.
1. Theobald, D. (2010). A formal test of the theory of universal common ancestry. Nature, 465 (7295), 219-222. DOI: 10.1038/nature09014
2. Steel, M., & Penny, D. (2010). Origins of life: Common ancestry put to the test. Nature, 465 (7295), 168-169. DOI: 10.1038/465168a