Where do we come from? How did all the diversity of life originate? These are some of the most intriguing biological questions we still struggle to answer. In order to understand the processes that produce the diversity of life on Earth, evolutionary biologists study the historical relationships of organisms, their phylogeny. There are only three major kinds of cellular organisms that form the Tree of Life: Bacteria, Archaea and Eukarya (Figure 1). Eukarya include the familiar plants, animals, and fungi, as well as protists, single-celled creatures such as microalgae. Their cells have an internal architecture based on membranes and the DNA is kept inside a nucleus. Organisms in the other two groups, Bacteria and Archaea, are all unicellular and lack this more complex internal structure.
Figure 1 ~ Representatives of major forms of life and their cell structure. (a) Bacteria Bacillus subtilis; (b) Archaea ARMAN; (c, d) unicellular Eukaryote, microalgae Chlamydomonas sp.; (e) multicellular Eukaryote, sea star Astropecten articulatus; (f) Eukaryotic cell (human white blood cell). a, b, d, f: transmission electron microscopy; c: light microscopy. Figure sources: a, Wikimedia Commons; b, adapted from Baker et al. 2010 PNAS 107(19); c, d, Wikimedia Commons top, bottom; e, Wikimedia Commons; f, Wikimedia Commons
The oldest records of Bacteria come from microfossils and larger fossilized structures that resemble modern bacterial communities, and date to around 3.5 billion years ago (bya). This is about the same age as the first evidence for Archaea: biological methane preserved in rocks, a gas that today is produced only by Archaea. The first unambiguous evidence for eukaryotic life is much more recent, dating to around 1.2 bya, although some possibly eukaryotic microfossils are older.
Although Bacteria and Archaea look alike to our eyes, the latter are more closely related to the Eukaryotes than to the Bacteria. But how exactly are we related to the Archaea? Recent studies using new molecular data and techniques are revealing this history.
Recent competing hypotheses about the origin of Eukarya
The tree of life currently presented in most textbooks has three domains of life [3, 4]. It places the Eukarya as a sister group to the Archaea, and Bacteria as sister to both (Figure 2a). The main alternative to this idea is the archaeal-host hypothesis, which places Eukarya inside Archaea (Figure 2b). While the three-domains hypothesis implies that Archaea and Eukarya had a common ancestor, which then split into the two lineages, the archaeal-host hypothesis implies that the first Eukaryotes arose directly from an archaeon. In other words, the first Eukaryote was probably an archaeon that somehow acquired the cell structure present today only in Eukarya, perhaps by fusion with another cell. The implication of this alternative hypothesis is that we are members of the domain Archaea, and that there are only two, not three, domains of life.
Figure 2 ~ Competing hypotheses about the relationships of the three main groups of organisms. (a) The three-domains tree; (b) the archaeal-host tree. A sister-group relationship is indicated by a branch that splits into the two groups. It means that these groups share a more recent common ancestor with each other than with any other group in the tree and therefore are more closely related. Modified with permission from Macmillan Publishers Ltd: Nature 2013.
Reconstructing the ancient history of life
Inferring the phylogeny of a group can be done by following a few steps: we get samples from the different species we are interested in; extract and sequence fragments of their DNA; and compare all the sequences to infer how they are related to each other. By comparing the same parts of the sequence across all species, we can conclude that the more similar they are, the closer those organisms are historically. This seems straightforward, but in fact it is not.
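The comparison step can be illustrated with a minimal sketch. It assumes the sequences are already aligned, and it uses the simplest possible measure, the proportion of sites that differ. The species names and sequences are invented for illustration, and real analyses use far more elaborate methods.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap ("-").
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy aligned fragments (made up, not real data).
alignment = {
    "species_1": "ACGTACGTAC",
    "species_2": "ACGTACGTAT",
    "species_3": "ACGAACTTGT",
}

for name_a, name_b in combinations(alignment, 2):
    d = p_distance(alignment[name_a], alignment[name_b])
    print(f"{name_a} vs {name_b}: {d:.2f}")
```

In this toy data, species_1 and species_2 differ at one site in ten, while species_3 differs from both at three or four sites, so the first two would be judged the closest relatives.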
Having a good sampling of the diversity of organisms is very important because it is hard to account for information that is unknown, so there is a risk of getting the history wrong if just a few species are represented. Archaeal samples have been especially underrepresented, since they often occupy extreme environments and are hard to cultivate in the lab. However, recent advances in molecular methods now allow us to obtain sequences directly from organisms in natural environments. In particular, many new sequences from four groups of Archaea (TACK: Thaumarchaeota, Aigarchaeota, Crenarchaeota/Eocytes, Korarchaeota) have been included in recent studies. These new data support the archaeal-host hypothesis and find that the closest relatives of the Eukaryotes are one or all of the TACK Archaea (Figure 2b).
Different models of sequence evolution also have a large impact on the outcome of phylogenetic analyses. Simple models of evolution generally assume that all the DNA positions in a sequence evolve at the same rate, and that base composition (A, C, G, T in the DNA) is constant across different groups. Analyses using those models traditionally recover the three-domains tree. However, those simplified assumptions are not justified in most cases. Base frequencies actually vary widely among the three domains, and more complex models that take this into account need to be used to avoid error. Also, there are sites that go through base changes more often, while others are constrained by natural selection and remain the same for longer periods of time – for instance when changes in certain regions of the DNA are more likely to damage its function than changes in other positions, so that variations in the former will often be selected against. To deal with such issues, different models of evolution can be applied to different parts of sequences, and more sophisticated models have recently supported the archaeal-host hypothesis [5, 6].
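Whether base composition really is constant across groups can be checked directly. The sketch below computes base frequencies and GC content for two sequences. The sequences and labels are invented for illustration and do not come from any real organism.

```python
from collections import Counter


def base_frequencies(seq):
    """Relative frequency of A, C, G and T in a DNA sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Combined frequency of G and C."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


# Hypothetical fragments from two lineages with very different composition.
lineage_x = "GCGCGGCCAT"
lineage_y = "ATATTAAGCA"

print(base_frequencies(lineage_x), gc_content(lineage_x))
print(base_frequencies(lineage_y), gc_content(lineage_y))
```

When sequences differ this much in composition, a model that assumes a single set of base frequencies for the whole tree can group lineages by similar composition rather than by shared history.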
Another issue comes from disagreement among genes: even when we can accurately reconstruct the history of a gene, not all genes in a given organism tell the same story. For relationships this ancient, the most useful genes are those that are highly conserved, meaning that they remain very similar even in species that diverged long ago. Over long periods of time, successive changes can occur at the same site, erasing earlier changes and confusing the analysis. Conserved genes are less affected, because they play central roles in integrating cell functions and are therefore tightly constrained by selection. Examples are genes involved in transcription and translation (reading genes and turning their code into proteins, respectively).
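The effect of repeated changes at the same site can be made concrete with one of the simplest models of sequence evolution, the Jukes-Cantor model. It is used here only as an illustration, since the studies discussed in this article rely on more complex models. The model assumes equal base frequencies and equal rates at all sites, and it converts the observed proportion of differing sites into an estimate of the number of changes per site that actually took place.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor estimate of substitutions per site.

    p is the observed proportion of differing sites between two
    aligned sequences. Returns infinity when p reaches 0.75, the
    level expected between two unrelated random sequences.
    """
    if p < 0:
        raise ValueError("p must be non-negative")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)


for observed in (0.1, 0.3, 0.6, 0.7):
    print(f"observed {observed:.2f} -> estimated {jc69_distance(observed):.3f}")
```

The gap between observed and estimated change widens as sequences diverge, which is why fast-evolving genes become unreliable over billions of years while slowly changing, conserved genes remain informative.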
Eukaryotic genomes are a mixture of genes from distinct origins. Some are very similar to bacterial genes because they do have a bacterial origin. They were transferred to Eukaryotes from the bacterium that was engulfed by an early Eukaryote and eventually became the mitochondrion (the organelle responsible for energy production inside cells). But when studies compared conserved genes among Bacteria, Archaea and Eukarya, they found the greatest similarity between Eukarya and subgroups of Archaea (TACK). This supports the archaeal-host hypothesis, in which important genes in the nucleus came from the host that gave rise to the Eukaryotic lineage.
Even with strong support for the archaeal-host tree, we are still missing parts of the story. For example, a major challenge to the hypothesis is to explain the evolution of the membrane that surrounds cells. Membranes in Bacteria and Eukarya have the same biochemical structure, while Archaea have a different type. Following the tree, the first membrane to exist was the bacterial one, which was then modified in the Archaea lineage. If, as in the three-domains hypothesis, Eukarya were a lineage independent of Archaea, this would mean that Eukaryotes kept the original membrane and that the new one appeared in Archaea. But if Eukaryotes arose from inside Archaea, there must have been a change back to the previous state of the membrane when Eukaryotes appeared. This explanation involves more steps, but it could be the real one. In fact, most genes needed for the synthesis of both types of membrane are present in all three groups, which means that the transition between membrane types may not have required radical genomic change.
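The step counting in this argument can be reproduced with a small parsimony calculation. The sketch below uses Fitch's algorithm, a standard way of finding the minimum number of changes a trait needs on a given tree. The article does not name a method, so this choice, the simplified trees, and the tip labels are assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum number of changes).

    tree is either a tip name or a pair (left, right) of subtrees;
    states maps each tip name to its observed trait.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_states, left_changes = fitch(left, states)
    right_states, right_changes = fitch(right, states)
    shared = left_states & right_states
    if shared:
        # Both sides can agree, so no extra change is needed here.
        return shared, left_changes + right_changes
    # The two sides disagree, so one change is needed at this split.
    return left_states | right_states, left_changes + right_changes + 1


# Membrane type observed in each group (simplified).
membrane = {
    "Bacteria": "bacterial-type",
    "Eukarya": "bacterial-type",
    "Archaea": "archaeal-type",
    "Other Archaea": "archaeal-type",
    "TACK Archaea": "archaeal-type",
}

three_domains = ("Bacteria", ("Archaea", "Eukarya"))
archaeal_host = ("Bacteria", ("Other Archaea", ("TACK Archaea", "Eukarya")))

print("three-domains tree:", fitch(three_domains, membrane)[1], "change(s)")
print("archaeal-host tree:", fitch(archaeal_host, membrane)[1], "change(s)")
```

The three-domains tree needs one membrane change and the archaeal-host tree needs two, which is the extra step described above. A count like this only measures how simple an explanation is, not whether it is true.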
Although there are many difficulties in generating and interpreting phylogenies, a phylogenetic framework is essential to allow us to understand the fascinating history of life, its distribution and diversity. Recent studies now make the archaeal-host tree the best-supported hypothesis for the origin of Eukaryotes. This scenario is consistent with the existence of only two primary lineages of organisms and with the idea of Eukaryotes first emerging from an interaction between an archaeal host and a bacterial partner. Future research will hopefully increase the resolution of the Tree of Life with better knowledge of its diversity and even more sophisticated methods. These efforts will also help to confidently find the root of all life and to resolve relationships of more recent groups.
Tauana Junqueira Cunha is a PhD student in the department of Organismic and Evolutionary Biology at Harvard University.
[1] Knoll, A.H. 2003. Life on a Young Planet: The First Three Billion Years of Evolution on Earth. Princeton University Press, Princeton, New Jersey.
[2] Williams, T.A., Foster, P.G., Cox, C.J., Embley, T.M. 2013. An archaeal origin of eukaryotes supports only two primary domains of life. Nature 504, 231–236. http://www.nature.com/nature/journal/v504/n7479/full/nature12779.html
[3] Woese, C.R., Kandler, O., Wheelis, M. 1990. Towards a natural system of organisms: Proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences of the United States of America 87, 4576–4579. http://www.pnas.org/content/87/12/4576
[4] Animation from Raven, P. 2008. Biology. 8th edition. McGraw-Hill Higher Education. http://highered.mcgraw-hill.com/sites/9834092339/student_view0/chapter28/animation_-_three_domains.html
[5] Cox, C.J., Foster, P.G., Hirt, R.P., Harris, S.R., Embley, T.M. 2008. The archaebacterial origin of eukaryotes. Proceedings of the National Academy of Sciences of the United States of America 105, 20356–20361. http://www.pnas.org/content/105/51/20356
[6] Guy, L., Ettema, T.J.G. 2011. The archaeal “TACK” superphylum and the origin of eukaryotes. Trends in Microbiology 19, 580–587. http://ac.els-cdn.com/S0966842X11001740/1-s2.0-S0966842X11001740-main.pdf?_tid=f29b6ef4-c5d1-11e3-95c4-00000aacb35e&acdnat=1397699306_e5573f81419002d7da11d34c3b9b4107
[7] Martin, W., Mentel, M. 2010. The Origin of Mitochondria. Nature Education 3(9):58. http://www.nature.com/scitable/topicpage/the-origin-of-mitochondria-14232356