by Misha Gupta
figures by Xiaomeng Han

For close to two centuries, humans have studied the biological past using the fossil record. More recently, we have added the ability to read the sequence of our DNA to our arsenal. Phylogenetic trees (structures that map the evolutionary relationships among organisms descended from a common ancestor) have been built for all manner of organisms, allowing us to study their complex evolutionary relationships. Now, we are slowly uncovering the “Viral Fossil Record,” inferring and tracing the history of viral evolution through the ages by using existing genetic information as a window into the past. So, what does this ‘fossil record’ tell us? Can it have implications for the future of disease transmission within our species?

Viruses and retroviruses

While most microbiologists agree that viruses are not living organisms, they are considered infectious agents. Viruses are small pieces of genetic material, either DNA or RNA, protected by a capsid (a protein shell) and, in many viruses, an outermost layer called the envelope. Capsids and envelopes show a dazzling array of morphological diversity between viruses, from the spike proteins of SARS-CoV-2, the virus that causes COVID-19, to the more complicated icosahedral structures of bacteriophages.

Unlike living cells, viruses cannot replicate on their own, outside a host organism. Not only do viruses lack the machinery required to make their own functional proteins, but they also cannot synthesize the energy-carrying molecule ATP. Viruses are completely reliant on host cells for reproduction and use a variety of methods to co-opt host cell machinery for their own purposes. For example, viruses insert their own genetic material into host cells by binding to the cell and then degrading their own capsid to release their DNA or RNA into the host cell. The host cell machinery is then used to produce multiple copies of the virus’s genetic material until, for certain types of viruses, the cell eventually bursts from the sheer volume of viral particles being produced inside it, releasing more viral particles to continue the cycle (Figure 1).

Figure 1: Viral DNA, known as an Endogenous Virus Element (EVE), is integrated into the host genome and passed from parent to progeny when the host genome replicates.

Despite not being classified as living organisms, viruses are still subject to evolutionary pressures! They accumulate mutations while replicating inside host cells, and those mutations undergo selection. Some viruses, particularly retroviruses, integrate their own genetic material into the host genome, leaving traces in the genetic code that we can use to study the evolutionary history of these viruses and their interactions with their hosts. Not only have viruses changed our genomes by bringing in novel genetic material, but host–pathogen interactions have also shaped our past in dramatic ways. This ‘fossil record’ of viral evolution that can be extracted from modern DNA holds many interesting clues to our past.

Retroviral ‘fossils’ in modern DNA

Since retroviruses can integrate into human DNA and get passed on from parent to offspring, we have a wealth of information from which to infer viral evolutionary history using phylogenetic tools. In humans, these retroviral DNA segments are estimated to make up as much as 8% of the genome, having accumulated in our DNA over multiple infection events. This percentage is extremely high, considering that protein-coding genes account for only about 2% of the genome! Viral genetic fragments that are integrated into reproductive cells like sperm and egg, and are therefore passed down from parent to progeny, are called Endogenous Virus Elements, or EVEs. For retroviruses, these ‘fossils’ are known as Endogenous Retrovirus Elements, or ERVs (Figure 1).
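To get a feel for what these percentages mean in absolute terms, here is a small back-of-the-envelope sketch. It takes the 8% and 2% figures quoted above and applies them to an assumed genome size of about 3.1 billion base pairs; that genome size is a round number brought in for illustration and does not come from this article.

```python
# Back-of-the-envelope arithmetic for the percentages quoted in the text.
# ASSUMPTION: a haploid human genome of roughly 3.1 billion base pairs.
GENOME_SIZE_BP = 3_100_000_000

ERV_PERCENT = 8     # upper estimate for retroviral segments (from the text)
GENE_PERCENT = 2    # approximate share taken up by genes (from the text)


def bases_covered(percent, genome_size_bp=GENOME_SIZE_BP):
    """Return how many base pairs a given whole-number percentage covers."""
    return genome_size_bp * percent // 100


erv_bp = bases_covered(ERV_PERCENT)
gene_bp = bases_covered(GENE_PERCENT)
ratio = ERV_PERCENT / GENE_PERCENT

print(f"Retroviral segments: about {erv_bp:,} base pairs")
print(f"Genes:               about {gene_bp:,} base pairs")
print(f"Retroviral DNA outweighs genes by a factor of about {ratio:.0f}")
```

Under these assumptions, retroviral ‘fossils’ would account for a couple of hundred million base pairs, roughly four times the space taken up by genes.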

Looking for signs of past selection on advantageous antiviral genes in modern genomes can help us identify when viral infections happened. This can be extended to study how these infections led to the evolution of new or improved defense mechanisms in host species, as well as how the virus may have responded to evade these defenses. Modern genomes contain information about this evolutionary arms race, which has been going on for millions of years.

Paleovirologists can use differences and similarities in the sequences of viral fragments between different species to date when a particular infection event may have happened, and to estimate how long that particular virus may have been actively circulating. The rate at which these viral sequences mutate and evolve can also be approximated, and has been found to be surprisingly slow over evolutionary time scales; for example, modern viral sequences have not accumulated enough mutations to become unrecognizable when compared with their ancient relatives. This contrasts starkly with the high rates of mutation and evolution we see in modern viruses, such as RNA viruses, which can change noticeably on a timescale of a few decades rather than over the course of evolutionary history. Studying these ‘fossils’ also lets us discover genes that were co-opted from viral DNA by the host, and that now play key roles in important biological processes! In a more urgent context, such studies also let us infer when and how viruses ‘jumped’ between species, as is thought to have happened with SARS-CoV-2, the virus that causes COVID-19!
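One simple way to picture this kind of dating is a ‘molecular clock’: if mutations build up at a roughly steady rate, the number of differences between two related sequences hints at how long ago they shared an ancestor. The sketch below is only an illustration of that idea, not the method used in the studies described here. The two short sequences are made up, the substitution rate is an assumed placeholder value, and the Jukes-Cantor correction is one of the simplest models available.

```python
import math


def p_distance(seq_a, seq_b):
    """Fraction of aligned sites that differ, skipping gap characters ('-')."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site.

    Corrects the raw difference for sites that may have mutated more than once.
    """
    if p >= 0.75:
        raise ValueError("sequences are too different for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time_years(seq_a, seq_b, rate_per_site_per_year):
    """Estimate years since two sequences shared a common ancestor.

    Both lineages accumulate changes independently, hence the factor of 2.
    """
    distance = jukes_cantor(p_distance(seq_a, seq_b))
    return distance / (2 * rate_per_site_per_year)


# Made-up toy fragments of the 'same' viral insertion in two host species.
species_1 = "ACGTACGTACGTACGTACGT"
species_2 = "ACGTACGAACGTACGTCCGT"

# ASSUMPTION: placeholder substitution rate, in changes per site per year.
ASSUMED_RATE = 2.2e-9

age = divergence_time_years(species_1, species_2, ASSUMED_RATE)
print(f"Estimated divergence: about {age / 1e6:.0f} million years ago")
```

Real analyses work with much longer sequences, many species at once, and carefully calibrated rates, so the number printed here should be read as a toy result only.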

Past and future viral pandemics

Scientists have used these studies to trace stories of pandemics that happened in our past, including one that lasted for nearly 15 million years! This particular pandemic, caused by ERV-Fc and thought to have begun 30 million years ago in the Oligocene Epoch, was highly prevalent and is believed to have infected at least 28 diverse mammalian species (Figure 2). The evolutionary success of this virus is thought to be due to genetic recombination: exchanging genes with other viruses could have brought in enough genetic novelty to keep adapting to host defense mechanisms for 15 million years! These studies serve as stepping stones to improve our understanding of how and when viruses evolve, as well as how we evolve to fight them.

Figure 2: The ERV-Fc virus was active starting around 30 million years ago, in the Oligocene Epoch. We can trace its evolutionary history back, almost as if looking at its fossil record!

This has become particularly important in our current world, which is grappling with climate change. While viruses may not leave physical fossils behind, some can remain viable for tens of thousands of years in harsh conditions, including layers of permafrost and glacial ice. As these start to melt, we run a real risk of ancient viruses coming back into circulation. We are also at an increased risk of viral zoonotic jumps (as we saw with the SARS-CoV-2 virus). Viruses from non-human species are more likely to become pathogenic to humans as we encroach upon previously undisturbed wildlife habitats. Could these studies of the past help us better deal with the future?

While the jury on that is still out, one thing is clear: viruses have been around for millions of years and they are here to stay! The evolutionary arms race is on! 


Misha Gupta is a third-year graduate student in the Department of Organismic and Evolutionary Biology at Harvard University.

Xiaomeng Han is a graduate student in the Harvard Ph.D. Program in Neuroscience. She uses correlated light and electron microscopy to study neuronal connectivity.

Cover image by laurentarroues from pixabay

For More Information:

  • To learn about another ancient pandemic that occurred around 25,000 years ago, click here.
  • Read more about pandemics throughout history here.
  • If you’re interested in paleovirology and HIV, read more here.
