Every organism is an incredibly complex machine whose biological processes benefit from 3.6 billion years of refinement through natural selection. Thus, it should not be surprising that designing biological systems is still difficult for scientists and engineers. Despite the obstacles, successes in areas such as protein engineering are leading to useful applications like better medicines and cheaper biofuels. Now, a new method of artificial selection is helping to evolve useful proteins 100 times faster than was previously possible.
Introduction to protein engineering
Proteins are molecular machines that enable every biological process. For example, during the growth of tissues in our bodies, one set of proteins catalyzes the chemical reactions that break down food into small molecules. Cells then use other proteins to assemble these small molecules into a variety of larger pieces. Still other proteins act as the building blocks of cellular structures, regulate cell division, or control the behaviors of cells. Proteins also have many industrial uses, like breaking down plant wastes for the production of biofuels and synthesizing complicated compounds. As structural machines, proteins have medicinal uses, like binding to receptors on the surface of cancer cells to prevent growth. These applications often require proteins to function outside of their natural contexts. Optimizing them to do this can be difficult.
Each protein is composed of a chain of amino acids. Twenty different amino acids exist naturally, and any of these can occupy each position of the chain. The sequence of amino acids in a given protein determines both its shape and function, just as the letters of a word determine its meaning. A random sequence of 100 letters is unlikely to form a real word; similarly, a random sequence of 100 amino acids is unlikely to form a functional protein. Because there are so many possible arrangements of amino acids, it would be impossible to screen every random protein with a length of, say, 100 amino acids (Figure 1). Therefore, when engineers try to make new proteins, they prefer to start with a natural protein and modify it by changing the amino acids at only a few positions (Romero & Arnold, 2009).
Figure 1. Finding the right modifications can be difficult because the number of possible protein sequences is so large. For every position to be modified, there are 20 possible choices for the amino acid. Thus, if N positions are modified, the number of possible proteins is 20^N. This number grows extremely fast as N increases. Researchers can screen from 10^8 to 10^15 different proteins at a time, using display technologies that screen the function of proteins attached to bacteria, viruses or ribosomes (Baker, 2011). This limits a screen to a “protein” consisting of only 6 to 12 variable positions – much shorter than most naturally occurring proteins, which can be hundreds of amino acids long.
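The arithmetic behind Figure 1 can be checked with a short calculation. The Python sketch below is only an illustration: the function names are made up for this example, and it assumes that all 20 amino acids are allowed at every variable position.

```python
import math

AMINO_ACIDS = 20  # number of naturally occurring amino acids


def sequence_count(n_positions: int) -> int:
    """Number of distinct sequences when n_positions can each hold any amino acid."""
    return AMINO_ACIDS ** n_positions


def max_positions(library_size: float) -> float:
    """Largest N (possibly fractional) for which 20^N does not exceed library_size."""
    return math.log(library_size) / math.log(AMINO_ACIDS)


# Screening capacities quoted in the figure caption.
for size in (1e8, 1e15):
    print(f"library of {size:.0e} proteins covers about {max_positions(size):.1f} positions")

# A fully random 100-residue protein is far out of reach.
print(f"sequences of length 100: about {float(sequence_count(100)):.2e}")
```

Under these assumptions, a library of 10^8 proteins covers a little more than 6 fully variable positions, and a library of 10^15 covers between 11 and 12, which matches the range given in the caption.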
The process of optimizing a protein for a specific function is called directed evolution. Beginning with a natural protein, researchers generate many versions with slightly different amino acid sequences. Every modification has the potential to change the protein’s function. These changes will usually be neutral or have a negative impact. But occasionally, a modification may result in a better protein. After generating many different proteins, engineers test each protein for the desired function, identify the proteins that work best, and use those proteins as templates for a new round of modifications. This process is called selection, because only the best proteins are selected to continue to the next round. The challenge of directed evolution is to improve the protein’s function through multiple rounds of modification and selection.
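The cycle of modification and selection described above can be sketched as a toy program. This is not the procedure used in any particular lab: the fitness function below simply counts matches to a hidden target sequence as a stand-in for a laboratory assay, and every name, sequence and parameter is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes for the 20 amino acids


def mutate(seq: str, rng: random.Random, n_changes: int = 1) -> str:
    """Replace n_changes randomly chosen positions with random amino acids.

    A replacement may happen to pick the same letter, leaving the position unchanged.
    """
    chars = list(seq)
    for _ in range(n_changes):
        pos = rng.randrange(len(chars))
        chars[pos] = rng.choice(AMINO_ACIDS)
    return "".join(chars)


def fitness(seq: str, target: str) -> int:
    """Toy assay (an assumption): count positions that match a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(start, target, rounds=30, library_size=200, keep=10, seed=0):
    """Run repeated rounds of modification followed by selection."""
    rng = random.Random(seed)
    parents = [start]
    for _ in range(rounds):
        # Modification: build a library of variants from the current templates.
        library = [mutate(rng.choice(parents), rng) for _ in range(library_size)]
        # Keep the templates too, so the best score can never get worse.
        library.extend(parents)
        # Selection: only the best performers continue to the next round.
        library.sort(key=lambda s: fitness(s, target), reverse=True)
        parents = library[:keep]
    return parents[0]


start = "A" * 12            # hypothetical starting protein
target = "MKTAYIAKQRQI"     # hypothetical optimum, unknown to the "engineer"
best = directed_evolution(start, target)
print(best, fitness(best, target))
```

In a real experiment each round of this loop is a separate piece of bench work, which is why the number of rounds matters so much in practice.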
Current methods of directed evolution have two limitations. First, the functions that can be evolved are limited by current selection technologies, which are good at finding proteins that stick to each other or that catalyze specific chemical reactions, but not so good at optimizing the function of proteins within living cells. Second, every round of variation and selection requires manual setup by researchers. This time expense limits the total number of rounds that can be completed.
Designing better selections
Researchers in David R. Liu’s lab at Harvard University recently developed a new technology that overcomes some of the limitations of existing selection techniques. The researchers call their system Phage-Assisted Continuous Evolution (PACE). PACE uses phages (viruses that infect bacteria) to evolve useful proteins. For example, the scientists wanted to evolve improved versions of the protein called T7 RNA polymerase, which makes RNA molecules from a DNA template. This process is called transcription. Other molecules in the cell subsequently use this RNA as a blueprint for creating new proteins. The scientists take a virus and disable its ability to replicate by removing an essential gene. They then provide the virus with a way to get that gene’s function back by optimizing a protein (in this case, T7 RNA polymerase). The beauty of the new system is that there are many possible ways to return function to an essential gene, and therefore many functions that can be optimized by directed evolution.
To do this, the authors put the protein they want to optimize into the phage. The phage is then allowed to infect bacteria. Once inside a bacterium, the phage can replicate itself by copying its DNA and forcing the cell to make the proteins that form the mature phage structure. One of these proteins, called “p3,” is required for the phage to infect other bacteria. Remember how the scientists wanted to disable the phage’s ability to replicate? They did this by removing the DNA sequence for p3 from the phage. At the same time, they added the p3 gene to the bacteria that the phage infects! This makes the phage dependent on the bacteria for replication. Any process that increases the expression of the p3 protein by the bacteria therefore also increases the replication rate of the phage.
As mentioned earlier, the RNA needed to create a protein is transcribed from DNA. In bacteria, transcription begins at a particular location on the DNA called the promoter. In this study, the scientists evolved the T7 polymerase to initiate transcription at a new, different promoter by inserting the new promoter sequence in front of the p3 gene. In this way, if a phage contained a version of T7 polymerase that increased levels of transcription, it would lead to increased production of p3 and to faster replication of that phage.
PACE is unique because the system evolves continuously, without manual intervention between rounds of selection. A new round of selection is as simple as adding fresh bacteria and washing away the old. At every step, the fastest replicating phages benefit most from the new batch of bacteria, and each new, superior version therefore quickly outpaces its slower brethren. Since adding new bacteria to the mix is easy to automate, the system can be run without human supervision. The end result? The scientists are now able to go through about 200 generations of directed evolution per week!
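To see why faster-replicating phages take over under continuous dilution, consider a minimal sketch of two competing variants. The model and its numbers are assumptions chosen for illustration (they are not measurements from the PACE study): each step, both variants multiply by their own growth factor, and the vessel is then diluted back to a fixed total, so only the fraction of each variant matters.

```python
def compete(fraction_fast: float, rate_fast: float, rate_slow: float, steps: int):
    """Track the fraction of the faster variant over repeated grow-and-dilute steps."""
    history = [fraction_fast]
    for _ in range(steps):
        fast = fraction_fast * rate_fast            # growth of the improved variant
        slow = (1.0 - fraction_fast) * rate_slow    # growth of the original variant
        fraction_fast = fast / (fast + slow)        # dilution back to a fixed total
        history.append(fraction_fast)
    return history


# Hypothetical values: the improved phage starts as 0.1% of the population
# and replicates 2.0-fold per step, versus 1.5-fold for the original.
history = compete(fraction_fast=0.001, rate_fast=2.0, rate_slow=1.5, steps=50)
for step in (0, 10, 20, 30, 40, 50):
    print(f"step {step:2d}: improved variant is {history[step]:.4f} of the population")
```

Even a modest replication advantage compounds at every step, so a rare improved variant comes to dominate the population without anyone having to pick it out by hand.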
Another major advantage of PACE is that it focuses the selection on the genome of the phage, not the bacteria. If the bacteria were allowed to evolve, they might find other ways of expressing the p3 gene that aren’t related to improving the function of a desired protein like T7 polymerase, effectively short-circuiting the selection. With PACE, such mutations may still occur in the bacteria, but they won’t affect the selection because the bacteria are constantly being washed out and replaced with new bacteria that don’t have those mutations. Limiting the evolution to the phage enables selection for more complicated cellular functions, which would otherwise be difficult to achieve because of the potential for short-circuit mutations.
Systems like PACE will be useful for optimizing the behavior of biological systems. PACE and other methods for directed evolution will hopefully provide us with many useful tools for creating new medicines and better chemistry, ultimately improving our health and the environment.
Eric Kelsic is a graduate student in the Systems Biology PhD program at Harvard University.
- Esvelt KM, Carlson JC, Liu DR. A system for the continuous directed evolution of biomolecules. Nature 472 (2011).
- Romero PA, Arnold FH. Exploring protein fitness landscapes by directed evolution. Nat Rev Mol Cell Biol 10(12) (2009). (http://1.usa.gov/qSioD7)
- Baker M. Protein engineering: navigating between chance and reason. Nature Methods 8 (2011).
Additional Reading (Open Access)
- Gitig D. Directed evolution gets a significant speed boost. Ars Technica, 2011. (http://arstechnica.com/science/news/2011/05/using-phage-infections-to-force-the-directed-evolution-of-proteins.ars)
- Stapleton JA, Swartz JR. Development of an In Vitro Compartmentalization Screen for High-Throughput Directed Evolution of [FeFe] Hydrogenases. PLoS ONE 5(12) 2010. (http://bit.ly/nptARG)
- The authors develop a method of selecting new protein catalysts that are useful for biofuel production.
- Bolt A, Berry A, Nelson A. Directed evolution of aldolases for exploitation in synthetic organic chemistry. Arch Biochem Biophys. 2008 June 15; 474(2). (http://1.usa.gov/qWLDql)
- The authors review efforts to evolve enzymes that catalyze the aldol reaction, which is useful for the production of small molecule medicines.
- Du SX, Xu L, Zhang W, Tang S, Boenig RI, et al. A Directed Molecular Evolution Approach to Improved Immunogenicity of the HIV-1 Envelope Glycoprotein. PLoS ONE 6(6) 2011. (http://bit.ly/pntHly)
- The authors evolve a protein that could help make better HIV vaccines by enhancing the immune response.
- Slonczewski JL, Foster JW. Microbiology – An Evolving Science. Ch11 Molecular Biology of Viruses, eTopics 11.1 The filamentous phage M13: vaccines and nanowires. 2nd edition, 2010.