Jan 25 2012
 

Imagine that you could get an injection of a protein that would chop up arterial plaques. Imagine that you could drop a plastic bottle into a pool of bacteria that would transform it back into high-grade oil. Imagine that you could take any organic material at all and, with a minimum of planning, transform it into any kind of desired organic chemical with a bare minimum of energy input and no need to purify intermediates. This is the vision behind the applied structural biology of protein design, the holy grail of which is to come up with a way to make enzymes that will perform novel chemistry. A study recently published online in Nature Biotechnology by David Baker’s group (1) suggests that the design process could be improved by crowdsourcing certain parts of the problem to gamers (the paper is paywalled at Nature but freely available via the Foldit site).

To do this, the Baker group used their program Foldit, which they have used previously for predicting three-dimensional protein structures from their amino acid sequences. Rather than predicting a structure from a known sequence, however, the Baker group asked the Foldit players to figure out an amino acid sequence that would generate a desired structure. The goal was to enhance an enzyme that would perform the chemically useful Diels-Alder reaction.

An enzyme is a protein that increases the rate of (catalyzes) a chemical reaction, often by incredible amounts. The best enzymes can increase reaction rates by factors of up to 10^17 relative to the same reaction occurring in pure water. Protein design aims to produce artificial enzymes with rate enhancements comparable to their natural counterparts. To do this, biochemists try to design an active site that stabilizes the transition state of a chemical reaction. The transition state is the point of a reaction where the molecules are in their least stable state, and equally likely to revert to substrates or continue on and become products.
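To get a feel for what a rate enhancement means in energy terms, here is a minimal sketch. It assumes simple transition-state theory, in which the rate is proportional to exp(-ΔG‡/RT) and the catalyzed and uncatalyzed reactions share the same prefactor. The function names and the choice of 298.15 K are mine, for illustration only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def barrier_reduction_kcal(rate_enhancement, temperature_k=298.15):
    """Drop in activation free energy (kcal/mol) needed for a given rate enhancement.

    Assumes rate ~ exp(-dG/RT) with an unchanged prefactor.
    """
    return R_KCAL * temperature_k * math.log(rate_enhancement)


def enhancement_from_barrier(reduction_kcal, temperature_k=298.15):
    """Inverse of barrier_reduction_kcal: rate enhancement for a given barrier drop."""
    return math.exp(reduction_kcal / (R_KCAL * temperature_k))


best_natural = barrier_reduction_kcal(1e17)
print(f"10^17-fold enhancement ~ {best_natural:.1f} kcal/mol of transition-state stabilization")
```

Under these assumptions, a 10^17-fold enhancement corresponds to roughly 23 kcal/mol of stabilization, which is why every kcal/mol gained in a design matters.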

Unfortunately, it’s not just as simple as stabilizing a transition state. Enzymes have to bind and release their substrates and products, producing energy landscapes that are at least as complex as the one I have drawn below. Using a protein design protocol they had described in previous publications, Baker’s group managed to produce a weak enzyme. They then asked the Foldit players to help out, by posing some specific challenges to try and stabilize the bound substrates. The Foldit players eventually produced an 18-fold improvement in the enzyme’s kcat/KM value. To understand what that means and what the players accomplished, let’s examine this reaction coordinate:

That’s a busy little figure, but it’s not as bad as it looks. The position up or down in the figure indicates how much energy a state has. The more energy, the less likely the system is to occupy that state. Left to right positions show us how close we are to the desired state of the system, which is to have the product (P) we want separate from the enzyme (E) that catalyzed its production from substrate (S). To move from one stable state to another stable state, you have to push the system over hills (energy barriers) in the landscape, just like pushing a car up a hill. The higher the barrier, the slower that step becomes. For simplicity, this diagram shows only one substrate, but the artificial enzyme had two. We can pretend that the Foldit effort started with an enzyme that resembled the blue curve.

We start with E and S separate from each other in solution (E+S). E and S bind to each other to form ES, releasing binding energy. Here I’ve shown a small barrier between E+S and ES, but in many cases there is no barrier here, or it is negligible. Next S is converted to P, and as you can see there is usually a large energy barrier, at the top of which is the transition state (TS). The height of the barrier is determined by the activation energy, which is affected by the structure of the enzyme-substrate complex. Once P has been formed, the complex dissociates so we have free enzyme and product (E+P). Here I have shown E+P to be a lower-energy state than EP, but this won’t necessarily be true.

In the language of Michaelis-Menten kinetics, this landscape is described by two main parameters. KM, also called the Michaelis constant, describes the balance between E+S and ES, and therefore primarily reflects the binding energy. The larger the binding energy, the more ES will be favored, and the lower KM will be. The turnover number, or kcat (maybe we should call this the Menten constant?) describes the creation of product over time, and in this diagram it depends on the activation energy. Again, the larger the activation energy, the lower kcat will be. However, kcat really just depends on the slowest step of the catalytic cycle. If the largest energy barrier was between EP and E+P, kcat would depend on that barrier. Because kcat/KM is something like a normal rate constant, and combines the values in an easy-to-understand way (a higher kcat/KM means a better enzyme), it’s often used to describe an enzyme’s activity.
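The way kcat and KM combine can be sketched with the single-substrate Michaelis-Menten equation, v = kcat[E][S]/(KM + [S]). The numbers below are hypothetical, chosen only to show the two limiting regimes; they are not values from the paper.

```python
import math


def mm_rate(kcat, km, enzyme, substrate):
    """Michaelis-Menten rate v = kcat*[E]*[S] / (KM + [S]) for one substrate."""
    return kcat * enzyme * substrate / (km + substrate)


# Hypothetical parameters for illustration only.
kcat = 10.0      # turnover number, 1/s
km = 1e-3        # Michaelis constant, M
enzyme = 1e-6    # total enzyme concentration, M

# When [S] << KM the rate is governed by kcat/KM, like a second-order rate constant.
low_s = 1e-6
v_low = mm_rate(kcat, km, enzyme, low_s)
v_low_limit = (kcat / km) * enzyme * low_s

# When [S] >> KM the enzyme is saturated and the rate approaches kcat*[E].
high_s = 1.0
v_high = mm_rate(kcat, km, enzyme, high_s)
v_high_limit = kcat * enzyme

print(v_low, v_low_limit)
print(v_high, v_high_limit)
```

The low-substrate limit is the reason kcat/KM behaves "something like a normal rate constant": it multiplies the concentrations of free enzyme and substrate.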

So how did the Foldit players improve the activity by a factor of 18? The original enzyme design left part of the active site open to water. Through a series of iterations, the Foldit players filled in this void with a self-stabilizing helix-loop-helix motif (Figure 1b). The upshot of this was that the affinity of the enzyme for both substrates increased. Thus, KM decreased, as shown in Table 1, for both substrates. At the end of the process, the diene bound six times as tightly and the affinity for the dienophile improved by about a factor of three. This accounts for all the observed change in kcat/KM, because kcat was not improved.
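The arithmetic behind the factor of 18 can be sketched as follows. I am assuming here that the catalytic efficiency of a two-substrate enzyme is expressed as kcat divided by the product of the two KM values, so the fold-improvements in affinity multiply. The function name is hypothetical.

```python
def efficiency_fold_change(kcat_fold, km_fold_decreases):
    """Fold change in kcat/(KM1*KM2*...) given the fold change in kcat and
    the fold decrease in each KM (a 6-fold tighter KM is entered as 6)."""
    result = kcat_fold
    for fold in km_fold_decreases:
        result *= fold
    return result


# kcat unchanged (1x); diene KM about 6x lower; dienophile KM about 3x lower.
improvement = efficiency_fold_change(1.0, [6.0, 3.0])
print(improvement)  # 18.0
```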

Although it may not seem like it, we can also learn a great deal from the fact that kcat did not change. This observation shows that the changes made by the Foldit players did stabilize the TS. Otherwise, the energy barrier would have increased when they stabilized the ES complex. However, the best-case scenario would have been for them to uniquely stabilize TS without improving the energy of ES, because this would effectively lower the energy barrier and increase the reaction rate. Because this didn’t happen, the situation follows the orange curve in the figure above: the ES and TS states have shifted down in energy by the same amount, with no change to the activation energy.

The lack of change in kcat also indicates that the Diels-Alder reaction itself, rather than product dissociation, is rate-limiting for the enzyme. My reasoning here is that the increase in affinity is general. We know that both the ES and TS complexes were stabilized by the changes, so EP probably was too, as shown in the orange curve. If the EP → E+P transition were rate-limiting, these stabilizing mutations would have made the enzyme slower.
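The reasoning in the last two paragraphs can be checked with a toy landscape. All of the energies below are invented for illustration (arbitrary units). I also assume that the mutations stabilize only the bound states (ES, TS, EP) and leave the top of the product-release barrier, labeled TS_off here, untouched.

```python
# Hypothetical energies for the starting enzyme (the "blue" curve), arbitrary units.
blue = {"E+S": 0.0, "ES": -3.0, "TS": 12.0, "EP": -5.0, "TS_off": -1.0, "E+P": -8.0}


def stabilize(levels, states, amount):
    """Return a copy of the landscape with the named states lowered by `amount`."""
    return {name: energy - amount if name in states else energy
            for name, energy in levels.items()}


def chemistry_barrier(levels):
    """Activation energy for ES -> TS, which sets kcat if chemistry is rate-limiting."""
    return levels["TS"] - levels["ES"]


def release_barrier(levels):
    """Barrier for EP -> E+P, which would set kcat if product release were rate-limiting."""
    return levels["TS_off"] - levels["EP"]


# The "orange" curve: all bound states shift down by the same amount.
orange = stabilize(blue, {"ES", "TS", "EP"}, 1.5)

print(chemistry_barrier(blue), chemistry_barrier(orange))  # unchanged
print(release_barrier(blue), release_barrier(orange))      # release barrier grows
```

In this sketch the chemical barrier is identical before and after, matching an unchanged kcat, while the release barrier rises. If release had been rate-limiting, the same mutations would have slowed the enzyme.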

The Foldit players made this a better enzyme, but that doesn’t exactly mean that it’s an impressive one. The observed kcat is significantly slower than almost any natural enzyme, and the overall rate enhancement is on the order of 10^3 to 10^4, which is not much better than catalytic antibodies. The success of the Foldit players at improving the affinity of the enzyme for all the bound states suggests that it might be possible to use crowdsourced systems like Foldit to accomplish the more difficult feat of stabilizing a TS, or at least to generate folds that support a pre-defined TS. The ultimate goal is to produce something like the green curve, where substrate binding is stronger and activation energy is lower. I hope that such efforts will be taking place among the Foldit players soon, if they haven’t started already.

Disclaimer: I am part of an ongoing collaboration with David Baker’s group unrelated to the Foldit program.

1) Eiben, C., Siegel, J., Bale, J., Cooper, S., Khatib, F., Shen, B., Foldit Players, Stoddard, B., Popovic, Z., & Baker, D. (2012). Increased Diels-Alderase activity through backbone remodeling guided by Foldit players. Nature Biotechnology. DOI: 10.1038/nbt.2109. Also available for free from the Foldit site.

Sep 20 2011
 

One of the goals of computational biology is to predict the complete high-order structure of a protein from its amino acid sequence. Often reasonably good structures can be produced by modeling a new protein according to an already-known structure of a homologous protein, one with a similar sequence and presumably a similar structure. However, these structures can be inaccurate, and obviously this method will not work if no homologous structure is known.

Foldit is an online game developed by the research team of Dr. David Baker that attempts to address this problem by combining an automated structure prediction program called ROSETTA with input from human players who manually remodel structures to improve them. Even though most of the players have little or no advanced biochemical knowledge, Foldit has already had some striking results improving on computational models. An upcoming paper in Nature Structural & Molecular Biology (1) (PDF also available directly from the Baker lab) details some interesting new successes from the Foldit players.

Contrary to some reports, the Foldit players did not solve any mystery directly related to HIV, although their work may prove helpful in developing new drugs for AIDS. What the Foldit players actually did was to outperform many protein structure prediction algorithms in the CASP9 contest, and to play a key role in helping solve the structure of an unusual protease from a simian retrovirus.

M-PMV Protease

If you don’t recognize Mason-Pfizer Monkey Virus (M-PMV) as a cause of AIDS in humans, that’s because it isn’t. It causes acquired immune deficiency in macaques, however, and it has an unusual protease that may tell us useful things.


A crystal structure of an inactive mutant of HIV-1 protease in complex with its substrate. The protease monomers are in dark green and cyan, the substrate is represented as purple bonds.

Retroviruses like HIV often produce proteins in a fused form rather than as individual folded units. In order to be functional, the various proteins must be snipped out of these long polyprotein strands, so the virus includes a protease (protein-cutting enzyme) to do this. In most retroviruses, this protease is dimeric: it is composed of two protein molecules with identical sequences and similar, symmetric structures. The long-known structure of HIV protease, seen on the right (learn more about HIV protease or explore this structure at the Protein Data Bank) is an example of this architecture.

People infected with HIV often take protease inhibitors to interfere with viral replication. These drugs attack the active site, where the chemical reaction that cuts the protein strand takes place, but it has been theorized that viral proteases could also be attacked by splitting up the dimers into single proteins, or monomers. The problem is, the free monomer structures aren’t known.

This is where the M-PMV protease comes in. Although it is homologous to the dimeric proteases, M-PMV protease is a monomer in the absence of its cutting target. If we knew this protein’s structure, we could perhaps design drugs that would stabilize other proteases in their monomer form, rendering them inactive. An attempt to determine the structure using nuclear magnetic resonance (NMR) data produced models that seemed poorly folded and had bad ROSETTA energy scores. And, although the protein formed crystals, X-ray crystallography could not solve its structure either, despite a decade of effort.

An X-ray diffraction pattern.

The reason for this has to do with how X-ray crystallography works. If you fire a beam of X-rays at a crystal of a protein, some of the rays will be deflected by electrons within it and you will observe a pattern of diffracted dots similar to the one at left, kindly provided by my colleague Young-Jin Cho. The intensities and locations of these dots depend on the structure and arrangement of the molecules within the crystal. X-ray crystallographers can use the diffraction patterns to calculate the electron density of the protein and fit the molecular bonds into it (below, also courtesy of Young-Jin). However, the electron density cannot be calculated from the diffraction pattern unless the phases of the diffracted X-rays are also known. Unfortunately, the dots record only intensities, so there is no way to calculate the phases from them directly.
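The phase problem can be illustrated with a toy one-dimensional "crystal". This is only a sketch: the density values are made up, and a plain discrete Fourier transform stands in for real diffraction. The detector is assumed to record amplitudes (the square roots of the intensities) but no phases.

```python
import cmath


def dft(values):
    """Discrete Fourier transform: toy 'structure factors' for a 1D density."""
    n = len(values)
    return [sum(v * cmath.exp(-2j * cmath.pi * h * x / n) for x, v in enumerate(values))
            for h in range(n)]


def inverse_dft(coeffs):
    """Inverse transform back to a real-valued density."""
    n = len(coeffs)
    return [(sum(c * cmath.exp(2j * cmath.pi * h * x / n) for h, c in enumerate(coeffs)) / n).real
            for x in range(n)]


# Made-up electron density along one unit cell.
density = [0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0, 0.0]

structure_factors = dft(density)
amplitudes = [abs(f) for f in structure_factors]      # what the experiment measures
phases = [cmath.phase(f) for f in structure_factors]  # what the experiment loses

# With the correct phases, the density comes back.
with_phases = inverse_dft([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])

# With the phases discarded (all set to zero), the map is wrong.
without_phases = inverse_dft([complex(a, 0.0) for a in amplitudes])

print([round(v, 3) for v in with_phases])
print([round(v, 3) for v in without_phases])
```

The second map piles density at position 0, where the true density is empty. The amplitudes alone do not pin down the structure.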


An electron density model (wireframe) with the chemical bonds of the peptide backbone (heavy lines) fitted into it.

There are many ways to solve this problem, but not all of them work in every system. One widely-applicable approach is called “molecular replacement”. In this method, a protein with a structure similar to that of the one being studied is used to guess the phases. If this guess is close enough, the structure factors can be refined from there. In the case of M-PMV protease, however, the dimeric homologues could not be used for replacement, and an attempt to use the NMR structure to calculate the phases also failed.

Then the Foldit players went to work. Starting from the NMR structure, Foldit players made a variety of refinements. A player called spvincent made some improvements using the new alignment tool, which a player called grabhorn improved further by rearranging the amino acid side chains in the core of the molecule. A player named mimi added the final touch by rearranging a critical loop.

Going from mimi’s structure (several others also proved suitable), the crystallographers were able to solve the phase problem by molecular replacement and finally determine the protease’s structure. None of the Foldit results were exactly right, so it’s inaccurate to say that the players solved the structure. However, their models were very close to the right answer, and provided the critical data that allowed the crystal structure to be solved. Once the paper is published, you’ll be able to find that structure at the PDB under the accession code 3SQF.

We can’t know right now whether this structure will enable the design of new drugs, but the Foldit players were the key to giving us a better chance of using it for this purpose. What may be even more exciting is the possibility that Foldit could be used in other structural studies to come up with improved starting models for molecular replacement. As with any method of predicting protein structures, however, the gold standard is CASP, so the Foldit teams participated in CASP9.

CASP9

The Critical Assessment of protein Structure Prediction is a long-running biennial test of computer algorithms to calculate a protein’s structure from its sequence. This experiment in prediction has a fairly simple setup.

1) Structural biologists give unpublished structures to the CASP organizers.

2) The sequences belonging to these structures are given to computational biologists.

3) After a set period, the computational predictions are compared to the known structural results.

The Baker group generated starting structures using ROSETTA, then handed the five lowest-energy results off to the Foldit players. For proteins that had known homologues, the results were disappointing. Foldit players did well, but they overused Foldit’s ROSETTA-based minimization routine, which tended to distort conserved loops.

An energy landscape showing an incorrect move towards a false minimum and a correct, more difficult move towards a true minimum.

The nature of this problem became even clearer when the Baker group handed the Foldit players ROSETTA results for proteins that had no known homologues. In that case they noticed that players were using the minimization routine to “tunnel” to nearby, incorrect minima. You can get a feel for what that means by looking at the figure to the left.

In this energy landscape diagram, the blue line represents every possible structure of a pretend protein laid out in a line, with similar structures near each other and the higher-energy (worse) structures placed higher on the Y axis. From a relatively high-energy initial structure, Foldit players tended to use minimization to draw it ever-downward towards the nearest minimum-energy structure (red arrow). Overuse of the computer algorithm discouraged them from pulling the structure past a disfavored state that would then start to collapse towards the true, global minimum energy (green arrow).
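The trap described above is easy to reproduce with a toy landscape. This is a minimal sketch with invented energies and a simple greedy downhill walk. It is not ROSETTA's minimizer, only an illustration of why starting on the wrong side of a hump leads to a false minimum.

```python
def minimize(landscape, start):
    """Greedy descent: step to the lower neighbor until no neighbor is lower."""
    position = start
    while True:
        neighbors = [i for i in (position - 1, position + 1) if 0 <= i < len(landscape)]
        best = min(neighbors, key=lambda i: landscape[i])
        if landscape[best] >= landscape[position]:
            return position  # stuck in a minimum (not necessarily the global one)
        position = best


# Invented energies: a false minimum at index 2, a hump at index 5,
# and the true (global) minimum at index 8.
landscape = [9, 7, 5, 6, 8, 10, 7, 4, 1, 3, 6]

stuck = minimize(landscape, start=4)   # plain minimization from the initial model
pushed = minimize(landscape, start=6)  # same model after being dragged past the hump

print(stuck, landscape[stuck])    # 2 5
print(pushed, landscape[pushed])  # 8 1
```

Getting from index 4 to index 6 means passing through the highest-energy state on the landscape, which is exactly the move a player who leans on the minimizer never makes.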

The Foldit players still had some successes. For instance, they were able to recognize one structure ROSETTA didn’t like very much as a near-native structure. The Void Crushers team successfully optimized this structure, producing the best score for that particular target, and one of the highest scores of the CASP test. If the initial ROSETTA structures had too low a starting energy, though, the players wouldn’t perturb them enough to get over humps in the landscape.

Thus, Baker’s group tried a new strategy. Taking the parts of one structure that they knew (from the CASP organizers) had a correct structure, they aligned the sequence with those parts and then took a hammer to the rest, pushing loops and structural elements out of alignment. This encouraged the players to be more daring in their remodeling of regions where the predictions had been poor, while preserving the good features of the structure. Again, the Void Crushers won special mention, producing the best-scoring structure of target TR624 in the whole competition.

Man over machine?

Does this prove that gamers know more about folding proteins than computers do? Some of them might, but Foldit doesn’t really use human expertise. Rather, the game uses human intelligence to identify when the ROSETTA program has gone down the wrong path and figure out how to push it over the hump. When the human intelligences aren’t daring enough, or trust the system too much, as in the case of the CASP results, Foldit doesn’t do any better than completely automated structural methods. When the human players are encouraged to challenge the computational results, however, the results can be striking. As Baker’s group are clearly aware, further development of the program needs to be oriented towards encouraging players to go further afield from the initial ROSETTA predictions. This will likely mean many more failed attempts by players, but also more significant successes like these.

Disclaimer: I am currently collaborating with David Baker’s group on a research project involving ROSETTA (but not Foldit).

1) Khatib, F., DiMaio, F., Cooper, S., Kazmierczyk, M., Gilski, M., Krzywda, S., Zabranska, H., Pichova, I., Thompson, J., Popović, Z., Jaskolski, M., & Baker, D. (2011). Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology DOI: 10.1038/nsmb.2119

Sep 06 2011
 

Over the last two decades, multiple kinds of NMR experiments have repeatedly shown that protein structures are quite variable, frequently shifting to minor conformations. The most striking evidence in this line has come from hydrogen-exchange experiments, which have demonstrated that virtually all proteins undergo excursions to partially-folded states at equilibrium. As R2 relaxation-dispersion experiments have become more widely used, excursions to alternative folded states have repeatedly been detected. The challenge now is to find ways to characterize these low-population states. Advanced crystallographic techniques have proven useful in determining some of these alternative structures. However, proteins are not always amenable to crystallography, and the minor state in the crystal may not correspond exactly to the minor state in solution. Therefore there is an ongoing effort to define these states by NMR. Lewis Kay’s group in Toronto is at the forefront of this effort, and recently reported the solution structure of a minor state of a T4 lysozyme mutant (1).

Lysozyme is an extremely common enzyme because it has the useful property of degrading the peptidoglycan that makes up bacterial cell walls. This makes it a natural antibiotic against gram-positive bacteria, and as a result it is found in many secretions and fluids, including saliva and egg whites. Because it is plentiful it has been widely studied, with many mutants made and characterized for their activity and stability. Lysozyme also crystallizes easily — doing this was actually part of my biochemistry lab class back in college. So, many structures of the enzyme and its mutants are available.

T4 lysozyme L99A with benzene bound

One lysozyme mutant that has interesting properties is the L99A mutant of the lysozyme from the T4 bacteriophage. This mutation creates a cavity in the upper part of the protein that is known to bind hydrophobic ligands such as benzene (right, benzene in purple, PDB code 3DMX). However, crystal structures show this binding pocket to be completely buried, even when empty. This poses the question of how the ligand gets in. Although the structure of L99A is very similar to WT, the Kay lab noticed that the NMR spectra of the mutant contained broadened peaks, indicating the presence of an exchange process between two conformations. Therefore, the Kay lab used R2 relaxation-dispersion to show that the protein sampled a minor state that accounted for 3% of the total protein, with a lifetime of about 1 ms (2). This conformation was presumed to be the binding-competent form of the protein. However, without a structure of this state, they could not confirm that the pocket was accessible. This led to their present attempts to characterize this low-population state using NMR.

As I have mentioned before, R2 relaxation-dispersion experiments can provide three important pieces of information: the populations of the two conformational states (pG, pE for ‘Ground’ and ‘Excited’), the rate of exchange between them (kEX = kGE + kEG), and the difference in chemical shift between the two states at each nucleus (|Δω|). Because the chemical shift is determined by the protein conformation, and because additional experiments can determine the sign of Δω, it should be possible to figure out the structure of the alternate state, given enough relaxation-dispersion data. Therefore, the Kay lab performed a large number of experiments to determine Δω for nearly all of the backbone 15N, 13C, and 1H atoms, as well as many side-chain methyl groups. They then fed this data to the CS-ROSETTA protocol, which can determine a protein structure using chemical shifts alone. While holding the majority of the protein in a single conformation, they allowed CS-ROSETTA to remodel the part of the mutant where they had detected conformational fluctuations.
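To connect these parameters to the numbers quoted earlier (a minor state at about 3% population with a lifetime of about 1 ms), here is a rough two-state sketch. I am assuming that the 1 ms figure is the lifetime of the excited state, so that kEG = 1/lifetime, and that the populations are at equilibrium, so that pE = kGE/(kGE + kEG). The function name is mine.

```python
def two_state_rates(p_excited, lifetime_excited_s):
    """Return (k_GE, k_EG, k_EX) in 1/s for a two-state G <-> E equilibrium.

    Assumes the lifetime of E is 1/k_EG and p_E = k_GE / (k_GE + k_EG).
    """
    k_eg = 1.0 / lifetime_excited_s
    k_ge = k_eg * p_excited / (1.0 - p_excited)
    return k_ge, k_eg, k_ge + k_eg


k_ge, k_eg, k_ex = two_state_rates(p_excited=0.03, lifetime_excited_s=1e-3)
print(f"k_GE ~ {k_ge:.0f} 1/s, k_EG ~ {k_eg:.0f} 1/s, k_EX ~ {k_ex:.0f} 1/s")
```

Under these assumptions the ground state converts to the excited state about 31 times per second, while the excited state relaxes back about 1000 times per second, giving a kEX of roughly 1031 per second.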

Lysozyme minor state/major state overlay
Major state (green) and 5 lowest-energy conformers of the minor state (Excited) ensemble (blue)

Using this method, they were able to produce a structure of the transiently-populated minor state of the mutant protein, which I show to the left in comparison to the major conformation (PDB codes 2LCB and 3DMV, respectively, aligned using residues 10-100, 150-160). The most dramatic change is that two of the helices have been fused into one. As you can see, the new helix clashes with the usual position of phenylalanine 114 (pale green, because of the overlap it’s hard to see), which has in turn shifted so that it occupies part of the cavity where benzene binds (pale blue). This suggests, contra the Kay group’s earlier work, that the minor state is also incapable of binding to benzene.

This is a difficult prediction to test in the L99A system because the minor state (E) lives for such a short time that it’s difficult to tell whether anything binds to it or not. Therefore, Bouvignies et al. made a double-mutant protein with the L99A mutation and an additional G113A mutation that was predicted to stabilize the long helix observed in the minor form. This turned out to be the case: the E structure was enriched in the double mutant. In addition, the interconversion rate was slow enough that at low temperature distinct peaks could be observed for each conformation, as well as cross-peaks indicating exchange between them (I discussed this kind of experiment in my previous posts about cyclophilin). Under these conditions, the minor form is sufficiently populous and long-lived to determine whether ligands bind to it.

The Kay group did this by adding an equimolar amount of benzene to the reaction and observing whether there were exchange peaks. If you examine their figure 3c, it’s clear that exchange occurs between all three possible states: (G)round, (E)xcited, and (B)ound. This might seem to contradict their hypothesis. However, the E→B exchange peaks have very low intensity and take significantly longer to reach a maximum than the other exchange peaks. Therefore, this exchange peak may represent a low-frequency E→G→B event rather than direct exchange between the E and B states. Fits of the exchange curves seem to substantiate this interpretation, as the fit tended towards a value of zero for kEB and the χ^2 jumped up significantly when kEB was fixed to a very low number.

My only concern with this result is that the kEG rate changes from ~31 to ~36 s^-1 when benzene is added (kGE remains the same). It’s possible that the presence of benzene really does accelerate this process, or that the errors are underestimated. The model might also be janky in some hidden way, but my back-of-the-envelope check of the parameters suggests that the results are consistent with what is known about benzene binding to the L99A mutant. For example, various ways of calculating the KD from these data produce a value of approximately 1 mM, matching earlier results.

If the E state does not represent a binding-competent state, that means the protein must be exchanging to yet another, still-undetected state. According to Bouvignies et al., the E structure they determined can account for all of the observed chemical exchange. If the alternative state that is capable of admitting benzene to the hydrophobic pocket cannot be detected by relaxation-dispersion experiments, it must constitute a very small fraction of the overall protein population (< 1%) and undergo very fast exchange. In principle, the existence of such a process can be detected using experiments designed to measure the intrinsic R2 of a residue, and also should be detectable using 1H experiments directed towards the methyl groups (the side chains likely represent the best bet for explaining the phenomenon). It does not appear that those experiments have been done yet, but I’m certain they’re underway.

Bouvignies et al. made a third construct incorporating the R119P mutation to stabilize the E state even further. This succeeded, producing a protein that spent most of its time in the E state and occasionally sampled the G state. The paper contains no data as to whether benzene detectably binds this mutant, although that strikes me as an obvious experiment to try. Presumably the obligate route through a high-energy intermediate would slow the kinetics of binding relative to the single mutant. If the penalty for adopting the G fold in this mutant is high enough, it might also significantly reduce the affinity.

The findings in this paper are not of any immediate practical use. The L99A mutant is a biophysical curiosity, not a disease target, and most of these techniques have been presented before, at least individually. However, this does serve as a very nice example of the advanced NMR methods that allow the determination of minor states, and of the surprising findings that can be derived from them. This paper should serve as a model approach to this sort of question, which may find broad applicability in the study of signaling, ligand binding, and protein evolution.


Disclaimer: I am currently collaborating with David Baker’s lab on a research project using ROSETTA.

1) Bouvignies G, Vallurupalli P, Hansen D, Correia B, Lange O, Bah A, Vernon R, Dahlquist FW, Baker D, & Kay LE (2011). Solution structure of a minor and transiently formed state of a T4 lysozyme mutant. Nature, 477 (7362), 111-114. DOI: 10.1038/nature10349

2) Mulder FA, Mittermaier A, Hon B, Dahlquist FW, & Kay LE (2001). Studying excited states of proteins by NMR spectroscopy. Nature structural biology, 8 (11), 932-5 PMID: 11685237