Jan 25 2008
 
Blogging on Peer-Reviewed Research

Anonymous left a comment on my post on Brüschweiler’s work, referencing a couple of papers by Amarda Shehu, Cecilia Clementi, and Lydia Kavraki, the cites for which you can find at the bottom of this post. The most fascinating thing about these papers is the remarkable fidelity with which their Protein Ensemble Method (PEM) reproduces NMR-derived order parameters, 3-bond J couplings, and residual dipolar couplings. The authors demonstrate excellent correlations for ubiquitin, eglin c, Fyn SH3, Fnf10, and CI-2, and while all of these are relatively small proteins, this is still a major accomplishment. Nonetheless, it is striking how little we learn from the exercise.

Keep in mind that one of the key goals of a structural biology research program is to get a veridical ensemble, i.e. an ensemble of structures that closely resembles those actually sampled by a protein under equilibrium conditions. We can learn important information from other kinds of ensembles, but the one that contains the information we are really after is the veridical ensemble.

The limitation here is intrinsic to the technique, so the technique bears some explanation. PEM utilizes an algorithm derived from robotics to move pieces of the protein. Initially, the approach was designed to map the ensemble of structures available to a loop, and I want to stress that with regard to that task I have no complaints. For mapping out the range of likely conformations of these regions this seems like an excellent approach, and the second figure of the 2006 paper seems to put this usage on fairly solid footing. The overall idea is that positioning the ends of a loop next to their anchor points is similar to the problem of getting a robotic arm with some number of degrees of freedom to adopt a particular pose. The authors’ algorithm solves this inverse kinematic problem with a coarse-grained view of the backbone. At this point the backbone is frozen, the side chains are added back, and their conformations are sampled randomly. The conformations thus generated are then subjected to energy refinement using a conventional force field. For a loop with no surroundings, this is all well and good.
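To make the robot-arm analogy concrete, here is a toy sketch of an inverse kinematics solve for a planar chain. It uses cyclic coordinate descent (CCD), a common loop-closure technique that I am swapping in for illustration; I am not claiming it is the algorithm used in PEM. The function names, the 2D geometry, the link lengths, and the target point are all assumptions made for this example.

```python
import math

def forward(angles, lengths):
    """Return the joint positions of a planar chain that starts at the origin.

    angles[i] is the bend at joint i relative to the previous link.
    """
    pts = [(0.0, 0.0)]
    heading = 0.0
    for a, l in zip(angles, lengths):
        heading += a
        x, y = pts[-1]
        pts.append((x + l * math.cos(heading), y + l * math.sin(heading)))
    return pts

def end_distance(angles, lengths, target):
    """Distance between the free end of the chain and the target anchor."""
    ex, ey = forward(angles, lengths)[-1]
    return math.hypot(ex - target[0], ey - target[1])

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=200):
    """Toy cyclic coordinate descent: adjust one joint at a time so that the
    chain end moves as close to the target as that single joint allows."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            end, pivot = pts[-1], pts[i]
            # Rotate about joint i so pivot->end points along pivot->target.
            a_end = math.atan2(end[1] - pivot[1], end[0] - pivot[0])
            a_tgt = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            angles[i] += a_tgt - a_end
        if end_distance(angles, lengths, target) < tol:
            break
    return angles

# Hypothetical four-link "loop" whose free end must reach an anchor point.
lengths = [1.0, 1.0, 1.0, 1.0]
start = [0.3, 0.3, 0.3, 0.3]
anchor = (1.5, 1.0)
closed = ccd_close(start, lengths, anchor)
```

Running the solver from different starting angles gives different closed conformations, which is the sense in which such a method can map out an ensemble for a loop with fixed anchors.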

The problem arises when the whole protein is subjected to the technique. This is done by using a rolling window of residues: the fragment is chosen, an ensemble defined for it while the rest of the protein is held rigid, and then the window moves to the next overlapping fragment. The various structures determined in this phase are all stored; the dynamic properties of a given residue are derived from a weighted average of all snapshots of all fragments that include that residue, with the exception that the first and last few residues of any fragment are out of bounds due to artificial restraints.
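The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the window size, step, trim length, and snapshot weights are hypothetical parameters, and the per-residue "values" stand in for whatever observable is being averaged.

```python
def rolling_windows(n_residues, window, step):
    """Return (start, end) pairs (end exclusive) for overlapping fragments."""
    starts = list(range(0, n_residues - window + 1, step))
    # Make sure the final fragment reaches the last residue.
    if starts[-1] != n_residues - window:
        starts.append(n_residues - window)
    return [(s, s + window) for s in starts]

def per_residue_average(fragment_ensembles, n_residues, trim):
    """Weighted average of a per-residue observable over every snapshot of
    every fragment that covers the residue.

    fragment_ensembles: list of (start, end, snapshots), where snapshots is a
    list of (weight, values) and values[k] belongs to residue start + k.
    The first and last `trim` residues of each fragment are ignored.
    """
    num = [0.0] * n_residues
    den = [0.0] * n_residues
    for start, end, snapshots in fragment_ensembles:
        for weight, values in snapshots:
            for k in range(trim, (end - start) - trim):
                num[start + k] += weight * values[k]
                den[start + k] += weight
    # Residues never covered by an untrimmed fragment position get None.
    return [n / d if d > 0 else None for n, d in zip(num, den)]

# Two hypothetical overlapping fragments with one weighted snapshot each.
fragments = [
    (0, 4, [(1.0, [1.0, 1.0, 1.0, 1.0])]),
    (2, 6, [(3.0, [5.0, 5.0, 5.0, 5.0])]),
]
untrimmed = per_residue_average(fragments, 6, trim=0)
trimmed = per_residue_average(fragments, 6, trim=1)
```

Note that nothing in this scheme ever combines a snapshot from one window with a snapshot from another into a single whole-protein structure; the averaging happens residue by residue.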

The ensemble of structures derived is therefore not veridical. Because of the fragment-replacement approach, only a single part of the protein is ever actually departing from the equilibrium or minimum-energy structure—it is unlikely that motions are actually distributed this way. Moreover, because the endpoints of all snapshots cannot be simultaneously resolved, it is not possible to assemble whole-protein conformational ensembles from the individual fragment ensembles. So, no individual snapshot is likely to reflect a significantly populated member of the ensemble, and also there is no way to collate the snapshots in such a way that the energetics of the real ensemble are accurately sampled. We thus end up with an ensemble of structures that does not reflect the set of structures actually sampled by the protein at equilibrium.

As a result, the ensemble that is produced can give us only limited information about the protein. For instance, this might be a reasonably reliable way to predict what sorts of deformations are possible or likely in a binding interaction. Also, PEM probably does a good job of reporting at least the lower limit of the range of the structural ensemble. However, because it does not allow for significant compensating deformations outside of the modeled region, the conformations obtained probably do not cover the entire solution ensemble even for a particular fragment.

A clear implication of this work is the idea that the data are dominated by local fluctuations. That is, dynamics information derived from NMR relaxation experiments, quantitative J-coupling analysis, and RDCs primarily reflects short-range motions that do not involve major excursions from the overall structure. If this were not the case, it is unlikely that an intrinsically short-range method such as PEM could reproduce the data so well. This is not exactly a surprise, however, and the nature of PEM for the most part prevents us from learning how local motions in one region of the protein affect local motions in a distal region.

In a larger sense, however, this work reinforces the idea that the ideal approach to constructing a veridical ensemble will involve some combination of coarse-grained and all-atom approaches. The key problem here is not the computational method but the windowing. If the inverse kinematics approach used here can be extended to treat the whole protein—or at least multiple regions of the protein—simultaneously, then I think the situation improves. The question is whether this kind of algorithm will be any more efficient than MD if the whole system is in motion; I suspect at least some part of the computational savings (after what comes automatically with the coarse-graining during step one) arises from having rigid context for the fragment motions. However, this approach is also likely to be more amenable to parallelism than standard MD simulations, and because of the coarse-graining it has the ability to sample structures accessible on a timescale longer than MD can treat.

The authors of these studies imply that their future focus will be on extending this approach to larger structures; I would urge them instead to prioritize developing a way to employ PEM or a similar method without relying on fragment replacement.

Shehu, A., Kavraki, L.E., Clementi, C. (2007). On the Characterization of Protein Native State Ensembles. Biophysical Journal, 92(5), 1503-1511. DOI: 10.1529/biophysj.106.094409

Shehu, A., Clementi, C., Kavraki, L.E. (2006). Modeling protein conformational ensembles: From missing loops to equilibrium fluctuations. Proteins: Structure, Function, and Bioinformatics, 65(1), 164-179. DOI: 10.1002/prot.21060

 Posted by at 10:45 PM
Nov 08 2007
 

Blogging on Peer-Reviewed Research

As I mentioned in my last post, a major challenge in the interpretation of protein motion has been the poor correlation between dynamics information arising from computer molecular dynamics simulations and NMR relaxation experiments. MD simulations complement the order parameters of the model-free formalism by providing detailed descriptions of motions that model-free parameters describe in a general way. However, the output of an MD simulation is only to be trusted if its specific model matches the experimental observations from NMR. Historically this has not been the case, especially when it comes to the description of side-chain motions.

Rafael Brüschweiler’s lab has for several years been engaged, with some success, in an effort to improve MD simulations to the point where they can predict S² values that match NMR results for side chains. While a cold-eyed analysis of the correlations they’ve obtained so far (r values near 0.6) might not be very favorable, the comparisons aren’t that bad. Their most recent communication to JACS (citation at the end of the post), appearing online last week, displays a marked improvement in the correlations between simulation and experiment. The r values are still not that close to 1, but the current results appear to be a significant step forward.
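For readers who have not seen how an order parameter comes out of a trajectory, here is a minimal sketch using the standard equilibrium expression S² = (3/2) Σᵢⱼ ⟨μᵢμⱼ⟩² − 1/2, where μ is the unit bond vector in a molecule-fixed frame. This is an assumption about a generic workflow, not a description of what Showalter et al. actually did; the function name and the toy vectors are hypothetical, and methyl-specific details such as rotation about the symmetry axis are ignored.

```python
import math

def order_parameter(vectors):
    """Generalized order parameter S^2 from a list of bond vectors.

    Assumes overall tumbling has already been removed, i.e. every vector is
    expressed in the same molecule-fixed frame.
    """
    n = len(vectors)
    units = []
    for x, y, z in vectors:
        r = math.sqrt(x * x + y * y + z * z)
        units.append((x / r, y / r, z / r))
    total = 0.0
    for i in range(3):
        for j in range(3):
            # Ensemble average of the product of two vector components.
            mean = sum(u[i] * u[j] for u in units) / n
            total += mean * mean
    return 1.5 * total - 0.5

# A completely rigid vector gives S^2 = 1.
rigid = order_parameter([(0.0, 0.0, 1.0)] * 10)

# Vectors spread evenly over the six axis directions give S^2 = 0.
isotropic = order_parameter([
    (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1),
])
```

The two limiting cases bracket what the simulations are trying to hit: real side chains fall somewhere in between, and the question is whether the simulated value lands near the NMR-derived one.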

So, what was done differently? Showalter et al. use an altered version of the AMBER99 forcefield that has a modified dihedral angle potential. This had good results in simulating the dynamics of backbone amide moieties, although MD has historically done a reasonably good job with these anyway. In this communication they simulated the side-chain motions of calbindin and compared them to experimental values. They calculated spectral densities J(ω) from their simulated correlation functions, though they are required in this case to make use of experimentally determined molecular correlation times. They do a strikingly good job of predicting the J(ω) at the Larmor frequency of deuterium and also at twice that frequency (figure filched from paper):
This really is an amazingly good job. Yet, as you can see from their figure 2 (a part of it is at right), the S² values they obtain aren’t very close to those that are derived from NMR experiments. This is also reflected in the relatively poor agreement at J(0) (r=0.86). This seems very odd, because the magnitude of J(0) depends very strongly on τm, which they took from an NMR experiment. As is evident from the model-free expression for J(ω), the order parameter scales this term. Keep in mind that τm is typically on the order of 10⁻⁹ seconds while τe is on the order of 10⁻¹¹ seconds, which means that the second term in the spectral density expression is negligible at J(0). Because they took their τm from experimental data, the decreased correlation at J(0) indicates that their correlation functions converged to inaccurate values.
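For reference, the argument can be written out with the standard Lipari-Szabo model-free spectral density. This is the textbook form, not an equation reproduced from the paper, and the example value S² = 0.5 is chosen only for illustration.

```latex
% Model-free spectral density, with \tau_m the molecular correlation time
% and \tau_e the effective internal correlation time.
J(\omega) = \frac{2}{5}\left[
    \frac{S^2\,\tau_m}{1 + (\omega\tau_m)^2}
  + \frac{(1 - S^2)\,\tau}{1 + (\omega\tau)^2}
\right],
\qquad
\frac{1}{\tau} = \frac{1}{\tau_m} + \frac{1}{\tau_e}

% At zero frequency both denominators equal 1:
J(0) = \frac{2}{5}\left[ S^2\,\tau_m + (1 - S^2)\,\tau \right]

% With \tau_m \sim 10^{-9}\,\mathrm{s} and \tau_e \sim 10^{-11}\,\mathrm{s},
% \tau \approx \tau_e, so the ratio of the second term to the first is
\frac{(1 - S^2)\,\tau}{S^2\,\tau_m} \approx 10^{-2}\,\frac{1 - S^2}{S^2}
\quad (\text{about } 1\% \text{ for } S^2 = 0.5)
```

So for moderate order parameters J(0) is approximately (2/5) S² τm, and with τm fixed by experiment any disagreement at J(0) points back to S².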

Giving the data a once-over, it appears that their fitted order parameters were mostly high. It’s not clear to me why this should be so, except that over-constraint of the backbone may be affecting the side chains. From the supplementary information it appears that fits of backbone order parameters were also slightly higher in MD than experiment.

The most notable failure is not much help because it missed low. The significant outlier in the J(0) plot is threonine 45, shown in orange at right—this figure is made from PDB structure 3ICB, which was used in this simulation. The correlation function for this residue fails to converge, largely due to sampling of an alternate ψ angle. This behavior is consistent with the observations of low order parameters in that particular loop of the protein. From the crystal structure it appears that the hydroxyl moiety of Thr 45 is capping a helix (blue). It’s possible that the misbehavior of this particular residue is due to some miscalibration of the force-field that doesn’t accurately capture this capping interaction. The altered backbone dihedral angle potential may be overwhelming the hydrogen bonding interaction, resulting in the aberrantly low J(0) fit for this methyl group.

In general, the simulations for threonines and valines were not as accurate as those for other types of residues, which seems a little strange. These were also unusual in that they missed low, while, as I mentioned, on average residues tended to miss high. The branched nature of these amino acids causes some steric interactions with the backbone, so one would expect an improved potential to help these residues the most. However, if the altered potential is causing unwarranted excursions from the equilibrium structure, as seems to be the case with Thr 45, then valines and threonines, whose motions are at least partially controlled by steric interactions with the backbone, might be the most strongly affected.

I should stress that it’s not necessary for the simulation to produce too much backbone motion to get this result. If the backbone dynamics are the wrong kind of motion, that could have this effect even if the model-free parameters for the backbone derived from the simulation appear to be accurate.

It isn’t terribly clear why an improved backbone potential should increase the correlation of side-chain order parameters between MD and NMR. Showalter et al. venture no explanation, and my own research and that of others hasn’t shown any particular linkage between backbone dynamics and side-chain dynamics, except in the case of alanine residues. It may be that a more accurate depiction of backbone motions contributes to a more accurate dynamic environment generally. Or, the motions of the backbone and side chains could be related in unexpected ways. An in-depth analysis of the simulation probing for these correlations could be very instructive. Regardless, these results are a significant, encouraging step towards using MD simulations to interpret the findings of NMR dynamics experiments.

Showalter, S. A.; Johnson, E.; Rance, M.; Bruschweiler, R. “Toward Quantitative Interpretation of Methyl Side-Chain Dynamics from NMR by Molecular Dynamics Simulations” J. Am. Chem. Soc. (Communication); 2007;ASAP Article.

Oct 29 2007
 
PZ Myers is one of the internet’s most sarcastic and unapologetic atheists, and occasionally this leads him to say things I find extremely disagreeable. His lightning-rod status, however, means that he occasionally picks up some very interesting stuff from the interwebs. By this I do not mean his regular e-mails from the religious fringe, but instead the stuff he finds in support of evolution. As a case in point, see this post, where he links to an interesting video in which watches are evolved in silico from random parts. You should go check it out; it takes about 10 minutes and is an entertaining explanation. Particularly note the way that the evolution plays out: relatively stable forms persist for huge numbers of generations and then rapidly change into completely different forms. Apply this knowledge the next time someone trots out the “no transitional fossils” argument.

Also, RIP Arthur Kornberg, Nobel Prize winner and great biochemist. Despite his gifts, his own research might be his secondary contribution, as his biological and scientific progeny may prove the greater. They already include another Nobelist. As scientists, we are always tempted to see our own work as being of paramount importance, but the training we give to our students and the spirit of enquiry we impart to our children are truly our greatest gift. Any scientist who neglects these aspects of his or her legacy is a failure, no matter how many splashy publications decorate his or her CV.

 Posted by at 1:45 PM