Mar 11 2008
Unless you have extremely good luck or a lot of supporting information, deriving a protein structure from NMR data is an enormous pain in the ass. First, you have to assign the resonances of the protein: that is, you must determine the chemical shifts of most or all of the protons in the protein, which in turn entails figuring out the chemical shifts of most of the carbon and nitrogen atoms as well. Then you have to acquire nuclear Overhauser effect (NOE) and/or residual dipolar coupling (RDC) data to figure out how close the protons are to one another and how some of the bonds are oriented. Automated NOE assignment programs make the analysis of all these data less onerous than it once was, but scraping together enough data to derive a structure can still be a tough task, particularly if your protein has non-ideal relaxation characteristics. The most irksome thing about it is that, in principle, all the structural information you could ever want is contained in the chemical shifts you figured out in the very first step. What if you had a technique that could figure out a structure from them alone?
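For a sense of how NOE data become distances: in the simplest treatment (the isolated spin-pair approximation, which ignores spin diffusion and internal motion), NOE intensity falls off as the inverse sixth power of the interproton distance, so an unknown distance can be estimated by calibrating against a reference proton pair of known separation. The Python sketch below assumes exactly that; the function name and the calibration values are hypothetical.

```python
def noe_distance(intensity: float, ref_intensity: float, ref_distance: float) -> float:
    """Estimate an interproton distance (Å) from an NOE cross-peak intensity.

    Assumes the isolated spin-pair approximation, I proportional to r**-6, so
    r = r_ref * (I_ref / I) ** (1/6). Spin diffusion and motion are ignored.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical calibration: a reference cross-peak of intensity 1.0
# for a proton pair assumed to be 2.5 Å apart.
for peak in (1.0, 0.2, 0.02):
    print(f"I = {peak:5.2f} -> r ~ {noe_distance(peak, 1.0, 2.5):.2f} Å")
```

Because of the sixth-root dependence, even a 50-fold drop in intensity only roughly doubles the estimated distance, which is one reason NOE-derived distances are usually turned into loose upper-bound restraints rather than exact values.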

First, a little more explanation about chemical shifts. In most cases the local chemical structure dominates the chemical shift: you expect a proton to resonate in a particular region of the spectrum depending on whether it sits in a methyl group or is bound to a nitrogen. The chemical shift is also sensitive to local conformation, in particular the backbone dihedral angles. Surrounding groups (especially aromatic rings and paramagnetic atoms) can also alter the chemical shift substantially. Moreover, the chemical shift is an ensemble-averaged property. We cannot record NMR data from a single molecule, so the observed chemical shift reflects every conformation in the ensemble and also (to some extent) the interconversions between those conformations. Because it should be possible to reconstruct an entire protein structure knowing only the local backbone dihedral angles (given standard bond lengths and bond angles), it should in principle be possible to use chemical shifts to reconstruct the average conformation. The problem is that all these different factors get mashed up into a single number, often in contradictory ways. Parsing the purely structural factors (the dihedral angles) out from this single number has proven difficult. Dihedral angle restraints based on chemical shifts have been used for many years, but only as one component of a more complete structure determination using NOEs or RDCs.
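To make the averaging problem concrete: in the fast-exchange limit, the observed shift is simply the population-weighted mean of the shifts of the individual conformers, δ_obs = Σ p_i δ_i. The Python sketch below uses invented round-number Cα shifts (not measured values) to show two very different ensembles collapsing onto the same observed number.

```python
from math import isclose


def observed_shift(ensemble: list[tuple[float, float]]) -> float:
    """Population-weighted chemical shift (ppm) in the fast-exchange limit.

    `ensemble` is a list of (population, shift_ppm) pairs whose populations sum to 1.
    """
    total = sum(p for p, _ in ensemble)
    if not isclose(total, 1.0):
        raise ValueError("populations must sum to 1")
    return sum(p * shift for p, shift in ensemble)


# Hypothetical Cα shifts (ppm), for illustration only.
half_and_half = [(0.5, 56.0), (0.5, 52.0)]  # two distinct conformers, equally populated
single_state = [(1.0, 54.0)]                # one conformer with an intermediate shift

print(observed_shift(half_and_half))
print(observed_shift(single_state))
```

Both ensembles report 54.0 ppm, so the single observed number cannot distinguish a 50:50 mixture of two conformations from one intermediate conformation; that degeneracy is a large part of what makes inverting chemical shifts into structure hard.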

However, a series of publications over the past year or so has shown steady progress toward determining structures from chemical shifts alone, and a paper from the labs of Ad Bax and David Baker, now in PNAS preprints, shows some of the best progress yet (1). The approach closely resembles the CHESHIRE method described last year by Michele Vendruscolo’s group (2), in that both are based on fragment replacement. The Vendruscolo group’s paper explicitly compares CHESHIRE to David Baker’s ROSETTA program, so it seems only natural for Shen et al. to incorporate chemical-shift-based refinements directly into ROSETTA to create CS-ROSETTA.

The standard ROSETTA approach is to break the protein sequence up into short, overlapping fragments a few residues long. A library of known structures (derived from the PDB) is then searched for each fragment, yielding a set of about 200 candidate conformations per fragment position based on sequence similarity. ROSETTA then attempts to assemble low-energy (stable) structures out of these candidate fragment conformations. CS-ROSETTA uses chemical shift data at two distinct steps. First, chemical shift data are used to select the most appropriate candidate conformations from the library, in principle improving the “building materials” for ROSETTA. In later stages, the agreement between the chemical shifts expected for each ROSETTA-predicted structure and the experimentally measured shifts is used to re-score the structures’ energies.
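The fragment-selection step can be sketched roughly as ranking library fragments by a score that combines sequence similarity with agreement between observed shifts and shifts predicted for each fragment. This Python sketch is not the published CS-ROSETTA fragment picker: the `Fragment` record, the chi-square-style agreement term, the weights, and all numbers are hypothetical, and predicted shifts for each library fragment are simply assumed to be available.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """A candidate backbone fragment from a structure library (hypothetical record)."""
    name: str
    sequence_score: float          # higher = more similar in sequence (hypothetical scale)
    predicted_shifts: list[float]  # shifts assumed predicted for this fragment, ppm


def shift_chi2(observed: list[float], predicted: list[float], sigma: float = 1.0) -> float:
    """Sum of squared, sigma-normalised deviations between observed and predicted shifts."""
    if len(observed) != len(predicted):
        raise ValueError("shift lists must be the same length")
    return sum(((o - p) / sigma) ** 2 for o, p in zip(observed, predicted))


def pick_fragments(observed, candidates, n_keep=200, w_seq=1.0, w_shift=1.0):
    """Rank candidates by a combined score (lower is better) and keep the best n_keep.

    The linear combination and the weights are illustrative assumptions.
    """
    def score(frag: Fragment) -> float:
        return w_shift * shift_chi2(observed, frag.predicted_shifts) - w_seq * frag.sequence_score

    return sorted(candidates, key=score)[:n_keep]


observed = [56.1, 55.8, 57.0]  # hypothetical Cα shifts for a 3-residue window
library = [
    Fragment("frag_A", sequence_score=0.5, predicted_shifts=[56.0, 56.0, 56.8]),
    Fragment("frag_B", sequence_score=1.0, predicted_shifts=[53.0, 52.5, 54.0]),
]
print([f.name for f in pick_fragments(observed, library, n_keep=1)])
```

Even though `frag_B` has the better sequence score, its predicted shifts disagree strongly with the observed ones, so the shift term pushes it down the ranking and `frag_A` is kept.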

That this can significantly improve the ROSETTA output can be seen from the part of Shen et al.’s Figure 2 that I have shamelessly stolen for your benefit. These are predictions for calbindin (B) and HPr (C), with the standard ROSETTA energies in the top panels and the rescored energies in the bottom panels. As you can see, the calbindin structures do not have a well-defined energy minimum in the ROSETTA prediction, and the HPr predictions show three minima, not all of which are close to the actual structure as measured by Cα root mean square deviation (RMSD). Rescoring, however, produces funnel-shaped distributions of energy with respect to RMSD, such that low energies reliably indicate structures close to reality.
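The rescoring idea can be sketched as adding a chemical-shift penalty to each model's ROSETTA energy and re-ranking. In this Python sketch the energies, shift deviations, RMSDs, and the weight are invented numbers, and the simple additive form is an assumption rather than the exact function used by Shen et al.; the point is only that a model with slightly worse force-field energy but much better shift agreement can move to the bottom of the funnel.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    rosetta_energy: float     # hypothetical ROSETTA energy, arbitrary units
    shift_chi2: float         # hypothetical disagreement between predicted and observed shifts
    rmsd_to_reference: float  # Cα RMSD (Å) to the known structure, used only for evaluation


def rescored_energy(model: Model, weight: float = 0.5) -> float:
    """Force-field energy plus a weighted chemical-shift penalty (assumed additive form)."""
    return model.rosetta_energy + weight * model.shift_chi2


models = [
    Model("m1", rosetta_energy=-120.0, shift_chi2=90.0, rmsd_to_reference=7.5),
    Model("m2", rosetta_energy=-118.0, shift_chi2=20.0, rmsd_to_reference=1.6),
    Model("m3", rosetta_energy=-110.0, shift_chi2=60.0, rmsd_to_reference=4.2),
]

best_raw = min(models, key=lambda m: m.rosetta_energy)
best_rescored = min(models, key=rescored_energy)
print(best_raw.name, best_raw.rmsd_to_reference)
print(best_rescored.name, best_rescored.rmsd_to_reference)
```

Here the raw energy picks `m1` (7.5 Å from the reference) while the rescored energy picks `m2` (1.6 Å). The reference RMSD is carried along only to judge the ranking; it never enters the score.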

Shen et al. optimized CS-ROSETTA against 16 proteins of known structure. I checked their results against the CHESHIRE results: five proteins were predicted in both papers, and CS-ROSETTA did a better job in terms of backbone atom RMSD for four of them. On average, CS-ROSETTA produced a 24% reduction in this RMSD relative to CHESHIRE. Shen et al. also tested CS-ROSETTA blindly against nine proteins whose structures had recently been solved by the Northeast Structural Genomics Consortium, with favorable results.
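Backbone RMSD and the relative improvement between two methods are simple to compute once structures are superimposed. The Python sketch below assumes the coordinates are already optimally superimposed (a real comparison would first apply a least-squares fit such as the Kabsch algorithm), and all coordinates and RMSD values are made up. It also shows that an "average reduction" can mean either the mean of per-protein reductions or the reduction of the mean RMSD; the two conventions generally give different numbers.

```python
from math import sqrt


def rmsd(coords_a: list[tuple[float, float, float]],
         coords_b: list[tuple[float, float, float]]) -> float:
    """RMSD (Å) between two equally long, already-superimposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return sqrt(sq / len(coords_a))


def mean_per_protein_reduction(old: list[float], new: list[float]) -> float:
    """Average of the fractional RMSD reductions, one per protein."""
    return sum((o - n) / o for o, n in zip(old, new)) / len(old)


def reduction_of_means(old: list[float], new: list[float]) -> float:
    """Fractional reduction of the mean RMSD across proteins."""
    return (sum(old) - sum(new)) / sum(old)


# Toy "structures": every atom of b is displaced by 1 Å along x relative to a.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(x + 1.0, y, z) for x, y, z in a]
print(rmsd(a, b))

# Hypothetical backbone RMSDs (Å) for the same two proteins from two methods.
old = [2.0, 4.0]
new = [1.0, 3.6]
print(mean_per_protein_reduction(old, new), reduction_of_means(old, new))
```

With these placeholder values the per-protein convention gives a 30% reduction and the reduction-of-means convention about 23%, so it matters which one a summary figure uses.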

This isn’t the end of the road by a long shot. Backbone RMSDs for these predictions are generally below 2 Å, which is easily good enough for picking out the general characteristics of a fold. Identifying subtle features, however, will probably require higher precision and thus more rigorous refinement. Still, having these predicted conformations in hand may significantly accelerate the assignment of NOE data and the refinement of structures against them. Combining fragment-replacement approaches based on RDC data with those based on chemical shifts may also produce substantial improvements. There are other limitations, too. Shen et al. were not able to obtain converged structures for every protein attempted. CS-ROSETTA is presently limited to proteins smaller than many that are routinely solved by NMR, and proteins with unusual or complicated topologies may not be solvable using this approach. And, of course, cofactors that significantly alter local chemical shifts will complicate analyses of this kind, if not render them impossible. A great deal of work remains to be done before computational approaches can tackle the large, highly degenerate systems where they would have the most power to resolve problems. Even so, the excellent results of CHESHIRE and CS-ROSETTA suggest that our ability to derive structures from limited NMR data will improve dramatically in the next few years.

1. Shen, Y., Lange, O., Delaglio, F., Rossi, P., Aramini, J.M., Liu, G., Eletsky, A., Wu, Y., Singarapu, K.K., Lemak, A., Ignatchenko, A., Arrowsmith, C.H., Szyperski, T., Montelione, G.T., Baker, D., Bax, A. (2008). Consistent blind protein structure generation from NMR chemical shift data. Proceedings of the National Academy of Sciences, 105(12), 4685-4690. DOI: 10.1073/pnas.0800256105

2. Cavalli, A., Salvatella, X., Dobson, C.M., Vendruscolo, M. (2007). Protein structure determination from NMR chemical shifts. Proceedings of the National Academy of Sciences, 104(23), 9615-9620. DOI: 10.1073/pnas.0610313104

POSTSCRIPT: You can read another take on this paper at Plausible Accuracy.

 Posted by at 1:30 AM

  3 Responses to “Protein Structure from Chemical Shifts Alone”

  1. When you say "First, chemical shift data are used to select the most appropriate potential conformations from the library", do you mean that low-energy conformations are selected based on their characteristic chemical shifts? That would seem to me to be a better technique than using a potential energy function with all its pitfalls to decide which conformations are "low-energy". Nice blog by the way.

  2. It's obvious your not a Chemist or a Physicist. Stick to your field, 'Bio basics'.

  3. [...] 15N, 13C, and 1H atoms, as well as many side-chain methyl groups. They then fed this data to the CS-ROSETTA protocol, which can determine a protein structure using chemical shifts alone. While holding the majority of the protein in a single conformation, they allowed CS-ROSETTA to [...]
