Table Of ContentOrientational potentials extracted from protein
structures improve native fold recognition
NICOLAE-VIORELBUCHETE,1,3 JOHNE.STRAUB,1 AND DEVARAJANTHIRUMALAI2
1DepartmentofChemistry,BostonUniversity,Boston,Massachusetts02215,USA
2InstituteforPhysicalScienceandTechnology,UniversityofMaryland,CollegePark,Maryland20742,USA
(RECEIVEDOctober21,2003;FINALREVISIONDecember21,2003;ACCEPTEDJanuary9,2004)
Abstract
We develop coarse-grained, distance- and orientation-dependent statistical potentials from the growing
protein structural databases. For protein structural classes ((cid:1), (cid:2), and (cid:1)/(cid:2)), a substantial number of back-
bone–backbone and backbone–side-chain contacts stabilize the native folds. By taking into account the
importanceofbackboneinteractionswithavirtualbackboneinteractioncenterasthe21stanisotropicsite,
we construct a 21×21 interaction scheme. The new potentials are studied using spherical harmonics
analysis(SHA)andasmooth,continuousversionisconstructedusingsphericalharmonicsynthesis(SHS).
Ourapproachhasthefollowingadvantages:(1)Thesmooth,continuousformoftheresultingpotentialsis
more realistic and presents significant advantages for computational simulations, and (2) with SHS, the
potentialvaluescanbecomputedefficientlyforarbitrarycoordinates,requiringonlytheknowledgeofafew
spherical harmonic coefficients. The performance of the new orientation-dependent potentials was tested
using a standard database of decoy structures. The results show that the ability of the new orientation-
dependent potentials to recognize native protein folds from a set of decoy structures is strongly enhanced
by the inclusion of anisotropic backbone interaction centers. The anisotropic potentials can be used to
developrealisticcoarse-grainedsimulationsofproteins,withdirectapplicationstoproteindesign,folding,
and aggregation.
Keywords: side-chainpacking;statisticalcoarse-grainedpotentials;harmonicanalysis;Boltzmanndevice;
protein-fold recognition
Development of low-resolution models for proteins is es- obtaining residue–residue interaction potentials (Miyazawa
sentialinproteinstructurepredictionandproteindesign.To and Jernigan 1985, 1999b; Godzik et al. 1995; Bahar and
achievethisgoal,weneedaneffectiveinteractionpotential. Jernigan 1996; Lee et al. 1999). Tanaka and Scheraga
Starting with the seminal work of Tanaka and Scheraga (1976) proposed that the frequencies of side-chain pairing
(1976), there has been a growing interest in obtaining rea- can be used to determine potential interaction parameters.
sonably accurate force fields. The accelerated growth of Sincethen,withtheexceptionofafewstudies(Sippl1990),
structural data on thousands of proteins in the Protein Data mostoftheknowledge-basedpotentialshavebeenobtained
Bank (PDB; Berman et al. 2000) has been a source for solely in terms of residue–residue contacts (Skolnick et al.
1997,2000;MiyazawaandJernigan1999b;Tobietal.2000).
Sippl(1990)usedastatisticalmethod,knownastheBoltz-
Reprintrequeststo:JohnE.Straub,DepartmentofChemistry,Boston
University,Boston,MA02215,USA;e-mail:[email protected];fax:(617) mann device, to obtain the distance-dependent mean force
353-6466; or Devarajan Thirumalai, Institute for Physical Science and potentials. This method relies on the assumption that the
Technology, University of Maryland, College Park, MD 20742, USA;
knownX-rayorNMR-resolvedproteinstructuresrepresent
e-mail:[email protected];fax:(301)314-9404.
3Presentaddress:LaboratoryofChemicalPhysics,NationalInstituteof classical equilibrium states and, therefore, the distribution
DiabetesandDigestiveandKidneyDiseases,NationalInstitutesofHealth of distances between two side chains should correspond to
(NIH),Bethesda,MD20892-0520,USA.
theequilibriumBoltzmanndistribution.Otherstructuralpa-
Article and publication are at http://www.proteinscience.org/cgi/doi/
10.1110/ps.03488704. rameters, such as dihedral angles, can also be used in this
862 ProteinScience(2004),13:862–874.PublishedbyColdSpringHarborLaboratoryPress.Copyright©2004TheProteinSociety
Orientationalstatisticalpotentialsforproteins
treatment.Thestatisticalpotentialsdevelopedusingthisap- assess the effectiveness of these potentials in recognizing
proachandothermethods(Tobietal.2000;TobiandElber the native folded states. The results show that the newly
2000;Melleretal.2002)havealsofocusedonlyonextract- derivedorientation-dependentpotentialsforsidechainsand
ing and analyzing probability density functions dependent the protein backbone are successful in a vast majority of
solely on distance. cases.
Previous studies have shown that the relative orientation
and packing of side chains in proteins is an important de-
Results
terminant of the local (secondary structure) geometry as
well as three-dimensional (tertiary structure) topology (Ba-
The importance of side chain–backbone and
har and Jernigan 1996; Bagci et al. 2002a,b). By analyzing
backbone–backboneinteractions
various families of protein structures, we had previously
shownthatcertainorientationalorderparametersarepromi- Amajorimprovementofthepotentialsthatareconstructed,
nent (Buchete et al. 2003). From a quantitative analysis of presented,andtestedinthisstudyisduetotheinclusionof
the statistical data on orientational distributions of side avirtualinteractioncenter(Pep)locatedonthebackbonein
chains extracted from PDB structures, we determined ori- themiddleofthepeptidelink(seeMaterialsandMethods).
entation-dependentinteractions.Thisapproachissupported In our previous study, we considered the inter-residue in-
by the results of Bahar and Jernigan (1996), who proposed teractions (Buchete et al. 2003) using 20×20 side chain–
abackbone-dependentcoordinationsystemforstudyingthe side chain (SC–SC) orientation-dependent potentials. We
statisticaldistributionofresidue–residueinteractionsinpro- were motivated to include explicitly the side chain–back-
teins. They showed that residue-specific coordination loci bone (SC–BB) and backbone–backbone (BB–BB) interac-
and packing characteristics can be extracted by statistical tions in a new 21×21 interaction scheme by the results of
analysis of protein structures, and knowledge-based orien- the structural analysis performed on the main protein
tational potentials can be constructed. However, using the classes.
backbone-dependent coordination system employed by Ba- These results, summarized in Table 1, show that back-
har and Jernigan (1996), it is difficult to include variations bone–backbone and backbone–side chain interactions rep-
in the sizes of side chains and their rotational degrees of resent substantial fractions of the total number of contacts
freedom. These effects are important, especially for large forallthemainclassesofproteins,mainly-(cid:1),mainly-(cid:2),and
sidechainsthathaveseveralprobablerotamericstateswith (cid:1)/(cid:2) (Orengo et al. 1997; Pearl et al. 2000, 2003). The pro-
respect to the backbone. Therefore, we use a coordination teinstructuresanalyzedherearetakenfromthemostrecent
system based on side-chain local reference frames (LRFs, CATH database (Pearl et al. 2003; http://www.biochem.
see Materials and Methods) that are backbone independent ucl.ac.uk/bsm/cath_new). For short-range interactions (2.0
for most amino acids. On the basis of earlier estimations (cid:1)5.6Å),thefractionofSC–BBcontactsismoreprevalent
(Buchete et al. 2003) and on statistical corrections for data in structures with (cid:2)-sheet topology (especially for |i – j| (cid:3)
sparsity,wecanalsousesmallerorientationalbinsizesthen 4), but these contacts cannot be ignored even for (cid:1)-helical
usedbyBaharandJernigan(1996).Inthisstudy,wepresent proteins (see Table 1). Interestingly, as shown in Table 1,
a new method for extracting coarse-grained distance- and for medium- (5.6 (cid:1) 9.2 Å) and long-range (9.2 (cid:1) 12.8 Å)
orientation-dependent residue–residue statistical potentials. interactions, all of the three interaction types occur with
We first show that in many protein structures, there is a similar frequencies, regardless of the protein architecture.
substantial number of contacts between side chains and This is an effect of the more uniform averaging at larger
backbone. The near globularity of protein structures (i.e., distanceswhenorientationalpreferencesarenotthatimpor-
they are nearly maximally compact) implies that such con- tant. The results presented in Table 1 show that the back-
tactsshouldbeprevalent.Thenecessitytoincludeexplicitly bone interaction centers play an important role in the sta-
thebackboneinteractionsisalsosupportedbytheresultsof bility of the secondary structures of proteins for all the
previous statistical derivations of backbone potentials that protein classes. Thus, these interaction centers must be in-
usedvirtualbondandtorsionangles(Baharetal.1997)and cludedinrepresentingthecoarse-grainedpotentialsforlow-
secondary structure information (Miyazawa and Jernigan resolutionstructureprediction,aswellasforanalyzingpro-
1999a). To capture the presence of the large number of tein-folding pathways. This observation is in agreement
side-chain backbone contacts in protein structures, we in- withthepreviousstudyofBaharandJernigan(1997),where
clude an extra anisotropic backbone interaction center lo- distance-dependentSC–SC,SC–BB,andBB–BBstatistical
cated at the peptide bond. Spherical harmonic analysis interactions were investigated in globular proteins. In this
(SHA) and spherical harmonic synthesis (SHS) methods study,thepeptide(Pep)backbonemoietyistreatedasa21st
expresstheorientation-dependentpotentialsinacontinuous typeofinteractioncenter.ByincludingPepasaninteraction
manner for short-, medium-, and long-range interactions. site,wedeterminetheparametersforthestatisticalpotential
Multipledecoysets(SamudralaandLevitt2000)areusedto using a 21×21 interaction scheme.
www.proteinscience.org 863
Bucheteetal.
Table1. Fractionsofsidechain–sidechain(SC–SC),sidechain–backbone(SC–BB),andbackbone–backbone(BB–BB)contacts
calculatedforthethreemosttypicalproteinclasses,namely(cid:1),(cid:2),andmixed(cid:1)/(cid:2)
Shortrange Mediumrange Longrange
2.0→5.6Å 5.6→9.2Å 9.2→12.8Å
For|i−j|(cid:3)3
SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB
(cid:1): 44.5% 25.2% 30.3% 30.9% 34.7% 34.4% 31.6% 31.4% 37.0%
(cid:1)/(cid:2): 38.9% 26.0% 35.1% 31.2% 34.2% 34.6% 32.4% 32.3% 35.3%
(cid:2): 40.0% 25.9% 34.1% 31.0% 34.6% 34.4% 32.2% 32.3% 35.5%
For|i−j|(cid:3)4
SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB
(cid:1): 61.4% 25.9% 12.7% 30.0% 34.7% 35.3% 31.3% 31.1% 37.6%
(cid:1)/(cid:2): 43.1% 26.2% 30.7% 31.2% 34.7% 34.1% 32.1% 32.0% 35.9%
(cid:2): 40.6% 25.9% 33.5% 31.6% 35.5% 32.9% 31.9% 32.0% 36.1%
Thefractionsofcontactswerecalculatedbothforinteractioncentersseparatedbyatleastthreepeptidebondsalongthesequence(|i−j|(cid:3)3)andbyat
leastfour(|i−j|(cid:3)4).
Orientational probability density maps and results for drophobic core of the protein structures. A related effect is
applying the Boltzmanndevice noticeable for Arg, which is one of the most hydrophilic
residues. There is a region of high interaction probability
Theorientationaldensitymapswerecalculatedbydividing
toward its south pole, which can be explained by the pref-
the interaction range into three regions as described above.
erence of Arg to be exposed to solvent. The orientational
Figure 1 displays the probability density maps in the short
probability maps are also shown on the bottom of Figure 1
range(2.0(cid:1)5.6Å)fortherelativeorientationsofotherside
for Gly and Trp, and they present the same qualitative fea-
chains (denoted by the term “all”, and obtained by averag-
tures that are expected when considering their size and hy-
ing data from all the amino acids types) with respect to Ile
dropathicproperties.Dataforallthe21×21possibletypes
(Fig.1A),Arg(Fig.1B),Gly(Fig.1C),andTrp(Fig.1D).
of relative residu–residue and residue–backbone orienta-
Thegrayscalemappingisdirectlyproportionaltotheprob-
tions was collected and analyzed in this work.
abilities of finding another side chain at various orienta-
The orientational probability density maps were further
tions,asshowninthegrayscalebar.NotethatforIle,oneof
used (see Materials and Methods) to construct the corre-
the most hydrophobic residues, the highest probability of
sponding potentials. The values in the orientational prob-
finding a neighboring residue is toward its north pole,
abilitydensitymapforArgresiduesaroundIle(Fig.2A)are
which,accordingtoourdefinitionsoflocalreferenceframes
dividedbythevaluesofthereferencemap(Fig.2A)thatis
(LRFs, see Materials and Methods), is situated away from
obtained by averaging over all the 20 side-chain types (see
thelocalbackbone.Thiseffectisamanifestationofthehigh
MaterialsandMethods).Inthisstudy,thereferenceaverage
statistical probability of finding Ile residues within the hy-
probabilitiesarenotsidechain-specific,butthisiscloserto
therandommixingapproximationofsidechains.Thenega-
tivelogarithmoftheresultgivesthestatisticalpotentialfor
therelativeIle–Argorientations(Fig.2C)inunitsofkT.In
this work, we derive orientation-dependent potentials for
short- [i.e., r (cid:1) (2 Å, 5.6 Å)], medium- [r (cid:1) (5.6 Å, 9.2
ij ij
Å)],andlong-rangeinteractionshells[r (cid:1)(9.2Å,12.8Å)].
ij
The maps shown here correspond to the second shell of
interactions (i.e., medium range).
We used the same procedure to obtain the orientational
potentials for all the types of SC–SC and SC–BB interac-
tions.
Figure1. Examplesofprobabilitydensitymapsfortherelativeresidue– The spherical harmonic analysis (SHA) of inter-residue
residueorientationsinproteinsforshortrangeinteractions(2.0(cid:1)5.6Å).
orientationalpotentials
Thegrayscalemappingisdirectlyproportionaltotheprobabilitiesoffind-
inganyothersidechainatvariousorientationswithrespecttoIle(A),Arg
The statistical potentials derived using the Boltzmann de-
(B),Gly(C),andTrp(D).Thenormalizedprobabilityvaluesshownonthe
grayscalescalehaveunitsof10−3. vice were further analyzed using SHA (see Materials and
864 ProteinScience,vol.13
Orientationalstatisticalpotentialsforproteins
Figure2. TheBoltzmanndevice.Statisticalpotentialsfortherelativeresidue–residueorientationscanbederivedfromprobability
densitymaps.Theorientationalprobabilitydensitymap(inunitsof10−3)forArgresiduesaroundIle(A)andthereferencemap(B)
correspondingtoalloftheside-chaintypesareusedtobuildthestatisticalpotentialfortherelativeIle–Argorientations(C)inunits
ofkT.Thesemapscorrespondtothesecondinteractionshell[r (cid:1)(5.6Å,9.2Å)].Thesmoothingeffectofsphericalharmonicsynthesis
ij
(seeMaterialsandMethods)isshowninD.
Methods). We adapted Spherepack routines (Adams and Fig. 3 for an example), suggesting that further filtering
Swarztrauber 1997, 1999) to analyze the potential data, methods can be applied, and that efficient computational
which was first constructed on a 12×24 equiangular grid methods using the new smooth potentials resulting from
onsphericaldomainscorrespondingtothethree(i.e.,short, SHS can be developed. The dominance of only a few ex-
middle, and long) interaction ranges. pansion coefficients (Fig. 3) is consistent with our earlier
For example, in Figure 3 we show the a and b co- finding(Bucheteetal.2003)that,inproteinswithdifferent
mn mn
efficients (see eq. 10 in Materials and Methods) computed architectures, only a few orientational order parameters are
for long-range Ile–Arg interactions, up to order n(cid:1)13 (m relevant.
(cid:4) n). The analysis of all 21×21 types of orientational We show in Figure 4 the reconstructed Ile–Arg orienta-
potentials was performed and the a and b coefficients tional potential using 12×24 equiangular bins (up) and a
mn mn
were stored. Calculation of the expansion coefficients (a 92×184 grid (down), for short-range (left), middle-range
mn
and b ; see eq. 10) is vital, because it permits rapid (middle), and long-range (right) interactions. When com-
mn
calculation of each specific orientational potential by paring the SHS potential values reconstructed on the
spherical harmonic synthesis (SHS) for any value of the 92×184 grid to the original orientational potential values
LRF orientational parameters (cid:5) and (cid:6). Importantly, not forIle–ArgshowninFigure4(left),thesmoothingeffectof
many a and b coefficients have large amplitudes (see the SHA/SHS procedure is evident.
The results of the same type of SHA/SHS process are
shown in Figure 5 for the anisotropic virtual backbone in-
teraction centers (Pep) located in the middle of the peptide
bond (see Materials and Methods). The smooth Pep–Pep
orientational potentials are represented for short-range
(left), middle-range (middle), and long-range (right) inter-
actions. The grayscale coding used in this figure is similar
to the one used in Figure 4. From both Figures 4 and 5 we
can see that, for each interaction range, there are specific
anisotropic features of the orientation-dependent statistical
Figure3. Exampleofsphericalharmonica (A)andb (B)coefficients potentials. Some of the attractive or repulsive angular re-
mn mn
(m(cid:4)n(cid:4)13)forIle–Argorientationalstatisticalpotentials.Thisisatypi- gionsareconservedfromoneinteractionshelltotheother,
cal situation for long-range interactions, in which only a few dominant
yet some present significant changes that could explain the
eigenvalues exist. These values can be used in the spherical harmonics
specificfeaturesofresidue–residue,residue–backbone,and
synthesisprocessforreconstructingsmoothorientationalpotentialsforall
ofthepossiblerelativeorientations,withhighaccuracy. backbone–backbone interactions.
www.proteinscience.org 865
Bucheteetal.
Figure 4. The smooth Ile–Arg orientational potentials represented for short-range (top), middle-range (center), and long-range
(bottom) interactions. The potential values, calculated originally for a 12×24 equiangular grid are shown in A, C, and E. The
correspondingsmoothpotentialscomputedfora92×184gridusingsphericalharmonicsynthesisareshowninB,D,andF.Dark
regions correspond to attractive (i.e., negative) potentials, whereas white regions are positive, thus less likely to correspond to
interactionloci.
The three-dimensional contour maps (Fig. 6) of continu- presence of other Pep particles is favored at these orienta-
ous and smooth reconstructed statistical potential fields il- tions. The light-gray regions represent unfavorable and re-
lustrate the relative LRF orientation of the “virtual back- pulsiveregionsaroundPep.Thistypeofthree-dimensional
bone interaction center” backbone particle (Pep). For the representation can be used to investigate all of the possible
objectsshownhere,thegrayscaleisdirectlyproportionalto side chain–side chain, side chain–Pep, and Pep–Pep orien-
the amplitude of the potentials. The negative attractive po- tation-dependentstatisticalpotentialfieldsandtheirunique
tential values are indicated as dark angular regions. The features.
Figure5. ThesmoothPep–Peporientationalpotentialsrepresentedforshort-range(left),middle-range(center),andlong-range(right)
interactions.Thepotentialvalues,calculatedoriginallyfora12×24equiangulargridareshowninA,C,andE.Thecorresponding
smoothpotentialscomputedfora92×184gridusingsphericalharmonicsynthesisareshowninB,D,andF.Darkregionscorrespond
toattractive(i.e.,negative)potentials,whereaswhiteregionsarepositive.
866 ProteinScience,vol.13
Orientationalstatisticalpotentialsforproteins
abilityofourstatisticalpotentialstodiscriminatethenative
structure of a protein from a large set of multiple decoy
structures from the database of Samudrala and Levitt
(2000).Theseresultsareshownintermsofthevaluesofthe
energyandroot-mean-squaredistance(RMSD)Zscores(Z
E
and Z ) that are defined next. The RMSD is calculated
RMSD
with respect to the C atoms. The Z score of a statistical
(cid:1)
quantity x (e.g., in our case E or RMSD) is
x−x
Z = (1)
x (cid:7)
x
where (cid:7) is the standard deviation and xis the mean of the
x
distribution of x values. For comparing the performance
(with and without the Pep interaction center) of the inter-
action potentials on sets of decoy structures, we calculate
both Z and Z .
E RMSD
The data in Figure 7 shows energy distributions for the
set of 500 decoys of the 2cro protein from the fisa family
(Simons et al. 1997; Samudrala and Levitt 2000). The two
distributions correspond to the distance-dependent statisti-
calpotential(U histogram,right)fora20×20interaction
D
scheme (Buchete et al. 2003), and to the present smoothed
distance-andorientation-dependent21×21potential(U ,
DO
left)thatincludesbackboneinteractions.Thevaluesforthe
corresponding energies of the native 2cro state and the
meanvaluesarealsoshownforillustratingthedefinitionsof
the energy Z scores (Z ) used in our analysis. For ideal
E
potentials, it is expected that the structure corresponding
to the native state would have the most negative Z . In
E
addition, a good potential scoring function should assign
Figure 6. Three-dimensional representations of the statistical potential
fieldforthesmoothshort-range(A),middle-range(B),andlong-range(C)
backbone–backboneinteractions(Pep–Pep).Therelativeorientationofthe
Pep particle with respect to the orientation-dependent potential values is
alsoshown.Thedark,attractiveregionsresponsibleforhydrogenbonding
areapparentformidrangeinteractions(B).
Effect of including explicitly the backbone interactions
on thepotentials
One of the main features of the SHA/SHS approach is that
specific values of the orientational potentials can be calcu-
lated (reconstructed) from the a and b coefficients for
mn mn
all values of the orientational parameters (cid:5) and (cid:6). The
smoothing effect of the SHA/SHS procedure, which elimi-
nates the unrealistic discontinuities in the binned orienta-
Figure7. Distributionofenergiesforthe500decoysofthe2croprotein
tional potentials, can lead to information loss (Adams and fromthefisaset.Thetwoenergydistributionscorrespondtothedistance-
Swarztrauber 1997, 1999). To assess the efficacy of the dependentstatisticalpotential(U histogram,right)fora20×20inter-
D–20
reconstructed orientational potentials, we performed tests actionscheme(Bucheteetal.2003)andtothepresentsmootheddistance-
and orientation-dependent 21×21 potential (U , left) that includes
fordiscriminatingthenativestatefrommultipledecoysets DO–21s
backbone interactions. The values for the corresponding energies of the
(Samudrala and Levitt 2000; Buchete et al. 2003). The re-
native2crostateandthemeanvaluesarealsoshownforillustratingthe
sults (see Figs. 8–10, below) were obtained for testing the definitionsoftheenergyZscores(Z )usedinouranalysis.
E
www.proteinscience.org 867
Bucheteetal.
a very negative C RMSD Z score (Z ) to the decoy Samudrala and Levitt 2000; Fain et al. 2001; Keasar and
(cid:1) RMSD
structure that has the lowest energy. It is clear from Fig- Levitt2003)lmds,fisa,fisa_casp3,and4stateareshownin
ure 7 that the U (i.e., smooth, distance- and orienta- Figure 8. To assess the performance of the present poten-
DO–21s
tion-dependent potentials that are reconstructed by SHA tials, we compare the results using U (dark bars) and
DO–20
using the 21×21 interaction model) are very successful in U (white bars). The cases in which the new U
DO–21 DO–21
correctly identifying the native state in the 2cro decoy set. potentials perform better in discriminating the native state
The U notation is used for statistical potentials that are from decoys are emphasized by the arrows on the left. For
DO
both distance- and orientation-dependent, whereas the a large majority of decoy sets (84% when considering the
U potentials depend solely on distance. The numerical energyscoreZ ),theperformanceisimprovedbyincluding
D E
Z score in this case is lower than what is found using the backbone interaction centers. The values of Z com-
E E
the U potential. In this instance, U identifies the puted using U are more negative then for U for
D–20 DO–21s DO–21 DO–20
native state as the one with the lowest energy (Fig. 7). On all decoy sets except the 4state (Fig. 8).
the other hand, for the U potential, there are decoy The results in Figure 8 show an important number of
D–20
structures with lower energies than the native state. In improvements for the Z score. However, our results show
E
our tests, we use Z scores to assess the efficacy of the that the Z score is a noisy quantity, in agreement with
E RMSD
potentials. theobservationsofSamudralaandLevitt(2000),showinga
We have computed both the energy and the RMSD Z lesssystematicbehavior.Asaconsequence,wearefocusing
scores (Z and Z ) for the distribution of the total en- ontheenergyZ scoreasthemainquantitativeperformance
E RMSD E
ergies for each protein decoy set. Assuming pairwise addi- indicator of the ability of our potentials to identify native-
tivity, the total potential for the residue pair ij is like protein conformations in decoy tests.
Uij (cid:1)r ,(cid:8) ,(cid:5) ,(cid:8) ,(cid:5) (cid:2)=Uij (cid:1)r ,(cid:8) ,(cid:5) (cid:2)
DO ij ij ij ji ji DO ij ij ij Effect of the SHA/SHS method on the
+Uji (cid:1)r ,(cid:8) ,(cid:5) (cid:2). (2)
DO ji ji ji smoothedpotentials
The results for Z and Z for the multiple decoy sets Toassesstheeffectofsmoothing(seeMaterialsandMeth-
E RMSD
(ParkandLevitt1996;Parketal.1997;Simonsetal.1997; ods) on the results, we compare in Figure 9 Z scores cal-
E
Figure8. Resultsfromdecoytestsontheorientation-dependentpotentials.Theenergy(Z )andC RMSDZscores(Z )calculated
E (cid:1) RMSD
formultipledecoysets(ParkandLevitt1996;Parketal.1997;Simonsetal.1997;SamudralaandLevitt2000;KeasarandLevitt2003)
lmds,fisa_casp3,fisa,and4statearecomparedbefore(U )andafter(U )theSHA/SHSmethodisapplied.ThePDBcode
DO–21 DO–21s
foreachproteinandthenumberofdecoysinitscorrespondingsetareshowninthemiddle.ThedarkbarscorrespondtoU and
DO–21
thewhitebarsareforU .ThecasesinwhichtheU potentialsperformbetterindiscriminatingthenativestatefromdecoys
DO–21s DO–21
areemphasizedbythearrowsontheleft.Foramajorityofdecoysets(84%forZ and56%forZ ),theperformanceisimproved.
E RMSD
868 ProteinScience,vol.13
Orientationalstatisticalpotentialsforproteins
culated using the U potentials and the U poten-
DO–21 DO–21s
tials reconstructed from expansion coefficients with the
spherical harmonic synthesis (SHS) method. The effects of
theSHA/SHSmethodontheenergyZ scorearerelatively
E
small, and a very good correlation is observed between Z
E
scores obtained using the 21×21 interaction scheme
(U ) and the Z scores calculated using the smooth
DO–21 E
U potentials.Noticeably,theZ scoresobtainedusing
DO–21s E
U are marginally better (i.e., more negative). Al-
DO–21s
though there is an intrinsic information loss introduced by
the SHS/SHA procedure (Adams and Swarztrauber 1997,
1999), the resulting potential smoothing improves the per-
formanceoftheorientation-dependentpotentials.Thesere-
sults show that the coarse graining of the orientational po-
tential using the spherical harmonic analysis does not lead
to loss of accuracy and could enhance the native fold rec-
ognition ability.
Figure 10 shows the energy Z scores calculated for the
E
same multiple decoy sets, but using the 20×20 interaction
scheme dependent only on residue–residue distances
(U , dark bars) and the U potentials (white). The
D–20 DO–21s
newU potentialsperformbetterinamajorityofcases
DO–21s
for the fisa and fisa_casp3 and lmds decoy sets, as empha-
sized by the arrows in Figure 10. For the 4state set, where
the energy Z scores for U are better, we still obtained
D–20
negative Z values consistently for all the new potentials. Figure10. ComparisonofZ scoresformultipledecoysets(Samudrala
E E
We need to mention that the U potentials were con- andLevitt2000).Thecasesinwhichthesmooth,distance-andorientation-
D–20
structed as described by Buchete et al. (2003) for 20 radial dependentpotentials(UDO–21s,whitebars)performbetterthanpotentials
dependingsolelyondistances(U ,black)indiscriminatingthenative
binsof1.2Å,whereasthenewU potentialshaveonly D–20
DO–21s statefromdecoysareemphasizedbythearrowsontheleft.
threeinteractionsranges(short,middle,andlong).Itisno-
ticeable, therefore, that the detailed orientation dependence
and the inclusion of interactions with the backbone in the
U potentials confer significant advantages in an im-
DO–21s
portant number of cases.
Role of decoysets
Theperformanceofthestatisticalpotentialsiscruciallyde-
pendent on the decoy sets used. There are considerable
variations in the nature of the decoy sets. For example, the
decoysgeneratedfor2cro(depictedinFig.11A)bydiffer-
entmethods,haveC RMSDvaluesfromthenativestatein
(cid:1)
the range 0.81(cid:1)8.31 Å for 4state, 4.29(cid:1)12.60 Å for the
fisa,and3.87(cid:1)13.47Åforthelmdssets.Theresultsshown
in Figure 10 for the Z score of the 4state decoys could be
E
due to the fact that for these small (cid:1)-helical proteins, the
decoy structures are so close to the native state that small
errorsinthepotentialscanhaveimportantnegativeeffects.
Figure9. Forallthedecoysetsanalyzed(ParkandLevitt1996;Parketal. Itisnotourgoaltoanalyzeindetailthemagnitudeofthese
1997;Simonsetal.1997;SamudralaandLevitt2000),averygoodcor- errors and all of their possible sources (e.g., assumptions
relation is observed between Z scores obtained using the 21×21 inter-
E behind the statistical model, the dependency on training
action scheme (U ) and the Z scores calculated after applying the
DO–21 E sets,andothertechnicaldetails).Duetothestatisticalnature
spherical harmonic synthesis (SHS) method (U ). In fact, the Z
DO–21s E
scoresobtainedusingU aremarginallybetter(i.e.,morenegative). of this approach, there is always a possibility to fail in
DO–21s
www.proteinscience.org 869
Bucheteetal.
value(6.14Å)ascomparedwiththenativestateconforma-
tion. In Figure 11, the grayscale coding corresponds to
structural features such as dark gray for helices, black for
turns,andlightgrayforcoilregions.AsshowninFigure11,
evenifthestructureidentifiedasnative-likebyusingback-
bone-related information (i.e., U , Fig. 11C) does not
DO–21s
have a Z score as good as the one obtained by using only
E
the side chain–side chain distances (i.e., U , Fig. 11E),
D–20
the U performance is reasonable, because the native-
DO–21s
like structure shown in Figure 11C has noticeably more
nativefeatures(e.g.,numberandrelativepositionsofhelical
chains).ThedecoywiththelargestC RMSD(8.31Å)from
(cid:1)
the native state is also shown in Figure 11B for illustrating
therangeofconformationsavailableinthe4statedecoyset.
The decoys with the highest U (Fig. 11E) and U
D–20 DO–21s
(Fig. 11F) values are observed to have large structural dif-
ferences from the native conformation in both cases, as a
manifestation of the efficacy of both potentials discussed
here. Thus, the U parameter set correctly discrimi-
DO–21s
nates between native and other misfolded structures even
though there is no improvement in Z relative to the U
E D–20
potentials.
Besides the results shown in Figures 8–10 for the lmds
(Keasar and Levitt 2003), fisa_casp3 (Simons et al. 1997),
fisa (Simons et al. 1997), and 4state (a.k.a. 4state_reduced;
Park and Levitt 1996) decoy sets, we also computed the
correspondingZscoresforthemultipledecoysetshg_struc-
tal, ig_structal (Samudrala and Levitt 2000). For the hg-
_structal(hemoglobines)andig_structal(immunoglobulins)
decoysets,whichhavebeenbuiltwiththeprogramsegmod
(Levitt1992)bycomparativemodelingusingotherglobins
as templates, we also obtained better Z scores for the
E
Figure11. Examplesofsixstructures(onenativeandfivedecoysfromthe U potentials than for the U . Including the 21st
DO–21 DO–20
4stateset)forthe2croprotein.(A)Thenativestate;(B)thedecoywiththe backbone, interaction center proves to be beneficial for al-
largestC RMSD(8.31Å);(C)thebest(i.e.,native-like)decoywiththe
(cid:1) most 70% of the immunoglobulin sets as compared with
lowestU potential;(D)thedecoywiththehighestU potential;
DO–21s DO–21s only 34% of the hemoglobins. This could be explained by
(E) the decoy with the lowest U potential; (F) the decoy with the
highestU potential.TheU D–2p0otentialpresentsanincreasedability thefactthatforthistypeof|i–j|(cid:3)4interactions(asisthe
D–20 DO–21s
toidentifythenative-likedecoystateshowninC.Thesestructureshave case for the potentials presented here), there is a signifi-
beenalignedandplottedusingVMD(Humphreyetal.1996).Allofthe cantly smaller number of backbone–backbone contacts at
RMSDvaluesaregiveninÅ,andthegraylevelscorrespondtostructural short-rangedistancesin(cid:1)-helicalthanin(cid:2)-sheetstructures
features (dark gray for helices, black for turns, and light gray for coil
(see data in Table 1).
regions).
Theresultsofthetestsondiscriminatingthenativestates
from decoy sets show that the new 21×21 smoothed po-
identifying the native state when using more detailed sta- tentials, which include the virtual Pep particle to represent
tistical data in the analysis of protein structures For ex- thebackboneinteractioncenter,performbetterinamajority
ample, the case of the 2cro protein seems to correspond to of cases. In agreement with our previous calculations
a failure for the 4state set of decoys (see Fig. 10), because (Buchete et al. 2003), the inclusion of orientational depen-
the Z score of the structure with the lowest U value dence alone can strongly improve the quality of inter-resi-
E DO–20
(shown in Fig. 11E) is more negative. However, the decoy duestatisticalpotentials.Eventhoughintheworkpresented
structure that scored the best by using to the new U hereweconsideronlythreedistanceinteractionranges,the
DO–21s
statistical potential (see Fig. 11C) has a much lower C performanceoftheneworientation-anddistance-dependent
(cid:1)
RMSD value (only 2.77 Å) from the native state. In addi- potentials is generally better then the performance of the
tion, the decoy structure that scored the worst for the potentialsthatdependsolelyoninter-residuedistances.The
U statisticalpotential(Fig.11D)hasahighC RMSD performance of the new orientation-dependent potentials is
DO–21s (cid:1)
870 ProteinScience,vol.13
Orientationalstatisticalpotentialsforproteins
even further enhanced by including the 21st anisotropic abilityofthesmoothedorientation-anddistance-dependent
backbone interaction site. potentials in recognizing the native structures from a large
numberofdecoysetsisgreatlyenhancedbytheinclusionof
the virtual interaction center Pep for the backbone. For all
Discussion
decoy sets except for the 4state, the energy Z-scores im-
Thewidelyusedstatisticalprocedureforbuildingdistance- prove with the potentials used here.
dependent coarse-grained inter-residue statistical potentials The results of testing the reconstructed potentials on the
(Sippl1990;Godziketal.1995;Leeetal.1999;Miyazawa decoysetsofSamudralaandLevitt(2000)showedthatonly
and Jernigan 1999b) can be generalized to include orienta- smalldifferencesareintroducedbythecontinuousspherical
tional effects (Bahar and Jernigan 1996; Buchete et al. harmonicstreatmentwithrespecttothecasewhentheraw-
2003). The large and continually increasing number of ex- binnedorientationalpotentialsareused.Thediscriminatory
perimentally derived protein structures available from pro- power of the new orientational potentials is strongly im-
tein databases (Berman et al. 2000) can be used to extract provedbyconsideringthe21stanisotropicinteractionsitein
both distance- and orientation-dependent statistical poten- themiddleofthepeptidebondanditisenhancedevenmore
tials. Our investigation of relative side chain–side chain by the SHA/SHS procedure in a majority of cases. From a
orientations in proteins, permits the identification of statis- computational point of view, there are potential benefits
tically preferred interaction loci for side chains, indepen- both for free energy calculations and coarse-grained dy-
dently of their neighboring C backbone atoms. In accor- namicalsimulationsthatusestatisticalpotentialsasfollows:
(cid:1)
dance with previously reported results for a backbone-de- (1)Thememoryrequirementsforstoringthesphericalhar-
pendent coordination system (Bahar and Jernigan 1996) monic coefficients instead of the raw orientational data are
glycine,serine,andthreoninepresentlesscomplexorienta- smaller, and (2) the values of the potentials can be recon-
tionalprobabilitydistributions,withfewerpeaksthanother structedforanyspecificvaluesofthe(cid:5)and(cid:6)orientational
aminoacids(e.g.,arginineorisoleucine).Ourorientational parameters as smooth and continuous functions over the
probability density maps present, in general, more peaks entire spherical domain. The parameters of the new orien-
than the orientational distributions of Bahar and Jernigan tation-dependent potentials are available on request. The
(1996), but a detailed comparison is not feasible due to the novel smoothed distance- and orientation-dependent poten-
different residue coordination system used in that study. tials constructed by spherical harmonic analysis of the sta-
Ourprevioustests(Bucheteetal.2003)showedthatori- tistical data are suited not only for use in protein fold rec-
entation-dependent potentials are better in discriminating ognition studies, but also could help the development of
the native folded state from large sets of alternative decoy new large-scale coarse-grained protein simulations using
structures(SamudralaandLevitt2000),thantheradial-only molecular dynamics or Monte Carlo methods.
dependent statistical potentials.
Inthisstudy,wedemonstratedthatfurtherimprovements
can be obtained by including backbone interactions explic- Materials and methods
itly,andbyanalyzinganddescribingcoarse-grained,orien-
tation-dependentinter-residuepotentialsintermsoftheco-
Local reference frames of sidechains
efficientsresultingfromasphericalharmonicanalysis.The
statisticaldataextractedfromthePDBstructuresisusedto Togetparametersfortheorientationaldependenceofthecoarse-
build orientation-dependent potentials that have sufficient grained potentials, it is useful to define local reference frames
continuity properties to make possible their spherical har- (LRFs)foreachaminoacid(Fig.12).Foranyaminoacid,aLRF
canbeconstructedbyconsideringatleastthreenoncollinearpoints
monic analysis. We have constructed and studied a novel
(P , P , and P ) that uniquely define the orientation of the LRF,
distance- and orientation-dependent potential for the 20 1 2 3
and a fourth point (usually denoted by S for the i-th side chain)
aminoacidsetandanextraanisotropicbackboneinteraction thatspecifiesthelocationoftheLRForigiin.Inthecoarse-grained
center. The explicit inclusion of the 21st interaction center representation, the S points can be considered as the interaction
i
onthebackbonewasmotivatedbytheimportantbackbone– centers, as all of the relative side chain–side chain distances and
orientationsaremeasuredwithrespecttothem.Thechoiceofthe
backbone and backbone–side chain contact fractions that
threepointsP ,P ,andP isimportantforthefollowingreasons:
can be observed in a representative set of the main protein 1 2 3
(1) The reference points should be unambiguously identifiable in
classes(Pearletal.2003).Thesmooth,continuouspotential allofthesidechains,regardlessoftheirparticularconformation;
is described using its spherical harmonic coefficients for (2)theymustnotbearrangedinacollinearconfiguration;and(3)
short-, medium-, and long-range interactions. the positions of the side-chain atoms in the LRF should vary as
littleaspossibleforvariousside-chainconformationsofthesame
Because the native structures, regardless of their archi-
aminoacidtype.Thechoiceofthesethreereferencepointsiseasy
tecture((cid:1),(cid:2),or(cid:1)/(cid:2)),arestabilizedbyasubstantialnumber
forsmallorrelativelyrigidaminoacids;however,itismoredif-
of backbone–side chain and backbone–backbone interac- ficult for the relatively large ones that are expected to be quite
tions,itiscrucialtoaccountfortheminanyforcefield.The mobile,withlongsidechainssuchaslysineorarginine.
www.proteinscience.org 871
Description:and Jernigan 1985, 1999b; Godzik et al. The smooth Ile–Arg orientational potentials represented for short-range (top), middle-range (center), and long-range .. where Ynm are complex spherical harmonics (Arfken and Weber.