Table Of ContentMulticanonical Study of Coarse-Grained Off-Lattice Models for Folding
Heteropolymers
Michael Bachmann,1,∗ Handan Arkın,1,2,† and Wolfhard Janke1,‡
1Institut fu¨r Theoretische Physik, Universit¨at Leipzig,
5 Augustusplatz 10/11, D-04109 Leipzig, Germany
0 2Department of Physics Engineering, Hacettepe University, 06532 Ankara, Turkey
0
2 We have performed multicanonical simulations of hydrophobic–hydrophilic heteropolymers with
two simple effective, coarse-grained off-lattice models to study the influenceof specific interactions
n
in the models on conformational transitions of selected sequences with 20 monomers. Another
a
aspect of the investigation was the comparison with the purely hydrophobic homopolymer and
J
the study of general conformational properties induced by the “disorder” in the sequence of a
5
heteropolymer. Furthermore, we applied an optimization algorithm to sequences with up to 55
1
monomers and compared the global-energy minimum found with lowest-energy states identified
within the multicanonical simulation. This was used to find out how reliable the multicanonical
]
h method samples thefree-energy landscape, in particular for low temperatures.
c
e PACSnumbers: 05.10.-a,87.15.Aa,87.15.Cc
m
-
t I. INTRODUCTION (hypothetic) aqueousenvironmentby the polar residues.
a
t Statistical mechanics simulations of this model are still
s
subject of studies requiring the application of sophisti-
. The understanding of protein folding is one of the
t cated algorithms [3, 4, 5].
a most challenging objectives in biochemically motivated
m A manifest off-lattice generalisation of the HP model
research. Althoughthephysicalprinciplesareknown,the
is the AB model [6], where the hydrophobic monomers
- complexity of proteins as being macromolecules consist-
d are labelled by A and the polar or hydrophilic ones by
ing of numerous atoms, the influence of quantum chemi-
n B. The contact interaction is replaced by a distance-
caldetailsonlong-rangeinteractionsaswellastheroleof
o dependent Lennard-Jones type of potential accounting
c thesolvent,etc.,makesanaccurateanalysisofthefolding
forshort-rangeexcludedvolumerepulsionandlong-range
[ process of realistic proteins extremely difficult. There-
interaction, the latter being attractive for AA and BB
fore, one of the most important questions in this field is
1 pairs and repulsive for AB pairs of monomers. An addi-
howmuchdetailedinformationcanbeneglectedtoestab-
v
tionalinteractionaccountsforthebending energyofany
5 lish effective, coarse-grainedmodels yielding reasonable,
pair of successive bonds. This model was first applied in
5 at least qualitative, results that allow for, e.g., a more
two dimensions [6] and generalized to three-dimensional
3 global view on the relationship between the sequence
1 of amino acid residues and the existence of a global, AB proteins [7, 8], partially with modifications taking
0 implicitly into account additional torsional energy con-
funnel-likeenergyminimuminaruggedfree-energyland-
5 tributions of each bond.
scape [1].
0 Moreknowledge-basedcoarse-grainedmodelsareoften
/ Within the past two decades much work has been de-
t of G¯o type, where, in the simplest lattice formulation,
a voted to introduce minimalistic models based on general
the energy is proportionalto the negative number of na-
m principlesthatarebelievedtoprimarilycontrolthestruc-
tive contacts. These models usually require the knowl-
- ture formation of proteins. One of the most prominent edgeofthenativeconformationandserve,e.g.,asmodels
d examples is the HP model of lattice proteins [2] which
n for studies of folding pathways [9, 10] and native topol-
has been exhaustively investigated without revealing all
o ogy [11, 12]. There is also growing interest in modeling
secrets, despite its simplicity. In this model, only two
c the folding behavior of single-domain proteins with sim-
v: typesofmonomersareconsidered,withhydrophobic(H) plified models as many of them show up a simple two-
and polar (P) character. Chains on the lattice are self-
i statekinetics[13]withoutintermediarystatesthatwould
X avoiding to account for the excluded volume. The only
slowdownthefoldingdynamics(“traps”). Itseems,how-
r explicit interaction is between non-adjacent but next- ever,thatnative-centricmodelssuchasthoseofG¯otype
a neighbored hydrophobic monomers. This interaction of
require modifications for a qualitatively correct descrip-
hydrophobiccontactsisattractiveto forcethe formation
tion of this sharp folding transition [14, 15].
ofacompacthydrophobiccorewhichisscreenedfromthe
In this paper we reportstudies of thermodynamic and
ground-statepropertiesofABsequencesknownfromthe
literature [7, 16, 17] for two representations of the AB
model [6, 7] which are described in Sec. II. While, com-
∗E-mail: [email protected]
pared to all-atom formulations, the interactions in these
†E-mail: [email protected]
‡E-mail: [email protected]; coarse-grained models are greatly simplified and hence
Homepage: http://www.physik.uni-leipzig.de/CQT can be computed much faster, they still preserve a com-
2
plicated rugged free-energy landscape, where naive sim- N−2 N 1 1
4 C (σ ,σ ) − , (3)
ulations wouldeasilygettrapped. Inorderto accurately II i j r12 r6
resolve the low-temperature behavior we, therefore, ap- Xi=1 j=Xi+2 ij ij!
plied a multicanonical Monte Carlo algorithm [18, 19]
where b is the bond vector between the monomers k
with an appropriate update mechanism. For simulations k
and k+1 with length unity. In Ref. [7] different values
of systems with complex free energy landscapes, multi-
for the parameterset (κ , κ ) weretested and finally set
canonical sampling has become very popular and its ap- 1 2
to (−1, 0.5) as this choice led to distributions for the
plication in proteinsimulations has a long tradition[20].
angles between bond vectors b and b as well as the
The ground-state search was achieved by means of the k k+1
torsionanglesbetweenthesurfacevectorsb ×b and
energy landscape paving minimizer (ELP) [21]. Sec- k k+1
b ×b thatagreedbestwithdistributionsobtained
tion III is devoted to the description of these methods. k+1 k+2
forselectedfunctionalproteins. Sinceb ·b =cosϑ ,
We present the results for the global energy minima and k k+1 k
the choice κ = −1 makes the coupling between suc-
the thermodynamic quantities in Sec. IV. The paper is 1
cessivebonds “antiferromagnetic”or“antibending” con-
concluded by the summary in Sec. V.
trary to what was chosen in Eq. (1) for AB model I.
The second term in Eq. (3) takes torsional interactions
intoaccountwithoutbeinganenergyassociatedwiththe
II. EFFECTIVE OFF-LATTICE MODELS
puretorsionalbarriersintheusualsense. Thethirdterm
contains now a pure Lennard-Jones potential, where the
We investigated two effective off-lattice models of AB 1/r6 long-range interaction is attractive whatever types
ij
typeforheteropolymerswithN monomers. Thefirstone of monomers interact. The monomer-specific prefactor
is the originalAB modelas proposedin Ref.[6]with the C (σ ,σ ) only controlsthe depth ofthe Lennard-Jones
II i j
energy function valley:
EI = 41N−2(1−cosϑk)+ CII(σi,σj)= ++1/12,, σσii,,σσjj ==AB, or σi 6=σj.
(cid:26)
Xk=1 (4)
N−2 N 1 C (σ ,σ ) For technical reasons, we have introduced in both mod-
I i j
4 r12 − r6 , (1) els a cut-off rij = 0.5 for the Lennard-Jones potentials
i=1 j=i+2 ij ij ! belowwhichthe potentialishard-corerepulsive(i.e.,the
X X
potential is infinite). For both models only a few results
where the first sum runs over the (N − 2) angles 0 ≤
are given in the literature. For the first model these are
ϑ ≤ π of successive bond vectors. This term is the
k estimates for the global energy minima of certain AB
bending energyandthe couplingis “ferromagnetic”,i.e.,
sequences [16], while for AB model II primarily thermo-
it costs energy to bend the chain. The second term par-
dynamic quantities were determined [7]. We have per-
tiallycompeteswiththebendingbarrierbyapotentialof
formed ELP optimizations [21] in order to find deeper-
Lennard-Jones type depending on the distance between
lying energy minima than the values quotedand, in par-
monomers being non-adjacent along the chain. It also
ticular, multicanonical sampling [18, 19] for enabling us
accounts for the influence of the AB sequence (σ = A
i to focus on thermodynamic properties of the sequences
for hydrophobic and σ = B for hydrophilic monomers)
i given in Ref. [7].
onthe energyof a conformationas its long-rangebehav-
ior is attractive for pairs of like monomers and repulsive
for AB pairs of monomers:
III. METHODS
+1, σ ,σ =A,
i j Inthissectionwedescribethecomputationalsampling
C (σ ,σ )= +1/2, σ ,σ =B, (2)
I i j i j methods and the update procedure we applied to obtain
−1/2, σ 6=σ .
i j results for off-lattice AB heteropolymers.
We will refer to thismodel as AB model I throughout
the paper.
A. Energy-landscape paving optimization
The other model we have studied has been introduced
procedure
in Ref. [7] and is a variant of AB model I in that it also
consistsofangularanddistance-dependentenergyterms,
In order to locate global energy minima of a complex
but with some substantial modifications. In the follow-
system, it is often useful to apply specially biased al-
ing, we denote it as AB model II. The energy is given
gorithms that only serve this purpose. We used the
by
energy landscape paving (ELP) minimizer [21] to find
global energy minima of the sequences under consider-
N−2 N−3
E = −κ b ·b −κ b ·b + ation. The ELP minimization is a Monte Carlo opti-
II 1 k k+1 2 k k+2
mization method, where the energy landscape is locally
k=1 k=1
X X
3
flattened. This meansthat if a state xwith energyE(x)
is hit, the energy is increasedby a “penalty” whichitself
depends on the histogram of any suitably chosen order
parameter. Thesimplestchoiceistheenergydistribution
H(E)suchthatwedefineE˜ =E +f(H(E )). Thus,the
t t t
Boltzmann probability exp(−E˜ /k T) for a Metropolis
t B
update, where k T is the thermal energy at the tem-
B
perature T, becomes a function of “time” t. The ad-
vantage of this method is that local energy minima are
filled up and the likelihood of touching again recently
visited regions decreases. This method has successfully
proved to be applicable to find global energy minima in
rough energy landscapes of proteins [21, 22]. Of course, FIG.1: Sphericalupdateofthebondvectorbetweentheith
asaconsequenceofthebiasedsampling,stochasticmeth- and (i+1)th monomer.
ods alongthe line of ELP violate detailed balance and it
is therefore inappropriate to apply those for uncovering
thermodynamic properties of a statistical system. The histograms obtained in the iteration runs were ac-
cumulated error-weighted [19] such that the estimation
of the weights was basedon increasingstatistics and not
only on the relatively small number of sweeps per run.
B. Multicanonical method
In the production period, 5×107 sweeps were generated
to have reasonable statistics for estimating the thermo-
To obtain statistical results we applied a multicanon- dynamicquantities. Statisticalerrorsareestimatedwith
ical Monte Carlo algorithm, where the energy distribu- the standard Jackknife technique [23, 24].
tion is flattened artificially allowing, in principle, for a
random walk of successive states in energy space. This
flattening is controllable and therefore reproducible. For
this purpose, the Boltzmann probability is multiplied by C. Spherical update mechanism
a weight factor W(E), which in our case is a function
of the energy. Then the probability for a state with en-
Forupdatingaconformationweusetheproceduredis-
ergy E reads p (E) = exp(−E/k T)W(E). In order
M B
played in Fig. 1. Since the length of the bonds is fixed
to obtaina multicanonicalor “flat” distribution, the ini-
(|b |=1,k =1,...,N−1),the(i+1)thmonomerlieson
tially unknown weight function W(E) has to be chosen k
the surface of a sphere with radius unity around the ith
accordingly what is done iteratively: In the beginning,
monomer. Therefore,sphericalcoordinatesarethenatu-
the weights W(0)(E) are set to unity for all energies let-
ralchoiceforcalculatingthenewpositionofthe(i+1)th
ting the firstrunbe ausualMetropolissimulationwhich
monomer on this sphere. For the reasonof efficiency, we
yieldsanestimateH(0)(E)forthecanonicaldistribution.
do not select any point on the sphere but restrict the
Thehistogramisusedtodeterminethenextguessforthe
choice to a spherical cap with maximum opening angle
weights, the simplest update is to calculate W(1)(E) =
2θ (the dark area in Fig. 1). Thus, to change the
H(0)(E)/W(0)(E). Then the next run is performed with max
position of the (i+1)th monomer to (i+1)′, we select
probabilities p(1)(E) = exp(−E/k T)W(1)(E) of states
M B theanglesθandϕrandomlyfromtherespectiveintervals
with energy E. The iterative procedure is continued un-
cosθ ≤cosθ ≤1and0≤ϕ≤2π,whichensureauni-
max
til the weights are appropriate in a way that the his-
form distribution of the (i+1)th monomer positions on
togram is “flat”. The introduction of a flatness crite-
theassociatedsphericalcap. Afterupdatingtheposition
rion, for instance, that the fluctuations around the aver-
ofthe (i+1)thmonomer,the followingmonomersin the
age value of the histogram are less than 20%, is use-
chainaresimply translatedaccordingto the correspond-
ful, but not necessary, if the number of iterations is
ing bond vectors which remain unchanged in this type
very large. After having determined accurate weights
of update. Only the bond vector between the ith and
W(E), they are kept fixed and following some thermal-
the (i+1)th monomers is rotated, all others keep their
izationsweepsalongproductionrunisperformed,where
direction. This is similar to single spin updates in local-
statistical quantities O are obtained multicanonically,
update Monte Carlo simulations of the classical Heisen-
hOi = p (E({x}))O({x})/Z with the multi-
M {x} M M berg model with the difference that in addition to local
canonicalPpartition function ZM = {x}pM(E({x})). energychangeslong-rangeinteractionsofthe monomers,
The canonical statistics is obtained by reweighting the changing their relative position to each other, have to
P
multicanonical to the canonical distribution, i.e., mean be computed anew after the update. In our simulations
values are computed as hOi=hOW−1iM/hW−1iM. of the AB models we used a very small opening angle,
Forthedeterminationofthemulticanonicalweightswe cosθ = 0.99, in order to be able to sample also very
max
performed 200 iterations with at least 105 sweeps each. narrow and deep valleys in the landscape of angles.
4
TABLE I: The four Fibonacci sequences, for which we analyzed theground states in both models.
N sequence
13 AB AB ABAB AB
2 2 2
21 BABAB ABAB AB ABAB AB
2 2 2 2
34 AB AB ABAB AB ABAB ABAB AB ABAB AB
2 2 2 2 2 2 2 2
55 BABAB ABAB AB ABAB ABAB AB ABAB AB ABAB ABAB AB ABAB AB
2 2 2 2 2 2 2 2 2 2 2 2
TABLE II: Estimates for global energy minima obtained with multicanonical (MUCA) sampling and ELP minimization [21]
for the Fibonacci sequences of Table I using both models. The values for AB model I are compared with results quoted in
Ref. [16] employing off-lattice PERM and after subsequent conjugate-gradient minimization (PERM+). Minimum energies
foundwithMUCAandELPforABmodelIIarecomparedwiththelowestenergieslistedinRef.[17],obtainedwithannealing
contour Monte Carlo (ACMC) and improved by Metropolis quenching(ACMC+).
AB model I AB model II
N EMUCA EELP EPERM [16] EPERM+ [16] EMUCA EELP EACMC [17] EACMC+ [17]
min min min min min min min min
13 −4.967 −4.967 −3.973 −4.962 −26.496 −26.498 −26.363 −26.507
21 −12.296 −12.316 −7.686 −11.524 −52.915 −52.917 −50.860 −51.718
34 −25.321 −25.476 −12.860 −21.568 −97.273 −97.261 −92.746 −94.043
55 −41.502 −42.428 −20.107 −32.884 −169.654 −172.696 −149.481 −154.505
tion. In Table I we list the Fibonacci sequences [6] that
TABLE III: Root mean square deviations rmsd and overlap
have already been studied in Ref. [16] by means of an
Q of lowest-energy conformations found with multicanonical
off-lattice variant of the improved version[4] of the cele-
sampling and with the ELP optimizer for the Fibonacci se-
brated chain-growth algorithm with population control,
quencesof length N given in Table I.
PERM [25]. In Ref. [16], first estimates for the putative
AB model I ABmodel II ground-stateenergiesofthe Fibonaccisequenceswith13
N rmsd Q rmsd Q to 55 monomers were given for AB model I [6] in three
13 0.015 0.994 0.006 0.998 dimensions. We compare these results with the respec-
21 0.025 0.992 0.009 0.997 tive lowest energies found in our multicanonical simula-
tions and with the results obtained with the minimiza-
34 0.162 0.979 1.412 0.840
tion algorithm ELP [21]. It turns out that the ground-
55 2.271 0.766 1.904 0.857
state energies found by multicanonical sampling agree
well with what comes out by the ELP minimizer, cf. Ta-
ble II. Another interesting result is that our estimates
IV. RESULTS
for the ground-state energies lie significantly below the
energiesquoted in Ref. [16], obtainedwith the off-lattice
We applied the multicanonical algorithm primarily to PERM variant, and our values are even lower than the
study thermodynamic properties, e.g., the “phase” be- energies obtained by PERM and subsequent conjugate-
havior of off-lattice heteropolymers. Before discussing gradient minimization in the attraction basin.
these results, however,we analyze the capability of mul- WehavealsoperformedthistestwithmodelII[7]and
ticanonicalsamplingtofindlowest-energyconformations, wecancompareourresultswithminimumenergieslisted
inparticularthenativefold,becausetheidentificationof inRef.[17],wheretheso-calledannealingcontourMonte
these structures is not only interesting as a by-product Carlo (ACMC) method was applied to these Fibonacci
of the simulation. Rather, since they dominate the low- sequences. FromTableIIweseethatwefindwithMUCA
temperature behavior, it is necessary that they are gen- and ELP runs lower energies for the sequences with 21,
erated frequently in the multicanonical sampling. 34,and55monomers,whiletheresultsforthe13merare
comparable.
Notethatthemulticanonicalalgorithmisnottunedto
A. Search for global energy minima give good results in the low-energy sector only. For all
sequences studied in this work, the same algorithm also
In order to be able to investigate the low-temperature yieldedthethermodynamicresultstobediscussedinthe
behavior of AB off-lattice heteropolymers, we first have following section.
toassessthecapabilitiesofthemulticanonicalmethodto In order to check the structural similari-
samplelowest-energyconformationsand,inparticular,to ties of the lowest-energy conformations ob-
approach closely the global energy minimum conforma- tained with multicanonical sampling and those
5
TABLE IV: Sequences used in the study of thermodynamic
properties of heteropolymers as introduced in Ref. [7]. The
numberof hydrophobicmonomers is denoted by #A.
No. sequence #A
20.1 BA BA BA BA B 14
6 4 2 2 2
20.2 BA BA BABA BA B 14
2 4 2 5
20.3 A B A BA BA B A 14
4 2 4 2 3 2
20.4 A BA BABA B A BA 14
4 2 2 2 3 2
20.5 BA B A B ABABA BAB 10
2 2 3 3 2
20.6 A B AB ABAB ABABABA 10
3 2 2 2
π−ϑ , respectively)
i
1 Nt Nb
d(X,X′) = d (Φ ,Φ′)+ d (Θ ,Θ′) ,
π t i i b i i
!
FIG. 2: Side (left) and top view (right) of theglobal energy i=1 i=1
X X
minimumconformationofthe55mer(ABmodelI)foundwith d (Φ ,Φ′) = min(|Φ −Φ′|,2π−|Φ −Φ′|),
t i i i i i i
theELPminimization algorithm (darkspheres: hydrophobic
d (Θ ,Θ′) = |Θ −Θ′|.
monomers – A,light spheres: hydrophilic– B). b i i i i
Since −π ≤ Φ ≤ π and 0 ≤ Θ ≤ π it follows immedi-
i i
atelythat0≤d ≤π. Theoverlapisunity,ifallangles
from the ELP optimization runs, XMUCA,ELP = of the conformatt,ibons X and X′ coincide, else 0≤Q<1.
(xMUCA,ELP,xMUCA,ELP,...,xMUCA,ELP), we calcu-
1 2 N
lated the rmsd (root mean square deviation) of the
Note that both models are energetically invariant un-
respective pairs,
der reflection symmetry. This means that also the land-
scape of the free energy as function of the bond and
N torsion angles is trivially symmetric with respect to the
1
rmsd=min |x˜MUCA−x˜ELP|2. (5) torsional degrees of freedom. Consequently, unless ex-
vN i i
u i=1 ceptional cases, there is thus a trivial twofold energetic
u X
t degeneracy of the global energy minimum, but the re-
Here, x˜MUCA,ELP = xMUCA,ELP − xMUCA,ELP denotes spectivermsdandoverlapparameteraredifferentforthe
i i 0
thepositionwithrespecttotherespectivecentersofmass associatedconformations. Thevaluesquotedthroughout
x = N x /N oftheithmonomerofthelowest-energy the paper wereobtainedby comparingthe lowest-energy
0 j=1 j
conformations found with multicanonical sampling and conformation found with both reference conformations
ELP mPinimization, respectively. Obviously, the rmsd is andquoting the value indicating better coincidence (i.e.,
zero for exactly coinciding conformations and the larger lower rmsd and higher overlap). This obvious ambigu-
thevaluetheworsethecoincidence. Theminimizationof ity canbe circumventedby adding a symmetry-breaking
the sum in Eq. (5) is performed with respect to a global termtothemodelsthatdisfavors,e.g.,left-handedhelic-
relativerotationofthetwoconformationsinordertofind ity [15].
the best match. For the explicit calculation we used the In Table III we list the values of both parameters for
exactquaternion-basedoptimizationproceduredescribed the lowest-energyconformations found for the Fibonacci
in Ref. [26]. sequences of Table I. We see that for the shortest se-
As an alternative, we introduce here another parame- quences (with 13 and 21 monomers) the coincidence of
terthatenablesustocompareconformations. Insteadof thelowest-energystructuresfoundisextremelygoodand
performing comparisons of positions it is much simpler wearepretty surethat wehavefoundthe groundstates.
and less time-consuming to calculate the so-called over- In the case of the 34mer, modelled with AB model I,
lapbetween two conformationsby comparing their bond both structures also coincide still very well and it seems
and torsion angles. As an extension of the torsion-angle that the attraction valley towards the ground state was
basedvariant[27,28],wedefinethemoregeneraloverlap found within the multicanonical simulation. In the sim-
parameter ulation of the 34mer with model II the situation is dif-
ferent. As seen from Table II, we found surprisingly the
N +N −d(X,X′) marginallylowerenergyvalue inthe multicanonicalsim-
Q(X,X′)= t b , (6)
N +N ulation, but the associated conformation differs signif-
t b
icantly from that identified with ELP. It is likely that
where (with N = N − 3 and N = N − 2 being the bothconformationsdonotbelongtothesameattraction
t b
numbers of torsional angles Φ and bond angles Θ = basinanditisafuturetasktorevealwhetherthisisafirst
i i
6
TABLE V: Minimal energies and temperatures of themaxi- TABLE VI: Same as Table V, but using AB model II. For
mum specific heats for the six 20mers of Table IV using AB comparison the specific heat maximum locations Ts, esti-
model I as obtained by multicanonical sampling. The global mated in Ref. [7], are also given.
maximumoftherespectivespecificheatsisindicatedbyastar
(⋆). Forcomparison, wehavealso giventheglobally minimal No. EmMiUnCA EmELinP rmsd Q TC Ts [7]
energiesfoundfromminimizationwithELPaswellasthere- 20.1 −58.306 −58.317 0.006 0.999 0.35(4) 0.36
spective rmsd and the structural overlap parameter Q of the 20.2 −58.880 −58.914 0.009 0.997 0.33(4) 0.32
corresponding minimum energy conformations.
20.3 −59.293 −59.338 0.010 0.997 0.29(3) 0.30
No. EMUCA EELP rmsd Q T(1) T(2) 20.4 −59.068 −59.079 0.007 0.998 0.27(4) 0.27
min min C C
20.1 −33.766 −33.810 0.048 0.954 0.27(3)⋆ 0.61(5) 20.5 −51.525 −51.566 0.012 0.998 0.33(5) 0.33
20.2 −33.920 −33.926 0.015 0.992 0.26(4)⋆ 0.69(4) 20.6 −53.359 −53.417 0.014 0.996 0.25(2) 0.26
20.3 −33.582 −33.578 0.025 0.990 0.25(3)⋆ 0.69(3)
20.4 −34.496 −34.498 0.030 0.985 0.26(3) 0.66(2)⋆
20.5 −19.647 −19.653 0.017 0.988 0.15(2) 0.41(1)⋆ quences 20.1–20.4,while sequences20.5and20.6possess
20.6 −19.322 −19.326 0.047 0.989 0.15(2)a 0.35(1)⋆ only 10 hydrophobic residues. In Ref. [7], the thermody-
namic behaviorofthese sequenceswas studied by means
aSpecific heat of sequence 20.6 possesses only one maximum at
of the simulated tempering method. For revealing low-
T(2) ≈ 0.35. The value given for T(1) belongs to the pronounced
C C temperature properties, an additional quenching proce-
turningpoint.
dure was performed. In our simulations, we used multi-
canonicalsamplingovertheentirerangeoftemperatures
without any additional quenching.
indication of metastability. The situation is even more
complex for the 55mer. As the parameters tell us, the
lowest-energyconformationsidentifiedbymulticanonical
simulationandELPminimizationshowsignificantstruc- 1. Multicanonical sampling of heteropolymers with 20
tural differences in both models. monomers
Answeringthe questionofmetastabilityis stronglyre-
lated with the problem of identifying the folding path or
As described in Sec. IVA, the multicanonical method
an appropriate parameterizationof the free energy land-
is capable to find even the lowest-energy states with-
scape. Due to hiddenbarriersit is practicallyimpossible
out any biasing or quenching. We proved this also
that ordinary stand-alone multicanonical sampling will
for the 20mers of Table IV by comparing once more
be ableto achievethis forsuchrelativelylongsequences.
with the ELP minimization method. In Tables V
Note that, in our model I multicanonical simulation for
and VI we present the estimates of the global energy
the 55mer, we precisely sampled the density of states
minima in both models we found in these simulations.
over 120 orders of magnitude, i.e., the probability for
Once more, the values obtained with multicanonical
finding randomly the lowest-energy conformation (that
sampling agree pretty well with those from ELP min-
we identified with multicanonical sampling) in the con-
imization. The respective structural coincidences are
formational space is even less than 10−120! In Fig. 2 we
confirmed by the values for the rmsd and the overlap
show two views of the global energy minimum confor-
also being given in these tables. In order to identify
mation with energy E ≈ −42.4 for the 55mer found
min conformational transitions, we calculated the specific
by applying the ELP minimizer to AB model I. The hy- heat C (T) = (hE2i − hEi2)/k T2 with hEki =
V B
drophobic core is tube-like and the chain forms a helical g(E)Ekexp(−E/k T)/ g(E)exp(−E/k T)
E B E B
structure(whichishereanintrinsicpropertyofthemodel
from the density of states g(E). The density of states
and is not due to hydrogen-bonding obviously being not P P
was found (up to an unimportant overall normalization
supplied by the model).
constant) by reweighting the multicanonical energy
distribution obtained with multicanonical sampling
to the canonical distribution Pcan,T(E) at infinite
B. Thermodynamic properties of heteropolymers temperature (β ≡1/k T =0), since g(E)=Pcan,∞(E).
B
Figure 3 shows, as an example, the density of states
Our primary interest of this study is focussed on ther- g(E) (normalized to unity) and the multicanonical
modynamic properties of heteropolymers, in particular histogram for sequence 20.1 simulated with AB model
on conformational transitions heteropolymers pass from II. We sampled conformations with energy values lying
random coils to native conformations with compact hy- in the interval [−60.0,50.0], discretized in bins of size
drophobiccore. Weinvestigatedsixheteropolymerswith 0.01, and required the multicanonical histogram to
20 monomers as studied by Irb¨ack et al. in Ref. [7] with be flat for at least 70% of the core of this interval,
AB model II. The associated sequences are listed in Ta- i.e., within the energy range [−43.5,33.5]. Within and
bleIV. Noticethatthehydrophobicity(=#Amonomers partly beyond this region, we achieved almost perfect
in the sequence) is identical (= 14) for the first four se- flatness, i.e., the ratios between the mean and maximal
7
8×103 highly compact conformations with decreasing temper-
0
ature. The conformations dominant for high temper-
7
-10 (2)
atures T > T are random coils, while for tempera-
6 C
-20 H(E) tures T < T(1) primarily conformations with compact
5 C
log g(E)
-30 10 hydrophobic core are favored. The intermediary globu-
log10g(E) 4 H(E) lar “phase” is not at all present for the exemplified se-
-40
3 quenceswhenmodeledwithABmodelII,whereonlythe
-50 2 lattertwo “phases”canbe distinguished. We haveto re-
-60 1 markthatwhatwedenote“phases”arenotphasesinthe
strict thermodynamic sense, since for heteropolymers of
-70 0
-60 -50 -40 -30 -20 -10 0 10 20 30 40 50 the typeweusedinthis study (this meanswearenotfo-
E cussedon sequences that have specialsymmetries, as for
example diblock copolymers A B ), a thermodynamic
n m
FIG. 3: Density of states g(E) (normalized to unity over
limit is in principle nonsensical. Therefore conforma-
theplottedenergyinterval)andflatmulticanonicalhistogram
tional transitions of heteropolymers are not true phase
H(E) for sequence20.1 (ABmodel II).
transitions. As a consequence,fluctuating quantities, for
example the derivatives with respect to the temperature
ofthemeanradiusofgyration,dhR i/dT,andthemean
histogram value, H /H , as well as the ratio gyr
mean max end-to-end distance, dhR i/dT, do not indicate confor-
between minimum and mean, H /H , exceeded ee
min mean mational activity at the same temperatures, as well as
0.9. As a consequence of this high-accurate sampling
when compared with the specific heat.
of the energy space this enabled us to calculate the
We conclude that conformational transitions of het-
density of states very precisely over about 70 orders
eropolymershappenwithinacertainintervaloftempera-
of magnitude. The energy scale is, of course, bounded
tures,notatafixedcriticaltemperature. Thisisatypical
from below by the ground-state energy E , and that
min
finite-size effect and, for this reason, the peak tempera-
we closely approximated this value can be seen by the
strong decrease of the logarithm of the density of states tures TC(1) and TC(2) (for model I) and TC (for model II)
for the lowest energies. This strong decrease of the defined above for the specific heat are only representa-
density of states curves near the ground-state energy is tives for the entire intervals. In order to make this more
common to all short heteropolymer sequences studied. explicit, we consider sequence 20.3 in more detail. In
It reflects the isolated character of the ground state Figs. 5(a) (for AB model I) and 5(b) (AB model II), we
within the energy landscape. compare the energetic fluctuations (in form of the spe-
We used the density of states to calculate the specific cific heat CV) with the respective fluctuations of radius
heats of the 20mers in both models. The results are of gyration and end-to-end distance, dhRee,gyri/dT =
shown in Fig. 4. A first observation is that the specific (hRee,gyrEi−hRee,gyrihEi)/kBT2. Obviously, the tem-
heats obtained from simulations of AB model I show up peratureswithmaximalfluctuationsarenotidenticaland
twodistinctpeakswiththelow-temperaturepeaklocated theshadedareasarespannedoverthetemperatureinter-
at T(1) and the high-temperature peak at T(2) compiled vals,wherestrongestactivityisexpected. Weobservefor
C C this examplethatinmodelI(Fig.5(a))twosuchcenters
in Table V. The AB model II, on the other hand, fa-
of activity can be separated linked by an intermediary
vors only one pronounced peak at T with a long-range
C
intervalof globular traps. In fact, there is a minimum of
high-temperaturetail. InTableVI wecompareourpeak
temperatures TC with the values given in Ref. [7]. They the specific heat at TCmin between TC(1) and TC(2), but the
are in good correspondence. increaseofinternalenergybyformingthesestates,which
The sequences considered here are very short and the isgivenby TCmindT C (T),israthersmallandtheglob-
T(2) V
nativefoldcontainsasinglehydrophobiccore. Interpret- C
ules are notRvery stable. FromFig.5(b), we conclude for
ing the curves for the specific heats in Fig. 4 in terms
this sequence that in model II no peculiar intermediary
of conformational transitions, we conclude that the het-
conformations occur, and the folding of the heteropoly-
eropolymers simulated with AB model I tend to form,
mer is a one-step process.
within the temperature region T(1) < T < T(2), in-
C C It is widely believed and experimentally consolidated
termediate states (often also called traps) comparable
that realistic short single-domain proteins are usually
with globulesinthe collapsedphaseofpolymers. Forse-
two-statefolders[13]. Thismeans,thereisonlyonefold-
quences 20.5 and, in particular, 20.6 the smaller number
ing transition and the protein is either in the folded or
of hydrophobic monomers causes a much sharper transi-
an unfolded (or denatured) state. Therefore, AB model
tionatTC(2) thanatTC(1) (where,infact,thespecificheat II couldindeed serveas a simple effective modelfor two-
of sequence 20.6 possesses only a turning point). The state heteropolymers. The maindifference betweenboth
pronouncedtransitionnearT(2) isconnectedwitha dra- models under study is that model II containsanimplicit
C
matic change of the radius of gyration, as can be seen torsional energy which is not present in model I. This
later in Fig. 7, indicating the collapse from stretched to is in correspondence with more knowledge-basedG¯o-like
8
3.0 3.0
20.1 20.2
2.5 ABmodelII 2.5 ABmodelII
2.0 2.0
CV(T) CV(T)
1.5 1.5
N N ABmodelI
ABmodelI
1.0 1.0
0.5 0.5
0.0 0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
T T
3.5 3.0
3.0 20.3 ABmodelII 2.5 20.4 ABmodelII
2.5
2.0
CV(T)2.0 CV(T)
1.5 ABmodelI
N 1.5 N
ABmodelI
1.0
1.0
0.5 0.5
0.0 0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
T T
4.0 5.0
3.5 20.5 4.5 20.6
4.0
3.0
3.5
2.5 ABmodelII 3.0 ABmodelII
CV(T) CV(T)
2.0 2.5
N N
1.5 2.0
ABmodelI 1.5
1.0
1.0 ABmodelI
0.5 0.5
0.0 0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
T T
FIG. 4: Specific heats of the 20mers listed in Table IV.
models with explicit torsional energy contributions for 2. Comparison with the homopolymer
the study ofsmallproteins with knowntypicaltwo-state
folding–unfolding kinetics [14].
It is also interesting to compare the thermodynamic
behaviorofthe heteropolymerswith20monomersasde-
scribed in the previous section with the homopolymer
consisting of 20 A-type monomers, A . This is the con-
20
sequentoff-latticegeneralisationofself-avoidinginteract-
ing walks on the lattice (ISAW) that have been exten-
sively studied over the past decades. In contrast to het-
Nonetheless, there are also examples of small pep-
eropolymers, where, because of the associated sequence
tides exhibiting two clear peaks in the specific heat. In
offinitelength,athermodynamiclimitdoesnotexist,ho-
Ref. [29], the artificialpeptide Ala Gly Ala wasstud-
10 5 10
mopolymersshowupacharacteristicsecond-orderphase
ied in detail and it turned out that two transitions sep-
transitionbetweenrandomcoilconformations(“goodsol-
arate the ground-state conformation and random coil
vent”) and compact globules (“poor solvent”), the so-
states. One is the alanine mediated helix-coil transition
called Θ transition [30].
and the second the formation of a glycine hairpin that
leads to a more compact conformation. In this study, our interest was not focussed on the
9
2.5 0.5
(a) 0.14 (a)
2.0 CV(T)/N 0.4
1.5 0.3 1dhReei, 0.12
CVN(T) 1.0 5dhRgyri/NdT 0.2 N5dhdRTgyri hRgyNri(T) 0.10
N dT
0.1
0.5 dhReei/NdT 0.08
0.0
0.0 0.06
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
T 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
0.25 T
3.0 (b) 0.10
5dhRgyri/NdT 0.20
(b)
0.15
2.5 1dhReei, 0.09
CV(T) 0.10N dT
N 2.0 0.05 5dhRgyri hRgyri(T) 0.08
N dT N
CV(T)/N 0.00
1.5 dhReei/NdT
-0.05 0.07
1.0 -0.10
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
T 0.06
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
T
FIG. 5: Fluctuations of energy (specific heat), radius of gy-
ration and end-to-end distance for sequence 20.3 from simu-
FIG. 7: Mean radius of gyration hR i as a function of the
lations with (a) ABmodel I and (b) AB model II. gyr
temperatureT forthesequences20.1–20.4(solidcurves),20.5,
20.6 (long dashes) and the homopolymer A (short-dashed
20
curve) for (a) ABmodel I and (b) AB model II.
3.0
2.5 ABmodelII
2.0 the homopolymer obviously takes more compact confor-
CV(T) mations than the heteropolymers, since its mean radius
1.5
N of gyration is always smaller. This different behavior is
ABmodelI
an indication for a rearrangementof the monomers that
1.0
isparticularforheteropolymers: theformationofthehy-
0.5 drophobiccoresurroundedbythehydrophilicmonomers.
Sincethe homopolymertriviallyalsotakesinthe ground
0.0
0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 stateahydrophobiccoreconformation(sinceitonlycon-
T sistsofhydrophobicmonomers),whichisobviouslymore
compact than the complete conformations of the het-
FIG. 6: Specific heats of the homopolymer A20 with 20 eropolymers, we conclude that hydrophobic monomers
monomers for both models. weaken the compactness of low-temperature conforma-
tions. Thus, homopolymers and heteropolymers show
a different “phase” behavior in the dense phase. Ho-
Θ transition but more on the direct comparison of the mopolymers fold into globular conformations which are
finite-size homopolymer and the different heteropolymer hydrophobic cores with maximum number of hydropho-
sequences. In Fig. 6 we have plotted the specific heats biccontacts. Heteropolymersalsoformverycompacthy-
of this homopolymer for both models under study. The drophobic cores which are, of course, smaller than that
first observation is that, independent of the model, the of the homopolymer due to the smaller number of hy-
collapsefromrandomcoils(hightemperatures)toglobu- drophobic monomers in the sequence. In total, however,
lar conformations (low temperatures) happens, roughly, heteropolymersarelesscompactthanhomopolymersbe-
in one step. There is only one energetic barrier as indi- cause the hydrophilic monomers are pushed off the core
cated by the single peak of the specific heats. andarrangethemselvesinashellaroundthehydrophobic
In Fig. 7 we have plotted for both models the mean core. FormodelI,wealsoseeinFig.7(a)acleartendency
radii of gyration as a function of the temperature for that the mean radius of gyration and thus the compact-
the sequences from Table IV in comparisonwith the ho- ness strongly depends on the hydrophobicity of the se-
mopolymer. For alltemperaturesin the intervalplotted, quence, i.e., the number of hydrophobic monomers. The
10
curves for sequences 20.5 and 20.6 (long-dashed curves) peptides seem to possess a rather smooth free-energy
with10A’sinthesequencecanclearlybeseparatedfrom landscape,whereonlyasinglebarrierseparatesunfolded
theotherheteropolymersinthestudy(with14hydropho- states and native fold [13]. Most of the previous investi-
bicmonomers)andthe homopolymer. Thissupportsthe gationsinthisregardwereperformedbymeansofakind
assumption that for heteropolymers the formation of a of knowledge-based potentials, where topological prop-
hydrophobic core is more favorablethan the folding into erties (e.g., the native contacts) of the folded state ex-
an entire maximally compact conformation. plicitly enter. These potentials allow then quantitative
studies of the dynamics of the folding process for the
specified protein. In our study, however, we were more
V. SUMMARY interestedin the influence ofbasic principlesonthe fold-
ingtransitionandthereforequantitativecomparisonwith
the folding of realistic proteins is not appropriate at this
Wehaveinvestigatedtwocoarse-grainedoff-latticehet-
level of abstraction.
eropolymer models of AB type that mainly differ in
In order to reveal folding properties specific to het-
the modelling of energetic bending and torsional barri-
eropolymers, we also compared with a purely hydropho-
ers. While the original AB model (model I) [6], which
bic polymer. Our results for the homopolymer are in ac-
wasfirstintroducedfortwo-dimensionalheteropolymers,
cordancewith the widely acceptedviewof the formation
treatsthepolymerasastiffchainofhydrophobic(A)and
of compact globular conformations below the Θ transi-
hydrophilic (B) monomers without considering torsional
tion temperature. The globules, which are in our in-
barriers, model II favors bending and an additional con-
terpretation only compact hydrophobic cores with max-
tributioncontainingalsotorsionalenergyisregarded[7].
imum number of hydrophobic contacts, minimize the
Another noticeable property of model I is that contacts
surface exposed to the (implicit) solvent. This behav-
between A and B monomers are always suppressed, in
ior was observed for both models which differ, for the
contrast to model II.
purely hydrophobic homopolymer, only in local covalent
We studied these models by means of the multicanon-
bond properties, but not in the non-bonded interaction
ical Monte Carlo method. In a first test, we checked
between the monomers. Since heteropolymers with the
the ability of the algorithm to find lowest-energyconfor-
same chain length contain less hydrophobic monomers,
mations. The results were compared with minimum en-
thehydrophobiccoreis,althoughstillverydense,smaller
ergy values obtained with the energy-landscape paving
and the hydrophilic monomers form a shell surrounding
(ELP) algorithm by measuring the rmsd and a gener-
thecoreandscreeningitfromthesolvent. This,inconse-
alized overlap parameter that in addition to torsional
quence,leadsto anativeconformationthatqualitatively
degrees of freedom also allows the comparison of bond
differsfromthetypicalglobulesknownfrompolymers. In
angles. We found very good coincidences for minimum
models allowingfor intermediarystates (such as model I
energies and associated conformations for all sequences
inourstudy),theglobular“phase”isactuallypresentfor
under study, with the exception of the 34mer in model
heteropolymers, too, and is dominant in a temperature
II and the 55mer in both models. In the latter case the
intervalbetween the hydrophobic-corephase at very low
random walk in the energy space, which is considered as
temperatures and the random coils that are present at
the system parameter,in not sufficient to find the global
high temperatures.
energy minimum and a more detailed study of the ori-
In conclusion, simple, coarse-grained off-lattice het-
gin of the free energy barriers, i.e., the identification of
eropolymer models without specific or knowledge-based
an appropriate order parameter, is required. Nonethe-
parameterization are attractive as they allow for the
less, for all sequences we obtained much lower values for
study of how basic energetic contributions, bonded and
the respectiveputative global-energyminimumthanfor-
non-bonded,inthemodelarerelatedtoandinfluencethe
merly quoted in the literature using an off-lattice chain-
conformationalfoldingprocessofheteropolymers,respec-
growthalgorithm[16]and alsoconsiderablylowervalues
tively. Herein, the main focus is on the study of general
than with the annealing contour Monte Carlo (ACMC)
properties of these systems. In perspective, such mod-
method [17].
els, after refinements with respect to a few specific inter-
Our main objective was the comparison of the confor-
atomic interactions (e.g., hydrogen bonds) will be suit-
mationaltransitions in both models. We primarily stud-
able for studying heteropolymers of a few hundreds of
ied energetic and conformational fluctuations of several
monomers even quantitatively without the requirement
sequences with 20 monomers and found that in model
of a prior knowledge of the final fold.
I there is a general tendency that, independent of the
sequence, the folding from random coils (high tempera-
tures) to lowest-energy conformations is a two-step pro-
VI. ACKNOWLEDGEMENTS
cess and the folding is slowed down by weakly stable in-
termediary conformations. This is different in model II,
were traps are avoidedand the heteropolymers exhibit a M.B. thanks Thomas Weikl for interesting discussions
two-state folding behavior. This is (qualitatively!) in on the folding behavior of short peptides. H.A. ac-
correspondence with the observation that many short knowledges support by the exchange programme of the