More than once, people have written on the FAH forum that there is no duplicated work. But I have not seen anyone support that conviction with facts or reasons.
My understanding is that all the work is duplicated (cloned) many times over. Why else would there be a "clone" parameter? All but one of the duplicates is thrown away. And this cloning is, by far, the fastest way to fold proteins.
I gleaned the following explanation from "Atomistic protein folding simulations on the hundreds of microsecond timescale using worldwide distributed computing" by Vijay Pande et al., which I refer to below as "the report".
The way proteins fold is this: they quickly fold to a local energy minimum, and then sit around in this energy well for ages, until being kicked out, usually by thermal fluctuations. Then they quickly fold to the next local energy minimum, and the cycle starts all over again. To quote the report:
|It has been demonstrated that free energy barrier crossing problems (like protein folding) do not make steady, gradual progress from the unfolded to folded states during a folding transition, but rather spend most of the trajectory time dwelling in a free energy minimum, "waiting" for thermal fluctuations to push the system over a free energy barrier. Indeed, this process is dominated by the "waiting" time, and the time to cross the free energy barrier is in fact much shorter than the overall folding time, typically by several orders of magnitude.|
Now, protein folding is stochastic (the time to get out of an energy minimum is random).
This is one place where the report is a little confusing to me. Vijay assumes that the folding time follows an exponential distribution for a protein with only one energy barrier. [I.e. the fraction of proteins which fold within time t is: f(t) = 1 - exp(-k t), where k is the folding rate.] But I don't know whether, for multiple-energy-well proteins, this distribution applies to the entire fold or only to each energy well. Strictly, a sum of exponential waiting times is not itself exponential, but if one barrier dominates the total folding time, the overall distribution should be approximately exponential, so I think it makes little difference.
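The exponential claim is easy to check numerically. Here is a minimal Monte Carlo sketch (the rate k and time t are made-up numbers for illustration, not values from the report) showing that the simulated fraction of proteins folded by time t matches f(t) = 1 - exp(-k t):

```python
import math
import random

def fraction_folded(k, t, n_samples=200_000, seed=1):
    """Estimate the fraction of proteins that have crossed a single
    exponential barrier (rate k) by time t, via Monte Carlo sampling."""
    rng = random.Random(seed)
    folded = sum(1 for _ in range(n_samples) if rng.expovariate(k) <= t)
    return folded / n_samples

k = 0.5   # hypothetical folding rate (per time unit) -- my choice, not the report's
t = 3.0   # observation time, also arbitrary
estimate = fraction_folded(k, t)
theory = 1 - math.exp(-k * t)   # f(t) = 1 - exp(-k t)
print(f"simulated: {estimate:.3f}   theory: {theory:.3f}")
```

The two numbers agree to within sampling noise, which is all the single-barrier model asserts.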
This means that sometimes a protein will jump out of the energy minimum quickly, and sometimes take much longer. The time the protein spends sitting around in a local minimum is wasted, in the sense that no folding progress is being made.
If FAH were to do straightforward simulations, it would take far too long to simulate a fold, as almost all the CPU time would be spent with the protein sitting around, waiting to be kicked out of a local energy minimum.
So here is the trick that FAH uses to speed things up enormously!
They get a whole passel of CPUs to start doing the same folding at the same time (clones). And, probability being what it is, in one of those simulations the protein will get kicked out of a local energy minimum quickly. Let's say that upon getting kicked out, the protein is in state A. Now you don't need to wait for the rest of them to get kicked out: you simply throw their results away as soon as their work units are finished, and restart them all from state A. They will then all quickly fall into the next local minimum, and the process starts all over again.
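The restart trick above can be sketched as a toy simulation. Assuming each barrier escape is an independent exponential wait (my assumption, following the report's single-barrier model), the first of M clones to escape sets the pace at every barrier, so the total folding time shrinks by roughly a factor of M:

```python
import random

def folding_time(n_barriers, k, n_clones, rng):
    """Total simulated time to cross n_barriers exponential barriers
    (rate k each) when n_clones clones run in parallel and everyone
    restarts from the first clone to escape each minimum."""
    total = 0.0
    for _ in range(n_barriers):
        # each clone draws its own escape time; the first escape wins
        total += min(rng.expovariate(k) for _ in range(n_clones))
    return total

rng = random.Random(42)
trials = 2000
single = sum(folding_time(5, 1.0, 1, rng) for _ in range(trials)) / trials
cloned = sum(folding_time(5, 1.0, 10, rng) for _ in range(trials)) / trials
print(f"1 clone: {single:.2f}   10 clones: {cloned:.2f}   "
      f"speedup ~ {single / cloned:.1f}x")
```

With 10 clones the simulated mean folding time drops from about 5 time units to about 0.5, i.e. roughly the tenfold speedup the report predicts.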
Here are the relevant parts of the report:
|To achieve this, we use the following algorithm . . . Consider running M independent simulations started from a given initial condition (each run starts from the same coordinates, but with different velocities). We next wait for the first simulation to cross the free energy barrier . . . Since the average time for the first of M simulations to cross a single barrier is M times less than the average time for all the simulations . . . we can use M processors to effectively achieve an M times speedup of a dynamical simulation, thus avoiding the waiting in free energy minima . . . [O]ne can speed the entire (multiple barrier) problem by turning it into a series of single barrier problems, restarting the processors from the new free energy minima after each barrier crossing . . .|
|We start M simulations from a single initial condition, and then wait for the first simulation to cross a free energy barrier. Once this simulation has crossed over to the next free energy minimum, we restart all other simulations from that new location . . .|
|For each run, we have used M "clone" processors . . . Once one of these clones makes a transition . . . we . . . copy the resulting configuration to all of the other processors, and recommence simulations from the new configuration.|
Vijay et al. point out in the report that using M processors speeds up the expected folding time by a factor of exactly M.
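For exponential waits the factor-of-M claim follows from a two-line property of the minimum of independent exponentials; here is the derivation, assuming the M clones' escape times are independent and identically distributed as Exp(k):

```latex
% Escape times T_1, ..., T_M independent, each ~ Exp(k).
% The first escape is the minimum:
P\bigl(\min_i T_i > t\bigr) = \prod_{i=1}^{M} P(T_i > t)
                            = \left(e^{-kt}\right)^{M} = e^{-Mkt}
% so \min_i T_i ~ Exp(Mk), and therefore
E\bigl[\min_i T_i\bigr] = \frac{1}{Mk} = \frac{1}{M}\,E[T_1]
```

That is, the expected waiting time per barrier (and hence per generation) is exactly M times shorter; this exactness is special to the exponential distribution.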
This is a wonderful way to go, as you can count on getting through the entire fold while spending very little time sitting idle in a local minimum.
Of course, the "extra" CPUs are not wasted, even though their work is tossed, since you can't predict which of the clones will pop out of the energy minimum first.
This is my guess as to how the FAH algorithm works, and I, for one, think that it's pretty nifty!
I do, though, have several questions about this method. The main one is: if you are relying on thermal fluctuations to kick a protein out of a local minimum, wouldn't there be many different ways it could happen? That is, different atom chains could gain enough energy to push the protein over the top, and if atom chain A collapses before atom chain B, the result could be a different fold than if chain B collapses first.
Another variation on this point is that each clone starts with the atoms having different velocities. It seems to me that different velocity distributions could result in different foldings. Vijay et al. are aware of something like this, stating that:
|[R]estarting them with differing velocities, . . . may lead to potentially non-physical discontinuities . . . [H]owever, if the velocity decorrelation time is much shorter than the conformational decorrelation time . . . then the effects are likely to be minimal.|
So I'm guessing that my assumptions about the way proteins fold are wrong. Apparently a protein folds only one way, regardless of how it gets out of a local energy minimum; otherwise the algorithm wouldn't work.
I should also mention that one way I know my explanation is flawed is that, according to the report, the proteins don't fall into "local energy minima" but rather encounter "free energy barriers" and "free energy minima". (Free energy accounts for entropy as well as potential energy, which may be the distinction.) I don't really understand what Vijay et al. mean by this, and it might affect my analysis.
One interesting point made in the report, although not directly germane to this note, concerns the limit on the number of processors one can use to achieve a speed increase. The report concludes:
|The scalability will be inherently limited by the barrier crossing time t(cross), since we expect that for M > t(fold) / t(cross) additional processors will not give any additional speed up. Thus, the bounds of scalability for this method are also related to an interesting physical question: how much time is required to actually cross the free energy barrier. This time can be quantified by using our method to look for the limits of scalability within our technique.|
Based on the above, I surmise that these are the meanings of the terms:
Clone: The work being done is the same as for all the other clones with the same Run and Gen numbers, but the random number sequence is different.
Gen: (Generation) When one WU is finished, another is assigned to continue from where it left off. That one is one generation further along.
Run: When the protein is the same but the run numbers are different, this might indicate that the protein is being folded with different parameters than in other runs. Examples of different parameters might be a different temperature, a different force cutoff distance, etc.
This isn't the only possibility. It's also possible that Run really means what I have taken Gen to mean, and that each Gen is a fresh start after leaving an energy minimum, i.e. a new generation starts when all the clones are restarted after one has jumped an energy barrier.
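Under the first interpretation above, a work unit's identity could be sketched like this (the field names and this structure are my illustration of that reading, not anything documented in the report):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkUnit:
    """Hypothetical work-unit identity: clones share Run and Gen but get
    different random velocities; gen N continues from where gen N-1 of
    the same run left off."""
    run: int    # same protein, possibly different parameters (temperature, cutoffs, ...)
    clone: int  # same starting coordinates, different random number sequence
    gen: int    # generation: continues from the previous gen's final state

# two clones of the same run and gen do the same work with different randomness
a = WorkUnit(run=0, clone=3, gen=7)
b = WorkUnit(run=0, clone=5, gen=7)
print(a.run == b.run and a.gen == b.gen and a.clone != b.clone)  # True
```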
I am aware that promulgating false information is a cardinal sin in science, and I sincerely hope that I'm not doing it here. But I haven't seen any explanation for these things except in the report. These are my best ideas as to what's going on, and I hope that Pandegroup will quickly clear up any misconceptions I have.