We describe here an algorithm for distinguishing
sequentially from nonsequentially folding proteins. Several
experiments have recently suggested that most of the proteins
that are synthesized in the eukaryotic cell may fold sequentially.
This proposed in vivo folding mechanism is particularly
advantageous to the organism. In the absence of chaperones,
the probability that a sequentially folding protein will
misfold is reduced significantly. The problem we address
here is devising a procedure that would differentiate between
the two types of folding patterns. Footprints of sequential
folding may be found in structures where consecutive fragments
of the chain interact with each other. In such cases, the
folding complexity may be viewed as being lower. On the
other hand, higher folding complexity suggests that at
least a portion of the polypeptide backbone folds back
upon itself to form three-dimensional (3D) interactions
with noncontiguous portion(s) of the chain. Hence, we examine
the folding mechanism of the molecule by analyzing
its complexity, that is, the 3D interactions
formed by contiguous segments of the polypeptide chain.
To computationally dissect the structure into consecutively
interacting fragments, we cut it either into compact hydrophobic
folding units or into a set of hypothetical, transient,
highly populated, contiguous fragments (“building
blocks” of the structure). In sequential folding,
successive building blocks interact with each other from
the amino to the carboxy terminus of the polypeptide chain.
Consequently, the results of the parsing differentiate
between sequentially and nonsequentially folded chains.
The automated assessment of the folding complexity provides
insight into both the likelihood of misfolding and the
kinetic folding rate of the given protein. In terms of
the funnel free energy landscape theory, a protein that
truly follows the mechanism of sequential folding, in principle,
encounters smoother free energy barriers. A simple sequentially
folded protein should, therefore, be less error-prone and
fold faster than a protein with a complex folding pattern.
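
The parsing idea described above can be illustrated with a minimal sketch. It is not the procedure used in this work: it assumes the structure has already been cut into building blocks (given here as inclusive residue ranges) and that a list of residue pairs in 3D contact is available from some other source. The names `block_of`, `block_interactions`, and `classify_folding` are hypothetical, and the strict criterion (every successive pair of blocks interacts, and no other pair does) is an assumption chosen for illustration.

```python
def block_of(residue, blocks):
    """Return the index of the block containing `residue`, or None."""
    for k, (start, end) in enumerate(blocks):
        if start <= residue <= end:
            return k
    return None


def block_interactions(blocks, contacts):
    """Collapse residue-level contacts into a set of interacting block pairs."""
    pairs = set()
    for i, j in contacts:
        a, b = block_of(i, blocks), block_of(j, blocks)
        # Ignore residues outside any block and contacts within one block.
        if a is not None and b is not None and a != b:
            pairs.add((min(a, b), max(a, b)))
    return pairs


def classify_folding(blocks, contacts):
    """Label a chain 'sequential' or 'nonsequential'.

    Assumed criterion (illustrative only): the chain is sequential when
    the interacting block pairs are exactly the successive pairs
    (0,1), (1,2), ... from the amino to the carboxy terminus.
    """
    pairs = block_interactions(blocks, contacts)
    successive = {(k, k + 1) for k in range(len(blocks) - 1)}
    return "sequential" if pairs == successive else "nonsequential"


# Toy input: three blocks of ten residues each (hypothetical data).
blocks = [(0, 9), (10, 19), (20, 29)]

# Only successive blocks touch each other.
print(classify_folding(blocks, [(5, 12), (15, 25)]))

# The first block also contacts the last one: the chain folds back on itself.
print(classify_folding(blocks, [(5, 12), (15, 25), (3, 27)]))
```

In this toy input the first call reports a sequential pattern and the second a nonsequential one, because the added contact links the first and third blocks. A graded complexity score, for example the number of interacting non-successive block pairs, would be a natural refinement of the same bookkeeping.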