A statistical analysis of a representative data
set of 169 known protein structures was used to characterize
the specificity of residue interactions between spatially
neighboring strands in β-sheets. Pairwise potentials
were derived by comparing the observed frequencies of residue
pairs in nearest, second-nearest, and third-nearest contact across
neighboring β-strands with the frequencies expected for residue
pairs in a random model. A pseudo-energy function
based on these statistical pairwise potentials recognized
the native pairings of β-sheets among possible alternative pairings.
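The derivation of the pairwise potentials and their use in ranking alternative strand pairings can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes the common log-odds form E(a, b) = -ln(N_obs(a, b) / N_exp(a, b)), a random model in which both contact partners are drawn independently from background residue frequencies, a pseudocount of 1 to avoid log(0), and a hypothetical definition of second- and third-nearest contacts as partners shifted by one or two positions along the neighboring strand. The function names `derive_potentials`, `pairing_energy`, and `rank_pairings` are illustrative only.

```python
import math
from collections import Counter

# ASSUMPTION (not the paper's definition): class 0 is the residue directly
# across on the neighboring strand ("nearest contact"); classes 1 and 2 are
# partners shifted by one or two positions along that strand ("second" and
# "third nearest").
CONTACT_CLASSES = (0, 1, 2)


def pair_key(a, b):
    """Order-independent key for a residue pair, e.g. ('V', 'K') -> ('K', 'V')."""
    return (a, b) if a <= b else (b, a)


def derive_potentials(observed, background, pseudocount=1.0):
    """Log-odds potentials E(a, b) = -ln(N_obs / N_exp) for one contact class.

    observed:   mapping from pair_key pairs to observed contact counts.
    background: residue-type frequencies summing to 1; the random model
                draws both partners independently from this distribution.
    """
    total = sum(observed.values())
    residues = sorted(background)
    potentials = {}
    for i, a in enumerate(residues):
        for b in residues[i:]:
            # A mixed pair (a != b) can be drawn in either order.
            p_exp = background[a] * background[b] * (1 if a == b else 2)
            n_exp = total * p_exp
            n_obs = observed.get((a, b), 0)
            # Pseudocount (an assumption) keeps the log finite for unseen pairs.
            potentials[(a, b)] = -math.log((n_obs + pseudocount) / (n_exp + pseudocount))
    return potentials


def pairing_energy(strand_a, strand_b, offset, antiparallel, tables):
    """Pseudo-energy of one register of two strands (lower = more favorable).

    tables maps each contact class to a potential table. Residue i of strand_a
    faces residue i + offset of strand_b, which is reversed first when the
    strands run antiparallel.
    """
    partner = strand_b[::-1] if antiparallel else strand_b
    energy = 0.0
    for i, res in enumerate(strand_a):
        for cls in CONTACT_CLASSES:
            shifts = (0,) if cls == 0 else (cls, -cls)
            for s in shifts:
                j = i + offset + s
                if 0 <= j < len(partner):
                    # Pairs missing from the table contribute zero.
                    energy += tables[cls].get(pair_key(res, partner[j]), 0.0)
    return energy


def rank_pairings(strand_a, strand_b, tables):
    """Score every orientation and register with at least one direct contact."""
    results = []
    for antiparallel in (False, True):
        for offset in range(-(len(strand_a) - 1), len(strand_b)):
            energy = pairing_energy(strand_a, strand_b, offset, antiparallel, tables)
            results.append((energy, offset, antiparallel))
    return sorted(results)  # lowest pseudo-energy first


if __name__ == "__main__":
    # Invented toy counts for illustration; not data from the study.
    background = {"K": 0.5, "V": 0.5}
    counts = Counter({("V", "V"): 60, ("K", "K"): 10, ("K", "V"): 30})
    table = derive_potentials(counts, background)
    tables = {cls: table for cls in CONTACT_CLASSES}  # one table reused for brevity
    for energy, offset, anti in rank_pairings("VKV", "VVK", tables)[:3]:
        print(f"E={energy:.2f} offset={offset} antiparallel={anti}")
```

In practice one table would be derived per contact class from its own counts; summing the three classes with equal weight, as here, is a simplifying choice, and a real scheme might weight them differently.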
The native pairing ranked among the three lowest-energy pairings
in 73% of the cases in the training set and in 63%
of the β-sheets in a test set of 67 proteins that
were not part of the training set. The energy function
was also used to detect tripeptides, which occur frequently
in the β-sheets of native proteins. Most native partners
of these tripeptides fell within a low-energy
range. Self-correcting distance geometry (SECODG) calculations
using distance constraint sets derived from possible low-energy
pairings of β-strands uniquely identified the
native pairing of the β-sheet in bovine pancreatic trypsin
inhibitor (BPTI). These results will be useful for predicting
the structure of proteins from their amino acid sequence
as well as for the design of proteins containing β-sheets.
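The success criterion reported in the abstract (the native pairing lying within the three lowest energies) can be computed per data set as in the following sketch. It assumes each case supplies the native pairing's pseudo-energy together with the energies of its alternative pairings, and it counts ties against the native pairing; the tie rule and the function names `native_rank` and `top_k_success` are illustrative assumptions, not details taken from the study.

```python
def native_rank(native_energy, alternative_energies):
    """1-based rank of the native pairing among all candidate pairings.

    Ties are counted against the native pairing (a conservative assumption).
    """
    return 1 + sum(e <= native_energy for e in alternative_energies)


def top_k_success(cases, k=3):
    """Fraction of cases whose native pairing ranks within the k lowest energies.

    cases: list of (native_energy, alternative_energies) tuples, one per
    beta-sheet or protein being evaluated.
    """
    if not cases:
        raise ValueError("no cases to evaluate")
    hits = sum(native_rank(native, alts) <= k for native, alts in cases)
    return hits / len(cases)


if __name__ == "__main__":
    # Invented energies for illustration only; not results from the study.
    cases = [(-5.0, [-1.0, 0.0]), (-2.0, [-4.0, -3.0, -2.5, 1.0]), (-3.0, [-3.5, -1.0])]
    print(top_k_success(cases))
```

Applied separately to the training and test cases, this fraction corresponds to the kind of figure quoted for each set.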