Title: Gibbs sampling and helix-cap motifs.
Publication Type: Journal Article
Year of Publication: 2005
Authors: Kruus, E, Thumfort, P, Tang, C, Wingreen, NS
Journal: Nucleic Acids Res
Date Published: 2005
Keywords: Algorithms; Amino Acid Motifs; Databases, Protein; Models, Molecular; Protein Structure, Secondary; Sequence Analysis, Protein

Protein backbones have characteristic secondary structures, including alpha-helices and beta-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of alpha-helix caps, we test whether the information content of the sequence-structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of +/-1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
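The two quantities the abstract relies on, the total information content of a motif and the choice of a frameshift of +/-1 residue, can be illustrated with a minimal sketch. This is not the authors' implementation. All function names, the pseudocount, and the uniform background are assumptions made for illustration, and the frameshift step below simply picks the highest-scoring shift, a greedy stand-in for the sampling step that Gibbs sampling would perform.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_frequencies(windows, pseudocount=1.0):
    """Per-position amino acid frequencies for equal-length sequence windows.

    The pseudocount is an assumption; it keeps every frequency non-zero.
    """
    width = len(windows[0])
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    freqs = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        freqs.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return freqs


def information_content(freqs, background):
    """Total relative entropy (in bits) of the motif against the background."""
    return sum(
        p * math.log2(p / background[a])
        for column in freqs
        for a, p in column.items()
    )


def log_odds(window, freqs, background):
    """Log-odds score (in bits) of one window under the motif vs. background."""
    return sum(
        math.log2(freqs[i][a] / background[a]) for i, a in enumerate(window)
    )


def best_frameshift(sequence, cap_index, width, freqs, background, max_shift=1):
    """Return (shift, score) for the best-scoring shift in [-max_shift, max_shift].

    Greedy choice for illustration only; a Gibbs sampler would draw the shift
    with probability proportional to the motif likelihood instead.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        start = cap_index + shift
        if start < 0 or start + width > len(sequence):
            continue  # shifted window would fall off the sequence
        score = log_odds(sequence[start:start + width], freqs, background)
        if best is None or score > best[1]:
            best = (shift, score)
    return best


# Toy usage with made-up windows (not data from the paper).
background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
motif = position_frequencies(["SPEE", "SPEE", "SPEE"])
total_bits = information_content(motif, background)
shift, score = best_frameshift("AAASPEEAAA", 2, 4, motif, background)
```

In this toy case the annotated cap position is one residue off, and allowing a shift lets the window realign to the motif. A full treatment would re-estimate the motif from the shifted windows and iterate, in competition with a null motif, as the abstract describes.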

Alternate Journal: Nucleic Acids Res.