Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. firas@u.washington.edu
MOTIVATION: Our focus has been on detecting topological properties that are rare in real proteins, but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, successfully decreasing the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared to be equally un-protein-like and yet do not contain a mathematical knot. These topological features are commonly referred to as slip-knots and are caused by the same mechanisms that result in knotted models. Slip-knots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them, and analyzed CASP6 models built using the Rosetta loop modeling method. RESULTS: After analyzing known protein structures in the PDB, we found that slip-knots do occur in certain proteins, but are rare and fall into a small number of specific classes. Our group used this new Pokefind algorithm to distinguish between these rare real slip-knots and the numerous classes of slip-knots that we discovered in Rosetta models and models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms are able to detect un-protein-like features that current metrics such as GDT are unable to identify, so these topological filters can also be used as additional assessment tools.
Latest citations:
School of Informatics, Indiana University Purdue University Indianapolis, and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, 719 Indiana Avenue, Walker Plaza Building Suite 319, Indianapolis, IN 46202, USA.
Worldwide structural genomics projects are increasing structure coverage of sequence space but have not significantly expanded the protein structure space itself (i.e., number of unique structural folds) since 2007. Discovering new structural folds experimentally by directed evolution and random recombination of secondary-structure blocks has also proved rarely successful. Meanwhile, previous computational efforts for large-scale mapping of protein structure space have been limited to simple model proteins and have led to an inconclusive answer on the completeness of the existing observed protein structure space. Here, we build novel protein structures by extending naturally occurring circular (single-loop) permutation to multiple loop permutations (MLPs). These structures are clustered by a structural similarity measure called TM-score. The computational technique allows us to produce different structural clusters on the same naturally occurring, packed, stable core but with alternatively connected secondary-structure segments. A large-scale MLP of 2936 domains from structural classification of protein domains reproduces those existing structural clusters (63%) mostly as hubs for many nonredundant sequences and illustrates newly discovered novel clusters as islands adopted by a few sequences only. Results further show that there exist a significant number of novel potentially stable clusters for medium-size or large-size single-domain proteins, in particular those with >100 amino acid residues, that are either not yet adopted by nature or adopted only by a few sequences. This study suggests that MLP provides a simple yet highly effective tool for engineering and design of novel protein structures (including naturally knotted proteins). The implication of recovering new-fold targets from critical assessment of structure prediction techniques (CASP) by MLP on template-based structure prediction is also discussed.
Our MLP structures are available for download at the publication page of the Web site http://sparks.informatics.iupui.edu.
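The TM-score mentioned above can be illustrated with a minimal Python sketch. It assumes the commonly cited definition, in which each aligned residue pair contributes 1/(1 + (d/d0)^2) and d0 = 1.24 (L - 15)^(1/3) - 1.8 for a target of length L. The function names are hypothetical, and the search over superpositions that a real TM-score program performs is deliberately left out: the sketch only scores one fixed set of residue-pair distances.

```python
from typing import Sequence


def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (in Angstroms) for the TM-score."""
    if target_length < 19:
        # Below this length the formula gives a non-positive d0; real tools
        # handle short chains specially, which this sketch does not attempt.
        raise ValueError("sketch only supports target_length >= 19")
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score_for_alignment(distances: Sequence[float], target_length: int) -> float:
    """Score one fixed superposition.

    distances: distance (Angstroms) between each aligned residue pair.
    target_length: number of residues in the target, used for normalisation.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # A perfect superposition of all 100 residues scores 1.0.
    print(tm_score_for_alignment([0.0] * 100, 100))
```

Because the sum is normalised by the target length rather than by the number of aligned pairs, unaligned residues lower the score, which is what makes the measure usable for clustering structures of similar size.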
Other papers by authors:
Department of Biomolecular Engineering, University of California at Santa Cruz, Santa Cruz, CA 95064.
MOTIVATION: Knots in polypeptide chains have been found in very few proteins, and consequently should be generally avoided in protein structure prediction methods. Most effective structure prediction methods do not model the protein folding process itself, but rather seek only to correctly obtain the final native state. Consequently, the mechanisms that prevent knots from occurring in native proteins are not relevant to the modeling process, and as a result, knots can occur with significantly higher frequency in protein models. Here we describe Knotfind, a simple algorithm for knot detection that is fast enough for structure prediction, where tens or hundreds of thousands of conformations may be sampled during the course of a prediction. We have used this algorithm to characterize knots in large populations of model structures generated for targets in CASP5 and CASP6 using the Rosetta homology-based modeling method. RESULTS: Analysis of CASP5 models suggested several possible avenues for introduction of knots into these models, and these insights were applied to structure prediction in CASP6, resulting in a significant decrease in the proportion of knotted models generated. Additionally, using the knot detection algorithm on structures in the Protein Data Bank, a previously unreported deep trefoil knot was found in acetylornithine transcarbamylase. AVAILABILITY: The Knotfind algorithm is available in the Rosetta structure prediction program at http://www.rosettacommons.org. CONTACT: bort@soe.ucsc.edu.
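The abstract does not spell out how Knotfind works, so the following Python sketch should be read only as an illustration of one common family of fast knot detectors: triangle-elimination chain smoothing. It assumes the chain is given as a list of Cα coordinates, and the names `segment_hits_triangle` and `reduce_chain` are hypothetical. A middle point is dropped whenever no other chain segment passes through the triangle it forms with its two neighbours; if only the two termini remain, the chain is taken to be unknotted. This is not the Rosetta implementation, and a reduction that stalls with more than two points only suggests, rather than proves, a knot.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def segment_hits_triangle(p: Point, q: Point, a: Point, b: Point, c: Point,
                          eps: float = 1e-9) -> bool:
    """Moller-Trumbore style test: does segment p-q pass through triangle a-b-c?"""
    d = _sub(q, p)
    e1, e2 = _sub(b, a), _sub(c, a)
    h = _cross(d, e2)
    det = _dot(e1, h)
    if abs(det) < eps:  # segment parallel to the triangle plane: treated as a miss
        return False
    s = _sub(p, a)
    u = _dot(s, h) / det
    if u < 0.0 or u > 1.0:
        return False
    qv = _cross(s, e1)
    v = _dot(d, qv) / det
    if v < 0.0 or u + v > 1.0:
        return False
    t = _dot(e2, qv) / det  # position of the hit along the segment
    return 0.0 <= t <= 1.0


def reduce_chain(points: Sequence[Point]) -> List[Point]:
    """Repeatedly drop middle points whose triangle no other segment crosses."""
    chain = list(points)
    changed = True
    while changed and len(chain) > 2:
        changed = False
        i = 1
        while i < len(chain) - 1:
            a, b, c = chain[i - 1], chain[i], chain[i + 1]
            blocked = False
            for j in range(len(chain) - 1):
                if i - 2 <= j <= i + 1:  # segments sharing a triangle vertex
                    continue
                if segment_hits_triangle(chain[j], chain[j + 1], a, b, c):
                    blocked = True
                    break
            if blocked:
                i += 1
            else:
                del chain[i]  # safe to straighten; re-test the same index
                changed = True
    return chain


# Usage: a helix that rises steadily along z is unknotted and collapses
# to its two end points.
helix = [(math.cos(0.5 * k), math.sin(0.5 * k), 0.1 * k) for k in range(40)]
print(len(reduce_chain(helix)))  # 2
```

Each pass is quadratic in the number of remaining points, but the chain shrinks quickly, which fits the abstract's requirement that the check be cheap enough to run on very large numbers of sampled conformations.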
Genome Res. 2012 Mar 27
22454233
Adnan Derti,
Philip Garrett-Engele,
Kenzie D Macisaac,
Richard C Stevens,
Shreedharan Sriram,
Ronghua Chen,
Carol A Rohl,
Jason M Johnson,
Tomas Babak
Merck.
We developed PolyA-Seq, a strand-specific and quantitative method for high-throughput sequencing of 3' ends of polyadenylated transcripts, and used it to globally map polyA sites in 24 matched tissues in human, rhesus, dog, mouse and rat. We show that PolyA-Seq is as accurate as existing RNA sequencing (RNA-Seq) approaches for digital gene expression (DGE), enabling simultaneous mapping of polyadenylation (polyA) sites and quantitative measurement of their usage. In human, we confirmed 158,533 known sites and discovered 280,857 novel sites (FDR<2.5%). On average 10% of novel human sites were also detected in matched tissues in other species. Most novel sites represent uncharacterized alternative polyA events and extensions of known transcripts in human and mouse, but primarily delineate novel transcripts in the other three species. 69.1% of known human genes that we detected have multiple polyA sites in their 3'UTRs, with 49.3% having three or more. We also detected polyadenylation of noncoding and antisense transcripts, including constitutive and tissue-specific primary microRNAs. The canonical polyA signal was strongly enriched and positionally conserved in all species. In general, usage of polyA sites is more similar within the same tissues across different species than within a species. These quantitative maps of polyA usage in evolutionarily and functionally related samples constitute a resource for understanding the regulatory mechanisms underlying alternative polyadenylation.
Firas Khatib,
Frank Dimaio,
Seth Cooper,
Maciej Kazmierczyk,
Miroslaw Gilski,
Szymon Krzywda,
Helena Zabranska,
Iva Pichova,
James Thompson,
Zoran Popović,
Mariusz Jaskolski,
David Baker
Nat Biotechnol. 2012;30(4):344-8
22334048
Nanopore Group, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA.
An emerging DNA sequencing technique uses protein or solid-state pores to analyze individual strands as they are driven in single-file order past a nanoscale sensor. However, uncontrolled electrophoresis of DNA through these nanopores is too fast for accurate base reads. Here, we describe forward and reverse ratcheting of DNA templates through the α-hemolysin nanopore controlled by phi29 DNA polymerase without the need for active voltage control. DNA strands were ratcheted through the pore at median rates of 2.5-40 nucleotides per second and were examined at one nucleotide spatial precision in real time. Up to 500 molecules were processed at ∼130 molecules per hour through one pore. The probability of a registry error (an insertion or deletion) at individual positions during one pass along the template strand ranged from 10% to 24.5% without optimization. This strategy facilitates multiple reads of individual strands and is transferable to other nanopore devices for implementation of DNA sequence analysis.
Christopher B Eiben,
Justin B Siegel,
Jacob B Bale,
Seth Cooper,
Firas Khatib,
Betty W Shen,
Foldit Players,
Barry L Stoddard,
Zoran Popovic,
David Baker
Department of Biochemistry, University of Washington, Seattle, Washington, USA.
Computational enzyme design holds promise for the production of renewable fuels, drugs and chemicals. De novo enzyme design has generated catalysts for several reactions, but with lower catalytic efficiencies than naturally occurring enzymes. Here we report the use of game-driven crowdsourcing to enhance the activity of a computationally designed enzyme through the functional remodeling of its structure. Players of the online game Foldit were challenged to remodel the backbone of a computationally designed bimolecular Diels-Alderase to enable additional interactions with substrates. Several iterations of design and characterization generated a 24-residue helix-turn-helix motif, including a 13-residue insertion, that increased enzyme activity >18-fold. X-ray crystallography showed that the large insertion adopts a helix-turn-helix structure positioned as in the Foldit model. These results demonstrate that human creativity can extend beyond the macroscopic challenges encountered in everyday life to molecular-scale design problems.
Miroslaw Gilski,
Maciej Kazmierczyk,
Szymon Krzywda,
Helena Zábranská,
Seth Cooper,
Zoran Popović,
Firas Khatib,
Frank DiMaio,
James Thompson,
David Baker,
Iva Pichová,
Mariusz Jaskolski
Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, 60-780 Poznan, Poland.
Mason-Pfizer monkey virus (M-PMV), a D-type retrovirus assembling in the cytoplasm, causes simian acquired immunodeficiency syndrome (SAIDS) in rhesus monkeys. Its pepsin-like aspartic protease (retropepsin) is an integral part of the expressed retroviral polyproteins. As in all retroviral life cycles, release and dimerization of the protease (PR) is strictly required for polyprotein processing and virion maturation. Biophysical and NMR studies have indicated that in the absence of substrates or inhibitors M-PMV PR should fold into a stable monomer, but the crystal structure of this protein could not be solved by molecular replacement despite countless attempts. Ultimately, a solution was obtained in mr-rosetta using a model constructed by players of the online protein-folding game Foldit. The structure indeed shows a monomeric protein, with the N- and C-termini completely disordered. On the other hand, the flap loop, which normally gates access to the active site of homodimeric retropepsins, is clearly traceable in the electron density. The flap has an unusual curled shape and a different orientation from both the open and closed states known from dimeric retropepsins. The overall fold of the protein follows the retropepsin canon, but the C(α) deviations are large and the active-site 'DTG' loop (here NTG) deviates up to 2.7 Å from the standard conformation. This structure of a monomeric retropepsin determined at high resolution (1.6 Å) provides important extra information for the design of dimerization inhibitors that might be developed as drugs for the treatment of retroviral infections, including AIDS.
Firas Khatib,
Seth Cooper,
Michael D Tyka,
Kefan Xu,
Ilya Makedon,
Zoran Popovic,
David Baker,
Foldit Players
Department of Biochemistry, University of Washington, Box 357370, Seattle, WA 98195, USA.
Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as "recipes" and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
Mol Syst Biol. 2011;7:539
21988835
Cit:1
Fabian Sievers,
Andreas Wilm,
David Dineen,
Toby J Gibson,
Kevin Karplus,
Weizhong Li,
Rodrigo Lopez,
Hamish McWilliam,
Michael Remmert,
Johannes Söding,
Julie D Thompson,
Desmond G Higgins
School of Medicine and Medical Science, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland.
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.
Firas Khatib,
Frank Dimaio,
Seth Cooper,
Maciej Kazmierczyk,
Miroslaw Gilski,
Szymon Krzywda,
Helena Zabranska,
Iva Pichova,
James Thompson,
Zoran Popović,
Mariusz Jaskolski,
David Baker
Department of Biochemistry, University of Washington, Seattle, Washington, USA.
Following the failure of a wide range of attempts to solve the crystal structure of M-PMV retroviral protease by molecular replacement, we challenged players of the protein folding game Foldit to produce accurate models of the protein. Remarkably, Foldit players were able to generate models of sufficient quality for successful molecular replacement and subsequent structure determination. The refined structure provides new insights for the design of antiretroviral drugs.
Department of Biomolecular Engineering, University of California at Santa Cruz, Santa Cruz, CA 95064, USA.
We report the identification and characterization of a previously unidentified protein domain found in bacterial chemoreceptors and other bacterial signal transduction proteins. This domain contains a motif of three noncontiguous histidines and one cysteine, arranged as Hxx[WFYL]x(21-28)Cx[LFMVI]Gx[WFLVI]x(18-27)HxxxH (boldface type indicates residues that are nearly 100% conserved). This domain was first identified in the soluble Helicobacter pylori chemoreceptor TlpD. Using inductively coupled plasma mass spectrometry on heterologously and natively expressed TlpD, we determined that this domain binds zinc with a subfemtomolar dissociation constant. We thus named the domain CZB, for chemoreceptor zinc binding. Further analysis showed that many bacterial signaling proteins contain the CZB domain, most commonly proteins that participate in chemotaxis but also those that participate in c-di-GMP signaling and nitrate/nitrite sensing, among others. Proteins bearing the CZB domain are found in several bacterial phyla. The variety of signaling proteins using the CZB domain suggests that it plays a critical role in several signal transduction pathways.
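The motif quoted above can be searched for with an ordinary regular expression. The Python sketch below assumes the usual reading of the notation (`x` is any residue, `x(m-n)` is a spacer of m to n residues, brackets list allowed residues). The helper `motif_to_regex` and the test sequences are hypothetical, and a plain pattern match is of course much cruder than the profile-based analysis a real domain survey would use.

```python
import re

# Motif exactly as written in the abstract.
CZB_MOTIF = "Hxx[WFYL]x(21-28)Cx[LFMVI]Gx[WFLVI]x(18-27)HxxxH"


def motif_to_regex(motif: str) -> str:
    """Translate the spacer notation into regular-expression syntax."""
    # x(21-28) -> .{21,28}
    pattern = re.sub(r"x\((\d+)-(\d+)\)", r".{\1,\2}", motif)
    # any remaining lowercase x -> single wildcard
    return pattern.replace("x", ".")


CZB_REGEX = re.compile(motif_to_regex(CZB_MOTIF))


def find_czb(sequence: str):
    """Return (start, end) of the first motif match, or None."""
    match = CZB_REGEX.search(sequence.upper())
    return match.span() if match else None


if __name__ == "__main__":
    # Made-up sequence built to satisfy the motif; not a real protein.
    toy = "MK" + "HAAW" + "A" * 21 + "CALGAW" + "A" * 18 + "HAAAH" + "KK"
    print(motif_to_regex(CZB_MOTIF))
    print(find_czb(toy))
```

Upper-casing the sequence before matching keeps the lowercase `x` of the notation from ever colliding with residue letters.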
Latest similar papers:
Zhang Initiative Research Unit, Advanced Science Institute, RIKEN, Wako, Saitama, Japan.
MOTIVATION: Clustering is commonly used to identify the best decoy among many generated in protein structure prediction when using energy alone is insufficient. Calculation of the pairwise distance matrix for a large decoy set is computationally expensive. Typically, only a reduced set of decoys using energy filtering is subjected to clustering analysis. A fast clustering method for a large decoy set would be beneficial to protein structure prediction and this still poses a challenge. RESULTS: We propose a method using propagation of geometric constraints to accelerate exact clustering, without compromising the distance measure. Our method can be used with any metric distance. Metrics that are expensive to compute and have known cheap lower and upper bounds will benefit most from the method. We compared our method's accuracy against published results from the SPICKER clustering software on 40 large decoy sets from the I-TASSER protein folding engine. We also performed some additional speed comparisons on six targets from the 'semfold' decoy set. In our tests, our method chose a better decoy than the energy criterion in 25 out of 40 cases versus 20 for SPICKER. Our method also was shown to be consistently faster than another fast software performing exact clustering named Calibur. In some cases, our approach can even outperform the speed of an approximate method. AVAILABILITY: Our C++ software is released under the GNU General Public License. It can be downloaded from http://www.riken.jp/zhangiru/software/durandal_released.tgz.
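The abstract gives no detail on how the geometric constraints are propagated, so the Python sketch below shows only the simplest version of the underlying idea, under stated assumptions: for any metric, the triangle inequality bounds an unknown distance d(i, j) between |d(p, i) - d(p, j)| and d(p, i) + d(p, j) for a reference item p. Whenever the cutoff falls outside that interval, the expensive exact distance never needs computing, yet the answer stays exact. The function name, the single-pivot scheme and the toy 2D points are all hypothetical and are not taken from the released software.

```python
import math
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def neighbor_counts(items: Sequence[T],
                    dist: Callable[[T, T], float],
                    cutoff: float,
                    pivot: int = 0) -> Tuple[List[int], int]:
    """Count, for every item, how many others lie within `cutoff`.

    Returns (counts, exact_calls), where exact_calls is the number of pairwise
    distances that actually had to be computed after bound-based pruning.
    """
    n = len(items)
    to_pivot = [dist(items[pivot], x) for x in items]  # n cheap-to-store distances
    counts = [0] * n
    exact_calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            lower = abs(to_pivot[i] - to_pivot[j])  # triangle-inequality lower bound
            upper = to_pivot[i] + to_pivot[j]       # triangle-inequality upper bound
            if lower > cutoff:
                close = False                        # certainly too far apart
            elif upper <= cutoff:
                close = True                         # certainly within the cutoff
            else:
                exact_calls += 1                     # bounds undecided: compute it
                close = dist(items[i], items[j]) <= cutoff
            if close:
                counts[i] += 1
                counts[j] += 1
    return counts, exact_calls


if __name__ == "__main__":
    # Toy stand-in for decoys: 2D points with Euclidean distance.
    points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    counts, exact_calls = neighbor_counts(points, math.dist, cutoff=1.0)
    print(counts, exact_calls)
```

In a decoy-clustering setting `dist` would be something costly such as RMSD after superposition, and the item with the most neighbours would be taken as the cluster centre; the more expensive the metric, the more each skipped call saves.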
Division of Mathematical Biology, National Institute for Medical Research, London, United Kingdom. wtaylor@nimr.mrc.ac.uk
It is well known that the set of observed topological arrangements of secondary structures in globular proteins is highly limited. These limitations have been explained as the consequence of several rules of thumb including a strong preference for right-handed connections, against crossing loops and certain beta strand patterns. We present a critical evaluation of the power of these rules to distinguish known from possible topologies in a large set of two- and three-layer protein structures and determine that although these rules are still largely valid, an increasing number of exceptions can be found to many of them. The rules are then used to construct a generalised linear model for assessing the probability of occurrence of an arbitrary topology in the PDB. Application of the model to a large set of topologies generated during structure prediction showed that many had a similar probability of occurrence to known PDB folds.
Mississippi State University, Starkville.
This paper presents a new streamline placement algorithm that produces evenly spaced long streamlines while preserving topological features of a flow field. Singularities and separatrices are extracted to decompose the flow field into topological regions. In each region, a seeding path is selected from a set of streamlines integrated in the orthogonal flow field. The uniform sample points on this path are then used as seeds to generate streamlines in the original flow field. Additional seeds are placed where a large gap between adjacent streamlines occurs. The number of short streamlines is significantly reduced as evenly spaced long streamlines spawned along the seeding paths can fill the topological regions very well. Several metrics for evaluating streamline placement quality are discussed and applied to our method as well as some other approaches. Compared to previous work in uniform streamline placement, our method is more effective in creating evenly spaced long streamlines and preserving topological features. It has the potential to provide both intuitive perception of important flow characteristics and detail reconstruction across visually pleasing streamlines.
Department of Statistics, University of California Berkeley, Berkeley, California, United States of America.
Distributions of the backbone dihedral angles of proteins have been studied for over 40 years. While many statistical analyses have been presented, only a handful of probability densities are publicly available for use in structure validation and structure prediction methods. The available distributions differ in a number of important ways, which determine their usefulness for various purposes. These include: 1) input data size and criteria for structure inclusion (resolution, R-factor, etc.); 2) filtering of suspect conformations and outliers using B-factors or other features; 3) secondary structure of input data (e.g., whether helix and sheet are included; whether beta turns are included); 4) the method used for determining probability densities ranging from simple histograms to modern nonparametric density estimation; and 5) whether they include nearest neighbor effects on the distribution of conformations in different regions of the Ramachandran map. In this work, Ramachandran probability distributions are presented for residues in protein loops from a high-resolution data set with filtering based on calculated electron densities. Distributions for all 20 amino acids (with cis and trans proline treated separately) have been determined, as well as 420 left-neighbor and 420 right-neighbor dependent distributions. The neighbor-independent and neighbor-dependent probability densities have been accurately estimated using Bayesian nonparametric statistical analysis based on the Dirichlet process. In particular, we used hierarchical Dirichlet process priors, which allow sharing of information between densities for a particular residue type and different neighbor residue types. The resulting distributions are tested in a loop modeling benchmark with the program Rosetta, and are shown to improve protein loop conformation prediction significantly. The distributions are available at http://dunbrack.fccc.edu/hdp.
Nat Protoc. 2010 Apr;5(4):725-38
20360767
Cit:55
[1] Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA. [2] Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, Kansas, USA.
The iterative threading assembly refinement (I-TASSER) server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional (3D) atomic models from multiple threading alignments and iterative structural assembly simulations. The function of the protein is then inferred by structurally matching the 3D models with other known proteins. The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of accuracy of the predictions is provided based on the confidence score of the modeling. This protocol provides new insights and guidelines for designing online server systems for state-of-the-art protein structure and function predictions. The server is available at http://zhanglab.ccmb.med.umich.edu/I-TASSER.
ABSTRACT: BACKGROUND: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised method the advanced TOPS+ comparison method, i.e. advTOPS+. RESULTS: We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first class objects, and describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. CONCLUSIONS: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] compared to our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta; a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.
ABSTRACT: BACKGROUND: Knowledge-based potentials have been widely used in the last 20 years for fold recognition, protein structure prediction from amino acid sequence, ligand binding, protein design, and many other purposes. However, these are generally not readily accessible online. RESULTS: Our new knowledge-based potential server makes available many of these potentials for easy use to automatically compute the energies of protein structures or models supplied. Our web server for protein energy estimation uses four-body potentials, short-range potentials, and 23 different two-body potentials. Users can select potentials according to their needs and preferences. Files containing the coordinates of protein atoms in the PDB format can be uploaded as input. The results will be returned to the user's email address. CONCLUSION: Our Potentials 'R'Us server is an easily accessible, freely available tool with a web interface that collects all existing and future protein coarse-grained potentials and computes energies of multiple structural models.
Katarzyna Prymula,
Monika Piwowar,
Marek Kochanczyk,
Lukasz Flis,
Maciej Malawski,
Tomasz Szepieniec,
Giovanni Evangelista,
Giuseppe Minervini,
Fabio Polticelli,
Zdzisław Wiśniowski,
Kinga Sałapa,
Ewa Matczyńska,
Irena Roterman
Department of Bioinformatics and Telemedicine, Collegium Medicum, Jagiellonian University, Lazarza 16, PL-31-530 Krakow, Poland.
The three-dimensional structures of a set of 'never born proteins' (NBP, random amino acid sequence proteins with no significant homology with known proteins) were predicted using two methods: Rosetta and one based on the 'fuzzy-oil-drop' (FOD) model. More than 3000 different random amino acid sequences were generated, filtered against the non-redundant protein sequence database to remove sequences with significant homology with known proteins, and subjected to three-dimensional structure prediction. Comparison between Rosetta and FOD predictions allowed selection of the ten top (highest structural similarity) and the ten bottom (lowest structural similarity) structures from the ranking list organized according to the RMS-D value. The selected structures were taken for detailed analysis to define the scale of structural accordance and discrepancy between the two methods. The structural similarity measurements revealed discrepancies between structures generated on the basis of the two methods. Their potential biological function appeared to be quite different as well. The ten bottom structures appeared to be 'unfoldable' for the FOD model. Some aspects of the general characteristics of the NBPs are also discussed. The calculations were performed on the EUChinaGRID grid platform to test the performance of this infrastructure for massive protein structure predictions.
Department of Computer Science, Rice University, Houston, TX 77005, USA. mmoll@cs.rice.edu
There is an increasing number of proteins with known structure but unknown function. Determining their function would have a significant impact on understanding diseases and designing new therapeutics. However, experimental protein function determination is expensive and very time-consuming. Computational methods can facilitate function determination by identifying proteins that have high structural and chemical similarity. Our focus is on methods that determine binding site similarity. Although several such methods exist, it still remains a challenging problem to quickly find all functionally-related matches for structural motifs in large data sets with high specificity. In this context, a structural motif is a set of 3D points annotated with physicochemical information that characterize a molecular function. We propose a new method called LabelHash that creates hash tables of n-tuples of residues for a set of targets. Using these hash tables, we can quickly look up partial matches to a motif and expand those matches to complete matches. We show that by applying only very mild geometric constraints we can find statistically significant matches with extremely high specificity in very large data sets and for very general structural motifs. We demonstrate that our method requires a reasonable amount of storage when employing a simple geometric filter and further improves on the specificity of our previous work while maintaining very high sensitivity. Our algorithm is evaluated on 20 homolog classes and a non-redundant version of the Protein Data Bank as our background data set. We use cluster analysis to analyze why certain classes of homologs are more difficult to classify than others. The LabelHash algorithm is implemented on a web server at http://kavrakilab.org/labelhash/.
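The idea of hashing n-tuples of residues so that partial motif matches can be looked up quickly can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: residues are reduced to a one-letter label plus a single 3D point, the hash key is the sorted tuple of labels, and the "mild geometric constraint" is taken to be a maximum pairwise distance. The abstract does not specify the actual key or filter used by LabelHash, and the names `build_tuple_table` and `lookup` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Residue = Tuple[str, Tuple[float, float, float]]  # (label, representative point)
Key = Tuple[str, ...]


def build_tuple_table(residues: Sequence[Residue], n: int,
                      max_dist: float) -> Dict[Key, List[Tuple[int, ...]]]:
    """Index every n-tuple of residues whose points are all within max_dist."""
    table: Dict[Key, List[Tuple[int, ...]]] = defaultdict(list)
    for combo in combinations(range(len(residues)), n):
        points = [residues[i][1] for i in combo]
        # Simple geometric filter: every pair in the tuple must be close.
        if all(math.dist(p, q) <= max_dist for p, q in combinations(points, 2)):
            key = tuple(sorted(residues[i][0] for i in combo))
            table[key].append(combo)
    return table


def lookup(table: Dict[Key, List[Tuple[int, ...]]],
           labels: Sequence[str]) -> List[Tuple[int, ...]]:
    """Return residue-index tuples whose labels match the query, in any order."""
    return table.get(tuple(sorted(labels)), [])


if __name__ == "__main__":
    # Made-up target: four nearby residues and one distant histidine.
    target = [("H", (0.0, 0.0, 0.0)), ("D", (1.0, 0.0, 0.0)), ("S", (0.0, 1.0, 0.0)),
              ("H", (20.0, 0.0, 0.0)), ("G", (1.0, 1.0, 0.0))]
    table = build_tuple_table(target, n=3, max_dist=2.0)
    print(lookup(table, ["S", "H", "D"]))
```

The table is built once per target, so each later motif query costs a single dictionary lookup; the candidates it returns would then be expanded to full matches and scored, steps this sketch omits.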
The Burnham Institute, La Jolla, CA, USA.
The observation that similar protein sequences fold into similar three-dimensional structures provides a basis for the methods which predict structural features of a novel protein based on the similarity between its sequence and sequences of known protein structures. Similarity over entire sequence or large sequence fragment(s) enables prediction and modeling of entire structural domains while statistics derived from distributions of local features of known protein structures make it possible to predict such features in proteins with unknown structures. The accuracy of models of protein structures is sufficient for many practical purposes such as analysis of point mutation effects, enzymatic reactions, interaction interfaces of protein complexes, and active sites. Protein models are also used for phasing of crystallographic data and, in some cases, for drug design. By using models one can avoid the costly and time-consuming process of experimental structure determination. The purpose of this chapter is to give a practical review of the most popular protein structure prediction methods based on sequence similarity and to outline a practical approach to protein structure prediction. While the main focus of this chapter is on template-based protein structure prediction, it also provides references to other methods and programs which play an important role in protein structure prediction.