BioInfoBank Library


 
The process of experimental determination of protein structure is marred by a high ratio of failures at many stages. With the availability of large quantities of data from high-throughput structure determination in structural genomics centers, we can now learn to recognize protein features correlated with failures; thus, we can recognize proteins more likely to succeed and eventually learn how to modify those that are less likely to succeed. Here, we identify several protein features that correlate strongly with successful protein production and crystallization and combine them into a single score that assesses "crystallization feasibility." The formula derived here was tested with a jackknife procedure and validated on independent benchmark sets. The "crystallization feasibility" score described here is being applied to target selection in the Joint Center for Structural Genomics, and is now contributing to increasing the success rate, lowering the costs, and shortening the time for protein structure determination. Analyses of PDB depositions suggest that very similar features also play a role in non-high-throughput structure determination, suggesting that this crystallization feasibility score would also be of significant interest to structural biology, as well as to molecular and biochemistry laboratories.

Latest citations:

Midwest Center for Structural Genomics, Biosciences Division, Argonne National Laboratory, 9700 S Cass Ave., Argonne, IL, 60439, USA, gbabnigg@anl.gov.
The high-throughput structure determination pipelines developed by structural genomics programs offer a unique opportunity for data mining. One important question is how protein properties derived from a primary sequence correlate with the protein's propensity to yield X-ray quality crystals (crystallizability) and 3D X-ray structures. A set of protein properties was computed for over 1,300 proteins that expressed well but were insoluble, and for ~720 unique proteins that resulted in X-ray structures. The correlation of the protein's isoelectric point and grand average hydropathy (GRAVY) with crystallizability was analyzed for full-length and domain constructs of protein targets. In a second step, several additional properties that can be calculated from the protein sequence were added and evaluated. Using statistical analyses, we have identified a set of attributes correlating with a protein's propensity to crystallize and implemented a Support Vector Machine (SVM) classifier based on these. We have created applications to analyze and provide optimal boundary information for query sequences and to visualize the data. These tools are available via the web site http://bioinformatics.anl.gov/cgi-bin/tools/pdpredictor.
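As an illustration of one sequence-derived property named in the abstract above, the following is a minimal sketch of a GRAVY calculation. It assumes the commonly used Kyte-Doolittle hydropathy scale; the abstract does not say which scale or implementation the authors used, and the function name `gravy` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
# Assumption: this is the scale behind GRAVY; the abstract does not specify it.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean hydropathy of a protein sequence (hypothetical helper).

    Characters that are not one of the 20 standard residues are skipped.
    """
    residues = [aa for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

A positive value indicates a more hydrophobic sequence on average, and a negative value a more hydrophilic one.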
Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, 30318.
SUMMARY: In the post-genomic era, the annotation of protein function facilitates the understanding of various biological processes. To extend the range of function annotation methods to the twilight zone of sequence identity, we have developed approaches that exploit both protein tertiary structure and/or protein sequence evolutionary relationships. To serve the scientific community, we have integrated the structure prediction tools, TASSER, TASSER-Lite and METATASSER, and the functional inference tools, FINDSITE, a structure based algorithm for binding site prediction, GO molecular function inference and ligand screening, EFICAz(2), a sequence based approach to enzyme function inference, and DBD-hunter, an algorithm for predicting DNA binding proteins and associated DNA binding residues, into a unified web resource, PSiFR (Protein Structure and Function prediction Resource). Availability and Implementation: PSiFR is freely available for use on the web at http://psifr.cssb.biology.gatech.edu/ CONTACT: skolnick@gatech.edu.
Andrzej Joachimiak
Midwest Center for Structural Genomics, Structural Biology Center, Biosciences Division, Argonne National Laboratory, 9700 S Cass Ave., Argonne, IL 60439, USA.
Protein X-ray crystallography recently celebrated its 50th anniversary. The structures of myoglobin and hemoglobin determined by Kendrew and Perutz provided the first glimpses into the complex protein architecture and chemistry. Since then, the field of structural molecular biology has experienced extraordinary progress and now more than 55000 protein structures have been deposited into the Protein Data Bank. In the past decade many advances in macromolecular crystallography have been driven by world-wide structural genomics efforts. This was made possible because of third-generation synchrotron sources, structure phasing approaches using anomalous signal, and cryo-crystallography. Complementary progress in molecular biology, proteomics, hardware and software for crystallographic data collection, structure determination and refinement, computer science, databases, robotics and automation improved and accelerated many processes. These advancements provide the robust foundation for structural molecular biology and assure strong contribution to science in the future. In this report we focus mainly on reviewing structural genomics high-throughput X-ray crystallography technologies and their impact.
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada.
Production of high-quality crystals is one of the main bottlenecks in X-ray crystallography-driven protein structure determination. Availability of structure determination data repositories, such as TargetDB and PepcDB, and flexibility in target selection in structural genomics motivate development of methods that predict crystallization propensity from a given protein sequence. We introduce a novel linear model tree-based meta-predictor, MetaPPCP, which takes advantage of the complementarity of state-of-the-art protein crystallization propensity predictors to provide predictions with about 80% accuracy. Our method combines predictions of XtalPred and CRYSTALP2 with information concerning isoelectric point, hydropathy and number of solved structures for similar sequences. Empirical comparison shows that MetaPPCP outperforms current predictors including OB-Score, XtalPred, ParCrys and CRYSTALP2. MetaPPCP obtains over 92% accuracy for over half of its predictions that have probability (propensity to be predicted as crystallizable or crystallization resistant) of above 0.8. The proposed method could provide useful input for target selection procedures of current structural genomics efforts.
ABSTRACT: BACKGROUND: Current protocols yield crystals for <30% of known proteins, indicating that automatically identifying crystallizable proteins may improve high-throughput structural genomics efforts. We introduce CRYSTALP2, a kernel-based method that predicts the propensity of a given protein sequence to produce diffraction-quality crystals. This method utilizes the composition and collocation of amino acids, isoelectric point, and hydrophobicity, as estimated from the primary sequence, to generate predictions. CRYSTALP2 extends its predecessor, CRYSTALP, by enabling predictions for sequences of unrestricted size and provides improved prediction quality. RESULTS: A significant majority of the collocations used by CRYSTALP2 include residues with high conformational entropy, or low entropy and high potential to mediate crystal contacts; notably, such residues are utilized by surface entropy reduction methods. We show that the collocations provide complementary information to the hydrophobicity and isoelectric point. Tests on four datasets show that CRYSTALP2 outperforms several existing sequence-based predictors (CRYSTALP, OB-score, and SECRET). CRYSTALP2's accuracy, MCC, and AROC range between 69.3 and 77.5%, 0.39 and 0.55, and 0.72 and 0.79, respectively. Our predictions are similar in quality and are complementary to the predictions of the most recent ParCrys and XtalPred methods. Our results also suggest that, as work in protein crystallization continues (thereby enlarging the population of proteins with known crystallization propensities), the prediction quality of the CRYSTALP2 method should increase. The prediction model and the datasets used in this contribution can be downloaded from http://biomine.ece.ualberta.ca/CRYSTALP2/CRYSTALP2.html. CONCLUSIONS: CRYSTALP2 provides relatively accurate crystallization propensity predictions for a given protein chain that either outperform or complement the existing approaches. 
The proposed method can be used to support current efforts towards improving the success rate in obtaining diffraction-quality crystals.
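The accuracy and MCC figures quoted in the abstract above are standard summaries of a binary confusion matrix. The sketch below shows how they are conventionally defined; the function names are hypothetical and the counts in the usage note are made-up illustrations, not data from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal sum is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with illustrative counts of 40 true positives, 30 true negatives, 20 false positives and 10 false negatives, `accuracy` gives 0.7 and `mcc` gives roughly 0.41. MCC ranges from -1 to 1 and, unlike accuracy, is not inflated when one class dominates the data set.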
Department of Molecular Physiology and Biological Physics, University of Virginia, 1340 Jefferson Park Avenue, Charlottesville, VA 22908, USA. wladek@iwonka.med.virginia.edu.
While three-dimensional structures have long been used to search for new drug targets, only a fraction of new drugs coming to the market has been developed with the use of a structure-based drug discovery approach. However, recent years have brought not only an avalanche of new macromolecular structures, but also significant advances in protein structure determination methodology that are only now making their way into structure-based drug discovery. In this paper, we review recent developments resulting from the Structural Genomics (SG) programs, focusing on the methods and results most likely to improve our understanding of the molecular foundation of human diseases. SG programs have been around for almost a decade, and in that time, have contributed a significant part of the structural coverage of both the genomes of pathogens causing infectious diseases and structurally uncharacterized biological processes in general. Perhaps most importantly, SG programs have developed new methodology at all steps of the structure determination process, not only to determine new structures highly efficiently, but also to screen protein/ligand interactions. We describe the methodologies, experience and technologies developed by SG, which range from improvements to cloning protocols to improved procedures for crystallographic structure solution that may be applied in "traditional" structural biology laboratories, particularly those performing drug discovery. We also discuss the conditions that must be met to convert the present high-throughput structure determination pipeline into a high-output structure-based drug discovery system.
Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, USA.
A large set of three-dimensional structures of 264 protein-protein complexes with known nonsynonymous single nucleotide polymorphisms (nsSNPs) at the interface was built using homology-based methods. The nsSNPs were mapped on the proteins' structures and their effect on the binding energy was investigated with CHARMM force field and continuum electrostatic calculations. Two sets of nsSNPs were studied: disease annotated Online Mendelian Inheritance in Man (OMIM) and nonannotated (non-OMIM). It was demonstrated that OMIM nsSNPs tend to destabilize the electrostatic component of the binding energy, in contrast with the effect of non-OMIM nsSNPs. In addition, it was shown that the change of the binding energy upon amino acid substitutions is not related to the conservation of the net charge, hydrophobicity, or hydrogen bond network at the interface. The results indicate that, generally, the effect of nsSNPs on protein-protein interactions cannot be predicted from amino acids' physico-chemical properties alone, since in many cases a substitution of a particular residue with another amino acid having completely different polarity or hydrophobicity had little effect on the binding energy. Analysis of sequence conservation showed that nsSNPs at highly conserved positions resulted in a large variance of the binding energy changes. In contrast, amino acid substitutions corresponding to nsSNPs at nonconserved positions, on average, were not found to have a large effect on binding affinity. pKa calculations were performed and showed that amino acid substitutions could change the wild-type proton uptake/release, thus resulting in a different pH-dependence of the binding energy.
Research Center for Asian Infectious Diseases, Institute of Medical Science, University of Tokyo, 4-6-1, Shirokanedai, Minato-ku, Tokyo 108-8639, Japan; China-Japan Joint Laboratory of Structural Virology and Immunology, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, People's Republic of China.
The expression and solubilization of insoluble proteins have been facilitated by the introduction of protein tags. In our analyses of viral protein R (Vpr) of human immunodeficiency virus 1 (HIV-1), however, several conventional tag proteins enhanced its expression but failed to solubilize it. Therefore, we decided to explore whether proteins derived from Thermus thermophilus HB8 (T. th.), a highly heat-stable bacterium, could be used as tag proteins to enhance the solubilization of Vpr. Based on the data accumulated during the recent structural genomics project of T. th., we selected 15 T. th. proteins with high expression levels and solubilities. From this group, we identified a T. th. tag protein that expressed Vpr in a soluble form. Furthermore, two T. th. tag proteins, including the identified one, were found to solubilize the extremely insoluble membrane-spanning domain of the envelope protein of HIV-1. When green fluorescent protein (GFP) was used as a passenger protein of T. th. tags, the brightness and stability of GFP were similar to those of untagged GFP, suggesting that the T. th. tags do not negatively affect the function of the passenger protein. Thus, data of structural genomics can be applied to generate a customized versatile protein tag for protein analyses.
[1] Northeast Structural Genomics Consortium, 702A Fairchild Center, MC2434, Columbia University, New York, New York 10027, USA.[2] Department of Biological Sciences, 702A Fairchild Center, MC2434, Columbia University, New York, New York 10027, USA.
Crystallization is the most serious bottleneck in high-throughput protein-structure determination by diffraction methods. We have used data mining of the large-scale experimental results of the Northeast Structural Genomics Consortium and experimental folding studies to characterize the biophysical properties that control protein crystallization. This analysis leads to the conclusion that crystallization propensity depends primarily on the prevalence of well-ordered surface epitopes capable of mediating interprotein interactions and is not strongly influenced by overall thermodynamic stability. We identify specific sequence features that correlate with crystallization propensity and that can be used to estimate the crystallization probability of a given construct. Analyses of entire predicted proteomes demonstrate substantial differences in the amino acid-sequence properties of human versus eubacterial proteins, which likely reflect differences in biophysical properties, including crystallization propensity. Our thermodynamic measurements do not generally support previous claims regarding correlations between sequence properties and protein stability.
Biozentrum, University of Basel and SIB Swiss Institute of Bioinformatics, Basel, Switzerland.
SWISS-MODEL Repository (http://swissmodel.expasy.org/repository/) is a database of 3D protein structure models generated by the SWISS-MODEL homology-modelling pipeline. The aim of the SWISS-MODEL Repository is to provide access to an up-to-date collection of annotated 3D protein models generated by automated homology modelling for all sequences in Swiss-Prot and for relevant model organisms. Regular updates ensure that target coverage is complete, that models are built using the most recent sequence and template structure databases, and that improvements in the underlying modelling pipeline are fully utilised. As of September 2008, the database contains 3.4 million entries for 2.7 million different protein sequences from the UniProt database. SWISS-MODEL Repository allows users to assess the quality of the models in the database, search for alternative template structures, and build models interactively via SWISS-MODEL Workspace (http://swissmodel.expasy.org/workspace/). Annotation of models with functional information and cross-linking with other databases such as the Protein Model Portal (http://www.proteinmodelportal.org) of the PSI Structural Genomics Knowledge Base facilitates the navigation between protein sequence and structure resources.

Other papers by authors:

Joint Center for Structural Genomics, La Jolla, CA 92037, USA.
XtalPred is a web server for prediction of protein crystallizability. The prediction is made by comparing several features of the protein with distributions of these features in TargetDB and combining the results into an overall probability of crystallization. XtalPred provides: (1) a detailed comparison of the protein's features to the corresponding distribution from TargetDB; (2) a summary of protein features and predictions that indicate problems that are likely to be encountered during protein crystallization; (3) prediction of ligands; and (4) (optional) lists of close homologs from complete microbial genomes that are more likely to crystallize. AVAILABILITY: The XtalPred web server is freely available for academic users on http://ffas.burnham.org/XtalPred
Joint Center for Structural Genomics, Bioinformatics Core, Burnham Institute for Medical Research, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA.
Even closely homologous proteins often have different crystallization properties and propensities. This observation can be used to introduce an additional dimension into crystallization trials by simultaneously targeting multiple homologs in what we call a "genome pool" strategy. We show that this strategy works because protein physicochemical properties correlated with crystallization success have a surprisingly broad distribution within most protein families. There are also "easy" and "difficult" families where this distribution is tilted in one direction. This leads to uneven structural coverage of protein families, with more "easy" ones solved. Increasing the size of the "genome pool" can improve chances of solving the "difficult" ones. In contrast, our analysis does not indicate that any specific genomes are "easy" or "difficult". Finally, we show that the group of proteins with known 3D structures is systematically different from the general pool of known proteins and we assess the structural consequences of these differences.
Joint Center for Structural Genomics (http://www.jcsg.org); Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA.
Mre11 nuclease plays a central role in the repair of cytotoxic and mutagenic DNA double-strand breaks (DSBs). As X-ray structural information has only been available for the Pyrococcus furiosus enzyme (PfMre11), the conserved and variable features of this nuclease across the domains of life have not been experimentally defined. Our crystal structure and biochemical studies demonstrate that TM1635 from Thermotoga maritima, originally annotated as a putative nuclease, is the Mre11 endo/exonuclease from T. maritima (TmMre11) and the first such structure from eubacteria. TmMre11 and PfMre11 display similar overall structures, despite sequence identity in the twilight zone of only ~20%. However, they differ substantially in their DNA specificity domains and in their dimeric organization. Residues in the nuclease domain are highly conserved, but those in the DNA specificity domain are not. The structural differences likely affect how Mre11s from different organisms recognize and interact with single-stranded DNA, double-stranded DNA and DNA hairpin structures during DNA repair. The TmMre11 nuclease active site has no bound metal ions, but is conserved in sequence and structure with the exception of a histidine that is important in PfMre11 nuclease activity. Nevertheless, biochemical characterization confirms that TmMre11 possesses both endonuclease and exonuclease activities on ssDNA and dsDNA substrates, respectively.
Joint Center for Structural Genomics; Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA.
Pleckstrin homology (PH) domains have been identified only in eukaryotic proteins to date. We have determined crystal structures for three members of an uncharacterized protein family (Pfam PF08000), which provide compelling evidence for the existence of Pleckstrin homology-like domains (PH-like) in bacteria (PHb). The first two structures contain a single PHb domain that forms a dome-shaped, oligomeric ring with C(5) symmetry. The third structure has an additional helical hairpin attached at the C-terminus and forms a similar, but much larger ring with C(12) symmetry. Thus, both molecular assemblies exhibit rare, higher order, cyclic symmetry, but preserve a similar arrangement of their PHb domains, which gives rise to a conserved hydrophilic surface at the intersection of the beta-strands of adjacent protomers that likely mediates protein-protein interactions. As a result of these structures, additional families of bacterial PH (PHb) domains can now be identified, suggesting that PH domains are much more widespread than originally anticipated. Thus, rather than being a eukaryotic innovation, the PH domain superfamily appears to have existed before prokaryotes and eukaryotes diverged.
Joint Center for Molecular Modeling (JCMM), Burnham Institute for Medical Research, La Jolla, CA 92037, USA.
Metabolic pathways have traditionally been described in terms of biochemical reactions and metabolites. With the use of structural genomics and systems biology, we generated a three-dimensional reconstruction of the central metabolic network of the bacterium Thermotoga maritima. The network encompassed 478 proteins, of which 120 were determined by experiment and 358 were modeled. Structural analysis revealed that proteins forming the network are dominated by a small number (only 182) of basic shapes (folds) performing diverse but mostly related functions. Most of these folds are already present in the essential core (approximately 30%) of the network, and its expansion by nonessential proteins is achieved with relatively few additional folds. Thus, integration of structural data with network analysis generates insight into the function, mechanism, and evolution of biological networks.
Joint Center for Structural Genomics; Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA, USA.
Cell cycle regulated stalk biogenesis in Caulobacter crescentus is controlled by a multi-step phosphorelay system consisting of the hybrid histidine kinase ShkA, the histidine-phosphotransfer protein ShpA and the response regulator TacA. ShpA shuttles phosphoryl groups between ShkA and TacA. When phosphorylated, TacA triggers a downstream transcription cascade for stalk synthesis in an RpoN-dependent manner. The crystal structure of ShpA was determined to 1.52 Å resolution. ShpA belongs to a family of monomeric histidine phosphotransfer (HPt) proteins, which feature a highly conserved four-helix bundle. The phosphorylatable histidine, His56, is located on the surface of the helix bundle and is fully solvent exposed. One end of the four-helix bundle in ShpA is shorter compared to other characterized histidine phosphotransfer proteins, whereas the face that potentially interacts with the response regulators is structurally conserved. Similarities of the interaction surface around the phosphorylation site suggest that ShpA is likely to share a common mechanism for molecular recognition and phosphotransfer with yeast phosphotransfer protein YPD1 despite low overall sequence similarity.
Joint Center for Structural Genomics, SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA 94025, USA; Stanford Synchrotron Radiation Lightsource (SSRL), SLAC National Accelerator Laboratory, Stanford University, Menlo Park, CA 94025, USA.
The crystal structures of two homologous endopeptidases from the cyanobacteria Anabaena variabilis and Nostoc punctiforme were determined at 1.05 and 1.60 Å resolution, respectively, and contain a bacterial SH3-like domain (SH3b) and a ubiquitous cell-wall-associated NlpC/P60 (or CHAP) cysteine peptidase domain. The NlpC/P60 domain is a primitive, papain-like peptidase in the CA clan of cysteine peptidases with a Cys126/His176/His188 catalytic triad and a conserved catalytic core. We deduced from structure and sequence analysis, and then experimentally, that these two proteins act as gamma-D-glutamyl-L-diamino acid endopeptidases (EC 3.4.22.-). The active site is located near the interface between the SH3b and NlpC/P60 domains, where the SH3b domain may help define substrate specificity, instead of functioning as a targeting domain, so that only muropeptides with an N-terminal L-alanine can bind to the active site.
Joint Center for Structural Genomics.
ECX21941 represents a very large family (over 600 members) of novel, ocean metagenome-specific proteins identified by clustering of the dataset from the Global Ocean Sampling expedition. The crystal structure of ECX21941 reveals unexpected similarity to Sm/LSm proteins, which are important RNA-binding proteins, despite no detectable sequence similarity. The ECX21941 protein assembles as a homopentamer in solution and in the crystal structure when expressed in Escherichia coli and represents the first pentameric structure for this Sm/LSm family of proteins, although the actual oligomeric form in vivo is currently not known. The genomic neighborhood analysis of ECX21941 and its homologs combined with sequence similarity searches suggest a cyanophage origin for this protein. The specific functions of members of this family are unknown, but our structure analysis of ECX21941 indicates nucleic acid-binding capabilities and suggests a role in RNA and/or DNA processing. Proteins 2009. © 2008 Wiley-Liss, Inc.

Latest similar papers:

ABSTRACT: BACKGROUND: Many protein structures determined in high-throughput structural genomics centers, despite their significant novelty and importance, are available only as PDB depositions and are not accompanied by a peer-reviewed manuscript. Because of this they are not accessible by the standard tools of literature searches, remaining underutilized by the broad biological community. RESULTS: To address this issue we have developed TOPSAN, The Open Protein Structure Annotation Network, a web-based platform that combines the openness of the wiki model with the quality control of scientific communication. TOPSAN enables research collaborations and scientific dialogue among globally distributed participants, the results of which are reviewed by experts and eventually validated by peer review. The immediate goal of TOPSAN is to harness the combined experience, knowledge, and data from such collaborations in order to enhance the impact of the astonishing number and diversity of structures being determined by structural genomics centers and high-throughput structural biology. CONCLUSIONS: TOPSAN combines features of automated annotation databases and formal, peer-reviewed scientific research literature, providing an ideal vehicle to bridge a gap between rapidly accumulating data from high-throughput technologies and a much slower pace for its analysis and integration with other, relevant research.
Midwest Center for Structural Genomics (MCSG), Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL, 60439, USA, abinkowski@anl.gov.
A semi-automated computational procedure to assist in the identification of bound ligands from unknown electron density has been developed. The atomic surface surrounding the density blob is compared to a library of three-dimensional ligand binding surfaces extracted from the Protein Data Bank (PDB). Ligands corresponding to surfaces which share physicochemical texture and geometric shape similarities are considered for assignment. The method is benchmarked against a set of well represented ligands from the PDB, in which we show that we can identify the correct ligand based on the corresponding binding surface. Finally, we apply the method during model building and refinement stages from structural genomics targets in which unknown density blobs were discovered. A semi-automated computational method is described which aims to assist crystallographers with assigning the identity of a ligand corresponding to unknown electron density. Using shape and physicochemical similarity assessments between the protein surface surrounding the density and a database of known ligand binding surfaces, a plausible list of candidate ligands is identified for consideration. The method is validated against highly observed ligands from the Protein Data Bank and results are shown from its use in a high-throughput structural genomics pipeline.
W F Anderson
Department of Molecular Pharmacology and Biological Chemistry, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA. wf-anderson@northwestern.edu.
The application of structural genomics methods and approaches to proteins from organisms causing infectious diseases is making available the three-dimensional structures of many proteins that are potential drug targets and laying the groundwork for structure-aided drug discovery efforts. There are a number of structural genomics projects with a focus on pathogens that have been initiated worldwide. The Center for Structural Genomics of Infectious Diseases (CSGID) was recently established to apply state-of-the-art high-throughput structural biology technologies to the characterization of proteins from the National Institute of Allergy and Infectious Diseases (NIAID) category A-C pathogens and organisms causing emerging or re-emerging infectious diseases. The target selection process emphasizes potential biomedical benefits. Selected proteins include known drug targets and their homologs, essential enzymes, virulence factors and vaccine candidates. The Center also provides a structure determination service for the infectious disease scientific community. The ultimate goal is to generate a library of structures that are available to the scientific community and can serve as a starting point for further research and structure-aided drug discovery for infectious diseases. To achieve this goal, the CSGID will determine protein crystal structures of 400 proteins and protein-ligand complexes using proven, rapid, highly integrated, and cost-effective methods for such determination, primarily by X-ray crystallography. High-throughput crystallographic structure determination is greatly aided by frequent, convenient access to high-performance beamlines at third-generation synchrotron X-ray sources.
Department of Computer Science, University of Minnesota 117 Pleasant St SE, Room 464, Minneapolis, MN 55455.
MOTIVATION: Identifying residues that interact with ligands is useful as a first step to understanding protein function and as an aid to designing small molecules that target the protein for interaction. Several studies have shown that sequence features are very informative for this type of prediction, while structure features have also been useful when structure is available. We develop a sequence-based method, called LIBRUS, that combines homology-based transfer and direct prediction using machine learning, and compare it to previous sequence-based work and current structure-based methods. RESULTS: Our analysis shows that homology-based transfer is slightly more discriminating than a support vector machine learner using profiles and predicted secondary structure. We combine these two approaches in a method called LIBRUS. On a benchmark of 885 sequence-independent proteins, it achieves an area under the ROC curve (ROC) of 0.83 with 45% precision at 50% recall, a significant improvement over previous sequence-based efforts. On an independent benchmark set, a current method, FINDSITE, based on structure features achieves a 0.81 ROC with 54% precision at 50% recall, while LIBRUS achieves a ROC of 0.82 with 39% precision at 50% recall at a smaller computational cost. When LIBRUS and FINDSITE predictions are combined, performance increases beyond that of either method alone, reaching a ROC of 0.86 and 59% precision at 50% recall. AVAILABILITY: Software developed for this study is available at http://bioinfo.cs.umn.edu/supplements/binf2009 along with supplemental data on the study. CONTACT: kauffman@cs.umn.edu, karypis@cs.umn.edu.
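The two figures of merit quoted in this abstract, area under the ROC curve and precision at 50% recall, can be illustrated with a minimal sketch. This is not the LIBRUS or FINDSITE code: the function names and the toy labels and scores below are hypothetical, and the sketch assumes per-residue binary labels (1 = ligand-binding) paired with real-valued prediction scores.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_recall(labels, scores, target_recall=0.5):
    """Precision at the first point in the score-ranked list where recall
    reaches target_recall. Tied scores are taken in sorted order, which is
    a simplification made for this sketch."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_pos = 0
    for k, (_, label) in enumerate(ranked, start=1):
        true_pos += label
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    raise ValueError("target_recall must not exceed 1.0")


# Hypothetical toy data: eight residues, four of them ligand-binding.
example_labels = [1, 0, 1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

print(roc_auc(example_labels, example_scores))
print(precision_at_recall(example_labels, example_scores, 0.5))
```

On the toy data, two of the top three ranked residues are true binders when half of all binders have been recovered, which is the sense in which a "precision at 50% recall" value is read.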
Australian Centre for Plant Functional Genomics, School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Glen Osmond, SA, 5064, Australia, maria.hrmova@adelaide.edu.au.
By mid-2007, the three-dimensional (3D) structures of some 45,000 proteins had been solved, over a period in which the linear structures of millions of genes were defined. Technical challenges associated with X-ray crystallography are being overcome and high-throughput methods both for crystallization of proteins and for solving their 3D structures are under development. The question arises as to how structural biology can be integrated with and add value to functional genomics programs. Structural biology will assist in the definition of gene function through the identification of the likely function of the protein products of genes. The 3D information allows protein sequences predicted from DNA sequences to be classified into broad groups, according to the overall 'fold', or 3D shape, of the protein. Structural information can be used to predict the preferred substrate of a protein, and thereby greatly enhance the accurate annotation of the corresponding gene. Furthermore, it will enable the effects of amino acid substitutions in enzymes to be better understood with respect to enzyme function and could thereby provide insights into natural variation in genes. If the molecular basis of transcription factor-DNA interactions were defined through precise 3D knowledge of the protein-DNA binding site, it would be possible to predict the effects of base substitutions within the motif on the specificity and/or kinetics of binding. In this chapter, we present specific examples of how structural biology can provide valuable information for functional genomics programs.
Department of Chemistry and Chemical Biology, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, Swiss Institute of Bioinformatics & Biozentrum, University of Basel, CH-4056 Basel, Switzerland, Biochemie-Zentrum, Heidelberg University, D-69120 Heidelberg, Germany, Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22908-0736, Department of Biochemistry & Molecular Biophysics, Columbia University, New York, NY 10027 and Harvard Institute of Proteomics & Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115.
The Protein Structure Initiative Structural Genomics Knowledgebase (PSI SGKB, http://kb.psi-structuralgenomics.org) has been created to turn the products of the PSI structural genomics effort into knowledge that can be used by the biological research community to understand living systems and disease. This resource provides central access to structures in the Protein Data Bank (PDB), along with functional annotations, associated homology models, worldwide protein target tracking information, available protocols and the potential to obtain DNA materials for many of the targets. It also offers the ability to search all of the structural and methodological publications and the innovative technologies that were catalyzed by the PSI's high-throughput research efforts. In collaboration with the Nature Publishing Group, the PSI SGKB provides a research library, editorials about new research advances, news and an events calendar to present a broader view of structural biology and structural genomics. By making these resources freely available, the PSI SGKB serves as a bridge to connect the structural biology and the greater biomedical communities.
Roman A Laskowski
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
PDBsum (http://www.ebi.ac.uk/pdbsum) provides summary information about each experimentally determined structural model in the Protein Data Bank (PDB). Here we describe some of its most recent features, including figures from the structure's key reference, citation data, Pfam domain diagrams, topology diagrams and protein-protein interactions. Furthermore, it now accepts users' own PDB format files and generates a private set of analyses for each uploaded structure.
Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York YO10 5YW, U.K.
In recent times, there has been a large increase in the number of protein structures deposited in the Protein Data Bank. Structural genomics initiatives have contributed to this expansion through their focus on high-throughput structural determination. This has fuelled advances in many of the techniques in the pipeline from gene to protein to crystal to structure. These include ligation-independent cloning methods, parallel purification systems, robotic crystallization devices and automated methods of crystal identification, data collection and, in some cases, structure solution. Some of these advances are described and discussed briefly with an emphasis on activities in the York Structural Biology Laboratory through its participation in the Structural Proteomics in Europe consortium.
SGX Pharmaceuticals, Inc., San Diego, California.
Phase II of the Protein Structure Initiative, funded by the NIH NIGMS (National Institute of General Medical Sciences), is a 5-year effort to determine thousands of protein structures. The New York SGX Research Center for Structural Genomics (NYSGXRC) is one of the four large-scale production centers tasked with determining 100-200 structures annually. Almost all protein production is carried out using the high throughput structural biology platform at SGX Pharmaceuticals (SGX), which supplies 120 or more ultrapure proteins per month for NYSGXRC crystallization and structure determination activities. Protocols for PCR, cloning, expression/solubility testing, fermentation, purification, and crystallization are described. General protocols and detailed experimental results for each target are updated weekly at the public PepcDB website (pepcdb.pdb.org/), and all NYSGXRC clones should be available in 2008 through the PlasmID resource operated by the Harvard Institute of Proteomics.
Structural Genomics Consortium, University of Oxford, Old Road Campus Research Building, Roosevelt Drive, Headington, Oxford OX3 7DQ, UK; Nuffield Department of Clinical Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Headington, Oxford OX3 9DU, UK.
Structural genomics (SG) has significantly increased the number of novel protein structures of targets with medical relevance. In the protein kinase area, SG has contributed >50% of all novel kinase structures during the past three years and determined more than 30 novel catalytic domain structures. Many of the released structures are inhibitor complexes and a number of them have identified new inhibitor binding modes and scaffolds. In addition, generated reagents, assays, and inhibitor screening data provide a diversity of chemogenomic data that can be utilized for early drug development. Here we discuss the currently available structural data for the kinase family, considering novel structures as well as inhibitor complexes. Our analysis revealed that the structural coverage of many kinase families is still rather poor, and inhibitor complexes with diverse inhibitors are only available for a few kinases. However, we anticipate that with the current rate of structure determination and the high throughput technologies developed by SG programs these gaps will soon be closed. In addition, the generated reagents will put SG initiatives in a unique position to provide data beyond protein structure determination by identifying chemical probes and determining their binding modes and target specificity.
2010-09-09 08:31:29 © BioInfoBank Institute