Gemma L Holliday,
Gail J Bartlett,
Daniel E Almonacid,
Noel M O'Boyle,
Peter Murray-Rust,
Janet M Thornton,
John B O Mitchell
SUMMARY: MACiE (mechanism, annotation and classification in enzymes) is a publicly available web-based database, held in CMLReact (an XML application), that aims to help our understanding of the evolution of enzyme catalytic mechanisms and also to create a classification system which reflects the actual chemical mechanism (catalytic steps) of an enzyme reaction, not only the overall reaction. AVAILABILITY: http://www-mitchell.ch.cam.ac.uk/macie/.
Mesh-terms: Catalysis; Computational Biology :: methods; Database Management Systems; Databases, Factual; Databases, Genetic; Databases, Protein; Enzymes :: chemistry; Gene Expression; Information Storage and Retrieval; Internet; Models, Chemical; Models, Statistical; Programming Languages; Proteins :: chemistry; Research Support, Non-U.S. Gov't; Sequence Alignment; Sequence Analysis; Software;
Latest citations:
Magnetic Resonance Center (CERM)- University of Florence, Via L. Sacconi 6, 50019 Sesto Fiorentino, Italy.
SUMMARY: Metal-MACiE is a new publicly available web-based database, held in MySQL, which aims to organize the available information on the properties and the roles of metals in the context of the catalytic mechanisms of metalloenzymes. Metal-MACiE, which currently covers 75% of metal-dependent EC sub-sub-classes and is continuously growing, exploits the existing MACiE database for the annotation of the reaction mechanisms. The two databases constitute complementary sources of information for enzymology, biochemistry and molecular pharmacology studies. AVAILABILITY: http://www.ebi.ac.uk/thornton-srv/databases/Metal_MACiE/home.html CONTACT: andreini@cerm.unifi.it SUPPLEMENTARY INFORMATION: Table S1, Figure S1.
We report, for the first time, on the statistics of chemical mechanisms and amino acid residue functions that occur in enzyme reaction sequences using the MACiE database of 202 distinct enzyme reaction mechanisms as a knowledge base. MACiE currently holds representatives from each Enzyme Commission sub-subclass where there is an available crystal structure and sufficient evidence in the primary literature for a mechanism. Each catalytic step of every reaction sequence in MACiE is fully annotated, so that it includes the function of the catalytic residues involved in the reaction and the chemical mechanisms by which substrates are transformed into products. We show that the most catalytic amino acid residues are histidine, cysteine and aspartate, which are also the residues whose side-chains are more likely to serve as reactants, and that have the greatest versatility of function. We show that electrophilic reactions in enzymes are very rare, and the majority of enzyme reactions rely upon nucleophilic and general acid/base chemistry. However, although rare, radical (homolytic) reactions are much more common than electrophilic reactions. Thus, the majority of amino acid residues perform stabilisation roles (as spectators) or proton shuttling roles (as reactants). The analysis presented provides a better understanding of the mechanisms of enzyme catalysis and may act as an initial step in the validation and prediction of mechanism in an enzyme active site.
Comparing the chemical spaces of metabolites and available chemicals: models of metabolite-likeness.
REQUIMTE, CQFB, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal, sunil.gupta@dq.fct.unl.pt.
The chemical space covered by compounds involved in metabolic reactions was compared with that of a random dataset of purchasable compounds by chemoinformatics techniques. The comparison was based on 3D structure, 2D structure, or descriptors of global properties, by means of self-organizing maps, random forests, and classification trees. The overlap between metabolites and non-metabolites was observed to be the least in the space defined by the global descriptors, the most discriminatory features being the number of OH groups, presence of aromatic systems, and molecular weight. Discrimination between the two datasets was achieved with accuracy up to 97%. Models were built to produce a metabolite-likeness parameter. A relationship between metabolite-likeness and ready biodegradability was observed.
ABSTRACT: BACKGROUND: The functional annotation of most genes in newly sequenced genomes is inferred from similarity to previously characterized sequences, an annotation strategy that often leads to erroneous assignments. We have performed a reannotation of 245 genomes using an updated version of EFICAz, a highly precise method for enzyme function prediction. RESULTS: Based on our three-field EC number predictions, we have obtained lower-bound estimates for the average enzyme content in Archaea (29%), Bacteria (30%) and Eukarya (18%). Most annotations added in KEGG from 2005 to 2006 agree with EFICAz predictions made in 2005. The coverage of EFICAz predictions is significantly higher than that of KEGG, especially for eukaryotes. Thousands of our novel predictions correspond to hypothetical proteins. We have identified a subset of 64 hypothetical proteins with low sequence identity to EFICAz training enzymes, whose biochemical functions have been recently characterized and find that in 96%(84%) of the cases we correctly identified their three-field (four-field) EC numbers. For two of the 64 hypothetical proteins: PA1167 from Pseudomonas aeruginosa, an alginate lyase (EC 4.2.2.3) and Rv1700 of Mycobacterium tuberculosis H37Rv, an ADP-ribose diphosphatase (EC 3.6.1.13), we have detected annotation lag of more than two years in databases. Two examples are presented where EFICAz predictions act as hypothesis generators for understanding the functional roles of hypothetical proteins: FLJ11151, a human protein overexpressed in cancer that EFICAz identifies as an endopolyphosphatase (EC 3.6.1.10), and MW0119, a protein of Staphylococcus aureus strain MW2 that we propose as candidate virulence factor based on its EFICAz predicted activity, sphingomyelin phosphodiesterase (EC 3.1.4.12). CONCLUSIONS: Our results suggest that we have generated enzyme function annotations of high precision and recall. 
These predictions can be mined and correlated with other information sources to generate biologically significant hypotheses and can be useful for comparative genome analysis and automated metabolic pathway reconstruction.
Department of Biopharmaceutical Sciences, University of California, San Francisco, California 94143, USA.
Enzyme evolution is often constrained by aspects of catalysis. Sets of homologous proteins that catalyze different overall reactions but share an aspect of catalysis, such as a common partial reaction, are called mechanistically diverse superfamilies. The common mechanistic steps and structural characteristics of several of these superfamilies, including the enolase, Nudix, amidohydrolase, and haloacid dehalogenase superfamilies have been characterized. In addition, studies of mechanistically diverse superfamilies are helping to elucidate mechanisms of functional diversification, such as catalytic promiscuity. Understanding how enzyme superfamilies evolve is vital for accurate genome annotation, predicting protein functions, and protein engineering.
School of Enzymology, Department of Chemistry, M.V. Lomonosov Moscow State University, Moscow, 119992, Russia.
SUMMARY: Universal ontology of catalytic sites is required to systematize enzyme catalytic sites, their evolution, as well as relations between catalytic sites and protein families, organisms and chemical reactions. Here we present a classification of hydrolases catalytic sites based on hierarchical organization. The Web-accessible database provides information on the catalytic sites, protein folds, EC numbers and source organisms of the enzymes and includes software allowing for analysis and visualization of the relations between them. AVAILABILITY: http://www.enzyme.chem.msu.ru/hcs/.
Other papers by authors:
Gemma L Holliday,
Daniel E Almonacid,
Gail J Bartlett,
Noel M O'Boyle,
James W Torrance,
Peter Murray-Rust,
John B O Mitchell,
Janet M Thornton
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
MACiE (Mechanism, Annotation and Classification in Enzymes) is a database of enzyme reaction mechanisms, and is publicly available as a web-based data resource. This paper presents the first release of a web-based search tool to explore enzyme reaction mechanisms in MACiE. We also present Version 2 of MACiE, which doubles the dataset available (from Version 1). MACiE can be accessed from http://www.ebi.ac.uk/thornton-srv/databases/MACiE/
Unilever Centre for Molecular Science Informatics, Dept of Chemistry, University of Cambridge, Lensfield Rd, Cambridge CB2 1EW, UK.
The concept of reaction similarity has been well studied in terms of the overall transformation associated with a reaction, but not in terms of mechanism. We present the first method to give a quantitative measure of the similarity of reactions based upon their explicit mechanisms. Two approaches are presented to measure the similarity between individual steps of mechanisms: a fingerprint-based approach that incorporates relevant information on each mechanistic step; and an approach based only on bond formation, cleavage and changes in order. The overall similarity for two reaction mechanisms is then calculated using the Needleman-Wunsch alignment algorithm. An analysis of MACiE, a database of enzyme mechanisms, using our measure of similarity identifies some examples of convergent evolution of chemical mechanisms. In many cases, mechanism similarity is not reflected by similarity according to the EC system of enzyme classification. In particular, little mechanistic information is conveyed by the class level of the EC system.
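The alignment idea in this abstract can be illustrated with a minimal sketch. It assumes each mechanistic step is reduced to a set of feature labels and that step-to-step similarity is a Tanimoto coefficient; the feature labels, the zero gap penalty, and the normalisation by the longer mechanism are illustrative assumptions, not the scoring scheme of the paper.

```python
from typing import FrozenSet, List

# Hypothetical representation: one mechanistic step = a set of feature labels.
Step = FrozenSet[str]


def tanimoto(a: Step, b: Step) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def align_score(m1: List[Step], m2: List[Step], gap: float = 0.0) -> float:
    """Needleman-Wunsch global alignment score of two step sequences."""
    rows, cols = len(m1) + 1, len(m2) + 1
    score = [[0.0] * cols for _ in range(rows)]
    # First row and column: aligning a prefix against nothing costs gaps only.
    for i in range(1, rows):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, cols):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + tanimoto(m1[i - 1], m2[j - 1]),  # pair steps
                score[i - 1][j] + gap,  # step of m1 left unpaired
                score[i][j - 1] + gap,  # step of m2 left unpaired
            )
    return score[-1][-1]


def mechanism_similarity(m1: List[Step], m2: List[Step], gap: float = 0.0) -> float:
    """Alignment score divided by the longer length (an assumed normalisation)."""
    longest = max(len(m1), len(m2))
    if longest == 0:
        return 1.0
    return align_score(m1, m2, gap) / longest


# Made-up two-step mechanism and a one-step mechanism sharing its second step.
mech_a = [
    frozenset({"proton transfer", "O-H formed"}),
    frozenset({"nucleophilic addition", "C-O formed"}),
]
mech_b = [mech_a[1]]
print(mechanism_similarity(mech_a, mech_a))
print(mechanism_similarity(mech_a, mech_b))
```

With a zero gap penalty, unmatched steps simply contribute nothing to the score; a negative gap value would penalise mechanisms of different length more strongly.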
EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
The MACiE database, a database of 223 distinct step-wise enzyme reaction mechanisms, currently holds representatives from each Enzyme Commission sub-subclass where there is an available crystal structure and sufficient evidence in the primary literature to support a mechanism. Each catalytic step of every reaction sequence in MACiE is fully annotated so that it includes the function of the catalytic residues involved in the reaction and the mechanism by which substrates are transformed into products. Using MACiE as a knowledge-base we have seen that the top ten most catalytic residues are histidine, aspartate, glutamate, lysine, cysteine, arginine, serine, threonine, tyrosine and tryptophan. Of these only seven (cysteine, histidine, aspartate, lysine, serine, threonine and tyrosine) dominate catalysis and provide essentially five functional roles that are absolutely essential. Stabilisation is the most common and essential role for all classes of enzyme, followed by general acid/base (proton acceptor and proton donor) functionality, with nucleophilic addition following closely behind (nucleophile and nucleofuge). We investigated the occurrence of these residues in MACiE and the Catalytic Site Atlas and found that, as expected, certain residue types are associated with each functional role, with some residue types able to perform diverse roles. In addition, it was seen that different classes of enzyme (as determined by the EC classification) have a tendency to employ different residues for catalysis. Further we show that whilst the differences between EC classes in catalytic residue composition are not immediately obvious from the general classes of Ingold mechanisms, there is some weak correlation between the mechanisms involved in a given EC class and the functions that the catalytic amino acid residues are performing. 
The analysis presented here provides a valuable insight into the functional roles of catalytic amino acid residues, which may have applications in many aspects of enzymology, from the design of novel enzymes to the prediction and validation of an enzyme's reaction mechanism.
EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
The process of deducing the catalytic mechanism of an enzyme from its structure is highly complex and requires extensive experimental work to validate a proposed mechanism. As one step towards improving the reliability of this process, we have gathered statistics describing the typical geometry of catalytic residues with regard to the substrate and one another. In order to analyse residue-substrate interactions, we have assembled a dataset of structures of enzymes of known mechanism bound to substrate, product, or a substrate analogue. Despite the challenges presented in obtaining such experimental data, we were able to include 42 enzyme structures. We have also assembled a separate dataset of catalytic residues which act upon other catalytic residues, using a set of 60 enzyme structures. For both datasets, we have extracted the distances between residues with a given catalytic function and their target moieties. The geometry of residues whose function involves the transfer or sharing of hydrogens (either with substrate or another residue) was analysed more closely. The results showed that the geometry for such productive interactions (prior to the transition state) closely resembles that seen in non-catalytic hydrogen bonds, with distances and angles in the normal expected range. Such statistics provide limits on "expected geometries" for catalytic residues, which will help to identify these residues and elucidate enzyme mechanisms.
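As a rough illustration of the kind of geometric measurement described above, the sketch below computes a donor-acceptor distance and a donor-hydrogen-acceptor angle from Cartesian coordinates. The coordinates are invented for the example, and the function names are hypothetical; no cut-off values from the paper are implied.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def distance(a: Vec, b: Vec) -> float:
    """Euclidean distance between two atoms (same units as the input)."""
    return math.dist(a, b)


def angle_deg(a: Vec, b: Vec, c: Vec) -> float:
    """Angle a-b-c in degrees, measured at the central atom b."""
    v1 = tuple(x - y for x, y in zip(a, b))
    v2 = tuple(x - y for x, y in zip(c, b))
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_theta = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_theta))


# Invented coordinates: a perfectly linear donor-H...acceptor arrangement.
donor = (0.0, 0.0, 0.0)
hydrogen = (1.0, 0.0, 0.0)
acceptor = (2.9, 0.0, 0.0)

print(distance(donor, acceptor))
print(angle_deg(donor, hydrogen, acceptor))
```

Collecting these two values for every residue-target pair with a given catalytic function would give the raw distributions from which "expected geometry" limits could be drawn.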
Gemma L Holliday,
Janet M Thornton,
Andrée Marquet,
Alison G Smith,
Fabrice Rébeillé,
Ralf Mendel,
Heidi L Schubert,
Andrew D Lawrence,
Martin J Warren
Covering: 1945 to 2007.
Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom.
Random Forest regression (RF), Partial-Least-Squares (PLS) regression, Support Vector Machines (SVM), and Artificial Neural Networks (ANN) were used to develop QSPR models for the prediction of aqueous solubility, based on experimental data for 988 organic molecules. The Random Forest regression model predicted aqueous solubility more accurately than those created by PLS, SVM, and ANN and offered methods for automatic descriptor selection, an assessment of descriptor importance, and an in-parallel measure of predictive ability, all of which serve to recommend its use. The prediction of log molar solubility for an external test set of 330 molecules that are solid at 25 degrees C gave an r2 = 0.89 and RMSE = 0.69 log S units. For a standard data set selected from the literature, the model performed well with respect to other documented methods. Finally, the diversity of the training and test sets are compared to the chemical space occupied by molecules in the MDL drug data report, on the basis of molecular descriptors selected by the regression analysis.
Chemical Markup, XML, and the World Wide Web. 6. CMLReact, an XML Vocabulary for Chemical Reactions.
Department of Chemistry, Unilever Center for Molecular Informatics, Lensfield Road, Cambridge CB2 1EW, U.K., and Department of Chemistry, Imperial College London, South Kensington Campus, London SW7 2AZ, U.K.
A set of components (CMLReact) for managing chemical and biochemical reactions has been added to CML. These can be combined to support most of the strategies for the formal representation of reactions. The elements, attributes, and types are formally defined as XMLSchema components, and their semantics are developed. New syntax and semantics in CML are reported and illustrated with 10 examples.
Richard A George,
Ruth V Spriggs,
Gail J Bartlett,
Alex Gutteridge,
Malcolm W MacArthur,
Craig T Porter,
Bissan Al-Lazikani,
Janet M Thornton,
Mark B Swindells
Because of the extreme impact of genome sequencing projects, protein sequences without accompanying experimental data now dominate public databases. Homology searches, by providing an opportunity to transfer functional information between related proteins, have become the de facto way to address this. Although a single, well annotated, close relationship will often facilitate sufficient annotation, this situation is not always the case, particularly if mutations are present in important functional residues. When only distant relationships are available, the transfer of function information is more tenuous, and the likelihood of encountering several well annotated proteins with different functions is increased. The consequence for a researcher is a range of candidate functions with little way of knowing which, if any, are correct. Here, we address the problem directly by introducing a computational approach to accurately identify and segregate related proteins into those with a functional similarity and those where function differs. This approach should find a wide range of applications, including the interpretation of genomics/proteomics data and the prioritization of targets for high-throughput structure determination. The method is generic, but here we concentrate on enzymes and apply high-quality catalytic site data. In addition to providing a series of comprehensive benchmarks to show the overall performance of our approach, we illustrate its utility with specific examples that include the correct identification of haptoglobin as a nonenzymatic relative of trypsin, discrimination of acid-d-amino acid ligases from a much larger ligase pool, and the successful annotation of BioH, a structural genomics target.
Latest similar papers:
Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan.
SUMMARY: Many data manipulation processes involve the use of programming libraries. These processes may beneficially be automated due to their repeated use. A convenient type of automation is in the form of workflows that also allow such processes to be shared amongst the community. The Taverna workflow system has been extended to enable it to use and invoke Java classes and methods as tasks within Taverna workflows. These classes and methods are selected for use during workflow construction by a Java Doclet application called the API Consumer. The API Consumer generates an XML file that enables Taverna to select a subset of Java classes and methods for use in the composition of Taverna workflows. The ability of Taverna to invoke Java classes and methods is demonstrated by a workflow in which we use libSBML to map gene expression data onto a metabolic pathway represented as an SBML model. AVAILABILITY: Taverna and the API Consumer application can be freely downloaded from http://taverna.sourceforge.net. CONTACT: peter.li@manchester.ac.uk SUPPLEMENTARY INFORMATION: Supplementary material and documentation are available from http://www.mcisb.org/software/taverna/libsbml/index.html.
Molecular Modeling Group, Organic Chemical Sciences, Indian Institute of Chemical Technology, Hyderabad 500007, India.
The cation-aromatic database (CAD) is a publicly available web-based database that aims to further the understanding of cation-pi interactions. A tool to identify these interactions in a user-supplied protein is also included in the database. CAD is freely accessible via the Internet at http://203.199.182.73/gnsmmg/databases/cad/ Proteins 2007. (c) 2007 Wiley-Liss, Inc.
