Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, Pawinskiego 5a, 02-106 Warsaw, Poland. D.Plewczynski@icm.edu.pl
We present here the random forest supervised machine learning algorithm applied to flexible docking results from five typical virtual high throughput screening (HTS) studies. Our approach is aimed at: i) reducing the number of compounds to be tested experimentally against the given protein target and ii) extending results of flexible docking experiments performed only on a subset of a chemical library in order to select promising inhibitors from the whole dataset. The random forest (RF) method is applied and tested here on compounds from the MDL drug data report (MDDR). The recall values for the five selected diverse protein targets are over 90% and the performance reaches 100%. This machine learning method combined with flexible docking is capable of finding 60% of the active compounds for most protein targets by docking only 10% of screened ligands. Therefore our in silico approach is able to scan very large databases rapidly in order to predict biological activity of small molecule inhibitors and provides an effective alternative to more computationally demanding methods in virtual HTS.
Latest citations:
J Mol Recognit. ;24 (2):149-64
21360606
Cit:2
Medicinal Chemistry and Drug Action, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC 3052, Australia. Elizabeth.Yuriev@pharm.monash.edu.au
Docking is a computational technique that places a small molecule (ligand) in the binding site of its macromolecular target (receptor) and estimates its binding affinity. This review addresses methodological developments that have occurred in the docking field in 2009, with a particular focus on the more difficult, and sometimes controversial, aspects of this promising computational discipline. These developments aim to address the main challenges of docking: receptor representation (such aspects as structural waters, side chain protonation, and, most of all, flexibility (from side chain rotation to domain movement)), ligand representation (protonation, tautomerism and stereoisomerism, and the effect of input conformation), as well as accounting for solvation and entropy of binding. This review is strongly focused on docking advances in the context of drug design, specifically in virtual screening and fragment-based drug design.
Other papers by authors:
BMC Bioinformatics. 2006 ;7 :53
16460560
Cit:13
Marcin von Grotthuss,
Dariusz Plewczynski,
Krzysztof Ginalski,
Leszek Rychlewski,
Eugene I Shakhnovich
Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA. mvg@paradox.harvard.edu
BACKGROUND: The number of protein structures from structural genomics centers is increasing dramatically in the Protein Data Bank (PDB). Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. RESULTS: Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationships. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. CONCLUSION: We suggest that the structure-based prediction of an EC number should be conducted with a different similarity score cutoff for each protein fold. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. AVAILABILITY: http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
CMBI, NCMLS, Radboud University Nijmegen Medical Centre, Geert Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands.
The 'omics' revolution is causing a flurry of data that all needs to be annotated for it to become useful. Sequences of proteins of unknown function can be annotated with a putative function by comparing them with proteins of known function. This form of annotation is typically performed with BLAST or similar software. Structural genomics is nowadays also bringing us three dimensional structures of proteins with unknown function. We present here software that can be used when sequence comparisons fail to determine the function of a protein with known structure but unknown function. The software, called 3D-Fun, is implemented as a server that runs at several European institutes and is freely available for everybody at all these sites. The 3D-Fun servers accept protein coordinates in the standard PDB format and compare them with all known protein structures by 3D structural superposition using the 3D-Hit software. If structural hits are found with proteins with known function, these are listed together with their function and some vital comparison statistics. This is conceptually very similar in 3D to what BLAST does in 1D. Additionally, the superposition results are displayed using interactive graphics facilities. Currently, the 3D-Fun system only predicts enzyme function but an expanded version with Gene Ontology predictions will be available soon. The server can be accessed at http://3dfun.bioinfo.pl/ or at http://3dfun.cmbi.ru.nl/.
Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Warszawa, Poland. D.Plewczynski@icm.edu.pl
We present here a neural network-based method for detection of signal peptides (abbreviation used: SP) in proteins. The method is trained on sequences of known signal peptides extracted from the Swiss-Prot protein database and is able to work separately on prokaryotic and eukaryotic proteins. A query protein is dissected into overlapping short sequence fragments, and then each fragment is analyzed with respect to the probability of it being a signal peptide and containing a cleavage site. While the accuracy of the method is comparable to that of other existing prediction tools, it provides a significantly higher speed and portability. The accuracy of cleavage site prediction reaches 73% on heterogeneous source data that contains both prokaryotic and eukaryotic sequences while the accuracy of discrimination between signal peptides and non-signal peptides is above 93% for any source dataset. As a consequence, the method can be easily applied to genome-wide datasets. The software can be downloaded freely from http://rpsp.bioinfo.pl/RPSP.tar.gz.
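The fragment dissection step described in this abstract can be illustrated with a sliding window. This is only a minimal sketch: the function name, the window width, the step size and the example sequence are all hypothetical, since the abstract does not give the fragment length or overlap used by the authors.

```python
def overlapping_fragments(sequence, width, step=1):
    """Yield (start, fragment) for every window of `width` residues.

    `width` and `step` are illustrative parameters; the abstract does not
    state the values used by the published method.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    for start in range(0, len(sequence) - width + 1, step):
        yield start, sequence[start:start + width]


# Hypothetical 10-residue query, windows of 4 residues shifted by 2.
fragments = list(overlapping_fragments("MKTAYIAKQR", width=4, step=2))
for start, fragment in fragments:
    print(start, fragment)
```

In a full predictor each fragment would then be scored by the trained network; that scoring step is not shown here.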
Cell Cycle. 2008 Feb ;7 (4):542-4
18235229
Cit:1
Lukasz Knizewski,
Kamil Steczkiewicz,
Krzysztof Kuchta,
Lucjan Wyrwicz,
Dariusz Plewczynski,
Andrzej Kolinski,
Leszek Rychlewski,
Krzysztof Ginalski
We present here the recent update of AutoMotif Server (AMS 2.0) that predicts post-translational modification sites in protein sequences. The support vector machine (SVM) algorithm was trained on data gathered in 2007 from various sets of proteins containing experimentally verified chemical modifications of proteins. Short sequence segments around a modification site were dissected from a parent protein, and represented in the training set as binary or profile vectors. The updated efficiency of the SVM classification for each type of modification and the predictive power of both representations were estimated using leave-one-out tests for a model of general phosphorylation and for modifications catalyzed by several specific protein kinases. The accuracy of the method was improved in comparison to the previous version of the service (Plewczynski et al.,"AutoMotif server: prediction of single residue post-translational modifications in proteins", Bioinformatics 21: 2525-7, 2005). The precision of the updated version reached over 90% for selected types of phosphorylation and was optimized at the cost of a lower recall of the classification model. The AutoMotif Server version 2007 is freely available at http://ams2.bioinfo.pl/. Additionally, the reference dataset for optimization of prediction of phosphorylation sites, collected from UniProtKB, is also provided and can be accessed at http://ams2.bioinfo.pl/data/.
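The binary representation mentioned in this abstract can be sketched as a one-hot encoding of the residues flanking a candidate site. The flank length, the padding character, the alphabet ordering and the function names below are assumptions made for illustration; the abstract does not specify them, and the profile representation is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; ordering is arbitrary


def segment_around(sequence, site, flank, pad="X"):
    """Return the 2*flank+1 residue window centred on index `site`.

    Positions beyond the termini are filled with `pad` (an assumed convention).
    """
    padded = pad * flank + sequence + pad * flank
    return padded[site:site + 2 * flank + 1]


def binary_vector(segment):
    """One-hot encode a segment: 20 bits per position, all zero for padding."""
    vector = []
    for residue in segment:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Hypothetical protein with a serine at index 3 as the candidate site.
segment = segment_around("MKTSYIAK", site=3, flank=2)
vector = binary_vector(segment)
print(segment, len(vector), sum(vector))
```

Vectors of this kind would serve as the training input to an SVM; any SVM library could be used for that step.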
Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Pawinskiego 5a Street, 02-106 Warsaw, Poland.
A structure-based in silico virtual drug discovery procedure was assessed with severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein-ligand complexes selected from the Protein Data Bank based on structural similarity to the target protein. Later, the set of compounds was ranked by docking scores using an Electronic High-Throughput Screening flexible docking procedure to select the most promising molecules. The set of best performing compounds was then used for similarity search over the 1 million entries in the Ligand.Info Meta-Database. Selected molecules having a close structural relationship to 2-methyl-2,4-pentanediol may provide candidate lead compounds toward the development of novel allosteric severe acute respiratory syndrome protease inhibitors.
Dariusz Plewczynski,
Marcin von Grotthuss,
Stephane A H Spieser,
Leszek Rychlewski,
Lucjan S Wyrwicz,
Krzysztof Ginalski,
Uwe Koch
BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznan, Poland. darman@bioinfo.pl.
In many cases at the beginning of an HTS-campaign, some information about active molecules is already available. Often known active compounds (such as substrate analogues, natural products, inhibitors of a related protein or ligands published by a pharmaceutical company) are identified in low-throughput validation studies of the biochemical target. In this study we evaluate the effectiveness of a support vector machine applied to those compounds and used to classify a collection with unknown activity. This approach was aimed at reducing the number of compounds to be tested against the given target. Our method predicts the biological activity of chemical compounds based only on atom pair (AP) two-dimensional topological descriptors. The supervised support vector machine (SVM) method herein is trained on compounds from the MDL drug data report (MDDR) known to be active against specific protein targets. For detailed analysis, five different biological targets were selected including cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV-reverse transcriptase and antagonists of the estrogen receptor. The accuracy of compound identification was estimated using the recall and precision values. The sensitivities for all protein targets exceeded 80% and the classification performance reached 100% for selected targets. In another application of the method, we addressed the absence of an initial set of active compounds for a selected protein target at the beginning of an HTS-campaign. In such a case, virtual high-throughput screening (vHTS) is usually applied by using a flexible docking procedure. However, the vHTS experiment typically contains a large percentage of false positives that should be verified by costly and time-consuming experimental follow-up assays.
The subsequent use of our machine learning method was found to improve the speed (since the docking procedure was not required for all compounds from the database) and also the accuracy of the HTS hit lists (the enrichment factor).
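The evaluation measures named in this abstract (recall, precision and the enrichment factor) follow from simple counts. The sketch below uses the standard textbook definitions; the screening counts in the example are invented for illustration and are not results from the paper.

```python
def recall(true_positives, false_negatives):
    """Fraction of all active compounds that the classifier recovers."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Fraction of predicted actives that are truly active."""
    return true_positives / (true_positives + false_positives)


def enrichment_factor(actives_selected, n_selected, actives_total, n_library):
    """Hit rate in the selected subset divided by the hit rate of the library."""
    return (actives_selected * n_library) / (n_selected * actives_total)


# Hypothetical screen: 10,000 compounds with 100 actives; the model selects
# 1,000 compounds, 60 of which are active.
r = recall(true_positives=60, false_negatives=40)
p = precision(true_positives=60, false_positives=940)
ef = enrichment_factor(60, 1000, 100, 10000)
print(r, p, ef)
```

An enrichment factor above 1 means the selected subset is richer in actives than a random draw of the same size would be.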
Department of Biochemistry and Howard Hughes Medical Institute, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9038, USA.
Meta-BASIC (http://basic.bioinfo.pl) is a novel sensitive approach for recognition of distant similarity between proteins based on consensus alignments of meta profiles. Specifically, Meta-BASIC compares sequence profiles combined with predicted secondary structure by utilizing several scoring systems and alignment algorithms. In our benchmarking tests, Meta-BASIC outperforms many individual servers, including fold recognition servers, and it can compete with meta predictors that base their strength on the structural comparison of models. In addition, Meta-BASIC, which enables detection of very distant relationships even if the tertiary structure for the reference protein is not known, has a high-throughput capability. This new method is applied to 860 PfamA protein families with unknown function (DUF) and provides many novel structure-functional assignments available on-line at http://basic.bioinfo.pl/duf.pl. Detailed discussion is provided for two of the most interesting assignments. DUF271 and DUF431 are predicted to be a nucleotide-diphospho-sugar transferase and an alpha/beta-knot SAM-dependent RNA methyltransferase, respectively.
Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University, Warszawa, Poland.
We present here a simple method for fast and accurate comparison of proteins using their structures. The algorithm is based on structural alignment of segments of Calpha chains (with size of 99 or 199 residues). The method is optimized in terms of speed and accuracy. We test it on 97 representative proteins with the similarity measure based on the SCOP classification. We compare our algorithm with the LGscore2 automatic method. Our method has the same accuracy as the LGscore2 algorithm with much faster processing of the whole test set, which is promising. A second test is done using the ToolShop structure prediction evaluation program and shows that our tool is on average slightly less sensitive than the DALI server. Both algorithms give a similar number of correct models, however, the final alignment quality is better in the case of DALI. Our method was implemented under the name 3D-Hit as a web server at http://3dhit.bioinfo.pl/ free for academic use, with a weekly updated database containing a set of 5000 structures from the Protein Data Bank with non-homologous sequences.
BioInfoBank Institute, Poznan, Poland.
In CASP5, the BioInfo.PL group has used the structure prediction Meta Server and the associated newly developed flexible meta-predictor, called 3D-Jury, as the main structure prediction tools. The most important feature of the meta-predictor is a high (86%) correlation between the reported confidence score and the quality of the selected model. The Gene Relational Database (GRDB) was used to confirm the fold recognition results by selecting distant homologues and subsequent structure prediction with the Meta Server. A fragment-splicing procedure was performed as a final processing step with large fragments extracted from selected models using model quality control provided by Verify3D. The comparison of submitted models with the native structure conducted after the CASP meeting showed that the GRDB-supported structure prediction led to a satisfactory template fold selection, whereas the fragment-splicing procedure must be improved in the future.
Latest similar papers:
J Cheminform. 2012 May 15;4 (1):10
22587596
ABSTRACT: BACKGROUND: Experimental screening of chemical compounds for biological activity is a time consuming and expensive practice. In silico predictive models permit inexpensive, rapid "virtual screening" to prioritize selection of compounds for experimental testing. Both experimental and in silico screening can be used to test compounds for desirable or undesirable properties. Prior work on prediction of mutagenicity has primarily involved identification of toxicophores rather than whole-molecule predictive models. In this work, we examined a range of in silico predictive classification models for prediction of mutagenic properties of compounds, including methods such as J48 and SMO which have not previously been widely applied in cheminformatics. RESULTS: The Bursi mutagenicity data set containing 4337 compounds (Set 1) and a Benchmark data set of 6512 compounds (Set 2) were taken as input data sets in this work. A third data set (Set 3) was prepared by joining up the previous two sets. Classification algorithms including Naive Bayes, Random Forest, J48 and SMO with 10 fold cross-validation and default parameters were used for model generation on these data sets. Models built using the combined data set performed better than those developed from the Benchmark data set. Significantly, Random Forest outperformed other classifiers for all the data sets, especially for Set 3 with 89.27% accuracy, 89% precision and ROC of 95.3%. To validate the developed models two external data sets, AID1189 and AID1194, with mutagenicity data were tested showing 62% accuracy with 67% precision and 65% ROC area and 91% accuracy, 91% precision with 96.3% ROC area respectively. A Random Forest model was applied to the approved drugs from DrugBank and metabolites from the Zinc Database, with a true positive rate of almost 85%, showing the robustness of the model. CONCLUSION: We have created a new mutagenicity benchmark data set with around 8,000 compounds.
Our work shows that highly accurate predictive mutagenicity models can be built using machine learning methods based on chemical descriptors and trained using this set, and these models provide a complement to toxicophore-based methods. Further, our work supports other recent literature in showing that Random Forest models generally outperform other comparable machine learning methods for this kind of application.
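The 10 fold cross-validation mentioned in this abstract partitions the data so that every compound is held out exactly once. The sketch below shows only the index bookkeeping, using the Set 1 size quoted in the abstract; the function name, the shuffling seed and the fold assignment scheme are assumptions, and the toolkit used by the authors may split the data differently (for example with stratification).

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed for repeatability
    folds = [indices[i::k] for i in range(k)]   # k near-equal, disjoint folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 4337 is the size of Set 1 quoted in the abstract.
splits = list(k_fold_indices(4337, k=10))
for train, test in splits:
    print(len(train), len(test))
```

A classifier would be trained on each `train` list and scored on the matching `test` list, and the ten scores averaged.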
J Pharm Biomed Anal. 2012 Mar 29;:
22502908
Angela De Simone,
Francesca Mancini,
Sandro Cosconati,
Luciana Marinelli,
Valeria La Pietra,
Ettore Novellino,
Vincenza Andrisano
Department of Drug Discovery and Development, Istituto Italiano di Tecnologia, Via Morego 30, 16163 Genova, Italy.
In the present work, a human recombinant BACE1 immobilized enzyme reactor (hrBACE1-IMER) has been applied for the sensitive fast screening of 38 compounds selected through a virtual screening approach. HrBACE1-IMER was inserted into a liquid chromatograph coupled with a fluorescent detector. A fluorogenic peptide substrate (M-2420), containing the β-secretase site of the Swedish mutation of APP, was injected and cleaved in the on-line HPLC-hrBACE1-IMER system, giving rise to the fluorescent product. The compounds of the library were tested for their ability to inhibit BACE1 in the immobilized format and to reduce the area related to the chromatographic peak of the fluorescent enzymatic product. The results were validated in solution by using two different FRET methods. Due to the efficient virtual screening methodology, more than fifty percent of the selected compounds showed a measurable inhibitory activity. One of the most active compounds (a bis-indanone derivative) was characterized in terms of IC(50) and K(i) determination on the hrBACE1-IMER. Thus, the hrBACE1-IMER has been confirmed as a valid tool for the throughput screening of different chemical entities with potency lower than 30 μM, for fast hit selection and for mode of action determination.
J Chem Inf Model. 2012 Mar 21;:
22435959
Thomas Scior,
Gary Tresadern,
Andreas Bender,
Jose Luis Medina-Franco,
Karina Martinez-Mayorga,
Thierry Langer,
Karina Cuanalo-Contreras,
Dimitris K Agrafiotis
The aim of virtual screening (VS) is to identify bioactive compounds through computational means, by employing knowledge about the protein target (structure-based VS) or known bioactive ligands (ligand-based VS). In VS, a large database of molecules is ranked according to their likelihood to be active against a given protein target, with the hope that the top fraction of the resulting list is enriched in bioactive compounds. At its core, VS attempts to improve the odds of identifying bioactive molecules by maximizing the true positive rate, that is, by ranking the truly active molecules as high as possible (and, correspondingly, the truly inactive ones as low as possible). In choosing the right approach, the researcher is faced with many questions: where does the optimal balance between efficiency and accuracy lie when evaluating a particular algorithm; do some methods perform better than others and in what particular situations; and what do retrospective results tell us about the prospective utility of a particular method? Given the multitude of settings, parameters, and data sets the practitioner can choose from, there are many pitfalls that lurk along the way which might render VS less efficient or downright useless. This review attempts to catalog published and unpublished problems, shortcomings, failures, and technical traps, so that the readers can avoid them in their own applications of VS.
Lab for Aging Research State Key Laboratory of Biotherapy and Cancer Center West China Hospital, Sichuan University, 1 Keyuan 4 Road, Gaopeng Avenue, Chengdu 610041 China. Hengyix@scu.edu.cn.
Aging and its related diseases are severe issues in modern society. Many efforts have been made to understand the mechanisms of aging and to find ways of preventing age-related diseases. Identifying compounds that target aging-related signals is challenging because so many proteins and signals are involved. Recently, along with progress in high throughput screening (HTS) technology, more and more small molecules targeting aging-related pathologic processes have been identified. In this review, we introduce the basic workflow, classification and assay strategies of HTS technology, and sort known small molecules identified via HTS technology by their roles in aging-related diseases, such as neurodegenerative diseases, diabetes and tumors. Given that the application of HTS to aging research is still at an early stage, we summarize the cellular mechanisms of the aging process, in parallel with the compounds that can modulate the functions of proteins important for aging signals. Finally, we briefly discuss some advanced HTS technologies and their potential applications in the discovery of anti-aging compounds. The main purpose of this review is to provide updated and useful information to those who are interested in pharmacology and HTS technology, but not familiar with aging biology, or vice versa.
Jason Wang,
Stephen Griesmer,
Miguel Cervantes-Cervantes,
Stephen J Griesmer,
Yang Song,
Jason T L Wang
Bioinformatics Program and Department of Computer Science, New Jersey Institute of Technology, Newark, New Jersey 07102, USA.
We propose here a new approach for ncRNA prediction. Our approach selects features derived from RNA folding programs and ranks these features using a class separation method that measures the ability of the features to differentiate between positive and negative classes. The target feature set comprising top-ranked features is then used to construct several classifiers with different supervised learning algorithms. These classifiers are compared to the same supervised learning algorithms with the baseline feature set employed in a state-of-the-art method. Experimental results based on ncRNA families taken from the Rfam database demonstrate the good performance of the proposed approach.
BMC Res Notes. 2011 ;4 :504
22099929
GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi - 110007, India. jaleel.uc@gmail.com.
UNLABELLED ABSTRACT: BACKGROUND Tuberculosis is a contagious disease caused by Mycobacterium tuberculosis (Mtb), affecting more than two billion people around the globe and is one of the major causes of morbidity and mortality in the developing world. Recent reports suggest that Mtb has been developing resistance to the widely used anti-tubercular drugs resulting in the emergence and spread of multi drug-resistant (MDR) and extensively drug-resistant (XDR) strains throughout the world. In view of this global epidemic, there is an urgent need to facilitate fast and efficient lead identification methodologies. Target based screening of large compound libraries has been widely used as a fast and efficient approach for lead identification, but is restricted by the knowledge about the target structure. Whole organism screens on the other hand are target-agnostic and have been now widely employed as an alternative for lead identification but they are limited by the time and cost involved in running the screens for large compound libraries. This could possibly be circumvented by using computational approaches to prioritize molecules for screening programmes. RESULTS We utilized physicochemical properties of compounds to train four supervised classifiers (Naïve Bayes, Random Forest, J48 and SMO) on three publicly available bioassay screens of Mtb inhibitors and validated the robustness of the predictive models using various statistical measures. CONCLUSIONS This study is a comprehensive analysis of high-throughput bioassay data for anti-tubercular activity and the application of machine learning approaches to create target-agnostic predictive models for anti-tubercular agents.
Mol Divers. 2011 Sep 27;:
21947759
TargetEx, Kápolna köz 4/a, Dunakeszi, 2120, Hungary.
Rapid in silico selection of target focused libraries from commercial repositories is an attractive and cost-effective approach when starting new drug discovery projects. If structures of active compounds are available, a rapid 2D similarity search can be performed on databases of multiple millions of compounds. This in silico approach can be combined with physico-chemical parameter filtering based on the property space of the active compounds and 3D virtual screening if the structure of the target protein is available. A multi-step virtual screening procedure was developed and applied to select potential phosphodiesterase 5 (PDE5) inhibitors in real time. The combined 2D/3D in silico method resulted in the identification of 14 novel PDE5 inhibitors with <1 μM IC(50) values, and the hit rate in the second in silico selection and in vitro screening round exceeded 20%.
Artif Intell Med. 2011 Sep 19;:
21937203
Department of Computer Science, The University of Texas at Dallas, 800 W. Campbell Road, Richardson, TX 75080, USA.
OBJECTIVES: This paper explores the use of an automated method for analyzing narratives of monolingual English speaking children to accurately predict the presence or absence of a language impairment. The goal is to exploit corpus-based approaches inspired by the fields of natural language processing and machine learning. METHODS AND MATERIALS: We extract a large variety of features from language samples and use them to train language models and well known machine learning algorithms as the underlying predictors. The methods are evaluated on two different datasets and three language tasks. One dataset contains samples of two spontaneous narrative tasks performed by 118 children with an average age of 13 years and a second dataset contains play sessions from over 600 younger children with an average age of 6 years. RESULTS: We compare results against a cut off baseline method and show that our results are far superior, reaching F-measures of over 85% in two of the three language tasks, and 48% in the third one. CONCLUSIONS: The different experiments we present here show that corpus based approaches can yield good prediction results in the problem of language impairment detection. These findings warrant further exploration of natural language processing techniques in the field of communication disorders. Moreover, the proposed framework can be easily adapted to analyze samples in languages other than English since most of the features are language independent or can be customized with little effort.
Popul Health Metr. 2011 ;9 :29
21816105
Cit:7
Institute for Health Metrics and Evaluation, University of Washington, 2301 Fifth Ave, Suite 600, Seattle, WA 98121, USA. abie@uw.edu.
UNLABELLED ABSTRACT: BACKGROUND Computer-coded verbal autopsy (CCVA) is a promising alternative to the standard approach of physician-certified verbal autopsy (PCVA), because of its high speed, low cost, and reliability. This study introduces a new CCVA technique and validates its performance using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 verbal autopsies (VAs). METHODS The Random Forest (RF) Method from machine learning (ML) was adapted to predict cause of death by training random forests to distinguish between each pair of causes, and then combining the results through a novel ranking technique. We assessed quality of the new method at the individual level using chance-corrected concordance and at the population level using cause-specific mortality fraction (CSMF) accuracy as well as linear regression. We also compared the quality of RF to PCVA for all of these metrics. We performed this analysis separately for adult, child, and neonatal VAs. We also assessed the variation in performance with and without household recall of health care experience (HCE). RESULTS For all metrics, for all settings, RF was as good as or better than PCVA, with the exception of a nonsignificantly lower CSMF accuracy for neonates with HCE information. With HCE, the chance-corrected concordance of RF was 3.4 percentage points higher for adults, 3.2 percentage points higher for children, and 1.6 percentage points higher for neonates. The CSMF accuracy was 0.097 higher for adults, 0.097 higher for children, and 0.007 lower for neonates. Without HCE, the chance-corrected concordance of RF was 8.1 percentage points higher than PCVA for adults, 10.2 percentage points higher for children, and 5.9 percentage points higher for neonates. The CSMF accuracy was higher for RF by 0.102 for adults, 0.131 for children, and 0.025 for neonates. 
CONCLUSIONS We found that our RF Method outperformed the PCVA method in terms of chance-corrected concordance and CSMF accuracy for adult and child VA with and without HCE and for neonatal VA without HCE. It is also preferable to PCVA in terms of time and cost. Therefore, we recommend it as the technique of choice for analyzing past and current verbal autopsies.
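The population-level metric quoted in this abstract, CSMF accuracy, is not defined in the abstract itself. The sketch below assumes the definition commonly used in the verbal autopsy literature: one minus the summed absolute error in cause-specific mortality fractions, divided by the largest error possible, 2(1 - min true CSMF). The cause names and fractions are invented for illustration.

```python
def csmf_accuracy(true_fractions, predicted_fractions):
    """CSMF accuracy under the assumed definition.

    Both arguments map cause name -> mortality fraction and must share
    the same set of causes.
    """
    if set(true_fractions) != set(predicted_fractions):
        raise ValueError("both inputs must cover the same causes")
    total_abs_error = sum(
        abs(true_fractions[c] - predicted_fractions[c]) for c in true_fractions
    )
    worst_case = 2.0 * (1.0 - min(true_fractions.values()))
    return 1.0 - total_abs_error / worst_case


# Hypothetical three-cause population.
truth = {"cause_a": 0.5, "cause_b": 0.3, "cause_c": 0.2}
estimate = {"cause_a": 0.4, "cause_b": 0.4, "cause_c": 0.2}
score = csmf_accuracy(truth, estimate)
print(round(score, 3))
```

Under this definition a score of 1 means the estimated fractions match the truth exactly, and 0 corresponds to the worst possible estimate.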
Nicole C Kleinstreuer,
Richard S Judson,
David M Reif,
Nisha S Sipes,
Amar V Singh,
Kelly J Chandler,
Rob Dewoskin,
David J Dix,
Robert J Kavlock,
Thomas B Knudsen
National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, Research Triangle Park, North Carolina, USA. kleinstreuer.nicole@epa.gov
BACKGROUND Understanding health risks to embryonic development from exposure to environmental chemicals is a significant challenge given the diverse chemical landscape and paucity of data for most of these compounds. High-throughput screening (HTS) in the U.S. Environmental Protection Agency (EPA) ToxCast™ project provides vast data on an expanding chemical library currently consisting of > 1,000 unique compounds across > 500 in vitro assays in phase I (complete) and Phase II (under way). This public data set can be used to evaluate concentration-dependent effects on many diverse biological targets and build predictive models of prototypical toxicity pathways that can aid decision making for assessments of human developmental health and disease. OBJECTIVE We mined the ToxCast phase I data set to identify signatures for potential chemical disruption of blood vessel formation and remodeling. METHODS ToxCast phase I screened 309 chemicals using 467 HTS assays across nine assay technology platforms. The assays measured direct interactions between chemicals and molecular targets (receptors, enzymes), as well as downstream effects on reporter gene activity or cellular consequences. We ranked the chemicals according to individual vascular bioactivity score and visualized the ranking using ToxPi (Toxicological Priority Index) profiles. RESULTS Targets in inflammatory chemokine signaling, the vascular endothelial growth factor pathway, and the plasminogen-activating system were strongly perturbed by some chemicals, and we found positive correlations with developmental effects from the U.S. EPA ToxRefDB (Toxicological Reference Database) in vivo database containing prenatal rat and rabbit guideline studies. 
We observed distinctly different correlative patterns for chemicals with effects in rabbits versus rats, despite derivation of in vitro signatures based on human cells and cell-free biochemical targets, implying conservation but potentially differential contributions of developmental pathways among species. Follow-up analysis with antiangiogenic thalidomide analogs and additional in vitro vascular targets showed in vitro activity consistent with the most active environmental chemicals tested here. CONCLUSIONS We predicted that blood vessel development is a target for environmental chemicals acting as putative vascular disruptor compounds (pVDCs) and identified potential species differences in sensitive vascular developmental pathways.