Latest Paper:
Nat Biotechnol. 2012;30(4):344-8
22334048
Nanopore Group, Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA.
An emerging DNA sequencing technique uses protein or solid-state pores to analyze individual strands as they are driven in single-file order past a nanoscale sensor. However, uncontrolled electrophoresis of DNA through these nanopores is too fast for accurate base reads. Here, we describe forward and reverse ratcheting of DNA templates through the α-hemolysin nanopore controlled by phi29 DNA polymerase without the need for active voltage control. DNA strands were ratcheted through the pore at median rates of 2.5-40 nucleotides per second and were examined at one nucleotide spatial precision in real time. Up to 500 molecules were processed at ∼130 molecules per hour through one pore. The probability of a registry error (an insertion or deletion) at individual positions during one pass along the template strand ranged from 10% to 24.5% without optimization. This strategy facilitates multiple reads of individual strands and is transferable to other nanopore devices for implementation of DNA sequence analysis.
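The abstract gives a per-position registry error probability of 10% to 24.5% for a single pass and notes that the method allows multiple reads of the same strand. The sketch below is a rough illustration only. It assumes, without support from the paper, that errors at a given position are independent across passes and that a position counts as recovered when a strict majority of passes have no registry error there. Both the function name and the majority rule are hypothetical.

```python
from math import comb


def prob_majority_error_free(p_error: float, n_passes: int) -> float:
    """Probability that a strict majority of n passes show no registry error
    at one template position, assuming errors are independent across passes
    (an illustrative assumption, not a result from the paper)."""
    p_ok = 1.0 - p_error
    need = n_passes // 2 + 1  # smallest strict majority
    return sum(comb(n_passes, k) * p_ok**k * p_error**(n_passes - k)
               for k in range(need, n_passes + 1))


# Per-pass error probabilities taken from the range quoted in the abstract.
for p in (0.10, 0.245):
    for n in (1, 3, 5):
        print(f"p_error={p:.3f}, passes={n}: {prob_majority_error_free(p, n):.3f}")
```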
Mol Syst Biol. 2011;7:539
21988835
Cit:1
Fabian Sievers,
Andreas Wilm,
David Dineen,
Toby J Gibson,
Kevin Karplus,
Weizhong Li,
Rodrigo Lopez,
Hamish McWilliam,
Michael Remmert,
Johannes Söding,
Julie D Thompson,
Desmond G Higgins
School of Medicine and Medical Science, UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland.
Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to existing alignments and for exploiting the information they contain, making use of the vast amount of precomputed information in public databases like Pfam.
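Progressive alignment, the heuristic referred to above, first builds a guide tree from pairwise sequence distances and then aligns the sequences in the order given by that tree. The sketch below shows only that first step in a textbook form: a simple k-mer distance (one of several k-tuple variants) followed by UPGMA clustering. It is not Clustal Omega's own guide-tree procedure, which is designed to scale to far larger data sets. The toy sequences and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """k-mer distance: 1 - shared k-mers / k-mers in the shorter sequence."""
    ca = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    cb = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    shared = sum((ca & cb).values())
    denom = min(sum(ca.values()), sum(cb.values()))
    return 1.0 - shared / denom if denom else 1.0


def upgma(names, dist):
    """Build a guide tree (nested tuples) by repeatedly merging the closest pair
    of clusters; distances to a merged cluster are size-weighted averages."""
    sizes = {n: 1 for n in names}
    d = {frozenset(p): dist[p[0]][p[1]] for p in combinations(names, 2)}
    while len(sizes) > 1:
        pair = min(d, key=d.get)
        x, y = sorted(pair, key=repr)  # fixed order so the output is reproducible
        merged = (x, y)
        nx, ny = sizes.pop(x), sizes.pop(y)
        del d[pair]
        for z in sizes:
            dxz = d.pop(frozenset((x, z)))
            dyz = d.pop(frozenset((y, z)))
            d[frozenset((merged, z))] = (nx * dxz + ny * dyz) / (nx + ny)
        sizes[merged] = nx + ny
    return next(iter(sizes))


seqs = {"s1": "MKVLAAGIVG", "s2": "MKVLAAGLVG", "s3": "MSTNPKPQRK", "s4": "MSTNPKPQRQ"}
names = list(seqs)
dist = {a: {b: kmer_distance(seqs[a], seqs[b]) for b in names} for a in names}
print(upgma(names, dist))  # (('s1', 's2'), ('s3', 's4'))
```

A progressive aligner would then align s1 with s2 and s3 with s4, and finally align the two resulting profiles.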
Department of Biomolecular Engineering, University of California at Santa Cruz, Santa Cruz, CA 95064, USA.
We report the identification and characterization of a previously unidentified protein domain found in bacterial chemoreceptors and other bacterial signal transduction proteins. This domain contains a motif of three noncontiguous histidines and one cysteine, arranged as Hxx[WFYL]x(21-28)Cx[LFMVI]Gx[WFLVI]x(18-27)HxxxH (boldface type indicates residues that are nearly 100% conserved). This domain was first identified in the soluble Helicobacter pylori chemoreceptor TlpD. Using inductively coupled plasma mass spectrometry on heterologously and natively expressed TlpD, we determined that this domain binds zinc with a subfemtomolar dissociation constant. We thus named the domain CZB, for chemoreceptor zinc binding. Further analysis showed that many bacterial signaling proteins contain the CZB domain, most commonly proteins that participate in chemotaxis but also those that participate in c-di-GMP signaling and nitrate/nitrite sensing, among others. Proteins bearing the CZB domain are found in several bacterial phyla. The variety of signaling proteins using the CZB domain suggests that it plays a critical role in several signal transduction pathways.
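The CZB motif is written in a PROSITE-like notation: x stands for any residue, and x(21-28) stands for a run of 21 to 28 arbitrary residues. The sketch below scans a protein sequence for the motif with a Python regular expression. The function name and test sequence are hypothetical. Because the boldface marking of the nearly invariant residues did not survive here, the pattern weights every position equally.

```python
import re

# The motif from the abstract translated to a regex:
# "x" -> ".", "x(21-28)" -> ".{21,28}", bracketed sets are kept as character classes.
CZB_MOTIF = re.compile(r"H..[WFYL].{21,28}C.[LFMVI]G.[WFLVI].{18,27}H...H")


def find_czb_motif(seq: str):
    """Return (start, end) of the first motif match in seq, or None."""
    m = CZB_MOTIF.search(seq.upper())
    return (m.start(), m.end()) if m else None


# Synthetic sequence built to satisfy the pattern (not a real protein).
toy = "H" + "AA" + "W" + "A" * 21 + "C" + "A" + "L" + "G" + "A" + "W" + "A" * 18 + "H" + "AAA" + "H"
print(find_czb_motif(toy))  # (0, 54)
```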
Department of Biomedical Engineering, Institute for Computational Medicine, Johns Hopkins University, Baltimore, MD 21218, USA. jsamayoa@jhu.edu
MOTIVATION Accurate prediction of genes encoding small proteins (on the order of 50 amino acids or less) remains an elusive open problem in bioinformatics. Some of the best methods for gene prediction use either sequence composition analysis or sequence similarity to a known protein coding sequence. These methods often fail for small proteins, however, either because of a lack of experimentally verified small protein coding genes or because statistics computed on short sequences have limited statistical significance. Our approach is based upon the hypothesis that true small proteins will be under selective pressure for encoding the particular amino acid sequence, for ease of translation by the ribosome and for structural stability. This stability can be achieved either independently or as part of a larger protein complex. Given this assumption, it follows that small proteins should display conserved local protein structure properties much like larger proteins. Our method incorporates neural-net predictions for three local structure alphabets within a comparative genomic approach, using a genomic alignment of 22 closely related bacterial genomes to predict whether or not a given open reading frame (ORF) encodes a small protein. RESULTS We have applied this method to the complete genome of Escherichia coli strain K12 and measured how well our method performed on a set of 60 experimentally verified small proteins from this organism. Out of a total of 11 407 possible ORFs, we found that 6 of the top 10 and 27 of the top 100 predictions belonged to the set of 60 experimentally verified small proteins. We found 35 of all the true small proteins within the top 200 predictions. We compared our method to Glimmer, using a default Glimmer protocol and a modified small ORF Glimmer protocol with a lower minimum size cutoff. The default Glimmer protocol identified 16 of the true small proteins (all in the top 200 predictions), but made no prediction for 34 of them because of size cutoffs. The small ORF Glimmer protocol made predictions for all the experimentally verified small proteins but placed only 9 of the 60 true small proteins within its top 200 predictions. CONTACT jsamayoa@jhu.edu
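The evaluation above counts how many experimentally verified small proteins appear among the top-ranked predictions. The sketch below shows that count, using hypothetical ORF identifiers in place of real predictions. By the same measure, 35 of 60 verified proteins in the top 200 corresponds to a recall of about 58%, compared with 9 of 60 (15%) for the small ORF Glimmer protocol.

```python
def hits_in_top_k(ranked_orfs, verified, k):
    """Number of the k highest-ranked ORFs that belong to the verified set."""
    return sum(1 for orf in ranked_orfs[:k] if orf in verified)


# Hypothetical toy ranking (best first) and verified set; not data from the paper.
ranked = ["orf7", "orf2", "orf9", "orf1", "orf5"]
verified = {"orf2", "orf5", "orf8"}
for k in (1, 3, 5):
    hits = hits_in_top_k(ranked, verified, k)
    print(f"top {k}: {hits} hits, recall {hits / len(verified):.2f}")
```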
Bioinformatics. 2010 Feb 3.
20130034
Cit:5
Department of Computer Science, Woodland Road, University of Bristol, BS8 1UB, UK.
MOTIVATION: Existing methods for protein sequence analysis are generally first-order and inherently assume that each position is independent. We develop a general framework for introducing longer-range interactions. We then demonstrate the power of our approach by applying it to secondary structure prediction; under the independence assumption, the secondary structure sequences predicted by existing methods can contain features that are not protein-like, an extreme example being a helix of length one. Our goal was to make the predictions of state-of-the-art methods more realistic without loss of performance by other measures. RESULTS: Our framework for longer-range interactions is described as a k-mer order model. We succeeded in applying our model to the specific problem of secondary structure prediction, as an additional layer on top of existing methods. We achieved our goal of making the predictions more realistic and protein-like, and remarkably this also improved the overall performance. We improve the SOV score by 1.8%, but more importantly we radically improve the probability of the real sequence given a prediction, from an average of 0.271 per residue to 0.385. Crucially, this improvement is obtained using no additional information. AVAILABILITY: http://supfam.cs.bris.ac.uk/kmer CONTACT: gough@cs.bris.ac.uk.
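The abstract's extreme example of a non-protein-like feature is a helix of length one. The sketch below only detects such features in a predicted secondary structure string (H = helix, E = strand, C = coil). It does not implement the k-mer order model itself. The function name and the default minimum length of 3 are illustrative assumptions.

```python
from itertools import groupby


def short_segments(ss: str, state: str = "H", min_len: int = 3):
    """Return (start, length) for every run of `state` shorter than min_len."""
    found = []
    pos = 0
    for s, run in groupby(ss):
        n = len(list(run))
        if s == state and n < min_len:
            found.append((pos, n))
        pos += n
    return found


print(short_segments("CCHCCHHHHCCEEC"))  # [(2, 1)]: a one-residue helix
```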
Proteins. 2009 Aug 12.
19768677
Cit:16
Elmar Krieger,
Keehyoung Joo,
Jinwoo Lee,
Jooyoung Lee,
Srivatsan Raman,
James Thompson,
Mike Tyka,
David Baker,
Kevin Karplus
Centre for Molecular and Biomolecular Informatics, Radboud University Nijmegen Medical Centre, The Netherlands.
A correct alignment is an essential requirement in homology modeling. Yet to bridge the structural gap between template and target, which may involve not only loop rearrangements but also shifts of secondary structure elements and repacking of core residues, high-resolution refinement methods with full atomic detail are needed. Here, we describe four approaches that address this "last mile of the protein folding problem" and performed well during CASP8, yielding physically realistic models. YASARA runs molecular dynamics simulations of models in explicit solvent, using a new, partly knowledge-based all-atom force field derived from Amber, whose parameters have been optimized to minimize the damage done to protein crystal structures. The LEE-SERVER makes extensive use of conformational space annealing to create alignments, to help Modeller build physically realistic models while satisfying input restraints from templates and CHARMM stereochemistry, and to remodel the side chains. ROSETTA's high-resolution refinement protocol combines a physically realistic all-atom force field with Monte Carlo minimization so that the large conformational space can be sampled quickly. Finally, UNDERTAKER creates a pool of candidate models from various templates and then optimizes them with an adaptive genetic algorithm, using a primarily empirical cost function that does not include bond-angle, bond-length, or other physics-like terms. Proteins 2009.(c) 2009 Wiley-Liss, Inc.
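Monte Carlo minimization, mentioned for ROSETTA above, alternates a random perturbation with a local energy minimization and then accepts or rejects the minimized result using the Metropolis criterion. The sketch below applies that generic scheme to a toy one-dimensional energy function. It is not Rosetta's protocol or force field, and the energy function, move size and temperature (kT) are hypothetical.

```python
import math
import random


def energy(x: float) -> float:
    """Toy rugged 1-D energy landscape (hypothetical stand-in for a force field)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def local_minimize(x: float, step: float = 0.01, iters: int = 200) -> float:
    """Plain gradient descent using a central-difference derivative."""
    for _ in range(iters):
        grad = (energy(x + 1e-5) - energy(x - 1e-5)) / 2e-5
        x -= step * grad
    return x


def monte_carlo_minimization(x0: float, n_moves: int = 200, move_size: float = 1.0,
                             kT: float = 0.5, seed: int = 0):
    """Perturb, minimize, then accept or reject with the Metropolis criterion."""
    rng = random.Random(seed)
    x = local_minimize(x0)
    e = energy(x)
    best_x, best_e = x, e
    for _ in range(n_moves):
        trial = local_minimize(x + rng.uniform(-move_size, move_size))
        e_trial = energy(trial)
        # Always accept downhill moves; accept uphill moves with Boltzmann probability.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


best_x, best_e = monte_carlo_minimization(4.0)
print(f"lowest energy found: {best_e:.3f} at x = {best_x:.3f}")
```

Minimizing before the acceptance test lets the search compare local minima rather than raw perturbed states, which is what makes this approach practical on rugged landscapes.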
Proteins. 2009 Jun 19.
19639637
Cit:6
Department of Biomolecular Engineering, University of California, Santa Cruz, California 95064.
Our group tested three quality assessment functions in CASP8: a function that used only distance constraints derived from alignments (SAM-T08-MQAO), a function that added other single-model terms to the distance constraints (SAM-T08-MQAU), and a function that used both single-model and consensus terms (SAM-T08-MQAC). We analyzed the functions both for ranking models for a single target and for producing an accurate estimate of GDT_TS. Our functions were optimized for the ranking problem, and so are perhaps more appropriate for metaserver applications than for providing trustworthiness estimates for single models. On the CASP8 test, the functions with more terms performed better. The MQAC consensus method was substantially better than either single-model function, and the MQAU function was substantially better than the MQAO function, which used only constraints from alignments. Proteins 2009.(c) 2009 Wiley-Liss, Inc.
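The consensus idea behind SAM-T08-MQAC can be illustrated in its simplest generic form: each model is scored by its average structural similarity to the other models submitted for the same target, on the premise that features shared by many models are more likely to be correct. The abstract does not give the actual terms or weights, so the sketch below is not the SAM-T08 function. The similarity matrix is made-up toy data; a real pipeline might use pairwise GDT_TS values.

```python
def consensus_scores(similarity):
    """Score model i by its mean similarity to every other model.
    similarity[i][j] is a pairwise model similarity in [0, 1]."""
    n = len(similarity)
    return [sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
            for i in range(n)]


# Hypothetical pairwise similarities for four models of one target.
sim = [
    [1.0, 0.8, 0.7, 0.2],
    [0.8, 1.0, 0.75, 0.25],
    [0.7, 0.75, 1.0, 0.3],
    [0.2, 0.25, 0.3, 1.0],
]
scores = consensus_scores(sim)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [1, 2, 0, 3]: the outlier model 3 ranks last
```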
Nucleic Acids Res. 2009 May 29.
19483096
Cit:12
Department of Biomolecular Engineering, Baskin School of Engineering, University of California, Santa Cruz, CA 95064, USA.
The SAM-T08 web server is a protein structure prediction server that provides several useful intermediate results in addition to the final predicted 3D structure: three multiple sequence alignments of putative homologs using different iterated search procedures, prediction of local structure features including various backbone and burial properties, calibrated E-values for the significance of template searches of PDB and residue-residue contact predictions. The server has been validated as part of the CASP8 assessment of structure prediction as having good performance across all classes of predictions. The SAM-T08 server is available at http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html.
Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA. firas@u.washington.edu
MOTIVATION: Our focus has been on detecting topological properties that are rare in real proteins, but occur more frequently in models generated by protein structure prediction methods such as Rosetta. We previously created the Knotfind algorithm, successfully decreasing the frequency of knotted Rosetta models during CASP6. We observed an additional class of knot-like loops that appeared to be equally un-protein-like and yet do not contain a mathematical knot. These topological features are commonly referred to as slip-knots and are caused by the same mechanisms that result in knotted models. Slip-knots are undetectable by the original Knotfind algorithm. We have generalized our algorithm to detect them, and analyzed CASP6 models built using the Rosetta loop modeling method. RESULTS: After analyzing known protein structures in the PDB, we found that slip-knots do occur in certain proteins, but are rare and fall into a small number of specific classes. Our group used this new Pokefind algorithm to distinguish between these rare real slip-knots and the numerous classes of slip-knots that we discovered in Rosetta models and models submitted by the various CASP7 servers. The goal of this work is to improve future models created by protein structure prediction methods. Both algorithms are able to detect un-protein-like features that current metrics such as GDT are unable to identify, so these topological filters can also be used as additional assessment tools.
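The abstract does not describe how Knotfind or Pokefind work internally. The sketch below shows one common way to test a backbone for knots, a triangle-elimination (KMT-style) reduction. An interior vertex is deleted whenever the triangle it forms with its two neighbours is not crossed by any other segment of the chain. A chain that reduces to a single straight segment is unknotted, while one that stays tangled suggests a knot. Function names are hypothetical. The sketch also simplifies in two ways: coplanar (degenerate) crossings are ignored, and the open ends of the chain are not extended. It is therefore not a replacement for Knotfind, and detecting slip-knots as Pokefind does would need further steps.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(p0, p1, a, b, c, eps=1e-12):
    """Moller-Trumbore intersection test restricted to the segment p0->p1.
    Coplanar (degenerate) cases are treated as non-intersecting."""
    d = sub(p1, p0)
    e1, e2 = sub(b, a), sub(c, a)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return False
    inv = 1.0 / det
    s = sub(p0, a)
    u = inv * dot(s, h)
    if u < 0.0 or u > 1.0:
        return False
    q = cross(s, e1)
    v = inv * dot(d, q)
    if v < 0.0 or u + v > 1.0:
        return False
    t = inv * dot(e2, q)
    return 0.0 <= t <= 1.0


def reduce_chain(coords):
    """Delete interior vertex i whenever triangle (i-1, i, i+1) is not pierced
    by a segment that does not share one of its vertices; repeat until stable."""
    pts = [tuple(map(float, p)) for p in coords]
    changed = True
    while changed and len(pts) > 2:
        changed = False
        i = 1
        while i < len(pts) - 1:
            a, b, c = pts[i - 1], pts[i], pts[i + 1]
            pierced = any(
                segment_hits_triangle(pts[j], pts[j + 1], a, b, c)
                for j in range(len(pts) - 1)
                if j not in (i - 2, i - 1, i, i + 1)  # segments touching the triangle
            )
            if pierced:
                i += 1
            else:
                del pts[i]
                changed = True
    return pts


line = [(float(i), 0.0, 0.0) for i in range(10)]
print(len(reduce_chain(line)))  # 2: a straight chain is trivially unknotted
```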
Proteins. 2008 Sep 30.
19004017
Cit:13
University of California at Santa Cruz, Biomolecular Engineering, Santa Cruz, California.
Undertaker is a program designed to help predict protein structure using alignments to proteins of known structure and fragment assembly. The program generates conformations and uses cost functions to select the best structures from among the generated conformations. This paper describes the use of Undertaker's cost functions for model quality assessment. We achieve an accuracy that is similar to other methods, without using consensus-based techniques. Adding consensus-based features further improves our approach substantially. We report several correlation measures, including a new weighted version of Kendall's tau (tau(3)) and show model quality assessment results superior to previously published results on all correlation measures when using only models with no missing atoms. Proteins 2009.(c) 2008 Wiley-Liss, Inc.
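The paper introduces a weighted variant of Kendall's tau, tau(3), but the abstract does not give its weighting. For reference, the sketch below computes the standard, unweighted Kendall tau-a between predicted quality scores and true model quality (for example GDT_TS) for one target. The numbers are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Unweighted Kendall tau-a: (concordant - discordant) / number of pairs.
    Tied pairs count as neither concordant nor discordant."""
    n = len(x)
    total = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        total += (prod > 0) - (prod < 0)
    return total / (n * (n - 1) / 2)


predicted = [0.61, 0.55, 0.72, 0.40]  # hypothetical quality-assessment scores
true_gdt = [58.0, 60.0, 71.0, 35.0]   # hypothetical GDT_TS of the same models
print(f"tau = {kendall_tau_a(predicted, true_gdt):.3f}")
```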
|