BioInfoBank Library

Open Reading Frames

Most cited papers:

DNA Res. 1996 Jun 30;3(3):109-36. PMID: 8905231. Cited: 846
Kazusa DNA Research Institute, Chiba, Japan.
The sequence determination of the entire genome of the Synechocystis sp. strain PCC6803 was completed. The total length of the genome finally confirmed was 3,573,470 bp, including the previously reported sequence of 1,003,450 bp from map position 64% to 92% of the genome. The entire sequence was assembled from the sequences of the physical map-based contigs of cosmid clones and of lambda clones and long PCR products which were used for gap-filling. The accuracy of the sequence was guaranteed by analysis of both strands of DNA through the entire genome. The authenticity of the assembled sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA using the assembled sequence data. To predict the potential protein-coding regions, analysis of open reading frames (ORFs), analysis by the GeneMark program and similarity search to databases were performed. As a result, a total of 3,168 potential protein genes were assigned on the genome, in which 145 (4.6%) were identical to reported genes and 1,257 (39.6%) and 340 (10.8%) showed similarity to reported and hypothetical genes, respectively. The remaining 1,426 (45.0%) had no apparent similarity to any genes in databases. Among the potential protein genes assigned, 128 were related to the genes participating in photosynthetic reactions. The sum of the sequences coding for potential protein genes occupies 87% of the genome length. By adding rRNA and tRNA genes, therefore, the genome has a very compact arrangement of protein- and RNA-coding regions. A notable feature on the gene organization of the genome was that 99 ORFs, which showed similarity to transposase genes and could be classified into 6 groups, were found spread all over the genome, and at least 26 of them appeared to remain intact. The result implies that rearrangement of the genome occurred frequently during and after establishment of this species.
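The Synechocystis abstract lists ORF analysis as one of three ways used to predict protein-coding regions, alongside GeneMark and database similarity searches. As a rough illustration of what a bare ORF scan does, the sketch below looks for ATG-to-stop stretches in all six reading frames. It assumes the standard stop codons, ATG as the only start codon (bacteria also use GTG and TTG, which this sketch ignores), and a hypothetical minimum length `MIN_CODONS`. The function names are also illustrative. This is not the Kazusa pipeline or GeneMark, which scores coding potential statistically rather than by length alone.

```python
# Minimal six-frame ORF scanner (illustrative sketch, not the study's method).
# Assumptions: ATG is the only start codon, TAA/TAG/TGA are stops, and ORFs
# shorter than a hypothetical MIN_CODONS threshold are discarded.

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
MIN_CODONS = 100  # hypothetical cutoff: codons before the stop, ATG included


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for ATG...stop ORFs on one strand.

    Coordinates are 0-based and end-exclusive, and include the stop codon.
    Only the first ATG after each stop is used, so nested ATGs are ignored.
    """
    hits = []
    for frame in range(3):
        start = None  # index of the first ATG since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    hits.append((start, i + 3, frame))
                start = None
    return hits


def find_orfs(seq: str, min_codons: int = MIN_CODONS) -> list[tuple[str, int, int, int]]:
    """Return ORFs on both strands as (strand, start, end, frame) tuples.

    Reverse-strand coordinates refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    hits = [("+", s, e, f) for s, e, f in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    hits += [("-", s, e, f) for s, e, f in orfs_in_strand(rc, min_codons)]
    return hits


if __name__ == "__main__":
    # Toy sequence: ATG AAA TTT GGG TAA in frame 2 of the forward strand.
    print(find_orfs("CCATGAAATTTGGGTAACC", min_codons=2))  # [('+', 2, 17, 2)]
```

A fixed length cutoff is the simplest possible filter: short genuine genes fall below it and long chance ORFs can pass it, which is why ORF scanning is usually combined with other evidence, as in the study above.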
Science. 1996 Oct 25;274(5287):546, 563-7. PMID: 8849441. Cited: 831
The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.
Cell. 1993 Dec 17;75(6):1215-25. PMID: 8261515. Cited: 828
Johns Hopkins Oncology Center, Baltimore, Maryland 21231.
Recent studies have shown that a locus responsible for hereditary nonpolyposis colorectal cancer (HNPCC) is on chromosome 2p and that tumors developing in these patients contain alterations in microsatellite sequences (RER+ phenotype). We have used chromosome microdissection to obtain highly polymorphic markers from chromosome 2p16. These and other markers were ordered in a panel of somatic cell hybrids and used to define a 0.8 Mb interval containing the HNPCC locus. Candidate genes were then mapped, and one was found to lie within the 0.8 Mb interval. We identified this candidate by virtue of its homology to mutS mismatch repair genes. cDNA clones were obtained and the sequence used to detect germline mutations, including those producing termination codons, in HNPCC kindreds. Somatic as well as germline mutations of the gene were identified in RER+ tumor cells. This mutS homolog is therefore likely to be responsible for HNPCC.
Cell. 2000 Jul 7;102(1):109-26. PMID: 10929718. Cited: 818
Rosetta Inpharmatics, Inc., Kirkland, Washington 98034, USA.
Ascertaining the impact of uncharacterized perturbations on the cell is a fundamental problem in biology. Here, we describe how a single assay can be used to monitor hundreds of different cellular functions simultaneously. We constructed a reference database or "compendium" of expression profiles corresponding to 300 diverse mutations and chemical treatments in S. cerevisiae, and we show that the cellular pathways affected can be determined by pattern matching, even among very subtle profiles. The utility of this approach is validated by examining profiles caused by deletions of uncharacterized genes: we identify and experimentally confirm that eight uncharacterized open reading frames encode proteins required for sterol metabolism, cell wall function, mitochondrial respiration, or protein synthesis. We also show that the compendium can be used to characterize pharmacological perturbations by identifying a novel target of the commonly used drug dyclonine.
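The compendium approach identifies affected pathways by matching a new expression profile against a database of reference profiles. This abstract does not state the similarity measure, so the sketch below assumes plain Pearson correlation over a shared list of genes and ranks every reference profile by it. Profile names and values are hypothetical toy numbers (think of them as log expression ratios), not data from the study.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and the same length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no pattern to match
    return sxy / math.sqrt(sxx * syy)


def rank_matches(query: list[float], compendium: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank reference profiles by correlation with the query, best match first."""
    scores = [(name, pearson(query, prof)) for name, prof in compendium.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy profiles over four genes; not data from the study.
    query = [1.0, 2.0, -1.0, 0.5]
    compendium = {
        "profile_A": [0.9, 2.1, -0.8, 0.4],
        "profile_B": [-1.0, -2.0, 1.0, -0.5],
        "profile_C": [0.1, 0.0, 0.2, -0.1],
    }
    for name, r in rank_matches(query, compendium):
        print(f"{name}\t{r:+.2f}")
```

Here `profile_A` ranks first because it tracks the query closely, while `profile_B` is an exact mirror image (r = -1) and ranks last. A real analysis would also need to handle measurement noise and genes whose expression does not change, which this sketch leaves out.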
Cell. 1994 Jul 15;78(1):67-74. PMID: 8033213. Cited: 757
H Toyoshima, T Hunter
Molecular Biology and Virology Laboratory, Salk Institute, La Jolla, California 92037.
Using a yeast interaction screen to search for proteins that interact with cyclin D1-Cdk4, we identified a 27 kDa mouse protein related to the p21 cyclin-Cdk inhibitor. p27 interacts strongly with D-type cyclins and Cdk4 in vitro and more weakly with cyclin E and Cdk2. In mouse fibroblasts, p27 is associated predominantly with cyclin D1-Cdk4. Recombinant p27 is a potent inhibitor of cyclin D1-Cdk4 and cyclin A-Cdk2 protein kinase activity and a weaker inhibitor of cyclin B1-Cdc2. Overexpression of p27 in Saos-2 cells causes G1 arrest. p27 protein levels do not change as serum-stimulated quiescent mouse fibroblasts progress through the cell cycle. p27 is identical to p27Kip1, a cyclin-Cdk inhibitor present in TGF beta-treated cells. p27 has the hallmarks of a negative regulator of G1 progression and may mediate TGF beta-induced G1 arrest.
Science. 2003 May 30;300(5624):1394-9. PMID: 12730500. Cited: 685
In March 2003, a novel coronavirus (SARS-CoV) was discovered in association with cases of severe acute respiratory syndrome (SARS). The sequence of the complete genome of SARS-CoV was determined, and the initial characterization of the viral genome is presented in this report. The genome of SARS-CoV is 29,727 nucleotides in length and has 11 open reading frames, and its genome organization is similar to that of other coronaviruses. Phylogenetic analyses and sequence comparisons showed that SARS-CoV is not closely related to any of the previously characterized coronaviruses.
Science. 2003 May 30;300(5624):1399-404. PMID: 12730501. Cited: 626
Department of Microbiology, University of Colorado Health Sciences Center, Denver, CO 80262, USA. kathryn.holmes@UCHSC.edu
We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously known groups of coronaviruses. The genome sequence will aid in the diagnosis of SARS virus infection in humans and potential animal hosts (using polymerase chain reaction and immunological tests), in the development of antivirals (including neutralizing antibodies), and in the identification of putative epitopes for vaccine development.
Proc Natl Acad Sci U S A. 1999 Apr 13;96(8):4285-8. PMID: 10200254. Cited: 560
Molecular Biology Institute and Department of Energy Laboratory of Structural Biology and Molecular Medicine, University of California, Los Angeles, CA 90095-1570, USA.
Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.
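A phylogenetic profile, as described above, is a string recording whether a protein is present or absent in each known genome. The sketch below builds such strings as 0/1 characters over a fixed genome order and reports protein pairs whose profiles differ in at most `max_distance` genomes. Using Hamming distance as the measure of "similar" is an assumption for illustration; the genome and protein names are hypothetical, and deciding presence (for example, with a homology-search cutoff) is taken as already done.

```python
from itertools import combinations


def build_profiles(occurrence: dict[str, set[str]], genomes: list[str]) -> dict[str, str]:
    """Encode each protein as a 0/1 string over a fixed genome order."""
    return {
        protein: "".join("1" if g in present else "0" for g in genomes)
        for protein, present in occurrence.items()
    }


def hamming(a: str, b: str) -> int:
    """Number of genomes in which two profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same genomes")
    return sum(x != y for x, y in zip(a, b))


def linked_pairs(profiles: dict[str, str], max_distance: int = 0) -> list[tuple[str, str]]:
    """Protein pairs whose profiles differ in at most max_distance genomes."""
    return [
        (p, q)
        for p, q in combinations(sorted(profiles), 2)
        if hamming(profiles[p], profiles[q]) <= max_distance
    ]


if __name__ == "__main__":
    genomes = ["G1", "G2", "G3", "G4"]  # hypothetical genome order
    occurrence = {  # hypothetical presence calls
        "P1": {"G1", "G3"},
        "P2": {"G1", "G3"},
        "P3": {"G2", "G3", "G4"},
        "P4": {"G1", "G3", "G4"},
    }
    profiles = build_profiles(occurrence, genomes)
    print(profiles)  # {'P1': '1010', 'P2': '1010', 'P3': '0111', 'P4': '1011'}
    print(linked_pairs(profiles))  # [('P1', 'P2')]
    print(linked_pairs(profiles, max_distance=1))  # [('P1', 'P2'), ('P1', 'P4'), ('P2', 'P4')]
```

With `max_distance=0` only P1 and P2, which share the profile `1010`, are reported; allowing one mismatch also links both of them to P4 (`1011`).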
DNA Res. 2000 Dec 31;7(6):347-55. PMID: 11214970. Cited: 557
Kazusa DNA Research Institute, Kisarazu, Chiba, Japan.
As an extension of our human cDNA project for accumulating sequence information on the coding sequences of unidentified genes, we here present the entire sequences of 100 cDNA clones of unidentified genes, named KIAA1673-KIAA1772, from three sets of size-fractionated cDNA libraries derived from human adult whole brain, hippocampus, and fetal whole brain. The average sizes of the inserts and corresponding open reading frames of cDNA clones analyzed here were 4.9 kb and 2.7 kb (corresponding to 895 amino acid residues), respectively. By computer-assisted analysis of the deduced amino acid sequences, 44 predicted gene products were classified into five functional categories of proteins relating to cell signaling/communication, nucleic acid management, cell structure/motility, protein management, and metabolism. Furthermore, the expression profiles of the genes were also studied in 10 human tissues, 8 brain regions, spinal cord, fetal brain and fetal liver by reverse-transcription-coupled polymerase chain reaction, the products of which were quantified by enzyme-linked immunosorbent assay.
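The average ORF length of 2.7 kb and the average of 895 amino acid residues quoted above are consistent because each codon is three nucleotides. The helper below makes that conversion explicit. It assumes the ORF length includes the terminal stop codon, which the abstract does not say; under that assumption 2,688 bp is 896 codons, one of which is the stop, leaving 895 residues. The function name is illustrative only.

```python
def orf_length_to_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Convert an ORF length in nucleotides to an amino-acid count.

    Assumes the length is a whole number of codons. If includes_stop is True,
    the final codon is a stop codon and encodes no residue.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    print(orf_length_to_residues(2688))  # 895
    print(orf_length_to_residues(2685, includes_stop=False))  # 895
```

Because 2.7 kb is a rounded average, the correspondence is approximate; the point is only that roughly 2.7 kb of coding sequence divided by 3 nucleotides per codon gives about 900 residues.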