Open Reading Frames
Most cited papers:
T Kaneko,
S Sato,
H Kotani,
A Tanaka,
E Asamizu,
Y Nakamura,
N Miyajima,
M Hirosawa,
M Sugiura,
S Sasamoto,
T Kimura,
T Hosouchi,
A Matsuno,
A Muraki,
N Nakazaki,
K Naruo,
S Okumura,
S Shimpo,
C Takeuchi,
T Wada,
A Watanabe,
M Yamada,
M Yasuda,
S Tabata
Kazusa DNA Research Institute, Chiba, Japan.
The sequence determination of the entire genome of the Synechocystis sp. strain PCC6803 was completed. The total length of the genome finally confirmed was 3,573,470 bp, including the previously reported sequence of 1,003,450 bp from map position 64% to 92% of the genome. The entire sequence was assembled from the sequences of the physical map-based contigs of cosmid clones and of lambda clones and long PCR products which were used for gap-filling. The accuracy of the sequence was guaranteed by analysis of both strands of DNA through the entire genome. The authenticity of the assembled sequence was supported by restriction analysis of long PCR products, which were directly amplified from the genomic DNA using the assembled sequence data. To predict the potential protein-coding regions, analysis of open reading frames (ORFs), analysis by the GeneMark program and similarity search to databases were performed. As a result, a total of 3,168 potential protein genes were assigned on the genome, of which 145 (4.6%) were identical to reported genes and 1,257 (39.6%) and 340 (10.8%) showed similarity to reported and hypothetical genes, respectively. The remaining 1,426 (45.0%) had no apparent similarity to any genes in databases. Among the potential protein genes assigned, 128 were related to the genes participating in photosynthetic reactions. The sum of the sequences coding for potential protein genes occupies 87% of the genome length. With the rRNA and tRNA genes added, the genome therefore has a very compact arrangement of protein- and RNA-coding regions. A notable feature of the gene organization of the genome was that 99 ORFs, which showed similarity to transposase genes and could be classified into 6 groups, were found spread all over the genome, and at least 26 of them appeared to remain intact. The result implies that rearrangement of the genome occurred frequently during and after establishment of this species.
Mesh-terms: Bacterial Proteins :: genetics; DNA Nucleotidyltransferases :: metabolism; Genome, Bacterial; Open Reading Frames; Photosynthesis; Sequence Analysis, DNA; Support, Non-U.S. Gov't; Synechocystis Group :: enzymology; Synechocystis Group :: genetics; Synechocystis Group :: physiology; Transposases;
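The Synechocystis abstract above names "analysis of open reading frames (ORFs)" as one of three gene-prediction inputs. As a rough illustration of what an ORF scan does, here is a minimal six-frame sketch in Python. It is not the authors' pipeline and not GeneMark; the function names and the `min_codons` threshold are hypothetical, and it only recognises ATG starts, whereas real bacterial annotation also considers alternative start codons such as GTG and TTG.

```python
# Illustrative sketch only: a naive six-frame ORF scan (ATG ... stop).
# Not the method used in the paper; names and thresholds are assumptions.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=2):
    """Yield (strand, frame, start, end, orf) for each ATG..stop ORF.

    Coordinates are 0-based and end-exclusive, relative to the strand
    being scanned. min_codons counts coding codons, excluding the stop.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3, s[start:i + 3]
                    start = None


if __name__ == "__main__":
    for hit in find_orfs("CCATGAAATTTTAGGG"):
        print(hit)
```

A scan like this over-predicts on a real genome, which is presumably why the abstract combines ORF analysis with a statistical gene finder and database similarity searches.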
A Goffeau,
B G Barrell,
H Bussey,
R W Davis,
B Dujon,
H Feldmann,
F Galibert,
J D Hoheisel,
C Jacq,
M Johnston,
E J Louis,
H W Mewes,
Y Murakami,
P Philippsen,
H Tettelin,
S G Oliver
The genome of the yeast Saccharomyces cerevisiae has been completely sequenced through a worldwide collaboration. The sequence of 12,068 kilobases defines 5885 potential protein-encoding genes, approximately 140 genes specifying ribosomal RNA, 40 genes for small nuclear RNA molecules, and 275 transfer RNA genes. In addition, the complete sequence provides information about the higher order organization of yeast's 16 chromosomes and allows some insight into their evolutionary history. The genome shows a considerable amount of apparent genetic redundancy, and one of the major problems to be tackled during the next stage of the yeast genome project is to elucidate the biological functions of all of these genes.
Mesh-terms: Amino Acid Sequence; Base Sequence; Chromosome Mapping; Chromosomes, Fungal :: genetics; Computer Communication Networks; DNA, Fungal :: genetics; Evolution, Molecular; Fungal Proteins :: chemistry; Fungal Proteins :: genetics; Fungal Proteins :: physiology; Gene Library; Genes, Fungal; Genome, Fungal; International Cooperation; Multigene Family; Open Reading Frames; RNA, Fungal :: genetics; Saccharomyces cerevisiae :: genetics; Sequence Analysis, DNA;
F S Leach,
N C Nicolaides,
N Papadopoulos,
B Liu,
J Jen,
R Parsons,
P Peltomäki,
P Sistonen,
L A Aaltonen,
M Nyström-Lahti
Johns Hopkins Oncology Center, Baltimore, Maryland 21231.
Recent studies have shown that a locus responsible for hereditary nonpolyposis colorectal cancer (HNPCC) is on chromosome 2p and that tumors developing in these patients contain alterations in microsatellite sequences (RER+ phenotype). We have used chromosome microdissection to obtain highly polymorphic markers from chromosome 2p16. These and other markers were ordered in a panel of somatic cell hybrids and used to define a 0.8 Mb interval containing the HNPCC locus. Candidate genes were then mapped, and one was found to lie within the 0.8 Mb interval. We identified this candidate by virtue of its homology to mutS mismatch repair genes. cDNA clones were obtained and the sequence used to detect germline mutations, including those producing termination codons, in HNPCC kindreds. Somatic as well as germline mutations of the gene were identified in RER+ tumor cells. This mutS homolog is therefore likely to be responsible for HNPCC.
Mesh-terms: Amino Acid Sequence; Animals; Base Sequence; Brain :: metabolism; Cell Line; Chromosome Banding; Chromosome Mapping; Chromosomes, Human, Pair 2; Colon :: metabolism; Colonic Neoplasms :: genetics; Colorectal Neoplasms, Hereditary Nonpolyposis :: genetics; Comparative Study; DNA Primers; DNA Repair :: genetics; DNA-Binding Proteins; Fungal Proteins :: genetics; Gene Library; Genetic Markers; Hamsters; Human; Hybrid Cells; In Situ Hybridization, Fluorescence; Linkage (Genetics); Mice; Molecular Sequence Data; Mutation; Open Reading Frames; Polymerase Chain Reaction; Polymorphism (Genetics); Proto-Oncogene Proteins :: genetics; Rats; Saccharomyces cerevisiae :: genetics; Sequence Homology, Amino Acid; Support, Non-U.S. Gov't; Support, U.S. Gov't, P.H.S.;
T R Hughes,
M J Marton,
A R Jones,
C J Roberts,
R Stoughton,
C D Armour,
H A Bennett,
E Coffey,
H Dai,
Y D He,
M J Kidd,
A M King,
M R Meyer,
D Slade,
P Y Lum,
S B Stepaniants,
D D Shoemaker,
D Gachotte,
K Chakraburtty,
J Simon,
M Bard,
S H Friend
Rosetta Inpharmatics, Inc., Kirkland, Washington 98034, USA.
Ascertaining the impact of uncharacterized perturbations on the cell is a fundamental problem in biology. Here, we describe how a single assay can be used to monitor hundreds of different cellular functions simultaneously. We constructed a reference database or "compendium" of expression profiles corresponding to 300 diverse mutations and chemical treatments in S. cerevisiae, and we show that the cellular pathways affected can be determined by pattern matching, even among very subtle profiles. The utility of this approach is validated by examining profiles caused by deletions of uncharacterized genes: we identify and experimentally confirm that eight uncharacterized open reading frames encode proteins required for sterol metabolism, cell wall function, mitochondrial respiration, or protein synthesis. We also show that the compendium can be used to characterize pharmacological perturbations by identifying a novel target of the commonly used drug dyclonine.
Mesh-terms: Cell Wall :: physiology; Databases, Factual; Ergosterol :: biosynthesis; Fungal Proteins :: biosynthesis; Fungal Proteins :: genetics; Gene Expression Profiling; Gene Expression Regulation, Fungal; Genes, Fungal; Genes, Reporter; Genetic Complementation Test; Human; Mitochondria :: metabolism; Models, Genetic; Mutagenesis; Open Reading Frames; Phenotype; Propiophenones :: pharmacology; Receptors, sigma :: genetics; Ribosomes; Saccharomyces cerevisiae :: drug effects; Saccharomyces cerevisiae :: genetics; Saccharomyces cerevisiae :: physiology; Steroid Isomerases :: genetics; Support, Non-U.S. Gov't; Support, U.S. Gov't, P.H.S.; Transcription, Genetic; Variation (Genetics);
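The compendium abstract above says that affected pathways "can be determined by pattern matching" against reference expression profiles. One simple way to picture this is to rank reference profiles by their correlation with a query profile. The Python sketch below assumes Pearson correlation as the similarity measure, which the abstract does not specify; the profile names and values are invented toy data, not results from the paper.

```python
# Illustrative sketch: match a query expression profile against a small
# "compendium" of reference profiles by Pearson correlation.
# The similarity measure and all data here are assumptions.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / (sd_x * sd_y)


def best_match(query, compendium):
    """Return the name of the reference profile most correlated with query."""
    return max(compendium, key=lambda name: pearson(query, compendium[name]))


if __name__ == "__main__":
    # Hypothetical log-ratio profiles over the same four genes.
    compendium = {
        "sterol_pathway_mutant": [2.0, -1.0, 0.5, 0.0],
        "cell_wall_mutant": [-1.0, 2.0, 0.0, 0.5],
    }
    uncharacterized_deletion = [1.8, -0.9, 0.4, 0.1]
    print(best_match(uncharacterized_deletion, compendium))
```

Correlation ignores overall magnitude, so a weak profile with the same shape as a strong reference can still match, which fits the abstract's remark about "very subtle profiles".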
Molecular Biology and Virology Laboratory, Salk Institute, La Jolla, California 92037.
Using a yeast interaction screen to search for proteins that interact with cyclin D1-Cdk4, we identified a 27 kDa mouse protein related to the p21 cyclin-Cdk inhibitor. p27 interacts strongly with D-type cyclins and Cdk4 in vitro and more weakly with cyclin E and Cdk2. In mouse fibroblasts, p27 is associated predominantly with cyclin D1-Cdk4. Recombinant p27 is a potent inhibitor of cyclin D1-Cdk4 and cyclin A-Cdk2 protein kinase activity and a weaker inhibitor of cyclin B1-Cdc2. Overexpression of p27 in Saos-2 cells causes G1 arrest. p27 protein levels do not change as serum-stimulated quiescent mouse fibroblasts progress through the cell cycle. p27 is identical to p27Kip1, a cyclin-Cdk inhibitor present in TGF beta-treated cells. p27 has the hallmarks of a negative regulator of G1 progression and may mediate TGF beta-induced G1 arrest.
Mesh-terms: 3T3 Cells; Amino Acid Sequence; Animals; Base Sequence; CDC2-CDC28 Kinases; Cell Cycle :: drug effects; Cell Cycle Proteins; Cell Line; Cloning, Molecular; Cyclin D1; Cyclin-Dependent Kinases; Cyclins :: genetics; Cyclins :: metabolism; Fungal Proteins :: genetics; Fungal Proteins :: metabolism; Fungal Proteins :: pharmacology; G1 Phase :: drug effects; Gene Expression Regulation; Mice; Microtubule-Associated Proteins :: genetics; Microtubule-Associated Proteins :: metabolism; Microtubule-Associated Proteins :: pharmacology; Molecular Sequence Data; Oncogene Proteins :: genetics; Oncogene Proteins :: metabolism; Open Reading Frames; Protein Kinase Inhibitors; Protein-Serine-Threonine Kinases; Proto-Oncogene Proteins; RNA, Messenger :: analysis; Recombinant Fusion Proteins :: metabolism; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S.; Sequence Analysis, DNA; Tumor Suppressor Proteins; Yeasts :: metabolism;
Paul A Rota,
M Steven Oberste,
Stephan S Monroe,
W Allan Nix,
Ray Campagnoli,
Joseph P Icenogle,
Silvia Peñaranda,
Bettina Bankamp,
Kaija Maher,
Min-Hsin Chen,
Suxiong Tong,
Azaibi Tamin,
Luis Lowe,
Michael Frace,
Joseph L DeRisi,
Qi Chen,
David Wang,
Dean D Erdman,
Teresa C T Peret,
Cara Burns,
Thomas G Ksiazek,
Pierre E Rollin,
Anthony Sanchez,
Stephanie Liffick,
Brian Holloway,
Josef Limor,
Karen McCaustland,
Melissa Olsen-Rasmussen,
Ron Fouchier,
Stephan Günther,
Albert D M E Osterhaus,
Christian Drosten,
Mark A Pallansch,
Larry J Anderson,
William J Bellini
In March 2003, a novel coronavirus (SARS-CoV) was discovered in association with cases of severe acute respiratory syndrome (SARS). The sequence of the complete genome of SARS-CoV was determined, and the initial characterization of the viral genome is presented in this report. The genome of SARS-CoV is 29,727 nucleotides in length and has 11 open reading frames, and its genome organization is similar to that of other coronaviruses. Phylogenetic analyses and sequence comparisons showed that SARS-CoV is not closely related to any of the previously characterized coronaviruses.
Mesh-terms: Amino Acid Sequence; Conserved Sequence; Coronavirus :: classification; Coronavirus :: genetics; DNA, Complementary; Endopeptidases :: chemistry; Endopeptidases :: genetics; Genome, Viral; Humans; Membrane Glycoproteins :: chemistry; Membrane Glycoproteins :: genetics; Molecular Sequence Data; Nucleocapsid Proteins :: chemistry; Nucleocapsid Proteins :: genetics; Open Reading Frames; Phylogeny; Polyproteins :: chemistry; Polyproteins :: genetics; RNA Replicase :: chemistry; RNA Replicase :: genetics; RNA, Messenger :: genetics; RNA, Messenger :: metabolism; RNA, Viral :: genetics; Regulatory Sequences, Nucleic Acid; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S.; SARS Virus :: chemistry; SARS Virus :: classification; SARS Virus :: genetics; SARS Virus :: isolation & purification; Sequence Analysis, DNA; Severe Acute Respiratory Syndrome :: virology; Transcription, Genetic; Viral Envelope Proteins :: chemistry; Viral Envelope Proteins :: genetics; Viral Matrix Proteins :: chemistry; Viral Matrix Proteins :: genetics; Viral Proteins :: chemistry; Viral Proteins :: genetics;
Marco A Marra,
Steven J M Jones,
Caroline R Astell,
Robert A Holt,
Angela Brooks-Wilson,
Yaron S N Butterfield,
Jaswinder Khattra,
Jennifer K Asano,
Sarah A Barber,
Susanna Y Chan,
Alison Cloutier,
Shaun M Coughlin,
Doug Freeman,
Noreen Girn,
Obi L Griffith,
Stephen R Leach,
Michael Mayo,
Helen McDonald,
Stephen B Montgomery,
Pawan K Pandoh,
Anca S Petrescu,
A Gordon Robertson,
Jacqueline E Schein,
Asim Siddiqui,
Duane E Smailus,
Jeff M Stott,
George S Yang,
Francis Plummer,
Anton Andonov,
Harvey Artsob,
Nathalie Bastien,
Kathy Bernard,
Timothy F Booth,
Donnie Bowness,
Martin Czub,
Michael Drebot,
Lisa Fernando,
Ramon Flick,
Michael Garbutt,
Michael Gray,
Allen Grolla,
Steven Jones,
Heinz Feldmann,
Adrienne Meyers,
Amin Kabani,
Yan Li,
Susan Normand,
Ute Stroher,
Graham A Tipples,
Shaun Tyler,
Robert Vogrig,
Diane Ward,
Brynn Watson,
Robert C Brunham,
Mel Krajden,
Martin Petric,
Danuta M Skowronski,
Chris Upton,
Rachel L Roper
Department of Microbiology, University of Colorado Health Sciences Center, Denver, CO 80262, USA. kathryn.holmes@UCHSC.edu
We sequenced the 29,751-base genome of the severe acute respiratory syndrome (SARS)-associated coronavirus known as the Tor2 isolate. The genome sequence reveals that this coronavirus is only moderately related to other known coronaviruses, including two human coronaviruses, HCoV-OC43 and HCoV-229E. Phylogenetic analysis of the predicted viral proteins indicates that the virus does not closely resemble any of the three previously known groups of coronaviruses. The genome sequence will aid in the diagnosis of SARS virus infection in humans and potential animal hosts (using polymerase chain reaction and immunological tests), in the development of antivirals (including neutralizing antibodies), and in the identification of putative epitopes for vaccine development.
Mesh-terms: 3' Untranslated Regions; 5' Untranslated Regions; Animals; Base Sequence; Conserved Sequence; Coronavirus :: classification; Coronavirus :: genetics; DNA, Complementary; Frameshifting, Ribosomal; Genome, Viral; Humans; Membrane Glycoproteins :: chemistry; Membrane Glycoproteins :: genetics; Nucleocapsid Proteins :: chemistry; Nucleocapsid Proteins :: genetics; Open Reading Frames; Phylogeny; RNA Replicase :: chemistry; RNA Replicase :: genetics; RNA, Viral :: genetics; RNA, Viral :: isolation & purification; Regulatory Sequences, Nucleic Acid; Research Support, Non-U.S. Gov't; SARS Virus :: classification; SARS Virus :: genetics; SARS Virus :: isolation & purification; Sequence Analysis, DNA; Severe Acute Respiratory Syndrome :: virology; Viral Envelope Proteins :: chemistry; Viral Envelope Proteins :: genetics; Viral Matrix Proteins :: chemistry; Viral Matrix Proteins :: genetics; Viral Proteins :: chemistry; Viral Proteins :: genetics;
Molecular Biology Institute and Departments of Energy Laboratory of Structural Biology and Molecular Medicine, University of California, Los Angeles, CA 90095-1570, USA.
Determining protein functions from genomic sequences is a central goal of bioinformatics. We present a method based on the assumption that proteins that function together in a pathway or structural complex are likely to evolve in a correlated fashion. During evolution, all such functionally linked proteins tend to be either preserved or eliminated in a new species. We describe this property of correlated evolution by characterizing each protein by its phylogenetic profile, a string that encodes the presence or absence of a protein in every known genome. We show that proteins having matching or similar profiles strongly tend to be functionally linked. This method of phylogenetic profiling allows us to predict the function of uncharacterized proteins.
Mesh-terms: Bacterial Proteins :: chemistry; Bacterial Proteins :: genetics; Comparative Study; Escherichia coli :: genetics; Evolution, Molecular; Genome; Genome, Bacterial; Models, Biological; Open Reading Frames; Phylogeny; Proteins :: chemistry; Proteins :: genetics; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, U.S. Gov't, P.H.S.; Ribosomal Proteins :: chemistry;
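The phylogenetic profiling abstract above describes each protein by "a string that encodes the presence or absence of a protein in every known genome" and links proteins whose profiles match or are similar. A minimal Python sketch of that idea follows. The genome and protein names are hypothetical, and Hamming distance is an assumed stand-in for "similar"; the abstract does not say how similarity was measured.

```python
# Illustrative sketch: presence/absence profiles as bit strings.
# Genome names, protein names, and the use of Hamming distance for
# "similar" profiles are assumptions made for this example.
from collections import defaultdict


def profile(present_in, genomes):
    """Encode presence (1) or absence (0) of a protein across genomes."""
    return "".join("1" if g in present_in else "0" for g in genomes)


def hamming(a, b):
    """Number of genomes in which two equal-length profiles disagree."""
    return sum(x != y for x, y in zip(a, b))


def group_by_profile(profiles):
    """Group protein names that share an identical profile string."""
    groups = defaultdict(list)
    for protein, bits in profiles.items():
        groups[bits].append(protein)
    return dict(groups)


if __name__ == "__main__":
    genomes = ["G1", "G2", "G3", "G4"]
    occurrences = {
        "P1": {"G1", "G3"},
        "P2": {"G1", "G3"},
        "P3": {"G2", "G4"},
    }
    profiles = {p: profile(gs, genomes) for p, gs in occurrences.items()}
    print(profiles)
    print(group_by_profile(profiles))
```

In this toy data P1 and P2 share the profile `1010` and would be predicted to be functionally linked, while P3 (`0101`) differs from both in all four genomes.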
Kazusa DNA Research Institute, Kisarazu, Chiba, Japan.
As an extension of our human cDNA project for accumulating sequence information on the coding sequences of unidentified genes, we here present the entire sequences of 100 cDNA clones of unidentified genes, named KIAA1673-KIAA1772, from three sets of size-fractionated cDNA libraries derived from human adult whole brain, hippocampus, and fetal whole brain. The average sizes of the inserts and corresponding open reading frames of cDNA clones analyzed here were 4.9 kb and 2.7 kb (corresponding to 895 amino acid residues), respectively. By computer-assisted analysis of the deduced amino acid sequences, 44 predicted gene products were classified into five functional categories of proteins relating to cell signaling/communication, nucleic acid management, cell structure/motility, protein management, and metabolism. Furthermore, the expression profiles of the genes were also studied in 10 human tissues, 8 brain regions, spinal cord, fetal brain and fetal liver by reverse-transcription-coupled polymerase chain reaction, the products of which were quantified by enzyme-linked immunosorbent assay.
Mesh-terms: Amino Acids :: chemistry; Brain :: metabolism; Cell Movement; DNA, Complementary :: metabolism; Databases, Factual; Enzyme-Linked Immunosorbent Assay; Gene Library; Genome, Human; Human; Models, Genetic; Nucleic Acids :: metabolism; Open Reading Frames; Physical Chromosome Mapping; Reverse Transcriptase Polymerase Chain Reaction; Signal Transduction; Software; Support, Non-U.S. Gov't; Tissue Distribution;
