DNA, Intergenic
Latest Paper:
Pontifical Catholic University of Chile, Santiago, Chile.
Cupriavidus necator JMP134 has been extensively studied because of its ability to degrade chloroaromatic compounds, including the herbicides 2,4-dichlorophenoxyacetic acid (2,4-D) and 3-chlorobenzoic acid (3-CB), which is achieved through the pJP4-encoded chlorocatechol degradation gene clusters: tfdCIDIEIFI and tfdDIICIIEIIFII. The present work describes a different tfd-genes expression profile depending on whether C. necator cells were induced with 2,4-D or 3-CB. By contrast, in vitro binding assays of the purified transcriptional activator TfdR showed similar binding to both tfd intergenic regions; these results were confirmed by in vivo studies of the expression of transcriptional lacZ fusions for these intergenic regions. Experiments aimed at investigating whether other pJP4 plasmid or chromosomal regulatory proteins could contribute to the differences in the response of both tfd promoters to induction by 2,4-D and 3-CB showed that the transcriptional regulators from the benzoate degradation pathway, CatR1 and CatR2, affected 3-CB- and 2,4-D-related growth capabilities. It was also determined that the ISJP4-interrupted protein TfdT decreased growth on 3-CB. In addition, an ORF with 34% amino acid identity to IclR-type transcriptional regulator members and located near the tfdII gene cluster module was shown to modulate the 2,4-D growth capability. Taken together, these results suggest that tfd transcriptional regulation in C. necator JMP134 is far more complex than previously thought and that it involves proteins from different transcriptional regulator families.
Mesh-terms: 2,4-Dichlorophenoxyacetic Acid :: metabolism; Artificial Gene Fusion; Bacterial Proteins :: metabolism; Chlorobenzoates :: metabolism; Cupriavidus necator :: physiology; DNA, Intergenic; Electrophoretic Mobility Shift Assay; Gene Expression Profiling; Gene Expression Regulation, Bacterial; Gene Order; Genes, Bacterial; Genes, Reporter; Protein Binding; Regulon; Transcription Factors :: metabolism; Transcriptional Activation; beta-Galactosidase :: genetics; beta-Galactosidase :: metabolism;
Most cited papers:
Jun Yu,
Songnian Hu,
Jun Wang,
Gane Ka-Shu Wong,
Songgang Li,
Bin Liu,
Yajun Deng,
Li Dai,
Yan Zhou,
Xiuqing Zhang,
Mengliang Cao,
Jing Liu,
Jiandong Sun,
Jiabin Tang,
Yanjiong Chen,
Xiaobing Huang,
Wei Lin,
Chen Ye,
Wei Tong,
Lijuan Cong,
Jianing Geng,
Yujun Han,
Lin Li,
Wei Li,
Guangqiang Hu,
Xiangang Huang,
Wenjie Li,
Jian Li,
Zhanwei Liu,
Long Li,
Jianping Liu,
Qiuhui Qi,
Jinsong Liu,
Li Li,
Tao Li,
Xuegang Wang,
Hong Lu,
Tingting Wu,
Miao Zhu,
Peixiang Ni,
Hua Han,
Wei Dong,
Xiaoyu Ren,
Xiaoli Feng,
Peng Cui,
Xianran Li,
Hao Wang,
Xin Xu,
Wenxue Zhai,
Zhao Xu,
Jinsong Zhang,
Sijie He,
Jianguo Zhang,
Jichen Xu,
Kunlin Zhang,
Xianwu Zheng,
Jianhai Dong,
Wanyong Zeng,
Lin Tao,
Jia Ye,
Jun Tan,
Xide Ren,
Xuewei Chen,
Jun He,
Daofeng Liu,
Wei Tian,
Chaoguang Tian,
Hongai Xia,
Qiyu Bao,
Gang Li,
Hui Gao,
Ting Cao,
Juan Wang,
Wenming Zhao,
Ping Li,
Wei Chen,
Xudong Wang,
Yong Zhang,
Jianfei Hu,
Jing Wang,
Song Liu,
Jian Yang,
Guangyu Zhang,
Yuqing Xiong,
Zhijie Li,
Long Mao,
Chengshu Zhou,
Zhen Zhu,
Runsheng Chen,
Bailin Hao,
Weimou Zheng,
Shouyi Chen,
Wei Guo,
Guojie Li,
Siqi Liu,
Ming Tao,
Jian Wang,
Lihuang Zhu,
Longping Yuan,
Huanming Yang
We have produced a draft sequence of the rice genome for the most widely cultivated subspecies in China, Oryza sativa L. ssp. indica, by whole-genome shotgun sequencing. The genome was 466 megabases in size, with an estimated 46,022 to 55,615 genes. Functional coverage in the assembled sequences was 92.0%. About 42.2% of the genome was in exact 20-nucleotide oligomer repeats, and most of the transposons were in the intergenic regions between genes. Although 80.6% of predicted Arabidopsis thaliana genes had a homolog in rice, only 49.4% of predicted rice genes had a homolog in A. thaliana. The large proportion of rice genes with no recognizable homologs is due to a gradient in the GC content of rice coding sequences.
Mesh-terms: Arabidopsis :: genetics; Base Composition; Computational Biology; Contig Mapping; DNA Transposable Elements; DNA, Intergenic; DNA, Plant :: chemistry; DNA, Plant :: genetics; Databases, Nucleic Acid; Exons; Gene Duplication; Genes, Plant; Genome, Plant; Genomics; Introns; Molecular Sequence Data; Oryza sativa :: genetics; Plant Proteins :: chemistry; Plant Proteins :: genetics; Polymorphism (Genetics) ; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid; Software; Species Specificity; Support, Non-U.S. Gov't; Support, U.S. Gov't, Non-P.H.S. ; Support, U.S. Gov't, P.H.S. ; Synteny;
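The repeat statistic in the rice abstract above (the share of the genome lying in exact 20-nucleotide oligomer repeats) can be illustrated with a small sketch. This is not the authors' pipeline: the function name `repeat_fraction`, the single-strand counting, and the reading of "in a repeat" as "covered by at least one k-mer that occurs more than once" are assumptions made for illustration only.

```python
from collections import Counter


def repeat_fraction(seq: str, k: int = 20) -> float:
    """Fraction of bases covered by at least one k-mer seen more than once.

    Assumption: only the given strand is counted (no reverse complements).
    """
    seq = seq.upper()
    n = len(seq)
    if n < k:
        return 0.0
    # Count every overlapping k-mer in the sequence.
    counts = Counter(seq[i:i + k] for i in range(n - k + 1))
    covered = [False] * n
    for i in range(n - k + 1):
        if counts[seq[i:i + k]] > 1:
            # Mark all k bases of a repeated k-mer as covered.
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / n


# Toy usage with k=4: only "AAAA" occurs twice, covering 8 of 16 bases.
print(repeat_fraction("AAAACCCCGGGGAAAA", k=4))
```

A genome-scale computation would need a memory-efficient k-mer counter; the dictionary-based version here is only meant to make the definition concrete.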
J C Venter,
M D Adams,
E W Myers,
P W Li,
R J Mural,
G G Sutton,
H O Smith,
M Yandell,
C A Evans,
R A Holt,
J D Gocayne,
P Amanatides,
R M Ballew,
D H Huson,
J R Wortman,
Q Zhang,
C D Kodira,
X H Zheng,
L Chen,
M Skupski,
G Subramanian,
P D Thomas,
J Zhang,
G L Gabor Miklos,
C Nelson,
S Broder,
A G Clark,
J Nadeau,
V A McKusick,
N Zinder,
A J Levine,
R J Roberts,
M Simon,
C Slayman,
M Hunkapiller,
R Bolanos,
A Delcher,
I Dew,
D Fasulo,
M Flanigan,
L Florea,
A Halpern,
S Hannenhalli,
S Kravitz,
S Levy,
C Mobarry,
K Reinert,
K Remington,
J Abu-Threideh,
E Beasley,
K Biddick,
V Bonazzi,
R Brandon,
M Cargill,
I Chandramouliswaran,
R Charlab,
K Chaturvedi,
Z Deng,
V Di Francesco,
P Dunn,
K Eilbeck,
C Evangelista,
A E Gabrielian,
W Gan,
W Ge,
F Gong,
Z Gu,
P Guan,
T J Heiman,
M E Higgins,
R R Ji,
Z Ke,
K A Ketchum,
Z Lai,
Y Lei,
Z Li,
J Li,
Y Liang,
X Lin,
F Lu,
G V Merkulov,
N Milshina,
H M Moore,
A K Naik,
V A Narayan,
B Neelam,
D Nusskern,
D B Rusch,
S Salzberg,
W Shao,
B Shue,
J Sun,
Z Wang,
A Wang,
X Wang,
J Wang,
M Wei,
R Wides,
C Xiao,
C Yan,
A Yao,
J Ye,
M Zhan,
W Zhang,
H Zhang,
Q Zhao,
L Zheng,
F Zhong,
W Zhong,
S Zhu,
S Zhao,
D Gilbert,
S Baumhueter,
G Spier,
C Carter,
A Cravchik,
T Woodage,
F Ali,
H An,
A Awe,
D Baldwin,
H Baden,
M Barnstead,
I Barrow,
K Beeson,
D Busam,
A Carver,
A Center,
M L Cheng,
L Curry,
S Danaher,
L Davenport,
R Desilets,
S Dietz,
K Dodson,
L Doup,
S Ferriera,
N Garg,
A Gluecksmann,
B Hart,
J Haynes,
C Haynes,
C Heiner,
S Hladun,
D Hostin,
J Houck,
T Howland,
C Ibegwam,
J Johnson,
F Kalush,
L Kline,
S Koduru,
A Love,
F Mann,
D May,
S McCawley,
T McIntosh,
I McMullen,
M Moy,
L Moy,
B Murphy,
K Nelson,
C Pfannkoch,
E Pratts,
V Puri,
H Qureshi,
M Reardon,
R Rodriguez,
Y H Rogers,
D Romblad,
B Ruhfel,
R Scott,
C Sitter,
M Smallwood,
E Stewart,
R Strong,
E Suh,
R Thomas,
N N Tint,
S Tse,
C Vech,
G Wang,
J Wetter,
S Williams,
M Williams,
S Windsor,
E Winn-Deen,
K Wolfe,
J Zaveri,
K Zaveri,
J F Abril,
R Guigó,
M J Campbell,
K V Sjolander,
B Karlak,
A Kejariwal,
H Mi,
B Lazareva,
T Hatton,
A Narechania,
K Diemer,
A Muruganujan,
N Guo,
S Sato,
V Bafna,
S Istrail,
R Lippert,
R Schwartz,
B Walenz,
S Yooseph,
D Allen,
A Basu,
J Baxendale,
L Blick,
M Caminha,
J Carnes-Stine,
P Caulk,
Y H Chiang,
M Coyne,
C Dahlke,
A Mays,
M Dombroski,
M Donnelly,
D Ely,
S Esparham,
C Fosler,
H Gire,
S Glanowski,
K Glasser,
A Glodek,
M Gorokhov,
K Graham,
B Gropman,
M Harris,
J Heil,
S Henderson,
J Hoover,
D Jennings,
C Jordan,
J Jordan,
J Kasha,
L Kagan,
C Kraft,
A Levitsky,
M Lewis,
X Liu,
J Lopez,
D Ma,
W Majoros,
J McDaniel,
S Murphy,
M Newman,
T Nguyen,
N Nguyen,
M Nodell,
S Pan,
J Peck,
M Peterson,
W Rowe,
R Sanders,
J Scott,
M Simpson,
T Smith,
A Sprague,
T Stockwell,
R Turner,
E Venter,
M Wang,
M Wen,
D Wu,
M Wu,
A Xia,
A Zandieh,
X Zhu
A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies (a whole-genome assembly and a regional chromosome assembly) were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additional approximately 12,000 computationally derived genes with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious, almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with 75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal function, with tissue-specific developmental regulation, and with the hemostasis and immune systems. DNA sequence comparisons between the consensus sequence and publicly funded genome data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins, but the task of determining which SNPs have functional consequences remains an open challenge.
Mesh-terms: Algorithms; Animals; Chromosome Banding; Chromosome Mapping; Chromosomes, Artificial, Bacterial; Computational Biology; Consensus Sequence; CpG Islands; DNA, Intergenic; Databases, Factual; Evolution, Molecular; Exons; Female; Gene Duplication; Genes; Genome, Human; Human; Human Genome Project; Introns; Male; Phenotype; Physical Chromosome Mapping; Polymorphism, Single Nucleotide; Proteins :: genetics; Proteins :: physiology; Pseudogenes; Repetitive Sequences, Nucleic Acid; Retroelements; Sequence Analysis, DNA :: methods; Species Specificity; Support, Non-U.S. Gov't; Variation (Genetics) ;
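The coverage figures quoted in the human genome abstract above follow from simple ratios, which the sketch below recomputes from the abstract's own numbers. The variable and function names are hypothetical. Dividing by the 2.91-billion bp consensus length gives a value slightly below the reported 5.11-fold, so the abstract presumably used a slightly different genome-size estimate; that reading is an assumption, not something the abstract states.

```python
# Figures taken from the abstract; names are illustrative only.
READS = 27_271_853        # high-quality sequence reads
TOTAL_BP = 14.8e9         # total sequence generated, in bp
CONSENSUS_BP = 2.91e9     # euchromatic consensus length, in bp


def fold_coverage(total_bp: float, genome_bp: float) -> float:
    """Average number of times each genome base is sequenced."""
    return total_bp / genome_bp


mean_read_length = TOTAL_BP / READS                 # roughly 543 bp per read
coverage = fold_coverage(TOTAL_BP, CONSENSUS_BP)    # roughly 5.1-fold

# One difference per 1250 bp between two random haploid genomes,
# applied to the consensus length (assumes a uniform rate, which the
# abstract says does not hold locally).
expected_differences = CONSENSUS_BP / 1250

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"coverage: {coverage:.2f}-fold")
print(f"expected pairwise differences: {expected_differences:,.0f}")
```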
Adam Siepel,
Gill Bejerano,
Jakob S Pedersen,
Angie S Hinrichs,
Minmei Hou,
Kate Rosenbloom,
Hiram Clawson,
John Spieth,
Ladeana W Hillier,
Stephen Richards,
George M Weinstock,
Richard K Wilson,
Richard A Gibbs,
W James Kent,
Webb Miller,
David Haussler
Center for Biomolecular Science and Engineering, University of California, Santa Cruz, Santa Cruz, California 95064, USA. acs@soe.ucsc.edu
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%-8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%-53%), Caenorhabditis elegans (18%-37%), and Saccharomyces cerevisiae (47%-68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3' UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
Mesh-terms: 3' Untranslated Regions; Animals; Base Pairing :: genetics; Base Sequence; Caenorhabditis elegans :: genetics; Conserved Sequence; DNA, Intergenic; Evolution, Molecular; Genome; Humans; Insects :: genetics; Molecular Sequence Data; Research Support, N.I.H., Extramural; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, P.H.S. ; Saccharomyces :: genetics; Vertebrates :: genetics; Yeasts :: genetics;
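The two-state idea behind the conserved-element search described above can be sketched with an ordinary hidden Markov model. This is a deliberately simplified stand-in and not phastCons: a real phylo-HMM scores each alignment column with a phylogenetic substitution model, whereas this toy reduces every column to a single flag (identical across species or not). The function name and all probabilities are hypothetical values chosen for illustration.

```python
import math


def viterbi_conserved(columns, p_match_cons=0.95, p_match_non=0.6, p_stay=0.9):
    """Label alignment columns 'C' (conserved) or 'N' (non-conserved).

    columns: sequence of bools, True if the column is identical across species.
    All probabilities are illustrative assumptions, not fitted parameters.
    """
    if not columns:
        return ""
    states = ("C", "N")
    p_match = {"C": p_match_cons, "N": p_match_non}

    def log_emit(state, identical):
        p = p_match[state] if identical else 1.0 - p_match[state]
        return math.log(p)

    def log_trans(prev, cur):
        return math.log(p_stay if prev == cur else 1.0 - p_stay)

    # Initialise with a uniform prior over the two states.
    score = {s: math.log(0.5) + log_emit(s, columns[0]) for s in states}
    backpointers = []
    for identical in columns[1:]:
        new_score, pointers = {}, {}
        for cur in states:
            best_prev = max(states, key=lambda prev: score[prev] + log_trans(prev, cur))
            new_score[cur] = (score[best_prev] + log_trans(best_prev, cur)
                              + log_emit(cur, identical))
            pointers[cur] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back the most probable state path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


# Toy usage: a long identical stretch flanked by variable columns.
flags = [False] * 10 + [True] * 40 + [False] * 10
print(viterbi_conserved(flags))
```

The `p_stay` parameter plays the role of a smoothing control: values closer to 1 favour longer predicted elements, which loosely mirrors the calibration issue the abstract mentions.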
Kayoko Yamada,
Jun Lim,
Joseph M Dale,
Huaming Chen,
Paul Shinn,
Curtis J Palm,
Audrey M Southwick,
Hank C Wu,
Christopher Kim,
Michelle Nguyen,
Paul Pham,
Rosa Cheuk,
George Karlin-Newmann,
Shirley X Liu,
Bao Lam,
Hitomi Sakano,
Troy Wu,
Guixia Yu,
Molly Miranda,
Hong L Quach,
Matthew Tripp,
Charlie H Chang,
Jeong M Lee,
Mitsue Toriumi,
Marie M H Chan,
Carolyn C Tang,
Courtney S Onodera,
Justine M Deng,
Kenji Akiyama,
Yasser Ansari,
Takahiro Arakawa,
Jenny Banh,
Fumika Banno,
Leah Bowser,
Shelise Brooks,
Piero Carninci,
Qimin Chao,
Nathan Choy,
Akiko Enju,
Andrew D Goldsmith,
Mani Gurjal,
Nancy F Hansen,
Yoshihide Hayashizaki,
Chanda Johnson-Hopson,
Vickie W Hsuan,
Kei Iida,
Meagan Karnes,
Shehnaz Khan,
Eric Koesema,
Junko Ishida,
Paul X Jiang,
Ted Jones,
Jun Kawai,
Asako Kamiya,
Cristina Meyers,
Maiko Nakajima,
Mari Narusaka,
Motoaki Seki,
Tetsuya Sakurai,
Masakazu Satou,
Racquel Tamse,
Maria Vaysberg,
Erika K Wallender,
Cecilia Wong,
Yuki Yamamura,
Shiaulou Yuan,
Kazuo Shinozaki,
Ronald W Davis,
Athanasios Theologis,
Joseph R Ecker
Plant Gene Expression Center, Albany, CA 94710, USA.
Functional analysis of a genome requires accurate gene structure information and a complete gene inventory. A dual experimental strategy was used to verify and correct the initial genome sequence annotation of the reference plant Arabidopsis. Sequencing full-length cDNAs and hybridizations using RNA populations from various tissues to a set of high-density oligonucleotide arrays spanning the entire genome allowed the accurate annotation of thousands of gene structures. We identified 5817 novel transcription units, including a substantial amount of antisense gene transcription, and 40 genes within the genetically defined centromeres. This approach resulted in completion of approximately 30% of the Arabidopsis ORFeome as a resource for global functional experimentation of the plant proteome.
Mesh-terms: Arabidopsis :: genetics; Chromosome Mapping; Chromosomes, Plant :: genetics; Cloning, Molecular; Computational Biology; DNA, Complementary :: genetics; DNA, Intergenic; Expressed Sequence Tags; Gene Expression Profiling; Genes, Plant; Genome, Plant; Genomics; Nucleic Acid Hybridization; Oligonucleotide Array Sequence Analysis; Open Reading Frames; RNA, Messenger :: genetics; RNA, Plant :: genetics; Reverse Transcriptase Polymerase Chain Reaction; Support, Non-U.S. Gov't; Support, U.S. Gov't, Non-P.H.S. ; Transcription, Genetic;
Gill Bejerano,
Michael Pheasant,
Igor Makunin,
Stuart Stephen,
W James Kent,
John S Mattick,
David Haussler
Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, CA 95064, USA.
There are 481 segments longer than 200 base pairs (bp) that are absolutely conserved (100% identity with no insertions or deletions) between orthologous regions of the human, rat, and mouse genomes. Nearly all of these segments are also conserved in the chicken and dog genomes, with an average of 95 and 99% identity, respectively. Many are also significantly conserved in fish. These ultraconserved elements of the human genome are most often located either overlapping exons in genes involved in RNA processing or in introns or nearby genes involved in the regulation of transcription and development. Along with more than 5000 sequences of over 100 bp that are absolutely conserved among the three sequenced mammals, these represent a class of genetic elements whose functions and evolutionary origins are yet to be determined, but which are more highly conserved between these species than are proteins and appear to be essential for the ontogeny of mammals and other vertebrates.
Mesh-terms: Alternative Splicing; Animals; Base Sequence; Chickens :: genetics; Computational Biology; Conserved Sequence; DNA, Intergenic; Dogs :: genetics; Evolution, Molecular; Exons; Gene Expression Regulation; Genes; Genes, Structural; Genome; Genome, Human; Human; Introns; Mice :: genetics; Molecular Sequence Data; Mutation; Nucleic Acid Conformation; RNA :: chemistry; RNA :: genetics; RNA :: metabolism; Rats :: genetics; Support, Non-U.S. Gov't; Support, U.S. Gov't, P.H.S. ; Takifugu :: genetics;
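The criterion for ultraconserved elements stated above (100% identity with no insertions or deletions across the aligned genomes) amounts to scanning a multiple alignment for long runs of identical, gap-free columns. A minimal sketch follows, assuming the orthologous sequences are already aligned to equal length with "-" marking gaps. The function name, the toy sequences, and the inclusive length threshold are illustrative assumptions, not the authors' code.

```python
def ultraconserved_runs(aligned, min_len=200):
    """Return (start, end) half-open intervals of alignment columns where
    every sequence carries the same base and none has a gap.

    aligned: list of equal-length aligned sequences ('-' marks a gap).
    min_len: minimum run length to report (treated as inclusive here).
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must have equal length")
    seqs = [s.upper() for s in aligned]
    runs = []
    start = None
    # Iterate one past the end so a run touching the last column is closed.
    for i in range(length + 1):
        identical = (
            i < length
            and seqs[0][i] != "-"
            and all(s[i] == seqs[0][i] for s in seqs)
        )
        if identical:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs


# Toy usage with a short threshold; column 6 differs and contains a gap.
human = "ACGTACGTAC"
mouse = "ACGTACCTAC"
rat = "ACGTAC-TAC"
print(ultraconserved_runs([human, mouse, rat], min_len=3))
```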
U.S. Department of Energy Joint Genome Institute, Walnut Creek, CA 94598, USA.
Mesh-terms: Animals; Anura :: genetics; Base Sequence; Conserved Sequence; DNA, Intergenic; Drosophila Proteins; Enhancer Elements (Genetics) ; Evolution, Molecular; Gene Expression Regulation; Genes, Reporter; Human; Humans; Introns; Mice; Mice, Transgenic; Nuclear Proteins :: genetics; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, U.S. Gov't, P.H.S. ; Support, U.S. Gov't, Non-P.H.S. ; Support, U.S. Gov't, P.H.S. ; Synteny; Takifugu :: genetics; Tetraodontiformes :: genetics; Xenopus :: genetics; Zebrafish :: genetics;
M K Raghuraman,
E A Winzeler,
D Collingwood,
S Hunt,
L Wodicka,
A Conway,
D J Lockhart,
R W Davis,
B J Brewer,
W L Fangman
Department of Genetics, Department of Mathematics, University of Washington, Seattle, WA 98195, USA. raghu@u.washington.edu
Oligonucleotide microarrays were used to map the detailed topography of chromosome replication in the budding yeast Saccharomyces cerevisiae. The times of replication of thousands of sites across the genome were determined by hybridizing replicated and unreplicated DNAs, isolated at different times in S phase, to the microarrays. Origin activations take place continuously throughout S phase but with most firings near mid-S phase. Rates of replication fork movement vary greatly from region to region in the genome. The two ends of each of the 16 chromosomes are highly correlated in their times of replication. This microarray approach is readily applicable to other organisms, including humans.
Mesh-terms: Algorithms; Base Sequence; Centromere :: metabolism; Chromosomes, Fungal :: genetics; Chromosomes, Fungal :: metabolism; DNA Replication; DNA, Fungal :: biosynthesis; DNA, Fungal :: genetics; DNA, Fungal :: metabolism; DNA, Intergenic; Fourier Analysis; Genome, Fungal; Kinetics; Nucleic Acid Hybridization; Oligonucleotide Array Sequence Analysis; Replication Origin; S Phase; Saccharomyces cerevisiae :: cytology; Saccharomyces cerevisiae :: genetics; Saccharomyces cerevisiae :: metabolism; Support, U.S. Gov't, P.H.S. ; Telomere :: metabolism; Transcription, Genetic;
Ewen F Kirkness,
Vineet Bafna,
Aaron L Halpern,
Samuel Levy,
Karin Remington,
Douglas B Rusch,
Arthur L Delcher,
Mihai Pop,
Wei Wang,
Claire M Fraser,
J Craig Venter
Laboratory of Genomic Diversity, National Cancer Institute, Frederick, MD 21702, USA.
A survey of the dog genome sequence (6.22 million sequence reads; 1.5x coverage) demonstrates the power of sample sequencing for comparative analysis of mammalian genomes and the generation of species-specific resources. More than 650 million base pairs (>25%) of dog sequence align uniquely to the human genome, including fragments of putative orthologs for 18,473 of 24,567 annotated human genes. Mutation rates, conserved synteny, repeat content, and phylogeny can be compared among human, mouse, and dog. A variety of polymorphic elements are identified that will be valuable for mapping the genetic basis of diseases and traits in the dog.
Mesh-terms: Animals; Chromosomes, Mammalian :: genetics; Comparative Study; Computational Biology; Conserved Sequence; Contig Mapping; DNA, Intergenic; Dogs :: genetics; Genome; Genome, Human; Genomics; Human; Long Interspersed Nucleotide Elements; Male; Mice :: genetics; Molecular Sequence Data; Mutation; Phylogeny; Physical Chromosome Mapping; Polymorphism, Single Nucleotide; RNA, Messenger :: genetics; Repetitive Sequences, Nucleic Acid; Sequence Alignment; Sequence Analysis, DNA; Short Interspersed Nucleotide Elements; Support, Non-U.S. Gov't; Synteny; Variation (Genetics) ;
Viktor Stolc,
Zareen Gauhar,
Christopher Mason,
Gabor Halasz,
Marinus F van Batenburg,
Scott A Rifkin,
Sujun Hua,
Tine Herreman,
Waraporn Tongprasit,
Paolo Emilio Barbano,
Harmen J Bussemaker,
Kevin P White
Yale University, New Haven, CT, USA.
We used a maskless photolithography method to produce DNA oligonucleotide microarrays with unique probe sequences tiled throughout the genome of Drosophila melanogaster and across predicted splice junctions. RNA expression of protein coding and nonprotein coding sequences was determined for each major stage of the life cycle, including adult males and females. We detected transcriptional activity for 93% of annotated genes and RNA expression for 41% of the probes in intronic and intergenic sequences. Comparison to genome-wide RNA interference data and to gene annotations revealed distinguishable levels of expression for different classes of genes and higher levels of expression for genes with essential cellular functions. Differential splicing was observed in about 40% of predicted genes, and 5440 previously unknown splice forms were detected. Genes within conserved regions of synteny with D. pseudoobscura had highly correlated expression; these regions ranged in length from 10 to 900 kilobase pairs. The expressed intergenic and intronic sequences are more likely to be evolutionarily conserved than nonexpressed ones, and about 15% of them appear to be developmentally regulated. Our results provide a draft expression map for the entire nonrepetitive genome, which reveals a much more extensive and diverse set of expressed sequences than was previously predicted.
Mesh-terms: Algorithms; Animals; Computational Biology; DNA, Intergenic; Drosophila :: genetics; Drosophila Proteins :: genetics; Drosophila Proteins :: physiology; Drosophila melanogaster :: genetics; Drosophila melanogaster :: growth & development; Evolution, Molecular; Exons; Female; Gene Expression; Gene Expression Profiling; Genes, Insect; Genome; Introns; Life Cycle Stages; Male; Oligonucleotide Array Sequence Analysis; Oligonucleotide Probes; RNA Splicing; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S. ; Research Support, U.S. Gov't, P.H.S. ; Synteny; Transcription, Genetic;
Paul Cliften,
Priya Sudarsanam,
Ashwin Desikan,
Lucinda Fulton,
Bob Fulton,
John Majors,
Robert Waterston,
Barak A Cohen,
Mark Johnston
Department of Genetics, Washington University School of Medicine, 660 South Euclid Avenue, St. Louis, MO 63110, USA.
The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.
Mesh-terms: Algorithms; Base Sequence; Binding Sites; Comparative Study; Computational Biology; Conserved Sequence; DNA, Intergenic; Gene Expression Profiling; Genes, Fungal; Genome, Fungal; Molecular Sequence Data; Phylogeny; Regulatory Sequences, Nucleic Acid; Research Support, U.S. Gov't, P.H.S. ; Saccharomyces :: classification; Saccharomyces :: genetics; Saccharomyces :: physiology; Saccharomyces cerevisiae :: genetics; Saccharomyces cerevisiae :: physiology; Sequence Alignment; Sequence Analysis, DNA; Support, U.S. Gov't, P.H.S. ; Transcription Factors :: metabolism;
