Latest Paper:
Statistical and Computational Genetics, Wellcome Trust Sanger Institute, Cambridge CB10 1HH, United Kingdom.
Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. pickrell@uchicago.edu
Li et al. (Research Articles, 1 July 2011, p. 53; published online 19 May 2011) reported more than 10,000 mismatches between messenger RNA and DNA sequences from the same individuals, which they attributed to previously unrecognized mechanisms of gene regulation. We found that at least 88% of these sequence mismatches can likely be explained by technical artifacts such as errors in mapping sequencing reads to a reference genome, sequencing errors, and genetic variation.
PLoS One. 2012 ;7 (2):e30629
22359548
Jean-Baptiste Veyrieras,
Daniel J Gaffney,
Joseph K Pickrell,
Yoav Gilad,
Matthew Stephens,
Jonathan K Pritchard
Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America.
Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3' untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation.
Daniel G MacArthur,
Suganthi Balasubramanian,
Adam Frankish,
Ni Huang,
James Morris,
Klaudia Walter,
Luke Jostins,
Lukas Habegger,
Joseph K Pickrell,
Stephen B Montgomery,
Cornelis A Albers,
Zhengdong D Zhang,
Donald F Conrad,
Gerton Lunter,
Hancheng Zheng,
Qasim Ayub,
Mark A DePristo,
Eric Banks,
Min Hu,
Robert E Handsaker,
Jeffrey A Rosenfeld,
Menachem Fromer,
Mike Jin,
Xinmeng Jasmine Mu,
Ekta Khurana,
Kai Ye,
Mike Kay,
Gary Ian Saunders,
Marie-Marthe Suner,
Toby Hunt,
If H A Barnes,
Clara Amid,
Denise R Carvalho-Silva,
Alexandra H Bignell,
Catherine Snow,
Bryndis Yngvadottir,
Suzannah Bumpstead,
David N Cooper,
Yali Xue,
Irene Gallego Romero,
Jun Wang,
Yingrui Li,
Richard A Gibbs,
Steven A McCarroll,
Emmanouil T Dermitzakis,
Jonathan K Pritchard,
Jeffrey C Barrett,
Jennifer Harrow,
Matthew E Hurles,
Mark B Gerstein,
Chris Tyler-Smith
Wellcome Trust Sanger Institute, Hinxton, UK. macarthur@atgu.mgh.harvard.edu
Genome-sequencing studies indicate that all humans carry many genetic variants predicted to cause loss of function (LoF) of protein-coding genes, suggesting unexpected redundancy in the human genome. Here we apply stringent filters to 2951 putative LoF variants obtained from 185 human genomes to determine their true prevalence and properties. We estimate that human genomes typically contain ~100 genuine LoF variants with ~20 genes completely inactivated. We identify rare and likely deleterious LoF alleles, including 26 known and 21 predicted severe disease-causing variants, as well as common LoF variants in nonessential genes. We describe functional and evolutionary differences between LoF-tolerant and recessive disease genes and a method for using these differences to prioritize candidate genes found in clinical sequencing studies.
Jacob F Degner,
Athma A Pai,
Roger Pique-Regi,
Jean-Baptiste Veyrieras,
Daniel J Gaffney,
Joseph K Pickrell,
Sherryl De Leon,
Katelyn Michelini,
Noah Lewellen,
Gregory E Crawford,
Matthew Stephens,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA.
The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants 'DNase I sensitivity quantitative trait loci' (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.
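The core idea of a dsQTL, read depth at a site correlating with genotype at a nearby variant, can be illustrated with a minimal sketch. The abstract does not describe the statistical test actually used, so the function below is only an assumption-laden illustration: the name `genotype_depth_association`, the 0/1/2 genotype coding, and the use of a simple least-squares slope with a Pearson correlation are all hypothetical choices, not the authors' method.

```python
from math import sqrt


def genotype_depth_association(genotypes, depths):
    """Regress per-individual read depth on genotype at one variant.

    genotypes: allele counts per individual (assumed coding: 0, 1 or 2).
    depths:    normalized DNase-seq read depth per individual at one site.
    Returns (slope, r): the least-squares slope and the Pearson correlation.
    This is an illustrative stand-in, not the test used in the paper.
    """
    n = len(genotypes)
    if n != len(depths) or n < 3:
        raise ValueError("need matching inputs with at least 3 individuals")

    mean_g = sum(genotypes) / n
    mean_d = sum(depths) / n

    # Sums of squares and cross-products around the means.
    s_gg = sum((g - mean_g) ** 2 for g in genotypes)
    s_dd = sum((d - mean_d) ** 2 for d in depths)
    s_gd = sum((g - mean_g) * (d - mean_d) for g, d in zip(genotypes, depths))

    if s_gg == 0 or s_dd == 0:
        raise ValueError("genotype or depth has no variation")

    slope = s_gd / s_gg
    r = s_gd / sqrt(s_gg * s_dd)
    return slope, r
```

In a genome-wide scan, a statistic like this would be computed for every site and nearby variant pair, and the resulting p-values would then be thresholded at the chosen false discovery rate.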
Genome Res. 2011 Dec 29;:
22207615
George H Perry,
Pall Melsted,
John C Marioni,
Ying Wang,
Russell Bainer,
Joseph K Pickrell,
Katelyn Michelini,
Sarah Zehr,
Anne D Yoder,
Matthew Stephens,
Jonathan K Pritchard,
Yoav Gilad
University of Chicago;
Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of non-human primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps towards addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 non-human primates. Of the non-human primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5,721 genes per species, and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats therefore have strong potential to achieve long-term success.
Genome Biol. 2011 Jul 3;12 (6):405
21791120
Jordana T Bell,
Athma A Pai,
Joseph K Pickrell,
Daniel J Gaffney,
Roger Pique-Regi,
Jacob F Degner,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, The University of Chicago, 920 E, 58th St, Chicago, IL 60637, USA. jordana@well.ox.ac.uk.
Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. pickrell@uchicago.edu
MOTIVATION Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences which have been incorrectly assembled and collapsed into a single copy. RESULTS Using sequencing data from the 1000 Genomes Project, we systematically scanned the human genome for regions of high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, positions of nucleosomes and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls. AVAILABILITY Files for masking out these regions are available at eqtl.uchicago.edu
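The scan-and-mask idea in this abstract can be sketched roughly as follows. This is a hedged illustration only: the abstract does not give the window size, the depth threshold, or the file format of the published masks, so `high_depth_regions`, `mask_peaks`, the 1,000-bp window, and the "3x the median window depth" cutoff are all hypothetical choices made for the example.

```python
from statistics import median


def high_depth_regions(depth, window=1000, fold=3.0):
    """Flag windows whose mean depth exceeds `fold` times the median window mean.

    depth: per-base sequencing depth along one chromosome (non-empty list).
    Returns half-open (start, end) intervals, with adjacent flagged windows
    merged. `window` and `fold` are assumed values, not taken from the paper.
    """
    windows = []
    for start in range(0, len(depth), window):
        chunk = depth[start:start + window]
        windows.append((start, start + len(chunk), sum(chunk) / len(chunk)))

    cutoff = fold * median([mean for _, _, mean in windows])

    regions = []
    for start, end, mean in windows:
        if mean > cutoff:
            if regions and regions[-1][1] == start:
                # Extend the previous region instead of starting a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


def mask_peaks(peaks, regions):
    """Drop any peak (start, end) that overlaps a masked region."""
    return [
        (p_start, p_end)
        for p_start, p_end in peaks
        if not any(p_start < r_end and r_start < p_end
                   for r_start, r_end in regions)
    ]
```

For example, a chromosome with uniform depth 10 except for a 2,000-bp stretch at depth 100 would have that stretch flagged, and any ChIP-seq or DNase-seq peak overlapping it would be removed before downstream analysis.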
Genome Biol. 2011 ;12 (1):R10
21251332
Cit:3
Jordana T Bell,
Athma A Pai,
Joseph K Pickrell,
Daniel J Gaffney,
Roger Pique-Regi,
Jacob F Degner,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA. jordana@well.ox.ac.uk
BACKGROUND DNA methylation is an essential epigenetic mechanism involved in gene regulation and disease, but little is known about the mechanisms underlying inter-individual variation in methylation profiles. Here we measured methylation levels at 22,290 CpG dinucleotides in lymphoblastoid cell lines from 77 HapMap Yoruba individuals, for which genome-wide gene expression and genotype data were also available. RESULTS Association analyses of methylation levels with more than three million common single nucleotide polymorphisms (SNPs) identified 180 CpG-sites in 173 genes that were associated with nearby SNPs (putatively in cis, usually within 5 kb) at a false discovery rate of 10%. The most intriguing trans signal was obtained for SNP rs10876043 in the disco-interacting protein 2 homolog B gene (DIP2B, previously postulated to play a role in DNA methylation), that had a genome-wide significant association with the first principal component of patterns of methylation; however, we found only modest signal of trans-acting associations overall. As expected, we found significant negative correlations between promoter methylation and gene expression levels measured by RNA-sequencing across genes. Finally, there was a significant overlap of SNPs that were associated with both methylation and gene expression levels. CONCLUSIONS Our results demonstrate a strong genetic component to inter-individual variation in DNA methylation profiles. Furthermore, there was an enrichment of SNPs that affect both methylation and gene expression, providing evidence for shared mechanisms in a fraction of genes.
PLoS Genet. 2010 ;6 (12):e1001236
21151575
Cit:4
Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America.
While the majority of multiexonic human genes show some evidence of alternative splicing, it is unclear what fraction of observed splice forms is functionally relevant. In this study, we examine the extent of alternative splicing in human cells using deep RNA sequencing and de novo identification of splice junctions. We demonstrate the existence of a large class of low abundance isoforms, encompassing approximately 150,000 previously unannotated splice junctions in our data. Newly-identified splice sites show little evidence of evolutionary conservation, suggesting that the majority are due to erroneous splice site choice. We show that sequence motifs involved in the recognition of exons are enriched in the vicinity of unconserved splice sites. We estimate that the average intron has a splicing error rate of approximately 0.7% and show that introns in highly expressed genes are spliced more accurately, likely due to their shorter length. These results implicate noisy splicing as an important property of genome evolution.