Jacob F Degner,
Athma A Pai,
Roger Pique-Regi,
Jean-Baptiste Veyrieras,
Daniel J Gaffney,
Joseph K Pickrell,
Sherryl De Leon,
Katelyn Michelini,
Noah Lewellen,
Gregory E Crawford,
Matthew Stephens,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA.
The mapping of expression quantitative trait loci (eQTLs) has emerged as an important tool for linking genetic variation to changes in gene regulation. However, it remains difficult to identify the causal variants underlying eQTLs, and little is known about the regulatory mechanisms by which they act. Here we show that genetic variants that modify chromatin accessibility and transcription factor binding are a major mechanism through which genetic variation leads to gene expression differences among humans. We used DNase I sequencing to measure chromatin accessibility in 70 Yoruba lymphoblastoid cell lines, for which genome-wide genotypes and estimates of gene expression levels are also available. We obtained a total of 2.7 billion uniquely mapped DNase I-sequencing (DNase-seq) reads, which allowed us to produce genome-wide maps of chromatin accessibility for each individual. We identified 8,902 locations at which the DNase-seq read depth correlated significantly with genotype at a nearby single nucleotide polymorphism or insertion/deletion (false discovery rate = 10%). We call such variants 'DNase I sensitivity quantitative trait loci' (dsQTLs). We found that dsQTLs are strongly enriched within inferred transcription factor binding sites and are frequently associated with allele-specific changes in transcription factor binding. A substantial fraction (16%) of dsQTLs are also associated with variation in the expression levels of nearby genes (that is, these loci are also classified as eQTLs). Conversely, we estimate that as many as 55% of eQTL single nucleotide polymorphisms are also dsQTLs. Our observations indicate that dsQTLs are highly abundant in the human genome and are likely to be important contributors to phenotypic variation.
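The abstract above describes testing whether DNase-seq read depth correlates with genotype and reporting hits at a 10% false discovery rate. The sketch below illustrates that general idea only; the abstract does not state which test or FDR procedure the authors used, so the Pearson correlation, the permutation p-value, the Benjamini-Hochberg step, and all function names here are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no detectable association
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(genotypes, depths, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the genotype/read-depth correlation.

    genotypes: allele dosages coded 0, 1, 2 (hypothetical coding).
    depths: read depth in one window, one value per individual.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(genotypes, depths))
    shuffled = list(depths)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(genotypes, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, fdr=0.10):
    """Return sorted indices of tests rejected by the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * fdr / m:
            cutoff_rank = rank  # keep the largest rank that passes
    return sorted(order[:cutoff_rank])
```

In use, one p-value would be computed per (window, variant) pair and the full list passed to `benjamini_hochberg` with `fdr=0.10`; a real analysis would also need to handle covariates and the correlation between nearby tests, which this sketch ignores.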
Keywords: eqtl; qtl; dnase; chromatin accessibility; expression; expression variation; variation; human expression; chromatin; factor binding; gene; loci; accessibility; variant; map;
Other papers by authors:
Genome Biol. 2012 Jan 31;13 (1):R7
22293038
Daniel J Gaffney,
Jean-Baptiste Veyrieras,
Jacob F Degner,
Roger Pique-Regi,
Athma A Pai,
Gregory E Crawford,
Matthew Stephens,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, University of Chicago, 920 E 58th Street, Chicago, IL 60637, USA. dg13@sanger.ac.uk.
ABSTRACT: BACKGROUND: Expression quantitative trait loci (eQTLs) are likely to play an important role in the genetics of complex traits; however, their functional basis remains poorly understood. Using the HapMap lymphoblastoid cell lines, we combine 1000 Genomes genotypes and an extensive catalogue of human functional elements to investigate the biological mechanisms that eQTLs perturb. RESULTS: We use a Bayesian hierarchical model to estimate the enrichment of eQTLs in a wide variety of regulatory annotations. We find that approximately 40% of eQTLs occur in open chromatin, and that they are particularly enriched in transcription factor binding sites, suggesting that many directly impact protein-DNA interactions. Analysis of core promoter regions shows that eQTLs also frequently disrupt some known core promoter motifs but, surprisingly, are not enriched in other well-known motifs such as the TATA box. We also show that information from regulatory annotations alone, when weighted by the hierarchical model, can provide a meaningful ranking of the SNPs that are most likely to drive gene expression variation. CONCLUSIONS: Our study demonstrates how regulatory annotation and the association signal derived from eQTL-mapping can be combined into a single framework. We used this approach to further our understanding of the biology that drives human gene expression variation, and of the putatively causal SNPs that underlie it.
Genome Biol. 2011 Jul 3;12 (6):405
21791120
Jordana T Bell,
Athma A Pai,
Joseph K Pickrell,
Daniel J Gaffney,
Roger Pique-Regi,
Jacob F Degner,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, The University of Chicago, 920 E. 58th St, Chicago, IL 60637, USA. jordana@well.ox.ac.uk.
Genome Biol. 2011 ;12 (1):R10
21251332
Cit:3
Jordana T Bell,
Athma A Pai,
Joseph K Pickrell,
Daniel J Gaffney,
Roger Pique-Regi,
Jacob F Degner,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA. jordana@well.ox.ac.uk
BACKGROUND DNA methylation is an essential epigenetic mechanism involved in gene regulation and disease, but little is known about the mechanisms underlying inter-individual variation in methylation profiles. Here we measured methylation levels at 22,290 CpG dinucleotides in lymphoblastoid cell lines from 77 HapMap Yoruba individuals, for which genome-wide gene expression and genotype data were also available. RESULTS Association analyses of methylation levels with more than three million common single nucleotide polymorphisms (SNPs) identified 180 CpG-sites in 173 genes that were associated with nearby SNPs (putatively in cis, usually within 5 kb) at a false discovery rate of 10%. The most intriguing trans signal was obtained for SNP rs10876043 in the disco-interacting protein 2 homolog B gene (DIP2B, previously postulated to play a role in DNA methylation), that had a genome-wide significant association with the first principal component of patterns of methylation; however, we found only modest signal of trans-acting associations overall. As expected, we found significant negative correlations between promoter methylation and gene expression levels measured by RNA-sequencing across genes. Finally, there was a significant overlap of SNPs that were associated with both methylation and gene expression levels. CONCLUSIONS Our results demonstrate a strong genetic component to inter-individual variation in DNA methylation profiles. Furthermore, there was an enrichment of SNPs that affect both methylation and gene expression, providing evidence for shared mechanisms in a fraction of genes.
Nature. 2010 Mar 10;:
20220758
Cit:61
Joseph K Pickrell,
John C Marioni,
Athma A Pai,
Jacob F Degner,
Barbara E Engelhardt,
Everlyne Nkadori,
Jean-Baptiste Veyrieras,
Matthew Stephens,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics.
Understanding the genetic mechanisms underlying natural variation in gene expression is a central goal of both medical and evolutionary genetics, and studies of expression quantitative trait loci (eQTLs) have become an important tool for achieving this goal. Although all eQTL studies so far have assayed messenger RNA levels using expression microarrays, recent advances in RNA sequencing enable the analysis of transcript variation at unprecedented resolution. We sequenced RNA from 69 lymphoblastoid cell lines derived from unrelated Nigerian individuals that have been extensively genotyped by the International HapMap Project. By pooling data from all individuals, we generated a map of the transcriptional landscape of these cells, identifying extensive use of unannotated untranslated regions and more than 100 new putative protein-coding exons. Using the genotypes from the HapMap project, we identified more than a thousand genes at which genetic variation influences overall expression levels or splicing. We demonstrate that eQTLs near genes generally act by a mechanism involving allele-specific expression, and that variation that influences the inclusion of an exon is enriched within and near the consensus splice sites. Our results illustrate the power of high-throughput sequencing for the joint analysis of variation in transcription, splicing and allele-specific expression across individuals.
PLoS One. 2012 ;7 (2):e30629
22359548
Jean-Baptiste Veyrieras,
Daniel J Gaffney,
Joseph K Pickrell,
Yoav Gilad,
Matthew Stephens,
Jonathan K Pritchard
Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America. jb.veyrieras@gmail.com
Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3' untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation.
Genome Res. 2011 Mar ;21 (3):447-55
21106904
Cit:5
Department of Human Genetics, University of Chicago, Chicago, Illinois 60637, USA. rpique@uchicago.edu
Accurate functional annotation of regulatory elements is essential for understanding global gene regulation. Here, we report a genome-wide map of 827,000 transcription factor binding sites in human lymphoblastoid cell lines, which is comprised of sites corresponding to 239 position weight matrices of known transcription factor binding motifs, and 49 novel sequence motifs. To generate this map, we developed a probabilistic framework that integrates cell- or tissue-specific experimental data such as histone modifications and DNase I cleavage patterns with genomic information such as gene annotation and evolutionary conservation. Comparison to empirical ChIP-seq data suggests that our method is highly accurate yet has the advantage of targeting many factors in a single assay. We anticipate that this approach will be a valuable tool for genome-wide studies of gene regulation in a wide variety of cell types or tissues under diverse conditions.
Genome Res. 2011 Dec 29;:
22207615
George H Perry,
Pall Melsted,
John C Marioni,
Ying Wang,
Russell Bainer,
Joseph K Pickrell,
Katelyn Michelini,
Sarah Zehr,
Anne D Yoder,
Matthew Stephens,
Jonathan K Pritchard,
Yoav Gilad
University of Chicago;
Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of non-human primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps towards addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 non-human primates. Of the non-human primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5,721 genes per species, and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats therefore have strong potential to achieve long-term success.
Bioinformatics. 2009 Oct 6;:
19808877
Cit:21
Jacob F Degner,
John C Marioni,
Athma A Pai,
Joseph K Pickrell,
Everlyne Nkadori,
Yoav Gilad,
Jonathan K Pritchard
Department of Human Genetics, Howard Hughes Medical Institute, and Committee on Genetics, Genomics and Systems Biology, University of Chicago, 920 E. 58th St., CLSC 507, Chicago, IL 60637.
MOTIVATION: Next-generation sequencing has become an important tool for genome-wide quantification of DNA and RNA. However, a major technical hurdle lies in the need to map short sequence reads back to their correct locations in a reference genome. Here we investigate the impact of SNP variation on the reliability of read-mapping in the context of detecting allele-specific expression (ASE). RESULTS: We generated sixteen million 35 bp reads from mRNA of each of two HapMap Yoruba individuals. When we mapped these reads to the human genome we found that, at heterozygous SNPs, there was a significant bias towards higher mapping rates of the allele in the reference sequence, compared to the alternative allele. Masking known SNP positions in the genome sequence eliminated the reference bias but, surprisingly, did not lead to more reliable results overall. We find that even after masking, approximately 5-10% of SNPs still have an inherent bias towards more effective mapping of one allele. Filtering out inherently biased SNPs removes 40% of the top signals of ASE. The remaining SNPs showing ASE are enriched in genes previously known to harbor cis-regulatory variation or known to show uniparental imprinting. Our results have implications for a variety of applications involving detection of alternate alleles from short-read sequence data. AVAILABILITY: Scripts, written in Perl and R, for simulating short reads, masking SNP variation in a reference genome, and analyzing the simulation output are available upon request from JFD. Raw short read data were deposited in GEO (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE18156. CONTACT: jdegner@uchicago.edu, marioni@uchicago.edu, gilad@uchicago.edu, pritch@uchicago.edu.
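The masking step mentioned in this abstract can be pictured with a minimal sketch. The abstract notes that the authors' own scripts are in Perl and R and available on request, so the Python below is not their code; the function name, the 0-based coordinates, and the use of `N` as the masking character are all assumptions made for illustration.

```python
def mask_snps(reference, snp_positions, mask_char="N"):
    """Return a copy of `reference` with every listed position masked.

    reference: reference sequence as a string.
    snp_positions: 0-based positions of known SNPs (hypothetical convention).
    mask_char: replacement character; 'N' is an assumed choice, so that
        neither allele matches the reference at a masked site.
    """
    masked = list(reference)
    for pos in snp_positions:
        if not 0 <= pos < len(masked):
            raise ValueError(f"SNP position {pos} is outside the sequence")
        masked[pos] = mask_char
    return "".join(masked)


def reference_fraction(ref_reads, alt_reads):
    """Fraction of reads at a heterozygous SNP that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this SNP")
    return ref_reads / total
```

With unbiased mapping and no true allele-specific expression, `reference_fraction` at heterozygous sites would be expected to centre near 0.5; the abstract reports that masking removes the systematic reference bias but that roughly 5-10% of SNPs remain inherently biased.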
Darren A Cusanovich,
Christine Billstrand,
Xiang Zhou,
Claudia Chavarria,
Sherryl De Leon,
Katelyn Michelini,
Athma A Pai,
Carole Ober,
Yoav Gilad
Recent genome-wide association studies (GWAS) have identified a number of novel genetic associations with complex human diseases. In spite of these successes, results from GWAS generally explain only a small proportion of disease heritability, an observation termed the 'missing heritability problem'. Several sources for the missing heritability have been proposed, including the contribution of many common variants with small individual effect sizes, which cannot be reliably found using the standard GWAS approach. The goal of our study was to explore a complementary approach, which combines GWAS results with functional data in order to identify novel genetic associations with small effect sizes. To do so, we conducted a GWAS for lymphocyte count, a physiologic quantitative trait associated with asthma, in 462 Hutterites. In parallel, we performed a genome-wide gene expression study in lymphoblastoid cell lines from 96 Hutterites. We found significant support for genetic associations using the GWAS data when we considered variants near the 193 genes whose expression levels across individuals were most correlated with lymphocyte counts. Interestingly, these variants are also enriched with signatures of an association with asthma susceptibility, an observation we were able to replicate. The associated loci include genes previously implicated in asthma susceptibility as well as novel candidate genes enriched for functions related to T cell receptor signaling and adenosine triphosphate synthesis. Our results, therefore, establish a new set of asthma susceptibility candidate genes. More generally, our observations support the notion that many loci of small effects influence variation in lymphocyte count and asthma susceptibility.
Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA. pickrell@uchicago.edu
MOTIVATION Sequencing-based assays such as ChIP-seq, DNase-seq and MNase-seq have become important tools for genome annotation. In these assays, short sequence reads enriched for loci of interest are mapped to a reference genome to determine their origin. Here, we consider whether false positive peak calls can be caused by a particular type of error in the reference genome: multicopy sequences which have been incorrectly assembled and collapsed into a single copy. RESULTS Using sequencing data from the 1000 Genomes Project, we systematically scanned the human genome for regions of high sequencing depth. These regions are highly enriched for erroneously inferred transcription factor binding sites, positions of nucleosomes and regions of open chromatin. We suggest a simple masking procedure to remove these regions and reduce false positive calls. AVAILABILITY Files for masking out these regions are available at eqtl.uchicago.edu
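The scan-and-mask idea in this abstract can be sketched as follows. The abstract does not give the depth threshold, the window scheme, or the file format of the published masks, so the per-base depth list, the fixed threshold, and both function names are hypothetical simplifications rather than the authors' procedure.

```python
def high_depth_regions(depth, threshold):
    """Return half-open (start, end) intervals where depth exceeds threshold.

    depth: per-position sequencing depth along one chromosome (assumed input).
    threshold: cutoff above which a position is treated as a suspected
        collapsed multicopy sequence (assumed to be chosen by the user).
    """
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d > threshold:
            if start is None:
                start = i  # open a new high-depth run
        elif start is not None:
            regions.append((start, i))  # close the current run
            start = None
    if start is not None:
        regions.append((start, len(depth)))
    return regions


def drop_masked_peaks(peaks, masked):
    """Keep only peaks (start, end) that overlap no masked interval."""
    return [
        (p_start, p_end)
        for p_start, p_end in peaks
        if not any(p_start < m_end and m_start < p_end for m_start, m_end in masked)
    ]
```

Peak calls from a ChIP-seq or DNase-seq experiment would be passed through `drop_masked_peaks` before downstream analysis, which is one simple way to apply a mask of this kind.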
Latest similar papers:
Hum Mol Genet. 2012 May 16;:
22595970
Anbarasu Lourdusamy,
Stephan Newhouse,
Katie Lunnon,
Petra Proitsi,
John Powell,
Angela Hodges,
Sally K Nelson,
Alex Stewart,
Stephen Williams,
Iwona Kloszewska,
Patrizia Mecocci,
Hilkka Soininen,
Magda Tsolaki,
Bruno Vellas,
Simon Lovestone,
Richard Dobson
NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Foundation Trust & Institute of Psychiatry, Kings College London, UK.
Proteins are central to almost all cellular processes, and dysregulation of their expression and function is associated with a range of disorders. A number of studies in humans have recently shown that genetic factors contribute significantly to gene expression variation. By contrast, very little is known about the genetic basis of variation in protein abundance in man. Here, we assayed the abundance levels of proteins in plasma from 96 elderly Europeans using a new aptamer-based proteomic technology and performed genome-wide local (cis-) regulatory association analysis to identify protein quantitative trait loci (pQTL). We detected robust cis-associations for sixty proteins at a false discovery rate of 5%. The most highly significant SNP detected was rs7021589 (FDR, 2.5 × 10(-12)), mapped within the gene coding sequence of Tenascin C (TNC). Importantly, we identified evidence of cis-regulatory variation for twenty protein-coding genes previously associated with disease, including variants with strong evidence of disease association that show significant association with protein abundance levels. These results demonstrate that common genetic variants contribute to the differences in protein abundance levels in human plasma. Identification of pQTLs will significantly enhance our ability to discover and comprehend the biological and functional consequences of loci identified from genome-wide association studies of complex traits. This is the first large-scale genetic association study of proteins in plasma measured using a novel, highly multiplexed Slow Off-rate Modified Aptamer (SOMAmer) proteomic platform.
Diabetologia. 2012 May 16;:
22584726
Division of Systems Medicine, Department of Pediatrics, Stanford University School of Medicine, 1265 Welch Road, Room X163, Stanford, CA, 94305, USA.
AIMS/HYPOTHESIS: While genome-wide association studies (GWASs) have been successful in identifying novel variants associated with various diseases, it has been much more difficult to determine the biological mechanisms underlying these associations. Expression quantitative trait loci (eQTL) provide another dimension to these data by associating single nucleotide polymorphisms (SNPs) with gene expression. We hypothesised that integrating SNPs known to be associated with type 2 diabetes with eQTLs and coexpression networks would enable the discovery of novel candidate genes for type 2 diabetes. METHODS: We selected 32 SNPs associated with type 2 diabetes in two or more independent GWASs. We used previously described eQTLs mapped from genotype and gene expression data collected from 1,008 morbidly obese patients to find genes with expression associated with these SNPs. We linked these genes to coexpression modules, and ranked the other genes in these modules using an inverse sum score. RESULTS: We found 62 genes with expression associated with type 2 diabetes SNPs. We validated our method by linking highly ranked genes in the coexpression modules back to SNPs through a combined eQTL dataset. We showed that the eQTLs highlighted by this method are significantly enriched for association with type 2 diabetes in data from the Wellcome Trust Case Control Consortium (WTCCC, p = 0.026) and the Gene Environment Association Studies (GENEVA, p = 0.042), validating our approach. Many of the highly ranked genes are also involved in the regulation or metabolism of insulin, glucose or lipids. CONCLUSIONS/INTERPRETATION: We have devised a novel method, involving the integration of datasets of different modalities, to discover novel candidate genes for type 2 diabetes.
Hum Mutat. 2012 May 9;:
22573514
Christelle Borel,
Eugenia Migliavacca,
Audrey Letourneau,
Maryline Gagnebin,
Frédérique Béna,
M Reza Sailani,
Emmanouil T Dermitzakis,
Andrew J Sharp,
Stylianos E Antonarakis
Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland; Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York.
Association studies have revealed expression quantitative trait loci (eQTLs) for a large number of genes. However, the causative variants that regulate gene expression levels are generally unknown. We hypothesized that copy number variation of sequence repeats contributes to the expression variation of some genes. Our laboratory has previously identified that the rare expansion of a repeat c.-174CGGGGCGGGGCG in the promoter region of the CSTB gene causes a silencing of the gene, resulting in progressive myoclonus epilepsy. Here, we genotyped the repeat length and quantified CSTB expression by qRT-PCR in 173 lymphoblast (LCLs) and fibroblast samples from the GenCord collection. The majority of alleles contain either 2 or 3 copies of this repeat. Independent analysis revealed that the c.-174CGGGGCGGGGCG repeat length is strongly associated with CSTB expression (P = 3.14 × 10(-11)) in LCLs only. Examination of both genotyped and imputed SNPs within 2 Mb of CSTB revealed that the dodecamer repeat represents the strongest cis-eQTL for CSTB in LCLs. We conclude that the common 2 or 3 copy variation is likely the causative cis-eQTL for CSTB expression variation. More broadly, we propose that polymorphic tandem repeats may represent the causative variation of a fraction of cis-eQTLs in the genome.
PLoS Genet. 2012 Apr ;8 (4):e1002639
22532805
Barbara E Stranger,
Stephen B Montgomery,
Antigone S Dimas,
Leopold Parts,
Oliver Stegle,
Catherine E Ingle,
Magda Sekowska,
George Davey Smith,
David Evans,
Maria Gutierrez-Arcelus,
Alkes Price,
Towfique Raj,
James Nisbett,
Alexandra C Nica,
Claude Beazley,
Richard Durbin,
Panos Deloukas,
Emmanouil T Dermitzakis
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom.
The genetic basis of gene expression variation has long been studied with the aim to understand the landscape of regulatory variants, but also more recently to assist in the interpretation and elucidation of disease signals. To date, many studies have looked in specific tissues and population-based samples, but there has been limited assessment of the degree of inter-population variability in regulatory variation. We analyzed genome-wide gene expression in lymphoblastoid cell lines from a total of 726 individuals from 8 global populations from the HapMap3 project and correlated gene expression levels with HapMap3 SNPs located in cis to the genes. We describe the influence of ancestry on gene expression levels within and between these diverse human populations and uncover a non-negligible impact on global patterns of gene expression. We further dissect the specific functional pathways differentiated between populations. We also identify 5,691 expression quantitative trait loci (eQTLs) after controlling for both non-genetic factors and population admixture and observe that half of the cis-eQTLs are replicated in one or more of the populations. We highlight patterns of eQTL-sharing between populations, which are partially determined by population genetic relatedness, and discover significant sharing of eQTL effects between Asians, European-admixed, and African subpopulations. Specifically, we observe that both the effect size and the direction of effect for eQTLs are highly conserved across populations. We observe an increasing proximity of eQTLs toward the transcription start site as sharing of eQTLs among populations increases, highlighting that variants close to TSS have stronger effects and therefore are more likely to be detected across a wider panel of populations. 
Together these results offer a unique picture and resource of the degree of differentiation among human populations in functional regulatory variation and provide an estimate for the transferability of complex trait variants across populations.
PLoS One. 2012 ;7 (3):e34286
22479588
Pierre R Bushel,
Ray McGovern,
Liwen Liu,
Oliver Hofmann,
Ahsan Huda,
Jun Lu,
Winston Hide,
Xihong Lin
Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, United States of America.
Gene expression quantitative trait loci (eQTL) are useful for identifying single nucleotide polymorphisms (SNPs) associated with diseases. At times, a genetic variant may be associated with a master regulator involved in the manifestation of a disease. The downstream target genes of the master regulator are typically co-expressed and share biological function. Therefore, it is practical to screen for eQTLs by identifying SNPs associated with the targets of a transcript-regulator (TR). We used a multivariate regression with the gene expression of known targets of TRs and SNPs to identify TReQTLs in European (CEU) and African (YRI) HapMap populations. A nominal p-value of <1×10(-6) revealed 234 SNPs in CEU and 154 in YRI as TReQTLs. These represent 36 independent (tag) SNPs in CEU and 39 in YRI affecting the downstream targets of 25 and 36 TRs respectively. At a false discovery rate (FDR) = 45%, one cis-acting tag SNP (within 1 kb of a gene) in each population was identified as a TReQTL. In CEU, the SNP (rs16858621) in Pcnxl2 was found to be associated with the genes regulated by CREM whereas in YRI, the SNP (rs16909324) was linked to the targets of miRNA hsa-miR-125a. To infer the pathways that regulate expression, we ranked TReQTLs by connectivity within the structure of biological process subtrees. One TReQTL SNP (rs3790904) in CEU maps to Lphn2 and is associated (nominal p-value = 8.1×10(-7)) with the targets of the X-linked breast cancer suppressor Foxp3. The structure of the biological process subtree and a gene interaction network of the TReQTL revealed that tumor necrosis factor, NF-kappaB and variants in G-protein coupled receptors signaling may play a central role as communicators in Foxp3 functional regulation. The potential pleiotropic effect of the Foxp3 TReQTLs was gleaned from integrating mRNA-Seq data and SNP-set enrichment into the analysis.
Neurobiol Dis. 2012 Mar 12;:
22433082
Dena G Hernandez,
Mike A Nalls,
Matthew Moore,
Sean Chong,
Allissa Dillman,
Daniah Trabzuni,
J Raphael Gibbs,
Mina Ryten,
Sampath Arepalli,
Michael E Weale,
Alan Zonderman,
Juan Troncoso,
Richard O'Brian,
Robert Walker,
Colin Smith,
Stefania Bandinelli,
Bryan J Traynor,
John Hardy,
Andy B Singleton,
Mark R Cookson
Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA; Department of Molecular Neuroscience, UCL Institute of Neurology, London, UK.
Genome-wide association studies have nominated many genetic variants for common human traits, including diseases, but in many cases the underlying biological reason for a trait association is unknown. Subsets of genetic polymorphisms show a statistical association with transcript expression levels, and have therefore been nominated as expression quantitative trait loci (eQTL). However, many tissue and cell types have specific gene expression patterns and so it is not clear how frequently eQTLs found in one tissue type will be replicated in others. In the present study we used two appropriately powered sample series to examine the genetic control of gene expression in blood and brain. We find that while many eQTLs associated with human traits are shared between these two tissues, there are also examples where blood and brain differ, either by restricted gene expression patterns in one tissue or because of differences in how genetic variants are associated with transcript levels. These observations suggest that design of eQTL mapping experiments should consider tissue of interest for the disease or other traits studied.
Methods Mol Biol. 2012 ;856 :469-85
22399471
Laboratory of Nematology, Wageningen University, Wageningen, The Netherlands, pjotr.prins@wur.nl.
Genetical genomics combines acquired high-throughput genomic data with genetic analysis. In this chapter, we discuss the application of genetical genomics for evolutionary studies, where new high-throughput molecular technologies are combined with mapping quantitative trait loci (QTL) on the genome in segregating populations. The recent explosion of high-throughput data (measuring thousands of proteins and metabolites, deep sequencing, chromatin, and methyl-DNA immunoprecipitation) allows the study of the genetic variation underlying quantitative phenotypes, together termed xQTL. At the same time, mining information is not getting easier. To deal with the sheer amount of information, powerful statistical tools are needed to analyze multidimensional relationships. In the context of evolutionary computational biology, a well-designed experiment may help dissect a complex evolutionary trait using proven statistical methods for associating phenotypical variation with genomic locations. Evolutionary expression QTL (eQTL) studies of the last years focus on gene expression adaptations, mapping the gene expression landscape, and, tentatively, eQTL networks. Here, we discuss the possibility of introducing an evolutionary prior, in the form of gene families displaying evidence of positive selection, and using that in the context of an eQTL experiment for elucidating host-pathogen protein-protein interactions. Through the example of an experimental design, we discuss the choice of xQTL platform, analysis methods, and scope of results. The resulting eQTL can be matched, resulting in putative interacting genes and their regulators. In addition, a prior may help distinguish QTL causality from reactivity, or independence of traits, by creating QTL networks.
Genome Res. 2012 May;22(5):860-869
22300769
Timothy E Reddy,
Jason Gertz,
Florencia Pauli,
Katerina S Kucera,
Katherine E Varley,
Kimberly M Newberry,
Georgi K Marinov,
Ali Mortazavi,
Brian A Williams,
Lingyun Song,
Gregory E Crawford,
Barbara Wold,
Huntington F Willard,
Richard M Myers
HudsonAlpha Institute for Biotechnology, Huntsville, Alabama 35806, USA;
A complex interplay between transcription factors (TFs) and the genome regulates transcription. However, connecting variation in genome sequence with variation in TF binding and gene expression is challenging due to environmental differences between individuals and cell types. To address this problem, we measured genome-wide differential allelic occupancy of 24 TFs and EP300 in a human lymphoblastoid cell line GM12878. Overall, 5% of human TF binding sites have an allelic imbalance in occupancy. At many sites, TFs clustered in TF-binding hubs on the same homolog in especially open chromatin. While genetic variation in core TF binding motifs generally resulted in large allelic differences in TF occupancy, most allelic differences in occupancy were subtle and associated with disruption of weak or noncanonical motifs. We also measured genome-wide differential allelic expression of genes with and without heterozygous exonic variants in the same cells. We found that genes with differential allelic expression were overall less expressed both in GM12878 cells and in unrelated human cell lines. Comparing TF occupancy with expression, we found strong association between allelic occupancy and expression within 100 bp of transcription start sites (TSSs), and weak association up to 100 kb from TSSs. Sites of differential allelic occupancy were significantly enriched for variants associated with disease, particularly autoimmune disease, suggesting that allelic differences in TF occupancy give functional insights into intergenic variants associated with disease. Our results have the potential to increase the power and interpretability of association studies by targeting functional intergenic variants in addition to protein coding sequences.
Mutagenesis. 2012 Mar;27(2):161-7
22294763
Department of Molecular Biology of Cancer, Institute of Experimental Medicine, Academy of Sciences of the Czech Republic, Videnska 1083, 14220 Prague 4, Czech Republic. pardini@biomed.cas.cz
Colorectal cancer (CRC) is one of the most common cancers worldwide, with a peak of incidence in industrialised countries. It is a complex disease related to environmental and genetic risk factors. Low-penetrance genetic variations contribute significantly to sporadic and familial forms of CRC. Genome-wide association studies (GWAS) have uncovered numerous robust associations between common variants and CRC risk; only a few of those were protein-altering non-synonymous polymorphisms. One hypothesis is that non-coding and intergenic variants may change the expression levels of one or several target genes and thus account for a fraction of phenotypic differences, including susceptibility to CRC. Such genetic variations have been detected as expression quantitative trait loci (eQTLs) that show linkage/association to a large number of genes and have been defined as "master regulators of transcription". In the present work, we review the potential of using results from GWAS and eQTL studies to identify and investigate master regulators in CRC susceptibility.