BioInfoBank Library


Genome Res. 2006 Nov 22;: 17122085 Cited:42
The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom;
This study describes a new tool for accurate and reliable high-throughput detection of copy number variation in the human genome. We have constructed a large-insert clone DNA microarray covering the entire human genome in tiling path resolution that we have used to identify copy number variation in human populations. Crucial to this study has been the development of a robust array platform and analytic process for the automated identification of copy number variants (CNVs). The array consists of 26,574 clones covering 93.7% of euchromatic regions. Clones were selected primarily from the published "Golden Path," and mapping was confirmed by fingerprinting and BAC-end sequencing. Array performance was extensively tested by a series of validation assays. These included determining the hybridization characteristics of each individual clone on the array by chromosome-specific add-in experiments. Estimation of data reproducibility and false-positive/negative rates was carried out using self-self hybridizations, replicate experiments, and independent validations of CNVs. Based on these studies, we developed a variance-based automatic copy number detection analysis process (CNVfinder) and have demonstrated its robustness by comparison with the SW-ARRAY method.
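The abstract does not spell out how CNVfinder works. The Python sketch below is only a rough illustration of variance-based calling in general. It assumes each clone's noise can be estimated from replicate self-self hybridizations. It then flags a clone when its test log2 ratio falls more than k standard deviations from that clone's self-self mean. The function name `call_clones`, the threshold `k=3.0` and the per-clone rule are hypothetical, and this is not the published CNVfinder procedure.

```python
import statistics

def call_clones(test_log2, self_self_log2, k=3.0):
    """Call gains/losses per clone from one test hybridization (hypothetical rule).

    test_log2: {clone_id: log2 ratio in the test hybridization}
    self_self_log2: {clone_id: [log2 ratios from replicate self-self hybridizations]}
    """
    calls = {}
    for clone, ratio in test_log2.items():
        baseline = self_self_log2[clone]
        mean = statistics.mean(baseline)
        sd = statistics.stdev(baseline)  # sample SD; needs at least 2 replicates
        if ratio - mean > k * sd:
            calls[clone] = "gain"
        elif ratio - mean < -k * sd:
            calls[clone] = "loss"
    return calls
```

Using a per-clone threshold instead of one global cutoff means a clone that is noisy in the self-self data must deviate further before it is called.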

Latest citations:

Nucleic Acids Res. 2009 Jun 18;: 19541849
Institute of Human Genetics, Medical University of Graz, Harrachgasse 21/8, A-8010 Graz, Das Kinderwunsch-Institut Schenk GmbH, Am Sendergrund 11, A-8143 Dobl, Institute for Genomics and Bioinformatics, Graz University of Technology, Petersgasse 14/V, 8010 Graz, Austria, Department of Genetics, Institute for Cancer Research, Department of Pathology, Norwegian Radium Hospital, Oslo University Hospital, 0310 Oslo and Biomedical Research Group, Department of Informatics, University of Oslo, P.O. Box 1080, Blindern, 0316 Oslo, Norway.
Clinical DNA is often available only in limited quantities, requiring whole-genome amplification for subsequent genome-wide assessment of copy-number variation (CNV) by array-CGH. In pre-implantation diagnosis and analysis of micrometastases, sometimes only single cells are available for analysis. However, procedures allowing high-resolution analyses of CNVs from single cells, well below the resolution limits of conventional cytogenetics, are lacking. Here, we applied amplification products of single cells and of cell pools (5 or 10 cells) from patients with developmental delay, cancer cell lines and polar bodies to various oligo tiling array platforms with a median probe spacing as fine as 65 bp. Our high-resolution analyses reveal that the low amounts of template DNA do not result in a completely unbiased whole-genome amplification; instead, stochastic amplification artifacts, which become more obvious on array platforms with tiling-path resolution, cause significant noise. We implemented a new evaluation algorithm specifically for the identification of small gains and losses in such very noisy ratio profiles. Our data suggest that, when assessed with sufficiently sensitive methods, high-resolution oligo-arrays allow reliable identification of CNVs as small as 500 kb in cell pools (5 or 10 cells) and of 2.6-3.0 Mb in single cells.
Pharmacogenomics. 2009 Jun ;10 (6):1043-53 19530973
Center for Human Genetics Research & Department of Molecular Physiology & Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA.
AIMS: The 'rhythmonome' is the term we have adopted to describe the set of genes that determine the normal coordinated electrical activity in the heart. Elements of this set include pore-forming ion channels, function-modifying proteins and intracellular calcium control elements. Rare mutations in many of these genes are known to cause unusual congenital monogenic arrhythmia syndromes, and single common variants have been reported to modify arrhythmia phenotypes. Here, we report an evaluation of the variation and haplotype structure in six key components of the rhythmonome. MATERIALS & METHODS: SNPs were typed using DNA extracted from Coriell cell lines to survey allele frequencies and haplotype structure in six genes (ANK2, SCN5A, KCNE1 and 2 gene cluster, KCNQ1, KCNH2 and RYR2) across four human populations (African-American, European American, Han Chinese and Mexican American). RESULTS: A total of 307 SNPs were analyzed across the six genes, revealing significant allele-frequency differences between populations and clear differences in haplotype structure. CONCLUSIONS: The pattern of variation we report is an important step towards incorporating common variation across the rhythmonome in studies of arrhythmia susceptibility.
Genetics. 2009 Jun 15;: 19528327
Baylor College of Medicine.
Copy number variation contributes in phenotypically relevant ways to the genetic variability of many organisms. Cost-effective genome-wide methods for identifying copy number variation are necessary to elucidate the contribution that these structural variants make to the genomes of model organisms. We have developed a novel approach for the identification of copy number variation by next generation sequencing. As a proof of concept our method has been applied to map the deletions of three Drosophila deficiency strains. We demonstrate that low sequence coverage is sufficient for identifying and mapping large deletions at kilobase resolution, suggesting that data generated from high-throughput sequencing experiments are sufficient for simultaneously analyzing many strains. Genomic DNA from two Drosophila deficiency stocks was barcoded and sequenced in multiplex, and the breakpoints associated with each deletion were successfully identified. The approach we describe is immediately applicable to the systematic exploration of copy number variation in model organisms and humans.
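The abstract does not describe the method in detail. One common way to find deletions from low-coverage sequencing is to compare read counts in fixed-size windows against a reference sample. The Python sketch below does this under several simple assumptions. Reads are given as 0-based start positions on one chromosome. Library sizes are normalised by total read count. Windows whose normalised depth ratio falls below a hypothetical cutoff `max_ratio` are merged into candidate intervals. This is a generic read-depth sketch and not the authors' pipeline.

```python
from collections import Counter

def window_counts(read_starts, window):
    """Number of reads whose start position falls in each fixed-size window."""
    return Counter(pos // window for pos in read_starts)

def candidate_deletions(test_starts, ref_starts, window=1000, max_ratio=0.6):
    """Return merged (start, end) bp intervals where normalised test depth is low."""
    test = window_counts(test_starts, window)
    ref = window_counts(ref_starts, window)
    scale = len(ref_starts) / len(test_starts)  # equalise library sizes
    # Only windows with reference reads can be assessed; Counter returns 0 for absent keys.
    low = sorted(w for w, n_ref in ref.items() if test[w] * scale / n_ref < max_ratio)
    intervals = []
    for w in low:
        start, end = w * window, (w + 1) * window
        if intervals and intervals[-1][1] == start:
            intervals[-1] = (intervals[-1][0], end)  # extend a run of low windows
        else:
            intervals.append((start, end))
    return intervals
```

Merging consecutive low windows is what turns per-window evidence into breakpoint estimates. The resolution of those breakpoints is limited to the window size.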
Annu Rev Genomics Hum Genet. 2009 May 19;: 19453250
Jiannis Ragoussis
Genomics Laboratory, Wellcome Trust Centre for Human Genetics, Oxford University, Oxford, OX3 7BN, United Kingdom; email: ioannis.ragoussis@well.ox.ac.uk.
The past few years have seen enormous advances in genotyping technology, including chips that accommodate in excess of 1 million SNP assays. In addition, the cost per genotype has been driven down to levels unimagined only a few years ago. These developments have resulted in an explosion of positive whole-genome association studies and the identification of many new genes for common diseases. Here I review high-throughput genotyping platforms, as well as other approaches for lower numbers of assays but high sample throughput, which play an important role in genotype validation and study replication. Further, the utility of SNP arrays for detecting structural variation through the development of genotyping algorithms is reviewed, and methods for long-range haplotyping are presented. It is anticipated that in the future, sample throughput and cost savings will be increased further through the combination of automation, microfluidics, and nanotechnologies.
Genet Epidemiol. 2009 Apr 27;: 19399904
Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, Alabama.
While recent genomic surveys reveal growing numbers of di-allelic copy number variations, it is genes with multiallelic (>2) copy numbers that have shown association with distinct phenotypes. Current high-throughput laboratory methods are restricted to enumerating total gene copy numbers (GCNs) per individual and not the "genotype," i.e. gene copy per chromosome. Thus, association studies of multiallelic GCNs have been limited to comparison of median copies in different groups. Our new nonparametric statistical approach is based on GCN information within a trio-based study design. We present theoretical derivation of the statistics and results of simulation studies that show robustness of our approach and power under several genetic models.
Bioinformatics. 2009 Apr 23;: 19389735
Department of Statistics and Actuarial Science, University of Waterloo, Waterloo, Ontario, Canada.
MOTIVATION: Efficient and accurate ascertainment of copy number variations (CNVs) at the population level is essential to understand the evolutionary process and population genetics, and to apply CNVs in population-based genome-wide association studies for complex human diseases. We propose a novel Bayesian segmentation approach to identify CNVs in a defined population of any size. It is computationally efficient and provides statistical evidence for the detected CNVs through the Bayes factor. This approach has the unique feature of carrying out segmentation and assigning copy number status simultaneously, a desirable property that current segmentation methods do not share. RESULTS: In comparisons with popular two-step segmentation methods for a single individual using benchmark simulation studies, we find the new approach to perform competitively with respect to false discovery rate and sensitivity in breakpoint detection. In a simulation study of multiple samples with recurrent copy numbers, the new approach outperforms two leading single-sample methods. We further demonstrate the effectiveness of our approach in population-level analysis of previously published HapMap data. We also apply our approach in studying population genetics of CNVs. AVAILABILITY: R programs are available at http://www.mshri.on.ca/mitacs/software/SOFTWARE.HTML CONTACT: lwu@math.uwaterloo.ca.
Methods Mol Biol. 2009 ;529 :1-22 19381982
Martin Dufva
Technical University of Denmark, Kgs. Lyngby, Denmark.
DNA microarrays can be used for a large number of applications where high throughput is needed. The ability to probe a sample for hundreds to millions of different molecules at once has made the DNA microarray one of the fastest-growing techniques since its introduction about 15 years ago. Microarray technology can be used for large-scale genotyping, gene expression profiling, comparative genomic hybridization and resequencing, among other applications. Microarray technology is a complex mixture of numerous technologies and research fields, such as mechanics, microfabrication, chemistry, DNA behaviour, microfluidics, enzymology, optics and bioinformatics. This chapter will give an introduction to each of the five basic steps in microarray technology: fabrication, target preparation, hybridization, detection and data analysis. Basic concepts and nomenclature used in the field of microarray technology and their relationships will also be explained.
Methods Mol Biol. 2009 ;529 :1-13 19381971
Wellcome Trust Sanger Institute, Cambridge, UK.
Microarray-based Comparative Genomic Hybridization (array-CGH) has been applied for a decade to screen for submicroscopic DNA gains and losses in tumor and constitutional DNA samples. This method has become increasingly flexible with the integration of new biological resources generated by genome sequencing projects. In this chapter, we describe alternative strategies for whole genome screening and high resolution breakpoint mapping of copy number changes by array-CGH, as well as tools available for accurate analysis of array-CGH experiments. Although most methods listed here have been designed for microarrays comprising large-insert clones, they can be adapted easily to other types of microarray platforms, such as those constructed from printed or synthesized oligonucleotides.
Neuropsychiatr Dis Treat. 2007 Oct ;3 (5):613-8 19300590
Coriell Institute for Medical Research, Camden, NJ, USA.
Family history, which includes both common environmental and genetic effects, is associated with an increased risk for many neuropsychiatric diseases. Investigators have identified several disease-causing mutations for specific neuropsychiatric disorders that display Mendelian segregation. Such discoveries can lead to more rational drug design and improved intervention from a better understanding of the underlying biological mechanisms. However, a key challenge of genetic discovery in human complex diseases, including neuropsychiatric disorders, is that most diseases with genetic components display non-Mendelian patterns of inheritance. Recent advances in human population genetics include high-density genome-wide analyses of single nucleotide polymorphisms (SNPs) that make it possible to study complex genetic contributions to human disease. This approach is currently the most powerful strategy for analyzing the genetics of complex diseases. Genome-wide SNP analyses often require a large collaborative effort to collect, manage, and disseminate the numerous samples and corresponding clinical data. In this review we discuss the use of publicly available biorepositories for the collection and distribution of human genetic material, associated phenotypic information, and their use in genome-wide investigations of human neuropsychiatric diseases.
Cytogenet Genome Res. 2008 ;123 (1-4):333-42 19287172
J H Lee, J T Jeon
Division of Animal Science and Resources, College of Agriculture and Life Sciences, Chungnam National University, Daejeon, Korea. jtjeon@gnu.ac.kr
Copy number variations (CNVs) have effects on phenotypes by altering transcription levels of genes and may have major impacts on protein sequence, structure and function. Therefore, CNV screening and analysis focused on the identification of CNV-genetic disease relations are actively progressing. CNVs can be detected and analyzed by various methodologies at the genome-wide and locus-specific levels. The genome-wide analysis of CNVs has been enhanced by bioinformatic tools for long-range sequence analysis, and comparative genome hybridization using microarrays containing either single nucleotide polymorphisms or bacterial artificial chromosome clones that represent the whole genome. RFLP followed by Southern blot analysis, quantitative real-time PCR, pyrosequencing, ligation detection reaction and the invader assay have become the main tools for locus-specific analysis so far. In this review, we present a brief principle, application history, and strengths and weaknesses of the methods used to detect CNVs at the genome-wide and locus-specific levels.

Other papers by authors:

Nature. 2006 Nov 23;444 (7118):444-454 17122850 Cited:725
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK.
Copy number variation (CNV) of DNA sequences is functionally significant but has yet to be fully ascertained. We have constructed a first-generation CNV map of the human genome through the study of 270 individuals from four populations with ancestry in Europe, Africa or Asia (the HapMap collection). DNA from these individuals was screened for CNV using two complementary technologies: single-nucleotide polymorphism (SNP) genotyping arrays, and clone-based comparative genomic hybridization. A total of 1,447 copy number variable regions (CNVRs), which can encompass overlapping or adjacent gains or losses, covering 360 megabases (12% of the genome) were identified in these populations. These CNVRs contained hundreds of genes, disease loci, functional elements and segmental duplications. Notably, the CNVRs encompassed more nucleotide content per genome than SNPs, underscoring the importance of CNV in genetic diversity and evolution. The data obtained delineate linkage disequilibrium patterns for many CNVs, and reveal marked variation in copy number among populations. We also demonstrate the utility of this resource for genetic disease studies.
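The abstract defines a CNVR as a region that can encompass overlapping or adjacent gains or losses. The Python sketch below shows a minimal version of that merging step. It assumes calls are half-open (chrom, start, end, kind) intervals and treats "adjacent" as one call starting exactly where the current region ends. The function name and data layout are hypothetical, and this is not the study's actual merging procedure.

```python
def merge_into_cnvrs(calls):
    """Merge overlapping or abutting CNV calls into copy number variable regions.

    calls: iterable of (chrom, start, end, kind) tuples, half-open [start, end).
    Returns a list of dicts holding chrom, start, end and the set of merged call kinds.
    """
    regions = []
    for chrom, start, end, kind in sorted(calls):  # groups by chrom, then orders by start
        last = regions[-1] if regions else None
        if last is not None and last["chrom"] == chrom and start <= last["end"]:
            last["end"] = max(last["end"], end)  # extend the current region
            last["kinds"].add(kind)
        else:
            regions.append({"chrom": chrom, "start": start, "end": end, "kinds": {kind}})
    return regions
```

Keeping the set of kinds shows when a CNVR contains both gains and losses, which the abstract notes can happen.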
Nature. 2009 Oct 7;: 19812545
[1] The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA UK [2] These authors contributed equally to this work.
Structural variations of DNA greater than 1 kilobase in size account for most bases that vary among human genomes, but are still relatively under-ascertained. Here we use tiling oligonucleotide microarrays, comprising 42 million probes, to generate a comprehensive map of 11,700 copy number variations (CNVs) greater than 443 base pairs, of which most (8,599) have been validated independently. For 4,978 of these CNVs, we generated reference genotypes from 450 individuals of European, African or East Asian ancestry. The predominant mutational mechanisms differ among CNV size classes. Retrotransposition has duplicated and inserted some coding and non-coding DNA segments randomly around the genome. Furthermore, by correlation with known trait-associated single nucleotide polymorphisms (SNPs), we identified 30 loci with CNVs that are candidates for influencing disease susceptibility. Despite this, having assessed the completeness of our map and the patterns of linkage disequilibrium between CNVs and SNPs, we conclude that, for complex traits, the heritability void left by genome-wide association studies will not be accounted for by common CNVs.
Genome Res. 2006 Jun 29;: 16809666 Cited:3
Department of Pathology, Brigham and Women’s Hospital, Boston, Massachusetts 02115, USA;
DNA copy number variation has long been associated with specific chromosomal rearrangements and genomic disorders, but its ubiquity in mammalian genomes was not fully realized until recently. Although our understanding of the extent of this variation is still developing, it seems likely that, at least in humans, copy number variants (CNVs) account for a substantial amount of genetic variation. Since many CNVs include genes that result in differential levels of gene expression, CNVs may account for a significant proportion of normal phenotypic variation. Current efforts are directed toward a more comprehensive cataloging and characterization of CNVs that will provide the basis for determining how genomic diversity impacts biological function, evolution, and common human diseases.
Nat Genet. 2007 Sep 9;: 17828263 Cited:29
Starch consumption is a prominent characteristic of agricultural societies and hunter-gatherers in arid environments. In contrast, rainforest and circum-arctic hunter-gatherers and some pastoralists consume much less starch. This behavioral variation raises the possibility that different selective pressures have acted on amylase, the enzyme responsible for starch hydrolysis. We found that copy number of the salivary amylase gene (AMY1) is correlated positively with salivary amylase protein level and that individuals from populations with high-starch diets have, on average, more AMY1 copies than those with traditionally low-starch diets. Comparisons with other loci in a subset of these populations suggest that the extent of AMY1 copy number differentiation is highly unusual. This example of positive selection on a copy number-variable gene is, to our knowledge, one of the first discovered in the human genome. Higher AMY1 copy numbers and protein levels probably improve the digestion of starchy foods and may buffer against the fitness-reducing effects of intestinal disease.
Nat Genet. 2007 Jul ;39 (7 Suppl):S7-15 17597783 Cited:56
There has been an explosion of data describing newly recognized structural variants in the human genome. In the flurry of reporting, there has been no standard approach to collecting the data, assessing its quality or describing identified features. This risks becoming a rampant problem, in particular with respect to surveys of copy number variation and their application to disease studies. Here, we consider the challenges in characterizing and documenting genomic structural variants. From this, we derive recommendations for standards to be adopted, with the aim of ensuring the accurate presentation of this form of genetic variation to facilitate ongoing research.
Genome Res. 2008 Sep 4;: 18775914 Cited:2
Arizona State University;
Copy number variants (CNVs) underlie many aspects of human phenotypic diversity and provide the raw material for gene duplication and gene family expansion. However, our understanding of their evolutionary significance remains limited. We performed comparative genomic hybridization on a single human microarray platform to identify CNVs among the genomes of 30 humans and 30 chimpanzees, as well as fixed copy number differences between species. We found that human and chimpanzee CNVs occur in orthologous genomic regions far more often than expected by chance and are strongly associated with the presence of highly homologous intra-chromosomal segmental duplications. By adapting population genetic analyses for use with copy number data, we identified functional categories of genes that have likely evolved under purifying or positive selection for copy number changes. In particular, duplications and deletions of genes with inflammatory response and cell proliferation functions may have been fixed by positive selection and involved in the adaptive phenotypic differentiation of humans and chimpanzees.
EMBO J. 2007 May 10;: 17491589 Cited:7
The Wellcome Trust and Cancer Research UK Gurdon Institute, Department of Zoology, University of Cambridge, Cambridge, UK.
Phosphorylated histone H2AX (gammaH2AX) is generated in nucleosomes flanking sites of DNA double-strand breaks, triggering the recruitment of DNA-damage response proteins such as MDC1 and 53BP1. Here, we study shortened telomeres in senescent human cells. We show that most telomeres trigger gammaH2AX formation, which spreads up to 570 kb into the subtelomeric regions. Furthermore, we reveal that the spreading patterns of 53BP1 and MDC1 are very similar to that of gammaH2AX, consistent with a structural link between these factors. Moreover, different subsets of telomeres signal in different cell lines, with those that signal tending to equate to the shortest telomeres of the corresponding cell line, thus linking telomere attrition with DNA-damage signalling. Notably, we find that, in some cases, gammaH2AX spreading is modulated in a manner suggesting that H2AX distribution or its ability to be phosphorylated is not uniform along the chromosome. Finally, we observe weak gammaH2AX signals at telomeres of proliferating cells, but not in hTERT immortalised cells, suggesting that low telomerase activity leads to telomere uncapping and senescence in proliferating primary cells.
Science. 2007 Feb 9;315 (5813):848-53 17289997 Cited:95
Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Extensive studies are currently being performed to associate disease susceptibility with one form of genetic variation, namely, single-nucleotide polymorphisms (SNPs). In recent years, another type of common genetic variation has been characterized, namely, structural variation, including copy number variants (CNVs). To determine the overall contribution of CNVs to complex phenotypes, we have performed association analyses of expression levels of 14,925 transcripts with SNPs and CNVs in individuals who are part of the International HapMap project. SNPs and CNVs captured 83.6% and 17.7% of the total detected genetic variation in gene expression, respectively, but the signals from the two types of variation had little overlap. Interrogation of the genome for both types of variants may be an effective way to elucidate the causes of complex phenotypes and disease in humans.
Nucleic Acids Res. 2006 Dec 18;: 17178751 Cited:7
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SA, UK.
Heterogeneity in the genome copy number of tissues is of particular importance in solid tumor biology. Furthermore, many clinical applications such as pre-implantation and non-invasive prenatal diagnosis would benefit from the ability to characterize individual single cells. As the amount of DNA from single cells is so small, several PCR protocols have been developed in an attempt to achieve unbiased amplification. Many of these approaches are suitable for subsequent cytogenetic analyses using conventional methodologies such as comparative genomic hybridization (CGH) to metaphase spreads. However, attempts to harness array-CGH for single-cell analysis to provide improved resolution have been disappointing. Here we describe a strategy that combines single-cell amplification using GenomePlex library technology (GenomePlex® Single Cell Whole Genome Amplification Kit, Sigma-Aldrich, UK) and detailed analysis of genomic copy number changes by high-resolution array-CGH. We show that single copy changes as small as 8.3 Mb in single cells are detected reliably with single cells derived from various tumor cell lines as well as patients presenting with trisomy 21 and Prader-Willi syndrome. Our results demonstrate the potential of this technology for studies of tumor biology and for clinical diagnostics.
Nat Genet. 2006 Nov 22;: 17115057 Cited:28
Program in Genetics and Genomic Biology, The Hospital for Sick Children and Department of Molecular and Medical Genetics, University of Toronto and The Centre for Applied Genomics, MaRS Centre, Toronto, Ontario, M5G 1L7, Canada.
Numerous types of DNA variation exist, ranging from SNPs to larger structural alterations such as copy number variants (CNVs) and inversions. Alignment of DNA sequence from different sources has been used to identify SNPs and intermediate-sized variants (ISVs). However, only a small proportion of total heterogeneity is characterized, and little is known of the characteristics of most smaller-sized (<50 kb) variants. Here we show that genome assembly comparison is a robust approach for identification of all classes of genetic variation. Through comparison of two human assemblies (Celera's R27c compilation and the Build 35 reference sequence), we identified megabases of sequence (in the form of 13,534 putative non-SNP events) that were absent, inverted or polymorphic in one assembly. Database comparison and laboratory experimentation further demonstrated overlap or validation for 240 variable regions and confirmed >1.5 million SNPs. Some differences were simple insertions and deletions, but in regions containing CNVs, segmental duplication and repetitive DNA, they were more complex. Our results uncover substantial undescribed variation in humans, highlighting the need for comprehensive annotation strategies to fully interpret genome scanning and personalized sequencing projects.

Latest similar papers:

Brief Bioinform. 2010 Jan 6;: 20053733
The advent of high-throughput sequencing (HTS) technologies is enabling sequencing of human genomes at a significantly lower cost. The availability of these genomes is hoped to enable novel medical diagnostics and treatment, specific to the individual, thus launching the era of personalized medicine. The data currently generated by HTS machines require extensive computational analysis in order to identify genomic variants present in the sequenced individual. In this paper, we overview HTS technologies and discuss several of the plethora of algorithms and tools designed to analyze HTS data, including algorithms for read mapping, as well as methods for identification of single-nucleotide polymorphisms, insertions/deletions and large-scale structural variants and copy-number variants from these mappings.
Bioinformatics. 2009 Dec 18;: 20022973
Wellcome Trust Sanger Institute, Hinxton, CB10 1HH, UK.
SUMMARY: We have developed an algorithm to detect copy number variants (CNVs) in homozygous organisms, such as inbred laboratory strains of mice, from short read sequence data. Our novel approach exploits the fact that inbred mice are homozygous at virtually every position in the genome to detect CNVs using a hidden Markov model (HMM). This HMM uses both the density of sequence reads mapped to the genome, and the rate of apparent heterozygous single nucleotide polymorphisms (SNP), to determine genomic copy number. We tested our algorithm on short read sequence data generated from re-sequencing chromosome 17 of the mouse strains A/J and CAST/EiJ with the Illumina platform. In total, we identified 118 copy number variants (43 for A/J and 75 for CAST/EiJ). We investigated the performance of our algorithm through comparison to CNVs previously identified by array-comparative genomic hybridization (array CGH). We performed quantitative-PCR validation on a subset of the calls that differed from the array CGH data sets. AVAILABILITY: The software described in this manuscript, named cnD for copy number detector, is free and released under the GPL. The program is implemented in the D programming language using the Tango library. Source code and pre-compiled binaries are available at www.sanger.ac.uk/Software/analysis/cnd/. CONTACT: rd@sanger.ac.uk.
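The cnD HMM combines read density with the rate of apparent heterozygous SNPs. The simplified Python sketch below keeps only the read-density part, which makes it a deliberate reduction of the model described above. Per-window read counts are modelled as Poisson with a mean proportional to copy number, and the Viterbi algorithm recovers the most likely copy-number path. The state set, `reads_per_copy`, the stay probability `p_stay` and the small `eps` used for copy number 0 are illustrative assumptions, not cnD's parameters.

```python
import math

def poisson_logpmf(k, mu):
    """log P(K = k) for a Poisson distribution with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def viterbi_copy_number(counts, reads_per_copy, states=(0, 1, 2, 3, 4),
                        p_stay=0.99, eps=0.05):
    """Most likely copy-number state for each window of read counts (read depth only)."""
    n = len(states)
    log_stay = math.log(p_stay)
    log_move = math.log((1 - p_stay) / (n - 1))  # switching equally likely to any other state

    def emit(k, cn):
        # Copy number 0 gets a small non-zero mean (eps) so stray mapped reads are possible.
        return poisson_logpmf(k, max(cn, eps) * reads_per_copy)

    def trans(i, j):
        return log_stay if i == j else log_move

    score = [math.log(1 / n) + emit(counts[0], cn) for cn in states]  # uniform start
    back = []  # back[t][j]: best predecessor of state j at window t + 1
    for k in counts[1:]:
        ptr = [max(range(n), key=lambda i: score[i] + trans(i, j)) for j in range(n)]
        score = [score[ptr[j]] + trans(ptr[j], j) + emit(k, states[j]) for j in range(n)]
        back.append(ptr)

    j = max(range(n), key=lambda i: score[i])  # best final state
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    return [states[i] for i in reversed(path)]
```

Raising `p_stay` makes the path more reluctant to switch states. This trades sensitivity to short CNVs for fewer spurious breakpoints.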
Genome Biol. 2009 Nov 9;10 (11):R125 19900272
ABSTRACT: BACKGROUND: Copy number variants (CNVs) account for a large proportion of genetic variation in the genome. The initial discoveries of long (>100kb) CNVs in normal healthy individuals were made on BAC arrays and low resolution oligonucleotide arrays. Subsequent studies that used higher resolution microarrays and SNP genotyping arrays detected the presence of large numbers of CNVs that are <100kb, with median lengths of ~10kb. More recently, whole genome sequencing of individuals has revealed an abundance of shorter CNVs with lengths <1kb. RESULTS: We used custom high density oligonucleotide arrays in whole-genome scans at ~200bp resolution; and followed up with a localized CNV typing array at resolutions as close as 10bp, to confirm regions from the initial genome scans, and to detect the occurrence of sample-level events at shorter CNV regions identified in recent whole-genome sequencing studies. We surveyed 90 Yoruba Nigerians from the HapMap Project, and uncovered ~2,700 potentially novel CNVs, not previously reported in the literature, having a median length of ~3kb. We generated sample-level event calls in the 90 Yoruba at nearly 9,000 regions, including ~2,500 regions having a median length of just ~200bp that represent the union of CNVs independently discovered through whole-genome sequencing of two individuals of Western European descent. Event frequencies were noticeably higher at shorter regions <1kb, compared to longer CNVs (>1kb). CONCLUSIONS: As new shorter CNVs are discovered through whole-genome sequencing, high resolution microarrays offer a cost-effective means to detect the occurrence of events at these regions in large numbers of individuals in order to gain biological insights beyond the initial discovery.
Genome Biol. 2009 Oct 22;10 (10):R119 19849861
ABSTRACT: Copy number variants (CNVs) have roles in human disease, and DNA microarrays are important tools for identifying them. In this paper, we frame CNV identification as an objective function optimization problem. We apply our method to data from hundreds of samples, and demonstrate its ability to detect CNVs at a high level of sensitivity without sacrificing specificity. Its performance compares favorably with currently available methods and it reveals previously unreported gains and losses.
Biostatistics. 2009 Oct 15;: 19837654 Cited:1
Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK. cdg@sanger.ac.uk.
High-throughput oligonucleotide microarrays are commonly employed to investigate genetic disease, including cancer. The algorithms employed to extract genotypes and copy number variation function optimally for diploid genomes, which are usually associated with inherited disease. However, cancer genomes are aneuploid, which leads to systematic errors when these techniques are used. We introduce a preprocessing transformation and hidden Markov model algorithm tailored to cancer. This produces genotype classification, specification of regions of loss of heterozygosity, and absolute allelic copy number segmentation. Accurate prediction is demonstrated with a combination of independent experimental techniques. These methods are exemplified with Affymetrix genome-wide SNP6.0 data from 755 cancer cell lines, enabling inference upon a number of features of biological interest. These data and the coded algorithm are freely available for download.
Prenat Diagn. 2009 Sep 30;: 19795450 (P,S,G,E,B,D)
Signature Genomic Laboratories, Spokane, WA, USA.
OBJECTIVE: To determine the detection rates of whole-genome microarray technology compared to targeted microarray analysis for chromosome abnormalities in prenatal samples submitted for diagnostic testing. METHODS: Microarray analysis using either whole-genome bacterial artificial chromosome (BAC)-based and oligonucleotide (oligo)-based microarrays or targeted BAC microarrays was performed on 182 and 62 prenatal cases, respectively, submitted by North American healthcare providers; none of the cases had a previously known chromosome abnormality or a family history of a parent with a known chromosome rearrangement. RESULTS: Microarray analysis identified clinically significant chromosome alterations in 7 out of 182 (3.8%) prenatal specimens, two of which each had two unrelated abnormalities. After excluding the two cases in which the abnormality would have been identified by routine karyotyping, the diagnostic yield of clinically significant findings was 5 out of 182 (2.7%). One case had a finding of unclear significance (0.5%) and 16 cases had benign copy number variants (CNVs) (8.8%). Targeted microarray analysis combined with previously published data demonstrated detection rates of 0.9% for clinically significant results, 0.5% for results of unclear significance, and 8.0% for benign CNVs. CONCLUSIONS: Whole-genome prenatal aCGH detected clinically significant submicroscopic chromosome abnormalities, in addition to chromosome abnormalities that could be identified by concurrent karyotyping, without an increase in unclear results or benign CNVs compared to targeted aCGH. Copyright (c) 2009 John Wiley & Sons, Ltd.
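The whole-genome percentages above follow directly from the case counts; a short check, using only the counts stated in the abstract (the helper name `rate` is illustrative):

```python
def rate(count, total):
    """Percentage of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


whole_genome_cases = 182
findings = {
    "clinically significant": 7,
    "clinically significant, excluding karyotype-detectable": 5,
    "unclear significance": 1,
    "benign CNVs": 16,
}
for label, n in findings.items():
    print(f"{label}: {n}/{whole_genome_cases} = {rate(n, whole_genome_cases)}%")
```

The recomputed values match the reported 3.8%, 2.7%, 0.5% and 8.8%.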
BMC Genomics. 2009 Sep 28;10 (1):453 19785739 (P,S,G,E,B,D)
ABSTRACT: BACKGROUND: Copy number variation (CNV) in the human genome is recognised as a widespread and important source of human genetic variation. The challenge now is to screen for these CNVs at high resolution in a reliable, accurate and cost-effective way. RESULTS: Multiplex Amplifiable Probe Hybridisation (MAPH) is a sensitive, high-resolution technology appropriate for screening for CNVs in a defined region, for a targeted population. We have developed MAPH into a highly multiplexed format ("QuadMAPH") that allows the user a four-fold increase in the number of loci tested simultaneously. We have used this method to analyse a genomic region of 210kb, including the MSH2 gene and 120kb of flanking DNA. We show that the QuadMAPH probes report copy number with accuracy equivalent to simplex MAPH, reliably demonstrating diploid copy number in control samples and accurately detecting deletions in hereditary nonpolyposis colorectal cancer (HNPCC) samples. CONCLUSIONS: QuadMAPH is an accurate, high-resolution method that allows targeted screening of large numbers of subjects without the expense of genome-wide approaches. Whilst we have applied this technique to a region of the human genome, it is equally applicable to the genomes of other organisms.
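Probe-based methods such as MAPH typically infer copy number by comparing each probe's normalized signal with that of diploid reference samples. The sketch below shows one simple version of that ratio logic; the normalization to control probes, the factor of 2 for a diploid baseline, and all probe names and signal values are assumptions for illustration, not the QuadMAPH analysis procedure.

```python
from statistics import mean


def normalize(signals, control_ids):
    """Divide every probe's signal by the mean signal of the control probes."""
    scale = mean(signals[p] for p in control_ids)
    return {probe: s / scale for probe, s in signals.items()}


def estimate_copy_number(sample, references, control_ids):
    """Copy number per probe = 2 x (sample ratio / mean ratio in diploid references)."""
    norm = normalize(sample, control_ids)
    ref_norms = [normalize(r, control_ids) for r in references]
    return {
        probe: 2 * norm[probe] / mean(rn[probe] for rn in ref_norms)
        for probe in norm
    }


# Hypothetical probe signals (e.g. peak heights); the references are assumed diploid.
controls = ["ctrl1", "ctrl2"]
references = [
    {"ctrl1": 100.0, "ctrl2": 100.0, "exon5": 80.0},
    {"ctrl1": 50.0, "ctrl2": 50.0, "exon5": 40.0},
]
sample = {"ctrl1": 70.0, "ctrl2": 70.0, "exon5": 28.0}
est = estimate_copy_number(sample, references, controls)
print({p: round(v, 2) for p, v in est.items()})
```

Here the hypothetical `exon5` probe comes out at about one copy, the pattern expected for a heterozygous deletion, while the control probes stay at two.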
Genome Res. 2009 Aug 5;: 19657104 (P,S,G,E,B,D)
Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, USA;
Methods for the direct detection of copy number variation (CNV) genome-wide have become effective instruments for identifying genetic risk factors for disease. The application of next-generation sequencing platforms to genetic studies promises to improve sensitivity to detect CNVs as well as inversions, indels, and SNPs. New computational approaches are needed to systematically detect these variants from genome sequence data. Existing sequence-based approaches for CNV detection are primarily based on paired-end read mapping (PEM), as reported previously by Tuzun et al. and Korbel et al. Due to limitations of the PEM approach, some classes of CNVs are difficult to ascertain, including large insertions and variants located within complex genomic regions. To overcome these limitations, we developed a method for CNV detection using read depth of coverage. Event-wise testing (EWT) is a method based on significance testing. In contrast to standard segmentation algorithms, which typically perform a likelihood evaluation for every point in the genome, EWT works on intervals of data points, rapidly searching for specific classes of events. The overall false-positive rate is controlled by testing the significance of each possible event and adjusting for multiple testing. Deletions and duplications detected in an individual genome by EWT are examined across multiple genomes to identify polymorphism between individuals. We estimated error rates using simulations based on real data, and we applied EWT to the analysis of chromosome 1 from paired-end shotgun sequence data (30x) from five individuals. Our results suggest that analysis of read depth is an effective approach for the detection of CNVs, and that it captures structural variants that are refractory to established PEM-based methods.
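The interval-based idea can be sketched as follows, loosely in the spirit of EWT rather than as its published implementation. Assumptions: read depth has already been counted in fixed windows; per-window depths are treated as independent normal draws around a robust baseline (median, with MAD scaled to a standard deviation); an interval of `width` windows is called only if every window in it is individually extreme; and the per-window threshold is chosen so that roughly `n / width` disjoint intervals jointly keep the false-positive probability near `alpha`. The function names and toy depths are hypothetical.

```python
import math
from statistics import median


def lower_tail(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def interval_events(depth, width, alpha=0.05):
    """Call deletions/duplications as runs of `width` windows that are all
    individually extreme; overlapping runs are merged into (kind, start, end)."""
    n = len(depth)
    mu = median(depth)
    # Robust spread: median absolute deviation scaled to a normal SD.
    sigma = 1.4826 * median(abs(d - mu) for d in depth)
    z = [(d - mu) / sigma for d in depth]
    p_low = [lower_tail(v) for v in z]    # evidence for a deletion
    p_high = [lower_tail(-v) for v in z]  # evidence for a duplication
    # If all `width` windows must pass, an interval is a false positive with
    # probability t**width; spread alpha over ~n/width intervals (assumption).
    t = (alpha / (n / width)) ** (1.0 / width)
    events = []
    for kind, p in (("deletion", p_low), ("duplication", p_high)):
        for i in range(n - width + 1):
            if all(p[j] < t for j in range(i, i + width)):
                end = i + width
                if events and events[-1][0] == kind and events[-1][2] >= i:
                    events[-1] = (kind, events[-1][1], end)  # extend overlapping run
                else:
                    events.append((kind, i, end))
    return events


# Toy window depths around ~30x with a single-copy deletion at windows 10-13.
depth = [30, 31, 29, 30, 32, 28, 30, 31, 29, 30,
         15, 14, 16, 15,
         30, 29, 31, 30, 32, 28, 30, 31, 29, 30]
print(interval_events(depth, width=3))
```

Requiring every window in the interval to pass keeps calls from spilling into flanking normal windows; in practice one would scan several widths and would also need GC and mappability corrections that this sketch omits.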
Yi Chuan. 2009 Apr ;31 (4):339-47 19586885 (P,S,G,E,B)
Zhi-Jun Wu, Wei Jin
Department of Cardiology, Rui Jin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai 200025, China E-mail: totito19822005@hotmail.com.
Copy number variation (CNV) is increasingly recognized as a source of inter-individual differences in genome sequence and has been proposed as a driving force for genome evolution and phenotypic variation. Many CNVs result in altered levels of gene expression, which may account for a significant proportion of normal phenotypic variation and of human disease. This review outlines the history of CNV research and the strategies used to study CNVs. It then discusses the potential mechanisms of CNV formation and their clinical implications. In addition, it introduces the first-generation copy number variation map of the human genome, which showed that DNA copy number variation is associated with specific chromosomal rearrangements and genomic disorders.
Genomics. 2009 Jun 24;: 19559783 (P,S,G,E,B,D)
Institute of Biomedical Sciences, Academia Sinica, 128, Academia Road, Section 2 Nankang, Taipei 115, Taiwan; Division of Molecular and Genomic Medicine, National Health Research Institutes, Zhunan, Taiwan.
Copy number variation (CNV) is a form of DNA sequence variation in the human genome. CNVs can affect expression of nearby and distant genes, and some of them might cause certain phenotypic differences. CNVs vary slightly in location and frequency among different populations. Because currently available CNV information from Asian populations was limited to a few small-scale studies with only dozens of subjects, this study conducted a high-resolution CNV survey in a large number of Han Chinese. The Illumina HumanMap550K single-nucleotide polymorphism array was used to identify CNVs in 813 unrelated Han Chinese residing in Taiwan. A total of 365 CNV regions were identified in this population; the average size of the CNV regions was 235 kb (covering a total of 2.86% of the human genome), and 67 (18.4%) were newly discovered CNV regions. Two hundred and seventy-nine CNV regions (76%) were verified in 304 randomly selected samples by Affymetrix 500 K GeneChip and qPCR experiments. These regions contain 1,029 genes, some of which are associated with diseases. Consistent with previous studies, most CNVs were rare structural variations in the human genome, and only 64 regions (17.5%) had a CNV allele frequency greater than 1%. Our discovery of 67 new CNV regions indicates that previous CNV coverage of the human genome is incomplete and that there is diversity among different ethnic populations. Comprehensive knowledge of CNVs in the human genome is important for further genetic studies.
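The summary percentages and the 1% allele-frequency cut-off can be checked with a few lines. The region counts are those stated in the abstract; the allele-frequency function assumes a hypothetical diploid coding (0, 1 or 2 CNV alleles per individual) and the example genotypes are invented for illustration, so this is not how the study computed frequencies.

```python
def pct(part, whole):
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def cnv_allele_frequency(genotypes):
    """Variant-allele frequency at one CNV region.

    genotypes: number of CNV alleles per individual (0, 1 or 2); this
    diploid coding is an assumption for illustration.
    """
    return sum(genotypes) / (2 * len(genotypes))


# Counts reported in the abstract.
regions, new_regions, verified, common = 365, 67, 279, 64
print(pct(new_regions, regions), pct(verified, regions), pct(common, regions))

# Hypothetical region in 813 individuals with 10 heterozygous carriers.
genotypes = [1] * 10 + [0] * 803
freq = cnv_allele_frequency(genotypes)
print(round(freq, 4), freq > 0.01)
```

Under this coding, ten heterozygous carriers among 813 people give an allele frequency of about 0.6%, which would place such a region among the rare CNVs.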