would be overlapping with inter-specific CNVs between more distant species.
Gene content of chicken CNVR
Within the 1,556 CNVRs, a total of 2,642 unique Ensembl peptides were identified based on chicken build 2.1. To examine whether genes involved in specific pathways or biological processes are more prone to copy number variation, we performed a gene enrichment analysis for the genes located within the CNVRs. The chicken transcript IDs were used as input into DAVID for a gene enrichment and ontology analysis. Terms showing significant enrichment were the GO terms “functional constituent of cytoskeleton”, “nuclear binding”, “cellular response to stress”, and “macromolecule catabolic processes”. The GO term “functional constituent of cytoskeleton” is mainly driven by the keratin superfamily. The avian keratin genes are over-represented when compared to mammals. Phylogenetic analysis demonstrated that evolution of archosaurian epidermal appendages in the lineage leading to birds was accompanied by duplication and divergence of the ancestral β-keratin gene cluster. In the chicken, four subfamilies (claw, feather, feather-like and scale) of the β-keratin genes have been named in accordance with tissue-specific expression and sequence heterogeneity. These β-keratin gene subfamilies are clustered on GGA25, whereas the genes for two other monophyletic groups of feather keratins are located on GGA27 and GGA2, respectively. We observed large CNVRs (CNVRs 863, 873 and 791), up to 2 Mb in size, within all three of these regions of the chicken genome. Within these CNVRs we observed both CNV losses and gains.
Another striking feature of Choosiness-associated CNVs was the high frequency of variants that were categorized as ‘non-identical by state’, i.e. those with strong but not strict overlap in their coordinates across replicate comparisons of Choosy and Non-Choosy populations. By contrast, less than 10% of Choosiness-associated CNVs were identical by state (i.e. had strictly identical coordinates; electronic supplementary material, table S5). CNVs identical by state show a pattern of divergence consistent with a differentiation in allele frequency from shared standing variation, whereas CNVs non-identical by state show a pattern consistent with convergent evolution. Therefore, our results imply that the majority of Choosiness-associated CNVs may have arisen either through convergent copy-number differentiation or through the mutational size-expansion or size-contraction of CNVs, followed by the differentiation of CNV sizes between populations. Both explanations imply a high mutation rate. Convergent evolution at CNV ‘hotspots’ has been described in several detailed case studies [97,98], often attributed to variation in the rate of aberrant recombination. While we did not find an overall effect of recombination rate variation when explaining the genome-wide distribution of CNVs, it is nonetheless possible that fine-scale recombination hotspots could lead to recurrent deletions or tandem duplications, or that the recombination rate data available (measured in laboratory strains) do not reflect the recombination landscape of the wild populations we measured. Other mechanisms of local recurrent mutation, such as DNA fragility, may also contribute.
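The distinction between the two categories can be illustrated with a minimal Python sketch. The `CNV` record, the reciprocal-overlap measure and the 0.5 threshold are assumptions introduced here for illustration; the text defines ‘strong but not strict overlap’ only qualitatively.

```python
from typing import NamedTuple


class CNV(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def reciprocal_overlap(a: CNV, b: CNV) -> float:
    """Smaller of the two fractions of each CNV covered by the shared interval."""
    if a.chrom != b.chrom:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return min(shared / (a.end - a.start), shared / (b.end - b.start))


def classify_pair(a: CNV, b: CNV, min_overlap: float = 0.5) -> str:
    """Label a pair of CNV calls from two replicate comparisons.

    min_overlap is a hypothetical threshold, not a value from the study.
    """
    if a == b:
        return "identical by state"      # strictly identical coordinates
    if reciprocal_overlap(a, b) >= min_overlap:
        return "non-identical by state"  # strong but not strict overlap
    return "no match"


# Hypothetical calls from two replicate population comparisons.
same = classify_pair(CNV("2L", 100, 200), CNV("2L", 100, 200))
shifted = classify_pair(CNV("2L", 100, 200), CNV("2L", 110, 205))
distant = classify_pair(CNV("2L", 100, 200), CNV("2L", 190, 400))
print(same, shifted, distant, sep=" | ")
```

Under this sketch, a size-expanded or size-contracted variant keeps a high reciprocal overlap with its counterpart but fails the strict-equality test, which is the pattern the paragraph attributes to recurrent mutation.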
Our simulations showed that algorithm sensitivity could be adjusted to identify CNVs of <5 Mb. However, this led to increased detection of nonsimulated CNVs, suggesting that specificity was compromised (Supplemental Fig. 1D, second and fourth panels). In these simulation experiments, as with all single-cell sequencing experiments, it is impossible to determine whether the nonsimulated CNVs represent true CNVs undetectable at less sensitive parameters or false CNVs caused by random fluctuations in read depth that are inappropriately identified as CNVs when sensitivity is increased. The next best way to verify CNVs is to sequence cells that are closely related to one another, ideally the two products of a cell division. Barring errors during DNA replication, two daughter cells should have identical or complementary CNVs. A CNV present in only one of the cells, henceforth called a private CNV, therefore likely represents a false positive CNV or a CNV that failed to be detected in the other cell(s).
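As a rough illustration of the private-CNV idea, the sketch below flags calls in one daughter cell that have no overlapping call in its sister. The tuple layout, the coordinates and the 50% reciprocal-overlap rule are hypothetical choices made for the example, not parameters from the study, and copy-number state (gain versus loss) is deliberately ignored so that complementary CNVs still count as matched.

```python
# A CNV call is a (chromosome, start, end) tuple with end exclusive.

def matches(a, b, min_frac=0.5):
    """True if two calls on the same chromosome overlap reciprocally by min_frac."""
    if a[0] != b[0]:
        return False
    shared = min(a[2], b[2]) - max(a[1], b[1])
    if shared <= 0:
        return False
    return shared / (a[2] - a[1]) >= min_frac and shared / (b[2] - b[1]) >= min_frac


def private_cnvs(cell, sister, min_frac=0.5):
    """Calls in `cell` with no matching call in `sister`."""
    return [c for c in cell if not any(matches(c, s, min_frac) for s in sister)]


# Hypothetical calls from the two products of one cell division.
daughter_a = [("chr1", 10_000_000, 20_000_000), ("chr5", 0, 8_000_000)]
daughter_b = [("chr1", 10_500_000, 20_000_000)]

private_a = private_cnvs(daughter_a, daughter_b)
private_b = private_cnvs(daughter_b, daughter_a)
print(private_a)  # the chr5 call has no counterpart in the sister cell
print(private_b)
```

A private call found this way is a candidate false positive in one cell or a missed call in the other; the comparison alone cannot say which.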
allowed accurate SNP genotyping using TaqMan assays as well as Illumina technology. For the latter, the calling agreement for leukocyte and buccal DNA was 99.99%. In the absence of other studies providing similar information, caution is needed when analyzing buccal cell DNA, and new methodological studies specifically addressing these issues are needed. Select commercial SNP-array platforms have included monomorphic probes to improve coverage of CNV analyses. We analyzed whether monomorphic and polymorphic probes performed differently in assessing CNV. Surprisingly, we observed that, regardless of the algorithm used, CNVs showing discordance between duplicates contained a higher proportion of monomorphic probes than CNVs that were concordant. The difference was greater for QuantiSNP. Hence, our findings indicate that polymorphic probes deliver more robust information than monomorphic probes, at least using the current CNV calling tools. Alternatively, monomorphic probes may be concentrated in a small number of large CNVs that are difficult to call, since these probes are not homogeneously distributed across the genome and are placed in regions suspected of harbouring CN changes (Iafrate et al., 2004; Redon et al., 2006). Nevertheless, there is no evidence that CNVs in these regions are larger than those elsewhere.
A variety of mechanisms are thought to give rise to CNVs. A major source of structural variation is non-allelic homologous recombination (NAHR), which occurs due to aberrant pairing of regions of extended homology. Other mechanisms involve re-joining of breaks in DNA but do not require extensive homology. In addition, errors in replication, such as slippage at variable number of tandem repeat (VNTR) loci or insertion of transposable elements, also generate variation in copy number. CNV formation appears to occur at higher rates in certain genomic regions termed rearrangement hotspots. In particular, CNVs associated with NAHR tend to be clustered in the genome, and CNVs are enriched in the vicinity of segmental duplications. This suggests that regions of local sequence homology are hotspots of CNV formation by NAHR [12-14]. In humans, the initiation of meiotic double-stranded breaks (DSBs) is thought to begin with the binding of the protein PRDM9 to a degenerate 13-bp sequence motif [15-17]. This motif is also enriched in CNV breakpoints, including several involved in disease, which implicates DSBs formed in this way in CNV formation by NAHR.
Background: Array-based comparative genomic hybridization (array-CGH) is a recently developed technique for analyzing changes in DNA copy number. As in all microarray analyses, normalization is required to correct for experimental artifacts while preserving the true biological signal. We investigated various sources of systematic variation in array-CGH data and identified two distinct types of spatial effect of no biological relevance as the predominant experimental artifacts: continuous spatial gradients and local spatial bias. Local spatial bias affects a large proportion of arrays, and has not previously been considered in array-CGH experiments. Results: We show that existing normalization techniques do not correct these spatial effects properly. We therefore developed an automatic method for the spatial normalization of array-CGH data. This method makes it possible to delineate and to eliminate and/or correct areas affected by spatial bias. It is based on the combination of a spatial segmentation algorithm called NEM (Neighborhood Expectation Maximization) and spatial trend estimation. We defined quality criteria for array-CGH data, demonstrating significant improvements in data quality with our method for three data sets coming from two different platforms (198, 175 and 26 BAC-arrays). Conclusion: We have designed an automatic algorithm for the spatial normalization of BAC CGH-array data, preventing the misinterpretation of experimental artifacts as biologically relevant outliers in the genomic profile. This algorithm is implemented in the R package MANOR (MicroArray NORmalization), which is described at http://bioinfo.curie.fr/projects/manor and available from the Bioconductor site http://www.bioconductor.org. It can also be tested on the CAPweb bioinformatics platform at http://bioinfo.curie.fr/CAPweb.
number alterations in zebrafish MPNSTs. Initially, we identified CNAs for individual tumors by comparing massively parallel sequencing of DNA taken from fresh tumors with that of normal (tail) tissue from the same fish. This latter control was particularly important because it has been shown that portions of the normal zebrafish genome can exhibit fish-to-fish germline copy number variation. As noted above, the MPNSTs arising within diploid fish have near-triploid genomes. Thus, the copy number calls for the tumor tissue were made relative to this 3N baseline copy number, such that underrepresented chromosomes (“losses”) exist at less than three copies, and overrepresented chromosomes (“gains”) exist at greater than three copies. These zebrafish MPNSTs were isolated from several different genetic backgrounds: 53 came from diploid fish heterozygous for any one of 14 rp mutations (on 11 different chromosomes), and 49 were isolated from diploid fish homozygous for tp53 M214K. In addition, given that MPNSTs have a near-triploid copy number and triploid zebrafish are viable, we also analyzed 45 tumors from triploid tp53 M214K homozygotes to determine whether starting with a triploid genome would alter the genomic content of the resultant tumors. Interestingly, MPNSTs arising in triploid tp53 M214K homozygotes had a pseudo-triploid chromosome number similar to MPNSTs from diploid fish, arguing strongly that this represents the preferred genomic state of this tumor type. Heat maps of all 147 tumors are shown in Figure S3A, and per-sample numerical data are available in Dataset S1 and Dataset S2.
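The effect of calling against a 3N rather than a 2N baseline can be shown with a short Python sketch. The function name, the integer copy numbers and the chromosome labels are hypothetical and serve only to illustrate the rule stated above (fewer than three copies is a loss, more than three is a gain).

```python
def call_relative_to_baseline(copies: int, baseline: int = 3) -> str:
    """Label a chromosome as loss, gain or neutral relative to the baseline ploidy."""
    if copies < baseline:
        return "loss"
    if copies > baseline:
        return "gain"
    return "neutral"


# Hypothetical per-chromosome copy numbers in one near-triploid tumor.
tumor = {"chr1": 2, "chr2": 3, "chr3": 4}

calls_3n = {c: call_relative_to_baseline(n, baseline=3) for c, n in tumor.items()}
calls_2n = {c: call_relative_to_baseline(n, baseline=2) for c, n in tumor.items()}
print(calls_3n)
print(calls_2n)
```

Against a diploid baseline, the chromosome present at three copies would be reported as a gain and the one at two copies as neutral, which would misrepresent a tumor whose overall genome is near-triploid.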
Zhang and Rodaway, 2007; Zhang and Hamza, 2018). Deficiencies in haematopoiesis during development might have downstream effects as well. Other mitochondria-related genes of this GO term were decreased, such as oxct1a, oxct1b, members of cyp450 family 2 and gpx1a, which are related to the metabolism of fatty acids and ketone bodies, exogenous drug catabolism and hydrogen peroxide degradation in the mitochondria. The decrease in mtDNA copy number and subsequent mitochondrial dysfunction appear to result in enrichment of GO terms related to rRNA synthesis, ncRNA processing, and ribosomal assembly (Table 1), and the vast majority of genes in these GO terms were upregulated (Supplementary Table S4). In addition, the GO term tRNA processing was also significantly enriched. In general, expression of genes belonging to these GO terms is increased, including a number of genes that are linked to mitochondrial function, such as mto1 (1.64), trmt10c (1.55), and elac2 (1.61) (Haack et al., 2013; Metodiev et al., 2016; Fakruddin et al., 2018). Increased tRNA synthesis was also observed by Torrent et al. (2018) upon applying diverse stress conditions to Saccharomyces cerevisiae, demonstrating the regulation of tRNA abundance upon cellular stress. However, future research is needed to determine whether the observed activation of nucleolar processes is caused by mitochondrial stress or is a reaction to compensate for the reduction in mitochondrial transcripts. The resulting imbalance between mitochondrial transcripts and proteins, and nuclear-encoded proteins that migrate into the mitochondria, might be a factor as well. Also, expression of 8 out of 9 exosome components, belonging to the GO term ribosomal biogenesis, was significantly upregulated in tfam splice-MO zebrafish.
Like increased rRNA and tRNA expression, upregulation of the exosome components seems to be a compensatory or stress-induced mechanism, as exosomes contain and shuttle mRNA, and are important in intercellular communication and developmental patterning (Wan et al., 2012;
Trimmomatic (Bolger et al., 2014) and transcript abundance estimated using Kallisto (Bray et al., 2016), using the latest reannotation of the Arabidopsis thaliana reference transcriptome (Araport 11). Differential expression was assessed by likelihood ratio testing with Sleuth in R (Pimentel et al., 2017) using “genotype” (i.e., #236, #289 and WT) as a factor in the full model, against a reduced model without genotype information. The minimum detection frequency filter was set to > 0.3 to allow for detection of transcripts detected in at least one of the three genotypes. Transcripts were aggregated into genes during the Sleuth analysis. The comparison of the fits between full and reduced models for the abundance of each gene highlights those whose expression is more likely determined by the genotype than by the null hypothesis. Finally, log2 fold change approximations were extracted relative to WT using a Wald test, also in Sleuth. Genes with an absolute log2 fold change (b-value) > 0.58, representing a fold change of > 1.5, were selected, and p-value correction was performed using Benjamini-Hochberg on the p-values of the genes passing the fold change threshold. The fold change cut-off was selected because individual genotypes were grown on separate plates, potentially creating subtle batch effects. Hence, we only considered targets with a fold change greater than 1.5 as differentially expressed. In summary, genes with i) a likelihood ratio q-value < 0.05, ii) a Wald test q-value < 0.05 and iii) an absolute log2 fold change (b-value) > 0.58 were deemed significantly dysregulated. For MapMan analysis (Usadel et al., 2009), the lists of up- and down-regulated genes, as well as the entire list of expressed genes (background), were used for bin enrichment. Genes were assigned into a MapMan bin structure using Mercator4 (Schwacke et al., 2019).
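The three-part significance filter can be sketched in plain Python as follows. This is an illustration of the selection logic only, not the Sleuth workflow itself; the gene records, field names and numerical values are hypothetical, and the Benjamini-Hochberg routine is a generic implementation written for this example.

```python
import math

FC_CUTOFF = 0.58  # |log2 fold change| cut-off from the text; log2(1.5) is about 0.585
Q_CUTOFF = 0.05


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_dysregulated(genes):
    """Apply the fold-change cut-off, then correct Wald p-values, then both q filters."""
    passing = [g for g in genes if abs(g["b"]) > FC_CUTOFF]
    wald_q = benjamini_hochberg([g["wald_p"] for g in passing])
    return [
        g["id"]
        for g, q in zip(passing, wald_q)
        if g["lrt_q"] < Q_CUTOFF and q < Q_CUTOFF
    ]


# Hypothetical per-gene results for one line compared with WT.
genes = [
    {"id": "g1", "lrt_q": 0.001, "wald_p": 0.0005, "b": 1.2},
    {"id": "g2", "lrt_q": 0.20, "wald_p": 0.001, "b": -0.9},  # fails the LRT filter
    {"id": "g3", "lrt_q": 0.01, "wald_p": 0.002, "b": 0.3},   # fails the fold-change filter
    {"id": "g4", "lrt_q": 0.02, "wald_p": 0.04, "b": -0.7},
]

dysregulated = select_dysregulated(genes)
print(dysregulated)
print(round(math.log2(1.5), 3))
```

Correcting only the p-values of genes that pass the fold-change threshold, as the text describes, shrinks the number of tests entering the Benjamini-Hochberg step.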
A MapMan bin was classified as enriched if the number of genes belonging to that bin in the up/down lists was significantly higher than expected from the background, using a two-tailed Fisher’s exact test adjusted for multiple testing using Benjamini-Hochberg correction. Only bins enriched in both lines were retained for the analysis, to highlight possible similarities between the two independent LCN lines. The log2 fold enrichment was calculated as the log2 of the ratio of observed to expected genes in each bin.
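A minimal sketch of the per-bin enrichment calculation is given below, using only the Python standard library. The counts are hypothetical, and the 2×2 table layout (gene list versus the remainder of the background) is an assumption, since the text does not specify how the table was built. Multiple-testing correction across bins would follow as a separate step.

```python
from math import comb, log2


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with the margins held fixed.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def log2_fold_enrichment(in_list_in_bin, list_size, in_bg_in_bin, bg_size):
    """log2 of observed over expected genes in the bin."""
    expected = list_size * in_bg_in_bin / bg_size
    return log2(in_list_in_bin / expected)


# Hypothetical counts: 100 up-regulated genes, 12 of them in the bin;
# 10,000 expressed genes in the background, 300 of them in the bin.
k, n_list, K, N = 12, 100, 300, 10_000

# Table rows: in the list / in the background but not in the list.
p_value = fisher_exact_two_tailed(k, n_list - k, K - k, (N - n_list) - (K - k))
enrichment = log2_fold_enrichment(k, n_list, K, N)
print(p_value, enrichment)
```

With these counts, three genes are expected in the bin by chance, so observing twelve corresponds to a fourfold enrichment.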
Implications for phylogeography
Previous cpDNA phylogeographic studies of the Mediterranean olive tree have been limited by the low number of haplotypes detected [17,18]. Here, we demonstrate that a genomic profiling approach of the plastid DNA, mostly based on microsatellites and indels, can solve this problem. The high variation detected in five distant wild populations indicates a high potential of our approach for resolving the history of the Mediterranean olive tree. One putative limitation is the level of homoplasy on microsatellite motifs, reported by different authors [39-42], which could prove problematic when accurately identifying evolutionary relationships between haplotypes. We reconstructed a reduced median network based on molecular markers (Figure 3c). The Mediterranean haplotypes clustered into three lineages (E1, E2 and E3), while the haplotype of subsp. maroccana formed a fourth lineage (M) in northern Africa. This topology is fully congruent with Besnard et al. [15,29], who used different cpDNA data (i.e., microsatellites, indels and CAPS, or nucleotides). Each lineage displays at least one specific indel, with the exception of lineage M (Figure 3c). Phylogenetic relationships remain unresolved at the base of lineages E1 and E2, as well as in the centre of the network, as a consequence of homoplasy between haplotypes belonging to different lineages (e.g., shared length polymorphisms between clades Cp-I and Cp-II at loci 1, 2, 9, 17, 25, 38, 47, 48, 49, 50 and 58; Additional file 3). Such difficulty in determining the ancestral state hampers the correct identification of historical links between divergent lineages. In contrast, we expect that homoplasy will not be a serious limitation to resolving phylogenetic relationships within lineages, since their haplotypes have diverged more recently.
In any case, for an optimal analysis of the cpDNA variation at the population level, possible length homoplasy will need to be considered and the use of appropriate models will be necessary [41,43].
against ejaculate sperm numbers provides some evidence for a trade-off between sperm size and number but, when a single outlying data point is removed, the relationship falls far from significance (P = 0.350). A
significant variance in sperm size between males within populations has been documented in a range of other taxa, including humans (Ward, 1998; Morrow & Gage, 2001). The existence of this variance remains
Figure 8: Schematic representation of the concepts of sequencing depth and coverage. Each blue arrow represents a forward-strand read; each red arrow represents a reverse-strand read.
Exon 1 is covered at 100% here (black box: 100% coverage at a depth of 8X). Exon 2 is not covered at 100% (dashed box: uncovered terminal region). The variant calling step detects sequence variations using dedicated algorithms. The bases of the patient's DNA are compared one by one with the reference genome to reveal any sequence variation (substitution of a base, or deletion or insertion of one or more bases; Figure 9). A file with the .VCF (Variant Call Format) extension is then obtained.
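The notions of depth and coverage can be made concrete with a small Python sketch. The exon coordinates and read positions are hypothetical and were chosen to mimic the two situations described for the figure: one exon fully covered at 8X and one whose terminal region receives no reads.

```python
def per_base_depth(region_start, region_end, reads):
    """Number of reads overlapping each base of [region_start, region_end)."""
    depth = [0] * (region_end - region_start)
    for read_start, read_end in reads:
        for pos in range(max(read_start, region_start), min(read_end, region_end)):
            depth[pos - region_start] += 1
    return depth


def breadth_of_coverage(depth, min_depth=1):
    """Fraction of bases covered by at least min_depth reads."""
    return sum(d >= min_depth for d in depth) / len(depth)


# Hypothetical reads as (start, end) intervals, end exclusive.
exon1_reads = [(0, 100)] * 8       # eight reads spanning all of exon 1
exon2_reads = [(200, 280)] * 8     # eight reads that stop before the end of exon 2

exon1_depth = per_base_depth(0, 100, exon1_reads)
exon2_depth = per_base_depth(200, 300, exon2_reads)

exon1_cov_8x = breadth_of_coverage(exon1_depth, min_depth=8)
exon2_cov_8x = breadth_of_coverage(exon2_depth, min_depth=8)
print(exon1_cov_8x, exon2_cov_8x)
```

Depth is therefore a per-position count of reads, while coverage is the share of the target that reaches a chosen depth.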
Our findings may also have implications for understanding the origins of phenotypes of chromosome gains
and losses in humans. Phenotypes observed in monosomies are likely due to a combination of specific gene copy number alterations and mass effects of individually harmless genes, as 1.5%–5% of the human genome is predicted to be haploinsufficient (Dang et al. 2008; Huang et al. 2010). However, the contribution of mass action of genes with little or no phenotypic effect when varied individually could significantly impact phenotypes associated with trisomy. For example, despite intense efforts, no specific gene or chromosomal region has been identified as responsible for the general growth retardation, developmental abnormalities, and cognitive deficiencies characteristic of DS. In developing a framework to think about treating diseases involving whole-chromosome or segmental aneuploidies, our data suggest that, rather than solely searching for therapies that target single genes, we also need to identify methods to increase the overall fitness of aneuploid cells.
Rights holders may only make the benefit of the exceptions conditional on lawful access to the work, and we have seen that this was indeed the case for the Copy Party.
THE FAMILY CIRCLE AT THE CENTRE OF NETWORKS? Defining the limits on use for the organised event allowed the organisers to underline how far the interplay between collective use and a restriction to the family circle has become a source of confusion, given that, in the age of networks, the so-called "family circle" resembles the Pascalian sphere whose "centre is everywhere and circumference nowhere". The law specifies that, to remain lawful, copies must be "strictly reserved for the private use of the copier and not intended for collective use". Copies of works made during the Copy Party are therefore reserved for the personal use of the copiers, to the exclusion of any form of public use. Case law nevertheless accepts, under restrictive conditions, that use remains private if it is limited to your family circle (relatives and close friends), while excluding the lending of copies to third parties, which would mean the copier giving up control over the copies. Private use also formally excludes any form of posting online, whether on the Internet at large or to a small number of targeted users (e.g. sharing with friends on a social network). It likewise excludes sending copies to a third party as email attachments. It is, however, possible to make several copies or transfers of the works reproduced during the Copy Party and to store them on the hard drive of a personal computer, a USB stick or even an online storage service (such as Dropbox), provided that this storage excludes any form of sharing.
Detection of copy number variations from NGS data using read depth information: a diagnostic performance
Ruibin Xi¹, Joe Luquette¹, Angela Hadjipanayis², Tae-Min Kim¹, Peter J Park¹,³,⁴*
From Beyond the Genome: The true gene count, human evolution and disease genomics Boston, MA, USA. 11-13 October 2010
DNA copy number alterations (CNA), which are amplifications and deletions of certain regions in the genome, play an important role in the pathogenesis of cancer and have been shown to be associated with other diseases such as autism, schizophrenia and obesity. Next-generation sequencing technologies provide an opportunity to identify CNA regions with unprecedented accuracy. We developed a CNA detection algorithm based on single-end whole-genome sequencing data for samples with matched controls. This algorithm, called BIC-seq, can accurately and efficiently identify CNAs by minimizing the Bayesian information criterion (BIC). We applied BIC-seq to a glioblastoma multiforme (GBM) tumor genome from the Cancer Genome Atlas (TCGA) project and identified hundreds of CNVs, some as small as 10 bp. We compared these CNAs with those detected using the array Comparative Genomic Hybridization (CGH) platforms and found that about one third were ‘missed’ by the array-CGH platforms, most of which were CNAs smaller than 10 kb. We selected 17 of the CNAs not detected by the array-based platforms for validation, ranging from 110 bp to 14 kb, and found that 15 of them are true CNAs. We further extended BIC-seq to the multi-sample case to identify recurrent CNAs across multiple tumor genomes.
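To give a feel for how a BIC can drive segmentation of tumor versus matched-control read counts, the following Python sketch greedily merges adjacent bins whenever doing so lowers a BIC score. It is a generic illustration and not the published BIC-seq implementation: the Bernoulli likelihood (each read is either tumor or control), the penalty of one parameter per segment, the `penalty` tuning factor and the bin counts are all assumptions made for this example.

```python
from math import log


def log_likelihood(tumor: int, control: int) -> float:
    """Maximised Bernoulli log-likelihood of the reads in one segment."""
    total = tumor + control
    ll = 0.0
    if tumor:
        ll += tumor * log(tumor / total)
    if control:
        ll += control * log(control / total)
    return ll


def bic(segments, penalty: float = 1.0) -> float:
    """-2 log-likelihood plus a per-segment penalty of penalty * log(total reads)."""
    n_reads = sum(t + c for t, c in segments)
    ll = sum(log_likelihood(t, c) for t, c in segments)
    return -2.0 * ll + penalty * len(segments) * log(n_reads)


def merge_segments(bins, penalty: float = 1.0):
    """Repeatedly merge the adjacent pair that lowers the BIC the most."""
    segments = list(bins)
    while len(segments) > 1:
        current = bic(segments, penalty)
        best = None
        for i in range(len(segments) - 1):
            (t1, c1), (t2, c2) = segments[i], segments[i + 1]
            candidate = segments[:i] + [(t1 + t2, c1 + c2)] + segments[i + 2:]
            score = bic(candidate, penalty)
            if score < current and (best is None or score < best[0]):
                best = (score, candidate)
        if best is None:
            break  # no merge improves the criterion
        segments = best[1]
    return segments


# Hypothetical (tumor reads, control reads) per bin along a chromosome:
# two roughly balanced bins followed by two bins with excess tumor reads.
bins = [(100, 100), (105, 95), (200, 100), (210, 90)]
segments = merge_segments(bins)
print(segments)
```

In this toy example the first two bins and the last two bins merge, while the boundary between them is kept because joining the two groups would cost more in likelihood than it saves in penalty.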
MYC expression levels were associated with a favorable
prognosis, despite the presence of TERT gains.
In conclusion, we found no activating TERT promoter mutations in 122 breast cancer patients. The T349C TERT promoter SNP was not significantly associated with TERT expression, but T349C carriers had shorter survival. Notably, TERT gene copy number gain was significantly related to TERT upregulation and, in association with MYC overexpression, characterized particularly aggressive disease. These results show a significant effect of gene copy number gain on TERT expression level and provide a new insight into the clinical significance of TERT and MYC upregulation in breast cancer.
The identification of single copy (1-to-1) orthologs in any group of organisms is important for functional classification and phylogenetic studies. The Metazoa are no exception, but only recently has there been a wide enough distribution of taxa with sufficiently high-quality sequenced genomes to gain confidence in the widespread single copy status of a gene. Here, we present a phylogenetic approach for identifying overlooked single copy orthologs from multigene families and apply it to the Metazoa. Using 18 sequenced metazoan genomes of high quality, we identified a robust set of 1,126 orthologous groups that have been retained in single copy since the last common ancestor of Metazoa. We found that the use of the phylogenetic procedure increased the number of single copy orthologs found by over a third compared with standard taxon-count approaches. The orthologs represented a wide range of functional categories, expression profiles and levels of divergence. To demonstrate the value of our set of single copy orthologs, we used them to assess the completeness of 24 currently published metazoan genomes and 62 EST datasets. We found that the annotated genes in published genomes vary in coverage from 79% (Ciona intestinalis) to 99.8% (human) with an average of 92%, suggesting a value for the underlying error rate in genome annotation, and a strategy for identifying single copy orthologs in larger datasets. In contrast, the vast majority of EST datasets with no corresponding genome sequence available are largely under-sampled and probably do not accurately represent the actual genomic complement of the organisms from which they are derived.
Received: 17 November 2010 / Accepted: 27 January 2011 / Published online: 12 February 2011 © The Author(s) 2011. This article is published with open access at Springerlink.com
Abstract Autism spectrum disorder is a genetically complex and clinically heterogeneous neurodevelopmental disorder. A recent study by the Autism Genome Project (AGP) used 1M single-nucleotide polymorphism arrays to show that rare genic copy number variants (CNVs), possibly acting in tandem, play a significant role in the genetic aetiology of this condition. In this study, we describe the phenotypic and genomic characterisation of a multiplex autism family from the AGP study that was found to harbour a duplication of exons 31–44 of the Duchenne/Becker muscular dystrophy gene DMD and also a rare deletion involving exons 1–9 of TRPM3. Further characterisation of these extremely rare CNVs was carried out using quantitative PCR, fluorescent in situ hybridisation,