Preparation and sequencing of non-normalized 5′-end cDNA libraries
Total RNA was extracted from individual snails using Trizol Reagent (Invitrogen) according to the manufacturer’s instructions. Pools of total RNA, made up of 2 µg RNA from each individual snail, were used to prepare the cDNA libraries. Each of the four RNA pools was therefore prepared from an equal amount of RNA from a total of 18 individuals from three experiments (6 individuals/experiment). This procedure aimed at obtaining samples representative of each treatment. For each sample (RNA pool), 20 µg total RNA was DNase treated (Turbo DNase kit) and sent to GATC for quality check, library preparation using an “oligo-capping” method, and deep sequencing according to Illumina/Solexa procedures. Briefly, after testing total RNA integrity, poly(A) RNAs were purified and treated with calf intestine phosphatase (CIP) in order to hydrolyze the 5′ phosphate of truncated mRNAs. Tobacco acid pyrophosphatase (TAP) was then used to remove the cap structure of intact mRNAs, and an oligo-RNA adapter was ligated to the 5′ phosphate of decapped mRNAs. First-strand cDNA synthesis was then performed using an N6 randomized adapter primer and M-MLV RNase H- reverse transcriptase. The resulting cDNAs were amplified with 21 cycles of PCR. Amplicons in the size range of 350–650 bp were purified and processed for deep sequencing on an Illumina Genome Analyzer II (Illumina GAII) according to Illumina procedures. Sequences 36 bp long were produced and are referred to as “reads” in this manuscript.
Non-hymenopteran toxins
A BLAST search of T. bicarinatum contigs Tb16400 (1303 bp) and Tb34742 (1169 bp) against the nr database showed similarities with waprin-like (WAP) proteins from Hymenoptera species that were submitted to GenBank following whole-genome sequencing or genome mining. To check for possible homologies with known WAPs from other venomous species, a BLAST search against an in-house toxin database (see Methods section) was performed and revealed homologies with snake-venom WAPs. An average similarity of 33% with the matched snake species was recorded. A domain search of the identified contigs against the PROSITE database revealed that the predicted sequences (94 amino acids long each) contain the WAP-type ‘four-disulfide core’ domain profile, or whey acidic protein motif. In addition, the signature pattern of cysteine residues (CxxxxxCxxxxxCC) in the central region found in all WAP-motif proteins was also identified (Figure 11), which suggests that snake-venom-like WAPs are possibly expressed in T. bicarinatum venom glands. To the best of our knowledge, WAPs have never been reported from
RNA extraction and library preparation
Total RNAs (tRNAs) from the venom gland sample were isolated with the RNeasy Micro Kit (Qiagen, France), including an on-column DNase digestion, whereas total RNAs from ant body carcasses were extracted using 400 μl of TRI reagent (Sigma) according to the manufacturer’s protocol. Sequencing and cDNA library preparation were performed by Beckman Coulter Genomics services (http://www.beckmangenomics.com/). Given the very limited amount of total RNA extracted from the venom glands (7 ng/μl), mRNA from this sample (sample G) was transcribed into cDNA and amplified using the Ovation RNA-Seq System V2 kit (NuGEN Technologies Inc.), which is designed for limited biological material. After cDNA fragmentation, end-repair and purification with the Agencourt® AMPure® XP kit (Agencourt Bioscience, Beckman Coulter, San Carlos, CA, USA), TruSeq sequencing adapters (Illumina) were ligated to the cDNA fragments. Finally, the library was PCR-amplified (14 cycles) to about 20–30 ng/μl using a high-fidelity DNA polymerase. For the total RNA sample from ant carcasses (sample F), poly(A) RNA was isolated and fragmented. First-strand cDNA synthesis was primed with an N6 randomized primer.
Felix Dafhnis-Calas 5 , Shima Khoshraftar 2 , Sunir Malla 4 , Neel Mehta 5 , Cheuk C Siow 5 , Jonas Warringer 7 ,
Alan M Moses 2,3 , Edward J Louis 5 and Conrad A Nieduszynski 5
Background: Comparative genomics is a formidable tool to identify functional elements throughout a genome. In the past ten years, studies in the budding yeast Saccharomyces cerevisiae and a set of closely related species have been instrumental in showing the benefit of analyzing patterns of sequence conservation. Increasing the number of closely related genome sequences makes the comparative genomics approach more powerful and accurate. Results: Here, we report the genome sequence and analysis of Saccharomyces arboricolus, a yeast species recently isolated in China, that is closely related to S. cerevisiae. We obtained high quality de novo sequence and assemblies using a combination of next generation sequencing technologies, established the phylogenetic position of this species and considered its phenotypic profile under multiple environmental conditions in the light of its gene content and phylogeny.
The first threshold was determined by estimating that the error rate depended mostly on the Mint reverse transcriptase, which is expected to make one error every 30,000 nucleotides (the Sanger sequencing error rate is comparatively much lower). To be stringent, we set the global error rate to 10^-4 and used a binomial law to calculate the probability of having k errors at a position of depth D (k being the number of occurrences of the minor allele). To take into account the fact that, in each contig, errors can occur at all sites, we used the probability calculated above and a second binomial law to calculate the probability of having at least one position with k errors among all the positions of the contig. As the mean length of contigs is 985 bp, we calculated this probability for a length of 1,000 bp. If the occurrences of the minor allele could be explained by errors with a probability higher than 0.01, the putative polymorphic site was not retained.
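The two-step binomial calculation described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: "k errors" is read here as "at least k erroneous reads at a site", errors are treated as independent with a per-base rate of 10^-4, and the function names are hypothetical.

```python
from math import comb

ERROR_RATE = 1e-4      # stringent global error rate stated in the text
CONTIG_LENGTH = 1000   # contig length used in the text (mean is 985 bp)
THRESHOLD = 0.01       # sites explainable by errors above this are dropped


def p_site(k, depth, err=ERROR_RATE):
    """P(at least k erroneous reads at one site covered by `depth` reads).

    Assumption: 'k errors' is interpreted as 'k or more' errors.
    """
    p_fewer = sum(
        comb(depth, i) * err**i * (1.0 - err) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


def p_contig(k, depth, length=CONTIG_LENGTH, err=ERROR_RATE):
    """P(at least one of `length` sites shows k or more errors)."""
    return 1.0 - (1.0 - p_site(k, depth, err)) ** length


def is_retained(k, depth):
    """Keep a putative polymorphic site only if errors are an unlikely cause."""
    return p_contig(k, depth) <= THRESHOLD
```

Under these assumptions, a minor allele seen once at depth 100 would be discarded, whereas one seen three times at the same depth would be kept.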
Recently, Oxford Nanopore has emerged as a competitor for long-read sequencing. Notably, Oxford Nanopore produces a mini-sequencer, the MinION, requiring only a start-up fee of $1000, which includes two flow cells and a library preparation kit (https://store.nanoporetech.com/minion/sets/?___SID=U). Furthermore, recent updates in nanopore sequencing technology that became commercially available in late 2016 made it possible to obtain gigabases of sequence data from a single flow cell. Prior to this, due to relatively low output, nanopore sequencing technology was mainly used to analyze and assemble microbial samples (Loman et al., 2015; Quick et al., 2015; Jain et al., 2016; Kranz et al., 2017). Notably, early reports of Oxford Nanopore reads indicate that they are exceptionally long (Weirather et al., 2017) but have a high (Judge et al., 2015), and nonrandom, error rate (Deschamps et al., 2016). A new Solanum pennellii accession has been identified with traits that make it an interesting target for de novo sequencing. S. pennellii is a wild, green-fruited tomato species native to Peru that exhibits beneficial traits such as abiotic stress resistances (Lippman et al., 2007; Koenig et al., 2013). The previously sequenced accession LA716 (Bolger et al., 2014a) has been used to generate a panel of introgression (Eshed and Zamir, 1995) and backcrossed introgression (Ofner et al., 2016) lines that have been used to identify many interesting quantitative trait loci (Alseekh et al., 2015; Fernandez-Moreno et al., 2017), thus complementing large-scale genomic panel studies for tomato (Lin et al., 2014; Tieman et al., 2017). However, the accession LA716 does not perform well in the field and carries the NECROTIC DWARF gene on chromosome 6, which reduces plant vigor when introduced into a Solanum lycopersicum background (Ranjan et al., 2016).
A novel divergent accession, LYC1722, was identified in a large panel of tomato accessions obtained from the IPK gene bank in Germany as a self-compatible, phenotypically uniform biotype of S. pennellii that does not exhibit these negative traits of LA716. We chose to sequence and assemble the LYC1722 accession de novo using Oxford Nanopore technology. The availability of a reference-quality genome for the LA716 S. pennellii accession also made it an excellent genome with which to evaluate not just the practicality, but also the resulting quality of Oxford Nanopore sequencing for assembling a gigabase-sized plant genome.
Owing to the development of highly sensitive, high-resolution analytical tools, the detection of low incorporation rates (measured as isotopic ratios) has become possible, and we decided to undertake feeding experiments with this dinoflagellate using the most sensitive detection techniques. In addition to giving valuable information on the biochemical transformations occurring in living organisms, such in vivo experiments mostly reflect the de novo metabolic activities of the cells and hence the direct response of the organism to environmental changes. We consequently decided to develop a very sensitive and general process for feeding dinoflagellates with radiolabeled precursors, enabling the specific activity of several metabolites to be measured by a straightforward HPLC purification/radio-TLC detection. This highly sensitive process allows us to work with close to “natural” concentrations of precursors.
We performed a VBM study of 144 de novo PD patients (age: 61.30 ± 9.06 y; sex: 53 F) and 66 HC (age: 60.12 ± 11.39 y; sex: 23 F) from the PPMI database through two different state-of-the-art pipelines: 1) CAT12 and 2) volBrain. All 3D T1-weighted scans selected were obtained under the same acquisition conditions. Both CAT12 and volBrain follow similar pre-processing steps including intensity normalization, bias and noise
All eukaryotic cells with linear chromosomes face the problem of terminal DNA sequence loss that occurs as a result of either incomplete replication of the DNA strand synthesized by the lagging strand replication machinery or accidental collapse of the replication forks. If left unrepaired, these losses will eventually trigger the DNA damage response (DDR) and cell cycle arrest. To counteract this problem, most eukaryotes use the enzyme telomerase, which specializes in replenishing lost sequences at the chromosome ends. Telomerase is minimally composed of the catalytic subunit, a telomerase reverse transcriptase, and the RNA component, which serves as a scaffold for telomerase subunit assembly and also carries a region that templates the synthesis of telomeric DNA repeats. The template region of telomerase RNA determines both the guanine-rich sequence of telomeric repeats and the specificity of telomerase for chromosome ends, because the template region has to anneal to the single-strand DNA (ssDNA) tail exposed at the site of its action. To further reinforce the specificity of telomerase for chromosome ends, a telomerase recruitment mechanism has evolved that relies on specialized proteins that bind telomeric ssDNA with high affinity and sequence specificity, such as Cdc13 in budding yeast and POT1 in mammalian cells. However, in spite of these adaptations, telomerase does interfere with the repair of DNA double strand breaks (DSBs) and may occasionally add telomeric repeats to either spontaneous or induced DSBs, a process known as chromosome healing by de novo telomere addition (Pennaneach et al., 2006). In this issue, Ouenzar et al. describe a novel mechanism that restricts the action of telomerase by spatial exclusion from sites of DNA repair.
Traditionally, the quality of de novo consensus sequences (the extent to which they correspond to full TE reference sequences rather than truncated versions) is not assessed. Validation is instead indirect: researchers annotate a genome sequence with RepeatMasker, using Repbase Update as TE databank, and consider the resulting TE annotations as the references. They then annotate the same genome with RepeatMasker, using the de novo consensus as TE databank, and consider these TE annotations as predictions. These two sets of annotations are then compared, by calculating sensitivity and specificity at the nucleotide level. The criterion used to estimate the quality of the de novo method is therefore the extent to which de novo predictions and reference annotations overlap. However, as we are particularly interested in TE dynamics, we need to assess the quality of the de novo library itself, by evaluating the extent to which full ancestral TE reference sequences are recovered. Such sequences, which originate from the reconstruction of a given element from its copies, are not only useful for subsequent TE annotation, but also provide a condensed view of the TEs in the genome. One way of assessing the quality of the de novo consensus sequences obtained with our three-step approach would be to compare these sequences with reference sequences from the Berkeley Drosophila Genome Project (BDGP) or Repbase Update databanks. However, it was clear that some of the reference sequences present in these databanks would not be present in the genomes analyzed. For example, the "P-element" reference sequence is absent from the genome sequence of the D. melanogaster strain used here. So, rather than using the reference databanks directly, we first constructed, for each genome, a "knowledge-based" databank comprising one consensus sequence per reference TE sequence, based on its genomic copies (see Methods).
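The indirect, nucleotide-level comparison described above can be illustrated with a short sketch. It assumes annotations are given as half-open (start, end) intervals on a single sequence and uses the textbook definitions Sn = TP/(TP+FN) and Sp = TN/(TN+FP); some annotation benchmarks instead report TP/(TP+FP) as "specificity", so the exact formula used in any given study should be checked. The function names are hypothetical.

```python
def covered_positions(intervals):
    """Return the set of positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_sn_sp(reference, predicted, genome_length):
    """Nucleotide-level sensitivity and specificity of predicted annotations.

    Assumption: Sn = TP / (TP + FN) and Sp = TN / (TN + FP).
    """
    ref = covered_positions(reference)
    pred = covered_positions(predicted)
    tp = len(ref & pred)                  # annotated in both sets
    fn = len(ref - pred)                  # reference only
    fp = len(pred - ref)                  # prediction only
    tn = genome_length - tp - fn - fp     # annotated in neither set
    sn = tp / (tp + fn) if tp + fn else float("nan")
    sp = tn / (tn + fp) if tn + fp else float("nan")
    return sn, sp
```

For example, a reference annotation covering positions 0 to 100 and a prediction covering 50 to 150 on a 1,000 bp sequence share 50 nucleotides, giving a sensitivity of 0.5.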
For each genome, we then compared each de novo databank with its corresponding "knowledge-based" databank through pairwise sequence alignments. We then calculated the sensitivity S_n*, specificity S_p* and recovery ratio R_CC (see Methods
5.2.4 Gap filling algorithm
To increase the size of contigs, a simple heuristic procedure can be used to fill scaffold gaps. By construction, scaffold gaps are shorter than the insert size. Although inspired by related techniques [6, 7], the proposed procedure does not require the complete graph (as in Euler-USR [7]). Furthermore, read localization (as in Allpaths [6]) is not a prerequisite. To fill the gap between two reads l and r, the procedure starts from the string t = l and repeatedly extends t to the right in a depth-first fashion, until r is reached. Rightmost extensions of t are computed as follows. All the overlaps between the read-length suffix of t and the input reads are retrieved. Possible extensions are computed from the set of matches according to a voting mechanism, in order to cope with sequencing errors. The search tree for t can become large because of repeats in the original sequence. When an unambiguous sequence f to the left or to the right of the gap is long enough, read localization is performed. In this context, read localization selects paired reads whose other mate aligns to the sequence f, and extends only with these reads.
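The extension step can be sketched in a few lines. This is a simplified illustration rather than the procedure's actual implementation: it scans all reads naively instead of using an index, omits read localization, and the parameters min_overlap, min_votes and max_len are assumptions introduced here.

```python
from collections import Counter


def next_bases(t, reads, read_len, min_overlap, min_votes):
    """Vote on the base that may follow t, using suffix/prefix overlaps."""
    suffix = t[-read_len:]
    votes = Counter()
    for read in reads:
        # Try the longest overlap first; each read votes at most once.
        longest = min(len(suffix), len(read) - 1)
        for ov in range(longest, min_overlap - 1, -1):
            if suffix.endswith(read[:ov]):
                votes[read[ov]] += 1
                break
    # Keep only bases with enough support (guards against read errors).
    return [base for base, n in votes.most_common() if n >= min_votes]


def fill_gap(left, right, reads, max_len, min_overlap=3, min_votes=1):
    """Depth-first extension of `left` until the string ends with `right`.

    Returns the filled sequence, or None if no path is found within max_len.
    """
    read_len = len(left)
    stack = [left]
    while stack:
        t = stack.pop()
        if len(t) > len(left) and t.endswith(right):
            return t
        if len(t) >= max_len:
            continue  # gaps are bounded by the insert size
        candidates = next_bases(t, reads, read_len, min_overlap, min_votes)
        # Push in reverse so the best-supported base is explored first.
        for base in reversed(candidates):
            stack.append(t + base)
    return None
```

Bounding the search with max_len reflects the observation that scaffold gaps are shorter than the insert size; raising min_votes trades sensitivity for robustness to sequencing errors.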
This work analyzes a study of traditional chicken-rearing systems in family farming. It uses action research with families from the northern region of the state of Espírito Santo. It involves carrying out diagnostics and actions for the innovation and dissemination of technologies in agroecological poultry farming. The innovations sought to recover, adapt, and develop, in a participatory way, technologies suited to improving the functioning of the free-range (caipira) chicken rearing system. The activities established a new perspective on these
database (Pruitt et al. 2007). From these results, we computed the species distribution of the best hit of each transcript using Blast2GO version 3.2.7 (Conesa et al. 2005b). For G. rostochiensis, we ran a BLAST search of each de novo transcriptome against the reference transcriptome to compute how many reference genes were covered by at least one hit (E-value < 1e-50). Secondly, the lists of differentially expressed genes (DEGs; P < 0.05, FDR-corrected) before and after decontamination were compared. DEGs were identified with the DESeq2 R package version 1.6.3 with the parametric Wald test (Anders and Huber 2010, Love et al. 2014), using the count tables produced by Corset (Davidson and Oshlack 2014) as input and performing pairwise comparisons of hydrated cysts (reference) against all other conditions. To evaluate the gain in statistical power, we computed the mean p-value and adjusted p-value of all DEGs common to the transcriptomes. We also compared the DEGs found in de novo transcriptomes with those obtained using the reference transcriptome. Using BLASTn, we computed the number of common DEGs (E-value < 1e-10) and the percentage of DEGs that had a BLAST hit on the G. rostochiensis genome.
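The coverage count described above (reference genes with at least one hit below an E-value cutoff) can be sketched as follows. The sketch assumes the BLAST results were saved in the standard 12-column tabular format (BLAST+ -outfmt 6), where the subject identifier is the second column and the E-value the eleventh; the text does not state which output format was used, and the function name is hypothetical.

```python
def covered_reference_genes(blast_lines, max_evalue=1e-50):
    """Return the set of reference (subject) IDs with at least one good hit.

    Assumes tab-separated BLAST tabular output: column 2 is the subject ID
    and column 11 is the E-value.
    """
    covered = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject_id = fields[1]
        evalue = float(fields[10])
        if evalue < max_evalue:
            covered.add(subject_id)
    return covered
```

Dividing the size of the returned set by the number of genes in the reference transcriptome gives the covered fraction; the same function with max_evalue=1e-10 would apply to the DEG comparison.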
Fig. 5. Final image and vertex trajectories in the image space.
constraints such as joint limit avoidance. Furthermore, a powerful use of the task sequencing method could be to move elementary tasks around within the stack for a short period of time in order to gain DOF that can be used to avoid obstacles, joint limits, etc. Another future perspective of this research is automating the choice of elementary tasks. It would be useful for the robot to determine automatically when a specific task should be added.
Keywords: de novo genome assembly, next-generation sequencing, assembly quality, De Bruijn graph, k-mer, endosymbiont, Rickettsiales
Summary
The goal of this project was to develop de novo genome assembly methods adapted to small genomes, especially bacterial ones, using next-generation sequencing data. Eventually, these methods could be used to assemble the genome of StachEndo, an unknown alphaproteobacterial endosymbiont of the amoeba Stachyamoeba lipophora. Preliminary findings showed that the use of Illumina reads with De Bruijn graph assemblers yielded the best results. These experiments also showed that contigs produced with k-mers of various sizes were complementary in genome finishing assays. The addition of long-range paired-end reads proved necessary to fully close genomic assembly gaps. These methods made the assembly of StachEndo’s genome (1.7 Mb) possible. Through the annotation of StachEndo’s genes, several features that are unusual for endosymbionts were identified. StachEndo seems to be an interesting species for the study of endosymbiotic evolution.
> Intellectual disability (ID) is the most frequent severe handicap of childhood and affects 1-2% of the population. It arises before the age of 18 years and manifests as intellectual functioning significantly below average, accompanied by major difficulties in adaptation (communication, self-care, domestic skills, social skills, academic and functional abilities, etc.). The most widespread form is non-syndromic ID (NSID), which is not associated with any other anomaly. In the absence of physical signs to guide researchers in their investigations, identifying the autosomal dominant genes associated with NSID is a particular challenge. Families in which ID is transmitted in an autosomal dominant manner are rare because of a reproduction rate lower than that of the general population. These autosomal dominant mutations could, however, arise de novo (that is, not be inherited from the parents). Indeed, de novo chromosomal rearrangements are the most frequently identified genetic cause of ID. De novo point mutations could also contribute to the development of NSID. The detection of these mutations must rely on searching for them directly by sequencing.
The complete set of all genes, along with the non-coding DNA, in an organism is called a genome. NGS machines generate short fragments called “reads”, with lengths ranging from thirty-five to a few hundred base pairs. These reads are parts of a large genome containing millions of base pairs (the size of the human genome is about 3 × 10^9 bp). Sequence assembly is a computational biology problem in which the genome is assembled from the reads generated by the NGS machine. Constructing the genome is more complex than the well-studied shortest superstring construction problem. The problem becomes more computationally intensive because the sequencing machine generates reads with errors. Its complexity increases further because of repeats, which are regions (sequences) that occur more than once in the genome. NGS machines also have a constraint on the length of the reads they generate. If a read is shorter than the repeat, identifying the portion of the genome the read came from becomes very difficult and is practically impossible.
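The repeat problem can be made concrete with a toy example. The sketch below uses a made-up 26 bp sequence containing the same 9 bp repeat twice (the sequence and function name are purely illustrative) and shows that a read lying entirely inside the repeat matches two locations, whereas a read extending into the unique flanking sequence matches only one.

```python
def occurrences(genome, read):
    """Return every start position at which `read` occurs in `genome`."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits


# Toy genome (illustrative only): the repeat GGATCCGAT appears twice.
REPEAT = "GGATCCGAT"
GENOME = "TTAC" + REPEAT + "AG" + REPEAT + "CA"

short_read = "ATCCG"          # shorter than the repeat: ambiguous origin
long_read = "ACGGATCCGATAG"   # spans the repeat and both flanks: unique

ambiguous = occurrences(GENOME, short_read)
unique = occurrences(GENOME, long_read)
```

Here the short read has two candidate origins while the long read has one, which is why reads shorter than the repeats leave an assembler unable to decide between placements.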
the sequential loading with double handling. Ambrosino et al. (2013) extend the work of Ambrosino et al. (2011) with a specific emphasis on the influence of strategies for storing in lots and on the handling costs in the yard for the joint loading/sequencing problem. The setting of these papers creates a problem that differs from the one addressed in the present case study. Their system associates a cost with not loading a container, with rehandling a container in the yard, and with backtracking the crane to load a wagon previously kept empty in the sequential loading. More importantly, they consider single-stack trains, which are easier to load than double-stack ones because of the absence of predecessors in the placement of containers (top after bottom). They also consider GC, whereas we consider the more complicated case of RS. The use of GC can be considered a variant of the use of RS with relaxed constraints, as there are fewer restrictions on the allowed actions for gantry cranes.
Little is known about the rate of emergence of de novo genes, what their initial properties are, and how they spread in populations. We examined wild yeast populations (Saccharomyces paradoxus) to characterize the diversity and turnover of intergenic ORFs over short evolutionary timescales. We find that hundreds of intergenic ORFs show translation signatures similar to canonical genes, and we experimentally confirmed the translation of many of these ORFs in laboratory conditions using a reporter assay. Compared with canonical genes, intergenic ORFs have lower translation efficiency, which could imply a lack of optimization for translation or a mechanism to reduce their production cost. Translated intergenic ORFs also tend to have sequence properties that are generally close to those of random intergenic sequences. However, some of the very recent translated intergenic ORFs, which appeared <110 kya, already show gene-like characteristics, suggesting that the raw material for functional innovations could appear over short evolutionary timescales.
de novo OM. Metastatic spread, represented by the number of lesions and of metastatic sites involved, which might have been expected to be decisive, also does not appear to be associated with the outcome of these patients. Liver involvement, for example, which is strongly represented in our cohort and usually carries a poor prognosis, does not influence the survival of de novo OM patients. Similarly, LDH and CA 15-3 levels are not associated with prognosis in our study, indicating that tumor burden is not a good reflection of the outcome of this particular disease. Thus, among the data routinely available at diagnosis, only SBR histoprognostic grade 3 is associated with poorer progression-free survival (PFS) and overall survival (OS) in multivariate analysis, which suggests that the outcome of de novo OM patients is linked to the aggressiveness of the disease more than to metastatic spread as such.