• Aucun résultat trouvé

Next Generation Sequencing

27 2.1.2 Study of T-cell Lymphomas

2.2 Next Generation Sequencing

Next generation sequencing (NGS, also known as massive parallel sequencing or as second-generation sequencing) technologies are revolutionising the ability to characterize cancers at a genomic, transcriptomic and epigenetic levels. For the past 15 years, Sanger sequencing (capillary-based DNA sequencing or first-generation sequencing, first developed in 1975 [101]) and fluorescence-based electrophoresis technologies have been extensively used in somatic and germline genetic studies. Modern Sanger sequencing used automated instruments that detect fluorescently labeled nucleotide sequences. Capillary electrophoresis resolved DNA strands of increasing length during DNA elongation where single-stranded amplified DNA template is randomly terminated by incorporating fluorescent dideoxynucleotides. Nevertheless, Sanger sequencing presented a relatively restricted sequencing capacity, limiting the use of this approach in case of single gene or hot-spot sequence analysis. Improvements in instrumentation coupled with the development of high performance computing and bioinformatics have allowed the progress of massive parallel sequencing, overcoming the limited scalability of traditional Sanger sequencing. Many different high-throughput technologies for massive parallel DNA sequencing emerged in late 1996, and became commercially available since 2005. Different engineering configurations and sequencing chemistry characterized different NGS platforms. However, they share the technical paradigm of massive parallel sequencing via spatially separated, clonally amplified DNA templates. All approaches provide three critical steps: DNA sample preparation, immobilization, and sequencing.

Generally, the first part of the technique implies the preparation of a "sequencing library", which involves the fragmentation of DNA and the addition of defined sequences, known as “adapters,” to the ends of randomly fragmented DNA. The addition of adapters is required for two main reasons: first, to anchor the DNA fragments of the sequencing library to a solid surface and then, to define the site in which the sequencing reactions begin. The amplification of the sequencing library DNA is required only for some high-throughput sequencing systems in order to form spatially distinct and detectable sequencing features. Amplification can be performed in situ, in emulsion or in solution to generate clusters of clonal DNA copies. Sequencing is performed using either DNA polymerase synthesis for fluorescent nucleotides or the ligation of fluorescent oligonucleotides. In the physical sequencing phase, the individual bases in each fragment are identified one after the other in the right order. Perhaps more important than the

Introduction

33

sequencing throughput provided by this technology and its relative low cost compared with traditional sequencing methods is the type of data it generates. The result of a sequenced segment is called a "read". Each nucleotide of the regions studied will be included in many reads, and, therefore, repeatedly analyzed. In the re-assembly phase, bioinformatics tools are used to align overlapping reads, which allows the original genome to be assembled into contiguous sequences. The longer the read length, the easier it is to reassemble the genome. The million of analyzed reads are then compared with a reference human genome. Instead of long reads generated from a PCR-amplified sample, massively parallel sequencing methods provide much shorter reads (~21 to ~400 base pairs), but millions of them. The short reads generated in the sequencing of each DNA molecule can be counted and quantified, allowing the identification of mutations, the quantification of its frequency and accurate copy number assessment of each genomic region. In addition, with the introduction of approaches that allow for the sequences of both ends of a DNA molecule (that is, paired end massively parallel sequencing or mate pair sequencing), it has become possible to detect balanced and unbalanced somatic rearrangements, such as fusion genes, in a genome-wide fashion in a single assay [102].

Nevertheless, PCR amplification of DNA to be analyzed represents a limit of second generation sequencing techniques. In fact the possible introduction of base sequence errors or the promotion of the use of certain sequences over others, may lead to the identification of mutations not tumor-related and to the change of the frequency and abundance of various DNA fragments that existed before amplification [103]. The sequencing from a single DNA molecule (without the need for PCR amplification and its potential for distortion of abundance levels), together with the ultimate miniaturization and the minimal use of biochemicals can overcome this limit. This method of sequencing from a single DNA molecule is called "third-generation sequencing" [104]. Numerous are the advantages of these very recent approaches over second NGS techniques: higher throughput, faster turnaround time, longer read lengths, higher consensus accuracy to enable rare variant detection, small amount of starting material and low cost [103]. Many of the third generation sequencing platforms integrate DNA sequencing with other more general utility, such as comprehensive characterization of transcriptomes and translation and identification of patterns of methylation. High-throughput sequencing technologies can be used also to sequence cDNA, in an approach called "RNA-seq", in order to get informations about differential expression of genes, including gene alleles and differently spliced transcripts, non-coding RNAs, post-transcriptional mutations or editing and gene

Introduction

34

fusions. Besides RNA-seq, high troughput sequencing techniques have a wide range of applications, such as: chromatin immunoprecipitation coupled to DNA microarray (ChIP-chip) or sequencing (ChIP-seq), whole genome genotyping, genome wide structural variation and sequencing of mitochondrial genome.

Advances in the field of genomics, and especially in sequencing approaches, have broadened the horizons and expectations towards a new era of personalized therapy.

Introduction

35