
1.3 Proteomics

1.3.2 Mass Spectrometry

Following improvements in instrumentation, mass spectrometry (MS) progressively became the reference proteomics approach for the analysis of complex protein samples [59]. The core principle of MS is the measurement of the mass-to-charge (m/z) ratio of ionized analytes in the gas phase. This analysis is performed by a mass spectrometer, which is composed of three main components: (i) an ion source that volatilizes and ionizes the analytes; (ii) a mass analyzer that separates the ionized analytes according to their m/z ratio; and (iii) a detector that records the number of ions for each detected m/z value [59]. The generated data consist of mass spectra, which are two-dimensional plots representing ion intensity as a function of the m/z ratio. The MS analysis can be performed either on intact proteins following top-down strategies or on digested peptides following bottom-up strategies (also known as shotgun proteomics) (Figure 1.6). Peptide MS is usually preferred since protein MS is less sensitive and the mass of intact proteins does not provide enough information for their reliable identification [59].

After or during protein isolation by biochemical fractionation and chromatography, a digestion step is thus frequently performed using enzymes such as trypsin. Because trypsin cleaves after lysine and arginine residues, the resulting peptides carry a positively charged C-terminus, which facilitates their subsequent sequencing. The middle-down hybrid strategy, which consists of the MS analysis of large polypeptides, is also available. It provides more information on PTM co-existence than bottom-up proteomics while being more accessible than top-down proteomics [60].
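The digestion step can be illustrated with a minimal in silico sketch. It assumes the commonly used simplified trypsin rule (cleavage after K or R, except when the next residue is P); the function name `tryptic_digest`, the `missed_cleavages` parameter and the example sequence are hypothetical and only serve the illustration.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return the peptides of an in silico tryptic digestion.

    Assumption: simplified rule, cleave after K or R unless followed by P.
    """
    if not sequence:
        return []

    # Positions where the chain is cut (index of the first residue after the cut).
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    # Join up to `missed_cleavages` adjacent fragments to model incomplete digestion.
    peptides = []
    for start in range(len(sites) - 1):
        last_end = min(start + 2 + missed_cleavages, len(sites))
        for end in range(start + 1, last_end):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R followed by P is not cleaved.
    for peptide in tryptic_digest("MKWVTFRPLLKAG"):
        print(peptide)
```

Every peptide produced this way, except possibly the one from the protein C-terminus, ends with a basic residue, which is the property mentioned above.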

Figure 1.6: Overview of the main proteomics strategies. Figure courtesy of L. Switzar et al. [61].


The advent of mass spectrometry in proteomics is closely related to the emergence, in the late 1980s, of new soft ionization techniques that enabled the analysis of biomolecules such as proteins [62]. Electrospray ionization (ESI) [63] and matrix-assisted laser desorption/ionization (MALDI) [64] successively arose as methods for volatilizing and ionizing biomolecules [65]. The main differences between the two techniques lie in the state of the sample and the source of the ionization, which in turn influence their applications. In the ESI technique, an analyte solution flowing through thin capillaries is subjected to a strong electric field, under atmospheric pressure, to generate an electrospray. In the MALDI technique, the analytes are embedded in a dry crystalline matrix and are sublimated and ionized by short laser pulses. Both analytical techniques possess high sensitivity and can thus be applied to low-concentration samples [66]. Since the sample is in solution in ESI, it can easily be coupled with liquid-based separation techniques such as liquid chromatography (LC) or, more recently, high-performance liquid chromatography (HPLC). This property makes ESI the preferred approach for quantification experiments [66].

Another distinction is that MALDI tends to produce singly charged ions, while ESI tends to produce multiply charged ions and consequently has an extended mass range. Overall, ESI LC/MS has emerged as the reference for the analysis of complex protein samples in shotgun proteomics experiments, while MALDI is used in more specific applications such as protein imaging [66].
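The link between multiple charging and mass range can be made concrete with a small sketch. It assumes positive-mode ionization by protonation, so that an analyte of neutral mass M carrying z protons is observed at m/z = (M + z × 1.007276) / z. The function names and the example mass of 16950 Da are hypothetical.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mass_to_charge(neutral_mass, charge):
    """m/z of an analyte carrying `charge` protons (assumes protonation)."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def neutral_mass(mz, charge):
    """Inverse operation: recover the neutral mass from an observed m/z."""
    return charge * (mz - PROTON_MASS)


if __name__ == "__main__":
    protein_mass = 16950.0  # hypothetical intact protein mass in Da
    # A singly charged ion sits near the neutral mass, whereas higher
    # charge states bring the same analyte down to much lower m/z values.
    for z in (1, 10, 15, 20):
        print(z, round(mass_to_charge(protein_mass, z), 3))
```

The higher the charge state, the lower the m/z at which a heavy analyte appears, which is why multiply charged ESI ions remain measurable on analyzers with a limited m/z window.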

Figure 1.7: Ions produced from peptide fragmentation. Figure courtesy of Z. Hao et al. [67].

Further increasing the diversity of MS instrumentation, four main types of mass analyzers were engineered over the years: the time-of-flight (ToF), Fourier-transform ion cyclotron resonance (FTICR), quadrupole, and ion trap instruments. Since each mass analyzer is based on different physical principles, they vary in their analytical performance [68]. Multiple mass analyzers can however be combined to take advantage of the strengths of each technology. While MALDI is mainly paired with ToF analyzers for the measurement of the mass of intact peptides [59], ESI has commonly been used with a wider variety of instruments.

Single-stage MS is however insufficient to fully resolve peptide sequences. In tandem mass spectrometry (MS/MS) approaches, the ions isolated by the first mass analyzer (MS1), referred to as parent or precursor ions, are selected for a fragmentation step before being analyzed by a second mass analyzer (MS2). The exact location at which a peptide backbone breaks depends on the fragmentation technique in use, which thus determines the types of ions produced (Figure 1.7). Collision-induced dissociation (CID) has historically been the preferred fragmentation method in MS proteomics. In CID, the precursor ions are fragmented through their collision with an inert gas. Further fragmentation techniques based on different principles appeared later, with electron-capture dissociation (ECD) in 1998 [69] and electron-transfer dissociation (ETD) in 2004 [70].
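As an illustration of how fragment ions relate to the peptide sequence, the sketch below computes the singly charged b and y ion series, the ion types most commonly associated with CID. It is a simplified example under stated assumptions: monoisotopic residue masses, no modifications, charge 1+ only. The function name `fragment_ions` and the example peptide are hypothetical.

```python
# Standard monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def fragment_ions(peptide):
    """Return the singly charged b and y ion m/z values of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        # b_i: first i residues (N-terminal fragment) plus one proton.
        b_ions.append(sum(masses[:i]) + PROTON)
        # y_i: last i residues (C-terminal fragment) plus water and one proton.
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


if __name__ == "__main__":
    b_series, y_series = fragment_ions("PEPTIDE")  # hypothetical peptide
    for index, (b, y) in enumerate(zip(b_series, y_series), start=1):
        print(f"b{index}={b:.4f}  y{index}={y:.4f}")
```

Consecutive ions within a series differ by the mass of one residue, which is what allows the sequence to be read from an MS/MS spectrum. Other fragmentation techniques such as ECD and ETD mainly yield different ion types (c and z), which this sketch does not cover.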

1.3.3 Data and Databases

MS instruments generate raw data in the form of large binary files that are often encoded in proprietary formats. In addition to the recorded MS/MS spectra, these formats also contain a wide range of metadata describing the data acquisition process. To be usable by most bioinformatics software, raw data need to be converted into open formats using tools such as MSConvert from the ProteoWizard toolkit [71]. The generated files are usually formatted either in text-based peak list formats (e.g., MGF, PKL and DTA) or in XML-based formats (e.g., mzData, mzXML, and more recently mzML). The tools for analyzing raw data produce processed data, which can also be stored in various formats. Most of these formats are XML-based, and are used to store either the identification results (e.g., mzIdentML) or the quantification results (e.g., mzQuantML) from proteomics experiments along with the relevant metadata.
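To give an idea of what a text-based peak list looks like, the sketch below reads a small MGF document. It is a minimal illustration, assuming the usual layout of the format (spectra delimited by BEGIN IONS and END IONS, KEY=VALUE header lines, then one "m/z intensity" pair per line). The function name `parse_mgf` and the example spectrum are hypothetical, and real files may contain features that this sketch ignores.

```python
EXAMPLE_MGF = """\
BEGIN IONS
TITLE=example spectrum
PEPMASS=400.7
CHARGE=2+
100.1 20.0
200.2 55.5
END IONS
"""


def parse_mgf(text):
    """Parse MGF text into a list of {'params': dict, 'peaks': list} records."""
    spectra = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line:
                # Header line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity.
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


if __name__ == "__main__":
    for spectrum in parse_mgf(EXAMPLE_MGF):
        print(spectrum["params"], spectrum["peaks"])
```

In practice, dedicated libraries are generally preferable to hand-written parsers, especially for the XML-based formats, whose structure is considerably richer.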

Following the example of other omics fields such as genomics, research in proteomics progressively became more open and community-centered. Despite initial reluctance, the public sharing of data generated by MS-based proteomics experiments, whether raw or processed, became the norm over time. An increasing number of journals and editors are even starting to make it a requirement for publication, in order to ensure the reproducibility of the experimental results presented. Several repositories were established over the years to store such MS data, such as PRIDE [72], MassIVE and jPOST [73]. These projects, which were formerly completely independent, are now coordinated within the framework of the ProteomeXchange consortium [74], which ensures the proper and efficient sharing and dissemination of proteomics data between resources and users.

For a more in-depth review of the proteomics databases and data formats, please refer to the chapter I wrote for the Elsevier Encyclopedia of Bioinformatics and Computational Biology entitled "Proteomics Data Representation and Databases" (see Section A.1 of the Appendix).
