
CHAPTER II: Proteomics: Beyond cDNA

3. Computer analysis of proteomics images

For each analytical method (2-DE, LC, MS, MS/MS) described in the previous section, there is a corresponding bioinformatics tool to analyze and interpret the resulting data. These tools serve two purposes: detecting proteins that are differentially expressed between proteomics samples, and identifying and characterizing the most pertinent of those proteins.

3.1 Analysis of 2-DE gels

The 2-DE gel patterns, once digitized, provide an important basis for quantitative analysis and comparative proteomics. The possibility of detecting protein expression changes associated with diseases and treatments, or of finding therapeutic molecular targets, opens up new frontiers for biological and biomedical research. These applications have been a major incentive for the development of specialized software systems for 2-DE gel image analysis [36-40]. Currently, a number of dedicated software packages are commercially available, the main ones being listed in Table 1. Although each of the 2-DE gel image analysis systems has its own philosophy and approach [41], most of them provide the same basic operations and functionalities necessary to carry out a complete gel study. These operations are usually based on state-of-the-art image processing algorithms that have been adapted to this specific biological need. In this section, the key issues and steps of 2-DE gel image analysis are discussed and illustrated using ImageMaster™ 2D Platinum release 6 (powered by Melanie, which is developed by the Proteome Informatics Group at the Swiss Institute of Bioinformatics). Note that a free viewer for this software is available at www.expasy.org/melanie/. It offers the usual visualization operations of the full version and most of the analysis procedures as well, although analysis is restricted to a small number of proteins and to gels that have already been analyzed with a full version.

Table 1: Major commercialized 2-D image analysis software

| Software | Company | Source website |
|---|---|---|
| DeCyder | GE Healthcare (formerly Amersham Biosciences) | www.gehealthcare.com |
| Delta2D | Decodon | www.decodon.com |
| ImageMaster™ 2D Platinum | GE Healthcare | www.gehealthcare.com |
| PDQuest | Bio-Rad | www.bio-rad.com |
| Progenesis/Phoretix | Nonlinear Dynamics | www.nonlinear.com |
| Proteomweaver | Definiens | www.definiens.com |
| Z3/Z4000 | Compugen | www.2dgels.com |

3.1.1 Data analysis and validation

The first operation of a 2-DE gel analysis is the scanning of the gels. In most cases, this is achieved with flatbed document scanners, camera systems, densitometers, phosphor imagers or fluorescence scanners. These devices typically produce images of around 2000 × 2000 pixels or more, with a depth of 12 or 16 bits, thus providing a dynamic range of 4096 or 65536 gray levels, respectively.
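The relation between bit depth and dynamic range quoted above is simply a power of two. The short sketch below reproduces the two figures and, as an illustration only, estimates the raw size of a 2000 × 2000 image. It assumes each pixel is stored in a 16-bit word, which is an assumption and not a statement about any particular scanner.

```python
def gray_levels(bit_depth: int) -> int:
    """Number of distinct gray levels representable at a given bit depth."""
    return 2 ** bit_depth


for bits in (12, 16):
    print(bits, "bits ->", gray_levels(bits), "gray levels")

# Raw size of an uncompressed 2000 x 2000 image.
# Assumption: every pixel occupies one 16-bit (2-byte) word.
width, height = 2000, 2000
bytes_per_pixel = 2
size_megabytes = width * height * bytes_per_pixel / 1e6
print(size_megabytes, "MB")
```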

One of the basic properties of 2-DE analysis software is image visualization. ImageMaster provides several means of manipulating and displaying the gels, such as gel stacking, various zoom modes, customizable grids, “flicking” back and forth between gels, or a transparency mode to visually inspect the result of matching. This makes it easy to browse through the data extracted from the 2-DE images.

Besides the basic visualization properties, the major functions of software systems for 2-DE image analysis are a) to detect and quantify the protein spots on the gels, b) to match corresponding spots across gels and c) to locate significant protein expression changes. To achieve c), both a) and b) must have been carried out successfully. The optimal and reproducible definition of spot borders depends mostly on gel running issues, such as uneven focusing and polymerization problems. Very often proteins are not resolved as discrete spots, in particular in regions with high spot density. Numerous dim spots may be missed because they are confused with the background, whilst others might be wrongly detected. To overcome these issues, detection algorithms often include filtering steps that automatically remove streak artifacts and noise spikes [42], or a segmentation process based on the analysis of the gray levels [43]. Spot detection algorithms produce a repository of all protein spots contained in the 2-DE images, together with related quantitative data such as each spot's optical density, area and volume (the integration of the optical density over the area). Relative measures of these values are also given, for example the relative volume, calculated as the absolute volume of a spot divided by the total volume of protein in the whole 2-DE gel. Relative values partially compensate for variations in sample load or staining, so using them improves the reproducibility of the data.
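The volume and relative volume defined above can be sketched in a few lines. This is a minimal illustration, not the algorithm used by ImageMaster or any other package: the function names, the toy optical density image and the spot outlines are all hypothetical, and the total protein volume is approximated here by the sum of the detected spot volumes.

```python
def spot_volume(od_image, spot_pixels):
    """Absolute volume: optical density summed over the pixels of a spot."""
    return sum(od_image[row][col] for row, col in spot_pixels)


def relative_volumes(volumes):
    """Relative volume (in percent) of each spot.

    Assumption: the total volume of the gel is taken as the sum of the
    volumes of all detected spots.
    """
    total = sum(volumes.values())
    return {spot: 100.0 * volume / total for spot, volume in volumes.items()}


# Hypothetical 3 x 4 optical density image with two detected spots.
od_image = [
    [0.0, 0.5, 0.5, 0.0],
    [0.0, 1.5, 1.5, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
spots = {
    "A": [(0, 1), (0, 2)],
    "B": [(1, 1), (1, 2)],
}

volumes = {name: spot_volume(od_image, pixels) for name, pixels in spots.items()}
percent_vol = relative_volumes(volumes)
print(volumes)      # {'A': 1.0, 'B': 3.0}
print(percent_vol)  # {'A': 25.0, 'B': 75.0}
```

Because every spot is divided by the same gel-wide total, doubling the sample load would double all absolute volumes but leave the relative volumes unchanged, which is the compensation effect described in the text.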

Finding corresponding pairs of spots in gel images is also a critical task, whether it is based on spots that have been detected first [44] or on the intensities of image regions before spot detection [45]. Pair matching relies on the similarity of the spatial distribution of spots, which may vary with experimental gel running conditions and gel scanning. Quantitative differential protein expression results can be erroneous when spots representing the same protein are not matched, or when spots representing different proteins are mistakenly matched together.
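To make the idea of spot-based pair matching concrete, the sketch below pairs spots from two gels by greedy nearest-neighbour search within a distance tolerance. It is deliberately simplistic and is not the method of [44] or [45]: real matching algorithms must also handle gel distortions, for instance by warping the images or by using landmarks. The function name, the coordinates and the tolerance are hypothetical.

```python
import math


def match_spots(reference, target, max_distance):
    """Greedy one-to-one pairing of spots by Euclidean distance.

    reference, target: dicts mapping a spot ID to its (x, y) position.
    Returns a dict mapping reference spot IDs to target spot IDs.
    """
    candidates = []
    for ref_id, (rx, ry) in reference.items():
        for tgt_id, (tx, ty) in target.items():
            distance = math.hypot(rx - tx, ry - ty)
            if distance <= max_distance:
                candidates.append((distance, ref_id, tgt_id))

    # Accept the closest candidate pairs first; each spot is used at most once.
    candidates.sort()
    pairs = {}
    used_targets = set()
    for _, ref_id, tgt_id in candidates:
        if ref_id not in pairs and tgt_id not in used_targets:
            pairs[ref_id] = tgt_id
            used_targets.add(tgt_id)
    return pairs


# Hypothetical spot positions (in pixels) on a reference gel and a second gel.
reference_gel = {"r1": (10, 10), "r2": (50, 40), "r3": (80, 90)}
second_gel = {"t1": (12, 11), "t2": (49, 43), "t3": (200, 200)}

pairs = match_spots(reference_gel, second_gel, max_distance=5.0)
print(pairs)
```

In this toy example spot r3 has no counterpart within the tolerance and stays unmatched, which corresponds to the first kind of error mentioned above if r3 and t3 were in fact the same protein.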

After matching, statistical data analysis is carried out to find interesting proteins, i.e. those that are suppressed or up-regulated. Descriptive statistics summarize the values of matched spots and may point to spots that are characteristic of a population of gels. Usually, 2-DE analysis software offers common statistical tests such as Student's t-test, the Mann-Whitney test or the Kolmogorov-Smirnov test, but it may also offer multivariate analysis, clustering tools or neural networks [46] to locate variations in protein expression profiles. The results can then be visualized through histograms, scatter plots, reports or different views of the gels, such as gel transparency, overlapping spot contours or a 3-D view.
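As an illustration of the simplest of these tests, the sketch below computes the two-sample Student t statistic (equal-variance form) for one matched spot measured in two classes of gels. The values are invented for the example and the function name is hypothetical. Converting the statistic into a p-value requires the t distribution, which is not part of the Python standard library, so that step is left out.

```python
import math
from statistics import mean, variance


def student_t(class_a, class_b):
    """Two-sample Student t statistic with pooled variance.

    Returns the statistic and the degrees of freedom.
    """
    n_a, n_b = len(class_a), len(class_b)
    pooled_variance = (
        (n_a - 1) * variance(class_a) + (n_b - 1) * variance(class_b)
    ) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    t_value = (mean(class_a) - mean(class_b)) / standard_error
    return t_value, n_a + n_b - 2


# Hypothetical values of one matched spot in four gels per class.
class_a = [1.0, 2.0, 3.0, 4.0]
class_b = [5.0, 6.0, 7.0, 8.0]

t_value, degrees_of_freedom = student_t(class_a, class_b)
print(round(t_value, 3), degrees_of_freedom)
```

Running the same computation on every group of matched spots and ranking the groups by the absolute value of the statistic gives a first list of candidate differentially expressed proteins. With hundreds of spots tested at once, a correction for multiple testing would normally be needed as well.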

Figure 3: Gel images analyzed with ImageMaster. See text for details.

Figure 3 illustrates the analysis reasoning with eight 2-DE images. These are eight gels of rat smooth muscle cell samples, four from newborn rats and four from aged rats [47]. All images have been matched to gel 930018c-w (upper left in the figure). Two classes have been defined, each containing four images (marked Newborn and Aged, respectively, on the labels in Figure 3). We first select all groups of spots in each of the two classes, and then produce an Inter-Class Report (detail of Figure 3) that shows the Maximum value of each class computed on the spots' normalized volumes (%VOL). A group is a set of spots that have been matched across all gel images and thus represent the same protein. A Statistical Tests report then shows various statistics about the groups in the two classes, such as Student's t-test, the Wilcoxon test or the Kolmogorov–Smirnov test. This lets the user select and highlight protein spots that are differentially expressed between the two classes.

Figure 3 shows the result of one of these tests, highlighting the group of spots that was ranked highest in terms of separability between the two classes. In Figure 3, the same group of spots has been highlighted in the Inter-Class Report, in the Inter-Class+Intra-Class Histograms and in the gels.

Once specific proteins of interest have been selected via careful data analysis, such as illustrated in Figure 3, further analysis may be carried out to identify or partially characterize those proteins.

3.1.2 Annotation and databases

After extensive analysis of the protein spots has been carried out, including protein identification as detailed in Section 4, 2-DE images may be annotated. The annotations have two main functions: linking gels to external databases, and adding information to the gels for later reference. ImageMaster provides broad annotation capabilities so that all related data and information that have been acquired can be included in the gel image. Annotations can be added manually or imported from an external database, for example through a Laboratory Information Management System (LIMS). Any kind of annotation may be attached to a spot or a pixel, for example the protein ID, a SWISS-2DPAGE ID (or any other protein database ID), a landmark, calibration values (pI, Mr, intensities) or comments, as well as links to external files such as text files or MS spectra, and Internet links. Double-clicking on a label that marks an annotated object displays the corresponding piece of information. In the case of the accession number of a SWISS-2DPAGE entry (e.g. P02990), this launches the default Web browser and retrieves the corresponding entry from the local or external user-specified database.

3.2 Analysis of LC-MS images

So far, we have seen examples from a proteomics workflow in which samples are separated via 2-DE gels. Another possible workflow in proteomics combines the separation of proteins and peptides by liquid chromatography with direct analysis by mass spectrometry (Figure 1). In this case, the data may also be represented in two dimensions, elution time and m/z, and can therefore be visualized and analyzed as images. LC-MS image analysis systems are still in their infancy. Some prototypes have been presented so far, such as Decyder-MS, MapQuant, SpectroArray, MSDViewer and MarkerView, but almost no literature is available describing their performance and characteristics. Pep3D is a tool for producing LC-MS images that can also represent the score values of protein identifications of precursor ions using different color hues [48]. Its simple interface permits the visualization of one experiment at a time and only a single view of the image. The Proteome Informatics Group at the Swiss Institute of Bioinformatics has developed a software tool, MSight [49], for the two-dimensional representation, visual analysis and comparison of LC-MS datasets. MSight features 1) the display and browsing, as an image, of any portion of the collected mass spectra, with a smooth transition from a global overview of all spectra to selected isotopic peaks, 2) user-friendly navigation through large volumes of data, and 3) visualization tools to discriminate peptides or proteins from noise or to perform differential analysis. This software tool is available free of charge through the ExPASy Web server [50] at http://www.expasy.org/MSight. Future versions of this tool will allow for semi-automatic analysis of LC-MS datasets, including quantitative differential proteome analysis.
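The two-dimensional representation mentioned above can be sketched as a simple binning of measurements onto an elution time by m/z grid. This is only an illustration of the principle and does not describe how MSight or any other tool builds its images. The function name, the grid steps and the sample measurements are hypothetical.

```python
import math


def lcms_to_image(measurements, time_range, mz_range, time_step, mz_step):
    """Accumulate (elution time, m/z, intensity) triples into a 2-D grid.

    Rows correspond to elution time bins and columns to m/z bins.
    Measurements outside the requested ranges are ignored.
    """
    time_min, time_max = time_range
    mz_min, mz_max = mz_range
    n_rows = round((time_max - time_min) / time_step)
    n_cols = round((mz_max - mz_min) / mz_step)
    image = [[0.0] * n_cols for _ in range(n_rows)]

    for elution_time, mz, intensity in measurements:
        row = math.floor((elution_time - time_min) / time_step)
        col = math.floor((mz - mz_min) / mz_step)
        if 0 <= row < n_rows and 0 <= col < n_cols:
            image[row][col] += intensity
    return image


# Hypothetical measurements: (elution time in minutes, m/z, intensity).
measurements = [
    (0.5, 400.2, 10.0),
    (0.7, 400.4, 5.0),
    (1.5, 401.7, 2.0),
    (5.0, 900.0, 1.0),  # falls outside the grid and is dropped
]

image = lcms_to_image(
    measurements,
    time_range=(0.0, 2.0),
    mz_range=(400.0, 402.0),
    time_step=1.0,
    mz_step=1.0,
)
print(image)
```

Once the data are in this form, the gray-level image processing operations used for 2-DE gels (display, zooming, comparison between runs) can in principle be applied to LC-MS data as well.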

Figure 4: The MSight software showing three different views of the same LC-MS sample: 2-D, 3-D and 1-D (one spectrum) views.

The images displayed in Figure 4 were obtained from a 42-59 kDa fraction of an extract of the human BJAB B-cell line. The sample was digested with trypsin and separated by reverse-phase capillary LC coupled to a SCIEX/Applied Biosystems QSTAR quadrupole time-of-flight mass spectrometer equipped with an electrospray ionization source. Spectra were acquired in the m/z range 400-1200, and the image was created using a sampling interval of 0.025 m/z; it thus contains 55 million measurements. The top-right image in Figure 4 displays a 2-D view of the sample described above and the bottom-left image highlights part of this sample in a 3-D view. The bottom-right image in Figure 4 shows one single spectrum of the sample.
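The size of this image can be checked with a short calculation. The m/z range, the sampling interval and the total of 55 million measurements come from the text. The number of spectra is not stated there; the value computed below is only what those three figures imply if every spectrum covers the full m/z range, and should be read as a rough estimate.

```python
mz_min, mz_max = 400.0, 1200.0   # acquisition range given in the text
sampling_interval = 0.025        # m/z per image column, given in the text
total_measurements = 55_000_000  # "55 million", a rounded figure

# Number of intensity values per spectrum at this sampling interval.
points_per_spectrum = round((mz_max - mz_min) / sampling_interval)

# Implied number of spectra (image rows); an estimate, not a reported value.
implied_spectra = total_measurements / points_per_spectrum

print(points_per_spectrum)
print(round(implied_spectra))
```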

Image processing of LC-MS datasets can be extremely useful for the monitoring and quality control of experiments as well as knowledge extraction. Software tools specifically developed for the representation of mass spectra along with data from the separation step (such as LC-MS or SDS-LC-MS) provide simple ways to navigate through very large volumes of data.

Assessment of the data quality and of the experimental design is simplified by providing a direct means of verifying the quality of the separation and detecting the presence of artifacts and contaminants or mass calibration problems. The redundancy in successive mass spectra may even be used to enhance the signal-to-noise ratio, thereby improving the reliability of MS analysis. Such visual representation of experimental data also helps to understand features such as post-translational modifications of peptides. Most importantly, it allows for automatic or semi-automatic differential proteome analysis by comparing several sets of data, as well as providing fast and intuitive detection of significant qualitative and quantitative differences.
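The remark about redundancy in successive mass spectra can be illustrated by averaging neighbouring spectra point by point. A peptide elutes over several consecutive spectra, so its signal is preserved by the average, whereas uncorrelated noise is reduced by roughly the square root of the number of spectra averaged. The sketch below is a minimal example with invented values, not the procedure of any specific tool.

```python
def average_spectra(spectra):
    """Point-by-point average of spectra sampled on the same m/z grid."""
    count = len(spectra)
    return [sum(values) / count for values in zip(*spectra)]


# Three hypothetical successive spectra on a three-point m/z grid.
# The middle point carries a real signal of about 10; the rest is noise.
successive_spectra = [
    [1.0, 9.0, 0.0],
    [-1.0, 11.0, 2.0],
    [0.0, 10.0, -2.0],
]

averaged = average_spectra(successive_spectra)
print(averaged)
```

In practice the average would be taken over a sliding window along the elution time axis, with a window no wider than the chromatographic peak, so that signals from peptides eluting at different times are not mixed.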