
overrepresented as determined by Fisher’s exact test

In the document Data Mining in Proteomics (pages 65-74)

KEGG pathway                  Proteins with detected P-sites   Proteins with regulated P-sites   p-value
Insulin signaling pathway     52                               19                                0.0002
MAPK signaling pathway        74                               21                                0.004
mTOR signaling pathway        22                               9                                 0.005
ErbB signaling pathway        40                               13                                0.007
Axon guidance                 34                               10                                0.04
Prostate cancer               21                               7                                 0.04
Nonsmall cell lung cancer     17                               6                                 0.04
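The overrepresentation p-values in the table come from Fisher's exact test. The following is a minimal sketch of a one-sided version of that test, written as a hypergeometric upper tail using only the Python standard library. The function name and the two totals (`N_DETECTED`, `K_REGULATED`) are assumptions for illustration: the table reports only per-pathway counts, so the sketch is not expected to reproduce the published p-values.

```python
from math import comb


def fisher_overrepresentation_p(k, n, K, N):
    """One-sided Fisher's exact test for overrepresentation.

    k: proteins in the pathway with regulated P-sites
    n: proteins in the pathway with detected P-sites
    K: all proteins with regulated P-sites
    N: all proteins with detected P-sites

    Returns P(X >= k), where X is the number of regulated proteins among
    n proteins drawn without replacement from N, of which K are regulated.
    """
    if not (0 <= k <= n <= N and k <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    # Hypergeometric upper tail, computed with exact integer arithmetic.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical totals (NOT from the chapter); only k and n come from the table.
N_DETECTED = 1500   # assumed number of proteins with detected P-sites
K_REGULATED = 300   # assumed number of proteins with regulated P-sites

p = fisher_overrepresentation_p(k=19, n=52, K=K_REGULATED, N=N_DETECTED)
print(f"Insulin signaling pathway (hypothetical totals): p = {p:.2g}")
```

With the real totals from the experiment substituted for the two assumed constants, the same call can be repeated for each pathway row. If SciPy is available, `scipy.stats.fisher_exact` with `alternative="greater"` on the corresponding 2x2 table is an alternative to the hand-written tail sum.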

53 Analysis of Phosphoproteomics Data

ribosomal protein S6 (UniProt id P62753). Whereas five of them, all between amino acid positions 235 and 242, are downregulated, two of them, at positions 244 and 246, are upregulated. Furthermore, KEGG, like other publicly available pathway resources, does not cover the pathways in full detail. In the case of the mTOR pathway, we therefore draw the essential parts of the pathway based on the current literature (e.g., (32)) using Inkscape (http://www.inkscape.org, see Fig. 6). The identified phosphorylation sites can then be added to the protein nodes as small ellipses labeled by the position and colored by regulation factor. This captures all details of the regulated pathway and supports the understanding of the mode of action of the substance. Mapping of the regulated phosphorylation sites to signal transduction pathways reveals that sorafenib treatment leads to severe downregulation of the MAP kinase pathway in PC3 cells. In addition, several other pathways are deregulated. In particular, the mTOR pathway is significantly affected by sorafenib in PC3 cells. Obviously, these hypotheses have to be validated with independent technologies that confirm the downstream effect on transcription or translation.

Fig. 5. KEGG MAPK signaling pathway (31). Proteins with detected phosphorylation sites are colored dark gray if they are downregulated after the treatment with sorafenib, and light gray if they are not regulated at all. See the online version of this chapter for a colored figure.

54 Schaab

Fig. 6. mTOR pathway with identified phosphorylation sites. Sites that are downregulated after the treatment with sorafenib are depicted as ellipses, upregulated sites as rectangles. See the online version of this chapter for a colored figure.

4. Conclusion

The analysis of global phosphoproteomes is a relatively new field within bioinformatics. In the last few years, technical advances have led to a steady increase in the number of detectable phosphorylation sites. It has recently become possible to detect and quantify 6,600 sites (8) or even 16,000 sites (5) in a single experiment. The processing of phosphoproteomics raw data requires software that combines standard search engines, such as Mascot and Sequest, with specialized algorithms for the identification of phosphorylated peptides, the localization of the phosphorylation sites, and their quantification. Examples of such software are MSQuant, SuperHirn, and MaxQuant.

We have seen that many of the methods developed for gene expression analysis can also be applied to the downstream analysis of phosphoproteomics data. Additional methods that take the particular nature of the data into account have been developed, e.g., the enrichment of kinase motifs in the set of differentially phosphorylated sites.

Unlike genetic mutational analysis or gene expression analysis, which measure surrogates only, phosphoproteome analysis directly measures the signaling activity in the cell. Therefore, phosphoproteome analysis will be a valuable tool whenever effects on cellular signaling activity are studied. For example, such an analysis may reveal the mode of action of drugs that inhibit certain kinases. Or, more visionary, such an analysis may discover biomarker signatures that allow predicting the optimal targeted therapy for a patient (personalized medicine, see (6)).

5. Notes

1. The peptide ion counts depend not only on the peptide concentration but on a number of additional parameters, such as the ionization efficiency, the elution behavior in the nanoLC, and the enrichment efficiency. These parameters differ for different peptides. Thus, two different peptides with identical concentration in the sample may have very different ion counts in the MS. On the other hand, these parameters do not differ for chemically identical peptides of different isotope composition. Thus, labeling methods, such as SILAC or iTRAQ™, allow the relative quantification of a peptide in different samples, whereas absolute quantification is impossible in principle (see, however, Note 2). The situation is analogous to the situation with microarray-based gene expression data, where due to the differences in the hybridization efficiency only comparisons between samples rather than between features are possible.

2. If defined amounts of synthetically produced, isotopically labeled peptides are spiked into the samples, absolute quantification of the corresponding natural peptides is possible (33).

3. If only a certain set of phosphopeptides is to be analyzed, one can use so-called targeted approaches to improve the coverage of this set. This includes the use of inclusion lists (34) or MRM-based methods (35).

4. Depending on the quality of the MS/MS spectra, it is not always possible to assign the phosphorylation to a specific amino acid. MaxQuant calculates the localization probability that the given amino acid is indeed the one that is phosphorylated. It often makes sense to restrict oneself to phosphorylation sites that are identified and localized with high confidence. Therefore, so-called class I sites are defined as the ones that have a localization probability of at least 75% and a score difference of at least 5 (8).

5. A very different approach has been taken by Zhou et al. (30) who proposed a “global rank test” for microarray data. Here, the sites are ranked by ratios within each replicate. Sites that are consistently ranked top or bottom T are identified as differentially phosphorylated sites. The parameter T is fixed by an appropriate FDR that is estimated parametrically or based on permutations. A nice feature of this test procedure is that the FDR actually decreases with the number of tested sites. Standard FDR procedures show the opposite behavior.

6. There are a number of databases containing signal transduction pathways, including KEGG (http://www.genome.jp/kegg/pathway.html), BioCarta (http://www.biocarta.com/), and PANTHER (http://www.pantherdb.org/pathway/). By identifying pathways in which differentially phosphorylated proteins are overrepresented, one can expect that the corresponding biological processes differentially respond to the tested conditions.

7. Many motifs are known and the above approach can be used to identify motifs for which differentially phosphorylated sites are overrepresented. Another approach is to identify motifs de novo from all differentially phosphorylated sites (36).
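The rank-based selection described in Note 5 can be sketched as follows. This is only an illustration under stated assumptions: the function name, the site identifiers, and the ratio values are hypothetical, every replicate is assumed to contain the same sites, and T is passed in directly rather than being derived from an FDR estimate as in the procedure of Zhou et al. (30).

```python
def rank_selected_sites(ratios_by_replicate, top_t):
    """Select sites ranked among the top_t highest (or lowest) ratios
    in every replicate.

    ratios_by_replicate: list of dicts mapping site id -> ratio,
                         one dict per replicate (same sites in each).
    top_t: the rank cutoff T (assumed given; no FDR estimation here).
    Returns (up, down) as two sets of site ids.
    """
    if top_t < 1:
        raise ValueError("top_t must be at least 1")
    up = down = None
    for ratios in ratios_by_replicate:
        # Rank sites within this replicate, ascending by ratio.
        # Ties keep dictionary insertion order (sorted() is stable).
        ordered = sorted(ratios, key=ratios.get)
        lowest = set(ordered[:top_t])
        highest = set(ordered[-top_t:])
        # Keep only sites that are extreme in all replicates seen so far.
        down = lowest if down is None else down & lowest
        up = highest if up is None else up & highest
    return (up or set()), (down or set())


# Hypothetical ratios for five sites in two replicates.
replicates = [
    {"site_A": 4.0, "site_B": 0.2, "site_C": 1.1, "site_D": 0.9, "site_E": 3.0},
    {"site_A": 3.5, "site_B": 0.3, "site_C": 0.25, "site_D": 1.0, "site_E": 5.0},
]
up_sites, down_sites = rank_selected_sites(replicates, top_t=2)
print("consistently top-ranked:", sorted(up_sites))
print("consistently bottom-ranked:", sorted(down_sites))
```

Because only ranks within each replicate are used, the selection is insensitive to replicate-specific scaling of the ratios. Choosing T from a parametric or permutation-based FDR estimate, as the note describes, would be a separate step that this sketch leaves out.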

References

1. Hunter T (2000) Signaling–2000 and beyond. Cell 100:113–127

2. Pawson T, Nash P (2003) Assembly of cell regulatory systems through protein interaction domains. Science 300:445–452

3. Blume-Jensen P, Hunter T (2001) Oncogenic kinase signalling. Nature 411:355–365

4. Kaminska B (2005) MAPK signalling pathways as molecular targets for anti-inflammatory therapy – from molecular mechanisms to therapeutic benefits. Biochim Biophys Acta 1754:253–262

5. Tebbe A, Klammer M, Kaminski M, Wandinger S, Eckert C, Müller S, Gorray M, Enghofer E, Schaab C, Godl K (2009) Mode of action analysis of sorafenib by integrating chemical proteomics and phosphoproteomics. Eur J Cancer 7:14–15

6. Lim YP (2005) Mining the tumor phosphoproteome for cancer markers. Clin Cancer Res 11:3163–3169

7. Huang PH, Mukasa A, Bonavia R, Flynn RA, Brewer ZE, Cavenee WK et al (2007) Quantitative analysis of EGFRvIII cellular signaling networks reveals a combinatorial therapeutic strategy for glioblastoma. Proc Natl Acad Sci U S A 104:12867–12872

8. Olsen JV, Blagoev B, Gnad F, Macek B, Kumar C, Mortensen P, Mann M (2006) Global, in vivo, and site-specific phosphorylation dynamics in signaling networks. Cell 127:635–648

9. Macek B, Mann M, Olsen JV (2009) Global and site-specific quantitative phosphoproteomics: principles and applications. Annu Rev Pharmacol Toxicol 49:199–221

10. Rush J, Moritz A, Lee KA, Guo A, Goss VL, Spek EJ et al (2005) Immunoaffinity profiling of tyrosine phosphorylation in cancer cells. Nat Biotechnol 23:94–101

11. Ong SE, Blagoev B, Kratchmarova I, Kristensen DB, Steen H, Pandey A, Mann M (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 1:376–386

12. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S et al (2004) Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5:R80

13. Mueller LN, Rinner O, Schmidt A, Letarte S, Bodenmiller B, Brusniak MY et al (2007) SuperHirn – a novel tool for high resolution LC-MS-based peptide/protein profiling. Proteomics 7:3470–3480

14. Cox J, Mann M (2008) MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat Biotechnol 26:1367–1372

15. Cox J, Matic I, Hilger M, Nagaraj N, Selbach M, Olsen JV, Mann M (2009) A practical guide to the MaxQuant computational platform for SILAC-based quantitative proteomics. Nat Protoc 4:698–705

16. Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98:5116–5121

17. Bonferroni CE (1936) Teoria statistica delle classi e calcolo delle probabilità. Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commerciali di Firenze 9:3–62

18. Holm S (1979) A simple sequentially rejective multiple test procedure. Scand J Stat 6:65–70

19. Hochberg Y (1988) A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75:800–803

20. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300

21. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100:9440–9445

22. Xie Y, Pan W, Khodursky AB (2005) A note on using permutation-based false discovery rate estimates to compare different analysis methods for microarray data. Bioinformatics 21:4280–4288

23. Jiao S, Zhang S (2008) On correcting the overestimation of the permutation-based false discovery rate estimator. Bioinformatics 24:1655–1661

24. Fisher RA (1935) The logic of inductive inference. J Royal Stat Soc 98:39–54

25. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M et al (2003) GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4:R28

26. Maere S, Heymans K, Kuiper M (2005) BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21:3448–3449

27. Ackermann M, Strimmer K (2009) A general modular framework for gene set enrichment analysis. BMC Bioinformatics 10:47

28. Goeman JJ, Buhlmann P (2007) Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23:980–987

29. McDermott U, Sharma SV, Dowell L, Greninger P, Montagut C, Lamb J et al (2007) Identification of genotype-correlated sensitivity to selective kinase inhibitors by using high-throughput tumor cell line profiling. Proc Natl Acad Sci U S A 104:19936–19941

30. Zhou Y, Cras-Meneur C, Ohsugi M, Stormo GD, Permutt MA (2007) A global approach to identify differentially expressed genes in cDNA (two-color) microarray experiments. Bioinformatics 23:2073–2079

31. Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28:27–30

32. Hay N, Sonenberg N (2004) Upstream and downstream of mTOR. Genes Dev 18:1926–1945

33. Stemmann O, Zou H, Gerber SA, Gygi SP, Kirschner MW (2001) Dual inhibition of sister chromatid separation at metaphase. Cell 107:715–726

34. Mueller LN, Brusniak MY, Mani DR, Aebersold R (2008) An assessment of software solutions for the analysis of mass spectrometry based quantitative proteomics data. J Proteome Res 7:51–61

35. Kitteringham NR, Jenkins RE, Lane CS, Elliott VL, Park BK (2009) Multiple reaction monitoring for quantitative biomarker analysis in proteomics and metabolomics. J Chromatogr B Analyt Technol Biomed Life Sci 877:1229–1239

36. Ritz A, Shakhnarovich G, Salomon AR, Raphael BJ (2009) Discovery of phosphorylation motif mixtures in phosphoproteomics data. Bioinformatics 25:14–21

Part II

Databases


Chapter 4
