GO-Based Gene Function and Network Characterization

5.8 Software Implementation

5.8.2 Tools for Meta-Analysis

Function-prediction tools based on meta-analysis of microarray data are available from http://digbio.missouri.edu/meta_analyses/. All programs are written in ANSI C and are compatible with both Linux and Windows operating systems.

5.9 Conclusion

This chapter introduced various aspects of GO and its applications in gene function and regulatory-network characterization. GO provides a controlled vocabulary that maps gene functions in any organism to identifiers. This notation makes it feasible for computational methods to manipulate gene functions in terms of the ontology or certain types of mapping. GO saves researchers considerable time in collecting up-to-date function annotations from the literature, as it is continuously updated and new versions are made available on a monthly basis. There are also other types of ontologies, such as the KEGG ontology. KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals. More ontologies are introduced in Chapter 1 of this book.
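To illustrate how GO identifiers let a program manipulate gene functions, the following minimal Python sketch propagates annotations up a small is_a hierarchy and finds the terms two genes share. It is only an illustration under stated assumptions, not the implementation used in this chapter: the four-term fragment is hand-written and simplified, the gene names (geneA, geneB) and their annotations are hypothetical, and the function names are invented for this example. Real analyses would load the terms and annotations from the files distributed by the GO Consortium.

```python
# Hand-written, simplified fragment of the GO "biological process" hierarchy.
# Each term ID maps to (name, list of is_a parent IDs). Real GO terms can have
# several parents and other relationship types; this fragment omits them.
GO_TERMS = {
    "GO:0008150": ("biological_process", []),
    "GO:0050896": ("response to stimulus", ["GO:0008150"]),
    "GO:0006950": ("response to stress", ["GO:0050896"]),
    "GO:0006970": ("response to osmotic stress", ["GO:0006950"]),
}

# Hypothetical gene-to-term annotations (gene names are made up).
ANNOTATIONS = {
    "geneA": {"GO:0006970"},
    "geneB": {"GO:0006950"},
}


def ancestors(term_id):
    """Return the term itself plus every term reachable through is_a parents."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(GO_TERMS[current][1])
    return seen


def propagated_terms(gene):
    """All terms implied for a gene once its annotations are propagated upward."""
    result = set()
    for term_id in ANNOTATIONS[gene]:
        result |= ancestors(term_id)
    return result


def shared_terms(gene1, gene2):
    """Terms implied for both genes after upward propagation."""
    return propagated_terms(gene1) & propagated_terms(gene2)


if __name__ == "__main__":
    for term_id in sorted(shared_terms("geneA", "geneB")):
        print(term_id, GO_TERMS[term_id][0])
```

Because every gene is reduced to a set of identifiers, comparing functions becomes a set operation. The same representation could be extended to the other two GO subdomains.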

GO offers the most comprehensive set of relationships for describing gene/protein activities. However, GO also has some limitations. For example, some GO terms are generic and not informative for biological studies, although GO has been improved with more specific function details over the years. Furthermore, GO’s choice to segregate the ontology into the subdomains of molecular function, biological process, and cellular component creates some limitations [55]. With further development of GO to overcome these limitations, new computational methods for gene-function prediction will also emerge.

Acknowledgements

We would like to thank our collaborators, Drs. R. Michael Roberts and Jeffery Becker. We would also like to thank Yu Chen for his early involvement in this work.

This study was supported by USDA/CSREES-2004-25604-14708, NSF/ITR-IIS-0407204, and a Monsanto internship for Gyan Srivastava.

References

[1] Troyanskaya, O. G., “Putting Microarrays in a Context: Integrated Analysis of Diverse Biological Data,” Brief Bioinform, Vol. 6, No. 1, 2005, pp. 34–43.

[2] Watson, J. D., R. A. Laskowski, and J. M. Thornton, “Predicting Protein Function from Sequence and Structural Data,” Curr Opin Struct Biol, Vol. 15, No. 3, 2005, pp. 275–284.

[3] Barutcuoglu, Z., R. E. Schapire, and O. G. Troyanskaya, “Hierarchical Multi-Label Prediction of Gene Function,” Bioinformatics, Vol. 22, No. 7, 2006, pp. 830–836.

[4] Deng, M., T. Chen, and F. Sun, “An Integrated Probabilistic Model for Functional Prediction of Proteins,” J Comput Biol, Vol. 11, Nos. 2–3, 2004, pp. 463–475.

[5] Lanckriet, G. R., et al., “Kernel-Based Data Fusion and Its Application to Protein Function Prediction in Yeast,” Pac Symp Biocomput, Honolulu, January 6–10, 2004, pp. 300–311.

[6] Marcotte, E. M., et al., “A Combined Algorithm for Genome-Wide Prediction of Protein Function,” Nature, Vol. 402, No. 6757, 1999, pp. 83–86.

[7] Pavlidis, P., et al., “Learning Gene Functional Classifications from Multiple Data Types,” J Comput Biol, Vol. 9, No. 2, 2002, pp. 401–411.

[8] Brazhnik, P., A. de la Fuente, and P. Mendes, “Gene Networks: How to Put the Function in Genomics,” Trends Biotechnol, Vol. 20, No. 11, 2002, pp. 467–472.

[9] Ashburner, M., et al., “Gene Ontology: Tool for the Unification of Biology. The Gene Ontology Consortium,” Nat Genet, Vol. 25, No. 1, 2000, pp. 25–29.

[10] Chen, Y., and D. Xu, “Global Protein Function Annotation Through Mining Genome-Scale Data in Yeast Saccharomyces Cerevisiae,” Nucleic Acids Res, Vol. 32, No. 21, 2004, pp. 6414–6424.

[11] Chen, Y., and D. Xu, “Understanding Protein Dispensability Through Machine-Learning Analysis of High-Throughput Data,” Bioinformatics, Vol. 21, No. 5, 2005, pp. 575–581.

[12] Choi, J. K., et al., “Combining Multiple Microarray Studies and Modeling Interstudy Variation,” Bioinformatics, Vol. 19, Supplement 1, 2003, pp. i84–i90.

[13] Hughes, T. R., et al., “Functional Discovery via a Compendium of Expression Profiles,” Cell, Vol. 102, No. 1, 2000, pp. 109–126.

[14] Joshi, T., et al., “GeneFAS: A Tool for Prediction of Gene Function Using Multiple Sources of Data,” Methods Mol Biol, Vol. 439, 2008, pp. 369–386.

[15] Kishino, H., and P. J. Waddell, “Correspondence Analysis of Genes and Tissue Types and Finding Genetic Links from Microarray Data,” Genome Inform, Genome Inform Ser Workshop, Vol. 11, 2000, pp. 83–95.

[16] Lee, H. K., et al., “Coexpression Analysis of Human Genes Across Many Microarray Data Sets,” Genome Res, Vol. 14, No. 6, 2004, pp. 1085–1094.

[17] Rhodes, D. R., et al., “Meta-Analysis of Microarrays: Interstudy Validation of Gene Expression Profiles Reveals Pathway Dysregulation in Prostate Cancer,” Cancer Res, Vol. 62, No. 15, 2002, pp. 4427–4433.

[18] Rhodes, D. R., et al., “Large-Scale Meta-Analysis of Cancer Microarray Data Identifies Common Transcriptional Profiles of Neoplastic Transformation and Progression,” Proc Natl Acad Sci USA, Vol. 101, No. 25, 2004, pp. 9309–9314.

[19] Seki, M., et al., “Functional Annotation of a Full-Length Arabidopsis cDNA Collection,” Science, Vol. 296, No. 5565, 2002, pp. 141–145.

[20] Zhou, X. J., et al., “Functional Annotation and Network Reconstruction Through Cross-Platform Integration of Microarray Data,” Nat Biotechnol, Vol. 23, No. 2, 2005, pp. 238–243.

[21] Joshi, T., and D. Xu, “Quantitative Assessment of Relationship Between Sequence Similarity and Function Similarity,” BMC Genomics, Vol. 8, No. 1, 2007, p. 222.

[22] Altschul, S. F., et al., “Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs,” Nucleic Acids Res, Vol. 25, No. 17, 1997, pp. 3389–3402.

[23] O’Brien, K. P., M. Remm, and E. L. Sonnhammer, “Inparanoid: A Comprehensive Database of Eukaryotic Orthologs,” Nucleic Acids Res, Vol. 33, Database Issue, 2005, pp. D476–D480.

[24] Pellegrini, M., et al., “Assigning Protein Functions by Comparative Genome Analysis: Protein Phylogenetic Profiles,” Proc Natl Acad Sci USA, Vol. 96, No. 8, 1999, pp. 4285–4288.

[25] Finn, R. D., et al., “Pfam: Clans, Web Tools and Services,” Nucleic Acids Res, Vol. 34, Database Issue, 2006, pp. D247–D251.

[26] Mulder, N. J., et al., “InterPro, Progress and Status in 2005,” Nucleic Acids Res, Vol. 33, Database Issue, 2005, pp. D201–D205.

[27] Glazko, G., A. Gordon, and A. Mushegian, “The Choice of Optimal Distance Measure in Genome-Wide Datasets,” Bioinformatics, Vol. 21, Supplement 3, 2005, pp. iii3–iii11.

[28] Warnat, P., R. Eils, and B. Brors, “Cross-Platform Analysis of Cancer Microarray Data Improves Gene Expression Based Classification of Phenotypes,” BMC Bioinformatics, Vol. 6, 2005, p. 265.

[29] Stevens, J. R., and R. W. Doerge, “Combining Affymetrix Microarray Results,” BMC Bioinformatics, Vol. 6, 2005, p. 57.

[30] Reverter, A., et al., “Joint Analysis of Multiple cDNA Microarray Studies via Multivariate Mixed Models Applied to Genetic Improvement of Beef Cattle,” J Anim Sci, Vol. 82, No. 12, 2004, pp. 3430–3439.

[31] Magwene, P. M., and J. Kim, “Estimating Genomic Coexpression Networks Using First-Order Conditional Independence,” Genome Biol, Vol. 5, No. 12, 2004, p. R100.

[32] Culhane, A. C., et al., “MADE4: An R Package for Multivariate Analysis of Gene Expression Data,” Bioinformatics, Vol. 21, No. 11, 2005, pp. 2789–2790.

[33] Eisen, M. B., et al., “Cluster Analysis and Display of Genome-Wide Expression Patterns,” Proc Natl Acad Sci USA, Vol. 95, No. 25, 1998, pp. 14863–14868.

[34] Kim, S. K., et al., “A Gene Expression Map for Caenorhabditis Elegans,” Science, Vol. 293, No. 5537, 2001, pp. 2087–2092.

[35] Segal, E., et al., “Module Networks: Identifying Regulatory Modules and Their Condition-Specific Regulators from Gene Expression Data,” Nat Genet, Vol. 34, No. 2, 2003, pp. 166–176.

[36] Barrett, T., et al., “NCBI GEO: Mining Millions of Expression Profiles—Database and Tools,” Nucleic Acids Res, Vol. 33, Database Issue, 2005, pp. D562–D566.

[37] Barrett, T., et al., “NCBI GEO: Mining Tens of Millions of Expression Profiles—Database and Tools Update,” Nucleic Acids Res, Vol. 35, Database Issue, 2007, pp. D760–D765.

[38] Barrett, T., and R. Edgar, “Mining Microarray Data at NCBI’s Gene Expression Omnibus (GEO),” Methods Mol Biol, Vol. 338, 2006, pp. 175–190.

[39] Park, T., et al., “Combining Multiple Microarrays in the Presence of Controlling Variables,” Bioinformatics, Vol. 22, No. 14, 2006, pp. 1682–1689.

[40] Ghosh, D., et al., “A Link Between SIN1 (MAPKAP1) and Poly(rC) Binding Protein 2 (PCBP2) in Counteracting Environmental Stress,” Proc Natl Acad Sci USA, Vol. 105, No. 33, 2008, pp. 11673–11678.

[41] Stuart, J. M., et al., “A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules,” Science, Vol. 302, No. 5643, 2003, pp. 249–255.

[42] Barabasi, A. L., and Z. N. Oltvai, “Network Biology: Understanding the Cell’s Functional Organization,” Nat Rev Genet, Vol. 5, No. 2, 2004, pp. 101–113.

[43] de la Fuente, A., P. Brazhnik, and P. Mendes, “Linking the Genes: Inferring Quantitative Gene Networks from Microarray Data,” Trends Genet, Vol. 18, No. 9, 2002, pp. 395–398.

[44] Gachon, C. M., et al., “Transcriptional Co-Regulation of Secondary Metabolism Enzymes in Arabidopsis: Functional and Evolutionary Implications,” Plant Mol Biol, Vol. 58, No. 2, 2005, pp. 229–245.

[45] Ideker, T., et al., “Discovering Regulatory and Signalling Circuits in Molecular Interaction Networks,” Bioinformatics, Vol. 18, Supplement 1, 2002, pp. S233–S240.

[46] Toh, H., and K. Horimoto, “Inference of a Genetic Network by a Combined Approach of Cluster Analysis and Graphical Gaussian Modeling,” Bioinformatics, Vol. 18, No. 2, 2002, pp. 287–297.

[47] Opgen-Rhein, R., and K. Strimmer, “From Correlation to Causation Networks: A Simple Approximate Learning Algorithm and Its Application to High-Dimensional Plant Gene Expression Data,” BMC Syst Biol, Vol. 1, 2007, p. 37.

[48] Lee, T. I., et al., “Transcriptional Regulatory Networks in Saccharomyces Cerevisiae,” Science, Vol. 298, No. 5594, 2002, pp. 799–804.

[49] Haake, V., et al., “Transcription Factor CBF4 Is a Regulator of Drought Adaptation in Arabidopsis,” Plant Physiol, Vol. 130, No. 2, 2002, pp. 639–648.

[50] Seki, M., et al., “Monitoring the Expression Pattern of 1300 Arabidopsis Genes Under Drought and Cold Stresses by Using a Full-Length cDNA Microarray,” Plant Cell, Vol. 13, No. 1, 2001, pp. 61–72.

[51] Yugi, K., et al., “A Microarray Data-Based Semi-Kinetic Method for Predicting Quantitative Dynamics of Genetic Networks,” BMC Bioinformatics, Vol. 6, 2005, p. 299.

[52] Schmid, M., et al., “A Gene Expression Map of Arabidopsis Thaliana Development,” Nat Genet, Vol. 37, No. 5, 2005, pp. 501–506.

[53] Palaniswamy, S. K., et al., “AGRIS and AtRegNet. A Platform to Link Cis-Regulatory Elements and Transcription Factors into Regulatory Networks,” Plant Physiol, Vol. 140, No. 3, 2006, pp. 818–829.

[54] Davuluri, R. V., et al., “AGRIS: Arabidopsis Gene Regulatory Information Server, an Information Resource of Arabidopsis Cis-Regulatory Elements and Transcription Factors,” BMC Bioinformatics, Vol. 4, 2003, p. 25.

[55] Pal, D., “On Gene Ontology and Function Annotation,” Bioinformation, Vol. 1, No. 3, 2006, pp. 97–98.


Source: Data Mining in Biomedicine Using Ontologies (pages 124-130)