HAL Id: hal-01242893
https://hal.archives-ouvertes.fr/hal-01242893
Submitted on 14 Dec 2015
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
Topological and semantic Web based method for analyzing TGF-β signaling pathways
Jean Coquet, Geoffroy Andrieux, Jacques Nicolas, Olivier Dameron, Nathalie Théret
To cite this version:
Jean Coquet, Geoffroy Andrieux, Jacques Nicolas, Olivier Dameron, Nathalie Théret. Topological and semantic Web based method for analyzing TGF-β signaling pathways. JOBIM 2015, Dec 2015, Clermont-Ferrand, France. JOBIM : Journées Ouvertes en Biologie, Informatique & Mathématiques, 2015. �hal-01242893�
RESULTS
ANALYSIS OF TGF-β SIGNALIZATION PATHWAYS THANKS
TO TOPOLOGICAL AND SEMANTIC WEB METHOD
CONTEXT
How to explore the combinatorial complexity of cell signaling?
1 Université de Rennes 1 - IRISA/INRIA, UMR6074, 263 avenue du Général Leclerc, 35042 Rennes, Cedex, France 2 Université de Rennes 1 - IRSET EA 4427 SeRAIC, IFR140, 2 avenue Pr Léon Bernard, Rennes, Cedex, France
3 German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, 69120 Heidelberg, Germany
Jean COQUET
1,2, Geoffroy ANDRIEUX
3, Jacques NICOLAS
1, Olivier DAMERON
1and Nathalie THERET
2I
RAW DATA
ˆ TGF-β signalizations influence some genes
ˆ Other stimuli in addition to TGF-β
TGF-β Other stimuli g1 g2 g3 g4 g5 p1 p2 p5 p7 p11 p16 p15 p17 p12 p8 p6 p3 p 4 p9 p10 p14 p13 p18
II
GENES INFLUENCED BY PROTEIN SETS
ˆ Signaling pathway = proteins set
ˆ Proteins set influence at least one gene
s1 = {p1,p7,p11,p15} s2 = {p1,p15} s3 = {p2,p5,p15} s4 = {p2,p5,p16} s5 = {p3,p6,p8,p12,p17} s6 = {p3,p6,p8,p9,p13} s7 = {p3,p6,p8,p9,p10,p18} s8 = {p4,p14} g1 g2 g3 g4 g5
III
(1) {g1,g2} x {s1,s2,s3} (2) {g2} x {s1,s2,s3,s4} (3) {g3,g4} x {s5} (4) {g4} x {s5,s6,s7} (5) {g4,g5} x {s7} (6) {g5} x {s7,s8} Formal ConceptsCLUSTERS OF SIMILAR GENES AND PROTEIN SETS
ˆ Identify clusters of similar genes and
protein sets with formal concept analysis
ˆ Binary matrix
ˆ Formal concepts = maximal bi-cliques
g1 g2 g3 g4 g5 s1 s2 s3 s4 s5 s6 s7 s8 1 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 1 Protein sets Genes
IV
ASSESS CLUSTERS' BIOLOGICAL RELEVANCE WITH SEMANTIC
HOMOGENEITY
ˆ Formal concept homogeneity = zScore of
similarity gene group
ˆ Semantic similarity based on Wang measure
[Wang et al, 2007] 0.8 0.8 0.8 0.8 0.8 0.6 0.43 0.24 0.21 Gene Ontology GO terms of g1 GO terms of g2
• zScore > 1.96 : strong similarity • zScore < -1.96 : low similarity
Semantic Similarity X
zScore(fci) = sim(fci) - m σ
zScore for each genes group:
BIOLOGICAL QUESTIONS:
Which pathways are cancer-specific?
Which mechanism is involved in the healthy → pathological transition? ˆ TGF-β = Transforming Growth Factor β
ˆ necessary in healthy situation
ˆ deleterious role in fibrosis and cancer
ˆ Pleiotropic effects of TGF-β are linked to the complex nature of its activation and signaling networks
ˆ Model of TGF-β signal propagation based on guarded transitions [Andrieux et al, 2014] represented in CADBIOM.
ˆ Pathway Interaction Database (PID) transform into a single unified model of signal transduction
ˆ Some formal concepts are highly connected
ˆ Gene clusters can have:
ˆ High homogeneity = genes similar influenced by same pathways
ˆ Low homogeneity = genes dissimilar influenced by same pathways
ˆ Chains of gene clusters included into one-another indicates that some influence sets are specific to some gene subsets
ˆ Validate the method on small synthetic models
ˆ Improve the method to find better gene clusters
ˆ Integrate the overlap between proteins sets
ˆ Try different methods of clustering
ˆ Compute the semantic homogeneity of protein sets
ˆ Take into account the structure underlying gene sets
ˆ Compute the stability of formal concepts
CONCLUSION
New method to analyze pathways: 1) Topological analysis
2) Clustering
3) Biological relevance
Model of TGF-β signal propagation
Gene clusters influenced by same pathways
with a biological relevance
References:
[Andrieux et al, 2014] Andrieux, G., Le Borgne, M., & Théret, N. (2014). An
integrative modeling framework reveals plasticity of TGF-beta signaling. BMC systems biology, 8(1), 30.
[Wang et al, 2007] Wang, J. Z., Du, Z., Payattakool, R., Philip, S. Y., & Chen, C. F. (2007). A new method to measure the semantic similarity of GO terms.
Bioinformatics, 23(10), 1274-1281.
SIGNALIZATION MODEL
ANALYSIS OF SIGNALING PATHWAYS : STEPS METHOD
IN PROGRESS
DISCUSSION
DATA ANALYSIS PROBLEM:
16,000 chains of reactions linking TGF-β to at least one of 159 target genes.
Concept lattice. Nodes are formal concepts, two nodes are linked if one is included into the other.
1 13 1 1 5 1 3 3 4 1 3 2 17 1 2 1 12 11 10 9 9 6 8 5 15 14 3 12 11 5 4 2 2 1 2 1 2 1 2 1 2 1 7 4 6 3 4 3 22 6 1 5 4 13 1 12 1 2 11 1 10 1 7 6 6 5 3 18 2 2 2 0 Homogeneity zScore -1.8 2.2 Number of genes 22 1 22