(1)

Pattern Matching in Protein-Protein Interaction Graphs

Gaëlle Brevier (Université de Grenoble, France), Romeo Rizzi (Università di Udine, Italy), Stéphane Vialette (Université Paris-Est, France)

Lisbon, September 19, 2008

(2)

Outline

1 Introduction

2 Exact colorful instances

3 Hardness results

4 Approximation algorithms
   Bounded degree graphs
   A randomized algorithm
   Linear forests

5 Future works

(3)

Introduction

Protein interactions identified on a genome-wide scale are commonly visualized as protein interaction graphs, where proteins are vertices and interactions are edges.

(4)

Gene or Protein Interactions Databases

BioGRID - A Database of Genetic and Physical Interactions
DIP - Database of Interacting Proteins
MINT - A Molecular Interactions Database
IntAct - EMBL-EBI Protein Interaction
MIPS - Comprehensive Yeast Protein-Protein interactions
Yeast Protein Interactions - Yeast two-hybrid results from Fields' group
PathCalling - A yeast protein interaction database by Curagen
SPiD - Bacillus subtilis Protein Interaction Database
AllFuse - Functional Associations of Proteins in Complete Genomes
BRITE - Biomolecular Relations in Information Transmission and Expression
ProMesh - A Protein-Protein Interaction Database
The PIM Database - by Hybrigenics
Mouse Protein-Protein interactions
Human herpesvirus 1 Protein-Protein interactions
Human Protein Reference Database
BOND - The Biomolecular Object Network Databank (formerly BIND)
MDSP - Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry
Protcom - Database of protein-protein complexes enriched with the domain-domain structures
Proteins that interact with GroEL and factors that affect their release
YPD™ - Yeast Proteome Database by Incyte
...

(5)

Introduction

Comparative analysis of protein-protein interaction graphs aims at finding complexes that are common to different species.

Mounting evidence suggests that proteins that function together in a pathway or a structural complex are likely to evolve in a correlated fashion.

(6)

Introduction

Pattern matching in protein-protein interaction graphs: finding a protein complex in another protein network.

Graph matching

Focus on mappings that preserve adjacencies (to deal with interaction datasets that are missing many true protein interactions).

Injective list homomorphisms and optimization.

State-of-the-art approaches to identifying orthologs (genes in different species that originate from a single gene in the last common ancestor of these species).

Putative orthologs are represented by colors.

(7)

Introduction: Searching for an exact occurrence

Pattern graph (G, λG), mult(G, λG) = 2

Target graph (H, λH), mult(H, λH) = 5

(Here mult(G, λG) denotes the largest number of vertices of G that share the same color.)

(8)

Introduction: Searching for an exact occurrence

Pattern graph (G, λG), mult(G, λG) = 2

Target graph (H, λH), mult(H, λH) = 5, $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$

(13)

Introduction: Searching for the best occurrence

Pattern graph (G, λG), mult(G, λG) = 2

Target graph (H, λH), mult(H, λH) = 4

(14)

Introduction: Searching for the best occurrence

Pattern graph (G, λG), mult(G, λG) = 2

Target graph (H, λH), mult(H, λH) = 4, $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$

(19)

Problem

Max–(ρ, σ)–Matching–Colors

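Input: Two graphs G and H together with coloring mappings $\lambda_G : V(G) \to C$ with $\mathrm{mult}(G, \lambda_G) = \rho$, and $\lambda_H : V(H) \to C$ with $\mathrm{mult}(H, \lambda_H) = \sigma$.

Solution: An injective mapping $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$, i.e., an injective mapping with $\lambda_H(\theta(u)) = \lambda_G(u)$ for every $u \in V(G)$.

Measure: The number of edges of G matched by θ, where an edge $\{u, v\} \in E(G)$ is matched if $\{\theta(u), \theta(v)\} \in E(H)$.

EXACT–(ρ, σ)–MATCHING–COLORS is the extremal problem of finding an injective mapping $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$ that matches all the edges of G.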
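The objective can be made concrete with a minimal Python sketch (graphs as edge lists, colorings as dictionaries; the function names and the toy instance are illustrative assumptions, not from the talk). It checks that a mapping is injective and color-preserving, then counts the edges of G it matches.

```python
def is_valid_mapping(theta, color_g, color_h):
    """theta must be defined on all of V(G), injective and color-preserving."""
    if set(theta) != set(color_g):
        return False
    if len(set(theta.values())) != len(theta):  # injectivity
        return False
    return all(color_h[theta[u]] == color_g[u] for u in theta)


def matched_edges(theta, edges_g, edges_h):
    """Edges {u, v} of G whose image {theta(u), theta(v)} is an edge of H."""
    h = {frozenset(e) for e in edges_h}
    return [(u, v) for u, v in edges_g if frozenset((theta[u], theta[v])) in h]


# Hypothetical toy instance: rho = 2 (two red vertices in G), sigma = 3.
color_g = {"a": "red", "b": "blue", "c": "red"}
edges_g = [("a", "b"), ("b", "c")]
color_h = {"x1": "red", "x2": "red", "x3": "red", "y": "blue"}
edges_h = [("x1", "y"), ("y", "x2")]

theta = {"a": "x1", "b": "y", "c": "x2"}
print(is_valid_mapping(theta, color_g, color_h))    # True
print(len(matched_edges(theta, edges_g, edges_h)))  # 2: every edge of G is matched
```

Mapping c to x3 instead would still be valid but would match only one edge.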

(20)

Introduction

Trim instance

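An instance of the MAX–(ρ, σ)–MATCHING–COLORS or the EXACT–(ρ, σ)–MATCHING–COLORS problem is said to be trim if the following conditions hold, where $C_G(c_i)$ denotes the set of vertices of G colored $c_i$:

1 for each color $c_i \in C$, $\#C_G(c_i) \le \#C_H(c_i)$, and

2 for each edge $\{u_i, u_j\} \in E(G)$, there exists an edge $\{v_i, v_j\} \in E(H)$ such that $\lambda_G(u_i) = \lambda_H(v_i)$ and $\lambda_G(u_j) = \lambda_H(v_j)$.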
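A small sketch of the trim test, under the same illustrative conventions (hypothetical function name and toy data). Because edges are unordered, condition 2 reduces to comparing unordered color pairs.

```python
from collections import Counter


def is_trim(color_g, edges_g, color_h, edges_h):
    """Check the two trim conditions for colored graphs given as dicts and edge lists."""
    # Condition 1: every color occurs at least as often in H as in G.
    count_g, count_h = Counter(color_g.values()), Counter(color_h.values())
    if any(count_g[c] > count_h[c] for c in count_g):
        return False
    # Condition 2: every edge of G has an edge of H with the same unordered color pair.
    pairs_h = {frozenset((color_h[x], color_h[y])) for x, y in edges_h}
    return all(frozenset((color_g[u], color_g[v])) in pairs_h for u, v in edges_g)


color_g = {"a": "red", "b": "blue", "c": "red"}
edges_g = [("a", "b"), ("b", "c")]
color_h = {"x1": "red", "x2": "red", "y": "blue"}
edges_h = [("x1", "y")]
print(is_trim(color_g, edges_g, color_h, edges_h))                 # True
print(is_trim(color_g, edges_g + [("a", "c")], color_h, edges_h))  # False: no red-red edge in H
```

Note that trimness is only a preprocessing condition: the first instance is trim, yet H has a single edge and cannot match both edges of G.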

(21)

Related works in the context

List injective homomorphisms for protein graphs

[Fagnot, Lelandais and V., 2007; Fertin, Rizzi and V., 2005].

Reaction motifs in metabolic networks

[Lacroix, Fernandes and Sagot, 2006; Hermelin, Fellows, Fertin and V., 2007].

QPath

[Shlomi, Segal, Ruppin and Sharan, 2006].

Path Matching and Graph Matching in Biological Networks [Yang and Sze, 2007].

(23)

Exact colorful instances

Theorem (Fagnot, Lelandais and V., 2007)

Both the EXACT–(1, σ)–MATCHING–COLORS problem for Δ(G) ≤ 2 and the EXACT–(ρ, 2)–MATCHING–COLORS problem are solvable in polynomial time for any constant ρ and σ.

Theorem (Fertin, Rizzi and V., 2005)

The EXACT–(1, 3)–MATCHING–COLORS problem for Δ(G) = 3 and Δ(H) = 4 is NP-complete.

We focus here on the EXACT–(1, σ)–MATCHING–COLORS problem.

(24)

Exact colorful instances

Algorithm 1: Rand-Exact-Matching-Colors
begin
  repeat, terminating when an occurrence of G in H w.r.t. λG and λH is found:
    Let θ: V(G) → V(H) be a random injective mapping w.r.t. λG and λH.
    repeat up to 3nG times, terminating when an occurrence of G in H w.r.t. λG and λH is found:
      (1) Choose at random an edge e ∈ E(G) that is not matched by θ.
      (2) Choose at random one vertex u ∈ e.
      (3) Change at random the value of θ(u) w.r.t. λG and λH.
end
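The pseudocode can be sketched in Python as follows. This is a non-authoritative reading that assumes ρ = 1 (so every color-preserving mapping is automatically injective), interprets "change at random" as re-drawing θ(u) uniformly among the other vertices of H with the same color, and caps the otherwise unbounded outer loop with a hypothetical `max_restarts` parameter.

```python
import random
from collections import defaultdict


def rand_exact_matching_colors(color_g, edges_g, color_h, edges_h,
                               max_restarts=1000, rng=random):
    """Random-walk search for an occurrence of a colorful pattern G in H.

    Assumes rho = 1: every color occurs at most once in G, so any
    color-preserving mapping is automatically injective.
    """
    by_color = defaultdict(list)
    for x, c in color_h.items():
        by_color[c].append(x)
    if any(not by_color[c] for c in color_g.values()):
        return None  # some color of G does not occur in H
    h = {frozenset(e) for e in edges_h}

    def unmatched(theta):
        return [(u, v) for u, v in edges_g
                if frozenset((theta[u], theta[v])) not in h]

    for _ in range(max_restarts):  # Algorithm 1 restarts until success
        theta = {u: rng.choice(by_color[c]) for u, c in color_g.items()}
        for _ in range(3 * len(color_g)):
            bad = unmatched(theta)
            if not bad:
                return theta
            u = rng.choice(rng.choice(bad))  # (1) random unmatched edge, (2) random endpoint
            others = [x for x in by_color[color_g[u]] if x != theta[u]]
            if others:
                theta[u] = rng.choice(others)  # (3) re-draw theta(u) within its color class
        if not unmatched(theta):
            return theta
    return None


# Hypothetical instance: the triangle a-b-c occurs in H only as r1-g1-b1.
color_g = {"a": "r", "b": "g", "c": "b"}
edges_g = [("a", "b"), ("b", "c"), ("c", "a")]
color_h = {"r1": "r", "r2": "r", "g1": "g", "g2": "g", "b1": "b"}
edges_h = [("r1", "g1"), ("g1", "b1"), ("b1", "r1"), ("r2", "g2")]
print(rand_exact_matching_colors(color_g, edges_g, color_h, edges_h,
                                 rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps runs reproducible; returning `None` after `max_restarts` is a practical cut-off, not part of the slide's algorithm.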

(25)

Exact colorful instances: Random walk

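Particle moving along the integer line. Fix an optimal solution θopt.

θi and θopt agree on exactly j vertices.

e = {u, v} ∈ E(G) is a random edge that is not matched by θi, and θi and θopt disagree on exactly one of u and v.

Transitions of the particle on the line $0, \ldots, n_G$:

$$\Pr[j \to j-1] = \frac{\sigma-1}{2\sigma-2}, \qquad \Pr[j \to j] = \frac{\sigma-2}{2\sigma-2}, \qquad \Pr[j \to j+1] = \frac{1}{2\sigma-2}.$$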

(26)

Exact colorful instances: Random walk

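Particle moving along the integer line. Fix an optimal solution θopt.

θi and θopt agree on exactly j vertices.

e = {u, v} ∈ E(G) is a random edge that is not matched by θi, and θi and θopt disagree on both u and v.

Transitions of the particle on the line $0, \ldots, n_G$:

$$\Pr[j \to j-1] = 0, \qquad \Pr[j \to j] = \frac{2\sigma-4}{2\sigma-2}, \qquad \Pr[j \to j+1] = \frac{2}{2\sigma-2}.$$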

(27)

Exact colorful instances: Random walk

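Particle moving along the integer line. Pessimistic stochastic process $(Y_1, Y_2, \ldots)$:

$$\Pr[j \to j-1] = \frac{2\sigma-3}{2\sigma-2}, \qquad \Pr[j \to j] = 0, \qquad \Pr[j \to j+1] = \frac{1}{2\sigma-2}.$$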
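These transition probabilities can be checked by exact enumeration of one step, assuming every color class of H has exactly σ vertices and θ(u) is re-drawn among the σ − 1 other vertices of its class. The case where θi agrees with θopt on both endpoints cannot occur, since θopt matches every edge. The sketch below (function names are hypothetical) also confirms that merging the "stay" moves into "down" moves yields the pessimistic process.

```python
from fractions import Fraction


def step_distribution(sigma, u_disagrees, v_disagrees):
    """Exact law of the change in the number of agreements with theta_opt
    after one step on an unmatched edge {u, v}.

    Assumes each color class of H has exactly sigma vertices and that the
    chosen endpoint is re-drawn uniformly among the sigma - 1 other vertices.
    """
    dist = {-1: Fraction(0), 0: Fraction(0), +1: Fraction(0)}
    p = Fraction(1, 2) * Fraction(1, sigma - 1)  # endpoint choice x new value
    for disagrees in (u_disagrees, v_disagrees):
        if disagrees:
            dist[+1] += p                # exactly one alternative is theta_opt's value
            dist[0] += p * (sigma - 2)   # the others keep the vertex wrong
        else:
            dist[-1] += p * (sigma - 1)  # a correct vertex always becomes wrong
    return dist


def pessimistic(sigma):
    """Pessimistic process: every 'stay' is turned into a move down."""
    return {-1: Fraction(2 * sigma - 3, 2 * sigma - 2), 0: Fraction(0),
            +1: Fraction(1, 2 * sigma - 2)}


for sigma in (3, 4, 5):
    one = step_distribution(sigma, True, False)
    both = step_distribution(sigma, True, True)
    worst = pessimistic(sigma)
    # The real walk moves up at least as often as the pessimistic one.
    print(sigma, one[+1] >= worst[+1], both[+1] >= worst[+1])
```

Using `Fraction` keeps the comparison exact, so the output can be matched term by term against the formulas on the slides.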

(28)

Exact colorful instances: Random walk

Useful bounds

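Let $r_j$ be the probability of exactly k "moves down" and j + k "moves up" in a sequence of 2k + j moves:

$$r_j = \binom{2k+j}{k} \left(\frac{2\sigma-3}{2\sigma-2}\right)^{k} \left(\frac{1}{2\sigma-2}\right)^{j+k}.$$

Let $q_j$ be the probability that the algorithm finds an injective homomorphism within $j + 2k \le 3n_G$ steps, starting from a random injective mapping $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$ that disagrees with θopt on j vertices. Taking k = j:

$$q_j \ge \sqrt{\frac{3}{8\pi j}} \left(\frac{27(2\sigma-3)}{4(2\sigma-2)^3}\right)^{j}.$$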
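A numeric sanity check of these bounds in Python. It assumes the pessimistic probabilities 1/(2σ − 2) for a move up and (2σ − 3)/(2σ − 2) for a move down, and takes k = j so that the walk uses 3j steps; the comparison then reduces to the Stirling-type estimate $\binom{3j}{j} \ge \sqrt{3/(8\pi j)}\,(27/4)^j$. Function names are hypothetical.

```python
import math


def r(j, k, sigma):
    """Probability of exactly k down-moves and j + k up-moves among j + 2k
    steps of the pessimistic walk (up: 1/(2s-2), down: (2s-3)/(2s-2))."""
    p_up = 1 / (2 * sigma - 2)
    p_down = (2 * sigma - 3) / (2 * sigma - 2)
    return math.comb(j + 2 * k, k) * p_down ** k * p_up ** (j + k)


def q_lower_bound(j, sigma):
    """Right-hand side of the bound on q_j."""
    x = 27 * (2 * sigma - 3) / (4 * (2 * sigma - 2) ** 3)
    return math.sqrt(3 / (8 * math.pi * j)) * x ** j


for sigma in (3, 4, 5):
    # With k = j the walk needs 3j steps; compare r_j with the closed-form bound.
    print(sigma, all(r(j, j, sigma) >= q_lower_bound(j, sigma) for j in range(1, 40)))
```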

(29)

Exact colorful instances: Random walk

Theorem

Algorithm Rand-Exact-Matching-Colors returns an injective homomorphism $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$ (if such a mapping exists) in $\tilde{O}(f(\sigma)^{n_G})$ expected time, where

$$f(\sigma) = \frac{4\sigma(2\sigma-2)^3}{4(2\sigma-2)^3 + 27(2\sigma-3)}.$$

Notice

f(σ) < σ for σ > 2.

f(3) < 2.279, f(4) < 3.460 and f(5) < 4.578.

Here ρ = 1.
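The constants in the Notice can be reproduced directly from the formula. Algebraically, f(σ) = σ/(1 + x) with x = 27(2σ − 3)/(4(2σ − 2)^3), the base appearing in the bound on q_j, which makes f(σ) < σ immediate; for comparison, exhaustive search over color-preserving mappings with ρ = 1 examines up to σ^nG candidates. A short Python check (function name hypothetical):

```python
def f(sigma):
    """Base of the expected running time of Rand-Exact-Matching-Colors."""
    return (4 * sigma * (2 * sigma - 2) ** 3
            / (4 * (2 * sigma - 2) ** 3 + 27 * (2 * sigma - 3)))


for sigma in (3, 4, 5):
    print(sigma, round(f(sigma), 4), sigma)  # f(sigma) versus the brute-force base sigma
```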

(31)

Hardness results

Theorem

The MAX–(3, 3)–MATCHING–COLORS problem is APX-hard even if both G and H are linear forests, and the MAX–(2, 2)–MATCHING–COLORS problem is APX-hard even if both G and H are trees.

Notice

It remains open, however, whether the MAX–(ρ, σ)–MATCHING–COLORS problem for linear forests G and H is polynomial-time solvable in case ρ < 3.

(34)

Bounded degree graphs: Intermediate problem

Max–Matching–With–Color–Constraints

Input: A graph G together with a coloring mapping $\lambda_G : V(G) \to \{c_1, c_2, \ldots, c_m\}$, and a symmetric matrix $A = [a_{i,j}]$ of order m whose entries are natural integers.

Solution: A matching $M \subseteq E(G)$ such that, for $1 \le i \le j \le m$, the number of edges in M having one end-vertex colored $c_i$ and the other colored $c_j$ is at most $a_{i,j}$.

Measure: The size of the matching, i.e., #M.
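To make the constraints concrete, here is a brute-force Python sketch (exponential, only for tiny inputs; colors are encoded as indices 0, …, m − 1 and the function names are hypothetical). It is not the (3/2 + ε)-approximation discussed next.

```python
from collections import Counter
from itertools import combinations


def is_feasible(matching, color, a):
    """matching: edges (u, v); color: vertex -> index in 0..m-1; a: symmetric m x m matrix."""
    ends = [x for e in matching for x in e]
    if len(ends) != len(set(ends)):  # two edges share an end-vertex
        return False
    load = Counter(tuple(sorted((color[u], color[v]))) for u, v in matching)
    return all(count <= a[i][j] for (i, j), count in load.items())


def max_matching_with_color_constraints(edges, color, a):
    """Exhaustive search, exponential in |E|: only meant for tiny examples."""
    for size in range(len(edges), 0, -1):
        for subset in combinations(edges, size):
            if is_feasible(subset, color, a):
                return list(subset)
    return []


# Path 0-1-2-3 whose vertices alternate between c_1 (index 0) and c_2 (index 1).
edges = [(0, 1), (1, 2), (2, 3)]
color = {0: 0, 1: 1, 2: 0, 3: 1}
print(max_matching_with_color_constraints(edges, color, [[0, 1], [1, 0]]))  # one edge
print(max_matching_with_color_constraints(edges, color, [[0, 2], [2, 0]]))  # two edges
```

With $a_{1,2} = 1$ the color constraint, not the matching condition, is what limits the solution to a single edge.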

(35)

Bounded degree graphs: Intermediate problem

Theorem

The MAX–MATCHING–WITH–COLOR–CONSTRAINTS problem is NP-complete but is approximable within ratio 3/2 + ε, for any ε > 0.

Proof.

Approximation-preserving reduction to MAXIMUM B-SET PACKING. MAXIMUM SET PACKING is defined as follows: given a collection S of finite subsets of a ground set X, find a maximum cardinality collection S′ ⊆ S of pairwise disjoint sets.

MAXIMUM B-SET PACKING is the variation of MAXIMUM SET PACKING in which the cardinality of every set in S is bounded from above by a constant B ≥ 3.
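The slide does not spell out the reduction. One natural construction, given here as an assumption, creates for every edge {u, v} with colors c_i, c_j and every capacity slot t < a_{i,j} the 3-set {u, v, (i, j, t)}. Pairwise disjoint sets then correspond to capacity-respecting matchings of the same size, and since B = 3 the classical k/2 + ε local-search guarantee for k-set packing would give the 3/2 + ε ratio. A brute-force Python check on a toy instance (function names hypothetical):

```python
from itertools import combinations


def to_3_set_packing(edges, color, a):
    """Hypothetical reduction: one 3-set {u, v, slot} per edge {u, v} and per
    capacity slot of its color pair, so every set has size B = 3."""
    sets = []
    for u, v in edges:
        i, j = sorted((color[u], color[v]))
        for t in range(min(a[i][j], len(edges))):  # more slots than edges are useless
            sets.append(frozenset({("vertex", u), ("vertex", v), ("slot", i, j, t)}))
    return sets


def max_packing(sets):
    """Exhaustive maximum set packing, for tiny examples only."""
    for size in range(len(sets), 0, -1):
        for chosen in combinations(sets, size):
            if len(frozenset().union(*chosen)) == sum(len(s) for s in chosen):
                return list(chosen)  # pairwise disjoint
    return []


edges = [(0, 1), (1, 2), (2, 3)]
color = {0: 0, 1: 1, 2: 0, 3: 1}
for a in ([[0, 1], [1, 0]], [[0, 2], [2, 0]]):
    sets = to_3_set_packing(edges, color, a)
    print(len(sets), len(max_packing(sets)))  # packing size equals best matching size
```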

(36)

Bounded degree graphs

Chromatic index

An edge coloring of a graph G is proper if no two adjacent edges are assigned the same color.

The smallest number of colors needed in a proper edge coloring of a graph G is the chromatic index χ′(G).

Vizing's theorem states that χ′(G) ≤ Δ(G) + 1, and such an edge coloring can be found in polynomial time.

Petersen graph: χ′(G) = Δ(G) + 1 = 4.
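The Petersen example can be verified by exhaustive backtracking. This is exponential in general, unlike the polynomial-time procedure Vizing's theorem provides, and the names below are illustrative. The sketch builds the Petersen graph as an outer 5-cycle, five spokes and an inner pentagram.

```python
def petersen_edges():
    outer = [(i, (i + 1) % 5) for i in range(5)]           # 5-cycle on 0..4
    spokes = [(i, i + 5) for i in range(5)]                # i -- i+5
    inner = [(i + 5, (i + 2) % 5 + 5) for i in range(5)]   # pentagram on 5..9
    return outer + spokes + inner


def edge_colorable(edges, k):
    """Backtracking search for a proper edge coloring with k colors."""
    colors = {}  # edge index -> color

    def allowed(idx, c):
        u, v = edges[idx]
        return all(colors[j] != c for j in colors if {u, v} & set(edges[j]))

    def solve(idx):
        if idx == len(edges):
            return True
        for c in range(k):
            if allowed(idx, c):
                colors[idx] = c
                if solve(idx + 1):
                    return True
                del colors[idx]
        return False

    return solve(0)


def chromatic_index(edges):
    k = 0
    while not edge_colorable(edges, k):
        k += 1
    return k


edges = petersen_edges()
degree = max(sum(x in e for e in edges) for x in range(10))
print(degree, chromatic_index(edges))  # Delta = 3, chi' = 4 = Delta + 1
```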

(37)

Bounded degree graphs

Theorem

For any ρ and σ, the MAX–(ρ, σ)–MATCHING–COLORS problem is approximable within ratio (3/2)(Δmin + 1) + ε for any ε > 0, where Δmin = min{Δ(G), Δ(H)}.

Key elements

Chromatic index.

Vizing's theorem.

Iteratively using the (3/2 + ε)-approximation algorithm for instances of the MAX–MATCHING–WITH–COLOR–CONSTRAINTS problem.

(38)

Bounded degree graphs

Proof.

1. Δmin = Δ(H).

1 H admits a proper edge coloring with at most Δ(H) + 1 colors, say $\{c'_1, c'_2, \ldots, c'_{\Delta(H)+1}\}$.

2 For 1 ≤ i ≤ Δ(H) + 1,

  1 let $H_i$ be the graph obtained from H by deleting all edges but those colored $c'_i$; note that $H_i$ is a matching.

  2 Using the (3/2 + ε)-approximation algorithm for the MAX–MATCHING–WITH–COLOR–CONSTRAINTS problem, we obtain a (3/2 + ε)-approximation algorithm for the new instance of the MAX–(ρ, σ)–MATCHING–COLORS problem obtained by replacing H by $H_i$.

3 Returning the best of these Δ(H) + 1 mappings yields an approximation algorithm with performance ratio (3/2)(Δ(H) + 1) + ε: the edges of H used by an optimal solution are spread over Δ(H) + 1 color classes, so some $H_i$ alone carries at least a 1/(Δ(H) + 1) fraction of them.

(39)

Bounded degree graphs

Proof.

2. Δmin = Δ(G).

1 G admits a proper edge coloring with at most Δ(G) + 1 colors, say $\{c'_1, c'_2, \ldots, c'_{\Delta(G)+1}\}$.

2 For 1 ≤ i ≤ Δ(G) + 1,

  1 let $G_i$ be the graph obtained from G by deleting all edges but those colored $c'_i$; note that $G_i$ is a matching.

  2 Using the (3/2 + ε)-approximation algorithm for the MAX–MATCHING–WITH–COLOR–CONSTRAINTS problem, we obtain a (3/2 + ε)-approximation algorithm for the new instance of the MAX–(ρ, σ)–MATCHING–COLORS problem obtained by replacing G by $G_i$.

3 Returning the best of these Δ(G) + 1 mappings yields an approximation algorithm with performance ratio (3/2)(Δ(G) + 1) + ε: the edges of G matched by an optimal solution are spread over Δ(G) + 1 color classes, so some $G_i$ alone contains at least a 1/(Δ(G) + 1) fraction of them.

(40)

Bounded degree graphs

Proof.

3. Combining

Δmin = Δ(H): ((3/2)(Δ(H) + 1) + ε)-approximation algorithm.

Δmin = Δ(G): ((3/2)(Δ(G) + 1) + ε)-approximation algorithm.

yields a ((3/2)(Δmin + 1) + ε)-approximation algorithm for any ε > 0, where Δmin = min{Δ(G), Δ(H)}.
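The pigeonhole step behind keeping the best sub-instance can be illustrated in a few lines: the color classes of a proper edge coloring are matchings, and among Δ + 1 classes one contains at least a 1/(Δ + 1) fraction of any fixed edge set, for instance the edges used by an optimal solution. A Python sketch with a hypothetical toy coloring and hypothetical function names:

```python
from collections import defaultdict


def color_classes(edge_coloring):
    """Group the edges of a properly edge-colored graph by color."""
    classes = defaultdict(list)
    for edge, c in edge_coloring.items():
        classes[c].append(edge)
    return dict(classes)


def is_matching(edges):
    ends = [x for e in edges for x in e]
    return len(ends) == len(set(ends))


def heaviest_class(edge_coloring, target_edges):
    """Color class containing the most edges of target_edges (pigeonhole step)."""
    classes = color_classes(edge_coloring)
    target = set(target_edges)
    return max(classes, key=lambda c: len(target & set(classes[c])))


# A 4-cycle (maximum degree 2) properly colored with 2 colors, and a
# hypothetical set of edges used by some optimal solution.
coloring = {(0, 1): "c1", (1, 2): "c2", (2, 3): "c1", (3, 0): "c2"}
opt_used = [(0, 1), (2, 3), (3, 0)]
best = heaviest_class(coloring, opt_used)
print(best, all(is_matching(es) for es in color_classes(coloring).values()))
```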

(42)

A randomized algorithm

Definitions

Let G be a graph and $\lambda_G : V(G) \to C$ be a coloring mapping of G.

A legal (ℓ1, ℓ2)-labeling of G is an assignment of labels {ℓ1, ℓ2} to the vertices of G such that, for each color $c_i \in C$, either $\lfloor \#C_G(c_i)/2 \rfloor$ or $\lceil \#C_G(c_i)/2 \rceil$ vertices in $C_G(c_i)$ are labeled ℓ1.

The cut induced by a legal (ℓ1, ℓ2)-labeling is the set of edges that have one end-vertex labeled ℓ1 and the other labeled ℓ2.
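A random legal labeling can be drawn by shuffling each color class and labeling its first ⌊n/2⌋ or ⌈n/2⌉ vertices ℓ1 (written `"l1"` below). The Python sketch is illustrative; the fair coin flip between floor and ceiling for odd classes is an assumption, since the definition allows either.

```python
import random
from collections import defaultdict


def random_legal_labeling(color, rng=random):
    """Label floor(n_c / 2) or ceil(n_c / 2) random vertices of each color class with 'l1'."""
    classes = defaultdict(list)
    for v, c in color.items():
        classes[c].append(v)
    labels = {}
    for vertices in classes.values():
        vertices = vertices[:]
        rng.shuffle(vertices)
        k = len(vertices) // 2 + (len(vertices) % 2) * rng.randint(0, 1)  # floor or ceil
        for idx, v in enumerate(vertices):
            labels[v] = "l1" if idx < k else "l2"
    return labels


def induced_cut(edges, labels):
    """Edges with one end-vertex labeled l1 and the other labeled l2."""
    return [(u, v) for u, v in edges if labels[u] != labels[v]]


color = {"a": "red", "b": "red", "c": "red", "d": "blue", "e": "blue"}
edges = [("a", "d"), ("b", "e"), ("a", "b"), ("c", "e")]
labels = random_legal_labeling(color, random.Random(1))
print(labels, induced_cut(edges, labels))
```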

(43)

A randomized algorithm

[Figure: a graph with a legal (ℓ1, ℓ2)-labeling of its vertices, showing the ℓ1-subset, the ℓ2-subset and the (ℓ1, ℓ2)-cut edges.]

(47)

A randomized algorithm

Theorem

There exists a randomized algorithm for the MAX–(ρ, σ)–MATCHING–COLORS problem with expected performance ratio 4σ.

Key elements

Random (ℓ1, ℓ2)-labeling.

Random mapping $\theta : V(G) \xrightarrow[\lambda_H]{\lambda_G} V(H)$.

Maximum weighted matching in bipartite graphs.

(48)

A randomized algorithm

[Figure, built up over several slides: the pattern graph (G, λG), whose vertices are split by an ℓ1-labeling and an ℓ2-labeling, and the target graph (H, λH), with an optimal solution θopt and a random mapping θrand. For two vertices u and v, it highlights the situation $\theta_{opt}(u) = \theta_{rand}(u)$ and $\theta_{rand}(w) \neq \theta_{opt}(v)$ for every ℓ1-labeled $w \in V(G)$. A weighted matching M is then computed, and the solution is $\theta_{sol} = \theta_{rand} + M$.]
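One possible reading of the figure, sketched in Python under loud assumptions: θrand is kept only on the ℓ1-vertices; each ℓ2-vertex v is then assigned a vertex x of H of the same color, with weight equal to the number of ℓ1-neighbours w of v such that {θrand(w), x} ∈ E(H). Since that bipartite graph splits by color, an exhaustive search per color class stands in for a maximum weight bipartite matching algorithm. The function name is hypothetical, a trim instance is assumed, and the sketch does not attempt to reproduce the 4σ analysis.

```python
import random
from collections import defaultdict
from itertools import permutations


def randomized_matching_colors(color_g, edges_g, color_h, edges_h, rng=random):
    """Hypothetical sketch: random legal (l1, l2)-labeling, random images for the
    l1-vertices, then a best assignment of the l2-vertices (trim instance assumed)."""
    h = {frozenset(e) for e in edges_h}
    adj = defaultdict(set)
    for u, v in edges_g:
        adj[u].add(v)
        adj[v].add(u)
    g_cls, h_cls = defaultdict(list), defaultdict(list)
    for v, c in color_g.items():
        g_cls[c].append(v)
    for x, c in color_h.items():
        h_cls[c].append(x)

    labels, theta, free = {}, {}, {}
    for c, vs in g_cls.items():
        vs = vs[:]
        rng.shuffle(vs)
        k = len(vs) // 2 + (len(vs) % 2) * rng.randint(0, 1)  # legal labeling
        images = rng.sample(h_cls[c], k)                       # theta_rand on l1-vertices
        labels.update({v: "l1" if i < k else "l2" for i, v in enumerate(vs)})
        theta.update(zip(vs[:k], images))
        free[c] = [x for x in h_cls[c] if x not in images]

    def weight(v, x):
        # Cut edges {w, v}, w labeled l1, that become matched if v is mapped to x.
        return sum(1 for w in adj[v]
                   if labels[w] == "l1" and frozenset((theta[w], x)) in h)

    # Exhaustive search per color class replaces a max weight bipartite matching.
    for c, vs in g_cls.items():
        l2 = [v for v in vs if labels[v] == "l2"]
        best = max(permutations(free[c], len(l2)),
                   key=lambda xs: sum(weight(v, x) for v, x in zip(l2, xs)))
        theta.update(zip(l2, best))

    matched = sum(1 for u, v in edges_g if frozenset((theta[u], theta[v])) in h)
    return theta, matched


color_g = {"a": "r", "b": "g", "c": "r"}
edges_g = [("a", "b"), ("b", "c")]
color_h = {"x1": "r", "x2": "r", "y": "g"}
edges_h = [("x1", "y"), ("y", "x2")]
print(randomized_matching_colors(color_g, edges_g, color_h, edges_h, random.Random(3)))
```

For realistic class sizes, the per-class permutation search should be replaced by a proper maximum weight bipartite matching routine.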

(56)

Linear Forests

Theorem

The MAX–(3, 3)–MATCHING–COLORS problem is APX-hard even if both G and H are linear forests.

Theorem

For any ρ and σ, the MAX–(ρ, σ)–MATCHING–COLORS problem is approximable within ratio 4 in case both G and H are linear forests.

Key elements

Balanced 2-intervals.

Weighted independent set.

(57)

Linear Forests

Definitions

A 2-interval D = (I, J) is the union of two disjoint intervals I and J defined over a single line.

A 2-interval D = (I, J) is said to be balanced if |I| = |J|.

Two 2-intervals D1 = (I1, J1) and D2 = (I2, J2) are disjoint if they share no common point.
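The three definitions translate into a small data type. This Python sketch assumes intervals over integer positions (for instance, vertices of paths laid out on a line); class and method names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [start, end] of integer positions on the line."""
    start: int
    end: int

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("empty interval")

    def length(self):
        return self.end - self.start + 1

    def meets(self, other):
        return self.start <= other.end and other.start <= self.end


@dataclass(frozen=True)
class TwoInterval:
    """D = (I, J): two disjoint intervals, I strictly to the left of J."""
    I: Interval
    J: Interval

    def __post_init__(self):
        if self.I.end >= self.J.start:
            raise ValueError("I must lie strictly to the left of J")

    def is_balanced(self):
        return self.I.length() == self.J.length()

    def is_disjoint_from(self, other):
        return not any(p.meets(q) for p in (self.I, self.J) for q in (other.I, other.J))


D1 = TwoInterval(Interval(0, 2), Interval(5, 7))
D2 = TwoInterval(Interval(3, 4), Interval(8, 8))
D3 = TwoInterval(Interval(2, 3), Interval(9, 10))
print(D1.is_balanced(), D2.is_balanced())                # True False
print(D1.is_disjoint_from(D2), D1.is_disjoint_from(D3))  # True False
```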

(58)

Linear Forests

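[Figure: pattern graph (G, λG) consisting of the paths $P^G_1$ and $P^G_2$, and target graph (H, λH) consisting of the paths $P^H_1$, $P^H_2$ and $P^H_3$; the paths are then laid out on a single line in the order $P^G_1\, P^G_2\, P^H_1\, P^H_2\, P^H_3$.]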

(59)

Linear Forests

Theorem (Crochemore, Hermelin, Landau, Rawitz and V., 2006)

There exists a polynomial-time algorithm with performance ratio 4 for finding a maximum weight subset of disjoint 2-intervals in a set of weighted balanced 2-intervals.

Key elements

Local ratio technique.

r-effective weight function.

Theorem

For any ρ and σ, the MAX–(ρ, σ)–MATCHING–COLORS problem is approximable within ratio 4 in case both G and H are linear forests.

(61)

Future works

Improve the random walk algorithm for the EXACT–(ρ, σ)–MATCHING–COLORS problem.

What about ρ ≥ 2?

Improve the approximation ratio for bounded degree graphs.

Design a better (randomized?) approximation algorithm for the MAX–(ρ, σ)–MATCHING–COLORS problem.

Is the MAX–(ρ, σ)–MATCHING–COLORS problem approximable within ratio σ?
