
Graph Embedding Problem


Academic year: 2022


(1)

International Master of Research in Computer Science: Computer Aided Decision Support

Graph Embedding Problem

Jean-Yves Ramel – Romain Raveaux

Laboratoire Informatique de Tours - FRANCE

Presented by: Romain Raveaux

(2)


Content

1. Graph Embedding problem

   1. G → Rⁿ

   2. Kernel

   3. Kernel trick

(3)

Statistical vs Structural Pattern Recognition

   

Pattern Recognition             Structural                 Statistical

Data structure                  symbolic data structure    numeric feature vector

Representational strength       Yes                        No

Fixed dimensionality            No                         Yes

Sensitivity to noise            Yes                        No

Efficient computational tools   No                         Yes

(4)

Feature space

   

(5)

   

Structural PR

Expressive, convenient, powerful but computationally expensive representations

Statistical PR

Mathematically sound, mature, less expensive and computationally efficient models

Graph embedding

(6)

   

Explicit GEM

•  embeds each input graph into a numeric feature vector

•  is the more broadly useful kind of GEM (graph embedding method) for PR

•  the resulting vectors can be used in a standard dot product to define an implicit graph embedding function

Implicit GEM

•  computes the scalar product of two graphs in an implicitly existing vector space, by using graph kernels

•  does not permit all the operations that could be defined on vector spaces

(7)

Explicit GEM

Topological Descriptors

•  Principle

•  Map each graph to a feature vector

•  Use distances and metrics on vectors for learning on graphs

•  Advantages

•  Reuses known and efficient tools for feature vectors

•  Disadvantages

•  Efficiency comes at a price: the feature vector transformation leads to a loss of topological information (or includes subgraph isomorphism as one step)

   

(8)

Implicit GEM

Polynomial Alternatives

•  Graph kernels

•  Compare substructures of graphs that are computable in polynomial time.

•  Criteria for a good graph kernel

•  Expressive

•  Efficient to compute

•  Positive definite

•  Applicable to wide range of graphs

   

(9)

Implicit Graph Embedding

•  Graph kernel

•  What is a Kernel?

•  Map two objects x and x′ via mapping φ into feature space H.

•  Measure their similarity in H as <φ(x), φ(x′)>.

•  Kernel Trick: Compute inner product in H as kernel in input space

•  k(x, x′) = <φ(x), φ(x′)>.
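
To make the kernel trick concrete, the sketch below checks numerically that a kernel evaluated in input space equals an inner product in a feature space H. The homogeneous degree-2 polynomial kernel k(x, x′) = <x, x′>² and the names `phi` and `poly2_kernel` are illustrative assumptions, not taken from the slides.

```python
from itertools import product


def dot(u, v):
    """Standard inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def phi(x):
    """Explicit feature map: all ordered products x_i * x_j (dimension n^2)."""
    return [a * b for a, b in product(x, repeat=2)]


def poly2_kernel(x, y):
    """Kernel evaluated directly in input space: k(x, y) = <x, y>^2."""
    return dot(x, y) ** 2


x = [1.0, 2.0, 3.0]
y = [0.5, -1.0, 2.0]

explicit = dot(phi(x), phi(y))   # inner product computed in H (9 dimensions)
implicit = poly2_kernel(x, y)    # same value, without ever building phi
```

Both quantities agree, but `poly2_kernel` never materialises the n² features, which is the whole point when H is very large or infinite-dimensional.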

   

(10)

   

Graph Kernels

•  General case:

•  Directed

•  Labels on both vertex and edges

•  Loops and cycles are allowed (not in all algorithms)

•  Particular cases easily derived from the general one:

•  Non-directed

•  No label on edge, no label on vertex

(11)

   

Theoretical issues

•  Designing a kernel that takes the whole graph structure into account amounts to building a complete graph kernel, i.e. one that distinguishes two graphs if and only if they are not isomorphic

•  Designing a complete graph kernel is theoretically possible … but practically infeasible: computing it is at least as hard as deciding graph isomorphism

•  Approximations are therefore necessary:

•  Common local subtrees kernels

•  Common (label) walk kernels (most popular)

(12)

Very Simple Graph Kernel

Graph Kernel / Graph Embedding [Bunke09]

Graph Embedding: ϕ : G → Rⁿ, ϕ(g) = (x1, …, xn)'

Much information can be extracted: → nodes, cliques, paths, walks, …

Example
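
A minimal sketch of such a very simple graph kernel, assuming the embedding ϕ(g) counts nodes, edges and triangles (one possible choice among the quantities listed above, not necessarily the one used in [Bunke09]). The kernel is then the plain dot product of the two vectors. The function names and the adjacency-dictionary representation are hypothetical.

```python
from itertools import combinations


def embed(adj):
    """Map an undirected graph (dict: vertex -> set of neighbours) to the
    vector (number of nodes, number of edges, number of triangles)."""
    n_nodes = len(adj)
    n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    n_triangles = sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return (n_nodes, n_edges, n_triangles)


def simple_graph_kernel(g1, g2):
    """Dot product of the two explicit embeddings."""
    return sum(a * b for a, b in zip(embed(g1), embed(g2)))


triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
path3 = {1: {2}, 2: {1, 3}, 3: {2}}

k = simple_graph_kernel(triangle, path3)
```

Because the kernel is a dot product of explicit vectors, it is positive semi-definite by construction; the price is that many non-isomorphic graphs share the same vector.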

(13)

Implicit Graph Embedding

   

(14)

   

Graph kernels based on Common Walks

•  Walk = (possibly infinite) sequence of labels obtained by following edges on the graph

•  Path = walk with no vertex visited twice

•  Important concept: direct product of two graphs G1×G2

•  V(G1×G2) = {(v1, v2) : v1 and v2 have the same label}

•  E(G1×G2) = {(e1, e2) : e1 and e2 have the same label, p(e1) and p(e2) have the same label, n(e1) and n(e2) have the same label}

•  Here p(e) denotes the start vertex of edge e and n(e) its end vertex

•  Requiring identical labels makes numeric attributes difficult to handle
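
The direct product can be sketched as follows, assuming a directed graph is stored as a pair (vertex labels, edge labels). Every walk in the product graph corresponds to a pair of identically labelled walks in G1 and G2, so counting walks in the product counts common walks. The data layout and function names are assumptions made for illustration.

```python
def direct_product(g1, g2):
    """g = (vertex_labels, edges) with vertex_labels: {vertex: label} and
    edges: {(source, target): label} for a directed labelled graph."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    # Product vertices: pairs of vertices carrying the same label.
    vertices = [(v1, v2) for v1 in labels1 for v2 in labels2
                if labels1[v1] == labels2[v2]]
    vset = set(vertices)
    # Product edges: pairs of edges with equal labels whose start vertices
    # match and whose end vertices match.
    edges = [((p1, p2), (n1, n2))
             for (p1, n1), l1 in edges1.items()
             for (p2, n2), l2 in edges2.items()
             if l1 == l2 and (p1, p2) in vset and (n1, n2) in vset]
    return vertices, edges


def count_common_walks(g1, g2, length):
    """Number of walks with `length` edges in the product graph."""
    vertices, edges = direct_product(g1, g2)
    walks_ending_at = {v: 1 for v in vertices}   # walks of length 0
    for _ in range(length):
        nxt = {v: 0 for v in vertices}
        for src, dst in edges:
            nxt[dst] += walks_ending_at[src]
        walks_ending_at = nxt
    return sum(walks_ending_at.values())


g1 = ({1: "A", 2: "B", 3: "C"}, {(1, 2): "x", (2, 3): "x"})
g2 = ({"a": "A", "b": "B", "c": "C", "d": "B"},
      {("a", "b"): "x", ("b", "c"): "x", ("a", "d"): "x"})
```

Here `count_common_walks(g1, g2, 2)` finds the single common walk A, B, C. The equality test on labels is exactly where numeric attributes become awkward.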

(15)

Random Walks:

Explanation : Direct Graph Product

   

(16)

Random Walk

   

(17)

Walks of length 2

   

(18)

Random Walks:

Explanation : Idea

   

(19)

Random Walks:

Explanation : a better idea

   

(20)

Random Walks:

Explanation : Diffusion Kernels

   

(21)

Random Walks:

Explanation : Graph Comparison

   

(22)

Efficient computation ?

   

(23)

Part 2

• Feature space

   

(24)

Explicit Graph Embedding

• Graph probing based methods

• Spectral based graph embedding

   

(25)

Graph probing

•  Feature extraction from a graph

•  [Papadopoulos et al., 1999]

•  [Lopresti 2003]
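
A minimal sketch of the probing idea, assuming the probes ask "how many vertices have degree d?" and that two graphs are compared by the L1 distance between their probe vectors. This is a simplified illustration in the spirit of graph probing, not the exact procedure of the cited papers; the function names are hypothetical.

```python
from collections import Counter


def degree_probe(adj):
    """Probe result: for every degree d, the number of vertices of degree d.
    `adj` is an undirected graph given as dict: vertex -> set of neighbours."""
    return Counter(len(nbrs) for nbrs in adj.values())


def probe_distance(g1, g2):
    """L1 distance between the two degree histograms."""
    p1, p2 = degree_probe(g1), degree_probe(g2)
    return sum(abs(p1[d] - p2[d]) for d in set(p1) | set(p2))


triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
path3 = {1: {2}, 2: {1, 3}, 3: {2}}

d = probe_distance(triangle, path3)
```

The probes are cheap to compute, but a distance of zero does not imply that the graphs are isomorphic.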

   

(26)

Spectral based graph embedding

•  [Harchaoui, 2007] [Luo et al., 2003] [Robles-Kelly and Hancock, 2007]

•  Often limited to graph databases where all graphs have the same number of nodes.

   

[Figure: adjacency matrix of an example graph]

Spectral graph theory employing the adjacency and Laplacian matrices

Eigenvalues and eigenvectors

PCA, ICA, MDS
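
As an illustration, the sketch below embeds a graph as the sorted eigenvalues of its Laplacian L = D − A, using NumPy. Zero-padding the spectrum to a fixed dimension is an assumption introduced here to cope with graphs of different sizes; the cited methods may proceed differently. The function name `spectral_embedding` is hypothetical.

```python
import numpy as np


def spectral_embedding(adjacency, dim):
    """Return the `dim` largest Laplacian eigenvalues in descending order,
    zero-padded when the graph has fewer than `dim` vertices."""
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # eigvalsh handles symmetric matrices and returns ascending eigenvalues.
    eigenvalues = np.linalg.eigvalsh(laplacian)[::-1]
    vector = np.zeros(dim)
    k = min(dim, len(eigenvalues))
    vector[:k] = eigenvalues[:k]
    return vector


triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

v_triangle = spectral_embedding(triangle, 4)
v_path = spectral_embedding(path3, 4)
```

The spectrum is invariant to vertex relabelling, which is why it is attractive as a feature vector, but non-isomorphic graphs can share the same spectrum.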

(27)

Spectral based graph embedding

•  [Shokoufandeh 2006], a "topological signature vector"

   

(28)

Trends and recent ideas

•  Muzzamil Luqman [Luqman09]

•  Sidère [Sidère09]

   

(29)

Graph-based pattern recognition

"Fuzzy" graph embedding [Luqman09]

ϕ(g) = (x1, …, xn)'

Fuzzy membership function (defining the contribution of a primitive as a function of its value)
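
One way to read the fuzzy membership idea is as a fuzzy histogram: each primitive (here, an edge with a numeric attribute) contributes to several bins in proportion to its membership degree instead of falling into exactly one bin. The triangular membership functions, the fuzzy sets and the attribute values below are all hypothetical choices for illustration, not those of [Luqman09].

```python
def triangular(value, left, peak, right):
    """Triangular fuzzy membership in [0, 1] with its maximum at `peak`."""
    if value <= left or value >= right:
        return 0.0
    if value <= peak:
        return (value - left) / (peak - left)
    return (right - value) / (right - peak)


def fuzzy_histogram(values, fuzzy_sets):
    """Each value adds its membership degree to every fuzzy set."""
    return [sum(triangular(v, *fs) for v in values) for fs in fuzzy_sets]


# Hypothetical fuzzy sets "short", "medium", "long" for an edge-length attribute.
fuzzy_sets = [(-1.0, 0.0, 1.0), (0.0, 1.0, 2.0), (1.0, 2.0, 3.0)]
edge_lengths = [0.5, 1.0, 1.5]

features = fuzzy_histogram(edge_lengths, fuzzy_sets)
```

Compared with a crisp histogram, small perturbations of an attribute value change the feature vector only slightly, which reduces sensitivity to noise.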

(30)

Graph-based pattern recognition

Graph Embedding [Sidère09]

Lexicon of topological patterns; frequencies of occurrence of the patterns

→ Construction of a vector or matrix of statistical features

Example: attributed graph

Problems with embedding, probing and kernels

Which transformation ϕ? Which similarity measure?

No information about the matching between the vertices of the two graphs

(31)

Why I did not speak about :

•  Graphlet Kernel (B., Petri, et al., MLG 2007)

•  Principle

•  Count subgraphs of limited size k in G and G′

•  These subgraphs are referred to as graphlets (Przulj, Bioinformatics 2007)

•  Define graph kernel that counts isomorphic graphlets in two graphs
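
A minimal sketch of the principle for k = 3, assuming unlabelled undirected graphs. With three vertices, the number of edges in the induced subgraph determines its isomorphism class, so counting graphlets reduces to counting edges per vertex triple. The function names are hypothetical and no normalisation of the counts is applied.

```python
from itertools import combinations


def graphlet3_counts(adj):
    """Count induced 3-vertex subgraphs by their number of edges (0 to 3).
    `adj` is an undirected graph given as dict: vertex -> set of neighbours."""
    counts = [0, 0, 0, 0]
    for a, b, c in combinations(sorted(adj), 3):
        n_edges = (b in adj[a]) + (c in adj[a]) + (c in adj[b])
        counts[n_edges] += 1
    return counts


def graphlet_kernel(g1, g2):
    """Dot product of the two graphlet count vectors."""
    return sum(x * y for x, y in zip(graphlet3_counts(g1), graphlet3_counts(g2)))


cycle4 = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
paw = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}   # triangle plus a pendant vertex

k = graphlet_kernel(cycle4, paw)
```

Exhaustive enumeration costs O(n^k) subsets, which is why the size k is kept small.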

   

(32)

Why I did not speak about :

•  Combine graph kernels with graphical models (Bach, ICML 2008)

•  Presents a new kernel for 2D or 3D point clouds

•  Compares local subsets of the point clouds

•  Considers subsets based on subtrees and walks

•  Uses a specific factorized form for the local kernels between subtrees.

•  Combine graph kernels with group theory (Kondor and B., ICML 2008)

•  Represent graph as a function over the symmetric group

•  Derive invariants for that function called the skew spectrum

•  Use subset of these invariants that is computable in O(n^3) as feature representation of the graph.

   

(33)

Big Open Questions

•  Comparing walks (paths in which vertices may repeat) in two different graphs is polynomial

•  Subgraph isomorphism is known to be NP-hard

•  Computing the so-called universal graph distance, which counts all common subgraphs of two graphs, is harder than subgraph isomorphism

•  When we compare other kinds of subgraphs, e.g.

•  Simple paths (where vertices do not repeat)

•  Cycles

•  Trees

we seem to lose polynomial run-time

•  Are there other subgraphs for which efficient computation is possible?

   

(34)

Conclusion

   

(35)

Taking stock

•  Graphs provide a very high-level representation

–  Topology, structure, composition, ...

–  Choice of representation (node or arc)

–  Choice of attributes (symbolic, numerical, ...)

–  Construction time, size, ...

•  Graph processing

–  Local analysis → segmentation, localization

–  See Operational Research (many algorithms?)

•  Graph comparison

–  Many variants

–  Very high complexity

–  Use of heuristics

–  Transformation into statistical vectors

–  Loss of interest?

•  The last of J. C. Simon's rules of pattern recognition (RdF):

–  Above all, never despair!

(36)

Literature

•  Towards the unification of structural and statistical pattern recognition

–  Horst Bunke, Kaspar Riesen

(37)

   
