• Aucun résultat trouvé

Probabilistic Frequent Subtree Kernels

N/A
N/A
Protected

Academic year: 2022

Partager "Probabilistic Frequent Subtree Kernels"

Copied!
1
0
0

Texte intégral

(1)

Probabilistic Frequent Subtree Kernels

Pascal Welke1, Tam´as Horv´ath1,2, and Stefan Wrobel2,1

1 Dept. of Computer Science, University of Bonn, Germany

2 Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany

Abstract. Graph kernels have become a well-established approach in graph mining. One of the early graph kernels, thefrequent subgraph ker- nel, is based on embedding the graphs into a feature space spanned by the set of all frequent connected subgraphs in the input graph database.

A drawback of this graph kernel is that the preprocessing step of gen- eratingallfrequent connected subgraphs is computationally intractable.

Many practical approaches ignore this limitation, implying that such sys- tems can be infeasible even for small datasets. Approaches that do not disregard this aspect either restrict the feature space or restrict the class of the input graphs to guarantee correctness and efficiency.

We propose a frequent subgraph kernel that is not restricted to any particular graph class, but still efficiently computable. All such kernels can only be achieved by relaxing the correctness condition on mining frequent connected subgraphs. We give up the demand on completeness and represent each input graph by a polynomial size random sample of its spanning trees. Such a random sample is a forest and can be generated in polynomial time. Thus, as frequent subtrees in forests can be listed with polynomial delay, we arrive at an efficient frequent subgraph mining algorithm. Our approach is sound, but incomplete: (i) it is only able to identify frequent subtrees, and not arbitrary graph patterns, and (ii) even if a tree pattern is frequent, it might not be identified as such.

Calculating a representation in this feature space for any unseeng graph is done by the same incomplete procedure.

Our empirical evaluation on two chemical datasets shows that a consid- erable fraction of all frequent subtrees can be recovered even fromone random spanning tree per graph. Regarding the expressive power of prob- abilistic frequent subtrees, we have observed a marginal loss in predictive performance. However, we have achieved a three time speed-up against the ordinary frequent subgraph kernel. Furthermore, our method is able to process significantly larger datasets and generates a much smaller fea- ture set than the original algorithm.

A long version of this extended abstract appeared in [1].

[1] P. Welke, T. Horv´ath, and S. Wrobel. Probabilistic Subtree Kernels. To appear in:

New Frontiers in Mining Complex Patterns, Springer, 2016.

Copyright, 2015 by the paper’s authors. Copying permitted only for private andc academic purposes. In: R. Bergmann, S. G¨org, G. M¨uller (Eds.): Proceedings of the LWA 2015 Workshops: KDML, FGWM, IR, and FGDB. Trier, Germany, 7.-9.

October 2015, published at http://ceur-ws.org

107

Références

Documents relatifs

Since we are inside the while loop only if one of the graphs G[A] and G[B] is such that none of the subgraphs has a satisfactory subset, the corresponding graph has in each subgraph

plicative kernels are connected. We make first some remarks that simplify these equations.. Let hex, t) denote a solution of the Smoluchowski’s coagulation equation

On the other hand, explicit graph embedding methods explicitly embed an input graph into a feature vector and thus enable the use of all the method- ologies and techniques devised

In this chapter we present subexponential parameterized algorithms on planar graphs for a family of problems that consist in, given a graph G, finding a connected (induced) subgraph

Given two graphs G and H and a spanning tree T of G, list all the maximal common connected induced subgraphs between G and H for which the subgraph in G is connected using edges of

In this chapter we present subexponential parameterized algorithms on planar graphs for a family of problems that consist in, given a graph G, finding a connected (induced) subgraph