Contents lists available at ScienceDirect

Expert Systems With Applications

journal homepage: www.elsevier.com/locate/eswa

A parallel graph edit distance algorithm

Zeina Abu-Aisheh, Romain Raveaux, Jean-Yves Ramel, Patrick Martineau

Laboratoire d’Informatique (LI), Université François Rabelais, 37200, Tours, France

Article info

Article history:
Received 22 June 2017
Revised 14 September 2017
Accepted 18 October 2017
Available online 20 October 2017

Keywords:
Graph matching
Parallel computing
Graph edit distance
Pattern recognition
Load balancing

Abstract

Graph edit distance (GED) has emerged as a powerful and flexible graph matching paradigm that can be used to address different tasks in pattern recognition, machine learning, and data mining. GED is an error-tolerant graph matching problem which consists in minimizing the cost of the sequence that transforms a graph into another by means of edit operations. Edit operations are deletion, insertion and substitution of vertices and edges. Each vertex/edge operation has its associated cost defined in the vertex/edge cost function. Unfortunately, the GED problem is NP-hard. The question of elaborating fast and precise algorithms is of first interest. In this paper, a parallel algorithm for exact GED computation is proposed. Our proposal is based on a branch-and-bound algorithm coupled with a load balancing strategy. Parallel threads run a branch-and-bound algorithm to explore the solution space and to discard misleading partial solutions. In the meantime, the load balancing scheme ensures that no thread remains idle. Experiments on 4 publicly available datasets empirically demonstrated that under time constraints our proposal can drastically improve a sequential approach and a naive parallel approach. Our proposal was compared to 6 other methods and provided more precise solutions while requiring a low memory usage.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Attributed graphs are powerful data structures for the representation of structured entities. In a graph-based representation, vertices and their attributes describe objects (or part of objects) while edges represent interrelationships between the objects. Due to the inherent genericity of graph-based representations, and thanks to the improvement of computer capacities, structural representations have become more and more popular in the field of Pattern Recognition.

Graph edit distance (GED) is a graph matching paradigm whose concept was first reported in Sanfeliu and Fu (1983). The basic idea is to find the best sequence of edit operations to transform a graph G1 into another graph G2. The allowed operations are insertion, deletion and/or substitution of vertices and their corresponding edges. GED can be used as a dissimilarity measure for arbitrarily structured and arbitrarily attributed graphs. In contrast to other approaches, it does not suffer from any restrictions and can be applied to any type of graph (including hypergraphs (Bunke & Allermann, 1983)). The main drawback of GED is its computational complexity which is exponential in the number of vertices of the involved graphs.

Corresponding author: Z. Abu-Aisheh.

E-mail address: Zeina.Abu-Aisheh@univ-tours.fr (Z. Abu-Aisheh).

Many fast heuristic GED methods have been proposed in the literature (Bougleux et al., 2017; Fankhauser, Riesen, Bunke, & Dickinson, 2012; Ferrer, Serratosa, & Riesen, 2015; Fischer, Suen, Frinken, Riesen, & Bunke, 2013; Serratosa, 2015; Christmas, Kittler, & Petrou, 1995; Zeng, Tung, Wang, Feng, & Zhou, 2009). However, these heuristic algorithms can only find unbounded suboptimal values. On the other hand, only few exact approaches have been proposed (Abu-Aisheh, Raveaux, Ramel, & Martineau, 2015; Justice & Hero, 2006; Riesen, Fankhauser, & Bunke, 2007; Tsai, Member, & Fu, 1979).

Parallel computing has been fruitfully employed to handle time-consuming operations. Research results in the area of parallel algorithms for solving machine learning and computer vision problems have been reported in Kumar, Gopalakrishnan, and Kanal (1990). This research demonstrated that parallelism can be exploited efficiently in various machine intelligence and vision problems such as deep learning (Deng & Yu, 2014) or fast Fourier transform (Van Loan, 1992). In this paper, we take advantage of parallel computing to solve the exact GED problem.

The main contribution of this paper is an exact parallel algorithm based on a load balancing strategy for solving the GED problem. This paper builds on the idea that a parallel execution can help to converge faster to the optimal solution. Our method is very generic and can be applied to directed or undirected fully attributed graphs (i.e., with attributes on both vertices and edges). By limiting the run-time, our exact method provides (sub)optimal solutions and becomes an efficient upper bound approximation of GED. A complete comparative study is provided where 6 exact and approximate GED algorithms were compared on 4 graph datasets. By considering both the quality of the proposed solutions and the speed of the algorithm, we show that our proposal is a good choice when a fast decision is required, as in a classification context, or when time matters less but a precise solution is required, as in image registration.

https://doi.org/10.1016/j.eswa.2017.10.043

0957-4174/© 2017 Elsevier Ltd. All rights reserved.

This paper is organized as follows: Section 2 presents the important definitions necessary for introducing our GED algorithm. Then, Section 3 reviews the existing approximate and exact approaches for computing GED. Section 4 describes the proposed parallel scheme based on a load balancing paradigm. Section 5 presents the experiments and analyses the obtained results. Section 6 provides some concluding remarks.

2. Problem statements

In this section, we first introduce the GED problem, which is formally defined as an optimization problem. Secondly, to cope with the inherent complexity of the GED problem, the use of parallel computing is argued. However, the parallel execution of a combinatorial optimization problem is not trivial and consequently the load balancing question is raised. Finally, the load balancing problem is formally defined and presented to establish the basis of an efficient parallel algorithm.

2.1. Graph edit distance problem

An attributed graph is defined as a tuple of 4 sets $(V, E, \mu, \zeta)$ such that:

Definition 1. Attributed Graph. $G = (V, E, \mu, \zeta)$, where

$V$ is a set of vertices;

$E$ is a set of edges such that $E \subseteq V \times V$;

$\mu : V \rightarrow L_V$ is a vertex labeling function which associates a label $l_V$ to a vertex $v_i$ with $v_i \in V$, $\forall i \in [1, |V|]$;

$\zeta : E \rightarrow L_E$ is an edge labeling function which associates a label $l_E$ to an edge $e_i$ with $e_i \in E$, $\forall i \in [1, |E|]$.

Definition 1 allows handling arbitrarily structured graphs with unconstrained labeling functions. Labels for both vertices and edges can be given by a set of integers $L = \{1, 2, 3, \ldots\}$, a vector space $L = \mathbb{R}^n$ and/or a finite set of symbolic labels $L = \{x, y, z, \ldots\}$.
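As an illustration of Definition 1, the following minimal sketch stores an attributed graph as two label maps, one for vertices and one for edges. The class name `AttributedGraph` and its methods are hypothetical and are not taken from the paper; labels are left untyped so that integers, vectors and symbols are all accepted.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


@dataclass
class AttributedGraph:
    """Hypothetical container for G = (V, E, mu, zeta)."""
    mu: Dict[Vertex, object] = field(default_factory=dict)   # vertex -> label
    zeta: Dict[Edge, object] = field(default_factory=dict)   # edge -> label

    def add_vertex(self, v, label):
        self.mu[v] = label

    def add_edge(self, u, v, label):
        # Enforce E as a subset of V x V.
        if u not in self.mu or v not in self.mu:
            raise ValueError("both endpoints must already be vertices")
        self.zeta[(u, v)] = label

    @property
    def V(self):
        return set(self.mu)

    @property
    def E(self):
        return set(self.zeta)


# Labels may be vectors, symbols or integers, as allowed by Definition 1.
g = AttributedGraph()
g.add_vertex("a", (0.0, 1.0))
g.add_vertex("b", "C")
g.add_edge("a", "b", 2)
```

Keeping V and E implicit as the keys of the label maps guarantees that every vertex and edge carries exactly one label.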

GED is an error-tolerant graph matching paradigm that defines the dissimilarity of two graphs by the minimum amount of distortion needed to transform one graph into another (Bunke & Allermann, 1983). GED requires that each vertex/edge of graph G1 is mapped to a distinct vertex/edge of graph G2 or to a dummy vertex/edge. These dummy elements can absorb structural modifications between the two involved graphs. More formally, GED can be defined as follows:

Definition 2. Graph Edit Distance Problem.

\[
GED(G_1, G_2) = \min_{\gamma \in \Gamma(G_1, G_2)} \sum_{o \in \gamma} c(o)
\]

where $\Gamma(G_1, G_2)$ denotes the set of edit paths transforming $G_1 = (V_1, E_1, \mu_1, \zeta_1)$ into $G_2 = (V_2, E_2, \mu_2, \zeta_2)$, and $c$ denotes the cost function measuring the strength $c(o)$ of edit operation $o$.

A sequence of edit operations γ that transforms a graph G1 into a graph G2 is commonly referred to as an edit path between G1 and G2. In order to represent the degree of modification imposed on a graph by an edit path, a cost function is introduced measuring the strength of the distortions caused by each edit operation. Consequently, the edit distance between graphs is defined by the minimum cost edit path between two graphs. Note that the edit operations on edges can be inferred by edit operations on their adjacent vertices, i.e., whether an edge is substituted, deleted, or inserted depends on the edit operations performed on its adjacent vertices. An elementary edit operation o is one of vertex substitution (v1 → v2), edge substitution (e1 → e2), vertex deletion (v1 → ε), edge deletion (e1 → ε), vertex insertion (ε → v2) and edge insertion (ε → e2) with v1 ∈ V1, v2 ∈ V2, e1 ∈ E1 and e2 ∈ E2. ε is a dummy vertex or edge which is used to model insertion or deletion. c(.) is a cost function on elementary edit operations o. The cost function c(.) is of first interest and can change the problem being solved. In Bunke (1997) and Brun (2012) a particular cost function for GED was introduced, and it was shown that under this cost function, GED computation is equivalent to the maximum common subgraph problem. Neuhaus and Bunke (2007) showed that if each elementary operation satisfies the criteria of a distance (separability, symmetry and triangular inequality) then GED is metric. Recently, methods to learn the matching edit cost between graphs have been published (Cortés & Serratosa, 2015). The discussion around the cost functions is beyond the topic of this paper that essentially focuses on the GED computation.

Fig. 1. An incomplete search tree example for solving the GED problem. The first floor represents possible matchings of vertex A with each vertex of the second graph (in blue). A tree node is a partial solution which is to say a partial edit path. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
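The remark that edge operations are inferred from the operations on adjacent vertices can be made concrete with a small sketch. It computes the cost of the edit path induced by a complete vertex mapping. The function name `mapping_cost`, the dictionary-based graph encoding and the unit costs are assumptions made for illustration only; they are not the cost functions used in the paper.

```python
def mapping_cost(mu1, zeta1, mu2, zeta2, mapping,
                 c_sub=1.0, c_del=1.0, c_ins=1.0):
    """Cost of the edit path induced by a complete vertex mapping.

    mu1/mu2: vertex -> label; zeta1/zeta2: (u, v) -> label (directed edges).
    mapping: each vertex of G1 -> a distinct vertex of G2, or None (deletion).
    Assumption: unit costs, and substituting equal labels is free.
    """
    cost = 0.0
    # Vertex substitutions and deletions.
    for u, label in mu1.items():
        v = mapping[u]
        if v is None:
            cost += c_del
        elif label != mu2[v]:
            cost += c_sub
    # Vertices of G2 that no vertex of G1 maps to are inserted.
    used = {v for v in mapping.values() if v is not None}
    cost += c_ins * sum(1 for v in mu2 if v not in used)

    # Edge operations are implied by the vertex mapping.
    covered = set()
    for (a, b), label in zeta1.items():
        image = (mapping[a], mapping[b])
        if None not in image and image in zeta2:
            covered.add(image)          # edge substitution
            if label != zeta2[image]:
                cost += c_sub
        else:
            cost += c_del               # edge deletion
    # Edges of G2 not reached by any edge of G1 are inserted.
    cost += c_ins * sum(1 for e in zeta2 if e not in covered)
    return cost


mu1, zeta1 = {1: "A", 2: "B"}, {(1, 2): "x"}
mu2, zeta2 = {"a": "A", "b": "C", "c": "B"}, {("a", "b"): "x"}
cost_keep_edge = mapping_cost(mu1, zeta1, mu2, zeta2, {1: "a", 2: "b"})
cost_keep_labels = mapping_cost(mu1, zeta1, mu2, zeta2, {1: "a", 2: "c"})
```

In this toy case the mapping that preserves the edge costs 2, whereas the mapping that preserves both vertex labels costs 3 because it forces one edge deletion and one edge insertion.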

2.2. From GED problem to load balancing problem

GED is a discrete optimization problem that faces the combinatorial explosion curse. The complexity of GED was proven to be NP-hard where the computational complexity of matching is exponential in the number of vertices of the involved graphs (Zeng et al., 2009). At run time, the evolution of the size and shape of the search space is irregular and unpredictable. The search space is represented as an ordered tree.

For the sake of clarity in the rest of the paper, the term vertex refers to an element of a graph while the term tree node or node represents an element of the search tree.

The initial and leaf tree nodes correspond to the initial state and the final acceptable state in the search tree, respectively. Each edge represents a possible way of state change. A combinatorial optimization problem is essentially the problem of finding a minimum-cost path from an initial node to a leaf node in the search tree. More concretely in GED, each tree node is a sequence of edit operations. Leaf nodes are complete edit operation sequences (edit paths) while intermediate nodes are partial solutions representing partial edit paths. An example of a search tree corresponding to GED computation is shown in Fig. 1.

Parallelizing combinatorial optimization problems is not trivial. In a parallel combinatorial search application, each thread searches for optimal solutions within a portion of the solution space. The shape and size of the search space change as the search proceeds. Consequently, tree nodes are generated and destroyed at run-time.

Portions that encompass the most promising solutions with high probability are expanded in priority and explored exhaustively, while portions that have unfruitful solutions are discarded at run-time. To ensure that parallel threads are always busy, tree nodes have to be dispatched at run-time. Hence, the local workload of a thread is difficult to predict.

The parallel execution of combinatorial optimization problems relies on load balancing strategies to divide the global workload of all threads iteratively at run-time. From the viewpoint of a workload distribution strategy, parallel optimizations fall in the asynchronous communication category where no thread waits for another thread to finish in order to start a new task (Bertsekas & Tsitsiklis, 1997). A thread initiates a balancing operation when it becomes lightly loaded or overloaded. The objective of the data distribution strategies is to ensure a fast convergence to the optimal solution such that all the tree nodes are evaluated as fast as possible. In this paper, we propose a parallel GED approach equipped with a load balancing strategy. This approach ensures that all threads have the same amount of load and all threads explore the most promising tree nodes first.

2.3. Load balancing problem

A parallel program is composed of multiple threads; each thread is a processing unit which processes one task. Multiple threads can exist within the same process and share resources such as memory (i.e., the values of its variables at any given moment). In our GED case, the task is to evaluate a tree node (a partial edit path) and to generate its children (the next possible states). A thread performs a task on a set of works. A work is a tree node characterized by its workload. A work cannot be shared between threads; it is the smallest unit of concurrency the parallel program can exploit.

Creating a parallel program involves first decomposing the overall computation into works and then assigning the works to threads. The decomposition together with the assignment step is often called partitioning. The assignment on its own is referred to as static load balancing.

Load balancing algorithms can be broadly categorized into two families: static and dynamic.

2.3.1. Static load balancing

Static load balancing algorithms distribute works to threads once and for all, in most cases relying on a priori knowledge about the works and the system on which they run. These algorithms rely on the estimated execution times of works and inter-thread communication requirements. Static load balancing is not satisfactory for parallel programs that are dynamic and/or unpredictable. The problem of static load balancing is known to be NP-hard when the number of threads is greater than or equal to 2 (Drozdowski, 2009).

2.3.2. Dynamic load balancing

Dynamic load balancing algorithms bind works to threads at run-time. Generally, a dynamic load balancing algorithm consists of three components: a load measurement rule, an initiation rule and a load balancing operation. A very detailed definition of load balancing models can be found in Xu and Lau (1997).

Load Measurement. Dynamic load balancing algorithms rely on the workload information of threads. The workload information is typically quantified by a load index, a non-negative variable which is equal to zero if the thread is idle and takes an increasing positive value when the load increases (Xu & Lau, 1997). Since the measure of workload would occur frequently, its calculation must be efficient.

Initiation Rule. This rule dictates when to initiate a load balancing operation. The execution of a balancing operation incurs non-negligible overhead; its invocation must weigh its overhead cost against its expected performance benefit. An initiation policy is thus needed to determine whether a balancing operation will be profitable.

Load Balancing Operation. This operation is defined by three rules: location, distribution and selection rules. The location rule determines the partners of the balancing operation, i.e., the threads to involve in the balancing operation. The distribution rule determines how to redistribute workload among the selected threads. The selection rule selects the most suitable data for transfer among threads.

3. Related work

In this section, an overview of the GED methods presented in the literature is given. Since our goal is to speed up the calculations of GED, parallelism is highly desirable. Therefore, we also cover the parallel methods dedicated to solving branch-and-bound (BnB) problems, with the aim of drawing inspiration from some of these works for parallelizing the GED calculations.

3.1. State-of-the-art of graph edit distance

The methods of the literature can be divided into two categories depending on whether they can ensure the optimal matching to be found or not.

3.1.1. Exact graph edit distance approaches

A widely used method for edit distance computation is based on the A* algorithm (Riesen et al., 2007). This algorithm is considered as a foundation work for solving GED. A* is a best-first algorithm where the enumeration of all possible solutions is achieved by means of an ordered tree that is constructed dynamically at run time by iteratively creating successor nodes. At each step, the node, or so-called partial edit path p, that has the least g(p) + h(p) is chosen, where g(p) represents the cost of the partial edit path accumulated so far whereas h(p) denotes the estimated cost from p to a leaf node representing a complete edit path. The sum g(p) + h(p) is referred to as a lower bound lb(p). Given that the estimation of the future costs h(p) is lower than, or equal to, the real costs, an optimal path from the root node to a leaf node is guaranteed to be found (Riesen & Bunke, 2009). Leaf nodes correspond to feasible solutions and so complete edit paths. In the worst case, the space complexity can be expressed as O(|Γ|) (Cormen et al., 2009) where |Γ| is the cardinality of the set of all possible edit paths. Since |Γ| is exponential in the number of vertices involved in the graphs, the memory usage is still an issue.
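The best-first selection rule based on lb(p) = g(p) + h(p) can be sketched on a toy problem. The example below is not GED itself: as a simplifying assumption, it uses a plain assignment problem (each row of a cost matrix is assigned to a distinct column) as a stand-in for vertex mappings. The function name `best_first_assignment` and the row-minimum heuristic are illustrative choices.

```python
import heapq
from itertools import count


def best_first_assignment(cost):
    """A*-style search: assign each row to a distinct column at minimum cost.

    A partial solution is a tuple of columns chosen for rows 0..k-1.
    """
    n = len(cost)

    def h(depth, used):
        # Admissible estimate: every unprocessed row pays at least the
        # cheapest still-free column, ignoring the distinctness constraint.
        return sum(min(cost[r][c] for c in range(n) if c not in used)
                   for r in range(depth, n))

    tie = count()  # tie-breaker so the heap never compares partial tuples
    open_list = [(h(0, frozenset()), next(tie), 0, ())]  # (lb, tie, g, p)
    while open_list:
        _, _, g, partial = heapq.heappop(open_list)  # node with least lb(p)
        depth = len(partial)
        if depth == n:
            return g, partial  # first complete path popped is optimal
        used = frozenset(partial)
        for c in range(n):
            if c in used:
                continue
            g_child = g + cost[depth][c]
            lb_child = g_child + h(depth + 1, used | {c})
            heapq.heappush(open_list,
                           (lb_child, next(tie), g_child, partial + (c,)))
    return None


example_cost = [[4, 1, 3],
                [2, 0, 5],
                [3, 2, 2]]
best = best_first_assignment(example_cost)
```

Because the heuristic never overestimates the remaining cost, the first complete solution removed from the open list is optimal. The open list may still hold a large share of the search tree, which is the memory issue noted above.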

To overcome this memory problem of A*, a recent depth-first BnB GED algorithm, referred to as DF, has been proposed in Abu-Aisheh et al. (2015). This algorithm speeds up the computations of GED thanks to its upper and lower bounds pruning strategy and its preprocessing step. Moreover, this algorithm does not exhaust memory as the number of pending edit paths that are stored at any time t is relatively small thanks to the space complexity, which is equal to |V1|.|V2| in the worst case.
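The pruning principle of a depth-first branch-and-bound can be sketched on a toy assignment problem (rows assigned to distinct columns), again used here as a simplified stand-in for GED. This is not the DF algorithm of Abu-Aisheh et al. (2015); the function name and the heuristic are assumptions made for illustration.

```python
def dfs_branch_and_bound(cost):
    """Depth-first BnB for a toy assignment problem.

    A branch is discarded as soon as its lower bound g + h is not better
    than the best complete solution found so far (the upper bound).
    """
    n = len(cost)
    best = {"ub": float("inf"), "path": None}

    def h(depth, used):
        # Admissible estimate: cheapest free column for each remaining row.
        return sum(min(cost[r][c] for c in range(n) if c not in used)
                   for r in range(depth, n))

    def explore(partial, used, g):
        depth = len(partial)
        if depth == n:
            if g < best["ub"]:          # tighter upper bound found
                best["ub"], best["path"] = g, tuple(partial)
            return
        free = [c for c in range(n) if c not in used]
        # Visit the cheapest child first to tighten the upper bound early.
        for c in sorted(free, key=lambda col: cost[depth][col]):
            g_child = g + cost[depth][c]
            if g_child + h(depth + 1, used | {c}) >= best["ub"]:
                continue                 # prune: cannot beat the incumbent
            partial.append(c)
            explore(partial, used | {c}, g_child)
            partial.pop()

    explore([], frozenset(), 0)
    return best["ub"], best["path"]


bnb_result = dfs_branch_and_bound([[4, 1, 3],
                                   [2, 0, 5],
                                   [3, 2, 2]])
```

Only the current branch is kept on the recursion stack, so memory grows with the depth of the tree rather than with the number of generated nodes.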

In both A* and DF, h(p) can be estimated by mapping the unprocessed vertices and edges of graph G1 to the unmapped ones of graph G2 such that the resulting cost is minimal. The unprocessed edges of both graphs are handled separately from the unprocessed vertices. This mapping is done in a faster way than the exact computation and should return a good approximation of the true future cost. Note that the smaller the difference between h(p) and the real future cost, the fewer nodes will be expanded by A* and DF.

Table 1
Characteristics of exact graph edit distance methods.

Reference                  Size of graphs   Execution time   Complexity    Parallel?
Riesen et al. (2007)       10               10 ms            Exponential   No
Abu-Aisheh et al. (2015)   15               100,000 ms       Exponential   No
Justice and Hero (2006)    40               150,000 ms       Exponential   No

Almohamad and Duffuaa (1993) proposed the first linear programming formulation of the weighted graph matching problem. It consists in determining the permutation matrix minimizing the L1 norm of the difference between the adjacency matrix of the input graph and the permuted adjacency matrix of the target one. More recently, Justice and Hero (2006) also proposed a binary linear programming formulation of the graph edit distance problem. Graph matching (GM) is treated as finding a subgraph of a larger graph known as the edit grid. The edit grid only needs to have as many vertices as the sum of the total number of vertices in the graphs being compared. One drawback of this method is that it does not take into account attributes on edges, which limits the range of application.

Table 1 synthesizes the aforementioned methods in terms of the size of the graphs they could match, the execution time and the complexity. One can see the complexity of the exact GED in terms of the number of vertices that the methods could match. Based on these facts, researchers shed light on the approximate GED side.

3.1.2. Approximate graph edit distance approaches

Variants of approximate GED algorithms are proposed to make GED computation substantially faster. A modification of A*, called Beam-Search (BS), has been proposed in Neuhaus, Riesen, and Bunke (2006). The purpose of BS is to prune the search tree while searching for an optimal edit path. Instead of exploring all edit paths in the search tree, a parameter s is set to an integer x which is in charge of keeping the x most promising partial edit paths in the set of promising candidates.
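The effect of keeping only the x most promising partial solutions can be sketched on a toy assignment problem (rows assigned to distinct columns), used as a simplified stand-in for GED. As an assumption made for brevity, candidates are ranked by their accumulated cost g only; the function name `beam_search_assignment` is hypothetical.

```python
import heapq


def beam_search_assignment(cost, beam_width):
    """Level-by-level search that keeps only `beam_width` partial solutions."""
    n = len(cost)
    beam = [(0, ())]  # (accumulated cost g, columns chosen so far)
    for row in range(n):
        children = [
            (g + cost[row][c], partial + (c,))
            for g, partial in beam
            for c in range(n) if c not in partial
        ]
        # Pruning step: discard everything but the cheapest candidates.
        beam = heapq.nsmallest(beam_width, children)
    return min(beam)


tricky = [[1, 2],
          [1, 100]]
narrow = beam_search_assignment(tricky, beam_width=1)
wide = beam_search_assignment(tricky, beam_width=2)
```

With a beam width of 1 the search commits to the locally cheapest choice and returns a cost of 101, whereas a width of 2 recovers the optimal cost of 3. This illustrates why the result of such a pruned search is only an upper bound on the optimum.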

In Riesen and Bunke (2009), the problem of graph matching is reduced to finding the minimum assignment cost where, in the worst case, the maximum number of operations needed by the algorithm is O(n^3). This algorithm is referred to as BP. Since BP considers local structures rather than global ones, the optimal GED is overestimated. Recently, researchers have observed that BP's overestimation is very often due to a few incorrectly assigned vertices. That is, only few vertex substitutions from the next step are responsible for additional (unnecessary) edge operations in the step after and thus resulting in the overestimation of the optimal edit distance. In Riesen and Bunke (2014), BP is used as an initial step. Then, pairwise swapping of vertices (local search) is done aiming at improving the accuracy of the distance obtained so far. In Riesen, Fischer, and Bunke (2014), a search procedure based on a genetic algorithm is proposed to improve the accuracy of BP. These improvements increase run times. However, they improve the accuracy of the BP solution.

In Fischer, Suen, Frinken, Riesen, and Bunke (2015), the authors propose a novel modification of the Hausdorff distance that takes into account not only substitution, but also deletion and insertion cost. H(V1, V2) is defined as follows:

\[
H(V_1, V_2) = \sum_{u \in V_1} \min_{v \in V_2} \bar{c}_1(u, v) + \sum_{v \in V_2} \min_{u \in V_1} \bar{c}_2(u, v)
\]

which can be interpreted as the sum of distances to the most similar vertex in the other graph. This approach allows multiple vertex assignments; consequently, the time complexity is reduced to quadratic (i.e., O(n^2)) with respect to the number of vertices of the involved graphs. In Riesen (2015) and Bougleux et al. (2017), the GED was shown to be equivalent to a Quadratic Assignment Problem (QAP). In Bougleux et al. (2017), the QAP formulation of the GED problem is solved by two well-known graph matching methods called Integer Projected Fixed Point method (Leordeanu, Hebert, & Sukthankar, 2009) and Graduated NonConvexity and Concavity Procedure (Liu & Qiao, 2014). These two approximate methods have been adapted and optimized to solve the GED problem sub-optimally. In Leordeanu et al. (2009), this heuristic improves an initial solution by trying to solve a linear assignment problem and the relaxed QAP where binary constraints are relaxed to the continuous domain. Iteratively, the quadratic formulation is linearly approximated by its 1st-order expansion around the current solution. The resulting assignment helps at guiding the minimization of the relaxed QAP. In Liu and Qiao (2014), a path following algorithm aims at approximating the solution of a QAP by considering a convex-concave relaxation through the modified quadratic function.

3.1.3. Synthesis

Table 2 summarizes the aforementioned approximate GED methods. Approximate GED methods often have a polynomial computational time in the size of the input graphs and thus are much faster than the exact ones. Nevertheless, these methods do not guarantee to find the optimal matching. On the exact GED side, only few approaches have been proposed to postpone the graph size restriction (Abu-Aisheh et al., 2015; Justice & Hero, 2006; Riesen et al., 2007; Tsai et al., 1979). For all these reasons, we believe that proposing a fast and exact GED algorithm is of great interest.

3.2. State-of-the-art of parallel branch-and-bound algorithms

Parallel BnB algorithms have been widely studied in the past. In this section, we report approaches that have been proposed in the literature to solve BnB in a fully parallel manner. The state-of-the-art in this section is divided into two big families, depending on whether or not the exploration of the search tree is done in a regular way.

3.2.1. Regular exploration of the search space

Chakroun and Melab (2013) put forward a template that transforms the unpredictable and irregular workload associated to the explored BnB tree into regular data-parallel GPU kernels.¹ A pool of pending nodes is offloaded to the GPU where each node is evaluated in parallel. Moreover, the branching and pruning steps are performed on the CPU side. In fact, besides equivalent operations, the pruning operator on top of GPU reduces the time of transferring the resulting pool from the GPU to the CPU since the non promising generated sub-problems are kept in the GPU memory and deleted there. The authors of Boukedjar, Lalami, and El-Baz (2012) presented a CPU-GPU model. When a kernel is launched, the works are assigned to idle threads. Each thread performs computation on only one node of the BnB list. Moreover, each thread has its own register and private local memory. Threads can communicate by means of a global memory. Both Chakroun and Melab (2013) and Boukedjar et al. (2012) solved the irregularity of BnB. However, the explorations in both approaches take longer time. That is because in the first kernel each thread generates only one child at each time while the elimination of branches occurs in the second kernel.

¹ Kernels are functions executed by many GPU threads in parallel.

Table 2
Characteristics of approximate GED methods.

Reference                  Size of graphs   Execution time   Complexity   Parallel?
Riesen and Bunke (2009)    100              400 ms           Cubic        No
Riesen et al. (2007)       70               28 s             |V1| x       No
Fischer et al. (2015)      100              500 ms           Quadratic    No
Bougleux et al. (2017)     70               27 s             Cubic        No

An OpenMP approach was put forward in Dorta, Leon, and Rodriguez (2003). T threads are established by the master program. Moreover, the master program generates tree nodes and puts them in a queue. Then T tree nodes are removed from the queue and assigned to the threads, one per thread. The best solution must be modified carefully, as only one thread can change it at any time. The same precaution applies when a thread tries to insert new tree nodes in the global shared queue. In this approach, each thread only takes one node, explores it and at the end of its exploration sends its result to the master, which forwards the message to other slaves if the upper bound is updated. Thus, this model did not tackle the irregularity of BnB.

3.2.2. Irregular exploration of the search space

A master-slave parallel formulation of depth-first search was proposed in Rao and Kumar (1987). Each thread takes a disjoint part of the search space. Once a thread finishes its assigned part, it steals unexplored nodes of the search space of another thread.

A dynamic depth eager scheduling method was proposed in Neary and Cappello (2005). In the beginning, a depth parameter is set to 2, which means that all the tasks whose level in the search space is 2 are processed by threads with no further subdivision. Then, each thread works on its associated problems. When a thread runs out of work, it requests work from some thread that it knows. This balances the computational load as long as the number of tasks per thread is high. The communication in Rao and Kumar (1987) and Neary and Cappello (2005) is asynchronous, and thus threads communicate if they succeed in updating the upper bound. An eager scheduling approach is used to make the tasks balanced depending on the difficulty of each tree node.

A master-slave hybrid Depth-First/Best-First was proposed in Chung, Flynn, and Sang (2012). The master thread keeps generating the tree nodes at a predetermined level (i.e., level i) and saves them to a work pool. Then, each of the worker threads takes a node with the minimum lower bound from the pool and explores it in a depth-first way. Generating and exploring nodes are repeated until finding the solution of the problem. This model has less communication between threads; however, it is irregular as the works given to threads do not have the same difficulty.

In Allen and Yasuda (1997), a BnB algorithm for solving inexact graph matching was proposed. This algorithm aims at determining a minimum distance between two unattributed graphs. At each iteration, each thread takes a node from its queue to be solved by expanding it in a depth-first search way until its branch is fully explored, updating the local best permutation and the corresponding degree of mismatch and eliminating test. Afterwards, a global permutation with its corresponding degree of mismatch is updated and given to all threads when all of them finish solving their chosen nodes or problems; then, a next node is chosen by each thread. A thread becomes inactive when it has no node left in its queue. Load balancing is performed if the number of inactive threads with empty queues is above a threshold T. The best permutation and the best degree of match are only updated at the end of each iteration; as a result, the search space is not pruned as fast as possible.

3.2.3. Synthesis

Based on the aforementioned parallel BnB algorithms and to the best of our knowledge, none of these algorithms addressed the GED problem. We believe that proposing a parallel branch-and-bound algorithm dedicated to solving the GED problem is of great interest since the computational time will be improved. The search tree of GED is irregular (i.e., the number of tree nodes varies depending on the ability of the lower and upper bounds in pruning the search tree) and thus the regular parallel approaches (e.g., Boukedjar et al., 2012; Chakroun & Melab, 2013; Dorta et al., 2003) are not suitable for such a problem.

The approaches in Rao and Kumar (1987), Chung et al. (2012) and Neary and Cappello (2005) are interesting since the communication is asynchronous² and thus there is no need to stop a thread if it did not finish its tasks, unless another thread ran out of tasks. In Chung et al. (2012), however, load balancing is not integrated. Thus, when there are no more problems to be generated by the master thread, some threads might become idle for a certain amount of time while waiting for the other threads to finish their associated tasks. For GED, load balancing is important to keep the amount of work balanced between all threads.

On this basis, we propose a parallel GED method equipped with a load balancing strategy. This paper is considered as an extension of the most recent BnB algorithm (DF). When thinking of a parallel and/or a distributed approach of DF, the edit paths can be considered as atomic tasks to be solved. Edit paths can be dispatched and can be given to threads in order to divide the GED problem into smaller problems. It is hard to estimate the time needed by threads to explore a sub-tree (i.e., to become idle). Likewise, the number of CPUs and/or machines has to be adapted to the amount and type of data that have to be analyzed. Some experiments in Section 5.5 illustrate this point and are followed by a discussion.

4. Proposal: parallel graph edit distance using a load balancing strategy

In this section, an overview of our proposal is given. The main objectives of the approach are, first, making sure that all the threads have work to do; second, balancing the workload of the threads at run-time; third, exploring the fruitful partial edit paths of the search tree thanks to the cost estimation of lb(p).

Generally speaking, our proposal is inspired by some ideas in Rao and Kumar (1987) and Neary and Cappello (2005). A best-first procedure is performed before starting to decompose the search tree (composed of edit paths) into sub-trees. The load balancing procedure occurs when any thread finishes all its assigned edit paths. The algorithm terminates when all threads finish the exploration of their assigned edit paths. Our algorithm, denoted by PDFS, consists of three main steps: Initialization-Decomposition-Assignment, Branch-and-Bound and Load Balancing. Fig. 2 pictures the whole steps of PDFS.

² Asynchronous communication indicates that no thread waits for another thread to finish in order to start its new task (Bertsekas & Tsitsiklis, 1997).

Fig. 2. The main steps of PDFS.

4.1. Initialization, decomposition and assignment

The objective of this step is twofold: first, dividing the problem into sub-problems; second, making sure, at the beginning of the method, that all threads have an equivalent workload in terms of number of edit paths and their difficulty.

Initialization Procedure. As in DF (Abu-Aisheh et al., 2015), this phase consists of 3 steps, each of which aims at speeding up the calculations.

First, the vertex and edge cost matrices (Cv and Ce) are constructed. This step avoids recalculating the distances between attributes when matching vertices and edges of G1 and G2.

LetG1=(V1,E1,

μ

1,

ξ

1)andG2=(V2,E2,

μ

2,

ξ

2)betwographs withV1=(u1,...,un) and V2=(

v

1,...,

v

m). A vertices cost ma- trixCv, whose dimension is (n+2) X (m+2), is constructed as follows:

Cv=

c1,1 ... c1,κ c1 ...

... ... ... ... ... ...

... ... ... ... ... ...

cn,1 ... cn,κ... cn

c1 ... ∞ ∞ ...

... cκ...

wherenis thenumberofvertices ofG1 and

κ

isthe numberof

verticesofG2.

Each element ci,j in the matrix Cv corresponds to the cost of assigning the ith vertex of the graph G1 to the jth vertex of the graph G2. The upper left corner of the matrix contains all possible node substitutions, while the upper right corner holds the costs of all possible insertions and deletions of vertices of G1, respectively. The lower left corner contains all possible insertions and deletions of vertices of G2, respectively, whereas the elements of the lower right corner, which concern the substitution ε → ε, are set to infinity. Similarly, Ce contains all the possible substitutions, deletions and insertions of edges of G1 and G2. Ce is constructed in the very same way as Cv. The matrices Cv and Ce are used as an input of the following phase.
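The idea of precomputing costs once can be illustrated with a small Python sketch. It deliberately uses a simplified (n+1) × (m+1) layout with a single ε row and a single ε column, which is an assumption made for brevity and not the exact layout of Cv; the names `build_vertex_cost_table`, `substitution_cost` and `tau_vertex` are hypothetical.

```python
import math

def build_vertex_cost_table(labels1, labels2, substitution_cost, tau_vertex):
    """Precompute vertex edit costs once so the search never recomputes them.

    Simplified layout (an assumption, not the paper's exact matrix):
    rows 0..n-1 are vertices of G1, row n is the empty vertex epsilon;
    columns 0..m-1 are vertices of G2, column m is epsilon.
    """
    n, m = len(labels1), len(labels2)
    table = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            table[i][j] = substitution_cost(labels1[i], labels2[j])
        table[i][m] = tau_vertex      # delete vertex i of G1
    for j in range(m):
        table[n][j] = tau_vertex      # insert vertex j of G2
    table[n][m] = math.inf            # epsilon -> epsilon is never used
    return table
```

During the search, a lookup such as `table[i][j]` then replaces any repeated attribute comparison.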

Second, the vertices of G1 are sorted in order to start with the most promising vertices in G1. BP is applied to establish the initial edit path EP (Riesen & Bunke, 2009). Afterwards, the edit operations of EP are sorted in ascending order of the matching cost, where EPsorted = {u → v} ∀ u ∈ V1 ∪ {ε}. At last, from EPsorted, each u ∈ V1 is inserted in sorted-V1. This operation helps in finding the most promising vertices ui ∈ V1 that will be matched first with the unmatched vertices in V2, to speed up the exploration of the search tree while searching for the optimal solution.
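A minimal sketch of this sorting step, assuming the initial edit path is available as a list of `(u, v, cost)` triples in which `None` stands for ε (both the representation and the function name are hypothetical):

```python
EPSILON = None  # stands for the empty vertex

def sort_vertices_by_matching_cost(edit_path):
    """edit_path: list of (u, v, cost) operations from an initial matching.

    Returns the vertices of G1 ordered by ascending operation cost,
    skipping insertions (operations whose source is epsilon).
    """
    ordered = sorted(edit_path, key=lambda op: op[2])
    return [u for (u, _v, _cost) in ordered if u is not EPSILON]
```

The returned list plays the role of sorted-V1: cheap, confident matchings are explored first.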

Third, a first upper bound (UB) is computed by the BP algorithm, as it is relatively fast and provides reasonable results; see Riesen and Bunke (2009) for more details.

Decomposition. Before starting the parallelism, a distribution approach is applied, aiming at dispatching the workload or sub-problems among threads. For that purpose, N edit paths are first generated using A∗ by the main thread and saved in the heap. Afterwards, the N partial edit paths are sorted as an ordered tree, starting from the node whose lb(p) is minimum up to the most expensive one. Note that N is a parameter of PDFS.

Assignment. Let Q be the set of partial solutions outputted by A∗. Assigning partial solutions to parallel threads is equivalent to solving the static load balancing problem stated in Section 2.3.1. Due to the complexity of the problem, we chose to avoid an exact computation of load balancing and we adopted an approximate one. Algorithm 1 depicts the strategy we have followed. Once the partial edit paths are sorted in the centralized heap (line 1), the local list OPEN of each thread is initialized as an empty set (lines 2–4). Each thread receives one partial solution at a time, starting from the most promising partial edit paths (line 8). The threads keep taking edit paths in that way until there is no more edit path in the centralized heap (lines 6–10).

Algorithm 1 Dispatch-Tasks.
Input: A set of partial edit paths Q generated by A∗ and T threads.
Output: The local list OPEN of each thread Ti
 1: Q ← sortAscending(Q)
 2: for Tindex ∈ T do
 3:     OPEN_Tindex ← {φ}
 4: end for
 5: i = 0    ▷ a variable used for thread's indices
 6: for p ∈ Q do
 7:     index = i % |T|
 8:     OPEN_Tindex.addTask(p)
 9:     i++
10: end for
11: Return OPEN_Tindex ∀ index ∈ 1, ···, |T|
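Algorithm 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: partial edit paths are arbitrary objects, and `lower_bound` is a hypothetical callable returning lb(p) for a path.

```python
def dispatch_tasks(partial_paths, num_threads, lower_bound):
    """Round-robin dispatch of sorted partial edit paths (Algorithm 1).

    Returns one local OPEN list per thread.
    """
    ordered = sorted(partial_paths, key=lower_bound)     # line 1
    open_lists = [[] for _ in range(num_threads)]        # lines 2-4
    for i, path in enumerate(ordered):                   # lines 5-10
        open_lists[i % num_threads].append(path)
    return open_lists                                    # line 11
```

Because the paths are dealt in sorted order, every thread receives a mix of promising and less promising paths rather than a contiguous block of the ranking.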

Each thread maintains a local heap to keep the assigned edit paths for exploring edit paths locally. Such an iterative way guarantees the diversity of node difficulty associated with each thread.

Fig. 3. Edge mappings based on their adjacent vertices and whether or not an edge between two vertices can be found.

4.2. Branch-and-bound method

In this section we explain the components of BnB that each thread executes on its assigned partial edit paths. First, the rules of selecting edit paths, branching and bounding are described. Second, updating the upper bound and pruning the search tree are detailed.

Selection Rule. A systematic evaluation of all possible solutions is performed without explicitly evaluating all of them. The solution space is organized as an ordered tree which is explored in a depth-first way. In depth-first search, each edit path is visited just before its children. In other words, when traversing the search tree, one should travel as deep as possible from node i to node j before backtracking. At each step, the most promising child is chosen.

Branching Procedure. Initially, each thread only has its assigned edit paths in its local heap set (OPEN), i.e., the set of the edit paths found so far. The exploration starts with the first most promising vertex u1 in sorted-V1 in order to generate the children of the selected edit path. The children consist of substituting u1 with all the vertices of G2, in addition to the deletion of u1 (i.e., u1 → ε). Then, the children are added to OPEN. Consequently, a minimum edit path (pmin) is chosen to be explored by selecting the minimum cost node (i.e., min(g(p)+h(p))) among the children of pmin, and so on.

Starting from the second promising vertex u2 in sorted-V1, the edges of both G1 and G2 are handled. Edges of G1 can be either matched with edges of G2 or deleted, while edges of G2 can be either inserted in G1 or matched with edges in G1. However, the decision of whether an edge is inserted, substituted, or deleted is made according to the matching of its adjacent vertices. That is, the neighborhood of edges dominates their matchings. Edges are handled as follows. Let ui and uj ∈ V1 be matched with vk and vz ∈ V2, respectively (i.e., ui → vk and uj → vz). Based on these matchings, one of the following edge operations is selected:

If ∃ eij ∈ E1 and ∃ ekz ∈ E2 then eij → ekz
If ∃ eij ∈ E1 and ∄ ekz ∈ E2 then eij → ε
If ∄ eij ∈ E1 and ∃ ekz ∈ E2 then ε → ekz

The search for a better edit path continues through backtracking if pmin equals φ. In this case, the next child of pmin is tried out, and so on (Fig. 3).
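The edge rule driven by the vertex matchings ui → vk and uj → vz can be sketched as follows. The representation of edges as sets of `frozenset` pairs (undirected graphs) and the function name are assumptions made for illustration.

```python
def edge_operation(u_i, u_j, v_k, v_z, edges1, edges2):
    """Pick the edge edit operation implied by u_i -> v_k and u_j -> v_z.

    edges1 / edges2 are sets of frozenset({a, b}) (undirected, an assumption).
    """
    e1 = frozenset((u_i, u_j))
    e2 = frozenset((v_k, v_z))
    in1, in2 = e1 in edges1, e2 in edges2
    if in1 and in2:
        return ("substitution", e1, e2)   # e_ij -> e_kz
    if in1:
        return ("deletion", e1, None)     # e_ij -> epsilon
    if in2:
        return ("insertion", None, e2)    # epsilon -> e_kz
    return None                           # no edge on either side
```

When neither graph has an edge between the matched vertex pairs, no edge operation is generated and no cost is added.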

Pruning Procedure. As in DF, pruning, or bounding, is achieved thanks to h(p), g(p) and an upper bound UB obtained at leaf nodes. Formally, for a node p in the search tree, the sum g(p)+h(p) is taken into account and compared with UB. That is, if g(p)+h(p) is less than UB, then p can be explored. Otherwise, the encountered p is pruned from OPEN and a backtracking is done looking for the next promising node, and so on until finding the best UB, which represents the optimal solution of PDFS. Note that OPEN is a local search tree of each thread. This algorithm differs from A∗ in that, at any time t, in the worst case, OPEN contains exactly |V1|·|V2| elements, and hence memory is not exhausted.
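The selection, branching and pruning rules can be put together in a compact sequential sketch. It is a simplification under explicit assumptions: only vertices are handled (edges are ignored), h(p) is fixed to 0 (a trivially valid lower bound), and the function and parameter names are hypothetical.

```python
import math

def dfs_branch_and_bound(sub_cost, del_cost, ins_cost, upper_bound=math.inf):
    """Depth-first branch and bound over vertex mappings only (edges ignored).

    sub_cost[i][j]: cost of substituting vertex i of G1 by vertex j of G2.
    del_cost[i] / ins_cost[j]: deletion / insertion costs.
    h(p) is fixed to 0 here, so pruning compares g(p) with UB.
    """
    n, m = len(del_cost), len(ins_cost)
    best = {"cost": upper_bound, "mapping": None}

    def explore(i, used, g, mapping):
        if g >= best["cost"]:            # g(p) + h(p) >= UB: prune
            return
        if i == n:                       # leaf: insert unmatched vertices of G2
            total = g + sum(ins_cost[j] for j in range(m) if j not in used)
            if total < best["cost"]:     # better complete path: update UB
                best["cost"], best["mapping"] = total, list(mapping)
            return
        # children: substitute u_i with each free vertex of G2, or delete u_i
        children = [(sub_cost[i][j], j) for j in range(m) if j not in used]
        children.append((del_cost[i], None))
        for cost, j in sorted(children, key=lambda c: c[0]):  # most promising first
            new_used = used | {j} if j is not None else used
            explore(i + 1, new_used, g + cost, mapping + [j])

    explore(0, frozenset(), 0.0, [])
    return best["cost"], best["mapping"]
```

In the mapping, entry `i` is the index of the G2 vertex matched with vertex `i` of G1, or `None` for a deletion. If the initial `upper_bound` is already optimal, no strictly better path exists and the returned mapping stays `None`.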

Upper Bound Update. The best upper bound is globally shared by all threads (shared UB). When a thread finds a better upper bound (i.e., a complete path whose cost is less than the current UB), the shared UB is updated.

Heuristic. After comparing several heuristics h(p) from the literature, we selected the bipartite graph matching heuristic proposed in Riesen and Bunke (2009). The complexity of such a method is $O(\max\{|V_1|,|V_2|\}^3 + \max\{|E_1|,|E_2|\}^3)$. For each tree node p, the unmatched vertices and edges are handled in a completely independent way.

Unmatched vertices of G1 and unmatched vertices of G2 are matched at best by solving a linear sum assignment problem. Unmatched edges of both graphs are handled analogously. Obviously, this procedure allows multiple substitutions involving the same vertex or edge and, therefore, it possibly represents an invalid way to edit the remaining part of G1 into the remaining part of G2. However, the estimated cost certainly constitutes a lower bound of the exact cost.
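The vertex part of such a lower bound can be illustrated with a tiny sketch. Two assumptions are made: the assignment problem is solved by brute force over permutations instead of a cubic-time solver (so it is only usable on very small inputs), and the square (n+m) × (n+m) cost layout commonly used for bipartite GED is adopted. The function name is hypothetical.

```python
from itertools import permutations

def assignment_lower_bound(sub_cost, del_cost, ins_cost):
    """Cheapest way to substitute, delete or insert the unmatched vertices.

    Brute-force linear sum assignment on a square (n+m) x (n+m) matrix;
    a real implementation would use a cubic-time solver instead.
    """
    n, m = len(del_cost), len(ins_cost)
    size = n + m
    inf = float("inf")

    def cost(i, j):
        if i < n and j < m:
            return sub_cost[i][j]                      # substitution block
        if i < n:
            return del_cost[i] if j - m == i else inf  # deletions on a diagonal
        if j < m:
            return ins_cost[j] if i - n == j else inf  # insertions on a diagonal
        return 0.0                                     # epsilon -> epsilon padding

    return min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )
```

Applying the same routine independently to the unmatched edges and adding both values gives an estimate that ignores structural consistency, which is why it can only underestimate the true remaining cost.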

4.3. Load balancing and communication

Load Measurement. Each thread i provides some information about its workload, or weight index, ωi. Obviously, the number of edit paths in OPEN can be a workload index. However, this choice may not be accurate since BnB computations are irregular, with different computational requirements. Several workload indices can be adopted. One could think about h(p), but h(p) can be hard to interpret: it can be small either because p is close to a leaf node or because p is a very promising solution. To eliminate this ambiguity, one can instead count the number of vertices in G1 that have not been matched yet. This is done on each edit path in the local heap. In our approach, we have selected the latter.
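A minimal sketch of this workload index, assuming each edit path is stored as a list with one entry per already processed vertex of G1, and assuming the per-path counts are summed over the local heap (both are hypothetical choices, as is the function name):

```python
def workload_index(open_list, num_vertices_g1):
    """Sum, over the local heap, of the G1 vertices still to be matched."""
    return sum(num_vertices_g1 - len(path) for path in open_list)
```

An empty local heap yields an index of zero, which corresponds to an idle thread.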

Initiation Rule. An initiation rule dictates when to initiate a load balancing operation. It is invoked when a thread's workload index ωi reaches zero, that is to say when the thread is idle.

Table 3
The characteristics of the GREC, Mutagenicity, Protein and PAH datasets.

| Dataset | GREC | Mutagenicity | Protein | PAH |
|---|---|---|---|---|
| Size | 4337 | 1100 | 660 | 484 |
| Vertex labels | x,y coordinates | Chemical symbol | Type and amino acid sequence | None |
| Edge labels | Line type | Valence | Type and length | None |
| Average vertices | 11.5 | 30.3 | 32.6 | 20.7 |
| Average edges | 12.2 | 30.8 | 62.1 | 24.4 |
| Max vertices | 25 | 417 | 126 | 28 |
| Max edges | 30 | 112 | 146 | 34 |

Table 4
The cost functions and meta parameters of the datasets.

| Dataset | GREC | Mutagenicity | Protein | PAH |
|---|---|---|---|---|
| τvertex | 90 | 11 | 11 | 3 |
| τedge | 15 | 1.1 | 1 | 3 |
| α | 0.5 | 0.25 | 0.75 | 0.5 |
| Vertex substitution function | Extended Euclidean distance | Dirac function | Extended string edit distance | 0 |
| Edge substitution function | Dirac function | Dirac function | Dirac function | 0 |
| Reference of cost functions | Riesen and Bunke (2009) | Riesen and Bunke (2009) | Riesen and Bunke (2009) | Gauzere, Brun, and Villemin (2012) |

Load Balancing Operation. In parallel BnB computations, each process solves one or more subproblems depending on the decomposition procedure. In our problem, two threads are involved in the load balancing operation: the heavy thread and the idle thread. When a thread becomes idle, the heaviest thread is in charge of giving the idle thread some edit paths to explore. All the edit paths of the heavy thread are ordered using their lb(p). The heavy thread distributes the best edit paths between itself and the idle thread. This procedure guarantees the exploration of the best edit paths first, since each thread holds some promising edit paths.
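The load balancing operation can be sketched as follows. The text does not fix how the ordered edit paths are split, so dealing them alternately between the two threads is an assumption of this sketch; `lower_bound` is a hypothetical callable returning lb(p).

```python
def rebalance(heavy_open, lower_bound):
    """Split the heavy thread's edit paths between it and an idle thread.

    Paths are ordered by lb(p) and dealt alternately (an assumption),
    so that both threads end up holding some promising paths.
    """
    ordered = sorted(heavy_open, key=lower_bound)
    kept = ordered[0::2]    # stays with the heavy thread
    given = ordered[1::2]   # handed over to the idle thread
    return kept, given
```

With four paths whose lower bounds are 1, 2, 3 and 4, the heavy thread keeps those with bounds 1 and 3 and gives away those with bounds 2 and 4.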

Threads Communication. All threads share Cv, Ce, sorted-V1 and UB. Since all threads try to find a better UB, a memory coherence protocol is required on the shared memory location of UB. When two threads simultaneously try to update UB, a synchronization process based on a mutex is applied in order to make sure that only one thread can access the resource at a given point in time.
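A mutex-protected shared upper bound can be sketched with Python's `threading.Lock`. The implementation described in the paper uses Java threads, so this is only an analogous illustration; the class and method names are hypothetical.

```python
import threading

class SharedUpperBound:
    """Upper bound shared by all worker threads, guarded by a mutex."""

    def __init__(self, initial):
        self._value = initial
        self._lock = threading.Lock()

    def try_update(self, candidate):
        """Replace UB only if the candidate is strictly better."""
        with self._lock:
            if candidate < self._value:
                self._value = candidate
                return True
            return False

    def get(self):
        with self._lock:
            return self._value
```

The comparison and the assignment happen under the same lock, so two threads that find complete edit paths at the same time cannot overwrite a better bound with a worse one.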

5. Experiments

This section aims at evaluating the proposed contribution through an experimental study that compares 9 methods in terms of precision, execution time and classification rate on reference datasets. We first describe the datasets, the methods that have been studied and the protocol. Then, the results are presented and discussed.

5.1. Datasets

To the best of our knowledge, few publicly available graph databases are dedicated to the precise evaluation of graph matching tasks. Moreover, most of these datasets consist of synthetic graphs that are not representative of PR problems concerning graph matching under noise and distortion. We shed light on the IAM graph repository, which is a widely used repository dedicated to a wide spectrum of tasks in pattern recognition and machine learning (Riesen & Bunke, 2008). Moreover, it contains graphs with both symbolic and numeric attributes, which is not often the case for other datasets. Consequently, the GED algorithms involved in the experiments are applied to three different real world graph datasets taken from the IAM repository (Riesen & Bunke, 2008), i.e., the GREC, Mutagenicity (MUTA) and Protein datasets. Continuous attributes on vertices and edges of GREC play an important role in the matching process, whereas MUTA is representative of GM problems where graphs have only symbolic attributes. On the other hand, the Protein database contains numeric attributes on each vertex as well as a string sequence that is used to represent the amino acid sequence. For the scalability experiment, the subsets of GREC, MUTA and Protein proposed in the repository GDR4GED (Abu-Aisheh, Raveaux, & Ramel, 2015) were chosen. On the other hand, for the classification experiment, the experiments were conducted on the train and test sets of each of them.

Table 5
Time constraints for accuracy evaluation in limited time.

| Dataset | GREC | MUTA | Protein | PAH |
|---|---|---|---|---|
| CT (milliseconds) | 400 | 500 | 400 | 55 |

In addition to these datasets, a chemical dataset, called PAH, taken from GREYC's Chemistry dataset repository,³ was also integrated in the experiments. This dataset is quite challenging since it has no attributes on either vertices or edges. Table 3 summarizes the characteristics of all the selected datasets.

These datasets have been chosen by carefully reviewing all the publicly available datasets that have been used in the reference works mentioned in Section 3 (LETTER, GREC, COIL, Alkane, FINGERPRINT, PAH, MUTA, PROTEIN and AIDS, to name the most frequent ones). On the basis of this review, a subset of these datasets has been chosen in order to get a good representativeness of the different graph features which can affect GED computation (size and labelling).

Each dataset has specific edit cost functions. Two non-negative meta parameters are associated with GM (τvertex and τedge), where τvertex denotes a vertex deletion or insertion cost whereas τedge denotes an edge deletion or insertion cost. A third meta parameter α is integrated to control whether the edit operation cost on the vertices or on the edges is more important. Table 4 presents the cost functions of each of the included datasets as well as their meta parameters.

5.2. Studied methods

We compared PDFS to five other GED algorithms from the literature. From the related work, we chose two exact methods and three approximate methods. On the exact method side, the A∗ algorithm applied to the GED problem (Riesen et al., 2007) is a foundation work. It is the most well-known exact method and it is often used to evaluate the accuracy of approximate methods. DF is a depth-first GED algorithm that has been recently published and that beats A∗ in terms of running time and precision (Abu-Aisheh et al., 2015). Moreover, a naive parallel PDFS, referred to as naive-PDFS, is implemented and added to the list of exact methods. The basis of naive-PDFS is similar to PDFS. However, naive-PDFS includes neither the assignment phase (see Section 4.1) nor the load balancing phase (see Section 4.3). Instead of the assignment phase, a random assignment is applied. Since naive-PDFS has no balancing strategy, a thread Ti that finishes its assigned nodes stays idle during the rest of the execution. On the approximate method side, we can distinguish three families of methods: tree-based methods, assignment-based methods and set-based methods. For the tree-based methods, the truncated version of A∗ (i.e., BS-x) was chosen, where x refers to the maximum number of open edit paths. Among the assignment-based methods, we selected BP. In Riesen and Bunke (2009), the authors demonstrated that BP is a good compromise between speed and accuracy. Finally, we picked a set-based method: an approach based on the Hausdorff matching, denoted by H, proposed in Fischer et al. (2015). All these methods cover a good range of GED solvers and return a vertex-to-vertex matching, except H, as well as a distance between two graphs G1 and G2, except the lower bound GED, which only returns a distance between two graphs.

³ https://brunl01.users.greyc.fr/CHEMISTRY/index.html

Table 6
The effect of the number of threads on the performance of PDFS.

| Method | # best found solutions | # optimal solutions | Idle Time over CPU Time |
|---|---|---|---|
| PDFS-2T | 67 | 48 | 1.7 × 10⁻⁵ |
| PDFS-4T | 79 | 54 | 9.5 × 10⁻⁵ |
| PDFS-8T | 83 | 66 | 2.8 × 10⁻⁴ |
| PDFS-16T | 92 | 69 | 7.7 × 10⁻⁴ |
| PDFS-32T | 94 | 69 | 0.011 |
| PDFS-64T | 95 | 68 | 0.043 |
| PDFS-128T | 98 | 66 | 0.169 |

Fig. 4. Time-deviation score: left (# Threads), right (# Edit Paths).

5.3. Environment

PDFS and naive-PDFS were implemented using Java threads. The evaluation of both algorithms was conducted on a 24-core Intel i5 processor at 2.10 GHz with 16 GB of memory. In PDFS, the partial edit paths are sorted in the centralized heap OPEN, as mentioned in Section 4.1. Each thread takes one edit path from OPEN, and the edit path is then deleted from OPEN. The CPU only needs to move to the next memory location, so spatial locality is exploited at best to reduce cache misses. For sequential algorithms, evaluations were conducted on one core.

5.4. Protocol

In this section, the experimental protocol is presented and the objectives of the experiment are described.

Let S be a graph dataset consisting of k graphs, S = {g1, g2, ..., gk}. Let M = Me ∪ Ma be the set of all the GED methods listed in Section 5.2, with Me = {A∗, DF, PDFS} the set of exact methods and Ma = {BP, BS-1, BS-10, BS-100, H} the set of approximate methods (where x in BS was set to 1, 10 and 100). Given a method m ∈ M, we computed all the pairwise comparisons d(gi, gj)m, where d(gi, gj)m is the value returned by method m on the graph pair (gi, gj) within certain time and memory limits.

Two types of experiments were carried out: a scalability experiment and a classification experiment.

5.4.1. Scalability experiment under time constraints

In the scalability experiment, several metrics were included: the number of best found solutions and the number of optimal solutions. Moreover, a projection of p on a two-dimensional space (R²) is achieved by using speed-score and deviation-score features, where speed and deviation are two concurrent criteria to be minimized. First, for each database, the mean deviation and the mean time are derived as follows:

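$$
\overline{dev}_p^k = \frac{1}{m \times m} \sum_{i=1}^{m} \sum_{j=1}^{m} dev(G_i, G_j)_p \quad \forall p \in P,\ k \in \#subsets \qquad (1)
$$

$$
\overline{time}_p^k = \frac{1}{m \times m} \sum_{i=1}^{m} \sum_{j=1}^{m} time(G_i, G_j)_p \quad \forall p \in P,\ (i,j) \in [\![1,m]\!]^2,\ k \in \#subsets \qquad (2)
$$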


Table 7
The effect of the number of edit paths on the performance of PDFS.

| Method | # best found solutions | # optimal solutions | Idle Time over CPU Time |
|---|---|---|---|
| PDFS-1st Floor | 82 | 53 | 0.067 |
| PDFS-100 EP | 85 | 66 | 0.067 |
| PDFS-250 EP | 85 | 41 | 0.053 |
| PDFS-500 EP | 82 | 41 | 0.023 |
| PDFS-1000 EP | 83 | 40 | 0.018 |

Table 8
The effect of the number of edit paths on the performance of PDFS when executed on the GREC dataset.

| Method | # optimal solutions | Mean CPU Time (ms) | Mean Variance (ms) |
|---|---|---|---|
| naive-PDFS | 63 | 4,792,444 | 36,1157.6 |
| PDFS | 91 | 7,275,476 | 49,880.62 |

Fig. 5. Number of best found solution under big time constraint.


Fig. 6. Number of optimal solutions under big time constraint.

where dev(Gi, Gj) is the deviation of each d(Gi, Gj) and time(Gi, Gj) is the run time of each d(Gi, Gj). To obtain comparable results between databases, mean deviations and times are normalized between 0 and 1 as follows:

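$$
deviation\_score_m = \frac{1}{\#subsets} \sum_{S \in subsets} \frac{\overline{dev}_S^m}{max\_dev_S} \qquad (3)
$$

$$
time\_score_m = \frac{1}{\#subsets} \sum_{S \in subsets} \frac{\overline{time}_S^m}{max\_time_S} \qquad (4)
$$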

where max_devS and max_timeS denote respectively the maximal mean deviation and the maximal mean execution time obtained among all the methods M on dataset S.
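The averaging and normalization steps can be sketched in Python. The data layout (an m × m matrix of per-pair values, and dictionaries keyed by subset name) and the function names are assumptions made for illustration.

```python
def mean_over_pairs(values):
    """Mean of an m x m matrix of per-pair deviations or run times (Eqs. 1-2)."""
    m = len(values)
    return sum(sum(row) for row in values) / (m * m)

def normalized_score(mean_per_subset, max_per_subset):
    """Average over subsets of mean / max-over-methods (Eqs. 3-4).

    Both arguments are dicts keyed by subset name.
    """
    ratios = [mean_per_subset[s] / max_per_subset[s] for s in mean_per_subset]
    return sum(ratios) / len(ratios)
```

For instance, a method whose mean deviation is half of the worst mean deviation on every subset obtains a deviation score of 0.5.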

All these metrics have been proposed in Abu-Aisheh et al. (2015). This experiment was decomposed into 2 tests:

Accuracy Test. The aim was to illustrate the error committed by approximate methods over exact methods. In an ideal case, no time constraint (CT) should be imposed to reach the optimal solution. Due to the large number of considered matchings and the exponential complexity of the tested algorithms, we allowed a maximum CT of 300 s. This time constraint was large enough to let the methods search deeply into the solution space and to ensure that many nodes would be explored. The key idea was to reach optimality whenever possible, or at least to get as close as possible to the optimal solution. This use case is necessary when it is important to accurately compare images represented by graphs, even if the execution time is long.

Speed Test. The goal was to evaluate the accuracy of exact methods against approximate methods when time matters, that is to say in a context of very limited time. Thus, for each dataset, we select the slowest graph comparison using an approximate method among BP and H as a first time constraint. Unlike BP and H, BS is not included as it is a tree-search algorithm which could output a solution even under a limited CT. Mathematically speaking, CT is
