HAL Id: hal-02435355
https://hal.inria.fr/hal-02435355
Submitted on 25 Sep 2020
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A trichotomy for regular simple path queries on graphs
Guillaume Bagan, Angela Bonifati, Benoit Groz
To cite this version:
Guillaume Bagan, Angela Bonifati, Benoit Groz. A trichotomy for regular simple path queries on graphs. Journal of Computer and System Sciences, Elsevier, 2020, 108, pp.29-48. ⟨10.1016/j.jcss.2019.08.006⟩. ⟨hal-02435355⟩
Journal of Computer and System Sciences
www.elsevier.com/locate/jcss
A trichotomy for regular simple path queries on graphs ✩
Guillaume Bagan a,∗, Angela Bonifati a, Benoit Groz b
a Université Lyon 1, LIRIS UMR CNRS 5205, F-69622, Lyon, France
b Université Paris Sud, LRI UMR CNRS 8623, F-91405, Orsay, France
Article info
Article history:
Received 3 June 2018
Received in revised form 23 June 2019
Accepted 19 August 2019
Available online 20 September 2019
Keywords: Graphs, Paths, Regular simple paths, Complexity, Regular languages, Automata

Abstract
We focus on the computational complexity of regular simple path queries (RSPQs). We consider the following problem RSPQ(L) for a regular language L: given an edge-labeled digraph G and two nodes x and y, is there a simple path from x to y that forms a word belonging to L? We fully characterize the frontier between tractability and intractability for RSPQ(L). More precisely, we prove RSPQ(L) is either $AC^0$, NL-complete or NP-complete depending on the language L. We also provide a simple characterization of the tractable fragment in terms of regular expressions. Finally, we also discuss the complexity of deciding whether a language L belongs to the fragment above. We consider several alternative representations of L: DFAs, NFAs or regular expressions, and prove that this problem is NL-complete for the first representation and PSpace-complete for the other two.
© 2019 Elsevier Inc. All rights reserved.
1. Introduction
Graph databases have been investigated starting from the late 80s and are now again in vogue due to their wide application scenarios, ranging from social networks to biological and scientific databases (see [2] for a survey). Regular path queries (RPQs) are one of the most notable classes of queries on graph databases. They retrieve pairs of nodes connected by a path, where the path is described through a regular expression. Such regular path queries are computable in time polynomial in both query and data complexity (combined complexity). In this paper, we investigate the computational complexity of regular simple path queries (RSPQs), a variant of RPQs in which the path connecting the pair has to be simple, i.e., does not have repeated vertices. Given an edge-labeled graph G and a regular language L, an RSPQ selects pairs of vertices connected by a simple path whose edge labels form a word in L.
The evaluation of RSPQs is NP-complete even for fixed basic languages such as $(aa)^*$ or $a^*ba^*$ [20], in sharp contrast with RPQs. RSPQs are desirable in many application scenarios [17,23,6,15,13,30], such as transportation problems, VLSI design, metabolic networks, DNA matching and routing in wireless networks. Additionally, regular simple paths have been recently considered in SPARQL 1.1 queries exhibiting property paths. In particular, recent studies on the complexity of property paths in SPARQL [3,18] have highlighted the hardness of the semantics proposed by W3C to evaluate such paths in RDF graphs. Roughly speaking, according to the semantics considered in [18], the evaluation of expressions under Kleene-star closure should return a simple path, whereas the evaluation of the remaining expressions may traverse the same vertex multiple times. As such, the semantics studied in [18] is a hybrid between the regular path and regular simple path semantics. RPQs have been recently found in practice within real-world SPARQL query logs, such as DBPedia and Wikidata query logs [8,7,9], thus showing the increasing interest of users manipulating real-world graph datasets with these queries. In particular, the majority of RPQs analyzed in large corpuses [8,9] belong to the tractable fragment $C_{tract}$ studied in this paper, further confirming the applicability and impact of our theoretical investigation.

✩ This article is an extended version of [5].
∗ Corresponding author. E-mail addresses: guillaume.bagan@liris.cnrs.fr (G. Bagan), angela.bonifati@univ-lyon1.fr (A. Bonifati), benoit.groz@lri.fr (B. Groz).
Contributions. In this paper, we address the long-standing open question [20,6] of exactly characterizing the maximal class of regular languages for which RSPQs are tractable. By “tractable” we mean computable in time polynomial in the size of the graph. Precisely, we establish a comprehensive classification of the complexity of RSPQs for a fixed regular language L. A first step towards this important issue has been made in [20]. They exhibit a tractable fragment: the class of languages closed under taking subwords. However, their fragment is not maximal.
Our contributions can be detailed as follows. We introduce a class of languages, named $C_{tract}$, for which RSPQs can be evaluated in polynomial time (data complexity), and even in NL. We then show that RSPQ evaluation is NP-complete for every regular language that does not belong to $C_{tract}$. Consequently, the maximal tractable fragment $C_{tract}$ characterizes the frontier between tractability and intractability for this problem, under the hypothesis NL ≠ NP. We further refine our results to show the following trichotomy: the evaluation of the RSPQ problem is either $AC^0$, NL-complete or NP-complete. We note that we consider the language L as a fixed parameter, thus our results characterize the data complexity of RSPQ evaluation. We also discuss the complexity of deciding, given a language L, whether the RSPQ problem for L is tractable. We consider several alternative representations of L: DFAs, NFAs or regular expressions. We prove that this problem of deciding tractability is NL-complete for the first representation and PSpace-complete for the two others.
Next, we give a characterization of the tractable fragment $C_{tract}$ for edge-labeled graphs in terms of regular expressions. We also show that $C_{tract}$ is closed under union and intersection and show that languages in $C_{tract}$ are aperiodic, i.e., can be expressed by first-order formulas [29].
We conclude with some minor results that identify further cases where RSPQs admit efficient solutions. We thus prove that RSPQs are FPT for the class of finite languages. Furthermore, we prove that the problem is also FPT for the class of all regular languages when the parameter is the size of the path. Finally, we prove that the problem RSPQ is polynomial w.r.t. combined complexity on graphs of bounded directed treewidth. This is actually a straightforward generalization of a result of [14].
A preliminary version of the present article has appeared in [5] without proofs of the main results. Here we provide detailed proofs.
Related work. A few papers deal with RSPQs or are related to them. Lapaugh et al. [16] prove that finding simple paths of even length is polynomial for undirected graphs and NP-complete for directed graphs. This study has been extended in [4] by considering paths of length i mod k. Similarly, finding k disjoint paths with extremities given as input is polynomial for undirected graphs [26] and NP-complete for directed graphs [10]. Using these results, Mendelzon and Wood prove that evaluating an RSPQ on an edge-labeled directed graph is NP-hard even for fixed languages [20]. However, they show that the problem can be decided in polynomial time for subword-closed languages. They also show that the problem becomes polynomial under some restrictions on the size of cycles of both graph and automaton. A subsequent paper [22] proves the polynomiality for the class of outerplanar graphs. Barrett et al. [6] extend this result, proving that the regular simple path problem is polynomial w.r.t. combined complexity for graphs of bounded treewidth. Barrett et al. [6] also show that the problem is NP-complete for the class of grid graphs even when the language is fixed.
We already mentioned that the complexity of evaluating SPARQL property paths has been investigated in previous studies [18,3]. As highlighted above, the semantics of SPARQL property paths blends the arbitrary paths and simple paths semantics. Losemann and Martens [18] and Arenas et al. [3] focus on the complexity of evaluating such property paths. They show that the evaluation is NP-complete in several cases, and exhibit cases in which it is polynomial. More precisely, Losemann and Martens [18] classify several fragments of property paths with respect to their complexity. Both the semantics and the query fragments are different from the ones in our paper. Counting the number of paths that match a regular expression (which is permitted for instance in SPARQL 1.1) is hard in many cases [18,3]. Recently, Martens and Trautner [19] studied the decision and enumeration problems concerning the evaluation of RPQs by considering several semantics: arbitrary paths, shortest paths, and simple paths. While we prove here a trichotomy for the data complexity of the decision problem, they focus on the data complexity of the polynomial delay enumeration problem.
Section 2 introduces and illustrates the problem. We then establish our trichotomy. We first show in Section 3 why languages outside our tractable fragment have NP-hard complexity. Section 4 discusses some properties of the tractable fragment, and shows how those properties (about strongly connected components) lead to a polynomial evaluation algorithm. And finally, Section 5 details our algorithm to deal with the tractable languages.
2. Preliminaries
Let $[n]$ denote the set of integers $\{1, \dots, n\}$ and $[n, m]$ denote the set of integers $\{n, \dots, m\}$.

2.1. Complexity

A TM refers to a Turing Machine and an NDTM refers to a nondeterministic Turing Machine. $AC^0$, L, NL, P, NP, PSpace refer to the classical complexity classes [24]. The relations $AC^0 \subseteq L \subseteq NL \subseteq P \subseteq NP \subseteq PSpace$ between these classes are well known. FL is the class of functions computable by a deterministic log-space transducer. The class $L^{NL}$ is the set of decision problems computable by a deterministic log-space algorithm with an oracle in NL. The class $FL^{NL}$ is the set of functions computable by a deterministic log-space transducer with an oracle in NL. The class $FL^{NL}(NL)$ is the set of problems reducible to a problem in NL by a function in $FL^{NL}$.

Lemma 1. [24,11] $NL = L^{NL} = FL^{NL}(NL)$.

$NL^{NL}$ is the class of decision problems computable by a nondeterministic log-space algorithm with an oracle in NL. The next lemma is true only if we make some restrictions on the oracle machine model (see [27]): the TM must write on the oracle tape deterministically, i.e., it works deterministically as soon as it starts to write on the tape until it calls the oracle. The oracle tape is erased at the end of each call.

Lemma 2. [11] $NL = NL^{NL}$.

2.2. Graphs

In our paper, we essentially consider db-graphs. A db-graph is a tuple $G = (V, \Sigma, E)$ where $V$ is a set of vertices, $\Sigma$ is a set of labels and $E \subseteq V \times \Sigma \times V$ is a set of edges labeled by symbols of $\Sigma$. Given a set $S \subseteq V$, the subgraph $G[S]$ of $G$ induced by $S$ is $(S, \Sigma, E \cap (S \times \Sigma \times S))$. A path $p$ of a db-graph $G$ from $x$ to $y$ is a sequence $(v_1 = x, a_1, \dots, v_m, a_m, v_{m+1} = y)$ such that for each $i \in [m+1]$, $v_i$ is a vertex in $G$ and for each $i \in [m]$, $(v_i, a_i, v_{i+1})$ is an edge in $G$. A path $p$ is simple if all vertices $v_i$ in $p$ are distinct. Given a language $L \subseteq \Sigma^*$, $p$ is L-labeled if $a_1 \dots a_m \in L$. Given a subset $S \subseteq V$, $p$ is S-restricted if every intermediate vertex of $p$ belongs to $S$. Given a simple path $p$ and two vertices $x$ and $y$ in $p$, $p[x, y]$ denotes the subpath of $p$ from $x$ to $y$.

2.3. Languages and automata
Let L be a regular language. Given a word $w$ and a language $L$, $w^{-1}L = \{w' : ww' \in L\}$. We denote by $A_L = (Q_L, i_L, F_L, \Delta_L)$ the minimal DFA for L, and by $M_L$ the number of states $M_L = |Q_L|$ in $A_L$. Whenever the language is obvious from context, we drop the subscript and write $M$ instead of $M_L$. We assume that $A_L$ is complete, i.e., $\Delta_L$ is a total function, so that in general $A_L$ may have a sink state. For any state $q \in Q_L$ and word $w \in \Sigma^*$, $\Delta_L(q, w)$ denotes the state obtained when reading $w$ from $q$. For any state $q \in Q_L$ and set of words $S \subseteq \Sigma^*$, $\Delta_L(q, S)$ denotes the set of states $q'$ such that there exists $w \in S$ with $q' = \Delta_L(q, w)$. Finally, $L_q$ denotes the set of all words accepted from $q$. For every state $q$ we denote by $Loop(q)$ the set of all nonempty words that loop on $q$: $Loop(q) = \{w \in \Sigma^+ \mid \Delta_L(q, w) = q\}$. We say that a state $q'$ is reachable from a state $q$ if $q' \in \Delta_L(q, \Sigma^*)$. A (strongly connected) component of $A_L$ is a maximal set of states that are pairwise reachable. The run of L (or $A_L$) over a path $p = (v_1, a_1, \dots, a_m, v_{m+1})$ is the mapping $\rho : \{v_1, \dots, v_{m+1}\} \to Q_L$ such that $\rho(v_1) = i_L$ and $\rho(v_{i+1}) = \Delta_L(i_L, a_1 \dots a_i)$ for every $i \in [m]$.

There are many characterizations of aperiodic languages [29]. A language L is aperiodic if and only if it satisfies $\Delta_L(q, w^{M+1}) = \Delta_L(q, w^M)$ for every state $q$ and word $w$. Intuitively, that means that for any state $q_0$ and word $w$, the infinite sequence $(q_0, q_1, q_2, \dots)$ with $q_{i+1} = \Delta_L(q_i, w)$ for any integer $i$ satisfies $q_{M+1} = q_M$.

2.4. Regular simple paths
Given a regular language L, we define the following problem:

RSPQ(L)
Input: A db-graph $G = (V, \Sigma, E)$, and two vertices $x, y \in V$
Question: Is there a simple L-labeled path from x to y?

For this problem, L is fixed, so we focus on data complexity. Notice that the representation of L does not matter here. Although we consider the Boolean version of the problem, namely deciding the existence of a path, our algorithms actually also return a simple L-labeled path.

The main problem that we address in this paper is to distinguish cases when RSPQ(L) is tractable (i.e. decidable in polynomial time) and when it is not (i.e. NP-hard).

2.5. The class of tractable languages
We recall that $M$ refers to the size of $Q_L$, here and henceforth. We next introduce the class $C_{tract}$ of languages. We will prove that it is exactly the class of regular languages for which RSPQ(L) is tractable.

Definition 1. A regular language L belongs to the class $C_{tract}$ if the following property is satisfied: for all pairs of states $q_1, q_2 \in Q_L$ and all words $w$ with $Loop(q_1) \neq \emptyset$, $Loop(q_2) \neq \emptyset$, $q_2 \in \Delta_L(q_1, \Sigma^*)$ and $w \in Loop(q_2)$, it holds that $w^M L_{q_2} \subseteq L_{q_1}$.

This definition is merely a technical definition for $C_{tract}$, but we will provide in Theorem 6 a more intuitive characterization of the class.

Example 1. As an introductory example, consider the language $L = a^*(bb^+ + \varepsilon)c^*$. We observe that this language belongs to our class $C_{tract}$. We wish to decide RSPQ(L), i.e., whether there exists a simple path from x to y labeled by L, given two vertices x, y of a db-graph G. It is not absolutely trivial that RSPQ(L) can be solved efficiently: RSPQ($a^*bc^*$) has indeed been proved NP-complete. Yet we outline below a polynomial algorithm for L. We distinguish two cases: there is a simple L-labeled path from x to y if and only if one of the following cases holds:
1: there exists a simple $a^*b^kc^*$-labeled path from x to y for some $k \in \{0, 2, 3\}$
2: case 1 does not hold and there exists a simple $a^*b^4b^*c^*$-labeled path from x to y.

The first case is the easiest to check. We first check whether y can be reached from x by a (not necessarily simple) $a^*c^*$-labeled path. If we find one, we obtain a simple $a^*c^*$-labeled path by eliminating its loops. Assume now there is no $a^*c^*$-labeled path from x to y. We then check as follows if there exists a simple $a^*b^kc^*$-labeled path from x to y for some $k \in \{2, 3\}$: we try every possible assignment for the k middle b-edges. For each combination of k b-edges, we check if the initial b-edge can be reached from x through an $a^*$-labeled path (avoiding the vertices of the other b-edges), and check if the final b-edge can lead to y through some $c^*$-labeled path (avoiding the vertices of the other b-edges). In the resulting $a^*b^kc^*$-labeled path the $a^*$-labeled prefix and $c^*$-labeled suffix cannot intersect (we assumed there is no $a^*c^*$ path). Consequently we obtain a simple $a^*b^kc^*$-labeled path by eliminating its loops. As the number of possible assignments for k edges ($k \leq 3$) is polynomial, we have proved that we can find out in polynomial time whether case 1 holds.

Let us now assume w.l.o.g. that there is no $a^*b^kc^*$-labeled path from x to y for $k \in \{0, 2, 3\}$. We can show that in this second case there exists a simple L-labeled path from x to y if and only if there exist six vertices $v_1, v_2, v_3, v_4, v_5, v_6$, two integers $l_a, l_b$ and two sets $S_a$, $S_b$ satisfying all following conditions:
• the vertices $v_1, \dots, v_6$ are all distinct except that $v_3$ may equal $v_4$.
• there is a b-labeled edge from $v_1$ to $v_2$, from $v_2$ to $v_3$, from $v_4$ to $v_5$, and from $v_5$ to $v_6$.
• there is an $a^*$-labeled path from x to $v_1$ avoiding all other $v_i$'s ($i > 1$). The shortest possible such path has length $l_a$.
• $S_a$ is the set of all vertices reachable from x through an $a^*$-labeled path of length at most $l_a$ that avoids all $v_i$'s ($i > 1$).
• there is a $b^*$-labeled path from $v_3$ to $v_4$ of which all vertices (but the first and last) avoid $S_a$ and the $v_i$'s. The shortest possible such path has length $l_b$.
• $S_b$ is the set of all vertices reachable from $v_3$ through any $b^*$-labeled path of length at most $l_b$ that avoids $S_a$ and all other $v_i$'s ($i \neq 4$).
• there is a $c^*$-labeled path from $v_6$ to y of which all vertices (but the first) avoid $S_a$ and $S_b$ and all other $v_i$'s ($i < 6$).

The figure below summarizes all these conditions.

These conditions can clearly be verified in time polynomial in G. It is relatively clear also that the path constructed above is an L-labeled simple path from x to y, so the conditions are sufficient to obtain an L-labeled simple path. To prove that our procedure is correct, we only have to prove that reciprocally, if there exists a simple L-labeled path we can find one satisfying our restrictions (the conditions above involving the $v_i$, $S_a$, $S_b$).
For every shortest L-labeled simple path $p$ from x to y, let $v_1, \dots, v_6$ denote the vertices that delimit the first and last two b-edges of $p$. We next show that those vertices satisfy the conditions above. The last vertex $v$ of $p$ that belongs to $S_a$ cannot occur after $v_3$ in $p$. Otherwise, we could obtain a simple path $p'$ by replacing the prefix of $p$ up to $v$ with a shorter path through $S_a$. This resulting path $p'$ would still be L-labeled by definition of the $v_i$, which contradicts the minimality of $p$. A similar argument shows that the last occurrence of a vertex from $S_b$ cannot occur after $v_6$. We conclude that the paths connecting $v_3$ to $v_4$ in $p$ (resp. $v_6$ to y) exclude all vertices from $S_a$ (resp. $S_a \cup S_b$). As a consequence, the path from x to $v_1$ will only feature vertices from $S_a$ by minimality of $p$, which proves that vertices $v_1, \dots, v_6$ satisfy the conditions above.

The crux of our approach is to construct the $a^*$, $b^*$ and $c^*$ subpaths independently, lest we enumerate exponentially many paths. This is why we require that the $b^*$ subpath avoids $S_a$: this condition is stronger than necessary to guarantee the first two subpaths do not intersect, but the stronger requirement allows us to build the two subpaths independently, as $S_a$ is a superset of the vertices on the subpath from x to $v_1$. Our algorithm for tractable instances will generalize this idea.
3. Hard languages for RSPQ

This section is devoted to the proof of a hardness result: RSPQ(L) is NP-hard for every regular language L that does not belong to $C_{tract}$. The first step toward that proof lies in the following characterization of $C_{tract}$.

Definition 2 (Witness for hardness). Let L be a regular language. A witness for hardness of L is a tuple $(w_l, w_m, w_r, w_1, w_2)$ where $w_l, w_r \in \Sigma^*$ and $w_m, w_1, w_2 \in \Sigma^+$ satisfying
• $w_l w_1^* w_m w_2^* w_r \subseteq L$
• $w_l (w_1 + w_2)^* w_r \cap L = \emptyset$.

Lemma 3. Let L be a regular language that does not belong to $C_{tract}$. Then, L admits a witness for hardness.
Proof. Let L be a regular language that does not belong to $C_{tract}$. For convenience, we distinguish two cases, depending on whether or not L satisfies the following property: $L_{q_2} \subseteq L_{q_1}$ for every $q_1, q_2 \in Q_L$ such that $q_2 \in \Delta_L(q_1, \Sigma^*)$ and $Loop(q_1) \cap Loop(q_2) \neq \emptyset$ (Property P).

Let L be a language that does not satisfy Property P. Then there exist $q, q_2, w_m, w, w_r$ such that $\Delta_L(q, w_m) = q_2$, $w \in Loop(q) \cap Loop(q_2)$, and $w_r \in L_{q_2} \setminus L_q$. Let $w_l$ be such that $\Delta_L(i_L, w_l) = q$. Then $w_l, w_m, w_r, w_1 = w_2 = w$ is a witness for hardness.

We next plan to exhibit a witness for hardness for the case where L satisfies Property P, but we first prove that every language satisfying Property P (whether in $C_{tract}$ or not) is aperiodic. Let L be a language satisfying Property P, $q \in Q_L$ and $w$ a word in $\Sigma^+$. Let also $q'$ denote the state $q' = \Delta_L(q, w^M)$. We denote by $q''$ the state $\Delta_L(q', w)$. We want to prove that $q' = q''$. By the pigeonhole principle there exist some $k_0 < k_1 \leq M$ such that $\Delta_L(q, w^{k_0}) = \Delta_L(q, w^{k_1})$. We then have $\Delta_L(q', w^k) = q'$ for $k = k_1 - k_0$. Then $q'$ and $q''$ both loop on $w^k$, so that $L_{q'} = L_{q''}$ by definition of P, hence $q' = q''$ by minimality. Consequently, L is aperiodic.

Let L be a language that satisfies Property P (and so in particular is aperiodic), but that does not belong to $C_{tract}$. By definition of $C_{tract}$ there exist states $q, q_2$ and words $w_l, w_1, w_2, w_m, w_r'$ such that $\Delta_L(i_L, w_l) = q$, $w_1 \in Loop(q)$, $w_2 \in Loop(q_2)$, $\Delta_L(q, w_m) = q_2$, $w_r' \in L_{q_2}$ and $w_2^M w_r' \notin L_q$. W.l.o.g. we can suppose that $w_1 = (w_1')^M$ for some word $w_1'$. We then claim that $L_{q'} \subseteq L_q$ for every $q'$ in $\Delta_L(q, \Sigma^* w_1)$. Indeed, for every $q' \in \Delta_L(q, \Sigma^* w_1)$, there exists some $k > 0$ such that $\Delta_L(q', w_1^k) = q'$, hence $q'$ loops over $w_1$ by aperiodicity of L. We thus have $w_1 \in Loop(q) \cap Loop(q')$ and therefore $L_{q'} \subseteq L_q$ due to Property P.

Let $w_r = w_2^M w_r'$. By definition, $w_m w_2^* w_r \subseteq L_q$ because $w_r' \in L_{q_2}$. We now prove that $(w_1 + w_2)^* w_r \cap L_q = \emptyset$. Any word in $(w_1 + w_2)^* w_r$ can be decomposed into $uv$ with $u \in \varepsilon + (w_1 + w_2)^* w_1$ and $v \in (w_2)^* w_r$. We recall that $w_r = w_2^M w_r' \notin L_q$ and L is aperiodic, so that $v \notin L_q$. Furthermore, we have just proved that $q' = \Delta_L(q, u)$ satisfies $L_{q'} \subseteq L_q$. Consequently, $v \notin L_{q'}$ and $uv \notin L_q$. Thus, $w_l$, $w_m$, $w_r$, $w_1$, and $w_2$ provide a witness for hardness, which concludes the proof of Lemma 3.

We can now prove our hardness result, by reduction from Vertex-Disjoint-Path, a problem also used in [20] to prove hardness in the particular case of $a^*ba^*$.
Vertex-Disjoint-Path
Input: A directed graph $G = (V, E)$, four vertices $x_1, y_1, x_2, y_2 \in V$
Question: Are there two disjoint paths, one from $x_1$ to $y_1$ and the other from $x_2$ to $y_2$?
Lemma 4. Let L be a regular language that does not belong to $C_{tract}$. Then, RSPQ(L) is NP-hard.

Proof. Let $L \notin C_{tract}$. We exhibit a reduction from the Vertex-Disjoint-Path problem to RSPQ(L). According to Lemma 3, L admits a witness for hardness $w_l, w_m, w_r, w_1, w_2$. By definition we get $w_l (w_1 + w_2)^* w_r \cap L = \emptyset$ and $w_l w_1^* w_m w_2^* w_r \subseteq L$. We build from $G$ a db-graph $G'$ whose edges are labeled by nonempty words. This is actually a generalization of db-graphs. Nevertheless, by adding intermediate vertices, an edge labeled by a word $w$ can be replaced with a path whose edges form the word $w$.

$G'$ is constructed as follows. The vertices of $G'$ are the same as the vertices of $G$. For each edge $(v_1, v_2)$ in $G$, we add two edges $(v_1, w_1, v_2)$ and $(v_1, w_2, v_2)$. Moreover, we add two new vertices $x, y$ and three edges $(x, w_l, x_1)$, $(y_1, w_m, x_2)$ and $(y_2, w_r, y)$. We next prove that RSPQ(L) returns True for $(G', x, y)$ iff Vertex-Disjoint-Path returns True for $(G, x_1, y_1, x_2, y_2)$.

Assume there is a simple L-labeled path $p$ from $x$ to $y$ in $G'$. By definition of $G'$, this path necessarily goes through the edge $(y_1, w_m, x_2)$ since $w_l (w_1 + w_2)^* w_r \cap L = \emptyset$. Since $p$ is simple, the subpaths from $x_1$ to $y_1$ and $x_2$ to $y_2$ are disjoint, hence Vertex-Disjoint-Path returns True for $(G, x_1, y_1, x_2, y_2)$.

Reciprocally, if Vertex-Disjoint-Path returns True for $(G, x_1, y_1, x_2, y_2)$, there exist disjoint paths from $x_1$ to $y_1$ and from $x_2$ to $y_2$. By definition these two paths match a word in $(w_1 + w_2)^*$. We can then obtain two disjoint simple paths, one from $x_1$ to $y_1$ matching a word in $w_1^*$ and one from $x_2$ to $y_2$ matching a word in $w_2^*$. To obtain those paths we keep the same vertices as the original paths, eliminate the loops if there are any, and switch $w_1$ and $w_2$ edges where needed: we can always replace a $w_1$ edge with a $w_2$ edge by construction of $G'$ since every pair of vertices is connected by both types of edges or none. Concatenating the edge $(x, w_l, x_1)$ with the first path, the edge $(y_1, w_m, x_2)$, the second path and the edge $(y_2, w_r, y)$ provides a simple L-labeled path $p$ from $x$ to $y$, which concludes our proof.

We illustrate in Fig. 1 the reduction for $L = a^*b(cc)^*d$, on an instance $(G, x_1, y_1, x_2, y_2)$, choosing $w_l = w_1 = a$, $w_m = b$, $w_2 = cc$, and $w_r = d$.

Fig. 1. Reduction for $L = a^*b(cc)^*d$.

This concludes our proof that languages outside $C_{tract}$ are intractable. After this negative result, we now focus on the positive result, namely that languages in $C_{tract}$ admit efficient algorithms.
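The construction used in the proof of Lemma 4 can be sketched in a few lines of Python. The sketch below builds the single-letter db-graph obtained from an instance of Vertex-Disjoint-Path and a witness for hardness, expanding each word-labeled edge into a path through fresh intermediate vertices. The function names, the tuple encodings `("new", ...)` and `("aux", ...)` for added vertices, and the requirement that all five witness words be nonempty (as in the example of Fig. 1) are assumptions of this illustration, not part of the original construction.

```python
import itertools
from typing import Hashable, Iterator, List, Tuple

Edge = Tuple[Hashable, str, Hashable]


def expand_word_edge(u: Hashable, word: str, v: Hashable,
                     fresh: Iterator[int]) -> List[Edge]:
    """Replace the word-labeled edge (u, word, v) by a path of one-letter edges."""
    edges: List[Edge] = []
    prev = u
    for i, letter in enumerate(word):
        is_last = i == len(word) - 1
        # Intermediate vertices are fresh; they are assumed not to clash with G.
        nxt = v if is_last else ("aux", next(fresh))
        edges.append((prev, letter, nxt))
        prev = nxt
    return edges


def reduce_vdp_to_rspq(graph_edges: List[Tuple[Hashable, Hashable]],
                       x1: Hashable, y1: Hashable,
                       x2: Hashable, y2: Hashable,
                       witness: Tuple[str, str, str, str, str]):
    """Return (edges of G', x, y) for the witness (wl, wm, wr, w1, w2)."""
    wl, wm, wr, w1, w2 = witness
    if not all((wl, wm, wr, w1, w2)):
        raise ValueError("this sketch assumes all five words are non-empty")
    x, y = ("new", "x"), ("new", "y")
    # The three connecting edges of the construction.
    word_edges = [(x, wl, x1), (y1, wm, x2), (y2, wr, y)]
    # Every edge of G is doubled: one copy labeled w1, one labeled w2.
    for u, v in graph_edges:
        word_edges.append((u, w1, v))
        word_edges.append((u, w2, v))
    fresh = itertools.count()
    edges: List[Edge] = []
    for u, word, v in word_edges:
        edges.extend(expand_word_edge(u, word, v, fresh))
    return edges, x, y
```

With the witness of Fig. 1, `reduce_vdp_to_rspq([(1, 2), (3, 4)], 1, 2, 3, 4, ("a", "b", "d", "a", "cc"))` yields a graph in which the path through 1, 2, 3, one auxiliary vertex and 4 spells aabccd, a word of $a^*b(cc)^*d$.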
4. Properties of languages in $C_{tract}$

The main result of this paper is that for every $L \in C_{tract}$, RSPQ(L) $\in$ NL. The algorithm to evaluate efficiently RSPQ(L) exploits a particular kind of pumping argument between strongly connected components of the automaton. This pumping argument proves that if we build carefully a path using the usual reachability algorithm inside the strongly connected components, then we need not care about possible intersections between subpaths corresponding to different components. In this section, we introduce and prove this pumping argument in Lemma 11 through a series of technical lemmas about the structure of automata that recognize $C_{tract}$ languages.
4.1. Alternative characterization of $C_{tract}$

To begin with, we prove that every language from $C_{tract}$ is aperiodic and deduce an alternative characterization of $C_{tract}$.

Lemma 5. Let L be a regular language in $C_{tract}$. Then L is aperiodic.

Proof. In the proof of Lemma 3 we defined a property P and showed that languages satisfying property P are aperiodic. We show that every $L \in C_{tract}$ satisfies property P. Let $L \in C_{tract}$, and let $q_1, q_2 \in Q_L$ and $w$ satisfy $q_2 \in \Delta_L(q_1, \Sigma^*)$ and $w \in Loop(q_1) \cap Loop(q_2)$. By definition of $C_{tract}$, $w^M L_{q_2} \subseteq L_{q_1}$, hence $L_{q_2} \subseteq L_{q_1}$ because $w \in Loop(q_1)$.

We then exploit this aperiodicity property to establish the following characterization of $C_{tract}$, which strengthens the requirements from Definition 1 on the loops of $A_L$.
Lemma 6. Let L be a regular language. Then, L belongs to $C_{tract}$ iff for every pair of states $q_1, q_2 \in Q_L$ such that $Loop(q_1) \neq \emptyset$, $Loop(q_2) \neq \emptyset$ and $q_2 \in \Delta_L(q_1, \Sigma^*)$, the following statement holds: $(Loop(q_2))^M L_{q_2} \subseteq L_{q_1}$.

Proof. The (if) implication is immediate by Definition 1. Let us now prove the (only if) implication. Assume $L \in C_{tract}$. Let $q_1, q_2 \in Q_L$ satisfy $Loop(q_1) \neq \emptyset$, $Loop(q_2) \neq \emptyset$, $q_2 \in \Delta_L(q_1, \Sigma^*)$, and let $w \in Loop(q_2)$. Let also $q_3$ denote the state $\Delta_L(q_1, w^M)$. Then Definition 1 implies $w^M L_{q_2} \subseteq L_{q_1}$. Thus, $L_{q_2} \subseteq L_{q_3}$. The crux of the proof is to choose carefully $q_1$, $q_2$ and $w$ to exploit the constraints on $L_{q_3}$.

Let $q_1, q_2$ be two states such that $Loop(q_1) \neq \emptyset$, $Loop(q_2) \neq \emptyset$ and $q_2 \in \Delta_L(q_1, \Sigma^*)$. Let $(v_1, \dots, v_M)$ be a sequence of words in $(Loop(q_2))^M$ and $q_3 = \Delta_L(q_1, v_1 \dots v_M)$. We wish to prove $L_{q_2} \subseteq L_{q_3}$. For some $i, j$, $0 \leq i < j \leq M$, we get $\Delta_L(q_1, v_1 \dots v_i) = \Delta_L(q_1, v_1 \dots v_j)$, using the convention $\Delta_L(q_1, v_1 \dots v_i) = q_1$ for $i = 0$. Let $u_1 = v_1 \dots v_i$, $u_2 = v_{i+1} \dots v_j$ and $u_3 = v_{j+1} \dots v_M$. Let $q_4 = \Delta_L(q_1, u_1)$. We claim that $L_{q_2} \subseteq L_{q_4}$. The result then follows from $L_{q_2} = u_3^{-1} L_{q_2} \subseteq u_3^{-1} L_{q_4} = L_{q_3}$.

To prove the claim, let $w = u_1 u_2^M$ and $q_5 = \Delta_L(q_1, w^M)$. As $\Delta_L(q_1, w^M) = q_5$ and $w \in Loop(q_2)$, we get $L_{q_2} \subseteq L_{q_5}$ through Definition 1 with $q_1$, $q_2$ and $w$. Furthermore, $u_2$ belongs to $Loop(q_5)$ because L is aperiodic. To conclude the proof, we observe that $L_{q_5} \subseteq L_{q_4}$, by Definition 1 with $q_5$, $q_4$ and $u_2$, and because $\Delta_L(q_4, u_2^M) = q_4$ and $u_2 \in Loop(q_5)$. (1)

(1) This last application of Definition 1 corresponds actually to observing that every language in $C_{tract}$ satisfies property P from Lemma 3.
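Both the aperiodicity condition of Section 2.3 and the condition of Definition 1 depend on a word $w$ only through the transformation of $Q_L$ that it induces, so they can be tested on small automata by enumerating the transition monoid. The following Python sketch does this. It assumes that the input is the minimal complete DFA of L with states numbered 0 to n-1 and a total transition dictionary `delta`; all function names are hypothetical. It is a brute-force check whose cost can be exponential in the number of states, and it is not the NL procedure mentioned in the introduction.

```python
from typing import Dict, List, Sequence, Set, Tuple

Transformation = Tuple[int, ...]  # t[q] = state reached from q


def letter_maps(n: int, alphabet: Sequence[str],
                delta: Dict[Tuple[int, str], int]) -> List[Transformation]:
    """One transformation of {0..n-1} per letter; delta must be total."""
    return [tuple(delta[(q, a)] for q in range(n)) for a in alphabet]


def transition_monoid(gens: List[Transformation]) -> Set[Transformation]:
    """Transformations induced by all non-empty words."""
    seen, todo = set(gens), list(gens)
    while todo:
        t = todo.pop()
        for g in gens:
            tg = tuple(g[q] for q in t)  # read the word of t, then one letter
            if tg not in seen:
                seen.add(tg)
                todo.append(tg)
    return seen


def power(t: Transformation, k: int) -> Transformation:
    """Transformation induced by repeating the word of t exactly k times."""
    r = tuple(range(len(t)))
    for _ in range(k):
        r = tuple(t[q] for q in r)
    return r


def is_aperiodic(n, alphabet, delta) -> bool:
    """Check Delta(q, w^(M+1)) = Delta(q, w^M) for all q and w, with M = n."""
    monoid = transition_monoid(letter_maps(n, alphabet, delta))
    return all(power(t, n + 1) == power(t, n) for t in monoid)


def included(p: int, r: int, finals: Set[int],
             gens: List[Transformation]) -> bool:
    """True iff every word accepted from p is also accepted from r."""
    seen, todo = {(p, r)}, [(p, r)]
    while todo:
        a, b = todo.pop()
        if a in finals and b not in finals:
            return False
        for g in gens:
            nxt = (g[a], g[b])
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


def in_ctract(n, alphabet, delta, finals) -> bool:
    """Brute-force test of Definition 1 on a minimal complete DFA."""
    gens = letter_maps(n, alphabet, delta)
    monoid = transition_monoid(gens)
    has_loop = [any(t[q] == q for t in monoid) for q in range(n)]
    for t in monoid:
        t_m = power(t, n)
        for q2 in range(n):
            if t[q2] != q2:  # the word of t must belong to Loop(q2)
                continue
            for q1 in range(n):
                reach = q1 == q2 or any(s[q1] == q2 for s in monoid)
                # w^M L_{q2} is included in L_{q1} iff L_{q2} is included
                # in the language of the state reached from q1 by w^M.
                if has_loop[q1] and reach and not included(q2, t_m[q1], finals, gens):
                    return False
    return True
```

On the minimal DFA of $a^*ba^*$ the sketch reports an aperiodic language outside $C_{tract}$, which shows that the converse of Lemma 5 fails, while $(aa)^*$ already fails the aperiodicity test.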
4.2. Technical lemmas on the components of $A_L$

In this section, we show properties about the components of $C_{tract}$ languages. Notice that states in a component are mutually reachable, but not reachable from states in other components that they can reach themselves. From now on, and until the end of the section, we fix a language $L \in C_{tract}$. We introduce in Lemmas 9 and 11 the pumping argument that we exploit in the algorithm to compute a simple path. In the other lemmas we prove auxiliary results, based on the decomposition of the automaton in strongly connected components. We prove that components of languages in $C_{tract}$ are very particular, in the sense that every word staying long enough in the component is synchronizing. A preliminary lemma shows that two distinct states $q_1$ and $q_2$ in the same component cannot loop on the same word.

Lemma 7. Let $q_1$ and $q_2$ be two states belonging to the same component of $A_L$. If $Loop(q_1) \cap Loop(q_2) \neq \emptyset$, then $q_1 = q_2$.

Proof. Let $q_1, q_2$ be as above, and let $w$ be a word in $Loop(q_1) \cap Loop(q_2)$. According to Definition 1, $w^M L_{q_2} \subseteq L_{q_1}$, hence $L_{q_2} \subseteq L_{q_1}$ since $w \in Loop(q_1)$. By symmetry, $L_{q_2} = L_{q_1}$, which implies $q_2 = q_1$.

The next two lemmas characterize the internal language of a component.
Lemma 8. Let $C$ be a component of $A_L$, $q_1, q_2 \in C$ and $a \in \Sigma$. Then $\Delta_L(q_1, a) \in C$ iff $\Delta_L(q_2, a) \in C$.

Proof. Let $q_1 \neq q_2$ be two states in the same component $C$. Let $a$ satisfy $\Delta_L(q_1, a) \in C$. Let also $w \in Loop(q_1) \cap a\Sigma^*$ and $q_3 = \Delta_L(q_2, w^M)$: $a$ and $w$ necessarily exist because $C$ is the strongly connected component of $q_1$ and $q_2$. We next prove that $q_3$ belongs to $C$: since $w$ starts with $a$, this implies $\Delta_L(q_2, a) \in C$ by our definition of $C$. As L is aperiodic, $w \in Loop(q_3)$, and consequently, $w^M L_{q_3} \subseteq L_{q_1}$ by Definition 1. Furthermore, $w^M L_{q_1} \subseteq L_{q_2}$ also by Definition 1. Hence $L_{q_3} \subseteq L_{q_1}$ and $L_{q_1} \subseteq (w^M)^{-1} L_{q_2} = L_{q_3}$. Thus, $L_{q_1} = L_{q_3}$ and, by minimality of $A_L$, $q_1 = q_3$, so that $q_3 \in C$.

Notation 1. We denote the internal alphabet of a component $C$ of $A_L$ by $\Sigma_C = \{a \in \Sigma : \exists q_1, q_2 \in C.\ \Delta_L(q_1, a) = q_2\}$.

As a direct consequence of Lemma 8 we get:

Lemma 9. Let $C$ be a component of $A_L$, $q \in C$ and $w \in \Sigma^*$. Then $\Delta_L(q, w) \in C$ iff $w \in (\Sigma_C)^*$.

Finally, we prove that inside a component, every word with length at least $M^2$ is synchronizing. This result is the core of our pumping argument between strongly connected components as exposed in Lemma 11.
Lemma 10. Let $C$ be a component of $A_L$, $\Sigma_C$ be the internal alphabet of $C$, $q_1, q_2$ be two states of $C$ and $w \in (\Sigma_C)^{M^2}$. Then, $\Delta_L(q_1, w) = \Delta_L(q_2, w)$.

Proof. Assume that $w = a_1 \dots a_{M^2}$. For each $i$ from 0 to $M^2$ and $\alpha = 1, 2$, let $q_{\alpha,i} = \Delta_L(q_\alpha, a_1 \dots a_i)$. Since there are at most $M^2$ distinct pairs $(q_{1,i}, q_{2,i})$, there exist $i, j$ with $i < j$ such that $q_{1,i} = q_{1,j}$ and $q_{2,i} = q_{2,j}$. By Lemma 9, $q_{1,i}, q_{2,i} \in C$. Let $w' = a_{i+1} \dots a_j$. We have $w' \in Loop(q_{1,i}) \cap Loop(q_{2,i})$, hence $q_{1,i} = q_{2,i}$ by Lemma 7. As a consequence, $\Delta_L(q_1, w) = \Delta_L(q_2, w)$.

Notice that the above lemma still holds for $w \in (\Sigma_C)^{M^2} \Sigma_C^*$. Here and thereafter, we fix the constant $N = 2M^2$.

Lemma 11. Let $q_1, q_2$ be two states such that $Loop(q_1) \neq \emptyset$, $Loop(q_2) \neq \emptyset$, and $q_2 \in \Delta_L(q_1, \Sigma^*)$. Let $C$ be the component that contains $q_2$ and $\Sigma_C$ be the internal alphabet of $C$. Then, $L_{q_2} \cap (\Sigma_C)^N \Sigma^* \subseteq L_{q_1}$.

Proof. Let $w \in L_{q_2} \cap (\Sigma_C)^N \Sigma^*$. There are some words $u, v \in (\Sigma_C)^{M^2}$, $w' \in \Sigma^*$ such that $w = uvw'$. By Lemma 9 and the pigeonhole principle, there exist a state $q_3 \in C$ and $M + 1$ nonempty words $v_1, \dots, v_{M+1}$ such that $v = v_1 \dots v_{M+1}$ and $\Delta_L(q_2, u v_1 \dots v_i) = q_3$ for every $i \in [M]$. Therefore, $w \in u v_1 (Loop(q_3))^{M-1} v_{M+1} w'$. By Lemma 10, $\Delta_L(q_3, u v_1) = \Delta_L(q_2, u v_1) = q_3$. Thus, $w$ belongs to both $(Loop(q_3))^M v_{M+1} w'$ and $L_{q_3}$. By Lemma 6, $w \in L_{q_1}$.

Our main result focuses on data complexity and therefore assumes the language (hence $N$) is constant. Yet the complexity will be exponential in $N$; therefore we next prove, for the sake of efficiency, that we can take $N = M$ in Lemma 11.

Lemma 12. Let $q_1$