
We next compare our formalism to previously proposed formalisms: query languages over bags (with no order); a query language for partially ordered multisets; and other related work. To our knowledge, however, none of these works studied the possibility or certainty problems for partially ordered data, so that our technical results do not follow from them.

Standard bag semantics. A natural desideratum for our semantics on (partially) ordered relations is that it should be a faithful extension of the bag semantics for relational algebra. We first consider the $\mathrm{BALG}^1$ language on bags [18] (the “flat fragment” of their language BALG on nested relations). We denote by $\mathrm{BALG}^1_+$ the fragment of $\mathrm{BALG}^1$ that includes the standard extension of positive relational algebra operations to bags: additive union, cross product, selection, and projection. We observe that, indeed, our semantics faithfully extends $\mathrm{BALG}^1_+$: query evaluation commutes with “forgetting” the order. Formally, for a po-relation $\Gamma$, we denote by $\mathrm{bag}(\Gamma)$ its underlying bag relation, and define likewise $\mathrm{bag}(D)$ for a po-database $D$ as the database of the underlying bag relations. For the following comparison, we identify both $\times_{\mathrm{DIR}}$ and $\times_{\mathrm{LEX}}$ with the $\times$ of [18] (as both our product operations yield the same bag as output, for any input), and we identify our union with the additive union of [18]. The following then trivially holds.

Proposition 65. For any PosRA query $Q$ and a po-relation $D$, $\mathrm{bag}(Q(D)) = Q(\mathrm{bag}(D))$, where $Q(D)$ is defined according to our semantics and $Q(\mathrm{bag}(D))$ is defined by $\mathrm{BALG}^1_+$.

Proof. There is an exact correspondence in terms of the output bags between additive union and our union; between cross product and $\times_{\mathrm{DIR}}$ and $\times_{\mathrm{LEX}}$; between our selection and that of $\mathrm{BALG}^1_+$, and similarly for projection (as noted before the statement of Proposition 65 in the main text, a technical subtlety is that the projection of BALG can only project on a single attribute, but one can encode “standard” projection on multiple attributes). The proposition follows by induction on the query structure. □
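As a toy illustration of Proposition 65 (a sketch of ours, not part of the original development), the following Python code checks on one small input that evaluating a query and then forgetting the order gives the same bag as forgetting the order first. The encoding of a po-relation as a list of values plus a set of position pairs, and the helper names `bag`, `union`, `product_dir`, `bag_union` and `bag_product`, are assumptions made for this example. In particular, `product_dir` uses a componentwise order as an assumed reading of $\times_{\mathrm{DIR}}$, whose definition lies outside this section.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a po-relation is (values, order), where `values` is a list
# of tuples and `order` is a set of pairs (i, j) of list positions meaning
# "the tuple at position i comes before the tuple at position j".

def bag(rel):
    """Forget the order: keep only the multiset of tuple values."""
    values, _order = rel
    return Counter(values)

def union(r1, r2):
    """Disjoint union of identifiers; no order is added across the inputs."""
    (v1, o1), (v2, o2) = r1, r2
    shift = len(v1)
    return v1 + v2, o1 | {(i + shift, j + shift) for i, j in o2}

def product_dir(r1, r2):
    """Product with a componentwise order (assumed reading of x_DIR)."""
    (v1, o1), (v2, o2) = r1, r2
    ids = list(product(range(len(v1)), range(len(v2))))
    pos = {pair: k for k, pair in enumerate(ids)}
    values = [v1[i] + v2[j] for i, j in ids]
    order = {(pos[a], pos[b]) for a in ids for b in ids
             if (a[0], b[0]) in o1 and (a[1], b[1]) in o2}
    return values, order

def bag_union(b1, b2):
    """Additive union on bags: multiplicities add up."""
    return b1 + b2

def bag_product(b1, b2):
    """Cross product on bags: multiplicities multiply."""
    out = Counter()
    for (t1, m1), (t2, m2) in product(b1.items(), b2.items()):
        out[t1 + t2] += m1 * m2
    return out

g1 = ([("a",), ("b",), ("a",)], {(0, 1), (1, 2), (0, 2)})  # the list a, b, a
g2 = ([("x",), ("y",)], set())                              # no order at all

# Q = (g1 union g1) x_DIR g2, evaluated on po-relations and then on bags.
lhs = bag(product_dir(union(g1, g1), g2))
rhs = bag_product(bag_union(bag(g1), bag(g1)), bag(g2))
assert lhs == rhs
```

The check only covers union and product on a single input; the proposition itself is established by the induction above, not by this test.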

The full $\mathrm{BALG}^1$ language includes additional operators such as bag intersection and subtraction, which are non-monotone and as such may not be expressed in our language: it is also unclear how they could be extended to our setting (see further discussion in “Algebra on pomsets” below). On the other hand, $\mathrm{BALG}^1$ does not include aggregation, and so $\mathrm{PosRA}^{\mathrm{acc}}$ and $\mathrm{BALG}^1$ are incomparable in terms of expressive power.

A better yardstick to compare against for accumulation could be the work of [19]: they show that their basic language BQL is equivalent to BALG, and then further extend the language with aggregate operators, to define a language called $\mathcal{NRL}^{\mathrm{aggr}}$ on nested relations. On flat relations, $\mathcal{NRL}^{\mathrm{aggr}}$ captures functions that cannot be captured in our language: in particular the average function AVG is non-associative and thus cannot be captured by our accumulation function (which anyway focuses on order-dependent functions, as POSS/CERT are trivial otherwise). On the other hand, $\mathcal{NRL}^{\mathrm{aggr}}$ cannot test parity (Corollary 5.7 in [19]) whereas this is easily captured by our accumulation operator. We conclude that $\mathcal{NRL}^{\mathrm{aggr}}$ and $\mathrm{PosRA}^{\mathrm{acc}}$ are incomparable in terms of captured transformations on bags, even when restricted to flat relations.
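The two observations above can be made concrete with a small Python sketch. It assumes that accumulation amounts to folding a monoid operation over the images of the tuples in list order; the accumulation operator of the paper is defined in an earlier section and may differ in its details, and the helper name `accumulate` is hypothetical.

```python
from functools import reduce

def accumulate(rows, h, op, neutral):
    """Fold the monoid operation `op` over h(row), following list order."""
    return reduce(op, (h(r) for r in rows), neutral)

rows = [("a",), ("b",), ("c",)]

# Parity: map every tuple to 1 and add modulo 2 (the monoid Z/2Z).
parity = accumulate(rows, lambda r: 1, lambda x, y: (x + y) % 2, 0)

# Pairwise averaging is not associative, so it is not a monoid operation.
def avg(x, y):
    return (x + y) / 2

left = avg(avg(1, 2), 3)    # 2.25
right = avg(1, avg(2, 3))   # 1.75
assert left != right
```

Here `parity` is 1 because the relation has an odd number of tuples, while the two bracketings of `avg` disagree, which is the obstacle to expressing AVG as an accumulation.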

Algebra on pomsets. We now compare our work to algebras defined on pomsets [20,21], which also attempt to bridge partial order theory and data management (although, again, these works do not study possibility and certainty). Pomsets are labeled posets quotiented by isomorphism (i.e., renaming of identifiers), like po-relations. Beyond similarities in the language design, a major conceptual difference between our formalism and that of [20,21] is that their work focuses on processing connected components of the partial order graph, and their operators are tailored for that semantics. As a consequence, their semantics is not a faithful extension of bag semantics, i.e., their language would not satisfy the counterpart of Proposition 65 (see, for instance, the semantics of duplicate elimination in [20]). By contrast, we manipulate po-relations that stand for sets of possible list relations, and our operators are designed accordingly, unlike those of [20], where transformations take into account the structure (connected components) of the entire poset graph. Because of this choice, [20] introduces non-monotone operators that we cannot express, and can design a duplicate elimination operator that cannot fail. Indeed, the possible failure of our duplicate elimination operator is a direct consequence of its semantics of operating on each possible world, possibly leading to contradictions.
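To make the phrase “sets of possible list relations” concrete, here is a brute-force Python sketch that enumerates the possible worlds of a small po-relation, i.e., the lists obtained from the linear extensions of its order. The encoding (a list of values plus a set of position pairs) and the function name `possible_worlds` are assumptions for illustration only, and the enumeration is exponential, so this is in no way an evaluation algorithm.

```python
from itertools import permutations

def possible_worlds(values, order):
    """Yield each distinct list of values given by a linear extension.

    `order` is a set of pairs (i, j) of positions in `values`, meaning that
    the tuple with identifier i must come before the tuple with identifier j.
    """
    seen = set()
    for perm in permutations(range(len(values))):
        rank = {ident: k for k, ident in enumerate(perm)}
        if all(rank[i] < rank[j] for i, j in order):
            world = tuple(values[i] for i in perm)
            if world not in seen:
                seen.add(world)
                yield world

values = ["a", "b", "a"]
order = {(0, 1)}  # the first "a" precedes "b"; the second "a" is unconstrained
worlds = set(possible_worlds(values, order))

candidate = ("a", "a", "b")
is_possible = candidate in worlds     # some possible world equals the candidate
is_certain = worlds == {candidate}    # every possible world equals the candidate
```

On this input there are two possible worlds, so the candidate list is possible but not certain.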

If we consequently disallow duplicate elimination in both languages for the sake of comparison, then the resulting fragment Pom-Alg$_{\varepsilon_n}$ of the language of [20] can yield only series-parallel outputs (Proposition 4.1 of [20]), unlike PosRA queries whose output order may be arbitrary. To formalize this, we need the notion of a realizer [5] of a poset $P = (V, <)$: this is a set of total orders $(V, <_1), \ldots, (V, <_n)$ such that, for every $x, y \in V$, we have $x < y$ iff $x <_i y$ for all $i$. We can use realizers to express arbitrary po-relations using the $\times_{\mathrm{DIR}}$-product, as is shown by rephrasing in our context an existing result on partial orders (Theorem 9.6 of [22], see also [23]).
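The definition of a realizer can be checked mechanically on small examples. The following Python sketch is an illustration under our own encoding: the strict partial order is given as a transitively closed set of pairs, each total order as a list, and the function name `is_realizer` is hypothetical.

```python
from itertools import permutations

def is_realizer(elements, less_than, total_orders):
    """Check that x < y holds iff x <_i y holds in every total order <_i.

    `less_than` is a set of pairs (x, y) for the strict partial order, assumed
    transitively closed; each total order is a list containing every element.
    """
    ranks = [{x: k for k, x in enumerate(order)} for order in total_orders]
    for x, y in permutations(elements, 2):
        in_all = all(r[x] < r[y] for r in ranks)
        if in_all != ((x, y) in less_than):
            return False
    return True

# Poset on {a, b, c} where a < c and b is incomparable to both a and c.
elements = ["a", "b", "c"]
less_than = {("a", "c")}

two_orders = is_realizer(elements, less_than, [["a", "c", "b"], ["b", "a", "c"]])
one_order = is_realizer(elements, less_than, [["a", "b", "c"]])
```

With the two total orders, every incomparable pair is reversed in at least one of them, so they form a realizer; a single total order cannot, because it would make every pair comparable.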

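Lemma 66. Let $n \in \mathbb{N}$, and let $(P, <_P)$ be a poset that has a realizer $(L_1, \ldots, L_n)$ of size $n$. Then $P$ is isomorphic to a subset $\Gamma'$ of $\Gamma = [\leq l] \times_{\mathrm{DIR}} \cdots \times_{\mathrm{DIR}} [\leq l]$, with $n$ factors in the product, for some integer $l \in \mathbb{N}$ (the order on $\Gamma'$ being the restriction on that of $\Gamma$).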

Proof. We define $\Gamma$ by taking $l := |P|$, and we identify each element $x$ of $P$ to $f(x) := (n^x_1, \ldots, n^x_n)$, where $n^x_i$ is the position where $x$ occurs in $L_i$. Now, for any $x, y \in P$, we have $x <_P y$ iff $n^x_i < n^y_i$ for all $1 \leq i \leq n$ (that is, $x <_{L_i} y$), hence iff $f(x) < f(y)$: this is because there are no two elements $x \neq y$ and $1 \leq i \leq n$ such that the $i$-th components of $f(x)$ and of $f(y)$ are the same. Hence, taking $\Gamma'$ to be the image of $f$ (which is injective), $\Gamma'$ is indeed isomorphic to $P$. □
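The embedding used in this proof can be tried out on a three-element poset. The sketch below is only an illustration: the function names `embed` and `less_componentwise` are hypothetical, and the product order is taken to be "strictly smaller in every coordinate", which is an assumption about $\times_{\mathrm{DIR}}$ (the proof notes that no two distinct elements share a coordinate, so on the image of $f$ this choice does not matter).

```python
def embed(elements, realizer):
    """Map x to the vector of its 1-based positions in each total order."""
    return {x: tuple(order.index(x) + 1 for order in realizer)
            for x in elements}

def less_componentwise(u, v):
    """Assumed product order: strictly smaller in every coordinate."""
    return all(a < b for a, b in zip(u, v))

# Poset on {a, b, c} with a < c only, and a realizer of size 2 for it.
elements = ["a", "b", "c"]
less_than = {("a", "c")}
realizer = [["a", "c", "b"], ["b", "a", "c"]]

f = embed(elements, realizer)  # vectors in [<=3] x [<=3], since l = |P| = 3

# f preserves and reflects the order, and it is injective.
for x in elements:
    for y in elements:
        if x != y:
            assert less_componentwise(f[x], f[y]) == ((x, y) in less_than)
assert len(set(f.values())) == len(elements)
```

Here `f` maps `a`, `b`, `c` to `(1, 2)`, `(3, 1)`, `(2, 3)` respectively, and only the pair `(1, 2)`, `(2, 3)` is comparable, matching `a < c`.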

This implies that PosRA queries can yield arbitrary po-relations as output.

Proposition 67. For any po-relation $\Gamma$, there is a PosRA query $Q$ with no inputs such that $Q() = \Gamma$.

Proof. We first show that for any poset $(P, <)$, there exists a $\mathrm{PosRA}_{\mathrm{DIR}}$ query $Q'$ such that the tuples of $\Gamma' := Q'()$ all have unique values and the underlying poset of $\Gamma'$ is $(P, <)$. Indeed, we can take $d$ to be the order dimension of $P$, which is necessarily finite [5], and then, by definition, $P$ has a realizer of size $d$. By Lemma 66, there is an integer $l \in \mathbb{N}$ such that $\Gamma'' := [\leq l] \times_{\mathrm{DIR}} \cdots \times_{\mathrm{DIR}} [\leq l]$ (with $d$ factors in the product) has a subset $S$ isomorphic to $(P, <)$. Hence, letting $\psi$ be a tuple predicate such that $\sigma_\psi(\Gamma'') = S$ (which can clearly be constructed by enumerating the elements of $S$), the query $Q' := \sigma_\psi(\Gamma'')$ proves the claim, with $\Gamma''$ expressed as above.

Now, to prove the desired result from this claim, build $Q$ from $Q'$ by taking its join (i.e., $\times_{\mathrm{LEX}}$-product, selection, projection) with a union of singleton constant expressions that map each unique tuple value of $Q'()$ to the desired value of the corresponding tuple in the desired po-relation $\Gamma$. This concludes the proof. □

We conclude that Pom-Alg$_{\varepsilon_n}$ does not subsume PosRA.

Incompleteness in databases. Our work is inspired by the field of incomplete information management, which has been studied for various models [24,12], in particular relational databases [25]. This field inspires our design of po-relations and our study of possibility and certainty [13,26]. However, uncertainty in these settings typically focuses on whether tuples exist or on what their values are (e.g., with nulls [27], including the novel approach of [28,29]; with c-tables [25], probabilistic databases [30] or fuzzy numerical values as in [31]). To our knowledge, though, our work is the first to study possible and certain answers in the context of order-incomplete data. Combining order incompleteness with standard tuple-level uncertainty is left as a challenge for future work. Note that some works [32,33,29] use partial orders on relations to compare the informativeness of representations. This is unrelated to our partial orders on tuples.

Ordered domains. Another line of work has studied relational data management where the domain elements are (partially) ordered [34–38]. This is in particular the case in works aiming to query sequences or support iteration (see e.g. [37,38], which however do not consider uncertainty and consequently neither partial order). However, our goal, setting and perspective are different: we see order on tuples as part of the relations, and as being constructed by applying our operators; these works see order as being given outside of the query, hence do not study the propagation of uncertainty through queries. Also, queries in such works can often directly access the order relation [36,39,37]. Some works also study uncertainty on totally ordered numerical domains [31,40], while we look at general order relations.

Temporal databases. Temporal databases [41,42] consider order on facts, but it is usually induced by timestamps, hence total. A notable exception is [43] which considers that some facts may be more current than others, with constraints leading to a partial order. In particular, they study the complexity of retrieving query answers that are certainly current, for a rich query class. In contrast, we can manipulate the order via queries, and we can also ask about aspects beyond currency, as shown throughout the paper (e.g., via accumulation).

Using preference information. Order theory has also been used to handle preference information in database systems [44–48], with some operators being the same as ours, and for rank aggregation [49,44,50], i.e., retrieving top-k query answers given multiple rankings. However, such works typically try to resolve uncertainty by reconciling many conflicting representations (e.g., via knowledge on the individual scores given by different sources and a function to aggregate them [49], or a preference function [47]). In contrast, we focus on maintaining a faithful model of all possible worlds without reconciling them, studying possible and certain answers in this respect.

Computational social choice. The notion of preferences has been studied in the domain of computational social choice to determine the possible outcomes of an election given partial preference information expressed by voters [51]. In this setting, the notions of possible winners and necessary winners have been introduced to summarize the possible outcomes, and they have been connected to the notion of possible and necessary answers of database queries [52]. The complexity of these problems has been studied, with a dichotomy result that classifies its complexity depending on the aggregation used [51,53–55]. However, the expressiveness of this computational social choice framework is incomparable to that of our framework. Specifically, their framework only studies possible and necessary answers in terms of achieving a maximal score computed as a sum of numerical values following some positional scoring rule, whereas our framework can perform accumulation in arbitrary monoids and on top of positive relational algebra queries. Conversely, there is no apparent way in our framework to encode accumulation following positional scoring rules, as we would need to apply accumulation to sum the candidate scores, and then look at the top answers according to a different order on the result.

9. Conclusion

This paper introduced an algebra for order-incomplete data. We have studied the complexity of possible and certain answers for this algebra, have shown the problems to be generally intractable, and identified several tractable cases.

A prime motivation for our work is to provide a semantics for a fragment of SQL (namely, SPJU + aggregates) in presence of partially ordered data. We see our work as a first step in this respect, and our choice of operators for the algebra is by no means the only possible one. In future work we plan to study the incorporation of additional operators, including in particular constructors of the (partial) order based on the tuple values. We will also investigate how to combine order-uncertainty with uncertainty on values, and study additional semantics for dupElim (to avoid the pitfalls of the proposed semantics which we discussed above).

In connection with the choice of operators, a natural question is whether one may achieve a completeness result. We have shown (Proposition 67) that our language is complete in terms of “individual outputs”, i.e., that PosRA can be used to construct any po-relation using only the built-in constant relations and operators. A more challenging goal is to design a language that is complete in terms of transformations, i.e., that may capture all functions over po-relations in some class. This is another intriguing topic for further investigation.

Last, many open questions remain about the complexity of POSS, e.g., we do not know whether POSS is tractable when the accumulation monoid is a finite group. Ideally, we would want to establish a dichotomy result for the complexity of POSS, and a complete syntactic characterization of cases where POSS is tractable: this is investigated further in a follow-up work involving the first author [56].

Declaration of Competing Interest

There is no conflict of interest.

Acknowledgements

We are grateful to Marzio De Biasi, to Pálvölgyi Dömötör, and to Mikhail Rudoy, from cstheory.stackexchange.com, for helpful suggestions. We are also grateful to the anonymous reviewers for their feedback that helped improve this paper.

This research was partially supported by the Israel Science Foundation (grant 1636/13), the Blavatnik ICRC, and Intel.

References

[1] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995.

[2] A. Amarilli, M.L. Ba, D. Deutch, P. Senellart, Possible and certain answers for queries over order-incomplete data, in: Proc. TIME, 2017.

[3] L.S. Colby, E.L. Robertson, L.V. Saxton, D.V. Gucht, A query language for list-based complex objects, in: PODS, 1994.

[4] L.S. Colby, L.V. Saxton, D.V. Gucht, Concepts for modeling and querying list-structured data, Inf. Process. Manag. 30 (1994).

[5] B. Schröder, Ordered Sets: An Introduction, Birkhäuser, 2003.

[6] D.R. Fulkerson, Note on Dilworth’s decomposition theorem for partially ordered sets, in: Proc. Amer. Math. Soc., 1955.

[7] R.P. Dilworth, A decomposition theorem for partially ordered sets, Ann. Math. (1950).

[8] A. Brandstädt, V.B. Le, J.P. Spinrad, Posets, in: Graph Classes, A Survey, SIAM, 1987.

[9] R.P. Stanley, Enumerative Combinatorics, Cambridge University Press, 1986.

[10] J.M. Howie, Fundamentals of Semigroup Theory, Clarendon Press, Oxford, 1995.

[11] M. Lenzerini, Data integration: a theoretical perspective, in: PODS, 2002.

[12] L. Libkin, Data exchange and incomplete information, in: PODS, 2006.

[13] L. Antova, C. Koch, D. Olteanu, World-set decompositions: Expressiveness and efficient algorithms, in: ICDT, 2007.

[14] M.K. Warmuth, D. Haussler, On the complexity of iterated shuffle, J. Comput. Syst. Sci. 28 (1984).

[15] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979.

[16] R. Fraïssé, L’intervalle en théorie des relations; ses généralisations, filtre intervallaire et clôture d’une relation, North-Holl. Math. Stud. 99 (1984).

[17] J.-E. Pin, Syntactic semigroups, in: Handbook of Formal Languages, Springer, 1997.

[18] S. Grumbach, T. Milo, Towards tractable algebras for bags, J. Comput. Syst. Sci. 52 (1996).

[19] L. Libkin, L. Wong, Query languages for bags and aggregate functions, J. Comput. Syst. Sci. 55 (1997).

[20] S. Grumbach, T. Milo, An algebra for pomsets, in: ICDT, 1995.

[21] S. Grumbach, T. Milo, An algebra for pomsets, Inf. Comput. 150 (1999).

[22] T. Hiraguchi, On the dimension of orders, Sci. Rep. Kanazawa Univ. 4 (1955).

[23] O. Øre, Partial order, in: Theory of Graphs, AMS, 1962.

[24] P. Barceló, L. Libkin, A. Poggi, C. Sirangelo, XML with incomplete information, J. ACM 58 (2010).

[25] T. Imieliński, W. Lipski, Incomplete information in relational databases, J. ACM 31 (1984).

[26] W. Lipski Jr., On semantic issues connected with incomplete information databases, ACM Trans. Database Syst. 4 (1979).

[27] E.F. Codd, Extending the database relational model to capture more meaning, ACM Trans. Database Syst. 4 (1979).

[28] L. Libkin, Incomplete data: What went wrong, and how to fix it, in: PODS, 2014.

[29] L. Libkin, SQL’s three-valued logic and certain answers, in: ICDT, 2015.

[30] D. Suciu, D. Olteanu, C. Ré, C. Koch, Probabilistic Databases, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011.

[31] M.A. Soliman, I.F. Ilyas, Ranking with uncertain scores, in: ICDE, 2009.

[32] P. Buneman, A. Jung, A. Ohori, Using powerdomains to generalize relational databases, Theor. Comput. Sci. 91 (1991).

[33] L. Libkin, A semantics-based approach to design of query languages for partial information, in: Semantics in Databases, 1998.

[34] N. Immerman, Relational queries computable in polynomial time, Inf. Control 68 (1986).

[35] W. Ng, An extension of the relational data model to incorporate ordered domains, ACM Trans. Database Syst. 26 (2001).

[36] R. van der Meyden, The complexity of querying indefinite data about linearly ordered domains, J. Comput. Syst. Sci. 54 (1997).

[37] A.J. Bonner, G. Mecca, Sequences, datalog, and transducers, J. Comput. Syst. Sci. 57 (1998) 234–259.

[38] N. Coburn, G.E. Weddell, A logic for rule-based query optimization in graph-based data models, in: DOOD, 1993, pp. 120–145.

[39] M. Benedikt, L. Segoufin, Towards a characterization of order-invariant queries over tame graphs, J. Symb. Log. 74 (2009).

[40] M.A. Soliman, I.F. Ilyas, S. Ben-David, Supporting ranking queries on uncertain and incomplete data, VLDB J. 19 (2010).

[41] J. Chomicki, D. Toman, Time in database systems, in: Handbook of Temporal Reasoning in Artificial Intelligence, Elsevier, 2005.

[42] R.T. Snodgrass, J. Gray, J. Melton, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000.

[43] W. Fan, F. Geerts, J. Wijsen, Determining the currency of data, ACM Trans. Database Syst. 37 (2012).

[44] M. Jacob, B. Kimelfeld, J. Stoyanovich, A system for management and analysis of preference data, Proc. VLDB Endow. 7 (2014).

[45] A. Arvanitis, G. Koutrika, PrefDB: supporting preferences as first-class citizens in relational databases, IEEE Trans. Knowl. Data Eng. 26 (2014).

[46] W. Kiessling, Foundations of preferences in database systems, in: VLDB, 2002.

[47] B. Alexe, M. Roth, W.-C. Tan, Preference-aware integration of temporal data, Proc. VLDB Endow. 8 (2014).

[48] K. Stefanidis, G. Koutrika, E. Pitoura, A survey on representation, composition and application of preferences in database systems, ACM Trans. Database Syst. 36 (2011).

[49] R. Fagin, A. Lotem, M. Naor, Optimal aggregation algorithms for middleware, in: PODS, 2001.

[50] C. Dwork, R. Kumar, M. Naor, D. Sivakumar, Rank aggregation methods for the Web, in: WWW, 2001.

[51] K. Konczak, J. Lang, Voting procedures with incomplete preferences, in: IJCAI-05 Workshop on Advances in Preference Handling, 2005.

[52] B. Kimelfeld, P.G. Kolaitis, J. Stoyanovich, Computational social choice meets databases, in: Proc. IJCAI, 2018.

[53] L. Xia, V. Conitzer, Determining possible and necessary winners given partial orders, J. Artif. Intell. Res. 41 (2011).

[54] N. Betzler, B. Dorn, Towards a dichotomy for the possible winner problem in elections based on scoring rules, J. Comput. Syst. Sci. 76 (2010).

[55] D. Baumeister, J. Rothe, Taking the final step to a full dichotomy of the possible winner problem in pure scoring rules, Inf. Process. Lett. 112 (2012).

[56] A. Amarilli, C. Paperman, Topological sorting under regular constraints, in: ICALP, 2018.
