We next compare our formalism to previously proposed formalisms: query languages over bags (with no order); a query language for partially ordered multisets; and other related work. To our knowledge, however, none of these works studied the possibility or certainty problems for partially ordered data, so our technical results do not follow from them.
Standard bag semantics. A natural desideratum for our semantics on (partially) ordered relations is that it should be a faithful extension of the bag semantics for relational algebra. We first consider the BALG$^1$ language on bags [18] (the “flat fragment” of their language BALG on nested relations). We denote by BALG$^1_+$ the fragment of BALG$^1$ that includes the standard extension of positive relational algebra operations to bags: additive union, cross product, selection, and projection. We observe that, indeed, our semantics faithfully extends BALG$^1_+$: query evaluation commutes with “forgetting” the order. Formally, for a po-relation $\Gamma$, we denote by $\mathrm{bag}(\Gamma)$ its underlying bag relation, and define likewise $\mathrm{bag}(D)$ for a po-database $D$ as the database of the underlying bag relations. For the following comparison, we identify both $\times_{\mathrm{DIR}}$ and $\times_{\mathrm{LEX}}$ with the $\times$ of [18] (as both our product operations yield the same bag as output, for any input), and we identify our union with the additive union of [18]. The following then trivially holds.
Proposition 65. For any PosRA query $Q$ and po-database $D$, we have $\mathrm{bag}(Q(D)) = Q(\mathrm{bag}(D))$, where $Q(D)$ is defined according to our semantics and $Q(\mathrm{bag}(D))$ is defined by BALG$^1_+$.
Proof. There is an exact correspondence in terms of the output bags between additive union and our union; between cross product and $\times_{\mathrm{DIR}}$ and $\times_{\mathrm{LEX}}$; between our selection and that of BALG$^1_+$; and similarly for projection (as noted before the statement of Proposition 65 in the main text, a technical subtlety is that the projection of BALG can only project on a single attribute, but one can encode “standard” projection on multiple attributes). The proposition follows by induction on the query structure. □
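The commutation with “forgetting” the order can be illustrated on a toy instance. The following is a minimal sketch, not the formal semantics: the encoding of a po-relation as a pair (values, order), the helper names, and the readings of the two products (componentwise for the direct product, first-component-then-second for the lexicographic one) and of union as parallel composition are assumptions made for this illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical encoding: a po-relation is a pair (values, order), where
# `values` maps tuple identifiers to tuple values and `order` is a set of
# pairs (i, j) meaning "i is strictly before j" (assumed transitively closed).

def bag(rel):
    """Forget identifiers and order, keeping only the multiset of values."""
    values, _order = rel
    return Counter(values.values())

def union(r1, r2):
    """Union read as parallel composition: no comparability across the inputs."""
    (v1, o1), (v2, o2) = r1, r2
    values = {("l", i): t for i, t in v1.items()}
    values.update({("r", i): t for i, t in v2.items()})
    order = {(("l", i), ("l", j)) for i, j in o1}
    order |= {(("r", i), ("r", j)) for i, j in o2}
    return values, order

def product_dir(r1, r2):
    """Direct product: a pair is smaller iff it is smaller in both components."""
    (v1, o1), (v2, o2) = r1, r2
    values = {(i, j): v1[i] + v2[j] for i, j in product(v1, v2)}
    order = {(p, q) for p in values for q in values
             if (p[0], q[0]) in o1 and (p[1], q[1]) in o2}
    return values, order

def product_lex(r1, r2):
    """Lexicographic product: compare first components, then second ones."""
    (v1, o1), (v2, o2) = r1, r2
    values = {(i, j): v1[i] + v2[j] for i, j in product(v1, v2)}
    order = {(p, q) for p in values for q in values
             if (p[0], q[0]) in o1 or (p[0] == q[0] and (p[1], q[1]) in o2)}
    return values, order

def bag_product(b1, b2):
    """Cross product on bags: multiplicities multiply."""
    out = Counter()
    for (t1, c1), (t2, c2) in product(b1.items(), b2.items()):
        out[t1 + t2] += c1 * c2
    return out

R = ({0: ("a",), 1: ("b",)}, {(0, 1)})   # chain a < b
S = ({0: ("c",), 1: ("c",)}, set())      # two incomparable copies of c

same_bag = bag(product_dir(R, S)) == bag(product_lex(R, S)) == bag_product(bag(R), bag(S))
same_order = product_dir(R, S)[1] == product_lex(R, S)[1]
union_ok = bag(union(R, S)) == bag(R) + bag(S)
```

On this instance the two products carry different orders but the same underlying bag, which is why both can be identified with the bag cross product for the purposes of the comparison.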
The full BALG$^1$ language includes additional operators such as bag intersection and subtraction, which are non-monotone and as such may not be expressed in our language: it is also unclear how they could be extended to our setting (see further discussion in “Algebra on pomsets” below). On the other hand, BALG$^1$ does not include aggregation, and so PosRA$^{\mathrm{acc}}$ and BALG$^1$ are incomparable in terms of expressive power.
A better yardstick to compare against for accumulation could be the work of [19]: they show that their basic language BQL is equivalent to BALG, and then further extend the language with aggregate operators, to define a language called NRL$^{\mathrm{aggr}}$ on nested relations. On flat relations, NRL$^{\mathrm{aggr}}$ captures functions that cannot be captured in our language: in particular, the average function AVG is non-associative and thus cannot be captured by our accumulation function (which anyway focuses on order-dependent functions, as POSS/CERT are trivial otherwise). On the other hand, NRL$^{\mathrm{aggr}}$ cannot test parity (Corollary 5.7 in [19]), whereas this is easily captured by our accumulation operator. We conclude that NRL$^{\mathrm{aggr}}$ and PosRA$^{\mathrm{acc}}$ are incomparable in terms of captured transformations on bags, even when restricted to flat relations.
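Both halves of this comparison can be checked on small inputs. The sketch below assumes that accumulation folds a list relation in a monoid after mapping each tuple and its position to a monoid element; the names `accumulate`, `parity` and `avg2` are hypothetical, and `avg2` is only the binary averaging operation, used to show why it cannot serve as a monoid operation.

```python
from functools import reduce

def accumulate(tuples, h, op, neutral):
    """Fold a list relation in a monoid (op, neutral).

    h maps a tuple and its 1-based position to a monoid element.
    """
    return reduce(op, (h(t, i) for i, t in enumerate(tuples, start=1)), neutral)

def parity(tuples):
    """Parity of the number of tuples, accumulated in the monoid (Z/2Z, +)."""
    return accumulate(tuples, lambda t, i: 1, lambda a, b: (a + b) % 2, 0)

def avg2(a, b):
    """Binary average: not associative, hence not a monoid operation."""
    return (a + b) / 2

odd = parity([("a",), ("b",), ("c",)])
even = parity([("a",), ("b",)])
left = avg2(avg2(1, 2), 3)    # groups as (1 avg 2) avg 3
right = avg2(1, avg2(2, 3))   # groups as 1 avg (2 avg 3)
```

Here `left` and `right` differ, so the result of folding with `avg2` would depend on how the fold is parenthesized, whereas the parity fold is well defined for any grouping.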
Algebra on pomsets. We now compare our work to algebras defined on pomsets [20,21], which also attempt to bridge partial order theory and data management (although, again, these works do not study possibility and certainty). Pomsets are labeled posets quotiented by isomorphism (i.e., renaming of identifiers), like po-relations. Beyond similarities in the language design, a major conceptual difference between our formalism and that of [20,21] is that their work focuses on processing connected components of the partial order graph, and their operators are tailored for that semantics. As a consequence, their semantics is not a faithful extension of bag semantics, i.e., their language would not satisfy the counterpart of Proposition 65 (see, for instance, the semantics of duplicate elimination in [20]). By contrast, we manipulate po-relations that stand for sets of possible list relations, and our operators are designed accordingly, unlike those of [20], where transformations take into account the structure (connected components) of the entire poset graph. Because of this choice, [20] introduces non-monotone operators that we cannot express, and can design a duplicate elimination operator that cannot fail. Indeed, the possible failure of our duplicate elimination operator is a direct consequence of its semantics of operating on each possible world, possibly leading to contradictions.
If we consequently disallow duplicate elimination in both languages for the sake of comparison, then the resulting fragment Pom-Alg$_{\varepsilon_n}$ of the language of [20] can yield only series-parallel outputs (Proposition 4.1 of [20]), unlike PosRA queries whose output order may be arbitrary. To formalize this, we need the notion of a realizer [5] of a poset $P = (V, <)$: this is a set of total orders $(V, <_1), \ldots, (V, <_n)$ such that, for every $x, y \in V$, we have $x < y$ iff $x <_i y$ for all $i$. We can use realizers to express arbitrary po-relations using the $\times_{\mathrm{DIR}}$-product, as is shown by rephrasing in our context an existing result on partial orders (Theorem 9.6 of [22], see also [23]).
Lemma 66. Let $n \in \mathbb{N}$, and let $(P, <_P)$ be a poset that has a realizer $(L_1, \ldots, L_n)$ of size $n$. Then $P$ is isomorphic to a subset $\Gamma'$ of $\Gamma = [\leq l] \times_{\mathrm{DIR}} \cdots \times_{\mathrm{DIR}} [\leq l]$, with $n$ factors in the product, for some integer $l \in \mathbb{N}$ (the order on $\Gamma'$ being the restriction of that of $\Gamma$).
Proof. We define $\Gamma$ by taking $l := |P|$, and we identify each element $x$ of $P$ with $f(x) := (n^x_1, \ldots, n^x_n)$, where $n^x_i$ is the position where $x$ occurs in $L_i$. Now, for any $x, y \in P$, we have $x <_P y$ iff $n^x_i < n^y_i$ for all $1 \leq i \leq n$ (that is, $x <_{L_i} y$), hence iff $f(x) <_\Gamma f(y)$: this is because there are no two elements $x \neq y$ and $1 \leq i \leq n$ such that the $i$-th components of $f(x)$ and of $f(y)$ are the same. Hence, taking $\Gamma'$ to be the image of $f$ (which is injective), $\Gamma'$ is indeed isomorphic to $P$. □
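The embedding used in this proof can be traced on a three-element poset. This is a minimal sketch under stated assumptions: the poset (a < c, with b incomparable to both), its realizer of size 2, and the helper names `embed` and `less_dir` are chosen for illustration, and the order of the product of chains is read as strict comparison on every component.

```python
from itertools import permutations

def embed(realizer):
    """Map each element x to f(x) = (position of x in L_1, ..., position of x in L_n)."""
    elements = realizer[0]
    return {x: tuple(L.index(x) + 1 for L in realizer) for x in elements}

def less_dir(u, v):
    """Strict order of the product of chains: smaller on every component."""
    return all(a < b for a, b in zip(u, v))

# Illustrative poset: a < c, and b is incomparable to both a and c.
order_P = {("a", "c")}
# A realizer of size 2: the intersection of these two total orders is order_P.
realizer = [["b", "a", "c"], ["a", "c", "b"]]

f = embed(realizer)
# Order induced on the image of f by the componentwise comparison.
induced = {(x, y) for x, y in permutations(f, 2) if less_dir(f[x], f[y])}
```

The induced order on the image of `f` coincides with `order_P`, and `f` is injective because two distinct elements never share a position in a total order.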
This implies that PosRA queries can yield arbitrary po-relations as output.
Proposition 67. For any po-relation $\Gamma$, there is a PosRA query $Q$ with no inputs such that $Q() = \Gamma$.
Proof. We first show that for any poset $(P, <)$, there exists a PosRA$_{\mathrm{DIR}}$ query $Q'$ such that the tuples of $\Gamma' := Q'()$ all have unique values and the underlying poset of $\Gamma'$ is $(P, <)$. Indeed, we can take $d$ to be the order dimension of $P$, which is necessarily finite [5], and then, by definition, $P$ has a realizer of size $d$. By Lemma 66, there is an integer $l \in \mathbb{N}$ such that $\Gamma'' := [\leq l] \times_{\mathrm{DIR}} \cdots \times_{\mathrm{DIR}} [\leq l]$ (with $d$ factors in the product) has a subset $S$ isomorphic to $(P, <)$. Hence, letting $\psi$ be a tuple predicate such that $\sigma_\psi(\Gamma'') = S$ (which can clearly be constructed by enumerating the elements of $S$), the query $Q' := \sigma_\psi(\Gamma'')$ proves the claim, with $\Gamma''$ expressed as above.
Now, to prove the desired result from this claim, build $Q$ from $Q'$ by taking its join (i.e., $\times_{\mathrm{LEX}}$-product, selection, projection) with a union of singleton constant expressions that map each unique tuple value of $Q'()$ to the desired value of the corresponding tuple in the desired po-relation. This concludes the proof. □
We conclude that Pom-Alg$_{\varepsilon_n}$ does not subsume PosRA.
Incompleteness in databases. Our work is inspired by the field of incomplete information management, which has been studied for various models [24,12], in particular relational databases [25]. This field inspires our design of po-relations and our study of possibility and certainty [13,26]. However, uncertainty in these settings typically focuses on whether tuples exist or on what their values are (e.g., with nulls [27], including the novel approach of [28,29]; with c-tables [25], probabilistic databases [30] or fuzzy numerical values as in [31]). To our knowledge, though, our work is the first to study possible and certain answers in the context of order-incomplete data. Combining order incompleteness with standard tuple-level uncertainty is left as a challenge for future work. Note that some works [32,33,29] use partial orders on relations to compare the informativeness of representations. This is unrelated to our partial orders on tuples.
Ordered domains. Another line of work has studied relational data management where the domain elements are (partially) ordered [34–38]. This is in particular the case in works aiming to query sequences or support iteration (see e.g. [37,38], which however do not consider uncertainty and consequently do not consider partial orders either). However, our goal, setting and perspective are different: we see order on tuples as part of the relations, and as being constructed by applying our operators; these works see order as being given outside of the query, hence do not study the propagation of uncertainty through queries. Also, queries in such works can often directly access the order relation [36,39,37]. Some works also study uncertainty on totally ordered numerical domains [31,40], while we look at general order relations.
Temporal databases. Temporal databases [41,42] consider order on facts, but it is usually induced by timestamps, hence total. A notable exception is [43], which considers that some facts may be more current than others, with constraints leading to a partial order. In particular, they study the complexity of retrieving query answers that are certainly current, for a rich query class. In contrast, we can manipulate the order via queries, and we can also ask about aspects beyond currency, as shown throughout the paper (e.g., via accumulation).
Using preference information. Order theory has also been used to handle preference information in database systems [44–48], with some operators being the same as ours, and for rank aggregation [49,44,50], i.e., retrieving top-k query answers given multiple rankings. However, such works typically try to resolve uncertainty by reconciling many conflicting representations (e.g., via knowledge on the individual scores given by different sources and a function to aggregate them [49], or a preference function [47]). In contrast, we focus on maintaining a faithful model of all possible worlds without reconciling them, studying possible and certain answers in this respect.
Computational social choice. The notion of preferences has been studied in the domain of computational social choice to determine the possible outcomes of an election given partial preference information expressed by voters [51]. In this setting, the notions of possible winners and necessary winners have been introduced to summarize the possible outcomes, and they have been connected to the notion of possible and necessary answers of database queries [52]. The complexity of these problems has been studied, with a dichotomy result that classifies their complexity depending on the aggregation used [51,53–55]. However, the expressiveness of this computational social choice framework is incomparable to that of our framework. Specifically, their framework only studies possible and necessary answers in terms of achieving a maximal score computed as a sum of numerical values following some positional scoring rule, whereas our framework can perform accumulation in arbitrary monoids and on top of positive relational algebra queries. Conversely, there is no apparent way in our framework to encode accumulation following positional scoring rules, as we would need to apply accumulation to sum the candidate scores, and then look at the top answers according to a different order on the result.
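To make the notions of possible and necessary winners concrete, the sketch below enumerates all completions of a small profile of partial votes by brute force. It is an illustration only, under assumptions not fixed by the text above: the Borda rule is taken as the positional scoring rule, a winner is any candidate with maximal total score (ties allowed), and all function names are hypothetical. The enumeration is exponential and is not meant to reflect the algorithms of the cited works.

```python
from itertools import permutations, product

def linear_extensions(candidates, prefs):
    """All total rankings (best first) compatible with a partial vote.

    prefs is a set of pairs (a, b) meaning "a is preferred to b".
    """
    return [r for r in permutations(candidates)
            if all(r.index(a) < r.index(b) for a, b in prefs)]

def borda_scores(profile, candidates):
    """Sum of Borda points: m-1 for the top position, down to 0 for the last."""
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in profile:
        for pos, c in enumerate(ranking):
            scores[c] += m - 1 - pos
    return scores

def winners(scores):
    """Candidates achieving the maximal total score (ties allowed)."""
    best = max(scores.values())
    return {c for c, s in scores.items() if s == best}

def possible_and_necessary_winners(candidates, partial_votes):
    """Winners in some completion, and winners in every completion."""
    possible, necessary = set(), set(candidates)
    completions = product(*(linear_extensions(candidates, v) for v in partial_votes))
    for profile in completions:
        w = winners(borda_scores(profile, candidates))
        possible |= w
        necessary &= w
    return possible, necessary

candidates = ["a", "b", "c"]
votes = [{("a", "b"), ("b", "c")},   # fully specified vote: a > b > c
         {("b", "a")}]               # partial vote: only b > a is known
possible, necessary = possible_and_necessary_winners(candidates, votes)
```

In this instance every candidate wins in at least one completion, but only b wins in all of them. The final step (comparing the summed scores to pick the top candidates) is the part that, as noted above, has no apparent counterpart in our accumulation operator.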
9. Conclusion
This paper introduced an algebra for order-incomplete data. We have studied the complexity of possible and certain answers for this algebra, have shown the problems to be generally intractable, and identified several tractable cases.
A prime motivation for our work is to provide a semantics for a fragment of SQL (namely, SPJU + aggregates) in the presence of partially ordered data. We see our work as a first step in this respect, and our choice of operators for the algebra is by no means the only possible one. In future work we plan to study the incorporation of additional operators, including in particular constructors of the (partial) order based on the tuple values. We will also investigate how to combine order-uncertainty with uncertainty on values, and study additional semantics for dupElim (to avoid the pitfalls of the proposed semantics which we discussed above).
In connection with the choice of operators, a natural question is whether one may achieve a completeness result. We have shown (Proposition 67) that our language is complete in terms of “individual outputs”, i.e., that PosRA can be used to construct any po-relation using only the built-in constant relations and operators. A more challenging goal is to design a language that is complete in terms of transformations, i.e., that may capture all functions over po-relations in some class. This is another intriguing topic for further investigation.
Last, many open questions remain about the complexity of POSS, e.g., we do not know whether POSS is tractable when the accumulation monoid is a finite group. Ideally, we would want to establish a dichotomy result for the complexity of POSS, and a complete syntactic characterization of cases where POSS is tractable: this is investigated further in a follow-up work involving the first author [56].
Declaration of Competing Interest
There is no conflict of interest.
Acknowledgements
We are grateful to Marzio De Biasi, to Pálvölgyi Dömötör, and to Mikhail Rudoy, from cstheory.stackexchange.com, for helpful suggestions. We are also grateful to the anonymous reviewers for their feedback that helped improve this paper.
This research was partially supported by the Israel Science Foundation (grant 1636/13), the Blavatnik ICRC, and Intel.
References
[1] S. Abiteboul, R. Hull, V. Vianu, Foundations of Databases, Addison-Wesley, 1995.
[2] A. Amarilli, M.L. Ba, D. Deutch, P. Senellart, Possible and certain answers for queries over order-incomplete data, in: Proc. TIME, 2017.
[3] L.S. Colby, E.L. Robertson, L.V. Saxton, D.V. Gucht, A query language for list-based complex objects, in: PODS, 1994.
[4] L.S. Colby, L.V. Saxton, D.V. Gucht, Concepts for modeling and querying list-structured data, Inf. Process. Manag. 30 (1994).
[5] B. Schröder, Ordered Sets: An Introduction, Birkhäuser, 2003.
[6] D.R. Fulkerson, Note on Dilworth’s decomposition theorem for partially ordered sets, in: Proc. Amer. Math. Soc., 1955.
[7] R.P. Dilworth, A decomposition theorem for partially ordered sets, Ann. Math. (1950).
[8] A. Brandstädt, V.B. Le, J.P. Spinrad, Posets, in: Graph Classes, A Survey, SIAM, 1987.
[9] R.P. Stanley, Enumerative Combinatorics, Cambridge University Press, 1986.
[10] J.M. Howie, Fundamentals of Semigroup Theory, Clarendon Press, Oxford, 1995.
[11] M. Lenzerini, Data integration: a theoretical perspective, in: PODS, 2002.
[12] L. Libkin, Data exchange and incomplete information, in: PODS, 2006.
[13] L. Antova, C. Koch, D. Olteanu, World-set decompositions: Expressiveness and efficient algorithms, in: ICDT, 2007.
[14] M.K. Warmuth, D. Haussler, On the complexity of iterated shuffle, J. Comput. Syst. Sci. 28 (1984).
[15] M.R. Garey, D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, 1979.
[16] R. Fraïssé, L’intervalle en théorie des relations; ses généralisations, filtre intervallaire et clôture d’une relation, North-Holl. Math. Stud. 99 (1984).
[17] J.-E. Pin, Syntactic semigroups, in: Handbook of Formal Languages, Springer, 1997.
[18] S. Grumbach, T. Milo, Towards tractable algebras for bags, J. Comput. Syst. Sci. 52 (1996).
[19] L. Libkin, L. Wong, Query languages for bags and aggregate functions, J. Comput. Syst. Sci. 55 (1997).
[20] S. Grumbach, T. Milo, An algebra for pomsets, in: ICDT, 1995.
[21] S. Grumbach, T. Milo, An algebra for pomsets, Inf. Comput. 150 (1999).
[22] T. Hiraguchi, On the dimension of orders, Sci. Rep. Kanazawa Univ. 4 (1955).
[23] O. Øre, Partial order, in: Theory of Graphs, AMS, 1962.
[24] P. Barceló, L. Libkin, A. Poggi, C. Sirangelo, XML with incomplete information, J. ACM 58 (2010).
[25] T. Imieliński, W. Lipski, Incomplete information in relational databases, J. ACM 31 (1984).
[26] W. Lipski Jr., On semantic issues connected with incomplete information databases, ACM Trans. Database Syst. 4 (1979).
[27] E.F. Codd, Extending the database relational model to capture more meaning, ACM Trans. Database Syst. 4 (1979).
[28] L. Libkin, Incomplete data: What went wrong, and how to fix it, in: PODS, 2014.
[29] L. Libkin, SQL’s three-valued logic and certain answers, in: ICDT, 2015.
[30] D. Suciu, D. Olteanu, C. Ré, C. Koch, Probabilistic Databases, Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011.
[31] M.A. Soliman, I.F. Ilyas, Ranking with uncertain scores, in: ICDE, 2009.
[32] P. Buneman, A. Jung, A. Ohori, Using powerdomains to generalize relational databases, Theor. Comput. Sci. 91 (1991).
[33] L. Libkin, A semantics-based approach to design of query languages for partial information, in: Semantics in Databases, 1998.
[34] N. Immerman, Relational queries computable in polynomial time, Inf. Control 68 (1986).
[35] W. Ng, An extension of the relational data model to incorporate ordered domains, ACM Trans. Database Syst. 26 (2001).
[36] R. van der Meyden, The complexity of querying indefinite data about linearly ordered domains, J. Comput. Syst. Sci. 54 (1997).
[37] A.J. Bonner, G. Mecca, Sequences, datalog, and transducers, J. Comput. Syst. Sci. 57 (1998) 234–259.
[38] N. Coburn, G.E. Weddell, A logic for rule-based query optimization in graph-based data models, in: DOOD, 1993, pp. 120–145.
[39] M. Benedikt, L. Segoufin, Towards a characterization of order-invariant queries over tame graphs, J. Symb. Log. 74 (2009).
[40] M.A. Soliman, I.F. Ilyas, S. Ben-David, Supporting ranking queries on uncertain and incomplete data, VLDB J. 19 (2010).
[41] J. Chomicki, D. Toman, Time in database systems, in: Handbook of Temporal Reasoning in Artificial Intelligence, Elsevier, 2005.
[42] R.T. Snodgrass, J. Gray, J. Melton, Developing Time-Oriented Database Applications in SQL, Morgan Kaufmann, 2000.
[43] W. Fan, F. Geerts, J. Wijsen, Determining the currency of data, ACM Trans. Database Syst. 37 (2012).
[44] M. Jacob, B. Kimelfeld, J. Stoyanovich, A system for management and analysis of preference data, Proc. VLDB Endow. 7 (2014).
[45] A. Arvanitis, G. Koutrika, PrefDB: supporting preferences as first-class citizens in relational databases, IEEE Trans. Knowl. Data Eng. 26 (2014).
[46] W. Kiessling, Foundations of preferences in database systems, in: VLDB, 2002.
[47] B. Alexe, M. Roth, W.-C. Tan, Preference-aware integration of temporal data, Proc. VLDB Endow. 8 (2014).
[48] K. Stefanidis, G. Koutrika, E. Pitoura, A survey on representation, composition and application of preferences in database systems, ACM Trans. Database Syst. 36 (2011).
[49] R. Fagin, A. Lotem, M. Naor, Optimal aggregation algorithms for middleware, in: PODS, 2001.
[50] C. Dwork, R. Kumar, M. Naor, D. Sivakumar, Rank aggregation methods for the Web, in: WWW, 2001.
[51] K. Konczak, J. Lang, Voting procedures with incomplete preferences, in: IJCAI-05 Workshop on Advances in Preference Handling, 2005.
[52] B. Kimelfeld, P.G. Kolaitis, J. Stoyanovich, Computational social choice meets databases, in: Proc. IJCAI, 2018.
[53] L. Xia, V. Conitzer, Determining possible and necessary winners given partial orders, J. Artif. Intell. Res. 41 (2011).
[54] N. Betzler, B. Dorn, Towards a dichotomy for the possible winner problem in elections based on scoring rules, J. Comput. Syst. Sci. 76 (2010).
[55] D. Baumeister, J. Rothe, Taking the final step to a full dichotomy of the possible winner problem in pure scoring rules, Inf. Process. Lett. 112 (2012).
[56] A. Amarilli, C. Paperman, Topological sorting under regular constraints, in: ICALP, 2018.