An algebraic approach to recursive queries in relational databases

Data-Driven Publication of Relational Databases

Some recent research works attempt to enable a composition of XML export and XML publishing languages [16, 11]. SilkRoute, for instance [16], combines an RXL query that exports parts of a relational database in XML using correlated SQL queries, and an XML-QL query over the exported document. The composition algorithm avoids a full materialization. Although our approach is, to some extent, similar, our motivation differs on at least two important points. First, we do not consider a middleware-based architecture, in which relational data needs a preliminary transformation (into XML) in order to be accessible by other applications. Consequently, we avoid the issues raised by the combination of two languages (e.g., XSQL/XSLT, or RXL/XML-QL) and by the necessary infrastructure. Second, we advocate a high-level specification of the navigation in the database, from which we derive an appropriate embedded SQL program. In summary, with DocQL we propose a direct, lightweight approach that navigates the database directly.

Provenance and Probabilities in Relational Databases: From Theory to Practice

Exploiting the data structure. An alternative is to exploit the fact that the structure of the data is not arbitrary. Indeed, if the data has the structure of a tree, or has low treewidth, meaning that its structure is close to that of a tree, it has been shown [2, 3] that a bounded-treewidth provenance circuit can be constructed, which in turn supports tractable query evaluation. This line of technique has been successfully applied to synthetic [40] and real-world [38] data, for specific kinds of queries. When none of this is feasible, one can resort to general knowledge compilation techniques [18]. Knowledge compilation is the problem of transforming Boolean functions of a certain form into another, more tractable, form. Over the years, a wide variety of techniques, results, heuristics, and tools have emerged from the knowledge compilation community. In particular, tools such as c2d [17], Dsharp [41], and d4 [35] compile arbitrary formulas in conjunctive normal form.
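As a minimal illustration of why these techniques matter, the sketch below (an invented example, not from the paper) computes the probability of a query answer from its Boolean provenance by brute-force enumeration of possible worlds; this exhaustive computation is exactly what bounded-treewidth provenance circuits and knowledge compilation aim to avoid on large instances.

```python
from itertools import product

# Toy setting: each base tuple t1..t3 exists independently with a given probability.
probs = {"t1": 0.9, "t2": 0.5, "t3": 0.4}

# Boolean provenance of one query answer, e.g. (t1 AND t2) OR t3
# for the union of two join results.
def provenance(world):
    return (world["t1"] and world["t2"]) or world["t3"]

# Brute-force probability: sum the weights of all 2^n possible worlds
# in which the provenance formula is true. This is the #P-hard baseline.
def answer_probability():
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for t, present in world.items():
            weight *= probs[t] if present else 1.0 - probs[t]
        if provenance(world):
            total += weight
    return total

print(round(answer_probability(), 4))  # 0.45 + 0.4 - 0.18 = 0.67
```

With independent tuples this agrees with inclusion-exclusion; circuits in tractable form (e.g. d-DNNF) let the same probability be read off in time linear in the circuit instead of exponential in the number of tuples.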

Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

Validation of the SPARQL query results. We compared the data retrieval resulting from the three approaches (i.e., the Dijkstra algorithm, BioSemantic, and a human SQL query builder) (Table 4). We refer to a human SQL query as a query manually written by an expert with good knowledge of the database schema. A first general observation is that the number of results is identical for BioSemantic queries and the manual SQL queries. BioSemantic globally retrieves more results than the Dijkstra algorithm. The gap for Query1 is explained by the inheritance relationships missed by the Dijkstra algorithm. Indeed, in that case, BioSemantic detects these relationships and regroups the subdivided paths into the final query. Furthermore, BioSemantic preferentially selects binary association tables, which promote more data retrieval. Both Query2 and Query3 correspond to a short path without inheritance but with several paths having the same number of nodes. In that case, weighting the BioSemantic path favours binary associations, whereas the Dijkstra algorithm chooses the first detected path having a minimum node number. For Query2, BioSemantic favours the detection of a more pertinent path, whereas the same paths are detected for Query3. For Query4, no equivalent path leads to the same results; in other words, both algorithms select the same path. In each case, we manually verified that the retrieved data were identical.

Ontology-Mediated Queries for NOSQL Databases

The NO-RL rule language operates on key-value records and can be compared to existing languages designed for reasoning on nested structures. For instance, Frame Logic (Kifer, Lausen, and Wu 1995; Kifer 2005) provides a logical foundation for frame-based and object-oriented languages for data and knowledge representation. Its expressivity captures the NO-RL language, but there are no computational guarantees. The Elog rule language (Baumgartner et al. 2001) underlying the Lixto system (Baumgartner et al. 2003) is a fragment of monadic Datalog (Gottlob and Koch 2004) that has been specifically designed for extracting tree-shaped data from HTML pages. These rules are similar to the ∀-rules of the NO-RL(2) fragment. They are used in a forward-chaining manner to generate new logical structures that are then exported to XML. Active XML (Abiteboul, Benjelloun, and Milo 2004) is a formalism for modelling distributed systems represented as trees with function calls for tasks such as sending, receiving, and querying data (like web services). Active XML function calls can be assimilated to NO-RL(3) rules applied in a forward-chaining manner to expand the answers returned by XML queries. However, none of these existing works follows an OBDA approach; that is, they do not add a backward-chaining query reformulation step on top of data management.

Ontology-Mediated Queries for NOSQL Databases

Whether this paradigm can be used in conjunction with other kinds of query languages is still an open question. The naive way to deal with non-relational data sources is to define mappings for translating them into relational structures, and then use the classic OBDA framework as is. However, this approach would induce a significant performance degradation, as it would add a step for converting the data using the mappings and, most importantly, it would make it impossible to take advantage of the low-level query optimizations provided by native systems. This can be particularly acute for NOSQL systems, like key-value (KV) stores, which have been specifically designed to scale when dealing with very large collections of data.

An algebraic approach to MSO-definability on countable linear orderings

Proof. Recall that the proof of Theorem 28 was based on closure properties of recognizable ⍟-languages under Boolean operations and projections, which could easily be implemented at the level of the ⍟-algebras. Because in this proof we do not have to deal with effectiveness, it is convenient to work directly at the level of ⍟-monoids. In particular, the monoids recognizing the considered languages will be defined using logical types and Shelah's composition method [She75]. We shall consider MSO formulas up to syntactic equivalence, that is, up to associativity, commutativity, idempotency, and distributivity of conjunctions and disjunctions, commutativity of conjunctions with universal quantifications and of disjunctions with existential quantifications, and renamings of quantified variables. Recall that, over a fixed finite signature with only relational symbols, there exist only finitely many sentences up to syntactic equivalence.

Processing Fuzzy Relational Queries Using Fuzzy Views

Finally, it appears that using materialized views is not an attractive solution, as they induce a significant increase in storage space and do not yield significantly better query processing times than abstract views. Let us not forget either that materialized views are problematic from a DB update perspective, since every modification of the data in a base table makes it necessary to recompute the associated fuzzy views. On the other hand, the approach based on abstract views appears to be a promising solution: it is more efficient than a mild coupling approach such as PostgreSQLf [15] and does not induce any extra cost in terms of data storage. It is all the more interesting as it is completely portable: its use does not require modifying the underlying DBMS, but only adding a simple software layer on top of it.

Computation of Extended Answers to Relational Database Queries

PathSim [28] focuses on the number of meta-paths between objects. This is rather new, and we will see that the approach described in Section 1.4 is based on a somewhat similar concept. FlexRecs [18], like most recommender systems, bases its recommendations on the similarity between values. Even though the proposed framework is quite innovative, allowing the user to personalize recommendations effectively, the means of computing these recommendations are not so original. RQL [1, 3] is a Recommendation Query Language capable of computing suggestions for any user, provided that they have notions of a programming language in the first place. While the flexibility given here is inspiring, the complexity of the queries makes it very difficult to use because of fairly long execution times. Recommender systems as a whole tend either to use feedback from the users with collaborative filtering, or to compare the items' attributes directly with content-based approaches. In doing so, they mostly neglect another important part of databases: the schema.

Translation of Relational and Non-Relational Databases into RDF with xR2RML

Evaluation. We evaluated the prototype using two simple databases: a MySQL relational database and a MongoDB database with two collections. In both cases, the data and associated xR2RML mappings were written to cover most mapping situations addressed by xR2RML: strategies for handling multiple RDF terms, JSONPath and XPath expressions, mixed-syntax paths with mixed contents (relational, JSON, XML, CSV/TSV), cross-references, production of RDF collections/containers, and management of UTF-8 characters. A dump of both databases as well as the example mappings are available on the same GitHub repository. The current version of the prototype applies the data materialization approach, i.e., RDF data is generated by sequentially applying all triples maps. The query rewriting approach (rewriting SPARQL into database-specific queries) may be considered in future work, as suggested in Section 7. At the time of writing, the prototype has two limitations: (i) only one level of RDF collections and containers can be generated (no nested collections/containers), and (ii) the result of a join query in a relational database cannot be translated into an RDF collection or container.

On the Optimization of Recursive Relational Queries: Application to Graph Queries

14–19, 2020, Portland, OR, USA. ACM, New York, NY, USA, 23 pages. https://doi.org/10.1145/3318464.3380567

1 INTRODUCTION

The expressive power of query languages has been greatly improved by the introduction of recursion. Recursive queries are, for instance, very useful in data integration, since expressive ontologies use recursion [29]. Graph databases are another example where recursion is particularly useful, for expressing navigation along paths connecting nodes in the graph. For this purpose, graph query languages often include constructs such as Regular Path Queries (RPQs) [27] and various extensions such as Conjunctions of RPQs (CRPQs) and Unions of CRPQs (UCRPQs) [19, 20, 26, 50]. For instance, the query language SPARQL 1.1 [43] introduced Property Paths, and language proposals such as OpenCypher [34, 55] and G-CORE [11] also include the possibility of expressing recursive paths. SPARQL's Property Paths proved crucial for extracting information from RDF data structures such as those found in social networks, life sciences, and transportation networks. However, recursive path queries are notoriously harder to optimize and evaluate than non-recursive ones [54, 71]. In practice, even with datasets of modest sizes, the benchmarking work in [15] notes that "all tested systems either failed on the majority of these [recursive] queries or had to be manually terminated after unexpectedly long running times." A major difficulty is to find an appropriate way to execute the query, a task frequently referred to in the literature as finding an appropriate Query Execution Plan (QEP). For example, let us consider the following query:
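As a hedged illustration of what a QEP must organise, the sketch below (a standard semi-naive fixpoint on an invented graph, not the paper's own algorithm) evaluates the simplest recursive query, reachability, by re-joining only the newly derived facts at each iteration until a fixpoint is reached.

```python
# Semi-naive fixpoint evaluation of a recursive reachability query.
# The edge relation below is a made-up example.
edges = {("a", "b"), ("b", "c"), ("c", "d"), ("b", "a")}

def transitive_closure(edges):
    closure = set(edges)
    delta = set(edges)  # only newly derived facts are re-joined
    while delta:
        # Join the new facts with the base relation: x -> y and y -> w gives x -> w.
        new = {(x, w) for (x, y) in delta for (z, w) in edges if y == z}
        delta = new - closure  # stop once no genuinely new pairs appear
        closure |= delta
    return closure

tc = transitive_closure(edges)
print(("a", "d") in tc)  # True: a -> b -> c -> d
```

The order in which such joins are performed, and how intermediate results are materialized, is precisely what distinguishes a good recursive QEP from a bad one.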

Probabilistic relational models learning from graph databases

B.2 Neo4j graph database

Neo4j (https://neo4j.com/) is a NOSQL graph database. It is a fully transactional (ACID) database that stores data structured as graphs. It offers high query performance on complex data, while remaining intuitive and simple for the developer. Neo4j is developed by the Swedish-American company Neo Technology. It has been in commercial development for 10 years and in production for over 7 years. Most importantly, it has the largest and most vibrant, helpful, and contributing community surrounding it. The Neo4j database is built to be extremely efficient at handling node links. This performance is due to the fact that Neo4j pre-computes joins at data-write time, whereas relational databases compute joins at read time using indexes and key logic. Neo4j is the only graph database that combines native graph storage, a scalable architecture optimized for speed, and ACID compliance to ensure the predictability of relationship-based queries. This graph database uses the Cypher query language, a declarative, pattern-matching language for connected data. This language offers a query plan visualization, which can be really useful and makes it possible to rephrase queries so that an optimization can occur. Also, Neo4j provides results based on real-time data, giving real-time insights into what is happening with the data. This makes Neo4j a suitable technology for large…

A Cost Estimation Technique for Recursive Relational Algebra

ABSTRACT. With the increasing popularity of data structures such as graphs, recursion is becoming a key ingredient of query languages in analytic systems. Recursive query evaluation involves the iterative application of a function or operation until some condition is satisfied. It is particularly useful for retrieving nodes reachable along deep paths in a graph. The optimization of recursive queries has remained a challenge for decades. Recently, extensions of Codd's classical relational algebra to support recursive terms, and their optimisation, have gained renewed interest [10]. Query optimization crucially relies on the enumeration of query evaluation plans and on cost estimation techniques. Cost estimation for recursive terms is far from trivial and has received less attention. In this paper, we propose a new cost estimation technique for recursive terms of the extended relational algebra. This technique allows selecting an estimated cheapest query plan, in terms of computing resource usage (e.g., memory footprint, CPU, and I/O) and evaluation time. We evaluate the effectiveness of our cost estimation technique on a set of recursive graph queries on both generated and real datasets of significant size, including Yago, a graph with more than 62 million edges and 42 million nodes. Experiments show that our cost estimation technique improves the performance of recursive query evaluation on popular relational database engines such as PostgreSQL.

Using Probabilistic Relational Models to Generate Synthetic Spatial or Non-spatial Databases

Existing tools that generate multi-table data often generate random data. However, to obtain synthetic relational data that resemble real-world data, we should consider dependencies among attributes or among objects. One approach to achieve this is to generate data probabilistically using a generative model, which typically uses probabilistic models to describe how data is generated. Among such generative models is the Bayesian network (BN) [11], which represents probabilistic dependencies among random variables as a graph. However, BNs can only model single-table (non-relational) data. Because BNs are simple probabilistic graphical models (PGMs) with an intuitive graphical representation, and several algorithms for sampling a BN to generate unseen data are already available in the literature, we consider Probabilistic Relational Models (PRMs) [12], [13], an extension of Bayesian networks to relational settings, to generate synthetic relational data. In our earlier works [14], [15], we proposed a method for generating (non-spatial) datasets using PRMs. In this article, we extend it to generate spatial datasets using PRMs with Spatial Attributes (PRMs-SA) [16], an extension of PRMs that supports spatial objects. We also discuss three different data sampling techniques applicable in our framework, and present experimental results on the performance of the data generation algorithms and the quality of the generated datasets. Our dataset generation method serves as a benchmark generation method, not only for evaluating PRM learning and other relational learning algorithms but also for testing database applications.
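To illustrate the generative idea the excerpt builds on, the sketch below (a toy two-variable network with invented probabilities, not the PRM-SA method of the article) performs ancestral sampling of a Bayesian network to produce synthetic records: parents are sampled first, children are sampled conditionally on them.

```python
import random

# Toy Bayesian network A -> B with invented parameters.
P_A = 0.3                               # P(A = 1)
P_B_GIVEN_A = {0: 0.1, 1: 0.8}          # P(B = 1 | A)

def sample_record(rng):
    # Ancestral sampling: draw the parent first, then the child given the parent.
    a = 1 if rng.random() < P_A else 0
    b = 1 if rng.random() < P_B_GIVEN_A[a] else 0
    return {"A": a, "B": b}

rng = random.Random(42)
dataset = [sample_record(rng) for _ in range(10_000)]

# The empirical marginal of A should be close to 0.3.
print(round(sum(r["A"] for r in dataset) / len(dataset), 2))
```

PRMs generalize exactly this scheme to several interlinked tables, where a child attribute may depend on attributes of related objects rather than only on attributes of the same row.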

Preserving object-relational databases for the next generations

…relevant information to ensure the integrity of the data, grant or revoke access to data, specify the join paths between tables, specify the format of the multimedia content, and so on. The internal structure is thus known (and needs to be preserved) by the users and the DBMS; and this internal structure (or schema) evolves over time. Access to an object-relational database is achieved through a sophisticated graphical user interface (GUI), which then utilizes the so-called logical layer to give access to the physical data, as stored in separate, indexed files. The queries posed through this GUI are inherently "ad hoc" in nature and change over time. Furthermore, the users range from novices to database administrators, who require very different types of access to, and preservation of, the data [5,6]. This paper describes an environment to preserve such evolving object-relational databases over a very long period of time. To this end, we designed and implemented a multi-agent system to deal with the scalability and evolution of the data and the associated database schema [7]. An experimental environment was developed to validate our implementation and to provide a base for further research. We combine theoretical proof and empirical confirmation to illustrate our environment.

Equivalence-Invariant Algebraic Provenance for Hyperplane Update Queries

…are related to the notion of relational difference. Algebraic provenance models for queries with difference have been proposed in [4, 16, 17], but naturally none can be directly applied to update queries; in particular, using the "monus" operation of [16, 17] as our minus operation may not yield a structure satisfying the equivalence axioms. Further exploration of the connections between these models and ours is left for future work. Closest to our work is the multi-version semiring (MV-semiring) model [6] discussed above, a provenance model for queries and updates that extends the semiring annotation framework of [21]. This model is designed to support transactions executed under the snapshot isolation concurrency control protocol. While the MV-semiring allows provenance tracking for rich update queries, beyond the class of "domain-based" transactions that we present here, it is not preserved under transaction equivalences, as shown in Section 3. An extension of the semiring model of [21] to account for updates was also studied in [24]; however, the focus there is on the use of provenance in the context of trust, and while that work includes an efficient implementation, it falls short of proposing a generic algebraic construction.

8 CONCLUSION AND LIMITATIONS

We have developed a novel algebraic provenance model for hyperplane update queries and sequences thereof, following the axiomatization of query equivalence in [23]. We have shown that the model captures the "essence of computation" for such queries, i.e., equivalent transactions yield equivalent provenance. We have shown means of instantiating the model, towards applications of provenance in this context. The example instances show the usefulness of the generic model: by following the axioms, we are guaranteed that our provenance construction is independent of transaction rewrites. We have further studied the efficient computation and storage of provenance, and have shown a minimization technique that leads to a compact provenance representation via a normal form. This again leverages the axioms, this time in a computational manner. Our experimental evaluation shows the tractability and usefulness of the approach, as well as the benefit of using the normal form.
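For readers unfamiliar with the semiring annotation framework that the MV-semiring extends, here is a minimal sketch (tuple identifiers and the query shape are invented for illustration): each tuple is annotated with a provenance polynomial over tuple identifiers, with union mapped to polynomial addition and join to multiplication.

```python
from collections import Counter

# A polynomial is a map from a monomial (a sorted tuple of tuple ids)
# to its integer coefficient.
def poly(var):
    # Annotation of a base tuple: the single variable with coefficient 1.
    return Counter({(var,): 1})

def p_add(p, q):
    # Union of query results corresponds to +.
    return p + q

def p_mul(p, q):
    # Join of query results corresponds to *.
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

r, s, t = poly("r"), poly("s"), poly("t")
# Provenance of an answer produced by (r join s) union (r join t):
ann = p_add(p_mul(r, s), p_mul(r, t))
print(dict(ann))  # {('r', 's'): 1, ('r', 't'): 1}
```

Each monomial records one derivation of the answer; the open question the excerpt discusses is how to extend such annotations to update queries while staying invariant under transaction equivalence.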

An exact approach to learning probabilistic relational model

Abstract. Probabilistic Graphical Models (PGMs) offer a popular framework encompassing a variety of statistical formalisms, such as Bayesian networks (BNs). The latter are able to depict real-world situations with a high degree of uncertainty. Due to their power and flexibility, several extensions have been proposed, thereby ensuring the suitability of their use. Probabilistic Relational Models (PRMs) extend BNs to work with relational databases rather than propositional data. Their construction remains the most complicated issue and is an active research area. Only a few works have been proposed in this direction, and most of them do not guarantee an optimal identification of the dependency structure. In this paper, we propose an approach that is guaranteed to return an optimal PRM structure. It is inspired by a BN method whose performance has already been proven.

Fuzzy Quantified Queries to Fuzzy RDF Databases

However, to the best of our knowledge, no work in the literature deals with fuzzy quantified patterns in the SPARQL query language, which was the main goal of our work. Fuzzy quantified queries have long been studied in a relational database context; see e.g. [1], whose authors distinguish two types of fuzzy quantification: horizontal quantification [9], used for combining atomic conditions in a where clause, and vertical quantification, for which the quantifier appears in a having clause in order to express a condition on the cardinality of a fuzzy subset of a group. The latter is the type of use we make in our approach.
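As a sketch of vertical quantification (the membership degrees and the definition of "most" below are illustrative choices, not taken from the paper), a condition such as "most of the group satisfies a fuzzy predicate" can be evaluated by applying the quantifier's membership function to the relative sigma-count of the fuzzy subset:

```python
def most(ratio):
    # Illustrative piecewise-linear membership function for the fuzzy
    # quantifier "most": 0 below 0.3, 1 above 0.8, linear in between.
    if ratio <= 0.3:
        return 0.0
    if ratio >= 0.8:
        return 1.0
    return (ratio - 0.3) / 0.5

# Degrees to which each member of a group satisfies a fuzzy condition
# (e.g. "well paid"); values are invented for the example.
degrees = [1.0, 0.9, 0.8, 0.2, 0.0]

# Relative sigma-count: cardinality of the fuzzy subset over group size.
ratio = sum(degrees) / len(degrees)

# Truth degree of "most of the group satisfies the condition".
print(round(most(ratio), 2))  # (0.58 - 0.3) / 0.5 = 0.56
```

The result is itself a degree in [0, 1], so such a condition can rank groups rather than merely filter them, which is what makes the having-clause usage attractive.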

Average-case complexity for the execution of recursive definitions on relational databases

At a first level, our approach has provided a formalism for describing the underlying structure mainly based on the notion of equivalent nodes and then has allowed us to obtain expressions for the complexity of the algorithms by averaging over all possible queries on a given database structure. At a second level, average-case results have been obtained by taking into account all possible structures with a given number of nodes. The first level can be used to characterize any given situation but requires a rather detailed representation of the structure. The second level allows an abstract characterization which uses no representation at all and leads to simpler expressions.
