
The semantics of queries and updates in relational databases


SqlIceCube: Automatic Semantics-Based Reconciliation for Mobile Databases

criterion (e.g. timestamps) to decide which updates to retain and/or the order in which to run them. This is inflexible and may cause spurious conflicts even in very small problems. Consider the example of figure 1, where two users make meeting requests to a calendar program. One user requests room A at 9:00, and either room B or C, also at 9:00. Meanwhile, the other user requests either room A or B at 9:00. Combining the logs in some simple way does not work. For instance, running Log 1 then Log 2 reserves rooms A and B for the first user, and the second user's request is dropped. Running Log 2 first has a similar problem. Satisfying all three requests requires reordering them, which syntactic systems cannot do.
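The reordering point can be sketched with a toy greedy booker; the request lists and helper names below are illustrative assumptions, not SqlIceCube's actual API:

```python
# Hypothetical sketch of the calendar example: three reservation
# requests (all at 9:00), two of which offer alternative rooms.
from itertools import permutations

requests = [
    ["A"],        # Log 1, user 1: room A
    ["B", "C"],   # Log 1, user 1: room B or C
    ["A", "B"],   # Log 2, user 2: room A or B
]

def run(order):
    """Greedily satisfy requests in the given order; return bookings."""
    taken, bookings = set(), {}
    for i in order:
        for room in requests[i]:
            if room not in taken:
                taken.add(room)
                bookings[i] = room
                break
    return bookings

# Syntactic merge: Log 1 before Log 2 drops the second user's request.
assert len(run([0, 1, 2])) == 2

# A semantics-based reconciler may reorder: some order satisfies all three.
best = max((run(p) for p in permutations(range(3))), key=len)
assert len(best) == 3
```

Running request 2 before request 1 (user 2 takes B, user 1 falls back to C) is one such satisfying order.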

Learning probabilistic relational models with (partially structured) graph databases

Figure 7 represents running time in the same context, on a log scale. We can observe that increasing the slot chain length improves the quality of reconstruction, but increases the running time. The deeper we go, the better the results we get in terms of Precision, Recall and F-score. By increasing the search space, we increase the number of times the data is accessed, and Figure 7 again shows the interest of using a graph database instead of a relational one, with the running time divided by a factor greater than 10. This is even more impressive in Figure 7 (right) with 500,000 instances, where the running time drops drastically, transforming the structure learning process into a feasible task.

A Supervised Approach for Enriching the Relational Structure of Frame Semantics in FrameNet

Ideally, one would like to compute frame co-occurrence statistics from a frame-annotated corpus that is big enough for machine learning tasks. FrameNet does provide a frame-annotated corpus, but it is too small to be truly useful for making generalizations. A workaround is to use an unannotated corpus together with FN's frame-evoking lexicon to decide which frame is being triggered by a particular lexical unit. The problem is then to resolve the ambiguity in cases where a lexical unit can potentially trigger more than one frame. Pennacchiotti and Wirth (2009) suggested a weighted co-occurrence measure, which gave lower weights to the co-occurrence of ambiguous words. The probabilities of sense occurrences of lexical units were learned from the WordNet sense-tagged corpus SemCor. Our approach is slightly different in the sense that, instead of learning word-sense probabilities first and then mapping them to frames, we directly learn the probabilities of lexical units triggering particular frames from FrameNet's annotated corpus, using the ratio of the number of times a frame is triggered by a lexical unit lu in the FrameNet corpus to the total number of occurrences of lu in the corpus. This is arguably more direct and only uses FrameNet information, although both approaches need annotated data. The probabilities are then used to compute a weighted PMI between frames (the weighting function simply sums up the probabilities of a lexical unit triggering a particular frame F over the entire GigaWord).
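The weighted-PMI idea can be sketched as follows; the trigger probabilities, the toy corpus, and all function names are illustrative assumptions, not the paper's actual data or code:

```python
# Minimal sketch: sum P(frame | lexical unit) instead of raw counts,
# then compute a PMI over these weighted (co-)occurrence totals.
import math
from collections import Counter

# P(frame | lexical unit), as would be estimated from FrameNet's corpus.
p_frame_given_lu = {
    ("break.v", "Cause_to_fragment"): 0.7,
    ("break.v", "Experience_bodily_harm"): 0.3,
    ("fill.v", "Filling"): 1.0,
}

# Toy corpus: sentences as lists of frame-evoking lexical units.
corpus = [["break.v", "fill.v"], ["break.v"], ["fill.v"]]

def weighted_counts(corpus):
    """Weighted single-frame and frame-pair counts per sentence."""
    single, pair = Counter(), Counter()
    for sent in corpus:
        frames = [(f, p) for lu in sent
                  for (l, f), p in p_frame_given_lu.items() if l == lu]
        for f, p in frames:
            single[f] += p
        for i, (f1, p1) in enumerate(frames):
            for f2, p2 in frames[i + 1:]:
                if f1 != f2:
                    pair[tuple(sorted((f1, f2)))] += p1 * p2
    return single, pair

def weighted_pmi(f1, f2):
    single, pair = weighted_counts(corpus)
    n = len(corpus)
    p12 = pair[tuple(sorted((f1, f2)))] / n
    return math.log(p12 / ((single[f1] / n) * (single[f2] / n)))
```

On this toy corpus, weighted_pmi("Cause_to_fragment", "Filling") equals log(0.75): the ambiguity of break.v lowers the weight of its co-occurrences.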

Preserving object-relational databases for the next generations

relevant information to ensure the integrity of the data, grant or revoke access to data, specify the join paths between tables, specify the format of the multimedia content, and so on. The internal structure is thus known (and needs to be preserved) by the users and the DBMS, and this internal structure (or schema) evolves over time. Access to an object-relational database is achieved through a sophisticated graphical user interface (GUI), which then uses the so-called logical layer to access the physical data, stored in separate, indexed files. The queries posed through this GUI are inherently "ad hoc" in nature and change over time. Furthermore, the users range from novices to database administrators, who require very different types of access to, and preservation of, the data [5,6]. This paper describes an environment to preserve such evolving object-relational databases over a very long period of time. To this end, we designed and implemented a multi-agent system to deal with the scalability and evolution of the data and the associated database schema [7]. An experimental environment was developed to validate our implementation and to provide a basis for further research. We combine theoretical proofs and empirical confirmation to illustrate our environment.

A Supervised Approach for Enriching the Relational Structure of Frame Semantics in FrameNet

We did an error analysis on the 37 false positives in the top-100 test set, and came to the conclusion that most frame pairs were picked up as related because they involve related lexical units (often with different senses), rather obviously, and that they involve either (1) co-hyponyms at different levels of granularity; or (2) frames describing opposite events, such as Removing and Filling, which do not correspond to any FrameNet relation. Case (2) is arguably a FrameNet problem, since a lot of frames involve antonyms, and could indicate missing frames at a higher level up the inheritance hierarchy, an inconsistency noted in previous work (Hasegawa et al., 2011). As an example, the two adjectives easy and difficult are part of the same frame Difficulty, while the verbs empty and fill are in separate frames Emptying and Filling. Case (1) is an interesting perspective for improvement: relations should not be considered between frame pairs in isolation, but with respect to all existing or predicted relations, in a global way. If a frame F1 is already related to a frame F2, there should not be a relation between F1 and, e.g., subframes of F2. Note also that most frames that are siblings in a subframe relation also have explicit relations, usually a "Precede" relation (see figure 1), but also vaguer links such as "Using".

Partitionning medical image databases for content-based queries on a grid

or the farm manager will not accept communication with unauthenticated hosts, for security reasons. Data management. Since we do not make any assumption about each host's file system, each host is assumed to have its own storage resources, not necessarily visible from the other hosts. At creation time, a grid node declares its available space to the farm manager and sends frequent updates of this value. The farm manager holds a catalog of all files known to the middleware. Each file is described by a unique grid-wide identifier, so the user can refer to a file without needing to know its physical location. A file becomes known to the grid once it has been registered: the user transfers the file from their local machine or any external storage through the middleware interface. The file is registered (its identifier is written in the farm manager's files table) and a physical copy is stored on a grid node. The farm manager holds a table giving the associations between a file identifier and its physical replicas; several replicas of a file may exist on several nodes. Indeed, when a host is responsible for executing a job, it needs to access the set of files manipulated by this job. Since the job's files are not necessarily all located on a single node, they are first copied onto the target node before the job is started. These multiple instances of a file are then kept on the nodes, unless disk space is lacking, as a cache for subsequent use. This replication of files causes an obvious coherence problem that is not handled in the current implementation: the user is responsible for creating a new file in case of modification. Our middleware controls file access authorization through the user's certificate subject. The subject string is stored in the farm manager's table at file registration, allowing the system to control file access at the user level.
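The catalog described above (grid-wide identifiers, per-node replicas, access control by certificate subject) can be sketched as a small class; the class and method names are illustrative assumptions, not the middleware's actual API:

```python
# Toy sketch of the farm manager's file catalog and replica table.
import uuid

class FarmManager:
    def __init__(self):
        self.files = {}      # grid-wide id -> owner subject (certificate)
        self.replicas = {}   # grid-wide id -> set of nodes holding a copy

    def register(self, owner_subject, node):
        """Register a new file: assign a grid-wide id, record its owner
        and the node holding the first physical copy."""
        file_id = str(uuid.uuid4())
        self.files[file_id] = owner_subject
        self.replicas[file_id] = {node}
        return file_id

    def copy_for_job(self, file_id, target_node):
        """Before a job runs, its input files are copied to the target
        node; the copies are kept afterwards as a cache for reuse."""
        self.replicas[file_id].add(target_node)

    def can_access(self, file_id, subject):
        """Access control via the certificate subject stored at
        registration time."""
        return self.files.get(file_id) == subject

fm = FarmManager()
fid = fm.register("CN=alice", node="node1")
fm.copy_for_job(fid, "node3")
assert fm.replicas[fid] == {"node1", "node3"}
assert fm.can_access(fid, "CN=alice") and not fm.can_access(fid, "CN=bob")
```

Note that, as the excerpt says, nothing here keeps replicas coherent after a modification; that responsibility stays with the user.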

A Coq Mechanised Formal Semantics for Realistic SQL Queries

to, while extending it, the one presented in [8]. By formally relating SQLCoq to SQLAlg through a Coq-mechanised translation, and by formally proving that these translations preserve semantics, we are not only able to (iv) recover all the well-known algebraic equivalences on which most compiler optimisations are based, but we also (v) establish the first, to the best of our knowledge, mechanised formal proof of equivalence between the considered SQL fragment and bag relational algebra. Organisation. In Section 2, we first present SQL and the subtleties of SQL that need to be taken into account to provide a correct semantics. Then, in Section 3, we detail SQLCoq's syn-

Ontology-Mediated Queries for NOSQL Databases

Whether this paradigm can be used in conjunction with other kinds of query languages is still an open question. The naive way to deal with non-relational data sources is to define mappings for translating them into relational structures, and then use the classic OBDA framework as it is. However, this approach would induce a significant performance degradation, as it would add a step for converting the data using the mappings and, most importantly, it would make it impossible to take advantage of the low-level query optimizations provided by native systems. This can be particularly acute for NOSQL systems, like key-value (KV) stores, that have been specifically designed to scale when dealing with very large collections of data.

Provenance and Probabilities in Relational Databases: From Theory to Practice

We fix a finite set X = {x_1, . . . , x_n}, the elements of which we call Boolean events (i.e., variables that can be either ⊤ or ⊥). As in [31], we let provenance tokens (the annotations attached to tuples of the input databases) be Boolean functions over X, that is, functions of the form ϕ : (X → {⊤, ⊥}) → {⊤, ⊥}. They are interpreted under a possible-world semantics: every valuation ν : X → {⊤, ⊥} denotes a possible world of the database; in this possible world, a tuple with annotation ϕ exists if and only if ϕ(ν) = ⊤. For a given database D, we denote by ν(D) the set of tuples t with annotation ϕ_t such that ϕ_t(ν) = ⊤. It is
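The possible-world semantics above can be sketched directly in code: annotations as Boolean functions of a valuation, and ν(D) as the tuples whose annotation is true. The example database and tuple names are illustrative assumptions:

```python
# Boolean events X = {x1, x2}; each tuple's provenance annotation is a
# Boolean function over X, here modelled as a Python function of the
# valuation nu (a dict from event name to truth value).
D = {
    ("alice",): lambda nu: nu["x1"],                 # exists iff x1
    ("bob",):   lambda nu: nu["x1"] and nu["x2"],    # exists iff x1 and x2
    ("carol",): lambda nu: not nu["x2"],             # exists iff not x2
}

def world(D, nu):
    """nu(D): the tuples whose annotation evaluates to true under nu."""
    return {t for t, phi in D.items() if phi(nu)}

assert world(D, {"x1": True, "x2": False}) == {("alice",), ("carol",)}
assert world(D, {"x1": False, "x2": True}) == set()
```

Each of the four valuations of {x1, x2} yields one possible world of this toy database.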

Computation of Extended Answers to Relational Database Queries

Conclusion. This report presented a state of the art on databases and information retrieval, followed by a presentation of the work done during the Master's internship. The original prototype now offers cinematographic recommendations based on several criteria, which may be computed with several matching and aggregation measures, and which may be explained. A few notions on recommender systems, schema summarization and similarity measures were introduced to outline how they could be used in our setting. Fusing classical recommender systems with our approach appears to be feasible, with the advantage of providing explanations as to why elements are suggested, in addition to offering another way to compute them. To improve the quality of our recommendations as cooperative answers, we looked into highlighting the atypical properties of each recommended item with regard to the whole set of similar items. So beyond simply offering recommendations and explaining them, we can also show how different they are from each other. This concept may be useful when facing overspecialization in content-based recommender systems or, as we introduced it, it favours knowledge discovery. Completely generalizing this method to any database automatically does not seem possible at the moment, considering the subjectivity of the notion of similarity according to users. A possibility that we will investigate later resides in the use of domain ontologies and query logs to define which associations of tables may be interesting for users, and to potentially suggest queries.

Clever generation of rich SPARQL queries from annotated relational schema: application to Semantic Web Service creation for biological databases

Validation of the SPARQL query results. We compared the data retrieval resulting from the three approaches (i.e., the Dijkstra algorithm, BioSemantic and a human SQL query builder) (Table 4). We refer to a human SQL query as a query that is manually written by an expert with good knowledge of the database schema. A first general observation is that the number of results is identical for BioSemantic queries and the manual SQL queries. BioSemantic globally retrieves more results than the Dijkstra algorithm. The gap for Query1 is explained by the inheritance relationships missed by the Dijkstra algorithm. Indeed, in that case, BioSemantic detects these relationships and regroups the subdivided paths into the final query. Furthermore, BioSemantic preferentially selects binary association tables, which promote more data retrieval. Both Query2 and Query3 correspond to a short path without inheritance but with several paths having the same number of nodes. In that case, the BioSemantic path weighting favours binary associations, whereas the Dijkstra algorithm chooses the first detected path having a minimum number of nodes. For Query2, BioSemantic favours the detection of a more pertinent path, whereas the same paths are detected for Query3. For Query4, no alternative path leads to the same results; in other words, both algorithms select the same path. In each case, we manually verified that the retrieved data were identical.

Processing Fuzzy Relational Queries Using Fuzzy Views

the filtering of the result (the top-k elements and/or those whose score is over the threshold α). Besides, SQLf also preserves (and extends) the constructs specific to SQL, e.g. nesting operators, relation partitioning, etc.; see [14] for more detail. In the following, we only consider single-block Selection-Projection-Join SQLf queries. Any fuzzy querying system must provide users with a convenient way to define the fuzzy terms that they wish to include in their queries. In practice, the membership function associated with a fuzzy set F is often chosen to be of a trapezoidal shape. Then, F may be expressed by a quadruplet (a, b, c, d) where core(F) = [b, c] and support(F) = (a, d). In a previous work [15], we described a graphical interface aimed at helping the user express his/her fuzzy queries. B. About Fuzzy Query Processing: Related Work
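The trapezoidal convention above, core(F) = [b, c] at degree 1 and support(F) = (a, d) at degree > 0, can be sketched as a small function; the function name and the "young" example are illustrative assumptions:

```python
# Minimal sketch of a trapezoidal membership function for the
# quadruplet (a, b, c, d): degree 1 on [b, c], degree 0 outside (a, d),
# and linear slopes on (a, b) and (c, d).
def trapezoidal(a, b, c, d):
    """Return the membership function of the fuzzy set (a, b, c, d)."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        if x < b:                    # rising edge on (a, b)
            return (x - a) / (b - a)
        return (d - x) / (d - c)     # falling edge on (c, d)
    return mu

# Hypothetical fuzzy term "young": fully young on [18, 25],
# not young at all outside (15, 40).
young = trapezoidal(15, 18, 25, 40)
assert young(20) == 1.0
assert abs(young(30) - 10 / 15) < 1e-9
assert young(45) == 0.0
```

A fuzzy where-clause such as "age is young" would then score each tuple by mu(age) before the top-k/α filtering described above.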

Average-case complexity for the execution of recursive definitions on relational databases

The derivation of mean execution costs has now become an important chapter in the analysis of algorithms [10]. Average-case analysis provides results which, to some extent, summarize the salient features of the behaviour of an algorithm, and can highlight aspects of the problem that are not visible through the more commonly used worst-case analysis. In fact, by using average-case evaluation, we can gain a quick insight into the properties of the algorithms without depending on information that is really too detailed to be handled in practice. Of course, we do not claim that these mean execution costs, based on the hypothesis of equal occurrence probabilities for the distinct possible queries, or, at the higher level, of equal occurrence probabilities for the distinct possible query forest structures for the EDB relation, reflect exactly the actual costs in every situation occurring in practice. However, we have undertaken to treat the average-case complexity analysis of logic programming algorithms in a systematic way and to produce asymptotic expressions for this complexity; to our knowledge, this is the first time that such a theoretical analysis has been undertaken.

Bipolar SQLf: a Flexible Querying Language for Relational Databases

In order to define a bipolar relational algebra, the algebraic operators (selection, projection, join, union, intersection) have been extended to fuzzy bipolar conditions [11, 3]. These operators allow the expression of fuzzy bipolar queries. Our aim in this article is to define the Bipolar SQLf language, an SQL-like language based on a bipolar relational algebra. Since fuzzy bipolar conditions generalize fuzzy conditions, we consider the enrichment to fuzzy bipolar conditions of the SQLf language [2, 1], which is devoted to flexible querying with fuzzy sets. As a first step, basic Bipolar SQLf queries are defined in terms of expression, evaluation and calibration. Then, complex bipolar queries based on nesting (in=, in≈, exists, θany) and partitioning operators are defined.

Fuzzy Quantified Queries to Fuzzy RDF Databases

However, to the best of our knowledge, no work in the literature deals with fuzzy quantified patterns in the SPARQL query language, which was the main goal of our work. Fuzzy quantified queries have long been studied in a relational database context; see e.g. [1], whose authors distinguish two types of fuzzy quantification: horizontal quantification [9], used for combining atomic conditions in a where clause, and vertical quantification, for which the quantifier appears in a having clause in order to express a condition on the cardinality of a fuzzy subset of a group. The latter is the type of use we make in our approach.

Probabilistic relational models learning from graph databases

B.2 Neo4j graph database. Neo4j (https://neo4j.com/) is a NOSQL graph database. It is a fully transactional (ACID) database that stores data structured as graphs. It offers high query performance on complex data, while remaining intuitive and simple for the developer. Neo4j is developed by the Swedish-American company Neo Technology. It has been in commercial development for 10 years and in production for over 7 years. Most importantly, it has the largest, most vibrant, helpful and contributing community surrounding it. The Neo4j database is built to be extremely efficient at handling node links. This performance is due to the fact that Neo4j pre-computes joins at data-writing time, whereas relational databases compute joins at read time using indexes and key logic. Neo4j is the only graph database that combines native graph storage, a scalable architecture optimized for speed, and ACID compliance to ensure the predictability of relationship-based queries. This graph database uses the Cypher query language, a declarative, pattern-matching language for connected data. This language offers a query plan visualization, which can be really useful and makes it possible to rephrase queries so that an optimization can occur. Also, Neo4j provides results based on real-time data, giving real-time insights into what is happening with the data. This makes Neo4j a suitable technology for large,

Translation of Relational and Non-Relational Databases into RDF with xR2RML

1 INTRODUCTION. The web of data is now emerging through the publication and interlinking of various open data sets in RDF. Initiatives such as the W3C Data Activity and the Linking Open Data (LOD) project aim at Web-scale data integration and processing, assuming that making heterogeneous data available in a common machine-readable format should create opportunities for novel applications and services. Their success largely depends on the ability to reach data from the deep web (He et al., 2007), the part of the web content consisting of documents and databases hardly linked with other data sources and hardly indexed by standard search engines. Furthermore, the integration of heterogeneous data sources is a major challenge in several domains (Field et al., 2013). As data semantics is often poorly captured in database schemas, or encoded in application logic, data integration techniques have to capture and expose database semantics in an explicit and machine-readable manner.

Displaying Updates in Logic

Our dynamic interpretation of the ternary relation is consistent with the above considerations: sometimes updating beliefs amounts to revising beliefs. The dynamic reading of the ternary relation and its corresponding conditional is very much in line with the so-called "Ramsey Test" of conditional logic. The Ramsey test can be viewed as the very first modern contribution to the logical study of conditionals, and much of the contemporary work on conditional logic can be traced back to the famous footnote of Ramsey [58]. Roughly, it consists in defining a counterfactual conditional in terms of belief revision: an agent currently believes that ϕ would be true if ψ were true (i.e. ψ ⊃ ϕ) if and only if he should believe ϕ after learning ψ. A first attempt to provide truth conditions for conditionals, based on Ramsey's ideas, was proposed by Stalnaker. He defined his semantics by means of selection functions over possible worlds f : W × 2^W → W. As one can easily notice, Stalnaker's selection functions can also be considered, from a formal point of view, as a special kind of ternary relation, since a relation R_f ⊆ W × 2^W × W can be canonically associated to each selection function f. So, the
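The canonical association of a ternary relation to a selection function can be sketched concretely; the world set and the toy selection function below are illustrative assumptions, not Stalnaker's actual semantics:

```python
# Sketch: a selection function f : W x 2^W -> W induces the ternary
# relation R_f = {(w, S, f(w, S)) : w in W, S a non-empty subset of W}.
from itertools import combinations

W = {"w1", "w2", "w3"}

def powerset(s):
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def f(w, S):
    """Toy selection function: the 'closest' S-world to w, here simply
    w itself if w is in S, otherwise the alphabetically first world of S."""
    return w if w in S else min(S)

# The canonically associated ternary relation R_f.
R_f = {(w, S, f(w, S)) for w in W for S in powerset(W) if S}

assert ("w2", frozenset({"w1", "w3"}), "w1") in R_f  # w2 not in S
assert ("w2", frozenset({"w1", "w2"}), "w2") in R_f  # w2 in S
```

Each triple (w, S, v) reads: from world w, the selected ψ-world (for a ψ whose extension is S) is v, which is what the conditional's truth condition evaluates.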

Data-Driven Publication of Relational Databases

remain to be investigated: typing (i.e., compile-time verification of the output document's validity with respect to a given grammar), closure and composition of queries, optimal query evaluation and rewriting, etc. The rewriting rules and evaluation strategy aim at eliminating redundant queries or the repetitive execution of invariant queries. They show the potential and flexibility of the language for the optimization of large sets of interrelated SQL queries. We plan to use this flexibility in the context of object-oriented interfaces (e.g., Hibernate, http://www.hibernate.org), which promote a one-tuple-at-a-time access strategy to relational databases. In such a context, our query graph can be interpreted as the "profile" of a program, allowing the OO interface to anticipate the fetching of tuples and their grouped transfer from the server. The materialized subgraph acts as a structured client cache, raising several issues regarding the amount of allocated memory, the partitioning of the query graph into SQL queries, and the asynchronous transfer between the client and the server. Such an application represents an extension of the motivation that underlies the present paper: separating programming aspects from client-server SQL exchanges.
