
Querying Evolving Data


5.8.6. Annotation methods

In Research Question 3, we wanted to know how the different annotation methods influence the execution times. From the results in Subsection 5.7.3, we can conclude that graph-based annotation results in the lowest execution times. The results also show that annotation with time intervals suffers from continuously increasing execution times, because the dataset keeps growing: every new version of a fact is added instead of replacing the previous one. Time interval annotation can still be desirable, for example when we want to maintain the history of certain facts, as opposed to only keeping the latest version of facts using expiration times. In future work, we will investigate alternative techniques to support time interval annotation without continuously increasing execution times.
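To make the growth argument concrete, the following Python sketch contrasts two hypothetical in-memory stores: one that keeps only the latest version of a fact with an expiration time, and one that keeps every version annotated with a time interval. The class names, the `(subject, predicate)` key and the tuple layout are illustrative assumptions, not the storage layout used by the actual system.

```python
from dataclasses import dataclass, field

Key = tuple[str, str]  # (subject, predicate) of a fact whose object changes over time


@dataclass
class ExpirationStore:
    """Keeps only the latest version of each fact, with its expiration time."""
    facts: dict = field(default_factory=dict)

    def update(self, key: Key, value: str, now: int, ttl: int) -> None:
        # Overwrite the previous version, so the store size stays constant.
        self.facts[key] = (value, now + ttl)

    def size(self) -> int:
        return len(self.facts)


@dataclass
class IntervalStore:
    """Keeps every version of each fact, annotated with an interval [start, end)."""
    versions: list = field(default_factory=list)

    def update(self, key: Key, value: str, now: int) -> None:
        # Close the interval of the version that is still open (end is None) ...
        for i, (k, v, start, end) in enumerate(self.versions):
            if k == key and end is None:
                self.versions[i] = (k, v, start, now)
        # ... and append a new open version, so the store keeps growing.
        self.versions.append((key, value, now, None))

    def size(self) -> int:
        return len(self.versions)


exp_store, itv_store = ExpirationStore(), IntervalStore()
key = ("ex:sensor1", "ex:hasValue")
for t in range(100):
    exp_store.update(key, str(t), now=t, ttl=1)
    itv_store.update(key, str(t), now=t)
print(exp_store.size(), itv_store.size())
```

After 100 updates of a single sensor value, the expiration-based store still holds one fact, while the interval-based store holds 100 versions; any operation that scans the versions (such as the linear scan in `IntervalStore.update`) therefore gets slower over time.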

In this work, the frequency at which our queries are updated is purely data-driven, determined by the time intervals or expiration times of the data. In the future, it might be interesting to give the user control over this frequency, for example when a user only needs query updates at a lower frequency than the rate at which the data actually changes.
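One possible shape for such a user control is sketched below in Python. It is a hypothetical design, not part of the current system: the function name and the `min_interval` parameter are assumptions. The data-driven part re-evaluates as soon as the earliest expiration time is reached, and the optional user setting only ever postpones that moment.

```python
from typing import Optional


def next_evaluation_time(now: float, expirations: list[float],
                         min_interval: Optional[float] = None) -> float:
    """Earliest time at which a query should be re-evaluated.

    Data-driven: re-evaluate as soon as the first piece of data used in the
    current results expires. The hypothetical `min_interval` parameter lets a
    user request updates at a lower frequency than the data changes.
    """
    data_driven = min(expirations)
    if min_interval is None:
        return data_driven
    # Never re-evaluate earlier than the user-requested interval allows.
    return max(data_driven, now + min_interval)


print(next_evaluation_time(0, [5, 2, 9]))                   # 2
print(next_evaluation_time(0, [5, 2, 9], min_interval=10))  # 10
```

Taking the maximum keeps the behavior purely data-driven whenever the data changes more slowly than the user's interval, so the control can only lower the update frequency, never raise it.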

In future work, it is important to test this approach with a larger variety of use cases. The time annotation mechanisms we use are generic enough to transform any static facts into dynamic data, for any number of triples. The CityBench [34] benchmark can, for example, be used to evaluate these different cases based on city sensor data. These tests must be scaled, both in the number of clients and in dataset size, so that the maximum number of concurrent requests can be determined with respect to the dataset size.

5.9. Addendum

In this section, I summarize the follow-up work that has been done since the article corresponding to this chapter was published. Concretely, I focus on “On the Semantics of TPF-QS towards Publishing and Querying RDF Streams at Web-scale” [129], which was published two years after the work from this chapter. This article aims to resolve some of the initial weaknesses. First, it introduces a proper formalization, which is used to compare the system (TPF-QS) to alternative RDF stream processing systems. Second, it performs a more extensive evaluation using a state-of-the-art benchmark. These two parts are summarized hereafter.

5.9.1. Formalization

RSP-QL [130] is a formal reference model with which different RDF Stream Processing (RSP) systems, such as C-SPARQL [15], CQELS [16] and TPF-QS [96], can be compared to each other. It can be seen as an extension of RDF and SPARQL with temporal semantics. In the next paragraphs, I summarize the RSP-QL model, explain how TPF-QS fits into it, and show how TPF-QS can be compared to alternative RSP systems using this model. I omit the details and full formal definitions, which can be found in the full paper [129].

RSP-QL Overview

RSP-QL introduces the concept of an RDF stream that is defined as an unbounded sequence of pairs. Each pair consists of an RDF statement and a time instant.
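As an illustration of this definition, the Python sketch below models an RDF stream as an infinite generator of `(statement, time instant)` pairs. The `Triple` type, the sensor IRIs and the one-observation-per-time-unit pattern are assumptions made for the example; because the stream is unbounded, a consumer can only ever look at a finite prefix.

```python
import itertools
from typing import Iterator, NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def sensor_stream(start: int = 0) -> Iterator[tuple[Triple, int]]:
    """A hypothetical, unbounded RDF stream: one observation per time unit,
    emitted as (RDF statement, time instant) pairs with non-decreasing time."""
    for t in itertools.count(start):
        yield Triple("ex:sensor1", "ex:hasValue", f'"{t % 7}"'), t


# The stream never ends, so a consumer can only ever inspect a finite prefix.
for triple, t in itertools.islice(sensor_stream(), 3):
    print(t, triple)
```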

In order to query RDF streams, the concept of an RDF dataset was extended to an RSP-QL dataset. Such a dataset consists of an optional default graph, zero or more named graphs, and zero or more (named) time-varying graphs. A time-varying graph is a function that maps time instants to instantaneous RDF graphs.
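The structure of an RSP-QL dataset can be sketched in Python as follows. Graphs are represented as frozensets of `(s, p, o)` tuples and time-varying graphs as plain functions from a time instant to such a set; the class name, the `at` method and the choice to key every time-varying graph by a name are simplifying assumptions, not definitions from RSP-QL itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Graph = frozenset  # an RDF graph as an immutable set of (s, p, o) tuples
TimeVaryingGraph = Callable[[int], Graph]  # time instant -> instantaneous RDF graph


@dataclass
class RSPQLDataset:
    default_graph: Optional[Graph] = None  # the default graph is optional
    named_graphs: dict[str, Graph] = field(default_factory=dict)
    time_varying_graphs: dict[str, TimeVaryingGraph] = field(default_factory=dict)

    def at(self, t: int) -> dict[str, Graph]:
        """Static named graphs plus every time-varying graph evaluated at time t."""
        view = dict(self.named_graphs)
        for name, tvg in self.time_varying_graphs.items():
            view[name] = tvg(t)
        return view


ds = RSPQLDataset(
    default_graph=frozenset({("ex:sensor1", "rdf:type", "ex:Sensor")}),
    time_varying_graphs={
        "ex:observations": lambda t: frozenset({("ex:sensor1", "ex:hasValue", f'"{t}"')}),
    },
)
print(ds.at(3)["ex:observations"])
```

Static graphs stay the same at every time instant, whereas evaluating the dataset at a different time instant yields a different instantaneous graph for each time-varying graph.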

To avoid querying over very large streams, the concept of a time-based window was introduced. A time-based window is defined by a certain width, a slide parameter, and a starting time, where all of these parameters are expressed in time units. Concretely, such a window takes an RDF stream as input and produces a time-varying graph.
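The following Python sketch applies a time-based window with a given width, slide and starting time to a finite stream prefix. It assumes window intervals of the form (o, c], with o = t0 + i · slide and c = o + width, which is how we understand the RSP-QL convention. Keying each resulting instantaneous graph by the window's closing time is a simplification of how RSP-QL selects the active window.

```python
from typing import Iterator


def window_intervals(width: int, slide: int, t0: int) -> Iterator[tuple[int, int]]:
    """Window intervals (o, c] with o = t0 + i * slide and c = o + width."""
    if width <= 0 or slide <= 0:
        raise ValueError("width and slide must be positive")
    o = t0
    while True:
        yield o, o + width
        o += slide


def time_based_window(stream: list, width: int, slide: int, t0: int,
                      until: int) -> dict[int, frozenset]:
    """Apply the window to a finite stream prefix: map the closing time c of
    every window up to `until` to the graph of statements timestamped in (o, c]."""
    graphs = {}
    for o, c in window_intervals(width, slide, t0):
        if c > until:
            break
        graphs[c] = frozenset(triple for triple, t in stream if o < t <= c)
    return graphs


stream = [(("ex:sensor1", "ex:hasValue", f'"{v}"'), t)
          for t, v in [(1, "a"), (2, "b"), (3, "c"), (4, "d")]]
tvg = time_based_window(stream, width=2, slide=1, t0=0, until=4)
for c, graph in tvg.items():
    print(c, sorted(obj for _, _, obj in graph))
```

With a width of 2 and a slide of 1, consecutive windows overlap, so the statement at time 2 appears in both the window closing at 2 and the window closing at 3.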

To model the different ways in which repeated query evaluation can occur, so-called evaluation strategies were introduced. For example, the Content Change (CC) strategy makes the window report results when the window contents change. The Window Close (WC) strategy reports when the window closes. The Non-empty Content (NC) strategy reports if the active window is not empty. The Periodic (P) strategy reports at regular time intervals.
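The four strategies can be contrasted with a small, tick-based Python sketch. Each `Tick` records the contents of the active window at one time instant and whether that window closes at that instant; this representation, the function name and the treatment of each strategy in isolation are simplifying assumptions rather than the formal RSP-QL definitions.

```python
from typing import NamedTuple, Optional


class Tick(NamedTuple):
    t: int               # time instant
    contents: frozenset  # contents of the active window at time t
    closes: bool         # whether the active window closes at time t


def report_times(ticks: list, strategy: str, period: Optional[int] = None) -> list:
    """Time instants at which a window reports, under one evaluation strategy."""
    if strategy == "P" and not period:
        raise ValueError("the Periodic strategy needs a positive period")
    reports = []
    previous = frozenset()  # the window starts out empty
    for tick in ticks:
        if strategy == "CC":    # Content Change: contents differ from the previous instant
            fire = tick.contents != previous
        elif strategy == "WC":  # Window Close: the active window closes now
            fire = tick.closes
        elif strategy == "NC":  # Non-empty Content: the active window is not empty
            fire = bool(tick.contents)
        elif strategy == "P":   # Periodic: every `period` time units
            fire = tick.t % period == 0
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        if fire:
            reports.append(tick.t)
        previous = tick.contents
    return reports


ticks = [
    Tick(0, frozenset(), False),
    Tick(1, frozenset({"a"}), False),
    Tick(2, frozenset({"a"}), True),
    Tick(3, frozenset(), False),
    Tick(4, frozenset({"b"}), True),
]
for s in ("CC", "WC", "NC"):
    print(s, report_times(ticks, s))
print("P", report_times(ticks, "P", period=2))
```

On the same sequence of window states, each strategy reports at a different set of time instants, which is exactly why the choice of strategy must be made explicit when comparing RSP systems.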

Finally, after windowing, query execution results can be reported in different ways, each of which adds time annotations to the solution mappings. RStream annotates an input sequence of solution mappings with the evaluation time; IStream streams the difference between the answer of the current evaluation and the one of the previous iteration; DStream streams the part of the answer at the previous iteration that is not in the current one.
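The three streaming operators can be sketched in Python as follows, treating answers as sets of solution mappings and ignoring multiplicities. The `mapping` helper, which turns variable bindings into a hashable tuple, is a hypothetical convenience for the example.

```python
def mapping(**bindings: str) -> tuple:
    """A solution mapping as a hashable, order-independent tuple of (variable, value) pairs."""
    return tuple(sorted(bindings.items()))


def rstream(current: list, t: int) -> list:
    """RStream: every mapping of the current answer, annotated with evaluation time t."""
    return [(mu, t) for mu in current]


def istream(previous: list, current: list, t: int) -> list:
    """IStream: mappings in the current answer that were not in the previous one."""
    return [(mu, t) for mu in current if mu not in previous]


def dstream(previous: list, current: list, t: int) -> list:
    """DStream: mappings in the previous answer that are not in the current one."""
    return [(mu, t) for mu in previous if mu not in current]


previous = [mapping(s="ex:sensor1", v="1"), mapping(s="ex:sensor2", v="5")]
current = [mapping(s="ex:sensor1", v="1"), mapping(s="ex:sensor2", v="6")]
print(rstream(current, 2))
print(istream(previous, current, 2))
print(dstream(previous, current, 2))
```

When only the value of `ex:sensor2` changes, RStream repeats both mappings, IStream emits only the new binding, and DStream emits only the binding that disappeared.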

TPF-QS in terms of RSP-QL

From the perspective of a TPF-QS client, the data that was retrieved from a TPF server can be interpreted as an RSP-QL stream, for which we introduced a formal mapping.

Based on this, all elements of the RSP-QL model can be applied.

Windows within TPF-QS can have a configurable starting time, and always have a width and slide parameter of exactly one time unit. As a consequence, the evaluation of a window in TPF-QS always produces a time-varying graph that contains exactly one instantaneous RDF graph.
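The consequence of a width and slide of one time unit can be illustrated with a short Python sketch. It assumes integer time units and the (o, c] interval convention, under which each window (c - 1, c] covers a single time instant c; the function name is hypothetical.

```python
def tpfqs_windows(stream: list, t0: int, until: int) -> dict[int, frozenset]:
    """Windows with width = slide = 1, starting at t0: intervals (c - 1, c].

    With integer time units, each interval contains a single time instant c,
    so every window yields exactly one instantaneous graph: the one at time c.
    """
    return {
        c: frozenset(triple for triple, t in stream if c - 1 < t <= c)
        for c in range(t0 + 1, until + 1)
    }


stream = [
    (("ex:sensor1", "ex:hasValue", '"a"'), 1),
    (("ex:sensor1", "ex:hasValue", '"b"'), 2),
    (("ex:sensor2", "ex:hasValue", '"c"'), 2),
]
for c, graph in tpfqs_windows(stream, t0=0, until=2).items():
    print(c, sorted(graph))
```

Unlike the overlapping windows of a general time-based window, no statement ever appears in two windows here, because consecutive intervals do not overlap.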

TPF-QS supports two configurable evaluation strategies: Periodic and Mapping Expire (ME). The Mapping Expire strategy is specific to TPF-QS, and is possible because of the time validity annotations that are exposed by TPF servers. In summary, Mapping Expire will make the window report when the validity of an RDF statement that was used in the last solution mapping expires.
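The Mapping Expire strategy can be sketched in Python as taking the earliest expiration time among all statements that were used to derive the last solution mappings. The `AnnotatedTriple` type with a single integer `expires` field is a hypothetical stand-in for the time validity annotations exposed by TPF servers, not their actual representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AnnotatedTriple:
    s: str
    p: str
    o: str
    expires: int  # end of the statement's validity (hypothetical representation)


def next_report_time(derivations: list[list[AnnotatedTriple]]) -> Optional[int]:
    """Mapping Expire: the window reports again as soon as any statement used to
    derive the last solution mappings stops being valid."""
    times = [tr.expires for triples in derivations for tr in triples]
    return min(times) if times else None


def should_report(now: int, derivations: list[list[AnnotatedTriple]]) -> bool:
    """True once the earliest validity among the used statements has expired."""
    t = next_report_time(derivations)
    return t is not None and now >= t


derivations = [
    [AnnotatedTriple("ex:sensor1", "ex:hasValue", '"1"', expires=10),
     AnnotatedTriple("ex:sensor1", "rdf:type", "ex:Sensor", expires=100)],
    [AnnotatedTriple("ex:sensor2", "ex:hasValue", '"4"', expires=7)],
]
print(next_report_time(derivations))  # 7
```

Because the next evaluation is driven by the data itself, statements with a long validity (such as the `rdf:type` statement above) do not trigger unnecessary re-evaluations.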

Table 16: Comparison of TPF-QS, C-SPARQL, and CQELS in terms of the main elements of the RSP-QL reference model.

From the document Storing and querying evolving knowledge graphs on the web (pages 149-152)