
Trajectory Query Processing

In the document Mobility, Data Mining and Privacy (pages 183-188)

Trajectory Database Systems

6.7 Open Issues: Roadmap

6.7.2 Trajectory Query Processing

Although a considerable amount of research exists on trajectory query processing, several issues remain open. In this section, we suggest that (a) nearest-neighbor search needs more efficient support from the various indexing methods, (b) trajectory similarity search and derived information querying (involving speed, heading, etc.) need to be supported by general-purpose indexes, (c) query optimization techniques must be examined further, and (d) trajectory querying under uncertainty needs further study.

6.7.2.1 Nearest-Neighbor Search

As previously mentioned, R-tree-like structures can efficiently support NN queries, whereas the papers proposing the remaining spatiotemporal indexes do not consider NN search algorithms. However, for some of these proposals, NN querying can be easily supported. For example, since in the FNR-tree the underlying network is indexed by a conventional R-tree, the best-first algorithm described

176 E. Frentzos et al.

in [31] can be employed to find the spatial nearest neighbor; then, given that the network line segments (i.e., the spatial elements of the trajectory segments) are reported in increasing order of their distance from the query object, the algorithm would have to report such nearest segments until it retrieves the first one that overlaps the query in the temporal dimension. A similar approach can also be employed in the MON-tree, while SETI would have to search among all entries contained in the cell in which the query point lies.
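The filtering step described above can be illustrated with a minimal sketch. It is not the FNR-tree or the algorithm of [31]: a binary heap over all segments stands in for the incremental R-tree traversal, and the `Segment` record, its field names, and `nearest_in_time` are hypothetical names introduced only for this example.

```python
import heapq
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Segment:
    # Hypothetical record: the spatial part of a trajectory segment
    # plus the time interval during which the object moved along it.
    traj_id: int
    x1: float
    y1: float
    x2: float
    y2: float
    t_start: float
    t_end: float


def point_segment_distance(px, py, s):
    """Euclidean distance from the point (px, py) to the spatial part of s."""
    dx, dy = s.x2 - s.x1, s.y2 - s.y1
    if dx == 0 and dy == 0:
        return hypot(px - s.x1, py - s.y1)
    # Project the point onto the segment and clamp to its endpoints.
    u = ((px - s.x1) * dx + (py - s.y1) * dy) / (dx * dx + dy * dy)
    u = max(0.0, min(1.0, u))
    return hypot(px - (s.x1 + u * dx), py - (s.y1 + u * dy))


def nearest_in_time(segments, px, py, q_start, q_end):
    """Pop segments in increasing spatial distance from (px, py) and return
    the first one whose lifetime overlaps [q_start, q_end], or None.

    Assumption: the heap replaces the incremental best-first traversal of
    the R-tree that indexes the network.
    """
    heap = [(point_segment_distance(px, py, s), i) for i, s in enumerate(segments)]
    heapq.heapify(heap)
    while heap:
        dist, i = heapq.heappop(heap)
        s = segments[i]
        if s.t_start <= q_end and q_start <= s.t_end:
            return s, dist
    return None
```

For instance, if the spatially closest segment was traversed only during [0, 5] and the query interval is [12, 15], the loop skips it and continues with the next closest segment.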

6.7.2.2 Trajectory Similarity Search

Although, as previously discussed, there is a considerable number of research papers on similarity search, the majority of the proposals exploit dedicated index structures to prune the search space and efficiently support the most similar trajectory (MST) search. However, these dedicated structures require costly preprocessing steps and do not meet the requirement of online operation, since there is no obvious way to update them while the database is in use. Therefore, future work needs to deal with the k-most similar trajectory search in MODs storing historical trajectory information, exploiting existing index structures that can also be used to support other types of queries. Moreover, in order to use traditional index structures, future work should be based on novel metrics that satisfy the triangle inequality, since the schemes already proposed [12, 70] typically use nonmetric measures that cannot be indexed with the majority of the proposed spatiotemporal indexes. One such proposal for the dissimilarity between trajectories that satisfies the above requirements is the following [38]:

\[
\mathrm{DISSIM}(Q,T) = \int_{t_1}^{t_n} \mathrm{Dist}_{Q,T}(t)\,dt, \tag{6.3}
\]

where $\mathrm{Dist}_{Q,T}(t)$ is the Euclidean distance between trajectories $Q$ and $T$ as a function of time. However, adopting the model where each trajectory is represented by a collection of discrete points with linear interpolation applied in between, the definition of dissimilarity can be transformed to [22]:

\[
\mathrm{DISSIM}(Q,T) = \sum_{k=1}^{n-1} \int_{t_k}^{t_{k+1}} \mathrm{Dist}_{Q,T}(t)\,dt, \tag{6.4}
\]

where $t_k$ are the timestamps at which objects $T$ and $Q$ recorded their positions. Frentzos et al. [22] address exactly this problem: they evaluate the above equation in closed form, which is nevertheless computationally expensive, and then approximate it efficiently using numerical analysis techniques.

Moreover, Frentzos et al. [22] provide an efficient algorithm, based on several novel metrics and heuristics used for pruning purposes.
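To make Eq. 6.4 concrete, the following minimal sketch approximates each per-interval integral with the trapezoid rule. The choice of the trapezoid rule is an assumption made for illustration and is not claimed to be the exact approximation scheme of [22]; the function name `dissim_trapezoid` and the `(t, x, y)` tuple layout are likewise hypothetical. Both trajectories are assumed to be sampled at the same timestamps.

```python
from math import hypot


def dissim_trapezoid(q, t):
    """Approximate DISSIM(Q, T) of Eq. 6.4 with the trapezoid rule.

    q, t: lists of (timestamp, x, y) tuples sampled at identical timestamps
    (an assumption of this sketch), with linear interpolation in between.
    """
    if len(q) != len(t) or len(q) < 2:
        raise ValueError("both trajectories need the same number (>= 2) of samples")
    total = 0.0
    for k in range(len(q) - 1):
        (t0, qx0, qy0), (t1, qx1, qy1) = q[k], q[k + 1]
        (s0, tx0, ty0), (s1, tx1, ty1) = t[k], t[k + 1]
        if t0 != s0 or t1 != s1:
            raise ValueError("timestamps of Q and T must coincide")
        d_start = hypot(qx0 - tx0, qy0 - ty0)  # Dist at t_k
        d_end = hypot(qx1 - tx1, qy1 - ty1)    # Dist at t_{k+1}
        total += 0.5 * (d_start + d_end) * (t1 - t0)
    return total
```

Under linear interpolation the distance between the two objects is a convex function of time within each interval, so this estimate never falls below the exact integral; it is exact whenever the distance itself changes linearly.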

Moreover, additional algorithms to support other types of similarity search (e.g., directional, speed pattern, etc.), as already discussed in the previous chapter, have to be proposed. For example, since the evolution of the speed or heading of a given trajectory can be considered as a one-dimensional time series, metrics and algorithms already used in the context of time series can be directly applied (such as edit distance, LCSS, and dynamic time warping).
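As an illustration of treating speed as a one-dimensional time series, the sketch below derives a per-segment speed series from sampled positions and compares two such series with a textbook dynamic time warping recurrence. The function names and the `(t, x, y)` tuple layout are assumptions made for this example; no particular system's implementation is implied.

```python
from math import hypot


def speed_series(traj):
    """Average speed on each segment of traj, a list of (t, x, y) tuples
    with strictly increasing timestamps."""
    return [
        hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (t0, x0, y0), (t1, x1, y1) in zip(traj, traj[1:])
    ]


def dtw(a, b):
    """Classic dynamic time warping distance between two numeric sequences,
    using the absolute difference as the local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Two objects with the same speed pattern sampled at different rates, for example `[1, 2, 3]` and `[1, 2, 2, 3]`, obtain a DTW distance of zero. Note that DTW does not satisfy the triangle inequality, so it is subject to the indexing limitation of nonmetric measures mentioned earlier in this section.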

6.7.2.3 Derived Information Queries

As discussed earlier, queries regarding derived information on trajectories have attracted limited interest. This category includes queries of the following types: "Find objects whose instantaneous speed is between Vmin and Vmax, inside a given time period (and/or given spatial extent)" or "Find objects moving with an average heading between dir1 and dir2". Currently, such queries can be answered either by employing a simple temporal index storing the speed or heading time series or through an exhaustive search over the trajectories stored in the database.

On the other hand, assuming that trajectories are indexed by an R-tree-like spatiotemporal index, some preliminary information about each object’s velocity vector can be found before accessing the leaf node containing its entries. For example, the dimensions of the bounding box can provide a first estimate of the average speed of the object during the temporal period covered by the node MBB. Moreover, by employing the TB-tree, which stores in each leaf node segments from the same trajectory, we can estimate tighter bounds for the maximum and minimum average speed of a single trajectory without actually accessing leaf nodes, using the fact that the length of the part of the trajectory inside the leaf MBB cannot be smaller than the MBB’s spatial diagonal, while its lifetime is exactly determined by the MBB’s temporal extent.
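The lower bound implied by the TB-tree observation above can be written down directly. The sketch below is illustrative only: the function names and the flat MBB parameter list are assumptions, and only the lower bound on the average speed is covered.

```python
from math import hypot


def min_average_speed(x_min, y_min, x_max, y_max, t_min, t_max):
    """Lower bound on the average speed of the trajectory part enclosed by a
    TB-tree leaf MBB: its length is at least the spatial diagonal and its
    lifetime equals the temporal extent of the MBB."""
    duration = t_max - t_min
    if duration <= 0:
        raise ValueError("the MBB must have a positive temporal extent")
    return hypot(x_max - x_min, y_max - y_min) / duration


def can_skip_leaf(mbb, v_max):
    """Hypothetical pruning test: a leaf cannot satisfy 'average speed of
    this trajectory part <= v_max' if even the lower bound exceeds v_max."""
    return min_average_speed(*mbb) > v_max
```

For a leaf MBB spanning 3 by 4 spatial units over 2 time units, the enclosed trajectory part must have moved at an average speed of at least 2.5 units per time unit.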

6.7.2.4 Spatiotemporal Query Optimization

Query optimization concerns the development of selectivity estimation techniques and cost formulas for the execution of the various types of queries. Regarding the selectivity estimation of spatiotemporal queries against trajectory data sets, there are two equally significant directions. The first deals with estimating the number of three-dimensional line segments in the spatiotemporal data space retrieved by a given spatiotemporal range query, while the second deals with the actual number of distinct trajectories retrieved by the same query.

The former is useful for approximating the execution cost of a query, since all the proposed indexing schemes physically index three-dimensional line segments. Therefore, cost model formulas already proposed for query optimization (such as the one presented in 6.1 for the R*-tree [35]), which involve the data set cardinality N (equal to the data set density in the unit space), would have to use the local density N′ instead of N to produce a more accurate result, where N′ and N refer to the number of line segments and not the number of distinct trajectories. On the basis of the above discussion, an extension of a simple spatial histogram, such as the one presented in [1], to the spatiotemporal space could be straightforwardly used to efficiently approximate the data set’s local density.

On the other hand, the latter estimation (i.e., the number of distinct trajectories) is not an easy task, since it involves the well-known distinct-counting problem [63]. The distinct-counting problem arises when an object samples its position at several timestamps inside a given query window and is therefore counted multiple times in the query result. Tao et al. [63] provide a solution to this problem by integrating spatiotemporal indexes with sketches, traditionally used for approximate query processing. However, their proposal reduces the space requirements only by a small factor (typically to about 40% of the original database size), while the corresponding index structure is maintained on disk. Clearly, such an approach cannot be used in place of histograms (which have a typical size of a few kilobytes [1]), since it introduces a sizeable overhead in terms of both memory space and processing time.

In the same fashion, a spatiotemporal histogram on the number of distinct trajectories would have to partition the space into several spatiotemporal buckets, counting the number of distinct trajectories inside each bucket. However, when producing a selectivity estimate for a query window that contains more than one bucket, the estimate cannot be computed as the sum of the cardinalities of the buckets, since trajectories may be counted several times depending on the number of buckets they overlap. Figure 6.17 exemplifies this problem, illustrating four histogram buckets (B1, B2, B3, B4) along with their respective selectivity Sel(Bi): the selectivity of all four buckets together, Sel(∑Bi) = 3, is far from the sum ∑Sel(Bi) = 7, because trajectories T1, T2, T3 are counted as many times as the number of buckets each of them overlaps. Moreover, the same problem arises during histogram construction following the methodology introduced in [1] for simple spatial histograms: the construction algorithm initially calculates the number of distinct objects inside each cell produced by a dense spatial grid, and then in each iteration it aggregates groups of cells to form wider buckets, based on the MinSkew heuristic. During this aggregation, however, the number of trajectories inside each resulting bucket clearly cannot be calculated as the sum of the trajectories contained inside each fundamental cell.
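The distinct-counting problem can be reproduced in a few lines. The bucket contents below are a hypothetical assignment chosen only so that the totals match the values 7 and 3 quoted for Fig. 6.17; they are not read from the figure itself, and the function names are likewise illustrative.

```python
# Hypothetical assignment of trajectories to histogram buckets
# (not taken from Fig. 6.17; chosen to reproduce its totals).
buckets = {
    "B1": {"T1", "T2"},
    "B2": {"T1", "T3"},
    "B3": {"T2", "T3"},
    "B4": {"T3"},
}


def summed_selectivity(buckets):
    """Naive estimate: add up the per-bucket distinct counts."""
    return sum(len(ids) for ids in buckets.values())


def distinct_selectivity(buckets):
    """Correct answer: count each trajectory once across all buckets."""
    return len(set().union(*buckets.values()))
```

Here the naive sum yields 7 while only 3 distinct trajectories exist. Computing the exact answer requires the trajectory identifiers themselves, which is precisely the information a histogram of a few kilobytes does not retain.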

Fig. 6.17 The distinct-counting problem in trajectory histograms

Future work in the field of spatiotemporal query optimization includes the development of formulas for selectivity estimation on a variety of queries, such as spatiotemporal joins, similarity-based search, derived information queries, etc.

Moreover, since the work on spatiotemporal cost models is very limited, the need for cost models regarding the variety of the proposed spatiotemporal indexes arises.

Finally, interesting directions arise when dealing with cost models in the presence of uncertainty. For example, while probabilistic queries have been well studied in the context of spatial and spatiotemporal databases, there is no work, in either spatial or spatiotemporal databases, on estimating the selectivity (or the processing cost) of a probabilistic query of the form: "retrieve objects inside a given spatial (or spatiotemporal) interval with probability greater than x%".

6.7.2.5 Querying Imprecise Trajectories

There are several interesting issues for future work on query processing under location uncertainty. One important research direction is to lower the execution time of already proposed range queries and nearest-neighbor queries under uncertainty.

An extension is to investigate other probabilistic queries, such as k-nearest-neighbor queries and reverse nearest-neighbor queries. A critical issue in all these works is to define the quality of query answers over imprecise data, i.e., how reliable the query results are. By the same rationale, it is also important to bound the error associated with the provided answers.

Future work on trajectory querying under uncertainty must also include the determination of the error introduced in query results by measurement and sampling errors. For example, a very interesting MOD capability would be to provide the user a priori with an estimate of the error introduced in query results. This approach can be considered as an alternative to the processing of probabilistic queries, already examined in the spatiotemporal domain. Possible research steps in this direction are:

– To estimate the error (number of false negatives and false positives) introduced in queries over uniformly distributed spatial point data. This error could be estimated by formulating the probability that a point incurs a false hit with respect to a single query and then integrating this probability over the entire space so as to produce its mean value.

– To extend this approach to the spatiotemporal domain, also assuming a uniform distribution of trajectories.

– Finally, to relax the uniformity assumption by employing spatiotemporal approximation techniques such as the one presented in [63] or spatiotemporal histograms as previously discussed.
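The first of these steps can be explored numerically before any closed formula is derived. The sketch below is a Monte Carlo stand-in for the integration described above, under assumptions that are not part of the source: true positions are uniform in the unit square, the recorded position deviates from the true one by an error drawn uniformly from a disk of a given radius, and the function name `false_hit_rates` is hypothetical.

```python
import random
from math import cos, pi, sin, sqrt


def false_hit_rates(query, radius, n_samples=100_000, seed=0):
    """Estimate the false-positive and false-negative rates of a range query.

    query: (x_min, y_min, x_max, y_max) inside the unit square.
    radius: radius of the (assumed) uniform-disk location error.
    Returns (false_positive_rate, false_negative_rate) over all sampled points.
    """
    rng = random.Random(seed)
    x_min, y_min, x_max, y_max = query

    def inside(x, y):
        return x_min <= x <= x_max and y_min <= y <= y_max

    false_pos = false_neg = 0
    for _ in range(n_samples):
        tx, ty = rng.random(), rng.random()       # true position
        rho = radius * sqrt(rng.random())         # uniform over the disk area
        theta = 2 * pi * rng.random()
        rx, ry = tx + rho * cos(theta), ty + rho * sin(theta)  # recorded position
        true_in, recorded_in = inside(tx, ty), inside(rx, ry)
        if recorded_in and not true_in:
            false_pos += 1
        elif true_in and not recorded_in:
            false_neg += 1
    return false_pos / n_samples, false_neg / n_samples
```

With a zero radius both rates are zero; as the radius grows, both rates increase, since only points lying within one radius of the query boundary can be misclassified.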

Regarding the second step and time-slice queries, as illustrated in Fig. 6.18, the extension is straightforward, since the temporal slice produces a set of (spatial) points along with their uncertainty areas. In the general case of range queries with nonzero temporal extent, however, this extension is not an easy task; nevertheless, we provide some hints in this direction. Consider, for example, Fig. 6.19, which illustrates the trajectories of three moving objects along with their uncertainty regions (e.g., the dotted areas) in the x–t space, together with a range query. For simplicity and without loss of generality, all trajectories are illustrated as line segments. Trajectories T1 and T2 can never produce a false hit with respect to the query window, because for at least one time instance their uncertainty region was located entirely inside it. On the other hand, trajectory T3 may produce a false hit because it is not inside the query window, yet its uncertainty region crosses it. Generalizing this observation, we can state that only objects whose uncertainty area crosses the query window without being entirely inside it at any time instance may contribute to the number of false hits in the results of the query.

Fig. 6.18 The effect of uncertainty in time-slice queries over moving object trajectories is equivalent to the effect of the uncertainty in range queries over spatial point data

Fig. 6.19 The effect of uncertainty in general range queries
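The rule stated for general range queries (only objects whose uncertainty area crosses the query window without ever being entirely inside it may contribute false hits) can be sketched for the simplified x–t setting of Fig. 6.19. The sketch assumes a single linear segment per trajectory and an uncertainty band of constant half-width `r` around it; these assumptions, the tuple layouts, and the function names are introduced for illustration only.

```python
def x_range_during(seg, t_lo, t_hi):
    """Range of x values taken by a linear segment while t is in [t_lo, t_hi].

    seg: (t_start, x_start, t_end, x_end). Returns (lo, hi), or None if the
    segment's lifetime does not overlap [t_lo, t_hi].
    """
    t0, x0, t1, x1 = seg
    a, b = max(t0, t_lo), min(t1, t_hi)
    if a > b:
        return None

    def x_at(t):
        return x0 if t1 == t0 else x0 + (x1 - x0) * (t - t0) / (t1 - t0)

    xa, xb = x_at(a), x_at(b)
    return min(xa, xb), max(xa, xb)


def may_cause_false_hit(seg, r, query):
    """query: (x_lo, x_hi, t_lo, t_hi); r: half-width of the uncertainty band.

    True only if the uncertainty band touches the query window at some
    instant but is never entirely inside it.
    """
    qx_lo, qx_hi, qt_lo, qt_hi = query
    span = x_range_during(seg, qt_lo, qt_hi)
    if span is None:
        return False
    lo, hi = span
    # Band [x - r, x + r] intersects [qx_lo, qx_hi] at some instant.
    crosses = lo <= qx_hi + r and hi >= qx_lo - r
    # Band lies within [qx_lo, qx_hi] at some instant.
    fully_inside = (qx_lo + r <= qx_hi - r) and lo <= qx_hi - r and hi >= qx_lo + r
    return crosses and not fully_inside
```

With the query window x in [0, 10], t in [0, 10] and r = 1, a segment staying at x = 5 or passing from x = -5 to x = 5 is excluded, whereas a segment staying at x = 10.5 is flagged as a potential false hit.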
