Dealing with Location Uncertainty - Trajectory Database Systems


\[ \frac{N}{f^{j}} \prod_{i=1} \left( s_{i,j} + q_i \right), \tag{6.1} \]

where N is the data set cardinality, f is the fanout of the tree nodes, j is the respective level of the R-tree, and q_i is the extent of query q along the i-th dimension (a formula that originates in the spatial database domain [65]). The experimental evaluation presented in [59] shows that the proposed model provides accurate estimates of the expected number of node accesses in all settings, whereas the other cost models tested (such as [65]) fail completely. Although the model was not developed exclusively for spatiotemporal data, it can predict the performance of a three-dimensional R*-tree, since it supports tree nodes that are elongated along the temporal dimension. However, it cannot be applied to other R-tree variants, since the calculation of the ERF is based on the R*-tree splitting algorithm.

6.5 Dealing with Location Uncertainty

The recorded location of a moving object does not always represent its precise location, mainly because of erroneous GPS measurements and sampling errors. This problem, known as location uncertainty, affects several aspects of a MOD, such as representation, querying, and indexing. So far, related research has emphasized representation issues, i.e., how the notion of uncertainty is incorporated into the representation of moving objects within a MOD (see Chap. 5 for more details). Lately, however, several approaches have emerged that deal with querying and indexing under uncertainty. In this section we present these approaches.

Pfoser and Jensen [46] constrain the uncertainty area of the moving object between two consecutive sampled positions to be the intersection of the uncertainty areas of the samples. In addition, they illustrate how their uncertainty model can be used for query processing in conjunction with a moving-point index that supports range queries. The supported queries are the so-called probabilistic range queries (PRQ), i.e., “Retrieve the moving-object positions that were inside query rectangle A at some time between time points B and C with a probability of at least X%.” The standard filter-and-refinement method, borrowed from the spatial query processing domain, is adopted for query processing. More specifically, the authors expand the query window A so as to retrieve all line segments containing positions that lie in A with a probability higher than or equal to X%; the expansion is determined using the probability X% and the worst-case sampling error (represented as a circle). This comprises the filtering step (left part of Fig. 6.9), which usually returns a superset of the qualifying positions. In the refinement step (right part of Fig. 6.9), the positions contained in the retrieved line segments that actually lie within the query rectangle A with probability at least X% are identified. Instead of the worst-case sampling error that is used during the filtering step, the sampling error (represented as an ellipse), unique for each position, is used during the refinement step to evaluate positions in time.

166 E. Frentzos et al.

Fig. 6.9 Filter step (left) – refinement step (right) [46]
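The filtering step can be illustrated with a minimal Python sketch. It assumes an axis-aligned query window and takes the expansion margin as a given number; how [46] derives that margin from X% and the worst-case error is not reproduced here. The names `expand_window` and `filter_segments` are hypothetical, and the bounding-box test is a deliberately coarse stand-in for an index lookup.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Segment = Tuple[Tuple[float, float], Tuple[float, float]]


def expand_window(window: Rect, margin: float) -> Rect:
    """Grow the query window by `margin` on every side (margin is an assumed input)."""
    xmin, ymin, xmax, ymax = window
    return (xmin - margin, ymin - margin, xmax + margin, ymax + margin)


def filter_segments(segments: List[Segment], window: Rect, margin: float) -> List[Segment]:
    """Keep segments whose bounding box overlaps the expanded window.

    The result is a superset of the qualifying segments, as expected of a filter step.
    """
    xmin, ymin, xmax, ymax = expand_window(window, margin)
    hits = []
    for (x1, y1), (x2, y2) in segments:
        overlaps_x = min(x1, x2) <= xmax and max(x1, x2) >= xmin
        overlaps_y = min(y1, y2) <= ymax and max(y1, y2) >= ymin
        if overlaps_x and overlaps_y:
            hits.append(((x1, y1), (x2, y2)))
    return hits
```

A refinement step would then evaluate each candidate position against the per-position error model, which this sketch leaves out.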

In fact, the emphasis in this work is on reducing the uncertainty in between sampled positions rather than on querying and indexing. The remaining approaches [13, 68, 69], as will be shown below, adopt simpler uncertainty models and emphasize efficient and effective query processing.

Trajcevski et al. [69] associate an uncertainty threshold r with the whole trajectory. Each point (x, y, t) of the trajectory is associated with an r-uncertainty area, which is a horizontal disk with radius r and center (x, y, t), where (x, y) is the expected position at time t. Thus, the trajectory is modeled as a cylindrical volume in three-dimensional space around the given trajectory polyline. In this work, two categories of operators for querying moving objects under uncertainty are introduced, namely point queries and spatiotemporal range queries, both referring to a single trajectory. Point queries refer either to the location of the moving object at a specific time point or to the time point(s) at which the moving object is expected to be at a specific location. Spatiotemporal range queries extend the traditional spatiotemporal range queries by also considering the uncertainty that is inherent in the database locations of the moving objects.
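As an illustration of the first kind of point query, the following Python sketch computes the expected position at a time t and tests whether a candidate location falls inside the r-uncertainty disk. It assumes linear interpolation between consecutive samples with strictly increasing timestamps; the function names are hypothetical and not taken from [69].

```python
from bisect import bisect_right
from math import hypot


def expected_position(trajectory, t):
    """Linearly interpolate (x, y) at time t on a list of (x, y, t) samples sorted by t.

    Assumes at least two samples with strictly increasing timestamps.
    """
    times = [p[2] for p in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t lies outside the trajectory's time span")
    # Index of the sample that closes the segment containing t.
    i = min(bisect_right(times, t), len(trajectory) - 1)
    (x0, y0, t0), (x1, y1, t1) = trajectory[i - 1], trajectory[i]
    a = (t - t0) / (t1 - t0)
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


def within_uncertainty(trajectory, r, x, y, t):
    """True if (x, y) lies in the disk of radius r around the expected position at time t."""
    ex, ey = expected_position(trajectory, t)
    return hypot(x - ex, y - ey) <= r
```

For a trajectory sampled at (0, 0, 0), (10, 0, 10), and (10, 10, 20), the expected position at t = 5 is (5, 0), and with r = 1 the location (5.5, 0.5) is a possible position at that time.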

Location uncertainty affects the defined queries in both their temporal and spatial aspects. Regarding the temporal effect, one may query for the objects that are inside the query region sometime during the time interval or for those that are inside the query region always during the time interval. Intuitively, the “sometime” operator corresponds to cases where the moving object appears within the query region for some time during the query time, whereas the “always” operator corresponds to cases where the moving object lies within the query region during the whole query time. Regarding the spatial effect, one may query for the objects that are “possibly inside” or “definitely inside” the query region. Intuitively, the “possibly inside” operator corresponds to cases where part of the uncertainty area of the moving object appears within the query region, whereas the “definitely inside” operator corresponds to cases where the whole uncertainty area of the moving object lies within the query region. Combining the temporal and spatial effects, the following types of spatiotemporal queries under uncertainty arise: Possibly–Sometime–Inside, Possibly–Always–Inside, Always–Possibly–Inside (Fig. 6.10), Definitely–Always–Inside, Definitely–Sometime–Inside, and Sometime–Definitely–Inside (Fig. 6.11).

Fig. 6.10 Possible positions of a moving point with respect to region R_i: (a) Possibly–Sometime–Inside, (b) Possibly–Always–Inside, and (c) Always–Possibly–Inside [69]

Fig. 6.11 Definite positions of a moving point with respect to region R_i: (a) Definitely–Always–Inside, (b) Definitely–Sometime–Inside, and (c) Sometime–Definitely–Inside [69]
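Four of these six predicates depend only on what holds at each individual time instant, so they can be approximated by checking the uncertainty disk at a finite set of sampled times. The Python sketch below does this under several assumptions that are not part of [69]: the query region is an axis-aligned rectangle, the expected positions have already been sampled over the query interval, and behaviour between samples is ignored. Possibly–Always–Inside and Definitely–Sometime–Inside require reasoning over whole possible trajectories and are intentionally left out.

```python
from math import hypot


def disk_intersects_rect(cx, cy, r, rect):
    """True if the disk shares at least one point with the rectangle ("possibly inside")."""
    xmin, ymin, xmax, ymax = rect
    # Distance from the disk centre to the closest point of the rectangle.
    dx = max(xmin - cx, 0.0, cx - xmax)
    dy = max(ymin - cy, 0.0, cy - ymax)
    return hypot(dx, dy) <= r


def disk_inside_rect(cx, cy, r, rect):
    """True if the whole disk lies within the rectangle ("definitely inside")."""
    xmin, ymin, xmax, ymax = rect
    return xmin + r <= cx <= xmax - r and ymin + r <= cy <= ymax - r


def classify(centers, r, rect):
    """Evaluate four predicates on expected positions sampled over the query interval."""
    possibly = [disk_intersects_rect(x, y, r, rect) for x, y in centers]
    definitely = [disk_inside_rect(x, y, r, rect) for x, y in centers]
    return {
        "possibly_sometime_inside": any(possibly),
        "always_possibly_inside": all(possibly),
        "sometime_definitely_inside": any(definitely),
        "definitely_always_inside": all(definitely),
    }
```

The quantifier order is what separates the predicates: `any` over the per-instant test yields a "sometime" variant, while `all` yields an "always" variant.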

For the purposes of query processing, Trajcevski et al. [69] assume that a three-dimensional indexing scheme is provided by the underlying ORDBMS. Each trajectory is inserted by enclosing the respective trajectory volume between t_i and t_{i+1} in an MBB. The standard filter-and-refinement method is adapted for query processing: during the filtering step, the trajectories that have at least one of their MBBs intersecting the query polygon are retrieved. For the refinement step, the method relies on results from the areas of geometry and motion planning [40]. Although this work considers a simple uncertainty model (simpler than that of [46]), it presents an interesting set of queries over uncertain trajectories.
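A minimal Python sketch of the MBB construction and the filtering step follows. It assumes that the query polygon has been replaced by its bounding rectangle extruded over the query time interval, and it scans all segments instead of using an index; both simplifications, and all names, are illustrative rather than taken from [69].

```python
def segment_mbb(p, q, r):
    """MBB (xmin, ymin, tmin, xmax, ymax, tmax) of the uncertainty volume between two samples.

    p and q are (x, y, t) samples; the spatial extent is padded by the uncertainty radius r.
    """
    (x0, y0, t0), (x1, y1, t1) = p, q
    return (min(x0, x1) - r, min(y0, y1) - r, min(t0, t1),
            max(x0, x1) + r, max(y0, y1) + r, max(t0, t1))


def boxes_intersect(a, b):
    """True if two 3D boxes in (min..., max...) layout overlap in every dimension."""
    return all(a[i] <= b[i + 3] and b[i] <= a[i + 3] for i in range(3))


def filter_trajectories(trajectories, r, query_box):
    """Return ids of trajectories with at least one segment MBB intersecting the query box.

    trajectories maps an id to a list of (x, y, t) samples.
    """
    return [tid for tid, pts in trajectories.items()
            if any(boxes_intersect(segment_mbb(p, q, r), query_box)
                   for p, q in zip(pts, pts[1:]))]
```

The candidates returned here would still have to pass the refinement step, since an MBB can intersect the query box even when the cylindrical volume inside it does not.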

Trajcevski [68] provides a methodology for answering PRQ under uncertainty. The queries treated there are of the form: What is the probability that a given moving object was/will be inside a given region sometimes/always during a given time interval? This probability is given by the ratio of the intersection between the trajectory volume and the query region to the whole trajectory volume.

In [72], Wolfson et al. introduce a probabilistic model for processing PRQ in motion databases. The output of this type of query consists of pairs of the form (o_i, p_i), where p_i is the probability that the object o_i intersects the query region R at time t. The uncertain position of the moving object is represented through a density function. Query predicates are divided into two parts: the static part, C_1, which refers to the static attributes of the objects, e.g., color, type, etc., and the dynamic part, C_2, which refers to the location attributes. The idea is to first retrieve the set of objects satisfying the predicates of the static part, C_1, and then proceed with the dynamic part, C_2. So, after the set of objects satisfying C_1 has been retrieved, the routes of the resulting objects are retrieved. Then, for each route r and each atomic predicate p appearing in C_2, the list of intersection intervals of route r with the region defined by p is retrieved; any spatial indexing scheme can be used for this purpose. Finally, the list of intersection intervals of route r with respect to all predicates of query q is computed. For each route r and for each object o traveling on r, the probability that it satisfies query q is given by:

∑

Cheng et al. [13] study the execution of probabilistic range queries (PRQ) and probabilistic nearest-neighbor queries (PNNQ) under uncertainty. They adopt a generic uncertainty model that, for each time point, associates an object with an uncertainty region. The position of the object is modeled through a probability density function, which is zero outside the uncertainty region. The algorithm for processing probabilistic range queries integrates the probability density function over the overlapping area defined by the query region and the object’s uncertainty region. Processing a nearest-neighbor query involves evaluating the probability of each object being closest to a query point q. The adopted solution consists of four phases: projection, pruning, bounding, and evaluation. During the projection phase, the uncertainty region of each moving object is computed based on the uncertainty model used by the application (Fig. 6.12a shows the last recorded object locations and Fig. 6.12b their uncertainty regions). During the pruning phase, the minimum f of the longest distances of the uncertainty regions from q is found, and any object whose shortest distance to q is larger than f is eliminated (Fig. 6.12c shows how pruning removes objects that are irrelevant to q). During the bounding phase, a bounding circle C of radius f and center q is conceptually drawn and any object outside this circle is ignored (this concept is depicted in Fig. 6.12d). During the evaluation phase, for each remaining object o, the probability that it is the nearest neighbor of q at distance r is calculated. This probability is given by the probability of o being at distance r from q times the probability that every other object is at a distance ≥ r from q.

Fig. 6.12 An example of PNNQ processing: (a) locations of objects, (b) uncertainty regions and distances from q, (c) bounding circle, and (d) bounded regions [13]
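Two of these ideas lend themselves to a short Python sketch. The first function evaluates a PRQ for the special case of a uniform density over an axis-aligned rectangular uncertainty region, where the integral reduces to a ratio of areas. The second performs the pruning phase for circular uncertainty regions. Both the uniform density and the region shapes are assumptions made here for brevity, since the model of [13] is generic, and the function names are hypothetical.

```python
from math import hypot


def prq_probability_uniform(region, query):
    """P(object inside query) for a uniform pdf over a rectangular uncertainty region.

    Both arguments are (xmin, ymin, xmax, ymax); the region must have positive area.
    """
    ax0, ay0, ax1, ay1 = region
    bx0, by0, bx1, by1 = query
    w = max(0.0, min(ax1, bx1) - max(ax0, bx0))  # overlap width
    h = max(0.0, min(ay1, by1) - max(ay0, by0))  # overlap height
    return (w * h) / ((ax1 - ax0) * (ay1 - ay0))


def prune_for_pnnq(objects, q):
    """Pruning phase for circular uncertainty regions given as id -> (cx, cy, radius).

    Returns (f, survivors), where f is the smallest of the longest distances from q.
    """
    qx, qy = q
    near, far = {}, {}
    for oid, (cx, cy, rad) in objects.items():
        d = hypot(cx - qx, cy - qy)
        near[oid] = max(0.0, d - rad)  # shortest distance from q to the region
        far[oid] = d + rad             # longest distance from q to the region
    f = min(far.values())
    return f, [oid for oid in objects if near[oid] <= f]
```

With regions a = (3, 0, 1), b = (10, 0, 1), and c = (4, 0, 2) and q at the origin, f equals 4, so b is pruned because even its closest possible position lies farther away than the farthest possible position of a.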

An index structure, the so-called velocity-constrained index (VCI), has been proposed by Cheng et al. [13] and is particularly suited to handling the uncertainty of free-moving objects. VCI is an R-tree-like index structure. It differs from the R-tree in that each node carries an additional field, v_max, which is the maximum possible velocity over all the objects that fall under that node. The only restriction imposed on the movement of objects is that they do not exceed a certain velocity. This velocity could potentially be adjusted if an object needs to move faster than its current maximum velocity. The index is built on the locations of the objects at a given time point, t_0. However, it can also be used at a later time point t without being updated. The idea is that, for a given VCI node, no object under this node can move faster than the maximum velocity stored in the node. Thus, if the MBB is expanded by v_max (t − t_0), then the expanded region is guaranteed to contain all the points under this subtree.
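The expansion rule can be written down directly. The Python sketch below assumes a two-dimensional MBB stored as (xmin, ymin, xmax, ymax) and expands it uniformly in every direction; the function name is hypothetical.

```python
def expand_mbb(mbb, v_max, t0, t):
    """Expand a node's MBB, built at time t0, so that it is valid at a later time t.

    No object under the node moves faster than v_max, so each side grows by v_max * (t - t0).
    """
    if t < t0:
        raise ValueError("t must not precede the index construction time t0")
    xmin, ymin, xmax, ymax = mbb
    d = v_max * (t - t0)
    return (xmin - d, ymin - d, xmax + d, ymax + d)
```

For example, an MBB of (0, 0, 10, 10) built at t_0 = 0 with v_max = 2 becomes (-6, -6, 16, 16) at t = 3. A query at time t is then tested against the expanded box instead of the stored one, which trades some false positives for not having to update the index.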

The last subject related to the management of location uncertainty in trajectory databases is the so-called map-matching problem, which deals with matching tracking data (obtained via GPS or any other positioning method) to an underlying map containing, e.g., a road network. This problem arises because raw trajectory positions cannot be directly matched to the underlying infrastructure; they are mainly affected by two factors [9]: the measurement error introduced by, e.g., GPS, and the sampling error, which depends on the frequency with which position samples are taken. Both contribute to the moving object’s location uncertainty.

Related work on map matching includes, among others, [9, 16, 71, 76], which propose a variety of map-matching algorithms. Perhaps the most promising approach is the one presented in [9], where three map-matching algorithms are presented. These algorithms consider the trajectory nature of the data rather than simply the object’s current position, as is often the case in typical map matching. The first one is an incremental algorithm, which matches consecutive portions of the trajectory to the road network, effectively trading accuracy for speed of computation. Specifically, Brakatsoulas et al. [9] first employ the similarity measure s(p_i, c_j) [23] between a sampled position p_i and a network edge c_j in order to evaluate the likelihood that p_i matches each of the candidate network edges c_j. Then, they propose an algorithm that looks ahead: rather than calculating the similarity for just one sampled position, it takes into account the sum of the similarities of the next l positions against the local candidate path. The value l = 4 was established empirically as optimal in terms of matching quality and running time.

The other two algorithms compare the entire trajectory to candidate paths in the road network using the Fréchet and the weak Fréchet distances. These distances can be illustrated as follows: suppose a man is walking his dog, and that he is constrained to walk on one curve and his dog on another. Both the man and the dog are allowed to control their speed independently; in the case of the simple Fréchet distance they are not allowed to go backward, whereas in the case of the weak Fréchet distance they are. Then, the Fréchet and the weak Fréchet distances between the two curves are the minimal leash lengths that are necessary in each case. The proposed global map-matching algorithms find a curve in the road network that is as close as possible to the given trajectory. The underlying distance measure, i.e., the Fréchet or the weak Fréchet distance, also serves as a quality guarantee for the computed result.
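To make the leash analogy concrete, the Python sketch below computes the discrete Fréchet distance between two polylines by dynamic programming. This is a swapped-in simplification, not the method of [9]: it only lets the man and the dog stand on polyline vertices, so it gives an upper bound on the continuous Fréchet distance, and it covers neither the weak variant nor the search over paths in a road network.

```python
from math import hypot


def discrete_frechet(P, Q):
    """Discrete Fréchet distance between two non-empty polylines given as lists of (x, y)."""
    n, m = len(P), len(Q)

    def dist(i, j):
        return hypot(P[i][0] - Q[j][0], P[i][1] - Q[j][1])

    # ca[i][j] is the shortest leash that lets both walkers reach P[i] and Q[j]
    # without either of them ever stepping backward.
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                reach = 0.0
            elif i == 0:
                reach = ca[0][j - 1]
            elif j == 0:
                reach = ca[i - 1][0]
            else:
                reach = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
            ca[i][j] = max(reach, dist(i, j))
    return ca[n - 1][m - 1]
```

For two parallel horizontal polylines one unit apart the result is 1, whereas a detour through a distant vertex on one curve forces a longer leash even if both curves share their endpoints.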

Finally, the proposed algorithms are evaluated in terms of their running time and the quality of their matching result. A comparison of the asymptotic running times reveals that the incremental algorithm has a significant performance advantage over the global algorithms. On the other hand, the global algorithms were found to produce better matching results.

From the document Mobility, Data Mining and Privacy (pages 173-178)