
Storage and Retrieval Algorithms

4.4 Retrieval Algorithms

As discussed for the storage algorithm (cf. Section 4.3), one of its aims, besides persistence, is to keep a high degree of accessibility by keeping the replicas of a hoverinfo spread throughout the entire anchor area of the hoverinfo. While this mechanism already addresses the accessibility of hoverinfos, what is still missing is the actual retrieval of replicas when required by a mobile application. Therefore, in this section we propose two retrieval algorithms, a push-based one and a pull-based one, according to the access paradigms of hovering information (Subsection 3.5.1). Before describing both retrieval algorithms in detail, the following subsection describes the semantics used when retrieving hoverinfos.

4.4.1 Semantics of Subscriptions and Queries

We propose in this thesis two retrieval algorithms, a push-based one and a pull-based one, according to the hovering information model (cf. Chapter 3). As we will see in the following subsections describing each of the algorithms, the push-based retrieval proactively tries to satisfy subscriptions submitted by mobile applications, whereas the pull-based retrieval reactively tries to match queries submitted by mobile applications. Regardless of how the retrieval algorithms perform their tasks, we explain in this subsection the semantics of subscriptions and queries.

Figure 4.10: Self-activation of passive replicas. (a) First scenario: only passive replicas are present inside the anchor area. (b) The replica h1 self-activates after the expiration of its self-activation timer. (d) The passive replica h1 self-activates some time after entering the anchor area.

A subscription $s$ is defined in the following way:

$$s = \langle \mathit{name}_M, \mathit{op}_M, \mathit{val}_M, A_M, u_M \rangle \tag{4.6}$$

where $\mathit{name}_M$ is the matching name; $\mathit{op}_M$ is a binary operator returning a boolean value, whose first argument is a text and whose second argument is the matching value $\mathit{val}_M$; $A_M$ is the matching relevance area; and $u_M$ is the matching mobile user. Using this notation, we can express the matching semantics between a replica $h$ and a subscription $s$ as follows:

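$$
\begin{aligned}
h \text{ matches } s \Leftrightarrow{} & [(\mathit{name}_M(s) = \mathit{name}(h)) \lor (\mathit{name}_M(s) = \text{ANY\_NAME})] \land{} \\
& [\mathit{op}_M(\mathit{content}(h), \mathit{val}_M(s)) = \text{TRUE}] \land{} \\
& [(\mathit{loc}(s) \in A_M(s)) \land (\mathit{loc}(s) \in A_H(h))] \land{} \\
& [(u_M(s) = \mathit{audience}_H(h)) \lor (u_M(s) = \text{ANY\_USER})]
\end{aligned}
\tag{4.7}
$$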

In the same way, we define a query $q$ as follows:

$$q = \langle \mathit{name}_M, \mathit{op}_M, \mathit{val}_M, u_M \rangle \tag{4.8}$$

where $\mathit{name}_M$ is the matching name; $\mathit{op}_M$ is a binary operator returning a boolean value, whose first argument is a text and whose second argument is the matching value $\mathit{val}_M$; and $u_M$ is the matching mobile user. Using this notation, we can express the matching semantics between a replica $h$ and a query $q$ as follows:

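$$
\begin{aligned}
h \text{ matches } q \Leftrightarrow{} & [(\mathit{name}_M(q) = \mathit{name}(h)) \lor (\mathit{name}_M(q) = \text{ANY\_NAME})] \land{} \\
& [\mathit{op}_M(\mathit{content}(h), \mathit{val}_M(q)) = \text{TRUE}] \land{} \\
& [\mathit{loc}(q) \in A_H(h)] \land{} \\
& [(u_M(q) = \mathit{audience}_H(h)) \lor (u_M(q) = \text{ANY\_USER})]
\end{aligned}
\tag{4.9}
$$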

The semantics of subscriptions and queries are quite similar, with the exception of the matching relevance area $A_M$. A subscription lasts until the mobile user decides to remove it. Moreover, a mobile user may want the subscription to be matched only when the user is at certain places, which is the reason for the matching relevance area. For queries, we cannot define such an area because a query is meant to be matched immediately; its relevance area is thus the current location of the mobile user. Afterwards, the query no longer exists: it has a one-time nature.
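The matching rules of Equations (4.7) and (4.9) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: areas are modelled as circles, the current location of the user is stored in a `loc` field, and the class and function names (`Circle`, `Replica`, `Query`, `Subscription`, `matches_query`, `matches_subscription`) are hypothetical and not part of the thesis.

```python
from dataclasses import dataclass
from math import hypot
from typing import Callable, Tuple

Point = Tuple[float, float]
ANY_NAME = "*"  # assumed encoding of the ANY_NAME wildcard
ANY_USER = "*"  # assumed encoding of the ANY_USER wildcard


@dataclass(frozen=True)
class Circle:
    """Assumption: anchor and relevance areas are circular."""
    center: Point
    radius: float

    def contains(self, p: Point) -> bool:
        return hypot(p[0] - self.center[0], p[1] - self.center[1]) <= self.radius


@dataclass(frozen=True)
class Replica:
    name: str
    content: str
    anchor_area: Circle  # A_H(h)
    audience: str        # audience_H(h)


@dataclass(frozen=True)
class Query:
    name: str                       # name_M
    op: Callable[[str, str], bool]  # op_M(text, value) -> bool
    val: str                        # val_M
    user: str                       # u_M
    loc: Point                      # current location of the mobile user


@dataclass(frozen=True)
class Subscription:
    name: str
    op: Callable[[str, str], bool]
    val: str
    user: str
    loc: Point
    relevance_area: Circle          # A_M


def matches_query(h: Replica, q) -> bool:
    """The four conjuncts of Equation (4.9)."""
    return (
        q.name in (h.name, ANY_NAME)
        and q.op(h.content, q.val)
        and h.anchor_area.contains(q.loc)
        and q.user in (h.audience, ANY_USER)
    )


def matches_subscription(h: Replica, s: Subscription) -> bool:
    """Equation (4.7): as (4.9), plus the relevance-area test loc(s) in A_M(s)."""
    return s.relevance_area.contains(s.loc) and matches_query(h, s)
```

The subscription test reuses the query test, which mirrors the observation that the two semantics differ only in the matching relevance area.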

4.4.2 Pull-Based Retrieval

The Pull-Based Retrieval (PULLBR) algorithm aims at enabling mobile applications to retrieve hoverinfos on demand. A mobile application submits a query according to the syntax and semantics discussed in Subsection 4.4.1. The mobile node on which the application is running first checks its local buffer for matching replicas and then floods the network with the query in a controlled way in order to find further matching replicas. Therefore, the success of the algorithm depends on the accessibility degree of hoverinfos and on the degree of partitioning of the network.

More precisely, whenever a mobile application wants to submit a query, the mobile node on which the application is running, which we call the querying node, first checks its local buffer for matching replicas. If any are found, the querying node delivers them to the mobile application. Afterwards, the querying node starts a querying process by performing a geographically scoped flooding of the query. The mobile application defines the geographical scope of the query, expressed as a distance $d_{MQD}$ (Max Querying Distance) from the querying node. Thus, the query is flooded throughout a circular area around the querying node. In order to avoid a broadcast storm, we follow the distance-based scheme from [NTCS99] when flooding a query. At the end of this process, the query will have flooded the circular area around the querying node. The proportion of nodes that received the query depends on whether the nodes inside the flooding area form a partitioned network, and on the load of the communication channel, as some messages may be lost due to the unreliable nature of broadcasting.
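The forwarding decision of a node during the scoped flooding can be illustrated as follows. This is a simplified sketch in the spirit of a distance-based rebroadcast rule, not the exact procedure of [NTCS99] or of the thesis: the threshold parameter `d_threshold` and the function name `should_forward` are assumptions, and the waiting period during which duplicate copies are collected is left out.

```python
from math import dist  # Python 3.8+


def should_forward(node, origin, heard_from, d_mqd, d_threshold):
    """Decide whether `node` rebroadcasts a query.

    node, origin: (x, y) positions of this node and of the querying node.
    heard_from:   positions of the senders from which this node already
                  received a copy of the query.
    d_mqd:        Max Querying Distance chosen by the application.
    d_threshold:  hypothetical minimum distance to the closest sender below
                  which a rebroadcast would add too little extra coverage.
    """
    # Geographical scope: nodes outside the circular area never forward.
    if dist(node, origin) > d_mqd:
        return False
    # Distance-based rule: forward only if every sender heard so far is far enough.
    closest_sender = min((dist(node, s) for s in heard_from), default=float("inf"))
    return closest_sender >= d_threshold
```

For example, a node 30 m from the querying node forwards the query when the threshold is 20 m, but stays silent once it also overhears a copy from a neighbour only 5 m away.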

A mobile node receiving a query checks its buffer for matching replicas. If matching replicas exist, the node sends them back through another geographically scoped flooding. However, unlike the flooding used to propagate a query, the flooding used to propagate back the matching replicas is constrained to a triangular area with its apex at the replying node and its base at the querying node. The base of the triangle is proportional to the maximal distance that the querying node may have moved between the propagation of the query and the arrival of the reply. Another possibility for propagating a reply would have been to follow the reverse path using either unicast or broadcast messages. However, we preferred geographically scoped flooding because of the dynamism of the environment: the validity time of a path is short due to the mobility of nodes, so it is safer to follow a stateless approach.
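A node deciding whether to relay a reply needs a point-in-triangle test. The sketch below builds the triangle under explicit assumptions: the base is centred on the last known position of the querying node, is perpendicular to the line joining the two nodes, and has half-width `k * v_max * elapsed`, where the proportionality factor `k` and all function names are hypothetical.

```python
from math import hypot


def reply_triangle(replier, querier, v_max, elapsed, k=1.0):
    """Return the three vertices (apex, base1, base2) of the reply area."""
    dx, dy = querier[0] - replier[0], querier[1] - replier[1]
    length = hypot(dx, dy)
    if length == 0:
        raise ValueError("replying and querying nodes are at the same position")
    # Unit vector perpendicular to the apex-to-querier axis.
    px, py = -dy / length, dx / length
    half = k * v_max * elapsed  # assumed half-width of the base
    base1 = (querier[0] + px * half, querier[1] + py * half)
    base2 = (querier[0] - px * half, querier[1] - py * half)
    return replier, base1, base2


def _cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, triangle):
    """True if p lies inside the triangle or on its boundary."""
    a, b, c = triangle
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_negative = d1 < 0 or d2 < 0 or d3 < 0
    has_positive = d1 > 0 or d2 > 0 or d3 > 0
    # p is inside when it is on the same side of all three edges.
    return not (has_negative and has_positive)
```

A relaying node would forward a reply only if `in_triangle(own_position, reply_triangle(...))` holds, which needs no per-path state.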

An interesting aspect of the propagation back of replies is the way each node possessing a matching replica delays its propagation in order to decrease the chances of collisions between the messages of several replies. We call this mechanism the Discs and Sectors Delaying Mechanism (DSDM). More precisely, the circular flooding area of a query is partitioned into concentric discs, and each disc into sectors (as in hard disks). The farther out a disc is, the more sectors it has. A node computes the sector in which it is located and then derives the initial delay to apply before propagating back a reply. The initial delay is proportional to the position of the sector, which is defined by the position of the disc and the position of the sector inside the disc. In other words, a time slot, which we call a circular slot, is assigned exclusively to each sector. However, several nodes possessing replies may be located inside the same sector, and they would compute the same initial delay. As the nodes are not synchronised, a natural delay exists between these nodes, but it might not be long enough to avoid collisions between them. Therefore, an additional random delay is added to the computed initial delay.

The following formula describes how the initial delay $\rho_{init}$ is computed at each node:

where $FD$ is an estimation of the duration of flooding a query, $c$ is the disc in which the node is located, $T_{SLOT}$ is the duration of a slot, $s$ is the sector in which the node is located, and $MNICS$ is the maximal number of distinguishable nodes inside a sector. The disc $c$ and the sector $s$ are computed as follows:

where $d_Q$ is the distance between the node and the querying node, $R_{SLOT}$ is the radial size of a sector, and $\theta$ is the angle formed by the location of the node, the querying node and the x-axis. Figure 4.11 illustrates the circular slots defined by DSDM when only three discs are considered.

Figure 4.12 depicts all the phases of PULLBR. In Figure 4.12(a), a mobile node floods its surroundings with a query q. Although not shown in the figure, the node has previously checked its local buffer for matching replicas. In Figures 4.12(b), 4.12(c) and 4.12(d), matching replicas are propagated back using a scoped flooding. Moreover, we can also see that the propagation back of replies is delayed according to DSDM.

Figure 4.11: Circular slots defined by DSDM

Figure 4.12: Pull-Based Retrieval (PULLBR). (a) The querying node (in black) floods its surroundings with the query q using a scoped flooding mechanism. (b) After a delay determined by its circular slot, the matching replica h3 (reply) is propagated back through a scoped flooding mechanism. (c) After a longer delay determined by its circular slot, the matching replica h2 (reply) is propagated back through a scoped flooding mechanism. (d) Finally, after an even longer delay determined by its circular slot, the matching replica h1 (reply) is propagated back through a scoped flooding.

Throughout the propagation of a query and the propagation back of replies, a node passes through several states to coordinate its tasks. A node becomes aware of a query when it receives it for the first time, and then forwards it if required by the flooding mechanism. Once a node has forwarded (or not) a query, it becomes ready to forward replies in transit (those from other nodes) as part of the geographically scoped flooding process. Moreover, it also becomes ready to propagate back its own replies (those matched in its local buffer). A node stays in this state until the query timeout expires. The timeout timer of a query is set as soon as a node receives the query. The timeout is set according to the following formula:

$$T_{timeout} = 2 \cdot FD + 4 \cdot \left\lceil \frac{MFD}{R_{SLOT}} + 1 \right\rceil^{2} \cdot T_{SLOT} \tag{4.13}$$

where $FD$ is the estimated duration of flooding a query, $MFD$ is the maximal flooding distance allowed, $R_{SLOT}$ is the radial size of a sector, and $T_{SLOT}$ is the time slot assigned to a sector.
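As a worked example, the timeout can be evaluated numerically. The sketch reads Equation (4.13) as $T_{timeout} = 2 \cdot FD + 4 \cdot \lceil MFD / R_{SLOT} + 1 \rceil^{2} \cdot T_{SLOT}$, which is an interpretation of the typeset formula; the function name and the parameter values used below are illustrative assumptions, not settings taken from the thesis.

```python
from math import ceil


def query_timeout(fd, mfd, r_slot, t_slot):
    """Timeout of a query, following the reading of Equation (4.13) given above.

    fd:     estimated duration of flooding a query (seconds)
    mfd:    maximal flooding distance allowed (metres)
    r_slot: radial size of a sector (metres)
    t_slot: time slot assigned to a sector (seconds)
    """
    discs = ceil(mfd / r_slot + 1)
    return 2 * fd + 4 * discs ** 2 * t_slot
```

With the illustrative values `fd=0.5`, `mfd=200`, `r_slot=50` and `t_slot=0.01`, the ceiling term equals 5, so the slot part contributes 4 · 25 · 0.01 = 1 s and the total timeout is about 2 s. The quadratic term dominates as the flooding distance grows.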

As an optimisation that decreases the redundancy of replies when several nodes possess the same replicas, a node appends to the query message a list containing the identifiers of the replicas it matched in its local buffer. A node receiving the query then matches it against its local buffer, skipping the replicas whose identifiers appear in the received list, since they were already matched at previous nodes. In this way, redundant replies are avoided along the propagation path of a query. However, this does not eliminate redundancy completely: redundant replies will always exist because several propagation paths of a query exist throughout the querying area. This remaining redundancy helps to increase the delivery rate of replies, as the communication channel is not reliable.
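The skip list can be illustrated with a short sketch. The buffer is assumed to be a dictionary from replica identifier to replica, and the function name `match_with_skip_list` and the `matches` predicate are hypothetical.

```python
def match_with_skip_list(buffer, matches, already_matched):
    """Match a query against the local buffer, skipping known replica ids.

    buffer:          dict mapping replica id -> replica (assumed layout)
    matches:         predicate telling whether a replica matches the query
    already_matched: ids received with the query (matched at previous nodes)
    Returns (ids to reply with, id list to attach when forwarding the query).
    """
    seen = set(already_matched)
    replies = [rid for rid, replica in buffer.items()
               if rid not in seen and matches(replica)]
    forwarded = list(already_matched) + replies
    return replies, forwarded
```

Each node along one propagation path therefore replies only with replicas that no earlier node on that path has matched; nodes on other paths do not see this list, which is where the remaining redundancy comes from.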

Once the querying node initiates the propagation of a query, it waits for replies, which are delivered to the mobile application as soon as they arrive. Several replicas belonging to the same hoverinfo may arrive at the querying node, because the flooding nature of SAPRESA means that several nodes may host a replica of the same hoverinfo. In such a case, the querying node only takes the first replica into consideration and discards the duplicates.

The querying process stops once the query timeout expires; any reply received afterwards is simply discarded.

4.4.3 Push-Based Retrieval

The Push-Based Retrieval (PUSHBR) algorithm aims at enabling mobile applications to retrieve hoverinfos in a proactive way. The retrieved hoverinfos must match criteria specified by the mobile applications in the form of subscriptions. More precisely, the push-based retrieval algorithm enables a mobile application to submit subscriptions having the syntax and semantics described in Subsection 4.4.1. The subscriptions are stored in the mobile node where the mobile application is running. Thus, there is no centralised entity managing the subscriptions, and a subscription is only valid in the mobile node where it is stored. The mobile application that submitted a subscription is called a subscriber.

In order to retrieve hoverinfos matching the subscriptions, PUSHBR is composed of two mechanisms: local buffer querying and 1-hop neighbours querying. In the first mechanism, local buffer querying, a mobile node periodically, every $t_{PUSHBR}$ seconds (the push-based periodical matching timer parameter), checks its local buffer for replicas matching one of the subscriptions that might exist in the node (submitted by the mobile applications running on the node). Whenever a replica matching a subscription is found, the mobile node delivers it to the respective subscriber. The success of this mechanism strongly depends on the accessibility degree of hoverinfos: the chances of finding a matching replica for a subscription are higher when multiple replicas are spread across the anchor area, which is a consequence of the storage algorithm discussed in Section 4.3.

The second mechanism, 1-hop neighbours querying, increases the chances of matching the subscriptions of a mobile node by periodically, every $t_{PUSHBR}$ seconds, querying the 1-hop neighbours of the node. Both mechanisms are carried out at the same time. More precisely, the subscriber node, as we call the mobile node interested in matching its subscriptions, periodically broadcasts the list of all its subscriptions to its 1-hop neighbours. A neighbouring mobile node receiving such a list performs a matching process between the subscriptions of the list and the replicas stored in its buffer. If a matching replica is found, the neighbouring node sends it to the subscriber node. As each neighbouring node that received the list of subscriptions performs the same matching process, each node delays the sending of a matching replica in order to decrease the chances of collision between the multiple messages containing the matching replicas. The delay is computed according to the DSDM delaying mechanism described in Subsection 4.4.2. The only difference is that the query flooding estimation parameter $FD$ is set to zero, as there is no flooding process in the case of a subscription. In addition to this delay, a neighbouring mobile node that overhears a matching replica sent by another node removes that replica from its own pending replies, so that it does not re-send the same replica. This optimisation decreases the redundancy of matching replicas and also reduces collisions on the communication channel.

As a mobile node periodically broadcasts its list of subscriptions, another node may receive such a list more than once; re-doing the matching process would then be redundant if the buffer of the receiving node has not changed (no newly stored replicas) since the last time it received the list. In order to avoid this redundancy, the subscriber node marks each message containing the subscription list with a master sequence number and a slave sequence number.

The first list sent has a master sequence number of 1 and a slave sequence number of 1. Afterwards, the list is sent with an increasing slave sequence number as long as the list does not have any new subscriptions. If subscriptions are removed from the list, the slave sequence number still continues to increment. Whenever new subscriptions are added to the list, the master sequence number is incremented. A node receiving a list records both the last master sequence number and the last slave sequence number received. Based on these values, the node can determine whether the received list is the same as or different from the last list it received.
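The numbering scheme can be sketched as follows. Several details are assumptions made for illustration: the slave number is taken to increase on every transmission and is never reset, the receiver re-runs the matching process when the master number differs from the recorded one or when its own buffer has changed, and the class and method names are hypothetical.

```python
class SubscriptionSender:
    """Stamps each broadcast of the subscription list (subscriber side)."""

    def __init__(self):
        self.master = 0
        self.slave = 0
        self._added_since_last_send = False

    def add_subscription(self):
        self._added_since_last_send = True

    def next_header(self):
        # The master number grows on the first send and whenever subscriptions were added.
        if self.master == 0 or self._added_since_last_send:
            self.master += 1
        # Assumption: the slave number grows on every send, including after removals.
        self.slave += 1
        self._added_since_last_send = False
        return self.master, self.slave


class ListTracker:
    """Remembers the last (master, slave) pair seen per subscriber (neighbour side)."""

    def __init__(self):
        self._last = {}

    def should_rematch(self, sender_id, master, slave, buffer_changed):
        previous = self._last.get(sender_id)
        self._last[sender_id] = (master, slave)
        if previous is None or master != previous[0]:
            return True        # first list from this sender, or new subscriptions
        return buffer_changed  # same subscriptions: rematch only for new replicas
```

Removed subscriptions never trigger a new matching round in this sketch, since a shorter list cannot produce matches that the previous list did not.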

Figure 4.13 depicts PUSHBR. In Figure 4.13(a), we can observe that a subscription s is satisfied by a replica h residing in the subscriber mobile node. In Figures 4.13(b) and 4.13(c), the subscriber node periodically broadcasts its list of subscriptions along its trajectory. A node receiving such a list replies whenever it has a matching replica, as is the case for the node sending back the replica h in Figure 4.13(c).

4.5 Evaluation

This section evaluates the storage algorithm, SAPRESA, and the two retrieval algorithms, PUSHBR and PULLBR, proposed in this thesis. We evaluate several aspects: the critical mass of node density required by SAPRESA to store a hoverinfo persistently; the message complexity of SAPRESA (i.e. the asymptotic behaviour of the number of sent messages as a function of the density of nodes); the behaviour of the IPR, LRR and SA mechanisms; and the convergence and stability of SAPRESA. Regarding the retrieval algorithms, we evaluate retrieval performance, such as the matching rate and delay, and we also study the behaviour of these algorithms. Finally, all these evaluations aim at answering the question of whether or not hovering information is feasible.

Figure 4.13: Push-Based Retrieval (PUSHBR). (a) The subscription s is satisfied by the replica h stored in the local buffer of the subscriber node. (b) The subscriber node periodically broadcasts its list of subscriptions along its trajectory.

4.5.1 Simulation Framework and Settings

We used the discrete event simulation framework OMNeT++ (distribution 3.3), which is gaining widespread popularity in the research community as a simulation framework². Due to the modular architecture of OMNeT++, models are composed of modules whose behaviour is coded in C++. Such modules are assembled into larger modules using a high-level language called NED. Moreover, OMNeT++ provides extensive GUI support as well as additional tools to plot and manipulate data. We also used the Mobility Framework 2.0p3 (MF), which provides OMNeT++ with a set of modules to simulate 802.11b networking. The radio propagation model implemented in MF is the Free Space Model. All the simulations ran on a Linux cluster (Myrinet) of 32 computation nodes (Sun V60x dual Intel Xeon 2.8 GHz, 2 GB RAM).

The simulation model that we developed for simulating the storage and retrieval algorithms is composed of several modules: a storage module, two retrieval modules (a pull-based and a push-based), a hoverinfos manager, a queries manager, a subscriptions manager, a failures manager, and a mobility manager.

² Another popular simulation framework is ns-2.

Mobile nodes move according to the Random Direction Mobility (RDM) model [SPR06, Bet01, RMsM01]. In RDM, a mobile node randomly chooses a direction of movement following a uniform distribution. It then chooses a speed of movement, between a minimum and a maximum, following a uniform or normal distribution. Finally, the mobile node chooses a duration of the movement following an exponential distribution. RDM tends to produce more uniform spatial distributions of nodes than the Random Way Point (RWP) mobility model, which is also broadly used in the MANET research community. Regarding the evaluation on synthetic human mobility traces, we used the SLAW mobility model [LHK⁺09].
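The three random choices of RDM can be sketched with the Python standard library. This illustration assumes a uniform speed distribution and a hypothetical mean leg duration of 60 s; handling of the simulation-area borders is not covered, and the function names are not taken from the simulation model.

```python
import random
from math import cos, pi, sin


def sample_leg(rng, v_min=0.5, v_max=1.5, mean_duration=60.0):
    """Draw one movement leg: (direction in radians, speed in m/s, duration in s)."""
    direction = rng.uniform(0.0, 2.0 * pi)             # uniform direction
    speed = rng.uniform(v_min, v_max)                  # uniform speed (assumed variant)
    duration = rng.expovariate(1.0 / mean_duration)    # exponential duration
    return direction, speed, duration


def advance(position, leg):
    """Position reached at the end of a leg, ignoring area borders."""
    direction, speed, duration = leg
    travelled = speed * duration
    return (position[0] + travelled * cos(direction),
            position[1] + travelled * sin(direction))
```

A node trajectory is then obtained by repeatedly calling `sample_leg` with a seeded `random.Random` instance and chaining the resulting positions with `advance`.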

The simulation area is a surface of 600 m × 600 m in which mobile nodes move and send/receive messages using an 802.11b network interface in ad hoc mode. The minimum node speed is 0.5 m/s and the maximum is 1.5 m/s (pedestrian speeds). The simulated time is 3600 s (1 hour).

Based on these basic configurations, we defined 60 specific scenarios with varying number of
