
Related Work

2.1 Information Dissemination

Information dissemination in mobile ad hoc networks has been extensively addressed. The main goal is to disseminate information to a community of nodes located in a geographical region. Depending on the aim of the user application, the target nodes may be all of the nodes in the region or a subset of them; the latter case is known as selective dissemination. For instance, in vehicular ad hoc networks, traffic reports are constantly disseminated among cars passing through a given route in order to help them avoid traffic jams. While dissemination may be achieved with the help of fixed infrastructure, like infostations in the case of route traffic reports, the dissemination process itself is a decentralised one, which can be classified into flooding and epidemic (or gossiping) approaches according to how information is disseminated from one node to another.

2.1.1 Flooding and its Variants

Flooding is the simplest and most effective way of disseminating information throughout the entire network. In the simple flooding approach, also called plain flooding, each host receiving a packet makes a copy of the packet and re-broadcasts it. In order to avoid infinite loops, each new packet to be flooded is marked with a source address (i.e. the host that initiates the flooding) and a sequence number. A packet can then be uniquely identified. Each host that receives a packet to re-broadcast is able to check, based on these two attributes, whether it has already received and re-broadcast the packet [PR99, JMB01, JHM07] (this approach is also used in fixed networks [Tan96, KR08]). All the hosts of the network will eventually receive the packet provided the network is connected. While plain flooding is simple and effective, its major drawback is the high network overhead: the message complexity¹ is O(n), which might induce the broadcast storm problem described in [NTCS99]. The broadcast storm problem refers to the serious redundancy, contention and collision levels that may arise from plain flooding due to the high number of messages exchanged.

¹We use the term message complexity to refer to the asymptotic behaviour of the number of networking messages sent as a function of the density of nodes (i.e. the number of nodes). Thus, the message complexity is expressed using big O notation, excluding coefficients and lower order terms.
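To make the duplicate detection concrete, the following minimal sketch shows how a host can identify already-handled packets using the (source address, sequence number) pair. It is our illustration, not code from the cited works; the network object and its broadcast method are assumptions.

```python
# Minimal sketch of duplicate suppression in plain flooding.
# `network` is an assumed object exposing broadcast(packet); real protocols
# additionally bound the cache size and packet lifetime.

class FloodingHost:
    def __init__(self, host_id, network):
        self.host_id = host_id
        self.network = network
        self.seen = set()    # (source, seq) pairs already received/re-broadcast
        self.next_seq = 0

    def originate(self, payload):
        """Start a new flood: mark the packet so it is uniquely identifiable."""
        packet = {"source": self.host_id, "seq": self.next_seq, "payload": payload}
        self.next_seq += 1
        self.seen.add((packet["source"], packet["seq"]))
        self.network.broadcast(packet)

    def on_receive(self, packet):
        """Re-broadcast a packet only the first time it is seen."""
        key = (packet["source"], packet["seq"])
        if key in self.seen:
            return           # duplicate: drop, preventing infinite loops
        self.seen.add(key)
        self.network.broadcast(packet)
```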

In the following paragraphs, we present two categories of flooding variants which aim at overcoming the broadcast storm problem that plain flooding suffers from. The first category proposes methods that aim at decreasing the redundancy of messages as much as possible. The second category proposes similar approaches that additionally adapt to the density and speed of hosts, both of which have an impact on the performance of flooding.

Simple Heuristic Variants

In [NTCS99], the authors have proposed several variants of flooding in order to overcome the broadcast storm problem. The main idea behind all of them is to inhibit a host from re-broadcasting a message whenever it considers that the re-broadcast will not cover additional nodes and will rather increase the chances of contention or collision. In the first approach, the probabilistic scheme, a host re-broadcasts a message with probability P after having waited for a small random number of slots. The scheme is equivalent to classical flooding when P = 1. In the second approach, the counter-based scheme, a host that receives a message waits for a random number of slots before re-broadcasting it. While waiting, the host might receive the same message from a neighbouring host that re-broadcast it. In such a case, the host increases a counter of redundant received messages, and it aborts the re-broadcast if the counter exceeds a certain maximum counter threshold. The distance-based scheme works in a similar way to the counter-based scheme. However, instead of counting the redundant received messages, it takes into consideration the distance to the sender hosts. More precisely, when a host receives a message, it estimates the distance to the sender host using the signal attenuation. It then updates a minimum sender distance. Whenever this minimum sender distance is below a certain minimum distance threshold, the host aborts the re-broadcast because it considers that the additional coverage would be insignificant. The location-based scheme also works in a similar way to the counter-based scheme. It takes advantage of the geo-localisation capabilities of hosts. When a host receives a message, it estimates the additional covered area based on the geographical locations of the sender and the previous senders (i.e. each host sends its location along with the message). If the additional coverage is below a certain additional coverage threshold, the host aborts the re-broadcast. The thresholds of the previous schemes are fixed parameters whose values were determined analytically and through simulations in [NTCS99].

There is a fifth approach by the same authors, the cluster-based scheme. Each host assumes the existence of an underlying clustering algorithm such as the one described in [JLT99] (we review several clustering algorithms in Subsection 2.3.4). A cluster is composed of three types of hosts: a cluster head, gateways and members. A gateway host is able to communicate between two clusters. Based on this cluster structure, the cluster-based scheme allows only cluster heads and gateways to re-broadcast a message. Moreover, the previously described schemes can also be applied within this scheme, taking into consideration only cluster heads and gateways.
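As an illustration of the counter-based scheme, here is a minimal sketch of its decision logic. The threshold and waiting-window values are placeholders, not the tuned values from [NTCS99], and the slot-driven structure is our own simplification.

```python
import random

COUNTER_THRESHOLD = 3   # placeholder; [NTCS99] tunes this analytically and by simulation
MAX_WAIT_SLOTS = 8      # placeholder random assessment delay window

class CounterBasedHost:
    """Counter-based scheme: delay each re-broadcast for a random number of
    slots and abort it once too many redundant copies have been overheard."""

    def __init__(self):
        # msg_id -> [redundant copies heard, slots left before re-broadcasting]
        self.pending = {}

    def on_receive(self, msg_id):
        if msg_id not in self.pending:
            # First reception: schedule a delayed re-broadcast.
            # (A full implementation would also skip messages already handled.)
            self.pending[msg_id] = [0, random.randint(1, MAX_WAIT_SLOTS)]
        else:
            self.pending[msg_id][0] += 1   # redundant copy overheard while waiting

    def on_slot_tick(self):
        """Called once per slot; returns the msg_ids to re-broadcast now."""
        to_send = []
        for msg_id, (copies, slots_left) in list(self.pending.items()):
            if copies >= COUNTER_THRESHOLD:
                del self.pending[msg_id]   # expected extra coverage insignificant
            elif slots_left <= 1:
                to_send.append(msg_id)     # waiting over: re-broadcast
                del self.pending[msg_id]
            else:
                self.pending[msg_id][1] -= 1
        return to_send
```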

Finally, in [SCS03] the authors propose a simple probabilistic flooding algorithm: each node re-broadcasts a message with probability p after receiving it. Although this approach is similar to the probabilistic scheme discussed above, the authors studied the flooding process as a phase transition phenomenon, using percolation theory and random graphs. Through simulations, they showed that the success rate (defined as the ratio of distinct packets received at each node to the total number of distinct packets broadcast in the network, averaged across all nodes) presents a phase transition when p is around 0.59 in an ideal network (i.e. no channel interference). However, under real MANET conditions, which are prone to packet collisions, they did not find any effect similar to a phase transition phenomenon: the success rate rather grows linearly with p. In high density scenarios, though, the behaviour looks like a bell curve, as beyond a value of 0.1 for p the success rate degrades due to packet collisions. They concluded that in high density scenarios probabilistic flooding enhances the flooding process when p is carefully set, while in low density scenarios a probabilistic approach is not efficient.
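To give a feel for this kind of experiment, the toy simulation below measures the reach of probabilistic flooding for several values of p on an ideal (collision-free) random geometric graph. It is our reconstruction of the general setup, not the simulation code of [SCS03], and its success-rate measure is simplified to the fraction of nodes reached by a single packet.

```python
import math, random
from collections import deque

def random_geometric_graph(n, radius):
    """Place n nodes uniformly in the unit square; link nodes within `radius`."""
    pos = [(random.random(), random.random()) for _ in range(n)]
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) <= radius:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def probabilistic_flood(adj, source, p):
    """Each node re-broadcasts a received message with probability p.
    Returns the fraction of nodes that received the message."""
    received = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        # The source always broadcasts; other nodes re-broadcast w.p. p.
        if u != source and random.random() > p:
            continue
        for v in adj[u]:
            if v not in received:
                received.add(v)
                queue.append(v)
    return len(received) / len(adj)

adj = random_geometric_graph(n=500, radius=0.08)
for p in (0.2, 0.4, 0.6, 0.8, 1.0):
    rates = [probabilistic_flood(adj, source=0, p=p) for _ in range(20)]
    print(f"p={p:.1f}  mean success rate={sum(rates)/len(rates):.2f}")
```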

Adaptive Schemes

An adaptive version of some of the flooding variants described in the previous subsection has been presented in [TNS03]. Indeed, the drawback of the previous schemes is their reliance on fixed values for the maximum counter, minimum distance and minimum additional coverage thresholds. These fixed values are not well adapted to the density of hosts. Therefore, the authors proposed three new schemes called the adaptive counter-based scheme, the adaptive location-based scheme and the adaptive neighbour-coverage scheme. For the first two schemes, which are extensions of the counter-based and location-based schemes, the maximum counter threshold C(n) and the minimum additional coverage threshold AC(n) depend on n, the current number of neighbours of a host. Intuitively, the authors note that re-broadcasting should be reinforced in sparse networks and inhibited in dense networks. They propose concrete functions for C(n) and AC(n) which are the result of a tuning process through simulations and analytical analysis. The adaptive neighbour-coverage scheme makes use of 2-hop neighbourhood information, which is obtained through an underlying neighbour discovery algorithm (one method is based on periodically broadcasting beaconing hello messages along with the list of neighbours). Thanks to this underlying mechanism, each host x maintains a set N_x of its 1-hop neighbours and, for each host h ∈ N_x, a set N_{x,h} containing the 1-hop neighbours of h (thus some 2-hop neighbours of host x).

Whenever a host x receives a message for the first time from a host h, it initialises a set T as T = N_x − N_{x,h} − {h}, which is the list of pending hosts that have not yet received the message, as viewed by host x. Then, host x delays the re-broadcast of the message for a random number of slots. While waiting, if host x receives a redundant copy from a host h′, it updates T as T = T − N_{x,h′} − {h′}. Whenever the set T becomes empty, which means that no 1-hop neighbour remains that has not received the message, host x is inhibited from re-broadcasting the message. Otherwise, the waiting is resumed.
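The set bookkeeping of the adaptive neighbour-coverage scheme can be sketched as follows (our illustration; the waiting timers are omitted and the variable names mirror the notation above):

```python
class NeighbourCoverageHost:
    """Adaptive neighbour-coverage scheme: track which 1-hop neighbours
    still need the message and cancel the re-broadcast once none remain."""

    def __init__(self, neighbours, two_hop):
        self.N = set(neighbours)      # N_x: 1-hop neighbours of this host
        self.N_of = two_hop           # N_{x,h}: h -> set of h's 1-hop neighbours
        self.pending = {}             # msg_id -> T, the still-uncovered neighbours

    def on_receive(self, msg_id, sender):
        covered = self.N_of.get(sender, set()) | {sender}
        if msg_id not in self.pending:
            # First reception: T = N_x - N_{x,h} - {h}, then wait random slots.
            self.pending[msg_id] = self.N - covered
        else:
            # Redundant reception while waiting: T = T - N_{x,h'} - {h'}.
            self.pending[msg_id] -= covered
        if not self.pending[msg_id]:
            del self.pending[msg_id]  # every neighbour covered: cancel re-broadcast
```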

Finally, the adaptive flooding approach proposed in [VO02] follows a different direction. A host participating in a flooding process switches between three types of flooding algorithms depending on its current speed. The first type of flooding is called scoped flooding. Each host maintains 2-hop neighbourhood information in a similar way to the adaptive neighbour-coverage scheme described above. Whenever a host receives a message, it does not re-broadcast the message if its set of neighbours is a subset of the neighbours of the host from which it received the message (in practice, the condition is satisfied whenever both sets overlap on more than 85% of their hosts). The second algorithm is called hyper flooding. Through a beaconing mechanism, each host maintains a 1-hop neighbour list. The first time a host receives a message, it re-broadcasts it (plain flooding). However, the host also stores the message in its message cache. Whenever the host receives a hello message (beaconing) or a message to flood from a new host (one not belonging to its current list of neighbours), the host re-broadcasts all the packets stored in its cache. The idea behind this is to make sure that any new host will get the message as part of the flooding. The third algorithm is simply plain flooding. A host continuously switches between these three flooding algorithms based on its current relative speed, which is computed thanks to the additional velocity information carried by the hello messages. Whenever the relative speed is lower than 10 m/s, a host switches to scoped flooding. If the relative speed is higher than 25 m/s, it switches to hyper flooding. Otherwise, it switches to plain flooding. The idea behind this is that the higher the speed of the nodes, the greater the chances of missing a message, and vice versa. The speed thresholds were determined by the authors through simulations.
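The switching rule itself reduces to a simple threshold function on the relative speed; the thresholds below are the ones reported in [VO02], while the function and constant names are ours:

```python
# Thresholds from [VO02]; determined by the authors through simulations.
SCOPED_MAX_SPEED = 10.0   # m/s
HYPER_MIN_SPEED = 25.0    # m/s

def select_flooding_mode(relative_speed_mps: float) -> str:
    """Pick a flooding algorithm from the host's current relative speed."""
    if relative_speed_mps < SCOPED_MAX_SPEED:
        return "scoped"    # slow: 2-hop neighbour information stays accurate
    if relative_speed_mps > HYPER_MIN_SPEED:
        return "hyper"     # fast: re-broadcast cached messages to every new contact
    return "plain"
```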

2.1.2 Epidemic Dissemination and Gossiping

Epidemic dissemination is inspired by the spreading of infectious diseases. However, the aim of epidemic dissemination is to rapidly spread some data to all nodes of the network, whereas the study of epidemics in biology aims at preventing infectious diseases from spreading across large groups of people. Epidemic theory was first applied in the domain of distributed systems and networking (propagating updates in distributed databases [DGH+87, KSSV00], failure detection [VRMH98], data aggregation, resource discovery and monitoring [VRBV03], broadcasting and multicasting [BHO+99, EG02]). Moreover, several works have focused on studying different mathematical aspects of epidemic dissemination (branching processes, finite population models [Bai75], proportion of infected processes, probability of atomic infection, latency of infection [Pit87]). In the domain of MANETs, epidemic techniques have mainly been applied to information dissemination. The following paragraphs describe the main approaches.

For large scale populations of hosts, a deterministic compartmental epidemic model is typically used [KBTR02]. The dissemination (epidemic infection) of some information throughout the whole network is triggered by one host. Each host is in one of three states: susceptible to infection (S), infected (I) or recovered (R). Hosts which are able to receive the information and store it are susceptible. Hosts which actually hold the information are infected. Hosts which have removed the information from their buffer are recovered. A system of differential equations may describe the evolution of hosts through these states over time. An important parameter of the model is the infection rate β, which defines the rate at which hosts get infected.
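The three compartments map directly onto a per-host state machine; here is a minimal sketch of these states and transitions (our illustration, not code from [KBTR02]):

```python
from enum import Enum

class State(Enum):
    SUSCEPTIBLE = "S"   # able to receive and store the information
    INFECTED = "I"      # currently holds the information
    RECOVERED = "R"     # has removed the information from its buffer

class Host:
    def __init__(self):
        self.state = State.SUSCEPTIBLE
        self.buffer = None

    def receive(self, data):
        if self.state is State.SUSCEPTIBLE:   # S -> I on reception
            self.buffer, self.state = data, State.INFECTED

    def drop(self):
        if self.state is State.INFECTED:      # I -> R when the buffer is purged
            self.buffer, self.state = None, State.RECOVERED
```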

The following paragraphs describe two algorithms which are based on this type of model.

In [SMML07], the authors proposed a controlled dissemination algorithm for human wireless networks (Epcast). Whenever a host receives new data to disseminate, it also receives the percentage of nodes that should receive the data (the total number of nodes is known) and the time limit to propagate the data. Based on this information, the host computes the infection rate λ by relying on a three-compartment SIR-model (the Kermack-McKendrick model [AM92]).

More precisely, the host numerically solves the following system of differential equations, which describes the SIR-model:

\[
\frac{dS}{dt} = -\beta\, S(t)\, I(t), \qquad
\frac{dI}{dt} = \beta\, S(t)\, I(t) - \gamma\, I(t), \qquad
\frac{dR}{dt} = \gamma\, I(t),
\]

where S(t), I(t) and R(t) are respectively the number of susceptible, infected and recovered hosts at time t, β is the average number of contacts with susceptible hosts which induce a new infected host per unit of time per infected, and γ is the average rate of recovery of infected hosts per unit of time per infected in the population. Using the percentage of nodes that should receive the data (get infected) and the time limit as the boundary conditions of the system, the β parameter is tuned.

Moreover, assuming a homogeneous network structure, β is expressed as β = λ⟨k⟩/N, where ⟨k⟩ is the average network degree. In this way, the infection rate λ is computed, and it is then used as the constant infection rate when propagating the data. The γ parameter is fixed and is determined by the buffering policy of the hosts (if a host never removes data, then γ = 0).
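As a rough illustration of the tuning step, the sketch below integrates the SIR system with a simple Euler scheme and bisects for the smallest β that meets the coverage target within the time limit. This is a toy reconstruction under our own parameter choices, not the Epcast implementation.

```python
# Toy reconstruction of the Epcast tuning step (assumed parameter values).
N_HOSTS = 100        # total number of nodes, known to every host in Epcast
AVG_DEGREE = 8       # assumed average network degree <k>

def sir_final_coverage(beta, gamma, n, t_limit, dt=0.01):
    """Euler-integrate the SIR system and return the fraction of hosts
    reached (infected or recovered) by the deadline t_limit."""
    s, i, r = n - 1.0, 1.0, 0.0          # one initial infected host
    for _ in range(int(t_limit / dt)):
        new_infections = beta * s * i * dt
        recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - recoveries, r + recoveries
    return (i + r) / n

def tune_beta(target_fraction, t_limit, n, gamma=0.0):
    """Bisect for the smallest beta whose coverage meets the target in time."""
    lo, hi = 0.0, 1.0                    # coverage grows monotonically with beta
    for _ in range(40):
        mid = (lo + hi) / 2
        if sir_final_coverage(mid, gamma, n, t_limit) >= target_fraction:
            hi = mid
        else:
            lo = mid
    return hi

beta = tune_beta(target_fraction=0.9, t_limit=10.0, n=N_HOSTS)
lam = beta * N_HOSTS / AVG_DEGREE        # invert beta = lambda * <k> / N
print(f"beta = {beta:.4f}, per-contact infection rate lambda = {lam:.4f}")
```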

In [KBTR02], the authors propose a dissemination algorithm based on SPIN-1 (Sensor Protocol for Information via Negotiation [HKB99]). Whenever a host gets some data to be disseminated, it advertises a summary of its data entities to its neighbours. Neighbouring hosts then request the data they are interested in, and the advertising host sends the requested data. In order to study the impact of the density of hosts on the infection rate of the algorithm, the authors propose a two-compartment SI-model described by the following system of differential equations:

\[
\frac{dS}{dt} = -\frac{a}{N}\, S(t)\, I(t), \qquad
\frac{dI}{dt} = \frac{a}{N}\, S(t)\, I(t),
\]

where S(t) and I(t) are the number of susceptible and infected hosts respectively, a is the infection rate, and N is the total number of hosts. They found that the infection rate increases with the density of hosts until reaching a maximum, after which it starts decreasing. This maximum corresponds to the optimum density of hosts. Assuming a uniform distribution of hosts, they found the optimum mean number of neighbours to be 10.95, the threshold at which the highest infection rate is reached.

This optimum number of neighbours is very similar to the optimum network degree, 10, found in [RMsM01] for delivering the maximum number of data packets in MANETs (the optimum network degree is 6 when hosts are fixed [KS78]). Following a similar approach, a host may compute the infection rate for different algorithms and switch between them when the density and size of the network change.

Finally, Autonomous Gossiping (A/G) [DQA04] is a type of epidemic algorithm for selective dissemination of information. A/G does not require any infrastructure or middleware like a multicast tree or (un)subscription maintenance (for publish/subscribe). It rather uses ecological and economic principles in a self-organising manner in order to achieve any arbitrary selective dissemination (flexible casting). The trade-off of using a stateless self-organising mechanism like A/G is that it does not guarantee completeness deterministically like other selective dissemination schemes (e.g. publish/subscribe schemes). Such incompleteness is not a problem in many non-critical real-life civilian application scenarios and realistic node mobility patterns, where the overhead of infrastructure maintenance may outweigh the benefits of completeness.

2.1.3 Data-Centric Routing

In this subsection we review algorithms that aim at gathering data from sensors in a Wireless Sensor Network (WSN). When a sensor detects some event, this event (i.e. all the data describing the event) is either sent to a Base Station (BS) or stored in the sensornet². We can then identify three types of canonical storage methods: external storage (ES), local storage (LS) and data-centric storage (DCS). Whenever a node³ in the sensornet, called the sink node or sink, or an external node wants to gather data matching an interest, the node sends the interest either to a BS or to the sensornet, depending on where the event is stored. In the second case, when the event is stored in the sensornet, the sink node must disseminate the query throughout the entire sensornet or some region of it. In response, the nodes storing events that match the interest (these nodes are called source nodes) send back the respective events to the sink node.

²We also use the word sensornet to describe a network of sensors, or a WSN.

³We also use the word node to refer to a sensor, since it is part of a network.

We are thus dealing with two problems: routing interests towards the nodes where events matching these interests are stored, and routing events matching interests towards the nodes that initiated those interests. It is important to notice that the routing decisions are based on interests and events instead of the network addresses of nodes. That is why it is called data-centric routing.

Several works have been proposed in the literature and comprehensive surveys can be found in [YMG08, AKK04, ASSC02, AY05]. In the following paragraphs we present some of these works.

Before presenting the data-centric routing algorithms, it is pertinent to mention three aspects. First, authors have generally structured the content of interests and events as sets of (attribute, value) pairs. Second, some authors refer to interests as queries. In the rest of this subsection, we will mainly use the terms interest and event; however, when necessary, we may use the terms query and data event to refer to an interest and an event respectively. Third, another approach to gathering events from WSNs has been proposed in the literature. In this approach, an event is not necessarily stored in the node that detected it but in nodes located in or around geographical locations associated with the data (generally through a hash function that maps data to locations). In this way, interests must be routed towards these geographical locations instead of being flooded through the entire sensornet. This approach is called data-centric storage and we discuss it in Subsection 2.2.3.
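Since both interests and events are structured as attribute-value pairs, the matching step can be sketched as a simple predicate check (our illustration; the attribute names and predicate encoding are hypothetical):

```python
# An event is a set of (attribute, value) pairs; an interest constrains some
# attributes, here encoded as per-attribute predicates (hypothetical structure).

event = {"type": "temperature", "value": 41.5, "x": 10, "y": 22}

interest = {
    "type": lambda v: v == "temperature",
    "value": lambda v: v > 40.0,     # e.g. "report temperatures above 40"
}

def matches(event, interest):
    """An event matches an interest if every constrained attribute is
    present in the event and satisfies its predicate."""
    return all(attr in event and pred(event[attr])
               for attr, pred in interest.items())

print(matches(event, interest))      # True
```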

In [IGE00], the authors proposed Directed Diffusion. An interest can be inserted at any node of the network; this node becomes the sink for that interest. The sink initiates a propagation of the interest throughout the network. The key point during this propagation is that gradients are set up in order to pull down events matching the interest. Flooding or geocasting mechanisms can be used to propagate an interest. Each node that receives an interest from another node stores the interest by creating an interest entry in its table of interests (if not yet stored) and sets up a gradient towards the node that sent the interest. A gradient is a pair (value, direction) where value is application-specific and direction refers to the sender node. During this propagation process, complete or partial redundancy of interests may exist, in which case interests are aggregated (in-network aggregation). Any node may also determine when to remove interests from its cache (i.e. its table of interests) based on their lifetime and timestamp (both generated at the sink). Regarding the events, when a source node detects an event that matches an interest stored in its table of interests, the source node initiates a propagation of the event towards the sink of the interest by relying on the gradients set up during the propagation of the interest. The source node sends the event to those neighbours for which it has a gradient,