Analytical Evaluation of Cooperative Backup

2.2 Evaluating the Resilience of Mobile Systems

2.2.1 Analytical Evaluation of Cooperative Backup

Thesis: [33] - Main publication: [34]

In [34], we address the analytical evaluation of the dependability of the cooperative backup service presented in Section 2.1.1. As mentioned earlier, several replication and scattering strategies can be envisaged for this service. For replication, for example, one can create plain full copies of the useful data (simple replication) or use more elaborate techniques based on erasure codes. This policy choice affects storage efficiency and data confidentiality, as described in [36], but its impact on data availability was not easy to assess, which is why it became the subject of this work.

Erasure codes. An (n,k) erasure code produces an n-bit word from a k-bit input. To recover the original data, m bits are necessary and sufficient, with k ≤ m ≤ n. When m = k, the code is said to be optimal. When each of the n fragments is stored on a different device, an optimal code tolerates n − k failures, in addition to the failure of the service's client. The storage cost is then n/k. Tolerating f faults requires n = k + f, which gives a cost of 1 + f/k. An erasure code (with k ≥ 2) is therefore more efficient than simple replication (with k = 1) in terms of the amount of information to store.
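The storage-cost argument above can be illustrated with a minimal sketch. The function name `erasure_code_for` is hypothetical and introduced only for this example; it simply applies n = k + f and cost = n/k for an optimal code.

```python
from fractions import Fraction


def erasure_code_for(k, f):
    """Return (n, storage_cost) for an optimal (n, k) code tolerating f failures.

    Hypothetical helper: applies n = k + f and cost = n / k = 1 + f / k.
    """
    n = k + f
    return n, Fraction(n, k)


# Tolerating f = 2 failures with simple replication (k = 1) and with k = 4.
n_rep, cost_rep = erasure_code_for(1, 2)   # three full copies, cost 3
n_ec, cost_ec = erasure_code_for(4, 2)     # (6, 4) code, cost 3/2
print(n_rep, cost_rep, n_ec, cost_ec)
```

For the same number of tolerated failures, the code with k = 4 stores half as much as simple replication in this example.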

Replication strategies and backup opportunities. We consider a client of the backup service that must replicate a single data item. This client follows a predefined strategy based on an (n,k) erasure code whose parameters are also fixed offline. When k = 1, the strategy amounts to simple replication. We consider a static strategy: in particular, clients do not detect failures of the service's contributors, so they cannot decide to create additional replicas of a fragment when the contributor that stored it has failed. We consider that each encounter with a new device is a new backup opportunity. Specifically, this implies that a trust relationship can be established with every contributor and that the contributor has enough storage space. The effect of this assumption can easily be studied by


[Figure 1: GSPN diagram. Places: FC, OU, OD, MF, SF, DL, DS. Timed transitions: α (owner meets a contributor), λ₀, β₀, m(MF)β, m(MF)λ. Guards on the immediate transitions: m(SF) + m(MF) < k and m(SF) ≥ k. Predicate: L ≡ (m(DL) = 0) ∧ (m(DS) = 0).]

Figure 1. Petri net of the replication and scattering process for an (n,k) erasure code.

access the Internet. In other words, the Internet-based store of our cooperative backup service is abstracted as a “reliable store”. Conversely, if a participating device fails before reaching the Internet, then all the fragments it holds are considered lost. Thus, with (n,k) erasure coding, a data item is definitely lost if and only if its owner device fails and fewer than k contributors hold or have held a fragment of the data item.

Our model consists of three main processes represented by timed transitions with constant-rate exponential distributions:

• A process with rate α that models the encounter of a contributor by the data owner, where the owner sends one data fragment to the contributor.

• A process that models the connection of a device to the Internet, with rate β₀ for the owner and β for contributors.

• A process that represents the failure of a device, with rate λ₀ for the owner and λ for contributors.

The GSPN in Figure 1 is divided into two interacting subnets. The subnet on the left describes the evolution of a data item at the owner device: either it is lost (with rate λ₀), or it reaches the Internet store (with rate β₀). Places OU and OD denote situations where the owner device is “up” or “down”, respectively. The subnet on the right describes: (i) the data replication process leading to the creation of “mobile fragments” (place MF) on contributor devices as they are encountered (with rate α), and (ii) the processes leading to the storage of the fragments (place SF) in the reliable store (rate β), or their loss caused by the failure of the contributor device (rate λ). The initial marking of place FC denotes the number of fragments to create. The transition rates associated with the loss of a data fragment or its storage on the Internet are weighted by the marking of place MF, i.e., the number of fragments that can enable the corresponding transitions.

Two places with associated immediate transitions are used in the GSPN to identify when the data item is safely stored in the reliable store (place DS), or is definitely lost (place DL), respectively. The “data safe” state is reached (i.e., DS is marked) when the original data item from the owner node or at least k fragments from the contributors reach the Internet. The “data loss” state is reached (i.e., DL is marked) when the data item from the owner node is lost and fewer than k fragments are available. This condition is represented by a predicate associated with the immediate transition that leads to DL. Finally, L is the GSPN “liveliness predicate”, true if and only if m(DS) = m(DL) = 0: as soon as either DS or DL contains a token, no transition can be fired.

The GSPN model of Figure 1 is generic and can be used to automatically generate the Markov chain associated with any (n,k) erasure code. Examples of Markov chains for different (n,k) may be found in [4]. The total number of states in such an (n,k) Markov chain is O(n²). The models we are considering, with reasonably small values of n, are tractable using available modeling tools.

3.4. Quantitative Measures

We analyze the dependability of our backup service via the probability of data loss, i.e., the asymptotic probability, noted PL, of reaching the “data lost” state. For a given erasure code (n,k), PL can be easily evaluated from the corresponding Markov chain using well-known techniques for absorbing Markov chains [11]. The smaller PL is, the more dependable the data backup service is.
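As an illustration of how PL might be evaluated, here is a minimal sketch that applies first-step analysis to an absorbing chain built by hand from the model description. The state encoding (owner_up, fc, mf, sf), the function name `loss_probability` and its parameter names are assumptions made for this example; they are not the Markov chains of [4] nor the output of a GSPN tool.

```python
from functools import lru_cache


def loss_probability(n, k, alpha, beta, lam, beta0=None, lam0=None):
    """Probability of reaching the data-lost state for an (n, k) code.

    Assumed state: (owner_up, fc, mf, sf) = owner status, fragments still
    to create, mobile fragments, fragments stored in the reliable store.
    """
    beta0 = beta if beta0 is None else beta0
    lam0 = lam if lam0 is None else lam0

    @lru_cache(maxsize=None)
    def p_loss(owner_up, fc, mf, sf):
        if sf >= k:
            return 0.0  # at least k fragments reached the Internet: safe
        if not owner_up and sf + mf < k:
            return 1.0  # owner failed and fewer than k fragments remain: lost
        moves = []  # (rate, loss probability of the successor state)
        if owner_up:
            moves.append((beta0, 0.0))  # owner reaches the Internet: safe
            moves.append((lam0, p_loss(False, 0, mf, sf)))  # owner fails
            if fc > 0:  # owner meets a contributor and hands over a fragment
                moves.append((alpha, p_loss(True, fc - 1, mf + 1, sf)))
        if mf > 0:
            # A contributor reaches the Internet, or fails with its fragment.
            moves.append((mf * beta, p_loss(owner_up, fc, mf - 1, sf + 1)))
            moves.append((mf * lam, p_loss(owner_up, fc, mf - 1, sf)))
        total = sum(rate for rate, _ in moves)
        return sum(rate * p for rate, p in moves) / total

    return p_loss(True, n, 0, 0)


# Example: (2,1) code, ten encounters per Internet connection, beta/lambda = 10.
print(loss_probability(2, 1, alpha=10.0, beta=1.0, lam=0.1))
```

Because every transition either consumes a fragment or reaches an absorbing state, the recursion terminates, and memoization keeps the number of evaluated states small for the values of n considered here.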

To measure the dependability improvement offered by MoSAIC, we compare PL with the probability of data loss PL_ref of a comparable, non-MoSAIC scenario where:

• devices do not cooperate;

• data owner devices fail with rate λ₀;

• data owners gain Internet access and send their data items to a reliable store with rate β₀.

This scenario is modeled by a simple Markov chain where the owner’s device can either fail and lose the data or reach the Internet and save the data. The probability of loss in this scenario is: PL_ref = λ₀ / (λ₀ + β₀).
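As a brief check of this expression, the following sketch assumes, as in the model, that failure and Internet connection are independent exponential processes with rates λ₀ and β₀. The symbols T_fail and T_conn are introduced here only for the derivation. The loss probability is the probability that the failure occurs first:

```latex
\begin{align*}
PL_{\mathrm{ref}}
  &= \Pr\left[T_{\mathrm{fail}} < T_{\mathrm{conn}}\right] \\
  &= \int_0^{\infty} \lambda_0 e^{-\lambda_0 t}\, e^{-\beta_0 t}\,\mathrm{d}t \\
  &= \frac{\lambda_0}{\lambda_0 + \beta_0}.
\end{align*}
```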

We note LRF the data loss probability reduction factor offered by MoSAIC compared to the above non-MoSAIC scenario, where LRF = PL_ref/PL. The higher LRF, the more MoSAIC improves data dependability. For instance, LRF = 100 means that data on a mobile device is 100 times less likely to be lost when using MoSAIC than when not using it.

Figure 2.5 – Petri net of the replication and scattering of an (n, k) erasure code

varying the contributor encounter rate. We consider that a connection to an infrastructure network is used only when bandwidth is plentiful and cheap; consequently, whenever a device connects to the Internet, all the fragments it stores are transferred.

Modeling. Figure 2.5 shows the generalized stochastic Petri net (GSPN) of the cooperative backup service with an algorithm based on an (n, k) erasure code. This model focuses on the ad hoc part of the algorithm. A data fragment is considered “safe” (it can no longer be lost) once its owner or the contributor storing it connects to the Internet. Conversely, when a participating device fails, all the fragments it stores are lost. Consequently, with an (n, k) erasure code, a data item is lost when its owner fails and fewer than k contributors still store a fragment or have stored one and transferred it to the Internet. The model consists of three main processes represented by timed transitions:

– A process with rate α that models the encounter of a contributor by the owner, during which the owner sends one fragment to the contributor.

– A process that models the connection of a device to the Internet, with rate β₀ for the owner and β for contributors.

3.5. Parameters

PL and LRF depend on a number of parameters (n, k, α, β, λ, β₀, and λ₀). Rather than considering absolute values for the rates of stochastic processes, we consider ratios of rates of pertinent competing processes.

First, the usefulness of cooperative backup will depend on the rates at which contributing devices encounter one another relative to the rate at which connection to the fixed infrastructure is possible. Second, the effectiveness of devices towards data backup will depend on the rate at which they fail relative to the rate at which they are able to connect to the Internet to make the data safe. We therefore study LRF as a function of the contributor and data owner effectiveness ratios β/λ and β₀/λ₀.

Finally, one may question the assumption that contributors accept all requests, at rate α, regardless of their amount of available resources. However, simple back-of-the-envelope calculations provide evidence that this is a reasonable assumption. When the replication strategy described in Section 3.1.1 is used, the number of fragments (i.e., storage requests) that a contributor may receive during the time between two consecutive Internet connections is, on average, α/β. Let s be the size of a fragment: a contributor needs, on average, V = s(α/β) storage units to serve all these requests. If a contributor’s storage capacity, C, is greater than V, it can effectively accept all requests; otherwise, the contributor is saturated and can no longer accept any storage request.

In other words, redefining α as the effective encounter rate (i.e., the rate of encounters of contributors that accept storage requests), and letting γ be the actual encounter rate, we have: α/β = min(γ/β, C/s). A realistic estimate with C = 2³⁰ (contributor storage capacity of 1 GB) and s = 2¹⁰ (fragment size of 1 KB) shows that contributors would only start rejecting requests when γ/β > 2²⁰, a ratio that is beyond most realistic scenarios.
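The back-of-the-envelope estimate above can be reproduced with a short sketch. The function name `effective_connectivity_ratio` and its parameter names are hypothetical; the formula is the min(γ/β, C/s) relation stated in the text.

```python
def effective_connectivity_ratio(gamma_over_beta, capacity, fragment_size):
    """Effective alpha/beta once contributor storage saturation is accounted for.

    Hypothetical helper applying alpha/beta = min(gamma/beta, C/s).
    """
    return min(gamma_over_beta, capacity / fragment_size)


CAPACITY = 2**30        # C: 1 GB of contributor storage, in bytes
FRAGMENT_SIZE = 2**10   # s: 1 KB fragments, in bytes

# Below the saturation threshold C/s = 2**20 the ratio is unchanged.
print(effective_connectivity_ratio(1000, CAPACITY, FRAGMENT_SIZE))
# Above it, the effective ratio is capped at C/s.
print(effective_connectivity_ratio(2**25, CAPACITY, FRAGMENT_SIZE))
```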

4. Results

This section discusses the results of our analysis.

4.1. Overview

Figure 2 shows the data dependability improvement yielded by MoSAIC with a (2,1) erasure code using the replication strategy outlined in Section 3.1.1. Here, we assume that contributors and owners behave identically, i.e., β₀ = β and λ₀ = λ.

Three observations can be made from this plot. First, as expected, the cooperative backup approach is not very relevant compared to the reference backup approach when α/β = 1 (i.e., when Internet access is as frequent as ad hoc encounters). Looking at the contour lines of LRF from Figure 2, it appears that, for the cooperative backup approach to offer at least an order of magnitude improvement over the reference backup scheme, the environment must satisfy β/λ > 2 and α/β > 10.

[Figure 2: plot with axes participant effectiveness (β/λ), connectivity ratio (α/β) and loss reduction factor (LRF), each with tick marks from 1 to 100000.]

Figure 2. Loss reduction factor LRF for a (2,1) erasure code.

Second, for any given α/β, LRF reaches an asymptote after a certain β/λ threshold. Thus, for any given connectivity ratio α/β, increasing the infrastructure connectivity to failure rate ratio β/λ is only beneficial up to that threshold.

The third observation that can be made is that the dependability improvement factor first increases proportionally to α/β, and then, at a certain threshold, rounds off towards an asymptote (visible on Figure 2 for small values of β/λ but hidden for high values due to the choice of scale). Other (n,k) plots have a similar shape.

4.2. Asymptotic Behavior

Figure 3 shows LRF as a function of β/λ, for different values of α/β and different erasure codes (again, assuming the data owner’s failure and connection rates are the same as those of contributors). This again shows that the maximum value of LRF for any erasure code, as β/λ tends to infinity, is a function of α/β. We verified the following formula for a series of codes with n ∈ {2, 3, 4, 5} and k ∈ {1, 2, 3} (with k ≤ n) and postulate that it is true for all positive values of n and k such that n ≥ k:

\[
\lim_{\beta/\lambda \to \infty} LRF_{n,k}\left(\frac{\alpha}{\beta}, \frac{\beta}{\lambda}\right)
= \frac{1}{1 - \left(\dfrac{\alpha/\beta}{1 + \alpha/\beta}\right)^{k}}
\tag{4.1}
\]

First, it describes an asymptotic behavior, which confirms our initial numerical observation. Second, it does not depend on n. This observation provides useful insight

Figure 2.6 – LRF for a (2, 1) erasure code

– A process that represents the failure of a device, with rate λ₀ for the owner and λ for contributors.

The net of Figure 2.5 is divided into two subnets. The left subnet represents the evolution of a data item at its owner: either it is lost because the owner fails (with rate λ₀), or it is transferred to the Internet because the owner connects to it (with rate β₀). The right subnet describes: (i) the creation of data fragments whenever a contributor is encountered (with rate α) and (ii) the transfer of the fragment to the Internet by the contributor (with rate β) or its loss through failure of the contributor (with rate λ). Such a GSPN model can be used to generate the Markov chain associated with any (n, k) erasure code. These Markov chains are detailed in [33]. The total number of states in an (n, k) Markov chain is O(n²).

Analysis. The most important measure here is the probability of data loss, noted PL. For a given (n, k) code, PL is compared with the reference loss probability, noted PL_ref, which corresponds to a scenario without a cooperative backup service in which the owner fails with rate λ₀ and connects to the Internet with rate β₀: PL_ref = λ₀ / (λ₀ + β₀). We note LRF the loss probability reduction factor: LRF = PL_ref / PL. An LRF of 100 means that a data item is 100 times less likely to be lost with cooperative backup than without it.
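The two definitions above translate directly into code. This is a minimal sketch; the function names and the numerical values (λ₀ = 1, β₀ = 9, PL = 0.001) are hypothetical and chosen only to reproduce the LRF of about 100 used as an example in the text.

```python
def reference_loss_probability(lam0, beta0):
    """PL_ref = lam0 / (lam0 + beta0): loss probability without cooperation."""
    return lam0 / (lam0 + beta0)


def loss_reduction_factor(pl_ref, pl):
    """LRF = PL_ref / PL."""
    return pl_ref / pl


# Hypothetical rates: the owner connects nine times more often than it fails.
pl_ref = reference_loss_probability(lam0=1.0, beta0=9.0)
# Hypothetical loss probability with cooperative backup.
pl = 0.001
print(pl_ref, loss_reduction_factor(pl_ref, pl))  # LRF is about 100
```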

Figure 2.6 gives the LRF for a backup service using a (2,1) erasure code and the replication strategy summarized above. Owners and contributors have the same characteristics here: λ₀ = λ and β₀ = β. Three observations can be made from this figure:

– As one would expect, using a cooperative backup service is not very worthwhile when Internet access is frequent (α/β ≃ 1). If one

in how to choose the most appropriate erasure coding parameters, as we will see in Section 4.3.

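We can similarly compute the limiting value of LRF(n, k) as α/β tends to infinity:

\[
\lim_{\alpha/\beta \to \infty} LRF_{n,k}\left(\frac{\alpha}{\beta}, \frac{\beta}{\lambda}\right)
= \frac{\left(1 + \dfrac{\beta}{\lambda}\right)^{n}}{\displaystyle\sum_{x=0}^{k-1} \binom{n}{x} \left(\dfrac{\beta}{\lambda}\right)^{x}}
\tag{4.2}
\]

This shows that, when α/β grows, LRF reaches an asymptote that depends on β/λ.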
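The two limits can be evaluated numerically with the sketch below. It assumes equations (4.1) and (4.2) read as limit = 1 / (1 − ((α/β)/(1 + α/β))^k) and limit = (1 + β/λ)^n / Σ_{x=0}^{k−1} C(n, x)(β/λ)^x respectively; the function names are hypothetical.

```python
from math import comb


def lrf_limit_high_effectiveness(alpha_over_beta, k):
    """Assumed form of (4.1): limit of LRF as beta/lambda tends to infinity."""
    x = alpha_over_beta / (1 + alpha_over_beta)
    return 1 / (1 - x**k)


def lrf_limit_high_connectivity(beta_over_lambda, n, k):
    """Assumed form of (4.2): limit of LRF as alpha/beta tends to infinity."""
    denominator = sum(comb(n, x) * beta_over_lambda**x for x in range(k))
    return (1 + beta_over_lambda) ** n / denominator


# Codes with the same storage cost n/k = 2, for alpha/beta = 10 and beta/lambda = 10.
for n, k in [(2, 1), (4, 2), (6, 3)]:
    print((n, k),
          lrf_limit_high_effectiveness(10, k),
          lrf_limit_high_connectivity(10, n, k))
```

The first limit depends only on k, which is why it is the same for every n at a given k.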

4.3. Erasure Coding vs. Simple Replication

Figure 3 allows us to compare the improvement factor yielded by MoSAIC when different erasure codes are used. The erasure codes shown on the plot all incur the same storage cost: n/k = 2. In all cases, the maximum dependability improvement decreases as k increases. This is confirmed analytically by computing the following ratio, for any p > 1 such that pk and pn are integers:

\[
R_p
= \frac{\displaystyle\lim_{\beta/\lambda \to \infty} LRF_{pn,pk}\left(\frac{\alpha}{\beta}, \frac{\beta}{\lambda}\right)}
       {\displaystyle\lim_{\beta/\lambda \to \infty} LRF_{n,k}\left(\frac{\alpha}{\beta}, \frac{\beta}{\lambda}\right)}
= \frac{1 - \left(\dfrac{\alpha/\beta}{1 + \alpha/\beta}\right)^{k}}
       {1 - \left(\dfrac{\alpha/\beta}{1 + \alpha/\beta}\right)^{kp}}
\tag{4.3}
\]

We see that R_p < 1 for p > 1. Thus, we conclude that, from the dependability viewpoint, simple replication (i.e., with k = 1) is always preferable to erasure coding (i.e., with k > 1) above a certain β/λ threshold. Below that threshold, erasure coding is sometimes preferable to simple replication. Figure 4 compares the dependability improvement yielded by several erasure codes having the same storage cost; only the top-most erasure code (i.e., the surface with the highest LRF) is visible from above. The (2,1) plot is above all other plots, except in a small region where the other erasure codes (thin dashed and dotted lines) yield a higher LRF.
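The claim that R_p < 1 for p > 1 can be checked numerically with a small sketch. It assumes the ratio reads R_p = (1 − x^k) / (1 − x^(kp)) with x = (α/β)/(1 + α/β); the function name `asymptotic_ratio` is hypothetical.

```python
def asymptotic_ratio(alpha_over_beta, k, p):
    """Assumed form of R_p: asymptotic LRF of a (pn, pk) code over that of (n, k)."""
    x = alpha_over_beta / (1 + alpha_over_beta)
    return (1 - x**k) / (1 - x ** (k * p))


# Compare (4,2) and (6,3) against simple replication (2,1), i.e. k = 1, p = 2 or 3.
for ratio in (1, 10, 1000):
    print(ratio, asymptotic_ratio(ratio, 1, 2), asymptotic_ratio(ratio, 1, 3))
```

Since 0 < x < 1, raising x to the larger exponent kp makes the denominator larger than the numerator, so the ratio stays below 1 for every positive α/β.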

A look at a projection of this 3D plot on the β/λ and α/β plane (omitted for reasons of space) allows the visualization of the region where erasure codes perform better than simple replication. Erasure codes yield a higher data dependability than simple replication in the region defined (roughly) by α/β > 100 and 1 < β/λ < 100. However, in this region, the dependability yielded by erasure codes is typically less than an order of magnitude higher than that yielded by simple replication, even for the (extreme) case where α/β = 1000 (see Figure 3).

Interestingly, similar plots obtained for larger values of n/k (omitted for reasons of space) show that the region where erasure codes prevail tends to shift towards lower β/λ values as n/k increases. In other words, the spectrum of scenarios where erasure codes provide better dependability than simple replication narrows as the chosen storage overhead (the n/k ratio) increases.

[Figure 3: loss reduction factor (LRF) versus participant effectiveness (β/λ); curves for EC (6,3), EC (4,2) and EC (2,1), each with α/β = 1000 and α/β = 10.]

Figure 3. Loss reduction factor for different erasure codes.

[Figure 4: surfaces for EC (2,1), EC (4,2), EC (6,3) and EC (8,4); axes participant effectiveness (β/λ), connectivity ratio (α/β) and loss reduction factor (LRF), each with tick marks from 1 to 100000.]

Figure 4. Comparing LRF for different erasure codes with n/k = 2.

Nevertheless, when confidentiality is an important criterion, using erasure coding instead of simple replication is relevant. Erasure coding can achieve better confidentiality than simple replication [8] at the cost of a slightly lower asymptotic dependability improvement factor. For instance, in the context of Figure 3, if the user wants to maximize confidentiality while requiring a minimum improvement factor of 100, a (6,3) erasure code would be chosen rather than simple replication.

Figure 2.7 – LRF for different erasure codes

seeks an order-of-magnitude improvement, the environment must then satisfy β/λ ≥ 2 and α/β ≥ 10. These conditions are not very demanding: contributors that connect to the Internet twice as often as they fail, and 10 contributor encounters per Internet connection.

– For any ratio α/β, the LRF reaches an asymptote beyond a certain β/λ threshold. Increasing β/λ past that threshold therefore brings no further benefit.

– The LRF grows proportionally to α/β up to a certain threshold and then follows an asymptote. The analysis of the asymptotic behavior of the LRF can be found in [34].

Figure 2.7 compares the LRF for different erasure codes. All of these erasure codes have the same storage cost, n/k = 2. The figure shows that, beyond a certain β/λ ratio, simple replication is preferable (β/λ ≃ 30 for the curves with α/β = 1000). A finer analysis [34] shows that erasure codes with a storage cost of 2 are more effective in a region where α/β ≥ 100 and 1 ≤ β/λ ≤ 100. This region shrinks as the storage cost increases. It should be noted, however, that although erasure codes are not particularly beneficial for availability, they significantly increase data confidentiality at a computational cost far lower than that of any encryption-based technique [36].

From the document CONTRIBUTIONS À LA RÉSILIENCE ET AU RESPECT DE LA VIE PRIVÉE DES SYSTÈMES MOBIQUITAIRES (pages 53-57)
