HAL Id: inria-00506485
https://hal.inria.fr/inria-00506485v3
Submitted on 24 Aug 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

To cite this version:
Riccardo Masiero, Giovanni Neglia. Distributed Sub-gradient Method for Delay Tolerant Networks. [Research Report] RR-7345, INRIA. 2010. inria-00506485v3

Rapport de recherche
ISSN 0249-6399  ISRN INRIA/RR--7345--FR+ENG

Networks and Telecommunications

Distributed Sub-gradient Method for Delay Tolerant Networks

Riccardo Masiero — Giovanni Neglia

N° 7345 — June 2010

Centre de recherche INRIA Sophia Antipolis – Méditerranée
2004, route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex

Riccardo Masiero*†, Giovanni Neglia†

Theme: Networks and Telecommunications — Networks, Systems and Services, Distributed Computing
Équipe-Projet MAESTRO

Rapport de recherche n° 7345 — June 2010 — 38 pages

Abstract: In this report we consider that nodes in a Delay Tolerant Network (DTN) may collaborate to minimize the sum of local objective functions, depending in general on some parameters or actions of all the nodes in the network. If the local objective functions are convex, a recently proposed computation framework can be adopted that relies on local sub-gradient methods and consensus algorithms to average each node's information. Existing convergence results for this framework can be applied to DTNs only in the case of synchronous node operation and simple mobility models without memory. We address both these issues. First, we prove convergence to the optimal solution for a more general class of mobility processes. Second, we show that, under asynchronous operations, a straight application of the original method would lead to sub-optimal solutions, and we propose some changes to solve this problem. As a particular case study, we show how the framework can be applied to optimize the dissemination of dynamic content in a DTN.

Key-words: delay tolerant networks, distributed optimization, consensus, sub-gradient method

* Department of Information Engineering, University of Padova, Via G. Gradenigo 6/B, I-35131 Padova (PD), Italy. E-mail: riccardo.masiero@dei.unipd.it.
† INRIA - MAESTRO, 2004 route des Lucioles, BP 93, FR-06902 Sophia Antipolis (BIOT), France. E-mails: {riccardo.masiero, giovanni.neglia}@sophia.inria.fr.


1 Introduction

In networking applications, the performance of a Delay Tolerant Network (DTN) is a global measure that depends on decisions (i.e., protocol rules) and variables (i.e., protocol parameters) at each network node. Hence, the optimization of any given network protocol can be described as a global optimization problem which is governed by the local actions taken by each node. As an example, the message delivery delay and the energy consumption under the gossip protocol [1, 2] depend on the message forwarding probabilities, which can be locally and independently calculated by each node. To further complicate matters, local (but globally optimal) decisions at different nodes are not independent, and the optimal configuration is in general heterogeneous and depends on the specific scenario, as different nodes have different roles in the network. Given this, it may not be possible to compute optimal protocol rules and parameters off-line prior to network deployment. In addition, the disconnected nature of DTNs calls for on-line and distributed approaches to optimization where, in practice, each node has access to local variables and rules which can only be set according to what occurs within its immediate surroundings (the visibility scope of the node).

The authors of [3] present a distributed solution to this problem for the case where the global optimization target f can be expressed as the sum of M convex functions f_i and each node i only knows the corresponding function f_i, referred to as the local objective function. Many performance metrics of interest have this decomposition property. For example, this is the case for performance metrics related to nodes (e.g., energy consumption at each node) or to messages (e.g., delivery time, delivery probability, number of copies in the network). In either case, the metrics can naturally be expressed as a sum of local cost functions relative to each node. Convexity may not be guaranteed, but when this assumption does not hold the system converges in general to a sub-optimal but still desirable solution.

In the framework proposed in [3], and later extended in [4], nodes optimize their own local objective functions through a sub-gradient method, while they try to reach agreement on their local estimates by occasionally exchanging their local information and averaging it, as in a consensus problem [5, 6]. Within this approach, referred to as the distributed sub-gradient method, the local estimate of each node is proven to converge to the optimal solution under certain assumptions. However, two of these assumptions appear to be particularly restrictive for practical use in DTN scenarios. First, the node mobility process should have strict deterministic bounds on the inter-meeting times between nodes, see [3], or it should be memoryless, see [4]. Neither of these conditions is in general satisfied in a real network. Second, all nodes should update their estimates at the same time, but synchronicity is difficult to achieve in such a disconnected scenario.

As a first contribution, in this report we relax these assumptions. First, we prove that the distributed sub-gradient method also converges under a more general Markovian mobility model with memory in the meeting process. In addition, we show that a direct application of the framework when nodes operate asynchronously may introduce a bias, leading to convergence to a sub-optimal solution. Hence, we propose some adjustments, and show by simulations that they are able to correct the bias.

Furthermore, inspired by the work in [7], we propose a possible application of the framework to a DTN scenario where a Service Provider (SP) disseminates dynamic content over a mobile social network, with the help of the users that opportunistically share content updates among themselves. In this context the SP should decide how to allocate its bandwidth optimally, and to this purpose it needs to collect information about node utility functions and node meeting rates. We show that distributed sub-gradients can be effectively used to let the nodes perform such optimization.

The report is organized as follows. In Sec. 2 we review the distributed sub-gradient method, whilst in Sec. 3 we show how to apply it to DTNs and motivate our work. As original contributions, in Sec. 4 we extend the results in [3] and [4] to more general network mobility models, and in Sec. 5 we study how the presented framework needs to be extended to cope with asynchronous node operations. Then, in Sec. 6, we illustrate a possible DTN application exploiting the distributed sub-gradient method. Sec. 7 concludes the report.

2 Distributed Sub-gradient Method's Overview

In this section we review the main results in [3, 4] on convergence and optimality of the distributed sub-gradient method when a random network scenario is considered.

Let us consider a set of M nodes (agents) that want to cooperatively solve the following optimization problem:

Problem 1 (Global Optimization Problem). Given M convex functions f_i(x) : R^N → R, determine:

    x* ∈ argmin_{x ∈ R^N} f(x) = Σ_{i=1}^M f_i(x).

Clearly, for the above problem we assume that a feasible solution exists. The difficulty of the task arises from the fact that agent i, for i = 1, 2, ..., M, only knows the corresponding function f_i(x), namely its local objective function. For example, f_i could be a performance metric relative to node i, and f could indicate the global network performance.

If the functions f_i are differentiable, each node could apply a gradient method to its own function f_i to generate a sequence of local estimates, but this would lead to M biased estimates of the solution of Problem 1. In [3] and [4], it is shown that if nodes perform a gradient method but are also able to average their local estimates, then under opportune conditions these estimates all converge to a point of minimum of f, i.e., x*.

In particular, a time-slotted system is assumed where, at the end of a slot, each node i communicates its local estimate to a subset of all the other nodes, and then updates the estimate according to the following equation¹:

    x_i(k+1) = Σ_{j=1}^M a_ij(k) x_j(k) − γ(k) d_i(k),    (1)

where the vector d_i(k) ∈ R^N is a sub-gradient² of agent i's objective function f_i(x) computed at x = x_i(k), the scalar γ(k) > 0 is the step-size of the sub-gradient algorithm at iteration k, and the a_ij(k) are non-negative weights such that a_ij(k) > 0 if and only if node i has received node j's estimate at step k, and Σ_{j=1}^M a_ij(k) = 1. We denote by A(k) the matrix whose elements are the weights, i.e., [A(k)]_ij = a_ij(k).

¹ In this report all real-valued vectors are assumed to be column vectors.
² d_i ∈ R^N is a sub-gradient of the function f_i at x_i ∈ dom(f_i) iff f_i(x_i) + (d_i)^T (x − x_i) ≤ f_i(x) for all x ∈ dom(f_i).

We observe that the first addend on the right-hand side of (1) corresponds to averaging according to a consensus algorithm [5].
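For concreteness, one synchronous iteration of (1) can be sketched in a few lines of Python. This is our own illustrative snippet (not code from the report), for the scalar case N = 1; the weight matrix A, the step-size gamma and the local sub-gradients are the example's assumptions.

```python
import numpy as np

def subgradient_consensus_step(x, A, gamma, subgrads):
    """One iteration of x_i(k+1) = sum_j a_ij(k) x_j(k) - gamma(k) d_i(k).

    x        : length-M array of current local estimates (scalar case N = 1)
    A        : M x M doubly stochastic weight matrix for this slot
    gamma    : step-size gamma(k) > 0
    subgrads : list of callables, subgrads[i](xi) returns d_i at xi
    """
    d = np.array([g(xi) for g, xi in zip(subgrads, x)])
    return A @ x - gamma * d

# Example local objectives (used again below): f1 = f2 = x(x-1)/2, f3 = 2x^2.
subgrads = [lambda x: x - 0.5, lambda x: x - 0.5, lambda x: 4 * x]
```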

[3] proves that the iterations (1) generate sequences converging to a minimum of f under the following set of conditions:

1. the step-size γ(k) is such that Σ_{k=1}^∞ γ(k) = ∞ and Σ_{k=1}^∞ γ(k)² < ∞;

2. the sub-gradient of each function f_i is bounded;

3. each matrix A(k) is symmetric (hence doubly stochastic);

4. there exists η > 0 such that a_ii(k) > η and, if a_ij(k) > 0, then a_ij(k) ≥ η;

5. the information of each agent i reaches every other agent j (directly or indirectly) infinitely often;

6′. there is a deterministic bound on the intercommunication interval between two nodes.

We now formalize conditions 5 and 6′ (resp. 6′′). Consider the graph (V, E), where V is the set of nodes and the edge (i, j) belongs to E if nodes i and j communicate infinitely often (i.e., if a_ij(k) is positive for infinitely many values of k). Condition 5 imposes that the graph (V, E) is (strongly) connected. Condition 6′ requires that there is a positive integer constant B such that two nodes communicating infinitely often communicate at least once every B slots, i.e., if (i, j) ∈ E, then max{a_ij(k), a_ij(k+1), ..., a_ij(k+B−1)} > 0.

In [4], the inter-meeting times are not deterministically bounded, but the matrices are required to be independently and identically distributed. In fact, condition 6′ is replaced by the following one:

6′′. the matrices A(k) are i.i.d. random matrices.

In such a case, (i, j) ∈ E if and only if E[a_ij(k)] > 0. Note that, when the matrices A(k) are random (as in 6′′), condition 5 requires the matrix E[A(k)] to be irreducible and aperiodic.

Both papers also address the case when the sub-gradient step-size does not vanish but is kept constant (γ(k) = γ). In this case, the sequence of estimates x_i does not converge in general to a point of minimum of f, but it may keep oscillating around one such point. It is possible to bound the difference between the values that f assumes at the points of a smoothed average of x_i and the minimum of f. As is intuitive, the smaller γ, the smaller this difference. We are now going to provide an intuitive explanation of why the results on convergence and optimality hold, and an outline of the proofs in [3, 4]. This will be useful for our following extensions. We first formulate (1) in matrix form as follows:

,thesmallersu hdieren e. Wearegoingtoprovideanintuitiveexplanationofwhytheresultson on-vergen e and optimality hold, and an outline ofthe proofs in [3 ,4 ℄. This will beusefulforourfollowingextensions. Werstformulate(1 )in matrixformas follows:

X(k + 1) = A(k)X(k) − γ(k)D(k) ,

(2) where

X

(k)

def

=

x

1

(k + 1), · · · , x

i

(k + 1), · · · , x

M

(k + 1)



T

and

D(k)

def

=

d

1

(k + 1), · · · , d

i

(k + 1), · · · , d

M

(k + 1)



T

.

This equation iteratively leads to (see Appendix A)

    X(k+1) = A^(k)(1) X(1) − Σ_{s=2}^k A^(k)(s) γ(s−1) D(s−1) − γ(k) D(k),    (3)

where A^(k)(s), with s, k ≥ 1 and s ≤ k, is the backward matrix product, i.e., A^(k)(s) = A(k) A(k−1) ··· A(s). We introduce the average of all the node estimates, y(k) ∈ R^N, defined as:

    y(k)^T = (1/M) 1^T X(k).

By (2), and since each A(k) is doubly stochastic (so that 1^T A(k) = 1^T), we obtain

    y(k+1)^T = (1/M) 1^T A(k) X(k) − (γ(k)/M) 1^T D(k) = y(k)^T − (γ(k)/M) 1^T D(k).    (4)

Assume for a moment that x_i(k) = x_j(k) = y(k) for each i and j; then the sub-gradients d_i(k) are all evaluated at y(k) and 1^T D(k) is a sub-gradient of the global function f evaluated at y(k). Thus, the above equation (4) corresponds to a basic iteration of a sub-gradient method for the global function f. The intuitive explanation of the result is that averaging keeps the estimates x_i(k), for all i, close to each other (and then close to y(k)) and makes the local sub-gradient updates equivalent to a sub-gradient update for the global function f.

We illustrate this through a simple toy example that we are going to use several times across this report. Consider three nodes, labeled 1, 2 and 3. Their local objective functions are f_1(x) = f_2(x) = x(x−1)/2 and f_3(x) = 2x², where x ∈ R. Then the global function is f(x) = Σ_i f_i(x) = 3x² − x; it has minimum value −1/12 and a unique point of minimum at x = 1/6. The weight matrices A(k) are i.i.d. random matrices. At each step A(k) is equal to one of the following three matrices

    [1/2 1/2 0; 1/2 1/2 0; 0 0 1],   [1/2 0 1/2; 0 1 0; 1/2 0 1/2],   [1 0 0; 0 1/2 1/2; 0 1/2 1/2],    (5)

with probability 2/3, 1/6 and 1/6, respectively. Fig. 1 shows the evolution of the estimates at the 3 nodes when the algorithm is applied with γ(k) = 1/k. We can see that the state estimates tend to couple and then converge to the optimal value. In particular, the estimates of node 1 and node 2 are kept closer to each other, since the first matrix above is selected with higher probability. Similar results have been obtained with γ(k) = γ ≪ 1.
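A compact simulation of this toy example can be written as below; this is our own illustrative sketch (not the authors' simulation code), reusing the three matrices in (5) and the step-size γ(k) = 1/k.

```python
import numpy as np

rng = np.random.default_rng(0)

# The three weight matrices of (5) and their selection probabilities.
A1 = np.array([[.5, .5, 0], [.5, .5, 0], [0, 0, 1]])
A2 = np.array([[.5, 0, .5], [0, 1, 0], [.5, 0, .5]])
A3 = np.array([[1, 0, 0], [0, .5, .5], [0, .5, .5]])
mats, probs = [A1, A2, A3], [2/3, 1/6, 1/6]

def subgrad(x):  # d_i for f1 = f2 = x(x-1)/2 and f3 = 2x^2
    return np.array([x[0] - 0.5, x[1] - 0.5, 4 * x[2]])

x = np.array([0.5, 0.3, 0.1])           # arbitrary initial estimates
for k in range(1, 3001):
    A = mats[rng.choice(3, p=probs)]    # i.i.d. choice of A(k)
    x = A @ x - (1.0 / k) * subgrad(x)  # iteration (1)

print(x)  # all three estimates approach x* = 1/6
```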

The proofs of the convergence results in [3, 4] share mainly the same outline. A key element is proving that the averaging component (the consensus) of the algorithm converges exponentially fast. More formally, under 6′, [3] proves that A^(k)(s) surely converges to the matrix J = (1/M) 1 1^T, and that there are two positive constants C and β such that ‖A^(k)(s) − J‖ ≤ C β^(k−s) for all k ≥ s. Under 6′′, [4] proves almost sure convergence of A^(k)(s) to J, and an exponential convergence rate in expectation, i.e., E[‖A^(k)(s) − J‖] ≤ C β^(k−s). Then, similar bounds are established for the distance between y and x*, and between x_i and y; convergence results (when γ(k) satisfies condition 1) and asymptotic bounds (when γ(k) is constant) follow from the exponential convergence rate of the averaging component. As a consequence of the different kinds of convergence for A^(k)(s) in the two cases, these results hold surely under 6′ and almost surely under 6′′.

[Figure 1: Toy example, convergence of the three estimates (x-axis: number of pair meetings). Top graph: state estimate x_i of each node, together with the optimal x*. Bottom graph: objective function value f(x_i) for each node, together with the optimal f(x*).]

3 Application to Optimization in DTNs

The distributed sub-gradient method presented in Sec. 2 is particularly appealing in the context of Delay Tolerant Networks, see e.g., [8] and [9]. DTNs are sparse and/or highly mobile wireless ad hoc networks where no continuous connectivity guarantee can be assumed. This intrinsically makes it impossible to collect, at low cost and at a single data processing point, the information needed to solve network optimization problems in a centralized fashion. For this reason, in the present report we advocate the use of distributed approaches, which lend themselves well to distributed and communication-efficient optimization. To be more concrete, in what follows we briefly discuss two possible DTN scenarios where a global network function f has to be optimized.

One central problem in DTNs is related to routing packets towards their intended destinations. Common techniques, designed to overcome the absence of a complete route to the destination, rely on multi-copy dissemination of messages in the network [10]. In this context, it is natural to define global optimization functions able to take into account the trade-off between delivery time and the cost due to the use of resources such as buffer space, bandwidth and transmission power. Functions of this kind are convex and can be written as sums of locally measurable quantities; thus we can optimize them through the distributed sub-gradient framework [11].

A second DTN scenario concerns the dissemination of dynamic content, like news or traffic information. Referring to the application example of [7], we might think of a Service Provider (SP) with limited bandwidth that has to decide the update rate to assign to each node. Nodes can share their content when they meet, with the global objective of maintaining the average information in the network as fresh as possible. [7] shows that this problem can also be formalized as a classical convex optimization problem, and that the corresponding global objective function can be expressed in terms of the sum of local functions. The derivation of the latter local functions entails the collection of statistics which are computed at each node only considering its own meeting occurrences. As an application example for our techniques, in Sec. 6 we apply the distributed sub-gradient framework to this scenario.

From a general perspective, distributed sub-gradient optimization in DTNs can be applied as follows. Nodes exchange their local estimates every time they meet and perform the update step in equation (1) at a given sequence of time instants {t_k}_{k≥1}. This sequence can either coincide with the meeting times, i.e., each time two nodes meet they exchange and subsequently update their estimates, or be independent from them, i.e., in this case {t_k}_{k≥1} is defined a priori and is known to every network node. In any case, the weight matrices A(k) originate from the node meeting process. In particular, we can consider the contact matrix C(k), where c_ij(k) = 1 if node i has met node j since the last time instant t_{k−1}, and c_ij(k) = 0 otherwise. We denote by C the (finite) set of all possible M × M contact matrices describing the contacts among M nodes. Each node i can thus calculate its own weights a_ij(k), for j = 1, ..., M, in one of the following two ways (which guarantee that the matrix A(k) is doubly stochastic):

Rule 1 (Updates independent from meetings). For j ≠ i, set a_ij(k) = 1/M if c_ij(k) = 1, otherwise set a_ij(k) = 0. Set a_ii(k) = 1 − Σ_{j≠i} a_ij(k). This method requires each node to know M, i.e., the total number of nodes in the system.

Rule 2 (Updates synchronized with meetings). Whenever node i meets node j, it also updates its estimate. In this case, set a_ij(k) = a with 0 < a < 1, a_ii(k) = 1 − a and a_ih(k) = 0 for h ≠ j, i.
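As a hedged illustration (our own sketch, not code from the report), the two rules can be turned into weight matrices as follows; M, the symmetric contact matrix C (with zero diagonal) and the constant a are the example's assumptions.

```python
import numpy as np

def weights_rule1(C, M):
    """Rule 1: a_ij = 1/M for met pairs; the diagonal absorbs the rest."""
    A = np.where(C == 1, 1.0 / M, 0.0)
    np.fill_diagonal(A, 0.0)
    np.fill_diagonal(A, 1.0 - A.sum(axis=1))
    return A  # doubly stochastic because C is symmetric

def weights_rule2(i, j, M, a=0.5):
    """Rule 2: only the meeting pair (i, j) averages with weight a."""
    A = np.eye(M)
    A[i, i] = A[j, j] = 1.0 - a
    A[i, j] = A[j, i] = a
    return A
```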

Next, we discuss two key issues that can negatively impact the convergence of the distributed optimization process in a DTN scenario. The first one is related to the validity of assumptions 6′ and 6′′. In fact, condition 6′ is essentially equivalent to assuming that there is a deterministic bound on the inter-meeting times of two nodes (that meet infinitely often), and this is for example not the case for the random mobility models usually considered, see e.g., [12]. Condition 6′′ relaxes 6′, but requires the independence of the meetings occurring in each time slot [t_{k−1}, t_k], whereas meetings under realistic mobility are correlated (e.g., if in the recent past i has met j and j has met h, then the three nodes are likely to be close in space and the probability that i meets h is higher than with uniform and independent mobility). We address this issue in Sec. 4, where we prove that convergence results hold under more general assumptions on the stochastic process of the matrices A(k).

The second issue is related to the synchronicity of the updates. In fact, the original framework [3, 4] requires all the nodes to update their estimates at the same time instants. This is not always feasible in a disconnected and distributed scenario like a DTN. For example, under Rule 2 the reader may have noted that synchronous updates require each node to know when a meeting between any two nodes in the network occurs. This does not appear to be practical. Under Rule 1, which requires each node to know the total number of nodes in the system, nodes should also try to keep their internal clocks synchronized in order to be able to perform their updates at "close enough" time instants, and this presents some difficulties as well. We now show through an example that we cannot simply ignore the issue of synchronicity and that a direct application of the algorithm described in the previous section in general does not lead to correct results. Coming back to the toy example presented in Sec. 2, we observe that we can think of our three matrices in (5) as generated according to Rule 2, when the meeting process has the following characteristics: at each time slot, node 1 and node 2 meet with probability 2/3, node 1 and node 3 meet with probability 1/6, and node 2 and node 3 meet with probability 1/6. Fig. 2 shows the evolution of the estimates when the step-size is constant and equal to 25 · 10⁻⁴, both for the synchronous case, where all the nodes update their estimates when a meeting occurs (even the node that is not involved in the meeting), and for the asynchronous case, where only the nodes involved in the meeting perform the update. The curves represent the average estimates over 100 different simulations with different meeting sequences. We note that in the synchronous case (top figure) all nodes agree on the optimal value x*, whereas in the asynchronous case (bottom figure) the estimates still converge, but not to the correct point of minimum of the global function f. We address this issue in Sec. 5, where we identify the roots of this convergence problem and propose some simple modifications to the basic framework to effectively cope with it.

In our opinion, these extensions to the basic framework proposed in [3, 4], while motivated in this report by the DTN scenario, are of wide interest for other possible applications such as mobile wireless ad hoc and sensor networks.

[Figure 2: Toy example, convergence of the three estimates in case of fixed step-size γ = 25 · 10⁻⁴ (x-axis: number of pair meetings). Results averaged over 100 simulation runs. Top graph: synchronous updates. Bottom graph: asynchronous updates; the estimates converge to a biased x*.]

4 Extension to more General Mobility Models

In our DTN scenario, we consider that the weights are determined from the contacts through a bijective function (as in the case of the two rules presented in Sec. 3). Then conditions 5 and 6′′ of [4] can be expressed in terms of the sequence of contact matrices as follows: the contact matrices C(k) are i.i.d. and E[C(k)] is an irreducible aperiodic matrix. In this section, we extend the convergence results to the following, more general, mobility model.

is anirredu ible aperiodi matrix. In thisse tion, weextendthe onvergen eresultsto thefollowing,moregeneral,mobilitymodel.

Assumption 1 (Mobility model). There exists an irreducible, aperiodic and stationary Markov chain Φ with a finite or countable set of states S and a function g : S → C, such that C(k) = g(Φ_k), for each Φ_k ∈ S. Moreover, E[C(k)] is an irreducible aperiodic matrix.

Since there is a bijective correspondence between weight and contact matrices, we observe that under Assumption 1 there also exists a function ĝ : S → A, such that A(k) = ĝ(Φ_k). The case when the contact matrices (and then the weight matrices) are i.i.d. is a particular case of our mobility model.

Our proof follows the same outline of [3, 4] presented in Sec. 2. The main issue is to prove the exponential rate of convergence of the backward product A^(k)(1) to J = (1/M) 1 1^T.

Before proving the convergence of the backward product, we need to recall an ergodic property of the time shift operator θ for irreducible, aperiodic and stationary Markov chains. The definitions of measure-preserving and ergodic operators may be found in Appendix B (see also Chapter V of [13] for more details).

Proposition 1. Given an irreducible, aperiodic and stationary Markov chain Φ with finite or countable states, the shift operator θ is measure-preserving and ergodic, together with all its powers θ^k, where k ∈ N.

Proof. For stationary Markov chains, the shift operator and its powers are measure-preserving by definition of stationarity. Moreover, irreducible, aperiodic and stationary Markov chains with finite or countable states are mixing (see Theorem 3.1 in [14]). From the definition of mixing, if θ is mixing, the operator θ^k is also mixing for any given k ∈ N. But every mixing operator is also ergodic by Theorem 2 in [13]; then θ^k is ergodic for any k ∈ N.

We observe that the stochastic process A(k) = ĝ(Φ_k) is not in general a Markov chain, because different states of S may be mapped to the same weight matrix, but it is nevertheless stationary and ergodic.

We will also need the following result, whose proof is in Appendix C.

Lemma 1 (Windowing a Markov chain). Let Φ = {Φ_n, n ∈ N} be an irreducible, aperiodic and stationary Markov chain. Consider the stochastic process Ψ = {Ψ_n, n ∈ N}, where Ψ_n = (Φ_n, Φ_{n+1}, ..., Φ_{n+h−1}) with h a positive integer. Ψ is also an irreducible aperiodic stationary Markov chain.

Now we have all the instruments to study the convergence of A^(k)(1). First, we prove the convergence to J. This is a corollary of results in [15].

Proposition 2 (Convergence of the backward product). Let Assumption 1 hold; then

    lim_{k→+∞} A^(k)(1) = (1/M) 1 1^T = J    almost surely (a.s.).

Proof. We observe that A(k) is a stationary and ergodic sequence of stochastic matrices with strictly positive diagonal entries. Moreover, E[A(k)] is an irreducible aperiodic matrix, hence its eigenvalue with the second largest modulus has modulus strictly smaller than 1 (|λ_2(E[A(k)])| < 1). From Theorem 3 in [15], it follows that, with probability one, for each sequence A(k) there exists a vector v ∈ R^M such that Σ_i v_i = 1 and

    lim_{k→+∞} A^(k)(1) = 1 v^T.

Note that in general v is a random variable, depending on the specific sequence {A(k)}_{k≥1}. The matrices A(k) are doubly stochastic, hence w = (1/M) 1 is a left eigenvector corresponding to the unit eigenvalue for all the matrices A(k). Theorem 4 in [15] guarantees that in this case the above vector v is a deterministic constant almost surely, and in particular is equal to w. This concludes the proof.

Now we are ready to prove that the convergence rate is almost always exponential.

Proposition 3. Under Assumption 1 on the mobility models, if the matrices are doubly stochastic, then for almost all the sequences there exist C > 0 and 0 < β < 1 (with C in general depending on the sequence) such that for k ≥ s

    ‖A^(k)(s) − J‖_max ≤ C β^(k−s).

Proof. Given a matrix A, consider the coefficient of ergodicity, see e.g., [16]:

    τ_1(A) = (1/2) max_{i,j} Σ_{s=1}^M | [A]_is − [A]_js |.
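As a quick numerical aid (our own illustrative snippet, not part of the proof), the coefficient of ergodicity can be computed directly from its definition; applied to the matrices of (5), it shows that a single matrix need not contract the estimates, while a backward product of two of them does.

```python
import numpy as np

def tau1(A):
    """Coefficient of ergodicity: (1/2) max_{i,j} sum_s |A[i,s] - A[j,s]|."""
    M = A.shape[0]
    return 0.5 * max(np.abs(A[i] - A[j]).sum()
                     for i in range(M) for j in range(M))

A1 = np.array([[.5, .5, 0], [.5, .5, 0], [0, 0, 1]])
A2 = np.array([[.5, 0, .5], [0, 1, 0], [.5, 0, .5]])
print(tau1(A1))       # 1.0 : A1 alone does not contract every pair
print(tau1(A2 @ A1))  # 0.5 : the backward product A(2)A(1) does
```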

In the proof of Theorem 3 in [15] it is shown that there exist a positive natural h and η < 1 such that

    P[ τ_1( A^(s+rh−1)(s+(r−1)h) ) < η  for infinitely many r ] = 1.    (6)

Then we decompose A^(k)(s) into the product of i_k blocks of size h and one block of size (k − s + 1) mod h as follows:

    A^(k)(s) = A^(k)(s + i_k h) · A^(s+h i_k −1)(s + h(i_k −1)) ··· A^(s+h−1)(s).

Because of the properties of a coefficient of ergodicity:

    τ_1( A^(k)(s) ) ≤ τ_1( A^(k)(s + i_k h) ) Π_{j=1}^{i_k} τ_1( A^(s+hj−1)(s+h(j−1)) ) ≤ Π_{j=1}^{i_k} τ_1( A^(s+hj−1)(s+h(j−1)) ).

Then we can write:

    log τ_1( A^(k)(s) ) ≤ Σ_{j=1}^{i_k} log τ_1( A^(s+hj−1)(s+h(j−1)) ).

We now consider the Markov chain Φ that "generates" the sequence of matrices A(k) underlying the mobility process. Because of Lemma 1, Ψ_t = (Φ_t, Φ_{t+1}, ..., Φ_{t+h−1}) is an irreducible aperiodic stationary Markov chain, and then all the powers of the shift operator θ, and in particular θ^h, are ergodic.

We observe that log τ_1( A^(j+h−1)(j) ) is a function of (Φ_j, Φ_{j+1}, ..., Φ_{j+h−1}) and then of Ψ_j. We call such function f, i.e.,

    f(Ψ_j) := log τ_1( A^(j+h−1)(j) ).

Note that f(Ψ_t) ≤ 0 and, from equation (6), f(Ψ_t) < log(η) < 0 infinitely often almost surely. We can then apply Birkhoff's ergodic theorem to the random sequence {f(Ψ_s), f(Ψ_{s+h}), f(Ψ_{s+2h}), ...}, and it follows that:

    lim_{i→∞} (1/i) Σ_{j=1}^{i} log τ_1( A^(s+hj−1)(s+h(j−1)) ) = lim_{i→∞} (1/i) Σ_{j=1}^{i} f(Ψ_{s+hj}) = E[f(Ψ_t)] < 0    a.s.,

therefore,

    limsup_{h→∞} (1/h) log τ_1( A^(s+h)(s) ) ≤ E[f(Ψ_t)] < 0    a.s.

Consider E[f(Ψ_t)] < ζ < 0; then for almost all the sequences there exists h_0 such that for all h ≥ h_0 it holds:

    (1/h) log τ_1( A^(s+h)(s) ) ≤ ζ,    i.e.,    τ_1( A^(s+h)(s) ) ≤ e^(ζh).

If we define β = exp(ζ) < 1, C = β^(−h_0) and recall that τ_1( A^(k)(s) ) ≤ 1, we obtain:

    τ_1( A^(k)(s) ) ≤ C β^(k−s)    for k ≥ s.    (7)

In the above equation, the value of the constant h_0 depends on the specific random sequence and also on s (while the same value ζ can be selected for all the sequences and independently from s). We then need to use a corollary of the ergodic theorem about "nearly uniform" convergence, stated as Proposition (1.5) in [17]: if f(·) is square integrable, then for almost all the sequences we can select h_0 independently from s. Clearly this is the case for our function f(Ψ_t); therefore we can conclude that C in (7) only depends on the considered sequence.

So far we have established the existence of a geometric convergence result for the ergodic coefficient τ_1. The last step of our proof requires us to prove a geometric bound for the distance between A^(k)(s) and its almost sure limit J. In particular we prove that

    | [A^(k)(s)]_uv − 1/M | ≤ 2 τ_1( A^(k)(s) ),    (8)

for each u and v. From (8), a geometric bound also follows for ‖A^(k)(s) − J‖_max. The derivation of (8) presents no particular difficulty and mainly follows the proof of Theorem 4.17 in [16], so we have moved it to Appendix D.

In [4] a different result is proven, i.e., that there exist Ĉ and β̂ such that

    E[ ‖A^(k)(s) − J‖_max ] ≤ Ĉ β̂^(k−s).

Then a series of inequalities for the expected values of ‖y(k) − x_i(k)‖_2 are obtained for all i. Using Fatou's lemma, along with the non-negativity of distances, it is possible to derive inequalities that hold with probability 1. Using Proposition 3, instead, it is possible to obtain the same inequalities directly, without the need to consider the expectation.

(18)

5 Asynchronous Updates

In this section, we study how the presented framework needs to be extended in order to support the case when nodes asynchronously update their status. We consider the sequence {t_k}_{k≥1} of time instants at which one or more nodes perform an update of their estimates. Again, we denote the estimate of node i at time t_k (immediately before the update) by x_i(k) and represent all the estimates through the matrix X(k). The evolution of the estimates can still be expressed in a matrix form similar to (2):

    X(k+1) = A(k) X(k) − Γ(k) D(k),    (9)

where Γ(k) is a diagonal matrix and the element [Γ(k)]_ii is simply the step-size used by node i at the k-th update. We denote this step-size by γ_i(k). If j is not among the nodes which perform the update at time t_k, then it will simply be a_jj(k) = 1, a_jh(k) = 0 for h ≠ j, and [Γ(k)]_jj = 0.
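A sketch of one asynchronous step of (9) under Rule 2 follows (again our own illustration, with the meeting pair, step-sizes and mixing weight a as assumptions): only the two meeting nodes average and apply their sub-gradients, while all other nodes keep their estimates.

```python
def async_meeting_step(x, i, j, gamma_i, gamma_j, subgrads, a=0.5):
    """One asynchronous update (9): only the meeting pair (i, j) moves."""
    xi, xj = x[i], x[j]
    di, dj = subgrads[i](xi), subgrads[j](xj)    # sub-gradients at x(k)
    x = x.copy()
    x[i] = (1 - a) * xi + a * xj - gamma_i * di  # row i of A(k), step gamma_i
    x[j] = a * xi + (1 - a) * xj - gamma_j * dj  # row j of A(k), step gamma_j
    return x  # for h != i, j: a_hh(k) = 1 and [Gamma(k)]_hh = 0
```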

In what follows we first consider the case of decreasing step-sizes, similarly to condition 1 described in Sec. 2. That is, we will consider that for each i, the sequence {γ_i(k)}_{k≥1} satisfies Σ_{k=1}^∞ γ_i(k) = ∞ and Σ_{k=1}^∞ γ_i(k)² < ∞.

We can go over the rationale of [4] and prove similar results for the new system description. In particular, our proof of the exponential convergence rate of A^(k)(s) clearly holds also in this case. Bounds for the distance between x_i(k) and y(k), with y(k)^T = (1/M) 1^T X(k), hold with minimal changes, so that we can prove the analogue of Proposition 2 in [4]:

Proposition 4 (Convergence of Agent Estimates). Under Assumption 1, the estimate of each node converges almost surely to the vector y(k), i.e.,

    lim_{k→+∞} ‖y(k) − x_i(k)‖_2 = 0    a.s., for all i.

Proof. See Appendix E.

The following step is to use bounds for the distance between y(k) and x* (a point of minimum of f) to show that lim_{k→+∞} y(k) = x*.

In particular, the following inequality is derived in [4] (for the synchronous case they are considering):

    Σ_{s=1}^k γ(s) [f(y(s)) − f(x*)] ≤ (M/2) ‖y(1) − x*‖_2² + 2L Σ_{j=1}^M Σ_{s=1}^k γ(s) ‖y(s) − x_j(s)‖_2 + (L²/2) Σ_{s=1}^k γ²(s).

The first term on the right-hand side of the above inequality is a constant, and the last term is summable because of the assumption on the step-sizes. In [4] it is proven that Σ_{s=1}^∞ γ(s) ‖y(s) − x_j(s)‖_2 < ∞ almost surely. Thus they show that

    0 ≤ Σ_{s=1}^∞ γ(s) [f(y(s)) − f(x*)] < ∞    a.s.;    (10)

from (10) and the fact that Σ_{s=1}^∞ γ(s) = ∞, it is possible to conclude that liminf_{k→∞} f(y(k)) = f(x*) a.s., and lim_{k→∞} x_i(k) = x* a.s.

In Appendix F, a similar derivation is carried out, leading to the following generalization of (10):

    Σ_{s=1}^∞ Σ_{i=1}^M γ_i(s) [f_i(y(s)) − f_i(x*)] < ∞    a.s.    (11)

Unfortunately, the different values of γ_i(s) do not allow us to formulate the inequality above in terms of the global function f as in (10).

We do not currently have a formal result stating under which conditions the asynchronous system converges to the optimal solution, but (11) suggests that all the step-sizes γ_i(k) should have on average the same value. We then propose the following conjecture, which we support later with some examples:

Conjecture 1. When updates are asynchronous, convergence results for sub-gradient methods hold if E[γ_i(k)] = E[γ_j(k)] for each i and j.

Let us see how we can guarantee this condition in different cases. We consider that updates occur after every meeting, following Rule 2 in Sec. 3. Moreover, consider that γ_i(k) = 1/n_i(k), where n_i(k) is the total number of updates node i has performed up to the time instant t_k. If the meeting process follows a Poisson process with total rate λ and at each instant the probability that node i meets another node is p_i, we expect that by time t_k node i has had p_i k meetings (and an equal number of updates). Then the expected value of its step-size is E[γ_i(k)] = E[1/n_i(k)] = p_i/(p_i k) = 1/k. In conclusion, if step-sizes follow the rule γ_i(k) = 1/n_i(k), we expect the asynchronous sub-gradient mechanism to converge to the optimal solution. Fig. 3 (top figure) shows that this is true for our toy example. The simulations for the optimization problem considered in Sec. 6 confirm such convergence.

Let us now revisit the example in Sec. 3, where the estimates were not converging to a point of minimum (Fig. 2, bottom figure). There the step-sizes were constant, i.e., γ_i(k) = γ. Reasoning as above, we can conclude that E[γ_i(k)] = p_i γ. Hence the expected values are not equal as long as node meeting rates (and then update rates) are not equal: this was the case in our example, where p_1 = 5/6, p_2 = 5/6 and p_3 = 1/3. Intuitively, we expect convergence to be biased towards values closer to the optimum of the local functions of those nodes that perform the updates more often. Equation (11) suggests that what the distributed mechanism was really doing is to minimize the function Σ_i p_i f_i = (3/2)x² − (5/6)x rather than f = Σ_i f_i = 3x² − x. This is indeed the case, as the estimates converge to 5/18 (the dot-dashed line in all the previous figures). If we now want to correct the bias, it is sufficient for each node to select its step-size inversely proportional to its meeting rate. Fig. 3 (bottom figure) shows that this correction also leads the estimates to converge to the correct result.
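The two fixes can be phrased as step-size schedules; the short sketch below is our own illustration (with p_i assumed known or estimated online), contrasting the decreasing rule with the rate-weighted fixed rule.

```python
def stepsize_decreasing(n_updates):
    """Bias-free schedule: gamma_i(k) = 1 / n_i(k)."""
    return 1.0 / max(n_updates, 1)

def stepsize_weighted_fixed(p_i, gamma=25e-4):
    """Fixed step corrected by the meeting rate: gamma_i = gamma / p_i,
    so that E[gamma_i(k)] = p_i * (gamma / p_i) = gamma for every node."""
    return gamma / p_i
```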

[Figure 3: Toy example, convergence of the three estimates in case of asynchronous updates (x-axis: number of pair meetings). Results averaged over 100 simulation runs. Top graph: decreasing step-size γ_i(k) = 1/n_i(k). Bottom graph: weighted fixed step-size γ_i(k) = p_i⁻¹ · 25 · 10⁻⁴.]

6 Application in DTNs: a Case Study

In this section we apply the distributed sub-gradient method with our enhancements to a DTN scenario inspired by the work in [7]. As explained in Sec. 5, our enhancements consist of: 1) allowing nodes to update their estimates asynchronously, i.e., whenever any two of them meet, and 2) applying the decreasing step-size rule to avoid possible bias effects in the convergence towards the global optimum.

All the nodes in the network are interested in the same dynamic information content and can share it whenever they meet. The information update is performed by a Service Provider (SP) that injects fresh information into the network according to a Poisson process of parameter µ updates/sec. At a given instant t̄ we call t_i(t̄) the time at which the SP generated the most recent content version available at node i; then Y_i(t̄) = t̄ − t_i(t̄) is the age of such version. An information content has a non-increasing value in time. For example, we give value u_i(Y_i(t̄)) to the information stored at node i, where u_i(·) is a non-increasing function. The goal of the SP is to optimize

    f(x) = Σ_{i=1}^M f_i(x) = Σ_{i=1}^M E_x[ u_i(Y_i) ],    (12)

where x ∈ R^M is the rate allocation vector, such that Σ_{i=1}^M x_i ≤ µ and x_i ≥ 0 for all i. Note that the age Y_i(t̄) is modeled as a random variable Y_i depending on x. In [7], equation (12) is proved to be concave, and therefore the optimal x can be obtained by the SP using standard optimization techniques (see e.g., [18]) such as the projected gradient algorithm⁴: namely, iteratively computing x(k+1) = Π( x(k) + γ_k ∇f(x(k)) ), where {γ_k}_{k≥0} is a positive sequence of parameters such that Σ_k γ_k = ∞ and lim_{k→∞} γ_k = 0, and Π is the projection onto the feasible set for x. In general, a closed formula for f(x) is not known; thus the gradient needs to be estimated as explained in [7]. Here, our purpose is to focus on the distributed sub-gradient method, so we consider the specific case where updates can travel at most two hops, thus avoiding the need to address the gradient estimation issue. In detail, at a given instant t̄, let us call t_j^SP(t̄) the time at which the SP directly injected fresh content to node j; then we define the following protocol rule:

⁴ For a discussion about the projected gradient method implemented in a distributed fashion, see [19].

, thenwe dene thefollowingproto ol's rule:

Definition 1 (Content Sharing). When a node j meets a node i at time t̄, j will copy to user i the last content downloaded directly from the SP if this content is more recent than the content stored at i, i.e., if t_i(t̄) < t_j^SP(t̄).

In this case, assuming also that a) u_i(Y_i) = χ{Y_i ≤ τ} for all nodes i, where τ is a given threshold after which the information is worthless (e.g., the information consists of news about events that expire after some time), and b) the meeting process among node pairs is Poisson, we can compute the local utility function for each node as

    f_i(x) = 1 − ( Π_{j ∈ N_i} (x_j e^(−λ_ij τ) − λ_ij e^(−x_j τ)) / (x_j − λ_ij) ) e^(−x_i τ),    (13)

where λ_ij is the meeting rate between i and j and N_i := {j : λ_ij > 0}. The global utility function in (12) is then simply obtained by summing (13) over i = 1, 2, ..., M. The computations to derive (13) can be found in Appendix G. The local gradient function needed in (1) can be computed directly from (13), where nodes only need to estimate simple statistics on their own meeting rates. Clearly, (12) could also be optimized in a centralized fashion by collecting, for example at the SP itself, information about the statistics of the overall network meeting process [7]. However, in a DTN scenario the SP may be able to communicate with a group of connected nodes only for short periods of time, which we would like to exploit for transmitting the actual content users care about. Moreover, privacy issues easily apply to this scenario: for example, a node may prefer not to disclose information about its meetings to the SP and, in some cases, it would be equally desirable to keep the information about the utility function u_i(·) reserved (e.g., in military applications). A distributed approach is therefore of actual interest not only for DTNs, but also for scenarios that go beyond them.
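Under the reconstruction of (13) given above (reassembled from the garbled extraction, so treat its exact form as an assumption), each node can evaluate its local utility and a numerical gradient from its own meeting-rate estimates alone; a minimal sketch:

```python
import numpy as np

def f_i(x, i, lam, tau):
    """Local utility (13); lam is the matrix of meeting rates lambda_ij.
    Assumes x[j] != lam[i, j] (the singularity in (13) is removable)."""
    prod = 1.0
    for j in range(len(x)):
        if j != i and lam[i, j] > 0:  # j in N_i
            num = x[j] * np.exp(-lam[i, j] * tau) - lam[i, j] * np.exp(-x[j] * tau)
            prod *= num / (x[j] - lam[i, j])
    return 1.0 - prod * np.exp(-x[i] * tau)

def grad_f_i(x, i, lam, tau, eps=1e-6):
    """Central finite-difference gradient of f_i, enough for a (sub-)gradient step."""
    g = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = eps
        g[j] = (f_i(x + e, i, lam, tau) - f_i(x - e, i, lam, tau)) / (2 * eps)
    return g
```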

To optimize f(x) in a distributed fashion we can use the framework presented in Sec. 2. Each node i can compute a local estimate of the optimal allocation x*, i.e., x_i, through iterative updates. In detail, when two nodes i and j meet they: i) update x_i and x_j as in (1) and ii) project the results so obtained onto the feasible set Σ_{l=1}^M x_l ≤ µ, x_l ≥ 0 for all l ∈ {1, ..., M}. In our implementation of sub-gradient optimization all the x_i eventually converge to the optimum x* of (12). Henceforth, the SP can retrieve the optimal transmission rates using the following push-policy. During the execution of the algorithm each node i maintains its own estimate x_i. The SP collects x_i from every node and obtains the rate allocation vector as x̄ = (Σ_{i=1}^M x_i)/M.

M = 10

nodes as follows. Calling

R

1

and

R

2

two distin t regions ofthe spa e, nodes an be pla ed either in

R

1

orin

R

2

. Onlynodes thatare within thesame region an ommuni atewithea hother. Weletnodesfreeto hangeregionofpla ement a ordingto aPoissonpro essofoverallrate

λ

d

= 0.1

,thusnetwork'sfull on-ne tivityisguaranteed. Inaddition,a ordingtoaPoissonpro essofparameter

λ

m

= 1

(notethat

λ

m

> λ

d

),wegeneratemeetingeventsamongpairof nodes belonging to the same region. Ea h node is sele ted for a meeting a ording to a weightwhi h is proportionally inverse to itsindex, i.e., node

i

is sele ted with weight

w

i

= i

−3

. Note that wegenerate a meeting pro ess that is both stationaryandergodi ,andalongthispro essnodeshavediverse onta trates. In parti ular node

1

has the highest onta t rate, whilst node

10

the lowest. To sum up, letting nodes

i

and

j

update their states at their meeting times a ordingtotheabovepro ess,weobtaina orrespondingsequen e

{A(k)}

k≥1

that an beviewedasgeneratedfromanergodi andstationaryMarkov hain. Also, given that nodes have dierent onta t ratesand asyn hronous updates areperformed,weknowthatadire tappli ationofthedistributedsub-gradient algorithmmayleadtosub-optimal results,see thetoyexampleofSe .3 .

Figures 4-5 show simulation results for the above setting of parameters and τ = 20.

Fig. 4 shows the optimal allocation rate for each user. When the bandwidth that the SP can use to send updates is very low (i.e., small µ), the best solution is for the SP to send updates exclusively to the node that has the highest contact rate, i.e., node 1; for large values of µ, instead, the SP can evenly send updates to all the nodes in the network. Interestingly, as already observed in [7], for some values of µ (in our case µ around 10^0.7 updates/sec) the optimal choice for the SP is to allocate more bandwidth (i.e., a larger fraction of µ) to the node with the lowest contact rate, namely node 10. For these values of µ, in fact, those nodes with a large contact rate, such as node 1, are able to maintain high values of their utility functions just by collecting information from the large number of nodes they meet.

In Fig. 5 we show the mean trajectory towards the optimum for two elements of x̄ = (Σ_{i=1}^M x_i)/M, where the vectors x_i have been obtained along a sequence of 5 · 10⁴ meetings, considering µ = 10^−1.1 updates/sec. We note that the estimates provided through the distributed sub-gradient method converge to the theoretical optimal allocation values of Fig. 4. Concerns about the convergence rate of such estimates are out of the scope of the present report and will be addressed in future research⁶.

⁶ Note, however, that a wide literature addressing this issue already exists. E.g., for the problem of designing suitable sequences {A(k)} to speed up the convergence of consensus, see [20].

[Figure 4: Optimal bandwidth allocation for a network of 10 nodes: optimal rate allocation x/µ of each node vs. log bandwidth log₁₀ µ.]

Finally, in Fig. 6 we draw with a solid line the maximum of f(x) corresponding to the optimal rate allocations of Fig. 4, which was obtained using a centralized solver [18]. Note that with unlimited bandwidth the maximum of f(x) is equal to 10, i.e., each node i has utility u_i = 1. For eight different values of the available bandwidth µ, we also plot with squared points the utility function values corresponding to the optimal rate allocation achieved, again with x̄ = (Σ_{i=1}^M x_i)/M, using sub-gradient optimization together with our enhancements. With crosses we show the performance of the sub-gradient optimization with a fixed step size [3], which neglects the asynchronous update issue. As expected, the results of the latter algorithm are sub-optimal. Most importantly, the solutions achieved with our approach are very close to the actual optimum for all values of µ. This confirms the validity of the distributed framework that we presented.

. This onrmsthevalidityofthedistributedframeworkthat wepresented.

[Figure 5: Example of convergence of the estimates for two nodes (1 and 10) when the sub-gradient method is used: rate allocation x/µ vs. number of pair meetings (×10⁴), optimal values and estimates. Asynchronous updates and decreasing step-size.]

Figure 5: Example of onvergen e of estimate for two nodes when the sub-gradientmethodisused. Asyn hronousupdatesandde reasing step-size.

7 Conclusions

In this report we considered the recent framework of distributed sub-gradient optimization proposed in [3], and later extended in [4] for application to random scenarios. We pointed out that existing convergence results for this framework can be applied to DTNs only in the case of synchronous node operation and in the presence of simple mobility models without memory. Therefore, we addressed both these issues: first, we proved convergence to optimality of the sub-gradient optimization technique under a more general class of mobility processes and, second, we proposed some modifications to the original sub-gradient algorithm so as to avoid bias problems (i.e., convergence towards sub-optimal solutions) when nodes operate asynchronously. Finally, as a case study, we applied the presented framework to the optimization of the dissemination of dynamic content in a DTN. All the provided results confirmed that the distributed sub-gradient method is an effective and very promising tool for optimization in distributed contexts.

[Figure 6: Performance comparison, centralized solver vs. distributed method: global utility function f(x) vs. log bandwidth log₁₀ µ, for the optimum, the distributed estimation with decreasing step-size, and the distributed estimation with fixed step-size.]

8 Acknowledgments

The authors kindly thank Dr. Michele Rossi for his precious comments and suggestions.

A Derivation of Equation (3)

In this Appendix we show the mathematical steps that lead from equation (2) to equation (3).

From (2) we have that

    X(k+2) = A(k+1) X(k+1) − Γ(k+1) D(k+1)
           = A(k+1) [A(k) X(k) − Γ(k) D(k)] − Γ(k+1) D(k+1)
           = A^(k+1)(k) X(k) − A^(k+1)(k+1) Γ(k) D(k) − Γ(k+1) D(k+1),

where Γ(k) is a diagonal matrix and the element [Γ(k)]_ii = γ(k), for all i. Iterating, with k+2 replaced by k+s+1, k+1 by k+s and k by k+s−1, respectively, leads to

    X(k+s+1) = A^(k+s)(k+s−1) X(k+s−1) − A^(k+s)(k+s) Γ(k+s−1) D(k+s−1) − Γ(k+s) D(k+s)
             = A^(k+s)(k+s−1) [A(k+s−2) X(k+s−2) − Γ(k+s−2) D(k+s−2)] − A^(k+s)(k+s) Γ(k+s−1) D(k+s−1) − Γ(k+s) D(k+s)
             = A^(k+s)(k+s−2) X(k+s−2) − Σ_{l=0}^{1} A^(k+s)(k+s−l) Γ(k+s−1−l) D(k+s−1−l) − Γ(k+s) D(k+s)
             = ...
             = A^(k+s)(1) X(1) − Σ_{l=0}^{k+s−2} A^(k+s)(k+s−l) Γ(k+s−1−l) D(k+s−1−l) − Γ(k+s) D(k+s),

and finally, replacing k+s with k, we have that

    X(k+1) = A^(k)(1) X(1) − Σ_{l=0}^{k−2} A^(k)(k−l) Γ(k−1−l) D(k−1−l) − Γ(k) D(k)    (14)

or, replacing k−l with s in equation (14),

    X(k+1) = A^(k)(1) X(1) − Σ_{s=2}^{k} A^(k)(s) Γ(s−1) D(s−1) − Γ(k) D(k),

which, with [Γ(k)]_ii = γ(k) for all i, is equation (3).

B Stationarity and Ergodicity: Concepts

Let (Ω, F, P) be a probability space. Let also the function T : Ω → Ω be a measurable transformation of Ω into itself.

Definition 2 (Measure-Preserving). A measurable transformation T : Ω → Ω of Ω into itself is measure-preserving if, for every B in F, P(T B) = P(B).

Definition 3 (Stationarity). A random sequence ω ∈ Ω is stationary if the shift operator is measure-preserving.

Definition 4 (Invariant set). If T is a measure-preserving transformation, B ∈ F is an invariant set if T B = B, or equivalently P[(B \ T B) ∪ (T B \ B)] = 0.

Definition 5 (Ergodicity). A measure-preserving transformation T is ergodic if, given any invariant set B ∈ F, it holds that P(B) = 0 or P(B) = 1.

Definition 6 (Mixing). A measure-preserving transformation T is mixing if, for all B and C in F,

    lim_{n→∞} P(B ∩ T^n C) = P(B) P(C).

Note that when a random sequence is said to be ergodic tout court, it means that the shift operator is ergodic. Similarly, when a random sequence is said to be mixing (or mixing in the ergodic-theoretic sense), it means that the shift operator is mixing.

C Proof of Lemma 1

In this Appendix we prove Lemma 1, reported in the following for the reader's convenience.

Lemma (Windowing a Markov chain). Let Φ = {Φ_n, n ∈ N} be an irreducible, aperiodic and stationary Markov chain. Consider the stochastic process Ψ = {Ψ_n, n ∈ N}, where Ψ_n = (Φ_n, Φ_{n+1}, ..., Φ_{n+h−1}) with h a positive integer. Ψ is also an irreducible aperiodic stationary Markov chain.

Proof. First of all, it is evident that Ψ is also a Markov chain, whose states are the possible h-tuples of states of Φ, e.g., (s_1, s_2, ..., s_h). The transition probabilities can be calculated starting from those of Φ. Stationarity of Ψ easily follows from the stationarity of Φ.

Ψ is also irreducible because Φ is irreducible. In fact, given two states s = (s_1, s_2, ..., s_h) and t = (t_1, t_2, ..., t_h), by the irreducibility of Φ there exists n_0 such that Φ can move from s_h to t_1 in n_0 steps; hence the chain Ψ moves from s to some state u = (u_1, u_2, ..., u_{h−1}, t_1) after n_0 steps. Moreover, t is a state of Ψ and therefore it is also a valid sequence of state transitions for Φ; consequently, in h−1 time steps Φ can move from t_1 to t_h going through t_2, ..., t_{h−1}, and Ψ can move from u to t. In conclusion, in n_0 + h − 1 steps Ψ can move from s to t.

Aperiodicity requires a more detailed discussion. Given a possible state s = (s_1, s_2, ..., s_h), we want to prove that the greatest common divisor of the possible numbers of time steps after which the chain Ψ can return to s is equal to 1. Note that even if Φ had the property that it is possible to move directly from each state to itself, for a state s with s_i ≠ s_1 for some i = 2, ..., h, at least h steps would be required to return to that state. Consider the minimum number k_0 of time steps after which the chain Φ can move from s_h to s_1 (again, k_0 exists because Φ is irreducible). Consider then the increasing sequence k_1, k_2, ... of all the possible numbers of time steps after which it is possible to return to s_1. Observe that 2k_i also belongs to this sequence. It is clear that Ψ can return to s after k_0 + k_1 + h − 1, k_0 + k_2 + h − 1, ... steps. Let g denote the greatest common divisor of this sequence of numbers; we have that for each i > 0, (k_0 + k_i + h − 1) mod g = 0. In particular, also (k_0 + 2k_i + h − 1) mod g = 0, and it follows that k_i mod g = 0. This implies that g is also a divisor of the sequence k_1, k_2, .... Since Φ is aperiodic, it follows that g = 1. This concludes the proof that Ψ is also aperiodic.

