Networks and Telecommunications

Distributed Sub-gradient Method for Delay Tolerant Networks

Riccardo Masiero*†, Giovanni Neglia†

Theme: Networks and Telecommunications — Networks, Systems and Services, Distributed Computing
Équipe-Projet MAESTRO
Research Report n° 7345 — June 2010 — 38 pages

Centre de recherche INRIA Sophia Antipolis – Méditerranée
2004, route des Lucioles, BP 93, 06902 Sophia Antipolis Cedex
Abstract: In this report we consider that nodes in a Delay Tolerant Network (DTN) may collaborate to minimize the sum of local objective functions, which depend in general on parameters or actions of all the nodes in the network. If the local objective functions are convex, a recently proposed computation framework can be adopted, relying on local sub-gradient methods and on consensus algorithms to average the information of each node. Existing convergence results for this framework can be applied to DTNs only in the case of synchronous node operation and simple memoryless mobility models. We address both issues. First, we prove convergence to the optimal solution for a more general class of mobility processes. Second, we show that, under asynchronous operation, a straightforward application of the original method would lead to sub-optimal solutions, and we propose some changes to solve this problem. As a particular case study, we show how the framework can be applied to optimize the dissemination of dynamic content in a DTN.

Key-words: delay tolerant networks, distributed optimization, consensus, sub-gradient method
* Department of Information Engineering, University of Padova, Via G. Gradenigo 6/B, I-35131 Padova (PD), Italy. E-mail: riccardo.masiero@dei.unipd.it.
† INRIA - MAESTRO, 2004 route des Lucioles, BP 93, FR-06902 Sophia Antipolis (BIOT), France. E-mails: {riccardo.masiero, giovanni.neglia}@sophia.inria.fr.
1 Introduction

In networking applications, the performance of a Delay Tolerant Network (DTN) is a global measure that depends on decisions (i.e., protocol rules) and variables (i.e., protocol parameters) at each network node. Hence, the optimization of any given network protocol can be described as a global optimization problem which is governed by the local actions taken by each node. As an example, the message delivery delay and the energy consumption under the gossip protocol [1, 2] depend on the message forwarding probabilities, which can be locally and independently calculated by each node. To further complicate matters, local (but globally optimal) decisions at different nodes are not independent, and the optimal configuration is in general heterogeneous and depends on the specific scenario, as different nodes have different roles in the network. Given this, it may not be possible to compute optimal protocol rules and parameters off-line prior to network deployment. In addition, the disconnected nature of DTNs calls for on-line and distributed approaches to optimization where, in practice, each node has access to local variables and rules which can only be set according to what occurs within its immediate surroundings (the visibility scope of the node).
The authors of [3] present a distributed solution to this problem for the case where the global optimization target f can be expressed as the sum of M convex functions f_i, and each node i only knows the corresponding function f_i, referred to as the local objective function. Many performance metrics of interest have this decomposition property. For example, this is the case for performance metrics related to nodes (e.g., energy consumption at each node) or to messages (e.g., delivery time, delivery probability, number of copies in the network). In either case, the metrics can naturally be expressed as a sum of local cost functions, one per node. Convexity may not be guaranteed, but when this assumption does not hold the system converges in general to a sub-optimal but still desirable solution.

In the framework proposed in [3], and later extended in [4], nodes optimize their own local objective functions through a sub-gradient method, and they try to reach agreement on their local estimates by occasionally exchanging their local information and averaging it, like in a consensus problem [5, 6]. Within this approach, referred to as the distributed sub-gradient method, the local estimate of each node is proven to converge to the optimal solution under certain assumptions. However, two of these assumptions appear to be particularly restrictive for practical use in DTN scenarios. Specifically, the node mobility process should have strict deterministic bounds on the inter-meeting times between nodes, see [3], or it should be memoryless, see [4]. Neither of these conditions is in general satisfied in a real network. Second, all nodes should update their estimates at the same time, but synchronicity is difficult to achieve in such a disconnected scenario.
As a first contribution, in this report we relax these assumptions. First, we prove that the distributed sub-gradient method also converges under a more general Markovian mobility model with memory in the meeting process. In addition, we show that a direct application of the framework when nodes operate asynchronously may introduce a bias, leading to convergence to a sub-optimal solution. Hence, we propose some adjustments, and show by simulations that they are able to correct the bias.
Furthermore, inspired by the work in [7], we propose a possible application of the framework to a DTN scenario where a Service Provider (SP) disseminates dynamic content over a mobile social network, with the help of the users, who opportunistically share content updates among themselves. In this context the SP should decide how to allocate its bandwidth optimally, and to this purpose it needs to collect information about node utility functions and node meeting rates. We show that distributed sub-gradients can be effectively used to let the nodes perform such optimization.
The report is organized as follows. In Sec. 2 we review the distributed sub-gradient method, whilst in Sec. 3 we show how to apply it to DTNs and motivate our work. As original contributions, in Sec. 4 we extend the results in [3] and [4] to more general network mobility models, and in Sec. 5 we study how the presented framework needs to be extended to cope with asynchronous node operations. Then, in Sec. 6, we illustrate a possible DTN application exploiting the distributed sub-gradient method. Sec. 7 concludes the report.
2 Overview of the Distributed Sub-gradient Method

In this Section we review the main results in [3, 4] on convergence and optimality of the distributed sub-gradient method when a random network scenario is considered.
Let us consider a set of M nodes (agents) that want to cooperatively solve the following optimization problem.

Problem 1 (Global Optimization Problem). Given M convex functions f_i(x): R^N → R, determine

x* ∈ argmin_{x ∈ R^N} f(x) = Σ_{i=1}^{M} f_i(x).

Clearly, for the above problem we assume that a feasible solution exists. The difficulty of the task arises from the fact that agent i, for i = 1, 2, ..., M, only knows the corresponding function f_i(x), namely its local objective function. For example, f_i could be a performance metric relative to node i, and f could indicate the global network performance.

If the functions f_i are differentiable, each node could apply a gradient method to its own function f_i to generate a sequence of local estimates, but this would lead to M biased estimates of the solution of Problem 1. In [3] and [4], it is shown that if nodes perform a gradient method but are also able to average their local estimates, then, under suitable conditions, these estimates all converge to a point of minimum of f, i.e., x*.
In particular, a time-slotted system is assumed where, at the end of a slot, each node i communicates its local estimate to a subset of all the other nodes, and then updates the estimate according to the following equation¹:

x_i(k+1) = Σ_{j=1}^{M} a_ij(k) x_j(k) − γ(k) d_i(k),   (1)

where the vector d_i(k) ∈ R^N is a sub-gradient² of agent i's objective function f_i(x) computed at x = x_i(k), the scalar γ(k) > 0 is the step-size of the sub-gradient algorithm at iteration k, and the a_ij(k) are non-negative weights such that a_ij(k) > 0 if and only if node i has received node j's estimate at step k, and Σ_{j=1}^{M} a_ij(k) = 1. We denote by A(k) the matrix whose elements are the weights, i.e., [A(k)]_ij = a_ij(k). We observe that the first addend on the right hand side of (1) corresponds to averaging according to a consensus algorithm [5].
[3] proves that the iterations (1) generate sequences converging to a minimum of f under the following set of conditions:

1. the step-size γ(k) is such that Σ_{k=1}^{∞} γ(k) = ∞ and Σ_{k=1}^{∞} γ(k)² < ∞;
2. the gradient of each function f_i is bounded;
3. each matrix A(k) is symmetric (hence doubly stochastic);
4. there exists η > 0 such that a_ii(k) > η and, if a_ij(k) > 0, then a_ij(k) ≥ η;
5. the information of each agent i reaches every other agent j (directly or indirectly) infinitely often;
6′) there is a deterministic bound on the inter-communication interval between two nodes.

¹ In this report all real-valued vectors are assumed to be column vectors.
² d_i ∈ R^N is a sub-gradient of the function f_i at x_i ∈ dom(f_i) if f_i(x_i) + (d_i)^T (x − x_i) ≤ f_i(x) for all x ∈ dom(f_i).
We now formalize conditions 5 and 6′ (resp. 6′′). Consider the graph (V, E_∞), where V is the set of nodes and the edge (i, j) belongs to E_∞ if nodes i and j communicate infinitely often (i.e., if a_ij(k) is positive for infinitely many values of k). Condition 5 imposes that the graph (V, E_∞) is (strongly) connected. Condition 6′ requires that there is a positive integer constant B such that two nodes communicating infinitely often communicate at least once every B slots, i.e., if (i, j) ∈ E_∞, then max{a_ij(k), a_ij(k+1), ..., a_ij(k+B−1)} > 0.

In [4], the inter-meeting times are not deterministically bounded, but the matrices are required to be independently and identically distributed. In fact, condition 6′ is replaced by the following one:

6′′) the matrices A(k) are i.i.d. random matrices.

In such a case, (i, j) ∈ E_∞ if and only if E[a_ij(k)] > 0. Note that, when the matrices A(k) are random (like in 6′′), condition 5 requires the matrix E[A(k)] to be irreducible and aperiodic.

Both papers also address the case when the gradient step-size does not vanish but is kept constant (γ(k) = γ). In this case, the sequence of estimates x_i does not converge in general to a point of minimum of f, but it may keep oscillating around one such point. It is possible to bound the difference between the values that f assumes at the points of a smoothed average of x_i and the minimum of f. As is intuitive, the smaller γ, the smaller such difference.

We are now going to provide an intuitive explanation of why the results on convergence and optimality hold, and an outline of the proofs in [3, 4]. This will be useful for our following extensions. We first formulate (1) in matrix form as follows:

X(k+1) = A(k)X(k) − γ(k)D(k),   (2)

where X(k) ≝ [x_1(k), ..., x_i(k), ..., x_M(k)]^T and D(k) ≝ [d_1(k), ..., d_i(k), ..., d_M(k)]^T.
This equation iteratively leads to (see Appendix A)

X(k+1) = A^(k)_(1) X(1) − Σ_{s=2}^{k} A^(k)_(s) γ(s−1) D(s−1) − γ(k) D(k),   (3)

where A^(k)_(s), with s, k ≥ 1 and s ≤ k, is the backward matrix product, i.e., A^(k)_(s) = A(k) A(k−1) ··· A(s). We introduce the average of all the node estimates, y(k) ∈ R^N, defined as

y(k)^T = (1/M) 1^T X(k).

By (2), we obtain

y(k+1)^T = (1/M) 1^T A(k) X(k) − (γ(k)/M) 1^T D(k) = y(k)^T − (γ(k)/M) 1^T D(k).   (4)

Assume for a moment that x_i(k) = x_j(k) = y(k) for each i and j; then the sub-gradients d_i(k) are all evaluated at y(k) and 1^T D(k) is a sub-gradient of the global function f evaluated at y(k). Thus, the above equation (4) corresponds to a basic iteration of a sub-gradient method for the global function f. The intuitive explanation of the result is that averaging keeps the estimates x_i(k), for all i, close to each other (and then close to y(k)) and makes the local sub-gradient updates equivalent to a sub-gradient update for the global function f.
We illustrate this through a simple toy example that we are going to use several times across this report. Consider three nodes, labeled 1, 2 and 3. Their local objective functions are f_1(x) = f_2(x) = x(x−1)/2 and f_3(x) = 2x², where x ∈ R. The global function is then f(x) = Σ_i f_i(x) = 3x² − x; it has minimum value −1/12 and a unique point of minimum at x = 1/6. The weight matrices A(k) are i.i.d. random matrices. At each step A(k) is equal to one of the following three matrices:

[1/2 1/2 0; 1/2 1/2 0; 0 0 1],   [1/2 0 1/2; 0 1 0; 1/2 0 1/2],   [1 0 0; 0 1/2 1/2; 0 1/2 1/2],   (5)

with probability 2/3, 1/6 and 1/6, respectively. Fig. 1 shows the evolution of the estimates at the 3 nodes when the algorithm is applied with γ(k) = 1/k. We can see that the state estimates tend to couple and then converge to the optimal value. In particular, the estimates of node 1 and node 2 are kept closer to each other, since the first matrix above is selected with higher probability. Similar results have been obtained with γ(k) = γ ≪ 1.
The proofs of the convergence results in [3, 4] share essentially the same outline. A key element is proving that the averaging component (the consensus) of the algorithm converges exponentially fast. More formally, under 6′, [3] proves that A^(k)_(s) surely converges to the matrix J = (1/M) 11^T, and that there are two positive constants C and β such that ||A^(k)_(s) − J||_∞ ≤ C β^{k−s} for all k ≥ s. Under 6′′, [4] proves almost sure convergence of A^(k)_(s) to J, and an exponential convergence rate in expectation, i.e., E[||A^(k)_(s) − J||_∞] ≤ C β^{k−s}. Then, similar bounds are established for the distance between y and x*, and between x_i and y, and convergence results (when γ(k) satisfies condition 1) and asymptotic bounds (when γ(k) is constant) follow from the exponential convergence rate of the averaging component. As a consequence of the different kind of convergence for A^(k)_(s) in the two cases, these results hold surely under 6′ and almost surely under 6′′.
Figure 1: Toy example, convergence of the three estimates (x-axis: number of pair meetings). Top graph: state estimate of each node (node state x, compared with the optimal x*). Bottom graph: objective function value f(x) computed at each node's state estimate (compared with the optimal f(x*)).
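As an illustration (ours, not part of the original report), the following Python sketch reproduces the toy example with the update rule (1): three nodes with local functions f_1(x) = f_2(x) = x(x−1)/2 and f_3(x) = 2x², the random weight matrices of (5), and the diminishing step-size γ(k) = 1/k.

```python
import numpy as np

# Local sub-gradients of f1(x) = f2(x) = x(x-1)/2 and f3(x) = 2x^2.
subgrads = [lambda x: x - 0.5, lambda x: x - 0.5, lambda x: 4 * x]

# The three doubly stochastic weight matrices of eq. (5) and their probabilities.
A_list = [np.array([[.5, .5, 0], [.5, .5, 0], [0, 0, 1]]),
          np.array([[.5, 0, .5], [0, 1, 0], [.5, 0, .5]]),
          np.array([[1, 0, 0], [0, .5, .5], [0, .5, .5]])]
probs = [2/3, 1/6, 1/6]

rng = np.random.default_rng(0)
x = np.full(3, 0.5)                      # initial local estimates x_i(1)
for k in range(1, 301):
    A = A_list[rng.choice(3, p=probs)]   # draw A(k)
    d = np.array([g(xi) for g, xi in zip(subgrads, x)])
    x = A @ x - (1.0 / k) * d            # update (1): consensus step minus sub-gradient step

print(x)   # all three estimates should be close to the optimum x* = 1/6
```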
3 Application to Optimization in DTNs

The distributed sub-gradient method presented in Sec. 2 is particularly appealing in the context of Delay Tolerant Networks, see e.g., [8] and [9]. DTNs are sparse and/or highly mobile wireless ad hoc networks where no continuous connectivity guarantee can be assumed. This intrinsically makes it impossible to collect, at low cost and at a single data processing point, the information needed to solve network optimization problems in a centralized fashion. For this reason, in the present report we advocate the use of distributed approaches, which lend themselves well to distributed and communication-efficient optimization. To be more concrete, in what follows we briefly discuss two possible DTN scenarios where a global network function f has to be optimized.

One central problem in DTNs is related to routing packets towards their intended destinations. Common techniques, designed to overcome the absence of a complete route to the destination, rely on multi-copy dissemination of messages in the network [10]. In this context, it is natural to define global optimization functions that take into account the trade-off between delivery time and the cost due to the use of resources such as buffer space, bandwidth and transmission power. Functions of this kind are convex and can be written as sums of locally measurable quantities, thus we can optimize them through the distributed sub-gradient framework [11].

A second DTN scenario concerns the dissemination of dynamic content, like news or traffic information. Referring to the application example of [7], we might think of a Service Provider (SP) with limited bandwidth that has to decide the update rate to assign to each node. Nodes can share their content when they meet, with the global objective of maintaining the average information in the network as fresh as possible. [7] shows that this problem can also be formalized as a classical convex optimization problem, and that the corresponding global objective function can be expressed as a sum of local functions. The derivation of these local functions entails the collection of statistics which are computed at each node considering only its own meeting occurrences. As an application example for our techniques, in Sec. 6 we apply the distributed sub-gradient framework to this scenario.
From a general perspective, the distributed sub-gradient optimization in DTNs can be applied as follows. Nodes exchange their local estimates every time they meet and perform the update step in equation (1) at a given sequence of time instants {t_k}_{k≥1}. This sequence can either coincide with the meeting times, i.e., each time two nodes meet they exchange and subsequently update their estimates, or be independent from them, i.e., in this case {t_k}_{k≥1} is defined a priori and is known to every network node. In any case, the weight matrices A(k) originate from the node meeting process. In particular, we can consider the contact matrix C(k), where c_ij(k) = 1 if node i has met node j since the last time instant t_{k−1}, and c_ij(k) = 0 otherwise. We denote by C the (finite) set of all possible M × M contact matrices describing the contacts among M nodes. Each node i can then calculate its own weights a_ij(k), for j = 1, ..., M, in one of the following two ways, which guarantee that the matrix A(k) is doubly stochastic (a sketch of both rules is given after the definitions).

Rule 1 (Updates independent from meetings). For j ≠ i, set a_ij(k) = 1/M if c_ij(k) = 1, otherwise set a_ij(k) = 0. Set a_ii(k) = 1 − Σ_{j≠i} a_ij(k). This method requires each node to know M, i.e., the total number of nodes in the system.

Rule 2 (Updates synchronized with meetings). Whenever node i meets node j, it also updates its estimate. In this case, set a_ij(k) = a with 0 < a < 1, a_ii(k) = 1 − a and a_ih(k) = 0 for h ≠ j, i.
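To make the two rules concrete, here is a minimal Python sketch (ours, not from the report) that builds the weight matrix A(k) from a contact matrix C(k) under Rule 1, and the pairwise weights used by two meeting nodes under Rule 2; the function and parameter names are illustrative.

```python
import numpy as np

def weights_rule1(C, M):
    """Rule 1: weight 1/M for each neighbour met in the slot, remainder on the diagonal."""
    A = np.zeros((M, M))
    for i in range(M):
        for j in range(M):
            if i != j and C[i, j] == 1:
                A[i, j] = 1.0 / M
        A[i, i] = 1.0 - A[i].sum()    # self-weight keeps each row summing to 1
    return A                           # symmetric contacts give a doubly stochastic A

def weights_rule2(i, j, M, a=0.5):
    """Rule 2: only the two meeting nodes i and j update, mixing with weight a."""
    A = np.eye(M)                      # all other nodes keep their own estimate
    A[i, i] = A[j, j] = 1.0 - a
    A[i, j] = A[j, i] = a
    return A

# Example: 4 nodes, nodes 0 and 2 met during the last slot.
C = np.zeros((4, 4)); C[0, 2] = C[2, 0] = 1
print(weights_rule1(C, 4))
print(weights_rule2(0, 2, 4, a=0.5))
```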
Next, we discuss two key issues that can negatively impact the convergence of the distributed optimization process in a DTN scenario. The first one is related to the validity of assumptions 6′ and 6′′. In fact, condition 6′ is essentially equivalent to assuming that there is a deterministic bound for the inter-meeting times of two nodes (that meet infinitely often), and this is for example not the case for all the random mobility models usually considered, see e.g., [12]. Condition 6′′ relaxes 6′, but requires the independence of the meetings occurring in each time slot [t_{k−1}, t_k], whereas meetings under realistic mobility are instead correlated (e.g., if in the recent past i has met j and j has met h, then the three nodes are likely to be close in space and the probability that i meets h is higher than under uniform and independent mobility). We address this issue in Sec. 4, where we prove that convergence results hold under more general assumptions on the stochastic process of the matrices A(k).

The second issue is related to the synchronicity of the updates. In fact, the original framework [3, 4] requires all the nodes to update their estimates at the same time instants. This is not always feasible in a disconnected and distributed scenario like a DTN. For example, under Rule 2 the reader may have noted that synchronous updates require each node to know when a meeting between any two nodes in the network occurs. This does not appear to be practical. Under Rule 1, which requires each node to know the total number of nodes in the system, nodes should also try to keep their internal clocks synchronized in order to be able to perform their updates at close enough time instants, and this presents some difficulties as well. We now show through an example that we cannot simply ignore the issue of synchronicity and that a direct application of the algorithm described in the previous section in general does not lead to correct results. Coming back to the toy example presented in Sec. 2, we observe that we can think of our three matrices in (5) as generated according to Rule 2, when the meeting process has the following characteristics: at each time slot, node 1 and node 2 meet with probability 2/3, node 1 and 3 meet with probability 1/6, and node 2 and 3 meet with probability 1/6. Fig. 2 shows the evolution of the estimates when the step-size is constant and equal to 25 · 10^{−4}, both for the synchronous case, where all the nodes update their estimates when a meeting occurs (even the node that is not involved in the meeting), and for the asynchronous case, where only the nodes involved in the meeting perform the update. The curves represent the average estimates over 100 different simulations with different meeting sequences. We note that in the synchronous case (top figure) all nodes agree on the optimal value at which to set x, whereas, in the asynchronous case (bottom figure) the estimates still converge, but not to the correct point of minimum of the global function f. We address this issue in Sec. 5, where we identify the roots of this convergence problem and propose some simple modifications to the basic framework to effectively cope with it.

In our opinion, these extensions to the basic framework proposed in [3, 4], while motivated in this report by the DTN scenario, are of wide interest for other possible applications such as mobile wireless ad hoc and sensor networks.
Figure 2: Toy example, convergence of the three estimates in the case of a fixed step-size γ = 25 · 10^{−4} (x-axis: number of pair meetings; y-axis: node state x). Results have been averaged over 100 simulation runs. Top graph: synchronous updates. Bottom graph: asynchronous updates.

4 Extension to more General Mobility Models
In our DTN scenario, we consider that the weights are determined from the contacts through a bijective function (as in the case of the two rules presented in Sec. 3). Then conditions 5 and 6′′ of [4] can be expressed in terms of the sequence of contact matrices as follows: the contact matrices C(k) are i.i.d. and E[C(k)] is an irreducible aperiodic matrix. In this section, we extend the convergence results to the following, more general, mobility model.

Assumption 1 (Mobility model). There exists an irreducible, aperiodic and stationary Markov chain Φ with a finite or countable set of states S and a function g: S → C, such that C(k) = g(Φ_k), for each Φ_k ∈ S. Moreover, E[C(k)] is an irreducible aperiodic matrix.

Since there is a bijective correspondence between weight and contact matrices, we observe that under Assumption 1 there also exists a function ĝ: S → A such that A(k) = ĝ(Φ_k). The case where the contact matrices (and then the weight matrices) are i.i.d. is a particular case of our mobility model; a small illustration of such a process is given below.
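As a small illustration (ours, not from the report), the following Python fragment builds a mobility process satisfying Assumption 1: a two-state Markov chain Φ whose states are mapped by a function g to contact matrices, so that successive C(k) are correlated rather than i.i.d. All numerical values are arbitrary.

```python
import numpy as np

# Two-state Markov chain Phi with memory (staying in the same state is more likely).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # transition matrix: irreducible and aperiodic

# g: state -> contact matrix C(k) for 3 nodes.
g = {0: np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]]),   # state 0: nodes 1 and 2 in contact
     1: np.array([[0, 0, 1], [0, 0, 0], [1, 0, 0]])}   # state 1: nodes 1 and 3 in contact

rng = np.random.default_rng(0)
state, contacts = 0, []
for k in range(10):
    state = rng.choice(2, p=P[state])   # next state of Phi
    contacts.append(g[state])           # C(k) = g(Phi_k): correlated across slots
```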
Our proof follows the same outline of [3, 4] presented in Sec. 2. The main issue is to prove the exponential rate of convergence of the backward product A^(k)_(1) to J = (1/M) 11^T.

Before proving this convergence, we need to recall an ergodic property of the time shift operator θ for irreducible, aperiodic and stationary Markov chains. The definitions of measure-preserving and ergodic operators can be found in Appendix B (see also Chapter V of [13] for more details).

Proposition 1. Given an irreducible, aperiodic and stationary Markov chain Φ with a finite or countable set of states, the shift operator θ is measure-preserving and ergodic, together with all its powers θ^k, where k ∈ N.

Proof. For stationary Markov chains the shift operator and its powers are measure-preserving by definition of stationarity. Moreover, irreducible, aperiodic and stationary Markov chains with finite or countable states are mixing (see Theorem 3.1 in [14]). From the definition of mixing, if θ is mixing, then the operator θ^k is also mixing for any given k ∈ N. But every mixing operator is also ergodic by Theorem 2 in [13], hence θ^k is ergodic for any k ∈ N.

We observe that the stochastic process A(k) = ĝ(Φ_k) is not in general a Markov chain, because different states of S may be mapped to the same weight matrix, but it is nevertheless stationary and ergodic.

We will also need the following result, whose proof is in Appendix C.
Lemma 1 (Windowing a Markov chain). Let Φ = {Φ_n, n ∈ N} be an irreducible, aperiodic and stationary Markov chain. Consider the stochastic process Ψ = {Ψ_n, n ∈ N}, where Ψ_n = (Φ_n, Φ_{n+1}, ..., Φ_{n+h−1}) with h a positive integer. Then Ψ is also an irreducible, aperiodic and stationary Markov chain.

Now we have all the instruments to study the convergence of A^(k)_(1). First, we prove the convergence to J. This is a corollary of results in [15].

Proposition 2 (Convergence of the backward product). Let Assumption 1 hold; then

lim_{k→+∞} A^(k)_(1) = (1/M) 11^T ≝ J   almost surely (a.s.).
Proof. We observe that A(k) is a stationary and ergodic sequence of stochastic matrices with strictly positive diagonal entries. Moreover, E[A(k)] is an irreducible aperiodic matrix, hence its eigenvalue with the second largest modulus has modulus strictly smaller than 1 (|λ_2(E[A(k)])| < 1). From Theorem 3 in [15], it follows that, with probability one, for each sequence A(k) there exists a vector v ∈ R^M such that Σ_i v_i = 1 and lim_{k→+∞} A^(k)_(1) = 1 v^T.

Note that in general v is a random variable, depending on the specific sequence {A(k)}_{k≥1}. The matrices A(k) are doubly stochastic, hence w = (1/M) 1 is a left eigenvector corresponding to the unit eigenvalue for all the matrices A(k). Theorem 4 in [15] guarantees that in this case the above vector v is almost surely a deterministic constant, and in particular is equal to w. This concludes the proof.

We are now ready to prove that the convergence rate is almost always exponential.
Proposition 3. Under Assumption 1 on the mobility model, if the matrices are doubly stochastic, then for almost all the sequences there exist C > 0 and 0 < β < 1 (with C in general depending on the sequence) such that, for k ≥ s,

||A^(k)_(s) − J||_max ≤ C β^{k−s}.

Proof. Given a matrix A, consider the coefficient of ergodicity, see e.g., [16]:

τ_1(A) = (1/2) max_{i,j} Σ_{s=1}^{M} |[A]_is − [A]_js|.

In the proof of Theorem 3 in [15] it is shown that there exist a positive natural number h and η < 1 such that

P[ τ_1(A^(s+rh−1)_(s+(r−1)h)) < η for infinitely many r ] = 1.   (6)

We then decompose A^(k)_(s) into the product of i_k blocks of size h and one block of size (k+1) mod h as follows:

A^(k)_(s) = A^(k)_(s+i_k h) A^(s+h i_k −1)_(s+h(i_k −1)) ··· A^(s+h−1)_(s).

Because of the properties of a coefficient of ergodicity,

τ_1(A^(k)_(s)) ≤ τ_1(A^(k)_(s+i_k h)) Π_{j=1}^{i_k} τ_1(A^(s+hj−1)_(s+h(j−1))) ≤ Π_{j=1}^{i_k} τ_1(A^(s+hj−1)_(s+h(j−1))).

Then we can write:

log τ_1(A^(k)_(s)) ≤ Σ_{j=1}^{i_k} log τ_1(A^(s+hj−1)_(s+h(j−1))).

We now consider the Markov chain Φ that "generates" the sequence of matrices A(k) underlying the mobility process. Because of Lemma 1, Ψ_t = (Φ_t, Φ_{t+1}, ..., Φ_{t+h−1}) is an irreducible, aperiodic and stationary Markov chain, and then all the powers of the shift operator θ, and in particular θ^h, are ergodic. We observe that log τ_1(A^(j+h−1)_(j)) is a function of (Φ_j, Φ_{j+1}, ..., Φ_{j+h−1}) and then of Ψ_j. We call such a function f, i.e., f(Ψ_j) ≝ log τ_1(A^(j+h−1)_(j)). Note that f(Ψ_t) ≤ 0 and, from equation (6), f(Ψ_t) < log(η) < 0 infinitely often almost surely. We can then apply Birkhoff's ergodic theorem to the random sequence {f(Ψ_s), f(Ψ_{s+h}), f(Ψ_{s+2h}), ...}, and it follows that

lim_{i→∞} (1/i) Σ_{j=1}^{i} log τ_1(A^(s+hj−1)_(s+h(j−1))) = lim_{i→∞} (1/i) Σ_{j=1}^{i} f(Ψ_{s+hj}) = E[f(Ψ_t)] < 0   a.s.,

therefore

lim sup_{h→∞} (1/h) log τ_1(A^(s+h)_(s)) ≤ E[f(Ψ_t)] < 0   a.s.

Consider E[f(Ψ_t)] < ζ < 0; then for almost all the sequences there exists h_0 such that, for all h ≥ h_0,

(1/h) log τ_1(A^(s+h)_(s)) ≤ ζ,   i.e.,   τ_1(A^(s+h)_(s)) ≤ e^{ζh}.

If we define β = exp(ζ) < 1, C = β^{−h_0} and recall that τ_1(A^(k)_(s)) ≤ 1, we obtain

τ_1(A^(k)_(s)) ≤ C β^{k−s}   for k ≥ s.   (7)

In the above equation, the value of the constant h_0 depends on the specific random sequence and also on s (while the same value ζ can be selected for all the sequences and independently from s). We then need to use a corollary of the ergodic theorem about "nearly uniform" convergence, stated as Proposition (1.5) in [17]: if f(·) is square integrable, then for almost all the sequences we can select h_0 independently from s. This is clearly the case for our function f(Ψ_t); therefore we can conclude that C in (7) only depends on the considered sequence.

So far we have established a geometric convergence result for the ergodicity coefficient τ_1. The last step of our proof requires us to prove a geometric bound for the distance between A^(k)_(s) and its almost sure limit J. In particular, we prove that

| [A^(k)_(s)]_uv − 1/M | ≤ 2 τ_1(A^(k)_(s)),   (8)

for each u and v. From (8) a geometric bound follows also for ||A^(k)_(s) − J||_max. The derivation of (8) presents no particular difficulty and mainly follows the proof of Theorem 4.17 in [16], hence we have moved it to Appendix D.
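As a numerical illustration (ours, not part of the proof), the following Python sketch computes the coefficient of ergodicity τ_1 of the backward products generated by the toy example of Sec. 2, so that the geometric decay predicted by Proposition 3 can be observed empirically; all names are ours.

```python
import numpy as np

def tau1(A):
    """Coefficient of ergodicity: (1/2) * max_{i,j} sum_s |A[i,s] - A[j,s]|."""
    M = A.shape[0]
    return 0.5 * max(np.abs(A[i] - A[j]).sum() for i in range(M) for j in range(M))

# The three doubly stochastic matrices of the toy example, eq. (5).
A_list = [np.array([[.5, .5, 0], [.5, .5, 0], [0, 0, 1]]),
          np.array([[.5, 0, .5], [0, 1, 0], [.5, 0, .5]]),
          np.array([[1, 0, 0], [0, .5, .5], [0, .5, .5]])]
probs = [2/3, 1/6, 1/6]

rng = np.random.default_rng(0)
P = np.eye(3)                      # backward product A(k)A(k-1)...A(1)
for k in range(1, 31):
    A = A_list[rng.choice(3, p=probs)]
    P = A @ P                      # new matrix multiplies on the left
    print(k, tau1(P))              # tau1 decays (roughly) geometrically in k
```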
In [4] a different result is proven, i.e., that there exist Ĉ and β̂ such that

E[ ||A^(k)_(s) − J||_max ] ≤ Ĉ β̂^{k−s}.

Then a series of inequalities for the expected values of ||y(k) − x_i(k)||_2 are obtained for all i. Using Fatou's Lemma, along with the non-negativity of distances, it is possible to derive inequalities that hold with probability 1. Using Proposition 3, instead, it is possible to obtain the same inequalities directly, without the need to consider the expectation.

5 Asynchronous Updates
In this section, we study how the presented framework needs to be extended in order to support the case when nodes update their status asynchronously. We consider the sequence {t_k}_{k≥1} of time instants at which one or more nodes perform an update of their estimates. Again, we denote by x_i(k) the estimate of node i at time t_k (immediately before the update) and represent all the estimates through the matrix X(k). The evolution of the estimates can still be expressed in a matrix form similar to (2):

X(k+1) = A(k)X(k) − Γ(k)D(k),   (9)

where Γ(k) is a diagonal matrix and the element [Γ(k)]_ii is simply the step-size used by node i at the k-th update. We denote this step-size by γ_i(k). If j is not among the nodes which perform the update at time t_k, then simply a_jj(k) = 1, a_jh(k) = 0 for h ≠ j, and γ_j(k) = 0.

In what follows we first consider the case of decreasing step-sizes, similarly to condition 1 described in Sec. 2. That is, we will consider that for each i, the sequence {γ_i(k)}_{k≥1} satisfies Σ_{k=1}^{∞} γ_i(k) = ∞ and Σ_{k=1}^{∞} γ_i(k)² < ∞.

We can go over the rationale in [4] and prove similar results for the new system description. In particular, our proof of the exponential convergence rate of A^(k)_(s) clearly holds also in this case. Bounds for the distance between x_i(k) and y(k), with y(k)^T = (1/M) 1^T X(k), hold with minimal changes, so that we can prove the analogue of Proposition 2 in [4]:
Proposition 4 (Convergence of Agent Estimates). Under Assumption 1, the estimate of each node converges almost surely to the vector y(k), i.e.,

lim_{k→+∞} ||y(k) − x_i(k)||_2 = 0   a.s., for all i.

Proof. See Appendix E.
The following step is to use bounds for the distance between y(k) and x* (a point of minimum of f) to show that lim_{k→+∞} y(k) = x*.

In particular, the following inequality is derived in [4] (for the synchronous case they are considering):

Σ_{s=1}^{k} γ(s)[f(y(s)) − f(x*)] ≤ (M/2)||y(1) − x*||²_2 + 2L Σ_{j=1}^{M} Σ_{s=1}^{k} γ(s)||y(s) − x_j(s)||_2 + (L²/2) Σ_{s=1}^{k} γ²(s).

The first term on the right hand side of the above inequality is a constant, and the last term is summable because of the assumption on the step-sizes. In [4] it is proven that Σ_{s=1}^{∞} γ(s)||y(s) − x_j(s)||_2 < ∞ almost surely. Thus they show that

0 ≤ Σ_{s=1}^{∞} γ(s)[f(y(s)) − f(x*)] < ∞   a.s.;   (10)

from (10) and the fact that Σ_{s=1}^{∞} γ(s) = ∞, it is possible to conclude that lim inf_{k→∞} f(y(k)) = f(x*) a.s., and lim_{k→∞} x_i(k) = x* a.s.
In Appendix F, a similar derivation is carried out, leading to the following generalization of (10):

Σ_{s=1}^{∞} Σ_{i=1}^{M} γ_i(s)[f_i(y(s)) − f_i(x*)] < ∞   a.s.   (11)

Unfortunately, the different values of γ_i(s) do not allow us to formulate the inequality above in terms of the global function f as in (10). We do not currently have a formal result stating under which conditions the asynchronous system converges to the optimal solution, but (11) suggests that all the weights γ_i(k) should have on average the same value. We then propose the following conjecture, which we support later with some examples.

Conjecture 1. When updates are asynchronous, convergence results for sub-gradient methods hold if E[γ_i(k)] = E[γ_j(k)] for each i and j.
Let us see how we can guarantee this condition in different cases. We consider that updates occur after every meeting, following Rule 2 in Sec. 3. Moreover, consider that γ_i(k) = 1/n_i(k), where n_i(k) is the total number of updates node i has performed up to the time instant t_k. If the meeting process follows a Poisson process with total rate λ and at each instant the probability that node i meets another node is p_i, we expect that by time k node i has had p_i k meetings (and an equal number of updates). Then the expected value of its step-size is E[γ_i(k)] = E[1/n_i(k)] ≈ p_i/(p_i k) = 1/k. In conclusion, if step-sizes follow the rule γ_i(k) = 1/n_i(k), we expect the asynchronous sub-gradient mechanism to converge to the optimal solution. Fig. 3 (top figure) shows that this is true for our toy example. The simulations for the optimization problem considered in Sec. 6 confirm such convergence.

Let us now revisit the example in Sec. 3 showing that the estimates were not converging to a point of minimum (Fig. 2, bottom figure). There the step-sizes were constant, i.e., γ_i(k) = γ. Now, reasoning as above, we can conclude that E[γ_i(k)] = p_i γ. Hence the expected values are not equal as long as node meeting rates (and then update rates) are not equal: this was the case in our example, where p_1 = 5/6, p_2 = 5/6 and p_3 = 1/3. Intuitively, we expect convergence to be biased towards values closer to the optimum of the local functions of those nodes that perform the updates more often. Equation (11) suggests that what the distributed mechanism was really doing is to minimize the function Σ_i p_i f_i = (3/2)x² − (5/6)x rather than f = Σ_i f_i = 3x² − x. This is indeed the case, since the estimates converge to 5/18 (dot-dashed line in all the previous figures). If we now want to correct the bias, it is sufficient to let each node select its step-size inversely proportional to its meeting rate. Fig. 3 (bottom figure) shows that this correction also leads the estimates to converge to the correct result. A minimal sketch of both step-size policies follows.
Figure 3: Toy example, convergence of the three estimates in the case of asynchronous updates (x-axis: number of pair meetings; y-axis: node state x). Results have been averaged over 100 simulation runs. Top graph: decreasing step-size γ_i(k) = 1/n_i(k). Bottom graph: weighted fixed step-size γ_i(k) = p_i^{−1} · 25 · 10^{−4}.

6 Application in DTNs: a Case Study
In this section we apply the distributed sub-gradient method with our enhancements to a DTN scenario inspired by the work in [7]. As explained in Sec. 5, our enhancements consist of: 1) allowing nodes to update their estimates asynchronously, i.e., whenever any two of them meet, and 2) applying the decreasing step-size rule to avoid possible bias effects in the convergence towards the global optimum.

All the nodes in the network are interested in the same dynamic information content and can share it whenever they meet. The information update is performed by a Service Provider (SP) that injects fresh information in the network according to a Poisson process of parameter µ updates/sec. At a given instant t̄ we call t_i(t̄) the time at which the SP generated the most recent content version available at node i; then Y_i(t̄) = t̄ − t_i(t̄) is the age of such a version. An information content has a non-increasing value in time. For example, we give value u_i(Y_i(t̄)) to the information stored in node i, where u_i(·) is a non-increasing function. The goal of the SP is to optimize

f(x) = Σ_{i=1}^{M} f_i(x) = Σ_{i=1}^{M} E_x[u_i(Y_i)],   (12)

where x ∈ R^M is the rate allocation vector, such that Σ_{i=1}^{M} x_i ≤ µ and x_i ≥ 0 for all i. Note that the age Y_i(t̄) is modeled as a random variable Y_i depending on x. In [7], equation (12) is proved to be concave and therefore the optimal x can be obtained by the SP using standard optimization techniques (see e.g., [18]) such as the projected gradient algorithm⁴: namely, iteratively computing x(k+1) = Π(x(k) + γ_k ∇f(x(k))), where {γ_k}_{k≥0} is a positive sequence of parameters such that Σ_k γ_k = ∞ and lim_{k→∞} γ_k = 0, and Π is the projection onto the feasible set for x. In general, a closed formula for f(x) is not known; thus, the gradient needs to be estimated as explained in [7]. Here, our purpose is to focus on the distributed sub-gradient method, so we consider the specific case where updates can travel at most two hops, thus avoiding the gradient estimation issue. In detail, at a given instant t̄, let us call t^SP_j(t̄) the time at which the SP directly injected fresh content to node j; then we define the following protocol rule.

⁴ For a discussion about the projected gradient method implemented in a distributed fashion see [19].

Definition 1 (Content Sharing). When a node j meets a node i at time t̄, j will copy to user i the last content downloaded directly from the SP if this content is more recent than the content stored in i, i.e., if t_i(t̄) < t^SP_j(t̄).

In this case, assuming also that a) u(Y_i) = χ{Y_i ≤ τ} for all nodes i, where τ is a given threshold after which the information is worthless (e.g., the information consists of news about events that expire after some time), and b) the meeting process among node pairs is Poisson distributed, we can compute the local utility function of each node as

f_i(x) = 1 − [ Π_{j∈N_i} (x_j e^{−λ_ij τ} − λ_ij e^{−x_j τ}) / (x_j − λ_ij) ] e^{−x_i τ},   (13)

where λ_ij is the meeting rate between i and j and N_i ≝ {j: λ_ij > 0}. The global utility function in (12) is then simply obtained by summing (13) over i = 1, 2, ..., M. The computations that derive (13) can be found in Appendix G.

The local gradient needed in (1) can be computed directly from (13), where nodes only need to estimate simple statistics on their own meeting rates. Clearly, (12) can also be optimized in a centralized fashion, collecting, for example at the SP itself, information about the statistics of the overall network meeting process [7]. However, in a DTN scenario the SP may be able to communicate with a group of connected nodes only for short periods of time, which we would like to exploit for transmitting the actual content users care about. Moreover, issues related to privacy easily apply to this scenario: for example, a node may prefer not to disclose information about its meetings to the SP and, in some cases, it would be equally desirable to keep the information about the utility function u_i(·) private (e.g., in military applications). A distributed approach is therefore of actual interest not only for DTNs, but also for scenarios that go beyond them.

To optimize f(x) in a distributed fashion we can use the framework presented in Sec. 2. Each node i can compute a local estimate of the optimal allocation x, i.e., x_i, through iterative updates. In detail, when two nodes i and j meet they: i) update x_i and x_j as in (1) and, ii) project the results so obtained onto the feasible set Σ_{l=1}^{M} x_l ≤ µ and x_l ≥ 0 for all l ∈ {1, ..., M}. In our implementation of sub-gradient optimization all the x_i eventually converge to the optimum x* of (12). Hence, the SP can retrieve the optimal transmission rates using the following push policy. During the execution of the algorithm each node i maintains its own estimate x_i. The SP collects x_i from every node and obtains the rate allocation vector as x = (Σ_{i=1}^{M} x_i)/M. A minimal sketch of the pairwise projected update is given below.
M = 10
nodes as follows. CallingR
1
andR
2
two distin t regions ofthe spa e, nodes an be pla ed either inR
1
orinR
2
. Onlynodes thatare within thesame region an ommuni atewithea hother. Weletnodesfreeto hangeregionofpla ement a ordingto aPoissonpro essofoverallrateλ
d
= 0.1
,thusnetwork'sfull on-ne tivityisguaranteed. Inaddition,a ordingtoaPoissonpro essofparameterλ
m
= 1
(notethatλ
m
> λ
d
),wegeneratemeetingeventsamongpairof nodes belonging to the same region. Ea h node is sele ted for a meeting a ording to a weightwhi h is proportionally inverse to itsindex, i.e., nodei
is sele ted with weightw
i
= i
−3
. Note that wegenerate a meeting pro ess that is both stationaryandergodi ,andalongthispro essnodeshavediverse onta trates. In parti ular node
1
has the highest onta t rate, whilst node10
the lowest. To sum up, letting nodesi
andj
update their states at their meeting times a ordingtotheabovepro ess,weobtaina orrespondingsequen e{A(k)}
k≥1
that an beviewedasgeneratedfromanergodi andstationaryMarkov hain. Also, given that nodes have dierent onta t ratesand asyn hronous updates areperformed,weknowthatadire tappli ationofthedistributedsub-gradient algorithmmayleadtosub-optimal results,see thetoyexampleofSe .3 .Figures4 5 showsimulationresultsfortheabovesettingofparametersand
τ = 20
.Fig.4showstheoptimalallo ationrateforea huser. Whenthebandwidth thattheSP anusetosendupdatesisverylow(i.e.,small
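For reference, here is a minimal Python sketch (ours) of the two-region meeting-event generator just described; it is only meant to illustrate one way to produce a stationary, ergodic meeting process with memory and heterogeneous contact rates, and several details (which node changes region, how the region of a meeting is chosen) are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, lam_d, lam_m = 10, 0.1, 1.0
w = np.array([(i + 1) ** -3 for i in range(M)])   # selection weight of node i (1-indexed in the text)
region = rng.integers(0, 2, size=M)               # each node starts in region R1 (0) or R2 (1)

t, events = 0.0, []
while len(events) < 1000:
    t += rng.exponential(1.0 / (lam_d + lam_m))   # next event: region change or meeting attempt
    if rng.random() < lam_d / (lam_d + lam_m):
        region[rng.integers(M)] ^= 1              # a random node switches region
        continue
    r = rng.integers(0, 2)                        # meeting attempt inside one region
    members = np.flatnonzero(region == r)
    if len(members) < 2:
        continue
    p = w[members] / w[members].sum()
    i, j = rng.choice(members, size=2, replace=False, p=p)
    events.append((t, i, j))                      # nodes i and j meet (and would update here)
```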
µ
),thebestsolution is thattheSP uniquelysendsupdatesto thenodethat hasthehigher onta t rate,i.e.,node1
;forlargevalues ofµ
,instead,theSP anevenlysendupdates to all the nodes in the network. Interestingly, as already observed in [7 ℄, for some values ofµ
(inour aseµ
around10
0.7
update/se )theoptimal hoi e is fortheSP toallo atemore bandwidth(i.e.,alarger fra tionof
µ
)to thenode with the lowest onta trate, namely, node10
. For these values ofµ
, in fa t, thosenodeswithalarge onta tratesu has node1
areableto maintainhigh values for their utility fun tions just by olle ting information from the large number ofnodestheymeet.InFig.5weshowthemeantraje torytowardstheoptimalfortwoelements in
x
= (
P
M
i=1
x
i
)/M
,wheretheve torsx
i
havebeenobtainedalongasequen e of
5 · 10
4
meetings onsidering
µ = 10
−1.1
update/se . We note that the es-timates providedthrough thedistributed sub-gradientmethod onvergetothe theoreti aloptimalallo ationvaluesin Fig.4 . Con ernsaboutthe onvergen e rate of su h estimates are out of the s ope of the present report and will be addressedin thefutureresear h
6 . 6
Note,however, thatawide literatureaddressingthisissue alreadyexists. E.g.,forthe problem of designingsuitable sequen es
{A(k)}
to speed up the onvergen e of onsensus see[20 ℄.−1.6
0
−1.1
−0.6
−0.1
0.4
0.9
1.4
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
log bandwidth [log
10
µ]
optimal rate allocation [x /
µ
]
node 1
node 2
node 3
node 4
node 5
node 6
node 7
node 8
node 9
node 10
Figure4: Optimalbandwidthallo ationfora networkof
10
nodes.Finally, in Fig. 6 we draw with a solid-line the maximum of
f (x)
orre-spondingto theoptimalrateallo ations inFig.4,whi hwasobtained usinga entralized solver [18 ℄. Note that with illimited bandwidth, the maximum off (x)
isequalto10
,i.e.,whenea hnodei
hasutilityu
i
= 1
. Foreightdierent valuesoftheavailablebandwidthµ
,wealsoplotwithsquaredpointstheutility fun tionvalues orrespondingtotheoptimalrateallo ationa hievedagainwithx
= (
P
M
i=1
x
i
)/M
using sub-gradient optimizationtogether withour enhan e-ments. With rossesweshowtheperforman eofthesub-gradientoptimization with a xed step size [3 ℄, whi h negle ts the asyn hronous update issue. As expe ted,theresultsofthelatteralgorithmaresub-optimal. Mostimportantly, thesolutions a hievedwithourapproa harevery losetothea tualoptimum forallvalues of
µ
. This onrmsthevalidityofthedistributedframeworkthat wepresented.0
Figure 5: Example of convergence of the estimates for two nodes when the sub-gradient method is used, with asynchronous updates and decreasing step-size (x-axis: number of pair meetings [10^4]; y-axis: rate allocation x/µ; curves: optimal and estimated allocation for nodes 1 and 10).
7 Conclusions

In this report we considered the recent framework of distributed sub-gradient optimization proposed in [3], and later extended in [4] for application to random scenarios. We pointed out that existing convergence results for this framework can be applied to DTNs only in the case of synchronous node operation and in the presence of simple memoryless mobility models. Therefore, we addressed both these issues: first, we proved convergence to optimality of the sub-gradient optimization technique under a more general class of mobility processes and, second, we proposed some modifications to the original sub-gradient algorithm so as to avoid bias problems (i.e., convergence towards sub-optimal solutions) when nodes operate asynchronously. Finally, as a case study, we applied the presented framework to the optimization of the dissemination of dynamic content in a DTN. All the provided results confirm that the distributed sub-gradient method is an effective and very promising tool for optimization in distributed contexts.
Figure 6: Performance comparison: centralized solver vs distributed method (x-axis: log bandwidth log_10 µ; y-axis: global utility function f(x); curves: optimum, distributed estimation with decreasing step-size, distributed estimation with fixed step-size).

8 Acknowledgments
The authors kindly thank Dr. Michele Rossi for his precious comments and suggestions.
A Derivation of Equation (3)

In this Appendix we show the mathematical steps that lead from equation (2) to equation (3). From (2) we have that

X(k+2) = A(k+1)X(k+1) − Γ(k+1)D(k+1)
       = A(k+1)[A(k)X(k) − Γ(k)D(k)] − Γ(k+1)D(k+1)
       = A^(k+1)_(k) X(k) − A^(k+1)_(k+1) Γ(k)D(k) − Γ(k+1)D(k+1),

where Γ(k) is a diagonal matrix with [Γ(k)]_ii = γ(k) for all i. Iteratively, replacing k+2 with k+s+1, k+1 with k+s and k with k+s−1, respectively, leads to

X(k+s+1) = A^(k+s)_(k+s−1) X(k+s−1) − A^(k+s)_(k+s) Γ(k+s−1)D(k+s−1) − Γ(k+s)D(k+s)
         = A^(k+s)_(k+s−1) [A(k+s−2)X(k+s−2) − Γ(k+s−2)D(k+s−2)] − A^(k+s)_(k+s) Γ(k+s−1)D(k+s−1) − Γ(k+s)D(k+s)
         = A^(k+s)_(k+s−2) X(k+s−2) − A^(k+s)_(k+s−1) Γ(k+s−2)D(k+s−2) − A^(k+s)_(k+s) Γ(k+s−1)D(k+s−1) − Γ(k+s)D(k+s)
         = A^(k+s)_(k+s−2) X(k+s−2) − Σ_{l=0}^{1} A^(k+s)_(k+s−l) Γ(k+s−1−l)D(k+s−1−l) − Γ(k+s)D(k+s)
         = ...
         = A^(k+s)_(1) X(1) − Σ_{l=0}^{k+s−2} A^(k+s)_(k+s−l) Γ(k+s−1−l)D(k+s−1−l) − Γ(k+s)D(k+s),

and finally, replacing k+s with k, we have

X(k+1) = A^(k)_(1) X(1) − Σ_{l=0}^{k−2} A^(k)_(k−l) Γ(k−1−l)D(k−1−l) − Γ(k)D(k)   (14)

or, replacing k−l with s in equation (14),

X(k+1) = A^(k)_(1) X(1) − Σ_{s=2}^{k} A^(k)_(s) Γ(s−1)D(s−1) − Γ(k)D(k),

which coincides with (3) when all nodes use the same step-size, i.e., [Γ(k)]_ii = γ(k) for all i.
B Stationarity and Ergodicity: Concepts

Let (Ω, F, P) be a probability space. Let also the function T: Ω → Ω be a measurable transformation of Ω into itself.

Definition 2 (Measure-Preserving). A measurable transformation T: Ω → Ω is measure-preserving if, for every B in F, P(T B) = P(B).

Definition 3 (Stationarity). A random sequence ω ∈ Ω is stationary if the shift operator is measure-preserving.

Definition 4 (Invariant set). If T is a measure-preserving transformation, B ∈ F is an invariant set if T B = B, or equivalently P[(B \ T B) ∪ (T B \ B)] = 0.

Definition 5 (Ergodicity). A measure-preserving transformation T is ergodic if, given any invariant set B ∈ F, it holds that P(B) = 0 or P(B) = 1.

Definition 6 (Mixing). A measure-preserving transformation T is mixing if, for all B and C in F, lim_{n→∞} P(B ∩ T^n C) = P(B)P(C).

Note that when a random sequence is said to be ergodic tout court, it means that the shift operator is ergodic. Similarly, when a random sequence is said to be mixing (or mixing in the ergodic-theoretic sense), it means that the shift operator is mixing.
C Proof of Lemma 1

In this Appendix we prove Lemma 1, reported in the following for the reader's convenience.

Lemma (Windowing a Markov chain). Let Φ = {Φ_n, n ∈ N} be an irreducible, aperiodic and stationary Markov chain. Consider the stochastic process Ψ = {Ψ_n, n ∈ N}, where Ψ_n = (Φ_n, Φ_{n+1}, ..., Φ_{n+h−1}) with h a positive integer. Then Ψ is also an irreducible, aperiodic and stationary Markov chain.

Proof. First of all, it is evident that Ψ is also a Markov chain, whose states are possible h-tuples of states of Φ, e.g., (s_1, s_2, ..., s_h). Its transition probabilities can be calculated starting from those of Φ. Stationarity of Ψ easily follows from the stationarity of Φ.

Ψ is also irreducible because Φ is irreducible. In fact, given two states s′ = (s_1, s_2, ..., s_h) and t′ = (t_1, t_2, ..., t_h), by the irreducibility of Φ there exists n_0 such that Φ can move from s_h to t_1 in n_0 steps; hence after n_0 steps the chain Ψ moves from s′ to some state u′ = (u_1, u_2, ..., u_{h−1}, t_1). Now, t′ is a state of Ψ and therefore it is also a valid sequence of state transitions for Φ; consequently, in h−1 time steps Φ can move from t_1 to t_h going through t_2, ..., t_{h−1}, and Ψ can move from u′ to t′. In conclusion, in n_0 + h − 1 steps Ψ can move from s′ to t′.

Aperiodicity requires a more detailed discussion. Given a possible state s′ = (s_1, s_2, ..., s_h), we want to prove that the greatest common divisor of the possible numbers of time steps after which the chain Ψ can return to s′ is equal to 1. Note that even if Φ had the property that it is possible to move directly from each state to itself, for a state s′ with s_i ≠ s_1 for some i = 2, ..., h, at least h steps are required to return to that state. Consider the minimum number k_0 of time steps after which the chain Φ can move from s_h to s_1 (again, k_0 exists because Φ is irreducible). Consider then the increasing sequence of all the possible numbers of time steps k_1, k_2, ... after which it is possible to return to s_1. Observe that 2k_i also belongs to this sequence. It is clear that Ψ can return to s′ after