HAL Id: inria-00383079
https://hal.inria.fr/inria-00383079
Submitted on 12 May 2009
HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.
File Sizes
Eitan Altman, Julio Rojas-Mora, Tania Jimenez
To cite this version:
Eitan Altman, Julio Rojas-Mora, Tania Jimenez. Simulating Bandwidth Sharing with Pareto dis- tributed File Sizes. [Research Report] RR-6926, INRIA. 2009. �inria-00383079�
a p p o r t
d e r e c h e r c h e
0249-6399ISRNINRIA/RR--6926--FR+ENG
Thème COM
Simulating Bandwidth Sharing with Pareto distributed File Sizes
Julio Rojas-Mora — Tania Jiménez — Eitan Altman
N° 6926
May 2009
Centre de recherche INRIA Sophia Antipolis – Méditerranée
JulioRojas-Mora
∗
,Tania Jiménez, Eitan Altman
†
ThèmeCOMSystèmesommuniants
Équipe-ProjetMaestro
Rapportdereherhe n° 6926May200920pages
Abstrat: Thetraontheinternethasknowntobeheavytailed: thesizeof
letransfersthroughFTPorHTTPappliations,aswellasthosetransferredby
P2Pappliations hasbeenobservedtohaveaveryheavytail. Typiallymod-
eledasParetodistributed withparameterbetween1.05to1.5, thelesizehas
innitevariane. Thisisthesoureofmanydiultiesin simulatingdatatraf-
: onvergeneisveryslow,simulationshavetobeverylong,andthestandard
methodsforderivingondeneintervals,basedontheCLT,arenotappliable
here. Weillustratethese wellknownproblemsthroughthesimulationstudyof
a proessorsharingqueue, whih is oftenused to model session levelresoure
sharingin theinternet. Wetest bootstrapmethods to aelerate onvergene
and improvethepreisionofsimulations,and testadiret approah toobtain
ondene intervalbasedonthe histogramoftheempirial distributions. The
onlusiondrawnarethenompared tothoseobtainedwhensimulatingin ns2
datatransferusingTCP.
Key-words: M/G/1 queue. Proessor Sharing. Simulations. Condene
interval. Bootstrap.
∗
LIA, University of Avignon, 339, hemindes Meinajaries, Agropar BP 1228, 84911
AvignonCedex9,Frane. {julio.rojas,tania,jimenez}univ-avignon.fr
†
MaestrogroupINRIA,2004 RoutedesLuioles06902 SophiaAntipolisCedex,Frane.
eitan.altmansophia.inria.fr
Résumé: Lataille des hierstransférés surl'Internet parFTPouHTTP,
ainsi que eux transférés par les appliations P2P, a souvent été observée et
mesurée. La distribution observée a été à queue lourde et elle a souvent été
modélisée par une une distribution de Pareto ave un paramètre entre 1,05
et 1,5; ela signie que la taille des hiers a une variane innie. Cei est
la soure de nombreuses diultés dans la simulation du tra de données:
la onvergene est très lente, les simulations doivent être très longues, et les
méthodeshabituellesdealuld'intervallesdeonane,baséessurleThéorème
Limite Centrale, ne sont pas appliables. Nous illustrons es problèmes bien
onnus à traversla simulationd'une le "Proessor Sharing",qui est souvent
utiliséepourmodéliserlepartagedesressouresauniveausessiondansl'Internet.
Lesontributionprinipaledenotretravailsont(i)l'appliationdelaméthode
deBootstrapquinouspermetd'aélérerlaonvergeneetd'améliorerlapréision
des simulations, et (ii) l'étude de l'approhe pour obtenir des intervales de
onanebaséssurladistributionempirique. Puisnousomparonslesonlusions
à elles obtenues lors de la simulation en NS2 des transfers de hiers sur
l'Internet.
Mots-lés: FileM/G/1,ProessorSharing,Simulatiojs,IntervaldeConane,
Bootstrap
1 Introdution
The proessor sharing queue has been perhaps the most ommon model for
bandwidthsharingofsessions(havingthesameRTT)ofdatatransferoverthe
Internet [4℄, along with the Disriminatory Proessor sharing Sharing queue
[7℄ adapted to the ase of unequal round trip times. Several researh groups
[17, 14, 11, 13℄ haveexamined itsvalidity throughsimulations. Theexpeted
sojourntimeofaustomerin aproessorsharingqueueisknowntobeinsensi-
tivetothelesizedistribution(itdependsonthedistributiononlythroughthe
expetation). Its variane,however,isnotinsensitiveanymore,andinpartiu-
lar,itisinnitewhentheservietimeofalient(orequivalentlyinoursetting
-thesizeofalethatistransferred)hasaninniteseondmoment[3℄.
The size of a data transfer overthe Internet is known to be heavy tailed
[12℄. (This is thease both for FTPtransfers aswell asfor the "on"periods
of HTTP transfers). Among several andidates for modeling the distribution
ofthesetransfers,theParetodistributionwithparameterbetween1.05and1.5
has been the one that gave the best t with experiments for the last twenty
years,see[8,2,6,15℄.
Thisveryheavytail hasbeenausing various serious problemsfor simula-
tionsofdatatransfers:
Theauthorsof[11℄write: "sineasubstantialpartof thedistribution is
in the tail, if the simulation is not runfor very longthe average of the
sampledlesizeswouldbelessthanthenominalaverage,thusleadingto
aloweroeredloadandheneoverestimationofthethroughput."
Centrallimitbasedondeneintervalsarenotavailablesinetheseond
momentof the number of sessions orof the workload in thesystem are
innite.
Duetothelastpointsonemayhaveproblemsininterpretationofsimula-
tionresults. Indeed,variouspapersin whih sharingbandwidthismod-
eledbyproessorsharingreportdierenesbetweentheexpetedtheoret-
ialvalueandthesimulationvalue[13,14,11,17℄that varyfromaround
10%in [11℄ andgoupto afatorof10in somesituationsin [13℄. When
suh deviations our,it is importantto know whether theyanbedue
totheimpreisionsin thesimulationortorealphenomena.
Thewarm-uptimeisextremelylong,see[9℄.
Other approahes to simulating networks with Pareto dis-
tributed le size
Onewayto avoidheavytailsisto trunatethem. Thisis in spiritof thesug-
gestioninthepaper"DiultiesinsimulatingqueueswithParetoservie"[10℄
that says: "Sinefor any nite simulation runlength, there isalways amaxi-
mum value of the random variables generated, we, inatuality, are simulating
atrunatedPareto servie distribution. Ithas alsobeen arguedthat thereisal-
waysamaximumlesizeorlaim amountso,inreality,wearealways dealing
with trunated distributions." Butwhere should we trunate thedistribution?
TrunationatsomesizeM wouldbevalidifthedierenebetweenperformane
with trunation at anyother valueL that is greater than M hasa negligible
impatonperformane. Simulating withatrunated Pareto distribution may
notbesuientandseveralothertestsoftrunationatlargerthresholdvalues
maybeneeded.
Our ontributions
Our main ontribution is in introduing to the networking ontext statistial
methods that, to the best of our knowledge have notbeenused before in the
networking ontext. Inpartiular,
Weobtainanimpressiveimprovementinthepreisionofsimulationsusing
thebootstrapapproah;equivalently,thisallowsusto dereaseonsider-
ably the simulation run times or the number of simulations needed in
order toahieveagivenpreisionlevel. Weshall nallypresentdetailed
experimentationswithsomeofthem.
Condene Intervals Weusethequantilebasedapproahasanalter-
native to the entral limit based approah for obtaining the ondene
intervals. Theentrallimitapproahis notdiretly appliablewhenever
thevarianeisinnite,whihistheasewiththestationarysojourntime
of adata transfersession(having aPareto distribution withashapepa-
rameterKlowerthan2).
We then apply these ideasdiretly to the simulationof ompeting non-
persistent TCP transfers of les that have heavy tail distribution. We
omparethesimulationresultstothoseobtainedfortheproessorsharing
queue. Weidentify various reasons for the deviation of the behaviorof
TCPfromtheidealproessorsharingmodelandmanageto quantifythe
impatofsomeofthese.
Struture of the paper
We begin by presenting in the next setion a bakground on the bootstrap
methodandonthequantilemethodforobtainingondeneinterval. InSetion
3 we present our approah onerning the use of simulation to estimate the
average number of pakets in the proessor sharing queue. In Setion 4 we
study someissues onerning the simulation of the workload of the proessor
sharingqueue. Wethen presentin Setion 5simulationsof TCP onnetions
that share a ommon bottlenek link and ompare the preision that an be
obtained(with andwithoutthebootstrapapproah)tothepreisionobtained
in simulating the proessor sharing queue. We provide explanations for the
dierenesbetween theTCPsenario andits orresponding proessorsharing
model. Aonludingsetionsummariesthepaper. An appendix lariessome
basiquestionsonerningondeneintervals.
Algorithm1Bootstrapalgorithmforestimatingthemeanqueuesize
1. Makensimulationsofthe queuequeuesizeand letXi,∀i= 1, . . . , n,be
theestimationoftheparameterofinterestobtainedfromeahsimulation.
2. Forj = 1, . . . , kdo:
(a) Let X1,j∗ , . . . , Xn,j∗ be a resample, with replaement, taken from
X1, . . . , Xn.
(b) Letθˆj∗=n−1Pn i=1Xi,j∗ .
3. Calulateθˆ∗=k−1Pk
j=1θˆj∗,theMonteCarloapproximationoftheboot- strapestimationofθ.
4. Calulateondeneintervalsforθˆ∗ usingthequantilemethod.
2 Bakground
Bootstrap
BootstrapisamethodreatedbyEfron[20℄fornon-parametrialestimation. Let
θbeaparameterofaompletelyunspeieddistributionF,forwhihwehave
asampleofi.i.d. observationsX1, . . . , Xn,andletθˆbetheestimationmadeof
theparameter. Fromthesample, wewill makek resampleswithreplaement,
X1,j∗ , . . . , Xn,j∗ ∀j= 1, . . . , k,witheahelementhavingprobability1/nofbeing
seleted, and foreah resamplean estimatorθˆ∗j will bealulated. ByMonte
Carloapproximation,thedistributionofθˆisthenestimatedbythedistribution of θˆ∗. When k → ∞, the estimation of θˆ will be better and, in turn, the
real distribution of θ will also be better estimated. We note that the time
and memoryoverheadfor resamplingand performing thebootstrapalgorithm
are oftenmuhsmallerthan theones neededto reatemoresamples, andan
beperformedwithin averyshort amount oftime. Singh [21℄, and Bikel and
Freedman[19℄aregoodreferenestounderstandtheasymptotiharateristis
ofthebootstrap.
Remark. An alternativeway to aelerate the simulationsis the important
samplingormoregenerally,varianeredutiontehniquesseee.g. [22,Chap4℄
for ageneralintrodution. Theyare dierentthan thebootstrapapproahin
thattheyarebasedonsimulatinganothermodel(e.g. usealargerloadinorder
to obtainbetterestimateof arareeventof reahingalargequeuesize). Then
some knowledge of the system is needed in order to transform the simulated
results of the new model to that of the original one. The bootstrap method
that westudyis apost-simulationapproah: it onernsstatistial proessing
ofsimulatedtraes. Itanbeusedontopofvarianeredutiontehniqueswhen
theyareavailable.
Quantile-based ondene interval
Assumewewishtoobtaintheondeneintervaloftheestimationofsomepa-
rameterof asimulatedproessXt. Thequantileapproahtoderiveondene
intervalsisbasedonrunninganumberN ofi.i.d. simulations(eahsimulation orrespondsinourasetothequeuelengthproessortofuntions ofthispro-
ess). Wethenusethese toomputetheempirialdistributionof thefuntion
oftherandomvariable.
For(1−α)·100% ondene level, the (1−α)·100% ondene interval
by the quantile method is the interval between the (α/2)·100- th and the (1−α/2)·100-thpointsofthesortedsample.
3 Simulating the queue size of a proessor shar-
ing queue
ThroughaseriesofsimulationsperformedinJAVA(availablefromtheauthors
byrequest) we exhibit thepower ofthe bootstrapapproah: itsability to in-
reasethepreisionofthesimulationofInternettrathatisthrottledbysome
bottleneklinkandatthesametimedereasetherequireddurationofthesimu-
lation. Inthissetionwerestrittostudyoftheproessorsharingqueuewhih
hadoftenbeenproposedasamodelforTCPtransferssharingabottleneklink
intheInternet. A TCPsession isthenrepresentedbyoneustomerin thePS
queue.
Belowweused the PS queue with aPoisson arrivalproesswith a rateof
λ= 1ustomers perseond (A ustomer representsale when the proessor
sharingqueueis used to model le transfers). We onsider three servietime
distributions: Pareto with shape parameter 1.5, Pareto with shape 2.5, and
exponentially distributed. Wevary the average servietime σ soasto obtain
anaverageloadρ=λσ thattakesthevalues0.6and0.7.
Eahoneoftheguresbeloworrespondto thequeuesize oftheproessor
sharing averaging over 100 independent samples. When onsidering the PS
queueasamodel forbandwidthsharingintheInternet,thequeuesize should
beinterpretedasthenumberofongoingsessions.
Thedurationofeahsimulationis6·106seandthereisawarmuptimeof
300000se. Ineahoneofthesenariosdesribedin Figure1,wepresent:
1. thetheoretialsteadystateexpetedqueuesize
2. thevalueobtainedbythesimulations
3. theondeneintervalorrespondingtoa95%perentileofobtainedusing
thequantilemethod
4. the ondene intervalorrespondingto a 95%perentile obtainedafter
applyingthebootstrapmethod.
Wetook100000resamplesoutofour100originalsamplesforthebootstrap.
Figure1displaysthepreisionofthesimulationswithandwithouttheboot-
strap approahfor ρ = 0.6 (up) and ρ = 0.7 (down). The left subgures are
foranexponentialdistributed servietime,themiddleand rightonesarefora
Pareto distributed servietime withparameterK = 2.5 and K = 1.5,respe-
tively. Theaverageservietimeis thesamein theallthree sub-guresgures
orrespondingto thesameρ. (Itwashosensothat indeedρ=λ·σwillhave
thevalues0.6 and0.7,respetively.)
Weobservethefollowingpointsfrom thesimulations:
ThesimulationsshowwelltheinsensitivityofthePSregimetotheservie
time distribution. Indeed, in eah set of simulations having the same
loadρ, we see onvergene to the sameaverage ratefor the exponential ase,thePareto distributionwithparameterK= 2.5and forthePareto
distributionwithK= 1.5.
The95%ondene intervalwithoutbootstrapisaround10timeslarger
than that of the bootstrap (whih establishes learly the advantage of
usingit). Thisisseentoholdforanydurationofthesimulation.
Weseethattheondeneintervalarearound10timessmallerinthease
ofexponentialand Pareto with shapeparameterof 2.5 than in the ase
ofParetoshapeparameter1.5. Thisanbeexpetedsinethetailofthe
latterdistributionismuh heavier.
Durationofthesimulation:Forallthreedistributions,andforthedierent
loads, we see that the preision obtained with bootstrap after already
400000se is morethan three timesbetterthan that without bootstrap
after wesee that even after 6millionseonds. It thus seemsthat to get
thesamepreisionwithoutbootstrap,onewouldneedtousesimulations
muhlongerthan15timesasmuhaswithbootstrap.
4 On the simulation of the workload in a queue
with heavy tailed servie time
Thestatespaeneededtorepresenttheevolutionofthenumberofpaketsina
proessorsharingisquiteomplex: weneedtheresidualtimeofeahpaketthat
is in the system. Inontrast, the evolutionworkload in the systemis easyto
desribeasitanbewritten asaonedimensionalreursiveequation. Wethus
hose to illustrate some of the diulties in simulating the proessorsharing
queuebyashortdisussiononerningthesimulationoftheworkloadproess.
We onsider in this Setion the workload of an M/G/1 proessorsharing
queue withaPoisson arrivalproess withrate λand withi.i.d. servietimes.
Theworkloadisthetimeittakestoompletetransmissionofalltheonnetions
thatarepresentin thesystem. Assume throughoutthatρ <1sotheworkload
proessisergodi.
Wenote thattheworkloadproess isthesameastheoneofaFIFOqueue
withthesamearrivalsandservietimes.
In ase servie times do not have a nite seond moment, the expeted
workloadat steady stateis innite. (Indeed, the workloadis invariantunder
theservie disipline,and is thus equalto theoneof FIFOdisipline. Dueto
thePASTAproperty,theexpetedstationaryworkloadequalstothatatarrival
epohs whih equals the expeted waiting time. The latter is innite in the
M/G/1FIFOqueuesine sinethearrivalhasto wait morethan theresidual
servie time of the ustomer in servie, and the latter is proportional to the
seondmomentoftheservietime.)
Considernow the asewhere the servie time has anite seond moment
but innite third moment. In that ase the expeted stationary workload is
nitebutitsvarianeisinnite.
Assumethat wewishto estimatetheexpetedstationaryworkloadE[V]in
thesystem. Theworkloadsatisesthereursion
V0(v) =v, Vn+1(v) = (Vn(v) + Θn−τn)+, n≥0
whereVnistheworkloadasseenbythentharrival,Θn istheworkloadbrought
bythentharrival,andτn isthetimebetweenthentharrivalepohTn andthe
(n+ 1)tharrivalinstantTn+1. E[V]anthenbeestimatedby Vbn=
Pn i=1Vi
n
forn large(due to the PASTA property, estimation at arrivalinstantsindeed
onvergestothestationaryrandomvariable). Denethebiasfuntion:
Rn(v) =E[Sn(v)]−nE[V] where Sn(v) = Xn
i=1
Vi(v).
WenotethatRn(0)ismonotonedereasinginn. Ithasalimitasn→ ∞whih
wedenote by R. Rn(0) isknown asthebias; ifit werezero then Vbn =E[V].
OtherwizethetotalexpetedestimationerrorattimeTn isgivenbyRn(v)/n.
Toobtainanestimationerrorsmallerthanǫ,weneedtohoosensuhthat Rn/n < ǫ. It is advoated in [18℄ to hoose nsuh that R/n < ǫ. It ensures
that|Rn|/n < ǫ. Followingthisapproah,wedisover,unfortunately, thatthe simulationlength should betaken to be innitesine |R| turns to beinnite.
Wenextestablishthisonlusion.
LetVn∗ bethestationary workloadproess WeassumethatVn∗ andVn are
dened on ajoint probability spae: the arrivalproess and servietimes are
thesameinboth. Theonlydiereneisintheinitialstate.
Deneη(v)tobethersttimenthat Vn oupleswithVn∗. Letθ(v)bethe
rsttimethatithits0. Then
Rn(v) = Xn
i=0
E
1{η(v)> i}[Vi(v)−Vi∗]
Wemakethefollowingkeyobservation: Whenv > V0∗thenη(v) =θ(v). Indeed,
aslongasVn∗andVn arebothpositivethenthedierene|Vn−Vn∗|isonstant.
Itstart dereasingwhen thesmallerof thetwohitszeroand itvanisheswhen
theyarebothzero.
LetQ0(v) =v anddene forn >0: Qn(v) =
Xn
i=0
1{θ(v)> i}Vi(v)
NotethatRn(v)≥P(V∗ < v)Qn(v).