Revisiting Pitfalls of DTN Datasets
Statistical Analysis
Gwilherm Baudic, Tanguy Pérennou and Emmanuel Lochin
firstname.lastname@isae.fr
DMIA, ISAE, University of Toulouse, France
Contents
1 Introduction
2 Datasets and assumptions
3 Impact of assumptions on dataset analyses
4 Checklist proposal
Introduction
Datasets are key in DTN performance evaluation, but...
Issues
Data collection is hard to set up
Traces do not capture limitations on node buffers and transfer bandwidth
They may miss some contact opportunities
Datasets studied
Characteristics
                      Rollernet   MIT       Infocom 2005
Technology            Bluetooth (all three datasets)
Duration (days)       0.125       284       3
Granularity (s)       15          300       120
Internal nodes        62          89        41
Internal contacts     60,146      114,046   22,459
Assumptions
In the following, we focus on:
Choice of nodes
Symmetry of the pairs
Minimum number of contacts
Treatment of 0-second contacts
Dataset time span
Inter-contact definition
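Several of these assumptions act on the step that turns a contact trace into inter-contact samples. The sketch below is only an illustration: the record layout `(node_a, node_b, start, end)`, the function name `inter_contact_times` and its parameters are hypothetical, and it assumes one possible inter-contact definition (the gap between the end of one contact and the start of the next contact of the same pair). Overlapping contacts of a pair are simply skipped.

```python
from collections import defaultdict


def inter_contact_times(contacts, symmetric=False, min_contacts=0):
    """Return {pair: [inter-contact times in seconds]}.

    contacts: iterable of (node_a, node_b, start, end), times in seconds.
    Assumed definition: gap between the end of one contact and the start
    of the next contact of the same pair.
    """
    per_pair = defaultdict(list)
    for a, b, start, end in contacts:
        # Symmetric pairs merge (a, b) with (b, a); asymmetric pairs keep them apart.
        key = tuple(sorted((a, b))) if symmetric else (a, b)
        per_pair[key].append((start, end))

    result = {}
    for key, intervals in per_pair.items():
        if len(intervals) < min_contacts:
            continue  # pair discarded by the minimum-number-of-contacts rule
        intervals.sort()
        result[key] = [
            nxt_start - prev_end
            for (_, prev_end), (nxt_start, _) in zip(intervals, intervals[1:])
            if nxt_start > prev_end  # skip overlapping or back-to-back contacts
        ]
    return result


trace = [(1, 2, 0, 10), (2, 1, 30, 40), (1, 2, 100, 100)]
asym = inter_contact_times(trace)
sym = inter_contact_times(trace, symmetric=True)
```

With this toy trace, the asymmetric view yields one 90-second sample for the pair (1, 2), while the symmetric view yields two samples of 20 and 60 seconds, which shows how the symmetry choice alone changes the sample set.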
Impact on dataset analyses (1/5)
Baseline assumptions
Choice of nodes: internal only.
Symmetry of the pairs: asymmetrical.
Minimum number of contacts: not enforced.
0-second contacts: extended to 1 second.
Power-law parameters α and xmin: xmin is the measurement granularity.
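As a rough illustration of the baseline, the sketch below extends 0-second values to 1 second and fits α with xmin fixed to the measurement granularity. The slides do not say which estimator was used, so the maximum-likelihood formula here is an assumption, as is the convention P[X>x] = (xmin/x)^α for the Pareto tail. The function names are hypothetical.

```python
import math


def extend_zero_contacts(durations, floor=1.0):
    """Replace 0-second values by `floor` seconds (baseline: 1 second)."""
    return [floor if d == 0 else d for d in durations]


def pareto_alpha_mle(samples, xmin):
    """Maximum-likelihood alpha for P[X > x] = (xmin / x) ** alpha, x >= xmin.

    Returns (alpha, n_tail); samples below xmin are ignored by the fit.
    """
    tail = [x for x in samples if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one sample strictly above xmin")
    return len(tail) / log_sum, len(tail)


# Toy data: one value below xmin = 300 s, three values at or above it.
samples = [100.0, 300.0, 300.0 * math.e, 300.0 * math.e ** 2]
alpha, n_tail = pareto_alpha_mle(samples, xmin=300.0)
```

Returning the tail size together with α makes it visible how many samples the fit actually used.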
Impact on dataset analyses (2/5)
0-second contacts
First 5000 seconds of Rollernet
[Figure: P[X>x] versus Time (s). Curves: Rollernet 0s->1s, Rollernet >0s, Rollernet >=15s]
MIT
[Figure: P[X>x] versus Time (s). Curves: MIT 284 days 0s->1s, Pareto alpha=1.534 xmin=300, MIT 284 days >0s]
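The plotted quantity P[X>x] is an empirical complementary CDF. A minimal sketch of how the three treatments of 0-second values lead to different curves is given below; the helper `empirical_ccdf` and the toy values are hypothetical and not taken from the datasets.

```python
from bisect import bisect_right


def empirical_ccdf(samples):
    """Return [(x, P[X > x])] for each distinct value x, in increasing order."""
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, (n - bisect_right(ordered, x)) / n) for x in sorted(set(ordered))]


raw = [0, 0, 15, 30]                         # toy durations in seconds
extended = [1 if d == 0 else d for d in raw]  # 0s -> 1s
positive = [d for d in raw if d > 0]          # keep only > 0s

ccdf_extended = empirical_ccdf(extended)
ccdf_positive = empirical_ccdf(positive)
```

Dropping the zeros halves the sample size here, so every remaining point carries twice the weight and the whole curve moves.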
Impact on dataset analyses (3/5)
Pareto lower bound estimation
Measurement granularity vs. estimation (Clauset et al.)
Infocom 2005 (granularity = 120s)
[Figure: P[X>x] versus Time (s). Curves: Data, Pareto alpha=1.886 xmin=120, Pareto alpha=2.676 xmin=1402]
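One common way to estimate the lower bound, in the spirit of Clauset et al., is to try each observed value as a candidate xmin, fit α on the tail, and keep the candidate that minimizes the Kolmogorov-Smirnov distance between the tail and the fitted model. The sketch below follows that idea under stated assumptions: the Pareto convention P[X>x] = (xmin/x)^α, a hypothetical `min_tail` guard, and hypothetical function names. It is not the code used to produce the values on this slide.

```python
import math


def fit_alpha(tail, xmin):
    """Maximum-likelihood alpha for P[X > x] = (xmin / x) ** alpha."""
    return len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the Pareto CDF."""
    tail = sorted(tail)
    n = len(tail)
    dist = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (xmin / x) ** alpha
        # Compare the model with the empirical CDF just before and at x.
        dist = max(dist, abs(model - i / n), abs(model - (i + 1) / n))
    return dist


def estimate_xmin(samples, min_tail=10):
    """Return (ks, xmin, alpha, n_tail) for the candidate with the smallest KS."""
    best = None
    for xmin in sorted(set(samples)):
        tail = [x for x in samples if x >= xmin]
        if len(tail) < min_tail or all(x == xmin for x in tail):
            continue  # too few points, or nothing strictly above xmin
        alpha = fit_alpha(tail, xmin)
        dist = ks_distance(tail, xmin, alpha)
        if best is None or dist < best[0]:
            best = (dist, xmin, alpha, len(tail))
    return best


# Toy input: deterministic Pareto quantiles (xmin = 100, alpha = 2).
samples = [100.0 * (1 - (i + 0.5) / 200) ** -0.5 for i in range(200)]
result = estimate_xmin(samples)
```

The returned tail size is worth reporting next to xmin, since everything below the estimated lower bound is left out of the fit.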
Impact on dataset analyses (4/5)
Trace length
Rollernet
[Figure: P[X>x] versus Time (s). Curves: Rollernet 5000s, Rollernet full trace]
Impact on dataset analyses (5/5)
External nodes
First 5000 seconds of Rollernet
[Figure: P[X>x] versus Time (s). Curves: Rollernet internal, Rollernet internal+external]
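Both the trace-length and the external-node comparisons amount to filtering the contact records before the analysis. A minimal sketch follows, assuming hypothetical `(node_a, node_b, start, end)` records, a window measured from the first contact start, and "internal only" meaning that both endpoints are internal nodes. Contacts that cross the end of the window are clipped, which is itself a choice worth reporting.

```python
def filter_contacts(contacts, internal_nodes=None, t_max=None):
    """Keep contacts inside the first t_max seconds and, optionally,
    only those whose two endpoints are internal nodes."""
    contacts = list(contacts)
    if not contacts:
        return []
    t0 = min(start for _, _, start, _ in contacts)  # trace start
    kept = []
    for a, b, start, end in contacts:
        if internal_nodes is not None and not (a in internal_nodes and b in internal_nodes):
            continue  # contact involving an external node
        if t_max is not None:
            cutoff = t0 + t_max
            if start >= cutoff:
                continue  # contact starts after the window
            end = min(end, cutoff)  # clip contacts crossing the window end
        kept.append((a, b, start, end))
    return kept


trace = [(1, 2, 100, 110), (1, 99, 200, 210), (2, 1, 5200, 5300), (1, 2, 5000, 5200)]
window = filter_contacts(trace, internal_nodes={1, 2}, t_max=5000)
```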
Checklist proposal
Did I discard some values or periods of the dataset?
Ex.: 0-second contacts, weekends...
Did the fitting method discard some data?
Ex.: Pareto lower bound xmin.
Did I change some values?
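The three questions can be answered with simple counts kept alongside the analysis. The sketch below is one possible way to do it; the function `audit`, its `zero_policy` parameter and the toy values are hypothetical.

```python
def audit(raw_durations, zero_policy="extend", xmin=None):
    """Count values changed, discarded, and ignored by a fit with lower bound xmin."""
    zeros = sum(1 for d in raw_durations if d == 0)
    if zero_policy == "extend":      # 0 s -> 1 s: values are changed
        kept = [1 if d == 0 else d for d in raw_durations]
        changed, discarded = zeros, 0
    elif zero_policy == "drop":      # 0 s removed: values are discarded
        kept = [d for d in raw_durations if d > 0]
        changed, discarded = 0, zeros
    else:
        raise ValueError("zero_policy must be 'extend' or 'drop'")
    # Samples below xmin are silently left out by a Pareto tail fit.
    ignored = sum(1 for d in kept if d < xmin) if xmin is not None else 0
    return {
        "total": len(raw_durations),
        "changed": changed,
        "discarded": discarded,
        "ignored_by_fit": ignored,
    }


raw = [0, 0, 15, 30, 300, 600]
report_extend = audit(raw, zero_policy="extend", xmin=300)
report_drop = audit(raw, zero_policy="drop", xmin=300)
```

In this toy case, a fit with xmin = 300 uses only two of the six values whichever zero policy is chosen.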
Conclusions
Contributions
Summary of pre-analysis assumptions from the literature
Study of their influence on statistical analyses
Strong influence of 0-second contacts and Pareto lower bound estimation
Weaker influence of trace length and external nodes
Checklist proposal
Future work
Research the other assumptions
Extend the work to pairwise metrics