Revisiting Pitfalls of DTN Datasets Statistical Analysis


Gwilherm Baudic, Tanguy Pérennou and Emmanuel Lochin

firstname.lastname@isae.fr

DMIA, ISAE, University of Toulouse, France


Introduction

Datasets are key in DTN performance evaluation, but...

Issues

Data collection is hard to set up

Traces do not capture limitations on node buffers and transfer bandwidth

They may miss some contact opportunities


Datasets studied

Characteristics      Rollernet   MIT       Infocom 2005
Technology           Bluetooth (all datasets)
Duration (days)      0.125       284       3
Granularity (s)      15          300       120
Internal nodes       62          89        41
Internal contacts    60,146      114,046   22,459


Assumptions

In the following, we focus on:

Choice of nodes

Symmetry of the pairs

Minimum number of contacts

Treatment of 0-second contacts

Dataset time span

Inter-contact definition


Impact on dataset analyses (1/5)

Baseline assumptions

Choice of nodes: internal only.

Symmetry of the pairs: asymmetrical.

Minimum number of contacts: not enforced.

0-second contacts: extended to 1 second.

Power-law parameters α and xmin: xmin is the measurement granularity.
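As an illustration of the baseline fit, here is a minimal sketch of how the empirical P[X>x] curve and a Pareto shape estimate with a fixed xmin could be computed. It assumes the Pareto convention P[X > x] = (xmin / x)^alpha, which the slides do not state explicitly; under the density-exponent convention used by Clauset et al., the exponent would be this value plus one. The function names are hypothetical.

```python
import math


def empirical_ccdf(samples):
    """Return (x, P[X > x]) pairs, one per distinct sample value, in increasing x."""
    data = sorted(samples)
    n = len(data)
    points = []
    for i, x in enumerate(data):
        # Emit a point only at the last occurrence of each distinct value;
        # n - i - 1 samples are then strictly greater than x.
        if i + 1 == n or data[i + 1] != x:
            points.append((x, (n - i - 1) / n))
    return points


def pareto_alpha_mle(samples, xmin):
    """Maximum-likelihood Pareto shape for samples >= xmin.

    Assumed convention (not confirmed by the slides): P[X > x] = (xmin / x) ** alpha.
    Samples below xmin are ignored by the fit.
    """
    tail = [x for x in samples if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one sample strictly above xmin")
    return len(tail) / log_sum


def pareto_ccdf(x, alpha, xmin):
    """Model P[X > x] under the same assumed convention."""
    return 1.0 if x < xmin else (xmin / x) ** alpha
```

With the baseline assumption, xmin would simply be set to the dataset granularity (for example 15 for Rollernet) before calling pareto_alpha_mle.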


Impact on dataset analyses (2/5)

0-second contacts

First 5000 seconds of Rollernet

[Figure: P[X>x] versus Time (s), log-log axes. Curves: Rollernet 0s->1s; Rollernet >0s; Rollernet >=15s.]


MIT (284 days)

[Figure: P[X>x] versus Time (s), logarithmic time axis. Curves: MIT 284 days 0s->1s; Pareto alpha=1.534, xmin=300; MIT 284 days >0s.]
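The three treatments compared in the 0-second plots (0s->1s, >0s, and >= granularity) can be sketched as follows. The contact record layout (node_a, node_b, start, end), with times in whole seconds, and the policy names are assumptions made for this illustration, not the format of any of the datasets.

```python
def contact_durations(contacts, policy, granularity=15):
    """Return contact durations in seconds under one 0-second treatment.

    contacts: iterable of (node_a, node_b, start, end) tuples (hypothetical layout).
    policy:
      "extend"      - 0-second contacts are extended to 1 second (0s->1s)
      "drop_zero"   - 0-second contacts are discarded (>0s)
      "granularity" - only contacts lasting at least `granularity` are kept
    """
    durations = [end - start for _, _, start, end in contacts]
    if policy == "extend":
        return [1 if d == 0 else d for d in durations]
    if policy == "drop_zero":
        return [d for d in durations if d > 0]
    if policy == "granularity":
        return [d for d in durations if d >= granularity]
    raise ValueError(f"unknown policy: {policy}")
```

Note that only the first policy preserves the number of contacts; the other two shrink the sample, which is one reason the resulting curves can differ.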


Impact on dataset analyses (3/5)

Pareto lower bound estimation

Measurement granularity vs. estimation (Clauset et al.)

Infocom 2005 (granularity = 120s)

[Figure: P[X>x] versus Time (s), log-log axes. Curves: Data; Pareto alpha=1.886, xmin=120; Pareto alpha=2.676, xmin=1402.]
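For readers unfamiliar with lower-bound estimation, the sketch below shows one simplified way to pick xmin from the data instead of fixing it to the granularity: try each distinct value as a candidate, fit the shape on the tail by maximum likelihood, and keep the candidate whose fitted tail is closest to the data in Kolmogorov-Smirnov distance. This is a continuous-data approximation in the spirit of Clauset et al., not the authors' code; the function names, the min_tail cutoff, and the convention P[X > x] = (xmin / x)^alpha are assumptions.

```python
import math


def fit_alpha(tail, xmin):
    """MLE of the Pareto shape, assuming P[X > x] = (xmin / x) ** alpha."""
    return len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(tail, alpha, xmin):
    """Largest gap between the fitted Pareto CDF and the empirical tail CDF."""
    tail = sorted(tail)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (xmin / x) ** alpha
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def estimate_xmin(samples, min_tail=10):
    """Return (xmin, alpha) minimising the KS distance over candidate lower bounds.

    min_tail is a hypothetical safeguard: candidates leaving fewer than
    min_tail samples in the tail are skipped.
    """
    data = sorted(samples)
    best = None
    for xmin in sorted(set(data)):
        tail = [x for x in data if x >= xmin]
        if len(tail) < min_tail or tail[-1] == xmin:
            continue  # too few points, or no spread to fit a shape on
        alpha = fit_alpha(tail, xmin)
        distance = ks_distance(tail, alpha, xmin)
        if best is None or distance < best[0]:
            best = (distance, xmin, alpha)
    if best is None:
        raise ValueError("not enough data to estimate xmin")
    return best[1], best[2]
```

Every sample below the selected xmin is excluded from the fit, so a larger xmin means the fitted law describes a smaller share of the dataset.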


Impact on dataset analyses (4/5)

Trace length

Rollernet

[Figure: P[X>x] versus Time (s), log-log axes. Curves: Rollernet 5000s; Rollernet full trace.]
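Restricting a trace to its first seconds, as in the 5000 s curve, might be sketched as below. The record layout (node_a, node_b, start, end) is hypothetical, and clipping contacts that straddle the cutoff is an assumption of this sketch; the slides do not say how such contacts were handled.

```python
def truncate_trace(contacts, horizon):
    """Keep contacts that start before `horizon` seconds.

    Contacts still in progress at the cutoff are clipped to end at `horizon`
    (an assumption of this sketch; dropping them is another option).
    """
    return [
        (a, b, start, min(end, horizon))
        for a, b, start, end in contacts
        if start < horizon
    ]
```

Whichever rule is chosen for straddling contacts, it changes some durations or discards some contacts, so it belongs in the list of reported preprocessing steps.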


Impact on dataset analyses (5/5)

External nodes

First 5000 seconds of Rollernet

[Figure: P[X>x] versus Time (s), log-log axes. Curves: Rollernet internal; Rollernet internal+external.]
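The internal versus internal+external comparison amounts to a filter on the node set. A minimal sketch, assuming the same hypothetical (node_a, node_b, start, end) record layout and a caller-supplied set of internal node identifiers:

```python
def internal_only(contacts, internal_nodes):
    """Keep only contacts whose two endpoints are both internal nodes."""
    internal = set(internal_nodes)
    return [c for c in contacts if c[0] in internal and c[1] in internal]
```

Passing the unfiltered list instead gives the internal+external variant.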


Checklist proposal

Did I discard some values or periods of the dataset?

Ex.: 0-second contacts, weekends...

Did the fitting method discard some data?

Ex.: Pareto lower bound xmin.

Did I change some values?


Conclusions

Contributions

Summary of pre-analysis assumptions from the literature

Study of their influence on statistical analyses

Strong influence of 0-second contacts and Pareto lower bound estimation

Weaker influence of trace length and external nodes

Checklist proposal

Future work

Research the other assumptions

Extend the work to pairwise metrics
