UFR SCIENCES
École Doctorale de Sciences et Technologies de l’Information et de la Communication
THESIS
submitted to obtain the degree of
Doctor of Science of the Université de Nice-Sophia Antipolis
Discipline: Computer Science
presented and defended by
Matti SIEKKINEN
TITLE
Root Cause Analysis of TCP Throughput:
Methodology, Techniques, and Applications
Thesis supervised by
Ernst BIERSACK
Publicly defended on 30 October 2006 before a jury composed of
Dr. Walid DABBOUS, president
Professor Dr. Georg CARLE, reviewer
Dr. Tijani CHAHED, reviewer
Professor Dr. Ernst W. BIERSACK, thesis supervisor
Dr. Guillaume URVOY-KELLER, thesis co-supervisor
First of all, I would like to express my deepest gratitude to my thesis advisor, Prof. Dr. Ernst W. Biersack. He was the one who suggested that I pursue a Ph.D. before I even imagined that I would be capable of doing so. Looking back now, the decision to accept this proposal is among the best I have made in my life. You really taught me a lot and your door was always open. Thank you, Ernst.
I am also grateful to my thesis co-advisor, Dr. Guillaume Urvoy-Keller. Guillaume contributed enormously to this thesis. I could always go to him when I had a problem to crack, was unsure of my own reasoning, or lacked faith in what I was doing. Thank you, Guillaume, for all the long and fruitful discussions. I am also grateful to Prof. Dr. Vera Goebel, who taught me, among other things, to better understand database and operating systems. Vera’s contribution to the thesis was essential and of a kind that I could not have obtained from the “networking people”.
I would like to thank Dr. Taoufik En-Najjary for his major contribution through the development of the PPrate tool and for all the interesting discussions we had. If I had a problem involving mathematics, I knew Taoufik was the person to turn to. I would also like to thank Prof. Dr. Thomas Plagemann, with whom I had the opportunity to work in the early phases of the thesis.
I also want to thank my colleagues at Eurecom, especially my good friends Walid Bagga, Luca Brayda, and Federico Matta, with whom I go back a long way, and my office mates Fabien Pouget, Suna Melek Önen, Corrado Leita, and Van Hau Pham, as well as Jérôme Härri (whom I consider an office mate even though he sat in another office), for their friendship and the good times. I also owe thanks to all the other people in the Corporate Communications Department and at Institut Eurécom who contributed to the success of this thesis.
Last but not least, I wish to dedicate very special thanks to my family for the endless support and help they gave me, and to my close friends in Finland who were there for me. Kiitos.
The research community’s interest in measuring the Internet has grown tremendously during the last couple of years. This increase is largely due to the overwhelming growth and expansion of the Internet. We have experienced exponential growth in traffic volumes and in the number of devices connected to the Internet. In addition, the heterogeneity of the Internet is constantly increasing: we observe more and more different devices with different communication needs residing in or moving between different types of networks. This evolution has created many needs (commercial, social, and technical) to know more about the users, traffic, and devices connected to the Internet. Unfortunately, little such knowledge is available today and more is required every day. That is why Internet measurement has grown to become a substantial research domain.
This thesis is concerned with TCP traffic. TCP is estimated to carry over 90% of the Internet’s traffic, which is why it plays a crucial role in the functioning of the entire Internet. The most important performance metric for applications is typically throughput, i.e., the amount of data transmitted over a period of time. We define root cause analysis of TCP throughput as the analysis and inference of the reasons that prevent a given TCP connection from achieving a higher throughput. These reasons can be many: the application, the network, or even the TCP protocol itself.
This thesis comprises three parts: methodology, techniques, and applications. The first part introduces our database management system-based methodology for passive traffic analysis. In that part we explain our approach, the InTraBase, which is based on an object-relational database management system. We also describe our prototype of this approach, which is implemented on PostgreSQL, and evaluate and optimize its performance.
In the second part, we present the primary contributions of this thesis: the techniques for root cause analysis of TCP throughput. We introduce the different potential causes that can prevent a given TCP connection from achieving a higher throughput and explain in detail the algorithms we developed and used to detect such causes. Given the large heterogeneity and potentially large impact of applications that operate on top of TCP, we emphasize their analysis.
The core of the third part of this thesis is a case study of traffic originating from clients of a commercial ADSL access network. The study focuses on performance analysis of data transfers from the client’s point of view. We discover some surprising results, such as poor overall performance of P2P applications for file distribution due to upload rate limits enforced by client applications. The third part essentially binds the first two together: we give an idea of the capabilities of a system that combines the methodology of the first part with the techniques of the second part to produce meaningful results in a real-world case study.
Interest in Internet measurement has grown considerably in recent years. This is largely due to the growth of the Internet in terms of traffic volumes and the number of machines connected to it. This evolution has created many needs (commercial, social, and technical) to know more about Internet users and Internet traffic in general. Unfortunately, little knowledge of this kind is available today. That is why Internet measurement has become a substantial research domain.
This thesis deals with the analysis of TCP traffic. TCP is estimated to carry 90% of Internet traffic, which makes it an essential part of the functioning of the Internet. The most important performance metric for applications is, in most cases, throughput, that is, the amount of data transmitted per period of time. Our objective is to analyze TCP throughput and to identify the reasons that prevent a TCP connection from obtaining a higher throughput. These reasons can be many: the application, the network, or even the TCP protocol itself.
This thesis comprises three parts: the first on methodology, the second on TCP analysis techniques, and the last an application of these techniques. In the first part, we present our methodology for passive traffic analysis, which is based on a database management system (DBMS). We explain our approach, named InTraBase, which is based on an object-relational database management system. We also describe our prototype of this approach, which is implemented on top of PostgreSQL, and we evaluate and optimize its performance.
In the second part, we present the main contributions of this thesis: the techniques for analyzing the causes of the observed TCP throughput. We present the different potential causes that can prevent a TCP connection from obtaining a higher throughput and explain in detail the algorithms we developed to detect these causes. Given their heterogeneity and their impact on TCP throughput, we pay particular attention to the applications that run on top of TCP.
The third part of this thesis is a case study of the traffic of clients of a commercial ADSL access network. The study focuses on the performance analysis of data transfers from the client’s point of view. We demonstrate some surprising results, such as the fact that the overall poor performance of peer-to-peer applications is due to transmission rate limits imposed by these applications (and not to congestion in the network).
1. Introduction
1.1. The Internet: Measurement Target in Constant Motion
1.2. Root Cause Analysis of TCP Traffic: What and Why?
1.3. Thesis Claims and Structure
Part I Methodology: Manageable Approach for Passive Traffic Analysis
Overview of Part I
2. Measuring the Internet
2.1. Setting the Measurement Context
2.1.1. Passive and Active Measurements
2.1.2. Reducing Passive Measurement Data
2.2. Analysis of Passive Measurements
2.2.1. Challenges
2.2.1.1. Management
2.2.1.2. Analysis Cycle
2.2.1.3. Scalability
2.2.2. Database Systems to the Rescue?
2.2.3. Existing Approaches
2.3. Conclusions
3. InTraBase: Integrated Traffic Analysis Based on Object-Relational DBMS
3.1. Approach
3.1.1. InTraBase and Other Approaches
3.1.2. Fully Integrated Solution Based on Object-Relational DBMS
3.1.3. Benefits From Our Approach
3.1.3.1. DBMS Is All About Management
3.1.3.2. Shorter Analysis Process Cycle
3.1.3.3. Improved Scalability
3.2. PgInTraBase: Prototype Implementation of InTraBase
3.2.1. Database Schema
3.2.2. Processing a Trace: Populating Tables
3.2.3. Analyzing Processed Data
3.2.4. Properties of PgInTraBase
3.3. Conclusions
4. Evaluation and Optimization of the InTraBase
4.1. Evaluation of the Prototype
4.1.1. Feasibility of PgInTraBase
4.1.1.1. Processing Time of the Initial Steps
4.1.1.2. Disk Space Consumption
4.1.2. Comparison of InTraBase and Tcptrace
4.2. Optimizing the DBS for Efficient Analysis
4.2.1. Tuning the DBMS
4.2.2. Identifying and Decomposing the Typical Analysis Task
4.2.3. Cost Minimization of the Typical Analysis Task
4.2.3.1. Indexes for Fast Lookup
4.2.3.2. Clustering to Minimize Cost of I/O Reads
4.2.3.3. Parallel I/O
4.2.3.4. Caching
4.3. Evaluation of the Impact of Optimization
4.4. Conclusions
Conclusions for Part I
Part II Root Cause Analysis of TCP Traffic
Overview of Part II
5. Origins of TCP Transfer Rates
5.1. TCP
5.1.1. Connection Establishment and Tear Down
5.1.2. Error Control: Cumulative Acknowledgments and Timeouts
5.1.3. Flow Control: Sliding Window Technique
5.1.4. Congestion Control: Resizing the Sliding Window
5.1.4.1. Slow Start and Congestion Avoidance
5.1.4.2. TCP Tahoe: Fast Retransmit
5.1.4.3. TCP Reno: Fast Retransmit & Fast Recovery
5.1.4.4. TCP NewReno: Improved Handling of Multiple Losses During Fast Recovery
5.1.4.5. Other TCP Versions
5.2. What Limits the Transmission Rate of TCP?
5.2.1. Application
5.2.2. TCP Layer
5.2.2.1. TCP End-Point Buffers
5.2.2.2. Congestion Avoidance Mechanism: Transport Limitation
5.2.2.3. Short Transfers: Slow Start Mechanism
5.2.3. Network
5.2.4. Middleboxes
5.3. Related Work
5.3.1. Analytical Work: Modeling TCP
5.3.2. Measurement-Based Analysis
5.3.2.1. TCP Performance & Deployment Status
5.3.2.2. TCP and the Network
5.3.2.3. TCP and Applications
5.3.3. TCP Extensions and Improvements
5.3.4. Root Cause Analysis
5.4. Scope of Our Work
6. Applications and Their Interaction with TCP
6.1. Isolate & Merge (IM) Algorithm
6.1.1. Context
6.1.2. Procedures
6.2. Validation
6.3. Data Sets
6.4. Distortion Due to ALPs on End-to-end Path Studies
6.4.1. Studying Characteristics of Rates
6.4.2. Case Study on RTT Estimation
6.5. What Can We Learn From The Different Periods?
6.5.1. Properties of the BTPs Identified
6.5.2. Discovering the Nature of an Application
6.6. Conclusions
7. Analysis of TCP Bulk Transfers
7.1. Quantitative Analysis: The Limitation Scores
7.1.1. Determining Position of Measurement Point
7.1.2. Metrics Inferred from Packet Headers
7.1.3. Limitation Scores
7.1.4. Validations
7.1.5. Sources of Errors and Inaccuracy
7.2. Interpreting the Limitation Scores: the Classification Scheme
7.2.1. Scores and Thresholds
7.2.2. Accounting for Middleboxes
7.3. Inferring the Threshold Values
7.3.1. Experimentation Setup
7.3.2. Threshold for Retransmission Score
7.3.3. Threshold for Receiver Window Limitation Score
7.3.4. Threshold for Dispersion Score
7.3.5. Threshold for B-Score
7.3.6. Root Cause Classification Results for the Experiments
7.3.7. Critical Discussion of Our Approach
7.4. T-RAT
7.4.1. On the Flight Nature of TCP
7.4.2. Comparison With Our Methods
7.4.2.1. Unshared Bottleneck Link/Bandwidth Limitation
7.4.2.2. Shared Bottleneck Link/Congestion Limitation
7.4.2.3. Receiver Limitation
7.4.2.4. Application Limitation
Conclusions for Part II
Part III Real World Case Study on TCP Root Cause Analysis Using InTraBase
Overview of Part III
8. Adapting InTraBase for TCP Root Cause Analysis
8.1. Extended Design of InTraBase for Root Cause Analysis
8.1.1. Table Layout
8.1.2. Indexes
8.2. Root Cause Analysis Functions
8.2.1. Populating Root Cause Analysis Tables
8.2.2. Going Further With Triggers
9. Case Study on Performance Analysis of ADSL Clients
9.1. Monitoring the ADSL Access Network of France Telecom
9.1.1. Architecture and Setup
9.1.2. Main Constraints and Challenges
9.2. Traffic Characteristics: Applications, Connections, and Clients
9.2.1. General Characteristics of the Traffic
9.2.1.1. Traffic per Application
9.2.1.2. Traffic per Connection
9.2.2. Client Behavior
9.2.2.1. Volumes and Applications
9.2.2.2. Access Link Utilization
9.3. Performance Analysis of Clients
9.3.1. Taxonomy of Factors Limiting the Performance of Clients
9.3.2. Observed Limiting Factors for Clients
9.3.2.1. Main Limitation
9.3.2.2. Limitations Experienced
9.3.3. Throughput Limitation Causes Experienced by Major Applications
9.3.4. Impact of Limiting Factors On Performance
9.3.5. Comparison With Other Related Analysis Work
9.4. Closer Look at a Few Example Clients
9.5. Conclusions
Conclusions for Part III
10. Thesis Conclusions
10.1. Evaluation of the Thesis Work
10.1.1. Claims and Contributions
10.1.2. Critical Viewpoint
10.2. Future Work
10.2.1. InTraBase
10.2.2. Root Cause Analysis of TCP Throughput
Bibliography
Appendix
A. List of Abbreviations, Acronyms, and Parameters
B. Detailed Analysis of PgInTraBase Performance Measurements
B.1. Impact of Indexing and Clustering
B.2. Measuring the Effectiveness of Caching
B.3. The Impact of Parallel I/O: RAID Striping
B.4. DBMS as the Final Bottleneck
C. Descriptions of Isolate & Merge Algorithms
C.1. Isolate
C.2. Merge
D. Formal Definitions for Computed Metrics
D.1. Inter-arrival Times of Acknowledgments
D.2. Round-trip Time
D.3. Receiver Advertised Window
D.4. Outstanding Bytes
D.5. Retransmissions
E. Thesis Summary
E.1. Introduction
E.1.1. The Internet: A Continual Evolution
E.1.2. Root Cause Analysis of TCP Throughput
E.1.3. Contributions of the Thesis
E.2. Summary of the Three Parts of the Thesis
E.2.1. Part One: Methodology
E.2.1.1. Internet Measurement
E.2.1.2. InTraBase: Integrated Traffic Analysis Based on an Object-Relational Database Management System
E.2.1.3. Evaluation and Optimization of the InTraBase
E.2.2. Part Two: Root Cause Analysis of TCP Throughput
E.2.2.1. Causes Limiting TCP Transfers
E.2.2.2. Identifying the Limitation Causes
E.2.3. Part Three: Case Study on Traffic Analysis of an ADSL Access Network
E.2.3.1. Adapting the InTraBase for Root Cause Analysis of TCP Throughput
E.2.3.2. Case Study on Performance Limitations of ADSL Clients
E.3. Conclusions
E.4. Contributions
E.5. Perspectives
1.1. The Internet protocol suite.
2.1. Typical cycle of tasks for the iterative process for off-line traffic analysis.
3.1. High-level Architecture of the DBS.
3.2. Integrated data and tool management.
3.3. Cycles of tasks for the iterative process for off-line traffic analysis.
3.4. The layouts of core tables in PgInTraBase after the 5 processing steps. Underlined attributes form a key that is unique for each row.
4.1. Total processing time of the three steps vs. tcpdump file size.
4.2. Processing times of different steps with respect to trace file size.
4.3. Disk space usage for different tcpdump file sizes containing BitTorrent traffic.
4.4. Comparison of Per-Connection Statistics from tcptrace and InTraBase.
4.5. Executing a typical analysis task.
4.6. The layouts of core tables with indexes. Numbers in parentheses indicate the different indexes.
4.7. The effect of clustering a single connection within two different types of traffic traces of the same size. Black stripes are packets belonging to the connection that is being clustered and their horizontal distance from each other reflects the physical distance on the disk.
4.8. Elapsed time to index different sizes and types of traces.
4.9. Elapsed time to cluster different sizes and types of traces.
4.10. RAID striping over three disks, i.e. RAID level 0.
5.1. Establishing a connection using the three-way handshake.
5.2. Sender’s window slides.
5.3. Evolution of the cwnd size during slow start (SS) and congestion avoidance (CA).
5.4. Evolution of the congestion window if Fast Recovery is used.
5.5. Data flow from the sender to the receiver application through a single TCP connection.
5.6. A short piece of Skype connection.
5.7. 20 minutes of a BitTorrent connection.
5.8. Entire FTP download connection.
5.9. A piece of a receiver window limited connection.
5.10. A transport limited bulk transfer period within a long BitTorrent connection.
5.11. Link utilization along a TCP/IP path where the narrow link is the same as the tight link.
5.12. Link utilization along a TCP/IP path where the narrow link is not the same as the tight link.
5.13. A piece of a bandwidth limited transfer where packets are regularly spaced due to the bottleneck link.
5.14. A piece of transfer limited by a shared bottleneck link.
5.15. Effect of consecutive losses within a BTP of a long BitTorrent connection.
5.16. Transfer to a wireless laptop on board of an airplane.
5.17. Transfer passing through a Packeteer packet shaper.
5.18. Round-trip time estimation in the middle of the path.
6.1. Successful merger.
6.2. Failed merger.
6.3. CDF of diff, the fraction of matching samples, for the periods.
6.4. Q-Q plot of throughput for the BitTorrent BTPs having drop = 0.9.
6.5. Q-Q plot of throughput for the BitTorrent connections transferring at least 100KB.
6.6. Throughput of the common connections in the sets of 50 fastest connections vs. 50 fastest BTPs (drop = 0.9) for BitTorrent.
6.7. Problem with RTT estimation during an ALP.
6.8. CDFs of ratio of the mean RTTs: RTT_ALP / RTT_BTP.
6.9. Piece of an HTTP connection. Dashed and dotted vertical lines start an ALP and BTP (drop = 0.95), respectively.
6.10. Number of identified BTPs vs. drop.
6.11. Total number of identified BTPs+STPs vs. drop.
6.12. Fraction of all bytes in BTPs vs. drop.
6.13. Number of identified BTPs per connection, drop = 0.9.
6.14. Rate limited eDonkey connection.
6.15. Rate limited BitTorrent connection.
6.16. CDF plots of duration, volume, and throughput ratios.
7.1. Determining the measurement position from the three-way handshake of TCP.
7.2. Time series of outstanding bytes and receiver advertised window for a BitTorrent connection. Values are computed using 10 second time windows.
7.3. CDF plot of receiver window limitation score with threshold lb ∈ {1, 2, 3}.
7.4. Time sequence diagram of a receiver window limited transfer. Note the clear bursty IAT pattern.
7.5. Time sequence diagram of a shared bottleneck limited transfer with a high receiver window limitation score. Note the smoothed out IAT pattern.
7.6. Inter-arrival times of receiver window limited transfer. Black rectangles are sent packets and time runs from right to left.
7.7. CDF plots of the two receiver window limitation scores when measuring at the sender side.
7.8. CDF plots of the two receiver window limitation scores when measurement point is away from sender.
7.9. CDF of the absolute difference between the Web100 and InTraBase’s scores for receiver window limitation.
7.10. Root cause classification scheme.
7.11. Root cause classification scheme with middleboxes taken into consideration.
7.12. CDF plots of the dispersion score when downloading a single file at a time.
7.13. CDF plots of the burstiness score when downloading multiple files simultaneously through a shared bottleneck link or with added delay.
7.14. Difference of CDF plots between experiments with an artificial bottleneck and added delay, results with 100ms are excluded. The best matching threshold is found at 0.25 (vertical line).
7.15. B-scores per server and transfer for experiments with 5Mbit/s bottleneck or 500ms added delay. Each marker corresponds to a single transfer: x is with delay, o is with a bottleneck. Y values are b-scores, x values are servers.
7.16. B-scores per server and transfer for experiments with 3Mbit/s bottleneck or 500ms added delay.
7.17. B-scores per server and transfer for experiments with 10Mbit/s bottleneck or 200ms added delay.
7.18. B-scores per server and transfer for experiments with 1Mbit/s bottleneck or 400ms added delay.
7.19. Classification of BTPs into clear root causes.
7.20. Root cause classification of the three data sets with only a single download at a time.
7.21. Root cause classification of the three data sets with ten parallel downloads.
7.22. Root cause classification of the three data sets with three parallel downloads and added delay.
7.23. Simulation Configurations.
7.24. Histograms of inter-arrival times of packets.
7.25. Evolution of the PDF of the inter-arrival times of packets from a receiver window limited connection without and with cross traffic.
7.26. T-RAT’s classification by limitation cause for traffic from unshared bottleneck link experiments.
7.27. T-RAT’s classification by limitation cause for traffic from shared bottleneck link experiments.
7.28. RTT evolution of an example transfer over an ADSL access link with a particularly deep buffer.
7.29. T-RAT’s classification by limitation cause for traffic from receiver limited experiments.
7.30. T-RAT’s classification by limitation cause for eMule traffic limited by the application.
7.31. Piece of an application limited eMule transfer.
8.1. Table layouts of InTraBase adapted for TCP root cause analysis. Underlined attributes form a key that is unique for each row.
9.1. Architecture of the monitored ADSL platform.
9.2. Amount of bytes transferred by different applications during the day.
9.3. CCDF plot of size of connections. Note the logarithmic scale of both axes.
9.4. Cumulative fraction of all bytes as a function of the connection size.
9.5. CCDF plot of bytes transferred by clients.
9.6. Cumulative fraction of all bytes transferred as a function of bytes transferred by a given client.
9.7. Amount of bytes transferred by client vs. time that client is active.
9.8. CDF plot of upper bound for link utilization per client for a 30min period: mean throughput divided by maximum instantaneous throughput. For each client, we selected the period during which that client achieved maximum throughput.
9.9. Amount of transferred application limited bytes during the day for most common applications.
9.10. Amount of transmitted bytes through saturated access link by different applications.
9.11. Amount of transmitted bytes whose rate is limited by a distant link by different applications.
9.12. CDF plot of access link utilization during ALPs (application) and BTPs limited by different causes. For each client, we considered only traffic of the 30 min period during which that client achieved the highest instantaneous throughput of the day.
9.13. CDF plot of maximum aggregate per-host download throughput computed over five second intervals.
9.14. Client A’s link utilization per half hour period during the day.
9.15. Three-hour piece of an activity plot for client A that transfers a lot of bytes and is active all day.
9.16. Close up of client A’s connections originating likely from Web browsing that cause the peak download rates.
9.17. Activity plot for client B that is active only during an hour and most likely browses the Web.
9.18. Activity plot for client C that transfers a lot of bytes but is active only approximately five hours.
9.19. Client C’s link utilization per half hour period during the day.
B.1. Total execution time of the c-query for the Gigabit trace.
B.2. Total execution time of the c-query for the BitTorrent trace.
B.3. CPU iowait time of the c-query for the Gigabit trace.
B.4. CPU iowait time of the c-query for the BitTorrent trace.
B.5. Number of sectors read when executing the c-query for the Gigabit trace.
B.6. Number of sectors read when executing the c-query for the BitTorrent trace.
B.7. Average execution time of the c-query for the Gigabit trace.
B.8. Average CPU iowait time of the c-query for the Gigabit trace.
B.9. Average execution time of the c-query with and without the ORDER BY clause for the BitTorrent trace.
B.10. Average execution times of the original c-query and a modified c-query that only counts packets for the BitTorrent trace.
C.1. Round-trip time estimation from a two-way data transfer.
D.1. Determining the measurement position from the three-way handshake of TCP. This figure appears with detailed explanations in Section 7.1.1.
E.1. The Internet protocol suite.
E.2. Architecture of the database system (DBS).
E.3. The core tables used in PgInTraBase. Underlined attributes form a key that is unique for each row of the table.
E.4. Total time to process a tcpdump trace of varying size.
E.5. Data flow from the sender application to the receiver application through a single TCP connection.
E.6. Successful merger.
E.7. Failed merger.
E.8. Sequence number versus time for a receiver window limited transfer with high b-scores.
E.9. Sequence number versus time for a receiver window limited transfer with low b-scores.
E.10. The classification scheme.
E.11. Design of the PgInTraBase prototype with the tables needed for root cause analysis of throughput.
E.12. Volumes of data transferred by different applications during the day.
2.1. Different measurement approaches to achieve data reduction. Data reduction values are only indicative.
2.2. Characteristics of different approaches for traffic analysis. Traffic volumes are in the order of magnitude.
5.1. Summary of different application types.
6.1. Trace characteristics.
6.2. Mean values of the throughput ratio.
6.3. Coefficients of correlation between log of throughput and log of number of bytes transferred. Only connections transferring at least 100KB were included and drop = 0.9 was used when determining the BTPs.
7.1. Selected mirror sites.
9.1. Percentages of clients that transmit most bytes using a specific application.
9.2. Percentage of active clients that sustain utilization of their access link above specific thresholds for a given fraction of a 30-minute period.
9.3. Number of active clients limited by different causes.
B.1. Average values of the measurements.
Introduction
1.1 The Internet: Measurement Target in Constant Motion
The Internet, which started as a research project of ARPA (Advanced Research Projects Agency) in the USA back in 1969, has evolved into an immense network connecting hundreds of millions of devices today. Its size is matched only by its diversity. On the one hand, the devices connected to the Internet comprise PCs, servers, mobile phones, satellites, PDAs, sensors, etc. On the other hand, a vast number of services are available today, including radio, television, telephone, videoconferencing, instant messaging, and peer-to-peer (P2P) content distribution, in addition to the traditional email and World Wide Web (WWW).
Nevertheless, many questions about the behavior and characteristics of the Internet remain open. While we are well aware of the characteristics of the individual building blocks of the Internet, it is the whole system in operation that is in many ways perceived as a “black box”. For example, we would like to know precisely what the size of the Internet, or just a part of it, is in terms of connected nodes. As another example, it is non-trivial to find out the structure, i.e. the topology, of a given part of the Internet. We simply do not have many of the quantitative metrics required to answer these questions. There are many reasons for this unfortunate situation. As the authors of [40] point out, the evolution of the Internet has not been a centralized effort. Several parties have contributed to it, many with different objectives. In addition, the Internet is dynamic: devices come and go and new networks emerge.
The purpose of the research domain of Internet measurements is to provide answers to these open, yet important, questions. There is a multitude of reasons to do this. In [40], the authors distinguish three categories of reasons: commercial, social, and technical. From the commercial point of view, measurements are crucial for, e.g., Internet Service Providers (ISPs) in order to evaluate and troubleshoot the performance experienced by their clients. An example of a social reason is the need to understand client behavior when new popular applications emerge. Technical reasons are related to the evolution of devices and protocols operated in the Internet. As an example, the authors of [40] name router design, which depends strongly on the characteristics of the traffic the router needs to process, e.g. the flow size distributions.
The Internet is an immense moving target to measure. That is why it is a great challenge to measure and characterize it. Its dynamicity appears in many flavors: First of all, the Internet is in constant evolution. The set of available services evolves and changes all the time, and the numbers of users and the traffic volumes grow at exponential rates. On one hand, this rapid evolution increases the need for measurements. For instance, in [36] the authors provide evidence of the dramatic impact of the emergence of new popular applications on traffic characteristics and its implications on network capacity engineering. On the other hand, the evolution brings up new issues in measurements: The volumes of measurement data are ever growing, which complicates the analysis process and poses significant storage problems. Second, there is no such thing as a representative snapshot of the Internet, which means that good local metrics today are not necessarily good local metrics tomorrow. Similarly, good local metrics are perhaps never good metrics elsewhere. For example, application traffic mix and user behavior can differ a lot depending on the day for a given network, and they can be completely different between enterprise networks and ADSL (Asymmetric Digital Subscriber Line) access networks.
1.2 Root Cause Analysis of TCP Traffic: What and Why?
In order to understand what we mean by TCP root cause analysis and why it is important, we need to review some facts about the way the Internet functions. Devices connected to the Internet communicate with each other using a common Internet protocol suite. Figure 1.1 shows the stack structure of this suite. Each application hands data to be transferred to the lower layer, the transport layer, which is responsible for end-to-end transportation of the data. Two transport layer protocols form the core of the layer in the current Internet: Transmission Control Protocol (TCP) and User Datagram Protocol (UDP). The layer below, the network layer, consists only of the Internet Protocol (IP), which is used by TCP to package and transmit pieces of data from source to destination.
Application: DNS, FTP, HTTP, IMAP, IRC, NNTP, POP3, SMTP, SNMP, SSH, TELNET, BitTorrent, RTP, rlogin, ...
Transport: TCP, UDP
Network: IP
Link: Ethernet, Wi-Fi, Token ring, PPP, SLIP, FDDI, ATM, Frame Relay, SMDS, ...
Figure 1.1: The Internet protocol suite.
On the highest layer in Figure 1.1, the set of applications contributing most to the traffic in the Internet has changed over the last couple of years from WWW and file transfer (FTP) to P2P applications, and new Internet applications such as RSS feeds or podcasts are emerging constantly. In addition, the application mix varies significantly between different environments (e.g. enterprise vs. access networks). However, TCP still transports the majority of bytes, typically over 90%. This fact, together with the rapid growth of traffic volumes, highlights the versatility of TCP but also raises the question of how TCP and these new applications perform in these new environments. As a consequence, the analysis of TCP as a protocol and of TCP traffic is even more vital than before.
Throughput, defined as the amount of bytes transmitted within a specified interval of time, is typically the most important performance metric of an Internet application. Consider, for instance, a file download using FTP. The faster the download finishes, the better. Our definition of root cause analysis of TCP traffic is the analysis and inference of the reasons that prevent a given TCP connection from achieving a higher throughput. We often refer to these reasons as (rate) limitation causes in this thesis. While such an analysis may seem trivial at first sight, it will become clear for the reader that it is far from it due to the variety of ways these different limitation causes may manifest themselves within the TCP traffic and the constraints imposed by the measurement context. Indeed, it is often the case that many metrics that are required for this analysis cannot be directly measured but instead need to be estimated, which complicates significantly the analysis.
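As a minimal illustration of the throughput definition above, the following Python sketch divides the number of bytes observed within an interval by the length of that interval. The function name and the sample observations are hypothetical and serve only to make the definition concrete.

```python
def throughput_bps(packets, start, end):
    """Throughput in bits per second over the half-open interval [start, end).

    packets: iterable of (timestamp_in_seconds, payload_bytes) tuples.
    """
    if end <= start:
        raise ValueError("interval must have positive length")
    total_bytes = sum(size for ts, size in packets if start <= ts < end)
    return 8 * total_bytes / (end - start)


# Hypothetical observations: (timestamp in seconds, payload bytes).
packets = [(0.0, 1460), (0.1, 1460), (0.2, 1460), (0.3, 1460), (1.5, 1460)]

# Four packets fall into the first second; the fifth one is outside the interval.
first_second = throughput_bps(packets, 0.0, 1.0)
print(first_second)
```

The choice of the interval matters: the same connection yields different values over its whole lifetime and over short windows, which is one reason why the analysis of rate limitations is not trivial.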
Knowledge about the root causes of TCP traffic implies knowledge about the root causes of the vast majority of the Internet’s traffic. That is why this knowledge is very powerful and usable in diverse ways. For example, it could be used by ISPs to troubleshoot their access network or the clients’ performance within that network, or it could enable evaluation of the operational performance of a deployed Internet application.
1.3 Thesis Claims and Structure
We make the following four claims in this thesis:
I. We can overcome many of the problems of management and of the suboptimal analysis process cycle in passive packet-level traffic analysis by adopting a Database Management System (DBMS)-based approach.
(a) An implementation of such an approach performs feasibly.
(b) We can significantly improve the performance of such a system with Input/Output (I/O) optimizations based on characteristics of packet-level traffic data and popularity assessment of queries.
II. It is possible to infer root causes for TCP throughput from bidirectional packet traces recorded passively in a single measurement point located anywhere on the TCP/IP path (end-point or in the middle). Furthermore, unidirectional traffic traces are insufficient.
III. Different Internet applications interact in complex and different ways with TCP. That is why the effects of applications need to be first filtered out whenever studying the characteristics of the underlying TCP/IP end-to-end path.
IV. Our TCP root cause analysis methods implemented with our DBMS-based approach for traffic analysis enable:
• performance evaluation of Internet application protocols,
• evaluation of network utilization, and
• identification of certain TCP configuration problems.
The structure of the thesis largely follows the claims. In addition to this introduction and the final conclusions, we divided the contents of this thesis into three parts.
In the first part, we introduce the world of Internet measurements to the reader in more detail, set the scope of our work within this context, and present our DBMS-based methodology for traffic analysis. We address claim I in this part. Chapter 4 is particularly concerned with the subclaims I.a and I.b.
The second part focuses on root cause analysis of TCP throughput. We first explain the details of the TCP protocol, related research work on that domain, and the origins of TCP transmission rates. We then describe our approach and algorithms to infer root causes of TCP traffic. The second part addresses claim II (Chapters 5 to 7) and claim III (Chapter 6).
The third part ties together the first and the second part. We first explain in detail how we use the DBMS-based approach presented in Part I to implement the root cause analysis techniques described in Part II. We then go through a use case on ADSL client performance analysis to address claim IV (Chapter 9).
Finally, in the last concluding chapter (Chapter 10) we revisit the thesis claims and evaluate how well we fulfilled them. We also assess the thesis work in general, i.e. in which parts we succeeded well and which parts could have been done better, and identify several directions of future work related to this thesis.
Methodology: Manageable Approach for Passive Traffic Analysis
In Part I we motivate, describe, and justify our methodology for analyzing traffic measurements, specifically for the root cause analysis of TCP traffic. In Chapter 2, we present the diverse ways in which the Internet can be measured and describe the method we have chosen: passively captured packet headers. We then explain why the analysis of the measurements poses several challenges due to the vast amounts of unstructured measurement data and the multitude of ways it can be processed.
In Chapter 3, we present our DBMS-based approach for analyzing this passive measurement data called the InTraBase (Integrated Traffic Analysis Based on Object Relational DBMS). We explain how it can help to overcome the issues we presented in Chapter 2. We also describe the running prototype of the InTraBase for TCP traffic analysis that we have built.
We demonstrate that the approach is feasible through performance evaluation of the prototype in Chapter 4. Furthermore, we perform a study on optimizing the performance of the prototype and show through measurements that for specific tasks, the optimization phase is vital in order to have good performance.
In summary, this part essentially describes the methodology we used to perform the analysis and obtain the results on root cause analysis of TCP traffic presented in Part II.
Measuring the Internet
In this chapter, we briefly describe the different ways of measuring the Internet and explain how our work is positioned in this domain of research. We then enumerate and elaborate on the challenges and issues, and on the existing approaches and solutions, in our chosen measurement context: off-line analysis of passive measurements. Some of the contents of this chapter have been published in [107].
2.1 Setting the Measurement Context
2.1.1 Passive and Active Measurements
The domain of Internet measurements is rich in the number of different measurement techniques to choose from. We can identify two different categories of measurement techniques: active and passive measurements.
Active techniques measure network characteristics by sending probe packets to infer characteristics of the path that the packets follow. Therefore, they are especially suitable for inferring end-to-end properties of a given network path. Active measurements are used for estimating link capacities or available bandwidth [69] [102], computing network coordinates [112], or discovering topology [42], for instance. Simple well-known examples of active measurement tools are ping and traceroute. A major limitation of active measurement techniques is that they generally require several (at least two) reference locations, i.e. measurement points that can coordinate with each other during the measurements. For example, in the context of available bandwidth or capacity estimation, a host sends probe packets and another host receives them and analyzes their inter-spacing to infer the capacity of the network path the probes followed [102].
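The idea of inferring capacity from the inter-spacing of probes can be sketched as follows. This is a simplified packet-pair estimate written for illustration only, not the method of [102]: it assumes that two equally sized probes are sent back to back and that the spacing observed at the receiver equals the transmission time of one probe on the narrowest link. The function name, the use of the median to damp the effect of cross traffic, and the sample arrival times are all assumptions.

```python
from statistics import median


def packet_pair_capacity_bps(probe_size_bytes, arrival_pairs):
    """Estimate path capacity in bits per second from back-to-back probe pairs.

    arrival_pairs: list of (arrival_of_first_probe, arrival_of_second_probe)
    timestamps in seconds, as recorded at the receiving host.
    """
    estimates = []
    for first_arrival, second_arrival in arrival_pairs:
        dispersion = second_arrival - first_arrival
        if dispersion > 0:
            # One probe of probe_size_bytes was serialized during `dispersion`.
            estimates.append(8 * probe_size_bytes / dispersion)
    if not estimates:
        raise ValueError("no usable probe pairs")
    # Assumption: the median is a crude way to discard pairs disturbed
    # by cross traffic; real tools use more elaborate statistics.
    return median(estimates)


# Hypothetical receiver-side arrival times of three pairs of 1500-byte probes.
pairs = [(0.000, 0.0012), (1.000, 1.0012), (2.000, 2.0030)]
estimate = packet_pair_capacity_bps(1500, pairs)
print(f"estimated capacity: {estimate / 1e6:.1f} Mbit/s")
```

The sketch also shows why two coordinated measurement points are needed: the sender controls the probe size and spacing, while the dispersion can only be observed at the receiver.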
Passive techniques are used to gather data for analysis of network and traffic characteristics by measuring observed traffic on a given host or a router. Passive measurements can be divided into three categories: SNMP/RMON based measurement, packet monitoring, and flow measurement [61]. Measurements using SNMP/RMON require access to the measured routers’ MIBs, which is usually prohibited from outside the measured network. In addition, MIBs can provide only status information (e.g. the operational status of the interfaces on a router) or highly aggregated metrics (per-interface counters of bytes and packets inbound and outbound).
Packet monitoring is recording a copy of or only some information about packets passing by the measurement point. Flow measurement is recording aggregate statistics about groups of packets. These packets usually belong to the same TCP connection or sequence of UDP/ICMP packets between same host and port pairs and, in addition, appear close to each other in time.
Some of the variety of applications for passive measurements are described in [61]. Examples include diagnosing performance problems and intra-domain route tuning. Note that we do not address here the ways traffic can be generated for passive measurements, which is always specific to the analysis. For instance, one could set up honeypots to attract malicious traffic or simply monitor a university edge aggregate link in order to learn what kind of traffic a large group of students generate. Despite the different analysis objectives, the measurement techniques remain the same for both cases.
This thesis focuses on inferring root causes from traffic observed at a single measurement point. Our objective is to be able to infer these root causes on a potentially large set of real traffic in order to learn about and explain the root causes that exist in the traffic of the current Internet and the way they manifest themselves in the traffic. Therefore, we focus in this thesis on passive measurements and do not address the active measurement techniques further.
2.1.2 Reducing Passive Measurement Data
In the context of passive measurements, it is necessary to consider issues related to storing the data and processing it, i.e. performing analysis tasks on the data. The data consists not only of the measurements but also of results of analysis tasks, that we call derived data, and further data derived from already derived data and measurements. The issues in handling the data in the context of passive measurements arise from the potentially huge amount of primarily unstructured measurement data due to the immense volumes of traffic flowing in the Internet today. Storage requirements and processing time are the first to limit the amounts of traffic that can be analyzed in practice. In order to limit the amount of measurement data, several alternatives to full packet measurements exist. Table 2.1 summarizes the different options.
As usual, each choice has advantages and drawbacks, and the choice depends on the tradeoff between the level of detail and the amount of data.
Table 2.1: Different measurement approaches to achieve data reduction. Data reduction values are only indicative.
measured data | data reduction | advantages | drawbacks
full packets | none | have it all | a lot of data, privacy concerns
packet headers | around 1/20 (70 B hdr vs. 1.5 KB pkt) | have most of knowledge in summarized format | still a lot of data
flows (Cisco’s Netflow) | 1/avg(flow size in pkts) x 1/20 | data reduction, feasible on-line | lose packet details, connections need to be reconstructed
sampled headers/flows | depends completely on the scheme | improved data reduction | not usable for all types of analysis, e.g. loss estimation
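The indicative reduction factors in Table 2.1 can be reproduced with simple arithmetic. The sketch below uses the header and packet sizes given in the table; the average flow size of 25 packets and the 100 GB trace size are hypothetical values chosen only for the example.

```python
def header_reduction(header_bytes=70, packet_bytes=1500):
    """Fraction of the data that remains when only headers are stored."""
    return header_bytes / packet_bytes


def flow_reduction(avg_flow_size_pkts, header_ratio=1 / 20):
    """Fraction that remains with flow records: 1/avg(flow size in pkts) x 1/20."""
    return header_ratio / avg_flow_size_pkts


# 70 B of headers per 1.5 KB packet: roughly 1/21, i.e. "around 1/20".
ratio = header_reduction()
print(f"headers only: 1/{1 / ratio:.0f} of the full packet data")

# Hypothetical trace: 100 GB of full packets, 25 packets per flow on average.
full_trace_gb = 100
print(f"headers: {full_trace_gb * ratio:.2f} GB")
print(f"flows:   {full_trace_gb * flow_reduction(25):.2f} GB")
```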
In many situations, the payloads of packets, i.e. the actual data transmitted by the application and, hence, the main and largest component of traffic, are not necessary for the analysis. Moreover, packet payloads may contain privacy sensitive information about the user. Because of this, publicly available packet traces generally either do not contain the payloads or have scrambled payloads. For these reasons, many analysis approaches focus only on the packet headers.
Flow-level measurements produce an order of magnitude less data than packet-level measurements but have the drawback of losing the packet-level details. Measured aggregates are usually flows, defined as groups of packets sharing the same five-tuple (source and destination IP addresses and port numbers and transport protocol number) and delimited by specific timeouts. For example, Cisco’s Netflow, a specific flow record type supported by Cisco’s routers, has a 15 second inactive timeout and a 30 minute active timeout. The timeouts exist because of memory limitations in the routers, and because of them the aggregates may not be complete TCP connections.
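The following Python sketch illustrates how a five-tuple and two timeouts turn a packet stream into flow records. It is a simplified illustration and not Cisco's implementation: a real exporter expires records on timers, whereas this sketch checks the timeouts only when the next packet of the same five-tuple arrives. The record layout, the function name, and the sample packets are hypothetical; the default timeout values are the ones quoted above.

```python
from collections import namedtuple

Packet = namedtuple("Packet", "ts src dst sport dport proto size")


def aggregate_flows(packets, inactive_timeout=15.0, active_timeout=1800.0):
    """Group time-ordered packets into flow records keyed by the five-tuple."""
    active = {}    # five-tuple -> currently open flow record
    finished = []  # flow records closed by a timeout
    for p in packets:
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        flow = active.get(key)
        if flow is not None and (
            p.ts - flow["last"] > inactive_timeout
            or p.ts - flow["first"] > active_timeout
        ):
            # The open record has expired: close it and start a new one.
            finished.append(active.pop(key))
            flow = None
        if flow is None:
            flow = {"key": key, "first": p.ts, "last": p.ts, "packets": 0, "bytes": 0}
            active[key] = flow
        flow["last"] = p.ts
        flow["packets"] += 1
        flow["bytes"] += p.size
    finished.extend(active.values())
    return finished


# Hypothetical packets of a single TCP connection with a 19 second silence.
trace = [
    Packet(0.0, "10.0.0.1", "10.0.0.2", 40000, 80, 6, 1500),
    Packet(1.0, "10.0.0.1", "10.0.0.2", 40000, 80, 6, 1500),
    Packet(20.0, "10.0.0.1", "10.0.0.2", 40000, 80, 6, 1500),
]
records = aggregate_flows(trace)
print(len(records), [r["packets"] for r in records])
```

In this example one connection is split into two flow records because the silence exceeds the inactive timeout, which is the sense in which such aggregates may not be complete TCP connections.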
Some research has been done on sampling passive network measurements [44]. The idea is to apply classical sampling methods from mathematics on traffic measurements and, thus, record only a subset of all observed data. Sampling can be applied to packet monitoring or flow measurements. The amount of data recorded depends on the utilized scheme. For example, in [45] the authors propose a method for flow measurements called threshold sampling that dynamically controls sample volumes. Moreover, as stated in [44], the best choice of sample design, and, consequently, the amount of data reduction, depends on the traffic characteristics and statistics needed by applications. Unfortunately, not all types of analysis can be applied to sampled traffic measurements. Consider, for example, end-to-end path diagnostics through identification of retransmitted, reordered, and duplicated packets using the method described in [32]. The method inspects the ordering of TCP sequence numbers and IP identification numbers of packets passing by. A sampled packet stream no longer contains the necessary information for this type of analysis.
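To make the last point concrete, the sketch below uses a deliberately simplified stand-in for sequence-number based diagnostics: it counts data segments whose sequence number has already been seen. This is not the method of [32], which also takes IP identification numbers and packet ordering into account. The function names, the 1-in-2 systematic sampling scheme, and the sequence numbers are hypothetical.

```python
def count_repeated_segments(seqs):
    """Count segments whose TCP sequence number was already observed."""
    seen = set()
    repeated = 0
    for seq in seqs:
        if seq in seen:
            repeated += 1
        seen.add(seq)
    return repeated


def sample_every_nth(items, n):
    """Systematic 1-in-n sampling that keeps items 0, n, 2n, ..."""
    return items[::n]


# Hypothetical sequence numbers of one connection; 1461 is sent twice.
seqs = [1, 1461, 2921, 1461, 4381, 5841]

full_count = count_repeated_segments(seqs)
sampled_count = count_repeated_segments(sample_every_nth(seqs, 2))
print(full_count, sampled_count)
```

With the full stream the repeated segment is found, whereas in the sampled stream both copies happen to be dropped and the event becomes invisible.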
In this thesis, we concentrate on the analysis of traffic traces consisting of non-sampled TCP/IP packet headers. Flow-level measurements and sampled packet-level measurements do not convey enough information for the techniques we use. We need to be able to perform detailed packet-level analysis tasks, such as in [32], for instance. On the other hand, packet payloads are an unnecessary burden for our analysis.
2.2 Analysis of Passive Measurements
2.2.1 Challenges
The analysis of passively collected measurements is non-trivial as the amount of data is potentially very large. In addition, this data is typically stored during the measurement process into files in an unstructured format, making it difficult to process afterwards. This type of approach is often called off-line traffic analysis because the analysis is not done while measuring. In contrast, on-line analysis can reduce the amount of data that needs to be stored. However, as on-line analysis means performing the analysis tasks on a continuous stream of traffic, such a system needs to be able to process the input data at a rate equal to its arrival rate, which can severely limit the analysis tasks that can be performed. That is why in certain cases it would be desirable to perform a part of the analysis on-line in order to reduce the amount of data, and then perform the heaviest analysis tasks off-line. The raw measurement data, such as TCP/IP packet headers, is generally processed and analyzed in many ways. Each analysis task generates new data that needs to be stored and possibly processed again later. In other words, traffic analysis is often an iterative process: A first analysis is performed and based on the results obtained, new analysis goals are defined for the next iteration step. Today, handcrafted scripts and a large number of software tools specialized for a single task are commonly used as the tools for traffic analysis. Putting all these facts together, we identify three major challenges in the analysis of passive Internet measurements: management of data, optimization of the analysis process cycle, and scalability.
2.2.1.1 Management
We identify the problem of management as a result of three facts: 1) many tasks are solved in an ad-hoc way using scripts that are developed from scratch, instead of developing tools that are easy to reuse and understand, 2) traffic analysis involves large amounts of data, and 3) the data is typically stored in plain files in a file system.
By data we mean not only the traffic traces containing unprocessed packet data, but also all derived data generated by each analysis task. In [60], the authors describe this type of research work as data intensive science. They describe the data hierarchy in NASA terminology: “The raw Level 0 data is calibrated and rectified to Level 1 datasets that are combined with other data to make derived Level 2 datasets.” The authors continue: “Most analysis happens on these Level 2 datasets with drill down to Level 1 data.” We can see the analogy with the analysis of passive traffic measurements: For example, Level 0 data is the unstructured raw packet traces, Level 1 data is structured packet data, and Level 2 data is aggregated data such as connection-level statistics. However, the tools used generally do not provide any support for managing this large amount of data in this way. Instead, the data is typically archived in plain files in a file system. The problem is that data stored in files has no structure and files contain no metadata beyond a hierarchical directory structure and file names. In fact, we can see the ad-hoc analysis approach as a result of the plain file data storage, because such unstructured and unannotated data encourages ad-hoc techniques to parse and access the data. The situation gets worse when time passes: Depending on the number of files and the skills of the researcher to properly organize them, the later retrieval of a particular data item may be a non-trivial problem. As Paxson [98] has pointed out, the researchers themselves often cannot reproduce their own results. The issue of preserving research data in a larger scale is also discussed in [66].
2.2.1.2 Analysis Cycle
A common workflow to analyze network traffic proceeds in cycles (see Figure 2.1). Due to the lack of structure and metadata, the semantics of the data are not stored during the analysis process. As a result, reusing intermediate results becomes cumbersome and usually the process needs to restart again from the raw data after modifying or changing parameters of the analysis scripts or tools.
Let us take as an example the analysis of BitTorrent [65], a peer-to-peer system for file distribution. When following the analysis steps, one can identify three iterations of the analysis. In a first iteration, we studied the global performance of BitTorrent in terms of how many peers succeeded in downloading the complete file. From the results, we noticed a large flash-crowd of peers arriving at the beginning. In a second iteration, the performance of individual sessions was studied. First, the raw data was analyzed on the basis of individual sessions that either had successfully completed the file download or aborted. In the next step, the performance of the individual sessions in both categories was computed. The information from the previous cycle was combined to obtain average performance of a session during the flash-crowd period and after. In a last iteration, we considered the geographical location of the clients that successfully completed their download to study download performance per geographic region. Since the semantics of the data were not stored during the analysis process, reusing intermediate results (e.g. to integrate geographic information) turned out to be cumbersome and most of the time the data extraction had to be done again starting from the raw data after modifying the scripts.
The issue originates from the fact that the tools do not “understand” the data. Structuring and annotating the data can tell the tool where in a sequence of bytes to find the data values and what they mean, i.e. the semantics. In this way, building on intermediate results becomes much easier, since each tool does not need to separately parse each piece of data, and the tool and its user know the type and semantics of the data they are handling. For example, subtracting a timestamp from another one automatically produces an interval. Subsequently, one comparison operation can determine whether a third timestamp belongs to this produced interval.
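The benefit of typed data can be illustrated with a small Python sketch in which the datetime and timedelta types of the standard library stand in for the timestamp and interval types of a DBMS. The variable names and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Typed timestamps: no per-tool parsing of byte sequences is needed.
first_packet = datetime(2006, 10, 30, 12, 0, 0)
last_packet = datetime(2006, 10, 30, 12, 0, 5)

# Subtracting one timestamp from another automatically yields an interval length.
duration = last_packet - first_packet


def within(start, length, t):
    """True if timestamp t lies in the interval of the given length starting at start."""
    return timedelta(0) <= t - start <= length


probe = datetime(2006, 10, 30, 12, 0, 3)
print(duration.total_seconds())
print(within(first_packet, duration, probe))
```

Because the type system knows what a timestamp and a duration are, the comparison is a single expression, whereas a script working on plain text files would first have to locate and parse the values.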
[Cycle diagram. Steps shown: raw data; filter; filtered data; process; combine with previous results; store results in files; interpret results; define new analysis task]
Figure 2.1: Typical cycle of tasks for the iterative process for off-line traffic analysis.
2.2.1.3 Scalability
Scalability is an important issue in traffic analysis as the amount of data is typically large. Often analysis tools are first applied to process small amounts of data. When they are then applied to larger data sets, it often turns out that the run-time or memory requirement of the tool grows more than linearly with the amount of data, in which case modifications and heuristics are introduced that often sacrifice quality of the analysis for performance. Even a measurement data set larger than a few gigabytes may pose serious problems for certain tools because of excessive memory or run-time requirements. An example is the well-known tool tcptrace [13], which takes as input packet data captured using tcpdump and can produce summary connection-level data or extract an individual flow. tcptrace uses a heuristic to determine the end of a TCP connection. While this heuristic is nowhere clearly documented, the major part of it is an inactivity timeout, as is common in flow record definitions. Despite this heuristic, we were unable to process files larger than 6GB, as we will detail in Section 4.1.
Scalability can be thought of also in another dimension: the depth of the analysis. If data management is an issue, the depth, i.e. the complexity, of the analysis that can be performed in practice may be greatly reduced.
Table 2.2: Characteristics of different approaches for traffic analysis. Traffic volumes are in the order of magnitude.
Approach   Aggregation level   Traffic volumes   Data mgt   MD mgt   SW mgt   PA   IA   On-line
Ad-hoc scripts   varies   varies
Specialized tools (tcptrace [13])   varies   varies   X (X)
Toolkits (CoralReef [79])   varies   varies   X X (X)
NetLogger/NetMiner [21]   flow   10Gbps   X X
LOBSTER [2]   flow   10Gbps   X X
DBMS-based:
Gigascope [39]   packet stream   Gbps   X X
Internet Traffic Warehouse [34]   packet   10^2 MB/day   X X X
IPMon [56]   packet   TB   X
InTraBase   packet   10^1 GB/trace   X X X X X
X = feature is supported in the approach
(X) = feature is supported in some members of the category
blank = feature is not at all supported or is implemented in an ad-hoc manner
MD = Metadata
SW = Software
PA = Publicly Available
IA = Integrated Approach
2.2.2 Database Systems to the Rescue?
The challenges and issues discussed in the previous section raise the question whether database management systems (DBMSs) might ease the process of analyzing passively collected traffic measurements. Traditional database systems (DBSs) have been used for more than 40 years for applications requiring persistent storage, complex querying, and transaction processing of data. Furthermore, DBMSs are designed to separate data and their management from application semantics to facilitate independent application development. Internet protocols have a standardized behavior that leads to well-structured traffic data in the form of packets, which could therefore potentially be handled easily with a DBMS. Both DBSs and plain file systems provide persistent data storage. Thus, in both of them data is physically stored on disk and handling of that data is similar in both approaches. Consequently, one may state that there is only a thin line between file systems and DBSs. The authors of [60] state: “Most file systems can manage millions of files, but by the time a file system can deal with billions of files, it has become a database system.” We present our DBMS-based approach called the InTraBase in Chapter 3, where we also elaborate more on the benefits and gains from using a DBS for analyzing passive packet-level measurements.
2.2.3 Existing Approaches
Table 2.2 summarizes and compares the different existing approaches for passive traffic analysis. For comparison, we have also included the InTraBase, but discuss it in detail in Chapter 3. Data, metadata, and software management are related to the issues in management and analysis process cycle (see Section 2.2.1). Because publicly available solutions are generally more interesting for the research community, we include public availability as a metric. Integrated approach means that in addition to data, metadata and software are also managed in an integrated way. It is a feature of our approach only, which will be described in Section 3.1. Finally, the capability to analyze traffic on-line is the last feature that we consider. By on-line analysis we mean the ability to perform the analysis tasks on a continuous stream of traffic, i.e. to process the input data at a rate equal to its arrival rate. Naturally, off-line analysis is the other option.
The first approach listed is ad-hoc scripts. However, this approach does not have any of the characteristics we look for. The next step forward is specialized tools such as tcptrace [13], which can analyze a tcpdump trace and produce statistics or graphs to be visualized using the xplot software. Still, none of the important management issues are considered by these tools.
There have been some efforts toward complete analysis toolkits that are flexible enough to be used in customized ways. One example is the CoralReef software suite [79] developed by CAIDA, which is a package of device drivers, libraries, classes and applications. The programming library provides a flexible way to develop analysis software. The package already contains many ready-made solutions. The drivers support all the major traffic capturing formats. This approach concentrates on the software management aspect but addresses neither the problem of handling and managing the data, i.e. source data and results, nor that of managing related metadata. Scalability is also an issue.
The next level of approaches is DBMS-based systems. They usually involve large amounts of measurement data and therefore require a lot of attention to the organization and handling of the data, i.e. raw traffic data, associated metadata, and derived analysis data. Several of these systems originated in industrial projects done by or aimed at Internet Service Providers (ISPs): Gigascope, Internet Traffic Warehouse, and IPMON. Consequently, these systems are tailored mostly to fit the needs of large ISPs. Unfortunately, none of them is publicly available.
Sprint labs initiated a project called IP Monitoring Project (IPMON) [57] [56] to develop an infrastructure for collecting time-synchronized packet traces and routing information, and to analyze terabytes of data. In their architecture, a DBS is used for metadata management only and metadata is stored about raw input data sets, analysis programs, result data sets, and analysis operations. Details about metadata management can be found in [93]. IPMON has adopted CVS for managing software.
Gigascope [39] is a fast packet monitoring platform developed at AT&T Labs-Research. In fact, it is not a traditional DBMS but a Data Stream Management System (DSMS) that allows on-line analysis of traffic arriving at high rates. As a powerful DSMS, Gigascope can handle a high rate traffic stream in real-time. However, the real-time requirements imply that the input data are processed in one pass, which evidently imposes limits on the operations that can be performed. We refer the reader to [99] for a detailed assessment of the suitability of DSMSs for traffic analysis. Gigascope is specialized for network monitoring applications such as health and status analysis of a network or intrusion detection. Gigascope manages neither data nor metadata, which requires another solution. At AT&T, this solution is typically their proprietary data warehouse Daytona [1]. Gigascope has a registry for query processes that provide output streams according to the associated query. Users can also define their own functions and data types for the queries. In this way, Gigascope addresses the software management problem.
The Internet Traffic Warehouse [34] is a data warehouse for managing network traffic data built at Telcordia. Analysis results on application level are provided by storing application information about traffic in addition to IP packet headers. Using a suite of programs, input traffic traces are filtered and parsed into temporary files, which are subsequently loaded into the data warehouse. This system is especially suitable for accounting at ISPs.
2.3 Conclusions
In this chapter, we have explained how the Internet can be measured and how our work fits into this context. Off-line analysis of passively collected traffic measurements is challenging because of issues in management of tools and data, suboptimal analysis process cycle, and scalability of tools in terms of the amount of data that they can process and the depth of analysis that can be reached. We have suggested that a DBS could help overcome these issues. In the next chapter, we show that this is indeed the case when we describe our DBMS-based solution.
InTraBase: Integrated Traffic Analysis Based on Object-Relational DBMS
As discussed in the previous chapter, off-line analysis of passive traffic measurements is challenging from several points of view. Specifically, one needs to consider issues in management, analysis process cycle, and scalability. We also stated that a DBS could overcome many of these issues. In this chapter, we show that it is indeed the case. We introduce our DBMS-based approach for off-line analysis of passive packet-level measurements called InTraBase. We also describe the prototype of the InTraBase built on top of PostgreSQL, an open-source object-relational DBMS. Most of the work described in this chapter has been published in [107].
3.1 Approach
3.1.1 InTraBase and Other Approaches
Table 2.2 in Chapter 2 summarizes the differences between the InTraBase and the various other existing approaches for analyzing passive traffic measurements. The following are the main characteristics that differentiate the InTraBase from the other approaches:
• InTraBase is intended only for off-line analysis and does not address packet capturing or on-line monitoring issues at all.
• InTraBase is designed for intensive packet-level analysis of moderate-size (<50GB) traffic traces.
The InTraBase is not designed, for instance, to monitor the health of a large ISP’s network in real-time due to the immense amounts of data that would need to be treated constantly. It is rather an exploratory tool for fine-grained analysis of Internet traffic.
We wish to perform complex traffic analysis tasks that cannot be performed with a single pass over the input data. For this, we need to be able to make multiple iterations over the analysis process cycle, which is generally impossible with systems capable of on-line analysis. In addition, we perform filtering and parsing operations within the DBS, as opposed to processing the measurement data before loading it into the DBS, and preserve the raw measurements stored in the DBS as unchanged as possible.
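To illustrate why some analyses cannot be done in a single pass, the following minimal sketch contrasts a metric that a one-pass system can compute with one that requires reading stored data a second time. It is not part of the InTraBase; the function names and packet sizes are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List


def mean_size_one_pass(sizes: Iterable[int]) -> float:
    """Running mean: each packet size is seen once and can then be discarded."""
    count = 0
    total = 0
    for size in sizes:
        count += 1
        total += size
    return total / count if count else 0.0


def fraction_above_mean(sizes: List[int]) -> float:
    """Needs two passes: the mean is only known after the first pass,
    so the stored data must be read again to compare against it."""
    if not sizes:
        return 0.0
    mean = mean_size_one_pass(sizes)            # pass 1
    above = sum(1 for s in sizes if s > mean)   # pass 2
    return above / len(sizes)


# Hypothetical packet sizes in bytes.
trace = [40, 40, 1500, 1500, 1500, 576]
print(mean_size_one_pass(iter(trace)))
print(fraction_above_mean(trace))
```

The first function works on any iterator, including a live stream, whereas the second requires the whole trace to be stored, which is the situation an off-line system is built for.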
Later, we present the first prototype of the InTraBase, an implementation of our DBMS-based approach. The goal is to devise a platform for traffic analysis that facilitates the researchers' task. Our solution:
I. conserves the semantics of data during the analysis process;
II. enables the user to manage his own set of analysis tools and methods;
III. enables the user to share his tools and methods with colleagues;
IV. allows the user to quickly retrieve pieces of information from analysis data and simultaneously develop tools for more advanced processing;
V. includes a portable graphical user interface for facilitating exploratory analysis.
3.1.2 Fully Integrated Solution Based on Object-Relational DBMS
We advocate a DBMS-based approach for traffic analysis. First of all, we completely manage the collected data within the DBS. In other words, we process the “raw” measurement data as little as possible prior to loading it into the DBS. A high-level architectural view of our solution is shown in Figure 3.1. We store data from different sources in the DBS. The data that is uploaded into the DBS is referred to as base data. Examples of base data are packet traces collected using tcpdump, but also logs or other data obtained from the application layer, or time series created with the help of Web100 [85], which makes it possible to track precisely the state of a TCP connection at the sender.
[Diagram labels: Network link, Application, IP, TCP, tcpdump, Web100, Application logs, Raw base data files, Preprocess, DBMS (Base data, Derived data, Functions, Queries, Results, Metadata), Off-line analysis]
Figure 3.1: High-level Architecture of the DBS.
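The idea of loading base data with minimal preprocessing and filtering it inside the database can be sketched as follows. This is an illustrative sketch only: an in-memory SQLite database stands in for the DBMS (the prototype described in this chapter is built on PostgreSQL), and the table layout, the view name, and the packet records are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the DBMS (assumption; the prototype uses PostgreSQL).
conn = sqlite3.connect(":memory:")

# Hypothetical base-data table: one row per captured packet, stored as captured.
conn.execute(
    """CREATE TABLE packets (
           ts REAL, src TEXT, sport INTEGER,
           dst TEXT, dport INTEGER, length INTEGER)"""
)

# Hypothetical records as they might be parsed from a tcpdump trace.
rows = [
    (0.000, "10.0.0.1", 4321, "10.0.0.2", 80, 60),
    (0.010, "10.0.0.2", 80, "10.0.0.1", 4321, 1500),
    (0.020, "10.0.0.3", 5555, "10.0.0.2", 22, 100),
    (0.030, "10.0.0.2", 80, "10.0.0.1", 4321, 1500),
]
conn.executemany("INSERT INTO packets VALUES (?, ?, ?, ?, ?, ?)", rows)

# Filtering happens inside the database through a view; the base table is untouched.
conn.execute(
    "CREATE VIEW web_packets AS "
    "SELECT * FROM packets WHERE sport = 80 OR dport = 80"
)

total_rows = conn.execute("SELECT COUNT(*) FROM packets").fetchone()[0]
web_rows = conn.execute("SELECT COUNT(*) FROM web_packets").fetchone()[0]
web_bytes = conn.execute("SELECT SUM(length) FROM web_packets").fetchone()[0]
print(total_rows, web_rows, web_bytes)
```

Because the filter is a view rather than a preprocessing step, the filter condition can be changed later without reloading the trace, and the unfiltered base data remains available for other analyses.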
Once the base data is uploaded into the DBS, we process it to derive new data that is also stored in the DBS. This processing includes, for example, computations of aggregate metrics for each identified connection from the packet-level tcpdump base data. All the processing is done within the DBS using the functions and queries of the DBS (see Figure 3.1). The DBS contains not only all the data but also reusable elementary functions and more complex tools built on top of the elementary functions, as illustrated in Figure 3.2. The boxes on the lowest layer represent the base data uploaded into the DBS. The middle layer contains the elementary