Studying telecommunication applications - A causal approach to the study of telecommunication n

Figure 3.10: Example of a Gaussian copula

give smoother estimations of the marginals that will be the support for future predictions.

A brief introduction to copulae can be found in Appendix D.

3.5 Studying telecommunication applications

The previous section showed that, in practice, applying causal theory to real scenarios requires solving a number of practical issues, because existing methods must be adapted to systems where many constraints need to be considered. Assuming that these issues have been solved, we still need to mention some intrinsic constraints that are linked less to the application of causal theory than to the field of telecommunication network studies itself.

3.5.1 Adopting a new approach to study a known system

The previous section highlighted the difficulties of applying causal theory. Applying this theory to systems that comprise a large number of parameters and complex relationships, and whose models are often difficult to interpret, is therefore highly ambitious.

One important challenge in applying causal theory to the study of telecommunication networks has been to find the right trade-off between two goals: exploiting the benefits of the causal approach by disclosing new mechanisms, dependencies or properties, and studying systems where the interpretation of causal models can be supported by our domain knowledge.


On the one hand, telecommunication networks have been widely studied, and many theoretical and experimental works have been published [PFTK98, MSBZ07]; see also [LGBVBP10] and references therein. On the other hand, most existing studies present methods specific to a given application or context.

While most of the studies that will be presented in Chapter 5 and Chapter 6 target a given application and rely on domain knowledge to interpret the results, the method we developed in our work aims to be reusable in different contexts.

However, due to the relative novelty of our approach and its inherent limitations, presented in this chapter, finding an interesting problem whose results can be interpreted and still disclose new mechanisms has been an important challenge.

3.5.2 The difficulty of choosing the granularity of our models

One important notion to keep in mind when designing a causal study is that of determinism. We call a relationship between two (or more) parameters, X and Y, deterministic if it can be written

Y = f(X). (3.23)

Notice the presence of the = sign instead of the := used in Equation (1.2). Equation (3.23) expresses the absence of any randomness in the relationship between X and Y: knowing the value of X defines the value of Y unequivocally. An example would be Y = 3 ∗ X, in which case we can equivalently write X = (1/3) ∗ Y. Such dependencies are not causal and violate the faithfulness assumption (see Section 1.3.4).
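To make the consequence of Equation (3.23) concrete, the following minimal Python sketch simulates the example Y = 3 ∗ X together with a hypothetical third parameter Z that depends on Y through additive noise. The parameter Z, the value ranges and the function names are assumptions introduced for illustration only. The sketch checks that, once X is fixed, Y takes a single value, so Y is trivially independent of Z given X even though Z is generated directly from Y. This is an independence that the generating structure X → Y → Z does not imply, which is the kind of faithfulness violation discussed above.

```python
import random
from collections import defaultdict


def simulate(n=1000, seed=0):
    """Draw n samples (x, y, z) from a toy system with a deterministic link."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = rng.randint(0, 4)        # random input (hypothetical range)
        y = 3 * x                    # deterministic relationship: Y = f(X) = 3 * X
        z = y + rng.gauss(0.0, 1.0)  # hypothetical noisy effect of Y
        samples.append((x, y, z))
    return samples


def distinct_y_per_x(samples):
    """Count how many different values Y takes within each stratum X = x."""
    strata = defaultdict(set)
    for x, y, _ in samples:
        strata[x].add(y)
    return {x: len(ys) for x, ys in strata.items()}


samples = simulate()
counts = distinct_y_per_x(samples)

# Given X, Y is constant: it cannot carry any additional information about Z.
print(all(c == 1 for c in counts.values()))
# The relationship can be inverted exactly, so it has no preferred direction.
print(all(y / 3 == x for x, y, _ in samples))
```

Because Y is a function of X and X is a function of Y, a conditional independence test cannot tell which of the two parameters is the direct cause of Z; both conditioning sets make the other parameter redundant.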

It is important to notice that in telecommunication networks the only randomness comes from failures or congestion events. The implementation of a given protocol at a given host is mainly deterministic: it corresponds to a sequence of instructions that are run upon a given input event or a trigger from the external environment. The applications running on client and server machines react with deterministic mechanisms to (possibly) random events. The presence of deterministic relationships between the parameters of a system makes the inference of its causal model more complex. In our work, we do not address the issue of uncovering causal dependencies in the presence of deterministic dependencies, which violate the faithfulness assumption used in most causal model inference algorithms. Some works exist to infer causal models in the presence of deterministic dependencies [Bau09]. It is therefore important to choose the level of granularity of the models of the systems we study. If we try to understand the mechanisms that correspond to the implementation of a given protocol on a (set of) local or remote machine(s), there is a high risk of focusing on a problem closer to reverse engineering, where the dependencies between the different parameters are deterministic.

Conversely, if we stay at too high a level, we might not be able to capture the mechanisms that rule the functioning of a given system, and thus fail to model the system performance. Trying to model the functioning of a car without opening the hood would very likely lead to an inaccurate model.

A counter-example, which will be presented in Chapter 4, concerns the study of the Dropbox application. Dropbox implements its own congestion control mechanisms at the application layer, which to some extent bypasses the TCP mechanisms (Dropbox still runs on top of TCP; however, the rate at which data is provided to TCP turned out to vary with the value of parameters such as loss or delay, which are typically used by the TCP protocol). The causal study of the Dropbox application performance eventually turned into an exercise in reverse engineering the Dropbox protocol. First, it is not our goal to reverse engineer proprietary software. Second, the Dropbox protocol is defined by a set of (proprietary) specifications implemented in the software of the different agents of the Dropbox system. An important risk is to end up trying to model deterministic dependencies with purely statistical considerations, and to violate the assumption of faithfulness that supports the inference of causal dependencies with the algorithm presented in Section 1.3.4.

It is therefore important to choose not only the application that we want to study but also the degree of abstraction (or granularity) of our model.

3.5.3 Adopting a progressive study approach

In the last two subsections we presented the challenges inherent to the subject of our work that influence the choice of our studies, namely the existence of many telecommunication network studies and the determinism of the protocols that rule telecommunication network communications.

To overcome both of these constraints, and the complexity of applying a causal approach to real scenarios, we proceed in three steps. First, we study an emulated network where we can perform interventions and validate our method. This step is very important, as the reason for using a causal approach lies in its ability to predict the effect of interventions. Second, we study FTP traffic using our own FTP server, which simplifies the study. Finally, we use measurements from a third party to study the impact of the DNS service on telecommunication network performance.
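The role of the first step can be illustrated with a toy simulation. The sketch below is not the emulated network used in this work: it is a hypothetical three-parameter system (a confounder w, a parameter x on which we can intervene, and a performance metric y) whose coefficients are chosen arbitrarily. It compares a naive observational estimate of the effect of x on y, an estimate adjusted for the confounder (back-door adjustment), and the effect measured by actually forcing x in the simulator. By construction the true effect of x on y is 1.0, while the naive estimate is expected to be biased towards roughly 2.2.

```python
import random


def sample(rng, forced_x=None):
    """Draw one (w, x, y) observation from a hypothetical toy system."""
    w = 1 if rng.random() < 0.5 else 0       # confounder (e.g. a congestion flag)
    if forced_x is None:
        p_x = 0.8 if w == 1 else 0.2         # passive observation: x depends on w
        x = 1 if rng.random() < p_x else 0
    else:
        x = forced_x                         # intervention: x is set regardless of w
    y = 1.0 * x + 2.0 * w + rng.gauss(0.0, 0.5)
    return w, x, y


def mean(values):
    return sum(values) / len(values)


def naive_effect(data):
    """E[Y | X=1] - E[Y | X=0], ignoring the confounder."""
    y1 = [y for _, x, y in data if x == 1]
    y0 = [y for _, x, y in data if x == 0]
    return mean(y1) - mean(y0)


def adjusted_effect(data):
    """Back-door adjustment: sum over w of P(w) * (E[Y|X=1,w] - E[Y|X=0,w])."""
    total = 0.0
    for w_val in (0, 1):
        stratum = [(x, y) for w, x, y in data if w == w_val]
        p_w = len(stratum) / len(data)
        y1 = [y for x, y in stratum if x == 1]
        y0 = [y for x, y in stratum if x == 0]
        total += p_w * (mean(y1) - mean(y0))
    return total


def intervened_effect(rng, n):
    """Effect measured by forcing x in the simulator (the validation step)."""
    y1 = [sample(rng, forced_x=1)[2] for _ in range(n)]
    y0 = [sample(rng, forced_x=0)[2] for _ in range(n)]
    return mean(y1) - mean(y0)


rng = random.Random(42)
observed = [sample(rng) for _ in range(20000)]
naive = naive_effect(observed)
adjusted = adjusted_effect(observed)
intervened = intervened_effect(rng, 20000)
print(f"naive={naive:.2f} adjusted={adjusted:.2f} intervened={intervened:.2f}")
```

In a controlled environment the prediction made from passive observations (here, the adjusted estimate) can be compared with the outcome of the real intervention, which is not possible when the data come from a third party.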


Chapter 4 Dropbox protocol study

4.1 Introduction

In Chapter 3, we presented the different constraints inherent to the study of telecommunication networks and the solutions that were developed to overcome such constraints and reuse the existing solutions and implementations that support a causal study. From Chapter 3 it could appear that the development of the presented solutions has been a smooth and straightforward process, which is not the case. In addition, the solutions we presented in Chapter 3 require more resources, in terms of data and computational power, than using the Z-Fisher criterion to test independences and histograms to model parameter distributions. One could legitimately wonder whether such additional cost is justified.

In this chapter, we briefly present a causal study of Dropbox traffic performance. The study of the Dropbox application was the first causal study that we carried out, and many building blocks are missing from it. The causal model inference of the system that we study relies on two algorithms used naively. The first algorithm is the PC algorithm [SG91] and its use of the Z-Fisher criterion [And84] to test for independence. The second algorithm is the kPC algorithm [TGS09] and its implementation of an independence test based on the Hilbert-Schmidt Independence Criterion [GFT⁺08], along with its implementation of a nonlinear regression to orient the edges in the Bayesian graph corresponding to the causal model of the system we study.

In a first study, we use Intrabase [SBUK⁺05] to isolate the periods of Dropbox traffic during which the application is not limiting the performance. We can then focus our study of Dropbox performance on those periods, where only the network, and Dropbox's reactions to network events, impact the user performance.
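For readers unfamiliar with the Z-Fisher criterion, the following minimal Python sketch shows the usual form of such a test with a single conditioning variable: the partial correlation is computed from pairwise Pearson correlations, passed through the Fisher z-transform, and compared with a standard normal distribution. This is a generic textbook formulation, not the implementation used in this study; the function names and the simulated chain X → Y → Z are assumptions made for illustration. The test presupposes linear dependences and Gaussian parameters.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, z, y):
    """Correlation of x and z after removing the linear effect of y."""
    r_xz, r_xy, r_zy = pearson(x, z), pearson(x, y), pearson(z, y)
    return (r_xz - r_xy * r_zy) / math.sqrt((1 - r_xy ** 2) * (1 - r_zy ** 2))


def fisher_z_pvalue(r, n, cond_size):
    """Two-sided p-value for H0: the (partial) correlation r is zero."""
    z = 0.5 * math.log((1 + r) / (1 - r))         # Fisher z-transform
    stat = math.sqrt(n - cond_size - 3) * abs(z)  # approx. N(0, 1) under H0
    return math.erfc(stat / math.sqrt(2))         # equals 2 * (1 - Phi(stat))


# Hypothetical linear Gaussian chain X -> Y -> Z.
rng = random.Random(1)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [2 * xi + rng.gauss(0, 1) for xi in x]
z = [-yi + rng.gauss(0, 1) for yi in y]

r_marginal = pearson(x, z)
r_partial = partial_corr(x, z, y)
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_partial = fisher_z_pvalue(r_partial, n, 1)
print(f"marginal:    r={r_marginal:.3f} p={p_marginal:.3g}")
print(f"given Y:     r={r_partial:.3f} p={p_partial:.3g}")
```

In this chain X and Z are strongly correlated, but their partial correlation given Y is close to zero, which is the pattern the PC algorithm relies on to remove the edge between X and Z.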

We then present a second study where we define metrics that better capture the different parameters that influence the performance of the Dropbox application, taking into account the specificity of the Dropbox protocol. In these two studies we face the different challenges inherent to the study of telecommunication network based applications and complex systems. We are able to draw important conclusions that support the choices presented in the design of our final solution in Chapter 3 and helped to design the studies presented in Chapter 5 and in Chapter 6. Hence, while this study did not lead to the discovery of exploitable mechanisms and predictions, it played an important role in the understanding of the constraints of causal studies in the field of telecommunication networks.

From the document A causal approach to the study of telecommunication networks (pages 89-94)