A Model for Transformation of Self-Similar Traffic into Poisson’s Arrival Packets
Gennadiy I. Linets, [email protected]
Svetlana V. Govorova, [email protected]
Sergey V. Melnikov, [email protected]
Victor V. Medenec, [email protected] North-Caucasus Federal University, Stavropol, Russian Federation
Abstract
Using functional transformation, we propose an approach allowing to convert a self-similar input packet stream of multiservice traffic and to obtain a stream with the properties of the elementary stream and the variation coefficient equal to one.
1 Introduction
When used in transportation networks, various control mechanisms inevitably experience nonlinear dependence due to the objective limitations of the available resources leading to various conflict situations and manifestation of the fractal properties of the network load, which cannot be resolved by simple methods. If the flow control mechanisms are not used in transportation networks, the network traffic demonstrates fractal properties to a lesser extent. The problem arises with the use of flow control and congestion avoidance mechanisms, when additional non-linearity appears. In such situations, the relation between traffic fluctuations and network control mechanisms may become complex. The problem of the correct allocation of network resources becomes particularly acute in such situations. The arising non-linearity with possible dynamic behavior of systems in packet based networks results in the manifestation of random properties, which entails the following problems.
Required:
1. To check the network traffic for self-similarity.
2. To measure the numerical values of the Hurst exponent.
3. To know the change rate of traffic characteristics when subjected to network management tools in the course of transport networks operation.
4. To assess the impact of traffic on the productivity and efficiency of network management tools.
In reality, the traffic verification for self-similarity is an extremely difficult task. The problem is that the real world always operates finite datasets. Therefore, checking whether a route is self-similar is usually an impossible practical task.
It is important to explore various self-similarity properties in real traffic. The difficulty is that even if certain self- similarity properties are confirmed, it does not conclude that the analyzed data are self-similar in structure, as other impacts may cause the same traffic characteristics (for example, the arising transiency). Some transient processes (such as processes with bias levels) may result in similar properties. This means that the pulsing network traffic can be caused by both long-term dependence and transiency of the observed process. Studies have shown that the present transiency may result in wrong conclusions when analyzing the results of self-similarity tests. It is also essential to apply background information on the change rates within the critical values of self-similarity parameters. This would allow: to simulate such processes in a real network environment, generating processes with given characteristics; to study the transport network response when its inputs are exposed to self-similar traffic; to compile practical steps for self-similar traffic management; to obtain analytical dependencies prior to optimization of the topological structure of transport networks.
2 Current Problem Analysis
The study of self-similar traffic management issues in the early stages of its development [She07]. The scale invariant traffic structure brings new complexity to the overall picture of the transport network management, which complicates the task of providing the required quality of service and efficient use of resources. The pulsating traffic structure with self-similar properties implies the existence of congestion (high activity) periods in long time periods, which negatively affects the overload management. However, the long-term dependence inherently implies the existence of an unusual
Copyrightc 2017 by the paper’s authors. Copying permitted for private and academic purposes.
traffic correlation structure, which can be used for handling the so-called overload control. Shelukhin et al.[She07]
demonstrate the possibility of forecasting random processes in terms of self-similar traffic transmission with high reliability and at long time intervals. To achieve this, they propose a traffic regulation mechanism based on the multi- scale structure of overload control, which may be used to enhance the efficiency of the network functioning [She11].
The pulsating traffic structure invariant to the scale impacts the quality indicators and performance of a network and is incompatible with the traditional models of network traffic. Therefore, it is an extremely important practical task to determine the causes and reduce the consequences of self-similarity influence. When developing efficient integrated network structures, within which the guarantee of the requested QoS (Quality of Service) is maintained using the resources to the maximum effect, the issues of understanding and justification of self-similarity based on the physical principles of the real network environment take the primary focus.
Having analyzed 717 reference sources (mainly US), the authors present the main factors that may produce various types of long-term dependence of network traffic [She07]: user behavior; data generation, structure and retrieval; traffic packing; network control means; closed-loop control mechanisms; network development. It is noteworthy that the effect of such mechanisms allows to impact the traffic structure by changing its nature, and if self-similarity is already inherent in the traffic, in some cases it is reinforced [Lel94]. However, it is difficult to imagine that based on such factors, self- similarity can occur independently. The emergence of fractal properties of traffic at the application level is possible only where the source itself is a chaotic dynamic system generating traffic with self-similarity properties, which is unlikely.
[Yaz06]
Numerous measurements have demonstrated that such traffic structure is not an irrelevant side effect but a unique feature developed within the existing distributed network structures with intermediate accumulation, which include modern transport and telecommunications networks. Therefore, it is natural to believe that self-similarity is based on just one causative factor, which, despite the diversity of manifested forms, incorporates their entirety, by the same method of information processing [Lel94]. Hence, self-similarity is an inherent property of packet networks.
The statistical properties of flow packets are determined by the following factors [Lel94, Mil03]:
1. Randomly nature of traffic as a bit stream generated by the information source.
2. Specifics of the transformation of the bit stream into the packet stream subject to the technological mechanisms of transformation.
3. Task-specific transformation of the flow in course of aggregation in order to improve quality indicators.
Since in such three cases the packets arrive irregularly, the time interval between successive arrivals of packets is a random value [Mil03]. The statistical characteristics and the structure of the received packet stream, in turn, are affected by a range of factors [Lel94]:
a) Specifics of operating systems with time division. Each process in the system developing in the "virtual time", primarily subject to the available resources. Within the process of transmitting information from the application to the link level, the time intervals between the phases of the pack formation are uneven, even if the generated data flow is uniform.
b) The dynamics of the information application work using the means of interworking is an important factor in determining the nature of an aggregated data stream. An application may generate data at a rate determined by the available resources (buffer memory and bandwidth of the communication channels).
c) Implementation of the transport layer protocol. It provides reliable packet delivery and regulation of their rates with the use of a closed feedback loop between the receiver and the data source.
d) Features of link level protocols, such as collision, occurring at the division of the transmission medium increase the time intervals between the packets with the growing link loads. This is particularly evident in networks using TCP/IP protocols implementing "window management". According to [She11], the traffic not exhibiting any self-similar properties previously, having gone through the network hubs is transformed into a network fractal.
e) Characteristics and administrative restrictions imposed on the intermediate network nodes in order to ensure the specified service quality parameters.
More complex relations in a data stream arise with the use of ATM and Frame Relay protocols which imply integrated quality control functions for virtual connections using buffering, prioritization and protection strategies [Lel94]. In this case, traffic shaping is focused on changing the flow packet characteristics compounding the virtual path or channel so as to reduce the peak rate, limit the pack length or reduce the delay time by arranging the pack in time, as well as to plan the traffic (Traffic Shaping). The right for traffic shaping is provided to both network operators and users in order to agree on its parameters, passing through the "user network" interface, with an agreement on the traffic. For network operators, traffic shaping becomes an effective method of the optimal use of network resources by the "delay- performance" criterion [Lel94].
One of the properties of multiservice traffic is its structural complexity, which is characterized by the variation coefficient c(). The variation coefficient c() is the dispersion characteristic of the pack flow defined as the ratio of the mean standard deviation () of the values of time intervals between the arrival of successive packets to their expected value m(), i.e. с()()/m()
Multi-service transport packet streams of modern telecommunication networks are characterized by transiency and self-similarity, which, unlike the pack flows with Poisson distribution, have a variation coefficient greater than one, i.e.
1 ) (
c . According to [Čuč09], the efficiency of using network resources for the traffic with the packet stream with
Poisson distribution is much higher than the self-similar packet stream with the properties mainly described by Hurst coefficient H.
The work proposes a model, based on the functional conversion used to reduce the structural complexity of the self- similar input stream pack and to obtain a stream with the elementary stream properties and the variation coefficient is equal to one.
3 Problem Statement
Based on the research target, we assume that the input packet stream G (τ1) is subjected to the identification by the known analytical techniques so as to determine its self-similar properties. We assume that the identification established the time intervals distribution density between the packets G (τ1) as described by the Pareto Law. The function mapping link of the switching node converts the distribution density of time intervals between flow packs G (τ1), into the exponential law G (τ2).
As a result of the function mapping of the density distribution of time intervals between flow packs G (τ1), its structural complexity was reduced to one, creating stream G (τ2) with the properties of the elementary stream.
It is required to:
1. Develop a model converting the input packet stream G (τ1) with the distribution density of time intervals between packs as per the Pareto Law and structural complexity c(1)1, into the flow G (τ2) with the density of time intervals distribution between packs described the an exponential distribution law and with the structural complexity of c(2)1. 2. Find a functional link between the Hurst exponent and the variation coefficient determining the structural complexity of the input packet stream with known self-similar properties.
To ensure the solution accuracy we have introduced limitations:
1. Pack length L0 is a fixed value;
2. The length of the time interval τi is determined by the time of the pack forming in the buffer;
3. At relatively low values of the pack length L0 with a small error, the mean value of the process rate rср may be replaced by the instantaneous rate r(t) in the interval, i.e. rср = r(t);
4. The average value of the process rate in the interval τ generates a pack with the length L0, i.e. rср = Lо;
5. The random variable Z has a distribution with a heavy tail, if the probability is P
Zx
cxa,x, where 0a2 is a tail index or a shape parameter, c is a positive constant [She11];6. The correlation function R(t1,t2)M
(X(t1)m)(X(t2)m)
is invariant with respect to the burst shift, i.e. that is, )k t , k t ( R ) t , t (
R 1 2 1 2 for any t1,t2,kZ. It is expected that the first two points exist and are finite: mM
X(t)
,
X(t) m
M
2 . Here, M() is the averaging operation; m is the first central point; 2is the dispersion of process X (t);
7. Random process X (t) is exactly self-similar to the second-order process with the Hurst exponent H (1/2 <H <1), if
(k )H k H (k )H
) k (
R 2 2 2
2
1 2
2 1
for any k1.
8. The conversion function of the probability density of time intervals between packs (x)is a single-valued differentiable function for which there is a known inverse functionx(y).
9. If at the function conversion of the probability density of time intervals, the invariance property of probability differential is performed between packs, the probability of such two events are equal g(y)dy f(x)dx.
4 Solution
4.1 Model Transformation of Self-Similar Stream into Poisson Stream Packs
Since the random variables x and y are linked by a single-valued differentiable functional dependence, we base our calculations on the fact that the obtained random variables X, lie in the interval [x, x +dx]. Therefore, value y is also in the interval [y, y + dy], where y(x), and dy(x)dx.
Since such conditions ensure the invariance property of the probability differential, the probability of such two events equal
f(x)dx
g(y)dy (1)
According to formula (1) ,
) y ( )]
y ( [ dy f f(x)dx
g(y) . (2)
The analytical expression for the function y= f(x) indicating the definition domain х plays a key role in the synthesis of random values distribution law. Therefore, within the class of differentiable functions,
х) (
y , (3)
It is essential to define the functions which would ensure resulting g(y) functions with desired characteristics.
When transformations traffic distribution laws, the information transfer rate available to the user of the i - th transport network service is a stochastic value. Therefore, it is an aggregate of time functions and has a probabilistic description.
Relevant probability characteristics can be unconditional and joint probability density of random variables which are point functions of the process for a fixed time. Moreover, their aggregate (for example, the bit rate of data transfer) is an ensemble where any component is a selective function of a random process rk(t), related to a particular session [Lel94].
The values of its implementation at some point of time ti define the random variableR(k )i . [Mil03] For packet switching, the bitstream is converted into a discrete series of packets, generally of variable duration [Lel94, Mil03].
For quantitative analysis, we confine to the fixed package length L. In this case, the traffic structure is completely determined by the distribution of the time intervals duration between transmitted packets [Lel94, Fra10]. The interval length is determined by the time of the information accumulation in the buffer, sufficient to form a packet of predetermined length L0 [Lel94]:
o t τ
t
L t d (t) r
i
i
. (4)The left side of formula (4) is the average rate over the interval , multiplied by the interval length, i.e., [Lel94, Mil03]
L0
rср. , (5)
where t
Tt ср
i
i
dt ) t ( r r
r 1
.
For relatively small values of L0 packet length with a small error of rср , we replace the current rate value in the interval, i.e. rср = r. This assumption makes allows to find a functional dependence between random variables and r [Lel94, Mil03]:
r L0
. (6) and, therefore, to determine the distribution law g () for a continuous random variable as a function of one random argument r, if its distribution law f (r) is known.
Let us consider the case where the initial probability density describes the distribution of the instantaneous values of a random process. The random variable is linked to it through functional dependence (6) and defines the nature of the transformation. The result of the transformation is a package in the pool, and the random variable , where is the time interval between packets, a random variable, and the law of the value distribution fully describes the statistical properties of the packet flow, therefore
(r)
. (7)
The relation between the instantaneous values of the random process r(t) and the time interval between the packages generated in this manner can be rewritten in the form [Jag96, Alt07]:
()
a
f(r)dr )
G(
, (8)
and the distribution density of time intervals between packets is determined by the formula:
)].
( [ f ) (
g()G() (9) In (8) a is the lower threshold parameter determined by the normalization condition:
1 d ) g(
a
.
A distinctive feature of the dependence between random variables r and is the invariance with respect to the partial replacement of function and argument (inverse ratio). This dependence may be expressed through generalized variables y=k/x which allows to consider the distribution law (9) as inverse transformation:
x . k x g k ) x (
f 2
(10) Then the model of a self-similar flow under Pareto law being converted into a Poisson packet stream may be obtained from the following rationale. Let the continuous random variable 1 with distribution density f(1)is being transformed as 2(1). The selected function (1) should ensure the transformation result is coherent with the law of distribution [Lin11, Mao05]
- 2
2) e
(
g , (11)
where 2 is the length of the interval between packets.
According to (1), the distribution functions should be equal, i.e.
) ( G ) (
F 1 2 . (12) Therefore [Mik15]
1-F( )
ln 1
21 . (13) By solving (12) with regard to 2, we receive F(1)1e2. By differentiating (12) by 1we receive [Maz14, Fra10]
) ( F
) ( f ) ( F
) ( F d
d
1 1 1
1 1
2
1 1 1
1
. (14)
4.2 Determining the Functional Connection Between the Hurst Exponent, and Variation Coefficient
Let us consider an input packet stream G (τ1) with having a distribution density of time intervals between packets f (τ1) and self-similar properties. To be specific, we demonstrate a solution to this problem for the selection length of the input flow G (τ1), comprising 1,000,000 packets with the distribution density of time intervals between Pareto packets with known self-similar properties (Hurst exponent H = 0.8), with an integral distribution function as follows:
a
) k (
F
1
1 1
,
where k is the lower threshold, a is the form index.
To generate random variables with Pareto distribution, we use the approach described in [She07, Fra09]. The task is to find the inverse function of the integrated distribution function [She07]:
rnd( )
,) k ( G
а 1 1
1 1
where rnd (1) is a random variable uniformly distributed within the interval [0,1]. The parameter a is linked to the Hurst exponent by the dependence [She07] a32H.
For the exponential networks, the distribution of time intervals between packets is determined by the formula ),
exp(
) (
F 2 1 2 where λ is the controlled packet flowrate [She07, Fra09].
For simulation of the Poisson law of time intervals distribution between packets, use the formula [She07, Lel94]:
rnd( )
ln ) (
G 1 1 1
2
.
Having defined the numerical value of the input stream packets G (τ1): expectation, variance, standard deviation, variation coefficient, the formula will be:
421 1000000 1 3
9 9 9 9 9 9
0 1
1 ,M( ) ,
) ( L ) (
M i
i
; 177056
1000000 1
9 9 9 9 9 9
0
2 1 1
1 ,D( ) ,
)) ( M ) ( L ( ) (
D i
i
;
306
1 13
1
1) D( ), ( ) ,
(
; с(1)(1)/m(1),c(1)3.89.
For traffic with Hurst exponent H = 0.8, the variation coefficient was much greater than one. Fig. 1 demonstrates the dependencies: the variation coefficient of the Hurst exponent of the input flow G (τ1), the relation between the intensity of the exponential law and the variation coefficient of the input stream, as well as the dependence of the Pareto law form value and the intensity of the exponential law at equal mathematical expectations.
Figure 1. Graphs:
a) the dependence of the variation coefficient of the Hurst exponent of the input stream;
b) the relation between the intensity of the exponential law and the variation coefficient of the input stream;
c) the dependence of the a form of Pareto law and the intensity of the exponential law
5 Example
As an example, we consider the transformation of the Pareto distribution law into the exponential law, the simulation results are shown in Table. 1 [Fra10].
Table 1. Functional transformation of the interval distribution of packet repetition under Pareto law into the exponential law
Initial distribution law Transformation
formula (1) Distribution law for packet intervals
Pareto Exponential
1 1 1
) k
g(
1
lnk a
0 e
)
f(2 -2,x
To find the functional link between the variation coefficient and the Hurst exponent, we used a cubic regression equation.
We obtained the following equation:
с(τr) = 73.337833H3-109.155344 H 2 +55.547346 H - 8.229788
The approximation of the variation coefficient dependence on the Hurst exponent of the input stream is shown in Figure 2:
Figure 2. Approximation of the dependence c(τr) on the H of the input stream G (τ1) To assess the significance of regression parameters and correlation, first:
- the mean value of the variation coefficient is as follows:
3793 29 3
98.001 с( )
) 1
с( .
n r
r
- auxiliary variables are listed in Table 2, where,
i r i
r
i
с( ) ˆ с ( )
,
i
i
i1,
i i i
i с( )
) сˆ( ) с( A
1 1 1
Table 2. Table of auxiliary variables to assess the significance of the regression and correlation
i Hi
i(
i)
21 0.3 0.868 0.5906 −2.5113 6.3069 0.277
4 0.077 0.3196 — — 2 0.35 0.952 0.9846 −2.4273 5.892 −0.03
26 0.0011 0.0343 −0.129 5 0.0168 3 0.4 1.054 1.2179 −2.3253 5.4072 −0.16
39 0.0269 0.1555 −0.047 0.0022 4 0.45 1.178 1.3455 −2.2013 4.8459 −0.16
75 0.028 0.1422 0.011 0.0001 5 0.5 1.33 1.4223 −2.0493 4.1998 −0.09
23 0.0085 0.0694 0.0445 0.002 6 0.55 1.522 1.5033 −1.8573 3.4497 0.018
7 0.0003 0.0123 0.0565 0.0032 7 0.6 1.768 1.6437 −1.6113 2.5964 0.124
3 0.0155 0.0703 0.05 0.0025 8 0.65 2.088 1.8983 −1.2913 1.6676 0.189
7 0.036 0.0909 0.026 0.0007 9 0.7 2.514 2.3221 −0.8653 0.7488 0.191
9 0.0368 0.0763 −0.007 5 0.0001 10 0.75 3.091 2.9702 −0.2883 0.0831 0.120
8 0.0146 0.0391 −0.044
5 0.002
11 0.8 3.89 3.8976 0.5107 0.2608 −0.00
76 0.0001 0.002 0.228 0.052 12 0.85 5.011 5.1593 1.6317 2.6623 −0.14
83 0.022 0.0296 −0.067 1 0.0045 13 0.9 6.598 6.8103 3.2187 10.3597 −0.21
23 0.0451 0.0322 −0.015 6 0.0002 14 0.95 8.825 8.9055 5.4457 29.6552 −0.08
05 0.0065 0.0091 0.0984 0.0097 15 1 11.85 11.5 8.4707 71.752 0.35 0.1225 0.0295 0.2549 0.065
∑ — — — — 253.477
9 — 0.7571 2.0698 — 0.376
Calculation of the correlation index:
9985 4779 0
253 7571 1 0
1 2
2
. . . )
) ( c ) ( c (
) ) ( cˆ ) ( c R (
i r i r
i r i
r
Calculation of the determination index:
997 0 9985
0 2
2 . .
R
Use the values in Table 2 to determine the average error of approximation of the variation coefficient
i
r )
( cˆ
i
r )
( c
) ( c
) (
c i
1 1
i 2
iA
i ))2( c
) ( c (
r i r
% .
% . .
% ) .
( c
) ( cˆ ) ( c А n
i r
i r i
r 100 714
29 069757 100 2
1
6 Conclusions
1. We have developed a model for converting the input packet stream G (τ1) with the density distribution of time intervals between packets as per the Pareto law, into the flow G (τ2) with the distribution density of time intervals between which the packets described by the exponential distribution law and with the structural complexity c(2)1. 2. We have developed a model allowing to obtain a sequence of packets with specified properties which can be used in modeling self-similar traffic.
3. The analysis showed that with the increase of the Hurst exponent H, the variation coefficient characterizing the structural complexity of the traffic increases.
4. For the traffic with a variation coefficient much greater than one, in case it is converted in the switching nodes in order to obtain an elementary stream, it is essential to remember that the increased variation coefficient of the input stream, the effluent intensity λ decreases.
5. We have established the dependence of the form index a under the Pareto law and the intensity λ under the exponential law given the equality of their mathematical expectations. We found that the form value a and the intensity of λ increase concurrently.
6. We have established functional relation between the variation coefficient and the Hurst exponent using a cubic regression equation.
7. For the Pareto Law, we determined: the correlation index; the determination index; the average approximation error;
Fisher's F-criteria.
References
[She07] Sheluhin O.I., Smolskiy S.M., Osin A.V. Self-similar processes in telecommunications. – John Wiley & Sons, 2007.
[Lel94] W. E. Leland, M. S. Taqqu, W. Willinger and D. V. Wilson, On the self-similar nature of Ethernet traffic (Extended version), IEEE/ACM Transactions on Networking, Vol.2, pp.1-15, 1994.
[Yaz06] Yazıcı, B. Izzetoglu, M. Onaral, B. and Bilgutay, N. “Kalman filtering for self-similar processes”. (2006).
Signal Processing 86, 760–775
[Mil03] Miloucheva I., Műller E., Anzaloni A., A practical appoach to forecast Quality of Service parameters considering outliers. 2003.
[Čuč09] Ž. Čučej and M.Fras, Data source statistics modeling based on measured packet traffic : a case study of protocol algorithm and analytical transformation approach, TELSIKS 2009, 9th International Conference on Telecommunications in Modern Satellite, Cable and Broadcasting Services, Serbia, Niš, 7-9 October, 2009.
[Fra10] M. Fras, J. Mohorko and Ž. Čučej, Modeling of captured network traffic by the mimic defragmentation process, Simulation: Transactions of The Society for Modeling and Simulation International, San Diego, USA, Published online 20 September 2010.
[Mik15] Mikova S. Iu., Oladko V. S., Nesterenko M. A. Podkhod k klassifikatsii anomalii setevogo trafika [The approach to classification of anomalies in network traffic]. Innovatsionnaia nauka, 2015, no. 11, pp. 78-80 (in Russian).
[She11] Shelukhin O. I., Garmashev A. V. Obnaruzhenie DOS i DDOS-atak metodom diskretnogo veivlet-analiza [Detection DOS and DDOS attacks using discrete wavelet analysis]. T-Comm, 2011, no. 1, pp. 44-46 (in Russian).
[Jag96] D.L. Jagerman, B. Melamed and W. Willinger, "Stochastic Modeling of Traffic Processes", invited chapter in Frontiers in Queueing: Models, Methods and Problems (J.H. Dshalalow, Ed.), CRC Press, 1996.
[Alt07] T. Altiok and B. Melamed, Simulation Modeling and Analysis with Arena, Academic Press, 2007 (456 pages), ISBN 978-0-12-370523-5.
[Lin11] Linets G.I.: Methods of structural and parametric synthesis, identification and management of transportation telecommunication networks for maximum performance. // Monograph. - Stavropol, 2011. - 384 p.
[Mao05] Mao, G. “A Timescale Decomposition Approach to Network Traffic Prediction”. (2005). IEICE Trans.
Commun. Vol E88-B, No. 10, 3974-3981.
[Maz14] Mazurek M., Dymora P. Network anomaly detection based on the statistical self-similarity factor for HTTP protocol. Przegląd Elektrotechniczny, 2014, vol. 90, no. 1, pp. 127-130.
[Fra10] M. Fras, J. Mohorko and Ž. Čučej, Modeling of measured self-similar network traffic in OPNET simulation tool, Inf. MIDEM, 40(3): 224-231, September 2010.
[Fra09] M. Fras, Methods for the statistical modeling of measured network traffic for simulation purposes, Ph.D. thesis, 2009, Maribor, Slovenia.