5.6.2 Quantitative results

The goal of our second experiment was to evaluate the scalability of the Distributed HilOut algorithm. Specifically, we wanted to examine what portion of the entire data set, SN, is collected into SG. To be scalable, SG needs to grow sublinearly with the number of execution machines and at most linearly with k, the number of desired outliers.

Table 5.2 The output of Distributed HilOut on Bonnie (Bray, 1996) data

Rank   Machine   Score   Main contributing attributes
1      bh10      0.447   SwapOut, AvgCPU user, MaxCPU sys
2      i3        0.444   MaxCPU sys, MaxCPU idle, MaxRootUsed%
3-5    i3        ...     ...
6      bh10      0.417   SwapOut, MaxCPU user, AvgCPU user

Table 5.3 Scalability with the number of outliers (k) – BYTEmark benchmark, 30 machines

k |SN| |SG| Percent

3 2290 304 13%

5 2290 369 16%

7 2290 428 19%


Tables 5.3 and 5.4 depict the percentage of points – out of the total points produced by 30 machines – that were collected when the user chose to search for three, five, or seven outliers.

This percentage increases approximately linearly with k.
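As a quick check, the Table 5.3 ratios can be computed directly; this trivial sketch only restates the figures above (the Table 5.4 figures behave the same way):

```python
# Ratios |SG| / |SN| from Table 5.3 (BYTEmark benchmark, 30 machines).
for k, sg, sn in [(3, 304, 2290), (5, 369, 2290), (7, 428, 2290)]:
    print(f"k={k}: {100 * sg / sn:.0f}%")   # prints 13%, 16%, 19%
```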

Tables 5.5 and 5.6 depict the percentage of points – out of the total points produced by 10, 20 or 30 machines – collected into SG. As the number of machines grows, this percentage declines.

Because we were unable to devote enough machines to check the scalability in a large-scale set-up with hundreds of machines, we were forced to simulate the run of the algorithm in such a set-up on a single machine. The data for the scalability check (taken from the metric space R^70) were randomly generated according to the following procedure. The set of machines P was randomly partitioned into 10 clusters {C_1, ..., C_10}. Seven machines from P were selected to be outlier machines, independently of their cluster. For each cluster C_r a point φ^r = {φ^r_1, ..., φ^r_70}, with every φ^r_l ∈ [0, 1], was selected with a uniform distribution. After the above steps had been performed, we generated the data set S_i = {x^i_1, ..., x^i_100} for every machine P_i ∈ C_r as follows: for all j = 1, ..., 100 and all l = 1, ..., 70, x^i_j(l) = φ^r_l + R^i_j(l), where x^i_j(l) is the l-th component of the j-th point of machine P_i and R^i_j(l) is a random value drawn from N(0, 1/7) if P_i is a normative machine, or from the uniform distribution U[−1/2, 1/2] if P_i is an outlier machine.
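The following is a minimal sketch of this generation procedure, assuming NumPy; the readings of N(0, 1/7) as a standard deviation of 1/7 and of the outlier noise range as U[−1/2, 1/2] are our assumptions, and the function name is ours.

```python
import numpy as np

def generate_synthetic_data(n_machines=100, n_clusters=10, n_outliers=7,
                            dim=70, points_per_machine=100, seed=0):
    # Sketch of the synthetic data generation described above (hypothetical helper).
    rng = np.random.default_rng(seed)

    # Randomly assign machines to clusters and pick the outlier machines.
    cluster_of = rng.integers(0, n_clusters, size=n_machines)
    outlier_machines = set(rng.choice(n_machines, size=n_outliers, replace=False))

    # One uniformly drawn centre point phi^r in [0, 1]^dim per cluster.
    phi = rng.uniform(0.0, 1.0, size=(n_clusters, dim))

    data = {}
    for i in range(n_machines):
        centre = phi[cluster_of[i]]
        if i in outlier_machines:
            # Outlier machines: noise read here as uniform on [-1/2, 1/2] (assumption).
            noise = rng.uniform(-0.5, 0.5, size=(points_per_machine, dim))
        else:
            # Normative machines: Gaussian noise, 1/7 taken as the standard deviation (assumption).
            noise = rng.normal(0.0, 1.0 / 7.0, size=(points_per_machine, dim))
        data[i] = centre + noise   # x^i_j(l) = phi^r_l + R^i_j(l)
    return data, outlier_machines
```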

In real life, there is a partial random order between machine executions; namely, at any point in time a random subset of the machines execute against the same SG. To simulate this, we first randomized the order in which machines execute (by going through them in round robin and skipping them at random). Then, to simulate concurrency, we accumulated the outcomes of machines and only added them to SG at random intervals. The resulting simulation is satisfactorily similar to the behaviour observed in real experiments with fewer machines.
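A minimal sketch of that simulation loop is shown below, where process_machine() is a hypothetical stand-in for the work a single machine performs against the SG it currently sees, and the probabilities are illustrative:

```python
import random

def simulate_run(machines, process_machine, skip_prob=0.3, flush_prob=0.2, seed=0):
    # Sketch only: round-robin execution with random skipping, plus batched,
    # randomly timed merges into the global sample SG to mimic concurrency.
    rng = random.Random(seed)
    SG = []        # the shared global sample
    pending = []   # outcomes accumulated but not yet merged into SG
    remaining = set(machines)

    while remaining:
        for machine in machines:
            if machine not in remaining or rng.random() < skip_prob:
                continue  # skipped this round; retried on a later pass
            # The machine executes against the SG it currently sees.
            pending.extend(process_machine(machine, SG))
            remaining.discard(machine)
            # Merge accumulated outcomes only at random intervals.
            if rng.random() < flush_prob:
                SG.extend(pending)
                pending = []
    SG.extend(pending)
    return SG
```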

Figure 5.3 depicts the percentage of points collected into SG out of the total points produced for 100, 200, 500 or 1000 machines. As the number of machines grows, this percentage declines, dropping below two percent for large systems that contain at least 1000 machines.

Table 5.4 Scalability with the number of outliers (k) – Bonnie benchmark, 30 machines

k |SN| |SG| Percent

3 2343 287 12%

5 2343 372 16%

7 2343 452 19%

Table 5.5 Scalability with the number of machines and data points – BYTEmark benchmark, k = 7

Number of machines |SN| |SG| Percent

10 751 190 25%

20 1470 336 23%

30 2290 428 19%

Table 5.6 Scalability with the number of machines and data points – Bonnie benchmark, k = 7

Number of machines |SN| |SG| Percent

10 828 186 22%

20 1677 373 22%

30 2343 452 19%

In addition, as can be seen in Figure 5.4, the number of workflows performed by the algorithm grows approximately linearly with the number of machines. That is, on average, the overhead of the algorithm on each machine is fixed regardless of the number of machines. These results provide strong evidence that our approach is indeed scalable.

5.6.3 Interoperability

Our final set of experiments validated the ability of the GMS to operate in a real-life grid system.

For this purpose we conducted two experiments. In the first, we ran the Distributed HilOut algorithm while the system was under a low working load. We noted the progression of the algorithm in terms of the recall (the portion of the outcome correctly computed) and observed both the recall in terms of outlier machines (Recall – M) and in terms of data points (Recall – P). The data points are important because, for the same outlier machine, several indications of its misbehaviour might exist, such that the attributes explaining the problem differ from one point to another.
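For illustration, both measures reduce to a simple set ratio; the function name below is ours, not part of the GMS:

```python
def recall(reported, ground_truth):
    # Fraction of the true items (outlier machines for Recall - M,
    # outlying data points for Recall - P) already present in the output.
    truth = set(ground_truth)
    return len(set(reported) & truth) / len(truth) if truth else 1.0
```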

Figure 5.3 SG as a percentage of SN – synthetic data, k = 7

Figure 5.4 Relationship between the number of running workflows and the number of machines and data points – synthetic data, k = 7

As can be seen in Tables 5.7 and 5.8, the progression of the recall is quite fast: although the entire execution took 46 minutes for the BYTEmark data (88 minutes for Bonnie), within 10 minutes one or two misconfigured machines had already been discovered. This means the administrators were able to start analysing misconfigured machines almost right away. In both benchmarks, a quarter of an hour was sufficient to discover all outlying machines, and most of the patterns indicating the reason for them being outliers. The computational overhead was not very large – in all, about 40 workflows resulted in additions to SG. The percentage of SG in SN, however, was quite high – about 25 percent. Further analysis shows that many of the points in SG were contributed in response to the initial message – the one with an empty SG. We expand on this below.

In the second set, we repeated the same experiment with 20 of the well-configured machines shut down. Then, when no more workflows were pending for the active machines, we turned the rest of the machines on. The purpose of this experiment was to observe the GMS behaviour in the presence of failure.

The results of this second test are somewhat surprising. With respect to recall, our expectations – fast convergence regardless of the missing machines – were met. As Tables 5.9 and 5.10 show, the rate of convergence was about the same as that in the previous experiment.

Furthermore, nearly complete recall of the patterns was achieved in both benchmarks, and all of the outlying machines were discovered. What stands out, however, is that all this was achieved with far less overhead in both workflows and data points collected. The overhead remained low even after we turned the missing machines on again (as described below the double line).

Table 5.7 Interoperability – a grid pool in regular working load – BYTEmark benchmark

Time |SG|/|SN| Workflows Recall – P Recall – M

0:00 0/4334 0 0/7 0/2

0:09 476/4334 17 1/7 1/2

0:14 708/4334 26 6/7 2/2

0:38 977/4334 37 7/7 2/2

0:46 1007/4334 39 7/7 2/2

Table 5.8 Interoperability – a grid pool in regular working load – Bonnie benchmark

Time |SG|/|SN| Workflows Recall – P Recall – M

0:00 0/4560 0 0/7 0/3

0:05 295/4560 9 2/7 1/3

0:09 572/4560 18 4/7 2/3

0:14 745/4560 24 5/7 3/3

1:28 1068/4560 44 7/7 3/3

Table 5.9 Interoperability – a grid pool with 20 of the machines disabled – BYTEmark benchmark

Time |SG|/|SN| Workflows Recall – P Recall – M

0:00 0/4334 0 0/7 0/2

0:03 82/4334 6 4/7 1/2

0:07 460/4334 20 5/7 2/2

0:10 462/4334 22 6/7 2/2

0:23 495/4334 26 6/7 2/2

1:38 617/4334 37 7/7 2/2

2:54 619/4334 39 7/7 2/2

Further analysis of the algorithm reveals the reason for its relative inefficiency in the first set-up. Had all of the computers been available at the launch of the algorithm, and had Condor been able to deliver the jobs to them simultaneously, they would all have received an empty SG as an argument. In this case, each computer would return its seven most outlying points together with their nearest neighbours (five points per outlier). Because of the possible overlap between outliers and nearest neighbours, this could result in anywhere between 7 and 42 points per computer. Multiplied by the number of computers, this gives anything between around 300 and around 1600 points that are sent before any real analysis can begin.
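A back-of-the-envelope check of this worst case; the pool size of 40 machines is illustrative, chosen only because it reproduces roughly the 300–1600 range quoted above:

```python
k, neighbours = 7, 5                      # outliers requested, nearest neighbours per outlier
min_per_machine = k                       # full overlap between outliers and neighbours
max_per_machine = k * (1 + neighbours)    # no overlap: 7 + 35 = 42 points

n_machines = 40                           # illustrative pool size (assumption)
print(min_per_machine * n_machines, max_per_machine * n_machines)   # 280 1680
```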

We see three possible solutions to this. One, we could emulate the second set-up by sending the initiation message to small sets of computers. Two, we could add a random delay to the first workflow, increasing the chance of a non-empty SG being delivered with it. Three, we could restrict the return value of the first job to just some of the outliers and their nearest neighbours.

Table 5.10 Interoperability – a grid pool with 20 of the computers disabled – Bonnie benchmark

Time |SG|/|SN| Workflows Recall – P Recall – M

0:00 0/4560 0 0/7 0/3

0:05 112/4560 4 2/7 1/3

0:07 417/4560 16 4/7 2/3

0:10 475/4560 22 5/7 3/3

0:15 478/4560 24 7/7 3/3

0:17 484/4560 27 6/7 3/3

0:31 494/4560 30 7/7 3/3

2:03 578/4560 40 7/7 3/3

However, we believe none of these solutions is actually needed. In a real system some of the jobs would always be delayed and some of the machines would always be unavailable when the algorithm initiates. Furthermore, under sufficient workload Condor itself delays jobs enough to stop this worst-case scenario from ever happening.