

7.2.2 Mixed Workload with Real Applications

The parameterized mixed-workload experiment described above serves to isolate individual message-system effects. This section presents the results of running several real applications together in the mixed workload.

In our experiment, we run pairs of compute-intensive/interactive applications on a simulated, 16-node system. The compute-intensive application in each case is one of the four real parallel applications listed in Table 7-1. The interactive application consists of the web server from Table 7-1 plus one web-robot client per node. Each robot generates requests at a controlled rate. We vary the rate to vary the amount of CPU time used by the interactive processes and thus the amount of CPU time available to the compute-intensive application.

The slowdowns observed by our sample parallel applications in the mixed workload experiment under interactive scheduling are moderate. Figures 7-9 and 7-11 illustrate the effect of the interactive application on the compute-intensive application when the system scheduler performs independent, priority-based scheduling.1 Figure 7-9 shows the slowdown of each compute-intensive application multiprogrammed over the same application run standalone versus the fraction, f, of each processor's CPU time devoted to servicing the interactive application. The slowdown is again bounded by dashed lines at 1/(1-f) and 1/(1-f)^16. The actual slowdowns fall between these extremes, depending on application characteristics. The enum application synchronizes infrequently and thus comes nearest to the minimum slowdown of 1/(1-f). The other applications synchronize

1 The slowdown of the interactive application under independent, priority-based scheduling is always close to one.

Figure 7-9. The performance of the compute-intensive application in a mixed workload with interactive scheduling is plotted against the fraction of CPU time devoted to interactive tasks (16 processors).

Figure 7-10. The performance of the compute-intensive application in a mixed workload with coscheduling is plotted against the fraction of CPU time devoted to interactive tasks (16 processors).

Figure 7-11. The fraction of messages taking the buffered path, f_buf, in a mixed workload with interactive scheduling versus the fraction of CPU time devoted to interactive tasks (16 processors). The dashed line at f_buf = f represents a naive expectation.

Figure 7-12. The fraction of messages taking the buffered path in a mixed workload with coscheduling versus the fraction of CPU time devoted to interactive tasks (16 processors).

a moderate amount but show nowhere near the slowdown of barrier in Figure 7-7.
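The two dashed bounds from Figure 7-9 can be sketched numerically. The snippet below is an illustrative computation, not code from the thesis; the function name and parameters are my own:

```python
def slowdown_bounds(f: float, n_procs: int = 16) -> tuple[float, float]:
    """Return the (minimum, maximum) multiprogramming slowdown when a
    fraction f of each processor's CPU time goes to interactive work.

    The minimum, 1/(1-f), applies to an application that never
    synchronizes; the maximum, 1/(1-f)**n_procs, models an application
    that synchronizes so often it only makes progress when all n_procs
    processors are simultaneously free of interactive work.
    """
    lower = 1.0 / (1.0 - f)
    upper = 1.0 / (1.0 - f) ** n_procs
    return lower, upper

# At f = 0.1 on 16 processors the bounds are roughly 1.11x and 5.4x:
# an infrequently synchronizing application like enum should see only
# about 11% slowdown, while a barrier-heavy one could see over 5x.
lo, hi = slowdown_bounds(0.1)
```

The wide gap between the bounds even at modest f is why application synchronization behavior, not just interactive load, determines the observed slowdown.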

The fraction of messages buffered by the sample applications in the experiment roughly follows the fraction of interactive CPU time, as one would expect. Figure 7-11 shows the fraction of total messages, f_buf, in each compute-intensive application that take the buffered path versus the fraction of CPU time devoted to the interactive application. The fraction f_buf is nonzero as f approaches zero because of effects other than the interactive application. In particular, the microkernel-style system scheduler is another user-level application that runs periodically and thus induces a fixed amount of buffering.

The fraction of messages buffered rises with a slope less than unity, eventually falling below the f_buf = f line for the CRL applications, because synchronization-induced slowdown tends to reduce the demand for buffering. In an application with little synchronization, like enum, the number of messages buffered tends to follow f. If one process in the application is stalled and buffers a message due to multiprogramming, the other processes in the application continue sending messages at the same rate. However, in an application that synchronizes, the number of messages buffered is lower.

If one process in the application is stalled and has to buffer a message, some other process may slow down because it is waiting on a reply to that message. The result is that the demand for buffering falls.
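The reasoning above can be captured in a toy model (my own sketch, not an experiment from the thesis): each message finds its destination descheduled with probability f, and a synchronizing sender whose message is buffered stalls until the destination is rescheduled, so its next message is guaranteed to find the destination running.

```python
import random

def fraction_buffered(f: float, synchronizing: bool,
                      n_msgs: int = 200_000, seed: int = 1) -> float:
    """Toy model of the buffering demand in Figures 7-11 and 7-12.

    Without synchronization, every send independently finds the
    destination descheduled with probability f, so about a fraction f
    of messages takes the buffered path.  With synchronization, a
    buffered message stalls the sender until the destination runs
    again, so the following message is never buffered and the overall
    fraction drops to f/(1+f) < f.
    """
    rng = random.Random(seed)
    buffered = 0
    just_buffered = False
    for _ in range(n_msgs):
        if synchronizing and just_buffered:
            hit = True                  # destination rescheduled while we stalled
        else:
            hit = rng.random() >= f     # destination happens to be running
        if not hit:
            buffered += 1
        just_buffered = not hit
    return buffered / n_msgs
```

Under these assumptions a non-synchronizing application tracks the f_buf = f line while a synchronizing one falls below it, matching the qualitative shape of Figure 7-11; the real curves of course also reflect scheduler noise and application structure the model ignores.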

The slowdown effects observed by parallel applications other than barrier under interactive scheduling are moderate. For comparison, Figures 7-10 and 7-12 plot the slowdown and fraction of messages buffered, respectively, for the real applications under fixed-timeslice coscheduling as described in conjunction with Figure 7-5. The slowdowns are limited to a factor of about two, as expected.2 The fractions of messages buffered are similarly limited.

There are two points to take away from the plots, both qualitative. The first is the main point of the thesis: two-case delivery is justified in much of the graph in Figure 7-9. The performance of each compute-intensive application smoothly approaches a (perfect) slowdown of one at the left as the CPU time used for interactive work diminishes. Optimistic message delivery is justified in this regime because most messages are delivered to the correct process immediately and few are buffered. From Figure 7-11, when about 10% of the CPU time is devoted to interactive tasks, only about 4-5% of messages take the buffered path. In this regime, an alternate system with hardware-managed buffering pays a performance and implementation cost penalty for hardware that is going unused.

Second, towards the right of Figure 7-9, the slowdowns observed eventually justify coordinated scheduling of jobs instead of independent scheduling of processes. Some of the applications in Figure 7-9 exhibit slowdowns disproportionate to the amount of CPU usage, as explained in conjunction with Figure 7-4. The slowdown is due to application characteristics, not the message system, because there still are not very many messages buffered (Figure 7-11). With FUGU, the decision to switch to coscheduling becomes a tradeoff of message cost versus scheduling flexibility. A system such as the CM-5 that uses coscheduling for protection always suffers the slowdown of coscheduling.
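The tradeoff can be sketched as a decision rule. Everything below is hypothetical: the interpolation exponent and the coscheduling slowdown of about two (taken from Figure 7-10) merely parameterize the qualitative argument above.

```python
def should_coschedule(f: float, sync_intensity: float,
                      n_procs: int = 16,
                      cosched_slowdown: float = 2.0) -> bool:
    """Hypothetical rule for a FUGU-like scheduling choice.

    Independent scheduling slows an application down somewhere between
    1/(1-f) and 1/(1-f)**n_procs; sync_intensity in [0, 1] interpolates
    the exponent between those extremes.  Fixed-timeslice coscheduling
    costs roughly a constant factor (about two in Figure 7-10), so the
    system should switch once independent scheduling is worse.
    """
    exponent = 1 + sync_intensity * (n_procs - 1)
    independent_slowdown = (1.0 / (1.0 - f)) ** exponent
    return independent_slowdown > cosched_slowdown

# An enum-like application (little synchronization) tolerates a large
# interactive fraction before coscheduling pays off; a barrier-like
# application justifies coscheduling at a small interactive fraction.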

Conclusion 2: The direct virtual network interface is desirable because most messages are delivered via the fast path.

2 Some of the slowdowns are still less than two at the extreme right due to a scheduler bug. Similarly, the fraction of messages buffered is non-zero at the right side of the graph due to the same bug.

Figure 7-13. An artist's conception of a comparison of hardware buffering to two-case delivery (slowdown versus fraction of interactive CPU time; curves: H/W buffered, Two-case, Two-case (coscheduled)). Two-case delivery is desirable at the left because relatively few messages are buffered. Two-case delivery is desirable at the right if coscheduling is used, and application characteristics may demand coscheduling anyway.

In short, at the far left, two-case delivery is desirable because most messages are delivered via the fast path, even with independent, priority-based scheduling favoring the interactive application. At the far right, two-case delivery is desirable if the system switches to coscheduling, and it is probably useful to switch to coscheduling for application reasons alone. The tradeoff in the middle of the graph is less clear. The next section explores this tradeoff.