
7.4 Buffer Consumption

7.4.1 Artificially Induced Buffering

We evaluate the performance of our implementation by examining the effects of buffering in the applications described in Section 7.1 under conditions that induce buffering artificially. The conditions are as follows. First, we use a buffer-on-mismatch policy so that buffering can be conveniently induced by the scheduling effects alone. Second, we modify our gang scheduler to deliberately introduce “skew” between the scheduling times of applications on different processors, as depicted in Figure 7-17. This (perverse) scheduling arrangement allows us to generate arbitrarily bad scheduling in a controlled manner.

The experiment is to run each application multiprogrammed with a “null” (busy-waiting) application with varying amounts of scheduling skew. Skew will induce buffering through the buffer-on-mismatch policy. The scheduler gang-schedules the pair of applications using the local cycle count register on each node as a cue to perform a gang switch. The schedule quality is varied by skewing the cycle count register on each node to produce artificially poor schedules in a controlled manner. This skew creates a window at the beginning and end of each timeslice during which arriving messages will generate a mismatch-available interrupt, forcing the application into buffered mode. The skew is varied from zero (perfect gang scheduling) to 90% across the machine. We measure the effects on the real application in each case.
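The mechanism can be illustrated with a small simulation. This sketch is not the system described above; it is a toy model with hypothetical parameters (`timeslice`, `skew_frac`, message timing) that estimates how often a randomly timed message arrives at a processor running the wrong job, which is the condition that triggers buffer-on-mismatch:

```python
import random

def simulate_skew(n_procs=8, timeslice=1000, skew_frac=0.3,
                  n_messages=10000, seed=0):
    """Toy model: estimate the fraction of randomly timed messages that
    arrive while sender and receiver are mis-scheduled (running different
    jobs), for two gang-scheduled jobs with uniform per-processor skew."""
    rng = random.Random(seed)
    # Processor p's timeslice clock is offset by its share of the total skew,
    # mimicking the deliberately skewed cycle count registers.
    offset = [p * skew_frac * timeslice / (n_procs - 1) for p in range(n_procs)]

    def job_running(p, t):
        # The two jobs alternate timeslices on each processor's local clock.
        return ((t - offset[p]) // timeslice) % 2

    mismatched = 0
    for _ in range(n_messages):
        t = rng.uniform(0, 100 * timeslice)
        src, dst = rng.sample(range(n_procs), 2)
        # Buffer-on-mismatch: the message is buffered if the destination is
        # running a different job than the sender at arrival time.
        if job_running(src, t) != job_running(dst, t):
            mismatched += 1
    return mismatched / n_messages
```

With zero skew the model buffers nothing (perfect gang scheduling), and the mismatched fraction grows with `skew_frac`, matching the intuition behind the experiment.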

The runtime represents either the third iteration (for the iterative applications, water and barnes) or the whole program, and it accounts for all the cycles used on behalf of the application, including the cost of buffer insertion handlers that actually run while “null” is scheduled. We use a null application rather than two copies of a real application because the experiment is more easily controlled.

Figure 7-17. Buffering is induced by using coscheduling with “skew”. A segment of a timeline for process scheduling on a four-processor machine is shown. The application under test, (A), is multiprogrammed with a synthetic null application, (B), that does nothing but busy-wait. In addition, the timeslice clock is skewed uniformly from processor to processor so that processes are deliberately mis-scheduled for an interval around the beginning of each timeslice.

Figure 7-18. The fraction of messages buffered for applications multiprogrammed with a null application, plotted against decreasing schedule quality (eight processors). The fraction is limited by synchronization effects in the CRL applications.

Figure 7-19. The maximum pages of virtual buffer space required per processor for applications multiprogrammed with a null application remains low across the range of scheduling quality (eight processors).

Figure 7-20. Relative runtimes of applications multiprogrammed with a null application, plotted against decreasing schedule quality (eight processors). Runtimes are normalized to the runtime with perfect gang scheduling, which is within 1% of the runtime of the application running alone.

Figure 7-18 makes the main point of the experiment: the demand for buffering is relatively small and increases gracefully. The figure plots the fraction of messages that take the buffered path against decreasing scheduler quality. The applications with intrinsic synchronization exhibit an essentially constant fraction of buffered messages, corresponding to the maximum number of messages that can be outstanding simultaneously in the application. The enum application exhibits buffering that grows linearly with skew, as expected for an application with many messages and little synchronization: the likelihood of a message arriving when a process is not scheduled is proportional to the skew between processors.
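The two regimes in Figure 7-18 can be captured in a one-line model. This is an illustrative sketch only; the cap and message-rate parameters are hypothetical, not measured values from the experiment:

```python
def expected_buffered_fraction(skew, max_outstanding=None, msgs_per_slice=1000):
    """Toy model of the two regimes in Figure 7-18.

    - A synchronization-limited application can have at most
      `max_outstanding` messages in flight, so its buffered fraction
      saturates at that bound regardless of how bad the skew gets.
    - An unsynchronized, message-heavy application (like enum) buffers
      a fraction of messages proportional to the skew.
    """
    if max_outstanding is not None:
        # The mismatch windows can catch at most the outstanding messages,
        # so the fraction is capped even at large skew.
        return min(skew, max_outstanding / msgs_per_slice)
    return skew
```

For small skew both regimes grow together; past the saturation point the synchronizing applications flatten out while the enum-like curve keeps climbing, which is the shape the figure shows.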

The maximum number of physical pages required during any run is low, less than seven pages per node, in all cases. The total is small in each case either because the number of messages outstanding is limited or because (in the case of enum) the messages are small and are accumulated at only a moderate rate compared to the length of a timeslice. Because the required buffer space is small in the common case, the virtual buffering system will only rarely need to page to disk or invoke the overflow control system.
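A quick sanity check of why the page demand stays small: the chapter does not give message sizes or counts at this point, so the numbers below are assumptions chosen only to show the arithmetic, not measurements:

```python
def pages_needed(n_msgs, msg_bytes, page_bytes=4096):
    """Pages of buffer space needed to hold n_msgs buffered messages,
    rounded up to whole pages (ceiling division)."""
    return -(-(n_msgs * msg_bytes) // page_bytes)

# Hypothetical example: even 500 small (16-byte) messages buffered
# during one mismatch window fit comfortably in two 4 KB pages,
# well under the observed seven-page maximum.
```

Under assumptions like these, buffer demand reaches seven pages per node only with thousands of buffered messages or much larger payloads, consistent with the observation that overflow control is rarely invoked.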

The applications in the experiment slow down with increased skew, largely because of the skew itself and to a small extent because of the cost of buffering. Figure 7-20 lists the relative runtime of each application normalized to the runtime of the application run with zero skew, which is within 1% of the standalone runtime. The barrier application is very sensitive to skew because it makes progress only when all processes in the job are simultaneously scheduled: its slowdown is almost exactly the inverse of the skew. Because the enum application tolerates latency well, it is relatively insensitive to poor schedule quality; its runtime increase is due only to the added cost of message buffering. Although the Barnes, Water, and LU applications are sensitive to latency, they communicate less frequently than barrier and enum and so observe intermediate slowdowns.
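The barrier behavior follows from a simple model. Reading "the inverse of the skew" as follows: if a fraction `skew` of each timeslice is mis-scheduled, the job makes progress only during the remaining fraction, so runtime scales by the reciprocal. This sketch assumes that interpretation:

```python
def barrier_slowdown(skew):
    """Toy model: a barrier-bound job progresses only during the
    fraction (1 - skew) of its scheduled time when all processors
    run it simultaneously, so runtime scales by the inverse."""
    assert 0.0 <= skew < 1.0
    return 1.0 / (1.0 - skew)
```

Under this model, 50% skew doubles the runtime and 90% skew yields a tenfold slowdown, which matches a curve that is "almost exactly the inverse of the skew"; the latency-tolerant enum application sits near 1.0 across the same range.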

Conclusion 4: Application characteristics can naturally limit the demand for buffering without the introduction of explicit flow control.

We conclude that the demand for buffering remains low in our applications despite the use of unacknowledged messages and despite (artificially) adverse conditions. In general, we expect applications to suffer buffering overhead only rarely because buffered mode is entered only under unusual conditions and because ordinary applications will clear buffered messages quickly. The second of these expectations is not immediately obvious, so we explore the incidence of buffering with a synthetic application, below.