
This thesis discusses several architectural ideas unique to a scheduled communication system. One goal was to walk through the series of architectural decisions made in designing the system, in the hope that future designers of scheduled communication architectures can take advantage of some of these features. This section recaps some of the more important features of the NuMesh architecture.

6.1.1 Virtual FSMs for Virtual Streams

Applications can use a variety of communication models that result in very different numbers of virtual streams. A goal of the NuMesh architecture is to support an arbitrary number of streams without losing performance for applications that require fewer or more streams. The NuMesh CFSM supports two physical communication pipelines, which allow two different virtual streams to operate on every clock cycle.

Since the NuMesh system supports four ports, this allows all four ports to be used on every clock cycle. In addition, each pipeline can be time-multiplexed to support up to thirty-two virtual streams. The architecture does not limit the number of virtual streams, but the implemented chip supports a total of sixty-four due to memory constraints.

By arranging the NuMesh CFSM as a combination of many small virtual FSMs, each supporting a single virtual stream, the number of state bits required to keep track of the global communication FSM is reduced. In addition, a scheduler can assign each virtual stream an arbitrary amount of bandwidth without having to design a complicated global FSM.

6.1.2 Flow Control Protocol

Once it was discovered that words could back up in the communication network, flow control needed to be implemented. A novel protocol is introduced that allows flow control to occur on a single communication word. The protocol only requires a single internode transfer to complete and only requires two bits of information to be exchanged between the nodes involved in the transfer.

The flow control scheme requires some amount of buffering for each virtual stream, since each virtual stream has the potential to be blocked. A single word of buffering is provided for each virtual stream. The flow control protocol dictates that a virtual stream is not allowed to accept new data if its buffer register is full. Since the virtual streams are scheduled at compile time, the CFSM can decide a cycle earlier whether a virtual stream will accept a new word of data. At the same time the transmitting node is sending a data word along with a single valid bit, the receiving node can transmit a single accept bit that indicates whether the stream will accept new data. Instead of two cycles or multiple buffers being required as in traditional dynamic routing systems, the NuMesh architecture can accomplish the handshake in a single clock cycle and with one buffer per virtual stream.
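The handshake described above can be illustrated with a minimal Python sketch. It is not the chip's logic: the names `StreamBuffer` and `transfer` are hypothetical, and the accept bit is assumed to depend only on whether the receiver's buffer register is full at the start of the cycle. The point it shows is that both bits are derived from state known before the cycle begins, so they can cross the link simultaneously.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamBuffer:
    """One-word buffer of a single virtual stream on one node (hypothetical model)."""
    word: Optional[int] = None

    @property
    def full(self) -> bool:
        return self.word is not None


def transfer(sender: StreamBuffer, receiver: StreamBuffer) -> bool:
    """Model one scheduled cycle of a virtual stream between two adjacent nodes.

    Returns True if the word moved, False if the stream stayed blocked or idle.
    """
    valid = sender.full          # sender -> receiver, travels with the data word
    accept = not receiver.full   # receiver -> sender, sent during the same cycle
    if valid and accept:
        receiver.word = sender.word
        sender.word = None       # the sender's buffer is free again
        return True
    return False                 # the sender keeps its word; nothing is lost


a, b = StreamBuffer(7), StreamBuffer()
moved_first = transfer(a, b)     # b was empty, so the word moves
a.word = 8
moved_second = transfer(a, b)    # b is still full, so a keeps its word
```

In this model a refused transfer costs nothing beyond the scheduled cycle itself: the sender simply retries the next time the stream is scheduled.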

6.1.3 Scheduler

One might have assumed that the CFSM state memory could have consisted of a single RAM containing a program of instructions to be executed at run time. Each instruction could correspond to the actions of a single virtual stream, and a stream could be given greater bandwidth simply by replicating its instruction in the loop as needed.

This thesis showed the benefits of decoupling the scheduler state from the virtual stream instructions. Since each virtual stream must have its own buffer storage, there must be a distinction between two streams that travel the same path through a node and a single stream that is scheduled twice. The CFSM architecture defines a single instruction for each of up to sixty-four virtual streams. The scheduler simply decides which of these sixty-four streams gets scheduled on every cycle. The scheduler forms an outer loop of control across all the CFSMs and forms a global schedule. Another advantage of decoupling the scheduler from the instructions is that either can be changed without complicated program code needing to be updated.
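The distinction between a stream scheduled twice and two streams sharing a path can be sketched in Python as follows. This is an illustrative model only: `StreamInstruction`, `bandwidth_share`, the port names, and the example schedule are assumptions, and a single pipeline is modeled for brevity. The schedule holds only stream indices, while each instruction carries its own buffer.

```python
from collections import Counter
from dataclasses import dataclass
from fractions import Fraction
from typing import Dict, List, Optional

MAX_STREAMS = 64  # limit of the implemented chip, per the text


@dataclass
class StreamInstruction:
    """Per-stream state: a route plus the stream's own one-word buffer."""
    src_port: str                 # hypothetical port names
    dst_port: str
    buffer: Optional[int] = None


def bandwidth_share(schedule: List[int]) -> Dict[int, Fraction]:
    """Fraction of schedule slots given to each stream index."""
    if any(not 0 <= s < MAX_STREAMS for s in schedule):
        raise ValueError("stream index out of range")
    counts = Counter(schedule)
    return {s: Fraction(n, len(schedule)) for s, n in counts.items()}


# Streams 0 and 1 follow the same path but remain distinct, each with its own buffer.
instructions = [
    StreamInstruction("north", "south"),
    StreamInstruction("north", "south"),
    StreamInstruction("east", "west"),
]

# Stream 0 appears twice per loop: twice the bandwidth, still one instruction and one buffer.
schedule = [0, 2, 0, 1]
shares = bandwidth_share(schedule)
```

Changing a stream's bandwidth in this model means editing only `schedule`; changing its route means editing only its entry in `instructions`.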

6.1.4 Processor Interface

Since the static communication network has to interact with the dynamic timing of the processors, there must be an interface to synchronize the two. A register file of shared memory locations is described, which allows each virtual stream some number of locations to store words at each end of the communication path. A variety of techniques are described to reduce the number of cycles that words spend waiting in these shared memory locations. Mechanisms for interrupts, polling of all the streams, and single stream flow control are described, and the benefits of each are discussed.

6.1.5 Dynamics in a Scheduled Architecture

Static routers require very precise timing between all processors and the communication network. This thesis shows how dynamic behavior can be incorporated into a static communication network. Virtual streams are assigned certain clock cycles for operation at compile time. If the dynamic timing behavior of the destination processor of a virtual stream prevents the node from removing messages from the communication network, messages back up along the communication path according to the flow control protocol described in Section 6.1.2. An important result of this protocol is that when words are backed up and stored in buffers, the entire operation occurs within the scheduled clock cycle of the virtual stream that is operating. On the following clock cycle, a completely different set of virtual streams may be operating and the blockage from the previous virtual stream has no lingering effect on the timing of the network. This allows schedules to be created at compile time without any regard to the fact that virtual streams can behave dynamically in that they may be backed up in the network depending on the processors’ behavior.
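The back-up behavior can be illustrated with a small Python simulation of one virtual stream along a chain of nodes. It is a simplified sketch under stated assumptions: the function `step` is hypothetical, every hop of the stream is treated as operating in the same slot (the real schedule would pipeline hops across cycles), the chain has at least one node, and accept bits depend only on buffer state at the start of the slot.

```python
from typing import List, Optional, Tuple

Word = Optional[int]  # None models an empty one-word buffer


def step(buffers: List[Word], source_word: Word,
         sink_ready: bool) -> Tuple[List[Word], Word, bool]:
    """Advance one scheduled slot; return (new buffers, delivered word, injected?)."""
    old = list(buffers)
    new = list(old)
    last = len(old) - 1
    delivered = None
    # The destination processor removes a word only if it is ready.
    if old[last] is not None and sink_ready:
        delivered = old[last]
        new[last] = None
    # Node i forwards to node i+1 when it holds a word (valid) and the
    # next buffer was empty at the start of the slot (accept).
    for i in range(last):
        if old[i] is not None and old[i + 1] is None:
            new[i + 1] = old[i]
            new[i] = None
    # The source injects a word only if the first buffer was empty.
    injected = source_word is not None and old[0] is None
    if injected:
        new[0] = source_word
    return new, delivered, injected


# A stalled destination: words back up until every buffer on the path is full.
buffers: List[Word] = [None, None, None]
pending = [1, 2, 3, 4]
for _ in range(8):
    word = pending[0] if pending else None
    buffers, _delivered, injected = step(buffers, word, sink_ready=False)
    if injected:
        pending.pop(0)
```

Each call to `step` occupies exactly one slot whether or not any word moves, which is the property that lets other streams keep their compile-time timing in this model.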

Several mechanisms are described to allow for run-time changes to the static schedule. Both virtual stream instructions and CFSM schedules can be written during operation of an application. Since these writes can only occur at precise times that are set up at compile time, it is possible for the entire system to change its communication patterns at the same time. Multiple schedules and extra sets of virtual stream instructions can be stored in the CFSM at compile time, allowing a single write to completely change the static schedule being run on a node. For more subtle changes, individual schedule slots and virtual stream instructions can also be written by the processor as they are needed during an application.
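The two granularities of run-time change can be sketched in Python as follows. The class `NodeSchedules` and its methods are hypothetical names, not the chip's interface, and the sketch does not model the compile-time constraint on when writes may occur. It only shows the difference between selecting a stored schedule with one write and rewriting a single slot.

```python
from typing import List


class NodeSchedules:
    """Hypothetical model of the schedule storage in one node's CFSM."""

    def __init__(self, schedules: List[List[int]], active: int = 0) -> None:
        # All alternative schedules are loaded up front, as at compile time.
        self.schedules = [list(s) for s in schedules]
        self.active = active

    def select(self, index: int) -> None:
        """A single write that swaps the entire schedule run by this node."""
        if not 0 <= index < len(self.schedules):
            raise IndexError("no such stored schedule")
        self.active = index

    def write_slot(self, slot: int, stream: int) -> None:
        """A finer-grained change: rewrite one slot of the active schedule."""
        self.schedules[self.active][slot] = stream

    def stream_at(self, cycle: int) -> int:
        """Stream index scheduled on the given cycle; the schedule repeats."""
        current = self.schedules[self.active]
        return current[cycle % len(current)]


node = NodeSchedules([[0, 1, 0, 2], [3, 3, 4, 5]])
before = [node.stream_at(c) for c in range(4)]
node.select(1)                      # one write changes the whole pattern
after = [node.stream_at(c) for c in range(4)]
node.write_slot(2, 6)               # a subtler change to a single slot
patched = [node.stream_at(c) for c in range(4)]
```

If every node performs its `select` on the same compile-time-determined cycle, the whole system changes communication patterns together, which is the behavior the text describes.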