Communication Instruction Writes - An Optimized Hardware Architecture and Communication Protoco

The write operation for a stream is fairly simple. The instruction word has an encoded destination that specifies the location to be written. In addition to writing the specified des-tination, the stream’s buffer register also is written. The valid bit for the buffer register gets set based on the result of the destination’s ability to accept data and the type of instruction opcode. A bmove instruction never sets the valid bit unless the buffer stream is the actual encoded destination.

A cmove instruction mimics the action of the previous stream. If the previous stream wrote a valid word into its buffer register, the cmove stream will do the same thing.

The fmove opcode causes the flow control protocol to be observed. For the case when a port is a destination, the connecting node’s accept bit is checked. If the bit is asserted, the valid bit of the word in the buffer register is set to zero. If the accept bit is deasserted, the valid bit gets set since the transferred word was not accepted, and the buffered word must be sent again when the stream gets rescheduled. For the case when either another stream’s buffer or the processor interface is chosen, the ability of these locations to accept data will have been determined on the previous cycle. The stream’s buffer register valid bit will be set based on this information.

The CFSM must also support writing into the memory address space of the node for both bootstrapping and run time dynamic purposes. A specific destination coding corre-sponds to writing the memory address space. This kind of write can never fail, regardless of the type of instruction opcode selected. Words can be transferred to the memory address space in a similar fashion to all other communication transfers. Data ports, the processor interface, or any of the buffer registers can be sources for writing the memory address space. Message can even be forked with one of the branches going to write the memory address space. The high order bits of the communication word choose which piece of memory on the chip will be written, while the lower order bits serve as the data to be writ-ten. Possible locations within the memory address space are included in Figure 4.21.

Many of these locations serve system level needs and will be discussed in section 4.8 of this chapter.

4.10 System Issues

The NuMesh project faces a host of system design issues that are difficult to solve. The architecture design of the CFSM is required to aid in the solution of some of those prob-lems. This section will discuss some of the architectural features that are implemented to make the system design more manageable.

4.10.1 Booting

The NuMesh system is designed to be booted by a host processor connected to one of the ports of the mesh. This idea is illustrated in Figure 4.22. On power-up, all nodes are idle, waiting for information to come in on any port. The host computer sends schedule and instruction data to the node to which it is connected. The instructions set up paths to

each of its neighbors so the host computer can send booting information to each of the nodes. The information fans out until the host computer can load application schedules and instructions into each of the nodes. Finally, the host computer sends around a message that takes each node out of boot node at the same time, and the application communication code is run.

Processors also get booted over the communication network. One of the destinations to which the CFSM can transfer data is a JTAG interface to the processor. In the prototype NuMesh modules, this JTAG interface can be used to boot the attached processor - the microSPARC. Through this interface, the processor can be initialized and its memory can be loaded with application code. When each CFSM node gets initialized during bootup, part of the process is to boot the microSPARC as well. The JTAG interface can be read by the CFSM as a source to provide debugging support for the processor.

A CFSM node gets booted one pipeline at a time. Special memory locations get writ-ten to switch the loading of boot information between the two pipelines. The pipeline is

Location Function

Scheduler Scheduler contents can be altered at any time Schedule Pointer An index into the scheduler can be written to allow quick changes between communication phases

Communication streams

Communication instructions can be overwritten at any time

Processor JTAG A JTAG interface to the processor can be writ-ten by the CFSM

Reset Bits A node can reset any of its neighbors and put them in boot mode

Clk System Bits A node can tell any of its neighbors to perform clock synchronization with it

Diagnostic LEDs A set of LEDs can be written for diagnostic purposes

Figure 4.21 Components of a Node’s Memory Address Space

designed to detect when a node is trying to boot it and will execute a single instruction of reading the specified port and writing the memory address space until the node is taken out of boot mode. Implementation details of the booting process will be included in the next chapter on architecture implementation.

4.10.2 System Clocking

The NuMesh system runs on a synchronous global clock in a scheme described in [Prat95]. All clocks start with the same nominal frequency, but can start out of phase with each other. The node connected to the host computer first synchronizes with each of its neighbors by telling each of the nodes in turn to match its phase. Every CFSM node has a separate clock bit line that is connected to each of its neighbors. These lines can be written as part of the memory address space. When a node sees a clock line from its neighbor pulled high, the node matches phase with the clock line of the neighbor. Successive nodes are added to the collection of phase matched nodes until the fanout includes all nodes in the system. The order in which the nodes are added to the collection of matched phased nodes is important, but is completely under the control of the host computer. The host computer can create a communication path to any node and then transfer a word to the node’s memory address space that asserts the various clock lines.

host computer

Figure 4.22 Booting the Mesh Occurs From A Host Computer NuMesh

4.10.3 Reset Logic

When the nodes are powered up, the nodes are in an idle state. Each node has a reset line connected to each of its neighbors. When a node sees a reset line asserted by any of its neighbors, it immediately goes into boot mode and starts trying to read communication words from the port where the reset was initiated. Any valid words are written to the node’s memory address port. A node can be programmed to ignore resets from certain ports in case there might be a defective node in the mesh or a node is on the edge of the mesh. Once a reset is initiated, the reset can not be over-ridden by a subsequent reset until the original reset line gets deasserted.

One of the side-effects of the power-up reset is to clear all the valid bits in a node for the buffer registers and the processor interface registers. This prevents power-up errors from mistakenly loading these registers with what the node will think is valid data. Resets from a neighbor do not cause all the valid bits to be cleared, although writing a special location in the memory address space can explicitly clear these bits. One way for a node to change communication phases or to load new instruction streams dynamically is for a neighboring node to assert its reset line and put the stream into boot mode in order to load the new data. Since this operation could occur in real time, it may not make sense to auto-matically clear out all the node’s state. On the other hand, if an application decides to go to a new phase of communication, it may make sense to clear out all the data from the previ-ous phase. Since instructions may get overwritten with completely new instructions, it may make no sense to have data that has backed up due to flow control in the previous phase of communication reside in the new streams’ buffers. Having a simple mechanism to immediately clear out all the buffer registers can be very useful.

4.10.4 Diagnostic Support

No parallel processing system is complete until lights can be seen flashing. With this goal in mind, four LEDs per node exist that can be written as part of the memory address space. When the nodes are powered up, these bits default to the on position. As each node goes through the booting process, the lights can flash in interesting patterns to give the illusion of computation.

Dans le document An Optimized Hardware Architecture and Communication Protocol for Scheduled Communication (Page 118-123)