Outputs Decoded in Parallel Output Registers

One way to ensure that the state machine outputs arrive at the device pins earlier is to decode the outputs from the state bits before the state bits are registered, and then store the decoded information in registers. In other words, instead of using the present_state information to determine the value for addr, we use the next_state value to determine what addr should be in the next clock cycle. If the next state of the state machine is a state in which addr(1) is a '1', we store a '1' into a flip-flop at the rising edge of clk. If the next_state value indicates that the next state is a state in which addr(1) is a '0', then we store a '0' into that flip-flop. The same idea can be used for the other outputs, but since there is no clock-to-output requirement for them, we leave them as is. Figure 5-8 illustrates the concept of storing the values of outputs based on the value of next_state.

Figure 5-8 Moore machine with outputs decoded in parallel output registers

(Block diagram; block and signal labels: Inputs, Next State Logic, next_state, State Registers, current_state, Output Logic, Output Registers, Outputs.)
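Before turning to the memory controller itself, the idea can be seen in isolation on a toy machine. The following is a minimal sketch, not taken from the original design: the entity reg_out_demo, its three states, and the input go are all hypothetical. It produces the same output twice, once decoded from present_state and once decoded from next_state and then registered. After the first rising clock edge the two outputs should carry the same value in every cycle, but q_registered comes directly from a flip-flop.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical toy entity, used only to illustrate the technique.
entity reg_out_demo is
  port (
    clk, reset, go : in  std_logic;
    q_decoded      : out std_logic;  -- decoded from present_state, after the flip-flops
    q_registered   : out std_logic   -- decoded from next_state, then stored in a flip-flop
  );
end reg_out_demo;

architecture sketch of reg_out_demo is
  type state_type is (s0, s1, s2);
  signal present_state, next_state : state_type;
begin

  -- Next-state logic with a synchronous reset folded in.
  next_logic: process (present_state, reset, go)
  begin
    if reset = '1' then
      next_state <= s0;
    else
      case present_state is
        when s0 =>
          if go = '1' then next_state <= s1; else next_state <= s0; end if;
        when s1 => next_state <= s2;
        when s2 => next_state <= s0;
      end case;
    end if;
  end process next_logic;

  -- Conventional Moore output: asserted while the machine is in s2.
  q_decoded <= '1' when present_state = s2 else '0';

  clocked: process (clk)
  begin
    if (clk'event and clk = '1') then
      present_state <= next_state;
      -- Decode from next_state: store what the output should be in
      -- the cycle that begins at this clock edge.
      if next_state = s2 then
        q_registered <= '1';
      else
        q_registered <= '0';
      end if;
    end if;
  end process clocked;

end sketch;
```

In simulation, q_registered is undefined until the first clock edge, which is one reason the reset is applied through the next-state logic here.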

This implementation can be coded quickly in VHDL. Instead of using present_state in the equations for addr, we use next_state. We also register the addr signals in flip-flops called raddr, for registered address. We do this by adding two lines of code to the process state_clocked, modifying the output_logic process to replace present_state with next_state, modifying the port declaration to replace addr with raddr, and including addr as a signal local to the architecture. That's it! Now raddr has the same values in the same clock cycle as addr did in the previous implementations, but raddr is available t_{CO} after clk (6.0 ns in the CY7C371-143 CPLD we have chosen) instead of t_{CO2} (10.5 ns). The outputs are available in t_{CO} time because the value of the output address is held in flip-flops whose outputs propagate directly to the device pins rather than first propagating through the logic array. Listing 5-5 shows the modified portion of the architecture.

-- combinatorially decoded outputs
output_logic: process (present_state, next_state)
begin
    -- oe and we have no clock-to-output requirement,
    -- so they are still decoded from present_state
    if (present_state = read1 or present_state = read2 or
        present_state = read3 or present_state = read4) then
        oe <= '1';
    else
        oe <= '0';
    end if;

    if present_state = write then we <= '1'; else we <= '0'; end if;

    -- addr is decoded one cycle early, from next_state
    if next_state = read2 then
        addr <= "01";
    elsif next_state = read3 then
        addr <= "10";
    elsif next_state = read4 then
        addr <= "11";
    else
        addr <= "00";
    end if;
end process output_logic;

state_clocked: process (clk)
begin
    if (clk'event and clk = '1') then
        present_state <= next_state;
        raddr <= addr;  -- registered copy of the early-decoded address
    end if;
end process state_clocked;

Listing 5-5 Moore machine with outputs from registers.
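Listing 5-5 covers only the two modified processes. To show the declaration changes in context (raddr in the port list, addr as a signal local to the architecture), the following is a minimal sketch of how a complete design unit might be assembled. The entity name memory_controller, the architecture name, the port list, the bus_id value, and the next-state transitions are assumptions made for illustration, inferred from signal names such as ready, burst, read_write, and bus_id that appear in this section. They are not a copy of the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: name and port list are assumptions inferred
-- from the signals named in the text.
entity memory_controller is
  port (
    clk, reset, read_write, ready, burst : in  std_logic;
    bus_id : in  std_logic_vector(7 downto 0);
    oe, we : out std_logic;
    raddr  : out std_logic_vector(1 downto 0)  -- replaces addr in the port list
  );
end memory_controller;

architecture registered_addr of memory_controller is
  type state_type is (idle, decision, read1, read2, read3, read4, write);
  signal present_state, next_state : state_type;
  signal addr : std_logic_vector(1 downto 0);  -- now local to the architecture
begin

  -- Next-state logic. ASSUMPTION: these transitions and the bus_id
  -- value are illustrative, not taken from the original listing.
  state_comb: process (present_state, reset, read_write, ready, burst, bus_id)
  begin
    next_state <= present_state;  -- default: hold the current state
    if reset = '1' then
      next_state <= idle;
    else
      case present_state is
        when idle =>
          if bus_id = "11110011" then next_state <= decision; end if;
        when decision =>
          if read_write = '1' then next_state <= read1;
          else next_state <= write;
          end if;
        when read1 =>
          if ready = '1' then
            if burst = '1' then next_state <= read2;
            else next_state <= idle;
            end if;
          end if;
        when read2 =>
          if ready = '1' then next_state <= read3; end if;
        when read3 =>
          if ready = '1' then next_state <= read4; end if;
        when read4 | write =>
          if ready = '1' then next_state <= idle; end if;
      end case;
    end if;
  end process state_comb;

  -- oe and we stay decoded from present_state; addr is decoded from
  -- next_state so that it can be registered into raddr.
  output_logic: process (present_state, next_state)
  begin
    if (present_state = read1 or present_state = read2 or
        present_state = read3 or present_state = read4) then
      oe <= '1';
    else
      oe <= '0';
    end if;

    if present_state = write then we <= '1'; else we <= '0'; end if;

    if next_state = read2 then addr <= "01";
    elsif next_state = read3 then addr <= "10";
    elsif next_state = read4 then addr <= "11";
    else addr <= "00";
    end if;
  end process output_logic;

  state_clocked: process (clk)
  begin
    if (clk'event and clk = '1') then
      present_state <= next_state;
      raddr <= addr;  -- flip-flop output can drive the pin directly
    end if;
  end process state_clocked;

end registered_addr;
```

Because reset is folded into the next-state logic in this sketch, raddr also returns to "00" on the clock edge that takes the machine back to idle, without a separate reset term on the output registers.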

From the diagram of Figure 5-8, it may look, at first glance, as if this implementation has two unintended side effects. First, it may appear to require two more flip-flops than the previous version; second, it may appear that the propagation delay from flip-flop to flip-flop, t_{Q_Q}, between the state-bit flip-flops and the raddr flip-flops takes two passes through the combinational logic array (one for the next-state logic and one for the output logic), reducing the maximum frequency at which this design can operate. Both of these side effects may exist depending on the specific CPLD or FPGA chosen to implement the design. In the particular case of the CY7C371 and the Cypress Warp VHDL synthesis tool, however, they do not exist. The logic that determines the next_state signals and the logic that determines the outputs are combined and reduced into a single level of logic by the synthesis software. The resulting logic uses fewer than the 16 product terms available in a CY7C371 macrocell, so the output decoding logic requires only a single pass through the logic array. The equations produced by synthesis are shown below (equations for clock assignment are removed):

/oe =
      present_stateSBV_1.Q * /present_stateSBV_2.Q * present_stateSBV_0.Q
    + /present_stateSBV_1.Q * /present_stateSBV_0.Q

present_stateSBV_1.D =
      /reset * present_stateSBV_1.Q * /present_stateSBV_2.Q * /present_stateSBV_0.Q * burst
    + /reset * /present_stateSBV_1.Q * present_stateSBV_2.Q * /present_stateSBV_0.Q
    + /reset * /ready * present_stateSBV_1.Q

present_stateSBV_2.D =
      /reset * /present_stateSBV_1.Q * /present_stateSBV_2.Q * /present_stateSBV_0.Q * bus_id_7 * bus_id_6 * bus_id_5 * bus_id_4 * /bus_id_3 * /bus_id_2 * bus_id_1 * bus_id_0
    + /reset * ready * present_stateSBV_1.Q * /present_stateSBV_2.Q * /present_stateSBV_0.Q * burst
    + /reset * ready * /present_stateSBV_1.Q * /present_stateSBV_2.Q * present_stateSBV_0.Q
    + /reset * /ready * present_stateSBV_2.Q * present_stateSBV_0.Q
    + /reset * /ready * present_stateSBV_1.Q * present_stateSBV_2.Q

present_stateSBV_0.D =
      /reset * /read_write * /present_stateSBV_1.Q * present_stateSBV_2.Q * /present_stateSBV_0.Q
    + /reset * ready * present_stateSBV_1.Q * present_stateSBV_2.Q
    + /reset * /present_stateSBV_1.Q * /present_stateSBV_2.Q * present_stateSBV_0.Q
    + /reset * /ready * present_stateSBV_0.Q

raddr_1.D =
      /present_stateSBV_1.Q * present_stateSBV_0.Q

raddr_0.D =
      present_stateSBV_2.Q * present_stateSBV_0.Q
    + present_stateSBV_1.Q * present_stateSBV_2.Q

The addr output required two macrocells in the previous design implementation; in this implementation, addr is replaced by raddr. Thus, this design requires the same total number of macrocells as the first one. However, more product terms are required because the next state must essentially be decoded twice.

Since the decoding is done in a single pass (single level) of logic, t_{Q_Q} is still at its maximum for this device, 7.5 ns.
