Section 5 — PCI Bus

5.2 PCI Transaction Details

Further details of the MCM implementation of various PCI transactions are found in the 660 User’s Manual.

5.2.1 Memory Access Range and Limitations

PCI memory reads and writes by PCI busmasters are decoded by the 660 to determine if they access system memory. PCI memory reads and writes to addresses from 2G to 4G on the PCI bus are mapped by the 660 as system memory reads and writes from 0G to 2G.

These PCI to memory transactions are checked against the top_of_memory variable to determine whether a given access is to a populated bank. The logic of the 660 does not recognize unpopulated holes in the memory banks. PCI accesses to unpopulated locations below the top_of_memory are undefined.
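The address decode described above can be sketched as follows. This is a minimal illustrative model, not the 660 logic itself: the function name and the treatment of top_of_memory as an exclusive upper bound are assumptions, and unpopulated holes below top_of_memory are deliberately not modeled.

```python
PCI_WINDOW_BASE = 0x8000_0000    # 2G on the PCI bus
PCI_WINDOW_END = 0x1_0000_0000   # 4G on the PCI bus (exclusive)


def pci_to_system_address(pci_addr, top_of_memory):
    """Map a PCI memory address to a system memory address, or return None.

    Assumption: top_of_memory is the first system address above populated
    memory. Holes below it are not detected, matching the text above.
    """
    if not (PCI_WINDOW_BASE <= pci_addr < PCI_WINDOW_END):
        return None  # outside the 2G to 4G window: not a system memory access
    system_addr = pci_addr - PCI_WINDOW_BASE  # 2G..4G maps to 0G..2G
    if system_addr >= top_of_memory:
        return None  # above populated memory
    return system_addr
```

For example, with 32MB of memory (top_of_memory = 0x0200_0000), PCI address 0x8100_0000 maps to system address 0x0100_0000.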

PCI accesses to system memory are not limited to 32 bytes. PCI burst-mode accesses are limited only by the size of memory, PCI bus latency restrictions, and the PCI disconnect counter.

5.2.2 Bus Snooping on PCI to Memory Cycles

Each time a PCI (or ISA) busmaster accesses memory, and again each time a PCI burst crosses a cache block boundary, the 660 broadcasts a snoop operation on the CPU bus. If the CPU signals an L1 snoop hit by asserting ARTRY#, the 660 retries the PCI transaction. The ISA bridge then removes the grant from the PCI agent, which (according to PCI protocol) releases the bus for at least one cycle and then arbitrates again. Meanwhile, the 660 grants the CPU bus to the CPU, allowing it to do a snoop push. The PCI agent then initiates the original transaction again.

During the transaction, the 660 L2 cache is monitoring the memory addresses. The L2 takes no action on L2 misses and read hits. If there is an L2 write hit, the L2 marks that block as invalid, does not update the block in SRAM, and does not affect the PCI transaction. L2 operations have no effect on PCI to memory bursts.
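As an illustration of when snoops are broadcast, the sketch below lists the addresses at which a snoop would be expected for a burst, assuming a 32-byte cache block and no retries. The function name is hypothetical, and ARTRY# handling is not modeled.

```python
CACHE_BLOCK = 32  # bytes per cache block


def snoop_addresses(start, length):
    """Return the addresses at which a snoop is broadcast for a burst
    covering bytes [start, start + length).

    One snoop at the start of the access, plus one for each cache block
    boundary the burst crosses.
    """
    if length <= 0:
        return []
    last = start + length - 1
    addrs = [start]
    boundary = (start // CACHE_BLOCK + 1) * CACHE_BLOCK
    while boundary <= last:
        addrs.append(boundary)
        boundary += CACHE_BLOCK
    return addrs
```

A 64-byte burst starting at offset 0x10 crosses the boundaries at 0x20 and 0x40, so three snoops are expected in this model.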

5.2.3 PCI to PCI Peer Transactions

Peer to peer PCI transactions are supported consistent with the memory maps of Table 5-1, Table 5-2, Table 5-3, and Table 5-4, which together show the ranges of different bus command transactions that are supported. If the ISA_MASTER signal is used with the Intel SIO, then the SIO is not allowed to perform peer to peer PCI memory transactions in the 0 to 2G range. No other transaction types are affected.

5.2.4 PCI to System Memory Transactions

Single and burst transfers are supported. Bursts are supported without special software restrictions. That is, bursts can start at any byte address and end on any byte address and can be of arbitrary length.

As per the PCI specification, the byte enables are allowed to change on each data phase. This has no practical effect on reads, but is supported on writes. The memory addresses increment linearly by 4 on each beat of the PCI burst. All PCI devices must use only linear burst incrementing.
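To make the byte-granular start and end addresses concrete, the sketch below expands an arbitrary byte range into data phases, each with a 4-byte-aligned address and an active-low byte enable value. The function name is hypothetical, and the sketch assumes the master enables exactly the bytes inside the requested range.

```python
def burst_beats(start, length):
    """Expand a byte range into (dword_address, be_n) data phases.

    be_n is the 4-bit BE[3:0]# value; a 0 bit means that byte lane is
    enabled (active low). Addresses increment linearly by 4 per beat.
    """
    beats = []
    end = start + length
    addr = start & ~0x3  # align down to a 4-byte boundary
    while addr < end:
        be_n = 0xF  # start with all lanes disabled
        for lane in range(4):
            if start <= addr + lane < end:
                be_n &= ~(1 << lane) & 0xF  # enable this lane
        beats.append((addr, be_n))
        addr += 4
    return beats
```

A 5-byte write starting at address 0x1002 becomes two data phases: bytes 2 and 3 of the first dword, then bytes 0 to 2 of the second.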

In ECC mode, PCI to memory transactions that result in writes of less than 8 bytes cause the memory controller in the 660 to execute a read-modify-write operation, during which 8 bytes of memory data are read, the appropriate bytes are modified, the ECC byte is modified, and then the resulting 8-byte doubleword is written to memory.
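The read-modify-write step can be pictured with the following sketch. Both function names are hypothetical, and the ECC byte itself is not computed, because the ECC code used by the 660 is not described in this section.

```python
def needs_read_modify_write(byte_count, ecc_mode):
    """In ECC mode, any write of fewer than 8 bytes needs a RMW cycle."""
    return ecc_mode and byte_count < 8


def read_modify_write(old_doubleword, new_bytes):
    """Merge new bytes into an 8-byte doubleword read from memory.

    old_doubleword: the 8 bytes read from memory.
    new_bytes: dict mapping byte offset (0..7) to the new byte value.
    The caller would then recompute the ECC byte over the result and
    write all 8 bytes (plus ECC) back to memory.
    """
    merged = bytearray(old_doubleword)
    for offset, value in new_bytes.items():
        merged[offset] = value
    return bytes(merged)
```

A 4-byte PCI write in ECC mode therefore turns into one 8-byte read, a merge, and one 8-byte write.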

5.2.5 PCI to Memory Burst Transfer Completion

PCI to memory burst transfers continue to normal completion unless one of the following occurs:

The initiating PCI busmaster disconnects. The 660 handles all master disconnects correctly.

The 660 target disconnects on a 1M boundary. The 660 disconnects on all 1M boundaries.

The 660 target disconnects because the PCI disconnect timer has timed out.

The CPU retries the snoop cycle that the 660 broadcast on the CPU bus. In this case, the 660 target retries the PCI busmaster. (Note that L2 hits do not affect the PCI to memory transaction. Read hits have no effect on the L2, and write hits cause the L2 to invalidate the block.)

The 660 target disconnects the PCI busmaster if the refresh timer times out. In this case, the 660 disconnects at the end of the current data phase for writes, or at the end of the current cache block for reads.
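For the 1M boundary case, a driver writer may want to estimate how long a burst can run before the target disconnect. The sketch below is a rough model only: the function name is hypothetical, and it assumes that every data phase up to the boundary completes and that the disconnect takes effect exactly at the boundary.

```python
ONE_MB = 1 << 20  # 1M boundary spacing in bytes


def data_phases_before_1m_disconnect(start):
    """Number of 4-byte data phases from start up to the next 1M boundary."""
    next_boundary = (start // ONE_MB + 1) * ONE_MB
    first_dword = start & ~0x3  # data phases are 4-byte aligned
    return (next_boundary - first_dword) // 4
```

A burst that starts 8 bytes below a 1M boundary would complete two data phases in this model before being disconnected.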

5.2.6 PCI to Memory Access Sequence

When a PCI access is decoded as a system memory read or write, the memory and CPU bus are requested and, when granted, a snoop cycle to the CPU bus and a memory cycle to system memory are generated. If the processor indicates a snoop hit in the L1 cache (ARTRY# asserted), then the memory cycle is abandoned and the PCI cycle is retried. The CPU then does a snoop push. The L2 cache does not need to do a snoop push because it is write-through, and, therefore, system memory always contains the result of all write cycles. See Section 3 for more L2 information.

5.2.7 PCI to Memory Writes

During PCI to memory burst writes, the 660 performs data gathering before initiating the cycle to the memory controller. The data gathering involves combining two PCI write cycles into one memory write cycle if the address of the first write cycle is even.
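The gathering rule can be sketched as below. This is an interpretation, not a statement of the 660 logic: "even" is assumed to mean that the first 4-byte write falls on an 8-byte boundary (AD[2] = 0), and the function name is hypothetical.

```python
def gather_writes(dword_addrs):
    """Group consecutive 4-byte PCI write addresses into memory write cycles.

    Returns a list of (address, size_in_bytes). Two PCI writes are combined
    into one 8-byte memory write when the first is 8-byte aligned (assumed
    meaning of "even") and the second immediately follows it.
    """
    cycles = []
    i = 0
    while i < len(dword_addrs):
        addr = dword_addrs[i]
        has_partner = i + 1 < len(dword_addrs) and dword_addrs[i + 1] == addr + 4
        if addr % 8 == 0 and has_partner:
            cycles.append((addr, 8))  # gather-store pair
            i += 2
        else:
            cycles.append((addr, 4))  # written on its own
            i += 1
    return cycles
```

Under this assumption a burst starting at offset 0x04 writes its first dword alone, and gathering starts at 0x08.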

Minimum initial write access time to 70ns DRAM when the CPU bus is 66MHz and the PCI bus is 33MHz is 5-1-1-1 -3-1-1-1 PCI clocks for 4-4-4-4 -4-4-4-4 bytes of data (14 PCI clocks for 32 bytes of data). Subsequent data phases of the same burst are generally serviced at -3-1-1-1 -3-1-1-1 (12 PCI clocks for 32 bytes of data), giving a peak burst write rate of 32 bytes in 12 PCI clocks, or about 85MBps with a 33MHz PCI clock. This scenario holds while the RAS# timer (10us typical) does not time out, the burst remains within the same 4K memory page, and no refresh is requested (15us typical).
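The clock counts and the quoted rate can be checked with a few lines of arithmetic. The helper names are hypothetical. The sketch assumes a 33.33MHz PCI clock and 1MB = 2^20 bytes, which is one reading that reproduces the "about 85MBps" figure; with exactly 33MHz and 1MB = 10^6 bytes the same 12-clock pattern gives 88MBps.

```python
def pattern_clocks(pattern):
    """Sum a timing pattern such as '5-1-1-1 -3-1-1-1' into PCI clocks."""
    tokens = pattern.replace(" ", "").split("-")
    return sum(int(tok) for tok in tokens if tok)


def burst_rate_mb_per_s(bytes_moved, pci_clocks, pci_hz=33.33e6):
    """Transfer rate in MB/s, with 1MB taken as 2**20 bytes (assumption)."""
    return bytes_moved * pci_hz / pci_clocks / 2**20


initial = pattern_clocks("5-1-1-1 -3-1-1-1")      # first 32 bytes
sustained = pattern_clocks("-3-1-1-1 -3-1-1-1")   # each later 32 bytes
rate = burst_rate_mb_per_s(32, sustained)
```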

5.2.7.1 Detailed Write Burst Sequence Timing

The detailed write sequence is affected by several factors, such as refresh requests, memory arbitration delays, page and/or bank misses, and cache boundary alignment.

Table 5-5 shows the details of the various sequences that a PCI to memory burst write will experience, depending on the address (relative to a cache block boundary) of the first data phase of the transaction. The starting address of the numbering sequence shown on the top row was arbitrarily chosen as xx00, and could be any 32-byte aligned boundary. The times shown in Table 5-5 are in PCI clock cycles, and do not include any cycles that the PCI master spends acquiring the PCI bus from the PCI bus arbiter. The initial data phase is timed from the assertion of FRAME# to the PCI clock at which the PCI master samples TRDY# active. Subsequent data phase times are from the PCI clock at which the previous TRDY# was sampled active to the PCI clock at which the current TRDY# is sampled active.

All the numbers shown in Table 5-5 are for parity or no-error-checking operation. The numbers are also correct for ECC mode operation as long as all the writes are gather-store pairs. Each read-modify-write (RMW) operation incurred costs 3 PCI_CLKs.

Table 5-5. PCI to Memory Write Burst Sequence Timing

                         S                       S
00 04 08 0C 10 14 18 1C 20 24 28 2C 30 34 38 3C 40 ...
 W  1  1  1  Y  1  Z  1  X  1  Z  1  Z  1  Z  1  X ...
    W  1  1  Y  1  Z  1  X  1  Z  1  Z  1  Z  1  X ...
       W  1  1  1  Y  1  X  1  Z  1  Z  1  Z  1  X ...
          W  1  1  Y  1  X  1  Z  1  Z  1  Z  1  X ...
             W  1  1  1  G  1  Z  1  Z  1  Z  1  X ...
                W  1  1  G  1  Z  1  Z  1  Z  1  X ...
                   W  1  G  1  Z  1  Z  1  Z  1  X ...
                      W  G  1  Z  1  Z  1  Z  1  X ...
                         W  1  1  1  Y  1  Z  1  X ...

S indicates a cache block boundary at 0 mod 32. Snoops are broadcast to the CPU bus when a PCI burst crosses one of these boundaries.

W is a function of a 1.5 PCI clock snoop delay and memory arbitration delays. If the CPU is accessing memory when the PCI agent begins the memory write burst, the 660 waits until the CPU completes the current CPU access before allowing the PCI to memory write to proceed. If the RAS# watchdog timer has timed out, the memory controller will precharge the RAS# lines, and if the refresh timer has timed out, the memory controller will do a refresh operation.

W (min) = 5. This occurs when the memory controller is idle and no refresh or RAS# timeout occurs.

W (typ) = 6 or 7. This occurs if the memory controller is in the middle (beat 3 of 4) of serving a CPU burst transfer when the PCI burst starts, and no refresh or RAS# timeout occurs.

W (max) = 23. This occurs when CPU1 is just starting a burst transfer to memory, followed by CPU2 starting a burst transfer to memory, after which a refresh happens to be required.

X is a function of snoop delays only. Whenever the memory access crosses a cache block boundary, the Bridge broadcasts a snoop cycle on the CPU bus. (Due to the posted write buffer structure, delays incurred by crossing a page boundary here do not show up until later in the sequence.)

X = 3, always. (The only benefit of disabling PCI snooping or enabling pre-snooping is to reduce this delay to 1. Otherwise, neither function increases performance.)

Y is a function of memory latency. This page and/or bank miss delay can only be incurred at a page boundary, but shows up here due to the posted write buffer structure. The Bridge has a 4 x 4 posted PCI write buffer, which allows it to accept data phases from the PCI bus while the memory controller is busy servicing page misses. This minimizes the transfer delays caused by these memory overhead functions.

Y (typ) = 1. This occurs for a page hit with no refresh. This is also the minimum.

Y (mid) = 2. This occurs for a page miss with no refresh.

Y (max) = 4. This occurs for a refresh (which also forces a page miss).

Z is a function of a subset of the W factors (RAS# timeouts and refresh operations). This delay is only incurred due to a RAS# timeout or refresh request that has occurred since the last W, Y, Z, or G.

Z (typ) = 2 to 3. This occurs for no refresh and no RAS# timeout.

Z (max) = 3 to 4. This occurs for either a RAS# timeout or a refresh operation.

G is the combination of X and Y, and is equal to the longer of X and Y.
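Given values for W, X, Y, and Z, the total time for a row of Table 5-5 is the sum of its entries, with G taken as the longer of X and Y. The sketch below does that sum for a row written as a string. The function name is hypothetical, and the values used in the example are simply picked from the ranges listed above.

```python
def write_burst_clocks(sequence, w, x, y, z):
    """Total PCI clocks for one row of Table 5-5.

    sequence: the row as a string, e.g. 'W 1 1 1 Y 1 Z 1 X'.
    G is derived as max(x, y), per the definition of G above.
    """
    values = {"W": w, "X": x, "Y": y, "Z": z, "G": max(x, y)}
    total = 0
    for symbol in sequence.split():
        total += values[symbol] if symbol in values else int(symbol)
    return total


# First nine data phases of a burst starting at xx00, with W = 5, X = 3,
# Y = 1, Z = 2: 5+1+1+1+1+1+2+1+3 = 16 PCI clocks.
clocks = write_burst_clocks("W 1 1 1 Y 1 Z 1 X", w=5, x=3, y=1, z=2)
```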

5.2.8 PCI to Memory Reads

During PCI to memory burst reads, the 660 performs memory pre-fetching when it initiates cycles to the memory controller. The pre-fetching involves loading or pre-loading 32 bytes from the memory for eight 4-byte PCI read cycles. Pre-fetching is only done within the same cache line.

Minimum initial read access time from 70ns DRAM when the CPU bus is 66MHz and the PCI bus is 33MHz is 8-1-1-1 -1-1-1-1 PCI clocks for 4-4-4-4 -4-4-4-4 bytes of data (15 PCI clocks for 32 bytes of data). Subsequent data phases of the same burst are generally serviced at -7-1-1-1 -1-1-1-1 (14 PCI clocks for 32 bytes of data), giving a peak burst read rate of 32 bytes in 14 PCI clocks, or about 73MBps with a 33MHz PCI clock. This scenario holds while the RAS# timer (10us typical) does not time out and no refresh is requested (15us typical).

5.2.8.1 Detailed Read Burst Sequence Timing

The detailed read sequence is affected by several factors, such as the speed of the DRAM, refresh requests, memory arbitration delays, page and/or bank misses, and cache boundary alignment. Table 5-6 shows the details of the various sequences that a PCI to memory burst read will experience, depending on the address (relative to a cache block boundary) of the first data phase of the transaction. The starting address of the numbering sequence shown on the top row was arbitrarily chosen as xx00, and could be any 32-byte aligned boundary. The times shown in Table 5-6 are in PCI clock cycles, and do not include any cycles that the PCI master spends acquiring the PCI bus from the PCI bus arbiter. The initial data phase is timed from the assertion of FRAME# to the PCI clock at which the PCI master samples TRDY# active. Subsequent data phase times are from the PCI clock at which the previous TRDY# was sampled active to the PCI clock at which the current TRDY# is sampled active.

All the numbers shown in Table 5-6 are for ECC, parity, or no-error-checking operation.

Table 5-6. PCI to Memory Read Burst Sequence Timing

                         S                       S
00 04 08 0C 10 14 18 1C 20 24 28 2C 30 34 38 3C 40 ...
 N  1  1  1  1  1  1  1  M  1  1  1  1  1  1  1  M ...
    N  1  1  1  1  1  1  M  1  1  1  1  1  1  1  M ...
       N  1  1  1  1  1  M  1  1  1  1  1  1  1  M ...
          N  1  1  1  1  M  1  1  1  1  1  1  1  M ...
             N  1  1  1  M  1  1  1  1  1  1  1  M ...
                N  1  1  M  1  1  1  1  1  1  1  M ...
                   N  1  M  1  1  1  1  1  1  1  M ...
                      N  M  1  1  1  1  1  1  1  M ...
                         N  1  1  1  1  1  1  1  M ...

S indicates a cache block boundary at 0 mod 32. Snoops are broadcast to the CPU bus when a PCI burst crosses one of these boundaries.

N is the number of PCI clocks required from the assertion of FRAME# until the master samples the first TRDY# (from the 660) active, and is a function of snoop and memory arbitration delays. If the CPU is accessing memory when the PCI agent begins the memory read burst, the 660 waits until the CPU completes the current CPU access before allowing the PCI to memory read to proceed. If the RAS# watchdog timer has timed out, the memory controller will precharge the RAS# lines, and if the refresh timer has timed out, the memory controller will do a refresh operation.

N (min) = 5. This occurs when the memory controller is idle and no refresh or RAS# timeout occurs, and the access produces a page hit.

N (typ) = 8 or 9. This occurs if the memory controller is in the middle (beat 3 of 4) of serving a CPU burst transfer when the PCI burst starts, and no refresh or RAS# timeout occurs.

N (max) = 26. This occurs when CPU1 is just starting a burst transfer to memory, followed by CPU2 starting a burst transfer to memory, after which a refresh happens to be required.

M is a function of a 2-clock snoop delay and other delays caused by bridge overhead functions. Whenever the memory access crosses a cache block boundary, the Bridge broadcasts a snoop cycle on the CPU bus.

M (typ) = 6 or 7, unless a refresh or RAS# timeout occurs.

M (max) = 7 or 8. This occurs for a refresh or RAS# timeout.

The memory controller, running at its own speed, requests up to four 8-byte memory reads (into eight 4-byte buffers in the 663) while the PCI target engine of the 660 is servicing the memory read transaction. Under worst case conditions (slow memory, etc.), the memory controller just keeps up with the PCI bus, and N goes up. Under better conditions, the memory controller gets ahead of the PCI read process, and N decreases.
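The rows of Table 5-6 follow a simple pattern: the first data phase costs N, each later data phase that lands on a cache block boundary (0 mod 32) costs M, and every other data phase costs 1. A minimal sketch of that pattern follows. The function name is hypothetical, and the N and M values passed in are examples chosen by the caller, not fixed properties of the 660.

```python
def read_burst_timing(start, beats, n, m):
    """Per-data-phase PCI clocks for a read burst, following Table 5-6.

    start: address of the first data phase (aligned down to 4 bytes).
    beats: number of 4-byte data phases.
    n, m:  initial latency and cache block boundary latency, in PCI clocks.
    """
    timing = []
    addr = start & ~0x3
    for i in range(beats):
        if i == 0:
            timing.append(n)       # first data phase
        elif addr % 32 == 0:
            timing.append(m)       # crossed a cache block boundary
        else:
            timing.append(1)       # served from prefetched data
        addr += 4
    return timing
```

For a burst starting at xx18, the first four data phases come out as N, 1, M, 1, matching the seventh row of the table.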

5.2.9 PCI BE# to CAS# Line Mapping

Table 5-7 shows which CAS# lines are activated when a PCI master writes memory. Note that CAS[0]# refers to byte addresses 0 mod 8, CAS[1]# refers to byte addresses 1 mod 8, etc. For read cycles, eight bytes of memory data are read on each access, but the master receives only the desired 4 bytes. The bytes are read or written to memory independently of BE or LE mode (the endian mode byte swappers are situated between the CPU and the rest of the system, not between the PCI and the rest of the system).

Table 5-7. Active CAS# Lines – PCI to Memory Writes, BE or LE Mode

PCI_AD[2]  Byte Enables BE[ ]#    Column Address Selects CAS[ ]#
           3  2  1  0             0  1  2  3  4  5  6  7
0          1  1  1  1
0          1  1  1  0             X
0          1  1  0  1                X
0          1  1  0  0             X  X
0          1  0  1  1                   X
0          1  0  1  0             X     X
0          1  0  0  1                X  X
0          1  0  0  0             X  X  X
0          0  1  1  1                      X
0          0  1  1  0             X        X
0          0  1  0  1                X     X
0          0  1  0  0             X  X     X
0          0  0  1  1                   X  X
0          0  0  1  0             X     X  X
0          0  0  0  1                X  X  X
0          0  0  0  0             X  X  X  X
1          1  1  1  1
1          1  1  1  0                         X
1          1  1  0  1                            X
1          1  1  0  0                         X  X
1          1  0  1  1                               X
1          1  0  1  0                         X     X
1          1  0  0  1                            X  X
1          1  0  0  0                         X  X  X
1          0  1  1  1                                  X
1          0  1  1  0                         X        X
1          0  1  0  1                            X     X
1          0  1  0  0                         X  X     X
1          0  0  1  1                               X  X
1          0  0  1  0                         X     X  X
1          0  0  0  1                            X  X  X
1          0  0  0  0                         X  X  X  X

Notes:

X = active. Blank = inactive. Byte enables would normally represent contiguous addresses. This table shows what would happen for all cases.
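The mapping in Table 5-7 can be expressed compactly. The sketch below assumes that an asserted (low) BE[n]# activates CAS[n]# when PCI_AD[2] is 0 and CAS[n+4]# when PCI_AD[2] is 1, which follows from the byte address convention stated for the CAS# lines. The function name is hypothetical.

```python
def active_cas_lines(ad2, be_n):
    """Return the indices of the CAS# lines activated by a PCI write.

    ad2:  value of PCI_AD[2] (0 or 1), selecting the low or high 4 bytes
          of the 8-byte memory doubleword.
    be_n: 4-bit BE[3:0]# value; a 0 bit means the byte lane is enabled.
    """
    base = 4 * ad2
    return [base + lane for lane in range(4) if not (be_n >> lane) & 1]
```

With PCI_AD[2] = 1 and BE[3:0]# = 0101, byte lanes 1 and 3 are enabled, so CAS[5]# and CAS[7]# are active in this model.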
