Internal Instruction and Data - i860™ XP MICROPROCESSOR

The i860 XP microprocessor has separate data and instruction caches on-chip. Having separate caches for instructions and data allows simultaneous cache look-up. Up to two instructions and 128 bits of data can be accessed simultaneously from these caches.

The data and instruction caches hold 16 Kbytes each. A line can be filled from memory with a four-transfer burst.

The caches are fully transparent to applications soft-ware. Snooping (address monitoring) is designed into both instruction and data caches, to maintain cache consistency in multiprocessor systems.

Each cache has two sets of tags: virtual tags used for internal access, and physical tags used for

snooping. Figure 3.3 shows how the bits of both vir-tual and physical addresses are mapped for cach-ing. The presence of both virtual and physical tags supports aliasing, a situation in which the TLBs as-sociate a single physical address with two or more virtual addresses.

Any area of memory can be cached, although both software and hardware can disallow certain areas from being cached-software by setting the CD bit in their page table entries; hardware by deasserting the KEN # signal for bus cycles with addresses that fall in those areas. (Data reads from the two four-Kbyte pages pointed to by the CCUBASE field of ccr are not cached (and the CACHE# signal is inactive), if the DCCU is activated by setting CO of the ccr register. This is independent of the value of KEN #.) When both software and hardware agree that a re-quested datum is cacheable, the i860 XP microproc-essor fetches an entire 32-byte line and places it into the appropriate cache. Cache line fills are gen-erated only for read misses, not for write misses. A store that misses the cache does not copy the missed line into cache from memory, but rather posts the datum in a write buffer, then sends it to the external bus when the bus is available.

INTERNALLY GENERATED ADDRESSES

'SIJ0292827262524 2322 21 20191817161514 13 121110 9 8 7 6 5 -I J 2

CACHE TAG SET SELECT

CACHE TAG

EXTERNALLY GENERATED INQUIRY (SNOOP) ADDRESSES

240874-24

Figure 3.3. Cache Address Usage

intJ

i860™ XP MICROPROCESSOR

3.2.1 DATA CACHE

Figure 3.4 shows the organization of the data cache.

The data cache has two status bits per physical tag and one validity status bit for the virtual tag. A virtual tag hit is possible only when the validity bit of the virtual tag is set and the state of the physical tag is M, E, or S.

Aliasing support is built into the cache look-up algo-rithm. Even though a physical line may be aliased, the processor never enters the line twice in the data cache. If a virtual address is not found among the virtual tags in the data cache, a bus cycle is initiated (except a read is not issued at this time if the bus pipeline is full) and, at the same time, the physical tags are searched for the physical address (which by this time has been retrieved from the paging unit). updated, and the virtual tag is replaced with the new one. However, the cache state (M, E, or S) of the physical-address tag does not change when the vir-tual tag is overwritten.

Note that the BE (big end ian) bit of epsr has no influence on data cache behavior. Data items are kept in cache in exactly the same ordering as in ex-ternal memory. Byte-shifting operations invoked by the BE bit upon loads and stores occur at the input to the register files only.

3.2.1.1 Data Cache Update Policies

To minimize bus traffic, a write-back policy is normal-ly used. The write-back policy (also called copy-back and deferred-write) reduces bus traffic by eliminating

VIRTUAL

V Ox18 Ox10

NOTES: TAG

M Modified VIRTUAL

V Ox18 Ox10

E Exclusive ^TAG

S Shared ^VIRTUALTAG V Ox18 Ox10

many unnecessary writes. Writes to a line in the cache are not immediately forwarded to main mem-ory; instead, they are accumulated in the cache. The modified cache line is written to main memory only virtual tags for hit, another to update the cache line).

However, the cache pipeline allows successive store hits to operate at one per cycle. The proces-sor's internal write buffers can hold two successive stores, preventing a freeze upon store miss.

Under a write-through policy, a write request to a line in the cache triggers updates to both cache and main memory. An address decoder, for example, can select the write-through policy for writes to video RAM, where it is necessary that writes be seen on the video display. Software, by setting the WT page-table bit, can select the write-through policy for spe-cific areas of memory-those that are used for inter-processor message queues, for example.

A write-once policy combines write-through with write-back. Write-through is employed for the first write to a cache line, while subsequent writes to the same line follow the write-back policy. Write-once is valuable in multiprocessor systems to maintain cache consistency with the least possible bus traffic.

The first write broadcasts to other processor nodes the fact that a line has been modified. Write-once is also used if a second-level cache is attached to the i860 XP microprocessor to maintain consistency be-tween the first- and second-level caches.

The external system can dynamically change the up-date policy (write-back, write-through, write-once) of the i860 XP microprocessor with each cache line.

8 0 PHYSICAL

MESI TAG

I Invalid ~32-BYTE LlNES--l

V Validity 240874-25

Figure 3.4. Data Cache Organization

inter

i860™ XP MICROPROCESSOR 3.2.2 INSTRUCTION CACHE

Figure 3.5 shows the organization of the instruction cache. The instruction cache has one validity bit that is common to both virtual and physical tags. Aliasing support for instructions consists not simply of chang-ing the virtual tag, but rather fetchchang-ing a line whenev-er a virtual tag miss occurs. If the physical address already exists in the instruction cache, its line and its tags are overwritten. So, even though a physical line may be aliased, the processor never enters the line twice in the instruction cache.

3.2.3 CACHE REPLACEMENT ALGORITHM The data, instruction, and address-translation caches all use similar algorithms to choose which of the four cache blocks will be overwritten when a miss causes a line fetch.

First, the first invalid line (if any) in a set of four is replaced (in the order 0, 1, 2, 3). When there are no more invalid lines in a set, a pseudorandom replace-ment algorithm chooses which valid lines to replace.

The algorithm is controlled by counters inside the chip. RESET initializes these counters to zero, so that the "randomness" is deterministic and two i860 XP CPUs executing the same code on identical boards have exactly the same series of cache hits, misses, and replacements.

r

^OxlO

'"

,..

_VIRTUAL

""

^~ ^TAG ^Oxl0

'" => ^VIRTUAL_TAG OxlO

1

^VIRTUAL^TAG ^Oxl0

NOTE:

V Validity

Setting ITI to invalidate the caches and TLSs also resets the counters used to select the set used for cache line replacement. This brings the i860 XP mi-croprocessor cache-replacement mechanism to a known state without resetting the whole chip.

When the flush instruction is used to write back modified lines in the data cache, the flush routine must alter the RC (replacement control) field of dirbase. Therefore, replacement is not random. In-stead, the block (or "way") replaced is the one se-lected by the RS (replacement block) field of dirbase.

3.2.4 CACHE CONSISTENCY PROTOCOL The i860™ XP Microprocessor implements cache consistency via its use of a MESI (Modified, Exclu-sive, Shared, Invalid) protocol.

3.2.4.1 Data Cache States

Each line of the data cache of the i860 XP micro-processor can be in one of the states defined in Ta-ble 3.1. Note that the instruction cache of the i860 XP only implements the "SI" part of the MESI protocol, because the instruction cache is not writa-ble.

PHYSICAL TAG PHYSICAL

TAG PHYSICAL

TAG

240874-26

Figure 3.5. Instruction Cache Organization Table 3.1. MESI Cache Line States

M E S I

Cache Line State: Modified Exclusive Shared Invalid

This cache line is valid? Yes Yes Yes No

The memory copy is ... ... out of date . .. valid ... valid

-Copies exist in other caches? No No Maybe Maybe

A write to this line ... ... does not go ... does not go ... goes to bus . .. goes

to bus to bus and updates directly to bus

the cache

inter

^i860™XP MICROPROCESSOR

Table 3.2. Internally Initiated Cache State Transitions State Next State after Read Next State after Write'

I If WB/WT# = 1; E; else S Write-through

The state of a cache line can change as the result of either internal or external activity related to that line.

Table 3.2 presents the line state transitions that re-sult from internal activity of the i860 XP microproces-sor in the data cache.

External cache-consistency support is provided through inquiry cycles. Inquiry cycles are initiated by other processors in a multiprocessor system to

Table 3.3. Inquiry-Initiated Cache State Transitions

3.2.4.2 Write-Once Policy

A write-once cache policy can be implemented write cycles, a write-once cache policy is implement-ed. The easiest way to implement write-once (in sys-tems not using the 82495XP cache controller) is to tie this pin to the W IR # output of the processor.

If the WT bit in the page table entry is set, the i860 XP microprocessor ignores the WB/WT # sig-nal for the cycles that hit that page and always per-forms a write-through. In other words, hardware can-not override software's selection of the write-through policy.

3.2.4.3 Locked Access

Locked accesses are those data loads and stores that occur after a lock instruction up to and including the first load or store after the corresponding unlock instruction.

State transitions for locked accesses differ from those in Table 3.2 in ways that guarantee that locked accesses are seen by all processors in the system. Any locked load or store generates both a cache look-up and an external bus cycle, regardless of cache hit or miss. write-back changes the line to S or I state, the bus data is used.

i860™ XP MICROPROCESSOR

Locked accesses are totally serializing in the sense that:

To maximize performance, instruction fetches during the locked sequence are not serializing. When NA # invokes pipelining, instruction fetches may be issued while locked data fetches or stores remain on the bus.

Dans le document i860™ XP MICROPROCESSOR (Page 42-46)