
From the document PowerPC 750™ PowerPC 740™ (Pages 163-167)

Instruction and Data Cache Operation

3.5 Cache Operations

This section describes the 750 cache operations.

3.5.1 Cache Block Replacement/Castout Operations

Both the instruction and data cache use a pseudo least-recently-used (PLRU) replacement algorithm when a new block needs to be placed in the cache. When the data to be replaced is in the modified (M) state, that data is written into a castout buffer while the missed data is being accessed on the bus. When the load completes, the 750 then pushes the replaced cache block from the castout buffer to the L2 cache (if L2 is enabled) or to main memory (if L2 is disabled).

The replacement logic first checks to see if there are any invalid blocks in the set and chooses the lowest-order, invalid block (L[0-7]) as the replacement target. If all eight blocks in the set are valid, the PLRU algorithm is used to determine which block should be replaced. The PLRU algorithm is shown in Figure 3-5.

3-18 IBM PowerPC 740 / PowerPC 750 RISC Microprocessor User's Manual

[Figure 3-5 is a decision-tree diagram. The tree first checks the valid bits in order: if L0 is invalid, L0 is allocated; otherwise, if L1 is invalid, L1 is allocated; and so on through L7. If all eight blocks are valid, the tree descends through the PLRU bits: B0 selects between the B1 and B2 subtrees, B1 selects between B3 and B4, B2 selects between B5 and B6, and each of B3-B6 selects one of two blocks, as encoded in Table 3-3.]

Figure 3-5. PLRU Replacement Algorithm

Each cache is organized as eight blocks per set by 128 sets. There is a valid bit for each block in the cache, L[0-7]. When all eight blocks in the set are valid, the PLRU algorithm is used to select the replacement target. There are seven PLRU bits, B[0-6], for each set in the cache. For every hit in the cache, the PLRU bits are updated using the rules specified in Table 3-2.
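For concreteness, this geometry (eight blocks per set, 128 sets, and the 750's 32-byte cache blocks) determines how an address selects a set. The formula below is a sketch implied by the geometry; this section of the manual does not spell it out:

```c
#include <stdint.h>

#define BLOCK_BYTES 32u   /* 750 cache block size          */
#define NUM_SETS    128u  /* sets per cache                */
#define NUM_WAYS    8u    /* blocks (ways) per set, L0..L7 */

/* Set selection for an 8-way x 128-set cache with 32-byte blocks:
 * drop the block-offset bits, then take the low seven bits as the
 * set index (a sketch derived from the stated geometry). */
unsigned set_index(uint32_t addr)
{
    return (addr / BLOCK_BYTES) % NUM_SETS;
}
```

Addresses exactly 4 KB (128 sets x 32 bytes) apart therefore map to the same set, a fact the flush routine in Section 3.5.2 relies on.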

Table 3-2. PLRU Bit Update Rules

If the Current       Then the PLRU Bits Are Changed to: (1)
Access Is To:     B0   B1   B2   B3   B4   B5   B6
L0                1    1    x    1    x    x    x
L1                1    1    x    0    x    x    x
L2                1    0    x    x    1    x    x
L3                1    0    x    x    0    x    x
L4                0    x    1    x    x    1    x
L5                0    x    1    x    x    0    x
L6                0    x    0    x    x    x    1
L7                0    x    0    x    x    x    0

Note: (1) x = Does not change

If all eight blocks are valid, then a block is selected for replacement according to the PLRU bit encodings shown in Table 3-3.

Table 3-3. PLRU Replacement Block Selection

If the PLRU Bits Are:      Then the Block Selected for Replacement Is:
B0=0, B1=0, B3=0           L0
B0=0, B1=0, B3=1           L1
B0=0, B1=1, B4=0           L2
B0=0, B1=1, B4=1           L3
B0=1, B2=0, B5=0           L4
B0=1, B2=0, B5=1           L5
B0=1, B2=1, B6=0           L6
B0=1, B2=1, B6=1           L7


During power-up or hard reset, all the valid bits of the blocks are cleared and the PLRU bits are cleared to point to block L0 of each set. Note that this is also the state of the data or instruction cache after setting the respective flash invalidate bit (HID0[DCFI] or HID0[ICFI]).
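The allocation rule of Section 3.5.1 and the bit rules of Tables 3-2 and 3-3 can be modeled in C. This is a behavioral sketch of one set's replacement state, not the hardware logic:

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of one cache set: valid bits for blocks L0..L7 and PLRU
 * bits B0..B6 (a sketch of Tables 3-2 and 3-3). */
struct plru_set {
    bool valid[8];   /* valid bit per block L0..L7 */
    bool b[7];       /* PLRU bits B0..B6           */
};

/* Table 3-2: on a hit to block Ln, update three PLRU bits so the
 * tree points away from the just-used block ("x" bits unchanged). */
void plru_touch(struct plru_set *s, unsigned n)
{
    s->b[0] = (n < 4);                 /* B0 */
    if (n < 4) s->b[1] = (n < 2);      /* B1 (left subtree)  */
    else       s->b[2] = (n < 6);      /* B2 (right subtree) */
    s->b[3 + n / 2] = ((n % 2) == 0);  /* one of B3..B6      */
}

/* Replacement target: the lowest-numbered invalid block if any
 * (Section 3.5.1), else walk the PLRU tree per Table 3-3. */
unsigned plru_victim(const struct plru_set *s)
{
    for (unsigned i = 0; i < 8; i++)
        if (!s->valid[i])
            return i;
    unsigned n = s->b[0] ? 4 : 0;                /* B0 picks a half     */
    n += (s->b[0] ? s->b[2] : s->b[1]) ? 2 : 0;  /* B1 or B2, a quarter */
    n += s->b[3 + n / 2] ? 1 : 0;                /* B3..B6, the block   */
    return n;
}
```

Walking through one case: after a hit to L0, plru_touch sets B0=1, B1=1, B3=1 (the Table 3-2 row for L0), and plru_victim then follows B0=1 into the L4-L7 subtree, away from the recently used block.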

3.5.2 Cache Flush Operations

The instruction cache can be invalidated by executing a series of icbi instructions or by setting HID0[ICFI]. The data cache can be invalidated by executing a series of dcbi instructions or by setting HID0[DCFI].

Any modified entries in the data cache can be copied back to memory (flushed) by using the dcbf instruction or by executing a series of 12 uniquely addressed load or dcbz instructions to each of the 128 sets. The address space should not be shared with any other process, to prevent snoop hit invalidations during the flushing routine. Exceptions should be disabled during this time so that the PLRU algorithm is not disturbed.

The data cache flush assist bit, HID0[DCFA], simplifies the software flushing process. When set, HID0[DCFA] forces the PLRU replacement algorithm to ignore the invalid entries and follow the replacement sequence defined by the PLRU bits. This reduces the series of uniquely addressed load or dcbz instructions to eight per set. HID0[DCFA] should be set just prior to the beginning of the cache flush routine and cleared after the series of instructions is complete.
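The displacement-flush loop described above can be sketched as follows. The buffer name and helper are hypothetical, and the address arithmetic assumes 32-byte blocks with a 4 KB stride between addresses that map to the same set:

```c
#include <stdint.h>

#define BLOCK_BYTES 32u                       /* 750 cache block size       */
#define NUM_SETS    128u                      /* sets in the data cache     */
#define SET_STRIDE  (BLOCK_BYTES * NUM_SETS)  /* 4 KB: distance between two
                                                 addresses in the same set  */

/* Offset of the i-th uniquely addressed load targeting 'set':
 * consecutive 32-byte blocks walk the sets, and 4 KB strides revisit
 * the same set at a different address. */
uint32_t flush_addr(unsigned set, unsigned i)
{
    return set * BLOCK_BYTES + i * SET_STRIDE;
}

/* Displacement flush (Section 3.5.2): perform 'loads_per_set' uniquely
 * addressed loads to each of the 128 sets so any modified blocks are
 * cast out. loads_per_set is 8 with HID0[DCFA] set, 12 otherwise.
 * 'flush_buf' is a hypothetical buffer not shared with other processes. */
void flush_dcache_by_loads(const volatile uint8_t *flush_buf,
                           unsigned loads_per_set)
{
    for (unsigned set = 0; set < NUM_SETS; set++)
        for (unsigned i = 0; i < loads_per_set; i++)
            (void)flush_buf[flush_addr(set, i)];
}
```

Per the text, exceptions would be disabled around this routine, and HID0[DCFA] would be set just before it and cleared afterward when using eight loads per set.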

3.5.3 Data Cache-Block-Fill Operations

The 750's data cache blocks are filled in four beats of 64 bits each, with the critical double word loaded first. The data cache is not blocked to internal accesses while the load (caused by a cache miss) completes. This functionality is sometimes referred to as 'hits under misses,' because the cache can service a hit while a cache miss fill is waiting to complete.

The critical-double-word read from memory is simultaneously written to the data cache and forwarded to the requesting unit, thus minimizing stalls due to cache fill latency.

A cache block is filled after a read miss or write miss (read-with-intent-to-modify) occurs in the cache. The cache block that corresponds to the missed address is updated by a burst transfer of the data from the L2 or system memory. Note that if a read miss occurs in a system with multiple bus masters, and the data is modified in another cache, the modified data is first written to external memory before the cache fill occurs.
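The beat ordering of such a fill can be illustrated with a small sketch: a 32-byte block is transferred as four 64-bit beats, the double word containing the missed address first. This model assumes the remaining beats wrap sequentially through the block, and it models only the ordering, not the bus protocol:

```c
#include <stdint.h>

#define BLOCK_BYTES 32u
#define BEAT_BYTES  8u                           /* 64-bit beats */
#define NUM_BEATS   (BLOCK_BYTES / BEAT_BYTES)   /* four beats   */

/* Fill 'order' with the double-word offsets within the block
 * (0, 8, 16, 24) in transfer sequence for a miss at 'miss_addr':
 * critical double word first, then wrapping through the block. */
void burst_order(uint32_t miss_addr, uint32_t order[NUM_BEATS])
{
    uint32_t critical = (miss_addr % BLOCK_BYTES) / BEAT_BYTES;  /* 0..3 */
    for (unsigned i = 0; i < NUM_BEATS; i++)
        order[i] = ((critical + i) % NUM_BEATS) * BEAT_BYTES;
}
```

For example, a miss at offset 0x14 within a block lands in the third double word, so that double word (offset 16) is transferred first, followed by offsets 24, 0, and 8.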

3.5.4 Instruction Cache-Block-Fill Operations

The 750's instruction cache blocks are loaded in four beats of 64 bits each, with the critical double word loaded first. The instruction cache is not blocked to internal accesses while the fetch (caused by a cache miss) completes. On a cache miss, the critical and following double words read from memory are simultaneously written to the instruction cache and forwarded to the instruction queue, thus minimizing stalls due to cache fill latency. There is no snooping of the instruction cache.

3.5.5 Data Cache-Block-Push Operation

When a cache block in the 750 is snooped and hit by another bus master and the data is modified, the cache block must be written to memory and made available to the snooping device. The cache block that is hit is said to be pushed out onto the 60x bus. The 750 supports two kinds of push operations: normal push operations and enveloped high-priority push operations, which are described in Section 3.5.5.1, "Enveloped High-Priority Cache-Block-Push Operation."

3.5.5.1 Enveloped High-Priority Cache-Block-Push Operation

In cases where the 750 has completed the address tenure of a read operation, and then detects a snoop hit to a modified cache block by another bus master, the 750 provides a high-priority push operation. If the address snooped is the same as the address of the data to be returned by the read operation, ARTRY is asserted one or more times until the data tenure of the read operation is completed. The cache-block-push transaction can be enveloped within the address and data tenures of a read operation. This feature prevents deadlocks in system organizations that support multiple memory-mapped buses.

More specifically, the 750 internally detects the scenario where a load request is outstanding and the processor has pipelined a write operation on top of the load. Normally, when the data bus is granted to the 750, the resulting data bus tenure is used for the load operation. The enveloped high-priority cache block push feature defines a bus signal, data bus write only (DBWO), which when asserted with a qualified data bus grant indicates that the resulting data tenure should be used for the store operation instead. This signal is described in Section 8.10, "Using Data Bus Write Only." Note that the enveloped copy-back operation is an internally pipelined bus operation.
