Chapter 3. L1 and L2 Cache Operation
L2 Cache Interface
The L2 cache hardware flush mechanism is not present in earlier PowerPC microprocessor
implementations. Using L2CR[L2HWF] is the preferred mechanism for flushing the L2
cache on the MPC7400.
3.7.3.8.2 L2 Cache Software Flush
There are a variety of methods to flush the L2 cache using load, dcbz, dcbf, or AltiVec
stvxl instructions. The L2 cache flush assist bit, L2CR[L2FA], simplifies the software
flushing process. In normal (non-flushing) operation, L2FA is cleared, and lines cast out
from the L1 data cache with a status of CDMRSV = 01xxx1 (that is, the C bit is negated)
do not allocate in the L2 cache if they miss. However, when set, L2FA forces every
castout from the L1 data cache to allocate an entry in the L2 cache if that castout misses in
the L2, regardless of the state of the C bit.
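The allocation rule described above can be summarized as a small truth function. This is only a behavioral sketch, not a description of the hardware logic: the function name is hypothetical, and it assumes (as the text implies but does not state outright) that a castout with the C bit asserted allocates on an L2 miss in normal operation.

```c
#include <stdbool.h>

/* Hypothetical model of the L2 allocation decision for an L1 data cache
 * castout that MISSES in the L2.
 *   c_bit : state of the C bit in the castout line's CDMRSV status
 *   l2fa  : state of L2CR[L2FA] (flush assist)
 * Assumption: with L2FA cleared, only castouts with C asserted allocate. */
static bool castout_allocates_on_l2_miss(bool c_bit, bool l2fa)
{
    /* L2FA overrides the C bit: every missing castout allocates. */
    return l2fa || c_bit;
}
```

With L2FA cleared, a castout with the C bit negated is the only case that does not allocate; setting L2FA removes that exception for the duration of the flush routine.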
L2FA should be set just prior to the beginning of the cache flush routine and cleared after
the series of instructions is complete. The address space should not be shared with any other
process to prevent snoop hit invalidations during the flushing routine. Exceptions should be
disabled during this time so that the FIFO replacement logic is not disturbed.
The following procedure is an efficient L2 cache software flush algorithm using stvxl:
1. Set HID0[DCFA]
2. Set L2CR[L2FA] and clear L2CR[L2IO]
3. Set L2CR[L2DO] (to prevent instruction reloads of the L2)
4. Disable all interrupts (to avoid disturbing cache replacement pointers)
5. Execute three uniquely addressed stvxl instructions to each 32-byte block of the L2
cache. The three stores must be to the same L2 index (that is, bits 12-26 of the
physical address must be equal). The following pseudo-C code provides an example
of how to do this. Note that this example assumes data translation is disabled
(MSR[DR] = 0):
r1 = 0x00000000;  /* r1, r2, and r3 can be any values as long */
r2 = 0x10000000;  /* as bits 12-26 are the same for all three */
r3 = 0x20000000;  /* and bits 0-11 are different between all three */
r4 = 0x0;
r5 = 0x10;
for (i = 0; i < L2_SIZE_IN_BYTES / 32; i++) {
    stvxl r0, r1, r4; stvxl r0, r1, r5;
    stvxl r0, r2, r4; stvxl r0, r2, r5;
    stvxl r0, r3, r4; stvxl r0, r3, r5;
    r4 += 0x20; r5 += 0x20;
}
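The constraint on the three base addresses in step 5 can be checked on a host before they are used in a flush routine. The following is a minimal sketch, assuming 32-bit physical addresses and PowerPC bit numbering (bit 0 is the most significant bit); the mask names and the function name are hypothetical and are introduced here only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* PowerPC numbers bits from the MSB, so bit n has the value 1 << (31 - n). */
#define L2_TAG_MASK   0xFFF00000u  /* bits 0-11: must differ across the bases */
#define L2_INDEX_MASK 0x000FFFE0u  /* bits 12-26: must match across the bases */

/* Returns true if the three base addresses select the same L2 index
 * (bits 12-26 equal) and have pairwise distinct values in bits 0-11. */
static bool flush_bases_valid(uint32_t a, uint32_t b, uint32_t c)
{
    bool same_index = ((a ^ b) & L2_INDEX_MASK) == 0 &&
                      ((a ^ c) & L2_INDEX_MASK) == 0;

    uint32_t ta = a & L2_TAG_MASK;
    uint32_t tb = b & L2_TAG_MASK;
    uint32_t tc = c & L2_TAG_MASK;
    bool distinct_tags = ta != tb && ta != tc && tb != tc;

    return same_index && distinct_tags;
}
```

For the values used in the example, flush_bases_valid(0x00000000, 0x10000000, 0x20000000) returns true: all three share index bits of zero, and their upper 12 bits are 0x000, 0x100, and 0x200.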
The second store to each cache block (using r5) is for performance reasons. The MPC7400
merges the entire 32-byte cache block for each stvxl pair. If the stores are mapped global
(M = 1), then the stores perform address-only kill transactions on the bus because they
merge to the full 32-byte cache block. If the stores are mapped non-global
(M = 0), then the stores merge to 32 bytes and silently allocate in the L1 data cache. See
Section 3.6.5, "Store Miss Merging," for more information on store miss merging. Note